* overlayfs access checks on underlying layers @ 2018-11-27 19:55 Miklos Szeredi 2018-11-27 19:58 ` Miklos Szeredi 0 siblings, 1 reply; 47+ messages in thread From: Miklos Szeredi @ 2018-11-27 19:55 UTC (permalink / raw) To: Stephen Smalley, Vivek Goyal, Ondrej Mosnacek, Paul Moore, J. Bruce Fields, Mark Salyzyn Cc: linux-kernel, overlayfs, linux-fsdevel, selinux Moving discussion from github[1] to here. To summarize: commit 007ea44892e6 ("ovl: relax permission checking on underlying layers") was added in 4.20-rc1 to make overlayfs access checks on underlying "real" filesystems more consistent. The discussion leading up to this commit can be found at [2]. The commit broke some selinux-testsuite cases, possibly indicating a security hole opened by this commit. The model this patch tries to follow is that if "cp --preserve=all" was allowed to the mounter from underlying layer to the overlay layer, then operation is allowed. That means even if mounter's creds doesn't provide permission to for example execute underying file X, if mounter's creds provide sufficient permission to perform "cp --preserve=all X Y" and original creds allow execute on Y, then the operation is allowed. This provides consistency in the face of copy-ups. Consistency is only provided in sane setups, where mounter has sufficient privileges to access both the lower and upper layers. The model may not have been perfectly followed, or possibly the model itself is flawed. I'd like to better understand the issues here. Thanks, Miklos [1] https://github.com/SELinuxProject/selinux-kernel/issues/43#issuecomment-442148920 [2] https://marc.info/?t=152762608800002&r=1 ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-11-27 19:55 overlayfs access checks on underlying layers Miklos Szeredi @ 2018-11-27 19:58 ` Miklos Szeredi 2018-11-27 21:05 ` Vivek Goyal 0 siblings, 1 reply; 47+ messages in thread From: Miklos Szeredi @ 2018-11-27 19:58 UTC (permalink / raw) To: Stephen Smalley, Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore Cc: linux-kernel, overlayfs, linux-fsdevel, selinux [resending with fixed email address for Paul Moore] Moving discussion from github[1] to here. To summarize: commit 007ea44892e6 ("ovl: relax permission checking on underlying layers") was added in 4.20-rc1 to make overlayfs access checks on underlying "real" filesystems more consistent. The discussion leading up to this commit can be found at [2]. The commit broke some selinux-testsuite cases, possibly indicating a security hole opened by this commit. The model this patch tries to follow is that if "cp --preserve=all" was allowed to the mounter from underlying layer to the overlay layer, then operation is allowed. That means even if mounter's creds doesn't provide permission to for example execute underying file X, if mounter's creds provide sufficient permission to perform "cp --preserve=all X Y" and original creds allow execute on Y, then the operation is allowed. This provides consistency in the face of copy-ups. Consistency is only provided in sane setups, where mounter has sufficient privileges to access both the lower and upper layers. The model may not have been perfectly followed, or possibly the model itself is flawed. I'd like to better understand the issues here. Thanks, Miklos [1] https://github.com/SELinuxProject/selinux-kernel/issues/43#issuecomment-442148920 [2] https://marc.info/?t=152762608800002&r=1 ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-11-27 19:58 ` Miklos Szeredi @ 2018-11-27 21:05 ` Vivek Goyal 2018-11-28 10:00 ` Miklos Szeredi 0 siblings, 1 reply; 47+ messages in thread From: Vivek Goyal @ 2018-11-27 21:05 UTC (permalink / raw) To: Miklos Szeredi Cc: Stephen Smalley, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Tue, Nov 27, 2018 at 08:58:06PM +0100, Miklos Szeredi wrote: > [resending with fixed email address for Paul Moore] > > Moving discussion from github[1] to here. > > To summarize: commit 007ea44892e6 ("ovl: relax permission checking on > underlying layers") was added in 4.20-rc1 to make overlayfs access > checks on underlying "real" filesystems more consistent. The > discussion leading up to this commit can be found at [2]. The commit > broke some selinux-testsuite cases, possibly indicating a security > hole opened by this commit. > > The model this patch tries to follow is that if "cp --preserve=all" > was allowed to the mounter from underlying layer to the overlay layer, > then operation is allowed. That means even if mounter's creds doesn't > provide permission to for example execute underying file X, if > mounter's creds provide sufficient permission to perform "cp > --preserve=all X Y" and original creds allow execute on Y, then the > operation is allowed. This provides consistency in the face of > copy-ups. Consistency is only provided in sane setups, where mounter > has sufficient privileges to access both the lower and upper layers. [cc daniel walsh] I think current selinux testsuite tests are written keeping these rules in mind. 1. Check overlay inode creds in the context of task and underlying inode creds (lower/upper), in the context of mounter. 2. For a lower inode, if said file is being copied up, then only check MAY_READ on lower. This is equivalent to mounter creating a copy of file and providing caller access to it (context mount). For the case of special devices, we do not copy up these. So should we continue to do check on lower inode in the context of mounter (instead of not doing any check on lower at all). For being able to execute a file, should we atleast check MAY_READ on lower. I am not sure why did we have to drop current checks on special file and execute. I will read through the thread you pointed out. Thanks Vivek ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-11-27 21:05 ` Vivek Goyal @ 2018-11-28 10:00 ` Miklos Szeredi 2018-11-28 17:03 ` Vivek Goyal 0 siblings, 1 reply; 47+ messages in thread From: Miklos Szeredi @ 2018-11-28 10:00 UTC (permalink / raw) To: Vivek Goyal Cc: Stephen Smalley, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Tue, Nov 27, 2018 at 10:05 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > On Tue, Nov 27, 2018 at 08:58:06PM +0100, Miklos Szeredi wrote: > > [resending with fixed email address for Paul Moore] > > > > Moving discussion from github[1] to here. > > > > To summarize: commit 007ea44892e6 ("ovl: relax permission checking on > > underlying layers") was added in 4.20-rc1 to make overlayfs access > > checks on underlying "real" filesystems more consistent. The > > discussion leading up to this commit can be found at [2]. The commit > > broke some selinux-testsuite cases, possibly indicating a security > > hole opened by this commit. > > > > The model this patch tries to follow is that if "cp --preserve=all" > > was allowed to the mounter from underlying layer to the overlay layer, > > then operation is allowed. That means even if mounter's creds doesn't > > provide permission to for example execute underying file X, if > > mounter's creds provide sufficient permission to perform "cp > > --preserve=all X Y" and original creds allow execute on Y, then the > > operation is allowed. This provides consistency in the face of > > copy-ups. Consistency is only provided in sane setups, where mounter > > has sufficient privileges to access both the lower and upper layers. > > [cc daniel walsh] > > I think current selinux testsuite tests are written keeping these > rules in mind. > > 1. Check overlay inode creds in the context of task and underlying > inode creds (lower/upper), in the context of mounter. > > 2. For a lower inode, if said file is being copied up, then only > check MAY_READ on lower. This is equivalent to mounter creating > a copy of file and providing caller access to it (context mount). > > For the case of special devices, we do not copy up these. So should > we continue to do check on lower inode in the context of mounter > (instead of not doing any check on lower at all). Hmm, I'm trying to understand the logic... If we follow the "cp --preserve=all" thing, than mounter needs to have CREATE permission for the special file, not READ or WRITE. Does that make sense? Would that help with the context= mount use case? > > For being able to execute a file, should we atleast check MAY_READ > on lower. Yep, that looks like a bug present from day one: MAY_EXEC doesn't always imply MAY_READ, but to be able to execute a file, the kernel must read it first, and if mounter doesn't have privilege to read the file, then user should not be allowed to execute it. > I am not sure why did we have to drop current checks on special file > and execute. I will read through the thread you pointed out. TL;DR: NFS access model is that creds are checked by server (and cached in client), and server could be denying write access to a device file to mounter (root) independently of DAC. In that case write access by user to device file would be inconsistent (denied before copy-up, allowed after copy-up). Same goes for execute. And same goes for MAC: if it's denying READ/WRITE on device or denying EXECUTE on readable file to mounter, and mounter can just copy that device/file to a temporry location not controlled by that MAC, than it can work around that restriction. IOW, this is just a generalization of the rule that we ignore WRITE access on lower layer, because a write will never reach the lower layer. Thanks, Miklos ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-11-28 10:00 ` Miklos Szeredi @ 2018-11-28 17:03 ` Vivek Goyal 2018-11-28 19:34 ` Stephen Smalley 0 siblings, 1 reply; 47+ messages in thread From: Vivek Goyal @ 2018-11-28 17:03 UTC (permalink / raw) To: Miklos Szeredi Cc: Stephen Smalley, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Wed, Nov 28, 2018 at 11:00:09AM +0100, Miklos Szeredi wrote: > On Tue, Nov 27, 2018 at 10:05 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > > > On Tue, Nov 27, 2018 at 08:58:06PM +0100, Miklos Szeredi wrote: > > > [resending with fixed email address for Paul Moore] > > > > > > Moving discussion from github[1] to here. > > > > > > To summarize: commit 007ea44892e6 ("ovl: relax permission checking on > > > underlying layers") was added in 4.20-rc1 to make overlayfs access > > > checks on underlying "real" filesystems more consistent. The > > > discussion leading up to this commit can be found at [2]. The commit > > > broke some selinux-testsuite cases, possibly indicating a security > > > hole opened by this commit. > > > > > > The model this patch tries to follow is that if "cp --preserve=all" > > > was allowed to the mounter from underlying layer to the overlay layer, > > > then operation is allowed. That means even if mounter's creds doesn't > > > provide permission to for example execute underying file X, if > > > mounter's creds provide sufficient permission to perform "cp > > > --preserve=all X Y" and original creds allow execute on Y, then the > > > operation is allowed. This provides consistency in the face of > > > copy-ups. Consistency is only provided in sane setups, where mounter > > > has sufficient privileges to access both the lower and upper layers. > > > > [cc daniel walsh] > > > > I think current selinux testsuite tests are written keeping these > > rules in mind. > > > > 1. Check overlay inode creds in the context of task and underlying > > inode creds (lower/upper), in the context of mounter. > > > > 2. For a lower inode, if said file is being copied up, then only > > check MAY_READ on lower. This is equivalent to mounter creating > > a copy of file and providing caller access to it (context mount). > > > > For the case of special devices, we do not copy up these. So should > > we continue to do check on lower inode in the context of mounter > > (instead of not doing any check on lower at all). > > Hmm, I'm trying to understand the logic... If we follow the "cp > --preserve=all" thing, than mounter needs to have CREATE permission > for the special file, not READ or WRITE. Does that make sense? Would > that help with the context= mount use case? Ok. If we follow "cp --preserve=all" methodology, then checking for mounter CREATE permission on upper for special files makes sense. Or change logic to copy up this special file during open. I am assuming we don't copy up special file during open as it is not necessary for things to work but copying up will work as well? So rules will become. - Two levels of checks. - For lower level inode, check MAY_READ for regular files. (including exec). - For special files, only make sure mounter can CREATE object in upper. - What about checks on files on upper/. As of now we seem to check access in mounter's context if it is regular file. Skip the checks completely for special files and for executables. While non-context mount should still be ok, but this means lot of privilige granting to unprivileged process using context mounts. So unprivileged process which could not open a device/socket/fifo for read/write on host fs, can open it for those operations for context mounts. IOW, for context mount case, an unprivileged user will gain lot of privileges. But that seems to be the point of context mount anyway on regular disks. If a disk is mounted using context mount option, then all real labels are ignored and all access checking happens using context label. We are doing similar thing. With one step extra and that is making sure if mounter itself can not do certain operation on host, that will still be denied. This probably means that context= mounts should be used very carefully. It will grant lot of priviliges to the process (and allow operations which process could not do on host without overlayfs mount). Case of device file still baffels me though. We don't do any mounter's checks on device files. So if a device file is on upper which mounter can't open for read but mounter is still granting priviliges to client to open that device file. That's unintutive to me and seems counter to the principle of that mounter can't give more priviliges than what it itself can't do on host. Dan, stephen, paul moore, does this sound ok to you folks from selinux point of view. Thanks Vivek ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-11-28 17:03 ` Vivek Goyal @ 2018-11-28 19:34 ` Stephen Smalley 2018-11-28 20:24 ` Miklos Szeredi 0 siblings, 1 reply; 47+ messages in thread From: Stephen Smalley @ 2018-11-28 19:34 UTC (permalink / raw) To: Vivek Goyal, Miklos Szeredi Cc: Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On 11/28/18 12:03 PM, Vivek Goyal wrote: > On Wed, Nov 28, 2018 at 11:00:09AM +0100, Miklos Szeredi wrote: >> On Tue, Nov 27, 2018 at 10:05 PM Vivek Goyal <vgoyal@redhat.com> wrote: >>> >>> On Tue, Nov 27, 2018 at 08:58:06PM +0100, Miklos Szeredi wrote: >>>> [resending with fixed email address for Paul Moore] >>>> >>>> Moving discussion from github[1] to here. >>>> >>>> To summarize: commit 007ea44892e6 ("ovl: relax permission checking on >>>> underlying layers") was added in 4.20-rc1 to make overlayfs access >>>> checks on underlying "real" filesystems more consistent. The >>>> discussion leading up to this commit can be found at [2]. The commit >>>> broke some selinux-testsuite cases, possibly indicating a security >>>> hole opened by this commit. >>>> >>>> The model this patch tries to follow is that if "cp --preserve=all" >>>> was allowed to the mounter from underlying layer to the overlay layer, >>>> then operation is allowed. That means even if mounter's creds doesn't >>>> provide permission to for example execute underying file X, if >>>> mounter's creds provide sufficient permission to perform "cp >>>> --preserve=all X Y" and original creds allow execute on Y, then the >>>> operation is allowed. This provides consistency in the face of >>>> copy-ups. Consistency is only provided in sane setups, where mounter >>>> has sufficient privileges to access both the lower and upper layers. >>> >>> [cc daniel walsh] >>> >>> I think current selinux testsuite tests are written keeping these >>> rules in mind. >>> >>> 1. Check overlay inode creds in the context of task and underlying >>> inode creds (lower/upper), in the context of mounter. >>> >>> 2. For a lower inode, if said file is being copied up, then only >>> check MAY_READ on lower. This is equivalent to mounter creating >>> a copy of file and providing caller access to it (context mount). >>> >>> For the case of special devices, we do not copy up these. So should >>> we continue to do check on lower inode in the context of mounter >>> (instead of not doing any check on lower at all). >> >> Hmm, I'm trying to understand the logic... If we follow the "cp >> --preserve=all" thing, than mounter needs to have CREATE permission >> for the special file, not READ or WRITE. Does that make sense? Would >> that help with the context= mount use case? > > Ok. If we follow "cp --preserve=all" methodology, then checking for > mounter CREATE permission on upper for special files makes sense. Or > change logic to copy up this special file during open. I am assuming > we don't copy up special file during open as it is not necessary > for things to work but copying up will work as well? > > So rules will become. > > - Two levels of checks. > - For lower level inode, check MAY_READ for regular files. (including > exec). > - For special files, only make sure mounter can CREATE object in upper. > > - What about checks on files on upper/. As of now we seem to check > access in mounter's context if it is regular file. Skip the checks > completely for special files and for executables. > > While non-context mount should still be ok, but this means lot of > privilige granting to unprivileged process using context mounts. So > unprivileged process which could not open a device/socket/fifo for > read/write on host fs, can open it for those operations for context > mounts. > > IOW, for context mount case, an unprivileged user will gain lot of > privileges. But that seems to be the point of context mount anyway > on regular disks. If a disk is mounted using context mount option, > then all real labels are ignored and all access checking happens > using context label. We are doing similar thing. With one step extra > and that is making sure if mounter itself can not do certain operation > on host, that will still be denied. > > This probably means that context= mounts should be used very carefully. > It will grant lot of priviliges to the process (and allow operations > which process could not do on host without overlayfs mount). > > Case of device file still baffels me though. We don't do any mounter's > checks on device files. So if a device file is on upper which mounter > can't open for read but mounter is still granting priviliges to client > to open that device file. That's unintutive to me and seems counter > to the principle of that mounter can't give more priviliges than what > it itself can't do on host. > > Dan, stephen, paul moore, does this sound ok to you folks from selinux > point of view. It seems wrong to check CREATE when no file is being created, and doing so could lead to over-privileging of the mounter context, requiring one to allow a mounter context to create device nodes just to allow a client task context to read/write via already existing device nodes through an overlay. Checking READ but not EXECUTE upon an execute check could permit a mounter to execute unauthorized code, if it can context mount from a readable-but-not-executable context to an executable context. Note btw that cp --preserve=all doesn't quite operate as expected if dealing with a context mount. You can't preserve the original security context if copying to a context mount unless the two contexts happen to already match. So I'm not sure how that model applies in the case of a context mount. Does the breaking commit (007ea44892e6) fix a real bug affecting users? If not, I'd recommend just reverting it. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-11-28 19:34 ` Stephen Smalley @ 2018-11-28 20:24 ` Miklos Szeredi 2018-11-28 21:46 ` Stephen Smalley 0 siblings, 1 reply; 47+ messages in thread From: Miklos Szeredi @ 2018-11-28 20:24 UTC (permalink / raw) To: Stephen Smalley Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Wed, Nov 28, 2018 at 8:32 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: > > On 11/28/18 12:03 PM, Vivek Goyal wrote: > > On Wed, Nov 28, 2018 at 11:00:09AM +0100, Miklos Szeredi wrote: > >> On Tue, Nov 27, 2018 at 10:05 PM Vivek Goyal <vgoyal@redhat.com> wrote: > >>> > >>> On Tue, Nov 27, 2018 at 08:58:06PM +0100, Miklos Szeredi wrote: > >>>> [resending with fixed email address for Paul Moore] > >>>> > >>>> Moving discussion from github[1] to here. > >>>> > >>>> To summarize: commit 007ea44892e6 ("ovl: relax permission checking on > >>>> underlying layers") was added in 4.20-rc1 to make overlayfs access > >>>> checks on underlying "real" filesystems more consistent. The > >>>> discussion leading up to this commit can be found at [2]. The commit > >>>> broke some selinux-testsuite cases, possibly indicating a security > >>>> hole opened by this commit. > >>>> > >>>> The model this patch tries to follow is that if "cp --preserve=all" > >>>> was allowed to the mounter from underlying layer to the overlay layer, > >>>> then operation is allowed. That means even if mounter's creds doesn't > >>>> provide permission to for example execute underying file X, if > >>>> mounter's creds provide sufficient permission to perform "cp > >>>> --preserve=all X Y" and original creds allow execute on Y, then the > >>>> operation is allowed. This provides consistency in the face of > >>>> copy-ups. Consistency is only provided in sane setups, where mounter > >>>> has sufficient privileges to access both the lower and upper layers. > >>> > >>> [cc daniel walsh] > >>> > >>> I think current selinux testsuite tests are written keeping these > >>> rules in mind. > >>> > >>> 1. Check overlay inode creds in the context of task and underlying > >>> inode creds (lower/upper), in the context of mounter. > >>> > >>> 2. For a lower inode, if said file is being copied up, then only > >>> check MAY_READ on lower. This is equivalent to mounter creating > >>> a copy of file and providing caller access to it (context mount). > >>> > >>> For the case of special devices, we do not copy up these. So should > >>> we continue to do check on lower inode in the context of mounter > >>> (instead of not doing any check on lower at all). > >> > >> Hmm, I'm trying to understand the logic... If we follow the "cp > >> --preserve=all" thing, than mounter needs to have CREATE permission > >> for the special file, not READ or WRITE. Does that make sense? Would > >> that help with the context= mount use case? > > > > Ok. If we follow "cp --preserve=all" methodology, then checking for > > mounter CREATE permission on upper for special files makes sense. Or > > change logic to copy up this special file during open. I am assuming > > we don't copy up special file during open as it is not necessary > > for things to work but copying up will work as well? > > > > So rules will become. > > > > - Two levels of checks. > > - For lower level inode, check MAY_READ for regular files. (including > > exec). > > - For special files, only make sure mounter can CREATE object in upper. > > > > - What about checks on files on upper/. As of now we seem to check > > access in mounter's context if it is regular file. Skip the checks > > completely for special files and for executables. > > > > While non-context mount should still be ok, but this means lot of > > privilige granting to unprivileged process using context mounts. So > > unprivileged process which could not open a device/socket/fifo for > > read/write on host fs, can open it for those operations for context > > mounts. > > > > IOW, for context mount case, an unprivileged user will gain lot of > > privileges. But that seems to be the point of context mount anyway > > on regular disks. If a disk is mounted using context mount option, > > then all real labels are ignored and all access checking happens > > using context label. We are doing similar thing. With one step extra > > and that is making sure if mounter itself can not do certain operation > > on host, that will still be denied. > > > > This probably means that context= mounts should be used very carefully. > > It will grant lot of priviliges to the process (and allow operations > > which process could not do on host without overlayfs mount). > > > > Case of device file still baffels me though. We don't do any mounter's > > checks on device files. So if a device file is on upper which mounter > > can't open for read but mounter is still granting priviliges to client > > to open that device file. That's unintutive to me and seems counter > > to the principle of that mounter can't give more priviliges than what > > it itself can't do on host. > > > > Dan, stephen, paul moore, does this sound ok to you folks from selinux > > point of view. > > It seems wrong to check CREATE when no file is being created, and doing > so could lead to over-privileging of the mounter context, requiring one > to allow a mounter context to create device nodes just to allow a client > task context to read/write via already existing device nodes through an > overlay. Point taken. > > Checking READ but not EXECUTE upon an execute check could permit a > mounter to execute unauthorized code, if it can context mount from a > readable-but-not-executable context to an executable context. > > Note btw that cp --preserve=all doesn't quite operate as expected if > dealing with a context mount. You can't preserve the original security > context if copying to a context mount unless the two contexts happen to > already match. So I'm not sure how that model applies in the case of a > context mount. > > Does the breaking commit (007ea44892e6) fix a real bug affecting users? > If not, I'd recommend just reverting it. That is certainly an option, but... this is all about context= mounts, right? Which allows mounter to override MAC checks under the new mount? On any mount, not just overlay, right? So why is overlay special? I'd just like to see proper justification for why we should be doing those checks on underlying layer that simply don't belong there, IMO. I'm sure you know better than I that it's not just about real bugs affecting users, it's about having a clear, well defined model to base the design on. And by reverting the breaking commit, I don't see us getting closer to that. Thanks, Miklos ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-11-28 20:24 ` Miklos Szeredi @ 2018-11-28 21:46 ` Stephen Smalley 2018-11-29 11:04 ` Miklos Szeredi 0 siblings, 1 reply; 47+ messages in thread From: Stephen Smalley @ 2018-11-28 21:46 UTC (permalink / raw) To: Miklos Szeredi Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On 11/28/18 3:24 PM, Miklos Szeredi wrote: > On Wed, Nov 28, 2018 at 8:32 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: >> >> On 11/28/18 12:03 PM, Vivek Goyal wrote: >>> On Wed, Nov 28, 2018 at 11:00:09AM +0100, Miklos Szeredi wrote: >>>> On Tue, Nov 27, 2018 at 10:05 PM Vivek Goyal <vgoyal@redhat.com> wrote: >>>>> >>>>> On Tue, Nov 27, 2018 at 08:58:06PM +0100, Miklos Szeredi wrote: >>>>>> [resending with fixed email address for Paul Moore] >>>>>> >>>>>> Moving discussion from github[1] to here. >>>>>> >>>>>> To summarize: commit 007ea44892e6 ("ovl: relax permission checking on >>>>>> underlying layers") was added in 4.20-rc1 to make overlayfs access >>>>>> checks on underlying "real" filesystems more consistent. The >>>>>> discussion leading up to this commit can be found at [2]. The commit >>>>>> broke some selinux-testsuite cases, possibly indicating a security >>>>>> hole opened by this commit. >>>>>> >>>>>> The model this patch tries to follow is that if "cp --preserve=all" >>>>>> was allowed to the mounter from underlying layer to the overlay layer, >>>>>> then operation is allowed. That means even if mounter's creds doesn't >>>>>> provide permission to for example execute underying file X, if >>>>>> mounter's creds provide sufficient permission to perform "cp >>>>>> --preserve=all X Y" and original creds allow execute on Y, then the >>>>>> operation is allowed. This provides consistency in the face of >>>>>> copy-ups. Consistency is only provided in sane setups, where mounter >>>>>> has sufficient privileges to access both the lower and upper layers. >>>>> >>>>> [cc daniel walsh] >>>>> >>>>> I think current selinux testsuite tests are written keeping these >>>>> rules in mind. >>>>> >>>>> 1. Check overlay inode creds in the context of task and underlying >>>>> inode creds (lower/upper), in the context of mounter. >>>>> >>>>> 2. For a lower inode, if said file is being copied up, then only >>>>> check MAY_READ on lower. This is equivalent to mounter creating >>>>> a copy of file and providing caller access to it (context mount). >>>>> >>>>> For the case of special devices, we do not copy up these. So should >>>>> we continue to do check on lower inode in the context of mounter >>>>> (instead of not doing any check on lower at all). >>>> >>>> Hmm, I'm trying to understand the logic... If we follow the "cp >>>> --preserve=all" thing, than mounter needs to have CREATE permission >>>> for the special file, not READ or WRITE. Does that make sense? Would >>>> that help with the context= mount use case? >>> >>> Ok. If we follow "cp --preserve=all" methodology, then checking for >>> mounter CREATE permission on upper for special files makes sense. Or >>> change logic to copy up this special file during open. I am assuming >>> we don't copy up special file during open as it is not necessary >>> for things to work but copying up will work as well? >>> >>> So rules will become. >>> >>> - Two levels of checks. >>> - For lower level inode, check MAY_READ for regular files. (including >>> exec). >>> - For special files, only make sure mounter can CREATE object in upper. >>> >>> - What about checks on files on upper/. As of now we seem to check >>> access in mounter's context if it is regular file. Skip the checks >>> completely for special files and for executables. >>> >>> While non-context mount should still be ok, but this means lot of >>> privilige granting to unprivileged process using context mounts. So >>> unprivileged process which could not open a device/socket/fifo for >>> read/write on host fs, can open it for those operations for context >>> mounts. >>> >>> IOW, for context mount case, an unprivileged user will gain lot of >>> privileges. But that seems to be the point of context mount anyway >>> on regular disks. If a disk is mounted using context mount option, >>> then all real labels are ignored and all access checking happens >>> using context label. We are doing similar thing. With one step extra >>> and that is making sure if mounter itself can not do certain operation >>> on host, that will still be denied. >>> >>> This probably means that context= mounts should be used very carefully. >>> It will grant lot of priviliges to the process (and allow operations >>> which process could not do on host without overlayfs mount). >>> >>> Case of device file still baffels me though. We don't do any mounter's >>> checks on device files. So if a device file is on upper which mounter >>> can't open for read but mounter is still granting priviliges to client >>> to open that device file. That's unintutive to me and seems counter >>> to the principle of that mounter can't give more priviliges than what >>> it itself can't do on host. >>> >>> Dan, stephen, paul moore, does this sound ok to you folks from selinux >>> point of view. >> >> It seems wrong to check CREATE when no file is being created, and doing >> so could lead to over-privileging of the mounter context, requiring one >> to allow a mounter context to create device nodes just to allow a client >> task context to read/write via already existing device nodes through an >> overlay. > > Point taken. > >> >> Checking READ but not EXECUTE upon an execute check could permit a >> mounter to execute unauthorized code, if it can context mount from a >> readable-but-not-executable context to an executable context. >> >> Note btw that cp --preserve=all doesn't quite operate as expected if >> dealing with a context mount. You can't preserve the original security >> context if copying to a context mount unless the two contexts happen to >> already match. So I'm not sure how that model applies in the case of a >> context mount. >> >> Does the breaking commit (007ea44892e6) fix a real bug affecting users? >> If not, I'd recommend just reverting it. > > That is certainly an option, but... this is all about context= > mounts, right? Which allows mounter to override MAC checks under the > new mount? On any mount, not just overlay, right? So why is overlay > special? With other filesystems, the files are only accessible under the context specified by the mounter (and you can't mount it twice with differing context mount options). With overlay, the file is simultaneously accessible under both the context specified by the mounter via the overlay and under its lower/upper context via the lower/upper dir. Generally we only use context mounts on other filesystems when they have no label information at all (no security.selinux xattrs) or when they are completely untrusted to provide that information; the context specified by the mounter is the only basis for access control. With overlay, we are frequently dealing with labeled lower and upper directories in a filesystem we trust. It seems like overlay has a goal of preventing the mounter from escalating its access through an overlay mount. > I'd just like to see proper justification for why we should be doing > those checks on underlying layer that simply don't belong there, IMO. > I'm sure you know better than I that it's not just about real bugs > affecting users, it's about having a clear, well defined model to base > the design on. And by reverting the breaking commit, I don't see us > getting closer to that. It seems like the NFS folks raised a number of concerns with the overlay approach beyond just these two checks, and Android has their override_creds=off use case. Maybe the overlay model needs a more significant rethinking than just these two cases. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-11-28 21:46 ` Stephen Smalley @ 2018-11-29 11:04 ` Miklos Szeredi 2018-11-29 13:49 ` Vivek Goyal 2018-11-29 16:16 ` Stephen Smalley 0 siblings, 2 replies; 47+ messages in thread From: Miklos Szeredi @ 2018-11-29 11:04 UTC (permalink / raw) To: Stephen Smalley Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Wed, Nov 28, 2018 at 10:43 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: > > On 11/28/18 3:24 PM, Miklos Szeredi wrote: > > On Wed, Nov 28, 2018 at 8:32 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: [...] > >> Does the breaking commit (007ea44892e6) fix a real bug affecting users? > >> If not, I'd recommend just reverting it. > > > > That is certainly an option, but... this is all about context= > > mounts, right? Which allows mounter to override MAC checks under the > > new mount? On any mount, not just overlay, right? So why is overlay > > special? > > With other filesystems, the files are only accessible under the context > specified by the mounter (and you can't mount it twice with differing > context mount options). With overlay, the file is simultaneously > accessible under both the context specified by the mounter via the > overlay and under its lower/upper context via the lower/upper dir. > > Generally we only use context mounts on other filesystems when they have > no label information at all (no security.selinux xattrs) or when they > are completely untrusted to provide that information; the context > specified by the mounter is the only basis for access control. With > overlay, we are frequently dealing with labeled lower and upper > directories in a filesystem we trust. > > It seems like overlay has a goal of preventing the mounter from > escalating its access through an overlay mount. Overlayfs main purpose is to bypass limited access to a read-only filesystem by copying up on a write access. So bypassing access is built into the system, but it's done in a way that in the end it doesn't permit mounter to do more than it could otherwise do. Or at least that's the intent, as you say. To generalize that, we could say: trigger a copy-up if access to lower layer object is denied. That would extend the scope of the trigger from write access on file/dir to read/write on special files and execute on regular files, all of which could theoretically be denied on the lower object, yet allowed on the upper object without violating the above rule. > > I'd just like to see proper justification for why we should be doing > > those checks on underlying layer that simply don't belong there, IMO. > > I'm sure you know better than I that it's not just about real bugs > > affecting users, it's about having a clear, well defined model to base > > the design on. And by reverting the breaking commit, I don't see us > > getting closer to that. > > It seems like the NFS folks raised a number of concerns with the overlay > approach beyond just these two checks, The concerns that NFS folks had was that overlayfs does not enforce permission checks (with creds of task) on underlying objects, resulting in possibly elevated access compared to directly accessing the NFS mount. But I think there's no way to reconcile server permission checks with overlayfs, so we are left with the current model of only verifying the permission on the server against the mounter's creds. > and Android has their > override_creds=off use case. Maybe the overlay model needs a more > significant rethinking than just these two cases. Yes, it would be good if that override_creds=off use case was discussed. AFAICS it trades less privilege in the mounter for more privilege in the task accessing the overlay. If, or how, that is a good model for anything other than Android is not clear to me. Thanks, Miklos ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-11-29 11:04 ` Miklos Szeredi @ 2018-11-29 13:49 ` Vivek Goyal 2019-03-04 17:01 ` Mark Salyzyn 2018-11-29 16:16 ` Stephen Smalley 1 sibling, 1 reply; 47+ messages in thread From: Vivek Goyal @ 2018-11-29 13:49 UTC (permalink / raw) To: Miklos Szeredi Cc: Stephen Smalley, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Thu, Nov 29, 2018 at 12:04:20PM +0100, Miklos Szeredi wrote: > On Wed, Nov 28, 2018 at 10:43 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: > > > > On 11/28/18 3:24 PM, Miklos Szeredi wrote: > > > On Wed, Nov 28, 2018 at 8:32 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: > > [...] > > > >> Does the breaking commit (007ea44892e6) fix a real bug affecting users? > > >> If not, I'd recommend just reverting it. > > > > > > That is certainly an option, but... this is all about context= > > > mounts, right? Which allows mounter to override MAC checks under the > > > new mount? On any mount, not just overlay, right? So why is overlay > > > special? > > > > With other filesystems, the files are only accessible under the context > > specified by the mounter (and you can't mount it twice with differing > > context mount options). With overlay, the file is simultaneously > > accessible under both the context specified by the mounter via the > > overlay and under its lower/upper context via the lower/upper dir. > > > > Generally we only use context mounts on other filesystems when they have > > no label information at all (no security.selinux xattrs) or when they > > are completely untrusted to provide that information; the context > > specified by the mounter is the only basis for access control. With > > overlay, we are frequently dealing with labeled lower and upper > > directories in a filesystem we trust. > > > > It seems like overlay has a goal of preventing the mounter from > > escalating its access through an overlay mount. > > Overlayfs main purpose is to bypass limited access to a read-only > filesystem by copying up on a write access. So bypassing access is > built into the system, but it's done in a way that in the end it > doesn't permit mounter to do more than it could otherwise do. Or at > least that's the intent, as you say. > > To generalize that, we could say: trigger a copy-up if access to > lower layer object is denied. That would extend the scope of the > trigger from write access on file/dir to read/write on special files > and execute on regular files, all of which could theoretically be > denied on the lower object, yet allowed on the upper object without > violating the above rule. Apart from copy-up, one intent of mounter checks was also to make sure mounter did not allow access to objects in lower/upper which mounter itself did not have access to. So a mounter which can not open upper/foo.txt should not be able to create a context overlay mount and allow opening upper/foo.txt to a user. IIUC, that's one of the purposes of second level of check in mounter context and if it makes sense, then it should extend to special file and exectuable too? > > > > I'd just like to see proper justification for why we should be doing > > > those checks on underlying layer that simply don't belong there, IMO. > > > I'm sure you know better than I that it's not just about real bugs > > > affecting users, it's about having a clear, well defined model to base > > > the design on. And by reverting the breaking commit, I don't see us > > > getting closer to that. > > > > It seems like the NFS folks raised a number of concerns with the overlay > > approach beyond just these two checks, > > The concerns that NFS folks had was that overlayfs does not enforce > permission checks (with creds of task) on underlying objects, > resulting in possibly elevated access compared to directly accessing > the NFS mount. But I think there's no way to reconcile server > permission checks with overlayfs, so we are left with the current > model of only verifying the permission on the server against the > mounter's creds. > > > and Android has their > > override_creds=off use case. Maybe the overlay model needs a more > > significant rethinking than just these two cases. > > Yes, it would be good if that override_creds=off use case was > discussed. AFAICS it trades less privilege in the mounter for more > privilege in the task accessing the overlay. If, or how, that is a > good model for anything other than Android is not clear to me. So will override_creds=off solve the NFS issue also where all access will happen with the creds of task now? Though it will stil require more priviliges in task for other operations in overlay to succeed. Thanks Vivek ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-11-29 13:49 ` Vivek Goyal @ 2019-03-04 17:01 ` Mark Salyzyn 2019-03-04 17:56 ` Casey Schaufler 2019-03-04 18:44 ` Stephen Smalley 0 siblings, 2 replies; 47+ messages in thread From: Mark Salyzyn @ 2019-03-04 17:01 UTC (permalink / raw) To: Vivek Goyal, Miklos Szeredi Cc: Stephen Smalley, Ondrej Mosnacek, J. Bruce Fields, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On 11/29/2018 05:49 AM, Vivek Goyal wrote: > So will override_creds=off solve the NFS issue also where all access will > happen with the creds of task now? Though it will stil require more > priviliges in task for other operations in overlay to succeed. NFS problems seems to have ended the discussion, too many stakeholders? too many outstanding questions? Do we accept the limitations of the override_creds patch as is, and then have the folks more familiar with the NFS scenario(s) build on it? [TL;DR] After looking at all this discussion, it feels like a larger audited rewrite of the security model is in order and override_creds=off may be a disservice (although expediently deals with Android's needs) to a correct general solution. I admit I have little idea where to go from here for a general solution. As far as I see it, the model of creator && caller credentials is a problem for any non-overlapping (MAC) privilege models. This patch allows one to drop any creator privilege escalation, re-introducing the "caller" to the lower layers. As such I would expect a better model is to _always_ check the caller credentials again in the lower layers, and only check the creator credentials, some without caller credentials, for some special cases? Change an && to an || for some of the checks? What are those special cases? I must admit _none_ of those special cases need attention in the Android usage models though making it difficult for me to do the fight thing for the associated stakeholders. The lower privileged application access to the directory cache inherited by other callers troubles me (not for Android, but in general) and feels troublesome (flush out the directory cache? how to tag the privileges associated with the current instance of the directory cache?). Some operations (eg: delete a file for incoming, create mknod in upperdir) are special cases requiring the checking of caller credentaisl to function (not a problem for Android as the caller that deletes a file just so happens to have the necessary privileges). Also, mount namespaces (in upper, lower, etc), how will they affect this all, is there a need for more attention to this as well? -- Mark ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2019-03-04 17:01 ` Mark Salyzyn @ 2019-03-04 17:56 ` Casey Schaufler 2019-03-04 18:44 ` Stephen Smalley 1 sibling, 0 replies; 47+ messages in thread From: Casey Schaufler @ 2019-03-04 17:56 UTC (permalink / raw) To: Mark Salyzyn, Vivek Goyal, Miklos Szeredi Cc: Stephen Smalley, Ondrej Mosnacek, J. Bruce Fields, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh, Linux Security Module list On 3/4/2019 9:01 AM, Mark Salyzyn wrote: Adding linux-security-module to the CC. Please keep the general LSM community in to loop. > On 11/29/2018 05:49 AM, Vivek Goyal wrote: >> So will override_creds=off solve the NFS issue also where all access >> will >> happen with the creds of task now? Though it will stil require more >> priviliges in task for other operations in overlay to succeed. > > NFS problems seems to have ended the discussion, too many > stakeholders? too many outstanding questions? > > Do we accept the limitations of the override_creds patch as is, and > then have the folks more familiar with the NFS scenario(s) build on it? > > [TL;DR] > > After looking at all this discussion, it feels like a larger audited > rewrite of the security model is in order and override_creds=off may > be a disservice (although expediently deals with Android's needs) to a > correct general solution. I admit I have little idea where to go from > here for a general solution. > > As far as I see it, the model of creator && caller credentials is a > problem for any non-overlapping (MAC) privilege models. This patch > allows one to drop any creator privilege escalation, re-introducing > the "caller" to the lower layers. > > As such I would expect a better model is to _always_ check the caller > credentials again in the lower layers, and only check the creator > credentials, some without caller credentials, for some special cases? > Change an && to an || for some of the checks? What are those special > cases? I must admit _none_ of those special cases need attention in > the Android usage models though making it difficult for me to do the > fight thing for the associated stakeholders. > > The lower privileged application access to the directory cache > inherited by other callers troubles me (not for Android, but in > general) and feels troublesome (flush out the directory cache? how to > tag the privileges associated with the current instance of the > directory cache?). Some operations (eg: delete a file for incoming, > create mknod in upperdir) are special cases requiring the checking of > caller credentaisl to function (not a problem for Android as the > caller that deletes a file just so happens to have the necessary > privileges). > > Also, mount namespaces (in upper, lower, etc), how will they affect > this all, is there a need for more attention to this as well? > > -- Mark > ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2019-03-04 17:01 ` Mark Salyzyn 2019-03-04 17:56 ` Casey Schaufler @ 2019-03-04 18:44 ` Stephen Smalley 2019-03-04 19:21 ` Amir Goldstein 1 sibling, 1 reply; 47+ messages in thread From: Stephen Smalley @ 2019-03-04 18:44 UTC (permalink / raw) To: Mark Salyzyn, Vivek Goyal, Miklos Szeredi Cc: Ondrej Mosnacek, J. Bruce Fields, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh, LSM On 3/4/19 12:01 PM, Mark Salyzyn wrote: > On 11/29/2018 05:49 AM, Vivek Goyal wrote: >> So will override_creds=off solve the NFS issue also where all access will >> happen with the creds of task now? Though it will stil require more >> priviliges in task for other operations in overlay to succeed. > > NFS problems seems to have ended the discussion, too many stakeholders? > too many outstanding questions? > > Do we accept the limitations of the override_creds patch as is, and then > have the folks more familiar with the NFS scenario(s) build on it? > > [TL;DR] > > After looking at all this discussion, it feels like a larger audited > rewrite of the security model is in order and override_creds=off may be > a disservice (although expediently deals with Android's needs) to a > correct general solution. I admit I have little idea where to go from > here for a general solution. > > As far as I see it, the model of creator && caller credentials is a > problem for any non-overlapping (MAC) privilege models. This patch > allows one to drop any creator privilege escalation, re-introducing the > "caller" to the lower layers. > > As such I would expect a better model is to _always_ check the caller > credentials again in the lower layers, and only check the creator > credentials, some without caller credentials, for some special cases? > Change an && to an || for some of the checks? What are those special > cases? I must admit _none_ of those special cases need attention in the > Android usage models though making it difficult for me to do the fight > thing for the associated stakeholders. As I recall, there were multiple problems with using current process' creds for the operations on the lower/upper/work directories: - Some overlayfs operations on the lower/upper/work directories required privilege (capabilities) that the current process might lack, e.g. to set ownership and the like on upper or work files, to set special xattrs used internally by overlayfs for whiteouts or similar purposes, to act on files within the work dir which was inaccessible to the current process to prevent accessing files in an incomplete state, etc. Originally that was handled by temporarily elevating the effective capabilities around the privileged operation in the overlayfs code but that didn't help with the SELinux or other LSM capability checking that was triggered upon the capable calls. Without some change there you'd have to allow all client process domains all of the relevant capabilities in policy, greatly increasing their privileges. - The original logic was checking access to the lower dir/files in the context of the current process when performing some operation that modifies the file content or metadata, thereby triggering a SELinux/LSM write or similar check, even though the actual data or metadata modification occurs to the copied-up file instead and does not affect the lower dir/files. That was preventing making the lower dir/file labels read-only to the client processes in the policy, which was desired for the container use case. You'd need to solve those problems in some way. > The lower privileged application access to the directory cache inherited > by other callers troubles me (not for Android, but in general) and feels > troublesome (flush out the directory cache? how to tag the privileges > associated with the current instance of the directory cache?). Some > operations (eg: delete a file for incoming, create mknod in upperdir) > are special cases requiring the checking of caller credentaisl to > function (not a problem for Android as the caller that deletes a file > just so happens to have the necessary privileges). > > Also, mount namespaces (in upper, lower, etc), how will they affect this > all, is there a need for more attention to this as well? > > -- Mark ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2019-03-04 18:44 ` Stephen Smalley @ 2019-03-04 19:21 ` Amir Goldstein 0 siblings, 0 replies; 47+ messages in thread From: Amir Goldstein @ 2019-03-04 19:21 UTC (permalink / raw) To: Mark Salyzyn Cc: Vivek Goyal, Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh, LSM, Stephen Smalley On Mon, Mar 4, 2019 at 8:53 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: > > On 3/4/19 12:01 PM, Mark Salyzyn wrote: > > On 11/29/2018 05:49 AM, Vivek Goyal wrote: > >> So will override_creds=off solve the NFS issue also where all access will > >> happen with the creds of task now? Though it will stil require more > >> priviliges in task for other operations in overlay to succeed. > > > > NFS problems seems to have ended the discussion, too many stakeholders? > > too many outstanding questions? > > > > Do we accept the limitations of the override_creds patch as is, and then > > have the folks more familiar with the NFS scenario(s) build on it? > > > > [TL;DR] > > > > After looking at all this discussion, it feels like a larger audited > > rewrite of the security model is in order and override_creds=off may be > > a disservice (although expediently deals with Android's needs) to a > > correct general solution. I admit I have little idea where to go from > > here for a general solution. > > > > As far as I see it, the model of creator && caller credentials is a > > problem for any non-overlapping (MAC) privilege models. This patch > > allows one to drop any creator privilege escalation, re-introducing the > > "caller" to the lower layers. > > > > As such I would expect a better model is to _always_ check the caller > > credentials again in the lower layers, and only check the creator > > credentials, some without caller credentials, for some special cases? > > Change an && to an || for some of the checks? What are those special > > cases? I must admit _none_ of those special cases need attention in the > > Android usage models though making it difficult for me to do the fight > > thing for the associated stakeholders. > > As I recall, there were multiple problems with using current process' > creds for the operations on the lower/upper/work directories: > > - Some overlayfs operations on the lower/upper/work directories required > privilege (capabilities) that the current process might lack, e.g. to > set ownership and the like on upper or work files, to set special xattrs > used internally by overlayfs for whiteouts or similar purposes, to act > on files within the work dir which was inaccessible to the current > process to prevent accessing files in an incomplete state, etc. > Originally that was handled by temporarily elevating the effective > capabilities around the privileged operation in the overlayfs code but > that didn't help with the SELinux or other LSM capability checking that > was triggered upon the capable calls. Without some change there you'd > have to allow all client process domains all of the relevant > capabilities in policy, greatly increasing their privileges. > > - The original logic was checking access to the lower dir/files in the > context of the current process when performing some operation that > modifies the file content or metadata, thereby triggering a SELinux/LSM > write or similar check, even though the actual data or metadata > modification occurs to the copied-up file instead and does not affect > the lower dir/files. That was preventing making the lower dir/file > labels read-only to the client processes in the policy, which was > desired for the container use case. > > You'd need to solve those problems in some way. > This is entire discussion in avoiding the elephant in the room - Tampering with underlying layers. We only talk about functionality and enforcement of legit overlayfs operation, but what about an adversary that writes/changes files on underlying layers? For example, permission to write new files can be elevated to effectively modifying or deleting files. To me the logic in "mounter credentials" is not only functional. Whoever has access to underlying layer can really cause a lot of damage to users accessing the overlay mount, so in a way, this power needs to be translated to some sort of elevated credentials. The the current model seems solid in that respect - access to underlying layer can be restricted to the "mounter" who effectively crafts an entirely new set of files. Another way to think about this - consider a user space overlayfs implementation, for example: https://github.com/containers/fuse-overlayfs In what way should the requirements to overlayfs security model differ from the model already imposed on a user space implementation? I do understand the Android use case for override_creds=off, but to me, this is a functional configuration that says "turn off overlay specific security considerations... I know what I am doing". I am not saying that the Android adb debug use case is insecure. I am just not seeing how a && or || model for mounter and caller credential add up to a coherent story (yet...). Thanks, Amir. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-11-29 11:04 ` Miklos Szeredi 2018-11-29 13:49 ` Vivek Goyal @ 2018-11-29 16:16 ` Stephen Smalley 2018-11-29 16:22 ` Stephen Smalley 2018-11-29 19:47 ` Miklos Szeredi 1 sibling, 2 replies; 47+ messages in thread From: Stephen Smalley @ 2018-11-29 16:16 UTC (permalink / raw) To: Miklos Szeredi Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On 11/29/18 6:04 AM, Miklos Szeredi wrote: > On Wed, Nov 28, 2018 at 10:43 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: >> >> On 11/28/18 3:24 PM, Miklos Szeredi wrote: >>> On Wed, Nov 28, 2018 at 8:32 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: > > [...] > >>>> Does the breaking commit (007ea44892e6) fix a real bug affecting users? >>>> If not, I'd recommend just reverting it. >>> >>> That is certainly an option, but... this is all about context= >>> mounts, right? Which allows mounter to override MAC checks under the >>> new mount? On any mount, not just overlay, right? So why is overlay >>> special? >> >> With other filesystems, the files are only accessible under the context >> specified by the mounter (and you can't mount it twice with differing >> context mount options). With overlay, the file is simultaneously >> accessible under both the context specified by the mounter via the >> overlay and under its lower/upper context via the lower/upper dir. >> >> Generally we only use context mounts on other filesystems when they have >> no label information at all (no security.selinux xattrs) or when they >> are completely untrusted to provide that information; the context >> specified by the mounter is the only basis for access control. With >> overlay, we are frequently dealing with labeled lower and upper >> directories in a filesystem we trust. >> >> It seems like overlay has a goal of preventing the mounter from >> escalating its access through an overlay mount. > > Overlayfs main purpose is to bypass limited access to a read-only > filesystem by copying up on a write access. So bypassing access is > built into the system, but it's done in a way that in the end it > doesn't permit mounter to do more than it could otherwise do. Or at > least that's the intent, as you say. > > To generalize that, we could say: trigger a copy-up if access to > lower layer object is denied. That would extend the scope of the > trigger from write access on file/dir to read/write on special files > and execute on regular files, all of which could theoretically be > denied on the lower object, yet allowed on the upper object without > violating the above rule. Possibly I misunderstood you, but I don't think we want to copy-up on permission denial, as that would still allow the mounter to read/write special files or execute regular files to which it would normally be denied access, because the copy would inherit the context specified by the mounter in the context mount case. It still represents an escalation of privilege for the mounter. In contrast, the copy-up on write behavior does not allow the mounter to do anything it could not do already (i.e. read from the lower, write to the upper). > >>> I'd just like to see proper justification for why we should be doing >>> those checks on underlying layer that simply don't belong there, IMO. >>> I'm sure you know better than I that it's not just about real bugs >>> affecting users, it's about having a clear, well defined model to base >>> the design on. And by reverting the breaking commit, I don't see us >>> getting closer to that. >> >> It seems like the NFS folks raised a number of concerns with the overlay >> approach beyond just these two checks, > > The concerns that NFS folks had was that overlayfs does not enforce > permission checks (with creds of task) on underlying objects, > resulting in possibly elevated access compared to directly accessing > the NFS mount. But I think there's no way to reconcile server > permission checks with overlayfs, so we are left with the current > model of only verifying the permission on the server against the > mounter's creds. Typically this will only work if the NFS files are world-readable/searchable, right, since usually the mounter will be root and the server will enable root squash on the export? Is that sufficient for most use cases? > >> and Android has their >> override_creds=off use case. Maybe the overlay model needs a more >> significant rethinking than just these two cases. > > Yes, it would be good if that override_creds=off use case was > discussed. AFAICS it trades less privilege in the mounter for more > privilege in the task accessing the overlay. If, or how, that is a > good model for anything other than Android is not clear to me. > > Thanks, > Miklos > ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-11-29 16:16 ` Stephen Smalley @ 2018-11-29 16:22 ` Stephen Smalley 2018-11-29 19:47 ` Miklos Szeredi 1 sibling, 0 replies; 47+ messages in thread From: Stephen Smalley @ 2018-11-29 16:22 UTC (permalink / raw) To: Miklos Szeredi Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On 11/29/18 11:16 AM, Stephen Smalley wrote: > On 11/29/18 6:04 AM, Miklos Szeredi wrote: >> On Wed, Nov 28, 2018 at 10:43 PM Stephen Smalley <sds@tycho.nsa.gov> >> wrote: >>> >>> On 11/28/18 3:24 PM, Miklos Szeredi wrote: >>>> On Wed, Nov 28, 2018 at 8:32 PM Stephen Smalley <sds@tycho.nsa.gov> >>>> wrote: >> >> [...] >> >>>>> Does the breaking commit (007ea44892e6) fix a real bug affecting >>>>> users? >>>>> If not, I'd recommend just reverting it. >>>> >>>> That is certainly an option, but... this is all about context= >>>> mounts, right? Which allows mounter to override MAC checks under the >>>> new mount? On any mount, not just overlay, right? So why is overlay >>>> special? >>> >>> With other filesystems, the files are only accessible under the context >>> specified by the mounter (and you can't mount it twice with differing >>> context mount options). With overlay, the file is simultaneously >>> accessible under both the context specified by the mounter via the >>> overlay and under its lower/upper context via the lower/upper dir. >>> >>> Generally we only use context mounts on other filesystems when they have >>> no label information at all (no security.selinux xattrs) or when they >>> are completely untrusted to provide that information; the context >>> specified by the mounter is the only basis for access control. With >>> overlay, we are frequently dealing with labeled lower and upper >>> directories in a filesystem we trust. >>> >>> It seems like overlay has a goal of preventing the mounter from >>> escalating its access through an overlay mount. >> >> Overlayfs main purpose is to bypass limited access to a read-only >> filesystem by copying up on a write access. So bypassing access is >> built into the system, but it's done in a way that in the end it >> doesn't permit mounter to do more than it could otherwise do. Or at >> least that's the intent, as you say. >> >> To generalize that, we could say: trigger a copy-up if access to >> lower layer object is denied. That would extend the scope of the >> trigger from write access on file/dir to read/write on special files >> and execute on regular files, all of which could theoretically be >> denied on the lower object, yet allowed on the upper object without >> violating the above rule. > > Possibly I misunderstood you, but I don't think we want to copy-up on > permission denial, as that would still allow the mounter to read/write > special files or execute regular files to which it would normally be > denied access, because the copy would inherit the context specified by > the mounter in the context mount case. It still represents an > escalation of privilege for the mounter. In contrast, the copy-up on > write behavior does not allow the mounter to do anything it could not do > already (i.e. read from the lower, write to the upper). Also, copy-up in those cases might allow for device files that would normally be unusable due to nodev on the lower filesystem mount to become usable in the upper? And likewise for executable files in a noexec lower mount? Is that already an issue for nosuid? > >> >>>> I'd just like to see proper justification for why we should be doing >>>> those checks on underlying layer that simply don't belong there, IMO. >>>> I'm sure you know better than I that it's not just about real bugs >>>> affecting users, it's about having a clear, well defined model to base >>>> the design on. And by reverting the breaking commit, I don't see us >>>> getting closer to that. >>> >>> It seems like the NFS folks raised a number of concerns with the overlay >>> approach beyond just these two checks, >> >> The concerns that NFS folks had was that overlayfs does not enforce >> permission checks (with creds of task) on underlying objects, >> resulting in possibly elevated access compared to directly accessing >> the NFS mount. But I think there's no way to reconcile server >> permission checks with overlayfs, so we are left with the current >> model of only verifying the permission on the server against the >> mounter's creds. > > Typically this will only work if the NFS files are > world-readable/searchable, right, since usually the mounter will be root > and the server will enable root squash on the export? Is that > sufficient for most use cases? > >> >>> and Android has their >>> override_creds=off use case. Maybe the overlay model needs a more >>> significant rethinking than just these two cases. >> >> Yes, it would be good if that override_creds=off use case was >> discussed. AFAICS it trades less privilege in the mounter for more >> privilege in the task accessing the overlay. If, or how, that is a >> good model for anything other than Android is not clear to me. >> >> Thanks, >> Miklos >> > ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-11-29 16:16 ` Stephen Smalley 2018-11-29 16:22 ` Stephen Smalley @ 2018-11-29 19:47 ` Miklos Szeredi 2018-11-29 21:03 ` Stephen Smalley 2018-11-29 22:22 ` Daniel Walsh 1 sibling, 2 replies; 47+ messages in thread From: Miklos Szeredi @ 2018-11-29 19:47 UTC (permalink / raw) To: Stephen Smalley Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: > Possibly I misunderstood you, but I don't think we want to copy-up on > permission denial, as that would still allow the mounter to read/write > special files or execute regular files to which it would normally be > denied access, because the copy would inherit the context specified by > the mounter in the context mount case. It still represents an > escalation of privilege for the mounter. In contrast, the copy-up on > write behavior does not allow the mounter to do anything it could not do > already (i.e. read from the lower, write to the upper). Let's get this straight: when file is copied up, it inherits label from context=, not from label of lower file? Next question: permission to change metadata is tied to permission to open? Is it possible that open is denied, but metadata can be changed? DAC model allows this: metadata change is tied to ownership, not mode bits. And different capability flag. If the same is true for MAC, then the pre-v4.20-rc1 is already susceptible to the privilege escalation you describe, right? Thanks, Miklos ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-11-29 19:47 ` Miklos Szeredi @ 2018-11-29 21:03 ` Stephen Smalley 2018-11-29 21:19 ` Stephen Smalley 2018-11-29 22:22 ` Daniel Walsh 1 sibling, 1 reply; 47+ messages in thread From: Stephen Smalley @ 2018-11-29 21:03 UTC (permalink / raw) To: Miklos Szeredi Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On 11/29/18 2:47 PM, Miklos Szeredi wrote: > On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: > >> Possibly I misunderstood you, but I don't think we want to copy-up on >> permission denial, as that would still allow the mounter to read/write >> special files or execute regular files to which it would normally be >> denied access, because the copy would inherit the context specified by >> the mounter in the context mount case. It still represents an >> escalation of privilege for the mounter. In contrast, the copy-up on >> write behavior does not allow the mounter to do anything it could not do >> already (i.e. read from the lower, write to the upper). > > Let's get this straight: when file is copied up, it inherits label > from context=, not from label of lower file? That's correct. The overlay inodes are all assigned the label from the context= mount option, and so are any upper inodes created through the overlay. At least that's my understanding of how it is supposed to work. The original use case was for containers with the lower dir labeled with a context that is read-only to the container context and using a context that is writable by the container context for the context= mount. > Next question: permission to change metadata is tied to permission to > open? Is it possible that open is denied, but metadata can be > changed? There is no metadata change occurring here. The overlay, upper, and lower inodes all keep their labels intact for their lifetime (both overlay and upper always have the context= label; upper has whatever its original label was), unless explicitly relabeled by some process. And when viewed through the overlay, the file always has the label specified via context=, even before the copy-up. > DAC model allows this: metadata change is tied to ownership, not mode > bits. And different capability flag. > > If the same is true for MAC, then the pre-v4.20-rc1 is already > susceptible to the privilege escalation you describe, right? Actually, I guess there wouldn't be a privilege escalation if you checked the mounter's ability to create the new file upon copy-up, and checked the mounter's access to the upper inode label upon the subsequent read, write, or execute access. Then we'd typically block the ability to create the device file and we'd block the ability to execute files with the label from context=. But copy-up of special files seems undesirable for other reasons (e.g. requiring mounters to be allowed to create device nodes just to permit client's to read/write them, possible implications for nodev/noexec, implications for socket and fifo files). ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-11-29 21:03 ` Stephen Smalley @ 2018-11-29 21:19 ` Stephen Smalley 2018-12-04 13:32 ` Miklos Szeredi 0 siblings, 1 reply; 47+ messages in thread From: Stephen Smalley @ 2018-11-29 21:19 UTC (permalink / raw) To: Miklos Szeredi Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On 11/29/18 4:03 PM, Stephen Smalley wrote: > On 11/29/18 2:47 PM, Miklos Szeredi wrote: >> On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov> >> wrote: >> >>> Possibly I misunderstood you, but I don't think we want to copy-up on >>> permission denial, as that would still allow the mounter to read/write >>> special files or execute regular files to which it would normally be >>> denied access, because the copy would inherit the context specified by >>> the mounter in the context mount case. It still represents an >>> escalation of privilege for the mounter. In contrast, the copy-up on >>> write behavior does not allow the mounter to do anything it could not do >>> already (i.e. read from the lower, write to the upper). >> >> Let's get this straight: when file is copied up, it inherits label >> from context=, not from label of lower file? > > That's correct. The overlay inodes are all assigned the label from the > context= mount option, and so are any upper inodes created through the > overlay. At least that's my understanding of how it is supposed to > work. The original use case was for containers with the lower dir > labeled with a context that is read-only to the container context and > using a context that is writable by the container context for the > context= mount. > >> Next question: permission to change metadata is tied to permission to >> open? Is it possible that open is denied, but metadata can be >> changed? > > There is no metadata change occurring here. The overlay, upper, and > lower inodes all keep their labels intact for their lifetime (both > overlay and upper always have the context= label; upper has whatever its ^^lower^^ > original label was), unless explicitly relabeled by some process. And > when viewed through the overlay, the file always has the label specified > via context=, even before the copy-up. > >> DAC model allows this: metadata change is tied to ownership, not mode >> bits. And different capability flag. >> >> If the same is true for MAC, then the pre-v4.20-rc1 is already >> susceptible to the privilege escalation you describe, right? > > Actually, I guess there wouldn't be a privilege escalation if you > checked the mounter's ability to create the new file upon copy-up, and > checked the mounter's access to the upper inode label upon the > subsequent read, write, or execute access. Then we'd typically block > the ability to create the device file and we'd block the ability to > execute files with the label from context=. > > But copy-up of special files seems undesirable for other reasons (e.g. > requiring mounters to be allowed to create device nodes just to permit > client's to read/write them, possible implications for nodev/noexec, > implications for socket and fifo files). ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-11-29 21:19 ` Stephen Smalley @ 2018-12-04 13:32 ` Miklos Szeredi 2018-12-04 14:30 ` Stephen Smalley 0 siblings, 1 reply; 47+ messages in thread From: Miklos Szeredi @ 2018-12-04 13:32 UTC (permalink / raw) To: Stephen Smalley Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Thu, Nov 29, 2018 at 10:16 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: > > On 11/29/18 4:03 PM, Stephen Smalley wrote: > > On 11/29/18 2:47 PM, Miklos Szeredi wrote: > >> On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov> > >> wrote: > >> > >>> Possibly I misunderstood you, but I don't think we want to copy-up on > >>> permission denial, as that would still allow the mounter to read/write > >>> special files or execute regular files to which it would normally be > >>> denied access, because the copy would inherit the context specified by > >>> the mounter in the context mount case. It still represents an > >>> escalation of privilege for the mounter. In contrast, the copy-up on > >>> write behavior does not allow the mounter to do anything it could not do > >>> already (i.e. read from the lower, write to the upper). > >> > >> Let's get this straight: when file is copied up, it inherits label > >> from context=, not from label of lower file? > > > > That's correct. The overlay inodes are all assigned the label from the > > context= mount option, and so are any upper inodes created through the > > overlay. At least that's my understanding of how it is supposed to > > work. The original use case was for containers with the lower dir > > labeled with a context that is read-only to the container context and > > using a context that is writable by the container context for the > > context= mount. > > > >> Next question: permission to change metadata is tied to permission to > >> open? Is it possible that open is denied, but metadata can be > >> changed? > > > > There is no metadata change occurring here. The overlay, upper, and > > lower inodes all keep their labels intact for their lifetime (both > > overlay and upper always have the context= label; upper has whatever its > ^^lower^^ > > > original label was), unless explicitly relabeled by some process. And > > when viewed through the overlay, the file always has the label specified > > via context=, even before the copy-up. Okay. > > > >> DAC model allows this: metadata change is tied to ownership, not mode > >> bits. And different capability flag. > >> > >> If the same is true for MAC, then the pre-v4.20-rc1 is already > >> susceptible to the privilege escalation you describe, right? > > > > Actually, I guess there wouldn't be a privilege escalation if you > > checked the mounter's ability to create the new file upon copy-up, and > > checked the mounter's access to the upper inode label upon the > > subsequent read, write, or execute access. Then we'd typically block > > the ability to create the device file and we'd block the ability to > > execute files with the label from context=. > > > > But copy-up of special files seems undesirable for other reasons (e.g. > > requiring mounters to be allowed to create device nodes just to permit > > client's to read/write them, possible implications for nodev/noexec, > > implications for socket and fifo files). I think you missed my point: opening a device file or executing an executable wouldn't normally require copy-up. If - permission is granted on overlay to task, and - permission is granted on lower layer to mounter, then copy-up wouldn't be performed. My proposed sequence would be a) check task's creds against overlay inode, fail -> return fail, otherwise: b) check mounter's creds against lower inode, success -> return success, otherwise: c) copy up inode, fail -> return fail, otherwise d) check mounter's creds against upper inode, return result. So, unlike write access to regular files, write access to special files don't necessarily result in copy-up. You say this is an escalation of privilege, but I don't get it how. As DWalsh points out downthread, if mounter cannot create device files, then the copy-up will simply fail. If mounter can create device files, then this is not an escalation of privilege for the mounter. Thanks, Miklos ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-04 13:32 ` Miklos Szeredi @ 2018-12-04 14:30 ` Stephen Smalley 2018-12-04 14:45 ` Miklos Szeredi 2018-12-04 15:15 ` Vivek Goyal 0 siblings, 2 replies; 47+ messages in thread From: Stephen Smalley @ 2018-12-04 14:30 UTC (permalink / raw) To: Miklos Szeredi Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On 12/4/18 8:32 AM, Miklos Szeredi wrote: > On Thu, Nov 29, 2018 at 10:16 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: >> >> On 11/29/18 4:03 PM, Stephen Smalley wrote: >>> On 11/29/18 2:47 PM, Miklos Szeredi wrote: >>>> On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov> >>>> wrote: >>>> >>>>> Possibly I misunderstood you, but I don't think we want to copy-up on >>>>> permission denial, as that would still allow the mounter to read/write >>>>> special files or execute regular files to which it would normally be >>>>> denied access, because the copy would inherit the context specified by >>>>> the mounter in the context mount case. It still represents an >>>>> escalation of privilege for the mounter. In contrast, the copy-up on >>>>> write behavior does not allow the mounter to do anything it could not do >>>>> already (i.e. read from the lower, write to the upper). >>>> >>>> Let's get this straight: when file is copied up, it inherits label >>>> from context=, not from label of lower file? >>> >>> That's correct. The overlay inodes are all assigned the label from the >>> context= mount option, and so are any upper inodes created through the >>> overlay. At least that's my understanding of how it is supposed to >>> work. The original use case was for containers with the lower dir >>> labeled with a context that is read-only to the container context and >>> using a context that is writable by the container context for the >>> context= mount. >>> >>>> Next question: permission to change metadata is tied to permission to >>>> open? Is it possible that open is denied, but metadata can be >>>> changed? >>> >>> There is no metadata change occurring here. The overlay, upper, and >>> lower inodes all keep their labels intact for their lifetime (both >>> overlay and upper always have the context= label; upper has whatever its >> ^^lower^^ >> >>> original label was), unless explicitly relabeled by some process. And >>> when viewed through the overlay, the file always has the label specified >>> via context=, even before the copy-up. > > Okay. > >>> >>>> DAC model allows this: metadata change is tied to ownership, not mode >>>> bits. And different capability flag. >>>> >>>> If the same is true for MAC, then the pre-v4.20-rc1 is already >>>> susceptible to the privilege escalation you describe, right? >>> >>> Actually, I guess there wouldn't be a privilege escalation if you >>> checked the mounter's ability to create the new file upon copy-up, and >>> checked the mounter's access to the upper inode label upon the >>> subsequent read, write, or execute access. Then we'd typically block >>> the ability to create the device file and we'd block the ability to >>> execute files with the label from context=. >>> >>> But copy-up of special files seems undesirable for other reasons (e.g. >>> requiring mounters to be allowed to create device nodes just to permit >>> client's to read/write them, possible implications for nodev/noexec, >>> implications for socket and fifo files). > > I think you missed my point: opening a device file or executing an > executable wouldn't normally require copy-up. If > > - permission is granted on overlay to task, and > - permission is granted on lower layer to mounter, > > then copy-up wouldn't be performed. > > My proposed sequence would be > > a) check task's creds against overlay inode, fail -> return fail, otherwise: > b) check mounter's creds against lower inode, success -> return > success, otherwise: > c) copy up inode, fail -> return fail, otherwise > d) check mounter's creds against upper inode, return result. > > So, unlike write access to regular files, write access to special > files don't necessarily result in copy-up. > > You say this is an escalation of privilege, but I don't get it how. > As DWalsh points out downthread, if mounter cannot create device > files, then the copy-up will simply fail. If mounter can create > device files, then this is not an escalation of privilege for the > mounter. Yes, in that case there isn't an escalation of privilege for the mounter (I acknowledged that above). I'm still not sure copy-up of special files is a good idea though: - In the case of device files, there is the potential for mischief by the client task in misusing the mounter's privileges to gain access to otherwise unusable device node (nodev lower vs upper?), - In the case of sockets or fifos, what useful result do you get from a copy-up? Accessing the copy isn't going to yield the same result as accessing the original. For executables, this copy-up behavior might trigger a lot of undesired copies of non-executable files from the lower dir into the upper, even if we ultimately end up denying the execute. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-04 14:30 ` Stephen Smalley @ 2018-12-04 14:45 ` Miklos Szeredi 2018-12-04 15:35 ` Stephen Smalley 2018-12-04 15:15 ` Vivek Goyal 1 sibling, 1 reply; 47+ messages in thread From: Miklos Szeredi @ 2018-12-04 14:45 UTC (permalink / raw) To: Stephen Smalley Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Tue, Dec 4, 2018 at 3:28 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: > > On 12/4/18 8:32 AM, Miklos Szeredi wrote: > > My proposed sequence would be > > > > a) check task's creds against overlay inode, fail -> return fail, otherwise: > > b) check mounter's creds against lower inode, success -> return > > success, otherwise: > > c) copy up inode, fail -> return fail, otherwise > > d) check mounter's creds against upper inode, return result. > > > > So, unlike write access to regular files, write access to special > > files don't necessarily result in copy-up. > > > > You say this is an escalation of privilege, but I don't get it how. > > As DWalsh points out downthread, if mounter cannot create device > > files, then the copy-up will simply fail. If mounter can create > > device files, then this is not an escalation of privilege for the > > mounter. > > Yes, in that case there isn't an escalation of privilege for the mounter > (I acknowledged that above). I'm still not sure copy-up of special > files is a good idea though: > > - In the case of device files, there is the potential for mischief by > the client task in misusing the mounter's privileges to gain access to > otherwise unusable device node (nodev lower vs upper?), The mount flag "nodev" on lower or upper has no effect on the overlay, and never had. > - In the case of sockets or fifos, what useful result do you get from a > copy-up? Accessing the copy isn't going to yield the same result as > accessing the original. This is a misconception. Overlayfs is a filesystem (that uses other filesystems as backing store), but it's more a filesystem and less a vfs hack (and trying to completely get out of the vfs hack business), and copy up has no effect on I/O being performed on a socket or FIFO: it's the same object before and after the copy up, and it's a different object from either the lower or upper nodes. Same as NFS: fifo on server is logically different object than fifo having the same name on any number of clients. > For executables, this copy-up behavior might trigger a lot of undesired > copies of non-executable files from the lower dir into the upper, even > if we ultimately end up denying the execute. That would only happen if - task is allowed exec on overlay - mounter is denied exec on lower - copy up is allowed - mounter is denied exec on upper In fact in the model where upper object inherits context from overlay this is pretty much impossible, AFAICT. Thanks, Miklos ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-04 14:45 ` Miklos Szeredi @ 2018-12-04 15:35 ` Stephen Smalley 2018-12-04 15:39 ` Miklos Szeredi 0 siblings, 1 reply; 47+ messages in thread From: Stephen Smalley @ 2018-12-04 15:35 UTC (permalink / raw) To: Miklos Szeredi Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On 12/4/18 9:45 AM, Miklos Szeredi wrote: > On Tue, Dec 4, 2018 at 3:28 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: >> >> On 12/4/18 8:32 AM, Miklos Szeredi wrote: > >>> My proposed sequence would be >>> >>> a) check task's creds against overlay inode, fail -> return fail, otherwise: >>> b) check mounter's creds against lower inode, success -> return >>> success, otherwise: >>> c) copy up inode, fail -> return fail, otherwise >>> d) check mounter's creds against upper inode, return result. >>> >>> So, unlike write access to regular files, write access to special >>> files don't necessarily result in copy-up. >>> >>> You say this is an escalation of privilege, but I don't get it how. >>> As DWalsh points out downthread, if mounter cannot create device >>> files, then the copy-up will simply fail. If mounter can create >>> device files, then this is not an escalation of privilege for the >>> mounter. >> >> Yes, in that case there isn't an escalation of privilege for the mounter >> (I acknowledged that above). I'm still not sure copy-up of special >> files is a good idea though: >> >> - In the case of device files, there is the potential for mischief by >> the client task in misusing the mounter's privileges to gain access to >> otherwise unusable device node (nodev lower vs upper?), > > The mount flag "nodev" on lower or upper has no effect on the overlay, > and never had. > >> - In the case of sockets or fifos, what useful result do you get from a >> copy-up? Accessing the copy isn't going to yield the same result as >> accessing the original. > > This is a misconception. Overlayfs is a filesystem (that uses other > filesystems as backing store), but it's more a filesystem and less a > vfs hack (and trying to completely get out of the vfs hack business), > and copy up has no effect on I/O being performed on a socket or FIFO: > it's the same object before and after the copy up, and it's a > different object from either the lower or upper nodes. Same as NFS: > fifo on server is logically different object than fifo having the same > name on any number of clients. > >> For executables, this copy-up behavior might trigger a lot of undesired >> copies of non-executable files from the lower dir into the upper, even >> if we ultimately end up denying the execute. > > That would only happen if > > - task is allowed exec on overlay > - mounter is denied exec on lower > - copy up is allowed > - mounter is denied exec on upper > > In fact in the model where upper object inherits context from overlay > this is pretty much impossible, AFAICT. I think the above is the situation in Android (mounter is denied exec to lower and upper to prevent unauthorized code execution, but is allowed to copy-up in order to support writes by the client). However, since they need override_creds=off or similar anyway, I guess it doesn't matter. Ok, I concede the point. Not sure what that means though for v4.20. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-04 15:35 ` Stephen Smalley @ 2018-12-04 15:39 ` Miklos Szeredi 2018-12-11 15:50 ` Paul Moore 0 siblings, 1 reply; 47+ messages in thread From: Miklos Szeredi @ 2018-12-04 15:39 UTC (permalink / raw) To: Stephen Smalley Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Tue, Dec 4, 2018 at 4:32 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: > Ok, I concede the point. Not sure what that means though for v4.20. I have the revert queued up for v4.20 as that's the safest. Don't let that stop the discussion, though, I'd especially like to hear the arguments from the Android side. Thanks, Miklos ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-04 15:39 ` Miklos Szeredi @ 2018-12-11 15:50 ` Paul Moore 0 siblings, 0 replies; 47+ messages in thread From: Paul Moore @ 2018-12-11 15:50 UTC (permalink / raw) To: miklos Cc: Stephen Smalley, vgoyal, omosnace, bfields, salyzyn, linux-kernel, linux-unionfs, linux-fsdevel, selinux, Dan Walsh On Tue, Dec 4, 2018 at 10:39 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > On Tue, Dec 4, 2018 at 4:32 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: > > > Ok, I concede the point. Not sure what that means though for v4.20. > > I have the revert queued up for v4.20 as that's the safest. Miklos, when do you plan on sending the revert to Linus? I just tested v4.20-rc6 and the problem persists. -- paul moore www.paul-moore.com ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-04 14:30 ` Stephen Smalley 2018-12-04 14:45 ` Miklos Szeredi @ 2018-12-04 15:15 ` Vivek Goyal 2018-12-04 15:22 ` Vivek Goyal 2018-12-04 15:42 ` Stephen Smalley 1 sibling, 2 replies; 47+ messages in thread From: Vivek Goyal @ 2018-12-04 15:15 UTC (permalink / raw) To: Stephen Smalley Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Tue, Dec 04, 2018 at 09:30:53AM -0500, Stephen Smalley wrote: > On 12/4/18 8:32 AM, Miklos Szeredi wrote: > > On Thu, Nov 29, 2018 at 10:16 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: > > > > > > On 11/29/18 4:03 PM, Stephen Smalley wrote: > > > > On 11/29/18 2:47 PM, Miklos Szeredi wrote: > > > > > On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov> > > > > > wrote: > > > > > > > > > > > Possibly I misunderstood you, but I don't think we want to copy-up on > > > > > > permission denial, as that would still allow the mounter to read/write > > > > > > special files or execute regular files to which it would normally be > > > > > > denied access, because the copy would inherit the context specified by > > > > > > the mounter in the context mount case. It still represents an > > > > > > escalation of privilege for the mounter. In contrast, the copy-up on > > > > > > write behavior does not allow the mounter to do anything it could not do > > > > > > already (i.e. read from the lower, write to the upper). > > > > > > > > > > Let's get this straight: when file is copied up, it inherits label > > > > > from context=, not from label of lower file? > > > > > > > > That's correct. The overlay inodes are all assigned the label from the > > > > context= mount option, and so are any upper inodes created through the > > > > overlay. At least that's my understanding of how it is supposed to > > > > work. The original use case was for containers with the lower dir > > > > labeled with a context that is read-only to the container context and > > > > using a context that is writable by the container context for the > > > > context= mount. > > > > > > > > > Next question: permission to change metadata is tied to permission to > > > > > open? Is it possible that open is denied, but metadata can be > > > > > changed? > > > > > > > > There is no metadata change occurring here. The overlay, upper, and > > > > lower inodes all keep their labels intact for their lifetime (both > > > > overlay and upper always have the context= label; upper has whatever its > > > ^^lower^^ > > > > > > > original label was), unless explicitly relabeled by some process. And > > > > when viewed through the overlay, the file always has the label specified > > > > via context=, even before the copy-up. > > > > Okay. > > > > > > > > > > > DAC model allows this: metadata change is tied to ownership, not mode > > > > > bits. And different capability flag. > > > > > > > > > > If the same is true for MAC, then the pre-v4.20-rc1 is already > > > > > susceptible to the privilege escalation you describe, right? > > > > > > > > Actually, I guess there wouldn't be a privilege escalation if you > > > > checked the mounter's ability to create the new file upon copy-up, and > > > > checked the mounter's access to the upper inode label upon the > > > > subsequent read, write, or execute access. Then we'd typically block > > > > the ability to create the device file and we'd block the ability to > > > > execute files with the label from context=. > > > > > > > > But copy-up of special files seems undesirable for other reasons (e.g. > > > > requiring mounters to be allowed to create device nodes just to permit > > > > client's to read/write them, possible implications for nodev/noexec, > > > > implications for socket and fifo files). > > > > I think you missed my point: opening a device file or executing an > > executable wouldn't normally require copy-up. If > > > > - permission is granted on overlay to task, and > > - permission is granted on lower layer to mounter, > > > > then copy-up wouldn't be performed. > > > > My proposed sequence would be > > > > a) check task's creds against overlay inode, fail -> return fail, otherwise: > > b) check mounter's creds against lower inode, success -> return > > success, otherwise: > > c) copy up inode, fail -> return fail, otherwise > > d) check mounter's creds against upper inode, return result. > > > > So, unlike write access to regular files, write access to special > > files don't necessarily result in copy-up. > > > > You say this is an escalation of privilege, but I don't get it how. > > As DWalsh points out downthread, if mounter cannot create device > > files, then the copy-up will simply fail. If mounter can create > > device files, then this is not an escalation of privilege for the > > mounter. > > Yes, in that case there isn't an escalation of privilege for the mounter (I > acknowledged that above). I'm still not sure copy-up of special files is a > good idea though: > > - In the case of device files, there is the potential for mischief by the > client task in misusing the mounter's privileges to gain access to otherwise > unusable device node (nodev lower vs upper?), I was thinking about it as well. But client can always bypass permissions of lower device inode by first removing device file and then by doing a mknod. And that will be equivalent of copy up. IOW, IIUC, we do not deny mknod to client and that always creates a way for it to write to device file (and it does not matter what are permissions on lower?) > - In the case of sockets or fifos, what useful result do you get from a > copy-up? Accessing the copy isn't going to yield the same result as > accessing the original. sockets and fifos use overlay inode number (and not lower/upper inode number). So even if we create a copy things continue to work. Copying up will make sense upon owner or permission change to determine who can open socket/fifo in future? Thanks Vivek ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-04 15:15 ` Vivek Goyal @ 2018-12-04 15:22 ` Vivek Goyal 2018-12-04 15:31 ` Miklos Szeredi 2018-12-04 15:42 ` Stephen Smalley 1 sibling, 1 reply; 47+ messages in thread From: Vivek Goyal @ 2018-12-04 15:22 UTC (permalink / raw) To: Stephen Smalley Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Tue, Dec 04, 2018 at 10:15:49AM -0500, Vivek Goyal wrote: > On Tue, Dec 04, 2018 at 09:30:53AM -0500, Stephen Smalley wrote: > > On 12/4/18 8:32 AM, Miklos Szeredi wrote: > > > On Thu, Nov 29, 2018 at 10:16 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: > > > > > > > > On 11/29/18 4:03 PM, Stephen Smalley wrote: > > > > > On 11/29/18 2:47 PM, Miklos Szeredi wrote: > > > > > > On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov> > > > > > > wrote: > > > > > > > > > > > > > Possibly I misunderstood you, but I don't think we want to copy-up on > > > > > > > permission denial, as that would still allow the mounter to read/write > > > > > > > special files or execute regular files to which it would normally be > > > > > > > denied access, because the copy would inherit the context specified by > > > > > > > the mounter in the context mount case. It still represents an > > > > > > > escalation of privilege for the mounter. In contrast, the copy-up on > > > > > > > write behavior does not allow the mounter to do anything it could not do > > > > > > > already (i.e. read from the lower, write to the upper). > > > > > > > > > > > > Let's get this straight: when file is copied up, it inherits label > > > > > > from context=, not from label of lower file? > > > > > > > > > > That's correct. The overlay inodes are all assigned the label from the > > > > > context= mount option, and so are any upper inodes created through the > > > > > overlay. At least that's my understanding of how it is supposed to > > > > > work. The original use case was for containers with the lower dir > > > > > labeled with a context that is read-only to the container context and > > > > > using a context that is writable by the container context for the > > > > > context= mount. > > > > > > > > > > > Next question: permission to change metadata is tied to permission to > > > > > > open? Is it possible that open is denied, but metadata can be > > > > > > changed? > > > > > > > > > > There is no metadata change occurring here. The overlay, upper, and > > > > > lower inodes all keep their labels intact for their lifetime (both > > > > > overlay and upper always have the context= label; upper has whatever its > > > > ^^lower^^ > > > > > > > > > original label was), unless explicitly relabeled by some process. And > > > > > when viewed through the overlay, the file always has the label specified > > > > > via context=, even before the copy-up. > > > > > > Okay. > > > > > > > > > > > > > > DAC model allows this: metadata change is tied to ownership, not mode > > > > > > bits. And different capability flag. > > > > > > > > > > > > If the same is true for MAC, then the pre-v4.20-rc1 is already > > > > > > susceptible to the privilege escalation you describe, right? > > > > > > > > > > Actually, I guess there wouldn't be a privilege escalation if you > > > > > checked the mounter's ability to create the new file upon copy-up, and > > > > > checked the mounter's access to the upper inode label upon the > > > > > subsequent read, write, or execute access. Then we'd typically block > > > > > the ability to create the device file and we'd block the ability to > > > > > execute files with the label from context=. > > > > > > > > > > But copy-up of special files seems undesirable for other reasons (e.g. > > > > > requiring mounters to be allowed to create device nodes just to permit > > > > > client's to read/write them, possible implications for nodev/noexec, > > > > > implications for socket and fifo files). > > > > > > I think you missed my point: opening a device file or executing an > > > executable wouldn't normally require copy-up. If > > > > > > - permission is granted on overlay to task, and > > > - permission is granted on lower layer to mounter, > > > > > > then copy-up wouldn't be performed. > > > > > > My proposed sequence would be > > > > > > a) check task's creds against overlay inode, fail -> return fail, otherwise: > > > b) check mounter's creds against lower inode, success -> return > > > success, otherwise: > > > c) copy up inode, fail -> return fail, otherwise > > > d) check mounter's creds against upper inode, return result. > > > > > > So, unlike write access to regular files, write access to special > > > files don't necessarily result in copy-up. > > > > > > You say this is an escalation of privilege, but I don't get it how. > > > As DWalsh points out downthread, if mounter cannot create device > > > files, then the copy-up will simply fail. If mounter can create > > > device files, then this is not an escalation of privilege for the > > > mounter. > > > > Yes, in that case there isn't an escalation of privilege for the mounter (I > > acknowledged that above). I'm still not sure copy-up of special files is a > > good idea though: > > > > - In the case of device files, there is the potential for mischief by the > > client task in misusing the mounter's privileges to gain access to otherwise > > unusable device node (nodev lower vs upper?), > > I was thinking about it as well. But client can always bypass permissions > of lower device inode by first removing device file and then by doing > a mknod. And that will be equivalent of copy up. IOW, IIUC, we do not deny > mknod to client and that always creates a way for it to write to device > file (and it does not matter what are permissions on lower?) Having said that, this still create little anomaly when mknod to client is not allowed on context label. So a device file, which is on lower and client can not open it for read/write on host, it can now be opened for read/write because mounter will allow access. So why it is different that regular copy up. Well, in regular copy up, we created a copy of the original object and allowed writing to that object (cp --preserve=all) model. But in case of device file, writes will go to same original object. (And not a separate copy). Thanks Vivek ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-04 15:22 ` Vivek Goyal @ 2018-12-04 15:31 ` Miklos Szeredi 2018-12-04 15:42 ` Vivek Goyal 0 siblings, 1 reply; 47+ messages in thread From: Miklos Szeredi @ 2018-12-04 15:31 UTC (permalink / raw) To: Vivek Goyal Cc: Stephen Smalley, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote: > Having said that, this still create little anomaly when mknod to client > is not allowed on context label. So a device file, which is on lower > and client can not open it for read/write on host, it can now be opened > for read/write because mounter will allow access. So why it is different > that regular copy up. Well, in regular copy up, we created a copy of > the original object and allowed writing to that object (cp --preserve=all) > model. But in case of device file, writes will go to same original > object. (And not a separate copy). That's true. In that sense copy up of special file should result in upper having the same label as of lower, right? Thanks, Miklos ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-04 15:31 ` Miklos Szeredi @ 2018-12-04 15:42 ` Vivek Goyal 2018-12-04 16:05 ` Stephen Smalley 0 siblings, 1 reply; 47+ messages in thread From: Vivek Goyal @ 2018-12-04 15:42 UTC (permalink / raw) To: Miklos Szeredi Cc: Stephen Smalley, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote: > On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > > Having said that, this still create little anomaly when mknod to client > > is not allowed on context label. So a device file, which is on lower > > and client can not open it for read/write on host, it can now be opened > > for read/write because mounter will allow access. So why it is different > > that regular copy up. Well, in regular copy up, we created a copy of > > the original object and allowed writing to that object (cp --preserve=all) > > model. But in case of device file, writes will go to same original > > object. (And not a separate copy). > > That's true. > > In that sense copy up of special file should result in upper having > the same label as of lower, right? I guess that might be reasonable (if this behavior is a concern). So even after copy up, client will not be able to read/write a device if it was not allowed on lower. Stephen, what do you think about retaining label of lower for device files during copy up. What about socket/fifo. Thanks Vivek ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-04 15:42 ` Vivek Goyal @ 2018-12-04 16:05 ` Stephen Smalley 2018-12-04 16:17 ` Vivek Goyal 0 siblings, 1 reply; 47+ messages in thread From: Stephen Smalley @ 2018-12-04 16:05 UTC (permalink / raw) To: Vivek Goyal, Miklos Szeredi Cc: Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On 12/4/18 10:42 AM, Vivek Goyal wrote: > On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote: >> On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote: >> >>> Having said that, this still create little anomaly when mknod to client >>> is not allowed on context label. So a device file, which is on lower >>> and client can not open it for read/write on host, it can now be opened >>> for read/write because mounter will allow access. So why it is different >>> that regular copy up. Well, in regular copy up, we created a copy of >>> the original object and allowed writing to that object (cp --preserve=all) >>> model. But in case of device file, writes will go to same original >>> object. (And not a separate copy). >> >> That's true. >> >> In that sense copy up of special file should result in upper having >> the same label as of lower, right? > > I guess that might be reasonable (if this behavior is a concern). So even > after copy up, client will not be able to read/write a device if it was > not allowed on lower. > > Stephen, what do you think about retaining label of lower for device > files during copy up. What about socket/fifo. We don't check client task access to the upper inode label, only to the overlay, right? So the client is still free to access the device through the overlay even if we preserve the lower inode label on the upper inode? What do we gain? ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-04 16:05 ` Stephen Smalley @ 2018-12-04 16:17 ` Vivek Goyal 2018-12-04 16:49 ` Stephen Smalley 0 siblings, 1 reply; 47+ messages in thread From: Vivek Goyal @ 2018-12-04 16:17 UTC (permalink / raw) To: Stephen Smalley Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Tue, Dec 04, 2018 at 11:05:46AM -0500, Stephen Smalley wrote: > On 12/4/18 10:42 AM, Vivek Goyal wrote: > > On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote: > > > On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > > > > > > Having said that, this still create little anomaly when mknod to client > > > > is not allowed on context label. So a device file, which is on lower > > > > and client can not open it for read/write on host, it can now be opened > > > > for read/write because mounter will allow access. So why it is different > > > > that regular copy up. Well, in regular copy up, we created a copy of > > > > the original object and allowed writing to that object (cp --preserve=all) > > > > model. But in case of device file, writes will go to same original > > > > object. (And not a separate copy). > > > > > > That's true. > > > > > > In that sense copy up of special file should result in upper having > > > the same label as of lower, right? > > > > I guess that might be reasonable (if this behavior is a concern). So even > > after copy up, client will not be able to read/write a device if it was > > not allowed on lower. > > > > Stephen, what do you think about retaining label of lower for device > > files during copy up. What about socket/fifo. > > We don't check client task access to the upper inode label, only to the > overlay, right? So the client is still free to access the device through > the overlay even if we preserve the lower inode label on the upper inode? > What do we gain? That's only with latest code and Miklos said he will revert it for 4.20. IOW, I am assuming that we will continue to check access to a file on upper in the context of mounter. Otherwise, client will be able to access files on upper/ which even mounter can't access. Thanks Vivek ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-04 16:17 ` Vivek Goyal @ 2018-12-04 16:49 ` Stephen Smalley 2018-12-05 13:43 ` Vivek Goyal 0 siblings, 1 reply; 47+ messages in thread From: Stephen Smalley @ 2018-12-04 16:49 UTC (permalink / raw) To: Vivek Goyal Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On 12/4/18 11:17 AM, Vivek Goyal wrote: > On Tue, Dec 04, 2018 at 11:05:46AM -0500, Stephen Smalley wrote: >> On 12/4/18 10:42 AM, Vivek Goyal wrote: >>> On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote: >>>> On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote: >>>> >>>>> Having said that, this still create little anomaly when mknod to client >>>>> is not allowed on context label. So a device file, which is on lower >>>>> and client can not open it for read/write on host, it can now be opened >>>>> for read/write because mounter will allow access. So why it is different >>>>> that regular copy up. Well, in regular copy up, we created a copy of >>>>> the original object and allowed writing to that object (cp --preserve=all) >>>>> model. But in case of device file, writes will go to same original >>>>> object. (And not a separate copy). >>>> >>>> That's true. >>>> >>>> In that sense copy up of special file should result in upper having >>>> the same label as of lower, right? >>> >>> I guess that might be reasonable (if this behavior is a concern). So even >>> after copy up, client will not be able to read/write a device if it was >>> not allowed on lower. >>> >>> Stephen, what do you think about retaining label of lower for device >>> files during copy up. What about socket/fifo. >> >> We don't check client task access to the upper inode label, only to the >> overlay, right? So the client is still free to access the device through >> the overlay even if we preserve the lower inode label on the upper inode? >> What do we gain? > > That's only with latest code and Miklos said he will revert it for 4.20. > > IOW, I am assuming that we will continue to check access to a file > on upper in the context of mounter. Otherwise, client will be able to access > files on upper/ which even mounter can't access. I was assuming we're talking about the proposed solution, where we check client access to the overlay (unchanged), mounter access to lower (unchanged), copy-up if denied (new), mounter access to upper (new in the sense that previously we didn't copy-up on denials). In that situation, propagating the lower inode label to the upper inode only impacts the mounter checks, and in that case makes copy-up pointless - if it wasn't allowed to lower it won't be allowed to upper. If it is allowed, then client task is free to access the device regardless as long as it has permissions to the overlay inode. So I don't see what we gain by propagating the lower inode label to the upper inode in the context mount case, and it creates an inconsistency between special files and regular ones. Did I miss something? ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-04 16:49 ` Stephen Smalley @ 2018-12-05 13:43 ` Vivek Goyal 2018-12-06 20:26 ` Stephen Smalley 0 siblings, 1 reply; 47+ messages in thread From: Vivek Goyal @ 2018-12-05 13:43 UTC (permalink / raw) To: Stephen Smalley Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Tue, Dec 04, 2018 at 11:49:16AM -0500, Stephen Smalley wrote: > On 12/4/18 11:17 AM, Vivek Goyal wrote: > > On Tue, Dec 04, 2018 at 11:05:46AM -0500, Stephen Smalley wrote: > > > On 12/4/18 10:42 AM, Vivek Goyal wrote: > > > > On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote: > > > > > On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > > > > > > > > > > Having said that, this still create little anomaly when mknod to client > > > > > > is not allowed on context label. So a device file, which is on lower > > > > > > and client can not open it for read/write on host, it can now be opened > > > > > > for read/write because mounter will allow access. So why it is different > > > > > > that regular copy up. Well, in regular copy up, we created a copy of > > > > > > the original object and allowed writing to that object (cp --preserve=all) > > > > > > model. But in case of device file, writes will go to same original > > > > > > object. (And not a separate copy). > > > > > > > > > > That's true. > > > > > > > > > > In that sense copy up of special file should result in upper having > > > > > the same label as of lower, right? > > > > > > > > I guess that might be reasonable (if this behavior is a concern). So even > > > > after copy up, client will not be able to read/write a device if it was > > > > not allowed on lower. > > > > > > > > Stephen, what do you think about retaining label of lower for device > > > > files during copy up. What about socket/fifo. > > > > > > We don't check client task access to the upper inode label, only to the > > > overlay, right? So the client is still free to access the device through > > > the overlay even if we preserve the lower inode label on the upper inode? > > > What do we gain? > > > > That's only with latest code and Miklos said he will revert it for 4.20. > > > > IOW, I am assuming that we will continue to check access to a file > > on upper in the context of mounter. Otherwise, client will be able to access > > files on upper/ which even mounter can't access. > > I was assuming we're talking about the proposed solution, where we check > client access to the overlay (unchanged), mounter access to lower > (unchanged), copy-up if denied (new), mounter access to upper (new in the > sense that previously we didn't copy-up on denials). > > In that situation, propagating the lower inode label to the upper inode only > impacts the mounter checks, and in that case makes copy-up pointless - if it > wasn't allowed to lower it won't be allowed to upper. If it is allowed, > then client task is free to access the device regardless as long as it has > permissions to the overlay inode. So I don't see what we gain by > propagating the lower inode label to the upper inode in the context mount > case, and it creates an inconsistency between special files and regular > ones. If we agree on retaining lower label of lower device file on copy up, then I am assuming we will change rule c) to copy up only non device files. (because if you don't have access on lower, you will not have access even after copy up). There are other paths where copy up happnes. Like link or when file metadata (ownership, permissions, timestmap) changes. In those cases, if we retain the lower label over copy up, it probably will help. IOW, just by creating a link to a device, one will not get access to a device on upper which could not be accessed on lower. Device files are special anyway. In regular files we are creating a copy and user writes to copy. But that's not the case with device files. So I guess these will have to be treated differently. Thanks Vivek ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-05 13:43 ` Vivek Goyal @ 2018-12-06 20:26 ` Stephen Smalley 2018-12-11 21:48 ` Vivek Goyal 0 siblings, 1 reply; 47+ messages in thread From: Stephen Smalley @ 2018-12-06 20:26 UTC (permalink / raw) To: Vivek Goyal Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On 12/5/18 8:43 AM, Vivek Goyal wrote: > On Tue, Dec 04, 2018 at 11:49:16AM -0500, Stephen Smalley wrote: >> On 12/4/18 11:17 AM, Vivek Goyal wrote: >>> On Tue, Dec 04, 2018 at 11:05:46AM -0500, Stephen Smalley wrote: >>>> On 12/4/18 10:42 AM, Vivek Goyal wrote: >>>>> On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote: >>>>>> On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote: >>>>>> >>>>>>> Having said that, this still create little anomaly when mknod to client >>>>>>> is not allowed on context label. So a device file, which is on lower >>>>>>> and client can not open it for read/write on host, it can now be opened >>>>>>> for read/write because mounter will allow access. So why it is different >>>>>>> that regular copy up. Well, in regular copy up, we created a copy of >>>>>>> the original object and allowed writing to that object (cp --preserve=all) >>>>>>> model. But in case of device file, writes will go to same original >>>>>>> object. (And not a separate copy). >>>>>> >>>>>> That's true. >>>>>> >>>>>> In that sense copy up of special file should result in upper having >>>>>> the same label as of lower, right? >>>>> >>>>> I guess that might be reasonable (if this behavior is a concern). So even >>>>> after copy up, client will not be able to read/write a device if it was >>>>> not allowed on lower. >>>>> >>>>> Stephen, what do you think about retaining label of lower for device >>>>> files during copy up. What about socket/fifo. >>>> >>>> We don't check client task access to the upper inode label, only to the >>>> overlay, right? So the client is still free to access the device through >>>> the overlay even if we preserve the lower inode label on the upper inode? >>>> What do we gain? >>> >>> That's only with latest code and Miklos said he will revert it for 4.20. >>> >>> IOW, I am assuming that we will continue to check access to a file >>> on upper in the context of mounter. Otherwise, client will be able to access >>> files on upper/ which even mounter can't access. >> >> I was assuming we're talking about the proposed solution, where we check >> client access to the overlay (unchanged), mounter access to lower >> (unchanged), copy-up if denied (new), mounter access to upper (new in the >> sense that previously we didn't copy-up on denials). >> >> In that situation, propagating the lower inode label to the upper inode only >> impacts the mounter checks, and in that case makes copy-up pointless - if it >> wasn't allowed to lower it won't be allowed to upper. If it is allowed, >> then client task is free to access the device regardless as long as it has >> permissions to the overlay inode. So I don't see what we gain by >> propagating the lower inode label to the upper inode in the context mount >> case, and it creates an inconsistency between special files and regular >> ones. > > If we agree on retaining lower label of lower device file on copy up, then > I am assuming we will change rule c) to copy up only non device files. > (because if you don't have access on lower, you will not have access > even after copy up). > > There are other paths where copy up happnes. Like link or when file > metadata (ownership, permissions, timestmap) changes. In those cases, > if we retain the lower label over copy up, it probably will help. > > IOW, just by creating a link to a device, one will not get access to > a device on upper which could not be accessed on lower. > > Device files are special anyway. In regular files we are creating a > copy and user writes to copy. But that's not the case with device > files. So I guess these will have to be treated differently. I don't understand what you are suggesting. In the case of a context mount, the context specified by the mounter must be assigned to the upper inode for any files that are copied up. Otherwise, changes to file data or metadata made through the overlay will be visible under two different security contexts simultaneously: the context of the overlay inode (i.e. the one specified by the mounter) and the context of the upper inode (in your suggestion, the context from the lower inode). This allows a violation of MAC policy where one can leak data through an overlay to an unauthorized context. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-06 20:26 ` Stephen Smalley @ 2018-12-11 21:48 ` Vivek Goyal 2018-12-12 14:51 ` Stephen Smalley 0 siblings, 1 reply; 47+ messages in thread From: Vivek Goyal @ 2018-12-11 21:48 UTC (permalink / raw) To: Stephen Smalley Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Thu, Dec 06, 2018 at 03:26:26PM -0500, Stephen Smalley wrote: > On 12/5/18 8:43 AM, Vivek Goyal wrote: > > On Tue, Dec 04, 2018 at 11:49:16AM -0500, Stephen Smalley wrote: > > > On 12/4/18 11:17 AM, Vivek Goyal wrote: > > > > On Tue, Dec 04, 2018 at 11:05:46AM -0500, Stephen Smalley wrote: > > > > > On 12/4/18 10:42 AM, Vivek Goyal wrote: > > > > > > On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote: > > > > > > > On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > > > > > > > > > > > > > > Having said that, this still create little anomaly when mknod to client > > > > > > > > is not allowed on context label. So a device file, which is on lower > > > > > > > > and client can not open it for read/write on host, it can now be opened > > > > > > > > for read/write because mounter will allow access. So why it is different > > > > > > > > that regular copy up. Well, in regular copy up, we created a copy of > > > > > > > > the original object and allowed writing to that object (cp --preserve=all) > > > > > > > > model. But in case of device file, writes will go to same original > > > > > > > > object. (And not a separate copy). > > > > > > > > > > > > > > That's true. > > > > > > > > > > > > > > In that sense copy up of special file should result in upper having > > > > > > > the same label as of lower, right? > > > > > > > > > > > > I guess that might be reasonable (if this behavior is a concern). So even > > > > > > after copy up, client will not be able to read/write a device if it was > > > > > > not allowed on lower. > > > > > > > > > > > > Stephen, what do you think about retaining label of lower for device > > > > > > files during copy up. What about socket/fifo. > > > > > > > > > > We don't check client task access to the upper inode label, only to the > > > > > overlay, right? So the client is still free to access the device through > > > > > the overlay even if we preserve the lower inode label on the upper inode? > > > > > What do we gain? > > > > > > > > That's only with latest code and Miklos said he will revert it for 4.20. > > > > > > > > IOW, I am assuming that we will continue to check access to a file > > > > on upper in the context of mounter. Otherwise, client will be able to access > > > > files on upper/ which even mounter can't access. > > > > > > I was assuming we're talking about the proposed solution, where we check > > > client access to the overlay (unchanged), mounter access to lower > > > (unchanged), copy-up if denied (new), mounter access to upper (new in the > > > sense that previously we didn't copy-up on denials). > > > > > > In that situation, propagating the lower inode label to the upper inode only > > > impacts the mounter checks, and in that case makes copy-up pointless - if it > > > wasn't allowed to lower it won't be allowed to upper. If it is allowed, > > > then client task is free to access the device regardless as long as it has > > > permissions to the overlay inode. So I don't see what we gain by > > > propagating the lower inode label to the upper inode in the context mount > > > case, and it creates an inconsistency between special files and regular > > > ones. > > > > If we agree on retaining lower label of lower device file on copy up, then > > I am assuming we will change rule c) to copy up only non device files. > > (because if you don't have access on lower, you will not have access > > even after copy up). > > > > There are other paths where copy up happnes. Like link or when file > > metadata (ownership, permissions, timestmap) changes. In those cases, > > if we retain the lower label over copy up, it probably will help. > > > > IOW, just by creating a link to a device, one will not get access to > > a device on upper which could not be accessed on lower. > > > > Device files are special anyway. In regular files we are creating a > > copy and user writes to copy. But that's not the case with device > > files. So I guess these will have to be treated differently. > > I don't understand what you are suggesting. In the case of a context mount, > the context specified by the mounter must be assigned to the upper inode for > any files that are copied up. Otherwise, changes to file data or metadata > made through the overlay will be visible under two different security > contexts simultaneously: the context of the overlay inode (i.e. the one > specified by the mounter) and the context of the upper inode (in your > suggestion, the context from the lower inode). This allows a violation of > MAC policy where one can leak data through an overlay to an unauthorized > context. Hi Stephen, Sorry, I don't understand this point of leaking data through overlay. Even if we retain lower label on copy up (for device file), to open that file process should have access on overlay context label and then mounter needs to have access on upper inode (lower label). This is not different from opening a file on lower. Just that metadata of this file on upper might be different. Can you elaborate a bit more on how this is leaking data through overlay mount. If it is, then why accessing file on lower is not equivalent of leaking of data. Thanks Vivek ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-11 21:48 ` Vivek Goyal @ 2018-12-12 14:51 ` Stephen Smalley 2018-12-13 14:58 ` Vivek Goyal 0 siblings, 1 reply; 47+ messages in thread From: Stephen Smalley @ 2018-12-12 14:51 UTC (permalink / raw) To: Vivek Goyal Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On 12/11/18 4:48 PM, Vivek Goyal wrote: > On Thu, Dec 06, 2018 at 03:26:26PM -0500, Stephen Smalley wrote: >> On 12/5/18 8:43 AM, Vivek Goyal wrote: >>> On Tue, Dec 04, 2018 at 11:49:16AM -0500, Stephen Smalley wrote: >>>> On 12/4/18 11:17 AM, Vivek Goyal wrote: >>>>> On Tue, Dec 04, 2018 at 11:05:46AM -0500, Stephen Smalley wrote: >>>>>> On 12/4/18 10:42 AM, Vivek Goyal wrote: >>>>>>> On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote: >>>>>>>> On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote: >>>>>>>> >>>>>>>>> Having said that, this still create little anomaly when mknod to client >>>>>>>>> is not allowed on context label. So a device file, which is on lower >>>>>>>>> and client can not open it for read/write on host, it can now be opened >>>>>>>>> for read/write because mounter will allow access. So why it is different >>>>>>>>> that regular copy up. Well, in regular copy up, we created a copy of >>>>>>>>> the original object and allowed writing to that object (cp --preserve=all) >>>>>>>>> model. But in case of device file, writes will go to same original >>>>>>>>> object. (And not a separate copy). >>>>>>>> >>>>>>>> That's true. >>>>>>>> >>>>>>>> In that sense copy up of special file should result in upper having >>>>>>>> the same label as of lower, right? >>>>>>> >>>>>>> I guess that might be reasonable (if this behavior is a concern). So even >>>>>>> after copy up, client will not be able to read/write a device if it was >>>>>>> not allowed on lower. >>>>>>> >>>>>>> Stephen, what do you think about retaining label of lower for device >>>>>>> files during copy up. What about socket/fifo. >>>>>> >>>>>> We don't check client task access to the upper inode label, only to the >>>>>> overlay, right? So the client is still free to access the device through >>>>>> the overlay even if we preserve the lower inode label on the upper inode? >>>>>> What do we gain? >>>>> >>>>> That's only with latest code and Miklos said he will revert it for 4.20. >>>>> >>>>> IOW, I am assuming that we will continue to check access to a file >>>>> on upper in the context of mounter. Otherwise, client will be able to access >>>>> files on upper/ which even mounter can't access. >>>> >>>> I was assuming we're talking about the proposed solution, where we check >>>> client access to the overlay (unchanged), mounter access to lower >>>> (unchanged), copy-up if denied (new), mounter access to upper (new in the >>>> sense that previously we didn't copy-up on denials). >>>> >>>> In that situation, propagating the lower inode label to the upper inode only >>>> impacts the mounter checks, and in that case makes copy-up pointless - if it >>>> wasn't allowed to lower it won't be allowed to upper. If it is allowed, >>>> then client task is free to access the device regardless as long as it has >>>> permissions to the overlay inode. So I don't see what we gain by >>>> propagating the lower inode label to the upper inode in the context mount >>>> case, and it creates an inconsistency between special files and regular >>>> ones. >>> >>> If we agree on retaining lower label of lower device file on copy up, then >>> I am assuming we will change rule c) to copy up only non device files. >>> (because if you don't have access on lower, you will not have access >>> even after copy up). >>> >>> There are other paths where copy up happnes. Like link or when file >>> metadata (ownership, permissions, timestmap) changes. In those cases, >>> if we retain the lower label over copy up, it probably will help. >>> >>> IOW, just by creating a link to a device, one will not get access to >>> a device on upper which could not be accessed on lower. >>> >>> Device files are special anyway. In regular files we are creating a >>> copy and user writes to copy. But that's not the case with device >>> files. So I guess these will have to be treated differently. >> >> I don't understand what you are suggesting. In the case of a context mount, >> the context specified by the mounter must be assigned to the upper inode for >> any files that are copied up. Otherwise, changes to file data or metadata >> made through the overlay will be visible under two different security >> contexts simultaneously: the context of the overlay inode (i.e. the one >> specified by the mounter) and the context of the upper inode (in your >> suggestion, the context from the lower inode). This allows a violation of >> MAC policy where one can leak data through an overlay to an unauthorized >> context. > > Hi Stephen, > > Sorry, I don't understand this point of leaking data through overlay. Even > if we retain lower label on copy up (for device file), to open that file > process should have access on overlay context label and then mounter needs > to have access on upper inode (lower label). This is not different from > opening a file on lower. Just that metadata of this file on upper might > be different. > > Can you elaborate a bit more on how this is leaking data through overlay > mount. If it is, then why accessing file on lower is not equivalent of > leaking of data. In the container use case, retaining the lower label on copy-up for a context-mounted overlay permits a process in the container to leak the container data out to host files not labeled with the container label and thus potentially accessible to other containers or host processes. The container process appears to just be writing to files labeled with the container label via the overlay, but the written data and/or metadata is directly accessible through the lower label, which is likely readable to all/many containers and host processes. In the multi-level security (MLS) use case, an analogy would a situation where you have an unclassified lower dir with some content to be shared read-only across all levels, and an overlay is context-mounted at each level with a corresponding upper dir and work dir private to that level. If a client process at secret performs a write to a file via the secret overlay, and if the written data is stored in a file in the upper dir that inherits the label from the lower file (unclassified), then the secret process can leak data to unclassified processes at will, violating the MLS policy. The difference with the lower is that it is read-only and the mounter is explicitly choosing to export it under the new context for reading (but not for writing). As a side note, the actual checking during a context mount isn't as granular as we might like here, since there is no overlay-specific logic and thus no individual checking of the lower, upper, and work directory labels. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-12 14:51 ` Stephen Smalley @ 2018-12-13 14:58 ` Vivek Goyal 2018-12-13 16:12 ` Stephen Smalley 0 siblings, 1 reply; 47+ messages in thread From: Vivek Goyal @ 2018-12-13 14:58 UTC (permalink / raw) To: Stephen Smalley Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Wed, Dec 12, 2018 at 09:51:59AM -0500, Stephen Smalley wrote: > On 12/11/18 4:48 PM, Vivek Goyal wrote: > > On Thu, Dec 06, 2018 at 03:26:26PM -0500, Stephen Smalley wrote: > > > On 12/5/18 8:43 AM, Vivek Goyal wrote: > > > > On Tue, Dec 04, 2018 at 11:49:16AM -0500, Stephen Smalley wrote: > > > > > On 12/4/18 11:17 AM, Vivek Goyal wrote: > > > > > > On Tue, Dec 04, 2018 at 11:05:46AM -0500, Stephen Smalley wrote: > > > > > > > On 12/4/18 10:42 AM, Vivek Goyal wrote: > > > > > > > > On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote: > > > > > > > > > On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > > > > > > > > > > > > > > > > > > Having said that, this still create little anomaly when mknod to client > > > > > > > > > > is not allowed on context label. So a device file, which is on lower > > > > > > > > > > and client can not open it for read/write on host, it can now be opened > > > > > > > > > > for read/write because mounter will allow access. So why it is different > > > > > > > > > > that regular copy up. Well, in regular copy up, we created a copy of > > > > > > > > > > the original object and allowed writing to that object (cp --preserve=all) > > > > > > > > > > model. But in case of device file, writes will go to same original > > > > > > > > > > object. (And not a separate copy). > > > > > > > > > > > > > > > > > > That's true. > > > > > > > > > > > > > > > > > > In that sense copy up of special file should result in upper having > > > > > > > > > the same label as of lower, right? > > > > > > > > > > > > > > > > I guess that might be reasonable (if this behavior is a concern). So even > > > > > > > > after copy up, client will not be able to read/write a device if it was > > > > > > > > not allowed on lower. > > > > > > > > > > > > > > > > Stephen, what do you think about retaining label of lower for device > > > > > > > > files during copy up. What about socket/fifo. > > > > > > > > > > > > > > We don't check client task access to the upper inode label, only to the > > > > > > > overlay, right? So the client is still free to access the device through > > > > > > > the overlay even if we preserve the lower inode label on the upper inode? > > > > > > > What do we gain? > > > > > > > > > > > > That's only with latest code and Miklos said he will revert it for 4.20. > > > > > > > > > > > > IOW, I am assuming that we will continue to check access to a file > > > > > > on upper in the context of mounter. Otherwise, client will be able to access > > > > > > files on upper/ which even mounter can't access. > > > > > > > > > > I was assuming we're talking about the proposed solution, where we check > > > > > client access to the overlay (unchanged), mounter access to lower > > > > > (unchanged), copy-up if denied (new), mounter access to upper (new in the > > > > > sense that previously we didn't copy-up on denials). > > > > > > > > > > In that situation, propagating the lower inode label to the upper inode only > > > > > impacts the mounter checks, and in that case makes copy-up pointless - if it > > > > > wasn't allowed to lower it won't be allowed to upper. If it is allowed, > > > > > then client task is free to access the device regardless as long as it has > > > > > permissions to the overlay inode. So I don't see what we gain by > > > > > propagating the lower inode label to the upper inode in the context mount > > > > > case, and it creates an inconsistency between special files and regular > > > > > ones. > > > > > > > > If we agree on retaining lower label of lower device file on copy up, then > > > > I am assuming we will change rule c) to copy up only non device files. > > > > (because if you don't have access on lower, you will not have access > > > > even after copy up). > > > > > > > > There are other paths where copy up happnes. Like link or when file > > > > metadata (ownership, permissions, timestmap) changes. In those cases, > > > > if we retain the lower label over copy up, it probably will help. > > > > > > > > IOW, just by creating a link to a device, one will not get access to > > > > a device on upper which could not be accessed on lower. > > > > > > > > Device files are special anyway. In regular files we are creating a > > > > copy and user writes to copy. But that's not the case with device > > > > files. So I guess these will have to be treated differently. > > > > > > I don't understand what you are suggesting. In the case of a context mount, > > > the context specified by the mounter must be assigned to the upper inode for > > > any files that are copied up. Otherwise, changes to file data or metadata > > > made through the overlay will be visible under two different security > > > contexts simultaneously: the context of the overlay inode (i.e. the one > > > specified by the mounter) and the context of the upper inode (in your > > > suggestion, the context from the lower inode). This allows a violation of > > > MAC policy where one can leak data through an overlay to an unauthorized > > > context. > > > > Hi Stephen, > > > > Sorry, I don't understand this point of leaking data through overlay. Even > > if we retain lower label on copy up (for device file), to open that file > > process should have access on overlay context label and then mounter needs > > to have access on upper inode (lower label). This is not different from > > opening a file on lower. Just that metadata of this file on upper might > > be different. > > > > Can you elaborate a bit more on how this is leaking data through overlay > > mount. If it is, then why accessing file on lower is not equivalent of > > leaking of data. > > In the container use case, retaining the lower label on copy-up for a > context-mounted overlay permits a process in the container to leak the > container data out to host files not labeled with the container label and > thus potentially accessible to other containers or host processes. > The > container process appears to just be writing to files labeled with the > container label via the overlay, but the written data and/or metadata is > directly accessible through the lower label, which is likely readable to > all/many containers and host processes. > > In the multi-level security (MLS) use case, an analogy would a situation > where you have an unclassified lower dir with some content to be shared > read-only across all levels, and an overlay is context-mounted at each level > with a corresponding upper dir and work dir private to that level. If a > client process at secret performs a write to a file via the secret overlay, > and if the written data is stored in a file in the upper dir that inherits > the label from the lower file (unclassified), then the secret process can > leak data to unclassified processes at will, violating the MLS policy. For the case of devices, its already happening. One might change metadata of a device (hence trigger copy up). Now all writes to upper device file from secret process still go to same underlying device and are still readable from lower device file. In fact, for the case of devices, that is even more of a reason to retain the label same as lower. Otherwise upper device node is leaking data of a secret process which can be accessed using device at lower/ (lower's label is readable). > > The difference with the lower is that it is read-only and the mounter is > explicitly choosing to export it under the new context for reading (but not > for writing). If we retain the label and if lower is not writable, then upper deivce node is not being written to as well. So from label point of view, lower and upper inode are not different. Only difference is that some metadata of upper inode might be different. Thanks Vivek > > As a side note, the actual checking during a context mount isn't as granular > as we might like here, since there is no overlay-specific logic and thus no > individual checking of the lower, upper, and work directory labels. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-13 14:58 ` Vivek Goyal @ 2018-12-13 16:12 ` Stephen Smalley 2018-12-13 18:54 ` Vivek Goyal 0 siblings, 1 reply; 47+ messages in thread From: Stephen Smalley @ 2018-12-13 16:12 UTC (permalink / raw) To: Vivek Goyal Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On 12/13/18 9:58 AM, Vivek Goyal wrote: > On Wed, Dec 12, 2018 at 09:51:59AM -0500, Stephen Smalley wrote: >> On 12/11/18 4:48 PM, Vivek Goyal wrote: >>> On Thu, Dec 06, 2018 at 03:26:26PM -0500, Stephen Smalley wrote: >>>> On 12/5/18 8:43 AM, Vivek Goyal wrote: >>>>> On Tue, Dec 04, 2018 at 11:49:16AM -0500, Stephen Smalley wrote: >>>>>> On 12/4/18 11:17 AM, Vivek Goyal wrote: >>>>>>> On Tue, Dec 04, 2018 at 11:05:46AM -0500, Stephen Smalley wrote: >>>>>>>> On 12/4/18 10:42 AM, Vivek Goyal wrote: >>>>>>>>> On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote: >>>>>>>>>> On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote: >>>>>>>>>> >>>>>>>>>>> Having said that, this still create little anomaly when mknod to client >>>>>>>>>>> is not allowed on context label. So a device file, which is on lower >>>>>>>>>>> and client can not open it for read/write on host, it can now be opened >>>>>>>>>>> for read/write because mounter will allow access. So why it is different >>>>>>>>>>> that regular copy up. Well, in regular copy up, we created a copy of >>>>>>>>>>> the original object and allowed writing to that object (cp --preserve=all) >>>>>>>>>>> model. But in case of device file, writes will go to same original >>>>>>>>>>> object. (And not a separate copy). >>>>>>>>>> >>>>>>>>>> That's true. >>>>>>>>>> >>>>>>>>>> In that sense copy up of special file should result in upper having >>>>>>>>>> the same label as of lower, right? >>>>>>>>> >>>>>>>>> I guess that might be reasonable (if this behavior is a concern). So even >>>>>>>>> after copy up, client will not be able to read/write a device if it was >>>>>>>>> not allowed on lower. >>>>>>>>> >>>>>>>>> Stephen, what do you think about retaining label of lower for device >>>>>>>>> files during copy up. What about socket/fifo. >>>>>>>> >>>>>>>> We don't check client task access to the upper inode label, only to the >>>>>>>> overlay, right? So the client is still free to access the device through >>>>>>>> the overlay even if we preserve the lower inode label on the upper inode? >>>>>>>> What do we gain? >>>>>>> >>>>>>> That's only with latest code and Miklos said he will revert it for 4.20. >>>>>>> >>>>>>> IOW, I am assuming that we will continue to check access to a file >>>>>>> on upper in the context of mounter. Otherwise, client will be able to access >>>>>>> files on upper/ which even mounter can't access. >>>>>> >>>>>> I was assuming we're talking about the proposed solution, where we check >>>>>> client access to the overlay (unchanged), mounter access to lower >>>>>> (unchanged), copy-up if denied (new), mounter access to upper (new in the >>>>>> sense that previously we didn't copy-up on denials). >>>>>> >>>>>> In that situation, propagating the lower inode label to the upper inode only >>>>>> impacts the mounter checks, and in that case makes copy-up pointless - if it >>>>>> wasn't allowed to lower it won't be allowed to upper. If it is allowed, >>>>>> then client task is free to access the device regardless as long as it has >>>>>> permissions to the overlay inode. So I don't see what we gain by >>>>>> propagating the lower inode label to the upper inode in the context mount >>>>>> case, and it creates an inconsistency between special files and regular >>>>>> ones. >>>>> >>>>> If we agree on retaining lower label of lower device file on copy up, then >>>>> I am assuming we will change rule c) to copy up only non device files. >>>>> (because if you don't have access on lower, you will not have access >>>>> even after copy up). >>>>> >>>>> There are other paths where copy up happnes. Like link or when file >>>>> metadata (ownership, permissions, timestmap) changes. In those cases, >>>>> if we retain the lower label over copy up, it probably will help. >>>>> >>>>> IOW, just by creating a link to a device, one will not get access to >>>>> a device on upper which could not be accessed on lower. >>>>> >>>>> Device files are special anyway. In regular files we are creating a >>>>> copy and user writes to copy. But that's not the case with device >>>>> files. So I guess these will have to be treated differently. >>>> >>>> I don't understand what you are suggesting. In the case of a context mount, >>>> the context specified by the mounter must be assigned to the upper inode for >>>> any files that are copied up. Otherwise, changes to file data or metadata >>>> made through the overlay will be visible under two different security >>>> contexts simultaneously: the context of the overlay inode (i.e. the one >>>> specified by the mounter) and the context of the upper inode (in your >>>> suggestion, the context from the lower inode). This allows a violation of >>>> MAC policy where one can leak data through an overlay to an unauthorized >>>> context. >>> >>> Hi Stephen, >>> >>> Sorry, I don't understand this point of leaking data through overlay. Even >>> if we retain lower label on copy up (for device file), to open that file >>> process should have access on overlay context label and then mounter needs >>> to have access on upper inode (lower label). This is not different from >>> opening a file on lower. Just that metadata of this file on upper might >>> be different. >>> >>> Can you elaborate a bit more on how this is leaking data through overlay >>> mount. If it is, then why accessing file on lower is not equivalent of >>> leaking of data. >> >> In the container use case, retaining the lower label on copy-up for a >> context-mounted overlay permits a process in the container to leak the >> container data out to host files not labeled with the container label and >> thus potentially accessible to other containers or host processes. > >> The >> container process appears to just be writing to files labeled with the >> container label via the overlay, but the written data and/or metadata is >> directly accessible through the lower label, which is likely readable to >> all/many containers and host processes. >> >> In the multi-level security (MLS) use case, an analogy would a situation >> where you have an unclassified lower dir with some content to be shared >> read-only across all levels, and an overlay is context-mounted at each level >> with a corresponding upper dir and work dir private to that level. If a >> client process at secret performs a write to a file via the secret overlay, >> and if the written data is stored in a file in the upper dir that inherits >> the label from the lower file (unclassified), then the secret process can >> leak data to unclassified processes at will, violating the MLS policy. > > For the case of devices, its already happening. One might change metadata > of a device (hence trigger copy up). Now all writes to upper device file > from secret process still go to same underlying device and are still > readable from lower device file. This is an argument for not copying up device files IMHO, not for preserving the lower label on them. Even just allowing the secret process to trigger the creation of an unclassified file through copy-up is a violation of the MLS policy. It doesn't require writing to the file itself. > > In fact, for the case of devices, that is even more of a reason to retain > the label same as lower. Otherwise upper device node is leaking data > of a secret process which can be accessed using device at lower/ (lower's > label is readable). > >> >> The difference with the lower is that it is read-only and the mounter is >> explicitly choosing to export it under the new context for reading (but not >> for writing). > > If we retain the label and if lower is not writable, then upper deivce > node is not being written to as well. So from label point of view, lower > and upper inode are not different. Only difference is that some metadata > of upper inode might be different. > > Thanks > Vivek >> >> As a side note, the actual checking during a context mount isn't as granular >> as we might like here, since there is no overlay-specific logic and thus no >> individual checking of the lower, upper, and work directory labels. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-13 16:12 ` Stephen Smalley @ 2018-12-13 18:54 ` Vivek Goyal 2018-12-13 20:09 ` Stephen Smalley 0 siblings, 1 reply; 47+ messages in thread From: Vivek Goyal @ 2018-12-13 18:54 UTC (permalink / raw) To: Stephen Smalley Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Thu, Dec 13, 2018 at 11:12:31AM -0500, Stephen Smalley wrote: [..] > > > > Can you elaborate a bit more on how this is leaking data through overlay > > > > mount. If it is, then why accessing file on lower is not equivalent of > > > > leaking of data. > > > > > > In the container use case, retaining the lower label on copy-up for a > > > context-mounted overlay permits a process in the container to leak the > > > container data out to host files not labeled with the container label and > > > thus potentially accessible to other containers or host processes. > > > > > The > > > container process appears to just be writing to files labeled with the > > > container label via the overlay, but the written data and/or metadata is > > > directly accessible through the lower label, which is likely readable to > > > all/many containers and host processes. > > > > > > In the multi-level security (MLS) use case, an analogy would a situation > > > where you have an unclassified lower dir with some content to be shared > > > read-only across all levels, and an overlay is context-mounted at each level > > > with a corresponding upper dir and work dir private to that level. If a > > > client process at secret performs a write to a file via the secret overlay, > > > and if the written data is stored in a file in the upper dir that inherits > > > the label from the lower file (unclassified), then the secret process can > > > leak data to unclassified processes at will, violating the MLS policy. > > > > For the case of devices, its already happening. One might change metadata > > of a device (hence trigger copy up). Now all writes to upper device file > > from secret process still go to same underlying device and are still > > readable from lower device file. > > This is an argument for not copying up device files IMHO, not for preserving > the lower label on them. How do we handle metadata change to device node (like timestamp, ownership change) without copy up. Vivek ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-13 18:54 ` Vivek Goyal @ 2018-12-13 20:09 ` Stephen Smalley 2018-12-13 20:26 ` Vivek Goyal 0 siblings, 1 reply; 47+ messages in thread From: Stephen Smalley @ 2018-12-13 20:09 UTC (permalink / raw) To: Vivek Goyal Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On 12/13/18 1:54 PM, Vivek Goyal wrote: > On Thu, Dec 13, 2018 at 11:12:31AM -0500, Stephen Smalley wrote: > > [..] >>>>> Can you elaborate a bit more on how this is leaking data through overlay >>>>> mount. If it is, then why accessing file on lower is not equivalent of >>>>> leaking of data. >>>> >>>> In the container use case, retaining the lower label on copy-up for a >>>> context-mounted overlay permits a process in the container to leak the >>>> container data out to host files not labeled with the container label and >>>> thus potentially accessible to other containers or host processes. >>> >>>> The >>>> container process appears to just be writing to files labeled with the >>>> container label via the overlay, but the written data and/or metadata is >>>> directly accessible through the lower label, which is likely readable to >>>> all/many containers and host processes. >>>> >>>> In the multi-level security (MLS) use case, an analogy would a situation >>>> where you have an unclassified lower dir with some content to be shared >>>> read-only across all levels, and an overlay is context-mounted at each level >>>> with a corresponding upper dir and work dir private to that level. If a >>>> client process at secret performs a write to a file via the secret overlay, >>>> and if the written data is stored in a file in the upper dir that inherits >>>> the label from the lower file (unclassified), then the secret process can >>>> leak data to unclassified processes at will, violating the MLS policy. >>> >>> For the case of devices, its already happening. One might change metadata >>> of a device (hence trigger copy up). Now all writes to upper device file >>> from secret process still go to same underlying device and are still >>> readable from lower device file. >> >> This is an argument for not copying up device files IMHO, not for preserving >> the lower label on them. > > How do we handle metadata change to device node (like timestamp, ownership > change) without copy up. Do we need to support such metadata changes to device nodes through an overlay mount? Is that required for some legitimate purpose (and if so, what is the use case?)? If not, just deny it up front. Much simpler and no potential for a leak. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-13 20:09 ` Stephen Smalley @ 2018-12-13 20:26 ` Vivek Goyal 0 siblings, 0 replies; 47+ messages in thread From: Vivek Goyal @ 2018-12-13 20:26 UTC (permalink / raw) To: Stephen Smalley Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Thu, Dec 13, 2018 at 03:09:55PM -0500, Stephen Smalley wrote: > On 12/13/18 1:54 PM, Vivek Goyal wrote: > > On Thu, Dec 13, 2018 at 11:12:31AM -0500, Stephen Smalley wrote: > > > > [..] > > > > > > Can you elaborate a bit more on how this is leaking data through overlay > > > > > > mount. If it is, then why accessing file on lower is not equivalent of > > > > > > leaking of data. > > > > > > > > > > In the container use case, retaining the lower label on copy-up for a > > > > > context-mounted overlay permits a process in the container to leak the > > > > > container data out to host files not labeled with the container label and > > > > > thus potentially accessible to other containers or host processes. > > > > > > > > > The > > > > > container process appears to just be writing to files labeled with the > > > > > container label via the overlay, but the written data and/or metadata is > > > > > directly accessible through the lower label, which is likely readable to > > > > > all/many containers and host processes. > > > > > > > > > > In the multi-level security (MLS) use case, an analogy would a situation > > > > > where you have an unclassified lower dir with some content to be shared > > > > > read-only across all levels, and an overlay is context-mounted at each level > > > > > with a corresponding upper dir and work dir private to that level. If a > > > > > client process at secret performs a write to a file via the secret overlay, > > > > > and if the written data is stored in a file in the upper dir that inherits > > > > > the label from the lower file (unclassified), then the secret process can > > > > > leak data to unclassified processes at will, violating the MLS policy. > > > > > > > > For the case of devices, its already happening. One might change metadata > > > > of a device (hence trigger copy up). Now all writes to upper device file > > > > from secret process still go to same underlying device and are still > > > > readable from lower device file. > > > > > > This is an argument for not copying up device files IMHO, not for preserving > > > the lower label on them. > > > > How do we handle metadata change to device node (like timestamp, ownership > > change) without copy up. > > Do we need to support such metadata changes to device nodes through an > overlay mount? Is that required for some legitimate purpose (and if so, > what is the use case?)? If not, just deny it up front. Much simpler and no > potential for a leak. This will be overlay specific behavior and further from POSIX like filesystem behavior. Don't know which workloads depend on changing ownership of devices of changing metadata of devices. Thanks Vivek ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-04 15:15 ` Vivek Goyal 2018-12-04 15:22 ` Vivek Goyal @ 2018-12-04 15:42 ` Stephen Smalley 2018-12-04 16:15 ` Vivek Goyal 1 sibling, 1 reply; 47+ messages in thread From: Stephen Smalley @ 2018-12-04 15:42 UTC (permalink / raw) To: Vivek Goyal Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On 12/4/18 10:15 AM, Vivek Goyal wrote: > On Tue, Dec 04, 2018 at 09:30:53AM -0500, Stephen Smalley wrote: >> On 12/4/18 8:32 AM, Miklos Szeredi wrote: >>> On Thu, Nov 29, 2018 at 10:16 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: >>>> >>>> On 11/29/18 4:03 PM, Stephen Smalley wrote: >>>>> On 11/29/18 2:47 PM, Miklos Szeredi wrote: >>>>>> On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov> >>>>>> wrote: >>>>>> >>>>>>> Possibly I misunderstood you, but I don't think we want to copy-up on >>>>>>> permission denial, as that would still allow the mounter to read/write >>>>>>> special files or execute regular files to which it would normally be >>>>>>> denied access, because the copy would inherit the context specified by >>>>>>> the mounter in the context mount case. It still represents an >>>>>>> escalation of privilege for the mounter. In contrast, the copy-up on >>>>>>> write behavior does not allow the mounter to do anything it could not do >>>>>>> already (i.e. read from the lower, write to the upper). >>>>>> >>>>>> Let's get this straight: when file is copied up, it inherits label >>>>>> from context=, not from label of lower file? >>>>> >>>>> That's correct. The overlay inodes are all assigned the label from the >>>>> context= mount option, and so are any upper inodes created through the >>>>> overlay. At least that's my understanding of how it is supposed to >>>>> work. The original use case was for containers with the lower dir >>>>> labeled with a context that is read-only to the container context and >>>>> using a context that is writable by the container context for the >>>>> context= mount. >>>>> >>>>>> Next question: permission to change metadata is tied to permission to >>>>>> open? Is it possible that open is denied, but metadata can be >>>>>> changed? >>>>> >>>>> There is no metadata change occurring here. The overlay, upper, and >>>>> lower inodes all keep their labels intact for their lifetime (both >>>>> overlay and upper always have the context= label; upper has whatever its >>>> ^^lower^^ >>>> >>>>> original label was), unless explicitly relabeled by some process. And >>>>> when viewed through the overlay, the file always has the label specified >>>>> via context=, even before the copy-up. >>> >>> Okay. >>> >>>>> >>>>>> DAC model allows this: metadata change is tied to ownership, not mode >>>>>> bits. And different capability flag. >>>>>> >>>>>> If the same is true for MAC, then the pre-v4.20-rc1 is already >>>>>> susceptible to the privilege escalation you describe, right? >>>>> >>>>> Actually, I guess there wouldn't be a privilege escalation if you >>>>> checked the mounter's ability to create the new file upon copy-up, and >>>>> checked the mounter's access to the upper inode label upon the >>>>> subsequent read, write, or execute access. Then we'd typically block >>>>> the ability to create the device file and we'd block the ability to >>>>> execute files with the label from context=. >>>>> >>>>> But copy-up of special files seems undesirable for other reasons (e.g. >>>>> requiring mounters to be allowed to create device nodes just to permit >>>>> client's to read/write them, possible implications for nodev/noexec, >>>>> implications for socket and fifo files). >>> >>> I think you missed my point: opening a device file or executing an >>> executable wouldn't normally require copy-up. If >>> >>> - permission is granted on overlay to task, and >>> - permission is granted on lower layer to mounter, >>> >>> then copy-up wouldn't be performed. >>> >>> My proposed sequence would be >>> >>> a) check task's creds against overlay inode, fail -> return fail, otherwise: >>> b) check mounter's creds against lower inode, success -> return >>> success, otherwise: >>> c) copy up inode, fail -> return fail, otherwise >>> d) check mounter's creds against upper inode, return result. >>> >>> So, unlike write access to regular files, write access to special >>> files don't necessarily result in copy-up. >>> >>> You say this is an escalation of privilege, but I don't get it how. >>> As DWalsh points out downthread, if mounter cannot create device >>> files, then the copy-up will simply fail. If mounter can create >>> device files, then this is not an escalation of privilege for the >>> mounter. >> >> Yes, in that case there isn't an escalation of privilege for the mounter (I >> acknowledged that above). I'm still not sure copy-up of special files is a >> good idea though: >> >> - In the case of device files, there is the potential for mischief by the >> client task in misusing the mounter's privileges to gain access to otherwise >> unusable device node (nodev lower vs upper?), > > I was thinking about it as well. But client can always bypass permissions > of lower device inode by first removing device file and then by doing > a mknod. And that will be equivalent of copy up. IOW, IIUC, we do not deny > mknod to client and that always creates a way for it to write to device > file (and it does not matter what are permissions on lower?) I'm not sure what you mean by "we do not deny mknod to client". Depends on your policy and what context the client is running in. Plenty of situations where the client is not allowed to create device files directly, and the mounter is. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-04 15:42 ` Stephen Smalley @ 2018-12-04 16:15 ` Vivek Goyal 0 siblings, 0 replies; 47+ messages in thread From: Vivek Goyal @ 2018-12-04 16:15 UTC (permalink / raw) To: Stephen Smalley Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh On Tue, Dec 04, 2018 at 10:42:35AM -0500, Stephen Smalley wrote: [..] > > > Yes, in that case there isn't an escalation of privilege for the mounter (I > > > acknowledged that above). I'm still not sure copy-up of special files is a > > > good idea though: > > > > > > - In the case of device files, there is the potential for mischief by the > > > client task in misusing the mounter's privileges to gain access to otherwise > > > unusable device node (nodev lower vs upper?), > > > > I was thinking about it as well. But client can always bypass permissions > > of lower device inode by first removing device file and then by doing > > a mknod. And that will be equivalent of copy up. IOW, IIUC, we do not deny > > mknod to client and that always creates a way for it to write to device > > file (and it does not matter what are permissions on lower?) > > I'm not sure what you mean by "we do not deny mknod to client". Depends on > your policy and what context the client is running in. Plenty of situations > where the client is not allowed to create device files directly, and the > mounter is. > Oh, I meant for the cases where policy allows client to mknod with context label. In that case, it does not matter what's the label on lower. Client can always bypass it by first removing device node and creating a new device node. Thanks Vivek ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-11-29 19:47 ` Miklos Szeredi 2018-11-29 21:03 ` Stephen Smalley @ 2018-11-29 22:22 ` Daniel Walsh 2018-12-03 23:27 ` Paul Moore 1 sibling, 1 reply; 47+ messages in thread From: Daniel Walsh @ 2018-11-29 22:22 UTC (permalink / raw) To: Miklos Szeredi, Stephen Smalley Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux On 11/29/18 2:47 PM, Miklos Szeredi wrote: > On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: > >> Possibly I misunderstood you, but I don't think we want to copy-up on >> permission denial, as that would still allow the mounter to read/write >> special files or execute regular files to which it would normally be >> denied access, because the copy would inherit the context specified by >> the mounter in the context mount case. It still represents an >> escalation of privilege for the mounter. In contrast, the copy-up on >> write behavior does not allow the mounter to do anything it could not do >> already (i.e. read from the lower, write to the upper). > Let's get this straight: when file is copied up, it inherits label > from context=, not from label of lower file? Yes, in the case of context mount, it will get the context mount directory. In the case of not context mount, it should maintain the label of the lower. > > Next question: permission to change metadata is tied to permission to > open? Is it possible that open is denied, but metadata can be > changed? Yes, SElinux handles open differently then setattr. Although I am not sure if any tools handle this. > DAC model allows this: metadata change is tied to ownership, not mode > bits. And different capability flag. > > If the same is true for MAC, then the pre-v4.20-rc1 is already > susceptible to the privilege escalation you describe, right? > > Thanks, > Miklos After talking to Vivek, I am not sure their is a privilege escallation. For device nodes, the mounter has to have the ability to create the devicenode with the context mount, if he can do this, then he can do it with or without Overlay. This might lead to users making mistakes on security, but the model is sound. And I think this stands even in the case of the lower is mounted NODEV and the upper is not. If the mounter can create a device on the upper with a particular label, then he does not need the lower. For sockets, I see the case where a process is listening on the lower level socket, the mounter mounts the overlay over the directory with the socket. Then the mounter changes the attributes of the socket, performing a copy up. If the mounter can not talk to the socket and the other end is still listening, then this could be an issue. If the socket is no longer connected to the listener on the lower, then this is not an issue. Similar for a FIFO. With SELinux we are also always checking not only the file access to the socker, but also checking whether the label of the client is able to talk to the label of the server daemon. So we are protected by a secondary check. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-11-29 22:22 ` Daniel Walsh @ 2018-12-03 23:27 ` Paul Moore 2018-12-04 14:43 ` Stephen Smalley 0 siblings, 1 reply; 47+ messages in thread From: Paul Moore @ 2018-12-03 23:27 UTC (permalink / raw) To: Dan Walsh Cc: miklos, Stephen Smalley, vgoyal, omosnace, bfields, salyzyn, linux-kernel, linux-unionfs, linux-fsdevel, selinux On Thu, Nov 29, 2018 at 5:22 PM Daniel Walsh <dwalsh@redhat.com> wrote: > On 11/29/18 2:47 PM, Miklos Szeredi wrote: > > On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: > > > >> Possibly I misunderstood you, but I don't think we want to copy-up on > >> permission denial, as that would still allow the mounter to read/write > >> special files or execute regular files to which it would normally be > >> denied access, because the copy would inherit the context specified by > >> the mounter in the context mount case. It still represents an > >> escalation of privilege for the mounter. In contrast, the copy-up on > >> write behavior does not allow the mounter to do anything it could not do > >> already (i.e. read from the lower, write to the upper). > > Let's get this straight: when file is copied up, it inherits label > > from context=, not from label of lower file? > > Yes, in the case of context mount, it will get the context mount directory. > > In the case of not context mount, it should maintain the label of the > lower. > > > Next question: permission to change metadata is tied to permission to > > open? Is it possible that open is denied, but metadata can be > > changed? > > Yes, SElinux handles open differently then setattr. Although I am not > sure if any tools handle this. > > > DAC model allows this: metadata change is tied to ownership, not mode > > bits. And different capability flag. > > > > If the same is true for MAC, then the pre-v4.20-rc1 is already > > susceptible to the privilege escalation you describe, right? > > After talking to Vivek, I am not sure their is a privilege escallation. More on this below, but this thread doesn't have me convinced, and we are at -rc5 right now. We need to come to some decision on this soon because we are running out of time before v4.20 is released with this code. > For device nodes, the mounter has to have the ability to create the > devicenode with the context mount, if he can do this, then he can do it > with or without Overlay. This might lead to users making mistakes on > security, but the model is sound. And I think this stands even in the > case of the lower is mounted NODEV and the upper is not. If the mounter > can create a device on the upper with a particular label, then he does > not need the lower. The problem I have when looking at the current code is that permission is given, regardless of what is requested, for any special_file() on an overlayfs mount. It also looks like the mounter's creds are used when checking permissions regardless of the file has been copied up or not; I would expect that the mounter's permissions would only used when checking permissions against the lower inode, no? I think there is also some weird behavior if the underlying inode only allows the mounter to write (no read) and a write is requested at the overlayfs layer. I'm sure I'm missing some subtle thing with overlayfs, but why aren't we doing something like the following: int ovl_permission(...) { if (!realinode) { ... } err = generic_permission(inode, mask); if (err) return err; if (upperinode) { err = inode_permission(upperinode, mask); } else { // on the lower inode, always use the mounter's creds old_cred = ovl_override_creds(...); // check to see if we have the right perms first, if // that fails switch to a read/copy-up check if we // are doing a write (note: we are not bypassing the // exec check, the task can change the metadata like // every other fs) err = inode_permission(lowerinode, mask); if (err && (mask & (MAY_EXEC | MAY_APPEND))) { // PM: my guess is that we also need to add a // "&& !special_file(lowerinode)" to the conditional // above because you can't copy-up a dev node in the // normal sense, but we'll leave that as a discussion // point for now... // turn the write into a read (copy-up) mask &= ~(MAY_WRITE | MAY_APPEND); mask |= MAY_READ; err = inode_permission(lowerinode, mask); } // reset the creds revert_creds(old_cred); } return err; } > For sockets, I see the case where a process is listening on the lower > level socket, the mounter mounts the overlay over the directory with the > socket. Then the mounter changes the attributes of the socket, > performing a copy up. If the mounter can not talk to the socket and the > other end is still listening, then this could be an issue. If the > socket is no longer connected to the listener on the lower, then this is > not an issue. > > Similar for a FIFO. See my comment "// PM: my guess ..." in the pseudo code above. I think the write->read permission mask conversion really should only apply to normal files where you can do a copy-up. > With SELinux we are also always checking not only the file access to the > socker, but also checking whether the label of the client is able to > talk to the label of the server daemon. So we are protected by a > secondary check. That's making some assumptions on the LSM and the LSM's loaded policy and is not something I would want to rely on. -- paul moore www.paul-moore.com ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-03 23:27 ` Paul Moore @ 2018-12-04 14:43 ` Stephen Smalley 2018-12-04 23:01 ` Paul Moore 0 siblings, 1 reply; 47+ messages in thread From: Stephen Smalley @ 2018-12-04 14:43 UTC (permalink / raw) To: Paul Moore, Dan Walsh Cc: miklos, vgoyal, omosnace, bfields, salyzyn, linux-kernel, linux-unionfs, linux-fsdevel, selinux On 12/3/18 6:27 PM, Paul Moore wrote: > On Thu, Nov 29, 2018 at 5:22 PM Daniel Walsh <dwalsh@redhat.com> wrote: >> On 11/29/18 2:47 PM, Miklos Szeredi wrote: >>> On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: >>> >>>> Possibly I misunderstood you, but I don't think we want to copy-up on >>>> permission denial, as that would still allow the mounter to read/write >>>> special files or execute regular files to which it would normally be >>>> denied access, because the copy would inherit the context specified by >>>> the mounter in the context mount case. It still represents an >>>> escalation of privilege for the mounter. In contrast, the copy-up on >>>> write behavior does not allow the mounter to do anything it could not do >>>> already (i.e. read from the lower, write to the upper). >>> Let's get this straight: when file is copied up, it inherits label >>> from context=, not from label of lower file? >> >> Yes, in the case of context mount, it will get the context mount directory. >> >> In the case of not context mount, it should maintain the label of the >> lower. >> >>> Next question: permission to change metadata is tied to permission to >>> open? Is it possible that open is denied, but metadata can be >>> changed? >> >> Yes, SElinux handles open differently then setattr. Although I am not >> sure if any tools handle this. >> >>> DAC model allows this: metadata change is tied to ownership, not mode >>> bits. And different capability flag. >>> >>> If the same is true for MAC, then the pre-v4.20-rc1 is already >>> susceptible to the privilege escalation you describe, right? >> >> After talking to Vivek, I am not sure their is a privilege escallation. > > More on this below, but this thread doesn't have me convinced, and we > are at -rc5 right now. We need to come to some decision on this soon > because we are running out of time before v4.20 is released with this > code. > >> For device nodes, the mounter has to have the ability to create the >> devicenode with the context mount, if he can do this, then he can do it >> with or without Overlay. This might lead to users making mistakes on >> security, but the model is sound. And I think this stands even in the >> case of the lower is mounted NODEV and the upper is not. If the mounter >> can create a device on the upper with a particular label, then he does >> not need the lower. > > The problem I have when looking at the current code is that permission > is given, regardless of what is requested, for any special_file() on > an overlayfs mount. > > It also looks like the mounter's creds are used when checking > permissions regardless of the file has been copied up or not; I would > expect that the mounter's permissions would only used when checking > permissions against the lower inode, no? No, that's never been the model as far as I know. mounter's permissions are checked to the underlying inode, whether upper or lower. client's permissions are only checked to the overlay inode. upper and lower are logically backing store - upper for writes and lower for reads from unmodified files. Now, in theory, upper should always be labeled the same as overlay, so client check against overlay should already imply client access to upper, unless someone has manually relabeled upper outside of the overlay. I think there is also some > weird behavior if the underlying inode only allows the mounter to > write (no read) and a write is requested at the overlayfs layer. I'm > sure I'm missing some subtle thing with overlayfs, but why aren't we > doing something like the following: > > int ovl_permission(...) { > > if (!realinode) { > ... > } > > err = generic_permission(inode, mask); > if (err) > return err; > > if (upperinode) { > err = inode_permission(upperinode, mask); > } else { > // on the lower inode, always use the mounter's creds > old_cred = ovl_override_creds(...); > > // check to see if we have the right perms first, if > // that fails switch to a read/copy-up check if we > // are doing a write (note: we are not bypassing the > // exec check, the task can change the metadata like > // every other fs) > err = inode_permission(lowerinode, mask); > if (err && (mask & (MAY_EXEC | MAY_APPEND))) { > // PM: my guess is that we also need to add a > // "&& !special_file(lowerinode)" to the conditional > // above because you can't copy-up a dev node in the > // normal sense, but we'll leave that as a discussion > // point for now... > // turn the write into a read (copy-up) > mask &= ~(MAY_WRITE | MAY_APPEND); > mask |= MAY_READ; > err = inode_permission(lowerinode, mask); > } > > // reset the creds > revert_creds(old_cred); > } > > return err; > } > >> For sockets, I see the case where a process is listening on the lower >> level socket, the mounter mounts the overlay over the directory with the >> socket. Then the mounter changes the attributes of the socket, >> performing a copy up. If the mounter can not talk to the socket and the >> other end is still listening, then this could be an issue. If the >> socket is no longer connected to the listener on the lower, then this is >> not an issue. >> >> Similar for a FIFO. > > See my comment "// PM: my guess ..." in the pseudo code above. I > think the write->read permission mask conversion really should only > apply to normal files where you can do a copy-up. > >> With SELinux we are also always checking not only the file access to the >> socker, but also checking whether the label of the client is able to >> talk to the label of the server daemon. So we are protected by a >> secondary check. > > That's making some assumptions on the LSM and the LSM's loaded policy > and is not something I would want to rely on. If you copy a socket or fifo and try to connect or open the copy, you aren't going to get the same result as if you accessed the original. Copy-up really makes no sense for those AFAICT. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: overlayfs access checks on underlying layers 2018-12-04 14:43 ` Stephen Smalley @ 2018-12-04 23:01 ` Paul Moore 0 siblings, 0 replies; 47+ messages in thread From: Paul Moore @ 2018-12-04 23:01 UTC (permalink / raw) To: Stephen Smalley Cc: Dan Walsh, miklos, vgoyal, omosnace, bfields, salyzyn, linux-kernel, linux-unionfs, linux-fsdevel, selinux On Tue, Dec 4, 2018 at 9:40 AM Stephen Smalley <sds@tycho.nsa.gov> wrote: > On 12/3/18 6:27 PM, Paul Moore wrote: > > On Thu, Nov 29, 2018 at 5:22 PM Daniel Walsh <dwalsh@redhat.com> wrote: > >> On 11/29/18 2:47 PM, Miklos Szeredi wrote: > >>> On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov> wrote: > >>> > >>>> Possibly I misunderstood you, but I don't think we want to copy-up on > >>>> permission denial, as that would still allow the mounter to read/write > >>>> special files or execute regular files to which it would normally be > >>>> denied access, because the copy would inherit the context specified by > >>>> the mounter in the context mount case. It still represents an > >>>> escalation of privilege for the mounter. In contrast, the copy-up on > >>>> write behavior does not allow the mounter to do anything it could not do > >>>> already (i.e. read from the lower, write to the upper). > >>> Let's get this straight: when file is copied up, it inherits label > >>> from context=, not from label of lower file? > >> > >> Yes, in the case of context mount, it will get the context mount directory. > >> > >> In the case of not context mount, it should maintain the label of the > >> lower. > >> > >>> Next question: permission to change metadata is tied to permission to > >>> open? Is it possible that open is denied, but metadata can be > >>> changed? > >> > >> Yes, SElinux handles open differently then setattr. Although I am not > >> sure if any tools handle this. > >> > >>> DAC model allows this: metadata change is tied to ownership, not mode > >>> bits. And different capability flag. > >>> > >>> If the same is true for MAC, then the pre-v4.20-rc1 is already > >>> susceptible to the privilege escalation you describe, right? > >> > >> After talking to Vivek, I am not sure their is a privilege escallation. > > > > More on this below, but this thread doesn't have me convinced, and we > > are at -rc5 right now. We need to come to some decision on this soon > > because we are running out of time before v4.20 is released with this > > code. > > > >> For device nodes, the mounter has to have the ability to create the > >> devicenode with the context mount, if he can do this, then he can do it > >> with or without Overlay. This might lead to users making mistakes on > >> security, but the model is sound. And I think this stands even in the > >> case of the lower is mounted NODEV and the upper is not. If the mounter > >> can create a device on the upper with a particular label, then he does > >> not need the lower. > > > > The problem I have when looking at the current code is that permission > > is given, regardless of what is requested, for any special_file() on > > an overlayfs mount. > > > > It also looks like the mounter's creds are used when checking > > permissions regardless of the file has been copied up or not; I would > > expect that the mounter's permissions would only used when checking > > permissions against the lower inode, no? > > No, that's never been the model as far as I know. mounter's permissions > are checked to the underlying inode, whether upper or lower. client's > permissions are only checked to the overlay inode. upper and lower are > logically backing store - upper for writes and lower for reads from > unmodified files. Now, in theory, upper should always be labeled the > same as overlay, so client check against overlay should already imply > client access to upper, unless someone has manually relabeled upper > outside of the overlay. Okay, thanks for the clarification on the model. This seems a little odd at first, but I'm trying to convince myself that it makes sense. -- paul moore www.paul-moore.com ^ permalink raw reply [flat|nested] 47+ messages in thread
end of thread, other threads:[~2019-03-04 19:21 UTC | newest] Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-11-27 19:55 overlayfs access checks on underlying layers Miklos Szeredi 2018-11-27 19:58 ` Miklos Szeredi 2018-11-27 21:05 ` Vivek Goyal 2018-11-28 10:00 ` Miklos Szeredi 2018-11-28 17:03 ` Vivek Goyal 2018-11-28 19:34 ` Stephen Smalley 2018-11-28 20:24 ` Miklos Szeredi 2018-11-28 21:46 ` Stephen Smalley 2018-11-29 11:04 ` Miklos Szeredi 2018-11-29 13:49 ` Vivek Goyal 2019-03-04 17:01 ` Mark Salyzyn 2019-03-04 17:56 ` Casey Schaufler 2019-03-04 18:44 ` Stephen Smalley 2019-03-04 19:21 ` Amir Goldstein 2018-11-29 16:16 ` Stephen Smalley 2018-11-29 16:22 ` Stephen Smalley 2018-11-29 19:47 ` Miklos Szeredi 2018-11-29 21:03 ` Stephen Smalley 2018-11-29 21:19 ` Stephen Smalley 2018-12-04 13:32 ` Miklos Szeredi 2018-12-04 14:30 ` Stephen Smalley 2018-12-04 14:45 ` Miklos Szeredi 2018-12-04 15:35 ` Stephen Smalley 2018-12-04 15:39 ` Miklos Szeredi 2018-12-11 15:50 ` Paul Moore 2018-12-04 15:15 ` Vivek Goyal 2018-12-04 15:22 ` Vivek Goyal 2018-12-04 15:31 ` Miklos Szeredi 2018-12-04 15:42 ` Vivek Goyal 2018-12-04 16:05 ` Stephen Smalley 2018-12-04 16:17 ` Vivek Goyal 2018-12-04 16:49 ` Stephen Smalley 2018-12-05 13:43 ` Vivek Goyal 2018-12-06 20:26 ` Stephen Smalley 2018-12-11 21:48 ` Vivek Goyal 2018-12-12 14:51 ` Stephen Smalley 2018-12-13 14:58 ` Vivek Goyal 2018-12-13 16:12 ` Stephen Smalley 2018-12-13 18:54 ` Vivek Goyal 2018-12-13 20:09 ` Stephen Smalley 2018-12-13 20:26 ` Vivek Goyal 2018-12-04 15:42 ` Stephen Smalley 2018-12-04 16:15 ` Vivek Goyal 2018-11-29 22:22 ` Daniel Walsh 2018-12-03 23:27 ` Paul Moore 2018-12-04 14:43 ` Stephen Smalley 2018-12-04 23:01 ` Paul Moore
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).