selinux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* overlayfs access checks on underlying layers
@ 2018-11-27 19:55 Miklos Szeredi
  2018-11-27 19:58 ` Miklos Szeredi
  0 siblings, 1 reply; 47+ messages in thread
From: Miklos Szeredi @ 2018-11-27 19:55 UTC (permalink / raw)
  To: Stephen Smalley, Vivek Goyal, Ondrej Mosnacek, Paul Moore,
	J. Bruce Fields, Mark Salyzyn
  Cc: linux-kernel, overlayfs, linux-fsdevel, selinux

Moving discussion from github[1] to here.

To summarize: commit 007ea44892e6 ("ovl: relax permission checking on
underlying layers") was added in 4.20-rc1 to make overlayfs access
checks on underlying "real" filesystems more consistent.  The
discussion leading up to this commit can be found at [2].  The commit
broke some selinux-testsuite cases, possibly indicating a security
hole opened by this commit.

The model this patch tries to follow is that if "cp --preserve=all"
was allowed to the mounter from underlying layer to the overlay layer,
then operation is allowed.  That means even if mounter's creds doesn't
provide permission to for example execute underying file X, if
mounter's creds provide sufficient permission to perform "cp
--preserve=all X Y"  and original creds allow execute on Y, then the
operation is allowed.  This provides consistency in the face of
copy-ups.  Consistency is only provided in sane setups, where mounter
has sufficient privileges to access both the lower and upper layers.

The model may not have been perfectly followed, or possibly the model
itself is flawed.  I'd like to better understand the issues here.

Thanks,
Miklos

[1]  https://github.com/SELinuxProject/selinux-kernel/issues/43#issuecomment-442148920
[2] https://marc.info/?t=152762608800002&r=1

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-11-27 19:55 overlayfs access checks on underlying layers Miklos Szeredi
@ 2018-11-27 19:58 ` Miklos Szeredi
  2018-11-27 21:05   ` Vivek Goyal
  0 siblings, 1 reply; 47+ messages in thread
From: Miklos Szeredi @ 2018-11-27 19:58 UTC (permalink / raw)
  To: Stephen Smalley, Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields,
	Mark Salyzyn, Paul Moore
  Cc: linux-kernel, overlayfs, linux-fsdevel, selinux

[resending with fixed email address for Paul Moore]

Moving discussion from github[1] to here.

To summarize: commit 007ea44892e6 ("ovl: relax permission checking on
underlying layers") was added in 4.20-rc1 to make overlayfs access
checks on underlying "real" filesystems more consistent.  The
discussion leading up to this commit can be found at [2].  The commit
broke some selinux-testsuite cases, possibly indicating a security
hole opened by this commit.

The model this patch tries to follow is that if "cp --preserve=all"
was allowed to the mounter from underlying layer to the overlay layer,
then operation is allowed.  That means even if mounter's creds doesn't
provide permission to for example execute underying file X, if
mounter's creds provide sufficient permission to perform "cp
--preserve=all X Y"  and original creds allow execute on Y, then the
operation is allowed.  This provides consistency in the face of
copy-ups.  Consistency is only provided in sane setups, where mounter
has sufficient privileges to access both the lower and upper layers.

The model may not have been perfectly followed, or possibly the model
itself is flawed.  I'd like to better understand the issues here.

Thanks,
Miklos

[1]  https://github.com/SELinuxProject/selinux-kernel/issues/43#issuecomment-442148920
[2] https://marc.info/?t=152762608800002&r=1

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-11-27 19:58 ` Miklos Szeredi
@ 2018-11-27 21:05   ` Vivek Goyal
  2018-11-28 10:00     ` Miklos Szeredi
  0 siblings, 1 reply; 47+ messages in thread
From: Vivek Goyal @ 2018-11-27 21:05 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Stephen Smalley, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Tue, Nov 27, 2018 at 08:58:06PM +0100, Miklos Szeredi wrote:
> [resending with fixed email address for Paul Moore]
> 
> Moving discussion from github[1] to here.
> 
> To summarize: commit 007ea44892e6 ("ovl: relax permission checking on
> underlying layers") was added in 4.20-rc1 to make overlayfs access
> checks on underlying "real" filesystems more consistent.  The
> discussion leading up to this commit can be found at [2].  The commit
> broke some selinux-testsuite cases, possibly indicating a security
> hole opened by this commit.
> 
> The model this patch tries to follow is that if "cp --preserve=all"
> was allowed to the mounter from underlying layer to the overlay layer,
> then operation is allowed.  That means even if mounter's creds doesn't
> provide permission to for example execute underying file X, if
> mounter's creds provide sufficient permission to perform "cp
> --preserve=all X Y"  and original creds allow execute on Y, then the
> operation is allowed.  This provides consistency in the face of
> copy-ups.  Consistency is only provided in sane setups, where mounter
> has sufficient privileges to access both the lower and upper layers.

[cc daniel walsh]

I think current selinux testsuite tests are written keeping these
rules in mind.

1. Check overlay inode creds in the context of task and underlying
   inode creds (lower/upper), in the context of mounter.

2. For a lower inode, if said file is being copied up, then only
   check MAY_READ on lower. This is equivalent to mounter creating
   a copy of file and providing caller access to it (context mount).

For the case of special devices, we do not copy up these. So should
we continue to do check on lower inode in the context of mounter
(instead of not doing any check on lower at all).

For being able to execute a file, should we atleast check MAY_READ
on lower.

I am not sure why did we have to drop current checks on special file
and execute. I will read through the thread you pointed out.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-11-27 21:05   ` Vivek Goyal
@ 2018-11-28 10:00     ` Miklos Szeredi
  2018-11-28 17:03       ` Vivek Goyal
  0 siblings, 1 reply; 47+ messages in thread
From: Miklos Szeredi @ 2018-11-28 10:00 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Stephen Smalley, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Tue, Nov 27, 2018 at 10:05 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Tue, Nov 27, 2018 at 08:58:06PM +0100, Miklos Szeredi wrote:
> > [resending with fixed email address for Paul Moore]
> >
> > Moving discussion from github[1] to here.
> >
> > To summarize: commit 007ea44892e6 ("ovl: relax permission checking on
> > underlying layers") was added in 4.20-rc1 to make overlayfs access
> > checks on underlying "real" filesystems more consistent.  The
> > discussion leading up to this commit can be found at [2].  The commit
> > broke some selinux-testsuite cases, possibly indicating a security
> > hole opened by this commit.
> >
> > The model this patch tries to follow is that if "cp --preserve=all"
> > was allowed to the mounter from underlying layer to the overlay layer,
> > then operation is allowed.  That means even if mounter's creds doesn't
> > provide permission to for example execute underying file X, if
> > mounter's creds provide sufficient permission to perform "cp
> > --preserve=all X Y"  and original creds allow execute on Y, then the
> > operation is allowed.  This provides consistency in the face of
> > copy-ups.  Consistency is only provided in sane setups, where mounter
> > has sufficient privileges to access both the lower and upper layers.
>
> [cc daniel walsh]
>
> I think current selinux testsuite tests are written keeping these
> rules in mind.
>
> 1. Check overlay inode creds in the context of task and underlying
>    inode creds (lower/upper), in the context of mounter.
>
> 2. For a lower inode, if said file is being copied up, then only
>    check MAY_READ on lower. This is equivalent to mounter creating
>    a copy of file and providing caller access to it (context mount).
>
> For the case of special devices, we do not copy up these. So should
> we continue to do check on lower inode in the context of mounter
> (instead of not doing any check on lower at all).

Hmm, I'm trying to understand the logic... If we follow the "cp
--preserve=all" thing, than mounter needs to have CREATE permission
for the special file, not READ or WRITE.  Does that make sense?  Would
that help with the context= mount use case?

>
> For being able to execute a file, should we atleast check MAY_READ
> on lower.

Yep, that looks like a bug present from day one: MAY_EXEC doesn't
always imply MAY_READ, but to be able to execute a file, the kernel
must read it first, and if mounter doesn't have privilege to read the
file, then user should not be allowed to execute it.

> I am not sure why did we have to drop current checks on special file
> and execute. I will read through the thread you pointed out.

TL;DR: NFS access model is that creds are checked by server (and
cached in client), and server could be denying write access to a
device file to mounter (root) independently of DAC. In that case write
access by user to device file would be inconsistent (denied before
copy-up, allowed after copy-up).  Same goes for execute.

And same goes for MAC:  if it's denying READ/WRITE on device or
denying EXECUTE on readable file to mounter, and mounter can just copy
that device/file to a temporry location not controlled by that MAC,
than it can work around that restriction.  IOW, this is just a
generalization of the rule that we ignore WRITE access on lower layer,
because a write will never reach the lower layer.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-11-28 10:00     ` Miklos Szeredi
@ 2018-11-28 17:03       ` Vivek Goyal
  2018-11-28 19:34         ` Stephen Smalley
  0 siblings, 1 reply; 47+ messages in thread
From: Vivek Goyal @ 2018-11-28 17:03 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Stephen Smalley, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Wed, Nov 28, 2018 at 11:00:09AM +0100, Miklos Szeredi wrote:
> On Tue, Nov 27, 2018 at 10:05 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Tue, Nov 27, 2018 at 08:58:06PM +0100, Miklos Szeredi wrote:
> > > [resending with fixed email address for Paul Moore]
> > >
> > > Moving discussion from github[1] to here.
> > >
> > > To summarize: commit 007ea44892e6 ("ovl: relax permission checking on
> > > underlying layers") was added in 4.20-rc1 to make overlayfs access
> > > checks on underlying "real" filesystems more consistent.  The
> > > discussion leading up to this commit can be found at [2].  The commit
> > > broke some selinux-testsuite cases, possibly indicating a security
> > > hole opened by this commit.
> > >
> > > The model this patch tries to follow is that if "cp --preserve=all"
> > > was allowed to the mounter from underlying layer to the overlay layer,
> > > then operation is allowed.  That means even if mounter's creds doesn't
> > > provide permission to for example execute underying file X, if
> > > mounter's creds provide sufficient permission to perform "cp
> > > --preserve=all X Y"  and original creds allow execute on Y, then the
> > > operation is allowed.  This provides consistency in the face of
> > > copy-ups.  Consistency is only provided in sane setups, where mounter
> > > has sufficient privileges to access both the lower and upper layers.
> >
> > [cc daniel walsh]
> >
> > I think current selinux testsuite tests are written keeping these
> > rules in mind.
> >
> > 1. Check overlay inode creds in the context of task and underlying
> >    inode creds (lower/upper), in the context of mounter.
> >
> > 2. For a lower inode, if said file is being copied up, then only
> >    check MAY_READ on lower. This is equivalent to mounter creating
> >    a copy of file and providing caller access to it (context mount).
> >
> > For the case of special devices, we do not copy up these. So should
> > we continue to do check on lower inode in the context of mounter
> > (instead of not doing any check on lower at all).
> 
> Hmm, I'm trying to understand the logic... If we follow the "cp
> --preserve=all" thing, than mounter needs to have CREATE permission
> for the special file, not READ or WRITE.  Does that make sense?  Would
> that help with the context= mount use case?

Ok. If we follow "cp --preserve=all" methodology, then checking for
mounter CREATE permission on upper for special files makes sense. Or
change logic to copy up this special file during open. I am assuming
we don't copy up special file during open as it is not necessary
for things to work but copying up will work as well?

So rules will become.

- Two levels of checks.
- For lower level inode, check MAY_READ for regular files. (including
  exec).
- For special files, only make sure mounter can CREATE object in upper.

- What about checks on files on upper/. As of now we seem to check
  access in mounter's context if it is regular file. Skip the checks
  completely for special files and for executables.

While non-context mount should still be ok, but this means lot of
privilige granting to unprivileged process using context mounts. So
unprivileged process which could not open a device/socket/fifo for
read/write on host fs, can open it for those operations for context
mounts. 

IOW, for context mount case, an unprivileged user will gain lot of
privileges. But that seems to be the point of context mount anyway
on regular disks. If a disk is mounted using context mount option,
then all real labels are ignored and all access checking happens
using context label. We are doing similar thing. With one step extra
and that is making sure if mounter itself can not do certain operation
on host, that will still be denied.

This probably means that context= mounts should be used very carefully.
It will grant lot of priviliges to the process (and allow operations
which process could not do on host without overlayfs mount).

Case of device file still baffels me though. We don't do any mounter's
checks on device files. So if a device file is on upper which mounter
can't open for read but mounter is still granting priviliges to client
to open that device file. That's unintutive to me and seems counter
to the principle of that mounter can't give more priviliges than what
it itself can't do on host.

Dan, stephen, paul moore, does this sound ok to you folks from selinux
point of view.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-11-28 17:03       ` Vivek Goyal
@ 2018-11-28 19:34         ` Stephen Smalley
  2018-11-28 20:24           ` Miklos Szeredi
  0 siblings, 1 reply; 47+ messages in thread
From: Stephen Smalley @ 2018-11-28 19:34 UTC (permalink / raw)
  To: Vivek Goyal, Miklos Szeredi
  Cc: Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore,
	linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh

On 11/28/18 12:03 PM, Vivek Goyal wrote:
> On Wed, Nov 28, 2018 at 11:00:09AM +0100, Miklos Szeredi wrote:
>> On Tue, Nov 27, 2018 at 10:05 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>>>
>>> On Tue, Nov 27, 2018 at 08:58:06PM +0100, Miklos Szeredi wrote:
>>>> [resending with fixed email address for Paul Moore]
>>>>
>>>> Moving discussion from github[1] to here.
>>>>
>>>> To summarize: commit 007ea44892e6 ("ovl: relax permission checking on
>>>> underlying layers") was added in 4.20-rc1 to make overlayfs access
>>>> checks on underlying "real" filesystems more consistent.  The
>>>> discussion leading up to this commit can be found at [2].  The commit
>>>> broke some selinux-testsuite cases, possibly indicating a security
>>>> hole opened by this commit.
>>>>
>>>> The model this patch tries to follow is that if "cp --preserve=all"
>>>> was allowed to the mounter from underlying layer to the overlay layer,
>>>> then operation is allowed.  That means even if mounter's creds doesn't
>>>> provide permission to for example execute underying file X, if
>>>> mounter's creds provide sufficient permission to perform "cp
>>>> --preserve=all X Y"  and original creds allow execute on Y, then the
>>>> operation is allowed.  This provides consistency in the face of
>>>> copy-ups.  Consistency is only provided in sane setups, where mounter
>>>> has sufficient privileges to access both the lower and upper layers.
>>>
>>> [cc daniel walsh]
>>>
>>> I think current selinux testsuite tests are written keeping these
>>> rules in mind.
>>>
>>> 1. Check overlay inode creds in the context of task and underlying
>>>     inode creds (lower/upper), in the context of mounter.
>>>
>>> 2. For a lower inode, if said file is being copied up, then only
>>>     check MAY_READ on lower. This is equivalent to mounter creating
>>>     a copy of file and providing caller access to it (context mount).
>>>
>>> For the case of special devices, we do not copy up these. So should
>>> we continue to do check on lower inode in the context of mounter
>>> (instead of not doing any check on lower at all).
>>
>> Hmm, I'm trying to understand the logic... If we follow the "cp
>> --preserve=all" thing, than mounter needs to have CREATE permission
>> for the special file, not READ or WRITE.  Does that make sense?  Would
>> that help with the context= mount use case?
> 
> Ok. If we follow "cp --preserve=all" methodology, then checking for
> mounter CREATE permission on upper for special files makes sense. Or
> change logic to copy up this special file during open. I am assuming
> we don't copy up special file during open as it is not necessary
> for things to work but copying up will work as well?
> 
> So rules will become.
> 
> - Two levels of checks.
> - For lower level inode, check MAY_READ for regular files. (including
>    exec).
> - For special files, only make sure mounter can CREATE object in upper.
> 
> - What about checks on files on upper/. As of now we seem to check
>    access in mounter's context if it is regular file. Skip the checks
>    completely for special files and for executables.
> 
> While non-context mount should still be ok, but this means lot of
> privilige granting to unprivileged process using context mounts. So
> unprivileged process which could not open a device/socket/fifo for
> read/write on host fs, can open it for those operations for context
> mounts.
> 
> IOW, for context mount case, an unprivileged user will gain lot of
> privileges. But that seems to be the point of context mount anyway
> on regular disks. If a disk is mounted using context mount option,
> then all real labels are ignored and all access checking happens
> using context label. We are doing similar thing. With one step extra
> and that is making sure if mounter itself can not do certain operation
> on host, that will still be denied.
> 
> This probably means that context= mounts should be used very carefully.
> It will grant lot of priviliges to the process (and allow operations
> which process could not do on host without overlayfs mount).
> 
> Case of device file still baffels me though. We don't do any mounter's
> checks on device files. So if a device file is on upper which mounter
> can't open for read but mounter is still granting priviliges to client
> to open that device file. That's unintutive to me and seems counter
> to the principle of that mounter can't give more priviliges than what
> it itself can't do on host.
> 
> Dan, stephen, paul moore, does this sound ok to you folks from selinux
> point of view.

It seems wrong to check CREATE when no file is being created, and doing 
so could lead to over-privileging of the mounter context, requiring one 
to allow a mounter context to create device nodes just to allow a client 
task context to read/write via already existing device nodes through an 
overlay.

Checking READ but not EXECUTE upon an execute check could permit a 
mounter to execute unauthorized code, if it can context mount from a 
readable-but-not-executable context to an executable context.

Note btw that cp --preserve=all doesn't quite operate as expected if 
dealing with a context mount.  You can't preserve the original security 
context if copying to a context mount unless the two contexts happen to 
already match.  So I'm not sure how that model applies in the case of a 
context mount.

Does the breaking commit (007ea44892e6) fix a real bug affecting users? 
  If not, I'd recommend just reverting it.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-11-28 19:34         ` Stephen Smalley
@ 2018-11-28 20:24           ` Miklos Szeredi
  2018-11-28 21:46             ` Stephen Smalley
  0 siblings, 1 reply; 47+ messages in thread
From: Miklos Szeredi @ 2018-11-28 20:24 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Wed, Nov 28, 2018 at 8:32 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
>
> On 11/28/18 12:03 PM, Vivek Goyal wrote:
> > On Wed, Nov 28, 2018 at 11:00:09AM +0100, Miklos Szeredi wrote:
> >> On Tue, Nov 27, 2018 at 10:05 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >>>
> >>> On Tue, Nov 27, 2018 at 08:58:06PM +0100, Miklos Szeredi wrote:
> >>>> [resending with fixed email address for Paul Moore]
> >>>>
> >>>> Moving discussion from github[1] to here.
> >>>>
> >>>> To summarize: commit 007ea44892e6 ("ovl: relax permission checking on
> >>>> underlying layers") was added in 4.20-rc1 to make overlayfs access
> >>>> checks on underlying "real" filesystems more consistent.  The
> >>>> discussion leading up to this commit can be found at [2].  The commit
> >>>> broke some selinux-testsuite cases, possibly indicating a security
> >>>> hole opened by this commit.
> >>>>
> >>>> The model this patch tries to follow is that if "cp --preserve=all"
> >>>> was allowed to the mounter from underlying layer to the overlay layer,
> >>>> then operation is allowed.  That means even if mounter's creds doesn't
> >>>> provide permission to for example execute underying file X, if
> >>>> mounter's creds provide sufficient permission to perform "cp
> >>>> --preserve=all X Y"  and original creds allow execute on Y, then the
> >>>> operation is allowed.  This provides consistency in the face of
> >>>> copy-ups.  Consistency is only provided in sane setups, where mounter
> >>>> has sufficient privileges to access both the lower and upper layers.
> >>>
> >>> [cc daniel walsh]
> >>>
> >>> I think current selinux testsuite tests are written keeping these
> >>> rules in mind.
> >>>
> >>> 1. Check overlay inode creds in the context of task and underlying
> >>>     inode creds (lower/upper), in the context of mounter.
> >>>
> >>> 2. For a lower inode, if said file is being copied up, then only
> >>>     check MAY_READ on lower. This is equivalent to mounter creating
> >>>     a copy of file and providing caller access to it (context mount).
> >>>
> >>> For the case of special devices, we do not copy up these. So should
> >>> we continue to do check on lower inode in the context of mounter
> >>> (instead of not doing any check on lower at all).
> >>
> >> Hmm, I'm trying to understand the logic... If we follow the "cp
> >> --preserve=all" thing, than mounter needs to have CREATE permission
> >> for the special file, not READ or WRITE.  Does that make sense?  Would
> >> that help with the context= mount use case?
> >
> > Ok. If we follow "cp --preserve=all" methodology, then checking for
> > mounter CREATE permission on upper for special files makes sense. Or
> > change logic to copy up this special file during open. I am assuming
> > we don't copy up special file during open as it is not necessary
> > for things to work but copying up will work as well?
> >
> > So rules will become.
> >
> > - Two levels of checks.
> > - For lower level inode, check MAY_READ for regular files. (including
> >    exec).
> > - For special files, only make sure mounter can CREATE object in upper.
> >
> > - What about checks on files on upper/. As of now we seem to check
> >    access in mounter's context if it is regular file. Skip the checks
> >    completely for special files and for executables.
> >
> > While non-context mount should still be ok, but this means lot of
> > privilige granting to unprivileged process using context mounts. So
> > unprivileged process which could not open a device/socket/fifo for
> > read/write on host fs, can open it for those operations for context
> > mounts.
> >
> > IOW, for context mount case, an unprivileged user will gain lot of
> > privileges. But that seems to be the point of context mount anyway
> > on regular disks. If a disk is mounted using context mount option,
> > then all real labels are ignored and all access checking happens
> > using context label. We are doing similar thing. With one step extra
> > and that is making sure if mounter itself can not do certain operation
> > on host, that will still be denied.
> >
> > This probably means that context= mounts should be used very carefully.
> > It will grant lot of priviliges to the process (and allow operations
> > which process could not do on host without overlayfs mount).
> >
> > Case of device file still baffels me though. We don't do any mounter's
> > checks on device files. So if a device file is on upper which mounter
> > can't open for read but mounter is still granting priviliges to client
> > to open that device file. That's unintutive to me and seems counter
> > to the principle of that mounter can't give more priviliges than what
> > it itself can't do on host.
> >
> > Dan, stephen, paul moore, does this sound ok to you folks from selinux
> > point of view.
>
> It seems wrong to check CREATE when no file is being created, and doing
> so could lead to over-privileging of the mounter context, requiring one
> to allow a mounter context to create device nodes just to allow a client
> task context to read/write via already existing device nodes through an
> overlay.

Point taken.

>
> Checking READ but not EXECUTE upon an execute check could permit a
> mounter to execute unauthorized code, if it can context mount from a
> readable-but-not-executable context to an executable context.
>
> Note btw that cp --preserve=all doesn't quite operate as expected if
> dealing with a context mount.  You can't preserve the original security
> context if copying to a context mount unless the two contexts happen to
> already match.  So I'm not sure how that model applies in the case of a
> context mount.
>
> Does the breaking commit (007ea44892e6) fix a real bug affecting users?
>   If not, I'd recommend just reverting it.

That is certainly an option, but...  this is all about context=
mounts, right?  Which allows mounter to override MAC checks under the
new mount?  On any mount, not just overlay, right?  So why is overlay
special?

I'd just like to see proper justification for why we should be doing
those checks on underlying layer that simply don't belong there, IMO.
 I'm sure you know better than I that it's not just about real bugs
affecting users, it's about having a clear, well defined model to base
the design on.   And by reverting the breaking commit, I don't see us
getting closer to that.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-11-28 20:24           ` Miklos Szeredi
@ 2018-11-28 21:46             ` Stephen Smalley
  2018-11-29 11:04               ` Miklos Szeredi
  0 siblings, 1 reply; 47+ messages in thread
From: Stephen Smalley @ 2018-11-28 21:46 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On 11/28/18 3:24 PM, Miklos Szeredi wrote:
> On Wed, Nov 28, 2018 at 8:32 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
>>
>> On 11/28/18 12:03 PM, Vivek Goyal wrote:
>>> On Wed, Nov 28, 2018 at 11:00:09AM +0100, Miklos Szeredi wrote:
>>>> On Tue, Nov 27, 2018 at 10:05 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>>>>>
>>>>> On Tue, Nov 27, 2018 at 08:58:06PM +0100, Miklos Szeredi wrote:
>>>>>> [resending with fixed email address for Paul Moore]
>>>>>>
>>>>>> Moving discussion from github[1] to here.
>>>>>>
>>>>>> To summarize: commit 007ea44892e6 ("ovl: relax permission checking on
>>>>>> underlying layers") was added in 4.20-rc1 to make overlayfs access
>>>>>> checks on underlying "real" filesystems more consistent.  The
>>>>>> discussion leading up to this commit can be found at [2].  The commit
>>>>>> broke some selinux-testsuite cases, possibly indicating a security
>>>>>> hole opened by this commit.
>>>>>>
>>>>>> The model this patch tries to follow is that if "cp --preserve=all"
>>>>>> was allowed to the mounter from underlying layer to the overlay layer,
>>>>>> then operation is allowed.  That means even if mounter's creds doesn't
>>>>>> provide permission to for example execute underying file X, if
>>>>>> mounter's creds provide sufficient permission to perform "cp
>>>>>> --preserve=all X Y"  and original creds allow execute on Y, then the
>>>>>> operation is allowed.  This provides consistency in the face of
>>>>>> copy-ups.  Consistency is only provided in sane setups, where mounter
>>>>>> has sufficient privileges to access both the lower and upper layers.
>>>>>
>>>>> [cc daniel walsh]
>>>>>
>>>>> I think current selinux testsuite tests are written keeping these
>>>>> rules in mind.
>>>>>
>>>>> 1. Check overlay inode creds in the context of task and underlying
>>>>>      inode creds (lower/upper), in the context of mounter.
>>>>>
>>>>> 2. For a lower inode, if said file is being copied up, then only
>>>>>      check MAY_READ on lower. This is equivalent to mounter creating
>>>>>      a copy of file and providing caller access to it (context mount).
>>>>>
>>>>> For the case of special devices, we do not copy up these. So should
>>>>> we continue to do check on lower inode in the context of mounter
>>>>> (instead of not doing any check on lower at all).
>>>>
>>>> Hmm, I'm trying to understand the logic... If we follow the "cp
>>>> --preserve=all" thing, than mounter needs to have CREATE permission
>>>> for the special file, not READ or WRITE.  Does that make sense?  Would
>>>> that help with the context= mount use case?
>>>
>>> Ok. If we follow "cp --preserve=all" methodology, then checking for
>>> mounter CREATE permission on upper for special files makes sense. Or
>>> change logic to copy up this special file during open. I am assuming
>>> we don't copy up special file during open as it is not necessary
>>> for things to work but copying up will work as well?
>>>
>>> So rules will become.
>>>
>>> - Two levels of checks.
>>> - For lower level inode, check MAY_READ for regular files. (including
>>>     exec).
>>> - For special files, only make sure mounter can CREATE object in upper.
>>>
>>> - What about checks on files on upper/. As of now we seem to check
>>>     access in mounter's context if it is regular file. Skip the checks
>>>     completely for special files and for executables.
>>>
>>> While non-context mount should still be ok, but this means lot of
>>> privilige granting to unprivileged process using context mounts. So
>>> unprivileged process which could not open a device/socket/fifo for
>>> read/write on host fs, can open it for those operations for context
>>> mounts.
>>>
>>> IOW, for context mount case, an unprivileged user will gain lot of
>>> privileges. But that seems to be the point of context mount anyway
>>> on regular disks. If a disk is mounted using context mount option,
>>> then all real labels are ignored and all access checking happens
>>> using context label. We are doing similar thing. With one step extra
>>> and that is making sure if mounter itself can not do certain operation
>>> on host, that will still be denied.
>>>
>>> This probably means that context= mounts should be used very carefully.
>>> It will grant lot of priviliges to the process (and allow operations
>>> which process could not do on host without overlayfs mount).
>>>
>>> Case of device file still baffels me though. We don't do any mounter's
>>> checks on device files. So if a device file is on upper which mounter
>>> can't open for read but mounter is still granting priviliges to client
>>> to open that device file. That's unintutive to me and seems counter
>>> to the principle of that mounter can't give more priviliges than what
>>> it itself can't do on host.
>>>
>>> Dan, stephen, paul moore, does this sound ok to you folks from selinux
>>> point of view.
>>
>> It seems wrong to check CREATE when no file is being created, and doing
>> so could lead to over-privileging of the mounter context, requiring one
>> to allow a mounter context to create device nodes just to allow a client
>> task context to read/write via already existing device nodes through an
>> overlay.
> 
> Point taken.
> 
>>
>> Checking READ but not EXECUTE upon an execute check could permit a
>> mounter to execute unauthorized code, if it can context mount from a
>> readable-but-not-executable context to an executable context.
>>
>> Note btw that cp --preserve=all doesn't quite operate as expected if
>> dealing with a context mount.  You can't preserve the original security
>> context if copying to a context mount unless the two contexts happen to
>> already match.  So I'm not sure how that model applies in the case of a
>> context mount.
>>
>> Does the breaking commit (007ea44892e6) fix a real bug affecting users?
>>    If not, I'd recommend just reverting it.
> 
> That is certainly an option, but...  this is all about context=
> mounts, right?  Which allows mounter to override MAC checks under the
> new mount?  On any mount, not just overlay, right?  So why is overlay
> special?

With other filesystems, the files are only accessible under the context 
specified by the mounter (and you can't mount it twice with differing 
context mount options). With overlay, the file is simultaneously 
accessible under both the context specified by the mounter via the 
overlay and under its lower/upper context via the lower/upper dir.

Generally we only use context mounts on other filesystems when they have 
no label information at all (no security.selinux xattrs) or when they 
are completely untrusted to provide that information; the context 
specified by the mounter is the only basis for access control.  With 
overlay, we are frequently dealing with labeled lower and upper 
directories in a filesystem we trust.

It seems like overlay has a goal of preventing the mounter from 
escalating its access through an overlay mount.

> I'd just like to see proper justification for why we should be doing
> those checks on underlying layer that simply don't belong there, IMO.
>   I'm sure you know better than I that it's not just about real bugs
> affecting users, it's about having a clear, well defined model to base
> the design on.   And by reverting the breaking commit, I don't see us
> getting closer to that.

It seems like the NFS folks raised a number of concerns with the overlay 
approach beyond just these two checks, and Android has their 
override_creds=off use case.  Maybe the overlay model needs a more 
significant rethinking than just these two cases.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-11-28 21:46             ` Stephen Smalley
@ 2018-11-29 11:04               ` Miklos Szeredi
  2018-11-29 13:49                 ` Vivek Goyal
  2018-11-29 16:16                 ` Stephen Smalley
  0 siblings, 2 replies; 47+ messages in thread
From: Miklos Szeredi @ 2018-11-29 11:04 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Wed, Nov 28, 2018 at 10:43 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
>
> On 11/28/18 3:24 PM, Miklos Szeredi wrote:
> > On Wed, Nov 28, 2018 at 8:32 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:

[...]

> >> Does the breaking commit (007ea44892e6) fix a real bug affecting users?
> >>    If not, I'd recommend just reverting it.
> >
> > That is certainly an option, but...  this is all about context=
> > mounts, right?  Which allows mounter to override MAC checks under the
> > new mount?  On any mount, not just overlay, right?  So why is overlay
> > special?
>
> With other filesystems, the files are only accessible under the context
> specified by the mounter (and you can't mount it twice with differing
> context mount options). With overlay, the file is simultaneously
> accessible under both the context specified by the mounter via the
> overlay and under its lower/upper context via the lower/upper dir.
>
> Generally we only use context mounts on other filesystems when they have
> no label information at all (no security.selinux xattrs) or when they
> are completely untrusted to provide that information; the context
> specified by the mounter is the only basis for access control.  With
> overlay, we are frequently dealing with labeled lower and upper
> directories in a filesystem we trust.
>
> It seems like overlay has a goal of preventing the mounter from
> escalating its access through an overlay mount.

Overlayfs main purpose is to bypass limited access to a read-only
filesystem by copying up on a write access.  So bypassing access is
built into the system, but it's done in a way that in the end it
doesn't permit mounter to do more than it could otherwise do.  Or at
least that's the intent, as you say.

To generalize that, we could say:  trigger a copy-up if access to
lower layer object is denied.  That would extend the scope of the
trigger from write access on file/dir to read/write on special files
and execute on regular files, all of which could theoretically be
denied on the lower object, yet allowed on the upper object without
violating the above rule.

> > I'd just like to see proper justification for why we should be doing
> > those checks on underlying layer that simply don't belong there, IMO.
> >   I'm sure you know better than I that it's not just about real bugs
> > affecting users, it's about having a clear, well defined model to base
> > the design on.   And by reverting the breaking commit, I don't see us
> > getting closer to that.
>
> It seems like the NFS folks raised a number of concerns with the overlay
> approach beyond just these two checks,

The concerns that NFS folks had was that overlayfs does not enforce
permission checks (with creds of task) on underlying objects,
resulting in possibly elevated access compared to directly accessing
the NFS mount.    But I think there's no way to reconcile server
permission checks with overlayfs, so we are left with the current
model of only verifying the permission on the server against the
mounter's creds.

> and Android has their
> override_creds=off use case.  Maybe the overlay model needs a more
> significant rethinking than just these two cases.

Yes, it would be good if that override_creds=off use case was
discussed.  AFAICS it trades less privilege in the mounter for more
privilege in the task accessing the overlay.  If, or how, that is a
good model for anything other than Android is not clear to me.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-11-29 11:04               ` Miklos Szeredi
@ 2018-11-29 13:49                 ` Vivek Goyal
  2019-03-04 17:01                   ` Mark Salyzyn
  2018-11-29 16:16                 ` Stephen Smalley
  1 sibling, 1 reply; 47+ messages in thread
From: Vivek Goyal @ 2018-11-29 13:49 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Stephen Smalley, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Thu, Nov 29, 2018 at 12:04:20PM +0100, Miklos Szeredi wrote:
> On Wed, Nov 28, 2018 at 10:43 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> >
> > On 11/28/18 3:24 PM, Miklos Szeredi wrote:
> > > On Wed, Nov 28, 2018 at 8:32 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> 
> [...]
> 
> > >> Does the breaking commit (007ea44892e6) fix a real bug affecting users?
> > >>    If not, I'd recommend just reverting it.
> > >
> > > That is certainly an option, but...  this is all about context=
> > > mounts, right?  Which allows mounter to override MAC checks under the
> > > new mount?  On any mount, not just overlay, right?  So why is overlay
> > > special?
> >
> > With other filesystems, the files are only accessible under the context
> > specified by the mounter (and you can't mount it twice with differing
> > context mount options). With overlay, the file is simultaneously
> > accessible under both the context specified by the mounter via the
> > overlay and under its lower/upper context via the lower/upper dir.
> >
> > Generally we only use context mounts on other filesystems when they have
> > no label information at all (no security.selinux xattrs) or when they
> > are completely untrusted to provide that information; the context
> > specified by the mounter is the only basis for access control.  With
> > overlay, we are frequently dealing with labeled lower and upper
> > directories in a filesystem we trust.
> >
> > It seems like overlay has a goal of preventing the mounter from
> > escalating its access through an overlay mount.
> 
> Overlayfs main purpose is to bypass limited access to a read-only
> filesystem by copying up on a write access.  So bypassing access is
> built into the system, but it's done in a way that in the end it
> doesn't permit mounter to do more than it could otherwise do.  Or at
> least that's the intent, as you say.
> 
> To generalize that, we could say:  trigger a copy-up if access to
> lower layer object is denied.  That would extend the scope of the
> trigger from write access on file/dir to read/write on special files
> and execute on regular files, all of which could theoretically be
> denied on the lower object, yet allowed on the upper object without
> violating the above rule.

Apart from copy-up, one intent of mounter checks was also to make sure
mounter did not allow access to objects in lower/upper which mounter
itself did not have access to.

So a mounter which can not open upper/foo.txt should not be able to create
a context overlay mount and allow opening upper/foo.txt to a user. IIUC,
that's one of the purposes of second level of check in mounter context and
if it makes sense, then it should extend to special file and exectuable
too?

> 
> > > I'd just like to see proper justification for why we should be doing
> > > those checks on underlying layer that simply don't belong there, IMO.
> > >   I'm sure you know better than I that it's not just about real bugs
> > > affecting users, it's about having a clear, well defined model to base
> > > the design on.   And by reverting the breaking commit, I don't see us
> > > getting closer to that.
> >
> > It seems like the NFS folks raised a number of concerns with the overlay
> > approach beyond just these two checks,
> 
> The concerns that NFS folks had was that overlayfs does not enforce
> permission checks (with creds of task) on underlying objects,
> resulting in possibly elevated access compared to directly accessing
> the NFS mount.    But I think there's no way to reconcile server
> permission checks with overlayfs, so we are left with the current
> model of only verifying the permission on the server against the
> mounter's creds.
> 
> > and Android has their
> > override_creds=off use case.  Maybe the overlay model needs a more
> > significant rethinking than just these two cases.
> 
> Yes, it would be good if that override_creds=off use case was
> discussed.  AFAICS it trades less privilege in the mounter for more
> privilege in the task accessing the overlay.  If, or how, that is a
> good model for anything other than Android is not clear to me.

So will override_creds=off solve the NFS issue also where all access will
happen with the creds of task now? Though it will stil require more
priviliges in task for other operations in overlay to succeed.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-11-29 11:04               ` Miklos Szeredi
  2018-11-29 13:49                 ` Vivek Goyal
@ 2018-11-29 16:16                 ` Stephen Smalley
  2018-11-29 16:22                   ` Stephen Smalley
  2018-11-29 19:47                   ` Miklos Szeredi
  1 sibling, 2 replies; 47+ messages in thread
From: Stephen Smalley @ 2018-11-29 16:16 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On 11/29/18 6:04 AM, Miklos Szeredi wrote:
> On Wed, Nov 28, 2018 at 10:43 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
>>
>> On 11/28/18 3:24 PM, Miklos Szeredi wrote:
>>> On Wed, Nov 28, 2018 at 8:32 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> 
> [...]
> 
>>>> Does the breaking commit (007ea44892e6) fix a real bug affecting users?
>>>>     If not, I'd recommend just reverting it.
>>>
>>> That is certainly an option, but...  this is all about context=
>>> mounts, right?  Which allows mounter to override MAC checks under the
>>> new mount?  On any mount, not just overlay, right?  So why is overlay
>>> special?
>>
>> With other filesystems, the files are only accessible under the context
>> specified by the mounter (and you can't mount it twice with differing
>> context mount options). With overlay, the file is simultaneously
>> accessible under both the context specified by the mounter via the
>> overlay and under its lower/upper context via the lower/upper dir.
>>
>> Generally we only use context mounts on other filesystems when they have
>> no label information at all (no security.selinux xattrs) or when they
>> are completely untrusted to provide that information; the context
>> specified by the mounter is the only basis for access control.  With
>> overlay, we are frequently dealing with labeled lower and upper
>> directories in a filesystem we trust.
>>
>> It seems like overlay has a goal of preventing the mounter from
>> escalating its access through an overlay mount.
> 
> Overlayfs main purpose is to bypass limited access to a read-only
> filesystem by copying up on a write access.  So bypassing access is
> built into the system, but it's done in a way that in the end it
> doesn't permit mounter to do more than it could otherwise do.  Or at
> least that's the intent, as you say.
> 
> To generalize that, we could say:  trigger a copy-up if access to
> lower layer object is denied.  That would extend the scope of the
> trigger from write access on file/dir to read/write on special files
> and execute on regular files, all of which could theoretically be
> denied on the lower object, yet allowed on the upper object without
> violating the above rule.

Possibly I misunderstood you, but I don't think we want to copy-up on 
permission denial, as that would still allow the mounter to read/write 
special files or execute regular files to which it would normally be 
denied access, because the copy would inherit the context specified by 
the mounter in the context mount case.  It still represents an 
escalation of privilege for the mounter.  In contrast, the copy-up on 
write behavior does not allow the mounter to do anything it could not do 
already (i.e. read from the lower, write to the upper).

> 
>>> I'd just like to see proper justification for why we should be doing
>>> those checks on underlying layer that simply don't belong there, IMO.
>>>    I'm sure you know better than I that it's not just about real bugs
>>> affecting users, it's about having a clear, well defined model to base
>>> the design on.   And by reverting the breaking commit, I don't see us
>>> getting closer to that.
>>
>> It seems like the NFS folks raised a number of concerns with the overlay
>> approach beyond just these two checks,
> 
> The concerns that NFS folks had was that overlayfs does not enforce
> permission checks (with creds of task) on underlying objects,
> resulting in possibly elevated access compared to directly accessing
> the NFS mount.    But I think there's no way to reconcile server
> permission checks with overlayfs, so we are left with the current
> model of only verifying the permission on the server against the
> mounter's creds.

Typically this will only work if the NFS files are 
world-readable/searchable, right, since usually the mounter will be root 
and the server will enable root squash on the export?  Is that 
sufficient for most use cases?

> 
>> and Android has their
>> override_creds=off use case.  Maybe the overlay model needs a more
>> significant rethinking than just these two cases.
> 
> Yes, it would be good if that override_creds=off use case was
> discussed.  AFAICS it trades less privilege in the mounter for more
> privilege in the task accessing the overlay.  If, or how, that is a
> good model for anything other than Android is not clear to me.
> 
> Thanks,
> Miklos
> 


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-11-29 16:16                 ` Stephen Smalley
@ 2018-11-29 16:22                   ` Stephen Smalley
  2018-11-29 19:47                   ` Miklos Szeredi
  1 sibling, 0 replies; 47+ messages in thread
From: Stephen Smalley @ 2018-11-29 16:22 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On 11/29/18 11:16 AM, Stephen Smalley wrote:
> On 11/29/18 6:04 AM, Miklos Szeredi wrote:
>> On Wed, Nov 28, 2018 at 10:43 PM Stephen Smalley <sds@tycho.nsa.gov> 
>> wrote:
>>>
>>> On 11/28/18 3:24 PM, Miklos Szeredi wrote:
>>>> On Wed, Nov 28, 2018 at 8:32 PM Stephen Smalley <sds@tycho.nsa.gov> 
>>>> wrote:
>>
>> [...]
>>
>>>>> Does the breaking commit (007ea44892e6) fix a real bug affecting 
>>>>> users?
>>>>>     If not, I'd recommend just reverting it.
>>>>
>>>> That is certainly an option, but...  this is all about context=
>>>> mounts, right?  Which allows mounter to override MAC checks under the
>>>> new mount?  On any mount, not just overlay, right?  So why is overlay
>>>> special?
>>>
>>> With other filesystems, the files are only accessible under the context
>>> specified by the mounter (and you can't mount it twice with differing
>>> context mount options). With overlay, the file is simultaneously
>>> accessible under both the context specified by the mounter via the
>>> overlay and under its lower/upper context via the lower/upper dir.
>>>
>>> Generally we only use context mounts on other filesystems when they have
>>> no label information at all (no security.selinux xattrs) or when they
>>> are completely untrusted to provide that information; the context
>>> specified by the mounter is the only basis for access control.  With
>>> overlay, we are frequently dealing with labeled lower and upper
>>> directories in a filesystem we trust.
>>>
>>> It seems like overlay has a goal of preventing the mounter from
>>> escalating its access through an overlay mount.
>>
>> Overlayfs main purpose is to bypass limited access to a read-only
>> filesystem by copying up on a write access.  So bypassing access is
>> built into the system, but it's done in a way that in the end it
>> doesn't permit mounter to do more than it could otherwise do.  Or at
>> least that's the intent, as you say.
>>
>> To generalize that, we could say:  trigger a copy-up if access to
>> lower layer object is denied.  That would extend the scope of the
>> trigger from write access on file/dir to read/write on special files
>> and execute on regular files, all of which could theoretically be
>> denied on the lower object, yet allowed on the upper object without
>> violating the above rule.
> 
> Possibly I misunderstood you, but I don't think we want to copy-up on 
> permission denial, as that would still allow the mounter to read/write 
> special files or execute regular files to which it would normally be 
> denied access, because the copy would inherit the context specified by 
> the mounter in the context mount case.  It still represents an 
> escalation of privilege for the mounter.  In contrast, the copy-up on 
> write behavior does not allow the mounter to do anything it could not do 
> already (i.e. read from the lower, write to the upper).

Also, copy-up in those cases might allow for device files that would 
normally be unusable due to nodev on the lower filesystem mount to 
become usable in the upper?  And likewise for executable files in a 
noexec lower mount?  Is that already an issue for nosuid?

> 
>>
>>>> I'd just like to see proper justification for why we should be doing
>>>> those checks on underlying layer that simply don't belong there, IMO.
>>>>    I'm sure you know better than I that it's not just about real bugs
>>>> affecting users, it's about having a clear, well defined model to base
>>>> the design on.   And by reverting the breaking commit, I don't see us
>>>> getting closer to that.
>>>
>>> It seems like the NFS folks raised a number of concerns with the overlay
>>> approach beyond just these two checks,
>>
>> The concerns that NFS folks had was that overlayfs does not enforce
>> permission checks (with creds of task) on underlying objects,
>> resulting in possibly elevated access compared to directly accessing
>> the NFS mount.    But I think there's no way to reconcile server
>> permission checks with overlayfs, so we are left with the current
>> model of only verifying the permission on the server against the
>> mounter's creds.
> 
> Typically this will only work if the NFS files are 
> world-readable/searchable, right, since usually the mounter will be root 
> and the server will enable root squash on the export?  Is that 
> sufficient for most use cases?
> 
>>
>>> and Android has their
>>> override_creds=off use case.  Maybe the overlay model needs a more
>>> significant rethinking than just these two cases.
>>
>> Yes, it would be good if that override_creds=off use case was
>> discussed.  AFAICS it trades less privilege in the mounter for more
>> privilege in the task accessing the overlay.  If, or how, that is a
>> good model for anything other than Android is not clear to me.
>>
>> Thanks,
>> Miklos
>>
> 


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-11-29 16:16                 ` Stephen Smalley
  2018-11-29 16:22                   ` Stephen Smalley
@ 2018-11-29 19:47                   ` Miklos Szeredi
  2018-11-29 21:03                     ` Stephen Smalley
  2018-11-29 22:22                     ` Daniel Walsh
  1 sibling, 2 replies; 47+ messages in thread
From: Miklos Szeredi @ 2018-11-29 19:47 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:

> Possibly I misunderstood you, but I don't think we want to copy-up on
> permission denial, as that would still allow the mounter to read/write
> special files or execute regular files to which it would normally be
> denied access, because the copy would inherit the context specified by
> the mounter in the context mount case.  It still represents an
> escalation of privilege for the mounter.  In contrast, the copy-up on
> write behavior does not allow the mounter to do anything it could not do
> already (i.e. read from the lower, write to the upper).

Let's get this straight:  when file is copied up, it inherits label
from context=, not from label of lower file?

Next question: permission to change metadata is tied to permission to
open?  Is it possible that open is denied, but metadata can be
changed?

DAC model allows this: metadata change is tied to ownership, not mode
bits.   And different capability flag.

If the same is true for MAC, then the pre-v4.20-rc1 is already
susceptible to the privilege escalation you describe, right?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-11-29 19:47                   ` Miklos Szeredi
@ 2018-11-29 21:03                     ` Stephen Smalley
  2018-11-29 21:19                       ` Stephen Smalley
  2018-11-29 22:22                     ` Daniel Walsh
  1 sibling, 1 reply; 47+ messages in thread
From: Stephen Smalley @ 2018-11-29 21:03 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On 11/29/18 2:47 PM, Miklos Szeredi wrote:
> On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> 
>> Possibly I misunderstood you, but I don't think we want to copy-up on
>> permission denial, as that would still allow the mounter to read/write
>> special files or execute regular files to which it would normally be
>> denied access, because the copy would inherit the context specified by
>> the mounter in the context mount case.  It still represents an
>> escalation of privilege for the mounter.  In contrast, the copy-up on
>> write behavior does not allow the mounter to do anything it could not do
>> already (i.e. read from the lower, write to the upper).
> 
> Let's get this straight:  when file is copied up, it inherits label
> from context=, not from label of lower file?

That's correct.  The overlay inodes are all assigned the label from the 
context= mount option, and so are any upper inodes created through the 
overlay.  At least that's my understanding of how it is supposed to 
work.  The original use case was for containers with the lower dir 
labeled with a context that is read-only to the container context and 
using a context that is writable by the container context for the 
context= mount.

> Next question: permission to change metadata is tied to permission to
> open?  Is it possible that open is denied, but metadata can be
> changed?

There is no metadata change occurring here. The overlay, upper, and 
lower inodes all keep their labels intact for their lifetime (both 
overlay and upper always have the context= label; upper has whatever its 
original label was), unless explicitly relabeled by some process.  And 
when viewed through the overlay, the file always has the label specified 
via context=, even before the copy-up.

> DAC model allows this: metadata change is tied to ownership, not mode
> bits.   And different capability flag.
> 
> If the same is true for MAC, then the pre-v4.20-rc1 is already
> susceptible to the privilege escalation you describe, right?

Actually, I guess there wouldn't be a privilege escalation if you 
checked the mounter's ability to create the new file upon copy-up, and 
checked the mounter's access to the upper inode label upon the 
subsequent read, write, or execute access.  Then we'd typically block 
the ability to create the device file and we'd block the ability to 
execute files with the label from context=.

But copy-up of special files seems undesirable for other reasons (e.g. 
requiring mounters to be allowed to create device nodes just to permit 
client's to read/write them, possible implications for nodev/noexec, 
implications for socket and fifo files).

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-11-29 21:03                     ` Stephen Smalley
@ 2018-11-29 21:19                       ` Stephen Smalley
  2018-12-04 13:32                         ` Miklos Szeredi
  0 siblings, 1 reply; 47+ messages in thread
From: Stephen Smalley @ 2018-11-29 21:19 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On 11/29/18 4:03 PM, Stephen Smalley wrote:
> On 11/29/18 2:47 PM, Miklos Szeredi wrote:
>> On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov> 
>> wrote:
>>
>>> Possibly I misunderstood you, but I don't think we want to copy-up on
>>> permission denial, as that would still allow the mounter to read/write
>>> special files or execute regular files to which it would normally be
>>> denied access, because the copy would inherit the context specified by
>>> the mounter in the context mount case.  It still represents an
>>> escalation of privilege for the mounter.  In contrast, the copy-up on
>>> write behavior does not allow the mounter to do anything it could not do
>>> already (i.e. read from the lower, write to the upper).
>>
>> Let's get this straight:  when file is copied up, it inherits label
>> from context=, not from label of lower file?
> 
> That's correct.  The overlay inodes are all assigned the label from the 
> context= mount option, and so are any upper inodes created through the 
> overlay.  At least that's my understanding of how it is supposed to 
> work.  The original use case was for containers with the lower dir 
> labeled with a context that is read-only to the container context and 
> using a context that is writable by the container context for the 
> context= mount.
> 
>> Next question: permission to change metadata is tied to permission to
>> open?  Is it possible that open is denied, but metadata can be
>> changed?
> 
> There is no metadata change occurring here. The overlay, upper, and 
> lower inodes all keep their labels intact for their lifetime (both 
> overlay and upper always have the context= label; upper has whatever its
                                                   ^^lower^^

> original label was), unless explicitly relabeled by some process.  And 
> when viewed through the overlay, the file always has the label specified 
> via context=, even before the copy-up.
> 
>> DAC model allows this: metadata change is tied to ownership, not mode
>> bits.   And different capability flag.
>>
>> If the same is true for MAC, then the pre-v4.20-rc1 is already
>> susceptible to the privilege escalation you describe, right?
> 
> Actually, I guess there wouldn't be a privilege escalation if you 
> checked the mounter's ability to create the new file upon copy-up, and 
> checked the mounter's access to the upper inode label upon the 
> subsequent read, write, or execute access.  Then we'd typically block 
> the ability to create the device file and we'd block the ability to 
> execute files with the label from context=.
> 
> But copy-up of special files seems undesirable for other reasons (e.g. 
> requiring mounters to be allowed to create device nodes just to permit 
> client's to read/write them, possible implications for nodev/noexec, 
> implications for socket and fifo files).


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-11-29 19:47                   ` Miklos Szeredi
  2018-11-29 21:03                     ` Stephen Smalley
@ 2018-11-29 22:22                     ` Daniel Walsh
  2018-12-03 23:27                       ` Paul Moore
  1 sibling, 1 reply; 47+ messages in thread
From: Daniel Walsh @ 2018-11-29 22:22 UTC (permalink / raw)
  To: Miklos Szeredi, Stephen Smalley
  Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux

On 11/29/18 2:47 PM, Miklos Szeredi wrote:
> On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
>
>> Possibly I misunderstood you, but I don't think we want to copy-up on
>> permission denial, as that would still allow the mounter to read/write
>> special files or execute regular files to which it would normally be
>> denied access, because the copy would inherit the context specified by
>> the mounter in the context mount case.  It still represents an
>> escalation of privilege for the mounter.  In contrast, the copy-up on
>> write behavior does not allow the mounter to do anything it could not do
>> already (i.e. read from the lower, write to the upper).
> Let's get this straight:  when file is copied up, it inherits label
> from context=, not from label of lower file?

Yes, in the case of context mount, it will get the context mount directory.

In the case of not context mount, it should maintain the label of  the
lower.

>
> Next question: permission to change metadata is tied to permission to
> open?  Is it possible that open is denied, but metadata can be
> changed?
Yes, SElinux handles open differently then setattr.  Although I am not
sure if any tools handle this.
> DAC model allows this: metadata change is tied to ownership, not mode
> bits.   And different capability flag.
>
> If the same is true for MAC, then the pre-v4.20-rc1 is already
> susceptible to the privilege escalation you describe, right?
>
> Thanks,
> Miklos

After talking to Vivek, I am not sure their is a privilege escallation.


For device nodes, the mounter has to have the ability to create the
devicenode with the context mount, if he can do this, then he can do it
with or without Overlay.  This might lead to users making mistakes on
security, but the model is sound. And I think this stands even in the
case of the lower is mounted NODEV and the upper is not.  If the mounter
can create a device on the upper with a particular label, then he does
not need the lower.


For sockets, I see the case where a process is listening on the lower
level socket, the mounter mounts the overlay over the directory with the
socket.  Then the mounter changes the attributes of the socket,
performing a copy up.  If the mounter can not talk to the socket and the
other end is still listening, then this could be an issue.  If the
socket is no longer connected to the listener on the lower, then this is
not an issue.


Similar for a FIFO.


With SELinux we are also always checking not only the file access to the
socker, but also checking whether the label of the client is able to
talk to the label of the server daemon.  So we are protected by a
secondary check.






^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-11-29 22:22                     ` Daniel Walsh
@ 2018-12-03 23:27                       ` Paul Moore
  2018-12-04 14:43                         ` Stephen Smalley
  0 siblings, 1 reply; 47+ messages in thread
From: Paul Moore @ 2018-12-03 23:27 UTC (permalink / raw)
  To: Dan Walsh
  Cc: miklos, Stephen Smalley, vgoyal, omosnace, bfields, salyzyn,
	linux-kernel, linux-unionfs, linux-fsdevel, selinux

On Thu, Nov 29, 2018 at 5:22 PM Daniel Walsh <dwalsh@redhat.com> wrote:
> On 11/29/18 2:47 PM, Miklos Szeredi wrote:
> > On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> >
> >> Possibly I misunderstood you, but I don't think we want to copy-up on
> >> permission denial, as that would still allow the mounter to read/write
> >> special files or execute regular files to which it would normally be
> >> denied access, because the copy would inherit the context specified by
> >> the mounter in the context mount case.  It still represents an
> >> escalation of privilege for the mounter.  In contrast, the copy-up on
> >> write behavior does not allow the mounter to do anything it could not do
> >> already (i.e. read from the lower, write to the upper).
> > Let's get this straight:  when file is copied up, it inherits label
> > from context=, not from label of lower file?
>
> Yes, in the case of context mount, it will get the context mount directory.
>
> In the case of not context mount, it should maintain the label of  the
> lower.
>
> > Next question: permission to change metadata is tied to permission to
> > open?  Is it possible that open is denied, but metadata can be
> > changed?
>
> Yes, SElinux handles open differently then setattr.  Although I am not
> sure if any tools handle this.
>
> > DAC model allows this: metadata change is tied to ownership, not mode
> > bits.   And different capability flag.
> >
> > If the same is true for MAC, then the pre-v4.20-rc1 is already
> > susceptible to the privilege escalation you describe, right?
>
> After talking to Vivek, I am not sure their is a privilege escallation.

More on this below, but this thread doesn't have me convinced, and we
are at -rc5 right now.  We need to come to some decision on this soon
because we are running out of time before v4.20 is released with this
code.

> For device nodes, the mounter has to have the ability to create the
> devicenode with the context mount, if he can do this, then he can do it
> with or without Overlay.  This might lead to users making mistakes on
> security, but the model is sound. And I think this stands even in the
> case of the lower is mounted NODEV and the upper is not.  If the mounter
> can create a device on the upper with a particular label, then he does
> not need the lower.

The problem I have when looking at the current code is that permission
is given, regardless of what is requested, for any special_file() on
an overlayfs mount.

It also looks like the mounter's creds are used when checking
permissions regardless of the file has been copied up or not; I would
expect that the mounter's permissions would only used when checking
permissions against the lower inode, no?  I think there is also some
weird behavior if the underlying inode only allows the mounter to
write (no read) and a write is requested at the overlayfs layer.  I'm
sure I'm missing some subtle thing with overlayfs, but why aren't we
doing something like the following:

  int ovl_permission(...) {

    if (!realinode) {
      ...
    }

    err = generic_permission(inode, mask);
    if (err)
      return err;

    if (upperinode) {
      err = inode_permission(upperinode, mask);
    } else {
      // on the lower inode, always use the mounter's creds
      old_cred = ovl_override_creds(...);

      // check to see if we have the right perms first, if
      // that fails switch to a read/copy-up check if we
      // are doing a write (note: we are not bypassing the
      // exec check, the task can change the metadata like
      // every other fs)
      err = inode_permission(lowerinode, mask);
      if (err && (mask & (MAY_EXEC | MAY_APPEND))) {
        // PM: my guess is that we also need to add a
        // "&& !special_file(lowerinode)" to the conditional
        // above because you can't copy-up a dev node in the
        // normal sense, but we'll leave that as a discussion
        // point for now...
        // turn the write into a read (copy-up)
        mask &= ~(MAY_WRITE | MAY_APPEND);
        mask |= MAY_READ;
        err = inode_permission(lowerinode, mask);
      }

      // reset the creds
      revert_creds(old_cred);
    }

    return err;
  }

> For sockets, I see the case where a process is listening on the lower
> level socket, the mounter mounts the overlay over the directory with the
> socket.  Then the mounter changes the attributes of the socket,
> performing a copy up.  If the mounter can not talk to the socket and the
> other end is still listening, then this could be an issue.  If the
> socket is no longer connected to the listener on the lower, then this is
> not an issue.
>
> Similar for a FIFO.

See my comment "// PM: my guess ..." in the pseudo code above.  I
think the write->read permission mask conversion really should only
apply to normal files where you can do a copy-up.

> With SELinux we are also always checking not only the file access to the
> socker, but also checking whether the label of the client is able to
> talk to the label of the server daemon.  So we are protected by a
> secondary check.

That's making some assumptions on the LSM and the LSM's loaded policy
and is not something I would want to rely on.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-11-29 21:19                       ` Stephen Smalley
@ 2018-12-04 13:32                         ` Miklos Szeredi
  2018-12-04 14:30                           ` Stephen Smalley
  0 siblings, 1 reply; 47+ messages in thread
From: Miklos Szeredi @ 2018-12-04 13:32 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Thu, Nov 29, 2018 at 10:16 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
>
> On 11/29/18 4:03 PM, Stephen Smalley wrote:
> > On 11/29/18 2:47 PM, Miklos Szeredi wrote:
> >> On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov>
> >> wrote:
> >>
> >>> Possibly I misunderstood you, but I don't think we want to copy-up on
> >>> permission denial, as that would still allow the mounter to read/write
> >>> special files or execute regular files to which it would normally be
> >>> denied access, because the copy would inherit the context specified by
> >>> the mounter in the context mount case.  It still represents an
> >>> escalation of privilege for the mounter.  In contrast, the copy-up on
> >>> write behavior does not allow the mounter to do anything it could not do
> >>> already (i.e. read from the lower, write to the upper).
> >>
> >> Let's get this straight:  when file is copied up, it inherits label
> >> from context=, not from label of lower file?
> >
> > That's correct.  The overlay inodes are all assigned the label from the
> > context= mount option, and so are any upper inodes created through the
> > overlay.  At least that's my understanding of how it is supposed to
> > work.  The original use case was for containers with the lower dir
> > labeled with a context that is read-only to the container context and
> > using a context that is writable by the container context for the
> > context= mount.
> >
> >> Next question: permission to change metadata is tied to permission to
> >> open?  Is it possible that open is denied, but metadata can be
> >> changed?
> >
> > There is no metadata change occurring here. The overlay, upper, and
> > lower inodes all keep their labels intact for their lifetime (both
> > overlay and upper always have the context= label; upper has whatever its
>                                                    ^^lower^^
>
> > original label was), unless explicitly relabeled by some process.  And
> > when viewed through the overlay, the file always has the label specified
> > via context=, even before the copy-up.

Okay.

> >
> >> DAC model allows this: metadata change is tied to ownership, not mode
> >> bits.   And different capability flag.
> >>
> >> If the same is true for MAC, then the pre-v4.20-rc1 is already
> >> susceptible to the privilege escalation you describe, right?
> >
> > Actually, I guess there wouldn't be a privilege escalation if you
> > checked the mounter's ability to create the new file upon copy-up, and
> > checked the mounter's access to the upper inode label upon the
> > subsequent read, write, or execute access.  Then we'd typically block
> > the ability to create the device file and we'd block the ability to
> > execute files with the label from context=.
> >
> > But copy-up of special files seems undesirable for other reasons (e.g.
> > requiring mounters to be allowed to create device nodes just to permit
> > client's to read/write them, possible implications for nodev/noexec,
> > implications for socket and fifo files).

I think you missed my point: opening a device file or executing an
executable wouldn't normally require copy-up.   If

 -  permission is granted on overlay to task, and
 -  permission is granted on lower layer to mounter,

then copy-up wouldn't be performed.

My proposed sequence would be

a) check task's creds against overlay inode, fail -> return fail, otherwise:
b) check mounter's creds against lower inode, success -> return
success, otherwise:
c) copy up inode, fail -> return fail, otherwise
d) check mounter's creds against upper inode, return result.

So, unlike write access to regular files, write access to special
files don't necessarily result in copy-up.

You say this is an escalation of privilege, but I don't get it how.
As DWalsh points out downthread, if mounter cannot create device
files, then the copy-up will simply fail.  If mounter can create
device files, then this is not an escalation of privilege for the
mounter.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-04 13:32                         ` Miklos Szeredi
@ 2018-12-04 14:30                           ` Stephen Smalley
  2018-12-04 14:45                             ` Miklos Szeredi
  2018-12-04 15:15                             ` Vivek Goyal
  0 siblings, 2 replies; 47+ messages in thread
From: Stephen Smalley @ 2018-12-04 14:30 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On 12/4/18 8:32 AM, Miklos Szeredi wrote:
> On Thu, Nov 29, 2018 at 10:16 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
>>
>> On 11/29/18 4:03 PM, Stephen Smalley wrote:
>>> On 11/29/18 2:47 PM, Miklos Szeredi wrote:
>>>> On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov>
>>>> wrote:
>>>>
>>>>> Possibly I misunderstood you, but I don't think we want to copy-up on
>>>>> permission denial, as that would still allow the mounter to read/write
>>>>> special files or execute regular files to which it would normally be
>>>>> denied access, because the copy would inherit the context specified by
>>>>> the mounter in the context mount case.  It still represents an
>>>>> escalation of privilege for the mounter.  In contrast, the copy-up on
>>>>> write behavior does not allow the mounter to do anything it could not do
>>>>> already (i.e. read from the lower, write to the upper).
>>>>
>>>> Let's get this straight:  when file is copied up, it inherits label
>>>> from context=, not from label of lower file?
>>>
>>> That's correct.  The overlay inodes are all assigned the label from the
>>> context= mount option, and so are any upper inodes created through the
>>> overlay.  At least that's my understanding of how it is supposed to
>>> work.  The original use case was for containers with the lower dir
>>> labeled with a context that is read-only to the container context and
>>> using a context that is writable by the container context for the
>>> context= mount.
>>>
>>>> Next question: permission to change metadata is tied to permission to
>>>> open?  Is it possible that open is denied, but metadata can be
>>>> changed?
>>>
>>> There is no metadata change occurring here. The overlay, upper, and
>>> lower inodes all keep their labels intact for their lifetime (both
>>> overlay and upper always have the context= label; upper has whatever its
>>                                                     ^^lower^^
>>
>>> original label was), unless explicitly relabeled by some process.  And
>>> when viewed through the overlay, the file always has the label specified
>>> via context=, even before the copy-up.
> 
> Okay.
> 
>>>
>>>> DAC model allows this: metadata change is tied to ownership, not mode
>>>> bits.   And different capability flag.
>>>>
>>>> If the same is true for MAC, then the pre-v4.20-rc1 is already
>>>> susceptible to the privilege escalation you describe, right?
>>>
>>> Actually, I guess there wouldn't be a privilege escalation if you
>>> checked the mounter's ability to create the new file upon copy-up, and
>>> checked the mounter's access to the upper inode label upon the
>>> subsequent read, write, or execute access.  Then we'd typically block
>>> the ability to create the device file and we'd block the ability to
>>> execute files with the label from context=.
>>>
>>> But copy-up of special files seems undesirable for other reasons (e.g.
>>> requiring mounters to be allowed to create device nodes just to permit
>>> client's to read/write them, possible implications for nodev/noexec,
>>> implications for socket and fifo files).
> 
> I think you missed my point: opening a device file or executing an
> executable wouldn't normally require copy-up.   If
> 
>   -  permission is granted on overlay to task, and
>   -  permission is granted on lower layer to mounter,
> 
> then copy-up wouldn't be performed.
> 
> My proposed sequence would be
> 
> a) check task's creds against overlay inode, fail -> return fail, otherwise:
> b) check mounter's creds against lower inode, success -> return
> success, otherwise:
> c) copy up inode, fail -> return fail, otherwise
> d) check mounter's creds against upper inode, return result.
> 
> So, unlike write access to regular files, write access to special
> files don't necessarily result in copy-up.
> 
> You say this is an escalation of privilege, but I don't get it how.
> As DWalsh points out downthread, if mounter cannot create device
> files, then the copy-up will simply fail.  If mounter can create
> device files, then this is not an escalation of privilege for the
> mounter.

Yes, in that case there isn't an escalation of privilege for the mounter 
(I acknowledged that above).  I'm still not sure copy-up of special 
files is a good idea though:

- In the case of device files, there is the potential for mischief by 
the client task in misusing the mounter's privileges to gain access to 
otherwise unusable device node (nodev lower vs upper?),

- In the case of sockets or fifos, what useful result do you get from a 
copy-up? Accessing the copy isn't going to yield the same result as 
accessing the original.

For executables, this copy-up behavior might trigger a lot of undesired 
copies of non-executable files from the lower dir into the upper, even 
if we ultimately end up denying the execute.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-03 23:27                       ` Paul Moore
@ 2018-12-04 14:43                         ` Stephen Smalley
  2018-12-04 23:01                           ` Paul Moore
  0 siblings, 1 reply; 47+ messages in thread
From: Stephen Smalley @ 2018-12-04 14:43 UTC (permalink / raw)
  To: Paul Moore, Dan Walsh
  Cc: miklos, vgoyal, omosnace, bfields, salyzyn, linux-kernel,
	linux-unionfs, linux-fsdevel, selinux

On 12/3/18 6:27 PM, Paul Moore wrote:
> On Thu, Nov 29, 2018 at 5:22 PM Daniel Walsh <dwalsh@redhat.com> wrote:
>> On 11/29/18 2:47 PM, Miklos Szeredi wrote:
>>> On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
>>>
>>>> Possibly I misunderstood you, but I don't think we want to copy-up on
>>>> permission denial, as that would still allow the mounter to read/write
>>>> special files or execute regular files to which it would normally be
>>>> denied access, because the copy would inherit the context specified by
>>>> the mounter in the context mount case.  It still represents an
>>>> escalation of privilege for the mounter.  In contrast, the copy-up on
>>>> write behavior does not allow the mounter to do anything it could not do
>>>> already (i.e. read from the lower, write to the upper).
>>> Let's get this straight:  when file is copied up, it inherits label
>>> from context=, not from label of lower file?
>>
>> Yes, in the case of context mount, it will get the context mount directory.
>>
>> In the case of not context mount, it should maintain the label of  the
>> lower.
>>
>>> Next question: permission to change metadata is tied to permission to
>>> open?  Is it possible that open is denied, but metadata can be
>>> changed?
>>
>> Yes, SElinux handles open differently then setattr.  Although I am not
>> sure if any tools handle this.
>>
>>> DAC model allows this: metadata change is tied to ownership, not mode
>>> bits.   And different capability flag.
>>>
>>> If the same is true for MAC, then the pre-v4.20-rc1 is already
>>> susceptible to the privilege escalation you describe, right?
>>
>> After talking to Vivek, I am not sure their is a privilege escallation.
> 
> More on this below, but this thread doesn't have me convinced, and we
> are at -rc5 right now.  We need to come to some decision on this soon
> because we are running out of time before v4.20 is released with this
> code.
> 
>> For device nodes, the mounter has to have the ability to create the
>> devicenode with the context mount, if he can do this, then he can do it
>> with or without Overlay.  This might lead to users making mistakes on
>> security, but the model is sound. And I think this stands even in the
>> case of the lower is mounted NODEV and the upper is not.  If the mounter
>> can create a device on the upper with a particular label, then he does
>> not need the lower.
> 
> The problem I have when looking at the current code is that permission
> is given, regardless of what is requested, for any special_file() on
> an overlayfs mount.
> 
> It also looks like the mounter's creds are used when checking
> permissions regardless of the file has been copied up or not; I would
> expect that the mounter's permissions would only used when checking
> permissions against the lower inode, no?

No, that's never been the model as far as I know.  mounter's permissions 
are checked to the underlying inode, whether upper or lower.  client's 
permissions are only checked to the overlay inode.  upper and lower are 
logically backing store - upper for writes and lower for reads from 
unmodified files.  Now, in theory, upper should always be labeled the 
same as overlay, so client check against overlay should already imply 
client access to upper, unless someone has manually relabeled upper 
outside of the overlay.

   I think there is also some
> weird behavior if the underlying inode only allows the mounter to
> write (no read) and a write is requested at the overlayfs layer.  I'm
> sure I'm missing some subtle thing with overlayfs, but why aren't we
> doing something like the following:
> 
>    int ovl_permission(...) {
> 
>      if (!realinode) {
>        ...
>      }
> 
>      err = generic_permission(inode, mask);
>      if (err)
>        return err;
> 
>      if (upperinode) {
>        err = inode_permission(upperinode, mask);
>      } else {
>        // on the lower inode, always use the mounter's creds
>        old_cred = ovl_override_creds(...);
> 
>        // check to see if we have the right perms first, if
>        // that fails switch to a read/copy-up check if we
>        // are doing a write (note: we are not bypassing the
>        // exec check, the task can change the metadata like
>        // every other fs)
>        err = inode_permission(lowerinode, mask);
>        if (err && (mask & (MAY_EXEC | MAY_APPEND))) {
>          // PM: my guess is that we also need to add a
>          // "&& !special_file(lowerinode)" to the conditional
>          // above because you can't copy-up a dev node in the
>          // normal sense, but we'll leave that as a discussion
>          // point for now...
>          // turn the write into a read (copy-up)
>          mask &= ~(MAY_WRITE | MAY_APPEND);
>          mask |= MAY_READ;
>          err = inode_permission(lowerinode, mask);
>        }
> 
>        // reset the creds
>        revert_creds(old_cred);
>      }
> 
>      return err;
>    }
> 
>> For sockets, I see the case where a process is listening on the lower
>> level socket, the mounter mounts the overlay over the directory with the
>> socket.  Then the mounter changes the attributes of the socket,
>> performing a copy up.  If the mounter can not talk to the socket and the
>> other end is still listening, then this could be an issue.  If the
>> socket is no longer connected to the listener on the lower, then this is
>> not an issue.
>>
>> Similar for a FIFO.
> 
> See my comment "// PM: my guess ..." in the pseudo code above.  I
> think the write->read permission mask conversion really should only
> apply to normal files where you can do a copy-up.
> 
>> With SELinux we are also always checking not only the file access to the
>> socker, but also checking whether the label of the client is able to
>> talk to the label of the server daemon.  So we are protected by a
>> secondary check.
> 
> That's making some assumptions on the LSM and the LSM's loaded policy
> and is not something I would want to rely on.

If you copy a socket or fifo and try to connect or open the copy, you 
aren't going to get the same result as if you accessed the original. 
Copy-up really makes no sense for those AFAICT.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-04 14:30                           ` Stephen Smalley
@ 2018-12-04 14:45                             ` Miklos Szeredi
  2018-12-04 15:35                               ` Stephen Smalley
  2018-12-04 15:15                             ` Vivek Goyal
  1 sibling, 1 reply; 47+ messages in thread
From: Miklos Szeredi @ 2018-12-04 14:45 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Tue, Dec 4, 2018 at 3:28 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
>
> On 12/4/18 8:32 AM, Miklos Szeredi wrote:

> > My proposed sequence would be
> >
> > a) check task's creds against overlay inode, fail -> return fail, otherwise:
> > b) check mounter's creds against lower inode, success -> return
> > success, otherwise:
> > c) copy up inode, fail -> return fail, otherwise
> > d) check mounter's creds against upper inode, return result.
> >
> > So, unlike write access to regular files, write access to special
> > files don't necessarily result in copy-up.
> >
> > You say this is an escalation of privilege, but I don't get it how.
> > As DWalsh points out downthread, if mounter cannot create device
> > files, then the copy-up will simply fail.  If mounter can create
> > device files, then this is not an escalation of privilege for the
> > mounter.
>
> Yes, in that case there isn't an escalation of privilege for the mounter
> (I acknowledged that above).  I'm still not sure copy-up of special
> files is a good idea though:
>
> - In the case of device files, there is the potential for mischief by
> the client task in misusing the mounter's privileges to gain access to
> otherwise unusable device node (nodev lower vs upper?),

The mount flag "nodev" on lower or upper has no effect on the overlay,
and never had.

> - In the case of sockets or fifos, what useful result do you get from a
> copy-up? Accessing the copy isn't going to yield the same result as
> accessing the original.

This is a misconception.  Overlayfs is a filesystem (that uses other
filesystems as backing store), but it's more a filesystem and less a
vfs hack (and trying to completely get out of the vfs hack business),
and copy up has no effect on I/O being performed on a socket or FIFO:
it's the same object before and after the copy up, and it's a
different object from either the lower or upper nodes.   Same as NFS:
fifo on server is logically different object than fifo having the same
name on any number of clients.

> For executables, this copy-up behavior might trigger a lot of undesired
> copies of non-executable files from the lower dir into the upper, even
> if we ultimately end up denying the execute.

That would only happen if

  - task is allowed exec on overlay
  - mounter is denied exec on lower
  - copy up is allowed
  - mounter is denied exec on upper

In fact in the model where upper object inherits context from overlay
this is pretty much impossible, AFAICT.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-04 14:30                           ` Stephen Smalley
  2018-12-04 14:45                             ` Miklos Szeredi
@ 2018-12-04 15:15                             ` Vivek Goyal
  2018-12-04 15:22                               ` Vivek Goyal
  2018-12-04 15:42                               ` Stephen Smalley
  1 sibling, 2 replies; 47+ messages in thread
From: Vivek Goyal @ 2018-12-04 15:15 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Tue, Dec 04, 2018 at 09:30:53AM -0500, Stephen Smalley wrote:
> On 12/4/18 8:32 AM, Miklos Szeredi wrote:
> > On Thu, Nov 29, 2018 at 10:16 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> > > 
> > > On 11/29/18 4:03 PM, Stephen Smalley wrote:
> > > > On 11/29/18 2:47 PM, Miklos Szeredi wrote:
> > > > > On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov>
> > > > > wrote:
> > > > > 
> > > > > > Possibly I misunderstood you, but I don't think we want to copy-up on
> > > > > > permission denial, as that would still allow the mounter to read/write
> > > > > > special files or execute regular files to which it would normally be
> > > > > > denied access, because the copy would inherit the context specified by
> > > > > > the mounter in the context mount case.  It still represents an
> > > > > > escalation of privilege for the mounter.  In contrast, the copy-up on
> > > > > > write behavior does not allow the mounter to do anything it could not do
> > > > > > already (i.e. read from the lower, write to the upper).
> > > > > 
> > > > > Let's get this straight:  when file is copied up, it inherits label
> > > > > from context=, not from label of lower file?
> > > > 
> > > > That's correct.  The overlay inodes are all assigned the label from the
> > > > context= mount option, and so are any upper inodes created through the
> > > > overlay.  At least that's my understanding of how it is supposed to
> > > > work.  The original use case was for containers with the lower dir
> > > > labeled with a context that is read-only to the container context and
> > > > using a context that is writable by the container context for the
> > > > context= mount.
> > > > 
> > > > > Next question: permission to change metadata is tied to permission to
> > > > > open?  Is it possible that open is denied, but metadata can be
> > > > > changed?
> > > > 
> > > > There is no metadata change occurring here. The overlay, upper, and
> > > > lower inodes all keep their labels intact for their lifetime (both
> > > > overlay and upper always have the context= label; upper has whatever its
> > >                                                     ^^lower^^
> > > 
> > > > original label was), unless explicitly relabeled by some process.  And
> > > > when viewed through the overlay, the file always has the label specified
> > > > via context=, even before the copy-up.
> > 
> > Okay.
> > 
> > > > 
> > > > > DAC model allows this: metadata change is tied to ownership, not mode
> > > > > bits.   And different capability flag.
> > > > > 
> > > > > If the same is true for MAC, then the pre-v4.20-rc1 is already
> > > > > susceptible to the privilege escalation you describe, right?
> > > > 
> > > > Actually, I guess there wouldn't be a privilege escalation if you
> > > > checked the mounter's ability to create the new file upon copy-up, and
> > > > checked the mounter's access to the upper inode label upon the
> > > > subsequent read, write, or execute access.  Then we'd typically block
> > > > the ability to create the device file and we'd block the ability to
> > > > execute files with the label from context=.
> > > > 
> > > > But copy-up of special files seems undesirable for other reasons (e.g.
> > > > requiring mounters to be allowed to create device nodes just to permit
> > > > client's to read/write them, possible implications for nodev/noexec,
> > > > implications for socket and fifo files).
> > 
> > I think you missed my point: opening a device file or executing an
> > executable wouldn't normally require copy-up.   If
> > 
> >   -  permission is granted on overlay to task, and
> >   -  permission is granted on lower layer to mounter,
> > 
> > then copy-up wouldn't be performed.
> > 
> > My proposed sequence would be
> > 
> > a) check task's creds against overlay inode, fail -> return fail, otherwise:
> > b) check mounter's creds against lower inode, success -> return
> > success, otherwise:
> > c) copy up inode, fail -> return fail, otherwise
> > d) check mounter's creds against upper inode, return result.
> > 
> > So, unlike write access to regular files, write access to special
> > files don't necessarily result in copy-up.
> > 
> > You say this is an escalation of privilege, but I don't get it how.
> > As DWalsh points out downthread, if mounter cannot create device
> > files, then the copy-up will simply fail.  If mounter can create
> > device files, then this is not an escalation of privilege for the
> > mounter.
> 
> Yes, in that case there isn't an escalation of privilege for the mounter (I
> acknowledged that above).  I'm still not sure copy-up of special files is a
> good idea though:
> 
> - In the case of device files, there is the potential for mischief by the
> client task in misusing the mounter's privileges to gain access to otherwise
> unusable device node (nodev lower vs upper?),

I was thinking about it as well. But client can always bypass permissions
of lower device inode by first removing device file and then by doing
a mknod. And that will be equivalent of copy up. IOW, IIUC, we do not deny
mknod to client and that always creates a way for it to write to device
file (and it does not matter what are permissions on lower?)

> - In the case of sockets or fifos, what useful result do you get from a
> copy-up? Accessing the copy isn't going to yield the same result as
> accessing the original.

sockets and fifos use overlay inode number (and not lower/upper inode
number). So even if we create a copy things continue to work. Copying
up will make sense upon owner or permission change to determine who
can open socket/fifo in future?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-04 15:15                             ` Vivek Goyal
@ 2018-12-04 15:22                               ` Vivek Goyal
  2018-12-04 15:31                                 ` Miklos Szeredi
  2018-12-04 15:42                               ` Stephen Smalley
  1 sibling, 1 reply; 47+ messages in thread
From: Vivek Goyal @ 2018-12-04 15:22 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Tue, Dec 04, 2018 at 10:15:49AM -0500, Vivek Goyal wrote:
> On Tue, Dec 04, 2018 at 09:30:53AM -0500, Stephen Smalley wrote:
> > On 12/4/18 8:32 AM, Miklos Szeredi wrote:
> > > On Thu, Nov 29, 2018 at 10:16 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> > > > 
> > > > On 11/29/18 4:03 PM, Stephen Smalley wrote:
> > > > > On 11/29/18 2:47 PM, Miklos Szeredi wrote:
> > > > > > On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov>
> > > > > > wrote:
> > > > > > 
> > > > > > > Possibly I misunderstood you, but I don't think we want to copy-up on
> > > > > > > permission denial, as that would still allow the mounter to read/write
> > > > > > > special files or execute regular files to which it would normally be
> > > > > > > denied access, because the copy would inherit the context specified by
> > > > > > > the mounter in the context mount case.  It still represents an
> > > > > > > escalation of privilege for the mounter.  In contrast, the copy-up on
> > > > > > > write behavior does not allow the mounter to do anything it could not do
> > > > > > > already (i.e. read from the lower, write to the upper).
> > > > > > 
> > > > > > Let's get this straight:  when file is copied up, it inherits label
> > > > > > from context=, not from label of lower file?
> > > > > 
> > > > > That's correct.  The overlay inodes are all assigned the label from the
> > > > > context= mount option, and so are any upper inodes created through the
> > > > > overlay.  At least that's my understanding of how it is supposed to
> > > > > work.  The original use case was for containers with the lower dir
> > > > > labeled with a context that is read-only to the container context and
> > > > > using a context that is writable by the container context for the
> > > > > context= mount.
> > > > > 
> > > > > > Next question: permission to change metadata is tied to permission to
> > > > > > open?  Is it possible that open is denied, but metadata can be
> > > > > > changed?
> > > > > 
> > > > > There is no metadata change occurring here. The overlay, upper, and
> > > > > lower inodes all keep their labels intact for their lifetime (both
> > > > > overlay and upper always have the context= label; upper has whatever its
> > > >                                                     ^^lower^^
> > > > 
> > > > > original label was), unless explicitly relabeled by some process.  And
> > > > > when viewed through the overlay, the file always has the label specified
> > > > > via context=, even before the copy-up.
> > > 
> > > Okay.
> > > 
> > > > > 
> > > > > > DAC model allows this: metadata change is tied to ownership, not mode
> > > > > > bits.   And different capability flag.
> > > > > > 
> > > > > > If the same is true for MAC, then the pre-v4.20-rc1 is already
> > > > > > susceptible to the privilege escalation you describe, right?
> > > > > 
> > > > > Actually, I guess there wouldn't be a privilege escalation if you
> > > > > checked the mounter's ability to create the new file upon copy-up, and
> > > > > checked the mounter's access to the upper inode label upon the
> > > > > subsequent read, write, or execute access.  Then we'd typically block
> > > > > the ability to create the device file and we'd block the ability to
> > > > > execute files with the label from context=.
> > > > > 
> > > > > But copy-up of special files seems undesirable for other reasons (e.g.
> > > > > requiring mounters to be allowed to create device nodes just to permit
> > > > > client's to read/write them, possible implications for nodev/noexec,
> > > > > implications for socket and fifo files).
> > > 
> > > I think you missed my point: opening a device file or executing an
> > > executable wouldn't normally require copy-up.   If
> > > 
> > >   -  permission is granted on overlay to task, and
> > >   -  permission is granted on lower layer to mounter,
> > > 
> > > then copy-up wouldn't be performed.
> > > 
> > > My proposed sequence would be
> > > 
> > > a) check task's creds against overlay inode, fail -> return fail, otherwise:
> > > b) check mounter's creds against lower inode, success -> return
> > > success, otherwise:
> > > c) copy up inode, fail -> return fail, otherwise
> > > d) check mounter's creds against upper inode, return result.
> > > 
> > > So, unlike write access to regular files, write access to special
> > > files don't necessarily result in copy-up.
> > > 
> > > You say this is an escalation of privilege, but I don't get it how.
> > > As DWalsh points out downthread, if mounter cannot create device
> > > files, then the copy-up will simply fail.  If mounter can create
> > > device files, then this is not an escalation of privilege for the
> > > mounter.
> > 
> > Yes, in that case there isn't an escalation of privilege for the mounter (I
> > acknowledged that above).  I'm still not sure copy-up of special files is a
> > good idea though:
> > 
> > - In the case of device files, there is the potential for mischief by the
> > client task in misusing the mounter's privileges to gain access to otherwise
> > unusable device node (nodev lower vs upper?),
> 
> I was thinking about it as well. But client can always bypass permissions
> of lower device inode by first removing device file and then by doing
> a mknod. And that will be equivalent of copy up. IOW, IIUC, we do not deny
> mknod to client and that always creates a way for it to write to device
> file (and it does not matter what are permissions on lower?)

Having said that, this still create little anomaly when mknod to client
is not allowed on context label. So a device file, which is on lower
and client can not open it for read/write on host, it can now be opened
for read/write because mounter will allow access. So why it is different
that regular copy up. Well, in regular copy up, we created a copy of
the original object and allowed writing to that object (cp --preserve=all)
model. But in case of device file, writes will go to same original
object. (And not a separate copy).

Thanks
Vivek

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-04 15:22                               ` Vivek Goyal
@ 2018-12-04 15:31                                 ` Miklos Szeredi
  2018-12-04 15:42                                   ` Vivek Goyal
  0 siblings, 1 reply; 47+ messages in thread
From: Miklos Szeredi @ 2018-12-04 15:31 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Stephen Smalley, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote:

> Having said that, this still create little anomaly when mknod to client
> is not allowed on context label. So a device file, which is on lower
> and client can not open it for read/write on host, it can now be opened
> for read/write because mounter will allow access. So why it is different
> that regular copy up. Well, in regular copy up, we created a copy of
> the original object and allowed writing to that object (cp --preserve=all)
> model. But in case of device file, writes will go to same original
> object. (And not a separate copy).

That's true.

In that sense copy up of special file should result in upper having
the same label as of lower, right?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-04 14:45                             ` Miklos Szeredi
@ 2018-12-04 15:35                               ` Stephen Smalley
  2018-12-04 15:39                                 ` Miklos Szeredi
  0 siblings, 1 reply; 47+ messages in thread
From: Stephen Smalley @ 2018-12-04 15:35 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On 12/4/18 9:45 AM, Miklos Szeredi wrote:
> On Tue, Dec 4, 2018 at 3:28 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
>>
>> On 12/4/18 8:32 AM, Miklos Szeredi wrote:
> 
>>> My proposed sequence would be
>>>
>>> a) check task's creds against overlay inode, fail -> return fail, otherwise:
>>> b) check mounter's creds against lower inode, success -> return
>>> success, otherwise:
>>> c) copy up inode, fail -> return fail, otherwise
>>> d) check mounter's creds against upper inode, return result.
>>>
>>> So, unlike write access to regular files, write access to special
>>> files don't necessarily result in copy-up.
>>>
>>> You say this is an escalation of privilege, but I don't get it how.
>>> As DWalsh points out downthread, if mounter cannot create device
>>> files, then the copy-up will simply fail.  If mounter can create
>>> device files, then this is not an escalation of privilege for the
>>> mounter.
>>
>> Yes, in that case there isn't an escalation of privilege for the mounter
>> (I acknowledged that above).  I'm still not sure copy-up of special
>> files is a good idea though:
>>
>> - In the case of device files, there is the potential for mischief by
>> the client task in misusing the mounter's privileges to gain access to
>> otherwise unusable device node (nodev lower vs upper?),
> 
> The mount flag "nodev" on lower or upper has no effect on the overlay,
> and never had.
> 
>> - In the case of sockets or fifos, what useful result do you get from a
>> copy-up? Accessing the copy isn't going to yield the same result as
>> accessing the original.
> 
> This is a misconception.  Overlayfs is a filesystem (that uses other
> filesystems as backing store), but it's more a filesystem and less a
> vfs hack (and trying to completely get out of the vfs hack business),
> and copy up has no effect on I/O being performed on a socket or FIFO:
> it's the same object before and after the copy up, and it's a
> different object from either the lower or upper nodes.   Same as NFS:
> fifo on server is logically different object than fifo having the same
> name on any number of clients.
> 
>> For executables, this copy-up behavior might trigger a lot of undesired
>> copies of non-executable files from the lower dir into the upper, even
>> if we ultimately end up denying the execute.
> 
> That would only happen if
> 
>    - task is allowed exec on overlay
>    - mounter is denied exec on lower
>    - copy up is allowed
>    - mounter is denied exec on upper
> 
> In fact in the model where upper object inherits context from overlay
> this is pretty much impossible, AFAICT.

I think the above is the situation in Android (mounter is denied exec to 
lower and upper to prevent unauthorized code execution, but is allowed 
to copy-up in order to support writes by the client).  However, since 
they need override_creds=off or similar anyway, I guess it doesn't matter.

Ok, I concede the point.  Not sure what that means though for v4.20.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-04 15:35                               ` Stephen Smalley
@ 2018-12-04 15:39                                 ` Miklos Szeredi
  2018-12-11 15:50                                   ` Paul Moore
  0 siblings, 1 reply; 47+ messages in thread
From: Miklos Szeredi @ 2018-12-04 15:39 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Vivek Goyal, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Tue, Dec 4, 2018 at 4:32 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:

> Ok, I concede the point.  Not sure what that means though for v4.20.

I have the revert queued up for v4.20 as that's the safest.

Don't let that stop the discussion, though, I'd especially like to
hear the arguments from the Android side.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-04 15:15                             ` Vivek Goyal
  2018-12-04 15:22                               ` Vivek Goyal
@ 2018-12-04 15:42                               ` Stephen Smalley
  2018-12-04 16:15                                 ` Vivek Goyal
  1 sibling, 1 reply; 47+ messages in thread
From: Stephen Smalley @ 2018-12-04 15:42 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On 12/4/18 10:15 AM, Vivek Goyal wrote:
> On Tue, Dec 04, 2018 at 09:30:53AM -0500, Stephen Smalley wrote:
>> On 12/4/18 8:32 AM, Miklos Szeredi wrote:
>>> On Thu, Nov 29, 2018 at 10:16 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
>>>>
>>>> On 11/29/18 4:03 PM, Stephen Smalley wrote:
>>>>> On 11/29/18 2:47 PM, Miklos Szeredi wrote:
>>>>>> On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov>
>>>>>> wrote:
>>>>>>
>>>>>>> Possibly I misunderstood you, but I don't think we want to copy-up on
>>>>>>> permission denial, as that would still allow the mounter to read/write
>>>>>>> special files or execute regular files to which it would normally be
>>>>>>> denied access, because the copy would inherit the context specified by
>>>>>>> the mounter in the context mount case.  It still represents an
>>>>>>> escalation of privilege for the mounter.  In contrast, the copy-up on
>>>>>>> write behavior does not allow the mounter to do anything it could not do
>>>>>>> already (i.e. read from the lower, write to the upper).
>>>>>>
>>>>>> Let's get this straight:  when file is copied up, it inherits label
>>>>>> from context=, not from label of lower file?
>>>>>
>>>>> That's correct.  The overlay inodes are all assigned the label from the
>>>>> context= mount option, and so are any upper inodes created through the
>>>>> overlay.  At least that's my understanding of how it is supposed to
>>>>> work.  The original use case was for containers with the lower dir
>>>>> labeled with a context that is read-only to the container context and
>>>>> using a context that is writable by the container context for the
>>>>> context= mount.
>>>>>
>>>>>> Next question: permission to change metadata is tied to permission to
>>>>>> open?  Is it possible that open is denied, but metadata can be
>>>>>> changed?
>>>>>
>>>>> There is no metadata change occurring here. The overlay, upper, and
>>>>> lower inodes all keep their labels intact for their lifetime (both
>>>>> overlay and upper always have the context= label; upper has whatever its
>>>>                                                      ^^lower^^
>>>>
>>>>> original label was), unless explicitly relabeled by some process.  And
>>>>> when viewed through the overlay, the file always has the label specified
>>>>> via context=, even before the copy-up.
>>>
>>> Okay.
>>>
>>>>>
>>>>>> DAC model allows this: metadata change is tied to ownership, not mode
>>>>>> bits.   And different capability flag.
>>>>>>
>>>>>> If the same is true for MAC, then the pre-v4.20-rc1 is already
>>>>>> susceptible to the privilege escalation you describe, right?
>>>>>
>>>>> Actually, I guess there wouldn't be a privilege escalation if you
>>>>> checked the mounter's ability to create the new file upon copy-up, and
>>>>> checked the mounter's access to the upper inode label upon the
>>>>> subsequent read, write, or execute access.  Then we'd typically block
>>>>> the ability to create the device file and we'd block the ability to
>>>>> execute files with the label from context=.
>>>>>
>>>>> But copy-up of special files seems undesirable for other reasons (e.g.
>>>>> requiring mounters to be allowed to create device nodes just to permit
>>>>> client's to read/write them, possible implications for nodev/noexec,
>>>>> implications for socket and fifo files).
>>>
>>> I think you missed my point: opening a device file or executing an
>>> executable wouldn't normally require copy-up.   If
>>>
>>>    -  permission is granted on overlay to task, and
>>>    -  permission is granted on lower layer to mounter,
>>>
>>> then copy-up wouldn't be performed.
>>>
>>> My proposed sequence would be
>>>
>>> a) check task's creds against overlay inode, fail -> return fail, otherwise:
>>> b) check mounter's creds against lower inode, success -> return
>>> success, otherwise:
>>> c) copy up inode, fail -> return fail, otherwise
>>> d) check mounter's creds against upper inode, return result.
>>>
>>> So, unlike write access to regular files, write access to special
>>> files don't necessarily result in copy-up.
>>>
>>> You say this is an escalation of privilege, but I don't get it how.
>>> As DWalsh points out downthread, if mounter cannot create device
>>> files, then the copy-up will simply fail.  If mounter can create
>>> device files, then this is not an escalation of privilege for the
>>> mounter.
>>
>> Yes, in that case there isn't an escalation of privilege for the mounter (I
>> acknowledged that above).  I'm still not sure copy-up of special files is a
>> good idea though:
>>
>> - In the case of device files, there is the potential for mischief by the
>> client task in misusing the mounter's privileges to gain access to otherwise
>> unusable device node (nodev lower vs upper?),
> 
> I was thinking about it as well. But client can always bypass permissions
> of lower device inode by first removing device file and then by doing
> a mknod. And that will be equivalent of copy up. IOW, IIUC, we do not deny
> mknod to client and that always creates a way for it to write to device
> file (and it does not matter what are permissions on lower?)

I'm not sure what you mean by "we do not deny mknod to client".  Depends 
on your policy and what context the client is running in.  Plenty of 
situations where the client is not allowed to create device files 
directly, and the mounter is.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-04 15:31                                 ` Miklos Szeredi
@ 2018-12-04 15:42                                   ` Vivek Goyal
  2018-12-04 16:05                                     ` Stephen Smalley
  0 siblings, 1 reply; 47+ messages in thread
From: Vivek Goyal @ 2018-12-04 15:42 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Stephen Smalley, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote:
> On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > Having said that, this still create little anomaly when mknod to client
> > is not allowed on context label. So a device file, which is on lower
> > and client can not open it for read/write on host, it can now be opened
> > for read/write because mounter will allow access. So why it is different
> > that regular copy up. Well, in regular copy up, we created a copy of
> > the original object and allowed writing to that object (cp --preserve=all)
> > model. But in case of device file, writes will go to same original
> > object. (And not a separate copy).
> 
> That's true.
> 
> In that sense copy up of special file should result in upper having
> the same label as of lower, right?

I guess that might be reasonable (if this behavior is a concern). So even
after copy up, client will not be able to read/write a device if it was
not allowed on lower. 

Stephen, what do you think about retaining label of lower for device
files during copy up. What about socket/fifo.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-04 15:42                                   ` Vivek Goyal
@ 2018-12-04 16:05                                     ` Stephen Smalley
  2018-12-04 16:17                                       ` Vivek Goyal
  0 siblings, 1 reply; 47+ messages in thread
From: Stephen Smalley @ 2018-12-04 16:05 UTC (permalink / raw)
  To: Vivek Goyal, Miklos Szeredi
  Cc: Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn, Paul Moore,
	linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh

On 12/4/18 10:42 AM, Vivek Goyal wrote:
> On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote:
>> On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>>
>>> Having said that, this still create little anomaly when mknod to client
>>> is not allowed on context label. So a device file, which is on lower
>>> and client can not open it for read/write on host, it can now be opened
>>> for read/write because mounter will allow access. So why it is different
>>> that regular copy up. Well, in regular copy up, we created a copy of
>>> the original object and allowed writing to that object (cp --preserve=all)
>>> model. But in case of device file, writes will go to same original
>>> object. (And not a separate copy).
>>
>> That's true.
>>
>> In that sense copy up of special file should result in upper having
>> the same label as of lower, right?
> 
> I guess that might be reasonable (if this behavior is a concern). So even
> after copy up, client will not be able to read/write a device if it was
> not allowed on lower.
> 
> Stephen, what do you think about retaining label of lower for device
> files during copy up. What about socket/fifo.

We don't check client task access to the upper inode label, only to the 
overlay, right?  So the client is still free to access the device 
through the overlay even if we preserve the lower inode label on the 
upper inode?  What do we gain?




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-04 15:42                               ` Stephen Smalley
@ 2018-12-04 16:15                                 ` Vivek Goyal
  0 siblings, 0 replies; 47+ messages in thread
From: Vivek Goyal @ 2018-12-04 16:15 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Tue, Dec 04, 2018 at 10:42:35AM -0500, Stephen Smalley wrote:

[..]
> > > Yes, in that case there isn't an escalation of privilege for the mounter (I
> > > acknowledged that above).  I'm still not sure copy-up of special files is a
> > > good idea though:
> > > 
> > > - In the case of device files, there is the potential for mischief by the
> > > client task in misusing the mounter's privileges to gain access to otherwise
> > > unusable device node (nodev lower vs upper?),
> > 
> > I was thinking about it as well. But client can always bypass permissions
> > of lower device inode by first removing device file and then by doing
> > a mknod. And that will be equivalent of copy up. IOW, IIUC, we do not deny
> > mknod to client and that always creates a way for it to write to device
> > file (and it does not matter what are permissions on lower?)
> 
> I'm not sure what you mean by "we do not deny mknod to client".  Depends on
> your policy and what context the client is running in.  Plenty of situations
> where the client is not allowed to create device files directly, and the
> mounter is.
> 

Oh, I meant for the cases where policy allows client to mknod with context
label. In that case, it does not matter what's the label on lower. Client
can always bypass it by first removing device node and creating a new
device node.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-04 16:05                                     ` Stephen Smalley
@ 2018-12-04 16:17                                       ` Vivek Goyal
  2018-12-04 16:49                                         ` Stephen Smalley
  0 siblings, 1 reply; 47+ messages in thread
From: Vivek Goyal @ 2018-12-04 16:17 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Tue, Dec 04, 2018 at 11:05:46AM -0500, Stephen Smalley wrote:
> On 12/4/18 10:42 AM, Vivek Goyal wrote:
> > On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote:
> > > On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > 
> > > > Having said that, this still create little anomaly when mknod to client
> > > > is not allowed on context label. So a device file, which is on lower
> > > > and client can not open it for read/write on host, it can now be opened
> > > > for read/write because mounter will allow access. So why it is different
> > > > that regular copy up. Well, in regular copy up, we created a copy of
> > > > the original object and allowed writing to that object (cp --preserve=all)
> > > > model. But in case of device file, writes will go to same original
> > > > object. (And not a separate copy).
> > > 
> > > That's true.
> > > 
> > > In that sense copy up of special file should result in upper having
> > > the same label as of lower, right?
> > 
> > I guess that might be reasonable (if this behavior is a concern). So even
> > after copy up, client will not be able to read/write a device if it was
> > not allowed on lower.
> > 
> > Stephen, what do you think about retaining label of lower for device
> > files during copy up. What about socket/fifo.
> 
> We don't check client task access to the upper inode label, only to the
> overlay, right?  So the client is still free to access the device through
> the overlay even if we preserve the lower inode label on the upper inode?
> What do we gain?

That's only with latest code and Miklos said he will revert it for 4.20.

IOW, I am assuming that we will continue to check access to a file
on upper in the context of mounter. Otherwise, client will be able to access
files on upper/ which even mounter can't access.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-04 16:17                                       ` Vivek Goyal
@ 2018-12-04 16:49                                         ` Stephen Smalley
  2018-12-05 13:43                                           ` Vivek Goyal
  0 siblings, 1 reply; 47+ messages in thread
From: Stephen Smalley @ 2018-12-04 16:49 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On 12/4/18 11:17 AM, Vivek Goyal wrote:
> On Tue, Dec 04, 2018 at 11:05:46AM -0500, Stephen Smalley wrote:
>> On 12/4/18 10:42 AM, Vivek Goyal wrote:
>>> On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote:
>>>> On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>>>>
>>>>> Having said that, this still create little anomaly when mknod to client
>>>>> is not allowed on context label. So a device file, which is on lower
>>>>> and client can not open it for read/write on host, it can now be opened
>>>>> for read/write because mounter will allow access. So why it is different
>>>>> that regular copy up. Well, in regular copy up, we created a copy of
>>>>> the original object and allowed writing to that object (cp --preserve=all)
>>>>> model. But in case of device file, writes will go to same original
>>>>> object. (And not a separate copy).
>>>>
>>>> That's true.
>>>>
>>>> In that sense copy up of special file should result in upper having
>>>> the same label as of lower, right?
>>>
>>> I guess that might be reasonable (if this behavior is a concern). So even
>>> after copy up, client will not be able to read/write a device if it was
>>> not allowed on lower.
>>>
>>> Stephen, what do you think about retaining label of lower for device
>>> files during copy up. What about socket/fifo.
>>
>> We don't check client task access to the upper inode label, only to the
>> overlay, right?  So the client is still free to access the device through
>> the overlay even if we preserve the lower inode label on the upper inode?
>> What do we gain?
> 
> That's only with latest code and Miklos said he will revert it for 4.20.
> 
> IOW, I am assuming that we will continue to check access to a file
> on upper in the context of mounter. Otherwise, client will be able to access
> files on upper/ which even mounter can't access.

I was assuming we're talking about the proposed solution, where we check 
client access to the overlay (unchanged), mounter access to lower 
(unchanged), copy-up if denied (new), mounter access to upper (new in 
the sense that previously we didn't copy-up on denials).

In that situation, propagating the lower inode label to the upper inode 
only impacts the mounter checks, and in that case makes copy-up 
pointless - if it wasn't allowed to lower it won't be allowed to upper. 
  If it is allowed, then client task is free to access the device 
regardless as long as it has permissions to the overlay inode.  So I 
don't see what we gain by propagating the lower inode label to the upper 
inode in the context mount case, and it creates an inconsistency between 
special files and regular ones.

Did I miss something?




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-04 14:43                         ` Stephen Smalley
@ 2018-12-04 23:01                           ` Paul Moore
  0 siblings, 0 replies; 47+ messages in thread
From: Paul Moore @ 2018-12-04 23:01 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Dan Walsh, miklos, vgoyal, omosnace, bfields, salyzyn,
	linux-kernel, linux-unionfs, linux-fsdevel, selinux

On Tue, Dec 4, 2018 at 9:40 AM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> On 12/3/18 6:27 PM, Paul Moore wrote:
> > On Thu, Nov 29, 2018 at 5:22 PM Daniel Walsh <dwalsh@redhat.com> wrote:
> >> On 11/29/18 2:47 PM, Miklos Szeredi wrote:
> >>> On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> >>>
> >>>> Possibly I misunderstood you, but I don't think we want to copy-up on
> >>>> permission denial, as that would still allow the mounter to read/write
> >>>> special files or execute regular files to which it would normally be
> >>>> denied access, because the copy would inherit the context specified by
> >>>> the mounter in the context mount case.  It still represents an
> >>>> escalation of privilege for the mounter.  In contrast, the copy-up on
> >>>> write behavior does not allow the mounter to do anything it could not do
> >>>> already (i.e. read from the lower, write to the upper).
> >>> Let's get this straight:  when file is copied up, it inherits label
> >>> from context=, not from label of lower file?
> >>
> >> Yes, in the case of context mount, it will get the context mount directory.
> >>
> >> In the case of not context mount, it should maintain the label of  the
> >> lower.
> >>
> >>> Next question: permission to change metadata is tied to permission to
> >>> open?  Is it possible that open is denied, but metadata can be
> >>> changed?
> >>
> >> Yes, SElinux handles open differently then setattr.  Although I am not
> >> sure if any tools handle this.
> >>
> >>> DAC model allows this: metadata change is tied to ownership, not mode
> >>> bits.   And different capability flag.
> >>>
> >>> If the same is true for MAC, then the pre-v4.20-rc1 is already
> >>> susceptible to the privilege escalation you describe, right?
> >>
> >> After talking to Vivek, I am not sure their is a privilege escallation.
> >
> > More on this below, but this thread doesn't have me convinced, and we
> > are at -rc5 right now.  We need to come to some decision on this soon
> > because we are running out of time before v4.20 is released with this
> > code.
> >
> >> For device nodes, the mounter has to have the ability to create the
> >> devicenode with the context mount, if he can do this, then he can do it
> >> with or without Overlay.  This might lead to users making mistakes on
> >> security, but the model is sound. And I think this stands even in the
> >> case of the lower is mounted NODEV and the upper is not.  If the mounter
> >> can create a device on the upper with a particular label, then he does
> >> not need the lower.
> >
> > The problem I have when looking at the current code is that permission
> > is given, regardless of what is requested, for any special_file() on
> > an overlayfs mount.
> >
> > It also looks like the mounter's creds are used when checking
> > permissions regardless of the file has been copied up or not; I would
> > expect that the mounter's permissions would only used when checking
> > permissions against the lower inode, no?
>
> No, that's never been the model as far as I know.  mounter's permissions
> are checked to the underlying inode, whether upper or lower.  client's
> permissions are only checked to the overlay inode.  upper and lower are
> logically backing store - upper for writes and lower for reads from
> unmodified files.  Now, in theory, upper should always be labeled the
> same as overlay, so client check against overlay should already imply
> client access to upper, unless someone has manually relabeled upper
> outside of the overlay.

Okay, thanks for the clarification on the model.  This seems a little
odd at first, but I'm trying to convince myself that it makes sense.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-04 16:49                                         ` Stephen Smalley
@ 2018-12-05 13:43                                           ` Vivek Goyal
  2018-12-06 20:26                                             ` Stephen Smalley
  0 siblings, 1 reply; 47+ messages in thread
From: Vivek Goyal @ 2018-12-05 13:43 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Tue, Dec 04, 2018 at 11:49:16AM -0500, Stephen Smalley wrote:
> On 12/4/18 11:17 AM, Vivek Goyal wrote:
> > On Tue, Dec 04, 2018 at 11:05:46AM -0500, Stephen Smalley wrote:
> > > On 12/4/18 10:42 AM, Vivek Goyal wrote:
> > > > On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote:
> > > > > On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > > 
> > > > > > Having said that, this still create little anomaly when mknod to client
> > > > > > is not allowed on context label. So a device file, which is on lower
> > > > > > and client can not open it for read/write on host, it can now be opened
> > > > > > for read/write because mounter will allow access. So why it is different
> > > > > > that regular copy up. Well, in regular copy up, we created a copy of
> > > > > > the original object and allowed writing to that object (cp --preserve=all)
> > > > > > model. But in case of device file, writes will go to same original
> > > > > > object. (And not a separate copy).
> > > > > 
> > > > > That's true.
> > > > > 
> > > > > In that sense copy up of special file should result in upper having
> > > > > the same label as of lower, right?
> > > > 
> > > > I guess that might be reasonable (if this behavior is a concern). So even
> > > > after copy up, client will not be able to read/write a device if it was
> > > > not allowed on lower.
> > > > 
> > > > Stephen, what do you think about retaining label of lower for device
> > > > files during copy up. What about socket/fifo.
> > > 
> > > We don't check client task access to the upper inode label, only to the
> > > overlay, right?  So the client is still free to access the device through
> > > the overlay even if we preserve the lower inode label on the upper inode?
> > > What do we gain?
> > 
> > That's only with latest code and Miklos said he will revert it for 4.20.
> > 
> > IOW, I am assuming that we will continue to check access to a file
> > on upper in the context of mounter. Otherwise, client will be able to access
> > files on upper/ which even mounter can't access.
> 
> I was assuming we're talking about the proposed solution, where we check
> client access to the overlay (unchanged), mounter access to lower
> (unchanged), copy-up if denied (new), mounter access to upper (new in the
> sense that previously we didn't copy-up on denials).
> 
> In that situation, propagating the lower inode label to the upper inode only
> impacts the mounter checks, and in that case makes copy-up pointless - if it
> wasn't allowed to lower it won't be allowed to upper.  If it is allowed,
> then client task is free to access the device regardless as long as it has
> permissions to the overlay inode.  So I don't see what we gain by
> propagating the lower inode label to the upper inode in the context mount
> case, and it creates an inconsistency between special files and regular
> ones.

If we agree on retaining lower label of lower device file on copy up, then
I am assuming we will change rule c) to copy up only non device files.
(because if you don't have access on lower, you will not have access
even after copy up).

There are other paths where copy up happnes. Like link or when file 
metadata (ownership, permissions, timestmap) changes. In those cases,
if we retain the lower label over copy up, it probably will help.

IOW, just by creating a link to a device, one will not get access to
a device on upper which could not be accessed on lower.

Device files are special anyway. In regular files we are creating a 
copy and user writes to copy. But that's not the case with device
files. So I guess these will have to be treated differently.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-05 13:43                                           ` Vivek Goyal
@ 2018-12-06 20:26                                             ` Stephen Smalley
  2018-12-11 21:48                                               ` Vivek Goyal
  0 siblings, 1 reply; 47+ messages in thread
From: Stephen Smalley @ 2018-12-06 20:26 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On 12/5/18 8:43 AM, Vivek Goyal wrote:
> On Tue, Dec 04, 2018 at 11:49:16AM -0500, Stephen Smalley wrote:
>> On 12/4/18 11:17 AM, Vivek Goyal wrote:
>>> On Tue, Dec 04, 2018 at 11:05:46AM -0500, Stephen Smalley wrote:
>>>> On 12/4/18 10:42 AM, Vivek Goyal wrote:
>>>>> On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote:
>>>>>> On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>>>>>>
>>>>>>> Having said that, this still create little anomaly when mknod to client
>>>>>>> is not allowed on context label. So a device file, which is on lower
>>>>>>> and client can not open it for read/write on host, it can now be opened
>>>>>>> for read/write because mounter will allow access. So why it is different
>>>>>>> that regular copy up. Well, in regular copy up, we created a copy of
>>>>>>> the original object and allowed writing to that object (cp --preserve=all)
>>>>>>> model. But in case of device file, writes will go to same original
>>>>>>> object. (And not a separate copy).
>>>>>>
>>>>>> That's true.
>>>>>>
>>>>>> In that sense copy up of special file should result in upper having
>>>>>> the same label as of lower, right?
>>>>>
>>>>> I guess that might be reasonable (if this behavior is a concern). So even
>>>>> after copy up, client will not be able to read/write a device if it was
>>>>> not allowed on lower.
>>>>>
>>>>> Stephen, what do you think about retaining label of lower for device
>>>>> files during copy up. What about socket/fifo.
>>>>
>>>> We don't check client task access to the upper inode label, only to the
>>>> overlay, right?  So the client is still free to access the device through
>>>> the overlay even if we preserve the lower inode label on the upper inode?
>>>> What do we gain?
>>>
>>> That's only with latest code and Miklos said he will revert it for 4.20.
>>>
>>> IOW, I am assuming that we will continue to check access to a file
>>> on upper in the context of mounter. Otherwise, client will be able to access
>>> files on upper/ which even mounter can't access.
>>
>> I was assuming we're talking about the proposed solution, where we check
>> client access to the overlay (unchanged), mounter access to lower
>> (unchanged), copy-up if denied (new), mounter access to upper (new in the
>> sense that previously we didn't copy-up on denials).
>>
>> In that situation, propagating the lower inode label to the upper inode only
>> impacts the mounter checks, and in that case makes copy-up pointless - if it
>> wasn't allowed to lower it won't be allowed to upper.  If it is allowed,
>> then client task is free to access the device regardless as long as it has
>> permissions to the overlay inode.  So I don't see what we gain by
>> propagating the lower inode label to the upper inode in the context mount
>> case, and it creates an inconsistency between special files and regular
>> ones.
> 
> If we agree on retaining lower label of lower device file on copy up, then
> I am assuming we will change rule c) to copy up only non device files.
> (because if you don't have access on lower, you will not have access
> even after copy up).
> 
> There are other paths where copy up happnes. Like link or when file
> metadata (ownership, permissions, timestmap) changes. In those cases,
> if we retain the lower label over copy up, it probably will help.
> 
> IOW, just by creating a link to a device, one will not get access to
> a device on upper which could not be accessed on lower.
> 
> Device files are special anyway. In regular files we are creating a
> copy and user writes to copy. But that's not the case with device
> files. So I guess these will have to be treated differently.

I don't understand what you are suggesting.  In the case of a context 
mount, the context specified by the mounter must be assigned to the 
upper inode for any files that are copied up.  Otherwise, changes to 
file data or metadata made through the overlay will be visible under two 
different security contexts simultaneously: the context of the overlay 
inode (i.e. the one specified by the mounter) and the context of the 
upper inode (in your suggestion, the context from the lower inode). 
This allows a violation of MAC policy where one can leak data through an 
overlay to an unauthorized context.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-04 15:39                                 ` Miklos Szeredi
@ 2018-12-11 15:50                                   ` Paul Moore
  0 siblings, 0 replies; 47+ messages in thread
From: Paul Moore @ 2018-12-11 15:50 UTC (permalink / raw)
  To: miklos
  Cc: Stephen Smalley, vgoyal, omosnace, bfields, salyzyn,
	linux-kernel, linux-unionfs, linux-fsdevel, selinux, Dan Walsh

On Tue, Dec 4, 2018 at 10:39 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Tue, Dec 4, 2018 at 4:32 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
>
> > Ok, I concede the point.  Not sure what that means though for v4.20.
>
> I have the revert queued up for v4.20 as that's the safest.

Miklos, when do you plan on sending the revert to Linus?  I just
tested v4.20-rc6 and the problem persists.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-06 20:26                                             ` Stephen Smalley
@ 2018-12-11 21:48                                               ` Vivek Goyal
  2018-12-12 14:51                                                 ` Stephen Smalley
  0 siblings, 1 reply; 47+ messages in thread
From: Vivek Goyal @ 2018-12-11 21:48 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Thu, Dec 06, 2018 at 03:26:26PM -0500, Stephen Smalley wrote:
> On 12/5/18 8:43 AM, Vivek Goyal wrote:
> > On Tue, Dec 04, 2018 at 11:49:16AM -0500, Stephen Smalley wrote:
> > > On 12/4/18 11:17 AM, Vivek Goyal wrote:
> > > > On Tue, Dec 04, 2018 at 11:05:46AM -0500, Stephen Smalley wrote:
> > > > > On 12/4/18 10:42 AM, Vivek Goyal wrote:
> > > > > > On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote:
> > > > > > > On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > > > > 
> > > > > > > > Having said that, this still create little anomaly when mknod to client
> > > > > > > > is not allowed on context label. So a device file, which is on lower
> > > > > > > > and client can not open it for read/write on host, it can now be opened
> > > > > > > > for read/write because mounter will allow access. So why it is different
> > > > > > > > that regular copy up. Well, in regular copy up, we created a copy of
> > > > > > > > the original object and allowed writing to that object (cp --preserve=all)
> > > > > > > > model. But in case of device file, writes will go to same original
> > > > > > > > object. (And not a separate copy).
> > > > > > > 
> > > > > > > That's true.
> > > > > > > 
> > > > > > > In that sense copy up of special file should result in upper having
> > > > > > > the same label as of lower, right?
> > > > > > 
> > > > > > I guess that might be reasonable (if this behavior is a concern). So even
> > > > > > after copy up, client will not be able to read/write a device if it was
> > > > > > not allowed on lower.
> > > > > > 
> > > > > > Stephen, what do you think about retaining label of lower for device
> > > > > > files during copy up. What about socket/fifo.
> > > > > 
> > > > > We don't check client task access to the upper inode label, only to the
> > > > > overlay, right?  So the client is still free to access the device through
> > > > > the overlay even if we preserve the lower inode label on the upper inode?
> > > > > What do we gain?
> > > > 
> > > > That's only with latest code and Miklos said he will revert it for 4.20.
> > > > 
> > > > IOW, I am assuming that we will continue to check access to a file
> > > > on upper in the context of mounter. Otherwise, client will be able to access
> > > > files on upper/ which even mounter can't access.
> > > 
> > > I was assuming we're talking about the proposed solution, where we check
> > > client access to the overlay (unchanged), mounter access to lower
> > > (unchanged), copy-up if denied (new), mounter access to upper (new in the
> > > sense that previously we didn't copy-up on denials).
> > > 
> > > In that situation, propagating the lower inode label to the upper inode only
> > > impacts the mounter checks, and in that case makes copy-up pointless - if it
> > > wasn't allowed to lower it won't be allowed to upper.  If it is allowed,
> > > then client task is free to access the device regardless as long as it has
> > > permissions to the overlay inode.  So I don't see what we gain by
> > > propagating the lower inode label to the upper inode in the context mount
> > > case, and it creates an inconsistency between special files and regular
> > > ones.
> > 
> > If we agree on retaining lower label of lower device file on copy up, then
> > I am assuming we will change rule c) to copy up only non device files.
> > (because if you don't have access on lower, you will not have access
> > even after copy up).
> > 
> > There are other paths where copy up happnes. Like link or when file
> > metadata (ownership, permissions, timestmap) changes. In those cases,
> > if we retain the lower label over copy up, it probably will help.
> > 
> > IOW, just by creating a link to a device, one will not get access to
> > a device on upper which could not be accessed on lower.
> > 
> > Device files are special anyway. In regular files we are creating a
> > copy and user writes to copy. But that's not the case with device
> > files. So I guess these will have to be treated differently.
> 
> I don't understand what you are suggesting.  In the case of a context mount,
> the context specified by the mounter must be assigned to the upper inode for
> any files that are copied up.  Otherwise, changes to file data or metadata
> made through the overlay will be visible under two different security
> contexts simultaneously: the context of the overlay inode (i.e. the one
> specified by the mounter) and the context of the upper inode (in your
> suggestion, the context from the lower inode). This allows a violation of
> MAC policy where one can leak data through an overlay to an unauthorized
> context.

Hi Stephen,

Sorry, I don't understand this point of leaking data through overlay. Even
if we retain lower label on copy up (for device file), to open that file
process should have access on overlay context label and then mounter needs
to have access on upper inode (lower label). This is not different from
opening a file on lower. Just that metadata of this file on upper might
be different.

Can you elaborate a bit more on how this is leaking data through overlay
mount. If it is, then why accessing file on lower is not equivalent of
leaking of data. 

Thanks
Vivek

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-11 21:48                                               ` Vivek Goyal
@ 2018-12-12 14:51                                                 ` Stephen Smalley
  2018-12-13 14:58                                                   ` Vivek Goyal
  0 siblings, 1 reply; 47+ messages in thread
From: Stephen Smalley @ 2018-12-12 14:51 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On 12/11/18 4:48 PM, Vivek Goyal wrote:
> On Thu, Dec 06, 2018 at 03:26:26PM -0500, Stephen Smalley wrote:
>> On 12/5/18 8:43 AM, Vivek Goyal wrote:
>>> On Tue, Dec 04, 2018 at 11:49:16AM -0500, Stephen Smalley wrote:
>>>> On 12/4/18 11:17 AM, Vivek Goyal wrote:
>>>>> On Tue, Dec 04, 2018 at 11:05:46AM -0500, Stephen Smalley wrote:
>>>>>> On 12/4/18 10:42 AM, Vivek Goyal wrote:
>>>>>>> On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote:
>>>>>>>> On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>>>>>>>>
>>>>>>>>> Having said that, this still create little anomaly when mknod to client
>>>>>>>>> is not allowed on context label. So a device file, which is on lower
>>>>>>>>> and client can not open it for read/write on host, it can now be opened
>>>>>>>>> for read/write because mounter will allow access. So why it is different
>>>>>>>>> that regular copy up. Well, in regular copy up, we created a copy of
>>>>>>>>> the original object and allowed writing to that object (cp --preserve=all)
>>>>>>>>> model. But in case of device file, writes will go to same original
>>>>>>>>> object. (And not a separate copy).
>>>>>>>>
>>>>>>>> That's true.
>>>>>>>>
>>>>>>>> In that sense copy up of special file should result in upper having
>>>>>>>> the same label as of lower, right?
>>>>>>>
>>>>>>> I guess that might be reasonable (if this behavior is a concern). So even
>>>>>>> after copy up, client will not be able to read/write a device if it was
>>>>>>> not allowed on lower.
>>>>>>>
>>>>>>> Stephen, what do you think about retaining label of lower for device
>>>>>>> files during copy up. What about socket/fifo.
>>>>>>
>>>>>> We don't check client task access to the upper inode label, only to the
>>>>>> overlay, right?  So the client is still free to access the device through
>>>>>> the overlay even if we preserve the lower inode label on the upper inode?
>>>>>> What do we gain?
>>>>>
>>>>> That's only with latest code and Miklos said he will revert it for 4.20.
>>>>>
>>>>> IOW, I am assuming that we will continue to check access to a file
>>>>> on upper in the context of mounter. Otherwise, client will be able to access
>>>>> files on upper/ which even mounter can't access.
>>>>
>>>> I was assuming we're talking about the proposed solution, where we check
>>>> client access to the overlay (unchanged), mounter access to lower
>>>> (unchanged), copy-up if denied (new), mounter access to upper (new in the
>>>> sense that previously we didn't copy-up on denials).
>>>>
>>>> In that situation, propagating the lower inode label to the upper inode only
>>>> impacts the mounter checks, and in that case makes copy-up pointless - if it
>>>> wasn't allowed to lower it won't be allowed to upper.  If it is allowed,
>>>> then client task is free to access the device regardless as long as it has
>>>> permissions to the overlay inode.  So I don't see what we gain by
>>>> propagating the lower inode label to the upper inode in the context mount
>>>> case, and it creates an inconsistency between special files and regular
>>>> ones.
>>>
>>> If we agree on retaining lower label of lower device file on copy up, then
>>> I am assuming we will change rule c) to copy up only non device files.
>>> (because if you don't have access on lower, you will not have access
>>> even after copy up).
>>>
>>> There are other paths where copy up happnes. Like link or when file
>>> metadata (ownership, permissions, timestmap) changes. In those cases,
>>> if we retain the lower label over copy up, it probably will help.
>>>
>>> IOW, just by creating a link to a device, one will not get access to
>>> a device on upper which could not be accessed on lower.
>>>
>>> Device files are special anyway. In regular files we are creating a
>>> copy and user writes to copy. But that's not the case with device
>>> files. So I guess these will have to be treated differently.
>>
>> I don't understand what you are suggesting.  In the case of a context mount,
>> the context specified by the mounter must be assigned to the upper inode for
>> any files that are copied up.  Otherwise, changes to file data or metadata
>> made through the overlay will be visible under two different security
>> contexts simultaneously: the context of the overlay inode (i.e. the one
>> specified by the mounter) and the context of the upper inode (in your
>> suggestion, the context from the lower inode). This allows a violation of
>> MAC policy where one can leak data through an overlay to an unauthorized
>> context.
> 
> Hi Stephen,
> 
> Sorry, I don't understand this point of leaking data through overlay. Even
> if we retain lower label on copy up (for device file), to open that file
> process should have access on overlay context label and then mounter needs
> to have access on upper inode (lower label). This is not different from
> opening a file on lower. Just that metadata of this file on upper might
> be different.
> 
> Can you elaborate a bit more on how this is leaking data through overlay
> mount. If it is, then why accessing file on lower is not equivalent of
> leaking of data.

In the container use case, retaining the lower label on copy-up for a 
context-mounted overlay permits a process in the container to leak the 
container data out to host files not labeled with the container label 
and thus potentially accessible to other containers or host processes. 
The container process appears to just be writing to files labeled with 
the container label via the overlay, but the written data and/or 
metadata is directly accessible through the lower label, which is likely 
readable to all/many containers and host processes.

In the multi-level security (MLS) use case, an analogy would a situation 
where you have an unclassified lower dir with some content to be shared 
read-only across all levels, and an overlay is context-mounted at each 
level with a corresponding upper dir and work dir private to that level. 
  If a client process at secret performs a write to a file via the 
secret overlay, and if the written data is stored in a file in the upper 
dir that inherits the label from the lower file (unclassified), then the 
secret process can leak data to unclassified processes at will, 
violating the MLS policy.

The difference with the lower is that it is read-only and the mounter is 
explicitly choosing to export it under the new context for reading (but 
not for writing).

As a side note, the actual checking during a context mount isn't as 
granular as we might like here, since there is no overlay-specific logic 
and thus no individual checking of the lower, upper, and work directory 
labels.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-12 14:51                                                 ` Stephen Smalley
@ 2018-12-13 14:58                                                   ` Vivek Goyal
  2018-12-13 16:12                                                     ` Stephen Smalley
  0 siblings, 1 reply; 47+ messages in thread
From: Vivek Goyal @ 2018-12-13 14:58 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Wed, Dec 12, 2018 at 09:51:59AM -0500, Stephen Smalley wrote:
> On 12/11/18 4:48 PM, Vivek Goyal wrote:
> > On Thu, Dec 06, 2018 at 03:26:26PM -0500, Stephen Smalley wrote:
> > > On 12/5/18 8:43 AM, Vivek Goyal wrote:
> > > > On Tue, Dec 04, 2018 at 11:49:16AM -0500, Stephen Smalley wrote:
> > > > > On 12/4/18 11:17 AM, Vivek Goyal wrote:
> > > > > > On Tue, Dec 04, 2018 at 11:05:46AM -0500, Stephen Smalley wrote:
> > > > > > > On 12/4/18 10:42 AM, Vivek Goyal wrote:
> > > > > > > > On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote:
> > > > > > > > > On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > > > > > > 
> > > > > > > > > > Having said that, this still create little anomaly when mknod to client
> > > > > > > > > > is not allowed on context label. So a device file, which is on lower
> > > > > > > > > > and client can not open it for read/write on host, it can now be opened
> > > > > > > > > > for read/write because mounter will allow access. So why it is different
> > > > > > > > > > that regular copy up. Well, in regular copy up, we created a copy of
> > > > > > > > > > the original object and allowed writing to that object (cp --preserve=all)
> > > > > > > > > > model. But in case of device file, writes will go to same original
> > > > > > > > > > object. (And not a separate copy).
> > > > > > > > > 
> > > > > > > > > That's true.
> > > > > > > > > 
> > > > > > > > > In that sense copy up of special file should result in upper having
> > > > > > > > > the same label as of lower, right?
> > > > > > > > 
> > > > > > > > I guess that might be reasonable (if this behavior is a concern). So even
> > > > > > > > after copy up, client will not be able to read/write a device if it was
> > > > > > > > not allowed on lower.
> > > > > > > > 
> > > > > > > > Stephen, what do you think about retaining label of lower for device
> > > > > > > > files during copy up. What about socket/fifo.
> > > > > > > 
> > > > > > > We don't check client task access to the upper inode label, only to the
> > > > > > > overlay, right?  So the client is still free to access the device through
> > > > > > > the overlay even if we preserve the lower inode label on the upper inode?
> > > > > > > What do we gain?
> > > > > > 
> > > > > > That's only with latest code and Miklos said he will revert it for 4.20.
> > > > > > 
> > > > > > IOW, I am assuming that we will continue to check access to a file
> > > > > > on upper in the context of mounter. Otherwise, client will be able to access
> > > > > > files on upper/ which even mounter can't access.
> > > > > 
> > > > > I was assuming we're talking about the proposed solution, where we check
> > > > > client access to the overlay (unchanged), mounter access to lower
> > > > > (unchanged), copy-up if denied (new), mounter access to upper (new in the
> > > > > sense that previously we didn't copy-up on denials).
> > > > > 
> > > > > In that situation, propagating the lower inode label to the upper inode only
> > > > > impacts the mounter checks, and in that case makes copy-up pointless - if it
> > > > > wasn't allowed to lower it won't be allowed to upper.  If it is allowed,
> > > > > then client task is free to access the device regardless as long as it has
> > > > > permissions to the overlay inode.  So I don't see what we gain by
> > > > > propagating the lower inode label to the upper inode in the context mount
> > > > > case, and it creates an inconsistency between special files and regular
> > > > > ones.
> > > > 
> > > > If we agree on retaining lower label of lower device file on copy up, then
> > > > I am assuming we will change rule c) to copy up only non device files.
> > > > (because if you don't have access on lower, you will not have access
> > > > even after copy up).
> > > > 
> > > > There are other paths where copy up happnes. Like link or when file
> > > > metadata (ownership, permissions, timestmap) changes. In those cases,
> > > > if we retain the lower label over copy up, it probably will help.
> > > > 
> > > > IOW, just by creating a link to a device, one will not get access to
> > > > a device on upper which could not be accessed on lower.
> > > > 
> > > > Device files are special anyway. In regular files we are creating a
> > > > copy and user writes to copy. But that's not the case with device
> > > > files. So I guess these will have to be treated differently.
> > > 
> > > I don't understand what you are suggesting.  In the case of a context mount,
> > > the context specified by the mounter must be assigned to the upper inode for
> > > any files that are copied up.  Otherwise, changes to file data or metadata
> > > made through the overlay will be visible under two different security
> > > contexts simultaneously: the context of the overlay inode (i.e. the one
> > > specified by the mounter) and the context of the upper inode (in your
> > > suggestion, the context from the lower inode). This allows a violation of
> > > MAC policy where one can leak data through an overlay to an unauthorized
> > > context.
> > 
> > Hi Stephen,
> > 
> > Sorry, I don't understand this point of leaking data through overlay. Even
> > if we retain lower label on copy up (for device file), to open that file
> > process should have access on overlay context label and then mounter needs
> > to have access on upper inode (lower label). This is not different from
> > opening a file on lower. Just that metadata of this file on upper might
> > be different.
> > 
> > Can you elaborate a bit more on how this is leaking data through overlay
> > mount. If it is, then why accessing file on lower is not equivalent of
> > leaking of data.
> 
> In the container use case, retaining the lower label on copy-up for a
> context-mounted overlay permits a process in the container to leak the
> container data out to host files not labeled with the container label and
> thus potentially accessible to other containers or host processes.

> The
> container process appears to just be writing to files labeled with the
> container label via the overlay, but the written data and/or metadata is
> directly accessible through the lower label, which is likely readable to
> all/many containers and host processes.
> 
> In the multi-level security (MLS) use case, an analogy would a situation
> where you have an unclassified lower dir with some content to be shared
> read-only across all levels, and an overlay is context-mounted at each level
> with a corresponding upper dir and work dir private to that level.  If a
> client process at secret performs a write to a file via the secret overlay,
> and if the written data is stored in a file in the upper dir that inherits
> the label from the lower file (unclassified), then the secret process can
> leak data to unclassified processes at will, violating the MLS policy.

For the case of devices, its already happening. One might change metadata
of a device (hence trigger copy up). Now all writes to upper device file
from secret process still go to same underlying device and are still
readable from lower device file.

In fact, for the case of devices, that is even more of a reason to retain
the label same as lower. Otherwise upper device node is leaking data
of a secret process which can be accessed using device at lower/ (lower's
label is readable).

> 
> The difference with the lower is that it is read-only and the mounter is
> explicitly choosing to export it under the new context for reading (but not
> for writing).

If we retain the label and if lower is not writable, then upper deivce
node is not being written to as well. So from label point of view, lower
and upper inode are not different. Only difference is that some metadata
of upper inode might be different.

Thanks
Vivek
> 
> As a side note, the actual checking during a context mount isn't as granular
> as we might like here, since there is no overlay-specific logic and thus no
> individual checking of the lower, upper, and work directory labels.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-13 14:58                                                   ` Vivek Goyal
@ 2018-12-13 16:12                                                     ` Stephen Smalley
  2018-12-13 18:54                                                       ` Vivek Goyal
  0 siblings, 1 reply; 47+ messages in thread
From: Stephen Smalley @ 2018-12-13 16:12 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On 12/13/18 9:58 AM, Vivek Goyal wrote:
> On Wed, Dec 12, 2018 at 09:51:59AM -0500, Stephen Smalley wrote:
>> On 12/11/18 4:48 PM, Vivek Goyal wrote:
>>> On Thu, Dec 06, 2018 at 03:26:26PM -0500, Stephen Smalley wrote:
>>>> On 12/5/18 8:43 AM, Vivek Goyal wrote:
>>>>> On Tue, Dec 04, 2018 at 11:49:16AM -0500, Stephen Smalley wrote:
>>>>>> On 12/4/18 11:17 AM, Vivek Goyal wrote:
>>>>>>> On Tue, Dec 04, 2018 at 11:05:46AM -0500, Stephen Smalley wrote:
>>>>>>>> On 12/4/18 10:42 AM, Vivek Goyal wrote:
>>>>>>>>> On Tue, Dec 04, 2018 at 04:31:09PM +0100, Miklos Szeredi wrote:
>>>>>>>>>> On Tue, Dec 4, 2018 at 4:22 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Having said that, this still create little anomaly when mknod to client
>>>>>>>>>>> is not allowed on context label. So a device file, which is on lower
>>>>>>>>>>> and client can not open it for read/write on host, it can now be opened
>>>>>>>>>>> for read/write because mounter will allow access. So why it is different
>>>>>>>>>>> that regular copy up. Well, in regular copy up, we created a copy of
>>>>>>>>>>> the original object and allowed writing to that object (cp --preserve=all)
>>>>>>>>>>> model. But in case of device file, writes will go to same original
>>>>>>>>>>> object. (And not a separate copy).
>>>>>>>>>>
>>>>>>>>>> That's true.
>>>>>>>>>>
>>>>>>>>>> In that sense copy up of special file should result in upper having
>>>>>>>>>> the same label as of lower, right?
>>>>>>>>>
>>>>>>>>> I guess that might be reasonable (if this behavior is a concern). So even
>>>>>>>>> after copy up, client will not be able to read/write a device if it was
>>>>>>>>> not allowed on lower.
>>>>>>>>>
>>>>>>>>> Stephen, what do you think about retaining label of lower for device
>>>>>>>>> files during copy up. What about socket/fifo.
>>>>>>>>
>>>>>>>> We don't check client task access to the upper inode label, only to the
>>>>>>>> overlay, right?  So the client is still free to access the device through
>>>>>>>> the overlay even if we preserve the lower inode label on the upper inode?
>>>>>>>> What do we gain?
>>>>>>>
>>>>>>> That's only with latest code and Miklos said he will revert it for 4.20.
>>>>>>>
>>>>>>> IOW, I am assuming that we will continue to check access to a file
>>>>>>> on upper in the context of mounter. Otherwise, client will be able to access
>>>>>>> files on upper/ which even mounter can't access.
>>>>>>
>>>>>> I was assuming we're talking about the proposed solution, where we check
>>>>>> client access to the overlay (unchanged), mounter access to lower
>>>>>> (unchanged), copy-up if denied (new), mounter access to upper (new in the
>>>>>> sense that previously we didn't copy-up on denials).
>>>>>>
>>>>>> In that situation, propagating the lower inode label to the upper inode only
>>>>>> impacts the mounter checks, and in that case makes copy-up pointless - if it
>>>>>> wasn't allowed to lower it won't be allowed to upper.  If it is allowed,
>>>>>> then client task is free to access the device regardless as long as it has
>>>>>> permissions to the overlay inode.  So I don't see what we gain by
>>>>>> propagating the lower inode label to the upper inode in the context mount
>>>>>> case, and it creates an inconsistency between special files and regular
>>>>>> ones.
>>>>>
>>>>> If we agree on retaining lower label of lower device file on copy up, then
>>>>> I am assuming we will change rule c) to copy up only non device files.
>>>>> (because if you don't have access on lower, you will not have access
>>>>> even after copy up).
>>>>>
>>>>> There are other paths where copy up happnes. Like link or when file
>>>>> metadata (ownership, permissions, timestmap) changes. In those cases,
>>>>> if we retain the lower label over copy up, it probably will help.
>>>>>
>>>>> IOW, just by creating a link to a device, one will not get access to
>>>>> a device on upper which could not be accessed on lower.
>>>>>
>>>>> Device files are special anyway. In regular files we are creating a
>>>>> copy and user writes to copy. But that's not the case with device
>>>>> files. So I guess these will have to be treated differently.
>>>>
>>>> I don't understand what you are suggesting.  In the case of a context mount,
>>>> the context specified by the mounter must be assigned to the upper inode for
>>>> any files that are copied up.  Otherwise, changes to file data or metadata
>>>> made through the overlay will be visible under two different security
>>>> contexts simultaneously: the context of the overlay inode (i.e. the one
>>>> specified by the mounter) and the context of the upper inode (in your
>>>> suggestion, the context from the lower inode). This allows a violation of
>>>> MAC policy where one can leak data through an overlay to an unauthorized
>>>> context.
>>>
>>> Hi Stephen,
>>>
>>> Sorry, I don't understand this point of leaking data through overlay. Even
>>> if we retain lower label on copy up (for device file), to open that file
>>> process should have access on overlay context label and then mounter needs
>>> to have access on upper inode (lower label). This is not different from
>>> opening a file on lower. Just that metadata of this file on upper might
>>> be different.
>>>
>>> Can you elaborate a bit more on how this is leaking data through overlay
>>> mount. If it is, then why accessing file on lower is not equivalent of
>>> leaking of data.
>>
>> In the container use case, retaining the lower label on copy-up for a
>> context-mounted overlay permits a process in the container to leak the
>> container data out to host files not labeled with the container label and
>> thus potentially accessible to other containers or host processes.
> 
>> The
>> container process appears to just be writing to files labeled with the
>> container label via the overlay, but the written data and/or metadata is
>> directly accessible through the lower label, which is likely readable to
>> all/many containers and host processes.
>>
>> In the multi-level security (MLS) use case, an analogy would a situation
>> where you have an unclassified lower dir with some content to be shared
>> read-only across all levels, and an overlay is context-mounted at each level
>> with a corresponding upper dir and work dir private to that level.  If a
>> client process at secret performs a write to a file via the secret overlay,
>> and if the written data is stored in a file in the upper dir that inherits
>> the label from the lower file (unclassified), then the secret process can
>> leak data to unclassified processes at will, violating the MLS policy.
> 
> For the case of devices, its already happening. One might change metadata
> of a device (hence trigger copy up). Now all writes to upper device file
> from secret process still go to same underlying device and are still
> readable from lower device file.

This is an argument for not copying up device files IMHO, not for 
preserving the lower label on them.

Even just allowing the secret process to trigger the creation of an 
unclassified file through copy-up is a violation of the MLS policy.  It 
doesn't require writing to the file itself.

> 
> In fact, for the case of devices, that is even more of a reason to retain
> the label same as lower. Otherwise upper device node is leaking data
> of a secret process which can be accessed using device at lower/ (lower's
> label is readable).
> 
>>
>> The difference with the lower is that it is read-only and the mounter is
>> explicitly choosing to export it under the new context for reading (but not
>> for writing).
> 
> If we retain the label and if lower is not writable, then upper deivce
> node is not being written to as well. So from label point of view, lower
> and upper inode are not different. Only difference is that some metadata
> of upper inode might be different.
> 
> Thanks
> Vivek
>>
>> As a side note, the actual checking during a context mount isn't as granular
>> as we might like here, since there is no overlay-specific logic and thus no
>> individual checking of the lower, upper, and work directory labels.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-13 16:12                                                     ` Stephen Smalley
@ 2018-12-13 18:54                                                       ` Vivek Goyal
  2018-12-13 20:09                                                         ` Stephen Smalley
  0 siblings, 1 reply; 47+ messages in thread
From: Vivek Goyal @ 2018-12-13 18:54 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Thu, Dec 13, 2018 at 11:12:31AM -0500, Stephen Smalley wrote:

[..]
> > > > Can you elaborate a bit more on how this is leaking data through overlay
> > > > mount. If it is, then why accessing file on lower is not equivalent of
> > > > leaking of data.
> > > 
> > > In the container use case, retaining the lower label on copy-up for a
> > > context-mounted overlay permits a process in the container to leak the
> > > container data out to host files not labeled with the container label and
> > > thus potentially accessible to other containers or host processes.
> > 
> > > The
> > > container process appears to just be writing to files labeled with the
> > > container label via the overlay, but the written data and/or metadata is
> > > directly accessible through the lower label, which is likely readable to
> > > all/many containers and host processes.
> > > 
> > > In the multi-level security (MLS) use case, an analogy would a situation
> > > where you have an unclassified lower dir with some content to be shared
> > > read-only across all levels, and an overlay is context-mounted at each level
> > > with a corresponding upper dir and work dir private to that level.  If a
> > > client process at secret performs a write to a file via the secret overlay,
> > > and if the written data is stored in a file in the upper dir that inherits
> > > the label from the lower file (unclassified), then the secret process can
> > > leak data to unclassified processes at will, violating the MLS policy.
> > 
> > For the case of devices, its already happening. One might change metadata
> > of a device (hence trigger copy up). Now all writes to upper device file
> > from secret process still go to same underlying device and are still
> > readable from lower device file.
> 
> This is an argument for not copying up device files IMHO, not for preserving
> the lower label on them.

How do we handle metadata change to device node (like timestamp, ownership
change) without copy up.

Vivek

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-13 18:54                                                       ` Vivek Goyal
@ 2018-12-13 20:09                                                         ` Stephen Smalley
  2018-12-13 20:26                                                           ` Vivek Goyal
  0 siblings, 1 reply; 47+ messages in thread
From: Stephen Smalley @ 2018-12-13 20:09 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On 12/13/18 1:54 PM, Vivek Goyal wrote:
> On Thu, Dec 13, 2018 at 11:12:31AM -0500, Stephen Smalley wrote:
> 
> [..]
>>>>> Can you elaborate a bit more on how this is leaking data through overlay
>>>>> mount. If it is, then why accessing file on lower is not equivalent of
>>>>> leaking of data.
>>>>
>>>> In the container use case, retaining the lower label on copy-up for a
>>>> context-mounted overlay permits a process in the container to leak the
>>>> container data out to host files not labeled with the container label and
>>>> thus potentially accessible to other containers or host processes.
>>>
>>>> The
>>>> container process appears to just be writing to files labeled with the
>>>> container label via the overlay, but the written data and/or metadata is
>>>> directly accessible through the lower label, which is likely readable to
>>>> all/many containers and host processes.
>>>>
>>>> In the multi-level security (MLS) use case, an analogy would a situation
>>>> where you have an unclassified lower dir with some content to be shared
>>>> read-only across all levels, and an overlay is context-mounted at each level
>>>> with a corresponding upper dir and work dir private to that level.  If a
>>>> client process at secret performs a write to a file via the secret overlay,
>>>> and if the written data is stored in a file in the upper dir that inherits
>>>> the label from the lower file (unclassified), then the secret process can
>>>> leak data to unclassified processes at will, violating the MLS policy.
>>>
>>> For the case of devices, its already happening. One might change metadata
>>> of a device (hence trigger copy up). Now all writes to upper device file
>>> from secret process still go to same underlying device and are still
>>> readable from lower device file.
>>
>> This is an argument for not copying up device files IMHO, not for preserving
>> the lower label on them.
> 
> How do we handle metadata change to device node (like timestamp, ownership
> change) without copy up.

Do we need to support such metadata changes to device nodes through an 
overlay mount?  Is that required for some legitimate purpose (and if so, 
what is the use case?)?  If not, just deny it up front.  Much simpler 
and no potential for a leak.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-12-13 20:09                                                         ` Stephen Smalley
@ 2018-12-13 20:26                                                           ` Vivek Goyal
  0 siblings, 0 replies; 47+ messages in thread
From: Vivek Goyal @ 2018-12-13 20:26 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields, Mark Salyzyn,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh

On Thu, Dec 13, 2018 at 03:09:55PM -0500, Stephen Smalley wrote:
> On 12/13/18 1:54 PM, Vivek Goyal wrote:
> > On Thu, Dec 13, 2018 at 11:12:31AM -0500, Stephen Smalley wrote:
> > 
> > [..]
> > > > > > Can you elaborate a bit more on how this is leaking data through overlay
> > > > > > mount. If it is, then why accessing file on lower is not equivalent of
> > > > > > leaking of data.
> > > > > 
> > > > > In the container use case, retaining the lower label on copy-up for a
> > > > > context-mounted overlay permits a process in the container to leak the
> > > > > container data out to host files not labeled with the container label and
> > > > > thus potentially accessible to other containers or host processes.
> > > > 
> > > > > The
> > > > > container process appears to just be writing to files labeled with the
> > > > > container label via the overlay, but the written data and/or metadata is
> > > > > directly accessible through the lower label, which is likely readable to
> > > > > all/many containers and host processes.
> > > > > 
> > > > > In the multi-level security (MLS) use case, an analogy would a situation
> > > > > where you have an unclassified lower dir with some content to be shared
> > > > > read-only across all levels, and an overlay is context-mounted at each level
> > > > > with a corresponding upper dir and work dir private to that level.  If a
> > > > > client process at secret performs a write to a file via the secret overlay,
> > > > > and if the written data is stored in a file in the upper dir that inherits
> > > > > the label from the lower file (unclassified), then the secret process can
> > > > > leak data to unclassified processes at will, violating the MLS policy.
> > > > 
> > > > For the case of devices, its already happening. One might change metadata
> > > > of a device (hence trigger copy up). Now all writes to upper device file
> > > > from secret process still go to same underlying device and are still
> > > > readable from lower device file.
> > > 
> > > This is an argument for not copying up device files IMHO, not for preserving
> > > the lower label on them.
> > 
> > How do we handle metadata change to device node (like timestamp, ownership
> > change) without copy up.
> 
> Do we need to support such metadata changes to device nodes through an
> overlay mount?  Is that required for some legitimate purpose (and if so,
> what is the use case?)?  If not, just deny it up front.  Much simpler and no
> potential for a leak.

This will be overlay specific behavior and further from POSIX
like filesystem behavior. Don't know which workloads depend on changing
ownership of devices of changing metadata of devices.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2018-11-29 13:49                 ` Vivek Goyal
@ 2019-03-04 17:01                   ` Mark Salyzyn
  2019-03-04 17:56                     ` Casey Schaufler
  2019-03-04 18:44                     ` Stephen Smalley
  0 siblings, 2 replies; 47+ messages in thread
From: Mark Salyzyn @ 2019-03-04 17:01 UTC (permalink / raw)
  To: Vivek Goyal, Miklos Szeredi
  Cc: Stephen Smalley, Ondrej Mosnacek, J. Bruce Fields, Paul Moore,
	linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh

On 11/29/2018 05:49 AM, Vivek Goyal wrote:
> So will override_creds=off solve the NFS issue also where all access will
> happen with the creds of task now? Though it will stil require more
> priviliges in task for other operations in overlay to succeed.

NFS problems seems to have ended the discussion, too many stakeholders? 
too many outstanding questions?

Do we accept the limitations of the override_creds patch as is, and then 
have the folks more familiar with the NFS scenario(s) build on it?

[TL;DR]

After looking at all this discussion, it feels like a larger audited 
rewrite of the security model is in order and override_creds=off may be 
a disservice (although expediently deals with Android's needs) to a 
correct general solution. I admit I have little idea where to go from 
here for a general solution.

As far as I see it, the model of creator && caller credentials is a 
problem for any non-overlapping (MAC) privilege models. This patch 
allows one to drop any creator privilege escalation, re-introducing the 
"caller" to the lower layers.

As such I would expect a better model is to _always_ check the caller 
credentials again in the lower layers, and only check the creator 
credentials, some without caller credentials, for some special cases? 
Change an && to an || for some of the checks? What are those special 
cases? I must admit _none_ of those special cases need attention in the 
Android usage models though making it difficult for me to do the fight 
thing for the associated stakeholders.

The lower privileged application access to the directory cache inherited 
by other callers troubles me (not for Android, but in general) and feels 
troublesome (flush out the directory cache? how to tag the privileges 
associated with the current instance of the directory cache?). Some 
operations (eg: delete a file for incoming, create mknod in upperdir) 
are special cases requiring the checking of caller credentaisl to 
function (not a problem for Android as the caller that deletes a file 
just so happens to have the necessary privileges).

Also, mount namespaces (in upper, lower, etc), how will they affect this 
all, is there a need for more attention to this as well?

-- Mark

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2019-03-04 17:01                   ` Mark Salyzyn
@ 2019-03-04 17:56                     ` Casey Schaufler
  2019-03-04 18:44                     ` Stephen Smalley
  1 sibling, 0 replies; 47+ messages in thread
From: Casey Schaufler @ 2019-03-04 17:56 UTC (permalink / raw)
  To: Mark Salyzyn, Vivek Goyal, Miklos Szeredi
  Cc: Stephen Smalley, Ondrej Mosnacek, J. Bruce Fields, Paul Moore,
	linux-kernel, overlayfs, linux-fsdevel, selinux, Daniel J Walsh,
	Linux Security Module list

On 3/4/2019 9:01 AM, Mark Salyzyn wrote:

Adding linux-security-module to the CC. Please keep the general
LSM community in to loop.


> On 11/29/2018 05:49 AM, Vivek Goyal wrote:
>> So will override_creds=off solve the NFS issue also where all access 
>> will
>> happen with the creds of task now? Though it will stil require more
>> priviliges in task for other operations in overlay to succeed.
>
> NFS problems seems to have ended the discussion, too many 
> stakeholders? too many outstanding questions?
>
> Do we accept the limitations of the override_creds patch as is, and 
> then have the folks more familiar with the NFS scenario(s) build on it?
>
> [TL;DR]
>
> After looking at all this discussion, it feels like a larger audited 
> rewrite of the security model is in order and override_creds=off may 
> be a disservice (although expediently deals with Android's needs) to a 
> correct general solution. I admit I have little idea where to go from 
> here for a general solution.
>
> As far as I see it, the model of creator && caller credentials is a 
> problem for any non-overlapping (MAC) privilege models. This patch 
> allows one to drop any creator privilege escalation, re-introducing 
> the "caller" to the lower layers.
>
> As such I would expect a better model is to _always_ check the caller 
> credentials again in the lower layers, and only check the creator 
> credentials, some without caller credentials, for some special cases? 
> Change an && to an || for some of the checks? What are those special 
> cases? I must admit _none_ of those special cases need attention in 
> the Android usage models though making it difficult for me to do the 
> fight thing for the associated stakeholders.
>
> The lower privileged application access to the directory cache 
> inherited by other callers troubles me (not for Android, but in 
> general) and feels troublesome (flush out the directory cache? how to 
> tag the privileges associated with the current instance of the 
> directory cache?). Some operations (eg: delete a file for incoming, 
> create mknod in upperdir) are special cases requiring the checking of 
> caller credentaisl to function (not a problem for Android as the 
> caller that deletes a file just so happens to have the necessary 
> privileges).
>
> Also, mount namespaces (in upper, lower, etc), how will they affect 
> this all, is there a need for more attention to this as well?
>
> -- Mark
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2019-03-04 17:01                   ` Mark Salyzyn
  2019-03-04 17:56                     ` Casey Schaufler
@ 2019-03-04 18:44                     ` Stephen Smalley
  2019-03-04 19:21                       ` Amir Goldstein
  1 sibling, 1 reply; 47+ messages in thread
From: Stephen Smalley @ 2019-03-04 18:44 UTC (permalink / raw)
  To: Mark Salyzyn, Vivek Goyal, Miklos Szeredi
  Cc: Ondrej Mosnacek, J. Bruce Fields, Paul Moore, linux-kernel,
	overlayfs, linux-fsdevel, selinux, Daniel J Walsh, LSM

On 3/4/19 12:01 PM, Mark Salyzyn wrote:
> On 11/29/2018 05:49 AM, Vivek Goyal wrote:
>> So will override_creds=off solve the NFS issue also where all access will
>> happen with the creds of task now? Though it will stil require more
>> priviliges in task for other operations in overlay to succeed.
> 
> NFS problems seems to have ended the discussion, too many stakeholders? 
> too many outstanding questions?
> 
> Do we accept the limitations of the override_creds patch as is, and then 
> have the folks more familiar with the NFS scenario(s) build on it?
> 
> [TL;DR]
> 
> After looking at all this discussion, it feels like a larger audited 
> rewrite of the security model is in order and override_creds=off may be 
> a disservice (although expediently deals with Android's needs) to a 
> correct general solution. I admit I have little idea where to go from 
> here for a general solution.
> 
> As far as I see it, the model of creator && caller credentials is a 
> problem for any non-overlapping (MAC) privilege models. This patch 
> allows one to drop any creator privilege escalation, re-introducing the 
> "caller" to the lower layers.
> 
> As such I would expect a better model is to _always_ check the caller 
> credentials again in the lower layers, and only check the creator 
> credentials, some without caller credentials, for some special cases? 
> Change an && to an || for some of the checks? What are those special 
> cases? I must admit _none_ of those special cases need attention in the 
> Android usage models though making it difficult for me to do the fight 
> thing for the associated stakeholders.

As I recall, there were multiple problems with using current process' 
creds for the operations on the lower/upper/work directories:

- Some overlayfs operations on the lower/upper/work directories required 
privilege (capabilities) that the current process might lack, e.g. to 
set ownership and the like on upper or work files, to set special xattrs 
used internally by overlayfs for whiteouts or similar purposes, to act 
on files within the work dir which was inaccessible to the current 
process to prevent accessing files in an incomplete state, etc. 
Originally that was handled by temporarily elevating the effective 
capabilities around the privileged operation in the overlayfs code but 
that didn't help with the SELinux or other LSM capability checking that 
was triggered upon the capable calls.  Without some change there you'd 
have to allow all client process domains all of the relevant 
capabilities in policy, greatly increasing their privileges.

- The original logic was checking access to the lower dir/files in the 
context of the current process when performing some operation that 
modifies the file content or metadata, thereby triggering a SELinux/LSM 
write or similar check, even though the actual data or metadata 
modification occurs to the copied-up file instead and does not affect 
the lower dir/files.  That was preventing making the lower dir/file 
labels read-only to the client processes in the policy, which was 
desired for the container use case.

You'd need to solve those problems in some way.

> The lower privileged application access to the directory cache inherited 
> by other callers troubles me (not for Android, but in general) and feels 
> troublesome (flush out the directory cache? how to tag the privileges 
> associated with the current instance of the directory cache?). Some 
> operations (eg: delete a file for incoming, create mknod in upperdir) 
> are special cases requiring the checking of caller credentaisl to 
> function (not a problem for Android as the caller that deletes a file 
> just so happens to have the necessary privileges).
> 
> Also, mount namespaces (in upper, lower, etc), how will they affect this 
> all, is there a need for more attention to this as well?
> 
> -- Mark


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: overlayfs access checks on underlying layers
  2019-03-04 18:44                     ` Stephen Smalley
@ 2019-03-04 19:21                       ` Amir Goldstein
  0 siblings, 0 replies; 47+ messages in thread
From: Amir Goldstein @ 2019-03-04 19:21 UTC (permalink / raw)
  To: Mark Salyzyn
  Cc: Vivek Goyal, Miklos Szeredi, Ondrej Mosnacek, J. Bruce Fields,
	Paul Moore, linux-kernel, overlayfs, linux-fsdevel, selinux,
	Daniel J Walsh, LSM, Stephen Smalley

On Mon, Mar 4, 2019 at 8:53 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
>
> On 3/4/19 12:01 PM, Mark Salyzyn wrote:
> > On 11/29/2018 05:49 AM, Vivek Goyal wrote:
> >> So will override_creds=off solve the NFS issue also where all access will
> >> happen with the creds of task now? Though it will stil require more
> >> priviliges in task for other operations in overlay to succeed.
> >
> > NFS problems seems to have ended the discussion, too many stakeholders?
> > too many outstanding questions?
> >
> > Do we accept the limitations of the override_creds patch as is, and then
> > have the folks more familiar with the NFS scenario(s) build on it?
> >
> > [TL;DR]
> >
> > After looking at all this discussion, it feels like a larger audited
> > rewrite of the security model is in order and override_creds=off may be
> > a disservice (although expediently deals with Android's needs) to a
> > correct general solution. I admit I have little idea where to go from
> > here for a general solution.
> >
> > As far as I see it, the model of creator && caller credentials is a
> > problem for any non-overlapping (MAC) privilege models. This patch
> > allows one to drop any creator privilege escalation, re-introducing the
> > "caller" to the lower layers.
> >
> > As such I would expect a better model is to _always_ check the caller
> > credentials again in the lower layers, and only check the creator
> > credentials, some without caller credentials, for some special cases?
> > Change an && to an || for some of the checks? What are those special
> > cases? I must admit _none_ of those special cases need attention in the
> > Android usage models though making it difficult for me to do the fight
> > thing for the associated stakeholders.
>
> As I recall, there were multiple problems with using current process'
> creds for the operations on the lower/upper/work directories:
>
> - Some overlayfs operations on the lower/upper/work directories required
> privilege (capabilities) that the current process might lack, e.g. to
> set ownership and the like on upper or work files, to set special xattrs
> used internally by overlayfs for whiteouts or similar purposes, to act
> on files within the work dir which was inaccessible to the current
> process to prevent accessing files in an incomplete state, etc.
> Originally that was handled by temporarily elevating the effective
> capabilities around the privileged operation in the overlayfs code but
> that didn't help with the SELinux or other LSM capability checking that
> was triggered upon the capable calls.  Without some change there you'd
> have to allow all client process domains all of the relevant
> capabilities in policy, greatly increasing their privileges.
>
> - The original logic was checking access to the lower dir/files in the
> context of the current process when performing some operation that
> modifies the file content or metadata, thereby triggering a SELinux/LSM
> write or similar check, even though the actual data or metadata
> modification occurs to the copied-up file instead and does not affect
> the lower dir/files.  That was preventing making the lower dir/file
> labels read-only to the client processes in the policy, which was
> desired for the container use case.
>
> You'd need to solve those problems in some way.
>

This is entire discussion in avoiding the elephant in the room -
Tampering with underlying layers.
We only talk about functionality and enforcement of legit overlayfs
operation, but what about an adversary that writes/changes files on
underlying layers? For example, permission to write new files can
be elevated to effectively modifying or deleting files.

To me the logic in "mounter credentials" is not only functional.
Whoever has access to underlying layer can really cause a lot of
damage to users accessing the overlay mount, so in a way, this
power needs to be translated to some sort of elevated credentials.

The the current model seems solid in that respect - access to
underlying layer can be restricted to the "mounter" who effectively
crafts an entirely new set of files.

Another way to think about this - consider a user space overlayfs
implementation, for example:
https://github.com/containers/fuse-overlayfs

In what way should the requirements to overlayfs security model
differ from the model already imposed on a user space implementation?

I do understand the Android use case for override_creds=off, but
to me, this is a functional configuration that says "turn off overlay
specific security considerations... I know what I am doing".
I am not saying that the Android adb debug use case is insecure.
I am just not seeing how a && or || model for mounter and caller
credential add up to a coherent story (yet...).

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2019-03-04 19:21 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-27 19:55 overlayfs access checks on underlying layers Miklos Szeredi
2018-11-27 19:58 ` Miklos Szeredi
2018-11-27 21:05   ` Vivek Goyal
2018-11-28 10:00     ` Miklos Szeredi
2018-11-28 17:03       ` Vivek Goyal
2018-11-28 19:34         ` Stephen Smalley
2018-11-28 20:24           ` Miklos Szeredi
2018-11-28 21:46             ` Stephen Smalley
2018-11-29 11:04               ` Miklos Szeredi
2018-11-29 13:49                 ` Vivek Goyal
2019-03-04 17:01                   ` Mark Salyzyn
2019-03-04 17:56                     ` Casey Schaufler
2019-03-04 18:44                     ` Stephen Smalley
2019-03-04 19:21                       ` Amir Goldstein
2018-11-29 16:16                 ` Stephen Smalley
2018-11-29 16:22                   ` Stephen Smalley
2018-11-29 19:47                   ` Miklos Szeredi
2018-11-29 21:03                     ` Stephen Smalley
2018-11-29 21:19                       ` Stephen Smalley
2018-12-04 13:32                         ` Miklos Szeredi
2018-12-04 14:30                           ` Stephen Smalley
2018-12-04 14:45                             ` Miklos Szeredi
2018-12-04 15:35                               ` Stephen Smalley
2018-12-04 15:39                                 ` Miklos Szeredi
2018-12-11 15:50                                   ` Paul Moore
2018-12-04 15:15                             ` Vivek Goyal
2018-12-04 15:22                               ` Vivek Goyal
2018-12-04 15:31                                 ` Miklos Szeredi
2018-12-04 15:42                                   ` Vivek Goyal
2018-12-04 16:05                                     ` Stephen Smalley
2018-12-04 16:17                                       ` Vivek Goyal
2018-12-04 16:49                                         ` Stephen Smalley
2018-12-05 13:43                                           ` Vivek Goyal
2018-12-06 20:26                                             ` Stephen Smalley
2018-12-11 21:48                                               ` Vivek Goyal
2018-12-12 14:51                                                 ` Stephen Smalley
2018-12-13 14:58                                                   ` Vivek Goyal
2018-12-13 16:12                                                     ` Stephen Smalley
2018-12-13 18:54                                                       ` Vivek Goyal
2018-12-13 20:09                                                         ` Stephen Smalley
2018-12-13 20:26                                                           ` Vivek Goyal
2018-12-04 15:42                               ` Stephen Smalley
2018-12-04 16:15                                 ` Vivek Goyal
2018-11-29 22:22                     ` Daniel Walsh
2018-12-03 23:27                       ` Paul Moore
2018-12-04 14:43                         ` Stephen Smalley
2018-12-04 23:01                           ` Paul Moore

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).