* Re: [PATCH 0/7] Initial support for user namespace owned mounts
From: Amir Goldstein @ 2015-07-31 8:11 UTC
To: Casey Schaufler
Cc: Seth Forshee, Theodore Ts'o, Stephen Smalley, Andy Lutomirski, Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On Thu, Jul 30, 2015 at 6:33 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
> On 7/30/2015 7:47 AM, Amir Goldstein wrote:
>> On Thu, Jul 30, 2015 at 4:55 PM, Seth Forshee
>> <seth.forshee@canonical.com> wrote:
>>> On Thu, Jul 30, 2015 at 07:24:11AM +0300, Amir Goldstein wrote:
>>>> On Tue, Jul 28, 2015 at 11:40 PM, Seth Forshee
>>>> <seth.forshee@canonical.com> wrote:
>>>>> On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote:
>>>>>>> This is what I currently think you want for user ns mounts:
>>>>>>>
>>>>>>> 1. smk_root and smk_default are assigned the label of the backing
>>>>>>> device.
>>>> Seth,
>>>>
>>>> There were 2 main concerns discussed in this thread:
>>>> 1. trusting LSM labels outside the namespace
>>>> 2. trusting the content of the image file/loopdev
>>>>
>>>> While your approach addresses the first concern, I suspect it may be placing
>>>> an obstacle in the way of resolving the second concern.
>>>>
>>>> A viable security policy to mitigate the second concern could be:
>>>> - Allow only trusted programs (e.g. mkfs, fsck) to write to 'Loopback' images
>>>> - Allow mount only of 'Loopback' images
>>>>
>>>> This should allow the system as a whole to trust unprivileged mounts based on
>>>> the trust of the entities that had raw access to the fs layout.
>>> You don't really say what you mean by "trusted" programs. In a container
>>> context I'd have to assume that you mean suid-root or similar programs
>>> shared into the container by the host. In that case is any new kernel
>>> functionality even required?
>> Sorry I was not clear. I will try to explain better.
>> I meant that the programs are "trusted" by the LSM security policy.
>> I envisioned a system where unprivileged user is allowed to spawn
>> a container which contains "trusted" programs (e.g. mkfs) that are labeled
>> as 'FileSystemTools' by the admin of the host.
>> FileSystemTools are allowed to write into Loopback labeled files.
>
> You could do this on a Smack based system. It would require
> CAP_MAC_ADMIN and CAP_MAC_OVERRIDE to set up. You would need
> to set some SMACK64EXEC labels on your FileSystemTools, and
> they would have to be written as carefully as they would if they
> had "more" privilege. You'd need to designate a repository for
> your loopback files. On the whole, it would be unattractive.
> I will pass on providing the details for fear someone will like
> it well enough to implement.
>
>>> That also doesn't work for some of our use cases, where we'd like to be
>>> able to do something like "mount -o loop foo.img /mnt/foo" in an
>>> unprivileged container where foo.img is not created on the local machine
>>> and not fully under control of the host environment.
>> That use case will not be addressed by the policy I suggested,
>> but the more common case of:
>> - create a loopback file
>> - mkfs
>> - mount
>> will be addressed.
>>
>> So if the (host) admin of the system trusts that unprivileged user cannot create
>> a malicious fs layout using mkfs and fsck alone, then the system is relatively safe
>> mounting (non fuse) file systems from loopback files.
>> IMHO, this statement is going to be easier for Ted to sign.
>
> But that sort of defeats the purpose of unprivileged mounts.
> Or rather, you're trying to place restrictions on what an
> unprivileged user can do without calling the ability to
> violate those restrictions "privilege".

I don't understand your concern.

I am saying that LSM can come to the rescue, in a use case that
many have been considering as unsolvable (i.e. the loopback tampering).

Yes, I am trying to place restrictions on what an unprivileged user can do.
As it stands right now, user is about to gain the ability to mount FUSE.
With some extra care on crafting the policy and without any extra code,
user can gain the ability to mount "trusted loopback files".
It does not solve all use cases, but it does solve a handful.

Anyway, the concern I was raising was about the fact that if files inside
the loopback mount inherit the label of the loopback file, this policy is
going to be impossible to write.
But Stephen has already proposed an alternative to this implicit inherit rule
on [PATCH 6/7] thread, so I withdraw my concern.

>
>>
>>> Agreed though that the "attack from below" problem for untrusted
>>> filesystems is still an open question. At minimum we have fuse, which
>>> has been designed to protect against this threat. Others have mentioned
>>> on this thread that Ted had said something at kernel summit last year
>>> about being willing to support ext4 mounts from unprivileged user
>>> namespaces as well. I've added Ted to the Cc in case he wants to confirm
>>> or deny this rumor.
>>>
>>>> Alas, if you choose to propagate the backing dev label to contained files,
>>>> they would all share the designated 'Loopback' label and render the policy above
>>>> useless.
>>>>
>>>> Any thoughts on how to reconcile this conflict?
>>> I'm not seeing what the conflict is here - nothing you proposed says
>>> anything about security labels in the filesystem, and nothing would
>>> prevent a "trusted" program with CAP_MAC_ADMIN from setting whatever
>>> label was desired on the backing device. Care to elaborate?
>>>
>>> Seth
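The policy discussed in this message (only trusted tools may write 'Loopback' images, and only 'Loopback' images may be mounted), together with the label-inheritance concern, can be illustrated with a toy model. This is only a sketch and not Smack or SELinux code. The labels 'FileSystemTools' and 'Loopback' come from the thread; the 'User' and 'UserData' labels, the rule table, and the helper functions are hypothetical. The `inherit=False` case stands in for any alternative labeling rule; the thread does not spell that alternative out here.

```python
# Toy model of a label-based MAC policy. NOT real Smack/SELinux semantics.
# Rules map (subject label, object label) to a set of access letters.
RULES = {
    ("FileSystemTools", "Loopback"): {"r", "w"},  # mkfs/fsck may write raw images
    ("User", "Loopback"): {"r"},                  # hypothetical: user may only read images
    ("User", "UserData"): {"r", "w"},             # hypothetical label for the user's files
}


def allowed(subject, obj, access):
    """Return True if the rule table grants `access` to `subject` on `obj`."""
    return access in RULES.get((subject, obj), set())


def may_mount(image_label):
    """Policy from the thread: only 'Loopback'-labeled images may be mounted."""
    return image_label == "Loopback"


def file_label(image_label, stored_label, inherit):
    """Label seen on a file inside the mount.

    inherit=True models propagating the backing file's label to every file.
    inherit=False models some other rule that yields a distinct per-file label.
    """
    return image_label if inherit else stored_label


# The raw image can be modified only by the trusted tools.
assert allowed("FileSystemTools", "Loopback", "w")
assert not allowed("User", "Loopback", "w")

# With inheritance, files inside the mount are labeled 'Loopback' as well, so
# the user cannot write them unless granted write access on 'Loopback', which
# would also open up the raw image.
inside = file_label("Loopback", "UserData", inherit=True)
print(allowed("User", inside, "w"))   # False

# Without inheritance, image and contents can be told apart by the policy.
inside = file_label("Loopback", "UserData", inherit=False)
print(allowed("User", inside, "w"))   # True
```

In this model the two goals collide only when image and contents share one label, which is one reading of why the inherit rule was said to make the policy impossible to write.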
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
From: Casey Schaufler @ 2015-07-31 19:56 UTC
To: Amir Goldstein
Cc: Seth Forshee, Theodore Ts'o, Stephen Smalley, Andy Lutomirski, Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel, Casey Schaufler

On 7/31/2015 1:11 AM, Amir Goldstein wrote:
> On Thu, Jul 30, 2015 at 6:33 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>> On 7/30/2015 7:47 AM, Amir Goldstein wrote:
>>> On Thu, Jul 30, 2015 at 4:55 PM, Seth Forshee
>>> <seth.forshee@canonical.com> wrote:
>>>> On Thu, Jul 30, 2015 at 07:24:11AM +0300, Amir Goldstein wrote:
>>>>> On Tue, Jul 28, 2015 at 11:40 PM, Seth Forshee
>>>>> <seth.forshee@canonical.com> wrote:
>>>>>> On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote:
>>>>>>>> This is what I currently think you want for user ns mounts:
>>>>>>>>
>>>>>>>> 1. smk_root and smk_default are assigned the label of the backing
>>>>>>>> device.
>>>>> Seth,
>>>>>
>>>>> There were 2 main concerns discussed in this thread:
>>>>> 1. trusting LSM labels outside the namespace
>>>>> 2. trusting the content of the image file/loopdev
>>>>>
>>>>> While your approach addresses the first concern, I suspect it may be placing
>>>>> an obstacle in the way of resolving the second concern.
>>>>>
>>>>> A viable security policy to mitigate the second concern could be:
>>>>> - Allow only trusted programs (e.g. mkfs, fsck) to write to 'Loopback' images
>>>>> - Allow mount only of 'Loopback' images
>>>>>
>>>>> This should allow the system as a whole to trust unprivileged mounts based on
>>>>> the trust of the entities that had raw access to the fs layout.
>>>> You don't really say what you mean by "trusted" programs. In a container
>>>> context I'd have to assume that you mean suid-root or similar programs
>>>> shared into the container by the host. In that case is any new kernel
>>>> functionality even required?
>>> Sorry I was not clear. I will try to explain better.
>>> I meant that the programs are "trusted" by the LSM security policy.
>>> I envisioned a system where unprivileged user is allowed to spawn
>>> a container which contains "trusted" programs (e.g. mkfs) that are labeled
>>> as 'FileSystemTools' by the admin of the host.
>>> FileSystemTools are allowed to write into Loopback labeled files.
>> You could do this on a Smack based system. It would require
>> CAP_MAC_ADMIN and CAP_MAC_OVERRIDE to set up. You would need
>> to set some SMACK64EXEC labels on your FileSystemTools, and
>> they would have to be written as carefully as they would if they
>> had "more" privilege. You'd need to designate a repository for
>> your loopback files. On the whole, it would be unattractive.
>> I will pass on providing the details for fear someone will like
>> it well enough to implement.
>>
>>>> That also doesn't work for some of our use cases, where we'd like to be
>>>> able to do something like "mount -o loop foo.img /mnt/foo" in an
>>>> unprivileged container where foo.img is not created on the local machine
>>>> and not fully under control of the host environment.
>>> That use case will not be addressed by the policy I suggested,
>>> but the more common case of:
>>> - create a loopback file
>>> - mkfs
>>> - mount
>>> will be addressed.
>>>
>>> So if the (host) admin of the system trusts that unprivileged user cannot create
>>> a malicious fs layout using mkfs and fsck alone, then the system is relatively safe
>>> mounting (non fuse) file systems from loopback files.
>>> IMHO, this statement is going to be easier for Ted to sign.
>> But that sort of defeats the purpose of unprivileged mounts.
>> Or rather, you're trying to place restrictions on what an
>> unprivileged user can do without calling the ability to
>> violate those restrictions "privilege".
> I don't understand your concern.

My concern is that you're playing a shell game. Allow unprivileged
mounts, but only of things that were created using privilege. How
is that better than requiring privilege to do the mount?

> I am saying that LSM can come to the rescue, in a use case that
> many have been considering as unsolvable (i.e. the loopback tampering).
>
> Yes, I am trying to place restrictions on what an unprivileged user can do.
> As it stands right now, user is about to gain the ability to mount FUSE.
> With some extra care on crafting the policy and without any extra code,
> user can gain the ability to mount "trusted loopback files".
> It does not solve all use cases, but it does solve a handful.

As I said, you can do this, but it will be ugly, and people won't
understand how to use it correctly. The distance between the "trusted"
creation of the filesystem and the "untrusted" mount is too great.
Plus, there are too many ways to circumvent the integrity of your
"trusted" filesystem.

> Anyway, the concern I was raising was about the fact that if files inside
> the loopback mount inherit the label of the loopback file, this policy is
> going to be impossible to write.
> But Stephen has already proposed an alternative to this implicit inherit rule
> on [PATCH 6/7] thread, so I withdraw my concern.

What Stephen has proposed is dandy for SELinux.

>
>
>>>> Agreed though that the "attack from below" problem for untrusted
>>>> filesystems is still an open question. At minimum we have fuse, which
>>>> has been designed to protect against this threat. Others have mentioned
>>>> on this thread that Ted had said something at kernel summit last year
>>>> about being willing to support ext4 mounts from unprivileged user
>>>> namespaces as well. I've added Ted to the Cc in case he wants to confirm
>>>> or deny this rumor.
>>>>
>>>>> Alas, if you choose to propagate the backing dev label to contained files,
>>>>> they would all share the designated 'Loopback' label and render the policy above
>>>>> useless.
>>>>>
>>>>> Any thoughts on how to reconcile this conflict?
>>>> I'm not seeing what the conflict is here - nothing you proposed says
>>>> anything about security labels in the filesystem, and nothing would
>>>> prevent a "trusted" program with CAP_MAC_ADMIN from setting whatever
>>>> label was desired on the backing device. Care to elaborate?
>>>>
>>>> Seth
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
From: Amir Goldstein @ 2015-08-01 17:01 UTC
To: Casey Schaufler
Cc: Seth Forshee, Theodore Ts'o, Stephen Smalley, Andy Lutomirski, Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On Fri, Jul 31, 2015 at 10:56 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
> On 7/31/2015 1:11 AM, Amir Goldstein wrote:
>> On Thu, Jul 30, 2015 at 6:33 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>>> On 7/30/2015 7:47 AM, Amir Goldstein wrote:
>>>> On Thu, Jul 30, 2015 at 4:55 PM, Seth Forshee
>>>> <seth.forshee@canonical.com> wrote:
>>>>> On Thu, Jul 30, 2015 at 07:24:11AM +0300, Amir Goldstein wrote:
>>>>>> On Tue, Jul 28, 2015 at 11:40 PM, Seth Forshee
>>>>>> <seth.forshee@canonical.com> wrote:
>>>>>>> On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote:
>>>>>>>>> This is what I currently think you want for user ns mounts:
>>>>>>>>>
>>>>>>>>> 1. smk_root and smk_default are assigned the label of the backing
>>>>>>>>> device.
>>>>>> Seth,
>>>>>>
>>>>>> There were 2 main concerns discussed in this thread:
>>>>>> 1. trusting LSM labels outside the namespace
>>>>>> 2. trusting the content of the image file/loopdev
>>>>>>
>>>>>> While your approach addresses the first concern, I suspect it may be placing
>>>>>> an obstacle in the way of resolving the second concern.
>>>>>>
>>>>>> A viable security policy to mitigate the second concern could be:
>>>>>> - Allow only trusted programs (e.g. mkfs, fsck) to write to 'Loopback' images
>>>>>> - Allow mount only of 'Loopback' images
>>>>>>
>>>>>> This should allow the system as a whole to trust unprivileged mounts based on
>>>>>> the trust of the entities that had raw access to the fs layout.
>>>>> You don't really say what you mean by "trusted" programs. In a container
>>>>> context I'd have to assume that you mean suid-root or similar programs
>>>>> shared into the container by the host. In that case is any new kernel
>>>>> functionality even required?
>>>> Sorry I was not clear. I will try to explain better.
>>>> I meant that the programs are "trusted" by the LSM security policy.
>>>> I envisioned a system where unprivileged user is allowed to spawn
>>>> a container which contains "trusted" programs (e.g. mkfs) that are labeled
>>>> as 'FileSystemTools' by the admin of the host.
>>>> FileSystemTools are allowed to write into Loopback labeled files.
>>> You could do this on a Smack based system. It would require
>>> CAP_MAC_ADMIN and CAP_MAC_OVERRIDE to set up. You would need
>>> to set some SMACK64EXEC labels on your FileSystemTools, and
>>> they would have to be written as carefully as they would if they
>>> had "more" privilege. You'd need to designate a repository for
>>> your loopback files. On the whole, it would be unattractive.
>>> I will pass on providing the details for fear someone will like
>>> it well enough to implement.
>>>
>>>>> That also doesn't work for some of our use cases, where we'd like to be
>>>>> able to do something like "mount -o loop foo.img /mnt/foo" in an
>>>>> unprivileged container where foo.img is not created on the local machine
>>>>> and not fully under control of the host environment.
>>>> That use case will not be addressed by the policy I suggested,
>>>> but the more common case of:
>>>> - create a loopback file
>>>> - mkfs
>>>> - mount
>>>> will be addressed.
>>>>
>>>> So if the (host) admin of the system trusts that unprivileged user cannot create
>>>> a malicious fs layout using mkfs and fsck alone, then the system is relatively safe
>>>> mounting (non fuse) file systems from loopback files.
>>>> IMHO, this statement is going to be easier for Ted to sign.
>>> But that sort of defeats the purpose of unprivileged mounts.
>>> Or rather, you're trying to place restrictions on what an
>>> unprivileged user can do without calling the ability to
>>> violate those restrictions "privilege".
>> I don't understand your concern.
>
> My concern is that you're playing a shell game. Allow unprivileged
> mounts, but only of things that were created using privilege. How
> is that better than requiring privilege to do the mount?

To me, the ability of an admin to delegate permissions to unprivileged user
to mkfs/fsck/mount "trusted" loopdevs sounds very useful.
But I am not going to argue that use case any further.

I do agree that it would have been much better if user namespace could allow
unprivileged mounts of certain non FUSE file systems without relying on
specially crafted security policies, but I do not see how that can happen.

>
>> I am saying that LSM can come to the rescue, in a use case that
>> many have been considering as unsolvable (i.e. the loopback tampering).
>>
>> Yes, I am trying to place restrictions on what an unprivileged user can do.
>> As it stands right now, user is about to gain the ability to mount FUSE.
>> With some extra care on crafting the policy and without any extra code,
>> user can gain the ability to mount "trusted loopback files".
>> It does not solve all use cases, but it does solve a handful.
>
> As I said, you can do this, but it will be ugly, and people won't
> understand how to use it correctly. The distance between the "trusted"
> creation of the filesystem and the "untrusted" mount is too great.
> Plus, there are too many ways to circumvent the integrity of your
> "trusted" filesystem.
>
>> Anyway, the concern I was raising was about the fact that if files inside
>> the loopback mount inherit the label of the loopback file, this policy is
>> going to be impossible to write.
>> But Stephen has already proposed an alternative to this implicit inherit rule
>> on [PATCH 6/7] thread, so I withdraw my concern.
>
> What Stephen has proposed is dandy for SELinux.
>
>>
>>
>>>>> Agreed though that the "attack from below" problem for untrusted
>>>>> filesystems is still an open question. At minimum we have fuse, which
>>>>> has been designed to protect against this threat. Others have mentioned
>>>>> on this thread that Ted had said something at kernel summit last year
>>>>> about being willing to support ext4 mounts from unprivileged user
>>>>> namespaces as well. I've added Ted to the Cc in case he wants to confirm
>>>>> or deny this rumor.
>>>>>
>>>>>> Alas, if you choose to propagate the backing dev label to contained files,
>>>>>> they would all share the designated 'Loopback' label and render the policy above
>>>>>> useless.
>>>>>>
>>>>>> Any thoughts on how to reconcile this conflict?
>>>>> I'm not seeing what the conflict is here - nothing you proposed says
>>>>> anything about security labels in the filesystem, and nothing would
>>>>> prevent a "trusted" program with CAP_MAC_ADMIN from setting whatever
>>>>> label was desired on the backing device. Care to elaborate?
>>>>>
>>>>> Seth
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
@ 2015-08-01 17:01 ` Amir Goldstein
  0 siblings, 0 replies; 138+ messages in thread
From: Amir Goldstein @ 2015-08-01 17:01 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Serge Hallyn, Theodore Ts'o, linux-kernel, Andy Lutomirski, Seth Forshee, LSM List, SELinux-NSA, Linux FS Devel, Stephen Smalley, Alexander Viro

On Fri, Jul 31, 2015 at 10:56 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
> On 7/31/2015 1:11 AM, Amir Goldstein wrote:
>> On Thu, Jul 30, 2015 at 6:33 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>>> On 7/30/2015 7:47 AM, Amir Goldstein wrote:
>>>> On Thu, Jul 30, 2015 at 4:55 PM, Seth Forshee
>>>> <seth.forshee@canonical.com> wrote:
>>>>> On Thu, Jul 30, 2015 at 07:24:11AM +0300, Amir Goldstein wrote:
>>>>>> On Tue, Jul 28, 2015 at 11:40 PM, Seth Forshee
>>>>>> <seth.forshee@canonical.com> wrote:
>>>>>>> On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote:
>>>>>>>>> This is what I currently think you want for user ns mounts:
>>>>>>>>>
>>>>>>>>> 1. smk_root and smk_default are assigned the label of the backing
>>>>>>>>> device.
>>>>>> Seth,
>>>>>>
>>>>>> There were 2 main concerns discussed in this thread:
>>>>>> 1. trusting LSM labels outside the namespace
>>>>>> 2. trusting the content of the image file/loopdev
>>>>>>
>>>>>> While your approach addresses the first concern, I suspect it may be placing
>>>>>> an obstacle in the way of resolving the second concern.
>>>>>>
>>>>>> A viable security policy to mitigate the second concern could be:
>>>>>> - Allow only trusted programs (e.g. mkfs, fsck) to write to 'Loopback' images
>>>>>> - Allow mount only of 'Loopback' images
>>>>>>
>>>>>> This should allow the system as a whole to trust unprivileged mounts based on
>>>>>> the trust of the entities that had raw access to the fs layout.
>>>>> You don't really say what you mean by "trusted" programs. In a container
>>>>> context I'd have to assume that you mean suid-root or similar programs
>>>>> shared into the container by the host. In that case is any new kernel
>>>>> functionality even required?
>>>> Sorry I was not clear. I will try to explain better.
>>>> I meant that the programs are "trusted" by the LSM security policy.
>>>> I envisioned a system where unprivileged user is allowed to spawn
>>>> a container which contains "trusted" programs (e.g. mkfs) that are labeled
>>>> as 'FileSystemTools' by the admin of the host.
>>>> FileSystemTools are allowed to write into Loopback labeled files.
>>> You could do this on a Smack based system. It would require
>>> CAP_MAC_ADMIN and CAP_MAC_OVERRIDE to set up. You would need
>>> to set some SMACK64EXEC labels on your FileSystemTools, and
>>> they would have to be written as carefully as they would if they
>>> had "more" privilege. You'd need to designate a repository for
>>> your loopback files. On the whole, it would be unattractive.
>>> I will pass on providing the details for fear someone will like
>>> it well enough to implement.
>>>
>>>>> That also doesn't work for some of our use cases, where we'd like to be
>>>>> able to do something like "mount -o loop foo.img /mnt/foo" in an
>>>>> unprivileged container where foo.img is not created on the local machine
>>>>> and not fully under control of the host environment.
>>>> That use case will not be addressed by the policy I suggested,
>>>> but the more common case of:
>>>> - create a loopback file
>>>> - mkfs
>>>> - mount
>>>> will be addressed.
>>>>
>>>> So if the (host) admin of the system trusts that unprivileged user cannot create
>>>> a malicious fs layout using mkfs and fsck alone, then the system is
>>>> relatively safe
>>>> mounting (non fuse) file systems from loopback files.
>>>> IMHO, this statement is going to be easier for Ted to sign.
>>> But that sort of defeats the purpose of unprivileged mounts.
>>> Or rather, you're trying to place restrictions on what an
>>> unprivileged user can do without calling the ability to
>>> violate those restrictions "privilege".
>> I don't understand your concern.
>
> My concern is that you're playing a shell game. Allow unprivileged
> mounts, but only of things that were created using privilege. How
> is that better than requiring privilege to do the mount?

To me, the ability of an admin to delegate to an unprivileged user
permission to mkfs/fsck/mount "trusted" loopdevs sounds very useful.
But I am not going to argue that use case any further.

I do agree that it would be much better if user namespaces could allow
unprivileged mounts of certain non-FUSE file systems without relying on
specially crafted security policies, but I do not see how that can happen.

>
>> I am saying that LSM can come to the rescue, in a use case that
>> many have been considering as unsolvable (i.e. the loopback tampering).
>>
>> Yes, I am trying to place restrictions on what an unprivileged user can do.
>> As it stands right now, user is about to gain the ability to mount FUSE.
>> With some extra care on crafting the policy and without any extra code,
>> user can gain the ability to mount "trusted loopback files".
>> It does not solve all use cases, but it does solve a handful.
>
> As I said, you can do this, but it will be ugly, and people won't
> understand how to use it correctly. The distance between the "trusted"
> creation of the filesystem and the "untrusted" mount is too great.
> Plus, there are too many ways to circumvent the integrity of your
> "trusted" filesystem.
>
>> Anyway, the concern I was raising was about the fact that if files inside
>> the loopback mount inherit the label of the loopback file, this policy is
>> going to be impossible to write.
>> But Stephen has already proposed an alternative to this implicit inherit rule
>> on [PATCH 6/7] thread, so I withdraw my concern.
>
> What Stephen has proposed is dandy for SELinux.
>
>>
>>
>>>>> Agreed though that the "attack from below" problem for untrusted
>>>>> filesystems is still an open question. At minimum we have fuse, which
>>>>> has been designed to protect against this threat. Others have mentioned
>>>>> on this thread that Ted had said something at kernel summit last year
>>>>> about being willing to support ext4 mounts from unprivileged user
>>>>> namespaces as well. I've added Ted to the Cc in case he wants to confirm
>>>>> or deny this rumor.
>>>>>
>>>>>> Alas, if you choose to propagate the backing dev label to contained files,
>>>>>> they would all share the designated 'Loopback' label and render the policy above
>>>>>> useless.
>>>>>>
>>>>>> Any thoughts on how to reconcile this conflict?
>>>>> I'm not seeing what the conflict is here - nothing you proposed says
>>>>> anything about security labels in the filesystem, and nothing would
>>>>> prevent a "trusted" program with CAP_MAC_ADMIN from setting whatever
>>>>> label was desired on the backing device. Care to elaborate?
>>>>>
>>>>> Seth
>

^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
@ 2015-07-30 4:24 ` Amir Goldstein
  0 siblings, 0 replies; 138+ messages in thread
From: Amir Goldstein @ 2015-07-30 4:24 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Casey Schaufler, Stephen Smalley, Andy Lutomirski, Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On Tue, Jul 28, 2015 at 11:40 PM, Seth Forshee
<seth.forshee@canonical.com> wrote:
>
> On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote:
> > > This is what I currently think you want for user ns mounts:
> > >
> > > 1. smk_root and smk_default are assigned the label of the backing
> > > device.

Seth,

There were 2 main concerns discussed in this thread:
1. trusting LSM labels outside the namespace
2. trusting the content of the image file/loopdev

While your approach addresses the first concern, I suspect it may be placing
an obstacle in the way of resolving the second concern.

A viable security policy to mitigate the second concern could be:
- Allow only trusted programs (e.g. mkfs, fsck) to write to 'Loopback' images
- Allow mount only of 'Loopback' images

This should allow the system as a whole to trust unprivileged mounts based on
the trust of the entities that had raw access to the fs layout.

Alas, if you choose to propagate the backing dev label to contained files,
they would all share the designated 'Loopback' label and render the policy above
useless.

Any thoughts on how to reconcile this conflict?

Amir.

> > > 2. s_root is assigned the transmute property.
> > > 3. For existing files:
> > > a. Files with the same label as the backing device are accessible.
> > > b. Files with any other label are not accessible.
> >
> > That's right. Accept correct data, reject anything that's not right.
> >
> > > If this is right, there are a couple lingering questions in my mind.
> > >
> > > First, what happens with files created in directories with the same
> > > label as the backing device but without the transmute property set? The
> > > inode for the new file will initially be labeled with smk_of_current(),
> > > but then during d_instantiate it will get smk_default and thus end up
> > > with the label we want. So that seems okay.
> >
> > Yes.
> >
> > > The second is whether files with the SMACK64EXEC attribute are still a
> > > problem. It seems it is, for files with the same label as the backing
> > > store at least. I think we can simply skip the code that reads out this
> > > xattr and sets smk_task for user ns mounts, or else skip assigning the
> > > label to the new task in bprm_set_creds. The latter seems more
> > > consistent with the approach you've suggested for dealing with labels
> > > from disk.
> >
> > Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in
> > smack_d_instantiate for unprivileged mounts would do the trick.
> >
> > > So I guess all of that seems okay, though perhaps a bit restrictive
> > > given that the user who mounted the filesystem already has full access
> > > to the backing store.
> >
> > In truth, there is no reason to expect that the "user" who did the
> > mount will ever have a Smack label that differs from the label of
> > the backing store. If what we've got here seems restrictive, it's
> > because you've got access from someone other than the "user".
> >
> > > Please let me know whether or not this matches up with what you are
> > > thinking, then I can proceed with the implementation.
> >
> > My current mindset is that, if you're going to allow unprivileged
> > mounts of user defined backing stores, this is as safe as we can
> > make it.
>
> All right, I've got a patch which I think does this, and I've managed to
> do some testing to confirm that it behaves like I expect. How does this
> look?
>
> What's missing is getting the label from the block device inode; as
> Stephen discovered the inode that I thought we could get the label from
> turned out to be the wrong one. Afaict we would need a new hook in order
> to do that, so for now I'm using the label of the process calling
> mount.
>
> ---
>
> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
> index a143328f75eb..8e631a66b03c 100644
> --- a/security/smack/smack_lsm.c
> +++ b/security/smack/smack_lsm.c
> @@ -662,6 +662,8 @@ static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data)
>  		skp = smk_of_current();
>  		sp->smk_root = skp;
>  		sp->smk_default = skp;
> +		if (sb_in_userns(sb))
> +			transmute = 1;
>  	}
>  	/*
>  	 * Initialize the root inode.
> @@ -1023,6 +1025,12 @@ static int smack_inode_permission(struct inode *inode, int mask)
>  	if (mask == 0)
>  		return 0;
>
> +	if (sb_in_userns(inode->i_sb)) {
> +		struct superblock_smack *sbsp = inode->i_sb->s_security;
> +		if (smk_of_inode(inode) != sbsp->smk_root)
> +			return -EACCES;
> +	}
> +
>  	/* May be droppable after audit */
>  	if (no_block)
>  		return -ECHILD;
> @@ -3220,14 +3228,16 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
>  			if (rc >= 0)
>  				transflag = SMK_INODE_TRANSMUTE;
>  		}
> -		/*
> -		 * Don't let the exec or mmap label be "*" or "@".
> -		 */
> -		skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
> -		if (IS_ERR(skp) || skp == &smack_known_star ||
> -		    skp == &smack_known_web)
> -			skp = NULL;
> -		isp->smk_task = skp;
> +		if (!sb_in_userns(inode->i_sb)) {
> +			/*
> +			 * Don't let the exec or mmap label be "*" or "@".
> +			 */
> +			skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
> +			if (IS_ERR(skp) || skp == &smack_known_star ||
> +			    skp == &smack_known_web)
> +				skp = NULL;
> +			isp->smk_task = skp;
> +		}
>
>  		skp = smk_fetch(XATTR_NAME_SMACKMMAP, inode, dp);
>  		if (IS_ERR(skp) || skp == &smack_known_star ||
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 138+ messages in thread
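Editorial sketch (not part of the thread): the two behavioural changes in the quoted patch can be modelled in plain userspace C to make the intended rule easier to read. Everything named model_* below is a hypothetical stand-in invented for illustration. It is not the kernel's data structures or Seth's code, and it compares label strings where the kernel compares label pointers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins, not the kernel's structures. */
struct model_sb {
	bool in_userns;          /* plays the role of sb_in_userns(sb) */
	const char *root_label;  /* plays the role of sbsp->smk_root */
};

struct model_inode {
	const struct model_sb *sb;
	const char *label;       /* label as read from the filesystem */
};

/*
 * Extra gate from the smack_inode_permission() hunk: on a user ns mount,
 * refuse any inode whose label differs from the superblock root label.
 * Returns 0 when the gate passes (the regular Smack checks, not modelled
 * here, would still follow) and -EACCES otherwise.
 */
int model_label_gate(const struct model_inode *inode)
{
	if (inode->sb->in_userns &&
	    strcmp(inode->label, inode->sb->root_label) != 0)
		return -EACCES;
	return 0;
}

/*
 * SMACK64EXEC handling from the smack_d_instantiate() hunk: the xattr is
 * not consulted on a user ns mount, and "*" and "@" are never accepted.
 * Returns the label a new task would take, or NULL for "no exec label".
 */
const char *model_exec_label(const struct model_inode *inode,
			     const char *exec_xattr)
{
	if (inode->sb->in_userns || exec_xattr == NULL)
		return NULL;
	if (strcmp(exec_xattr, "*") == 0 || strcmp(exec_xattr, "@") == 0)
		return NULL;
	return exec_xattr;
}
```

In this model a file labelled "Other" on a user ns mount whose root label is "User" fails the gate, while the same file on an ordinary mount passes it and goes on to the usual checks.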
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-30 4:24 ` Amir Goldstein
@ 2015-07-30 13:55 ` Seth Forshee
  -1 siblings, 0 replies; 138+ messages in thread
From: Seth Forshee @ 2015-07-30 13:55 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Theodore Ts'o, Casey Schaufler, Stephen Smalley, Andy Lutomirski, Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On Thu, Jul 30, 2015 at 07:24:11AM +0300, Amir Goldstein wrote:
> On Tue, Jul 28, 2015 at 11:40 PM, Seth Forshee
> <seth.forshee@canonical.com> wrote:
> >
> > On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote:
> > > > This is what I currently think you want for user ns mounts:
> > > >
> > > > 1. smk_root and smk_default are assigned the label of the backing
> > > > device.
>
> Seth,
>
> There were 2 main concerns discussed in this thread:
> 1. trusting LSM labels outside the namespace
> 2. trusting the content of the image file/loopdev
>
> While your approach addresses the first concern, I suspect it may be placing
> an obstacle in the way of resolving the second concern.
>
> A viable security policy to mitigate the second concern could be:
> - Allow only trusted programs (e.g. mkfs, fsck) to write to 'Loopback' images
> - Allow mount only of 'Loopback' images
>
> This should allow the system as a whole to trust unprivileged mounts based on
> the trust of the entities that had raw access to the fs layout.

You don't really say what you mean by "trusted" programs. In a container
context I'd have to assume that you mean suid-root or similar programs
shared into the container by the host. In that case is any new kernel
functionality even required?

That also doesn't work for some of our use cases, where we'd like to be
able to do something like "mount -o loop foo.img /mnt/foo" in an
unprivileged container where foo.img is not created on the local machine
and not fully under control of the host environment.

Agreed though that the "attack from below" problem for untrusted
filesystems is still an open question. At minimum we have fuse, which
has been designed to protect against this threat. Others have mentioned
on this thread that Ted had said something at kernel summit last year
about being willing to support ext4 mounts from unprivileged user
namespaces as well. I've added Ted to the Cc in case he wants to confirm
or deny this rumor.

> Alas, if you choose to propagate the backing dev label to contained files,
> they would all share the designated 'Loopback' label and render the policy above
> useless.
>
> Any thoughts on how to reconcile this conflict?

I'm not seeing what the conflict is here - nothing you proposed says
anything about security labels in the filesystem, and nothing would
prevent a "trusted" program with CAP_MAC_ADMIN from setting whatever
label was desired on the backing device. Care to elaborate?

Seth

^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-30 13:55 ` Seth Forshee
@ 2015-07-30 14:47 ` Amir Goldstein
  -1 siblings, 0 replies; 138+ messages in thread
From: Amir Goldstein @ 2015-07-30 14:47 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Theodore Ts'o, Casey Schaufler, Stephen Smalley, Andy Lutomirski, Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On Thu, Jul 30, 2015 at 4:55 PM, Seth Forshee
<seth.forshee@canonical.com> wrote:
>
> On Thu, Jul 30, 2015 at 07:24:11AM +0300, Amir Goldstein wrote:
> > On Tue, Jul 28, 2015 at 11:40 PM, Seth Forshee
> > <seth.forshee@canonical.com> wrote:
> > >
> > > On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote:
> > > > > This is what I currently think you want for user ns mounts:
> > > > >
> > > > > 1. smk_root and smk_default are assigned the label of the backing
> > > > > device.
> >
> > Seth,
> >
> > There were 2 main concerns discussed in this thread:
> > 1. trusting LSM labels outside the namespace
> > 2. trusting the content of the image file/loopdev
> >
> > While your approach addresses the first concern, I suspect it may be placing
> > an obstacle in the way of resolving the second concern.
> >
> > A viable security policy to mitigate the second concern could be:
> > - Allow only trusted programs (e.g. mkfs, fsck) to write to 'Loopback' images
> > - Allow mount only of 'Loopback' images
> >
> > This should allow the system as a whole to trust unprivileged mounts based on
> > the trust of the entities that had raw access to the fs layout.
>
> You don't really say what you mean by "trusted" programs. In a container
> context I'd have to assume that you mean suid-root or similar programs
> shared into the container by the host. In that case is any new kernel
> functionality even required?

Sorry I was not clear. I will try to explain better.
I meant that the programs are "trusted" by the LSM security policy.
I envisioned a system where unprivileged user is allowed to spawn
a container which contains "trusted" programs (e.g. mkfs) that are labeled
as 'FileSystemTools' by the admin of the host.
FileSystemTools are allowed to write into Loopback labeled files.

>
> That also doesn't work for some of our use cases, where we'd like to be
> able to do something like "mount -o loop foo.img /mnt/foo" in an
> unprivileged container where foo.img is not created on the local machine
> and not fully under control of the host environment.

That use case will not be addressed by the policy I suggested,
but the more common case of:
- create a loopback file
- mkfs
- mount
will be addressed.

So if the (host) admin of the system trusts that unprivileged user cannot create
a malicious fs layout using mkfs and fsck alone, then the system is
relatively safe
mounting (non fuse) file systems from loopback files.
IMHO, this statement is going to be easier for Ted to sign.

>
> Agreed though that the "attack from below" problem for untrusted
> filesystems is still an open question. At minimum we have fuse, which
> has been designed to protect against this threat. Others have mentioned
> on this thread that Ted had said something at kernel summit last year
> about being willing to support ext4 mounts from unprivileged user
> namespaces as well. I've added Ted to the Cc in case he wants to confirm
> or deny this rumor.
>
> > Alas, if you choose to propagate the backing dev label to contained files,
> > they would all share the designated 'Loopback' label and render the policy above
> > useless.
> >
> > Any thoughts on how to reconcile this conflict?
>
> I'm not seeing what the conflict is here - nothing you proposed says
> anything about security labels in the filesystem, and nothing would
> prevent a "trusted" program with CAP_MAC_ADMIN from setting whatever
> label was desired on the backing device. Care to elaborate?
>
> Seth

^ permalink raw reply [flat|nested] 138+ messages in thread
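Editorial sketch (not part of the thread): the policy Amir describes above amounts to a small subject/object access table plus a mount precondition. The C below models that idea only. The 'FileSystemTools' and 'Loopback' labels come from the message, while the "User" label, the read-only rule for it, and every model_* name are assumptions made up for illustration. It does not reproduce how Smack or SELinux actually evaluate rules.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One row of a hypothetical subject/object/access table. */
struct model_rule {
	const char *subject;
	const char *object;
	const char *access;  /* e.g. "rw"; letters chosen for this sketch */
};

static const struct model_rule model_policy[] = {
	{ "FileSystemTools", "Loopback", "rw" }, /* trusted mkfs/fsck may write */
	{ "User",            "Loopback", "r"  }, /* assumed: ordinary user, read only */
};

/* True when some rule grants `want` ('r' or 'w') from subject to object. */
bool model_may(const char *subject, const char *object, char want)
{
	size_t n = sizeof(model_policy) / sizeof(model_policy[0]);

	for (size_t i = 0; i < n; i++) {
		const struct model_rule *r = &model_policy[i];

		if (strcmp(r->subject, subject) == 0 &&
		    strcmp(r->object, object) == 0)
			return want != '\0' && strchr(r->access, want) != NULL;
	}
	return false; /* no matching rule: deny */
}

/* Mount precondition: only images carrying the 'Loopback' label qualify. */
bool model_may_mount(const char *image_label)
{
	return strcmp(image_label, "Loopback") == 0;
}
```

Under these assumptions only a program running as 'FileSystemTools' can change the on-disk layout of a mountable image, which is the property the policy relies on. An image that arrives from elsewhere with any other label is simply not mountable.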
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-30 14:47 ` Amir Goldstein @ 2015-07-30 15:33 ` Casey Schaufler -1 siblings, 0 replies; 138+ messages in thread From: Casey Schaufler @ 2015-07-30 15:33 UTC (permalink / raw) To: Amir Goldstein, Seth Forshee Cc: Theodore Ts'o, Stephen Smalley, Andy Lutomirski, Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel, Casey Schaufler On 7/30/2015 7:47 AM, Amir Goldstein wrote: > On Thu, Jul 30, 2015 at 4:55 PM, Seth Forshee > <seth.forshee@canonical.com> wrote: >> On Thu, Jul 30, 2015 at 07:24:11AM +0300, Amir Goldstein wrote: >>> On Tue, Jul 28, 2015 at 11:40 PM, Seth Forshee >>> <seth.forshee@canonical.com> wrote: >>>> On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote: >>>>>> This is what I currently think you want for user ns mounts: >>>>>> >>>>>> 1. smk_root and smk_default are assigned the label of the backing >>>>>> device. >>> Seth, >>> >>> There were 2 main concerns discussed in this thread: >>> 1. trusting LSM labels outside the namespace >>> 2. trusting the content of the image file/loopdev >>> >>> While your approach addresses the first concern, I suspect it may be placing >>> an obstacle in a way for resolving the second concern. >>> >>> A viable security policy to mitigate the second concern could be: >>> - Allow only trusted programs (e.g. mkfs, fsck) to write to 'Loopback' images >>> - Allow mount only of 'Loopback' images >>> >>> This should allow the system as a whole to trust unprivileged mounts based on >>> the trust of the entities that had raw access the the fs layout. >> You don't really say what you mean by "trusted" programs. In a container >> context I'd have to assume that you mean suid-root or similar programs >> shared into the container by the host. In that case is any new kernel >> functionality even required? > Sorry I was not clear. I will try to explain better. 
> I meant that the programs are "trusted" by the LSM security policy. > I envisioned a system where unprivileged user is allowed to spawn > a container which contains "trusted" programs (e.g. mkfs) that are labeled > as 'FileSystemTools' by the admin of the host. > FileSystemTools are allowed to write into Loopback labeled files. You could do this on a Smack-based system. It would require CAP_MAC_ADMIN and CAP_MAC_OVERRIDE to set up. You would need to set some SMACK64EXEC labels on your FileSystemTools, and they would have to be written as carefully as they would if they had "more" privilege. You'd need to designate a repository for your loopback files. On the whole, it would be unattractive. I will pass on providing the details for fear someone will like it well enough to implement. >> That also doesn't work for some of our use cases, where we'd like to be >> able to do something like "mount -o loop foo.img /mnt/foo" in an >> unprivileged container where foo.img is not created on the local machine >> and not fully under control of the host environment. > That use case will not be addressed by the policy I suggested, > but the more common case of: > - create a loopback file > - mkfs > - mount > will be addressed. > > So if the (host) admin of the system trusts that unprivileged user cannot create > a malicious fs layout using mkfs and fsck alone, then the system is > relatively safe > mounting (non fuse) file systems from loopback files. > IMHO, this statement is going to be easier for Ted to sign. But that sort of defeats the purpose of unprivileged mounts. Or rather, you're trying to place restrictions on what an unprivileged user can do without calling the ability to violate those restrictions "privilege". > >> Agreed though that the "attack from below" problem for untrusted >> filesystems is still an open question. At minimum we have fuse, which >> has been designed to protect against this threat. 
Others have mentioned >> on this thread that Ted had said something at kernel summit last year >> about being willing to support ext4 mounts from unprivileged user >> namespaces as well. I've added Ted to the Cc in case he wants to confirm >> or deny this rumor. >> >>> Alas, if you choose to propagate the backing dev label to contained files, >>> they would all share the designated 'Loopback' label and render the policy above >>> useless. >>> >>> Any thoughts on how to reconcile this conflict? >> I'm not seeing what the conflict is here - nothing you proposed says >> anything about security labels in the filesystem, and nothing would >> prevent a "trusted" program with CAP_MAC_ADMIN from setting whatever >> label was desired on the backing device. Care to elaborate? >> >> Seth ^ permalink raw reply [flat|nested] 138+ messages in thread
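For readers following the 'Loopback' / 'FileSystemTools' proposal discussed in the message above, the two rules can be modelled in a few lines of C. This is only a sketch under stated assumptions: the two label names are taken from the thread, while may_write_image() and may_mount_image() are hypothetical helpers, and the plain string comparison is a stand-in for, not a description of, how Smack evaluates its rules.

```c
#include <stdbool.h>
#include <string.h>

/* Label names as used in the thread; everything else here is hypothetical. */
#define LABEL_LOOPBACK "Loopback"
#define LABEL_FSTOOLS  "FileSystemTools"

/*
 * Rule 1: only programs running with the FileSystemTools label
 * (e.g. mkfs, fsck) may write to an image labeled Loopback.
 * Images carrying any other label are outside this sketch and
 * are left to the rest of the policy, modelled here as "allowed".
 */
static bool may_write_image(const char *task_label, const char *image_label)
{
    if (strcmp(image_label, LABEL_LOOPBACK) != 0)
        return true;
    return strcmp(task_label, LABEL_FSTOOLS) == 0;
}

/* Rule 2: an unprivileged mount is accepted only for Loopback images. */
static bool may_mount_image(const char *image_label)
{
    return strcmp(image_label, LABEL_LOOPBACK) == 0;
}
```

Under this model an image that arrives from elsewhere (the foo.img case raised in the thread) never carries the Loopback label, so may_mount_image() rejects it, which matches the stated limitation of the proposal.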
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-30 15:33 ` Casey Schaufler @ 2015-07-30 15:52 ` Colin Walters -1 siblings, 0 replies; 138+ messages in thread From: Colin Walters @ 2015-07-30 15:52 UTC (permalink / raw) To: Casey Schaufler, Amir Goldstein, Seth Forshee Cc: Theodore Ts'o, Stephen Smalley, Andy Lutomirski, Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel It's worth noting here that I think a lot of the use cases for unprivileged mounts are testing/development type things, and these are pretty well covered by: http://libguestfs.org/ Basically it just runs the host kernel in a VM, and the userspace is a minimal agent that you can talk to over virtio. You can use the API, or `guestmount` exposes it via FUSE. It doesn't magically make the kernel filesystems robust against untrusted input, but in the case of compromise, it's an "unprivileged" VM. I've used it for several projects and been quite happy. ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-30 15:52 ` Colin Walters @ 2015-07-30 16:15 ` Eric W. Biederman -1 siblings, 0 replies; 138+ messages in thread From: Eric W. Biederman @ 2015-07-30 16:15 UTC (permalink / raw) To: Colin Walters Cc: Casey Schaufler, Amir Goldstein, Seth Forshee, Theodore Ts'o, Stephen Smalley, Andy Lutomirski, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel Colin Walters <walters@verbum.org> writes: > It's worth noting here that I think a lot of the use cases > for unprivileged mounts are testing/development type things, > and these are pretty well covered by: > > http://libguestfs.org/ > > Basically it just runs the host kernel in a VM, and the userspace > is a minimal agent that you can talk to over virtio. You can use > the API, or `guestmount` exposes it via FUSE. > > It doesn't magically make the kernel filesystems robust against > untrusted input, but in the case of compromise, it's an > "unprivileged" VM. I've used it for several projects and been > quite happy. Thanks for pointing this out. That makes it clear we only have to get as far as making fuse work for this work to be useful in practice. Eric ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-30 4:24 ` Amir Goldstein @ 2015-07-30 13:57 ` Serge Hallyn -1 siblings, 0 replies; 138+ messages in thread From: Serge Hallyn @ 2015-07-30 13:57 UTC (permalink / raw) To: Amir Goldstein Cc: Seth Forshee, Casey Schaufler, Stephen Smalley, Andy Lutomirski, Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel Quoting Amir Goldstein (amir@cellrox.com): > On Tue, Jul 28, 2015 at 11:40 PM, Seth Forshee > <seth.forshee@canonical.com> wrote: > > > > On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote: > > > > This is what I currently think you want for user ns mounts: > > > > > > > > 1. smk_root and smk_default are assigned the label of the backing > > > > device. > > Seth, > > There were 2 main concerns discussed in this thread: > 1. trusting LSM labels outside the namespace > 2. trusting the content of the image file/loopdev > > While your approach addresses the first concern, I suspect it may be placing > an obstacle in a way for resolving the second concern. > > A viable security policy to mitigate the second concern could be: > - Allow only trusted programs (e.g. mkfs, fsck) to write to 'Loopback' images > - Allow mount only of 'Loopback' images > > This should allow the system as a whole to trust unprivileged mounts based on > the trust of the entities that had raw access the the fs layout. Just to be sure I understand right, you're looking for a way to let the host admin trust that the kernel's superblock parsers aren't being fed trash or an exploit? > Alas, if you choose to propagate the backing dev label to contained files, > they would all share the designated 'Loopback' label and render the policy above > useless. > > Any thoughts on how to reconcile this conflict? > > Amir. > > > > > > 2. s_root is assigned the transmute property. > > > > 3. For existing files: > > > > a. 
Files with the same label as the backing device are accessible. > > > > b. Files with any other label are not accessible. > > > > > > That's right. Accept correct data, reject anything that's not right. > > > > > > > If this is right, there are a couple lingering questions in my mind. > > > > > > > > First, what happens with files created in directories with the same > > > > label as the backing device but without the transmute property set? The > > > > inode for the new file will initially be labeled with smk_of_current(), > > > > but then during d_instantiate it will get smk_default and thus end up > > > > with the label we want. So that seems okay. > > > > > > Yes. > > > > > > > The second is whether files with the SMACK64EXEC attribute is still a > > > > problem. It seems it is, for files with the same label as the backing > > > > store at least. I think we can simply skip the code that reads out this > > > > xattr and sets smk_task for user ns mounts, or else skip assigning the > > > > label to the new task in bprm_set_creds. The latter seems more > > > > consistent with the approach you've suggested for dealing with labels > > > > from disk. > > > > > > Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in > > > smack_d_instantiate for unprivileged mounts would do the trick. > > > > > > > So I guess all of that seems okay, though perhaps a bit restrictive > > > > given that the user who mounted the filesystem already has full access > > > > to the backing store. > > > > > > In truth, there is no reason to expect that the "user" who did the > > > mount will ever have a Smack label that differs from the label of > > > the backing store. If what we've got here seems restrictive, it's > > > because you've got access from someone other than the "user". > > > > > > > Please let me know whether or not this matches up with what you are > > > > thinking, then I can procede with the implementation. 
> > > > > > My current mindset is that, if you're going to allow unprivileged > > > mounts of user defined backing stores, this is as safe as we can > > > make it. > > > > All right, I've got a patch which I think does this, and I've managed to > > do some testing to confirm that it behaves like I expect. How does this > > look? > > > > What's missing is getting the label from the block device inode; as > > Stephen discovered the inode that I thought we could get the label from > > turned out to be the wrong one. Afaict we would need a new hook in order > > to do that, so for now I'm using the label of the proccess calling > > mount. > > > > --- > > > > diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c > > index a143328f75eb..8e631a66b03c 100644 > > --- a/security/smack/smack_lsm.c > > +++ b/security/smack/smack_lsm.c > > @@ -662,6 +662,8 @@ static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data) > > skp = smk_of_current(); > > sp->smk_root = skp; > > sp->smk_default = skp; > > + if (sb_in_userns(sb)) > > + transmute = 1; > > } > > /* > > * Initialize the root inode. > > @@ -1023,6 +1025,12 @@ static int smack_inode_permission(struct inode *inode, int mask) > > if (mask == 0) > > return 0; > > > > + if (sb_in_userns(inode->i_sb)) { > > + struct superblock_smack *sbsp = inode->i_sb->s_security; > > + if (smk_of_inode(inode) != sbsp->smk_root) > > + return -EACCES; > > + } > > + > > /* May be droppable after audit */ > > if (no_block) > > return -ECHILD; > > @@ -3220,14 +3228,16 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode) > > if (rc >= 0) > > transflag = SMK_INODE_TRANSMUTE; > > } > > - /* > > - * Don't let the exec or mmap label be "*" or "@". 
> > - */ > > - skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp); > > - if (IS_ERR(skp) || skp == &smack_known_star || > > - skp == &smack_known_web) > > - skp = NULL; > > - isp->smk_task = skp; > > + if (!sb_in_userns(inode->i_sb)) { > > + /* > > + * Don't let the exec or mmap label be "*" or "@". > > + */ > > + skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp); > > + if (IS_ERR(skp) || skp == &smack_known_star || > > + skp == &smack_known_web) > > + skp = NULL; > > + isp->smk_task = skp; > > + } > > > > skp = smk_fetch(XATTR_NAME_SMACKMMAP, inode, dp); > > if (IS_ERR(skp) || skp == &smack_known_star || > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 138+ messages in thread
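The behaviour of the patch quoted above can be modelled in user space roughly as follows. This is a sketch, not kernel code: struct model_sb, struct model_inode and both functions are hypothetical stand-ins, and labels are compared with strcmp() here whereas the patch compares label pointers (smk_of_inode(inode) != sbsp->smk_root).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct model_sb {
    bool in_userns;        /* models the result of sb_in_userns(sb) */
    const char *smk_root;  /* label assigned to the superblock at mount time */
};

struct model_inode {
    const struct model_sb *sb;
    const char *label;      /* models smk_of_inode(inode) */
    const char *exec_label; /* models an on-disk SMACK64EXEC value, may be NULL */
};

/*
 * Models the hunk added to smack_inode_permission(): on a user-namespace
 * mount, any inode whose label differs from smk_root is refused outright.
 */
static int model_inode_permission(const struct model_inode *inode, int mask)
{
    if (mask == 0)
        return 0;
    if (inode->sb->in_userns &&
        strcmp(inode->label, inode->sb->smk_root) != 0)
        return -EACCES;
    return 0; /* the ordinary Smack rule check would follow here */
}

/*
 * Models the smack_d_instantiate() hunk: the SMACK64EXEC value stored in
 * the filesystem is not read for user-namespace mounts, so it can never
 * become the label of a newly executed task.
 */
static const char *model_task_label(const struct model_inode *inode)
{
    if (inode->sb->in_userns)
        return NULL;
    return inode->exec_label;
}
```

With a superblock mounted by a process labeled "User", a file labeled "User" passes the first check and a file labeled "Other" gets -EACCES, which corresponds to rules 3a and 3b in the quoted summary.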
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-30 13:57 ` Serge Hallyn @ 2015-07-30 15:09 ` Amir Goldstein -1 siblings, 0 replies; 138+ messages in thread From: Amir Goldstein @ 2015-07-30 15:09 UTC (permalink / raw) To: Serge Hallyn Cc: Seth Forshee, Casey Schaufler, Stephen Smalley, Andy Lutomirski, Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel On Thu, Jul 30, 2015 at 4:57 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote: > Quoting Amir Goldstein (amir@cellrox.com): >> On Tue, Jul 28, 2015 at 11:40 PM, Seth Forshee >> <seth.forshee@canonical.com> wrote: >> > >> > On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote: >> > > > This is what I currently think you want for user ns mounts: >> > > > >> > > > 1. smk_root and smk_default are assigned the label of the backing >> > > > device. >> >> Seth, >> >> There were 2 main concerns discussed in this thread: >> 1. trusting LSM labels outside the namespace >> 2. trusting the content of the image file/loopdev >> >> While your approach addresses the first concern, I suspect it may be placing >> an obstacle in a way for resolving the second concern. >> >> A viable security policy to mitigate the second concern could be: >> - Allow only trusted programs (e.g. mkfs, fsck) to write to 'Loopback' images >> - Allow mount only of 'Loopback' images >> >> This should allow the system as a whole to trust unprivileged mounts based on >> the trust of the entities that had raw access the the fs layout. > > Just to be sure I understand right, you're looking for a way to let > the host admin trust that the kernel's superblock parsers aren't being > fed trash or an exploit? Correct. I do not believe in auditing file system code to a vulnerability-free level, nor do I think that cryptographically signed file system metadata is the only way to ensure an exploit-free unprivileged mount. 
> >> Alas, if you choose to propagate the backing dev label to contained files, >> they would all share the designated 'Loopback' label and render the policy above >> useless. >> >> Any thoughts on how to reconcile this conflict? >> >> Amir. >> >> >> > > > 2. s_root is assigned the transmute property. >> > > > 3. For existing files: >> > > > a. Files with the same label as the backing device are accessible. >> > > > b. Files with any other label are not accessible. >> > > >> > > That's right. Accept correct data, reject anything that's not right. >> > > >> > > > If this is right, there are a couple lingering questions in my mind. >> > > > >> > > > First, what happens with files created in directories with the same >> > > > label as the backing device but without the transmute property set? The >> > > > inode for the new file will initially be labeled with smk_of_current(), >> > > > but then during d_instantiate it will get smk_default and thus end up >> > > > with the label we want. So that seems okay. >> > > >> > > Yes. >> > > >> > > > The second is whether files with the SMACK64EXEC attribute is still a >> > > > problem. It seems it is, for files with the same label as the backing >> > > > store at least. I think we can simply skip the code that reads out this >> > > > xattr and sets smk_task for user ns mounts, or else skip assigning the >> > > > label to the new task in bprm_set_creds. The latter seems more >> > > > consistent with the approach you've suggested for dealing with labels >> > > > from disk. >> > > >> > > Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in >> > > smack_d_instantiate for unprivileged mounts would do the trick. >> > > >> > > > So I guess all of that seems okay, though perhaps a bit restrictive >> > > > given that the user who mounted the filesystem already has full access >> > > > to the backing store. 
>> > > >> > > In truth, there is no reason to expect that the "user" who did the >> > > mount will ever have a Smack label that differs from the label of >> > > the backing store. If what we've got here seems restrictive, it's >> > > because you've got access from someone other than the "user". >> > > >> > > > Please let me know whether or not this matches up with what you are >> > > > thinking, then I can procede with the implementation. >> > > >> > > My current mindset is that, if you're going to allow unprivileged >> > > mounts of user defined backing stores, this is as safe as we can >> > > make it. >> > >> > All right, I've got a patch which I think does this, and I've managed to >> > do some testing to confirm that it behaves like I expect. How does this >> > look? >> > >> > What's missing is getting the label from the block device inode; as >> > Stephen discovered the inode that I thought we could get the label from >> > turned out to be the wrong one. Afaict we would need a new hook in order >> > to do that, so for now I'm using the label of the proccess calling >> > mount. >> > >> > --- >> > >> > diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c >> > index a143328f75eb..8e631a66b03c 100644 >> > --- a/security/smack/smack_lsm.c >> > +++ b/security/smack/smack_lsm.c >> > @@ -662,6 +662,8 @@ static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data) >> > skp = smk_of_current(); >> > sp->smk_root = skp; >> > sp->smk_default = skp; >> > + if (sb_in_userns(sb)) >> > + transmute = 1; >> > } >> > /* >> > * Initialize the root inode. 
>> > @@ -1023,6 +1025,12 @@ static int smack_inode_permission(struct inode *inode, int mask) >> > if (mask == 0) >> > return 0; >> > >> > + if (sb_in_userns(inode->i_sb)) { >> > + struct superblock_smack *sbsp = inode->i_sb->s_security; >> > + if (smk_of_inode(inode) != sbsp->smk_root) >> > + return -EACCES; >> > + } >> > + >> > /* May be droppable after audit */ >> > if (no_block) >> > return -ECHILD; >> > @@ -3220,14 +3228,16 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode) >> > if (rc >= 0) >> > transflag = SMK_INODE_TRANSMUTE; >> > } >> > - /* >> > - * Don't let the exec or mmap label be "*" or "@". >> > - */ >> > - skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp); >> > - if (IS_ERR(skp) || skp == &smack_known_star || >> > - skp == &smack_known_web) >> > - skp = NULL; >> > - isp->smk_task = skp; >> > + if (!sb_in_userns(inode->i_sb)) { >> > + /* >> > + * Don't let the exec or mmap label be "*" or "@". >> > + */ >> > + skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp); >> > + if (IS_ERR(skp) || skp == &smack_known_star || >> > + skp == &smack_known_web) >> > + skp = NULL; >> > + isp->smk_task = skp; >> > + } >> > >> > skp = smk_fetch(XATTR_NAME_SMACKMMAP, inode, dp); >> > if (IS_ERR(skp) || skp == &smack_known_star || >> > -- >> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in >> > the body of a message to majordomo@vger.kernel.org >> > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
@ 2015-07-30 15:09 ` Amir Goldstein
  0 siblings, 0 replies; 138+ messages in thread
From: Amir Goldstein @ 2015-07-30 15:09 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Serge Hallyn, linux-kernel, Andy Lutomirski, Seth Forshee, LSM List, SELinux-NSA, Linux FS Devel, Stephen Smalley, Alexander Viro

On Thu, Jul 30, 2015 at 4:57 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
> Quoting Amir Goldstein (amir@cellrox.com):
>> On Tue, Jul 28, 2015 at 11:40 PM, Seth Forshee
>> <seth.forshee@canonical.com> wrote:
>> >
>> > On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote:
>> > > > This is what I currently think you want for user ns mounts:
>> > > >
>> > > > 1. smk_root and smk_default are assigned the label of the backing
>> > > > device.
>>
>> Seth,
>>
>> There were 2 main concerns discussed in this thread:
>> 1. trusting LSM labels outside the namespace
>> 2. trusting the content of the image file/loopdev
>>
>> While your approach addresses the first concern, I suspect it may be placing
>> an obstacle in the way of resolving the second concern.
>>
>> A viable security policy to mitigate the second concern could be:
>> - Allow only trusted programs (e.g. mkfs, fsck) to write to 'Loopback' images
>> - Allow mount only of 'Loopback' images
>>
>> This should allow the system as a whole to trust unprivileged mounts based on
>> the trust of the entities that had raw access to the fs layout.
>
> Just to be sure I understand right, you're looking for a way to let
> the host admin trust that the kernel's superblock parsers aren't being
> fed trash or an exploit?

Correct.
I do not believe in the direction of auditing file system code to a
vulnerability-free level, nor do I think that cryptographically signed
file system metadata is the only way to ensure an exploit-free
unprivileged mount.

>
>> Alas, if you choose to propagate the backing dev label to contained files,
>> they would all share the designated 'Loopback' label and render the policy above
>> useless.
>>
>> Any thoughts on how to reconcile this conflict?
>>
>> Amir.
>>
>>
>> > > > 2. s_root is assigned the transmute property.
>> > > > 3. For existing files:
>> > > > a. Files with the same label as the backing device are accessible.
>> > > > b. Files with any other label are not accessible.
>> > >
>> > > That's right. Accept correct data, reject anything that's not right.
>> > >
>> > > > If this is right, there are a couple lingering questions in my mind.
>> > > >
>> > > > First, what happens with files created in directories with the same
>> > > > label as the backing device but without the transmute property set? The
>> > > > inode for the new file will initially be labeled with smk_of_current(),
>> > > > but then during d_instantiate it will get smk_default and thus end up
>> > > > with the label we want. So that seems okay.
>> > >
>> > > Yes.
>> > >
>> > > > The second is whether files with the SMACK64EXEC attribute are still a
>> > > > problem. It seems they are, for files with the same label as the backing
>> > > > store at least. I think we can simply skip the code that reads out this
>> > > > xattr and sets smk_task for user ns mounts, or else skip assigning the
>> > > > label to the new task in bprm_set_creds. The latter seems more
>> > > > consistent with the approach you've suggested for dealing with labels
>> > > > from disk.
>> > >
>> > > Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in
>> > > smack_d_instantiate for unprivileged mounts would do the trick.
>> > >
>> > > > So I guess all of that seems okay, though perhaps a bit restrictive
>> > > > given that the user who mounted the filesystem already has full access
>> > > > to the backing store.
>> > >
>> > > In truth, there is no reason to expect that the "user" who did the
>> > > mount will ever have a Smack label that differs from the label of
>> > > the backing store. If what we've got here seems restrictive, it's
>> > > because you've got access from someone other than the "user".
>> > >
>> > > > Please let me know whether or not this matches up with what you are
>> > > > thinking, then I can proceed with the implementation.
>> > >
>> > > My current mindset is that, if you're going to allow unprivileged
>> > > mounts of user defined backing stores, this is as safe as we can
>> > > make it.
>> >
>> > All right, I've got a patch which I think does this, and I've managed to
>> > do some testing to confirm that it behaves like I expect. How does this
>> > look?
>> >
>> > What's missing is getting the label from the block device inode; as
>> > Stephen discovered the inode that I thought we could get the label from
>> > turned out to be the wrong one. Afaict we would need a new hook in order
>> > to do that, so for now I'm using the label of the process calling
>> > mount.
>> >
>> > ---
>> >
>> > diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
>> > index a143328f75eb..8e631a66b03c 100644
>> > --- a/security/smack/smack_lsm.c
>> > +++ b/security/smack/smack_lsm.c
>> > @@ -662,6 +662,8 @@ static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data)
>> >  		skp = smk_of_current();
>> >  		sp->smk_root = skp;
>> >  		sp->smk_default = skp;
>> > +		if (sb_in_userns(sb))
>> > +			transmute = 1;
>> >  	}
>> >  	/*
>> >  	 * Initialize the root inode.
>> > @@ -1023,6 +1025,12 @@ static int smack_inode_permission(struct inode *inode, int mask)
>> >  	if (mask == 0)
>> >  		return 0;
>> >  
>> > +	if (sb_in_userns(inode->i_sb)) {
>> > +		struct superblock_smack *sbsp = inode->i_sb->s_security;
>> > +		if (smk_of_inode(inode) != sbsp->smk_root)
>> > +			return -EACCES;
>> > +	}
>> > +
>> >  	/* May be droppable after audit */
>> >  	if (no_block)
>> >  		return -ECHILD;
>> > @@ -3220,14 +3228,16 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
>> >  			if (rc >= 0)
>> >  				transflag = SMK_INODE_TRANSMUTE;
>> >  		}
>> > -		/*
>> > -		 * Don't let the exec or mmap label be "*" or "@".
>> > -		 */
>> > -		skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
>> > -		if (IS_ERR(skp) || skp == &smack_known_star ||
>> > -		    skp == &smack_known_web)
>> > -			skp = NULL;
>> > -		isp->smk_task = skp;
>> > +		if (!sb_in_userns(inode->i_sb)) {
>> > +			/*
>> > +			 * Don't let the exec or mmap label be "*" or "@".
>> > +			 */
>> > +			skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
>> > +			if (IS_ERR(skp) || skp == &smack_known_star ||
>> > +			    skp == &smack_known_web)
>> > +				skp = NULL;
>> > +			isp->smk_task = skp;
>> > +		}
>> >  
>> >  		skp = smk_fetch(XATTR_NAME_SMACKMMAP, inode, dp);
>> >  		if (IS_ERR(skp) || skp == &smack_known_star ||

^ permalink raw reply [flat|nested] 138+ messages in thread
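For readers who want to poke at the access rule from the quoted patch
without a kernel tree, here is a minimal userspace sketch of the check
added to smack_inode_permission(). It is only a model: the model_* types
and names are hypothetical stand-ins, the in_userns flag replaces
sb_in_userns(), and labels are compared by pointer identity, as the
patch compares smk_of_inode(inode) against sbsp->smk_root. The real hook
goes on to the normal Smack checks, which the model omits.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel definitions. */
struct model_label {
	const char *name;
};

struct model_sb {
	bool in_userns;                 /* stands in for sb_in_userns(sb) */
	const struct model_label *root; /* stands in for sbsp->smk_root */
};

struct model_inode {
	const struct model_sb *sb;
	const struct model_label *label; /* stands in for smk_of_inode(inode) */
};

/*
 * Models only the early checks of the patched hook: an empty mask is
 * always allowed, and on a user namespace mount any inode whose label
 * is not the superblock root label is rejected with -EACCES.
 */
int model_inode_permission(const struct model_inode *inode, int mask)
{
	assert(inode != NULL && inode->sb != NULL);

	if (mask == 0)
		return 0;

	if (inode->sb->in_userns && inode->label != inode->sb->root)
		return -EACCES;

	/* The real hook would continue with the regular Smack access check. */
	return 0;
}
```

With a superblock flagged as in_userns, an inode carrying the root label
passes and an inode carrying any other label gets -EACCES, while the same
inodes on a host mount fall through to the (unmodelled) regular checks.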
* [PATCH 0/7] Initial support for user namespace owned mounts
@ 2015-07-15 19:46 ` Seth Forshee
  0 siblings, 0 replies; 138+ messages in thread
From: Seth Forshee @ 2015-07-15 19:46 UTC (permalink / raw)
  To: Eric W. Biederman, Alexander Viro, linux-fsdevel, linux-security-module, selinux
  Cc: Serge Hallyn, Andy Lutomirski, Seth Forshee, linux-kernel

These are the first in a larger set of patches that I've been working on
(with help from Eric Biederman) to support mounting ext4 and fuse
filesystems from within user namespaces. I've pushed the full series to:

  git://kernel.ubuntu.com/sforshee/linux.git userns-mounts

Taking the series as a whole, the strategy is to handle as much of the
heavy lifting as possible in the vfs so the filesystems don't have to
handle weird edge cases. If you look at the full series you'll find that
the changes in ext4 to support user namespace mounts turn out to be
fairly minimal (fuse is a bit more complicated though as it must deal
with translating ids for a userspace process which is running in pid and
user namespaces).

The patches I'm sending today lay some of the groundwork in the vfs and
related code. They fall into two broad groups:

 1. Patches 1-2 add s_user_ns and simplify MNT_NODEV handling. These are
    pretty straightforward, and Eric has expressed interest in merging
    these patches soon. Note that patch 2 won't apply cleanly without
    Eric's noexec patches for proc and sys [1].

 2. Patches 2-7 tighten down security for mounts with s_user_ns !=
    &init_user_ns. This includes updates to how file caps and suid are
    handled and LSM updates to ignore security labels on superblocks
    from non-init namespaces.

    The LSM changes in particular may not be optimal, as I don't have a
    lot of familiarity with this code, so I'd be especially appreciative
    of review of these changes and suggestions on how to improve them.

Subsequent patches will update the vfs for id translation, handling
various corner cases, giving privileges to the user namespace which owns
a superblock, and finally supporting user namespace mounts for ext4 and
fuse.

Thanks,
Seth

[1] http://lkml.kernel.org/r/87mvz4yomp.fsf_-_@x220.int.ebiederm.org


Andy Lutomirski (1):
  fs: Treat foreign mounts as nosuid

Eric W. Biederman (1):
  userns: Simpilify MNT_NODEV handling.

Seth Forshee (5):
  fs: Add user namesapace member to struct super_block
  fs: Ignore file caps in mounts from other user namespaces
  security: Restrict security attribute updates for userns mounts
  selinux: Ignore security labels on user namespace mounts
  smack: Don't use security labels for user namespace mounts

 fs/block_dev.c                 |  2 +-
 fs/exec.c                      |  2 +-
 fs/namei.c                     |  9 ++++++++-
 fs/namespace.c                 | 34 ++++++++++++++++++++--------------
 fs/proc/root.c                 |  3 ++-
 fs/super.c                     | 38 +++++++++++++++++++++++++++++++++-----
 include/linux/fs.h             |  9 +++++++++
 include/linux/mount.h          |  1 +
 include/linux/user_namespace.h |  8 ++++++++
 kernel/user_namespace.c        | 14 ++++++++++++++
 security/commoncap.c           |  4 +++-
 security/security.c            | 10 +++++++++-
 security/selinux/hooks.c       | 16 +++++++++++++++-
 security/smack/smack_lsm.c     | 12 ++++++++++--
 14 files changed, 134 insertions(+), 28 deletions(-)

^ permalink raw reply [flat|nested] 138+ messages in thread
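The condition the cover letter keys everything on, "s_user_ns !=
&init_user_ns", can be illustrated with a small userspace sketch. This
is an assumption-laden model, not the helper from the series: the
model_* structs are hypothetical stand-ins for struct user_namespace and
struct super_block, and model_sb_in_userns() only mirrors the comparison
quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct user_namespace. */
struct model_user_ns {
	int level; /* 0 for the initial namespace in this model */
};

/* Stand-in for the kernel's init_user_ns. */
const struct model_user_ns model_init_user_ns = { 0 };

/* Hypothetical stand-in for struct super_block with the new member. */
struct model_super_block {
	const struct model_user_ns *s_user_ns;
};

/*
 * True when the superblock is owned by a namespace other than the
 * initial one, i.e. the case the cover letter wants to tighten down.
 */
bool model_sb_in_userns(const struct model_super_block *sb)
{
	assert(sb != NULL && sb->s_user_ns != NULL);
	return sb->s_user_ns != &model_init_user_ns;
}
```

In this model a superblock whose s_user_ns points at model_init_user_ns
is treated as a host mount, and any other owner marks it as a user
namespace mount whose on-disk security attributes deserve less trust.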
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-15 19:46 ` Seth Forshee @ 2015-07-15 20:36 ` Casey Schaufler -1 siblings, 0 replies; 138+ messages in thread From: Casey Schaufler @ 2015-07-15 20:36 UTC (permalink / raw) To: Seth Forshee, Eric W. Biederman, Alexander Viro, linux-fsdevel, linux-security-module, selinux Cc: Serge Hallyn, Andy Lutomirski, linux-kernel On 7/15/2015 12:46 PM, Seth Forshee wrote: > These are the first in a larger set of patches that I've been working on > (with help from Eric Biederman) to support mounting ext4 and fuse > filesystems from within user namespaces. I've pushed the full series to: > > git://kernel.ubuntu.com/sforshee/linux.git userns-mounts > > Taking the series as a whole, the strategy is to handle as much of the > heavy lifting as possible in the vfs so the filesystems don't have to > handle weird edge cases. If you look at the full series you'll find that > the changes in ext4 to support user namespace mounts turn out to be > fairly minimal (fuse is a bit more complicated though as it must deal > with translating ids for a userspace process which is running in pid and > user namespaces). > > The patches I'm sending today lay some of the groundwork in the vfs and > related code. They fall into two broad groups: > > 1. Patches 1-2 add s_user_ns and simplify MNT_NODEV handling. These are > pretty straightforward, and Eric has expressed interest in merging > these patches soon. Note that patch 2 won't apply cleanly without > Eric's noexec patches for proc and sys [1]. > > 2. Patches 2-7 tighten down security for mounts with s_user_ns != > &init_user_ns. This includes updates to how file caps and suid are > handled and LSM updates to ignore security labels on superblocks > from non-init namespaces. 
>
>    The LSM changes in particular may not be optimal, as I don't have a
>    lot of familiarity with this code, so I'd be especially appreciative
>    of review of these changes and suggestions on how to improve them.

Lukasz Pawelczyk <l.pawelczyk@samsung.com> proposed
LSM support in user namespaces ([RFC] lsm: namespace hooks)
that makes a whole lot more sense than just turning off
the option of using labels on files. Gutting the ability
to use MAC in a namespace is a step down the road of
making MAC and namespaces incompatible.

>
> Subsequent patches will update the vfs for id translation, handling
> various corner cases, giving privileges to the user namespace which owns
> a superblock, and finally supporting user namespace mounts for ext4 and
> fuse.
>
> Thanks,
> Seth
>
> [1] http://lkml.kernel.org/r/87mvz4yomp.fsf_-_@x220.int.ebiederm.org
>
>
> Andy Lutomirski (1):
>   fs: Treat foreign mounts as nosuid
>
> Eric W. Biederman (1):
>   userns: Simpilify MNT_NODEV handling.
>
> Seth Forshee (5):
>   fs: Add user namesapace member to struct super_block
>   fs: Ignore file caps in mounts from other user namespaces
>   security: Restrict security attribute updates for userns mounts
>   selinux: Ignore security labels on user namespace mounts
>   smack: Don't use security labels for user namespace mounts
>
>  fs/block_dev.c                 |  2 +-
>  fs/exec.c                      |  2 +-
>  fs/namei.c                     |  9 ++++++++-
>  fs/namespace.c                 | 34 ++++++++++++++++++++--------------
>  fs/proc/root.c                 |  3 ++-
>  fs/super.c                     | 38 +++++++++++++++++++++++++++++++++-----
>  include/linux/fs.h             |  9 +++++++++
>  include/linux/mount.h          |  1 +
>  include/linux/user_namespace.h |  8 ++++++++
>  kernel/user_namespace.c        | 14 ++++++++++++++
>  security/commoncap.c           |  4 +++-
>  security/security.c            | 10 +++++++++-
>  security/selinux/hooks.c       | 16 +++++++++++++++-
>  security/smack/smack_lsm.c     | 12 ++++++++++--
>  14 files changed, 134 insertions(+), 28 deletions(-)

^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-15 20:36 ` Casey Schaufler @ 2015-07-15 21:06 ` Eric W. Biederman -1 siblings, 0 replies; 138+ messages in thread From: Eric W. Biederman @ 2015-07-15 21:06 UTC (permalink / raw) To: Casey Schaufler Cc: Seth Forshee, Alexander Viro, linux-fsdevel, linux-security-module, selinux, Serge Hallyn, Andy Lutomirski, linux-kernel Casey Schaufler <casey@schaufler-ca.com> writes: > On 7/15/2015 12:46 PM, Seth Forshee wrote: >> These are the first in a larger set of patches that I've been working on >> (with help from Eric Biederman) to support mounting ext4 and fuse >> filesystems from within user namespaces. I've pushed the full series to: >> >> git://kernel.ubuntu.com/sforshee/linux.git userns-mounts >> >> Taking the series as a whole, the strategy is to handle as much of the >> heavy lifting as possible in the vfs so the filesystems don't have to >> handle weird edge cases. If you look at the full series you'll find that >> the changes in ext4 to support user namespace mounts turn out to be >> fairly minimal (fuse is a bit more complicated though as it must deal >> with translating ids for a userspace process which is running in pid and >> user namespaces). >> >> The patches I'm sending today lay some of the groundwork in the vfs and >> related code. They fall into two broad groups: >> >> 1. Patches 1-2 add s_user_ns and simplify MNT_NODEV handling. These are >> pretty straightforward, and Eric has expressed interest in merging >> these patches soon. Note that patch 2 won't apply cleanly without >> Eric's noexec patches for proc and sys [1]. >> >> 2. Patches 2-7 tighten down security for mounts with s_user_ns != >> &init_user_ns. This includes updates to how file caps and suid are >> handled and LSM updates to ignore security labels on superblocks >> from non-init namespaces. 
>>
>>    The LSM changes in particular may not be optimal, as I don't have a
>>    lot of familiarity with this code, so I'd be especially appreciative
>>    of review of these changes and suggestions on how to improve them.
>
> Lukasz Pawelczyk <l.pawelczyk@samsung.com> proposed
> LSM support in user namespaces ([RFC] lsm: namespace hooks)
> that makes a whole lot more sense than just turning off
> the option of using labels on files. Gutting the ability
> to use MAC in a namespace is a step down the road of
> making MAC and namespaces incompatible.

This is not "turning off the option to use labels on files".

This is supporting mounting filesystems like ext4 by unprivileged users
and not trusting the labels they set in the same way as we trust labels
on filesystems mounted by privileged users.

The first step needs to be not trusting those labels and treating such
filesystems as filesystems without label support. I hope that is what
Seth has implemented.

In the long run we can do more interesting things with such filesystems
once the appropriate LSM policy is in place.

Getting s_user_ns present on struct super_block, properly set, and all
of the appropriate checks against it present in the vfs so that
filesystems don't need to duplicate logic is important if we are going
to do more interesting things with user namespaces (as users have been
asking for). It is important for things as small as making it safe to
allow truly unprivileged users to mount fuse filesystems.

I am on the fence with Lukasz Pawelczyk's patches. Some parts I liked,
some parts I had issues with. As I recall one of my issues was that
those patches conflicted in detail if not in principle with this
approach.

If these patches do not do a good job of laying the groundwork for
supporting security labels that unprivileged users can set, then Seth
could really use some feedback. Figuring out how to properly deal with
the LSMs has been one of his challenges.

I am hoping I can finish working through the patches to fix the
semantics of rename and bind mounts before the next merge window opens,
so I can have enough cycles to lift the feature freeze on user
namespaces. Except for maybe his first two patches (which fix a small
userspace API breakage) none of Seth's patches get to go in until I lift
the freeze.

Which is probably too much information, but I hope this makes it clear
that the point of this work is as an enabler for future developments,
not as something to make user namespaces and LSMs incompatible.

Eric

^ permalink raw reply [flat|nested] 138+ messages in thread
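The "first step" Eric describes, treating an unprivileged mount like a
filesystem without label support, could be modelled roughly as follows.
This is a sketch under stated assumptions rather than the code in the
series: model_labeled_sb, its default_label field, and
model_effective_label() are hypothetical names, and a NULL on-disk label
stands for a missing security xattr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a superblock as an LSM might see it. */
struct model_labeled_sb {
	bool in_userns;            /* mounted from a non-init user namespace */
	const char *default_label; /* label the LSM assigns when none is trusted */
};

/*
 * Picks the label an LSM would act on. For a user namespace mount the
 * on-disk label is ignored, exactly as if the filesystem could not
 * store labels at all; otherwise the on-disk label wins when present.
 */
const char *model_effective_label(const struct model_labeled_sb *sb,
				  const char *on_disk_label)
{
	assert(sb != NULL && sb->default_label != NULL);

	if (sb->in_userns || on_disk_label == NULL)
		return sb->default_label;

	return on_disk_label;
}
```

The point of the model is that the user-supplied label never reaches the
policy decision on such a mount, which leaves room for a later LSM
policy to start honouring it under stricter conditions.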
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-15 21:06 ` Eric W. Biederman @ 2015-07-15 21:48 ` Seth Forshee -1 siblings, 0 replies; 138+ messages in thread From: Seth Forshee @ 2015-07-15 21:48 UTC (permalink / raw) To: Eric W. Biederman Cc: Casey Schaufler, Alexander Viro, linux-fsdevel, linux-security-module, selinux, Serge Hallyn, Andy Lutomirski, linux-kernel On Wed, Jul 15, 2015 at 04:06:35PM -0500, Eric W. Biederman wrote: > Casey Schaufler <casey@schaufler-ca.com> writes: > > > On 7/15/2015 12:46 PM, Seth Forshee wrote: > >> These are the first in a larger set of patches that I've been working on > >> (with help from Eric Biederman) to support mounting ext4 and fuse > >> filesystems from within user namespaces. I've pushed the full series to: > >> > >> git://kernel.ubuntu.com/sforshee/linux.git userns-mounts > >> > >> Taking the series as a whole, the strategy is to handle as much of the > >> heavy lifting as possible in the vfs so the filesystems don't have to > >> handle weird edge cases. If you look at the full series you'll find that > >> the changes in ext4 to support user namespace mounts turn out to be > >> fairly minimal (fuse is a bit more complicated though as it must deal > >> with translating ids for a userspace process which is running in pid and > >> user namespaces). > >> > >> The patches I'm sending today lay some of the groundwork in the vfs and > >> related code. They fall into two broad groups: > >> > >> 1. Patches 1-2 add s_user_ns and simplify MNT_NODEV handling. These are > >> pretty straightforward, and Eric has expressed interest in merging > >> these patches soon. Note that patch 2 won't apply cleanly without > >> Eric's noexec patches for proc and sys [1]. > >> > >> 2. Patches 2-7 tighten down security for mounts with s_user_ns != > >> &init_user_ns. This includes updates to how file caps and suid are > >> handled and LSM updates to ignore security labels on superblocks > >> from non-init namespaces. 
> >>
> >>    The LSM changes in particular may not be optimal, as I don't have a
> >>    lot of familiarity with this code, so I'd be especially appreciative
> >>    of review of these changes and suggestions on how to improve them.
> >
> > Lukasz Pawelczyk <l.pawelczyk@samsung.com> proposed
> > LSM support in user namespaces ([RFC] lsm: namespace hooks)
> > that makes a whole lot more sense than just turning off
> > the option of using labels on files. Gutting the ability
> > to use MAC in a namespace is a step down the road of
> > making MAC and namespaces incompatible.
>
> This is not "turning off the option to use labels on files".
>
> This is supporting mounting filesystems like ext4 by unprivileged users
> and not trusting the labels they set in the same way as we trust labels
> on filesystems mounted by privileged users.
>
> The first step needs to be not trusting those labels and treating such
> filesystems as filesystems without label support. I hope that is what
> Seth has implemented.
>
> In the long run we can do more interesting things with such filesystems
> once the appropriate LSM policy is in place.

Yes, this exactly. Right now it looks to me like the only safe thing to
do with mounts from unprivileged users is to ignore the security labels,
so that's what I'm trying to do with these changes. If there's some
better thing to do, or some better way to do it, I'm more than happy to
receive that feedback.

Seth

^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-15 21:48 ` Seth Forshee @ 2015-07-15 22:28 ` Eric W. Biederman -1 siblings, 0 replies; 138+ messages in thread From: Eric W. Biederman @ 2015-07-15 22:28 UTC (permalink / raw) To: Seth Forshee Cc: Casey Schaufler, Alexander Viro, linux-fsdevel, linux-security-module, selinux, Serge Hallyn, Andy Lutomirski, linux-kernel Seth Forshee <seth.forshee@canonical.com> writes: > On Wed, Jul 15, 2015 at 04:06:35PM -0500, Eric W. Biederman wrote: >> Casey Schaufler <casey@schaufler-ca.com> writes: >> >> > On 7/15/2015 12:46 PM, Seth Forshee wrote: >> >> These are the first in a larger set of patches that I've been working on >> >> (with help from Eric Biederman) to support mounting ext4 and fuse >> >> filesystems from within user namespaces. I've pushed the full series to: >> >> >> >> git://kernel.ubuntu.com/sforshee/linux.git userns-mounts >> >> >> >> Taking the series as a whole, the strategy is to handle as much of the >> >> heavy lifting as possible in the vfs so the filesystems don't have to >> >> handle weird edge cases. If you look at the full series you'll find that >> >> the changes in ext4 to support user namespace mounts turn out to be >> >> fairly minimal (fuse is a bit more complicated though as it must deal >> >> with translating ids for a userspace process which is running in pid and >> >> user namespaces). >> >> >> >> The patches I'm sending today lay some of the groundwork in the vfs and >> >> related code. They fall into two broad groups: >> >> >> >> 1. Patches 1-2 add s_user_ns and simplify MNT_NODEV handling. These are >> >> pretty straightforward, and Eric has expressed interest in merging >> >> these patches soon. Note that patch 2 won't apply cleanly without >> >> Eric's noexec patches for proc and sys [1]. >> >> >> >> 2. Patches 2-7 tighten down security for mounts with s_user_ns != >> >> &init_user_ns. 
This includes updates to how file caps and suid are >> >> handled and LSM updates to ignore security labels on superblocks >> >> from non-init namespaces. >> >> >> >> The LSM changes in particular may not be optimal, as I don't have a >> >> lot of familiarity with this code, so I'd be especially appreciative >> >> of review of these changes and suggestions on how to improve them. >> > >> > Lukasz Pawelczyk <l.pawelczyk@samsung.com> proposed >> > LSM support in user namespaces ([RFC] lsm: namespace hooks) >> > that make a whole lot more sense than just turning off >> > the option of using labels on files. Gutting the ability >> > to use MAC in a namespace is a step down the road of >> > making MAC and namespaces incompatible. >> >> This is not "turning off the option to use labels on files". >> >> This is supporting mounting filesystems like ext4 by unprivileged users >> and not trusting the labels they set in the same way as we trust labels >> on filesystems mounted by privileged users. >> >> The first step needs to be not trusting those labels and treating such >> filesystems as filesystems without label support. I hope that is Seth >> has implemented. >> >> In the long run we can do more interesting things with such filesystems >> once the appropriate LSM policy is in place. > > Yes, this exactly. Right now it looks to me like the only safe thing to > do with mounts from unprivileged users is to ignore the security labels, > so that's what I'm trying to do with these changes. If there's some > better thing to do, or some better way to do it, I'm more than happy to > receive that feedback. Ugh. This made me realize that we have an interesting problem here. An unprivileged mount of tmpfs probably needs to have s_user_ns == &init_user_ns. Otherwise we will break security labels on tmpfs for no good reason. ramfs and sysfs also seem to have similar concerns. Because they have no backing store we can trust those filesystems with security labels. 
Plus for at least sysfs there is the security label bleed through issue,
that we need to make certain works.

Perhaps these filesystems with trusted backing store need to call
"sget_userns(..., &init_user_ns)".

If we don't get this right we will have significant regressions with
respect to security labels, and that is not ok.

Eric
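A minimal sketch of the idea behind "sget_userns(..., &init_user_ns)", again as a userspace toy rather than kernel code: filesystems with no untrusted backing store keep the initial namespace as the superblock owner even when mounted from inside a user namespace. The `trusted_backing_store` flag and `sb_owner_ns()` are assumptions made for illustration; nothing in the thread defines them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in, not the kernel definition. */
struct user_namespace { int id; };
static struct user_namespace init_user_ns = { 0 };

/* Hypothetical per-filesystem flag: true for tmpfs, ramfs and sysfs,
 * whose contents never come from a user-supplied image. */
struct fs_type {
    const char *name;
    bool trusted_backing_store;
};

/* Hypothetical helper: choose the namespace recorded as s_user_ns.
 * Trusted filesystems stay owned by init_user_ns so that their
 * security labels keep working; everything else is owned by the
 * namespace of the mounter. */
static struct user_namespace *sb_owner_ns(const struct fs_type *type,
                                          struct user_namespace *mounter_ns)
{
    assert(type != NULL && mounter_ns != NULL);
    return type->trusted_backing_store ? &init_user_ns : mounter_ns;
}
```

Under this model an ext4 image mounted in a container is owned by the container's namespace, while a tmpfs mounted from the same container is still owned by `init_user_ns`.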
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-15 22:28 ` Eric W. Biederman @ 2015-07-16 1:05 ` Andy Lutomirski -1 siblings, 0 replies; 138+ messages in thread From: Andy Lutomirski @ 2015-07-16 1:05 UTC (permalink / raw) To: Eric W. Biederman Cc: SELinux-NSA, Serge Hallyn, Alexander Viro, linux-kernel, LSM List, Linux FS Devel, Casey Schaufler, Seth Forshee On Jul 15, 2015 3:34 PM, "Eric W. Biederman" <ebiederm@xmission.com> wrote: > > Seth Forshee <seth.forshee@canonical.com> writes: > > > On Wed, Jul 15, 2015 at 04:06:35PM -0500, Eric W. Biederman wrote: > >> Casey Schaufler <casey@schaufler-ca.com> writes: > >> > >> > On 7/15/2015 12:46 PM, Seth Forshee wrote: > >> >> These are the first in a larger set of patches that I've been working on > >> >> (with help from Eric Biederman) to support mounting ext4 and fuse > >> >> filesystems from within user namespaces. I've pushed the full series to: > >> >> > >> >> git://kernel.ubuntu.com/sforshee/linux.git userns-mounts > >> >> > >> >> Taking the series as a whole, the strategy is to handle as much of the > >> >> heavy lifting as possible in the vfs so the filesystems don't have to > >> >> handle weird edge cases. If you look at the full series you'll find that > >> >> the changes in ext4 to support user namespace mounts turn out to be > >> >> fairly minimal (fuse is a bit more complicated though as it must deal > >> >> with translating ids for a userspace process which is running in pid and > >> >> user namespaces). > >> >> > >> >> The patches I'm sending today lay some of the groundwork in the vfs and > >> >> related code. They fall into two broad groups: > >> >> > >> >> 1. Patches 1-2 add s_user_ns and simplify MNT_NODEV handling. These are > >> >> pretty straightforward, and Eric has expressed interest in merging > >> >> these patches soon. Note that patch 2 won't apply cleanly without > >> >> Eric's noexec patches for proc and sys [1]. > >> >> > >> >> 2. 
Patches 2-7 tighten down security for mounts with s_user_ns != > >> >> &init_user_ns. This includes updates to how file caps and suid are > >> >> handled and LSM updates to ignore security labels on superblocks > >> >> from non-init namespaces. > >> >> > >> >> The LSM changes in particular may not be optimal, as I don't have a > >> >> lot of familiarity with this code, so I'd be especially appreciative > >> >> of review of these changes and suggestions on how to improve them. > >> > > >> > Lukasz Pawelczyk <l.pawelczyk@samsung.com> proposed > >> > LSM support in user namespaces ([RFC] lsm: namespace hooks) > >> > that make a whole lot more sense than just turning off > >> > the option of using labels on files. Gutting the ability > >> > to use MAC in a namespace is a step down the road of > >> > making MAC and namespaces incompatible. > >> > >> This is not "turning off the option to use labels on files". > >> > >> This is supporting mounting filesystems like ext4 by unprivileged users > >> and not trusting the labels they set in the same way as we trust labels > >> on filesystems mounted by privileged users. > >> > >> The first step needs to be not trusting those labels and treating such > >> filesystems as filesystems without label support. I hope that is Seth > >> has implemented. > >> > >> In the long run we can do more interesting things with such filesystems > >> once the appropriate LSM policy is in place. > > > > Yes, this exactly. Right now it looks to me like the only safe thing to > > do with mounts from unprivileged users is to ignore the security labels, > > so that's what I'm trying to do with these changes. If there's some > > better thing to do, or some better way to do it, I'm more than happy to > > receive that feedback. > > Ugh. > > This made me realize that we have an interesting problem here. An > unprivileged mount of tmpfs probably needs to have > s_user_ns == &init_user_ns. 
>
> Otherwise we will break security labels on tmpfs for no good reason.
> ramfs and sysfs also seem to have similar concerns.
>
> Because they have no backing store we can trust those filesystems with
> security labels. Plus for at least sysfs there is the security label
> bleed through issue, that we need to make certain works.
>
> Perhaps these filesystems with trusted backing store need to call
> "sget_userns(..., &init_user_ns)".
>
> If we don't get this right we will have significant regressions with
> respect to security labels, and that is not ok.

That's only a problem if there's anyone who sets security labels on
such a mount. You need global caps to do that (I hope), which
requires someone outside the userns to help, which means there's a
good chance that literally no one does this.

--Andy
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-16 1:05 ` Andy Lutomirski @ 2015-07-16 2:20 ` Eric W. Biederman -1 siblings, 0 replies; 138+ messages in thread From: Eric W. Biederman @ 2015-07-16 2:20 UTC (permalink / raw) To: Andy Lutomirski Cc: SELinux-NSA, Serge Hallyn, Alexander Viro, linux-kernel, LSM List, Linux FS Devel, Casey Schaufler, Seth Forshee Andy Lutomirski <luto@amacapital.net> writes: > On Jul 15, 2015 3:34 PM, "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> >> Seth Forshee <seth.forshee@canonical.com> writes: >> >> > On Wed, Jul 15, 2015 at 04:06:35PM -0500, Eric W. Biederman wrote: >> >> Casey Schaufler <casey@schaufler-ca.com> writes: >> >> >> >> > On 7/15/2015 12:46 PM, Seth Forshee wrote: >> >> >> These are the first in a larger set of patches that I've been working on >> >> >> (with help from Eric Biederman) to support mounting ext4 and fuse >> >> >> filesystems from within user namespaces. I've pushed the full series to: >> >> >> >> >> >> git://kernel.ubuntu.com/sforshee/linux.git userns-mounts >> >> >> >> >> >> Taking the series as a whole, the strategy is to handle as much of the >> >> >> heavy lifting as possible in the vfs so the filesystems don't have to >> >> >> handle weird edge cases. If you look at the full series you'll find that >> >> >> the changes in ext4 to support user namespace mounts turn out to be >> >> >> fairly minimal (fuse is a bit more complicated though as it must deal >> >> >> with translating ids for a userspace process which is running in pid and >> >> >> user namespaces). >> >> >> >> >> >> The patches I'm sending today lay some of the groundwork in the vfs and >> >> >> related code. They fall into two broad groups: >> >> >> >> >> >> 1. Patches 1-2 add s_user_ns and simplify MNT_NODEV handling. These are >> >> >> pretty straightforward, and Eric has expressed interest in merging >> >> >> these patches soon. 
Note that patch 2 won't apply cleanly without >> >> >> Eric's noexec patches for proc and sys [1]. >> >> >> >> >> >> 2. Patches 2-7 tighten down security for mounts with s_user_ns != >> >> >> &init_user_ns. This includes updates to how file caps and suid are >> >> >> handled and LSM updates to ignore security labels on superblocks >> >> >> from non-init namespaces. >> >> >> >> >> >> The LSM changes in particular may not be optimal, as I don't have a >> >> >> lot of familiarity with this code, so I'd be especially appreciative >> >> >> of review of these changes and suggestions on how to improve them. >> >> > >> >> > Lukasz Pawelczyk <l.pawelczyk@samsung.com> proposed >> >> > LSM support in user namespaces ([RFC] lsm: namespace hooks) >> >> > that make a whole lot more sense than just turning off >> >> > the option of using labels on files. Gutting the ability >> >> > to use MAC in a namespace is a step down the road of >> >> > making MAC and namespaces incompatible. >> >> >> >> This is not "turning off the option to use labels on files". >> >> >> >> This is supporting mounting filesystems like ext4 by unprivileged users >> >> and not trusting the labels they set in the same way as we trust labels >> >> on filesystems mounted by privileged users. >> >> >> >> The first step needs to be not trusting those labels and treating such >> >> filesystems as filesystems without label support. I hope that is Seth >> >> has implemented. >> >> >> >> In the long run we can do more interesting things with such filesystems >> >> once the appropriate LSM policy is in place. >> > >> > Yes, this exactly. Right now it looks to me like the only safe thing to >> > do with mounts from unprivileged users is to ignore the security labels, >> > so that's what I'm trying to do with these changes. If there's some >> > better thing to do, or some better way to do it, I'm more than happy to >> > receive that feedback. >> >> Ugh. 
>>
>> This made me realize that we have an interesting problem here. An
>> unprivileged mount of tmpfs probably needs to have
>> s_user_ns == &init_user_ns.
>>
>> Otherwise we will break security labels on tmpfs for no good reason.
>> ramfs and sysfs also seem to have similar concerns.
>>
>> Because they have no backing store we can trust those filesystems with
>> security labels. Plus for at least sysfs there is the security label
>> bleed through issue, that we need to make certain works.
>>
>> Perhaps these filesystems with trusted backing store need to call
>> "sget_userns(..., &init_user_ns)".
>>
>> If we don't get this right we will have significant regressions with
>> respect to security labels, and that is not ok.
>
> That's only a problem if there's anyone who sets security labels on
> such a mount. You need global caps to do that (I hope), which
> requires someone outside the userns to help, which means there's a
> good chance that literally no one does this.

Fair enough. That is however something we need to test. If no one puts
security labels or file caps on such a mount we can change things. If
not we can't because it would introduce regressions.

Eric
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-16 1:05 ` Andy Lutomirski @ 2015-07-16 13:12 ` Stephen Smalley -1 siblings, 0 replies; 138+ messages in thread From: Stephen Smalley @ 2015-07-16 13:12 UTC (permalink / raw) To: Andy Lutomirski, Eric W. Biederman Cc: Serge Hallyn, Seth Forshee, linux-kernel, LSM List, Alexander Viro, SELinux-NSA, Linux FS Devel On 07/15/2015 09:05 PM, Andy Lutomirski wrote: > On Jul 15, 2015 3:34 PM, "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> >> Seth Forshee <seth.forshee@canonical.com> writes: >> >>> On Wed, Jul 15, 2015 at 04:06:35PM -0500, Eric W. Biederman wrote: >>>> Casey Schaufler <casey@schaufler-ca.com> writes: >>>> >>>>> On 7/15/2015 12:46 PM, Seth Forshee wrote: >>>>>> These are the first in a larger set of patches that I've been working on >>>>>> (with help from Eric Biederman) to support mounting ext4 and fuse >>>>>> filesystems from within user namespaces. I've pushed the full series to: >>>>>> >>>>>> git://kernel.ubuntu.com/sforshee/linux.git userns-mounts >>>>>> >>>>>> Taking the series as a whole, the strategy is to handle as much of the >>>>>> heavy lifting as possible in the vfs so the filesystems don't have to >>>>>> handle weird edge cases. If you look at the full series you'll find that >>>>>> the changes in ext4 to support user namespace mounts turn out to be >>>>>> fairly minimal (fuse is a bit more complicated though as it must deal >>>>>> with translating ids for a userspace process which is running in pid and >>>>>> user namespaces). >>>>>> >>>>>> The patches I'm sending today lay some of the groundwork in the vfs and >>>>>> related code. They fall into two broad groups: >>>>>> >>>>>> 1. Patches 1-2 add s_user_ns and simplify MNT_NODEV handling. These are >>>>>> pretty straightforward, and Eric has expressed interest in merging >>>>>> these patches soon. Note that patch 2 won't apply cleanly without >>>>>> Eric's noexec patches for proc and sys [1]. >>>>>> >>>>>> 2. 
Patches 2-7 tighten down security for mounts with s_user_ns != >>>>>> &init_user_ns. This includes updates to how file caps and suid are >>>>>> handled and LSM updates to ignore security labels on superblocks >>>>>> from non-init namespaces. >>>>>> >>>>>> The LSM changes in particular may not be optimal, as I don't have a >>>>>> lot of familiarity with this code, so I'd be especially appreciative >>>>>> of review of these changes and suggestions on how to improve them. >>>>> >>>>> Lukasz Pawelczyk <l.pawelczyk@samsung.com> proposed >>>>> LSM support in user namespaces ([RFC] lsm: namespace hooks) >>>>> that make a whole lot more sense than just turning off >>>>> the option of using labels on files. Gutting the ability >>>>> to use MAC in a namespace is a step down the road of >>>>> making MAC and namespaces incompatible. >>>> >>>> This is not "turning off the option to use labels on files". >>>> >>>> This is supporting mounting filesystems like ext4 by unprivileged users >>>> and not trusting the labels they set in the same way as we trust labels >>>> on filesystems mounted by privileged users. >>>> >>>> The first step needs to be not trusting those labels and treating such >>>> filesystems as filesystems without label support. I hope that is Seth >>>> has implemented. >>>> >>>> In the long run we can do more interesting things with such filesystems >>>> once the appropriate LSM policy is in place. >>> >>> Yes, this exactly. Right now it looks to me like the only safe thing to >>> do with mounts from unprivileged users is to ignore the security labels, >>> so that's what I'm trying to do with these changes. If there's some >>> better thing to do, or some better way to do it, I'm more than happy to >>> receive that feedback. >> >> Ugh. >> >> This made me realize that we have an interesting problem here. An >> unprivileged mount of tmpfs probably needs to have >> s_user_ns == &init_user_ns. >> >> Otherwise we will break security labels on tmpfs for no good reason. 
>> ramfs and sysfs also seem to have similar concerns. >> >> Because they have no backing store we can trust those filesystems with >> security labels. Plus for at least sysfs there is the security label >> bleed through issue, that we need to make certain works. >> >> Perhaps these filesystems with trusted backing store need to call >> "sget_userns(..., &init_user_ns)". >> >> If we don't get this right we will have significant regressions with >> respect to security labels, and that is not ok. > > That's only a problem if there's anyone who sets security labels on > such a mount. You need global caps to do that (I hope), which > requires someone outside the userns to help, which means there's a > good chance that literally no one does this. Setting of security.selinux attributes is governed by SELinux permission checks, not by capabilities. Also, files are always assigned a label at creation time; a tmpfs inode will be labeled based on its creator without any userspace entity ever calling setxattr() at all. ^ permalink raw reply [flat|nested] 138+ messages in thread
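The trust decision debated in the message above can be sketched as a small userspace model. This is only an illustration, not kernel code: `struct super_block` and `struct user_namespace` here are stripped-down stand-ins, and `model_mount()` and `sb_labels_trusted()` are hypothetical helper names. The assumption is the one Eric floats in the quoted text: filesystems with no user-controlled backing store (tmpfs, ramfs, sysfs) would be pinned to `init_user_ns`, so their labels keep being honored.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel types named in the thread. */
struct user_namespace {
    const char *name;
};

static struct user_namespace init_user_ns = { "init_user_ns" };

struct super_block {
    const char *fstype;
    struct user_namespace *s_user_ns; /* namespace recorded for this superblock */
};

/*
 * Assumed model of the proposal: a filesystem whose backing store the
 * mounter controls records the mounter's namespace; one without such a
 * backing store is pinned to init_user_ns.
 */
static struct super_block model_mount(const char *fstype,
                                      struct user_namespace *mounter_ns,
                                      bool user_controlled_backing_store)
{
    struct super_block sb = {
        fstype,
        user_controlled_backing_store ? mounter_ns : &init_user_ns
    };
    return sb;
}

/* Security labels are honored only when the superblock belongs to init_user_ns. */
static bool sb_labels_trusted(const struct super_block *sb)
{
    return sb->s_user_ns == &init_user_ns;
}
```

In this model an ext4 image mounted from a container namespace has its labels ignored, while a tmpfs mounted from the same namespace does not, which is the regression Eric wants to avoid. It does not model Stephen's point that inodes also receive a label at creation time without any setxattr() call.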
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-15 21:48 ` Seth Forshee @ 2015-07-15 23:04 ` Casey Schaufler -1 siblings, 0 replies; 138+ messages in thread From: Casey Schaufler @ 2015-07-15 23:04 UTC (permalink / raw) To: Seth Forshee, Eric W. Biederman Cc: Alexander Viro, linux-fsdevel, linux-security-module, selinux, Serge Hallyn, Andy Lutomirski, linux-kernel On 7/15/2015 2:48 PM, Seth Forshee wrote: > On Wed, Jul 15, 2015 at 04:06:35PM -0500, Eric W. Biederman wrote: >> Casey Schaufler <casey@schaufler-ca.com> writes: >> >>> On 7/15/2015 12:46 PM, Seth Forshee wrote: >>>> These are the first in a larger set of patches that I've been working on >>>> (with help from Eric Biederman) to support mounting ext4 and fuse >>>> filesystems from within user namespaces. I've pushed the full series to: >>>> >>>> git://kernel.ubuntu.com/sforshee/linux.git userns-mounts >>>> >>>> Taking the series as a whole, the strategy is to handle as much of the >>>> heavy lifting as possible in the vfs so the filesystems don't have to >>>> handle weird edge cases. If you look at the full series you'll find that >>>> the changes in ext4 to support user namespace mounts turn out to be >>>> fairly minimal (fuse is a bit more complicated though as it must deal >>>> with translating ids for a userspace process which is running in pid and >>>> user namespaces). >>>> >>>> The patches I'm sending today lay some of the groundwork in the vfs and >>>> related code. They fall into two broad groups: >>>> >>>> 1. Patches 1-2 add s_user_ns and simplify MNT_NODEV handling. These are >>>> pretty straightforward, and Eric has expressed interest in merging >>>> these patches soon. Note that patch 2 won't apply cleanly without >>>> Eric's noexec patches for proc and sys [1]. >>>> >>>> 2. Patches 2-7 tighten down security for mounts with s_user_ns != >>>> &init_user_ns. 
This includes updates to how file caps and suid are >>>> handled and LSM updates to ignore security labels on superblocks >>>> from non-init namespaces. >>>> >>>> The LSM changes in particular may not be optimal, as I don't have a >>>> lot of familiarity with this code, so I'd be especially appreciative >>>> of review of these changes and suggestions on how to improve them. >>> Lukasz Pawelczyk <l.pawelczyk@samsung.com> proposed >>> LSM support in user namespaces ([RFC] lsm: namespace hooks) >>> that make a whole lot more sense than just turning off >>> the option of using labels on files. Gutting the ability >>> to use MAC in a namespace is a step down the road of >>> making MAC and namespaces incompatible. >> This is not "turning off the option to use labels on files". >> >> This is supporting mounting filesystems like ext4 by unprivileged users >> and not trusting the labels they set in the same way as we trust labels >> on filesystems mounted by privileged users. >> >> The first step needs to be not trusting those labels and treating such >> filesystems as filesystems without label support. I hope that is what Seth >> has implemented. >> >> In the long run we can do more interesting things with such filesystems >> once the appropriate LSM policy is in place. > Yes, this exactly. Right now it looks to me like the only safe thing to > do with mounts from unprivileged users is to ignore the security labels, > so that's what I'm trying to do with these changes. If there's some > better thing to do, or some better way to do it, I'm more than happy to > receive that feedback. If you ignore Smack labels you get a system that is broken. Without specifying Smack mount options (which requires CAP_MAC_ADMIN), all your files will be labeled with the floor ("_") label. Unless you're running with the floor label (Smack systems generally don't) there won't be anything you can write to. You will be able to read everything, which is also something you're unlikely to want.
Like I said, broken. Personally, I don't believe that the goal of supporting unprivileged mounts is especially sane. I am willing to be educated, but I don't see a rational solution. > Seth ^ permalink raw reply [flat|nested] 138+ messages in thread
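The read-everything, write-nothing outcome described above follows from Smack's built-in label rules. The sketch below is a userspace approximation of those rules as they are listed in the kernel's Smack documentation; it is not the kernel implementation. The function name `smack_builtin_allows()` is hypothetical, and explicit rules loaded by an administrator are deliberately left out.

```c
#include <stdbool.h>
#include <string.h>

enum smack_access { SMACK_READ, SMACK_WRITE, SMACK_EXECUTE };

/*
 * Approximation of Smack's built-in rules, checked in documented order.
 * Admin-loaded explicit rules are not modeled, so anything not covered
 * by a built-in rule is denied here.
 */
static bool smack_builtin_allows(const char *subject, const char *object,
                                 enum smack_access want)
{
    bool read_or_exec = (want == SMACK_READ || want == SMACK_EXECUTE);

    if (strcmp(subject, "*") == 0)
        return false;               /* a "*" task is denied everything */
    if (read_or_exec && strcmp(subject, "^") == 0)
        return true;                /* a "^" task may read or execute anything */
    if (read_or_exec && strcmp(object, "_") == 0)
        return true;                /* floor objects are readable by all */
    if (strcmp(object, "*") == 0)
        return true;                /* "*" objects accept any access */
    if (strcmp(subject, object) == 0)
        return true;                /* same label: any access */
    return false;                   /* everything else is denied */
}
```

With every file on the mount treated as "_", a task running with some other label (say a hypothetical "User") passes the read check and fails the write check, which matches Casey's description.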
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-15 21:06 ` Eric W. Biederman @ 2015-07-15 22:39 ` Casey Schaufler -1 siblings, 0 replies; 138+ messages in thread From: Casey Schaufler @ 2015-07-15 22:39 UTC (permalink / raw) To: Eric W. Biederman Cc: Seth Forshee, Alexander Viro, linux-fsdevel, linux-security-module, selinux, Serge Hallyn, Andy Lutomirski, linux-kernel On 7/15/2015 2:06 PM, Eric W. Biederman wrote: > Casey Schaufler <casey@schaufler-ca.com> writes: > >> On 7/15/2015 12:46 PM, Seth Forshee wrote: >>> These are the first in a larger set of patches that I've been working on >>> (with help from Eric Biederman) to support mounting ext4 and fuse >>> filesystems from within user namespaces. I've pushed the full series to: >>> >>> git://kernel.ubuntu.com/sforshee/linux.git userns-mounts >>> >>> Taking the series as a whole, the strategy is to handle as much of the >>> heavy lifting as possible in the vfs so the filesystems don't have to >>> handle weird edge cases. If you look at the full series you'll find that >>> the changes in ext4 to support user namespace mounts turn out to be >>> fairly minimal (fuse is a bit more complicated though as it must deal >>> with translating ids for a userspace process which is running in pid and >>> user namespaces). >>> >>> The patches I'm sending today lay some of the groundwork in the vfs and >>> related code. They fall into two broad groups: >>> >>> 1. Patches 1-2 add s_user_ns and simplify MNT_NODEV handling. These are >>> pretty straightforward, and Eric has expressed interest in merging >>> these patches soon. Note that patch 2 won't apply cleanly without >>> Eric's noexec patches for proc and sys [1]. >>> >>> 2. Patches 2-7 tighten down security for mounts with s_user_ns != >>> &init_user_ns. This includes updates to how file caps and suid are >>> handled and LSM updates to ignore security labels on superblocks >>> from non-init namespaces. 
>>> >>> The LSM changes in particular may not be optimal, as I don't have a >>> lot of familiarity with this code, so I'd be especially appreciative >>> of review of these changes and suggestions on how to improve them. >> Lukasz Pawelczyk <l.pawelczyk@samsung.com> proposed >> LSM support in user namespaces ([RFC] lsm: namespace hooks) >> that make a whole lot more sense than just turning off >> the option of using labels on files. Gutting the ability >> to use MAC in a namespace is a step down the road of >> making MAC and namespaces incompatible. > This is not "turning off the option to use labels on files". It gives an unprivileged user the ability to ignore the Smack labels that are on files and to create files with labels that do not match the rules laid down by the security module. > This is supporting mounting filesystems like ext4 by unprivileged users > and not trusting the labels they set in the same way as we trust labels > on filesystems mounted by privileged users. OK, you don't trust the metadata on a filesystem mounted by an untrusted user. That's fair. > The first step needs to be not trusting those labels and treating such > filesystems as filesystems without label support. I hope that is what Seth > has implemented. A filesystem with Smack labels gets mounted in a namespace. The labels are ignored. Instead, the filesystem defaults (potentially specified as mount options smackfsdef="something", but usually the floor label ("_")) are used, giving the user the ability to read everything and (usually) change nothing. This is both dangerous (unintended read access to files) and pointless (can't make changes). I can't speak authoritatively for SELinux, but it looks to me like you may have similar issues there. > In the long run we can do more interesting things with such filesystems > once the appropriate LSM policy is in place. The problem is not that the short term behavior is uninteresting, it's that it is broken.
Mounting a filesystem with xattrs and ignoring those xattrs results in incorrect access control decisions. > Getting s_user_ns present on struct super, properly set, and all of the > appropriate checks against it present in the vfs so that filesystems > don't need to duplicate logic is important if we are going to do more > interesting things with user namespaces (as users have been asking for). OK, but the fact that someone wants to do something they shouldn't doesn't mean you get to break things that work now to accommodate them. There are reasons why mounting filesystems requires privilege! > It is important for things as small as making it safe to allow > truly unprivileged users to mount fuse filesystems. If it isn't safe you shouldn't be doing it, even if it's "small" and something that would make life easier for some set of users. > I am on the fence with Lukasz Pawelczyk's patches. Some parts I liked, > some parts I had issues with. As I recall one of my issues was that > those patches conflicted in detail if not in principle with this > approach. > > If these patches do not do a good job of laying the ground work for > supporting security labels that unprivileged users can set then Seth > could really use some feedback. Figuring out how to properly deal with > the LSMs has been one of his challenges. The feedback is that you can't pick and choose when you are going to pay attention to the security attributes on a filesystem. It's possible that it will work out the way you want it, but it probably won't. Smack doesn't allow you to choose whether you're using xattrs. SELinux does, but certainly doesn't expect you to be flipping it on and off. I'm not convinced that it's safe to do for capability sets, either, but I'm not up to arguing PIxFE+ vector calculations just now.
> I am hoping I can finish working through the patches to fix the > semantics of rename and bind mounts before the next merge window opens, > so I can have enough cycles to lift the feature freeze on user > namespaces. Except for maybe his first two patches (which fix a small > userspace API breakage) none of Seth's patches get to go in until I lift > the freeze. Thanks. I know (believe me, I know) how frustrating it can be when you get the big NAK on something that seems like it's addressed. Unfortunately, the proposed approach (not just the specifics of implementation) does not work. > Which is probably too much information but I hope this makes it clear > that the point of this work is as an enabler for future developments, > not as something to make user namespaces and LSMs incompatible. I am paranoid, but not to the extent that I think anyone is trying to break the interaction between security modules and namespaces. Having worked with Lukasz on his security namespace patches it is clear to me that this is not a simple problem and that it is unlikely to have the simple solution everyone would like to see. I also don't see an intermediate state that works while the "real" solution is being refined. As always, I'm willing to be proven wrong. > Eric ^ permalink raw reply [flat|nested] 138+ messages in thread
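Casey's objection in the message above is that the same file yields different access decisions depending on whether its on-disk label is honored. A minimal sketch of that effect follows, under loud assumptions: `effective_label()`, `may_read()` and `may_write()` are hypothetical helpers, the labels "User" and "Secret" are invented, and the checks are reduced to the same-label and floor-label cases only.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Label used for access decisions: the on-disk xattr when it is honored,
 * otherwise the filesystem default (assumed here to be the floor label).
 */
static const char *effective_label(const char *on_disk, bool honor_xattrs,
                                   const char *fs_default)
{
    return honor_xattrs ? on_disk : fs_default;
}

/* Simplified read check: same label, or the object carries the floor label. */
static bool may_read(const char *subject, const char *object)
{
    return strcmp(subject, object) == 0 || strcmp(object, "_") == 0;
}

/* Simplified write check: same label only. */
static bool may_write(const char *subject, const char *object)
{
    return strcmp(subject, object) == 0;
}
```

For a file labeled "Secret" on disk, a "User" task is denied read access when the xattr is honored but gains it once the label is replaced by "_", and still cannot write. That is the "dangerous and pointless" combination described in the message.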
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-15 22:39 ` Casey Schaufler @ 2015-07-16 1:08 ` Andy Lutomirski -1 siblings, 0 replies; 138+ messages in thread From: Andy Lutomirski @ 2015-07-16 1:08 UTC (permalink / raw) To: Casey Schaufler Cc: Eric W. Biederman, Seth Forshee, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel On Wed, Jul 15, 2015 at 3:39 PM, Casey Schaufler <casey@schaufler-ca.com> wrote: > On 7/15/2015 2:06 PM, Eric W. Biederman wrote: >> Casey Schaufler <casey@schaufler-ca.com> writes: > >> The first step needs to be not trusting those labels and treating such >> filesystems as filesystems without label support. I hope that is what Seth >> has implemented. > > A filesystem with Smack labels gets mounted in a namespace. The labels > are ignored. Instead, the filesystem defaults (potentially specified as > mount options smackfsdef="something", but usually the floor label ("_")) > are used, giving the user the ability to read everything and (usually) > change nothing. This is both dangerous (unintended read access to files) > and pointless (can't make changes). I don't get it. If I mount an unprivileged filesystem, then either the contents were put there *by me*, in which case letting me access them is fine, or (with Seth's patches and then some) I control the backing store, in which case I can do whatever I want regardless of what the LSM thinks. So I don't see the problem. Why would Smack or any other LSM care at all, unless it wants to prevent me from mounting the fs in the first place? --Andy ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-16 1:08 ` Andy Lutomirski @ 2015-07-16 2:54 ` Casey Schaufler -1 siblings, 0 replies; 138+ messages in thread From: Casey Schaufler @ 2015-07-16 2:54 UTC (permalink / raw) To: Andy Lutomirski Cc: Eric W. Biederman, Seth Forshee, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel On 7/15/2015 6:08 PM, Andy Lutomirski wrote: > On Wed, Jul 15, 2015 at 3:39 PM, Casey Schaufler <casey@schaufler-ca.com> wrote: >> On 7/15/2015 2:06 PM, Eric W. Biederman wrote: >>> Casey Schaufler <casey@schaufler-ca.com> writes: >>> The first step needs to be not trusting those labels and treating such >>> filesystems as filesystems without label support. I hope that is what Seth >>> has implemented. >> A filesystem with Smack labels gets mounted in a namespace. The labels >> are ignored. Instead, the filesystem defaults (potentially specified as >> mount options smackfsdef="something", but usually the floor label ("_")) >> are used, giving the user the ability to read everything and (usually) >> change nothing. This is both dangerous (unintended read access to files) >> and pointless (can't make changes). > I don't get it. > > If I mount an unprivileged filesystem, then either the contents were > put there *by me*, in which case letting me access them is fine, or > (with Seth's patches and then some) I control the backing store, in > which case I can do whatever I want regardless of what the LSM thinks. > > So I don't see the problem. Why would Smack or any other LSM care at > all, unless it wants to prevent me from mounting the fs in the first > place? First off, I don't cotton to the notion that you should be able to mount filesystems without privilege. But it seems I'm being outvoted on that. I suspect that there are cases where it might be safe, but I can't think of one off the top of my head. If you do mount a filesystem it needs to behave according to the rules of the system.
If you have a security module that uses attributes on the filesystem, you can't ignore them just because it's "your data". Mandatory access control schemes, including Smack and SELinux, don't give a fig about who you are. It's the label on the data and the process that matter. If "you" get to muck the labels up, you've broken the mandatory access control. > --Andy ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-16 2:54 ` Casey Schaufler @ 2015-07-16 4:47 ` Eric W. Biederman -1 siblings, 0 replies; 138+ messages in thread From: Eric W. Biederman @ 2015-07-16 4:47 UTC (permalink / raw) To: Casey Schaufler Cc: Andy Lutomirski, Seth Forshee, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel Casey Schaufler <casey@schaufler-ca.com> writes: > On 7/15/2015 6:08 PM, Andy Lutomirski wrote: >> On Wed, Jul 15, 2015 at 3:39 PM, Casey Schaufler <casey@schaufler-ca.com> wrote: >>> On 7/15/2015 2:06 PM, Eric W. Biederman wrote: >>>> Casey Schaufler <casey@schaufler-ca.com> writes: >>>> The first step needs to be not trusting those labels and treating such >>>> filesystems as filesystems without label support. I hope that is Seth >>>> has implemented. >>> A filesystem with Smack labels gets mounted in a namespace. The labels >>> are ignored. Instead, the filesystem defaults (potentially specified as >>> mount options smackfsdef="something", but usually the floor label ("_")) >>> are used, giving the user the ability to read everything and (usually) >>> change nothing. This is both dangerous (unintended read access to files) >>> and pointless (can't make changes). >> I don't get it. >> >> If I mount an unprivileged filesystem, then either the contents were >> put there *by me*, in which case letting me access them are fine, or >> (with Seth's patches and then some) I control the backing store, in >> which case I can do whatever I want regardless of what LSM thinks. >> >> So I don't see the problem. Why would Smack or any other LSM care at >> all, unless it wants to prevent me from mounting the fs in the first >> place? > > First off, I don't cotton to the notion that you should be able > to mount filesystems without privilege. But it seems I'm being > outvoted on that. I suspect that there are cases where it might > be safe, but I can't think of one off the top of my head. 
There are two fundamental issues with mounting filesystems without privilege, by which I actually mean mounting filesystems as the root user in a user namespace. - Are the semantics safe. - Is the extra attack surface a problem. Figuring out how to make semantics safe is what we are talking about. Once we sort out the semantics we can look at the handful of filesystems like fuse where the extra attack surface is not a concern. With that said, desktop environments have for a long time been automatically mounting whichever filesystem you place in your computer, so in practice what this is really about is trying to align the kernel with how people use filesystems. I haven't looked closely but I think docker is just about as bad as those desktop environments when it comes to mounting filesystems. > If you do mount a filesystem it needs to behave according to the > rules of the system. I agree. > If you have a security module that uses > attributes on the filesystem you can't ignore them just because > it's "your data". Mandatory access control schemes, including > Smack and SELinux don't give a fig about who you are. It's the > label on the data and the process that matter. If "you" get to > muck the labels up, you've broken the mandatory access control. So there are filesystems like fat and minix that cannot store a label. Since it is not possible to store labels securely in filesystems mounted by unprivileged users (at least in the normal sense), the intent would be to treat a filesystem mounted without the privileges of the global root user as a filesystem that does not support xattrs. Treating such a filesystem as a filesystem that does not support xattrs is the only possible way to support such a filesystem securely, because as you have said someone who can muck up the labels breaks mandatory access control.
Given how non-trivial it is to grasp the nuances of different LSMs' mandatory access control semantics, I am asking Seth, for the first pass, to simply forbid mounting of filesystems with just user namespace permissions when there is an LSM active. Once we get that far, Smack may never need to support such systems. Eric ^ permalink raw reply [flat|nested] 138+ messages in thread
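As a rough illustration of the two rules Eric describes above, here is a toy model in Python. It is not code from Seth's patches or from any LSM: the names `Mount`, `may_mount` and `effective_label` are hypothetical, and the only detail taken from the thread is that Smack's floor label is "_". The first function models the proposed first pass (refuse user-namespace-only mounts while an LSM is active); the second models the later idea of treating such a mount as if the filesystem could not store labels at all.

```python
from dataclasses import dataclass
from typing import Optional

FLOOR_LABEL = "_"  # Smack's floor label, as mentioned in the thread


@dataclass
class Mount:
    # Hypothetical stand-in for per-mount state; not a kernel structure.
    mounted_by_init_root: bool          # True if mounted with global-root privilege
    default_label: str = FLOOR_LABEL    # mount-wide fallback label


def may_mount(mounted_by_init_root: bool, lsm_active: bool) -> bool:
    """First-pass rule: a user-namespace-only mount is refused if an LSM is active."""
    return mounted_by_init_root or not lsm_active


def effective_label(mount: Mount, on_disk_label: Optional[str]) -> str:
    """Later rule: on-disk labels count only when the mount itself is trusted.

    For an untrusted mount the stored label is ignored, exactly as if the
    filesystem had no xattr support, and the mount default is used instead.
    """
    if mount.mounted_by_init_root and on_disk_label is not None:
        return on_disk_label
    return mount.default_label
```

Under this model a file labelled "Secret" inside a user-namespace mount is seen with the mount default, so whoever controls the backing store gains nothing by rewriting the stored label. Whether falling back to a default label is itself acceptable is the point Casey disputes earlier in the thread.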
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-16 4:47 ` Eric W. Biederman @ 2015-07-17 0:09 ` Dave Chinner -1 siblings, 0 replies; 138+ messages in thread From: Dave Chinner @ 2015-07-17 0:09 UTC (permalink / raw) To: Eric W. Biederman Cc: Casey Schaufler, Andy Lutomirski, Seth Forshee, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel On Wed, Jul 15, 2015 at 11:47:08PM -0500, Eric W. Biederman wrote: > Casey Schaufler <casey@schaufler-ca.com> writes: > > On 7/15/2015 6:08 PM, Andy Lutomirski wrote: > >> If I mount an unprivileged filesystem, then either the contents were > >> put there *by me*, in which case letting me access them are fine, or > >> (with Seth's patches and then some) I control the backing store, in > >> which case I can do whatever I want regardless of what LSM thinks. > >> > >> So I don't see the problem. Why would Smack or any other LSM care at > >> all, unless it wants to prevent me from mounting the fs in the first > >> place? > > > > First off, I don't cotton to the notion that you should be able > > to mount filesystems without privilege. But it seems I'm being > > outvoted on that. I suspect that there are cases where it might > > be safe, but I can't think of one off the top of my head. > > There are two fundamental issues mounting filesystems without privielge, > by which I actually mean mounting filesystems as the root user in a user > namespace. > > - Are the semantics safe. > - Is the extra attack surface a problem. I think the attack surface this exposes is the biggest problem facing this proposal. > Figuring out how to make semantics safe is what we are talking about. > > Once we sort out the semantics we can look at the handful of filesystems > like fuse where the extra attack surface is not a concern. 
> > With that said desktop environments have for a long time been > automatically mounting whichever filesystem you place in your computer, > so in practice what this is really about is trying to align the kernel > with how people use filesystems. The key difference is that desktops only do this when you physically plug in a device. With unprivileged mounts, a hostile attacker doesn't need physical access to the machine to exploit lurking kernel filesystem bugs. i.e. they can just use loopback mounts, and they can keep mounting corrupted images until they find something that works. User namespaces are supposed to provide trust separation. The kernel filesystems simply aren't hardened against unprivileged attacks from below - there is a trust relationship between root and the filesystem in that they are the only things that can write to the disk. Mounts from within a userns destroys this relationship as the userns root, by definition, is not a trusted actor. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-17 0:09 ` Dave Chinner @ 2015-07-17 0:42 ` Eric W. Biederman -1 siblings, 0 replies; 138+ messages in thread From: Eric W. Biederman @ 2015-07-17 0:42 UTC (permalink / raw) To: Dave Chinner Cc: Casey Schaufler, Andy Lutomirski, Seth Forshee, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel Dave Chinner <david@fromorbit.com> writes: > On Wed, Jul 15, 2015 at 11:47:08PM -0500, Eric W. Biederman wrote: >> Casey Schaufler <casey@schaufler-ca.com> writes: >> > On 7/15/2015 6:08 PM, Andy Lutomirski wrote: >> >> If I mount an unprivileged filesystem, then either the contents were >> >> put there *by me*, in which case letting me access them are fine, or >> >> (with Seth's patches and then some) I control the backing store, in >> >> which case I can do whatever I want regardless of what LSM thinks. >> >> >> >> So I don't see the problem. Why would Smack or any other LSM care at >> >> all, unless it wants to prevent me from mounting the fs in the first >> >> place? >> > >> > First off, I don't cotton to the notion that you should be able >> > to mount filesystems without privilege. But it seems I'm being >> > outvoted on that. I suspect that there are cases where it might >> > be safe, but I can't think of one off the top of my head. >> >> There are two fundamental issues mounting filesystems without privielge, >> by which I actually mean mounting filesystems as the root user in a user >> namespace. >> >> - Are the semantics safe. >> - Is the extra attack surface a problem. > > I think the attack surface this exposes is the biggest problem > facing this proposal. I completely agree. >> Figuring out how to make semantics safe is what we are talking about. >> >> Once we sort out the semantics we can look at the handful of filesystems >> like fuse where the extra attack surface is not a concern.
>> >> With that said desktop environments have for a long time been >> automatically mounting whichever filesystem you place in your computer, >> so in practice what this is really about is trying to align the kernel >> with how people use filesystems. > > The key difference is that desktops only do this when you physically > plug in a device. With unprivileged mounts, a hostile attacker > doesn't need physical access to the machine to exploit lurking > kernel filesystem bugs. i.e. they can just use loopback mounts, and > they can keep mounting corrupted images until they find something > that works. Yep. That magnifies the problem quite a bit. > User namespaces are supposed to provide trust separation. The > kernel filesystems simply aren't hardened against unprivileged > attacks from below - there is a trust relationship between root and > the filesystem in that they are the only things that can write to > the disk. Mounts from within a userns destroys this relationship as > the userns root, by definition, is not a trusted actor. I talked to Ted Ts'o a while back and ext4 is at least in principle already hardened against that kind of attack. I am not certain I believe it, but if it is true I think it is fantastic. At this point I figure any setting of the FS_USER_MOUNT flag needs to go through the filesystem maintainer's tree, and the maintainers need to be aware of and agree to deal with the attack-from-below issue. The one filesystem I truly expect we can make work is fuse. fuse has been designed to deal with some variation of the attack-from-below issue since day one. We looked at what the patches to fuse would look like with the current state of the vfs and it was not pretty. We very much need to sort through as much as possible at the vfs layer, and in generic code. That allows everyone to see what is going on and how it works before proceeding with enabling any filesystems.
I truly hope we can find a small set of block device filesystems that we can harden against attack from below. That would allow Linux to have serious defenses against evil usb stick attacks. I think that is going to take a lot of careful coding, testing and validation and advancing the state of the art to get there. Eric ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-17 0:42 ` Eric W. Biederman @ 2015-07-17 2:47 ` Dave Chinner -1 siblings, 0 replies; 138+ messages in thread From: Dave Chinner @ 2015-07-17 2:47 UTC (permalink / raw) To: Eric W. Biederman Cc: Casey Schaufler, Andy Lutomirski, Seth Forshee, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel On Thu, Jul 16, 2015 at 07:42:03PM -0500, Eric W. Biederman wrote: > Dave Chinner <david@fromorbit.com> writes: > > > On Wed, Jul 15, 2015 at 11:47:08PM -0500, Eric W. Biederman wrote: > >> Casey Schaufler <casey@schaufler-ca.com> writes: > >> > On 7/15/2015 6:08 PM, Andy Lutomirski wrote: > >> >> If I mount an unprivileged filesystem, then either the contents were > >> >> put there *by me*, in which case letting me access them are fine, or > >> >> (with Seth's patches and then some) I control the backing store, in > >> >> which case I can do whatever I want regardless of what LSM thinks. > >> >> > >> >> So I don't see the problem. Why would Smack or any other LSM care at > >> >> all, unless it wants to prevent me from mounting the fs in the first > >> >> place? > >> > > >> > First off, I don't cotton to the notion that you should be able > >> > to mount filesystems without privilege. But it seems I'm being > >> > outvoted on that. I suspect that there are cases where it might > >> > be safe, but I can't think of one off the top of my head. > >> > >> There are two fundamental issues mounting filesystems without privielge, > >> by which I actually mean mounting filesystems as the root user in a user > >> namespace. > >> > >> - Are the semantics safe. > >> - Is the extra attack surface a problem. > > > > I think the attack surface this exposes is the biggest problem > > facing this proposal. > > I completely agree. > > >> Figuring out how to make semantics safe is what we are talking about.
> >> > >> Once we sort out the semantics we can look at the handful of filesystems > >> like fuse where the extra attack surface is not a concern. > >> > >> With that said desktop environments have for a long time been > >> automatically mounting whichever filesystem you place in your computer, > >> so in practice what this is really about is trying to align the kernel > >> with how people use filesystems. > > > > The key difference is that desktops only do this when you physically > > plug in a device. With unprivileged mounts, a hostile attacker > > doesn't need physical access to the machine to exploit lurking > > kernel filesystem bugs. i.e. they can just use loopback mounts, and > > they can keep mounting corrupted images until they find something > > that works. > > Yep. That magnifies the problem quite a bit. > > > User namespaces are supposed to provide trust separation. The > > kernel filesystems simply aren't hardened against unprivileged > > attacks from below - there is a trust relationship between root and > > the filesystem in that they are the only things that can write to > > the disk. Mounts from within a userns destroys this relationship as > > the userns root, by definition, is not a trusted actor. > > I talked to Ted Tso a while back and ext4 is at least in principle > already hardened against that kind of attack. I am not certain I > believe it, but if it is true I think it is fantastic. No, it's not. No filesystem is, because hardening against such attacks requires either complete verification of all metadata when it is read from disk, before it is used, or some method of ensuring the block was not tampered with. CRCs are not sufficient, because they can be tampered with, too. The only way a filesystem would be able to trust that what it reads from disk has not been tampered with, in a system with untrusted mounts, is if it has some kind of cryptographically secure signature in the metadata and the attacker is unable to access the key for that signature.
No filesystem we have has that capability and AFAIA there are no plans for any filesystem to implement such tamper detection. And no, ext4 encryption does not provide this because it only stores the values and data in encrypted format and does not protect metadata from tampering when it is not mounted. If we don't have crypto signatures in metadata, then XFS is probably the most robust against tampering as it does a lot more checking of the on-disk metadata before it is used than any other filesystem (i.e. see the verifier infrastructure that does corruption checks after read (in io completion) and before write (in io submission) to catch bad metadata before it is used by the kernel, or before it is written to disk by the kernel). However, these checks are far from comprehensive. We can only check internal consistency of the metadata objects in the block, and even then we really only can check for values within range rather than absolute correctness. e.g. we can check a dirent has a valid name, length, ftype and inode number, but we can't validate that the inode is actually allocated or not because that requires a lookup in the allocated inode btree. We *trust* that inode number to be allocated and valid because it is in metadata the filesystem wrote. For inode numbers that come from untrusted sources (NFS, open-by-handle, etc) we have a flag that does inode number validation on lookup (XFS_IGET_UNTRUSTED) to check against trusted metadata (i.e. the allocated inode btrees), but that is expensive and so not done on inodes that we pull directly from metadata that has come from disk. Indeed, we still trust on-disk metadata to be correct to validate that other metadata can be trusted, so if one structure can be tampered with, so can others. IOWs, if we cannot trust one part of the filesystem metadata to be correct, then we cannot trust that filesystem *at all*, *for anything*.
And even running fsck doesn't restore trust - all it does is tell us that any modification that was made is not a detectable inconsistency that needs fixing. > At this point any setting of the FS_USER_MOUNT flag I figure needs to go > through the filesystem maintainers tree and they need to be aware of and > agree to deal with the attack from below issue. > > The one filesystem I truly expect we can make work is fuse. fuse has > been designed to deal with some variation of the attack from below issue > since day one. We looked at what the patches to fuse would look like > with the current state of the vfs and it was not pretty. > > We very much need to sort through as much as possible at the vfs layer, > and in generic code. Allow everyone to see what is going on and how > it works before preceeding forward with enabling any filesystems. The VFS protects us from attacks from above the filesystem, not below. The VFS plays no part in validating the on-disk structure of a filesystem which is what attacks from below will be attempting to exploit. > I truly hope we can find a small set of block device filesystems that we > can harden from attack below. That would allow linux to have serious > defenses against evil usb stick attacks. I think that is going to take > a lot of careful coding, testing and validation and advancing the state > of the art to get there. Somehow, I just can't see that happening. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 138+ messages in thread
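To make the CRC point above concrete, here is a small Python sketch. It does not reproduce any filesystem's on-disk format: the block contents, the key and the function names are all made up for illustration, and HMAC-SHA256 is used only as a stand-in for the "cryptographically secure signature" Dave mentions (it is a keyed MAC, not a public-key signature). It shows that an attacker who can write the backing store can recompute a CRC after tampering, while a keyed MAC fails to verify as long as the key stays out of the attacker's reach.

```python
import hashlib
import hmac
import zlib

CRC_LEN = 4    # bytes appended by seal_crc
MAC_LEN = 32   # bytes appended by seal_hmac (SHA-256 digest size)


def seal_crc(block: bytes) -> bytes:
    """Append a CRC32 of the block. Anyone can compute this; no secret involved."""
    return block + zlib.crc32(block).to_bytes(CRC_LEN, "little")


def check_crc(sealed: bytes) -> bool:
    block, stored = sealed[:-CRC_LEN], sealed[-CRC_LEN:]
    return zlib.crc32(block).to_bytes(CRC_LEN, "little") == stored


def seal_hmac(block: bytes, key: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed with a secret key."""
    return block + hmac.new(key, block, hashlib.sha256).digest()


def check_hmac(sealed: bytes, key: bytes) -> bool:
    block, stored = sealed[:-MAC_LEN], sealed[-MAC_LEN:]
    expected = hmac.new(key, block, hashlib.sha256).digest()
    return hmac.compare_digest(expected, stored)


# Hypothetical key held by the host; the attacker never sees it.
HOST_KEY = b"host-only-secret"

# The attacker rewrites a metadata block in the image they control.
tampered = b"dirent: name=passwd ino=666"

# With a CRC, the attacker simply recomputes the checksum after tampering.
forged_crc_block = seal_crc(tampered)

# With a keyed MAC, the attacker has to guess the key, and guesses wrong.
forged_mac_block = seal_hmac(tampered, b"attacker-guess")

crc_accepts_forgery = check_crc(forged_crc_block)             # checksum matches
mac_accepts_forgery = check_hmac(forged_mac_block, HOST_KEY)  # tag does not match
```

Even in this toy form the limits Dave describes still apply: a valid tag only says the block was written by a key holder, it says nothing about whether the values inside it are consistent with the rest of the filesystem.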
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-17 2:47 ` Dave Chinner @ 2015-07-21 17:37 ` J. Bruce Fields -1 siblings, 0 replies; 138+ messages in thread From: J. Bruce Fields @ 2015-07-21 17:37 UTC (permalink / raw) To: Dave Chinner Cc: Eric W. Biederman, Casey Schaufler, Andy Lutomirski, Seth Forshee, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On Fri, Jul 17, 2015 at 12:47:35PM +1000, Dave Chinner wrote:
> On Thu, Jul 16, 2015 at 07:42:03PM -0500, Eric W. Biederman wrote:
> > Dave Chinner <david@fromorbit.com> writes:
> >
> > > On Wed, Jul 15, 2015 at 11:47:08PM -0500, Eric W. Biederman wrote:
> > >> Casey Schaufler <casey@schaufler-ca.com> writes:
> > >> > On 7/15/2015 6:08 PM, Andy Lutomirski wrote:
> > >> >> If I mount an unprivileged filesystem, then either the contents were put there *by me*, in which case letting me access them is fine, or (with Seth's patches and then some) I control the backing store, in which case I can do whatever I want regardless of what LSM thinks.
> > >> >>
> > >> >> So I don't see the problem. Why would Smack or any other LSM care at all, unless it wants to prevent me from mounting the fs in the first place?
> > >> >
> > >> > First off, I don't cotton to the notion that you should be able to mount filesystems without privilege. But it seems I'm being outvoted on that. I suspect that there are cases where it might be safe, but I can't think of one off the top of my head.
> > >>
> > >> There are two fundamental issues mounting filesystems without privilege, by which I actually mean mounting filesystems as the root user in a user namespace.
> > >>
> > >> - Are the semantics safe.
> > >> - Is the extra attack surface a problem.
> > >
> > > I think the attack surface this exposes is the biggest problem facing this proposal.
> >
> > I completely agree.
> >
> > >> Figuring out how to make semantics safe is what we are talking about.
> > >>
> > >> Once we sort out the semantics we can look at the handful of filesystems like fuse where the extra attack surface is not a concern.
> > >>
> > >> With that said desktop environments have for a long time been automatically mounting whichever filesystem you place in your computer, so in practice what this is really about is trying to align the kernel with how people use filesystems.
> > >
> > > The key difference is that desktops only do this when you physically plug in a device. With unprivileged mounts, a hostile attacker doesn't need physical access to the machine to exploit lurking kernel filesystem bugs. i.e. they can just use loopback mounts, and they can keep mounting corrupted images until they find something that works.
> >
> > Yep. That magnifies the problem quite a bit.
> >
> > > User namespaces are supposed to provide trust separation. The kernel filesystems simply aren't hardened against unprivileged attacks from below - there is a trust relationship between root and the filesystem in that they are the only things that can write to the disk. Mounts from within a userns destroy this relationship as the userns root, by definition, is not a trusted actor.
> >
> > I talked to Ted Ts'o a while back and ext4 is at least in principle already hardened against that kind of attack. I am not certain I believe it, but if it is true I think it is fantastic.
>
> No, it's not. No filesystem is, because to harden against such attacks requires complete verification of all metadata when it is read from disk, before it is used, or some method of ensuring the block was not tampered with. CRCs are not sufficient, because they can be tampered with, too.
>
> The only way a filesystem would be able to trust that what it reads from disk has not been tampered with in a system with untrusted mounts is if it has some kind of cryptographically secure signature in the metadata and the attacker is unable to access the key for that signature.

Preventing tampering is a little different from protecting the kernel from attack, isn't it? I thought the latter was what people were asking about.

So, for example, a screwed up on-disk directory structure shouldn't result in creating a cycle in the dcache and then deadlocking.

--b.

> No filesystem we have has that capability and AFAIA there are no plans for any filesystem to implement such tamper detection. And no, ext4 encryption does not provide this because it only stores the values and data in encrypted format and does not protect metadata from tampering when it is not mounted.
>
> If we don't have crypto signatures in metadata, then XFS is probably the most robust against tampering as it does a lot more checking of the on-disk metadata before it is used than any other filesystem (i.e. see the verifier infrastructure that does corruption checks after read (in io completion) and before write (in io submission) to catch bad metadata before it is used by the kernel, or before it is written to disk by the kernel).
>
> However, these checks are far from comprehensive. We can only check internal consistency of the metadata objects in the block, and even then we can really only check for values within range rather than absolute correctness. e.g. we can check a dirent has a valid name, length, ftype and inode number, but we can't validate that the inode is actually allocated or not because that requires a lookup in the allocated inode btree. We *trust* that inode number to be allocated and valid because it is in metadata the filesystem wrote.
>
> For inode numbers that come from untrusted sources (NFS, open-by-handle, etc) we have a flag that does inode number validation on lookup (XFS_IGET_UNTRUSTED) to check against trusted metadata (i.e. the allocated inode btrees), but that is expensive and so not done on inodes that we pull directly from metadata that has come from disk. Indeed, we still trust on-disk metadata to be correct to validate that other metadata can be trusted, so if one structure can be tampered with, so can others.
>
> IOWs, if we cannot trust one part of the filesystem metadata to be correct, then we cannot trust that filesystem *at all*, *for anything*. And even running fsck doesn't restore trust - all it does is tell us that any modification that was made is not a detectable inconsistency that needs fixing.
>
> > At this point any setting of the FS_USER_MOUNT flag I figure needs to go through the filesystem maintainers tree and they need to be aware of and agree to deal with the attack from below issue.
> >
> > The one filesystem I truly expect we can make work is fuse. fuse has been designed to deal with some variation of the attack from below issue since day one. We looked at what the patches to fuse would look like with the current state of the vfs and it was not pretty.
> >
> > We very much need to sort through as much as possible at the vfs layer, and in generic code. Allow everyone to see what is going on and how it works before proceeding forward with enabling any filesystems.
>
> The VFS protects us from attacks from above the filesystem, not below. The VFS plays no part in validating the on-disk structure of a filesystem which is what attacks from below will be attempting to exploit.
>
> > I truly hope we can find a small set of block device filesystems that we can harden from attack below. That would allow linux to have serious defenses against evil usb stick attacks. I think that is going to take a lot of careful coding, testing and validation and advancing the state of the art to get there.
>
> Somehow, I just can't see that happening.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply [flat|nested] 138+ messages in thread
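The quoted claim that CRCs can be tampered with while a keyed signature cannot (as long as the attacker lacks the key) can be sketched as follows. The block contents and the key are invented for the example, and HMAC-SHA256 is only one possible choice of keyed integrity check; the thread does not name a specific scheme, and the sketch says nothing about who would hold the key.

```python
import hashlib
import hmac
import zlib


def crc_ok(block: bytes, crc: int) -> bool:
    """Unkeyed check: anyone who can write the block can recompute it."""
    return zlib.crc32(block) == crc


def mac_tag(block: bytes, key: bytes) -> bytes:
    """Keyed check: producing a valid tag requires the key."""
    return hmac.new(key, block, hashlib.sha256).digest()


def mac_ok(block: bytes, tag: bytes, key: bytes) -> bool:
    return hmac.compare_digest(mac_tag(block, key), tag)


key = b"held outside the image"          # hypothetical key the attacker lacks
original = b"inode=129 mode=0644 uid=1000"
stored_crc = zlib.crc32(original)
stored_tag = mac_tag(original, key)

# The attacker rewrites the block and then fixes up whatever they can.
tampered = b"inode=129 mode=4755 uid=0000"
forged_crc = zlib.crc32(tampered)                # trivially recomputed
forged_tag = mac_tag(tampered, b"wrong guess")   # no access to the real key

print(crc_ok(tampered, forged_crc))        # the CRC no longer flags the change
print(mac_ok(tampered, forged_tag, key))   # the keyed tag still does
```

The CRC only detects accidental corruption: once the attacker updates it alongside the block, the check passes. The keyed tag fails for the tampered block unless the key leaks.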
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-21 17:37 ` J. Bruce Fields @ 2015-07-22 7:56 ` Dave Chinner -1 siblings, 0 replies; 138+ messages in thread From: Dave Chinner @ 2015-07-22 7:56 UTC (permalink / raw) To: J. Bruce Fields Cc: Eric W. Biederman, Casey Schaufler, Andy Lutomirski, Seth Forshee, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On Tue, Jul 21, 2015 at 01:37:21PM -0400, J. Bruce Fields wrote:
> On Fri, Jul 17, 2015 at 12:47:35PM +1000, Dave Chinner wrote:
> > On Thu, Jul 16, 2015 at 07:42:03PM -0500, Eric W. Biederman wrote:
> > > Dave Chinner <david@fromorbit.com> writes:
> > > > The key difference is that desktops only do this when you physically plug in a device. With unprivileged mounts, a hostile attacker doesn't need physical access to the machine to exploit lurking kernel filesystem bugs. i.e. they can just use loopback mounts, and they can keep mounting corrupted images until they find something that works.
> > >
> > > Yep. That magnifies the problem quite a bit.
> > >
> > > > User namespaces are supposed to provide trust separation. The kernel filesystems simply aren't hardened against unprivileged attacks from below - there is a trust relationship between root and the filesystem in that they are the only things that can write to the disk. Mounts from within a userns destroy this relationship as the userns root, by definition, is not a trusted actor.
> > >
> > > I talked to Ted Ts'o a while back and ext4 is at least in principle already hardened against that kind of attack. I am not certain I believe it, but if it is true I think it is fantastic.
> >
> > No, it's not. No filesystem is, because to harden against such attacks requires complete verification of all metadata when it is read from disk, before it is used, or some method of ensuring the block was not tampered with. CRCs are not sufficient, because they can be tampered with, too.
> >
> > The only way a filesystem would be able to trust that what it reads from disk has not been tampered with in a system with untrusted mounts is if it has some kind of cryptographically secure signature in the metadata and the attacker is unable to access the key for that signature.
>
> Preventing tampering is a little different from protecting the kernel from attack, isn't it? I thought the latter was what people were asking about.

People might be asking for the latter, but the only attack vector that can be made against filesystems from below is via tampering with the on-disk structure.

An untrusted user in an untrusted container can construct arbitrary untrusted filesystem structures and get them parsed by a context running as $DEITY that assumes the structure is from a trusted source. What can possibly go wrong?

IOWs, to protect the kernel against attack from untrusted filesystem images, we either have to be able to guarantee the image can not be modified by untrusted parties (i.e. needs to be created with signed tools, contain only signed filesystem metadata and signed/encrypted data), or we have to sandbox the filesystem parsing code completely (i.e. fuse).

> So, for example, a screwed up on-disk directory structure shouldn't result in creating a cycle in the dcache and then deadlocking.

Therein lies the problem: how do you detect such structural defects without doing a full structure validation? e.g. cyclic links may only manifest when completely unrelated pieces of metadata are linked together in a specific way.

Further, the problem is not restricted to validation at mount time - if the user can write to the filesystem image file, then they can modify it after it has been mounted, too. That means the attacker may be someone who has broken into a container, not necessarily the user you trusted with unprivileged mounts. That means every cold metadata read needs to be treated with suspicion, not just at mount time. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 138+ messages in thread
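The point that a directory cycle cannot be found without walking the whole structure can be shown with a toy model. The dict-of-lists "image" below is an assumption for illustration only and bears no relation to any real on-disk format: each key is a directory inode number and each value lists the directory inodes it links to.

```python
def locally_valid(dirs: dict, inum: int) -> bool:
    """Per-object check: every child entry refers to a directory that exists."""
    return all(child in dirs for child in dirs[inum])


def has_cycle(dirs: dict, root: int) -> bool:
    """Whole-structure check: iterative depth-first walk from the root."""
    visiting, done = {root}, set()
    stack = [(root, iter(dirs[root]))]
    while stack:
        node, children = stack[-1]
        for child in children:
            if child in visiting:
                return True            # link back to an ancestor
            if child not in done:
                visiting.add(child)
                stack.append((child, iter(dirs[child])))
                break
        else:
            # All children of this directory have been explored.
            visiting.discard(node)
            done.add(node)
            stack.pop()
    return False


honest = {1: [2, 3], 2: [4], 3: [], 4: []}
crafted = {1: [2, 3], 2: [4], 3: [], 4: [2]}   # directory 4 links back to 2

for image in (honest, crafted):
    print(all(locally_valid(image, i) for i in image), has_cycle(image, 1))
```

Every directory in the crafted image passes the per-object check; the loop only shows up when unrelated objects are followed together, and that walk costs time proportional to the whole tree. As the message notes, it would also have to be repeated whenever the image can change underneath a mounted filesystem.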
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-22 7:56 ` Dave Chinner @ 2015-07-22 14:09 ` J. Bruce Fields -1 siblings, 0 replies; 138+ messages in thread From: J. Bruce Fields @ 2015-07-22 14:09 UTC (permalink / raw) To: Dave Chinner Cc: Eric W. Biederman, Casey Schaufler, Andy Lutomirski, Seth Forshee, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On Wed, Jul 22, 2015 at 05:56:40PM +1000, Dave Chinner wrote:
> On Tue, Jul 21, 2015 at 01:37:21PM -0400, J. Bruce Fields wrote:
> > On Fri, Jul 17, 2015 at 12:47:35PM +1000, Dave Chinner wrote:
> > > On Thu, Jul 16, 2015 at 07:42:03PM -0500, Eric W. Biederman wrote:
> > > > Dave Chinner <david@fromorbit.com> writes:
> > > > > The key difference is that desktops only do this when you physically plug in a device. With unprivileged mounts, a hostile attacker doesn't need physical access to the machine to exploit lurking kernel filesystem bugs. i.e. they can just use loopback mounts, and they can keep mounting corrupted images until they find something that works.
> > > >
> > > > Yep. That magnifies the problem quite a bit.
> > > >
> > > > > User namespaces are supposed to provide trust separation. The kernel filesystems simply aren't hardened against unprivileged attacks from below - there is a trust relationship between root and the filesystem in that they are the only things that can write to the disk. Mounts from within a userns destroy this relationship as the userns root, by definition, is not a trusted actor.
> > > >
> > > > I talked to Ted Ts'o a while back and ext4 is at least in principle already hardened against that kind of attack. I am not certain I believe it, but if it is true I think it is fantastic.
> > >
> > > No, it's not. No filesystem is, because to harden against such attacks requires complete verification of all metadata when it is read from disk, before it is used, or some method of ensuring the block was not tampered with. CRCs are not sufficient, because they can be tampered with, too.
> > >
> > > The only way a filesystem would be able to trust that what it reads from disk has not been tampered with in a system with untrusted mounts is if it has some kind of cryptographically secure signature in the metadata and the attacker is unable to access the key for that signature.
> >
> > Preventing tampering is a little different from protecting the kernel from attack, isn't it? I thought the latter was what people were asking about.
>
> People might be asking for the latter, but the only attack vector that can be made against filesystems from below is via tampering with the on-disk structure.
>
> An untrusted user in an untrusted container can construct arbitrary untrusted filesystem structures and get them parsed by a context running as $DEITY that assumes the structure is from a trusted source. What can possibly go wrong?
>
> IOWs, to protect the kernel against attack from untrusted filesystem images, we either have to be able to guarantee the image can not be modified by untrusted parties (i.e. needs to be created with signed tools, contain only signed filesystem metadata and signed/encrypted data),

I don't think that works--who exactly would be the "trusted party"?

It can't be this kernel or this hardware--users expect to be able to mount filesystems created by older kernels, on other machines, running other distributions (even other operating systems).

It can't be the user--then any user could compromise the kernel by signing a bad filesystem.

Authenticating the creator of the filesystem might be useful for other reasons, but it sounds to me like at best only very weak protection against corrupted filesystems.
As a similar example, browser makers are stuck both implementing SSL and hardening their code against malicious content. Those address separate problems. > or we have to sandbox the filesystem parsing > code completely (i.e. fuse). > > > So, for example, a screwed up on-disk directory structure shouldn't > > result in creating a cycle in the dcache and then deadlocking. > > Therein lies the problem: how do you detect such structural defects > without doing a full structure validation? You can prevent cycles in a graph if you can prevent adding an edge which would be part of a cycle. For the dcache, it's d_splice_alias that does that (using d_ancestor). (And I believe the main motivation for that was NFS, where you don't need a filesystem cycle, just a server-side race that can briefly make it look like there's one--an example of the changing filesystem problem that you point out below.) > e.g. cyclic links may > only manifest when completely unrelated pieces of metadata are linked > together in a specific way. > > Further, the problem is not restricted to validation at mount time - > if the user can write to the filesystem image file, then they can > modify it after it has been mounted, too. That means the attacker > may be someone who has broken into a container, not necessarily the > user you trusted with unprivileged mounts. That means every cold > metadata read needs to be treated with suspicion, not just at mount > time. Yes. Agreed that this is difficult. (I can't actually give an example of an existing problem of this sort, but I'd be surprised if they don't exist.) --b. ^ permalink raw reply [flat|nested] 138+ messages in thread
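The "refuse the edge that would close a cycle" idea can be sketched in a few lines. This is a toy model only, loosely inspired by the d_ancestor check mentioned above; it is not the kernel's code, and `toy_node`, `toy_is_ancestor` and `toy_attach` are invented names. It assumes the in-memory tree starts out acyclic and that every attachment goes through the check, so the parent walk always terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a dentry: each node records only its parent. */
struct toy_node {
    struct toy_node *parent;    /* NULL for the root or an unattached node */
};

/* Return true if 'a' is 'b' itself or one of b's ancestors. */
static bool toy_is_ancestor(const struct toy_node *a, const struct toy_node *b)
{
    for (const struct toy_node *p = b; p != NULL; p = p->parent)
        if (p == a)
            return true;
    return false;
}

/*
 * Attach 'child' under 'parent' unless that would create a cycle, i.e.
 * unless 'child' already sits on the path from 'parent' up to the root.
 * Returns 0 on success, -1 if the attachment is refused.
 */
static int toy_attach(struct toy_node *parent, struct toy_node *child)
{
    if (toy_is_ancestor(child, parent))
        return -1;
    child->parent = parent;
    return 0;
}
```

Whatever the on-disk directory entries claim, the in-memory graph stays a tree because the offending edge is rejected at insertion time. The cost is one walk to the root per attachment.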
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-22 14:09 ` J. Bruce Fields @ 2015-07-22 16:52 ` Austin S Hemmelgarn -1 siblings, 0 replies; 138+ messages in thread From: Austin S Hemmelgarn @ 2015-07-22 16:52 UTC (permalink / raw) To: J. Bruce Fields, Dave Chinner Cc: Eric W. Biederman, Casey Schaufler, Andy Lutomirski, Seth Forshee, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1497 bytes --] On 2015-07-22 10:09, J. Bruce Fields wrote: > On Wed, Jul 22, 2015 at 05:56:40PM +1000, Dave Chinner wrote: >> On Tue, Jul 21, 2015 at 01:37:21PM -0400, J. Bruce Fields wrote: >>> On Fri, Jul 17, 2015 at 12:47:35PM +1000, Dave Chinner wrote: >>> So, for example, a screwed up on-disk directory structure shouldn't >>> result in creating a cycle in the dcache and then deadlocking. >> >> Therein lies the problem: how do you detect such structural defects >> without doing a full structure validation? > > You can prevent cycles in a graph if you can prevent adding an edge > which would be part of a cycle. > Except if the user can write to the filesystem's backing storage (be it a device or a file), and has sufficient knowledge of the on-disk structures, they can create all the cycles they want in the metadata. So unless the kernel builds the graph internally by parsing the metadata _and_ has some way to detect that the on-disk metadata has hit a cycle (which may not just involve 2 items), then you still have the potential for a DoS attack. Trust me, I've done this before (quite a while back when I was just starting out with programming on Linux) with hard-link cycles in an ext4 filesystem in a virtual machine just to see what would happen (IIRC, something deadlocked, I can't remember though if it was fsck or trying to access the file once the FS was mounted) (and in fact, I think I may try this again just to see if anything has changed). 
^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-22 16:52 ` Austin S Hemmelgarn @ 2015-07-22 17:41 ` J. Bruce Fields -1 siblings, 0 replies; 138+ messages in thread From: J. Bruce Fields @ 2015-07-22 17:41 UTC (permalink / raw) To: Austin S Hemmelgarn Cc: Dave Chinner, Eric W. Biederman, Casey Schaufler, Andy Lutomirski, Seth Forshee, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel On Wed, Jul 22, 2015 at 12:52:58PM -0400, Austin S Hemmelgarn wrote: > On 2015-07-22 10:09, J. Bruce Fields wrote: > >On Wed, Jul 22, 2015 at 05:56:40PM +1000, Dave Chinner wrote: > >>On Tue, Jul 21, 2015 at 01:37:21PM -0400, J. Bruce Fields wrote: > >>>On Fri, Jul 17, 2015 at 12:47:35PM +1000, Dave Chinner wrote: > >>>So, for example, a screwed up on-disk directory structure shouldn't > >>>result in creating a cycle in the dcache and then deadlocking. > >> > >>Therein lies the problem: how do you detect such structural defects > >>without doing a full structure validation? > > > >You can prevent cycles in a graph if you can prevent adding an edge > >which would be part of a cycle. > > > Except if the user can write to the filesystem's backing storage (be > it a device or a file), and has sufficient knowledge of the on-disk > structures, they can create all the cycles they want in the > metadata. So unless the kernel builds the graph internally by > parsing the metadata _and_ has some way to detect that the on-disk > metadata has hit a cycle (which may not just involve 2 items), Understood. Again, see the d_ancestor call in d_splice_alias, this is exactly what it checks for. > then > you still have the potential for a DoS attack. 
> Trust me, I've done this before (quite a while back when I was just > starting out with programming on Linux) with hard-link cycles in an > ext4 filesystem in a virtual machine just to see what would happen > (IIRC, something deadlocked, I can't remember though if it was fsck > or trying to access the file once the FS was mounted) (and in fact, > I think I may try this again just to see if anything has changed). I've also seen bugs caused by loops in corrupted ext4 filesystems. As far as I know, they're fixed as of 95ad5c291313b. (I mentioned the example of dcache loops because it's something I happened to run across before. I'm sure there are any number of cases where we need similar checking to keep internal data structures consistent in the face of unexpected filesystem content.) --b. ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-22 17:41 ` J. Bruce Fields @ 2015-07-23 1:51 ` Dave Chinner -1 siblings, 0 replies; 138+ messages in thread From: Dave Chinner @ 2015-07-23 1:51 UTC (permalink / raw) To: J. Bruce Fields Cc: Austin S Hemmelgarn, Eric W. Biederman, Casey Schaufler, Andy Lutomirski, Seth Forshee, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel On Wed, Jul 22, 2015 at 01:41:00PM -0400, J. Bruce Fields wrote: > On Wed, Jul 22, 2015 at 12:52:58PM -0400, Austin S Hemmelgarn wrote: > > On 2015-07-22 10:09, J. Bruce Fields wrote: > > >On Wed, Jul 22, 2015 at 05:56:40PM +1000, Dave Chinner wrote: > > >>On Tue, Jul 21, 2015 at 01:37:21PM -0400, J. Bruce Fields wrote: > > >>>On Fri, Jul 17, 2015 at 12:47:35PM +1000, Dave Chinner wrote: > > >>>So, for example, a screwed up on-disk directory structure shouldn't > > >>>result in creating a cycle in the dcache and then deadlocking. > > >> > > >>Therein lies the problem: how do you detect such structural defects > > >>without doing a full structure validation? > > > > > >You can prevent cycles in a graph if you can prevent adding an edge > > >which would be part of a cycle. > > > > > Except if the user can write to the filesystem's backing storage (be > > it a device or a file), and has sufficient knowledge of the on-disk > > structures, they can create all the cycles they want in the > > metadata. So unless the kernel builds the graph internally by > > parsing the metadata _and_ has some way to detect that the on-disk > > metadata has hit a cycle (which may not just involve 2 items), > > Understood. Again, see the d_ancestor call in d_splice_alias, this is > exactly what it checks for. But that only addresses one type of loop in one specific metadata structure. There's plenty of other ways you could construct metadata loops that are essentially undetected and result in either deadlock or livelock within the filesystem code itself. e.g. 
just make btree sibling pointers loop over a range of entries that have the same index key (e.g. free space extents of the same size). If allocation then falls into this loop, the kernel will just spin searching the same blocks for something it will never find. Such resource consumption attacks are trivial to construct but extremely difficult to detect because they exploit normal behaviour of the structure and algorithms by mangling trusted pointers. Of course, this sort of attack will eventually deadlock the filesystem because it will back up on locks held by the livelocked search. Once the filesystem is deadlocked, it can then cause sync() calls to get stuck on the filesystem. And because sync() is a global operation, a deadlocked filesystem in one container could cause sync() to hang in a completely unrelated container.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 138+ messages in thread
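The sibling-pointer loop described above can be made concrete with a toy model. The sketch assumes a hypothetical layout in which leaf blocks live in an array and each holds the index of its right sibling; `toy_leaf`, `TOY_NULLBLK` and `toy_walk_siblings` are invented for illustration and do not describe any real filesystem. It shows one possible defence, a hop budget: a valid chain can never visit more blocks than exist, so exceeding that count implies a repeated block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NULLBLK UINT32_MAX  /* hypothetical "no sibling" marker */

/* Toy leaf block: only the right-sibling pointer matters here. */
struct toy_leaf {
    uint32_t right;             /* index of the right sibling, or TOY_NULLBLK */
};

/*
 * Walk the sibling chain starting at 'start'.
 * Returns the number of blocks visited, or -1 if the chain is corrupt:
 * either a pointer leaves the image, or the walk would need more than
 * 'nblocks' hops, which can only happen if some block repeats.
 */
static int64_t toy_walk_siblings(const struct toy_leaf *blocks,
                                 uint32_t nblocks, uint32_t start)
{
    uint32_t cur = start;
    int64_t visited = 0;

    while (cur != TOY_NULLBLK) {
        if (cur >= nblocks)
            return -1;          /* pointer outside the image */
        if (visited >= (int64_t)nblocks)
            return -1;          /* hop budget exhausted: loop */
        visited++;
        cur = blocks[cur].right;
    }
    return visited;
}
```

A budget like this turns an unbounded spin into an error return, but only for this one walk. It says nothing about loops that span several structures, and a real implementation would have to thread such a check through whatever layers separate tree walking from record parsing.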
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-23 1:51 ` Dave Chinner @ 2015-07-23 13:19 ` J. Bruce Fields -1 siblings, 0 replies; 138+ messages in thread From: J. Bruce Fields @ 2015-07-23 13:19 UTC (permalink / raw) To: Dave Chinner Cc: Austin S Hemmelgarn, Eric W. Biederman, Casey Schaufler, Andy Lutomirski, Seth Forshee, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel On Thu, Jul 23, 2015 at 11:51:35AM +1000, Dave Chinner wrote: > On Wed, Jul 22, 2015 at 01:41:00PM -0400, J. Bruce Fields wrote: > > On Wed, Jul 22, 2015 at 12:52:58PM -0400, Austin S Hemmelgarn wrote: > > > On 2015-07-22 10:09, J. Bruce Fields wrote: > > > >On Wed, Jul 22, 2015 at 05:56:40PM +1000, Dave Chinner wrote: > > > >>On Tue, Jul 21, 2015 at 01:37:21PM -0400, J. Bruce Fields wrote: > > > >>>On Fri, Jul 17, 2015 at 12:47:35PM +1000, Dave Chinner wrote: > > > >>>So, for example, a screwed up on-disk directory structure shouldn't > > > >>>result in creating a cycle in the dcache and then deadlocking. > > > >> > > > >>Therein lies the problem: how do you detect such structural defects > > > >>without doing a full structure validation? > > > > > > > >You can prevent cycles in a graph if you can prevent adding an edge > > > >which would be part of a cycle. > > > > > > > Except if the user can write to the filesystem's backing storage (be > > > it a device or a file), and has sufficient knowledge of the on-disk > > > structures, they can create all the cycles they want in the > > > metadata. So unless the kernel builds the graph internally by > > > parsing the metadata _and_ has some way to detect that the on-disk > > > metadata has hit a cycle (which may not just involve 2 items), > > > > Understood. Again, see the d_ancestor call in d_splice_alias, this is > > exactly what it checks for. > > But that only addresses one type of loop in one specific metadata > structure. Yep, agreed! 
> There's plenty of other ways you could construct metadata > loops that are essentially undetected and result in either deadlock > or livelock within the filesystem code itself. e.g. just make btree > sibling pointers loop over a range of entries that have the same > index key (e.g. free space extents of the same size). If allocation > then falls into this loop, the kernel will just spin searching the > same blocks for something it will never find. Such resource > consumption attacks are trivial to construct but extremely difficult > to detect because they exploit normal behaviour of the structure and > algorithms by mangling trusted pointers. Interesting example, thanks! I doubt this particular example would be *that* hard to detect? But understood that there may be lots of others. --b. > > Of course, this sort of attack will eventually deadlock the > filesystem because it will backs up on locks held by the live locked > search. Once the filesystem is deadlocked, it can then cause sync() > calls to get stuck on the filesystem. And because sync() is a global > operation, a deadlocked filesystem in one container could cause sync > to hang in completely unrelated container.... > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-23 13:19 ` J. Bruce Fields
@ 2015-07-23 23:48 ` Dave Chinner
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Chinner @ 2015-07-23 23:48 UTC (permalink / raw)
To: J. Bruce Fields
Cc: Austin S Hemmelgarn, Eric W. Biederman, Casey Schaufler,
    Andy Lutomirski, Seth Forshee, Alexander Viro, Linux FS Devel,
    LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On Thu, Jul 23, 2015 at 09:19:28AM -0400, J. Bruce Fields wrote:
> On Thu, Jul 23, 2015 at 11:51:35AM +1000, Dave Chinner wrote:
> > On Wed, Jul 22, 2015 at 01:41:00PM -0400, J. Bruce Fields wrote:
> > > On Wed, Jul 22, 2015 at 12:52:58PM -0400, Austin S Hemmelgarn wrote:
> > > > On 2015-07-22 10:09, J. Bruce Fields wrote:
> > > > >On Wed, Jul 22, 2015 at 05:56:40PM +1000, Dave Chinner wrote:
> > > > >>On Tue, Jul 21, 2015 at 01:37:21PM -0400, J. Bruce Fields wrote:
> > > > >>>On Fri, Jul 17, 2015 at 12:47:35PM +1000, Dave Chinner wrote:
> > > > >>>So, for example, a screwed up on-disk directory structure shouldn't
> > > > >>>result in creating a cycle in the dcache and then deadlocking.
> > > > >>
> > > > >>Therein lies the problem: how do you detect such structural defects
> > > > >>without doing a full structure validation?
> > > > >
> > > > >You can prevent cycles in a graph if you can prevent adding an edge
> > > > >which would be part of a cycle.
> > > > >
> > > > Except if the user can write to the filesystem's backing storage (be
> > > > it a device or a file), and has sufficient knowledge of the on-disk
> > > > structures, they can create all the cycles they want in the
> > > > metadata. So unless the kernel builds the graph internally by
> > > > parsing the metadata _and_ has some way to detect that the on-disk
> > > > metadata has hit a cycle (which may not just involve 2 items),
> > >
> > > Understood. Again, see the d_ancestor call in d_splice_alias, this is
> > > exactly what it checks for.
> >
> > But that only addresses one type of loop in one specific metadata
> > structure.
>
> Yep, agreed!
>
> > There's plenty of other ways you could construct metadata
> > loops that are essentially undetected and result in either deadlock
> > or livelock within the filesystem code itself. e.g. just make btree
> > sibling pointers loop over a range of entries that have the same
> > index key (e.g. free space extents of the same size). If allocation
> > then falls into this loop, the kernel will just spin searching the
> > same blocks for something it will never find. Such resource
> > consumption attacks are trivial to construct but extremely difficult
> > to detect because they exploit normal behaviour of the structure and
> > algorithms by mangling trusted pointers.
>
> Interesting example, thanks! I doubt this particular example would be
> *that* hard to detect?

Yes, it can be detected, but it's not as easy as it sounds because
of abstractions between tree walking and record parsing.

> But understood that there may be lots of others.

Yeah, that's just one of many, many ways I can think of modifying
on disk structures to screw up the kernel.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply [flat|nested] 138+ messages in thread
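The sibling-pointer loop described in the message above can at least be bounded. What follows is a minimal userspace sketch, not filesystem code: `struct leaf` and `bounded_sibling_search()` are hypothetical names, the structure is a simplified in-memory stand-in rather than any real on-disk format, and the caller is assumed to know the largest number of blocks the chain could legitimately contain. A walk that takes more hops than that bound must have revisited a block, so it can report corruption instead of spinning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for one btree leaf block. */
struct leaf {
	unsigned long key;	/* index key, e.g. a free extent size */
	struct leaf *right;	/* right-sibling pointer read from untrusted media */
};

/*
 * Walk right siblings looking for `want`.
 * Returns 1 if found, 0 if the chain ends without a match, and -1 if the
 * walk visits more than `max_blocks` leaves, which can only happen when the
 * sibling pointers form a loop.
 */
static int bounded_sibling_search(const struct leaf *start, unsigned long want,
				  size_t max_blocks)
{
	size_t hops = 0;

	assert(max_blocks > 0);	/* the caller must supply a real bound */

	for (const struct leaf *l = start; l != NULL; l = l->right) {
		if (hops++ >= max_blocks)
			return -1;	/* more leaves than can exist: corrupt chain */
		if (l->key == want)
			return 1;
	}
	return 0;
}
```

The bound only turns a livelock into an error return. It does not say which pointer was mangled, and it assumes the search code can see a trustworthy block count, which is exactly the kind of cross-layer knowledge the reply says is awkward to arrange.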
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-17 0:42 ` Eric W. Biederman
@ 2015-07-18 0:07 ` Serge E. Hallyn
  -1 siblings, 0 replies; 138+ messages in thread
From: Serge E. Hallyn @ 2015-07-18 0:07 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Dave Chinner, Casey Schaufler, Andy Lutomirski, Seth Forshee,
    Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn,
    linux-kernel

On Thu, Jul 16, 2015 at 07:42:03PM -0500, Eric W. Biederman wrote:
> Dave Chinner <david@fromorbit.com> writes:
>
> > On Wed, Jul 15, 2015 at 11:47:08PM -0500, Eric W. Biederman wrote:
> >> Casey Schaufler <casey@schaufler-ca.com> writes:
> >> > On 7/15/2015 6:08 PM, Andy Lutomirski wrote:
> >> >> If I mount an unprivileged filesystem, then either the contents were
> >> >> put there *by me*, in which case letting me access them is fine, or
> >> >> (with Seth's patches and then some) I control the backing store, in
> >> >> which case I can do whatever I want regardless of what LSM thinks.
> >> >>
> >> >> So I don't see the problem. Why would Smack or any other LSM care at
> >> >> all, unless it wants to prevent me from mounting the fs in the first
> >> >> place?
> >> >
> >> > First off, I don't cotton to the notion that you should be able
> >> > to mount filesystems without privilege. But it seems I'm being
> >> > outvoted on that. I suspect that there are cases where it might
> >> > be safe, but I can't think of one off the top of my head.
> >>
> >> There are two fundamental issues mounting filesystems without privilege,
> >> by which I actually mean mounting filesystems as the root user in a user
> >> namespace.
> >>
> >> - Are the semantics safe.
> >> - Is the extra attack surface a problem.
> >
> > I think the attack surface this exposes is the biggest problem
> > facing this proposal.
>
> I completely agree.
>
> >> Figuring out how to make semantics safe is what we are talking about.
> >>
> >> Once we sort out the semantics we can look at the handful of filesystems
> >> like fuse where the extra attack surface is not a concern.
> >>
> >> With that said desktop environments have for a long time been
> >> automatically mounting whichever filesystem you place in your computer,
> >> so in practice what this is really about is trying to align the kernel
> >> with how people use filesystems.
> >
> > The key difference is that desktops only do this when you physically
> > plug in a device. With unprivileged mounts, a hostile attacker
> > doesn't need physical access to the machine to exploit lurking
> > kernel filesystem bugs. i.e. they can just use loopback mounts, and
> > they can keep mounting corrupted images until they find something
> > that works.
>
> Yep. That magnifies the problem quite a bit.
>
> > User namespaces are supposed to provide trust separation. The
> > kernel filesystems simply aren't hardened against unprivileged
> > attacks from below - there is a trust relationship between root and
> > the filesystem in that they are the only things that can write to
> > the disk. Mounts from within a userns destroys this relationship as
> > the userns root, by definition, is not a trusted actor.
>
> I talked to Ted Tso a while back and ext4 is at least in principle
> already hardened against that kind of attack. I am not certain I
> believe it, but if it is true I think it is fantastic.

Not sure what he said in private, but at the kernel summit last year
what he said was not that it was "hardened", but that any bug which
would result from mounting a garbage image (i.e. an unpriv user fuzzing)
would be deemed by him a real bug. As opposed to saying "don't do that".
To the best of my knowledge that's so far only the case with Ted/ext4,
which I assume is why Seth started with ext4.

-serge

^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-16 4:47 ` Eric W. Biederman
@ 2015-07-20 17:54 ` Colin Walters
  -1 siblings, 0 replies; 138+ messages in thread
From: Colin Walters @ 2015-07-20 17:54 UTC (permalink / raw)
To: Eric W. Biederman, Casey Schaufler
Cc: Andy Lutomirski, Seth Forshee, Alexander Viro, Linux FS Devel,
    LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On Thu, Jul 16, 2015, at 12:47 AM, Eric W. Biederman wrote:

> With that said desktop environments have for a long time been
> automatically mounting whichever filesystem you place in your computer,
> so in practice what this is really about is trying to align the kernel
> with how people use filesystems.

There is a large attack surface difference between mounting a device
that someone physically plugged into the computer (and note typically
it's required that the active console be unlocked as well[1]) versus
allowing any "unprivileged" process at any time to do it.

Many server setups use "unprivileged" uids that otherwise wouldn't be
able to exploit bugs in filesystem code.

[1] https://bugzilla.gnome.org/show_bug.cgi?id=653520
"AutomountManager also keeps track of the current session availability
(using the ConsoleKit and gnome-screensaver DBus interfaces) and
inhibits mounting if the current session is locked, or another session
is in use instead."

^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-15 21:06 ` Eric W. Biederman
@ 2015-07-16 11:16 ` Lukasz Pawelczyk
  -1 siblings, 0 replies; 138+ messages in thread
From: Lukasz Pawelczyk @ 2015-07-16 11:16 UTC (permalink / raw)
To: Eric W. Biederman, Casey Schaufler
Cc: Seth Forshee, Alexander Viro, linux-fsdevel, linux-security-module,
    selinux, Serge Hallyn, Andy Lutomirski, linux-kernel

On Wed, 2015-07-15 at 16:06 -0500, Eric W. Biederman wrote:
>
> I am on the fence with Lukasz Pawelczyk's patches. Some parts I liked,
> some parts I had issues with. As I recall one of my issues was that
> those patches conflicted in detail if not in principle with this
> approach.
>
> If these patches do not do a good job of laying the ground work for
> supporting security labels that unprivileged users can set then Seth
> could really use some feedback. Figuring out how to properly deal
> with the LSMs has been one of his challenges.

I fail to see how those 2 are in any conflict.

Smack namespace is just a means of limiting the view of Smack labels
within a user namespace, so that processes in the namespace can be given
some limited capabilities and can partially administer Smack there. It
doesn't change Smack behaviour or mode of operation in any way.

If your approach here is to treat user ns mounted filesystems as if they
didn't support xattrs at all then my patches don't conflict here any
more than Smack itself already does.

If the filesystem gets a default label (e.g. via the smack* mount
options) then this label will co-work with Smack namespaces.

--
Lukasz Pawelczyk
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-16 11:16 ` Lukasz Pawelczyk
@ 2015-07-17 0:10 ` Eric W. Biederman
  -1 siblings, 0 replies; 138+ messages in thread
From: Eric W. Biederman @ 2015-07-17 0:10 UTC (permalink / raw)
To: Lukasz Pawelczyk
Cc: Casey Schaufler, Seth Forshee, Alexander Viro, linux-fsdevel,
    linux-security-module, selinux, Serge Hallyn, Andy Lutomirski,
    linux-kernel

Lukasz Pawelczyk <l.pawelczyk@samsung.com> writes:

> On Wed, 2015-07-15 at 16:06 -0500, Eric W. Biederman wrote:
>>
>> I am on the fence with Lukasz Pawelczyk's patches. Some parts I liked,
>> some parts I had issues with. As I recall one of my issues was that
>> those patches conflicted in detail if not in principle with this
>> approach.
>>
>> If these patches do not do a good job of laying the ground work for
>> supporting security labels that unprivileged users can set then Seth
>> could really use some feedback. Figuring out how to properly deal
>> with the LSMs has been one of his challenges.
>
> I fail to see how those 2 are in any conflict.

Like I said. They don't really conflict, and actually to really support
things well for smack we probably need something like your patches.

At the same time a patch written without dealing with s_user_ns is
going to fail to take a lot of important details into account.

Right now after fixing the mount namespace issues the top priority is to
work through the details and get s_user_ns implemented. By that I mean
some version of patch 1 of Seth's series.

s_user_ns fundamentally changes how the concepts are represented in the
kernel in a way that is easier to secure, and that fundamentally better
matches things. And sigh. This review has shown we don't quite have
all of the details worked out.

> If your approach here is to treat user ns mounted filesystem as if they
> didn't support xattrs at all then my patches don't conflict here any
> more than Smack itself already does.

The end game, if people developing smack choose to play, is to figure
out how to store your unmapped labels in a filesystem contained by a
user namespace and a smack label namespace root.

> If the filesystem will get a default (e.g. by smack* mount options)
> label then this label will co-work with Smack namespaces.

A default, but I don't know if it will be smack mount options that will
give that default. The devil is in the details and there are a lot
of details.

Eric

^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-17 0:10 ` Eric W. Biederman
@ 2015-07-17 10:13 ` Lukasz Pawelczyk
  -1 siblings, 0 replies; 138+ messages in thread
From: Lukasz Pawelczyk @ 2015-07-17 10:13 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Casey Schaufler, Seth Forshee, Alexander Viro, linux-fsdevel,
    linux-security-module, selinux, Serge Hallyn, Andy Lutomirski,
    linux-kernel

On Thu, 2015-07-16 at 19:10 -0500, Eric W. Biederman wrote:
> Lukasz Pawelczyk <l.pawelczyk@samsung.com> writes:
> >
> > I fail to see how those 2 are in any conflict.
>
> Like I said. They don't really conflict, and actually to really
> support things well for smack we probably need something like your
> patches.

As far as I can see now from the discussion the best thing to do would
be to inherit the label from a backing store object, or something along
those lines.

> At the same time a patch written without dealing with s_user_ns is
> going to fail to take a lot of important details into account.

I don't touch anything that would need to deal with s_user_ns. I also
don't change Smack's mounting logic in any way. My patches are
orthogonal to that.

> Right now after fixing the mount namespace issues the top priority is
> to work through the details and get s_user_ns implemented. By that I
> mean some version of patch 1 of Seth's series.

My priority is to make Smack namespace work. This is a functionality
that has a perfectly valid use case now. Without it, Smack is impossible
to operate in a container.

> s_user_ns fundamentally changes how the concepts are represented in
> the kernel in a way that is easier to secure, and that fundamentally
> better matches things. And sigh. This review has shown we don't quite
> have all of the details worked out.
>
> > If your approach here is to treat user ns mounted filesystem as if
> > they didn't support xattrs at all then my patches don't conflict
> > here any more than Smack itself already does.
>
> The end game if people developing smack choose to play, is to figure
> out how to store your unmapped labels in a filesystem contained by a
> user namespace and a smack label namespace root.

Storing an unmapped label (read: real label) in a Smack namespace is
exactly the same as it is now without the namespace. I always store the
real label. The problem here is: what real label should be "read" and
eventually stored in that filesystem (see my first comment here). Again,
Smack namespace doesn't touch that logic.

> > If the filesystem will get a default (e.g. by smack* mount options)
> > label then this label will co-work with Smack namespaces.
>
> A default, but I don't know if it will be smack mount options that
> will give that default. The devil is in the details and there are a
> lot of details.

Right now Smack gives the default. If someone modifies Smack to give a
different label because of s_user_ns support, the Smack namespace will
not cause any hindrance here.

The Smack namespace's main role is only to make it possible to operate
Smack within a container. All the other LSMs can do that already as they
don't require caps to operate normally. Smack does. Hence it had to be
namespaced in some way to give limited capabilities in a container
(user ns). This really has nothing to do with the way Smack mounts,
assigns labels, decides what is allowed and what is not, etc.

What this discussion is about is how to modify or even bend the LSMs'
way of working to make unprivileged user ns mounts work under LSM (or
not). Smack namespace here is just a utility within Smack itself. And
maybe it can be used to help this at some point, but beyond that it's
orthogonal to the problem.

--
Lukasz Pawelczyk
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply [flat|nested] 138+ messages in thread
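The distinction drawn in the message above, between the label a namespace gets to see and the real label that is always what ends up stored, can be pictured with a toy model. This is only a sketch of that one idea and makes no claim about how the Smack namespace patches are implemented: `struct label_mapping`, `ns_view_label()` and `ns_store_label()` are hypothetical names, and the map contents in the usage note are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry of a hypothetical per-namespace label map. */
struct label_mapping {
	const char *real;	/* label stored on disk and used for access checks */
	const char *alias;	/* name visible inside the namespace */
};

/* What a process inside the namespace sees for a stored (real) label.
 * Returns NULL when the label has no mapping and so stays hidden. */
static const char *ns_view_label(const struct label_mapping *map, size_t n,
				 const char *real)
{
	assert(real != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(map[i].real, real) == 0)
			return map[i].alias;
	return NULL;
}

/* What gets stored when a process inside the namespace names a label.
 * The alias is translated back first, so only real labels are ever stored.
 * Returns NULL when the namespace has no such alias. */
static const char *ns_store_label(const struct label_mapping *map, size_t n,
				  const char *alias)
{
	assert(alias != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(map[i].alias, alias) == 0)
			return map[i].real;
	return NULL;
}
```

With a single entry mapping the real label `container_7_app` to the alias `app`, `ns_store_label(map, 1, "app")` yields `container_7_app`, while any label outside the map is neither visible nor settable. The question the thread leaves open, which real label a user-namespace-owned mount should start with, is outside this model.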
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-15 19:46 ` Seth Forshee
@ 2015-07-16 3:15 ` Eric W. Biederman
  -1 siblings, 0 replies; 138+ messages in thread
From: Eric W. Biederman @ 2015-07-16 3:15 UTC (permalink / raw)
To: Seth Forshee
Cc: Alexander Viro, linux-fsdevel, linux-security-module, selinux,
    Serge Hallyn, Andy Lutomirski, linux-kernel, Casey Schaufler

Seth I think for the LSMs we should start with:

diff --git a/security/security.c b/security/security.c
index 062f3c997fdc..5b6ece92a8e5 100644
--- a/security/security.c
+++ b/security/security.c
@@ -310,6 +310,8 @@ int security_sb_statfs(struct dentry *dentry)
 int security_sb_mount(const char *dev_name, struct path *path,
                        const char *type, unsigned long flags, void *data)
 {
+	if (current_user_ns() != &init_user_ns)
+		return -EPERM;
 	return call_int_hook(sb_mount, 0, dev_name, path, type, flags, data);
 }

Then we should push this down into all of the lsms.
Then we should remove or relax or change the check as appropriate
in each lsm.

The point is this is good enough to see that it is trivially safe,
and this allows us to focus on the core issues, and stop worrying about
the lsms for a bit.

Then we can focus on each lsm one at a time and take the time to really
understand them and talk with their maintainers etc to make certain
we get things correct.

This should remove the need for your patches 5, 6 and 7. For the
immediate future.

Eric

^ permalink raw reply related [flat|nested] 138+ messages in thread
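One way to read the plan in the message above (a blanket refusal first, then a per-LSM check that each module can relax on its own schedule) is sketched below as a userspace toy model. Nothing here is kernel code: `struct toy_lsm`, `struct mount_request`, `toy_lsm_sb_mount()` and `toy_security_sb_mount()` are hypothetical names, and the only behaviour assumed is that a single refusing hook is enough to fail the mount.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for "which user namespace is the caller in". */
struct mount_request {
	bool from_init_userns;
};

/* Each modelled LSM starts out refusing mounts from a non-init user
 * namespace; setting allow_userns models that LSM relaxing its check. */
struct toy_lsm {
	const char *name;
	bool allow_userns;
};

/* The check after it has been "pushed down" into one LSM. */
static int toy_lsm_sb_mount(const struct toy_lsm *lsm,
			    const struct mount_request *req)
{
	if (!req->from_init_userns && !lsm->allow_userns)
		return -EPERM;
	return 0;
}

/* Ask every modelled LSM in turn; the first refusal is final. */
static int toy_security_sb_mount(const struct toy_lsm *lsms, size_t n,
				 const struct mount_request *req)
{
	assert(lsms != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int rc = toy_lsm_sb_mount(&lsms[i], req);
		if (rc != 0)
			return rc;
	}
	return 0;
}
```

In this model a mount from a user namespace keeps failing until every loaded module has opted in, which is the "trivially safe" starting point the message argues for. Relaxing one module at a time changes nothing for a system that also loads a module that has not been reviewed yet.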
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-16  3:15   ` Eric W. Biederman
@ 2015-07-16 13:59     ` Seth Forshee
  -1 siblings, 0 replies; 138+ messages in thread
From: Seth Forshee @ 2015-07-16 13:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexander Viro, linux-fsdevel, linux-security-module, selinux,
      Serge Hallyn, Andy Lutomirski, linux-kernel, Casey Schaufler

On Wed, Jul 15, 2015 at 10:15:21PM -0500, Eric W. Biederman wrote:
>
> Seth I think for the LSMs we should start with:
>
> diff --git a/security/security.c b/security/security.c
> index 062f3c997fdc..5b6ece92a8e5 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -310,6 +310,8 @@ int security_sb_statfs(struct dentry *dentry)
>  int security_sb_mount(const char *dev_name, struct path *path,
>                         const char *type, unsigned long flags, void *data)
>  {
> +       if (current_user_ns() != &init_user_ns)
> +               return -EPERM;
>         return call_int_hook(sb_mount, 0, dev_name, path, type, flags, data);
>  }

This just makes it impossible to mount from a user namespace. Every
mount from current_user_ns() != &init_user_ns will fail.

> Then we should push this down into all of the lsms.
> Then we should remove or relax or change the check as appropriate
> in each lsm.
>
> The point is this is good enough to see that it is trivially safe,
> and this allows us to focus on the core issues, and stop worrying about
> the lsms for a bit.
>
> Then we can focus on each lsm one at a time and take the time to really
> understand them and talk with their maintainers etc to make certain
> we get things correct.
>
> This should remove the need for your patches 5, 6 and 7. For the
> immediate future.

I'm still not entirely sure what you were trying to do, maybe refuse to
mount whenever a security module is loaded? I think this could be a good
option to start, but couldn't we restrict it to only the LSMs which use
xattrs for security labels? In situations where the filesystem cannot
supply security policy metadata I can't think of any reason to disallow
the mounts.

Seth

^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-16 13:59     ` Seth Forshee
@ 2015-07-16 15:09       ` Casey Schaufler
  -1 siblings, 0 replies; 138+ messages in thread
From: Casey Schaufler @ 2015-07-16 15:09 UTC (permalink / raw)
  To: Seth Forshee, Eric W. Biederman
  Cc: Alexander Viro, linux-fsdevel, linux-security-module, selinux,
      Serge Hallyn, Andy Lutomirski, linux-kernel

On 7/16/2015 6:59 AM, Seth Forshee wrote:
> On Wed, Jul 15, 2015 at 10:15:21PM -0500, Eric W. Biederman wrote:
>> Seth I think for the LSMs we should start with:
>>
>> diff --git a/security/security.c b/security/security.c
>> index 062f3c997fdc..5b6ece92a8e5 100644
>> --- a/security/security.c
>> +++ b/security/security.c
>> @@ -310,6 +310,8 @@ int security_sb_statfs(struct dentry *dentry)
>>  int security_sb_mount(const char *dev_name, struct path *path,
>>                         const char *type, unsigned long flags, void *data)
>>  {
>> +       if (current_user_ns() != &init_user_ns)
>> +               return -EPERM;
>>         return call_int_hook(sb_mount, 0, dev_name, path, type, flags, data);
>>  }
> This just makes it impossible to mount from a user namespace. Every
> mount from current_user_ns() != &init_user_ns will fail.
>
>> Then we should push this down into all of the lsms.
>> Then we should remove or relax or change the check as appropriate
>> in each lsm.
>>
>> The point is this is good enough to see that it is trivially safe,
>> and this allows us to focus on the core issues, and stop worrying about
>> the lsms for a bit.

Given the extent to which LSMs are deployed I find it a bit
worrisome that they might not be considered a "core issue".

>> Then we can focus on each lsm one at a time and take the time to really
>> understand them and talk with their maintainers etc to make certain
>> we get things correct.

The "Do the easy stuff, fix the hard stuff after we've sold the product"
approach works really well until you get to the point of fixing the hard
stuff. This is the origin of the 90/90 rule of software development.

>>
>> This should remove the need for your patches 5, 6 and 7. For the
>> immediate future.
> I'm still not entirely sure what you were trying to do, maybe refuse to
> mount whenever a security module is loaded? I think this could be a good
> option to start, but couldn't we restrict it to only the LSMs which use
> xattrs for security labels? In situations where the filesystem cannot
> supply security policy metadata I can't think of any reason to disallow
> the mounts.

This whole notion of mounting a generic filesystem (e.g. ext4) that
is "owned" by a user (as opposed to the system) has lots of implications,
and I seriously doubt that many of them have been accounted for.

Think back to the "negative group access" issue. You can't just
ignore issues that are inconvenient, or claim that you have a reasonable
system just because *you* can't think of a problem.

> Seth
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-16 15:09       ` Casey Schaufler
@ 2015-07-16 18:57         ` Seth Forshee
  -1 siblings, 0 replies; 138+ messages in thread
From: Seth Forshee @ 2015-07-16 18:57 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Eric W. Biederman, Alexander Viro, linux-fsdevel,
      linux-security-module, selinux, Serge Hallyn, Andy Lutomirski,
      linux-kernel

On Thu, Jul 16, 2015 at 08:09:20AM -0700, Casey Schaufler wrote:
> On 7/16/2015 6:59 AM, Seth Forshee wrote:
> > On Wed, Jul 15, 2015 at 10:15:21PM -0500, Eric W. Biederman wrote:
> >> Seth I think for the LSMs we should start with:
> >>
> >> diff --git a/security/security.c b/security/security.c
> >> index 062f3c997fdc..5b6ece92a8e5 100644
> >> --- a/security/security.c
> >> +++ b/security/security.c
> >> @@ -310,6 +310,8 @@ int security_sb_statfs(struct dentry *dentry)
> >>  int security_sb_mount(const char *dev_name, struct path *path,
> >>                         const char *type, unsigned long flags, void *data)
> >>  {
> >> +       if (current_user_ns() != &init_user_ns)
> >> +               return -EPERM;
> >>         return call_int_hook(sb_mount, 0, dev_name, path, type, flags, data);
> >>  }
> > This just makes it impossible to mount from a user namespace. Every
> > mount from current_user_ns() != &init_user_ns will fail.
> >
> >> Then we should push this down into all of the lsms.
> >> Then we should remove or relax or change the check as appropriate
> >> in each lsm.
> >>
> >> The point is this is good enough to see that it is trivially safe,
> >> and this allows us to focus on the core issues, and stop worrying about
> >> the lsms for a bit.
>
> Given the extent to which LSMs are deployed I find it a bit
> worrisome that they might not be considered a "core issue".
>
> >> Then we can focus on each lsm one at a time and take the time to really
> >> understand them and talk with their maintainers etc to make certain
> >> we get things correct.
>
> The "Do the easy stuff, fix the hard stuff after we've sold the product"
> approach works really well until you get to the point of fixing the hard
> stuff. This is the origin of the 90/90 rule of software development.
>
> >>
> >> This should remove the need for your patches 5, 6 and 7. For the
> >> immediate future.
> > I'm still not entirely sure what you were trying to do, maybe refuse to
> > mount whenever a security module is loaded? I think this could be a good
> > option to start, but couldn't we restrict it to only the LSMs which use
> > xattrs for security labels? In situations where the filesystem cannot
> > supply security policy metadata I can't think of any reason to disallow
> > the mounts.
>
> This whole notion of mounting a generic filesystem (e.g. ext4) that
> is "owned" by a user (as opposed to the system) has lots of implications,
> and I seriously doubt that many of them have been accounted for.
>
> Think back to the "negative group access" issue. You can't just
> ignore issues that are inconvenient, or claim that you have a reasonable
> system just because *you* can't think of a problem.

I've spent a lot of time considering the implications and previous
vulnerabilities, and I've addressed everything I turned up. Now I'm
asking for review from those with more experience with and expertise of
the code in question. I'm not sure what more I should be doing.

I welcome feedback about anything I've missed, but stating generally
that you think I probably missed something isn't very helpful.

The LSM issue is thornier than the rest of it though, which is why I
specifically asked for review there in the cover letter. There's a lot
of complexity and nuance, and I still don't have a grasp on all the
subtleties. One such subtlety is the full impact of simply ignoring the
security labels on disk (but I am still confused as to why this is
different from filesystems which don't support xattrs at all).

I was unaware of Lukasz's patches until yesterday, and I will have a
look at them. But since we don't have the LSM support for user
namespaces yet, I don't see the problem with doing something safe for
LSMs initially and evolving the LSM integration for user ns mounts along
with the rest of the user ns integration.

Your point is taken about my less-than-expert opinion about the other
security modules. We should at minimum get acks from the maintainers of
those modules that unprivileged mounts will not compromise MAC.

For Smack specifically, I believe my only concern was the SMACK64EXEC
attribute, as all the other attributes only affected subjects' access to
the files. So maybe it would be possible to simply ignore this attribute
in unprivileged mounts and respect the others, even lacking more
complete LSM support for user namespaces.

Seth

^ permalink raw reply [flat|nested] 138+ messages in thread
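The idea in the last paragraph of the message above (ignore SMACK64EXEC on unprivileged mounts, respect the other Smack attributes) can be sketched as a small userspace filter. This is only an illustration of the proposal as stated, not Smack's actual code: the function name `smack_xattr_honoured` and the boolean `mounted_from_init_userns` are hypothetical, and whether ignoring the attribute is acceptable is exactly what the thread is debating.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical policy filter: decide whether a Smack xattr read from disk
 * should be honoured for a given mount.
 *
 * name                     - full xattr name, e.g. "security.SMACK64"
 * mounted_from_init_userns - true if the mount was made by a task in the
 *                            initial user namespace (a privileged mount)
 */
bool smack_xattr_honoured(const char *name, bool mounted_from_init_userns)
{
    /* Privileged mounts keep today's behaviour: every attribute counts. */
    if (mounted_from_init_userns)
        return true;

    /*
     * Unprivileged mount: drop only the attribute that changes the label
     * a process runs with on exec; the rest only restrict access to the
     * file itself.
     */
    return strcmp(name, "security.SMACK64EXEC") != 0;
}
```

The design choice being modelled is an allow-by-default filter with a single exception, which keeps access-restricting labels effective while removing the one attribute that behaves like a privilege grant.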
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-16 18:57         ` Seth Forshee
@ 2015-07-16 21:42           ` Casey Schaufler
  -1 siblings, 0 replies; 138+ messages in thread
From: Casey Schaufler @ 2015-07-16 21:42 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Eric W. Biederman, Alexander Viro, linux-fsdevel,
      linux-security-module, selinux, Serge Hallyn, Andy Lutomirski,
      linux-kernel

On 7/16/2015 11:57 AM, Seth Forshee wrote:
> On Thu, Jul 16, 2015 at 08:09:20AM -0700, Casey Schaufler wrote:
>> On 7/16/2015 6:59 AM, Seth Forshee wrote:
>>> On Wed, Jul 15, 2015 at 10:15:21PM -0500, Eric W. Biederman wrote:
>>>> Seth I think for the LSMs we should start with:
>>>>
>>>> diff --git a/security/security.c b/security/security.c
>>>> index 062f3c997fdc..5b6ece92a8e5 100644
>>>> --- a/security/security.c
>>>> +++ b/security/security.c
>>>> @@ -310,6 +310,8 @@ int security_sb_statfs(struct dentry *dentry)
>>>>  int security_sb_mount(const char *dev_name, struct path *path,
>>>>                         const char *type, unsigned long flags, void *data)
>>>>  {
>>>> +       if (current_user_ns() != &init_user_ns)
>>>> +               return -EPERM;
>>>>         return call_int_hook(sb_mount, 0, dev_name, path, type, flags, data);
>>>>  }
>>> This just makes it impossible to mount from a user namespace. Every
>>> mount from current_user_ns() != &init_user_ns will fail.
>>>
>>>> Then we should push this down into all of the lsms.
>>>> Then we should remove or relax or change the check as appropriate
>>>> in each lsm.
>>>>
>>>> The point is this is good enough to see that it is trivially safe,
>>>> and this allows us to focus on the core issues, and stop worrying about
>>>> the lsms for a bit.
>> Given the extent to which LSMs are deployed I find it a bit
>> worrisome that they might not be considered a "core issue".
>>
>>>> Then we can focus on each lsm one at a time and take the time to really
>>>> understand them and talk with their maintainers etc to make certain
>>>> we get things correct.
>> The "Do the easy stuff, fix the hard stuff after we've sold the product"
>> approach works really well until you get to the point of fixing the hard
>> stuff. This is the origin of the 90/90 rule of software development.
>>
>>>> This should remove the need for your patches 5, 6 and 7. For the
>>>> immediate future.
>>> I'm still not entirely sure what you were trying to do, maybe refuse to
>>> mount whenever a security module is loaded? I think this could be a good
>>> option to start, but couldn't we restrict it to only the LSMs which use
>>> xattrs for security labels? In situations where the filesystem cannot
>>> supply security policy metadata I can't think of any reason to disallow
>>> the mounts.
>> This whole notion of mounting a generic filesystem (e.g. ext4) that
>> is "owned" by a user (as opposed to the system) has lots of implications,
>> and I seriously doubt that many of them have been accounted for.
>>
>> Think back to the "negative group access" issue. You can't just
>> ignore issues that are inconvenient, or claim that you have a reasonable
>> system just because *you* can't think of a problem.
> I've spent a lot of time considering the implications and previous
> vulnerabilities, and I've addressed everything I turned up. Now I'm
> asking for review from those with more experience with and expertise of
> the code in question. I'm not sure what more I should be doing.

Part of the problem I see is that you're looking at the details when
there's an architectural issue. That's OK, it happens all the time, but
we have to pull the issue up slightly higher in order to address the
underlying difficulties.

You want to provide a mechanism whereby an unprivileged user (Seth) can
mount a filesystem for his own use. You want full filesystem semantics,
but you're willing to accept restrictions on certain filesystem features
to avoid opening security holes. You are not willing to accept
restrictions that make the filesystem unusable, such as making it
read-only.

I am going to present a suggestion. Feel free to correct my assumptions
and my reasoning. For simplicity let's use loop-back mounting of a
filesystem contained in a file as an example. The principles should
apply to newly created memory based filesystems or disk partitions
"owned" by Seth.

Seth wants to mount a file (~seth/myfs) which contains an ext4
filesystem. There is already a filesystem object, with security
attributes, that the system knows how to deal with. If Seth mounts this
as a filesystem he, and potentially other people, will be able to access
the content of this object without accessing the object itself.

    seth$ mount --justforme -t ext4 ~seth/myfs /tmp/seth
    seth$ chmod 777 /tmp/seth
    seth$ ls -la /tmp/seth
    drwxrwxrwx.  3 seth seth  260 Jul 16 12:59 .
    drwxrwxrwt  18 root root 4069 Jul 16 11:13 ..
    seth$

Everything's fine at this point. Wilma is also using the system and,
being the sort who likes to hide things in out of the way places,

    wilma$ cp ~/scandals /tmp/seth
    wilma$ chmod 600 /tmp/seth/scandals

puts her list of scandals on the unsuspecting filesystem, and changes
the mode to ensure that no one can find out what went on after the
office party.

Seth unmounts /tmp/seth. He looks in ~seth/myfs, finds out what really
happened at the office party, and the story goes from there.

Wilma did everything correctly according to the system security policy,
but the system security policy did not protect her as advertised. The
system was tricked into behaving as if it was in control of the content
of the filesystem when in fact it was not.

One way to fix this problem is for unprivileged mounts to recognize the
attributes of the object mounted and to propagate those attributes to
all the objects they present. All files on /tmp/seth would be owned by
seth and protected by the mode bits, ACL and LSM requirements of
~seth/myfs. Opening a file on /tmp/seth would require the same
permissions as opening the file containing the mounted filesystem. These
attributes would have to be immutable, or at least demonstrably more
restrictive (chmod might be allowed in some cases, but chown would never
be) when changed. I don't see how a user other than seth could create a
new file, as you'd either have a magical change in ownership or a false
sense of security.

I don't see that the presence of user namespaces changes anything. You
may reduce the set of uids available, but the problem with putting a
uid into someone else's file is just as real.

> I welcome feedback about anything I've missed, but stating generally
> that you think I probably missed something isn't very helpful.

True enough. I hope I've explained myself above.

> The LSM issue is thornier than the rest of it though, which is why I
> specifically asked for review there in the cover letter. There's a lot
> of complexity and nuance, and I still don't have a grasp on all the
> subtleties. One such subtlety is the full impact of simply ignoring the
> security labels on disk (but I am still confused as to why this is
> different from filesystems which don't support xattrs at all).

If you can mount a filesystem such that the labels are ignored you
are effectively specifying that the Smack label on the files be
determined by the defaulting rules. With CAP_MAC_ADMIN that's fine.
Without it, it's not.

> I was unaware of Lukasz's patches until yesterday, and I will have a
> look at them. But since we don't have the LSM support for user
> namespaces yet, I don't see the problem with doing something safe for
> LSMs initially and evolving the LSM integration for user ns mounts along
> with the rest of the user ns integration.

Ignoring the security attributes is not safe!

> Your point is taken about my less-than-expert opinion about the other
> security modules. We should at minimum get acks from the maintainers of
> those modules that unprivileged mounts will not compromise MAC.

I am the Smack maintainer. Unprivileged mounts as you have described
them compromise MAC. They compromise DAC, too.

> For Smack specifically, I believe my only concern was the SMACK64EXEC
> attribute, as all the other attributes only affected subjects' access to
> the files. So maybe it would be possible to simply ignore this attribute
> in unprivileged mounts and respect the others, even lacking more
> complete LSM support for user namespaces.

SMACK64EXEC is analogous to the setuid bit, but I would rather see
exec() of programs with this attribute refused than for it to be
blindly ignored.

> Seth
>

^ permalink raw reply [flat|nested] 138+ messages in thread
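The propagation rule suggested in the message above can be sketched as a small userspace model. Everything here is an assumption made for illustration: `struct backing_attrs` and the `model_*` functions are hypothetical names, only owner and "other" mode bits are modelled (groups, ACLs and LSM labels are left out), and the exec rule models the stated preference for refusing SMACK64EXEC programs rather than ignoring the attribute.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Attributes of the backing object, e.g. the image file ~seth/myfs. */
struct backing_attrs {
    uid_t uid;   /* owner of the image file */
    mode_t mode; /* permission bits of the image file */
};

/*
 * Opening any file inside the mount needs the same permission as opening
 * the backing file. 'want' is a mask of 4 (read), 2 (write), 1 (execute).
 */
int model_may_open(const struct backing_attrs *b, uid_t caller, unsigned want)
{
    unsigned granted = (caller == b->uid) ? ((unsigned)b->mode >> 6) & 7u
                                          : (unsigned)b->mode & 7u;
    return (want & ~granted) ? -EACCES : 0;
}

/* chmod is allowed only if it grants no bit the backing file lacks. */
int model_chmod(const struct backing_attrs *b, mode_t new_mode)
{
    unsigned extra = (unsigned)new_mode & ~(unsigned)b->mode & 0777u;
    return extra ? -EPERM : 0;
}

/* chown is never allowed inside an unprivileged mount. */
int model_chown(void)
{
    return -EPERM;
}

/* exec of a file carrying SMACK64EXEC is refused, not silently ignored. */
int model_exec(bool has_smack64exec)
{
    return has_smack64exec ? -EPERM : 0;
}
```

Under this model, if ~seth/myfs is mode 0600 and owned by seth, Wilma cannot open or create anything under /tmp/seth at all, so she is never led to believe that a chmod 600 on her file protects it from the owner of the image.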
* Re: [PATCH 0/7] Initial support for user namespace owned mounts @ 2015-07-16 21:42 ` Casey Schaufler 0 siblings, 0 replies; 138+ messages in thread From: Casey Schaufler @ 2015-07-16 21:42 UTC (permalink / raw) To: Seth Forshee Cc: Serge Hallyn, linux-kernel, Andy Lutomirski, linux-security-module, selinux, linux-fsdevel, Alexander Viro On 7/16/2015 11:57 AM, Seth Forshee wrote: > On Thu, Jul 16, 2015 at 08:09:20AM -0700, Casey Schaufler wrote: >> On 7/16/2015 6:59 AM, Seth Forshee wrote: >>> On Wed, Jul 15, 2015 at 10:15:21PM -0500, Eric W. Biederman wrote: >>>> Seth I think for the LSMs we should start with: >>>> >>>> diff --git a/security/security.c b/security/security.c >>>> index 062f3c997fdc..5b6ece92a8e5 100644 >>>> --- a/security/security.c >>>> +++ b/security/security.c >>>> @@ -310,6 +310,8 @@ int security_sb_statfs(struct dentry *dentry) >>>> int security_sb_mount(const char *dev_name, struct path *path, >>>> const char *type, unsigned long flags, void *data) >>>> { >>>> + if (current_user_ns() != &init_user_ns) >>>> + return -EPERM; >>>> return call_int_hook(sb_mount, 0, dev_name, path, type, flags, data); >>>> } >>> This just makes it impossible to mount from a user namespace. Every >>> mount from current_user_ns() != &init_user_ns will fail. >>> >>>> Then we should push this down into all of the lsms. >>>> Then when we should remove or relax or change the check as appropriate >>>> in each lsm. >>>> >>>> The point is this is good enough to see that it is trivially safe, >>>> and this allows us to focus on the core issues, and stop worrying about >>>> the lsms for a bit. >> Given the extent to which LSMs are deployed I find it a bit >> worrisome that they might not be considered a "core issue". >> >>>> Then we can focus on each lsm one at at time and take the time to really >>>> understand them and talk with their maintainers etc to make certain >>>> we get things correct. 
>> The "Do the easy stuff, fix the hard stuff after we've sold the product"
>> approach works really well until you get to the point of fixing the hard
>> stuff. This is the origin of the 90/90 rule of software development.
>>
>>>> This should remove the need for your patches 5, 6 and 7. For the
>>>> immediate future.
>>> I'm still not entirely sure what you were trying to do, maybe refuse to
>>> mount whenever a security module is loaded? I think this could be a good
>>> option to start, but couldn't we restrict it to only the LSMs which use
>>> xattrs for security labels? In situations where the filesystem cannot
>>> supply security policy metadata I can't think of any reason to disallow
>>> the mounts.
>> This whole notion of mounting a generic filesystem (e.g. ext4) that
>> is "owned" by a user (as opposed to the system) has lots of implications,
>> and I seriously doubt that many of them have been accounted for.
>>
>> Think back to the "negative group access" issue. You can't just
>> ignore issues that are inconvenient, or claim that you have a reasonable
>> system just because *you* can't think of a problem.
> I've spent a lot of time considering the implications and previous
> vulnerabilities, and I've addressed everything I turned up. Now I'm
> asking for review from those with more experience with and expertise of
> the code in question. I'm not sure what more I should be doing.

Part of the problem I see is that you're looking at the details
when there's an architectural issue. That's OK, it happens all the
time, but we have to pull the issue up slightly higher in order to
address the underlying difficulties.

You want to provide a mechanism whereby an unprivileged user (Seth)
can mount a filesystem for his own use. You want full filesystem
semantics, but you're willing to accept restrictions on certain
filesystem features to avoid opening security holes.
You are not willing to accept restrictions that make the filesystem
unusable, such as making it read-only.

I am going to present a suggestion. Feel free to correct my
assumptions and my reasoning. For simplicity let's use loop-back
mounting of a filesystem contained in a file as an example. The
principles should apply to newly created memory based filesystems
or disk partitions "owned" by Seth.

Seth wants to mount a file (~seth/myfs) which contains an ext4
filesystem. There is already a filesystem object, with security
attributes, that the system knows how to deal with. If Seth mounts
this as a filesystem he, and potentially other people, will be
able to access the content of this object without accessing the
object itself.

    seth$ mount --justforme -t ext4 ~seth/myfs /tmp/seth
    seth$ chmod 777 /tmp/seth
    seth$ ls -la /tmp/seth
    drwxrwxrwx.  3 seth seth  260 Jul 16 12:59 .
    drwxrwxrwxt 18 root root 4069 Jul 16 11:13 ..
    seth$

Everything's fine at this point. Wilma is also using the system,
being the sort who likes to hide things in out of the way places

    wilma$ cp ~/scandals /tmp/seth
    wilma$ chmod 600 /tmp/seth/scandals

puts her list of scandals on the unsuspecting filesystem, and changes
the mode to ensure that no one can find out what went on after the
office party.

Seth unmounts /tmp/seth. He looks in ~seth/myfs, finds out what really
happened at the office party, and the story goes from there.

Wilma did everything correctly according to the system security policy,
but the system security policy did not protect her as advertised. The
system was tricked into behaving as if it was in control of the content
of the filesystem when in fact it was not.

One way to fix this problem is for unprivileged mounts to recognize the
attributes of the object mounted and to propagate those attributes to all
the objects they present. All files on /tmp/seth would be owned by seth
and protected by the mode bits, ACL and LSM requirements of ~/seth/myfs.
Opening a file on /tmp/seth would require the same permissions as opening
the file containing the mounted filesystem. These attributes would have to
be immutable, or at least demonstrably more restrictive (chmod might be
allowed in some cases, but chown would never be) when changed. I don't see
how a user other than seth could create a new file, as you'd either have
a magical change in ownership or a false sense of security.

I don't see that the presence of user namespaces changes anything.
You may reduce the set of uids available, but the problem with
putting a uid into someone else's file is just as real.

> I welcome feedback about anything I've missed, but stating generally
> that you think I probably missed something isn't very helpful.

True enough. I hope I've explained myself above.

> The LSM issue is thornier than the rest of it though, which is why I
> specifically asked for review there in the cover letter. There's a lot
> of complexity and nuance, and I still don't have a grasp on all the
> subtleties. One such subtlety is the full impact of simply ignoring the
> security labels on disk (but I am still confused as to why this is
> different from filesystems which don't support xattrs at all).

If you can mount a filesystem such that the labels are ignored you
are effectively specifying that the Smack label on the files be
determined by the defaulting rules. With CAP_MAC_ADMIN that's fine.
Without it, it's not.

> I was unaware of Lukasz's patches until yesterday, and I will have a
> look at them. But since we don't have the LSM support for user
> namespaces yet, I don't see the problem with doing something safe for
> LSMs initially and evolving the LSM integration for user ns mounts along
> with the rest of the user ns integration.

Ignoring the security attributes is not safe!

> Your point is taken about my less-than-expert opinion about the other
> security modules.
> We should at minimum get acks from the maintainers of
> those modules that unprivileged mounts will not compromise MAC.

I am the Smack maintainer. Unprivileged mounts as you have
described them compromise MAC. They compromise DAC, too.

> For Smack specifically, I believe my only concern was the SMACK64EXEC
> attribute, as all the other attributes only affected subjects' access to
> the files. So maybe it would be possible to simply ignore this attribute
> in unprivileged mounts and respect the others, even lacking more
> complete LSM support for user namespaces.

SMACK64EXEC is analogous to the setuid bit, but I would rather
see exec() of programs with this attribute refused than for it
to be blindly ignored.

> Seth
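A minimal sketch of the attribute-propagation rule Casey proposes in this
message, written as a userspace Python model rather than kernel code. The
names BackingObject, presented_attrs, chmod_allowed and chown_allowed are
hypothetical, and letting a no-op chown through is an assumption of this
sketch, not something stated in the thread.

```python
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class BackingObject:
    """Owner and mode of the file that holds the filesystem image (toy model)."""
    uid: int
    mode: int  # permission bits, e.g. 0o600


def presented_attrs(backing: BackingObject) -> tuple:
    """Every object on the mount reports the backing file's owner and mode."""
    return backing.uid, stat.S_IMODE(backing.mode)


def chmod_allowed(backing: BackingObject, new_mode: int) -> bool:
    """Permit a mode change only if it grants no bit the backing file lacks."""
    extra_bits = stat.S_IMODE(new_mode) & ~stat.S_IMODE(backing.mode)
    return extra_bits == 0


def chown_allowed(backing: BackingObject, new_uid: int) -> bool:
    """Ownership may never leave the backing file's owner.

    Assumption of this sketch: a chown to the same uid is treated as a no-op
    and allowed; any real change of owner is refused.
    """
    return new_uid == backing.uid
```

With a backing file owned by uid 1000 and mode 0600, chmod_allowed accepts
0400 (strictly narrower) and rejects 0644 (adds group and other read), which
is the "demonstrably more restrictive" condition described above.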
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
From: Andy Lutomirski @ 2015-07-16 22:27 UTC
To: Casey Schaufler
Cc: Seth Forshee, Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On Thu, Jul 16, 2015 at 2:42 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
> You want to provide a mechanism whereby an unprivileged user (Seth)
> can mount a filesystem for his own use. You want full filesystem
> semantics, but you're willing to accept restrictions on certain
> filesystem features to avoid opening security holes. You are not
> willing to accept restrictions that make the filesystem unusable,
> such as making it read-only.
>
> I am going to present a suggestion. Feel free to correct my
> assumptions and my reasoning. For simplicity let's use loop-back
> mounting of a filesystem contained in a file as an example. The
> principles should apply to newly created memory based filesystems
> or disk partitions "owned" by Seth.
>
> Seth wants to mount a file (~seth/myfs) which contains an ext4
> filesystem. There is already a filesystem object, with security
> attributes, that the system knows how to deal with. If Seth mounts
> this as a filesystem he, and potentially other people, will be
> able to access the content of this object without accessing the
> object itself.
>
>     seth$ mount --justforme -t ext4 ~seth/myfs /tmp/seth
>     seth$ chmod 777 /tmp/seth
>     seth$ ls -la /tmp/seth
>     drwxrwxrwx.  3 seth seth  260 Jul 16 12:59 .
>     drwxrwxrwxt 18 root root 4069 Jul 16 11:13 ..
>     seth$
>
> Everything's fine at this point. Wilma is also using the system,
> being the sort who likes to hide things in out of the way places
>
>     wilma$ cp ~/scandals /tmp/seth
>     wilma$ chmod 600 /tmp/seth/scandals

This is already impossible as described.
Seth can only mount the filesystem in a private mount namespace inside
a user namespace that he created. Wilma can't see it unless Seth passes
an fd to Wilma and Wilma accepts and uses it.

>
> puts her list of scandals on the unsuspecting filesystem, and changes
> the mode to ensure that no one can find out what went on after the
> office party.
>
> Seth unmounts /tmp/seth. He looks in ~seth/myfs, finds out what really
> happened at the office party, and the story goes from there.
>
> Wilma did everything correctly according to the system security policy,
> but the system security policy did not protect her as advertised. The
> system was tricked into behaving as if it was in control of the content
> of the filesystem when in fact it was not.

I would argue that, if Wilma writes to some place described by an fd
and doesn't verify where she's writing to, then she has no expectation
of privacy. After all, she could just *tell* Seth directly whatever
she wants (assuming she can communicate with Seth in the first place).

>
> One way to fix this problem is for unprivileged mounts to recognize the
> attributes of the object mounted and to propagate those attributes to all
> the objects they present. All files on /tmp/seth would be owned by seth
> and protected by the mode bits, ACL and LSM requirements of ~/seth/myfs.

This is impossible to enforce, because Seth could use FUSE instead of ext4.

> opening a file on /tmp/seth would require the same permissions as opening
> the file containing the mounted filesystem. These attributes would have to
> be immutable, or at least demonstrably more restrictive (chmod might be
> allowed in some cases, but chown would never be) when changed. I don't see
> how a user other than seth could create a new file, as you'd either have
> a magical change in ownership or a false sense of security.

This would be a very harsh restriction.
Seth might legitimately want to give a user access to a file on backing
store he owns without giving that user access to the backing store. Root
on a normal system does that all the time.

> If you can mount a filesystem such that the labels are ignored you
> are effectively specifying that the Smack label on the files be
> determined by the defaulting rules. With CAP_MAC_ADMIN that's fine.
> Without it, it's not.

Can you explain what the threat model is here? I don't see what it is
that you're trying to prevent.

>> Your point is taken about my less-than-expert opinion about the other
>> security modules. We should at minimum get acks from the maintainers of
>> those modules that unprivileged mounts will not compromise MAC.
>
> I am the Smack maintainer. Unprivileged mounts as you have
> described them compromise MAC. They compromise DAC, too.
>

How do they compromise DAC?

--Andy
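Andy's visibility argument can be stated as a small rule. The sketch below
is a toy Python model, not kernel behaviour in full: Mount, Process and
can_reach are hypothetical names, and mount namespaces are reduced to
integer ids. It only captures the claim made in this message, that a mount
is reachable by path inside its own mount namespace and otherwise only
through an fd that was explicitly passed.

```python
from dataclasses import dataclass, field


@dataclass
class Mount:
    target: str
    mnt_ns: int  # id of the mount namespace that contains this mount


@dataclass
class Process:
    user: str
    mnt_ns: int
    # Mount targets this process can reach through fds it was handed.
    received_fds: set = field(default_factory=set)


def can_reach(proc: Process, mount: Mount) -> bool:
    """True if proc can touch files under mount.

    Path lookup works only from inside the mount's namespace; from outside,
    the only route modelled here is a previously passed file descriptor.
    """
    same_namespace = proc.mnt_ns == mount.mnt_ns
    has_passed_fd = mount.target in proc.received_fds
    return same_namespace or has_passed_fd
```

In this model Wilma, running in the initial namespace, cannot reach a mount
Seth made in his private namespace until an entry is added to her
received_fds, which corresponds to Seth passing an fd and Wilma using it.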
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
From: Casey Schaufler @ 2015-07-16 23:08 UTC
To: Andy Lutomirski
Cc: Seth Forshee, Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On 7/16/2015 3:27 PM, Andy Lutomirski wrote:
> On Thu, Jul 16, 2015 at 2:42 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>> You want to provide a mechanism whereby an unprivileged user (Seth)
>> can mount a filesystem for his own use. You want full filesystem
>> semantics, but you're willing to accept restrictions on certain
>> filesystem features to avoid opening security holes. You are not
>> willing to accept restrictions that make the filesystem unusable,
>> such as making it read-only.
>>
>> I am going to present a suggestion. Feel free to correct my
>> assumptions and my reasoning. For simplicity let's use loop-back
>> mounting of a filesystem contained in a file as an example. The
>> principles should apply to newly created memory based filesystems
>> or disk partitions "owned" by Seth.
>>
>> Seth wants to mount a file (~seth/myfs) which contains an ext4
>> filesystem. There is already a filesystem object, with security
>> attributes, that the system knows how to deal with. If Seth mounts
>> this as a filesystem he, and potentially other people, will be
>> able to access the content of this object without accessing the
>> object itself.
>>
>>     seth$ mount --justforme -t ext4 ~seth/myfs /tmp/seth
>>     seth$ chmod 777 /tmp/seth
>>     seth$ ls -la /tmp/seth
>>     drwxrwxrwx.  3 seth seth  260 Jul 16 12:59 .
>>     drwxrwxrwxt 18 root root 4069 Jul 16 11:13 ..
>>     seth$
>>
>> Everything's fine at this point.
>> Wilma is also using the system,
>> being the sort who likes to hide things in out of the way places
>>
>>     wilma$ cp ~/scandals /tmp/seth
>>     wilma$ chmod 600 /tmp/seth/scandals
> This is already impossible as described. Seth can only mount the
> filesystem in a private mount namespace inside a user namespace that
> he created. Wilma can't see it unless Seth passes an fd to Wilma and
> Wilma accepts and uses it.

But you do have multiple UIDs within your user namespace, right?
There are processes running as someone other than seth, right?

>
>> puts her list of scandals on the unsuspecting filesystem, and changes
>> the mode to ensure that no one can find out what went on after the
>> office party.
>>
>> Seth unmounts /tmp/seth. He looks in ~seth/myfs, finds out what really
>> happened at the office party, and the story goes from there.
>>
>> Wilma did everything correctly according to the system security policy,
>> but the system security policy did not protect her as advertised. The
>> system was tricked into behaving as if it was in control of the content
>> of the filesystem when in fact it was not.
>
> I would argue that, if Wilma writes to some place described by an fd
> and doesn't verify where she's writing to, then she has no expectation
> of privacy. After all, she could just *tell* Seth directly whatever
> she wants (assuming she can communicate with Seth in the first place).

Don't ascribe either wisdom or good intentions to Wilma.

>> One way to fix this problem is for unprivileged mounts to recognize the
>> attributes of the object mounted and to propagate those attributes to all
>> the objects they present. All files on /tmp/seth would be owned by seth
>> and protected by the mode bits, ACL and LSM requirements of ~/seth/myfs.
> This is impossible to enforce, because Seth could use FUSE instead of ext4.

I never said that things aren't already broken. And, if you want
to ignore the potential DAC issues (read, negative groups) just
do it for the LSM xattrs.
>
>> opening a file on /tmp/seth would require the same permissions as opening
>> the file containing the mounted filesystem. These attributes would have to
>> be immutable, or at least demonstrably more restrictive (chmod might be
>> allowed in some cases, but chown would never be) when changed. I don't see
>> how a user other than seth could create a new file, as you'd either have
>> a magical change in ownership or a false sense of security.
> This would be a very harsh restriction. Seth might legitimately want
> to give a user access to a file on backing store he owns without
> giving that user access to the backing store. Root on a normal system
> does that all the time.

You already said that it was impossible for Wilma to get
access, so how is this more restrictive? Besides, Seth can
always set the mode on ~/seth so that Wilma can't read the
files it contains. This isn't an old problem or a novel
solution.

>> If you can mount a filesystem such that the labels are ignored you
>> are effectively specifying that the Smack label on the files be
>> determined by the defaulting rules. With CAP_MAC_ADMIN that's fine.
>> Without it, it's not.
> Can you explain what the threat model is here? I don't see what it is
> that you're trying to prevent.

Um, OK.
The filesystem has files with a hundred different Smack labels on it.
I mount it as an unlabeled filesystem and everything is readable by
everyone. Bad jojo.

>
>>> Your point is taken about my less-than-expert opinion about the other
>>> security modules. We should at minimum get acks from the maintainers of
>>> those modules that unprivileged mounts will not compromise MAC.
>> I am the Smack maintainer. Unprivileged mounts as you have
>> described them compromise MAC. They compromise DAC, too.
>>
> How do they compromise DAC?

Wilma's expectation (or the application running with a mapped UID)
that chmod will keep Seth out of the file.

> --Andy
>
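The "hundred different Smack labels" concern can be illustrated with a
deliberately simplified model. This is a sketch under stated assumptions,
not Smack's rule engine: security.SMACK64 is the real xattr name, but
effective_label, may_read, the choice of "_" as the default label and the
three-way access check are simplifications made up for this example.

```python
SMACK_XATTR = "security.SMACK64"  # real xattr name; the rest is a toy model
DEFAULT_LABEL = "_"               # assumed default for unlabeled files here


def effective_label(xattrs: dict, honor_labels: bool) -> str:
    """Label the kernel would act on for one file.

    If the mount ignores on-disk labels, every file falls back to the
    default label regardless of what its xattr says.
    """
    if honor_labels and SMACK_XATTR in xattrs:
        return xattrs[SMACK_XATTR]
    return DEFAULT_LABEL


def may_read(subject: str, object_label: str, rules: set) -> bool:
    """Simplified read check: same label, default label, or explicit rule."""
    if subject == object_label or object_label == DEFAULT_LABEL:
        return True
    return (subject, object_label, "r") in rules
```

With labels honored, a subject labeled "User" cannot read a file labeled
"Secret" unless a rule grants it. With labels ignored, the same file
collapses to the default label and the read succeeds, which is the outcome
Casey objects to when the mounter lacks CAP_MAC_ADMIN.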
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
From: Andy Lutomirski @ 2015-07-16 23:29 UTC
To: Casey Schaufler
Cc: Seth Forshee, Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On Thu, Jul 16, 2015 at 4:08 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
> On 7/16/2015 3:27 PM, Andy Lutomirski wrote:
>> On Thu, Jul 16, 2015 at 2:42 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>>> You want to provide a mechanism whereby an unprivileged user (Seth)
>>> can mount a filesystem for his own use. You want full filesystem
>>> semantics, but you're willing to accept restrictions on certain
>>> filesystem features to avoid opening security holes. You are not
>>> willing to accept restrictions that make the filesystem unusable,
>>> such as making it read-only.
>>>
>>> I am going to present a suggestion. Feel free to correct my
>>> assumptions and my reasoning. For simplicity let's use loop-back
>>> mounting of a filesystem contained in a file as an example. The
>>> principles should apply to newly created memory based filesystems
>>> or disk partitions "owned" by Seth.
>>>
>>> Seth wants to mount a file (~seth/myfs) which contains an ext4
>>> filesystem. There is already a filesystem object, with security
>>> attributes, that the system knows how to deal with. If Seth mounts
>>> this as a filesystem he, and potentially other people, will be
>>> able to access the content of this object without accessing the
>>> object itself.
>>>
>>>     seth$ mount --justforme -t ext4 ~seth/myfs /tmp/seth
>>>     seth$ chmod 777 /tmp/seth
>>>     seth$ ls -la /tmp/seth
>>>     drwxrwxrwx.  3 seth seth  260 Jul 16 12:59 .
>>>     drwxrwxrwxt 18 root root 4069 Jul 16 11:13 ..
>>>     seth$
>>>
>>> Everything's fine at this point.
>>> Wilma is also using the system,
>>> being the sort who likes to hide things in out of the way places
>>>
>>>     wilma$ cp ~/scandals /tmp/seth
>>>     wilma$ chmod 600 /tmp/seth/scandals
>> This is already impossible as described. Seth can only mount the
>> filesystem in a private mount namespace inside a user namespace that
>> he created. Wilma can't see it unless Seth passes an fd to Wilma and
>> Wilma accepts and uses it.
>
> But you do have multiple UIDs within your user namespace, right?
> There are processes running as someone other than seth, right?
>

Only if root set it up that way. For example, root could set up
"subuids" (this is a userspace concept) that belong to Seth. These
would be uids that Seth controls and that represent subsets of Seth's
authority. Wilma wouldn't be one of these subuids unless she was
somehow part of Seth (or if root completely screwed up).

>>
>>> puts her list of scandals on the unsuspecting filesystem, and changes
>>> the mode to ensure that no one can find out what went on after the
>>> office party.
>>>
>>> Seth unmounts /tmp/seth. He looks in ~seth/myfs, finds out what really
>>> happened at the office party, and the story goes from there.
>>>
>>> Wilma did everything correctly according to the system security policy,
>>> but the system security policy did not protect her as advertised. The
>>> system was tricked into behaving as if it was in control of the content
>>> of the filesystem when in fact it was not.
>>
>> I would argue that, if Wilma writes to some place described by an fd
>> and doesn't verify where she's writing to, then she has no expectation
>> of privacy. After all, she could just *tell* Seth directly whatever
>> she wants (assuming she can communicate with Seth in the first place).
>
> Don't ascribe either wisdom or good intentions to Wilma.

In that case, I'll mention the futility of solving the problem, even
without user namespaces. If Wilma tells Seth something, he's going to
find out.
If Wilma pokes it (in whatever form) into an fd provided by Seth, then
Seth is extremely likely to find out, regardless of what root or the
MAC owner tries to do.

If Wilma writes to a path that's mounted in her namespace, then, sure,
overall policy associated with her namespace (which, in your example,
is the root namespace) must apply. But Seth can't mount things into
Wilma's namespace without having CAP_SYS_ADMIN in that namespace and,
if he has CAP_SYS_ADMIN, it's already game over.

>
>>> One way to fix this problem is for unprivileged mounts to recognize the
>>> attributes of the object mounted and to propagate those attributes to all
>>> the objects they present. All files on /tmp/seth would be owned by seth
>>> and protected by the mode bits, ACL and LSM requirements of ~/seth/myfs.
>> This is impossible to enforce, because Seth could use FUSE instead of ext4.
>
> I never said that things aren't already broken. And, if you want
> to ignore the potential DAC issues (read, negative groups) just
> do it for the LSM xattrs.
>

Negative groups are a solved problem, I believe.

>
>>
>>> opening a file on /tmp/seth would require the same permissions as opening
>>> the file containing the mounted filesystem. These attributes would have to
>>> be immutable, or at least demonstrably more restrictive (chmod might be
>>> allowed in some cases, but chown would never be) when changed. I don't see
>>> how a user other than seth could create a new file, as you'd either have
>>> a magical change in ownership or a false sense of security.
>> This would be a very harsh restriction. Seth might legitimately want
>> to give a user access to a file on backing store he owns without
>> giving that user access to the backing store. Root on a normal system
>> does that all the time.
>
> You already said that it was impossible for Wilma to get
> access, so how is this more restrictive? Besides, Seth can
> always set the mode on ~/seth so that Wilma can't read the
> files it contains.
This isn't an old problem or a novel > solution. Seth can pass an fd around. This is actually a plausible thing to do: Seth creates a userns to sandbox himself, mounts some FUSE thing in there, and passes an fd out for the benefit of some daemon. That daemon had better validate the thing before using it, though. I really don't see the benefit of making up extra rules that apply to users outside a userns who try to access specifically a filesystem with backing store. They wouldn't make sense for filesystems without backing store. > >>> If you can mount a filesystem such that the labels are ignored you >>> are effectively specifying that the Smack label on the files be >>> determined by the defaulting rules. With CAP_MAC_ADMIN that's fine. >>> Without it, it's not. >> Can you explain what the threat model is here? I don't see what it is >> that you're trying to prevent. > > Um, OK. > The filesystem has files with a hundred different Smack labels on it. > I mount it as an unlabeled filesystem and everything is readable by > everyone. Bad jojo. I still don't understand. If it's a filesystem backed by a file that Seth has RW access to, then Seth can read everything on it, full stop. The security labels in the filesystem are irrelevant. This is like saying that, if you put restrictive labels in the filesystem that lives on /dev/sda2 and give Seth ownership of /dev/sda2, then you expect Seth to be unable to bypass the policy specifies by your labels. Or maybe I'm misunderstanding you. > >> >>>> Your point is taken about my less-than-expert opinion about the other >>>> security modules. We should at minimum get acks from the maintainers of >>>> those modules that unprivileged mounts will not compromise MAC. >>> I am the Smack maintainer. Unprivileged mounts as you have >>> described them compromise MAC. They compromise DAC, too. >>> >> How do they compromise DAC? 
> > Wilma's expectation (or the application running with a mapped UID) > that chmod will keep Seth out of the file. That was never true. If Seth has an open fd, Wilma can chmod all day and it won't matter. In this example, Seth owns the entire filesystem along with its backing store. --Andy ^ permalink raw reply [flat|nested] 138+ messages in thread
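
Andy's closing point, that a chmod does not revoke access through a file
descriptor that is already open, can be checked with a small sketch. This
is not from the thread: it assumes a POSIX system and Python 3, and the
function name read_after_chmod is made up for illustration. For simplicity
a single process plays both roles (it opens the file, then removes every
mode bit), which is enough to show that permission checks happen at open
time rather than on each read.

```python
import os
import tempfile


def read_after_chmod(payload: bytes) -> bytes:
    """Write payload, clear all mode bits, then read back via the old fd.

    Hypothetical helper for illustration only; it is not part of any
    kernel or library API discussed in the thread.
    """
    fd, path = tempfile.mkstemp()  # opened read/write by the creator
    try:
        os.write(fd, payload)
        os.lseek(fd, 0, os.SEEK_SET)
        # "Wilma can chmod all day": mode bits are consulted by open(),
        # not by read() on a descriptor that already exists.
        os.chmod(path, 0o000)
        return os.read(fd, len(payload))
    finally:
        os.close(fd)
        os.unlink(path)


print(read_after_chmod(b"scandals"))  # prints b'scandals'
```

The sketch only covers the mode-bit (DAC) half of the argument; it says
nothing about how an LSM might treat the same descriptor.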
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-16 23:29 ` Andy Lutomirski
@ 2015-07-17  0:45 ` Casey Schaufler
From: Casey Schaufler @ 2015-07-17 0:45 UTC
To: Andy Lutomirski
Cc: Seth Forshee, Eric W. Biederman, Alexander Viro, Linux FS Devel,
    LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On 7/16/2015 4:29 PM, Andy Lutomirski wrote:
> On Thu, Jul 16, 2015 at 4:08 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>> On 7/16/2015 3:27 PM, Andy Lutomirski wrote:
>>> On Thu, Jul 16, 2015 at 2:42 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>> You want to provide a mechanism whereby an unprivileged user (Seth)
>>>> can mount a filesystem for his own use. You want full filesystem
>>>> semantics, but you're willing to accept restrictions on certain
>>>> filesystem features to avoid opening security holes. You are not
>>>> willing to accept restrictions that make the filesystem unusable,
>>>> such as making it read-only.
>>>>
>>>> I am going to present a suggestion. Feel free to correct my
>>>> assumptions and my reasoning. For simplicity let's use loop-back
>>>> mounting of a filesystem contained in a file as an example. The
>>>> principles should apply to newly created memory based filesystems
>>>> or disk partitions "owned" by Seth.
>>>>
>>>> Seth wants to mount a file (~seth/myfs) which contains an ext4
>>>> filesystem. There is already a filesystem object, with security
>>>> attributes, that the system knows how to deal with. If Seth mounts
>>>> this as a filesystem he, and potentially other people, will be
>>>> able to access the content of this object without accessing the
>>>> object itself.
>>>>
>>>> seth$ mount --justforme -t ext4 ~seth/myfs /tmp/seth
>>>> seth$ chmod 777 /tmp/seth
>>>> seth$ ls -la /tmp/seth
>>>> drwxrwxrwx. 3 seth seth 260 Jul 16 12:59 .
>>>> drwxrwxrwxt 18 root root 4069 Jul 16 11:13 ..
>>>> seth$
>>>>
>>>> Everything's fine at this point. Wilma is also using the system,
>>>> being the sort who likes to hide things in out of the way places
>>>>
>>>> wilma$ cp ~/scandals /tmp/seth
>>>> wilma$ chmod 600 /tmp/seth/scandals
>>> This is already impossible as described. Seth can only mount the
>>> filesystem in a private mount namespace inside a user namespace that
>>> he created. Wilma can't see it unless Seth passes an fd to Wilma and
>>> Wilma accepts and uses it.
>> But you do have multiple UIDs within your user namespace, right?
>> There are processes running as someone other than seth, right?
>>
> Only if root set it up that way. For example, root could set up
> "subuids" (this is a userspace concept) that belong to Seth. These
> would be uids that Seth controls and that represent subsets of Seth's
> authority. Wilma wouldn't be one of these subuids unless she was
> somehow part of Seth (or if root completely screwed up).

Or if root had some really unexpected and inappropriate ideas
on what qualifies as "clever". But I'll back off. It looks like
this particular objection of mine is covered.

>
>>>> puts her list of scandals on the unsuspecting filesystem, and changes
>>>> the mode to ensure that no one can find out what went on after the
>>>> office party.
>>>>
>>>> Seth unmounts /tmp/seth. He looks in ~seth/myfs, finds out what really
>>>> happened at the office party, and the story goes from there.
>>>>
>>>> Wilma did everything correctly according to the system security policy,
>>>> but the system security policy did not protect her as advertised. The
>>>> system was tricked into behaving as if it was in control of the content
>>>> of the filesystem when in fact it was not.
>>> I would argue that, if Wilma writes to some place described by an fd
>>> and doesn't verify where she's writing to, then she has no expectation
>>> of privacy. After all, she could just *tell* Seth directly whatever
>>> she wants (assuming she can communicate with Seth in the first place).
>> Don't ascribe either wisdom or good intentions to Wilma.
> In that case, I'll mention the futility of solving the problem, even
> without user namespaces. If Wilma tells Seth something, he's going to
> find out. If Wilma pokes it (in whatever form) into an fd provided by
> Seth, then Seth is extremely likely to find out, regardless of what
> root or the MAC owner tries to do.

I'll buy that, too. I still get queasy every time someone tells
me that passing file descriptors is a security feature.

> If Wilma writes to a path that's mounted in her namespace, then, sure,
> overall policy associated with her namespace (which, in your example,
> is the root namespace) must apply. But Seth can't mount things into
> Wilma's namespace without having CAP_SYS_ADMIN in that namespace and,
> if he has CAP_SYS_ADMIN, it's already game over.

And so long as it's restricted to the namespace ...
I'm starting to get it now.

>>>> One way to fix this problem is for unprivileged mounts to recognize the
>>>> attributes of the object mounted and to propagate those attributes to all
>>>> the objects they present. All files on /tmp/seth would be owned by seth
>>>> and protected by the mode bits, ACL and LSM requirements of ~/seth/myfs.
>>> This is impossible to enforce, because Seth could use FUSE instead of ext4.
>> I never said that things aren't already broken. And, if you want
>> to ignore the potential DAC issues (read, negative groups) just
>> do it for the LSM xattrs.
>>
> Negative groups are a solved problem, I believe.

My position is that there's a workaround but that the design
is still fundamentally flawed.

>
>>>> opening a file on /tmp/seth would require the same permissions as opening
>>>> the file containing the mounted filesystem. These attributes would have to
>>>> be immutable, or at least demonstrably more restrictive (chmod might be
>>>> allowed in some cases, but chown would never be) when changed. I don't see
>>>> how a user other than seth could create a new file, as you'd either have
>>>> a magical change in ownership or a false sense of security.
>>> This would be a very harsh restriction. Seth might legitimately want
>>> to give a user access to a file on backing store he owns without
>>> giving that user access to the backing store. Root on a normal system
>>> does that all the time.
>> You already said that it was impossible for Wilma to get
>> access, so how is this more restrictive? Besides, Seth can
>> always set the mode on ~/seth so that Wilma can't read the
>> files it contains. This isn't an old problem or a novel
>> solution.
> Seth can pass an fd around. This is actually a plausible thing to do:
> Seth creates a userns to sandbox himself, mounts some FUSE thing in
> there, and passes an fd out for the benefit of some daemon. That
> daemon had better validate the thing before using it, though.

Point. It won't, but it should.

> I really don't see the benefit of making up extra rules that apply to
> users outside a userns who try to access specifically a filesystem
> with backing store. They wouldn't make sense for filesystems without
> backing store.

Sure it would. For Smack, it would be the label a file would be
created with, which would be the label of the process creating
the memory based filesystem. For SELinux the rules are a
touch more sophisticated, but I'm sure that Paul or Stephen could
come up with how to determine it.

The point, looping all the way back to the beginning, where we
were talking about just ignoring the labels on the filesystem,
is that if you use the same Smack label on the files in the
filesystem as the backing store file has, we'll all be happy.
If that label isn't something the user can write to, he won't be
able to write to the mounted objects, either. If there is no
backing store then use the label of the process creating the
filesystem, which will be the user, which will mean everything
will work hunky dory.

Yes, there's work involved, but I doubt there's a lot. Getting
the label from the backing store or the creating process is
simple enough.

>>>> If you can mount a filesystem such that the labels are ignored you
>>>> are effectively specifying that the Smack label on the files be
>>>> determined by the defaulting rules. With CAP_MAC_ADMIN that's fine.
>>>> Without it, it's not.
>>> Can you explain what the threat model is here? I don't see what it is
>>> that you're trying to prevent.
>> Um, OK.
>> The filesystem has files with a hundred different Smack labels on it.
>> I mount it as an unlabeled filesystem and everything is readable by
>> everyone. Bad jojo.
> I still don't understand. If it's a filesystem backed by a file that
> Seth has RW access to, then Seth can read everything on it, full stop.
> The security labels in the filesystem are irrelevant.

Well, they can't be trusted, if that's what you mean.
That's why I'm saying that the objects exposed by mounting
this backing store need to be treated with the same security
attributes as the backing store. Fudge it for DAC if you are
so inclined, but I think it's the right way to go for MAC.

> This is like saying that, if you put restrictive labels in the
> filesystem that lives on /dev/sda2 and give Seth ownership of
> /dev/sda2, then you expect Seth to be unable to bypass the policy
> specified by your labels.

Consider the Smack label on /dev/sda2. Smack does not care
who owns it, just what the Smack label is. Just like on
~/seth/myfs. The backing store "object" is /dev/sda2 in the
one case, ~/seth/myfs in the other, and something in the ether
for a memory based filesystem. So long as the labels of the
files exposed on the mount point match those of the backing
store "object", Smack is going to be happy. Since you're running
without privilege, you can't change the labels on the files.

Now Seth, being the sneaky person that he is, could change the
Smack labels on the files in the backing store while it's
offline. Since he has access to the backing store, he can't give
himself more access by changing the labels within the filesystem.
He can give himself less, but I'm OK with that.

> Or maybe I'm misunderstanding you.

Probably, but I'm undoubtedly doing the same. If you're going to be
at LinuxCon in Seattle we should continue this discussion over the
beverage of your choice.

>>>>> Your point is taken about my less-than-expert opinion about the other
>>>>> security modules. We should at minimum get acks from the maintainers of
>>>>> those modules that unprivileged mounts will not compromise MAC.
>>>> I am the Smack maintainer. Unprivileged mounts as you have
>>>> described them compromise MAC. They compromise DAC, too.
>>>>
>>> How do they compromise DAC?
>> Wilma's expectation (or the application running with a mapped UID)
>> that chmod will keep Seth out of the file.
> That was never true. If Seth has an open fd, Wilma can chmod all day
> and it won't matter. In this example, Seth owns the entire filesystem
> along with its backing store.
>
> --Andy
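
The labelling rule Casey proposes above (every object exposed by an
unprivileged mount takes the label of the backing store, or the label of
the creating process when there is no backing store) can be written down
as a toy model. This is only a sketch of the rule as stated in the thread,
not Smack code: the names mount_label and may_write are hypothetical, and
the access check is reduced to "same label, or an explicit rule", which
leaves out most of what a real Smack policy does.

```python
from typing import Optional, Set, Tuple


def mount_label(backing_store_label: Optional[str], creator_label: str) -> str:
    """Pick the single label applied to every object on the mount.

    Hypothetical model: use the backing store's label when there is a
    backing store, otherwise the label of the process creating the
    filesystem. Labels stored inside the filesystem are never consulted.
    """
    if backing_store_label is not None:
        return backing_store_label
    return creator_label


def may_write(subject: str, obj: str, rules: Set[Tuple[str, str]]) -> bool:
    """Toy check: allowed if labels match or an explicit rule exists."""
    return subject == obj or (subject, obj) in rules


rules: Set[Tuple[str, str]] = set()          # no extra rules configured
loop_label = mount_label("Seth", "Seth")     # loop mount of ~seth/myfs
tmpfs_label = mount_label(None, "Seth")      # memory based filesystem

print(may_write("Seth", loop_label, rules))   # prints True
print(may_write("Wilma", loop_label, rules))  # prints False
```

In this model, rewriting the labels inside the image while it is offline
changes nothing, because mount_label never reads them; that matches the
claim that Seth cannot give himself more access that way.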
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-17  0:45 ` Casey Schaufler
@ 2015-07-17  0:59 ` Andy Lutomirski
  -1 siblings, 0 replies; 138+ messages in thread
From: Andy Lutomirski @ 2015-07-17  0:59 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Seth Forshee, Eric W. Biederman, Alexander Viro, Linux FS Devel,
	LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On Thu, Jul 16, 2015 at 5:45 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
> On 7/16/2015 4:29 PM, Andy Lutomirski wrote:
>> I really don't see the benefit of making up extra rules that apply to
>> users outside a userns who try to access specifically a filesystem
>> with backing store. They wouldn't make sense for filesystems without
>> backing store.
>
> Sure it would. For Smack, it would be the label a file would be
> created with, which would be the label of the process creating
> the memory based filesystem. For SELinux the rules are a
> touch more sophisticated, but I'm sure that Paul or Stephen could
> come up with how to determine it.
>
> The point, looping all the way back to the beginning, where we
> were talking about just ignoring the labels on the filesystem,
> is that if you use the same Smack label on the files in the
> filesystem as the backing store file has, we'll all be happy.
> If that label isn't something user can write to, he won't be
> able to write to the mounted objects, either. If there is no
> backing store then use the label of the process creating the
> filesystem, which will be the user, which will mean everything
> will work hunky dory.
>
> Yes, there's work involved, but I doubt there's a lot. Getting
> the label from the backing store or the creating process is
> simple enough.
>

So what if Smack used the label of the user creating the filesystem
even for filesystems with backing store? IMO this ought to be doable
with the LSM hooks -- it certainly seems reasonable for the LSM to be
aware of who created a filesystem. In fact, I'd argue that if Smack
can't do this with the proposed LSM hooks, then the hooks are
insufficient.

Presumably Smack could also figure out what was mounted, but keep in
mind that there are filesystems like ntfs-3g out there. While ntfs-3g
logically has backing store, I don't think the kernel actually knows
about it.

>
>>>>> If you can mount a filesystem such that the labels are ignored you
>>>>> are effectively specifying that the Smack label on the files be
>>>>> determined by the defaulting rules. With CAP_MAC_ADMIN that's fine.
>>>>> Without it, it's not.
>>>> Can you explain what the threat model is here? I don't see what it is
>>>> that you're trying to prevent.
>>> Um, OK.
>>> The filesystem has files with a hundred different Smack labels on it.
>>> I mount it as an unlabeled filesystem and everything is readable by
>>> everyone. Bad jojo.
>> I still don't understand. If it's a filesystem backed by a file that
>> Seth has RW access to, then Seth can read everything on it, full stop.
>> The security labels in the filesystem are irrelevant.
>
> Well, they can't be trusted, if that's what you mean.
> That's why I'm saying that the objects exposed by mounting
> this backing store need to be treated with the same security
> attributes as the backing store. Fudge it for DAC if you are
> so inclined, but I think it's the right way to go for MAC.
>
>> This is like saying that, if you put restrictive labels in the
>> filesystem that lives on /dev/sda2 and give Seth ownership of
>> /dev/sda2, then you expect Seth to be unable to bypass the policy
>> specified by your labels.
>
> Consider the Smack label on /dev/sda2. Smack does not care
> who owns it, just what the Smack label is. Just like on
> ~/seth/myfs. The backing store "object" is /dev/sda2 in the
> one case, ~/seth/myfs in the other, and something in the ether
> for a memory based filesystem. So long as the labels of the
> files exposed on the mount point match those of the backing
> store "object", Smack is going to be happy. Since you're
> running without privilege, you can't change the labels on
> the files.
>
> Now Seth, being the sneaky person that he is, could change
> the Smack labels on the files in the backing store while it's
> offline. Since he has access to the backing store, he can't
> give himself more access by changing the labels within the
> filesystem. He can give himself less, but I'm OK with that.
>
>> Or maybe I'm misunderstanding you.
>
> Probably, but I'm undoubtedly doing the same.
>
> If you're going to be at LinuxCon in Seattle we should
> continue this discussion over the beverage of your choice.

There's a small but not quite zero chance I'll be there. I'll
probably be in Seoul. It's too bad that LSS and KS are in different
places this year.

--Andy

^ permalink raw reply [flat|nested] 138+ messages in thread
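The labeling argument quoted in the message above (every file exposed by the mount is treated as carrying the backing store's label, so rewriting labels inside the image gains nothing) can be modelled in a few lines of userspace C. This is only an illustrative sketch: `toy_rule`, `toy_access()` and `toy_effective_label()` are hypothetical names that do not exist in the kernel, and the rule table is a drastic simplification of real Smack rules, keeping only the fact that a subject may always access an object with its own label.

```c
/*
 * Userspace model only. None of these names exist in the kernel;
 * they are assumptions made for illustration.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One toy rule: "subject label may access object label". */
struct toy_rule {
	const char *subject;
	const char *object;
};

/* Access depends only on labels, never on who owns the object. */
static bool toy_access(const struct toy_rule *rules, size_t n,
		       const char *subject, const char *object)
{
	size_t i;

	assert(subject != NULL && object != NULL);

	/* A subject may always access an object with its own label. */
	if (strcmp(subject, object) == 0)
		return true;

	for (i = 0; i < n; i++)
		if (strcmp(rules[i].subject, subject) == 0 &&
		    strcmp(rules[i].object, object) == 0)
			return true;
	return false;
}

/*
 * Every file on the mount is treated as carrying the backing store's
 * label, whatever label is written inside the image.
 */
static const char *toy_effective_label(const char *backing_label,
				       const char *label_in_image)
{
	(void)label_in_image;	/* deliberately ignored */
	return backing_label;
}
```

In this model, a file stored in the image with label "Secret" on a backing store labeled "SethFS" is reachable by exactly the subjects that can already reach "SethFS", which is the property the quoted argument relies on.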
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-17  0:59 ` Andy Lutomirski
@ 2015-07-17 14:28 ` Serge E. Hallyn
  -1 siblings, 0 replies; 138+ messages in thread
From: Serge E. Hallyn @ 2015-07-17 14:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Casey Schaufler, Seth Forshee, Eric W. Biederman, Alexander Viro,
	Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On Thu, Jul 16, 2015 at 05:59:22PM -0700, Andy Lutomirski wrote:
> On Thu, Jul 16, 2015 at 5:45 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
> > On 7/16/2015 4:29 PM, Andy Lutomirski wrote:
> >> I really don't see the benefit of making up extra rules that apply to
> >> users outside a userns who try to access specifically a filesystem
> >> with backing store. They wouldn't make sense for filesystems without
> >> backing store.
> >
> > Sure it would. For Smack, it would be the label a file would be
> > created with, which would be the label of the process creating
> > the memory based filesystem. For SELinux the rules are a
> > touch more sophisticated, but I'm sure that Paul or Stephen could
> > come up with how to determine it.
> >
> > The point, looping all the way back to the beginning, where we
> > were talking about just ignoring the labels on the filesystem,
> > is that if you use the same Smack label on the files in the
> > filesystem as the backing store file has, we'll all be happy.
> > If that label isn't something user can write to, he won't be
> > able to write to the mounted objects, either. If there is no
> > backing store then use the label of the process creating the
> > filesystem, which will be the user, which will mean everything
> > will work hunky dory.
> >
> > Yes, there's work involved, but I doubt there's a lot. Getting
> > the label from the backing store or the creating process is
> > simple enough.
> >
>
> So what if Smack used the label of the user creating the filesystem
> even for filesystems with backing store? IMO this ought to be doable

The more usual LSM-ish way to handle this would be to ask the LSM, at
mount time, with a new security_mount_bdev_in_userns() hook, passing it
the user's label and the backing store's label (if any), and storing
the label to be used for the files.

Even more LSM-ish (though risking a performance hit) would be to then
have the LSM at each inode_init_security decide whether to use that
label or trust what's in the fs anyway (or do something else). That
could allow the LSM to use policy to decide that, because I don't know
that for all LSMs it makes sense for a 'subject' label to be assigned
to an object.

> with the LSM hooks -- it certainly seems reasonable for the LSM to be
> aware of who created a filesystem. In fact, I'd argue that if Smack
> can't do this with the proposed LSM hooks, then the hooks are
> insufficient.
>
> Presumably Smack could also figure out what was mounted, but keep in
> mind that there are filesystems like ntfs-3g out there. While ntfs-3g
> logically has backing store, I don't think the kernel actually knows
> about it.
>
> >
> >>>>> If you can mount a filesystem such that the labels are ignored you
> >>>>> are effectively specifying that the Smack label on the files be
> >>>>> determined by the defaulting rules. With CAP_MAC_ADMIN that's fine.
> >>>>> Without it, it's not.
> >>>> Can you explain what the threat model is here? I don't see what it is
> >>>> that you're trying to prevent.
> >>> Um, OK.
> >>> The filesystem has files with a hundred different Smack labels on it.
> >>> I mount it as an unlabeled filesystem and everything is readable by
> >>> everyone. Bad jojo.
> >> I still don't understand. If it's a filesystem backed by a file that
> >> Seth has RW access to, then Seth can read everything on it, full stop.
> >> The security labels in the filesystem are irrelevant.
> >
> > Well, they can't be trusted, if that's what you mean.
> > That's why I'm saying that the objects exposed by mounting
> > this backing store need to be treated with the same security
> > attributes as the backing store. Fudge it for DAC if you are
> > so inclined, but I think it's the right way to go for MAC.
> >
> >> This is like saying that, if you put restrictive labels in the
> >> filesystem that lives on /dev/sda2 and give Seth ownership of
> >> /dev/sda2, then you expect Seth to be unable to bypass the policy
> >> specified by your labels.
> >
> > Consider the Smack label on /dev/sda2. Smack does not care
> > who owns it, just what the Smack label is. Just like on
> > ~/seth/myfs. The backing store "object" is /dev/sda2 in the
> > one case, ~/seth/myfs in the other, and something in the ether
> > for a memory based filesystem. So long as the labels of the
> > files exposed on the mount point match those of the backing
> > store "object", Smack is going to be happy. Since you're
> > running without privilege, you can't change the labels on
> > the files.
> >
> > Now Seth, being the sneaky person that he is, could change
> > the Smack labels on the files in the backing store while it's
> > offline. Since he has access to the backing store, he can't
> > give himself more access by changing the labels within the
> > filesystem. He can give himself less, but I'm OK with that.
> >
> >> Or maybe I'm misunderstanding you.
> >
> > Probably, but I'm undoubtedly doing the same.
> >
> > If you're going to be at LinuxCon in Seattle we should
> > continue this discussion over the beverage of your choice.
>
> There's a small but not quite zero chance I'll be there. I'll
> probably be in Seoul. It's too bad that LSS and KS are in different
> places this year.

FWIW I'll be there and happy to discuss.

-serge

^ permalink raw reply [flat|nested] 138+ messages in thread
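The two-stage idea in the message above (a mount-time hook that records a label, then a per-inode decision on whether to trust on-disk labels) could be modelled roughly as follows. This is a userspace sketch under stated assumptions, not kernel code: `mount_sec`, `model_mount_in_userns()` and `model_inode_label()` are hypothetical names, and the choice to prefer the backing store's label over the mounter's label is an assumption taken from the earlier proposal in the thread; a real LSM would apply its own policy.

```c
/*
 * Userspace model only. These types and functions do not exist in the
 * kernel; they mimic a mount-time decision followed by a per-inode one.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-mount state an LSM might keep. */
struct mount_sec {
	const char *mount_label;	/* label chosen at mount time */
	bool trust_disk_labels;		/* policy: honour labels stored in the fs? */
};

/*
 * Stand-in for the proposed mount-time hook: prefer the backing
 * store's label when there is one (assumption), otherwise fall back
 * to the label of the user doing the mount.
 */
static void model_mount_in_userns(struct mount_sec *ms,
				  const char *user_label,
				  const char *bdev_label,
				  bool trust_disk_labels)
{
	assert(ms != NULL && user_label != NULL);

	ms->mount_label = bdev_label ? bdev_label : user_label;
	ms->trust_disk_labels = trust_disk_labels;
}

/*
 * Stand-in for the per-inode decision: use the on-disk label only if
 * policy says the filesystem's labels can be trusted.
 */
static const char *model_inode_label(const struct mount_sec *ms,
				     const char *disk_label)
{
	if (ms->trust_disk_labels && disk_label != NULL)
		return disk_label;
	return ms->mount_label;
}
```

Keeping the policy flag in the per-mount state means the per-inode path is a single comparison, which is one way to limit the performance cost mentioned in the message.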
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-17 14:28 ` Serge E. Hallyn
@ 2015-07-17 14:56 ` Seth Forshee
  -1 siblings, 0 replies; 138+ messages in thread
From: Seth Forshee @ 2015-07-17 14:56 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Andy Lutomirski, Casey Schaufler, Eric W. Biederman, Alexander Viro,
	Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On Fri, Jul 17, 2015 at 09:28:32AM -0500, Serge E. Hallyn wrote:
> > > If you're going to be at LinuxCon in Seattle we should
> > > continue this discussion over the beverage of your choice.
> >
> > There's a small but not quite zero chance I'll be there. I'll
> > probably be in Seoul. It's too bad that LSS and KS are in different
> > places this year.
>
> FWIW I'll be there and happy to discuss.

I'll also be in Seattle and happy to discuss.

Seth

^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-17  0:59 ` Andy Lutomirski
@ 2015-07-21 20:35 ` Seth Forshee
  -1 siblings, 0 replies; 138+ messages in thread
From: Seth Forshee @ 2015-07-21 20:35 UTC (permalink / raw)
  To: Casey Schaufler, Andy Lutomirski
  Cc: Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List,
	SELinux-NSA, Serge Hallyn, linux-kernel

On Thu, Jul 16, 2015 at 05:59:22PM -0700, Andy Lutomirski wrote:
> On Thu, Jul 16, 2015 at 5:45 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
> > On 7/16/2015 4:29 PM, Andy Lutomirski wrote:
> >> I really don't see the benefit of making up extra rules that apply to
> >> users outside a userns who try to access specifically a filesystem
> >> with backing store. They wouldn't make sense for filesystems without
> >> backing store.
> >
> > Sure it would. For Smack, it would be the label a file would be
> > created with, which would be the label of the process creating
> > the memory based filesystem. For SELinux the rules are a
> > touch more sophisticated, but I'm sure that Paul or Stephen could
> > come up with how to determine it.
> >
> > The point, looping all the way back to the beginning, where we
> > were talking about just ignoring the labels on the filesystem,
> > is that if you use the same Smack label on the files in the
> > filesystem as the backing store file has, we'll all be happy.
> > If that label isn't something user can write to, he won't be
> > able to write to the mounted objects, either. If there is no
> > backing store then use the label of the process creating the
> > filesystem, which will be the user, which will mean everything
> > will work hunky dory.
> >
> > Yes, there's work involved, but I doubt there's a lot. Getting
> > the label from the backing store or the creating process is
> > simple enough.
> >

So something like the diff below (untested)? All I'm really doing is
setting smk_default as you describe above and then using it instead of
smk_of_current() in smack_inode_alloc_security() and instead of the
label from the disk in smack_d_instantiate(). Since a user currently
needs CAP_MAC_ADMIN in init_user_ns to store security labels it looks
like this should be sufficient. I'm not even sure that the
inode_alloc_security hook changes are needed.

We could allow privileged users in s_user_ns to write security labels
to disk since they already control the backing store, as long as Smack
didn't subsequently import them. I didn't do that here.

> So what if Smack used the label of the user creating the filesystem
> even for filesystems with backing store? IMO this ought to be doable
> with the LSM hooks -- it certainly seems reasonable for the LSM to be
> aware of who created a filesystem. In fact, I'd argue that if Smack
> can't do this with the proposed LSM hooks, then the hooks are
> insufficient.

It would be very simple to use the label of the task instead.

Seth

---

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 32f598db0b0d..4597420ab933 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1486,6 +1486,10 @@ static inline void sb_start_intwrite(struct super_block *sb)
 	__sb_start_write(sb, SB_FREEZE_FS, true);
 }
 
+static inline bool sb_in_userns(struct super_block *sb)
+{
+	return sb->s_user_ns != &init_user_ns;
+}
 
 extern bool inode_owner_or_capable(const struct inode *inode);
 
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index a143328f75eb..591fd19294e7 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -255,6 +255,10 @@ static struct smack_known *smk_fetch(const char *name, struct inode *ip,
 	char *buffer;
 	struct smack_known *skp = NULL;
 
+	/* Should never fetch xattrs from untrusted mounts */
+	if (WARN_ON(sb_in_userns(ip->i_sb)))
+		return ERR_PTR(-EPERM);
+
 	if (ip->i_op->getxattr == NULL)
 		return ERR_PTR(-EOPNOTSUPP);
 
@@ -656,10 +660,14 @@ static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data)
 		 */
 		if (specified)
 			return -EPERM;
+
 		/*
-		 * Unprivileged mounts get root and default from the caller.
+		 * User namespace mounts get root and default from the backing
+		 * store, if there is one. Other unprivileged mounts get them
+		 * from the caller.
 		 */
-		skp = smk_of_current();
+		skp = (sb_in_userns(sb) && sb->s_bdev) ?
+			smk_of_inode(sb->s_bdev->bd_inode) : smk_of_current();
 		sp->smk_root = skp;
 		sp->smk_default = skp;
 	}
@@ -792,7 +800,12 @@ static int smack_bprm_secureexec(struct linux_binprm *bprm)
  */
 static int smack_inode_alloc_security(struct inode *inode)
 {
-	struct smack_known *skp = smk_of_current();
+	struct smack_known *skp;
+
+	if (sb_in_userns(inode->i_sb))
+		skp = ((struct superblock_smack *)(inode->i_sb->s_security))->smk_default;
+	else
+		skp = smk_of_current();
 
 	inode->i_security = new_inode_smack(skp);
 	if (inode->i_security == NULL)
@@ -3175,6 +3188,11 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
 			break;
 		}
 		/*
+		 * Don't use labels from xattrs for unprivileged mounts.
+		 */
+		if (sb_in_userns(inode->i_sb))
+			break;
+		/*
 		 * No xattr support means, alas, no SMACK label.
 		 * Use the aforeapplied default.
 		 * It would be curious if the label of the task

^ permalink raw reply related [flat|nested] 138+ messages in thread
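The label selection that the untested diff above aims for can be exercised outside the kernel with a small model. The sketch below is an illustration only: `model_sb`, `model_kern_mount()` and `model_instantiate()` are hypothetical stand-ins, plain strings replace `struct smack_known`, and the non-userns branch is simplified to "use the xattr label if there is one", which glosses over the privileged mount options the real code handles.

```c
/*
 * Userspace model of the label choices in the diff. All names here
 * are assumptions for illustration; this is not kernel code.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct model_sb {
	bool in_userns;			/* models sb->s_user_ns != &init_user_ns */
	const char *bdev_label;		/* label of the backing device, or NULL */
	const char *smk_default;	/* models superblock_smack.smk_default */
};

/*
 * Models the smack_sb_kern_mount() hunk: a user namespace mount with
 * a backing device takes the device's label, any other unprivileged
 * mount takes the label of the mounting task.
 */
static void model_kern_mount(struct model_sb *sb, const char *task_label)
{
	assert(sb != NULL && task_label != NULL);

	sb->smk_default = (sb->in_userns && sb->bdev_label) ?
		sb->bdev_label : task_label;
}

/*
 * Models the smack_d_instantiate() hunk: labels stored in xattrs are
 * never used for user namespace mounts; the mount default applies.
 */
static const char *model_instantiate(const struct model_sb *sb,
				     const char *xattr_label)
{
	if (sb->in_userns || xattr_label == NULL)
		return sb->smk_default;
	return xattr_label;
}
```

With a userns mount backed by a device labeled "Loopback", a file whose xattr says "Secret" comes out as "Loopback"; with no backing device it comes out with the mounting task's label.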
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-21 20:35 ` Seth Forshee @ 2015-07-22 1:52 ` Casey Schaufler -1 siblings, 0 replies; 138+ messages in thread From: Casey Schaufler @ 2015-07-22 1:52 UTC (permalink / raw) To: Seth Forshee, Andy Lutomirski Cc: Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel, Casey Schaufler On 7/21/2015 1:35 PM, Seth Forshee wrote: > On Thu, Jul 16, 2015 at 05:59:22PM -0700, Andy Lutomirski wrote: >> On Thu, Jul 16, 2015 at 5:45 PM, Casey Schaufler <casey@schaufler-ca.com> wrote: >>> On 7/16/2015 4:29 PM, Andy Lutomirski wrote: >>>> I really don't see the benefit of making up extra rules that apply to >>>> users outside a userns who try to access specifically a filesystem >>>> with backing store. They wouldn't make sense for filesystems without >>>> backing store. >>> Sure it would. For Smack, it would be the label a file would be >>> created with, which would be the label of the process creating >>> the memory based filesystem. For SELinux the rules are more a >>> touch more sophisticated, but I'm sure that Paul or Stephen could >>> come up with how to determine it. >>> >>> The point, looping all the way back to the beginning, where we >>> were talking about just ignoring the labels on the filesystem, >>> is that if you use the same Smack label on the files in the >>> filesystem as the backing store file has, we'll all be happy. >>> If that label isn't something user can write to, he won't be >>> able to write to the mounted objects, either. If there is no >>> backing store then use the label of the process creating the >>> filesystem, which will be the user, which will mean everything >>> will work hunky dory. >>> >>> Yes, there's work involved, but I doubt there's a lot. Getting >>> the label from the backing store or the creating process is >>> simple enough. >>> > So something like the diff below (untested)? 
I think that this is close, and quite good for someone who isn't very familiar with Smack. It's definitely headed in the right direction. > All I'm really doing is setting smk_default as you describe above and > then using it instead of smk_of_current() in > smack_inode_alloc_security() and instead of the label from the disk in > smack_d_instantiate(). Let's say your backing store is a file labeled Rubble. mount -o smackfsroot=Rubble,smackfsdef=Rubble ... It is completely reasonable for a process labeled Flintstone to have rwxa access to a file labeled Rubble. Smack rule: Flintstone Rubble rwxa In the case of writing to an existing Rubble file, what you have looks fine. What's not so great is that if the Flintstone process creates a file, it should be labeled Flintstone. Your use of the smk_default, which is going to violate the principle of least astonishment, and break the Smack policy as well. Let's make a minor change. Instead of using smackfsroot let's use smackfstransmute and a slightly different access rule: mount -o smackfstransmute=Rubble,smackfsdef=Rubble ... Smack rule: Flintstone Rubble rwxat Now the only change we have to make to the Smack code is that we don't want to create any files unless either the process is labeled Rubble or the rule allowing the creation has the "t" for transmute access. That should ensure that everything is labeled Rubble. If it isn't, someone has mucked with the metadata in a detectable way. > Since a user currently needs CAP_MAC_ADMIN in > init_user_ns to store security labels it looks like this should be > sufficient. I'm not even sure that the inode_alloc_security hook changes > are needed. > > We could allow privileged users in s_user_ns to write security labels to > disk since they already control the backing store, as long as Smack > didn't subsequently import them. I didn't do that here. > >> So what if Smack used the label of the user creating the filesystem >> even for filesystems with backing store? 
IMO this ought to be doable >> with the LSM hooks -- it certainly seems reasonable for the LSM to be >> aware of who created a filesystem. In fact, I'd argue that if Smack >> can't do this with the proposed LSM hooks, then the hooks are >> insufficient. > It would be very simple to use the label of the task instead. > > Seth > > --- > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 32f598db0b0d..4597420ab933 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -1486,6 +1486,10 @@ static inline void sb_start_intwrite(struct super_block *sb) > __sb_start_write(sb, SB_FREEZE_FS, true); > } > > +static inline bool sb_in_userns(struct super_block *sb) > +{ > + return sb->s_user_ns != &init_user_ns; > +} > > extern bool inode_owner_or_capable(const struct inode *inode); > > diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c > index a143328f75eb..591fd19294e7 100644 > --- a/security/smack/smack_lsm.c > +++ b/security/smack/smack_lsm.c > @@ -255,6 +255,10 @@ static struct smack_known *smk_fetch(const char *name, struct inode *ip, > char *buffer; > struct smack_known *skp = NULL; > > + /* Should never fetch xattrs from untrusted mounts */ > + if (WARN_ON(sb_in_userns(ip->i_sb))) > + return ERR_PTR(-EPERM); > + Go ahead and fetch it, we'll check to make sure it's viable later. > if (ip->i_op->getxattr == NULL) > return ERR_PTR(-EOPNOTSUPP); > > @@ -656,10 +660,14 @@ static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data) > */ > if (specified) > return -EPERM; > + > /* > - * Unprivileged mounts get root and default from the caller. > + * User namespace mounts get root and default from the backing > + * store, if there is one. Other unprivileged mounts get them > + * from the caller. > */ > - skp = smk_of_current(); > + skp = (sb_in_userns(sb) && sb->s_bdev) ? 
> + smk_of_inode(sb->s_bdev->bd_inode) : smk_of_current(); > sp->smk_root = skp; > sp->smk_default = skp; sp->smk_flags |= SMK_INODE_TRANSMUTE; > } > @@ -792,7 +800,12 @@ static int smack_bprm_secureexec(struct linux_binprm *bprm) > */ > static int smack_inode_alloc_security(struct inode *inode) > { > - struct smack_known *skp = smk_of_current(); > + struct smack_known *skp; > + > + if (sb_in_userns(inode->i_sb)) > + skp = ((struct superblock_smack *)(inode->i_sb->s_security))->smk_default; > + else > + skp = smk_of_current(); This should be left alone. smack_inode_init_security is where you could disallow access that doesn't legitimately result in a Rubble label on the file. It's something like ... after the call may = smk_access_entry(...) if (sb_in_userns(inode->i_sb)) if (skp != dsp && (may & MAY_TRANSMUTE) == 0) return -EACCES; > inode->i_security = new_inode_smack(skp); > if (inode->i_security == NULL) > @@ -3175,6 +3188,11 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode) > break; > } > /* > + * Don't use labels from xattrs for unprivileged mounts. > + */ > + if (sb_in_userns(inode->i_sb)) > + break; > + /* Again, use the label. Just check to make sure it's what you expect. > * No xattr support means, alas, no SMACK label. > * Use the aforeapplied default. > * It would be curious if the label of the task Also untested. > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-22 1:52 ` Casey Schaufler @ 2015-07-22 15:56 ` Seth Forshee -1 siblings, 0 replies; 138+ messages in thread From: Seth Forshee @ 2015-07-22 15:56 UTC (permalink / raw) To: Casey Schaufler Cc: Andy Lutomirski, Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel On Tue, Jul 21, 2015 at 06:52:31PM -0700, Casey Schaufler wrote: > On 7/21/2015 1:35 PM, Seth Forshee wrote: > > On Thu, Jul 16, 2015 at 05:59:22PM -0700, Andy Lutomirski wrote: > >> On Thu, Jul 16, 2015 at 5:45 PM, Casey Schaufler <casey@schaufler-ca.com> wrote: > >>> On 7/16/2015 4:29 PM, Andy Lutomirski wrote: > >>>> I really don't see the benefit of making up extra rules that apply to > >>>> users outside a userns who try to access specifically a filesystem > >>>> with backing store. They wouldn't make sense for filesystems without > >>>> backing store. > >>> Sure it would. For Smack, it would be the label a file would be > >>> created with, which would be the label of the process creating > >>> the memory based filesystem. For SELinux the rules are more a > >>> touch more sophisticated, but I'm sure that Paul or Stephen could > >>> come up with how to determine it. > >>> > >>> The point, looping all the way back to the beginning, where we > >>> were talking about just ignoring the labels on the filesystem, > >>> is that if you use the same Smack label on the files in the > >>> filesystem as the backing store file has, we'll all be happy. > >>> If that label isn't something user can write to, he won't be > >>> able to write to the mounted objects, either. If there is no > >>> backing store then use the label of the process creating the > >>> filesystem, which will be the user, which will mean everything > >>> will work hunky dory. > >>> > >>> Yes, there's work involved, but I doubt there's a lot. 
Getting > >>> the label from the backing store or the creating process is > >>> simple enough. > >>> > > So something like the diff below (untested)? > > I think that this is close, and quite good for someone > who isn't very familiar with Smack. It's definitely headed > in the right direction. > > > All I'm really doing is setting smk_default as you describe above and > > then using it instead of smk_of_current() in > > smack_inode_alloc_security() and instead of the label from the disk in > > smack_d_instantiate(). > > Let's say your backing store is a file labeled Rubble. > > mount -o smackfsroot=Rubble,smackfsdef=Rubble ... > > It is completely reasonable for a process labeled Flintstone to > have rwxa access to a file labeled Rubble. > > Smack rule: Flintstone Rubble rwxa > > In the case of writing to an existing Rubble file, what you > have looks fine. What's not so great is that if the Flintstone > process creates a file, it should be labeled Flintstone. Your > use of the smk_default, which is going to violate the principle > of least astonishment, and break the Smack policy as well. > > Let's make a minor change. Instead of using smackfsroot let's > use smackfstransmute and a slightly different access rule: > > mount -o smackfstransmute=Rubble,smackfsdef=Rubble ... > > Smack rule: Flintstone Rubble rwxat > > Now the only change we have to make to the Smack code is > that we don't want to create any files unless either the > process is labeled Rubble or the rule allowing the creation > has the "t" for transmute access. That should ensure that > everything is labeled Rubble. If it isn't, someone has mucked > with the metadata in a detectable way. All right, that kind of makes sense, but I'm still missing some pieces. Questions follow. 
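The creation rule proposed in the quoted text above (only create a file in a user namespace mount if the task already carries the mount's label, or if the matching rule grants "t") can be sketched in userspace C. This is only an illustration under stated assumptions: userns_create_check() is a hypothetical helper and not a kernel function, labels are compared as plain strings, and the MAY_TRANSMUTE value is assumed here rather than taken from the patch. The may > 0 guard is an addition of this sketch, so that a negative "no rule found" result is not read as a set of access bits.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Assumed value for illustration; the kernel keeps its own definition. */
#define MAY_TRANSMUTE 0x1000

/*
 * Hypothetical helper, not part of Smack.
 *
 * in_userns:  whether the superblock belongs to a non-init user namespace
 * task_label: label of the creating process (e.g. "Flintstone")
 * dir_label:  label of the parent directory (e.g. "Rubble")
 * may:        access bits from the matching rule, or a negative errno
 *             when no rule matched
 *
 * Returns 0 when creation may proceed, -EACCES otherwise.
 */
int userns_create_check(bool in_userns, const char *task_label,
			const char *dir_label, int may)
{
	assert(task_label != NULL && dir_label != NULL);

	/* Mounts owned by the initial namespace are not restricted here. */
	if (!in_userns)
		return 0;

	/* A task that already carries the mount's label creates matching files. */
	if (strcmp(task_label, dir_label) == 0)
		return 0;

	/* Otherwise the rule must exist and must grant transmute ("t"). */
	if (may > 0 && (may & MAY_TRANSMUTE) != 0)
		return 0;

	return -EACCES;
}
```

With the rule "Flintstone Rubble rwxat" the check passes and the new file would end up labeled Rubble; with "Flintstone Rubble rwxa" it returns -EACCES, which is the behavior the quoted proposal asks for.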
> > diff --git a/include/linux/fs.h b/include/linux/fs.h > > index 32f598db0b0d..4597420ab933 100644 > > --- a/include/linux/fs.h > > +++ b/include/linux/fs.h > > @@ -1486,6 +1486,10 @@ static inline void sb_start_intwrite(struct super_block *sb) > > __sb_start_write(sb, SB_FREEZE_FS, true); > > } > > > > +static inline bool sb_in_userns(struct super_block *sb) > > +{ > > + return sb->s_user_ns != &init_user_ns; > > +} > > > > extern bool inode_owner_or_capable(const struct inode *inode); > > > > diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c > > index a143328f75eb..591fd19294e7 100644 > > --- a/security/smack/smack_lsm.c > > +++ b/security/smack/smack_lsm.c > > @@ -255,6 +255,10 @@ static struct smack_known *smk_fetch(const char *name, struct inode *ip, > > char *buffer; > > struct smack_known *skp = NULL; > > > > + /* Should never fetch xattrs from untrusted mounts */ > > + if (WARN_ON(sb_in_userns(ip->i_sb))) > > + return ERR_PTR(-EPERM); > > + > > Go ahead and fetch it, we'll check to make sure it's viable later. > > > if (ip->i_op->getxattr == NULL) > > return ERR_PTR(-EOPNOTSUPP); > > > > @@ -656,10 +660,14 @@ static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data) > > */ > > if (specified) > > return -EPERM; > > + > > /* > > - * Unprivileged mounts get root and default from the caller. > > + * User namespace mounts get root and default from the backing > > + * store, if there is one. Other unprivileged mounts get them > > + * from the caller. > > */ > > - skp = smk_of_current(); > > + skp = (sb_in_userns(sb) && sb->s_bdev) ? > > + smk_of_inode(sb->s_bdev->bd_inode) : smk_of_current(); > > sp->smk_root = skp; > > sp->smk_default = skp; > > sp->smk_flags |= SMK_INODE_TRANSMUTE; I assume that you meant skp and not sp here. 
> > > } > > @@ -792,7 +800,12 @@ static int smack_bprm_secureexec(struct linux_binprm *bprm) > > */ > > static int smack_inode_alloc_security(struct inode *inode) > > { > > - struct smack_known *skp = smk_of_current(); > > + struct smack_known *skp; > > + > > + if (sb_in_userns(inode->i_sb)) > > + skp = ((struct superblock_smack *)(inode->i_sb->s_security))->smk_default; > > + else > > + skp = smk_of_current(); > > This should be left alone. > smack_inode_init_security is where you could disallow access that doesn't > legitimately result in a Rubble label on the file. It's something like > > ... after the call may = smk_access_entry(...) > if (sb_in_userns(inode->i_sb)) > if (skp != dsp && (may & MAY_TRANSMUTE) == 0) > return -EACCES; I'm not getting how this covers all cases. So we've set the transmute flag on the root inode. Files and directories created in the root directory get the same label, and directories also get the transmute attribute. That's all fine. What about an existing directory in the filesystem that already has a Slate label? I'm not getting what happens with this directory, or for new files created in this directory, which also relates to my other questions below. Also an aside - smk_access_entry looks weird. may is initialized to -ENOENT, and then rule_list is searched for a rule which matches the object and subject labels. Presumably it's possible that no rule could be found, otherwise the prior initialization of may is pointless. If this happens the following code treats it as though it always contains access flags even though it might contain -ENOENT. Nothing bad actually happens with a two's complement representation of -ENOENT since it will just set a bit that's already set, but it still seems like it should have a may > 0 condition, for clarity if for no other reason.
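The aside about -ENOENT can be checked with a small userspace sketch. It assumes, as the paragraph above implies, that the code after the rule search is a "write implies lock" step that ORs a lock bit into may when the write bit is set. The function names are hypothetical and the MAY_WRITE and MAY_LOCK values are assumptions made for illustration, not copied from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Assumed bit values for illustration only. */
#define MAY_WRITE 0x0002
#define MAY_LOCK  0x2000

/* Unguarded form: treats may as access bits even when it holds -ENOENT. */
int write_implies_lock(int may)
{
	if ((may & MAY_WRITE) == MAY_WRITE)
		may |= MAY_LOCK;
	return may;
}

/* Guarded form suggested in the text: only touch genuine access bits. */
int write_implies_lock_guarded(int may)
{
	if (may > 0 && (may & MAY_WRITE) == MAY_WRITE)
		may |= MAY_LOCK;
	return may;
}

/* Exercises both forms; the asserts hold where ENOENT is 2, as on Linux. */
void enoent_demo(void)
{
	/* -2 already has both bits set, so the OR changes nothing. */
	assert(write_implies_lock(-ENOENT) == -ENOENT);
	assert(write_implies_lock_guarded(-ENOENT) == -ENOENT);

	/* For a real rule, both forms add the lock bit to write access. */
	assert(write_implies_lock(MAY_WRITE) == (MAY_WRITE | MAY_LOCK));
	assert(write_implies_lock_guarded(MAY_WRITE) == (MAY_WRITE | MAY_LOCK));
}
```

Both forms return the same values for these inputs, which matches the observation that nothing bad happens today; the guard only makes the intent explicit.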
> > > inode->i_security = new_inode_smack(skp); > > if (inode->i_security == NULL) > > @@ -3175,6 +3188,11 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode) > > break; > > } > > /* > > + * Don't use labels from xattrs for unprivileged mounts. > > + */ > > + if (sb_in_userns(inode->i_sb)) > > + break; > > + /* > > Again, use the label. Just check to make sure it's what you expect. What happens if it's not what I expect? smack_d_instantiate cannot fail ... so just use the default label? In that case why bother reading it at all? Or would we actually want to change the on-disk label if it didn't match? > > > * No xattr support means, alas, no SMACK label. > > * Use the aforeapplied default. > > * It would be curious if the label of the task > > Also untested. > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please read the FAQ at http://www.tux.org/lkml/ > > > ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-22 15:56 ` Seth Forshee @ 2015-07-22 18:10 ` Casey Schaufler -1 siblings, 0 replies; 138+ messages in thread From: Casey Schaufler @ 2015-07-22 18:10 UTC (permalink / raw) To: Seth Forshee Cc: Andy Lutomirski, Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel, Casey Schaufler On 7/22/2015 8:56 AM, Seth Forshee wrote: > On Tue, Jul 21, 2015 at 06:52:31PM -0700, Casey Schaufler wrote: >> On 7/21/2015 1:35 PM, Seth Forshee wrote: >>> On Thu, Jul 16, 2015 at 05:59:22PM -0700, Andy Lutomirski wrote: >>>> On Thu, Jul 16, 2015 at 5:45 PM, Casey Schaufler <casey@schaufler-ca.com> wrote: >>>>> On 7/16/2015 4:29 PM, Andy Lutomirski wrote: >>>>>> I really don't see the benefit of making up extra rules that apply to >>>>>> users outside a userns who try to access specifically a filesystem >>>>>> with backing store. They wouldn't make sense for filesystems without >>>>>> backing store. >>>>> Sure it would. For Smack, it would be the label a file would be >>>>> created with, which would be the label of the process creating >>>>> the memory based filesystem. For SELinux the rules are more a >>>>> touch more sophisticated, but I'm sure that Paul or Stephen could >>>>> come up with how to determine it. >>>>> >>>>> The point, looping all the way back to the beginning, where we >>>>> were talking about just ignoring the labels on the filesystem, >>>>> is that if you use the same Smack label on the files in the >>>>> filesystem as the backing store file has, we'll all be happy. >>>>> If that label isn't something user can write to, he won't be >>>>> able to write to the mounted objects, either. If there is no >>>>> backing store then use the label of the process creating the >>>>> filesystem, which will be the user, which will mean everything >>>>> will work hunky dory. >>>>> >>>>> Yes, there's work involved, but I doubt there's a lot. 
Getting >>>>> the label from the backing store or the creating process is >>>>> simple enough. >>>>> >>> So something like the diff below (untested)? >> I think that this is close, and quite good for someone >> who isn't very familiar with Smack. It's definitely headed >> in the right direction. >> >>> All I'm really doing is setting smk_default as you describe above and >>> then using it instead of smk_of_current() in >>> smack_inode_alloc_security() and instead of the label from the disk in >>> smack_d_instantiate(). >> Let's say your backing store is a file labeled Rubble. >> >> mount -o smackfsroot=Rubble,smackfsdef=Rubble ... >> >> It is completely reasonable for a process labeled Flintstone to >> have rwxa access to a file labeled Rubble. >> >> Smack rule: Flintstone Rubble rwxa >> >> In the case of writing to an existing Rubble file, what you >> have looks fine. What's not so great is that if the Flintstone >> process creates a file, it should be labeled Flintstone. Your >> use of the smk_default, which is going to violate the principle >> of least astonishment, and break the Smack policy as well. >> >> Let's make a minor change. Instead of using smackfsroot let's >> use smackfstransmute and a slightly different access rule: >> >> mount -o smackfstransmute=Rubble,smackfsdef=Rubble ... >> >> Smack rule: Flintstone Rubble rwxat >> >> Now the only change we have to make to the Smack code is >> that we don't want to create any files unless either the >> process is labeled Rubble or the rule allowing the creation >> has the "t" for transmute access. That should ensure that >> everything is labeled Rubble. If it isn't, someone has mucked >> with the metadata in a detectable way. > All right, that kind of makes sense, but I'm still missing some pieces. > Questions follow. 
> >>> diff --git a/include/linux/fs.h b/include/linux/fs.h >>> index 32f598db0b0d..4597420ab933 100644 >>> --- a/include/linux/fs.h >>> +++ b/include/linux/fs.h >>> @@ -1486,6 +1486,10 @@ static inline void sb_start_intwrite(struct super_block *sb) >>> __sb_start_write(sb, SB_FREEZE_FS, true); >>> } >>> >>> +static inline bool sb_in_userns(struct super_block *sb) >>> +{ >>> + return sb->s_user_ns != &init_user_ns; >>> +} >>> >>> extern bool inode_owner_or_capable(const struct inode *inode); >>> >>> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c >>> index a143328f75eb..591fd19294e7 100644 >>> --- a/security/smack/smack_lsm.c >>> +++ b/security/smack/smack_lsm.c >>> @@ -255,6 +255,10 @@ static struct smack_known *smk_fetch(const char *name, struct inode *ip, >>> char *buffer; >>> struct smack_known *skp = NULL; >>> >>> + /* Should never fetch xattrs from untrusted mounts */ >>> + if (WARN_ON(sb_in_userns(ip->i_sb))) >>> + return ERR_PTR(-EPERM); >>> + >> Go ahead and fetch it, we'll check to make sure it's viable later. >> >>> if (ip->i_op->getxattr == NULL) >>> return ERR_PTR(-EOPNOTSUPP); >>> >>> @@ -656,10 +660,14 @@ static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data) >>> */ >>> if (specified) >>> return -EPERM; >>> + >>> /* >>> - * Unprivileged mounts get root and default from the caller. >>> + * User namespace mounts get root and default from the backing >>> + * store, if there is one. Other unprivileged mounts get them >>> + * from the caller. >>> */ >>> - skp = smk_of_current(); >>> + skp = (sb_in_userns(sb) && sb->s_bdev) ? >>> + smk_of_inode(sb->s_bdev->bd_inode) : smk_of_current(); >>> sp->smk_root = skp; >>> sp->smk_default = skp; >> sp->smk_flags |= SMK_INODE_TRANSMUTE; > I assume that you meant skp and not sp here. Actually, neither is correct. You want to set SMK_INODE_TRANSMUTE in the smk_flags field of the root inode. 
That's easy:

	transmute = 1;

and the code after "Initialize the root inode" will take care of it.

>>>  	}
>>> @@ -792,7 +800,12 @@ static int smack_bprm_secureexec(struct linux_binprm *bprm)
>>>   */
>>>  static int smack_inode_alloc_security(struct inode *inode)
>>>  {
>>> -	struct smack_known *skp = smk_of_current();
>>> +	struct smack_known *skp;
>>> +
>>> +	if (sb_in_userns(inode->i_sb))
>>> +		skp = ((struct superblock_smack *)(inode->i_sb->s_security))->smk_default;
>>> +	else
>>> +		skp = smk_of_current();
>> This should be left alone.
>> smack_inode_init_security is where you could disallow access that doesn't
>> legitimately result in a Rubble label on the file. It's something like
>>
>> ... after the call may = smk_access_entry(...)
>> 	if (sb_in_userns(inode->i_sb))
>> 		if (skp != dsp && (may & MAY_TRANSMUTE) == 0)
>> 			return -EACCES;
> I'm not getting how this covers all cases.
>
> So we've set the transmute flag on the root inode. Files and directories
> created in the root directory get the same label, and directories also
> get the transmute attribute. That's all fine.
>
> What about an existing directory in the filesystem that already has a
> Slate label? I'm not getting what happens with this directory, or for
> new files created in this directory, which also relates to my other
> questions below.
>
> Also an aside - smk_access_entry looks weird. may is initialized to
> -ENOENT, and then rule_list is searched for a rule which matches the
> object and subject labels. Presumably it's possible that no rule could
> be found, otherwise the prior initialization of may is pointless. If
> this happens the following code treats it as though it always contains
> access flags even though it might contain -ENOENT. Nothing bad actually
> happens with a two's complement representation of -ENOENT since it will
> just set a bit that's already set, but it still seems like it should
> have a may > 0 condition, for clarity if for no other reason.
My suggested code is just wrong. I wasn't looking at the whole code,
only the patch, and got myself confused. Apologies.

If we want to go straight for the jugular how about this? I'm assuming
that inode->i_sb->s_bdev->bd_inode is the inode of the backing store.

static int smack_inode_permission(struct inode *inode, int mask)
{
	struct smk_audit_info ad;
	int no_block = mask & MAY_NOT_BLOCK;
	int rc;

	mask &= (MAY_READ|MAY_WRITE|MAY_EXEC|MAY_APPEND);
	/*
	 * No permission to check. Existence test. Yup, it's there.
	 */
	if (mask == 0)
		return 0;

+	if (sb_in_userns(inode->i_sb) &&
+	    smk_of_inode(inode) != smk_of_inode(inode->i_sb->s_bdev->bd_inode))
+		return -EACCES;
+
	/* May be droppable after audit */
	if (no_block)
		return -ECHILD;
	smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_INODE);
	smk_ad_setfield_u_fs_inode(&ad, inode);
	rc = smk_curacc(smk_of_inode(inode), mask, &ad);
	rc = smk_bu_inode(inode, mask, rc);
	return rc;
}

>
>>>  	inode->i_security = new_inode_smack(skp);
>>>  	if (inode->i_security == NULL)
>>> @@ -3175,6 +3188,11 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
>>>  			break;
>>>  		}
>>>  		/*
>>> +		 * Don't use labels from xattrs for unprivileged mounts.
>>> +		 */
>>> +		if (sb_in_userns(inode->i_sb))
>>> +			break;
>>> +		/*
>> Again, use the label. Just check to make sure it's what you expect.
> What happens if it's not what I expect? smack_d_instantiate cannot fail
> ... so just use the default label? In that case why bother reading it at
> all? Or would we actually want to change the on-disk label if it didn't
> match?
>
>>>  		 * No xattr support means, alas, no SMACK label.
>>>  		 * Use the aforeapplied default.
>>>  		 * It would be curious if the label of the task
>> Also untested.
>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> Please read the FAQ at http://www.tux.org/lkml/ >>> > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 138+ messages in thread
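The check proposed for smack_inode_permission can be modelled outside the kernel. The following is a minimal sketch, not the kernel code: struct label, struct model_sb, struct model_inode and userns_label_check() are hypothetical stand-ins for smack_known, the superblock, the inode and the added test. Labels are compared by pointer, as in the proposal. The NULL test for a missing backing store is an assumption of this sketch; the code in the message dereferences s_bdev unconditionally.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel objects; not the real structures. */
struct label {
	const char *name;
};

struct model_sb {
	bool in_userns;              /* models sb_in_userns(sb) */
	const struct label *backing; /* label of the backing store, or NULL */
};

struct model_inode {
	const struct model_sb *sb;
	const struct label *label;   /* models smk_of_inode(inode) */
};

/*
 * Returns 0 when the ordinary access check should go ahead, or -EACCES
 * when the object sits on a user namespace mount and its label differs
 * from the label of the backing store.
 */
int userns_label_check(const struct model_inode *inode)
{
	const struct model_sb *sb = inode->sb;

	if (!sb->in_userns)
		return 0;
	/* Assumption: without a backing store there is nothing to compare. */
	if (sb->backing == NULL)
		return 0;
	if (inode->label != sb->backing)
		return -EACCES;
	return 0;
}
```

In this model a Slate-labeled directory on a Rubble-backed user namespace mount is refused before any rule lookup, while the same directory on a mount owned by the initial namespace falls through to the normal check.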
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-22 18:10 ` Casey Schaufler @ 2015-07-22 19:32 ` Seth Forshee -1 siblings, 0 replies; 138+ messages in thread From: Seth Forshee @ 2015-07-22 19:32 UTC (permalink / raw) To: Casey Schaufler Cc: Andy Lutomirski, Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel On Wed, Jul 22, 2015 at 11:10:46AM -0700, Casey Schaufler wrote: > On 7/22/2015 8:56 AM, Seth Forshee wrote: > > On Tue, Jul 21, 2015 at 06:52:31PM -0700, Casey Schaufler wrote: > >> On 7/21/2015 1:35 PM, Seth Forshee wrote: > >>> On Thu, Jul 16, 2015 at 05:59:22PM -0700, Andy Lutomirski wrote: > >>>> On Thu, Jul 16, 2015 at 5:45 PM, Casey Schaufler <casey@schaufler-ca.com> wrote: > >>>>> On 7/16/2015 4:29 PM, Andy Lutomirski wrote: > >>>>>> I really don't see the benefit of making up extra rules that apply to > >>>>>> users outside a userns who try to access specifically a filesystem > >>>>>> with backing store. They wouldn't make sense for filesystems without > >>>>>> backing store. > >>>>> Sure it would. For Smack, it would be the label a file would be > >>>>> created with, which would be the label of the process creating > >>>>> the memory based filesystem. For SELinux the rules are more a > >>>>> touch more sophisticated, but I'm sure that Paul or Stephen could > >>>>> come up with how to determine it. > >>>>> > >>>>> The point, looping all the way back to the beginning, where we > >>>>> were talking about just ignoring the labels on the filesystem, > >>>>> is that if you use the same Smack label on the files in the > >>>>> filesystem as the backing store file has, we'll all be happy. > >>>>> If that label isn't something user can write to, he won't be > >>>>> able to write to the mounted objects, either. 
If there is no > >>>>> backing store then use the label of the process creating the > >>>>> filesystem, which will be the user, which will mean everything > >>>>> will work hunky dory. > >>>>> > >>>>> Yes, there's work involved, but I doubt there's a lot. Getting > >>>>> the label from the backing store or the creating process is > >>>>> simple enough. > >>>>> > >>> So something like the diff below (untested)? > >> I think that this is close, and quite good for someone > >> who isn't very familiar with Smack. It's definitely headed > >> in the right direction. > >> > >>> All I'm really doing is setting smk_default as you describe above and > >>> then using it instead of smk_of_current() in > >>> smack_inode_alloc_security() and instead of the label from the disk in > >>> smack_d_instantiate(). > >> Let's say your backing store is a file labeled Rubble. > >> > >> mount -o smackfsroot=Rubble,smackfsdef=Rubble ... > >> > >> It is completely reasonable for a process labeled Flintstone to > >> have rwxa access to a file labeled Rubble. > >> > >> Smack rule: Flintstone Rubble rwxa > >> > >> In the case of writing to an existing Rubble file, what you > >> have looks fine. What's not so great is that if the Flintstone > >> process creates a file, it should be labeled Flintstone. Your > >> use of the smk_default, which is going to violate the principle > >> of least astonishment, and break the Smack policy as well. > >> > >> Let's make a minor change. Instead of using smackfsroot let's > >> use smackfstransmute and a slightly different access rule: > >> > >> mount -o smackfstransmute=Rubble,smackfsdef=Rubble ... > >> > >> Smack rule: Flintstone Rubble rwxat > >> > >> Now the only change we have to make to the Smack code is > >> that we don't want to create any files unless either the > >> process is labeled Rubble or the rule allowing the creation > >> has the "t" for transmute access. That should ensure that > >> everything is labeled Rubble. 
If it isn't, someone has mucked > >> with the metadata in a detectable way. > > All right, that kind of makes sense, but I'm still missing some pieces. > > Questions follow. > > > >>> diff --git a/include/linux/fs.h b/include/linux/fs.h > >>> index 32f598db0b0d..4597420ab933 100644 > >>> --- a/include/linux/fs.h > >>> +++ b/include/linux/fs.h > >>> @@ -1486,6 +1486,10 @@ static inline void sb_start_intwrite(struct super_block *sb) > >>> __sb_start_write(sb, SB_FREEZE_FS, true); > >>> } > >>> > >>> +static inline bool sb_in_userns(struct super_block *sb) > >>> +{ > >>> + return sb->s_user_ns != &init_user_ns; > >>> +} > >>> > >>> extern bool inode_owner_or_capable(const struct inode *inode); > >>> > >>> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c > >>> index a143328f75eb..591fd19294e7 100644 > >>> --- a/security/smack/smack_lsm.c > >>> +++ b/security/smack/smack_lsm.c > >>> @@ -255,6 +255,10 @@ static struct smack_known *smk_fetch(const char *name, struct inode *ip, > >>> char *buffer; > >>> struct smack_known *skp = NULL; > >>> > >>> + /* Should never fetch xattrs from untrusted mounts */ > >>> + if (WARN_ON(sb_in_userns(ip->i_sb))) > >>> + return ERR_PTR(-EPERM); > >>> + > >> Go ahead and fetch it, we'll check to make sure it's viable later. > >> > >>> if (ip->i_op->getxattr == NULL) > >>> return ERR_PTR(-EOPNOTSUPP); > >>> > >>> @@ -656,10 +660,14 @@ static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data) > >>> */ > >>> if (specified) > >>> return -EPERM; > >>> + > >>> /* > >>> - * Unprivileged mounts get root and default from the caller. > >>> + * User namespace mounts get root and default from the backing > >>> + * store, if there is one. Other unprivileged mounts get them > >>> + * from the caller. > >>> */ > >>> - skp = smk_of_current(); > >>> + skp = (sb_in_userns(sb) && sb->s_bdev) ? 
> >>> + smk_of_inode(sb->s_bdev->bd_inode) : smk_of_current(); > >>> sp->smk_root = skp; > >>> sp->smk_default = skp; > >> sp->smk_flags |= SMK_INODE_TRANSMUTE; > > I assume that you meant skp and not sp here. > > Actually, neither is correct. You want to set SMK_INODE_TRANSMUTE > in the smk_flags field of the root inode. That's easy: > > transmute = 1; > > and the code after "Initialize the root inode" will take care of it. Yeah, that's what I've actually done. > >>> } > >>> @@ -792,7 +800,12 @@ static int smack_bprm_secureexec(struct linux_binprm *bprm) > >>> */ > >>> static int smack_inode_alloc_security(struct inode *inode) > >>> { > >>> - struct smack_known *skp = smk_of_current(); > >>> + struct smack_known *skp; > >>> + > >>> + if (sb_in_userns(inode->i_sb)) > >>> + skp = ((struct superblock_smack *)(inode->i_sb->s_security))->smk_default; > >>> + else > >>> + skp = smk_of_current(); > >> This should be left alone. > >> smack_inode_init_security is where you could disallow access that doesn't > >> legitimately result in a Rubble label on the file. It's something like > >> > >> ... after the call may = smk_access_entry(...) > >> if (sb_in_userns(inode->i_sb)) > >> if (skp != dsp && (may & MAY_TRANSMUTE) == 0) > >> return -EACCES; > > I'm not getting how this covers all cases. > > > > So we've set the transmute flag on the root inode. Files and directories > > created in the root directory get the same label, and directories also > > get the transmute attribute. That's all fine. > > > > What about an existing directory in the filesystem that already has a > > Slate label? I'm not getting what happens with this directory, or for > > new files created in this directory, which also relates to my other > > questions below. > > > > Also an aside - smk_access_entry looks weird. may is initialized to > > -ENOENT, and then rule_list is searched for a rule which matches the > > object and subject labels. 
> > Presumably it's possible that no rule could
> > be found, otherwise the prior initialization of may is pointless. If
> > this happens the following code treats it as though it always contains
> > access flags even though it might contain -ENOENT. Nothing bad actually
> > happens with a two's complement representation of -ENOENT since it will
> > just set a bit that's already set, but it still seems like it should
> > have a may > 0 condition, for clarity if for no other reason.
>
> My suggested code is just wrong. I wasn't looking at the whole code,
> only the patch, and got myself confused. Apologies.
>
> If we want to go straight for the jugular how about this? I'm assuming
> that inode->i_sb->s_bdev->bd_inode is the inode of the backing store.

Yes.

> static int smack_inode_permission(struct inode *inode, int mask)
> {
> 	struct smk_audit_info ad;
> 	int no_block = mask & MAY_NOT_BLOCK;
> 	int rc;
>
> 	mask &= (MAY_READ|MAY_WRITE|MAY_EXEC|MAY_APPEND);
> 	/*
> 	 * No permission to check. Existence test. Yup, it's there.
> 	 */
> 	if (mask == 0)
> 		return 0;
>
> +	if (sb_in_userns(inode->i_sb) &&
> +	    smk_of_inode(inode) != smk_of_inode(inode->i_sb->s_bdev->bd_inode))
> +		return -EACCES;
> +
> 	/* May be droppable after audit */
> 	if (no_block)
> 		return -ECHILD;
> 	smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_INODE);
> 	smk_ad_setfield_u_fs_inode(&ad, inode);
> 	rc = smk_curacc(smk_of_inode(inode), mask, &ad);
> 	rc = smk_bu_inode(inode, mask, rc);
> 	return rc;
> }

Hmm, okay. I think I've been a little confused all this time about how
you want to handle these unprivileged mounts. At first I thought you
wanted all objects in the filesystem to get the same label as the
backing store. That's what I tried to implement originally, i.e.
smk_root=smk_default=smk_of_inode(...->bd_inode), then assign every
object (new and existing) smk_default and completely ignore the labels
on disk.

This is what I currently think you want for user ns mounts:

 1. smk_root and smk_default are assigned the label of the backing
    device.
 2. s_root is assigned the transmute property.
 3. For existing files:
    a. Files with the same label as the backing device are accessible.
    b. Files with any other label are not accessible.

If this is right, there are a couple of lingering questions in my mind.

First, what happens with files created in directories with the same
label as the backing device but without the transmute property set? The
inode for the new file will initially be labeled with smk_of_current(),
but then during d_instantiate it will get smk_default and thus end up
with the label we want. So that seems okay.

The second is whether files with the SMACK64EXEC attribute are still a
problem. It seems they are, for files with the same label as the backing
store at least. I think we can simply skip the code that reads out this
xattr and sets smk_task for user ns mounts, or else skip assigning the
label to the new task in bprm_set_creds. The latter seems more
consistent with the approach you've suggested for dealing with labels
from disk.

So I guess all of that seems okay, though perhaps a bit restrictive
given that the user who mounted the filesystem already has full access
to the backing store.

Please let me know whether or not this matches up with what you are
thinking, then I can proceed with the implementation.

Thanks,
Seth

^ permalink raw reply [flat|nested] 138+ messages in thread
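Points 1 and 2 of the summary above amount to a small decision at mount time, which can be sketched in user space. This is an illustration under stated assumptions, not the Smack implementation: struct label, struct mount_labels and pick_mount_labels() are hypothetical names, and setting the transmute property only for user namespace mounts is a reading of the summary rather than something the thread settles.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Smack label; compared by pointer. */
struct label {
	const char *name;
};

/* Hypothetical summary of what the mount hook would decide. */
struct mount_labels {
	const struct label *root; /* models smk_root */
	const struct label *def;  /* models smk_default */
	bool root_transmute;      /* models the transmute property on s_root */
};

/*
 * A user namespace mount with a backing store takes both labels from
 * the backing store. Any other unprivileged mount takes them from the
 * mounting task.
 */
struct mount_labels pick_mount_labels(bool in_userns,
				      const struct label *backing,
				      const struct label *current_task)
{
	struct mount_labels ml;
	const struct label *skp;

	skp = (in_userns && backing != NULL) ? backing : current_task;
	ml.root = skp;
	ml.def = skp;
	/* Assumption: only user namespace mounts get the transmute root. */
	ml.root_transmute = in_userns;
	return ml;
}
```

For a Flintstone task mounting a Rubble-labeled image inside a user namespace, both labels come out as Rubble and the root is marked transmuting, so new objects under it would pick up Rubble when the governing rule carries "t".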
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-22 19:32 ` Seth Forshee
@ 2015-07-23  0:05 ` Casey Schaufler
From: Casey Schaufler @ 2015-07-23  0:05 UTC
To: Seth Forshee
Cc: Andy Lutomirski, Eric W. Biederman, Alexander Viro, Linux FS Devel,
    LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On 7/22/2015 12:32 PM, Seth Forshee wrote:
> On Wed, Jul 22, 2015 at 11:10:46AM -0700, Casey Schaufler wrote:
>> On 7/22/2015 8:56 AM, Seth Forshee wrote:
>>> On Tue, Jul 21, 2015 at 06:52:31PM -0700, Casey Schaufler wrote:
>>>> On 7/21/2015 1:35 PM, Seth Forshee wrote:
>>>>> On Thu, Jul 16, 2015 at 05:59:22PM -0700, Andy Lutomirski wrote:
>>>>>> On Thu, Jul 16, 2015 at 5:45 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>>>>> On 7/16/2015 4:29 PM, Andy Lutomirski wrote:
>>>>>>>> I really don't see the benefit of making up extra rules that apply to
>>>>>>>> users outside a userns who try to access specifically a filesystem
>>>>>>>> with backing store. They wouldn't make sense for filesystems without
>>>>>>>> backing store.
>>>>>>> Sure it would. For Smack, it would be the label a file would be
>>>>>>> created with, which would be the label of the process creating
>>>>>>> the memory based filesystem. For SELinux the rules are a
>>>>>>> touch more sophisticated, but I'm sure that Paul or Stephen could
>>>>>>> come up with how to determine it.
>>>>>>>
>>>>>>> The point, looping all the way back to the beginning, where we
>>>>>>> were talking about just ignoring the labels on the filesystem,
>>>>>>> is that if you use the same Smack label on the files in the
>>>>>>> filesystem as the backing store file has, we'll all be happy.
>>>>>>> If that label isn't something user can write to, he won't be
>>>>>>> able to write to the mounted objects, either. If there is no
>>>>>>> backing store then use the label of the process creating the
>>>>>>> filesystem, which will be the user, which will mean everything
>>>>>>> will work hunky dory.
>>>>>>>
>>>>>>> Yes, there's work involved, but I doubt there's a lot. Getting
>>>>>>> the label from the backing store or the creating process is
>>>>>>> simple enough.
>>>>>>>
>>>>> So something like the diff below (untested)?
>>>> I think that this is close, and quite good for someone
>>>> who isn't very familiar with Smack. It's definitely headed
>>>> in the right direction.
>>>>
>>>>> All I'm really doing is setting smk_default as you describe above and
>>>>> then using it instead of smk_of_current() in
>>>>> smack_inode_alloc_security() and instead of the label from the disk in
>>>>> smack_d_instantiate().
>>>> Let's say your backing store is a file labeled Rubble.
>>>>
>>>> 	mount -o smackfsroot=Rubble,smackfsdef=Rubble ...
>>>>
>>>> It is completely reasonable for a process labeled Flintstone to
>>>> have rwxa access to a file labeled Rubble.
>>>>
>>>> 	Smack rule: Flintstone Rubble rwxa
>>>>
>>>> In the case of writing to an existing Rubble file, what you
>>>> have looks fine. What's not so great is that if the Flintstone
>>>> process creates a file, it should be labeled Flintstone. Your
>>>> use of the smk_default is going to violate the principle
>>>> of least astonishment, and break the Smack policy as well.
>>>>
>>>> Let's make a minor change. Instead of using smackfsroot let's
>>>> use smackfstransmute and a slightly different access rule:
>>>>
>>>> 	mount -o smackfstransmute=Rubble,smackfsdef=Rubble ...
>>>>
>>>> 	Smack rule: Flintstone Rubble rwxat
>>>>
>>>> Now the only change we have to make to the Smack code is
>>>> that we don't want to create any files unless either the
>>>> process is labeled Rubble or the rule allowing the creation
>>>> has the "t" for transmute access. That should ensure that
>>>> everything is labeled Rubble. If it isn't, someone has mucked
>>>> with the metadata in a detectable way.
>>> All right, that kind of makes sense, but I'm still missing some pieces.
>>> Questions follow.
>>>
>>>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>>>> index 32f598db0b0d..4597420ab933 100644
>>>>> --- a/include/linux/fs.h
>>>>> +++ b/include/linux/fs.h
>>>>> @@ -1486,6 +1486,10 @@ static inline void sb_start_intwrite(struct super_block *sb)
>>>>>  	__sb_start_write(sb, SB_FREEZE_FS, true);
>>>>>  }
>>>>>
>>>>> +static inline bool sb_in_userns(struct super_block *sb)
>>>>> +{
>>>>> +	return sb->s_user_ns != &init_user_ns;
>>>>> +}
>>>>>
>>>>>  extern bool inode_owner_or_capable(const struct inode *inode);
>>>>>
>>>>> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
>>>>> index a143328f75eb..591fd19294e7 100644
>>>>> --- a/security/smack/smack_lsm.c
>>>>> +++ b/security/smack/smack_lsm.c
>>>>> @@ -255,6 +255,10 @@ static struct smack_known *smk_fetch(const char *name, struct inode *ip,
>>>>>  	char *buffer;
>>>>>  	struct smack_known *skp = NULL;
>>>>>
>>>>> +	/* Should never fetch xattrs from untrusted mounts */
>>>>> +	if (WARN_ON(sb_in_userns(ip->i_sb)))
>>>>> +		return ERR_PTR(-EPERM);
>>>>> +
>>>> Go ahead and fetch it, we'll check to make sure it's viable later.
>>>>
>>>>>  	if (ip->i_op->getxattr == NULL)
>>>>>  		return ERR_PTR(-EOPNOTSUPP);
>>>>>
>>>>> @@ -656,10 +660,14 @@ static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data)
>>>>>  		 */
>>>>>  		if (specified)
>>>>>  			return -EPERM;
>>>>> +
>>>>>  		/*
>>>>> -		 * Unprivileged mounts get root and default from the caller.
>>>>> +		 * User namespace mounts get root and default from the backing
>>>>> +		 * store, if there is one. Other unprivileged mounts get them
>>>>> +		 * from the caller.
>>>>>  		 */
>>>>> -		skp = smk_of_current();
>>>>> +		skp = (sb_in_userns(sb) && sb->s_bdev) ?
>>>>> +			smk_of_inode(sb->s_bdev->bd_inode) : smk_of_current();
>>>>>  		sp->smk_root = skp;
>>>>>  		sp->smk_default = skp;
>>>> 		sp->smk_flags |= SMK_INODE_TRANSMUTE;
>>> I assume that you meant skp and not sp here.
>> Actually, neither is correct. You want to set SMK_INODE_TRANSMUTE
>> in the smk_flags field of the root inode. That's easy:
>>
>> 	transmute = 1;
>>
>> and the code after "Initialize the root inode" will take care of it.
> Yeah, that's what I've actually done.
>
>>>>>  	}
>>>>> @@ -792,7 +800,12 @@ static int smack_bprm_secureexec(struct linux_binprm *bprm)
>>>>>   */
>>>>>  static int smack_inode_alloc_security(struct inode *inode)
>>>>>  {
>>>>> -	struct smack_known *skp = smk_of_current();
>>>>> +	struct smack_known *skp;
>>>>> +
>>>>> +	if (sb_in_userns(inode->i_sb))
>>>>> +		skp = ((struct superblock_smack *)(inode->i_sb->s_security))->smk_default;
>>>>> +	else
>>>>> +		skp = smk_of_current();
>>>> This should be left alone.
>>>> smack_inode_init_security is where you could disallow access that doesn't
>>>> legitimately result in a Rubble label on the file. It's something like
>>>>
>>>> 	... after the call may = smk_access_entry(...)
>>>> 	if (sb_in_userns(inode->i_sb))
>>>> 		if (skp != dsp && (may & MAY_TRANSMUTE) == 0)
>>>> 			return -EACCES;
>>> I'm not getting how this covers all cases.
>>>
>>> So we've set the transmute flag on the root inode. Files and directories
>>> created in the root directory get the same label, and directories also
>>> get the transmute attribute. That's all fine.
>>>
>>> What about an existing directory in the filesystem that already has a
>>> Slate label? I'm not getting what happens with this directory, or for
>>> new files created in this directory, which also relates to my other
>>> questions below.
>>>
>>> Also an aside - smk_access_entry looks weird. may is initialized to
>>> -ENOENT, and then rule_list is searched for a rule which matches the
>>> object and subject labels.
Presumably it's possible that no rule could
>>> be found, otherwise the prior initialization of may is pointless. If
>>> this happens the following code treats it as though it always contains
>>> access flags even though it might contain -ENOENT. Nothing bad actually
>>> happens with a two's complement representation of -ENOENT since it will
>>> just set a bit that's already set, but it still seems like it should
>>> have a may > 0 condition, for clarity if for no other reason.
>> My suggested code is just wrong. I wasn't looking at the whole code,
>> only the patch, and got myself confused. Apologies.
>>
>> If we want to go straight for the jugular how about this? I'm assuming
>> that inode->i_sb->s_bdev->bd_inode is the inode of the backing store.
> Yes.
>
>> static int smack_inode_permission(struct inode *inode, int mask)
>> {
>> 	struct smk_audit_info ad;
>> 	int no_block = mask & MAY_NOT_BLOCK;
>> 	int rc;
>>
>> 	mask &= (MAY_READ|MAY_WRITE|MAY_EXEC|MAY_APPEND);
>> 	/*
>> 	 * No permission to check. Existence test. Yup, it's there.
>> 	 */
>> 	if (mask == 0)
>> 		return 0;
>>
>> +	if (sb_in_userns(inode->i_sb) &&
>> +	    smk_of_inode(inode) != smk_of_inode(inode->i_sb->s_bdev->bd_inode))
>> +		return -EACCES;
>> +
>> 	/* May be droppable after audit */
>> 	if (no_block)
>> 		return -ECHILD;
>> 	smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_INODE);
>> 	smk_ad_setfield_u_fs_inode(&ad, inode);
>> 	rc = smk_curacc(smk_of_inode(inode), mask, &ad);
>> 	rc = smk_bu_inode(inode, mask, rc);
>> 	return rc;
>> }
> Hmm, okay. I think I've been a little confused all this time about how
> you want to handle these unprivileged mounts.

Not your problem. I'm not the most consistent of reviewers.

> Originally I thought you wanted all objects in the filesystem to get the
> same label as the backing store. That's what I tried to implement
> originally, i.e. smk_root=smk_default=smk_of_inode(...->bd_inode), then
> assign every object (new and existing) smk_default and completely ignore
> the labels on disk.

I want everything to have the label of the backing store, but
I don't want to ignore it if it somehow got something else.
Because the only legitimate label for this example is Rubble,
I want to reject anything else that appears. If someone builds
a filesystem by hand with Slate labels I want it treated "safely".

> This is what I currently think you want for user ns mounts:
>
> 1. smk_root and smk_default are assigned the label of the backing
>    device.
> 2. s_root is assigned the transmute property.
> 3. For existing files:
>    a. Files with the same label as the backing device are accessible.
>    b. Files with any other label are not accessible.

That's right. Accept correct data, reject anything that's not right.

> If this is right, there are a couple lingering questions in my mind.
>
> First, what happens with files created in directories with the same
> label as the backing device but without the transmute property set? The
> inode for the new file will initially be labeled with smk_of_current(),
> but then during d_instantiate it will get smk_default and thus end up
> with the label we want. So that seems okay.

Yes.

> The second is whether files with the SMACK64EXEC attribute are still a
> problem. It seems they are, for files with the same label as the backing
> store at least. I think we can simply skip the code that reads out this
> xattr and sets smk_task for user ns mounts, or else skip assigning the
> label to the new task in bprm_set_creds. The latter seems more
> consistent with the approach you've suggested for dealing with labels
> from disk.

Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...)
in smack_d_instantiate for unprivileged mounts would do the trick.

> So I guess all of that seems okay, though perhaps a bit restrictive
> given that the user who mounted the filesystem already has full access
> to the backing store.

In truth, there is no reason to expect that the "user" who did the
mount will ever have a Smack label that differs from the label of
the backing store. If what we've got here seems restrictive, it's
because you've got access from someone other than the "user".

> Please let me know whether or not this matches up with what you are
> thinking, then I can proceed with the implementation.

My current mindset is that, if you're going to allow unprivileged
mounts of user defined backing stores, this is as safe as we can make it.

>
> Thanks,
> Seth
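
The SMACK64EXEC handling agreed on above can be sketched as a small userspace model. It is an illustration only: struct model_file and model_exec_label() are hypothetical names, labels are plain strings, and the real change would live in smack_d_instantiate (skipping the xattr fetch) rather than in a helper like this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an executable file; not a kernel type. */
struct model_file {
	bool sb_in_userns;      /* file lives on a user namespace owned mount */
	const char *exec_label; /* SMACK64EXEC value on disk, or NULL if unset */
};

/*
 * Label the task would run with after exec in this model.
 * On a user ns mount the SMACK64EXEC value is never consulted, which
 * models skipping the fetch of that xattr, so the task keeps the
 * label it already had. On other mounts a SMACK64EXEC value, when
 * present, replaces the current label.
 */
const char *model_exec_label(const char *current_label,
			     const struct model_file *file)
{
	if (file->sb_in_userns || file->exec_label == NULL)
		return current_label;
	return file->exec_label;
}
```

A Flintstone task that executes a file carrying SMACK64EXEC=Slate from a user ns mount stays Flintstone in this model, so a hand-built image cannot hand out a label its creator does not already hold.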
* Re: [PATCH 0/7] Initial support for user namespace owned mounts @ 2015-07-23 0:05 ` Casey Schaufler 0 siblings, 0 replies; 138+ messages in thread From: Casey Schaufler @ 2015-07-23 0:05 UTC (permalink / raw) To: Seth Forshee Cc: Serge Hallyn, linux-kernel, Andy Lutomirski, LSM List, SELinux-NSA, Linux FS Devel, Alexander Viro On 7/22/2015 12:32 PM, Seth Forshee wrote: > On Wed, Jul 22, 2015 at 11:10:46AM -0700, Casey Schaufler wrote: >> On 7/22/2015 8:56 AM, Seth Forshee wrote: >>> On Tue, Jul 21, 2015 at 06:52:31PM -0700, Casey Schaufler wrote: >>>> On 7/21/2015 1:35 PM, Seth Forshee wrote: >>>>> On Thu, Jul 16, 2015 at 05:59:22PM -0700, Andy Lutomirski wrote: >>>>>> On Thu, Jul 16, 2015 at 5:45 PM, Casey Schaufler <casey@schaufler-ca.com> wrote: >>>>>>> On 7/16/2015 4:29 PM, Andy Lutomirski wrote: >>>>>>>> I really don't see the benefit of making up extra rules that apply to >>>>>>>> users outside a userns who try to access specifically a filesystem >>>>>>>> with backing store. They wouldn't make sense for filesystems without >>>>>>>> backing store. >>>>>>> Sure it would. For Smack, it would be the label a file would be >>>>>>> created with, which would be the label of the process creating >>>>>>> the memory based filesystem. For SELinux the rules are more a >>>>>>> touch more sophisticated, but I'm sure that Paul or Stephen could >>>>>>> come up with how to determine it. >>>>>>> >>>>>>> The point, looping all the way back to the beginning, where we >>>>>>> were talking about just ignoring the labels on the filesystem, >>>>>>> is that if you use the same Smack label on the files in the >>>>>>> filesystem as the backing store file has, we'll all be happy. >>>>>>> If that label isn't something user can write to, he won't be >>>>>>> able to write to the mounted objects, either. If there is no >>>>>>> backing store then use the label of the process creating the >>>>>>> filesystem, which will be the user, which will mean everything >>>>>>> will work hunky dory. 
>>>>>>> >>>>>>> Yes, there's work involved, but I doubt there's a lot. Getting >>>>>>> the label from the backing store or the creating process is >>>>>>> simple enough. >>>>>>> >>>>> So something like the diff below (untested)? >>>> I think that this is close, and quite good for someone >>>> who isn't very familiar with Smack. It's definitely headed >>>> in the right direction. >>>> >>>>> All I'm really doing is setting smk_default as you describe above and >>>>> then using it instead of smk_of_current() in >>>>> smack_inode_alloc_security() and instead of the label from the disk in >>>>> smack_d_instantiate(). >>>> Let's say your backing store is a file labeled Rubble. >>>> >>>> mount -o smackfsroot=Rubble,smackfsdef=Rubble ... >>>> >>>> It is completely reasonable for a process labeled Flintstone to >>>> have rwxa access to a file labeled Rubble. >>>> >>>> Smack rule: Flintstone Rubble rwxa >>>> >>>> In the case of writing to an existing Rubble file, what you >>>> have looks fine. What's not so great is that if the Flintstone >>>> process creates a file, it should be labeled Flintstone. Your >>>> use of the smk_default, which is going to violate the principle >>>> of least astonishment, and break the Smack policy as well. >>>> >>>> Let's make a minor change. Instead of using smackfsroot let's >>>> use smackfstransmute and a slightly different access rule: >>>> >>>> mount -o smackfstransmute=Rubble,smackfsdef=Rubble ... >>>> >>>> Smack rule: Flintstone Rubble rwxat >>>> >>>> Now the only change we have to make to the Smack code is >>>> that we don't want to create any files unless either the >>>> process is labeled Rubble or the rule allowing the creation >>>> has the "t" for transmute access. That should ensure that >>>> everything is labeled Rubble. If it isn't, someone has mucked >>>> with the metadata in a detectable way. >>> All right, that kind of makes sense, but I'm still missing some pieces. >>> Questions follow. 
>>> >>>>> diff --git a/include/linux/fs.h b/include/linux/fs.h >>>>> index 32f598db0b0d..4597420ab933 100644 >>>>> --- a/include/linux/fs.h >>>>> +++ b/include/linux/fs.h >>>>> @@ -1486,6 +1486,10 @@ static inline void sb_start_intwrite(struct super_block *sb) >>>>> __sb_start_write(sb, SB_FREEZE_FS, true); >>>>> } >>>>> >>>>> +static inline bool sb_in_userns(struct super_block *sb) >>>>> +{ >>>>> + return sb->s_user_ns != &init_user_ns; >>>>> +} >>>>> >>>>> extern bool inode_owner_or_capable(const struct inode *inode); >>>>> >>>>> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c >>>>> index a143328f75eb..591fd19294e7 100644 >>>>> --- a/security/smack/smack_lsm.c >>>>> +++ b/security/smack/smack_lsm.c >>>>> @@ -255,6 +255,10 @@ static struct smack_known *smk_fetch(const char *name, struct inode *ip, >>>>> char *buffer; >>>>> struct smack_known *skp = NULL; >>>>> >>>>> + /* Should never fetch xattrs from untrusted mounts */ >>>>> + if (WARN_ON(sb_in_userns(ip->i_sb))) >>>>> + return ERR_PTR(-EPERM); >>>>> + >>>> Go ahead and fetch it, we'll check to make sure it's viable later. >>>> >>>>> if (ip->i_op->getxattr == NULL) >>>>> return ERR_PTR(-EOPNOTSUPP); >>>>> >>>>> @@ -656,10 +660,14 @@ static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data) >>>>> */ >>>>> if (specified) >>>>> return -EPERM; >>>>> + >>>>> /* >>>>> - * Unprivileged mounts get root and default from the caller. >>>>> + * User namespace mounts get root and default from the backing >>>>> + * store, if there is one. Other unprivileged mounts get them >>>>> + * from the caller. >>>>> */ >>>>> - skp = smk_of_current(); >>>>> + skp = (sb_in_userns(sb) && sb->s_bdev) ? >>>>> + smk_of_inode(sb->s_bdev->bd_inode) : smk_of_current(); >>>>> sp->smk_root = skp; >>>>> sp->smk_default = skp; >>>> sp->smk_flags |= SMK_INODE_TRANSMUTE; >>> I assume that you meant skp and not sp here. >> Actually, neither is correct. 
You want to set SMK_INODE_TRANSMUTE >> in the smk_flags field of the root inode. That's easy: >> >> transmute = 1; >> >> and the code after "Initialize the root inode" will take care of it. > Yeah, that's what I've actually done. > >>>>> } >>>>> @@ -792,7 +800,12 @@ static int smack_bprm_secureexec(struct linux_binprm *bprm) >>>>> */ >>>>> static int smack_inode_alloc_security(struct inode *inode) >>>>> { >>>>> - struct smack_known *skp = smk_of_current(); >>>>> + struct smack_known *skp; >>>>> + >>>>> + if (sb_in_userns(inode->i_sb)) >>>>> + skp = ((struct superblock_smack *)(inode->i_sb->s_security))->smk_default; >>>>> + else >>>>> + skp = smk_of_current(); >>>> This should be left alone. >>>> smack_inode_init_security is where you could disallow access that doesn't >>>> legitimately result in a Rubble label on the file. It's something like >>>> >>>> ... after the call may = smk_access_entry(...) >>>> if (sb_in_userns(inode->i_sb)) >>>> if (skp != dsp && (may & MAY_TRANSMUTE) == 0) >>>> return -EACCES; >>> I'm not getting how this covers all cases. >>> >>> So we've set the transmute flag on the root inode. Files and directories >>> created in the root directory get the same label, and directories also >>> get the transmute attribute. That's all fine. >>> >>> What about an existing directory in the filesystem that already has a >>> Slate label? I'm not getting what happens with this directory, or for >>> new files created in this directory, which also relates to my other >>> questions below. >>> >>> Also an aside - smk_access_entry looks weird. may is initialized to >>> -ENOENT, and then rule_list is searched for a rule which matches the >>> object and subject labels. Presumably it's possible that no rule could >>> be found, otherwise the prior initialization of may is pointless. If >>> this happens the following code treats it as though it always contains >>> access flags even though it might contain -ENOENT. 
Nothing bad actually >>> happens with a two's compliement representation of -ENOENT since it will >>> just set a bit that's already set, but it still seems like it should >>> have a may > 0 condition, for clarity if for no other reason. >> My suggested code is just wrong. I wasn't looking at the whole code, >> only the patch, and got myself confused. Apologies. >> >> If we want to go straight for the jugular how about this? I'm assuming >> that inode->i_sb->s_bdev->bd_inode is the inode of the backing store. > Yes. > >> static int smack_inode_permission(struct inode *inode, int mask) >> { >> struct smk_audit_info ad; >> int no_block = mask & MAY_NOT_BLOCK; >> int rc; >> >> mask &= (MAY_READ|MAY_WRITE|MAY_EXEC|MAY_APPEND); >> /* >> * No permission to check. Existence test. Yup, it's there. >> */ >> if (mask == 0) >> return 0; >> >> + if (sb_in_userns(inode->i_sb)) && >> + smk_of_inode(inode) != smk_of_inode(inode->i_sb->s_bdev->bd_inode)) >> + return -EACCES; >> + >> /* May be droppable after audit */ >> if (no_block) >> return -ECHILD; >> smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_INODE); >> smk_ad_setfield_u_fs_inode(&ad, inode); >> rc = smk_curacc(smk_of_inode(inode), mask, &ad); >> rc = smk_bu_inode(inode, mask, rc); >> return rc; >> } > Hmm, okay. I think I've been a little confused all this time about how > you want to handle these unprivileged mounts. Not your problem. I'm not the most consistent of reviewers. > Originally I thought you wanted all objects in the filesystem to get the > same label as the backing store. That's what I tried to implement > originally, i.e. smk_root=smk_default=smk_of_inode(...->bd_inode), then > assign every object (new and existing) smk_default and completely ignore > the labels on disk. I want everything to have the label of the backing store, but I don't want to ignore it if it somehow got something else. Because the only legitimate label for this example is Rubble, I want to reject anything else that appears. 
If someone builds a filesystem by hand with Slate labels I want it treated "safely". > This is what I currently think you want for user ns mounts: > > 1. smk_root and smk_default are assigned the label of the backing > device. > 2. s_root is assigned the transmute property. > 3. For existing files: > a. Files with the same label as the backing device are accessible. > b. Files with any other label are not accessible. That's right. Accept correct data, reject anything that's not right. > If this is right, there are a couple lingering questions in my mind. > > First, what happens with files created in directories with the same > label as the backing device but without the transmute property set? The > inode for the new file will initially be labeled with smk_of_current(), > but then during d_instantiate it will get smk_default and thus end up > with the label we want. So that seems okay. Yes. > The second is whether files with the SMACK64EXEC attribute is still a > problem. It seems it is, for files with the same label as the backing > store at least. I think we can simply skip the code that reads out this > xattr and sets smk_task for user ns mounts, or else skip assigning the > label to the new task in bprm_set_creds. The latter seems more > consistent with the approach you've suggested for dealing with labels > from disk. Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in smack_d_instantiate for unprivileged mounts would do the trick. > So I guess all of that seems okay, though perhaps a bit restrictive > given that the user who mounted the filesystem already has full access > to the backing store. In truth, there is no reason to expect that the "user" who did the mount will ever have a Smack label that differs from the label of the backing store. If what we've got here seems restrictive, it's because you've got access from someone other than the "user". 
> Please let me know whether or not this matches up with what you are > thinking, then I can proceed with the implementation. My current mindset is that, if you're going to allow unprivileged mounts of user defined backing stores, this is as safe as we can make it. > > Thanks, > Seth ^ permalink raw reply [flat|nested] 138+ messages in thread
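The policy agreed in the message above (accept objects that carry the backing store's label, reject everything else) can be sketched as a small standalone C model. This is only an illustration and not kernel code: struct model_sb, struct model_inode and model_userns_label_check() are hypothetical names invented here, labels are plain strings rather than struct smack_known pointers, and the NULL check for a mount without a backing store is an added assumption that the snippet quoted in the thread does not make.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for kernel objects; these are not real kernel types. */
struct model_sb {
	bool in_userns;            /* mounted from a non-initial user namespace */
	const char *backing_label; /* Smack label of the backing store, NULL if none */
};

struct model_inode {
	const struct model_sb *sb;
	const char *label;         /* label as read from the on-disk xattr */
};

/*
 * Model of the extra check proposed for smack_inode_permission():
 * on a user namespace mount, an inode whose label differs from the
 * backing store's label is rejected before normal rule evaluation.
 * Returns 0 when the check passes and -EACCES when it fails.
 */
static int model_userns_label_check(const struct model_inode *inode)
{
	const struct model_sb *sb = inode->sb;

	if (!sb->in_userns)
		return 0; /* privileged mount: on-disk labels are trusted */
	if (sb->backing_label == NULL)
		return 0; /* assumption: nothing to compare against */
	if (strcmp(inode->label, sb->backing_label) != 0)
		return -EACCES; /* e.g. a hand-built Slate label on a Rubble store */
	return 0;
}
```

With a backing store labeled Rubble, an inode labeled Rubble passes this check and still goes through the ordinary Smack rule lookup afterwards, while an inode labeled Slate is refused regardless of what rules exist for Slate.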
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-23 0:05 ` Casey Schaufler @ 2015-07-23 0:15 ` Eric W. Biederman -1 siblings, 0 replies; 138+ messages in thread From: Eric W. Biederman @ 2015-07-23 0:15 UTC (permalink / raw) To: Casey Schaufler Cc: Seth Forshee, Andy Lutomirski, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel Casey Schaufler <casey@schaufler-ca.com> writes: > On 7/22/2015 12:32 PM, Seth Forshee wrote: >> On Wed, Jul 22, 2015 at 11:10:46AM -0700, Casey Schaufler wrote: >>> On 7/22/2015 8:56 AM, Seth Forshee wrote: >>>> On Tue, Jul 21, 2015 at 06:52:31PM -0700, Casey Schaufler wrote: >>>>> On 7/21/2015 1:35 PM, Seth Forshee wrote: >>>>>> On Thu, Jul 16, 2015 at 05:59:22PM -0700, Andy Lutomirski wrote: >>>>>>> On Thu, Jul 16, 2015 at 5:45 PM, Casey Schaufler <casey@schaufler-ca.com> wrote: >>>>>>>> On 7/16/2015 4:29 PM, Andy Lutomirski wrote: >>>>>>>>> I really don't see the benefit of making up extra rules that apply to >>>>>>>>> users outside a userns who try to access specifically a filesystem >>>>>>>>> with backing store. They wouldn't make sense for filesystems without >>>>>>>>> backing store. >>>>>>>> Sure it would. For Smack, it would be the label a file would be >>>>>>>> created with, which would be the label of the process creating >>>>>>>> the memory based filesystem. For SELinux the rules are more a >>>>>>>> touch more sophisticated, but I'm sure that Paul or Stephen could >>>>>>>> come up with how to determine it. >>>>>>>> >>>>>>>> The point, looping all the way back to the beginning, where we >>>>>>>> were talking about just ignoring the labels on the filesystem, >>>>>>>> is that if you use the same Smack label on the files in the >>>>>>>> filesystem as the backing store file has, we'll all be happy. >>>>>>>> If that label isn't something user can write to, he won't be >>>>>>>> able to write to the mounted objects, either. 
If there is no >>>>>>>> backing store then use the label of the process creating the >>>>>>>> filesystem, which will be the user, which will mean everything >>>>>>>> will work hunky dory. >>>>>>>> >>>>>>>> Yes, there's work involved, but I doubt there's a lot. Getting >>>>>>>> the label from the backing store or the creating process is >>>>>>>> simple enough. >>>>>>>> >>>>>> So something like the diff below (untested)? >>>>> I think that this is close, and quite good for someone >>>>> who isn't very familiar with Smack. It's definitely headed >>>>> in the right direction. >>>>> >>>>>> All I'm really doing is setting smk_default as you describe above and >>>>>> then using it instead of smk_of_current() in >>>>>> smack_inode_alloc_security() and instead of the label from the disk in >>>>>> smack_d_instantiate(). >>>>> Let's say your backing store is a file labeled Rubble. >>>>> >>>>> mount -o smackfsroot=Rubble,smackfsdef=Rubble ... >>>>> >>>>> It is completely reasonable for a process labeled Flintstone to >>>>> have rwxa access to a file labeled Rubble. >>>>> >>>>> Smack rule: Flintstone Rubble rwxa >>>>> >>>>> In the case of writing to an existing Rubble file, what you >>>>> have looks fine. What's not so great is that if the Flintstone >>>>> process creates a file, it should be labeled Flintstone. Your >>>>> use of the smk_default, which is going to violate the principle >>>>> of least astonishment, and break the Smack policy as well. >>>>> >>>>> Let's make a minor change. Instead of using smackfsroot let's >>>>> use smackfstransmute and a slightly different access rule: >>>>> >>>>> mount -o smackfstransmute=Rubble,smackfsdef=Rubble ... >>>>> >>>>> Smack rule: Flintstone Rubble rwxat >>>>> >>>>> Now the only change we have to make to the Smack code is >>>>> that we don't want to create any files unless either the >>>>> process is labeled Rubble or the rule allowing the creation >>>>> has the "t" for transmute access. 
That should ensure that >>>>> everything is labeled Rubble. If it isn't, someone has mucked >>>>> with the metadata in a detectable way. >>>> All right, that kind of makes sense, but I'm still missing some pieces. >>>> Questions follow. >>>> >>>>>> diff --git a/include/linux/fs.h b/include/linux/fs.h >>>>>> index 32f598db0b0d..4597420ab933 100644 >>>>>> --- a/include/linux/fs.h >>>>>> +++ b/include/linux/fs.h >>>>>> @@ -1486,6 +1486,10 @@ static inline void sb_start_intwrite(struct super_block *sb) >>>>>> __sb_start_write(sb, SB_FREEZE_FS, true); >>>>>> } >>>>>> >>>>>> +static inline bool sb_in_userns(struct super_block *sb) >>>>>> +{ >>>>>> + return sb->s_user_ns != &init_user_ns; >>>>>> +} >>>>>> >>>>>> extern bool inode_owner_or_capable(const struct inode *inode); >>>>>> >>>>>> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c >>>>>> index a143328f75eb..591fd19294e7 100644 >>>>>> --- a/security/smack/smack_lsm.c >>>>>> +++ b/security/smack/smack_lsm.c >>>>>> @@ -255,6 +255,10 @@ static struct smack_known *smk_fetch(const char *name, struct inode *ip, >>>>>> char *buffer; >>>>>> struct smack_known *skp = NULL; >>>>>> >>>>>> + /* Should never fetch xattrs from untrusted mounts */ >>>>>> + if (WARN_ON(sb_in_userns(ip->i_sb))) >>>>>> + return ERR_PTR(-EPERM); >>>>>> + >>>>> Go ahead and fetch it, we'll check to make sure it's viable later. >>>>> >>>>>> if (ip->i_op->getxattr == NULL) >>>>>> return ERR_PTR(-EOPNOTSUPP); >>>>>> >>>>>> @@ -656,10 +660,14 @@ static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data) >>>>>> */ >>>>>> if (specified) >>>>>> return -EPERM; >>>>>> + >>>>>> /* >>>>>> - * Unprivileged mounts get root and default from the caller. >>>>>> + * User namespace mounts get root and default from the backing >>>>>> + * store, if there is one. Other unprivileged mounts get them >>>>>> + * from the caller. >>>>>> */ >>>>>> - skp = smk_of_current(); >>>>>> + skp = (sb_in_userns(sb) && sb->s_bdev) ? 
>>>>>> + smk_of_inode(sb->s_bdev->bd_inode) : smk_of_current(); >>>>>> sp->smk_root = skp; >>>>>> sp->smk_default = skp; >>>>> sp->smk_flags |= SMK_INODE_TRANSMUTE; >>>> I assume that you meant skp and not sp here. >>> Actually, neither is correct. You want to set SMK_INODE_TRANSMUTE >>> in the smk_flags field of the root inode. That's easy: >>> >>> transmute = 1; >>> >>> and the code after "Initialize the root inode" will take care of it. >> Yeah, that's what I've actually done. >> >>>>>> } >>>>>> @@ -792,7 +800,12 @@ static int smack_bprm_secureexec(struct linux_binprm *bprm) >>>>>> */ >>>>>> static int smack_inode_alloc_security(struct inode *inode) >>>>>> { >>>>>> - struct smack_known *skp = smk_of_current(); >>>>>> + struct smack_known *skp; >>>>>> + >>>>>> + if (sb_in_userns(inode->i_sb)) >>>>>> + skp = ((struct superblock_smack *)(inode->i_sb->s_security))->smk_default; >>>>>> + else >>>>>> + skp = smk_of_current(); >>>>> This should be left alone. >>>>> smack_inode_init_security is where you could disallow access that doesn't >>>>> legitimately result in a Rubble label on the file. It's something like >>>>> >>>>> ... after the call may = smk_access_entry(...) >>>>> if (sb_in_userns(inode->i_sb)) >>>>> if (skp != dsp && (may & MAY_TRANSMUTE) == 0) >>>>> return -EACCES; >>>> I'm not getting how this covers all cases. >>>> >>>> So we've set the transmute flag on the root inode. Files and directories >>>> created in the root directory get the same label, and directories also >>>> get the transmute attribute. That's all fine. >>>> >>>> What about an existing directory in the filesystem that already has a >>>> Slate label? I'm not getting what happens with this directory, or for >>>> new files created in this directory, which also relates to my other >>>> questions below. >>>> >>>> Also an aside - smk_access_entry looks weird. may is initialized to >>>> -ENOENT, and then rule_list is searched for a rule which matches the >>>> object and subject labels. 
Presumably it's possible that no rule could >>>> be found, otherwise the prior initialization of may is pointless. If >>>> this happens the following code treats it as though it always contains >>>> access flags even though it might contain -ENOENT. Nothing bad actually >>>> happens with a two's complement representation of -ENOENT since it will >>>> just set a bit that's already set, but it still seems like it should >>>> have a may > 0 condition, for clarity if for no other reason. >>> My suggested code is just wrong. I wasn't looking at the whole code, >>> only the patch, and got myself confused. Apologies. >>> >>> If we want to go straight for the jugular how about this? I'm assuming >>> that inode->i_sb->s_bdev->bd_inode is the inode of the backing store. >> Yes. >> >>> static int smack_inode_permission(struct inode *inode, int mask) >>> { >>> struct smk_audit_info ad; >>> int no_block = mask & MAY_NOT_BLOCK; >>> int rc; >>> >>> mask &= (MAY_READ|MAY_WRITE|MAY_EXEC|MAY_APPEND); >>> /* >>> * No permission to check. Existence test. Yup, it's there. >>> */ >>> if (mask == 0) >>> return 0; >>> >>> + if (sb_in_userns(inode->i_sb) && >>> + smk_of_inode(inode) != smk_of_inode(inode->i_sb->s_bdev->bd_inode)) >>> + return -EACCES; >>> + >>> /* May be droppable after audit */ >>> if (no_block) >>> return -ECHILD; >>> smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_INODE); >>> smk_ad_setfield_u_fs_inode(&ad, inode); >>> rc = smk_curacc(smk_of_inode(inode), mask, &ad); >>> rc = smk_bu_inode(inode, mask, rc); >>> return rc; >>> } >> Hmm, okay. I think I've been a little confused all this time about how >> you want to handle these unprivileged mounts. > > Not your problem. I'm not the most consistent of reviewers. > >> Originally I thought you wanted all objects in the filesystem to get the >> same label as the backing store. That's what I tried to implement >> originally, i.e.
smk_root=smk_default=smk_of_inode(...->bd_inode), then >> assign every object (new and existing) smk_default and completely ignore >> the labels on disk. > > I want everything to have the label of the backing store, but > I don't want to ignore it if it somehow got something else. Because > the only legitimate label for this example is Rubble, I want to > reject anything else that appears. If someone builds a filesystem > by hand with Slate labels I want it treated "safely". > >> This is what I currently think you want for user ns mounts: >> >> 1. smk_root and smk_default are assigned the label of the backing >> device. >> 2. s_root is assigned the transmute property. >> 3. For existing files: >> a. Files with the same label as the backing device are accessible. >> b. Files with any other label are not accessible. > > That's right. Accept correct data, reject anything that's not right. > >> If this is right, there are a couple lingering questions in my mind. >> >> First, what happens with files created in directories with the same >> label as the backing device but without the transmute property set? The >> inode for the new file will initially be labeled with smk_of_current(), >> but then during d_instantiate it will get smk_default and thus end up >> with the label we want. So that seems okay. > > Yes. > >> The second is whether files with the SMACK64EXEC attribute is still a >> problem. It seems it is, for files with the same label as the backing >> store at least. I think we can simply skip the code that reads out this >> xattr and sets smk_task for user ns mounts, or else skip assigning the >> label to the new task in bprm_set_creds. The latter seems more >> consistent with the approach you've suggested for dealing with labels >> from disk. > > Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in > smack_d_instantiate for unprivileged mounts would do the trick. 
> >> So I guess all of that seems okay, though perhaps a bit restrictive >> given that the user who mounted the filesystem already has full access >> to the backing store. > > In truth, there is no reason to expect that the "user" who did the > mount will ever have a Smack label that differs from the label of > the backing store. If what we've got here seems restrictive, it's > because you've got access from someone other than the "user". > >> Please let me know whether or not this matches up with what you are >> thinking, then I can proceed with the implementation. > > My current mindset is that, if you're going to allow unprivileged > mounts of user defined backing stores, this is as safe as we can > make it. That actually sounds very reasonable to me. It is essentially what we do with uids and gids already. I presume the Smack namespace support, when integrated with all of this, would allow a set of labels to be set. Have I missed a part of the conversation where you talk about filesystems that don't have support for storing labels? Filesystems like vfat, isofs, etc. Eric ^ permalink raw reply [flat|nested] 138+ messages in thread
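The SMACK64EXEC handling discussed in the quoted text above can be pictured with another small standalone C model. It is a sketch under stated assumptions, not the Smack implementation: model_exec_label() is a hypothetical helper invented for illustration, labels are plain strings, and the "skip the xattr on user namespace mounts" behaviour is reduced to a single boolean argument.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Model of which label a task ends up with after exec.
 *
 * userns_mount:  the binary lives on a user namespace mount
 * exec_xattr:    value of the SMACK64EXEC attribute on disk, or NULL
 * current_label: label of the task calling exec
 *
 * On a trusted mount a SMACK64EXEC attribute replaces the task label.
 * On a user namespace mount the attribute is never consulted, so the
 * task keeps the label it already had.
 */
static const char *model_exec_label(bool userns_mount,
				    const char *exec_xattr,
				    const char *current_label)
{
	if (userns_mount || exec_xattr == NULL)
		return current_label;
	return exec_xattr;
}
```

In this model a Rubble task executing a file that carries SMACK64EXEC=Slate stays Rubble when the file sits on a user namespace mount, which is the outcome the thread is aiming for: an unprivileged user cannot mint a new task label by writing an xattr into an image they control.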
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-23 0:15 ` Eric W. Biederman @ 2015-07-23 5:15 ` Seth Forshee -1 siblings, 0 replies; 138+ messages in thread From: Seth Forshee @ 2015-07-23 5:15 UTC (permalink / raw) To: Eric W. Biederman Cc: Casey Schaufler, Andy Lutomirski, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel On Wed, Jul 22, 2015 at 07:15:19PM -0500, Eric W. Biederman wrote: > Casey Schaufler <casey@schaufler-ca.com> writes: > > > On 7/22/2015 12:32 PM, Seth Forshee wrote: > >> On Wed, Jul 22, 2015 at 11:10:46AM -0700, Casey Schaufler wrote: > >>> On 7/22/2015 8:56 AM, Seth Forshee wrote: > >>>> On Tue, Jul 21, 2015 at 06:52:31PM -0700, Casey Schaufler wrote: > >>>>> On 7/21/2015 1:35 PM, Seth Forshee wrote: > >>>>>> On Thu, Jul 16, 2015 at 05:59:22PM -0700, Andy Lutomirski wrote: > >>>>>>> On Thu, Jul 16, 2015 at 5:45 PM, Casey Schaufler <casey@schaufler-ca.com> wrote: > >>>>>>>> On 7/16/2015 4:29 PM, Andy Lutomirski wrote: > >>>>>>>>> I really don't see the benefit of making up extra rules that apply to > >>>>>>>>> users outside a userns who try to access specifically a filesystem > >>>>>>>>> with backing store. They wouldn't make sense for filesystems without > >>>>>>>>> backing store. > >>>>>>>> Sure it would. For Smack, it would be the label a file would be > >>>>>>>> created with, which would be the label of the process creating > >>>>>>>> the memory based filesystem. For SELinux the rules are more a > >>>>>>>> touch more sophisticated, but I'm sure that Paul or Stephen could > >>>>>>>> come up with how to determine it. > >>>>>>>> > >>>>>>>> The point, looping all the way back to the beginning, where we > >>>>>>>> were talking about just ignoring the labels on the filesystem, > >>>>>>>> is that if you use the same Smack label on the files in the > >>>>>>>> filesystem as the backing store file has, we'll all be happy. 
> >>>>>>>> If that label isn't something user can write to, he won't be > >>>>>>>> able to write to the mounted objects, either. If there is no > >>>>>>>> backing store then use the label of the process creating the > >>>>>>>> filesystem, which will be the user, which will mean everything > >>>>>>>> will work hunky dory. > >>>>>>>> > >>>>>>>> Yes, there's work involved, but I doubt there's a lot. Getting > >>>>>>>> the label from the backing store or the creating process is > >>>>>>>> simple enough. > >>>>>>>> > >>>>>> So something like the diff below (untested)? > >>>>> I think that this is close, and quite good for someone > >>>>> who isn't very familiar with Smack. It's definitely headed > >>>>> in the right direction. > >>>>> > >>>>>> All I'm really doing is setting smk_default as you describe above and > >>>>>> then using it instead of smk_of_current() in > >>>>>> smack_inode_alloc_security() and instead of the label from the disk in > >>>>>> smack_d_instantiate(). > >>>>> Let's say your backing store is a file labeled Rubble. > >>>>> > >>>>> mount -o smackfsroot=Rubble,smackfsdef=Rubble ... > >>>>> > >>>>> It is completely reasonable for a process labeled Flintstone to > >>>>> have rwxa access to a file labeled Rubble. > >>>>> > >>>>> Smack rule: Flintstone Rubble rwxa > >>>>> > >>>>> In the case of writing to an existing Rubble file, what you > >>>>> have looks fine. What's not so great is that if the Flintstone > >>>>> process creates a file, it should be labeled Flintstone. Your > >>>>> use of the smk_default, which is going to violate the principle > >>>>> of least astonishment, and break the Smack policy as well. > >>>>> > >>>>> Let's make a minor change. Instead of using smackfsroot let's > >>>>> use smackfstransmute and a slightly different access rule: > >>>>> > >>>>> mount -o smackfstransmute=Rubble,smackfsdef=Rubble ... 
> >>>>> > >>>>> Smack rule: Flintstone Rubble rwxat > >>>>> > >>>>> Now the only change we have to make to the Smack code is > >>>>> that we don't want to create any files unless either the > >>>>> process is labeled Rubble or the rule allowing the creation > >>>>> has the "t" for transmute access. That should ensure that > >>>>> everything is labeled Rubble. If it isn't, someone has mucked > >>>>> with the metadata in a detectable way. > >>>> All right, that kind of makes sense, but I'm still missing some pieces. > >>>> Questions follow. > >>>> > >>>>>> diff --git a/include/linux/fs.h b/include/linux/fs.h > >>>>>> index 32f598db0b0d..4597420ab933 100644 > >>>>>> --- a/include/linux/fs.h > >>>>>> +++ b/include/linux/fs.h > >>>>>> @@ -1486,6 +1486,10 @@ static inline void sb_start_intwrite(struct super_block *sb) > >>>>>> __sb_start_write(sb, SB_FREEZE_FS, true); > >>>>>> } > >>>>>> > >>>>>> +static inline bool sb_in_userns(struct super_block *sb) > >>>>>> +{ > >>>>>> + return sb->s_user_ns != &init_user_ns; > >>>>>> +} > >>>>>> > >>>>>> extern bool inode_owner_or_capable(const struct inode *inode); > >>>>>> > >>>>>> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c > >>>>>> index a143328f75eb..591fd19294e7 100644 > >>>>>> --- a/security/smack/smack_lsm.c > >>>>>> +++ b/security/smack/smack_lsm.c > >>>>>> @@ -255,6 +255,10 @@ static struct smack_known *smk_fetch(const char *name, struct inode *ip, > >>>>>> char *buffer; > >>>>>> struct smack_known *skp = NULL; > >>>>>> > >>>>>> + /* Should never fetch xattrs from untrusted mounts */ > >>>>>> + if (WARN_ON(sb_in_userns(ip->i_sb))) > >>>>>> + return ERR_PTR(-EPERM); > >>>>>> + > >>>>> Go ahead and fetch it, we'll check to make sure it's viable later. 
> >>>>> > >>>>>> if (ip->i_op->getxattr == NULL) > >>>>>> return ERR_PTR(-EOPNOTSUPP); > >>>>>> > >>>>>> @@ -656,10 +660,14 @@ static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data) > >>>>>> */ > >>>>>> if (specified) > >>>>>> return -EPERM; > >>>>>> + > >>>>>> /* > >>>>>> - * Unprivileged mounts get root and default from the caller. > >>>>>> + * User namespace mounts get root and default from the backing > >>>>>> + * store, if there is one. Other unprivileged mounts get them > >>>>>> + * from the caller. > >>>>>> */ > >>>>>> - skp = smk_of_current(); > >>>>>> + skp = (sb_in_userns(sb) && sb->s_bdev) ? > >>>>>> + smk_of_inode(sb->s_bdev->bd_inode) : smk_of_current(); > >>>>>> sp->smk_root = skp; > >>>>>> sp->smk_default = skp; > >>>>> sp->smk_flags |= SMK_INODE_TRANSMUTE; > >>>> I assume that you meant skp and not sp here. > >>> Actually, neither is correct. You want to set SMK_INODE_TRANSMUTE > >>> in the smk_flags field of the root inode. That's easy: > >>> > >>> transmute = 1; > >>> > >>> and the code after "Initialize the root inode" will take care of it. > >> Yeah, that's what I've actually done. > >> > >>>>>> } > >>>>>> @@ -792,7 +800,12 @@ static int smack_bprm_secureexec(struct linux_binprm *bprm) > >>>>>> */ > >>>>>> static int smack_inode_alloc_security(struct inode *inode) > >>>>>> { > >>>>>> - struct smack_known *skp = smk_of_current(); > >>>>>> + struct smack_known *skp; > >>>>>> + > >>>>>> + if (sb_in_userns(inode->i_sb)) > >>>>>> + skp = ((struct superblock_smack *)(inode->i_sb->s_security))->smk_default; > >>>>>> + else > >>>>>> + skp = smk_of_current(); > >>>>> This should be left alone. > >>>>> smack_inode_init_security is where you could disallow access that doesn't > >>>>> legitimately result in a Rubble label on the file. It's something like > >>>>> > >>>>> ... after the call may = smk_access_entry(...) 
> >>>>> if (sb_in_userns(inode->i_sb)) > >>>>> if (skp != dsp && (may & MAY_TRANSMUTE) == 0) > >>>>> return -EACCES; > >>>> I'm not getting how this covers all cases. > >>>> > >>>> So we've set the transmute flag on the root inode. Files and directories > >>>> created in the root directory get the same label, and directories also > >>>> get the transmute attribute. That's all fine. > >>>> > >>>> What about an existing directory in the filesystem that already has a > >>>> Slate label? I'm not getting what happens with this directory, or for > >>>> new files created in this directory, which also relates to my other > >>>> questions below. > >>>> > >>>> Also an aside - smk_access_entry looks weird. may is initialized to > >>>> -ENOENT, and then rule_list is searched for a rule which matches the > >>>> object and subject labels. Presumably it's possible that no rule could > >>>> be found, otherwise the prior initialization of may is pointless. If > >>>> this happens the following code treats it as though it always contains > >>>> access flags even though it might contain -ENOENT. Nothing bad actually > >>>> happens with a two's compliement representation of -ENOENT since it will > >>>> just set a bit that's already set, but it still seems like it should > >>>> have a may > 0 condition, for clarity if for no other reason. > >>> My suggested code is just wrong. I wasn't looking at the whole code, > >>> only the patch, and got myself confused. Apologies. > >>> > >>> If we want to go straight for the jugular how about this? I'm assuming > >>> that inode->i_sb->s_bdev->bd_inode is the inode of the backing store. > >> Yes. > >> > >>> static int smack_inode_permission(struct inode *inode, int mask) > >>> { > >>> struct smk_audit_info ad; > >>> int no_block = mask & MAY_NOT_BLOCK; > >>> int rc; > >>> > >>> mask &= (MAY_READ|MAY_WRITE|MAY_EXEC|MAY_APPEND); > >>> /* > >>> * No permission to check. Existence test. Yup, it's there. 
> >>> */ > >>> if (mask == 0) > >>> return 0; > >>> > >>> + if (sb_in_userns(inode->i_sb)) && > >>> + smk_of_inode(inode) != smk_of_inode(inode->i_sb->s_bdev->bd_inode)) > >>> + return -EACCES; > >>> + > >>> /* May be droppable after audit */ > >>> if (no_block) > >>> return -ECHILD; > >>> smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_INODE); > >>> smk_ad_setfield_u_fs_inode(&ad, inode); > >>> rc = smk_curacc(smk_of_inode(inode), mask, &ad); > >>> rc = smk_bu_inode(inode, mask, rc); > >>> return rc; > >>> } > >> Hmm, okay. I think I've been a little confused all this time about how > >> you want to handle these unprivileged mounts. > > > > Not your problem. I'm not the most consistent of reviewers. > > > >> Originally I thought you wanted all objects in the filesystem to get the > >> same label as the backing store. That's what I tried to implement > >> originally, i.e. smk_root=smk_default=smk_of_inode(...->bd_inode), then > >> assign every object (new and existing) smk_default and completely ignore > >> the labels on disk. > > > > I want everything to have the label of the backing store, but > > I don't want to ignore it if it somehow got something else. Because > > the only legitimate label for this example is Rubble, I want to > > reject anything else that appears. If someone builds a filesystem > > by hand with Slate labels I want it treated "safely". > > > >> This is what I currently think you want for user ns mounts: > >> > >> 1. smk_root and smk_default are assigned the label of the backing > >> device. > >> 2. s_root is assigned the transmute property. > >> 3. For existing files: > >> a. Files with the same label as the backing device are accessible. > >> b. Files with any other label are not accessible. > > > > That's right. Accept correct data, reject anything that's not right. > > > >> If this is right, there are a couple lingering questions in my mind. 
> >> > >> First, what happens with files created in directories with the same > >> label as the backing device but without the transmute property set? The > >> inode for the new file will initially be labeled with smk_of_current(), > >> but then during d_instantiate it will get smk_default and thus end up > >> with the label we want. So that seems okay. > > > > Yes. > > > >> The second is whether files with the SMACK64EXEC attribute is still a > >> problem. It seems it is, for files with the same label as the backing > >> store at least. I think we can simply skip the code that reads out this > >> xattr and sets smk_task for user ns mounts, or else skip assigning the > >> label to the new task in bprm_set_creds. The latter seems more > >> consistent with the approach you've suggested for dealing with labels > >> from disk. > > > > Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in > > smack_d_instantiate for unprivileged mounts would do the trick. > > > >> So I guess all of that seems okay, though perhaps a bit restrictive > >> given that the user who mounted the filesystem already has full access > >> to the backing store. > > > > In truth, there is no reason to expect that the "user" who did the > > mount will ever have a Smack label that differs from the label of > > the backing store. If what we've got here seems restrictive, it's > > because you've got access from someone other than the "user". > > > >> Please let me know whether or not this matches up with what you are > >> thinking, then I can procede with the implementation. > > > > My current mindset is that, if you're going to allow unprivileged > > mounts of user defined backing stores, this is as safe as we can > > make it. > > That actually sounds very reasonable to me. It is essentially what we > do with uid and gids already. I presume the smack namespace support > would when integrated with all of this would allow a set of labels to be > set. 
>
> Have I missed a part of the conversation you talk about fileystems that
> don't have support for storing labels? Filesystems like vfat, isofs,
> etc.

As I read the code they should all end up with the superblock's smk_default label for the objects in RAM, i.e. the label of the backing store. The same would be true for existing files in a filesystem which does support storing labels but has no labels on the files.

Seth
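The access rule agreed on in the quoted summary (existing files whose label matches the backing store are accessible, anything else is rejected, and unlabeled files fall back to the superblock default) can be modeled in a few lines of userspace C. This is only a sketch under stated assumptions, not code from the patch series: `model_sb`, `model_inode`, `model_effective_label` and `model_userns_label_check` are hypothetical names, and labels are plain strings rather than `struct smack_known` pointers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct model_sb {
	bool in_userns;            /* models sb_in_userns(sb) */
	const char *backing_label; /* label of the backing store, NULL if none */
};

struct model_inode {
	const struct model_sb *sb;
	const char *label;         /* label stored on disk, NULL if the file has none */
};

/*
 * Label the object carries in memory: the on-disk label if there is one,
 * otherwise the superblock default, which for user namespace mounts is
 * the backing store label.
 */
static const char *model_effective_label(const struct model_inode *ip)
{
	return ip->label ? ip->label : ip->sb->backing_label;
}

/*
 * Returns 0 when the ordinary rule-based check may go ahead, -EACCES when
 * the object is rejected outright because its label differs from the
 * backing store label on a user namespace mount.
 */
static int model_userns_label_check(const struct model_inode *ip)
{
	if (!ip->sb->in_userns || ip->sb->backing_label == NULL)
		return 0;

	if (strcmp(model_effective_label(ip), ip->sb->backing_label) != 0)
		return -EACCES;

	return 0;
}
```

With a backing store labeled Rubble, a file labeled Rubble or carrying no label passes this gate, while a file labeled Slate gets -EACCES before any Smack rule is consulted.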
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-23 0:15 ` Eric W. Biederman @ 2015-07-23 21:48 ` Casey Schaufler -1 siblings, 0 replies; 138+ messages in thread From: Casey Schaufler @ 2015-07-23 21:48 UTC (permalink / raw) To: Eric W. Biederman Cc: Seth Forshee, Andy Lutomirski, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel, Casey Schaufler On 7/22/2015 5:15 PM, Eric W. Biederman wrote: > Casey Schaufler <casey@schaufler-ca.com> writes: > >> On 7/22/2015 12:32 PM, Seth Forshee wrote: >>> On Wed, Jul 22, 2015 at 11:10:46AM -0700, Casey Schaufler wrote: >>>> On 7/22/2015 8:56 AM, Seth Forshee wrote: >>>>> On Tue, Jul 21, 2015 at 06:52:31PM -0700, Casey Schaufler wrote: >>>>>> On 7/21/2015 1:35 PM, Seth Forshee wrote: >>>>>>> On Thu, Jul 16, 2015 at 05:59:22PM -0700, Andy Lutomirski wrote: >>>>>>>> On Thu, Jul 16, 2015 at 5:45 PM, Casey Schaufler <casey@schaufler-ca.com> wrote: >>>>>>>>> On 7/16/2015 4:29 PM, Andy Lutomirski wrote: >>>>>>>>>> I really don't see the benefit of making up extra rules that apply to >>>>>>>>>> users outside a userns who try to access specifically a filesystem >>>>>>>>>> with backing store. They wouldn't make sense for filesystems without >>>>>>>>>> backing store. >>>>>>>>> Sure it would. For Smack, it would be the label a file would be >>>>>>>>> created with, which would be the label of the process creating >>>>>>>>> the memory based filesystem. For SELinux the rules are more a >>>>>>>>> touch more sophisticated, but I'm sure that Paul or Stephen could >>>>>>>>> come up with how to determine it. >>>>>>>>> >>>>>>>>> The point, looping all the way back to the beginning, where we >>>>>>>>> were talking about just ignoring the labels on the filesystem, >>>>>>>>> is that if you use the same Smack label on the files in the >>>>>>>>> filesystem as the backing store file has, we'll all be happy. 
>>>>>>>>> If that label isn't something user can write to, he won't be >>>>>>>>> able to write to the mounted objects, either. If there is no >>>>>>>>> backing store then use the label of the process creating the >>>>>>>>> filesystem, which will be the user, which will mean everything >>>>>>>>> will work hunky dory. >>>>>>>>> >>>>>>>>> Yes, there's work involved, but I doubt there's a lot. Getting >>>>>>>>> the label from the backing store or the creating process is >>>>>>>>> simple enough. >>>>>>>>> >>>>>>> So something like the diff below (untested)? >>>>>> I think that this is close, and quite good for someone >>>>>> who isn't very familiar with Smack. It's definitely headed >>>>>> in the right direction. >>>>>> >>>>>>> All I'm really doing is setting smk_default as you describe above and >>>>>>> then using it instead of smk_of_current() in >>>>>>> smack_inode_alloc_security() and instead of the label from the disk in >>>>>>> smack_d_instantiate(). >>>>>> Let's say your backing store is a file labeled Rubble. >>>>>> >>>>>> mount -o smackfsroot=Rubble,smackfsdef=Rubble ... >>>>>> >>>>>> It is completely reasonable for a process labeled Flintstone to >>>>>> have rwxa access to a file labeled Rubble. >>>>>> >>>>>> Smack rule: Flintstone Rubble rwxa >>>>>> >>>>>> In the case of writing to an existing Rubble file, what you >>>>>> have looks fine. What's not so great is that if the Flintstone >>>>>> process creates a file, it should be labeled Flintstone. Your >>>>>> use of the smk_default, which is going to violate the principle >>>>>> of least astonishment, and break the Smack policy as well. >>>>>> >>>>>> Let's make a minor change. Instead of using smackfsroot let's >>>>>> use smackfstransmute and a slightly different access rule: >>>>>> >>>>>> mount -o smackfstransmute=Rubble,smackfsdef=Rubble ... 
>>>>>> >>>>>> Smack rule: Flintstone Rubble rwxat >>>>>> >>>>>> Now the only change we have to make to the Smack code is >>>>>> that we don't want to create any files unless either the >>>>>> process is labeled Rubble or the rule allowing the creation >>>>>> has the "t" for transmute access. That should ensure that >>>>>> everything is labeled Rubble. If it isn't, someone has mucked >>>>>> with the metadata in a detectable way. >>>>> All right, that kind of makes sense, but I'm still missing some pieces. >>>>> Questions follow. >>>>> >>>>>>> diff --git a/include/linux/fs.h b/include/linux/fs.h >>>>>>> index 32f598db0b0d..4597420ab933 100644 >>>>>>> --- a/include/linux/fs.h >>>>>>> +++ b/include/linux/fs.h >>>>>>> @@ -1486,6 +1486,10 @@ static inline void sb_start_intwrite(struct super_block *sb) >>>>>>> __sb_start_write(sb, SB_FREEZE_FS, true); >>>>>>> } >>>>>>> >>>>>>> +static inline bool sb_in_userns(struct super_block *sb) >>>>>>> +{ >>>>>>> + return sb->s_user_ns != &init_user_ns; >>>>>>> +} >>>>>>> >>>>>>> extern bool inode_owner_or_capable(const struct inode *inode); >>>>>>> >>>>>>> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c >>>>>>> index a143328f75eb..591fd19294e7 100644 >>>>>>> --- a/security/smack/smack_lsm.c >>>>>>> +++ b/security/smack/smack_lsm.c >>>>>>> @@ -255,6 +255,10 @@ static struct smack_known *smk_fetch(const char *name, struct inode *ip, >>>>>>> char *buffer; >>>>>>> struct smack_known *skp = NULL; >>>>>>> >>>>>>> + /* Should never fetch xattrs from untrusted mounts */ >>>>>>> + if (WARN_ON(sb_in_userns(ip->i_sb))) >>>>>>> + return ERR_PTR(-EPERM); >>>>>>> + >>>>>> Go ahead and fetch it, we'll check to make sure it's viable later. 
>>>>>> >>>>>>> if (ip->i_op->getxattr == NULL) >>>>>>> return ERR_PTR(-EOPNOTSUPP); >>>>>>> >>>>>>> @@ -656,10 +660,14 @@ static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data) >>>>>>> */ >>>>>>> if (specified) >>>>>>> return -EPERM; >>>>>>> + >>>>>>> /* >>>>>>> - * Unprivileged mounts get root and default from the caller. >>>>>>> + * User namespace mounts get root and default from the backing >>>>>>> + * store, if there is one. Other unprivileged mounts get them >>>>>>> + * from the caller. >>>>>>> */ >>>>>>> - skp = smk_of_current(); >>>>>>> + skp = (sb_in_userns(sb) && sb->s_bdev) ? >>>>>>> + smk_of_inode(sb->s_bdev->bd_inode) : smk_of_current(); >>>>>>> sp->smk_root = skp; >>>>>>> sp->smk_default = skp; >>>>>> sp->smk_flags |= SMK_INODE_TRANSMUTE; >>>>> I assume that you meant skp and not sp here. >>>> Actually, neither is correct. You want to set SMK_INODE_TRANSMUTE >>>> in the smk_flags field of the root inode. That's easy: >>>> >>>> transmute = 1; >>>> >>>> and the code after "Initialize the root inode" will take care of it. >>> Yeah, that's what I've actually done. >>> >>>>>>> } >>>>>>> @@ -792,7 +800,12 @@ static int smack_bprm_secureexec(struct linux_binprm *bprm) >>>>>>> */ >>>>>>> static int smack_inode_alloc_security(struct inode *inode) >>>>>>> { >>>>>>> - struct smack_known *skp = smk_of_current(); >>>>>>> + struct smack_known *skp; >>>>>>> + >>>>>>> + if (sb_in_userns(inode->i_sb)) >>>>>>> + skp = ((struct superblock_smack *)(inode->i_sb->s_security))->smk_default; >>>>>>> + else >>>>>>> + skp = smk_of_current(); >>>>>> This should be left alone. >>>>>> smack_inode_init_security is where you could disallow access that doesn't >>>>>> legitimately result in a Rubble label on the file. It's something like >>>>>> >>>>>> ... after the call may = smk_access_entry(...) >>>>>> if (sb_in_userns(inode->i_sb)) >>>>>> if (skp != dsp && (may & MAY_TRANSMUTE) == 0) >>>>>> return -EACCES; >>>>> I'm not getting how this covers all cases. 
>>>>> >>>>> So we've set the transmute flag on the root inode. Files and directories >>>>> created in the root directory get the same label, and directories also >>>>> get the transmute attribute. That's all fine. >>>>> >>>>> What about an existing directory in the filesystem that already has a >>>>> Slate label? I'm not getting what happens with this directory, or for >>>>> new files created in this directory, which also relates to my other >>>>> questions below. >>>>> >>>>> Also an aside - smk_access_entry looks weird. may is initialized to >>>>> -ENOENT, and then rule_list is searched for a rule which matches the >>>>> object and subject labels. Presumably it's possible that no rule could >>>>> be found, otherwise the prior initialization of may is pointless. If >>>>> this happens the following code treats it as though it always contains >>>>> access flags even though it might contain -ENOENT. Nothing bad actually >>>>> happens with a two's compliement representation of -ENOENT since it will >>>>> just set a bit that's already set, but it still seems like it should >>>>> have a may > 0 condition, for clarity if for no other reason. >>>> My suggested code is just wrong. I wasn't looking at the whole code, >>>> only the patch, and got myself confused. Apologies. >>>> >>>> If we want to go straight for the jugular how about this? I'm assuming >>>> that inode->i_sb->s_bdev->bd_inode is the inode of the backing store. >>> Yes. >>> >>>> static int smack_inode_permission(struct inode *inode, int mask) >>>> { >>>> struct smk_audit_info ad; >>>> int no_block = mask & MAY_NOT_BLOCK; >>>> int rc; >>>> >>>> mask &= (MAY_READ|MAY_WRITE|MAY_EXEC|MAY_APPEND); >>>> /* >>>> * No permission to check. Existence test. Yup, it's there. 
>>>> 	 */
>>>> 	if (mask == 0)
>>>> 		return 0;
>>>>
>>>> +	if (sb_in_userns(inode->i_sb) &&
>>>> +	    smk_of_inode(inode) != smk_of_inode(inode->i_sb->s_bdev->bd_inode))
>>>> +		return -EACCES;
>>>> +
>>>> 	/* May be droppable after audit */
>>>> 	if (no_block)
>>>> 		return -ECHILD;
>>>> 	smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_INODE);
>>>> 	smk_ad_setfield_u_fs_inode(&ad, inode);
>>>> 	rc = smk_curacc(smk_of_inode(inode), mask, &ad);
>>>> 	rc = smk_bu_inode(inode, mask, rc);
>>>> 	return rc;
>>>> }
>>> Hmm, okay. I think I've been a little confused all this time about how
>>> you want to handle these unprivileged mounts.
>> Not your problem. I'm not the most consistent of reviewers.
>>
>>> Originally I thought you wanted all objects in the filesystem to get the
>>> same label as the backing store. That's what I tried to implement
>>> originally, i.e. smk_root=smk_default=smk_of_inode(...->bd_inode), then
>>> assign every object (new and existing) smk_default and completely ignore
>>> the labels on disk.
>> I want everything to have the label of the backing store, but
>> I don't want to ignore it if it somehow got something else. Because
>> the only legitimate label for this example is Rubble, I want to
>> reject anything else that appears. If someone builds a filesystem
>> by hand with Slate labels I want it treated "safely".
>>
>>> This is what I currently think you want for user ns mounts:
>>>
>>> 1. smk_root and smk_default are assigned the label of the backing
>>>    device.
>>> 2. s_root is assigned the transmute property.
>>> 3. For existing files:
>>>    a. Files with the same label as the backing device are accessible.
>>>    b. Files with any other label are not accessible.
>> That's right. Accept correct data, reject anything that's not right.
>>
>>> If this is right, there are a couple lingering questions in my mind.
>>>
>>> First, what happens with files created in directories with the same
>>> label as the backing device but without the transmute property set? The
>>> inode for the new file will initially be labeled with smk_of_current(),
>>> but then during d_instantiate it will get smk_default and thus end up
>>> with the label we want. So that seems okay.
>> Yes.
>>
>>> The second is whether files with the SMACK64EXEC attribute are still a
>>> problem. It seems they are, for files with the same label as the backing
>>> store at least. I think we can simply skip the code that reads out this
>>> xattr and sets smk_task for user ns mounts, or else skip assigning the
>>> label to the new task in bprm_set_creds. The latter seems more
>>> consistent with the approach you've suggested for dealing with labels
>>> from disk.
>> Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in
>> smack_d_instantiate for unprivileged mounts would do the trick.
>>
>>> So I guess all of that seems okay, though perhaps a bit restrictive
>>> given that the user who mounted the filesystem already has full access
>>> to the backing store.
>> In truth, there is no reason to expect that the "user" who did the
>> mount will ever have a Smack label that differs from the label of
>> the backing store. If what we've got here seems restrictive, it's
>> because you've got access from someone other than the "user".
>>
>>> Please let me know whether or not this matches up with what you are
>>> thinking, then I can proceed with the implementation.
>> My current mindset is that, if you're going to allow unprivileged
>> mounts of user defined backing stores, this is as safe as we can
>> make it.
> That actually sounds very reasonable to me. It is essentially what we
> do with uid and gids already. I presume the smack namespace support,
> when integrated with all of this, would allow a set of labels to be
> set.
>
> Have I missed a part of the conversation where you talk about filesystems
> that don't have support for storing labels? Filesystems like vfat, isofs,
> etc.

They are easier. Set smackfsroot=Rubble,smackfsdef=Rubble and all objects
there will get labeled Rubble. Processes with different labels that can
write there will end up creating Rubble objects. For privileged mounts
you can set the values at will. For unprivileged mounts, you should take
the label values from the backing store.

>
> Eric
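The rule agreed on in the message above (on a user namespace mount, accept inodes that carry the backing store's label and reject everything else before the normal Smack check runs) can be modelled in userspace. This is only a sketch: `struct model_sb`, `struct model_inode` and `model_label_gate()` are hypothetical stand-ins invented for illustration, not kernel structures or Smack functions, and labels are compared as strings rather than as `struct smack_known` pointers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace stand-in for a superblock; not the kernel's. */
struct model_sb {
	bool in_userns;			/* mounted from a non-initial user namespace */
	const char *backing_label;	/* Smack label of the backing store */
};

/* Hypothetical userspace stand-in for an inode. */
struct model_inode {
	const struct model_sb *sb;
	const char *label;		/* label found on disk or set at creation */
};

/*
 * Label gate only: returns 0 if the regular access check may proceed,
 * -EACCES if the inode carries a label other than the backing store's.
 * Mounts that are not user namespace mounts are never rejected here.
 */
int model_label_gate(const struct model_inode *inode)
{
	if (inode->sb->in_userns &&
	    strcmp(inode->label, inode->sb->backing_label) != 0)
		return -EACCES;
	return 0;
}
```

With a backing store labeled Rubble, an inode labeled Rubble passes the gate and a hand-built Slate inode gets -EACCES. The same Slate inode on a mount made from the initial user namespace is left to the ordinary rule lookup.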
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
From: Seth Forshee @ 2015-07-28 20:40 UTC
To: Casey Schaufler
Cc: Stephen Smalley, Andy Lutomirski, Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote:
> > This is what I currently think you want for user ns mounts:
> >
> > 1. smk_root and smk_default are assigned the label of the backing
> >    device.
> > 2. s_root is assigned the transmute property.
> > 3. For existing files:
> >    a. Files with the same label as the backing device are accessible.
> >    b. Files with any other label are not accessible.
>
> That's right. Accept correct data, reject anything that's not right.
>
> > If this is right, there are a couple lingering questions in my mind.
> >
> > First, what happens with files created in directories with the same
> > label as the backing device but without the transmute property set? The
> > inode for the new file will initially be labeled with smk_of_current(),
> > but then during d_instantiate it will get smk_default and thus end up
> > with the label we want. So that seems okay.
>
> Yes.
>
> > The second is whether files with the SMACK64EXEC attribute are still a
> > problem. It seems they are, for files with the same label as the backing
> > store at least. I think we can simply skip the code that reads out this
> > xattr and sets smk_task for user ns mounts, or else skip assigning the
> > label to the new task in bprm_set_creds. The latter seems more
> > consistent with the approach you've suggested for dealing with labels
> > from disk.
>
> Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in
> smack_d_instantiate for unprivileged mounts would do the trick.
>
> > So I guess all of that seems okay, though perhaps a bit restrictive
> > given that the user who mounted the filesystem already has full access
> > to the backing store.
>
> In truth, there is no reason to expect that the "user" who did the
> mount will ever have a Smack label that differs from the label of
> the backing store. If what we've got here seems restrictive, it's
> because you've got access from someone other than the "user".
>
> > Please let me know whether or not this matches up with what you are
> > thinking, then I can proceed with the implementation.
>
> My current mindset is that, if you're going to allow unprivileged
> mounts of user defined backing stores, this is as safe as we can
> make it.

All right, I've got a patch which I think does this, and I've managed to
do some testing to confirm that it behaves like I expect. How does this
look?

What's missing is getting the label from the block device inode; as
Stephen discovered, the inode that I thought we could get the label from
turned out to be the wrong one. Afaict we would need a new hook in order
to do that, so for now I'm using the label of the process calling
mount.

---

diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index a143328f75eb..8e631a66b03c 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -662,6 +662,8 @@ static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data)
 		skp = smk_of_current();
 		sp->smk_root = skp;
 		sp->smk_default = skp;
+		if (sb_in_userns(sb))
+			transmute = 1;
 	}
 	/*
 	 * Initialize the root inode.
@@ -1023,6 +1025,12 @@ static int smack_inode_permission(struct inode *inode, int mask)
 	if (mask == 0)
 		return 0;
 
+	if (sb_in_userns(inode->i_sb)) {
+		struct superblock_smack *sbsp = inode->i_sb->s_security;
+		if (smk_of_inode(inode) != sbsp->smk_root)
+			return -EACCES;
+	}
+
 	/* May be droppable after audit */
 	if (no_block)
 		return -ECHILD;
@@ -3220,14 +3228,16 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
 			if (rc >= 0)
 				transflag = SMK_INODE_TRANSMUTE;
 		}
-		/*
-		 * Don't let the exec or mmap label be "*" or "@".
-		 */
-		skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
-		if (IS_ERR(skp) || skp == &smack_known_star ||
-		    skp == &smack_known_web)
-			skp = NULL;
-		isp->smk_task = skp;
+		if (!sb_in_userns(inode->i_sb)) {
+			/*
+			 * Don't let the exec or mmap label be "*" or "@".
+			 */
+			skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
+			if (IS_ERR(skp) || skp == &smack_known_star ||
+			    skp == &smack_known_web)
+				skp = NULL;
+			isp->smk_task = skp;
+		}
 
 		skp = smk_fetch(XATTR_NAME_SMACKMMAP, inode, dp);
 		if (IS_ERR(skp) || skp == &smack_known_star ||
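The last hunk of the patch above changes how the task label from SMACK64EXEC is picked up. A minimal userspace sketch of that decision follows. `model_task_label()` is a hypothetical helper written for illustration, labels are plain strings instead of `struct smack_known` pointers, and the sketch assumes that an inode's `smk_task` starts out as NULL when the patch skips the assignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the SMACK64EXEC handling in the patch:
 *  - on a user namespace mount the xattr is not consulted at all;
 *  - elsewhere a missing xattr, "*" or "@" yields no task label.
 * Returns the label a new task would run with, or NULL for "none".
 */
const char *model_task_label(bool sb_in_userns, const char *exec_xattr)
{
	if (sb_in_userns)
		return NULL;		/* never trust the on-disk exec label */
	if (exec_xattr == NULL)
		return NULL;		/* no SMACK64EXEC attribute present */
	if (strcmp(exec_xattr, "*") == 0 || strcmp(exec_xattr, "@") == 0)
		return NULL;		/* star and web labels are refused */
	return exec_xattr;
}
```

In this model a binary carrying SMACK64EXEC=Slate inside an unprivileged image yields no task label, so the process keeps the label it already had. The same binary on a mount from the initial namespace would run as Slate.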
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
From: Casey Schaufler @ 2015-07-30 16:18 UTC
To: Seth Forshee
Cc: Stephen Smalley, Andy Lutomirski, Eric W. Biederman, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

On 7/28/2015 1:40 PM, Seth Forshee wrote:
> On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote:
>>> This is what I currently think you want for user ns mounts:
>>>
>>> 1. smk_root and smk_default are assigned the label of the backing
>>>    device.
>>> 2. s_root is assigned the transmute property.
>>> 3. For existing files:
>>>    a. Files with the same label as the backing device are accessible.
>>>    b. Files with any other label are not accessible.
>> That's right. Accept correct data, reject anything that's not right.
>>
>>> If this is right, there are a couple lingering questions in my mind.
>>>
>>> First, what happens with files created in directories with the same
>>> label as the backing device but without the transmute property set? The
>>> inode for the new file will initially be labeled with smk_of_current(),
>>> but then during d_instantiate it will get smk_default and thus end up
>>> with the label we want. So that seems okay.
>> Yes.
>>
>>> The second is whether files with the SMACK64EXEC attribute are still a
>>> problem. It seems they are, for files with the same label as the backing
>>> store at least. I think we can simply skip the code that reads out this
>>> xattr and sets smk_task for user ns mounts, or else skip assigning the
>>> label to the new task in bprm_set_creds. The latter seems more
>>> consistent with the approach you've suggested for dealing with labels
>>> from disk.
>> Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in
>> smack_d_instantiate for unprivileged mounts would do the trick.
>>
>>> So I guess all of that seems okay, though perhaps a bit restrictive
>>> given that the user who mounted the filesystem already has full access
>>> to the backing store.
>> In truth, there is no reason to expect that the "user" who did the
>> mount will ever have a Smack label that differs from the label of
>> the backing store. If what we've got here seems restrictive, it's
>> because you've got access from someone other than the "user".
>>
>>> Please let me know whether or not this matches up with what you are
>>> thinking, then I can proceed with the implementation.
>> My current mindset is that, if you're going to allow unprivileged
>> mounts of user defined backing stores, this is as safe as we can
>> make it.
> All right, I've got a patch which I think does this, and I've managed to
> do some testing to confirm that it behaves like I expect. How does this
> look?
>
> What's missing is getting the label from the block device inode; as
> Stephen discovered, the inode that I thought we could get the label from
> turned out to be the wrong one. Afaict we would need a new hook in order
> to do that, so for now I'm using the label of the process calling
> mount.

That will be OK if the mount processing checks for write access
to the backing store. I haven't looked to see if it does. If it
doesn't, the problems should be pretty obvious.

>
> ---
>
> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
> index a143328f75eb..8e631a66b03c 100644
> --- a/security/smack/smack_lsm.c
> +++ b/security/smack/smack_lsm.c
> @@ -662,6 +662,8 @@ static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data)
>  		skp = smk_of_current();
>  		sp->smk_root = skp;
>  		sp->smk_default = skp;
> +		if (sb_in_userns(sb))
> +			transmute = 1;
>  	}
>  	/*
>  	 * Initialize the root inode.
> @@ -1023,6 +1025,12 @@ static int smack_inode_permission(struct inode *inode, int mask)
>  	if (mask == 0)
>  		return 0;
>
> +	if (sb_in_userns(inode->i_sb)) {
> +		struct superblock_smack *sbsp = inode->i_sb->s_security;
> +		if (smk_of_inode(inode) != sbsp->smk_root)
> +			return -EACCES;
> +	}
> +
>  	/* May be droppable after audit */
>  	if (no_block)
>  		return -ECHILD;
> @@ -3220,14 +3228,16 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
>  			if (rc >= 0)
>  				transflag = SMK_INODE_TRANSMUTE;
>  		}
> -		/*
> -		 * Don't let the exec or mmap label be "*" or "@".
> -		 */
> -		skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
> -		if (IS_ERR(skp) || skp == &smack_known_star ||
> -		    skp == &smack_known_web)
> -			skp = NULL;
> -		isp->smk_task = skp;
> +		if (!sb_in_userns(inode->i_sb)) {
> +			/*
> +			 * Don't let the exec or mmap label be "*" or "@".
> +			 */
> +			skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
> +			if (IS_ERR(skp) || skp == &smack_known_star ||
> +			    skp == &smack_known_web)
> +				skp = NULL;
> +			isp->smk_task = skp;
> +		}
>
>  		skp = smk_fetch(XATTR_NAME_SMACKMMAP, inode, dp);
>  		if (IS_ERR(skp) || skp == &smack_known_star ||
>
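The condition raised in the reply above, that using the mounter's label is only safe if the mounter could write the backing store anyway, can be illustrated with a toy rule table. This is a simplified sketch, not Smack's implementation: `struct model_rule`, `model_may_write()` and `model_mount_check()` are hypothetical names, the special built-in labels are ignored, and whether the real mount path performs such a check is exactly what the thread leaves open.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One entry of a toy rule table in "subject object access" form. */
struct model_rule {
	const char *subject;
	const char *object;
	const char *access;	/* subset of "rwxat" */
};

/* True if 'subject' may write 'object' under the toy rules. */
bool model_may_write(const struct model_rule *rules, size_t nrules,
		     const char *subject, const char *object)
{
	size_t i;

	/* A subject always has access to objects carrying its own label. */
	if (strcmp(subject, object) == 0)
		return true;

	for (i = 0; i < nrules; i++) {
		if (strcmp(rules[i].subject, subject) == 0 &&
		    strcmp(rules[i].object, object) == 0)
			return strchr(rules[i].access, 'w') != NULL;
	}
	return false;		/* no matching rule: deny */
}

/* Hypothetical mount-time gate: refuse unless the mounter can write the store. */
int model_mount_check(const struct model_rule *rules, size_t nrules,
		      const char *mounter, const char *backing)
{
	return model_may_write(rules, nrules, mounter, backing) ? 0 : -EACCES;
}
```

With the rule "Flintstone Rubble rwxa" from earlier in the thread, a Flintstone process may mount a Rubble backing store in this model, while a process whose only rule grants read access is refused.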
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
@ 2015-07-30 16:18 ` Casey Schaufler
  0 siblings, 0 replies; 138+ messages in thread
From: Casey Schaufler @ 2015-07-30 16:18 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Serge Hallyn, linux-kernel, Andy Lutomirski, LSM List, SELinux-NSA,
	Linux FS Devel, Stephen Smalley, Alexander Viro

On 7/28/2015 1:40 PM, Seth Forshee wrote:
> On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote:
>>> This is what I currently think you want for user ns mounts:
>>>
>>>  1. smk_root and smk_default are assigned the label of the backing
>>>     device.
>>>  2. s_root is assigned the transmute property.
>>>  3. For existing files:
>>>     a. Files with the same label as the backing device are accessible.
>>>     b. Files with any other label are not accessible.
>> That's right. Accept correct data, reject anything that's not right.
>>
>>> If this is right, there are a couple lingering questions in my mind.
>>>
>>> First, what happens with files created in directories with the same
>>> label as the backing device but without the transmute property set? The
>>> inode for the new file will initially be labeled with smk_of_current(),
>>> but then during d_instantiate it will get smk_default and thus end up
>>> with the label we want. So that seems okay.
>> Yes.
>>
>>> The second is whether files with the SMACK64EXEC attribute is still a
>>> problem. It seems it is, for files with the same label as the backing
>>> store at least. I think we can simply skip the code that reads out this
>>> xattr and sets smk_task for user ns mounts, or else skip assigning the
>>> label to the new task in bprm_set_creds. The latter seems more
>>> consistent with the approach you've suggested for dealing with labels
>>> from disk.
>> Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in
>> smack_d_instantiate for unprivileged mounts would do the trick.
>>
>>> So I guess all of that seems okay, though perhaps a bit restrictive
>>> given that the user who mounted the filesystem already has full access
>>> to the backing store.
>> In truth, there is no reason to expect that the "user" who did the
>> mount will ever have a Smack label that differs from the label of
>> the backing store. If what we've got here seems restrictive, it's
>> because you've got access from someone other than the "user".
>>
>>> Please let me know whether or not this matches up with what you are
>>> thinking, then I can proceed with the implementation.
>> My current mindset is that, if you're going to allow unprivileged
>> mounts of user defined backing stores, this is as safe as we can
>> make it.
> All right, I've got a patch which I think does this, and I've managed to
> do some testing to confirm that it behaves like I expect. How does this
> look?
>
> What's missing is getting the label from the block device inode; as
> Stephen discovered the inode that I thought we could get the label from
> turned out to be the wrong one. Afaict we would need a new hook in order
> to do that, so for now I'm using the label of the process calling
> mount.

That will be OK if the mount processing checks for write access to
the backing store. I haven't looked to see if it does. If it doesn't,
the problems should be pretty obvious.

>
> ---
>
> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
> index a143328f75eb..8e631a66b03c 100644
> --- a/security/smack/smack_lsm.c
> +++ b/security/smack/smack_lsm.c
> @@ -662,6 +662,8 @@ static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data)
>  		skp = smk_of_current();
>  		sp->smk_root = skp;
>  		sp->smk_default = skp;
> +		if (sb_in_userns(sb))
> +			transmute = 1;
>  	}
>  	/*
>  	 * Initialize the root inode.
> @@ -1023,6 +1025,12 @@ static int smack_inode_permission(struct inode *inode, int mask)
>  	if (mask == 0)
>  		return 0;
>
> +	if (sb_in_userns(inode->i_sb)) {
> +		struct superblock_smack *sbsp = inode->i_sb->s_security;
> +		if (smk_of_inode(inode) != sbsp->smk_root)
> +			return -EACCES;
> +	}
> +
>  	/* May be droppable after audit */
>  	if (no_block)
>  		return -ECHILD;
> @@ -3220,14 +3228,16 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
>  			if (rc >= 0)
>  				transflag = SMK_INODE_TRANSMUTE;
>  		}
> -		/*
> -		 * Don't let the exec or mmap label be "*" or "@".
> -		 */
> -		skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
> -		if (IS_ERR(skp) || skp == &smack_known_star ||
> -		    skp == &smack_known_web)
> -			skp = NULL;
> -		isp->smk_task = skp;
> +		if (!sb_in_userns(inode->i_sb)) {
> +			/*
> +			 * Don't let the exec or mmap label be "*" or "@".
> +			 */
> +			skp = smk_fetch(XATTR_NAME_SMACKEXEC, inode, dp);
> +			if (IS_ERR(skp) || skp == &smack_known_star ||
> +			    skp == &smack_known_web)
> +				skp = NULL;
> +			isp->smk_task = skp;
> +		}
>
>  		skp = smk_fetch(XATTR_NAME_SMACKMMAP, inode, dp);
>  		if (IS_ERR(skp) || skp == &smack_known_star ||

^ permalink raw reply	[flat|nested] 138+ messages in thread
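The label rule in the smack_inode_permission() hunk of the quoted patch can be modeled outside the kernel. What follows is only a userspace sketch of that one rule, not the patch itself: struct model_sb, struct model_inode and model_inode_permission() are hypothetical names invented for illustration, and labels are compared with strcmp() here, whereas the patch compares label pointers.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct model_sb {
	bool in_userns;          /* models sb_in_userns(sb) */
	const char *root_label;  /* models sbsp->smk_root */
};

struct model_inode {
	const struct model_sb *sb;
	const char *label;       /* models smk_of_inode(inode) */
};

/*
 * Returns -EACCES when the inode lives on a user ns mount and its label
 * differs from the superblock root label. Returns 0 otherwise, meaning
 * "continue with the normal access checks" (which are not modeled here).
 */
int model_inode_permission(const struct model_inode *inode, int mask)
{
	/* Nothing requested, nothing to check. */
	if (mask == 0)
		return 0;

	if (inode->sb->in_userns &&
	    strcmp(inode->label, inode->sb->root_label) != 0)
		return -EACCES;

	return 0;
}
```

With a user ns superblock whose root label is "User", an inode labeled "User" passes this step and an inode labeled "System" gets -EACCES; on a superblock that is not in a user ns the step never rejects anything.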
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-30 16:18 ` Casey Schaufler
@ 2015-07-30 17:05 ` Eric W. Biederman
  -1 siblings, 0 replies; 138+ messages in thread
From: Eric W. Biederman @ 2015-07-30 17:05 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Seth Forshee, Stephen Smalley, Andy Lutomirski, Alexander Viro,
	Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel

Casey Schaufler <casey@schaufler-ca.com> writes:

> On 7/28/2015 1:40 PM, Seth Forshee wrote:
>> On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote:
>>>> This is what I currently think you want for user ns mounts:
>>>>
>>>>  1. smk_root and smk_default are assigned the label of the backing
>>>>     device.
>>>>  2. s_root is assigned the transmute property.
>>>>  3. For existing files:
>>>>     a. Files with the same label as the backing device are accessible.
>>>>     b. Files with any other label are not accessible.
>>> That's right. Accept correct data, reject anything that's not right.
>>>
>>>> If this is right, there are a couple lingering questions in my mind.
>>>>
>>>> First, what happens with files created in directories with the same
>>>> label as the backing device but without the transmute property set? The
>>>> inode for the new file will initially be labeled with smk_of_current(),
>>>> but then during d_instantiate it will get smk_default and thus end up
>>>> with the label we want. So that seems okay.
>>> Yes.
>>>
>>>> The second is whether files with the SMACK64EXEC attribute is still a
>>>> problem. It seems it is, for files with the same label as the backing
>>>> store at least. I think we can simply skip the code that reads out this
>>>> xattr and sets smk_task for user ns mounts, or else skip assigning the
>>>> label to the new task in bprm_set_creds. The latter seems more
>>>> consistent with the approach you've suggested for dealing with labels
>>>> from disk.
>>> Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in
>>> smack_d_instantiate for unprivileged mounts would do the trick.
>>>
>>>> So I guess all of that seems okay, though perhaps a bit restrictive
>>>> given that the user who mounted the filesystem already has full access
>>>> to the backing store.
>>> In truth, there is no reason to expect that the "user" who did the
>>> mount will ever have a Smack label that differs from the label of
>>> the backing store. If what we've got here seems restrictive, it's
>>> because you've got access from someone other than the "user".
>>>
>>>> Please let me know whether or not this matches up with what you are
>>>> thinking, then I can proceed with the implementation.
>>> My current mindset is that, if you're going to allow unprivileged
>>> mounts of user defined backing stores, this is as safe as we can
>>> make it.
>> All right, I've got a patch which I think does this, and I've managed to
>> do some testing to confirm that it behaves like I expect. How does this
>> look?
>>
>> What's missing is getting the label from the block device inode; as
>> Stephen discovered the inode that I thought we could get the label from
>> turned out to be the wrong one. Afaict we would need a new hook in order
>> to do that, so for now I'm using the label of the process calling
>> mount.
>
> That will be OK if the mount processing checks for write access to
> the backing store. I haven't looked to see if it does. If it doesn't
> the problems should be pretty obvious.

do_new_mount
  vfs_kern_mount
    mount_fs
      ...
        mount_bdev
          blkdev_get_by_path(...,FMODE_READ| FMODE_WRITE | FMODE_EXCL,...)
            lookup_bdev
              kern_path
                filename_lookup
                  path_lookupat
                    lookup_last
                      walk_component
            blkdev_get(...,mode,...)
              __blkdev_get(...,mode,...)
                devcgroup_inode_permission(bdev->bd_inode, perm)

*scratches my head*

It looks like we don't actually check the permissions on the block
device. Tomoyo has a hack for it. nfsd does something. There is
devcgroup silliness.

But overall it looks like we depend on capable(CAP_SYS_ADMIN).

Seth, I do believe we have found another area of the vfs we will need to
shore up before allowing unprivileged mounts of block device based
filesystems.

It looks like there are enough hacks that someone with a clue coming
through and making the code make more sense seems like a good idea
anyway.

Eric

^ permalink raw reply	[flat|nested] 138+ messages in thread
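The gap described in the message above is that the device is opened with FMODE_READ | FMODE_WRITE | FMODE_EXCL without the caller's permission on the device node ever being consulted. A minimal sketch of the missing step, written as a userspace model and not as a proposed kernel change: translate the open mode into a permission mask, then compare it with what the caller is allowed on the node. Every name and constant value below (MODEL_FMODE_*, MODEL_MAY_*, model_fmode_to_may(), model_bdev_permission()) is a hypothetical stand-in for illustration and is not taken from kernel headers.

```c
#include <errno.h>

/* Model constants; the values are illustrative, not kernel definitions. */
enum { MODEL_FMODE_READ = 1, MODEL_FMODE_WRITE = 2, MODEL_FMODE_EXCL = 4 };
enum { MODEL_MAY_WRITE = 2, MODEL_MAY_READ = 4 };

/* Translate an open mode into the permission bits it requires. */
unsigned int model_fmode_to_may(unsigned int fmode)
{
	unsigned int may = 0;

	if (fmode & MODEL_FMODE_READ)
		may |= MODEL_MAY_READ;
	if (fmode & MODEL_FMODE_WRITE)
		may |= MODEL_MAY_WRITE;
	/* The exclusive-open bit does not map to a permission bit here. */
	return may;
}

/*
 * allowed: permission bits the caller holds on the device node.
 * Returns 0 if every bit required by fmode is held, -EACCES otherwise.
 */
int model_bdev_permission(unsigned int allowed, unsigned int fmode)
{
	unsigned int need = model_fmode_to_may(fmode);

	return (allowed & need) == need ? 0 : -EACCES;
}
```

In this model a caller holding only read permission on the node is refused a read-write open, which is the case that matters for a mount that uses the label of the mounting process.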
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-30 17:05 ` Eric W. Biederman @ 2015-07-30 17:25 ` Seth Forshee -1 siblings, 0 replies; 138+ messages in thread From: Seth Forshee @ 2015-07-30 17:25 UTC (permalink / raw) To: Eric W. Biederman Cc: Casey Schaufler, Stephen Smalley, Andy Lutomirski, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel On Thu, Jul 30, 2015 at 12:05:27PM -0500, Eric W. Biederman wrote: > Casey Schaufler <casey@schaufler-ca.com> writes: > > > On 7/28/2015 1:40 PM, Seth Forshee wrote: > >> On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote: > >>>> This is what I currently think you want for user ns mounts: > >>>> > >>>> 1. smk_root and smk_default are assigned the label of the backing > >>>> device. > >>>> 2. s_root is assigned the transmute property. > >>>> 3. For existing files: > >>>> a. Files with the same label as the backing device are accessible. > >>>> b. Files with any other label are not accessible. > >>> That's right. Accept correct data, reject anything that's not right. > >>> > >>>> If this is right, there are a couple lingering questions in my mind. > >>>> > >>>> First, what happens with files created in directories with the same > >>>> label as the backing device but without the transmute property set? The > >>>> inode for the new file will initially be labeled with smk_of_current(), > >>>> but then during d_instantiate it will get smk_default and thus end up > >>>> with the label we want. So that seems okay. > >>> Yes. > >>> > >>>> The second is whether files with the SMACK64EXEC attribute is still a > >>>> problem. It seems it is, for files with the same label as the backing > >>>> store at least. I think we can simply skip the code that reads out this > >>>> xattr and sets smk_task for user ns mounts, or else skip assigning the > >>>> label to the new task in bprm_set_creds. 
The latter seems more > >>>> consistent with the approach you've suggested for dealing with labels > >>>> from disk. > >>> Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in > >>> smack_d_instantiate for unprivileged mounts would do the trick. > >>> > >>>> So I guess all of that seems okay, though perhaps a bit restrictive > >>>> given that the user who mounted the filesystem already has full access > >>>> to the backing store. > >>> In truth, there is no reason to expect that the "user" who did the > >>> mount will ever have a Smack label that differs from the label of > >>> the backing store. If what we've got here seems restrictive, it's > >>> because you've got access from someone other than the "user". > >>> > >>>> Please let me know whether or not this matches up with what you are > >>>> thinking, then I can procede with the implementation. > >>> My current mindset is that, if you're going to allow unprivileged > >>> mounts of user defined backing stores, this is as safe as we can > >>> make it. > >> All right, I've got a patch which I think does this, and I've managed to > >> do some testing to confirm that it behaves like I expect. How does this > >> look? > >> > >> What's missing is getting the label from the block device inode; as > >> Stephen discovered the inode that I thought we could get the label from > >> turned out to be the wrong one. Afaict we would need a new hook in order > >> to do that, so for now I'm using the label of the proccess calling > >> mount. > > > > That will be OK if the mount processing checks for write access to > > the backing store. I haven't looked to see if it does. If it doesn't > > the problems should be pretty obvious. > > > do_new_mount > vfs_kern_mount > mount_fs > ... > mount_bdev > blkdev_get_by_path(...,FMODE_READ| FMODE_WRITE | FMODE_EXCL,...) > lookup_bdev > kern_path > filename_lookup > path_lookupat > lookup_last > walk_component > blkdev_get(...,mode,...) > __blkdev_get(...,mode,...) 
>                 devcgroup_inode_permission(bdev->bd_inode, perm)
>
> *scratches my head*
>
> It looks like we don't actually check the permissions on the block
> device. Tomoyo has a hack for it. nfsd does something. There is
> devcgroup silliness.
>
> But overall it looks like we depend on capable(CAP_SYS_ADMIN).
>
> Seth, I do believe we have found another area of the vfs we will need to
> shore up before allowing unprivileged mounts of block device based
> filesystems.
>
> It looks like there are enough hacks that someone with a clue coming
> through and making the code make more sense seems like a good idea
> anyway.

Yep, I just came to the same conclusion myself, and I also verified the
behavior empirically. That's definitely a problem. I'll get to work on
fixing that.

Seth

^ permalink raw reply	[flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts 2015-07-30 17:25 ` Seth Forshee @ 2015-07-30 17:33 ` Eric W. Biederman -1 siblings, 0 replies; 138+ messages in thread From: Eric W. Biederman @ 2015-07-30 17:33 UTC (permalink / raw) To: Seth Forshee Cc: Casey Schaufler, Stephen Smalley, Andy Lutomirski, Alexander Viro, Linux FS Devel, LSM List, SELinux-NSA, Serge Hallyn, linux-kernel Seth Forshee <seth.forshee@canonical.com> writes: > On Thu, Jul 30, 2015 at 12:05:27PM -0500, Eric W. Biederman wrote: >> Casey Schaufler <casey@schaufler-ca.com> writes: >> >> > On 7/28/2015 1:40 PM, Seth Forshee wrote: >> >> On Wed, Jul 22, 2015 at 05:05:17PM -0700, Casey Schaufler wrote: >> >>>> This is what I currently think you want for user ns mounts: >> >>>> >> >>>> 1. smk_root and smk_default are assigned the label of the backing >> >>>> device. >> >>>> 2. s_root is assigned the transmute property. >> >>>> 3. For existing files: >> >>>> a. Files with the same label as the backing device are accessible. >> >>>> b. Files with any other label are not accessible. >> >>> That's right. Accept correct data, reject anything that's not right. >> >>> >> >>>> If this is right, there are a couple lingering questions in my mind. >> >>>> >> >>>> First, what happens with files created in directories with the same >> >>>> label as the backing device but without the transmute property set? The >> >>>> inode for the new file will initially be labeled with smk_of_current(), >> >>>> but then during d_instantiate it will get smk_default and thus end up >> >>>> with the label we want. So that seems okay. >> >>> Yes. >> >>> >> >>>> The second is whether files with the SMACK64EXEC attribute is still a >> >>>> problem. It seems it is, for files with the same label as the backing >> >>>> store at least. I think we can simply skip the code that reads out this >> >>>> xattr and sets smk_task for user ns mounts, or else skip assigning the >> >>>> label to the new task in bprm_set_creds. 
The latter seems more >> >>>> consistent with the approach you've suggested for dealing with labels >> >>>> from disk. >> >>> Yes, I think that skipping the smk_fetch(XATTR_NAME_SMACKEXEC, ...) in >> >>> smack_d_instantiate for unprivileged mounts would do the trick. >> >>> >> >>>> So I guess all of that seems okay, though perhaps a bit restrictive >> >>>> given that the user who mounted the filesystem already has full access >> >>>> to the backing store. >> >>> In truth, there is no reason to expect that the "user" who did the >> >>> mount will ever have a Smack label that differs from the label of >> >>> the backing store. If what we've got here seems restrictive, it's >> >>> because you've got access from someone other than the "user". >> >>> >> >>>> Please let me know whether or not this matches up with what you are >> >>>> thinking, then I can procede with the implementation. >> >>> My current mindset is that, if you're going to allow unprivileged >> >>> mounts of user defined backing stores, this is as safe as we can >> >>> make it. >> >> All right, I've got a patch which I think does this, and I've managed to >> >> do some testing to confirm that it behaves like I expect. How does this >> >> look? >> >> >> >> What's missing is getting the label from the block device inode; as >> >> Stephen discovered the inode that I thought we could get the label from >> >> turned out to be the wrong one. Afaict we would need a new hook in order >> >> to do that, so for now I'm using the label of the proccess calling >> >> mount. >> > >> > That will be OK if the mount processing checks for write access to >> > the backing store. I haven't looked to see if it does. If it doesn't >> > the problems should be pretty obvious. >> >> >> do_new_mount >> vfs_kern_mount >> mount_fs >> ... >> mount_bdev >> blkdev_get_by_path(...,FMODE_READ| FMODE_WRITE | FMODE_EXCL,...) 
>>             lookup_bdev
>>               kern_path
>>                 filename_lookup
>>                   path_lookupat
>>                     lookup_last
>>                       walk_component
>>             blkdev_get(...,mode,...)
>>               __blkdev_get(...,mode,...)
>>                 devcgroup_inode_permission(bdev->bd_inode, perm)
>>
>> *scratches my head*
>>
>> It looks like we don't actually check the permissions on the block
>> device. Tomoyo has a hack for it. nfsd does something. There is
>> devcgroup silliness.
>>
>> But overall it looks like we depend on capable(CAP_SYS_ADMIN).
>>
>> Seth, I do believe we have found another area of the vfs we will need to
>> shore up before allowing unprivileged mounts of block device based
>> filesystems.
>>
>> It looks like there are enough hacks that someone with a clue coming
>> through and making the code make more sense seems like a good idea
>> anyway.
>
> Yep, I just came to the same conclusion myself, and I also verified the
> behavior empirically. That's definitely a problem. I'll get to work on
> fixing that.

At a quick glance it looks like lookup_bdev and most of its callers
need to be modified to potentially do the additional permission
checking.

I expect we could move the devcgroup checks into whatever new checks we
wind up adding.

Fun, fun, fun.

Eric

^ permalink raw reply	[flat|nested] 138+ messages in thread
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-16 21:42 ` Casey Schaufler
@ 2015-07-17 13:21 ` Seth Forshee
  -1 siblings, 0 replies; 138+ messages in thread
From: Seth Forshee @ 2015-07-17 13:21 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Eric W. Biederman, Alexander Viro, linux-fsdevel,
	linux-security-module, selinux, Serge Hallyn, Andy Lutomirski,
	linux-kernel

On Thu, Jul 16, 2015 at 02:42:22PM -0700, Casey Schaufler wrote:

<snip>

> > I welcome feedback about anything I've missed, but stating generally
> > that you think I probably missed something isn't very helpful.
>
> True enough. I hope I've explained myself above.

Thanks, that definitely clarified where we were having a disconnect.
Andy's done a fantastic job explaining how those concerns are addressed.

> > The LSM issue is thornier than the rest of it though, which is why I
> > specifically asked for review there in the cover letter. There's a lot
> > of complexity and nuance, and I still don't have a grasp on all the
> > subtleties. One such subtlety is the full impact of simply ignoring the
> > security labels on disk (but I am still confused as to why this is
> > different from filesystems which don't support xattrs at all).
>
> If you can mount a filesystem such that the labels are ignored you
> are effectively specifying that the Smack label on the files be
> determined by the defaulting rules. With CAP_MAC_ADMIN that's fine.
> Without it, it's not.
>
> > I was unaware of Lukasz's patches until yesterday, and I will have a
> > look at them. But since we don't have the LSM support for user
> > namespaces yet, I don't see the problem with doing something safe for
> > LSMs initially and evolving the LSM integration for user ns mounts along
> > with the rest of the user ns integration.
>
> Ignoring the security attributes is not safe!

Understood. It's surely safe for each LSM to deny such mounts until it
has a way to handle them safely though.

I'm not trying to completely punt on the issue of security modules, just
break this down into more manageable chunks. You've given good guidance
for Smack (thanks very much for that), so I can plan to work on that
soon.

> > Your point is taken about my less-than-expert opinion about the other
> > security modules. We should at minimum get acks from the maintainers of
> > those modules that unprivileged mounts will not compromise MAC.
>
> I am the Smack maintainer. Unprivileged mounts as you have
> described them compromise MAC. They compromise DAC, too.

It looks like Andy's more or less convinced you that DAC isn't
(additionally?) compromised. And there's a plan for MAC, that the
security module can deny mounts from user namespaces until it has a
solution for allowing them safely.

> > For Smack specifically, I believe my only concern was the SMACK64EXEC
> > attribute, as all the other attributes only affected subjects' access to
> > the files. So maybe it would be possible to simply ignore this attribute
> > in unprivileged mounts and respect the others, even lacking more
> > complete LSM support for user namespaces.
>
> SMACK64EXEC is analogous to the setuid bit, but I would rather see
> exec() of programs with this attribute refused than for it to be
> blindly ignored.

That's fine, it's your call.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 138+ messages in thread
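The two options discussed above for SMACK64EXEC on a user ns mount, ignoring the attribute or refusing the exec, differ only in what happens when the attribute is present. A small userspace sketch of that decision, under stated assumptions: enum model_exec_policy and model_exec_label() are hypothetical names, labels are plain strings, and -EACCES as the refusal error is an assumption made for illustration, not something the thread specifies.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy switch for files on a user ns mount. */
enum model_exec_policy { MODEL_EXEC_IGNORE, MODEL_EXEC_REFUSE };

/*
 * Decide the label a task would run with after exec.
 * exec_label is NULL when the file carries no SMACK64EXEC attribute.
 * Returns 0 and stores the resulting label in *out, or -EACCES when
 * the exec is refused (the error code is an assumption of this model).
 */
int model_exec_label(bool in_userns, enum model_exec_policy policy,
		     const char *exec_label, const char *current_label,
		     const char **out)
{
	/* No attribute: the task keeps its current label. */
	if (exec_label == NULL) {
		*out = current_label;
		return 0;
	}

	/* Ordinary mount: the attribute is honored. */
	if (!in_userns) {
		*out = exec_label;
		return 0;
	}

	/* User ns mount: refuse outright, or ignore the attribute. */
	if (policy == MODEL_EXEC_REFUSE)
		return -EACCES;

	*out = current_label;
	return 0;
}
```

Under MODEL_EXEC_IGNORE a labeled binary on a user ns mount still runs, with the caller's label; under MODEL_EXEC_REFUSE the same exec fails, which makes the presence of the attribute visible instead of silently dropping it.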
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-17 13:21 ` Seth Forshee
@ 2015-07-17 17:14 ` Casey Schaufler
From: Casey Schaufler @ 2015-07-17 17:14 UTC
To: Seth Forshee
Cc: Eric W. Biederman, Alexander Viro, linux-fsdevel, linux-security-module, selinux, Serge Hallyn, Andy Lutomirski, linux-kernel

On 7/17/2015 6:21 AM, Seth Forshee wrote:
> On Thu, Jul 16, 2015 at 02:42:22PM -0700, Casey Schaufler wrote:
>
> <snip>
>
>>> I welcome feedback about anything I've missed, but stating generally
>>> that you think I probably missed something isn't very helpful.
>> True enough. I hope I've explained myself above.
> Thanks, that definitely clarified where we were having a disconnect.
> Andy's done a fantastic job explaining how those concerns are addressed.
>
>>> The LSM issue is thornier than the rest of it though, which is why I
>>> specifically asked for review there in the cover letter. There's a lot
>>> of complexity and nuance, and I still don't have a grasp on all the
>>> subtleties. One such subtlety is the full impact of simply ignoring the
>>> security labels on disk (but I am still confused as to why this is
>>> different from filesystems which don't support xattrs at all).
>> If you can mount a filesystem such that the labels are ignored you
>> are effectively specifying that the Smack label on the files be
>> determined by the defaulting rules. With CAP_MAC_ADMIN that's fine.
>> Without it, it's not.
>>
>>> I was unaware of Lukasz's patches until yesterday, and I will have a
>>> look at them. But since we don't have the LSM support for user
>>> namespaces yet, I don't see the problem with doing something safe for
>>> LSMs initially and evolving the LSM integration for user ns mounts along
>>> with the rest of the user ns integration.
>> Ignoring the security attributes is not safe!
> Understood. It's surely safe for each LSM to deny such mounts until it
> has a way to handle them safely though.
>
> I'm not trying to completely punt on the issue of security modules, just
> break this down into more manageable chunks. You've given good guidance
> for Smack (thanks very much for that), so I can plan to work on that
> soon.
>
>>> Your point is taken about my less-than-expert opinion about the other
>>> security modules. We should at minimum get acks from the maintainers of
>>> those modules that unprivileged mounts will not compromise MAC.
>> I am the Smack maintainer. Unprivileged mounts as you have
>> described them compromise MAC. They compromise DAC, too.
> It looks like Andy's more or less convinced you that DAC isn't
> (additionally?) compromised. And there's a plan for MAC, that the
> security module can deny mounts from user namespaces until it has a
> solution for allowing them safely.

I wouldn't say that Andy has me convinced on DAC. I would say that
he's taken me deeper into the details of namespaces than I feel
comfortable making arguments about. I don't know that he's right,
I just don't know how to argue that he isn't.

Part of what bothers me is the dependence on namespaces. If you
could come up with a mechanism that wasn't dependent on namespaces
it would be much easier for dinosaurs like me to comprehend.

As far as declaring that MAC and namespace owned mounts are
incompatible goes, I think that I said early on that wasn't going
to fly. Too much of the Linux population (Fedora, Android, Tizen, ...)
uses MAC for the feature to be considered ready for general
consumption without it. And no, I don't believe in partial
implementations. You wouldn't get away with putting this in if
it only worked on s370 processors.

>>> For Smack specifically, I believe my only concern was the SMACK64EXEC
>>> attribute, as all the other attributes only affected subjects' access to
>>> the files. So maybe it would be possible to simply ignore this attribute
>>> in unprivileged mounts and respect the others, even lacking more
>>> complete LSM support for user namespaces.
>> SMACK64EXEC is analogous to the setuid bit, but I would rather see
>> exec() of programs with this attribute refused than for it to be
>> blindly ignored.
> That's fine, it's your call.

I said it, but on reflection the current NOSETUID behavior is
as you described it, so I wouldn't change that.

>
> Thanks,
> Seth
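The three ways of treating SMACK64EXEC that come up in this exchange (honor it, ignore it as nosuid-style handling does, or refuse the exec() outright) can be compared in a small userspace model. This is only a sketch under stated assumptions: `exec_label_policy`, `resolve_exec_label`, and the use of `-EACCES` are hypothetical names and choices made for illustration, and none of this is Smack's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical policy knob; none of these names exist in Smack. */
enum exec_label_policy {
	EXEC_LABEL_HONOR,  /* trusted mount: SMACK64EXEC takes effect */
	EXEC_LABEL_IGNORE, /* nosuid-like: keep running with the caller's label */
	EXEC_LABEL_REFUSE  /* fail the exec() instead of dropping the label */
};

/*
 * Decide which label a newly exec'd program would run with.
 * exec_label is NULL when the file carries no SMACK64EXEC attribute.
 * Returns 0 and stores the chosen label in *out, or a negative errno
 * and leaves *out untouched.
 */
int resolve_exec_label(const char *task_label, const char *exec_label,
		       enum exec_label_policy policy, const char **out)
{
	assert(task_label != NULL && out != NULL);

	/* No attribute, or attribute deliberately ignored: label is unchanged. */
	if (exec_label == NULL || policy == EXEC_LABEL_IGNORE) {
		*out = task_label;
		return 0;
	}

	if (policy == EXEC_LABEL_REFUSE)
		return -EACCES; /* error code chosen for illustration only */

	*out = exec_label;
	return 0;
}
```

In this model the position Casey settles on above corresponds to `EXEC_LABEL_IGNORE` for mounts that are treated like nosuid, while his earlier preference corresponds to `EXEC_LABEL_REFUSE`.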
* Re: [PATCH 0/7] Initial support for user namespace owned mounts
  2015-07-16 13:59 ` Seth Forshee
@ 2015-07-16 15:59 ` Seth Forshee
From: Seth Forshee @ 2015-07-16 15:59 UTC
To: Eric W. Biederman
Cc: Alexander Viro, linux-fsdevel, linux-security-module, selinux, Serge Hallyn, Andy Lutomirski, linux-kernel, Casey Schaufler

On Thu, Jul 16, 2015 at 08:59:47AM -0500, Seth Forshee wrote:
> On Wed, Jul 15, 2015 at 10:15:21PM -0500, Eric W. Biederman wrote:
> >
> > Seth I think for the LSMs we should start with:
> >
> > diff --git a/security/security.c b/security/security.c
> > index 062f3c997fdc..5b6ece92a8e5 100644
> > --- a/security/security.c
> > +++ b/security/security.c
> > @@ -310,6 +310,8 @@ int security_sb_statfs(struct dentry *dentry)
> >  int security_sb_mount(const char *dev_name, struct path *path,
> >                         const char *type, unsigned long flags, void *data)
> >  {
> > +	if (current_user_ns() != &init_user_ns)
> > +		return -EPERM;
> >  	return call_int_hook(sb_mount, 0, dev_name, path, type, flags, data);
> >  }
>
> This just makes it impossible to mount from a user namespace. Every
> mount from current_user_ns() != &init_user_ns will fail.

What might work instead is to add a check in security_sb_kern_mount.
Then it would need to check s_user_ns; that way, if proc, sysfs, etc. use
sget_userns(..., &init_user_ns) they can still be mounted in containers.

It would be nicer to have a hook after sget but before fill_super so
that a bunch of work doesn't have to be done and then undone. Right now
there doesn't seem to be any suitable hook.

> > Then we should push this down into all of the lsms.
> > Then we should remove or relax or change the check as appropriate
> > in each lsm.
> >
> > The point is this is good enough to see that it is trivially safe,
> > and this allows us to focus on the core issues, and stop worrying about
> > the lsms for a bit.
> >
> > Then we can focus on each lsm one at a time and take the time to really
> > understand them and talk with their maintainers etc to make certain
> > we get things correct.
> >
> > This should remove the need for your patches 5, 6 and 7. For the
> > immediate future.
>
> I'm still not entirely sure what you were trying to do, maybe refuse to
> mount whenever a security module is loaded? I think this could be a good
> option to start, but couldn't we restrict it to only the LSMs which use
> xattrs for security labels? In situations where the filesystem cannot
> supply security policy metadata I can't think of any reason to disallow
> the mounts.
>
> Seth
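The difference between the two checks discussed in this message can be sketched in plain userspace C. This is a minimal model, not kernel code: `struct user_namespace` and `struct super_block` below are stand-in types with only the fields needed here, and `deny_by_caller_ns` and `deny_by_sb_ns` are hypothetical names for the check in Eric's diff and for the s_user_ns check suggested above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; not the real definitions. */
struct user_namespace {
	int id;
};

struct super_block {
	const struct user_namespace *s_user_ns; /* namespace that owns the sb */
};

/* Models the initial user namespace. */
const struct user_namespace init_user_ns = { 0 };

/*
 * Check keyed on the caller, as in the quoted diff: any mount attempted
 * from outside the initial user namespace is rejected.
 */
int deny_by_caller_ns(const struct user_namespace *current_ns)
{
	assert(current_ns != NULL);
	return current_ns != &init_user_ns ? -EPERM : 0;
}

/*
 * Check keyed on the superblock: only superblocks owned by a
 * non-initial user namespace are rejected, whoever the caller is.
 */
int deny_by_sb_ns(const struct super_block *sb)
{
	assert(sb != NULL);
	return sb->s_user_ns != &init_user_ns ? -EPERM : 0;
}
```

Under these assumptions, a container mounting proc is refused by the caller-based check but allowed by the superblock-based one, provided the proc superblock is owned by `init_user_ns`. A filesystem image whose superblock is owned by the container's namespace is refused by both.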