netdev.vger.kernel.org archive mirror
* Re: [RFC][PATCH net] bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
       [not found] <87oa84wa6h.fsf@x220.int.ebiederm.org>
@ 2016-05-18 10:48 ` Hannes Frederic Sowa
  2016-05-18 14:56   ` Eric W. Biederman
  0 siblings, 1 reply; 8+ messages in thread
From: Hannes Frederic Sowa @ 2016-05-18 10:48 UTC (permalink / raw)
  To: Eric W. Biederman, Daniel Borkmann
  Cc: Alexei Starovoitov, David S. Miller, netdev

On 18.05.2016 01:12, Eric W. Biederman wrote:
> 
> While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the
> bpf filesystem.  Looking at the code I saw a broken usage of mount_ns
> with current->nsproxy->mnt_ns. As the code does not acquire a reference
> to the mount namespace it can not possibly be correct to store the mount
> namespace on the superblock as it does.
> 
> Replace mount_ns with mount_nodev so that each mount of the bpf
> filesystem returns a distinct instance, and the code is not utterly
> broken.
> 
> Fixes: b2197755b263 ("bpf: add support for persistent maps/progs")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
> 
> No one should care about this change, as userspace typically only mounts
> things once and does not depend on things in one mount not showing up
> in another.  Can someone who actually uses the bpf filesystem please
> verify this.
> 
> This needs to be fixed as the existing code is broken beyond words that
> I know how to express.

The idea is to have the bpf filesystem as a singleton per mnt-namespace
to prevent endless instances being created and kernel resources being
hogged by pinning them to hard-to-discover bpf mounts.

Do you see any problem with adding appropriate reference counts?

Bye,
Hannes


* Re: [RFC][PATCH net] bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
  2016-05-18 10:48 ` [RFC][PATCH net] bpf: Use mount_nodev not mount_ns to mount the bpf filesystem Hannes Frederic Sowa
@ 2016-05-18 14:56   ` Eric W. Biederman
  2016-05-18 20:43     ` Daniel Borkmann
  0 siblings, 1 reply; 8+ messages in thread
From: Eric W. Biederman @ 2016-05-18 14:56 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Daniel Borkmann, Alexei Starovoitov, David S. Miller, netdev

Hannes Frederic Sowa <hannes@stressinduktion.org> writes:

> On 18.05.2016 01:12, Eric W. Biederman wrote:
>> 
>> While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the
>> bpf filesystem.  Looking at the code I saw a broken usage of mount_ns
>> with current->nsproxy->mnt_ns. As the code does not acquire a reference
>> to the mount namespace it can not possibly be correct to store the mount
>> namespace on the superblock as it does.
>> 
>> Replace mount_ns with mount_nodev so that each mount of the bpf
>> filesystem returns a distinct instance, and the code is not utterly
>> broken.
>> 
>> Fixes: b2197755b263 ("bpf: add support for persistent maps/progs")
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>> 
>> No one should care about this change, as userspace typically only mounts
>> things once and does not depend on things in one mount not showing up
>> in another.  Can someone who actually uses the bpf filesystem please
>> verify this.
>> 
>> This needs to be fixed as the existing code is broken beyond words that
>> I know how to express.
>
> The idea is to have the bpf filesystem as a singleton per mnt-namespace
> to prevent endless instances being created and kernel resources being
> hogged by pinning them to hard to discover bpf mounts.

There is no method in the kernel to support a singleton per mount
namespace.  Mount propagation ruins that idea, and in most recent
distros mount propagation is enabled by default (it is something you can
opt out of later but not opt into later).

In general convention is a much better defense against endless
instances.

Having just fought a similar fight with devpts (because things went
horribly wrong) you are much better off with telling people to be careful
how to use things rather than not letting people use things wrong.
Especially if we are still at the "the idea is" stage rather than a
stage where changing this will actually break deployed implementations.

> Do you see any problem with adding appropriate reference counts?

Honestly my head hurts thinking about it.  Technically reference counts
would fix one aspect of it, but the whole situation really sucks.

Especially in a world of mount propagation, where these mounts propagate
between mount namespaces and where people choose to share or not based on
criteria other than the mount namespace, attempting a one-fs-per-mount-namespace
policy is just bizarre, bordering on completely broken.
Even if implemented correctly.

Filesystems do not know and should not care about the mount namespace
they are mounted in.  These are and should remain independent
concepts, and your implementation and attempted semantics violate that
horribly.  I can't see a way to achieve what you were trying to
achieve.  The VFS just doesn't work that way.

Eric
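
To make the difference between the two mount helpers concrete, here is a
small userspace sketch. It is a toy model, not kernel code: every name in
it (toy_fs, toy_super, toy_mount_keyed, toy_mount_nodev) is hypothetical,
and it only assumes that a mount_ns-style helper looks up an existing
superblock by comparing a stored key pointer, while a mount_nodev-style
helper always creates a fresh one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: these types and functions are hypothetical stand-ins. */
struct toy_super {
	const void *key;	/* pointer compared by the keyed lookup; NULL if unkeyed */
	int id;			/* distinguishes one instance from another */
	struct toy_super *next;
};

struct toy_fs {
	struct toy_super *supers;	/* all live superblocks of this fs type */
	int next_id;
};

static struct toy_super *toy_new_super(struct toy_fs *fs, const void *key)
{
	struct toy_super *sb = malloc(sizeof(*sb));

	if (!sb)
		return NULL;
	sb->key = key;
	sb->id = fs->next_id++;
	sb->next = fs->supers;
	fs->supers = sb;
	return sb;
}

/* mount_ns-like: reuse the superblock whose stored key matches. */
static struct toy_super *toy_mount_keyed(struct toy_fs *fs, const void *key)
{
	assert(key != NULL);
	for (struct toy_super *sb = fs->supers; sb; sb = sb->next)
		if (sb->key == key)	/* bare pointer compare, no reference held */
			return sb;
	return toy_new_super(fs, key);
}

/* mount_nodev-like: every mount gets its own distinct superblock. */
static struct toy_super *toy_mount_nodev(struct toy_fs *fs)
{
	return toy_new_super(fs, NULL);
}

static void toy_free_all(struct toy_fs *fs)
{
	while (fs->supers) {
		struct toy_super *sb = fs->supers;

		fs->supers = sb->next;
		free(sb);
	}
}
```

In this model two keyed mounts with the same key return the same
instance, and two nodev mounts never do.  Because the keyed lookup holds
only a bare pointer, a key object that is freed and whose address is
later reused would match the stale superblock, which is my reading of
the unreferenced-pointer problem described in the patch.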


* Re: [RFC][PATCH net] bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
  2016-05-18 14:56   ` Eric W. Biederman
@ 2016-05-18 20:43     ` Daniel Borkmann
  2016-05-18 20:46       ` Hannes Frederic Sowa
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Borkmann @ 2016-05-18 20:43 UTC (permalink / raw)
  To: Eric W. Biederman, Hannes Frederic Sowa
  Cc: Alexei Starovoitov, David S. Miller, netdev

On 05/18/2016 04:56 PM, Eric W. Biederman wrote:
> Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
>> On 18.05.2016 01:12, Eric W. Biederman wrote:
>>>
>>> While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the
>>> bpf filesystem.  Looking at the code I saw a broken usage of mount_ns
>>> with current->nsproxy->mnt_ns. As the code does not acquire a reference
>>> to the mount namespace it can not possibly be correct to store the mount
>>> namespace on the superblock as it does.
>>>
>>> Replace mount_ns with mount_nodev so that each mount of the bpf
>>> filesystem returns a distinct instance, and the code is not utterly
>>> broken.
>>>
>>> Fixes: b2197755b263 ("bpf: add support for persistent maps/progs")
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>> ---
>>>
>>> No one should care about this change, as userspace typically only mounts
>>> things once and does not depend on things in one mount not showing up
>>> in another.  Can someone who actually uses the bpf filesystem please
>>> verify this.
[...]

LGTM.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

>> The idea is to have the bpf filesystem as a singleton per mnt-namespace
>> to prevent endless instances being created and kernel resources being
>> hogged by pinning them to hard to discover bpf mounts.

Eric, please send the patch officially and feel free to add my Ack. Given
the circumstances, moving to mount_nodev() seems the best way forward. To
also address above mentioned concern from Hannes, we need to remove the
FS_USERNS_MOUNT flag along with the change. It looks like the fix is best
addressed in a single patch if you want to include it. If not, we can
otherwise send it separately as well, I don't mind.

Thanks for your feedback!

diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 8f94ca1..b2aefa2 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -378,7 +378,7 @@ static int bpf_fill_super(struct super_block *sb, void *data, int silent)
  static struct dentry *bpf_mount(struct file_system_type *type, int flags,
  				const char *dev_name, void *data)
  {
-	return mount_ns(type, flags, current->nsproxy->mnt_ns, bpf_fill_super);
+	return mount_nodev(type, flags, data, bpf_fill_super);
  }

  static struct file_system_type bpf_fs_type = {
@@ -386,7 +386,6 @@ static struct file_system_type bpf_fs_type = {
  	.name		= "bpf",
  	.mount		= bpf_mount,
  	.kill_sb	= kill_litter_super,
-	.fs_flags	= FS_USERNS_MOUNT,
  };

  MODULE_ALIAS_FS("bpf");
-- 
1.9.3


* Re: [RFC][PATCH net] bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
  2016-05-18 20:43     ` Daniel Borkmann
@ 2016-05-18 20:46       ` Hannes Frederic Sowa
  2016-05-20 22:22         ` [PATCH " Eric W. Biederman
  2016-05-20 22:31         ` [RFC][PATCH " Eric W. Biederman
  0 siblings, 2 replies; 8+ messages in thread
From: Hannes Frederic Sowa @ 2016-05-18 20:46 UTC (permalink / raw)
  To: Daniel Borkmann, Eric W. Biederman
  Cc: Alexei Starovoitov, David S. Miller, netdev

On 18.05.2016 22:43, Daniel Borkmann wrote:
> On 05/18/2016 04:56 PM, Eric W. Biederman wrote:
>> Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
>>> On 18.05.2016 01:12, Eric W. Biederman wrote:
>>>>
>>>> While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the
>>>> bpf filesystem.  Looking at the code I saw a broken usage of mount_ns
>>>> with current->nsproxy->mnt_ns. As the code does not acquire a reference
>>>> to the mount namespace it can not possibly be correct to store the
>>>> mount
>>>> namespace on the superblock as it does.
>>>>
>>>> Replace mount_ns with mount_nodev so that each mount of the bpf
>>>> filesystem returns a distinct instance, and the code is not utterly
>>>> broken.
>>>>
>>>> Fixes: b2197755b263 ("bpf: add support for persistent maps/progs")
>>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>>> ---
>>>>
>>>> No one should care about this change, as userspace typically only
>>>> mounts
>>>> things once and does not depend on things in one mount not
>>>> showing up
>>>> in another.  Can someone who actually uses the bpf filesystem please
>>>> verify this.
> [...]
> 
> LGTM.
> 
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> 
>>> The idea is to have the bpf filesystem as a singleton per mnt-namespace
>>> to prevent endless instances being created and kernel resources being
>>> hogged by pinning them to hard to discover bpf mounts.
> 
> Eric, please send the patch officially and feel free to add my Ack. Given
> the circumstances, moving to mount_nodev() seems the best way forward. To
> also address above mentioned concern from Hannes, we need to remove the
> FS_USERNS_MOUNT flag along with the change. It looks like the fix is best
> addressed in a single patch if you want to include it. If not, we can
> otherwise send it separately as well, I don't mind.

I agree. Would make most sense to make the change in one patch. Later on
we can reason about if it makes sense to use the net namespace to split
bpf maps and programs or maybe even introduce a new primitive for that.

Thanks,
Hannes


* [PATCH net] bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
  2016-05-18 20:46       ` Hannes Frederic Sowa
@ 2016-05-20 22:22         ` Eric W. Biederman
  2016-05-20 23:27           ` Hannes Frederic Sowa
  2016-05-20 23:46           ` David Miller
  2016-05-20 22:31         ` [RFC][PATCH " Eric W. Biederman
  1 sibling, 2 replies; 8+ messages in thread
From: Eric W. Biederman @ 2016-05-20 22:22 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Daniel Borkmann, Alexei Starovoitov, David S. Miller, netdev


While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the
bpf filesystem.  Looking at the code I saw a broken usage of mount_ns
with current->nsproxy->mnt_ns. As the code does not acquire a
reference to the mount namespace it can not possibly be correct to
store the mount namespace on the superblock as it does.

Replace mount_ns with mount_nodev so that each mount of the bpf
filesystem returns a distinct instance, and the code is not buggy.

In discussion with Hannes Frederic Sowa it was reported that the use
of mount_ns was an attempt to have one bpf instance per mount
namespace, in an attempt to keep resources that pin resources from
hiding.  That intent simply does not work, the vfs is not built to
allow that kind of behavior.  Which means that the bpf filesystem
really is buggy both semantically and in its implementation as it does
not nor can it implement the original intent.

This change is userspace visible, but my experience with similar
filesystems leads me to believe nothing will break with a model where each
mount of the bpf filesystem is distinct from all others.

Fixes: b2197755b263 ("bpf: add support for persistent maps/progs")
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/bpf/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 8f94ca1860cf..55d923688f85 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -378,7 +378,7 @@ static int bpf_fill_super(struct super_block *sb, void *data, int silent)
 static struct dentry *bpf_mount(struct file_system_type *type, int flags,
 				const char *dev_name, void *data)
 {
-	return mount_ns(type, flags, current->nsproxy->mnt_ns, bpf_fill_super);
+	return mount_nodev(type, flags, data, bpf_fill_super);
 }
 
 static struct file_system_type bpf_fs_type = {
-- 
2.8.1


* Re: [RFC][PATCH net] bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
  2016-05-18 20:46       ` Hannes Frederic Sowa
  2016-05-20 22:22         ` [PATCH " Eric W. Biederman
@ 2016-05-20 22:31         ` Eric W. Biederman
  1 sibling, 0 replies; 8+ messages in thread
From: Eric W. Biederman @ 2016-05-20 22:31 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Daniel Borkmann, Alexei Starovoitov, David S. Miller, netdev

Hannes Frederic Sowa <hannes@stressinduktion.org> writes:

> On 18.05.2016 22:43, Daniel Borkmann wrote:
>> Eric, please send the patch officially and feel free to add my Ack.

Done.

>> Given
>> the circumstances, moving to mount_nodev() seems the best way forward. To
>> also address above mentioned concern from Hannes, we need to remove the
>> FS_USERNS_MOUNT flag along with the change. It looks like the fix is best
>> addressed in a single patch if you want to include it. If not, we can
>> otherwise send it separately as well, I don't mind.
>
> I agree. Would make most sense to make the change in one patch. Later on
> we can reason about if it makes sense to use the net namespace to split
> bpf maps and programs or maybe even introduce a new primitive for that.

I will let you two take care of the FS_USERNS_MOUNT flag.

Removal of the FS_USERNS_MOUNT flag because it was added prematurely is
a completely different analysis of consequences and possible regressions
in userspace.  The two changes should be kept separate to make it easy
to handle the unlikely case that either of them causes a regression.

But I have no objections to removing the FS_USERNS_MOUNT flag from the
bpf filesystem.

Eric
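
For readers unfamiliar with what the flag gates, here is a simplified
userspace model of the permission decision.  It is a sketch under
assumptions, not the kernel's code: toy_fs_type, toy_may_mount and
TOY_FS_USERNS_MOUNT are hypothetical names, and the model assumes only
that a filesystem type without the flag cannot be mounted from outside
the initial user namespace, and that the caller needs CAP_SYS_ADMIN in
either case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's FS_USERNS_MOUNT flag bit. */
#define TOY_FS_USERNS_MOUNT 0x1

/* Hypothetical, heavily reduced version of a filesystem type. */
struct toy_fs_type {
	const char *name;
	int fs_flags;
};

/*
 * Returns 0 if the mount would be allowed in this model, -EPERM otherwise.
 * in_init_userns:     the caller lives in the initial user namespace
 * has_cap_sys_admin:  the caller holds CAP_SYS_ADMIN over its own namespace
 */
static int toy_may_mount(const struct toy_fs_type *type,
			 bool in_init_userns, bool has_cap_sys_admin)
{
	assert(type != NULL);

	if (!has_cap_sys_admin)
		return -EPERM;
	/* Without the flag, only the initial user namespace may mount. */
	if (!in_init_userns && !(type->fs_flags & TOY_FS_USERNS_MOUNT))
		return -EPERM;
	return 0;
}
```

Under this model, dropping the flag from the bpf filesystem type means a
root user inside a non-initial user namespace gets -EPERM, which is why
removing it limits who can create additional bpf mounts.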


* Re: [PATCH net] bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
  2016-05-20 22:22         ` [PATCH " Eric W. Biederman
@ 2016-05-20 23:27           ` Hannes Frederic Sowa
  2016-05-20 23:46           ` David Miller
  1 sibling, 0 replies; 8+ messages in thread
From: Hannes Frederic Sowa @ 2016-05-20 23:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Daniel Borkmann, Alexei Starovoitov, David S. Miller, netdev

On 21.05.2016 00:22, Eric W. Biederman wrote:
> 
> While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the
> bpf filesystem.  Looking at the code I saw a broken usage of mount_ns
> with current->nsproxy->mnt_ns. As the code does not acquire a
> reference to the mount namespace it can not possibly be correct to
> store the mount namespace on the superblock as it does.
> 
> Replace mount_ns with mount_nodev so that each mount of the bpf
> filesystem returns a distinct instance, and the code is not buggy.
> 
> In discussion with Hannes Frederic Sowa it was reported that the use
> of mount_ns was an attempt to have one bpf instance per mount
> namespace, in an attempt to keep resources that pin resources from
> hiding.  That intent simply does not work, the vfs is not built to
> allow that kind of behavior.  Which means that the bpf filesystem
> really is buggy both semantically and in its implementation as it does
> not nor can it implement the original intent.
> 
> This change is userspace visible, but my experience with similar
> filesystems leads me to believe nothing will break with a model where each
> mount of the bpf filesystem is distinct from all others.
> 
> Fixes: b2197755b263 ("bpf: add support for persistent maps/progs")
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Thanks Eric!


* Re: [PATCH net] bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
  2016-05-20 22:22         ` [PATCH " Eric W. Biederman
  2016-05-20 23:27           ` Hannes Frederic Sowa
@ 2016-05-20 23:46           ` David Miller
  1 sibling, 0 replies; 8+ messages in thread
From: David Miller @ 2016-05-20 23:46 UTC (permalink / raw)
  To: ebiederm; +Cc: hannes, daniel, ast, netdev

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Fri, 20 May 2016 17:22:48 -0500

> 
> While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the
> bpf filesystem.  Looking at the code I saw a broken usage of mount_ns
> with current->nsproxy->mnt_ns. As the code does not acquire a
> reference to the mount namespace it can not possibly be correct to
> store the mount namespace on the superblock as it does.
> 
> Replace mount_ns with mount_nodev so that each mount of the bpf
> filesystem returns a distinct instance, and the code is not buggy.
> 
> In discussion with Hannes Frederic Sowa it was reported that the use
> of mount_ns was an attempt to have one bpf instance per mount
> namespace, in an attempt to keep resources that pin resources from
> hiding.  That intent simply does not work, the vfs is not built to
> allow that kind of behavior.  Which means that the bpf filesystem
> really is buggy both semantically and in its implementation as it does
> not nor can it implement the original intent.
> 
> This change is userspace visible, but my experience with similar
> filesystems leads me to believe nothing will break with a model where each
> mount of the bpf filesystem is distinct from all others.
> 
> Fixes: b2197755b263 ("bpf: add support for persistent maps/progs")
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Applied and queued up for -stable, thanks everyone.

