* [PATCHv4 0/2] capability controlled user-namespaces
@ 2018-01-03  7:26 ` Mahesh Bandewar
  0 siblings, 0 replies; 14+ messages in thread
From: Mahesh Bandewar @ 2018-01-03  7:26 UTC (permalink / raw)
  To: LKML, James Morris
  Cc: Netdev, Kernel-hardening, Linux API, Linux Security,
	Serge Hallyn, Michael Kerrisk, Kees Cook, Eric W . Biederman,
	Eric Dumazet, David Miller, Mahesh Bandewar, Mahesh Bandewar

From: Mahesh Bandewar <maheshb@google.com>

TL;DR version
-------------
Creating a sandbox environment with namespaces is challenging
considering what sandboxed processes can get up to, e.g.
CVE-2017-6074, CVE-2017-7184 and CVE-2017-7308, to name just a few.
With a small change, however, user-namespaces can let us create a
sandbox environment without locking down user-namespaces.

Detailed version
----------------

Problem
-------
User-namespaces in their current form have increased the attack surface, as
any process can acquire capabilities which are not available to it (by
default) by performing a combination of clone()/unshare()/setns() syscalls.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sched.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(int ac, char **av)
    {
        int sock = -1;

        printf("Attempting to open RAW socket before unshare()...\n");
        sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
        if (sock < 0) {
            perror("socket() SOCK_RAW failed: ");
        } else {
            printf("Successfully opened RAW-Sock before unshare().\n");
            close(sock);
            sock = -1;
        }

        if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) {
            perror("unshare() failed: ");
            return 1;
        }

        printf("Attempting to open RAW socket after unshare()...\n");
        sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
        if (sock < 0) {
            perror("socket() SOCK_RAW failed: ");
        } else {
            printf("Successfully opened RAW-Sock after unshare().\n");
            close(sock);
            sock = -1;
        }

        return 0;
    }

The above example shows how easy it is to acquire the CAP_NET_RAW
capability. Once it is acquired, a process with malicious intent could
exploit the issues mentioned above, or similar ones, whether already
discovered or not. Note that this is just an example; the problem and
the solution are not limited to CAP_NET_RAW.

The easiest fix one can apply here is to lock down user-namespaces, as
many distros do (i.e. don't allow users to create user-namespaces),
but unfortunately that prevents everyone from using them.

Approach
--------
Introduce a notion of 'controlled' user-namespaces. Every process on
the host is still allowed to create user-namespaces (subject to the limit
imposed by the per-ns sysctl), but user-namespaces created by
sandboxed processes are marked as 'controlled'. This mark is consulted at
capability-check time in conjunction with a global capability whitelist:
if a capability is not whitelisted, processes that belong to
controlled user-namespaces are not allowed to use it.

Processes that do not have CAP_SYS_ADMIN in init-ns can *only* create
controlled user-namespaces. In other words, user-namespaces created by
privileged processes (those which have CAP_SYS_ADMIN in init-ns) are
not controlled. A hierarchy underneath any controlled user-ns is always
controlled.

The global whitelist is a list of capabilities governed by a sysctl
(kernel.controlled_userns_caps_whitelist). Only a privileged user in
init-ns can modify it, and it applies to all controlled user-namespaces
on the host, irrespective of when each user-ns was created.

Marking user-namespaces controlled without modifying the whitelist is
equivalent to the current behavior. The default value of the whitelist
includes all capabilities, so compatibility is maintained. At the same
time, it gives admins a fine-grained way to control individual
capabilities system-wide without locking down user-namespaces.

Example
-------
Here is an example that demonstrates the behavior of a kernel that has
this patch-set applied. It uses the C code shown earlier in this cover
letter, saved as acquire_raw.c -

(a) The 'root' user has all the capabilities all the time (before and
    after CAP_NET_RAW is removed from the whitelist).

    root@vm0:~# id
    uid=0(root) gid=0(root) groups=0(root)

    root@vm0:~# sysctl -q kernel.controlled_userns_caps_whitelist 
    kernel.controlled_userns_caps_whitelist = 1f,ffffffff

    root@vm0:~# ./acquire_raw 
    Attempting to open RAW socket before unshare()...
    Successfully opened RAW-Sock before unshare().
    Attempting to open RAW socket after unshare()...
    Successfully opened RAW-Sock after unshare().

    root@vm0:~# sysctl -w kernel.controlled_userns_caps_whitelist=1f,ffffdfff
    kernel.controlled_userns_caps_whitelist = 1f,ffffdfff

    root@vm0:~# ./acquire_raw 
    Attempting to open RAW socket before unshare()...
    Successfully opened RAW-Sock before unshare().
    Attempting to open RAW socket after unshare()...
    Successfully opened RAW-Sock after unshare().

(b) Unprivileged user cannot change the mask.

    mahesh@vm0:~$ id
    uid=1000(mahesh) gid=1000(mahesh)
    groups=1000(mahesh),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),118(lpadmin),128(sambashare)

    mahesh@vm0:~$ sysctl -q kernel.controlled_userns_caps_whitelist 
    kernel.controlled_userns_caps_whitelist = 1f,ffffffff

    mahesh@vm0:~$ sysctl -w kernel.controlled_userns_caps_whitelist=1f,ffffdfff
    sysctl: permission denied on key 'kernel.controlled_userns_caps_whitelist'

(c) An unprivileged user does not have CAP_NET_RAW in init-ns but can get
    that capability inside a child user-ns when the
    controlled_userns_caps_whitelist mask is unchanged (current behavior).

    mahesh@vm0:~$ sysctl -q kernel.controlled_userns_caps_whitelist 
    kernel.controlled_userns_caps_whitelist = 1f,ffffffff

    mahesh@vm0:~$ ./acquire_raw 
    Attempting to open RAW socket before unshare()...
    socket() SOCK_RAW failed: : Operation not permitted
    Attempting to open RAW socket after unshare()...
    Successfully opened RAW-Sock after unshare().

(d) Changing the controlled_userns_caps_whitelist mask prevents the user
    from acquiring a 'controlled' capability inside a user-namespace.

    mahesh@vm0:~$ sysctl -q kernel.controlled_userns_caps_whitelist 
    kernel.controlled_userns_caps_whitelist = 1f,ffffdfff
    mahesh@vm0:~$ ./acquire_raw 
    Attempting to open RAW socket before unshare()...
    socket() SOCK_RAW failed: : Operation not permitted
    Attempting to open RAW socket after unshare()...
    socket() SOCK_RAW failed: : Operation not permitted


Please see individual patches in this series.

Mahesh Bandewar (2):
  capability: introduce sysctl for controlled user-ns capability whitelist
  userns: control capabilities of some user namespaces

 Documentation/sysctl/kernel.txt | 21 +++++++++++++++++
 include/linux/capability.h      |  7 ++++++
 include/linux/user_namespace.h  | 25 ++++++++++++++++++++
 kernel/capability.c             | 52 +++++++++++++++++++++++++++++++++++++++++
 kernel/sysctl.c                 |  5 ++++
 kernel/user_namespace.c         |  4 ++++
 security/commoncap.c            |  8 +++++++
 7 files changed, 122 insertions(+)

-- 
2.15.1.620.gb9897f4670-goog



* Re: [PATCHv4 0/2] capability controlled user-namespaces
  2018-01-03  7:26 ` Mahesh Bandewar
  (?)
@ 2018-01-03 16:44   ` Eric W. Biederman
  -1 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2018-01-03 16:44 UTC (permalink / raw)
  To: Mahesh Bandewar
  Cc: LKML, James Morris, Netdev, Kernel-hardening, Linux API,
	Linux Security, Serge Hallyn, Michael Kerrisk, Kees Cook,
	Eric Dumazet, David Miller, Mahesh Bandewar

Mahesh Bandewar <mahesh@bandewar.net> writes:

> From: Mahesh Bandewar <maheshb@google.com>
>
> TL;DR version
> -------------
> Creating a sandbox environment with namespaces is challenging
> considering what these sandboxed processes can engage into. e.g.
> CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few.
> Current form of user-namespaces, however, if changed a bit can allow
> us to create a sandbox environment without locking down user-
> namespaces.

In other conversations it appears it has been pointed out that user
namespaces are not necessarily safe under no_new_privs.  In theory
user namespaces should be safe but in practice not so much.

So let me ask.  Would your concerns be addressed if we simply made
creation and joining of user namespaces impossible in a no_new_privs
sandbox?

Eric

>
> Detailed version
> ----------------
>
> Problem
> -------
> User-namespaces in the current form have increased the attack surface as
> any process can acquire capabilities which are not available to them (by
> default) by performing combination of clone()/unshare()/setns() syscalls.
>
>     #define _GNU_SOURCE
>     #include <stdio.h>
>     #include <sched.h>
>     #include <netinet/in.h>
>
>     int main(int ac, char **av)
>     {
>         int sock = -1;
>
>         printf("Attempting to open RAW socket before unshare()...\n");
>         sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
>         if (sock < 0) {
>             perror("socket() SOCK_RAW failed: ");
>         } else {
>             printf("Successfully opened RAW-Sock before unshare().\n");
>             close(sock);
>             sock = -1;
>         }
>
>         if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) {
>             perror("unshare() failed: ");
>             return 1;
>         }
>
>         printf("Attempting to open RAW socket after unshare()...\n");
>         sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
>         if (sock < 0) {
>             perror("socket() SOCK_RAW failed: ");
>         } else {
>             printf("Successfully opened RAW-Sock after unshare().\n");
>             close(sock);
>             sock = -1;
>         }
>
>         return 0;
>     }
>
> The above example shows how easy it is to acquire NET_RAW capabilities
> and once acquired, these processes could take benefit of above mentioned
> or similar issues discovered/undiscovered with malicious intent. Note
> that this is just an example and the problem/solution is not limited
> to NET_RAW capability *only*. 
>
> The easiest fix one can apply here is to lock-down user-namespaces which
> many of the distros do (i.e. don't allow users to create user namespaces),
> but unfortunately that prevents everyone from using them.
>
> Approach
> --------
> Introduce a notion of 'controlled' user-namespaces. Every process on
> the host is allowed to create user-namespaces (governed by the limit
> imposed by per-ns sysctl) however, mark user-namespaces created by
> sandboxed processes as 'controlled'. Use this 'mark' at the time of
> capability check in conjunction with a global capability whitelist.
> If the capability is not whitelisted, processes that belong to 
> controlled user-namespaces will not be allowed.
>
> Processes that do not have CAP_SYS_ADMIN in init-ns can *only* create
> controlled user-namespaces. In other words, user-namespaces created by
> privileged processes (those which have CAP_SYS_ADMIN in init-ns) are
> not controlled. A hierarchy underneath any controlled user-ns is always
> controlled.
>
> A global whitelist is list of capabilities governed by a sysctl
> (kernel.controlled_userns_caps_whitelist) which is available to
> (privileged) user in init-ns to modify while it's applicable to all
> controlled user-namespaces on the host irrespective of when that user-ns
> was created.
>
> Marking user-namespaces controlled without modifying the whitelist is
> equivalent of the current behavior. The default value of whitelist includes
> all capabilities so that the compatibility is maintained. However it gives
> admins fine-grained ability to control various capabilities system wide
> without locking down user-namespaces.
>
> Example
> -------
> Here is the example that demonstrates the behavior of a kernel that has
> this patch-set applied. It uses the same c-code from this commit-log and
> is called acquire_raw.c -
>
> (a) The 'root' user has all the capabilities all the time (before and
>     after taking capability).
>
>     root@vm0:~# id
>     uid=0(root) gid=0(root) groups=0(root)
>
>     root@vm0:~# sysctl -q kernel.controlled_userns_caps_whitelist 
>     kernel.controlled_userns_caps_whitelist = 1f,ffffffff
>
>     root@vm0:~# ./acquire_raw 
>     Attempting to open RAW socket before unshare()...
>     Successfully opened RAW-Sock before unshare().
>     Attempting to open RAW socket after unshare()...
>     Successfully opened RAW-Sock after unshare().
>
>     root@vm0:~# sysctl -w kernel.controlled_userns_caps_whitelist=1f,ffffdfff
>     kernel.controlled_userns_caps_whitelist = 1f,ffffdfff
>
>     root@vm0:~# ./acquire_raw 
>     Attempting to open RAW socket before unshare()...
>     Successfully opened RAW-Sock before unshare().
>     Attempting to open RAW socket after unshare()...
>     Successfully opened RAW-Sock after unshare().
>
> (b) Unprivileged user cannot change the mask.
>
>     mahesh@vm0:~$ id
>     uid=1000(mahesh) gid=1000(mahesh)
>     groups=1000(mahesh),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),118(lpadmin),128(sambashare)
>
>     mahesh@vm0:~$ sysctl -q kernel.controlled_userns_caps_whitelist 
>     kernel.controlled_userns_caps_whitelist = 1f,ffffffff
>
>     mahesh@vm0:~$ sysctl -w kernel.controlled_userns_caps_whitelist=1f,ffffdfff
>     sysctl: permission denied on key 'kernel.controlled_userns_caps_whitelist'
>
> (c) Unprivileged user does not have CAP_NET_RAW in init-ns but can get
>     that capability inside child-user-ns when the controlled_userns_caps
>     mask is unchanged (current behavior).
>
>     mahesh@vm0:~$ sysctl -q kernel.controlled_userns_caps_whitelist 
>     kernel.controlled_userns_caps_whitelist = 1f,ffffffff
>
>     mahesh@vm0:~$ ./acquire_raw 
>     Attempting to open RAW socket before unshare()...
>     socket() SOCK_RAW failed: : Operation not permitted
>     Attempting to open RAW socket after unshare()...
>     Successfully opened RAW-Sock after unshare().
>
> (d) Changing the controlled_userns_caps_whitelist mask will prevent user
>     for acquiring 'controlled capability' inside user-namespace.
>
>     mahesh@vm0:~$ sysctl -q kernel.controlled_userns_caps_whitelist 
>     kernel.controlled_userns_caps_whitelist = 1f,ffffdfff
>     mahesh@vm0:~$ ./acquire_raw 
>     Attempting to open RAW socket before unshare()...
>     socket() SOCK_RAW failed: : Operation not permitted
>     Attempting to open RAW socket after unshare()...
>     socket() SOCK_RAW failed: : Operation not permitted
>
>
> Please see individual patches in this series.
>
> Mahesh Bandewar (2):
>   capability: introduce sysctl for controlled user-ns capability whitelist
>   userns: control capabilities of some user namespaces
>
>  Documentation/sysctl/kernel.txt | 21 +++++++++++++++++
>  include/linux/capability.h      |  7 ++++++
>  include/linux/user_namespace.h  | 25 ++++++++++++++++++++
>  kernel/capability.c             | 52 +++++++++++++++++++++++++++++++++++++++++
>  kernel/sysctl.c                 |  5 ++++
>  kernel/user_namespace.c         |  4 ++++
>  security/commoncap.c            |  8 +++++++
>  7 files changed, 122 insertions(+)


* [PATCHv4 0/2] capability controlled user-namespaces
@ 2018-01-03 16:44   ` Eric W. Biederman
  0 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2018-01-03 16:44 UTC (permalink / raw)
  To: linux-security-module

Mahesh Bandewar <mahesh@bandewar.net> writes:

> From: Mahesh Bandewar <maheshb@google.com>
>
> TL;DR version
> -------------
> Creating a sandbox environment with namespaces is challenging
> considering what these sandboxed processes can engage into. e.g.
> CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few.
> Current form of user-namespaces, however, if changed a bit can allow
> us to create a sandbox environment without locking down user-
> namespaces.

In other conversations it appears it has been pointed out that user
namespaces are not necessarily safe under no_new_privs.  In theory
user namespaces should be safe but in practice not so much.

So let me ask.  Would your concerns be addressed if we simply made
creation and joining of user namespaces impossible in a no_new_privs
sandbox?

Eric

>
> Detailed version
> ----------------
>
> Problem
> -------
> User-namespaces in the current form have increased the attack surface as
> any process can acquire capabilities which are not available to them (by
> default) by performing combination of clone()/unshare()/setns() syscalls.
>
>     #define _GNU_SOURCE
>     #include <stdio.h>
>     #include <sched.h>
>     #include <netinet/in.h>
>
>     int main(int ac, char **av)
>     {
>         int sock = -1;
>
>         printf("Attempting to open RAW socket before unshare()...\n");
>         sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
>         if (sock < 0) {
>             perror("socket() SOCK_RAW failed: ");
>         } else {
>             printf("Successfully opened RAW-Sock before unshare().\n");
>             close(sock);
>             sock = -1;
>         }
>
>         if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) {
>             perror("unshare() failed: ");
>             return 1;
>         }
>
>         printf("Attempting to open RAW socket after unshare()...\n");
>         sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
>         if (sock < 0) {
>             perror("socket() SOCK_RAW failed: ");
>         } else {
>             printf("Successfully opened RAW-Sock after unshare().\n");
>             close(sock);
>             sock = -1;
>         }
>
>         return 0;
>     }
>
> The above example shows how easy it is to acquire NET_RAW capabilities
> and once acquired, these processes could take benefit of above mentioned
> or similar issues discovered/undiscovered with malicious intent. Note
> that this is just an example and the problem/solution is not limited
> to NET_RAW capability *only*. 
>
> The easiest fix one can apply here is to lock-down user-namespaces which
> many of the distros do (i.e. don't allow users to create user namespaces),
> but unfortunately that prevents everyone from using them.
>
> Approach
> --------
> Introduce a notion of 'controlled' user-namespaces. Every process on
> the host is allowed to create user-namespaces (governed by the limit
> imposed by per-ns sysctl) however, mark user-namespaces created by
> sandboxed processes as 'controlled'. Use this 'mark' at the time of
> capability check in conjunction with a global capability whitelist.
> If the capability is not whitelisted, processes that belong to 
> controlled user-namespaces will not be allowed.
>
> Processes that do not have CAP_SYS_ADMIN in init-ns can *only* create
> controlled user-namespaces. In other words, user-namespaces created by
> privileged processes (those which have CAP_SYS_ADMIN in init-ns) are
> not controlled. A hierarchy underneath any controlled user-ns is always
> controlled.
>
> A global whitelist is list of capabilities governed by a sysctl
> (kernel.controlled_userns_caps_whitelist) which is available to
> (privileged) user in init-ns to modify while it's applicable to all
> controlled user-namespaces on the host irrespective of when that user-ns
> was created.
>
> Marking user-namespaces controlled without modifying the whitelist is
> equivalent of the current behavior. The default value of whitelist includes
> all capabilities so that the compatibility is maintained. However it gives
> admins fine-grained ability to control various capabilities system wide
> without locking down user-namespaces.
>
> Example
> -------
> Here is the example that demonstrates the behavior of a kernel that has
> this patch-set applied. It uses the same c-code from this commit-log and
> is called acquire_raw.c -
>
> (a) The 'root' user has all the capabilities all the time (before and
>     after taking capability).
>
>     root@vm0:~# id
>     uid=0(root) gid=0(root) groups=0(root)
>
>     root@vm0:~# sysctl -q kernel.controlled_userns_caps_whitelist
>     kernel.controlled_userns_caps_whitelist = 1f,ffffffff
>
>     root@vm0:~# ./acquire_raw
>     Attempting to open RAW socket before unshare()...
>     Successfully opened RAW-Sock before unshare().
>     Attempting to open RAW socket after unshare()...
>     Successfully opened RAW-Sock after unshare().
>
>     root@vm0:~# sysctl -w kernel.controlled_userns_caps_whitelist=1f,ffffdfff
>     kernel.controlled_userns_caps_whitelist = 1f,ffffdfff
>
>     root@vm0:~# ./acquire_raw
>     Attempting to open RAW socket before unshare()...
>     Successfully opened RAW-Sock before unshare().
>     Attempting to open RAW socket after unshare()...
>     Successfully opened RAW-Sock after unshare().
>
> (b) Unprivileged user cannot change the mask.
>
>     mahesh@vm0:~$ id
>     uid=1000(mahesh) gid=1000(mahesh)
>     groups=1000(mahesh),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),118(lpadmin),128(sambashare)
>
>     mahesh@vm0:~$ sysctl -q kernel.controlled_userns_caps_whitelist
>     kernel.controlled_userns_caps_whitelist = 1f,ffffffff
>
>     mahesh@vm0:~$ sysctl -w kernel.controlled_userns_caps_whitelist=1f,ffffdfff
>     sysctl: permission denied on key 'kernel.controlled_userns_caps_whitelist'
>
> (c) Unprivileged user does not have CAP_NET_RAW in init-ns but can get
>     that capability inside child-user-ns when the controlled_userns_caps
>     mask is unchanged (current behavior).
>
>     mahesh@vm0:~$ sysctl -q kernel.controlled_userns_caps_whitelist
>     kernel.controlled_userns_caps_whitelist = 1f,ffffffff
>
>     mahesh@vm0:~$ ./acquire_raw
>     Attempting to open RAW socket before unshare()...
>     socket() SOCK_RAW failed: : Operation not permitted
>     Attempting to open RAW socket after unshare()...
>     Successfully opened RAW-Sock after unshare().
>
> (d) Changing the controlled_userns_caps_whitelist mask will prevent user
>     for acquiring 'controlled capability' inside user-namespace.
>
>     mahesh@vm0:~$ sysctl -q kernel.controlled_userns_caps_whitelist
>     kernel.controlled_userns_caps_whitelist = 1f,ffffdfff
>     mahesh@vm0:~$ ./acquire_raw
>     Attempting to open RAW socket before unshare()...
>     socket() SOCK_RAW failed: : Operation not permitted
>     Attempting to open RAW socket after unshare()...
>     socket() SOCK_RAW failed: : Operation not permitted
>
>
> Please see individual patches in this series.
>
> Mahesh Bandewar (2):
>   capability: introduce sysctl for controlled user-ns capability whitelist
>   userns: control capabilities of some user namespaces
>
>  Documentation/sysctl/kernel.txt | 21 +++++++++++++++++
>  include/linux/capability.h      |  7 ++++++
>  include/linux/user_namespace.h  | 25 ++++++++++++++++++++
>  kernel/capability.c             | 52 +++++++++++++++++++++++++++++++++++++++++
>  kernel/sysctl.c                 |  5 ++++
>  kernel/user_namespace.c         |  4 ++++
>  security/commoncap.c            |  8 +++++++
>  7 files changed, 122 insertions(+)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv4 0/2] capability controlled user-namespaces
@ 2018-01-04  5:53     ` Mahesh Bandewar (महेश बंडेवार)
  0 siblings, 0 replies; 14+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2018-01-04  5:53 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Mahesh Bandewar, LKML, James Morris, Netdev, Kernel-hardening,
	Linux API, Linux Security, Serge Hallyn, Michael Kerrisk,
	Kees Cook, Eric Dumazet, David Miller

On Wed, Jan 3, 2018 at 8:44 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Mahesh Bandewar <mahesh@bandewar.net> writes:
>
>> From: Mahesh Bandewar <maheshb@google.com>
>>
>> TL;DR version
>> -------------
>> Creating a sandbox environment with namespaces is challenging
>> considering what these sandboxed processes can engage into. e.g.
>> CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few.
>> Current form of user-namespaces, however, if changed a bit can allow
>> us to create a sandbox environment without locking down user-
>> namespaces.
>
> In other conversations it appears it has been pointed out that user
> namespaces are not necessarily safe under no_new_privs.  In theory
> user namespaces should be safe but in practice not so much.
>
> So let me ask.  Would your concerns be addressed if we simply made
> creation and joining of user namespaces impossible in a no_new_privs
> sandbox?
>
Isn't this another form of locking down user-ns similar to setting per
user-ns sysctl max_userns = 0?

Having said that, not allowing processes to create and/or attach
user-namespaces is going to be problematic and possibly a regression.
This (current) patchset doesn't do that. It allows users to create
user-ns's of any depth and number permitted by the per-ns max_userns
sysctl. However, one can decide what to take off and what to leave in
terms of capabilities for the sandbox environment.
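
To make "what to take off" concrete: in the cover letter's example the
whitelist changes from 1f,ffffffff to 1f,ffffdfff, which clears bit 13,
the number of CAP_NET_RAW in <linux/capability.h>. Below is a minimal
sketch of that arithmetic. It assumes the sysctl value is a string of
comma-separated 32-bit hex words, most significant word first, as in
the example output. parse_cap_mask() and cap_in_mask() are hypothetical
helper names and are not part of the patch-set.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a mask such as "1f,ffffdfff" (at most two comma-separated
 * 32-bit hex words, most significant first) into a 64-bit value.
 * Returns 0 on success, -1 on malformed input.
 */
int parse_cap_mask(const char *s, uint64_t *out)
{
    uint64_t mask = 0;
    int words = 0;

    for (;;) {
        char *end;
        unsigned long long word;

        if (!isxdigit((unsigned char)*s))
            return -1;
        word = strtoull(s, &end, 16);
        if (word > 0xffffffffULL || ++words > 2)
            return -1;
        mask = (mask << 32) | word;
        if (*end == ',') {
            s = end + 1;        /* another word follows */
            continue;
        }
        if (*end != '\0' && *end != '\n')
            return -1;
        break;
    }
    *out = mask;
    return 0;
}

/* Return 1 if capability number 'cap' is set in the mask, 0 otherwise. */
int cap_in_mask(uint64_t mask, int cap)
{
    return cap >= 0 && cap < 64 && ((mask >> cap) & 1);
}
```

With the mask 1f,ffffdfff, cap_in_mask(mask, 13) is 0 (CAP_NET_RAW is
taken off) while cap_in_mask(mask, 12) is 1 (CAP_NET_ADMIN is left in).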

--mahesh..

> Eric
>
[...]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv4 0/2] capability controlled user-namespaces
       [not found]           ` <alpine.LRH.2.21.1803090901100.20664@namei.org>
@ 2018-03-08 22:52             ` Eric W. Biederman
  2018-03-08 23:22               ` Mahesh Bandewar (महेश बंडेवार)
  0 siblings, 1 reply; 14+ messages in thread
From: Eric W. Biederman @ 2018-03-08 22:52 UTC (permalink / raw)
  To: James Morris
  Cc: Eric Dumazet, Serge E. Hallyn, Mahesh Bandewar (महेश बंडेवार),
	kernel-hardening

James Morris <jmorris@namei.org> writes:

> Perhaps try a repost upstream for possible merging to 4.16.

I have a real concern that capability controlled user namespaces
are only good for CAP_NET_RAW and CAP_NET_ADMIN.  They don't appear
general.

I think this should be discussed on the linux hardening mailing list,
as that is what we are really talking about: something to reduce the
attack surface of the kernel, possibly after it has shipped, in some
well defined way.

That feels to me like a project for profiling tools, and some bpf programs
that attach to functions and call permissions.

Either that or something like my count of maximum number of namespaces.
Which appears to be just as usable as capability controlled user
namespaces.

I am very sympathetic but this does not appear to be a general solution
to a general problem.  The general problem being how to reduce the
attack surface of the kernel.

Especially when the end goal is fixing the relevant kernel code and
removing the restrictions I don't see why a weird kernel patch with
oddball semantics can help.

Eric


> On Mon, 26 Feb 2018, Eric Dumazet wrote:
>
>> On Mon, Feb 5, 2018 at 6:40 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
>> > Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com):
>> >> -everyone (just keeping the relevant people)
>> >>
>> >> Hi James and Eric,
>> >>
>> >> I would really like to know how we can proceed with this patch-series.
>> >> At this moment it does seem like this is the only solution (unless
>> >> something is in the kitchen that solves this problem differently that
>> >> I'm not aware of) to reduce the surface attack and address 0day
>> >> vulnerabilities. I have been trying this for last 6+ months now and
>> >> most of the questions are answered. I really appreciate the feedback /
>> >> queries / questions received in making this a better solution from
>> >> Serge.
>> >>
>> >> The last status that I know from this and other mail-thread is that
>> >> James wants to know Eric's take. Eric wanted to see if no_new_privs
>> >> way solves the problem. To which I have replied.
>> >>
>> >> I would really love to see if there is any blockage that I can clear
>> >> and why this has been held back.
>> >>
>> >> So Eric, please respond (publicly or to this thread) to make me
>> >
>> > Hey Eric,
>> >
>> > ping?
>> >
>> > (ack or nack, let's not leave him hanging :)
>> 
>> 
>> Hmm...
>> 
>> Eric Biederman , what can we do to unblock this ?
>> 
>> We can pretend the issue does not exist, until something bad happens.
>> 
>> Thanks.
>> 
>> 
>> >> understand why this can/cannot make into linux and make it easier for
>> >> James to decide when/how/what to pull as far as this patch-series is
>> >> concerned.
>> >>
>> >> [I don't mean to hurt anyone by being direct so please accept my
>> >> sincere apologies if that happened.]
>> >>
>> >> Thanks,
>> >> --mahesh..
>> >>
>> >> On Wed, Jan 3, 2018 at 9:53 PM, Mahesh Bandewar (महेश बंडेवार)
>> >> <maheshb@google.com> wrote:
>> >> > On Wed, Jan 3, 2018 at 8:44 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> >> >> Mahesh Bandewar <mahesh@bandewar.net> writes:
>> >> >>
>> >> >>> From: Mahesh Bandewar <maheshb@google.com>
>> >> >>>
>> >> >>> TL;DR version
>> >> >>> -------------
>> >> >>> Creating a sandbox environment with namespaces is challenging
>> >> >>> considering what these sandboxed processes can engage into. e.g.
>> >> >>> CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few.
>> >> >>> Current form of user-namespaces, however, if changed a bit can allow
>> >> >>> us to create a sandbox environment without locking down user-
>> >> >>> namespaces.
>> >> >>
>> >> >> In other conversations it appears it has been pointed out that user
>> >> >> namespaces are not necessarily safe under no_new_privs.  In theory
>> >> >> user namespaces should be safe but in practice not so much.
>> >> >>
>> >> >> So let me ask.  Would your concerns be addressed if we simply made
>> >> >> creation and joining of user namespaces impossible in a no_new_privs
>> >> >> sandbox?
>> >> >>
>> >> > Isn't this another form of locking down user-ns similar to setting per
>> >> > user-ns sysctl max_userns = 0?
>> >> >
>> >> > Having said that, not allowing processes to create and/or attach
>> >> > user-namespaces is going to be problematic and possibly a regression.
>> >> > This (current) patchset doesn't do that. It allows users to create
>> >> > user-ns's of any depth and number permitted by per-ns max_userns
>> >> > sysctl. However one can decide what to take-off and what to leave in
>> >> > terms of capabilities for the sandbox environment.
>> >> >
>> >> > --mahesh..
>> >> >
>> >> >> Eric
>> >> >>
>> >>
>> >> > [...]
>> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv4 0/2] capability controlled user-namespaces
  2018-03-08 22:52             ` Eric W. Biederman
@ 2018-03-08 23:22               ` Mahesh Bandewar (महेश बंडेवार)
  2018-03-08 23:46                 ` Eric W. Biederman
  0 siblings, 1 reply; 14+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2018-03-08 23:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: James Morris, Eric Dumazet, Serge E. Hallyn, kernel-hardening

On Thu, Mar 8, 2018 at 2:52 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> James Morris <jmorris@namei.org> writes:
>
>> Perhaps try a repost upstream for possible merging to 4.16.
>
> I have a real concern that capability controlled user namespaces
> are only good for CAP_NET_RAW and CAP_NET_ADMIN.  They don't appear
> general.
>
NET_RAW and NET_ADMIN threats are real and demonstrated, and hence it's
easy to show that this patch-set handles them well.

> I think this should be discussed on the linux hardening mailing list.
> As that is what we are really talking about something to reduce the
> attack surface of the kernel.  Possibly after it has shipped.
> In some well defined way.
>
This patch-set has been posted to the linux-hardening mailing list since
the initial RFC series.

> That feels to me like a project for profiling tools, and some bpf programs
> that attach to functions and call permissions.
>
> Either that or something like my count of maximum number of namespaces.
> Which appears to be just as usable as capability controlled user
> namespaces.
>
The maximum number of namespaces is similar to the distros adding a sysctl
to disallow creating user-namespaces: it does not solve the problem, nor
is it usable if the use case involves creating user-namespaces.
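
For reference, on kernels since 4.9 the per-namespace limit is exposed
as user.max_user_namespaces (/proc/sys/user/max_user_namespaces), and a
value of 0 there blocks user-namespace creation outright. A minimal
sketch of checking for that lock-down follows; the helper names are
hypothetical, and the path is passed in by the caller so nothing about
a particular system is assumed.

```c
#include <stdio.h>

/*
 * Read a single integer value from a /proc/sys style file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
int read_sysctl_long(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Returns 1 if the limit read from 'path' is zero (user-namespace
 * creation is locked down), 0 if it is non-zero, -1 if unknown.
 */
int userns_locked_down(const char *path)
{
    long limit;

    if (read_sysctl_long(path, &limit) != 0)
        return -1;
    return limit == 0;
}
```

Calling userns_locked_down("/proc/sys/user/max_user_namespaces")
returns 1 on a host where the limit has been set to 0, which is the
all-or-nothing situation described above.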


> I am very sympathetic but this does not appear to be a general solution
> to a general problem.  The general problem being how to reduce the
> attack surface of the kernel.
>
Now let's say there is a vulnerability discovered in code gated by
CAP_DAC_OVERRIDE; why do you think this patch-set is not general enough
to handle that? The point is that at this moment there is no mechanism
that allows me to create a sandbox in a true sense. This patch-set
allows you to do that.

> Especially when the end goal is fixing the relevant kernel code and
> removing the restrictions I don't see why a weird kernel patch with
> oddball semantics can help.
>
I'm not fixated on *this only* solution, but I don't want a solution that
restricts creating user-namespaces, since my use-cases involve creating
user-namespaces with "lesser" privileges. The patch-set has been
posted for more than 6 months and the problem has been known for some
time now; unfortunately I haven't seen any other solution that does not
involve blocking user-namespace creation.


> Eric
>
>
>> On Mon, 26 Feb 2018, Eric Dumazet wrote:
>>
>>> On Mon, Feb 5, 2018 at 6:40 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
>>> > Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com):
>>> >> -everyone (just keeping the relevant people)
>>> >>
>>> >> Hi James and Eric,
>>> >>
>>> >> I would really like to know how we can proceed with this patch-series.
>>> >> At this moment it does seem like this is the only solution (unless
>>> >> something is in the kitchen that solves this problem differently that
>>> >> I'm not aware of) to reduce the surface attack and address 0day
>>> >> vulnerabilities. I have been trying this for last 6+ months now and
>>> >> most of the questions are answered. I really appreciate the feedback /
>>> >> queries / questions received in making this a better solution from
>>> >> Serge.
>>> >>
>>> >> The last status that I know from this and other mail-thread is that
>>> >> James wants to know Eric's take. Eric wanted to see if no_new_privs
>>> >> way solves the problem. To which I have replied.
>>> >>
>>> >> I would really love to see if there is any blockage that I can clear
>>> >> and why this has been held back.
>>> >>
>>> >> So Eric, please respond (publicly or to this thread) to make me
>>> >
>>> > Hey Eric,
>>> >
>>> > ping?
>>> >
>>> > (ack or nack, let's not leave him hanging :)
>>>
>>>
>>> Hmm...
>>>
>>> Eric Biederman , what can we do to unblock this ?
>>>
>>> We can pretend the issue does not exist, until something bad happens.
>>>
>>> Thanks.
>>>
>>>
>>> >> understand why this can/cannot make into linux and make it easier for
>>> >> James to decide when/how/what to pull as far as this patch-series is
>>> >> concerned.
>>> >>
>>> >> [I don't mean to hurt anyone by being direct so please accept my
>>> >> sincere apologies if that happened.]
>>> >>
>>> >> Thanks,
>>> >> --mahesh..
>>> >>
>>> >> On Wed, Jan 3, 2018 at 9:53 PM, Mahesh Bandewar (महेश बंडेवार)
>>> >> <maheshb@google.com> wrote:
>>> >> > On Wed, Jan 3, 2018 at 8:44 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>> >> >> Mahesh Bandewar <mahesh@bandewar.net> writes:
>>> >> >>
>>> >> >>> From: Mahesh Bandewar <maheshb@google.com>
>>> >> >>>
>>> >> >>> TL;DR version
>>> >> >>> -------------
>>> >> >>> Creating a sandbox environment with namespaces is challenging
>>> >> >>> considering what these sandboxed processes can engage into. e.g.
>>> >> >>> CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few.
>>> >> >>> Current form of user-namespaces, however, if changed a bit can allow
>>> >> >>> us to create a sandbox environment without locking down user-
>>> >> >>> namespaces.
>>> >> >>
>>> >> >> In other conversations it appears it has been pointed out that user
>>> >> >> namespaces are not necessarily safe under no_new_privs.  In theory
>>> >> >> user namespaces should be safe but in practice not so much.
>>> >> >>
>>> >> >> So let me ask.  Would your concerns be addressed if we simply made
>>> >> >> creation and joining of user namespaces impossible in a no_new_privs
>>> >> >> sandbox?
>>> >> >>
>>> >> > Isn't this another form of locking down user-ns similar to setting per
>>> >> > user-ns sysctl max_userns = 0?
>>> >> >
>>> >> > Having said that, not allowing processes to create and/or attach
>>> >> > user-namespaces is going to be problematic and possibly a regression.
>>> >> > This (current) patchset doesn't do that. It allows users to create
>>> >> > user-ns's of any depth and number permitted by per-ns max_userns
>>> >> > sysctl. However one can decide what to take-off and what to leave in
>>> >> > terms of capabilities for the sandbox environment.
>>> >> >
>>> >> > --mahesh..
>>> >> >
>>> >> >> Eric
>>> >> >>
>>> >>
>>> >> > [...]
>>>


* Re: [PATCHv4 0/2] capability controlled user-namespaces
  2018-03-08 23:22               ` Mahesh Bandewar (महेश बंडेवार)
@ 2018-03-08 23:46                 ` Eric W. Biederman
  2018-03-09  5:19                   ` Mahesh Bandewar (महेश बंडेवार)
  0 siblings, 1 reply; 14+ messages in thread
From: Eric W. Biederman @ 2018-03-08 23:46 UTC (permalink / raw)
  To: Mahesh Bandewar (महेश बंडेवार)
  Cc: James Morris, Eric Dumazet, Serge E. Hallyn, kernel-hardening

"Mahesh Bandewar (महेश बंडेवार)" <maheshb@google.com> writes:

> On Thu, Mar 8, 2018 at 2:52 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> James Morris <jmorris@namei.org> writes:
>>
>>> Perhaps try a repost upstream for possible merging to 4.16.
>>
>> I have a real concern that capability controlled user namespaces
>> are only good for CAP_NET_RAW and CAP_NET_ADMIN.  They don't appear
>> general.
>>
> NET_RAW and NET_ADMIN threats are real and demonstrated and hence it's
> easy to show this patch-set to handle them well.
>
>> I think this should be discussed on the linux hardening mailing list.
>> As that is what we are really talking about something to reduce the
>> attack surface of the kernel.  Possibly after it has shipped.
>> In some well defined way.
>>
> This patch-set has been posted to linux-hardening mailing list since
> initial RFC series.

When I looked, this thread was not on that list.  Hmm.  It looks like
this thread had become completely private.  Sigh.

>> That feels to me like a project for profiling tools, and some bpf programs
>> that attach to functions and call permissions.
>>
>> Either that or something like my count of maximum number of namespaces.
>> Which appears to be just as usable as capability controlled user
>> namespaces.
>>
> maximum number of namespaces is similar to the distros adding a sysctl
> to disallow creating user-namespace and does not solve the problem nor
> it's usable if the use case involves creating user-namespaces.

If the namespace whose creation you are limiting is the network
namespace, it has nearly the same efficacy, and we already have that
knob in the kernel and need it for several reasons.
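
For reference, the knob mentioned above is presumably the per-user-namespace
limit exposed as /proc/sys/user/max_net_namespaces (described in
namespaces(7)). A minimal sketch for reading such a limit follows. The
helper name read_ns_limit is hypothetical, and the path is supplied by the
caller rather than hard-coded.

```c
#include <stdio.h>

/*
 * Read a single non-negative integer from a sysctl-style file.
 * read_ns_limit is a hypothetical helper name used for illustration.
 * Returns the value read, or -1 if the file cannot be opened or parsed.
 */
long read_ns_limit(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Calling read_ns_limit("/proc/sys/user/max_net_namespaces") returns the
current limit on systems where that file exists. Writing 0 to the file as a
suitably privileged user should prevent new network namespaces from being
created in that user namespace.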

>> I am very sympathetic but this does not appear to be a general solution
>> to a general problem.  The general problem being how to reduce the
>> attack surface of the kernel.
>>
> Now let's say there is vulnerability discovered in CAP_DAC_OVERRIDE,
> why do think this patch-set is not general enough to handle that? The
> point is that at this moment there is no mechanism that allows me to
> create a sandbox in a true sense. This patch-set allows you to do
> that.

I don't think the same amount of code is behind the other capabilities,
which drastically alters the efficacy of something like this when
considered in such a context.

>> Especially when the end goal is fixing the relevant kernel code and
>> removing the restrictions I don't see why a weird kernel patch with
>> oddball semantics can help.
>>
> I'm not fixated on *this only* solution but don't want a solution that
> restricts creating user-namespaces since my use-cases involve creating
> user-namespaces with "lesser" privileges. The patch-set has been
> posted for more than 6 months and problem is known for some time now
> unfortunately I haven't seen any other solution that does not involve
> blocking user-namespace creation.

I don't recall proposing a solution that limits creating user-namespaces.
Certainly I have proposed other kinds of solutions.

I offered sketches for several other solutions, including the one above
about using the tracing/debugging framework to inject additional
permission checks into the code at run-time.

There is a real danger in the direction you are walking.  Having a
mentality that is reactive and adding restrictions after the fact has
the very real danger of breaking applications when those restrictions
are imposed.

The only way I know to avoid breaking things is to have preemptive
sandboxing that tightly limits what applications can do.  Perhaps
something like sandstorm.io.  That preemptive sandbox lets you say:
wasn't it nice that we don't allow that code path, so patching is less
of a priority.

Past that you have to balance between what you might break and what
problems you are going to avoid by disallowing things after the
fact.

I have a little time but I don't think I will have much time for a
general design discussion until after 4.16 is out.

So far, to me, capability controlled user namespaces look like a nasty
ad hoc feature that one person will use, and not much, but that will
need to be maintained in perpetuity.  As such I think it is quite
reasonable to drag my feet and ask whether there is something better
and/or more general that we can do.

Eric


* Re: [PATCHv4 0/2] capability controlled user-namespaces
  2018-03-08 23:46                 ` Eric W. Biederman
@ 2018-03-09  5:19                   ` Mahesh Bandewar (महेश बंडेवार)
  0 siblings, 0 replies; 14+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2018-03-09  5:19 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: James Morris, Eric Dumazet, Serge E. Hallyn, kernel-hardening

On Thu, Mar 8, 2018 at 3:46 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> "Mahesh Bandewar (महेश बंडेवार)" <maheshb@google.com> writes:
>
>> On Thu, Mar 8, 2018 at 2:52 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>> James Morris <jmorris@namei.org> writes:
>>>
>>>> Perhaps try a repost upstream for possible merging to 4.16.
>>>
>>> I have a real concern that capability controlled user namespaces
>>> are only good for CAP_NET_RAW and CAP_NET_ADMIN.  They don't appear
>>> general.
>>>
>> NET_RAW and NET_ADMIN threats are real and demonstrated and hence it's
>> easy to show this patch-set to handle them well.
>>
>>> I think this should be discussed on the linux hardening mailing list.
>>> As that is what we are really talking about something to reduce the
>>> attack surface of the kernel.  Possibly after it has shipped.
>>> In some well defined way.
>>>
>> This patch-set has been posted to linux-hardening mailing list since
>> initial RFC series.
>
> When I looked this thread was not.  Hmm.  It looks like this thread had
> become completely private.  Sigh.
>
>>> That feels to me like a project for profiling tools, and some bpf programs
>>> that attach to functions and call permissions.
>>>
>>> Either that or something like my count of maximum number of namespaces.
>>> Which appears to be just as usable as capability controlled user
>>> namespaces.
>>>
>> maximum number of namespaces is similar to the distros adding a sysctl
>> to disallow creating user-namespace and does not solve the problem nor
>> it's usable if the use case involves creating user-namespaces.
>
> If the namespace you are limiting creating is the network namespace it
> has nearly the same efficacy and we already have that knob in the kernel
> and we need it for several reasons.
>
It may be useful for other use cases and that's fine, but it doesn't
solve the problem that I'm trying to address. Again, in my use case I
cannot restrict creating any namespace. All I can specify is what they
cannot do.

>>> I am very sympathetic but this does not appear to be a general solution
>>> to a general problem.  The general problem being how to reduce the
>>> attack surface of the kernel.
>>>
>> Now let's say there is vulnerability discovered in CAP_DAC_OVERRIDE,
>> why do think this patch-set is not general enough to handle that? The
>> point is that at this moment there is no mechanism that allows me to
>> create a sandbox in a true sense. This patch-set allows you to do
>> that.
>
> I don't think the same amount of code is behind the other capabilities
> which drastically alters the efficacy of something like this when
> considered in such a context.
>
I don't think it matters what or how much code is behind each
capability; it's the same mechanism that allows or disallows it in a
sandboxed or controlled environment.

>>> Especially when the end goal is fixing the relevant kernel code and
>>> removing the restrictions I don't see why a weird kernel patch with
>>> oddball semantics can help.
>>>
>> I'm not fixated on *this only* solution but don't want a solution that
>> restricts creating user-namespaces since my use-cases involve creating
>> user-namespaces with "lesser" privileges. The patch-set has been
>> posted for more than 6 months and problem is known for some time now
>> unfortunately I haven't seen any other solution that does not involve
>> blocking user-namespace creation.
>
> I don't recall proposing a solution that limits creating user-namespaces.
> Certainly I have proposed other kinds of solutions.
>
> I offered sketches for several other solutions.  Including the one above
> about using the tracing/debugging framework to inject additional
> permission checks into the code at run-time.
>
> There is a real danger in the direction you are walking.  Having a
> mentality that is reactive and adding restrictions after the fact has
> the very real danger of breaking applications when those restrictions
> are imposed.
>
I think you got it wrong. It doesn't have to be reactive! Well, one
could be reactive as well, but in my environment I know which
dangerous actions I wouldn't allow, and I would set up the environment
preemptively right from day one while keeping the flexibility to
extend it if something else is discovered. The default mask I have
proposed is for backward compatibility, so that no existing system
breaks. The admin in every deployment is in a position to decide
which default mask best suits his/her environment. This is a tool
that can be turned on or off based on the need.
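
As an illustration only, the mask idea described above can be sketched as a
plain bitmask check. This is not code from the patch-set: the helper names
(sandbox_mask_drop, sandbox_cap_allowed) and SANDBOX_MASK_ALLOW_ALL are
hypothetical, and only the capability numbers are taken from
<linux/capability.h>. A mask with every bit set stands in for a
backward-compatible default that changes nothing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define CAP_DAC_OVERRIDE 1
#define CAP_NET_ADMIN    12
#define CAP_NET_RAW      13

/* Hypothetical default: every capability stays available. */
#define SANDBOX_MASK_ALLOW_ALL UINT64_MAX

/* Bit representing one capability number in the mask. */
static inline uint64_t cap_bit(int cap)
{
    return UINT64_C(1) << cap;
}

/* Return a copy of the mask with one capability removed. */
uint64_t sandbox_mask_drop(uint64_t mask, int cap)
{
    return mask & ~cap_bit(cap);
}

/* True if the mask still permits the given capability. */
bool sandbox_cap_allowed(uint64_t mask, int cap)
{
    return (mask & cap_bit(cap)) != 0;
}
```

Starting from SANDBOX_MASK_ALLOW_ALL and dropping CAP_NET_RAW and
CAP_NET_ADMIN would model a sandbox that keeps everything else, such as
CAP_DAC_OVERRIDE, while refusing the two networking capabilities.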

> The only way I know to avoid breaking things is to have preemptive
> sandboxing that tightly limits what applications can do.  Perhaps
> something like sandstorm.io.  That preemptive sandbox lets you say:
> wasn't it nice that we don't allow that code path, so patching is less
> of a priority.
>
> Past that you have to balance between what you might break and what
> problems you are going to avoid by disallowing things after the
> fact.
>
> I have a little time but I don't think I will have much time for a
> general design discussion until after 4.16 is out.
>
> So far to me, capability controlled user namespaces look like a nasty
> adhoc feature that one person will use, and not much.  That however will
> need to be maintained in perpetuity.  As such I think it is quite
> reasonable to drag my feet, and ask is there something better and/or
> more general that we can do.
>
Sure, but until then we are all exposed and we don't know when the
next obscure vulnerability will be discovered. I just wish that
user-namespaces hadn't happened, because now we (kernel developers)
are torn between applications trying to use them in interesting ways
and security people trying to stop them from doing so, and I see this
happening in perpetuity.


> Eric
>


Thread overview: 14+ messages
2018-01-03  7:26 [PATCHv4 0/2] capability controlled user-namespaces Mahesh Bandewar
2018-01-03  7:26 ` [kernel-hardening] " Mahesh Bandewar
2018-01-03  7:26 ` Mahesh Bandewar
2018-01-03 16:44 ` Eric W. Biederman
2018-01-03 16:44   ` [kernel-hardening] " Eric W. Biederman
2018-01-03 16:44   ` Eric W. Biederman
2018-01-04  5:53   ` Mahesh Bandewar (महेश बंडेवार)
2018-01-04  5:53     ` [kernel-hardening] " Mahesh Bandewar (महेश बंडेवार)
2018-01-04  5:53     ` Mahesh Bandewar (महेश बंडेवार)
2018-01-04  5:53     ` Mahesh Bandewar (महेश बंडेवार)
     [not found]     ` <CAF2d9jjPiXX2Rf5QTvMKPdym5cqZBpTSP1Z21xzyjNcpaD=fGg@mail.gmail.com>
     [not found]       ` <20180205144015.GA12118@mail.hallyn.com>
     [not found]         ` <CANn89iL3y7aEqgUYP8Qq5NbJiYcPKMFCOWedhgOrO5cgy5c7vA@mail.gmail.com>
     [not found]           ` <alpine.LRH.2.21.1803090901100.20664@namei.org>
2018-03-08 22:52             ` Eric W. Biederman
2018-03-08 23:22               ` Mahesh Bandewar (महेश बंडेवार)
2018-03-08 23:46                 ` Eric W. Biederman
2018-03-09  5:19                   ` Mahesh Bandewar (महेश बंडेवार)
