* [PATCH RFC v3 0/7] proc: modernize proc to support multiple private instances
From: Djalal Harouni @ 2017-11-09 16:13 UTC
  To: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	linux-fsdevel, linux-kernel, kernel-hardening,
	linux-security-module, linux-api
  Cc: Greg Kroah-Hartman, Alexander Viro, Akinobu Mita, me,
	Oleg Nesterov, Jeff Layton, Ingo Molnar, Alexey Dobriyan,
	ebiederm, Linus Torvalds, Daniel Micay, Jonathan Corbet, bfields,
	Stephen Rothwell, solar, Djalal Harouni

Hi list,


Preface:
--------
This is RFC v3 of the series to modernize procfs so that it can support
multiple private instances within the same pid namespace.

I have been working on this with Alexey Gladkov and Andy Lutomirski.

RFC v1 is here:
https://lkml.org/lkml/2017/3/30/670

RFC v2 is here:
https://lkml.org/lkml/2017/4/25/282

This RFC v3 applies on top of next-20171109.

This RFC was tested on Ubuntu/Debian, and Alexey tested it on altlinux.
It does not work on Fedora because of a bug during boot with dracut that
I have not had time to investigate. I will make sure to fix it in the
next iteration. We decided to send the series anyway to get more feedback
on the direction, and we will continue to improve it.

RFC v3 addresses all previous comments from Andy Lutomirski. Thank you
for all the feedback.


Procfs modernization:
---------------------
Historically procfs has always been tied to pid namespaces: when a pid
namespace is created, we internally create a procfs mount for it. The
effect is that every new procfs mount is just a mirror of the internal
one. Any change, any mount option update and any future feature
propagates to all other procfs mounts that are in the same pid
namespace.

This may have solved several use cases at the time. Today, however, we
face new requirements, and making procfs support private instances
inside the same pid namespace is a major one. If we want to introduce
new features and security mechanisms, we first have to make sure that
we do not break existing use cases. Supporting private procfs instances
will allow new features and behaviour without propagating them to all
other procfs mounts.
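
As a rough illustration of the sharing problem described above, here is a
small userspace model in Python. It is not kernel code: the class names
(ProcOptions, PidNamespace, ProcMount) are made up for this sketch, and
the only assumption taken from the text is that classic mounts share one
option set per pid namespace while a 'newinstance' mount gets its own.

```python
from dataclasses import dataclass, field


@dataclass
class ProcOptions:
    """The two mount options procfs has today: 'hidepid' and 'gid'."""
    hidepid: int = 0
    gid: int = 0


@dataclass
class PidNamespace:
    """Hypothetical model: one shared option set per pid namespace."""
    shared: ProcOptions = field(default_factory=ProcOptions)


class ProcMount:
    """Hypothetical model of a single procfs mount point."""

    def __init__(self, ns: PidNamespace, newinstance: bool = False):
        # A private instance carries its own options; a classic mount
        # aliases the options stored in the pid namespace.
        self.options = ProcOptions() if newinstance else ns.shared

    def remount(self, **changes: int) -> None:
        for name, value in changes.items():
            if not hasattr(self.options, name):
                raise ValueError(f"unknown option: {name}")
            setattr(self.options, name, value)


ns = PidNamespace()
host, sandbox = ProcMount(ns), ProcMount(ns)
private = ProcMount(ns, newinstance=True)

sandbox.remount(hidepid=2)
print(host.options.hidepid)     # 2: the change leaked into the other mount
print(private.options.hidepid)  # 0: the private instance is unaffected
```

The aliasing in ProcMount.__init__ is the whole point of the model: a
remount of one classic mount is visible through every other classic
mount of the same namespace.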


Today procfs is more of a burden, especially for some embedded, IoT,
sandbox and container use cases. In user space we over-mount null or
inaccessible files on top of procfs entries to hide files and
information. If we want to hide pids we have to create pid namespaces,
because changing a mount option value in one mount propagates to all
other proc mounts. If we introduce new features, they will propagate to
all other mounts too, which may add useful functionality or may break
things. We also have to note that userspace should not have to work
around procfs; the kernel should just provide a sane, simple interface.


In this regard several developers and maintainers pointed out that
there are problems with procfs and it has to be modernized:

"Here's another one: split up and modernize /proc." by Andy Lutomirski [1]

Discussion about kernel pointer leaks:
"And yes, as Kees and Daniel mentioned, it's definitely not just dmesg.
In fact, the primary things tend to be /proc and /sys, not dmesg
itself." By Linus Torvalds [2]

Many other areas of the kernel and its filesystems have been updated to
support private instances; devpts is one major example [3]. The aim here
is to modernize procfs without breaking userspace and without affecting
the shared procfs mount. New features will later apply to the private
instances, and after more testing, over some months, private instances
could perhaps become the default, especially for IoT.

We want the possibility to do:

  mount -t proc -onewinstance,newfeature none /proc

newfeature: we are planning new features for procfs later; for now this
RFC only introduces the "pids=all|ptraceable" mount option.

This allows us to absorb changes and make improvements without breaking
use cases.


This will be used for:

1) Embedded systems and IoT: usually we have one supervisor for
apps and some lightweight sandbox support. However, if we create
pid namespaces we have to manage all the processes inside them too,
whereas our goal is to run a bunch of apps, each one inside
its own mount namespace. We may use network namespaces for VLAN
setups, but right now we only want mount namespaces, without all the
other complexity. We want procfs to behave more like a real file system
and block access to inodes that belong to other users. 'hidepid=' will
not work since it is a shared mount option.


2) Containers, sandboxes and private instances of file systems - devpts case
Historically, many file systems inside the Linux kernel were, when
instantiated, just a mirror of an already created and mounted filesystem.
This was the case for the devpts filesystem; it seems that at the time the
requirements were to optimize things, reuse the same memory, etc. This
design used to work, but it no longer does with today's containers, IoT,
hostile environments and all the privacy challenges that Linux faces.

In that regard, devpts was updated so that each new mount is a totally
independent file system by the following patch:
"devpts: Make each mount of devpts an independent filesystem" by
Eric W. Biederman [3] [4]


3) Linux Security Modules have multiple ptrace paths inside some
subsystems. Inside procfs, however, the implementation does not guarantee
that the ptrace check which triggers the security_ptrace_access_check()
hook will always run. We have the 'hidepid' mount option, which can be
used to force the ptrace_may_access() check inside has_pid_permissions()
to run. The problem is that 'hidepid' is per pid namespace and not
attached to the mount point: any remount or modification of 'hidepid'
propagates to all other procfs mounts.

This also makes it hard to support the Yama LSM in desktop and user
sessions. Yama ptrace scope, which restricts ptrace and some other
syscalls so that they are allowed only on inferiors, can be updated to
have a per-task context that is inherited during fork() and clone() and
preserved across execve(). If we support multiple private procfs
instances, then we can force the ptrace_may_access() check on
/proc/<pids>/ to always run inside those new procfs instances. This will
make it possible to specify, per user session, whether procfs should be
populated only with pids that the user can ptrace.

By using Yama ptrace scope, some restricted users will only be able to see
inferiors inside /proc; they will not even be able to see their other
processes. Software such as Chromium, Firefox's crash handler, Wine
and others already uses Yama to restrict which processes can be
ptraced. This change makes it possible to restrict /proc/<pids>/ and,
more importantly, gives desktop users a generic and usable way to
specify which users can see all processes and which users cannot.

Side notes:
* This covers a gap in seccomp, which cannot inspect pointer arguments
such as path names. It is easy to install a seccomp filter on direct
syscalls that operate on pids, but /proc/<pid>/ is a Linux ABI that uses
filesystem syscalls. With this change all LSMs should be able to analyze
open/read/write/close... on /proc/<pid>/


4) This will allow new features to be implemented either in the kernel or
in userspace without having to worry about procfs.
In containers, sandboxes, etc. we have workarounds to hide some /proc
inodes. This should be supported natively without extra complex work;
the kernel should be able to support sane options that work with
today's and future Linux use cases.

Alexey Gladkov has a patch on top [7] that allows hiding non-pid
inodes from procfs. We are improving that patch, and with the
'newinstance' option it can be used in containers and sandboxes, as these
are already trying to hide and block access to procfs inodes anyway.

[7] https://github.com/legionus/linux/commit/993a2a5b9af95b0ac901ff41d32124b72ed676e3
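
Going back to use case 3: the reason the ptrace check is not guaranteed
to run can be seen in a simplified model of has_pid_permissions(). The
Python below paraphrases, from memory, the shape of that helper in
fs/proc/base.c in kernels of this period; treat the exact structure as an
assumption and check the tree you are working on. The callback argument
stands in for ptrace_may_access(), and the parameter names are chosen for
this sketch.

```python
from typing import Callable, List


def has_pid_permissions(hide_pid: int, hide_pid_min: int,
                        in_gid_group: bool,
                        ptrace_may_access: Callable[[], bool]) -> bool:
    """Simplified userspace model, not the kernel function itself."""
    # With a low 'hidepid' value the function returns early, so the
    # ptrace check (and therefore any LSM hook behind it) never runs.
    if hide_pid < hide_pid_min:
        return True
    # Members of the 'gid=' group are exempt as well.
    if in_gid_group:
        return True
    # Only here is the ptrace path, and with it the LSM, consulted.
    return ptrace_may_access()


calls: List[str] = []


def lsm_check() -> bool:
    """Stand-in for ptrace_may_access(); records that it was consulted."""
    calls.append("ptrace")
    return False


print(has_pid_permissions(0, 2, False, lsm_check), calls)  # True []
print(has_pid_permissions(2, 2, False, lsm_check), calls)  # False ['ptrace']
```

In the first call the LSM is never asked; only a sufficiently strict
'hidepid' reaches the ptrace path, and that setting is shared by every
mount in the pid namespace.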


Introduced changes:
-------------------
This series adds two new mount options:

* 'newinstance' mount option, which was also suggested by Andy Lutomirski [5].
When this option is passed we automatically create a private procfs instance.

This is not the default behaviour since we do not want to break userspace
and we do not want to provide different device IDs by default when
stat()ing inodes; I am not sure about all the use cases there [6].

* 'pids' mount option, as discussed with Andy Lutomirski.

  If 'pids=' is passed without 'newinstance' then it has no effect.

  If 'newinstance,pids=all' is passed then processes will be shown inside
  the proc mount.

  If 'newinstance,pids=ptraceable' is passed then only ptraceable
  processes will be shown.

This makes it possible to support lightweight sandboxes in embedded Linux,
and it also solves the case for LSMs: with this mount option, we make
sure that they have a ptrace path in procfs.
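
The three 'pids=' rules above can be written down as a small decision
function. This is only a model of the behaviour described in this cover
letter, not code from the patches: the function name pid_is_listed() and
its parameters are invented here, and 'hidepid' and 'gid' are
deliberately left out.

```python
from typing import Optional


def pid_is_listed(newinstance: bool, pids: Optional[str],
                  can_ptrace: bool) -> bool:
    """Return False only when the 'pids=' option hides the process."""
    if pids not in (None, "all", "ptraceable"):
        raise ValueError(f"unsupported value for pids=: {pids!r}")
    # Without 'newinstance' the option has no effect at all.
    if not newinstance:
        return True
    # 'pids=all' (or no 'pids=' option) applies no ptrace filtering.
    if pids in (None, "all"):
        return True
    # 'newinstance,pids=ptraceable': list only what the reader can ptrace.
    return can_ptrace


# A process the reader cannot ptrace, under each combination:
print(pid_is_listed(False, "ptraceable", can_ptrace=False))  # True
print(pid_is_listed(True, "all", can_ptrace=False))          # True
print(pid_is_listed(True, "ptraceable", can_ptrace=False))   # False
```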


Use cases of the 'newinstance' mount option:

* We create a private procfs instance that is disconnected from the
shared or other procfs instances.

* "hidepid": instead of changing all other mirrored procfs mounts, it now
applies only to the new private instance.

* "gid": instead of changing all other mirrored procfs mounts, it now
applies only to the new private instance.

* "pids=ptraceable": this mount option takes precedence over "hidepid"
and only works when 'newinstance' is set. Otherwise it is ignored.

This should allow, after real testing, a smooth transition to a procfs
with private instances by default.
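
One way to check from userspace whether two procfs mounts are the same
instance is to compare the device IDs reported by stat(), since the
'newinstance' description above says that private instances provide
different device IDs. The helper below is a sketch under that assumption;
its name is made up, and it only compares device numbers, so it says
nothing about whether either path is actually a procfs mount.

```python
import os


def same_device(path_a: str, path_b: str) -> bool:
    """True when both paths report the same st_dev (same superblock)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


# Hypothetical usage on a kernel with this RFC applied, where /test was
# mounted with '-o newinstance':
#     same_device("/proc", "/test")   # expected to be False
```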


How to test:
$ sudo mount -t proc -onewinstance,pids=ptraceable none /test
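
To see the effect of the mount above, one can compare the pid
directories listed in /proc with those listed in /test. The helper below
is a minimal sketch for that; visible_pids() is a name chosen here, and
it relies only on os.listdir() and on the fact that pid directories have
purely numeric names.

```python
import os
from typing import List


def visible_pids(proc_mount: str = "/proc") -> List[int]:
    """Return the pids that appear as directories names under a procfs mount."""
    return sorted(
        int(name)
        for name in os.listdir(proc_mount)
        if name.isascii() and name.isdigit()
    )


# Hypothetical usage after the mount command above, run as an
# unprivileged user:
#     hidden = set(visible_pids("/proc")) - set(visible_pids("/test"))
#     print(len(hidden), "pids are hidden in the private instance")
```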


Note for userspace that should be documented:
If you are over-mounting /proc, make sure you are in a new mount
namespace where propagation to the master is disconnected. This avoids
pinning that new /proc mount.
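
A supervisor could follow the note above by starting each app through
unshare(1) from util-linux, which creates a new mount namespace and sets
propagation to private before /proc is over-mounted. The sketch below
only builds the command line; sandbox_argv() is a name invented for this
example, the 'newinstance' and 'pids=' options exist only on a kernel
with this RFC applied, and running the result requires root.

```python
from typing import List, Sequence

# $0 receives the mount options, "$@" the application command line.
_SCRIPT = 'mount -t proc -o "$0" none /proc && exec "$@"'


def sandbox_argv(app_argv: Sequence[str],
                 opts: str = "newinstance,pids=ptraceable") -> List[str]:
    """Build an unshare(1) command that over-mounts /proc privately."""
    if not app_argv:
        raise ValueError("app_argv must name a program to run")
    return [
        "unshare", "--mount", "--propagation", "private",
        "sh", "-c", _SCRIPT, opts, *app_argv,
    ]


# Hypothetical usage:
#     import subprocess
#     subprocess.run(sandbox_argv(["ls", "/proc"]), check=True)
```

Because the over-mount happens inside the new namespace with private
propagation, it does not show up in the parent namespace and goes away
when the last process in the sandbox exits.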


References:
-----------
[1] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-January/004215.html
[2] http://www.openwall.com/lists/kernel-hardening/2017/10/05/5
[3] https://lwn.net/Articles/689539/
[4] http://lxr.free-electrons.com/source/Documentation/filesystems/devpts.txt?v=3.14
[5] https://lkml.org/lkml/2017/5/2/407
[6] https://lkml.org/lkml/2017/5/3/357


# Changes since RFC v2:
*) Renamed mount options to 'newinstance' and 'pids='
   Suggested-by: Andy Lutomirski <luto@kernel.org>
*) Fixed the order of commits.
   Suggested-by: Andy Lutomirski <luto@kernel.org>
*) Many bug fixes.


# Changes since RFC v1:
*) Removed 'unshared' mount option and replaced it with 'limit_pids'
   which is attached to the current procfs mount.
   Suggested-by: Andy Lutomirski <luto@kernel.org>
*) Do not fill dcache with pid entries that we can not ptrace.
*) Many bug fixes.


Djalal Harouni (7):
 [PATCH 1/7] proc: add proc_fs_info struct to store proc information
 [PATCH 2/7] proc: move /proc/{self|thread-self} dentries to proc_fs_info
 [PATCH 3/7] proc: add helpers to set and get proc hidepid and gid mount options
 [PATCH 4/7] proc: support mounting private procfs instances inside same pid namespace
 [PATCH 5/7] proc: move hidepid definitions to proc files
 [PATCH 6/7] proc: support new 'pids=all|ptraceable' mount option
 [PATCH 7/7] proc: flush dcache entries from all procfs instances
        

 fs/locks.c                    |   6 +-
 fs/proc/base.c                | 103 ++++++++++++++++-------
 fs/proc/inode.c               |  34 ++++++--
 fs/proc/internal.h            |   2 +-
 fs/proc/root.c                | 188 +++++++++++

* [kernel-hardening] [PATCH RFC v3 0/7] proc: modernize proc to support multiple private instances
@ 2017-11-09 16:13 ` Djalal Harouni
  0 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-09 16:13 UTC (permalink / raw)
  To: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	linux-fsdevel, linux-kernel, kernel-hardening,
	linux-security-module, linux-api
  Cc: Greg Kroah-Hartman, Alexander Viro, Akinobu Mita, me,
	Oleg Nesterov, Jeff Layton, Ingo Molnar, Alexey Dobriyan,
	ebiederm, Linus Torvalds, Daniel Micay, Jonathan Corbet, bfields,
	Stephen Rothwell, solar, Djalal Harouni

Hi list,


Preface:
--------
This is RFC v3 to modernize procfs and make it able to support multiple
private instances per the same pid namespace.

I have been working on this with Alexey Gladkov and Andy Lutomirski.

RFC v1 is here:
https://lkml.org/lkml/2017/3/30/670

RFC v2 is here:
https://lkml.org/lkml/2017/4/25/282

This RFC v3 can be applied on top of next-20171109

This RFC was tested on Ubuntu/Debian and Alexey tested it on altlinux.
It does not work on Fedora due to a bug during boot with dracut, I did
not have time to investigate it more. I will make sure to fix it next
iteration. We decided to send it to get more feedback on the direction,
we will continue to improve it.

RFC v3 handles all previous comments from Andy Lutomirski, thank you for
all the feedback.


Procfs modernization:
---------------------
Historically procfs was always tied to pid namespaces, during pid
namespace creation we internally create a procfs mount for it. However,
this has the effect that all new procfs mounts are just a mirror of the
internal one, any change, any mount option update, any new future
introduction will propagate to all other procfs mounts that are in the
same pid namespace.

This may have solved several use cases in that time. However today we
face new requirements, and making procfs able to support new private
instances inside same pid namespace seems a major point. If we want to
to introduce new features and security mechanisms we have to make sure
first that we do not break existing usecases. Supporting private procfs
instances wil allow to support new features and behaviour without
propagating it to all other procfs mounts.


Today procfs is more of a burden especially to some Embedded, IoT,
sandbox, container use cases. In user space we are over-mounting null
or inaccessible files on top to hide files and information. If we want
to hide pids we have to create PID namespaces otherwise mount options
propagate to all other proc mounts, changing a mount option value in one
mount will propagate to all other proc mounts. If we want to introduce
new features, then they will propagate to all other mounts too, resulting
either maybe new useful functionality or maybe breaking stuff. We have
also to note that userspace should not workaround procfs, the kernel
should just provide a sane simple interface.


In this regard several developers and maintainers pointed out that
there are problems with procfs and it has to be modernized:

"Here's another one: split up and modernize /proc." by Andy Lutomirski [1]

Discussion about kernel pointer leaks:
"And yes, as Kees and Daniel mentioned, it's definitely not just dmesg.
In fact, the primary things tend to be /proc and /sys, not dmesg
itself." By Linus Torvalds [2]

Lot of other areas in the kernel and filesystems have been updated to be
able to support private instances, devpts is one major example [3]. The aim
here is to modernize procfs without breaking userspace, or without affecting
the shared procfs mount. Later new features will apply on the private
instances, and after more testing, months, maybe it can be made the default
especially for IoT.

We want the possibility to do:

  mount -t proc -onewinstance,newfeature none /proc

newfeature: we are planning new features later for procfs, for now in
this RFC we only introduce "pids=all|ptraceable" mount option.

This allows to absorbe changes, make improvments without breaking use
cases.


Which will be used for:

1) Embedded systems and IoT: usually we have one supervisor for
apps, we have some lightweight sandbox support, however if we create
pid namespaces we have to manage all the processes inside too,
where our goal is to be able to run a bunch of apps each one inside
its own mount namespace, maybe use network namespaces for vlans
setups, but right now we only want mount namespaces, without all the
other complexity. we want procfs to behave more like a real file system,
and block access to inodes that belong to other users. 'hidepid=' will
not work since it is a shared mount option.


2) Containers, sandboxes and Private instances of file systems - devpts case
Historically, lot of file systems inside Linux kernel view when instantiated
were just a mirror of an already created and mounted filesystem. This was the
case of devpts filesystem, it seems at that time the requirements were to
optimize things and reuse the same memory, etc. This design used to work but not
anymore with today’s containers, IoT, hostile environments and all the privacy
challenges that Linux faces.

In that regards, devpts was updated so that each new mounts is a total
independent file system by the following patches:
“devpts: Make each mount of devpts an independent filesystem” by
Eric W. Biederman [3] [4]


3) Linux Security Modules have multiple ptrace paths inside some
subsystems, however inside procfs, the implementation does not guarantee
that the ptrace() check which triggers the security_ptrace_check() hook
will always run. We have the 'hidepid' mount option that can be used to
force the ptrace_may_access() check inside has_pid_permissions() to run.
The problem is that 'hidepid' is per pid namespace and not attached to
the mount point, any remount or modification of 'hidepid' will propagate
to all other procfs mounts.

This also does not allow to support Yama LSM easily in desktop and user
sessions. Yama ptrace scope which restricts ptrace and some other
syscalls to be allowed only on inferiors, can be updated to have a
per-task context, where the context will be inherited during fork(),
clone() and preserved across execve(). If we support multiple private
procfs instances, then we may force the ptrace_may_access() on
/proc/<pids>/ to always run inside that new procfs instances. This will
allow to specifiy on user sessions if we should populate procfs with
pids that the user can ptrace or not.

By using Yama ptrace scope, some restricted users will only be able to see
inferiors inside /proc, they won't even be able to see their other
processes. Some software like Chromium, Firefox's crash handler, Wine
and others are already using Yama to restrict which processes can be
ptracable. With this change this will give the possibility to restrict
/proc/<pids>/ but more importantly this will give desktop users a
generic and usuable way to specifiy which users should see all processes
and which user can not.

Side notes:
* This covers the lack of seccomp where it is not able to parse
arguments, it is easy to install a seccomp filter on direct syscalls
that operate on pids, however /proc/<pid>/ is a Linux ABI using
filesystem syscalls. With this change all LSMs should be able to analyze
open/read/write/close... on /proc/<pid>/


4) This will allow to implement new features either in kernel or
userspace without having to worry about procfs.
In containers, sandboxes, etc we have workarounds to hide some /proc
inodes, this should be supported natively without doing extra complex
work, the kernel should be able to support sane options that work with
today and future Linux use cases.

Alexey Gladkov has on top a patch [7] that allows to hide non-pid
inodes from procfs, we are improving that patch and with 'newinstance'
option it can be used in containers and sandboxes, as these are already
trying to hide and block access to procfs inodes anyway.

https://github.com/legionus/linux/commit/993a2a5b9af95b0ac901ff41d32124b72ed676e3


Introduced changes:
-------------------
This series adds two new mount options:

* 'newinstance' mount option, it was also suggesed by Andy Lutomirski [5].
When this option is passed we automatically create a private procfs instance.

This is not the default behaviour since we do not want to break userspace
and we do not want to provide different devices IDs by default when
stat()ing inodes, I am not sure about all the use cases there [6].

* 'pids' mount option, as discussed with Andy Lutomirski.

  If 'pids=' is passed without 'newinstance' then it has no effect.

  If 'newinstance,pids=all' then processes will be show inside the proc
  mount.

  If 'newinstance,pids=ptraceable' then only ptraceable processes will be
  shown.

This allows supporting lightweight sandboxes in Embedded Linux. It also
solves the case for LSMs: with this mount option we make sure that they
have a ptrace path in procfs.
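
The option rules above can be sketched in userspace C. This is only an
illustration of the described semantics, not the parser from this series:
the names parse_proc_opts(), effective_pids() and struct proc_opts are
hypothetical, and rejecting an unknown 'pids=' value is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum pids_mode { PIDS_ALL, PIDS_PTRACEABLE };

/* Hypothetical container for the two options discussed above. */
struct proc_opts {
	bool newinstance;
	enum pids_mode pids;
};

/* True if the token of length len equals the literal lit. */
static bool tok_is(const char *tok, size_t len, const char *lit)
{
	return strlen(lit) == len && strncmp(tok, lit, len) == 0;
}

/* Parse a comma-separated option string; 0 on success, -1 on a bad pids= value. */
static int parse_proc_opts(const char *options, struct proc_opts *out)
{
	const char *p = options;

	assert(options && out);
	out->newinstance = false;
	out->pids = PIDS_ALL;

	while (*p) {
		size_t len = strcspn(p, ",");

		if (tok_is(p, len, "newinstance"))
			out->newinstance = true;
		else if (tok_is(p, len, "pids=all"))
			out->pids = PIDS_ALL;
		else if (tok_is(p, len, "pids=ptraceable"))
			out->pids = PIDS_PTRACEABLE;
		else if (len >= 5 && strncmp(p, "pids=", 5) == 0)
			return -1;	/* assumption: unknown values are rejected */
		/* Anything else (hidepid=, gid=, ...) is skipped by this sketch. */

		p += len;
		if (*p == ',')
			p++;
	}
	return 0;
}

/* 'pids=' without 'newinstance' has no effect, so fall back to "all". */
static enum pids_mode effective_pids(const struct proc_opts *o)
{
	return o->newinstance ? o->pids : PIDS_ALL;
}
```

With "newinstance,pids=ptraceable" the effective mode is PIDS_PTRACEABLE;
with "pids=ptraceable" alone it stays PIDS_ALL.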


Use cases of 'newinstance' mount option:

* We create a private procfs instance that is disconnected from the
shared or other procfs instances.

* "hidepid": instead of changing all other mirrored procfs mounts, it
will now work only on the new private instance.

* "gid": instead of changing all other mirrored procfs mounts, it will
now work only on the new private instance.

* "pids=ptraceable" mount option, which takes precedence over
"hidepid", only works when 'newinstance' is set. Otherwise it is ignored.

This should allow, after real-world testing, a smooth transition
to a procfs with private instances by default.
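
Since a private instance gets its own superblock, userspace should be able
to tell two procfs mounts apart by the device ID that stat() reports, which
is the behaviour change mentioned for 'newinstance' above. A minimal sketch,
where same_fs_instance() is a hypothetical helper and the expected result
for a 'newinstance' mount is an assumption based on this RFC:

```c
#include <assert.h>
#include <sys/stat.h>

/*
 * Returns 1 if both paths report the same st_dev (same filesystem
 * instance), 0 if they differ, and -1 if either stat() call fails.
 */
static int same_fs_instance(const char *a, const char *b)
{
	struct stat sa, sb;

	assert(a && b);
	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	return sa.st_dev == sb.st_dev;
}
```

For example, same_fs_instance("/proc", "/test") would be expected to return
0 when /test was mounted with 'newinstance', and 1 for a plain mirrored
procfs mount in the same pid namespace.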


How to test:
$ sudo mount -t proc -onewinstance,pids=ptraceable none /test


Note for userspace that should be documented:
If you are over-mounting /proc, make sure you are in a new mount
namespace where propagation to the master is disconnected. This avoids
pinning the new /proc mount.
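
The test command and the note above can be combined into one userspace
sketch. mount_private_proc() is a hypothetical helper; it needs
CAP_SYS_ADMIN and a kernel carrying this series, and using MS_SLAVE to cut
propagation towards the master is one possible choice (MS_PRIVATE would
also work).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Enter a new mount namespace, stop mount events from propagating back
 * to the parent namespace, then mount procfs on target with opts, e.g.
 * "newinstance,pids=ptraceable". Returns 0 or a negative errno value.
 */
static int mount_private_proc(const char *target, const char *opts)
{
	if (!target || !opts)
		return -EINVAL;

	/* New mount namespace for the calling process. */
	if (unshare(CLONE_NEWNS) != 0)
		return -errno;

	/* Make every mount a slave so nothing we mount reaches the master. */
	if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) != 0)
		return -errno;

	/* Equivalent of: mount -t proc -o<opts> none <target> */
	if (mount("none", target, "proc", 0, opts) != 0)
		return -errno;

	return 0;
}
```

Calling mount_private_proc("/proc", "newinstance,pids=ptraceable") from a
sandbox launcher would over-mount /proc for that process and its children
only.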


References:
-----------
[1] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-January/004215.html
[2] http://www.openwall.com/lists/kernel-hardening/2017/10/05/5
[3] https://lwn.net/Articles/689539/
[4] http://lxr.free-electrons.com/source/Documentation/filesystems/devpts.txt?v=3.14
[5] https://lkml.org/lkml/2017/5/2/407
[6] https://lkml.org/lkml/2017/5/3/357


# Changes since RFC v2:
*) Renamed mount options to 'newinstance' and 'pids='
   Suggested-by: Andy Lutomirski <luto@kernel.org>
*) Fixed the order of commits.
   Suggested-by: Andy Lutomirski <luto@kernel.org>
*) Many bug fixes.


# Changes since RFC v1:
*) Removed 'unshared' mount option and replaced it with 'limit_pids'
   which is attached to the current procfs mount.
   Suggested-by: Andy Lutomirski <luto@kernel.org>
*) Do not fill dcache with pid entries that we can not ptrace.
*) Many bug fixes.


Djalal Harouni (7):
 [PATCH 1/7] proc: add proc_fs_info struct to store proc information
 [PATCH 2/7] proc: move /proc/{self|thread-self} dentries to proc_fs_info
 [PATCH 3/7] proc: add helpers to set and get proc hidepid and gid mount options
 [PATCH 4/7] proc: support mounting private procfs instances inside same pid namespace
 [PATCH 5/7] proc: move hidepid definitions to proc files
 [PATCH 6/7] proc: support new 'pids=all|ptraceable' mount option
 [PATCH 7/7] proc: flush dcache entries from all procfs instances

 fs/locks.c                    |   6 +-
 fs/proc/base.c                | 103 ++++++++++++++++-------
 fs/proc/inode.c               |  34 ++++++--
 fs/proc/internal.h            |   2 +-
 fs/proc/root.c                | 188 +++++++++++


* [PATCH RFC v3 1/7] proc: add proc_fs_info struct to store proc information
  2017-11-09 16:13 ` Djalal Harouni
  (?)
@ 2017-11-09 16:14   ` Djalal Harouni
  -1 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-09 16:14 UTC (permalink / raw)
  To: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	linux-fsdevel, linux-kernel, kernel-hardening,
	linux-security-module, linux-api
  Cc: Greg Kroah-Hartman, Alexander Viro, Akinobu Mita, me,
	Oleg Nesterov, Jeff Layton, Ingo Molnar, Alexey Dobriyan,
	ebiederm, Linus Torvalds, Daniel Micay, Jonathan Corbet, bfields,
	Stephen Rothwell, solar, Djalal Harouni

This is a preparation patch that adds proc_fs_info to handle procfs
internal information. Right now procfs internal information is stored
inside the pid namespace, which makes it hard to change or modernize
procfs without affecting pid namespaces. Furthermore, this is blocking
all kinds of changes that are needed to solve today's or future Linux
challenges, as noted by various maintainers and userspace needs:

"Here's another one: split up and modernize /proc." by Andy Lutomirski [1]

Discussion about kernel pointer leaks:
"And yes, as Kees and Daniel mentioned, it's definitely not just dmesg.
In fact, the primary things tend to be /proc and /sys, not dmesg
itself." By Linus Torvalds [2]

procfs is an important Linux API that offers features using filesystem
syscalls, hence let's handle it as a real filesystem with its own
private information, and avoid mixing it with PID namespaces since it is
more than a PID namespace after all. This will later allow supporting
separate instances, each one with its own superblock, which will solve
a lot of problems.

Other Linux interfaces like devpts were also updated to support
containers, sandboxes and multiple private instances [3]. Time to update
procfs also.

Patch changes:
* Adds the proc_fs_info struct to store procfs mount information.
* Updates proc_mount() to handle mounts directly there.
* Updates all callers that now need to access the proc_fs_info struct.
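
The indirection these changes introduce can be modelled in plain userspace
C. The struct layouts below are reduced stand-ins for the kernel types, and
sb_pid_ns() is a hypothetical helper that only exists in this sketch:

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the kernel structures (illustration only). */
struct pid_namespace { int level; };
struct proc_fs_info { struct pid_namespace *pid_ns; };
struct super_block { void *s_fs_info; };

/* Same shape as the proc_sb() accessor added by this patch. */
static inline struct proc_fs_info *proc_sb(struct super_block *sb)
{
	return sb->s_fs_info;
}

/*
 * Before this patch s_fs_info pointed at the pid namespace itself.
 * Now callers go through proc_fs_info to reach it.
 */
static struct pid_namespace *sb_pid_ns(struct super_block *sb)
{
	struct proc_fs_info *fs_info = proc_sb(sb);

	assert(fs_info != NULL);
	return fs_info->pid_ns;
}
```

Every converted call site in the diff follows this two-step pattern:
fetch fs_info with proc_sb(), then read fs_info->pid_ns.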

[1] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-January/004215.html
[2] http://www.openwall.com/lists/kernel-hardening/2017/10/05/5
[3] http://lxr.free-electrons.com/source/Documentation/filesystems/devpts.txt?v=3.14

Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 fs/locks.c              |  6 +++--
 fs/proc/base.c          | 30 +++++++++++----------
 fs/proc/inode.c         |  8 +++---
 fs/proc/root.c          | 69 ++++++++++++++++++++++++++++++++++++++++++++++---
 fs/proc/self.c          |  8 +++---
 fs/proc/thread_self.c   |  6 +++--
 fs/proc_namespace.c     | 14 +++++-----
 include/linux/proc_fs.h | 10 +++++++
 8 files changed, 117 insertions(+), 34 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 21b4dfa..6d5c473 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2624,7 +2624,8 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
 {
 	struct inode *inode = NULL;
 	unsigned int fl_pid;
-	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(file_inode(f->file)->i_sb);
+	struct pid_namespace *proc_pidns = fs_info->pid_ns;
 
 	fl_pid = locks_translate_pid(fl, proc_pidns);
 	/*
@@ -2704,7 +2705,8 @@ static int locks_show(struct seq_file *f, void *v)
 {
 	struct locks_iterator *iter = f->private;
 	struct file_lock *fl, *bfl;
-	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(file_inode(f->file)->i_sb);
+	struct pid_namespace *proc_pidns = fs_info->pid_ns;
 
 	fl = hlist_entry(v, struct file_lock, fl_link);
 
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 31934cb..5fc2006 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -696,7 +696,8 @@ static bool has_pid_permissions(struct pid_namespace *pid,
 
 static int proc_pid_permission(struct inode *inode, int mask)
 {
-	struct pid_namespace *pid = inode->i_sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
+	struct pid_namespace *pid = fs_info->pid_ns;
 	struct task_struct *task;
 	bool has_perms;
 
@@ -731,12 +732,12 @@ static const struct inode_operations proc_def_inode_operations = {
 static int proc_single_show(struct seq_file *m, void *v)
 {
 	struct inode *inode = m->private;
-	struct pid_namespace *ns;
 	struct pid *pid;
 	struct task_struct *task;
 	int ret;
 
-	ns = inode->i_sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
+	struct pid_namespace *ns = fs_info->pid_ns;
 	pid = proc_pid(inode);
 	task = get_pid_task(pid, PIDTYPE_PID);
 	if (!task)
@@ -1774,9 +1775,10 @@ struct inode *proc_pid_make_inode(struct super_block * sb,
 int pid_getattr(const struct path *path, struct kstat *stat,
 		u32 request_mask, unsigned int query_flags)
 {
-	struct inode *inode = d_inode(path->dentry);
 	struct task_struct *task;
-	struct pid_namespace *pid = path->dentry->d_sb->s_fs_info;
+	struct inode *inode = d_inode(path->dentry);
+	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
+	struct pid_namespace *pid = fs_info->pid_ns;
 
 	generic_fillattr(inode, stat);
 
@@ -2291,6 +2293,7 @@ static const struct seq_operations proc_timers_seq_ops = {
 static int proc_timers_open(struct inode *inode, struct file *file)
 {
 	struct timers_private *tp;
+	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
 
 	tp = __seq_open_private(file, &proc_timers_seq_ops,
 			sizeof(struct timers_private));
@@ -2298,7 +2301,7 @@ static int proc_timers_open(struct inode *inode, struct file *file)
 		return -ENOMEM;
 
 	tp->pid = proc_pid(inode);
-	tp->ns = inode->i_sb->s_fs_info;
+	tp->ns = fs_info->pid_ns;
 	return 0;
 }
 
@@ -3133,13 +3136,13 @@ struct dentry *proc_pid_lookup(struct inode *dir, struct dentry * dentry, unsign
 	int result = -ENOENT;
 	struct task_struct *task;
 	unsigned tgid;
-	struct pid_namespace *ns;
+	struct proc_fs_info *fs_info = proc_sb(dir->i_sb);
+	struct pid_namespace *ns = fs_info->pid_ns;
 
 	tgid = name_to_int(&dentry->d_name);
 	if (tgid == ~0U)
 		goto out;
 
-	ns = dentry->d_sb->s_fs_info;
 	rcu_read_lock();
 	task = find_task_by_pid_ns(tgid, ns);
 	if (task)
@@ -3203,7 +3206,8 @@ static struct tgid_iter next_tgid(struct pid_namespace *ns, struct tgid_iter ite
 int proc_pid_readdir(struct file *file, struct dir_context *ctx)
 {
 	struct tgid_iter iter;
-	struct pid_namespace *ns = file_inode(file)->i_sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(file_inode(file)->i_sb);
+	struct pid_namespace *ns = fs_info->pid_ns;
 	loff_t pos = ctx->pos;
 
 	if (pos >= PID_MAX_LIMIT + TGID_OFFSET)
@@ -3432,7 +3436,8 @@ static struct dentry *proc_task_lookup(struct inode *dir, struct dentry * dentry
 	struct task_struct *task;
 	struct task_struct *leader = get_proc_task(dir);
 	unsigned tid;
-	struct pid_namespace *ns;
+	struct proc_fs_info *fs_info = proc_sb(dentry->d_sb);
+	struct pid_namespace *ns = fs_info->pid_ns;
 
 	if (!leader)
 		goto out_no_task;
@@ -3441,7 +3446,6 @@ static struct dentry *proc_task_lookup(struct inode *dir, struct dentry * dentry
 	if (tid == ~0U)
 		goto out;
 
-	ns = dentry->d_sb->s_fs_info;
 	rcu_read_lock();
 	task = find_task_by_pid_ns(tid, ns);
 	if (task)
@@ -3543,7 +3547,8 @@ static int proc_task_readdir(struct file *file, struct dir_context *ctx)
 {
 	struct inode *inode = file_inode(file);
 	struct task_struct *task;
-	struct pid_namespace *ns;
+	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
+	struct pid_namespace *ns = fs_info->pid_ns;
 	int tid;
 
 	if (proc_inode_is_dead(inode))
@@ -3555,7 +3560,6 @@ static int proc_task_readdir(struct file *file, struct dir_context *ctx)
 	/* f_version caches the tgid value that the last readdir call couldn't
 	 * return. lseek aka telldir automagically resets f_version to 0.
 	 */
-	ns = inode->i_sb->s_fs_info;
 	tid = (int)file->f_version;
 	file->f_version = 0;
 	for (task = first_tid(proc_pid(inode), tid, ctx->pos - 2, ns);
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index dd0f826..9abc370 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -104,7 +104,8 @@ void __init proc_init_inodecache(void)
 static int proc_show_options(struct seq_file *seq, struct dentry *root)
 {
 	struct super_block *sb = root->d_sb;
-	struct pid_namespace *pid = sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(sb);
+	struct pid_namespace *pid = fs_info->pid_ns;
 
 	if (!gid_eq(pid->pid_gid, GLOBAL_ROOT_GID))
 		seq_printf(seq, ",gid=%u", from_kgid_munged(&init_user_ns, pid->pid_gid));
@@ -474,7 +475,8 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
 
 int proc_fill_super(struct super_block *s, void *data, int silent)
 {
-	struct pid_namespace *ns = get_pid_ns(s->s_fs_info);
+	struct proc_fs_info *fs_info = proc_sb(s);
+	struct pid_namespace *ns = get_pid_ns(fs_info->pid_ns);
 	struct inode *root_inode;
 	int ret;
 
@@ -496,7 +498,7 @@ int proc_fill_super(struct super_block *s, void *data, int silent)
 	 * top of it
 	 */
 	s->s_stack_depth = FILESYSTEM_MAX_STACK_DEPTH;
-	
+
 	pde_get(&proc_root);
 	root_inode = proc_get_inode(s, &proc_root);
 	if (!root_inode) {
diff --git a/fs/proc/root.c b/fs/proc/root.c
index ede8e64..43e2639 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -16,6 +16,7 @@
 #include <linux/init.h>
 #include <linux/sched.h>
 #include <linux/sched/stat.h>
+#include <linux/slab.h>
 #include <linux/module.h>
 #include <linux/bitops.h>
 #include <linux/user_namespace.h>
@@ -80,16 +81,45 @@ int proc_parse_options(char *options, struct pid_namespace *pid)
 
 int proc_remount(struct super_block *sb, int *flags, char *data)
 {
-	struct pid_namespace *pid = sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(sb);
+	struct pid_namespace *pid = fs_info->pid_ns;
 
 	sync_filesystem(sb);
 	return !proc_parse_options(data, pid);
 }
 
+static int proc_test_super(struct super_block *s, void *data)
+{
+	struct proc_fs_info *p = data;
+	struct proc_fs_info *fs_info = proc_sb(s);
+
+	return p->pid_ns == fs_info->pid_ns;
+}
+
+static int proc_set_super(struct super_block *sb, void *data)
+{
+	sb->s_fs_info = data;
+	return set_anon_super(sb, NULL);
+}
+
 static struct dentry *proc_mount(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data)
 {
+	int error;
+	struct super_block *sb;
 	struct pid_namespace *ns;
+	struct proc_fs_info *fs_info;
+
+	/*
+	 * Don't allow mounting unless the caller has CAP_SYS_ADMIN over
+	 * the namespace.
+	 */
+	if (!(flags & SB_KERNMOUNT) && !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
+		return ERR_PTR(-EPERM);
+
+	fs_info = kzalloc(sizeof(*fs_info), GFP_NOFS);
+	if (!fs_info)
+		return ERR_PTR(-ENOMEM);
 
 	if (flags & SB_KERNMOUNT) {
 		ns = data;
@@ -98,20 +128,51 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 		ns = task_active_pid_ns(current);
 	}
 
-	return mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super);
+	fs_info->pid_ns = ns;
+
+	sb = sget_userns(fs_type, proc_test_super, proc_set_super, flags,
+			 ns->user_ns, fs_info);
+	if (IS_ERR(sb)) {
+		error = PTR_ERR(sb);
+		goto error_fs_info;
+	}
+
+	if (sb->s_root) {
+		kfree(fs_info);
+		if ((flags ^ sb->s_flags) & MS_RDONLY) {
+			error = -EBUSY;
+			goto error;
+		}
+	} else {
+		error = proc_fill_super(sb, data, flags & SB_SILENT ? 1 : 0);
+		if (error) {
+			deactivate_locked_super(sb);
+			goto error;
+		}
+
+		sb->s_flags |= SB_ACTIVE;
+	}
+
+	return dget(sb->s_root);
+
+error_fs_info:
+	kfree(fs_info);
+error:
+	return ERR_PTR(error);
 }
 
 static void proc_kill_sb(struct super_block *sb)
 {
-	struct pid_namespace *ns;
+	struct proc_fs_info *fs_info = proc_sb(sb);
+	struct pid_namespace *ns = fs_info->pid_ns;
 
-	ns = (struct pid_namespace *)sb->s_fs_info;
 	if (ns->proc_self)
 		dput(ns->proc_self);
 	if (ns->proc_thread_self)
 		dput(ns->proc_thread_self);
 	kill_anon_super(sb);
 	put_pid_ns(ns);
+	kfree(fs_info);
 }
 
 static struct file_system_type proc_fs_type = {
diff --git a/fs/proc/self.c b/fs/proc/self.c
index 31326bb..f773301 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -11,7 +11,8 @@ static const char *proc_self_get_link(struct dentry *dentry,
 				      struct inode *inode,
 				      struct delayed_call *done)
 {
-	struct pid_namespace *ns = inode->i_sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
+	struct pid_namespace *ns = fs_info->pid_ns;
 	pid_t tgid = task_tgid_nr_ns(current, ns);
 	char *name;
 
@@ -35,9 +36,10 @@ static unsigned self_inum;
 int proc_setup_self(struct super_block *s)
 {
 	struct inode *root_inode = d_inode(s->s_root);
-	struct pid_namespace *ns = s->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(s);
+	struct pid_namespace *ns = fs_info->pid_ns;
 	struct dentry *self;
-	
+
 	inode_lock(root_inode);
 	self = d_alloc_name(s->s_root, "self");
 	if (self) {
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index b813e3b..578887b 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -11,7 +11,8 @@ static const char *proc_thread_self_get_link(struct dentry *dentry,
 					     struct inode *inode,
 					     struct delayed_call *done)
 {
-	struct pid_namespace *ns = inode->i_sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
+	struct pid_namespace *ns = fs_info->pid_ns;
 	pid_t tgid = task_tgid_nr_ns(current, ns);
 	pid_t pid = task_pid_nr_ns(current, ns);
 	char *name;
@@ -35,8 +36,9 @@ static unsigned thread_self_inum;
 
 int proc_setup_thread_self(struct super_block *s)
 {
+	struct proc_fs_info *fs_info = proc_sb(s);
+	struct pid_namespace *ns = fs_info->pid_ns;
 	struct inode *root_inode = d_inode(s->s_root);
-	struct pid_namespace *ns = s->s_fs_info;
 	struct dentry *thread_self;
 
 	inode_lock(root_inode);
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index b786840..5389f43 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -37,23 +37,23 @@ static unsigned mounts_poll(struct file *file, poll_table *wait)
 	return res;
 }
 
-struct proc_fs_info {
+struct proc_fs_opts {
 	int flag;
 	const char *str;
 };
 
 static int show_sb_opts(struct seq_file *m, struct super_block *sb)
 {
-	static const struct proc_fs_info fs_info[] = {
+	static const struct proc_fs_opts fs_opts[] = {
 		{ SB_SYNCHRONOUS, ",sync" },
 		{ SB_DIRSYNC, ",dirsync" },
 		{ SB_MANDLOCK, ",mand" },
 		{ SB_LAZYTIME, ",lazytime" },
 		{ 0, NULL }
 	};
-	const struct proc_fs_info *fs_infop;
+	const struct proc_fs_opts *fs_infop;
 
-	for (fs_infop = fs_info; fs_infop->flag; fs_infop++) {
+	for (fs_infop = fs_opts; fs_infop->flag; fs_infop++) {
 		if (sb->s_flags & fs_infop->flag)
 			seq_puts(m, fs_infop->str);
 	}
@@ -63,7 +63,7 @@ static int show_sb_opts(struct seq_file *m, struct super_block *sb)
 
 static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt)
 {
-	static const struct proc_fs_info mnt_info[] = {
+	static const struct proc_fs_opts mnt_opts[] = {
 		{ MNT_NOSUID, ",nosuid" },
 		{ MNT_NODEV, ",nodev" },
 		{ MNT_NOEXEC, ",noexec" },
@@ -72,9 +72,9 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt)
 		{ MNT_RELATIME, ",relatime" },
 		{ 0, NULL }
 	};
-	const struct proc_fs_info *fs_infop;
+	const struct proc_fs_opts *fs_infop;
 
-	for (fs_infop = mnt_info; fs_infop->flag; fs_infop++) {
+	for (fs_infop = mnt_opts; fs_infop->flag; fs_infop++) {
 		if (mnt->mnt_flags & fs_infop->flag)
 			seq_puts(m, fs_infop->str);
 	}
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 928ef9e..9a3f6e9 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -8,10 +8,19 @@
 #include <linux/types.h>
 #include <linux/fs.h>
 
+struct proc_fs_info {
+	struct pid_namespace *pid_ns;
+};
+
 struct proc_dir_entry;
 
 #ifdef CONFIG_PROC_FS
 
+static inline struct proc_fs_info *proc_sb(struct super_block *sb)
+{
+	return sb->s_fs_info;
+}
+
 extern void proc_root_init(void);
 extern void proc_flush_task(struct task_struct *);
 
@@ -48,6 +57,7 @@ static inline void proc_flush_task(struct task_struct *task)
 {
 }
 
+static inline struct proc_fs_info *proc_sb(struct super_block *sb) { return NULL;}
 static inline struct proc_dir_entry *proc_symlink(const char *name,
 		struct proc_dir_entry *parent,const char *dest) { return NULL;}
 static inline struct proc_dir_entry *proc_mkdir(const char *name,
-- 
2.7.4
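
The lookup-or-create decision in the new proc_mount() can be modelled in
userspace as below. This is a simplified sketch, not kernel code:
sget_model() and proc_mount_model() are hypothetical stand-ins for
sget_userns() and proc_mount(), and locking, reference counting, the user
namespace check and the read-only flag comparison are all left out.

```c
#include <assert.h>
#include <stdlib.h>

struct pid_namespace { int id; };
struct proc_fs_info { struct pid_namespace *pid_ns; };
struct super_block {
	struct proc_fs_info *s_fs_info;
	int has_root;			/* models sb->s_root != NULL */
	struct super_block *next;
};

static struct super_block *supers;	/* models the list of existing superblocks */

/* Same comparison as proc_test_super(): match on the pid namespace. */
static int proc_test_super(struct super_block *s, void *data)
{
	struct proc_fs_info *p = data;

	return p->pid_ns == s->s_fs_info->pid_ns;
}

/* Return a matching superblock, or create one that takes ownership of data. */
static struct super_block *sget_model(int (*test)(struct super_block *, void *),
				      void *data)
{
	struct super_block *s;

	for (s = supers; s; s = s->next)
		if (test(s, data))
			return s;

	s = calloc(1, sizeof(*s));
	if (!s)
		return NULL;
	s->s_fs_info = data;		/* what proc_set_super() does */
	s->next = supers;
	supers = s;
	return s;
}

static struct super_block *proc_mount_model(struct pid_namespace *ns)
{
	struct proc_fs_info *fs_info = calloc(1, sizeof(*fs_info));
	struct super_block *sb;

	if (!fs_info)
		return NULL;
	fs_info->pid_ns = ns;

	sb = sget_model(proc_test_super, fs_info);
	if (!sb) {
		free(fs_info);
		return NULL;
	}
	if (sb->has_root)
		free(fs_info);		/* reused: the old superblock keeps its own fs_info */
	else
		sb->has_root = 1;	/* stands in for proc_fill_super() */
	return sb;
}
```

In this model, two mounts in the same pid namespace still end up on one
superblock, which matches the pid_ns comparison in proc_test_super() at
this point in the series. Teardown is omitted for brevity.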

@@ -72,9 +72,9 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt)
 		{ MNT_RELATIME, ",relatime" },
 		{ 0, NULL }
 	};
-	const struct proc_fs_info *fs_infop;
+	const struct proc_fs_opts *fs_infop;
 
-	for (fs_infop = mnt_info; fs_infop->flag; fs_infop++) {
+	for (fs_infop = mnt_opts; fs_infop->flag; fs_infop++) {
 		if (mnt->mnt_flags & fs_infop->flag)
 			seq_puts(m, fs_infop->str);
 	}
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 928ef9e..9a3f6e9 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -8,10 +8,19 @@
 #include <linux/types.h>
 #include <linux/fs.h>
 
+struct proc_fs_info {
+	struct pid_namespace *pid_ns;
+};
+
 struct proc_dir_entry;
 
 #ifdef CONFIG_PROC_FS
 
+static inline struct proc_fs_info *proc_sb(struct super_block *sb)
+{
+	return sb->s_fs_info;
+}
+
 extern void proc_root_init(void);
 extern void proc_flush_task(struct task_struct *);
 
@@ -48,6 +57,7 @@ static inline void proc_flush_task(struct task_struct *task)
 {
 }
 
+static inline struct proc_fs_info *proc_sb(struct super_block *sb) { return NULL;}
 static inline struct proc_dir_entry *proc_symlink(const char *name,
 		struct proc_dir_entry *parent,const char *dest) { return NULL;}
 static inline struct proc_dir_entry *proc_mkdir(const char *name,
-- 
2.7.4


* [kernel-hardening] [PATCH RFC v3 1/7] proc: add proc_fs_info struct to store proc information
@ 2017-11-09 16:14   ` Djalal Harouni
  0 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-09 16:14 UTC (permalink / raw)
  To: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	linux-fsdevel, linux-kernel, kernel-hardening,
	linux-security-module, linux-api
  Cc: Greg Kroah-Hartman, Alexander Viro, Akinobu Mita, me,
	Oleg Nesterov, Jeff Layton, Ingo Molnar, Alexey Dobriyan,
	ebiederm, Linus Torvalds, Daniel Micay, Jonathan Corbet, bfields,
	Stephen Rothwell, solar, Djalal Harouni

This is a preparation patch that adds proc_fs_info to hold procfs
internal information. Right now procfs internal information is stored
inside the pid namespace, which makes it hard to change or modernize
procfs without affecting pid namespaces. Furthermore, this is blocking
all kinds of changes that are needed to solve today's or future Linux
challenges, as noted by various maintainers and userspace needs:

"Here's another one: split up and modernize /proc." by Andy Lutomirski [1]

Discussion about kernel pointer leaks:
"And yes, as Kees and Daniel mentioned, it's definitely not just dmesg.
In fact, the primary things tend to be /proc and /sys, not dmesg
itself." By Linus Torvalds [2]

procfs is an important Linux API that offers features using filesystem
syscalls, hence let's handle it as a real filesystem with its own
private information, and avoid mixing it with PID namespaces, since it is
more than a PID namespace after all. This will later allow supporting
separate instances, each one with its own superblock, which will solve
a lot of problems.

Other Linux interfaces like devpts were also updated to support
containers, sandboxes and multiple private instances [3]. Time to update
procfs also.

Patch changes:
* Adds proc_fs_info struct to store procfs mount information.
* Updates proc_mount() to handle mounts directly instead of calling mount_ns().
* Updates all callers that now need to go through the proc_fs_info struct.
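
For readers who want the shape of the change without reading the whole
diff, here is a minimal userspace sketch of the new indirection. The
struct definitions are stripped-down stand-ins, not the kernel ones;
sb_pid_ns() and same_pidns() are illustrative names that do not exist
in the patch (the latter only mimics the comparison proc_test_super()
performs).

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct pid_namespace {
	int level;
};

struct proc_fs_info {
	struct pid_namespace *pid_ns;
};

struct super_block {
	void *s_fs_info;	/* filesystem-private data */
};

/* Same shape as the proc_sb() helper added by this patch. */
static inline struct proc_fs_info *proc_sb(struct super_block *sb)
{
	return sb->s_fs_info;
}

/*
 * Hypothetical helper: before the patch, s_fs_info was the pid namespace
 * itself; after it, callers take one extra hop through proc_fs_info.
 */
static struct pid_namespace *sb_pid_ns(struct super_block *sb)
{
	struct proc_fs_info *fs_info = proc_sb(sb);

	assert(fs_info != NULL);
	return fs_info->pid_ns;
}

/* Hypothetical helper mimicking proc_test_super(): match on the pid ns. */
static int same_pidns(struct super_block *sb, struct proc_fs_info *wanted)
{
	return proc_sb(sb)->pid_ns == wanted->pid_ns;
}
```

With this patch alone, two mounts in the same pid namespace still
compare equal and share one superblock; the indirection only creates
room for per-mount state in later patches.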

[1] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-January/004215.html
[2] http://www.openwall.com/lists/kernel-hardening/2017/10/05/5
[3] http://lxr.free-electrons.com/source/Documentation/filesystems/devpts.txt?v=3.14

Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 fs/locks.c              |  6 +++--
 fs/proc/base.c          | 30 +++++++++++----------
 fs/proc/inode.c         |  8 +++---
 fs/proc/root.c          | 69 ++++++++++++++++++++++++++++++++++++++++++++++---
 fs/proc/self.c          |  8 +++---
 fs/proc/thread_self.c   |  6 +++--
 fs/proc_namespace.c     | 14 +++++-----
 include/linux/proc_fs.h | 10 +++++++
 8 files changed, 117 insertions(+), 34 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 21b4dfa..6d5c473 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2624,7 +2624,8 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
 {
 	struct inode *inode = NULL;
 	unsigned int fl_pid;
-	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(file_inode(f->file)->i_sb);
+	struct pid_namespace *proc_pidns = fs_info->pid_ns;
 
 	fl_pid = locks_translate_pid(fl, proc_pidns);
 	/*
@@ -2704,7 +2705,8 @@ static int locks_show(struct seq_file *f, void *v)
 {
 	struct locks_iterator *iter = f->private;
 	struct file_lock *fl, *bfl;
-	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(file_inode(f->file)->i_sb);
+	struct pid_namespace *proc_pidns = fs_info->pid_ns;
 
 	fl = hlist_entry(v, struct file_lock, fl_link);
 
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 31934cb..5fc2006 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -696,7 +696,8 @@ static bool has_pid_permissions(struct pid_namespace *pid,
 
 static int proc_pid_permission(struct inode *inode, int mask)
 {
-	struct pid_namespace *pid = inode->i_sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
+	struct pid_namespace *pid = fs_info->pid_ns;
 	struct task_struct *task;
 	bool has_perms;
 
@@ -731,12 +732,12 @@ static const struct inode_operations proc_def_inode_operations = {
 static int proc_single_show(struct seq_file *m, void *v)
 {
 	struct inode *inode = m->private;
-	struct pid_namespace *ns;
 	struct pid *pid;
 	struct task_struct *task;
 	int ret;
 
-	ns = inode->i_sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
+	struct pid_namespace *ns = fs_info->pid_ns;
 	pid = proc_pid(inode);
 	task = get_pid_task(pid, PIDTYPE_PID);
 	if (!task)
@@ -1774,9 +1775,10 @@ struct inode *proc_pid_make_inode(struct super_block * sb,
 int pid_getattr(const struct path *path, struct kstat *stat,
 		u32 request_mask, unsigned int query_flags)
 {
-	struct inode *inode = d_inode(path->dentry);
 	struct task_struct *task;
-	struct pid_namespace *pid = path->dentry->d_sb->s_fs_info;
+	struct inode *inode = d_inode(path->dentry);
+	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
+	struct pid_namespace *pid = fs_info->pid_ns;
 
 	generic_fillattr(inode, stat);
 
@@ -2291,6 +2293,7 @@ static const struct seq_operations proc_timers_seq_ops = {
 static int proc_timers_open(struct inode *inode, struct file *file)
 {
 	struct timers_private *tp;
+	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
 
 	tp = __seq_open_private(file, &proc_timers_seq_ops,
 			sizeof(struct timers_private));
@@ -2298,7 +2301,7 @@ static int proc_timers_open(struct inode *inode, struct file *file)
 		return -ENOMEM;
 
 	tp->pid = proc_pid(inode);
-	tp->ns = inode->i_sb->s_fs_info;
+	tp->ns = fs_info->pid_ns;
 	return 0;
 }
 
@@ -3133,13 +3136,13 @@ struct dentry *proc_pid_lookup(struct inode *dir, struct dentry * dentry, unsign
 	int result = -ENOENT;
 	struct task_struct *task;
 	unsigned tgid;
-	struct pid_namespace *ns;
+	struct proc_fs_info *fs_info = proc_sb(dir->i_sb);
+	struct pid_namespace *ns = fs_info->pid_ns;
 
 	tgid = name_to_int(&dentry->d_name);
 	if (tgid == ~0U)
 		goto out;
 
-	ns = dentry->d_sb->s_fs_info;
 	rcu_read_lock();
 	task = find_task_by_pid_ns(tgid, ns);
 	if (task)
@@ -3203,7 +3206,8 @@ static struct tgid_iter next_tgid(struct pid_namespace *ns, struct tgid_iter ite
 int proc_pid_readdir(struct file *file, struct dir_context *ctx)
 {
 	struct tgid_iter iter;
-	struct pid_namespace *ns = file_inode(file)->i_sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(file_inode(file)->i_sb);
+	struct pid_namespace *ns = fs_info->pid_ns;
 	loff_t pos = ctx->pos;
 
 	if (pos >= PID_MAX_LIMIT + TGID_OFFSET)
@@ -3432,7 +3436,8 @@ static struct dentry *proc_task_lookup(struct inode *dir, struct dentry * dentry
 	struct task_struct *task;
 	struct task_struct *leader = get_proc_task(dir);
 	unsigned tid;
-	struct pid_namespace *ns;
+	struct proc_fs_info *fs_info = proc_sb(dentry->d_sb);
+	struct pid_namespace *ns = fs_info->pid_ns;
 
 	if (!leader)
 		goto out_no_task;
@@ -3441,7 +3446,6 @@ static struct dentry *proc_task_lookup(struct inode *dir, struct dentry * dentry
 	if (tid == ~0U)
 		goto out;
 
-	ns = dentry->d_sb->s_fs_info;
 	rcu_read_lock();
 	task = find_task_by_pid_ns(tid, ns);
 	if (task)
@@ -3543,7 +3547,8 @@ static int proc_task_readdir(struct file *file, struct dir_context *ctx)
 {
 	struct inode *inode = file_inode(file);
 	struct task_struct *task;
-	struct pid_namespace *ns;
+	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
+	struct pid_namespace *ns = fs_info->pid_ns;
 	int tid;
 
 	if (proc_inode_is_dead(inode))
@@ -3555,7 +3560,6 @@ static int proc_task_readdir(struct file *file, struct dir_context *ctx)
 	/* f_version caches the tgid value that the last readdir call couldn't
 	 * return. lseek aka telldir automagically resets f_version to 0.
 	 */
-	ns = inode->i_sb->s_fs_info;
 	tid = (int)file->f_version;
 	file->f_version = 0;
 	for (task = first_tid(proc_pid(inode), tid, ctx->pos - 2, ns);
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index dd0f826..9abc370 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -104,7 +104,8 @@ void __init proc_init_inodecache(void)
 static int proc_show_options(struct seq_file *seq, struct dentry *root)
 {
 	struct super_block *sb = root->d_sb;
-	struct pid_namespace *pid = sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(sb);
+	struct pid_namespace *pid = fs_info->pid_ns;
 
 	if (!gid_eq(pid->pid_gid, GLOBAL_ROOT_GID))
 		seq_printf(seq, ",gid=%u", from_kgid_munged(&init_user_ns, pid->pid_gid));
@@ -474,7 +475,8 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
 
 int proc_fill_super(struct super_block *s, void *data, int silent)
 {
-	struct pid_namespace *ns = get_pid_ns(s->s_fs_info);
+	struct proc_fs_info *fs_info = proc_sb(s);
+	struct pid_namespace *ns = get_pid_ns(fs_info->pid_ns);
 	struct inode *root_inode;
 	int ret;
 
@@ -496,7 +498,7 @@ int proc_fill_super(struct super_block *s, void *data, int silent)
 	 * top of it
 	 */
 	s->s_stack_depth = FILESYSTEM_MAX_STACK_DEPTH;
-	
+
 	pde_get(&proc_root);
 	root_inode = proc_get_inode(s, &proc_root);
 	if (!root_inode) {
diff --git a/fs/proc/root.c b/fs/proc/root.c
index ede8e64..43e2639 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -16,6 +16,7 @@
 #include <linux/init.h>
 #include <linux/sched.h>
 #include <linux/sched/stat.h>
+#include <linux/slab.h>
 #include <linux/module.h>
 #include <linux/bitops.h>
 #include <linux/user_namespace.h>
@@ -80,16 +81,45 @@ int proc_parse_options(char *options, struct pid_namespace *pid)
 
 int proc_remount(struct super_block *sb, int *flags, char *data)
 {
-	struct pid_namespace *pid = sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(sb);
+	struct pid_namespace *pid = fs_info->pid_ns;
 
 	sync_filesystem(sb);
 	return !proc_parse_options(data, pid);
 }
 
+static int proc_test_super(struct super_block *s, void *data)
+{
+	struct proc_fs_info *p = data;
+	struct proc_fs_info *fs_info = proc_sb(s);
+
+	return p->pid_ns == fs_info->pid_ns;
+}
+
+static int proc_set_super(struct super_block *sb, void *data)
+{
+	sb->s_fs_info = data;
+	return set_anon_super(sb, NULL);
+}
+
 static struct dentry *proc_mount(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data)
 {
+	int error;
+	struct super_block *sb;
 	struct pid_namespace *ns;
+	struct proc_fs_info *fs_info;
+
+	/*
+	 * Don't allow mounting unless the caller has CAP_SYS_ADMIN over
+	 * the namespace.
+	 */
+	if (!(flags & SB_KERNMOUNT) && !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
+		return ERR_PTR(-EPERM);
+
+	fs_info = kzalloc(sizeof(*fs_info), GFP_NOFS);
+	if (!fs_info)
+		return ERR_PTR(-ENOMEM);
 
 	if (flags & SB_KERNMOUNT) {
 		ns = data;
@@ -98,20 +128,51 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 		ns = task_active_pid_ns(current);
 	}
 
-	return mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super);
+	fs_info->pid_ns = ns;
+
+	sb = sget_userns(fs_type, proc_test_super, proc_set_super, flags,
+			 ns->user_ns, fs_info);
+	if (IS_ERR(sb)) {
+		error = PTR_ERR(sb);
+		goto error_fs_info;
+	}
+
+	if (sb->s_root) {
+		kfree(fs_info);
+		if ((flags ^ sb->s_flags) & SB_RDONLY) {
+			error = -EBUSY;
+			goto error;
+		}
+	} else {
+		error = proc_fill_super(sb, data, flags & SB_SILENT ? 1 : 0);
+		if (error) {
+			deactivate_locked_super(sb);
+			goto error;
+		}
+
+		sb->s_flags |= SB_ACTIVE;
+	}
+
+	return dget(sb->s_root);
+
+error_fs_info:
+	kfree(fs_info);
+error:
+	return ERR_PTR(error);
 }
 
 static void proc_kill_sb(struct super_block *sb)
 {
-	struct pid_namespace *ns;
+	struct proc_fs_info *fs_info = proc_sb(sb);
+	struct pid_namespace *ns = fs_info->pid_ns;
 
-	ns = (struct pid_namespace *)sb->s_fs_info;
 	if (ns->proc_self)
 		dput(ns->proc_self);
 	if (ns->proc_thread_self)
 		dput(ns->proc_thread_self);
 	kill_anon_super(sb);
 	put_pid_ns(ns);
+	kfree(fs_info);
 }
 
 static struct file_system_type proc_fs_type = {
diff --git a/fs/proc/self.c b/fs/proc/self.c
index 31326bb..f773301 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -11,7 +11,8 @@ static const char *proc_self_get_link(struct dentry *dentry,
 				      struct inode *inode,
 				      struct delayed_call *done)
 {
-	struct pid_namespace *ns = inode->i_sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
+	struct pid_namespace *ns = fs_info->pid_ns;
 	pid_t tgid = task_tgid_nr_ns(current, ns);
 	char *name;
 
@@ -35,9 +36,10 @@ static unsigned self_inum;
 int proc_setup_self(struct super_block *s)
 {
 	struct inode *root_inode = d_inode(s->s_root);
-	struct pid_namespace *ns = s->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(s);
+	struct pid_namespace *ns = fs_info->pid_ns;
 	struct dentry *self;
-	
+
 	inode_lock(root_inode);
 	self = d_alloc_name(s->s_root, "self");
 	if (self) {
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index b813e3b..578887b 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -11,7 +11,8 @@ static const char *proc_thread_self_get_link(struct dentry *dentry,
 					     struct inode *inode,
 					     struct delayed_call *done)
 {
-	struct pid_namespace *ns = inode->i_sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
+	struct pid_namespace *ns = fs_info->pid_ns;
 	pid_t tgid = task_tgid_nr_ns(current, ns);
 	pid_t pid = task_pid_nr_ns(current, ns);
 	char *name;
@@ -35,8 +36,9 @@ static unsigned thread_self_inum;
 
 int proc_setup_thread_self(struct super_block *s)
 {
+	struct proc_fs_info *fs_info = proc_sb(s);
+	struct pid_namespace *ns = fs_info->pid_ns;
 	struct inode *root_inode = d_inode(s->s_root);
-	struct pid_namespace *ns = s->s_fs_info;
 	struct dentry *thread_self;
 
 	inode_lock(root_inode);
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index b786840..5389f43 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -37,23 +37,23 @@ static unsigned mounts_poll(struct file *file, poll_table *wait)
 	return res;
 }
 
-struct proc_fs_info {
+struct proc_fs_opts {
 	int flag;
 	const char *str;
 };
 
 static int show_sb_opts(struct seq_file *m, struct super_block *sb)
 {
-	static const struct proc_fs_info fs_info[] = {
+	static const struct proc_fs_opts fs_opts[] = {
 		{ SB_SYNCHRONOUS, ",sync" },
 		{ SB_DIRSYNC, ",dirsync" },
 		{ SB_MANDLOCK, ",mand" },
 		{ SB_LAZYTIME, ",lazytime" },
 		{ 0, NULL }
 	};
-	const struct proc_fs_info *fs_infop;
+	const struct proc_fs_opts *fs_infop;
 
-	for (fs_infop = fs_info; fs_infop->flag; fs_infop++) {
+	for (fs_infop = fs_opts; fs_infop->flag; fs_infop++) {
 		if (sb->s_flags & fs_infop->flag)
 			seq_puts(m, fs_infop->str);
 	}
@@ -63,7 +63,7 @@ static int show_sb_opts(struct seq_file *m, struct super_block *sb)
 
 static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt)
 {
-	static const struct proc_fs_info mnt_info[] = {
+	static const struct proc_fs_opts mnt_opts[] = {
 		{ MNT_NOSUID, ",nosuid" },
 		{ MNT_NODEV, ",nodev" },
 		{ MNT_NOEXEC, ",noexec" },
@@ -72,9 +72,9 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt)
 		{ MNT_RELATIME, ",relatime" },
 		{ 0, NULL }
 	};
-	const struct proc_fs_info *fs_infop;
+	const struct proc_fs_opts *fs_infop;
 
-	for (fs_infop = mnt_info; fs_infop->flag; fs_infop++) {
+	for (fs_infop = mnt_opts; fs_infop->flag; fs_infop++) {
 		if (mnt->mnt_flags & fs_infop->flag)
 			seq_puts(m, fs_infop->str);
 	}
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 928ef9e..9a3f6e9 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -8,10 +8,19 @@
 #include <linux/types.h>
 #include <linux/fs.h>
 
+struct proc_fs_info {
+	struct pid_namespace *pid_ns;
+};
+
 struct proc_dir_entry;
 
 #ifdef CONFIG_PROC_FS
 
+static inline struct proc_fs_info *proc_sb(struct super_block *sb)
+{
+	return sb->s_fs_info;
+}
+
 extern void proc_root_init(void);
 extern void proc_flush_task(struct task_struct *);
 
@@ -48,6 +57,7 @@ static inline void proc_flush_task(struct task_struct *task)
 {
 }
 
+static inline struct proc_fs_info *proc_sb(struct super_block *sb) { return NULL;}
 static inline struct proc_dir_entry *proc_symlink(const char *name,
 		struct proc_dir_entry *parent,const char *dest) { return NULL;}
 static inline struct proc_dir_entry *proc_mkdir(const char *name,
-- 
2.7.4


* [PATCH RFC v3 2/7] proc: move /proc/{self|thread-self} dentries to proc_fs_info
  2017-11-09 16:13 ` Djalal Harouni
  (?)
@ 2017-11-09 16:14   ` Djalal Harouni
  -1 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-09 16:14 UTC (permalink / raw)
  To: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	linux-fsdevel, linux-kernel, kernel-hardening,
	linux-security-module, linux-api
  Cc: Greg Kroah-Hartman, Alexander Viro, Akinobu Mita, me,
	Oleg Nesterov, Jeff Layton, Ingo Molnar, Alexey Dobriyan,
	ebiederm, Linus Torvalds, Daniel Micay, Jonathan Corbet, bfields,
	Stephen Rothwell, solar, Djalal Harouni

This is a preparation patch that moves the /proc/{self|thread-self}
dentries into the procfs proc_fs_info struct instead of keeping them
per PID namespace. Since we want to support multiple procfs instances,
these dentries must be per-superblock rather than per-pidns, so that
unmounting a private procfs won't clash with other procfs mounts.
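
To illustrate why the dentries have to follow the superblock, here is a
small userspace model. It is only a sketch under simplifying
assumptions: struct dentry is reduced to a reference counter,
dput_model() and proc_release_self() are made-up names, and the real
proc_kill_sb() frees fs_info instead of clearing the pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel types; a dentry is just a refcount here. */
struct dentry {
	int refcount;
};

struct pid_namespace {
	int level;
};

struct proc_fs_info {
	struct pid_namespace *pid_ns;
	struct dentry *proc_self;		/* For /proc/self/ */
	struct dentry *proc_thread_self;	/* For /proc/thread-self/ */
};

/* Hypothetical model of dput(): drop one reference, tolerate NULL. */
static void dput_model(struct dentry *d)
{
	if (d) {
		assert(d->refcount > 0);
		d->refcount--;
	}
}

/*
 * Hypothetical model of the dentry part of proc_kill_sb(): only the
 * references held by this instance are dropped. Other instances that
 * share the same pid namespace keep their own dentries.
 */
static void proc_release_self(struct proc_fs_info *fs_info)
{
	dput_model(fs_info->proc_self);
	dput_model(fs_info->proc_thread_self);
	fs_info->proc_self = NULL;
	fs_info->proc_thread_self = NULL;
}
```

If the pointers lived in struct pid_namespace instead, two instances
would share one pair of dentries, and tearing down either instance
would drop references the other one still relies on.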

Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 fs/proc/base.c                | 4 ++--
 fs/proc/root.c                | 8 ++++----
 fs/proc/self.c                | 3 +--
 fs/proc/thread_self.c         | 5 ++---
 include/linux/pid_namespace.h | 4 +---
 include/linux/proc_fs.h       | 2 ++
 6 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 5fc2006..0d9b4214 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3214,13 +3214,13 @@ int proc_pid_readdir(struct file *file, struct dir_context *ctx)
 		return 0;
 
 	if (pos == TGID_OFFSET - 2) {
-		struct inode *inode = d_inode(ns->proc_self);
+		struct inode *inode = d_inode(fs_info->proc_self);
 		if (!dir_emit(ctx, "self", 4, inode->i_ino, DT_LNK))
 			return 0;
 		ctx->pos = pos = pos + 1;
 	}
 	if (pos == TGID_OFFSET - 1) {
-		struct inode *inode = d_inode(ns->proc_thread_self);
+		struct inode *inode = d_inode(fs_info->proc_thread_self);
 		if (!dir_emit(ctx, "thread-self", 11, inode->i_ino, DT_LNK))
 			return 0;
 		ctx->pos = pos = pos + 1;
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 43e2639..b225ae5 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -166,10 +166,10 @@ static void proc_kill_sb(struct super_block *sb)
 	struct proc_fs_info *fs_info = proc_sb(sb);
 	struct pid_namespace *ns = fs_info->pid_ns;
 
-	if (ns->proc_self)
-		dput(ns->proc_self);
-	if (ns->proc_thread_self)
-		dput(ns->proc_thread_self);
+	if (fs_info->proc_self)
+		dput(fs_info->proc_self);
+	if (fs_info->proc_thread_self)
+		dput(fs_info->proc_thread_self);
 	kill_anon_super(sb);
 	put_pid_ns(ns);
 	kfree(fs_info);
diff --git a/fs/proc/self.c b/fs/proc/self.c
index f773301..8a67cf0 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -37,7 +37,6 @@ int proc_setup_self(struct super_block *s)
 {
 	struct inode *root_inode = d_inode(s->s_root);
 	struct proc_fs_info *fs_info = proc_sb(s);
-	struct pid_namespace *ns = fs_info->pid_ns;
 	struct dentry *self;
 
 	inode_lock(root_inode);
@@ -64,7 +63,7 @@ int proc_setup_self(struct super_block *s)
 		pr_err("proc_fill_super: can't allocate /proc/self\n");
 		return PTR_ERR(self);
 	}
-	ns->proc_self = self;
+	fs_info->proc_self = self;
 	return 0;
 }
 
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index 578887b..6e3225f 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -37,7 +37,6 @@ static unsigned thread_self_inum;
 int proc_setup_thread_self(struct super_block *s)
 {
 	struct proc_fs_info *fs_info = proc_sb(s);
-	struct pid_namespace *ns = fs_info->pid_ns;
 	struct inode *root_inode = d_inode(s->s_root);
 	struct dentry *thread_self;
 
@@ -62,10 +61,10 @@ int proc_setup_thread_self(struct super_block *s)
 	}
 	inode_unlock(root_inode);
 	if (IS_ERR(thread_self)) {
-		pr_err("proc_fill_super: can't allocate /proc/thread_self\n");
+		pr_err("proc_fill_super: can't allocate /proc/thread-self\n");
 		return PTR_ERR(thread_self);
 	}
-	ns->proc_thread_self = thread_self;
+	fs_info->proc_thread_self = thread_self;
 	return 0;
 }
 
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 49538b1..f91a8bf 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -31,9 +31,7 @@ struct pid_namespace {
 	unsigned int level;
 	struct pid_namespace *parent;
 #ifdef CONFIG_PROC_FS
-	struct vfsmount *proc_mnt;
-	struct dentry *proc_self;
-	struct dentry *proc_thread_self;
+	struct vfsmount *proc_mnt; /* Internal proc mounted during each new pidns */
 #endif
 #ifdef CONFIG_BSD_PROCESS_ACCT
 	struct fs_pin *bacct;
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 9a3f6e9..8f89069 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -10,6 +10,8 @@
 
 struct proc_fs_info {
 	struct pid_namespace *pid_ns;
+	struct dentry *proc_self; /* For /proc/self/ */
+	struct dentry *proc_thread_self; /* For /proc/thread-self/ */
 };
 
 struct proc_dir_entry;
-- 
2.7.4


* [kernel-hardening] [PATCH RFC v3 2/7] proc: move /proc/{self|thread-self} dentries to proc_fs_info
@ 2017-11-09 16:14   ` Djalal Harouni
  0 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-09 16:14 UTC (permalink / raw)
  To: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	linux-fsdevel, linux-kernel, kernel-hardening,
	linux-security-module, linux-api
  Cc: Greg Kroah-Hartman, Alexander Viro, Akinobu Mita, me,
	Oleg Nesterov, Jeff Layton, Ingo Molnar, Alexey Dobriyan,
	ebiederm, Linus Torvalds, Daniel Micay, Jonathan Corbet, bfields,
	Stephen Rothwell, solar, Djalal Harouni

This is a preparation patch that moves the /proc/{self|thread-self}
dentries into the procfs proc_fs_info struct instead of keeping them
per PID namespace. Since we want to support multiple procfs instances,
these dentries must be per-superblock rather than per-pidns, so that
unmounting a private procfs won't clash with other procfs mounts.

Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 fs/proc/base.c                | 4 ++--
 fs/proc/root.c                | 8 ++++----
 fs/proc/self.c                | 3 +--
 fs/proc/thread_self.c         | 5 ++---
 include/linux/pid_namespace.h | 4 +---
 include/linux/proc_fs.h       | 2 ++
 6 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 5fc2006..0d9b4214 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3214,13 +3214,13 @@ int proc_pid_readdir(struct file *file, struct dir_context *ctx)
 		return 0;
 
 	if (pos == TGID_OFFSET - 2) {
-		struct inode *inode = d_inode(ns->proc_self);
+		struct inode *inode = d_inode(fs_info->proc_self);
 		if (!dir_emit(ctx, "self", 4, inode->i_ino, DT_LNK))
 			return 0;
 		ctx->pos = pos = pos + 1;
 	}
 	if (pos == TGID_OFFSET - 1) {
-		struct inode *inode = d_inode(ns->proc_thread_self);
+		struct inode *inode = d_inode(fs_info->proc_thread_self);
 		if (!dir_emit(ctx, "thread-self", 11, inode->i_ino, DT_LNK))
 			return 0;
 		ctx->pos = pos = pos + 1;
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 43e2639..b225ae5 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -166,10 +166,10 @@ static void proc_kill_sb(struct super_block *sb)
 	struct proc_fs_info *fs_info = proc_sb(sb);
 	struct pid_namespace *ns = fs_info->pid_ns;
 
-	if (ns->proc_self)
-		dput(ns->proc_self);
-	if (ns->proc_thread_self)
-		dput(ns->proc_thread_self);
+	if (fs_info->proc_self)
+		dput(fs_info->proc_self);
+	if (fs_info->proc_thread_self)
+		dput(fs_info->proc_thread_self);
 	kill_anon_super(sb);
 	put_pid_ns(ns);
 	kfree(fs_info);
diff --git a/fs/proc/self.c b/fs/proc/self.c
index f773301..8a67cf0 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -37,7 +37,6 @@ int proc_setup_self(struct super_block *s)
 {
 	struct inode *root_inode = d_inode(s->s_root);
 	struct proc_fs_info *fs_info = proc_sb(s);
-	struct pid_namespace *ns = fs_info->pid_ns;
 	struct dentry *self;
 
 	inode_lock(root_inode);
@@ -64,7 +63,7 @@ int proc_setup_self(struct super_block *s)
 		pr_err("proc_fill_super: can't allocate /proc/self\n");
 		return PTR_ERR(self);
 	}
-	ns->proc_self = self;
+	fs_info->proc_self = self;
 	return 0;
 }
 
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index 578887b..6e3225f 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -37,7 +37,6 @@ static unsigned thread_self_inum;
 int proc_setup_thread_self(struct super_block *s)
 {
 	struct proc_fs_info *fs_info = proc_sb(s);
-	struct pid_namespace *ns = fs_info->pid_ns;
 	struct inode *root_inode = d_inode(s->s_root);
 	struct dentry *thread_self;
 
@@ -62,10 +61,10 @@ int proc_setup_thread_self(struct super_block *s)
 	}
 	inode_unlock(root_inode);
 	if (IS_ERR(thread_self)) {
-		pr_err("proc_fill_super: can't allocate /proc/thread_self\n");
+		pr_err("proc_fill_super: can't allocate /proc/thread-self\n");
 		return PTR_ERR(thread_self);
 	}
-	ns->proc_thread_self = thread_self;
+	fs_info->proc_thread_self = thread_self;
 	return 0;
 }
 
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 49538b1..f91a8bf 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -31,9 +31,7 @@ struct pid_namespace {
 	unsigned int level;
 	struct pid_namespace *parent;
 #ifdef CONFIG_PROC_FS
-	struct vfsmount *proc_mnt;
-	struct dentry *proc_self;
-	struct dentry *proc_thread_self;
+	struct vfsmount *proc_mnt; /* Internal proc mounted during each new pidns */
 #endif
 #ifdef CONFIG_BSD_PROCESS_ACCT
 	struct fs_pin *bacct;
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 9a3f6e9..8f89069 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -10,6 +10,8 @@
 
 struct proc_fs_info {
 	struct pid_namespace *pid_ns;
+	struct dentry *proc_self; /* For /proc/self/ */
+	struct dentry *proc_thread_self; /* For /proc/thread-self/ */
 };
 
 struct proc_dir_entry;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH RFC v3 3/7] proc: add helpers to set and get proc hidepid and gid mount options
  2017-11-09 16:13 ` Djalal Harouni
  (?)
@ 2017-11-09 16:14   ` Djalal Harouni
  -1 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-09 16:14 UTC (permalink / raw)
  To: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	linux-fsdevel, linux-kernel, kernel-hardening,
	linux-security-module, linux-api
  Cc: Greg Kroah-Hartman, Alexander Viro, Akinobu Mita, me,
	Oleg Nesterov, Jeff Layton, Ingo Molnar, Alexey Dobriyan,
	ebiederm, Linus Torvalds, Daniel Micay, Jonathan Corbet, bfields,
	Stephen Rothwell, solar, Djalal Harouni

This is a cleanup patch that adds helpers to set and get the proc mount
options instead of accessing them directly. This makes it easier to track
how these fields are accessed, and to update them in the future if
needed.

Later we will move these mount options into the proc_fs_info struct. First
let's fix the access.
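
To make the intermediate state concrete, here is a userspace sketch of the
accessor pattern (not kernel code; the structs are reduced stand-ins and
kgid_t is simplified to a wrapper around an unsigned int). At this point in
the series the helpers still store the options in the shared pid namespace:

```c
#include <assert.h>

/* Simplified stand-in for the kernel's kgid_t wrapper type. */
typedef struct { unsigned int val; } kgid_t;

/* Reduced stand-in: only the two mount options are modelled. */
struct pid_namespace {
	int hide_pid;
	kgid_t pid_gid;
};

struct proc_fs_info {
	struct pid_namespace *pid_ns;
};

/* Callers go through helpers, so the backing storage can move later. */
static inline void proc_fs_set_hide_pid(struct proc_fs_info *fs_info, int hide_pid)
{
	assert(fs_info && fs_info->pid_ns);
	fs_info->pid_ns->hide_pid = hide_pid;
}

static inline void proc_fs_set_pid_gid(struct proc_fs_info *fs_info, kgid_t gid)
{
	assert(fs_info && fs_info->pid_ns);
	fs_info->pid_ns->pid_gid = gid;
}

static inline int proc_fs_hide_pid(struct proc_fs_info *fs_info)
{
	return fs_info->pid_ns->hide_pid;
}

static inline kgid_t proc_fs_pid_gid(struct proc_fs_info *fs_info)
{
	return fs_info->pid_ns->pid_gid;
}
```

Two proc_fs_info objects that share one pid_namespace still observe each
other's option changes in this model. Once the fields move into
proc_fs_info, only the four helper bodies should need to change.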

Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 fs/proc/base.c          | 16 +++++++++-------
 fs/proc/inode.c         |  5 +++--
 fs/proc/internal.h      |  2 +-
 fs/proc/root.c          | 15 ++++++++++-----
 include/linux/proc_fs.h | 44 ++++++++++++++++++++++++++++++++++++++++++--
 5 files changed, 65 insertions(+), 17 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 0d9b4214..f324c49 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -682,13 +682,16 @@ int proc_setattr(struct dentry *dentry, struct iattr *attr)
  * May current process learn task's sched/cmdline info (for hide_pid_min=1)
  * or euid/egid (for hide_pid_min=2)?
  */
-static bool has_pid_permissions(struct pid_namespace *pid,
+static bool has_pid_permissions(struct proc_fs_info *fs_info,
 				 struct task_struct *task,
 				 int hide_pid_min)
 {
-	if (pid->hide_pid < hide_pid_min)
+	int hide_pid = proc_fs_hide_pid(fs_info);
+	kgid_t gid = proc_fs_pid_gid(fs_info);
+
+	if (hide_pid < hide_pid_min)
 		return true;
-	if (in_group_p(pid->pid_gid))
+	if (in_group_p(gid))
 		return true;
 	return ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
 }
@@ -704,7 +707,7 @@ static int proc_pid_permission(struct inode *inode, int mask)
 	task = get_proc_task(inode);
 	if (!task)
 		return -ESRCH;
-	has_perms = has_pid_permissions(pid, task, HIDEPID_NO_ACCESS);
+	has_perms = has_pid_permissions(fs_info, task, HIDEPID_NO_ACCESS);
 	put_task_struct(task);
 
 	if (!has_perms) {
@@ -1778,7 +1781,6 @@ int pid_getattr(const struct path *path, struct kstat *stat,
 	struct task_struct *task;
 	struct inode *inode = d_inode(path->dentry);
 	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
-	struct pid_namespace *pid = fs_info->pid_ns;
 
 	generic_fillattr(inode, stat);
 
@@ -1787,7 +1789,7 @@ int pid_getattr(const struct path *path, struct kstat *stat,
 	stat->gid = GLOBAL_ROOT_GID;
 	task = pid_task(proc_pid(inode), PIDTYPE_PID);
 	if (task) {
-		if (!has_pid_permissions(pid, task, HIDEPID_INVISIBLE)) {
+		if (!has_pid_permissions(fs_info, task, HIDEPID_INVISIBLE)) {
 			rcu_read_unlock();
 			/*
 			 * This doesn't prevent learning whether PID exists,
@@ -3234,7 +3236,7 @@ int proc_pid_readdir(struct file *file, struct dir_context *ctx)
 		int len;
 
 		cond_resched();
-		if (!has_pid_permissions(ns, iter.task, HIDEPID_INVISIBLE))
+		if (!has_pid_permissions(fs_info, iter.task, HIDEPID_INVISIBLE))
 			continue;
 
 		len = snprintf(name, sizeof(name), "%d", iter.tgid);
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 9abc370..bdd808d 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -476,11 +476,12 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
 int proc_fill_super(struct super_block *s, void *data, int silent)
 {
 	struct proc_fs_info *fs_info = proc_sb(s);
-	struct pid_namespace *ns = get_pid_ns(fs_info->pid_ns);
 	struct inode *root_inode;
 	int ret;
 
-	if (!proc_parse_options(data, ns))
+	get_pid_ns(fs_info->pid_ns);
+
+	if (!proc_parse_options(data, fs_info))
 		return -EINVAL;
 
 	/* User space would break if executables or devices appear on proc */
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 4a67188..10bc7be 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -240,7 +240,7 @@ static inline void proc_tty_init(void) {}
  * root.c
  */
 extern struct proc_dir_entry proc_root;
-extern int proc_parse_options(char *options, struct pid_namespace *pid);
+extern int proc_parse_options(char *options, struct proc_fs_info *fs_info);
 
 extern void proc_self_init(void);
 extern int proc_remount(struct super_block *, int *, char *);
diff --git a/fs/proc/root.c b/fs/proc/root.c
index b225ae5..48cc481 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -37,11 +37,12 @@ static const match_table_t tokens = {
 	{Opt_err, NULL},
 };
 
-int proc_parse_options(char *options, struct pid_namespace *pid)
+int proc_parse_options(char *options, struct proc_fs_info *fs_info)
 {
 	char *p;
 	substring_t args[MAX_OPT_ARGS];
 	int option;
+	kgid_t gid;
 
 	if (!options)
 		return 1;
@@ -57,7 +58,12 @@ int proc_parse_options(char *options, struct pid_namespace *pid)
 		case Opt_gid:
 			if (match_int(&args[0], &option))
 				return 0;
-			pid->pid_gid = make_kgid(current_user_ns(), option);
+			gid = make_kgid(current_user_ns(), option);
+			if (!gid_valid(gid)) {
+				pr_err("proc: invalid gid mount option.\n");
+				return 0;
+			}
+			proc_fs_set_pid_gid(fs_info, gid);
 			break;
 		case Opt_hidepid:
 			if (match_int(&args[0], &option))
@@ -67,7 +73,7 @@ int proc_parse_options(char *options, struct pid_namespace *pid)
 				pr_err("proc: hidepid value must be between 0 and 2.\n");
 				return 0;
 			}
-			pid->hide_pid = option;
+			proc_fs_set_hide_pid(fs_info, option);
 			break;
 		default:
 			pr_err("proc: unrecognized mount option \"%s\" "
@@ -82,10 +88,9 @@ int proc_parse_options(char *options, struct pid_namespace *pid)
 int proc_remount(struct super_block *sb, int *flags, char *data)
 {
 	struct proc_fs_info *fs_info = proc_sb(sb);
-	struct pid_namespace *pid = fs_info->pid_ns;
 
 	sync_filesystem(sb);
-	return !proc_parse_options(data, pid);
+	return !proc_parse_options(data, fs_info);
 }
 
 static int proc_test_super(struct super_block *s, void *data)
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 8f89069..047d0d0 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -7,6 +7,10 @@
 
 #include <linux/types.h>
 #include <linux/fs.h>
+#include <linux/pid_namespace.h>
+
+struct proc_dir_entry;
+struct pid_namespace;
 
 struct proc_fs_info {
 	struct pid_namespace *pid_ns;
@@ -14,8 +18,6 @@ struct proc_fs_info {
 	struct dentry *proc_thread_self; /* For /proc/thread-self/ */
 };
 
-struct proc_dir_entry;
-
 #ifdef CONFIG_PROC_FS
 
 static inline struct proc_fs_info *proc_sb(struct super_block *sb)
@@ -23,6 +25,26 @@ static inline struct proc_fs_info *proc_sb(struct super_block *sb)
 	return sb->s_fs_info;
 }
 
+static inline void proc_fs_set_hide_pid(struct proc_fs_info *fs_info, int hide_pid)
+{
+	fs_info->pid_ns->hide_pid = hide_pid;
+}
+
+static inline void proc_fs_set_pid_gid(struct proc_fs_info *fs_info, kgid_t gid)
+{
+	fs_info->pid_ns->pid_gid = gid;
+}
+
+static inline int proc_fs_hide_pid(struct proc_fs_info *fs_info)
+{
+	return fs_info->pid_ns->hide_pid;
+}
+
+static inline kgid_t proc_fs_pid_gid(struct proc_fs_info *fs_info)
+{
+	return fs_info->pid_ns->pid_gid;
+}
+
 extern void proc_root_init(void);
 extern void proc_flush_task(struct task_struct *);
 
@@ -59,6 +81,24 @@ static inline void proc_flush_task(struct task_struct *task)
 {
 }
 
+static inline void proc_fs_set_hide_pid(struct proc_fs_info *fs_info, int hide_pid)
+{
+}
+
+static inline void proc_fs_set_pid_gid(struct proc_fs_info *fs_info, kgid_t gid)
+{
+}
+
+static inline int proc_fs_hide_pid(struct proc_fs_info *fs_info)
+{
+	return 0;
+}
+
+static inline kgid_t proc_fs_pid_gid(struct proc_fs_info *fs_info)
+{
+	return GLOBAL_ROOT_GID;
+}
+
 extern inline struct proc_fs_info *proc_sb(struct super_block *sb) { return NULL;}
 static inline struct proc_dir_entry *proc_symlink(const char *name,
 		struct proc_dir_entry *parent,const char *dest) { return NULL;}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH RFC v3 4/7] proc: support mounting private procfs instances inside same pid namespace
  2017-11-09 16:13 ` Djalal Harouni
  (?)
@ 2017-11-09 16:14   ` Djalal Harouni
  -1 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-09 16:14 UTC (permalink / raw)
  To: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	linux-fsdevel, linux-kernel, kernel-hardening,
	linux-security-module, linux-api
  Cc: Greg Kroah-Hartman, Alexander Viro, Akinobu Mita, me,
	Oleg Nesterov, Jeff Layton, Ingo Molnar, Alexey Dobriyan,
	ebiederm, Linus Torvalds, Daniel Micay, Jonathan Corbet, bfields,
	Stephen Rothwell, solar, Djalal Harouni

This patch allows multiple private procfs instances inside the same
pid namespace. In some cases procfs is more of a burden than an API: it
blocks a lot of other features, and from time to time we have to work
around procfs to implement new features either in the kernel or in
userspace.

Discussion about security-related TODO items:
"Here's another one: split up and modernize /proc." By Andy Lutomirski [1]

Discussion about kernel pointer leaks:
"And yes, as Kees and Daniel mentioned, it's definitely not just dmesg.
In fact, the primary things tend to be /proc and /sys, not dmesg
itself." By Linus Torvalds [2]

A lot of other areas in the kernel and filesystems have been updated to
support private instances; devpts is one major example. The aim here is
to modernize procfs and allow it to have multiple private instances, so
we can implement modern features or security mechanisms on top without
breaking userspace and without affecting the shared procfs mount. The
new features will apply to the private instances, and after months of
further testing they could perhaps be made the default, especially for IoT.

This allows us to absorb changes and make improvements without breaking
use cases; it follows the same path as devpts.

The final aim is to be able to do inside a lightweight sandbox:

  mount -t proc -onewinstance,pids=ptraceable none /proc

Which will be used for:

1) Embedded systems and IoT: usually we have one supervisor for
apps and some lightweight sandbox support. However, if we create
pid namespaces we have to manage all the processes inside them too,
whereas our goal is to run a bunch of apps, each one inside
its own mount namespace. We may use network namespaces for VLAN
setups, but right now we only want mount namespaces, without all the
other complexity. We want procfs to behave more like a real file system
and block access to inodes that belong to other users. 'hidepid=' will
not work since it is a shared mount option.

2) Containers, sandboxes and private instances of file systems - devpts case
Historically, a lot of file systems inside the Linux kernel, when instantiated,
were just a mirror of an already created and mounted filesystem. This was the
case for devpts; it seems that at the time the requirements were to
optimize things and reuse the same memory, etc. This design used to work, but
no longer does with today’s containers, IoT, hostile environments and all the
privacy challenges that Linux faces.

In that regard, devpts was updated so that each new mount is a totally
independent file system by the following patches:
“devpts: Make each mount of devpts an independent filesystem” by
Eric W. Biederman [3] [4]

3) Linux Security Modules have multiple ptrace paths inside some
subsystems. Inside procfs, however, the implementation does not
guarantee that the ptrace check, which triggers the
security_ptrace_access_check() hook, will always run. We have the
'hidepid' mount option that can be used to force the ptrace_may_access()
check inside has_pid_permissions() to run. The problem is that 'hidepid'
is per pid namespace and not attached to the mount point, so any remount
or modification of 'hidepid' will propagate to all other procfs mounts.

This also makes it hard to support the Yama LSM in desktop and user
sessions. The Yama ptrace scope, which restricts ptrace and some other
syscalls to be allowed only on inferiors, can be updated to have a
per-task context, where the context will be inherited during fork() and
clone() and preserved across execve(). If we support multiple private
procfs instances, then we may force the ptrace_may_access() check on
/proc/<pids>/ to always run inside those new procfs instances. This will
allow us to specify for user sessions whether procfs should be populated
only with pids that the user can ptrace.

By using the Yama ptrace scope, some restricted users will only be able
to see inferiors inside /proc; they won't even be able to see their
other processes. Some software like Chromium, Firefox's crash handler,
Wine and others already uses Yama to restrict which processes can be
ptraced. This change makes it possible to restrict /proc/<pids>/ and,
more importantly, gives desktop users a generic and usable way to
specify which users should see all processes and which users should
not.

Side notes:
* This covers a gap in seccomp, which is not able to parse arguments: it
is easy to install a seccomp filter on direct syscalls that operate on
pids, however /proc/<pid>/ is a Linux ABI using filesystem syscalls.
With this change all LSMs should be able to analyze
open/read/write/close... on /proc/<pid>/

4) This will allow us to implement new features either in the kernel or
in userspace without having to worry about procfs.
In containers, sandboxes, etc. we have workarounds to hide some /proc
inodes. This should be supported natively without extra complex work;
the kernel should be able to support sane options that work with today's
and future Linux use cases.
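
The 'hidepid' propagation problem from item 3 can be pictured with a
small userspace model. This is only a sketch: the model_* types and the
helper names are made up for illustration and are not kernel code. It
contrasts storing the option in the pid namespace (every mount of that
namespace sees a change) with storing it in the per-instance structure
(only that instance sees it).

```c
#include <assert.h>

/* Hypothetical userspace model; none of these types exist in the kernel. */
struct model_pid_ns {
	int hide_pid;		/* old layout: option stored per pid namespace */
};

struct model_fs_info {
	struct model_pid_ns *pid_ns;
	int hide_pid;		/* new layout: option stored per procfs instance */
};

/* Old behaviour: the setter writes through to the shared pid namespace. */
static void set_hide_pid_shared(struct model_fs_info *fs, int value)
{
	fs->pid_ns->hide_pid = value;
}

static int get_hide_pid_shared(const struct model_fs_info *fs)
{
	return fs->pid_ns->hide_pid;
}

/* New behaviour: the setter only touches this instance. */
static void set_hide_pid_private(struct model_fs_info *fs, int value)
{
	fs->hide_pid = value;
}

static int get_hide_pid_private(const struct model_fs_info *fs)
{
	return fs->hide_pid;
}
```

With two model instances pointing at the same model_pid_ns, a call to
set_hide_pid_shared() on one is visible through the other, while
set_hide_pid_private() leaves the other instance untouched.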

Changes of this patch:

* 'newinstance' mount option, which was also suggested by Andy
Lutomirski [5]. When this option is passed we automatically create a
private procfs instance.

This is not the default behaviour since we do not want to break userspace
and we do not want to provide different device IDs by default when
stat()ing inodes; I am not sure about all the use cases there [6].

* This patch also moves the 'hidepid' and 'gid' mount options from being
defined and used inside PID namespaces to the private proc_fs_info
struct, cleaning up both PID namespaces and procfs.
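
The early pass over the mount options, which only looks for
'newinstance', could be sketched in userspace as follows. This is an
illustration under assumptions, not the kernel parser: struct early_opts
and has_value() are hypothetical, and value checking is much looser than
what match_token() does with "gid=%u" and "hidepid=%u". One detail worth
keeping in any such parser is a single exit path, so the duplicated
option string is released on errors too.

```c
#define _DEFAULT_SOURCE		/* for strsep() and strdup() on glibc */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the newinstance flag in struct proc_fs_info. */
struct early_opts {
	bool newinstance;
};

/* Returns true if opt is "key" followed by at least one more character. */
static bool has_value(const char *opt, const char *key)
{
	size_t n = strlen(key);

	return strncmp(opt, key, n) == 0 && opt[n] != '\0';
}

/*
 * Userspace sketch of an early option pass: only "newinstance" is acted
 * on, "gid=" and "hidepid=" are accepted but left for the later parser.
 * The working copy is freed on every exit path.
 */
static int parse_early_options(const char *options, struct early_opts *out)
{
	char *copy, *cursor, *p;
	int ret = 0;

	if (!options)
		return 0;

	copy = strdup(options);
	if (!copy)
		return -ENOMEM;

	cursor = copy;
	while ((p = strsep(&cursor, ",")) != NULL) {
		if (!*p)
			continue;	/* skip empty tokens such as ",," */

		if (strcmp(p, "newinstance") == 0) {
			out->newinstance = true;
		} else if (has_value(p, "gid=") || has_value(p, "hidepid=")) {
			continue;	/* handled by the later, full parser */
		} else {
			ret = -EINVAL;	/* unknown option */
			break;
		}
	}

	free(copy);
	return ret;
}
```

Passing "hidepid=2,newinstance" sets the flag and returns 0, while an
unknown option returns -EINVAL after the copy has been freed.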

Use cases of 'newinstance' mount option:

* We create a private procfs instance that is disconnected from the
shared or other procfs instances.

* "hidepid": instead of changing all other mirrored procfs mounts, it
now applies only to the new private instance.

* "gid": instead of changing all other mirrored procfs mounts, it now
applies only to the new private instance.

* The next patch, which introduces the "pids=ptraceable" mount option
that takes precedence over "hidepid", will only work when 'newinstance'
is set. Otherwise the option is ignored.

After real testing, this should allow a smooth transition later to a
procfs that uses private instances by default.
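
Whether a mount request reuses an existing procfs instance or gets a
fresh one comes down to one comparison. The sketch below models that
decision in userspace; struct model_instance and can_share_instance()
are hypothetical names used only for illustration, following the rule
that sharing happens only when neither side is a 'newinstance' mount and
both belong to the same pid namespace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the fields the comparison looks at. */
struct model_instance {
	const void *pid_ns;	/* identity of the pid namespace */
	bool newinstance;	/* set when mounted with 'newinstance' */
};

/*
 * Returns true when a new mount request may reuse an existing instance:
 * neither side asked for a private instance and both belong to the same
 * pid namespace.
 */
static bool can_share_instance(const struct model_instance *request,
			       const struct model_instance *existing)
{
	return !request->newinstance && !existing->newinstance &&
	       request->pid_ns == existing->pid_ns;
}
```

A plain mount in the same pid namespace shares the existing instance; a
'newinstance' request never matches anything, and nothing ever matches
an existing private instance.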

[1] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-January/004215.html
[2] http://www.openwall.com/lists/kernel-hardening/2017/10/05/5
[3] https://lwn.net/Articles/689539/
[4] http://lxr.free-electrons.com/source/Documentation/filesystems/devpts.txt?v=3.14
[5] https://lkml.org/lkml/2017/5/2/407
[6] https://lkml.org/lkml/2017/5/3/357

Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 fs/proc/base.c                |  4 +--
 fs/proc/inode.c               | 14 +++++---
 fs/proc/root.c                | 78 ++++++++++++++++++++++++++++++++++++++++---
 include/linux/pid_namespace.h |  2 --
 include/linux/proc_fs.h       | 30 ++++++++++++++---
 5 files changed, 110 insertions(+), 18 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index f324c49..54b527c 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -700,7 +700,7 @@ static bool has_pid_permissions(struct proc_fs_info *fs_info,
 static int proc_pid_permission(struct inode *inode, int mask)
 {
 	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
-	struct pid_namespace *pid = fs_info->pid_ns;
+	int hide_pid = proc_fs_hide_pid(fs_info);
 	struct task_struct *task;
 	bool has_perms;
 
@@ -711,7 +711,7 @@ static int proc_pid_permission(struct inode *inode, int mask)
 	put_task_struct(task);
 
 	if (!has_perms) {
-		if (pid->hide_pid == HIDEPID_INVISIBLE) {
+		if (hide_pid == HIDEPID_INVISIBLE) {
 			/*
 			 * Let's make getdents(), stat(), and open()
 			 * consistent with each other.  If a process
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index bdd808d..faec32a 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -105,12 +105,16 @@ static int proc_show_options(struct seq_file *seq, struct dentry *root)
 {
 	struct super_block *sb = root->d_sb;
 	struct proc_fs_info *fs_info = proc_sb(sb);
-	struct pid_namespace *pid = fs_info->pid_ns;
+	int hide_pid = proc_fs_hide_pid(fs_info);
+	kgid_t pid_gid = proc_fs_pid_gid(fs_info);
 
-	if (!gid_eq(pid->pid_gid, GLOBAL_ROOT_GID))
-		seq_printf(seq, ",gid=%u", from_kgid_munged(&init_user_ns, pid->pid_gid));
-	if (pid->hide_pid != HIDEPID_OFF)
-		seq_printf(seq, ",hidepid=%u", pid->hide_pid);
+	if (proc_fs_newinstance(fs_info))
+		seq_printf(seq, ",newinstance");
+
+	if (!gid_eq(pid_gid, GLOBAL_ROOT_GID))
+		seq_printf(seq, ",gid=%u", from_kgid_munged(current_user_ns(),pid_gid));
+	if (hide_pid != HIDEPID_OFF)
+		seq_printf(seq, ",hidepid=%u", hide_pid);
 
 	return 0;
 }
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 48cc481..33ab965 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -28,15 +28,57 @@
 #include "internal.h"
 
 enum {
-	Opt_gid, Opt_hidepid, Opt_err,
+	Opt_gid, Opt_hidepid, Opt_newinstance, Opt_err,
 };
 
 static const match_table_t tokens = {
 	{Opt_hidepid, "hidepid=%u"},
 	{Opt_gid, "gid=%u"},
+	{Opt_newinstance, "newinstance"},
 	{Opt_err, NULL},
 };
 
+/* We only parse 'newinstance' option here */
+int proc_parse_early_options(char *options, struct proc_fs_info *fs_info)
+{
+	char *p, *opts, *orig;
+	substring_t args[MAX_OPT_ARGS];
+
+	if (!options)
+		return 0;
+
+	opts = kstrdup(options, GFP_KERNEL);
+	if (!opts)
+		return -ENOMEM;
+
+	orig = opts;
+
+	while ((p = strsep(&opts, ",")) != NULL) {
+		int token;
+
+		if (!*p)
+			continue;
+
+		token = match_token(p, tokens, args);
+		switch (token) {
+		case Opt_newinstance:
+			proc_fs_set_newinstance(fs_info, true);
+			pr_info("proc: mounting a new procfs instance\n");
+			break;
+		case Opt_gid:
+		case Opt_hidepid:
+			break;
+		default:
+			pr_err("proc: unrecognized mount option \"%s\" or missing value\n", p);
+			kfree(orig);
+			return -EINVAL;
+		}
+	}
+
+	kfree(orig);
+	return 0;
+}
+
 int proc_parse_options(char *options, struct proc_fs_info *fs_info)
 {
 	char *p;
@@ -75,6 +117,8 @@ int proc_parse_options(char *options, struct proc_fs_info *fs_info)
 			}
 			proc_fs_set_hide_pid(fs_info, option);
 			break;
+		case Opt_newinstance:
+			break;
 		default:
 			pr_err("proc: unrecognized mount option \"%s\" "
 			       "or missing value\n", p);
@@ -87,18 +131,34 @@ int proc_parse_options(char *options, struct proc_fs_info *fs_info)
 
 int proc_remount(struct super_block *sb, int *flags, char *data)
 {
+	int error;
 	struct proc_fs_info *fs_info = proc_sb(sb);
 
 	sync_filesystem(sb);
+
+	/*
+	 * If this is a new instance, then parse again the proc mount
+	 * options.
+	 */
+	if (proc_fs_newinstance(fs_info)) {
+		error = proc_parse_early_options(data, fs_info);
+		if (error < 0)
+			return error;
+	}
+
 	return !proc_parse_options(data, fs_info);
 }
 
-static int proc_test_super(struct super_block *s, void *data)
+static int proc_test_super(struct super_block *sb, void *data)
 {
 	struct proc_fs_info *p = data;
-	struct proc_fs_info *fs_info = proc_sb(s);
+	struct proc_fs_info *fs_info = proc_sb(sb);
+
+	if (!proc_fs_newinstance(p) && !proc_fs_newinstance(fs_info) &&
+	    p->pid_ns == fs_info->pid_ns)
+		return 1;
 
-	return p->pid_ns == fs_info->pid_ns;
+	return 0;
 }
 
 static int proc_set_super(struct super_block *sb, void *data)
@@ -110,7 +170,7 @@ static int proc_set_super(struct super_block *sb, void *data)
 static struct dentry *proc_mount(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data)
 {
-	int error;
+	int error = 0;
 	struct super_block *sb;
 	struct pid_namespace *ns;
 	struct proc_fs_info *fs_info;
@@ -126,10 +186,18 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 	if (!fs_info)
 		return ERR_PTR(-ENOMEM);
 
+	/* Set it as early as possible */
+	proc_fs_set_newinstance(fs_info, false);
+
 	if (flags & SB_KERNMOUNT) {
 		ns = data;
 		data = NULL;
 	} else {
+		/* Parse early mount options if not a kernel mount */
+		error = proc_parse_early_options(data, fs_info);
+		if (error < 0)
+			goto error_fs_info;
+
 		ns = task_active_pid_ns(current);
 	}
 
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index f91a8bf..786ea04 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -39,8 +39,6 @@ struct pid_namespace {
 	struct user_namespace *user_ns;
 	struct ucounts *ucounts;
 	struct work_struct proc_work;
-	kgid_t pid_gid;
-	int hide_pid;
 	int reboot;	/* group exit code if this pidns was rebooted */
 	struct ns_common ns;
 } __randomize_layout;
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 047d0d0..408b51d 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -16,6 +16,9 @@ struct proc_fs_info {
 	struct pid_namespace *pid_ns;
 	struct dentry *proc_self; /* For /proc/self/ */
 	struct dentry *proc_thread_self; /* For /proc/thread-self/ */
+	bool newinstance; /* Flag for new separated instances */
+	kgid_t pid_gid;
+	int hide_pid;
 };
 
 #ifdef CONFIG_PROC_FS
@@ -27,22 +30,32 @@ static inline struct proc_fs_info *proc_sb(struct super_block *sb)
 
 static inline void proc_fs_set_hide_pid(struct proc_fs_info *fs_info, int hide_pid)
 {
-	fs_info->pid_ns->hide_pid = hide_pid;
+	fs_info->hide_pid = hide_pid;
 }
 
 static inline void proc_fs_set_pid_gid(struct proc_fs_info *fs_info, kgid_t gid)
 {
-	fs_info->pid_ns->pid_gid = gid;
+	fs_info->pid_gid = gid;
+}
+
+static inline void proc_fs_set_newinstance(struct proc_fs_info *fs_info, bool value)
+{
+	fs_info->newinstance = value;
 }
 
 static inline int proc_fs_hide_pid(struct proc_fs_info *fs_info)
 {
-	return fs_info->pid_ns->hide_pid;
+	return fs_info->hide_pid;
 }
 
 static inline kgid_t proc_fs_pid_gid(struct proc_fs_info *fs_info)
 {
-	return fs_info->pid_ns->pid_gid;
+	return fs_info->pid_gid;
+}
+
+static inline bool proc_fs_newinstance(struct proc_fs_info *fs_info)
+{
+	return fs_info->newinstance;
 }
 
 extern void proc_root_init(void);
@@ -89,6 +102,10 @@ static inline void proc_fs_set_pid_gid(struct proc_info_fs *fs_info, kgid_t gid)
 {
 }
 
+static inline void proc_fs_set_newinstance(struct proc_fs_info *fs_info, bool value)
+{
+}
+
 static inline int proc_fs_hide_pid(struct proc_fs_info *fs_info)
 {
 	return 0;
@@ -99,6 +116,11 @@ extern kgid_t proc_fs_pid_gid(struct proc_fs_info *fs_info)
 	return GLOBAL_ROOT_GID;
 }
 
+static inline bool proc_fs_newinstance(struct proc_fs_info *fs_info)
+{
+	return false;
+}
+
 extern inline struct proc_fs_info *proc_sb(struct super_block *sb) { return NULL;}
 static inline struct proc_dir_entry *proc_symlink(const char *name,
 		struct proc_dir_entry *parent,const char *dest) { return NULL;}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH RFC v3 4/7] proc: support mounting private procfs instances inside same pid namespace
@ 2017-11-09 16:14   ` Djalal Harouni
  0 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-09 16:14 UTC (permalink / raw)
  To: linux-security-module

This patch allows multiple private procfs instances inside the same pid
namespace. In some cases procfs is more of a burden than an API: it
blocks a lot of other features, and from time to time we have to work
around procfs to implement new features either in the kernel or in
userspace.

Discussion about security-related TODO items:
"Here's another one: split up and modernize /proc." By Andy Lutomirski [1]

Discussion about kernel pointer leaks:
"And yes, as Kees and Daniel mentioned, it's definitely not just dmesg.
In fact, the primary things tend to be /proc and /sys, not dmesg
itself." By Linus Torvalds [2]

A lot of other areas in the kernel and filesystems have been updated to
support private instances; devpts is one major example. The aim here is
to modernize procfs and allow it to have multiple private instances, so
that we can implement modern features or security mechanisms on top
without breaking userspace or affecting the shared procfs mount. The new
features will apply only to the private instances, and after some months
of testing this could perhaps be made the default, especially for IoT.

This allows us to absorb changes and make improvements without breaking
existing use cases; it follows the same path as devpts.

The final aim is to be able to run the following inside a lightweight
sandbox:

  mount -t proc -onewinstance,pids=ptraceable none /proc

This will be used for:

1) Embedded systems and IoT: usually we have one supervisor for apps and
some lightweight sandbox support. However, if we create pid namespaces
we also have to manage all the processes inside them, whereas our goal
is to run a bunch of apps, each inside its own mount namespace, and
maybe use network namespaces for VLAN setups. Right now we only want
mount namespaces, without all the other complexity. We want procfs to
behave more like a real file system and block access to inodes that
belong to other users. 'hidepid=' will not work here since it is a
shared mount option.

2) Containers, sandboxes and private instances of file systems: the
devpts case. Historically, for a lot of file systems inside the Linux
kernel, a newly instantiated view was just a mirror of an already
created and mounted filesystem. This was the case for the devpts
filesystem; it seems that at the time the requirements were to optimize
things, reuse the same memory, etc. This design used to work, but it no
longer does with today’s containers, IoT, hostile environments and all
the privacy challenges that Linux faces.

In that regard, devpts was updated so that each new mount is a totally
independent file system by the following patch:
“devpts: Make each mount of devpts an independent filesystem” by
Eric W. Biederman [3] [4]

3) Linux Security Modules have multiple ptrace paths inside some
subsystems. Inside procfs, however, the implementation does not
guarantee that the ptrace check, which triggers the
security_ptrace_access_check() hook, will always run. We have the
'hidepid' mount option that can be used to force the ptrace_may_access()
check inside has_pid_permissions() to run. The problem is that 'hidepid'
is per pid namespace and not attached to the mount point, so any remount
or modification of 'hidepid' will propagate to all other procfs mounts.

This also makes it hard to support the Yama LSM in desktop and user
sessions. The Yama ptrace scope, which restricts ptrace and some other
syscalls to be allowed only on inferiors, can be updated to have a
per-task context, where the context will be inherited during fork() and
clone() and preserved across execve(). If we support multiple private
procfs instances, then we may force the ptrace_may_access() check on
/proc/<pids>/ to always run inside those new procfs instances. This will
allow us to specify for user sessions whether procfs should be populated
only with pids that the user can ptrace.

By using the Yama ptrace scope, some restricted users will only be able
to see inferiors inside /proc; they won't even be able to see their
other processes. Some software like Chromium, Firefox's crash handler,
Wine and others already uses Yama to restrict which processes can be
ptraced. This change makes it possible to restrict /proc/<pids>/ and,
more importantly, gives desktop users a generic and usable way to
specify which users should see all processes and which users should
not.

Side notes:
* This covers a gap in seccomp, which is not able to parse arguments: it
is easy to install a seccomp filter on direct syscalls that operate on
pids, however /proc/<pid>/ is a Linux ABI using filesystem syscalls.
With this change all LSMs should be able to analyze
open/read/write/close... on /proc/<pid>/

4) This will allow us to implement new features either in the kernel or
in userspace without having to worry about procfs.
In containers, sandboxes, etc. we have workarounds to hide some /proc
inodes. This should be supported natively without extra complex work;
the kernel should be able to support sane options that work with today's
and future Linux use cases.

Changes of this patch:

* 'newinstance' mount option, which was also suggested by Andy
Lutomirski [5]. When this option is passed we automatically create a
private procfs instance.

This is not the default behaviour since we do not want to break userspace
and we do not want to provide different device IDs by default when
stat()ing inodes; I am not sure about all the use cases there [6].

* This patch also moves the 'hidepid' and 'gid' mount options from being
defined and used inside PID namespaces to the private proc_fs_info
struct, cleaning up both PID namespaces and procfs.

Use cases of 'newinstance' mount option:

* We create a private procfs instance that is disconnected from the
shared or other procfs instances.

* "hidepid": instead of changing all other mirrored procfs mounts, it
now applies only to the new private instance.

* "gid": instead of changing all other mirrored procfs mounts, it now
applies only to the new private instance.

* The next patch, which introduces the "pids=ptraceable" mount option
that takes precedence over "hidepid", will only work when 'newinstance'
is set. Otherwise the option is ignored.

After real testing, this should allow a smooth transition later to a
procfs that uses private instances by default.
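
Because the options are kept per instance, the option string reported
for each mount can differ between instances in the same pid namespace.
The following userspace sketch mirrors the order in which the options
are emitted (",newinstance", then ",gid=", then ",hidepid="). It is an
illustration only: struct model_opts, append() and
format_proc_options() are hypothetical names, and a plain unsigned
integer stands in for the kernel's kgid_t.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-instance options; gid 0 and hide_pid 0 are defaults. */
struct model_opts {
	bool newinstance;
	unsigned int gid;
	int hide_pid;
};

/* Appends formatted text at buf + *used; fails if it would not fit. */
static int append(char *buf, size_t len, size_t *used, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf + *used, len - *used, fmt, ap);
	va_end(ap);

	if (n < 0 || (size_t)n >= len - *used)
		return -1;
	*used += (size_t)n;
	return 0;
}

/* Builds the option suffix for one instance; returns 0 or -1 on overflow. */
static int format_proc_options(char *buf, size_t len,
			       const struct model_opts *opts)
{
	size_t used = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	if (opts->newinstance && append(buf, len, &used, ",newinstance"))
		return -1;
	if (opts->gid != 0 && append(buf, len, &used, ",gid=%u", opts->gid))
		return -1;
	if (opts->hide_pid != 0 &&
	    append(buf, len, &used, ",hidepid=%d", opts->hide_pid))
		return -1;

	return 0;
}
```

An instance with all defaults yields an empty string, so a shared mount
and a private one with 'hidepid' set are easy to tell apart.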

[1] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-January/004215.html
[2] http://www.openwall.com/lists/kernel-hardening/2017/10/05/5
[3] https://lwn.net/Articles/689539/
[4] http://lxr.free-electrons.com/source/Documentation/filesystems/devpts.txt?v=3.14
[5] https://lkml.org/lkml/2017/5/2/407
[6] https://lkml.org/lkml/2017/5/3/357

Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 fs/proc/base.c                |  4 +--
 fs/proc/inode.c               | 14 +++++---
 fs/proc/root.c                | 78 ++++++++++++++++++++++++++++++++++++++++---
 include/linux/pid_namespace.h |  2 --
 include/linux/proc_fs.h       | 30 ++++++++++++++---
 5 files changed, 110 insertions(+), 18 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index f324c49..54b527c 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -700,7 +700,7 @@ static bool has_pid_permissions(struct proc_fs_info *fs_info,
 static int proc_pid_permission(struct inode *inode, int mask)
 {
 	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
-	struct pid_namespace *pid = fs_info->pid_ns;
+	int hide_pid = proc_fs_hide_pid(fs_info);
 	struct task_struct *task;
 	bool has_perms;
 
@@ -711,7 +711,7 @@ static int proc_pid_permission(struct inode *inode, int mask)
 	put_task_struct(task);
 
 	if (!has_perms) {
-		if (pid->hide_pid == HIDEPID_INVISIBLE) {
+		if (hide_pid == HIDEPID_INVISIBLE) {
 			/*
 			 * Let's make getdents(), stat(), and open()
 			 * consistent with each other.  If a process
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index bdd808d..faec32a 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -105,12 +105,16 @@ static int proc_show_options(struct seq_file *seq, struct dentry *root)
 {
 	struct super_block *sb = root->d_sb;
 	struct proc_fs_info *fs_info = proc_sb(sb);
-	struct pid_namespace *pid = fs_info->pid_ns;
+	int hide_pid = proc_fs_hide_pid(fs_info);
+	kgid_t pid_gid = proc_fs_pid_gid(fs_info);
 
-	if (!gid_eq(pid->pid_gid, GLOBAL_ROOT_GID))
-		seq_printf(seq, ",gid=%u", from_kgid_munged(&init_user_ns, pid->pid_gid));
-	if (pid->hide_pid != HIDEPID_OFF)
-		seq_printf(seq, ",hidepid=%u", pid->hide_pid);
+	if (proc_fs_newinstance(fs_info))
+		seq_printf(seq, ",newinstance");
+
+	if (!gid_eq(pid_gid, GLOBAL_ROOT_GID))
+		seq_printf(seq, ",gid=%u", from_kgid_munged(current_user_ns(),pid_gid));
+	if (hide_pid != HIDEPID_OFF)
+		seq_printf(seq, ",hidepid=%u", hide_pid);
 
 	return 0;
 }
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 48cc481..33ab965 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -28,15 +28,57 @@
 #include "internal.h"
 
 enum {
-	Opt_gid, Opt_hidepid, Opt_err,
+	Opt_gid, Opt_hidepid, Opt_newinstance, Opt_err,
 };
 
 static const match_table_t tokens = {
 	{Opt_hidepid, "hidepid=%u"},
 	{Opt_gid, "gid=%u"},
+	{Opt_newinstance, "newinstance"},
 	{Opt_err, NULL},
 };
 
+/* We only parse 'newinstance' option here */
+int proc_parse_early_options(char *options, struct proc_fs_info *fs_info)
+{
+	char *p, *opts, *orig;
+	substring_t args[MAX_OPT_ARGS];
+
+	if (!options)
+		return 0;
+
+	opts = kstrdup(options, GFP_KERNEL);
+	if (!opts)
+		return -ENOMEM;
+
+	orig = opts;
+
+	while ((p = strsep(&opts, ",")) != NULL) {
+		int token;
+
+		if (!*p)
+			continue;
+
+		token = match_token(p, tokens, args);
+		switch (token) {
+		case Opt_newinstance:
+			proc_fs_set_newinstance(fs_info, true);
+			pr_info("proc: mounting a new procfs instance\n");
+			break;
+		case Opt_gid:
+		case Opt_hidepid:
+			break;
+		default:
+			pr_err("proc: unrecognized mount option \"%s\" or missing value\n", p);
+			kfree(orig);
+			return -EINVAL;
+		}
+	}
+
+	kfree(orig);
+	return 0;
+}
+
 int proc_parse_options(char *options, struct proc_fs_info *fs_info)
 {
 	char *p;
@@ -75,6 +117,8 @@ int proc_parse_options(char *options, struct proc_fs_info *fs_info)
 			}
 			proc_fs_set_hide_pid(fs_info, option);
 			break;
+		case Opt_newinstance:
+			break;
 		default:
 			pr_err("proc: unrecognized mount option \"%s\" "
 			       "or missing value\n", p);
@@ -87,18 +131,34 @@ int proc_parse_options(char *options, struct proc_fs_info *fs_info)
 
 int proc_remount(struct super_block *sb, int *flags, char *data)
 {
+	int error;
 	struct proc_fs_info *fs_info = proc_sb(sb);
 
 	sync_filesystem(sb);
+
+	/*
+	 * If this is a new instance, then parse again the proc mount
+	 * options.
+	 */
+	if (proc_fs_newinstance(fs_info)) {
+		error = proc_parse_early_options(data, fs_info);
+		if (error < 0)
+			return error;
+	}
+
 	return !proc_parse_options(data, fs_info);
 }
 
-static int proc_test_super(struct super_block *s, void *data)
+static int proc_test_super(struct super_block *sb, void *data)
 {
 	struct proc_fs_info *p = data;
-	struct proc_fs_info *fs_info = proc_sb(s);
+	struct proc_fs_info *fs_info = proc_sb(sb);
+
+	if (!proc_fs_newinstance(p) && !proc_fs_newinstance(fs_info) &&
+	    p->pid_ns == fs_info->pid_ns)
+		return 1;
 
-	return p->pid_ns == fs_info->pid_ns;
+	return 0;
 }
 
 static int proc_set_super(struct super_block *sb, void *data)
@@ -110,7 +170,7 @@ static int proc_set_super(struct super_block *sb, void *data)
 static struct dentry *proc_mount(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data)
 {
-	int error;
+	int error = 0;
 	struct super_block *sb;
 	struct pid_namespace *ns;
 	struct proc_fs_info *fs_info;
@@ -126,10 +186,18 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 	if (!fs_info)
 		return ERR_PTR(-ENOMEM);
 
+	/* Set it as early as possible */
+	proc_fs_set_newinstance(fs_info, false);
+
 	if (flags & SB_KERNMOUNT) {
 		ns = data;
 		data = NULL;
 	} else {
+		/* Parse early mount options if not a kernel mount */
+		error = proc_parse_early_options(data, fs_info);
+		if (error < 0)
+			goto error_fs_info;
+
 		ns = task_active_pid_ns(current);
 	}
 
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index f91a8bf..786ea04 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -39,8 +39,6 @@ struct pid_namespace {
 	struct user_namespace *user_ns;
 	struct ucounts *ucounts;
 	struct work_struct proc_work;
-	kgid_t pid_gid;
-	int hide_pid;
 	int reboot;	/* group exit code if this pidns was rebooted */
 	struct ns_common ns;
 } __randomize_layout;
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 047d0d0..408b51d 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -16,6 +16,9 @@ struct proc_fs_info {
 	struct pid_namespace *pid_ns;
 	struct dentry *proc_self; /* For /proc/self/ */
 	struct dentry *proc_thread_self; /* For /proc/thread-self/ */
+	bool newinstance; /* Flag for new separated instances */
+	kgid_t pid_gid;
+	int hide_pid;
 };
 
 #ifdef CONFIG_PROC_FS
@@ -27,22 +30,32 @@ static inline struct proc_fs_info *proc_sb(struct super_block *sb)
 
 static inline void proc_fs_set_hide_pid(struct proc_fs_info *fs_info, int hide_pid)
 {
-	fs_info->pid_ns->hide_pid = hide_pid;
+	fs_info->hide_pid = hide_pid;
 }
 
 static inline void proc_fs_set_pid_gid(struct proc_fs_info *fs_info, kgid_t gid)
 {
-	fs_info->pid_ns->pid_gid = gid;
+	fs_info->pid_gid = gid;
+}
+
+static inline void proc_fs_set_newinstance(struct proc_fs_info *fs_info, bool value)
+{
+	fs_info->newinstance = value;
 }
 
 static inline int proc_fs_hide_pid(struct proc_fs_info *fs_info)
 {
-	return fs_info->pid_ns->hide_pid;
+	return fs_info->hide_pid;
 }
 
 static inline kgid_t proc_fs_pid_gid(struct proc_fs_info *fs_info)
 {
-	return fs_info->pid_ns->pid_gid;
+	return fs_info->pid_gid;
+}
+
+static inline bool proc_fs_newinstance(struct proc_fs_info *fs_info)
+{
+	return fs_info->newinstance;
 }
 
 extern void proc_root_init(void);
@@ -89,6 +102,10 @@ static inline void proc_fs_set_pid_gid(struct proc_info_fs *fs_info, kgid_t gid)
 {
 }
 
+static inline void proc_fs_set_newinstance(struct proc_fs_info *fs_info, bool value)
+{
+}
+
 static inline int proc_fs_hide_pid(struct proc_fs_info *fs_info)
 {
 	return 0;
@@ -99,6 +116,11 @@ extern kgid_t proc_fs_pid_gid(struct proc_fs_info *fs_info)
 	return GLOBAL_ROOT_GID;
 }
 
+static inline bool proc_fs_newinstance(struct proc_fs_info *fs_info)
+{
+	return false;
+}
+
 extern inline struct proc_fs_info *proc_sb(struct super_block *sb) { return NULL;}
 static inline struct proc_dir_entry *proc_symlink(const char *name,
 		struct proc_dir_entry *parent,const char *dest) { return NULL;}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [kernel-hardening] [PATCH RFC v3 4/7] proc: support mounting private procfs instances inside same pid namespace
@ 2017-11-09 16:14   ` Djalal Harouni
  0 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-09 16:14 UTC (permalink / raw)
  To: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	linux-fsdevel, linux-kernel, kernel-hardening,
	linux-security-module, linux-api
  Cc: Greg Kroah-Hartman, Alexander Viro, Akinobu Mita, me,
	Oleg Nesterov, Jeff Layton, Ingo Molnar, Alexey Dobriyan,
	ebiederm, Linus Torvalds, Daniel Micay, Jonathan Corbet, bfields,
	Stephen Rothwell, solar, Djalal Harouni

This patch allows multiple private procfs instances inside the same pid
namespace. In some cases procfs is more of a burden than an API: it
blocks a lot of other features, and from time to time we have to work
around procfs to implement new features either in the kernel or in
userspace.

Discussion about security-related TODO items:
"Here's another one: split up and modernize /proc." By Andy Lutomirski [1]

Discussion about kernel pointer leaks:
"And yes, as Kees and Daniel mentioned, it's definitely not just dmesg.
In fact, the primary things tend to be /proc and /sys, not dmesg
itself." By Linus Torvalds [2]

A lot of other areas in the kernel and filesystems have been updated to
support private instances; devpts is one major example. The aim here is
to modernize procfs and allow it to have multiple private instances, so
that we can implement modern features or security mechanisms on top
without breaking userspace or affecting the shared procfs mount. The new
features will apply only to the private instances, and after some months
of testing this could perhaps be made the default, especially for IoT.

This allows us to absorb changes and make improvements without breaking
existing use cases; it follows the same path as devpts.

The final aim is to be able to run the following inside a lightweight
sandbox:

  mount -t proc -onewinstance,pids=ptraceable none /proc

This will be used for:

1) Embedded systems and IoT: usually we have one supervisor for apps and
some lightweight sandbox support. However, if we create pid namespaces
we also have to manage all the processes inside them, whereas our goal
is to run a bunch of apps, each inside its own mount namespace, and
maybe use network namespaces for VLAN setups. Right now we only want
mount namespaces, without all the other complexity. We want procfs to
behave more like a real file system and block access to inodes that
belong to other users. 'hidepid=' will not work here since it is a
shared mount option.

2) Containers, sandboxes and private instances of file systems: the
devpts case. Historically, for a lot of file systems inside the Linux
kernel, a newly instantiated view was just a mirror of an already
created and mounted filesystem. This was the case for the devpts
filesystem; it seems that at the time the requirements were to optimize
things, reuse the same memory, etc. This design used to work, but it no
longer does with today’s containers, IoT, hostile environments and all
the privacy challenges that Linux faces.

In that regard, devpts was updated so that each new mount is a totally
independent file system by the following patch:
“devpts: Make each mount of devpts an independent filesystem” by
Eric W. Biederman [3] [4]

3) Linux Security Modules have multiple ptrace paths inside some
subsystems. Inside procfs, however, the implementation does not
guarantee that the ptrace check, which triggers the
security_ptrace_access_check() hook, will always run. We have the
'hidepid' mount option that can be used to force the ptrace_may_access()
check inside has_pid_permissions() to run. The problem is that 'hidepid'
is per pid namespace and not attached to the mount point, so any remount
or modification of 'hidepid' will propagate to all other procfs mounts.

This also makes it hard to support the Yama LSM in desktop and user
sessions. The Yama ptrace scope, which restricts ptrace and some other
syscalls so that they are allowed only on inferiors, can be updated to
have a per-task context that is inherited during fork() and clone() and
preserved across execve(). If we support multiple private procfs
instances, then we may force the ptrace_may_access() check on
/proc/<pids>/ to always run inside the new procfs instances. This will
allow us to specify, per user session, whether procfs should be
populated only with the pids that the user can ptrace.

By using the Yama ptrace scope, some restricted users will only be able
to see inferiors inside /proc; they won't even be able to see their
other processes. Some software like Chromium, Firefox's crash handler,
Wine and others already use Yama to restrict which processes can be
ptraced. This change makes it possible to restrict /proc/<pids>/ and,
more importantly, gives desktop users a generic and usable way to
specify which users should see all processes and which users should
not.

Side notes:
* This covers a gap in seccomp, which is not able to parse arguments.
It is easy to install a seccomp filter on direct syscalls that operate
on pids, however /proc/<pid>/ is a Linux ABI that is accessed through
filesystem syscalls. With this change all LSMs should be able to
analyze open/read/write/close... on /proc/<pid>/

4) This will allow new features to be implemented, either in the kernel
or in userspace, without having to worry about procfs.
In containers, sandboxes, etc. we have workarounds to hide some /proc
inodes. This should be supported natively without extra complex work;
the kernel should be able to support sane options that work with
today's and future Linux use cases.

Changes of this patch:

* 'newinstance' mount option, which was also suggested by Andy
Lutomirski [5]. When this option is passed we automatically create a
private procfs instance.

This is not the default behaviour since we do not want to break userspace
and we do not want to provide different device IDs by default when
stat()ing inodes; I am not sure about all the use cases there [6].

* This patch also moves the 'hidepid' and 'gid' mount options from being
defined and used inside PID namespaces to the private proc_fs_info
struct, cleaning up both PID namespaces and procfs.
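
The patch detects 'newinstance' in a separate early pass
(proc_parse_early_options()), presumably so that the flag is known
before an existing superblock is considered for reuse. The following is
a userspace sketch of such an early scan. The function name
opts_have_newinstance() is hypothetical, and the kernel code uses
strsep() and match_token() rather than this hand-rolled loop.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Userspace sketch only: scan a comma-separated option string and report
 * whether the bare token "newinstance" is present. Nothing is allocated,
 * so there is no cleanup to do on any return path.
 */
static bool opts_have_newinstance(const char *options)
{
	static const char key[] = "newinstance";
	const char *p = options;

	if (!options)
		return false;

	while (*p) {
		size_t len = strcspn(p, ",");	/* length of current token */

		if (len == sizeof(key) - 1 && strncmp(p, key, len) == 0)
			return true;

		p += len;
		if (*p == ',')
			p++;			/* skip the separator */
	}
	return false;
}
```

Other options such as "hidepid=2" or "gid=10" are simply skipped here,
in the same way that the early pass in the patch accepts them without
acting on them.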

Use cases of 'newinstance' mount option:

* We create a private procfs instance that is disconnected from the
shared or other procfs instances.

* "hidepid": instead of changing all other mirrored procfs mounts, it
will now apply only to the new private instance.

* "gid": instead of changing all other mirrored procfs mounts, it will
now apply only to the new private instance.

* The next patch, which introduces the "pids=ptraceable" mount option
that takes precedence over "hidepid", will only work when 'newinstance'
is set. Otherwise it is ignored.

Later, after real testing, this should allow a smooth transition to a
procfs with private instances by default.
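
The sharing rule that follows from 'newinstance' can be sketched in
userspace as below. The condition mirrors the one in this patch's
proc_test_super(); the struct model_fs_info and the function name
can_share_super() are hypothetical stand-ins for the kernel types.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the fields that proc_test_super() reads. */
struct model_fs_info {
	const void *pid_ns;	/* identity of the pid namespace */
	bool newinstance;	/* set when 'newinstance' was requested */
};

/*
 * Returns true when a mount request may reuse an existing superblock:
 * neither side asked for a private instance, and both belong to the
 * same pid namespace.
 */
static bool can_share_super(const struct model_fs_info *req,
			    const struct model_fs_info *existing)
{
	return !req->newinstance && !existing->newinstance &&
	       req->pid_ns == existing->pid_ns;
}
```

Under this rule a 'newinstance' mount never matches an existing
superblock, and an existing private instance is never handed out to a
later mount, so its options stay isolated.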

[1] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-January/004215.html
[2] http://www.openwall.com/lists/kernel-hardening/2017/10/05/5
[3] https://lwn.net/Articles/689539/
[4] http://lxr.free-electrons.com/source/Documentation/filesystems/devpts.txt?v=3.14
[5] https://lkml.org/lkml/2017/5/2/407
[6] https://lkml.org/lkml/2017/5/3/357

Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 fs/proc/base.c                |  4 +--
 fs/proc/inode.c               | 14 +++++---
 fs/proc/root.c                | 78 ++++++++++++++++++++++++++++++++++++++++---
 include/linux/pid_namespace.h |  2 --
 include/linux/proc_fs.h       | 30 ++++++++++++++---
 5 files changed, 110 insertions(+), 18 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index f324c49..54b527c 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -700,7 +700,7 @@ static bool has_pid_permissions(struct proc_fs_info *fs_info,
 static int proc_pid_permission(struct inode *inode, int mask)
 {
 	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
-	struct pid_namespace *pid = fs_info->pid_ns;
+	int hide_pid = proc_fs_hide_pid(fs_info);
 	struct task_struct *task;
 	bool has_perms;
 
@@ -711,7 +711,7 @@ static int proc_pid_permission(struct inode *inode, int mask)
 	put_task_struct(task);
 
 	if (!has_perms) {
-		if (pid->hide_pid == HIDEPID_INVISIBLE) {
+		if (hide_pid == HIDEPID_INVISIBLE) {
 			/*
 			 * Let's make getdents(), stat(), and open()
 			 * consistent with each other.  If a process
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index bdd808d..faec32a 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -105,12 +105,16 @@ static int proc_show_options(struct seq_file *seq, struct dentry *root)
 {
 	struct super_block *sb = root->d_sb;
 	struct proc_fs_info *fs_info = proc_sb(sb);
-	struct pid_namespace *pid = fs_info->pid_ns;
+	int hide_pid = proc_fs_hide_pid(fs_info);
+	kgid_t pid_gid = proc_fs_pid_gid(fs_info);
 
-	if (!gid_eq(pid->pid_gid, GLOBAL_ROOT_GID))
-		seq_printf(seq, ",gid=%u", from_kgid_munged(&init_user_ns, pid->pid_gid));
-	if (pid->hide_pid != HIDEPID_OFF)
-		seq_printf(seq, ",hidepid=%u", pid->hide_pid);
+	if (proc_fs_newinstance(fs_info))
+		seq_printf(seq, ",newinstance");
+
+	if (!gid_eq(pid_gid, GLOBAL_ROOT_GID))
+		seq_printf(seq, ",gid=%u", from_kgid_munged(current_user_ns(),pid_gid));
+	if (hide_pid != HIDEPID_OFF)
+		seq_printf(seq, ",hidepid=%u", hide_pid);
 
 	return 0;
 }
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 48cc481..33ab965 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -28,15 +28,57 @@
 #include "internal.h"
 
 enum {
-	Opt_gid, Opt_hidepid, Opt_err,
+	Opt_gid, Opt_hidepid, Opt_newinstance, Opt_err,
 };
 
 static const match_table_t tokens = {
 	{Opt_hidepid, "hidepid=%u"},
 	{Opt_gid, "gid=%u"},
+	{Opt_newinstance, "newinstance"},
 	{Opt_err, NULL},
 };
 
+/* We only parse 'newinstance' option here */
+int proc_parse_early_options(char *options, struct proc_fs_info *fs_info)
+{
+	char *p, *opts, *orig;
+	substring_t args[MAX_OPT_ARGS];
+
+	if (!options)
+		return 0;
+
+	opts = kstrdup(options, GFP_KERNEL);
+	if (!opts)
+		return -ENOMEM;
+
+	orig = opts;
+
+	while ((p = strsep(&opts, ",")) != NULL) {
+		int token;
+
+		if (!*p)
+			continue;
+
+		token = match_token(p, tokens, args);
+		switch (token) {
+		case Opt_newinstance:
+			proc_fs_set_newinstance(fs_info, true);
+			pr_info("proc: mounting a new procfs instance\n");
+			break;
+		case Opt_gid:
+		case Opt_hidepid:
+			break;
+		default:
+			pr_err("proc: unrecognized mount option \"%s\" "
+			       "or missing value\n", p);
+			return -EINVAL;
+		}
+	}
+
+	kfree(orig);
+	return 0;
+}
+
 int proc_parse_options(char *options, struct proc_fs_info *fs_info)
 {
 	char *p;
@@ -75,6 +117,8 @@ int proc_parse_options(char *options, struct proc_fs_info *fs_info)
 			}
 			proc_fs_set_hide_pid(fs_info, option);
 			break;
+		case Opt_newinstance:
+			break;
 		default:
 			pr_err("proc: unrecognized mount option \"%s\" "
 			       "or missing value\n", p);
@@ -87,18 +131,34 @@ int proc_parse_options(char *options, struct proc_fs_info *fs_info)
 
 int proc_remount(struct super_block *sb, int *flags, char *data)
 {
+	int error;
 	struct proc_fs_info *fs_info = proc_sb(sb);
 
 	sync_filesystem(sb);
+
+	/*
+	 * If this is a new instance, then parse again the proc mount
+	 * options.
+	 */
+	if (proc_fs_newinstance(fs_info)) {
+		error = proc_parse_early_options(data, fs_info);
+		if (error < 0)
+			return error;
+	}
+
 	return !proc_parse_options(data, fs_info);
 }
 
-static int proc_test_super(struct super_block *s, void *data)
+static int proc_test_super(struct super_block *sb, void *data)
 {
 	struct proc_fs_info *p = data;
-	struct proc_fs_info *fs_info = proc_sb(s);
+	struct proc_fs_info *fs_info = proc_sb(sb);
+
+	if (!proc_fs_newinstance(p) && !proc_fs_newinstance(fs_info) &&
+	    p->pid_ns == fs_info->pid_ns)
+		return 1;
 
-	return p->pid_ns == fs_info->pid_ns;
+	return 0;
 }
 
 static int proc_set_super(struct super_block *sb, void *data)
@@ -110,7 +170,7 @@ static int proc_set_super(struct super_block *sb, void *data)
 static struct dentry *proc_mount(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data)
 {
-	int error;
+	int error = 0;
 	struct super_block *sb;
 	struct pid_namespace *ns;
 	struct proc_fs_info *fs_info;
@@ -126,10 +186,18 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 	if (!fs_info)
 		return ERR_PTR(-ENOMEM);
 
+	/* Set it as early as possible */
+	proc_fs_set_newinstance(fs_info, false);
+
 	if (flags & SB_KERNMOUNT) {
 		ns = data;
 		data = NULL;
 	} else {
+		/* Parse early mount options if not a kernel mount */
+		error = proc_parse_early_options(data, fs_info);
+		if (error < 0)
+			goto error_fs_info;
+
 		ns = task_active_pid_ns(current);
 	}
 
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index f91a8bf..786ea04 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -39,8 +39,6 @@ struct pid_namespace {
 	struct user_namespace *user_ns;
 	struct ucounts *ucounts;
 	struct work_struct proc_work;
-	kgid_t pid_gid;
-	int hide_pid;
 	int reboot;	/* group exit code if this pidns was rebooted */
 	struct ns_common ns;
 } __randomize_layout;
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 047d0d0..408b51d 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -16,6 +16,9 @@ struct proc_fs_info {
 	struct pid_namespace *pid_ns;
 	struct dentry *proc_self; /* For /proc/self/ */
 	struct dentry *proc_thread_self; /* For /proc/thread-self/ */
+	bool newinstance; /* Flag for new separated instances */
+	kgid_t pid_gid;
+	int hide_pid;
 };
 
 #ifdef CONFIG_PROC_FS
@@ -27,22 +30,32 @@ static inline struct proc_fs_info *proc_sb(struct super_block *sb)
 
 static inline void proc_fs_set_hide_pid(struct proc_fs_info *fs_info, int hide_pid)
 {
-	fs_info->pid_ns->hide_pid = hide_pid;
+	fs_info->hide_pid = hide_pid;
 }
 
 static inline void proc_fs_set_pid_gid(struct proc_fs_info *fs_info, kgid_t gid)
 {
-	fs_info->pid_ns->pid_gid = gid;
+	fs_info->pid_gid = gid;
+}
+
+static inline void proc_fs_set_newinstance(struct proc_fs_info *fs_info, bool value)
+{
+	fs_info->newinstance = value;
 }
 
 static inline int proc_fs_hide_pid(struct proc_fs_info *fs_info)
 {
-	return fs_info->pid_ns->hide_pid;
+	return fs_info->hide_pid;
 }
 
 static inline kgid_t proc_fs_pid_gid(struct proc_fs_info *fs_info)
 {
-	return fs_info->pid_ns->pid_gid;
+	return fs_info->pid_gid;
+}
+
+static inline bool proc_fs_newinstance(struct proc_fs_info *fs_info)
+{
+	return fs_info->newinstance;
 }
 
 extern void proc_root_init(void);
@@ -89,6 +102,10 @@ static inline void proc_fs_set_pid_gid(struct proc_info_fs *fs_info, kgid_t gid)
 {
 }
 
+static inline void proc_fs_set_newinstance(struct proc_fs_info *fs_info, bool value)
+{
+}
+
 static inline int proc_fs_hide_pid(struct proc_fs_info *fs_info)
 {
 	return 0;
@@ -99,6 +116,11 @@ extern kgid_t proc_fs_pid_gid(struct proc_fs_info *fs_info)
 	return GLOBAL_ROOT_GID;
 }
 
+static inline bool proc_fs_newinstance(struct proc_fs_info *fs_info)
+{
+	return false;
+}
+
 extern inline struct proc_fs_info *proc_sb(struct super_block *sb) { return NULL;}
 static inline struct proc_dir_entry *proc_symlink(const char *name,
 		struct proc_dir_entry *parent,const char *dest) { return NULL;}
-- 
2.7.4


* [PATCH RFC v3 5/7] proc: move hidepid definitions to proc files
  2017-11-09 16:13 ` Djalal Harouni
  (?)
@ 2017-11-09 16:14   ` Djalal Harouni
  -1 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-09 16:14 UTC (permalink / raw)
  To: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	linux-fsdevel, linux-kernel, kernel-hardening,
	linux-security-module, linux-api
  Cc: Greg Kroah-Hartman, Alexander Viro, Akinobu Mita, me,
	Oleg Nesterov, Jeff Layton, Ingo Molnar, Alexey Dobriyan,
	ebiederm, Linus Torvalds, Daniel Micay, Jonathan Corbet, bfields,
	Stephen Rothwell, solar, Djalal Harouni

This moves the 'hidepid' definitions to the proc files. 'hidepid' is a
proc mount option, not really a per pid namespace value. The definitions
were there because the value was used inside PID namespaces. Now that we
have improved the proc logic and reduced the complexity and the ties
with PID namespaces, let's move this last bit to where it really
belongs.
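
For readers unfamiliar with the three levels, here is a small userspace
sketch of how the values can be interpreted. The enum values are the
ones moved by this patch; the helper names hidepid_valid() and
hidepid_hides_entries() are hypothetical and do not exist in the kernel.

```c
#include <stdbool.h>

enum { /* same values as the kernel's 'hidepid' definitions */
	HIDEPID_OFF	  = 0,	/* no restriction */
	HIDEPID_NO_ACCESS = 1,	/* other users' /proc/<pid>/ not accessible */
	HIDEPID_INVISIBLE = 2,	/* other users' /proc/<pid>/ not even listed */
};

/* A 'hidepid=' value is only meaningful inside the defined range. */
static bool hidepid_valid(int value)
{
	return value >= HIDEPID_OFF && value <= HIDEPID_INVISIBLE;
}

/* Only the strictest level removes entries from directory listings. */
static bool hidepid_hides_entries(int value)
{
	return value == HIDEPID_INVISIBLE;
}
```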

Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 include/linux/pid_namespace.h | 6 ------
 include/linux/proc_fs.h       | 6 ++++++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 786ea04..66f47f1 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -15,12 +15,6 @@
 
 struct fs_pin;
 
-enum { /* definitions for pid_namespace's hide_pid field */
-	HIDEPID_OFF	  = 0,
-	HIDEPID_NO_ACCESS = 1,
-	HIDEPID_INVISIBLE = 2,
-};
-
 struct pid_namespace {
 	struct kref kref;
 	struct idr idr;
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 408b51d..c123e5ec 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -12,6 +12,12 @@
 struct proc_dir_entry;
 struct pid_namespace;
 
+enum { /* definitions for 'hidepid' mount option */
+	HIDEPID_OFF	  = 0,
+	HIDEPID_NO_ACCESS = 1,
+	HIDEPID_INVISIBLE = 2,
+};
+
 struct proc_fs_info {
 	struct pid_namespace *pid_ns;
 	struct dentry *proc_self; /* For /proc/self/ */
-- 
2.7.4


* [PATCH RFC v3 6/7] proc: support new 'pids=all|ptraceable' mount option
  2017-11-09 16:13 ` Djalal Harouni
  (?)
@ 2017-11-09 16:14   ` Djalal Harouni
  -1 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-09 16:14 UTC (permalink / raw)
  To: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	linux-fsdevel, linux-kernel, kernel-hardening,
	linux-security-module, linux-api
  Cc: Greg Kroah-Hartman, Alexander Viro, Akinobu Mita, me,
	Oleg Nesterov, Jeff Layton, Ingo Molnar, Alexey Dobriyan,
	ebiederm, Linus Torvalds, Daniel Micay, Jonathan Corbet, bfields,
	Stephen Rothwell, solar, Djalal Harouni

This patch introduces the new 'pids' mount option, as discussed with
and suggested by Andy Lutomirski [1].

* If 'pids=' is passed without 'newinstance' then it has no effect.

* If 'newinstance,pids=all' is passed then all processes will be shown
  in proc.

* If 'newinstance,pids=ptraceable' is passed then only ptraceable
  processes will be shown.

* 'pids=' takes precedence over 'hidepid=' since 'hidepid=' can be
  ignored if "gid=" was set and the caller has that gid in its groups.
  We want to guarantee that LSMs have a security path there that cannot
  be disabled with "gid=".
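
The precedence rule above can be sketched as a pure function in
userspace. The control flow follows the has_pid_permissions() hunk in
this patch; struct access_query and its boolean fields are hypothetical
stand-ins for in_group_p() and ptrace_may_access(), which cannot be
called outside the kernel.

```c
#include <stdbool.h>

enum { HIDEPID_OFF = 0, HIDEPID_NO_ACCESS = 1, HIDEPID_INVISIBLE = 2 };
enum { PIDS_ALL = 0, PIDS_PTRACEABLE = 1 };

/* Hypothetical inputs standing in for kernel state. */
struct access_query {
	int pids;		/* 'pids=' value of the procfs instance */
	int hide_pid;		/* 'hidepid=' value of the procfs instance */
	int hide_pid_min;	/* level from which the caller restricts */
	bool in_gid_group;	/* models in_group_p(gid) */
	bool may_ptrace;	/* models ptrace_may_access(task, ...) */
};

static bool model_has_pid_permissions(const struct access_query *q)
{
	/* The 'hidepid' and 'gid' shortcuts only apply with pids=all. */
	if (q->pids == PIDS_ALL) {
		if (q->hide_pid < q->hide_pid_min)
			return true;
		if (q->in_gid_group)
			return true;
	}
	/* Otherwise the ptrace check always decides. */
	return q->may_ptrace;
}
```

With pids=ptraceable the "gid=" membership no longer grants access on
its own, which is the guarantee the last bullet asks for.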

This allows lightweight sandboxes to be supported in Embedded Linux.

Later the Yama LSM can be updated to check that processes are only
able to see their children inside /proc/, allowing tighter use cases
to be supported.

[1] https://lkml.org/lkml/2017/4/26/646

Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 fs/proc/base.c          | 36 +++++++++++++++++++++++++++++-------
 fs/proc/inode.c         |  6 +++++-
 fs/proc/root.c          | 20 ++++++++++++++++++--
 include/linux/proc_fs.h | 30 ++++++++++++++++++++++++++++++
 4 files changed, 82 insertions(+), 10 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 54b527c..88b92bc 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -686,13 +686,24 @@ static bool has_pid_permissions(struct proc_fs_info *fs_info,
 				 struct task_struct *task,
 				 int hide_pid_min)
 {
-	int hide_pid = proc_fs_hide_pid(fs_info);
-	kgid_t gid = proc_fs_pid_gid(fs_info);
+	int pids = proc_fs_pids(fs_info);
+
+	/*
+	 * If 'pids=all' or if it was not set then lets fallback
+	 * to 'hidepid' and 'gid', if those are not enforced too, then
+	 * ptrace checks are skipped. Otherwise ptrace permission is
+	 * required for all other cases.
+	 */
+	if (pids == PIDS_ALL) {
+		int hide_pid = proc_fs_hide_pid(fs_info);
+		kgid_t gid = proc_fs_pid_gid(fs_info);
+
+		if (hide_pid < hide_pid_min)
+			return true;
 
-	if (hide_pid < hide_pid_min)
-		return true;
-	if (in_group_p(gid))
-		return true;
+		if (in_group_p(gid))
+			return true;
+	}
 	return ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
 }
 
@@ -701,6 +712,7 @@ static int proc_pid_permission(struct inode *inode, int mask)
 {
 	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
 	int hide_pid = proc_fs_hide_pid(fs_info);
+	int pids = proc_fs_pids(fs_info);
 	struct task_struct *task;
 	bool has_perms;
 
@@ -711,7 +723,8 @@ static int proc_pid_permission(struct inode *inode, int mask)
 	put_task_struct(task);
 
 	if (!has_perms) {
-		if (hide_pid == HIDEPID_INVISIBLE) {
+		if (pids == PIDS_PTRACEABLE ||
+		    hide_pid == HIDEPID_INVISIBLE) {
 			/*
 			 * Let's make getdents(), stat(), and open()
 			 * consistent with each other.  If a process
@@ -3140,6 +3153,7 @@ struct dentry *proc_pid_lookup(struct inode *dir, struct dentry * dentry, unsign
 	unsigned tgid;
 	struct proc_fs_info *fs_info = proc_sb(dir->i_sb);
 	struct pid_namespace *ns = fs_info->pid_ns;
+	int pids = proc_fs_pids(fs_info);
 
 	tgid = name_to_int(&dentry->d_name);
 	if (tgid == ~0U)
@@ -3153,7 +3167,15 @@ struct dentry *proc_pid_lookup(struct inode *dir, struct dentry * dentry, unsign
 	if (!task)
 		goto out;
 
+	/* Limit procfs to only ptraceable tasks */
+	if (pids != PIDS_ALL) {
+		cond_resched();
+		if (!has_pid_permissions(fs_info, task, HIDEPID_NO_ACCESS))
+			goto out_put_task;
+	}
+
 	result = proc_pid_instantiate(dir, dentry, task, NULL);
+out_put_task:
 	put_task_struct(task);
 out:
 	return ERR_PTR(result);
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index faec32a..2707d5f 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -108,8 +108,12 @@ static int proc_show_options(struct seq_file *seq, struct dentry *root)
 	int hide_pid = proc_fs_hide_pid(fs_info);
 	kgid_t pid_gid = proc_fs_pid_gid(fs_info);
 
-	if (proc_fs_newinstance(fs_info))
+	if (proc_fs_newinstance(fs_info)) {
+		int pids = proc_fs_pids(fs_info);
+
 		seq_printf(seq, ",newinstance");
+		seq_printf(seq, ",pids=%s", pids == PIDS_ALL ? "all" : "ptraceable");
+	}
 
 	if (!gid_eq(pid_gid, GLOBAL_ROOT_GID))
 		seq_printf(seq, ",gid=%u", from_kgid_munged(current_user_ns(),pid_gid));
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 33ab965..5cdff69 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -28,13 +28,14 @@
 #include "internal.h"
 
 enum {
-	Opt_gid, Opt_hidepid, Opt_newinstance, Opt_err,
+	Opt_gid, Opt_hidepid, Opt_newinstance, Opt_pids, Opt_err,
 };
 
 static const match_table_t tokens = {
 	{Opt_hidepid, "hidepid=%u"},
 	{Opt_gid, "gid=%u"},
 	{Opt_newinstance, "newinstance"},
+	{Opt_pids, "pids=%s"},
 	{Opt_err, NULL},
 };
 
@@ -67,6 +68,7 @@ int proc_parse_early_options(char *options, struct proc_fs_info *fs_info)
 			break;
 		case Opt_gid:
 		case Opt_hidepid:
+		case Opt_pids:
 			break;
 		default:
 			pr_err("proc: unrecognized mount option \"%s\" "
@@ -83,7 +85,7 @@ int proc_parse_options(char *options, struct proc_fs_info *fs_info)
 {
 	char *p;
 	substring_t args[MAX_OPT_ARGS];
-	int option;
+	int option, ret = 0;
 	kgid_t gid;
 
 	if (!options)
@@ -119,6 +121,19 @@ int proc_parse_options(char *options, struct proc_fs_info *fs_info)
 			break;
 		case Opt_newinstance:
 			break;
+		case Opt_pids:
+			if (strcmp(args[0].from, "all") == 0)
+				ret = proc_fs_set_pids(fs_info, PIDS_ALL);
+			else if (strcmp(args[0].from, "ptraceable") == 0)
+				ret = proc_fs_set_pids(fs_info, PIDS_PTRACEABLE);
+			else
+				ret = -EINVAL;
+
+			if (ret < 0) {
+				pr_err("proc: invalid 'pids' mount option.\n");
+				return 0;
+			}
+			break;
 		default:
 			pr_err("proc: unrecognized mount option \"%s\" "
 			       "or missing value\n", p);
@@ -188,6 +203,7 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 
 	/* Set it as early as possible */
 	proc_fs_set_newinstance(fs_info, false);
+	proc_fs_set_pids(fs_info, PIDS_ALL);
 
 	if (flags & SB_KERNMOUNT) {
 		ns = data;
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index c123e5ec..0730f52 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -18,6 +18,11 @@ enum { /* definitions for 'hidepid' mount option */
 	HIDEPID_INVISIBLE = 2,
 };
 
+enum { /* definitions for 'pids' mount option */
+	PIDS_ALL	= 0,
+	PIDS_PTRACEABLE	= 1,
+};
+
 struct proc_fs_info {
 	struct pid_namespace *pid_ns;
 	struct dentry *proc_self; /* For /proc/self/ */
@@ -25,6 +30,7 @@ struct proc_fs_info {
 	bool newinstance; /* Flag for new separated instances */
 	kgid_t pid_gid;
 	int hide_pid;
+	int pids;
 };
 
 #ifdef CONFIG_PROC_FS
@@ -49,6 +55,16 @@ static inline void proc_fs_set_newinstance(struct proc_fs_info *fs_info, bool va
 	fs_info->newinstance = value;
 }
 
+static inline int proc_fs_set_pids(struct proc_fs_info *fs_info, int value)
+{
+	if (value != PIDS_ALL &&
+	    (value != PIDS_PTRACEABLE || !fs_info->newinstance))
+		return -EINVAL;
+
+	fs_info->pids = value;
+	return 0;
+}
+
 static inline int proc_fs_hide_pid(struct proc_fs_info *fs_info)
 {
 	return fs_info->hide_pid;
@@ -64,6 +80,11 @@ static inline bool proc_fs_newinstance(struct proc_fs_info *fs_info)
 	return fs_info->newinstance;
 }
 
+static inline int proc_fs_pids(struct proc_fs_info *fs_info)
+{
+	return fs_info->pids;
+}
+
 extern void proc_root_init(void);
 extern void proc_flush_task(struct task_struct *);
 
@@ -112,6 +133,10 @@ static inline void proc_fs_set_newinstance(struct proc_fs_info *fs_info, bool va
 {
 }
 
+static inline int proc_fs_set_pids(struct proc_fs_info *fs_info, int value)
+{
+}
+
 static inline int proc_fs_hide_pid(struct proc_fs_info *fs_info)
 {
 	return 0;
@@ -127,6 +152,11 @@ static inline bool proc_fs_newinstance(struct proc_fs_info *fs_info)
 	return false;
 }
 
+static inline int proc_fs_pids(struct proc_fs_info *fs_info)
+{
+	return 0;
+}
+
 extern inline struct proc_fs_info *proc_sb(struct super_block *sb) { return NULL;}
 static inline struct proc_dir_entry *proc_symlink(const char *name,
 		struct proc_dir_entry *parent,const char *dest) { return NULL;}
-- 
2.7.4


* [PATCH RFC v3 6/7] proc: support new 'pids=all|ptraceable' mount option
@ 2017-11-09 16:14   ` Djalal Harouni
  0 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-09 16:14 UTC (permalink / raw)
  To: linux-security-module

This patch introduces the new 'pids' mount option, as discussed with
and suggested by Andy Lutomirski [1].

* If 'pids=' is passed without 'newinstance' then it has no effect.

* If 'newinstance,pids=all' is passed then all processes will be shown
  in proc.

* If 'newinstance,pids=ptraceable' is passed then only ptraceable
  processes will be shown.

* 'pids=' takes precedence over 'hidepid=' since 'hidepid=' can be
  ignored if "gid=" was set and the caller has that gid in its groups.
  We want to guarantee that LSMs have a security path there that cannot
  be disabled with "gid=".

This allows lightweight sandboxes to be supported in Embedded Linux.

Later the Yama LSM can be updated to check that processes are only
able to see their children inside /proc/, allowing tighter use cases
to be supported.

[1] https://lkml.org/lkml/2017/4/26/646

Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 fs/proc/base.c          | 36 +++++++++++++++++++++++++++++-------
 fs/proc/inode.c         |  6 +++++-
 fs/proc/root.c          | 20 ++++++++++++++++++--
 include/linux/proc_fs.h | 30 ++++++++++++++++++++++++++++++
 4 files changed, 82 insertions(+), 10 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 54b527c..88b92bc 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -686,13 +686,24 @@ static bool has_pid_permissions(struct proc_fs_info *fs_info,
 				 struct task_struct *task,
 				 int hide_pid_min)
 {
-	int hide_pid = proc_fs_hide_pid(fs_info);
-	kgid_t gid = proc_fs_pid_gid(fs_info);
+	int pids = proc_fs_pids(fs_info);
+
+	/*
+	 * If 'pids=all' or if it was not set then lets fallback
+	 * to 'hidepid' and 'gid', if those are not enforced too, then
+	 * ptrace checks are skipped. Otherwise ptrace permission is
+	 * required for all other cases.
+	 */
+	if (pids == PIDS_ALL) {
+		int hide_pid = proc_fs_hide_pid(fs_info);
+		kgid_t gid = proc_fs_pid_gid(fs_info);
+
+		if (hide_pid < hide_pid_min)
+			return true;
 
-	if (hide_pid < hide_pid_min)
-		return true;
-	if (in_group_p(gid))
-		return true;
+		if (in_group_p(gid))
+			return true;
+	}
 	return ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
 }
 
@@ -701,6 +712,7 @@ static int proc_pid_permission(struct inode *inode, int mask)
 {
 	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
 	int hide_pid = proc_fs_hide_pid(fs_info);
+	int pids = proc_fs_pids(fs_info);
 	struct task_struct *task;
 	bool has_perms;
 
@@ -711,7 +723,8 @@ static int proc_pid_permission(struct inode *inode, int mask)
 	put_task_struct(task);
 
 	if (!has_perms) {
-		if (hide_pid == HIDEPID_INVISIBLE) {
+		if (pids == PIDS_PTRACEABLE ||
+		    hide_pid == HIDEPID_INVISIBLE) {
 			/*
 			 * Let's make getdents(), stat(), and open()
 			 * consistent with each other.  If a process
@@ -3140,6 +3153,7 @@ struct dentry *proc_pid_lookup(struct inode *dir, struct dentry * dentry, unsign
 	unsigned tgid;
 	struct proc_fs_info *fs_info = proc_sb(dir->i_sb);
 	struct pid_namespace *ns = fs_info->pid_ns;
+	int pids = proc_fs_pids(fs_info);
 
 	tgid = name_to_int(&dentry->d_name);
 	if (tgid == ~0U)
@@ -3153,7 +3167,15 @@ struct dentry *proc_pid_lookup(struct inode *dir, struct dentry * dentry, unsign
 	if (!task)
 		goto out;
 
+	/* Limit procfs to only ptraceable tasks */
+	if (pids != PIDS_ALL) {
+		cond_resched();
+		if (!has_pid_permissions(fs_info, task, HIDEPID_NO_ACCESS))
+			goto out_put_task;
+	}
+
 	result = proc_pid_instantiate(dir, dentry, task, NULL);
+out_put_task:
 	put_task_struct(task);
 out:
 	return ERR_PTR(result);
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index faec32a..2707d5f 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -108,8 +108,12 @@ static int proc_show_options(struct seq_file *seq, struct dentry *root)
 	int hide_pid = proc_fs_hide_pid(fs_info);
 	kgid_t pid_gid = proc_fs_pid_gid(fs_info);
 
-	if (proc_fs_newinstance(fs_info))
+	if (proc_fs_newinstance(fs_info)) {
+		int pids = proc_fs_pids(fs_info);
+
 		seq_printf(seq, ",newinstance");
+		seq_printf(seq, ",pids=%s", pids == PIDS_ALL ? "all" : "ptraceable");
+	}
 
 	if (!gid_eq(pid_gid, GLOBAL_ROOT_GID))
 		seq_printf(seq, ",gid=%u", from_kgid_munged(current_user_ns(),pid_gid));
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 33ab965..5cdff69 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -28,13 +28,14 @@
 #include "internal.h"
 
 enum {
-	Opt_gid, Opt_hidepid, Opt_newinstance, Opt_err,
+	Opt_gid, Opt_hidepid, Opt_newinstance, Opt_pids, Opt_err,
 };
 
 static const match_table_t tokens = {
 	{Opt_hidepid, "hidepid=%u"},
 	{Opt_gid, "gid=%u"},
 	{Opt_newinstance, "newinstance"},
+	{Opt_pids, "pids=%s"},
 	{Opt_err, NULL},
 };
 
@@ -67,6 +68,7 @@ int proc_parse_early_options(char *options, struct proc_fs_info *fs_info)
 			break;
 		case Opt_gid:
 		case Opt_hidepid:
+		case Opt_pids:
 			break;
 		default:
 			pr_err("proc: unrecognized mount option \"%s\" "
@@ -83,7 +85,7 @@ int proc_parse_options(char *options, struct proc_fs_info *fs_info)
 {
 	char *p;
 	substring_t args[MAX_OPT_ARGS];
-	int option;
+	int option, ret = 0;
 	kgid_t gid;
 
 	if (!options)
@@ -119,6 +121,19 @@ int proc_parse_options(char *options, struct proc_fs_info *fs_info)
 			break;
 		case Opt_newinstance:
 			break;
+		case Opt_pids:
+			if (strcmp(args[0].from, "all") == 0)
+				ret = proc_fs_set_pids(fs_info, PIDS_ALL);
+			else if (strcmp(args[0].from, "ptraceable") == 0)
+				ret = proc_fs_set_pids(fs_info, PIDS_PTRACEABLE);
+			else
+				ret = -EINVAL;
+
+			if (ret < 0) {
+				pr_err("proc: invalid 'pids' mount option.\n");
+				return 0;
+			}
+			break;
 		default:
 			pr_err("proc: unrecognized mount option \"%s\" "
 			       "or missing value\n", p);
@@ -188,6 +203,7 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 
 	/* Set it as early as possible */
 	proc_fs_set_newinstance(fs_info, false);
+	proc_fs_set_pids(fs_info, PIDS_ALL);
 
 	if (flags & SB_KERNMOUNT) {
 		ns = data;
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index c123e5ec..0730f52 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -18,6 +18,11 @@ enum { /* definitions for 'hidepid' mount option */
 	HIDEPID_INVISIBLE = 2,
 };
 
+enum { /* definitions for 'pids' mount option */
+	PIDS_ALL	= 0,
+	PIDS_PTRACEABLE	= 1,
+};
+
 struct proc_fs_info {
 	struct pid_namespace *pid_ns;
 	struct dentry *proc_self; /* For /proc/self/ */
@@ -25,6 +30,7 @@ struct proc_fs_info {
 	bool newinstance; /* Flag for new separated instances */
 	kgid_t pid_gid;
 	int hide_pid;
+	int pids;
 };
 
 #ifdef CONFIG_PROC_FS
@@ -49,6 +55,16 @@ static inline void proc_fs_set_newinstance(struct proc_fs_info *fs_info, bool va
 	fs_info->newinstance = value;
 }
 
+static inline int proc_fs_set_pids(struct proc_fs_info *fs_info, int value)
+{
+	if (value != PIDS_ALL &&
+	    (value != PIDS_PTRACEABLE || !fs_info->newinstance))
+		return -EINVAL;
+
+	fs_info->pids = value;
+	return 0;
+}
+
 static inline int proc_fs_hide_pid(struct proc_fs_info *fs_info)
 {
 	return fs_info->hide_pid;
@@ -64,6 +80,11 @@ static inline bool proc_fs_newinstance(struct proc_fs_info *fs_info)
 	return fs_info->newinstance;
 }
 
+static inline int proc_fs_pids(struct proc_fs_info *fs_info)
+{
+	return fs_info->pids;
+}
+
 extern void proc_root_init(void);
 extern void proc_flush_task(struct task_struct *);
 
@@ -112,6 +133,10 @@ static inline void proc_fs_set_newinstance(struct proc_fs_info *fs_info, bool va
 {
 }
 
+static inline int proc_fs_set_pids(struct proc_fs_info *fs_info, int value)
+{
+}
+
 static inline int proc_fs_hide_pid(struct proc_fs_info *fs_info)
 {
 	return 0;
@@ -127,6 +152,11 @@ static inline bool proc_fs_newinstance(struct proc_fs_info *fs_info)
 	return false;
 }
 
+static inline int proc_fs_pids(struct proc_fs_info *fs_info)
+{
+	return 0;
+}
+
 extern inline struct proc_fs_info *proc_sb(struct super_block *sb) { return NULL;}
 static inline struct proc_dir_entry *proc_symlink(const char *name,
 		struct proc_dir_entry *parent,const char *dest) { return NULL;}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [kernel-hardening] [PATCH RFC v3 6/7] proc: support new 'pids=all|ptraceable' mount option
@ 2017-11-09 16:14   ` Djalal Harouni
  0 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-09 16:14 UTC (permalink / raw)
  To: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	linux-fsdevel, linux-kernel, kernel-hardening,
	linux-security-module, linux-api
  Cc: Greg Kroah-Hartman, Alexander Viro, Akinobu Mita, me,
	Oleg Nesterov, Jeff Layton, Ingo Molnar, Alexey Dobriyan,
	ebiederm, Linus Torvalds, Daniel Micay, Jonathan Corbet, bfields,
	Stephen Rothwell, solar, Djalal Harouni

This patch introduces the new 'pids' mount option, as it was discussed
and suggested by Andy Lutomirski [1].

* If 'pids=' is passed without 'newinstance' then it has no effect.

* If 'newinstance,pids=all' then all processes will be shown in proc.

* If 'newinstance,pids=ptraceable' then only ptraceable processes will be
shown.

* 'pids=' takes precedence over 'hidepid=' since 'hidepid=' can be
  ignored if "gid=" was set and the caller has that "gid=" in its groups.
  We want to guarantee that LSMs have a security path there that cannot
  be disabled with "gid=".
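
As an illustration only, the precedence rules above can be modelled in
user space. This is a minimal sketch, not kernel code: struct mount_model
and model_has_pid_permissions() are hypothetical names, and the two bool
fields stand in for the results of in_group_p() and ptrace_may_access().
The branch structure follows has_pid_permissions() as changed by this
patch.

```c
#include <stdbool.h>

/* Values mirror the 'hidepid' and 'pids' enums used by this series. */
enum { HIDEPID_OFF = 0, HIDEPID_NO_ACCESS = 1, HIDEPID_INVISIBLE = 2 };
enum { PIDS_ALL = 0, PIDS_PTRACEABLE = 1 };

/* Hypothetical stand-in for the mount options plus the caller's state. */
struct mount_model {
	int pids;                /* PIDS_ALL or PIDS_PTRACEABLE */
	int hide_pid;            /* HIDEPID_* level of the mount */
	bool caller_in_gid;      /* models in_group_p(gid) */
	bool caller_can_ptrace;  /* models ptrace_may_access() */
};

/*
 * With pids=all, 'hidepid=' and 'gid=' can still grant access without a
 * ptrace check. With pids=ptraceable, both shortcuts are skipped and the
 * ptrace check always decides.
 */
bool model_has_pid_permissions(const struct mount_model *m, int hide_pid_min)
{
	if (m->pids == PIDS_ALL) {
		if (m->hide_pid < hide_pid_min)
			return true;
		if (m->caller_in_gid)
			return true;
	}
	return m->caller_can_ptrace;
}
```

In this model a caller that is a member of "gid=" but cannot ptrace the
target is granted access under pids=all and denied under pids=ptraceable,
which is the property the last bullet asks for.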

This makes it possible to support lightweight sandboxes on embedded Linux.
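
For illustration, a sandbox setup step might mount a private instance
roughly as follows. This is a sketch under assumptions: build_proc_opts()
and mount_private_proc() are hypothetical helpers, the option strings
only work on a kernel carrying this RFC, and the caller needs the
privileges required by mount(2). The -EINVAL case mirrors the check in
proc_fs_set_pids() in the diff below.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: build the procfs option string for this RFC.
 * Returns 0 on success, -EINVAL if 'ptraceable' is requested without
 * 'newinstance', or -ENAMETOOLONG if buf is too small.
 */
int build_proc_opts(char *buf, size_t len, bool newinstance, bool ptraceable)
{
	int n;

	if (ptraceable && !newinstance)
		return -EINVAL;

	n = snprintf(buf, len, "%s%s",
		     newinstance ? "newinstance," : "",
		     ptraceable ? "pids=ptraceable" : "pids=all");
	if (n < 0 || (size_t)n >= len)
		return -ENAMETOOLONG;
	return 0;
}

/* Hypothetical helper: mount a private procfs instance at 'target'. */
int mount_private_proc(const char *target, bool ptraceable)
{
	char opts[64];
	int ret = build_proc_opts(opts, sizeof(opts), true, ptraceable);

	if (ret < 0)
		return ret;
	/* Fails on kernels without this series (unrecognized option). */
	if (mount("proc", target, "proc", 0, opts) < 0)
		return -errno;
	return 0;
}
```

Keeping the string construction separate from the mount(2) call lets the
option handling be checked without privileges.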

Later, the Yama LSM can be updated to check that processes are only able
to see their children inside /proc/, which would support tighter
restrictions.

[1] https://lkml.org/lkml/2017/4/26/646

Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 fs/proc/base.c          | 36 +++++++++++++++++++++++++++++-------
 fs/proc/inode.c         |  6 +++++-
 fs/proc/root.c          | 20 ++++++++++++++++++--
 include/linux/proc_fs.h | 30 ++++++++++++++++++++++++++++++
 4 files changed, 82 insertions(+), 10 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 54b527c..88b92bc 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -686,13 +686,24 @@ static bool has_pid_permissions(struct proc_fs_info *fs_info,
 				 struct task_struct *task,
 				 int hide_pid_min)
 {
-	int hide_pid = proc_fs_hide_pid(fs_info);
-	kgid_t gid = proc_fs_pid_gid(fs_info);
+	int pids = proc_fs_pids(fs_info);
+
+	/*
+	 * If 'pids=all' or if it was not set then let's fall back
+	 * to 'hidepid' and 'gid', if those are not enforced too, then
+	 * ptrace checks are skipped. Otherwise ptrace permission is
+	 * required for all other cases.
+	 */
+	if (pids == PIDS_ALL) {
+		int hide_pid = proc_fs_hide_pid(fs_info);
+		kgid_t gid = proc_fs_pid_gid(fs_info);
+
+		if (hide_pid < hide_pid_min)
+			return true;
 
-	if (hide_pid < hide_pid_min)
-		return true;
-	if (in_group_p(gid))
-		return true;
+		if (in_group_p(gid))
+			return true;
+	}
 	return ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
 }
 
@@ -701,6 +712,7 @@ static int proc_pid_permission(struct inode *inode, int mask)
 {
 	struct proc_fs_info *fs_info = proc_sb(inode->i_sb);
 	int hide_pid = proc_fs_hide_pid(fs_info);
+	int pids = proc_fs_pids(fs_info);
 	struct task_struct *task;
 	bool has_perms;
 
@@ -711,7 +723,8 @@ static int proc_pid_permission(struct inode *inode, int mask)
 	put_task_struct(task);
 
 	if (!has_perms) {
-		if (hide_pid == HIDEPID_INVISIBLE) {
+		if (pids == PIDS_PTRACEABLE ||
+		    hide_pid == HIDEPID_INVISIBLE) {
 			/*
 			 * Let's make getdents(), stat(), and open()
 			 * consistent with each other.  If a process
@@ -3140,6 +3153,7 @@ struct dentry *proc_pid_lookup(struct inode *dir, struct dentry * dentry, unsign
 	unsigned tgid;
 	struct proc_fs_info *fs_info = proc_sb(dir->i_sb);
 	struct pid_namespace *ns = fs_info->pid_ns;
+	int pids = proc_fs_pids(fs_info);
 
 	tgid = name_to_int(&dentry->d_name);
 	if (tgid == ~0U)
@@ -3153,7 +3167,15 @@ struct dentry *proc_pid_lookup(struct inode *dir, struct dentry * dentry, unsign
 	if (!task)
 		goto out;
 
+	/* Limit procfs to only ptraceable tasks */
+	if (pids != PIDS_ALL) {
+		cond_resched();
+		if (!has_pid_permissions(fs_info, task, HIDEPID_NO_ACCESS))
+			goto out_put_task;
+	}
+
 	result = proc_pid_instantiate(dir, dentry, task, NULL);
+out_put_task:
 	put_task_struct(task);
 out:
 	return ERR_PTR(result);
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index faec32a..2707d5f 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -108,8 +108,12 @@ static int proc_show_options(struct seq_file *seq, struct dentry *root)
 	int hide_pid = proc_fs_hide_pid(fs_info);
 	kgid_t pid_gid = proc_fs_pid_gid(fs_info);
 
-	if (proc_fs_newinstance(fs_info))
+	if (proc_fs_newinstance(fs_info)) {
+		int pids = proc_fs_pids(fs_info);
+
 		seq_printf(seq, ",newinstance");
+		seq_printf(seq, ",pids=%s", pids == PIDS_ALL ? "all" : "ptraceable");
+	}
 
 	if (!gid_eq(pid_gid, GLOBAL_ROOT_GID))
 		seq_printf(seq, ",gid=%u", from_kgid_munged(current_user_ns(),pid_gid));
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 33ab965..5cdff69 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -28,13 +28,14 @@
 #include "internal.h"
 
 enum {
-	Opt_gid, Opt_hidepid, Opt_newinstance, Opt_err,
+	Opt_gid, Opt_hidepid, Opt_newinstance, Opt_pids, Opt_err,
 };
 
 static const match_table_t tokens = {
 	{Opt_hidepid, "hidepid=%u"},
 	{Opt_gid, "gid=%u"},
 	{Opt_newinstance, "newinstance"},
+	{Opt_pids, "pids=%s"},
 	{Opt_err, NULL},
 };
 
@@ -67,6 +68,7 @@ int proc_parse_early_options(char *options, struct proc_fs_info *fs_info)
 			break;
 		case Opt_gid:
 		case Opt_hidepid:
+		case Opt_pids:
 			break;
 		default:
 			pr_err("proc: unrecognized mount option \"%s\" "
@@ -83,7 +85,7 @@ int proc_parse_options(char *options, struct proc_fs_info *fs_info)
 {
 	char *p;
 	substring_t args[MAX_OPT_ARGS];
-	int option;
+	int option, ret = 0;
 	kgid_t gid;
 
 	if (!options)
@@ -119,6 +121,19 @@ int proc_parse_options(char *options, struct proc_fs_info *fs_info)
 			break;
 		case Opt_newinstance:
 			break;
+		case Opt_pids:
+			if (strcmp(args[0].from, "all") == 0)
+				ret = proc_fs_set_pids(fs_info, PIDS_ALL);
+			else if (strcmp(args[0].from, "ptraceable") == 0)
+				ret = proc_fs_set_pids(fs_info, PIDS_PTRACEABLE);
+			else
+				ret = -EINVAL;
+
+			if (ret < 0) {
+				pr_err("proc: invalid 'pids' mount option.\n");
+				return 0;
+			}
+			break;
 		default:
 			pr_err("proc: unrecognized mount option \"%s\" "
 			       "or missing value\n", p);
@@ -188,6 +203,7 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 
 	/* Set it as early as possible */
 	proc_fs_set_newinstance(fs_info, false);
+	proc_fs_set_pids(fs_info, PIDS_ALL);
 
 	if (flags & SB_KERNMOUNT) {
 		ns = data;
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index c123e5ec..0730f52 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -18,6 +18,11 @@ enum { /* definitions for 'hidepid' mount option */
 	HIDEPID_INVISIBLE = 2,
 };
 
+enum { /* definitions for 'pids' mount option */
+	PIDS_ALL	= 0,
+	PIDS_PTRACEABLE	= 1,
+};
+
 struct proc_fs_info {
 	struct pid_namespace *pid_ns;
 	struct dentry *proc_self; /* For /proc/self/ */
@@ -25,6 +30,7 @@ struct proc_fs_info {
 	bool newinstance; /* Flag for new separated instances */
 	kgid_t pid_gid;
 	int hide_pid;
+	int pids;
 };
 
 #ifdef CONFIG_PROC_FS
@@ -49,6 +55,16 @@ static inline void proc_fs_set_newinstance(struct proc_fs_info *fs_info, bool va
 	fs_info->newinstance = value;
 }
 
+static inline int proc_fs_set_pids(struct proc_fs_info *fs_info, int value)
+{
+	if (value != PIDS_ALL &&
+	    (value != PIDS_PTRACEABLE || !fs_info->newinstance))
+		return -EINVAL;
+
+	fs_info->pids = value;
+	return 0;
+}
+
 static inline int proc_fs_hide_pid(struct proc_fs_info *fs_info)
 {
 	return fs_info->hide_pid;
@@ -64,6 +80,11 @@ static inline bool proc_fs_newinstance(struct proc_fs_info *fs_info)
 	return fs_info->newinstance;
 }
 
+static inline int proc_fs_pids(struct proc_fs_info *fs_info)
+{
+	return fs_info->pids;
+}
+
 extern void proc_root_init(void);
 extern void proc_flush_task(struct task_struct *);
 
@@ -112,6 +133,10 @@ static inline void proc_fs_set_newinstance(struct proc_fs_info *fs_info, bool va
 {
 }
 
+static inline int proc_fs_set_pids(struct proc_fs_info *fs_info, int value)
+{
+}
+
 static inline int proc_fs_hide_pid(struct proc_fs_info *fs_info)
 {
 	return 0;
@@ -127,6 +152,11 @@ static inline bool proc_fs_newinstance(struct proc_fs_info *fs_info)
 	return false;
 }
 
+static inline int proc_fs_pids(struct proc_fs_info *fs_info)
+{
+	return 0;
+}
+
 extern inline struct proc_fs_info *proc_sb(struct super_block *sb) { return NULL;}
 static inline struct proc_dir_entry *proc_symlink(const char *name,
 		struct proc_dir_entry *parent,const char *dest) { return NULL;}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH RFC v3 7/7] proc: flush dcache entries from all procfs instances
@ 2017-11-09 16:14   ` Djalal Harouni
  0 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-09 16:14 UTC (permalink / raw)
  To: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	linux-fsdevel, linux-kernel, kernel-hardening,
	linux-security-module, linux-api
  Cc: Greg Kroah-Hartman, Alexander Viro, Akinobu Mita, me,
	Oleg Nesterov, Jeff Layton, Ingo Molnar, Alexey Dobriyan,
	ebiederm, Linus Torvalds, Daniel Micay, Jonathan Corbet, bfields,
	Stephen Rothwell, solar, Djalal Harouni

Flush dcache entries of a task when it terminates. The task may have
shown up in multiple procfs mounts per pid namespace, so we need to
walk the mounts and invalidate any leftover entries.

Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 fs/proc/base.c                | 27 +++++++++++++++++++-----
 fs/proc/inode.c               |  9 +++++++-
 fs/proc/root.c                | 10 +++++++++
 include/linux/pid_namespace.h | 49 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/proc_fs.h       |  2 ++
 5 files changed, 91 insertions(+), 6 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 88b92bc..27e52aa 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3034,7 +3034,8 @@ static const struct inode_operations proc_tgid_base_inode_operations = {
 	.permission	= proc_pid_permission,
 };
 
-static void proc_flush_task_mnt(struct vfsmount *mnt, pid_t pid, pid_t tgid)
+static void proc_flush_task_mnt_root(struct dentry *mnt_root,
+				     pid_t pid, pid_t tgid)
 {
 	struct dentry *dentry, *leader, *dir;
 	char buf[PROC_NUMBUF];
@@ -3043,7 +3044,7 @@ static void proc_flush_task_mnt(struct vfsmount *mnt, pid_t pid, pid_t tgid)
 	name.name = buf;
 	name.len = snprintf(buf, sizeof(buf), "%d", pid);
 	/* no ->d_hash() rejects on procfs */
-	dentry = d_hash_and_lookup(mnt->mnt_root, &name);
+	dentry = d_hash_and_lookup(mnt_root, &name);
 	if (dentry) {
 		d_invalidate(dentry);
 		dput(dentry);
@@ -3054,7 +3055,7 @@ static void proc_flush_task_mnt(struct vfsmount *mnt, pid_t pid, pid_t tgid)
 
 	name.name = buf;
 	name.len = snprintf(buf, sizeof(buf), "%d", tgid);
-	leader = d_hash_and_lookup(mnt->mnt_root, &name);
+	leader = d_hash_and_lookup(mnt_root, &name);
 	if (!leader)
 		goto out;
 
@@ -3109,14 +3110,30 @@ void proc_flush_task(struct task_struct *task)
 	int i;
 	struct pid *pid, *tgid;
 	struct upid *upid;
+	struct proc_fs_info *fs_info_entry;
+	struct pid_namespace *pid_ns;
+	struct dentry *mnt_root;
 
 	pid = task_pid(task);
 	tgid = task_tgid(task);
 
 	for (i = 0; i <= pid->level; i++) {
 		upid = &pid->numbers[i];
-		proc_flush_task_mnt(upid->ns->proc_mnt, upid->nr,
-					tgid->numbers[i].nr);
+		pid_ns = upid->ns;
+
+		pidns_proc_lock_shared(pid_ns);
+		list_for_each_entry(fs_info_entry, &pid_ns->procfs_mounts,
+				    pidns_entry) {
+			if (proc_fs_newinstance(fs_info_entry)) {
+				mnt_root = fs_info_entry->sb->s_root;
+				proc_flush_task_mnt_root(mnt_root, upid->nr,
+							 tgid->numbers[i].nr);
+			}
+		}
+		pidns_proc_unlock_shared(pid_ns);
+
+		mnt_root = pid_ns->proc_mnt->mnt_root;
+		proc_flush_task_mnt_root(mnt_root, upid->nr, tgid->numbers[i].nr);
 	}
 }
 
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 2707d5f..8fcf0d7 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -484,10 +484,17 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
 int proc_fill_super(struct super_block *s, void *data, int silent)
 {
 	struct proc_fs_info *fs_info = proc_sb(s);
+	struct pid_namespace *ns = get_pid_ns(fs_info->pid_ns);
 	struct inode *root_inode;
 	int ret;
 
-	get_pid_ns(fs_info->pid_ns);
+	fs_info->sb = s;
+
+	if (proc_fs_newinstance(fs_info)) {
+		pidns_proc_lock(ns);
+		list_add_tail(&fs_info->pidns_entry, &ns->procfs_mounts);
+		pidns_proc_unlock(ns);
+	}
 
 	if (!proc_parse_options(data, fs_info))
 		return -EINVAL;
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 5cdff69..5503799 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -259,6 +259,13 @@ static void proc_kill_sb(struct super_block *sb)
 		dput(fs_info->proc_self);
 	if (fs_info->proc_thread_self)
 		dput(fs_info->proc_thread_self);
+
+	if (proc_fs_newinstance(fs_info)) {
+		pidns_proc_lock(ns);
+		list_del(&fs_info->pidns_entry);
+		pidns_proc_unlock(ns);
+	}
+
 	kill_anon_super(sb);
 	put_pid_ns(ns);
 	kfree(fs_info);
@@ -374,6 +381,9 @@ int pid_ns_prepare_proc(struct pid_namespace *ns)
 		return PTR_ERR(mnt);
 
 	ns->proc_mnt = mnt;
+	init_rwsem(&ns->rw_procfs_mnts);
+	INIT_LIST_HEAD(&ns->procfs_mounts);
+
 	return 0;
 }
 
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 66f47f1..9a7a28d 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -26,6 +26,15 @@ struct pid_namespace {
 	struct pid_namespace *parent;
 #ifdef CONFIG_PROC_FS
 	struct vfsmount *proc_mnt; /* Internal proc mounted during each new pidns */
+
+	/* Serialize separated procfs access from super block create/destroy */
+	struct rw_semaphore rw_procfs_mnts;
+
+	/*
+	 * List of separated procfs mounts, used to invalidate task dentry
+	 * from all the related procfs mounts.
+	 */
+	struct list_head procfs_mounts;
 #endif
 #ifdef CONFIG_BSD_PROCESS_ACCT
 	struct fs_pin *bacct;
@@ -90,4 +99,44 @@ extern struct pid_namespace *task_active_pid_ns(struct task_struct *tsk);
 void pidhash_init(void);
 void pid_idr_init(void);
 
+#ifdef CONFIG_PROC_FS
+static inline void pidns_proc_lock(struct pid_namespace *pid_ns)
+{
+	down_write(&pid_ns->rw_procfs_mnts);
+}
+
+static inline void pidns_proc_unlock(struct pid_namespace *pid_ns)
+{
+	up_write(&pid_ns->rw_procfs_mnts);
+}
+
+static inline void pidns_proc_lock_shared(struct pid_namespace *pid_ns)
+{
+	down_read(&pid_ns->rw_procfs_mnts);
+}
+
+static inline void pidns_proc_unlock_shared(struct pid_namespace *pid_ns)
+{
+	up_read(&pid_ns->rw_procfs_mnts);
+}
+#else /* !CONFIG_PROC_FS */
+
+static inline void pidns_proc_lock(struct pid_namespace *pid_ns)
+{
+}
+
+static inline void pidns_proc_unlock(struct pid_namespace *pid_ns)
+{
+}
+
+static inline void pidns_proc_lock_shared(struct pid_namespace *pid_ns)
+{
+}
+
+static inline void pidns_proc_unlock_shared(struct pid_namespace *pid_ns)
+{
+}
+
+#endif /* CONFIG_PROC_FS */
+
 #endif /* _LINUX_PID_NS_H */
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 0730f52..e56fbab 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -24,7 +24,9 @@ enum { /* definitions for 'pids' mount option */
 };
 
 struct proc_fs_info {
+	struct super_block *sb;
 	struct pid_namespace *pid_ns;
+	struct list_head pidns_entry; /* Node in procfs_mounts of pidns */
 	struct dentry *proc_self; /* For /proc/self/ */
 	struct dentry *proc_thread_self; /* For /proc/thread-self/ */
 	bool newinstance; /* Flag for new separated instances */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH RFC v3 7/7] proc: flush dcache entries from all procfs instances
@ 2017-11-09 16:14   ` Djalal Harouni
  0 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-09 16:14 UTC (permalink / raw)
  To: linux-security-module

Flush dcache entries of a task when it terminates. The task may have
showed up in multiple procfs mounts per pid namespace, and we need to
walk the mounts and invalidate any left entires.

Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 fs/proc/base.c                | 27 +++++++++++++++++++-----
 fs/proc/inode.c               |  9 +++++++-
 fs/proc/root.c                | 10 +++++++++
 include/linux/pid_namespace.h | 49 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/proc_fs.h       |  2 ++
 5 files changed, 91 insertions(+), 6 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 88b92bc..27e52aa 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3034,7 +3034,8 @@ static const struct inode_operations proc_tgid_base_inode_operations = {
 	.permission	= proc_pid_permission,
 };
 
-static void proc_flush_task_mnt(struct vfsmount *mnt, pid_t pid, pid_t tgid)
+static void proc_flush_task_mnt_root(struct dentry *mnt_root,
+				     pid_t pid, pid_t tgid)
 {
 	struct dentry *dentry, *leader, *dir;
 	char buf[PROC_NUMBUF];
@@ -3043,7 +3044,7 @@ static void proc_flush_task_mnt(struct vfsmount *mnt, pid_t pid, pid_t tgid)
 	name.name = buf;
 	name.len = snprintf(buf, sizeof(buf), "%d", pid);
 	/* no ->d_hash() rejects on procfs */
-	dentry = d_hash_and_lookup(mnt->mnt_root, &name);
+	dentry = d_hash_and_lookup(mnt_root, &name);
 	if (dentry) {
 		d_invalidate(dentry);
 		dput(dentry);
@@ -3054,7 +3055,7 @@ static void proc_flush_task_mnt(struct vfsmount *mnt, pid_t pid, pid_t tgid)
 
 	name.name = buf;
 	name.len = snprintf(buf, sizeof(buf), "%d", tgid);
-	leader = d_hash_and_lookup(mnt->mnt_root, &name);
+	leader = d_hash_and_lookup(mnt_root, &name);
 	if (!leader)
 		goto out;
 
@@ -3109,14 +3110,30 @@ void proc_flush_task(struct task_struct *task)
 	int i;
 	struct pid *pid, *tgid;
 	struct upid *upid;
+	struct proc_fs_info *fs_info_entry;
+	struct pid_namespace *pid_ns;
+	struct dentry *mnt_root;
 
 	pid = task_pid(task);
 	tgid = task_tgid(task);
 
 	for (i = 0; i <= pid->level; i++) {
 		upid = &pid->numbers[i];
-		proc_flush_task_mnt(upid->ns->proc_mnt, upid->nr,
-					tgid->numbers[i].nr);
+		pid_ns = upid->ns;
+
+		pidns_proc_lock_shared(pid_ns);
+		list_for_each_entry(fs_info_entry, &pid_ns->procfs_mounts,
+				    pidns_entry) {
+			if (proc_fs_newinstance(fs_info_entry)) {
+				mnt_root = fs_info_entry->sb->s_root;
+				proc_flush_task_mnt_root(mnt_root, upid->nr,
+							 tgid->numbers[i].nr);
+			}
+		}
+		pidns_proc_unlock_shared(pid_ns);
+
+		mnt_root = pid_ns->proc_mnt->mnt_root;
+		proc_flush_task_mnt_root(mnt_root, upid->nr, tgid->numbers[i].nr);
 	}
 }
 
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 2707d5f..8fcf0d7 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -484,10 +484,17 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
 int proc_fill_super(struct super_block *s, void *data, int silent)
 {
 	struct proc_fs_info *fs_info = proc_sb(s);
+	struct pid_namespace *ns = get_pid_ns(fs_info->pid_ns);
 	struct inode *root_inode;
 	int ret;
 
-	get_pid_ns(fs_info->pid_ns);
+	fs_info->sb = s;
+
+	if (proc_fs_newinstance(fs_info)) {
+		pidns_proc_lock(ns);
+		list_add_tail(&fs_info->pidns_entry, &ns->procfs_mounts);
+		pidns_proc_unlock(ns);
+	}
 
 	if (!proc_parse_options(data, fs_info))
 		return -EINVAL;
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 5cdff69..5503799 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -259,6 +259,13 @@ static void proc_kill_sb(struct super_block *sb)
 		dput(fs_info->proc_self);
 	if (fs_info->proc_thread_self)
 		dput(fs_info->proc_thread_self);
+
+	if (proc_fs_newinstance(fs_info)) {
+		pidns_proc_lock(ns);
+		list_del(&fs_info->pidns_entry);
+		pidns_proc_unlock(ns);
+	}
+
 	kill_anon_super(sb);
 	put_pid_ns(ns);
 	kfree(fs_info);
@@ -374,6 +381,9 @@ int pid_ns_prepare_proc(struct pid_namespace *ns)
 		return PTR_ERR(mnt);
 
 	ns->proc_mnt = mnt;
+	init_rwsem(&ns->rw_procfs_mnts);
+	INIT_LIST_HEAD(&ns->procfs_mounts);
+
 	return 0;
 }
 
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 66f47f1..9a7a28d 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -26,6 +26,15 @@ struct pid_namespace {
 	struct pid_namespace *parent;
 #ifdef CONFIG_PROC_FS
 	struct vfsmount *proc_mnt; /* Internal proc mounted during each new pidns */
+
+	/* Serialize separated procfs access from super block create/destroy */
+	struct rw_semaphore rw_procfs_mnts;
+
+	/*
+	 * List of separated procfs mounts, used to invalidate task dentry
+	 * from all the related procfs mounts.
+	 */
+	struct list_head procfs_mounts;
 #endif
 #ifdef CONFIG_BSD_PROCESS_ACCT
 	struct fs_pin *bacct;
@@ -90,4 +99,44 @@ extern struct pid_namespace *task_active_pid_ns(struct task_struct *tsk);
 void pidhash_init(void);
 void pid_idr_init(void);
 
+#ifdef CONFIG_PROC_FS
+static inline void pidns_proc_lock(struct pid_namespace *pid_ns)
+{
+	down_write(&pid_ns->rw_procfs_mnts);
+}
+
+static inline void pidns_proc_unlock(struct pid_namespace *pid_ns)
+{
+	up_write(&pid_ns->rw_procfs_mnts);
+}
+
+static inline void pidns_proc_lock_shared(struct pid_namespace *pid_ns)
+{
+	down_read(&pid_ns->rw_procfs_mnts);
+}
+
+static inline void pidns_proc_unlock_shared(struct pid_namespace *pid_ns)
+{
+	up_read(&pid_ns->rw_procfs_mnts);
+}
+#else /* !CONFIG_PROC_FS */
+
+static inline void pidns_proc_lock(struct pid_namespace *pid_ns)
+{
+}
+
+static inline void pidns_proc_unlock(struct pid_namespace *pid_ns)
+{
+}
+
+static inline void pidns_proc_lock_shared(struct pid_namespace *pid_ns)
+{
+}
+
+static inline void pidns_proc_unlock_shared(struct pid_namespace *pid_ns)
+{
+}
+
+#endif /* CONFIG_PROC_FS */
+
 #endif /* _LINUX_PID_NS_H */
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 0730f52..e56fbab 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -24,7 +24,9 @@ enum { /* definitions for 'pids' mount option */
 };
 
 struct proc_fs_info {
+	struct super_block *sb;
 	struct pid_namespace *pid_ns;
+	struct list_head pidns_entry; /* Node in procfs_mounts of pidns */
 	struct dentry *proc_self; /* For /proc/self/ */
 	struct dentry *proc_thread_self; /* For /proc/thread-self/ */
 	bool newinstance; /* Flag for new separated instances */
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v3 6/7] proc: support new 'pids=all|ptraceable' mount option
  2017-11-09 16:14   ` Djalal Harouni
                       ` (2 preceding siblings ...)
  (?)
@ 2017-11-10  2:38     ` Andy Lutomirski
  -1 siblings, 0 replies; 71+ messages in thread
From: Andy Lutomirski @ 2017-11-10  2:38 UTC (permalink / raw)
  To: Djalal Harouni
  Cc: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	Linux FS Devel, linux-kernel, kernel-hardening, LSM List,
	Linux API, Greg Kroah-Hartman, Alexander Viro, Akinobu Mita, me,
	Oleg Nesterov, Jeff Layton, Ingo Molnar, Alexey Dobriyan,
	Eric W. Biederman, Linus Torvalds, Daniel Micay, Jonathan Corbet,
	J. Bruce Fields, Stephen Rothwell, Solar Designer

On Thu, Nov 9, 2017 at 8:14 AM, Djalal Harouni <tixxdz@gmail.com> wrote:
> This patch introduces the new 'pids' mount option, as it was discussed
> and suggested by Andy Lutomirski [1].
>
> * If 'pids=' is passed without 'newinstance' then it has no effect.

Would it be safer if this were an error instead?
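
To make the suggestion concrete, here is a minimal userspace sketch of a
parser that rejects 'pids=' when 'newinstance' is absent. It is not the
patch's proc_parse_options(): the names opts_model, tok_is and
parse_opts_strict are hypothetical, and the sketch only knows the two
options discussed here, so anything else (including 'hidepid=' and 'gid=')
is treated as unknown.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical parsed form of the options discussed in this thread. */
struct opts_model {
	bool newinstance;
	bool have_pids;
	bool pids_ptraceable;   /* false means pids=all */
};

/* True if the len-byte token at tok is exactly the string name. */
static bool tok_is(const char *tok, size_t len, const char *name)
{
	return strlen(name) == len && strncmp(tok, name, len) == 0;
}

/*
 * Parse a comma-separated option string such as "newinstance,pids=all".
 * Returns 0 on success, or -EINVAL for an unknown option, a bad 'pids='
 * value, or 'pids=' given without 'newinstance'.
 */
static int parse_opts_strict(const char *data, struct opts_model *o)
{
	const char *p = data;

	memset(o, 0, sizeof(*o));
	while (*p) {
		size_t len = strcspn(p, ",");

		if (len == 0) {
			/* Empty token such as ",,": skip it. */
		} else if (tok_is(p, len, "newinstance")) {
			o->newinstance = true;
		} else if (tok_is(p, len, "pids=all")) {
			o->have_pids = true;
			o->pids_ptraceable = false;
		} else if (tok_is(p, len, "pids=ptraceable")) {
			o->have_pids = true;
			o->pids_ptraceable = true;
		} else {
			return -EINVAL;
		}
		p += len;
		if (*p == ',')
			p++;
	}

	/* Checked after the loop so that option order does not matter. */
	if (o->have_pids && !o->newinstance)
		return -EINVAL;
	return 0;
}
```

With this behaviour a caller that passes "pids=ptraceable" alone gets
-EINVAL instead of a mount that silently shows every process.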

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v3 4/7] proc: support mounting private procfs instances inside same pid namespace
@ 2017-11-10  2:53     ` James Morris
  0 siblings, 0 replies; 71+ messages in thread
From: James Morris @ 2017-11-10  2:53 UTC (permalink / raw)
  To: Djalal Harouni
  Cc: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	linux-fsdevel, linux-kernel, kernel-hardening,
	linux-security-module, linux-api, Greg Kroah-Hartman,
	Alexander Viro, Akinobu Mita, me, Oleg Nesterov, Jeff Layton,
	Ingo Molnar, Alexey Dobriyan, ebiederm, Linus Torvalds,
	Daniel Micay, Jonathan Corbet, bfields, Stephen Rothwell, solar

On Thu, 9 Nov 2017, Djalal Harouni wrote:

> After real-world testing, this should allow a smooth transition
> to a procfs that uses private instances by default.
> 
> [1] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-January/004215.html
> [2] http://www.openwall.com/lists/kernel-hardening/2017/10/05/5
> [3] https://lwn.net/Articles/689539/
> [4] http://lxr.free-electrons.com/source/Documentation/filesystems/devpts.txt?v=3.14
> [5] https://lkml.org/lkml/2017/5/2/407
> [6] https://lkml.org/lkml/2017/5/3/357
> 
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Suggested-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
> Signed-off-by: Djalal Harouni <tixxdz@gmail.com>


Reviewed-by: James Morris <james.l.morris@oracle.com>


-- 
James Morris
<james.l.morris@oracle.com>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v3 6/7] proc: support new 'pids=all|ptraceable' mount option
@ 2017-11-10  2:56     ` James Morris
  0 siblings, 0 replies; 71+ messages in thread
From: James Morris @ 2017-11-10  2:56 UTC (permalink / raw)
  To: Djalal Harouni
  Cc: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	linux-fsdevel, linux-kernel, kernel-hardening,
	linux-security-module, linux-api, Greg Kroah-Hartman,
	Alexander Viro, Akinobu Mita, me, Oleg Nesterov, Jeff Layton,
	Ingo Molnar, Alexey Dobriyan, ebiederm, Linus Torvalds,
	Daniel Micay, Jonathan Corbet, bfields, Stephen Rothwell, solar

On Thu, 9 Nov 2017, Djalal Harouni wrote:

> This patch introduces the new 'pids' mount option, as it was discussed
> and suggested by Andy Lutomirski [1].
> 
> * If 'pids=' is passed without 'newinstance' then it has no effect.
> 
> * If 'newinstance,pids=all' then all processes will be shown in proc.
> 
> * If 'newinstance,pids=ptraceable' then only ptraceable processes will be
>   shown.
> 
> * 'pids=' takes precedence over 'hidepid=' since 'hidepid=' can be
>   ignored if "gid=" was set and the caller has the "gid=" group in its
>   groups. We want to guarantee that LSMs have a security path there that
>   cannot be disabled with "gid=".
> 
> This makes it possible to support lightweight sandboxes in Embedded Linux.
> 
> Later, the Yama LSM can be updated to check that processes are only
> able to see their children inside /proc/, which would support stricter
> cases.
> 
> [1] https://lkml.org/lkml/2017/4/26/646
> 
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Suggested-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
> Signed-off-by: Djalal Harouni <tixxdz@gmail.com>


Reviewed-by: James Morris <james.l.morris@oracle.com>


-- 
James Morris
<james.l.morris@oracle.com>
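
The precedence rule quoted above can be written down as a small decision
function. This is a sketch under stated assumptions, not the code from the
patch: the names pids_mode, visibility_query and pid_dir_listed are
hypothetical, the ptrace result is passed in as a plain flag, and the sketch
assumes that 'pids=all' leaves the existing 'hidepid=' logic in charge,
which the quoted description does not spell out.

```c
#include <stdbool.h>

/* Hypothetical encoding of the 'pids=' mount option. */
enum pids_mode { PIDS_UNSET, PIDS_ALL, PIDS_PTRACEABLE };

/* Everything the decision depends on, gathered in one place. */
struct visibility_query {
	enum pids_mode pids;     /* 'pids=' value, honoured with newinstance */
	int hidepid;             /* 0, 1 or 2, as for the 'hidepid=' option */
	bool caller_in_gid;      /* caller is a member of the 'gid=' group */
	bool caller_can_ptrace;  /* outcome of the ptrace access check */
};

/* Should the target's /proc/<pid> directory be listed for this caller? */
static bool pid_dir_listed(const struct visibility_query *q)
{
	/*
	 * 'pids=ptraceable' is evaluated first, so membership in the
	 * 'gid=' group cannot bypass it.
	 */
	if (q->pids == PIDS_PTRACEABLE)
		return q->caller_can_ptrace;

	/* Otherwise fall back to the 'hidepid=' rules (assumption for pids=all). */
	if (q->hidepid < 2)
		return true;
	return q->caller_in_gid || q->caller_can_ptrace;
}
```

The first branch is the point of the rule: a caller in the 'gid=' group
still sees nothing it cannot ptrace once 'pids=ptraceable' is set.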

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v3 1/7] proc: add proc_fs_info struct to store proc information
  2017-11-09 16:14   ` Djalal Harouni
  (?)
  (?)
@ 2017-11-10 10:26     ` Alexey Dobriyan
  -1 siblings, 0 replies; 71+ messages in thread
From: Alexey Dobriyan @ 2017-11-10 10:26 UTC (permalink / raw)
  To: Djalal Harouni
  Cc: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	linux-fsdevel, linux-kernel, kernel-hardening,
	linux-security-module, linux-api, Greg Kroah-Hartman,
	Alexander Viro, Akinobu Mita, me, Oleg Nesterov, Jeff Layton,
	Ingo Molnar, ebiederm, Linus Torvalds, Daniel Micay,
	Jonathan Corbet, bfields, Stephen Rothwell, solar

On 11/9/17, Djalal Harouni <tixxdz@gmail.com> wrote:

> +struct proc_fs_info {
> +	struct pid_namespace *pid_ns;
> +};

> +static inline struct proc_fs_info *proc_sb(struct super_block *sb)
> +{
> +	return sb->s_fs_info;
> +}

Can you rename this to "struct proc_super_block *" then?
That "info" suffix all over filesystems doesn't add much info itself,
just more typing.
Ditto for "fs_info" identifiers.

> +extern inline struct proc_fs_info *proc_sb(struct super_block *sb)
> { return NULL;}

extern inline?
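
The question presumably refers to the usual convention for stubs in
headers, which is "static inline": the meaning of "extern inline" differs
between the GNU89 and C99 inline models, while "static inline" gives every
including file its own private copy under both. A minimal sketch of that
convention follows. The struct definitions are simplified stand-ins for the
kernel types, and PROC_FS_ENABLED is a made-up macro playing the role of
CONFIG_PROC_FS.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types; only what the sketch needs. */
struct proc_fs_info {
	int placeholder;
};

struct super_block {
	void *s_fs_info;
};

#ifdef PROC_FS_ENABLED
/* Real accessor: the per-superblock data lives in s_fs_info. */
static inline struct proc_fs_info *proc_sb(struct super_block *sb)
{
	return sb->s_fs_info;
}
#else
/*
 * Stub for builds without procfs. Being static inline, it exports no
 * symbol, so including the header from many files cannot cause a
 * duplicate definition at link time.
 */
static inline struct proc_fs_info *proc_sb(struct super_block *sb)
{
	(void)sb;
	return NULL;
}
#endif
```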

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v3 2/7] proc: move /proc/{self|thread-self} dentries to proc_fs_info
  2017-11-09 16:14   ` Djalal Harouni
  (?)
  (?)
@ 2017-11-10 10:31     ` Alexey Dobriyan
  -1 siblings, 0 replies; 71+ messages in thread
From: Alexey Dobriyan @ 2017-11-10 10:31 UTC (permalink / raw)
  To: Djalal Harouni
  Cc: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	linux-fsdevel, linux-kernel, kernel-hardening,
	linux-security-module, linux-api, Greg Kroah-Hartman,
	Alexander Viro, Akinobu Mita, me, Oleg Nesterov, Jeff Layton,
	Ingo Molnar, ebiederm, Linus Torvalds, Daniel Micay,
	Jonathan Corbet, bfields, Stephen Rothwell, solar

On 11/9/17, Djalal Harouni <tixxdz@gmail.com> wrote:

>  struct proc_fs_info {
>  	struct pid_namespace *pid_ns;
> +	struct dentry *proc_self; /* For /proc/self/ */
> +	struct dentry *proc_thread_self; /* For /proc/thread-self/ */

These are redundant comments.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v3 1/7] proc: add proc_fs_info struct to store proc information
  2017-11-10 10:26     ` Alexey Dobriyan
  (?)
  (?)
@ 2017-11-10 10:32       ` Djalal Harouni
  -1 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-10 10:32 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	Linux FS Devel, linux-kernel, kernel-hardening, LSM List,
	Linux API, Greg Kroah-Hartman, Alexander Viro, Akinobu Mita,
	Tobin C. Harding, Oleg Nesterov, Jeff Layton, Ingo Molnar,
	Eric W. Biederman, Linus Torvalds, Daniel Micay, Jonathan Corbet,
	J. Bruce Fields, Stephen Rothwell, Solar Designer

On Fri, Nov 10, 2017 at 11:26 AM, Alexey Dobriyan <adobriyan@gmail.com> wrote:
> On 11/9/17, Djalal Harouni <tixxdz@gmail.com> wrote:
>
>> +struct proc_fs_info {
>> +     struct pid_namespace *pid_ns;
>> +};
>
>> +static inline struct proc_fs_info *proc_sb(struct super_block *sb)
>> +{
>> +     return sb->s_fs_info;
>> +}
>
> Can you rename this to "struct proc_super_block *" then?
> That "info" suffix all over filesystems doesn't add much info itself
> just more typing.
> Ditto for "fs_info" identifiers.

Ok, will do.

>> +extern inline struct proc_fs_info *proc_sb(struct super_block *sb)
>> { return NULL;}
>
> extern inline?

Oops, sorry, will fix it and try to compile without proc.

Thank you!

-- 
tixxdz

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v3 4/7] proc: support mounting private procfs instances inside same pid namespace
  2017-11-10  2:53     ` James Morris
  (?)
  (?)
@ 2017-11-10 10:33       ` Djalal Harouni
  -1 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-10 10:33 UTC (permalink / raw)
  To: James Morris
  Cc: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	Linux FS Devel, linux-kernel, kernel-hardening, LSM List,
	Linux API, Greg Kroah-Hartman, Alexander Viro, Akinobu Mita,
	Tobin C. Harding, Oleg Nesterov, Jeff Layton, Ingo Molnar,
	Alexey Dobriyan, Eric W. Biederman, Linus Torvalds, Daniel Micay,
	Jonathan Corbet, J. Bruce Fields, Stephen Rothwell,
	Solar Designer

On Fri, Nov 10, 2017 at 3:53 AM, James Morris <james.l.morris@oracle.com> wrote:
> On Thu, 9 Nov 2017, Djalal Harouni wrote:
>
>> This should allow, after real testing, a smooth transition
>> to a procfs with default private instances.
>>
>> [1] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-January/004215.html
>> [2] http://www.openwall.com/lists/kernel-hardening/2017/10/05/5
>> [3] https://lwn.net/Articles/689539/
>> [4] http://lxr.free-electrons.com/source/Documentation/filesystems/devpts.txt?v=3.14
>> [5] https://lkml.org/lkml/2017/5/2/407
>> [6] https://lkml.org/lkml/2017/5/3/357
>>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Suggested-by: Andy Lutomirski <luto@kernel.org>
>> Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
>> Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
>
>
> Reviewed-by: James Morris <james.l.morris@oracle.com>

Thank you James!


-- 
tixxdz

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v3 3/7] proc: add helpers to set and get proc hidepid and gid mount options
  2017-11-09 16:14   ` Djalal Harouni
  (?)
  (?)
@ 2017-11-10 10:36     ` Alexey Dobriyan
  -1 siblings, 0 replies; 71+ messages in thread
From: Alexey Dobriyan @ 2017-11-10 10:36 UTC (permalink / raw)
  To: Djalal Harouni
  Cc: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	linux-fsdevel, linux-kernel, kernel-hardening,
	linux-security-module, linux-api, Greg Kroah-Hartman,
	Alexander Viro, Akinobu Mita, me, Oleg Nesterov, Jeff Layton,
	Ingo Molnar, ebiederm, Linus Torvalds, Daniel Micay,
	Jonathan Corbet, bfields, Stephen Rothwell, solar

On 11/9/17, Djalal Harouni <tixxdz@gmail.com> wrote:
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c

> -static bool has_pid_permissions(struct pid_namespace *pid,
> +static bool has_pid_permissions(struct proc_fs_info *fs_info,

More "const".

> diff --git a/fs/proc/inode.c b/fs/proc/inode.c
> index 9abc370..bdd808d 100644
> --- a/fs/proc/inode.c
> +++ b/fs/proc/inode.c
> @@ -476,11 +476,12 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
>  int proc_fill_super(struct super_block *s, void *data, int silent)
>  {
>  	struct proc_fs_info *fs_info = proc_sb(s);
> -	struct pid_namespace *ns = get_pid_ns(fs_info->pid_ns);
>  	struct inode *root_inode;
>  	int ret;
>
> -	if (!proc_parse_options(data, ns))
> +	get_pid_ns(fs_info->pid_ns);
> +
> +	if (!proc_parse_options(data, fs_info))
>  		return -EINVAL;
>
>  	/* User space would break if executables or devices appear on proc */
> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> index 4a67188..10bc7be 100644
> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -240,7 +240,7 @@ static inline void proc_tty_init(void) {}
>   * root.c
>   */
>  extern struct proc_dir_entry proc_root;
> -extern int proc_parse_options(char *options, struct pid_namespace *pid);
> +extern int proc_parse_options(char *options, struct proc_fs_info *fs_info);

"extern" can be dropped if you're touching prototype anyway.



> +static inline int proc_fs_hide_pid(struct proc_fs_info *fs_info)
> +{
> +	return fs_info->pid_ns->hide_pid;
> +}
> +
> +static inline kgid_t proc_fs_pid_gid(struct proc_fs_info *fs_info)
> +{
> +	return fs_info->pid_ns->pid_gid;
> +}

More "const".

> @@ -59,6 +81,24 @@ static inline void proc_flush_task(struct task_struct *task)
>  {
>  }
>
> +static inline void proc_fs_set_hide_pid(struct proc_fs_info *fs_info, int hide_pid)
> +{
> +}
> +
> +static inline void proc_fs_set_pid_gid(struct proc_info_fs *fs_info, kgid_t gid)
> +{
> +}
> +
> +static inline int proc_fs_hide_pid(struct proc_fs_info *fs_info)
> +{
> +	return 0;
> +}
> +
> +extern kgid_t proc_fs_pid_gid(struct proc_fs_info *fs_info)

ehh?

> +{
> +	return GLOBAL_ROOT_GID;
> +}
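
The "const" and "ehh?" remarks above can be sketched together as follows. This is a userspace approximation, not the patch: `kgid_t`, `GLOBAL_ROOT_GID` and the two structures are simplified mocks, and `proc_fs_pid_gid_stub` is a hypothetical name used only so the stub can sit next to the real getter in one file. Note that the quoted hunk also spells one parameter type `struct proc_info_fs`, which looks like a typo for `struct proc_fs_info`.

```c
/* Mock types standing in for the kernel definitions (illustration only). */
typedef struct { unsigned int val; } kgid_t;
#define GLOBAL_ROOT_GID ((kgid_t){ 0 })

struct pid_namespace { int hide_pid; kgid_t pid_gid; };
struct proc_fs_info { struct pid_namespace *pid_ns; };

/* Read-only accessors take a pointer to const. */
static inline int proc_fs_hide_pid(const struct proc_fs_info *fs_info)
{
	return fs_info->pid_ns->hide_pid;
}

static inline kgid_t proc_fs_pid_gid(const struct proc_fs_info *fs_info)
{
	return fs_info->pid_ns->pid_gid;
}

/*
 * Stub form for a build without procfs: a function with a body in a
 * header is declared static inline rather than extern.
 */
static inline kgid_t proc_fs_pid_gid_stub(const struct proc_fs_info *fs_info)
{
	(void)fs_info;
	return GLOBAL_ROOT_GID;
}
```

The `const` here only promises that the getter does not modify the `proc_fs_info` object itself; the pointed-to `pid_namespace` is still read through a plain pointer member.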

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v3 6/7] proc: support new 'pids=all|ptraceable' mount option
  2017-11-10  2:38     ` Andy Lutomirski
                         ` (2 preceding siblings ...)
  (?)
@ 2017-11-10 10:38       ` Djalal Harouni
  -1 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-10 10:38 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Alexey Gladkov, Andrew Morton, Linux FS Devel,
	linux-kernel, kernel-hardening, LSM List, Linux API,
	Greg Kroah-Hartman, Alexander Viro, Akinobu Mita,
	Tobin C. Harding, Oleg Nesterov, Jeff Layton, Ingo Molnar,
	Alexey Dobriyan, Eric W. Biederman, Linus Torvalds, Daniel Micay,
	Jonathan Corbet, J. Bruce Fields, Stephen Rothwell,
	Solar Designer

On Fri, Nov 10, 2017 at 3:38 AM, Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Nov 9, 2017 at 8:14 AM, Djalal Harouni <tixxdz@gmail.com> wrote:
>> This patch introduces the new 'pids' mount option, as it was discussed
>> and suggested by Andy Lutomirski [1].
>>
>> * If 'pids=' is passed without 'newinstance' then it has no effect.
>
> Would it be safer if this were an error instead?

Hm, I tend to agree, but I also have in mind your earlier comment
that "newinstance" should become the default later, so that users
won't have to pass it explicitly. What do you think?
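
One possible shape for the stricter behaviour suggested above, as a hedged sketch only: `check_proc_options` is a hypothetical userspace helper, not the parser from this series, and it assumes options arrive as a comma-separated string.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: scan a comma-separated option string and reject
 * "pids=" unless "newinstance" is also present.
 * Returns 0 on success, -EINVAL for the rejected combination.
 */
static int check_proc_options(const char *options)
{
	bool has_pids = false, has_newinstance = false;
	const char *p = options;

	while (p && *p) {
		/* Length of the current token, up to the next comma or the end. */
		size_t len = strcspn(p, ",");

		if (len >= 5 && strncmp(p, "pids=", 5) == 0)
			has_pids = true;
		else if (len == 11 && strncmp(p, "newinstance", 11) == 0)
			has_newinstance = true;

		p += len;
		if (*p == ',')
			p++;
	}

	if (has_pids && !has_newinstance)
		return -EINVAL;
	return 0;
}
```

If "newinstance" later became the default, the final check would simply be dropped, which is the trade-off raised in the reply above.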

-- 
tixxdz

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v3 6/7] proc: support new 'pids=all|ptraceable' mount option
@ 2017-11-10 10:38       ` Djalal Harouni
  0 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-10 10:38 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Alexey Gladkov, Andrew Morton, Linux FS Devel,
	linux-kernel, kernel-hardening, LSM List, Linux API,
	Greg Kroah-Hartman, Alexander Viro, Akinobu Mita,
	Tobin C. Harding, Oleg Nesterov, Jeff Layton, Ingo Molnar,
	Alexey Dobriyan, Eric W. Biederman, Linus Torvalds, Daniel Micay,
	Jonathan Corbet, J. Bruce Fields, Stephen Rothwell,
	Solar Designer

On Fri, Nov 10, 2017 at 3:38 AM, Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Nov 9, 2017 at 8:14 AM, Djalal Harouni <tixxdz@gmail.com> wrote:
>> This patch introduces the new 'pids' mount option, as it was discussed
>> and suggested by Andy Lutomirski [1].
>>
>> * If 'pids=' is passed without 'newinstance' then it has no effect.
>
> Would it be safer this were an error instead?

Hm, I tend to say that you are right, but I also keep your comment
when you said that "newinstance" should be the default later and users
won't have to explicitly pass it. What you think ?

-- 
tixxdz

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v3 3/7] proc: add helpers to set and get proc hidepid and gid mount options
  2017-11-10 10:36     ` Alexey Dobriyan
  (?)
  (?)
@ 2017-11-10 10:41       ` Djalal Harouni
  -1 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-10 10:41 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	Linux FS Devel, linux-kernel, kernel-hardening, LSM List,
	Linux API, Greg Kroah-Hartman, Alexander Viro, Akinobu Mita,
	Tobin C. Harding, Oleg Nesterov, Jeff Layton, Ingo Molnar,
	Eric W. Biederman, Linus Torvalds, Daniel Micay, Jonathan Corbet,
	J. Bruce Fields, Stephen Rothwell, Solar Designer

On Fri, Nov 10, 2017 at 11:36 AM, Alexey Dobriyan <adobriyan@gmail.com> wrote:
> On 11/9/17, Djalal Harouni <tixxdz@gmail.com> wrote:
>> --- a/fs/proc/base.c
>> +++ b/fs/proc/base.c
>
>> -static bool has_pid_permissions(struct pid_namespace *pid,
>> +static bool has_pid_permissions(struct proc_fs_info *fs_info,
>
> More "const".
>
>> diff --git a/fs/proc/inode.c b/fs/proc/inode.c
>> index 9abc370..bdd808d 100644
>> --- a/fs/proc/inode.c
>> +++ b/fs/proc/inode.c
>> @@ -476,11 +476,12 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
>>  int proc_fill_super(struct super_block *s, void *data, int silent)
>>  {
>>       struct proc_fs_info *fs_info = proc_sb(s);
>> -     struct pid_namespace *ns = get_pid_ns(fs_info->pid_ns);
>>       struct inode *root_inode;
>>       int ret;
>>
>> -     if (!proc_parse_options(data, ns))
>> +     get_pid_ns(fs_info->pid_ns);
>> +
>> +     if (!proc_parse_options(data, fs_info))
>>               return -EINVAL;
>>
>>       /* User space would break if executables or devices appear on proc */
>> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
>> index 4a67188..10bc7be 100644
>> --- a/fs/proc/internal.h
>> +++ b/fs/proc/internal.h
>> @@ -240,7 +240,7 @@ static inline void proc_tty_init(void) {}
>>   * root.c
>>   */
>>  extern struct proc_dir_entry proc_root;
>> -extern int proc_parse_options(char *options, struct pid_namespace *pid);
>> +extern int proc_parse_options(char *options, struct proc_fs_info *fs_info);
>
> "extern" can be dropped if you're touching prototype anyway.
>
>
>
>> +static inline int proc_fs_hide_pid(struct proc_fs_info *fs_info)
>> +{
>> +     return fs_info->pid_ns->hide_pid;
>> +}
>> +
>> +static inline kgid_t proc_fs_pid_gid(struct proc_fs_info *fs_info)
>> +{
>> +     return fs_info->pid_ns->pid_gid;
>> +}
>
> More "const".
>
>> @@ -59,6 +81,24 @@ static inline void proc_flush_task(struct task_struct *task)
>>  {
>>  }
>>
>> +static inline void proc_fs_set_hide_pid(struct proc_fs_info *fs_info, int hide_pid)
>> +{
>> +}
>> +
>> +static inline void proc_fs_set_pid_gid(struct proc_info_fs *fs_info, kgid_t gid)
>> +{
>> +}
>> +
>> +static inline int proc_fs_hide_pid(struct proc_fs_info *fs_info)
>> +{
>> +     return 0;
>> +}
>> +
>> +extern kgid_t proc_fs_pid_gid(struct proc_fs_info *fs_info)
>
> ehh?

Ouch, a copy/paste error. I will make sure it compiles without proc
support.

I will fix the "const" and the other comments too, thank you!
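
For reference, a minimal sketch of what the const-qualified getters and
their fallback stubs could look like. This is a userspace model under
stated assumptions, not the next revision of the patch: kgid_t, struct
pid_namespace and struct proc_fs_info are reduced stand-ins for the
kernel types, PROC_FS_MODEL is a hypothetical macro standing in for
CONFIG_PROC_FS, and the zero gid returned by the stub is an assumption.

```c
#include <assert.h>

/* Reduced stand-ins for the kernel types so the sketch builds in userspace. */
typedef struct { unsigned int val; } kgid_t;

struct pid_namespace {
	int hide_pid;
	kgid_t pid_gid;
};

struct proc_fs_info {
	struct pid_namespace *pid_ns;
};

/* Hypothetical switch standing in for CONFIG_PROC_FS; set to 0 for the stubs. */
#define PROC_FS_MODEL 1

#if PROC_FS_MODEL
/* Getters only read through fs_info, so the pointer can be const. */
static inline int proc_fs_hide_pid(const struct proc_fs_info *fs_info)
{
	assert(fs_info && fs_info->pid_ns);
	return fs_info->pid_ns->hide_pid;
}

static inline kgid_t proc_fs_pid_gid(const struct proc_fs_info *fs_info)
{
	assert(fs_info && fs_info->pid_ns);
	return fs_info->pid_ns->pid_gid;
}
#else
/* Stubs: same prototypes, each with a body, so both configurations build. */
static inline int proc_fs_hide_pid(const struct proc_fs_info *fs_info)
{
	(void)fs_info;
	return 0;
}

static inline kgid_t proc_fs_pid_gid(const struct proc_fs_info *fs_info)
{
	(void)fs_info;
	return (kgid_t){ 0 }; /* assumed default for the model */
}
#endif
```

Keeping both branches as static inline definitions with identical
prototypes means a build with the option disabled exercises the same
signatures, which is the kind of mistake the "ehh?" above points at.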


-- 
tixxdz

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH RFC v3 2/7] proc: move /proc/{self|thread-self} dentries to proc_fs_info
  2017-11-10 10:31     ` Alexey Dobriyan
  (?)
  (?)
@ 2017-11-10 10:45       ` Djalal Harouni
  -1 siblings, 0 replies; 71+ messages in thread
From: Djalal Harouni @ 2017-11-10 10:45 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Kees Cook, Alexey Gladkov, Andy Lutomirski, Andrew Morton,
	Linux FS Devel, linux-kernel, kernel-hardening, LSM List,
	Linux API, Greg Kroah-Hartman, Alexander Viro, Akinobu Mita,
	Tobin C. Harding, Oleg Nesterov, Jeff Layton, Ingo Molnar,
	Eric W. Biederman, Linus Torvalds, Daniel Micay, Jonathan Corbet,
	J. Bruce Fields, Stephen Rothwell, Solar Designer

On Fri, Nov 10, 2017 at 11:31 AM, Alexey Dobriyan <adobriyan@gmail.com> wrote:
> On 11/9/17, Djalal Harouni <tixxdz@gmail.com> wrote:
>
>>  struct proc_fs_info {
>>       struct pid_namespace *pid_ns;
>> +     struct dentry *proc_self; /* For /proc/self/ */
>> +     struct dentry *proc_thread_self; /* For /proc/thread-self/ */
>
> These are redundant comments.

I can remove them, but this area is so difficult and so sparsely
commented that I wouldn't mind a few extra comments...

Thanks!

-- 
tixxdz

^ permalink raw reply	[flat|nested] 71+ messages in thread

end of thread, other threads:[~2017-11-10 10:45 UTC | newest]

Thread overview: 71+ messages
2017-11-09 16:13 [PATCH RFC v3 0/7] proc: modernize proc to support multiple private instances Djalal Harouni
2017-11-09 16:13 ` [kernel-hardening] " Djalal Harouni
2017-11-09 16:13 ` Djalal Harouni
2017-11-09 16:14 ` [PATCH RFC v3 1/7] proc: add proc_fs_info struct to store proc information Djalal Harouni
2017-11-09 16:14   ` [kernel-hardening] " Djalal Harouni
2017-11-09 16:14   ` Djalal Harouni
2017-11-10 10:26   ` Alexey Dobriyan
2017-11-10 10:26     ` [kernel-hardening] " Alexey Dobriyan
2017-11-10 10:26     ` Alexey Dobriyan
2017-11-10 10:26     ` Alexey Dobriyan
2017-11-10 10:32     ` Djalal Harouni
2017-11-10 10:32       ` [kernel-hardening] " Djalal Harouni
2017-11-10 10:32       ` Djalal Harouni
2017-11-10 10:32       ` Djalal Harouni
2017-11-09 16:14 ` [PATCH RFC v3 2/7] proc: move /proc/{self|thread-self} dentries to proc_fs_info Djalal Harouni
2017-11-09 16:14   ` [kernel-hardening] " Djalal Harouni
2017-11-09 16:14   ` Djalal Harouni
2017-11-10 10:31   ` Alexey Dobriyan
2017-11-10 10:31     ` [kernel-hardening] " Alexey Dobriyan
2017-11-10 10:31     ` Alexey Dobriyan
2017-11-10 10:31     ` Alexey Dobriyan
2017-11-10 10:45     ` Djalal Harouni
2017-11-10 10:45       ` [kernel-hardening] " Djalal Harouni
2017-11-10 10:45       ` Djalal Harouni
2017-11-10 10:45       ` Djalal Harouni
2017-11-09 16:14 ` [PATCH RFC v3 3/7] proc: add helpers to set and get proc hidepid and gid mount options Djalal Harouni
2017-11-09 16:14   ` [kernel-hardening] " Djalal Harouni
2017-11-09 16:14   ` Djalal Harouni
2017-11-10 10:36   ` Alexey Dobriyan
2017-11-10 10:36     ` [kernel-hardening] " Alexey Dobriyan
2017-11-10 10:36     ` Alexey Dobriyan
2017-11-10 10:36     ` Alexey Dobriyan
2017-11-10 10:41     ` Djalal Harouni
2017-11-10 10:41       ` [kernel-hardening] " Djalal Harouni
2017-11-10 10:41       ` Djalal Harouni
2017-11-10 10:41       ` Djalal Harouni
2017-11-09 16:14 ` [PATCH RFC v3 4/7] proc: support mounting private procfs instances inside same pid namespace Djalal Harouni
2017-11-09 16:14   ` [kernel-hardening] " Djalal Harouni
2017-11-09 16:14   ` Djalal Harouni
2017-11-10  2:53   ` James Morris
2017-11-10  2:53     ` [kernel-hardening] " James Morris
2017-11-10  2:53     ` James Morris
2017-11-10  2:53     ` James Morris
2017-11-10 10:33     ` Djalal Harouni
2017-11-10 10:33       ` [kernel-hardening] " Djalal Harouni
2017-11-10 10:33       ` Djalal Harouni
2017-11-10 10:33       ` Djalal Harouni
2017-11-09 16:14 ` [PATCH RFC v3 5/7] proc: move hidepid definitions to proc files Djalal Harouni
2017-11-09 16:14   ` [kernel-hardening] " Djalal Harouni
2017-11-09 16:14   ` Djalal Harouni
2017-11-09 16:14 ` [PATCH RFC v3 6/7] proc: support new 'pids=all|ptraceable' mount option Djalal Harouni
2017-11-09 16:14   ` [kernel-hardening] " Djalal Harouni
2017-11-09 16:14   ` Djalal Harouni
2017-11-10  2:38   ` Andy Lutomirski
2017-11-10  2:38     ` [kernel-hardening] " Andy Lutomirski
2017-11-10  2:38     ` Andy Lutomirski
2017-11-10  2:38     ` Andy Lutomirski
2017-11-10  2:38     ` Andy Lutomirski
2017-11-10 10:38     ` Djalal Harouni
2017-11-10 10:38       ` [kernel-hardening] " Djalal Harouni
2017-11-10 10:38       ` Djalal Harouni
2017-11-10 10:38       ` Djalal Harouni
2017-11-10 10:38       ` Djalal Harouni
2017-11-10  2:56   ` James Morris
2017-11-10  2:56     ` [kernel-hardening] " James Morris
2017-11-10  2:56     ` James Morris
2017-11-10  2:56     ` James Morris
2017-11-09 16:14 ` [PATCH RFC v3 7/7] proc: flush dcache entries from all procfs instances Djalal Harouni
2017-11-09 16:14   ` [kernel-hardening] " Djalal Harouni
2017-11-09 16:14   ` Djalal Harouni
2017-11-09 16:14   ` Djalal Harouni