linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
       [not found] <20170806140425.20937-1-riel@redhat.com>
@ 2017-08-07 13:22 ` Michal Hocko
       [not found]   ` <20170807132257.GH32434-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2017-08-07 13:22 UTC (permalink / raw)
  To: riel
  Cc: linux-kernel, mike.kravetz, linux-mm, fweimer, colm, akpm,
	keescook, luto, wad, mingo, kirill, dave.hansen, linux-api

This is an user visible API so make sure you CC linux-api (added)

On Sun 06-08-17 10:04:23, Rik van Riel wrote:
> v2: fix MAP_SHARED case and kbuild warnings
> 
> Introduce MADV_WIPEONFORK semantics, which result in a VMA being
> empty in the child process after fork. This differs from MADV_DONTFORK
> in one important way.
> 
> If a child process accesses memory that was MADV_WIPEONFORK, it
> will get zeroes. The address ranges are still valid, they are just empty.
> 
> If a child process accesses memory that was MADV_DONTFORK, it will
> get a segmentation fault, since those address ranges are no longer
> valid in the child after fork.
> 
> Since MADV_DONTFORK also seems to be used to allow very large
> programs to fork in systems with strict memory overcommit restrictions,
> changing the semantics of MADV_DONTFORK might break existing programs.
> 
> The use case is libraries that store or cache information, and
> want to know that they need to regenerate it in the child process
> after fork.
> 
> Examples of this would be:
> - systemd/pulseaudio API checks (fail after fork)
>   (replacing a getpid check, which is too slow without a PID cache)
> - PKCS#11 API reinitialization check (mandated by specification)
> - glibc's upcoming PRNG (reseed after fork)
> - OpenSSL PRNG (reseed after fork)
> 
> The security benefits of a forking server having a re-inialized
> PRNG in every child process are pretty obvious. However, due to
> libraries having all kinds of internal state, and programs getting
> compiled with many different versions of each library, it is
> unreasonable to expect calling programs to re-initialize everything
> manually after fork.
> 
> A further complication is the proliferation of clone flags,
> programs bypassing glibc's functions to call clone directly,
> and programs calling unshare, causing the glibc pthread_atfork
> hook to not get called.
> 
> It would be better to have the kernel take care of this automatically.
> 
> This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:
> 
>     https://man.openbsd.org/minherit.2
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
       [not found]   ` <20170807132257.GH32434-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2017-08-07 13:46     ` Michal Hocko
  2017-08-07 14:19       ` Florian Weimer
                         ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Michal Hocko @ 2017-08-07 13:46 UTC (permalink / raw)
  To: riel-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	mike.kravetz-QHcLZuEGTsvQT0dZR+AlfA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, fweimer-H+wXaHxf7aLQT0dZR+AlfA,
	colm-ZXBCfW2eEe/k1uMJSBkQmQ,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	keescook-F7+t8E8rja9g9hUCZPvPmw, luto-kltTT9wpgjJwATOyAt5JVQ,
	wad-F7+t8E8rja9g9hUCZPvPmw, mingo-DgEjT+Ai2ygdnm+yROfE0A,
	kirill-oKw7cIdHH8eLwutG50LtGA,
	dave.hansen-ral2JQCrhuEAvxtiuMwx3w,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On Mon 07-08-17 15:22:57, Michal Hocko wrote:
> This is an user visible API so make sure you CC linux-api (added)
> 
> On Sun 06-08-17 10:04:23, Rik van Riel wrote:
> > v2: fix MAP_SHARED case and kbuild warnings
> > 
> > Introduce MADV_WIPEONFORK semantics, which result in a VMA being
> > empty in the child process after fork. This differs from MADV_DONTFORK
> > in one important way.
> > 
> > If a child process accesses memory that was MADV_WIPEONFORK, it
> > will get zeroes. The address ranges are still valid, they are just empty.
> > 
> > If a child process accesses memory that was MADV_DONTFORK, it will
> > get a segmentation fault, since those address ranges are no longer
> > valid in the child after fork.
> > 
> > Since MADV_DONTFORK also seems to be used to allow very large
> > programs to fork in systems with strict memory overcommit restrictions,
> > changing the semantics of MADV_DONTFORK might break existing programs.
> > 
> > The use case is libraries that store or cache information, and
> > want to know that they need to regenerate it in the child process
> > after fork.

How do they know that they need to regenerate if they do not get SEGV?
Are they going to assume that a read of zeros is a "must init again"? Isn't
that too fragile? Or do they play other tricks like parse /proc/self/smaps
and read in the flag?
 
> > Examples of this would be:
> > - systemd/pulseaudio API checks (fail after fork)
> >   (replacing a getpid check, which is too slow without a PID cache)
> > - PKCS#11 API reinitialization check (mandated by specification)
> > - glibc's upcoming PRNG (reseed after fork)
> > - OpenSSL PRNG (reseed after fork)
> > 
> > The security benefits of a forking server having a re-inialized
> > PRNG in every child process are pretty obvious. However, due to
> > libraries having all kinds of internal state, and programs getting
> > compiled with many different versions of each library, it is
> > unreasonable to expect calling programs to re-initialize everything
> > manually after fork.
> > 
> > A further complication is the proliferation of clone flags,
> > programs bypassing glibc's functions to call clone directly,
> > and programs calling unshare, causing the glibc pthread_atfork
> > hook to not get called.
> > 
> > It would be better to have the kernel take care of this automatically.
> > 
> > This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:
> > 
> >     https://man.openbsd.org/minherit.2

I would argue that a MAP_$FOO flag would be more appropriate. Or do you
see any cases where such a special mapping would need to change the
semantic and inherit the content over the fork again?

I do not like the madvise because it is an advise and as such it can be
ignored/not implemented and that shouldn't have any correctness effects
on the child process.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
  2017-08-07 13:46     ` Michal Hocko
@ 2017-08-07 14:19       ` Florian Weimer
  2017-08-10 13:06         ` Michal Hocko
  2017-08-07 14:59       ` Rik van Riel
  2017-08-07 15:55       ` Colm MacCárthaigh
  2 siblings, 1 reply; 23+ messages in thread
From: Florian Weimer @ 2017-08-07 14:19 UTC (permalink / raw)
  To: Michal Hocko, riel
  Cc: linux-kernel, mike.kravetz, linux-mm, colm, akpm, keescook, luto,
	wad, mingo, kirill, dave.hansen, linux-api

On 08/07/2017 03:46 PM, Michal Hocko wrote:
> How do they know that they need to regenerate if they do not get SEGV?
> Are they going to assume that a read of zeros is a "must init again"? Isn't
> that too fragile?

Why would it be fragile?  Some level of synchronization is needed to set
things up, of course, but I think it's possible to write a lock-free
algorithm to maintain the state even without strong guarantees of memory
ordering from fork.

In the DRBG uniqueness case, you don't care if you reinitialize because
it's the first use, or because a fork just happened.

In the API-mandated fork check, a detection false positive before a fork
is not acceptable (because it would prevent legitimate API use), but I
think you can deal with this case if you publish a pointer to a
pre-initialized, non-zero mapping.

Thanks,
Florian

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
  2017-08-07 13:46     ` Michal Hocko
  2017-08-07 14:19       ` Florian Weimer
@ 2017-08-07 14:59       ` Rik van Riel
       [not found]         ` <1502117991.6577.13.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2017-08-10 13:05         ` Michal Hocko
  2017-08-07 15:55       ` Colm MacCárthaigh
  2 siblings, 2 replies; 23+ messages in thread
From: Rik van Riel @ 2017-08-07 14:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, mike.kravetz, linux-mm, fweimer, colm, akpm,
	keescook, luto, wad, mingo, kirill, dave.hansen, linux-api

On Mon, 2017-08-07 at 15:46 +0200, Michal Hocko wrote:
> On Mon 07-08-17 15:22:57, Michal Hocko wrote:
> > This is an user visible API so make sure you CC linux-api (added)
> > 
> > On Sun 06-08-17 10:04:23, Rik van Riel wrote:
> > > 
> > > A further complication is the proliferation of clone flags,
> > > programs bypassing glibc's functions to call clone directly,
> > > and programs calling unshare, causing the glibc pthread_atfork
> > > hook to not get called.
> > > 
> > > It would be better to have the kernel take care of this
> > > automatically.
> > > 
> > > This is similar to the OpenBSD minherit syscall with
> > > MAP_INHERIT_ZERO:
> > > 
> > >     https://man.openbsd.org/minherit.2
> 
> I would argue that a MAP_$FOO flag would be more appropriate. Or do
> you
> see any cases where such a special mapping would need to change the
> semantic and inherit the content over the fork again?
> 
> I do not like the madvise because it is an advise and as such it can
> be
> ignored/not implemented and that shouldn't have any correctness
> effects
> on the child process.

Too late for that. VM_DONTFORK is already implemented
through MADV_DONTFORK & MADV_DOFORK, in a way that is
very similar to the MADV_WIPEONFORK from these patches.

I wonder if that was done because MAP_* flags are a
bitmap, with a very limited number of values as a result,
while MADV_* constants have an essentially unlimited
numerical namespace available.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
  2017-08-07 13:46     ` Michal Hocko
  2017-08-07 14:19       ` Florian Weimer
  2017-08-07 14:59       ` Rik van Riel
@ 2017-08-07 15:55       ` Colm MacCárthaigh
  2017-08-07 16:02         ` Colm MacCárthaigh
  2017-08-10 13:21         ` Michal Hocko
  2 siblings, 2 replies; 23+ messages in thread
From: Colm MacCárthaigh @ 2017-08-07 15:55 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Rik van Riel, linux-kernel, mike.kravetz, linux-mm,
	Florian Weimer, akpm, keescook, luto, wad, mingo, kirill,
	dave.hansen, linux-api

[-- Attachment #1: Type: text/plain, Size: 2157 bytes --]

On Mon, Aug 7, 2017 at 3:46 PM, Michal Hocko <mhocko@kernel.org> wrote:

>
> > > The use case is libraries that store or cache information, and
> > > want to know that they need to regenerate it in the child process
> > > after fork.
>
> How do they know that they need to regenerate if they do not get SEGV?
> Are they going to assume that a read of zeros is a "must init again"? Isn't
> that too fragile? Or do they play other tricks like parse /proc/self/smaps
> and read in the flag?
>

Hi from a user space crypto maintainer :) Here's how we do exactly this it
in s2n:

https://github.com/awslabs/s2n/blob/master/utils/s2n_random.c , lines 62 -
91

and here's how LibreSSL does it:

https://github.com/libressl-portable/openbsd/blob/57dcd4329d83bff3dd67a293d5c4a53b795c587e/src/lib/libc/crypt/arc4random.h
(lines 37 on)
https://github.com/libressl-portable/openbsd/blob/57dcd4329d83bff3dd67a293d5c4a53b795c587e/src/lib/libc/crypt/arc4random.c
(Line 110)

OpenSSL and libc are in the process of adding similar DRBGs and would use a
WIPEONFORK. BoringSSL's maintainers are also interested as it adds
robustness.  I also recall it being a topic of discussion at the High
Assurance Cryptography Symposium (HACS) where many crypto maintainers meet
and several more maintainers there indicated it would be nice to have.

Right now on Linux we all either use pthread_atfork() to zero the memory on
fork, or getpid() and getppid() guards. The former can be evaded by direct
syscall() and other tricks (which things like Language VMs are prone to
doing), and the latter check is probabilistic as pids can repeat, though if
you use both getpid() and getppid() - which is slow! - the probability of
both PIDs colliding is very low indeed.

The result at the moment on Linux there's no bulletproof way to detect a
fork and erase a key or DRBG state. It would really be nice to be able to
match what we can do with MAP_INHERIT_ZERO and minherit() on BSD.
 madvise() does seem like the established idiom for behavior like this on
Linux.  I don't imagine it will be hard to use in practice, we can fall
back to existing behavior if the flag isn't accepted.

-- 
Colm

[-- Attachment #2: Type: text/html, Size: 3223 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
  2017-08-07 15:55       ` Colm MacCárthaigh
@ 2017-08-07 16:02         ` Colm MacCárthaigh
  2017-08-10 13:21         ` Michal Hocko
  1 sibling, 0 replies; 23+ messages in thread
From: Colm MacCárthaigh @ 2017-08-07 16:02 UTC (permalink / raw)
  To: linux-api, linux-kernel

[ Resent as text/plain, after sacrificing an offering to the RFC822 gods]

On Mon, Aug 7, 2017 at 3:46 PM, Michal Hocko <mhocko@kernel.org> wrote:
>
>
> > > The use case is libraries that store or cache information, and
> > > want to know that they need to regenerate it in the child process
> > > after fork.
>
> How do they know that they need to regenerate if they do not get SEGV?
> Are they going to assume that a read of zeros is a "must init again"?
> Isn't
> that too fragile? Or do they play other tricks like parse /proc/self/smaps
> and read in the flag?

Hi from a user space crypto maintainer :) Here's how we do exactly this it
in s2n:

https://github.com/awslabs/s2n/blob/master/utils/s2n_random.c , lines 62 - 91

and here's how LibreSSL does it:

https://github.com/libressl-portable/openbsd/blob/57dcd4329d83bff3dd67a293d5c4a53b795c587e/src/lib/libc/crypt/arc4random.h
(lines 37 on)
https://github.com/libressl-portable/openbsd/blob/57dcd4329d83bff3dd67a293d5c4a53b795c587e/src/lib/libc/crypt/arc4random.c
(Line 110)

OpenSSL and libc are in the process of adding similar DRBGs and would use a
WIPEONFORK. BoringSSL's maintainers are also interested as it adds
robustness.  I also recall it being a topic of discussion at the High
Assurance Cryptography Symposium (HACS) where many crypto maintainers meet
and several more maintainers there indicated it would be nice to have.

Right now on Linux we all either use pthread_atfork() to zero the memory on
fork, or getpid() and getppid() guards. The former can be evaded by direct
syscall() and other tricks (which things like Language VMs are prone to
doing), and the latter check is probabilistic as pids can repeat, though if
you use both getpid() and getppid() - which is slow! - the probability of
both PIDs colliding is very low indeed.

The result at the moment on Linux there's no bulletproof way to detect a
fork and erase a key or DRBG state. It would really be nice to be able to
match what we can do with MAP_INHERIT_ZERO and minherit() on BSD.  madvise()
does seem like the established idiom for behavior like this on Linux.  I
don't imagine it will be hard to use in practice, we can fall back to
existing behavior if the flag isn't accepted.


-- 
Colm

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
       [not found]         ` <1502117991.6577.13.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-08-09  9:59           ` Kirill A. Shutemov
  2017-08-09 12:31             ` Rik van Riel
  2017-08-09 12:42             ` Florian Weimer
  0 siblings, 2 replies; 23+ messages in thread
From: Kirill A. Shutemov @ 2017-08-09  9:59 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Michal Hocko, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	mike.kravetz-QHcLZuEGTsvQT0dZR+AlfA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, fweimer-H+wXaHxf7aLQT0dZR+AlfA,
	colm-ZXBCfW2eEe/k1uMJSBkQmQ,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	keescook-F7+t8E8rja9g9hUCZPvPmw, luto-kltTT9wpgjJwATOyAt5JVQ,
	wad-F7+t8E8rja9g9hUCZPvPmw, mingo-DgEjT+Ai2ygdnm+yROfE0A,
	dave.hansen-ral2JQCrhuEAvxtiuMwx3w,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On Mon, Aug 07, 2017 at 10:59:51AM -0400, Rik van Riel wrote:
> On Mon, 2017-08-07 at 15:46 +0200, Michal Hocko wrote:
> > On Mon 07-08-17 15:22:57, Michal Hocko wrote:
> > > This is an user visible API so make sure you CC linux-api (added)
> > > 
> > > On Sun 06-08-17 10:04:23, Rik van Riel wrote:
> > > > 
> > > > A further complication is the proliferation of clone flags,
> > > > programs bypassing glibc's functions to call clone directly,
> > > > and programs calling unshare, causing the glibc pthread_atfork
> > > > hook to not get called.
> > > > 
> > > > It would be better to have the kernel take care of this
> > > > automatically.
> > > > 
> > > > This is similar to the OpenBSD minherit syscall with
> > > > MAP_INHERIT_ZERO:
> > > > 
> > > >     https://man.openbsd.org/minherit.2
> > 
> > I would argue that a MAP_$FOO flag would be more appropriate. Or do
> > you
> > see any cases where such a special mapping would need to change the
> > semantic and inherit the content over the fork again?
> > 
> > I do not like the madvise because it is an advise and as such it can
> > be
> > ignored/not implemented and that shouldn't have any correctness
> > effects
> > on the child process.
> 
> Too late for that. VM_DONTFORK is already implemented
> through MADV_DONTFORK & MADV_DOFORK, in a way that is
> very similar to the MADV_WIPEONFORK from these patches.

It's not obvious to me what would break if kernel would ignore
MADV_DONTFORK or MADV_DONTDUMP.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
  2017-08-09  9:59           ` Kirill A. Shutemov
@ 2017-08-09 12:31             ` Rik van Riel
  2017-08-09 12:42             ` Florian Weimer
  1 sibling, 0 replies; 23+ messages in thread
From: Rik van Riel @ 2017-08-09 12:31 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Michal Hocko, linux-kernel, mike.kravetz, linux-mm, fweimer,
	colm, akpm, keescook, luto, wad, mingo, dave.hansen, linux-api

On Wed, 2017-08-09 at 12:59 +0300, Kirill A. Shutemov wrote:
> On Mon, Aug 07, 2017 at 10:59:51AM -0400, Rik van Riel wrote:
> > On Mon, 2017-08-07 at 15:46 +0200, Michal Hocko wrote:
> > > On Mon 07-08-17 15:22:57, Michal Hocko wrote:
> > > > This is an user visible API so make sure you CC linux-api
> > > > (added)
> > > > 
> > > > On Sun 06-08-17 10:04:23, Rik van Riel wrote:
> > > > > 
> > > > > A further complication is the proliferation of clone flags,
> > > > > programs bypassing glibc's functions to call clone directly,
> > > > > and programs calling unshare, causing the glibc
> > > > > pthread_atfork
> > > > > hook to not get called.
> > > > > 
> > > > > It would be better to have the kernel take care of this
> > > > > automatically.
> > > > > 
> > > > > This is similar to the OpenBSD minherit syscall with
> > > > > MAP_INHERIT_ZERO:
> > > > > 
> > > > >     https://man.openbsd.org/minherit.2
> > > 
> > > I would argue that a MAP_$FOO flag would be more appropriate. Or
> > > do
> > > you
> > > see any cases where such a special mapping would need to change
> > > the
> > > semantic and inherit the content over the fork again?
> > > 
> > > I do not like the madvise because it is an advise and as such it
> > > can
> > > be
> > > ignored/not implemented and that shouldn't have any correctness
> > > effects
> > > on the child process.
> > 
> > Too late for that. VM_DONTFORK is already implemented
> > through MADV_DONTFORK & MADV_DOFORK, in a way that is
> > very similar to the MADV_WIPEONFORK from these patches.
> 
> It's not obvious to me what would break if kernel would ignore
> MADV_DONTFORK or MADV_DONTDUMP.
> 
You might end up with multiple processes having a device open
which can only handle one process at a time.

Another thing that could go wrong is that if overcommit_memory=2,
a very large process with MADV_DONTFORK on a large memory area
suddenly fails to fork (due to there not being enough available
memory), and is unable to start a helper process.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
  2017-08-09  9:59           ` Kirill A. Shutemov
  2017-08-09 12:31             ` Rik van Riel
@ 2017-08-09 12:42             ` Florian Weimer
  1 sibling, 0 replies; 23+ messages in thread
From: Florian Weimer @ 2017-08-09 12:42 UTC (permalink / raw)
  To: Kirill A. Shutemov, Rik van Riel
  Cc: Michal Hocko, linux-kernel, mike.kravetz, linux-mm, colm, akpm,
	keescook, luto, wad, mingo, dave.hansen, linux-api

On 08/09/2017 11:59 AM, Kirill A. Shutemov wrote:
> It's not obvious to me what would break if kernel would ignore
> MADV_DONTFORK or MADV_DONTDUMP.

Ignoring MADV_DONTDUMP could cause secrets to be written to disk,
contrary to the expected security policy of the system.

Thanks,
Florian

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
  2017-08-07 14:59       ` Rik van Riel
       [not found]         ` <1502117991.6577.13.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-08-10 13:05         ` Michal Hocko
       [not found]           ` <20170810130531.GS23863-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  1 sibling, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2017-08-10 13:05 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, mike.kravetz, linux-mm, fweimer, colm, akpm,
	keescook, luto, wad, mingo, kirill, dave.hansen, linux-api

On Mon 07-08-17 10:59:51, Rik van Riel wrote:
> On Mon, 2017-08-07 at 15:46 +0200, Michal Hocko wrote:
> > On Mon 07-08-17 15:22:57, Michal Hocko wrote:
> > > This is an user visible API so make sure you CC linux-api (added)
> > > 
> > > On Sun 06-08-17 10:04:23, Rik van Riel wrote:
> > > > 
> > > > A further complication is the proliferation of clone flags,
> > > > programs bypassing glibc's functions to call clone directly,
> > > > and programs calling unshare, causing the glibc pthread_atfork
> > > > hook to not get called.
> > > > 
> > > > It would be better to have the kernel take care of this
> > > > automatically.
> > > > 
> > > > This is similar to the OpenBSD minherit syscall with
> > > > MAP_INHERIT_ZERO:
> > > > 
> > > >     https://man.openbsd.org/minherit.2
> > 
> > I would argue that a MAP_$FOO flag would be more appropriate. Or do
> > you
> > see any cases where such a special mapping would need to change the
> > semantic and inherit the content over the fork again?
> > 
> > I do not like the madvise because it is an advise and as such it can
> > be
> > ignored/not implemented and that shouldn't have any correctness
> > effects
> > on the child process.
> 
> Too late for that. VM_DONTFORK is already implemented
> through MADV_DONTFORK & MADV_DOFORK, in a way that is
> very similar to the MADV_WIPEONFORK from these patches.

Yeah, those two seem to be breaking the "madvise as an advise" semantic as
well but that doesn't mean we should follow that pattern any further.

> I wonder if that was done because MAP_* flags are a
> bitmap, with a very limited number of values as a result,
> while MADV_* constants have an essentially unlimited
> numerical namespace available.

That might have been the reason or it could have been simply because it
is easier to put something into madvise than mmap...

So back to the question. Is there any real usecase where you want to
have this on/off like or would a simple MAP_ZERO_ON_FORK be sufficient.
There should be some bits left between from my quick grep over arch
mman.h.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
  2017-08-07 14:19       ` Florian Weimer
@ 2017-08-10 13:06         ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2017-08-10 13:06 UTC (permalink / raw)
  To: Florian Weimer
  Cc: riel, linux-kernel, mike.kravetz, linux-mm, colm, akpm, keescook,
	luto, wad, mingo, kirill, dave.hansen, linux-api

On Mon 07-08-17 16:19:18, Florian Weimer wrote:
> On 08/07/2017 03:46 PM, Michal Hocko wrote:
> > How do they know that they need to regenerate if they do not get SEGV?
> > Are they going to assume that a read of zeros is a "must init again"? Isn't
> > that too fragile?
> 
> Why would it be fragile?  Some level of synchronization is needed to set
> things up, of course, but I think it's possible to write a lock-free
> algorithm to maintain the state even without strong guarantees of memory
> ordering from fork.

Yeah, that is what I meant as fragile... I am not question this is
impossible.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
  2017-08-07 15:55       ` Colm MacCárthaigh
  2017-08-07 16:02         ` Colm MacCárthaigh
@ 2017-08-10 13:21         ` Michal Hocko
  2017-08-10 14:11           ` Michal Hocko
  1 sibling, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2017-08-10 13:21 UTC (permalink / raw)
  To: Colm MacCárthaigh
  Cc: Rik van Riel, linux-kernel, mike.kravetz, linux-mm,
	Florian Weimer, akpm, keescook, luto, wad, mingo, kirill,
	dave.hansen, linux-api

On Mon 07-08-17 17:55:45, Colm MacCárthaigh wrote:
> On Mon, Aug 7, 2017 at 3:46 PM, Michal Hocko <mhocko@kernel.org> wrote:
> 
> >
> > > > The use case is libraries that store or cache information, and
> > > > want to know that they need to regenerate it in the child process
> > > > after fork.
> >
> > How do they know that they need to regenerate if they do not get SEGV?
> > Are they going to assume that a read of zeros is a "must init again"? Isn't
> > that too fragile? Or do they play other tricks like parse /proc/self/smaps
> > and read in the flag?
> >
> 
> Hi from a user space crypto maintainer :) Here's how we do exactly this it
> in s2n:
> 
> https://github.com/awslabs/s2n/blob/master/utils/s2n_random.c , lines 62 -
> 91
> 
> and here's how LibreSSL does it:
> 
> https://github.com/libressl-portable/openbsd/blob/57dcd4329d83bff3dd67a293d5c4a53b795c587e/src/lib/libc/crypt/arc4random.h
> (lines 37 on)
> https://github.com/libressl-portable/openbsd/blob/57dcd4329d83bff3dd67a293d5c4a53b795c587e/src/lib/libc/crypt/arc4random.c
> (Line 110)
> 
> OpenSSL and libc are in the process of adding similar DRBGs and would use a
> WIPEONFORK. BoringSSL's maintainers are also interested as it adds
> robustness.  I also recall it being a topic of discussion at the High
> Assurance Cryptography Symposium (HACS) where many crypto maintainers meet
> and several more maintainers there indicated it would be nice to have.
> 
> Right now on Linux we all either use pthread_atfork() to zero the memory on
> fork, or getpid() and getppid() guards. The former can be evaded by direct
> syscall() and other tricks (which things like Language VMs are prone to
> doing), and the latter check is probabilistic as pids can repeat, though if
> you use both getpid() and getppid() - which is slow! - the probability of
> both PIDs colliding is very low indeed.

Thanks, these references are really useful to build a picture. I would
probably use an unlinked fd with O_CLOEXEC to dect this but I can see
how this is not the greatest option for a library.

> The result at the moment on Linux there's no bulletproof way to detect a
> fork and erase a key or DRBG state. It would really be nice to be able to
> match what we can do with MAP_INHERIT_ZERO and minherit() on BSD.
>  madvise() does seem like the established idiom for behavior like this on
> Linux.  I don't imagine it will be hard to use in practice, we can fall
> back to existing behavior if the flag isn't accepted.

The reason why I dislike madvise, as already said, is that it should be
an advise rather than something correctness related. Sure we do have
some exceptions there but that doesn't mean we should repeat the same
error. If anything an mmap MAP_$FOO sounds like a better approach to me.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
       [not found]           ` <20170810130531.GS23863-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2017-08-10 13:23             ` Colm MacCárthaigh
       [not found]               ` <CAAF6GDc2hsj-XJj=Rx2ZF6Sh3Ke6nKewABXfqQxQjfDd5QN7Ug-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Colm MacCárthaigh @ 2017-08-10 13:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Rik van Riel, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Mike Kravetz,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Florian Weimer,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Kees Cook,
	luto-kltTT9wpgjJwATOyAt5JVQ, Will Drewry,
	mingo-DgEjT+Ai2ygdnm+yROfE0A, kirill-oKw7cIdHH8eLwutG50LtGA,
	dave.hansen-ral2JQCrhuEAvxtiuMwx3w,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On Thu, Aug 10, 2017 at 3:05 PM, Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>> Too late for that. VM_DONTFORK is already implemented
>> through MADV_DONTFORK & MADV_DOFORK, in a way that is
>> very similar to the MADV_WIPEONFORK from these patches.
>
> Yeah, those two seem to be breaking the "madvise as an advise" semantic as
> well but that doesn't mean we should follow that pattern any further.

I would imagine that many of the crypto applications using
MADV_WIPEONFORK will also be using MADV_DONTDUMP. In cases where it's
for protecting secret keys, I'd like to use both in my code, for
example. Though that doesn't really help decide this.

There is also at least one case for being able to turn WIPEONFORK
on/off with an existing page; a process that uses privilege separation
often goes through the following flow:

1. [ Access privileged keys as a power user and initialize memory ]
2. [ Fork a child process that actually does the work ]
3. [ Child drops privileges and uses the memory to do work ]
4. [ Parent hangs around to re-spawn a child if it crashes ]

In that mode it would be convenient to be able to mark the memory as
WIPEONFORK in the child, but not the parent.

-- 
Colm

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
  2017-08-10 13:21         ` Michal Hocko
@ 2017-08-10 14:11           ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2017-08-10 14:11 UTC (permalink / raw)
  To: Colm MacCárthaigh
  Cc: Rik van Riel, linux-kernel, mike.kravetz, linux-mm,
	Florian Weimer, akpm, keescook, luto, wad, mingo, kirill,
	dave.hansen, linux-api

On Thu 10-08-17 15:21:10, Michal Hocko wrote:
[...]
> Thanks, these references are really useful to build a picture. I would
> probably use an unlinked fd with O_CLOEXEC to dect this but I can see
> how this is not the greatest option for a library.

Blee, brainfart on my end. For some reason I mixed fork/exec...
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
       [not found]               ` <CAAF6GDc2hsj-XJj=Rx2ZF6Sh3Ke6nKewABXfqQxQjfDd5QN7Ug-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-08-10 15:36                 ` Michal Hocko
  2017-08-10 16:17                   ` Colm MacCárthaigh
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2017-08-10 15:36 UTC (permalink / raw)
  To: Colm MacCárthaigh
  Cc: Rik van Riel, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Mike Kravetz,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Florian Weimer,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Kees Cook,
	luto-kltTT9wpgjJwATOyAt5JVQ, Will Drewry,
	mingo-DgEjT+Ai2ygdnm+yROfE0A, kirill-oKw7cIdHH8eLwutG50LtGA,
	dave.hansen-ral2JQCrhuEAvxtiuMwx3w,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On Thu 10-08-17 15:23:05, Colm MacCárthaigh wrote:
> On Thu, Aug 10, 2017 at 3:05 PM, Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> >> Too late for that. VM_DONTFORK is already implemented
> >> through MADV_DONTFORK & MADV_DOFORK, in a way that is
> >> very similar to the MADV_WIPEONFORK from these patches.
> >
> > Yeah, those two seem to be breaking the "madvise as an advise" semantic as
> > well but that doesn't mean we should follow that pattern any further.
> 
> I would imagine that many of the crypto applications using
> MADV_WIPEONFORK will also be using MADV_DONTDUMP. In cases where it's
> for protecting secret keys, I'd like to use both in my code, for
> example. Though that doesn't really help decide this.
> 
> There is also at least one case for being able to turn WIPEONFORK
> on/off with an existing page; a process that uses privilege separation
> often goes through the following flow:
> 
> 1. [ Access privileged keys as a power user and initialize memory ]
> 2. [ Fork a child process that actually does the work ]
> 3. [ Child drops privileges and uses the memory to do work ]
> 4. [ Parent hangs around to re-spawn a child if it crashes ]
> 
> In that mode it would be convenient to be able to mark the memory as
> WIPEONFORK in the child, but not the parent.

I am not sure I understand. The child will have an own VMA so chaging
the attribute will not affect parent. Or did I misunderstand your
example?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
  2017-08-10 15:36                 ` Michal Hocko
@ 2017-08-10 16:17                   ` Colm MacCárthaigh
       [not found]                     ` <CAAF6GDeno6RpHf1KORVSxUL7M-CQfbWFFdyKK8LAWd_6PcJ55Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Colm MacCárthaigh @ 2017-08-10 16:17 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Florian Weimer, Kees Cook, Mike Kravetz, Rik van Riel,
	Will Drewry, akpm, dave.hansen, kirill, linux-api, linux-kernel,
	linux-mm, luto, mingo

[-- Attachment #1: Type: text/plain, Size: 1943 bytes --]

On Déar 10 Lún 2017 at 17:36 Michal Hocko <mhocko@kernel.org> wrote:

> On Thu 10-08-17 15:23:05, Colm MacCįrthaigh wrote:
> > On Thu, Aug 10, 2017 at 3:05 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > >> Too late for that. VM_DONTFORK is already implemented
> > >> through MADV_DONTFORK & MADV_DOFORK, in a way that is
> > >> very similar to the MADV_WIPEONFORK from these patches.
> > >
> > > Yeah, those two seem to be breaking the "madvise as an advise"
> semantic as
> > > well but that doesn't mean we should follow that pattern any further.
> >
> > I would imagine that many of the crypto applications using
> > MADV_WIPEONFORK will also be using MADV_DONTDUMP. In cases where it's
> > for protecting secret keys, I'd like to use both in my code, for
> > example. Though that doesn't really help decide this.
> >
> > There is also at least one case for being able to turn WIPEONFORK
> > on/off with an existing page; a process that uses privilege separation
> > often goes through the following flow:
> >
> > 1. [ Access privileged keys as a power user and initialize memory ]
> > 2. [ Fork a child process that actually does the work ]
> > 3. [ Child drops privileges and uses the memory to do work ]
> > 4. [ Parent hangs around to re-spawn a child if it crashes ]
> >
> > In that mode it would be convenient to be able to mark the memory as
> > WIPEONFORK in the child, but not the parent.
>
> I am not sure I understand. The child will have an own VMA so chaging
> the attribute will not affect parent. Or did I misunderstand your
> example?
>

Typically with privilege separation the parent has to share some minimal
state with the child. In this case that's why the page is left alone.
Though a smart parent could unset and set just immediately around the fork.

The point then of protecting it in the child is to ensure that a grandchild
doesn't inherit the secret data.

-- 
Colm

[-- Attachment #2: Type: text/html, Size: 2613 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
       [not found]                     ` <CAAF6GDeno6RpHf1KORVSxUL7M-CQfbWFFdyKK8LAWd_6PcJ55Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-08-10 17:01                       ` Michal Hocko
       [not found]                         ` <20170810170144.GA987-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2017-08-10 17:01 UTC (permalink / raw)
  To: Colm MacCárthaigh
  Cc: Florian Weimer, Kees Cook, Mike Kravetz, Rik van Riel,
	Will Drewry, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	dave.hansen-ral2JQCrhuEAvxtiuMwx3w,
	kirill-oKw7cIdHH8eLwutG50LtGA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, luto-kltTT9wpgjJwATOyAt5JVQ,
	mingo-DgEjT+Ai2ygdnm+yROfE0A

On Thu 10-08-17 16:17:18, Colm MacCárthaigh wrote:
> On Déar 10 Lún 2017 at 17:36 Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> 
> > On Thu 10-08-17 15:23:05, Colm MacCįrthaigh wrote:
> > > On Thu, Aug 10, 2017 at 3:05 PM, Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> > > >> Too late for that. VM_DONTFORK is already implemented
> > > >> through MADV_DONTFORK & MADV_DOFORK, in a way that is
> > > >> very similar to the MADV_WIPEONFORK from these patches.
> > > >
> > > > Yeah, those two seem to be breaking the "madvise as an advise"
> > semantic as
> > > > well but that doesn't mean we should follow that pattern any further.
> > >
> > > I would imagine that many of the crypto applications using
> > > MADV_WIPEONFORK will also be using MADV_DONTDUMP. In cases where it's
> > > for protecting secret keys, I'd like to use both in my code, for
> > > example. Though that doesn't really help decide this.
> > >
> > > There is also at least one case for being able to turn WIPEONFORK
> > > on/off with an existing page; a process that uses privilege separation
> > > often goes through the following flow:
> > >
> > > 1. [ Access privileged keys as a power user and initialize memory ]
> > > 2. [ Fork a child process that actually does the work ]
> > > 3. [ Child drops privileges and uses the memory to do work ]
> > > 4. [ Parent hangs around to re-spawn a child if it crashes ]
> > >
> > > In that mode it would be convenient to be able to mark the memory as
> > > WIPEONFORK in the child, but not the parent.
> >
> > I am not sure I understand. The child will have an own VMA so chaging
> > the attribute will not affect parent. Or did I misunderstand your
> > example?
> >
> 
> Typically with privilege separation the parent has to share some minimal
> state with the child. In this case that's why the page is left alone.
> Though a smart parent could unset and set just immediately around the fork.
> 
> The point then of protecting it in the child is to ensure that a grandchild
> doesn't inherit the secret data.

Does anybody actually do that using the minherit BSD interface?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
       [not found]                         ` <20170810170144.GA987-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2017-08-10 22:09                           ` Colm MacCárthaigh
       [not found]                             ` <CAAF6GDdFjS612mx1TXzaVk1J-Afz9wsAywTEijO2TG4idxabiw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Colm MacCárthaigh @ 2017-08-10 22:09 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Florian Weimer, Kees Cook, Mike Kravetz, Rik van Riel,
	Will Drewry, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	dave.hansen-ral2JQCrhuEAvxtiuMwx3w,
	kirill-oKw7cIdHH8eLwutG50LtGA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, luto-kltTT9wpgjJwATOyAt5JVQ,
	mingo-DgEjT+Ai2ygdnm+yROfE0A

On Thu, Aug 10, 2017 at 7:01 PM, Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> Does anybody actually do that using the minherit BSD interface?

I can't find any OSS examples. I just thought of it in response to
your question, but now that I have, I do want to use it that way in
privsep code.

As a mere user, fwiw it would make /my/ code less complex (in
Kolmogorov terms) to be an madvise option. Here's what that would look
like in user space:

mmap()

#if MAP_INHERIT_ZERO
    minherit() || pthread_atfork(workaround_fptr);
#elif MADVISE_WIPEONFORK
    madvise() || pthread_atfork(workaround_fptr);
#else
    pthread_atfork(workaround_fptr);
#endif

Vs:

#if MAP_WIPEONFORK
    mmap( ... WIPEONFORK) || pthread_atfork(workaround_fptr);
#else
    mmap()
#endif

#if MAP_INHERIT_ZERO
    madvise() || pthread_atfork(workaround_fptr);
#endif

#if !defined(MAP_WIPEONFORK) && !defined(MAP_INHERIT_ZERO)
    pthread_atfork(workaround_fptr);
#endif

The former is neater, and also a lot easier to stay structured if the
code is separated across different functional units. Allocation is
often handled in special functions.

For me, madvise() is the principle of least surprise, following
existing DONTDUMP semantics.

-- 
Colm

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
       [not found]                             ` <CAAF6GDdFjS612mx1TXzaVk1J-Afz9wsAywTEijO2TG4idxabiw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-08-11 14:06                               ` Michal Hocko
       [not found]                                 ` <20170811140653.GO30811-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2017-08-11 14:06 UTC (permalink / raw)
  To: Colm MacCárthaigh
  Cc: Florian Weimer, Kees Cook, Mike Kravetz, Rik van Riel,
	Will Drewry, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	dave.hansen-ral2JQCrhuEAvxtiuMwx3w,
	kirill-oKw7cIdHH8eLwutG50LtGA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, luto-kltTT9wpgjJwATOyAt5JVQ,
	mingo-DgEjT+Ai2ygdnm+yROfE0A

On Fri 11-08-17 00:09:57, Colm MacCárthaigh wrote:
> On Thu, Aug 10, 2017 at 7:01 PM, Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> > Does anybody actually do that using the minherit BSD interface?
> 
> I can't find any OSS examples. I just thought of it in response to
> your question, but now that I have, I do want to use it that way in
> privsep code.
> 
> As a mere user, fwiw it would make /my/ code less complex (in
> Kolmogorov terms) to be an madvise option. Here's what that would look
> like in user space:
> 
> mmap()
> 
> #if MAP_INHERIT_ZERO
>     minherit() || pthread_atfork(workaround_fptr);
> #elif MADVISE_WIPEONFORK
>     madvise() || pthread_atfork(workaround_fptr);
> #else
>     pthread_atfork(workaround_fptr);
> #endif
> 
> Vs:
> 
> #if MAP_WIPEONFORK
>     mmap( ... WIPEONFORK) || pthread_atfork(workaround_fptr);
> #else
>     mmap()
> #endif
> 
> #if MAP_INHERIT_ZERO
>     madvise() || pthread_atfork(workaround_fptr);
> #endif
> 
> #if !defined(MAP_WIPEONFORK) && !defined(MAP_INHERIT_ZERO)
>     pthread_atfork(workaround_fptr);
> #endif
> 
> The former is neater, and also a lot easier to stay structured if the
> code is separated across different functional units. Allocation is
> often handled in special functions.

OK, I guess I see your point. Thanks for the clarification.
 
> For me, madvise() is the principle of least surprise, following
> existing DONTDUMP semantics.

I am sorry to look too insisting here (I have still hard time to reconcile
myself with the madvise (ab)use) but if we in fact want minherit like
interface why don't we simply add minherit and make the code which wants
to use that interface easier to port? Is the only reason that hooking
into madvise is less code? If yes is that a sufficient reason to justify
the (ab)use of madvise? If there is a general consensus on that part I
will shut up and won't object anymore. Arguably MADV_DONTFORK would fit
into minherit API better as well. MADV_DONTDUMP is a differnet storry of
course.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
       [not found]                                 ` <20170811140653.GO30811-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2017-08-11 14:11                                   ` Florian Weimer
       [not found]                                     ` <c8cda773-b28d-f35f-7f18-6735584cb173-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Weimer @ 2017-08-11 14:11 UTC (permalink / raw)
  To: Michal Hocko, Colm MacCárthaigh
  Cc: Kees Cook, Mike Kravetz, Rik van Riel, Will Drewry,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	dave.hansen-ral2JQCrhuEAvxtiuMwx3w,
	kirill-oKw7cIdHH8eLwutG50LtGA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, luto-kltTT9wpgjJwATOyAt5JVQ,
	mingo-DgEjT+Ai2ygdnm+yROfE0A

On 08/11/2017 04:06 PM, Michal Hocko wrote:

> I am sorry to look too insisting here (I have still hard time to reconcile
> myself with the madvise (ab)use) but if we in fact want minherit like
> interface why don't we simply add minherit and make the code which wants
> to use that interface easier to port? Is the only reason that hooking
> into madvise is less code? If yes is that a sufficient reason to justify
> the (ab)use of madvise? If there is a general consensus on that part I
> will shut up and won't object anymore. Arguably MADV_DONTFORK would fit
> into minherit API better as well.

It does, OpenBSD calls it MAP_INHERIT_NONE.

Could you implement MAP_INHERIT_COPY and MAP_INHERIT_SHARE as well?  Or
is changing from MAP_SHARED to MAP_PRIVATE and back impossible?

Thanks,
Florian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
       [not found]                                     ` <c8cda773-b28d-f35f-7f18-6735584cb173-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-08-11 14:24                                       ` Michal Hocko
       [not found]                                         ` <20170811142457.GP30811-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2017-08-11 14:24 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Colm MacCárthaigh, Kees Cook, Mike Kravetz, Rik van Riel,
	Will Drewry, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	dave.hansen-ral2JQCrhuEAvxtiuMwx3w,
	kirill-oKw7cIdHH8eLwutG50LtGA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, luto-kltTT9wpgjJwATOyAt5JVQ,
	mingo-DgEjT+Ai2ygdnm+yROfE0A

On Fri 11-08-17 16:11:44, Florian Weimer wrote:
> On 08/11/2017 04:06 PM, Michal Hocko wrote:
> 
> > I am sorry to look too insisting here (I have still hard time to reconcile
> > myself with the madvise (ab)use) but if we in fact want minherit like
> > interface why don't we simply add minherit and make the code which wants
> > to use that interface easier to port? Is the only reason that hooking
> > into madvise is less code? If yes is that a sufficient reason to justify
> > the (ab)use of madvise? If there is a general consensus on that part I
> > will shut up and won't object anymore. Arguably MADV_DONTFORK would fit
> > into minherit API better as well.
> 
> It does, OpenBSD calls it MAP_INHERIT_NONE.
> 
> Could you implement MAP_INHERIT_COPY and MAP_INHERIT_SHARE as well?  Or
> is changing from MAP_SHARED to MAP_PRIVATE and back impossible?

I haven't explored those two very much. Their semantic seems rather
awkward, especially map_inherit_share one. I guess MAP_INHERIT_COPY
would be doable. Do we have to support all modes or a missing support
would disqualify the syscall completely?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
       [not found]                                         ` <20170811142457.GP30811-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2017-08-11 15:24                                           ` Florian Weimer
  2017-08-11 15:31                                             ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Weimer @ 2017-08-11 15:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Colm MacCárthaigh, Kees Cook, Mike Kravetz, Rik van Riel,
	Will Drewry, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	dave.hansen-ral2JQCrhuEAvxtiuMwx3w,
	kirill-oKw7cIdHH8eLwutG50LtGA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, luto-kltTT9wpgjJwATOyAt5JVQ,
	mingo-DgEjT+Ai2ygdnm+yROfE0A

On 08/11/2017 04:24 PM, Michal Hocko wrote:
> On Fri 11-08-17 16:11:44, Florian Weimer wrote:
>> On 08/11/2017 04:06 PM, Michal Hocko wrote:
>>
>>> I am sorry to look too insisting here (I have still hard time to reconcile
>>> myself with the madvise (ab)use) but if we in fact want minherit like
>>> interface why don't we simply add minherit and make the code which wants
>>> to use that interface easier to port? Is the only reason that hooking
>>> into madvise is less code? If yes is that a sufficient reason to justify
>>> the (ab)use of madvise? If there is a general consensus on that part I
>>> will shut up and won't object anymore. Arguably MADV_DONTFORK would fit
>>> into minherit API better as well.
>>
>> It does, OpenBSD calls it MAP_INHERIT_NONE.
>>
>> Could you implement MAP_INHERIT_COPY and MAP_INHERIT_SHARE as well?  Or
>> is changing from MAP_SHARED to MAP_PRIVATE and back impossible?
> 
> I haven't explored those two very much. Their semantic seems rather
> awkward, especially map_inherit_share one. I guess MAP_INHERIT_COPY
> would be doable. Do we have to support all modes or a missing support
> would disqualify the syscall completely?

I think it would be a bit awkward if we implemented MAP_INHERIT_ZERO and
it would not turn a shared mapping into a private mapping in the child,
or would not work on shared mappings at all, or deviate in any way from
the OpenBSD implementation.

MAP_INHERIT_SHARE for a MAP_PRIVATE mapping which has been modified is a
bit bizarre, and I don't know how OpenBSD implements any of this.  It
could well be that the exact behavior implemented in OpenBSD is a poor
fit for the Linux VM implementation.

Florian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK
  2017-08-11 15:24                                           ` Florian Weimer
@ 2017-08-11 15:31                                             ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2017-08-11 15:31 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Colm MacCárthaigh, Kees Cook, Mike Kravetz, Rik van Riel,
	Will Drewry, akpm, dave.hansen, kirill, linux-api, linux-kernel,
	linux-mm, luto, mingo

On Fri 11-08-17 17:24:29, Florian Weimer wrote:
> On 08/11/2017 04:24 PM, Michal Hocko wrote:
> > On Fri 11-08-17 16:11:44, Florian Weimer wrote:
> >> On 08/11/2017 04:06 PM, Michal Hocko wrote:
> >>
> >>> I am sorry to look too insisting here (I have still hard time to reconcile
> >>> myself with the madvise (ab)use) but if we in fact want minherit like
> >>> interface why don't we simply add minherit and make the code which wants
> >>> to use that interface easier to port? Is the only reason that hooking
> >>> into madvise is less code? If yes is that a sufficient reason to justify
> >>> the (ab)use of madvise? If there is a general consensus on that part I
> >>> will shut up and won't object anymore. Arguably MADV_DONTFORK would fit
> >>> into minherit API better as well.
> >>
> >> It does, OpenBSD calls it MAP_INHERIT_NONE.
> >>
> >> Could you implement MAP_INHERIT_COPY and MAP_INHERIT_SHARE as well?  Or
> >> is changing from MAP_SHARED to MAP_PRIVATE and back impossible?
> > 
> > I haven't explored those two very much. Their semantic seems rather
> > awkward, especially map_inherit_share one. I guess MAP_INHERIT_COPY
> > would be doable. Do we have to support all modes or a missing support
> > would disqualify the syscall completely?
> 
> I think it would be a bit awkward if we implemented MAP_INHERIT_ZERO and
> it would not turn a shared mapping into a private mapping in the child,
> or would not work on shared mappings at all, or deviate in any way from
> the OpenBSD implementation.

If we go with minherit API then I think we should adhere with the BSD
semantic and alloc MAP_INHERIT_ZERO for shared mappings as well

> MAP_INHERIT_SHARE for a MAP_PRIVATE mapping which has been modified is a
> bit bizarre, and I don't know how OpenBSD implements any of this.  It
> could well be that the exact behavior implemented in OpenBSD is a poor
> fit for the Linux VM implementation.

yeah, it would be MAP_INHERIT_SHARE that I would consider problematic
and rather go with ENOSUPP or even EINVAL.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2017-08-11 15:31 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20170806140425.20937-1-riel@redhat.com>
2017-08-07 13:22 ` [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK Michal Hocko
     [not found]   ` <20170807132257.GH32434-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2017-08-07 13:46     ` Michal Hocko
2017-08-07 14:19       ` Florian Weimer
2017-08-10 13:06         ` Michal Hocko
2017-08-07 14:59       ` Rik van Riel
     [not found]         ` <1502117991.6577.13.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-08-09  9:59           ` Kirill A. Shutemov
2017-08-09 12:31             ` Rik van Riel
2017-08-09 12:42             ` Florian Weimer
2017-08-10 13:05         ` Michal Hocko
     [not found]           ` <20170810130531.GS23863-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2017-08-10 13:23             ` Colm MacCárthaigh
     [not found]               ` <CAAF6GDc2hsj-XJj=Rx2ZF6Sh3Ke6nKewABXfqQxQjfDd5QN7Ug-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-08-10 15:36                 ` Michal Hocko
2017-08-10 16:17                   ` Colm MacCárthaigh
     [not found]                     ` <CAAF6GDeno6RpHf1KORVSxUL7M-CQfbWFFdyKK8LAWd_6PcJ55Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-08-10 17:01                       ` Michal Hocko
     [not found]                         ` <20170810170144.GA987-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2017-08-10 22:09                           ` Colm MacCárthaigh
     [not found]                             ` <CAAF6GDdFjS612mx1TXzaVk1J-Afz9wsAywTEijO2TG4idxabiw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-08-11 14:06                               ` Michal Hocko
     [not found]                                 ` <20170811140653.GO30811-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2017-08-11 14:11                                   ` Florian Weimer
     [not found]                                     ` <c8cda773-b28d-f35f-7f18-6735584cb173-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-08-11 14:24                                       ` Michal Hocko
     [not found]                                         ` <20170811142457.GP30811-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2017-08-11 15:24                                           ` Florian Weimer
2017-08-11 15:31                                             ` Michal Hocko
2017-08-07 15:55       ` Colm MacCárthaigh
2017-08-07 16:02         ` Colm MacCárthaigh
2017-08-10 13:21         ` Michal Hocko
2017-08-10 14:11           ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).