All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
@ 2016-08-23 16:12 ` Michal Hocko
  0 siblings, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2016-08-23 16:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Michal Hocko, Roland McGrath, Oleg Nesterov,
	Andreas Schwab, William Preston

From: Michal Hocko <mhocko@suse.com>

fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
has caused a subtle regression in nscd which uses CLONE_CHILD_CLEARTID
to clear the nscd_certainly_running flag in the shared databases, so
that the clients are notified when nscd is restarted.  Now, when nscd
uses a non-persistent database, clients that have it mapped keep
thinking the database is being updated by nscd, when in fact nscd has
created a new (anonymous) one (for non-persistent databases it uses an
unlinked file as backend).

The original proposal for the CLONE_CHILD_CLEARTID change claimed
(https://lkml.org/lkml/2006/10/25/233):
"
The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
on behalf of pthread_create() library calls.  This feature is used to
request that the kernel clear the thread-id in user space (at an address
provided in the syscall) when the thread disassociates itself from the
address space, which is done in mm_release().

Unfortunately, when a multi-threaded process incurs a core dump (such as
from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
the other threads, which then proceed to clear their user-space tids
before synchronizing in exit_mm() with the start of core dumping.  This
misrepresents the state of process's address space at the time of the
SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
problems (misleading him/her to conclude that the threads had gone away
before the fault).

The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
core dump has been initiated.
"

The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
seems to have a larger scope than the original patch asked for. It seems
that limitting the scope of the check to core dumping should work for
SIGSEGV issue describe above.

[Changelog partly based on Andreas' description]
Fixes: fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
Tested-by:  William Preston <wpreston@suse.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andreas Schwab <schwab@suse.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---

Hi,
the previous version of this patch was sent http://lkml.kernel.org/r/1470039287-14643-1-git-send-email-mhocko@kernel.org
I have dropped the vfork check which Oleg didn't like. It shouldn't
have caused any change for the nscd testing so I am keeping William's
tested-by.

 kernel/fork.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 52e725d4a866..b89f0eb99f0a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -913,14 +913,11 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 	deactivate_mm(tsk, mm);
 
 	/*
-	 * If we're exiting normally, clear a user-space tid field if
-	 * requested.  We leave this alone when dying by signal, to leave
-	 * the value intact in a core dump, and to save the unnecessary
-	 * trouble, say, a killed vfork parent shouldn't touch this mm.
-	 * Userland only wants this done for a sys_exit.
+	 * Signal userspace if we're not exiting with a core dump
+	 * or a killed vfork parent which shouldn't touch this mm.
 	 */
 	if (tsk->clear_child_tid) {
-		if (!(tsk->flags & PF_SIGNALED) &&
+		if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
 		    atomic_read(&mm->mm_users) > 1) {
 			/*
 			 * We don't check the error code - if userspace has
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
@ 2016-08-23 16:12 ` Michal Hocko
  0 siblings, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2016-08-23 16:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Michal Hocko, Roland McGrath, Oleg Nesterov,
	Andreas Schwab, William Preston

From: Michal Hocko <mhocko@suse.com>

fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
has caused a subtle regression in nscd which uses CLONE_CHILD_CLEARTID
to clear the nscd_certainly_running flag in the shared databases, so
that the clients are notified when nscd is restarted.  Now, when nscd
uses a non-persistent database, clients that have it mapped keep
thinking the database is being updated by nscd, when in fact nscd has
created a new (anonymous) one (for non-persistent databases it uses an
unlinked file as backend).

The original proposal for the CLONE_CHILD_CLEARTID change claimed
(https://lkml.org/lkml/2006/10/25/233):
"
The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
on behalf of pthread_create() library calls.  This feature is used to
request that the kernel clear the thread-id in user space (at an address
provided in the syscall) when the thread disassociates itself from the
address space, which is done in mm_release().

Unfortunately, when a multi-threaded process incurs a core dump (such as
from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
the other threads, which then proceed to clear their user-space tids
before synchronizing in exit_mm() with the start of core dumping.  This
misrepresents the state of process's address space at the time of the
SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
problems (misleading him/her to conclude that the threads had gone away
before the fault).

The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
core dump has been initiated.
"

The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
seems to have a larger scope than the original patch asked for. It seems
that limitting the scope of the check to core dumping should work for
SIGSEGV issue describe above.

[Changelog partly based on Andreas' description]
Fixes: fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
Tested-by:  William Preston <wpreston@suse.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andreas Schwab <schwab@suse.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---

Hi,
the previous version of this patch was sent http://lkml.kernel.org/r/1470039287-14643-1-git-send-email-mhocko@kernel.org
I have dropped the vfork check which Oleg didn't like. It shouldn't
have caused any change for the nscd testing so I am keeping William's
tested-by.

 kernel/fork.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 52e725d4a866..b89f0eb99f0a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -913,14 +913,11 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 	deactivate_mm(tsk, mm);
 
 	/*
-	 * If we're exiting normally, clear a user-space tid field if
-	 * requested.  We leave this alone when dying by signal, to leave
-	 * the value intact in a core dump, and to save the unnecessary
-	 * trouble, say, a killed vfork parent shouldn't touch this mm.
-	 * Userland only wants this done for a sys_exit.
+	 * Signal userspace if we're not exiting with a core dump
+	 * or a killed vfork parent which shouldn't touch this mm.
 	 */
 	if (tsk->clear_child_tid) {
-		if (!(tsk->flags & PF_SIGNALED) &&
+		if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
 		    atomic_read(&mm->mm_users) > 1) {
 			/*
 			 * We don't check the error code - if userspace has
-- 
2.8.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
  2016-08-23 16:12 ` Michal Hocko
@ 2016-08-23 16:32   ` Oleg Nesterov
  -1 siblings, 0 replies; 12+ messages in thread
From: Oleg Nesterov @ 2016-08-23 16:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, LKML, Michal Hocko, Roland McGrath,
	Andreas Schwab, William Preston

On 08/23, Michal Hocko wrote:
>
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -913,14 +913,11 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>  	deactivate_mm(tsk, mm);
>  
>  	/*
> -	 * If we're exiting normally, clear a user-space tid field if
> -	 * requested.  We leave this alone when dying by signal, to leave
> -	 * the value intact in a core dump, and to save the unnecessary
> -	 * trouble, say, a killed vfork parent shouldn't touch this mm.
> -	 * Userland only wants this done for a sys_exit.
> +	 * Signal userspace if we're not exiting with a core dump
> +	 * or a killed vfork parent which shouldn't touch this mm.

Well. ACK, but the comment looks wrong...

The "killed vfork parent ..." part should be removed, as you pointed
out this is no longer true.

OTOH, to me it would be better to not remove the "leave the value
intact in a core dump" part, otherwise the " we're not exiting with
a core dump" looks pointless because SIGNAL_GROUP_COREDUMP is self-
documenting.

Oleg.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
@ 2016-08-23 16:32   ` Oleg Nesterov
  0 siblings, 0 replies; 12+ messages in thread
From: Oleg Nesterov @ 2016-08-23 16:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, LKML, Michal Hocko, Roland McGrath,
	Andreas Schwab, William Preston

On 08/23, Michal Hocko wrote:
>
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -913,14 +913,11 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>  	deactivate_mm(tsk, mm);
>  
>  	/*
> -	 * If we're exiting normally, clear a user-space tid field if
> -	 * requested.  We leave this alone when dying by signal, to leave
> -	 * the value intact in a core dump, and to save the unnecessary
> -	 * trouble, say, a killed vfork parent shouldn't touch this mm.
> -	 * Userland only wants this done for a sys_exit.
> +	 * Signal userspace if we're not exiting with a core dump
> +	 * or a killed vfork parent which shouldn't touch this mm.

Well. ACK, but the comment looks wrong...

The "killed vfork parent ..." part should be removed, as you pointed
out this is no longer true.

OTOH, to me it would be better to not remove the "leave the value
intact in a core dump" part, otherwise the " we're not exiting with
a core dump" looks pointless because SIGNAL_GROUP_COREDUMP is self-
documenting.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
  2016-08-23 16:32   ` Oleg Nesterov
@ 2016-08-24  8:10     ` Michal Hocko
  -1 siblings, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2016-08-24  8:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, linux-mm, LKML, Roland McGrath, Andreas Schwab,
	William Preston

On Tue 23-08-16 18:32:34, Oleg Nesterov wrote:
> On 08/23, Michal Hocko wrote:
> >
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -913,14 +913,11 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
> >  	deactivate_mm(tsk, mm);
> >  
> >  	/*
> > -	 * If we're exiting normally, clear a user-space tid field if
> > -	 * requested.  We leave this alone when dying by signal, to leave
> > -	 * the value intact in a core dump, and to save the unnecessary
> > -	 * trouble, say, a killed vfork parent shouldn't touch this mm.
> > -	 * Userland only wants this done for a sys_exit.
> > +	 * Signal userspace if we're not exiting with a core dump
> > +	 * or a killed vfork parent which shouldn't touch this mm.
> 
> Well. ACK, but the comment looks wrong...
> 
> The "killed vfork parent ..." part should be removed, as you pointed
> out this is no longer true.
> 
> OTOH, to me it would be better to not remove the "leave the value
> intact in a core dump" part, otherwise the " we're not exiting with
> a core dump" looks pointless because SIGNAL_GROUP_COREDUMP is self-
> documenting.

Sounds better?
diff --git a/kernel/fork.c b/kernel/fork.c
index b89f0eb99f0a..ddde5849df81 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -914,7 +914,8 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 
 	/*
 	 * Signal userspace if we're not exiting with a core dump
-	 * or a killed vfork parent which shouldn't touch this mm.
+	 * because we want to leave the value intact for debugging
+	 * purposes.
 	 */
 	if (tsk->clear_child_tid) {
 		if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
@ 2016-08-24  8:10     ` Michal Hocko
  0 siblings, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2016-08-24  8:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, linux-mm, LKML, Roland McGrath, Andreas Schwab,
	William Preston

On Tue 23-08-16 18:32:34, Oleg Nesterov wrote:
> On 08/23, Michal Hocko wrote:
> >
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -913,14 +913,11 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
> >  	deactivate_mm(tsk, mm);
> >  
> >  	/*
> > -	 * If we're exiting normally, clear a user-space tid field if
> > -	 * requested.  We leave this alone when dying by signal, to leave
> > -	 * the value intact in a core dump, and to save the unnecessary
> > -	 * trouble, say, a killed vfork parent shouldn't touch this mm.
> > -	 * Userland only wants this done for a sys_exit.
> > +	 * Signal userspace if we're not exiting with a core dump
> > +	 * or a killed vfork parent which shouldn't touch this mm.
> 
> Well. ACK, but the comment looks wrong...
> 
> The "killed vfork parent ..." part should be removed, as you pointed
> out this is no longer true.
> 
> OTOH, to me it would be better to not remove the "leave the value
> intact in a core dump" part, otherwise the " we're not exiting with
> a core dump" looks pointless because SIGNAL_GROUP_COREDUMP is self-
> documenting.

Sounds better?
diff --git a/kernel/fork.c b/kernel/fork.c
index b89f0eb99f0a..ddde5849df81 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -914,7 +914,8 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 
 	/*
 	 * Signal userspace if we're not exiting with a core dump
-	 * or a killed vfork parent which shouldn't touch this mm.
+	 * because we want to leave the value intact for debugging
+	 * purposes.
 	 */
 	if (tsk->clear_child_tid) {
 		if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
  2016-08-24  8:10     ` Michal Hocko
@ 2016-08-24 15:32       ` Oleg Nesterov
  -1 siblings, 0 replies; 12+ messages in thread
From: Oleg Nesterov @ 2016-08-24 15:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, LKML, Roland McGrath, Andreas Schwab,
	William Preston

On 08/24, Michal Hocko wrote:
>
> Sounds better?
> diff --git a/kernel/fork.c b/kernel/fork.c
> index b89f0eb99f0a..ddde5849df81 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -914,7 +914,8 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>  
>  	/*
>  	 * Signal userspace if we're not exiting with a core dump
> -	 * or a killed vfork parent which shouldn't touch this mm.
> +	 * because we want to leave the value intact for debugging
> +	 * purposes.
>  	 */
>  	if (tsk->clear_child_tid) {
>  		if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&

Yes, thanks Michal!

Acked-by: Oleg Nesterov <oleg@redhat.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
@ 2016-08-24 15:32       ` Oleg Nesterov
  0 siblings, 0 replies; 12+ messages in thread
From: Oleg Nesterov @ 2016-08-24 15:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, LKML, Roland McGrath, Andreas Schwab,
	William Preston

On 08/24, Michal Hocko wrote:
>
> Sounds better?
> diff --git a/kernel/fork.c b/kernel/fork.c
> index b89f0eb99f0a..ddde5849df81 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -914,7 +914,8 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>  
>  	/*
>  	 * Signal userspace if we're not exiting with a core dump
> -	 * or a killed vfork parent which shouldn't touch this mm.
> +	 * because we want to leave the value intact for debugging
> +	 * purposes.
>  	 */
>  	if (tsk->clear_child_tid) {
>  		if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&

Yes, thanks Michal!

Acked-by: Oleg Nesterov <oleg@redhat.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
  2016-08-24 15:32       ` Oleg Nesterov
@ 2016-08-24 15:37         ` Michal Hocko
  -1 siblings, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2016-08-24 15:37 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov
  Cc: linux-mm, LKML, Roland McGrath, Andreas Schwab, William Preston

On Wed 24-08-16 17:32:00, Oleg Nesterov wrote:
> On 08/24, Michal Hocko wrote:
> >
> > Sounds better?
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index b89f0eb99f0a..ddde5849df81 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -914,7 +914,8 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
> >  
> >  	/*
> >  	 * Signal userspace if we're not exiting with a core dump
> > -	 * or a killed vfork parent which shouldn't touch this mm.
> > +	 * because we want to leave the value intact for debugging
> > +	 * purposes.
> >  	 */
> >  	if (tsk->clear_child_tid) {
> >  		if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
> 
> Yes, thanks Michal!
> 
> Acked-by: Oleg Nesterov <oleg@redhat.com>

OK, thanks.
---
>From 39cad7842660e0261c27f75702d49458a1f3cea1 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Mon, 30 May 2016 20:20:32 +0200
Subject: [PATCH] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd

fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
has caused a subtle regression in nscd which uses CLONE_CHILD_CLEARTID
to clear the nscd_certainly_running flag in the shared databases, so
that the clients are notified when nscd is restarted.  Now, when nscd
uses a non-persistent database, clients that have it mapped keep
thinking the database is being updated by nscd, when in fact nscd has
created a new (anonymous) one (for non-persistent databases it uses an
unlinked file as backend).

The original proposal for the CLONE_CHILD_CLEARTID change claimed
(https://lkml.org/lkml/2006/10/25/233):
"
The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
on behalf of pthread_create() library calls.  This feature is used to
request that the kernel clear the thread-id in user space (at an address
provided in the syscall) when the thread disassociates itself from the
address space, which is done in mm_release().

Unfortunately, when a multi-threaded process incurs a core dump (such as
from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
the other threads, which then proceed to clear their user-space tids
before synchronizing in exit_mm() with the start of core dumping.  This
misrepresents the state of process's address space at the time of the
SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
problems (misleading him/her to conclude that the threads had gone away
before the fault).

The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
core dump has been initiated.
"

The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
seems to have a larger scope than the original patch asked for. It seems
that limitting the scope of the check to core dumping should work for
SIGSEGV issue describe above.

[Changelog partly based on Andreas' description]
Fixes: fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
Tested-by:  William Preston <wpreston@suse.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Andreas Schwab <schwab@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 kernel/fork.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 52e725d4a866..ddde5849df81 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -913,14 +913,12 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 	deactivate_mm(tsk, mm);
 
 	/*
-	 * If we're exiting normally, clear a user-space tid field if
-	 * requested.  We leave this alone when dying by signal, to leave
-	 * the value intact in a core dump, and to save the unnecessary
-	 * trouble, say, a killed vfork parent shouldn't touch this mm.
-	 * Userland only wants this done for a sys_exit.
+	 * Signal userspace if we're not exiting with a core dump
+	 * because we want to leave the value intact for debugging
+	 * purposes.
 	 */
 	if (tsk->clear_child_tid) {
-		if (!(tsk->flags & PF_SIGNALED) &&
+		if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
 		    atomic_read(&mm->mm_users) > 1) {
 			/*
 			 * We don't check the error code - if userspace has
-- 
2.8.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
@ 2016-08-24 15:37         ` Michal Hocko
  0 siblings, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2016-08-24 15:37 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov
  Cc: linux-mm, LKML, Roland McGrath, Andreas Schwab, William Preston

On Wed 24-08-16 17:32:00, Oleg Nesterov wrote:
> On 08/24, Michal Hocko wrote:
> >
> > Sounds better?
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index b89f0eb99f0a..ddde5849df81 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -914,7 +914,8 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
> >  
> >  	/*
> >  	 * Signal userspace if we're not exiting with a core dump
> > -	 * or a killed vfork parent which shouldn't touch this mm.
> > +	 * because we want to leave the value intact for debugging
> > +	 * purposes.
> >  	 */
> >  	if (tsk->clear_child_tid) {
> >  		if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
> 
> Yes, thanks Michal!
> 
> Acked-by: Oleg Nesterov <oleg@redhat.com>

OK, thanks.
---

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
  2016-08-24 15:37         ` Michal Hocko
@ 2016-08-31  6:42           ` Michal Hocko
  -1 siblings, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2016-08-31  6:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Roland McGrath, Andreas Schwab, William Preston,
	Oleg Nesterov

On Wed 24-08-16 17:37:16, Michal Hocko wrote:
> On Wed 24-08-16 17:32:00, Oleg Nesterov wrote:
> > On 08/24, Michal Hocko wrote:
> > >
> > > Sounds better?
> > > diff --git a/kernel/fork.c b/kernel/fork.c
> > > index b89f0eb99f0a..ddde5849df81 100644
> > > --- a/kernel/fork.c
> > > +++ b/kernel/fork.c
> > > @@ -914,7 +914,8 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
> > >  
> > >  	/*
> > >  	 * Signal userspace if we're not exiting with a core dump
> > > -	 * or a killed vfork parent which shouldn't touch this mm.
> > > +	 * because we want to leave the value intact for debugging
> > > +	 * purposes.
> > >  	 */
> > >  	if (tsk->clear_child_tid) {
> > >  		if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
> > 
> > Yes, thanks Michal!
> > 
> > Acked-by: Oleg Nesterov <oleg@redhat.com>
> 
> OK, thanks.

ping

> ---
> From 39cad7842660e0261c27f75702d49458a1f3cea1 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Mon, 30 May 2016 20:20:32 +0200
> Subject: [PATCH] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
> 
> fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
> has caused a subtle regression in nscd which uses CLONE_CHILD_CLEARTID
> to clear the nscd_certainly_running flag in the shared databases, so
> that the clients are notified when nscd is restarted.  Now, when nscd
> uses a non-persistent database, clients that have it mapped keep
> thinking the database is being updated by nscd, when in fact nscd has
> created a new (anonymous) one (for non-persistent databases it uses an
> unlinked file as backend).
> 
> The original proposal for the CLONE_CHILD_CLEARTID change claimed
> (https://lkml.org/lkml/2006/10/25/233):
> "
> The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
> on behalf of pthread_create() library calls.  This feature is used to
> request that the kernel clear the thread-id in user space (at an address
> provided in the syscall) when the thread disassociates itself from the
> address space, which is done in mm_release().
> 
> Unfortunately, when a multi-threaded process incurs a core dump (such as
> from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
> the other threads, which then proceed to clear their user-space tids
> before synchronizing in exit_mm() with the start of core dumping.  This
> misrepresents the state of process's address space at the time of the
> SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
> problems (misleading him/her to conclude that the threads had gone away
> before the fault).
> 
> The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
> core dump has been initiated.
> "
> 
> The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
> seems to have a larger scope than the original patch asked for. It seems
> that limitting the scope of the check to core dumping should work for
> SIGSEGV issue describe above.
> 
> [Changelog partly based on Andreas' description]
> Fixes: fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
> Tested-by:  William Preston <wpreston@suse.com>
> Cc: Roland McGrath <roland@hack.frob.com>
> Cc: Andreas Schwab <schwab@suse.com>
> Acked-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  kernel/fork.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 52e725d4a866..ddde5849df81 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -913,14 +913,12 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>  	deactivate_mm(tsk, mm);
>  
>  	/*
> -	 * If we're exiting normally, clear a user-space tid field if
> -	 * requested.  We leave this alone when dying by signal, to leave
> -	 * the value intact in a core dump, and to save the unnecessary
> -	 * trouble, say, a killed vfork parent shouldn't touch this mm.
> -	 * Userland only wants this done for a sys_exit.
> +	 * Signal userspace if we're not exiting with a core dump
> +	 * because we want to leave the value intact for debugging
> +	 * purposes.
>  	 */
>  	if (tsk->clear_child_tid) {
> -		if (!(tsk->flags & PF_SIGNALED) &&
> +		if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
>  		    atomic_read(&mm->mm_users) > 1) {
>  			/*
>  			 * We don't check the error code - if userspace has
> -- 
> 2.8.1
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
@ 2016-08-31  6:42           ` Michal Hocko
  0 siblings, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2016-08-31  6:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Roland McGrath, Andreas Schwab, William Preston,
	Oleg Nesterov

On Wed 24-08-16 17:37:16, Michal Hocko wrote:
> On Wed 24-08-16 17:32:00, Oleg Nesterov wrote:
> > On 08/24, Michal Hocko wrote:
> > >
> > > Sounds better?
> > > diff --git a/kernel/fork.c b/kernel/fork.c
> > > index b89f0eb99f0a..ddde5849df81 100644
> > > --- a/kernel/fork.c
> > > +++ b/kernel/fork.c
> > > @@ -914,7 +914,8 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
> > >  
> > >  	/*
> > >  	 * Signal userspace if we're not exiting with a core dump
> > > -	 * or a killed vfork parent which shouldn't touch this mm.
> > > +	 * because we want to leave the value intact for debugging
> > > +	 * purposes.
> > >  	 */
> > >  	if (tsk->clear_child_tid) {
> > >  		if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
> > 
> > Yes, thanks Michal!
> > 
> > Acked-by: Oleg Nesterov <oleg@redhat.com>
> 
> OK, thanks.

ping

> ---
> From 39cad7842660e0261c27f75702d49458a1f3cea1 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Mon, 30 May 2016 20:20:32 +0200
> Subject: [PATCH] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
> 
> fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
> has caused a subtle regression in nscd which uses CLONE_CHILD_CLEARTID
> to clear the nscd_certainly_running flag in the shared databases, so
> that the clients are notified when nscd is restarted.  Now, when nscd
> uses a non-persistent database, clients that have it mapped keep
> thinking the database is being updated by nscd, when in fact nscd has
> created a new (anonymous) one (for non-persistent databases it uses an
> unlinked file as backend).
> 
> The original proposal for the CLONE_CHILD_CLEARTID change claimed
> (https://lkml.org/lkml/2006/10/25/233):
> "
> The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
> on behalf of pthread_create() library calls.  This feature is used to
> request that the kernel clear the thread-id in user space (at an address
> provided in the syscall) when the thread disassociates itself from the
> address space, which is done in mm_release().
> 
> Unfortunately, when a multi-threaded process incurs a core dump (such as
> from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
> the other threads, which then proceed to clear their user-space tids
> before synchronizing in exit_mm() with the start of core dumping.  This
> misrepresents the state of process's address space at the time of the
> SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
> problems (misleading him/her to conclude that the threads had gone away
> before the fault).
> 
> The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
> core dump has been initiated.
> "
> 
> The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
> seems to have a larger scope than the original patch asked for. It seems
> that limitting the scope of the check to core dumping should work for
> SIGSEGV issue describe above.
> 
> [Changelog partly based on Andreas' description]
> Fixes: fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
> Tested-by:  William Preston <wpreston@suse.com>
> Cc: Roland McGrath <roland@hack.frob.com>
> Cc: Andreas Schwab <schwab@suse.com>
> Acked-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  kernel/fork.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 52e725d4a866..ddde5849df81 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -913,14 +913,12 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>  	deactivate_mm(tsk, mm);
>  
>  	/*
> -	 * If we're exiting normally, clear a user-space tid field if
> -	 * requested.  We leave this alone when dying by signal, to leave
> -	 * the value intact in a core dump, and to save the unnecessary
> -	 * trouble, say, a killed vfork parent shouldn't touch this mm.
> -	 * Userland only wants this done for a sys_exit.
> +	 * Signal userspace if we're not exiting with a core dump
> +	 * because we want to leave the value intact for debugging
> +	 * purposes.
>  	 */
>  	if (tsk->clear_child_tid) {
> -		if (!(tsk->flags & PF_SIGNALED) &&
> +		if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
>  		    atomic_read(&mm->mm_users) > 1) {
>  			/*
>  			 * We don't check the error code - if userspace has
> -- 
> 2.8.1
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-08-31  6:42 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-23 16:12 [PATCH v2] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd Michal Hocko
2016-08-23 16:12 ` Michal Hocko
2016-08-23 16:32 ` Oleg Nesterov
2016-08-23 16:32   ` Oleg Nesterov
2016-08-24  8:10   ` Michal Hocko
2016-08-24  8:10     ` Michal Hocko
2016-08-24 15:32     ` Oleg Nesterov
2016-08-24 15:32       ` Oleg Nesterov
2016-08-24 15:37       ` Michal Hocko
2016-08-24 15:37         ` Michal Hocko
2016-08-31  6:42         ` Michal Hocko
2016-08-31  6:42           ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.