All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: <linux-mm@kvack.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	William Preston <wpreston@suse.com>,
	Michal Hocko <mhocko@suse.com>,
	Roland McGrath <roland@hack.frob.com>,
	Oleg Nesterov <oleg@redhat.com>, Andreas Schwab <schwab@suse.com>
Subject: [RFC PATCH] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
Date: Mon,  1 Aug 2016 10:14:47 +0200	[thread overview]
Message-ID: <1470039287-14643-1-git-send-email-mhocko@kernel.org> (raw)

From: Michal Hocko <mhocko@suse.com>

fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
has caused a subtle regression in nscd which uses CLONE_CHILD_CLEARTID
to clear the nscd_certainly_running flag in the shared databases, so
that the clients are notified when nscd is restarted.  Now, when nscd
uses a non-persistent database, clients that have it mapped keep
thinking the database is being updated by nscd, when in fact nscd has
created a new (anonymous) one (for non-persistent databases it uses an
unlinked file as backend).

The original proposal for the CLONE_CHILD_CLEARTID change claimed
(https://lkml.org/lkml/2006/10/25/233):
"
The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
on behalf of pthread_create() library calls.  This feature is used to
request that the kernel clear the thread-id in user space (at an address
provided in the syscall) when the thread disassociates itself from the
address space, which is done in mm_release().

Unfortunately, when a multi-threaded process incurs a core dump (such as
from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
the other threads, which then proceed to clear their user-space tids
before synchronizing in exit_mm() with the start of core dumping.  This
misrepresents the state of process's address space at the time of the
SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
problems (misleading him/her to conclude that the threads had gone away
before the fault).

The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
core dump has been initiated.
"

The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
seems to have a larger scope than the original patch asked for. It seems
that limitting the scope of the check to core dumping should work for
SIGSEGV issue describe above. We should also check for vfork because
this is killable since d68b46fe16ad ("vfork: make it killable").

[Changelog partly based on Andreas' description]
Fixes: fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
Tested-by:  William Preston <wpreston@suse.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andreas Schwab <schwab@suse.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---

Hi,
the issue has been reported by Andreas https://lkml.org/lkml/2015/8/20/459
almost one year ago without any response. This is my attempt to fix the issue
and the testing confirms that nscd doesn't complain with this patch applied
but I have hard time to think through all the consequences, to be honest so
I am sending this as an RFC.

 kernel/fork.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 52e725d4a866..0c1184473c3e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -913,14 +913,12 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 	deactivate_mm(tsk, mm);
 
 	/*
-	 * If we're exiting normally, clear a user-space tid field if
-	 * requested.  We leave this alone when dying by signal, to leave
-	 * the value intact in a core dump, and to save the unnecessary
-	 * trouble, say, a killed vfork parent shouldn't touch this mm.
-	 * Userland only wants this done for a sys_exit.
+	 * Signal userspace if we're not exiting with a core dump
+	 * or a killed vfork parent which shouldn't touch this mm.
 	 */
 	if (tsk->clear_child_tid) {
-		if (!(tsk->flags & PF_SIGNALED) &&
+		if (!(tsk->flags & PF_SIGNALED && tsk->vfork_done) &&
+		    !(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
 		    atomic_read(&mm->mm_users) > 1) {
 			/*
 			 * We don't check the error code - if userspace has
-- 
2.8.1

WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@kernel.org>
To: linux-mm@kvack.org
Cc: LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	William Preston <wpreston@suse.com>,
	Michal Hocko <mhocko@suse.com>,
	Roland McGrath <roland@hack.frob.com>,
	Oleg Nesterov <oleg@redhat.com>, Andreas Schwab <schwab@suse.com>
Subject: [RFC PATCH] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
Date: Mon,  1 Aug 2016 10:14:47 +0200	[thread overview]
Message-ID: <1470039287-14643-1-git-send-email-mhocko@kernel.org> (raw)

From: Michal Hocko <mhocko@suse.com>

fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
has caused a subtle regression in nscd which uses CLONE_CHILD_CLEARTID
to clear the nscd_certainly_running flag in the shared databases, so
that the clients are notified when nscd is restarted.  Now, when nscd
uses a non-persistent database, clients that have it mapped keep
thinking the database is being updated by nscd, when in fact nscd has
created a new (anonymous) one (for non-persistent databases it uses an
unlinked file as backend).

The original proposal for the CLONE_CHILD_CLEARTID change claimed
(https://lkml.org/lkml/2006/10/25/233):
"
The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
on behalf of pthread_create() library calls.  This feature is used to
request that the kernel clear the thread-id in user space (at an address
provided in the syscall) when the thread disassociates itself from the
address space, which is done in mm_release().

Unfortunately, when a multi-threaded process incurs a core dump (such as
from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
the other threads, which then proceed to clear their user-space tids
before synchronizing in exit_mm() with the start of core dumping.  This
misrepresents the state of process's address space at the time of the
SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
problems (misleading him/her to conclude that the threads had gone away
before the fault).

The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
core dump has been initiated.
"

The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
seems to have a larger scope than the original patch asked for. It seems
that limitting the scope of the check to core dumping should work for
SIGSEGV issue describe above. We should also check for vfork because
this is killable since d68b46fe16ad ("vfork: make it killable").

[Changelog partly based on Andreas' description]
Fixes: fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
Tested-by:  William Preston <wpreston@suse.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andreas Schwab <schwab@suse.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---

Hi,
the issue has been reported by Andreas https://lkml.org/lkml/2015/8/20/459
almost one year ago without any response. This is my attempt to fix the issue
and the testing confirms that nscd doesn't complain with this patch applied
but I have hard time to think through all the consequences, to be honest so
I am sending this as an RFC.

 kernel/fork.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 52e725d4a866..0c1184473c3e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -913,14 +913,12 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 	deactivate_mm(tsk, mm);
 
 	/*
-	 * If we're exiting normally, clear a user-space tid field if
-	 * requested.  We leave this alone when dying by signal, to leave
-	 * the value intact in a core dump, and to save the unnecessary
-	 * trouble, say, a killed vfork parent shouldn't touch this mm.
-	 * Userland only wants this done for a sys_exit.
+	 * Signal userspace if we're not exiting with a core dump
+	 * or a killed vfork parent which shouldn't touch this mm.
 	 */
 	if (tsk->clear_child_tid) {
-		if (!(tsk->flags & PF_SIGNALED) &&
+		if (!(tsk->flags & PF_SIGNALED && tsk->vfork_done) &&
+		    !(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
 		    atomic_read(&mm->mm_users) > 1) {
 			/*
 			 * We don't check the error code - if userspace has
-- 
2.8.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

             reply	other threads:[~2016-08-01 10:43 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-01  8:14 Michal Hocko [this message]
2016-08-01  8:14 ` [RFC PATCH] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd Michal Hocko
2016-08-03 21:08 ` Oleg Nesterov
2016-08-03 21:08   ` Oleg Nesterov
2016-08-12  9:41   ` Michal Hocko
2016-08-12  9:41     ` Michal Hocko
2016-08-19 13:25     ` Michal Hocko
2016-08-19 13:25       ` Michal Hocko
2016-08-23 15:27       ` Oleg Nesterov
2016-08-23 15:27         ` Oleg Nesterov
2016-08-23 16:03         ` Michal Hocko
2016-08-23 16:03           ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1470039287-14643-1-git-send-email-mhocko@kernel.org \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=oleg@redhat.com \
    --cc=roland@hack.frob.com \
    --cc=schwab@suse.com \
    --cc=wpreston@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.