linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
@ 2016-08-01  8:14 Michal Hocko
  2016-08-03 21:08 ` Oleg Nesterov
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2016-08-01  8:14 UTC (permalink / raw)
  To: linux-mm
  Cc: LKML, Andrew Morton, William Preston, Michal Hocko,
	Roland McGrath, Oleg Nesterov, Andreas Schwab

From: Michal Hocko <mhocko@suse.com>

fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
has caused a subtle regression in nscd which uses CLONE_CHILD_CLEARTID
to clear the nscd_certainly_running flag in the shared databases, so
that the clients are notified when nscd is restarted.  Now, when nscd
uses a non-persistent database, clients that have it mapped keep
thinking the database is being updated by nscd, when in fact nscd has
created a new (anonymous) one (for non-persistent databases it uses an
unlinked file as backend).

The original proposal for the CLONE_CHILD_CLEARTID change claimed
(https://lkml.org/lkml/2006/10/25/233):
"
The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
on behalf of pthread_create() library calls.  This feature is used to
request that the kernel clear the thread-id in user space (at an address
provided in the syscall) when the thread disassociates itself from the
address space, which is done in mm_release().

Unfortunately, when a multi-threaded process incurs a core dump (such as
from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
the other threads, which then proceed to clear their user-space tids
before synchronizing in exit_mm() with the start of core dumping.  This
misrepresents the state of process's address space at the time of the
SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
problems (misleading him/her to conclude that the threads had gone away
before the fault).

The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
core dump has been initiated.
"

The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
seems to have a larger scope than the original patch asked for. It seems
that limitting the scope of the check to core dumping should work for
SIGSEGV issue describe above. We should also check for vfork because
this is killable since d68b46fe16ad ("vfork: make it killable").

[Changelog partly based on Andreas' description]
Fixes: fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
Tested-by:  William Preston <wpreston@suse.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andreas Schwab <schwab@suse.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---

Hi,
the issue has been reported by Andreas https://lkml.org/lkml/2015/8/20/459
almost one year ago without any response. This is my attempt to fix the issue
and the testing confirms that nscd doesn't complain with this patch applied
but I have hard time to think through all the consequences, to be honest so
I am sending this as an RFC.

 kernel/fork.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 52e725d4a866..0c1184473c3e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -913,14 +913,12 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 	deactivate_mm(tsk, mm);
 
 	/*
-	 * If we're exiting normally, clear a user-space tid field if
-	 * requested.  We leave this alone when dying by signal, to leave
-	 * the value intact in a core dump, and to save the unnecessary
-	 * trouble, say, a killed vfork parent shouldn't touch this mm.
-	 * Userland only wants this done for a sys_exit.
+	 * Signal userspace if we're not exiting with a core dump
+	 * or a killed vfork parent which shouldn't touch this mm.
 	 */
 	if (tsk->clear_child_tid) {
-		if (!(tsk->flags & PF_SIGNALED) &&
+		if (!(tsk->flags & PF_SIGNALED && tsk->vfork_done) &&
+		    !(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
 		    atomic_read(&mm->mm_users) > 1) {
 			/*
 			 * We don't check the error code - if userspace has
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
  2016-08-01  8:14 [RFC PATCH] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd Michal Hocko
@ 2016-08-03 21:08 ` Oleg Nesterov
  2016-08-12  9:41   ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2016-08-03 21:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, LKML, Andrew Morton, William Preston, Michal Hocko,
	Roland McGrath, Andreas Schwab

sorry for delay, I am travelling till the end of the week.

On 08/01, Michal Hocko wrote:
>
> fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")

almost 10 years ago ;)

> has caused a subtle regression in nscd which uses CLONE_CHILD_CLEARTID
> to clear the nscd_certainly_running flag in the shared databases, so
> that the clients are notified when nscd is restarted.

So iiuc with this patch nscd_certainly_running should be cleared even if
ncsd was killed by !sig_kernel_coredump() signal, right?

> We should also check for vfork because
> this is killable since d68b46fe16ad ("vfork: make it killable").

Hmm, why? Can't understand... In any case this check doesn't look right, the
comment says "a killed vfork parent" while tsk->vfork_done != NULL means it
is a vforked child.

So if we want this change, why we can't simply do

	-	if (!(tsk->flags & PF_SIGNALED) &&
	+	if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&

?

And I think PF_SIGNALED must die in any case... but this is off-topic.

Oleg.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
  2016-08-03 21:08 ` Oleg Nesterov
@ 2016-08-12  9:41   ` Michal Hocko
  2016-08-19 13:25     ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2016-08-12  9:41 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-mm, LKML, Andrew Morton, William Preston, Roland McGrath,
	Andreas Schwab

On Wed 03-08-16 23:08:04, Oleg Nesterov wrote:
> sorry for delay, I am travelling till the end of the week.

Same here...

> On 08/01, Michal Hocko wrote:
> >
> > fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
> 
> almost 10 years ago ;)

Yes, it's been a while... I guess nscd doesn't enable persistent host
caching by default. I just know that our customer wanted to enable this
feature to find out it doesn't work properly. At least that is my
understanding.

> > has caused a subtle regression in nscd which uses CLONE_CHILD_CLEARTID
> > to clear the nscd_certainly_running flag in the shared databases, so
> > that the clients are notified when nscd is restarted.
> 
> So iiuc with this patch nscd_certainly_running should be cleared even if
> ncsd was killed by !sig_kernel_coredump() signal, right?

Yes.

> > We should also check for vfork because
> > this is killable since d68b46fe16ad ("vfork: make it killable").
> 
> Hmm, why? Can't understand... In any case this check doesn't look right, the
> comment says "a killed vfork parent" while tsk->vfork_done != NULL means it
> is a vforked child.
> 
> So if we want this change, why we can't simply do
> 
> 	-	if (!(tsk->flags & PF_SIGNALED) &&
> 	+	if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
> 
> ?

This is what I had initially. But then the comment above the check made
me worried that the parent of vforked child might get confused if the
flag is cleared. I might have completely misunderstood the point of the
comment though. So if you believe that vfork_done check is incorrect I
can drop it. It shouldn't have any effect on the nscd usecase AFAIU.

Thanks!

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
  2016-08-12  9:41   ` Michal Hocko
@ 2016-08-19 13:25     ` Michal Hocko
  2016-08-23 15:27       ` Oleg Nesterov
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2016-08-19 13:25 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-mm, LKML, Andrew Morton, William Preston, Roland McGrath,
	Andreas Schwab

On Fri 12-08-16 11:41:13, Michal Hocko wrote:
> On Wed 03-08-16 23:08:04, Oleg Nesterov wrote:
> > sorry for delay, I am travelling till the end of the week.
> 
> Same here...
> 
> > On 08/01, Michal Hocko wrote:
[...]
> > > We should also check for vfork because
> > > this is killable since d68b46fe16ad ("vfork: make it killable").
> > 
> > Hmm, why? Can't understand... In any case this check doesn't look right, the
> > comment says "a killed vfork parent" while tsk->vfork_done != NULL means it
> > is a vforked child.
> > 
> > So if we want this change, why we can't simply do
> > 
> > 	-	if (!(tsk->flags & PF_SIGNALED) &&
> > 	+	if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
> > 
> > ?
> 
> This is what I had initially. But then the comment above the check made
> me worried that the parent of vforked child might get confused if the
> flag is cleared. I might have completely misunderstood the point of the
> comment though. So if you believe that vfork_done check is incorrect I
> can drop it. It shouldn't have any effect on the nscd usecase AFAIU.

So should I drop the vfork check and repost or we do not care about this
"regression" and declare nscd broken because it relies on a behavior
which is not in fact guaranteed by the kernel?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
  2016-08-19 13:25     ` Michal Hocko
@ 2016-08-23 15:27       ` Oleg Nesterov
  2016-08-23 16:03         ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2016-08-23 15:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, LKML, Andrew Morton, William Preston, Roland McGrath,
	Andreas Schwab

On 08/19, Michal Hocko wrote:
>
> On Fri 12-08-16 11:41:13, Michal Hocko wrote:
> > On Wed 03-08-16 23:08:04, Oleg Nesterov wrote:
> > >
> > > So if we want this change, why we can't simply do
> > >
> > > 	-	if (!(tsk->flags & PF_SIGNALED) &&
> > > 	+	if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
> > >
> > > ?
> >
> > This is what I had initially. But then the comment above the check made
> > me worried that the parent of vforked child might get confused if the
> > flag is cleared.

I don't think the child can be confused... At least I can't imagine how
this can happen.

Anyway, I objected because the tsk->vfork != NULL check was wrong, in this
case this tsk is vforke'd child, not parent.

> So should I drop the vfork check and repost

Probably yes. At least the SIGNAL_GROUP_COREDUMP will match the comment.

> or we do not care about this
> "regression"

Honestly, I do not know ;) Personally, I am always scared when it comes
to the subtle changes like this, you can never know what can be broken.
And note that it can be broken 10 years later, like it happened with
nscd ;)

But if you send the s/PF_SIGNALED/SIGNAL_GROUP_COREDUMP/ change I will
ack it ;) Even if it won't really fix this nscd problem (imo), because
I guess nscd wants to reset ->clear_child_tid even if the signal was
sig_kernel_coredump().

Oleg.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
  2016-08-23 15:27       ` Oleg Nesterov
@ 2016-08-23 16:03         ` Michal Hocko
  0 siblings, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2016-08-23 16:03 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-mm, LKML, Andrew Morton, William Preston, Roland McGrath,
	Andreas Schwab

On Tue 23-08-16 17:27:11, Oleg Nesterov wrote:
> On 08/19, Michal Hocko wrote:
[...]
> > or we do not care about this
> > "regression"
> 
> Honestly, I do not know ;) Personally, I am always scared when it comes
> to the subtle changes like this, you can never know what can be broken.

If _you_ are scarred (after so many years of permanent exposure to this
code) then try to imagine how I am scarred when touching anything in
this area...

> And note that it can be broken 10 years later, like it happened with
> nscd ;)
> 
> But if you send the s/PF_SIGNALED/SIGNAL_GROUP_COREDUMP/ change I will
> ack it ;)

OK, I will repost

> Even if it won't really fix this nscd problem (imo), because
> I guess nscd wants to reset ->clear_child_tid even if the signal was
> sig_kernel_coredump().

Come on, have you ever seen this fine piece of software crashing?
But more seriously, I wouldn't give a damn because nscd is usually the
first thing I disable on my systems but there seem to be people who
would like to use this persistence thingy and even service restart will
break it. So I think we should plug this hole.

Anyway thanks for your review and feedback. As always it is really
appreciated!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-08-23 16:19 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-01  8:14 [RFC PATCH] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd Michal Hocko
2016-08-03 21:08 ` Oleg Nesterov
2016-08-12  9:41   ` Michal Hocko
2016-08-19 13:25     ` Michal Hocko
2016-08-23 15:27       ` Oleg Nesterov
2016-08-23 16:03         ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).