All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] tty: Make __do_SAK() less greedy in regard to tasklist_lock
@ 2018-01-17 12:39 Kirill Tkhai
  2018-01-17 12:39 ` [PATCH v2 1/3] Revert "do_SAK: Don't recursively take the tasklist_lock" Kirill Tkhai
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Kirill Tkhai @ 2018-01-17 12:39 UTC (permalink / raw)
  To: gregkh, jslaby, oleg, ebiederm, ktkhai; +Cc: linux-kernel

Hi,

this patchset makes __do_SAK() to take tasklist_lock for very small time
in comparison to that it does now. Though this function is executed
in process context and it takes tasklist_lock read locked with interrupts enabled,
another tasks may want to take it for writing with interrupt disabled
(e.g., forking tasks), and these tasks may evoke hard lockups.

I've observed several hard lockups caused by long execution of __do_SAK()
on the node with 200 big containers. 3.10 kernel is used there, and mainline
kernel does not have differences in comparation to that, because of __do_SAK()
function has not changed for a long time. So, mainline kernel has this problem too.

The patchset proposes two optimizations in __do__SAK(). The first one is
to skip threads, when they share previous thread's fd table [2/3].

The second optimization is to iterate task list under rcu_read_lock().
This allows to take tasklist_lock for a very small time just to check we
reached the end of the task list. See patch [3/3] for the details.

v2: All three patches changed. Now we don't care about races with
    unshare_files() and do not take tasklist_lock on reaching task
    list end. Link to v1: https://lkml.org/lkml/2018/1/11/486

Thanks,
Kirill

---

Kirill Tkhai (2):
      Revert "do_SAK: Don't recursively take the tasklist_lock"
      tty: Use RCU read lock to iterate tasks and threads in __do_SAK()

Oleg Nesterov (1):
      tty: Avoid threads files iterations in __do_SAK()


 drivers/tty/tty_io.c |   41 ++++++++++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 13 deletions(-)

--
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/3] Revert "do_SAK: Don't recursively take the tasklist_lock"
  2018-01-17 12:39 [PATCH v2 0/3] tty: Make __do_SAK() less greedy in regard to tasklist_lock Kirill Tkhai
@ 2018-01-17 12:39 ` Kirill Tkhai
  2018-01-17 17:18   ` Eric W. Biederman
  2018-01-17 12:39 ` [PATCH v2 2/3] tty: Avoid threads files iterations in __do_SAK() Kirill Tkhai
  2018-01-17 12:39 ` [PATCH v2 3/3] tty: Use RCU read lock to iterate tasks and threads " Kirill Tkhai
  2 siblings, 1 reply; 15+ messages in thread
From: Kirill Tkhai @ 2018-01-17 12:39 UTC (permalink / raw)
  To: gregkh, jslaby, oleg, ebiederm, ktkhai; +Cc: linux-kernel

This reverts commit 20ac94378de5.

send_sig() does not take tasklist_lock for a long time,
so this commit and the problem it solves are not relevant
anymore.

Also, the problem of force_sig() is it clears SIGNAL_UNKILLABLE
flag, thus even global init may be killed by __do_SAK(),
which is definitely not the expected behavior.

Came from discussion in "tty: Iterate only thread group leaders in __do_SAK()"
https://lkml.org/lkml/2018/1/11/492

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 drivers/tty/tty_io.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index dc60aeea87d8..84715ba1aee2 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -2737,7 +2737,7 @@ void __do_SAK(struct tty_struct *tty)
 		if (i != 0) {
 			tty_notice(tty, "SAK: killed process %d (%s): by fd#%d\n",
 				   task_pid_nr(p), p->comm, i - 1);
-			force_sig(SIGKILL, p);
+			send_sig(SIGKILL, p, 1);
 		}
 		task_unlock(p);
 	} while_each_thread(g, p);

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/3] tty: Avoid threads files iterations in __do_SAK()
  2018-01-17 12:39 [PATCH v2 0/3] tty: Make __do_SAK() less greedy in regard to tasklist_lock Kirill Tkhai
  2018-01-17 12:39 ` [PATCH v2 1/3] Revert "do_SAK: Don't recursively take the tasklist_lock" Kirill Tkhai
@ 2018-01-17 12:39 ` Kirill Tkhai
  2018-01-17 12:39 ` [PATCH v2 3/3] tty: Use RCU read lock to iterate tasks and threads " Kirill Tkhai
  2 siblings, 0 replies; 15+ messages in thread
From: Kirill Tkhai @ 2018-01-17 12:39 UTC (permalink / raw)
  To: gregkh, jslaby, oleg, ebiederm, ktkhai; +Cc: linux-kernel

From: Oleg Nesterov <oleg@redhat.com>

The patch makes __do_SAK() iterate a next thread files
only in case of the thread's files are different
to previous. I.e., if all threads points the same
files_struct, the files will be iterated only once.

Since all threads have the same files_struct is
the generic case for most Linux systems, this
improvement should clearly speed up __do_SAK()
execution.

Also, for_each_process()/for_each_thread() are
used instead of do_each_thread()/while_each_thread().
This prepares __do_SAK() to become tasklist_lock
free, and will be made in next patch.

[ktkhai: comment written]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 drivers/tty/tty_io.c |   37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 84715ba1aee2..89326cee2403 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -2704,7 +2704,8 @@ void __do_SAK(struct tty_struct *tty)
 #ifdef TTY_SOFT_SAK
 	tty_hangup(tty);
 #else
-	struct task_struct *g, *p;
+	struct task_struct *p, *t;
+	struct files_struct *files;
 	struct pid *session;
 	int		i;
 
@@ -2725,22 +2726,34 @@ void __do_SAK(struct tty_struct *tty)
 	} while_each_pid_task(session, PIDTYPE_SID, p);
 
 	/* Now kill any processes that happen to have the tty open */
-	do_each_thread(g, p) {
+	for_each_process(p) {
 		if (p->signal->tty == tty) {
 			tty_notice(tty, "SAK: killed process %d (%s): by controlling tty\n",
 				   task_pid_nr(p), p->comm);
-			send_sig(SIGKILL, p, 1);
-			continue;
+			goto kill;
 		}
-		task_lock(p);
-		i = iterate_fd(p->files, 0, this_tty, tty);
-		if (i != 0) {
-			tty_notice(tty, "SAK: killed process %d (%s): by fd#%d\n",
-				   task_pid_nr(p), p->comm, i - 1);
-			send_sig(SIGKILL, p, 1);
+
+		files = NULL;
+		for_each_thread(p, t) {
+			if (t->files == files) /* racy but we do not care */
+				continue;
+
+			task_lock(t);
+			files = t->files;
+			i = iterate_fd(files, 0, this_tty, tty);
+			task_unlock(t);
+
+			if (i != 0) {
+				tty_notice(tty, "SAK: killed process %d (%s): by fd#%d\n",
+					   task_pid_nr(t), t->comm, i - 1);
+				goto kill;
+			}
 		}
-		task_unlock(p);
-	} while_each_thread(g, p);
+
+		continue;
+kill:
+		send_sig(SIGKILL, p, 1);
+	}
 	read_unlock(&tasklist_lock);
 #endif
 }

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/3] tty: Use RCU read lock to iterate tasks and threads in __do_SAK()
  2018-01-17 12:39 [PATCH v2 0/3] tty: Make __do_SAK() less greedy in regard to tasklist_lock Kirill Tkhai
  2018-01-17 12:39 ` [PATCH v2 1/3] Revert "do_SAK: Don't recursively take the tasklist_lock" Kirill Tkhai
  2018-01-17 12:39 ` [PATCH v2 2/3] tty: Avoid threads files iterations in __do_SAK() Kirill Tkhai
@ 2018-01-17 12:39 ` Kirill Tkhai
  2018-01-17 16:54   ` Eric W. Biederman
  2 siblings, 1 reply; 15+ messages in thread
From: Kirill Tkhai @ 2018-01-17 12:39 UTC (permalink / raw)
  To: gregkh, jslaby, oleg, ebiederm, ktkhai; +Cc: linux-kernel

There were made several efforts to make __do_SAK()
working in process context long ago, but it does
not solves the problem completely. Since __do_SAK()
may take tasklist_lock for a long time, the concurent
processes, waiting for write lock with interrupts
disabled (e.g., forking), get into the same situation
like __do_SAK() would have been executed in interrupt
context. I've observed several hard lockups on 3.10
kernel running 200 containers, caused by long duration
of copy_process()->write_lock_irq() after SAK was sent
to a tty. Current mainline kernel has the same problem.

The solution is to use RCU to iterate processes and threads.
Task list integrity is the only reason we taken tasklist_lock
before, as tty subsys primitives mostly take it for reading
also (e.g., __proc_set_tty). RCU read lock is enough for that.
This patch solves the problem and makes __do_SAK() to be
not greedy of tasklist_lock. That should prevent hard lockups
I've pointed above.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 drivers/tty/tty_io.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 89326cee2403..55115e65668d 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -2724,7 +2724,9 @@ void __do_SAK(struct tty_struct *tty)
 			   task_pid_nr(p), p->comm);
 		send_sig(SIGKILL, p, 1);
 	} while_each_pid_task(session, PIDTYPE_SID, p);
+	read_unlock(&tasklist_lock);
 
+	rcu_read_lock();
 	/* Now kill any processes that happen to have the tty open */
 	for_each_process(p) {
 		if (p->signal->tty == tty) {
@@ -2754,7 +2756,7 @@ void __do_SAK(struct tty_struct *tty)
 kill:
 		send_sig(SIGKILL, p, 1);
 	}
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 #endif
 }
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] tty: Use RCU read lock to iterate tasks and threads in __do_SAK()
  2018-01-17 12:39 ` [PATCH v2 3/3] tty: Use RCU read lock to iterate tasks and threads " Kirill Tkhai
@ 2018-01-17 16:54   ` Eric W. Biederman
  2018-01-17 17:39     ` Oleg Nesterov
  2018-01-18 10:11     ` Kirill Tkhai
  0 siblings, 2 replies; 15+ messages in thread
From: Eric W. Biederman @ 2018-01-17 16:54 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: gregkh, jslaby, oleg, linux-kernel

Kirill Tkhai <ktkhai@virtuozzo.com> writes:

> There were made several efforts to make __do_SAK()
> working in process context long ago, but it does
> not solves the problem completely. Since __do_SAK()
> may take tasklist_lock for a long time, the concurent
> processes, waiting for write lock with interrupts
> disabled (e.g., forking), get into the same situation
> like __do_SAK() would have been executed in interrupt
> context. I've observed several hard lockups on 3.10
> kernel running 200 containers, caused by long duration
> of copy_process()->write_lock_irq() after SAK was sent
> to a tty. Current mainline kernel has the same problem.
>
> The solution is to use RCU to iterate processes and threads.
> Task list integrity is the only reason we taken tasklist_lock
> before, as tty subsys primitives mostly take it for reading
> also (e.g., __proc_set_tty). RCU read lock is enough for that.
> This patch solves the problem and makes __do_SAK() to be
> not greedy of tasklist_lock. That should prevent hard lockups
> I've pointed above.

__do_SAK() needs to be 100% accurate.  I do not see the rcu_read_lock
guaranteeing that new processes created while the process list is being
iterated that happen to have a reference to the tty will be seen.

So I do not believe this is the actual fix to the problem.  Especially
not if we intend to for SAK to remain a secure attention key that
guarantees no other processes have access to the tty.

Eric


> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> ---
>  drivers/tty/tty_io.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
> index 89326cee2403..55115e65668d 100644
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -2724,7 +2724,9 @@ void __do_SAK(struct tty_struct *tty)
>  			   task_pid_nr(p), p->comm);
>  		send_sig(SIGKILL, p, 1);
>  	} while_each_pid_task(session, PIDTYPE_SID, p);
> +	read_unlock(&tasklist_lock);
>  
> +	rcu_read_lock();
>  	/* Now kill any processes that happen to have the tty open */
>  	for_each_process(p) {
>  		if (p->signal->tty == tty) {
> @@ -2754,7 +2756,7 @@ void __do_SAK(struct tty_struct *tty)
>  kill:
>  		send_sig(SIGKILL, p, 1);
>  	}
> -	read_unlock(&tasklist_lock);
> +	rcu_read_unlock();
>  #endif
>  }
>  

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] Revert "do_SAK: Don't recursively take the tasklist_lock"
  2018-01-17 12:39 ` [PATCH v2 1/3] Revert "do_SAK: Don't recursively take the tasklist_lock" Kirill Tkhai
@ 2018-01-17 17:18   ` Eric W. Biederman
  2018-01-17 17:34     ` Oleg Nesterov
  2018-01-18  9:59     ` Kirill Tkhai
  0 siblings, 2 replies; 15+ messages in thread
From: Eric W. Biederman @ 2018-01-17 17:18 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: gregkh, jslaby, oleg, linux-kernel

Kirill Tkhai <ktkhai@virtuozzo.com> writes:

> This reverts commit 20ac94378de5.
>
> send_sig() does not take tasklist_lock for a long time,
> so this commit and the problem it solves are not relevant
> anymore.
>
> Also, the problem of force_sig() is it clears SIGNAL_UNKILLABLE
> flag, thus even global init may be killed by __do_SAK(),
> which is definitely not the expected behavior.

Actually it is.

SAK should kill everything that has the tty open.  If init opens the tty
I am so sorry, it can not operate correctly.  init should not have your
tty open.

The alternative and perhaps the better option is to simply remove SAK
support, if we can not make it race free.

But yes SAK very much does care about races.  

> Came from discussion in "tty: Iterate only thread group leaders in __do_SAK()"
> https://lkml.org/lkml/2018/1/11/492
>
> Suggested-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> ---
>  drivers/tty/tty_io.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
> index dc60aeea87d8..84715ba1aee2 100644
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -2737,7 +2737,7 @@ void __do_SAK(struct tty_struct *tty)
>  		if (i != 0) {
>  			tty_notice(tty, "SAK: killed process %d (%s): by fd#%d\n",
>  				   task_pid_nr(p), p->comm, i - 1);
> -			force_sig(SIGKILL, p);
> +			send_sig(SIGKILL, p, 1);
>  		}
>  		task_unlock(p);
>  	} while_each_thread(g, p);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] Revert "do_SAK: Don't recursively take the tasklist_lock"
  2018-01-17 17:18   ` Eric W. Biederman
@ 2018-01-17 17:34     ` Oleg Nesterov
  2018-01-17 17:49       ` Eric W. Biederman
  2018-01-18  9:59     ` Kirill Tkhai
  1 sibling, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2018-01-17 17:34 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Kirill Tkhai, gregkh, jslaby, linux-kernel

On 01/17, Eric W. Biederman wrote:

> Kirill Tkhai <ktkhai@virtuozzo.com> writes:
>
> > This reverts commit 20ac94378de5.
> >
> > send_sig() does not take tasklist_lock for a long time,
> > so this commit and the problem it solves are not relevant
> > anymore.
> >
> > Also, the problem of force_sig() is it clears SIGNAL_UNKILLABLE
> > flag, thus even global init may be killed by __do_SAK(),
> > which is definitely not the expected behavior.
>
> Actually it is.
>
> SAK should kill everything that has the tty open.  If init opens the tty
> I am so sorry, it can not operate correctly.  init should not have your
> tty open.

OK, but then we need "force" in other places too. __do_SAK() does send_sig(SIGKILL)
in do_each_pid_task(PIDTYPE_SID) and if signal->tty == tty.

Plus force_sig() is not rcu-friendly.

So I personally agree with this change. Whether we want to kill the global init
or not should be discussed, if we want to do this __do_SAK() should use
SEND_SIG_FORCED and this is what Kirill is going to do (iiuc), but this needs
another patch.

Oleg.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] tty: Use RCU read lock to iterate tasks and threads in __do_SAK()
  2018-01-17 16:54   ` Eric W. Biederman
@ 2018-01-17 17:39     ` Oleg Nesterov
  2018-01-18 10:11     ` Kirill Tkhai
  1 sibling, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2018-01-17 17:39 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Kirill Tkhai, gregkh, jslaby, linux-kernel

On 01/17, Eric W. Biederman wrote:
>
> __do_SAK() needs to be 100% accurate.

But it can't. A process/thread can open tty right after the check.

> I do not see the rcu_read_lock
> guaranteeing that new processes created while the process list is being
> iterated that happen to have a reference to the tty will be seen.

We can't miss the new child if its parent has this tty opened at fork() time,
__do_SAK() sends SIGKILL and ->siglock serializes __do_SAK() with copy_process()
which checks signal_pending() under the same ->siglock. So either fork() should
fail or for_each_process() should see the new child.

Right?

Otherwise we do not care. The child can open tty later but this doesn't differ
from the "race" above.

Oleg.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] Revert "do_SAK: Don't recursively take the tasklist_lock"
  2018-01-17 17:34     ` Oleg Nesterov
@ 2018-01-17 17:49       ` Eric W. Biederman
  2018-01-17 18:04         ` Oleg Nesterov
  2018-01-18 10:07         ` Kirill Tkhai
  0 siblings, 2 replies; 15+ messages in thread
From: Eric W. Biederman @ 2018-01-17 17:49 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Kirill Tkhai, gregkh, jslaby, linux-kernel

Oleg Nesterov <oleg@redhat.com> writes:

> On 01/17, Eric W. Biederman wrote:
>
>> Kirill Tkhai <ktkhai@virtuozzo.com> writes:
>>
>> > This reverts commit 20ac94378de5.
>> >
>> > send_sig() does not take tasklist_lock for a long time,
>> > so this commit and the problem it solves are not relevant
>> > anymore.
>> >
>> > Also, the problem of force_sig() is it clears SIGNAL_UNKILLABLE
>> > flag, thus even global init may be killed by __do_SAK(),
>> > which is definitely not the expected behavior.
>>
>> Actually it is.
>>
>> SAK should kill everything that has the tty open.  If init opens the tty
>> I am so sorry, it can not operate correctly.  init should not have your
>> tty open.
>
> OK, but then we need "force" in other places too. __do_SAK() does send_sig(SIGKILL)
> in do_each_pid_task(PIDTYPE_SID) and if signal->tty == tty.
>
> Plus force_sig() is not rcu-friendly.
>
> So I personally agree with this change. Whether we want to kill the global init
> or not should be discussed, if we want to do this __do_SAK() should use
> SEND_SIG_FORCED and this is what Kirill is going to do (iiuc), but this needs
> another patch.

To operate correctly, do_SAK() needs to kill everything that has the tty
open.  Unless we can make that guarantee I don't see the point of
changing do_SAK.

It would be better to give up on do_SAK altogether than to keep do_SAK
limping along and failing to meet it's security guarantees.

If there are real world races, let's document those and say do_SAK has
been broken for X number of years and just remove it.  Right now that
seems the more reasonable course.

Unless there truly is someone using do_SAK to ensure they have a tty all
to themselves.

Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] Revert "do_SAK: Don't recursively take the tasklist_lock"
  2018-01-17 17:49       ` Eric W. Biederman
@ 2018-01-17 18:04         ` Oleg Nesterov
  2018-01-17 18:37           ` Eric W. Biederman
  2018-01-18 10:07         ` Kirill Tkhai
  1 sibling, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2018-01-17 18:04 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Kirill Tkhai, gregkh, jslaby, linux-kernel

On 01/17, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > On 01/17, Eric W. Biederman wrote:
> >
> >> Kirill Tkhai <ktkhai@virtuozzo.com> writes:
> >>
> >> > This reverts commit 20ac94378de5.
> >> >
> >> > send_sig() does not take tasklist_lock for a long time,
> >> > so this commit and the problem it solves are not relevant
> >> > anymore.
> >> >
> >> > Also, the problem of force_sig() is it clears SIGNAL_UNKILLABLE
> >> > flag, thus even global init may be killed by __do_SAK(),
> >> > which is definitely not the expected behavior.
> >>
> >> Actually it is.
> >>
> >> SAK should kill everything that has the tty open.  If init opens the tty
> >> I am so sorry, it can not operate correctly.  init should not have your
> >> tty open.
> >
> > OK, but then we need "force" in other places too. __do_SAK() does send_sig(SIGKILL)
> > in do_each_pid_task(PIDTYPE_SID) and if signal->tty == tty.
> >
> > Plus force_sig() is not rcu-friendly.
> >
> > So I personally agree with this change. Whether we want to kill the global init
> > or not should be discussed, if we want to do this __do_SAK() should use
> > SEND_SIG_FORCED and this is what Kirill is going to do (iiuc), but this needs
> > another patch.
>
> To operate correctly, do_SAK() needs to kill everything that has the tty
> open.  Unless we can make that guarantee I don't see the point of
> changing do_SAK.

OK, but how this connects to this change?

Again, this force_sig() doesn't match other send_sig()'s in __do_SAK(),
and Kirill is going to turn them all into send_sig_info(SEND_SIG_FORCED).
Just we need to discuss whether we need to skip the global init or not
but this is another story.

So why do you dislike this change?

force_sig() should die anyway. At least in its current form, it should not
be used unless task == current. But this is off-topic.

Oleg.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] Revert "do_SAK: Don't recursively take the tasklist_lock"
  2018-01-17 18:04         ` Oleg Nesterov
@ 2018-01-17 18:37           ` Eric W. Biederman
  2018-01-17 20:43             ` Oleg Nesterov
  0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2018-01-17 18:37 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Kirill Tkhai, gregkh, jslaby, linux-kernel

Oleg Nesterov <oleg@redhat.com> writes:

> On 01/17, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@redhat.com> writes:
>>
>> > On 01/17, Eric W. Biederman wrote:
>> >
>> >> Kirill Tkhai <ktkhai@virtuozzo.com> writes:
>> >>
>> >> > This reverts commit 20ac94378de5.
>> >> >
>> >> > send_sig() does not take tasklist_lock for a long time,
>> >> > so this commit and the problem it solves are not relevant
>> >> > anymore.
>> >> >
>> >> > Also, the problem of force_sig() is it clears SIGNAL_UNKILLABLE
>> >> > flag, thus even global init may be killed by __do_SAK(),
>> >> > which is definitely not the expected behavior.
>> >>
>> >> Actually it is.
>> >>
>> >> SAK should kill everything that has the tty open.  If init opens the tty
>> >> I am so sorry, it can not operate correctly.  init should not have your
>> >> tty open.
>> >
>> > OK, but then we need "force" in other places too. __do_SAK() does send_sig(SIGKILL)
>> > in do_each_pid_task(PIDTYPE_SID) and if signal->tty == tty.
>> >
>> > Plus force_sig() is not rcu-friendly.
>> >
>> > So I personally agree with this change. Whether we want to kill the global init
>> > or not should be discussed, if we want to do this __do_SAK() should use
>> > SEND_SIG_FORCED and this is what Kirill is going to do (iiuc), but this needs
>> > another patch.
>>
>> To operate correctly, do_SAK() needs to kill everything that has the tty
>> open.  Unless we can make that guarantee I don't see the point of
>> changing do_SAK.
>
> OK, but how this connects to this change?
>
> Again, this force_sig() doesn't match other send_sig()'s in __do_SAK(),
> and Kirill is going to turn them all into send_sig_info(SEND_SIG_FORCED).
> Just we need to discuss whether we need to skip the global init or not
> but this is another story.
>
> So why do you dislike this change?
>
> force_sig() should die anyway. At least in its current form, it should not
> be used unless task == current. But this is off-topic.

I see that as a fair criticism of force_sig,
and a good argument to use send_sig(SIGKILL, SEND_SIG_FORCED).

Which will kill the global init.

What I don't like is a bunch of patches to introduce races and make
something more racy that should be a logical atomic operation to kill
all of the processes that have a certain tty open so that on the next
open there will be exactly one process with the tty open.

I guess it is a super vhangup.

The purported purpose of SAK is for security.   Breaking security for
performance is not ok.  See what that just did to intel.

So we either need to say do_SAK is broken.  In which case the proper fix
is to just delete the thing.  Or we need not to ensure the final
implemenation is an atomic kill of everything that has the tty open.

I think if these patches can justify using rcu with races in the current
do_SAK implementation than I think do_SAK can just die.  Removing do_SAK
would be a much better way of ensuring do_SAK does not have long lock
hold times.

Races in do_SAK do not justify saying it is ok to introduce more races
in do_SAK.  Either do_SAK is not fit for purpose or it is.

Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] Revert "do_SAK: Don't recursively take the tasklist_lock"
  2018-01-17 18:37           ` Eric W. Biederman
@ 2018-01-17 20:43             ` Oleg Nesterov
  0 siblings, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2018-01-17 20:43 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Kirill Tkhai, gregkh, jslaby, linux-kernel

On 01/17, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> >> To operate correctly, do_SAK() needs to kill everything that has the tty
> >> open.  Unless we can make that guarantee I don't see the point of
> >> changing do_SAK.
> >
> > OK, but how this connects to this change?
> >
> > Again, this force_sig() doesn't match other send_sig()'s in __do_SAK(),
> > and Kirill is going to turn them all into send_sig_info(SEND_SIG_FORCED).
> > Just we need to discuss whether we need to skip the global init or not
> > but this is another story.
> >
> > So why do you dislike this change?
> >
> > force_sig() should die anyway. At least in its current form, it should not
> > be used unless task == current. But this is off-topic.
>
> I see that as a fair criticism of force_sig,
> and a good argument to use send_sig(SIGKILL, SEND_SIG_FORCED).
>
> Which will kill the global init.

and iiuc you think this is right. I won't argue, but again, this needs some
discussion imo.

And in fact Kirill was going to do this before anything else, it was me who
(rightly or not) suggested to do this after cleanups because this is the user
visible change.

> What I don't like is a bunch of patches to introduce races and make
> something more racy

Why? I do not see how this series can add the new problems or races, technically
it looks correct to me.

> So we either need to say do_SAK is broken.  In which case the proper fix
> is to just delete the thing.

I personally never used it so I am fine with your suggestion ;)

> Or we need not to ensure the final
> implemenation is an atomic kill of everything that has the tty open.

Then I think we need some changes in drivers/tty/, with or without Kirill's
changes.

May be some flag set/cleared by __do_SAK() which should make tty_open() fail.
Not sure about tty's passed via unix sockets... and actually I have no idea
how much much paranoia __do_SAK() needs.

Oleg.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] Revert "do_SAK: Don't recursively take the tasklist_lock"
  2018-01-17 17:18   ` Eric W. Biederman
  2018-01-17 17:34     ` Oleg Nesterov
@ 2018-01-18  9:59     ` Kirill Tkhai
  1 sibling, 0 replies; 15+ messages in thread
From: Kirill Tkhai @ 2018-01-18  9:59 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: gregkh, jslaby, oleg, linux-kernel

On 17.01.2018 20:18, Eric W. Biederman wrote:
> Kirill Tkhai <ktkhai@virtuozzo.com> writes:
> 
>> This reverts commit 20ac94378de5.
>>
>> send_sig() does not take tasklist_lock for a long time,
>> so this commit and the problem it solves are not relevant
>> anymore.
>>
>> Also, the problem of force_sig() is it clears SIGNAL_UNKILLABLE
>> flag, thus even global init may be killed by __do_SAK(),
>> which is definitely not the expected behavior.
> 
> Actually it is.
> 
> SAK should kill everything that has the tty open.  If init opens the tty
> I am so sorry, it can not operate correctly.  init should not have your
> tty open.
> 
> The alternative and perhaps the better option is to simply remove SAK
> support, if we can not make it race free.
> 
> But yes SAK very much does care about races.  

As Oleg pointed, this patch does not introduce new races in comparison
to what we have now. Please, look at the discussion we started here:
https://lkml.org/lkml/2018/1/12/171. You wasn't in CC, as there was
not reverted your patch 20ac94378de5. Oleg suggested an accurate way
to revert the commit firstly before doing all the things, because
currently there is an ambiguity with force_sig() and send_sig() in
the single function. My patch introduces uniformity in the function.
> 
>> Came from discussion in "tty: Iterate only thread group leaders in __do_SAK()"
>> https://lkml.org/lkml/2018/1/11/492
>>
>> Suggested-by: Oleg Nesterov <oleg@redhat.com>
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>> ---
>>  drivers/tty/tty_io.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
>> index dc60aeea87d8..84715ba1aee2 100644
>> --- a/drivers/tty/tty_io.c
>> +++ b/drivers/tty/tty_io.c
>> @@ -2737,7 +2737,7 @@ void __do_SAK(struct tty_struct *tty)
>>  		if (i != 0) {
>>  			tty_notice(tty, "SAK: killed process %d (%s): by fd#%d\n",
>>  				   task_pid_nr(p), p->comm, i - 1);
>> -			force_sig(SIGKILL, p);
>> +			send_sig(SIGKILL, p, 1);
>>  		}
>>  		task_unlock(p);
>>  	} while_each_thread(g, p);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] Revert "do_SAK: Don't recursively take the tasklist_lock"
  2018-01-17 17:49       ` Eric W. Biederman
  2018-01-17 18:04         ` Oleg Nesterov
@ 2018-01-18 10:07         ` Kirill Tkhai
  1 sibling, 0 replies; 15+ messages in thread
From: Kirill Tkhai @ 2018-01-18 10:07 UTC (permalink / raw)
  To: Eric W. Biederman, Oleg Nesterov; +Cc: gregkh, jslaby, linux-kernel

On 17.01.2018 20:49, Eric W. Biederman wrote:
> Oleg Nesterov <oleg@redhat.com> writes:
> 
>> On 01/17, Eric W. Biederman wrote:
>>
>>> Kirill Tkhai <ktkhai@virtuozzo.com> writes:
>>>
>>>> This reverts commit 20ac94378de5.
>>>>
>>>> send_sig() does not take tasklist_lock for a long time,
>>>> so this commit and the problem it solves are not relevant
>>>> anymore.
>>>>
>>>> Also, the problem of force_sig() is it clears SIGNAL_UNKILLABLE
>>>> flag, thus even global init may be killed by __do_SAK(),
>>>> which is definitely not the expected behavior.
>>>
>>> Actually it is.
>>>
>>> SAK should kill everything that has the tty open.  If init opens the tty
>>> I am so sorry, it can not operate correctly.  init should not have your
>>> tty open.
>>
>> OK, but then we need "force" in other places too. __do_SAK() does send_sig(SIGKILL)
>> in do_each_pid_task(PIDTYPE_SID) and if signal->tty == tty.
>>
>> Plus force_sig() is not rcu-friendly.
>>
>> So I personally agree with this change. Whether we want to kill the global init
>> or not should be discussed, if we want to do this __do_SAK() should use
>> SEND_SIG_FORCED and this is what Kirill is going to do (iiuc), but this needs
>> another patch.
> 
> To operate correctly, do_SAK() needs to kill everything that has the tty
> open.  Unless we can make that guarantee I don't see the point of
> changing do_SAK.

do_SAK() doesn't kill everything at the moment, and it wasn't able to do that
10 years ago, when the reverted commit was introduced. There are two ways
to obtain dead tty:
1)when fd send via unix sockets
2)when a process open foreign /proc/[foreign_pid/fd/fdX

If someone wants to fix that, this possibly should be made outside __do_SAK().
But these races were 10+ years ago, when SAK was implemented, and my patchset
does not add new races to already existing.
 
> It would be better to give up on do_SAK altogether than to keep do_SAK
> limping along and failing to meet it's security guarantees.
> 
> If there are real world races, let's document those and say do_SAK has
> been broken for X number of years and just remove it.  Right now that
> seems the more reasonable course.
> 
> Unless there truly is someone using do_SAK to ensure they have a tty all
> to themselves.
We can remove it, but someone may already use this interface, and this will
break the compatibility.

Anyway, my patchset does not aim to check either SAK is broken or not. My
patchset just make the same functionality as now, but faster.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] tty: Use RCU read lock to iterate tasks and threads in __do_SAK()
  2018-01-17 16:54   ` Eric W. Biederman
  2018-01-17 17:39     ` Oleg Nesterov
@ 2018-01-18 10:11     ` Kirill Tkhai
  1 sibling, 0 replies; 15+ messages in thread
From: Kirill Tkhai @ 2018-01-18 10:11 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: gregkh, jslaby, oleg, linux-kernel

On 17.01.2018 19:54, Eric W. Biederman wrote:
> Kirill Tkhai <ktkhai@virtuozzo.com> writes:
> 
>> There were made several efforts to make __do_SAK()
>> working in process context long ago, but it does
>> not solves the problem completely. Since __do_SAK()
>> may take tasklist_lock for a long time, the concurent
>> processes, waiting for write lock with interrupts
>> disabled (e.g., forking), get into the same situation
>> like __do_SAK() would have been executed in interrupt
>> context. I've observed several hard lockups on 3.10
>> kernel running 200 containers, caused by long duration
>> of copy_process()->write_lock_irq() after SAK was sent
>> to a tty. Current mainline kernel has the same problem.
>>
>> The solution is to use RCU to iterate processes and threads.
>> Task list integrity is the only reason we taken tasklist_lock
>> before, as tty subsys primitives mostly take it for reading
>> also (e.g., __proc_set_tty). RCU read lock is enough for that.
>> This patch solves the problem and makes __do_SAK() to be
>> not greedy of tasklist_lock. That should prevent hard lockups
>> I've pointed above.
> 
> __do_SAK() needs to be 100% accurate.  I do not see the rcu_read_lock
> guaranteeing that new processes created while the process list is being
> iterated that happen to have a reference to the tty will be seen.
> 
> So I do not believe this is the actual fix to the problem.  Especially
> not if we intend to for SAK to remain a secure attention key that
> guarantees no other processes have access to the tty.

As I wrote to your answer to [1/3] SAK does not guarantee that.
See the comment near __do_SAK() header:

  "Now, if it would be correct ;-/ The current code has a nasty hole -
   it doesn't catch files in flight. We may send the descriptor to ourselves
   via AF_UNIX socket, close it and later fetch from socket. FIXME."

My patch does not introduce new races. If there is a race, you may just
prove it by a scheme.

> 
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>> ---
>>  drivers/tty/tty_io.c |    4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
>> index 89326cee2403..55115e65668d 100644
>> --- a/drivers/tty/tty_io.c
>> +++ b/drivers/tty/tty_io.c
>> @@ -2724,7 +2724,9 @@ void __do_SAK(struct tty_struct *tty)
>>  			   task_pid_nr(p), p->comm);
>>  		send_sig(SIGKILL, p, 1);
>>  	} while_each_pid_task(session, PIDTYPE_SID, p);
>> +	read_unlock(&tasklist_lock);
>>  
>> +	rcu_read_lock();
>>  	/* Now kill any processes that happen to have the tty open */
>>  	for_each_process(p) {
>>  		if (p->signal->tty == tty) {
>> @@ -2754,7 +2756,7 @@ void __do_SAK(struct tty_struct *tty)
>>  kill:
>>  		send_sig(SIGKILL, p, 1);
>>  	}
>> -	read_unlock(&tasklist_lock);
>> +	rcu_read_unlock();
>>  #endif
>>  }
>>  

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-01-18 10:11 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-17 12:39 [PATCH v2 0/3] tty: Make __do_SAK() less greedy in regard to tasklist_lock Kirill Tkhai
2018-01-17 12:39 ` [PATCH v2 1/3] Revert "do_SAK: Don't recursively take the tasklist_lock" Kirill Tkhai
2018-01-17 17:18   ` Eric W. Biederman
2018-01-17 17:34     ` Oleg Nesterov
2018-01-17 17:49       ` Eric W. Biederman
2018-01-17 18:04         ` Oleg Nesterov
2018-01-17 18:37           ` Eric W. Biederman
2018-01-17 20:43             ` Oleg Nesterov
2018-01-18 10:07         ` Kirill Tkhai
2018-01-18  9:59     ` Kirill Tkhai
2018-01-17 12:39 ` [PATCH v2 2/3] tty: Avoid threads files iterations in __do_SAK() Kirill Tkhai
2018-01-17 12:39 ` [PATCH v2 3/3] tty: Use RCU read lock to iterate tasks and threads " Kirill Tkhai
2018-01-17 16:54   ` Eric W. Biederman
2018-01-17 17:39     ` Oleg Nesterov
2018-01-18 10:11     ` Kirill Tkhai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.