All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
@ 2016-06-16 12:57 Konstantin Khlebnikov
  2016-06-16 17:06 ` bsegall
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Konstantin Khlebnikov @ 2016-06-16 12:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, linux-kernel; +Cc: stable

Hierarchy could be already throttled at this point. Throttled next
buddy could trigger null pointer dereference in pick_next_task_fair().

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Stable <stable@vger.kernel.org> # v3.2+
---
 kernel/sched/fair.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fe809fe169d2..3c6b038cb734 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4519,15 +4519,14 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 
 		/* Don't dequeue parent if it has other entities besides us */
 		if (cfs_rq->load.weight) {
+			/* avoid re-evaluating load for this entity */
+			se = parent_entity(se);
 			/*
 			 * Bias pick_next to pick a task from this cfs_rq, as
 			 * p is sleeping when it is within its sched_slice.
 			 */
-			if (task_sleep && parent_entity(se))
-				set_next_buddy(parent_entity(se));
-
-			/* avoid re-evaluating load for this entity */
-			se = parent_entity(se);
+			if (task_sleep && se && !throttled_hierarchy(cfs_rq))
+				set_next_buddy(se);
 			break;
 		}
 		flags |= DEQUEUE_SLEEP;

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
  2016-06-16 12:57 [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair Konstantin Khlebnikov
@ 2016-06-16 17:06 ` bsegall
  2016-06-21 13:44 ` Konstantin Khlebnikov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: bsegall @ 2016-06-16 17:06 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, stable

Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:

> Hierarchy could be already throttled at this point. Throttled next
> buddy could trigger null pointer dereference in pick_next_task_fair().
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

Reviewed-by: Ben Segall <bsegall@google.com>

> Cc: Stable <stable@vger.kernel.org> # v3.2+
> ---
>  kernel/sched/fair.c |    9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fe809fe169d2..3c6b038cb734 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4519,15 +4519,14 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  
>  		/* Don't dequeue parent if it has other entities besides us */
>  		if (cfs_rq->load.weight) {
> +			/* avoid re-evaluating load for this entity */
> +			se = parent_entity(se);
>  			/*
>  			 * Bias pick_next to pick a task from this cfs_rq, as
>  			 * p is sleeping when it is within its sched_slice.
>  			 */
> -			if (task_sleep && parent_entity(se))
> -				set_next_buddy(parent_entity(se));
> -
> -			/* avoid re-evaluating load for this entity */
> -			se = parent_entity(se);
> +			if (task_sleep && se && !throttled_hierarchy(cfs_rq))
> +				set_next_buddy(se);
>  			break;
>  		}
>  		flags |= DEQUEUE_SLEEP;

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
  2016-06-16 12:57 [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair Konstantin Khlebnikov
  2016-06-16 17:06 ` bsegall
@ 2016-06-21 13:44 ` Konstantin Khlebnikov
  2016-06-24  9:00 ` [tip:sched/urgent] sched/fair: Do not announce throttled next buddy in dequeue_task_fair() tip-bot for Konstantin Khlebnikov
  2016-07-11  7:25 ` [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair Wanpeng Li
  3 siblings, 0 replies; 21+ messages in thread
From: Konstantin Khlebnikov @ 2016-06-21 13:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, linux-kernel; +Cc: stable

On 16.06.2016 15:57, Konstantin Khlebnikov wrote:
> Hierarchy could be already throttled at this point. Throttled next
> buddy could trigger null pointer dereference in pick_next_task_fair().

trivial debug in set_next_buddy

@@ -4755,8 +4758,11 @@ static void set_next_buddy(struct sched_entity *se)
         if (entity_is_task(se) && unlikely(task_of(se)->policy == SCHED_IDLE))
                 return;

-       for_each_sched_entity(se)
+       for_each_sched_entity(se) {
+               if (WARN_ON_ONCE(!se->on_rq))
+                       return;
                 cfs_rq_of(se)->next = se;
+       }
  }

catched this

<4>[32815.274642] ------------[ cut here ]------------
<4>[32815.274651] WARNING: CPU: 6 PID: 92082 at kernel/sched/fair.c:4819 set_next_buddy+0x61/0x70()
<4>[32815.274652] Modules linked in: macvlan overlay ip6t_REJECT nf_reject_ipv6 ip6table_filter xt_multiport ip6_tables x_tables tcp_diag 
inet_diag bridge cls_cgroup sch_htb netconsole configfs 8021q mrp garp stp llc x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm 
crc32_pclmul aesni_intel ablk_helper cryptd lrw gf128mul ast ttm drm_kms_helper drm glue_helper aes_x86_64 sb_edac edac_core microcode 
sysimgblt sysfillrect syscopyarea lpc_ich mlx4_en mlx4_core vxlan udp_tunnel ip6_udp_tunnel tcp_htcp igb i2c_algo_bit ixgbe i2c_core dca 
ahci ptp libahci pps_core mdio raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 multipath 
linear [last unloaded: ipmi_msghandler]
<4>[32815.274691] CPU: 6 PID: 92082 Comm: mlauncher.pl Not tainted 3.18.25-28.debug.7 #1
<4>[32815.274692] Hardware name: GIGABYTE T17I-234/GA-7PTSH, BIOS R22 07/21/2015
<4>[32815.274694]  00000000000012d3 ffff8827a6623b08 ffffffff816d0ec3 0000000000000007
<4>[32815.274695]  0000000000000000 ffff8827a6623b48 ffffffff8106bb2c ffff8827a6623b78
<4>[32815.274697]  ffff8800795eca80 ffff880079622480 ffff8800795eca80 ffff88326fb9dc20
<4>[32815.274699] Call Trace:
<4>[32815.274704]  [<ffffffff816d0ec3>] dump_stack+0x4e/0x68
<4>[32815.274707]  [<ffffffff8106bb2c>] warn_slowpath_common+0x8c/0xc0
<4>[32815.274709]  [<ffffffff8106bb7a>] warn_slowpath_null+0x1a/0x20
<4>[32815.274710]  [<ffffffff8109af81>] set_next_buddy+0x61/0x70
<4>[32815.274712]  [<ffffffff8109ea98>] check_preempt_wakeup+0x208/0x280
<4>[32815.274717]  [<ffffffff81092abf>] check_preempt_curr+0x8f/0xa0
<4>[32815.274718]  [<ffffffff8109b9e1>] attach_task+0x51/0x60
<4>[32815.274721]  [<ffffffff810a1c55>] load_balance+0x605/0x970
<4>[32815.274723]  [<ffffffff810a247a>] pick_next_task_fair+0x4ba/0x730
<4>[32815.274725]  [<ffffffff8109e625>] ? dequeue_task_fair+0x315/0x580
<4>[32815.274729]  [<ffffffff816d4143>] __schedule+0x103/0x800
<4>[32815.274731]  [<ffffffff816d4919>] schedule+0x29/0x70
<4>[32815.274733]  [<ffffffff816d7a5c>] do_nanosleep+0xac/0x130
<4>[32815.274736]  [<ffffffff810c358d>] hrtimer_nanosleep+0xad/0x160
<4>[32815.274738]  [<ffffffff810c1f70>] ? update_rmtp+0x70/0x70
<4>[32815.274740]  [<ffffffff810c36b6>] SyS_nanosleep+0x76/0x90
<4>[32815.274741]  [<ffffffff816d8f09>] system_call_fastpath+0x12/0x17
<4>[32815.274743] ---[ end trace 656c1001c069cc10 ]---

>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Cc: Stable <stable@vger.kernel.org> # v3.2+
> ---
>   kernel/sched/fair.c |    9 ++++-----
>   1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fe809fe169d2..3c6b038cb734 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4519,15 +4519,14 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>
>   		/* Don't dequeue parent if it has other entities besides us */
>   		if (cfs_rq->load.weight) {
> +			/* avoid re-evaluating load for this entity */
> +			se = parent_entity(se);
>   			/*
>   			 * Bias pick_next to pick a task from this cfs_rq, as
>   			 * p is sleeping when it is within its sched_slice.
>   			 */
> -			if (task_sleep && parent_entity(se))
> -				set_next_buddy(parent_entity(se));
> -
> -			/* avoid re-evaluating load for this entity */
> -			se = parent_entity(se);
> +			if (task_sleep && se && !throttled_hierarchy(cfs_rq))
> +				set_next_buddy(se);
>   			break;
>   		}
>   		flags |= DEQUEUE_SLEEP;
>


-- 
Konstantin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [tip:sched/urgent] sched/fair: Do not announce throttled next buddy in dequeue_task_fair()
  2016-06-16 12:57 [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair Konstantin Khlebnikov
  2016-06-16 17:06 ` bsegall
  2016-06-21 13:44 ` Konstantin Khlebnikov
@ 2016-06-24  9:00 ` tip-bot for Konstantin Khlebnikov
  2016-07-11  7:25 ` [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair Wanpeng Li
  3 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Konstantin Khlebnikov @ 2016-06-24  9:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, mingo, hpa, khlebnikov, bsegall, peterz, torvalds, tglx

Commit-ID:  754bd598be9bbc953bc709a9e8ed7f3188bfb9d7
Gitweb:     http://git.kernel.org/tip/754bd598be9bbc953bc709a9e8ed7f3188bfb9d7
Author:     Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
AuthorDate: Thu, 16 Jun 2016 15:57:15 +0300
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 24 Jun 2016 08:26:45 +0200

sched/fair: Do not announce throttled next buddy in dequeue_task_fair()

Hierarchy could be already throttled at this point. Throttled next
buddy could trigger a NULL pointer dereference in pick_next_task_fair().

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8c5d8c0..bdcbeea 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4537,15 +4537,14 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 
 		/* Don't dequeue parent if it has other entities besides us */
 		if (cfs_rq->load.weight) {
+			/* Avoid re-evaluating load for this entity: */
+			se = parent_entity(se);
 			/*
 			 * Bias pick_next to pick a task from this cfs_rq, as
 			 * p is sleeping when it is within its sched_slice.
 			 */
-			if (task_sleep && parent_entity(se))
-				set_next_buddy(parent_entity(se));
-
-			/* avoid re-evaluating load for this entity */
-			se = parent_entity(se);
+			if (task_sleep && se && !throttled_hierarchy(cfs_rq))
+				set_next_buddy(se);
 			break;
 		}
 		flags |= DEQUEUE_SLEEP;

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
  2016-06-16 12:57 [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair Konstantin Khlebnikov
                   ` (2 preceding siblings ...)
  2016-06-24  9:00 ` [tip:sched/urgent] sched/fair: Do not announce throttled next buddy in dequeue_task_fair() tip-bot for Konstantin Khlebnikov
@ 2016-07-11  7:25 ` Wanpeng Li
  2016-07-11  8:15   ` Konstantin Khlebnikov
  2016-07-11  8:22   ` Xunlei Pang
  3 siblings, 2 replies; 21+ messages in thread
From: Wanpeng Li @ 2016-07-11  7:25 UTC (permalink / raw)
  To: Konstantin Khlebnikov; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, stable

2016-06-16 20:57 GMT+08:00 Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
> Hierarchy could be already throttled at this point. Throttled next
> buddy could trigger null pointer dereference in pick_next_task_fair().

There is cfs_rq->next check in pick_next_entity(), so how can null
pointer dereference happen?

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
  2016-07-11  7:25 ` [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair Wanpeng Li
@ 2016-07-11  8:15   ` Konstantin Khlebnikov
  2016-07-11  8:22   ` Xunlei Pang
  1 sibling, 0 replies; 21+ messages in thread
From: Konstantin Khlebnikov @ 2016-07-11  8:15 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, stable

On 11.07.2016 10:25, Wanpeng Li wrote:
> 2016-06-16 20:57 GMT+08:00 Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
>> Hierarchy could be already throttled at this point. Throttled next
>> buddy could trigger null pointer dereference in pick_next_task_fair().
>
> There is cfs_rq->next check in pick_next_entity(), so how can null
> pointer dereference happen?
>

If we nominate task from throttled hiearchy as a next buddy then at some
level in pick_next_task_fair we could pick cfs_rq which has no runnable
entities - in pick_next_entiry both "curr" and "left" are NULL.

-- 
Konstantin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
  2016-07-11  7:25 ` [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair Wanpeng Li
  2016-07-11  8:15   ` Konstantin Khlebnikov
@ 2016-07-11  8:22   ` Xunlei Pang
  2016-07-11  8:42     ` Xunlei Pang
  1 sibling, 1 reply; 21+ messages in thread
From: Xunlei Pang @ 2016-07-11  8:22 UTC (permalink / raw)
  To: Wanpeng Li, Konstantin Khlebnikov
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, stable

On 2016/07/11 at 15:25, Wanpeng Li wrote:
> 2016-06-16 20:57 GMT+08:00 Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
>> Hierarchy could be already throttled at this point. Throttled next
>> buddy could trigger null pointer dereference in pick_next_task_fair().
> There is cfs_rq->next check in pick_next_entity(), so how can null
> pointer dereference happen?

I guess it's the following code leading to a NULL se returned:
pick_next_entity():
    if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
        se = cfs_rq->next;

Regards,
Xunlei

> Regards,
> Wanpeng Li

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
  2016-07-11  8:22   ` Xunlei Pang
@ 2016-07-11  8:42     ` Xunlei Pang
  2016-07-11  9:54       ` Wanpeng Li
  0 siblings, 1 reply; 21+ messages in thread
From: Xunlei Pang @ 2016-07-11  8:42 UTC (permalink / raw)
  To: Wanpeng Li, Konstantin Khlebnikov
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, stable

On 2016/07/11 at 16:22, Xunlei Pang wrote:
> On 2016/07/11 at 15:25, Wanpeng Li wrote:
>> 2016-06-16 20:57 GMT+08:00 Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
>>> Hierarchy could be already throttled at this point. Throttled next
>>> buddy could trigger null pointer dereference in pick_next_task_fair().
>> There is cfs_rq->next check in pick_next_entity(), so how can null
>> pointer dereference happen?
> I guess it's the following code leading to a NULL se returned:

s/NULL/empty-entity cfs_rq se/

> pick_next_entity():
>     if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
>         se = cfs_rq->next;
>
> Regards,
> Xunlei
>
>> Regards,
>> Wanpeng Li

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
  2016-07-11  8:42     ` Xunlei Pang
@ 2016-07-11  9:54       ` Wanpeng Li
  2016-07-11 12:12         ` Xunlei Pang
  0 siblings, 1 reply; 21+ messages in thread
From: Wanpeng Li @ 2016-07-11  9:54 UTC (permalink / raw)
  To: xlpang
  Cc: Konstantin Khlebnikov, Peter Zijlstra, Ingo Molnar, linux-kernel, stable

Hi Konstantin, Xunlei,
2016-07-11 16:42 GMT+08:00 Xunlei Pang <xpang@redhat.com>:
> On 2016/07/11 at 16:22, Xunlei Pang wrote:
>> On 2016/07/11 at 15:25, Wanpeng Li wrote:
>>> 2016-06-16 20:57 GMT+08:00 Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
>>>> Hierarchy could be already throttled at this point. Throttled next
>>>> buddy could trigger null pointer dereference in pick_next_task_fair().
>>> There is cfs_rq->next check in pick_next_entity(), so how can null
>>> pointer dereference happen?
>> I guess it's the following code leading to a NULL se returned:
>
> s/NULL/empty-entity cfs_rq se/
>
>> pick_next_entity():
>>     if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
            ^^^^^^^^^^^^^
I think this will return false.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
  2016-07-11  9:54       ` Wanpeng Li
@ 2016-07-11 12:12         ` Xunlei Pang
  2016-07-11 12:26           ` Konstantin Khlebnikov
  0 siblings, 1 reply; 21+ messages in thread
From: Xunlei Pang @ 2016-07-11 12:12 UTC (permalink / raw)
  To: Wanpeng Li, xlpang, Konstantin Khlebnikov
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, stable

On 2016/07/11 at 17:54, Wanpeng Li wrote:
> Hi Konstantin, Xunlei,
> 2016-07-11 16:42 GMT+08:00 Xunlei Pang <xpang@redhat.com>:
>> On 2016/07/11 at 16:22, Xunlei Pang wrote:
>>> On 2016/07/11 at 15:25, Wanpeng Li wrote:
>>>> 2016-06-16 20:57 GMT+08:00 Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
>>>>> Hierarchy could be already throttled at this point. Throttled next
>>>>> buddy could trigger null pointer dereference in pick_next_task_fair().
>>>> There is cfs_rq->next check in pick_next_entity(), so how can null
>>>> pointer dereference happen?
>>> I guess it's the following code leading to a NULL se returned:
>> s/NULL/empty-entity cfs_rq se/
>>
>>> pick_next_entity():
>>>     if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
>             ^^^^^^^^^^^^^
> I think this will return false.

With the wrong throttled_hierarchy(), I think this can happen. But after we have the
corrected throttled_hierarchy() patch, I can't see how it is possible.

dequeue_task_fair():
    if (task_sleep && parent_entity(se))
        set_next_buddy(parent_entity(se));

How does dequeue_task_fair() with DEQUEUE_SLEEP set(true task_sleep) happen to a throttled hierarchy?
IOW, a task belongs to a throttled hierarchy is running?

Maybe Konstantin knows the reason.

Regards,
Xunlei

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
  2016-07-11 12:12         ` Xunlei Pang
@ 2016-07-11 12:26           ` Konstantin Khlebnikov
  2016-07-12 17:25             ` bsegall
  0 siblings, 1 reply; 21+ messages in thread
From: Konstantin Khlebnikov @ 2016-07-11 12:26 UTC (permalink / raw)
  To: xlpang, Wanpeng Li; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, stable

On 11.07.2016 15:12, Xunlei Pang wrote:
> On 2016/07/11 at 17:54, Wanpeng Li wrote:
>> Hi Konstantin, Xunlei,
>> 2016-07-11 16:42 GMT+08:00 Xunlei Pang <xpang@redhat.com>:
>>> On 2016/07/11 at 16:22, Xunlei Pang wrote:
>>>> On 2016/07/11 at 15:25, Wanpeng Li wrote:
>>>>> 2016-06-16 20:57 GMT+08:00 Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
>>>>>> Hierarchy could be already throttled at this point. Throttled next
>>>>>> buddy could trigger null pointer dereference in pick_next_task_fair().
>>>>> There is cfs_rq->next check in pick_next_entity(), so how can null
>>>>> pointer dereference happen?
>>>> I guess it's the following code leading to a NULL se returned:
>>> s/NULL/empty-entity cfs_rq se/
>>>
>>>> pick_next_entity():
>>>>      if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
>>              ^^^^^^^^^^^^^
>> I think this will return false.
>
> With the wrong throttled_hierarchy(), I think this can happen. But after we have the
> corrected throttled_hierarchy() patch, I can't see how it is possible.
>
> dequeue_task_fair():
>      if (task_sleep && parent_entity(se))
>          set_next_buddy(parent_entity(se));
>
> How does dequeue_task_fair() with DEQUEUE_SLEEP set(true task_sleep) happen to a throttled hierarchy?
> IOW, a task belongs to a throttled hierarchy is running?
>
> Maybe Konstantin knows the reason.

This function (dequeue_task_fair) check throttling but at point it could skip several
levels and announce as next buddy actually throttled entry.
Probably this bug hadn't happened but this's really hard to prove that this is impossible.
->set_curr_task(), PI-boost or some tricky migration in balancer could break this easily.

-- 
Konstantin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
  2016-07-11 12:26           ` Konstantin Khlebnikov
@ 2016-07-12 17:25             ` bsegall
  2016-07-13  1:50               ` Wanpeng Li
  0 siblings, 1 reply; 21+ messages in thread
From: bsegall @ 2016-07-12 17:25 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: xlpang, Wanpeng Li, Peter Zijlstra, Ingo Molnar, linux-kernel, stable

Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:

> On 11.07.2016 15:12, Xunlei Pang wrote:
>> On 2016/07/11 at 17:54, Wanpeng Li wrote:
>>> Hi Konstantin, Xunlei,
>>> 2016-07-11 16:42 GMT+08:00 Xunlei Pang <xpang@redhat.com>:
>>>> On 2016/07/11 at 16:22, Xunlei Pang wrote:
>>>>> On 2016/07/11 at 15:25, Wanpeng Li wrote:
>>>>>> 2016-06-16 20:57 GMT+08:00 Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
>>>>>>> Hierarchy could be already throttled at this point. Throttled next
>>>>>>> buddy could trigger null pointer dereference in pick_next_task_fair().
>>>>>> There is cfs_rq->next check in pick_next_entity(), so how can null
>>>>>> pointer dereference happen?
>>>>> I guess it's the following code leading to a NULL se returned:
>>>> s/NULL/empty-entity cfs_rq se/
>>>>
>>>>> pick_next_entity():
>>>>>      if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
>>>              ^^^^^^^^^^^^^
>>> I think this will return false.
>>
>> With the wrong throttled_hierarchy(), I think this can happen. But after we have the
>> corrected throttled_hierarchy() patch, I can't see how it is possible.
>>
>> dequeue_task_fair():
>>      if (task_sleep && parent_entity(se))
>>          set_next_buddy(parent_entity(se));
>>
>> How does dequeue_task_fair() with DEQUEUE_SLEEP set(true task_sleep) happen to a throttled hierarchy?
>> IOW, a task belongs to a throttled hierarchy is running?
>>
>> Maybe Konstantin knows the reason.
>
> This function (dequeue_task_fair) check throttling but at point it could skip several
> levels and announce as next buddy actually throttled entry.
> Probably this bug hadn't happened but this's really hard to prove that this is impossible.
> ->set_curr_task(), PI-boost or some tricky migration in balancer could break this easily.

sched_setscheduler can call put_prev_task, which then can cause a
throttle outside of __schedule(), then the task blocks normally and
deactivate_task(DEQUEUE_SLEEP) happens and you lose.

The obvious way to avoid these would be to somehow change put_prev so
that it only does throttles in the schedule() path (which is what we
/want/), which would probably involve adding a parameter to put_prev for
just this.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
  2016-07-12 17:25             ` bsegall
@ 2016-07-13  1:50               ` Wanpeng Li
  2016-07-13  1:58                 ` Xunlei Pang
  2016-07-13 17:06                 ` bsegall
  0 siblings, 2 replies; 21+ messages in thread
From: Wanpeng Li @ 2016-07-13  1:50 UTC (permalink / raw)
  To: Benjamin Segall
  Cc: Konstantin Khlebnikov, xlpang, Peter Zijlstra, Ingo Molnar,
	linux-kernel, stable

2016-07-13 1:25 GMT+08:00  <bsegall@google.com>:
> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
>
>> On 11.07.2016 15:12, Xunlei Pang wrote:
>>> On 2016/07/11 at 17:54, Wanpeng Li wrote:
>>>> Hi Konstantin, Xunlei,
>>>> 2016-07-11 16:42 GMT+08:00 Xunlei Pang <xpang@redhat.com>:
>>>>> On 2016/07/11 at 16:22, Xunlei Pang wrote:
>>>>>> On 2016/07/11 at 15:25, Wanpeng Li wrote:
>>>>>>> 2016-06-16 20:57 GMT+08:00 Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
>>>>>>>> Hierarchy could be already throttled at this point. Throttled next
>>>>>>>> buddy could trigger null pointer dereference in pick_next_task_fair().
>>>>>>> There is cfs_rq->next check in pick_next_entity(), so how can null
>>>>>>> pointer dereference happen?
>>>>>> I guess it's the following code leading to a NULL se returned:
>>>>> s/NULL/empty-entity cfs_rq se/
>>>>>
>>>>>> pick_next_entity():
>>>>>>      if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
>>>>              ^^^^^^^^^^^^^
>>>> I think this will return false.
>>>
>>> With the wrong throttled_hierarchy(), I think this can happen. But after we have the
>>> corrected throttled_hierarchy() patch, I can't see how it is possible.
>>>
>>> dequeue_task_fair():
>>>      if (task_sleep && parent_entity(se))
>>>          set_next_buddy(parent_entity(se));
>>>
>>> How does dequeue_task_fair() with DEQUEUE_SLEEP set(true task_sleep) happen to a throttled hierarchy?
>>> IOW, a task belongs to a throttled hierarchy is running?
>>>
>>> Maybe Konstantin knows the reason.
>>
>> This function (dequeue_task_fair) check throttling but at point it could skip several
>> levels and announce as next buddy actually throttled entry.
>> Probably this bug hadn't happened but this's really hard to prove that this is impossible.
>> ->set_curr_task(), PI-boost or some tricky migration in balancer could break this easily.
>
> sched_setscheduler can call put_prev_task, which then can cause a
> throttle outside of __schedule(), then the task blocks normally and
> deactivate_task(DEQUEUE_SLEEP) happens and you lose.

The cfs_rq_throttled() check in dequeue_task_fair() will capture the
cfs_rq which is throttled in sched_setscheduler::put_prev_task path,
so nothing lost, where I miss?

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
  2016-07-13  1:50               ` Wanpeng Li
@ 2016-07-13  1:58                 ` Xunlei Pang
  2016-07-13  2:14                   ` Wanpeng Li
  2016-07-13 17:06                 ` bsegall
  1 sibling, 1 reply; 21+ messages in thread
From: Xunlei Pang @ 2016-07-13  1:58 UTC (permalink / raw)
  To: Wanpeng Li, Benjamin Segall
  Cc: Konstantin Khlebnikov, xlpang, Peter Zijlstra, Ingo Molnar,
	linux-kernel, stable

On 2016/07/13 at 09:50, Wanpeng Li wrote:
> 2016-07-13 1:25 GMT+08:00  <bsegall@google.com>:
>> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
>>
>>> On 11.07.2016 15:12, Xunlei Pang wrote:
>>>> On 2016/07/11 at 17:54, Wanpeng Li wrote:
>>>>> Hi Konstantin, Xunlei,
>>>>> 2016-07-11 16:42 GMT+08:00 Xunlei Pang <xpang@redhat.com>:
>>>>>> On 2016/07/11 at 16:22, Xunlei Pang wrote:
>>>>>>> On 2016/07/11 at 15:25, Wanpeng Li wrote:
>>>>>>>> 2016-06-16 20:57 GMT+08:00 Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
>>>>>>>>> Hierarchy could be already throttled at this point. Throttled next
>>>>>>>>> buddy could trigger null pointer dereference in pick_next_task_fair().
>>>>>>>> There is cfs_rq->next check in pick_next_entity(), so how can null
>>>>>>>> pointer dereference happen?
>>>>>>> I guess it's the following code leading to a NULL se returned:
>>>>>> s/NULL/empty-entity cfs_rq se/
>>>>>>
>>>>>>> pick_next_entity():
>>>>>>>      if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
>>>>>              ^^^^^^^^^^^^^
>>>>> I think this will return false.
>>>> With the wrong throttled_hierarchy(), I think this can happen. But after we have the
>>>> corrected throttled_hierarchy() patch, I can't see how it is possible.
>>>>
>>>> dequeue_task_fair():
>>>>      if (task_sleep && parent_entity(se))
>>>>          set_next_buddy(parent_entity(se));
>>>>
>>>> How does dequeue_task_fair() with DEQUEUE_SLEEP set(true task_sleep) happen to a throttled hierarchy?
>>>> IOW, a task belongs to a throttled hierarchy is running?
>>>>
>>>> Maybe Konstantin knows the reason.
>>> This function (dequeue_task_fair) check throttling but at point it could skip several
>>> levels and announce as next buddy actually throttled entry.
>>> Probably this bug hadn't happened but this's really hard to prove that this is impossible.
>>> ->set_curr_task(), PI-boost or some tricky migration in balancer could break this easily.
>> sched_setscheduler can call put_prev_task, which then can cause a
>> throttle outside of __schedule(), then the task blocks normally and
>> deactivate_task(DEQUEUE_SLEEP) happens and you lose.
> The cfs_rq_throttled() check in dequeue_task_fair() will capture the
> cfs_rq which is throttled in sched_setscheduler::put_prev_task path,
> so nothing lost, where I miss?

cfs_rq_throttled() returns false for child cgroups in the throttled hierarchy, so
throttled_hierarchy() should be relied on in such cases.

Regards,
Xunlei

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
  2016-07-13  1:58                 ` Xunlei Pang
@ 2016-07-13  2:14                   ` Wanpeng Li
  0 siblings, 0 replies; 21+ messages in thread
From: Wanpeng Li @ 2016-07-13  2:14 UTC (permalink / raw)
  To: xlpang
  Cc: Benjamin Segall, Konstantin Khlebnikov, Peter Zijlstra,
	Ingo Molnar, linux-kernel, stable

2016-07-13 9:58 GMT+08:00 Xunlei Pang <xpang@redhat.com>:
> On 2016/07/13 at 09:50, Wanpeng Li wrote:
>> 2016-07-13 1:25 GMT+08:00  <bsegall@google.com>:
>>> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
>>>
>>>> On 11.07.2016 15:12, Xunlei Pang wrote:
>>>>> On 2016/07/11 at 17:54, Wanpeng Li wrote:
>>>>>> Hi Konstantin, Xunlei,
>>>>>> 2016-07-11 16:42 GMT+08:00 Xunlei Pang <xpang@redhat.com>:
>>>>>>> On 2016/07/11 at 16:22, Xunlei Pang wrote:
>>>>>>>> On 2016/07/11 at 15:25, Wanpeng Li wrote:
>>>>>>>>> 2016-06-16 20:57 GMT+08:00 Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
>>>>>>>>>> Hierarchy could be already throttled at this point. Throttled next
>>>>>>>>>> buddy could trigger null pointer dereference in pick_next_task_fair().
>>>>>>>>> There is cfs_rq->next check in pick_next_entity(), so how can null
>>>>>>>>> pointer dereference happen?
>>>>>>>> I guess it's the following code leading to a NULL se returned:
>>>>>>> s/NULL/empty-entity cfs_rq se/
>>>>>>>
>>>>>>>> pick_next_entity():
>>>>>>>>      if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
>>>>>>              ^^^^^^^^^^^^^
>>>>>> I think this will return false.
>>>>> With the wrong throttled_hierarchy(), I think this can happen. But after we have the
>>>>> corrected throttled_hierarchy() patch, I can't see how it is possible.
>>>>>
>>>>> dequeue_task_fair():
>>>>>      if (task_sleep && parent_entity(se))
>>>>>          set_next_buddy(parent_entity(se));
>>>>>
>>>>> How does dequeue_task_fair() with DEQUEUE_SLEEP set(true task_sleep) happen to a throttled hierarchy?
>>>>> IOW, a task belongs to a throttled hierarchy is running?
>>>>>
>>>>> Maybe Konstantin knows the reason.
>>>> This function (dequeue_task_fair) check throttling but at point it could skip several
>>>> levels and announce as next buddy actually throttled entry.
>>>> Probably this bug hadn't happened but this's really hard to prove that this is impossible.
>>>> ->set_curr_task(), PI-boost or some tricky migration in balancer could break this easily.
>>> sched_setscheduler can call put_prev_task, which then can cause a
>>> throttle outside of __schedule(), then the task blocks normally and
>>> deactivate_task(DEQUEUE_SLEEP) happens and you lose.
>> The cfs_rq_throttled() check in dequeue_task_fair() will capture the
>> cfs_rq which is throttled in sched_setscheduler::put_prev_task path,
>> so nothing lost, where I miss?
>
> cfs_rq_throttled() returns false for child cgroups in the throttled hierarchy, so
> throttled_hierarchy() should be relied on in such cases.

Yes, so what's lost in bsegall's reply?

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
  2016-07-13  1:50               ` Wanpeng Li
  2016-07-13  1:58                 ` Xunlei Pang
@ 2016-07-13 17:06                 ` bsegall
  2016-07-14 12:11                   ` Wanpeng Li
  1 sibling, 1 reply; 21+ messages in thread
From: bsegall @ 2016-07-13 17:06 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Konstantin Khlebnikov, xlpang, Peter Zijlstra, Ingo Molnar,
	linux-kernel, stable

Wanpeng Li <kernellwp@gmail.com> writes:

> 2016-07-13 1:25 GMT+08:00  <bsegall@google.com>:
>> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
>>
>>> On 11.07.2016 15:12, Xunlei Pang wrote:
>>>> On 2016/07/11 at 17:54, Wanpeng Li wrote:
>>>>> Hi Konstantin, Xunlei,
>>>>> 2016-07-11 16:42 GMT+08:00 Xunlei Pang <xpang@redhat.com>:
>>>>>> On 2016/07/11 at 16:22, Xunlei Pang wrote:
>>>>>>> On 2016/07/11 at 15:25, Wanpeng Li wrote:
>>>>>>>> 2016-06-16 20:57 GMT+08:00 Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
>>>>>>>>> Hierarchy could be already throttled at this point. Throttled next
>>>>>>>>> buddy could trigger null pointer dereference in pick_next_task_fair().
>>>>>>>> There is cfs_rq->next check in pick_next_entity(), so how can null
>>>>>>>> pointer dereference happen?
>>>>>>> I guess it's the following code leading to a NULL se returned:
>>>>>> s/NULL/empty-entity cfs_rq se/
>>>>>>
>>>>>>> pick_next_entity():
>>>>>>>      if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
>>>>>              ^^^^^^^^^^^^^
>>>>> I think this will return false.
>>>>
>>>> With the wrong throttled_hierarchy(), I think this can happen. But after we have the
>>>> corrected throttled_hierarchy() patch, I can't see how it is possible.
>>>>
>>>> dequeue_task_fair():
>>>>      if (task_sleep && parent_entity(se))
>>>>          set_next_buddy(parent_entity(se));
>>>>
>>>> How does dequeue_task_fair() with DEQUEUE_SLEEP set(true task_sleep) happen to a throttled hierarchy?
>>>> IOW, a task belongs to a throttled hierarchy is running?
>>>>
>>>> Maybe Konstantin knows the reason.
>>>
>>> This function (dequeue_task_fair) check throttling but at point it could skip several
>>> levels and announce as next buddy actually throttled entry.
>>> Probably this bug hadn't happened but this's really hard to prove that this is impossible.
>>> ->set_curr_task(), PI-boost or some tricky migration in balancer could break this easily.
>>
>> sched_setscheduler can call put_prev_task, which then can cause a
>> throttle outside of __schedule(), then the task blocks normally and
>> deactivate_task(DEQUEUE_SLEEP) happens and you lose.
>
> The cfs_rq_throttled() check in dequeue_task_fair() will capture the
> cfs_rq which is throttled in sched_setscheduler::put_prev_task path,
> so nothing lost, where I miss?
>
> Regards,
> Wanpeng Li

The cfs_rq_throttled() checks there are done bottom-up, so they will
trigger too late. a/b/t, where t is descheduling and a is throttled can
still cause a set_next_buddy(b);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
  2016-07-13 17:06                 ` bsegall
@ 2016-07-14 12:11                   ` Wanpeng Li
  2016-07-14 17:54                     ` bsegall
  0 siblings, 1 reply; 21+ messages in thread
From: Wanpeng Li @ 2016-07-14 12:11 UTC (permalink / raw)
  To: Benjamin Segall
  Cc: Konstantin Khlebnikov, xlpang, Peter Zijlstra, Ingo Molnar,
	linux-kernel, stable

2016-07-14 1:06 GMT+08:00  <bsegall@google.com>:
> Wanpeng Li <kernellwp@gmail.com> writes:
>
>> 2016-07-13 1:25 GMT+08:00  <bsegall@google.com>:
>>> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
>>>
>>>> On 11.07.2016 15:12, Xunlei Pang wrote:
>>>>> On 2016/07/11 at 17:54, Wanpeng Li wrote:
>>>>>> Hi Konstantin, Xunlei,
>>>>>> 2016-07-11 16:42 GMT+08:00 Xunlei Pang <xpang@redhat.com>:
>>>>>>> On 2016/07/11 at 16:22, Xunlei Pang wrote:
>>>>>>>> On 2016/07/11 at 15:25, Wanpeng Li wrote:
>>>>>>>>> 2016-06-16 20:57 GMT+08:00 Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
>>>>>>>>>> Hierarchy could be already throttled at this point. Throttled next
>>>>>>>>>> buddy could trigger null pointer dereference in pick_next_task_fair().
>>>>>>>>> There is cfs_rq->next check in pick_next_entity(), so how can null
>>>>>>>>> pointer dereference happen?
>>>>>>>> I guess it's the following code leading to a NULL se returned:
>>>>>>> s/NULL/empty-entity cfs_rq se/
>>>>>>>
>>>>>>>> pick_next_entity():
>>>>>>>>      if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
>>>>>>              ^^^^^^^^^^^^^
>>>>>> I think this will return false.
>>>>>
>>>>> With the wrong throttled_hierarchy(), I think this can happen. But after we have the
>>>>> corrected throttled_hierarchy() patch, I can't see how it is possible.
>>>>>
>>>>> dequeue_task_fair():
>>>>>      if (task_sleep && parent_entity(se))
>>>>>          set_next_buddy(parent_entity(se));
>>>>>
>>>>> How does dequeue_task_fair() with DEQUEUE_SLEEP set(true task_sleep) happen to a throttled hierarchy?
>>>>> IOW, a task belongs to a throttled hierarchy is running?
>>>>>
>>>>> Maybe Konstantin knows the reason.
>>>>
>>>> This function (dequeue_task_fair) check throttling but at point it could skip several
>>>> levels and announce as next buddy actually throttled entry.
>>>> Probably this bug hadn't happened but this's really hard to prove that this is impossible.
>>>> ->set_curr_task(), PI-boost or some tricky migration in balancer could break this easily.
>>>
>>> sched_setscheduler can call put_prev_task, which then can cause a
>>> throttle outside of __schedule(), then the task blocks normally and
>>> deactivate_task(DEQUEUE_SLEEP) happens and you lose.
>>
>> The cfs_rq_throttled() check in dequeue_task_fair() will capture the
>> cfs_rq which is throttled in sched_setscheduler::put_prev_task path,
>> so nothing lost, where I miss?
>>
>> Regards,
>> Wanpeng Li
>
> The cfs_rq_throttled() checks there are done bottom-up, so they will
> trigger too late. a/b/t, where t is descheduling and a is throttled can
> still cause a set_next_buddy(b);

throttle cfs_rq is up-bottom, so when a is throttled, b and c are not
yet, then task_sleep && se && !throttled_hierarchy(cfs_rq) still can't
prevent a set_next_buddy(b).

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
  2016-07-14 12:11                   ` Wanpeng Li
@ 2016-07-14 17:54                     ` bsegall
  2016-07-14 22:30                       ` Wanpeng Li
  0 siblings, 1 reply; 21+ messages in thread
From: bsegall @ 2016-07-14 17:54 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Konstantin Khlebnikov, xlpang, Peter Zijlstra, Ingo Molnar,
	linux-kernel, stable

Wanpeng Li <kernellwp@gmail.com> writes:

> 2016-07-14 1:06 GMT+08:00  <bsegall@google.com>:
>> Wanpeng Li <kernellwp@gmail.com> writes:
>>
>>> 2016-07-13 1:25 GMT+08:00  <bsegall@google.com>:
>>>> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
>>>>
>>>>> On 11.07.2016 15:12, Xunlei Pang wrote:
>>>>>> On 2016/07/11 at 17:54, Wanpeng Li wrote:
>>>>>>> Hi Konstantin, Xunlei,
>>>>>>> 2016-07-11 16:42 GMT+08:00 Xunlei Pang <xpang@redhat.com>:
>>>>>>>> On 2016/07/11 at 16:22, Xunlei Pang wrote:
>>>>>>>>> On 2016/07/11 at 15:25, Wanpeng Li wrote:
>>>>>>>>>> 2016-06-16 20:57 GMT+08:00 Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
>>>>>>>>>>> Hierarchy could be already throttled at this point. Throttled next
>>>>>>>>>>> buddy could trigger null pointer dereference in pick_next_task_fair().
>>>>>>>>>> There is cfs_rq->next check in pick_next_entity(), so how can null
>>>>>>>>>> pointer dereference happen?
>>>>>>>>> I guess it's the following code leading to a NULL se returned:
>>>>>>>> s/NULL/empty-entity cfs_rq se/
>>>>>>>>
>>>>>>>>> pick_next_entity():
>>>>>>>>>      if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
>>>>>>>              ^^^^^^^^^^^^^
>>>>>>> I think this will return false.
>>>>>>
>>>>>> With the wrong throttled_hierarchy(), I think this can happen. But after we have the
>>>>>> corrected throttled_hierarchy() patch, I can't see how it is possible.
>>>>>>
>>>>>> dequeue_task_fair():
>>>>>>      if (task_sleep && parent_entity(se))
>>>>>>          set_next_buddy(parent_entity(se));
>>>>>>
>>>>>> How does dequeue_task_fair() with DEQUEUE_SLEEP set(true task_sleep) happen to a throttled hierarchy?
>>>>>> IOW, a task belongs to a throttled hierarchy is running?
>>>>>>
>>>>>> Maybe Konstantin knows the reason.
>>>>>
>>>>> This function (dequeue_task_fair) check throttling but at point it could skip several
>>>>> levels and announce as next buddy actually throttled entry.
>>>>> Probably this bug hadn't happened but this's really hard to prove that this is impossible.
>>>>> ->set_curr_task(), PI-boost or some tricky migration in balancer could break this easily.
>>>>
>>>> sched_setscheduler can call put_prev_task, which then can cause a
>>>> throttle outside of __schedule(), then the task blocks normally and
>>>> deactivate_task(DEQUEUE_SLEEP) happens and you lose.
>>>
>>> The cfs_rq_throttled() check in dequeue_task_fair() will capture the
>>> cfs_rq which is throttled in sched_setscheduler::put_prev_task path,
>>> so nothing lost, where I miss?
>>>
>>> Regards,
>>> Wanpeng Li
>>
>> The cfs_rq_throttled() checks there are done bottom-up, so they will
>> trigger too late. a/b/t, where t is descheduling and a is throttled can
>> still cause a set_next_buddy(b);
>
> throttle cfs_rq is up-bottom, so when a is throttled, b and c are not
> yet, then task_sleep && se && !throttled_hierarchy(cfs_rq) still can't
> prevent a set_next_buddy(b).
>
> Regards,
> Wanpeng Li

They don't race or anything, everything's under rq->lock.
throttled_hierarchy will register properly, the issue is that a parent
is the one cfs_rq_throttled(), not the current cfs_rq, and
set_next_buddy will set cfs_rq->next to an se that is !on_rq.

In the other order (set_next_buddy then throttle), throttle_cfs_rq will
call dequeue which will clear the problematic buddy.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
  2016-07-14 17:54                     ` bsegall
@ 2016-07-14 22:30                       ` Wanpeng Li
  2016-07-14 22:49                         ` bsegall
  0 siblings, 1 reply; 21+ messages in thread
From: Wanpeng Li @ 2016-07-14 22:30 UTC (permalink / raw)
  To: Benjamin Segall
  Cc: Konstantin Khlebnikov, xlpang, Peter Zijlstra, Ingo Molnar,
	linux-kernel, stable

2016-07-15 1:54 GMT+08:00  <bsegall@google.com>:
> Wanpeng Li <kernellwp@gmail.com> writes:
>
>> 2016-07-14 1:06 GMT+08:00  <bsegall@google.com>:
>>> Wanpeng Li <kernellwp@gmail.com> writes:
>>>
>>>> 2016-07-13 1:25 GMT+08:00  <bsegall@google.com>:
>>>>> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
>>>>>
>>>>>> On 11.07.2016 15:12, Xunlei Pang wrote:
>>>>>>> On 2016/07/11 at 17:54, Wanpeng Li wrote:
>>>>>>>> Hi Konstantin, Xunlei,
>>>>>>>> 2016-07-11 16:42 GMT+08:00 Xunlei Pang <xpang@redhat.com>:
>>>>>>>>> On 2016/07/11 at 16:22, Xunlei Pang wrote:
>>>>>>>>>> On 2016/07/11 at 15:25, Wanpeng Li wrote:
>>>>>>>>>>> 2016-06-16 20:57 GMT+08:00 Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
>>>>>>>>>>>> Hierarchy could be already throttled at this point. Throttled next
>>>>>>>>>>>> buddy could trigger null pointer dereference in pick_next_task_fair().
>>>>>>>>>>> There is cfs_rq->next check in pick_next_entity(), so how can null
>>>>>>>>>>> pointer dereference happen?
>>>>>>>>>> I guess it's the following code leading to a NULL se returned:
>>>>>>>>> s/NULL/empty-entity cfs_rq se/
>>>>>>>>>
>>>>>>>>>> pick_next_entity():
>>>>>>>>>>      if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
>>>>>>>>              ^^^^^^^^^^^^^
>>>>>>>> I think this will return false.
>>>>>>>
>>>>>>> With the wrong throttled_hierarchy(), I think this can happen. But after we have the
>>>>>>> corrected throttled_hierarchy() patch, I can't see how it is possible.
>>>>>>>
>>>>>>> dequeue_task_fair():
>>>>>>>      if (task_sleep && parent_entity(se))
>>>>>>>          set_next_buddy(parent_entity(se));
>>>>>>>
>>>>>>> How does dequeue_task_fair() with DEQUEUE_SLEEP set(true task_sleep) happen to a throttled hierarchy?
>>>>>>> IOW, a task belongs to a throttled hierarchy is running?
>>>>>>>
>>>>>>> Maybe Konstantin knows the reason.
>>>>>>
>>>>>> This function (dequeue_task_fair) check throttling but at point it could skip several
>>>>>> levels and announce as next buddy actually throttled entry.
>>>>>> Probably this bug hadn't happened but this's really hard to prove that this is impossible.
>>>>>> ->set_curr_task(), PI-boost or some tricky migration in balancer could break this easily.
>>>>>
>>>>> sched_setscheduler can call put_prev_task, which then can cause a
>>>>> throttle outside of __schedule(), then the task blocks normally and
>>>>> deactivate_task(DEQUEUE_SLEEP) happens and you lose.
>>>>
>>>> The cfs_rq_throttled() check in dequeue_task_fair() will capture the
>>>> cfs_rq which is throttled in sched_setscheduler::put_prev_task path,
>>>> so nothing lost, where I miss?
>>>>
>>>> Regards,
>>>> Wanpeng Li
>>>
>>> The cfs_rq_throttled() checks there are done bottom-up, so they will
>>> trigger too late. a/b/t, where t is descheduling and a is throttled can
>>> still cause a set_next_buddy(b);
>>
>> throttle cfs_rq is up-bottom, so when a is throttled, b and c are not
>> yet, then task_sleep && se && !throttled_hierarchy(cfs_rq) still can't
>> prevent a set_next_buddy(b).
>>
>> Regards,
>> Wanpeng Li
>
> They don't race or anything, everything's under rq->lock.
> throttled_hierarchy will register properly, the issue is that a parent
> is the one cfs_rq_throttled(), not the current cfs_rq, and
> set_next_buddy will set cfs_rq->next to an se that is !on_rq.

Why b is !on_rq after throttle a?

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
  2016-07-14 22:30                       ` Wanpeng Li
@ 2016-07-14 22:49                         ` bsegall
  2016-07-14 23:03                           ` Wanpeng Li
  0 siblings, 1 reply; 21+ messages in thread
From: bsegall @ 2016-07-14 22:49 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Konstantin Khlebnikov, xlpang, Peter Zijlstra, Ingo Molnar,
	linux-kernel, stable

Wanpeng Li <kernellwp@gmail.com> writes:

> 2016-07-15 1:54 GMT+08:00  <bsegall@google.com>:
>> Wanpeng Li <kernellwp@gmail.com> writes:
>>
>>> 2016-07-14 1:06 GMT+08:00  <bsegall@google.com>:
>>>> Wanpeng Li <kernellwp@gmail.com> writes:
>>>>
>>>>> 2016-07-13 1:25 GMT+08:00  <bsegall@google.com>:
>>>>>> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
>>>>>>
>>>>>>> On 11.07.2016 15:12, Xunlei Pang wrote:
>>>>>>>> On 2016/07/11 at 17:54, Wanpeng Li wrote:
>>>>>>>>> Hi Konstantin, Xunlei,
>>>>>>>>> 2016-07-11 16:42 GMT+08:00 Xunlei Pang <xpang@redhat.com>:
>>>>>>>>>> On 2016/07/11 at 16:22, Xunlei Pang wrote:
>>>>>>>>>>> On 2016/07/11 at 15:25, Wanpeng Li wrote:
>>>>>>>>>>>> 2016-06-16 20:57 GMT+08:00 Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
>>>>>>>>>>>>> Hierarchy could be already throttled at this point. Throttled next
>>>>>>>>>>>>> buddy could trigger null pointer dereference in pick_next_task_fair().
>>>>>>>>>>>> There is cfs_rq->next check in pick_next_entity(), so how can null
>>>>>>>>>>>> pointer dereference happen?
>>>>>>>>>>> I guess it's the following code leading to a NULL se returned:
>>>>>>>>>> s/NULL/empty-entity cfs_rq se/
>>>>>>>>>>
>>>>>>>>>>> pick_next_entity():
>>>>>>>>>>>      if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
>>>>>>>>>              ^^^^^^^^^^^^^
>>>>>>>>> I think this will return false.
>>>>>>>>
>>>>>>>> With the wrong throttled_hierarchy(), I think this can happen. But after we have the
>>>>>>>> corrected throttled_hierarchy() patch, I can't see how it is possible.
>>>>>>>>
>>>>>>>> dequeue_task_fair():
>>>>>>>>      if (task_sleep && parent_entity(se))
>>>>>>>>          set_next_buddy(parent_entity(se));
>>>>>>>>
>>>>>>>> How does dequeue_task_fair() with DEQUEUE_SLEEP set(true task_sleep) happen to a throttled hierarchy?
>>>>>>>> IOW, a task belongs to a throttled hierarchy is running?
>>>>>>>>
>>>>>>>> Maybe Konstantin knows the reason.
>>>>>>>
>>>>>>> This function (dequeue_task_fair) check throttling but at point it could skip several
>>>>>>> levels and announce as next buddy actually throttled entry.
>>>>>>> Probably this bug hadn't happened but this's really hard to prove that this is impossible.
>>>>>>> ->set_curr_task(), PI-boost or some tricky migration in balancer could break this easily.
>>>>>>
>>>>>> sched_setscheduler can call put_prev_task, which then can cause a
>>>>>> throttle outside of __schedule(), then the task blocks normally and
>>>>>> deactivate_task(DEQUEUE_SLEEP) happens and you lose.
>>>>>
>>>>> The cfs_rq_throttled() check in dequeue_task_fair() will capture the
>>>>> cfs_rq which is throttled in sched_setscheduler::put_prev_task path,
>>>>> so nothing lost, where I miss?
>>>>>
>>>>> Regards,
>>>>> Wanpeng Li
>>>>
>>>> The cfs_rq_throttled() checks there are done bottom-up, so they will
>>>> trigger too late. a/b/t, where t is descheduling and a is throttled can
>>>> still cause a set_next_buddy(b);
>>>
>>> throttle cfs_rq is up-bottom, so when a is throttled, b and c are not
>>> yet, then task_sleep && se && !throttled_hierarchy(cfs_rq) still can't
>>> prevent a set_next_buddy(b).
>>>
>>> Regards,
>>> Wanpeng Li
>>
>> They don't race or anything, everything's under rq->lock.
>> throttled_hierarchy will register properly, the issue is that a parent
>> is the one cfs_rq_throttled(), not the current cfs_rq, and
>> set_next_buddy will set cfs_rq->next to an se that is !on_rq.
>
> Why b is !on_rq after throttle a?
>
> Regards,
> Wanpeng Li

a is !on_rq (because of throttle), but set_next_buddy will set ->next up
the entire tree.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair
  2016-07-14 22:49                         ` bsegall
@ 2016-07-14 23:03                           ` Wanpeng Li
  0 siblings, 0 replies; 21+ messages in thread
From: Wanpeng Li @ 2016-07-14 23:03 UTC (permalink / raw)
  To: Benjamin Segall
  Cc: Konstantin Khlebnikov, xlpang, Peter Zijlstra, Ingo Molnar,
	linux-kernel, stable

2016-07-15 6:49 GMT+08:00  <bsegall@google.com>:
> Wanpeng Li <kernellwp@gmail.com> writes:
>
>> 2016-07-15 1:54 GMT+08:00  <bsegall@google.com>:
>>> Wanpeng Li <kernellwp@gmail.com> writes:
>>>
>>>> 2016-07-14 1:06 GMT+08:00  <bsegall@google.com>:
>>>>> Wanpeng Li <kernellwp@gmail.com> writes:
>>>>>
>>>>>> 2016-07-13 1:25 GMT+08:00  <bsegall@google.com>:
>>>>>>> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
>>>>>>>
>>>>>>>> On 11.07.2016 15:12, Xunlei Pang wrote:
>>>>>>>>> On 2016/07/11 at 17:54, Wanpeng Li wrote:
>>>>>>>>>> Hi Konstantin, Xunlei,
>>>>>>>>>> 2016-07-11 16:42 GMT+08:00 Xunlei Pang <xpang@redhat.com>:
>>>>>>>>>>> On 2016/07/11 at 16:22, Xunlei Pang wrote:
>>>>>>>>>>>> On 2016/07/11 at 15:25, Wanpeng Li wrote:
>>>>>>>>>>>>> 2016-06-16 20:57 GMT+08:00 Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
>>>>>>>>>>>>>> Hierarchy could be already throttled at this point. Throttled next
>>>>>>>>>>>>>> buddy could trigger null pointer dereference in pick_next_task_fair().
>>>>>>>>>>>>> There is cfs_rq->next check in pick_next_entity(), so how can null
>>>>>>>>>>>>> pointer dereference happen?
>>>>>>>>>>>> I guess it's the following code leading to a NULL se returned:
>>>>>>>>>>> s/NULL/empty-entity cfs_rq se/
>>>>>>>>>>>
>>>>>>>>>>>> pick_next_entity():
>>>>>>>>>>>>      if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
>>>>>>>>>>              ^^^^^^^^^^^^^
>>>>>>>>>> I think this will return false.
>>>>>>>>>
>>>>>>>>> With the wrong throttled_hierarchy(), I think this can happen. But after we have the
>>>>>>>>> corrected throttled_hierarchy() patch, I can't see how it is possible.
>>>>>>>>>
>>>>>>>>> dequeue_task_fair():
>>>>>>>>>      if (task_sleep && parent_entity(se))
>>>>>>>>>          set_next_buddy(parent_entity(se));
>>>>>>>>>
>>>>>>>>> How does dequeue_task_fair() with DEQUEUE_SLEEP set(true task_sleep) happen to a throttled hierarchy?
>>>>>>>>> IOW, a task belongs to a throttled hierarchy is running?
>>>>>>>>>
>>>>>>>>> Maybe Konstantin knows the reason.
>>>>>>>>
>>>>>>>> This function (dequeue_task_fair) check throttling but at point it could skip several
>>>>>>>> levels and announce as next buddy actually throttled entry.
>>>>>>>> Probably this bug hadn't happened but this's really hard to prove that this is impossible.
>>>>>>>> ->set_curr_task(), PI-boost or some tricky migration in balancer could break this easily.
>>>>>>>
>>>>>>> sched_setscheduler can call put_prev_task, which then can cause a
>>>>>>> throttle outside of __schedule(), then the task blocks normally and
>>>>>>> deactivate_task(DEQUEUE_SLEEP) happens and you lose.
>>>>>>
>>>>>> The cfs_rq_throttled() check in dequeue_task_fair() will capture the
>>>>>> cfs_rq which is throttled in sched_setscheduler::put_prev_task path,
>>>>>> so nothing lost, where I miss?
>>>>>>
>>>>>> Regards,
>>>>>> Wanpeng Li
>>>>>
>>>>> The cfs_rq_throttled() checks there are done bottom-up, so they will
>>>>> trigger too late. a/b/t, where t is descheduling and a is throttled can
>>>>> still cause a set_next_buddy(b);
>>>>
>>>> throttle cfs_rq is up-bottom, so when a is throttled, b and c are not
>>>> yet, then task_sleep && se && !throttled_hierarchy(cfs_rq) still can't
>>>> prevent a set_next_buddy(b).
>>>>
>>>> Regards,
>>>> Wanpeng Li
>>>
>>> They don't race or anything, everything's under rq->lock.
>>> throttled_hierarchy will register properly, the issue is that a parent
>>> is the one cfs_rq_throttled(), not the current cfs_rq, and
>>> set_next_buddy will set cfs_rq->next to an se that is !on_rq.
>>
>> Why b is !on_rq after throttle a?
>>
>> Regards,
>> Wanpeng Li
>
> a is !on_rq (because of throttle), but set_next_buddy will set ->next up
> the entire tree.

Got it, thanks for your explanation. :)

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2016-07-14 23:03 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-16 12:57 [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair Konstantin Khlebnikov
2016-06-16 17:06 ` bsegall
2016-06-21 13:44 ` Konstantin Khlebnikov
2016-06-24  9:00 ` [tip:sched/urgent] sched/fair: Do not announce throttled next buddy in dequeue_task_fair() tip-bot for Konstantin Khlebnikov
2016-07-11  7:25 ` [PATCH] sched/fair: do not announce throttled next buddy in dequeue_task_fair Wanpeng Li
2016-07-11  8:15   ` Konstantin Khlebnikov
2016-07-11  8:22   ` Xunlei Pang
2016-07-11  8:42     ` Xunlei Pang
2016-07-11  9:54       ` Wanpeng Li
2016-07-11 12:12         ` Xunlei Pang
2016-07-11 12:26           ` Konstantin Khlebnikov
2016-07-12 17:25             ` bsegall
2016-07-13  1:50               ` Wanpeng Li
2016-07-13  1:58                 ` Xunlei Pang
2016-07-13  2:14                   ` Wanpeng Li
2016-07-13 17:06                 ` bsegall
2016-07-14 12:11                   ` Wanpeng Li
2016-07-14 17:54                     ` bsegall
2016-07-14 22:30                       ` Wanpeng Li
2016-07-14 22:49                         ` bsegall
2016-07-14 23:03                           ` Wanpeng Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.