linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Subhra Mazumdar <subhra.mazumdar@oracle.com>
To: Mel Gorman <mgorman@techsingularity.net>,
	Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Paul Turner <pjt@google.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Fr?d?ric Weisbecker <fweisbec@gmail.com>,
	Kees Cook <keescook@chromium.org>,
	kerrnel@google.com
Subject: Re: [RFC][PATCH 00/16] sched: Core scheduling
Date: Fri, 8 Mar 2019 11:44:01 -0800	[thread overview]
Message-ID: <14a9adf7-9b50-1dfa-0c35-d04e976081c2@oracle.com> (raw)
In-Reply-To: <20190222124544.GY9565@techsingularity.net>


On 2/22/19 4:45 AM, Mel Gorman wrote:
> On Mon, Feb 18, 2019 at 09:49:10AM -0800, Linus Torvalds wrote:
>> On Mon, Feb 18, 2019 at 9:40 AM Peter Zijlstra <peterz@infradead.org> wrote:
>>> However; whichever way around you turn this cookie; it is expensive and nasty.
>> Do you (or anybody else) have numbers for real loads?
>>
>> Because performance is all that matters. If performance is bad, then
>> it's pointless, since just turning off SMT is the answer.
>>
> I tried to do a comparison between tip/master, ht disabled and this series
> putting test workloads into a tagged cgroup but unfortunately it failed
>
> [  156.978682] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
> [  156.986597] #PF error: [normal kernel read fault]
> [  156.991343] PGD 0 P4D 0
> [  156.993905] Oops: 0000 [#1] SMP PTI
> [  156.997438] CPU: 15 PID: 0 Comm: swapper/15 Not tainted 5.0.0-rc7-schedcore-v1r1 #1
> [  157.005161] Hardware name: SGI.COM C2112-4GP3/X10DRT-P-Series, BIOS 2.0a 05/09/2016
> [  157.012896] RIP: 0010:wakeup_preempt_entity.isra.70+0x9/0x50
> [  157.018613] Code: 00 be c0 82 60 00 e9 86 02 1a 00 66 0f 1f 44 00 00 48 c1 e7 03 be c0 80 60 00 e9 72 02 1a 00 66 90 0f 1f 44 00 00
>   53 48 89 fb <48> 2b 5e 58 48 85 db 7e 2c 48 81 3e 00 00 10 00 8b 05 a9 b7 19 01
> [  157.037544] RSP: 0018:ffffc9000c5bbde8 EFLAGS: 00010086
> [  157.042819] RAX: ffff88810f5f6a00 RBX: 00000001547f175c RCX: 0000000000000001
> [  157.050015] RDX: ffff88bf3bdb0a40 RSI: 0000000000000000 RDI: 00000001547f175c
> [  157.057215] RBP: ffff88bf7fae32c0 R08: 000000000001e358 R09: ffff88810fb9f000
> [  157.064410] R10: ffffc9000c5bbe08 R11: ffff88810fb9f5c4 R12: 0000000000000000
> [  157.071611] R13: ffff88bf4e3ea0c0 R14: 0000000000000000 R15: ffff88bf4e3ea7a8
> [  157.078814] FS:  0000000000000000(0000) GS:ffff88bf7f5c0000(0000) knlGS:0000000000000000
> [  157.086977] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  157.092779] CR2: 0000000000000058 CR3: 000000000220e005 CR4: 00000000003606e0
> [  157.099979] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  157.109529] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  157.119058] Call Trace:
> [  157.123865]  pick_next_entity+0x61/0x110
> [  157.130137]  pick_task_fair+0x4b/0x90
> [  157.136124]  __schedule+0x365/0x12c0
> [  157.141985]  schedule_idle+0x1e/0x40
> [  157.147822]  do_idle+0x166/0x280
> [  157.153275]  cpu_startup_entry+0x19/0x20
> [  157.159420]  start_secondary+0x17a/0x1d0
> [  157.165568]  secondary_startup_64+0xa4/0xb0
> [  157.171985] Modules linked in: af_packet iscsi_ibft iscsi_boot_sysfs msr intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass crc32_pclmul ghash_clmulni_intel ixgbe aesni_intel xfrm_algo iTCO_wdt joydev iTCO_vendor_support libphy igb aes_x86_64 crypto_simd ptp cryptd mei_me mdio pps_core ioatdma glue_helper pcspkr ipmi_si lpc_ich i2c_i801 mei dca ipmi_devintf ipmi_msghandler acpi_pad pcc_cpufreq button btrfs libcrc32c xor zstd_decompress zstd_compress raid6_pq hid_generic usbhid ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops xhci_pci crc32c_intel ehci_pci ttm xhci_hcd ehci_hcd drm ahci usbcore mpt3sas libahci raid_class scsi_transport_sas wmi sg nbd dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
> [  157.258990] CR2: 0000000000000058
> [  157.264961] ---[ end trace a301ac5e3ee86fde ]---
> [  157.283719] RIP: 0010:wakeup_preempt_entity.isra.70+0x9/0x50
> [  157.291967] Code: 00 be c0 82 60 00 e9 86 02 1a 00 66 0f 1f 44 00 00 48 c1 e7 03 be c0 80 60 00 e9 72 02 1a 00 66 90 0f 1f 44 00 00 53 48 89 fb <48> 2b 5e 58 48 85 db 7e 2c 48 81 3e 00 00 10 00 8b 05 a9 b7 19 01
> [  157.316121] RSP: 0018:ffffc9000c5bbde8 EFLAGS: 00010086
> [  157.324060] RAX: ffff88810f5f6a00 RBX: 00000001547f175c RCX: 0000000000000001
> [  157.333932] RDX: ffff88bf3bdb0a40 RSI: 0000000000000000 RDI: 00000001547f175c
> [  157.343795] RBP: ffff88bf7fae32c0 R08: 000000000001e358 R09: ffff88810fb9f000
> [  157.353634] R10: ffffc9000c5bbe08 R11: ffff88810fb9f5c4 R12: 0000000000000000
> [  157.363506] R13: ffff88bf4e3ea0c0 R14: 0000000000000000 R15: ffff88bf4e3ea7a8
> [  157.373395] FS:  0000000000000000(0000) GS:ffff88bf7f5c0000(0000) knlGS:0000000000000000
> [  157.384238] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  157.392709] CR2: 0000000000000058 CR3: 000000000220e005 CR4: 00000000003606e0
> [  157.402601] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  157.412488] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  157.422334] Kernel panic - not syncing: Attempted to kill the idle task!
> [  158.529804] Shutting down cpus with NMI
> [  158.573249] Kernel Offset: disabled
> [  158.586198] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
>
> RIP translates to kernel/sched/fair.c:6819
>
> static int
> wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se)
> {
>          s64 gran, vdiff = curr->vruntime - se->vruntime; /* LINE 6819 */
>
>          if (vdiff <= 0)
>                  return -1;
>
>          gran = wakeup_gran(se);
>          if (vdiff > gran)
>                  return 1;
> }
>
> I haven't tried debugging it yet.
>
I think the following fix, while trivial, is the right fix for the NULL
dereference in this case. This bug is reproducible with patch 14. I also 
did
some performance bisecting and with patch 14 performance is decimated, 
that's
expected. Most of the performance recovery happens in patch 15 which,
unfortunately, is also the one that introduces the hard lockup.

-------8<-----------

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1d0dac4..ecadf36 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4131,7 +4131,7 @@ pick_next_entity(struct cfs_rq *cfs_rq, struct 
sched_entity *curr)
          * Avoid running the skip buddy, if running something else can
          * be done without getting too unfair.
*/
-       if (cfs_rq->skip == se) {
+       if (cfs_rq->skip && cfs_rq->skip == se) {
                 struct sched_entity *second;

                 if (se == curr) {
@@ -4149,13 +4149,15 @@ pick_next_entity(struct cfs_rq *cfs_rq, struct 
sched_entity *curr)
/*
          * Prefer last buddy, try to return the CPU to a preempted task.
*/
-       if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, left) < 1)
+       if (left && cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, 
left)
+           < 1)
                 se = cfs_rq->last;

/*
          * Someone really wants this to run. If it's not unfair, run it.
*/
-       if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
+       if (left && cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, 
left)
+           < 1)
                 se = cfs_rq->next;

         clear_buddies(cfs_rq, se);


  parent reply	other threads:[~2019-03-08 19:48 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-18 16:56 [RFC][PATCH 00/16] sched: Core scheduling Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 01/16] stop_machine: Fix stop_cpus_in_progress ordering Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 02/16] sched: Fix kerneldoc comment for ia64_set_curr_task Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 03/16] sched: Wrap rq::lock access Peter Zijlstra
2019-02-19 16:13   ` Phil Auld
2019-02-19 16:22     ` Peter Zijlstra
2019-02-19 16:37       ` Phil Auld
2019-03-18 15:41   ` Julien Desfossez
2019-03-20  2:29     ` Subhra Mazumdar
2019-03-21 21:20       ` Julien Desfossez
2019-03-22 13:34         ` Peter Zijlstra
2019-03-22 20:59           ` Julien Desfossez
2019-03-23  0:06         ` Subhra Mazumdar
2019-03-27  1:02           ` Subhra Mazumdar
2019-03-29 13:35           ` Julien Desfossez
2019-03-29 22:23             ` Subhra Mazumdar
2019-04-01 21:35               ` Subhra Mazumdar
2019-04-03 20:16                 ` Julien Desfossez
2019-04-05  1:30                   ` Subhra Mazumdar
2019-04-02  7:42               ` Peter Zijlstra
2019-03-22 23:28       ` Tim Chen
2019-03-22 23:44         ` Tim Chen
2019-02-18 16:56 ` [RFC][PATCH 04/16] sched/{rt,deadline}: Fix set_next_task vs pick_next_task Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 05/16] sched: Add task_struct pointer to sched_class::set_curr_task Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 06/16] sched/fair: Export newidle_balance() Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 07/16] sched: Allow put_prev_task() to drop rq->lock Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 08/16] sched: Rework pick_next_task() slow-path Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 09/16] sched: Introduce sched_class::pick_task() Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 10/16] sched: Core-wide rq->lock Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 11/16] sched: Basic tracking of matching tasks Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 12/16] sched: A quick and dirty cgroup tagging interface Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 13/16] sched: Add core wide task selection and scheduling Peter Zijlstra
     [not found]   ` <20190402064612.GA46500@aaronlu>
2019-04-02  8:28     ` Peter Zijlstra
2019-04-02 13:20       ` Aaron Lu
2019-04-05 14:55       ` Aaron Lu
2019-04-09 18:09         ` Tim Chen
2019-04-10  4:36           ` Aaron Lu
2019-04-10 14:18             ` Aubrey Li
2019-04-11  2:11               ` Aaron Lu
2019-04-10 14:44             ` Peter Zijlstra
2019-04-11  3:05               ` Aaron Lu
2019-04-11  9:19                 ` Peter Zijlstra
2019-04-10  8:06           ` Peter Zijlstra
2019-04-10 19:58             ` Vineeth Remanan Pillai
2019-04-15 16:59             ` Julien Desfossez
2019-04-16 13:43       ` Aaron Lu
2019-04-09 18:38   ` Julien Desfossez
2019-04-10 15:01     ` Peter Zijlstra
2019-04-11  0:11     ` Subhra Mazumdar
2019-04-19  8:40       ` Ingo Molnar
2019-04-19 23:16         ` Subhra Mazumdar
2019-02-18 16:56 ` [RFC][PATCH 14/16] sched/fair: Add a few assertions Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 15/16] sched: Trivial forced-newidle balancer Peter Zijlstra
2019-02-21 16:19   ` Valentin Schneider
2019-02-21 16:41     ` Peter Zijlstra
2019-02-21 16:47       ` Peter Zijlstra
2019-02-21 18:28         ` Valentin Schneider
2019-04-04  8:31       ` Aubrey Li
2019-04-06  1:36         ` Aubrey Li
2019-02-18 16:56 ` [RFC][PATCH 16/16] sched: Debug bits Peter Zijlstra
2019-02-18 17:49 ` [RFC][PATCH 00/16] sched: Core scheduling Linus Torvalds
2019-02-18 20:40   ` Peter Zijlstra
2019-02-19  0:29     ` Linus Torvalds
2019-02-19 15:15       ` Ingo Molnar
2019-02-22 12:17     ` Paolo Bonzini
2019-02-22 14:20       ` Peter Zijlstra
2019-02-22 19:26         ` Tim Chen
2019-02-26  8:26           ` Aubrey Li
2019-02-27  7:54             ` Aubrey Li
2019-02-21  2:53   ` Subhra Mazumdar
2019-02-21 14:03     ` Peter Zijlstra
2019-02-21 18:44       ` Subhra Mazumdar
2019-02-22  0:34       ` Subhra Mazumdar
2019-02-22 12:45   ` Mel Gorman
2019-02-22 16:10     ` Mel Gorman
2019-03-08 19:44     ` Subhra Mazumdar [this message]
2019-03-11  4:23       ` Aubrey Li
2019-03-11 18:34         ` Subhra Mazumdar
2019-03-11 23:33           ` Subhra Mazumdar
2019-03-12  0:20             ` Greg Kerr
2019-03-12  0:47               ` Subhra Mazumdar
2019-03-12  7:33               ` Aaron Lu
2019-03-12  7:45             ` Aubrey Li
2019-03-13  5:55               ` Aubrey Li
2019-03-14  0:35                 ` Tim Chen
2019-03-14  5:30                   ` Aubrey Li
2019-03-14  6:07                     ` Li, Aubrey
2019-03-18  6:56             ` Aubrey Li
2019-03-12 19:07           ` Pawan Gupta
2019-03-26  7:32       ` Aaron Lu
2019-03-26  7:56         ` Aaron Lu
2019-02-19 22:07 ` Greg Kerr
2019-02-20  9:42   ` Peter Zijlstra
2019-02-20 18:33     ` Greg Kerr
2019-02-22 14:10       ` Peter Zijlstra
2019-03-07 22:06         ` Paolo Bonzini
2019-02-20 18:43     ` Subhra Mazumdar
2019-03-01  2:54 ` Subhra Mazumdar
2019-03-14 15:28 ` Julien Desfossez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=14a9adf7-9b50-1dfa-0c35-d04e976081c2@oracle.com \
    --to=subhra.mazumdar@oracle.com \
    --cc=fweisbec@gmail.com \
    --cc=keescook@chromium.org \
    --cc=kerrnel@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).