linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Aaron Lu <aaron.lu@linux.alibaba.com>
To: Subhra Mazumdar <subhra.mazumdar@oracle.com>
Cc: Mel Gorman <mgorman@techsingularity.net>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Paul Turner <pjt@google.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Fr?d?ric Weisbecker <fweisbec@gmail.com>,
	Kees Cook <keescook@chromium.org>,
	kerrnel@google.com
Subject: Re: [RFC][PATCH 00/16] sched: Core scheduling
Date: Tue, 26 Mar 2019 15:56:26 +0800	[thread overview]
Message-ID: <20190326075625.GA23460@aaronlu> (raw)
In-Reply-To: <20190326073210.GA100891@aaronlu>

On Tue, Mar 26, 2019 at 03:32:12PM +0800, Aaron Lu wrote:
> On Fri, Mar 08, 2019 at 11:44:01AM -0800, Subhra Mazumdar wrote:
> > 
> > On 2/22/19 4:45 AM, Mel Gorman wrote:
> > >On Mon, Feb 18, 2019 at 09:49:10AM -0800, Linus Torvalds wrote:
> > >>On Mon, Feb 18, 2019 at 9:40 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >>>However; whichever way around you turn this cookie; it is expensive and nasty.
> > >>Do you (or anybody else) have numbers for real loads?
> > >>
> > >>Because performance is all that matters. If performance is bad, then
> > >>it's pointless, since just turning off SMT is the answer.
> > >>
> > >I tried to do a comparison between tip/master, ht disabled and this series
> > >putting test workloads into a tagged cgroup but unfortunately it failed
> > >
> > >[  156.978682] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
> > >[  156.986597] #PF error: [normal kernel read fault]
> > >[  156.991343] PGD 0 P4D 0
> > >[  156.993905] Oops: 0000 [#1] SMP PTI
> > >[  156.997438] CPU: 15 PID: 0 Comm: swapper/15 Not tainted 5.0.0-rc7-schedcore-v1r1 #1
> > >[  157.005161] Hardware name: SGI.COM C2112-4GP3/X10DRT-P-Series, BIOS 2.0a 05/09/2016
> > >[  157.012896] RIP: 0010:wakeup_preempt_entity.isra.70+0x9/0x50
> > >[  157.018613] Code: 00 be c0 82 60 00 e9 86 02 1a 00 66 0f 1f 44 00 00 48 c1 e7 03 be c0 80 60 00 e9 72 02 1a 00 66 90 0f 1f 44 00 00
> > >  53 48 89 fb <48> 2b 5e 58 48 85 db 7e 2c 48 81 3e 00 00 10 00 8b 05 a9 b7 19 01
> > >[  157.037544] RSP: 0018:ffffc9000c5bbde8 EFLAGS: 00010086
> > >[  157.042819] RAX: ffff88810f5f6a00 RBX: 00000001547f175c RCX: 0000000000000001
> > >[  157.050015] RDX: ffff88bf3bdb0a40 RSI: 0000000000000000 RDI: 00000001547f175c
> > >[  157.057215] RBP: ffff88bf7fae32c0 R08: 000000000001e358 R09: ffff88810fb9f000
> > >[  157.064410] R10: ffffc9000c5bbe08 R11: ffff88810fb9f5c4 R12: 0000000000000000
> > >[  157.071611] R13: ffff88bf4e3ea0c0 R14: 0000000000000000 R15: ffff88bf4e3ea7a8
> > >[  157.078814] FS:  0000000000000000(0000) GS:ffff88bf7f5c0000(0000) knlGS:0000000000000000
> > >[  157.086977] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >[  157.092779] CR2: 0000000000000058 CR3: 000000000220e005 CR4: 00000000003606e0
> > >[  157.099979] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >[  157.109529] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > >[  157.119058] Call Trace:
> > >[  157.123865]  pick_next_entity+0x61/0x110
> > >[  157.130137]  pick_task_fair+0x4b/0x90
> > >[  157.136124]  __schedule+0x365/0x12c0
> > >[  157.141985]  schedule_idle+0x1e/0x40
> > >[  157.147822]  do_idle+0x166/0x280
> > >[  157.153275]  cpu_startup_entry+0x19/0x20
> > >[  157.159420]  start_secondary+0x17a/0x1d0
> > >[  157.165568]  secondary_startup_64+0xa4/0xb0
> > >[  157.171985] Modules linked in: af_packet iscsi_ibft iscsi_boot_sysfs msr intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass crc32_pclmul ghash_clmulni_intel ixgbe aesni_intel xfrm_algo iTCO_wdt joydev iTCO_vendor_support libphy igb aes_x86_64 crypto_simd ptp cryptd mei_me mdio pps_core ioatdma glue_helper pcspkr ipmi_si lpc_ich i2c_i801 mei dca ipmi_devintf ipmi_msghandler acpi_pad pcc_cpufreq button btrfs libcrc32c xor zstd_decompress zstd_compress raid6_pq hid_generic usbhid ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops xhci_pci crc32c_intel ehci_pci ttm xhci_hcd ehci_hcd drm ahci usbcore mpt3sas libahci raid_class scsi_transport_sas wmi sg nbd dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
> > >[  157.258990] CR2: 0000000000000058
> > >[  157.264961] ---[ end trace a301ac5e3ee86fde ]---
> > >[  157.283719] RIP: 0010:wakeup_preempt_entity.isra.70+0x9/0x50
> > >[  157.291967] Code: 00 be c0 82 60 00 e9 86 02 1a 00 66 0f 1f 44 00 00 48 c1 e7 03 be c0 80 60 00 e9 72 02 1a 00 66 90 0f 1f 44 00 00 53 48 89 fb <48> 2b 5e 58 48 85 db 7e 2c 48 81 3e 00 00 10 00 8b 05 a9 b7 19 01
> > >[  157.316121] RSP: 0018:ffffc9000c5bbde8 EFLAGS: 00010086
> > >[  157.324060] RAX: ffff88810f5f6a00 RBX: 00000001547f175c RCX: 0000000000000001
> > >[  157.333932] RDX: ffff88bf3bdb0a40 RSI: 0000000000000000 RDI: 00000001547f175c
> > >[  157.343795] RBP: ffff88bf7fae32c0 R08: 000000000001e358 R09: ffff88810fb9f000
> > >[  157.353634] R10: ffffc9000c5bbe08 R11: ffff88810fb9f5c4 R12: 0000000000000000
> > >[  157.363506] R13: ffff88bf4e3ea0c0 R14: 0000000000000000 R15: ffff88bf4e3ea7a8
> > >[  157.373395] FS:  0000000000000000(0000) GS:ffff88bf7f5c0000(0000) knlGS:0000000000000000
> > >[  157.384238] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >[  157.392709] CR2: 0000000000000058 CR3: 000000000220e005 CR4: 00000000003606e0
> > >[  157.402601] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >[  157.412488] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > >[  157.422334] Kernel panic - not syncing: Attempted to kill the idle task!
> > >[  158.529804] Shutting down cpus with NMI
> > >[  158.573249] Kernel Offset: disabled
> > >[  158.586198] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
> > >
> > >RIP translates to kernel/sched/fair.c:6819
> > >
> > >static int
> > >wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se)
> > >{
> > >         s64 gran, vdiff = curr->vruntime - se->vruntime; /* LINE 6819 */
> > >
> > >         if (vdiff <= 0)
> > >                 return -1;
> > >
> > >         gran = wakeup_gran(se);
> > >         if (vdiff > gran)
> > >                 return 1;
> > >}
> > >
> > >I haven't tried debugging it yet.
> > >
> > I think the following fix, while trivial, is the right fix for the NULL
> > dereference in this case. This bug is reproducible with patch 14. I
> 
> I assume you meant patch 4?

Correction, should be patch 9 where pick_task_fair() is introduced.

Thanks,
Aaron

> 
> My understanding is, this is due to 'left' being NULL in
> pick_next_entity().
> 
> With patch 4, in pick_task_fair(), pick_next_entity() can be called with
> an empty rbtree of cfs_rq and with a NULL 'curr'. This resulted in a
> NULL 'left'. Before patch 4, this can't happen.
> 
> It's not clear to me why NULL is used instead of 'curr' for
> pick_next_entity() in pick_task_fair(). My first thought is, 'curr' will
> not be considered as next entity, but then 'curr' is checked after
> pick_next_entity() returns so this shouldn't be the reason. Guess I
> missed something.
> 
> Thanks,
> Aaron
> 
> > also did
> > some performance bisecting and with patch 14 performance is
> > decimated, that's
> > expected. Most of the performance recovery happens in patch 15 which,
> > unfortunately, is also the one that introduces the hard lockup.
> > 
> > -------8<-----------
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 1d0dac4..ecadf36 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4131,7 +4131,7 @@ pick_next_entity(struct cfs_rq *cfs_rq, struct
> > sched_entity *curr)
> >          * Avoid running the skip buddy, if running something else can
> >          * be done without getting too unfair.
> > */
> > -       if (cfs_rq->skip == se) {
> > +       if (cfs_rq->skip && cfs_rq->skip == se) {
> >                 struct sched_entity *second;
> > 
> >                 if (se == curr) {
> > @@ -4149,13 +4149,15 @@ pick_next_entity(struct cfs_rq *cfs_rq,
> > struct sched_entity *curr)
> > /*
> >          * Prefer last buddy, try to return the CPU to a preempted task.
> > */
> > -       if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, left) < 1)
> > +       if (left && cfs_rq->last &&
> > wakeup_preempt_entity(cfs_rq->last, left)
> > +           < 1)
> >                 se = cfs_rq->last;
> > 
> > /*
> >          * Someone really wants this to run. If it's not unfair, run it.
> > */
> > -       if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
> > +       if (left && cfs_rq->next &&
> > wakeup_preempt_entity(cfs_rq->next, left)
> > +           < 1)
> >                 se = cfs_rq->next;
> > 
> >         clear_buddies(cfs_rq, se);
> > 

  reply	other threads:[~2019-03-26  7:56 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-18 16:56 [RFC][PATCH 00/16] sched: Core scheduling Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 01/16] stop_machine: Fix stop_cpus_in_progress ordering Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 02/16] sched: Fix kerneldoc comment for ia64_set_curr_task Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 03/16] sched: Wrap rq::lock access Peter Zijlstra
2019-02-19 16:13   ` Phil Auld
2019-02-19 16:22     ` Peter Zijlstra
2019-02-19 16:37       ` Phil Auld
2019-03-18 15:41   ` Julien Desfossez
2019-03-20  2:29     ` Subhra Mazumdar
2019-03-21 21:20       ` Julien Desfossez
2019-03-22 13:34         ` Peter Zijlstra
2019-03-22 20:59           ` Julien Desfossez
2019-03-23  0:06         ` Subhra Mazumdar
2019-03-27  1:02           ` Subhra Mazumdar
2019-03-29 13:35           ` Julien Desfossez
2019-03-29 22:23             ` Subhra Mazumdar
2019-04-01 21:35               ` Subhra Mazumdar
2019-04-03 20:16                 ` Julien Desfossez
2019-04-05  1:30                   ` Subhra Mazumdar
2019-04-02  7:42               ` Peter Zijlstra
2019-03-22 23:28       ` Tim Chen
2019-03-22 23:44         ` Tim Chen
2019-02-18 16:56 ` [RFC][PATCH 04/16] sched/{rt,deadline}: Fix set_next_task vs pick_next_task Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 05/16] sched: Add task_struct pointer to sched_class::set_curr_task Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 06/16] sched/fair: Export newidle_balance() Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 07/16] sched: Allow put_prev_task() to drop rq->lock Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 08/16] sched: Rework pick_next_task() slow-path Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 09/16] sched: Introduce sched_class::pick_task() Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 10/16] sched: Core-wide rq->lock Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 11/16] sched: Basic tracking of matching tasks Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 12/16] sched: A quick and dirty cgroup tagging interface Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 13/16] sched: Add core wide task selection and scheduling Peter Zijlstra
     [not found]   ` <20190402064612.GA46500@aaronlu>
2019-04-02  8:28     ` Peter Zijlstra
2019-04-02 13:20       ` Aaron Lu
2019-04-05 14:55       ` Aaron Lu
2019-04-09 18:09         ` Tim Chen
2019-04-10  4:36           ` Aaron Lu
2019-04-10 14:18             ` Aubrey Li
2019-04-11  2:11               ` Aaron Lu
2019-04-10 14:44             ` Peter Zijlstra
2019-04-11  3:05               ` Aaron Lu
2019-04-11  9:19                 ` Peter Zijlstra
2019-04-10  8:06           ` Peter Zijlstra
2019-04-10 19:58             ` Vineeth Remanan Pillai
2019-04-15 16:59             ` Julien Desfossez
2019-04-16 13:43       ` Aaron Lu
2019-04-09 18:38   ` Julien Desfossez
2019-04-10 15:01     ` Peter Zijlstra
2019-04-11  0:11     ` Subhra Mazumdar
2019-04-19  8:40       ` Ingo Molnar
2019-04-19 23:16         ` Subhra Mazumdar
2019-02-18 16:56 ` [RFC][PATCH 14/16] sched/fair: Add a few assertions Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 15/16] sched: Trivial forced-newidle balancer Peter Zijlstra
2019-02-21 16:19   ` Valentin Schneider
2019-02-21 16:41     ` Peter Zijlstra
2019-02-21 16:47       ` Peter Zijlstra
2019-02-21 18:28         ` Valentin Schneider
2019-04-04  8:31       ` Aubrey Li
2019-04-06  1:36         ` Aubrey Li
2019-02-18 16:56 ` [RFC][PATCH 16/16] sched: Debug bits Peter Zijlstra
2019-02-18 17:49 ` [RFC][PATCH 00/16] sched: Core scheduling Linus Torvalds
2019-02-18 20:40   ` Peter Zijlstra
2019-02-19  0:29     ` Linus Torvalds
2019-02-19 15:15       ` Ingo Molnar
2019-02-22 12:17     ` Paolo Bonzini
2019-02-22 14:20       ` Peter Zijlstra
2019-02-22 19:26         ` Tim Chen
2019-02-26  8:26           ` Aubrey Li
2019-02-27  7:54             ` Aubrey Li
2019-02-21  2:53   ` Subhra Mazumdar
2019-02-21 14:03     ` Peter Zijlstra
2019-02-21 18:44       ` Subhra Mazumdar
2019-02-22  0:34       ` Subhra Mazumdar
2019-02-22 12:45   ` Mel Gorman
2019-02-22 16:10     ` Mel Gorman
2019-03-08 19:44     ` Subhra Mazumdar
2019-03-11  4:23       ` Aubrey Li
2019-03-11 18:34         ` Subhra Mazumdar
2019-03-11 23:33           ` Subhra Mazumdar
2019-03-12  0:20             ` Greg Kerr
2019-03-12  0:47               ` Subhra Mazumdar
2019-03-12  7:33               ` Aaron Lu
2019-03-12  7:45             ` Aubrey Li
2019-03-13  5:55               ` Aubrey Li
2019-03-14  0:35                 ` Tim Chen
2019-03-14  5:30                   ` Aubrey Li
2019-03-14  6:07                     ` Li, Aubrey
2019-03-18  6:56             ` Aubrey Li
2019-03-12 19:07           ` Pawan Gupta
2019-03-26  7:32       ` Aaron Lu
2019-03-26  7:56         ` Aaron Lu [this message]
2019-02-19 22:07 ` Greg Kerr
2019-02-20  9:42   ` Peter Zijlstra
2019-02-20 18:33     ` Greg Kerr
2019-02-22 14:10       ` Peter Zijlstra
2019-03-07 22:06         ` Paolo Bonzini
2019-02-20 18:43     ` Subhra Mazumdar
2019-03-01  2:54 ` Subhra Mazumdar
2019-03-14 15:28 ` Julien Desfossez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190326075625.GA23460@aaronlu \
    --to=aaron.lu@linux.alibaba.com \
    --cc=fweisbec@gmail.com \
    --cc=keescook@chromium.org \
    --cc=kerrnel@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=subhra.mazumdar@oracle.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).