All of lore.kernel.org
 help / color / mirror / Atom feed
From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Rik van Riel <riel@surriel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Gautham R Shenoy <ego@linux.vnet.ibm.com>,
	Parth Shah <parth@linux.ibm.com>
Subject: Re: [PATCH 00/10] sched/fair: wake_affine improvements
Date: Fri, 23 Apr 2021 16:01:29 +0530	[thread overview]
Message-ID: <20210423103129.GH2633526@linux.vnet.ibm.com> (raw)
In-Reply-To: <20210423082532.GA4239@techsingularity.net>

* Mel Gorman <mgorman@techsingularity.net> [2021-04-23 09:25:32]:

> On Thu, Apr 22, 2021 at 03:53:16PM +0530, Srikar Dronamraju wrote:
> > Recently we found that some of the benchmark numbers on Power10 were lesser
> > than expected. Some analysis showed that the problem lies in the fact that
> > L2-Cache on Power10 is at core level i.e only 4 threads share the L2-cache.
> > 
> 
> I didn't get the chance to review this properly although I am suspicious
> of tracking idle_core and updating that more frequently. It becomes a very
> hot cache line that bounces. I did experiement with tracking an idle core
> but the data either went stale too quickly or the updates incurred more
> overhead than a reduced search saved.

Thanks a lot for trying this.

> The series also oopses a *lot* and didn't get through a run of basic
> workloads on x86 on any of three machines. An example oops is
> 

Can you pass me your failing config. I am somehow not been seeing this
either on x86 or on Powerpc on multiple systems.
Also if possible cat /proc/schedstat and cat
/proc/sys/kernel/sched_domain/cpu0/domain*/name

> [  137.770968] BUG: unable to handle page fault for address: 000000000001a5c8
> [  137.777836] #PF: supervisor read access in kernel mode
> [  137.782965] #PF: error_code(0x0000) - not-present page
> [  137.788097] PGD 8000004098a42067 P4D 8000004098a42067 PUD 4092e36067 PMD 40883ac067 PTE 0
> [  137.796261] Oops: 0000 [#1] SMP PTI
> [  137.799747] CPU: 0 PID: 14913 Comm: GC Slave Tainted: G            E     5.12.0-rc8-llcfallback-v1r1 #1
> [  137.809123] Hardware name: SGI.COM C2112-4GP3/X10DRT-P-Series, BIOS 2.0a 05/09/2016
> [  137.816765] RIP: 0010:cpus_share_cache+0x22/0x30
> [  137.821396] Code: fc ff 0f 0b eb 80 66 90 0f 1f 44 00 00 48 63 ff 48 63 f6 48 c7 c0 c8 a5 01 00 48 8b 0c fd 00 59 9d 9a 48 8b 14 f5 00 59 9d 9a <8b> 14 02 39 14 01 0f 94 c0 c3 0f 1f 40 00 0f 1f 44 00 00 41 57 41

IP says cpus_share_cache, and it takes 2 ints,
RAX is 000000000001a5c8 but the panic says
"unable to handle page fault for address: 000000000001a5c8"
so it must have failed for "per_cpu(sd_llc_id, xx_cpu)"

> [  137.840133] RSP: 0018:ffff9e0aa26a7c60 EFLAGS: 00010086
> [  137.845360] RAX: 000000000001a5c8 RBX: ffff8bfbc65cf280 RCX: ffff8c3abf440000
> [  137.852483] RDX: 0000000000000000 RSI: ffffffffffffffff RDI: 0000000000000015
> [  137.859605] RBP: ffff8bbc478b9fe0 R08: 0000000000000000 R09: 000000000001a5b8
> [  137.866730] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8bbc47c88a00
> [  137.873855] R13: 0000000000000015 R14: 0000000000000001 R15: ffff8bfde0199fc0
> [  137.880978] FS:  00007fa33c173700(0000) GS:ffff8bfabf400000(0000) knlGS:0000000000000000
> [  137.889055] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  137.894794] CR2: 000000000001a5c8 CR3: 0000004096b22006 CR4: 00000000003706f0
> [  137.901918] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  137.909041] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

-- 
Thanks and Regards
Srikar Dronamraju

  reply	other threads:[~2021-04-23 10:32 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-22 10:23 [PATCH 00/10] sched/fair: wake_affine improvements Srikar Dronamraju
2021-04-22 10:23 ` [PATCH 01/10] sched/fair: Update affine statistics when needed Srikar Dronamraju
2021-04-22 10:23 ` [PATCH 02/10] sched/fair: Maintain the identity of idle-core Srikar Dronamraju
2021-04-22 10:23 ` [PATCH 03/10] sched/fair: Update idle-core more often Srikar Dronamraju
2021-04-22 10:23 ` [PATCH 04/10] sched/fair: Prefer idle CPU to cache affinity Srikar Dronamraju
2021-04-22 10:23 ` [PATCH 05/10] sched/fair: Call wake_affine only if necessary Srikar Dronamraju
2021-04-22 10:23 ` [PATCH 06/10] sched/idle: Move busy_cpu accounting to idle callback Srikar Dronamraju
2021-04-22 10:23 ` [PATCH 07/10] sched/fair: Remove ifdefs in waker_affine_idler_llc Srikar Dronamraju
2021-04-22 10:23 ` [PATCH 08/10] sched/fair: Dont iterate if no idle CPUs Srikar Dronamraju
2021-04-22 10:23 ` [PATCH 09/10] sched/topology: Introduce fallback LLC Srikar Dronamraju
2021-04-22 15:10   ` kernel test robot
2021-04-22 15:10     ` kernel test robot
2021-04-22 10:23 ` [PATCH 10/10] powerpc/smp: Add fallback flag to powerpc MC domain Srikar Dronamraju
2021-04-23  8:25 ` [PATCH 00/10] sched/fair: wake_affine improvements Mel Gorman
2021-04-23 10:31   ` Srikar Dronamraju [this message]
2021-04-23 12:38     ` Mel Gorman
2021-04-26 10:30       ` Srikar Dronamraju
2021-04-26 11:35         ` Mel Gorman
2021-04-26 10:39   ` Srikar Dronamraju
2021-04-26 11:41     ` Mel Gorman
2021-04-28 12:57       ` Srikar Dronamraju
2021-04-27 14:52 ` Vincent Guittot
2021-04-28 12:49   ` Srikar Dronamraju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210423103129.GH2633526@linux.vnet.ibm.com \
    --to=srikar@linux.vnet.ibm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=ego@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=parth@linux.ibm.com \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=tglx@linutronix.de \
    --cc=valentin.schneider@arm.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.