From: Lauro Venancio <firstname.lastname@example.org>
To: Peter Zijlstra <email@example.com>
Cc: firstname.lastname@example.org, email@example.com,
 firstname.lastname@example.org, Mike Galbraith <email@example.com>,
 Thomas Gleixner <firstname.lastname@example.org>,
 Ingo Molnar <email@example.com>
Subject: Re: [RFC 2/3] sched/topology: fix sched groups on NUMA machines with mesh topology
Date: Thu, 13 Apr 2017 17:21:00 -0300
Message-ID: <firstname.lastname@example.org>
In-Reply-To: <email@example.com>

On 04/13/2017 12:48 PM, Peter Zijlstra wrote:
> On Thu, Apr 13, 2017 at 10:56:08AM -0300, Lauro Ramos Venancio wrote:
>> Currently, on a 4 nodes NUMA machine with ring topology, two sched
>> groups are generated for the last NUMA sched domain. One group has the
>> CPUs from NUMA nodes 3, 0 and 1; the other group has the CPUs from nodes
>> 1, 2 and 3. As CPUs from nodes 1 and 3 belongs to both groups, the
>> scheduler is unable to directly move tasks between these nodes. In the
>> worst scenario, when a set of tasks are bound to nodes 1 and 3, the
>> performance is severely impacted because just one node is used while the
>> other node remains idle.
>
> I feel a picture would be ever so much clearer.
>
>> This patch constructs the sched groups from each CPU perspective. So, on
>> a 4 nodes machine with ring topology, while nodes 0 and 2 keep the same
>> groups as before [(3, 0, 1)(1, 2, 3)], nodes 1 and 3 have new groups
>> [(0, 1, 2)(2, 3, 0)]. This allows moving tasks between any node 2-hops
>> apart.
>
> So I still have no idea what specifically goes wrong and how this fixes
> it. Changelog is impenetrable.

On a 4-node machine with ring topology, the last sched domain level
contains groups with 3 NUMA nodes each, so there are four possible
groups: (0, 1, 2) (1, 2, 3) (2, 3, 0) (3, 0, 1). As only two groups are
needed to fill the sched domain, the groups (3, 0, 1) and (1, 2, 3) are
currently used for all CPUs.
The problem is that nodes 1 and 3 belong to both groups, making it
impossible to move tasks between these two nodes.

This patch uses different groups depending on the CPU for which they are
installed. CPUs on nodes 0 and 2 keep the same groups as before:
(3, 0, 1) and (1, 2, 3). CPUs on nodes 1 and 3 use the new groups:
(0, 1, 2) and (2, 3, 0). The first pair of groups allows movement
between nodes 0 and 2; the second pair allows movement between nodes 1
and 3.

I will improve the changelog.

> "From each CPU's persepective" doesn't really help, there already is a
> for_each_cpu() in.

The for_each_cpu() is used to iterate across all sched domain CPUs. It
doesn't consider the CPU where the groups are being installed (the cpu
parameter of build_overlap_sched_groups()). Currently, the cpu parameter
is used just for memory allocation and for ordering the groups; it
doesn't change which groups are chosen. This patch uses the cpu
parameter to choose the first group, which, as a consequence, also
changes the second group.

> Also, since I'm not sure what happend to the 4 node system, I cannot
> begin to imagine what would happen on the 8 node one.
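The group arithmetic above can be sketched outside the kernel. The
following toy Python model (all names hypothetical; this is not the
actual build_overlap_sched_groups() code) shows why sharing nodes 1 and
3 between both groups blocks balancing, and why the per-node choice of
groups fixes it:

```python
# Illustrative model only: a 4-node ring, 3-node sched groups.
N = 4  # nodes on the ring

def group(first):
    """3-node group written the way the changelog does:
    (first-1, first, first+1), wrapping around the ring."""
    return ((first - 1) % N, first, (first + 1) % N)

# Before the patch: every CPU gets the same two groups.
def old_groups(node):
    return [group(0), group(2)]          # (3, 0, 1) and (1, 2, 3)

# After the patch: the pair of groups depends on the node being built for.
def new_groups(node):
    if node % 2 == 0:
        return [group(0), group(2)]      # (3, 0, 1) and (1, 2, 3)
    return [group(1), group(3)]          # (0, 1, 2) and (2, 3, 0)

def can_balance(groups, a, b):
    """Load can move between a and b only if some group separates them,
    i.e. contains exactly one of the two nodes."""
    return any((a in g) != (b in g) for g in groups)

print(can_balance(old_groups(1), 1, 3))  # False: 1 and 3 sit in both groups
print(can_balance(new_groups(1), 1, 3))  # True: (0, 1, 2) holds 1 but not 3
```

With the old groups, every group contains both node 1 and node 3, so no
group boundary ever separates them; with the per-node groups, each of
the shared nodes appears in exactly one group, so the balancer can move
load across the boundary.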