linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nishanth Aravamudan <naravamudan@digitalocean.com>
To: "Jan H. Schönherr" <jschoenh@amazon.de>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC 00/60] Coscheduling for Linux
Date: Wed, 12 Sep 2018 16:15:35 -0700	[thread overview]
Message-ID: <20180912231535.GA1546@breakout> (raw)
In-Reply-To: <c41f4e4f-30a1-5a3c-dec6-dc8f1d181c94@amazon.de>

On 12.09.2018 [21:34:14 +0200], Jan H. Schönherr wrote:
> On 09/12/2018 02:24 AM, Nishanth Aravamudan wrote:
> > [ I am not subscribed to LKML, please keep me CC'd on replies ]
> > 
> > I tried a simple test with several VMs (in my initial test, I have 48
> > idle 1-cpu 512-mb VMs and 2 idle 2-cpu, 2-gb VMs) using libvirt, none
> > pinned to any CPUs. When I tried to set all of the top-level libvirt cpu
> > cgroups' to be co-scheduled (/bin/echo 1 >
> > /sys/fs/cgroup/cpu/machine/<VM-x>.libvirt-qemu/cpu.scheduled), the
> > machine hangs. This is using cosched_max_level=1.
> > 
> > There are several moving parts there, so I tried narrowing it down, by
> > only coscheduling one VM, and thing seemed fine:
> > 
> > /sys/fs/cgroup/cpu/machine/<VM-1>.libvirt-qemu# echo 1 > cpu.scheduled 
> > /sys/fs/cgroup/cpu/machine/<VM-1>.libvirt-qemu# cat cpu.scheduled 
> > 1
> > 
> > One thing that is not entirely obvious to me (but might be completely
> > intentional) is that since by default the top-level libvirt cpu cgroups
> > are empty:
> > 
> > /sys/fs/cgroup/cpu/machine/<VM-1>.libvirt-qemu# cat tasks 
> > 
> > the result of this should be a no-op, right? [This becomes relevant
> > below] Specifically, all of the threads of qemu are in sub-cgroups,
> > which do not indicate they are co-scheduling:
> > 
> > /sys/fs/cgroup/cpu/machine/<VM-1>.libvirt-qemu# cat emulator/cpu.scheduled 
> > 0
> > /sys/fs/cgroup/cpu/machine/<VM-1>.libvirt-qemu# cat vcpu0/cpu.scheduled 
> > 0
> > 
> 
> This setup *should* work. It should be possible to set cpu.scheduled
> independent of the cpu.scheduled values of parent and child task groups.
> Any intermediate regular task group (i.e. cpu.scheduled==0) will still
> contribute the group fairness aspects.

Ah I see, that makes sense, thank you.

> That said, I see a hang, too. It seems to happen, when there is a
> cpu.scheduled!=0 group that is not a direct child of the root task group.
> You seem to have "/sys/fs/cgroup/cpu/machine" as an intermediate group.
> (The case ==0 within !=0 within the root task group works for me.)
>
> I'm going to dive into the code.
> 
> [...]
> > I am happy to do any further debugging I can do, or try patches on top
> > of those posted on the mailing list.
> 
> If you're willing, you can try to get rid of the intermediate "machine"
> cgroup in your setup for the moment. This might tell us, whether we're
> looking at the same issue.

Yep I will do this now. Note that if I just try to set machine's
cpu.scheduled to 1, with no other changes (not even changing any child
cgroup's cpu.scheduled yet), I get the following trace:

[16052.164259] ------------[ cut here ]------------
[16052.168973] rq->clock_update_flags < RQCF_ACT_SKIP
[16052.168991] WARNING: CPU: 59 PID: 59533 at kernel/sched/sched.h:1303 assert_clock_updated.isra.82.part.83+0x15/0x18
[16052.184424] Modules linked in: act_police cls_basic ebtable_filter ebtables ip6table_filter iptable_filter nbd ip6table_raw ip6_tables xt_CT iptable_raw ip_tables s
[16052.255653]  xxhash raid10 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq ses libcrc32c raid1 enclosure scsi
[16052.276029] CPU: 59 PID: 59533 Comm: bash Tainted: G           O      4.19.0-rc2-amazon-cosched+ #1
[16052.291142] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 1.4.9 06/29/2018
[16052.298728] RIP: 0010:assert_clock_updated.isra.82.part.83+0x15/0x18
[16052.305166] Code: 0f 85 75 ff ff ff 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c7 28 30 eb 94 31 c0 c6 05 47 18 27 01 01 e8 f4 df fb ff <0f> 0b c3 48 8b 970
[16052.324050] RSP: 0018:ffff9cada610bca8 EFLAGS: 00010096
[16052.329361] RAX: 0000000000000026 RBX: ffff8f06d65bae00 RCX: 0000000000000006
[16052.336580] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff8f1edf756620
[16052.343799] RBP: ffff8f06e0462e00 R08: 000000000000079b R09: ffff9cada610bc48
[16052.351018] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8f06e0462e80
[16052.358237] R13: 0000000000000001 R14: ffff8f06e0462e00 R15: 0000000000000001
[16052.365458] FS:  00007ff07ab02740(0000) GS:ffff8f1edf740000(0000) knlGS:0000000000000000
[16052.373647] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[16052.379480] CR2: 00007ff07ab139d8 CR3: 0000002ca2aea002 CR4: 00000000007626e0
[16052.386698] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[16052.393917] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[16052.401137] PKRU: 55555554
[16052.403927] Call Trace:
[16052.406460]  update_curr+0x19f/0x1c0
[16052.410116]  dequeue_entity+0x21/0x8c0
[16052.413950]  ? terminate_walk+0x55/0xb0
[16052.417871]  dequeue_entity_fair+0x46/0x1c0
[16052.422136]  sdrq_update_root+0x35d/0x480
[16052.426227]  cosched_set_scheduled+0x80/0x1c0
[16052.430675]  cpu_scheduled_write_u64+0x26/0x30
[16052.435209]  cgroup_file_write+0xe3/0x140
[16052.439305]  kernfs_fop_write+0x110/0x190
[16052.443397]  __vfs_write+0x26/0x170
[16052.446974]  ? __audit_syscall_entry+0x101/0x130
[16052.451674]  ? _cond_resched+0x15/0x30
[16052.455509]  ? __sb_start_write+0x41/0x80
[16052.459600]  vfs_write+0xad/0x1a0
[16052.462997]  ksys_write+0x42/0x90
[16052.466397]  do_syscall_64+0x55/0x110
[16052.470152]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[16052.475286] RIP: 0033:0x7ff07a1e93c0
[16052.478943] Code: 73 01 c3 48 8b 0d c8 2a 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d bd 8c 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff4
[16052.497827] RSP: 002b:00007ffc73e335b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[16052.505498] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ff07a1e93c0
[16052.512715] RDX: 0000000000000002 RSI: 00000000023a0408 RDI: 0000000000000001
[16052.519936] RBP: 00000000023a0408 R08: 000000000000000a R09: 00007ff07ab02740
[16052.527156] R10: 00007ff07a4bb6a0 R11: 0000000000000246 R12: 00007ff07a4bd400
[16052.534374] R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000000
[16052.541593] ---[ end trace b20c73e6c2bec22c ]---

I'll reboot and move some cgroups around :)

Thanks,
Nish

  reply	other threads:[~2018-09-12 23:15 UTC|newest]

Thread overview: 114+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-07 21:39 [RFC 00/60] Coscheduling for Linux Jan H. Schönherr
2018-09-07 21:39 ` [RFC 01/60] sched: Store task_group->se[] pointers as part of cfs_rq Jan H. Schönherr
2018-09-07 21:39 ` [RFC 02/60] sched: Introduce set_entity_cfs() to place a SE into a certain CFS runqueue Jan H. Schönherr
2018-09-07 21:39 ` [RFC 03/60] sched: Setup sched_domain_shared for all sched_domains Jan H. Schönherr
2018-09-07 21:39 ` [RFC 04/60] sched: Replace sd_numa_mask() hack with something sane Jan H. Schönherr
2018-09-07 21:39 ` [RFC 05/60] sched: Allow to retrieve the sched_domain_topology Jan H. Schönherr
2018-09-07 21:39 ` [RFC 06/60] sched: Add a lock-free variant of resched_cpu() Jan H. Schönherr
2018-09-07 21:39 ` [RFC 07/60] sched: Reduce dependencies of init_tg_cfs_entry() Jan H. Schönherr
2018-09-07 21:39 ` [RFC 08/60] sched: Move init_entity_runnable_average() into init_tg_cfs_entry() Jan H. Schönherr
2018-09-07 21:39 ` [RFC 09/60] sched: Do not require a CFS in init_tg_cfs_entry() Jan H. Schönherr
2018-09-07 21:39 ` [RFC 10/60] sched: Use parent_entity() in more places Jan H. Schönherr
2018-09-07 21:39 ` [RFC 11/60] locking/lockdep: Increase number of supported lockdep subclasses Jan H. Schönherr
2018-09-07 21:39 ` [RFC 12/60] locking/lockdep: Make cookie generator accessible Jan H. Schönherr
2018-09-07 21:40 ` [RFC 13/60] sched: Remove useless checks for root task-group Jan H. Schönherr
2018-09-07 21:40 ` [RFC 14/60] sched: Refactor sync_throttle() to accept a CFS runqueue as argument Jan H. Schönherr
2018-09-07 21:40 ` [RFC 15/60] sched: Introduce parent_cfs_rq() and use it Jan H. Schönherr
2018-09-07 21:40 ` [RFC 16/60] sched: Preparatory code movement Jan H. Schönherr
2018-09-07 21:40 ` [RFC 17/60] sched: Introduce and use generic task group CFS traversal functions Jan H. Schönherr
2018-09-07 21:40 ` [RFC 18/60] sched: Fix return value of SCHED_WARN_ON() Jan H. Schönherr
2018-09-07 21:40 ` [RFC 19/60] sched: Add entity variants of enqueue_task_fair() and dequeue_task_fair() Jan H. Schönherr
2018-09-07 21:40 ` [RFC 20/60] sched: Let {en,de}queue_entity_fair() work with a varying amount of tasks Jan H. Schönherr
2018-09-07 21:40 ` [RFC 21/60] sched: Add entity variants of put_prev_task_fair() and set_curr_task_fair() Jan H. Schönherr
2018-09-07 21:40 ` [RFC 22/60] cosched: Add config option for coscheduling support Jan H. Schönherr
2018-09-07 21:40 ` [RFC 23/60] cosched: Add core data structures for coscheduling Jan H. Schönherr
2018-09-07 21:40 ` [RFC 24/60] cosched: Do minimal pre-SMP coscheduler initialization Jan H. Schönherr
2018-09-07 21:40 ` [RFC 25/60] cosched: Prepare scheduling domain topology for coscheduling Jan H. Schönherr
2018-09-07 21:40 ` [RFC 26/60] cosched: Construct runqueue hierarchy Jan H. Schönherr
2018-09-07 21:40 ` [RFC 27/60] cosched: Add some small helper functions for later use Jan H. Schönherr
2018-09-07 21:40 ` [RFC 28/60] cosched: Add is_sd_se() to distinguish SD-SEs from TG-SEs Jan H. Schönherr
2018-09-07 21:40 ` [RFC 29/60] cosched: Adjust code reflecting on the total number of CFS tasks on a CPU Jan H. Schönherr
2018-09-07 21:40 ` [RFC 30/60] cosched: Disallow share modification on task groups for now Jan H. Schönherr
2018-09-07 21:40 ` [RFC 31/60] cosched: Don't disable idle tick " Jan H. Schönherr
2018-09-07 21:40 ` [RFC 32/60] cosched: Specialize parent_cfs_rq() for hierarchical runqueues Jan H. Schönherr
2018-09-07 21:40 ` [RFC 33/60] cosched: Allow resched_curr() to be called " Jan H. Schönherr
2018-09-07 21:40 ` [RFC 34/60] cosched: Add rq_of() variants for different use cases Jan H. Schönherr
2018-09-07 21:40 ` [RFC 35/60] cosched: Adjust rq_lock() functions to work with hierarchical runqueues Jan H. Schönherr
2018-09-07 21:40 ` [RFC 36/60] cosched: Use hrq_of() for rq_clock() and rq_clock_task() Jan H. Schönherr
2018-09-07 21:40 ` [RFC 37/60] cosched: Use hrq_of() for (indirect calls to) ___update_load_sum() Jan H. Schönherr
2018-09-07 21:40 ` [RFC 38/60] cosched: Skip updates on non-CPU runqueues in cfs_rq_util_change() Jan H. Schönherr
2018-09-07 21:40 ` [RFC 39/60] cosched: Adjust task group management for hierarchical runqueues Jan H. Schönherr
2018-09-07 21:40 ` [RFC 40/60] cosched: Keep track of task group hierarchy within each SD-RQ Jan H. Schönherr
2018-09-07 21:40 ` [RFC 41/60] cosched: Introduce locking for leader activities Jan H. Schönherr
2018-09-07 21:40 ` [RFC 42/60] cosched: Introduce locking for (mostly) enqueuing and dequeuing Jan H. Schönherr
2018-09-07 21:40 ` [RFC 43/60] cosched: Add for_each_sched_entity() variant for owned entities Jan H. Schönherr
2018-09-07 21:40 ` [RFC 44/60] cosched: Perform various rq_of() adjustments in scheduler code Jan H. Schönherr
2018-09-07 21:40 ` [RFC 45/60] cosched: Continue to account all load on per-CPU runqueues Jan H. Schönherr
2018-09-07 21:40 ` [RFC 46/60] cosched: Warn on throttling attempts of non-CPU runqueues Jan H. Schönherr
2018-09-07 21:40 ` [RFC 47/60] cosched: Adjust SE traversal and locking for common leader activities Jan H. Schönherr
2018-09-07 21:40 ` [RFC 48/60] cosched: Adjust SE traversal and locking for yielding and buddies Jan H. Schönherr
2018-09-07 21:40 ` [RFC 49/60] cosched: Adjust locking for enqueuing and dequeueing Jan H. Schönherr
2018-09-07 21:40 ` [RFC 50/60] cosched: Propagate load changes across hierarchy levels Jan H. Schönherr
2018-09-07 21:40 ` [RFC 51/60] cosched: Hacky work-around to avoid observing zero weight SD-SE Jan H. Schönherr
2018-09-07 21:40 ` [RFC 52/60] cosched: Support SD-SEs in enqueuing and dequeuing Jan H. Schönherr
2018-09-07 21:40 ` [RFC 53/60] cosched: Prevent balancing related functions from crossing hierarchy levels Jan H. Schönherr
2018-09-07 21:40 ` [RFC 54/60] cosched: Support idling in a coscheduled set Jan H. Schönherr
2018-09-07 21:40 ` [RFC 55/60] cosched: Adjust task selection for coscheduling Jan H. Schönherr
2018-09-07 21:40 ` [RFC 56/60] cosched: Adjust wakeup preemption rules " Jan H. Schönherr
2018-09-07 21:40 ` [RFC 57/60] cosched: Add sysfs interface to configure coscheduling on cgroups Jan H. Schönherr
2018-09-07 21:40 ` [RFC 58/60] cosched: Switch runqueues between regular scheduling and coscheduling Jan H. Schönherr
2018-09-07 21:40 ` [RFC 59/60] cosched: Handle non-atomicity during switches to and from coscheduling Jan H. Schönherr
2018-09-07 21:40 ` [RFC 60/60] cosched: Add command line argument to enable coscheduling Jan H. Schönherr
2018-09-10  2:50   ` Randy Dunlap
2018-09-12  0:24 ` [RFC 00/60] Coscheduling for Linux Nishanth Aravamudan
2018-09-12 19:34   ` Jan H. Schönherr
2018-09-12 23:15     ` Nishanth Aravamudan [this message]
2018-09-13 11:31       ` Jan H. Schönherr
2018-09-13 18:16         ` Nishanth Aravamudan
2018-09-12 23:18     ` Jan H. Schönherr
2018-09-13  3:05       ` Nishanth Aravamudan
2018-09-13 19:19 ` [RFC 61/60] cosched: Accumulated fixes and improvements Jan H. Schönherr
2018-09-26 17:25   ` Nishanth Aravamudan
2018-09-26 21:05     ` Nishanth Aravamudan
2018-10-01  9:13       ` Jan H. Schönherr
2018-09-14 11:12 ` [RFC 00/60] Coscheduling for Linux Peter Zijlstra
2018-09-14 16:25   ` Jan H. Schönherr
2018-09-15  8:48     ` Task group cleanups and optimizations (was: Re: [RFC 00/60] Coscheduling for Linux) Jan H. Schönherr
2018-09-17  9:48       ` Peter Zijlstra
2018-09-18 13:22         ` Jan H. Schönherr
2018-09-18 13:38           ` Peter Zijlstra
2018-09-18 13:54             ` Jan H. Schönherr
2018-09-18 13:42           ` Peter Zijlstra
2018-09-18 14:35           ` Rik van Riel
2018-09-19  9:23             ` Jan H. Schönherr
2018-11-23 16:51           ` Frederic Weisbecker
2018-12-04 13:23             ` Jan H. Schönherr
2018-09-17 11:33     ` [RFC 00/60] Coscheduling for Linux Peter Zijlstra
2018-11-02 22:13       ` Nishanth Aravamudan
2018-09-17 12:25     ` Peter Zijlstra
2018-09-26  9:58       ` Jan H. Schönherr
2018-09-27 18:36         ` Subhra Mazumdar
2018-11-23 16:29           ` Frederic Weisbecker
2018-09-17 13:37     ` Peter Zijlstra
2018-09-26  9:35       ` Jan H. Schönherr
2018-09-18 14:40     ` Rik van Riel
2018-09-24 15:23       ` Jan H. Schönherr
2018-09-24 18:01         ` Rik van Riel
2018-09-18  0:33 ` Subhra Mazumdar
2018-09-18 11:44   ` Jan H. Schönherr
2018-09-19 21:53     ` Subhra Mazumdar
2018-09-24 15:43       ` Jan H. Schönherr
2018-09-27 18:12         ` Subhra Mazumdar
2018-10-04 13:29 ` Jon Masters
2018-10-17  2:09 ` Frederic Weisbecker
2018-10-19 11:40   ` Jan H. Schönherr
2018-10-19 14:52     ` Frederic Weisbecker
2018-10-19 15:16     ` Rik van Riel
2018-10-19 15:33       ` Frederic Weisbecker
2018-10-19 15:45         ` Rik van Riel
2018-10-19 19:07           ` Jan H. Schönherr
2018-10-19  0:26 ` Subhra Mazumdar
2018-10-26 23:44   ` Jan H. Schönherr
2018-10-29 22:52     ` Subhra Mazumdar
2018-10-26 23:05 ` Subhra Mazumdar
2018-10-27  0:07   ` Jan H. Schönherr

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180912231535.GA1546@breakout \
    --to=naravamudan@digitalocean.com \
    --cc=jschoenh@amazon.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).