From: Nishanth Aravamudan <naravamudan@digitalocean.com>
To: "Jan H. Schönherr" <jschoenh@amazon.de>
Cc: Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
linux-kernel@vger.kernel.org
Subject: Re: [RFC 00/60] Coscheduling for Linux
Date: Tue, 11 Sep 2018 17:24:49 -0700
Message-ID: <20180912002449.GA21797@breakout>
In-Reply-To: <20180907214047.26914-1-jschoenh@amazon.de>
[ I am not subscribed to LKML, please keep me CC'd on replies ]
On 07.09.2018 [23:39:47 +0200], Jan H. Schönherr wrote:
> This patch series extends CFS with support for coscheduling. The
> implementation is versatile enough to cover many different
> coscheduling use-cases, while at the same time being non-intrusive, so
> that behavior of legacy workloads does not change.
I tried a simple test with several VMs (in my initial test, I have 48
idle 1-CPU, 512 MB VMs and 2 idle 2-CPU, 2 GB VMs) using libvirt, none
pinned to any CPUs. When I tried to set all of the top-level libvirt cpu
cgroups to be co-scheduled (/bin/echo 1 >
/sys/fs/cgroup/cpu/machine/<VM-x>.libvirt-qemu/cpu.scheduled), the
machine hung. This was with cosched_max_level=1.
There are several moving parts there, so I tried narrowing it down by
only coscheduling one VM, and things seemed fine:
/sys/fs/cgroup/cpu/machine/<VM-1>.libvirt-qemu# echo 1 > cpu.scheduled
/sys/fs/cgroup/cpu/machine/<VM-1>.libvirt-qemu# cat cpu.scheduled
1
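The narrowing-down step above can be scripted. The following is a minimal sketch, not part of the patch series: the function name set_cosched() and its parameters are hypothetical, and it only assumes the cpu.scheduled file and the <VM>.libvirt-qemu directory naming shown above. It writes to one cgroup at a time and prints each path before the write, so the last line on the console identifies the write that preceded a hang.

```python
from pathlib import Path


def set_cosched(machine_dir, value="1", limit=None):
    """Write `value` to cpu.scheduled in each top-level VM cgroup.

    machine_dir: directory holding the <VM>.libvirt-qemu cgroups
                 (hypothetical parameter, e.g. /sys/fs/cgroup/cpu/machine).
    limit:       only touch the first `limit` cgroups (None means all).
    Returns the cgroup paths that were written, in order.
    """
    written = []
    vm_dirs = sorted(Path(machine_dir).glob("*.libvirt-qemu"))
    for vm_dir in vm_dirs[:limit]:
        knob = vm_dir / "cpu.scheduled"
        if not knob.exists():
            # Kernel without the coscheduling patches, or not a cpu cgroup.
            continue
        # Print and flush first, so the console shows the last attempted write.
        print(f"setting {knob} = {value}", flush=True)
        knob.write_text(value + "\n")
        written.append(str(vm_dir))
    return written
```

Calling set_cosched("/sys/fs/cgroup/cpu/machine", limit=1) would correspond to the single-VM case, and raising limit step by step repeats the experiment with more VMs.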
One thing that is not entirely obvious to me (but might be completely
intentional) is the following. By default, the top-level libvirt cpu
cgroups are empty:
/sys/fs/cgroup/cpu/machine/<VM-1>.libvirt-qemu# cat tasks
so setting cpu.scheduled there should be a no-op, right? [This becomes
relevant below] Specifically, all of the qemu threads are in sub-cgroups,
which do not indicate that they are coscheduled:
/sys/fs/cgroup/cpu/machine/<VM-1>.libvirt-qemu# cat emulator/cpu.scheduled
0
/sys/fs/cgroup/cpu/machine/<VM-1>.libvirt-qemu# cat vcpu0/cpu.scheduled
0
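The manual checks above (cat tasks, cat cpu.scheduled in each sub-cgroup) can be collected in one pass. This is a minimal sketch under the same assumptions as the commands above: a cgroup v1 cpu hierarchy with a tasks file and the cpu.scheduled file from this series in each cgroup. The function name cosched_state() is hypothetical.

```python
from pathlib import Path


def cosched_state(vm_cgroup):
    """Summarize a VM cgroup and all of its sub-cgroups.

    Returns a dict mapping each cgroup path, relative to `vm_cgroup`
    ("." for the top level), to a tuple of
    (contents of cpu.scheduled, number of entries in tasks).
    """
    root = Path(vm_cgroup)
    state = {}
    # rglob also matches the cpu.scheduled file in the top-level directory.
    for knob in sorted(root.rglob("cpu.scheduled")):
        cgroup = knob.parent
        tasks_file = cgroup / "tasks"
        tasks = tasks_file.read_text().split() if tasks_file.exists() else []
        state[str(cgroup.relative_to(root))] = (knob.read_text().strip(), len(tasks))
    return state
```

For the situation described above, the expected shape is a top-level entry with cpu.scheduled set and zero tasks, and emulator and vcpu entries with cpu.scheduled at 0 and a nonzero task count.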
When I then try to coschedule the second VM, the machine hangs.
/sys/fs/cgroup/cpu/machine/<VM-2>.libvirt-qemu# echo 1 > cpu.scheduled
Timeout, server <HOST> not responding.
On the console, I see the same backtraces I see when I try to set all of
the VMs to be coscheduled:
[ 144.494091] watchdog: BUG: soft lockup - CPU#87 stuck for 22s! [CPU 0/KVM:25344]
[ 144.507629] Modules linked in: act_police cls_basic ebtable_filter ebtables ip6table_filter iptable_filter nbd ip6table_raw ip6_tables xt_CT iptable_raw ip_tables s
[ 144.578858] xxhash raid10 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor ses raid6_pq enclosure libcrc32c raid1 scsi
[ 144.599227] CPU: 87 PID: 25344 Comm: CPU 0/KVM Tainted: G O 4.19.0-rc2-amazon-cosched+ #1
[ 144.608819] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 1.4.9 06/29/2018
[ 144.616403] RIP: 0010:smp_call_function_single+0xa7/0xd0
[ 144.621818] Code: 01 48 89 d1 48 89 f2 4c 89 c6 e8 64 fe ff ff c9 c3 48 89 d1 48 89 f2 48 89 e6 e8 54 fe ff ff 8b 54 24 18 83 e2 01 74 0b f3 90 <8b> 54 24 18 83 e25
[ 144.640703] RSP: 0018:ffffb2a4a75abb40 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 144.648390] RAX: 0000000000000000 RBX: 0000000000000057 RCX: 0000000000000000
[ 144.655607] RDX: 0000000000000001 RSI: 00000000000000fb RDI: 0000000000000202
[ 144.662826] RBP: ffffb2a4a75abb60 R08: 0000000000000000 R09: 0000000000000f39
[ 144.670073] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a9c03fc8000
[ 144.677301] R13: ffff8ab4589dc100 R14: 0000000000000057 R15: 0000000000000000
[ 144.684519] FS: 00007f51cd41a700(0000) GS:ffff8ab45fac0000(0000) knlGS:0000000000000000
[ 144.692710] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 144.698542] CR2: 000000c4203c0000 CR3: 000000178a97e005 CR4: 00000000007626e0
[ 144.705771] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 144.712989] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 144.720215] PKRU: 55555554
[ 144.723016] Call Trace:
[ 144.725553] ? vmx_sched_in+0xc0/0xc0 [kvm_intel]
[ 144.730341] vmx_vcpu_load+0x244/0x310 [kvm_intel]
[ 144.735220] ? __switch_to_asm+0x40/0x70
[ 144.739231] ? __switch_to_asm+0x34/0x70
[ 144.743235] ? __switch_to_asm+0x40/0x70
[ 144.747240] ? __switch_to_asm+0x34/0x70
[ 144.751243] ? __switch_to_asm+0x40/0x70
[ 144.755246] ? __switch_to_asm+0x34/0x70
[ 144.759250] ? __switch_to_asm+0x40/0x70
[ 144.763272] ? __switch_to_asm+0x34/0x70
[ 144.767284] ? __switch_to_asm+0x40/0x70
[ 144.771296] ? __switch_to_asm+0x34/0x70
[ 144.775299] ? __switch_to_asm+0x40/0x70
[ 144.779313] ? __switch_to_asm+0x34/0x70
[ 144.783317] ? __switch_to_asm+0x40/0x70
[ 144.787338] kvm_arch_vcpu_load+0x40/0x270 [kvm]
[ 144.792056] finish_task_switch+0xe2/0x260
[ 144.796238] __schedule+0x316/0x890
[ 144.799810] schedule+0x32/0x80
[ 144.803039] kvm_vcpu_block+0x7a/0x2e0 [kvm]
[ 144.807399] kvm_arch_vcpu_ioctl_run+0x1a7/0x1990 [kvm]
[ 144.812705] ? futex_wake+0x84/0x150
[ 144.816368] kvm_vcpu_ioctl+0x3ab/0x5d0 [kvm]
[ 144.820810] ? wake_up_q+0x70/0x70
[ 144.824311] do_vfs_ioctl+0x92/0x600
[ 144.827985] ? syscall_trace_enter+0x1ac/0x290
[ 144.832517] ksys_ioctl+0x60/0x90
[ 144.835913] ? exit_to_usermode_loop+0xa6/0xc2
[ 144.840436] __x64_sys_ioctl+0x16/0x20
[ 144.844267] do_syscall_64+0x55/0x110
[ 144.848012] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 144.853160] RIP: 0033:0x7f51cf82bea7
[ 144.856816] Code: 44 00 00 48 8b 05 e1 cf 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff8
[ 144.875752] RSP: 002b:00007f51cd419a18 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
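In an x86 call trace like the one above, entries prefixed with "?" are addresses found on the stack that the unwinder could not confirm as part of the call chain, so the repeated __switch_to_asm lines are most likely stale. A minimal sketch for pulling out only the unmarked frames from such a log follows. The function name reliable_frames() and the regular expression are illustrative assumptions that match the line format shown here, not a general dmesg parser.

```python
import re

# Matches e.g. "[ 144.730341] vmx_vcpu_load+0x244/0x310 [kvm_intel]"
# group 1: optional "? " marker, group 2: symbol name
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\? )?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(log_lines):
    """Return symbols of the frames not marked with "?" in the first call trace."""
    frames = []
    in_trace = False
    for line in log_lines:
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.match(line.strip())
        if match is None:
            # First line that is not a frame (e.g. the user-space RIP) ends the trace.
            break
        if match.group(1) is None:
            frames.append(match.group(2))
    return frames
```

Applied to the trace above, this leaves the chain from entry_SYSCALL_64_after_hwframe up through __schedule, finish_task_switch, kvm_arch_vcpu_load and vmx_vcpu_load, with the CPU spinning in smp_call_function_single as reported by the RIP line.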
I am happy to do any further debugging I can do, or try patches on top
of those posted on the mailing list.
Thanks,
Nish
Thread overview: 114+ messages
2018-09-07 21:39 [RFC 00/60] Coscheduling for Linux Jan H. Schönherr
2018-09-07 21:39 ` [RFC 01/60] sched: Store task_group->se[] pointers as part of cfs_rq Jan H. Schönherr
2018-09-07 21:39 ` [RFC 02/60] sched: Introduce set_entity_cfs() to place a SE into a certain CFS runqueue Jan H. Schönherr
2018-09-07 21:39 ` [RFC 03/60] sched: Setup sched_domain_shared for all sched_domains Jan H. Schönherr
2018-09-07 21:39 ` [RFC 04/60] sched: Replace sd_numa_mask() hack with something sane Jan H. Schönherr
2018-09-07 21:39 ` [RFC 05/60] sched: Allow to retrieve the sched_domain_topology Jan H. Schönherr
2018-09-07 21:39 ` [RFC 06/60] sched: Add a lock-free variant of resched_cpu() Jan H. Schönherr
2018-09-07 21:39 ` [RFC 07/60] sched: Reduce dependencies of init_tg_cfs_entry() Jan H. Schönherr
2018-09-07 21:39 ` [RFC 08/60] sched: Move init_entity_runnable_average() into init_tg_cfs_entry() Jan H. Schönherr
2018-09-07 21:39 ` [RFC 09/60] sched: Do not require a CFS in init_tg_cfs_entry() Jan H. Schönherr
2018-09-07 21:39 ` [RFC 10/60] sched: Use parent_entity() in more places Jan H. Schönherr
2018-09-07 21:39 ` [RFC 11/60] locking/lockdep: Increase number of supported lockdep subclasses Jan H. Schönherr
2018-09-07 21:39 ` [RFC 12/60] locking/lockdep: Make cookie generator accessible Jan H. Schönherr
2018-09-07 21:40 ` [RFC 13/60] sched: Remove useless checks for root task-group Jan H. Schönherr
2018-09-07 21:40 ` [RFC 14/60] sched: Refactor sync_throttle() to accept a CFS runqueue as argument Jan H. Schönherr
2018-09-07 21:40 ` [RFC 15/60] sched: Introduce parent_cfs_rq() and use it Jan H. Schönherr
2018-09-07 21:40 ` [RFC 16/60] sched: Preparatory code movement Jan H. Schönherr
2018-09-07 21:40 ` [RFC 17/60] sched: Introduce and use generic task group CFS traversal functions Jan H. Schönherr
2018-09-07 21:40 ` [RFC 18/60] sched: Fix return value of SCHED_WARN_ON() Jan H. Schönherr
2018-09-07 21:40 ` [RFC 19/60] sched: Add entity variants of enqueue_task_fair() and dequeue_task_fair() Jan H. Schönherr
2018-09-07 21:40 ` [RFC 20/60] sched: Let {en,de}queue_entity_fair() work with a varying amount of tasks Jan H. Schönherr
2018-09-07 21:40 ` [RFC 21/60] sched: Add entity variants of put_prev_task_fair() and set_curr_task_fair() Jan H. Schönherr
2018-09-07 21:40 ` [RFC 22/60] cosched: Add config option for coscheduling support Jan H. Schönherr
2018-09-07 21:40 ` [RFC 23/60] cosched: Add core data structures for coscheduling Jan H. Schönherr
2018-09-07 21:40 ` [RFC 24/60] cosched: Do minimal pre-SMP coscheduler initialization Jan H. Schönherr
2018-09-07 21:40 ` [RFC 25/60] cosched: Prepare scheduling domain topology for coscheduling Jan H. Schönherr
2018-09-07 21:40 ` [RFC 26/60] cosched: Construct runqueue hierarchy Jan H. Schönherr
2018-09-07 21:40 ` [RFC 27/60] cosched: Add some small helper functions for later use Jan H. Schönherr
2018-09-07 21:40 ` [RFC 28/60] cosched: Add is_sd_se() to distinguish SD-SEs from TG-SEs Jan H. Schönherr
2018-09-07 21:40 ` [RFC 29/60] cosched: Adjust code reflecting on the total number of CFS tasks on a CPU Jan H. Schönherr
2018-09-07 21:40 ` [RFC 30/60] cosched: Disallow share modification on task groups for now Jan H. Schönherr
2018-09-07 21:40 ` [RFC 31/60] cosched: Don't disable idle tick " Jan H. Schönherr
2018-09-07 21:40 ` [RFC 32/60] cosched: Specialize parent_cfs_rq() for hierarchical runqueues Jan H. Schönherr
2018-09-07 21:40 ` [RFC 33/60] cosched: Allow resched_curr() to be called " Jan H. Schönherr
2018-09-07 21:40 ` [RFC 34/60] cosched: Add rq_of() variants for different use cases Jan H. Schönherr
2018-09-07 21:40 ` [RFC 35/60] cosched: Adjust rq_lock() functions to work with hierarchical runqueues Jan H. Schönherr
2018-09-07 21:40 ` [RFC 36/60] cosched: Use hrq_of() for rq_clock() and rq_clock_task() Jan H. Schönherr
2018-09-07 21:40 ` [RFC 37/60] cosched: Use hrq_of() for (indirect calls to) ___update_load_sum() Jan H. Schönherr
2018-09-07 21:40 ` [RFC 38/60] cosched: Skip updates on non-CPU runqueues in cfs_rq_util_change() Jan H. Schönherr
2018-09-07 21:40 ` [RFC 39/60] cosched: Adjust task group management for hierarchical runqueues Jan H. Schönherr
2018-09-07 21:40 ` [RFC 40/60] cosched: Keep track of task group hierarchy within each SD-RQ Jan H. Schönherr
2018-09-07 21:40 ` [RFC 41/60] cosched: Introduce locking for leader activities Jan H. Schönherr
2018-09-07 21:40 ` [RFC 42/60] cosched: Introduce locking for (mostly) enqueuing and dequeuing Jan H. Schönherr
2018-09-07 21:40 ` [RFC 43/60] cosched: Add for_each_sched_entity() variant for owned entities Jan H. Schönherr
2018-09-07 21:40 ` [RFC 44/60] cosched: Perform various rq_of() adjustments in scheduler code Jan H. Schönherr
2018-09-07 21:40 ` [RFC 45/60] cosched: Continue to account all load on per-CPU runqueues Jan H. Schönherr
2018-09-07 21:40 ` [RFC 46/60] cosched: Warn on throttling attempts of non-CPU runqueues Jan H. Schönherr
2018-09-07 21:40 ` [RFC 47/60] cosched: Adjust SE traversal and locking for common leader activities Jan H. Schönherr
2018-09-07 21:40 ` [RFC 48/60] cosched: Adjust SE traversal and locking for yielding and buddies Jan H. Schönherr
2018-09-07 21:40 ` [RFC 49/60] cosched: Adjust locking for enqueuing and dequeueing Jan H. Schönherr
2018-09-07 21:40 ` [RFC 50/60] cosched: Propagate load changes across hierarchy levels Jan H. Schönherr
2018-09-07 21:40 ` [RFC 51/60] cosched: Hacky work-around to avoid observing zero weight SD-SE Jan H. Schönherr
2018-09-07 21:40 ` [RFC 52/60] cosched: Support SD-SEs in enqueuing and dequeuing Jan H. Schönherr
2018-09-07 21:40 ` [RFC 53/60] cosched: Prevent balancing related functions from crossing hierarchy levels Jan H. Schönherr
2018-09-07 21:40 ` [RFC 54/60] cosched: Support idling in a coscheduled set Jan H. Schönherr
2018-09-07 21:40 ` [RFC 55/60] cosched: Adjust task selection for coscheduling Jan H. Schönherr
2018-09-07 21:40 ` [RFC 56/60] cosched: Adjust wakeup preemption rules " Jan H. Schönherr
2018-09-07 21:40 ` [RFC 57/60] cosched: Add sysfs interface to configure coscheduling on cgroups Jan H. Schönherr
2018-09-07 21:40 ` [RFC 58/60] cosched: Switch runqueues between regular scheduling and coscheduling Jan H. Schönherr
2018-09-07 21:40 ` [RFC 59/60] cosched: Handle non-atomicity during switches to and from coscheduling Jan H. Schönherr
2018-09-07 21:40 ` [RFC 60/60] cosched: Add command line argument to enable coscheduling Jan H. Schönherr
2018-09-10 2:50 ` Randy Dunlap
2018-09-12 0:24 ` Nishanth Aravamudan [this message]
2018-09-12 19:34 ` [RFC 00/60] Coscheduling for Linux Jan H. Schönherr
2018-09-12 23:15 ` Nishanth Aravamudan
2018-09-13 11:31 ` Jan H. Schönherr
2018-09-13 18:16 ` Nishanth Aravamudan
2018-09-12 23:18 ` Jan H. Schönherr
2018-09-13 3:05 ` Nishanth Aravamudan
2018-09-13 19:19 ` [RFC 61/60] cosched: Accumulated fixes and improvements Jan H. Schönherr
2018-09-26 17:25 ` Nishanth Aravamudan
2018-09-26 21:05 ` Nishanth Aravamudan
2018-10-01 9:13 ` Jan H. Schönherr
2018-09-14 11:12 ` [RFC 00/60] Coscheduling for Linux Peter Zijlstra
2018-09-14 16:25 ` Jan H. Schönherr
2018-09-15 8:48 ` Task group cleanups and optimizations (was: Re: [RFC 00/60] Coscheduling for Linux) Jan H. Schönherr
2018-09-17 9:48 ` Peter Zijlstra
2018-09-18 13:22 ` Jan H. Schönherr
2018-09-18 13:38 ` Peter Zijlstra
2018-09-18 13:54 ` Jan H. Schönherr
2018-09-18 13:42 ` Peter Zijlstra
2018-09-18 14:35 ` Rik van Riel
2018-09-19 9:23 ` Jan H. Schönherr
2018-11-23 16:51 ` Frederic Weisbecker
2018-12-04 13:23 ` Jan H. Schönherr
2018-09-17 11:33 ` [RFC 00/60] Coscheduling for Linux Peter Zijlstra
2018-11-02 22:13 ` Nishanth Aravamudan
2018-09-17 12:25 ` Peter Zijlstra
2018-09-26 9:58 ` Jan H. Schönherr
2018-09-27 18:36 ` Subhra Mazumdar
2018-11-23 16:29 ` Frederic Weisbecker
2018-09-17 13:37 ` Peter Zijlstra
2018-09-26 9:35 ` Jan H. Schönherr
2018-09-18 14:40 ` Rik van Riel
2018-09-24 15:23 ` Jan H. Schönherr
2018-09-24 18:01 ` Rik van Riel
2018-09-18 0:33 ` Subhra Mazumdar
2018-09-18 11:44 ` Jan H. Schönherr
2018-09-19 21:53 ` Subhra Mazumdar
2018-09-24 15:43 ` Jan H. Schönherr
2018-09-27 18:12 ` Subhra Mazumdar
2018-10-04 13:29 ` Jon Masters
2018-10-17 2:09 ` Frederic Weisbecker
2018-10-19 11:40 ` Jan H. Schönherr
2018-10-19 14:52 ` Frederic Weisbecker
2018-10-19 15:16 ` Rik van Riel
2018-10-19 15:33 ` Frederic Weisbecker
2018-10-19 15:45 ` Rik van Riel
2018-10-19 19:07 ` Jan H. Schönherr
2018-10-19 0:26 ` Subhra Mazumdar
2018-10-26 23:44 ` Jan H. Schönherr
2018-10-29 22:52 ` Subhra Mazumdar
2018-10-26 23:05 ` Subhra Mazumdar
2018-10-27 0:07 ` Jan H. Schönherr