From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754070Ab2C0PCi (ORCPT ); Tue, 27 Mar 2012 11:02:38 -0400
Received: from mail-vx0-f174.google.com ([209.85.220.174]:41002
	"EHLO mail-vx0-f174.google.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751354Ab2C0PCf convert rfc822-to-8bit
	(ORCPT ); Tue, 27 Mar 2012 11:02:35 -0400
MIME-Version: 1.0
X-Originating-IP: [212.179.42.66]
In-Reply-To: <1332338318-5958-1-git-send-email-fweisbec@gmail.com>
References: <1332338318-5958-1-git-send-email-fweisbec@gmail.com>
Date: Tue, 27 Mar 2012 17:02:34 +0200
Message-ID:
Subject: Re: [RFC][PATCH 00/32] Nohz cpusets v2 (adaptive tickless kernel)
From: Gilad Ben-Yossef
To: Frederic Weisbecker
Cc: LKML, linaro-sched-sig@lists.linaro.org, Alessio Igor Bogani,
	Andrew Morton, Avi Kivity, Chris Metcalf, Christoph Lameter,
	Daniel Lezcano, Geoff Levand, Ingo Molnar, Max Krasnyansky,
	"Paul E. McKenney", Peter Zijlstra, Stephen Hemminger,
	Steven Rostedt, Sven-Thorsten Dietrich, Thomas Gleixner, Zen Lin
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8BIT
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 21, 2012 at 3:58 PM, Frederic Weisbecker wrote:
> Hi all,
>
> A summary of what this is about can be found here:
>   https://lkml.org/lkml/2011/8/15/245
>
> There are still a lot of things to handle. Especially about
> what is done by scheduler_tick() but we also need to:
>
> - completely handle cputime accounting (need to find every "reader"
>   of cputime and flush cputimes for all of them).
> - handle perf
> - handle irqtime finegrained accounting
> - handle ilb load balancing
> - etc...
>

I gave the new version a spin (x86 8 way VM) and it looks cool.
I did get the following warning once, but couldn't recreate it:

[ 31.812741] ------------[ cut here ]------------
[ 31.812741] WARNING: at /home/giladb/Workspace/linux/kernel/time/tick-sched.c:706 tick_nohz_account_ticks+0x7c/0x90()
[ 31.812741] Hardware name: Bochs
[ 31.812741] Modules linked in:
[ 31.812741] Pid: 1006, comm: sh Not tainted 3.3.0-rc7+ #167
[ 31.812741] Call Trace:
[ 31.812741] [] warn_slowpath_common+0x6d/0xa0
[ 31.812741] [] ? tick_nohz_account_ticks+0x7c/0x90
[ 31.812741] [] ? tick_nohz_account_ticks+0x7c/0x90
[ 31.812741] [] warn_slowpath_null+0x1d/0x20
[ 31.812741] [] tick_nohz_account_ticks+0x7c/0x90
[ 31.812741] [] tick_nohz_flush_current_times+0x3f/0x80
[ 31.812741] [] tick_nohz_restart_adaptive+0xd/0x30
[ 31.812741] [] tick_nohz_check_adaptive+0x3e/0x50
[ 31.812741] [] smp_cpuset_update_nohz_interrupt+0x20/0x30
[ 31.812741] [] cpuset_update_nohz_interrupt+0x2a/0x30
[ 31.812741] [] ? _raw_spin_unlock_irq+0xd/0x30
[ 31.812741] [] finish_task_switch+0x46/0xa0
[ 31.812741] [] __schedule+0x398/0x910
[ 31.812741] [] ? deactivate_slab+0x611/0x730
[ 31.812741] [] ? __find_get_block+0x97/0x1a0
[ 31.812741] [] ? cpumask_next_and+0x24/0xa0
[ 31.812741] [] ? get_parent_ip+0xb/0x40
[ 31.812741] [] schedule+0x30/0x50
[ 31.812741] [] schedule_hrtimeout_range_clock+0xf5/0x110
[ 31.812741] [] ? get_parent_ip+0xb/0x40
[ 31.812741] [] ? sub_preempt_count+0x7b/0xb0
[ 31.812741] [] ? _raw_spin_unlock_irqrestore+0x13/0x40
[ 31.812741] [] ? __wake_up+0x40/0x50
[ 31.812741] [] ? put_ldisc+0x3f/0xa0
[ 31.812741] [] schedule_hrtimeout_range+0x12/0x20
[ 31.812741] [] poll_schedule_timeout+0x39/0x60
[ 31.812741] [] do_sys_poll+0x400/0x490
[ 31.812741] [] ? cpuacct_charge+0x65/0x70
[ 31.812741] [] ? poll_freewait+0x70/0x70
[ 31.812741] [] ? __pollwait+0xd0/0xd0
[ 31.812741] [] ? __pollwait+0xd0/0xd0
[ 31.812741] [] ? native_sched_clock+0x33/0xe0
[ 31.812741] [] ? sched_clock_local+0xb2/0x190
[ 31.812741] [] ? cpuacct_charge+0x65/0x70
[ 31.812741] [] ? update_curr+0x1a6/0x2a0
[ 31.812741] [] ? sched_clock_cpu+0x139/0x190
[ 31.812741] [] ? sched_clock_local+0xb2/0x190
[ 31.812741] [] ? hrtimer_forward+0x163/0x1b0
[ 31.812741] [] ? ktime_get+0x62/0x100
[ 31.812741] [] ? lapic_next_event+0x16/0x20
[ 31.812741] [] ? clockevents_program_event+0xc2/0x170
[ 31.812741] [] ? tick_program_event+0x24/0x30
[ 31.812741] [] ? hrtimer_interrupt+0x1ad/0x2e0
[ 31.812741] [] ? rcu_pending+0x58/0x70
[ 31.812741] [] ? irq_exit+0x6d/0x80
[ 31.812741] [] ? smp_apic_timer_interrupt+0x53/0x90
[ 31.812741] [] ? avc_has_perm_noaudit+0xc8/0x360
[ 31.812741] [] ? apic_timer_interrupt+0x2a/0x30
[ 31.812741] [] ? tty_ioctl+0x47e/0xa30
[ 31.812741] [] ? inode_has_perm+0x36/0x50
[ 31.812741] [] ? file_has_perm+0xa8/0xb0
[ 31.812741] [] ? tty_check_change+0xe0/0xe0
[ 31.812741] [] ? do_vfs_ioctl+0x83/0x570
[ 31.812741] [] ? selinux_file_ioctl+0x56/0x110
[ 31.812741] [] sys_poll+0x54/0xb0
[ 31.812741] [] syscall_call+0x7/0xb
[ 31.812741] ---[ end trace 1d7d659b4aead681 ]---

With the two patches I'll attach in the next replies to this message,
I've been able to get a task running on an isolated CPU with 0 timer
interrupts.

In my case, I also had to disable the clocksource watchdog, but only
because the TSC is not stable on my VM. This is really not a
nohz/cpuset problem.

There is one source of interference with CPU isolation that this
causes, which is the cputime flush IPI. Every time you run a command in
the shell you get 3-4 IPIs sent to the nohz cpuset to flush the
cputimes so that thread group times get computed correctly. That's not
very nice :-)

I've tried disabling the IPI send, just to see how it goes, and as far
as I've been able to tell you get a bare-metal-like environment for
100% CPU-bound code, with no interrupts.

Of course, ps/top then show 0% CPU utilization for that task, since
without the IPI the time it spends on the CPU is not registered... that
is a small price to pay in my eyes for bare metal performance on Linux,
but what do I know?
:-)

Overall, way cool. Please keep it up!

Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker

gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru
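
For anyone wanting to reproduce the "0 timer interrupts" observation
in the message above, here is a rough sketch of one way to check it:
snapshot /proc/interrupts twice and diff the per-CPU counts of the
local timer row. This is not taken from the patch set. The helper
names (per_cpu_counts, count_delta, sample) are made up for
illustration, and the "LOC" row label is the x86 local timer interrupt
line, so other architectures will need a different label. Whether the
cputime flush IPI shows up as its own row is not something this sketch
assumes.

```python
import time


def per_cpu_counts(interrupts_text, label):
    """Return the per-CPU counts for one row of /proc/interrupts text."""
    lines = interrupts_text.splitlines()
    # The header row names one column per online CPU: CPU0 CPU1 ...
    ncpus = len(lines[0].split())
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == label + ":":
            # The first ncpus fields after the label are the counters;
            # anything after them is a free-form description.
            return [int(field) for field in fields[1:1 + ncpus]]
    raise KeyError(label)


def count_delta(before_text, after_text, label="LOC"):
    """Per-CPU increase of one interrupt row between two snapshots."""
    before = per_cpu_counts(before_text, label)
    after = per_cpu_counts(after_text, label)
    return [a - b for b, a in zip(before, after)]


def sample(seconds, path="/proc/interrupts"):
    """Snapshot the file, wait, snapshot again, return the LOC deltas."""
    with open(path) as f:
        before = f.read()
    time.sleep(seconds)
    with open(path) as f:
        after = f.read()
    return count_delta(before, after)
```

Calling sample(10) while the CPU-bound task runs gives one number per
online CPU, and the entry for the isolated CPU should stay at (or very
near) zero if the tick really is stopped there.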