From: Bharata B Rao <bharata@linux.vnet.ibm.com>
To: Paul Turner <pjt@google.com>
Cc: linux-kernel@vger.kernel.org,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Dhaval Giani <dhaval.giani@gmail.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
	Srivatsa Vaddagiri <vatsa@in.ibm.com>,
	Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>,
	Ingo Molnar <mingo@elte.hu>, Pavel Emelyanov <xemul@openvz.org>
Subject: Re: [patch 00/15] CFS Bandwidth Control V5
Date: Thu, 24 Mar 2011 21:42:36 +0530
Message-ID: <20110324161236.GA3399@in.ibm.com>
In-Reply-To: <20110323030326.789836913@google.com>

On Tue, Mar 22, 2011 at 08:03:26PM -0700, Paul Turner wrote:
> Hi all,
> 
> Please find attached the latest version of bandwidth control for the normal
> scheduling class.  This revision has undergone fairly extensive changes since
> the previous version, based largely on the observation that many of the edge
> conditions requiring special-casing around update_curr() were a result of
> introducing side effects into that operation.  By introducing an interstitial
> state, in which we recognize that the runqueue is over bandwidth but do not
> mark it throttled until we can actually remove it from the CPU, we avoid the
> previous possible interactions with throttled entities, which eliminates some
> head-scratching corner cases.
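
If I read the new scheme right, the idea is roughly the following (the
struct, field and helper names below are hypothetical stand-ins for
illustration, not the actual patch code):

/*
 * Illustrative sketch only.  update_curr() merely notes that local quota
 * ran out; the actual throttle is applied later, when the group is coming
 * off the CPU, so update_curr() itself stays free of side effects.
 */
struct cfs_rq_sketch {
	long long runtime_remaining;	/* local quota left, in ns */
	int over_quota;			/* the interstitial state */
	int throttled;
};

/* Called from the update_curr() path: charge runtime, flag exhaustion. */
static void account_runtime(struct cfs_rq_sketch *cfs_rq, long long delta)
{
	cfs_rq->runtime_remaining -= delta;
	if (cfs_rq->runtime_remaining <= 0)
		cfs_rq->over_quota = 1;	/* noted, but not yet throttled */
}

/* Called only when the queue is actually being removed from the CPU. */
static void maybe_throttle(struct cfs_rq_sketch *cfs_rq)
{
	if (cfs_rq->over_quota && !cfs_rq->throttled)
		cfs_rq->throttled = 1;	/* real code dequeues the group here */
}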

I am occasionally seeing hard lockups; they are not always reproducible. This
particular one occurred when I had 1 task in a bandwidth-constrained parent
group and 10 tasks in its child group (which has infinite bandwidth) on a
16-CPU system.
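
For reference, the hierarchy was set up along these lines (an illustrative
sketch rather than my exact steps; the /cgroup mount point and the
cpu.cfs_period_us / cpu.cfs_quota_us knob names are assumptions based on
this series, and the quota values are examples):

/*
 * Hypothetical reproducer sketch: builds the two-level hierarchy
 * described above.  Assumes the cpu controller is mounted at /cgroup.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

static void write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		exit(1);
	}
	fputs(val, f);
	fclose(f);
}

int main(void)
{
	/* Parent: allowed 250ms of CPU time per 500ms period. */
	mkdir("/cgroup/parent", 0755);
	write_str("/cgroup/parent/cpu.cfs_period_us", "500000");
	write_str("/cgroup/parent/cpu.cfs_quota_us", "250000");

	/* Child: quota -1, i.e. unconstrained ("infinite" bandwidth). */
	mkdir("/cgroup/parent/child", 0755);
	write_str("/cgroup/parent/child/cpu.cfs_quota_us", "-1");

	/*
	 * The 1 + 10 CPU hogs are then attached by writing their PIDs
	 * to the respective groups' "tasks" files.
	 */
	return 0;
}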

Here is the log...

WARNING: at kernel/watchdog.c:226 watchdog_overflow_callback+0x98/0xc0()
Hardware name: System x3650 M2 -[794796Q]-
Watchdog detected hard LOCKUP on cpu 0
Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf xt_physdev ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 ext4 jbd2 dm_mirror dm_region_hash dm_log dm_mod kvm_intel kvm uinput matroxfb_base matroxfb_DAC1064 matroxfb_accel matroxfb_Ti3026 matroxfb_g450 g450_pll matroxfb_misc cdc_ether usbnet mii ses enclosure sg serio_raw pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ioatdma dca i7core_edac edac_core bnx2 ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif usb_storage pata_acpi ata_generic ata_piix megaraid_sas qla2xxx scsi_transport_fc scsi_tgt [last unloaded: microcode]
Pid: 0, comm: swapper Not tainted 2.6.38-tip #6
Call Trace:
 <NMI>  [<ffffffff8106558f>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff81065686>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff810d8158>] watchdog_overflow_callback+0x98/0xc0
 [<ffffffff8110fb39>] __perf_event_overflow+0x99/0x250
 [<ffffffff8110d2dd>] ? perf_event_update_userpage+0xbd/0x140
 [<ffffffff8110d220>] ? perf_event_update_userpage+0x0/0x140
 [<ffffffff81110234>] perf_event_overflow+0x14/0x20
 [<ffffffff8101eb66>] intel_pmu_handle_irq+0x306/0x560
 [<ffffffff8150e4c1>] ? hw_breakpoint_exceptions_notify+0x21/0x200
 [<ffffffff8150faf6>] ? kprobe_exceptions_notify+0x16/0x450
 [<ffffffff8150e6f0>] perf_event_nmi_handler+0x50/0xc0
 [<ffffffff81510aa4>] notifier_call_chain+0x94/0xd0
 [<ffffffff81510b4c>] __atomic_notifier_call_chain+0x6c/0xa0
 [<ffffffff81510ae0>] ? __atomic_notifier_call_chain+0x0/0xa0
 [<ffffffff81510b96>] atomic_notifier_call_chain+0x16/0x20
 [<ffffffff81510bce>] notify_die+0x2e/0x30
 [<ffffffff8150d89a>] do_nmi+0xda/0x2a0
 [<ffffffff8150d4e0>] nmi+0x20/0x39
 [<ffffffff8109f4a3>] ? register_lock_class+0xb3/0x550
 <<EOE>>  <IRQ>  [<ffffffff81013e73>] ? native_sched_clock+0x13/0x60
 [<ffffffff810131e9>] ? sched_clock+0x9/0x10
 [<ffffffff81090e0d>] ? sched_clock_cpu+0xcd/0x110
 [<ffffffff810a2348>] __lock_acquire+0x98/0x15c0
 [<ffffffff810a2628>] ? __lock_acquire+0x378/0x15c0
 [<ffffffff81013e73>] ? native_sched_clock+0x13/0x60
 [<ffffffff810131e9>] ? sched_clock+0x9/0x10
 [<ffffffff81049880>] ? tg_unthrottle_down+0x0/0x50
 [<ffffffff810a3928>] lock_acquire+0xb8/0x150
 [<ffffffff81059e9c>] ? distribute_cfs_bandwidth+0xfc/0x1d0
 [<ffffffff8150c146>] _raw_spin_lock+0x36/0x70
 [<ffffffff81059e9c>] ? distribute_cfs_bandwidth+0xfc/0x1d0
 [<ffffffff81059e9c>] distribute_cfs_bandwidth+0xfc/0x1d0
 [<ffffffff81059da0>] ? distribute_cfs_bandwidth+0x0/0x1d0
 [<ffffffff8105a0eb>] sched_cfs_period_timer+0x9b/0x100
 [<ffffffff8105a050>] ? sched_cfs_period_timer+0x0/0x100
 [<ffffffff8108e631>] __run_hrtimer+0x91/0x1f0
 [<ffffffff8108e9fa>] hrtimer_interrupt+0xda/0x250
 [<ffffffff8109a5d9>] tick_do_broadcast+0x49/0x90
 [<ffffffff8109a71c>] tick_handle_oneshot_broadcast+0xfc/0x140
 [<ffffffff8100ecae>] timer_interrupt+0x1e/0x30
 [<ffffffff810d8bcd>] handle_irq_event_percpu+0x5d/0x230
 [<ffffffff810d8e28>] handle_irq_event+0x58/0x80
 [<ffffffff810dbaae>] ? handle_edge_irq+0x1e/0xe0
 [<ffffffff810dbaff>] handle_edge_irq+0x6f/0xe0
 [<ffffffff8100e449>] handle_irq+0x49/0xa0
 [<ffffffff81516bed>] do_IRQ+0x5d/0xe0
 [<ffffffff8150ce53>] ret_from_intr+0x0/0x1a
 <EOI>  [<ffffffff8109dbbd>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff812dd074>] ? acpi_idle_enter_bm+0x242/0x27a
 [<ffffffff812dd06d>] ? acpi_idle_enter_bm+0x23b/0x27a
 [<ffffffff813ee532>] cpuidle_idle_call+0xc2/0x260
 [<ffffffff8100c07c>] cpu_idle+0xbc/0x110
 [<ffffffff814f0937>] rest_init+0xb7/0xc0
 [<ffffffff814f0880>] ? rest_init+0x0/0xc0
 [<ffffffff81dfffa2>] start_kernel+0x41c/0x427
 [<ffffffff81dff346>] x86_64_start_reservations+0x131/0x135
 [<ffffffff81dff44d>] x86_64_start_kernel+0x103/0x112
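
Note that the trace shows distribute_cfs_bandwidth() spinning on a runqueue
lock from sched_cfs_period_timer(), i.e. in hard-irq context. The shape, as
I infer it from the trace alone (a simplification, not the patch code), is
roughly the following, which would explain why lock contention here is
reported by the NMI watchdog as a hard lockup:

/*
 * Simplified, inferred shape of the path in the trace above: the period
 * timer walks the throttled groups and takes each owning runqueue's lock
 * to hand out fresh quota.  Spinning too long on a contended lock in
 * hard-irq context is exactly what the watchdog flags.
 */
struct rq_sketch {
	int lock;			/* stand-in for the rq spinlock */
	long long runtime_remaining;
};

static void distribute_sketch(struct rq_sketch **throttled, int n,
			      long long runtime)
{
	int i;

	for (i = 0; i < n; i++) {
		/* spin until the owning CPU drops its rq lock */
		while (__sync_lock_test_and_set(&throttled[i]->lock, 1))
			;		/* hard-irq context: watchdog ticks */
		throttled[i]->runtime_remaining += runtime;	/* refill */
		__sync_lock_release(&throttled[i]->lock);
	}
}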

