* 2.6.29-rc8: Reported regressions 2.6.27 -> 2.6.28
@ 2009-03-14 19:11 ` Rafael J. Wysocki
  0 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:11 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Linux SCSI List, Network Development, Natalie Protasevich,
	Stable Kernel Team, Linux ACPI, Andrew Morton,
	Kernel Testers List, Linus Torvalds, Linux PM List

This message contains a list of some regressions introduced between 2.6.27 and
2.6.28 for which I know of no fixes in the mainline.  If any of them have
already been fixed, please let me know.

If you know of any other unresolved regressions introduced between 2.6.27
and 2.6.28, please let me know as well, and I'll add them to the list.
Also, please let me know if any of the entries below are invalid.

Each entry from the list will additionally be sent in an automatic reply to
this message, with CCs to the people involved in reporting and handling the
issue.


Listed regressions statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2009-03-14      156       19          16
  2009-03-03      153       21          16
  2009-02-24      154       27          23
  2009-02-15      152       30          26
  2009-02-04      149       33          30
  2009-01-20      144       30          27
  2009-01-11      139       33          30
  2008-12-21      120       19          17
  2008-12-13      111       14          13
  2008-12-07      106       20          17
  2008-12-04      106       29          21
  2008-11-22       93       25          15
  2008-11-16       89       32          18
  2008-11-09       73       40          27
  2008-11-02       55       41          29
  2008-10-25       26       25          20


Unresolved regressions
----------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12868
Subject		: iproute2 and regressing "ipv6: convert tunnels to net_device_ops"
Submitter	: Jan Engelhardt <jengelh@medozas.de>
Date		: 2009-03-09 14:46 (6 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1326c3d5a4b792a2b15877feb7fb691f8945d203
References	: http://marc.info/?l=linux-netdev&m=123660999632730&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12818
Subject		: iwlagn broken after suspend to RAM (iwlagn: MAC is in deep sleep!)
Submitter	: Stefan Seyfried <seife@suse.de>
Date		: 2009-03-04 08:32 (11 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12690
Subject		: DPMS (LCD powersave, poweroff) don't work
Submitter	: Antonin Kolisek <akolisek@linuxx.hyperlinx.cz>
Date		: 2009-02-11 09:40 (32 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12645
Subject		: DMI low-memory-protect quirk causes resume hang on Samsung NC10
Submitter	: Patrick Walton <pcwalton@cs.ucla.edu>
Date		: 2009-02-06 18:35 (37 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0af40a4b1050c050e62eb1dc30b82d5ab22bf221


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12634
Subject		: video distortion and lockup with i830 video chip and 2.6.28.3
Submitter	: Bob Raitz <pappy_mcfae@yahoo.com>
Date		: 2009-02-04 21:10 (39 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12619
Subject		: Regression 2.6.28 and last - boot failed
Submitter	: jan sonnek <ha2nny@gmail.com>
Date		: 2009-02-01 19:59 (42 days old)
References	: http://marc.info/?l=linux-kernel&m=123351836213969&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12500
Subject		: r8169: NETDEV WATCHDOG: eth0 (r8169): transmit timed out
Submitter	: Justin Piszcz <jpiszcz@lucidpixels.com>
Date		: 2009-01-13 21:19 (61 days old)
References	: http://marc.info/?l=linux-kernel&m=123188160811322&w=4
Handled-By	: Francois Romieu <romieu@fr.zoreil.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
Subject		: KVM guests stalling on 2.6.28 (bisected)
Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
Date		: 2009-01-17 03:37 (57 days old)
Handled-By	: Avi Kivity <avi@redhat.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12426
Subject		: TMDC Joystick no longer works in kernel 2.6.28
Submitter	: Andrew S. Johnson <andy@asjohnson.com>
Date		: 2009-01-10 21:53 (64 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6902c0bead4ce266226fc0c5b3828b850bdc884a
References	: http://marc.info/?l=linux-kernel&m=123162486415366&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12421
Subject		: GPF on 2.6.28 and 2.6.28-rc9-git3, e1000e and e1000 issues
Submitter	: Doug Bazarnic <doug@bazarnic.net>
Date		: 2009-01-09 21:26 (65 days old)
References	: http://marc.info/?l=linux-kernel&m=123153653120204&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12411
Subject		: 2.6.28: BUG in r8169
Submitter	: Andrey Vul <andrey.vul@gmail.com>
Date		: 2008-12-31 18:37 (74 days old)
References	: http://marc.info/?l=linux-kernel&m=123074869611409&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12404
Subject		: Oops in 2.6.28-rc9 and -rc8 -- mtrr issues / e1000e
Submitter	: Kernel <kernel@bazarnic.net>
Date		: 2008-12-22 9:37 (83 days old)
References	: http://marc.info/?l=linux-kernel&m=122993873320150&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12337
Subject		: ~100 extra wakeups reported by powertop
Submitter	: Alberto Gonzalez <luis6674@yahoo.com>
Date		: 2008-12-31 12:25 (74 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12209
Subject		: oldish top core dumps (in its meminfo() function)
Submitter	: Andreas Mohr <andi@lisas.de>
Date		: 2008-12-12 18:49 (93 days old)
References	: http://marc.info/?l=linux-kernel&m=122910784006472&w=4
		  http://marc.info/?l=linux-kernel&m=122907511319288&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12208
Subject		: uml is very slow on 2.6.28 host
Submitter	: Miklos Szeredi <miklos@szeredi.hu>
Date		: 2008-12-12 9:35 (93 days old)
References	: http://marc.info/?l=linux-kernel&m=122907463518593&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12061
Subject		: snd_hda_intel: power_save: sound cracks on powerdown
Submitter	: Jens Weibler <bugzilla-kernel@jensthebrain.de>
Date		: 2008-11-18 12:07 (117 days old)
Handled-By	: Takashi Iwai <tiwai@suse.de>


Regressions with patches
------------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12835
Subject		: Regression in backlight detection
Submitter	: Michael Spang <mspang@csclub.uwaterloo.ca>
Date		: 2009-02-24 5:41 (19 days old)
References	: http://marc.info/?l=linux-kernel&m=123545411502396&w=4
Handled-By	: Michael Spang <mspang@csclub.uwaterloo.ca>
Patch		: http://marc.info/?l=linux-kernel&m=123545411502396&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12798
Subject		: No wake up after suspend.
Submitter	: Michal Graczyk <zazulas@gmail.com>
Date		: 2009-03-01 15:30 (14 days old)
Handled-By	: Zhang Rui <rui.zhang@intel.com>
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=20402&action=view


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12612
Subject		: hard lockup when interrupting cdda2wav
Submitter	: Matthias Reichl <hias@horus.com>
Date		: 2009-01-28 16:41 (46 days old)
References	: http://marc.info/?l=linux-kernel&m=123316111415677&w=4
Handled-By	: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Patch		: http://marc.info/?l=linux-scsi&m=123371501613019&w=2


For details, please visit the bug entries and follow the links given in the
references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There is also a Bugzilla entry used for tracking the regressions introduced
between 2.6.27 and 2.6.28, both unresolved and resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=11808

Please let me know if there are any Bugzilla entries that should be added to
the list there.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 180+ messages in thread


* [Bug #12061] snd_hda_intel: power_save: sound cracks on powerdown
  2009-03-14 19:11 ` Rafael J. Wysocki
@ 2009-03-14 19:12   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:12 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Jens Weibler, Takashi Iwai

This message has been generated automatically as part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify whether it should
still be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12061
Subject		: snd_hda_intel: power_save: sound cracks on powerdown
Submitter	: Jens Weibler <bugzilla-kernel@jensthebrain.de>
Date		: 2008-11-18 12:07 (117 days old)
Handled-By	: Takashi Iwai <tiwai@suse.de>





* [Bug #12337] ~100 extra wakeups reported by powertop
  2009-03-14 19:11 ` Rafael J. Wysocki
@ 2009-03-14 19:20   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Alberto Gonzalez

This message has been generated automatically as part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify whether it should
still be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12337
Subject		: ~100 extra wakeups reported by powertop
Submitter	: Alberto Gonzalez <luis6674@yahoo.com>
Date		: 2008-12-31 12:25 (74 days old)




* [Bug #12411] 2.6.28: BUG in r8169
  2009-03-14 19:11 ` Rafael J. Wysocki
@ 2009-03-14 19:20   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Andrey Vul, Francois Romieu

This message has been generated automatically as part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify whether it should
still be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12411
Subject		: 2.6.28: BUG in r8169
Submitter	: Andrey Vul <andrey.vul@gmail.com>
Date		: 2008-12-31 18:37 (74 days old)
References	: http://marc.info/?l=linux-kernel&m=123074869611409&w=4




* [Bug #12208] uml is very slow on 2.6.28 host
  2009-03-14 19:11 ` Rafael J. Wysocki
@ 2009-03-14 19:20   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Miklos Szeredi

This message has been generated automatically as part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify whether it should
still be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12208
Subject		: uml is very slow on 2.6.28 host
Submitter	: Miklos Szeredi <miklos@szeredi.hu>
Date		: 2008-12-12 9:35 (93 days old)
References	: http://marc.info/?l=linux-kernel&m=122907463518593&w=4




* [Bug #12404] Oops in 2.6.28-rc9 and -rc8 -- mtrr issues / e1000e
  2009-03-14 19:11 ` Rafael J. Wysocki
@ 2009-03-14 19:20   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Kernel

This message has been generated automatically as part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify whether it should
still be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12404
Subject		: Oops in 2.6.28-rc9 and -rc8 -- mtrr issues / e1000e
Submitter	: Kernel <kernel@bazarnic.net>
Date		: 2008-12-22 9:37 (83 days old)
References	: http://marc.info/?l=linux-kernel&m=122993873320150&w=4




* [Bug #12209] oldish top core dumps (in its meminfo() function)
  2009-03-14 19:11 ` Rafael J. Wysocki
@ 2009-03-14 19:20   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Andreas Mohr

This message has been generated automatically as part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify whether it should
still be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12209
Subject		: oldish top core dumps (in its meminfo() function)
Submitter	: Andreas Mohr <andi@lisas.de>
Date		: 2008-12-12 18:49 (93 days old)
References	: http://marc.info/?l=linux-kernel&m=122910784006472&w=4
		  http://marc.info/?l=linux-kernel&m=122907511319288&w=4





* [Bug #12421] GPF on 2.6.28 and 2.6.28-rc9-git3, e1000e and e1000 issues
  2009-03-14 19:11 ` Rafael J. Wysocki
@ 2009-03-14 19:20   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Doug Bazarnic

This message has been generated automatically as part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify whether it should
still be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12421
Subject		: GPF on 2.6.28 and 2.6.28-rc9-git3, e1000e and e1000 issues
Submitter	: Doug Bazarnic <doug@bazarnic.net>
Date		: 2009-01-09 21:26 (65 days old)
References	: http://marc.info/?l=linux-kernel&m=123153653120204&w=4




* [Bug #12426] TMDC Joystick no longer works in kernel 2.6.28
  2009-03-14 19:11 ` Rafael J. Wysocki
@ 2009-03-14 19:20   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Andrew S. Johnson, Dmitry Torokhov

This message has been generated automatically as part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify whether it should
still be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12426
Subject		: TMDC Joystick no longer works in kernel 2.6.28
Submitter	: Andrew S. Johnson <andy@asjohnson.com>
Date		: 2009-01-10 21:53 (64 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6902c0bead4ce266226fc0c5b3828b850bdc884a
References	: http://marc.info/?l=linux-kernel&m=123162486415366&w=4




* [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-14 19:11 ` Rafael J. Wysocki
@ 2009-03-14 19:20   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Avi Kivity, Ingo Molnar, Kevin Shanahan,
	Mike Galbraith, Peter Zijlstra

This message has been generated automatically as part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify whether it should
still be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
Subject		: KVM guests stalling on 2.6.28 (bisected)
Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
Date		: 2009-01-17 03:37 (57 days old)
Handled-By	: Avi Kivity <avi@redhat.com>





* [Bug #12619] Regression 2.6.28 and last - boot failed
  2009-03-14 19:11 ` Rafael J. Wysocki
@ 2009-03-14 19:20   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, jan sonnek

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12619
Subject		: Regression 2.6.28 and last - boot failed
Submitter	: jan sonnek <ha2nny@gmail.com>
Date		: 2009-02-01 19:59 (42 days old)
References	: http://marc.info/?l=linux-kernel&m=123351836213969&w=4



^ permalink raw reply	[flat|nested] 180+ messages in thread

* [Bug #12612] hard lockup when interrupting cdda2wav
  2009-03-14 19:11 ` Rafael J. Wysocki
@ 2009-03-14 19:20   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, FUJITA Tomonori, Matthias Reichl

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12612
Subject		: hard lockup when interrupting cdda2wav
Submitter	: Matthias Reichl <hias@horus.com>
Date		: 2009-01-28 16:41 (46 days old)
References	: http://marc.info/?l=linux-kernel&m=123316111415677&w=4
Handled-By	: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Patch		: http://marc.info/?l=linux-scsi&m=123371501613019&w=2



^ permalink raw reply	[flat|nested] 180+ messages in thread

* [Bug #12500] r8169: NETDEV WATCHDOG: eth0 (r8169): transmit timed out
  2009-03-14 19:11 ` Rafael J. Wysocki
@ 2009-03-14 19:20   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Francois Romieu, Justin Piszcz

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12500
Subject		: r8169: NETDEV WATCHDOG: eth0 (r8169): transmit timed out
Submitter	: Justin Piszcz <jpiszcz@lucidpixels.com>
Date		: 2009-01-13 21:19 (61 days old)
References	: http://marc.info/?l=linux-kernel&m=123188160811322&w=4
Handled-By	: Francois Romieu <romieu@fr.zoreil.com>



^ permalink raw reply	[flat|nested] 180+ messages in thread

* [Bug #12645] DMI low-memory-protect quirk causes resume hang on Samsung NC10
  2009-03-14 19:11 ` Rafael J. Wysocki
@ 2009-03-14 19:20   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Ingo Molnar, Patrick Walton, Philipp Kohlbecher

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12645
Subject		: DMI low-memory-protect quirk causes resume hang on Samsung NC10
Submitter	: Patrick Walton <pcwalton@cs.ucla.edu>
Date		: 2009-02-06 18:35 (37 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0af40a4b1050c050e62eb1dc30b82d5ab22bf221



^ permalink raw reply	[flat|nested] 180+ messages in thread

* [Bug #12690] DPMS (LCD powersave, poweroff) don't work
  2009-03-14 19:11 ` Rafael J. Wysocki
@ 2009-03-14 19:20   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Antonin Kolisek

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12690
Subject		: DPMS (LCD powersave, poweroff) don't work
Submitter	: Antonin Kolisek <akolisek@linuxx.hyperlinx.cz>
Date		: 2009-02-11 09:40 (32 days old)



^ permalink raw reply	[flat|nested] 180+ messages in thread

* [Bug #12634] video distortion and lockup with i830 video chip and 2.6.28.3
  2009-03-14 19:11 ` Rafael J. Wysocki
@ 2009-03-14 19:20   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Bob Raitz

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12634
Subject		: video distortion and lockup with i830 video chip and 2.6.28.3
Submitter	: Bob Raitz <pappy_mcfae@yahoo.com>
Date		: 2009-02-04 21:10 (39 days old)



^ permalink raw reply	[flat|nested] 180+ messages in thread

* [Bug #12798] No wake up after suspend.
  2009-03-14 19:11 ` Rafael J. Wysocki
@ 2009-03-14 19:20   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Michal Graczyk, Zhang Rui

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12798
Subject		: No wake up after suspend.
Submitter	: Michal Graczyk <zazulas@gmail.com>
Date		: 2009-03-01 15:30 (14 days old)
Handled-By	: Zhang Rui <rui.zhang@intel.com>
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=20402&action=view



^ permalink raw reply	[flat|nested] 180+ messages in thread

* [Bug #12835] Regression in backlight detection
  2009-03-14 19:11 ` Rafael J. Wysocki
@ 2009-03-14 19:20   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Andi Kleen, Carlos Corbacho, Len Brown,
	Michael Spang, Thomas Renninger, Zhang Rui

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12835
Subject		: Regression in backlight detection
Submitter	: Michael Spang <mspang@csclub.uwaterloo.ca>
Date		: 2009-02-24 5:41 (19 days old)
References	: http://marc.info/?l=linux-kernel&m=123545411502396&w=4
Handled-By	: Michael Spang <mspang@csclub.uwaterloo.ca>
Patch		: http://marc.info/?l=linux-kernel&m=123545411502396&w=4



^ permalink raw reply	[flat|nested] 180+ messages in thread

* [Bug #12818] iwlagn broken after suspend to RAM (iwlagn: MAC is in deep sleep!)
  2009-03-14 19:11 ` Rafael J. Wysocki
@ 2009-03-14 19:20   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, John W. Linville, Reinette Chatre,
	Stefan Seyfried, Tomas Winkler, Zhu Yi, Zhu, Yi

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12818
Subject		: iwlagn broken after suspend to RAM (iwlagn: MAC is in deep sleep!)
Submitter	: Stefan Seyfried <seife@suse.de>
Date		: 2009-03-04 08:32 (11 days old)



^ permalink raw reply	[flat|nested] 180+ messages in thread

* [Bug #12868] iproute2 and regressing "ipv6: convert tunnels to net_device_ops"
  2009-03-14 19:11 ` Rafael J. Wysocki
                   ` (18 preceding siblings ...)
  (?)
@ 2009-03-14 19:20 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Jan Engelhardt, netdev, Stephen Hemminger

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12868
Subject		: iproute2 and regressing "ipv6: convert tunnels to net_device_ops"
Submitter	: Jan Engelhardt <jengelh@medozas.de>
Date		: 2009-03-09 14:46 (6 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1326c3d5a4b792a2b15877feb7fb691f8945d203
References	: http://marc.info/?l=linux-netdev&m=123660999632730&w=4



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-14 19:20   ` Rafael J. Wysocki
@ 2009-03-15  9:03     ` Kevin Shanahan
  -1 siblings, 0 replies; 180+ messages in thread
From: Kevin Shanahan @ 2009-03-15  9:03 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Avi Kivity,
	Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Sat, 2009-03-14 at 20:20 +0100, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.27 and 2.6.28.
> 
> The following bug entry is on the current list of known regressions
> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> be listed and let me know (either way).
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> Subject		: KVM guests stalling on 2.6.28 (bisected)
> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> Date		: 2009-01-17 03:37 (57 days old)
> Handled-By	: Avi Kivity <avi@redhat.com>

No further updates since the last reminder.
The bug should still be listed.

Cheers,
Kevin.



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-15  9:18       ` Avi Kivity
  0 siblings, 0 replies; 180+ messages in thread
From: Avi Kivity @ 2009-03-15  9:18 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

Kevin Shanahan wrote:
> On Sat, 2009-03-14 at 20:20 +0100, Rafael J. Wysocki wrote:
>   
>> This message has been generated automatically as a part of a report
>> of regressions introduced between 2.6.27 and 2.6.28.
>>
>> The following bug entry is on the current list of known regressions
>> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
>> be listed and let me know (either way).
>>
>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
>> Subject		: KVM guests stalling on 2.6.28 (bisected)
>> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
>> Date		: 2009-01-17 03:37 (57 days old)
>> Handled-By	: Avi Kivity <avi@redhat.com>
>>     
>
> No further updates since the last reminder.
> The bug should still be listed.
>   

I've looked at the traces but lack the skill to make any sense out of them.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-15  9:48         ` Ingo Molnar
  0 siblings, 0 replies; 180+ messages in thread
From: Ingo Molnar @ 2009-03-15  9:48 UTC (permalink / raw)
  To: Avi Kivity, Peter Zijlstra, Mike Galbraith
  Cc: Kevin Shanahan, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List


* Avi Kivity <avi@redhat.com> wrote:

> Kevin Shanahan wrote:
>> On Sat, 2009-03-14 at 20:20 +0100, Rafael J. Wysocki wrote:
>>   
>>> This message has been generated automatically as a part of a report
>>> of regressions introduced between 2.6.27 and 2.6.28.
>>>
>>> The following bug entry is on the current list of known regressions
>>> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
>>> be listed and let me know (either way).
>>>
>>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
>>> Subject		: KVM guests stalling on 2.6.28 (bisected)
>>> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
>>> Date		: 2009-01-17 03:37 (57 days old)
>>> Handled-By	: Avi Kivity <avi@redhat.com>
>>>     
>>
>> No further updates since the last reminder.
>> The bug should still be listed.
>>   
>
> I've looked at the traces but lack the skill to make any sense 
> out of them.

Do you have specific questions about them that we could answer?

	Ingo

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-15  9:56           ` Avi Kivity
  0 siblings, 0 replies; 180+ messages in thread
From: Avi Kivity @ 2009-03-15  9:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Mike Galbraith, Kevin Shanahan,
	Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List

Ingo Molnar wrote:
>> I've looked at the traces but lack the skill to make any sense 
>> out of them.
>>     
>
> Do you have specific questions about them that we could answer?
>   

A general question: what's going on?  I guess this will only be answered 
by me getting my hands dirty and understanding how ftrace works and how 
the output maps to what's happening.  I'll look at the docs for a while.

A specific question for now is how can I identify long latency within 
qemu here?  As far as I can tell all qemu latencies in trace6.txt are 
sub 100ms, which, while long, don't explain the guest stalling for many 
seconds.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 180+ messages in thread
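[Editorial note: the question above, whether any per-process latency in the trace is large enough to explain a multi-second guest stall, comes down to scanning the trace for worst-case latencies. The sketch below is not from the thread; it assumes a deliberately simplified log format of one "comm latency_us" pair per line, whereas the real ftrace output (trace6.txt) is richer and would need a proper parser.]

```python
# Sketch: find the worst-case latency per process in a simplified trace.
# Assumes each line looks like "<comm> <latency_us>", e.g. "qemu-kvm 83000".
# Real ftrace output has a richer format; this only illustrates the
# filtering step that separates sub-100ms noise from stall-sized latencies.

from collections import defaultdict

def worst_latencies(lines, threshold_us=100_000):
    """Return {comm: max_latency_us} for entries at or above threshold_us."""
    worst = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip malformed or irrelevant lines
        comm, lat = parts
        if not lat.isdigit():
            continue
        lat = int(lat)
        if lat >= threshold_us and lat > worst[comm]:
            worst[comm] = lat
    return dict(worst)

trace = [
    "qemu-kvm 83000",      # 83 ms: long, but under the 100 ms threshold
    "qemu-kvm 41000",
    "kblockd/0 2100000",   # 2.1 s: would explain a multi-second stall
]
print(worst_latencies(trace))  # -> {'kblockd/0': 2100000}
```

If nothing crosses the threshold on the host side, the stall must be accumulating somewhere the host scheduler trace cannot see, which is the conclusion the thread reaches next.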

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-15  9:56           ` Avi Kivity
  (?)
@ 2009-03-15 10:03           ` Ingo Molnar
  2009-03-15 10:13               ` Avi Kivity
  -1 siblings, 1 reply; 180+ messages in thread
From: Ingo Molnar @ 2009-03-15 10:03 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Peter Zijlstra, Mike Galbraith, Kevin Shanahan,
	Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List


* Avi Kivity <avi@redhat.com> wrote:

> Ingo Molnar wrote:
>>> I've looked at the traces but lack the skill to make any sense out of 
>>> them.
>>>     
>>
>> Do you have specific questions about them that we could answer?
>>   
>
> A general question: what's going on?  I guess this will only 
> be answered by me getting my hands dirty and understanding how 
> ftrace works and how the output maps to what's happening.  
> I'll look at the docs for a while.
>
> A specific question for now is how can I identify long latency 
> within qemu here?  As far as I can tell all qemu latencies in 
> trace6.txt are sub 100ms, which, while long, don't explain the 
> guest stalling for many seconds.

Exactly - that in turn means that there's no scheduler latency 
on the host/native kernel side - in turn it must be a KVM 
related latency. (If there was any host side scheduler wakeup or 
other type of latency we'd see it in the trace.)

The most useful trace would be a specific set of trace_printk() 
calls (available on the latest tracing tree), coupled with a 
hyper_trace_printk() which injects a trace entry from the guest 
side into the host kernel trace buffer. (== that would mean a 
hypercall that does a trace_printk().)

	Ingo

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-15 10:13               ` Avi Kivity
  0 siblings, 0 replies; 180+ messages in thread
From: Avi Kivity @ 2009-03-15 10:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Mike Galbraith, Kevin Shanahan,
	Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List

Ingo Molnar wrote:
>> A specific question for now is how can I identify long latency 
>> within qemu here?  As far as I can tell all qemu latencies in 
>> trace6.txt are sub 100ms, which, while long, don't explain the 
>> guest stalling for many seconds.
>>     
>
> Exactly - that in turn means that there's no scheduler latency 
> on the host/native kernel side - in turn it must be a KVM 
> related latency. (If there was any host side scheduler wakeup or 
> other type of latency we'd see it in the trace.)
>   

But if there's a missing wakeup (which is the likeliest candidate for 
the bug) then we would have seen high latencies, no?

Can you explain what the patch in question (14800984706) does?  Maybe 
that will give us a clue.

> The most useful trace would be a specific set of trace_printk() 
> calls (available on the latest tracing tree), coupled with a 
> hyper_trace_printk() which injects a trace entry from the guest 
> side into the host kernel trace buffer. (== that would mean a 
> hypercall that does a trace_printk().)

Yes, that would provide all the information.  Not sure if I would be up 
to decoding it, though.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-16  9:49       ` Avi Kivity
  0 siblings, 0 replies; 180+ messages in thread
From: Avi Kivity @ 2009-03-16  9:49 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

Kevin Shanahan wrote:
> On Sat, 2009-03-14 at 20:20 +0100, Rafael J. Wysocki wrote:
>   
>> This message has been generated automatically as a part of a report
>> of regressions introduced between 2.6.27 and 2.6.28.
>>
>> The following bug entry is on the current list of known regressions
>> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
>> be listed and let me know (either way).
>>
>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
>> Subject		: KVM guests stalling on 2.6.28 (bisected)
>> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
>> Date		: 2009-01-17 03:37 (57 days old)
>> Handled-By	: Avi Kivity <avi@redhat.com>
>>     
>
> No further updates since the last reminder.
> The bug should still be listed.
>
>   

Does the bug reproduce if you use the acpi_pm clocksource in the guests?


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-16 12:46         ` Kevin Shanahan
  0 siblings, 0 replies; 180+ messages in thread
From: Kevin Shanahan @ 2009-03-16 12:46 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Mon, 2009-03-16 at 11:49 +0200, Avi Kivity wrote:
> Kevin Shanahan wrote:
> > On Sat, 2009-03-14 at 20:20 +0100, Rafael J. Wysocki wrote:
> >   
> >> This message has been generated automatically as a part of a report
> >> of regressions introduced between 2.6.27 and 2.6.28.
> >>
> >> The following bug entry is on the current list of known regressions
> >> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> >> be listed and let me know (either way).
> >>
> >> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> >> Subject		: KVM guests stalling on 2.6.28 (bisected)
> >> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> >> Date		: 2009-01-17 03:37 (57 days old)
> >> Handled-By	: Avi Kivity <avi@redhat.com>
> >>     
> >
> > No further updates since the last reminder.
> > The bug should still be listed.   
> 
> Does the bug reproduce if you use the acpi_pm clocksource in the guests?

In the guest being pinged? Yes, it still happens.

hermes-old:~# cat /sys/devices/system/clocksource/clocksource0/available_clocksource 
kvm-clock acpi_pm jiffies tsc 
hermes-old:~# cat /sys/devices/system/clocksource/clocksource0/current_clocksource 
acpi_pm

kmshanah@flexo:~$ ping -c 600 hermes-old

--- hermes-old.wumi.org.au ping statistics ---
600 packets transmitted, 600 received, 0% packet loss, time 599439ms
rtt min/avg/max/mdev = 0.131/723.197/9941.884/1569.918 ms, pipe 10

I had to reconfigure the guest kernel to make that clocksource
available. The way I had the guest kernel configured before, it only had
tsc and jiffies clocksources available. Unstable TSC was detected, so it
has been using jiffies until now.

Here's another test, using kvm-clock as the guest's clocksource:

hermes-old:~# cat /sys/devices/system/clocksource/clocksource0/current_clocksource 
kvm-clock

kmshanah@flexo:~$ ping -c 600 hermes-old

--- hermes-old.wumi.org.au ping statistics ---
600 packets transmitted, 600 received, 0% packet loss, time 599295ms
rtt min/avg/max/mdev = 0.131/1116.170/30840.411/4171.905 ms, pipe 31
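(As an aside, the min/avg/max/mdev figures in these summary lines can be
reconstructed from the raw per-packet rtt samples; iputils ping computes mdev
as sqrt(mean(rtt^2) - mean(rtt)^2), i.e. the population standard deviation.
A small illustrative Python sketch, using made-up sample values rather than
the actual rtts from the runs above:)

```python
import math

def ping_summary(rtts_ms):
    """Reproduce ping's 'rtt min/avg/max/mdev' line from per-packet samples.

    mdev is computed the way iputils does it:
    sqrt(mean(rtt^2) - mean(rtt)^2), the population standard deviation.
    """
    n = len(rtts_ms)
    avg = sum(rtts_ms) / n
    mdev = math.sqrt(sum(r * r for r in rtts_ms) / n - avg * avg)
    return min(rtts_ms), avg, max(rtts_ms), mdev

# Hypothetical samples (not the actual rtts from the runs above)
rtts = [0.131, 2.5, 150.0, 9941.884]
lo, avg, hi, mdev = ping_summary(rtts)
print("rtt min/avg/max/mdev = %.3f/%.3f/%.3f/%.3f ms" % (lo, avg, hi, mdev))
```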

Regards,
Kevin.



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-16 20:07           ` Frederic Weisbecker
  0 siblings, 0 replies; 180+ messages in thread
From: Frederic Weisbecker @ 2009-03-16 20:07 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Mon, Mar 16, 2009 at 11:16:35PM +1030, Kevin Shanahan wrote:
> On Mon, 2009-03-16 at 11:49 +0200, Avi Kivity wrote:
> > Kevin Shanahan wrote:
> > > On Sat, 2009-03-14 at 20:20 +0100, Rafael J. Wysocki wrote:
> > >   
> > >> This message has been generated automatically as a part of a report
> > >> of regressions introduced between 2.6.27 and 2.6.28.
> > >>
> > >> The following bug entry is on the current list of known regressions
> > >> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> > >> be listed and let me know (either way).
> > >>
> > >> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> > >> Subject		: KVM guests stalling on 2.6.28 (bisected)
> > >> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> > >> Date		: 2009-01-17 03:37 (57 days old)
> > >> Handled-By	: Avi Kivity <avi@redhat.com>
> > >>     
> > >
> > > No further updates since the last reminder.
> > > The bug should still be listed.   
> > 
> > Does the bug reproduce if you use the acpi_pm clocksource in the guests?
> 
> In the guest being pinged? Yes, it still happens.


Hi Kevin,

I've looked a bit at your traces.
I think it's probably too wide to find something inside.
Latest -tip is provided with a new set of events tracing, meaning
that you will be able to produce function graph traces with various
sched events included.

Another thing, is it possible to reproduce it with only one ping?
Or testing periodic pings and keeping only the one that exceeds a relevant
latency threshold? I think we could write a script to do that.
It would make the trace much clearer.

Just wait a bit, I'm looking at which event could be relevant to enable
and I'll come back to you with a set of commands to test.

Frederic.
 
> hermes-old:~# cat /sys/devices/system/clocksource/clocksource0/available_clocksource 
> kvm-clock acpi_pm jiffies tsc 
> hermes-old:~# cat /sys/devices/system/clocksource/clocksource0/current_clocksource 
> acpi_pm
> 
> kmshanah@flexo:~$ ping -c 600 hermes-old
> 
> --- hermes-old.wumi.org.au ping statistics ---
> 600 packets transmitted, 600 received, 0% packet loss, time 599439ms
> rtt min/avg/max/mdev = 0.131/723.197/9941.884/1569.918 ms, pipe 10
> 
> I had to reconfigure the guest kernel to make that clocksource
> available. The way I had the guest kernel configured before, it only had
> tsc and jiffies clocksources available. Unstable TSC was detected, so it
> has been using jiffies until now.
> 
> Here's another test, using kvm-clock as the guest's clocksource:
> 
> hermes-old:~# cat /sys/devices/system/clocksource/clocksource0/current_clocksource 
> kvm-clock
> 
> kmshanah@flexo:~$ ping -c 600 hermes-old
> 
> --- hermes-old.wumi.org.au ping statistics ---
> 600 packets transmitted, 600 received, 0% packet loss, time 599295ms
> rtt min/avg/max/mdev = 0.131/1116.170/30840.411/4171.905 ms, pipe 31
> 
> Regards,
> Kevin.
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-16 20:07           ` Frederic Weisbecker
@ 2009-03-16 22:55             ` Kevin Shanahan
  -1 siblings, 0 replies; 180+ messages in thread
From: Kevin Shanahan @ 2009-03-16 22:55 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Mon, 2009-03-16 at 21:07 +0100, Frederic Weisbecker wrote:
> I've looked a bit at your traces.
> I think it's probably too wide to find something inside.
> Latest -tip is provided with a new set of events tracing, meaning
> that you will be able to produce function graph traces with various
> sched events included.
> 
> Another thing, is it possible to reproduce it with only one ping?
> Or testing periodic pings and keeping only the one that exceeds a relevant
> latency threshold? I think we could write a script to do that.
> It would make the trace much clearer.

Yeah, I think that should be possible. If you can come up with such a
script, that would be great.

> Just wait a bit, I'm looking at which event could be relevant to enable
> and I'll come back to you with a set of commands to test.

Excellent! Thanks for looking into this.

Cheers,
Kevin.



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12612] hard lockup when interrupting cdda2wav
  2009-03-14 19:20   ` Rafael J. Wysocki
@ 2009-03-17  0:53     ` FUJITA Tomonori
  -1 siblings, 0 replies; 180+ messages in thread
From: FUJITA Tomonori @ 2009-03-17  0:53 UTC (permalink / raw)
  To: rjw; +Cc: linux-kernel, kernel-testers, fujita.tomonori, hias, James.Bottomley

On Sat, 14 Mar 2009 20:20:17 +0100 (CET)
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.27 and 2.6.28.
> 
> The following bug entry is on the current list of known regressions
> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> be listed and let me know (either way).
> 
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12612
> Subject		: hard lockup when interrupting cdda2wav
> Submitter	: Matthias Reichl <hias@horus.com>
> Date		: 2009-01-28 16:41 (46 days old)
> References	: http://marc.info/?l=linux-kernel&m=123316111415677&w=4
> Handled-By	: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> Patch		: http://marc.info/?l=linux-scsi&m=123371501613019&w=2

This still should be listed. I think that the fix (in James'
scsi-misc) will be merged to 2.6.30-rc1 then be backported.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12612] hard lockup when interrupting cdda2wav
@ 2009-03-17 14:52       ` James Bottomley
  0 siblings, 0 replies; 180+ messages in thread
From: James Bottomley @ 2009-03-17 14:52 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: rjw, linux-kernel, kernel-testers, hias

On Tue, 2009-03-17 at 09:53 +0900, FUJITA Tomonori wrote:
> On Sat, 14 Mar 2009 20:20:17 +0100 (CET)
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.27 and 2.6.28.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> > be listed and let me know (either way).
> > 
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12612
> > Subject		: hard lockup when interrupting cdda2wav
> > Submitter	: Matthias Reichl <hias@horus.com>
> > Date		: 2009-01-28 16:41 (46 days old)
> > References	: http://marc.info/?l=linux-kernel&m=123316111415677&w=4
> > Handled-By	: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> > Patch		: http://marc.info/?l=linux-scsi&m=123371501613019&w=2
> 
> This still should be listed. I think that the fix (in James'
> scsi-misc) will be merged to 2.6.30-rc1 then be backported.

It hasn't shown any problems at all under test in -next ... hopefully
under a reasonable test pool.  I think we can move it across for current
bug fixes (crosses fingers).

James



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-18  0:20               ` Frederic Weisbecker
  0 siblings, 0 replies; 180+ messages in thread
From: Frederic Weisbecker @ 2009-03-18  0:20 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Tue, Mar 17, 2009 at 09:25:37AM +1030, Kevin Shanahan wrote:
> On Mon, 2009-03-16 at 21:07 +0100, Frederic Weisbecker wrote:
> > I've looked a bit at your traces.
> > I think it's probably too wide to find something inside.
> > Latest -tip is provided with a new set of events tracing, meaning
> > that you will be able to produce function graph traces with various
> > sched events included.
> > 
> > Another thing, is it possible to reproduce it with only one ping?
> > Or testing periodic pings and keeping only the one that exceeds a relevant
> > latency threshold? I think we could write a script to do that.
> > It would make the trace much clearer.
> 
> Yeah, I think that should be possible. If you can come up with such a
> script, that would be great.

Ok, I've made a small script based on yours which should do the job.
You will just have to set the latency threshold that you consider
buggy. I don't remember the latency you observed.
About 5 secs, right?

It's the "thres" variable in the script.

The resulting trace should be a mix of the function graph trace
and scheduler events, which look like this:

 gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
  xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
  xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
            Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>

+ is a wakeup and ==> is a context switch.
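(For what it's worth, a wakeup-to-switch-in latency can be extracted
mechanically from lines like these by pairing each '+' event with the next
'==>' targeting the same pid. A rough Python sketch; the field layout is
assumed from the sample above and may differ between tracer versions:)

```python
import re

# comm-pid  [cpu]  timestamp:  pid:prio:state (+|==>) [cpu]  pid:prio:state comm
EVENT = re.compile(
    r"\[\d+\]\s+(?P<ts>\d+\.\d+):\s+\d+:\d+:\S+\s+(?P<ev>\+|==>)\s+"
    r"\[\d+\]\s+(?P<target>\d+):\d+:")

def wakeup_latencies(lines):
    """Pair each wakeup ('+') with the next switch-in ('==>') of the same
    target pid; return (pid, latency-in-seconds) tuples."""
    pending = {}   # target pid -> timestamp of its wakeup
    result = []
    for line in lines:
        m = EVENT.search(line)
        if not m:
            continue
        ts, pid = float(m.group("ts")), int(m.group("target"))
        if m.group("ev") == "+":
            pending[pid] = ts
        elif pid in pending:
            result.append((pid, ts - pending.pop(pid)))
    return result

trace = [
    " gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>",
    "  xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg",
    "  xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg",
]
for pid, lat in wakeup_latencies(trace):
    print("pid %d woken -> running in %.0f us" % (pid, lat * 1e6))
```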


The script will loop running pings and will only keep the trace whose
latency exceeds the threshold you defined.

Tell me if the following script works for you.

You will need to pull the latest -tip tree and enable the following:

CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_SCHED_TRACER=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_EVENT_TRACER=y

Thanks!

Ah, and you will need python too (since bash can't do floating point
operations, the script uses python here).

#!/bin/bash

# Switch off all CPUs except for one to simplify the trace
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu3/online


# Make sure debugfs has been mounted
if [ ! -d /sys/kernel/debug/tracing ]; then
    mount -t debugfs debugfs /sys/kernel/debug
fi

# Set up the trace parameters
pushd /sys/kernel/debug/tracing || exit 1
echo 0 > tracing_enabled
echo function_graph > current_tracer
echo funcgraph-abstime > trace_options
echo funcgraph-proc    > trace_options

# Set here the kvm IP addr
addr=""

# Set here a threshold of latency in sec
thres="5"
found="False"
lat=0
prefix=/sys/kernel/debug/tracing

echo 1 > $prefix/events/sched/sched_wakeup/enable
echo 1 > $prefix/events/sched/sched_switch/enable

while [ "$found" != "True" ]
do
	# Flush the previous buffer
	echo nop > $prefix/current_tracer

	# Reset the function_graph tracer
	echo function_graph > $prefix/current_tracer

	echo 1 > $prefix/tracing_enabled
	lat=$(ping -c 1 "$addr" | grep rtt | grep -Eo "[0-9]+\.[0-9]+" | head -1)
	echo 0 > $prefix/tracing_enabled

	# Note: ping reports rtt in milliseconds, thres is in seconds
	found=$(python -c "print float('$lat'.strip()) / 1000 > $thres")
	sleep 0.01
done

echo 0 > $prefix/events/sched/sched_wakeup/enable
echo 0 > $prefix/events/sched/sched_switch/enable


echo "Found buggy latency: $lat"
echo "Please send the trace you will find on $prefix/trace"
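For reference, the rtt extraction in the loop above keeps the first float on
ping's "rtt" summary line; with -c 1 the min, avg and max are the same single
round-trip time, reported in milliseconds. A standalone check on an invented
summary line:

```shell
# Invented ping -c 1 summary line (min = avg = max for a single ping):
line="rtt min/avg/max/mdev = 2650.112/2650.112/2650.112/0.000 ms"
lat=$(echo "$line" | grep -Eo "[0-9]+\.[0-9]+" | head -n1)
echo "$lat"   # -> 2650.112
```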



> 
> > Just wait a bit, I'm looking at which event could be relevant to enable
> > and I come back to you with a set of commands to test.
> 
> Excellent! Thanks for looking into this.
> 
> Cheers,
> Kevin.
> 
> 


^ permalink raw reply	[flat|nested] 180+ messages in thread


* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-18  0:20               ` Frederic Weisbecker
@ 2009-03-18  1:16                 ` Kevin Shanahan
  -1 siblings, 0 replies; 180+ messages in thread
From: Kevin Shanahan @ 2009-03-18  1:16 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Wed, 2009-03-18 at 01:20 +0100, Frederic Weisbecker wrote:
> On Tue, Mar 17, 2009 at 09:25:37AM +1030, Kevin Shanahan wrote:
> > On Mon, 2009-03-16 at 21:07 +0100, Frederic Weisbecker wrote:
> > > I've looked a bit at your traces.
> > > I think it's probably too wide to find something inside.
> > > Latest -tip is provided with a new set of events tracing, meaning
> > > that you will be able to produce function graph traces with various
> > > sched events included.
> > > 
> > > Another thing, is it possible to reproduce it with only one ping?
> > > Or testing perioding pings and keep only one that raised a relevant
> > > threshold of latency? I think we could do a script that can do that.
> > > It would make the trace much clearer.
> > 
> > Yeah, I think that should be possible. If you can come up with such a
> > script, that would be great.
> 
> Ok, I've made a small script based on yours which could do this job.
> You will just have to set yourself a threshold of latency
> that you consider as buggy. I don't remember the latency you observed.
> About 5 secs right?
> 
> It's the "thres" variable in the script.
> 
> The resulting trace should be a mixup of the function graph traces
> and scheduler events which look like this:
> 
>  gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
>   xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
>   xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
>             Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>
> 
> + is a wakeup and ==> is a context switch.
> 
> 
> The script will loop trying some pings and will only keep the trace that matches
> the latency threshold you defined.
> 
> Tell if the following script work for you.

Yes, this looks like it will work as intended.

One thing I was thinking about, though: would we need to look for a
trace that consists of a fast ping followed by a slow ping? If we only
keep the trace data from when we experience the slow ping, the guest
will have already "stalled" before the trace started, so the trace won't
contain any information about how we got into that state. Is
that information going to be important, or is it enough to just see what
it does to get out of the stalled state?

Either way, I'll try to get some results in my maintenance window
tonight.

Cheers,
Kevin.

> You will need to pull the latest -tip tree and enable the following:
> 
> CONFIG_FUNCTION_TRACER=y
> CONFIG_FUNCTION_GRAPH_TRACER=y
> CONFIG_DYNAMIC_FTRACE=y
> CONFIG_SCHED_TRACER=y
> CONFIG_CONTEXT_SWITCH_TRACER=y
> CONFIG_EVENT_TRACER=y
> 
> Thanks!
> 
> Ah and you will need python too (since bash can't do floating point
> operation, it uses python here).
> 
> #!/bin/bash
> 
> # Switch off all CPUs except for one to simplify the trace
> echo 0 > /sys/devices/system/cpu/cpu1/online
> echo 0 > /sys/devices/system/cpu/cpu2/online
> echo 0 > /sys/devices/system/cpu/cpu3/online
> 
> 
> # Make sure debugfs has been mounted
> if [ ! -d /sys/kernel/debug/tracing ]; then
>     mount -t debugfs debugfs /sys/kernel/debug
> fi
> 
> # Set up the trace parameters
> pushd /sys/kernel/debug/tracing || exit 1
> echo 0 > tracing_enabled
> echo function_graph > current_tracer
> echo funcgraph-abstime > trace_options
> echo funcgraph-proc    > trace_options
> 
> # Set here the kvm IP addr
> addr=""
> 
> # Set here a threshold of latency in sec
> thres="5"
> found="False"
> lat=0
> prefix=/sys/kernel/debug/tracing
> 
> echo 1 > $prefix/events/sched/sched_wakeup/enable
> echo 1 > $prefix/events/sched/sched_switch/enable
> 
> while [ "$found" != "True" ]
> do
> 	# Flush the previous buffer
> 	echo nop > $prefix/current_tracer
> 
> 	# Reset the function_graph tracer
> 	echo function_graph > $prefix/current_tracer
> 
> 	echo 1 > $prefix/tracing_enabled
> 	lat=$(ping -c 1 $addr | grep rtt | grep -Eo " [0-9]+.[0-9]+")
> 	echo 0 > $prefix/tracing_enabled
> 
> 	found=$(python -c "print float(str($lat).strip()) > $thres")
> 	sleep 0.01
> done
> 
> echo 0 > $prefix/events/sched/sched_wakeup/enable
> echo 0 > $prefix/events/sched/sched_switch/enable
> 
> 
> echo "Found buggy latency: $lat"
> echo "Please send the trace you will find on $prefix/trace"



^ permalink raw reply	[flat|nested] 180+ messages in thread


* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-18  2:24                   ` Frederic Weisbecker
  0 siblings, 0 replies; 180+ messages in thread
From: Frederic Weisbecker @ 2009-03-18  2:24 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Wed, Mar 18, 2009 at 11:46:26AM +1030, Kevin Shanahan wrote:
> On Wed, 2009-03-18 at 01:20 +0100, Frederic Weisbecker wrote:
> > On Tue, Mar 17, 2009 at 09:25:37AM +1030, Kevin Shanahan wrote:
> > > On Mon, 2009-03-16 at 21:07 +0100, Frederic Weisbecker wrote:
> > > > I've looked a bit at your traces.
> > > > I think it's probably too wide to find something inside.
> > > > Latest -tip is provided with a new set of events tracing, meaning
> > > > that you will be able to produce function graph traces with various
> > > > sched events included.
> > > > 
> > > > Another thing, is it possible to reproduce it with only one ping?
> > > > Or testing perioding pings and keep only one that raised a relevant
> > > > threshold of latency? I think we could do a script that can do that.
> > > > It would make the trace much clearer.
> > > 
> > > Yeah, I think that should be possible. If you can come up with such a
> > > script, that would be great.
> > 
> > Ok, I've made a small script based on yours which could do this job.
> > You will just have to set yourself a threshold of latency
> > that you consider as buggy. I don't remember the latency you observed.
> > About 5 secs right?
> > 
> > It's the "thres" variable in the script.
> > 
> > The resulting trace should be a mixup of the function graph traces
> > and scheduler events which look like this:
> > 
> >  gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
> >   xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
> >   xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
> >             Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>
> > 
> > + is a wakeup and ==> is a context switch.
> > 
> > 
> > The script will loop trying some pings and will only keep the trace that matches
> > the latency threshold you defined.
> > 
> > Tell if the following script work for you.
> 
> Yes, this looks like it will work as intended.
> 
> One thing I was thinking about though - would we need to look for a
> trace that consists of a fast ping followed by a slow ping? If we only
> keep the trace data from when we experience the slow ping, the guest
> will have already "stalled" before the trace started, so the trace won't
> indicate any of the information about how we got into that state. Is
> that information going to be important, or is it enough to just see what
> it does to get out of the stalled state?


I don't know :-)
I fear the only thing we would see in a fast ping trace is kvm going
to sleep at the end. The real black box here is likely: what happens
when we try to wake up kvm, and why that takes so long.

Maybe by looking at a slow ping trace, we can follow the flow once
kvm is supposed to be woken up. At that stage, we can perhaps
follow both the scheduler and kvm activities. Hopefully after that
we can narrow the trace further by filtering on specific areas.

It will likely end up with some ftrace_printk() calls (placing
specific trace messages at chosen locations)...


 
> Either way, I'll try to get some results in my maintenance window
> tonight.
>
> Cheers,
> Kevin.
> 
> > You will need to pull the latest -tip tree and enable the following:
> > 
> > CONFIG_FUNCTION_TRACER=y
> > CONFIG_FUNCTION_GRAPH_TRACER=y
> > CONFIG_DYNAMIC_FTRACE=y
> > CONFIG_SCHED_TRACER=y
> > CONFIG_CONTEXT_SWITCH_TRACER=y
> > CONFIG_EVENT_TRACER=y
> > 
> > Thanks!
> > 
> > Ah and you will need python too (since bash can't do floating point
> > operation, it uses python here).
> > 
> > #!/bin/bash
> > 
> > # Switch off all CPUs except for one to simplify the trace
> > echo 0 > /sys/devices/system/cpu/cpu1/online
> > echo 0 > /sys/devices/system/cpu/cpu2/online
> > echo 0 > /sys/devices/system/cpu/cpu3/online
> > 
> > 
> > # Make sure debugfs has been mounted
> > if [ ! -d /sys/kernel/debug/tracing ]; then
> >     mount -t debugfs debugfs /sys/kernel/debug
> > fi
> > 
> > # Set up the trace parameters
> > pushd /sys/kernel/debug/tracing || exit 1
> > echo 0 > tracing_enabled
> > echo function_graph > current_tracer
> > echo funcgraph-abstime > trace_options
> > echo funcgraph-proc    > trace_options
> > 
> > # Set here the kvm IP addr
> > addr=""
> > 
> > # Set here a threshold of latency in sec
> > thres="5"
> > found="False"
> > lat=0
> > prefix=/sys/kernel/debug/tracing
> > 
> > echo 1 > $prefix/events/sched/sched_wakeup/enable
> > echo 1 > $prefix/events/sched/sched_switch/enable
> > 
> > while [ "$found" != "True" ]
> > do
> > 	# Flush the previous buffer
> > 	echo nop > $prefix/current_tracer
> > 
> > 	# Reset the function_graph tracer
> > 	echo function_graph > $prefix/current_tracer
> > 
> > 	echo 1 > $prefix/tracing_enabled
> > 	lat=$(ping -c 1 $addr | grep rtt | grep -Eo " [0-9]+.[0-9]+")
> > 	echo 0 > $prefix/tracing_enabled
> > 
> > 	found=$(python -c "print float(str($lat).strip()) > $thres")
> > 	sleep 0.01
> > done
> > 
> > echo 0 > $prefix/events/sched/sched_wakeup/enable
> > echo 0 > $prefix/events/sched/sched_switch/enable
> > 
> > 
> > echo "Found buggy latency: $lat"
> > echo "Please send the trace you will find on $prefix/trace"
> 
> 


^ permalink raw reply	[flat|nested] 180+ messages in thread


* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-18  1:16                 ` Kevin Shanahan
  (?)
  (?)
@ 2009-03-18 21:24                 ` Kevin Shanahan
  2009-03-21  5:00                     ` Kevin Shanahan
  -1 siblings, 1 reply; 180+ messages in thread
From: Kevin Shanahan @ 2009-03-18 21:24 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Wed, 2009-03-18 at 11:46 +1030, Kevin Shanahan wrote:
> On Wed, 2009-03-18 at 01:20 +0100, Frederic Weisbecker wrote:
> > Ok, I've made a small script based on yours which could do this job.
> > You will just have to set yourself a threshold of latency
> > that you consider as buggy. I don't remember the latency you observed.
> > About 5 secs right?
> > 
> > It's the "thres" variable in the script.
> > 
> > The resulting trace should be a mixup of the function graph traces
> > and scheduler events which look like this:
> > 
> >  gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
> >   xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
> >   xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
> >             Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>
> > 
> > + is a wakeup and ==> is a context switch.
> > 
> > The script will loop trying some pings and will only keep the trace that matches
> > the latency threshold you defined.
> > 
> > Tell if the following script work for you.

...

> Either way, I'll try to get some results in my maintenance window
> tonight.

Testing did not go so well. I compiled and booted
2.6.29-rc8-tip-02630-g93c4989, but had some problems with the system
load when I tried to start tracing: it shot up to around 16-20. I
started shutting down VMs to try to get it under control, but before I
got back to tracing again the machine disappeared off the network,
unresponsive to ping.

When I got in this morning, there was nothing on the console, nothing in
the logs to show what went wrong. I will try again, but my next chance
will probably be Saturday. Stay tuned.

Regards,
Kevin.



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-21  5:00                     ` Kevin Shanahan
  0 siblings, 0 replies; 180+ messages in thread
From: Kevin Shanahan @ 2009-03-21  5:00 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Thu, 2009-03-19 at 07:54 +1030, Kevin Shanahan wrote:
> On Wed, 2009-03-18 at 11:46 +1030, Kevin Shanahan wrote:
> > On Wed, 2009-03-18 at 01:20 +0100, Frederic Weisbecker wrote:
> > > Ok, I've made a small script based on yours which could do this job.
> > > You will just have to set yourself a threshold of latency
> > > that you consider as buggy. I don't remember the latency you observed.
> > > About 5 secs right?
> > > 
> > > It's the "thres" variable in the script.
> > > 
> > > The resulting trace should be a mixup of the function graph traces
> > > and scheduler events which look like this:
> > > 
> > >  gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
> > >   xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
> > >   xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
> > >             Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>
> > > 
> > > + is a wakeup and ==> is a context switch.
> > > 
> > > The script will loop trying some pings and will only keep the trace that matches
> > > the latency threshold you defined.
> > > 
> > > Tell if the following script work for you.
> 
> ...
> 
> > Either way, I'll try to get some results in my maintenance window
> > tonight.
> 
> Testing did not go so well. I compiled and booted
> 2.6.29-rc8-tip-02630-g93c4989, but had some problems with the system
> load when I tried to start tracing - it shot up to around 16-20 or so. I
> started shutting down VMs to try and get it under control, but before I
> got back to tracing again the machine disappeared off the network -
> unresponsive to ping.
> 
> When I got in this morning, there was nothing on the console, nothing in
> the logs to show what went wrong. I will try again, but my next chance
> will probably be Saturday. Stay tuned.

Okay, a new set of traces has been uploaded to:

  http://disenchant.net/tmp/bug-12465/trace-3/

These were done on the latest tip, which I pulled down this morning:
2.6.29-rc8-tip-02744-gd9937cb.

The system load was very high again when I first tried to trace with
several guests running, so I ended up having only the one guest running;
thankfully the bug was still reproducible that way.

Fingers crossed this set of traces is able to tell us something.

Regards,
Kevin.



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-21  5:00                     ` Kevin Shanahan
  0 siblings, 0 replies; 180+ messages in thread
From: Kevin Shanahan @ 2009-03-21  5:00 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Thu, 2009-03-19 at 07:54 +1030, Kevin Shanahan wrote:
> On Wed, 2009-03-18 at 11:46 +1030, Kevin Shanahan wrote:
> > On Wed, 2009-03-18 at 01:20 +0100, Frederic Weisbecker wrote:
> > > Ok, I've made a small script based on yours which could do this job.
> > > You will just have to set yourself a threshold of latency
> > > that you consider as buggy. I don't remember the latency you observed.
> > > About 5 secs right?
> > > 
> > > It's the "thres" variable in the script.
> > > 
> > > The resulting trace should be a mixup of the function graph traces
> > > and scheduler events which look like this:
> > > 
> > >  gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
> > >   xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
> > >   xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
> > >             Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>
> > > 
> > > + is a wakeup and ==> is a context switch.
> > > 
> > > The script will loop trying some pings and will only keep the trace that matches
> > > the latency threshold you defined.
> > > 
> > > Tell if the following script work for you.
> 
> ...
> 
> > Either way, I'll try to get some results in my maintenance window
> > tonight.
> 
> Testing did not go so well. I compiled and booted
> 2.6.29-rc8-tip-02630-g93c4989, but had some problems with the system
> load when I tried to start tracing - it shot up to around 16-20 or so. I
> started shutting down VMs to try and get it under control, but before I
> got back to tracing again the machine disappeared off the network -
> unresponsive to ping.
> 
> When I got in this morning, there was nothing on the console, nothing in
> the logs to show what went wrong. I will try again, but my next chance
> will probably be Saturday. Stay tuned.

Okay, new set of traces have been uploaded to:

  http://disenchant.net/tmp/bug-12465/trace-3/

These were done on the latest tip, which I pulled down this morning:
2.6.29-rc8-tip-02744-gd9937cb.

The system load was very high again when I first tried to trace with
several guests running, so I ended up having only the one guest running
and thankfully the bug was still reproducible that way.

Fingers crossed this set of traces is able to tell us something.

Regards,
Kevin.


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-21 14:08                       ` Frederic Weisbecker
  0 siblings, 0 replies; 180+ messages in thread
From: Frederic Weisbecker @ 2009-03-21 14:08 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Sat, Mar 21, 2009 at 03:30:39PM +1030, Kevin Shanahan wrote:
> On Thu, 2009-03-19 at 07:54 +1030, Kevin Shanahan wrote:
> > On Wed, 2009-03-18 at 11:46 +1030, Kevin Shanahan wrote:
> > > On Wed, 2009-03-18 at 01:20 +0100, Frederic Weisbecker wrote:
> > > > Ok, I've made a small script based on yours which could do this job.
> > > > You will just have to set yourself a threshold of latency
> > > > that you consider as buggy. I don't remember the latency you observed.
> > > > About 5 secs right?
> > > > 
> > > > It's the "thres" variable in the script.
> > > > 
> > > > The resulting trace should be a mixup of the function graph traces
> > > > and scheduler events which look like this:
> > > > 
> > > >  gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
> > > >   xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
> > > >   xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
> > > >             Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>
> > > > 
> > > > + is a wakeup and ==> is a context switch.
> > > > 
> > > > The script will loop trying some pings and will only keep the trace that matches
> > > > the latency threshold you defined.
> > > > 
> > > > Tell me if the following script works for you.
> > 
> > ...
> > 
> > > Either way, I'll try to get some results in my maintenance window
> > > tonight.
> > 
> > Testing did not go so well. I compiled and booted
> > 2.6.29-rc8-tip-02630-g93c4989, but had some problems with the system
> > load when I tried to start tracing - it shot up to around 16-20 or so. I
> > started shutting down VMs to try and get it under control, but before I
> > got back to tracing again the machine disappeared off the network -
> > unresponsive to ping.
> > 
> > When I got in this morning, there was nothing on the console, nothing in
> > the logs to show what went wrong. I will try again, but my next chance
> > will probably be Saturday. Stay tuned.
> 
> Okay, new set of traces have been uploaded to:
> 
>   http://disenchant.net/tmp/bug-12465/trace-3/
> 
> These were done on the latest tip, which I pulled down this morning:
> 2.6.29-rc8-tip-02744-gd9937cb.
> 
> The system load was very high again when I first tried to trace with
> several guests running, so I ended up having only the one guest running
> and thankfully the bug was still reproducible that way.
> 
> Fingers crossed this set of traces is able to tell us something.


Thanks a lot Kevin!

The traces indeed seem much clearer now.
Looking at the first trace, we begin with qemu answering the ping.
Roughly simplifying the trace, we have this:


Found buggy latency:  9297.585
Please send the trace you will find on /sys/kernel/debug/tracing/trace
# tracer: function_graph
#
#      TIME       CPU  TASK/PID        DURATION                  FUNCTION CALLS
#       |         |    |    |           |   |                     |   |   |   |

							/* answer the ping (socket write) */
 2668.130735 |   0)  qemu-sy-4048  |               |  sys_writev() {
 2668.130735 |   0)  qemu-sy-4048  |   0.361 us    |    fget_light();
 2668.130744 |   0)  qemu-sy-4048  |               |       netif_rx_ni() {
 2668.130744 |   0)  qemu-sy-4048  |               |         netif_rx() {
 2668.130763 |   0)  qemu-sy-4048  |               |           ipv4_conntrack_in() {
 2668.130764 |   0)  qemu-sy-4048  |               |             nf_conntrack_in() {
 2668.130764 |   0)  qemu-sy-4048  |   0.328 us    |               ipv4_get_l4proto();
 2668.130765 |   0)  qemu-sy-4048  |   0.310 us    |               __nf_ct_l4proto_find();
 2668.130776 |   0)  qemu-sy-4048  |               |                 icmp_packet() {
 2668.130804 |   0)  qemu-sy-4048  |               |                   netif_receive_skb() {
 2668.130804 |   0)  qemu-sy-4048  |               |                     ip_rcv() {
 2668.130824 |   0)  qemu-sy-4048  |               |                       raw_rcv() {
 2668.130824 |   0)  qemu-sy-4048  |   0.307 us    |                         skb_push();
 2668.130825 |   0)  qemu-sy-4048  |               |                           raw_rcv_skb() {
 2668.130832 |   0)  qemu-sy-4048  |               |                             __wake_up_common() {
 2668.130838 |   0)  qemu-sy-4048  |               |                               /* sched_wakeup: task ping:7420 [120] success=1 */
 2668.130839 |   0)  qemu-sy-4048  |   0.312 us    |                           }
                                                                              }
                                                                             }
                                                      [...]

							/* ping was waiting for this response and is now woken up */
 2668.130876 |   0)  qemu-sy-4048  |               |  schedule() {
 2668.130885 |   0)  qemu-sy-4048  |               |  /* sched_switch: task qemu-system-x86:4048 [120] ==> ping:7420 [120] */
 2668.130885 |   0)  qemu-sy-4048  |               |    runqueue_is_locked() {
 2668.130886 |   0)  qemu-sy-4048  |   0.399 us    |    __phys_addr();
 ------------------------------------------
 0)  qemu-sy-4048  =>   ping-7420   
 ------------------------------------------

 2668.130887 |   0)   ping-7420    |               |                  finish_task_switch() {
 2668.130887 |   0)   ping-7420    |               |                    perf_counter_task_sched_in() {
 2668.130888 |   0)   ping-7420    |   0.319 us    |                      _spin_lock();
 2668.130888 |   0)   ping-7420    |   0.959 us    |                    }
 2668.130889 |   0)   ping-7420    |   1.644 us    |                  }
 2668.130889 |   0)   ping-7420    | ! 298102.3 us |                }
 2668.130890 |   0)   ping-7420    |               |                del_timer_sync() {
 2668.130890 |   0)   ping-7420    |               |                  try_to_del_timer_sync() {
 2668.130890 |   0)   ping-7420    |               |                    lock_timer_base() {
 2668.130890 |   0)   ping-7420    |   0.328 us    |                      _spin_lock_irqsave();
 2668.130891 |   0)   ping-7420    |   0.946 us    |                    }
 2668.130891 |   0)   ping-7420    |   0.328 us    |                    _spin_unlock_irqrestore();
 2668.130892 |   0)   ping-7420    |   2.218 us    |                  }
 2668.130892 |   0)   ping-7420    |   2.847 us    |                }
 2668.130893 |   0)   ping-7420    | ! 298108.7 us |              }
 2668.130893 |   0)   ping-7420    |   0.340 us    |              finish_wait();
 2668.130894 |   0)   ping-7420    |   0.328 us    |              _spin_lock_irqsave();
 2668.130894 |   0)   ping-7420    |   0.324 us    |              _spin_unlock_irqrestore();



As you can see, we are in the middle of the dialog between ping and kvm.
It means that we have lost many trace entries. I think the ring buffer does
not have enough space allocated for these 9 seconds of processing.

Just wait a bit while I cook up a better script, or at least work out a
better number of entries to allocate to the ring buffer, and I'll get back to you.

But anyway, the scheduler switch and wakeup events help a lot.
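For reference, the buffer-size concern can be put in rough numbers. A sketch only: the per-entry size and event rate below are assumed for illustration, not measured on Kevin's machine; the real knob is `/sys/kernel/debug/tracing/buffer_size_kb`, which is a per-CPU size.

```shell
# Rough sizing of the ftrace ring buffer for a 9-second trace.
# Both figures below are guesses, for illustration only.
bytes_per_entry=40        # approx. size of one function_graph entry
events_per_sec=500000     # approx. function-graph events per second
seconds=9
kb_needed=$(( bytes_per_entry * events_per_sec * seconds / 1024 ))
echo "buffer_size_kb should be at least ${kb_needed} KB (~$(( kb_needed / 1024 )) MB) per CPU"
```

With these assumed figures the result is on the order of 170 MB per CPU, which is why an unfiltered 9-second trace cannot fit in a default-sized buffer.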

 
> Regards,
> Kevin.
> 
> 


^ permalink raw reply	[flat|nested] 180+ messages in thread


* ptrace performance (was: [Bug #12208] uml is very slow on 2.6.28 host)
  2009-03-14 19:20   ` Rafael J. Wysocki
@ 2009-03-21 14:44     ` Michael Riepe
  -1 siblings, 0 replies; 180+ messages in thread
From: Michael Riepe @ 2009-03-21 14:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Miklos Szeredi

Disclaimer: I'm not using UML, but these problems may be related.

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12208
> Subject		: uml is very slow on 2.6.28 host
> Submitter	: Miklos Szeredi <miklos@szeredi.hu>
> Date		: 2008-12-12 9:35 (93 days old)
> References	: http://marc.info/?l=linux-kernel&m=122907463518593&w=4

The other day I noticed a dramatic ptrace slowdown between 2.6.27 and
2.6.28.x (verified with 2.6.28.8). In particular, a command like

	dd if=/dev/zero of=/dev/null bs=1024k count=1024

will normally report a throughput in the GB/s range. On 2.6.27, this is
also true if you run

	strace -o /dev/null <dd command as above>

which is only a little slower. But if I do the same on 2.6.28.x, I get a
throughput of about 100 MB/s or less, i.e. less than 10%. I tried the
commands on three different machines (an Athlon64 3000+, a Core Duo
T2400 and an Atom 330), and they all behave similarly. The more system
calls a program uses, the worse the slowdown (try the dd command with
bs=16k and count=65536, for example - but don't hold your breath).

Interestingly, the CPUs are mostly idle while the command is executing
on 2.6.28.x, but there is a high (system) load on 2.6.27. Therefore, I
suspect that it's a scheduling or maybe timer problem that was
introduced between 2.6.27 and 2.6.28. I haven't had the time to check
the rc kernels yet; perhaps someone else can run a quick check to verify
that it's gone in the latest 2.6.29-rc.
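The claim that more syscalls mean a worse slowdown checks out with simple arithmetic (a sketch: dd issues roughly one read plus one write per block, ignoring startup syscalls):

```shell
# Approximate number of data syscalls dd makes in each configuration.
blocks_big=1024        # bs=1024k count=1024  (1 GiB total)
blocks_small=65536     # bs=16k   count=65536 (1 GiB total)
syscalls_big=$(( 2 * blocks_big ))
syscalls_small=$(( 2 * blocks_small ))
echo "bs=1024k: ~${syscalls_big} syscalls; bs=16k: ~${syscalls_small} syscalls"
# Same amount of data, ~64x more ptrace stops, hence the much worse slowdown.
```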

-- 
Michael "Tired" Riepe <michael.riepe@googlemail.com>
X-Tired: Each morning I get up I die a little

^ permalink raw reply	[flat|nested] 180+ messages in thread


* Re: ptrace performance (was: [Bug #12208] uml is very slow on 2.6.28 host)
@ 2009-03-21 15:22       ` Ingo Molnar
  0 siblings, 0 replies; 180+ messages in thread
From: Ingo Molnar @ 2009-03-21 15:22 UTC (permalink / raw)
  To: Michael Riepe
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Miklos Szeredi


* Michael Riepe <michael.riepe@googlemail.com> wrote:

> Disclaimer: I'm not using UML, but these problems may be related.
> 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12208
> > Subject		: uml is very slow on 2.6.28 host
> > Submitter	: Miklos Szeredi <miklos@szeredi.hu>
> > Date		: 2008-12-12 9:35 (93 days old)
> > References	: http://marc.info/?l=linux-kernel&m=122907463518593&w=4
> 
> The other day I noticed a dramatic ptrace slowdown between 2.6.27 and
> 2.6.28.x (verified with 2.6.28.8). In particular, a command like
> 
> 	dd if=/dev/zero of=/dev/null bs=1024k count=1024
> 
> will normally report a throughput in the GB/s range. On 2.6.27, this is
> also true if you run
> 
> 	strace -o /dev/null <dd command as above>
> 
> which is only a little slower. But if I do the same on 2.6.28.x, I 
> get a throughput of about 100 MB/s or less, i.e. less than 10%. I 
> tried the commands on three different machines (an Athlon64 3000+, 
> a Core Duo T2400 and an Atom 330), and they all behave similarly. 
> The more system calls a program uses, the worse the slowdown (try 
> the dd command with bs=16k and count=65536, for example - but 
> don't hold your breath).
> 
> Interestingly, the CPUs are mostly idle while the command is 
> executing on 2.6.28.x, but there is a high (system) load on 
> 2.6.27. Therefore, I suspect that it's a scheduling or maybe timer 
> problem that was introduced between 2.6.27 and 2.6.28. I haven't 
> had the time to check the rc kernels yet; perhaps someone else can 
> run a quick check to verify that it's gone in the latest 
> 2.6.29-rc.

that's almost certainly due to the wait_task_inactive() change. Does 
the patch below improve things?

	Ingo

diff --git a/kernel/sched.c b/kernel/sched.c
index 3e827b8..2d60f23 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2119,7 +2119,8 @@ unsigned long wait_task_inactive(struct task_struct *p, long match_state)
 		 * yield - it could be a while.
 		 */
 		if (unlikely(on_rq)) {
-			schedule_timeout_uninterruptible(1);
+			cpu_relax();
+			cond_resched();
 			continue;
 		}
 

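A back-of-envelope estimate of what the old `schedule_timeout_uninterruptible(1)` path costs, under assumptions not stated in the thread: HZ=250 (so one jiffy is about 4 ms), the one-jiffy sleep is hit on every ptrace stop, and strace stops the tracee at both syscall entry and exit.

```shell
# Estimated worst-case forced-sleep overhead for the strace'd 1 GiB dd run.
jiffy_ms=4                      # 1 jiffy at HZ=250 (assumed)
syscalls=$(( 2 * 1024 ))        # ~1 read + 1 write per 1 MiB block
stops=$(( 2 * syscalls ))       # ptrace stops at syscall entry and exit
overhead_ms=$(( stops * jiffy_ms ))
echo "~${overhead_ms} ms (~$(( overhead_ms / 1000 )) s) of sleeping per 1 GiB"
```

That is roughly 16 s of sleeping for a copy that should take well under a second, which is the same order of magnitude as the ~100 MB/s Michael reported; replacing the sleep with `cpu_relax()`/`cond_resched()` removes that forced latency.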
^ permalink raw reply related	[flat|nested] 180+ messages in thread


* Re: ptrace performance
  2009-03-21 15:22       ` Ingo Molnar
  (?)
@ 2009-03-21 17:02       ` Michael Riepe
  -1 siblings, 0 replies; 180+ messages in thread
From: Michael Riepe @ 2009-03-21 17:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Miklos Szeredi



Ingo Molnar wrote:

> that's almost certainly due to the wait_task_inactive() change. Does 
> the patch below improve things?

That makes it *much* better - dd reports more than 3 GB/s on the Core
Duo. I'll have to check the other systems later; they're busy at the moment.

-- 
Michael "Tired" Riepe <michael.riepe@googlemail.com>
X-Tired: Each morning I get up I die a little

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-24 11:44                       ` Frederic Weisbecker
  0 siblings, 0 replies; 180+ messages in thread
From: Frederic Weisbecker @ 2009-03-24 11:44 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Sat, Mar 21, 2009 at 03:30:39PM +1030, Kevin Shanahan wrote:
> On Thu, 2009-03-19 at 07:54 +1030, Kevin Shanahan wrote:
> > On Wed, 2009-03-18 at 11:46 +1030, Kevin Shanahan wrote:
> > > On Wed, 2009-03-18 at 01:20 +0100, Frederic Weisbecker wrote:
> > > > Ok, I've made a small script based on yours which could do this job.
> > > > You will just have to set yourself a threshold of latency
> > > > that you consider as buggy. I don't remember the latency you observed.
> > > > About 5 secs right?
> > > > 
> > > > It's the "thres" variable in the script.
> > > > 
> > > > The resulting trace should be a mixup of the function graph traces
> > > > and scheduler events which look like this:
> > > > 
> > > >  gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
> > > >   xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
> > > >   xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
> > > >             Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>
> > > > 
> > > > + is a wakeup and ==> is a context switch.
> > > > 
> > > > The script will loop trying some pings and will only keep the trace that matches
> > > > the latency threshold you defined.
> > > > 
> > > > Tell me if the following script works for you.
> > 
> > ...
> > 
> > > Either way, I'll try to get some results in my maintenance window
> > > tonight.
> > 
> > Testing did not go so well. I compiled and booted
> > 2.6.29-rc8-tip-02630-g93c4989, but had some problems with the system
> > load when I tried to start tracing - it shot up to around 16-20 or so. I
> > started shutting down VMs to try and get it under control, but before I
> > got back to tracing again the machine disappeared off the network -
> > unresponsive to ping.
> > 
> > When I got in this morning, there was nothing on the console, nothing in
> > the logs to show what went wrong. I will try again, but my next chance
> > will probably be Saturday. Stay tuned.
> 
> Okay, new set of traces have been uploaded to:
> 
>   http://disenchant.net/tmp/bug-12465/trace-3/
> 
> These were done on the latest tip, which I pulled down this morning:
> 2.6.29-rc8-tip-02744-gd9937cb.
> 
> The system load was very high again when I first tried to trace with
> several guests running, so I ended up having only the one guest running
> and thankfully the bug was still reproducible that way.
> 
> Fingers crossed this set of traces is able to tell us something.
> 
> Regards,
> Kevin.
> 
> 

Sorry, I've been late to answer.
As I explained in my previous mail, your trace is only
a snapshot covering about 10 msec.

I experimented with different sizes for the ring buffer, but even
a 1-second trace requires about 20 MB of memory, and such a huge
trace would be impractical.

I think we should keep the trace filters we had previously.
If you don't mind, could you please retest the following updated
patch against latest -tip? I added the filters, fixed the python
subshell and also flushed the buffer more nicely using a recent
feature in -tip:

echo > trace 

instead of switching to nop.
You will need to pull latest -tip again.

Thanks a lot Kevin!


#!/bin/bash

# Switch off all CPUs except for one to simplify the trace
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu3/online


# Make sure debugfs has been mounted
if [ ! -d /sys/kernel/debug/tracing ]; then
    mount -t debugfs debugfs /sys/kernel/debug
fi

# Set up the trace parameters
pushd /sys/kernel/debug/tracing || exit 1
echo 0 > tracing_enabled
echo function_graph > current_tracer
echo funcgraph-abstime > trace_options
echo funcgraph-proc    > trace_options

# Set here the kvm IP addr
addr="hermes-old"

# Set here a threshold of latency in sec
thres="5000"
found="False"
lat=0
prefix=/sys/kernel/debug/tracing

echo 1 > $prefix/events/sched/sched_wakeup/enable
echo 1 > $prefix/events/sched/sched_switch/enable

# Set the filter for functions to trace
echo ''         > set_ftrace_filter  # clear filter functions
echo '*sched*' >> set_ftrace_filter 
echo '*wake*'  >> set_ftrace_filter
echo '*kvm*'   >> set_ftrace_filter

# Reset the function_graph tracer
echo function_graph > $prefix/current_tracer

while [ "$found" != "True" ]
do
        # Flush the previous buffer
        echo > $prefix/trace

        echo 1 > $prefix/tracing_enabled
        lat=$(ping -c 1 $addr | grep rtt | grep -Eo " [0-9]+.[0-9]+")
        echo 0 > $prefix/tracing_enabled

	echo $lat
	# Loop ends when the measured latency exceeds the threshold
	found=$(python -c "print float('$lat') > $thres")
        sleep 0.01
done

echo 0 > $prefix/events/sched/sched_wakeup/enable
echo 0 > $prefix/events/sched/sched_switch/enable


echo "Found buggy latency: $lat"
echo "Please send the trace you will find on $prefix/trace"
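As a side note, the loop above terminates only when the comparison yields the literal string "True". The same threshold check can be done without a python subshell, e.g. with awk (a sketch with illustrative values):

```shell
# Threshold check equivalent to the python one-liner in the script above;
# the values below are illustrative, not from a real trace run.
thres=5000
lat=9297.585
found=$(awk -v l="$lat" -v t="$thres" \
    'BEGIN { if (l + 0 > t + 0) print "True"; else print "False" }')
echo "$found"
```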



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-24 11:44                       ` Frederic Weisbecker
  0 siblings, 0 replies; 180+ messages in thread
From: Frederic Weisbecker @ 2009-03-24 11:44 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Sat, Mar 21, 2009 at 03:30:39PM +1030, Kevin Shanahan wrote:
> On Thu, 2009-03-19 at 07:54 +1030, Kevin Shanahan wrote:
> > On Wed, 2009-03-18 at 11:46 +1030, Kevin Shanahan wrote:
> > > On Wed, 2009-03-18 at 01:20 +0100, Frederic Weisbecker wrote:
> > > > Ok, I've made a small script based on yours which could do this job.
> > > > You will just have to set yourself a threshold of latency
> > > > that you consider as buggy. I don't remember the latency you observed.
> > > > About 5 secs right?
> > > > 
> > > > It's the "thres" variable in the script.
> > > > 
> > > > The resulting trace should be a mix of the function graph traces
> > > > and scheduler events which look like this:
> > > > 
> > > >  gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
> > > >   xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
> > > >   xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
> > > >             Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>
> > > > 
> > > > + is a wakeup and ==> is a context switch.
> > > > 
> > > > The script will loop trying some pings and will only keep the trace that matches
> > > > the latency threshold you defined.
> > > > 
> > > > Tell me if the following script works for you.
> > 
> > ...
> > 
> > > Either way, I'll try to get some results in my maintenance window
> > > tonight.
> > 
> > Testing did not go so well. I compiled and booted
> > 2.6.29-rc8-tip-02630-g93c4989, but had some problems with the system
> > load when I tried to start tracing - it shot up to around 16-20 or so. I
> > started shutting down VMs to try and get it under control, but before I
> > got back to tracing again the machine disappeared off the network -
> > unresponsive to ping.
> > 
> > When I got in this morning, there was nothing on the console, nothing in
> > the logs to show what went wrong. I will try again, but my next chance
> > will probably be Saturday. Stay tuned.
> 
> Okay, a new set of traces has been uploaded to:
> 
>   http://disenchant.net/tmp/bug-12465/trace-3/
> 
> These were done on the latest tip, which I pulled down this morning:
> 2.6.29-rc8-tip-02744-gd9937cb.
> 
> The system load was very high again when I first tried to trace with
> several guests running, so I ended up having only the one guest running,
> and thankfully the bug was still reproducible that way.
> 
> Fingers crossed this set of traces is able to tell us something.
> 
> Regards,
> Kevin.
> 
> 

Sorry for the late reply.
As I explained in my previous mail, your trace is only
a snapshot covering about 10 msec.

I experimented with different sizes for the ring buffer, but even
a 1-second trace requires 20 MB of memory, and such a huge trace
would be impractical.

I think we should keep the trace filters we had previously.
If you don't mind, could you please retest the following updated
patch against latest -tip? I added the filters, fixed the python
subshell and also flushed the buffer more nicely, using a recent
feature in -tip:

echo > trace 

instead of switching to nop.
You will need to pull latest -tip again.
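For context, the old and new ways of flushing the ring buffer look
roughly like this (a sketch; paths assume debugfs is mounted at
/sys/kernel/debug and the commands are run as root):

```shell
cd /sys/kernel/debug/tracing

# Old approach: bounce the tracer through nop, which drops the buffer
echo nop > current_tracer
echo function_graph > current_tracer

# New approach in -tip: a plain write to the trace file clears it
echo > trace
```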

Thanks a lot Kevin!


#!/bin/bash

# Switch off all CPUs except for one to simplify the trace
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu3/online


# Make sure debugfs has been mounted
if [ ! -d /sys/kernel/debug/tracing ]; then
    mount -t debugfs debugfs /sys/kernel/debug
fi

# Set up the trace parameters
pushd /sys/kernel/debug/tracing || exit 1
echo 0 > tracing_enabled
echo function_graph > current_tracer
echo funcgraph-abstime > trace_options
echo funcgraph-proc    > trace_options

# Set here the kvm IP addr
addr="hermes-old"

# Set here a threshold of latency in sec
thres="5000"
found="False"
lat=0
prefix=/sys/kernel/debug/tracing

echo 1 > $prefix/events/sched/sched_wakeup/enable
echo 1 > $prefix/events/sched/sched_switch/enable

# Set the filter for functions to trace
echo ''         > set_ftrace_filter  # clear filter functions
echo '*sched*' >> set_ftrace_filter 
echo '*wake*'  >> set_ftrace_filter
echo '*kvm*'   >> set_ftrace_filter

# Reset the function_graph tracer
echo function_graph > $prefix/current_tracer

while [ "$found" != "True" ]
do
        # Flush the previous buffer
        echo trace > $prefix/trace

        echo 1 > $prefix/tracing_enabled
        lat=$(ping -c 1 $addr | grep rtt | grep -Eo " [0-9]+.[0-9]+")
        echo 0 > $prefix/tracing_enabled

	echo $lat
	found=$(python -c "print float(str($lat).strip())")
        sleep 0.01
done

echo 0 > $prefix/events/sched/sched_wakeup/enable
echo 0 > $prefix/events/sched/sched_switch/enable


echo "Found buggy latency: $lat"
echo "Please send the trace you will find on $prefix/trace"



* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-24 11:44                       ` Frederic Weisbecker
@ 2009-03-24 11:47                         ` Frederic Weisbecker
  -1 siblings, 0 replies; 180+ messages in thread
From: Frederic Weisbecker @ 2009-03-24 11:47 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Tue, Mar 24, 2009 at 12:44:12PM +0100, Frederic Weisbecker wrote:
> On Sat, Mar 21, 2009 at 03:30:39PM +1030, Kevin Shanahan wrote:
> > On Thu, 2009-03-19 at 07:54 +1030, Kevin Shanahan wrote:
> > > On Wed, 2009-03-18 at 11:46 +1030, Kevin Shanahan wrote:
> > > > On Wed, 2009-03-18 at 01:20 +0100, Frederic Weisbecker wrote:
> > > > > Ok, I've made a small script based on yours which could do this job.
> > > > > You will just have to set yourself a threshold of latency
> > > > > that you consider as buggy. I don't remember the latency you observed.
> > > > > About 5 secs right?
> > > > > 
> > > > > It's the "thres" variable in the script.
> > > > > 
> > > > > The resulting trace should be a mix of the function graph traces
> > > > > and scheduler events which look like this:
> > > > > 
> > > > >  gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
> > > > >   xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
> > > > >   xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
> > > > >             Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>
> > > > > 
> > > > > + is a wakeup and ==> is a context switch.
> > > > > 
> > > > > The script will loop trying some pings and will only keep the trace that matches
> > > > > the latency threshold you defined.
> > > > > 
> > > > > Tell me if the following script works for you.
> > > 
> > > ...
> > > 
> > > > Either way, I'll try to get some results in my maintenance window
> > > > tonight.
> > > 
> > > Testing did not go so well. I compiled and booted
> > > 2.6.29-rc8-tip-02630-g93c4989, but had some problems with the system
> > > load when I tried to start tracing - it shot up to around 16-20 or so. I
> > > started shutting down VMs to try and get it under control, but before I
> > > got back to tracing again the machine disappeared off the network -
> > > unresponsive to ping.
> > > 
> > > When I got in this morning, there was nothing on the console, nothing in
> > > the logs to show what went wrong. I will try again, but my next chance
> > > will probably be Saturday. Stay tuned.
> > 
> > Okay, a new set of traces has been uploaded to:
> > 
> >   http://disenchant.net/tmp/bug-12465/trace-3/
> > 
> > These were done on the latest tip, which I pulled down this morning:
> > 2.6.29-rc8-tip-02744-gd9937cb.
> > 
> > The system load was very high again when I first tried to trace with
> > several guests running, so I ended up having only the one guest running,
> > and thankfully the bug was still reproducible that way.
> > 
> > Fingers crossed this set of traces is able to tell us something.
> > 
> > Regards,
> > Kevin.
> > 
> > 
> 
> Sorry for the late reply.
> As I explained in my previous mail, your trace is only
> a snapshot covering about 10 msec.
> 
> I experimented with different sizes for the ring buffer, but even
> a 1-second trace requires 20 MB of memory, and such a huge trace
> would be impractical.
> 
> I think we should keep the trace filters we had previously.
> If you don't mind, could you please retest the following updated
> patch against latest -tip? I added the filters, fixed the python
> subshell and also flushed the buffer more nicely, using a recent
> feature in -tip:
> 
> echo > trace 
> 
> instead of switching to nop.
> You will need to pull latest -tip again.
> 
> Thanks a lot Kevin!


Ah, you will also need to increase the size of your buffer.
See below:
 
> 
> #!/bin/bash
> 
> # Switch off all CPUs except for one to simplify the trace
> echo 0 > /sys/devices/system/cpu/cpu1/online
> echo 0 > /sys/devices/system/cpu/cpu2/online
> echo 0 > /sys/devices/system/cpu/cpu3/online
> 
> 
> # Make sure debugfs has been mounted
> if [ ! -d /sys/kernel/debug/tracing ]; then
>     mount -t debugfs debugfs /sys/kernel/debug
> fi
> 
> # Set up the trace parameters
> pushd /sys/kernel/debug/tracing || exit 1
> echo 0 > tracing_enabled
> echo function_graph > current_tracer
> echo funcgraph-abstime > trace_options
> echo funcgraph-proc    > trace_options
> 
> # Set here the kvm IP addr
> addr="hermes-old"
> 
> # Set here a threshold of latency in sec
> thres="5000"
> found="False"
> lat=0
> prefix=/sys/kernel/debug/tracing
> 
> echo 1 > $prefix/events/sched/sched_wakeup/enable
> echo 1 > $prefix/events/sched/sched_switch/enable
> 
> # Set the filter for functions to trace
> echo ''         > set_ftrace_filter  # clear filter functions
> echo '*sched*' >> set_ftrace_filter 
> echo '*wake*'  >> set_ftrace_filter
> echo '*kvm*'   >> set_ftrace_filter
> 
> # Reset the function_graph tracer
> echo function_graph > $prefix/current_tracer

Put a

echo 20000 > $prefix/buffer_size_kb

here, so that we will (hopefully) have enough space.

Thanks!

> 
> while [ "$found" != "True" ]
> do
>         # Flush the previous buffer
>         echo trace > $prefix/trace
> 
>         echo 1 > $prefix/tracing_enabled
>         lat=$(ping -c 1 $addr | grep rtt | grep -Eo " [0-9]+.[0-9]+")
>         echo 0 > $prefix/tracing_enabled
> 
> 	echo $lat
> 	found=$(python -c "print float(str($lat).strip())")
>         sleep 0.01
> done
> 
> echo 0 > $prefix/events/sched/sched_wakeup/enable
> echo 0 > $prefix/events/sched/sched_switch/enable
> 
> 
> echo "Found buggy latency: $lat"
> echo "Please send the trace you will find on $prefix/trace"
> 
> 




* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-24 11:44                       ` Frederic Weisbecker
  (?)
  (?)
@ 2009-03-25 23:40                       ` Kevin Shanahan
  2009-03-25 23:48                           ` Frederic Weisbecker
  -1 siblings, 1 reply; 180+ messages in thread
From: Kevin Shanahan @ 2009-03-25 23:40 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Tue, 2009-03-24 at 12:44 +0100, Frederic Weisbecker wrote:
> Sorry for the late reply.
> As I explained in my previous mail, your trace is only
> a snapshot covering about 10 msec.
> 
> I experimented with different sizes for the ring buffer, but even
> a 1-second trace requires 20 MB of memory, and such a huge trace
> would be impractical.
> 
> I think we should keep the trace filters we had previously.
> If you don't mind, could you please retest the following updated
> patch against latest -tip? I added the filters, fixed the python
> subshell and also flushed the buffer more nicely, using a recent
> feature in -tip:
> 
> echo > trace 
> 
> instead of switching to nop.
> You will need to pull latest -tip again.

Ok, thanks for that. I'll get a new -tip kernel ready to test tonight.
I'm not sure about the change to the python subshell though:

> while [ "$found" != "True" ]
> do
>         # Flush the previous buffer
>         echo trace > $prefix/trace
> 
>         echo 1 > $prefix/tracing_enabled
>         lat=$(ping -c 1 $addr | grep rtt | grep -Eo " [0-9]+.[0-9]+")
>         echo 0 > $prefix/tracing_enabled
> 
> 	echo $lat
> 	found=$(python -c "print float(str($lat).strip())")
>         sleep 0.01
> done

kmshanah@kulgan:~$ python -c "print float(str(1.234).strip())"
1.234

That's not going to evaluate to "True" at all, is it? What happened to
the test against the latency threshold value? Did you mean something
like this?

kmshanah@kulgan:~$ python -c "print float(str(1.234).strip()) > 5000"
False
kmshanah@kulgan:~$ python -c "print float(str(5001.234).strip()) > 5000"
True
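For what it's worth, the same threshold test can be sketched without
the Python dependency at all, e.g. with awk (a hypothetical variant;
the sample latency value below is just for illustration):

```shell
# Hypothetical awk-based replacement for the "found" check:
# prints True when the measured latency (in ms) exceeds the threshold.
thres=5000
lat=" 5001.234"   # sample value; the real script gets this from ping
found=$(echo "$lat" | awk -v t="$thres" '{ if ($1 > t + 0) print "True"; else print "False" }')
echo "$found"
```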

Cheers,
Kevin.




* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-25 23:40                       ` Kevin Shanahan
@ 2009-03-25 23:48                           ` Frederic Weisbecker
  0 siblings, 0 replies; 180+ messages in thread
From: Frederic Weisbecker @ 2009-03-25 23:48 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Thu, Mar 26, 2009 at 10:10:32AM +1030, Kevin Shanahan wrote:
> On Tue, 2009-03-24 at 12:44 +0100, Frederic Weisbecker wrote:
> > Sorry for the late reply.
> > As I explained in my previous mail, your trace is only
> > a snapshot covering about 10 msec.
> > 
> > I experimented with different sizes for the ring buffer, but even
> > a 1-second trace requires 20 MB of memory, and such a huge trace
> > would be impractical.
> > 
> > I think we should keep the trace filters we had previously.
> > If you don't mind, could you please retest the following updated
> > patch against latest -tip? I added the filters, fixed the python
> > subshell and also flushed the buffer more nicely, using a recent
> > feature in -tip:
> > 
> > echo > trace 
> > 
> > instead of switching to nop.
> > You will need to pull latest -tip again.
> 
> Ok, thanks for that. I'll get a new -tip kernel ready to test tonight.
> I'm not sure about the change to the python subshell though:
> 
> > while [ "$found" != "True" ]
> > do
> >         # Flush the previous buffer
> >         echo trace > $prefix/trace
> > 
> >         echo 1 > $prefix/tracing_enabled
> >         lat=$(ping -c 1 $addr | grep rtt | grep -Eo " [0-9]+.[0-9]+")
> >         echo 0 > $prefix/tracing_enabled
> > 
> > 	echo $lat
> > 	found=$(python -c "print float(str($lat).strip())")
> >         sleep 0.01
> > done
> 
> kmshanah@kulgan:~$ python -c "print float(str(1.234).strip())"
> 1.234
> 
> That's not going to evaluate to "True" at all, is it? What happened to
> the test against the latency threshold value? Did you mean something
> like this?
> 
> kmshanah@kulgan:~$ python -c "print float(str(1.234).strip()) > 5000"
> False
> kmshanah@kulgan:~$ python -c "print float(str(5001.234).strip()) > 5000"
> True


Sorry, I guess I was a bit asleep.
It's a mistake, so you can restore it to how it was.

Thanks.

 
> Cheers,
> Kevin.
> 
> 




* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-24 11:44                       ` Frederic Weisbecker
@ 2009-03-26 20:22                         ` Kevin Shanahan
  -1 siblings, 0 replies; 180+ messages in thread
From: Kevin Shanahan @ 2009-03-26 20:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Tue, 2009-03-24 at 12:44 +0100, Frederic Weisbecker wrote:
> As I explained in my previous mail, your trace is only
> a snapshot covering about 10 msec.
> 
> I experimented with different sizes for the ring buffer, but even
> a 1-second trace requires 20 MB of memory, and such a huge trace
> would be impractical.
> 
> I think we should keep the trace filters we had previously.
> If you don't mind, could you please retest the following updated
> patch against latest -tip? I added the filters, fixed the python
> subshell and also flushed the buffer more nicely, using a recent
> feature in -tip:
> 
> echo > trace 
> 
> instead of switching to nop.
> You will need to pull latest -tip again.

Ok, new set of traces uploaded again here:

  http://disenchant.net/tmp/bug-12465/trace-4/

These were taken using 2.6.29-tip-02749-g398bf09.

Same as last time, it was only necessary to have the one guest running
to reproduce the problem.

Cheers,
Kevin.





* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-21 17:07   ` Rafael J. Wysocki
@ 2009-03-21 19:50     ` Ingo Molnar
  -1 siblings, 0 replies; 180+ messages in thread
From: Ingo Molnar @ 2009-03-21 19:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Avi Kivity,
	Kevin Shanahan, Kevin Shanahan, Mike Galbraith, Peter Zijlstra


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.27 and 2.6.28.
> 
> The following bug entry is on the current list of known regressions
> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> be listed and let me know (either way).
> 
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> Subject		: KVM guests stalling on 2.6.28 (bisected)
> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> Date		: 2009-01-17 03:37 (64 days old)
> References	: http://lkml.org/lkml/2009/3/15/51
> Handled-By	: Avi Kivity <avi@redhat.com>

It's still being investigated.

	Ingo



* [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-21 17:01 2.6.29-rc8-git5: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
@ 2009-03-21 17:07   ` Rafael J. Wysocki
  0 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-21 17:07 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Avi Kivity, Ingo Molnar, Kevin Shanahan,
	Kevin Shanahan, Mike Galbraith, Peter Zijlstra

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
Subject		: KVM guests stalling on 2.6.28 (bisected)
Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
Date		: 2009-01-17 03:37 (64 days old)
References	: http://lkml.org/lkml/2009/3/15/51
Handled-By	: Avi Kivity <avi@redhat.com>





* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-08 10:04       ` Avi Kivity
  0 siblings, 0 replies; 180+ messages in thread
From: Avi Kivity @ 2009-03-08 10:04 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

Kevin Shanahan wrote:
> On Tue, 2009-03-03 at 20:41 +0100, Rafael J. Wysocki wrote:
>   
>> The following bug entry is on the current list of known regressions
>> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
>> be listed and let me know (either way).
>>
>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
>> Subject		: KVM guests stalling on 2.6.28 (bisected)
>> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
>> Date		: 2009-01-17 03:37 (46 days old)
>> Handled-By	: Avi Kivity <avi@redhat.com>
>>     
>
> Yes this should still be listed.
>
> The traces are there waiting to be looked at. If there's anything else I
> can do to help things along, please let me know.
>
>   

I was away on vacation; I'll try to look at the traces soon.  Help from 
the sched developers would be appreciated, though, as I doubt I have the 
skills to decipher them.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-03 19:41   ` Rafael J. Wysocki
@ 2009-03-04  3:08     ` Kevin Shanahan
  -1 siblings, 0 replies; 180+ messages in thread
From: Kevin Shanahan @ 2009-03-04  3:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Avi Kivity,
	Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Tue, 2009-03-03 at 20:41 +0100, Rafael J. Wysocki wrote:
> The following bug entry is on the current list of known regressions
> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> be listed and let me know (either way).
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> Subject		: KVM guests stalling on 2.6.28 (bisected)
> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> Date		: 2009-01-17 03:37 (46 days old)
> Handled-By	: Avi Kivity <avi@redhat.com>

Yes this should still be listed.

The traces are there waiting to be looked at. If there's anything else I
can do to help things along, please let me know.

Regards,
Kevin.



^ permalink raw reply	[flat|nested] 180+ messages in thread

* [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-03 19:34 2.6.29-rc6-git7: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
@ 2009-03-03 19:41   ` Rafael J. Wysocki
  0 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-03-03 19:41 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Avi Kivity, Ingo Molnar, Kevin Shanahan,
	Kevin Shanahan, Mike Galbraith, Peter Zijlstra

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
Subject		: KVM guests stalling on 2.6.28 (bisected)
Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
Date		: 2009-01-17 03:37 (46 days old)
Handled-By	: Avi Kivity <avi@redhat.com>



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-02-24 22:11         ` Kevin Shanahan
  0 siblings, 0 replies; 180+ messages in thread
From: Kevin Shanahan @ 2009-02-24 22:11 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Tue, 2009-02-24 at 14:09 +0200, Avi Kivity wrote:
> Kevin Shanahan wrote:
> > On Mon, 2009-02-23 at 23:03 +0100, Rafael J. Wysocki wrote:
> >   
> >> This message has been generated automatically as a part of a report
> >> of regressions introduced between 2.6.27 and 2.6.28.
> >>
> >> The following bug entry is on the current list of known regressions
> >> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> >> be listed and let me know (either way).
> >>     
> >
> > Yes, the problem should still be listed.
> > The bug is still present as recently as 2.6.29-rc5-00299-gadfafef.
> >   
> 
> Did tracing turn anything up?

I provided some more traces using Ingo's "tip" branch, but I don't think
anyone has looked at them yet.

  http://bugzilla.kernel.org/show_bug.cgi?id=12465#c11

I can provide more traces if e.g. a different set of functions is
required, but I'm not going to be able to analyse them properly myself.

I should have a bit more time for testing next week and I plan to try
setting up the guest with different virtual network adapter models
to see if that helps.

Cheers,
Kevin.



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-02-24 12:09       ` Avi Kivity
  0 siblings, 0 replies; 180+ messages in thread
From: Avi Kivity @ 2009-02-24 12:09 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

Kevin Shanahan wrote:
> On Mon, 2009-02-23 at 23:03 +0100, Rafael J. Wysocki wrote:
>   
>> This message has been generated automatically as a part of a report
>> of regressions introduced between 2.6.27 and 2.6.28.
>>
>> The following bug entry is on the current list of known regressions
>> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
>> be listed and let me know (either way).
>>     
>
> Yes, the problem should still be listed.
> The bug is still present as recently as 2.6.29-rc5-00299-gadfafef.
>   

Did tracing turn anything up?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-02-24  1:37       ` Rafael J. Wysocki
  0 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-02-24  1:37 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Linux Kernel Mailing List, Kernel Testers List, Ingo Molnar,
	Mike Galbraith, Peter Zijlstra

On Tuesday 24 February 2009, Kevin Shanahan wrote:
> On Mon, 2009-02-23 at 23:03 +0100, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.27 and 2.6.28.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> > be listed and let me know (either way).
> 
> Yes, the problem should still be listed.
> The bug is still present as recently as 2.6.29-rc5-00299-gadfafef.

Thanks for the update.

Rafael

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-02-23 22:03   ` Rafael J. Wysocki
@ 2009-02-24  0:59     ` Kevin Shanahan
  -1 siblings, 0 replies; 180+ messages in thread
From: Kevin Shanahan @ 2009-02-24  0:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Ingo Molnar,
	Mike Galbraith, Peter Zijlstra

On Mon, 2009-02-23 at 23:03 +0100, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.27 and 2.6.28.
> 
> The following bug entry is on the current list of known regressions
> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> be listed and let me know (either way).

Yes, the problem should still be listed.
The bug is still present as recently as 2.6.29-rc5-00299-gadfafef.

Regards,
Kevin.

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> Subject		: KVM guests stalling on 2.6.28 (bisected)
> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> Date		: 2009-01-17 03:37 (38 days old)



^ permalink raw reply	[flat|nested] 180+ messages in thread

* [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-02-23 22:00 2.6.29-rc6: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
@ 2009-02-23 22:03   ` Rafael J. Wysocki
  0 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-02-23 22:03 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Ingo Molnar, Kevin Shanahan, Kevin Shanahan,
	Mike Galbraith, Peter Zijlstra

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
Subject		: KVM guests stalling on 2.6.28 (bisected)
Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
Date		: 2009-01-17 03:37 (38 days old)



^ permalink raw reply	[flat|nested] 180+ messages in thread

* [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-02-14 20:48 2.6.29-rc5: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  0 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Ingo Molnar, Kevin Shanahan, Kevin Shanahan,
	Mike Galbraith, Peter Zijlstra

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
Subject		: KVM guests stalling on 2.6.28 (bisected)
Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
Date		: 2009-01-17 03:37 (29 days old)



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-02-05 22:37       ` Rafael J. Wysocki
  0 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-02-05 22:37 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Linux Kernel Mailing List, Kernel Testers List, Ingo Molnar,
	Mike Galbraith, Peter Zijlstra

On Thursday 05 February 2009, Kevin Shanahan wrote:
> On Wed, 2009-02-04 at 11:58 +0100, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.27 and 2.6.28.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> > be listed and let me know (either way).
> > 
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> > Subject		: KVM guests stalling on 2.6.28 (bisected)
> > Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> > Date		: 2009-01-17 03:37 (19 days old)
> 
> Yes, this should still be listed.

Thanks for the update.
 
> Please remove kmshanah@flexo.wumi.org.au from the CC list.

It gets added because it is present in the Author: field in
http://bugzilla.kernel.org/show_bug.cgi?id=12465#c5

This is how the script works, sorry for the inconvenience.

Rafael


> 
> Thanks,
> Kevin.
> 
> 
> 
> 


-- 
Everyone knows that debugging is twice as hard as writing a program
in the first place.  So if you're as clever as you can be when you write it,
how will you ever debug it? --- Brian Kernighan

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-02-04 10:58   ` Rafael J. Wysocki
@ 2009-02-05 19:35     ` Kevin Shanahan
  -1 siblings, 0 replies; 180+ messages in thread
From: Kevin Shanahan @ 2009-02-05 19:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Ingo Molnar,
	Mike Galbraith, Peter Zijlstra

On Wed, 2009-02-04 at 11:58 +0100, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.27 and 2.6.28.
> 
> The following bug entry is on the current list of known regressions
> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> be listed and let me know (either way).
> 
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> Subject		: KVM guests stalling on 2.6.28 (bisected)
> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> Date		: 2009-01-17 03:37 (19 days old)

Yes, this should still be listed.

Please remove kmshanah@flexo.wumi.org.au from the CC list.

Thanks,
Kevin.



^ permalink raw reply	[flat|nested] 180+ messages in thread

* [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-02-04 10:55 2.6.29-rc3-git6: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
@ 2009-02-04 10:58   ` Rafael J. Wysocki
  0 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-02-04 10:58 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Ingo Molnar, Kevin Shanahan, Kevin Shanahan,
	Mike Galbraith, Peter Zijlstra

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
Subject		: KVM guests stalling on 2.6.28 (bisected)
Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
Date		: 2009-01-17 03:37 (19 days old)



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-01-26 11:35                         ` Peter Zijlstra
@ 2009-01-26 15:00                             ` Ingo Molnar
  0 siblings, 0 replies; 180+ messages in thread
From: Ingo Molnar @ 2009-01-26 15:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kevin Shanahan, Avi Kivity, Steven Rostedt, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Frédéric Weisbecker, bugme-daemon


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> Is there a way to add a wall-time column to this output so that we can 
> see where the time goes?

yes, on tip/master:

  http://people.redhat.com/mingo/tip.git/README

do something like this:

 echo funcgraph-abstime > /debug/tracing/trace_options

when the function-graph plugin is active. This will activate the absolute 
timestamps column in the trace output.
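
A minimal sketch of the full sequence, assuming a kernel built with the
function-graph tracer and debugfs mounted at /sys/kernel/debug (the
/debug path above is a local mount choice; all of this needs root):

```shell
# Sketch: enable the function-graph tracer with absolute timestamps.
T=/sys/kernel/debug/tracing          # or /debug/tracing on this setup

echo function_graph > "$T/current_tracer"    # activate the plugin
echo funcgraph-abstime > "$T/trace_options"  # add the wall-time column
echo 1 > "$T/tracing_on"                     # start recording

sleep 1                                      # let it collect some data
head -n 20 "$T/trace"                        # inspect the result
echo 0 > "$T/tracing_on"                     # stop recording
```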

> Another nice thing to have would be ctx switches shown like:
> 
> foo-1 => bar-2 ran: ${time foo spend on the cpu} since: ${time bar spend away from the cpu}
> 
> I'll poke around a little at this function-graph tracer thingy to see if I 
> can do that.

indeed, tracking the 'scheduling atom duration' would be very nice.

	Ingo

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-01-26  9:55                         ` Kevin Shanahan
  (?)
@ 2009-01-26 11:35                         ` Peter Zijlstra
  2009-01-26 15:00                             ` Ingo Molnar
  -1 siblings, 1 reply; 180+ messages in thread
From: Peter Zijlstra @ 2009-01-26 11:35 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Ingo Molnar, Avi Kivity, Steven Rostedt, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Frédéric Weisbecker, bugme-daemon

On Mon, 2009-01-26 at 20:25 +1030, Kevin Shanahan wrote:

> Just carrying out the steps was okay, but I don't really know what I'm
> looking at. I've uploaded the trace here (about 10 seconds worth, I
> think):
> 
>   http://disenchant.net/tmp/bug-12465/trace-1/
> 
> The guest being pinged is process 4353:
> 
> kmshanah@flexo:~$ pstree -p 4353
> qemu-system-x86(4353)─┬─{qemu-system-x86}(4354)
>                       ├─{qemu-system-x86}(4355)
>                       └─{qemu-system-x86}(4772)
> 
> I guess the larger overhead/duration values are what we are looking for,
> e.g.:
> 
> kmshanah@flexo:~$ bzgrep -E '[[:digit:]]{6,}' trace.txt.bz2 
>  0)   ksoftir-4    | ! 3010470 us |  }
>  0)  qemu-sy-4354  | ! 250406.2 us |    }
>  0)  qemu-sy-4354  | ! 250407.0 us |  }
>  0)  qemu-sy-4354  | ! 362946.3 us |    }
>  0)  qemu-sy-4354  | ! 362947.0 us |  }
>  0)  qemu-sy-4177  | ! 780480.3 us |  }
>  0)  qemu-sy-4354  | ! 117685.7 us |    }
>  0)  qemu-sy-4354  | ! 117686.5 us |  }
> 
> That ksoftirqd value is a bit strange (> 3 seconds, or is the formatting
> wrong?). I guess I still need some guidance to know what I'm looking at
> with this trace and/or what to do next.

What happens is that it gets preempted a few times while running a
particular function, say do_softirqd(), or kvm_arch_vcpu_ioctl_run().

Now, when this function ends, it prints the wall-time delay between
start and end of that function, instead of the task-time delay.

So by having been preempted several times, that gets inflated.
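
The inflation can be sketched with made-up numbers (all timestamps below
are hypothetical, in microseconds):

```shell
# A function enters at t=0 and exits at t=10000 (wall clock), but the
# task is switched out twice in between: 1000-4500 and 6000-9500.
entry=0; t_exit=10000
wall=$(( t_exit - entry ))                    # what the tracer reports
off=$(( (4500 - 1000) + (9500 - 6000) ))      # time spent preempted
on_cpu=$(( wall - off ))                      # actual CPU time used
echo "reported duration: ${wall} us"          # 10000 us
echo "actual on-CPU time: ${on_cpu} us"       # 3000 us
```

So a 3 ms burst of CPU time can show up as a 10 ms "duration" if the
task is preempted often enough, which is why huge values in the trace
need not mean the function itself ran that long.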

That said, the output is slightly 'buggy' in that is seems to miss
context switches at times:

 0)  qemu-sy-4339  |               |        schedule() {
 0)  qemu-sy-4131  | ! 6750.369 us |        }

I also find it very hard to attribute all time:

 0)  qemu-sy-4354  |               |  kvm_vcpu_ioctl() {
 0)  qemu-sy-4354  |               |    kvm_arch_vcpu_ioctl_run() {
 0)  qemu-sy-4354  |               |      kvm_arch_vcpu_load() {
 0)  qemu-sy-4354  |               |        kvm_write_guest_time() {
 0)  qemu-sy-4354  |   0.289 us    |        }
 0)  qemu-sy-4354  |   0.956 us    |      }
 0)  qemu-sy-4354  |               |      kvm_inject_pending_timer_irqs() {
 0)  qemu-sy-4354  |               |        kvm_inject_apic_timer_irqs() {
 0)  qemu-sy-4354  |   0.295 us    |        }
 0)  qemu-sy-4354  |               |        kvm_inject_pit_timer_irqs() {
 0)  qemu-sy-4354  |   0.304 us    |        }
 0)  qemu-sy-4354  |   1.488 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_enabled() {
 0)  qemu-sy-4354  |   0.294 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_find_highest_irr() {
 0)  qemu-sy-4354  |   0.307 us    |      }
 0)  qemu-sy-4354  |               |      kvm_cpu_has_interrupt() {
 0)  qemu-sy-4354  |               |        kvm_apic_has_interrupt() {
 0)  qemu-sy-4354  |   0.325 us    |        }
 0)  qemu-sy-4354  |               |        kvm_apic_accept_pic_intr() {
 0)  qemu-sy-4354  |   0.298 us    |        }
 0)  qemu-sy-4354  |   1.521 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_sync_to_vapic() {
 0)  qemu-sy-4354  |   0.295 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          autoremove_wake_function() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |               |                  wakeup_preempt_entity() {
 0)  qemu-sy-4354  |   0.309 us    |                  }
 0)  qemu-sy-4354  |               |                  resched_task() {
 0)  qemu-sy-4354  |   0.324 us    |                  }
 0)  qemu-sy-4354  |   1.614 us    |                }
 0)  qemu-sy-4354  |   2.934 us    |              }
 0)  qemu-sy-4354  |   3.529 us    |            }
 0)  qemu-sy-4354  |   4.118 us    |          }
 0)  qemu-sy-4354  |   4.743 us    |        }
 0)  qemu-sy-4354  |   5.432 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          autoremove_wake_function() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  =>  qemu-sy-4294
 0)  qemu-sy-4237  =>  qemu-sy-4354
 0)  qemu-sy-4354  |   5.500 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.316 us    |                }
 0)  qemu-sy-4354  |   1.250 us    |              }
 0)  qemu-sy-4354  |   1.834 us    |            }
 0)  qemu-sy-4354  |   2.434 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.418 us    |              }
 0)  qemu-sy-4354  |   1.001 us    |            }
 0)  qemu-sy-4354  |   1.608 us    |          }
 0)  qemu-sy-4354  |   4.987 us    |        }
 0)  qemu-sy-4354  |   5.597 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.325 us    |                }
 0)  qemu-sy-4354  |   1.247 us    |              }
 0)  qemu-sy-4354  |   1.831 us    |            }
 0)  qemu-sy-4354  |   2.435 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.415 us    |              }
 0)  qemu-sy-4354  |   0.995 us    |            }
 0)  qemu-sy-4354  |   1.587 us    |          }
 0)  qemu-sy-4354  |   5.026 us    |        }
 0)  qemu-sy-4354  |   5.639 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.313 us    |                }
 0)  qemu-sy-4354  |   1.331 us    |              }
 0)  qemu-sy-4354  |   1.903 us    |            }
 0)  qemu-sy-4354  |   2.507 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.415 us    |              }
 0)  qemu-sy-4354  |   0.998 us    |            }
 0)  qemu-sy-4354  |   1.596 us    |          }
 0)  qemu-sy-4354  |   5.017 us    |        }
 0)  qemu-sy-4354  |   5.630 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.318 us    |                }
 0)  qemu-sy-4354  |   1.275 us    |              }
 0)  qemu-sy-4354  |   1.860 us    |            }
 0)  qemu-sy-4354  |   2.474 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.406 us    |              }
 0)  qemu-sy-4354  |   0.989 us    |            }
 0)  qemu-sy-4354  |   1.581 us    |          }
 0)  qemu-sy-4354  |   4.953 us    |        }
 0)  qemu-sy-4354  |   5.567 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.313 us    |                }
 0)  qemu-sy-4354  |   2.645 us    |              }
 0)  qemu-sy-4354  |   3.219 us    |            }
 0)  qemu-sy-4354  |   3.824 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.396 us    |              }
 0)  qemu-sy-4354  |   0.968 us    |            }
 0)  qemu-sy-4354  |   1.557 us    |          }
 0)  qemu-sy-4354  |   6.390 us    |        }
 0)  qemu-sy-4354  |   7.004 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.310 us    |                }
 0)  qemu-sy-4354  |   1.160 us    |              }
 0)  qemu-sy-4354  |   1.731 us    |            }
 0)  qemu-sy-4354  |   2.330 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.397 us    |              }
 0)  qemu-sy-4354  |   0.965 us    |            }
 0)  qemu-sy-4354  |   1.554 us    |          }
 0)  qemu-sy-4354  |   4.768 us    |        }
 0)  qemu-sy-4354  |   5.383 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.307 us    |                }
 0)  qemu-sy-4354  |   1.208 us    |              }
 0)  qemu-sy-4354  |   1.777 us    |            }
 0)  qemu-sy-4354  |   2.377 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.394 us    |              }
 0)  qemu-sy-4354  |   0.964 us    |            }
 0)  qemu-sy-4354  |   1.554 us    |          }
 0)  qemu-sy-4354  |   4.855 us    |        }
 0)  qemu-sy-4354  |   5.482 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.307 us    |                }
 0)  qemu-sy-4354  |   1.193 us    |              }
 0)  qemu-sy-4354  |   1.765 us    |            }
 0)  qemu-sy-4354  |   2.368 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.394 us    |              }
 0)  qemu-sy-4354  |   0.974 us    |            }
 0)  qemu-sy-4354  |   1.560 us    |          }
 0)  qemu-sy-4354  |   4.831 us    |        }
 0)  qemu-sy-4354  |   5.461 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.318 us    |                }
 0)  qemu-sy-4354  |   1.175 us    |              }
 0)  qemu-sy-4354  |   1.747 us    |            }
 0)  qemu-sy-4354  |   2.344 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   2.029 us    |              }
 0)  qemu-sy-4354  |   2.597 us    |            }
 0)  qemu-sy-4354  |   3.186 us    |          }
 0)  qemu-sy-4354  |   6.430 us    |        }
 0)  qemu-sy-4354  |   7.046 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.310 us    |                }
 0)  qemu-sy-4354  |   1.199 us    |              }
 0)  qemu-sy-4354  |   1.780 us    |            }
 0)  qemu-sy-4354  |   2.378 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.397 us    |              }
 0)  qemu-sy-4354  |   0.968 us    |            }
 0)  qemu-sy-4354  |   1.560 us    |          }
 0)  qemu-sy-4354  |   4.933 us    |        }
 0)  qemu-sy-4354  |   5.549 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          autoremove_wake_function() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.316 us    |                }
 0)  qemu-sy-4354  |   1.202 us    |              }
 0)  qemu-sy-4354  |   1.792 us    |            }
 0)  qemu-sy-4354  |   2.357 us    |          }
 0)  qemu-sy-4354  |   2.973 us    |        }
 0)  qemu-sy-4354  |   3.607 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.304 us    |                }
 0)  qemu-sy-4354  |   1.149 us    |              }
 0)  qemu-sy-4354  |   1.713 us    |            }
 0)  qemu-sy-4354  |   2.309 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.405 us    |              }
 0)  qemu-sy-4354  |   0.971 us    |            }
 0)  qemu-sy-4354  |   1.569 us    |          }
 0)  qemu-sy-4354  |   4.800 us    |        }
 0)  qemu-sy-4354  |   5.408 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.298 us    |                }
 0)  qemu-sy-4354  |   1.127 us    |              }
 0)  qemu-sy-4354  |   1.695 us    |            }
 0)  qemu-sy-4354  |   2.291 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.403 us    |              }
 0)  qemu-sy-4354  |   0.974 us    |            }
 0)  qemu-sy-4354  |   1.575 us    |          }
 0)  qemu-sy-4354  |   4.888 us    |        }
 0)  qemu-sy-4354  |   5.482 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          autoremove_wake_function() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.303 us    |                }
 0)  qemu-sy-4354  |   2.428 us    |              }
 0)  qemu-sy-4354  |   2.991 us    |            }
 0)  qemu-sy-4354  |   3.559 us    |          }
 0)  qemu-sy-4354  |   4.157 us    |        }
 0)  qemu-sy-4354  |   4.752 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.313 us    |                }
 0)  qemu-sy-4354  |   1.437 us    |              }
 0)  qemu-sy-4354  |   2.002 us    |            }
 0)  qemu-sy-4354  |   2.594 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.418 us    |              }
 0)  qemu-sy-4354  |   1.016 us    |            }
 0)  qemu-sy-4354  |   1.587 us    |          }
 0)  qemu-sy-4354  |   5.077 us    |        }
 0)  qemu-sy-4354  |   5.699 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.309 us    |                }
 0)  qemu-sy-4354  |   1.314 us    |              }
 0)  qemu-sy-4354  |   1.884 us    |            }
 0)  qemu-sy-4354  |   2.480 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.405 us    |              }
 0)  qemu-sy-4354  |   0.977 us    |            }
 0)  qemu-sy-4354  |   1.560 us    |          }
 0)  qemu-sy-4354  |   4.962 us    |        }
 0)  qemu-sy-4354  |   5.591 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          autoremove_wake_function() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.304 us    |                }
 0)  qemu-sy-4354  |   1.199 us    |              }
 0)  qemu-sy-4354  |   1.765 us    |            }
 0)  qemu-sy-4354  |   2.330 us    |          }
 0)  qemu-sy-4354  |   2.952 us    |        }
 0)  qemu-sy-4354  |   3.547 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          autoremove_wake_function() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.322 us    |                }
 0)  qemu-sy-4354  |   1.278 us    |              }
 0)  qemu-sy-4354  |   1.839 us    |            }
 0)  qemu-sy-4354  |   2.402 us    |          }
 0)  qemu-sy-4354  |   3.032 us    |        }
 0)  qemu-sy-4354  |   3.658 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          autoremove_wake_function() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.303 us    |                }
 0)  qemu-sy-4354  |   1.208 us    |              }
 0)  qemu-sy-4354  |   1.759 us    |            }
 0)  qemu-sy-4354  |   2.341 us    |          }
 0)  qemu-sy-4354  |   2.949 us    |        }
 0)  qemu-sy-4354  |   3.556 us    |      }
 0)  qemu-sy-4354  |               |      scheduler_tick() {
 0)  qemu-sy-4354  |               |        sched_slice() {
 0)  qemu-sy-4354  |   0.342 us    |        }
 0)  qemu-sy-4354  |   3.222 us    |      }
 0)  qemu-sy-4354  |               |      wake_up_process() {
 0)  qemu-sy-4354  |               |        try_to_wake_up() {
 0)  qemu-sy-4354  |               |          check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.343 us    |          }
 0)  qemu-sy-4354  |   1.331 us    |        }
 0)  qemu-sy-4354  |   1.915 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_sync_from_vapic() {
 0)  qemu-sy-4354  |   0.294 us    |      }
 0)  qemu-sy-4354  |               |      kvm_handle_exit() {
 0)  qemu-sy-4354  |   0.457 us    |      }
 0)  qemu-sy-4354  |               |      kvm_resched() {
 0)  qemu-sy-4354  |               |        _cond_resched() {
 0)  qemu-sy-4354  |               |          __cond_resched() {
 0)  qemu-sy-4354  |               |            schedule() {
 0)  qemu-sy-4354  |               |              wakeup_preempt_entity() {
 0)  qemu-sy-4354  |   0.294 us    |              }
 0)  qemu-sy-4354  |               |              kvm_sched_out() {
 0)  qemu-sy-4354  |               |                kvm_arch_vcpu_put() {
 0)  qemu-sy-4354  |   0.592 us    |                }
 0)  qemu-sy-4354  |   1.218 us    |              }
 0)  qemu-sy-4354  =>   kipmi0-496
 0)  qemu-sy-4213  =>  qemu-sy-4354
 0)  qemu-sy-4354  |               |              kvm_sched_in() {
 0)  qemu-sy-4354  |               |                kvm_arch_vcpu_load() {
 0)  qemu-sy-4354  |               |                  kvm_write_guest_time() {
 0)  qemu-sy-4354  |   0.298 us    |                  }
 0)  qemu-sy-4354  |   1.070 us    |                }
 0)  qemu-sy-4354  |   1.665 us    |              }
 0)  qemu-sy-4354  | ! 9172.159 us |            }
 0)  qemu-sy-4354  | ! 9172.793 us |          }
 0)  qemu-sy-4354  | ! 9173.422 us |        }
 0)  qemu-sy-4354  | ! 9174.032 us |      }
 0)  qemu-sy-4354  |               |      kvm_inject_pending_timer_irqs() {
 0)  qemu-sy-4354  |               |        kvm_inject_apic_timer_irqs() {
 0)  qemu-sy-4354  |               |          kvm_vcpu_kick() {
 0)  qemu-sy-4354  |   0.291 us    |          }
 0)  qemu-sy-4354  |   1.151 us    |        }
 0)  qemu-sy-4354  |               |        kvm_inject_pit_timer_irqs() {
 0)  qemu-sy-4354  |   0.352 us    |        }
 0)  qemu-sy-4354  |   2.429 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_enabled() {
 0)  qemu-sy-4354  |   0.291 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_find_highest_irr() {
 0)  qemu-sy-4354  |   0.312 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_get_cr8() {
 0)  qemu-sy-4354  |   0.298 us    |      }
 0)  qemu-sy-4354  |               |      kvm_cpu_has_interrupt() {
 0)  qemu-sy-4354  |               |        kvm_apic_has_interrupt() {
 0)  qemu-sy-4354  |   0.385 us    |        }
 0)  qemu-sy-4354  |   0.980 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_sync_to_vapic() {
 0)  qemu-sy-4354  |   0.295 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_sync_from_vapic() {
 0)  qemu-sy-4354  |   0.331 us    |      }
 0)  qemu-sy-4354  |               |      kvm_handle_exit() {
 0)  qemu-sy-4354  |   0.568 us    |      }
 0)  qemu-sy-4354  |               |      kvm_inject_pending_timer_irqs() {
 0)  qemu-sy-4354  |               |        kvm_inject_apic_timer_irqs() {
 0)  qemu-sy-4354  |               |          kvm_vcpu_kick() {
 0)  qemu-sy-4354  |   0.295 us    |          }
 0)  qemu-sy-4354  |   0.959 us    |        }
 0)  qemu-sy-4354  |               |        kvm_inject_pit_timer_irqs() {
 0)  qemu-sy-4354  |   0.313 us    |        }
 0)  qemu-sy-4354  |   2.170 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_enabled() {
 0)  qemu-sy-4354  |   0.310 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_find_highest_irr() {
 0)  qemu-sy-4354  |   0.295 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_get_cr8() {
 0)  qemu-sy-4354  |   0.295 us    |      }
 0)  qemu-sy-4354  |               |      kvm_cpu_has_interrupt() {
 0)  qemu-sy-4354  |               |        kvm_apic_has_interrupt() {
 0)  qemu-sy-4354  |   0.325 us    |        }
 0)  qemu-sy-4354  |   0.938 us    |      }
 0)  qemu-sy-4354  |               |      kvm_cpu_get_interrupt() {
 0)  qemu-sy-4354  |               |        kvm_get_apic_interrupt() {
 0)  qemu-sy-4354  |               |          kvm_apic_has_interrupt() {
 0)  qemu-sy-4354  |   0.322 us    |          }
 0)  qemu-sy-4354  |   0.944 us    |        }
 0)  qemu-sy-4354  |   1.542 us    |      }
 0)  qemu-sy-4354  |               |      kvm_timer_intr_post() {
 0)  qemu-sy-4354  |               |        kvm_apic_timer_intr_post() {
 0)  qemu-sy-4354  |   0.309 us    |        }
 0)  qemu-sy-4354  |   2.059 us    |      }
 0)  qemu-sy-4354  |               |      kvm_cpu_has_interrupt() {
 0)  qemu-sy-4354  |               |        kvm_apic_has_interrupt() {
 0)  qemu-sy-4354  |   0.340 us    |        }
 0)  qemu-sy-4354  |               |        kvm_apic_accept_pic_intr() {
 0)  qemu-sy-4354  |   0.313 us    |        }
 0)  qemu-sy-4354  |   1.560 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_sync_to_vapic() {
 0)  qemu-sy-4354  |   0.298 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_sync_from_vapic() {
 0)  qemu-sy-4354  |   0.319 us    |      }
 0)  qemu-sy-4354  |               |      kvm_handle_exit() {
 0)  qemu-sy-4354  |               |        kvm_mmu_page_fault() {
 0)  qemu-sy-4354  |               |          kvm_read_guest() {
 0)  qemu-sy-4354  |               |            kvm_read_guest_page() {
 0)  qemu-sy-4354  |   0.764 us    |            }
 0)  qemu-sy-4354  |   1.377 us    |          }
 0)  qemu-sy-4354  |               |          kvm_read_guest() {
 0)  qemu-sy-4354  |               |            kvm_read_guest_page() {
 0)  qemu-sy-4354  |   0.499 us    |            }
 0)  qemu-sy-4354  |   1.088 us    |          }
 0)  qemu-sy-4354  |               |          kvm_release_pfn_clean() {
 0)  qemu-sy-4354  |   0.349 us    |          }
 0)  qemu-sy-4354  |               |          kvm_read_guest() {
 0)  qemu-sy-4354  |               |            kvm_read_guest_page() {
 0)  qemu-sy-4354  |   0.451 us    |            }
 0)  qemu-sy-4354  |   1.046 us    |          }
 0)  qemu-sy-4354  |               |          kvm_read_guest() {
 0)  qemu-sy-4354  |               |            kvm_read_guest_page() {
 0)  qemu-sy-4354  |   0.361 us    |            }
 0)  qemu-sy-4354  |   0.956 us    |          }
 0)  qemu-sy-4354  |               |          kvm_read_guest() {
 0)  qemu-sy-4354  |               |            kvm_read_guest_page() {
 0)  qemu-sy-4354  |   0.381 us    |            }
 0)  qemu-sy-4354  |   0.974 us    |          }
 0)  qemu-sy-4354  |               |          kvm_read_guest() {
 0)  qemu-sy-4354  |               |            kvm_read_guest_page() {
 0)  qemu-sy-4354  |   0.345 us    |            }
 0)  qemu-sy-4354  |   0.959 us    |          }
 0)  qemu-sy-4354  |               |          kvm_read_guest() {
 0)  qemu-sy-4354  |               |            kvm_read_guest_page() {
 0)  qemu-sy-4354  |   0.364 us    |            }
 0)  qemu-sy-4354  |   0.965 us    |          }
 0)  qemu-sy-4354  |               |          kvm_ioapic_update_eoi() {
 0)  qemu-sy-4354  |   0.358 us    |          }
 0)  qemu-sy-4354  | + 13.782 us   |        }
 0)  qemu-sy-4354  | + 14.681 us   |      }
 0)  qemu-sy-4354  |               |      kvm_inject_pending_timer_irqs() {
 0)  qemu-sy-4354  |               |        kvm_inject_apic_timer_irqs() {
 0)  qemu-sy-4354  |               |          kvm_vcpu_kick() {
 0)  qemu-sy-4354  |   0.291 us    |          }
 0)  qemu-sy-4354  |   0.953 us    |        }
 0)  qemu-sy-4354  |               |        kvm_inject_pit_timer_irqs() {
 0)  qemu-sy-4354  |   0.304 us    |        }
 0)  qemu-sy-4354  |   2.150 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_enabled() {
 0)  qemu-sy-4354  |   0.304 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_find_highest_irr() {
 0)  qemu-sy-4354  |   0.295 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_get_cr8() {
 0)  qemu-sy-4354  |   0.309 us    |      }
 0)  qemu-sy-4354  |               |      kvm_cpu_has_interrupt() {
 0)  qemu-sy-4354  |               |        kvm_apic_has_interrupt() {
 0)  qemu-sy-4354  |   0.315 us    |        }
 0)  qemu-sy-4354  |   0.914 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_sync_to_vapic() {
 0)  qemu-sy-4354  |   0.297 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_sync_from_vapic() {
 0)  qemu-sy-4354  |   0.318 us    |      }
 0)  qemu-sy-4354  |               |      kvm_handle_exit() {
 0)  qemu-sy-4354  |               |        kvm_emulate_pio() {
 0)  qemu-sy-4354  |               |          kvm_io_bus_find_dev() {
 0)  qemu-sy-4354  |   0.406 us    |          }
 0)  qemu-sy-4354  |   1.115 us    |        }
 0)  qemu-sy-4354  |   2.026 us    |      }
 0)  qemu-sy-4354  |               |      kvm_get_cr8() {
 0)  qemu-sy-4354  |               |        kvm_lapic_get_cr8() {
 0)  qemu-sy-4354  |   0.292 us    |        }
 0)  qemu-sy-4354  |   2.257 us    |      }
 0)  qemu-sy-4354  |               |      kvm_arch_vcpu_put() {
 0)  qemu-sy-4354  |   0.574 us    |      }
 0)  qemu-sy-4354  | ! 250406.2 us |    }
 0)  qemu-sy-4354  | ! 250407.0 us |  }


There are 2 preemptions in there, accounting for perhaps 15ms.
Then there are about 20 __wake_up()s in there (where do those come from?)
accounting for 5ms each, totaling 100ms.

There's a scheduler_tick() in there but no IRQ entry?!

All in all, it's very hard to get to the total of 250ms.
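Tallying durations by hand can be scripted; here is a rough sketch that pulls the closing-brace durations out of function_graph output (the column layout assumed by the regex matches the trace above but is not guaranteed across ftrace versions):

```python
import re

# Durations in function_graph output sit between pipes, e.g.:
#  0)  qemu-sy-4354  | ! 9172.159 us |            }
# where '+' marks > 10 us and '!' marks > 100 us.
DUR = re.compile(r'\|\s*[!+]?\s*([\d.]+)\s*us\s*\|')

def durations(lines):
    """Return every duration (in microseconds) found in the trace lines."""
    return [float(m.group(1)) for line in lines if (m := DUR.search(line))]

sample = [
    " 0)  qemu-sy-4354  |               |  kvm_vcpu_ioctl() {",
    " 0)  qemu-sy-4354  |   5.432 us    |      }",
    " 0)  qemu-sy-4354  | ! 9172.159 us |            }",
]
assert durations(sample) == [5.432, 9172.159]
```

Note that naively summing every duration double-counts nested frames; only leaf or top-level entries should be totaled.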

I suspect __vcpu_run() and vcpu_enter_guest() get inlined, and we might
just be looking at time spent in the guest... a bit hard for me to tell,
as this is the first time I've ever looked at all this kvm code.


Is there a way to add a wall-time column to this output so that we can
see where the time goes?

Another nice thing would be to have ctx switches annotated like:

foo-1 => bar-2 ran: ${time foo spent on the cpu} since: ${time bar spent away from the cpu}

I'll poke a little at this function graph tracer thingy to see if I
can do that.





* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-26  9:55                         ` Kevin Shanahan
  0 siblings, 0 replies; 180+ messages in thread
From: Kevin Shanahan @ 2009-01-26  9:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Steven Rostedt, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon

On Wed, 2009-01-21 at 16:18 +0100, Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
> > It means, a scheduling problem.  Can you run the latency tracer (which 
> > only works with realtime priority), so we can tell if it is (a) kvm 
> > failing to wake up the vcpu properly or (b) the scheduler delaying the 
> > vcpu from running.
> 
> Could we please get an ftrace capture of the incident?
> 
> Firstly, it makes sense to simplify the tracing environment as much as 
> possible: for example single-CPU traces are much easier to interpret.
> 
> Can you reproduce it with just one CPU online? I.e. if you offline all the 
> other cores via:
> 
>   echo 0 > /sys/devices/system/cpu/cpu1/online
> 
>   [etc.]
> 
> and keep CPU#0 only, do the latencies still occur?
> 
> If they do still occur, then please do the traces that way.
> 
> [ If they do not occur then switch back on all CPUs - we'll sort out the
>   traces ;-) ]
> 
> Then please build a function tracer kernel, by enabling:
> 
>   CONFIG_FUNCTION_TRACER=y
>   CONFIG_FUNCTION_GRAPH_TRACER=y
>   CONFIG_DYNAMIC_FTRACE=y
> 
> Once you boot into such a kernel, you can switch on function tracing via:
> 
>   cd /debug/tracing/
> 
>   echo 0 > tracing_enabled
>   echo function_graph > current_tracer
>   echo funcgraph-proc > trace_options 
> 
> It does not run yet, first find a suitable set of functions to trace. For 
> example this will be a pretty good starting point for scheduler+KVM 
> problems:
> 
>   echo ''         > set_ftrace_filter  # clear filter functions
>   echo '*sched*' >> set_ftrace_filter 
>   echo '*wake*'  >> set_ftrace_filter
>   echo '*kvm*'   >> set_ftrace_filter
>   echo 1 > tracing_enabled             # let the tracer go
> 
> You can see your current selection of functions to trace via 'cat 
> set_ftrace_filter', and you can see all functions via 'cat 
> available_filter_functions'.
> 
> You can also trace all functions via:
> 
>   echo '*' > set_ftrace_filter
> 
> Tracer output can be captured from the 'trace' file. It should look like 
> this:
> 
>  15)   cc1-28106    |   0.263 us    |    page_evictable();
>  15)   cc1-28106    |               |    lru_cache_add_lru() {
>  15)   cc1-28106    |   0.252 us    |      __lru_cache_add();
>  15)   cc1-28106    |   0.738 us    |    }
>  15)   cc1-28106    | + 74.026 us   |  }
>  15)   cc1-28106    |               |  up_read() {
>  15)   cc1-28106    |   0.257 us    |    _spin_lock_irqsave();
>  15)   cc1-28106    |   0.253 us    |    _spin_unlock_irqrestore();
>  15)   cc1-28106    |   1.329 us    |  }
> 
> To capture a continuous stream of all trace data you can do:
> 
>   cat trace_pipe > /tmp/trace.txt
> 
> (this will also drain the trace ringbuffers.)
> 
> Note that this can be quite expensive if there are a lot of functions that 
> are traced - so it makes sense to trim down the set of traced functions to 
> only the interesting ones. Which are the interesting ones can be 
> determined from looking at the traces. You should see your KVM threads 
> getting active every second as the ping happens.
> 
> If you get lost events you can increase the trace buffer size via the 
> buffer_size_kb control - the default is around 1.4 MB.
> 
> Let me know if any of these steps is causing problems or if interpreting 
> the traces is difficult.

Just carrying out the steps was okay, but I don't really know what I'm
looking at. I've uploaded the trace here (about 10 seconds worth, I
think):

  http://disenchant.net/tmp/bug-12465/trace-1/

The guest being pinged is process 4353:

kmshanah@flexo:~$ pstree -p 4353
qemu-system-x86(4353)─┬─{qemu-system-x86}(4354)
                      ├─{qemu-system-x86}(4355)
                      └─{qemu-system-x86}(4772)

I guess the larger overhead/duration values are what we are looking for,
e.g.:

kmshanah@flexo:~$ bzgrep -E '[[:digit:]]{6,}' trace.txt.bz2 
 0)   ksoftir-4    | ! 3010470 us |  }
 0)  qemu-sy-4354  | ! 250406.2 us |    }
 0)  qemu-sy-4354  | ! 250407.0 us |  }
 0)  qemu-sy-4354  | ! 362946.3 us |    }
 0)  qemu-sy-4354  | ! 362947.0 us |  }
 0)  qemu-sy-4177  | ! 780480.3 us |  }
 0)  qemu-sy-4354  | ! 117685.7 us |    }
 0)  qemu-sy-4354  | ! 117686.5 us |  }

That ksoftirqd value is a bit strange (> 3 seconds, or is the formatting
wrong?). I guess I still need some guidance to know what I'm looking at
with this trace and/or what to do next.
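A quick way to rank the big outliers instead of just grepping for wide numbers is to strip the duration column and sort numerically; a sketch, where top_durations is just an illustrative helper name and the field layout assumes the "cpu) proc | duration | code" format produced by the funcgraph-proc option shown above:

```shell
# top_durations FILE: print function_graph closing lines whose duration
# exceeds 1000 us, longest first. The duration sits in the second
# '|'-separated column, possibly prefixed with the "!" / "+" markers.
top_durations() {
    awk -F'|' 'NF >= 3 && $2 ~ /us/ {
        d = $2
        gsub(/[!+ ]/, "", d)     # strip overhead markers and spaces
        sub(/us$/, "", d)        # strip the unit
        if (d + 0 > 1000) printf "%12.1f %s\n", d + 0, $0
    }' "$1" | sort -rn
}
```

Run as e.g. `top_durations /tmp/trace.txt | head` after decompressing the trace.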

Cheers,
Kevin.



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-22 20:31                           ` Ingo Molnar
  0 siblings, 0 replies; 180+ messages in thread
From: Ingo Molnar @ 2009-01-22 20:31 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Steven Rostedt, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon


* Kevin Shanahan <kmshanah@ucwb.org.au> wrote:

> On Wed, 2009-01-21 at 16:18 +0100, Ingo Molnar wrote:
> > * Avi Kivity <avi@redhat.com> wrote:
> > > It means, a scheduling problem.  Can you run the latency tracer (which 
> > > only works with realtime priority), so we can tell if it is (a) kvm 
> > > failing to wake up the vcpu properly or (b) the scheduler delaying the 
> > > vcpu from running.
> > 
> > Could we please get an ftrace capture of the incident?
> > 
> > Firstly, it makes sense to simplify the tracing environment as much as 
> > possible: for example single-CPU traces are much easier to interpret.
> > 
> > Can you reproduce it with just one CPU online? I.e. if you offline all the 
> > other cores via:
> > 
> >   echo 0 > /sys/devices/system/cpu/cpu1/online
> > 
> >   [etc.]
> > 
> > and keep CPU#0 only, do the latencies still occur?
> > 
> > If they do still occur, then please do the traces that way.
> > 
> > [ If they do not occur then switch back on all CPUs - we'll sort out the
> >   traces ;-) ]
> > 
> > Then please build a function tracer kernel, by enabling:
> > 
> >   CONFIG_FUNCTION_TRACER=y
> >   CONFIG_FUNCTION_GRAPH_TRACER=y
> >   CONFIG_DYNAMIC_FTRACE=y
> 
> Looks like the function graph tracer is only in 2.6.29, so I've updated
> now to 2.6.29-rc2-00013-gf3b8436.
> 
> Again, a control test to make sure the problem still occurs:
> 
> --- hermes-old.wumi.org.au ping statistics ---
> 64 packets transmitted, 64 received, 0% packet loss, time 63080ms
> rtt min/avg/max/mdev = 0.168/479.893/4015.950/894.721 ms, pipe 5
> 
> Yes, plenty of delays there. Next, checking if I can reproduce with only
> one core online:
> 
> echo 0 > /sys/devices/system/cpu/cpu1/online
> echo 0 > /sys/devices/system/cpu/cpu2/online
> echo 0 > /sys/devices/system/cpu/cpu3/online
> ...
> 
> --- hermes-old.wumi.org.au ping statistics ---
> 900 packets transmitted, 900 received, 0% packet loss, time 900253ms
> rtt min/avg/max/mdev = 0.127/38.937/2082.347/170.348 ms, pipe 3
> 
> --- hermes-old.wumi.org.au ping statistics ---
> 900 packets transmitted, 900 received, 0% packet loss, time 900995ms
> rtt min/avg/max/mdev = 0.127/428.398/17126.227/1634.980 ms, pipe 18
> 
> So it looks like I can do the simplified trace. [...]

That's good news! Another thing that happens sometimes is that narrow 
races go away when tracing is turned on - the dreaded Heisenbugs. Hopefully 
this won't happen, but if it does, tracing is cheapest when only a few 
specific functions are traced.

There are two main types of delays that can occur:

 - the delay is CPU time - i.e. anomalously large amount of CPU time spent 
   somewhere in the kernel. Getting a trace of exactly what that 
   processing is would be nice.

 - the delay is some sort of missed wakeup or other logic error in the 
   flow of execution. These are harder to trace - you might want to take a 
   look at trace_options to extend the trace format with various details, 
   if the need arises.

> [...] I've run out of time for that this morning, but I'll spend some 
> time on it over the weekend. Thanks for the detailed instructions - it 
> doesn't look like it will be too hard.

ok, looking forward to your traces. Also, let us know if you run into 
anything unintuitive or complicated on the ftrace usage side.

	Ingo

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-01-21 15:18                       ` Ingo Molnar
  (?)
@ 2009-01-22 19:57                       ` Kevin Shanahan
  2009-01-22 20:31                           ` Ingo Molnar
  -1 siblings, 1 reply; 180+ messages in thread
From: Kevin Shanahan @ 2009-01-22 19:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Steven Rostedt, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon

On Wed, 2009-01-21 at 16:18 +0100, Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
> > It means, a scheduling problem.  Can you run the latency tracer (which 
> > only works with realtime priority), so we can tell if it is (a) kvm 
> > failing to wake up the vcpu properly or (b) the scheduler delaying the 
> > vcpu from running.
> 
> Could we please get an ftrace capture of the incident?
> 
> Firstly, it makes sense to simplify the tracing environment as much as 
> possible: for example single-CPU traces are much easier to interpret.
> 
> Can you reproduce it with just one CPU online? I.e. if you offline all the 
> other cores via:
> 
>   echo 0 > /sys/devices/system/cpu/cpu1/online
> 
>   [etc.]
> 
> and keep CPU#0 only, do the latencies still occur?
> 
> If they do still occur, then please do the traces that way.
> 
> [ If they do not occur then switch back on all CPUs - we'll sort out the
>   traces ;-) ]
> 
> Then please build a function tracer kernel, by enabling:
> 
>   CONFIG_FUNCTION_TRACER=y
>   CONFIG_FUNCTION_GRAPH_TRACER=y
>   CONFIG_DYNAMIC_FTRACE=y

Looks like the function graph tracer is only in 2.6.29, so I've updated
now to 2.6.29-rc2-00013-gf3b8436.

Again, a control test to make sure the problem still occurs:

--- hermes-old.wumi.org.au ping statistics ---
64 packets transmitted, 64 received, 0% packet loss, time 63080ms
rtt min/avg/max/mdev = 0.168/479.893/4015.950/894.721 ms, pipe 5

Yes, plenty of delays there. Next, checking if I can reproduce with only
one core online:

echo 0 > /sys/devices/system/cpu/cpu1/online
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu3/online
...
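The per-cpu echoes above generalize to a small loop; a sketch, where offline_secondary_cpus is just an illustrative helper (the sysfs base is parameterized so the logic can be exercised against a scratch directory; cpu0 typically has no online file and is skipped):

```shell
# offline_secondary_cpus [BASE]: write 0 to every cpuN/online (N >= 1)
# under BASE (default /sys/devices/system/cpu), leaving cpu0 alone.
offline_secondary_cpus() {
    base="${1:-/sys/devices/system/cpu}"
    for c in "$base"/cpu[1-9]*; do
        # skip entries without a writable online file (e.g. cpu0)
        [ -w "$c/online" ] && echo 0 > "$c/online"
    done
    return 0
}
```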

--- hermes-old.wumi.org.au ping statistics ---
900 packets transmitted, 900 received, 0% packet loss, time 900253ms
rtt min/avg/max/mdev = 0.127/38.937/2082.347/170.348 ms, pipe 3

--- hermes-old.wumi.org.au ping statistics ---
900 packets transmitted, 900 received, 0% packet loss, time 900995ms
rtt min/avg/max/mdev = 0.127/428.398/17126.227/1634.980 ms, pipe 18

So it looks like I can do the simplified trace. I've run out of time for
that this morning, but I'll spend some time on it over the weekend.
Thanks for the detailed instructions - it doesn't look like it will be
too hard.

Cheers,
Kevin.



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-22  1:48                           ` Steven Rostedt
  0 siblings, 0 replies; 180+ messages in thread
From: Steven Rostedt @ 2009-01-22  1:48 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Kevin Shanahan, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon




On Wed, 21 Jan 2009, Avi Kivity wrote:

> Kevin Shanahan wrote:
> > > > --- hermes-old.wumi.org.au ping statistics ---
> > > > 900 packets transmitted, 900 received, 0% packet loss, time 899326ms
> > > > rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms
> > > > 
> > > > So, a _huge_ difference. But what does it mean?
> > > >       
> > > It means, a scheduling problem.  Can you run the latency tracer (which
> > > only works with realtime priority), so we can tell if it is (a) kvm
> > > failing to wake up the vcpu properly or (b) the scheduler delaying the
> > > vcpu from running.
> > >     
> > 
> > Sorry, but are you sure that's going to be useful?
> > 
> > If it only works on realtime threads and I'm not seeing the problem when
> > running kvm with realtime priority, is this going to tell you what you
> > want to know?
> > 
> > Not trying to be difficult, but that just didn't make sense to me.
> >   
> 
> You're right, wasn't thinking properly.
> 
> This is a tough one.  I'll see if I can think of something.  Ingo, any ideas?

I fixed up the wakeup latency tracer to work with all tasks (along with 
other fixes). You can check out the following:

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git

  branch: tip/devel

compile with CONFIG_FUNCTION_TRACER and CONFIG_SCHED_TRACER and just

echo 0 > /debug/tracing/tracing_enabled
echo wakeup > /debug/tracing/current_tracer

echo 1 > /debug/tracing/tracing_enabled
run your test
echo 0 > /debug/tracing/tracing_enabled

and then look at /debug/tracing/latency_trace
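The /debug paths above assume debugfs is already mounted there; on setups that don't do this at boot, a sketch (needs root):

```shell
# Mount debugfs at /debug unless something is already mounted there.
mkdir -p /debug
grep -q ' /debug debugfs ' /proc/mounts || mount -t debugfs nodev /debug
```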

-- Steve


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-21 15:18                       ` Ingo Molnar
  0 siblings, 0 replies; 180+ messages in thread
From: Ingo Molnar @ 2009-01-21 15:18 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Kevin Shanahan, Steven Rostedt, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon


* Avi Kivity <avi@redhat.com> wrote:

> Kevin Shanahan wrote:
>> On Tue, 2009-01-20 at 19:47 +0200, Avi Kivity wrote:
>>   
>>> Steven Rostedt wrote:
>>>     
>>>> Note, the wakeup latency only tests realtime threads, since other threads
>>>> can have other issues for wakeup. I could change the wakeup tracer as
>>>> wakeup_rt, and make a new "wakeup" that tests all threads, but it may
>>>> be difficult to get something accurate.
>>>>       
>>> Kevin, can you retest with kvm at realtime priority?
>>>     
>>
>> Running vanilla Linux 2.6.28, kvm-82. First a control test to check that
>> the problem is still there when running at normal priority:
>>
>> --- hermes-old.wumi.org.au ping statistics ---
>> 900 packets transmitted, 900 received, 0% packet loss, time 899283ms
>> rtt min/avg/max/mdev = 0.119/269.773/13739.426/1230.836 ms, pipe 14
>>
>> Yeah, sure is.
>>
>> Okay, so now I set the realtime attributes of the processes for the VM
>> instance being pinged:
>>
>> flexo:~# ps ax | grep 6284
>>  6284 ?        Sl     6:11 /usr/local/kvm/bin/qemu-system-x86_64 -smp 2
>> -m 2048 -hda kvm-17-1.img -hdb kvm-17-tmp.img -net
>> nic,vlan=0,macaddr=52:54:00:12:34:67,model=rtl8139 -net
>> tap,vlan=0,ifname=tap17,script=no -vnc 127.0.0.1:17 -usbdevice tablet
>> -daemonize
>> flexo:~# pstree -p 6284
>> qemu-system-x86(6284)─┬─{qemu-system-x86}(6285)
>>                       ├─{qemu-system-x86}(6286)
>>                       └─{qemu-system-x86}(6540)
>>
>> (info cpus on the QEMU console shows 6285 and 6286 being the VCPU
>> processes. Not sure what the third child is for, maybe vnc?.)
>>
>> flexo:~# chrt -r -p 3 6284
>> flexo:~# chrt -r -p 3 6285
>> flexo:~# chrt -r -p 3 6286
>> flexo:~# chrt -p 6284
>> pid 6284's current scheduling policy: SCHED_RR
>> pid 6284's current scheduling priority: 3
>> flexo:~# chrt -p 6285
>> pid 6285's current scheduling policy: SCHED_RR
>> pid 6285's current scheduling priority: 3
>> flexo:~# chrt -p 6286
>> pid 6286's current scheduling policy: SCHED_RR
>> pid 6286's current scheduling priority: 3
>>
>> And the result of the ping test now:
>>
>> --- hermes-old.wumi.org.au ping statistics ---
>> 900 packets transmitted, 900 received, 0% packet loss, time 899326ms
>> rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms
>>
>> So, a _huge_ difference. But what does it mean?
>
> It means, a scheduling problem.  Can you run the latency tracer (which 
> only works with realtime priority), so we can tell if it is (a) kvm 
> failing to wake up the vcpu properly or (b) the scheduler delaying the 
> vcpu from running.

Could we please get an ftrace capture of the incident?

Firstly, it makes sense to simplify the tracing environment as much as 
possible: for example single-CPU traces are much easier to interpret.

Can you reproduce it with just one CPU online? I.e. if you offline all the 
other cores via:

  echo 0 > /sys/devices/system/cpu/cpu1/online

  [etc.]

and keep CPU#0 only, do the latencies still occur?

If they do still occur, then please do the traces that way.

[ If they do not occur then switch back on all CPUs - we'll sort out the
  traces ;-) ]

Then please build a function tracer kernel, by enabling:

  CONFIG_FUNCTION_TRACER=y
  CONFIG_FUNCTION_GRAPH_TRACER=y
  CONFIG_DYNAMIC_FTRACE=y

Once you boot into such a kernel, you can switch on function tracing via:

  cd /debug/tracing/

  echo 0 > tracing_enabled
  echo function_graph > current_tracer
  echo funcgraph-proc > trace_options 

It does not run yet; first, find a suitable set of functions to trace. For 
example this will be a pretty good starting point for scheduler+KVM 
problems:

  echo ''         > set_ftrace_filter  # clear filter functions
  echo '*sched*' >> set_ftrace_filter 
  echo '*wake*'  >> set_ftrace_filter
  echo '*kvm*'   >> set_ftrace_filter
  echo 1 > tracing_enabled             # let the tracer go

You can see your current selection of functions to trace via 'cat 
set_ftrace_filter', and you can see all functions via 'cat 
available_filter_functions'.

You can also trace all functions via:

  echo '*' > set_ftrace_filter

Tracer output can be captured from the 'trace' file. It should look like 
this:

 15)   cc1-28106    |   0.263 us    |    page_evictable();
 15)   cc1-28106    |               |    lru_cache_add_lru() {
 15)   cc1-28106    |   0.252 us    |      __lru_cache_add();
 15)   cc1-28106    |   0.738 us    |    }
 15)   cc1-28106    | + 74.026 us   |  }
 15)   cc1-28106    |               |  up_read() {
 15)   cc1-28106    |   0.257 us    |    _spin_lock_irqsave();
 15)   cc1-28106    |   0.253 us    |    _spin_unlock_irqrestore();
 15)   cc1-28106    |   1.329 us    |  }

To capture a continuous stream of all trace data you can do:

  cat trace_pipe > /tmp/trace.txt

(this will also drain the trace ringbuffers.)

Note that this can be quite expensive if there are a lot of functions that 
are traced - so it makes sense to trim down the set of traced functions to 
only the interesting ones. Which are the interesting ones can be 
determined from looking at the traces. You should see your KVM threads 
getting active every second as the ping happens.

If you get lost events you can increase the trace buffer size via the 
buffer_size_kb control - the default is around 1.4 MB.
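For instance, to grow the buffer to roughly 10 MB (the value is per cpu, so total memory use scales with the number of online cpus):

```shell
cd /debug/tracing
# Enlarge the per-cpu trace ring buffer; the value is in kilobytes.
echo 10240 > buffer_size_kb
```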

Let me know if any of these steps is causing problems or if interpreting 
the traces is difficult.

	Ingo

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-21 15:18                       ` Ingo Molnar
  0 siblings, 0 replies; 180+ messages in thread
From: Ingo Molnar @ 2009-01-21 15:18 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Kevin Shanahan, Steven Rostedt, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker,
	bugme-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r


* Avi Kivity <avi-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> Kevin Shanahan wrote:
>> On Tue, 2009-01-20 at 19:47 +0200, Avi Kivity wrote:
>>   
>>> Steven Rostedt wrote:
>>>     
>>>> Note, the wakeup latency only tests realtime threads, since other threads
>>>> can have other issues for wakeup. I could change the wakeup tracer as
>>>> wakeup_rt, and make a new "wakeup" that tests all threads, but it may
>>>> be difficult to get something accurate.
>>>>       
>>> Kevin, can you retest with kvm at realtime priority?
>>>     
>>
>> Running vanilla Linux 2.6.28, kvm-82. First a control test to check that
>> the problem is still there when running at normal priority:
>>
>> --- hermes-old.wumi.org.au ping statistics ---
>> 900 packets transmitted, 900 received, 0% packet loss, time 899283ms
>> rtt min/avg/max/mdev = 0.119/269.773/13739.426/1230.836 ms, pipe 14
>>
>> Yeah, sure is.
>>
>> Okay, so now I set the realtime attributes of the processes for the VM
>> instance being pinged:
>>
>> flexo:~# ps ax | grep 6284
>>  6284 ?        Sl     6:11 /usr/local/kvm/bin/qemu-system-x86_64 -smp 2
>> -m 2048 -hda kvm-17-1.img -hdb kvm-17-tmp.img -net
>> nic,vlan=0,macaddr=52:54:00:12:34:67,model=rtl8139 -net
>> tap,vlan=0,ifname=tap17,script=no -vnc 127.0.0.1:17 -usbdevice tablet
>> -daemonize
>> flexo:~# pstree -p 6284
>> qemu-system-x86(6284)─┬─{qemu-system-x86}(6285)
>>                       ├─{qemu-system-x86}(6286)
>>                       └─{qemu-system-x86}(6540)
>>
>> (info cpus on the QEMU console shows 6285 and 6286 being the VCPU
>> processes. Not sure what the third child is for, maybe vnc?.)
>>
>> flexo:~# chrt -r -p 3 6284
>> flexo:~# chrt -r -p 3 6285
>> flexo:~# chrt -r -p 3 6286
>> flexo:~# chrt -p 6284
>> pid 6284's current scheduling policy: SCHED_RR
>> pid 6284's current scheduling priority: 3
>> flexo:~# chrt -p 6285
>> pid 6285's current scheduling policy: SCHED_RR
>> pid 6285's current scheduling priority: 3
>> flexo:~# chrt -p 6286
>> pid 6286's current scheduling policy: SCHED_RR
>> pid 6286's current scheduling priority: 3
>>
>> And the result of the ping test now:
>>
>> --- hermes-old.wumi.org.au ping statistics ---
>> 900 packets transmitted, 900 received, 0% packet loss, time 899326ms
>> rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms
>>
>> So, a _huge_ difference. But what does it mean?
>
> It means, a scheduling problem.  Can you run the latency tracer (which 
> only works with realtime priority), so we can tell if it is (a) kvm 
> failing to wake up the vcpu properly or (b) the scheduler delaying the 
> vcpu from running.

Could we please get an ftrace capture of the incident?

Firstly, it makes sense to simplify the tracing environment as much as 
possible: for example single-CPU traces are much easier to interpret.

Can you reproduce it with just one CPU online? I.e. if you offline all the 
other cores via:

  echo 0 > /sys/devices/system/cpu/cpu1/online

  [etc.]

and keep CPU#0 only, do the latencies still occur?

If they do still occur, then please do the traces that way.

[ If they do not occur then switch back on all CPUs - we'll sort out the
  traces ;-) ]
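
For reference, the offlining can be scripted rather than typed per CPU. This is only a sketch, not from the original mail: the APPLY switch and the SYSCPU variable are additions so it prints the writes by default and can be tried against a scratch directory first.

```shell
#!/bin/sh
# Offline every CPU except cpu0 through sysfs.
# Prints the writes by default; set APPLY=1 (as root) to perform them.
SYSCPU=${SYSCPU:-/sys/devices/system/cpu}
offline_all() {
    for cpu in "$SYSCPU"/cpu[1-9]*; do
        [ -e "$cpu/online" ] || continue    # skip entries without an online knob (cpu0)
        if [ "$APPLY" = 1 ]; then
            echo 0 > "$cpu/online"          # take this CPU offline
        else
            echo "echo 0 > $cpu/online"     # dry run: show the command only
        fi
    done
}
offline_all
```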

Then please build a function tracer kernel, by enabling:

  CONFIG_FUNCTION_TRACER=y
  CONFIG_FUNCTION_GRAPH_TRACER=y
  CONFIG_DYNAMIC_FTRACE=y

Once you boot into such a kernel, you can switch on function tracing via:

  cd /debug/tracing/

  echo 0 > tracing_enabled
  echo function_graph > current_tracer
  echo funcgraph-proc > trace_options 

Tracing is not running yet; first, find a suitable set of functions to 
trace. For example, this will be a pretty good starting point for 
scheduler+KVM problems:

  echo ''         > set_ftrace_filter  # clear filter functions
  echo '*sched*' >> set_ftrace_filter 
  echo '*wake*'  >> set_ftrace_filter
  echo '*kvm*'   >> set_ftrace_filter
  echo 1 > tracing_enabled             # let the tracer go

You can see your current selection of functions to trace via 'cat 
set_ftrace_filter', and you can see all functions via 'cat 
available_filter_functions'.

You can also trace all functions via:

  echo '*' > set_ftrace_filter

Tracer output can be captured from the 'trace' file. It should look like 
this:

 15)   cc1-28106    |   0.263 us    |    page_evictable();
 15)   cc1-28106    |               |    lru_cache_add_lru() {
 15)   cc1-28106    |   0.252 us    |      __lru_cache_add();
 15)   cc1-28106    |   0.738 us    |    }
 15)   cc1-28106    | + 74.026 us   |  }
 15)   cc1-28106    |               |  up_read() {
 15)   cc1-28106    |   0.257 us    |    _spin_lock_irqsave();
 15)   cc1-28106    |   0.253 us    |    _spin_unlock_irqrestore();
 15)   cc1-28106    |   1.329 us    |  }

To capture a continuous stream of all trace data you can do:

  cat trace_pipe > /tmp/trace.txt

(this will also drain the trace ringbuffers.)

Note that this can be quite expensive if a lot of functions are being 
traced - so it makes sense to trim the set of traced functions down to 
only the interesting ones. Which ones are interesting can be determined 
by looking at the traces. You should see your KVM threads 
getting active every second as the ping happens.

If events get lost, you can increase the trace buffer size via the 
buffer_size_kb control - the default is around 1.4 MB.
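
The configuration steps above can also be collected into one script. This is a sketch, not from the mail itself: the w() helper and the APPLY switch are additions (it prints the writes by default, and assumes it is run from /debug/tracing when applying).

```shell
#!/bin/sh
# w <control-file> <redirect> <value> - write a value to a tracing control
# file, or (by default) just print the equivalent echo command.
w() {
    file=$1; redir=$2; shift 2
    if [ "$APPLY" = 1 ]; then
        if [ "$redir" = '>>' ]; then printf '%s\n' "$*" >> "$file"
        else printf '%s\n' "$*" > "$file"; fi
    else
        echo "echo '$*' $redir $file"
    fi
}
w tracing_enabled   '>'  0               # stop tracing while configuring
w current_tracer    '>'  function_graph
w trace_options     '>'  funcgraph-proc
w set_ftrace_filter '>'  ''              # clear filter functions
w set_ftrace_filter '>>' '*sched*'
w set_ftrace_filter '>>' '*wake*'
w set_ftrace_filter '>>' '*kvm*'
w buffer_size_kb    '>'  8192            # enlarge ring buffer (default ~1.4 MB)
w tracing_enabled   '>'  1               # let the tracer go
```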

Let me know if any of these steps is causing problems or if interpreting 
the traces is difficult.

	Ingo

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-21 15:13                           ` Steven Rostedt
  0 siblings, 0 replies; 180+ messages in thread
From: Steven Rostedt @ 2009-01-21 15:13 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Kevin Shanahan, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon


On Wed, 21 Jan 2009, Avi Kivity wrote:

> Kevin Shanahan wrote:
> > > > --- hermes-old.wumi.org.au ping statistics ---
> > > > 900 packets transmitted, 900 received, 0% packet loss, time 899326ms
> > > > rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms
> > > > 
> > > > So, a _huge_ difference. But what does it mean?
> > > >       
> > > It means, a scheduling problem.  Can you run the latency tracer (which
> > > only works with realtime priority), so we can tell if it is (a) kvm
> > > failing to wake up the vcpu properly or (b) the scheduler delaying the
> > > vcpu from running.
> > >     
> > 
> > Sorry, but are you sure that's going to be useful?
> > 
> > If it only works on realtime threads and I'm not seeing the problem when
> > running kvm with realtime priority, is this going to tell you what you
> > want to know?
> > 
> > Not trying to be difficult, but that just didn't make sense to me.
> >   
> 
> You're right, wasn't thinking properly.
> 
> This is a tough one.  I'll see if I can think of something.  Ingo, any ideas?
> 

I should have replied to this email :-)

Yeah, I'm working on making the wakeup latency tracer work with non-rt 
tasks.

The "wakeup" tracer will now trace all tasks, whereas a new "wakeup_rt" 
tracer will only trace rt tasks. I did it for rt tasks only because it 
only records the highest-latency wakeups, and the non-rt latencies were 
always bigger than the rt ones, which made what I was actually trying 
to trace (the rt scheduling) useless.

But without an option for all tasks, the wakeup tracer is useless for 
everyone else ;-)

-- Steve


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-21 15:10                       ` Steven Rostedt
  0 siblings, 0 replies; 180+ messages in thread
From: Steven Rostedt @ 2009-01-21 15:10 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Kevin Shanahan, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon


On Wed, 21 Jan 2009, Avi Kivity wrote:

> Kevin Shanahan wrote:
> > On Tue, 2009-01-20 at 19:47 +0200, Avi Kivity wrote:
> >   
> > > Steven Rostedt wrote:
> > >     
> > > > Note, the wakeup latency only tests realtime threads, since other
> > > > threads
> > > > can have other issues for wakeup. I could change the wakeup tracer as
> > > > wakeup_rt, and make a new "wakeup" that tests all threads, but it may
> > > > be difficult to get something accurate.
> > > >       
> > > Kevin, can you retest with kvm at realtime priority?
> > >     
> > 
> > Running vanilla Linux 2.6.28, kvm-82. First a control test to check that
> > the problem is still there when running at normal priority:
> > 
> > --- hermes-old.wumi.org.au ping statistics ---
> > 900 packets transmitted, 900 received, 0% packet loss, time 899283ms
> > rtt min/avg/max/mdev = 0.119/269.773/13739.426/1230.836 ms, pipe 14
> > 
> > Yeah, sure is.
> > 
> > Okay, so now I set the realtime attributes of the processes for the VM
> > instance being pinged:
> > 
> > flexo:~# ps ax | grep 6284
> >  6284 ?        Sl     6:11 /usr/local/kvm/bin/qemu-system-x86_64 -smp 2
> > -m 2048 -hda kvm-17-1.img -hdb kvm-17-tmp.img -net
> > nic,vlan=0,macaddr=52:54:00:12:34:67,model=rtl8139 -net
> > tap,vlan=0,ifname=tap17,script=no -vnc 127.0.0.1:17 -usbdevice tablet
> > -daemonize
> > flexo:~# pstree -p 6284
> > qemu-system-x86(6284)─┬─{qemu-system-x86}(6285)
> >                       ├─{qemu-system-x86}(6286)
> >                       └─{qemu-system-x86}(6540)
> > 
> > (info cpus on the QEMU console shows 6285 and 6286 being the VCPU
> > processes. Not sure what the third child is for, maybe vnc?.)
> > 
> > flexo:~# chrt -r -p 3 6284
> > flexo:~# chrt -r -p 3 6285
> > flexo:~# chrt -r -p 3 6286
> > flexo:~# chrt -p 6284
> > pid 6284's current scheduling policy: SCHED_RR
> > pid 6284's current scheduling priority: 3
> > flexo:~# chrt -p 6285
> > pid 6285's current scheduling policy: SCHED_RR
> > pid 6285's current scheduling priority: 3
> > flexo:~# chrt -p 6286
> > pid 6286's current scheduling policy: SCHED_RR
> > pid 6286's current scheduling priority: 3
> > 
> > And the result of the ping test now:
> > 
> > --- hermes-old.wumi.org.au ping statistics ---
> > 900 packets transmitted, 900 received, 0% packet loss, time 899326ms
> > rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms
> > 
> > So, a _huge_ difference. But what does it mean?
> 
> It means, a scheduling problem.  Can you run the latency tracer (which only
> works with realtime priority), so we can tell if it is (a) kvm failing to wake
> up the vcpu properly or (b) the scheduler delaying the vcpu from running.
> 

Note, I'm working on a tracer that will also measure non-rt task wakeup 
times.

-- Steve


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-21 14:59                         ` Avi Kivity
  0 siblings, 0 replies; 180+ messages in thread
From: Avi Kivity @ 2009-01-21 14:59 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Steven Rostedt, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon

Kevin Shanahan wrote:
>>> --- hermes-old.wumi.org.au ping statistics ---
>>> 900 packets transmitted, 900 received, 0% packet loss, time 899326ms
>>> rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms
>>>
>>> So, a _huge_ difference. But what does it mean?
>>>       
>> It means, a scheduling problem.  Can you run the latency tracer (which 
>> only works with realtime priority), so we can tell if it is (a) kvm 
>> failing to wake up the vcpu properly or (b) the scheduler delaying the 
>> vcpu from running.
>>     
>
> Sorry, but are you sure that's going to be useful?
>
> If it only works on realtime threads and I'm not seeing the problem when
> running kvm with realtime priority, is this going to tell you what you
> want to know?
>
> Not trying to be difficult, but that just didn't make sense to me.
>   

You're right, wasn't thinking properly.

This is a tough one.  I'll see if I can think of something.  Ingo, any 
ideas?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-21 14:51                       ` Kevin Shanahan
  0 siblings, 0 replies; 180+ messages in thread
From: Kevin Shanahan @ 2009-01-21 14:51 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Steven Rostedt, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon

On Wed, 2009-01-21 at 16:34 +0200, Avi Kivity wrote:
> Kevin Shanahan wrote:
> > On Tue, 2009-01-20 at 19:47 +0200, Avi Kivity wrote:
> >> Kevin, can you retest with kvm at realtime priority?
...

> > --- hermes-old.wumi.org.au ping statistics ---
> > 900 packets transmitted, 900 received, 0% packet loss, time 899326ms
> > rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms
> >
> > So, a _huge_ difference. But what does it mean?
> 
> It means, a scheduling problem.  Can you run the latency tracer (which 
> only works with realtime priority), so we can tell if it is (a) kvm 
> failing to wake up the vcpu properly or (b) the scheduler delaying the 
> vcpu from running.

Sorry, but are you sure that's going to be useful?

If it only works on realtime threads and I'm not seeing the problem when
running kvm with realtime priority, is this going to tell you what you
want to know?

Not trying to be difficult, but that just didn't make sense to me.

Regards,
Kevin.



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-21 14:34                     ` Avi Kivity
  0 siblings, 0 replies; 180+ messages in thread
From: Avi Kivity @ 2009-01-21 14:34 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Steven Rostedt, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon

Kevin Shanahan wrote:
> On Tue, 2009-01-20 at 19:47 +0200, Avi Kivity wrote:
>   
>> Steven Rostedt wrote:
>>     
>>> Note, the wakeup latency only tests realtime threads, since other threads
>>> can have other issues for wakeup. I could change the wakeup tracer as
>>> wakeup_rt, and make a new "wakeup" that tests all threads, but it may
>>> be difficult to get something accurate.
>>>       
>> Kevin, can you retest with kvm at realtime priority?
>>     
>
> Running vanilla Linux 2.6.28, kvm-82. First a control test to check that
> the problem is still there when running at normal priority:
>
> --- hermes-old.wumi.org.au ping statistics ---
> 900 packets transmitted, 900 received, 0% packet loss, time 899283ms
> rtt min/avg/max/mdev = 0.119/269.773/13739.426/1230.836 ms, pipe 14
>
> Yeah, sure is.
>
> Okay, so now I set the realtime attributes of the processes for the VM
> instance being pinged:
>
> flexo:~# ps ax | grep 6284
>  6284 ?        Sl     6:11 /usr/local/kvm/bin/qemu-system-x86_64 -smp 2
> -m 2048 -hda kvm-17-1.img -hdb kvm-17-tmp.img -net
> nic,vlan=0,macaddr=52:54:00:12:34:67,model=rtl8139 -net
> tap,vlan=0,ifname=tap17,script=no -vnc 127.0.0.1:17 -usbdevice tablet
> -daemonize
> flexo:~# pstree -p 6284
> qemu-system-x86(6284)─┬─{qemu-system-x86}(6285)
>                       ├─{qemu-system-x86}(6286)
>                       └─{qemu-system-x86}(6540)
>
> (info cpus on the QEMU console shows 6285 and 6286 being the VCPU
> processes. Not sure what the third child is for, maybe vnc?.)
>
> flexo:~# chrt -r -p 3 6284
> flexo:~# chrt -r -p 3 6285
> flexo:~# chrt -r -p 3 6286
> flexo:~# chrt -p 6284
> pid 6284's current scheduling policy: SCHED_RR
> pid 6284's current scheduling priority: 3
> flexo:~# chrt -p 6285
> pid 6285's current scheduling policy: SCHED_RR
> pid 6285's current scheduling priority: 3
> flexo:~# chrt -p 6286
> pid 6286's current scheduling policy: SCHED_RR
> pid 6286's current scheduling priority: 3
>
> And the result of the ping test now:
>
> --- hermes-old.wumi.org.au ping statistics ---
> 900 packets transmitted, 900 received, 0% packet loss, time 899326ms
> rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms
>
> So, a _huge_ difference. But what does it mean?

It means, a scheduling problem.  Can you run the latency tracer (which 
only works with realtime priority), so we can tell if it is (a) kvm 
failing to wake up the vcpu properly or (b) the scheduler delaying the 
vcpu from running.

> P.S. Can someone tell me if I'm doing the CC: to bugme-daemon wrong? I
>      thought that was supposed to add the emails as comments to the
>      bugzilla report?
>   

So long as it isn't complaining, you can continue.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-21 14:25                   ` Kevin Shanahan
  0 siblings, 0 replies; 180+ messages in thread
From: Kevin Shanahan @ 2009-01-21 14:25 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Steven Rostedt, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon

On Tue, 2009-01-20 at 19:47 +0200, Avi Kivity wrote:
> Steven Rostedt wrote:
> > Note, the wakeup latency only tests realtime threads, since other threads
> > can have other issues for wakeup. I could change the wakeup tracer as
> > wakeup_rt, and make a new "wakeup" that tests all threads, but it may
> > be difficult to get something accurate.
> 
> Kevin, can you retest with kvm at realtime priority?

Running vanilla Linux 2.6.28, kvm-82. First a control test to check that
the problem is still there when running at normal priority:

--- hermes-old.wumi.org.au ping statistics ---
900 packets transmitted, 900 received, 0% packet loss, time 899283ms
rtt min/avg/max/mdev = 0.119/269.773/13739.426/1230.836 ms, pipe 14

Yeah, sure is.

Okay, so now I set the realtime attributes of the processes for the VM
instance being pinged:

flexo:~# ps ax | grep 6284
 6284 ?        Sl     6:11 /usr/local/kvm/bin/qemu-system-x86_64 -smp 2
-m 2048 -hda kvm-17-1.img -hdb kvm-17-tmp.img -net
nic,vlan=0,macaddr=52:54:00:12:34:67,model=rtl8139 -net
tap,vlan=0,ifname=tap17,script=no -vnc 127.0.0.1:17 -usbdevice tablet
-daemonize
flexo:~# pstree -p 6284
qemu-system-x86(6284)─┬─{qemu-system-x86}(6285)
                      ├─{qemu-system-x86}(6286)
                      └─{qemu-system-x86}(6540)

(info cpus on the QEMU console shows 6285 and 6286 being the VCPU
processes. Not sure what the third child is for, maybe vnc?)

flexo:~# chrt -r -p 3 6284
flexo:~# chrt -r -p 3 6285
flexo:~# chrt -r -p 3 6286
flexo:~# chrt -p 6284
pid 6284's current scheduling policy: SCHED_RR
pid 6284's current scheduling priority: 3
flexo:~# chrt -p 6285
pid 6285's current scheduling policy: SCHED_RR
pid 6285's current scheduling priority: 3
flexo:~# chrt -p 6286
pid 6286's current scheduling policy: SCHED_RR
pid 6286's current scheduling priority: 3
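
The per-thread chrt invocations above could also be generated instead of typed per tid. A sketch (the rr_cmds helper name is made up, not from this thread); it walks /proc/<pid>/task rather than reading tids off pstree:

```shell
#!/bin/sh
# Print the chrt commands needed to put every thread of a process into
# SCHED_RR priority 3. Pipe the output to "sh" as root to apply it.
rr_cmds() {
    for tid in /proc/"$1"/task/*; do
        echo "chrt -r -p 3 ${tid##*/}"   # one command per thread id
    done
}
rr_cmds $$    # demo on the current shell's own pid
```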

And the result of the ping test now:

--- hermes-old.wumi.org.au ping statistics ---
900 packets transmitted, 900 received, 0% packet loss, time 899326ms
rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms

So, a _huge_ difference. But what does it mean?

Regards,
Kevin.

P.S. Can someone tell me if I'm doing the CC: to bugme-daemon wrong? I
     thought that was supposed to add the emails as comments to the
     bugzilla report?



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-21 14:25                   ` Kevin Shanahan
  0 siblings, 0 replies; 180+ messages in thread
From: Kevin Shanahan @ 2009-01-21 14:25 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Steven Rostedt, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker,
	bugme-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r

On Tue, 2009-01-20 at 19:47 +0200, Avi Kivity wrote:
> Steven Rostedt wrote:
> > Note, the wakeup latency only tests realtime threads, since other threads
> > can have other issues for wakeup. I could change the wakeup tracer as
> > wakeup_rt, and make a new "wakeup" that tests all threads, but it may
> > be difficult to get something accurate.
> 
> Kevin, can you retest with kvm at realtime priority?

Running vanilla Linux 2.6.28, kvm-82. First a control test to check that
the problem is still there when running at normal priority:

--- hermes-old.wumi.org.au ping statistics ---
900 packets transmitted, 900 received, 0% packet loss, time 899283ms
rtt min/avg/max/mdev = 0.119/269.773/13739.426/1230.836 ms, pipe 14

Yeah, sure is.

Okay, so now I set the realtime attributes of the processes for the VM
instance being pinged:

flexo:~# ps ax | grep 6284
 6284 ?        Sl     6:11 /usr/local/kvm/bin/qemu-system-x86_64 -smp 2
-m 2048 -hda kvm-17-1.img -hdb kvm-17-tmp.img -net
nic,vlan=0,macaddr=52:54:00:12:34:67,model=rtl8139 -net
tap,vlan=0,ifname=tap17,script=no -vnc 127.0.0.1:17 -usbdevice tablet
-daemonize
flexo:~# pstree -p 6284
qemu-system-x86(6284)─┬─{qemu-system-x86}(6285)
                      ├─{qemu-system-x86}(6286)
                      └─{qemu-system-x86}(6540)

(info cpus on the QEMU console shows 6285 and 6286 being the VCPU
processes. Not sure what the third child is for, maybe vnc?.)

flexo:~# chrt -r -p 3 6284
flexo:~# chrt -r -p 3 6285
flexo:~# chrt -r -p 3 6286
flexo:~# chrt -p 6284
pid 6284's current scheduling policy: SCHED_RR
pid 6284's current scheduling priority: 3
flexo:~# chrt -p 6285
pid 6285's current scheduling policy: SCHED_RR
pid 6285's current scheduling priority: 3
flexo:~# chrt -p 6286
pid 6286's current scheduling policy: SCHED_RR
pid 6286's current scheduling priority: 3

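Setting each tid by hand like this gets tedious; a small helper can walk
/proc/<pid>/task instead. This is only a sketch (the priority value and the
DRYRUN escape hatch are illustrative, not from the thread):

```shell
#!/bin/sh
# Sketch: apply SCHED_RR priority 3 to a process and every one of its
# threads, instead of running chrt once per tid by hand.
# Set DRYRUN=1 to print the commands rather than execute them.
set_rt() {
    pid=$1
    for t in "/proc/$pid/task/"*; do
        tid=${t##*/}                 # strip the /proc/<pid>/task/ prefix
        if [ -n "$DRYRUN" ]; then
            echo "chrt -r -p 3 $tid"
        else
            chrt -r -p 3 "$tid"
        fi
    done
}

# e.g.:  set_rt 6284
```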
And the result of the ping test now:

--- hermes-old.wumi.org.au ping statistics ---
900 packets transmitted, 900 received, 0% packet loss, time 899326ms
rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms

So, a _huge_ difference. But what does it mean?

Regards,
Kevin.

P.S. Can someone tell me if I'm doing the CC: to bugme-daemon wrong? I
     thought that was supposed to add the emails as comments to the
     bugzilla report?


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 18:42               ` Ingo Molnar
  0 siblings, 0 replies; 180+ messages in thread
From: Ingo Molnar @ 2009-01-20 18:42 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, bugme-daemon,
	Peter Zijlstra


* Kevin Shanahan <kmshanah@ucwb.org.au> wrote:

> Running the ping test without apache2 running in the guest:
> 
> --- hermes-old.wumi.org.au ping statistics ---
> 900 packets transmitted, 900 received, 0% packet loss, time 902740ms
> rtt min/avg/max/mdev = 0.568/3.745/272.558/16.990 ms
> 
> And with apache2 running:
> 
> --- hermes-old.wumi.org.au ping statistics ---
> 900 packets transmitted, 900 received, 0% packet loss, time 902758ms
> rtt min/avg/max/mdev = 0.625/25.634/852.739/76.586 ms
> 
> In both cases it's quite variable, but the max latency is still not as 
> bad as when running with the irq chip enabled.

So the worst-case ping latency is more than 10 times lower?

I'd say this points in the direction of some sort of KVM-internal 
wakeup/signalling latency that happens if KVM does not deschedule. For 
example it could be a bug like this: if a guest image runs at 100% CPU 
time for a long time, IRQ injections might not propagate up until the 
preemption callbacks run. (but i'm just speculating here)

	Ingo

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 18:39                     ` Ingo Molnar
  0 siblings, 0 replies; 180+ messages in thread
From: Ingo Molnar @ 2009-01-20 18:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Kevin Shanahan, Avi Kivity, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Kevin Shanahan,
	Mike Galbraith, Peter Zijlstra, Frédéric Weisbecker


* Steven Rostedt <rostedt@goodmis.org> wrote:

> > hm, that's a significant regression then. The latency tracer used to 
> > measure the highest-prio task in the system - be that RT or non-rt.
> 
> Well, it is a regression from what was in -rt, yes. But not from whatever 
> was in mainline.

indeed, it is not a regression, it is worse: it makes the mainline version 
utterly useless in 99% of the cases ... This really needs to be fixed.

	Ingo

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 17:54             ` Kevin Shanahan
  0 siblings, 0 replies; 180+ messages in thread
From: Kevin Shanahan @ 2009-01-20 17:54 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, bugme-daemon,
	Peter Zijlstra

On Tue, 2009-01-20 at 15:04 +0200, Avi Kivity wrote:
> Kevin Shanahan wrote:
> > On Tue, 2009-01-20 at 12:35 +0100, Ingo Molnar wrote:
> >> This only seems to occur under KVM, right? I.e. you tested it with -no-kvm 
> >> and the problem went away, correct?
> >>     
> >
> > Well, I couldn't make the test conditions identical, but the
> > problem didn't occur with the test I was able to do:
> >
> >   http://marc.info/?l=linux-kernel&m=123228728416498&w=2
> >
> >   
> 
> Can you also try with -no-kvm-irqchip?
> 
> You will need to comment out the lines
> 
>     /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
>      * to GSI 2.  GSI maps to ioapic 1-1.  This is not
>      * the cleanest way of doing it but it should work. */
> 
>     if (vector == 0)
>         vector = 2;
> 
> in qemu/hw/apic.c (should also fix -no-kvm smp).  This will change kvm 
> wakeups to use signals rather than the in-kernel code, which may be buggy.

Okay, I commented out those lines and compiled a new kvm-82 userspace
and kernel modules. Using those on a vanilla 2.6.28 kernel, with all
guests run with -no-kvm-irqchip added.

As before, a number of the XP guests wanted to chug away at 100% CPU
usage for a long time. Three of the guests clocked up ~40 minutes CPU
time before I decided to just shut them down. Perhaps coincidentally,
these three guests are the only ones with Office 2003 installed on them.
That could be the difference between those guests and the other XP
guests, but that's probably not important for now.

The two Linux SMP guests booted okay this time, though they seem to only
use one CPU on the host (I guess kvm is not multi-threaded in this
mode?). "hermes-old", the guest I am pinging in all my tests, had a lot
of trouble running the apache2 setup - it was so slow it was difficult
to load a complete page from our RT system. The kvm process for this
guest was taking up 100% cpu on the host constantly and all sorts of
weird stuff could be seen by watching top in the guest:

top - 03:44:17 up 43 min,  1 user,  load average: 3.95, 1.55, 0.80
Tasks: 101 total,   4 running,  97 sleeping,   0 stopped,   0 zombie
Cpu(s): 79.7%us, 10.4%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  9.9%si,
0.0%st
Mem:   2075428k total,   391128k used,  1684300k free,    13044k buffers
Swap:  3502160k total,        0k used,  3502160k free,   118488k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND             
 2956 postgres  20   0 19704  11m  10m S 1658  0.6   2:55.99 postmaster          
 2934 www-data  20   0 60392  40m 5132 R   31  2.0   0:17.28 apache2             
 2958 postgres  20   0 19700  11m 9.8m R   28  0.6   0:20.41 postmaster          
 2940 www-data  20   0 58652  38m 5016 S   27  1.9   0:04.87 apache2             
 2937 www-data  20   0 60124  40m 5112 S   18  2.0   0:11.00 apache2             
 2959 postgres  20   0 19132 5424 4132 S   10  0.3   0:01.50 postmaster          
 2072 postgres  20   0  8064 1416  548 S    7  0.1   0:23.71 postmaster          
 2960 postgres  20   0 19132 5368 4060 R    6  0.3   0:01.55 postmaster          
 2071 postgres  20   0  8560 1972  488 S    5  0.1   0:08.33 postmaster    

Running the ping test without apache2 running in the guest:

--- hermes-old.wumi.org.au ping statistics ---
900 packets transmitted, 900 received, 0% packet loss, time 902740ms
rtt min/avg/max/mdev = 0.568/3.745/272.558/16.990 ms

And with apache2 running:

--- hermes-old.wumi.org.au ping statistics ---
900 packets transmitted, 900 received, 0% packet loss, time 902758ms
rtt min/avg/max/mdev = 0.625/25.634/852.739/76.586 ms

In both cases it's quite variable, but the max latency is still not as
bad as when running with the irq chip enabled.

Anyway, the test is again not ideal, but I hope we're proving something.
That's all I can do for tonight - should be ready for more again
tomorrow night.

Regards,
Kevin.



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 17:53                   ` Steven Rostedt
  0 siblings, 0 replies; 180+ messages in thread
From: Steven Rostedt @ 2009-01-20 17:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kevin Shanahan, Avi Kivity, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Kevin Shanahan,
	Mike Galbraith, Peter Zijlstra, Frédéric Weisbecker


On Tue, 20 Jan 2009, Ingo Molnar wrote:

> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Tue, 20 Jan 2009, Ingo Molnar wrote:
> > > Another test would be to build the scheduler latency tracer into your 
> > > kernel:
> > > 
> > >     CONFIG_SCHED_TRACER=y
> > > 
> > > And enable it via:
> > > 
> > >     echo wakeup > /debug/tracing/current_tracer
> > > 
> > > and you should be seeing the worst-case scheduling latency traces in 
> > > /debug/tracing/trace, and the largest observed latency will be in 
> > > /debug/tracing/tracing_max_latency [in microseconds].
> > 
> > Note, the wakeup latency only tests realtime threads, since other 
> > threads can have other issues for wakeup. I could rename the wakeup 
> > tracer to wakeup_rt, and make a new "wakeup" that tests all threads, but 
> > it may be difficult to get something accurate.
> 
> hm, that's a significant regression then. The latency tracer used to 
> measure the highest-prio task in the system - be that RT or non-rt.

Well, it is a regression from what was in -rt, yes. But not from whatever 
was in mainline.

But I needed to change this to detect the problem that we 
solved with push and pull of rt tasks. The wake up of a non-rt task 
always took longer than an -rt task, and by tracing all tasks, I never got 
the wake up latency of an rt task.

As I mentioned earlier, I can make a wakeup-rt to do the rt tracing, and 
make wakeup do all tasks.
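
For reference, Ingo's tracer instructions quoted above boil down to a couple
of tracefs writes. A minimal sketch; paths assume debugfs mounted at /debug
as in the thread (later kernels use /sys/kernel/debug/tracing), and TRACING
is overridable so the sequence can be dry-run against a scratch directory:

```shell
#!/bin/sh
# Arm the scheduler wakeup latency tracer and read back the worst case.
# Needs CONFIG_SCHED_TRACER=y; values are in microseconds.
TRACING=${TRACING:-/debug/tracing}

arm_wakeup_tracer() {
    echo 0      > "$TRACING/tracing_max_latency"   # reset recorded maximum
    echo wakeup > "$TRACING/current_tracer"        # select the tracer
}

read_max_latency() {
    cat "$TRACING/tracing_max_latency"
}

# e.g.:  arm_wakeup_tracer; <run workload>; read_max_latency
```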

-- Steve


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 17:47                 ` Avi Kivity
  0 siblings, 0 replies; 180+ messages in thread
From: Avi Kivity @ 2009-01-20 17:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Kevin Shanahan, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Kevin Shanahan,
	Mike Galbraith, Peter Zijlstra, Frédéric Weisbecker

Steven Rostedt wrote:
> Note, the wakeup latency only tests realtime threads, since other threads
> can have other issues for wakeup. I could rename the wakeup tracer to
> wakeup_rt, and make a new "wakeup" that tests all threads, but it may
> be difficult to get something accurate.
>   

Kevin, can you retest with kvm at realtime priority?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 16:19                     ` Peter Zijlstra
  0 siblings, 0 replies; 180+ messages in thread
From: Peter Zijlstra @ 2009-01-20 16:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kevin Shanahan, Avi Kivity, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	bugme-daemon

On Tue, 2009-01-20 at 17:06 +0100, Ingo Molnar wrote:
> se.wait_max                        :           -92.027877
> 
> that field is not supposed to be negative. Mike, Peter, any ideas?

Possibly unrelated, but whilst I was poking at try_to_wake_up yesterday,
I thought I spotted a site where we fail to update rq clock.

Since we just moved the task to a new cpu (and thus rq) we need to
update_rq_clock() again.

diff --git a/kernel/sched.c b/kernel/sched.c
index d7ae5f4..6cd5e52 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2398,6 +2398,7 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state, int sync)
 	if (cpu != orig_cpu) {
 		set_task_cpu(p, cpu);
 		task_rq_unlock(rq, &flags);
+		update_rq_clock(rq);
 		/* might preempt at this point */
 		rq = task_rq_lock(p, &flags);
 		old_state = p->state;



^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 16:06                   ` Ingo Molnar
  0 siblings, 0 replies; 180+ messages in thread
From: Ingo Molnar @ 2009-01-20 16:06 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, Peter Zijlstra,
	bugme-daemon


* Kevin Shanahan <kmshanah@ucwb.org.au> wrote:

> I've uploaded the debug info here:
>   http://disenchant.net/tmp/bug-12465/

one interesting number to watch for is the KVM thread's wait_max in 
/proc/*/sched. The largest one seems to be 11 milliseconds:

se.wait_max                        :             3.175034
se.wait_max                        :             4.029938
se.wait_max                        :             4.217674
se.wait_max                        :             4.957836
se.wait_max                        :            10.339471
se.wait_max                        :            11.603943

which would be about right given your latency settings:

 /proc/sys/kernel/sched_latency_ns:
 60000000

[ 60 msecs ]

but ... i dont specifically see the kvm threads there. Are they not in 
/proc/*? Maybe it's in threads and it needs to be accessed via 
/proc/*/task/*/sched, as via:

$ grep -h wait_max /proc/*/task/*/sched | sort -t: -n -k 2 | tail -10
se.wait_max                        :            77.858092
se.wait_max                        :            78.778409
se.wait_max                        :            79.379026
se.wait_max                        :            85.930963
se.wait_max                        :            87.671842
se.wait_max                        :            88.008602
se.wait_max                        :            95.095744
se.wait_max                        :           157.882573
se.wait_max                        :           268.714775
se.wait_max                        :           393.085252

so the worst-case latency observed here is ~393 msecs.

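The grep|sort pipeline above can be sanity-checked on canned data; the sort
key is simply "numeric on the field after the colon":

```shell
#!/bin/sh
# Keep the ten largest se.wait_max entries: split on ':' and sort
# numerically on the value field, exactly as in the pipeline above.
top_waits() {
    sort -t: -n -k2 | tail -10
}

# e.g.:  grep -h wait_max /proc/*/task/*/sched | top_waits
```
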
Btw., there's a few weird stats in your logs:

se.wait_max                        :          -284.864857
se.wait_max                        :          -284.843431
se.wait_max                        :          -284.820204
se.wait_max                        :          -284.345294
se.wait_max                        :          -284.298462
se.wait_max                        :          -284.018644
se.wait_max                        :          -284.018070
se.wait_max                        :          -188.022417
se.wait_max                        :          -188.021659
se.wait_max                        :           -92.030204
se.wait_max                        :           -92.027877

that field is not supposed to be negative. Mike, Peter, any ideas?

	Ingo

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 15:51                 ` Kevin Shanahan
  0 siblings, 0 replies; 180+ messages in thread
From: Kevin Shanahan @ 2009-01-20 15:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, Peter Zijlstra,
	bugme-daemon

On Tue, 2009-01-20 at 15:25 +0100, Ingo Molnar wrote:
> > I could run top, vmstat and cat /proc/sched_debug in a loop until the
> > problem occurs and then trim it. Something like:
> > 
> > while true; do
> >   date                                >> $FILE
> >   echo "-- top: --"                   >> $FILE
> >   top -H -c -b -d 1 -n 0.5            >> $FILE 2>/dev/null
> >   echo "-- vmstat: --"                >> $FILE
> >   vmstat                              >> $FILE 2>/dev/null
> >   echo "-- sched_debug #$i: --"       >> $FILE
> >   cat /proc/sched_debug               >> $FILE 2>/dev/null
> > done
> > 
> > That should take a snapshot every half second or so.
> 
> Yeah, that would be lovely. You dont even have to trim it much - just give 
> us a timestamp to look at for the delay incident. You might also want to 
> start the kvm session while the script is already running - that way we'll 
> get fresh statistics and see the whole thing.
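
(As written, the quoted loop has two nits: top's -n option takes an integer
iteration count, so "-n 0.5" is rejected, and $i is never set. A tidied
one-shot version, with FILE as a placeholder log path:)

```shell
#!/bin/sh
# One snapshot of the same debug state; run it in a loop with a sleep to
# get the roughly-every-half-second sampling described above.
FILE=${FILE:-/tmp/bug-12465-debug.log}
i=0

snapshot() {
    i=$((i + 1))
    date                          >> "$FILE"
    echo "-- top #$i: --"         >> "$FILE"
    top -H -c -b -d 1 -n 1        >> "$FILE" 2>/dev/null
    echo "-- vmstat #$i: --"      >> "$FILE"
    vmstat                        >> "$FILE" 2>/dev/null
    echo "-- sched_debug #$i: --" >> "$FILE"
    cat /proc/sched_debug         >> "$FILE" 2>/dev/null
}

# e.g.:  while true; do snapshot; sleep 0.5; done
```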

I've uploaded the debug info here:
  http://disenchant.net/tmp/bug-12465/

Some interesting sections should be around these times:

  01:36:04 -> 01:36:27
  01:37:30 -> 01:37:42
  01:37:52 -> 01:37:56
  01:39:37 -> 01:39:40
  01:40:01 -> 01:40:14

The output from ping is there too so you can see how the delays usually
show up (e.g. in clusters). The large debug file runs from before I
launched the VMs, right through the ping test. The trimmed file just
cuts out everything before I started ping.

Regards,
Kevin.



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 15:04                 ` Ingo Molnar
  0 siblings, 0 replies; 180+ messages in thread
From: Ingo Molnar @ 2009-01-20 15:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Kevin Shanahan, Avi Kivity, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Kevin Shanahan,
	Mike Galbraith, Peter Zijlstra, Frédéric Weisbecker


* Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 20 Jan 2009, Ingo Molnar wrote:
> > Another test would be to build the scheduler latency tracer into your 
> > kernel:
> > 
> >     CONFIG_SCHED_TRACER=y
> > 
> > And enable it via:
> > 
> >     echo wakeup > /debug/tracing/current_tracer
> > 
> > and you should be seeing the worst-case scheduling latency traces in 
> > /debug/tracing/trace, and the largest observed latency will be in 
> > /debug/tracing/tracing_max_latency [in microseconds].
> 
> Note, the wakeup latency only tests realtime threads, since other 
> threads can have other issues for wakeup. I could rename the wakeup 
> tracer to wakeup_rt, and make a new "wakeup" that tests all threads, but 
> it may be difficult to get something accurate.

hm, that's a significant regression then. The latency tracer used to 
measure the highest-prio task in the system - be that RT or non-rt.

	Ingo

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 14:59               ` Steven Rostedt
  0 siblings, 0 replies; 180+ messages in thread
From: Steven Rostedt @ 2009-01-20 14:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kevin Shanahan, Avi Kivity, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Kevin Shanahan,
	Mike Galbraith, Peter Zijlstra, Frédéric Weisbecker



On Tue, 20 Jan 2009, Ingo Molnar wrote:
> Another test would be to build the scheduler latency tracer into your 
> kernel:
> 
>     CONFIG_SCHED_TRACER=y
> 
> And enable it via:
> 
>     echo wakeup > /debug/tracing/current_tracer
> 
> and you should be seeing the worst-case scheduling latency traces in 
> /debug/tracing/trace, and the largest observed latency will be in 
> /debug/tracing/tracing_max_latency [in microseconds].

Note, the wakeup latency only tests realtime threads, since other threads
can have other issues for wakeup. I could rename the wakeup tracer to
wakeup_rt, and make a new "wakeup" that tests all threads, but it may
be difficult to get something accurate.

-- Steve

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 14:46               ` Frédéric Weisbecker
  0 siblings, 0 replies; 180+ messages in thread
From: Frédéric Weisbecker @ 2009-01-20 14:46 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Ingo Molnar, Avi Kivity, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra

2009/1/20 Kevin Shanahan <kmshanah@ucwb.org.au>:
> On Tue, 2009-01-20 at 13:56 +0100, Ingo Molnar wrote:
>> * Kevin Shanahan <kmshanah@ucwb.org.au> wrote:
>> > > This suggests some sort of KVM-specific problem. Scheduler latencies
>> > > in the seconds that occur under normal load situations are noticed and
>> > > reported quickly - and there are no such open regressions currently.
>> >
>> > It at least suggests a problem with interaction between the scheduler
>> > and kvm, otherwise reverting that scheduler patch wouldn't have made the
>> > regression go away.
>>
>> the scheduler affects almost everything, so almost by definition a
>> scheduler change can tickle a race or other timing bug in just about any
>> code - and reverting that change in the scheduler can make the bug go
>> away. But yes, it could also be a genuine scheduler bug - that is always a
>> possibility.
>
> Okay, I understand.
>
>> Could you please run a cfs-debug-info.sh session on a CONFIG_SCHED_DEBUG=y
>> and CONFIG_SCHEDSTATS=y kernel, while you are experiencing those
>> latencies:
>>
>>   http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh
>>
>> and post that (relatively large) somewhere, or send it as a reply after
>> bzip2 -9 compressing it? It will include a lot of information about the
>> delays your tasks are experiencing.
>
> Running it while the problem is occurring will be tricky, as it only
> lasts for a few seconds at a time. Is it going to be useful at all to
> just see those statistics if the system is running normally?
>
> I might need to modify the script a little. Am I right that everything
> above "gathering statistics..." is pretty much static information?
>
> I could run top, vmstat and cat /proc/sched_debug in a loop until the
> problem occurs and then trim it. Something like:
>
> while true; do
>  date                                >> $FILE
>  echo "-- top: --"                   >> $FILE
>  top -H -c -b -d 0.5 -n 2            >> $FILE 2>/dev/null
>  echo "-- vmstat: --"                >> $FILE
>  vmstat                              >> $FILE 2>/dev/null
>  echo "-- sched_debug #$i: --"       >> $FILE
>  cat /proc/sched_debug               >> $FILE 2>/dev/null
> done
>
> That should take a snapshot every half second or so.
>
> Regards,
> Kevin.
>
> P.S. Please keep kmshanah@flexo.wumi.org.au out of the CC list (it won't
>     route properly anyway). I don't know how it got added - the only
>     place it would have appeared was in the "revert" commit message
>     when I was testing 2.6.28 with the commit I bisected down to
>     removed.
>


One other thing you can do is enable CONFIG_FUNCTION_GRAPH_TRACER, as
Ingo suggested, and trace the schedule() function. This way you will see
the time spent in (almost) each function called from schedule() and
perhaps find where the contention is (if it comes from the scheduler).

How to use it?

echo schedule > /debugfs/tracing/set_graph_function
echo function_graph > /debugfs/tracing/current_tracer
cat /debugfs/tracing/trace

Or even through a pipe:
cat /debugfs/tracing/trace_pipe > ~/func_graph.log

To end the tracing: echo nop > /debugfs/tracing/current_tracer
Or just make a pause: echo 0 > /debugfs/tracing/tracing_enabled
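Those steps can be collected into one sketch. Hedged assumptions: the
/debugfs prefix above presumes debugfs is mounted at /debugfs, while on
most systems it sits under /sys/kernel/debug, so this probes a few
candidates; and since the writes need root, it prints the command
sequence for review rather than executing it.

```shell
#!/bin/sh
# Emit the function_graph trace commands from the mail, with the
# tracing directory resolved.  Printed rather than executed, since
# they must be run as root on the affected machine.
print_trace_cmds() {
    TRACING=/debugfs/tracing                      # path assumed in the mail
    for d in /sys/kernel/debug/tracing /sys/kernel/tracing; do
        [ -d "$d" ] && TRACING=$d && break
    done
    cat <<EOF
echo schedule       > $TRACING/set_graph_function
echo function_graph > $TRACING/current_tracer
cat $TRACING/trace_pipe > ~/func_graph.log
echo nop            > $TRACING/current_tracer
EOF
}

print_trace_cmds
```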

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 14:25               ` Ingo Molnar
  0 siblings, 0 replies; 180+ messages in thread
From: Ingo Molnar @ 2009-01-20 14:25 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, Peter Zijlstra


* Kevin Shanahan <kmshanah@ucwb.org.au> wrote:

> On Tue, 2009-01-20 at 13:56 +0100, Ingo Molnar wrote:
> > * Kevin Shanahan <kmshanah@ucwb.org.au> wrote:
> > > > This suggests some sort of KVM-specific problem. Scheduler latencies 
> > > > in the seconds that occur under normal load situations are noticed and 
> > > > reported quickly - and there are no such open regressions currently.
> > > 
> > > It at least suggests a problem with interaction between the scheduler 
> > > and kvm, otherwise reverting that scheduler patch wouldn't have made the 
> > > regression go away.
> > 
> > the scheduler affects almost everything, so almost by definition a 
> > scheduler change can tickle a race or other timing bug in just about any 
> > code - and reverting that change in the scheduler can make the bug go 
> > away. But yes, it could also be a genuine scheduler bug - that is always a 
> > possibility.
> 
> Okay, I understand.
> 
> > Could you please run a cfs-debug-info.sh session on a CONFIG_SCHED_DEBUG=y 
> > and CONFIG_SCHEDSTATS=y kernel, while you are experiencing those 
> > latencies:
> > 
> >   http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh
> > 
> > and post that (relatively large) somewhere, or send it as a reply after 
> > bzip2 -9 compressing it? It will include a lot of information about the 
> > delays your tasks are experiencing.
> 
> Running it while the problem is occurring will be tricky, as it only 
> lasts for a few seconds at a time. Is it going to be useful at all to 
> just see those statistics if the system is running normally?
> 
> I might need to modify the script a little. Am I right that everything 
> above "gathering statistics..." is pretty much static information?

Correct.

> I could run top, vmstat and cat /proc/sched_debug in a loop until the
> problem occurs and then trim it. Something like:
> 
> while true; do
>   date                                >> $FILE
>   echo "-- top: --"                   >> $FILE
>   top -H -c -b -d 0.5 -n 2            >> $FILE 2>/dev/null
>   echo "-- vmstat: --"                >> $FILE
>   vmstat                              >> $FILE 2>/dev/null
>   echo "-- sched_debug #$i: --"       >> $FILE
>   cat /proc/sched_debug               >> $FILE 2>/dev/null
> done
> 
> That should take a snapshot every half second or so.

Yeah, that would be lovely. You don't even have to trim it much - just give 
us a timestamp to look at for the delay incident. You might also want to 
start the kvm session while the script is already running - that way we'll 
get fresh statistics and see the whole thing.

	Ingo

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 14:23             ` Kevin Shanahan
  0 siblings, 0 replies; 180+ messages in thread
From: Kevin Shanahan @ 2009-01-20 14:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, Peter Zijlstra

On Tue, 2009-01-20 at 13:56 +0100, Ingo Molnar wrote:
> * Kevin Shanahan <kmshanah@ucwb.org.au> wrote:
> > > This suggests some sort of KVM-specific problem. Scheduler latencies 
> > > in the seconds that occur under normal load situations are noticed and 
> > > reported quickly - and there are no such open regressions currently.
> > 
> > It at least suggests a problem with interaction between the scheduler 
> > and kvm, otherwise reverting that scheduler patch wouldn't have made the 
> > regression go away.
> 
> the scheduler affects almost everything, so almost by definition a 
> scheduler change can tickle a race or other timing bug in just about any 
> code - and reverting that change in the scheduler can make the bug go 
> away. But yes, it could also be a genuine scheduler bug - that is always a 
> possibility.

Okay, I understand.

> Could you please run a cfs-debug-info.sh session on a CONFIG_SCHED_DEBUG=y 
> and CONFIG_SCHEDSTATS=y kernel, while you are experiencing those 
> latencies:
> 
>   http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh
> 
> and post that (relatively large) somewhere, or send it as a reply after 
> bzip2 -9 compressing it? It will include a lot of information about the 
> delays your tasks are experiencing.

Running it while the problem is occurring will be tricky, as it only
lasts for a few seconds at a time. Is it going to be useful at all to
just see those statistics if the system is running normally?

I might need to modify the script a little. Am I right that everything
above "gathering statistics..." is pretty much static information?

I could run top, vmstat and cat /proc/sched_debug in a loop until the
problem occurs and then trim it. Something like:

while true; do
  date                                >> $FILE
  echo "-- top: --"                   >> $FILE
  top -H -c -b -d 0.5 -n 2            >> $FILE 2>/dev/null
  echo "-- vmstat: --"                >> $FILE
  vmstat                              >> $FILE 2>/dev/null
  echo "-- sched_debug #$i: --"       >> $FILE
  cat /proc/sched_debug               >> $FILE 2>/dev/null
done

That should take a snapshot every half second or so.
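A slightly tightened variant of that loop, offered as a sketch: it
initialises the `$i` counter the loop above references but never sets,
paces itself with an explicit sleep instead of relying on top's delay,
and is bounded here only so it terminates for illustration.

```shell
#!/bin/sh
# Bounded snapshot loop; raise COUNT (or switch back to 'while true')
# when actually waiting for the stall to occur.
FILE=${FILE:-/tmp/sched-snapshots.log}
COUNT=${COUNT:-5}
i=0
while [ "$i" -lt "$COUNT" ]; do
    i=$((i + 1))
    date                                >> "$FILE"
    echo "-- vmstat: --"                >> "$FILE"
    vmstat                              >> "$FILE" 2>/dev/null
    echo "-- sched_debug #$i: --"       >> "$FILE"
    cat /proc/sched_debug               >> "$FILE" 2>/dev/null
    sleep 0.5                           # explicit half-second interval
done
```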

Regards,
Kevin.

P.S. Please keep kmshanah@flexo.wumi.org.au out of the CC list (it won't
     route properly anyway). I don't know how it got added - the only
     place it would have appeared was in the "revert" commit message
     when I was testing 2.6.28 with the commit I bisected down to
     removed.



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 13:07             ` Ingo Molnar
  0 siblings, 0 replies; 180+ messages in thread
From: Ingo Molnar @ 2009-01-20 13:07 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Kevin Shanahan, Mike Galbraith,
	Peter Zijlstra, Steven Rostedt, Frédéric Weisbecker


* Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Kevin Shanahan <kmshanah@ucwb.org.au> wrote:
> 
> > > This suggests some sort of KVM-specific problem. Scheduler latencies 
> > > in the seconds that occur under normal load situations are noticed and 
> > > reported quickly - and there are no such open regressions currently.
> > 
> > It at least suggests a problem with interaction between the scheduler 
> > and kvm, otherwise reverting that scheduler patch wouldn't have made the 
> > regression go away.
> 
> the scheduler affects almost everything, so almost by definition a 
> scheduler change can tickle a race or other timing bug in just about any 
> code - and reverting that change in the scheduler can make the bug go 
> away. But yes, it could also be a genuine scheduler bug - that is always a 
> possibility.
> 
> Could you please run a cfs-debug-info.sh session on a CONFIG_SCHED_DEBUG=y 
> and CONFIG_SCHEDSTATS=y kernel, while you are experiencing those 
> latencies:
> 
>   http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh
> 
> and post that (relatively large) somewhere, or send it as a reply after 
> bzip2 -9 compressing it? It will include a lot of information about the 
> delays your tasks are experiencing.

Another test would be to build the scheduler latency tracer into your 
kernel:

    CONFIG_SCHED_TRACER=y

And enable it via:

    echo wakeup > /debug/tracing/current_tracer

and you should be seeing the worst-case scheduling latency traces in 
/debug/tracing/trace, and the largest observed latency will be in 
/debug/tracing/tracing_max_latency [in microseconds].

You can reset the max-latency (and thus restart tracing) via:

    echo 0 > /debug/tracing/tracing_max_latency

Latencies up to 100 microseconds are ok. If you see 10-second delays 
there (values of 10,000,000 or more) then it's probably a scheduler bug.
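That threshold check can be wrapped in a small helper. A sketch, with
the caveat that the /debug/tracing prefix above assumes debugfs is
mounted at /debug; on most systems the file lives under
/sys/kernel/debug/tracing instead.

```shell
#!/bin/sh
# Read tracing_max_latency (reported in microseconds) and flag values
# in the range called out above as a probable scheduler bug.
check_max_latency() {
    usec=$(cat "$1" 2>/dev/null)
    usec=${usec:-0}                       # treat a missing file as 0
    if [ "$usec" -ge 10000000 ]; then
        echo "suspect: ${usec}us worst-case scheduling latency"
    else
        echo "ok: ${usec}us"
    fi
}

check_max_latency "${TRACING:-/debug/tracing}/tracing_max_latency"
```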

Please reproduce the latency under KVM and send us the trace. The trace 
file will be a lot larger and a lot more verbose if you also enable 
the function tracer (FUNCTION_TRACER, DYNAMIC_FTRACE and 
FUNCTION_GRAPH_TRACER).

	Ingo

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 13:04           ` Avi Kivity
  0 siblings, 0 replies; 180+ messages in thread
From: Avi Kivity @ 2009-01-20 13:04 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Ingo Molnar, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Kevin Shanahan, Mike Galbraith,
	Peter Zijlstra

Kevin Shanahan wrote:
> On Tue, 2009-01-20 at 12:35 +0100, Ingo Molnar wrote:
>   
>> * Kevin Shanahan <kmshanah@ucwb.org.au> wrote:
>>
>>     
>>> On Mon, 2009-01-19 at 22:45 +0100, Rafael J. Wysocki wrote:
>>>       
>>>> This message has been generated automatically as a part of a report
>>>> of regressions introduced between 2.6.27 and 2.6.28.
>>>>
>>>> The following bug entry is on the current list of known regressions
>>>> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
>>>> be listed and let me know (either way).
>>>>
>>>>
>>>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
>>>> Subject		: KVM guests stalling on 2.6.28 (bisected)
>>>> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
>>>> Date		: 2009-01-17 03:37 (3 days old)
>>>>         
>>> Yes, please keep this on the list.
>>>       
>> This only seems to occur under KVM, right? I.e. you tested it with -no-kvm 
>> and the problem went away, correct?
>>     
>
> Well, I couldn't make the test conditions identical, but the problem
> didn't occur with the test I was able to do:
>
>   http://marc.info/?l=linux-kernel&m=123228728416498&w=2
>
>   

Can you also try with -no-kvm-irqchip?

You will need to comment out the lines

    /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
     * to GSI 2.  GSI maps to ioapic 1-1.  This is not
     * the cleanest way of doing it but it should work. */

    if (vector == 0)
        vector = 2;

in qemu/hw/apic.c (should also fix -no-kvm smp).  This will change kvm 
wakeups to use signals rather than the in-kernel code, which may be buggy.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 12:56           ` Ingo Molnar
  0 siblings, 0 replies; 180+ messages in thread
From: Ingo Molnar @ 2009-01-20 12:56 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Kevin Shanahan, Mike Galbraith,
	Peter Zijlstra


* Kevin Shanahan <kmshanah@ucwb.org.au> wrote:

> > This suggests some sort of KVM-specific problem. Scheduler latencies 
> > in the seconds that occur under normal load situations are noticed and 
> > reported quickly - and there are no such open regressions currently.
> 
> It at least suggests a problem with interaction between the scheduler 
> and kvm, otherwise reverting that scheduler patch wouldn't have made the 
> regression go away.

the scheduler affects almost everything, so almost by definition a 
scheduler change can tickle a race or other timing bug in just about any 
code - and reverting that change in the scheduler can make the bug go 
away. But yes, it could also be a genuine scheduler bug - that is always a 
possibility.

Could you please run a cfs-debug-info.sh session on a CONFIG_SCHED_DEBUG=y 
and CONFIG_SCHEDSTATS=y kernel, while you are experiencing those 
latencies:

  http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh

and post that (relatively large) output somewhere, or send it as a reply 
after bzip2 -9 compressing it? It will include a lot of information about 
the delays your tasks are experiencing.
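The requested capture-and-compress session would look roughly like this (a hedged sketch: cfs-debug-info.sh is the script from the URL above, and the stand-in file here merely represents its output so the compression step can be shown):

```shell
# Stand-in for the output of cfs-debug-info.sh, which dumps
# /proc/sched_debug, schedstat and related data while the stall occurs.
printf 'per-task delay stats would go here\n' > cfs-debug-info.txt

# Compress at maximum block size for posting to the list.
bzip2 -9 cfs-debug-info.txt        # produces cfs-debug-info.txt.bz2
ls -l cfs-debug-info.txt.bz2
```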

	Ingo



* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 12:42         ` Kevin Shanahan
  0 siblings, 0 replies; 180+ messages in thread
From: Kevin Shanahan @ 2009-01-20 12:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Kevin Shanahan, Mike Galbraith,
	Peter Zijlstra

On Tue, 2009-01-20 at 12:35 +0100, Ingo Molnar wrote:
> * Kevin Shanahan <kmshanah@ucwb.org.au> wrote:
> 
> > On Mon, 2009-01-19 at 22:45 +0100, Rafael J. Wysocki wrote:
> > > This message has been generated automatically as a part of a report
> > > of regressions introduced between 2.6.27 and 2.6.28.
> > > 
> > > The following bug entry is on the current list of known regressions
> > > introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> > > be listed and let me know (either way).
> > > 
> > > 
> > > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> > > Subject		: KVM guests stalling on 2.6.28 (bisected)
> > > Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> > > Date		: 2009-01-17 03:37 (3 days old)
> > 
> > Yes, please keep this on the list.
> 
> This only seems to occur under KVM, right? I.e. you tested it with -no-kvm 
> and the problem went away, correct?

Well, I couldn't make the test conditions identical, but the
problem didn't occur with the test I was able to do:

  http://marc.info/?l=linux-kernel&m=123228728416498&w=2

> This suggests some sort of KVM-specific problem. Scheduler latencies in 
> the seconds that occur under normal load situations are noticed and 
> reported quickly - and there are no such open regressions currently.

It at least suggests a problem with interaction between the scheduler
and kvm, otherwise reverting that scheduler patch wouldn't have made the
regression go away.

Regards,
Kevin.





* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 12:37         ` Avi Kivity
  0 siblings, 0 replies; 180+ messages in thread
From: Avi Kivity @ 2009-01-20 12:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kevin Shanahan, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Kevin Shanahan, Mike Galbraith,
	Peter Zijlstra

Ingo Molnar wrote:
> * Kevin Shanahan <kmshanah@ucwb.org.au> wrote:
>
>   
>> On Mon, 2009-01-19 at 22:45 +0100, Rafael J. Wysocki wrote:
>>     
>>> This message has been generated automatically as a part of a report
>>> of regressions introduced between 2.6.27 and 2.6.28.
>>>
>>> The following bug entry is on the current list of known regressions
>>> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
>>> be listed and let me know (either way).
>>>
>>>
>>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
>>> Subject		: KVM guests stalling on 2.6.28 (bisected)
>>> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
>>> Date		: 2009-01-17 03:37 (3 days old)
>>>       
>> Yes, please keep this on the list.
>>     
>
> This only seems to occur under KVM, right? I.e. you tested it with -no-kvm 
> and the problem went away, correct?
>
> This suggests some sort of KVM-specific problem. Scheduler latencies in 
> the seconds that occur under normal load situations are noticed and 
> reported quickly - and there are no such open regressions currently.
>
>   

Not necessarily.  -no-kvm runs with only one thread, compared to kvm 
that runs with 1 + nr_cpus threads.

> Avi, can you reproduce these latencies? 

No.

> A possible theory would be some 
> sort of guest wakeup problem/race triggered by a shift in 
> preemption/scheduling patterns. Or something related to preempt-notifiers 
> (which KVM is using). A genuine scheduler bug is in the cards too, but the 
> KVM-only angle of this bug gives it a low probability.
>   

Can we trace task wakeups somehow? (latency between wakeup and actually 
running).
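One way to get at that (an assumption on my part, not something established in the thread) is the ftrace "wakeup" latency tracer, which records the worst wakeup-to-run delay — in kernels of this vintage it follows the highest-priority RT task only. It needs root, a mounted debugfs, and CONFIG_SCHED_TRACER=y, so the sketch is guarded to degrade gracefully:

```shell
# Hedged sketch: measure worst-case wakeup-to-run latency via ftrace.
T=/sys/kernel/debug/tracing
if [ -w "$T/current_tracer" ]; then
    echo wakeup > "$T/current_tracer"
    echo 0 > "$T/tracing_max_latency"   # reset the recorded maximum
    sleep 2                             # let the workload run briefly
    cat "$T/tracing_max_latency"        # worst wakeup latency seen (usec)
    head -n 40 "$T/trace"               # the path that produced it
else
    echo "tracing debugfs not available in this environment"
fi
```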

-- 
error compiling committee.c: too many arguments to function




* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 11:35       ` Ingo Molnar
  0 siblings, 0 replies; 180+ messages in thread
From: Ingo Molnar @ 2009-01-20 11:35 UTC (permalink / raw)
  To: Kevin Shanahan, Avi Kivity
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Kevin Shanahan, Mike Galbraith,
	Peter Zijlstra


* Kevin Shanahan <kmshanah@ucwb.org.au> wrote:

> On Mon, 2009-01-19 at 22:45 +0100, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.27 and 2.6.28.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> > be listed and let me know (either way).
> > 
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> > Subject		: KVM guests stalling on 2.6.28 (bisected)
> > Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> > Date		: 2009-01-17 03:37 (3 days old)
> 
> Yes, please keep this on the list.

This only seems to occur under KVM, right? I.e. you tested it with -no-kvm 
and the problem went away, correct?

This suggests some sort of KVM-specific problem. Scheduler latencies in 
the seconds that occur under normal load situations are noticed and 
reported quickly - and there are no such open regressions currently.

Avi, can you reproduce these latencies? A possible theory would be some 
sort of guest wakeup problem/race triggered by a shift in 
preemption/scheduling patterns. Or something related to preempt-notifiers 
(which KVM is using). A genuine scheduler bug is in the cards too, but the 
KVM-only angle of this bug gives it a low probability.

	Ingo



* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-01-19 21:45   ` Rafael J. Wysocki
  (?)
@ 2009-01-20  0:12   ` Kevin Shanahan
  2009-01-20 11:35       ` Ingo Molnar
  -1 siblings, 1 reply; 180+ messages in thread
From: Kevin Shanahan @ 2009-01-20  0:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Ingo Molnar,
	Kevin Shanahan, Mike Galbraith, Peter Zijlstra

On Mon, 2009-01-19 at 22:45 +0100, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.27 and 2.6.28.
> 
> The following bug entry is on the current list of known regressions
> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> be listed and let me know (either way).
> 
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> Subject		: KVM guests stalling on 2.6.28 (bisected)
> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> Date		: 2009-01-17 03:37 (3 days old)

Yes, please keep this on the list.

Cheers,
Kevin.




* [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-01-19 21:41 2.6.29-rc2-git1: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
@ 2009-01-19 21:45   ` Rafael J. Wysocki
  0 siblings, 0 replies; 180+ messages in thread
From: Rafael J. Wysocki @ 2009-01-19 21:45 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Ingo Molnar, Kevin Shanahan, Kevin Shanahan,
	Mike Galbraith, Peter Zijlstra

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
Subject		: KVM guests stalling on 2.6.28 (bisected)
Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
Date		: 2009-01-17 03:37 (3 days old)





end of thread, other threads:[~2009-03-26 20:23 UTC | newest]

Thread overview: 180+ messages (download: mbox.gz / follow: Atom feed)
2009-03-14 19:11 2.6.29-rc8: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-03-14 19:11 ` Rafael J. Wysocki
2009-03-14 19:12 ` [Bug #12061] snd_hda_intel: power_save: sound cracks on powerdown Rafael J. Wysocki
2009-03-14 19:12   ` Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12411] 2.6.28: BUG in r8169 Rafael J. Wysocki
2009-03-14 19:20   ` Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12209] oldish top core dumps (in its meminfo() function) Rafael J. Wysocki
2009-03-14 19:20   ` Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12208] uml is very slow on 2.6.28 host Rafael J. Wysocki
2009-03-14 19:20   ` Rafael J. Wysocki
2009-03-21 14:44   ` ptrace performance (was: [Bug #12208] uml is very slow on 2.6.28 host) Michael Riepe
2009-03-21 14:44     ` Michael Riepe
2009-03-21 15:22     ` Ingo Molnar
2009-03-21 15:22       ` Ingo Molnar
2009-03-21 17:02       ` ptrace performance Michael Riepe
2009-03-14 19:20 ` [Bug #12337] ~100 extra wakeups reported by powertop Rafael J. Wysocki
2009-03-14 19:20   ` Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12404] Oops in 2.6.28-rc9 and -rc8 -- mtrr issues / e1000e Rafael J. Wysocki
2009-03-14 19:20   ` Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12421] GPF on 2.6.28 and 2.6.28-rc9-git3, e1000e and e1000 issues Rafael J. Wysocki
2009-03-14 19:20   ` Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12426] TMDC Joystick no longer works in kernel 2.6.28 Rafael J. Wysocki
2009-03-14 19:20   ` Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-03-14 19:20   ` Rafael J. Wysocki
2009-03-15  9:03   ` Kevin Shanahan
2009-03-15  9:03     ` Kevin Shanahan
2009-03-15  9:18     ` Avi Kivity
2009-03-15  9:18       ` Avi Kivity
2009-03-15  9:48       ` Ingo Molnar
2009-03-15  9:48         ` Ingo Molnar
2009-03-15  9:56         ` Avi Kivity
2009-03-15  9:56           ` Avi Kivity
2009-03-15 10:03           ` Ingo Molnar
2009-03-15 10:13             ` Avi Kivity
2009-03-15 10:13               ` Avi Kivity
2009-03-16  9:49     ` Avi Kivity
2009-03-16  9:49       ` Avi Kivity
2009-03-16 12:46       ` Kevin Shanahan
2009-03-16 12:46         ` Kevin Shanahan
2009-03-16 20:07         ` Frederic Weisbecker
2009-03-16 20:07           ` Frederic Weisbecker
2009-03-16 22:55           ` Kevin Shanahan
2009-03-16 22:55             ` Kevin Shanahan
2009-03-18  0:20             ` Frederic Weisbecker
2009-03-18  0:20               ` Frederic Weisbecker
2009-03-18  1:16               ` Kevin Shanahan
2009-03-18  1:16                 ` Kevin Shanahan
2009-03-18  2:24                 ` Frederic Weisbecker
2009-03-18  2:24                   ` Frederic Weisbecker
2009-03-18 21:24                 ` Kevin Shanahan
2009-03-21  5:00                   ` Kevin Shanahan
2009-03-21  5:00                     ` Kevin Shanahan
2009-03-21 14:08                     ` Frederic Weisbecker
2009-03-21 14:08                       ` Frederic Weisbecker
2009-03-24 11:44                     ` Frederic Weisbecker
2009-03-24 11:44                       ` Frederic Weisbecker
2009-03-24 11:47                       ` Frederic Weisbecker
2009-03-24 11:47                         ` Frederic Weisbecker
2009-03-25 23:40                       ` Kevin Shanahan
2009-03-25 23:48                         ` Frederic Weisbecker
2009-03-25 23:48                           ` Frederic Weisbecker
2009-03-26 20:22                       ` Kevin Shanahan
2009-03-26 20:22                         ` Kevin Shanahan
2009-03-14 19:20 ` [Bug #12500] r8169: NETDEV WATCHDOG: eth0 (r8169): transmit timed out Rafael J. Wysocki
2009-03-14 19:20   ` Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12619] Regression 2.6.28 and last - boot failed Rafael J. Wysocki
2009-03-14 19:20   ` Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12612] hard lockup when interrupting cdda2wav Rafael J. Wysocki
2009-03-14 19:20   ` Rafael J. Wysocki
2009-03-17  0:53   ` FUJITA Tomonori
2009-03-17  0:53     ` FUJITA Tomonori
2009-03-17 14:52     ` James Bottomley
2009-03-17 14:52       ` James Bottomley
2009-03-14 19:20 ` [Bug #12645] DMI low-memory-protect quirk causes resume hang on Samsung NC10 Rafael J. Wysocki
2009-03-14 19:20   ` Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12634] video distortion and lockup with i830 video chip and 2.6.28.3 Rafael J. Wysocki
2009-03-14 19:20   ` Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12690] DPMS (LCD powersave, poweroff) don't work Rafael J. Wysocki
2009-03-14 19:20   ` Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12798] No wake up after suspend Rafael J. Wysocki
2009-03-14 19:20   ` Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12818] iwlagn broken after suspend to RAM (iwlagn: MAC is in deep sleep!) Rafael J. Wysocki
2009-03-14 19:20   ` Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12835] Regression in backlight detection Rafael J. Wysocki
2009-03-14 19:20   ` Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12868] iproute2 and regressing "ipv6: convert tunnels to net_device_ops" Rafael J. Wysocki
  -- strict thread matches above, loose matches on Subject: below --
2009-03-21 17:01 2.6.29-rc8-git5: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-03-21 17:07 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-03-21 17:07   ` Rafael J. Wysocki
2009-03-21 19:50   ` Ingo Molnar
2009-03-21 19:50     ` Ingo Molnar
2009-03-03 19:34 2.6.29-rc6-git7: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-03-03 19:41 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-03-03 19:41   ` Rafael J. Wysocki
2009-03-04  3:08   ` Kevin Shanahan
2009-03-04  3:08     ` Kevin Shanahan
2009-03-08 10:04     ` Avi Kivity
2009-03-08 10:04       ` Avi Kivity
2009-02-23 22:00 2.6.29-rc6: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-02-23 22:03 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-02-23 22:03   ` Rafael J. Wysocki
2009-02-24  0:59   ` Kevin Shanahan
2009-02-24  0:59     ` Kevin Shanahan
2009-02-24  1:37     ` Rafael J. Wysocki
2009-02-24  1:37       ` Rafael J. Wysocki
2009-02-24 12:09     ` Avi Kivity
2009-02-24 12:09       ` Avi Kivity
2009-02-24 22:11       ` Kevin Shanahan
2009-02-24 22:11         ` Kevin Shanahan
2009-02-14 20:48 2.6.29-rc5: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-04 10:55 2.6.29-rc3-git6: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-02-04 10:58 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-02-04 10:58   ` Rafael J. Wysocki
2009-02-05 19:35   ` Kevin Shanahan
2009-02-05 19:35     ` Kevin Shanahan
2009-02-05 22:37     ` Rafael J. Wysocki
2009-02-05 22:37       ` Rafael J. Wysocki
2009-01-19 21:41 2.6.29-rc2-git1: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-01-19 21:45 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-01-19 21:45   ` Rafael J. Wysocki
2009-01-20  0:12   ` Kevin Shanahan
2009-01-20 11:35     ` Ingo Molnar
2009-01-20 11:35       ` Ingo Molnar
2009-01-20 12:37       ` Avi Kivity
2009-01-20 12:37         ` Avi Kivity
2009-01-20 12:42       ` Kevin Shanahan
2009-01-20 12:42         ` Kevin Shanahan
2009-01-20 12:56         ` Ingo Molnar
2009-01-20 12:56           ` Ingo Molnar
2009-01-20 13:07           ` Ingo Molnar
2009-01-20 13:07             ` Ingo Molnar
2009-01-20 14:59             ` Steven Rostedt
2009-01-20 14:59               ` Steven Rostedt
2009-01-20 15:04               ` Ingo Molnar
2009-01-20 15:04                 ` Ingo Molnar
2009-01-20 17:53                 ` Steven Rostedt
2009-01-20 17:53                   ` Steven Rostedt
2009-01-20 18:39                   ` Ingo Molnar
2009-01-20 18:39                     ` Ingo Molnar
2009-01-20 17:47               ` Avi Kivity
2009-01-20 17:47                 ` Avi Kivity
2009-01-21 14:25                 ` Kevin Shanahan
2009-01-21 14:25                   ` Kevin Shanahan
2009-01-21 14:34                   ` Avi Kivity
2009-01-21 14:34                     ` Avi Kivity
2009-01-21 14:51                     ` Kevin Shanahan
2009-01-21 14:51                       ` Kevin Shanahan
2009-01-21 14:59                       ` Avi Kivity
2009-01-21 14:59                         ` Avi Kivity
2009-01-21 15:13                         ` Steven Rostedt
2009-01-21 15:13                           ` Steven Rostedt
2009-01-22  1:48                         ` Steven Rostedt
2009-01-22  1:48                           ` Steven Rostedt
2009-01-21 15:10                     ` Steven Rostedt
2009-01-21 15:10                       ` Steven Rostedt
2009-01-21 15:18                     ` Ingo Molnar
2009-01-21 15:18                       ` Ingo Molnar
2009-01-22 19:57                       ` Kevin Shanahan
2009-01-22 20:31                         ` Ingo Molnar
2009-01-22 20:31                           ` Ingo Molnar
2009-01-26  9:55                       ` Kevin Shanahan
2009-01-26  9:55                         ` Kevin Shanahan
2009-01-26 11:35                         ` Peter Zijlstra
2009-01-26 15:00                           ` Ingo Molnar
2009-01-26 15:00                             ` Ingo Molnar
2009-01-20 14:23           ` Kevin Shanahan
2009-01-20 14:23             ` Kevin Shanahan
2009-01-20 14:25             ` Ingo Molnar
2009-01-20 14:25               ` Ingo Molnar
2009-01-20 15:51               ` Kevin Shanahan
2009-01-20 15:51                 ` Kevin Shanahan
2009-01-20 16:06                 ` Ingo Molnar
2009-01-20 16:06                   ` Ingo Molnar
2009-01-20 16:19                   ` Peter Zijlstra
2009-01-20 16:19                     ` Peter Zijlstra
2009-01-20 14:46             ` Frédéric Weisbecker
2009-01-20 14:46               ` Frédéric Weisbecker
2009-01-20 13:04         ` Avi Kivity
2009-01-20 13:04           ` Avi Kivity
2009-01-20 17:54           ` Kevin Shanahan
2009-01-20 17:54             ` Kevin Shanahan
2009-01-20 18:42             ` Ingo Molnar
2009-01-20 18:42               ` Ingo Molnar
