* 2.6.29-rc5: Reported regressions 2.6.27 -> 2.6.28
@ 2009-02-14 20:48 ` Rafael J. Wysocki
  0 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:48 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Andrew Morton, Linus Torvalds, Natalie Protasevich,
	Kernel Testers List, Network Development, Linux ACPI,
	Linux PM List, Linux SCSI List, Stable Kernel Team

This message contains a list of some regressions introduced between 2.6.27 and
2.6.28, for which there are no fixes in the mainline I know of.  If any of them
have been fixed already, please let me know.

If you know of any other unresolved regressions introduced between 2.6.27
and 2.6.28, please let me know as well and I'll add them to the list.
Also, please let me know if any of the entries below are invalid.

Each entry from the list will be sent additionally in an automatic reply to
this message with CCs to the people involved in reporting and handling the
issue.


Listed regressions statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2009-02-15      152       30          26
  2009-02-04      149       33          30
  2009-01-20      144       30          27
  2009-01-11      139       33          30
  2008-12-21      120       19          17
  2008-12-13      111       14          13
  2008-12-07      106       20          17
  2008-12-04      106       29          21
  2008-11-22       93       25          15
  2008-11-16       89       32          18
  2008-11-09       73       40          27
  2008-11-02       55       41          29
  2008-10-25       26       25          20


Unresolved regressions
----------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12690
Subject		: DPMS (LCD powersave, poweroff) don't work
Submitter	: Antonin Kolisek <akolisek@linuxx.hyperlinx.cz>
Date		: 2009-02-11 09:40 (4 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12658
Subject		: ThrustMaster Firestorm Dual Power 3 Gamepads stopped working
Submitter	: Frank Roscher <Frank-Roscher@gmx.net>
Date		: 2009-02-08 08:45 (7 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12645
Subject		: DMI low-memory-protect quirk causes resume hang on Samsung NC10
Submitter	: Patrick Walton <pcwalton@cs.ucla.edu>
Date		: 2009-02-06 18:35 (9 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0af40a4b1050c050e62eb1dc30b82d5ab22bf221


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12634
Subject		: video distortion and lockup with i830 video chip and 2.6.28.3
Submitter	: Bob Raitz <pappy_mcfae@yahoo.com>
Date		: 2009-02-04 21:10 (11 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12619
Subject		: Regression 2.6.28 and last - boot failed
Submitter	: jan sonnek <ha2nny@gmail.com>
Date		: 2009-02-01 19:59 (14 days old)
References	: http://marc.info/?l=linux-kernel&m=123351836213969&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12559
Subject		: Huawei E169 doesn't work as mass storage anymore
Submitter	: kpalberg <kpalberg@gmail.com>
Date		: 2009-01-28 02:34 (18 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12500
Subject		: r8169: NETDEV WATCHDOG: eth0 (r8169): transmit timed out
Submitter	: Justin Piszcz <jpiszcz@lucidpixels.com>
Date		: 2009-01-13 21:19 (33 days old)
References	: http://marc.info/?l=linux-kernel&m=123188160811322&w=4
Handled-By	: Francois Romieu <romieu@fr.zoreil.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
Subject		: KVM guests stalling on 2.6.28 (bisected)
Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
Date		: 2009-01-17 03:37 (29 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12421
Subject		: GPF on 2.6.28 and 2.6.28-rc9-git3, e1000e and e1000 issues
Submitter	: Doug Bazarnic <doug@bazarnic.net>
Date		: 2009-01-09 21:26 (37 days old)
References	: http://marc.info/?l=linux-kernel&m=123153653120204&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12411
Subject		: 2.6.28: BUG in r8169
Submitter	: Andrey Vul <andrey.vul@gmail.com>
Date		: 2008-12-31 18:37 (46 days old)
References	: http://marc.info/?l=linux-kernel&m=123074869611409&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12409
Subject		: NULL pointer dereference at get_stats()
Submitter	: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Date		: 2008-12-30 12:53 (47 days old)
References	: http://marc.info/?l=linux-kernel&m=123064167008695&w=4
Handled-By	: Frederik Deweerdt <frederik.deweerdt@xprog.eu>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12408
Subject		: Funny problem with 2.6.28: Kernel stalls
Submitter	: Michael Roth <mroth@nessie.de>
Date		: 2008-12-25 15:14 (52 days old)
References	: http://marc.info/?l=linux-kernel&m=123021931714282&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12407
Subject		: Kernel 2.6.28 regression: Hang after hibernate
Submitter	: Frank Groeneveld <frankgroeneveld@gmail.com>
Date		: 2008-12-28 20:34 (49 days old)
References	: http://marc.info/?l=linux-kernel&m=123049651906081&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12405
Subject		: oops in __bounce_end_io_read under kvm
Submitter	: Christoph Hellwig <hch@lst.de>
Date		: 2008-12-26 17:36 (51 days old)
References	: http://marc.info/?l=linux-kernel&m=123031303400676&w=4
Handled-By	: Jens Axboe <jens.axboe@oracle.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12404
Subject		: Oops in 2.6.28-rc9 and -rc8 -- mtrr issues / e1000e
Submitter	: Kernel <kernel@bazarnic.net>
Date		: 2008-12-22 9:37 (55 days old)
References	: http://marc.info/?l=linux-kernel&m=122993873320150&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12403
Subject		: TTY problem on linux-2.6.28-rc7
Submitter	: sasa sasa <sasak.1983@gmail.com>
Date		: 2008-12-22 4:23 (55 days old)
References	: http://marc.info/?l=linux-kernel&m=122991914600390&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12401
Subject		: 2.6.28 regression: xbacklight broken on ThinkPad X61s
Submitter	: Tino Keitel <tino.keitel@gmx.de>
Date		: 2009-01-05 8:39 (41 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=22c13f9d8179f4c9caecfcb60a95214562b9addc
References	: http://marc.info/?l=linux-kernel&m=123114479110314&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12395
Subject		: 2.6.28-rc9: oprofile regression
Submitter	: Tim Blechmann <tim@klingt.org>
Date		: 2008-12-21 14:23 (56 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b99170288421c79f0c2efa8b33e26e65f4bb7fb8
References	: http://marc.info/?l=linux-kernel&m=122986946614791&w=4
Handled-By	: Andi Kleen <ak@linux.intel.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12337
Subject		: ~100 extra wakeups reported by powertop
Submitter	: Alberto Gonzalez <luis6674@yahoo.com>
Date		: 2008-12-31 12:25 (46 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12265
Subject		: FPU emulation broken in 2.6.28-rc8 ?
Submitter	: Rogier Wolff <R.E.Wolff@bitwizard.nl>
Date		: 2008-12-17 8:56 (60 days old)
References	: http://marc.info/?l=linux-kernel&m=122950463030747&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12263
Subject		: Sata soft reset filling log
Submitter	: Justin Madru <bevicm@dslextreme.com>
Date		: 2008-12-13 2:07 (64 days old)
References	: http://marc.info/?l=linux-kernel&m=122913412608533&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12224
Subject		: journal activity on inactive partition causes inactive harddrive spinup
Submitter	: C Sights <csights@fastmail.fm>
Date		: 2008-12-14 11:39 (63 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c87591b719737b4e91eb1a9fa8fd55a4ff1886d6
Handled-By	: Eric Sandeen <sandeen@redhat.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12209
Subject		: oldish top core dumps (in its meminfo() function)
Submitter	: Andreas Mohr <andi@lisas.de>
Date		: 2008-12-12 18:49 (65 days old)
References	: http://marc.info/?l=linux-kernel&m=122910784006472&w=4
		  http://marc.info/?l=linux-kernel&m=122907511319288&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12208
Subject		: uml is very slow on 2.6.28 host
Submitter	: Miklos Szeredi <miklos@szeredi.hu>
Date		: 2008-12-12 9:35 (65 days old)
References	: http://marc.info/?l=linux-kernel&m=122907463518593&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12160
Subject		: networking oops after resume from s2ram (2.6.28-rc6)
Submitter	: Marcin Slusarz <marcin.slusarz@gmail.com>
Date		: 2008-11-28 21:15 (79 days old)
References	: http://marc.info/?l=linux-kernel&m=122790701615723&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12061
Subject		: snd_hda_intel: power_save: sound cracks on powerdown
Submitter	: Jens Weibler <bugzilla-kernel@jensthebrain.de>
Date		: 2008-11-18 12:07 (89 days old)
Handled-By	: Takashi Iwai <tiwai@suse.de>


Regressions with patches
------------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12614
Subject		: WOL with forcedeth broken since f55c21fd9a92a444e55ad1ca4e4732d56661bf2e
Submitter	: Philipp Matthias Hahn <pmhahn@titan.lahn.de>
Date		: 2009-01-29 6:31 (17 days old)
References	: http://marc.info/?l=linux-kernel&m=123321232825316&w=4
Handled-By	: Yinghai Lu <yinghai@kernel.org>
		  Tobias Diedrich <ranma+kernel@tdiedrich.de>
Patch		: http://marc.info/?l=linux-kernel&m=123330459229248&w=4
		  http://marc.info/?l=linux-kernel&m=123411195117835&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12612
Subject		: hard lockup when interrupting cdda2wav
Submitter	: Matthias Reichl <hias@horus.com>
Date		: 2009-01-28 16:41 (18 days old)
References	: http://marc.info/?l=linux-kernel&m=123316111415677&w=4
Handled-By	: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Patch		: http://marc.info/?l=linux-scsi&m=123371501613019&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12406
Subject		: 2.6.28 thinks that my PS/2 mouse is a touchpad
Submitter	: Alexander E. Patrakov <patrakov@gmail.com>
Date		: 2008-12-27 9:06 (50 days old)
References	: http://marc.info/?l=linux-kernel&m=123036893817280&w=4
Handled-By	: Arjan Opmeer <arjan@opmeer.net>
Patch		: http://marc.info/?l=linux-kernel&m=123092147703236&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12393
Subject		: debugging in dosemu causes lots of 'scheduling while atomic'
Submitter	: Michal Suchanek <hramrach@centrum.cz>
Date		: 2009-01-09 07:28 (37 days old)
Handled-By	: Thomas Gleixner <tglx@linutronix.de>
Patch		: http://lkml.org/lkml/2009/1/13/445


For details, please visit the bug entries and follow the links given in
references.
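
The entries above share a regular "Field : value" layout, with indented
continuation lines for additional references or patches.  A minimal Python
sketch for reading them back into structured data follows.  It assumes that
entries start at a Bug-Entry line and end at the first blank line; the
function name parse_entries is hypothetical and is not part of the tooling
that generated this report.

```python
import re

# Hypothetical helper for illustration only; not the report's own tooling.
# Matches lines such as "Subject\t\t: some text" or "First-Bad-Commit: url".
FIELD_RE = re.compile(r"^([A-Za-z-]+)\s*:\s+(.*)$")


def parse_entries(text):
    """Return a list of entries, each mapping a field name to a list of values."""
    entries = []
    current = None      # entry being filled, or None when outside an entry
    last_field = None   # field that an indented continuation line belongs to
    for line in text.splitlines():
        if not line.strip():
            # Assumption: a blank line always ends the current entry.
            current = None
            last_field = None
            continue
        match = FIELD_RE.match(line)
        if match:
            field, value = match.group(1), match.group(2).strip()
            if field == "Bug-Entry":
                current = {}
                entries.append(current)
            if current is None:
                continue  # a header such as "From:" outside any entry
            current.setdefault(field, []).append(value)
            last_field = field
        elif current is not None and last_field and line[0] in " \t":
            # Indented line: an extra value for the previous field.
            current[last_field].append(line.strip())
    return entries
```

Each value is kept as a list so that fields with several lines, such as
References or Patch, need no special handling.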

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions introduced
between 2.6.27 and 2.6.28, unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=11808

Please let me know if there are any Bugzilla entries that should be added to
the list in there.

Thanks,
Rafael


* [Bug #12061] snd_hda_intel: power_save: sound cracks on powerdown
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:48   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:48 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Jens Weibler, Takashi Iwai

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12061
Subject		: snd_hda_intel: power_save: sound cracks on powerdown
Submitter	: Jens Weibler <bugzilla-kernel@jensthebrain.de>
Date		: 2008-11-18 12:07 (89 days old)
Handled-By	: Takashi Iwai <tiwai@suse.de>



* [Bug #12208] uml is very slow on 2.6.28 host
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Miklos Szeredi

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12208
Subject		: uml is very slow on 2.6.28 host
Submitter	: Miklos Szeredi <miklos@szeredi.hu>
Date		: 2008-12-12 9:35 (65 days old)
References	: http://marc.info/?l=linux-kernel&m=122907463518593&w=4



* [Bug #12209] oldish top core dumps (in its meminfo() function)
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Andreas Mohr

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12209
Subject		: oldish top core dumps (in its meminfo() function)
Submitter	: Andreas Mohr <andi@lisas.de>
Date		: 2008-12-12 18:49 (65 days old)
References	: http://marc.info/?l=linux-kernel&m=122910784006472&w=4
		  http://marc.info/?l=linux-kernel&m=122907511319288&w=4



* [Bug #12160] networking oops after resume from s2ram (2.6.28-rc6)
  2009-02-14 20:48 ` Rafael J. Wysocki
                   ` (3 preceding siblings ...)
  (?)
@ 2009-02-14 20:50 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Marcin Slusarz, netdev

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12160
Subject		: networking oops after resume from s2ram (2.6.28-rc6)
Submitter	: Marcin Slusarz <marcin.slusarz@gmail.com>
Date		: 2008-11-28 21:15 (79 days old)
References	: http://marc.info/?l=linux-kernel&m=122790701615723&w=4



* [Bug #12263] Sata soft reset filling log
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Justin Madru, Linux IDE

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12263
Subject		: Sata soft reset filling log
Submitter	: Justin Madru <bevicm@dslextreme.com>
Date		: 2008-12-13 2:07 (64 days old)
References	: http://marc.info/?l=linux-kernel&m=122913412608533&w=4

* [Bug #12224] journal activity on inactive partition causes inactive harddrive spinup
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Andrew Morton, Arthur Jones, C Sights,
	Eric Sandeen, Greg Kroah-Hartman, Linus Torvalds, Theodore Tso

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12224
Subject		: journal activity on inactive partition causes inactive harddrive spinup
Submitter	: C Sights <csights@fastmail.fm>
Date		: 2008-12-14 11:39 (63 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c87591b719737b4e91eb1a9fa8fd55a4ff1886d6
Handled-By	: Eric Sandeen <sandeen@redhat.com>



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12337] ~100 extra wakeups reported by powertop
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Alberto Gonzalez

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12337
Subject		: ~100 extra wakeups reported by powertop
Submitter	: Alberto Gonzalez <luis6674@yahoo.com>
Date		: 2008-12-31 12:25 (46 days old)



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12265] FPU emulation broken in 2.6.28-rc8 ?
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Rogier Wolff

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12265
Subject		: FPU emulation broken in 2.6.28-rc8 ?
Submitter	: Rogier Wolff <R.E.Wolff@bitwizard.nl>
Date		: 2008-12-17 8:56 (60 days old)
References	: http://marc.info/?l=linux-kernel&m=122950463030747&w=4



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12393] debugging in dosemu causes lots of 'scheduling while atomic'
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Michal Suchanek, Thomas Gleixner

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12393
Subject		: debugging in dosemu causes lots of 'scheduling while atomic'
Submitter	: Michal Suchanek <hramrach@centrum.cz>
Date		: 2009-01-09 07:28 (37 days old)
Handled-By	: Thomas Gleixner <tglx@linutronix.de>
Patch		: http://lkml.org/lkml/2009/1/13/445



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12401] 2.6.28 regression: xbacklight broken on ThinkPad X61s
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Andi Kleen, Len Brown, Matthew Garrett,
	Thomas Renninger, Tino Keitel, Zhang Rui

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12401
Subject		: 2.6.28 regression: xbacklight broken on ThinkPad X61s
Submitter	: Tino Keitel <tino.keitel@gmx.de>
Date		: 2009-01-05 8:39 (41 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=22c13f9d8179f4c9caecfcb60a95214562b9addc
References	: http://marc.info/?l=linux-kernel&m=123114479110314&w=4



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12395] 2.6.28-rc9: oprofile regression
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Andi Kleen, Robert Richter, Tim Blechmann

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12395
Subject		: 2.6.28-rc9: oprofile regression
Submitter	: Tim Blechmann <tim@klingt.org>
Date		: 2008-12-21 14:23 (56 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b99170288421c79f0c2efa8b33e26e65f4bb7fb8
References	: http://marc.info/?l=linux-kernel&m=122986946614791&w=4
Handled-By	: Andi Kleen <ak@linux.intel.com>



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12403] TTY problem on linux-2.6.28-rc7
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, sasa sasa

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12403
Subject		: TTY problem on linux-2.6.28-rc7
Submitter	: sasa sasa <sasak.1983@gmail.com>
Date		: 2008-12-22 4:23 (55 days old)
References	: http://marc.info/?l=linux-kernel&m=122991914600390&w=4



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12404] Oops in 2.6.28-rc9 and -rc8 -- mtrr issues / e1000e
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Kernel

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12404
Subject		: Oops in 2.6.28-rc9 and -rc8 -- mtrr issues / e1000e
Submitter	: Kernel <kernel@bazarnic.net>
Date		: 2008-12-22 9:37 (55 days old)
References	: http://marc.info/?l=linux-kernel&m=122993873320150&w=4



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12405] oops in __bounce_end_io_read under kvm
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Christoph Hellwig, Jens Axboe

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12405
Subject		: oops in __bounce_end_io_read under kvm
Submitter	: Christoph Hellwig <hch@lst.de>
Date		: 2008-12-26 17:36 (51 days old)
References	: http://marc.info/?l=linux-kernel&m=123031303400676&w=4
Handled-By	: Jens Axboe <jens.axboe@oracle.com>



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12406] 2.6.28 thinks that my PS/2 mouse is a touchpad
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Alexander E. Patrakov, Arjan Opmeer,
	Denys Vlasenko, Dmitry Torokhov

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12406
Subject		: 2.6.28 thinks that my PS/2 mouse is a touchpad
Submitter	: Alexander E. Patrakov <patrakov@gmail.com>
Date		: 2008-12-27 9:06 (50 days old)
References	: http://marc.info/?l=linux-kernel&m=123036893817280&w=4
Handled-By	: Arjan Opmeer <arjan@opmeer.net>
Patch		: http://marc.info/?l=linux-kernel&m=123092147703236&w=4



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12407] Kernel 2.6.28 regression: Hang after hibernate
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Frank Groeneveld, Pavel Machek

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12407
Subject		: Kernel 2.6.28 regression: Hang after hibernate
Submitter	: Frank Groeneveld <frankgroeneveld@gmail.com>
Date		: 2008-12-28 20:34 (49 days old)
References	: http://marc.info/?l=linux-kernel&m=123049651906081&w=4



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12408] Funny problem with 2.6.28: Kernel stalls
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Michael Roth, Thomas Gleixner

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12408
Subject		: Funny problem with 2.6.28: Kernel stalls
Submitter	: Michael Roth <mroth@nessie.de>
Date		: 2008-12-25 15:14 (52 days old)
References	: http://marc.info/?l=linux-kernel&m=123021931714282&w=4



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12409] NULL pointer dereference at get_stats()
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Frederik Deweerdt, Tetsuo Handa

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12409
Subject		: NULL pointer dereference at get_stats()
Submitter	: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Date		: 2008-12-30 12:53 (47 days old)
References	: http://marc.info/?l=linux-kernel&m=123064167008695&w=4
Handled-By	: Frederik Deweerdt <frederik.deweerdt@xprog.eu>



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12411] 2.6.28: BUG in r8169
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Andrey Vul, Francois Romieu

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12411
Subject		: 2.6.28: BUG in r8169
Submitter	: Andrey Vul <andrey.vul@gmail.com>
Date		: 2008-12-31 18:37 (46 days old)
References	: http://marc.info/?l=linux-kernel&m=123074869611409&w=4



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12421] GPF on 2.6.28 and 2.6.28-rc9-git3, e1000e and e1000 issues
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Doug Bazarnic

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12421
Subject		: GPF on 2.6.28 and 2.6.28-rc9-git3, e1000e and e1000 issues
Submitter	: Doug Bazarnic <doug@bazarnic.net>
Date		: 2009-01-09 21:26 (37 days old)
References	: http://marc.info/?l=linux-kernel&m=123153653120204&w=4



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12500] r8169: NETDEV WATCHDOG: eth0 (r8169): transmit timed out
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Francois Romieu, Justin Piszcz

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12500
Subject		: r8169: NETDEV WATCHDOG: eth0 (r8169): transmit timed out
Submitter	: Justin Piszcz <jpiszcz@lucidpixels.com>
Date		: 2009-01-13 21:19 (33 days old)
References	: http://marc.info/?l=linux-kernel&m=123188160811322&w=4
Handled-By	: Francois Romieu <romieu@fr.zoreil.com>



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Ingo Molnar, Kevin Shanahan, Kevin Shanahan,
	Mike Galbraith, Peter Zijlstra

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
Subject		: KVM guests stalling on 2.6.28 (bisected)
Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
Date		: 2009-01-17 03:37 (29 days old)



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12559] Huawei E169 doesn't work as mass storage anymore
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, fangxiaozhi, kpalberg

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12559
Subject		: Huawei E169 doesn't work as mass storage anymore
Submitter	: kpalberg <kpalberg@gmail.com>
Date		: 2009-01-28 02:34 (18 days old)



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12612] hard lockup when interrupting cdda2wav
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, FUJITA Tomonori, Matthias Reichl

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12612
Subject		: hard lockup when interrupting cdda2wav
Submitter	: Matthias Reichl <hias@horus.com>
Date		: 2009-01-28 16:41 (18 days old)
References	: http://marc.info/?l=linux-kernel&m=123316111415677&w=4
Handled-By	: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Patch		: http://marc.info/?l=linux-scsi&m=123371501613019&w=2



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12614] WOL with forcedeth broken since f55c21fd9a92a444e55ad1ca4e4732d56661bf2e
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Andrew Morton, Jeff Garzik,
	Philipp Matthias Hahn, Tobias Diedrich, Yinghai Lu, Yinghai Lu

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12614
Subject		: WOL with forcedeth broken since f55c21fd9a92a444e55ad1ca4e4732d56661bf2e
Submitter	: Philipp Matthias Hahn <pmhahn@titan.lahn.de>
Date		: 2009-01-29 6:31 (17 days old)
References	: http://marc.info/?l=linux-kernel&m=123321232825316&w=4
Handled-By	: Yinghai Lu <yinghai@kernel.org>
		  Tobias Diedrich <ranma+kernel@tdiedrich.de>
Patch		: http://marc.info/?l=linux-kernel&m=123330459229248&w=4
		  http://marc.info/?l=linux-kernel&m=123411195117835&w=4



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12619] Regression 2.6.28 and last - boot failed
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, jan sonnek

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12619
Subject		: Regression 2.6.28 and last - boot failed
Submitter	: jan sonnek <ha2nny@gmail.com>
Date		: 2009-02-01 19:59 (14 days old)
References	: http://marc.info/?l=linux-kernel&m=123351836213969&w=4



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12634] video distortion and lockup with i830 video chip and 2.6.28.3
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Bob Raitz

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12634
Subject		: video distortion and lockup with i830 video chip and 2.6.28.3
Submitter	: Bob Raitz <pappy_mcfae@yahoo.com>
Date		: 2009-02-04 21:10 (11 days old)



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12645] DMI low-memory-protect quirk causes resume hang on Samsung NC10
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Ingo Molnar, Patrick Walton, Philipp Kohlbecher

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12645
Subject		: DMI low-memory-protect quirk causes resume hang on Samsung NC10
Submitter	: Patrick Walton <pcwalton@cs.ucla.edu>
Date		: 2009-02-06 18:35 (9 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0af40a4b1050c050e62eb1dc30b82d5ab22bf221



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12658] ThrustMaster Firestorm Dual Power 3 Gamepads stopped working
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Frank Roscher

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12658
Subject		: ThrustMaster Firestorm Dual Power 3 Gamepads stopped working
Submitter	: Frank Roscher <Frank-Roscher@gmx.net>
Date		: 2009-02-08 08:45 (7 days old)



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12690] DPMS (LCD powersave, poweroff) don't work
  2009-02-14 20:48 ` Rafael J. Wysocki
@ 2009-02-14 20:50   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-14 20:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Antonin Kolisek

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12690
Subject		: DPMS (LCD powersave, poweroff) don't work
Submitter	: Antonin Kolisek <akolisek@linuxx.hyperlinx.cz>
Date		: 2009-02-11 09:40 (4 days old)



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12265] FPU emulation broken in 2.6.28-rc8 ?
  2009-02-14 20:50   ` Rafael J. Wysocki
@ 2009-02-14 23:23     ` Ingo Molnar
  -1 siblings, 0 replies; 262+ messages in thread
From: Ingo Molnar @ 2009-02-14 23:23 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Rogier Wolff


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.27 and 2.6.28.
> 
> The following bug entry is on the current list of known regressions
> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> be listed and let me know (either way).
> 
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12265
> Subject		: FPU emulation broken in 2.6.28-rc8 ?
> Submitter	: Rogier Wolff <R.E.Wolff@bitwizard.nl>
> Date		: 2008-12-17 8:56 (60 days old)
> References	: http://marc.info/?l=linux-kernel&m=122950463030747&w=4

Should be fixed in -rc5 by:

  d315760: x86: fix math_emu register frame access

Rogier, can you confirm?

	Ingo

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12337] ~100 extra wakeups reported by powertop
  2009-02-14 20:50   ` Rafael J. Wysocki
@ 2009-02-14 23:35     ` Alberto Gonzalez
  -1 siblings, 0 replies; 262+ messages in thread
From: Alberto Gonzalez @ 2009-02-14 23:35 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Rafael J. Wysocki
  Cc: Kernel Testers List, Jesse Barnes

--- On Sat, 2/14/09, Rafael J. Wysocki <rjw@sisk.pl> wrote:

> The following bug entry is on the current list of known regressions
> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> be listed and let me know (either way).
> 
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12337
> Subject		: ~100 extra wakeups reported by powertop
> Submitter	: Alberto Gonzalez <luis6674@yahoo.com>
> Date		: 2008-12-31 12:25 (46 days old)

Yes, it is still present in the latest stable release, 2.6.28.5.

I updated the report to say that this happened on my 5-year-old Pentium 4,
but I have now got a new Dell desktop (Intel G45 based) and the exact same
problem happens there, so I can't be the only one seeing it. In the
bugzilla, Eric Anholt said that it could be related to vblank and that
jbarnes had looked into a similar issue before, so maybe he has some clue.

Thanks.


      

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12406] 2.6.28 thinks that my PS/2 mouse is a touchpad
  2009-02-14 20:50   ` Rafael J. Wysocki
@ 2009-02-15  6:14     ` Alexander E. Patrakov
  -1 siblings, 0 replies; 262+ messages in thread
From: Alexander E. Patrakov @ 2009-02-15  6:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Arjan Opmeer,
	Denys Vlasenko, Dmitry Torokhov

"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.27 and 2.6.28.
> 
> The following bug entry is on the current list of known regressions
> introduced between 2.6.27 and 2.6.28.  Please verify if it still
> should be listed and let me know (either way).

Yes, it is still a regression with a patch that is not in -stable.

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12406
> Subject		: 2.6.28 thinks that my PS/2 mouse is a touchpad
> Submitter	: Alexander E. Patrakov <patrakov@gmail.com>
> Date		: 2008-12-27 9:06 (50 days old)
> References	: http://marc.info/?l=linux-kernel&m=123036893817280&w=4
> Handled-By	: Arjan Opmeer <arjan@opmeer.net>
> Patch		: http://marc.info/?l=linux-kernel&m=123092147703236&w=4

-- 
Alexander E. Patrakov

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected) [Bug 12465]
  2009-02-14 20:50   ` Rafael J. Wysocki
@ 2009-02-15  9:48     ` Kevin Shanahan
  -1 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-02-15  9:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Ingo Molnar,
	Mike Galbraith, bugme-daemon, Steven Rostedt, Peter Zijlstra

On Sat, 2009-02-14 at 21:50 +0100, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.27 and 2.6.28.
> 
> The following bug entry is on the current list of known regressions
> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> be listed and let me know (either way).

Yes, this should still be listed.

I just tested against 2.6.29-rc5 and the problem is as bad as ever
(perhaps worse?)

--- hermes-old.wumi.org.au ping statistics ---
900 packets transmitted, 448 received, +317 errors, 50% packet loss, time 899845ms
rtt min/avg/max/mdev = 0.131/420.015/10890.699/1297.022 ms, pipe 11
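
The loss figure in the summary above can be re-derived from the packet
counters, which is handy when comparing runs across kernels. A minimal
sketch, assuming the iputils-style summary format shown here; the helper
name parse_ping_summary and the returned field names are hypothetical:

```python
import re

# Assumption: the summary line follows the iputils layout seen above,
# with an optional "+N errors" field between "received" and the loss.
SUMMARY_RE = re.compile(
    r"(?P<tx>\d+) packets transmitted, (?P<rx>\d+) received"
    r"(?:, \+(?P<errors>\d+) errors)?"
    r", (?P<loss>\d+(?:\.\d+)?)% packet loss"
)


def parse_ping_summary(line):
    """Hypothetical helper: extract counters and recompute packet loss."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError("not a ping summary line")
    tx = int(m.group("tx"))
    rx = int(m.group("rx"))
    errors = int(m.group("errors") or 0)
    # Loss is the share of transmitted packets that were never answered.
    computed = 100.0 * (tx - rx) / tx if tx else 0.0
    return {
        "transmitted": tx,
        "received": rx,
        "errors": errors,
        "reported_loss": float(m.group("loss")),
        "computed_loss": computed,
    }


line = ("900 packets transmitted, 448 received, +317 errors, "
        "50% packet loss, time 899845ms")
stats = parse_ping_summary(line)
print(stats)
```

With the counters above, 452 of the 900 packets went unanswered, which
is consistent with the reported 50% once the fraction is dropped.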

The guest being pinged crashed during the test - the QEMU monitor was
accessible, but the guest didn't respond to "sendkey alt-sysrq-s", etc.
This was the last thing in the guest syslog after reboot:

Feb 15 19:48:58 hermes-old kernel: ------------[ cut here ]------------
Feb 15 19:48:58 hermes-old kernel: WARNING: at net/sched/sch_generic.c:219 dev_watchdog+0x111/0x195()
Feb 15 19:48:58 hermes-old kernel: NETDEV WATCHDOG: eth0 (8139too): transmit timed out
Feb 15 19:48:58 hermes-old kernel: Pid: 0, comm: swapper Not tainted 2.6.27.10 #1
Feb 15 19:48:58 hermes-old kernel:  [<c011d75c>] warn_slowpath+0x5c/0x81
Feb 15 19:48:58 hermes-old kernel:  [<c02f5f7c>] nf_hook_slow+0x44/0xb1
Feb 15 19:48:58 hermes-old kernel:  [<c02d93f1>] dev_queue_xmit+0x3da/0x411
Feb 15 19:48:58 hermes-old kernel:  [<c030043d>] ip_finish_output+0x1f9/0x231
Feb 15 19:48:58 hermes-old kernel:  [<c01daeee>] __next_cpu+0x12/0x21
Feb 15 19:48:58 hermes-old kernel:  [<c0116b42>] find_busiest_group+0x232/0x69f
Feb 15 19:48:58 hermes-old kernel:  [<c01160dc>] update_curr+0x41/0x65
Feb 15 19:48:58 hermes-old kernel:  [<c02e33b5>] dev_watchdog+0x111/0x195
Feb 15 19:48:58 hermes-old kernel:  [<c011822f>] enqueue_task_fair+0x16/0x24
Feb 15 19:48:58 hermes-old kernel:  [<c0115645>] enqueue_task+0xa/0x14
Feb 15 19:48:58 hermes-old kernel:  [<c01156d5>] activate_task+0x16/0x1b
Feb 15 19:48:58 hermes-old kernel:  [<c0119c8c>] try_to_wake_up+0x131/0x13a
Feb 15 19:48:58 hermes-old kernel:  [<c02e32a4>] dev_watchdog+0x0/0x195
Feb 15 19:48:58 hermes-old kernel:  [<c012424c>] run_timer_softirq+0xf5/0x14a
Feb 15 19:48:58 hermes-old kernel:  [<c0120f60>] __do_softirq+0x5d/0xc1
Feb 15 19:48:58 hermes-old kernel:  [<c0120ff6>] do_softirq+0x32/0x36
Feb 15 19:48:58 hermes-old kernel:  [<c012112c>] irq_exit+0x35/0x40
Feb 15 19:48:58 hermes-old kernel:  [<c010e8db>] smp_apic_timer_interrupt+0x6e/0x7b
Feb 15 19:48:58 hermes-old kernel:  [<c01035ac>] apic_timer_interrupt+0x28/0x30
Feb 15 19:48:58 hermes-old kernel:  [<c0107386>] default_idle+0x2a/0x3d
Feb 15 19:48:58 hermes-old kernel:  [<c0101900>] cpu_idle+0x5c/0x84
Feb 15 19:48:58 hermes-old kernel:  =======================
Feb 15 19:48:58 hermes-old kernel: ---[ end trace eff10a8043ac4e7b ]---
Feb 15 19:49:01 hermes-old kernel: eth0: Transmit timeout, status 0d 0000 c07f media d0.
Feb 15 19:49:01 hermes-old kernel: eth0: Tx queue start entry 839  dirty entry 839.
Feb 15 19:49:01 hermes-old kernel: eth0:  Tx descriptor 0 is 0008a03c.
Feb 15 19:49:01 hermes-old kernel: eth0:  Tx descriptor 1 is 0008a062.
Feb 15 19:49:01 hermes-old kernel: eth0:  Tx descriptor 2 is 0008a062.
Feb 15 19:49:01 hermes-old kernel: eth0:  Tx descriptor 3 is 0008a05b. (queue head)
Feb 15 19:49:01 hermes-old kernel: eth0: link up, 100Mbps, full-duplex, lpa 0x05E1

I think I saw some patches to fix the latency tracer for non-RT tasks on
the mailing list a while ago. If that's still going to be a useful test,
can someone give me some hints on which kernel tree and/or patches to
download to get that working? The simpler you can make it, the better ;)

Cheers,
Kevin.



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected) [Bug 12465]
  2009-02-15  9:48     ` Kevin Shanahan
  (?)
@ 2009-02-15 10:04     ` Ingo Molnar
  2009-02-22 10:39         ` Kevin Shanahan
  2009-02-23 11:38         ` Kevin Shanahan
  -1 siblings, 2 replies; 262+ messages in thread
From: Ingo Molnar @ 2009-02-15 10:04 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, bugme-daemon,
	Steven Rostedt, Peter Zijlstra


* Kevin Shanahan <kmshanah@ucwb.org.au> wrote:

> On Sat, 2009-02-14 at 21:50 +0100, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.27 and 2.6.28.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> > be listed and let me know (either way).
> 
> Yes, this should still be listed.
> 
> I just tested against 2.6.29-rc5 and the problem is as bad as ever
> (perhaps worse?)
> 
> --- hermes-old.wumi.org.au ping statistics ---
> 900 packets transmitted, 448 received, +317 errors, 50% packet loss, time 899845ms
> rtt min/avg/max/mdev = 0.131/420.015/10890.699/1297.022 ms, pipe 11

i looked at the trace you did earlier and which you uploaded to:

  http://disenchant.net/tmp/bug-12465/trace-1/

Here is one 3 seconds (!) latency:

 0)  qemu-sy-4237  |               |      kvm_vcpu_block() {
 0)  qemu-sy-4237  |               |        kvm_cpu_has_interrupt() {
 0)  qemu-sy-4237  |               |          kvm_apic_has_interrupt() {
 0)  qemu-sy-4237  |   0.291 us    |          }
 0)  qemu-sy-4237  |               |          kvm_apic_accept_pic_intr() {
 0)  qemu-sy-4237  |   0.291 us    |          }
 0)  qemu-sy-4237  |   1.476 us    |        }
 0)  qemu-sy-4237  |               |        kvm_cpu_has_pending_timer() {
 0)  qemu-sy-4237  |   0.325 us    |        }
 0)  qemu-sy-4237  |               |        kvm_arch_vcpu_runnable() {
 0)  qemu-sy-4237  |   0.288 us    |        }
 0)  qemu-sy-4237  |               |        kvm_arch_vcpu_put() {
 0)  qemu-sy-4237  |   0.415 us    |        }
 0)  qemu-sy-4237  |               |        schedule() {
 0)  qemu-sy-4237  |               |          wakeup_preempt_entity() {
 0)  qemu-sy-4237  |   0.300 us    |          }
 ------------------------------------------
 0)  qemu-sy-4237  =>   ksoftir-4   
 ------------------------------------------

 0)   ksoftir-4    | ! 3010470 us |  }
 ------------------------------------------
 0)   ksoftir-4    =>  qemu-sy-4355 
 ------------------------------------------

 0)  qemu-sy-4355  |   1.575 us    |          }
 0)  qemu-sy-4355  |   6.520 us    |        }
 0)  qemu-sy-4355  |   7.121 us    |      }
 0)  qemu-sy-4355  |               |      __wake_up() {
 0)  qemu-sy-4355  |               |        __wake_up_common() {
 0)  qemu-sy-4355  |               |          autoremove_wake_function() {
 0)  qemu-sy-4355  |               |            default_wake_function() {

qemu-sy-4237 has been scheduled away, and the system appeared to have done
nothing in the meantime. That's not something that really looks like a
scheduler regression - there is nothing the scheduler can do if KVM
decides to block a task.

It would be nice to enhance this single-CPU trace some more - to more
surgically see what is going on. Firstly, absolute timestamps would be
nice:

  echo funcgraph-abstime  > trace_options
  echo funcgraph-proc     > trace_options

as it's a bit hard to see the global timescale of events.

Secondly, not all events are included - in particular i dont really see
the points when packets are passed. Would it be possible to add a tracing
hypercall so that the guest kernel can inject trace events that can be seen
on the native-side trace? Regarding ping latencies really just two things
matter: the loopback network device's rx and tx path. We should trace the
outgoing sequence number and the incoming sequence number of IP packets,
and inject that to the host side. This way we can correlate the delays
precisely.

	Ingo
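
The two trace_options writes suggested above can also be issued from a
small C helper. This is a minimal sketch, not part of the original message:
the function names set_trace_option() and enable_graph_timestamps() are
hypothetical, and the tracing directory is passed in by the caller because
its location depends on where debugfs is mounted (commonly
/sys/kernel/debug/tracing; that path is an assumption).

```c
#include <stdio.h>

/*
 * Write one option name to <tracing_dir>/trace_options.
 * Each option goes out in its own open/write/close cycle, mirroring one
 * "echo option > trace_options" command. Returns 0 on success, -1 on error.
 * (Hypothetical helper, for illustration only.)
 */
int set_trace_option(const char *tracing_dir, const char *option)
{
    char path[4096];
    FILE *f;
    int n, ok;

    n = snprintf(path, sizeof(path), "%s/trace_options", tracing_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;              /* path did not fit in the buffer */

    f = fopen(path, "w");
    if (!f)
        return -1;              /* not mounted, no permission, ... */

    ok = fprintf(f, "%s\n", option) > 0;
    if (fclose(f) != 0)         /* the write error may surface at close */
        ok = 0;

    return ok ? 0 : -1;
}

/* Enable absolute timestamps and per-line process names in the
 * function graph output, as requested in the message above. */
int enable_graph_timestamps(const char *tracing_dir)
{
    if (set_trace_option(tracing_dir, "funcgraph-abstime") != 0)
        return -1;
    return set_trace_option(tracing_dir, "funcgraph-proc");
}
```

Calling enable_graph_timestamps() as root with the tracing directory should
have the same effect as the two echo commands, provided the function graph
tracer is built into the running kernel.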

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12401] 2.6.28 regression: xbacklight broken on ThinkPad X61s
  2009-02-14 20:50   ` Rafael J. Wysocki
@ 2009-02-15 13:44     ` Matthew Garrett
  -1 siblings, 0 replies; 262+ messages in thread
From: Matthew Garrett @ 2009-02-15 13:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Andi Kleen,
	Len Brown, Thomas Renninger, Tino Keitel, Zhang Rui

This one sounded like a configuration error.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12337] ~100 extra wakeups reported by powertop
@ 2009-02-15 14:20       ` Rafael J. Wysocki
  0 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-15 14:20 UTC (permalink / raw)
  To: luis6674; +Cc: Linux Kernel Mailing List, Kernel Testers List, Jesse Barnes

On Sunday 15 February 2009, Alberto Gonzalez wrote:
> --- On Sat, 2/14/09, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> > be listed and let me know (either way).
> > 
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12337
> > Subject		: ~100 extra wakeups reported by powertop
> > Submitter	: Alberto Gonzalez <luis6674@yahoo.com>
> > Date		: 2008-12-31 12:25 (46 days old)
> 
> Yes, still present in latest stable 2.6.28.5
> 
> I updated the report to say that this happened on my 5 year old Pentium 4, but now I got a new Dell desktop (Intel G45 based) and the exact same problem happens, so I can't be the only one seeing it. In the bugzilla Eric Anholt said that it could be related to vblank and that jbarnes had looked into a similar issue before, so maybe he has some clue.

Thanks a lot for the update.

Rafael

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12401] 2.6.28 regression: xbacklight broken on ThinkPad X61s
@ 2009-02-15 14:38       ` Rafael J. Wysocki
  0 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-15 14:38 UTC (permalink / raw)
  To: Tino Keitel
  Cc: Matthew Garrett, Linux Kernel Mailing List, Kernel Testers List,
	Andi Kleen, Len Brown, Thomas Renninger, Zhang Rui

On Sunday 15 February 2009, Matthew Garrett wrote:
> This one sounded like a configuration error.

Tino?

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12406] 2.6.28 thinks that my PS/2 mouse is a touchpad
@ 2009-02-15 14:40       ` Rafael J. Wysocki
  0 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-15 14:40 UTC (permalink / raw)
  To: Alexander E. Patrakov
  Cc: Linux Kernel Mailing List, Kernel Testers List, Arjan Opmeer,
	Denys Vlasenko, Dmitry Torokhov, stable

On Sunday 15 February 2009, Alexander E. Patrakov wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.27 and 2.6.28.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.27 and 2.6.28.  Please verify if it still
> > should be listed and let me know (either way).
> 
> Yes, it is still a regression with a patch that is not in -stable.

I think it hasn't been merged yet.
 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12406
> > Subject		: 2.6.28 thinks that my PS/2 mouse is a touchpad
> > Submitter	: Alexander E. Patrakov <patrakov@gmail.com>
> > Date		: 2008-12-27 9:06 (50 days old)
> > References	: http://marc.info/?l=linux-kernel&m=123036893817280&w=4
> > Handled-By	: Arjan Opmeer <arjan@opmeer.net>
> > Patch		: http://marc.info/?l=linux-kernel&m=123092147703236&w=4

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12263] Sata soft reset filling log
  2009-02-14 20:50   ` Rafael J. Wysocki
  (?)
@ 2009-02-15 20:47   ` Justin Madru
  2009-02-15 21:21     ` Rafael J. Wysocki
  -1 siblings, 1 reply; 262+ messages in thread
From: Justin Madru @ 2009-02-15 20:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Linux IDE,
	Alan Cox, Hugh Dickins, Larry Finger, Mikael Pettersson,
	Sergei Shtylyov

Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.27 and 2.6.28.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12263
> Subject		: Sata soft reset filling log
> Submitter	: Justin Madru <bevicm@dslextreme.com>
> Date		: 2008-12-13 2:07 (64 days old)
> References	: http://marc.info/?l=linux-kernel&m=122913412608533&w=4

I'm still seeing this on 2.6.29-rc5. I think my bug #12263 and bug #12609
are the same issue; more precisely, #12609 is a duplicate of mine, because
I reported first.

It seems like the bug has been fixed in tip/master for some time now.
Below is the diff of origin and tip from when I tested.

$ git diff origin/master..tip/master drivers/ata/

diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index 54961c0..e004c25 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -310,7 +310,7 @@ static struct scsi_host_template piix_sht = {
 };
 
 static struct ata_port_operations piix_pata_ops = {
-    .inherits        = &ata_bmdma32_port_ops,
+    .inherits        = &ata_bmdma_port_ops,
     .cable_detect        = ata_cable_40wire,
     .set_piomode        = piix_set_piomode,
     .set_dmamode        = piix_set_dmamode,
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 9fbf059..1ed3966 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -1482,7 +1482,7 @@ static int ata_hpa_resize(struct ata_device *dev)
     struct ata_eh_context *ehc = &dev->link->eh_context;
     int print_info = ehc->i.flags & ATA_EHI_PRINTINFO;
     u64 sectors = ata_id_n_sectors(dev->id);
-    u64 native_sectors;
+    u64 uninitialized_var(native_sectors);
     int rc;
 
     /* do we need to do it? */
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index b9747fa..d65b9b2 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3247,7 +3247,7 @@ void ata_scsi_scan_host(struct ata_port *ap, int sync)
     int tries = 5;
     struct ata_device *last_failed_dev = NULL;
     struct ata_link *link;
-    struct ata_device *dev;
+    struct ata_device *uninitialized_var(dev);
 
     if (ap->flags & ATA_FLAG_DISABLED)
         return;
diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
index 0b299b0..416e3e2 100644
--- a/drivers/ata/libata-sff.c
+++ b/drivers/ata/libata-sff.c
@@ -80,13 +80,6 @@ const struct ata_port_operations ata_bmdma_port_ops = {
 };
 EXPORT_SYMBOL_GPL(ata_bmdma_port_ops);
 
-const struct ata_port_operations ata_bmdma32_port_ops = {
-    .inherits        = &ata_bmdma_port_ops,
-
-    .sff_data_xfer        = ata_sff_data_xfer32,
-};
-EXPORT_SYMBOL_GPL(ata_bmdma32_port_ops);
-
 /**
  *    ata_fill_sg - Fill PCI IDE PRD table
  *    @qc: Metadata associated with taskfile to be transferred
@@ -743,52 +736,6 @@ unsigned int ata_sff_data_xfer(struct ata_device *dev, unsigned char *buf,
 EXPORT_SYMBOL_GPL(ata_sff_data_xfer);
 
 /**
- *    ata_sff_data_xfer32 - Transfer data by PIO
- *    @dev: device to target
- *    @buf: data buffer
- *    @buflen: buffer length
- *    @rw: read/write
- *
- *    Transfer data from/to the device data register by PIO using 32bit
- *    I/O operations.
- *
- *    LOCKING:
- *    Inherited from caller.
- *
- *    RETURNS:
- *    Bytes consumed.
- */
-
-unsigned int ata_sff_data_xfer32(struct ata_device *dev, unsigned char *buf,
-                   unsigned int buflen, int rw)
-{
-    struct ata_port *ap = dev->link->ap;
-    void __iomem *data_addr = ap->ioaddr.data_addr;
-    unsigned int words = buflen >> 2;
-    int slop = buflen & 3;
-
-    /* Transfer multiple of 4 bytes */
-    if (rw == READ)
-        ioread32_rep(data_addr, buf, words);
-    else
-        iowrite32_rep(data_addr, buf, words);
-
-    if (unlikely(slop)) {
-        __le32 pad;
-        if (rw == READ) {
-            pad = cpu_to_le32(ioread32(ap->ioaddr.data_addr));
-            memcpy(buf + buflen - slop, &pad, slop);
-        } else {
-            memcpy(&pad, buf + buflen - slop, slop);
-            iowrite32(le32_to_cpu(pad), ap->ioaddr.data_addr);
-        }
-        words++;
-    }
-    return words << 2;
-}
-EXPORT_SYMBOL_GPL(ata_sff_data_xfer32);
-
-/**
  *    ata_sff_data_xfer_noirq - Transfer data by PIO
  *    @dev: device to target
  *    @buf: data buffer
diff --git a/drivers/ata/pata_ali.c b/drivers/ata/pata_ali.c
index eb99dbe..7cd48ea 100644
--- a/drivers/ata/pata_ali.c
+++ b/drivers/ata/pata_ali.c
@@ -151,7 +151,8 @@ static void ali_fifo_control(struct ata_port *ap, struct ata_device *adev, int o
 
     pci_read_config_byte(pdev, pio_fifo, &fifo);
     fifo &= ~(0x0F << shift);
-    fifo |= (on << shift);
+    if (on)
+        fifo |= (on << shift);
     pci_write_config_byte(pdev, pio_fifo, fifo);
 }
 
@@ -369,11 +370,10 @@ static struct ata_port_operations ali_early_port_ops = {
     .inherits    = &ata_sff_port_ops,
     .cable_detect    = ata_cable_40wire,
     .set_piomode    = ali_set_piomode,
-    .sff_data_xfer  = ata_sff_data_xfer32,
 };
 
 static const struct ata_port_operations ali_dma_base_ops = {
-    .inherits    = &ata_bmdma32_port_ops,
+    .inherits    = &ata_bmdma_port_ops,
     .set_piomode    = ali_set_piomode,
     .set_dmamode    = ali_set_dmamode,
 };
diff --git a/drivers/ata/pata_amd.c b/drivers/ata/pata_amd.c
index 63719ab..0ec9c7d 100644
--- a/drivers/ata/pata_amd.c
+++ b/drivers/ata/pata_amd.c
@@ -24,7 +24,7 @@
 #include <linux/libata.h>
 
 #define DRV_NAME "pata_amd"
-#define DRV_VERSION "0.3.11"
+#define DRV_VERSION "0.3.10"
 
 /**
  *    timing_setup        -    shared timing computation and load
@@ -345,7 +345,7 @@ static struct scsi_host_template amd_sht = {
 };
 
 static const struct ata_port_operations amd_base_port_ops = {
-    .inherits    = &ata_bmdma32_port_ops,
+    .inherits    = &ata_bmdma_port_ops,
     .prereset    = amd_pre_reset,
 };
 
diff --git a/drivers/ata/pata_atiixp.c b/drivers/ata/pata_atiixp.c
index 506adde..115eb00 100644
--- a/drivers/ata/pata_atiixp.c
+++ b/drivers/ata/pata_atiixp.c
@@ -140,7 +140,7 @@ static void atiixp_set_dmamode(struct ata_port *ap, struct ata_device *adev)
         wanted_pio = 3;
     else if (adev->dma_mode == XFER_MW_DMA_0)
         wanted_pio = 0;
-    else BUG();
+    else panic("atiixp_set_dmamode: unknown DMA mode!");
 
     if (adev->pio_mode != wanted_pio)
         atiixp_set_pio_timing(ap, adev, wanted_pio);
diff --git a/drivers/ata/pata_mpiix.c b/drivers/ata/pata_mpiix.c
index aa576ca..7c8faa4 100644
--- a/drivers/ata/pata_mpiix.c
+++ b/drivers/ata/pata_mpiix.c
@@ -35,7 +35,7 @@
 #include <linux/libata.h>
 
 #define DRV_NAME "pata_mpiix"
-#define DRV_VERSION "0.7.7"
+#define DRV_VERSION "0.7.6"
 
 enum {
     IDETIM = 0x6C,        /* IDE control register */
@@ -146,7 +146,6 @@ static struct ata_port_operations mpiix_port_ops = {
     .cable_detect    = ata_cable_40wire,
     .set_piomode    = mpiix_set_piomode,
     .prereset    = mpiix_pre_reset,
-    .sff_data_xfer    = ata_sff_data_xfer32,
 };
 
 static int mpiix_init_one(struct pci_dev *dev, const struct pci_device_id *id)
diff --git a/drivers/ata/pata_sil680.c b/drivers/ata/pata_sil680.c
index 9e764e5..83580a5 100644
--- a/drivers/ata/pata_sil680.c
+++ b/drivers/ata/pata_sil680.c
@@ -32,7 +32,7 @@
 #include <linux/libata.h>
 
 #define DRV_NAME "pata_sil680"
-#define DRV_VERSION "0.4.9"
+#define DRV_VERSION "0.4.8"
 
 #define SIL680_MMIO_BAR        5
 
@@ -195,7 +195,7 @@ static struct scsi_host_template sil680_sht = {
 };
 
 static struct ata_port_operations sil680_port_ops = {
-    .inherits    = &ata_bmdma32_port_ops,
+    .inherits    = &ata_bmdma_port_ops,
     .cable_detect    = sil680_cable_detect,
     .set_piomode    = sil680_set_piomode,
     .set_dmamode    = sil680_set_dmamode,
diff --git a/drivers/ata/sata_via.c b/drivers/ata/sata_via.c
index 5c62da9..f9803a2 100644
--- a/drivers/ata/sata_via.c
+++ b/drivers/ata/sata_via.c
@@ -566,7 +566,7 @@ static int svia_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
     static int printed_version;
     unsigned int i;
     int rc;
-    struct ata_host *host;
+    struct ata_host *uninitialized_var(host);
     int board_id = (int) ent->driver_data;
     const unsigned *bar_sizes;
 
Justin Madru
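
For readers following the removed ata_sff_data_xfer32() hunk: its return
value is the buffer length rounded up to a multiple of 4, since a trailing
1 to 3 bytes cost one extra padded 32-bit access. A userspace sketch of just
that arithmetic follows; xfer32_bytes_consumed() is a hypothetical name used
for illustration, and no device I/O is performed.

```c
/*
 * Bytes consumed by a 32-bit PIO transfer of buflen bytes, following the
 * arithmetic of the removed ata_sff_data_xfer32(): whole 32-bit words,
 * plus one padded word when buflen is not a multiple of 4.
 * (Hypothetical userspace helper; it only models the length calculation.)
 */
unsigned int xfer32_bytes_consumed(unsigned int buflen)
{
    unsigned int words = buflen >> 2;   /* number of full 32-bit words */
    unsigned int slop = buflen & 3;     /* 1..3 leftover bytes, or 0 */

    if (slop)
        words++;                        /* leftover bytes need one more word */

    return words << 2;                  /* back to a byte count */
}
```

A 512-byte sector transfers as exactly 128 words, while a 510-byte buffer
also consumes 512 bytes because its last two bytes are padded out to a full
word.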


^ permalink raw reply related	[flat|nested] 262+ messages in thread

* Re: [Bug #12263] Sata soft reset filling log
  2009-02-15 20:47   ` Justin Madru
@ 2009-02-15 21:21     ` Rafael J. Wysocki
  2009-02-15 22:30       ` Ingo Molnar
  0 siblings, 1 reply; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-15 21:21 UTC (permalink / raw)
  To: Justin Madru, Ingo Molnar
  Cc: Linux Kernel Mailing List, Kernel Testers List, Linux IDE,
	Alan Cox, Hugh Dickins, Larry Finger, Mikael Pettersson,
	Sergei Shtylyov

On Sunday 15 February 2009, Justin Madru wrote:
> Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.27 and 2.6.28.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> > be listed and let me know (either way).
> >
> >
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12263
> > Subject		: Sata soft reset filling log
> > Submitter	: Justin Madru <bevicm@dslextreme.com>
> > Date		: 2008-12-13 2:07 (64 days old)
> > References	: http://marc.info/?l=linux-kernel&m=122913412608533&w=4
> 
> I'm still seeing this on 2.6.29-rc5. I think my bug #12263 and bug #12609
> are the same issue; more precisely, #12609 is a duplicate of mine, because
> I reported first.
> 
> It seems like the bug has been fixed in tip/master for some time now.
> Below is the diff of origin and tip from when I tested.

Ingo, do you know which patch in -tip fixes this regression?

> $ git diff origin/master..tip/master drivers/ata/
> 
> diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
> index 54961c0..e004c25 100644
> --- a/drivers/ata/ata_piix.c
> +++ b/drivers/ata/ata_piix.c
> @@ -310,7 +310,7 @@ static struct scsi_host_template piix_sht = {
>  };
>  
>  static struct ata_port_operations piix_pata_ops = {
> -    .inherits        = &ata_bmdma32_port_ops,
> +    .inherits        = &ata_bmdma_port_ops,
>      .cable_detect        = ata_cable_40wire,
>      .set_piomode        = piix_set_piomode,
>      .set_dmamode        = piix_set_dmamode,
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 9fbf059..1ed3966 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -1482,7 +1482,7 @@ static int ata_hpa_resize(struct ata_device *dev)
>      struct ata_eh_context *ehc = &dev->link->eh_context;
>      int print_info = ehc->i.flags & ATA_EHI_PRINTINFO;
>      u64 sectors = ata_id_n_sectors(dev->id);
> -    u64 native_sectors;
> +    u64 uninitialized_var(native_sectors);
>      int rc;
>  
>      /* do we need to do it? */
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index b9747fa..d65b9b2 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -3247,7 +3247,7 @@ void ata_scsi_scan_host(struct ata_port *ap, int sync)
>      int tries = 5;
>      struct ata_device *last_failed_dev = NULL;
>      struct ata_link *link;
> -    struct ata_device *dev;
> +    struct ata_device *uninitialized_var(dev);
>  
>      if (ap->flags & ATA_FLAG_DISABLED)
>          return;
> diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
> index 0b299b0..416e3e2 100644
> --- a/drivers/ata/libata-sff.c
> +++ b/drivers/ata/libata-sff.c
> @@ -80,13 +80,6 @@ const struct ata_port_operations ata_bmdma_port_ops = {
>  };
>  EXPORT_SYMBOL_GPL(ata_bmdma_port_ops);
>  
> -const struct ata_port_operations ata_bmdma32_port_ops = {
> -    .inherits        = &ata_bmdma_port_ops,
> -
> -    .sff_data_xfer        = ata_sff_data_xfer32,
> -};
> -EXPORT_SYMBOL_GPL(ata_bmdma32_port_ops);
> -
>  /**
>   *    ata_fill_sg - Fill PCI IDE PRD table
>   *    @qc: Metadata associated with taskfile to be transferred
> @@ -743,52 +736,6 @@ unsigned int ata_sff_data_xfer(struct ata_device *dev, unsigned char *buf,
>  EXPORT_SYMBOL_GPL(ata_sff_data_xfer);
>  
>  /**
> - *    ata_sff_data_xfer32 - Transfer data by PIO
> - *    @dev: device to target
> - *    @buf: data buffer
> - *    @buflen: buffer length
> - *    @rw: read/write
> - *
> - *    Transfer data from/to the device data register by PIO using 32bit
> - *    I/O operations.
> - *
> - *    LOCKING:
> - *    Inherited from caller.
> - *
> - *    RETURNS:
> - *    Bytes consumed.
> - */
> -
> -unsigned int ata_sff_data_xfer32(struct ata_device *dev, unsigned char *buf,
> -                   unsigned int buflen, int rw)
> -{
> -    struct ata_port *ap = dev->link->ap;
> -    void __iomem *data_addr = ap->ioaddr.data_addr;
> -    unsigned int words = buflen >> 2;
> -    int slop = buflen & 3;
> -
> -    /* Transfer multiple of 4 bytes */
> -    if (rw == READ)
> -        ioread32_rep(data_addr, buf, words);
> -    else
> -        iowrite32_rep(data_addr, buf, words);
> -
> -    if (unlikely(slop)) {
> -        __le32 pad;
> -        if (rw == READ) {
> -            pad = cpu_to_le32(ioread32(ap->ioaddr.data_addr));
> -            memcpy(buf + buflen - slop, &pad, slop);
> -        } else {
> -            memcpy(&pad, buf + buflen - slop, slop);
> -            iowrite32(le32_to_cpu(pad), ap->ioaddr.data_addr);
> -        }
> -        words++;
> -    }
> -    return words << 2;
> -}
> -EXPORT_SYMBOL_GPL(ata_sff_data_xfer32);
> -
> -/**
>   *    ata_sff_data_xfer_noirq - Transfer data by PIO
>   *    @dev: device to target
>   *    @buf: data buffer
> diff --git a/drivers/ata/pata_ali.c b/drivers/ata/pata_ali.c
> index eb99dbe..7cd48ea 100644
> --- a/drivers/ata/pata_ali.c
> +++ b/drivers/ata/pata_ali.c
> @@ -151,7 +151,8 @@ static void ali_fifo_control(struct ata_port *ap, struct ata_device *adev, int o
>  
>      pci_read_config_byte(pdev, pio_fifo, &fifo);
>      fifo &= ~(0x0F << shift);
> -    fifo |= (on << shift);
> +    if (on)
> +        fifo |= (on << shift);
>      pci_write_config_byte(pdev, pio_fifo, fifo);
>  }
>  
> @@ -369,11 +370,10 @@ static struct ata_port_operations ali_early_port_ops = {
>      .inherits    = &ata_sff_port_ops,
>      .cable_detect    = ata_cable_40wire,
>      .set_piomode    = ali_set_piomode,
> -    .sff_data_xfer  = ata_sff_data_xfer32,
>  };
>  
>  static const struct ata_port_operations ali_dma_base_ops = {
> -    .inherits    = &ata_bmdma32_port_ops,
> +    .inherits    = &ata_bmdma_port_ops,
>      .set_piomode    = ali_set_piomode,
>      .set_dmamode    = ali_set_dmamode,
>  };
> diff --git a/drivers/ata/pata_amd.c b/drivers/ata/pata_amd.c
> index 63719ab..0ec9c7d 100644
> --- a/drivers/ata/pata_amd.c
> +++ b/drivers/ata/pata_amd.c
> @@ -24,7 +24,7 @@
>  #include <linux/libata.h>
>  
>  #define DRV_NAME "pata_amd"
> -#define DRV_VERSION "0.3.11"
> +#define DRV_VERSION "0.3.10"
>  
>  /**
>   *    timing_setup        -    shared timing computation and load
> @@ -345,7 +345,7 @@ static struct scsi_host_template amd_sht = {
>  };
>  
>  static const struct ata_port_operations amd_base_port_ops = {
> -    .inherits    = &ata_bmdma32_port_ops,
> +    .inherits    = &ata_bmdma_port_ops,
>      .prereset    = amd_pre_reset,
>  };
>  
> diff --git a/drivers/ata/pata_atiixp.c b/drivers/ata/pata_atiixp.c
> index 506adde..115eb00 100644
> --- a/drivers/ata/pata_atiixp.c
> +++ b/drivers/ata/pata_atiixp.c
> @@ -140,7 +140,7 @@ static void atiixp_set_dmamode(struct ata_port *ap, struct ata_device *adev)
>          wanted_pio = 3;
>      else if (adev->dma_mode == XFER_MW_DMA_0)
>          wanted_pio = 0;
> -    else BUG();
> +    else panic("atiixp_set_dmamode: unknown DMA mode!");
>  
>      if (adev->pio_mode != wanted_pio)
>          atiixp_set_pio_timing(ap, adev, wanted_pio);
> diff --git a/drivers/ata/pata_mpiix.c b/drivers/ata/pata_mpiix.c
> index aa576ca..7c8faa4 100644
> --- a/drivers/ata/pata_mpiix.c
> +++ b/drivers/ata/pata_mpiix.c
> @@ -35,7 +35,7 @@
>  #include <linux/libata.h>
>  
>  #define DRV_NAME "pata_mpiix"
> -#define DRV_VERSION "0.7.7"
> +#define DRV_VERSION "0.7.6"
>  
>  enum {
>      IDETIM = 0x6C,        /* IDE control register */
> @@ -146,7 +146,6 @@ static struct ata_port_operations mpiix_port_ops = {
>      .cable_detect    = ata_cable_40wire,
>      .set_piomode    = mpiix_set_piomode,
>      .prereset    = mpiix_pre_reset,
> -    .sff_data_xfer    = ata_sff_data_xfer32,
>  };
>  
>  static int mpiix_init_one(struct pci_dev *dev, const struct pci_device_id *id)
> diff --git a/drivers/ata/pata_sil680.c b/drivers/ata/pata_sil680.c
> index 9e764e5..83580a5 100644
> --- a/drivers/ata/pata_sil680.c
> +++ b/drivers/ata/pata_sil680.c
> @@ -32,7 +32,7 @@
>  #include <linux/libata.h>
>  
>  #define DRV_NAME "pata_sil680"
> -#define DRV_VERSION "0.4.9"
> +#define DRV_VERSION "0.4.8"
>  
>  #define SIL680_MMIO_BAR        5
>  
> @@ -195,7 +195,7 @@ static struct scsi_host_template sil680_sht = {
>  };
>  
>  static struct ata_port_operations sil680_port_ops = {
> -    .inherits    = &ata_bmdma32_port_ops,
> +    .inherits    = &ata_bmdma_port_ops,
>      .cable_detect    = sil680_cable_detect,
>      .set_piomode    = sil680_set_piomode,
>      .set_dmamode    = sil680_set_dmamode,
> diff --git a/drivers/ata/sata_via.c b/drivers/ata/sata_via.c
> index 5c62da9..f9803a2 100644
> --- a/drivers/ata/sata_via.c
> +++ b/drivers/ata/sata_via.c
> @@ -566,7 +566,7 @@ static int svia_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>      static int printed_version;
>      unsigned int i;
>      int rc;
> -    struct ata_host *host;
> +    struct ata_host *uninitialized_var(host);
>      int board_id = (int) ent->driver_data;
>      const unsigned *bar_sizes;
>  
> Justin Madru

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12401] 2.6.28 regression: xbacklight broken on ThinkPad X61s
@ 2009-02-15 22:16         ` Tino Keitel
  0 siblings, 0 replies; 262+ messages in thread
From: Tino Keitel @ 2009-02-15 22:16 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Matthew Garrett, Linux Kernel Mailing List, Kernel Testers List,
	Andi Kleen, Len Brown, Thomas Renninger, Zhang Rui

On Sun, Feb 15, 2009 at 15:38:28 +0100, Rafael J. Wysocki wrote:
> On Sunday 15 February 2009, Matthew Garrett wrote:
> > This one sounded like a configuration error.

I think if it works without DRI in 2.6.27 and doesn't work in 2.6.28,
it isn't a configuration error, but a real regression.

Regards,
Tino

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12263] Sata soft reset filling log
  2009-02-15 21:21     ` Rafael J. Wysocki
@ 2009-02-15 22:30       ` Ingo Molnar
  2009-02-15 23:12         ` Rafael J. Wysocki
  0 siblings, 1 reply; 262+ messages in thread
From: Ingo Molnar @ 2009-02-15 22:30 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Justin Madru, Linux Kernel Mailing List, Kernel Testers List,
	Linux IDE, Alan Cox, Hugh Dickins, Larry Finger,
	Mikael Pettersson, Sergei Shtylyov


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> On Sunday 15 February 2009, Justin Madru wrote:
> > Rafael J. Wysocki wrote:
> > > This message has been generated automatically as a part of a report
> > > of regressions introduced between 2.6.27 and 2.6.28.
> > >
> > > The following bug entry is on the current list of known regressions
> > > introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> > > be listed and let me know (either way).
> > >
> > >
> > > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12263
> > > Subject		: Sata soft reset filling log
> > > Submitter	: Justin Madru <bevicm@dslextreme.com>
> > > Date		: 2008-12-13 2:07 (64 days old)
> > > References	: http://marc.info/?l=linux-kernel&m=122913412608533&w=4
> > 
> > I'm still seeing this on .29-rc5, and I think that my bug #12263 is a 
> > duplicate of bug #12609,
> > or more correctly it's a duplicate of mine because I reported first.
> > 
> > It seems like the bug has been fixed in tip/master for some time now.
> > Below is the diff of origin and tip from when I tested.
> 
> Ingo, do you know which patch in -tip fixes this regression?

This one, done on Jan 10, more than a month ago:

  f1d26da: Revert "libata: Add 32bit PIO support"

When a commit causes trouble in -tip QA, I immediately revert it in 95%
of the cases, no questions asked. Especially if it's related to
persistent storage.
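
For readers without the tree at hand: the tail handling in the 32-bit PIO
helper that this revert appears to remove (ata_sff_data_xfer32, visible in
the diff quoted earlier in the thread) can be sketched in userspace roughly
as below. This is only an illustration under stated assumptions, not the
kernel code: fake_iowrite32() and xfer32_write() are made-up names, the
"port" is just an array, and the endianness conversion done by the kernel
version is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the 32-bit data register: every word written
 * to the "port" is recorded in an array so the result can be inspected.
 */
#define FAKE_PORT_WORDS 64
static uint32_t fake_port[FAKE_PORT_WORDS];
static size_t fake_port_len;

static void fake_iowrite32(uint32_t word)
{
	if (fake_port_len < FAKE_PORT_WORDS)
		fake_port[fake_port_len++] = word;
}

/*
 * Write buflen bytes to the fake port as 32-bit words.  Trailing bytes
 * that do not fill a whole word ("slop") are copied into a zeroed scratch
 * word, which is then written as one extra word.  Returns the number of
 * bytes put on the port, i.e. buflen rounded up to a multiple of 4.
 */
static size_t xfer32_write(const unsigned char *buf, size_t buflen)
{
	size_t words = buflen >> 2;
	size_t slop = buflen & 3;
	size_t i;

	for (i = 0; i < words; i++) {
		uint32_t word;

		memcpy(&word, buf + 4 * i, sizeof(word));
		fake_iowrite32(word);
	}

	if (slop) {
		uint32_t pad = 0;

		memcpy(&pad, buf + buflen - slop, slop);
		fake_iowrite32(pad);
		words++;
	}

	return words << 2;
}
```

Calling xfer32_write() on a 10-byte buffer writes three words and reports
12 bytes, which is the rounding a caller of such a helper has to expect.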

	Ingo

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12263] Sata soft reset filling log
  2009-02-15 22:30       ` Ingo Molnar
@ 2009-02-15 23:12         ` Rafael J. Wysocki
  2009-02-16 15:18           ` Sergei Shtylyov
  0 siblings, 1 reply; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-15 23:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Justin Madru, Linux Kernel Mailing List, Kernel Testers List,
	Linux IDE, Alan Cox, Hugh Dickins, Larry Finger,
	Mikael Pettersson, Sergei Shtylyov

On Sunday 15 February 2009, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > On Sunday 15 February 2009, Justin Madru wrote:
> > > Rafael J. Wysocki wrote:
> > > > This message has been generated automatically as a part of a report
> > > > of regressions introduced between 2.6.27 and 2.6.28.
> > > >
> > > > The following bug entry is on the current list of known regressions
> > > > introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> > > > be listed and let me know (either way).
> > > >
> > > >
> > > > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12263
> > > > Subject		: Sata soft reset filling log
> > > > Submitter	: Justin Madru <bevicm@dslextreme.com>
> > > > Date		: 2008-12-13 2:07 (64 days old)
> > > > References	: http://marc.info/?l=linux-kernel&m=122913412608533&w=4
> > > 
> > > I'm still seeing this on .29-rc5, and I think that my bug #12263 is a 
> > > duplicate of bug #12609,
> > > or more correctly it's a duplicate of mine because I reported first.
> > > 
> > > It seems like the bug has been fixed in tip/master for some time now.
> > > Below is the diff of origin and tip from when I tested.
> > 
> > Ingo, do you know which patch in -tip fixes this regression?
> 
> This one, done on Jan 10, more than a month ago:
> 
>   f1d26da: Revert "libata: Add 32bit PIO support"
> 
> When a commit causes trouble in -tip qa i immediately revert it in 95% 
> of the cases, no questions asked. Especially if it's related to 
> persistent storage.

OK, thanks.

We seem to have a working fix patch for this issue in bug #12609.

Best,
Rafael

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12401] 2.6.28 regression: xbacklight broken on ThinkPad X61s
@ 2009-02-16  1:16           ` Matthew Garrett
  0 siblings, 0 replies; 262+ messages in thread
From: Matthew Garrett @ 2009-02-16  1:16 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Andi Kleen, Len Brown, Thomas Renninger,
	Zhang Rui

On Sun, Feb 15, 2009 at 11:16:47PM +0100, Tino Keitel wrote:

> I think if it works without DRI in 2.6.27 and doesn't work in 2.6.28,
> it isn't a configuration error, but a real regression.

It only worked by accident without DRM support, since you were using the 
ATI codepath in the firmware rather than the Intel one. That bug's been 
fixed, so now you're following the Intel codepath - unfortunately 
there's no way to do that without kernel-level graphics support, which 
means DRM.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12401] 2.6.28 regression: xbacklight broken on ThinkPad X61s
@ 2009-02-16 12:37             ` Ingo Molnar
  0 siblings, 0 replies; 262+ messages in thread
From: Ingo Molnar @ 2009-02-16 12:37 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Andi Kleen, Len Brown, Thomas Renninger,
	Zhang Rui


* Matthew Garrett <mjg59@srcf.ucam.org> wrote:

> On Sun, Feb 15, 2009 at 11:16:47PM +0100, Tino Keitel wrote:
> 
> > I think if it works without DRI in 2.6.27 and doesn't work in 2.6.28,
> > it isn't a configuration error, but a real regression.
> 
> It only worked by accident without DRM support, since you were using the 
> ATI codepath in the firmware rather than the Intel one. That bug's been 
> fixed, [...]

Which precise commit ID is that?

> [...] so now you're following the Intel codepath - unfortunately 
> there's no way to do that without kernel-level graphics support, which 
> means DRM.

Tino, does it all work fine if CONFIG_DRM is enabled?

	Ingo

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12401] 2.6.28 regression: xbacklight broken on ThinkPad X61s
@ 2009-02-16 12:42               ` Matthew Garrett
  0 siblings, 0 replies; 262+ messages in thread
From: Matthew Garrett @ 2009-02-16 12:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Andi Kleen, Len Brown, Thomas Renninger,
	Zhang Rui

On Mon, Feb 16, 2009 at 01:37:40PM +0100, Ingo Molnar wrote:
> 
> * Matthew Garrett <mjg59@srcf.ucam.org> wrote:
> 
> > On Sun, Feb 15, 2009 at 11:16:47PM +0100, Tino Keitel wrote:
> > 
> > > I think if it works without DRI in 2.6.27 and doesn't work in 2.6.28,
> > > it isn't a configuration error, but a real regression.
> > 
> > It only worked by accident without DRM support, since you were using the 
> > ATI codepath in the firmware rather than the Intel one. That bug's been 
> > fixed, [...]
> 
> Which precise commit ID is that?

22c13f9d8179f4c9caecfcb60a95214562b9addc

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12263] Sata soft reset filling log
  2009-02-15 23:12         ` Rafael J. Wysocki
@ 2009-02-16 15:18           ` Sergei Shtylyov
  2009-02-16 15:21             ` Ingo Molnar
       [not found]             ` <499983DF.5050503-hkdhdckH98+B+jHODAdFcQ@public.gmane.org>
  0 siblings, 2 replies; 262+ messages in thread
From: Sergei Shtylyov @ 2009-02-16 15:18 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, Justin Madru, Linux Kernel Mailing List,
	Kernel Testers List, Linux IDE, Alan Cox, Hugh Dickins,
	Larry Finger, Mikael Pettersson

Hello.

Rafael J. Wysocki wrote:

>>>>>This message has been generated automatically as a part of a report
>>>>>of regressions introduced between 2.6.27 and 2.6.28.

>>>>>The following bug entry is on the current list of known regressions
>>>>>introduced between 2.6.27 and 2.6.28.  Please verify if it still should
>>>>>be listed and let me know (either way).

>>>>>Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12263
>>>>>Subject		: Sata soft reset filling log
>>>>>Submitter	: Justin Madru <bevicm@dslextreme.com>
>>>>>Date		: 2008-12-13 2:07 (64 days old)
>>>>>References	: http://marc.info/?l=linux-kernel&m=122913412608533&w=4

>>>>I'm still seeing this on .29-rc5, and I think that my bug #12263 is a 
>>>>duplicate of bug #12609,
>>>>or more correctly it's a duplicate of mine because I reported first.

>>>>It seems like the bug has been fixed in tip/master for some time now.
>>>>Below is the diff of origin and tip from when I tested.

>>>Ingo, do you know which patch in -tip fixes this regression?

>>This one, done on Jan 10, more than a month ago:

>>  f1d26da: Revert "libata: Add 32bit PIO support"

>>When a commit causes trouble in -tip qa i immediately revert it in 95% 
>>of the cases, no questions asked. Especially if it's related to 
>>persistent storage.

> OK, thanks.

> We seem to have a working fix patch for this issue in bug #12609.

    Wait, if this is indeed a post-2.6.27 regression, it couldn't possibly have
been caused by that patch, which got merged during the 2.6.29-rc1 timeframe.
Something's up with this bug...

MBR, Sergei

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12263] Sata soft reset filling log
  2009-02-16 15:18           ` Sergei Shtylyov
@ 2009-02-16 15:21             ` Ingo Molnar
       [not found]             ` <499983DF.5050503-hkdhdckH98+B+jHODAdFcQ@public.gmane.org>
  1 sibling, 0 replies; 262+ messages in thread
From: Ingo Molnar @ 2009-02-16 15:21 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Rafael J. Wysocki, Justin Madru, Linux Kernel Mailing List,
	Kernel Testers List, Linux IDE, Alan Cox, Hugh Dickins,
	Larry Finger, Mikael Pettersson


* Sergei Shtylyov <sshtylyov@ru.mvista.com> wrote:

> Hello.
>
> Rafael J. Wysocki wrote:
>
>>>>>> This message has been generated automatically as a part of a report
>>>>>> of regressions introduced between 2.6.27 and 2.6.28.
>
>>>>>> The following bug entry is on the current list of known regressions
>>>>>> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
>>>>>> be listed and let me know (either way).
>
>>>>>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12263
>>>>>> Subject		: Sata soft reset filling log
>>>>>> Submitter	: Justin Madru <bevicm@dslextreme.com>
>>>>>> Date		: 2008-12-13 2:07 (64 days old)
>>>>>> References	: http://marc.info/?l=linux-kernel&m=122913412608533&w=4
>
>>>>> I'm still seeing this on .29-rc5, and I think that my bug #12263 
>>>>> is a duplicate of bug #12609,
>>>>> or more correctly it's a duplicate of mine because I reported first.
>
>>>>> It seems like the bug has been fixed in tip/master for some time now.
>>>>> Below is the diff of origin and tip from when I tested.
>
>>>> Ingo, do you know which patch in -tip fixes this regression?
>
>>> This one, done on Jan 10, more than a month ago:
>
>>>  f1d26da: Revert "libata: Add 32bit PIO support"
>
>>> When a commit causes trouble in -tip qa i immediately revert it in 
>>> 95% of the cases, no questions asked. Especially if it's related to  
>>> persistent storage.
>
>> OK, thanks.
>
>> We seem to have a working fix patch for this issue in bug #12609.
>
>    Wait, if this is indeed post-2.6.27 regression, it couldn't possibly 
> have been caused by that patch which got merged during 2.6.29-rc1 
> timeframe. Something's up with this bug...

SATA uses the SCSI layer, right? It could then perhaps be these bits in 
tip:out-of-tree:

 813104e: Revert "[SCSI] simplify scsi_io_completion()"
 84db545: Revert "[SCSI] Fix uninitialized variable error in scsi_io_completion"
 0eb6038: Revert "[SCSI] Fix error handling for DIF/DIX"
 3cd94dd: Revert "[SCSI] scsi_lib: don't decrement busy counters when inserting commands"
 c27aed5: Revert "[SCSI] scsi_lib: fix DID_RESET status problems"

I needed these to keep an aic7xxx box from crashing. This regression got
introduced at around 2.6.28-rc1, so it fits the timeframe.

	Ingo

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12263] Sata soft reset filling log
  2009-02-16 15:18           ` Sergei Shtylyov
@ 2009-02-16 15:21                 ` Sergei Shtylyov
       [not found]             ` <499983DF.5050503-hkdhdckH98+B+jHODAdFcQ@public.gmane.org>
  1 sibling, 0 replies; 262+ messages in thread
From: Sergei Shtylyov @ 2009-02-16 15:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, Justin Madru, Linux Kernel Mailing List,
	Kernel Testers List, Linux IDE, Alan Cox, Hugh Dickins,
	Larry Finger, Mikael Pettersson

Hello, I wrote:

>>>>>> This message has been generated automatically as a part of a report
>>>>>> of regressions introduced between 2.6.27 and 2.6.28.

>>>>>> The following bug entry is on the current list of known regressions
>>>>>> introduced between 2.6.27 and 2.6.28.  Please verify if it still 
>>>>>> should
>>>>>> be listed and let me know (either way).

>>>>>> Bug-Entry    : http://bugzilla.kernel.org/show_bug.cgi?id=12263
>>>>>> Subject        : Sata soft reset filling log
>>>>>> Submitter    : Justin Madru <bevicm@dslextreme.com>
>>>>>> Date        : 2008-12-13 2:07 (64 days old)
>>>>>> References    : 
>>>>>> http://marc.info/?l=linux-kernel&m=122913412608533&w=4

>>>>> I'm still seeing this on .29-rc5, and I think that my bug #12263 is 
>>>>> a duplicate of bug #12609,
>>>>> or more correctly it's a duplicate of mine because I reported first.

>>>>> It seems like the bug has been fixed in tip/master for some time now.
>>>>> Below is the diff of origin and tip from when I tested.

>>>> Ingo, do you know which patch in -tip fixes this regression?

>>> This one, done on Jan 10, more than a month ago:

>>>  f1d26da: Revert "libata: Add 32bit PIO support"

>>> When a commit causes trouble in -tip qa i immediately revert it in 
>>> 95% of the cases, no questions asked. Especially if it's related to 
>>> persistent storage.

>> OK, thanks.

>> We seem to have a working fix patch for this issue in bug #12609.

>    Wait, if this is indeed post-2.6.27 regression, it couldn't possibly 
> have been caused by that patch which got merged during 2.6.29-rc1 
> timeframe. Something's up with this bug...

    Also, it's been reported for a hard disk, while the regression in bug 12609
only hits ATAPI devices. I think that bug 12263 needs to be reopened.

MBR, Sergei

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12263] Sata soft reset filling log
  2009-02-16 15:21                 ` Sergei Shtylyov
@ 2009-02-16 15:31                     ` Sergei Shtylyov
  -1 siblings, 0 replies; 262+ messages in thread
From: Sergei Shtylyov @ 2009-02-16 15:31 UTC (permalink / raw)
  To: Rafael J. Wysocki, Ingo Molnar
  Cc: Justin Madru, Linux Kernel Mailing List, Kernel Testers List,
	Linux IDE, Alan Cox, Hugh Dickins, Larry Finger,
	Mikael Pettersson

Hello, I wrote:

>>>>>>> This message has been generated automatically as a part of a report
>>>>>>> of regressions introduced between 2.6.27 and 2.6.28.

>>>>>>> The following bug entry is on the current list of known regressions
>>>>>>> introduced between 2.6.27 and 2.6.28.  Please verify if it still 
>>>>>>> should
>>>>>>> be listed and let me know (either way).

>>>>>>> Bug-Entry    : http://bugzilla.kernel.org/show_bug.cgi?id=12263
>>>>>>> Subject        : Sata soft reset filling log
>>>>>>> Submitter    : Justin Madru <bevicm@dslextreme.com>
>>>>>>> Date        : 2008-12-13 2:07 (64 days old)
>>>>>>> References    : 
>>>>>>> http://marc.info/?l=linux-kernel&m=122913412608533&w=4

>>>>>> I'm still seeing this on .29-rc5, and I think that my bug #12263 
>>>>>> is a duplicate of bug #12609,
>>>>>> or more correctly it's a duplicate of mine because I reported first.

>>>>>> It seems like the bug has been fixed in tip/master for some time now.
>>>>>> Below is the diff of origin and tip from when I tested.

>>>>> Ingo, do you know which patch in -tip fixes this regression?

>>>> This one, done on Jan 10, more than a month ago:

>>>>  f1d26da: Revert "libata: Add 32bit PIO support"

>>>> When a commit causes trouble in -tip qa i immediately revert it in 
>>>> 95% of the cases, no questions asked. Especially if it's related to 
>>>> persistent storage.

>>> OK, thanks.

>>> We seem to have a working fix patch for this issue in bug #12609.

>>    Wait, if this is indeed post-2.6.27 regression, it couldn't 
>> possibly have been caused by that patch which got merged during 
>> 2.6.29-rc1 timeframe. Something's up with this bug...

>    Also, it's been reported for a hard disk while regression in bug 
> 12609 only hits the ATAPI devices.  I think that bug 12263 needs to be reopened.

    Having checked the SCSI command codes, "cdb 0x1e" means the ALLOW MEDIUM
REMOVAL command -- which could hardly be addressed to a usual hard disk. So,
it looks like we had a case of a confused bug report which has a lot of info
on the hard disk while the errors were most probably happening with a CD/DVD
drive. :-)
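
    For anyone wanting to repeat that lookup, here is a minimal sketch of
decoding the first CDB byte into a command name. The table is a small
illustrative subset only, and scsi_opcode_name() is a hypothetical helper,
not a kernel function; the SCSI specs spell the 0x1e command as PREVENT ALLOW
MEDIUM REMOVAL.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per opcode we know how to name (illustrative subset only). */
struct opcode_name {
	uint8_t opcode;
	const char *name;
};

static const struct opcode_name opcode_names[] = {
	{ 0x00, "TEST UNIT READY" },
	{ 0x03, "REQUEST SENSE" },
	{ 0x12, "INQUIRY" },
	{ 0x1b, "START STOP UNIT" },
	{ 0x1e, "PREVENT ALLOW MEDIUM REMOVAL" },
	{ 0x28, "READ(10)" },
	{ 0x2a, "WRITE(10)" },
};

/*
 * Hypothetical helper: the opcode is always the first byte of the CDB,
 * so look that byte up in the table above.
 */
static const char *scsi_opcode_name(const uint8_t *cdb)
{
	size_t i;

	for (i = 0; i < sizeof(opcode_names) / sizeof(opcode_names[0]); i++)
		if (opcode_names[i].opcode == cdb[0])
			return opcode_names[i].name;

	return "unknown";
}
```

A CDB starting with 0x1e comes back as a media-lock command, which only
makes sense for removable-media devices such as an ATAPI CD/DVD drive.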

MBR, Sergei

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12403] TTY problem on linux-2.6.28-rc7
  2009-02-14 20:50   ` Rafael J. Wysocki
@ 2009-02-16 16:12     ` Aristeu Rozanski
  -1 siblings, 0 replies; 262+ messages in thread
From: Aristeu Rozanski @ 2009-02-16 16:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, sasa sasa

> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.27 and 2.6.28.
> 
> The following bug entry is on the current list of known regressions
> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> be listed and let me know (either way).
> 
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12403
> Subject		: TTY problem on linux-2.6.28-rc7
> Submitter	: sasa sasa <sasak.1983@gmail.com>
> Date		: 2008-12-22 4:23 (55 days old)
> References	: http://marc.info/?l=linux-kernel&m=122991914600390&w=4
according to
	http://marc.info/?l=linux-kernel&m=123054911532245&w=4
which is a reply to the first post, it's not a kernel problem

-- 
Aristeu


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12263] Sata soft reset filling log
  2009-02-16 15:31                     ` Sergei Shtylyov
  (?)
@ 2009-02-16 19:23                     ` Justin Madru
       [not found]                       ` <4999BD1A.1060101-u1xxEuL7cY4AvxtiuMwx3w@public.gmane.org>
  -1 siblings, 1 reply; 262+ messages in thread
From: Justin Madru @ 2009-02-16 19:23 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Rafael J. Wysocki, Ingo Molnar, Linux Kernel Mailing List,
	Kernel Testers List, Linux IDE, Alan Cox, Hugh Dickins,
	Larry Finger, Mikael Pettersson

Sergei Shtylyov wrote:

> After referring to the SCSI command codes "cdb 0x1e" means ALLOW MEDIUM REMOVAL command -- which
> could hardly be addressed to an usual hard disk. So, it looks like we had a case of the confused bug report which
> has a lot of info on the hard disk while errors were most probably happening with a CD/DVD drive.
Yes, I originally thought it was my hard disk because the kernel logs showed ata2.

But, Tejun Heo figured out it was my DVD drive (ATAPI) that was on the ata2 link.

(see http://marc.info/?l=linux-kernel&m=122993014109646&w=2)
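
For reference, the opcode is the first byte of the "cdb" line in the libata error output. A minimal lookup sketch in C follows; the helper name cdb_opcode_name() and the table are hypothetical, and only two opcodes are listed (0x1e, discussed here, and 0x00, TEST UNIT READY), so it is nowhere near a complete opcode list.

```c
#include <stddef.h>

/* Small, non-exhaustive table of SCSI/ATAPI packet opcodes.
 * Illustration only: just two entries, not a full opcode list. */
struct scsi_opcode_name {
    unsigned char opcode;
    const char *name;
};

static const struct scsi_opcode_name opcode_names[] = {
    { 0x00, "TEST UNIT READY" },
    { 0x1e, "PREVENT ALLOW MEDIUM REMOVAL" },
};

/* Return a name for the first CDB byte, or "UNKNOWN" if it is not
 * in the table above. */
const char *cdb_opcode_name(const unsigned char *cdb)
{
    size_t i;

    for (i = 0; i < sizeof(opcode_names) / sizeof(opcode_names[0]); i++)
        if (opcode_names[i].opcode == cdb[0])
            return opcode_names[i].name;
    return "UNKNOWN";
}
```

Passing the 16 bytes from a "cdb 1e 00 00 ..." line would give PREVENT ALLOW MEDIUM REMOVAL, a command that makes sense for an optical drive rather than a hard disk.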

I tried to bisect it, but around .28-rc1 I began to get numerous compile errors, so couldn't continue.

I also tried patches that Tejun sent me, but none of them worked; they just slightly changed the error message.

So, yes this is a regression that was introduced in the .28 merge window, and I still think that bug #12609 is a duplicate of my bug.

I don't see this bug on tip/master and this is the diff of origin and tip at the time I tested.

$ git diff origin/master..tip/master drivers/ata/
diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index 54961c0..e004c25 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -310,7 +310,7 @@ static struct scsi_host_template piix_sht = {
};
static struct ata_port_operations piix_pata_ops = {
-    .inherits        = &ata_bmdma32_port_ops,
+    .inherits        = &ata_bmdma_port_ops,
    .cable_detect        = ata_cable_40wire,
    .set_piomode        = piix_set_piomode,
    .set_dmamode        = piix_set_dmamode,
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 9fbf059..1ed3966 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -1482,7 +1482,7 @@ static int ata_hpa_resize(struct ata_device *dev)
    struct ata_eh_context *ehc = &dev->link->eh_context;
    int print_info = ehc->i.flags & ATA_EHI_PRINTINFO;
    u64 sectors = ata_id_n_sectors(dev->id);
-    u64 native_sectors;
+    u64 uninitialized_var(native_sectors);
    int rc;
    /* do we need to do it? */
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index b9747fa..d65b9b2 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3247,7 +3247,7 @@ void ata_scsi_scan_host(struct ata_port *ap, int sync)
    int tries = 5;
    struct ata_device *last_failed_dev = NULL;
    struct ata_link *link;
-    struct ata_device *dev;
+    struct ata_device *uninitialized_var(dev);
    if (ap->flags & ATA_FLAG_DISABLED)
        return;
diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
index 0b299b0..416e3e2 100644
--- a/drivers/ata/libata-sff.c
+++ b/drivers/ata/libata-sff.c
@@ -80,13 +80,6 @@ const struct ata_port_operations ata_bmdma_port_ops = {
};
EXPORT_SYMBOL_GPL(ata_bmdma_port_ops);
-const struct ata_port_operations ata_bmdma32_port_ops = {
-    .inherits        = &ata_bmdma_port_ops,
-
-    .sff_data_xfer        = ata_sff_data_xfer32,
-};
-EXPORT_SYMBOL_GPL(ata_bmdma32_port_ops);
-
/**
 *    ata_fill_sg - Fill PCI IDE PRD table
 *    @qc: Metadata associated with taskfile to be transferred
@@ -743,52 +736,6 @@ unsigned int ata_sff_data_xfer(struct ata_device *dev, unsigned char *buf,
EXPORT_SYMBOL_GPL(ata_sff_data_xfer);
/**
- *    ata_sff_data_xfer32 - Transfer data by PIO
- *    @dev: device to target
- *    @buf: data buffer
- *    @buflen: buffer length
- *    @rw: read/write
- *
- *    Transfer data from/to the device data register by PIO using 32bit
- *    I/O operations.
- *
- *    LOCKING:
- *    Inherited from caller.
- *
- *    RETURNS:
- *    Bytes consumed.
- */
-
-unsigned int ata_sff_data_xfer32(struct ata_device *dev, unsigned char *buf,
-                   unsigned int buflen, int rw)
-{
-    struct ata_port *ap = dev->link->ap;
-    void __iomem *data_addr = ap->ioaddr.data_addr;
-    unsigned int words = buflen >> 2;
-    int slop = buflen & 3;
-
-    /* Transfer multiple of 4 bytes */
-    if (rw == READ)
-        ioread32_rep(data_addr, buf, words);
-    else
-        iowrite32_rep(data_addr, buf, words);
-
-    if (unlikely(slop)) {
-        __le32 pad;
-        if (rw == READ) {
-            pad = cpu_to_le32(ioread32(ap->ioaddr.data_addr));
-            memcpy(buf + buflen - slop, &pad, slop);
-        } else {
-            memcpy(&pad, buf + buflen - slop, slop);
-            iowrite32(le32_to_cpu(pad), ap->ioaddr.data_addr);
-        }
-        words++;
-    }
-    return words << 2;
-}
-EXPORT_SYMBOL_GPL(ata_sff_data_xfer32);
-
-/**
 *    ata_sff_data_xfer_noirq - Transfer data by PIO
 *    @dev: device to target
 *    @buf: data buffer
diff --git a/drivers/ata/pata_ali.c b/drivers/ata/pata_ali.c
index eb99dbe..7cd48ea 100644
--- a/drivers/ata/pata_ali.c
+++ b/drivers/ata/pata_ali.c
@@ -151,7 +151,8 @@ static void ali_fifo_control(struct ata_port *ap, struct ata_device *adev, int o
    pci_read_config_byte(pdev, pio_fifo, &fifo);
    fifo &= ~(0x0F << shift);
-    fifo |= (on << shift);
+    if (on)
+        fifo |= (on << shift);
    pci_write_config_byte(pdev, pio_fifo, fifo);
}
@@ -369,11 +370,10 @@ static struct ata_port_operations ali_early_port_ops = {
    .inherits    = &ata_sff_port_ops,
    .cable_detect    = ata_cable_40wire,
    .set_piomode    = ali_set_piomode,
-    .sff_data_xfer  = ata_sff_data_xfer32,
};
static const struct ata_port_operations ali_dma_base_ops = {
-    .inherits    = &ata_bmdma32_port_ops,
+    .inherits    = &ata_bmdma_port_ops,
    .set_piomode    = ali_set_piomode,
    .set_dmamode    = ali_set_dmamode,
};
diff --git a/drivers/ata/pata_amd.c b/drivers/ata/pata_amd.c
index 63719ab..0ec9c7d 100644
--- a/drivers/ata/pata_amd.c
+++ b/drivers/ata/pata_amd.c
@@ -24,7 +24,7 @@
#include <linux/libata.h>
#define DRV_NAME "pata_amd"
-#define DRV_VERSION "0.3.11"
+#define DRV_VERSION "0.3.10"
/**
 *    timing_setup        -    shared timing computation and load
@@ -345,7 +345,7 @@ static struct scsi_host_template amd_sht = {
};
static const struct ata_port_operations amd_base_port_ops = {
-    .inherits    = &ata_bmdma32_port_ops,
+    .inherits    = &ata_bmdma_port_ops,
    .prereset    = amd_pre_reset,
};
diff --git a/drivers/ata/pata_atiixp.c b/drivers/ata/pata_atiixp.c
index 506adde..115eb00 100644
--- a/drivers/ata/pata_atiixp.c
+++ b/drivers/ata/pata_atiixp.c
@@ -140,7 +140,7 @@ static void atiixp_set_dmamode(struct ata_port *ap, struct ata_device *adev)
        wanted_pio = 3;
    else if (adev->dma_mode == XFER_MW_DMA_0)
        wanted_pio = 0;
-    else BUG();
+    else panic("atiixp_set_dmamode: unknown DMA mode!");
    if (adev->pio_mode != wanted_pio)
        atiixp_set_pio_timing(ap, adev, wanted_pio);
diff --git a/drivers/ata/pata_mpiix.c b/drivers/ata/pata_mpiix.c
index aa576ca..7c8faa4 100644
--- a/drivers/ata/pata_mpiix.c
+++ b/drivers/ata/pata_mpiix.c
@@ -35,7 +35,7 @@
#include <linux/libata.h>
#define DRV_NAME "pata_mpiix"
-#define DRV_VERSION "0.7.7"
+#define DRV_VERSION "0.7.6"
enum {
    IDETIM = 0x6C,        /* IDE control register */
@@ -146,7 +146,6 @@ static struct ata_port_operations mpiix_port_ops = {
    .cable_detect    = ata_cable_40wire,
    .set_piomode    = mpiix_set_piomode,
    .prereset    = mpiix_pre_reset,
-    .sff_data_xfer    = ata_sff_data_xfer32,
};
static int mpiix_init_one(struct pci_dev *dev, const struct pci_device_id *id)
diff --git a/drivers/ata/pata_sil680.c b/drivers/ata/pata_sil680.c
index 9e764e5..83580a5 100644
--- a/drivers/ata/pata_sil680.c
+++ b/drivers/ata/pata_sil680.c
@@ -32,7 +32,7 @@
#include <linux/libata.h>
#define DRV_NAME "pata_sil680"
-#define DRV_VERSION "0.4.9"
+#define DRV_VERSION "0.4.8"
#define SIL680_MMIO_BAR        5
@@ -195,7 +195,7 @@ static struct scsi_host_template sil680_sht = {
};
static struct ata_port_operations sil680_port_ops = {
-    .inherits    = &ata_bmdma32_port_ops,
+    .inherits    = &ata_bmdma_port_ops,
    .cable_detect    = sil680_cable_detect,
    .set_piomode    = sil680_set_piomode,
    .set_dmamode    = sil680_set_dmamode,
diff --git a/drivers/ata/sata_via.c b/drivers/ata/sata_via.c
index 5c62da9..f9803a2 100644
--- a/drivers/ata/sata_via.c
+++ b/drivers/ata/sata_via.c
@@ -566,7 +566,7 @@ static int svia_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
    static int printed_version;
    unsigned int i;
    int rc;
-    struct ata_host *host;
+    struct ata_host *uninitialized_var(host);
    int board_id = (int) ent->driver_data;
    const unsigned *bar_sizes;

Justin Madru


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12263] Sata soft reset filling log
  2009-02-16 19:23                     ` Justin Madru
@ 2009-02-16 19:42                           ` Sergei Shtylyov
  0 siblings, 0 replies; 262+ messages in thread
From: Sergei Shtylyov @ 2009-02-16 19:42 UTC (permalink / raw)
  To: Justin Madru
  Cc: Rafael J. Wysocki, Ingo Molnar, Linux Kernel Mailing List,
	Kernel Testers List, Linux IDE, Alan Cox, Hugh Dickins,
	Larry Finger, Mikael Pettersson

Hello.

Justin Madru wrote:

>> After referring to the SCSI command codes "cdb 0x1e" means ALLOW 
>> MEDIUM REMOVAL command -- which
>> could hardly be addressed to an usual hard disk. So, it looks like we 
>> had a case of the confused bug report which
>> has a lot of info on the hard disk while errors were most probably 
>> happening with a CD/DVD drive.

> Yes, I originally thought it was my hard disk because the kernel logs 
> showed ata2.

> But, Tejun Heo figured out it was my DVD drive (ATAPI) that was on the 
> ata2 link.

> (see http://marc.info/?l=linux-kernel&m=122993014109646&w=2)

> I tried to bisect it, but around .28-rc1 I began to get numerous compile 
> errors, so couldn't continue.

> I also tried patches that Tejun sent me, but none of them worked; they just
> slightly changed the error message.

> So, yes this is a regression that was introduced in the .28 merge 
> window, and I still think that bug #12609 is a duplicate of my bug.

    If 12609 is truly a post-2.6.28 regression and 12263 is a post-2.6.27
regression, this just cannot be.

> I don't see this bug on tip/master and this is the diff of origin and 
> tip at the time I tested.

> $ git diff origin/master..tip/master drivers/ata/

    What tree is that?

WBR, Sergei

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12403] TTY problem on linux-2.6.28-rc7
  2009-02-16 16:12     ` Aristeu Rozanski
  (?)
@ 2009-02-16 20:42     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-16 20:42 UTC (permalink / raw)
  To: Aristeu Rozanski
  Cc: Linux Kernel Mailing List, Kernel Testers List, sasa sasa

On Monday 16 February 2009, Aristeu Rozanski wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.27 and 2.6.28.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> > be listed and let me know (either way).
> > 
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12403
> > Subject		: TTY problem on linux-2.6.28-rc7
> > Submitter	: sasa sasa <sasak.1983@gmail.com>
> > Date		: 2008-12-22 4:23 (55 days old)
> > References	: http://marc.info/?l=linux-kernel&m=122991914600390&w=4
> according to
> 	http://marc.info/?l=linux-kernel&m=123054911532245&w=4
> which is a reply to the first post, it's not a kernel problem

OK, I've closed the bug.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12263] Sata soft reset filling log
  2009-02-16 19:42                           ` Sergei Shtylyov
@ 2009-02-16 21:40                               ` Justin Madru
  -1 siblings, 0 replies; 262+ messages in thread
From: Justin Madru @ 2009-02-16 21:40 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Rafael J. Wysocki, Ingo Molnar, Linux Kernel Mailing List,
	Kernel Testers List, Linux IDE, Alan Cox, Hugh Dickins,
	Larry Finger, Mikael Pettersson

Sergei Shtylyov wrote:

> Hello.
>> Justin Madru wrote:
>>> After referring to the SCSI command codes "cdb 0x1e" means ALLOW 
>>> MEDIUM REMOVAL command -- which
>>> could hardly be addressed to an usual hard disk. So, it looks like 
>>> we had a case of the confused bug report which
>>> has a lot of info on the hard disk while errors were most probably 
>>> happening with a CD/DVD drive.
>>
>> Yes, I originally thought it was my hard disk because the kernel logs 
>> showed ata2.
>> But, Tejun Heo figured out it was my DVD drive (ATAPI) that was on 
>> the ata2 link.
>> (see http://marc.info/?l=linux-kernel&m=122993014109646&w=2)
>> I tried to bisect it, but around .28-rc1 I began to get numerous 
>> compile errors, so couldn't continue.
>> I also tried patches that Tejun sent me, but none of them worked; they
>> just slightly changed the error message.
>> So, yes this is a regression that was introduced in the .28 merge 
>> window, and I still think that bug #12609 is a duplicate of my bug.
>
> If 12609 is truly a post-2.6.28 regression and 12263 is a post-2.6.27
> regression, this just cannot be.

Maybe the reporter of #12609 didn't notice/test kernels 28-rc1 to 28. Or
maybe the difference in hardware is the issue, but the bug is still the
same. Don't know.


>> I don't see this bug on tip/master and this is the diff of origin and 
>> tip at the time I tested.
>> $ git diff origin/master..tip/master drivers/ata/
>
> What tree is that?

This is what I have in .git/config and I get the same diff if I run:
git diff master..tip drivers/ata/  or  git diff master...tip drivers/ata/

[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
[remote "origin"]
    url = git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
    remote = origin
    merge = refs/heads/master
[remote "tip"]
    url = git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git
    fetch = +refs/heads/*:refs/remotes/tip/*
[branch "tip"]
    remote = tip
    merge = refs/heads/master

Justin Madru

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12263] Sata soft reset filling log
  2009-02-16 21:40                               ` Justin Madru
@ 2009-02-17 11:19                                   ` Hugh Dickins
  -1 siblings, 0 replies; 262+ messages in thread
From: Hugh Dickins @ 2009-02-17 11:19 UTC (permalink / raw)
  To: Justin Madru
  Cc: Sergei Shtylyov, Rafael J. Wysocki, Ingo Molnar,
	Linux Kernel Mailing List, Kernel Testers List, Linux IDE,
	Alan Cox, Larry Finger, Mikael Pettersson

On Mon, 16 Feb 2009, Justin Madru wrote:
> Sergei Shtylyov wrote:
> >
> > If 12609 is truly a post-2.6.28 regression and 12263 is a post-2.6.27
> > regression, this just cannot be.
> 
> Maybe the reporter of #12609 didn't notice/test kernels 28-rc1 to 28. Or maybe
> the difference in hardware is
> the issue, but the bug is still the same. Don't know.

Sorry Justin, you must be confused: as Sergei says,
#12609 and #12263 can only be different.

I was one of the reporters of #12609, and I do know it's a post-2.6.28
regression (and Larry said so too), and one fix (not the preferred fix)
is to revert the ata_bmdma32_port_ops from 2.6.29-rc, and the preferred
fix is to improve the ata_sff_data_xfer32() introduced in 2.6.29-rc1.

2.6.28 does not contain any ata_bmdma32_port_ops, nor ata_sff_data_xfer32(),
nor did 2.6.28-rc1 contain them.  So it is impossible for the reversion of
the patch that introduced them to fix any problem on 2.6.28.
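
The ordering argument can be written down as a small check. This is only a sketch under simplifying assumptions: struct kver, kver_cmp() and revert_can_fix() are hypothetical names, only 2.6.x releases and their -rc kernels are modelled, and stable point releases such as 2.6.28.5 are ignored.

```c
/* Hypothetical representation of a 2.6.x kernel version.
 * rc == 0 means the final release; rc > 0 means -rcN, which sorts
 * before the final release of the same sublevel. */
struct kver {
    int sublevel;  /* the x in 2.6.x */
    int rc;        /* 0 for final, N for -rcN */
};

/* Return <0, 0 or >0 if a is older than, equal to, or newer than b. */
int kver_cmp(struct kver a, struct kver b)
{
    if (a.sublevel != b.sublevel)
        return a.sublevel - b.sublevel;
    if (a.rc == b.rc)
        return 0;
    if (a.rc == 0)
        return 1;   /* final release is newer than any -rc */
    if (b.rc == 0)
        return -1;
    return a.rc - b.rc;
}

/* Reverting a change can only fix a bug seen in a kernel that
 * already contains that change. */
int revert_can_fix(struct kver change_introduced, struct kver bug_seen)
{
    return kver_cmp(change_introduced, bug_seen) <= 0;
}
```

With the change introduced in 2.6.29-rc1 and the bug seen in 2.6.28 or 2.6.28-rc1, revert_can_fix() returns 0.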

I'm quite prepared to believe that your #12263 manifests similarly to
#12609, and that a tip tree which contains a fix for #12609 contains
a fix for #12263; but please, those bugs are not the same, and they
don't have the same fix.

Hugh

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12612] hard lockup when interrupting cdda2wav
  2009-02-14 20:50   ` Rafael J. Wysocki
@ 2009-02-17 17:16     ` Matthias Reichl
  -1 siblings, 0 replies; 262+ messages in thread
From: Matthias Reichl @ 2009-02-17 17:16 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, FUJITA Tomonori

The bug is still present in 2.6.28.5:

=============================================
[ INFO: possible recursive locking detected ]
2.6.28.5-dbg #1
---------------------------------------------
swapper/0 is trying to acquire lock:
 (&q->__queue_lock){.+..}, at: [<ffffffff8040e615>] blk_put_request+0x25/0x60

but task is already holding lock:
 (&q->__queue_lock){.+..}, at: [<ffffffff8040e4fa>] blk_end_io+0x5a/0xa0

other info that might help us debug this:
1 lock held by swapper/0:
 #0:  (&q->__queue_lock){.+..}, at: [<ffffffff8040e4fa>] blk_end_io+0x5a/0xa0

stack backtrace:
Pid: 0, comm: swapper Not tainted 2.6.28.5-dbg #1
Call Trace:
 <IRQ>  [<ffffffff8026cd07>] __lock_acquire+0x1797/0x1930
 [<ffffffff806abf8b>] error_exit+0x29/0xa9
 [<ffffffff80521f40>] sg_rq_end_io+0x0/0x2e0
 [<ffffffff8026cf3a>] lock_acquire+0x9a/0xe0
 [<ffffffff8040e615>] blk_put_request+0x25/0x60
 [<ffffffff806ab973>] _spin_lock_irqsave+0x43/0x90
 [<ffffffff8040e615>] blk_put_request+0x25/0x60
 [<ffffffff8040e615>] blk_put_request+0x25/0x60
 [<ffffffff80520a94>] sg_finish_rem_req+0xa4/0x100
 [<ffffffff805221b8>] sg_rq_end_io+0x278/0x2e0
 [<ffffffff8040e2a1>] end_that_request_last+0x61/0x260
 [<ffffffff8040e508>] blk_end_io+0x68/0xa0
 [<ffffffff80508181>] scsi_end_request+0x41/0xd0
 [<ffffffff80508870>] scsi_io_completion+0x130/0x470
 [<ffffffff80413405>] blk_done_softirq+0x75/0x90
 [<ffffffff802488ab>] __do_softirq+0x9b/0x180
 [<ffffffff80213df3>] native_sched_clock+0x13/0x70
 [<ffffffff8020d6ec>] call_softirq+0x1c/0x30
 [<ffffffff8020f175>] do_softirq+0x65/0xa0
 [<ffffffff80248345>] irq_exit+0xa5/0xb0
 [<ffffffff8020f467>] do_IRQ+0x107/0x1d0
 [<ffffffff8020c7fb>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80214ba6>] mwait_idle+0x56/0x60
 [<ffffffff80214b9d>] mwait_idle+0x4d/0x60
 [<ffffffff8020b353>] cpu_idle+0x63/0xc0
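
Reading the trace bottom-up, blk_end_io() takes the queue lock and then, via end_that_request_last(), sg_rq_end_io() and sg_finish_rem_req(), reaches blk_put_request(), which tries to take the same lock again. The pattern can be modelled in plain C as below. This is a toy sketch, not kernel code: every toy_* name is hypothetical, and the "lock" is just a flag that reports a second acquire instead of spinning.

```c
/* Toy model of a non-recursive lock; not the kernel's spinlock. */
struct toy_queue {
    int lock_held;      /* 1 while the "queue lock" is held */
    int recursion_seen; /* set if a second acquire is attempted */
};

/* Try to take the lock; return 0 on success, -1 if already held. */
int toy_lock(struct toy_queue *q)
{
    if (q->lock_held) {
        q->recursion_seen = 1;
        return -1;
    }
    q->lock_held = 1;
    return 0;
}

void toy_unlock(struct toy_queue *q)
{
    q->lock_held = 0;
}

/* Stands in for blk_put_request(): takes the queue lock itself. */
int toy_put_request(struct toy_queue *q)
{
    if (toy_lock(q) != 0)
        return -1;      /* a real non-recursive spinlock would hang here */
    toy_unlock(q);
    return 0;
}

/* Stands in for blk_end_io(): holds the lock while the completion
 * callback runs. */
int toy_end_io(struct toy_queue *q, int (*done)(struct toy_queue *))
{
    int rc;

    if (toy_lock(q) != 0)
        return -1;
    rc = done(q);       /* callback runs with the lock held */
    toy_unlock(q);
    return rc;
}

/* Stands in for sg_rq_end_io() -> sg_finish_rem_req(). */
int toy_sg_done(struct toy_queue *q)
{
    return toy_put_request(q);
}
```

Calling toy_end_io(&q, toy_sg_done) on a zeroed queue returns -1 and sets recursion_seen, which is the situation the lockdep report describes; calling toy_put_request() on its own, with the lock not held, succeeds.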


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12263] Sata soft reset filling log
  2009-02-17 11:19                                   ` Hugh Dickins
  (?)
@ 2009-02-17 19:08                                   ` Justin Madru
       [not found]                                     ` <499B0B3E.3070101-u1xxEuL7cY4AvxtiuMwx3w@public.gmane.org>
  -1 siblings, 1 reply; 262+ messages in thread
From: Justin Madru @ 2009-02-17 19:08 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Sergei Shtylyov, Rafael J. Wysocki, Ingo Molnar,
	Linux Kernel Mailing List, Kernel Testers List, Linux IDE,
	Alan Cox, Larry Finger, Mikael Pettersson

Hugh Dickins wrote:
> On Mon, 16 Feb 2009, Justin Madru wrote:
>   
>> Sergei Shtylyov wrote:
>>     
>>> If 12609 is truly a post-2.6.28 regression and 12263 is a post-2.6.27
>>> regression, this just cannot be.
>>>       
>> Maybe the reporter of #12609 didn't notice/test kernels 28-rc1 to 28. Or maybe
>> the difference in hardware is
>> the issue, but the bug is still the same. Don't know.
>>     
>
> Sorry Justin, you must be confused: as Sergei says,
> #12609 and #12263 can only be different.
>
> I was one of the reporters of #12609, and I do know it's a post-2.6.28
> regression (and Larry said so too), and one fix (not the preferred fix)
> is to revert the ata_bmdma32_port_ops from 2.6.29-rc, and the preferred
> fix is to improve the ata_sff_data_xfer32() introduced in 2.6.29-rc1.
>
> 2.6.28 does not contain any ata_bmdma32_port_ops, nor ata_sff_data_xfer32(),
> nor did 2.6.28-rc1 contain them.  So it is impossible for the reversion of
> the patch that introduced them to fix any problem on 2.6.28.
>
> I'm quite prepared to believe that your #12263 manifests similarly to
> #12609, and that a tip tree which contains a fix for #12609 contains
> a fix for #12263; but please, those bugs are not the same, and they
> don't have the same fix.
>
> Hugh
>
>   
Well, like I said: "[I] Don't know". I'm not a kernel developer (or even 
any developer... yet).
I'm just someone who tests the -rc kernels to see if there are any
problems with my hardware.
I try to report any regressions to lkml, and hopefully help the developers.

To me, who has no knowledge of all these low level issues, the following
error messages look strikingly similar at a quick glance.

# bug 12609
# http://marc.info/?l=linux-kernel&m=123254501314058&w=4
#
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         cdb 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
         res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x5 (timeout)
ata2.00: status: { DRDY ERR }
ata2: soft resetting link
ata2.00: configured for UDMA/33
ata2: EH complete

# bug 12263
# http://marc.info/?l=linux-kernel&m=122913412608533&w=4
#
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata2.00: ST_FIRST: !(DRQ|ERR|DF)
ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         cdb 1e 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
         res 50/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
ata2.00: status: { DRDY }
ata2: soft resetting link
ata2.00: configured for UDMA/33
ata2: EH complete

# bug 12609
# http://marc.info/?l=linux-kernel&m=123275478111406&w=4
#
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         cdb 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
         res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation)
ata2.00: status: { DRDY ERR }
ata2: soft resetting link
ata2.00: configured for PIO4
ata2: EH complete

So, will the patch for 12609 fix my issue also, or does there need to be 
another patch?
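
One way to compare the reports more closely than by eye is to decode the status byte, which is the first field of each "res" line. The sketch below is an illustration, not libata code: the ST_* names and the helpers parse_res() and no_drq_err_df() are hypothetical, the bit values are the standard ATA status register bits, and the "res" line layout is assumed to be the one shown in the logs above.

```c
#include <stdio.h>
#include <string.h>

/* Standard ATA status register bits (names here are hypothetical). */
#define ST_BSY  0x80
#define ST_DRDY 0x40
#define ST_DF   0x20
#define ST_DRQ  0x08
#define ST_ERR  0x01

/* Parse the status and error bytes from a libata "res" line, e.g.
 * "res 50/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)".
 * Returns 0 on success, -1 if the line does not match. */
int parse_res(const char *line, unsigned int *status, unsigned int *error)
{
    const char *p = strstr(line, "res ");

    if (p == NULL)
        return -1;
    return sscanf(p, "res %2x/%2x", status, error) == 2 ? 0 : -1;
}

/* Nonzero if none of DRQ, ERR or DF is set, i.e. the condition that
 * the bug 12263 log prints as "!(DRQ|ERR|DF)". */
int no_drq_err_df(unsigned int status)
{
    return (status & (ST_DRQ | ST_ERR | ST_DF)) == 0;
}
```

Decoded this way, status 0x50 from the bug 12263 log has none of DRQ, ERR or DF set, which matches its ST_FIRST line, while status 0x51 from both bug 12609 logs has ERR set. This only describes what was logged; it does not by itself say whether one patch covers both bugs.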

Justin Madru

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12612] hard lockup when interrupting cdda2wav
@ 2009-02-17 20:23       ` Rafael J. Wysocki
  0 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-17 20:23 UTC (permalink / raw)
  To: Matthias Reichl
  Cc: Linux Kernel Mailing List, Kernel Testers List, FUJITA Tomonori

On Tuesday 17 February 2009, Matthias Reichl wrote:
> The bug is still present in 2.6.28.5:
> 
> =============================================
> [ INFO: possible recursive locking detected ]
> 2.6.28.5-dbg #1
> ---------------------------------------------
> swapper/0 is trying to acquire lock:
>  (&q->__queue_lock){.+..}, at: [<ffffffff8040e615>] blk_put_request+0x25/0x60
> 
> but task is already holding lock:
>  (&q->__queue_lock){.+..}, at: [<ffffffff8040e4fa>] blk_end_io+0x5a/0xa0
> 
> other info that might help us debug this:
> 1 lock held by swapper/0:
>  #0:  (&q->__queue_lock){.+..}, at: [<ffffffff8040e4fa>] blk_end_io+0x5a/0xa0
> 
> stack backtrace:
> Pid: 0, comm: swapper Not tainted 2.6.28.5-dbg #1
> Call Trace:
>  <IRQ>  [<ffffffff8026cd07>] __lock_acquire+0x1797/0x1930
>  [<ffffffff806abf8b>] error_exit+0x29/0xa9
>  [<ffffffff80521f40>] sg_rq_end_io+0x0/0x2e0
>  [<ffffffff8026cf3a>] lock_acquire+0x9a/0xe0
>  [<ffffffff8040e615>] blk_put_request+0x25/0x60
>  [<ffffffff806ab973>] _spin_lock_irqsave+0x43/0x90
>  [<ffffffff8040e615>] blk_put_request+0x25/0x60
>  [<ffffffff8040e615>] blk_put_request+0x25/0x60
>  [<ffffffff80520a94>] sg_finish_rem_req+0xa4/0x100
>  [<ffffffff805221b8>] sg_rq_end_io+0x278/0x2e0
>  [<ffffffff8040e2a1>] end_that_request_last+0x61/0x260
>  [<ffffffff8040e508>] blk_end_io+0x68/0xa0
>  [<ffffffff80508181>] scsi_end_request+0x41/0xd0
>  [<ffffffff80508870>] scsi_io_completion+0x130/0x470
>  [<ffffffff80413405>] blk_done_softirq+0x75/0x90
>  [<ffffffff802488ab>] __do_softirq+0x9b/0x180
>  [<ffffffff80213df3>] native_sched_clock+0x13/0x70
>  [<ffffffff8020d6ec>] call_softirq+0x1c/0x30
>  [<ffffffff8020f175>] do_softirq+0x65/0xa0
>  [<ffffffff80248345>] irq_exit+0xa5/0xb0
>  [<ffffffff8020f467>] do_IRQ+0x107/0x1d0
>  [<ffffffff8020c7fb>] ret_from_intr+0x0/0xf
>  <EOI>  [<ffffffff80214ba6>] mwait_idle+0x56/0x60
>  [<ffffffff80214b9d>] mwait_idle+0x4d/0x60
>  [<ffffffff8020b353>] cpu_idle+0x63/0xc0

Thanks for the update.

Rafael
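
As the quoted backtrace reads, blk_end_io() takes q->__queue_lock and then, through end_that_request_last() and sg_rq_end_io(), ends up in blk_put_request(), which tries to take the same lock again. The following is only a user-space analogy of that pattern, not kernel code: the function names are stand-ins, and a timeout is used so the sketch terminates, whereas a kernel spinlock would simply spin.

```python
import threading

# Stands in for q->__queue_lock; threading.Lock is not re-entrant.
queue_lock = threading.Lock()

def put_request():
    """Analogue of blk_put_request(): takes the queue lock itself."""
    if not queue_lock.acquire(timeout=0.1):
        return "would deadlock"
    try:
        return "request released"
    finally:
        queue_lock.release()

def end_io(done):
    """Analogue of blk_end_io(): runs the completion callback with the lock held."""
    with queue_lock:
        return done()

if __name__ == "__main__":
    print(end_io(put_request))  # callback re-takes the lock its caller holds
    print(put_request())        # the same call outside the lock succeeds
```

With no timeout to fall back on, the first call would never return, which is consistent with the hard lockup in the subject line.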

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12263] Sata soft reset filling log
  2009-02-17 19:08                                   ` Justin Madru
@ 2009-02-18  1:03                                         ` Sergei Shtylyov
  0 siblings, 0 replies; 262+ messages in thread
From: Sergei Shtylyov @ 2009-02-18  1:03 UTC (permalink / raw)
  To: Justin Madru
  Cc: Hugh Dickins, Rafael J. Wysocki, Ingo Molnar,
	Linux Kernel Mailing List, Kernel Testers List, Linux IDE,
	Alan Cox, Larry Finger, Mikael Pettersson

Hello.

Justin Madru wrote:

>>>> If 12609 is truly a post-2.6.28 regression and 12263 is post-2.6.27
>>>> regression, this just cannot be.
>>>>       
>>> Maybe the reporter of #12609 didn't notice/test kernels 28-rc1 to 28.
>>> Or maybe the difference in hardware is the issue, but the bug is still
>>> the same. Don't know.
>>>     
>>
>> Sorry Justin, you must be confused: as Sergei says,
>> #12609 and #12263 can only be different.
>>
>> I was one of the reporters of #12609, and I do know it's a post-2.6.28
>> regression (and Larry said so too), and one fix (not the preferred fix)
>> is to revert the ata_bmdma32_port_ops from 2.6.29-rc, and the preferred
>> fix is to improve the ata_sff_data_xfer32() introduced in 2.6.29-rc1.
>>
>> 2.6.28 does not contain any ata_bmdma32_port_ops, nor ata_sff_data_xfer32(),
>> nor did 2.6.28-rc1 contain them.  So it is impossible for the reversion of
>> the patch that introduced them to fix any problem on 2.6.28.
>>
>> I'm quite prepared to believe that your #12263 manifests similarly to
>> #12609, and that a tip tree which contains a fix for #12609 contains
>> a fix for #12263; but please, those bugs are not the same, and they
>> don't have the same fix.
>>
>> Hugh
>>
>>   
> Well, like I said: "[I] Don't know". I'm not a kernel developer (or 
> even any developer... yet).
> I'm just someone that tests the -rc kernels to see if there's any 
> problems with my hardware.
> I try to report any regressions to lkml, and hopefully help the 
> developers.
>
> To me, who has no knowledge of all these low level issues, the 
> following error messages
> look strikingly similar with a quick glance.
>
> # bug 12609
> # http://marc.info/?l=linux-kernel&m=123254501314058&w=4
> #
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
>         cdb 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>         res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x5 (timeout)
> ata2.00: status: { DRDY ERR }
> ata2: soft resetting link
> ata2.00: configured for UDMA/33
> ata2: EH complete
>
> # bug 12263
> # http://marc.info/?l=linux-kernel&m=122913412608533&w=4
> #
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> ata2.00: ST_FIRST: !(DRQ|ERR|DF)
> ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
>         cdb 1e 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>         res 50/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)

   Note the different values of the status, error and interrupt reason
registers: 51/20:03 vs 50/00:01. The former means (unexpected?) status 
phase interrupt with error indication and the sense key NOT READY, the 
latter means (unexpected?) command phase interrupt with no error. IIUC, 
the former happens once the 'sr' driver first sends the TEST UNIT READY 
command while probing the CD/DVD drive, the latter seems to be a result 
of some polling process (originated from userland) -- I'm not seeing 
ALLOW_MEDIUM_REMOVAL anywhere in this driver. So they only look similar, 
I think...
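
The decoding above can be sketched in a few lines. This is only an illustration, assuming the leading part of the `res` field is laid out as status/error:interrupt-reason, as the values quoted above suggest; `decode_res` and the table names are made up for this sketch and are not libata functions.

```python
# Commonly named ATA status register bits; other bits are ignored here.
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"),
               (0x08, "DRQ"), (0x01, "ERR")]

# For ATAPI devices the upper nibble of the error register carries the sense key.
SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
              0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR",
              0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION"}

# Interrupt reason: bit 0 is CoD (command/data), bit 1 is IO (direction).
PHASES = {0b00: "data to device", 0b01: "command",
          0b10: "data to host", 0b11: "status"}

def decode_res(res):
    """Decode the first three bytes of a 'res' field such as '51/20:03:...'."""
    status_text, rest = res.split("/", 1)
    fields = rest.split(":")
    status = int(status_text, 16)
    error = int(fields[0], 16)
    ireason = int(fields[1], 16)
    return {
        "status": [name for bit, name in STATUS_BITS if status & bit],
        # The sense key is only reported here when ERR is set.
        "sense_key": SENSE_KEYS.get(error >> 4) if status & 0x01 else None,
        "phase": PHASES[ireason & 0b11],
    }

if __name__ == "__main__":
    for res in ("51/20:03:00:00:00/00:00:00:00:00/a0",
                "50/00:01:00:00:00/00:00:00:00:00/a0"):
        print(res[:8], decode_res(res))
```

Fed the two values from the logs, the sketch reports a status phase with ERR and NOT READY for 51/20:03, and a command phase with no error for 50/00:01.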


> So, will the patch for 12609 fix my issue also, or does there need to 
> be another patch?

   Most probably it'll need another patch.

> Justin Madru

MBR, Sergei

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12263] Sata soft reset filling log
  2009-02-18  1:03                                         ` Sergei Shtylyov
  (?)
@ 2009-02-18  6:42                                         ` Justin Madru
  -1 siblings, 0 replies; 262+ messages in thread
From: Justin Madru @ 2009-02-18  6:42 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Hugh Dickins, Rafael J. Wysocki, Ingo Molnar,
	Linux Kernel Mailing List, Kernel Testers List, Linux IDE,
	Alan Cox, Larry Finger, Mikael Pettersson

Sergei Shtylyov wrote:
> Hello.
>
> Justin Madru wrote:
>
>>>>> If 12609 is truly a post-2.6.28 regression and 12263 is post-2.6.27
>>>>> regression, this just cannot be.
>>>>>       
>>>> Maybe the reporter of #12609 didn't notice/test kernels 28-rc1 to 28.
>>>> Or maybe the difference in hardware is the issue, but the bug is still
>>>> the same. Don't know.
>>>>     
>>>
>>> Sorry Justin, you must be confused: as Sergei says,
>>> #12609 and #12263 can only be different.
>>>
>>> I was one of the reporters of #12609, and I do know it's a post-2.6.28
>>> regression (and Larry said so too), and one fix (not the preferred fix)
>>> is to revert the ata_bmdma32_port_ops from 2.6.29-rc, and the preferred
>>> fix is to improve the ata_sff_data_xfer32() introduced in 2.6.29-rc1.
>>>
>>> 2.6.28 does not contain any ata_bmdma32_port_ops, nor ata_sff_data_xfer32(),
>>> nor did 2.6.28-rc1 contain them.  So it is impossible for the reversion of
>>> the patch that introduced them to fix any problem on 2.6.28.
>>>
>>> I'm quite prepared to believe that your #12263 manifests similarly to
>>> #12609, and that a tip tree which contains a fix for #12609 contains
>>> a fix for #12263; but please, those bugs are not the same, and they
>>> don't have the same fix.
>>>
>>> Hugh
>>>
>>>   
>> Well, like I said: "[I] Don't know". I'm not a kernel developer (or 
>> even any developer... yet).
>> I'm just someone that tests the -rc kernels to see if there's any 
>> problems with my hardware.
>> I try to report any regressions to lkml, and hopefully help the 
>> developers.
>>
>> To me, who has no knowledge of all these low level issues, the 
>> following error messages
>> look strikingly similar with a quick glance.
>>
>> # bug 12609
>> # http://marc.info/?l=linux-kernel&m=123254501314058&w=4
>> #
>> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>> ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
>>         cdb 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>         res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x5 (timeout)
>> ata2.00: status: { DRDY ERR }
>> ata2: soft resetting link
>> ata2.00: configured for UDMA/33
>> ata2: EH complete
>>
>> # bug 12263
>> # http://marc.info/?l=linux-kernel&m=122913412608533&w=4
>> #
>> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>> ata2.00: ST_FIRST: !(DRQ|ERR|DF)
>> ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
>>         cdb 1e 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>         res 50/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
>
>   Note the different value of the status, error and interrupt reason 
> registers: 51/20:03 vs 50/00:01. The former means (unexpected?) status 
> phase interrupt with error indication and the sense key NOT READY, the 
> latter means (unexpected?) command phase interrupt with no error. 
> IIUC, the former happens once the 'sr' driver first sends the TEST 
> UNIT READY command while probing the CD/DVD drive, the latter seems to 
> be a result of some polling process (originated from userland) -- I'm 
> not seeing ALLOW_MEDIUM_REMOVAL anywhere in this driver. So they only 
> look similar, I think...

And that is why I'm a tester and you're a developer ;) Thanks for the 
info! Next time I'll look closer
and maybe know what I'm actually looking at.
>
>
>> So, will the patch for 12609 fix my issue also, or does there need to 
>> be another patch?
>
>   Most probably it'll need another patch.
So then, #12263 should be reopened and marked as not a duplicate.
Anyways, if tip/master gets merged how it is now then my bug should be 
fixed.

Justin Madru

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12612] hard lockup when interrupting cdda2wav
@ 2009-02-19 13:49         ` FUJITA Tomonori
  0 siblings, 0 replies; 262+ messages in thread
From: FUJITA Tomonori @ 2009-02-19 13:49 UTC (permalink / raw)
  To: rjw; +Cc: hias, linux-kernel, kernel-testers, fujita.tomonori, James.Bottomley

On Tue, 17 Feb 2009 21:23:12 +0100
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> On Tuesday 17 February 2009, Matthias Reichl wrote:
> > The bug is still present in 2.6.28.5:
> > 
> > =============================================
> > [ INFO: possible recursive locking detected ]
> > 2.6.28.5-dbg #1
> > ---------------------------------------------
> > swapper/0 is trying to acquire lock:
> >  (&q->__queue_lock){.+..}, at: [<ffffffff8040e615>] blk_put_request+0x25/0x60
> > 
> > but task is already holding lock:
> >  (&q->__queue_lock){.+..}, at: [<ffffffff8040e4fa>] blk_end_io+0x5a/0xa0
> > 
> > other info that might help us debug this:
> > 1 lock held by swapper/0:
> >  #0:  (&q->__queue_lock){.+..}, at: [<ffffffff8040e4fa>] blk_end_io+0x5a/0xa0
> > 
> > stack backtrace:
> > Pid: 0, comm: swapper Not tainted 2.6.28.5-dbg #1
> > Call Trace:
> >  <IRQ>  [<ffffffff8026cd07>] __lock_acquire+0x1797/0x1930

There is a patch for this but it might take some time to push it into
mainline (I hope that James will move the pending sg fixes to
scsi-fixes tree but it might be too late):

http://marc.info/?l=linux-scsi&m=123436612119386&w=2

Sorry about that again.

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected) [bug 12465]
@ 2009-02-22 10:39         ` Kevin Shanahan
  0 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-02-22 10:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, bugme-daemon,
	Steven Rostedt, Peter Zijlstra

On Sun, 2009-02-15 at 11:04 +0100, Ingo Molnar wrote:
> qemu-sy-4237 has been scheduled away, and the system appeared to have done
> nothing in the meantime. That's not something that really looks like a
> scheduler regression - there is nothing the scheduler can do if KVM
> decides to block a task.
> 
> It would be nice to enhance this single-CPU trace some more - to more
> surgically see what is going on. Firstly, absolute timestamps would be
> nice:
> 
>   echo funcgraph-abstime  > trace_options
>   echo funcgraph-proc     > trace_options
> 
> as it's a bit hard to see the global timescale of events.

I was going to try and grab the trace with absolute timestamps tonight,
but that option doesn't seem to be available in Linus' current kernel.

flexo:/sys/kernel/debug/tracing# echo 0 > tracing_enabled
flexo:/sys/kernel/debug/tracing# echo function_graph > current_tracer
flexo:/sys/kernel/debug/tracing# echo funcgraph-proc > trace_options
flexo:/sys/kernel/debug/tracing# echo funcgraph-abstime  > trace_options
-su: echo: write error: Invalid argument
flexo:/sys/kernel/debug/tracing# cat trace_options 
print-parent nosym-offset nosym-addr noverbose noraw nohex nobin noblock
nostacktrace nosched-tree ftrace_printk noftrace_preempt nobranch
annotate nouserstacktrace nosym-userobj noprintk-msg-only
nofuncgraph-overrun funcgraph-cpu funcgraph-overhead funcgraph-proc 
flexo:/sys/kernel/debug/tracing# uname -a
Linux flexo 2.6.29-rc5-00299-gadfafef #6 SMP Sun Feb 22 20:09:37 CST
2009 x86_64 GNU/Linux

What am I missing?

Cheers,
Kevin.



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12208] uml is very slow on 2.6.28 host
  2009-02-14 20:50   ` Rafael J. Wysocki
@ 2009-02-22 13:58     ` Américo Wang
  -1 siblings, 0 replies; 262+ messages in thread
From: Américo Wang @ 2009-02-22 13:58 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Miklos Szeredi

On Sat, Feb 14, 2009 at 09:50:19PM +0100, Rafael J. Wysocki wrote:
>This message has been generated automatically as a part of a report
>of regressions introduced between 2.6.27 and 2.6.28.
>
>The following bug entry is on the current list of known regressions
>introduced between 2.6.27 and 2.6.28.  Please verify if it still should
>be listed and let me know (either way).
>
>
>Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12208
>Subject		: uml is very slow on 2.6.28 host
>Submitter	: Miklos Szeredi <miklos@szeredi.hu>
>Date		: 2008-12-12 9:35 (65 days old)
>References	: http://marc.info/?l=linux-kernel&m=122907463518593&w=4

Hello, Miklos!

I can't reproduce this on a 2.6.28.7 host with a UML guest from current git.
Have you tried 2.6.28.7? Does it have the same problem?

Thanks.


-- 
"Against stupidity, the gods themselves, contend in vain."


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected) [bug 12465]
@ 2009-02-22 17:27           ` Ingo Molnar
  0 siblings, 0 replies; 262+ messages in thread
From: Ingo Molnar @ 2009-02-22 17:27 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, bugme-daemon,
	Steven Rostedt, Peter Zijlstra


* Kevin Shanahan <kmshanah@ucwb.org.au> wrote:

> On Sun, 2009-02-15 at 11:04 +0100, Ingo Molnar wrote:
> > qemu-sy-4237 has been scheduled away, and the system appeared to have done
> > nothing in the meantime. That's not something that really looks like a
> > scheduler regression - there is nothing the scheduler can do if KVM
> > decides to block a task.
> > 
> > It would be nice to enhance this single-CPU trace some more - to more
> > surgically see what is going on. Firstly, absolute timestamps would be
> > nice:
> > 
> >   echo funcgraph-abstime  > trace_options
> >   echo funcgraph-proc     > trace_options
> > 
> > as it's a bit hard to see the global timescale of events.
> 
> I was going to try and grab the trace with absolute timestamps 
> tonight, but that option doesn't seem to be available in 
> Linus' current kernel.
> 
> flexo:/sys/kernel/debug/tracing# echo 0 > tracing_enabled
> flexo:/sys/kernel/debug/tracing# echo function_graph > current_tracer
> flexo:/sys/kernel/debug/tracing# echo funcgraph-proc > trace_options
> flexo:/sys/kernel/debug/tracing# echo funcgraph-abstime  > trace_options
> -su: echo: write error: Invalid argument
> flexo:/sys/kernel/debug/tracing# cat trace_options 
> print-parent nosym-offset nosym-addr noverbose noraw nohex nobin noblock
> nostacktrace nosched-tree ftrace_printk noftrace_preempt nobranch
> annotate nouserstacktrace nosym-userobj noprintk-msg-only
> nofuncgraph-overrun funcgraph-cpu funcgraph-overhead funcgraph-proc 
> flexo:/sys/kernel/debug/tracing# uname -a
> Linux flexo 2.6.29-rc5-00299-gadfafef #6 SMP Sun Feb 22 20:09:37 CST
> 2009 x86_64 GNU/Linux
> 
> What am I missing?

(replying here too - replied in the bugzilla already)

that's a feature of the latest tracing tree, so if you try -tip 
you'll have it.
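
Whether a given kernel knows an option can be read off the trace_options listing before writing to it. A minimal sketch, assuming that disabled options are listed with a "no" prefix (as in the listing quoted above) and that no option name itself begins with "no"; `option_known` is a made-up helper, and the listing string is copied from the quoted transcript rather than read from the file.

```python
# Listing copied from the quoted `cat trace_options` output.
LISTING = """
print-parent nosym-offset nosym-addr noverbose noraw nohex nobin noblock
nostacktrace nosched-tree ftrace_printk noftrace_preempt nobranch
annotate nouserstacktrace nosym-userobj noprintk-msg-only
nofuncgraph-overrun funcgraph-cpu funcgraph-overhead funcgraph-proc
"""

def option_known(listing, name):
    """Return True if `name` appears in the listing, enabled or disabled."""
    known = set()
    for word in listing.split():
        known.add(word)
        # Assumption: a leading "no" marks a disabled option.
        if word.startswith("no"):
            known.add(word[2:])
    return name in known

if __name__ == "__main__":
    for opt in ("funcgraph-proc", "funcgraph-overrun", "funcgraph-abstime"):
        print(opt, option_known(LISTING, opt))
```

On the listing from 2.6.29-rc5-00299-gadfafef, funcgraph-abstime is not found under either spelling, which matches the "Invalid argument" error in the transcript.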

	Ingo

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected) [Bug 12465]
@ 2009-02-23 11:38         ` Kevin Shanahan
  0 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-02-23 11:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, bugme-daemon,
	Steven Rostedt, Peter Zijlstra

On Sun, 2009-02-15 at 11:04 +0100, Ingo Molnar wrote:
> It would be nice to enhance this single-CPU trace some more - to more
> surgically see what is going on. Firstly, absolute timestamps would be
> nice:
> 
>   echo funcgraph-abstime  > trace_options
>   echo funcgraph-proc     > trace_options
> 
> as it's a bit hard to see the global timescale of events.

Okay, here's some more trace data. I grabbed a few samples at different
times during the ping test. I think the data in files trace6.txt and
trace8.txt coincided with some of the biggest delays.

  http://disenchant.net/tmp/bug-12465/trace-2/

This is captured on 2.6.29-rc5-tip-02057-gaad11ad. The kvm guest being
pinged is process 11211:

  flexo:~# pstree -p 11211
  qemu-system-x86(11211)─┬─{qemu-system-x86}(11212)
                         ├─{qemu-system-x86}(11213)
                         └─{qemu-system-x86}(11609)

Cheers,
Kevin.

> Secondly, not all events are included - in particular i dont really see
> the points when packets are passed. Would it be possible to add a tracing
> hypercall so that the guest kernel can inject trace events that can be seen
> on the native-side trace? Regarding ping latencies really just two things
> matter: the loopback network device's rx and tx path. We should trace the
> outgoing sequence number and the incoming sequence number of IP packets,
> and inject that to the host side. This way we can correlate the delays
> precisely.
> 
> 	Ingo
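
The correlation suggested in the quoted text could look roughly like the sketch below once both sides log sequence numbers. Everything here is hypothetical: the event tuples, the timestamps (in microseconds) and the helper names are invented for illustration, and no such trace events exist in the traces posted so far.

```python
def delays_by_seq(tx_events, rx_events):
    """Match (seq, timestamp_us) events by sequence number and return delays."""
    sent = {seq: ts for seq, ts in tx_events}
    return {seq: ts - sent[seq] for seq, ts in rx_events if seq in sent}

def worst_delay(delays):
    """Return the (seq, delay_us) pair with the largest delay."""
    return max(delays.items(), key=lambda item: item[1])

if __name__ == "__main__":
    # Invented sample data: one ping per second, the second reply is late.
    tx = [(1, 0), (2, 1_000_000), (3, 2_000_000)]
    rx = [(1, 400), (2, 1_350_000), (3, 2_000_450)]
    delays = delays_by_seq(tx, rx)
    print(delays)
    print(worst_delay(delays))
```

Keying on the sequence number rather than on arrival order keeps the match correct even if a reply is lost or arrives out of order.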



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12224] journal activity on inactive partition causes inactive harddrive spinup
  2009-02-14 20:50   ` Rafael J. Wysocki
@ 2009-02-23 12:22     ` Theodore Tso
  -1 siblings, 0 replies; 262+ messages in thread
From: Theodore Tso @ 2009-02-23 12:22 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arthur Jones, C Sights, Eric Sandeen, Greg Kroah-Hartman,
	Linus Torvalds

On Sat, Feb 14, 2009 at 09:50:20PM +0100, Rafael J. Wysocki wrote:
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12224
> Subject		: journal activity on inactive partition causes inactive harddrive spinup
> Submitter	: C Sights <csights@fastmail.fm>
> Date		: 2008-12-14 11:39 (63 days old)
> First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c87591b719737b4e91eb1a9fa8fd55a4ff1886d6
> Handled-By	: Eric Sandeen <sandeen@redhat.com>
> 

The fix for this has landed in mainline as commit 02ac59 for ext3, and
commit 9eddac for ext4.

Rafael, I've marked the bug closed in BZ for your convenience.

    	    	     	       	  	      	 - Ted

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12208] uml is very slow on 2.6.28 host
@ 2009-02-23 14:27       ` Miklos Szeredi
  0 siblings, 0 replies; 262+ messages in thread
From: Miklos Szeredi @ 2009-02-23 14:27 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: rjw, linux-kernel, kernel-testers, miklos

On Sun, 22 Feb 2009, Américo Wang wrote:
> On Sat, Feb 14, 2009 at 09:50:19PM +0100, Rafael J. Wysocki wrote:
> >This message has been generated automatically as a part of a report
> >of regressions introduced between 2.6.27 and 2.6.28.
> >
> >The following bug entry is on the current list of known regressions
> >introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> >be listed and let me know (either way).
> >
> >
> >Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12208
> >Subject		: uml is very slow on 2.6.28 host
> >Submitter	: Miklos Szeredi <miklos@szeredi.hu>
> >Date		: 2008-12-12 9:35 (65 days old)
> >References	: http://marc.info/?l=linux-kernel&m=122907463518593&w=4
> 
> Hello, Miklos!
> 
> I can't reproduce this on host 2.6.28.7 with uml guest of current git.
> Have you tried 2.6.28.7? Does it have the same problem?

It's still slow for me on 2.6.29-rc5.  I haven't tried 2.6.28.7 yet.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12224] journal activity on inactive partition causes inactive harddrive spinup
@ 2009-02-23 14:36       ` Rafael J. Wysocki
  0 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-23 14:36 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arthur Jones, C Sights, Eric Sandeen, Greg Kroah-Hartman,
	Linus Torvalds

On Monday 23 February 2009, Theodore Tso wrote:
> On Sat, Feb 14, 2009 at 09:50:20PM +0100, Rafael J. Wysocki wrote:
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12224
> > Subject		: journal activity on inactive partition causes inactive harddrive spinup
> > Submitter	: C Sights <csights@fastmail.fm>
> > Date		: 2008-12-14 11:39 (63 days old)
> > First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c87591b719737b4e91eb1a9fa8fd55a4ff1886d6
> > Handled-By	: Eric Sandeen <sandeen@redhat.com>
> > 
> 
> The fix for this has landed in mainline as commit 02ac59 for ext3, and
> commit 9eddac for ext4.
> 
> Rafael, I've marked the bug closed in BZ for your convenience.

Thanks a lot!

Best,
Rafael

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-24 11:44                       ` Frederic Weisbecker
@ 2009-03-26 20:22                         ` Kevin Shanahan
  -1 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-03-26 20:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Tue, 2009-03-24 at 12:44 +0100, Frederic Weisbecker wrote:
> As I explained in my previous mail, your trace is only
> a snapshot that happened in 10 msec.
> 
> I experimented with different sizes for the ring buffer, but even
> a 1 second trace requires 20 MB of memory. And such a huge trace
> would be impractical.
> 
> I think we should keep the trace filters we had previously.
> If you don't mind, could you please retest against latest -tip
> the following updated patch? I added the filters, fixed the python
> subshell and also flushed the buffer more nicely according to
> a recent feature in -tip:
> 
> echo > trace 
> 
> instead of switching to nop.
> You will need to pull latest -tip again.

Ok, new set of traces uploaded again here:

  http://disenchant.net/tmp/bug-12465/trace-4/

These were taken using 2.6.29-tip-02749-g398bf09.

Same as last time, it was only necessary to have the one guest running
to reproduce the problem.

Cheers,
Kevin.



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-25 23:40                       ` Kevin Shanahan
@ 2009-03-25 23:48                           ` Frederic Weisbecker
  0 siblings, 0 replies; 262+ messages in thread
From: Frederic Weisbecker @ 2009-03-25 23:48 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Thu, Mar 26, 2009 at 10:10:32AM +1030, Kevin Shanahan wrote:
> On Tue, 2009-03-24 at 12:44 +0100, Frederic Weisbecker wrote:
> > Sorry, I've been late to answer.
> > As I explained in my previous mail, your trace is only
> > a snapshot that happened in 10 msec.
> > 
> > I experimented with different sizes for the ring buffer, but even
> > a 1 second trace requires 20 MB of memory. And such a huge trace
> > would be impractical.
> > 
> > I think we should keep the trace filters we had previously.
> > If you don't mind, could you please retest against latest -tip
> > the following updated patch? I added the filters, fixed the python
> > subshell and also flushed the buffer more nicely according to
> > a recent feature in -tip:
> > 
> > echo > trace 
> > 
> > instead of switching to nop.
> > You will need to pull latest -tip again.
> 
> Ok, thanks for that. I'll get a new -tip kernel ready to test tonight.
> I'm not sure about the change to the python subshell though:
> 
> > while [ "$found" != "True" ]
> > do
> >         # Flush the previous buffer
> >         echo trace > $prefix/trace
> > 
> >         echo 1 > $prefix/tracing_enabled
> >         lat=$(ping -c 1 $addr | grep rtt | grep -Eo " [0-9]+.[0-9]+")
> >         echo 0 > $prefix/tracing_enabled
> > 
> > 	echo $lat
> > 	found=$(python -c "print float(str($lat).strip())")
> >         sleep 0.01
> > done
> 
> kmshanah@kulgan:~$ python -c "print float(str(1.234).strip())"
> 1.234
> 
> That's not going to evaluate to "True" at all is it? What happened to
> the test against the latency threshold value? Did you mean something
> like this?
> 
> kmshanah@kulgan:~$ python -c "print float(str(1.234).strip()) > 5000"
> False
> kmshanah@kulgan:~$ python -c "print float(str(5001.234).strip()) > 5000"
> True


Sorry. I guess I was a bit asleep.
It's a mistake. So you can restore how it was.

Thanks.

 
> Cheers,
> Kevin.
> 
> 


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-24 11:44                       ` Frederic Weisbecker
  (?)
  (?)
@ 2009-03-25 23:40                       ` Kevin Shanahan
  2009-03-25 23:48                           ` Frederic Weisbecker
  -1 siblings, 1 reply; 262+ messages in thread
From: Kevin Shanahan @ 2009-03-25 23:40 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Tue, 2009-03-24 at 12:44 +0100, Frederic Weisbecker wrote:
> Sorry, I've been late to answer.
> As I explained in my previous mail, your trace is only
> a snapshot that happened in 10 msec.
> 
> I experimented with different sizes for the ring buffer, but even
> a 1 second trace requires 20 MB of memory. And such a huge trace
> would be impractical.
> 
> I think we should keep the trace filters we had previously.
> If you don't mind, could you please retest against latest -tip
> the following updated patch? I added the filters, fixed the python
> subshell and also flushed the buffer more nicely according to
> a recent feature in -tip:
> 
> echo > trace 
> 
> instead of switching to nop.
> You will need to pull latest -tip again.

Ok, thanks for that. I'll get a new -tip kernel ready to test tonight.
I'm not sure about the change to the python subshell though:

> while [ "$found" != "True" ]
> do
>         # Flush the previous buffer
>         echo trace > $prefix/trace
> 
>         echo 1 > $prefix/tracing_enabled
>         lat=$(ping -c 1 $addr | grep rtt | grep -Eo " [0-9]+.[0-9]+")
>         echo 0 > $prefix/tracing_enabled
> 
> 	echo $lat
> 	found=$(python -c "print float(str($lat).strip())")
>         sleep 0.01
> done

kmshanah@kulgan:~$ python -c "print float(str(1.234).strip())"
1.234

That's not going to evaluate to "True" at all is it? What happened to
the test against the latency threshold value? Did you mean something
like this?

kmshanah@kulgan:~$ python -c "print float(str(1.234).strip()) > 5000"
False
kmshanah@kulgan:~$ python -c "print float(str(5001.234).strip()) > 5000"
True
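
The threshold test described above can also be written without the python subshell. A minimal sketch, assuming awk is available; the function name exceeds_threshold is hypothetical, and both values are taken to be in milliseconds, as ping reports them:

```shell
# Hypothetical helper: print "True" if latency $1 is greater than
# threshold $2, otherwise "False". Empty or non-numeric input is
# treated as 0 by awk, so a failed ping yields "False".
exceeds_threshold() {
    awk -v lat="$1" -v thres="$2" \
        'BEGIN { if (lat + 0 > thres + 0) print "True"; else print "False" }'
}

# In the loop this would replace the python call, for example:
#   found=$(exceeds_threshold "$lat" "$thres")
```

Printing the literal strings "True" and "False" keeps the helper compatible with the existing while [ "$found" != "True" ] test.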

Cheers,
Kevin.



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-24 11:44                       ` Frederic Weisbecker
@ 2009-03-24 11:47                         ` Frederic Weisbecker
  -1 siblings, 0 replies; 262+ messages in thread
From: Frederic Weisbecker @ 2009-03-24 11:47 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Tue, Mar 24, 2009 at 12:44:12PM +0100, Frederic Weisbecker wrote:
> On Sat, Mar 21, 2009 at 03:30:39PM +1030, Kevin Shanahan wrote:
> > On Thu, 2009-03-19 at 07:54 +1030, Kevin Shanahan wrote:
> > > On Wed, 2009-03-18 at 11:46 +1030, Kevin Shanahan wrote:
> > > > On Wed, 2009-03-18 at 01:20 +0100, Frederic Weisbecker wrote:
> > > > > Ok, I've made a small script based on yours which could do this job.
> > > > > You will just have to set yourself a threshold of latency
> > > > > that you consider as buggy. I don't remember the latency you observed.
> > > > > About 5 secs right?
> > > > > 
> > > > > It's the "thres" variable in the script.
> > > > > 
> > > > > The resulting trace should be a mixup of the function graph traces
> > > > > and scheduler events which look like this:
> > > > > 
> > > > >  gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
> > > > >   xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
> > > > >   xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
> > > > >             Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>
> > > > > 
> > > > > + is a wakeup and ==> is a context switch.
> > > > > 
> > > > > The script will loop trying some pings and will only keep the trace that matches
> > > > > the latency threshold you defined.
> > > > > 
> > > > > Tell me if the following script works for you.
> > > 
> > > ...
> > > 
> > > > Either way, I'll try to get some results in my maintenance window
> > > > tonight.
> > > 
> > > Testing did not go so well. I compiled and booted
> > > 2.6.29-rc8-tip-02630-g93c4989, but had some problems with the system
> > > load when I tried to start tracing - it shot up to around 16-20 or so. I
> > > started shutting down VMs to try and get it under control, but before I
> > > got back to tracing again the machine disappeared off the network -
> > > unresponsive to ping.
> > > 
> > > When I got in this morning, there was nothing on the console, nothing in
> > > the logs to show what went wrong. I will try again, but my next chance
> > > will probably be Saturday. Stay tuned.
> > 
> > Okay, new set of traces have been uploaded to:
> > 
> >   http://disenchant.net/tmp/bug-12465/trace-3/
> > 
> > These were done on the latest tip, which I pulled down this morning:
> > 2.6.29-rc8-tip-02744-gd9937cb.
> > 
> > The system load was very high again when I first tried to trace with
> > several guests running, so I ended up only having the one guest running
> > and thankfully the bug was still reproducible that way.
> > 
> > Fingers crossed this set of traces is able to tell us something.
> > 
> > Regards,
> > Kevin.
> > 
> > 
> 
> Sorry, I've been late to answer.
> As I explained in my previous mail, your trace is only
> a snapshot that happened in 10 msec.
> 
> I experimented with different sizes for the ring buffer, but even
> a 1 second trace requires 20 MB of memory. And such a huge trace
> would be impractical.
> 
> I think we should keep the trace filters we had previously.
> If you don't mind, could you please retest against latest -tip
> the following updated patch? I added the filters, fixed the python
> subshell and also flushed the buffer more nicely according to
> a recent feature in -tip:
> 
> echo > trace 
> 
> instead of switching to nop.
> You will need to pull latest -tip again.
> 
> Thanks a lot Kevin!


Ah you will also need to increase the size of your buffer.
See below:
 
> 
> #!/bin/bash
> 
> # Switch off all CPUs except for one to simplify the trace
> echo 0 > /sys/devices/system/cpu/cpu1/online
> echo 0 > /sys/devices/system/cpu/cpu2/online
> echo 0 > /sys/devices/system/cpu/cpu3/online
> 
> 
> # Make sure debugfs has been mounted
> if [ ! -d /sys/kernel/debug/tracing ]; then
>     mount -t debugfs debugfs /sys/kernel/debug
> fi
> 
> # Set up the trace parameters
> pushd /sys/kernel/debug/tracing || exit 1
> echo 0 > tracing_enabled
> echo function_graph > current_tracer
> echo funcgraph-abstime > trace_options
> echo funcgraph-proc    > trace_options
> 
> # Set here the kvm IP addr
> addr="hermes-old"
> 
> # Set here a threshold of latency in msec (ping reports rtt in ms)
> thres="5000"
> found="False"
> lat=0
> prefix=/sys/kernel/debug/tracing
> 
> echo 1 > $prefix/events/sched/sched_wakeup/enable
> echo 1 > $prefix/events/sched/sched_switch/enable
> 
> # Set the filter for functions to trace
> echo ''         > set_ftrace_filter  # clear filter functions
> echo '*sched*' >> set_ftrace_filter 
> echo '*wake*'  >> set_ftrace_filter
> echo '*kvm*'   >> set_ftrace_filter
> 
> # Reset the function_graph tracer
> echo function_graph > $prefix/current_tracer

Put a

echo 20000 > $prefix/buffer_size_kb

So that we will have enough space (hopefully).
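
As a rough check on that number: at about 20 MB of trace per second (the estimate quoted earlier in this mail), a window of N seconds needs about N * 20000 KB. The sketch below only does that arithmetic. buffer_kb_for is a made-up name, and the rate is an assumption that will vary with the filters in use.

```shell
# Hypothetical helper: estimate a buffer_size_kb value for a trace window
# of $1 whole seconds, assuming $2 MB of trace data per second (default 20,
# the figure quoted in this thread). Uses 1 MB = 1000 KB to match the
# "20000" above.
buffer_kb_for() {
    local secs=$1
    local mb_per_sec=${2:-20}
    echo $(( secs * mb_per_sec * 1000 ))
}

# Example (needs root):
#   echo "$(buffer_kb_for 1)" > /sys/kernel/debug/tracing/buffer_size_kb
```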

Thanks!

> 
> while [ "$found" != "True" ]
> do
>         # Flush the previous buffer
>         echo trace > $prefix/trace
> 
>         echo 1 > $prefix/tracing_enabled
>         lat=$(ping -c 1 $addr | grep rtt | grep -Eo " [0-9]+.[0-9]+")
>         echo 0 > $prefix/tracing_enabled
> 
> 	echo $lat
> 	found=$(python -c "print float(str($lat).strip())")
>         sleep 0.01
> done
> 
> echo 0 > $prefix/events/sched/sched_wakeup/enable
> echo 0 > $prefix/events/sched/sched_switch/enable
> 
> 
> echo "Found buggy latency: $lat"
> echo "Please send the trace you will find on $prefix/trace"
> 
> 
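
The latency extraction in the loop above depends on the summary line that ping prints. A sketch of a slightly stricter parse, assuming the iputils format "rtt min/avg/max/mdev = a/b/c/d ms"; parse_rtt is a hypothetical name, and it returns the first (min) field, which equals the average when only one packet is sent:

```shell
# Hypothetical helper: read ping output on stdin and print the first
# number of the "rtt min/avg/max/mdev = ..." summary line, in ms.
# Prints nothing if no such line is present (e.g. the ping failed).
parse_rtt() {
    sed -n 's|^rtt [^=]*= \([0-9.]*\)/.*|\1|p'
}

# Example use inside the loop:
#   lat=$(ping -c 1 "$addr" | parse_rtt)
```

Anchoring on the "rtt" prefix and the "=" sign avoids picking up other numbers in the output, and the result has no leading space to strip.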


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-24 11:44                       ` Frederic Weisbecker
  0 siblings, 0 replies; 262+ messages in thread
From: Frederic Weisbecker @ 2009-03-24 11:44 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Sat, Mar 21, 2009 at 03:30:39PM +1030, Kevin Shanahan wrote:
> On Thu, 2009-03-19 at 07:54 +1030, Kevin Shanahan wrote:
> > On Wed, 2009-03-18 at 11:46 +1030, Kevin Shanahan wrote:
> > > On Wed, 2009-03-18 at 01:20 +0100, Frederic Weisbecker wrote:
> > > > Ok, I've made a small script based on yours which could do this job.
> > > > You will just have to set yourself a threshold of latency
> > > > that you consider as buggy. I don't remember the latency you observed.
> > > > About 5 secs right?
> > > > 
> > > > It's the "thres" variable in the script.
> > > > 
> > > > The resulting trace should be a mixup of the function graph traces
> > > > and scheduler events which look like this:
> > > > 
> > > >  gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
> > > >   xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
> > > >   xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
> > > >             Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>
> > > > 
> > > > + is a wakeup and ==> is a context switch.
> > > > 
> > > > The script will loop trying some pings and will only keep the trace that matches
> > > > the latency threshold you defined.
> > > > 
> > > > Tell if the following script work for you.
> > 
> > ...
> > 
> > > Either way, I'll try to get some results in my maintenance window
> > > tonight.
> > 
> > Testing did not go so well. I compiled and booted
> > 2.6.29-rc8-tip-02630-g93c4989, but had some problems with the system
> > load when I tried to start tracing - it shot up to around 16-20 or so. I
> > started shutting down VMs to try and get it under control, but before I
> > got back to tracing again the machine disappeared off the network -
> > unresponsive to ping.
> > 
> > When I got in this morning, there was nothing on the console, nothing in
> > the logs to show what went wrong. I will try again, but my next chance
> > will probably be Saturday. Stay tuned.
> 
> Okay, new set of traces have been uploaded to:
> 
>   http://disenchant.net/tmp/bug-12465/trace-3/
> 
> These were done on the latest tip, which I pulled down this morning:
> 2.6.29-rc8-tip-02744-gd9937cb.
> 
> The system load was very high again when I first tried to trace with
> several guests running, so I ended up only having the one guest running
> and thankfully the bug was still reproducible that way.
> 
> Fingers crossed this set of traces is able to tell us something.
> 
> Regards,
> Kevin.
> 
> 

Sorry for the late answer.
As I explained in my previous mail, your trace is only
a snapshot covering about 10 msec.
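
One rough way to check how much time a trace actually covers is to subtract
the first absolute timestamp from the last one. This is only a sketch: it
assumes the trace was recorded with funcgraph-abstime, so that each entry
starts with a timestamp in seconds, and the helper name trace_span_ms is
made up for illustration.

```shell
#!/bin/bash

# trace_span_ms: read a function_graph trace on stdin and print the time
# span it covers, in msec. Lines whose first field is not a timestamp
# (headers, task-switch banners) are ignored.
trace_span_ms() {
    awk '
        $1 ~ /^[0-9]+\.[0-9]+$/ {
            if (!seen) { first = $1; seen = 1 }
            last = $1
        }
        END {
            if (seen) printf "%.3f\n", (last - first) * 1000
        }
    '
}
```

It could be run as "trace_span_ms < /sys/kernel/debug/tracing/trace" and the
result compared with the latency that ping reported.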

I experimented with different sizes for the ring buffer, but even
a 1 second trace requires 20 MB of memory, and such a huge trace
would be impractical.
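
To put a number on it, here is a back-of-the-envelope sketch. It assumes the
buffer fills at a constant rate, using the roughly 20 MB per second measured
above as the default; the real rate depends on the workload and the filters,
and the helper name estimate_trace_mb is hypothetical.

```shell
#!/bin/bash

# estimate_trace_mb SECONDS [MB_PER_SEC]
# Print the approximate buffer size (in MB) needed to hold SECONDS of trace,
# assuming a constant fill rate (default: 20 MB per second).
estimate_trace_mb() {
    local seconds=$1
    local rate_mb=${2:-20}
    awk -v s="$seconds" -v r="$rate_mb" 'BEGIN { printf "%.0f\n", s * r }'
}
```

For the 9.3 second latency reported earlier in this thread,
"estimate_trace_mb 9.3" gives 186, i.e. close to 200 MB of unfiltered trace.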

I think we should keep the trace filters we had previously.
If you don't mind, could you please retest the following updated
script against latest -tip? I added the filters, fixed the python
subshell and also flush the buffer more nicely, using
a recent feature in -tip:

echo > trace 

instead of switching to nop.
You will need to pull latest -tip again.

Thanks a lot Kevin!


#!/bin/bash

# Switch off all CPUs except for one to simplify the trace
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu3/online


# Make sure debugfs has been mounted
if [ ! -d /sys/kernel/debug/tracing ]; then
    mount -t debugfs debugfs /sys/kernel/debug
fi

# Set up the trace parameters
pushd /sys/kernel/debug/tracing || exit 1
echo 0 > tracing_enabled
echo function_graph > current_tracer
echo funcgraph-abstime > trace_options
echo funcgraph-proc    > trace_options

# Set the KVM guest address (hostname or IP) here
addr="hermes-old"

# Set the latency threshold here, in msec (ping reports rtt in msec)
thres="5000"
found="False"
lat=0
prefix=/sys/kernel/debug/tracing

echo 1 > $prefix/events/sched/sched_wakeup/enable
echo 1 > $prefix/events/sched/sched_switch/enable

# Set the filter for functions to trace
echo ''         > set_ftrace_filter  # clear filter functions
echo '*sched*' >> set_ftrace_filter 
echo '*wake*'  >> set_ftrace_filter
echo '*kvm*'   >> set_ftrace_filter

# Reset the function_graph tracer
echo function_graph > $prefix/current_tracer

while [ "$found" != "True" ]
do
        # Flush the previous buffer
        echo > $prefix/trace

        echo 1 > $prefix/tracing_enabled
        lat=$(ping -c 1 $addr | grep rtt | grep -Eo " [0-9]+\.[0-9]+")
        echo 0 > $prefix/tracing_enabled

        echo $lat
        # Prints "True" once the measured latency exceeds the threshold
        found=$(python -c "print(float(str($lat).strip()) > $thres)")
        sleep 0.01
done

echo 0 > $prefix/events/sched/sched_wakeup/enable
echo 0 > $prefix/events/sched/sched_switch/enable


echo "Found buggy latency: $lat"
echo "Please send the trace you will find on $prefix/trace"
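
If python is not available on the host, the latency check could probably be
done with awk alone. This is a sketch, not part of the script above: the
helper names extract_rtt_ms and exceeds_threshold are made up, and it assumes
the iputils summary line of the form
"rtt min/avg/max/mdev = a/b/c/d ms".

```shell
#!/bin/bash

# extract_rtt_ms: read ping output on stdin and print the first rtt value
# (the minimum, which equals the only sample when ping is run with -c 1).
extract_rtt_ms() {
    awk -F'[ =/]+' '/^rtt / { print $6; exit }'
}

# exceeds_threshold LAT THRES: succeed (exit status 0) if LAT > THRES.
# Both values are compared as floating point numbers, in msec.
exceeds_threshold() {
    awk -v lat="$1" -v thres="$2" 'BEGIN { exit !(lat + 0 > thres + 0) }'
}
```

In the loop, this would replace the python line with something like
lat=$(ping -c 1 $addr | extract_rtt_ms) followed by
exceeds_threshold "$lat" "$thres" && found="True".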



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-21 17:07   ` Rafael J. Wysocki
@ 2009-03-21 19:50     ` Ingo Molnar
  -1 siblings, 0 replies; 262+ messages in thread
From: Ingo Molnar @ 2009-03-21 19:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Avi Kivity,
	Kevin Shanahan, Kevin Shanahan, Mike Galbraith, Peter Zijlstra


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.27 and 2.6.28.
> 
> The following bug entry is on the current list of known regressions
> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> be listed and let me know (either way).
> 
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> Subject		: KVM guests stalling on 2.6.28 (bisected)
> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> Date		: 2009-01-17 03:37 (64 days old)
> References	: http://lkml.org/lkml/2009/3/15/51
> Handled-By	: Avi Kivity <avi@redhat.com>

It's still being investigated.

	Ingo

^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-21 17:01 2.6.29-rc8-git5: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
@ 2009-03-21 17:07   ` Rafael J. Wysocki
  0 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-03-21 17:07 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Avi Kivity, Ingo Molnar, Kevin Shanahan,
	Kevin Shanahan, Mike Galbraith, Peter Zijlstra

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
Subject		: KVM guests stalling on 2.6.28 (bisected)
Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
Date		: 2009-01-17 03:37 (64 days old)
References	: http://lkml.org/lkml/2009/3/15/51
Handled-By	: Avi Kivity <avi@redhat.com>



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-21 14:08                       ` Frederic Weisbecker
  0 siblings, 0 replies; 262+ messages in thread
From: Frederic Weisbecker @ 2009-03-21 14:08 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Sat, Mar 21, 2009 at 03:30:39PM +1030, Kevin Shanahan wrote:
> On Thu, 2009-03-19 at 07:54 +1030, Kevin Shanahan wrote:
> > On Wed, 2009-03-18 at 11:46 +1030, Kevin Shanahan wrote:
> > > On Wed, 2009-03-18 at 01:20 +0100, Frederic Weisbecker wrote:
> > > > Ok, I've made a small script based on yours which could do this job.
> > > > You will just have to set yourself a threshold of latency
> > > > that you consider as buggy. I don't remember the latency you observed.
> > > > About 5 secs right?
> > > > 
> > > > It's the "thres" variable in the script.
> > > > 
> > > > The resulting trace should be a mixup of the function graph traces
> > > > and scheduler events which look like this:
> > > > 
> > > >  gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
> > > >   xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
> > > >   xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
> > > >             Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>
> > > > 
> > > > + is a wakeup and ==> is a context switch.
> > > > 
> > > > The script will loop trying some pings and will only keep the trace that matches
> > > > the latency threshold you defined.
> > > > 
> > > > Tell if the following script work for you.
> > 
> > ...
> > 
> > > Either way, I'll try to get some results in my maintenance window
> > > tonight.
> > 
> > Testing did not go so well. I compiled and booted
> > 2.6.29-rc8-tip-02630-g93c4989, but had some problems with the system
> > load when I tried to start tracing - it shot up to around 16-20 or so. I
> > started shutting down VMs to try and get it under control, but before I
> > got back to tracing again the machine disappeared off the network -
> > unresponsive to ping.
> > 
> > When I got in this morning, there was nothing on the console, nothing in
> > the logs to show what went wrong. I will try again, but my next chance
> > will probably be Saturday. Stay tuned.
> 
> Okay, new set of traces have been uploaded to:
> 
>   http://disenchant.net/tmp/bug-12465/trace-3/
> 
> These were done on the latest tip, which I pulled down this morning:
> 2.6.29-rc8-tip-02744-gd9937cb.
> 
> The system load was very high again when I first tried to trace with
> several guests running, so I ended up only having the one guest running
> and thankfully the bug was still reproducible that way.
> 
> Fingers crossed this set of traces is able to tell us something.


Thanks a lot Kevin!

The traces do indeed seem much clearer now.
Looking at the first trace, we begin with qemu answering the ping.
Roughly simplifying the trace, we have:


Found buggy latency:  9297.585
Please send the trace you will find on /sys/kernel/debug/tracing/trace
# tracer: function_graph
#
#      TIME       CPU  TASK/PID        DURATION                  FUNCTION CALLS
#       |         |    |    |           |   |                     |   |   |   |

							/* answer the ping (socket write) */
 2668.130735 |   0)  qemu-sy-4048  |               |  sys_writev() {
 2668.130735 |   0)  qemu-sy-4048  |   0.361 us    |    fget_light();
 2668.130744 |   0)  qemu-sy-4048  |               |       netif_rx_ni() {
 2668.130744 |   0)  qemu-sy-4048  |               |         netif_rx() {
 2668.130763 |   0)  qemu-sy-4048  |               |           ipv4_conntrack_in() {
 2668.130764 |   0)  qemu-sy-4048  |               |             nf_conntrack_in() {
 2668.130764 |   0)  qemu-sy-4048  |   0.328 us    |               ipv4_get_l4proto();
 2668.130765 |   0)  qemu-sy-4048  |   0.310 us    |               __nf_ct_l4proto_find();
 2668.130776 |   0)  qemu-sy-4048  |               |                 icmp_packet() {
 2668.130804 |   0)  qemu-sy-4048  |               |                   netif_receive_skb() {
 2668.130804 |   0)  qemu-sy-4048  |               |                     ip_rcv() {
 2668.130824 |   0)  qemu-sy-4048  |               |                       raw_rcv() {
 2668.130824 |   0)  qemu-sy-4048  |   0.307 us    |                         skb_push();
 2668.130825 |   0)  qemu-sy-4048  |               |                           raw_rcv_skb() {
 2668.130832 |   0)  qemu-sy-4048  |               |                             __wake_up_common() {
 2668.130838 |   0)  qemu-sy-4048  |               |                               /* sched_wakeup: task ping:7420 [120] success=1 */
 2668.130839 |   0)  qemu-sy-4048  |   0.312 us    |                           }
                                                                              }
                                                                             }
                                                      [...]

							/* ping was waiting for this response and is now awakened */
 2668.130876 |   0)  qemu-sy-4048  |               |  schedule() {
 2668.130885 |   0)  qemu-sy-4048  |               |  /* sched_switch: task qemu-system-x86:4048 [120] ==> ping:7420 [120] */
 2668.130885 |   0)  qemu-sy-4048  |               |    runqueue_is_locked() {
 2668.130886 |   0)  qemu-sy-4048  |   0.399 us    |    __phys_addr();
 ------------------------------------------
 0)  qemu-sy-4048  =>   ping-7420   
 ------------------------------------------

 2668.130887 |   0)   ping-7420    |               |                  finish_task_switch() {
 2668.130887 |   0)   ping-7420    |               |                    perf_counter_task_sched_in() {
 2668.130888 |   0)   ping-7420    |   0.319 us    |                      _spin_lock();
 2668.130888 |   0)   ping-7420    |   0.959 us    |                    }
 2668.130889 |   0)   ping-7420    |   1.644 us    |                  }
 2668.130889 |   0)   ping-7420    | ! 298102.3 us |                }
 2668.130890 |   0)   ping-7420    |               |                del_timer_sync() {
 2668.130890 |   0)   ping-7420    |               |                  try_to_del_timer_sync() {
 2668.130890 |   0)   ping-7420    |               |                    lock_timer_base() {
 2668.130890 |   0)   ping-7420    |   0.328 us    |                      _spin_lock_irqsave();
 2668.130891 |   0)   ping-7420    |   0.946 us    |                    }
 2668.130891 |   0)   ping-7420    |   0.328 us    |                    _spin_unlock_irqrestore();
 2668.130892 |   0)   ping-7420    |   2.218 us    |                  }
 2668.130892 |   0)   ping-7420    |   2.847 us    |                }
 2668.130893 |   0)   ping-7420    | ! 298108.7 us |              }
 2668.130893 |   0)   ping-7420    |   0.340 us    |              finish_wait();
 2668.130894 |   0)   ping-7420    |   0.328 us    |              _spin_lock_irqsave();
 2668.130894 |   0)   ping-7420    |   0.324 us    |              _spin_unlock_irqrestore();



As you can see, we are in the middle of the dialog between ping and the kvm guest.
It means that we have lost many traces. I think that the ring buffer does not have
enough space allocated for these 9 seconds of processing.

Just wait a bit while I cook up a better script, or at least try to find a
better number of entries to allocate to the ring buffer, and I'll come back to you.

But anyway, the scheduler switch and wake up events help a lot.
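
To follow only those events in a large trace, something like the following
sketch might help. It assumes the events appear as
"/* sched_wakeup: ... */" and "/* sched_switch: ... */" comments after an
absolute timestamp, as in the excerpt above; the helper name sched_events is
hypothetical.

```shell
#!/bin/bash

# sched_events: read a function_graph trace on stdin and print one line per
# scheduler event, as "<timestamp> <event text>", dropping the function
# call entries and the comment delimiters.
sched_events() {
    awk '/sched_(wakeup|switch):/ {
        ts = $1                  # absolute timestamp (funcgraph-abstime)
        sub(/^.*\/\* /, "")      # strip everything up to the comment opener
        sub(/ \*\/.*$/, "")      # strip the comment closer
        print ts, $0
    }'
}
```

For example, "sched_events < /sys/kernel/debug/tracing/trace" would reduce
the excerpt above to the wakeup of ping at 2668.130838 and the switch from
qemu to ping at 2668.130885.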

 
> Regards,
> Kevin.
> 
> 


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-21  5:00                     ` Kevin Shanahan
  0 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-03-21  5:00 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Thu, 2009-03-19 at 07:54 +1030, Kevin Shanahan wrote:
> On Wed, 2009-03-18 at 11:46 +1030, Kevin Shanahan wrote:
> > On Wed, 2009-03-18 at 01:20 +0100, Frederic Weisbecker wrote:
> > > Ok, I've made a small script based on yours which could do this job.
> > > You will just have to set yourself a threshold of latency
> > > that you consider as buggy. I don't remember the latency you observed.
> > > About 5 secs right?
> > > 
> > > It's the "thres" variable in the script.
> > > 
> > > The resulting trace should be a mixup of the function graph traces
> > > and scheduler events which look like this:
> > > 
> > >  gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
> > >   xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
> > >   xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
> > >             Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>
> > > 
> > > + is a wakeup and ==> is a context switch.
> > > 
> > > The script will loop trying some pings and will only keep the trace that matches
> > > the latency threshold you defined.
> > > 
> > > Tell if the following script work for you.
> 
> ...
> 
> > Either way, I'll try to get some results in my maintenance window
> > tonight.
> 
> Testing did not go so well. I compiled and booted
> 2.6.29-rc8-tip-02630-g93c4989, but had some problems with the system
> load when I tried to start tracing - it shot up to around 16-20 or so. I
> started shutting down VMs to try and get it under control, but before I
> got back to tracing again the machine disappeared off the network -
> unresponsive to ping.
> 
> When I got in this morning, there was nothing on the console, nothing in
> the logs to show what went wrong. I will try again, but my next chance
> will probably be Saturday. Stay tuned.

Okay, new set of traces have been uploaded to:

  http://disenchant.net/tmp/bug-12465/trace-3/

These were done on the latest tip, which I pulled down this morning:
2.6.29-rc8-tip-02744-gd9937cb.

The system load was very high again when I first tried to trace with
several guests running, so I ended up only having the one guest running
and thankfully the bug was still reproducible that way.

Fingers crossed this set of traces is able to tell us something.

Regards,
Kevin.



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-18  1:16                 ` Kevin Shanahan
  (?)
  (?)
@ 2009-03-18 21:24                 ` Kevin Shanahan
  2009-03-21  5:00                     ` Kevin Shanahan
  -1 siblings, 1 reply; 262+ messages in thread
From: Kevin Shanahan @ 2009-03-18 21:24 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Wed, 2009-03-18 at 11:46 +1030, Kevin Shanahan wrote:
> On Wed, 2009-03-18 at 01:20 +0100, Frederic Weisbecker wrote:
> > Ok, I've made a small script based on yours which could do this job.
> > You will just have to set yourself a threshold of latency
> > that you consider as buggy. I don't remember the latency you observed.
> > About 5 secs right?
> > 
> > It's the "thres" variable in the script.
> > 
> > The resulting trace should be a mixup of the function graph traces
> > and scheduler events which look like this:
> > 
> >  gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
> >   xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
> >   xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
> >             Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>
> > 
> > + is a wakeup and ==> is a context switch.
> > 
> > The script will loop trying some pings and will only keep the trace that matches
> > the latency threshold you defined.
> > 
> > Tell if the following script work for you.

...

> Either way, I'll try to get some results in my maintenance window
> tonight.

Testing did not go so well. I compiled and booted
2.6.29-rc8-tip-02630-g93c4989, but had some problems with the system
load when I tried to start tracing - it shot up to around 16-20 or so. I
started shutting down VMs to try and get it under control, but before I
got back to tracing again the machine disappeared off the network -
unresponsive to ping.

When I got in this morning, there was nothing on the console, nothing in
the logs to show what went wrong. I will try again, but my next chance
will probably be Saturday. Stay tuned.

Regards,
Kevin.



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-18  2:24                   ` Frederic Weisbecker
  0 siblings, 0 replies; 262+ messages in thread
From: Frederic Weisbecker @ 2009-03-18  2:24 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Wed, Mar 18, 2009 at 11:46:26AM +1030, Kevin Shanahan wrote:
> On Wed, 2009-03-18 at 01:20 +0100, Frederic Weisbecker wrote:
> > On Tue, Mar 17, 2009 at 09:25:37AM +1030, Kevin Shanahan wrote:
> > > On Mon, 2009-03-16 at 21:07 +0100, Frederic Weisbecker wrote:
> > > > I've looked a bit at your traces.
> > > > I think it's probably too wide to find something inside.
> > > > Latest -tip is provided with a new set of events tracing, meaning
> > > > that you will be able to produce function graph traces with various
> > > > sched events included.
> > > > 
> > > > Another thing, is it possible to reproduce it with only one ping?
> > > > Or testing perioding pings and keep only one that raised a relevant
> > > > threshold of latency? I think we could do a script that can do that.
> > > > It would make the trace much clearer.
> > > 
> > > Yeah, I think that should be possible. If you can come up with such a
> > > script, that would be great.
> > 
> > Ok, I've made a small script based on yours which could do this job.
> > You will just have to set yourself a threshold of latency
> > that you consider as buggy. I don't remember the latency you observed.
> > About 5 secs right?
> > 
> > It's the "thres" variable in the script.
> > 
> > The resulting trace should be a mixup of the function graph traces
> > and scheduler events which look like this:
> > 
> >  gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
> >   xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
> >   xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
> >             Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>
> > 
> > + is a wakeup and ==> is a context switch.
> > 
> > 
> > The script will loop trying some pings and will only keep the trace that matches
> > the latency threshold you defined.
> > 
> > Tell if the following script work for you.
> 
> Yes, this looks like it will work as intended.
> 
> One thing I was thinking about though - would we need to look for a
> trace that consists of a fast ping followed by a slow ping? If we only
> keep the trace data from when we experience the slow ping, the guest
> will have already "stalled" before the trace started, so the trace won't
> indicate any of the information about how we got into that state. Is
> that information going to be important, or is it enough to just see what
> it does to get out of the stalled state?


I don't know :-)
I fear the only thing we would see by looking at a fast ping trace
is the kvm going to sleep at the end. I guess the hot black box
here is likely: what happens when we try to wake up kvm and why it is
taking so long.

Maybe by looking at a slow ping trace, we could follow the flow once
the kvm is supposed to be awakened. At this stage, we can perhaps
follow both the scheduler and kvm activities. Hopefully after that
we can narrow the trace down further by filtering some specific areas.

It will likely end up with some ftrace_printk() (putting specific
trace messages in defined locations)...
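
For the filtering step, one possible way (only a sketch, not something
tested on this setup) is ftrace's set_graph_function file, which limits
the function graph tracer to the listed functions and whatever they call.
The helper name limit_graph_to and the choice of kvm_vcpu_block are just
assumptions for illustration; the tracing directory is passed as an
argument so the helper can be tried against a scratch directory first.

```shell
# Restrict the function graph tracer to the given functions (and their
# callees) by writing their names to set_graph_function.
# $1 is the tracing directory, the remaining arguments are function names.
limit_graph_to() {
    local dir=$1
    local fn
    shift
    # Truncating the file clears any previous selection
    : > "$dir/set_graph_function"
    for fn in "$@"; do
        echo "$fn" >> "$dir/set_graph_function"
    done
}

# Example (as root; kvm_vcpu_block is only an illustrative choice):
#   limit_graph_to /sys/kernel/debug/tracing kvm_vcpu_block
```

This should need CONFIG_DYNAMIC_FTRACE, which is already in the list of
options to enable.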


 
> Either way, I'll try to get some results in my maintenance window
> tonight.
>
> Cheers,
> Kevin.
> 
> > You will need to pull the latest -tip tree and enable the following:
> > 
> > CONFIG_FUNCTION_TRACER=y
> > CONFIG_FUNCTION_GRAPH_TRACER=y
> > CONFIG_DYNAMIC_FTRACE=y
> > CONFIG_SCHED_TRACER=y
> > CONFIG_CONTEXT_SWITCH_TRACER=y
> > CONFIG_EVENT_TRACER=y
> > 
> > Thanks!
> > 
> > Ah and you will need python too (since bash can't do floating point
> > operation, it uses python here).
> > 
> > #!/bin/bash
> > 
> > # Switch off all CPUs except for one to simplify the trace
> > echo 0 > /sys/devices/system/cpu/cpu1/online
> > echo 0 > /sys/devices/system/cpu/cpu2/online
> > echo 0 > /sys/devices/system/cpu/cpu3/online
> > 
> > 
> > # Make sure debugfs has been mounted
> > if [ ! -d /sys/kernel/debug/tracing ]; then
> >     mount -t debugfs debugfs /sys/kernel/debug
> > fi
> > 
> > # Set up the trace parameters
> > pushd /sys/kernel/debug/tracing || exit 1
> > echo 0 > tracing_enabled
> > echo function_graph > current_tracer
> > echo funcgraph-abstime > trace_options
> > echo funcgraph-proc    > trace_options
> > 
> > # Set here the kvm IP addr
> > addr=""
> > 
> > # Set here a threshold of latency in sec
> > thres="5"
> > found="False"
> > lat=0
> > prefix=/sys/kernel/debug/tracing
> > 
> > echo 1 > $prefix/events/sched/sched_wakeup/enable
> > echo 1 > $prefix/events/sched/sched_switch/enable
> > 
> > while [ "$found" != "True" ]
> > do
> > 	# Flush the previous buffer
> > 	echo nop > $prefix/current_tracer
> > 
> > 	# Reset the function_graph tracer
> > 	echo function_graph > $prefix/current_tracer
> > 
> > 	echo 1 > $prefix/tracing_enabled
> > 	lat=$(ping -c 1 $addr | grep rtt | grep -Eo " [0-9]+.[0-9]+")
> > 	echo 0 > $prefix/tracing_enabled
> > 
> > 	found=$(python -c "print float(str($lat).strip()) > $thres")
> > 	sleep 0.01
> > done
> > 
> > echo 0 > $prefix/events/sched/sched_wakeup/enable
> > echo 0 > $prefix/events/sched/sched_switch/enable
> > 
> > 
> > echo "Found buggy latency: $lat"
> > echo "Please send the trace you will find on $prefix/trace"
> 
> 


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-18  0:20               ` Frederic Weisbecker
@ 2009-03-18  1:16                 ` Kevin Shanahan
  -1 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-03-18  1:16 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Wed, 2009-03-18 at 01:20 +0100, Frederic Weisbecker wrote:
> On Tue, Mar 17, 2009 at 09:25:37AM +1030, Kevin Shanahan wrote:
> > On Mon, 2009-03-16 at 21:07 +0100, Frederic Weisbecker wrote:
> > > I've looked a bit at your traces.
> > > I think it's probably too wide to find something inside.
> > > Latest -tip is provided with a new set of events tracing, meaning
> > > that you will be able to produce function graph traces with various
> > > sched events included.
> > > 
> > > Another thing, is it possible to reproduce it with only one ping?
> > > Or testing perioding pings and keep only one that raised a relevant
> > > threshold of latency? I think we could do a script that can do that.
> > > It would make the trace much clearer.
> > 
> > Yeah, I think that should be possible. If you can come up with such a
> > script, that would be great.
> 
> Ok, I've made a small script based on yours which could do this job.
> You will just have to set yourself a threshold of latency
> that you consider as buggy. I don't remember the latency you observed.
> About 5 secs right?
> 
> It's the "thres" variable in the script.
> 
> The resulting trace should be a mixup of the function graph traces
> and scheduler events which look like this:
> 
>  gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
>   xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
>   xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
>             Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>
> 
> + is a wakeup and ==> is a context switch.
> 
> 
> The script will loop trying some pings and will only keep the trace that matches
> the latency threshold you defined.
> 
> Tell if the following script work for you.

Yes, this looks like it will work as intended.

One thing I was thinking about though - would we need to look for a
trace that consists of a fast ping followed by a slow ping? If we only
keep the trace data from when we experience the slow ping, the guest
will have already "stalled" before the trace started, so the trace won't
indicate any of the information about how we got into that state. Is
that information going to be important, or is it enough to just see what
it does to get out of the stalled state?
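
To make the question concrete, here is a rough sketch of the "fast ping
followed by a slow ping" check. It is purely illustrative: fast_then_slow
is a made-up helper, and all three values are assumed to be in
milliseconds, which is the unit ping prints.

```shell
# Succeeds when the previous ping was fast and the current one was slow.
# Arguments: previous rtt (ms), current rtt (ms), threshold (ms).
fast_then_slow() {
    awk -v prev="$1" -v cur="$2" -v thres="$3" \
        'BEGIN { exit !(prev + 0 <= thres + 0 && cur + 0 > thres + 0) }'
}

# Example: a 0.3 ms ping followed by a 5200 ms ping, threshold 5000 ms
#   fast_then_slow 0.3 5200 5000 && echo "keep this trace"
```

In the loop that would mean sending two pings per traced window and
keeping the trace only when the check succeeds, so the buffer also covers
the time before the stall. It could still miss a stall that begins
between two windows.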

Either way, I'll try to get some results in my maintenance window
tonight.

Cheers,
Kevin.

> You will need to pull the latest -tip tree and enable the following:
> 
> CONFIG_FUNCTION_TRACER=y
> CONFIG_FUNCTION_GRAPH_TRACER=y
> CONFIG_DYNAMIC_FTRACE=y
> CONFIG_SCHED_TRACER=y
> CONFIG_CONTEXT_SWITCH_TRACER=y
> CONFIG_EVENT_TRACER=y
> 
> Thanks!
> 
> Ah and you will need python too (since bash can't do floating point
> operation, it uses python here).
> 
> #!/bin/bash
> 
> # Switch off all CPUs except for one to simplify the trace
> echo 0 > /sys/devices/system/cpu/cpu1/online
> echo 0 > /sys/devices/system/cpu/cpu2/online
> echo 0 > /sys/devices/system/cpu/cpu3/online
> 
> 
> # Make sure debugfs has been mounted
> if [ ! -d /sys/kernel/debug/tracing ]; then
>     mount -t debugfs debugfs /sys/kernel/debug
> fi
> 
> # Set up the trace parameters
> pushd /sys/kernel/debug/tracing || exit 1
> echo 0 > tracing_enabled
> echo function_graph > current_tracer
> echo funcgraph-abstime > trace_options
> echo funcgraph-proc    > trace_options
> 
> # Set here the kvm IP addr
> addr=""
> 
> # Set here a threshold of latency in sec
> thres="5"
> found="False"
> lat=0
> prefix=/sys/kernel/debug/tracing
> 
> echo 1 > $prefix/events/sched/sched_wakeup/enable
> echo 1 > $prefix/events/sched/sched_switch/enable
> 
> while [ "$found" != "True" ]
> do
> 	# Flush the previous buffer
> 	echo nop > $prefix/current_tracer
> 
> 	# Reset the function_graph tracer
> 	echo function_graph > $prefix/current_tracer
> 
> 	echo 1 > $prefix/tracing_enabled
> 	lat=$(ping -c 1 $addr | grep rtt | grep -Eo " [0-9]+.[0-9]+")
> 	echo 0 > $prefix/tracing_enabled
> 
> 	found=$(python -c "print float(str($lat).strip()) > $thres")
> 	sleep 0.01
> done
> 
> echo 0 > $prefix/events/sched/sched_wakeup/enable
> echo 0 > $prefix/events/sched/sched_switch/enable
> 
> 
> echo "Found buggy latency: $lat"
> echo "Please send the trace you will find on $prefix/trace"



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-18  0:20               ` Frederic Weisbecker
  0 siblings, 0 replies; 262+ messages in thread
From: Frederic Weisbecker @ 2009-03-18  0:20 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Tue, Mar 17, 2009 at 09:25:37AM +1030, Kevin Shanahan wrote:
> On Mon, 2009-03-16 at 21:07 +0100, Frederic Weisbecker wrote:
> > I've looked a bit at your traces.
> > I think it's probably too wide to find something inside.
> > Latest -tip is provided with a new set of events tracing, meaning
> > that you will be able to produce function graph traces with various
> > sched events included.
> > 
> > Another thing, is it possible to reproduce it with only one ping?
> > Or testing perioding pings and keep only one that raised a relevant
> > threshold of latency? I think we could do a script that can do that.
> > It would make the trace much clearer.
> 
> Yeah, I think that should be possible. If you can come up with such a
> script, that would be great.

Ok, I've made a small script based on yours which could do this job.
You will just have to set yourself a threshold of latency
that you consider as buggy. I don't remember the latency you observed.
About 5 secs right?

It's the "thres" variable in the script.

The resulting trace should be a mixup of the function graph traces
and scheduler events which look like this:

 gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
  xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
  xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
            Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>

+ is a wakeup and ==> is a context switch.
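
If it helps when reading a long trace, the scheduler event lines can be
labelled with a small awk filter. This is only a sketch that assumes the
whitespace-separated layout of the four sample lines above; the name
classify_sched_events is made up, and any line that does not match that
layout (function graph entries, for instance) is skipped.

```shell
# Read trace lines on stdin and print one line per scheduler event:
#   <timestamp> wakeup|switch <current task> -> <target task>
classify_sched_events() {
    awk '
        # Field 2 is the [cpu] column, field 5 is the "+" or "==>" marker
        $2 ~ /^\[[0-9]+\]$/ && ($5 == "+" || $5 == "==>") {
            ts = $3
            sub(/:$/, "", ts)               # drop the trailing colon
            kind = ($5 == "+") ? "wakeup" : "switch"
            printf "%s %s %s -> %s\n", ts, kind, $1, $8
        }
    '
}

# Example:
#   classify_sched_events < /sys/kernel/debug/tracing/trace
```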


The script will loop trying some pings and will only keep the trace that matches
the latency threshold you defined.

Tell me if the following script works for you.

You will need to pull the latest -tip tree and enable the following:

CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_SCHED_TRACER=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_EVENT_TRACER=y
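
A quick way to check these before rebooting could look like the sketch
below. The helper name missing_options is hypothetical, and the path of
the config file is an assumption: many distributions install it as
/boot/config-<version>, but yours may keep it elsewhere.

```shell
# Print every option from the list that is not set to "y" in the kernel
# config file given as $1. Prints nothing when all of them are enabled.
missing_options() {
    local cfg=$1
    local opt
    shift
    for opt in "$@"; do
        grep -qx "${opt}=y" "$cfg" || echo "$opt"
    done
}

# Example, assuming the config lives under /boot:
#   missing_options "/boot/config-$(uname -r)" \
#       CONFIG_FUNCTION_TRACER CONFIG_FUNCTION_GRAPH_TRACER \
#       CONFIG_DYNAMIC_FTRACE CONFIG_SCHED_TRACER \
#       CONFIG_CONTEXT_SWITCH_TRACER CONFIG_EVENT_TRACER
```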

Thanks!

Ah, and you will need python too (since bash can't do floating point
operations, the script uses python for the comparison).

#!/bin/bash

# Switch off all CPUs except for one to simplify the trace
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu3/online


# Make sure debugfs has been mounted
if [ ! -d /sys/kernel/debug/tracing ]; then
    mount -t debugfs debugfs /sys/kernel/debug
fi

# Set up the trace parameters
pushd /sys/kernel/debug/tracing || exit 1
echo 0 > tracing_enabled
echo function_graph > current_tracer
echo funcgraph-abstime > trace_options
echo funcgraph-proc    > trace_options

# Set here the kvm IP addr
addr=""

# Set here a threshold of latency in sec
thres="5"
found="False"
lat=0
prefix=/sys/kernel/debug/tracing

echo 1 > $prefix/events/sched/sched_wakeup/enable
echo 1 > $prefix/events/sched/sched_switch/enable

while [ "$found" != "True" ]
do
	# Flush the previous buffer
	echo nop > $prefix/current_tracer

	# Reset the function_graph tracer
	echo function_graph > $prefix/current_tracer

	echo 1 > $prefix/tracing_enabled
	lat=$(ping -c 1 $addr | grep rtt | grep -Eo " [0-9]+.[0-9]+")
	echo 0 > $prefix/tracing_enabled

	# ping reports the rtt in ms, while thres is in seconds
	found=$(python -c "print float(str($lat).strip()) > $thres * 1000")
	sleep 0.01
done

echo 0 > $prefix/events/sched/sched_wakeup/enable
echo 0 > $prefix/events/sched/sched_switch/enable


echo "Found buggy latency: $lat"
echo "Please send the trace you will find on $prefix/trace"
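
If python is not available, the comparison could probably be done with
awk instead. The sketch below assumes the iputils summary line format
("rtt min/avg/max/mdev = ... ms"); rtt_ms and exceeds_threshold are
made-up helper names. Note that ping prints milliseconds, so the
threshold in seconds is converted before comparing.

```shell
# Extract the minimum rtt (in ms) from ping's summary line, e.g.
#   rtt min/avg/max/mdev = 0.045/0.045/0.045/0.000 ms
rtt_ms() {
    awk -F'[ =/]+' '/^rtt/ { print $6 }'
}

# Succeeds when the latency $1 (in ms) is above the threshold $2 (in sec).
# An empty latency (ping failed) counts as 0 and never exceeds it.
exceeds_threshold() {
    awk -v ms="$1" -v sec="$2" 'BEGIN { exit !(ms + 0 > sec * 1000) }'
}

# Example, with the variables from the script above:
#   lat=$(ping -c 1 "$addr" | rtt_ms)
#   exceeds_threshold "$lat" "$thres" && found="True"
```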



> 
> > Just wait a bit, I'm looking at which event could be relevant to enable
> > and I come back to you with a set of commands to test.
> 
> Excellent! Thanks for looking into this.
> 
> Cheers,
> Kevin.
> 
> 


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-18  0:20               ` Frederic Weisbecker
  0 siblings, 0 replies; 262+ messages in thread
From: Frederic Weisbecker @ 2009-03-18  0:20 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Tue, Mar 17, 2009 at 09:25:37AM +1030, Kevin Shanahan wrote:
> On Mon, 2009-03-16 at 21:07 +0100, Frederic Weisbecker wrote:
> > I've looked a bit at your traces.
> > I think it's probably too wide to find something inside.
> > Latest -tip is provided with a new set of events tracing, meaning
> > that you will be able to produce function graph traces with various
> > sched events included.
> > 
> > Another thing, is it possible to reproduce it with only one ping?
> > Or testing perioding pings and keep only one that raised a relevant
> > threshold of latency? I think we could do a script that can do that.
> > It would make the trace much clearer.
> 
> Yeah, I think that should be possible. If you can come up with such a
> script, that would be great.

Ok, I've made a small script based on yours which could do this job.
You will just have to set yourself a threshold of latency
that you consider as buggy. I don't remember the latency you observed.
About 5 secs right?

It's the "thres" variable in the script.

The resulting trace should be a mixup of the function graph traces
and scheduler events which look like this:

 gnome-screensav-4691  [000]  6716.774277:   4691:120:S ==> [000]     0:140:R <idle>
  xfce4-terminal-4723  [001]  6716.774303:   4723:120:R   + [001]  4289:120:S Xorg
  xfce4-terminal-4723  [001]  6716.774417:   4723:120:S ==> [001]  4289:120:R Xorg
            Xorg-4289  [001]  6716.774427:   4289:120:S ==> [001]     0:140:R <idle>

+ is a wakeup and ==> is a context switch.


The script will loop trying some pings and will only keep the trace that matches
the latency threshold you defined.

Tell if the following script work for you.

You will need to pull the latest -tip tree and enable the following:

CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_SCHED_TRACER=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_EVENT_TRACER=y
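
A minimal sketch for checking these options against a kernel config file
before building. The function name check_trace_config is hypothetical, and
the example path /boot/config-$(uname -r) is only a common distribution
convention; point it at the .config of your -tip build instead if you have
one.

```shell
# check_trace_config CONFIG_FILE
# Prints every option from the list above that is not set to "y".
# Returns 0 if all are enabled, 1 otherwise.
check_trace_config() {
    missing=0
    for opt in CONFIG_FUNCTION_TRACER CONFIG_FUNCTION_GRAPH_TRACER \
               CONFIG_DYNAMIC_FTRACE CONFIG_SCHED_TRACER \
               CONFIG_CONTEXT_SWITCH_TRACER CONFIG_EVENT_TRACER; do
        # -x matches the whole line, so "# CONFIG_X is not set" never matches
        if ! grep -qx "${opt}=y" "$1"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}

# Example (path is an assumption):
#   check_trace_config "/boot/config-$(uname -r)"
```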

Thanks!

Ah, and you will need python too (bash can't do floating-point
operations, so the script uses python for the comparison).

#!/bin/bash

# Switch off all CPUs except for one to simplify the trace
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu3/online


# Make sure debugfs has been mounted
if [ ! -d /sys/kernel/debug/tracing ]; then
    mount -t debugfs debugfs /sys/kernel/debug
fi

# Set up the trace parameters
pushd /sys/kernel/debug/tracing || exit 1
echo 0 > tracing_enabled
echo function_graph > current_tracer
echo funcgraph-abstime > trace_options
echo funcgraph-proc    > trace_options

# Set the KVM guest IP address here
addr=""

# Latency threshold in ms (ping reports the rtt in milliseconds); 5000 = 5 s
thres="5000"
found="False"
lat=0
prefix=/sys/kernel/debug/tracing

echo 1 > $prefix/events/sched/sched_wakeup/enable
echo 1 > $prefix/events/sched/sched_switch/enable

while [ "$found" != "True" ]
do
	# Flush the previous buffer
	echo nop > $prefix/current_tracer

	# Reset the function_graph tracer
	echo function_graph > $prefix/current_tracer

	echo 1 > $prefix/tracing_enabled
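	# ping reports the rtt in ms; the first number after "= " is the min,
	# which equals the only sample when a single ping is sent
	lat=$(ping -c 1 "$addr" | grep rtt | grep -Eo " [0-9]+\.[0-9]+")
	echo 0 > $prefix/tracing_enabled

	# print() with parentheses works with both python 2 and python 3;
	# an empty $lat (ping failed) is treated as 0 so the loop keeps going
	found=$(python -c "print(float('$lat'.strip() or 0) > $thres)")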
	sleep 0.01
done

echo 0 > $prefix/events/sched/sched_wakeup/enable
echo 0 > $prefix/events/sched/sched_switch/enable


echo "Found buggy latency: $lat"
echo "Please send the trace you will find on $prefix/trace"
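
A possible variant of the latency check that does not need python, shown
only as a sketch. The helper names extract_rtt_ms and exceeds_threshold_ms
are hypothetical. It assumes the iputils-style summary line
"rtt min/avg/max/mdev = ..." and takes the threshold in milliseconds, since
that is the unit ping reports (so 5 seconds is 5000).

```shell
# extract_rtt_ms PING_OUTPUT
# Prints the first rtt value (the min) from ping's summary line, in ms.
# Prints nothing if there is no summary line, e.g. when the ping was lost.
extract_rtt_ms() {
    printf '%s\n' "$1" | sed -n 's|^rtt [^=]*= \([0-9.]*\)/.*|\1|p'
}

# exceeds_threshold_ms RTT_MS THRES_MS
# Succeeds (exit status 0) if RTT_MS is greater than THRES_MS.
# awk does the floating-point comparison; an empty RTT_MS counts as 0.
exceeds_threshold_ms() {
    awk -v rtt="$1" -v thres="$2" 'BEGIN { exit !(rtt + 0 > thres + 0) }'
}
```

In the loop above this could be used as
"lat=$(extract_rtt_ms "$(ping -c 1 "$addr")")" followed by
"exceeds_threshold_ms "$lat" 5000 && found=True".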



> 
> > Just wait a bit, I'm looking at which events could be relevant to enable
> > and I'll come back to you with a set of commands to test.
> 
> Excellent! Thanks for looking into this.
> 
> Cheers,
> Kevin.
> 
> 

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-16 20:07           ` Frederic Weisbecker
@ 2009-03-16 22:55             ` Kevin Shanahan
  -1 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-03-16 22:55 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Mon, 2009-03-16 at 21:07 +0100, Frederic Weisbecker wrote:
> I've looked a bit at your traces.
> I think they are probably too broad to find anything in.
> The latest -tip comes with a new set of trace events, meaning
> that you will be able to produce function graph traces with various
> sched events included.
> 
> Another thing: is it possible to reproduce it with only one ping?
> Or to test periodic pings and keep only the one that exceeds a relevant
> latency threshold? I think we could write a script that does that.
> It would make the trace much clearer.

Yeah, I think that should be possible. If you can come up with such a
script, that would be great.

> Just wait a bit, I'm looking at which events could be relevant to enable
> and I'll come back to you with a set of commands to test.

Excellent! Thanks for looking into this.

Cheers,
Kevin.



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-16 20:07           ` Frederic Weisbecker
  0 siblings, 0 replies; 262+ messages in thread
From: Frederic Weisbecker @ 2009-03-16 20:07 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Mon, Mar 16, 2009 at 11:16:35PM +1030, Kevin Shanahan wrote:
> On Mon, 2009-03-16 at 11:49 +0200, Avi Kivity wrote:
> > Kevin Shanahan wrote:
> > > On Sat, 2009-03-14 at 20:20 +0100, Rafael J. Wysocki wrote:
> > >   
> > >> This message has been generated automatically as a part of a report
> > >> of regressions introduced between 2.6.27 and 2.6.28.
> > >>
> > >> The following bug entry is on the current list of known regressions
> > >> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> > >> be listed and let me know (either way).
> > >>
> > >> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> > >> Subject		: KVM guests stalling on 2.6.28 (bisected)
> > >> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> > >> Date		: 2009-01-17 03:37 (57 days old)
> > >> Handled-By	: Avi Kivity <avi@redhat.com>
> > >>     
> > >
> > > No further updates since the last reminder.
> > > The bug should still be listed.   
> > 
> > Does the bug reproduce if you use the acpi_pm clocksource in the guests?
> 
> In the guest being pinged? Yes, it still happens.


Hi Kevin,

I've looked a bit at your traces.
I think they are probably too broad to find anything in.
The latest -tip comes with a new set of trace events, meaning
that you will be able to produce function graph traces with various
sched events included.

Another thing: is it possible to reproduce it with only one ping?
Or to test periodic pings and keep only the one that exceeds a relevant
latency threshold? I think we could write a script that does that.
It would make the trace much clearer.

Just wait a bit, I'm looking at which events could be relevant to enable
and I'll come back to you with a set of commands to test.

Frederic.
 
> hermes-old:~# cat /sys/devices/system/clocksource/clocksource0/available_clocksource 
> kvm-clock acpi_pm jiffies tsc 
> hermes-old:~# cat /sys/devices/system/clocksource/clocksource0/current_clocksource 
> acpi_pm
> 
> kmshanah@flexo:~$ ping -c 600 hermes-old
> 
> --- hermes-old.wumi.org.au ping statistics ---
> 600 packets transmitted, 600 received, 0% packet loss, time 599439ms
> rtt min/avg/max/mdev = 0.131/723.197/9941.884/1569.918 ms, pipe 10
> 
> I had to reconfigure the guest kernel to make that clocksource
> available. The way I had the guest kernel configured before, it only had
> tsc and jiffies clocksources available. Unstable TSC was detected, so it
> has been using jiffies until now.
> 
> Here's another test, using kvm-clock as the guest's clocksource:
> 
> hermes-old:~# cat /sys/devices/system/clocksource/clocksource0/current_clocksource 
> kvm-clock
> 
> kmshanah@flexo:~$ ping -c 600 hermes-old
> 
> --- hermes-old.wumi.org.au ping statistics ---
> 600 packets transmitted, 600 received, 0% packet loss, time 599295ms
> rtt min/avg/max/mdev = 0.131/1116.170/30840.411/4171.905 ms, pipe 31
> 
> Regards,
> Kevin.
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-16 12:46         ` Kevin Shanahan
  0 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-03-16 12:46 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Mon, 2009-03-16 at 11:49 +0200, Avi Kivity wrote:
> Kevin Shanahan wrote:
> > On Sat, 2009-03-14 at 20:20 +0100, Rafael J. Wysocki wrote:
> >   
> >> This message has been generated automatically as a part of a report
> >> of regressions introduced between 2.6.27 and 2.6.28.
> >>
> >> The following bug entry is on the current list of known regressions
> >> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> >> be listed and let me know (either way).
> >>
> >> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> >> Subject		: KVM guests stalling on 2.6.28 (bisected)
> >> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> >> Date		: 2009-01-17 03:37 (57 days old)
> >> Handled-By	: Avi Kivity <avi@redhat.com>
> >>     
> >
> > No further updates since the last reminder.
> > The bug should still be listed.   
> 
> Does the bug reproduce if you use the acpi_pm clocksource in the guests?

In the guest being pinged? Yes, it still happens.

hermes-old:~# cat /sys/devices/system/clocksource/clocksource0/available_clocksource 
kvm-clock acpi_pm jiffies tsc 
hermes-old:~# cat /sys/devices/system/clocksource/clocksource0/current_clocksource 
acpi_pm

kmshanah@flexo:~$ ping -c 600 hermes-old

--- hermes-old.wumi.org.au ping statistics ---
600 packets transmitted, 600 received, 0% packet loss, time 599439ms
rtt min/avg/max/mdev = 0.131/723.197/9941.884/1569.918 ms, pipe 10

I had to reconfigure the guest kernel to make that clocksource
available. The way I had the guest kernel configured before, it only had
tsc and jiffies clocksources available. Unstable TSC was detected, so it
has been using jiffies until now.
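
For reference, switching the guest clocksource at runtime can be sketched
as below. The function name set_clocksource is made up for illustration;
the sysfs files are the ones shown in the output above. It needs root in
the guest, and the optional second argument exists only so the function can
be tried against a scratch directory.

```shell
# set_clocksource NAME [SYSFS_DIR]
# Writes NAME to current_clocksource, but only if it is listed in
# available_clocksource. Returns 1 without writing anything otherwise.
set_clocksource() {
    dir="${2:-/sys/devices/system/clocksource/clocksource0}"
    # available_clocksource is one space-separated line; split it so that
    # grep can match whole names exactly
    if ! tr ' ' '\n' < "$dir/available_clocksource" | grep -qxF -- "$1"; then
        echo "clocksource '$1' is not available" >&2
        return 1
    fi
    echo "$1" > "$dir/current_clocksource"
}

# Example: set_clocksource acpi_pm
```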

Here's another test, using kvm-clock as the guest's clocksource:

hermes-old:~# cat /sys/devices/system/clocksource/clocksource0/current_clocksource 
kvm-clock

kmshanah@flexo:~$ ping -c 600 hermes-old

--- hermes-old.wumi.org.au ping statistics ---
600 packets transmitted, 600 received, 0% packet loss, time 599295ms
rtt min/avg/max/mdev = 0.131/1116.170/30840.411/4171.905 ms, pipe 31
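
To compare runs like the two above side by side, the summary line can be
split into named fields. This is only a sketch: ping_summary is a
hypothetical helper name, and it assumes the "rtt min/avg/max/mdev = ..."
layout printed by this version of ping.

```shell
# ping_summary LINE
# Prints the four rtt fields (all in ms) of a ping summary line as
# "min=... avg=... max=... mdev=...". Prints nothing for other lines.
ping_summary() {
    # Fields are separated by any run of spaces, "=" or "/", so the four
    # labels are fields 2-5 and the four values are fields 6-9.
    printf '%s\n' "$1" | awk -F'[ =/]+' \
        '/^rtt / { printf "min=%s avg=%s max=%s mdev=%s\n", $6, $7, $8, $9 }'
}
```

The max field is the interesting one here: roughly 9.9 s with acpi_pm and
30.8 s with kvm-clock.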

Regards,
Kevin.



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-16  9:49       ` Avi Kivity
  0 siblings, 0 replies; 262+ messages in thread
From: Avi Kivity @ 2009-03-16  9:49 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

Kevin Shanahan wrote:
> On Sat, 2009-03-14 at 20:20 +0100, Rafael J. Wysocki wrote:
>   
>> This message has been generated automatically as a part of a report
>> of regressions introduced between 2.6.27 and 2.6.28.
>>
>> The following bug entry is on the current list of known regressions
>> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
>> be listed and let me know (either way).
>>
>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
>> Subject		: KVM guests stalling on 2.6.28 (bisected)
>> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
>> Date		: 2009-01-17 03:37 (57 days old)
>> Handled-By	: Avi Kivity <avi@redhat.com>
>>     
>
> No further updates since the last reminder.
> The bug should still be listed.
>
>   

Does the bug reproduce if you use the acpi_pm clocksource in the guests?


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-15 10:13               ` Avi Kivity
  0 siblings, 0 replies; 262+ messages in thread
From: Avi Kivity @ 2009-03-15 10:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Mike Galbraith, Kevin Shanahan,
	Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List

Ingo Molnar wrote:
>> A specific question for now is how can I identify long latency 
>> within qemu here?  As far as I can tell all qemu latencies in 
>> trace6.txt are sub 100ms, which, while long, don't explain the 
>> guest stalling for many seconds.
>>     
>
> Exactly - that in turn means that there's no scheduler latency 
> on the host/native kernel side - in turn it must be a KVM 
> related latency. (If there was any host side scheduler wakeup or 
> other type of latency we'd see it in the trace.)
>   

But if there's a missing wakeup (which is the likeliest candidate for 
the bug) then we would have seen high latencies, no?

Can you explain what the patch in question (14800984706) does?  Maybe 
that will give us a clue.

> The most useful trace would be a specific set of trace_printk() 
> calls (available on the latest tracing tree), coupled with a 
> hyper_trace_printk() which injects a trace entry from the guest 
> side into the host kernel trace buffer. (== that would mean a 
> hypercall that does a trace_printk().)

Yes, that would provide all the information.  Not sure if I would be up 
to decoding it, though.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-15  9:56           ` Avi Kivity
  (?)
@ 2009-03-15 10:03           ` Ingo Molnar
  2009-03-15 10:13               ` Avi Kivity
  -1 siblings, 1 reply; 262+ messages in thread
From: Ingo Molnar @ 2009-03-15 10:03 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Peter Zijlstra, Mike Galbraith, Kevin Shanahan,
	Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List


* Avi Kivity <avi@redhat.com> wrote:

> Ingo Molnar wrote:
>>> I've looked at the traces but lack the skill to make any sense out of 
>>> them.
>>>     
>>
>> Do you have specific questions about them that we could answer?
>>   
>
> A general question: what's going on?  I guess this will only 
> be answered by me getting my hands dirty and understanding how 
> ftrace works and how the output maps to what's happening.  
> I'll look at the docs for a while.
>
> A specific question for now is how can I identify long latency 
> within qemu here?  As far as I can tell all qemu latencies in 
> trace6.txt are sub 100ms, which, while long, don't explain the 
> guest stalling for many seconds.

Exactly - that in turn means that there's no scheduler latency 
on the host/native kernel side - in turn it must be a KVM 
related latency. (If there was any host side scheduler wakeup or 
other type of latency we'd see it in the trace.)

The most useful trace would be a specific set of trace_printk() 
calls (available on the latest tracing tree), coupled with a 
hyper_trace_printk() which injects a trace entry from the guest 
side into the host kernel trace buffer. (== that would mean a 
hypercall that does a trace_printk().)

	Ingo

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-15  9:56           ` Avi Kivity
  0 siblings, 0 replies; 262+ messages in thread
From: Avi Kivity @ 2009-03-15  9:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Mike Galbraith, Kevin Shanahan,
	Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List

Ingo Molnar wrote:
>> I've looked at the traces but lack the skill to make any sense 
>> out of them.
>>     
>
> Do you have specific questions about them that we could answer?
>   

A general question: what's going on?  I guess this will only be answered 
by me getting my hands dirty and understanding how ftrace works and how 
the output maps to what's happening.  I'll look at the docs for a while.

A specific question for now is how can I identify long latency within 
qemu here?  As far as I can tell all qemu latencies in trace6.txt are 
sub 100ms, which, while long, don't explain the guest stalling for many 
seconds.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-15  9:48         ` Ingo Molnar
  0 siblings, 0 replies; 262+ messages in thread
From: Ingo Molnar @ 2009-03-15  9:48 UTC (permalink / raw)
  To: Avi Kivity, Peter Zijlstra, Mike Galbraith
  Cc: Kevin Shanahan, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List


* Avi Kivity <avi@redhat.com> wrote:

> Kevin Shanahan wrote:
>> On Sat, 2009-03-14 at 20:20 +0100, Rafael J. Wysocki wrote:
>>   
>>> This message has been generated automatically as a part of a report
>>> of regressions introduced between 2.6.27 and 2.6.28.
>>>
>>> The following bug entry is on the current list of known regressions
>>> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
>>> be listed and let me know (either way).
>>>
>>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
>>> Subject		: KVM guests stalling on 2.6.28 (bisected)
>>> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
>>> Date		: 2009-01-17 03:37 (57 days old)
>>> Handled-By	: Avi Kivity <avi@redhat.com>
>>>     
>>
>> No further updates since the last reminder.
>> The bug should still be listed.
>>   
>
> I've looked at the traces but lack the skill to make any sense 
> out of them.

Do you have specific questions about them that we could answer?

	Ingo

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-15  9:18       ` Avi Kivity
  0 siblings, 0 replies; 262+ messages in thread
From: Avi Kivity @ 2009-03-15  9:18 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

Kevin Shanahan wrote:
> On Sat, 2009-03-14 at 20:20 +0100, Rafael J. Wysocki wrote:
>   
>> This message has been generated automatically as a part of a report
>> of regressions introduced between 2.6.27 and 2.6.28.
>>
>> The following bug entry is on the current list of known regressions
>> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
>> be listed and let me know (either way).
>>
>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
>> Subject		: KVM guests stalling on 2.6.28 (bisected)
>> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
>> Date		: 2009-01-17 03:37 (57 days old)
>> Handled-By	: Avi Kivity <avi@redhat.com>
>>     
>
> No further updates since the last reminder.
> The bug should still be listed.
>   

I've looked at the traces but lack the skill to make any sense out of them.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-14 19:20   ` Rafael J. Wysocki
@ 2009-03-15  9:03     ` Kevin Shanahan
  -1 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-03-15  9:03 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Avi Kivity,
	Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Sat, 2009-03-14 at 20:20 +0100, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.27 and 2.6.28.
> 
> The following bug entry is on the current list of known regressions
> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> be listed and let me know (either way).
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> Subject		: KVM guests stalling on 2.6.28 (bisected)
> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> Date		: 2009-01-17 03:37 (57 days old)
> Handled-By	: Avi Kivity <avi@redhat.com>

No further updates since the last reminder.
The bug should still be listed.

Cheers,
Kevin.



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-14 19:11 2.6.29-rc8: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
@ 2009-03-14 19:20   ` Rafael J. Wysocki
  0 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-03-14 19:20 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Avi Kivity, Ingo Molnar, Kevin Shanahan,
	Kevin Shanahan, Mike Galbraith, Peter Zijlstra

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
Subject		: KVM guests stalling on 2.6.28 (bisected)
Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
Date		: 2009-01-17 03:37 (57 days old)
Handled-By	: Avi Kivity <avi@redhat.com>



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-03-08 10:04       ` Avi Kivity
  0 siblings, 0 replies; 262+ messages in thread
From: Avi Kivity @ 2009-03-08 10:04 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

Kevin Shanahan wrote:
> On Tue, 2009-03-03 at 20:41 +0100, Rafael J. Wysocki wrote:
>   
>> The following bug entry is on the current list of known regressions
>> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
>> be listed and let me know (either way).
>>
>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
>> Subject		: KVM guests stalling on 2.6.28 (bisected)
>> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
>> Date		: 2009-01-17 03:37 (46 days old)
>> Handled-By	: Avi Kivity <avi@redhat.com>
>>     
>
> Yes this should still be listed.
>
> The traces are there waiting to be looked at. If there's anything else I
> can do to help things along, please let me know.
>
>   

I was away on vacation; I'll try to look at the traces soon.  Help from
the sched developers would be appreciated, though, as I doubt I have the
skills to decipher them.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-03 19:41   ` Rafael J. Wysocki
@ 2009-03-04  3:08     ` Kevin Shanahan
  -1 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-03-04  3:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Avi Kivity,
	Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Tue, 2009-03-03 at 20:41 +0100, Rafael J. Wysocki wrote:
> The following bug entry is on the current list of known regressions
> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> be listed and let me know (either way).
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> Subject		: KVM guests stalling on 2.6.28 (bisected)
> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> Date		: 2009-01-17 03:37 (46 days old)
> Handled-By	: Avi Kivity <avi@redhat.com>

Yes this should still be listed.

The traces are there waiting to be looked at. If there's anything else I
can do to help things along, please let me know.

Regards,
Kevin.



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-03-03 19:34 2.6.29-rc6-git7: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
@ 2009-03-03 19:41   ` Rafael J. Wysocki
  0 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-03-03 19:41 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Avi Kivity, Ingo Molnar, Kevin Shanahan,
	Kevin Shanahan, Mike Galbraith, Peter Zijlstra

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
Subject		: KVM guests stalling on 2.6.28 (bisected)
Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
Date		: 2009-01-17 03:37 (46 days old)
Handled-By	: Avi Kivity <avi@redhat.com>



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-02-24 22:11         ` Kevin Shanahan
  0 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-02-24 22:11 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

On Tue, 2009-02-24 at 14:09 +0200, Avi Kivity wrote:
> Kevin Shanahan wrote:
> > On Mon, 2009-02-23 at 23:03 +0100, Rafael J. Wysocki wrote:
> >   
> >> This message has been generated automatically as a part of a report
> >> of regressions introduced between 2.6.27 and 2.6.28.
> >>
> >> The following bug entry is on the current list of known regressions
> >> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> >> be listed and let me know (either way).
> >>     
> >
> > Yes, the problem should still be listed.
> > The bug is still present as recently as 2.6.29-rc5-00299-gadfafef.
> >   
> 
> Did tracing turn anything up?

I provided some more traces using Ingo's "tip" branch, but I don't think
anyone has looked at them yet.

  http://bugzilla.kernel.org/show_bug.cgi?id=12465#c11

I can provide more traces if e.g. a different set of functions is
required, but I'm not going to be able to analyse them properly myself.

I should have a bit more time for testing next week and I plan to try
setting up the guest with different virtual network adapter models
to see if that helps.

Cheers,
Kevin.



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-02-24 12:09       ` Avi Kivity
  0 siblings, 0 replies; 262+ messages in thread
From: Avi Kivity @ 2009-02-24 12:09 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Ingo Molnar, Mike Galbraith, Peter Zijlstra

Kevin Shanahan wrote:
> On Mon, 2009-02-23 at 23:03 +0100, Rafael J. Wysocki wrote:
>   
>> This message has been generated automatically as a part of a report
>> of regressions introduced between 2.6.27 and 2.6.28.
>>
>> The following bug entry is on the current list of known regressions
>> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
>> be listed and let me know (either way).
>>     
>
> Yes, the problem should still be listed.
> The bug is still present as recently as 2.6.29-rc5-00299-gadfafef.
>   

Did tracing turn anything up?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-02-24  1:37       ` Rafael J. Wysocki
  0 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-24  1:37 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Linux Kernel Mailing List, Kernel Testers List, Ingo Molnar,
	Mike Galbraith, Peter Zijlstra

On Tuesday 24 February 2009, Kevin Shanahan wrote:
> On Mon, 2009-02-23 at 23:03 +0100, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.27 and 2.6.28.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> > be listed and let me know (either way).
> 
> Yes, the problem should still be listed.
> The bug is still present as recently as 2.6.29-rc5-00299-gadfafef.

Thanks for the update.

Rafael

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-02-23 22:03   ` Rafael J. Wysocki
@ 2009-02-24  0:59     ` Kevin Shanahan
  -1 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-02-24  0:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Ingo Molnar,
	Mike Galbraith, Peter Zijlstra

On Mon, 2009-02-23 at 23:03 +0100, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.27 and 2.6.28.
> 
> The following bug entry is on the current list of known regressions
> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> be listed and let me know (either way).

Yes, the problem should still be listed.
The bug is still present as recently as 2.6.29-rc5-00299-gadfafef.

Regards,
Kevin.

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> Subject		: KVM guests stalling on 2.6.28 (bisected)
> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> Date		: 2009-01-17 03:37 (38 days old)



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-02-23 22:00 2.6.29-rc6: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
@ 2009-02-23 22:03   ` Rafael J. Wysocki
  0 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-23 22:03 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Ingo Molnar, Kevin Shanahan, Kevin Shanahan,
	Mike Galbraith, Peter Zijlstra

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
Subject		: KVM guests stalling on 2.6.28 (bisected)
Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
Date		: 2009-01-17 03:37 (38 days old)



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-02-05 22:37       ` Rafael J. Wysocki
  0 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-05 22:37 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Linux Kernel Mailing List, Kernel Testers List, Ingo Molnar,
	Mike Galbraith, Peter Zijlstra

On Thursday 05 February 2009, Kevin Shanahan wrote:
> On Wed, 2009-02-04 at 11:58 +0100, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.27 and 2.6.28.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> > be listed and let me know (either way).
> > 
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> > Subject		: KVM guests stalling on 2.6.28 (bisected)
> > Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> > Date		: 2009-01-17 03:37 (19 days old)
> 
> Yes, this should still be listed.

Thanks for the update.
 
> Please remove kmshanah@flexo.wumi.org.au from the CC list.

It gets added because it is present in the Author: field in
http://bugzilla.kernel.org/show_bug.cgi?id=12465#c5

This is how the script works, sorry for the inconvenience.

Rafael


> 
> Thanks,
> Kevin.
> 
> 
> 
> 


-- 
Everyone knows that debugging is twice as hard as writing a program
in the first place.  So if you're as clever as you can be when you write it,
how will you ever debug it? --- Brian Kernighan

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-02-04 10:58   ` Rafael J. Wysocki
@ 2009-02-05 19:35     ` Kevin Shanahan
  -1 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-02-05 19:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Ingo Molnar,
	Mike Galbraith, Peter Zijlstra

On Wed, 2009-02-04 at 11:58 +0100, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.27 and 2.6.28.
> 
> The following bug entry is on the current list of known regressions
> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> be listed and let me know (either way).
> 
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> Subject		: KVM guests stalling on 2.6.28 (bisected)
> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> Date		: 2009-01-17 03:37 (19 days old)

Yes, this should still be listed.

Please remove kmshanah@flexo.wumi.org.au from the CC list.

Thanks,
Kevin.



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-02-04 10:55 2.6.29-rc3-git6: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
@ 2009-02-04 10:58   ` Rafael J. Wysocki
  0 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-02-04 10:58 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Ingo Molnar, Kevin Shanahan, Kevin Shanahan,
	Mike Galbraith, Peter Zijlstra

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
Subject		: KVM guests stalling on 2.6.28 (bisected)
Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
Date		: 2009-01-17 03:37 (19 days old)



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-01-26 11:35                         ` Peter Zijlstra
@ 2009-01-26 15:00                             ` Ingo Molnar
  0 siblings, 0 replies; 262+ messages in thread
From: Ingo Molnar @ 2009-01-26 15:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kevin Shanahan, Avi Kivity, Steven Rostedt, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Frédéric Weisbecker, bugme-daemon


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> Is there a way to add a wall-time column to this output so that we can 
> see where the time goes?

yes, on tip/master:

  http://people.redhat.com/mingo/tip.git/README

do something like this:

 echo funcgraph-abstime > /debug/tracing/trace_options

when the function-graph plugin is active. This will activate the absolute 
timestamps column in the trace output.
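
The steps above might be wrapped up roughly as follows. This is only a
sketch: it assumes debugfs is mounted at /debug as in the command shown,
that the kernel was built with the function-graph tracer, and that the
tracer is listed as "function_graph" in available_tracers. The helper name
enable_abstime and the TRACING_DIR variable are made up for illustration
and are not part of ftrace.

```shell
#!/bin/sh
# Hypothetical helper: select the function-graph tracer, then turn on the
# absolute-timestamp column. TRACING_DIR defaults to the debugfs path used
# in the command above; adjust it if debugfs is mounted elsewhere.
TRACING_DIR="${TRACING_DIR:-/debug/tracing}"

enable_abstime() {
    dir="$1"
    # Assumption: the tracer name is "function_graph"; check
    # $dir/available_tracers on the running kernel.
    echo function_graph > "$dir/current_tracer" || return 1
    # Same option string as in the command quoted above.
    echo funcgraph-abstime > "$dir/trace_options" || return 1
}

# Example (needs root):
#   enable_abstime "$TRACING_DIR" && head -n 20 "$TRACING_DIR/trace"
```

Taking the directory as an argument keeps the helper usable on systems
where debugfs is mounted somewhere other than /debug.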

> Another something nice would be to have ctx switches like:
> 
> foo-1 => bar-2 ran: ${time foo spend on the cpu} since: ${time bar spend away from the cpu}
> 
> I'll poke me a little at this function graph tracer thingy to see if I 
> can do that.

indeed, tracking the 'scheduling atom duration' would be very nice.
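
The bookkeeping behind such a context-switch line could be sketched as
below. The input format (timestamp in seconds, task switched out, task
switched in, one switch per line) is invented for this example and is not
the output format of any real tracer; the function name switch_durations
is likewise hypothetical. For each switch it prints how long the outgoing
task ran and how long the incoming task had been off the CPU, with "?"
where no earlier switch has been seen.

```shell
#!/bin/sh
# Hypothetical input, one context switch per line:
#   <timestamp-in-seconds> <task switched out> <task switched in>
# Output follows the "foo-1 => bar-2 ran: ... since: ..." shape quoted above.
switch_durations() {
    awk '
    {
        t = $1; prev = $2; next_task = $3
        # Time the outgoing task spent on the CPU since it was switched in.
        ran = (prev in switched_in) ? sprintf("%.3f", t - switched_in[prev]) : "?"
        # Time the incoming task spent away from the CPU since it was switched out.
        since = (next_task in switched_out) ? sprintf("%.3f", t - switched_out[next_task]) : "?"
        printf "%s => %s ran: %s since: %s\n", prev, next_task, ran, since
        # Remember this switch for the next time either task shows up.
        switched_out[prev] = t
        switched_in[next_task] = t
    }'
}

# Example:
#   printf '%s\n' '1.000 foo-1 bar-2' '1.250 bar-2 foo-1' | switch_durations
```

This tracks a single CPU only; a per-CPU version would need the CPU number
as part of the array keys.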

	Ingo

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-01-26  9:55                         ` Kevin Shanahan
  (?)
@ 2009-01-26 11:35                         ` Peter Zijlstra
  2009-01-26 15:00                             ` Ingo Molnar
  -1 siblings, 1 reply; 262+ messages in thread
From: Peter Zijlstra @ 2009-01-26 11:35 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Ingo Molnar, Avi Kivity, Steven Rostedt, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Frédéric Weisbecker, bugme-daemon

On Mon, 2009-01-26 at 20:25 +1030, Kevin Shanahan wrote:

> Just carrying out the steps was okay, but I don't really know what I'm
> looking at. I've uploaded the trace here (about 10 seconds worth, I
> think):
> 
>   http://disenchant.net/tmp/bug-12465/trace-1/
> 
> The guest being pinged is process 4353:
> 
> kmshanah@flexo:~$ pstree -p 4353
> qemu-system-x86(4353)─┬─{qemu-system-x86}(4354)
>                       ├─{qemu-system-x86}(4355)
>                       └─{qemu-system-x86}(4772)
> 
> I guess the larger overhead/duration values are what we are looking for,
> e.g.:
> 
> kmshanah@flexo:~$ bzgrep -E '[[:digit:]]{6,}' trace.txt.bz2 
>  0)   ksoftir-4    | ! 3010470 us |  }
>  0)  qemu-sy-4354  | ! 250406.2 us |    }
>  0)  qemu-sy-4354  | ! 250407.0 us |  }
>  0)  qemu-sy-4354  | ! 362946.3 us |    }
>  0)  qemu-sy-4354  | ! 362947.0 us |  }
>  0)  qemu-sy-4177  | ! 780480.3 us |  }
>  0)  qemu-sy-4354  | ! 117685.7 us |    }
>  0)  qemu-sy-4354  | ! 117686.5 us |  }
> 
> That ksoftirqd value is a bit strange (> 3 seconds, or is the formatting
> wrong?). I guess I still need some guidance to know what I'm looking at
> with this trace and/or what to do next.

What happens is that it gets preempted a few times while running a
particular function, say do_softirq(), or kvm_arch_vcpu_ioctl_run().

Now, when this function ends, it prints the wall-time delay between
start and end of that function, instead of the task-time delay.

So by having been preempted several times, that gets inflated.
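
A small sketch of how such inflated entries can be pulled out of the text
trace, roughly what the bzgrep above does but comparing the number instead
of counting digits. The function name long_durations and the 100000 us
default threshold are choices made for this example, and the regular
expression only covers the line layout seen in this trace.

```python
import re

# Matches lines such as " 0)  qemu-sy-4354  | ! 250406.2 us |    }"
DURATION = re.compile(r"^\s*\d+\)\s+(\S+)\s+\|\s*[!+]?\s*(\d+(?:\.\d+)?) us\s*\|")

def long_durations(lines, threshold_us=100000.0):
    """Return (task, duration_us) for every entry at or above the threshold."""
    hits = []
    for line in lines:
        m = DURATION.match(line)
        if m is None:
            continue  # function entry, context-switch marker, or unrelated text
        duration = float(m.group(2))
        if duration >= threshold_us:
            hits.append((m.group(1), duration))
    return hits
```

Keep in mind that these are wall-time values, so a large number says the
function was entered long before it returned, not that it used that much
CPU time.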

That said, the output is slightly 'buggy' in that it seems to miss
context switches at times:

 0)  qemu-sy-4339  |               |        schedule() {
 0)  qemu-sy-4131  | ! 6750.369 us |        }
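
Spots like this can be found mechanically. The sketch below walks the text
trace and reports every line where the task shown on a CPU differs from
the one last seen there without a "=>" marker explaining the change. The
name missing_switches is made up for this example, the parsing assumes the
line layout of this trace, and it only reports inconsistencies in the
text; it cannot tell why a switch is missing.

```python
import re

# cpu) task | ...        regular function-graph line
# cpu) task => task      context-switch marker
LINE = re.compile(r"^\s*(\d+)\)\s+(\S+)\s+(\||=>)\s*(\S*)")

def missing_switches(lines):
    """Return (line_index, expected_task, seen_task) for unexplained changes."""
    current = {}  # cpu -> task last seen running there
    gaps = []
    for n, line in enumerate(lines):
        m = LINE.match(line)
        if m is None:
            continue
        cpu, task, sep, rest = m.groups()
        # 'task' is the running task (or the outgoing one on a switch marker).
        if cpu in current and current[cpu] != task:
            gaps.append((n, current[cpu], task))
        # After a switch marker the incoming task runs; otherwise 'task' does.
        current[cpu] = rest if sep == "=>" else task
    return gaps
```

Run on the two lines above, it reports one gap: qemu-sy-4339 was expected
on CPU 0 but qemu-sy-4131 shows up.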

I also find it very hard to attribute all time:

 0)  qemu-sy-4354  |               |  kvm_vcpu_ioctl() {
 0)  qemu-sy-4354  |               |    kvm_arch_vcpu_ioctl_run() {
 0)  qemu-sy-4354  |               |      kvm_arch_vcpu_load() {
 0)  qemu-sy-4354  |               |        kvm_write_guest_time() {
 0)  qemu-sy-4354  |   0.289 us    |        }
 0)  qemu-sy-4354  |   0.956 us    |      }
 0)  qemu-sy-4354  |               |      kvm_inject_pending_timer_irqs() {
 0)  qemu-sy-4354  |               |        kvm_inject_apic_timer_irqs() {
 0)  qemu-sy-4354  |   0.295 us    |        }
 0)  qemu-sy-4354  |               |        kvm_inject_pit_timer_irqs() {
 0)  qemu-sy-4354  |   0.304 us    |        }
 0)  qemu-sy-4354  |   1.488 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_enabled() {
 0)  qemu-sy-4354  |   0.294 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_find_highest_irr() {
 0)  qemu-sy-4354  |   0.307 us    |      }
 0)  qemu-sy-4354  |               |      kvm_cpu_has_interrupt() {
 0)  qemu-sy-4354  |               |        kvm_apic_has_interrupt() {
 0)  qemu-sy-4354  |   0.325 us    |        }
 0)  qemu-sy-4354  |               |        kvm_apic_accept_pic_intr() {
 0)  qemu-sy-4354  |   0.298 us    |        }
 0)  qemu-sy-4354  |   1.521 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_sync_to_vapic() {
 0)  qemu-sy-4354  |   0.295 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          autoremove_wake_function() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |               |                  wakeup_preempt_entity() {
 0)  qemu-sy-4354  |   0.309 us    |                  }
 0)  qemu-sy-4354  |               |                  resched_task() {
 0)  qemu-sy-4354  |   0.324 us    |                  }
 0)  qemu-sy-4354  |   1.614 us    |                }
 0)  qemu-sy-4354  |   2.934 us    |              }
 0)  qemu-sy-4354  |   3.529 us    |            }
 0)  qemu-sy-4354  |   4.118 us    |          }
 0)  qemu-sy-4354  |   4.743 us    |        }
 0)  qemu-sy-4354  |   5.432 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          autoremove_wake_function() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  =>  qemu-sy-4294
 0)  qemu-sy-4237  =>  qemu-sy-4354
 0)  qemu-sy-4354  |   5.500 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.316 us    |                }
 0)  qemu-sy-4354  |   1.250 us    |              }
 0)  qemu-sy-4354  |   1.834 us    |            }
 0)  qemu-sy-4354  |   2.434 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.418 us    |              }
 0)  qemu-sy-4354  |   1.001 us    |            }
 0)  qemu-sy-4354  |   1.608 us    |          }
 0)  qemu-sy-4354  |   4.987 us    |        }
 0)  qemu-sy-4354  |   5.597 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.325 us    |                }
 0)  qemu-sy-4354  |   1.247 us    |              }
 0)  qemu-sy-4354  |   1.831 us    |            }
 0)  qemu-sy-4354  |   2.435 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.415 us    |              }
 0)  qemu-sy-4354  |   0.995 us    |            }
 0)  qemu-sy-4354  |   1.587 us    |          }
 0)  qemu-sy-4354  |   5.026 us    |        }
 0)  qemu-sy-4354  |   5.639 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.313 us    |                }
 0)  qemu-sy-4354  |   1.331 us    |              }
 0)  qemu-sy-4354  |   1.903 us    |            }
 0)  qemu-sy-4354  |   2.507 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.415 us    |              }
 0)  qemu-sy-4354  |   0.998 us    |            }
 0)  qemu-sy-4354  |   1.596 us    |          }
 0)  qemu-sy-4354  |   5.017 us    |        }
 0)  qemu-sy-4354  |   5.630 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.318 us    |                }
 0)  qemu-sy-4354  |   1.275 us    |              }
 0)  qemu-sy-4354  |   1.860 us    |            }
 0)  qemu-sy-4354  |   2.474 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.406 us    |              }
 0)  qemu-sy-4354  |   0.989 us    |            }
 0)  qemu-sy-4354  |   1.581 us    |          }
 0)  qemu-sy-4354  |   4.953 us    |        }
 0)  qemu-sy-4354  |   5.567 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.313 us    |                }
 0)  qemu-sy-4354  |   2.645 us    |              }
 0)  qemu-sy-4354  |   3.219 us    |            }
 0)  qemu-sy-4354  |   3.824 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.396 us    |              }
 0)  qemu-sy-4354  |   0.968 us    |            }
 0)  qemu-sy-4354  |   1.557 us    |          }
 0)  qemu-sy-4354  |   6.390 us    |        }
 0)  qemu-sy-4354  |   7.004 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.310 us    |                }
 0)  qemu-sy-4354  |   1.160 us    |              }
 0)  qemu-sy-4354  |   1.731 us    |            }
 0)  qemu-sy-4354  |   2.330 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.397 us    |              }
 0)  qemu-sy-4354  |   0.965 us    |            }
 0)  qemu-sy-4354  |   1.554 us    |          }
 0)  qemu-sy-4354  |   4.768 us    |        }
 0)  qemu-sy-4354  |   5.383 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.307 us    |                }
 0)  qemu-sy-4354  |   1.208 us    |              }
 0)  qemu-sy-4354  |   1.777 us    |            }
 0)  qemu-sy-4354  |   2.377 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.394 us    |              }
 0)  qemu-sy-4354  |   0.964 us    |            }
 0)  qemu-sy-4354  |   1.554 us    |          }
 0)  qemu-sy-4354  |   4.855 us    |        }
 0)  qemu-sy-4354  |   5.482 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.307 us    |                }
 0)  qemu-sy-4354  |   1.193 us    |              }
 0)  qemu-sy-4354  |   1.765 us    |            }
 0)  qemu-sy-4354  |   2.368 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.394 us    |              }
 0)  qemu-sy-4354  |   0.974 us    |            }
 0)  qemu-sy-4354  |   1.560 us    |          }
 0)  qemu-sy-4354  |   4.831 us    |        }
 0)  qemu-sy-4354  |   5.461 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.318 us    |                }
 0)  qemu-sy-4354  |   1.175 us    |              }
 0)  qemu-sy-4354  |   1.747 us    |            }
 0)  qemu-sy-4354  |   2.344 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   2.029 us    |              }
 0)  qemu-sy-4354  |   2.597 us    |            }
 0)  qemu-sy-4354  |   3.186 us    |          }
 0)  qemu-sy-4354  |   6.430 us    |        }
 0)  qemu-sy-4354  |   7.046 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.310 us    |                }
 0)  qemu-sy-4354  |   1.199 us    |              }
 0)  qemu-sy-4354  |   1.780 us    |            }
 0)  qemu-sy-4354  |   2.378 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.397 us    |              }
 0)  qemu-sy-4354  |   0.968 us    |            }
 0)  qemu-sy-4354  |   1.560 us    |          }
 0)  qemu-sy-4354  |   4.933 us    |        }
 0)  qemu-sy-4354  |   5.549 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          autoremove_wake_function() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.316 us    |                }
 0)  qemu-sy-4354  |   1.202 us    |              }
 0)  qemu-sy-4354  |   1.792 us    |            }
 0)  qemu-sy-4354  |   2.357 us    |          }
 0)  qemu-sy-4354  |   2.973 us    |        }
 0)  qemu-sy-4354  |   3.607 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.304 us    |                }
 0)  qemu-sy-4354  |   1.149 us    |              }
 0)  qemu-sy-4354  |   1.713 us    |            }
 0)  qemu-sy-4354  |   2.309 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.405 us    |              }
 0)  qemu-sy-4354  |   0.971 us    |            }
 0)  qemu-sy-4354  |   1.569 us    |          }
 0)  qemu-sy-4354  |   4.800 us    |        }
 0)  qemu-sy-4354  |   5.408 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.298 us    |                }
 0)  qemu-sy-4354  |   1.127 us    |              }
 0)  qemu-sy-4354  |   1.695 us    |            }
 0)  qemu-sy-4354  |   2.291 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.403 us    |              }
 0)  qemu-sy-4354  |   0.974 us    |            }
 0)  qemu-sy-4354  |   1.575 us    |          }
 0)  qemu-sy-4354  |   4.888 us    |        }
 0)  qemu-sy-4354  |   5.482 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          autoremove_wake_function() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.303 us    |                }
 0)  qemu-sy-4354  |   2.428 us    |              }
 0)  qemu-sy-4354  |   2.991 us    |            }
 0)  qemu-sy-4354  |   3.559 us    |          }
 0)  qemu-sy-4354  |   4.157 us    |        }
 0)  qemu-sy-4354  |   4.752 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.313 us    |                }
 0)  qemu-sy-4354  |   1.437 us    |              }
 0)  qemu-sy-4354  |   2.002 us    |            }
 0)  qemu-sy-4354  |   2.594 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.418 us    |              }
 0)  qemu-sy-4354  |   1.016 us    |            }
 0)  qemu-sy-4354  |   1.587 us    |          }
 0)  qemu-sy-4354  |   5.077 us    |        }
 0)  qemu-sy-4354  |   5.699 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.309 us    |                }
 0)  qemu-sy-4354  |   1.314 us    |              }
 0)  qemu-sy-4354  |   1.884 us    |            }
 0)  qemu-sy-4354  |   2.480 us    |          }
 0)  qemu-sy-4354  |               |          pollwake() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |   0.405 us    |              }
 0)  qemu-sy-4354  |   0.977 us    |            }
 0)  qemu-sy-4354  |   1.560 us    |          }
 0)  qemu-sy-4354  |   4.962 us    |        }
 0)  qemu-sy-4354  |   5.591 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          autoremove_wake_function() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.304 us    |                }
 0)  qemu-sy-4354  |   1.199 us    |              }
 0)  qemu-sy-4354  |   1.765 us    |            }
 0)  qemu-sy-4354  |   2.330 us    |          }
 0)  qemu-sy-4354  |   2.952 us    |        }
 0)  qemu-sy-4354  |   3.547 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          autoremove_wake_function() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.322 us    |                }
 0)  qemu-sy-4354  |   1.278 us    |              }
 0)  qemu-sy-4354  |   1.839 us    |            }
 0)  qemu-sy-4354  |   2.402 us    |          }
 0)  qemu-sy-4354  |   3.032 us    |        }
 0)  qemu-sy-4354  |   3.658 us    |      }
 0)  qemu-sy-4354  |               |      __wake_up() {
 0)  qemu-sy-4354  |               |        __wake_up_common() {
 0)  qemu-sy-4354  |               |          autoremove_wake_function() {
 0)  qemu-sy-4354  |               |            default_wake_function() {
 0)  qemu-sy-4354  |               |              try_to_wake_up() {
 0)  qemu-sy-4354  |               |                check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.303 us    |                }
 0)  qemu-sy-4354  |   1.208 us    |              }
 0)  qemu-sy-4354  |   1.759 us    |            }
 0)  qemu-sy-4354  |   2.341 us    |          }
 0)  qemu-sy-4354  |   2.949 us    |        }
 0)  qemu-sy-4354  |   3.556 us    |      }
 0)  qemu-sy-4354  |               |      scheduler_tick() {
 0)  qemu-sy-4354  |               |        sched_slice() {
 0)  qemu-sy-4354  |   0.342 us    |        }
 0)  qemu-sy-4354  |   3.222 us    |      }
 0)  qemu-sy-4354  |               |      wake_up_process() {
 0)  qemu-sy-4354  |               |        try_to_wake_up() {
 0)  qemu-sy-4354  |               |          check_preempt_wakeup() {
 0)  qemu-sy-4354  |   0.343 us    |          }
 0)  qemu-sy-4354  |   1.331 us    |        }
 0)  qemu-sy-4354  |   1.915 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_sync_from_vapic() {
 0)  qemu-sy-4354  |   0.294 us    |      }
 0)  qemu-sy-4354  |               |      kvm_handle_exit() {
 0)  qemu-sy-4354  |   0.457 us    |      }
 0)  qemu-sy-4354  |               |      kvm_resched() {
 0)  qemu-sy-4354  |               |        _cond_resched() {
 0)  qemu-sy-4354  |               |          __cond_resched() {
 0)  qemu-sy-4354  |               |            schedule() {
 0)  qemu-sy-4354  |               |              wakeup_preempt_entity() {
 0)  qemu-sy-4354  |   0.294 us    |              }
 0)  qemu-sy-4354  |               |              kvm_sched_out() {
 0)  qemu-sy-4354  |               |                kvm_arch_vcpu_put() {
 0)  qemu-sy-4354  |   0.592 us    |                }
 0)  qemu-sy-4354  |   1.218 us    |              }
 0)  qemu-sy-4354  =>   kipmi0-496
 0)  qemu-sy-4213  =>  qemu-sy-4354
 0)  qemu-sy-4354  |               |              kvm_sched_in() {
 0)  qemu-sy-4354  |               |                kvm_arch_vcpu_load() {
 0)  qemu-sy-4354  |               |                  kvm_write_guest_time() {
 0)  qemu-sy-4354  |   0.298 us    |                  }
 0)  qemu-sy-4354  |   1.070 us    |                }
 0)  qemu-sy-4354  |   1.665 us    |              }
 0)  qemu-sy-4354  | ! 9172.159 us |            }
 0)  qemu-sy-4354  | ! 9172.793 us |          }
 0)  qemu-sy-4354  | ! 9173.422 us |        }
 0)  qemu-sy-4354  | ! 9174.032 us |      }
 0)  qemu-sy-4354  |               |      kvm_inject_pending_timer_irqs() {
 0)  qemu-sy-4354  |               |        kvm_inject_apic_timer_irqs() {
 0)  qemu-sy-4354  |               |          kvm_vcpu_kick() {
 0)  qemu-sy-4354  |   0.291 us    |          }
 0)  qemu-sy-4354  |   1.151 us    |        }
 0)  qemu-sy-4354  |               |        kvm_inject_pit_timer_irqs() {
 0)  qemu-sy-4354  |   0.352 us    |        }
 0)  qemu-sy-4354  |   2.429 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_enabled() {
 0)  qemu-sy-4354  |   0.291 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_find_highest_irr() {
 0)  qemu-sy-4354  |   0.312 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_get_cr8() {
 0)  qemu-sy-4354  |   0.298 us    |      }
 0)  qemu-sy-4354  |               |      kvm_cpu_has_interrupt() {
 0)  qemu-sy-4354  |               |        kvm_apic_has_interrupt() {
 0)  qemu-sy-4354  |   0.385 us    |        }
 0)  qemu-sy-4354  |   0.980 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_sync_to_vapic() {
 0)  qemu-sy-4354  |   0.295 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_sync_from_vapic() {
 0)  qemu-sy-4354  |   0.331 us    |      }
 0)  qemu-sy-4354  |               |      kvm_handle_exit() {
 0)  qemu-sy-4354  |   0.568 us    |      }
 0)  qemu-sy-4354  |               |      kvm_inject_pending_timer_irqs() {
 0)  qemu-sy-4354  |               |        kvm_inject_apic_timer_irqs() {
 0)  qemu-sy-4354  |               |          kvm_vcpu_kick() {
 0)  qemu-sy-4354  |   0.295 us    |          }
 0)  qemu-sy-4354  |   0.959 us    |        }
 0)  qemu-sy-4354  |               |        kvm_inject_pit_timer_irqs() {
 0)  qemu-sy-4354  |   0.313 us    |        }
 0)  qemu-sy-4354  |   2.170 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_enabled() {
 0)  qemu-sy-4354  |   0.310 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_find_highest_irr() {
 0)  qemu-sy-4354  |   0.295 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_get_cr8() {
 0)  qemu-sy-4354  |   0.295 us    |      }
 0)  qemu-sy-4354  |               |      kvm_cpu_has_interrupt() {
 0)  qemu-sy-4354  |               |        kvm_apic_has_interrupt() {
 0)  qemu-sy-4354  |   0.325 us    |        }
 0)  qemu-sy-4354  |   0.938 us    |      }
 0)  qemu-sy-4354  |               |      kvm_cpu_get_interrupt() {
 0)  qemu-sy-4354  |               |        kvm_get_apic_interrupt() {
 0)  qemu-sy-4354  |               |          kvm_apic_has_interrupt() {
 0)  qemu-sy-4354  |   0.322 us    |          }
 0)  qemu-sy-4354  |   0.944 us    |        }
 0)  qemu-sy-4354  |   1.542 us    |      }
 0)  qemu-sy-4354  |               |      kvm_timer_intr_post() {
 0)  qemu-sy-4354  |               |        kvm_apic_timer_intr_post() {
 0)  qemu-sy-4354  |   0.309 us    |        }
 0)  qemu-sy-4354  |   2.059 us    |      }
 0)  qemu-sy-4354  |               |      kvm_cpu_has_interrupt() {
 0)  qemu-sy-4354  |               |        kvm_apic_has_interrupt() {
 0)  qemu-sy-4354  |   0.340 us    |        }
 0)  qemu-sy-4354  |               |        kvm_apic_accept_pic_intr() {
 0)  qemu-sy-4354  |   0.313 us    |        }
 0)  qemu-sy-4354  |   1.560 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_sync_to_vapic() {
 0)  qemu-sy-4354  |   0.298 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_sync_from_vapic() {
 0)  qemu-sy-4354  |   0.319 us    |      }
 0)  qemu-sy-4354  |               |      kvm_handle_exit() {
 0)  qemu-sy-4354  |               |        kvm_mmu_page_fault() {
 0)  qemu-sy-4354  |               |          kvm_read_guest() {
 0)  qemu-sy-4354  |               |            kvm_read_guest_page() {
 0)  qemu-sy-4354  |   0.764 us    |            }
 0)  qemu-sy-4354  |   1.377 us    |          }
 0)  qemu-sy-4354  |               |          kvm_read_guest() {
 0)  qemu-sy-4354  |               |            kvm_read_guest_page() {
 0)  qemu-sy-4354  |   0.499 us    |            }
 0)  qemu-sy-4354  |   1.088 us    |          }
 0)  qemu-sy-4354  |               |          kvm_release_pfn_clean() {
 0)  qemu-sy-4354  |   0.349 us    |          }
 0)  qemu-sy-4354  |               |          kvm_read_guest() {
 0)  qemu-sy-4354  |               |            kvm_read_guest_page() {
 0)  qemu-sy-4354  |   0.451 us    |            }
 0)  qemu-sy-4354  |   1.046 us    |          }
 0)  qemu-sy-4354  |               |          kvm_read_guest() {
 0)  qemu-sy-4354  |               |            kvm_read_guest_page() {
 0)  qemu-sy-4354  |   0.361 us    |            }
 0)  qemu-sy-4354  |   0.956 us    |          }
 0)  qemu-sy-4354  |               |          kvm_read_guest() {
 0)  qemu-sy-4354  |               |            kvm_read_guest_page() {
 0)  qemu-sy-4354  |   0.381 us    |            }
 0)  qemu-sy-4354  |   0.974 us    |          }
 0)  qemu-sy-4354  |               |          kvm_read_guest() {
 0)  qemu-sy-4354  |               |            kvm_read_guest_page() {
 0)  qemu-sy-4354  |   0.345 us    |            }
 0)  qemu-sy-4354  |   0.959 us    |          }
 0)  qemu-sy-4354  |               |          kvm_read_guest() {
 0)  qemu-sy-4354  |               |            kvm_read_guest_page() {
 0)  qemu-sy-4354  |   0.364 us    |            }
 0)  qemu-sy-4354  |   0.965 us    |          }
 0)  qemu-sy-4354  |               |          kvm_ioapic_update_eoi() {
 0)  qemu-sy-4354  |   0.358 us    |          }
 0)  qemu-sy-4354  | + 13.782 us   |        }
 0)  qemu-sy-4354  | + 14.681 us   |      }
 0)  qemu-sy-4354  |               |      kvm_inject_pending_timer_irqs() {
 0)  qemu-sy-4354  |               |        kvm_inject_apic_timer_irqs() {
 0)  qemu-sy-4354  |               |          kvm_vcpu_kick() {
 0)  qemu-sy-4354  |   0.291 us    |          }
 0)  qemu-sy-4354  |   0.953 us    |        }
 0)  qemu-sy-4354  |               |        kvm_inject_pit_timer_irqs() {
 0)  qemu-sy-4354  |   0.304 us    |        }
 0)  qemu-sy-4354  |   2.150 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_enabled() {
 0)  qemu-sy-4354  |   0.304 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_find_highest_irr() {
 0)  qemu-sy-4354  |   0.295 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_get_cr8() {
 0)  qemu-sy-4354  |   0.309 us    |      }
 0)  qemu-sy-4354  |               |      kvm_cpu_has_interrupt() {
 0)  qemu-sy-4354  |               |        kvm_apic_has_interrupt() {
 0)  qemu-sy-4354  |   0.315 us    |        }
 0)  qemu-sy-4354  |   0.914 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_sync_to_vapic() {
 0)  qemu-sy-4354  |   0.297 us    |      }
 0)  qemu-sy-4354  |               |      kvm_lapic_sync_from_vapic() {
 0)  qemu-sy-4354  |   0.318 us    |      }
 0)  qemu-sy-4354  |               |      kvm_handle_exit() {
 0)  qemu-sy-4354  |               |        kvm_emulate_pio() {
 0)  qemu-sy-4354  |               |          kvm_io_bus_find_dev() {
 0)  qemu-sy-4354  |   0.406 us    |          }
 0)  qemu-sy-4354  |   1.115 us    |        }
 0)  qemu-sy-4354  |   2.026 us    |      }
 0)  qemu-sy-4354  |               |      kvm_get_cr8() {
 0)  qemu-sy-4354  |               |        kvm_lapic_get_cr8() {
 0)  qemu-sy-4354  |   0.292 us    |        }
 0)  qemu-sy-4354  |   2.257 us    |      }
 0)  qemu-sy-4354  |               |      kvm_arch_vcpu_put() {
 0)  qemu-sy-4354  |   0.574 us    |      }
 0)  qemu-sy-4354  | ! 250406.2 us |    }
 0)  qemu-sy-4354  | ! 250407.0 us |  }


There are 2 preemptions in there, accounting for perhaps 15ms.
Then there are about 20 __wake_up()s in there (wth do those come from?)
accounting for 5ms each, totaling 100ms.

There's a scheduler_tick() in there but no IRQ entry ?!

All in all it's very hard to get to the total of 250ms.
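
As a rough check of that accounting, the estimates quoted in this message can
be added up directly. This is only a sketch: the figures are the approximate
ones given above ("perhaps 15ms", "about 20", "5ms each"), and the variable
names are made up for illustration.

```python
# Rough accounting of the ~250 ms span, using the estimates from this message.
total_ms = 250.4        # duration reported by the tracer (250406.2 us)
preemption_ms = 15.0    # "perhaps 15ms" for the two preemptions combined
wakeups = 20            # "about 20 __wake_up()s"
wakeup_ms = 5.0         # "5ms each"

accounted_ms = preemption_ms + wakeups * wakeup_ms
unaccounted_ms = total_ms - accounted_ms
print(f"accounted: {accounted_ms:.1f} ms, unaccounted: {unaccounted_ms:.1f} ms")
```

Under these estimates more than half of the interval is not explained by the
events visible in the trace.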

I suspect __vcpu_run() and vcpu_enter_guest() get inlined, and we might
just be looking at time spent in the guest... bit hard to tell for me,
as this is the first time I've ever looked at all this kvm code.


Is there a way to add a wall-time column to this output so that we can
see where the time goes?

Another nice thing would be to have ctx switches shown like:

foo-1 => bar-2 ran: ${time foo spent on the cpu} since: ${time bar spent away from the cpu}

I'll poke a little at this function graph tracer thingy to see if I
can do that.
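
The proposed context-switch line could be derived from a per-CPU list of
switch events roughly as follows. This is a hypothetical post-processing
sketch, not tracer code: the (timestamp, prev, next) tuple layout, the
function names and the sample data are all assumptions made for illustration.

```python
def fmt(us):
    """Format a duration in microseconds, or '?' when it is not yet known."""
    return "?" if us is None else f"{us} us"

def annotate_switches(switches):
    """switches: (timestamp_us, prev_task, next_task) tuples for one CPU, in time order."""
    last_in = {}    # task -> timestamp at which it was last switched in
    last_out = {}   # task -> timestamp at which it was last switched out
    lines = []
    for ts, prev, nxt in switches:
        ran = ts - last_in[prev] if prev in last_in else None
        since = ts - last_out[nxt] if nxt in last_out else None
        lines.append(f"{prev} => {nxt} ran: {fmt(ran)} since: {fmt(since)}")
        last_out[prev] = ts
        last_in[nxt] = ts
    return lines

# Made-up sample data: three tasks sharing one CPU.
sample = [
    (0, "foo-1", "bar-2"),
    (300, "bar-2", "baz-3"),
    (500, "baz-3", "foo-1"),
    (1200, "foo-1", "bar-2"),
]
for line in annotate_switches(sample):
    print(line)
```

The first time a task appears, the time it ran or spent away cannot be
known, so the sketch prints "?" for that field.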




^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-26  9:55                         ` Kevin Shanahan
  0 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-01-26  9:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Steven Rostedt, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon

On Wed, 2009-01-21 at 16:18 +0100, Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
> > It means, a scheduling problem.  Can you run the latency tracer (which 
> > only works with realtime priority), so we can tell if it is (a) kvm 
> > failing to wake up the vcpu properly or (b) the scheduler delaying the 
> > vcpu from running.
> 
> Could we please get an ftrace capture of the incident?
> 
> Firstly, it makes sense to simplify the tracing environment as much as 
> possible: for example single-CPU traces are much easier to interpret.
> 
> Can you reproduce it with just one CPU online? I.e. if you offline all the 
> other cores via:
> 
>   echo 0 > /sys/devices/system/cpu/cpu1/online
> 
>   [etc.]
> 
> and keep CPU#0 only, do the latencies still occur?
> 
> If they do still occur, then please do the traces that way.
> 
> [ If they do not occur then switch back on all CPUs - we'll sort out the
>   traces ;-) ]
> 
> Then please build a function tracer kernel, by enabling:
> 
>   CONFIG_FUNCTION_TRACER=y
>   CONFIG_FUNCTION_GRAPH_TRACER=y
>   CONFIG_DYNAMIC_FTRACE=y
> 
> Once you boot into such a kernel, you can switch on function tracing via:
> 
>   cd /debug/tracing/
> 
>   echo 0 > tracing_enabled
>   echo function_graph > current_tracer
>   echo funcgraph-proc > trace_options 
> 
> It does not run yet, first find a suitable set of functions to trace. For 
> example this will be a pretty good starting point for scheduler+KVM 
> problems:
> 
>   echo ''         > set_ftrace_filter  # clear filter functions
>   echo '*sched*' >> set_ftrace_filter 
>   echo '*wake*'  >> set_ftrace_filter
>   echo '*kvm*'   >> set_ftrace_filter
>   echo 1 > tracing_enabled             # let the tracer go
> 
> You can see your current selection of functions to trace via 'cat 
> set_ftrace_filter', and you can see all functions via 'cat 
> available_filter_functions'.
> 
> You can also trace all functions via:
> 
>   echo '*' > set_ftrace_filter
> 
> Tracer output can be captured from the 'trace' file. It should look like 
> this:
> 
>  15)   cc1-28106    |   0.263 us    |    page_evictable();
>  15)   cc1-28106    |               |    lru_cache_add_lru() {
>  15)   cc1-28106    |   0.252 us    |      __lru_cache_add();
>  15)   cc1-28106    |   0.738 us    |    }
>  15)   cc1-28106    | + 74.026 us   |  }
>  15)   cc1-28106    |               |  up_read() {
>  15)   cc1-28106    |   0.257 us    |    _spin_lock_irqsave();
>  15)   cc1-28106    |   0.253 us    |    _spin_unlock_irqrestore();
>  15)   cc1-28106    |   1.329 us    |  }
> 
> To capture a continuous stream of all trace data you can do:
> 
>   cat trace_pipe > /tmp/trace.txt
> 
> (this will also drain the trace ringbuffers.)
> 
> Note that this can be quite expensive if there are a lot of functions that 
> are traced - so it makes sense to trim down the set of traced functions to 
> only the interesting ones. Which are the interesting ones can be 
> determined from looking at the traces. You should see your KVM threads 
> getting active every second as the ping happens.
> 
> If you get lost events you can increase the trace buffer size via the 
> buffer_size_kb control - the default is around 1.4 MB.
> 
> Let me know if any of these steps is causing problems or if interpreting 
> the traces is difficult.

Just carrying out the steps was okay, but I don't really know what I'm
looking at. I've uploaded the trace here (about 10 seconds worth, I
think):

  http://disenchant.net/tmp/bug-12465/trace-1/

The guest being pinged is process 4353:

kmshanah@flexo:~$ pstree -p 4353
qemu-system-x86(4353)─┬─{qemu-system-x86}(4354)
                      ├─{qemu-system-x86}(4355)
                      └─{qemu-system-x86}(4772)

I guess the larger overhead/duration values are what we are looking for,
e.g.:

kmshanah@flexo:~$ bzgrep -E '[[:digit:]]{6,}' trace.txt.bz2 
 0)   ksoftir-4    | ! 3010470 us |  }
 0)  qemu-sy-4354  | ! 250406.2 us |    }
 0)  qemu-sy-4354  | ! 250407.0 us |  }
 0)  qemu-sy-4354  | ! 362946.3 us |    }
 0)  qemu-sy-4354  | ! 362947.0 us |  }
 0)  qemu-sy-4177  | ! 780480.3 us |  }
 0)  qemu-sy-4354  | ! 117685.7 us |    }
 0)  qemu-sy-4354  | ! 117686.5 us |  }

That ksoftirqd value is a bit strange (> 3 seconds, or is the formatting
wrong?). I guess I still need some guidance to know what I'm looking at
with this trace and/or what to do next.
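
An alternative to grepping for runs of six or more digits is to parse the
duration column and filter on a numeric threshold. The sketch below assumes
the function_graph line layout shown in this thread (with the funcgraph-proc
option enabled); the regular expression, the function name and the 100 ms
default threshold are illustrative choices, not part of ftrace.

```python
import re

# Matches a function_graph line that carries a duration, e.g.
#   " 0)  qemu-sy-4354  | ! 250406.2 us |    }"
# Lines that only open a call (no duration) do not match.
LINE_RE = re.compile(
    r"^\s*(?P<cpu>\d+)\)\s+(?P<task>\S+)-(?P<pid>\d+)\s+\|"
    r"\s*[+!]?\s*(?P<dur>\d+(?:\.\d+)?) us\s+\|"
)

def long_durations(lines, threshold_us=100_000.0):
    """Yield (pid, task, duration_us) for entries at or above threshold_us."""
    for line in lines:
        m = LINE_RE.match(line)
        if m and float(m.group("dur")) >= threshold_us:
            yield int(m.group("pid")), m.group("task"), float(m.group("dur"))

# Lines copied from the trace excerpts in this thread.
sample = [
    " 0)   ksoftir-4    | ! 3010470 us |  }",
    " 0)  qemu-sy-4354  | ! 250406.2 us |    }",
    " 0)  qemu-sy-4354  |   0.574 us    |      }",
    " 0)  qemu-sy-4354  |               |      kvm_handle_exit() {",
]
for pid, task, dur in long_durations(sample):
    print(f"{task}-{pid}: {dur / 1000:.1f} ms")
```

Filtering on the parsed value avoids accidental matches on other long digit
runs and makes it easy to restrict the output to one PID, for example the
vcpu thread 4354.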

Cheers,
Kevin.



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-22 20:31                           ` Ingo Molnar
  0 siblings, 0 replies; 262+ messages in thread
From: Ingo Molnar @ 2009-01-22 20:31 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Steven Rostedt, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon


* Kevin Shanahan <kmshanah@ucwb.org.au> wrote:

> On Wed, 2009-01-21 at 16:18 +0100, Ingo Molnar wrote:
> > * Avi Kivity <avi@redhat.com> wrote:
> > > It means, a scheduling problem.  Can you run the latency tracer (which 
> > > only works with realtime priority), so we can tell if it is (a) kvm 
> > > failing to wake up the vcpu properly or (b) the scheduler delaying the 
> > > vcpu from running.
> > 
> > Could we please get an ftrace capture of the incident?
> > 
> > Firstly, it makes sense to simplify the tracing environment as much as 
> > possible: for example single-CPU traces are much easier to interpret.
> > 
> > Can you reproduce it with just one CPU online? I.e. if you offline all the 
> > other cores via:
> > 
> >   echo 0 > /sys/devices/system/cpu/cpu1/online
> > 
> >   [etc.]
> > 
> > and keep CPU#0 only, do the latencies still occur?
> > 
> > If they do still occur, then please do the traces that way.
> > 
> > [ If they do not occur then switch back on all CPUs - we'll sort out the
> >   traces ;-) ]
> > 
> > Then please build a function tracer kernel, by enabling:
> > 
> >   CONFIG_FUNCTION_TRACER=y
> >   CONFIG_FUNCTION_GRAPH_TRACER=y
> >   CONFIG_DYNAMIC_FTRACE=y
> 
> Looks like the function graph tracer is only in 2.6.29, so I've updated
> now to 2.6.29-rc2-00013-gf3b8436.
> 
> Again, a control test to make sure the problem still occurs:
> 
> --- hermes-old.wumi.org.au ping statistics ---
> 64 packets transmitted, 64 received, 0% packet loss, time 63080ms
> rtt min/avg/max/mdev = 0.168/479.893/4015.950/894.721 ms, pipe 5
> 
> Yes, plenty of delays there. Next, checking if I can reproduce with only
> one core online:
> 
> echo 0 > /sys/devices/system/cpu/cpu1/online
> echo 0 > /sys/devices/system/cpu/cpu2/online
> echo 0 > /sys/devices/system/cpu/cpu3/online
> ...
> 
> --- hermes-old.wumi.org.au ping statistics ---
> 900 packets transmitted, 900 received, 0% packet loss, time 900253ms
> rtt min/avg/max/mdev = 0.127/38.937/2082.347/170.348 ms, pipe 3
> 
> --- hermes-old.wumi.org.au ping statistics ---
> 900 packets transmitted, 900 received, 0% packet loss, time 900995ms
> rtt min/avg/max/mdev = 0.127/428.398/17126.227/1634.980 ms, pipe 18
> 
> So it looks like I can do the simplified trace. [...]

That's good news! Another thing that sometimes happens is that narrow
races go away if tracing is turned on - the dreaded Heisenbugs. Hopefully
this won't happen, but if it does, tracing is cheapest when only a few
specific functions are traced.

There are two main types of delays that can occur:

 - the delay is CPU time - i.e. anomalously large amount of CPU time spent 
   somewhere in the kernel. Getting a trace of exactly what that 
   processing is would be nice.

 - the delay is some sort of missed wakeup or other logic error in the 
   flow of execution. These are harder to trace - you might want to take a 
   look at trace_options to extend the trace format with various details, 
   if the need arises.

> [...] I've run out of time for that this morning, but I'll spend some 
> time on it over the weekend. Thanks for the detailed instructions - it 
> doesn't look like it will be too hard.

ok, looking forward to your traces. Also, let us know if you run into
anything unintuitive / complicated on the ftrace usage side.

	Ingo

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-01-21 15:18                       ` Ingo Molnar
  (?)
@ 2009-01-22 19:57                       ` Kevin Shanahan
  2009-01-22 20:31                           ` Ingo Molnar
  -1 siblings, 1 reply; 262+ messages in thread
From: Kevin Shanahan @ 2009-01-22 19:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Steven Rostedt, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon

On Wed, 2009-01-21 at 16:18 +0100, Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
> > It means, a scheduling problem.  Can you run the latency tracer (which 
> > only works with realtime priority), so we can tell if it is (a) kvm 
> > failing to wake up the vcpu properly or (b) the scheduler delaying the 
> > vcpu from running.
> 
> Could we please get an ftrace capture of the incident?
> 
> Firstly, it makes sense to simplify the tracing environment as much as 
> possible: for example single-CPU traces are much easier to interpret.
> 
> Can you reproduce it with just one CPU online? I.e. if you offline all the 
> other cores via:
> 
>   echo 0 > /sys/devices/system/cpu/cpu1/online
> 
>   [etc.]
> 
> and keep CPU#0 only, do the latencies still occur?
> 
> If they do still occur, then please do the traces that way.
> 
> [ If they do not occur then switch back on all CPUs - we'll sort out the
>   traces ;-) ]
> 
> Then please build a function tracer kernel, by enabling:
> 
>   CONFIG_FUNCTION_TRACER=y
>   CONFIG_FUNCTION_GRAPH_TRACER=y
>   CONFIG_DYNAMIC_FTRACE=y

Looks like the function graph tracer is only in 2.6.29, so I've updated
now to 2.6.29-rc2-00013-gf3b8436.

Again, a control test to make sure the problem still occurs:

--- hermes-old.wumi.org.au ping statistics ---
64 packets transmitted, 64 received, 0% packet loss, time 63080ms
rtt min/avg/max/mdev = 0.168/479.893/4015.950/894.721 ms, pipe 5

Yes, plenty of delays there. Next, checking if I can reproduce with only
one core online:

echo 0 > /sys/devices/system/cpu/cpu1/online
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu3/online
...

--- hermes-old.wumi.org.au ping statistics ---
900 packets transmitted, 900 received, 0% packet loss, time 900253ms
rtt min/avg/max/mdev = 0.127/38.937/2082.347/170.348 ms, pipe 3

--- hermes-old.wumi.org.au ping statistics ---
900 packets transmitted, 900 received, 0% packet loss, time 900995ms
rtt min/avg/max/mdev = 0.127/428.398/17126.227/1634.980 ms, pipe 18

So it looks like I can do the simplified trace. I've run out of time for
that this morning, but I'll spend some time on it over the weekend.
Thanks for the detailed instructions - it doesn't look like it will be
too hard.
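
To compare runs like these side by side, the summary line printed by ping can
be parsed into numbers. This is a small sketch that assumes the
"rtt min/avg/max/mdev = ..." format shown above; the helper name and the run
labels are made up for illustration.

```python
import re

RTT_RE = re.compile(
    r"rtt min/avg/max/mdev = "
    r"(?P<min>[\d.]+)/(?P<avg>[\d.]+)/(?P<max>[\d.]+)/(?P<mdev>[\d.]+) ms"
)

def parse_rtt(line):
    """Return the min/avg/max/mdev values in ms as floats, or None if absent."""
    m = RTT_RE.search(line)
    if m is None:
        return None
    return {key: float(value) for key, value in m.groupdict().items()}

# Summary lines copied from the three runs in this message.
runs = {
    "all cores": "rtt min/avg/max/mdev = 0.168/479.893/4015.950/894.721 ms, pipe 5",
    "one core, run 1": "rtt min/avg/max/mdev = 0.127/38.937/2082.347/170.348 ms, pipe 3",
    "one core, run 2": "rtt min/avg/max/mdev = 0.127/428.398/17126.227/1634.980 ms, pipe 18",
}
for label, line in runs.items():
    rtt = parse_rtt(line)
    print(f"{label}: avg {rtt['avg']:.1f} ms, max {rtt['max'] / 1000:.2f} s")
```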

Cheers,
Kevin.



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-22  1:48                           ` Steven Rostedt
  0 siblings, 0 replies; 262+ messages in thread
From: Steven Rostedt @ 2009-01-22  1:48 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Kevin Shanahan, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon




On Wed, 21 Jan 2009, Avi Kivity wrote:

> Kevin Shanahan wrote:
> > > > --- hermes-old.wumi.org.au ping statistics ---
> > > > 900 packets transmitted, 900 received, 0% packet loss, time 899326ms
> > > > rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms
> > > > 
> > > > So, a _huge_ difference. But what does it mean?
> > > >       
> > > It means, a scheduling problem.  Can you run the latency tracer (which
> > > only works with realtime priority), so we can tell if it is (a) kvm
> > > failing to wake up the vcpu properly or (b) the scheduler delaying the
> > > vcpu from running.
> > >     
> > 
> > Sorry, but are you sure that's going to be useful?
> > 
> > If it only works on realtime threads and I'm not seeing the problem when
> > running kvm with realtime priority, is this going to tell you what you
> > want to know?
> > 
> > Not trying to be difficult, but that just didn't make sense to me.
> >   
> 
> You're right, wasn't thinking properly.
> 
> This is a tough one.  I'll see if I can think of something.  Ingo, any ideas?

I fixed up the wakeup latency tracer to work with all tasks (along with
other fixes). You can check out the following:

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git

  branch: tip/devel

compile with CONFIG_FUNCTION_TRACER and CONFIG_SCHED_TRACER and just

echo 0 > /debug/tracing/tracing_enabled
echo wakeup > /debug/tracing/current_tracer

echo 1 > /debug/tracing/tracing_enabled
run your test
echo 0 > /debug/tracing/tracing_enabled

and then look at /debug/tracing/latency_trace
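
The same sequence can be wrapped in a small script so that tracing is always
switched off again, even if the test fails. This is a sketch only: the
control file names and the /debug/tracing mount point are the ones used in
this message (they may differ on other kernels), while the function names and
the run_test callback are hypothetical. It needs root and a kernel built as
described above.

```python
from pathlib import Path

def write_ctl(tracing_dir, name, value):
    """Write one value to a tracing control file, like `echo value > name`."""
    (Path(tracing_dir) / name).write_text(f"{value}\n")

def trace_wakeup_latency(run_test, tracing_dir="/debug/tracing"):
    """Run run_test() under the wakeup tracer and return latency_trace as text."""
    write_ctl(tracing_dir, "tracing_enabled", 0)
    write_ctl(tracing_dir, "current_tracer", "wakeup")
    write_ctl(tracing_dir, "tracing_enabled", 1)
    try:
        run_test()
    finally:
        # Stop tracing even if the test raised, so the result is not overwritten.
        write_ctl(tracing_dir, "tracing_enabled", 0)
    return (Path(tracing_dir) / "latency_trace").read_text()
```

A caller would pass a function that runs the ping test and then save or
print the returned text.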

-- Steve


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-21 15:18                       ` Ingo Molnar
  0 siblings, 0 replies; 262+ messages in thread
From: Ingo Molnar @ 2009-01-21 15:18 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Kevin Shanahan, Steven Rostedt, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon


* Avi Kivity <avi@redhat.com> wrote:

> Kevin Shanahan wrote:
>> On Tue, 2009-01-20 at 19:47 +0200, Avi Kivity wrote:
>>   
>>> Steven Rostedt wrote:
>>>     
>>>> Note, the wakeup latency only tests realtime threads, since other threads
>>>> can have other issues for wakeup. I could change the wakeup tracer as
>>>> wakeup_rt, and make a new "wakeup" that tests all threads, but it may
>>>> be difficult to get something accurate.
>>>>       
>>> Kevin, can you retest with kvm at realtime priority?
>>>     
>>
>> Running vanilla Linux 2.6.28, kvm-82. First a control test to check that
>> the problem is still there when running at normal priority:
>>
>> --- hermes-old.wumi.org.au ping statistics ---
>> 900 packets transmitted, 900 received, 0% packet loss, time 899283ms
>> rtt min/avg/max/mdev = 0.119/269.773/13739.426/1230.836 ms, pipe 14
>>
>> Yeah, sure is.
>>
>> Okay, so now I set the realtime attributes of the processes for the VM
>> instance being pinged:
>>
>> flexo:~# ps ax | grep 6284
>>  6284 ?        Sl     6:11 /usr/local/kvm/bin/qemu-system-x86_64 -smp 2
>> -m 2048 -hda kvm-17-1.img -hdb kvm-17-tmp.img -net
>> nic,vlan=0,macaddr=52:54:00:12:34:67,model=rtl8139 -net
>> tap,vlan=0,ifname=tap17,script=no -vnc 127.0.0.1:17 -usbdevice tablet
>> -daemonize
>> flexo:~# pstree -p 6284
>> qemu-system-x86(6284)─┬─{qemu-system-x86}(6285)
>>                       ├─{qemu-system-x86}(6286)
>>                       └─{qemu-system-x86}(6540)
>>
>> (info cpus on the QEMU console shows 6285 and 6286 being the VCPU
>> processes. Not sure what the third child is for, maybe vnc?.)
>>
>> flexo:~# chrt -r -p 3 6284
>> flexo:~# chrt -r -p 3 6285
>> flexo:~# chrt -r -p 3 6286
>> flexo:~# chrt -p 6284
>> pid 6284's current scheduling policy: SCHED_RR
>> pid 6284's current scheduling priority: 3
>> flexo:~# chrt -p 6285
>> pid 6285's current scheduling policy: SCHED_RR
>> pid 6285's current scheduling priority: 3
>> flexo:~# chrt -p 6286
>> pid 6286's current scheduling policy: SCHED_RR
>> pid 6286's current scheduling priority: 3
>>
>> And the result of the ping test now:
>>
>> --- hermes-old.wumi.org.au ping statistics ---
>> 900 packets transmitted, 900 received, 0% packet loss, time 899326ms
>> rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms
>>
>> So, a _huge_ difference. But what does it mean?
>
> It means, a scheduling problem.  Can you run the latency tracer (which 
> only works with realtime priority), so we can tell if it is (a) kvm 
> failing to wake up the vcpu properly or (b) the scheduler delaying the 
> vcpu from running.

Could we please get an ftrace capture of the incident?

Firstly, it makes sense to simplify the tracing environment as much as 
possible: for example single-CPU traces are much easier to interpret.

Can you reproduce it with just one CPU online? I.e. if you offline all the 
other cores via:

  echo 0 > /sys/devices/system/cpu/cpu1/online

  [etc.]

and keep CPU#0 only, do the latencies still occur?

If they do still occur, then please do the traces that way.

[ If they do not occur then switch back on all CPUs - we'll sort out the
  traces ;-) ]

Then please build a function tracer kernel, by enabling:

  CONFIG_FUNCTION_TRACER=y
  CONFIG_FUNCTION_GRAPH_TRACER=y
  CONFIG_DYNAMIC_FTRACE=y

Once you boot into such a kernel, you can switch on function tracing via:

  cd /debug/tracing/

  echo 0 > tracing_enabled
  echo function_graph > current_tracer
  echo funcgraph-proc > trace_options 

It does not run yet; first find a suitable set of functions to trace. For
example this will be a pretty good starting point for scheduler+KVM
problems:

  echo ''         > set_ftrace_filter  # clear filter functions
  echo '*sched*' >> set_ftrace_filter 
  echo '*wake*'  >> set_ftrace_filter
  echo '*kvm*'   >> set_ftrace_filter
  echo 1 > tracing_enabled             # let the tracer go

You can see your current selection of functions to trace via 'cat 
set_ftrace_filter', and you can see all functions via 'cat 
available_filter_functions'.
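
The setup steps above can be collected into one function, so a test run is easy to repeat. This is a sketch only: the function name and the directory argument are illustrative, and the default of /debug/tracing assumes debugfs is mounted on /debug as in the commands above (many systems mount it on /sys/kernel/debug instead).

```shell
#!/bin/sh
# Sketch: configure the function_graph tracer with the scheduler/KVM
# filter set described above, then start tracing.
# $1 (hypothetical, optional): the tracing directory.

setup_graph_tracer() {
    tdir=${1:-/debug/tracing}

    echo 0 > "$tdir/tracing_enabled"              || return 1
    echo function_graph > "$tdir/current_tracer"  || return 1
    echo funcgraph-proc > "$tdir/trace_options"   || return 1

    echo '' > "$tdir/set_ftrace_filter"           || return 1   # clear
    for pat in '*sched*' '*wake*' '*kvm*'; do
        echo "$pat" >> "$tdir/set_ftrace_filter"  || return 1
    done

    echo 1 > "$tdir/tracing_enabled"              # let the tracer go
}

# Usage (as root):
#   setup_graph_tracer
#   cat /debug/tracing/set_ftrace_filter
```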

You can also trace all functions via:

  echo '*' > set_ftrace_filter

Tracer output can be captured from the 'trace' file. It should look like 
this:

 15)   cc1-28106    |   0.263 us    |    page_evictable();
 15)   cc1-28106    |               |    lru_cache_add_lru() {
 15)   cc1-28106    |   0.252 us    |      __lru_cache_add();
 15)   cc1-28106    |   0.738 us    |    }
 15)   cc1-28106    | + 74.026 us   |  }
 15)   cc1-28106    |               |  up_read() {
 15)   cc1-28106    |   0.257 us    |    _spin_lock_irqsave();
 15)   cc1-28106    |   0.253 us    |    _spin_unlock_irqrestore();
 15)   cc1-28106    |   1.329 us    |  }
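
Since the stalls here are in the range of seconds, it can help to filter the output for long durations first. A rough sketch, assuming the three-column '|' layout shown above (the function name and threshold argument are made up for illustration):

```shell
#!/bin/sh
# Sketch: print only those function_graph lines whose duration column
# is at least the given number of microseconds.
# $1: threshold in microseconds; the trace text is read from stdin.

slow_calls() {
    awk -F'|' -v min="$1" '
        NF >= 3 {
            d = $2
            gsub(/[^0-9.]/, "", d)   # drop "+", "!", "us" and padding
            if (d != "" && d + 0 >= min + 0)
                print
        }'
}

# Usage:
#   slow_calls 1000 < /tmp/trace.txt
```

Lines that open a function (empty duration column) are skipped, so only completed calls with a measured time are reported.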

To capture a continuous stream of all trace data you can do:

  cat trace_pipe > /tmp/trace.txt

(this will also drain the trace ringbuffers.)

Note that this can be quite expensive if a lot of functions are traced, so
it makes sense to trim the set of traced functions down to only the
interesting ones. You can determine which ones are interesting by looking
at the traces: you should see your KVM threads becoming active every
second as the ping happens.

If you get lost events you can increase the trace buffer size via the 
buffer_size_kb control - the default is around 1.4 MB.

Let me know if any of these steps is causing problems or if interpreting 
the traces is difficult.

	Ingo


* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-21 15:13                           ` Steven Rostedt
  0 siblings, 0 replies; 262+ messages in thread
From: Steven Rostedt @ 2009-01-21 15:13 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Kevin Shanahan, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon


On Wed, 21 Jan 2009, Avi Kivity wrote:

> Kevin Shanahan wrote:
> > > > --- hermes-old.wumi.org.au ping statistics ---
> > > > 900 packets transmitted, 900 received, 0% packet loss, time 899326ms
> > > > rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms
> > > > 
> > > > So, a _huge_ difference. But what does it mean?
> > > >       
> > > It means, a scheduling problem.  Can you run the latency tracer (which
> > > only works with realtime priority), so we can tell if it is (a) kvm
> > > failing to wake up the vcpu properly or (b) the scheduler delaying the
> > > vcpu from running.
> > >     
> > 
> > Sorry, but are you sure that's going to be useful?
> > 
> > If it only works on realtime threads and I'm not seeing the problem when
> > running kvm with realtime priority, is this going to tell you what you
> > want to know?
> > 
> > Not trying to be difficult, but that just didn't make sense to me.
> >   
> 
> You're right, wasn't thinking properly.
> 
> This is a tough one.  I'll see if I can think of something.  Ingo, any ideas?
> 

I should have replied to this email :-)

Yeah, I'm working on making the wakeup latency tracer work with non-RT
tasks.

The "wakeup" tracer will now trace all tasks, whereas a new "wakeup_rt"
tracer will only trace RT tasks. I originally did it for RT tasks only
because the tracer records only the highest-latency wakeups, and the non-RT
latencies were always bigger than the RT ones, which made the result
useless for what I was tracing (the RT scheduling).

But without an option for all tasks, the wakeup tracer is useless for
everyone else ;-)
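
For reference, selecting a wakeup tracer would look roughly like the sketch below. It assumes the tracing directory is /debug/tracing and that the kernel exposes the tracing_enabled and tracing_max_latency files; the function name and arguments are made up for illustration, and "wakeup_rt" only exists once the change described above is in your kernel.

```shell
#!/bin/sh
# Sketch: switch to a wakeup latency tracer and reset the recorded
# maximum, so that only new worst-case wakeups are captured.
# $1 (optional): tracing directory; $2 (optional): tracer name.

start_wakeup_tracer() {
    tdir=${1:-/debug/tracing}
    tracer=${2:-wakeup}

    echo 0 > "$tdir/tracing_enabled"       || return 1
    echo "$tracer" > "$tdir/current_tracer" || return 1
    echo 0 > "$tdir/tracing_max_latency"   || return 1   # reset the max
    echo 1 > "$tdir/tracing_enabled"
}

# Usage (as root), then reproduce the stall and look at the result:
#   start_wakeup_tracer
#   cat /debug/tracing/tracing_max_latency
#   cat /debug/tracing/trace
```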

-- Steve



* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-21 15:10                       ` Steven Rostedt
  0 siblings, 0 replies; 262+ messages in thread
From: Steven Rostedt @ 2009-01-21 15:10 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Kevin Shanahan, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon


On Wed, 21 Jan 2009, Avi Kivity wrote:

> Kevin Shanahan wrote:
> > On Tue, 2009-01-20 at 19:47 +0200, Avi Kivity wrote:
> >   
> > > Steven Rostedt wrote:
> > >     
> > > > Note, the wakeup latency only tests realtime threads, since other
> > > > threads
> > > > can have other issues for wakeup. I could change the wakeup tracer as
> > > > wakeup_rt, and make a new "wakeup" that tests all threads, but it may
> > > > be difficult to get something accurate.
> > > >       
> > > Kevin, can you retest with kvm at realtime priority?
> > >     
> > 
> > Running vanilla Linux 2.6.28, kvm-82. First a control test to check that
> > the problem is still there when running at normal priority:
> > 
> > --- hermes-old.wumi.org.au ping statistics ---
> > 900 packets transmitted, 900 received, 0% packet loss, time 899283ms
> > rtt min/avg/max/mdev = 0.119/269.773/13739.426/1230.836 ms, pipe 14
> > 
> > Yeah, sure is.
> > 
> > Okay, so now I set the realtime attributes of the processes for the VM
> > instance being pinged:
> > 
> > flexo:~# ps ax | grep 6284
> >  6284 ?        Sl     6:11 /usr/local/kvm/bin/qemu-system-x86_64 -smp 2
> > -m 2048 -hda kvm-17-1.img -hdb kvm-17-tmp.img -net
> > nic,vlan=0,macaddr=52:54:00:12:34:67,model=rtl8139 -net
> > tap,vlan=0,ifname=tap17,script=no -vnc 127.0.0.1:17 -usbdevice tablet
> > -daemonize
> > flexo:~# pstree -p 6284
> > qemu-system-x86(6284)─┬─{qemu-system-x86}(6285)
> >                       ├─{qemu-system-x86}(6286)
> >                       └─{qemu-system-x86}(6540)
> > 
> > (info cpus on the QEMU console shows 6285 and 6286 being the VCPU
> > processes. Not sure what the third child is for, maybe vnc?.)
> > 
> > flexo:~# chrt -r -p 3 6284
> > flexo:~# chrt -r -p 3 6285
> > flexo:~# chrt -r -p 3 6286
> > flexo:~# chrt -p 6284
> > pid 6284's current scheduling policy: SCHED_RR
> > pid 6284's current scheduling priority: 3
> > flexo:~# chrt -p 6285
> > pid 6285's current scheduling policy: SCHED_RR
> > pid 6285's current scheduling priority: 3
> > flexo:~# chrt -p 6286
> > pid 6286's current scheduling policy: SCHED_RR
> > pid 6286's current scheduling priority: 3
> > 
> > And the result of the ping test now:
> > 
> > --- hermes-old.wumi.org.au ping statistics ---
> > 900 packets transmitted, 900 received, 0% packet loss, time 899326ms
> > rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms
> > 
> > So, a _huge_ difference. But what does it mean?
> 
> It means, a scheduling problem.  Can you run the latency tracer (which only
> works with realtime priority), so we can tell if it is (a) kvm failing to wake
> up the vcpu properly or (b) the scheduler delaying the vcpu from running.
> 

Note, I'm working on a tracer that will also measure non-RT task wakeup
times.

-- Steve



* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-21 14:59                         ` Avi Kivity
  0 siblings, 0 replies; 262+ messages in thread
From: Avi Kivity @ 2009-01-21 14:59 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Steven Rostedt, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon

Kevin Shanahan wrote:
>>> --- hermes-old.wumi.org.au ping statistics ---
>>> 900 packets transmitted, 900 received, 0% packet loss, time 899326ms
>>> rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms
>>>
>>> So, a _huge_ difference. But what does it mean?
>>>       
>> It means, a scheduling problem.  Can you run the latency tracer (which 
>> only works with realtime priority), so we can tell if it is (a) kvm 
>> failing to wake up the vcpu properly or (b) the scheduler delaying the 
>> vcpu from running.
>>     
>
> Sorry, but are you sure that's going to be useful?
>
> If it only works on realtime threads and I'm not seeing the problem when
> running kvm with realtime priority, is this going to tell you what you
> want to know?
>
> Not trying to be difficult, but that just didn't make sense to me.
>   

You're right, wasn't thinking properly.

This is a tough one.  I'll see if I can think of something.  Ingo, any 
ideas?

-- 
error compiling committee.c: too many arguments to function



* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-21 14:51                       ` Kevin Shanahan
  0 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-01-21 14:51 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Steven Rostedt, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon

On Wed, 2009-01-21 at 16:34 +0200, Avi Kivity wrote:
> Kevin Shanahan wrote:
> > On Tue, 2009-01-20 at 19:47 +0200, Avi Kivity wrote:
> >> Kevin, can you retest with kvm at realtime priority?
...

> > --- hermes-old.wumi.org.au ping statistics ---
> > 900 packets transmitted, 900 received, 0% packet loss, time 899326ms
> > rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms
> >
> > So, a _huge_ difference. But what does it mean?
> 
> It means, a scheduling problem.  Can you run the latency tracer (which 
> only works with realtime priority), so we can tell if it is (a) kvm 
> failing to wake up the vcpu properly or (b) the scheduler delaying the 
> vcpu from running.

Sorry, but are you sure that's going to be useful?

If it only works on realtime threads and I'm not seeing the problem when
running kvm with realtime priority, is this going to tell you what you
want to know?

Not trying to be difficult, but that just didn't make sense to me.

Regards,
Kevin.




* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-21 14:34                     ` Avi Kivity
  0 siblings, 0 replies; 262+ messages in thread
From: Avi Kivity @ 2009-01-21 14:34 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Steven Rostedt, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon

Kevin Shanahan wrote:
> On Tue, 2009-01-20 at 19:47 +0200, Avi Kivity wrote:
>   
>> Steven Rostedt wrote:
>>     
>>> Note, the wakeup latency only tests realtime threads, since other threads
>>> can have other issues for wakeup. I could change the wakeup tracer as
>>> wakeup_rt, and make a new "wakeup" that tests all threads, but it may
>>> be difficult to get something accurate.
>>>       
>> Kevin, can you retest with kvm at realtime priority?
>>     
>
> Running vanilla Linux 2.6.28, kvm-82. First a control test to check that
> the problem is still there when running at normal priority:
>
> --- hermes-old.wumi.org.au ping statistics ---
> 900 packets transmitted, 900 received, 0% packet loss, time 899283ms
> rtt min/avg/max/mdev = 0.119/269.773/13739.426/1230.836 ms, pipe 14
>
> Yeah, sure is.
>
> Okay, so now I set the realtime attributes of the processes for the VM
> instance being pinged:
>
> flexo:~# ps ax | grep 6284
>  6284 ?        Sl     6:11 /usr/local/kvm/bin/qemu-system-x86_64 -smp 2
> -m 2048 -hda kvm-17-1.img -hdb kvm-17-tmp.img -net
> nic,vlan=0,macaddr=52:54:00:12:34:67,model=rtl8139 -net
> tap,vlan=0,ifname=tap17,script=no -vnc 127.0.0.1:17 -usbdevice tablet
> -daemonize
> flexo:~# pstree -p 6284
> qemu-system-x86(6284)─┬─{qemu-system-x86}(6285)
>                       ├─{qemu-system-x86}(6286)
>                       └─{qemu-system-x86}(6540)
>
> (info cpus on the QEMU console shows 6285 and 6286 being the VCPU
> processes. Not sure what the third child is for, maybe vnc?.)
>
> flexo:~# chrt -r -p 3 6284
> flexo:~# chrt -r -p 3 6285
> flexo:~# chrt -r -p 3 6286
> flexo:~# chrt -p 6284
> pid 6284's current scheduling policy: SCHED_RR
> pid 6284's current scheduling priority: 3
> flexo:~# chrt -p 6285
> pid 6285's current scheduling policy: SCHED_RR
> pid 6285's current scheduling priority: 3
> flexo:~# chrt -p 6286
> pid 6286's current scheduling policy: SCHED_RR
> pid 6286's current scheduling priority: 3
>
> And the result of the ping test now:
>
> --- hermes-old.wumi.org.au ping statistics ---
> 900 packets transmitted, 900 received, 0% packet loss, time 899326ms
> rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms
>
> So, a _huge_ difference. But what does it mean?

It means a scheduling problem.  Can you run the latency tracer (which 
only works with realtime priority), so we can tell if it is (a) kvm 
failing to wake up the vcpu properly or (b) the scheduler delaying the 
vcpu from running?

> P.S. Can someone tell me if I'm doing the CC: to bugme-daemon wrong? I
>      thought that was supposed to add the emails as comments to the
>      bugzilla report?
>   

So long as it isn't complaining, you can continue.

-- 
error compiling committee.c: too many arguments to function



* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-21 14:25                   ` Kevin Shanahan
  0 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-01-21 14:25 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Steven Rostedt, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra, Frédéric Weisbecker, bugme-daemon

On Tue, 2009-01-20 at 19:47 +0200, Avi Kivity wrote:
> Steven Rostedt wrote:
> > Note, the wakeup latency only tests realtime threads, since other threads
> > can have other issues for wakeup. I could change the wakeup tracer as
> > wakeup_rt, and make a new "wakeup" that tests all threads, but it may
> > be difficult to get something accurate.
> 
> Kevin, can you retest with kvm at realtime priority?

Running vanilla Linux 2.6.28, kvm-82. First a control test to check that
the problem is still there when running at normal priority:

--- hermes-old.wumi.org.au ping statistics ---
900 packets transmitted, 900 received, 0% packet loss, time 899283ms
rtt min/avg/max/mdev = 0.119/269.773/13739.426/1230.836 ms, pipe 14

Yeah, sure is.

Okay, so now I set the realtime attributes of the processes for the VM
instance being pinged:

flexo:~# ps ax | grep 6284
 6284 ?        Sl     6:11 /usr/local/kvm/bin/qemu-system-x86_64 -smp 2
-m 2048 -hda kvm-17-1.img -hdb kvm-17-tmp.img -net
nic,vlan=0,macaddr=52:54:00:12:34:67,model=rtl8139 -net
tap,vlan=0,ifname=tap17,script=no -vnc 127.0.0.1:17 -usbdevice tablet
-daemonize
flexo:~# pstree -p 6284
qemu-system-x86(6284)─┬─{qemu-system-x86}(6285)
                      ├─{qemu-system-x86}(6286)
                      └─{qemu-system-x86}(6540)

("info cpus" on the QEMU console shows 6285 and 6286 being the VCPU
processes. Not sure what the third child is for, maybe vnc?)

flexo:~# chrt -r -p 3 6284
flexo:~# chrt -r -p 3 6285
flexo:~# chrt -r -p 3 6286
flexo:~# chrt -p 6284
pid 6284's current scheduling policy: SCHED_RR
pid 6284's current scheduling priority: 3
flexo:~# chrt -p 6285
pid 6285's current scheduling policy: SCHED_RR
pid 6285's current scheduling priority: 3
flexo:~# chrt -p 6286
pid 6286's current scheduling policy: SCHED_RR
pid 6286's current scheduling priority: 3
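
For reference, the chrt steps above could also be applied from a script.
This is only a sketch under stated assumptions: Linux with /proc mounted,
Python 3.3 or later, and enough privilege (root or CAP_SYS_NICE) to change
the policy. The helper names are made up for illustration and are not part
of kvm or util-linux.

```python
import os

def thread_ids(pid):
    """Return the thread IDs of a process, as listed in /proc/<pid>/task."""
    return sorted(int(name) for name in os.listdir(f"/proc/{pid}/task"))

def set_round_robin(tids, priority=3):
    """Put each thread under SCHED_RR at the given priority (like chrt -r -p)."""
    param = os.sched_param(priority)
    for tid in tids:
        # On Linux the underlying syscall acts on a single thread, so a TID
        # taken from /proc/<pid>/task can be passed where a PID is expected.
        os.sched_setscheduler(tid, os.SCHED_RR, param)

def describe(tid):
    """Return (policy, priority) for one thread (like chrt -p)."""
    return os.sched_getscheduler(tid), os.sched_getparam(tid).sched_priority
```

Calling set_round_robin([6284, 6285, 6286]) would mirror the commands
above. Note that thread_ids(6284) would also list 6540, which was left at
normal priority in this test.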

And the result of the ping test now:

--- hermes-old.wumi.org.au ping statistics ---
900 packets transmitted, 900 received, 0% packet loss, time 899326ms
rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms

So, a _huge_ difference. But what does it mean?
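
To put a number on the difference, the two ping summary lines above can be
compared directly. The sketch below is illustrative only; parse_rtt is a
hypothetical helper, and the input strings are copied from the results in
this message. Both the average and the worst-case round trip come out more
than a thousand times lower at realtime priority.

```python
import re

RTT_LINE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)

def parse_rtt(line):
    """Extract min/avg/max/mdev (in ms) from a ping summary line."""
    match = RTT_LINE.search(line)
    if match is None:
        raise ValueError(f"not a ping rtt summary: {line!r}")
    keys = ("min", "avg", "max", "mdev")
    return dict(zip(keys, map(float, match.groups())))

# Normal priority (control test) versus SCHED_RR priority 3.
normal = parse_rtt(
    "rtt min/avg/max/mdev = 0.119/269.773/13739.426/1230.836 ms, pipe 14"
)
realtime = parse_rtt("rtt min/avg/max/mdev = 0.093/0.157/3.611/0.117 ms")

for key in ("avg", "max"):
    ratio = normal[key] / realtime[key]
    print(f"{key}: {normal[key]:.3f} ms -> {realtime[key]:.3f} ms "
          f"({ratio:.0f}x lower)")
```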

Regards,
Kevin.

P.S. Can someone tell me if I'm doing the CC: to bugme-daemon wrong? I
     thought that was supposed to add the emails as comments to the
     bugzilla report?



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 18:42               ` Ingo Molnar
  0 siblings, 0 replies; 262+ messages in thread
From: Ingo Molnar @ 2009-01-20 18:42 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, bugme-daemon,
	Peter Zijlstra


* Kevin Shanahan <kmshanah@ucwb.org.au> wrote:

> Running the ping test without apache2 running in the guest:
> 
> --- hermes-old.wumi.org.au ping statistics ---
> 900 packets transmitted, 900 received, 0% packet loss, time 902740ms
> rtt min/avg/max/mdev = 0.568/3.745/272.558/16.990 ms
> 
> And with apache2 running:
> 
> --- hermes-old.wumi.org.au ping statistics ---
> 900 packets transmitted, 900 received, 0% packet loss, time 902758ms
> rtt min/avg/max/mdev = 0.625/25.634/852.739/76.586 ms
> 
> In both cases it's quite variable, but the max latency is still not as 
> bad as when running with the irq chip enabled.

So the worst-case ping latency is more than 10 times lower?

I'd say this points in the direction of some sort of KVM-internal 
wakeup/signalling latency that happens if KVM does not deschedule. For 
example it could be a bug like this: if a guest image runs at 100% CPU 
time for a long time, IRQ injections might not propagate up until the 
preemption callbacks run. (but i'm just speculating here)

	Ingo

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 18:39                     ` Ingo Molnar
  0 siblings, 0 replies; 262+ messages in thread
From: Ingo Molnar @ 2009-01-20 18:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Kevin Shanahan, Avi Kivity, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Kevin Shanahan,
	Mike Galbraith, Peter Zijlstra, Frédéric Weisbecker


* Steven Rostedt <rostedt@goodmis.org> wrote:

> > hm, that's a significant regression then. The latency tracer used to 
> > measure the highest-prio task in the system - be that RT or non-rt.
> 
> Well, it is a regression from what was in -rt, yes. But not from what 
> was ever in mainline.

indeed, it is not a regression, it is worse: it makes the mainline version 
utterly useless in 99% of the cases ... This really needs to be fixed.

	Ingo

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 17:54             ` Kevin Shanahan
  0 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-01-20 17:54 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, bugme-daemon,
	Peter Zijlstra

On Tue, 2009-01-20 at 15:04 +0200, Avi Kivity wrote:
> Kevin Shanahan wrote:
> > On Tue, 2009-01-20 at 12:35 +0100, Ingo Molnar wrote:
> >> This only seems to occur under KVM, right? I.e. you tested it with -no-kvm 
> >> and the problem went away, correct?
> >>     
> >
> > Well, I couldn't make the test conditions identical, but the
> > problem didn't occur with the test I was able to do:
> >
> >   http://marc.info/?l=linux-kernel&m=123228728416498&w=2
> >
> >   
> 
> Can you also try with -no-kvm-irqchip?
> 
> You will need to comment out the lines
> 
>     /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
>      * to GSI 2.  GSI maps to ioapic 1-1.  This is not
>      * the cleanest way of doing it but it should work. */
> 
>     if (vector == 0)
>         vector = 2;
> 
> in qemu/hw/apic.c (should also fix -no-kvm smp).  This will change kvm 
> wakeups to use signals rather than the in-kernel code, which may be buggy.

Okay, I commented out those lines and compiled a new kvm-82 userspace
and kernel modules. I'm using those on a vanilla 2.6.28 kernel, with all
guests run with -no-kvm-irqchip added.

As before a number of the XP guests wanted to chug away at 100% CPU
usage for a long time. Three of the guests clocked up ~40 minutes CPU
time before I decided to just shut them down. Perhaps coincidentally,
these three guests are the only ones with Office 2003 installed on them.
That could be the difference between those guests and the other XP
guests, but that's probably not important for now.

The two Linux SMP guests booted okay this time, though they seem to only
use one CPU on the host (I guess kvm is not multi-threaded in this
mode?). "hermes-old", the guest I am pinging in all my tests, had a lot
of trouble running the apache2 setup - it was so slow it was difficult
to load a complete page from our RT system. The kvm process for this
guest was taking up 100% cpu on the host constantly and all sorts of
weird stuff could be seen by watching top in the guest:

top - 03:44:17 up 43 min,  1 user,  load average: 3.95, 1.55, 0.80
Tasks: 101 total,   4 running,  97 sleeping,   0 stopped,   0 zombie
Cpu(s): 79.7%us, 10.4%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  9.9%si,  0.0%st
Mem:   2075428k total,   391128k used,  1684300k free,    13044k buffers
Swap:  3502160k total,        0k used,  3502160k free,   118488k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND             
 2956 postgres  20   0 19704  11m  10m S 1658  0.6   2:55.99 postmaster          
 2934 www-data  20   0 60392  40m 5132 R   31  2.0   0:17.28 apache2             
 2958 postgres  20   0 19700  11m 9.8m R   28  0.6   0:20.41 postmaster          
 2940 www-data  20   0 58652  38m 5016 S   27  1.9   0:04.87 apache2             
 2937 www-data  20   0 60124  40m 5112 S   18  2.0   0:11.00 apache2             
 2959 postgres  20   0 19132 5424 4132 S   10  0.3   0:01.50 postmaster          
 2072 postgres  20   0  8064 1416  548 S    7  0.1   0:23.71 postmaster          
 2960 postgres  20   0 19132 5368 4060 R    6  0.3   0:01.55 postmaster          
 2071 postgres  20   0  8560 1972  488 S    5  0.1   0:08.33 postmaster    

Running the ping test without apache2 running in the guest:

--- hermes-old.wumi.org.au ping statistics ---
900 packets transmitted, 900 received, 0% packet loss, time 902740ms
rtt min/avg/max/mdev = 0.568/3.745/272.558/16.990 ms

And with apache2 running:

--- hermes-old.wumi.org.au ping statistics ---
900 packets transmitted, 900 received, 0% packet loss, time 902758ms
rtt min/avg/max/mdev = 0.625/25.634/852.739/76.586 ms

In both cases it's quite variable, but the max latency is still not as
bad as when running with the irq chip enabled.

Anyway, the test is again not ideal, but I hope we're proving something.
That's all I can do for tonight - should be ready for more again
tomorrow night.

Regards,
Kevin.



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 17:53                   ` Steven Rostedt
  0 siblings, 0 replies; 262+ messages in thread
From: Steven Rostedt @ 2009-01-20 17:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kevin Shanahan, Avi Kivity, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Kevin Shanahan,
	Mike Galbraith, Peter Zijlstra, Frédéric Weisbecker


On Tue, 20 Jan 2009, Ingo Molnar wrote:

> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Tue, 20 Jan 2009, Ingo Molnar wrote:
> > > Another test would be to build the scheduler latency tracer into your 
> > > kernel:
> > > 
> > >     CONFIG_SCHED_TRACER=y
> > > 
> > > And enable it via:
> > > 
> > >     echo wakeup > /debug/tracing/current_tracer
> > > 
> > > and you should be seeing the worst-case scheduling latency traces in 
> > > /debug/tracing/trace, and the largest observed latency will be in 
> > > /debug/tracing/tracing_max_latency [in microseconds].
> > 
> > Note, the wakeup latency only tests realtime threads, since other 
> > threads can have other issues for wakeup. I could change the wakeup 
> > tracer as wakeup_rt, and make a new "wakeup" that tests all threads, but 
> > it may be difficult to get something accurate.
> 
> hm, that's a significant regression then. The latency tracer used to 
> measure the highest-prio task in the system - be that RT or non-rt.

Well, it is a regression from what was in -rt, yes. But not from what was 
ever in mainline.

But I needed to change this to detect the problem that we 
solved with push and pull of rt tasks. The wakeup of a non-rt task 
always took longer than that of an rt task, so when tracing all tasks, I 
never got the wakeup latency of an rt task.

As I mentioned earlier, I can make a wakeup-rt to do the rt tracing, and 
make wakeup do all tasks.
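
The tracer steps quoted above could be scripted roughly as follows. This is
a sketch, not a tested procedure for this kernel: the /debug/tracing paths
and file names are the ones quoted in this thread, while the function
names, the tracing_dir parameter and the step of writing 0 to
tracing_max_latency to reset it are assumptions added for illustration.
Keep in mind that, as noted above, the wakeup tracer only measures
realtime tasks.

```python
import os

def enable_wakeup_tracer(tracing_dir="/debug/tracing"):
    """Select the 'wakeup' tracer and reset the recorded maximum latency."""
    with open(os.path.join(tracing_dir, "current_tracer"), "w") as f:
        f.write("wakeup\n")
    # Assumption: writing 0 clears the previous maximum so that only
    # latencies from the new test run are recorded.
    with open(os.path.join(tracing_dir, "tracing_max_latency"), "w") as f:
        f.write("0\n")

def read_max_latency_us(tracing_dir="/debug/tracing"):
    """Return the largest observed wakeup latency, in microseconds."""
    with open(os.path.join(tracing_dir, "tracing_max_latency")) as f:
        return int(f.read().strip())
```

A test run would call enable_wakeup_tracer(), repeat the ping test with the
kvm threads at realtime priority, and then read read_max_latency_us()
alongside the trace file in the same directory.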

-- Steve


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 17:47                 ` Avi Kivity
  0 siblings, 0 replies; 262+ messages in thread
From: Avi Kivity @ 2009-01-20 17:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Kevin Shanahan, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Kevin Shanahan,
	Mike Galbraith, Peter Zijlstra, Frédéric Weisbecker

Steven Rostedt wrote:
> Note, the wakeup latency only tests realtime threads, since other threads
> can have other issues for wakeup. I could change the wakeup tracer as
> wakeup_rt, and make a new "wakeup" that tests all threads, but it may
> be difficult to get something accurate.
>   

Kevin, can you retest with kvm at realtime priority?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 16:19                     ` Peter Zijlstra
  0 siblings, 0 replies; 262+ messages in thread
From: Peter Zijlstra @ 2009-01-20 16:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kevin Shanahan, Avi Kivity, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	bugme-daemon

On Tue, 2009-01-20 at 17:06 +0100, Ingo Molnar wrote:
> se.wait_max                        :           -92.027877
> 
> that field is not supposed to be negative. Mike, Peter, any ideas?

Possibly unrelated, but whilst I was poking at try_to_wake_up yesterday,
I thought I spotted a site where we fail to update rq clock.

Since we just moved the task to a new cpu (and thus rq) we need to
update_rq_clock() again.

diff --git a/kernel/sched.c b/kernel/sched.c
index d7ae5f4..6cd5e52 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2398,6 +2398,7 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state, int sync)
 	if (cpu != orig_cpu) {
 		set_task_cpu(p, cpu);
 		task_rq_unlock(rq, &flags);
+		update_rq_clock(rq);
 		/* might preempt at this point */
 		rq = task_rq_lock(p, &flags);
 		old_state = p->state;
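
One way a negative wait_max could arise from a stale runqueue clock is
sketched below. This is a toy model and an assumption for illustration,
not something confirmed in this thread: it supposes the wait time is
computed as an unsigned 64-bit difference between the runqueue clock and
the task's wait start, and that the result is printed as a signed value.
The timestamps are hypothetical.

```python
U64_MASK = (1 << 64) - 1

def as_signed_64(value):
    """Reinterpret an unsigned 64-bit value as a signed one."""
    value &= U64_MASK
    return value - (1 << 64) if value >= (1 << 63) else value

def wait_delta_ns(wait_start_ns, rq_clock_ns):
    """Unsigned 64-bit subtraction, as done on u64 nanosecond timestamps."""
    return (rq_clock_ns - wait_start_ns) & U64_MASK

# Hypothetical numbers: the task started waiting at t=1000 ms by the old
# runqueue's clock, but the new runqueue's clock was last updated at
# t=908 ms, so it lags behind.
wait_start = 1_000_000_000
stale_clock = 908_000_000

delta = wait_delta_ns(wait_start, stale_clock)
# As an unsigned value the wrapped delta is huge, so it would win any
# "maximum so far" comparison; viewed as signed it is negative.
print(as_signed_64(delta) / 1e6, "ms")
```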



^ permalink raw reply related	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 16:06                   ` Ingo Molnar
  0 siblings, 0 replies; 262+ messages in thread
From: Ingo Molnar @ 2009-01-20 16:06 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, Peter Zijlstra,
	bugme-daemon


* Kevin Shanahan <kmshanah@ucwb.org.au> wrote:

> I've uploaded the debug info here:
>   http://disenchant.net/tmp/bug-12465/

one interesting number to watch for is the KVM thread's wait_max in 
/proc/*/sched. The largest one seems to be 11 milliseconds:

se.wait_max                        :             3.175034
se.wait_max                        :             4.029938
se.wait_max                        :             4.217674
se.wait_max                        :             4.957836
se.wait_max                        :            10.339471
se.wait_max                        :            11.603943

which would be about right given your latency settings:

 /proc/sys/kernel/sched_latency_ns:
 60000000

[ 60 msecs ]

but ... i don't specifically see the kvm threads there. Are they not in 
/proc/*? Maybe it's in threads and it needs to be accessed via 
/proc/*/task/*/sched, as via:

$ grep -h wait_max /proc/*/task/*/sched | sort -t: -n -k 2 | tail -10
se.wait_max                        :            77.858092
se.wait_max                        :            78.778409
se.wait_max                        :            79.379026
se.wait_max                        :            85.930963
se.wait_max                        :            87.671842
se.wait_max                        :            88.008602
se.wait_max                        :            95.095744
se.wait_max                        :           157.882573
se.wait_max                        :           268.714775
se.wait_max                        :           393.085252

so the worst-case latency

Btw., there's a few weird stats in your logs:

se.wait_max                        :          -284.864857
se.wait_max                        :          -284.843431
se.wait_max                        :          -284.820204
se.wait_max                        :          -284.345294
se.wait_max                        :          -284.298462
se.wait_max                        :          -284.018644
se.wait_max                        :          -284.018070
se.wait_max                        :          -188.022417
se.wait_max                        :          -188.021659
se.wait_max                        :           -92.030204
se.wait_max                        :           -92.027877

that field is not supposed to be negative. Mike, Peter, any ideas?
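
The grep pipeline above can also be written as a small script that keeps
the thread path next to each value and flags the negative entries. This is
a sketch: it assumes the se.wait_max line format shown in this message
(other kernel versions or configurations may name the field differently
or omit it), and the function names are made up for illustration.

```python
import glob
import re

WAIT_MAX = re.compile(r"^se\.wait_max\s*:\s*(-?[\d.]+)\s*$", re.MULTILINE)

def wait_max_values(text):
    """Return every se.wait_max value (in ms) found in sched debug text."""
    return [float(value) for value in WAIT_MAX.findall(text)]

def scan_proc(pattern="/proc/*/task/*/sched"):
    """Map each per-thread sched file to its se.wait_max value."""
    results = {}
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                values = wait_max_values(f.read())
        except OSError:
            continue  # the thread exited while we were scanning
        if values:
            results[path] = values[0]
    return results

def worst_and_negative(results, top=10):
    """Return (the `top` largest entries in ascending order, negative entries)."""
    ordered = sorted(results.items(), key=lambda item: item[1])
    negative = [item for item in ordered if item[1] < 0]
    return ordered[-top:], negative
```

Running worst_and_negative(scan_proc()) on the host gives the same ordering
as the sort and tail above, plus a separate list of the entries that should
not exist.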

	Ingo

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 15:51                 ` Kevin Shanahan
  0 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-01-20 15:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, Peter Zijlstra,
	bugme-daemon

On Tue, 2009-01-20 at 15:25 +0100, Ingo Molnar wrote:
> > I could run top, vmstat and cat /proc/sched_debug in a loop until the
> > problem occurs and then trim it. Something like:
> > 
> > while true; do
> >   date                                >> $FILE
> >   echo "-- top: --"                   >> $FILE
> >   top -H -c -b -d 1 -n 0.5            >> $FILE 2>/dev/null
> >   echo "-- vmstat: --"                >> $FILE
> >   vmstat                              >> $FILE 2>/dev/null
> >   echo "-- sched_debug #$i: --"       >> $FILE
> >   cat /proc/sched_debug               >> $FILE 2>/dev/null
> > done
> > 
> > That should take a snapshot every half second or so.
> 
> Yeah, that would be lovely. You don't even have to trim it much - just give 
> us a timestamp to look at for the delay incident. You might also want to 
> start the kvm session while the script is already running - that way we'll 
> get fresh statistics and see the whole thing.

I've uploaded the debug info here:
  http://disenchant.net/tmp/bug-12465/

Some interesting sections should be around these times:

  01:36:04 -> 01:36:27
  01:37:30 -> 01:37:42
  01:37:52 -> 01:37:56
  01:39:37 -> 01:39:40
  01:40:01 -> 01:40:14

The output from ping is there too so you can see how the delays usually
show up (e.g. in clusters). The large debug file runs from before I
launched the VMs, right through the ping test. The trimmed file just
cuts out everything before I started ping.

Regards,
Kevin.



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 15:04                 ` Ingo Molnar
  0 siblings, 0 replies; 262+ messages in thread
From: Ingo Molnar @ 2009-01-20 15:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Kevin Shanahan, Avi Kivity, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Kevin Shanahan,
	Mike Galbraith, Peter Zijlstra, Frédéric Weisbecker


* Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 20 Jan 2009, Ingo Molnar wrote:
> > Another test would be to build the scheduler latency tracer into your 
> > kernel:
> > 
> >     CONFIG_SCHED_TRACER=y
> > 
> > And enable it via:
> > 
> >     echo wakeup > /debug/tracing/current_tracer
> > 
> > and you should be seeing the worst-case scheduling latency traces in 
> > /debug/tracing/trace, and the largest observed latency will be in 
> > /debug/tracing/tracing_max_latency [in microseconds].
> 
> Note, the wakeup latency only tests realtime threads, since other 
> threads can have other issues for wakeup. I could change the wakeup 
> tracer as wakeup_rt, and make a new "wakeup" that tests all threads, but 
> it may be difficult to get something accurate.

hm, that's a significant regression then. The latency tracer used to 
measure the highest-prio task in the system - be that RT or non-rt.

	Ingo

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 14:59               ` Steven Rostedt
  0 siblings, 0 replies; 262+ messages in thread
From: Steven Rostedt @ 2009-01-20 14:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kevin Shanahan, Avi Kivity, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Kevin Shanahan,
	Mike Galbraith, Peter Zijlstra, Frédéric Weisbecker



On Tue, 20 Jan 2009, Ingo Molnar wrote:
> Another test would be to build the scheduler latency tracer into your 
> kernel:
> 
>     CONFIG_SCHED_TRACER=y
> 
> And enable it via:
> 
>     echo wakeup > /debug/tracing/current_tracer
> 
> and you should be seeing the worst-case scheduling latency traces in 
> /debug/tracing/trace, and the largest observed latency will be in 
> /debug/tracing/tracing_max_latency [in microseconds].

Note, the wakeup latency tracer only tests realtime threads, since other
threads can have other issues for wakeup. I could rename the current wakeup
tracer to wakeup_rt, and make a new "wakeup" that tests all threads, but it
may be difficult to get something accurate.

-- Steve

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 14:46               ` Frédéric Weisbecker
  0 siblings, 0 replies; 262+ messages in thread
From: Frédéric Weisbecker @ 2009-01-20 14:46 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Ingo Molnar, Avi Kivity, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mike Galbraith,
	Peter Zijlstra

2009/1/20 Kevin Shanahan <kmshanah@ucwb.org.au>:
> On Tue, 2009-01-20 at 13:56 +0100, Ingo Molnar wrote:
>> * Kevin Shanahan <kmshanah@ucwb.org.au> wrote:
>> > > This suggests some sort of KVM-specific problem. Scheduler latencies
>> > > in the seconds that occur under normal load situations are noticed and
>> > > reported quickly - and there are no such open regressions currently.
>> >
>> > It at least suggests a problem with interaction between the scheduler
>> > and kvm, otherwise reverting that scheduler patch wouldn't have made the
>> > regression go away.
>>
>> the scheduler affects almost everything, so almost by definition a
>> scheduler change can tickle a race or other timing bug in just about any
>> code - and reverting that change in the scheduler can make the bug go
>> away. But yes, it could also be a genuine scheduler bug - that is always a
>> possibility.
>
> Okay, I understand.
>
>> Could you please run a cfs-debug-info.sh session on a CONFIG_SCHED_DEBUG=y
>> and CONFIG_SCHEDSTATS=y kernel, while you are experiencing those
>> latencies:
>>
>>   http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh
>>
>> and post that (relatively large) somewhere, or send it as a reply after
>> bzip2 -9 compressing it? It will include a lot of information about the
>> delays your tasks are experiencing.
>
> Running it while the problem is occurring will be tricky, as it only
> lasts for a few seconds at a time. Is it going to be useful at all to
> just see those statistics if the system is running normally?
>
> I might need to modify the script a little. Am I right that everything
> above "gathering statistics..." is pretty much static information?
>
> I could run top, vmstat and cat /proc/sched_debug in a loop until the
> problem occurs and then trim it. Something like:
>
> while true; do
>  date                                >> $FILE
>  echo "-- top: --"                   >> $FILE
>  top -H -c -b -d 1 -n 0.5            >> $FILE 2>/dev/null
>  echo "-- vmstat: --"                >> $FILE
>  vmstat                              >> $FILE 2>/dev/null
>  echo "-- sched_debug #$i: --"       >> $FILE
>  cat /proc/sched_debug               >> $FILE 2>/dev/null
> done
>
> That should take a snapshot every half second or so.
>
> Regards,
> Kevin.
>
> P.S. Please keep kmshanah@flexo.wumi.org.au out of the CC list (it won't
>     route properly anyway). I don't know how it got added - the only
>     place it would have appeared was in the "revert" commit message
>     when I was testing 2.6.28 with the commit I bisected down to
>     removed.
>


One other thing you can do is enable CONFIG_FUNCTION_GRAPH_TRACER, as Ingo
suggested, and trace the schedule() function. This way you will see the time
spent in (almost) every function called from schedule() and perhaps find
where the contention is (if it comes from the scheduler).

How to use it?

echo schedule > /debugfs/tracing/set_graph_function
echo function_graph > /debugfs/tracing/current_tracer
cat /debugfs/tracing/trace

Or even through a pipe:
cat /debugfs/tracing/trace_pipe > ~/func_graph.log

To end the tracing: echo nop > /debugfs/tracing/current_tracer
Or just pause it: echo 0 > /debugfs/tracing/tracing_enabled
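
The steps above could be wrapped in two small helpers, roughly as below.
This is a sketch, not a tested recipe: TRACING_DIR and both function names
are assumptions for illustration, and the directory has to match wherever
debugfs is actually mounted on your system.

```shell
# Sketch only: TRACING_DIR and the function names are hypothetical.
# Point TRACING_DIR at the tracing directory of your debugfs mount.
TRACING_DIR=${TRACING_DIR:-/debugfs/tracing}

start_schedule_trace() {
    # Restrict the graph tracer to schedule() and its callees, then enable it.
    echo schedule       > "$TRACING_DIR/set_graph_function" &&
    echo function_graph > "$TRACING_DIR/current_tracer"
}

stop_schedule_trace() {
    # Switching back to the nop tracer ends the tracing.
    echo nop > "$TRACING_DIR/current_tracer"
}

# Typical use (as root):
#   start_schedule_trace && cat "$TRACING_DIR/trace_pipe" > ~/func_graph.log
#   ... reproduce the stall, interrupt cat, then:
#   stop_schedule_trace
```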

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 14:25               ` Ingo Molnar
  0 siblings, 0 replies; 262+ messages in thread
From: Ingo Molnar @ 2009-01-20 14:25 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, Peter Zijlstra


* Kevin Shanahan <kmshanah@ucwb.org.au> wrote:

> On Tue, 2009-01-20 at 13:56 +0100, Ingo Molnar wrote:
> > * Kevin Shanahan <kmshanah@ucwb.org.au> wrote:
> > > > This suggests some sort of KVM-specific problem. Scheduler latencies 
> > > > in the seconds that occur under normal load situations are noticed and 
> > > > reported quickly - and there are no such open regressions currently.
> > > 
> > > It at least suggests a problem with interaction between the scheduler 
> > > and kvm, otherwise reverting that scheduler patch wouldn't have made the 
> > > regression go away.
> > 
> > the scheduler affects almost everything, so almost by definition a 
> > scheduler change can tickle a race or other timing bug in just about any 
> > code - and reverting that change in the scheduler can make the bug go 
> > away. But yes, it could also be a genuine scheduler bug - that is always a 
> > possibility.
> 
> Okay, I understand.
> 
> > Could you please run a cfs-debug-info.sh session on a CONFIG_SCHED_DEBUG=y 
> > and CONFIG_SCHEDSTATS=y kernel, while you are experiencing those 
> > latencies:
> > 
> >   http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh
> > 
> > and post that (relatively large) somewhere, or send it as a reply after 
> > bzip2 -9 compressing it? It will include a lot of information about the 
> > delays your tasks are experiencing.
> 
> Running it while the problem is occurring will be tricky, as it only 
> lasts for a few seconds at a time. Is it going to be useful at all to 
> just see those statistics if the system is running normally?
> 
> I might need to modify the script a little. Am I right that everything 
> above "gathering statistics..." is pretty much static information?

Correct.

> I could run top, vmstat and cat /proc/sched_debug in a loop until the
> problem occurs and then trim it. Something like:
> 
> while true; do
>   date                                >> $FILE
>   echo "-- top: --"                   >> $FILE
>   top -H -c -b -d 1 -n 0.5            >> $FILE 2>/dev/null
>   echo "-- vmstat: --"                >> $FILE
>   vmstat                              >> $FILE 2>/dev/null
>   echo "-- sched_debug #$i: --"       >> $FILE
>   cat /proc/sched_debug               >> $FILE 2>/dev/null
> done
> 
> That should take a snapshot every half second or so.

Yeah, that would be lovely. You don't even have to trim it much - just give 
us a timestamp to look at for the delay incident. You might also want to 
start the kvm session while the script is already running - that way we'll 
get fresh statistics and see the whole thing.

	Ingo

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 14:23             ` Kevin Shanahan
  0 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-01-20 14:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, Peter Zijlstra

On Tue, 2009-01-20 at 13:56 +0100, Ingo Molnar wrote:
> * Kevin Shanahan <kmshanah@ucwb.org.au> wrote:
> > > This suggests some sort of KVM-specific problem. Scheduler latencies 
> > > in the seconds that occur under normal load situations are noticed and 
> > > reported quickly - and there are no such open regressions currently.
> > 
> > It at least suggests a problem with interaction between the scheduler 
> > and kvm, otherwise reverting that scheduler patch wouldn't have made the 
> > regression go away.
> 
> the scheduler affects almost everything, so almost by definition a 
> scheduler change can tickle a race or other timing bug in just about any 
> code - and reverting that change in the scheduler can make the bug go 
> away. But yes, it could also be a genuine scheduler bug - that is always a 
> possibility.

Okay, I understand.

> Could you please run a cfs-debug-info.sh session on a CONFIG_SCHED_DEBUG=y 
> and CONFIG_SCHEDSTATS=y kernel, while you are experiencing those 
> latencies:
> 
>   http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh
> 
> and post that (relatively large) somewhere, or send it as a reply after 
> bzip2 -9 compressing it? It will include a lot of information about the 
> delays your tasks are experiencing.

Running it while the problem is occurring will be tricky, as it only
lasts for a few seconds at a time. Is it going to be useful at all to
just see those statistics if the system is running normally?

I might need to modify the script a little. Am I right that everything
above "gathering statistics..." is pretty much static information?

I could run top, vmstat and cat /proc/sched_debug in a loop until the
problem occurs and then trim it. Something like:

i=0
while true; do
  i=$((i + 1))                        # snapshot counter for the header below
  date                                >> $FILE
  echo "-- top: --"                   >> $FILE
  top -H -c -b -n 1                   >> $FILE 2>/dev/null
  echo "-- vmstat: --"                >> $FILE
  vmstat                              >> $FILE 2>/dev/null
  echo "-- sched_debug #$i: --"       >> $FILE
  cat /proc/sched_debug               >> $FILE 2>/dev/null
  sleep 0.5                           # pace the loop
done

That should take a snapshot every half second or so.

Regards,
Kevin.

P.S. Please keep kmshanah@flexo.wumi.org.au out of the CC list (it won't
     route properly anyway). I don't know how it got added - the only
     place it would have appeared was in the "revert" commit message
     when I was testing 2.6.28 with the commit I bisected down to
     removed.



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 13:07             ` Ingo Molnar
  0 siblings, 0 replies; 262+ messages in thread
From: Ingo Molnar @ 2009-01-20 13:07 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Kevin Shanahan, Mike Galbraith,
	Peter Zijlstra, Steven Rostedt, Frédéric Weisbecker


* Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Kevin Shanahan <kmshanah@ucwb.org.au> wrote:
> 
> > > This suggests some sort of KVM-specific problem. Scheduler latencies 
> > > in the seconds that occur under normal load situations are noticed and 
> > > reported quickly - and there are no such open regressions currently.
> > 
> > It at least suggests a problem with interaction between the scheduler 
> > and kvm, otherwise reverting that scheduler patch wouldn't have made the 
> > regression go away.
> 
> the scheduler affects almost everything, so almost by definition a 
> scheduler change can tickle a race or other timing bug in just about any 
> code - and reverting that change in the scheduler can make the bug go 
> away. But yes, it could also be a genuine scheduler bug - that is always a 
> possibility.
> 
> Could you please run a cfs-debug-info.sh session on a CONFIG_SCHED_DEBUG=y 
> and CONFIG_SCHEDSTATS=y kernel, while you are experiencing those 
> latencies:
> 
>   http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh
> 
> and post that (relatively large) somewhere, or send it as a reply after 
> bzip2 -9 compressing it? It will include a lot of information about the 
> delays your tasks are experiencing.

Another test would be to build the scheduler latency tracer into your 
kernel:

    CONFIG_SCHED_TRACER=y

And enable it via:

    echo wakeup > /debug/tracing/current_tracer

and you should be seeing the worst-case scheduling latency traces in 
/debug/tracing/trace, and the largest observed latency will be in 
/debug/tracing/tracing_max_latency [in microseconds].

You can reset the max-latency (and thus restart tracing) via:

    echo 0 > /debug/tracing/tracing_max_latency

Latencies up to 100 microseconds are ok. If you see 10-second delays 
there (values of 10,000,000 or more) then it's probably a scheduler bug.

Please reproduce the latency under KVM and send us the trace. The trace 
file will be a lot more verbose if you also enable the function tracer 
(FUNCTION_TRACER, DYNAMIC_FTRACE and FUNCTION_GRAPH_TRACER).
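
As a rough aid for reading the number, the two thresholds above could be
checked with a helper like this. It is only a sketch: the function name is
made up, and the "inconclusive" middle band is an assumption, since only
the two end points are given above.

```shell
# Hypothetical helper: classify a tracing_max_latency value (microseconds).
# The 100 us and 10,000,000 us thresholds come from the text above; the
# "inconclusive" band in between is an assumption of this sketch.
classify_max_latency() {
    usecs=$1
    if [ "$usecs" -le 100 ]; then
        echo "ok"
    elif [ "$usecs" -ge 10000000 ]; then
        echo "probably a scheduler bug"
    else
        echo "inconclusive"
    fi
}

# Example (as root, with the wakeup tracer enabled):
#   classify_max_latency "$(cat /debug/tracing/tracing_max_latency)"
```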

	Ingo

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 13:04           ` Avi Kivity
  0 siblings, 0 replies; 262+ messages in thread
From: Avi Kivity @ 2009-01-20 13:04 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Ingo Molnar, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Kevin Shanahan, Mike Galbraith,
	Peter Zijlstra

Kevin Shanahan wrote:
> On Tue, 2009-01-20 at 12:35 +0100, Ingo Molnar wrote:
>   
>> * Kevin Shanahan <kmshanah@ucwb.org.au> wrote:
>>
>>     
>>> On Mon, 2009-01-19 at 22:45 +0100, Rafael J. Wysocki wrote:
>>>       
>>>> This message has been generated automatically as a part of a report
>>>> of regressions introduced between 2.6.27 and 2.6.28.
>>>>
>>>> The following bug entry is on the current list of known regressions
>>>> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
>>>> be listed and let me know (either way).
>>>>
>>>>
>>>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
>>>> Subject		: KVM guests stalling on 2.6.28 (bisected)
>>>> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
>>>> Date		: 2009-01-17 03:37 (3 days old)
>>>>         
>>> Yes, please keep this on the list.
>>>       
>> This only seems to occur under KVM, right? I.e. you tested it with -no-kvm 
>> and the problem went away, correct?
>>     
>
> Well, I couldn't make the test conditions identical, but the
> problem didn't occur with the test I was able to do:
>
>   http://marc.info/?l=linux-kernel&m=123228728416498&w=2
>
>   

Can you also try with -no-kvm-irqchip?

You will need to comment out the lines

    /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
     * to GSI 2.  GSI maps to ioapic 1-1.  This is not
     * the cleanest way of doing it but it should work. */

    if (vector == 0)
        vector = 2;

in qemu/hw/apic.c (should also fix -no-kvm smp).  This will change kvm 
wakeups to use signals rather than the in-kernel code, which may be buggy.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 12:56           ` Ingo Molnar
  0 siblings, 0 replies; 262+ messages in thread
From: Ingo Molnar @ 2009-01-20 12:56 UTC (permalink / raw)
  To: Kevin Shanahan
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Kevin Shanahan, Mike Galbraith,
	Peter Zijlstra


* Kevin Shanahan <kmshanah@ucwb.org.au> wrote:

> > This suggests some sort of KVM-specific problem. Scheduler latencies 
> > in the seconds that occur under normal load situations are noticed and 
> > reported quickly - and there are no such open regressions currently.
> 
> It at least suggests a problem with interaction between the scheduler 
> and kvm, otherwise reverting that scheduler patch wouldn't have made the 
> regression go away.

The scheduler affects almost everything, so almost by definition a 
scheduler change can tickle a race or other timing bug in just about any 
code - and reverting that change in the scheduler can make the bug go 
away. But yes, it could also be a genuine scheduler bug - that is always a 
possibility.

Could you please run a cfs-debug-info.sh session on a CONFIG_SCHED_DEBUG=y 
and CONFIG_SCHEDSTATS=y kernel, while you are experiencing those 
latencies:

  http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh

and post the (relatively large) output somewhere, or send it as a reply 
after compressing it with bzip2 -9? It will include a lot of information 
about the delays your tasks are experiencing.
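
Before running the script it can help to confirm that the running kernel was built with both options. The helper below is a hedged sketch: the function name is hypothetical, and the location of the kernel config file (for example /boot/config-$(uname -r) on many distributions) is an assumption that varies between systems.

```shell
#!/bin/sh
# Sketch only: has_sched_debug_options is a hypothetical helper name.
# Succeeds when the given kernel config file enables both options the
# message asks for; otherwise prints each missing option and fails.
has_sched_debug_options() {
    config_file=$1
    missing=0
    for opt in CONFIG_SCHED_DEBUG CONFIG_SCHEDSTATS; do
        if ! grep -q "^${opt}=y\$" "$config_file"; then
            echo "missing: ${opt}=y"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `has_sched_debug_options "/boot/config-$(uname -r)"` prints nothing and succeeds when both options are set.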

	Ingo

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 12:42         ` Kevin Shanahan
  0 siblings, 0 replies; 262+ messages in thread
From: Kevin Shanahan @ 2009-01-20 12:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Kevin Shanahan, Mike Galbraith,
	Peter Zijlstra

On Tue, 2009-01-20 at 12:35 +0100, Ingo Molnar wrote:
> * Kevin Shanahan <kmshanah@ucwb.org.au> wrote:
> 
> > On Mon, 2009-01-19 at 22:45 +0100, Rafael J. Wysocki wrote:
> > > This message has been generated automatically as a part of a report
> > > of regressions introduced between 2.6.27 and 2.6.28.
> > > 
> > > The following bug entry is on the current list of known regressions
> > > introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> > > be listed and let me know (either way).
> > > 
> > > 
> > > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> > > Subject		: KVM guests stalling on 2.6.28 (bisected)
> > > Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> > > Date		: 2009-01-17 03:37 (3 days old)
> > 
> > Yes, please keep this on the list.
> 
> This only seems to occur under KVM, right? I.e. you tested it with -no-kvm 
> and the problem went away, correct?

Well, I couldn't make the test conditions identical, but the
problem didn't occur with the test I was able to do:

  http://marc.info/?l=linux-kernel&m=123228728416498&w=2

> This suggests some sort of KVM-specific problem. Scheduler latencies in 
> the seconds that occur under normal load situations are noticed and 
> reported quickly - and there are no such open regressions currently.

It at least suggests a problem with interaction between the scheduler
and kvm, otherwise reverting that scheduler patch wouldn't have made the
regression go away.

Regards,
Kevin.



^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 12:37         ` Avi Kivity
  0 siblings, 0 replies; 262+ messages in thread
From: Avi Kivity @ 2009-01-20 12:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kevin Shanahan, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Kevin Shanahan, Mike Galbraith,
	Peter Zijlstra

Ingo Molnar wrote:
> * Kevin Shanahan <kmshanah@ucwb.org.au> wrote:
>
>   
>> On Mon, 2009-01-19 at 22:45 +0100, Rafael J. Wysocki wrote:
>>     
>>> This message has been generated automatically as a part of a report
>>> of regressions introduced between 2.6.27 and 2.6.28.
>>>
>>> The following bug entry is on the current list of known regressions
>>> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
>>> be listed and let me know (either way).
>>>
>>>
>>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
>>> Subject		: KVM guests stalling on 2.6.28 (bisected)
>>> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
>>> Date		: 2009-01-17 03:37 (3 days old)
>>>       
>> Yes, please keep this on the list.
>>     
>
> This only seems to occur under KVM, right? I.e. you tested it with -no-kvm 
> and the problem went away, correct?
>
> This suggests some sort of KVM-specific problem. Scheduler latencies in 
> the seconds that occur under normal load situations are noticed and 
> reported quickly - and there are no such open regressions currently.
>
>   

Not necessarily.  -no-kvm runs with only one thread, compared to kvm 
that runs with 1 + nr_cpus threads.

> Avi, can you reproduce these latencies? 

No.

> A possible theory would be some 
> sort of guest wakeup problem/race triggered by a shift in 
> preemption/scheduling patterns. Or something related to preempt-notifiers 
> (which KVM is using). A genuine scheduler bug is in the cards too, but the 
> KVM-only angle of this bug gives it a low probability.
>   

Can we trace task wakeups somehow? (latency between wakeup and actually 
running).

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
@ 2009-01-20 11:35       ` Ingo Molnar
  0 siblings, 0 replies; 262+ messages in thread
From: Ingo Molnar @ 2009-01-20 11:35 UTC (permalink / raw)
  To: Kevin Shanahan, Avi Kivity
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Kevin Shanahan, Mike Galbraith,
	Peter Zijlstra


* Kevin Shanahan <kmshanah@ucwb.org.au> wrote:

> On Mon, 2009-01-19 at 22:45 +0100, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.27 and 2.6.28.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> > be listed and let me know (either way).
> > 
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> > Subject		: KVM guests stalling on 2.6.28 (bisected)
> > Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> > Date		: 2009-01-17 03:37 (3 days old)
> 
> Yes, please keep this on the list.

This only seems to occur under KVM, right? I.e. you tested it with -no-kvm 
and the problem went away, correct?

This suggests some sort of KVM-specific problem. Scheduler latencies in 
the seconds that occur under normal load situations are noticed and 
reported quickly - and there are no such open regressions currently.

Avi, can you reproduce these latencies? A possible theory would be some 
sort of guest wakeup problem/race triggered by a shift in 
preemption/scheduling patterns. Or something related to preempt-notifiers 
(which KVM is using). A genuine scheduler bug is in the cards too, but the 
KVM-only angle of this bug gives it a low probability.

	Ingo

^ permalink raw reply	[flat|nested] 262+ messages in thread

* Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-01-19 21:45   ` Rafael J. Wysocki
  (?)
@ 2009-01-20  0:12   ` Kevin Shanahan
  2009-01-20 11:35       ` Ingo Molnar
  -1 siblings, 1 reply; 262+ messages in thread
From: Kevin Shanahan @ 2009-01-20  0:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Ingo Molnar,
	Kevin Shanahan, Mike Galbraith, Peter Zijlstra

On Mon, 2009-01-19 at 22:45 +0100, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.27 and 2.6.28.
> 
> The following bug entry is on the current list of known regressions
> introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> be listed and let me know (either way).
> 
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
> Subject		: KVM guests stalling on 2.6.28 (bisected)
> Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
> Date		: 2009-01-17 03:37 (3 days old)

Yes, please keep this on the list.

Cheers,
Kevin.



^ permalink raw reply	[flat|nested] 262+ messages in thread

* [Bug #12465] KVM guests stalling on 2.6.28 (bisected)
  2009-01-19 21:41 2.6.29-rc2-git1: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
@ 2009-01-19 21:45   ` Rafael J. Wysocki
  0 siblings, 0 replies; 262+ messages in thread
From: Rafael J. Wysocki @ 2009-01-19 21:45 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Ingo Molnar, Kevin Shanahan, Kevin Shanahan,
	Mike Galbraith, Peter Zijlstra

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.27 and 2.6.28.

The following bug entry is on the current list of known regressions
introduced between 2.6.27 and 2.6.28.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12465
Subject		: KVM guests stalling on 2.6.28 (bisected)
Submitter	: Kevin Shanahan <kmshanah@ucwb.org.au>
Date		: 2009-01-17 03:37 (3 days old)



^ permalink raw reply	[flat|nested] 262+ messages in thread

end of thread, other threads:[~2009-03-26 20:23 UTC | newest]

Thread overview: 262+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-14 20:48 2.6.29-rc5: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-02-14 20:48 ` Rafael J. Wysocki
2009-02-14 20:48 ` [Bug #12061] snd_hda_intel: power_save: sound cracks on powerdown Rafael J. Wysocki
2009-02-14 20:48   ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12209] oldish top core dumps (in its meminfo() function) Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12208] uml is very slow on 2.6.28 host Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-22 13:58   ` Américo Wang
2009-02-22 13:58     ` Américo Wang
2009-02-23 14:27     ` Miklos Szeredi
2009-02-23 14:27       ` Miklos Szeredi
2009-02-14 20:50 ` [Bug #12160] networking oops after resume from s2ram (2.6.28-rc6) Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12337] ~100 extra wakeups reported by powertop Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-14 23:35   ` Alberto Gonzalez
2009-02-14 23:35     ` Alberto Gonzalez
2009-02-15 14:20     ` Rafael J. Wysocki
2009-02-15 14:20       ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12224] journal activity on inactive partition causes inactive harddrive spinup Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-23 12:22   ` Theodore Tso
2009-02-23 12:22     ` Theodore Tso
2009-02-23 14:36     ` Rafael J. Wysocki
2009-02-23 14:36       ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12263] Sata soft reset filling log Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-15 20:47   ` Justin Madru
2009-02-15 21:21     ` Rafael J. Wysocki
2009-02-15 22:30       ` Ingo Molnar
2009-02-15 23:12         ` Rafael J. Wysocki
2009-02-16 15:18           ` Sergei Shtylyov
2009-02-16 15:21             ` Ingo Molnar
     [not found]             ` <499983DF.5050503-hkdhdckH98+B+jHODAdFcQ@public.gmane.org>
2009-02-16 15:21               ` Sergei Shtylyov
2009-02-16 15:21                 ` Sergei Shtylyov
     [not found]                 ` <49998480.3090408-hkdhdckH98+B+jHODAdFcQ@public.gmane.org>
2009-02-16 15:31                   ` Sergei Shtylyov
2009-02-16 15:31                     ` Sergei Shtylyov
2009-02-16 19:23                     ` Justin Madru
     [not found]                       ` <4999BD1A.1060101-u1xxEuL7cY4AvxtiuMwx3w@public.gmane.org>
2009-02-16 19:42                         ` Sergei Shtylyov
2009-02-16 19:42                           ` Sergei Shtylyov
     [not found]                           ` <4999C195.5050905-hkdhdckH98+B+jHODAdFcQ@public.gmane.org>
2009-02-16 21:40                             ` Justin Madru
2009-02-16 21:40                               ` Justin Madru
     [not found]                               ` <4999DD31.4010504-u1xxEuL7cY4AvxtiuMwx3w@public.gmane.org>
2009-02-17 11:19                                 ` Hugh Dickins
2009-02-17 11:19                                   ` Hugh Dickins
2009-02-17 19:08                                   ` Justin Madru
     [not found]                                     ` <499B0B3E.3070101-u1xxEuL7cY4AvxtiuMwx3w@public.gmane.org>
2009-02-18  1:03                                       ` Sergei Shtylyov
2009-02-18  1:03                                         ` Sergei Shtylyov
2009-02-18  6:42                                         ` Justin Madru
2009-02-14 20:50 ` [Bug #12265] FPU emulation broken in 2.6.28-rc8 ? Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-14 23:23   ` Ingo Molnar
2009-02-14 23:23     ` Ingo Molnar
2009-02-14 20:50 ` [Bug #12401] 2.6.28 regression: xbacklight broken on ThinkPad X61s Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-15 13:44   ` Matthew Garrett
2009-02-15 13:44     ` Matthew Garrett
2009-02-15 14:38     ` Rafael J. Wysocki
2009-02-15 14:38       ` Rafael J. Wysocki
2009-02-15 22:16       ` Tino Keitel
2009-02-15 22:16         ` Tino Keitel
2009-02-16  1:16         ` Matthew Garrett
2009-02-16  1:16           ` Matthew Garrett
2009-02-16 12:37           ` Ingo Molnar
2009-02-16 12:37             ` Ingo Molnar
2009-02-16 12:42             ` Matthew Garrett
2009-02-16 12:42               ` Matthew Garrett
2009-02-14 20:50 ` [Bug #12393] debugging in dosemu causes lots of 'scheduling while atomic' Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12395] 2.6.28-rc9: oprofile regression Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12403] TTY problem on linux-2.6.28-rc7 Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-16 16:12   ` Aristeu Rozanski
2009-02-16 16:12     ` Aristeu Rozanski
2009-02-16 20:42     ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12405] oops in __bounce_end_io_read under kvm Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12404] Oops in 2.6.28-rc9 and -rc8 -- mtrr issues / e1000e Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12406] 2.6.28 thinks that my PS/2 mouse is a touchpad Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-15  6:14   ` Alexander E. Patrakov
2009-02-15  6:14     ` Alexander E. Patrakov
2009-02-15 14:40     ` Rafael J. Wysocki
2009-02-15 14:40       ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12411] 2.6.28: BUG in r8169 Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12408] Funny problem with 2.6.28: Kernel stalls Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12409] NULL pointer dereference at get_stats() Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12407] Kernel 2.6.28 regression: Hang after hibernate Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12500] r8169: NETDEV WATCHDOG: eth0 (r8169): transmit timed out Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-15  9:48   ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) [Bug 12465] Kevin Shanahan
2009-02-15  9:48     ` Kevin Shanahan
2009-02-15 10:04     ` Ingo Molnar
2009-02-22 10:39       ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) [bug 12465] Kevin Shanahan
2009-02-22 10:39         ` Kevin Shanahan
2009-02-22 17:27         ` Ingo Molnar
2009-02-22 17:27           ` Ingo Molnar
2009-02-23 11:38       ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) [Bug 12465] Kevin Shanahan
2009-02-23 11:38         ` Kevin Shanahan
2009-02-14 20:50 ` [Bug #12421] GPF on 2.6.28 and 2.6.28-rc9-git3, e1000e and e1000 issues Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12614] WOL with forcedeth broken since f55c21fd9a92a444e55ad1ca4e4732d56661bf2e Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12619] Regression 2.6.28 and last - boot failed Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12559] Huawei E169 doesn't work as mass storage anymore Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12612] hard lockup when interrupting cdda2wav Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-17 17:16   ` Matthias Reichl
2009-02-17 17:16     ` Matthias Reichl
2009-02-17 20:23     ` Rafael J. Wysocki
2009-02-17 20:23       ` Rafael J. Wysocki
2009-02-19 13:49       ` FUJITA Tomonori
2009-02-19 13:49         ` FUJITA Tomonori
2009-02-14 20:50 ` [Bug #12645] DMI low-memory-protect quirk causes resume hang on Samsung NC10 Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12658] ThrustMaster Firestorm Dual Power 3 Gamepads stopped working Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12690] DPMS (LCD powersave, poweroff) don't work Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
2009-02-14 20:50 ` [Bug #12634] video distortion and lockup with i830 video chip and 2.6.28.3 Rafael J. Wysocki
2009-02-14 20:50   ` Rafael J. Wysocki
  -- strict thread matches above, loose matches on Subject: below --
2009-03-21 17:01 2.6.29-rc8-git5: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-03-21 17:07 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-03-21 17:07   ` Rafael J. Wysocki
2009-03-21 19:50   ` Ingo Molnar
2009-03-21 19:50     ` Ingo Molnar
2009-03-14 19:11 2.6.29-rc8: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-03-14 19:20 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-03-14 19:20   ` Rafael J. Wysocki
2009-03-15  9:03   ` Kevin Shanahan
2009-03-15  9:03     ` Kevin Shanahan
2009-03-15  9:18     ` Avi Kivity
2009-03-15  9:18       ` Avi Kivity
2009-03-15  9:48       ` Ingo Molnar
2009-03-15  9:48         ` Ingo Molnar
2009-03-15  9:56         ` Avi Kivity
2009-03-15  9:56           ` Avi Kivity
2009-03-15 10:03           ` Ingo Molnar
2009-03-15 10:13             ` Avi Kivity
2009-03-15 10:13               ` Avi Kivity
2009-03-16  9:49     ` Avi Kivity
2009-03-16  9:49       ` Avi Kivity
2009-03-16 12:46       ` Kevin Shanahan
2009-03-16 12:46         ` Kevin Shanahan
2009-03-16 20:07         ` Frederic Weisbecker
2009-03-16 20:07           ` Frederic Weisbecker
2009-03-16 22:55           ` Kevin Shanahan
2009-03-16 22:55             ` Kevin Shanahan
2009-03-18  0:20             ` Frederic Weisbecker
2009-03-18  0:20               ` Frederic Weisbecker
2009-03-18  1:16               ` Kevin Shanahan
2009-03-18  1:16                 ` Kevin Shanahan
2009-03-18  2:24                 ` Frederic Weisbecker
2009-03-18  2:24                   ` Frederic Weisbecker
2009-03-18 21:24                 ` Kevin Shanahan
2009-03-21  5:00                   ` Kevin Shanahan
2009-03-21  5:00                     ` Kevin Shanahan
2009-03-21 14:08                     ` Frederic Weisbecker
2009-03-21 14:08                       ` Frederic Weisbecker
2009-03-24 11:44                     ` Frederic Weisbecker
2009-03-24 11:44                       ` Frederic Weisbecker
2009-03-24 11:47                       ` Frederic Weisbecker
2009-03-24 11:47                         ` Frederic Weisbecker
2009-03-25 23:40                       ` Kevin Shanahan
2009-03-25 23:48                         ` Frederic Weisbecker
2009-03-25 23:48                           ` Frederic Weisbecker
2009-03-26 20:22                       ` Kevin Shanahan
2009-03-26 20:22                         ` Kevin Shanahan
2009-03-03 19:34 2.6.29-rc6-git7: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-03-03 19:41 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-03-03 19:41   ` Rafael J. Wysocki
2009-03-04  3:08   ` Kevin Shanahan
2009-03-04  3:08     ` Kevin Shanahan
2009-03-08 10:04     ` Avi Kivity
2009-03-08 10:04       ` Avi Kivity
2009-02-23 22:00 2.6.29-rc6: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-02-23 22:03 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-02-23 22:03   ` Rafael J. Wysocki
2009-02-24  0:59   ` Kevin Shanahan
2009-02-24  0:59     ` Kevin Shanahan
2009-02-24  1:37     ` Rafael J. Wysocki
2009-02-24  1:37       ` Rafael J. Wysocki
2009-02-24 12:09     ` Avi Kivity
2009-02-24 12:09       ` Avi Kivity
2009-02-24 22:11       ` Kevin Shanahan
2009-02-24 22:11         ` Kevin Shanahan
2009-02-04 10:55 2.6.29-rc3-git6: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-02-04 10:58 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-02-04 10:58   ` Rafael J. Wysocki
2009-02-05 19:35   ` Kevin Shanahan
2009-02-05 19:35     ` Kevin Shanahan
2009-02-05 22:37     ` Rafael J. Wysocki
2009-02-05 22:37       ` Rafael J. Wysocki
2009-01-19 21:41 2.6.29-rc2-git1: Reported regressions 2.6.27 -> 2.6.28 Rafael J. Wysocki
2009-01-19 21:45 ` [Bug #12465] KVM guests stalling on 2.6.28 (bisected) Rafael J. Wysocki
2009-01-19 21:45   ` Rafael J. Wysocki
2009-01-20  0:12   ` Kevin Shanahan
2009-01-20 11:35     ` Ingo Molnar
2009-01-20 11:35       ` Ingo Molnar
2009-01-20 12:37       ` Avi Kivity
2009-01-20 12:37         ` Avi Kivity
2009-01-20 12:42       ` Kevin Shanahan
2009-01-20 12:42         ` Kevin Shanahan
2009-01-20 12:56         ` Ingo Molnar
2009-01-20 12:56           ` Ingo Molnar
2009-01-20 13:07           ` Ingo Molnar
2009-01-20 13:07             ` Ingo Molnar
2009-01-20 14:59             ` Steven Rostedt
2009-01-20 14:59               ` Steven Rostedt
2009-01-20 15:04               ` Ingo Molnar
2009-01-20 15:04                 ` Ingo Molnar
2009-01-20 17:53                 ` Steven Rostedt
2009-01-20 17:53                   ` Steven Rostedt
2009-01-20 18:39                   ` Ingo Molnar
2009-01-20 18:39                     ` Ingo Molnar
2009-01-20 17:47               ` Avi Kivity
2009-01-20 17:47                 ` Avi Kivity
2009-01-21 14:25                 ` Kevin Shanahan
2009-01-21 14:25                   ` Kevin Shanahan
2009-01-21 14:34                   ` Avi Kivity
2009-01-21 14:34                     ` Avi Kivity
2009-01-21 14:51                     ` Kevin Shanahan
2009-01-21 14:51                       ` Kevin Shanahan
2009-01-21 14:59                       ` Avi Kivity
2009-01-21 14:59                         ` Avi Kivity
2009-01-21 15:13                         ` Steven Rostedt
2009-01-21 15:13                           ` Steven Rostedt
2009-01-22  1:48                         ` Steven Rostedt
2009-01-22  1:48                           ` Steven Rostedt
2009-01-21 15:10                     ` Steven Rostedt
2009-01-21 15:10                       ` Steven Rostedt
2009-01-21 15:18                     ` Ingo Molnar
2009-01-21 15:18                       ` Ingo Molnar
2009-01-22 19:57                       ` Kevin Shanahan
2009-01-22 20:31                         ` Ingo Molnar
2009-01-22 20:31                           ` Ingo Molnar
2009-01-26  9:55                       ` Kevin Shanahan
2009-01-26  9:55                         ` Kevin Shanahan
2009-01-26 11:35                         ` Peter Zijlstra
2009-01-26 15:00                           ` Ingo Molnar
2009-01-26 15:00                             ` Ingo Molnar
2009-01-20 14:23           ` Kevin Shanahan
2009-01-20 14:23             ` Kevin Shanahan
2009-01-20 14:25             ` Ingo Molnar
2009-01-20 14:25               ` Ingo Molnar
2009-01-20 15:51               ` Kevin Shanahan
2009-01-20 15:51                 ` Kevin Shanahan
2009-01-20 16:06                 ` Ingo Molnar
2009-01-20 16:06                   ` Ingo Molnar
2009-01-20 16:19                   ` Peter Zijlstra
2009-01-20 16:19                     ` Peter Zijlstra
2009-01-20 14:46             ` Frédéric Weisbecker
2009-01-20 14:46               ` Frédéric Weisbecker
2009-01-20 13:04         ` Avi Kivity
2009-01-20 13:04           ` Avi Kivity
2009-01-20 17:54           ` Kevin Shanahan
2009-01-20 17:54             ` Kevin Shanahan
2009-01-20 18:42             ` Ingo Molnar
2009-01-20 18:42               ` Ingo Molnar
