linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 2.6.24-rc4-git5: Reported regressions from 2.6.23
@ 2007-12-08  2:40 Rafael J. Wysocki
  2007-12-08  6:53 ` Fabio Comolli
                   ` (9 more replies)
  0 siblings, 10 replies; 103+ messages in thread
From: Rafael J. Wysocki @ 2007-12-08  2:40 UTC (permalink / raw)
  To: LKML; +Cc: Andrew Morton, Linus Torvalds, Ingo Molnar

This message contains a list of some regressions from 2.6.23 which have been
reported since 2.6.24-rc1 was released and for which there are no fixes in the
mainline that I know of.  If any of them have been fixed already, please let me
know.

If you know of any other unresolved regressions from 2.6.23, please let me know
either and I'll add them to the list.


Subject		: EHCI causes system to resume instantly from S4
Submitter	: Maxim Levitsky <maximlevitsky@gmail.com>
References	: http://lkml.org/lkml/2007/10/27/66
		  http://bugzilla.kernel.org/show_bug.cgi?id=9258
Handled-By	: "Rafael J. Wysocki" <rjw@sisk.pl>
		  David Brownell <david-b@pacbell.net>
		  Alan Stern <stern@rowland.harvard.edu>
Patch		: 
Note:		: the problem appears to heavily depend on hardware


Subject		: leds: ledtrig-timer calls sleeping function from invalid context
Submitter	: Márton Németh <nm127@freemail.hu>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9264
Handled-By	: Richard Purdie <rpurdie@rpsys.net>
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13493&action=view


Subject		: PATA scan: ACPI Exception AE_AML_PACKAGE_LIMIT... is beyond end of object
Submitter	: Hans de Bruin <bruinjm@xs4all.nl>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9320
Handled-By	: Robert Moore <Robert.Moore@intel.com>
		  Tejun Heo <htejun@gmail.com>
		  Fu Michael <michael.fu@intel.com>
Patch		: 


Subject		: snd_hda_intel 2.6.24-rc2 bug: interrupts don't always work on Lenovo X60s
Submitter	: Roland Dreier <rdreier@cisco.com>
References	: http://lkml.org/lkml/2007/11/8/255
		  http://bugzilla.kernel.org/show_bug.cgi?id=9332
Handled-By	: 
Patch		: 


Subject		: system hangs after a few minutes
Submitter	: Marcus Better <marcus@better.se>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9335
Handled-By	: Andrew Morton <akpm@linux-foundation.org>
		  Alan Stern <stern@rowland.harvard.edu>
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13871&action=view


Subject		: cd/dvd inaccessible in 2.6.24-rc2
Submitter	: Will Trives <will@trivescon.com.au>
References	: http://lkml.org/lkml/2007/11/9/290
		  http://bugzilla.kernel.org/show_bug.cgi?id=9346
Handled-By	: Len Brown <lenb@kernel.org>
		  Tejun Heo <htejun@gmail.com>
Patch		: 


Subject		: The keyboard doesn't work
Submitter	: Francois Valenduc <francois.valenduc@skynet.be>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9362
Handled-By	: Dmitry Torokhov <dtor@insightbb.com>
		  Ingo Molnar <mingo@elte.hu>
		  Alexey Starikovskiy <astarikovskiy@suse.de>
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13892&action=view
		  http://bugzilla.kernel.org/attachment.cgi?id=13893&action=view
		  http://bugzilla.kernel.org/attachment.cgi?id=13907&action=view
Note		: patches to apply in this order, top-down


Subject		: v2.6.24-rc2-409-g9418d5d: attempt to access beyond end of device
Submitter	: Thomas Meyer <thomas@m3y3r.de>
References	: http://lkml.org/lkml/2007/11/13/250
		  http://bugzilla.kernel.org/show_bug.cgi?id=9370
Handled-By	: Matthew Wilcox <matthew@wil.cx>
Patch		: 


Subject		: SError: { DevExch } occuring and causing disruption
Submitter	: Avuton Olrich <avuton@gmail.com>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9393
Handled-By	: Tejun Heo <htejun@gmail.com>
		  Mark Lord <mlord@pobox.com>
Patch		: 


Subject		: nfsd gets stuck when underlying filesystem is XFS
Submitter	: Christian Kujau <lists@nerdbynature.de>
		  Chris Wedgwood <cw@f00f.org>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9400
Handled-By	: "J. Bruce Fields" <bfields@fieldses.org>
		  Christoph Hellwig <hch@infradead.org>
Patch		: http://lkml.org/lkml/2007/11/25/39


Subject		: 2.6.24-rc3: find complains about /proc/net
Submitter	: Pavel Machek <pavel@ucw.cz>
References	: http://lkml.org/lkml/2007/11/19/253
		  http://bugzilla.kernel.org/show_bug.cgi?id=9411
Handled-By	: "Eric W. Biederman" <ebiederm@xmission.com>
Patch		:
Note		: the existing fix needs fixing


Subject		: Not work light of button-led with module b43 in chipset broadcom 4318
Submitter	: Cristian Aravena Romero <caravena@gmail.com>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9414
Handled-By	: 
Patch		: 


Subject		: unable to turn cooling device 'off' - LG LE50 Express laptop
Submitter	: Marcus Better <marcus@better.se>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9432
Handled-By	: Len Brown <lenb@kernel.org>
		  Alexey Starikovskiy <astarikovskiy@suse.de>
Patch		: 


Subject		: 2.6.24-rc3 can't see sd partitions on Alpha
Submitter	: Bob Tracy <rct@gherkin.frus.com>
References	: http://lkml.org/lkml/2007/11/18/3
		  http://bugzilla.kernel.org/show_bug.cgi?id=9457
Handled-By	: Andrew Morton <akpm@linux-foundation.org>
		  Kay Sievers <kay.sievers@vrfy.org>
		  Ingo Molnar <mingo@elte.hu>
Patch		: 


Subject		: 2.6.24-rc3-git2 softlockup detected
Submitter	: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
References	: http://lkml.org/lkml/2007/11/28/16
		  http://bugzilla.kernel.org/show_bug.cgi?id=9472
Handled-By	: Andrew Morton <akpm@linux-foundation.org>
		  Ingo Molnar <mingo@elte.hu>
Patch		: 


Subject		: jiffies counter leaps in 2.6.24-rc3
Submitter	: Stefano Brivio <stefano.brivio@polimi.it>
References	: http://lkml.org/lkml/2007/11/24/53
		  http://bugzilla.kernel.org/show_bug.cgi?id=9475
Handled-By	: Ingo Molnar <mingo@elte.hu>
Patch		: http://lkml.org/lkml/2007/12/7/132


Subject		: kernel GPF in 2.6.24 (g09f345da)
Submitter	: Jon Nelson <jnelson-kernel-bugzilla@jamponi.net>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9482
Handled-By	: Andrew Morton <akpm@linux-foundation.org>
Patch		: 


Subject		: 20000+ wake-ups/second in 2.6.24
Submitter	: Mark Lord <lkml@rtr.ca>
References	: http://lkml.org/lkml/2007/12/1/141
		  http://bugzilla.kernel.org/show_bug.cgi?id=9489
Handled-By	: Arjan van de Ven <arjan@infradead.org>
Patch		: 


Subject		: 2.6.24: false double-clicks from USB mouse
Submitter	: Mark Lord <lkml@rtr.ca>
References	: http://lkml.org/lkml/2007/12/2/86
		  http://bugzilla.kernel.org/show_bug.cgi?id=9492
Handled-By	: Jiri Kosina <jkosina@suse.cz>
Patch		: 


Subject		: Battery shows up twice in kpowersave
Submitter	: Rolf Eike Beer <eike-kernel@sf-tec.de>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9494
Handled-By	: Alexey Starikovskiy <astarikovskiy@suse.de>
Patch		: 


Subject		: kobject ->k_name memory leak
Submitter	: Alexey Dobriyan <adobriyan@sw.ru>
References	: http://lkml.org/lkml/2007/12/3/20
		  http://bugzilla.kernel.org/show_bug.cgi?id=9496
Handled-By	: Greg KH <gregkh@suse.de>
Patch		: 


Subject		: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
Submitter	: Ingo Molnar <mingo@elte.hu>
References	: http://lkml.org/lkml/2007/11/29/157
		  http://bugzilla.kernel.org/show_bug.cgi?id=9497
Handled-By	: Matt Mackall <mpm@selenic.com>
Patch		: http://lkml.org/lkml/2007/11/29/387


Subject		: Regression - 2.6.24-rc3 - umem nvram card driver oops
Submitter	: David Chinner <dgc@sgi.com>
References	: http://lkml.org/lkml/2007/12/3/216
		  http://bugzilla.kernel.org/show_bug.cgi?id=9498
Handled-By	: Neil Brown <neilb@suse.de>
Patch		: http://lkml.org/lkml/2007/12/3/266


Subject		: PS3: trouble with SPARSEMEM_VMEMMAP and kexec
Submitter	: Geoff Levand <geoffrey.levand@am.sony.com>
References	: http://lkml.org/lkml/2007/12/3/137
		  http://bugzilla.kernel.org/show_bug.cgi?id=9499
Handled-By	: Milton Miller <miltonm@bga.com>
		  Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
		  Yasunori Goto <y-goto@jp.fujitsu.com>
Patch		: 


Subject		: binfmt_misc file system is empty
Submitter	: Marcus Better <marcus@better.se>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9504
Handled-By	: Denis V. Lunev <den@openvz.org>
Patch		:


Subject		: 2.6.24-rc4 hwmon it87 probe fails
Submitter	: Mike Houston <mikeserv@bmts.com>
References	: http://lkml.org/lkml/2007/12/4/466
		  http://bugzilla.kernel.org/show_bug.cgi?id=9514
Handled-By	: 
Patch		: 


Subject		: Major regression on hackbench with SLUB
Submitter	: Steven Rostedt <rostedt@goodmis.org>
References	: http://lkml.org/lkml/2007/12/7/181
		  http://bugzilla.kernel.org/show_bug.cgi?id=9521
Handled-By	: Linus Torvalds <torvalds@linux-foundation.org>
Patch		: 


Subject		: 2.6.24-rc3-git4 NFS crossmnt regression
Submitter	: Shane <gnome42@gmail.com>
References	: http://lkml.org/lkml/2007/12/6/410
		  http://bugzilla.kernel.org/show_bug.cgi?id=9522
Handled-By	: "Trond Myklebust" <trond.myklebust@fys.uio.no>
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13908&action=view


Subject		: soft lockup - CPU#1 stuck for 15s! [swapper:0]
Submitter	: "Parag Warudkar" <parag.warudkar@gmail.com>
References	: http://lkml.org/lkml/2007/12/7/299
		  http://bugzilla.kernel.org/show_bug.cgi?id=9525
Handled-By	: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
Patch		: 


For details, please follow the links given in references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.23,
unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=9243

Please let me know if there are any Bugzilla entries that should be added to
the list in there.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  2:40 2.6.24-rc4-git5: Reported regressions from 2.6.23 Rafael J. Wysocki
@ 2007-12-08  6:53 ` Fabio Comolli
  2007-12-08  8:28   ` Ingo Molnar
  2007-12-08  9:29 ` Andrew Morton
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 103+ messages in thread
From: Fabio Comolli @ 2007-12-08  6:53 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, Andrew Morton, Linus Torvalds, Ingo Molnar

Hi.

On Dec 8, 2007 3:40 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> This message contains a list of some regressions from 2.6.23 which have been
> reported since 2.6.24-rc1 was released and for which there are no fixes in the
> mainline that I know of. If any of them have been fixed already, please let me
> know.
>
> If you know of any other unresolved regressions from 2.6.23, please let me know
> either and I'll add them to the list.

<snip>

> Subject         : Battery shows up twice in kpowersave
> Submitter       : Rolf Eike Beer <eike-kernel@sf-tec.de>
> References      : http://bugzilla.kernel.org/show_bug.cgi?id=9494
> Handled-By      : Alexey Starikovskiy <astarikovskiy@suse.de>
> Patch           :
>

I don't think that this is a regression: I reported on RedHat bugzilla
when I switched from F7 to F8 and I was using 2.6.23.8 at that time.
It looks to me an HAL regression, but of course I may be wrong :-) as
the reported bisected to a bad commit.

https://bugzilla.redhat.com/show_bug.cgi?id=373041

By the way, I now switched to Fedrora Rawhide with a 2.6.24-rc4-git5
custom kernel and Gnome desktop and the problem is still present, even
with gnome-power-manager.

Hope this helps.
Regards,
Fabio

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  6:53 ` Fabio Comolli
@ 2007-12-08  8:28   ` Ingo Molnar
  2007-12-08  9:23     ` Andrew Morton
  0 siblings, 1 reply; 103+ messages in thread
From: Ingo Molnar @ 2007-12-08  8:28 UTC (permalink / raw)
  To: Fabio Comolli
  Cc: Rafael J. Wysocki, LKML, Andrew Morton, Linus Torvalds, Greg KH,
	Len Brown


* Fabio Comolli <fabio.comolli@gmail.com> wrote:

> <snip>
> 
> > Subject         : Battery shows up twice in kpowersave
> > Submitter       : Rolf Eike Beer <eike-kernel@sf-tec.de>
> > References      : http://bugzilla.kernel.org/show_bug.cgi?id=9494
> > Handled-By      : Alexey Starikovskiy <astarikovskiy@suse.de>
> > Patch           :
> >
> 
> I don't think that this is a regression: I reported on RedHat bugzilla 
> when I switched from F7 to F8 and I was using 2.6.23.8 at that time. 
> It looks to me an HAL regression, but of course I may be wrong :-) as 
> the reported bisected to a bad commit.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=373041
> 
> By the way, I now switched to Fedrora Rawhide with a 2.6.24-rc4-git5 
> custom kernel and Gnome desktop and the problem is still present, even 
> with gnome-power-manager.

to me this looks like an ABI regression - utilities should work without 
change. Something changed in /sys output that caused HAL to think that 
there are two batteries:

| The output of lshal shows that there are two UDI's with 
| info.capabilities = { 'battery' }:
|
| udi = '/org/freedesktop/Hal/devices/acpi_BAT0'
| udi = '/org/freedesktop/Hal/devices/computer_power_supply_0'

whether it's a HAL bug or a kernel bug, the original state should be 
restored and it should be worked out without breaking users of older HAL 
versions.

grumble: way too many times do various system utilities break when i 
upgrade the kernel on my laptop. Maybe a new debug mechanism: we should 
start fingerprinting the exact /sys and /proc output and enforce that 
it's immutable across kernel releases as long as the hardware is 
unmodified?

	Ingo

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  8:28   ` Ingo Molnar
@ 2007-12-08  9:23     ` Andrew Morton
  2007-12-08 22:11       ` Rafael J. Wysocki
  0 siblings, 1 reply; 103+ messages in thread
From: Andrew Morton @ 2007-12-08  9:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Fabio Comolli, Rafael J. Wysocki, LKML, Linus Torvalds, Greg KH,
	Len Brown, Alexey Starikovskiy

On Sat, 8 Dec 2007 09:28:15 +0100 Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Fabio Comolli <fabio.comolli@gmail.com> wrote:
> 
> > <snip>
> > 
> > > Subject         : Battery shows up twice in kpowersave
> > > Submitter       : Rolf Eike Beer <eike-kernel@sf-tec.de>
> > > References      : http://bugzilla.kernel.org/show_bug.cgi?id=9494
> > > Handled-By      : Alexey Starikovskiy <astarikovskiy@suse.de>
> > > Patch           :
> > >
> > 
> > I don't think that this is a regression: I reported on RedHat bugzilla 
> > when I switched from F7 to F8 and I was using 2.6.23.8 at that time. 
> > It looks to me an HAL regression, but of course I may be wrong :-) as 
> > the reported bisected to a bad commit.
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=373041
> > 
> > By the way, I now switched to Fedrora Rawhide with a 2.6.24-rc4-git5 
> > custom kernel and Gnome desktop and the problem is still present, even 
> > with gnome-power-manager.
> 
> to me this looks like an ABI regression - utilities should work without 
> change. Something changed in /sys output that caused HAL to think that 
> there are two batteries:

Yep.  Although HAL is of course a most special case of "userspace".

> | The output of lshal shows that there are two UDI's with 
> | info.capabilities = { 'battery' }:
> |
> | udi = '/org/freedesktop/Hal/devices/acpi_BAT0'
> | udi = '/org/freedesktop/Hal/devices/computer_power_supply_0'
> 
> whether it's a HAL bug or a kernel bug, the original state should be 
> restored and it should be worked out without breaking users of older HAL 
> versions.

"breaking users of older HAL versions" == "breaking machines".

The patch should be reverted.  Do we know which one it was?

> grumble: way too many times do various system utilities break when i 
> upgrade the kernel on my laptop. Maybe a new debug mechanism: we should 
> start fingerprinting the exact /sys and /proc output and enforce that 
> it's immutable across kernel releases as long as the hardware is 
> unmodified?

That would be neat.  It would need to be executed on a lot of different
machines.

I wonder if there's something sneaky we can do here.  Install the script in
/lib/modules/$(uname -r) and then run it from the kernel when the fork
count reaches 1000 ;)

(hey, I've seen worse: /proc files which start with #!/bin/sh)

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  2:40 2.6.24-rc4-git5: Reported regressions from 2.6.23 Rafael J. Wysocki
  2007-12-08  6:53 ` Fabio Comolli
@ 2007-12-08  9:29 ` Andrew Morton
  2007-12-08 22:17   ` Rafael J. Wysocki
  2007-12-08  9:30 ` tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23] Ingo Molnar
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 103+ messages in thread
From: Andrew Morton @ 2007-12-08  9:29 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Linus Torvalds, Ingo Molnar, Márton Németh

On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> This message contains a list of some regressions from 2.6.23 which have been
> reported since 2.6.24-rc1 was released and for which there are no fixes in the
> mainline that I know of.  If any of them have been fixed already, please let me
> know.
> 
> If you know of any other unresolved regressions from 2.6.23, please let me know
> either and I'll add them to the list.

Twenty nine, huh?

It would be useful if these records were sorted in date-of-reportage order
and had a date stamp so we could see how long they've been hanging about.
Something to think about for the post-2.6.24 regression if you'll be handling
those?

> Subject		: leds: ledtrig-timer calls sleeping function from invalid context
> Submitter	: Márton Németh <nm127@freemail.hu>
> References	: http://bugzilla.kernel.org/show_bug.cgi?id=9264
> Handled-By	: Richard Purdie <rpurdie@rpsys.net>
> Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13493&action=view

That patch has been merged (dc47206e552c0850ad11f7e9a1fca0a3c92f5d65) and
assuming Márton has tested the latest git snapshot
(ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots) successfully we can
cross it off?  

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-08  2:40 2.6.24-rc4-git5: Reported regressions from 2.6.23 Rafael J. Wysocki
  2007-12-08  6:53 ` Fabio Comolli
  2007-12-08  9:29 ` Andrew Morton
@ 2007-12-08  9:30 ` Ingo Molnar
  2007-12-08 10:11   ` Andrew Morton
  2007-12-08 16:37   ` Matt Mackall
  2007-12-08  9:36 ` 2.6.24-rc4-git5: Reported regressions from 2.6.23 Andrew Morton
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 103+ messages in thread
From: Ingo Molnar @ 2007-12-08  9:30 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, Andrew Morton, Linus Torvalds, Matt Mackall


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> Subject		: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
> Submitter	: Ingo Molnar <mingo@elte.hu>
> References	: http://lkml.org/lkml/2007/11/29/157
> 		  http://bugzilla.kernel.org/show_bug.cgi?id=9497
> Handled-By	: Matt Mackall <mpm@selenic.com>
> Patch		: http://lkml.org/lkml/2007/11/29/387

Matt, the above bug is still occuring en masse during random bootups:

[   67.828312] WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
[   41.270905] WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
[   82.378714] WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
[   35.308126] WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
[   35.484155] WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()

and the patch does not seem to be upstream yet.

	Ingo

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  2:40 2.6.24-rc4-git5: Reported regressions from 2.6.23 Rafael J. Wysocki
                   ` (2 preceding siblings ...)
  2007-12-08  9:30 ` tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23] Ingo Molnar
@ 2007-12-08  9:36 ` Andrew Morton
  2007-12-08 10:12   ` Andreas Mohr
  2007-12-09  6:52   ` Tejun Heo
  2007-12-08  9:42 ` Andrew Morton
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 103+ messages in thread
From: Andrew Morton @ 2007-12-08  9:36 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, Linus Torvalds, Ingo Molnar, linux-ide, Tejun Heo

On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> This message contains a list of some regressions from 2.6.23 which have been
> reported since 2.6.24-rc1 was released and for which there are no fixes in the
> mainline that I know of.  If any of them have been fixed already, please let me
> know.
> 
> If you know of any other unresolved regressions from 2.6.23, please let me know
> either and I'll add them to the list.
> 
> 
> Subject		: PATA scan: ACPI Exception AE_AML_PACKAGE_LIMIT... is beyond end of object
> Submitter	: Hans de Bruin <bruinjm@xs4all.nl>
> References	: http://bugzilla.kernel.org/show_bug.cgi?id=9320
> Handled-By	: Robert Moore <Robert.Moore@intel.com>
> 		  Tejun Heo <htejun@gmail.com>
> 		  Fu Michael <michael.fu@intel.com>
> Patch		: 
> 

A number of other people are seeing the same thing and Tejun is putting in
a blacklist of machines which cannot use libata+acpi.  That patch is not
yet in any git tree which I pull.

AFACIT the machines kepe working OK - there's just some nasty dmesg spew.

If any machines _are_ breaking then this could cause real problems and I'd
prefer that we either go for a whitelist or arrange to detect the condition
and fall back to non-acpi ata.


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  2:40 2.6.24-rc4-git5: Reported regressions from 2.6.23 Rafael J. Wysocki
                   ` (3 preceding siblings ...)
  2007-12-08  9:36 ` 2.6.24-rc4-git5: Reported regressions from 2.6.23 Andrew Morton
@ 2007-12-08  9:42 ` Andrew Morton
  2007-12-08 18:57   ` Roland Dreier
  2007-12-08 19:40   ` Theodore Tso
  2007-12-08  9:46 ` Andrew Morton
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 103+ messages in thread
From: Andrew Morton @ 2007-12-08  9:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Linus Torvalds, Ingo Molnar, Roland Dreier, Takashi Iwai,
	Theodore Ts'o

On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> This message contains a list of some regressions from 2.6.23 which have been
> reported since 2.6.24-rc1 was released and for which there are no fixes in the
> mainline that I know of.  If any of them have been fixed already, please let me
> know.
> 
> If you know of any other unresolved regressions from 2.6.23, please let me know
> either and I'll add them to the list.
> 
> ...
> 
> Subject		: snd_hda_intel 2.6.24-rc2 bug: interrupts don't always work on Lenovo X60s
> Submitter	: Roland Dreier <rdreier@cisco.com>
> References	: http://lkml.org/lkml/2007/11/8/255
> 		  http://bugzilla.kernel.org/show_bug.cgi?id=9332
> Handled-By	: 
> Patch		: 

Takashi had a patch and that has been merged.  AFAIK this regression
has been fixed and we're left with a new but harmless warning.

However Roland reported other problems and it appears that the trail went
cold (http://lkml.org/lkml/2007/11/14/251)

Ted was hitting some of the same problems but that trail appears to also
have gone cold (http://lkml.org/lkml/2007/11/23/17).

Guys, can we have a status update on all of this please?

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  2:40 2.6.24-rc4-git5: Reported regressions from 2.6.23 Rafael J. Wysocki
                   ` (4 preceding siblings ...)
  2007-12-08  9:42 ` Andrew Morton
@ 2007-12-08  9:46 ` Andrew Morton
  2007-12-08 15:49   ` Alan Stern
  2007-12-08  9:52 ` Andrew Morton
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 103+ messages in thread
From: Andrew Morton @ 2007-12-08  9:46 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, Linus Torvalds, Ingo Molnar, Alan Stern

On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> This message contains a list of some regressions from 2.6.23 which have been
> reported since 2.6.24-rc1 was released and for which there are no fixes in the
> mainline that I know of.  If any of them have been fixed already, please let me
> know.
> 
> If you know of any other unresolved regressions from 2.6.23, please let me know
> either and I'll add them to the list.
> 
> 
> ..
>
> Subject		: system hangs after a few minutes
> Submitter	: Marcus Better <marcus@better.se>
> References	: http://bugzilla.kernel.org/show_bug.cgi?id=9335
> Handled-By	: Andrew Morton <akpm@linux-foundation.org>
> 		  Alan Stern <stern@rowland.harvard.edu>
> Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13871&action=view
> 

This one we have a confirmed fix from Alan but it doesn't appear to be in
anyone's tree.

There is a second bug in here, applicable to core x86: Marcus's machine
won't boot with nmi_watchdog=1.


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  2:40 2.6.24-rc4-git5: Reported regressions from 2.6.23 Rafael J. Wysocki
                   ` (5 preceding siblings ...)
  2007-12-08  9:46 ` Andrew Morton
@ 2007-12-08  9:52 ` Andrew Morton
  2007-12-09  7:00   ` Tejun Heo
  2007-12-08 10:44 ` Richard Purdie
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 103+ messages in thread
From: Andrew Morton @ 2007-12-08  9:52 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, Linus Torvalds, Ingo Molnar, Tejun Heo

On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> This message contains a list of some regressions from 2.6.23 which have been
> reported since 2.6.24-rc1 was released and for which there are no fixes in the
> mainline that I know of.  If any of them have been fixed already, please let me
> know.
> 
> If you know of any other unresolved regressions from 2.6.23, please let me know
> either and I'll add them to the list.
> 
> ...
> 
> Subject		: cd/dvd inaccessible in 2.6.24-rc2
> Submitter	: Will Trives <will@trivescon.com.au>
> References	: http://lkml.org/lkml/2007/11/9/290
> 		  http://bugzilla.kernel.org/show_bug.cgi?id=9346
> Handled-By	: Len Brown <lenb@kernel.org>
> 		  Tejun Heo <htejun@gmail.com>
> Patch		: 
> 

Nasty one.  Tejun and several diligent reporters are doing sterling work
there and things have improved.  I don't know whether any of Tejun's
patches have been merged yet, but we'll probably be OK on this one.

What is unclear (to me) is what actually caused those people's machines to
break?


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-08  9:30 ` tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23] Ingo Molnar
@ 2007-12-08 10:11   ` Andrew Morton
  2007-12-08 16:37   ` Matt Mackall
  1 sibling, 0 replies; 103+ messages in thread
From: Andrew Morton @ 2007-12-08 10:11 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Matt Mackall

On Sat, 8 Dec 2007 10:30:39 +0100 Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > Subject		: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
> > Submitter	: Ingo Molnar <mingo@elte.hu>
> > References	: http://lkml.org/lkml/2007/11/29/157
> > 		  http://bugzilla.kernel.org/show_bug.cgi?id=9497
> > Handled-By	: Matt Mackall <mpm@selenic.com>
> > Patch		: http://lkml.org/lkml/2007/11/29/387
> 
> Matt, the above bug is still occuring en masse during random bootups:
> 
> [   67.828312] WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
> WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
> WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
> [   41.270905] WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
> WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
> WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
> [   82.378714] WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
> [   35.308126] WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
> WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
> [   35.484155] WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
> WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
> WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
> WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
> WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
> WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
> 
> and the patch does not seem to be upstream yet.
> 

err, umm, OK, I'll go hunt it down and take a closer look at it.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  9:36 ` 2.6.24-rc4-git5: Reported regressions from 2.6.23 Andrew Morton
@ 2007-12-08 10:12   ` Andreas Mohr
  2007-12-08 10:20     ` Andrew Morton
  2007-12-09  6:52   ` Tejun Heo
  1 sibling, 1 reply; 103+ messages in thread
From: Andreas Mohr @ 2007-12-08 10:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide,
	Tejun Heo

Hi,

On Sat, Dec 08, 2007 at 01:36:31AM -0800, Andrew Morton wrote:
> > Subject		: PATA scan: ACPI Exception AE_AML_PACKAGE_LIMIT... is beyond end of object
> > Submitter	: Hans de Bruin <bruinjm@xs4all.nl>
> > References	: http://bugzilla.kernel.org/show_bug.cgi?id=9320
> > Handled-By	: Robert Moore <Robert.Moore@intel.com>
> > 		  Tejun Heo <htejun@gmail.com>
> > 		  Fu Michael <michael.fu@intel.com>
> > Patch		: 
> > 
> 
> A number of other people are seeing the same thing and Tejun is putting in
> a blacklist of machines which cannot use libata+acpi.  That patch is not
> yet in any git tree which I pull.
> 
> AFACIT the machines kepe working OK - there's just some nasty dmesg spew.
> 
> If any machines _are_ breaking then this could cause real problems and I'd
> prefer that we either go for a whitelist or arrange to detect the condition
> and fall back to non-acpi ata.

Does this report now win me the lucky draw, pretty please? ;)

STD regression rc1 -> rc234, suspend fails completely, recovering is
pretty much useless since HDD is DEAD from this point on anyway.
Managed to capture -rc2 suspend logging via still-alive ssh session.

2.6.24-rc1 suspend/resume log, successful (well, a couple seconds delay, most likely due to
well-recovered AML failure):

swsusp: Marking nosave pages: 000000000009f000 - 0000000000100000
swsusp: Basic memory bitmaps created
Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.00 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
Shrinking memory... done (0 pages freed)
Freed 0 kbytes in 0.02 seconds (0.00 MB/s)
Suspending console(s)
hub 4-0:1.0: hub_suspend
usb usb4: bus suspend
ehci_hcd 0000:00:10.3: suspend root hub
hub 3-0:1.0: hub_suspend
usb usb3: bus suspend
usb usb3: suspend_rh
hub 2-0:1.0: hub_suspend
usb usb2: bus suspend
usb usb2: suspend_rh
hub 1-0:1.0: hub_suspend
usb usb1: bus suspend
usb usb1: suspend_rh
sd 0:0:0:0: [sda] Synchronizing SCSI cache
parport_pc 00:09: disabled
serial 00:08: disabled
serial 00:07: disabled
ACPI: PCI interrupt for device 0000:00:11.5 disabled
ACPI handle has no context!
ACPI: PCI interrupt for device 0000:00:11.1 disabled
ACPI: PCI interrupt for device 0000:00:10.3 disabled
ehci_hcd 0000:00:10.3: --> PCI D3/wakeup
uhci_hcd 0000:00:10.2: uhci_suspend
ACPI: PCI interrupt for device 0000:00:10.2 disabled
uhci_hcd 0000:00:10.2: --> PCI D3
uhci_hcd 0000:00:10.1: uhci_suspend
ACPI: PCI interrupt for device 0000:00:10.1 disabled
uhci_hcd 0000:00:10.1: --> PCI D3
uhci_hcd 0000:00:10.0: uhci_suspend
ACPI: PCI interrupt for device 0000:00:10.0 disabled
uhci_hcd 0000:00:10.0: --> PCI D3
ACPI: PCI interrupt for device 0000:00:0d.0 disabled
ACPI handle has no context!
ACPI: PCI interrupt for device 0000:00:0c.0 disabled
ACPI handle has no context!
pci_set_power_state(): 0000:00:00.0: state=3, current state=5
swsusp: critical section:
swsusp: Need to copy 51195 pages
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
evxfevnt-0079 [00] enable                : System is already in ACPI mode
ACPI: PCI Interrupt Link [ALKA] BIOS reported IRQ 0, using IRQ 20
ACPI: PCI Interrupt Link [ALKB] BIOS reported IRQ 0, using IRQ 21
ACPI: PCI Interrupt Link [ALKC] BIOS reported IRQ 0, using IRQ 22
ACPI: PCI Interrupt Link [ALKD] BIOS reported IRQ 0, using IRQ 23
evxfevnt-0079 [00] enable                : System is already in ACPI mode
ACPI: Unable to turn cooling device [c180ff60] 'off'
PCI: Setting latency timer of device 0000:00:01.0 to 64
ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[19]  MMIO=[db140000-db1407ff]  Max Packet=[2048]  IR/IT contexts=[4/8]
ACPI: PCI Interrupt 0000:00:0a.0[A] -> GSI 18 (level, low) -> IRQ 18
e100: eth-intel: e100_watchdog: link up, 100Mbps, full-duplex
PM: Writing back config space on device 0000:00:0d.0 at offset 1 (was 2100007, writing 2100003)
ACPI: PCI Interrupt 0000:00:0d.0[A] -> GSI 19 (level, low) -> IRQ 22
uhci_hcd 0000:00:10.0: PCI D0, from previous PCI D3
ACPI: PCI Interrupt 0000:00:10.0[A] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20
uhci_hcd 0000:00:10.0: uhci_resume
uhci_hcd 0000:00:10.0: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:10.0: Performing full reset
usb usb1: root hub lost power or was reset
usb usb1: suspend_rh
uhci_hcd 0000:00:10.1: PCI D0, from previous PCI D3
ACPI: PCI Interrupt 0000:00:10.1[B] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20
uhci_hcd 0000:00:10.1: uhci_resume
uhci_hcd 0000:00:10.1: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:10.1: Performing full reset
usb usb2: root hub lost power or was reset
usb usb2: suspend_rh
uhci_hcd 0000:00:10.2: PCI D0, from previous PCI D3
ACPI: PCI Interrupt 0000:00:10.2[C] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20
uhci_hcd 0000:00:10.2: uhci_resume
uhci_hcd 0000:00:10.2: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:10.2: Performing full reset
usb usb3: root hub lost power or was reset
usb usb3: suspend_rh
ehci_hcd 0000:00:10.3: PCI D0, from previous PCI D3
ACPI: PCI Interrupt 0000:00:10.3[D] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20
PM: Writing back config space on device 0000:00:10.3 at offset 3 (was 2008, writing 2010)
PM: Writing back config space on device 0000:00:10.3 at offset 1 (was 2100007, writing 2100017)
PM: Writing back config space on device 0000:00:11.1 at offset 1 (was 2900003, writing 2900007)
ACPI: PCI Interrupt 0000:00:11.1[A] -> Link [ALKA] -> GSI 20 (level, low) -> IRQ 17
ACPI: PCI Interrupt 0000:00:11.5[C] -> Link [ALKC] -> GSI 22 (level, low) -> IRQ 23
PCI: Setting latency timer of device 0000:00:11.5 to 64
ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16
serial 00:07: activated
serial 00:08: activated
parport_pc 00:09: activated
i8042 aux 00:0a: activation failed
i8042 kbd 00:0b: activation failed
sd 0:0:0:0: [sda] Starting disk
ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT
ata1.01: _GTF evaluation failed (AE 0x300d)
ata1.01: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT
ata1.01: _GTF evaluation failed (AE 0x300d)
ata1.01: ACPI on devcfg failed the second time, disabling (errno=-5)
ata1.01: revalidation failed (errno=1)
ata1: failed to recover some devices, retrying in 5 secs
ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV0._GTF] (Node c180b840), AE_AML_PACKAGE_LIMIT
ata1.00: _GTF evaluation failed (AE 0x300d)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV0._GTF] (Node c180b840), AE_AML_PACKAGE_LIMIT
ata1.00: _GTF evaluation failed (AE 0x300d)
ata1.00: ACPI on devcfg failed the second time, disabling (errno=-5)
ata1.00: revalidation failed (errno=1)
ata1: failed to recover some devices, retrying in 5 secs
ata1.00: configured for UDMA/100
ata1.01: configured for UDMA/33
sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
usb usb1: usb resume
usb usb1: wakeup_rh
hub 1-0:1.0: trying to enable port power on non-switchable hub
usb usb2: usb resume
usb usb2: wakeup_rh
hub 2-0:1.0: trying to enable port power on non-switchable hub
usb usb3: usb resume
usb usb3: wakeup_rh
hub 3-0:1.0: trying to enable port power on non-switchable hub
usb usb4: usb resume
ehci_hcd 0000:00:10.3: resume root hub
hub 4-0:1.0: hub_resume
Restarting tasks ... <7>hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0006
uhci_hcd 0000:00:10.0: port 1 portsc 018a,00
hub 1-0:1.0: port 1, status 0300, change 0003, 1.5 Mb/s
done.
swsusp: Basic memory bitmaps freed
hub 1-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x300
uhci_hcd 0000:00:10.0: port 2 portsc 008a,00
hub 1-0:1.0: port 2, status 0100, change 0003, 12 Mb/s
hub 1-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100
hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0006
uhci_hcd 0000:00:10.1: port 1 portsc 018a,00
hub 2-0:1.0: port 1, status 0300, change 0003, 1.5 Mb/s
hub 2-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x300
uhci_hcd 0000:00:10.1: port 2 portsc 008a,00
hub 2-0:1.0: port 2, status 0100, change 0003, 12 Mb/s
hub 2-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100
hub 3-0:1.0: state 7 ports 2 chg 0000 evt 0006
uhci_hcd 0000:00:10.2: port 1 portsc 008a,00
hub 3-0:1.0: port 1, status 0100, change 0003, 12 Mb/s
hub 3-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x100
uhci_hcd 0000:00:10.2: port 2 portsc 008a,00
hub 3-0:1.0: port 2, status 0100, change 0003, 12 Mb/s
hub 3-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100
hub 4-0:1.0: state 7 ports 6 chg 0000 evt 0000
hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0000
hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0000
hub 3-0:1.0: state 7 ports 2 chg 0000 evt 0000
usb usb1: suspend_rh (auto-stop)
usb usb2: suspend_rh (auto-stop)
usb usb3: suspend_rh (auto-stop)
agpgart: Found an AGP 2.0 compliant device at 0000:00:00.0.
agpgart: Putting AGP V2 device at 0000:00:00.0 into 4x mode
agpgart: Putting AGP V2 device at 0000:01:00.0 into 4x mode
[drm] Loading R200 Microcode



2.6.24-rc2 suspend log (one screenful), UNSUCCESSFUL:

serial 00:07: disabled
ACPI: PCI interrupt for device 0000:00:11.5 disabled
ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTM_]
(Node c180b9a8), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN1._GTM] (Node c180b8d0), AE_AML_PACKAGE_LIMIT
ata2: ACPI get timing mode failed (AE 0x300d)
pci_device_suspend(): ata_pci_device_suspend+0x0/0x40() returns -22
suspend_device(): pci_device_suspend+0x0/0x70() returns -22
Could not suspend device 0000:00:11.1: error -22
ACPI: PCI Interrupt 0000:00:11.5[C] -> Link [ALKC] -> GSI 22 (level, low) -> IRQ 23
ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16
serial 00:07: activated
serial 00:08: activated
parport_pc 00:09: activated
i8042 aux 00:0a: activation failed
i8042 kbd 00:0b: activation failed
sd 0:0:0:0: [sda] Starting disk
sd 0:0:0:0: timing out command, waited 180s
sd 0:0:0:0: [sda] START_STOP FAILED
sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK
Restarting tasks ... <7>hub 1-0:1.0: state 7 ports 6 chg 0000 evt 0000
done.
swsusp: Basic memory bitmaps freed
swsusp: Marking nosave pages: 000000000009f000 - 0000000000100000
swsusp: Basic memory bitmaps created
Syncing filesystems ...



# lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]
00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP]
00:09.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 46)
00:0a.0 Multimedia audio controller: Aureal Semiconductor Vortex 2 (rev fe)
00:0c.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
00:0d.0 Multimedia audio controller: Aztech System Ltd 3328 Audio (rev 10)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 50)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon RV250 If [Radeon 9000] (rev 01)
01:00.1 Display controller: ATI Technologies Inc Radeon RV250 [Radeon 9000] (Secondary) (rev 01)


# dmidecode 2.9
SMBIOS 2.2 present.
39 structures occupying 1035 bytes.
Table at 0x000F0800.

Handle 0x0000, DMI type 0, 19 bytes
BIOS Information
        Vendor: Award Software International, Inc.
        Version: 6.00 PG
        Release Date: 09/16/2003
        Address: 0xE0000
        Runtime Size: 128 kB
        ROM Size: 512 kB
        Characteristics:
                ISA is supported
                PCI is supported
                PNP is supported
                APM is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                ESCD support is available
                Boot from CD is supported
                Selectable boot is supported
                BIOS ROM is socketed
                EDD is supported
                5.25"/360 KB floppy services are supported (int 13h)
                5.25"/1.2 MB floppy services are supported (int 13h)
                3.5"/720 KB floppy services are supported (int 13h)
                3.5"/2.88 MB floppy services are supported (int 13h)
                Print screen service is supported (int 5h)
                8042 keyboard services are supported (int 9h)
                Serial services are supported (int 14h)
                Printer services are supported (int 17h)
                CGA/mono video services are supported (int 10h)
                ACPI is supported
                USB legacy is supported
                AGP is supported
                LS-120 boot is supported
                ATAPI Zip drive boot is supported

Handle 0x0001, DMI type 1, 25 bytes
System Information
        Manufacturer: VIA Technologies, Inc.
        Product Name: VT8367-8235
        Version:
        Serial Number:
        UUID: Not Present
        Wake-up Type: Power Switch

Handle 0x0002, DMI type 2, 8 bytes
Base Board Information
        Manufacturer:
        Product Name: VT8367-8235
        Version:
        Serial Number:

Handle 0x0003, DMI type 3, 13 bytes
Chassis Information
        Manufacturer:
        Type: Desktop
        Lock: Not Present
        Version:
        Serial Number:
        Asset Tag:
        Boot-up State: Unknown
        Power Supply State: Unknown
        Thermal State: Unknown
        Security Status: Unknown

Handle 0x0004, DMI type 4, 32 bytes
Processor Information
        Socket Designation: Socket A
        Type: Central Processor
        Family: Duron
        Manufacturer: AMD
        ID: 81 06 00 00 FF FB 83 03
        Signature: Family 6, Model 8, Stepping 1
        Flags:
                FPU (Floating-point unit on-chip)
                VME (Virtual mode extension)
                DE (Debugging extension)
                PSE (Page size extension)
                TSC (Time stamp counter)
                MSR (Model specific registers)
                PAE (Physical address extension)
                MCE (Machine check exception)
                CX8 (CMPXCHG8 instruction supported)
                APIC (On-chip APIC hardware supported)
                SEP (Fast system call)
                MTRR (Memory type range registers)
                PGE (Page global enable)
                MCA (Machine check architecture)
                CMOV (Conditional move instruction supported)
                PAT (Page attribute table)
                PSE-36 (36-bit page size extension)
                MMX (MMX technology supported)
                FXSR (Fast floating-point save and restore)
                SSE (Streaming SIMD extensions)
        Version: AMD K7 processor
        Voltage: 3.3 V
        External Clock: 133 MHz
        Max Speed: 1500 MHz
        Current Speed: 1200 MHz
        Status: Populated, Enabled
        Upgrade: ZIF Socket
        L1 Cache Handle: 0x000A
        L2 Cache Handle: 0x000B
        L3 Cache Handle: No L3 Cache

Handle 0x0005, DMI type 5, 24 bytes
Memory Controller Information
        Error Detecting Method: None
        Error Correcting Capabilities:
                None
        Supported Interleave: One-way Interleave
        Current Interleave: Four-way Interleave
        Maximum Memory Module Size: 32 MB
        Maximum Total Memory Size: 128 MB
        Supported Speeds:
                70 ns
                60 ns
        Supported Memory Types:
                Standard
                EDO
        Memory Module Voltage: 5.0 V
        Associated Memory Slots: 4
                0x0006
                0x0007
                0x0008
                0x0009
        Enabled Error Correcting Capabilities: None

.
.
.


# hdparm -i /dev/sda

/dev/sda:

 Model=WDC WD1200JB-00CRA1                     , FwRev=17.07W17, SerialNo=WD-WCA8C4285629
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40
 BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=?16?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=234441648
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5

 * signifies the current active mode



Athlon on EPOX 8K5A2+ board.



Again, 2.6.23 and 2.6.24-rc1 work, yet 2.6.24 -rc2, -rc3 and -rc4 FAIL.

Probably won't be able to do any reporting over the weekend (WOL is
inoperable ATM for some weird reason), let me know what you need.
Took too much time to gather this report already anyway ;)

Thanks,

Andreas Mohr

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 10:12   ` Andreas Mohr
@ 2007-12-08 10:20     ` Andrew Morton
  2007-12-08 10:28       ` Matthew Garrett
  2007-12-08 10:55       ` Andreas Mohr
  0 siblings, 2 replies; 103+ messages in thread
From: Andrew Morton @ 2007-12-08 10:20 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide,
	Tejun Heo, Len Brown, linux-acpi

On Sat, 8 Dec 2007 11:12:57 +0100 Andreas Mohr <andi@lisas.de> wrote:

> Hi,
> 
> On Sat, Dec 08, 2007 at 01:36:31AM -0800, Andrew Morton wrote:
> > > Subject		: PATA scan: ACPI Exception AE_AML_PACKAGE_LIMIT... is beyond end of object
> > > Submitter	: Hans de Bruin <bruinjm@xs4all.nl>
> > > References	: http://bugzilla.kernel.org/show_bug.cgi?id=9320
> > > Handled-By	: Robert Moore <Robert.Moore@intel.com>
> > > 		  Tejun Heo <htejun@gmail.com>
> > > 		  Fu Michael <michael.fu@intel.com>
> > > Patch		: 
> > > 
> > 
> > A number of other people are seeing the same thing and Tejun is putting in
> > a blacklist of machines which cannot use libata+acpi.  That patch is not
> > yet in any git tree which I pull.
> > 
> > AFACIT the machines kepe working OK - there's just some nasty dmesg spew.
> > 
> > If any machines _are_ breaking then this could cause real problems and I'd
> > prefer that we either go for a whitelist or arrange to detect the condition
> > and fall back to non-acpi ata.
> 
> Does this report now win me the lucky draw, pretty please? ;)

nah, you have to cc the acpi guys to get a prize ;)

Len&co, could you please take a look?

Andreas, please do separately report that WOL problem too..

Our list just reached 30.

> STD regression rc1 -> rc234, suspend fails completely, recovering is
> pretty much useless since HDD is DEAD from this point on anyway.
> Managed to capture -rc2 suspend logging via still-alive ssh session.
> 
> 2.6.24-rc1 suspend/resume log, successful (well, a couple seconds delay, most likely due to
> well-recovered AML failure):
> 
> swsusp: Marking nosave pages: 000000000009f000 - 0000000000100000
> swsusp: Basic memory bitmaps created
> Syncing filesystems ... done.
> Freezing user space processes ... (elapsed 0.00 seconds) done.
> Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
> Shrinking memory... done (0 pages freed)
> Freed 0 kbytes in 0.02 seconds (0.00 MB/s)
> Suspending console(s)
> hub 4-0:1.0: hub_suspend
> usb usb4: bus suspend
> ehci_hcd 0000:00:10.3: suspend root hub
> hub 3-0:1.0: hub_suspend
> usb usb3: bus suspend
> usb usb3: suspend_rh
> hub 2-0:1.0: hub_suspend
> usb usb2: bus suspend
> usb usb2: suspend_rh
> hub 1-0:1.0: hub_suspend
> usb usb1: bus suspend
> usb usb1: suspend_rh
> sd 0:0:0:0: [sda] Synchronizing SCSI cache
> parport_pc 00:09: disabled
> serial 00:08: disabled
> serial 00:07: disabled
> ACPI: PCI interrupt for device 0000:00:11.5 disabled
> ACPI handle has no context!
> ACPI: PCI interrupt for device 0000:00:11.1 disabled
> ACPI: PCI interrupt for device 0000:00:10.3 disabled
> ehci_hcd 0000:00:10.3: --> PCI D3/wakeup
> uhci_hcd 0000:00:10.2: uhci_suspend
> ACPI: PCI interrupt for device 0000:00:10.2 disabled
> uhci_hcd 0000:00:10.2: --> PCI D3
> uhci_hcd 0000:00:10.1: uhci_suspend
> ACPI: PCI interrupt for device 0000:00:10.1 disabled
> uhci_hcd 0000:00:10.1: --> PCI D3
> uhci_hcd 0000:00:10.0: uhci_suspend
> ACPI: PCI interrupt for device 0000:00:10.0 disabled
> uhci_hcd 0000:00:10.0: --> PCI D3
> ACPI: PCI interrupt for device 0000:00:0d.0 disabled
> ACPI handle has no context!
> ACPI: PCI interrupt for device 0000:00:0c.0 disabled
> ACPI handle has no context!
> pci_set_power_state(): 0000:00:00.0: state=3, current state=5
> swsusp: critical section:
> swsusp: Need to copy 51195 pages
> Intel machine check architecture supported.
> Intel machine check reporting enabled on CPU#0.
> evxfevnt-0079 [00] enable                : System is already in ACPI mode
> ACPI: PCI Interrupt Link [ALKA] BIOS reported IRQ 0, using IRQ 20
> ACPI: PCI Interrupt Link [ALKB] BIOS reported IRQ 0, using IRQ 21
> ACPI: PCI Interrupt Link [ALKC] BIOS reported IRQ 0, using IRQ 22
> ACPI: PCI Interrupt Link [ALKD] BIOS reported IRQ 0, using IRQ 23
> evxfevnt-0079 [00] enable                : System is already in ACPI mode
> ACPI: Unable to turn cooling device [c180ff60] 'off'
> PCI: Setting latency timer of device 0000:00:01.0 to 64
> ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[19]  MMIO=[db140000-db1407ff]  Max Packet=[2048]  IR/IT contexts=[4/8]
> ACPI: PCI Interrupt 0000:00:0a.0[A] -> GSI 18 (level, low) -> IRQ 18
> e100: eth-intel: e100_watchdog: link up, 100Mbps, full-duplex
> PM: Writing back config space on device 0000:00:0d.0 at offset 1 (was 2100007, writing 2100003)
> ACPI: PCI Interrupt 0000:00:0d.0[A] -> GSI 19 (level, low) -> IRQ 22
> uhci_hcd 0000:00:10.0: PCI D0, from previous PCI D3
> ACPI: PCI Interrupt 0000:00:10.0[A] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20
> uhci_hcd 0000:00:10.0: uhci_resume
> uhci_hcd 0000:00:10.0: uhci_check_and_reset_hc: cmd = 0x0000
> uhci_hcd 0000:00:10.0: Performing full reset
> usb usb1: root hub lost power or was reset
> usb usb1: suspend_rh
> uhci_hcd 0000:00:10.1: PCI D0, from previous PCI D3
> ACPI: PCI Interrupt 0000:00:10.1[B] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20
> uhci_hcd 0000:00:10.1: uhci_resume
> uhci_hcd 0000:00:10.1: uhci_check_and_reset_hc: cmd = 0x0000
> uhci_hcd 0000:00:10.1: Performing full reset
> usb usb2: root hub lost power or was reset
> usb usb2: suspend_rh
> uhci_hcd 0000:00:10.2: PCI D0, from previous PCI D3
> ACPI: PCI Interrupt 0000:00:10.2[C] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20
> uhci_hcd 0000:00:10.2: uhci_resume
> uhci_hcd 0000:00:10.2: uhci_check_and_reset_hc: cmd = 0x0000
> uhci_hcd 0000:00:10.2: Performing full reset
> usb usb3: root hub lost power or was reset
> usb usb3: suspend_rh
> ehci_hcd 0000:00:10.3: PCI D0, from previous PCI D3
> ACPI: PCI Interrupt 0000:00:10.3[D] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20
> PM: Writing back config space on device 0000:00:10.3 at offset 3 (was 2008, writing 2010)
> PM: Writing back config space on device 0000:00:10.3 at offset 1 (was 2100007, writing 2100017)
> PM: Writing back config space on device 0000:00:11.1 at offset 1 (was 2900003, writing 2900007)
> ACPI: PCI Interrupt 0000:00:11.1[A] -> Link [ALKA] -> GSI 20 (level, low) -> IRQ 17
> ACPI: PCI Interrupt 0000:00:11.5[C] -> Link [ALKC] -> GSI 22 (level, low) -> IRQ 23
> PCI: Setting latency timer of device 0000:00:11.5 to 64
> ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16
> serial 00:07: activated
> serial 00:08: activated
> parport_pc 00:09: activated
> i8042 aux 00:0a: activation failed
> i8042 kbd 00:0b: activation failed
> sd 0:0:0:0: [sda] Starting disk
> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT
> ata1.01: _GTF evaluation failed (AE 0x300d)
> ata1.01: revalidation failed (errno=-5)
> ata1: failed to recover some devices, retrying in 5 secs
> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT
> ata1.01: _GTF evaluation failed (AE 0x300d)
> ata1.01: ACPI on devcfg failed the second time, disabling (errno=-5)
> ata1.01: revalidation failed (errno=1)
> ata1: failed to recover some devices, retrying in 5 secs
> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV0._GTF] (Node c180b840), AE_AML_PACKAGE_LIMIT
> ata1.00: _GTF evaluation failed (AE 0x300d)
> ata1.00: revalidation failed (errno=-5)
> ata1: failed to recover some devices, retrying in 5 secs
> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV0._GTF] (Node c180b840), AE_AML_PACKAGE_LIMIT
> ata1.00: _GTF evaluation failed (AE 0x300d)
> ata1.00: ACPI on devcfg failed the second time, disabling (errno=-5)
> ata1.00: revalidation failed (errno=1)
> ata1: failed to recover some devices, retrying in 5 secs
> ata1.00: configured for UDMA/100
> ata1.01: configured for UDMA/33
> sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> usb usb1: usb resume
> usb usb1: wakeup_rh
> hub 1-0:1.0: trying to enable port power on non-switchable hub
> usb usb2: usb resume
> usb usb2: wakeup_rh
> hub 2-0:1.0: trying to enable port power on non-switchable hub
> usb usb3: usb resume
> usb usb3: wakeup_rh
> hub 3-0:1.0: trying to enable port power on non-switchable hub
> usb usb4: usb resume
> ehci_hcd 0000:00:10.3: resume root hub
> hub 4-0:1.0: hub_resume
> Restarting tasks ... <7>hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0006
> uhci_hcd 0000:00:10.0: port 1 portsc 018a,00
> hub 1-0:1.0: port 1, status 0300, change 0003, 1.5 Mb/s
> done.
> swsusp: Basic memory bitmaps freed
> hub 1-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x300
> uhci_hcd 0000:00:10.0: port 2 portsc 008a,00
> hub 1-0:1.0: port 2, status 0100, change 0003, 12 Mb/s
> hub 1-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100
> hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0006
> uhci_hcd 0000:00:10.1: port 1 portsc 018a,00
> hub 2-0:1.0: port 1, status 0300, change 0003, 1.5 Mb/s
> hub 2-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x300
> uhci_hcd 0000:00:10.1: port 2 portsc 008a,00
> hub 2-0:1.0: port 2, status 0100, change 0003, 12 Mb/s
> hub 2-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100
> hub 3-0:1.0: state 7 ports 2 chg 0000 evt 0006
> uhci_hcd 0000:00:10.2: port 1 portsc 008a,00
> hub 3-0:1.0: port 1, status 0100, change 0003, 12 Mb/s
> hub 3-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x100
> uhci_hcd 0000:00:10.2: port 2 portsc 008a,00
> hub 3-0:1.0: port 2, status 0100, change 0003, 12 Mb/s
> hub 3-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100
> hub 4-0:1.0: state 7 ports 6 chg 0000 evt 0000
> hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0000
> hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0000
> hub 3-0:1.0: state 7 ports 2 chg 0000 evt 0000
> usb usb1: suspend_rh (auto-stop)
> usb usb2: suspend_rh (auto-stop)
> usb usb3: suspend_rh (auto-stop)
> agpgart: Found an AGP 2.0 compliant device at 0000:00:00.0.
> agpgart: Putting AGP V2 device at 0000:00:00.0 into 4x mode
> agpgart: Putting AGP V2 device at 0000:01:00.0 into 4x mode
> [drm] Loading R200 Microcode
> 
> 
> 
> 2.6.24-rc2 suspend log (one screenful), UNSUCCESSFUL:
> 
> serial 00:07: disabled
> ACPI: PCI interrupt for device 0000:00:11.5 disabled
> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTM_]
> (Node c180b9a8), AE_AML_PACKAGE_LIMIT
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN1._GTM] (Node c180b8d0), AE_AML_PACKAGE_LIMIT
> ata2: ACPI get timing mode failed (AE 0x300d)
> pci_device_suspend(): ata_pci_device_suspend+0x0/0x40() returns -22
> suspend_device(): pci_device_suspend+0x0/0x70() returns -22
> Could not suspend device 0000:00:11.1: error -22
> ACPI: PCI Interrupt 0000:00:11.5[C] -> Link [ALKC] -> GSI 22 (level, low) -> IRQ 23
> ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16
> serial 00:07: activated
> serial 00:08: activated
> parport_pc 00:09: activated
> i8042 aux 00:0a: activation failed
> i8042 kbd 00:0b: activation failed
> sd 0:0:0:0: [sda] Starting disk
> sd 0:0:0:0: timing out command, waited 180s
> sd 0:0:0:0: [sda] START_STOP FAILED
> sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK
> Restarting tasks ... <7>hub 1-0:1.0: state 7 ports 6 chg 0000 evt 0000
> done.
> swsusp: Basic memory bitmaps freed
> swsusp: Marking nosave pages: 000000000009f000 - 0000000000100000
> swsusp: Basic memory bitmaps created
> Syncing filesystems ...
> 
> 
> 
> # lspci
> 00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]
> 00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP]
> 00:09.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 46)
> 00:0a.0 Multimedia audio controller: Aureal Semiconductor Vortex 2 (rev fe)
> 00:0c.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
> 00:0d.0 Multimedia audio controller: Aztech System Ltd 3328 Audio (rev 10)
> 00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
> 00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
> 00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
> 00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
> 00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
> 00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
> 00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 50)
> 00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)
> 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon RV250 If [Radeon 9000] (rev 01)
> 01:00.1 Display controller: ATI Technologies Inc Radeon RV250 [Radeon 9000] (Secondary) (rev 01)
> 
> 
> # dmidecode 2.9
> SMBIOS 2.2 present.
> 39 structures occupying 1035 bytes.
> Table at 0x000F0800.
> 
> Handle 0x0000, DMI type 0, 19 bytes
> BIOS Information
>         Vendor: Award Software International, Inc.
>         Version: 6.00 PG
>         Release Date: 09/16/2003
>         Address: 0xE0000
>         Runtime Size: 128 kB
>         ROM Size: 512 kB
>         Characteristics:
>                 ISA is supported
>                 PCI is supported
>                 PNP is supported
>                 APM is supported
>                 BIOS is upgradeable
>                 BIOS shadowing is allowed
>                 ESCD support is available
>                 Boot from CD is supported
>                 Selectable boot is supported
>                 BIOS ROM is socketed
>                 EDD is supported
>                 5.25"/360 KB floppy services are supported (int 13h)
>                 5.25"/1.2 MB floppy services are supported (int 13h)
>                 3.5"/720 KB floppy services are supported (int 13h)
>                 3.5"/2.88 MB floppy services are supported (int 13h)
>                 Print screen service is supported (int 5h)
>                 8042 keyboard services are supported (int 9h)
>                 Serial services are supported (int 14h)
>                 Printer services are supported (int 17h)
>                 CGA/mono video services are supported (int 10h)
>                 ACPI is supported
>                 USB legacy is supported
>                 AGP is supported
>                 LS-120 boot is supported
>                 ATAPI Zip drive boot is supported
> 
> Handle 0x0001, DMI type 1, 25 bytes
> System Information
>         Manufacturer: VIA Technologies, Inc.
>         Product Name: VT8367-8235
>         Version:
>         Serial Number:
>         UUID: Not Present
>         Wake-up Type: Power Switch
> 
> Handle 0x0002, DMI type 2, 8 bytes
> Base Board Information
>         Manufacturer:
>         Product Name: VT8367-8235
>         Version:
>         Serial Number:
> 
> Handle 0x0003, DMI type 3, 13 bytes
> Chassis Information
>         Manufacturer:
>         Type: Desktop
>         Lock: Not Present
>         Version:
>         Serial Number:
>         Asset Tag:
>         Boot-up State: Unknown
>         Power Supply State: Unknown
>         Thermal State: Unknown
>         Security Status: Unknown
> 
> Handle 0x0004, DMI type 4, 32 bytes
> Processor Information
>         Socket Designation: Socket A
>         Type: Central Processor
>         Family: Duron
>         Manufacturer: AMD
>         ID: 81 06 00 00 FF FB 83 03
>         Signature: Family 6, Model 8, Stepping 1
>         Flags:
>                 FPU (Floating-point unit on-chip)
>                 VME (Virtual mode extension)
>                 DE (Debugging extension)
>                 PSE (Page size extension)
>                 TSC (Time stamp counter)
>                 MSR (Model specific registers)
>                 PAE (Physical address extension)
>                 MCE (Machine check exception)
>                 CX8 (CMPXCHG8 instruction supported)
>                 APIC (On-chip APIC hardware supported)
>                 SEP (Fast system call)
>                 MTRR (Memory type range registers)
>                 PGE (Page global enable)
>                 MCA (Machine check architecture)
>                 CMOV (Conditional move instruction supported)
>                 PAT (Page attribute table)
>                 PSE-36 (36-bit page size extension)
>                 MMX (MMX technology supported)
>                 FXSR (Fast floating-point save and restore)
>                 SSE (Streaming SIMD extensions)
>         Version: AMD K7 processor
>         Voltage: 3.3 V
>         External Clock: 133 MHz
>         Max Speed: 1500 MHz
>         Current Speed: 1200 MHz
>         Status: Populated, Enabled
>         Upgrade: ZIF Socket
>         L1 Cache Handle: 0x000A
>         L2 Cache Handle: 0x000B
>         L3 Cache Handle: No L3 Cache
> 
> Handle 0x0005, DMI type 5, 24 bytes
> Memory Controller Information
>         Error Detecting Method: None
>         Error Correcting Capabilities:
>                 None
>         Supported Interleave: One-way Interleave
>         Current Interleave: Four-way Interleave
>         Maximum Memory Module Size: 32 MB
>         Maximum Total Memory Size: 128 MB
>         Supported Speeds:
>                 70 ns
>                 60 ns
>         Supported Memory Types:
>                 Standard
>                 EDO
>         Memory Module Voltage: 5.0 V
>         Associated Memory Slots: 4
>                 0x0006
>                 0x0007
>                 0x0008
>                 0x0009
>         Enabled Error Correcting Capabilities: None
> 
> .
> .
> .
> 
> 
> # hdparm -i /dev/sda
> 
> /dev/sda:
> 
>  Model=WDC WD1200JB-00CRA1                     , FwRev=17.07W17, SerialNo=WD-WCA8C4285629
>  Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
>  RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40
>  BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=?16?
>  CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=234441648
>  IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
>  PIO modes:  pio0 pio1 pio2 pio3 pio4
>  DMA modes:  mdma0 mdma1 mdma2
>  UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
>  AdvancedPM=no WriteCache=enabled
>  Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5
> 
>  * signifies the current active mode
> 
> 
> 
> Athlon on EPOX 8K5A2+ board.
> 
> 
> 
> Again, 2.6.23 and 2.6.24-rc1 work, yet 2.6.24 -rc2, -rc3 and -rc4 FAIL.
> 
> Probably won't be able to do any reporting over the weekend (WOL is
> inoperable ATM for some weird reason), let me know what you need.
> Took too much time to gather this report already anyway ;)
> 
> Thanks,
> 
> Andreas Mohr

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 10:20     ` Andrew Morton
@ 2007-12-08 10:28       ` Matthew Garrett
  2007-12-08 10:55       ` Andreas Mohr
  1 sibling, 0 replies; 103+ messages in thread
From: Matthew Garrett @ 2007-12-08 10:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andreas Mohr, Rafael J. Wysocki, LKML, Linus Torvalds,
	Ingo Molnar, linux-ide, Tejun Heo, Len Brown, linux-acpi

On Sat, Dec 08, 2007 at 02:20:01AM -0800, Andrew Morton wrote:
> On Sat, 8 Dec 2007 11:12:57 +0100 Andreas Mohr <andi@lisas.de> wrote:
> > ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
> > ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
> > ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT
> > ata1.01: _GTF evaluation failed (AE 0x300d)

037f6bb79f753c014bc84bca0de9bf98bb5ab169 ought to have fixed this?

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  2:40 2.6.24-rc4-git5: Reported regressions from 2.6.23 Rafael J. Wysocki
                   ` (6 preceding siblings ...)
  2007-12-08  9:52 ` Andrew Morton
@ 2007-12-08 10:44 ` Richard Purdie
  2007-12-08 22:32   ` Rafael J. Wysocki
  2007-12-09 11:54 ` Andrew Morton
  2007-12-10 20:42 ` Ingo Molnar
  9 siblings, 1 reply; 103+ messages in thread
From: Richard Purdie @ 2007-12-08 10:44 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, Andrew Morton, Linus Torvalds, Ingo Molnar

On Sat, 2007-12-08 at 03:40 +0100, Rafael J. Wysocki wrote:
> Subject		: leds: ledtrig-timer calls sleeping function from invalid context
> Submitter	: Márton Németh <nm127@freemail.hu>
> References	: http://bugzilla.kernel.org/show_bug.cgi?id=9264
> Handled-By	: Richard Purdie <rpurdie@rpsys.net>
> Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13493&action=view

The fix is now in mainline:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=dc47206e552c0850ad11f7e9a1fca0a3c92f5d65

Cheers,

Richard




^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 10:20     ` Andrew Morton
  2007-12-08 10:28       ` Matthew Garrett
@ 2007-12-08 10:55       ` Andreas Mohr
  2007-12-09 15:46         ` Tejun Heo
  1 sibling, 1 reply; 103+ messages in thread
From: Andreas Mohr @ 2007-12-08 10:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andreas Mohr, Rafael J. Wysocki, LKML, Linus Torvalds,
	Ingo Molnar, linux-ide, Tejun Heo, Len Brown, linux-acpi

Hi,

On Sat, Dec 08, 2007 at 02:20:01AM -0800, Andrew Morton wrote:
> > Does this report now win me the lucky draw, pretty please? ;)
> 
> nah, you have to cc the acpi guys to get a prize ;)

Thought so shortly, but missed it.

> Andreas, please do separately report that WOL problem too..

Local setup issue only, at least this one *isn't* a 2.6.24-rc regression. ;)

> Our list just reached 30.

Oh, so this is in fact a separate issue? Wasn't sure, couldn't do
enough analysis of similar cases.

Will test any (already submitted!) suggestions ASAP.

Andreas Mohr

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  9:46 ` Andrew Morton
@ 2007-12-08 15:49   ` Alan Stern
  0 siblings, 0 replies; 103+ messages in thread
From: Alan Stern @ 2007-12-08 15:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar

On Sat, 8 Dec 2007, Andrew Morton wrote:

> On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > This message contains a list of some regressions from 2.6.23 which have been
> > reported since 2.6.24-rc1 was released and for which there are no fixes in the
> > mainline that I know of.  If any of them have been fixed already, please let me
> > know.
> > 
> > If you know of any other unresolved regressions from 2.6.23, please let me know
> > either and I'll add them to the list.
> > 
> > 
> > ..
> >
> > Subject		: system hangs after a few minutes
> > Submitter	: Marcus Better <marcus@better.se>
> > References	: http://bugzilla.kernel.org/show_bug.cgi?id=9335
> > Handled-By	: Andrew Morton <akpm@linux-foundation.org>
> > 		  Alan Stern <stern@rowland.harvard.edu>
> > Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13871&action=view
> > 
> 
> This one we have a confirmed fix from Alan but it doesn't appear to be in
> anyone's tree.

An expanded version of that fix is in Greg's queue:

http://marc.info/?l=linux-usb-devel&m=119697043410947&w=2

Since he's away until Tuesday, nothing will happen for a few days.  
However you might want to replace the old fix that got added to -mm.

> There is a second bug in here, applicable to core x86: Marcus's machine
> won't boot with nmi_watchdog=1.

Alan Stern


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-08  9:30 ` tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23] Ingo Molnar
  2007-12-08 10:11   ` Andrew Morton
@ 2007-12-08 16:37   ` Matt Mackall
  2007-12-08 17:47     ` Linus Torvalds
  1 sibling, 1 reply; 103+ messages in thread
From: Matt Mackall @ 2007-12-08 16:37 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rafael J. Wysocki, LKML, Andrew Morton, Linus Torvalds

On Sat, Dec 08, 2007 at 10:30:39AM +0100, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > Subject		: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
> > Submitter	: Ingo Molnar <mingo@elte.hu>
> > References	: http://lkml.org/lkml/2007/11/29/157
> > 		  http://bugzilla.kernel.org/show_bug.cgi?id=9497
> > Handled-By	: Matt Mackall <mpm@selenic.com>
> > Patch		: http://lkml.org/lkml/2007/11/29/387
> 
> Matt, the above bug is still occuring en masse during random bootups:
> 

I was hoping for some discussion about whether it was the best fix.
The current kzalloc thing strikes me as a step backwards for all
allocators. We'd do better to have a single non-inline kzalloc
function rather than an extra branch in the normal kmalloc fast path.

But here's the patch again, with my sign-off:


Avoid calling page allocator with __GFP_ZERO, as we might be in atomic
context and this will make thing unhappy on highmem systems. Instead,
manually zero allocations from the page allocator.

Signed-off-by: Matt Mackall <mpm@selenic.com>

diff -r f7edf7226317 mm/slob.c
--- a/mm/slob.c	Wed Dec 05 15:57:06 2007 -0600
+++ b/mm/slob.c	Wed Dec 05 15:57:51 2007 -0600
@@ -223,6 +231,7 @@ static void *slob_new_page(gfp_t gfp, in
 {
 	void *page;
 
+	gfp &= ~__GFP_ZERO;
 #ifdef CONFIG_NUMA
 	if (node != -1)
 		page = alloc_pages_node(node, gfp, order);
@@ -457,6 +470,8 @@ void *__kmalloc_node(size_t size, gfp_t 
 			page = virt_to_page(ret);
 			page->private = size;
 		}
+		if (unlikely((gfp & __GFP_ZERO) && ret))
+			memset(ret, 0, size);
 		return ret;
 	}
 }


-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-08 16:37   ` Matt Mackall
@ 2007-12-08 17:47     ` Linus Torvalds
  2007-12-08 17:54       ` Linus Torvalds
  2007-12-13 22:03       ` Christoph Lameter
  0 siblings, 2 replies; 103+ messages in thread
From: Linus Torvalds @ 2007-12-08 17:47 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Ingo Molnar, Rafael J. Wysocki, LKML, Andrew Morton, Christoph Lameter

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: TEXT/PLAIN; charset=US-ASCII, Size: 1170 bytes --]



On Sat, 8 Dec 2007, Matt Mackall wrote:
> 
> Avoid calling page allocator with __GFP_ZERO, as we might be in atomic
> context and this will make thing unhappy on highmem systems. Instead,
> manually zero allocations from the page allocator.

I think this is fine, but didn't we fix the warning already? Calling page 
allocators with __GFP_ZERO should be fine, as long as __GFP_HIGHMEM isn't 
set, and slab/slub/slob/kmalloc cannot use GFP_HIGHMEM *anyway*. 

But I'll apply it anyway, because it looks "obviously correct" from the 
standpoint that the _other_ slob user already clears the end result 
explicitly later on, and we simply should never pass down __GFP_ZERO to 
the actual page allocator.

On that note, shouldn't we also do this for slub.c? Christoph?

			Linus

---
 mm/slub.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index b9f37cb..9c1d9f3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1468,6 +1468,9 @@ static void *__slab_alloc(struct kmem_cache *s,
 	void **object;
 	struct page *new;
 
+	/* We handle __GFP_ZERO in the caller */
+	gfpflags &= ~__GFP_ZERO;
+
 	if (!c->page)
 		goto new_slab;
 

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-08 17:47     ` Linus Torvalds
@ 2007-12-08 17:54       ` Linus Torvalds
  2007-12-08 18:09         ` Andrew Morton
                           ` (3 more replies)
  2007-12-13 22:03       ` Christoph Lameter
  1 sibling, 4 replies; 103+ messages in thread
From: Linus Torvalds @ 2007-12-08 17:54 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Ingo Molnar, Rafael J. Wysocki, LKML, Andrew Morton, Christoph Lameter

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: TEXT/PLAIN; charset=US-ASCII, Size: 1505 bytes --]



On Sat, 8 Dec 2007, Linus Torvalds wrote:
> 
> But I'll apply it anyway, because it looks "obviously correct" from the 
> standpoint that the _other_ slob user already clears the end result 
> explicitly later on, and we simply should never pass down __GFP_ZERO to 
> the actual page allocator.

Actually, I take that back. The other slob users are different. They share 
pages, this codepath does not.

So I think a more proper solution would be:
 (a) Something like this patch (which includes my previous mm/slub.c 
     change)
 (b) don't warn about atomic GFP_ZERO's - unless they have GFP_HIGHMEM set 
     *too*.

So which warning is it that triggers the bogus error?

		Linus
---
 mm/slob.c |    2 +-
 mm/slub.c |    3 +++
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/mm/slob.c b/mm/slob.c
index ee2ef8a..773a7aa 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -330,7 +330,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
 
 	/* Not enough space: must allocate a new page */
 	if (!b) {
-		b = slob_new_page(gfp, 0, node);
+		b = slob_new_page(gfp & ~__GFP_ZERO, 0, node);
 		if (!b)
 			return 0;
 		sp = (struct slob_page *)virt_to_page(b);
diff --git a/mm/slub.c b/mm/slub.c
index b9f37cb..9c1d9f3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1468,6 +1468,9 @@ static void *__slab_alloc(struct kmem_cache *s,
 	void **object;
 	struct page *new;
 
+	/* We handle __GFP_ZERO in the caller */
+	gfpflags &= ~__GFP_ZERO;
+
 	if (!c->page)
 		goto new_slab;
 

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-08 17:54       ` Linus Torvalds
@ 2007-12-08 18:09         ` Andrew Morton
  2007-12-08 18:37           ` Linus Torvalds
  2007-12-09  7:58           ` Ingo Molnar
  2007-12-08 18:33         ` Matt Mackall
                           ` (2 subsequent siblings)
  3 siblings, 2 replies; 103+ messages in thread
From: Andrew Morton @ 2007-12-08 18:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matt Mackall, Ingo Molnar, Rafael J. Wysocki, LKML, Christoph Lameter

On Sat, 8 Dec 2007 09:54:06 -0800 (PST) Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sat, 8 Dec 2007, Linus Torvalds wrote:
> > 
> > But I'll apply it anyway, because it looks "obviously correct" from the 
> > standpoint that the _other___slob user already clears the end result 
> > explicitly later on, and we simply should never pass down __GFP_ZERO to 
> > the actual page allocator.
> 
> Actually, I take that back. The other slob users are different. They share 
> pages, this codepath does not.
> 
> So I think a more proper solution would be:
>  (a) Something like this patch (which includes my previous mm/slub.c 
>      change)
>  (b) don't warn about atomic GFP_ZERO's - unless they have GFP_HIGHMEM set 
>      *too*.
> 
> So which warning is it that triggers the bogus error?

It's a kmap_atomic() debugging patch which I wrote ages ago and whcih Ingo
sucked into his tree.  I don't _think_ this warning is present in your tree
at all.

http://lkml.org/lkml/2007/11/29/157 is where it starts.

I had a lenghty back-and-forth with Christoph on this within the past
couple of months and I cannot locate the thread and I don't recall what the
upshot was and Christoph is still offline.

Knocking out __GFP_ZERO at the point where the slab allocator(s) call the
page allocator seems like a good approach to me.

But I don't think we need to do anything for 2.6.24..

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-08 17:54       ` Linus Torvalds
  2007-12-08 18:09         ` Andrew Morton
@ 2007-12-08 18:33         ` Matt Mackall
  2007-12-08 19:00         ` Matt Mackall
  2007-12-09  8:33         ` Pekka Enberg
  3 siblings, 0 replies; 103+ messages in thread
From: Matt Mackall @ 2007-12-08 18:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Rafael J. Wysocki, LKML, Andrew Morton, Christoph Lameter

On Sat, Dec 08, 2007 at 09:54:06AM -0800, Linus Torvalds wrote:
> 
> 
> On Sat, 8 Dec 2007, Linus Torvalds wrote:
> > 
> > But I'll apply it anyway, because it looks "obviously correct" from the 
> > standpoint that the _other_??slob user already clears the end result 
> > explicitly later on, and we simply should never pass down __GFP_ZERO to 
> > the actual page allocator.
> 
> Actually, I take that back. The other slob users are different. They share 
> pages, this codepath does not.
> 
> So I think a more proper solution would be:
>  (a) Something like this patch (which includes my previous mm/slub.c 
>      change)
>  (b) don't warn about atomic GFP_ZERO's - unless they have GFP_HIGHMEM set 
>      *too*.

But what about:

(c) stop passing GFP_ZERO to kmalloc allocators
    stop fiddling with flags inside allocators
    remove ugly if (unlikely(gfp & GFP_ZERO)) from kmalloc allocators
    make kzalloc and kcalloc non-inline functions that do the memset

That should:

- make both kmalloc and kzalloc/kcalloc faster (one less branch)
- reduce kernel size

GFP_ZERO is a bit of an abuse here, given that we don't even intend to
pass it to the underlying "GFP".

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-08 18:09         ` Andrew Morton
@ 2007-12-08 18:37           ` Linus Torvalds
  2007-12-08 19:52             ` Ingo Molnar
  2007-12-09  7:58           ` Ingo Molnar
  1 sibling, 1 reply; 103+ messages in thread
From: Linus Torvalds @ 2007-12-08 18:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matt Mackall, Ingo Molnar, Rafael J. Wysocki, LKML, Christoph Lameter



On Sat, 8 Dec 2007, Andrew Morton wrote:
>
> > So which warning is it that triggers the bogus error?
> 
> It's a kmap_atomic() debugging patch which I wrote ages ago and whcih Ingo
> sucked into his tree.  I don't _think_ this warning is present in your tree
> at all.

Ok, that explains it.

> Knocking out __GFP_ZERO at the point where the slab allocator(s) call the
> page allocator seems like a good approach to me.
> 
> But I don't think we need to do anything for 2.6.24..

Good. Although we should perhaps look at that reported performance problem 
with SLUB. It looks like SLUB will do a memclear() for the area twice 
(first for the whole page, then for the thing it allocated) for the slow 
case. Maybe that exacerbates the problem.

			Linus

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  9:42 ` Andrew Morton
@ 2007-12-08 18:57   ` Roland Dreier
  2007-12-08 19:40   ` Theodore Tso
  1 sibling, 0 replies; 103+ messages in thread
From: Roland Dreier @ 2007-12-08 18:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar,
	Roland Dreier, Takashi Iwai, Theodore Ts'o

 > > Subject		: snd_hda_intel 2.6.24-rc2 bug: interrupts don't always work on Lenovo X60s
 > > Submitter	: Roland Dreier <rdreier@cisco.com>
 > > References	: http://lkml.org/lkml/2007/11/8/255
 > > 		  http://bugzilla.kernel.org/show_bug.cgi?id=9332
 > > Handled-By	: 
 > > Patch		: 
 > 
 > Takashi had a patch and that has been merged.  AFAIK this regression
 > has been fixed and we're left with a new but harmless warning.
 > 
 > However Roland reported other problems and it appears that the trail went
 > cold (http://lkml.org/lkml/2007/11/14/251)

A fix for the most likely cause of this problem was merged (7eba5c9d
"[ALSA] hda-codec - Check PINCAP only for PIN widgets") but it seems
that setting CONFIG_SND_HDA_POWER_SAVE can cause the "azx_get_response
timeout, switching to polling mode" message sometimes too.  However
according to Takashi this is really just a cosmetic problem -- polling
mode is not so bad.

 - R.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-08 17:54       ` Linus Torvalds
  2007-12-08 18:09         ` Andrew Morton
  2007-12-08 18:33         ` Matt Mackall
@ 2007-12-08 19:00         ` Matt Mackall
  2007-12-09  8:33         ` Pekka Enberg
  3 siblings, 0 replies; 103+ messages in thread
From: Matt Mackall @ 2007-12-08 19:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Rafael J. Wysocki, LKML, Andrew Morton, Christoph Lameter

On Sat, Dec 08, 2007 at 09:54:06AM -0800, Linus Torvalds wrote:
> 
> 
> On Sat, 8 Dec 2007, Linus Torvalds wrote:
> > 
> > But I'll apply it anyway, because it looks "obviously correct" from the 
> > standpoint that the _other_??slob user already clears the end result 
> > explicitly later on, and we simply should never pass down __GFP_ZERO to 
> > the actual page allocator.
> 
> Actually, I take that back. The other slob users are different. They share 
> pages, this codepath does not.
> 
> So I think a more proper solution would be:
>  (a) Something like this patch (which includes my previous mm/slub.c 
>      change)
>  (b) don't warn about atomic GFP_ZERO's - unless they have GFP_HIGHMEM set 
>      *too*.
> 
> So which warning is it that triggers the bogus error?

While I think the whole GFP_ZERO thing is a bit broken here, this is
an improvement on my original patch, so:

Acked-by: Matt Mackall <mpm@selenic.com>
> ---
>  mm/slob.c |    2 +-
>  mm/slub.c |    3 +++
>  2 files changed, 4 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/slob.c b/mm/slob.c
> index ee2ef8a..773a7aa 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -330,7 +330,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
>  
>  	/* Not enough space: must allocate a new page */
>  	if (!b) {
> -		b = slob_new_page(gfp, 0, node);
> +		b = slob_new_page(gfp & ~__GFP_ZERO, 0, node);
>  		if (!b)
>  			return 0;
>  		sp = (struct slob_page *)virt_to_page(b);
> diff --git a/mm/slub.c b/mm/slub.c
> index b9f37cb..9c1d9f3 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1468,6 +1468,9 @@ static void *__slab_alloc(struct kmem_cache *s,
>  	void **object;
>  	struct page *new;
>  
> +	/* We handle __GFP_ZERO in the caller */
> +	gfpflags &= ~__GFP_ZERO;
> +
>  	if (!c->page)
>  		goto new_slab;

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  9:42 ` Andrew Morton
  2007-12-08 18:57   ` Roland Dreier
@ 2007-12-08 19:40   ` Theodore Tso
  2007-12-08 19:55     ` Ingo Molnar
  2007-12-08 22:30     ` Rafael J. Wysocki
  1 sibling, 2 replies; 103+ messages in thread
From: Theodore Tso @ 2007-12-08 19:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar,
	Roland Dreier, Takashi Iwai

On Sat, Dec 08, 2007 at 01:42:41AM -0800, Andrew Morton wrote:
> > Subject		: snd_hda_intel 2.6.24-rc2 bug: interrupts don't always work on Lenovo X60s
> > Submitter	: Roland Dreier <rdreier@cisco.com>
> > References	: http://lkml.org/lkml/2007/11/8/255
> > 		  http://bugzilla.kernel.org/show_bug.cgi?id=9332
> > Handled-By	: 
> > Patch		: 
> 
> Takashi had a patch and that has been merged.  AFAIK this regression
> has been fixed and we're left with a new but harmless warning.
> 
> However Roland reported other problems and it appears that the trail went
> cold (http://lkml.org/lkml/2007/11/14/251)
> 
> Ted was hitting some of the same problems but that trail appears to also
> have gone cold (http://lkml.org/lkml/2007/11/23/17).

Actually, not gone cold, but I stopped posting about it because it's
been solved and I thought agreement had been reached that it should be
pushed to mainline before 2.6.25.

I am very happily running with Ingo's "snd hda suspend latency:
shorten codec read" patch, which was originally intended to speed up
resuming from hibernation, but which as I discovered, also has the
nice side effect of eliminating the reported error.  

On 11/23, Takashi replied to my note (http://lkml.org/lkml/2007/11/23/17)
and suggested that Jaroslav push this patch to Linus immediately
instead of waiting for 2.6.25, since it appearly solves two problems
with one stone.  However, I just checked, as of Linus's public, and
Ingo's patch is *not* in mainline.

However, as far as I am concerned, Ingo's patch, first posted to LKML
here: http://lkml.org/lkml/2007/11/16/66 should be listed as fixing
the above regression.  Rafael, could you please make a note of this in
your regression list, and could we please get this patch pushed into
mainline?

Thanks!!

						- Ted

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-08 18:37           ` Linus Torvalds
@ 2007-12-08 19:52             ` Ingo Molnar
  2007-12-08 20:29               ` Ingo Molnar
                                 ` (2 more replies)
  0 siblings, 3 replies; 103+ messages in thread
From: Ingo Molnar @ 2007-12-08 19:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Matt Mackall, Rafael J. Wysocki, LKML, Christoph Lameter


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > But I don't think we need to do anything for 2.6.24..
> 
> Good. Although we should perhaps look at that reported performance 
> problem with SLUB. It looks like SLUB will do a memclear() for the 
> area twice (first for the whole page, then for the thing it allocated) 
> for the slow case. Maybe that exacerbates the problem.

i dont think the SLUB problem could be explained purely via a double 
memset(). [which ought to be extremely fast anyway] We are talking about 
a 10 times slowdown on a 64-way box of a workload that is fairly 
common-sense. (tasks sending messages to each other via bog standard 
means)

while i dont want to jump to conclusions without looking at some 
profiles, i think the SLUB performance regression is indicative of the 
following fallacy: "SLAB can be done significantly simpler while keeping 
the same performance".

I couldnt point to any particular aspect of SLAB that i could 
characterise as "needless bloat".

the SLUB concept is proudly outlined in init/Kconfig:

  config SLUB
        bool "SLUB (Unqueued Allocator)"
        help
           SLUB is a slab allocator that minimizes cache line usage
           instead of managing queues of cached objects (SLAB approach).
           Per cpu caching is realized using slabs of objects instead
           of queues of objects. SLUB can use memory efficiently
           and has enhanced diagnostics.

but that's not true anymore - the two concepts are pretty much 
equivalent, after all the "performance tuning" that went on in SLUB. 
(read: 'frantically try to catch up with SLAB in benchmarks')

so even today's upstream kernel, which has 'ancient' SLUB code, SLAB and 
SLUB have essentially the same linecount:

  $ wc -l mm/slab.c mm/slub.c
  4478 mm/slab.c
  4125 mm/slub.c

(and while linecount != complexity, there is a strong relationship.)

With SLAB having 10 years more test coverage and tuning.

the messiest and most fragile aspect of SLAB that i can think of is its 
bootstrap hacks - but that is an entirely unimportant detail in my 
opinion. SLAB has been cleaned up significantly in the past few years by 
Pekka Enberg & co, it's pretty pleasant and straightforward code these 
days.

I think we should we make SLAB the default for v2.6.24 ...

	Ingo

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 19:40   ` Theodore Tso
@ 2007-12-08 19:55     ` Ingo Molnar
  2007-12-08 22:30     ` Rafael J. Wysocki
  1 sibling, 0 replies; 103+ messages in thread
From: Ingo Molnar @ 2007-12-08 19:55 UTC (permalink / raw)
  To: Theodore Tso, Andrew Morton, Rafael J. Wysocki, LKML,
	Linus Torvalds, Roland Dreier, Takashi Iwai


* Theodore Tso <tytso@mit.edu> wrote:

> I am very happily running with Ingo's "snd hda suspend latency: 
> shorten codec read" patch, which was originally intended to speed up 
> resuming from hibernation, but which as I discovered, also has the 
> nice side effect of eliminating the reported error.
> 
> On 11/23, Takashi replied to my note 
> (http://lkml.org/lkml/2007/11/23/17) and suggested that Jaroslav push 
> this patch to Linus immediately instead of waiting for 2.6.25, since 
> it appearly solves two problems with one stone.  However, I just 
> checked, as of Linus's public, and Ingo's patch is *not* in mainline.
> 
> However, as far as I am concerned, Ingo's patch, first posted to LKML 
> here: http://lkml.org/lkml/2007/11/16/66 should be listed as fixing 
> the above regression.  Rafael, could you please make a note of this in 
> your regression list, and could we please get this patch pushed into 
> mainline?

ha! I'd never have expected _that_ to happen. Cool. Fixing a driver bug 
by accident :-)

	Ingo

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-08 19:52             ` Ingo Molnar
@ 2007-12-08 20:29               ` Ingo Molnar
  2007-12-09  8:20                 ` Pekka Enberg
  2007-12-11  6:27               ` Dave Jones
  2007-12-14  4:07               ` Christoph Lameter
  2 siblings, 1 reply; 103+ messages in thread
From: Ingo Molnar @ 2007-12-08 20:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Matt Mackall, Rafael J. Wysocki, LKML, Christoph Lameter


* Ingo Molnar <mingo@elte.hu> wrote:

> the SLUB concept is proudly outlined in init/Kconfig:
> 
>   config SLUB
>         bool "SLUB (Unqueued Allocator)"
>         help
>            SLUB is a slab allocator that minimizes cache line usage
>            instead of managing queues of cached objects (SLAB approach).
>            Per cpu caching is realized using slabs of objects instead
>            of queues of objects. SLUB can use memory efficiently
>            and has enhanced diagnostics.
> 
> but that's not true anymore - the two concepts are pretty much 
> equivalent, after all the "performance tuning" that went on in SLUB. 
> (read: 'frantically try to catch up with SLAB in benchmarks')

just to hammer this point home - and while Christoph Lameter isnt online 
currently to defend SLUB, this issue did come up and i guess there must 
be other MM hackers that are advocates of SLUB. (otherwise this code 
couldnt be upstream)

I find SLUB a fantastically confusing concept, and that starts at the 
name. "Unqueued" it says. Lets take a look at the actual code:

mm/slub.c:

  static void __always_inline *slab_alloc(struct kmem_cache *s,
                  gfp_t gfpflags, int node, void *addr)
  {
  ...
          local_irq_save(flags);
          c = get_cpu_slab(s, smp_processor_id());
  ...
          else {
                  object = c->freelist;
                  c->freelist = object[c->offset];
          }
          local_irq_restore(flags);

so it has a "free list", which is clearly per cpu. Hang on! Isnt that 
actually a per CPU queue? Which SLUB has not, we are told? The "U" in 
SLUB. How on earth can an allocator in 2007 claim to have no queuing
(which is in essence caching)? Am i on crack with this? Did i miss
something really obvious?

	Ingo

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  9:23     ` Andrew Morton
@ 2007-12-08 22:11       ` Rafael J. Wysocki
  0 siblings, 0 replies; 103+ messages in thread
From: Rafael J. Wysocki @ 2007-12-08 22:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Fabio Comolli, LKML, Linus Torvalds, Greg KH,
	Len Brown, Alexey Starikovskiy

On Saturday, 8 of December 2007, Andrew Morton wrote:
> On Sat, 8 Dec 2007 09:28:15 +0100 Ingo Molnar <mingo@elte.hu> wrote:
> 
> > 
> > * Fabio Comolli <fabio.comolli@gmail.com> wrote:
> > 
> > > <snip>
> > > 
> > > > Subject         : Battery shows up twice in kpowersave
> > > > Submitter       : Rolf Eike Beer <eike-kernel@sf-tec.de>
> > > > References      : http://bugzilla.kernel.org/show_bug.cgi?id=9494
> > > > Handled-By      : Alexey Starikovskiy <astarikovskiy@suse.de>
> > > > Patch           :
> > > >
> > > 
> > > I don't think that this is a regression: I reported on RedHat bugzilla 
> > > when I switched from F7 to F8 and I was using 2.6.23.8 at that time. 
> > > It looks to me an HAL regression, but of course I may be wrong :-) as 
> > > the reported bisected to a bad commit.
> > > 
> > > https://bugzilla.redhat.com/show_bug.cgi?id=373041
> > > 
> > > By the way, I now switched to Fedrora Rawhide with a 2.6.24-rc4-git5 
> > > custom kernel and Gnome desktop and the problem is still present, even 
> > > with gnome-power-manager.
> > 
> > to me this looks like an ABI regression - utilities should work without 
> > change. Something changed in /sys output that caused HAL to think that 
> > there are two batteries:
> 
> Yep.  Although HAL is of course a most special case of "userspace".
> 
> > | The output of lshal shows that there are two UDI's with 
> > | info.capabilities = { 'battery' }:
> > |
> > | udi = '/org/freedesktop/Hal/devices/acpi_BAT0'
> > | udi = '/org/freedesktop/Hal/devices/computer_power_supply_0'
> > 
> > whether it's a HAL bug or a kernel bug, the original state should be 
> > restored and it should be worked out without breaking users of older HAL 
> > versions.
> 
> "breaking users of older HAL versions" == "breaking machines".
> 
> The patch should be reverted.  Do we know which one it was?
> 
> > grumble: way too many times do various system utilities break when i 
> > upgrade the kernel on my laptop. Maybe a new debug mechanism: we should 
> > start fingerprinting the exact /sys and /proc output and enforce that 
> > it's immutable across kernel releases as long as the hardware is 
> > unmodified?
> 
> That would be neat.  It would need to be executed on a lot of different
> machines.

Hm, that wouldn't allow us to add new attributes ...

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  9:29 ` Andrew Morton
@ 2007-12-08 22:17   ` Rafael J. Wysocki
  0 siblings, 0 replies; 103+ messages in thread
From: Rafael J. Wysocki @ 2007-12-08 22:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, Linus Torvalds, Ingo Molnar, Márton Németh

On Saturday, 8 of December 2007, Andrew Morton wrote:
> On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > This message contains a list of some regressions from 2.6.23 which have been
> > reported since 2.6.24-rc1 was released and for which there are no fixes in the
> > mainline that I know of.  If any of them have been fixed already, please let me
> > know.
> > 
> > If you know of any other unresolved regressions from 2.6.23, please let me know
> > either and I'll add them to the list.
> 
> Twenty nine, huh?
> 
> It would be useful if these records were sorted in date-of-reportage order
> and had a date stamp so we could see how long they've been hanging about.

They are sorted by the bugzilla number which reflects the date-of-reportage
order pretty well.  For a techincal reason, it's easier to me if they're sorted
like this.

Adding date stamps should be easy, tough, I'll try to add them to the next
report.

> Something to think about for the post-2.6.24 regression if you'll be handling
> those?

Yes, I'm going to handle the post-2.6.24 regressions too (in the hope there
will be less of them ;-)).

> > Subject		: leds: ledtrig-timer calls sleeping function from invalid context
> > Submitter	: Márton Németh <nm127@freemail.hu>
> > References	: http://bugzilla.kernel.org/show_bug.cgi?id=9264
> > Handled-By	: Richard Purdie <rpurdie@rpsys.net>
> > Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13493&action=view
> 
> That patch has been merged (dc47206e552c0850ad11f7e9a1fca0a3c92f5d65) and
> assuming Márton has tested the latest git snapshot
> (ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots) successfully we can
> cross it off?

Yes, will drop.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 19:40   ` Theodore Tso
  2007-12-08 19:55     ` Ingo Molnar
@ 2007-12-08 22:30     ` Rafael J. Wysocki
  2007-12-09  2:15       ` Theodore Tso
  1 sibling, 1 reply; 103+ messages in thread
From: Rafael J. Wysocki @ 2007-12-08 22:30 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Andrew Morton, LKML, Linus Torvalds, Ingo Molnar, Roland Dreier,
	Takashi Iwai

On Saturday, 8 of December 2007, Theodore Tso wrote:
> On Sat, Dec 08, 2007 at 01:42:41AM -0800, Andrew Morton wrote:
> > > Subject		: snd_hda_intel 2.6.24-rc2 bug: interrupts don't always work on Lenovo X60s
> > > Submitter	: Roland Dreier <rdreier@cisco.com>
> > > References	: http://lkml.org/lkml/2007/11/8/255
> > > 		  http://bugzilla.kernel.org/show_bug.cgi?id=9332
> > > Handled-By	: 
> > > Patch		: 
> > 
> > Takashi had a patch and that has been merged.  AFAIK this regression
> > has been fixed and we're left with a new but harmless warning.
> > 
> > However Roland reported other problems and it appears that the trail went
> > cold (http://lkml.org/lkml/2007/11/14/251)
> > 
> > Ted was hitting some of the same problems but that trail appears to also
> > have gone cold (http://lkml.org/lkml/2007/11/23/17).
> 
> Actually, not gone cold, but I stopped posting about it because it's
> been solved and I thought agreement had been reached that it should be
> pushed to mainline before 2.6.25.
> 
> I am very happily running with Ingo's "snd hda suspend latency:
> shorten codec read" patch, which was originally intended to speed up
> resuming from hibernation, but which as I discovered, also has the
> nice side effect of eliminating the reported error.  
> 
> On 11/23, Takashi replied to my note (http://lkml.org/lkml/2007/11/23/17)
> and suggested that Jaroslav push this patch to Linus immediately
> instead of waiting for 2.6.25, since it appearly solves two problems
> with one stone.  However, I just checked, as of Linus's public, and
> Ingo's patch is *not* in mainline.
> 
> However, as far as I am concerned, Ingo's patch, first posted to LKML
> here: http://lkml.org/lkml/2007/11/16/66 should be listed as fixing
> the above regression.  Rafael, could you please make a note of this in
> your regression list,

Done, thanks.

> and could we please get this patch pushed into mainline?

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 10:44 ` Richard Purdie
@ 2007-12-08 22:32   ` Rafael J. Wysocki
  0 siblings, 0 replies; 103+ messages in thread
From: Rafael J. Wysocki @ 2007-12-08 22:32 UTC (permalink / raw)
  To: Richard Purdie; +Cc: LKML, Andrew Morton, Linus Torvalds, Ingo Molnar

On Saturday, 8 of December 2007, Richard Purdie wrote:
> On Sat, 2007-12-08 at 03:40 +0100, Rafael J. Wysocki wrote:
> > Subject		: leds: ledtrig-timer calls sleeping function from invalid context
> > Submitter	: Márton Németh <nm127@freemail.hu>
> > References	: http://bugzilla.kernel.org/show_bug.cgi?id=9264
> > Handled-By	: Richard Purdie <rpurdie@rpsys.net>
> > Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13493&action=view
> 
> The fix is now in mainline:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=dc47206e552c0850ad11f7e9a1fca0a3c92f5d65

Yes, already dropped.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 22:30     ` Rafael J. Wysocki
@ 2007-12-09  2:15       ` Theodore Tso
  2007-12-13 10:49         ` Takashi Iwai
  0 siblings, 1 reply; 103+ messages in thread
From: Theodore Tso @ 2007-12-09  2:15 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andrew Morton, LKML, Linus Torvalds, Ingo Molnar, Roland Dreier,
	Takashi Iwai

On Sat, Dec 08, 2007 at 11:30:53PM +0100, Rafael J. Wysocki wrote:
> On Saturday, 8 of December 2007, Theodore Tso wrote:
> > However, as far as I am concerned, Ingo's patch, first posted to LKML
> > here: http://lkml.org/lkml/2007/11/16/66 should be listed as fixing
> > the above regression.  Rafael, could you please make a note of this in
> > your regression list,
> 
> Done, thanks.

Great, thanks.  I should add that technically this wasn't a regression
since I had been seeing this since before 2.6.23.  Also, it isn't a
big deal, since aside from noise in the syslog, falling back to
polling more doesn't make any functional or user-visible difference
(although I guess it's less efficient).  

Regardless of whether it is a regression, it would be nice to get the
patch applied and and this issue fixed for 2.6.25!

					- Ted

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  9:36 ` 2.6.24-rc4-git5: Reported regressions from 2.6.23 Andrew Morton
  2007-12-08 10:12   ` Andreas Mohr
@ 2007-12-09  6:52   ` Tejun Heo
  2007-12-09 14:20     ` Rafael J. Wysocki
  1 sibling, 1 reply; 103+ messages in thread
From: Tejun Heo @ 2007-12-09  6:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide

Hello,

Andrew Morton wrote:
>> Subject		: PATA scan: ACPI Exception AE_AML_PACKAGE_LIMIT... is beyond end of object
>> Submitter	: Hans de Bruin <bruinjm@xs4all.nl>
>> References	: http://bugzilla.kernel.org/show_bug.cgi?id=9320
>> Handled-By	: Robert Moore <Robert.Moore@intel.com>
>> 		  Tejun Heo <htejun@gmail.com>
>> 		  Fu Michael <michael.fu@intel.com>
>> Patch		: 
>>
> 
> A number of other people are seeing the same thing and Tejun is
> putting in a blacklist of machines which cannot use libata+acpi.
> That patch is not yet in any git tree which I pull.
> 
> AFACIT the machines kepe working OK - there's just some nasty dmesg
> spew.
> 
> If any machines _are_ breaking then this could cause real problems
> and I'd prefer that we either go for a whitelist or arrange to
> detect the condition and fall back to non-acpi ata.

The pending patchset should make ATA ACPI quite resistant to failures.
Known bad boards can be blacklisted (currently only one is on the
list), ATA ACPI is disabled quicker if ACPI evalution fails, execution
errors are handled better and commands which are intended to help the
vendor instead of the user are filtered.  So, I think we have enough
safety nets.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  9:52 ` Andrew Morton
@ 2007-12-09  7:00   ` Tejun Heo
  2007-12-09 13:42     ` Alan Cox
  0 siblings, 1 reply; 103+ messages in thread
From: Tejun Heo @ 2007-12-09  7:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, Alan Cox

Hello, (cc'ing Alan)

Andrew Morton wrote:
>> Subject		: cd/dvd inaccessible in 2.6.24-rc2
>> Submitter	: Will Trives <will@trivescon.com.au>
>> References	: http://lkml.org/lkml/2007/11/9/290
>> 		  http://bugzilla.kernel.org/show_bug.cgi?id=9346
>> Handled-By	: Len Brown <lenb@kernel.org>
>> 		  Tejun Heo <htejun@gmail.com>
>> Patch		: 
>>
> 
> Nasty one.  Tejun and several diligent reporters are doing sterling
> work there and things have improved.  I don't know whether any of
> Tejun's patches have been merged yet, but we'll probably be OK on
> this one.

I'm still trying to find out what's really going on.  That drive is
quite peculiar.

> What is unclear (to me) is what actually caused those people's machines to
> break?

It's introduced by setting ATAPI transfer chunk size to actual
transfer size which is the right thing to do generally.  However, with
the change, the ATAPI HSM should be ready to drain full extra transfer
chunks which libata HSM wasn't doing.  With that part changed, most
regressions should go away.

Unfortunately, simply adding that doesn't fix the case in bug 9346 and
I'm still trying to find out why.  The good news is that the drive
works fine with proposed more extensive improvements to libata ATAPI
which will probably be included into 2.6.25, so we at least have long
term solution.

If we fail to find out the solution in time, we always have the
alternative of backing out the ATAPI transfer chunk size update.  This
will break some other cases which were fixed by the change but those
won't be regressions at least and we can add transfer chunk size
update with other changes to 2.6.25.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-08 18:09         ` Andrew Morton
  2007-12-08 18:37           ` Linus Torvalds
@ 2007-12-09  7:58           ` Ingo Molnar
  2007-12-09 14:17             ` Rafael J. Wysocki
  1 sibling, 1 reply; 103+ messages in thread
From: Ingo Molnar @ 2007-12-09  7:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, Matt Mackall, Rafael J. Wysocki, LKML, Christoph Lameter


* Andrew Morton <akpm@linux-foundation.org> wrote:

> It's a kmap_atomic() debugging patch which I wrote ages ago and whcih 
> Ingo sucked into his tree.  I don't _think_ this warning is present in 
> your tree at all.
> 
> http://lkml.org/lkml/2007/11/29/157 is where it starts.
> 
> I had a lenghty back-and-forth with Christoph on this within the past 
> couple of months and I cannot locate the thread and I don't recall 
> what the upshot was and Christoph is still offline.
> 
> Knocking out __GFP_ZERO at the point where the slab allocator(s) call 
> the page allocator seems like a good approach to me.
> 
> But I don't think we need to do anything for 2.6.24..

ok - Rafael, please strike this off the regressions list, there is no 
problem in .24 other than the double memset for some SLOB metadata.

	Ingo

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-08 20:29               ` Ingo Molnar
@ 2007-12-09  8:20                 ` Pekka Enberg
  2007-12-09  8:50                   ` Ingo Molnar
  2007-12-09 15:59                   ` Arjan van de Ven
  0 siblings, 2 replies; 103+ messages in thread
From: Pekka Enberg @ 2007-12-09  8:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andrew Morton, Matt Mackall, Rafael J. Wysocki,
	LKML, Christoph Lameter

Hi Ingo,

On Dec 8, 2007 10:29 PM, Ingo Molnar <mingo@elte.hu> wrote:
> so it has a "free list", which is clearly per cpu. Hang on! Isnt that
> actually a per CPU queue? Which SLUB has not, we are told? The "U" in
> SLUB. How on earth can an allocator in 2007 claim to have no queuing
> (which is in essence caching)? Am i on crack with this? Did i miss
> something really obvious?

I think you did. The difference is explained in Christoph's announcement:

"A particular concern was the complex management of the numerous
object queues in SLAB. SLUB has no such queues. Instead we dedicate a
slab for each allocating CPU and use objects from a slab directly
instead of queueing them up."

Which, I think, is where SLUB gets its name from (the "unqueued" part).

Now, while SLAB code is "pleasant and straightforward code" (thanks,
btw) for UMA, it's really hairy for NUMA plus the "alien caches" eat
tons of memory (which is why Christoph wrote SLUB in the first place,
the current code in SLAB is mostly unfixable due to its *queuing*
nature).

I don't object changing the default to CONFIG_SLAB but it's not really
a long term strategy unless we want to have three kmalloc's in the
kernel: one for embedded, one for UMA, and one NUMA.

                        Pekka

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-08 17:54       ` Linus Torvalds
                           ` (2 preceding siblings ...)
  2007-12-08 19:00         ` Matt Mackall
@ 2007-12-09  8:33         ` Pekka Enberg
  3 siblings, 0 replies; 103+ messages in thread
From: Pekka Enberg @ 2007-12-09  8:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matt Mackall, Ingo Molnar, Rafael J. Wysocki, LKML,
	Andrew Morton, Christoph Lameter

Hi Linus,

On Dec 8, 2007 7:54 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> diff --git a/mm/slob.c b/mm/slob.c
> index ee2ef8a..773a7aa 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -330,7 +330,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
>
>         /* Not enough space: must allocate a new page */
>         if (!b) {
> -               b = slob_new_page(gfp, 0, node);
> +               b = slob_new_page(gfp & ~__GFP_ZERO, 0, node);
>                 if (!b)
>                         return 0;
>                 sp = (struct slob_page *)virt_to_page(b);
>
> diff --git a/mm/slub.c b/mm/slub.c
> index b9f37cb..9c1d9f3 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1468,6 +1468,9 @@ static void *__slab_alloc(struct kmem_cache *s,
>         void **object;
>         struct page *new;
>
> +       /* We handle __GFP_ZERO in the caller */
> +       gfpflags &= ~__GFP_ZERO;
> +
>         if (!c->page)
>                 goto new_slab;

In case you didn't already merge this:

Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-09  8:20                 ` Pekka Enberg
@ 2007-12-09  8:50                   ` Ingo Molnar
  2007-12-09  9:18                     ` Pekka Enberg
  2007-12-13 22:07                     ` Christoph Lameter
  2007-12-09 15:59                   ` Arjan van de Ven
  1 sibling, 2 replies; 103+ messages in thread
From: Ingo Molnar @ 2007-12-09  8:50 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Linus Torvalds, Andrew Morton, Matt Mackall, Rafael J. Wysocki,
	LKML, Christoph Lameter


* Pekka Enberg <penberg@cs.helsinki.fi> wrote:

> Hi Ingo,
> 
> On Dec 8, 2007 10:29 PM, Ingo Molnar <mingo@elte.hu> wrote:
> > so it has a "free list", which is clearly per cpu. Hang on! Isnt that
> > actually a per CPU queue? Which SLUB has not, we are told? The "U" in
> > SLUB. How on earth can an allocator in 2007 claim to have no queuing
> > (which is in essence caching)? Am i on crack with this? Did i miss
> > something really obvious?
> 
> I think you did. The difference is explained in Christoph's 
> announcement:
> 
> "A particular concern was the complex management of the numerous 
> object queues in SLAB. SLUB has no such queues. Instead we dedicate a 
> slab for each allocating CPU and use objects from a slab directly 
> instead of queueing them up."
> 
> Which, I think, is where SLUB gets its name from (the "unqueued" 
> part).

yes, i understand the initial announcement (and the Kconfig entry still 
says the same), but that is not matched up by the reality i see in the 
actual code - SLUB clearly uses a queue/list of objects (as cited in my 
previous mail), for obvious performance reasons.

unless i'm missing something obvious (and i easily might), i see SLUB as 
SLAB reimplemented with a different queueing model. Not "without 
queueing".

> Now, while SLAB code is "pleasant and straightforward code" (thanks, 
> btw) for UMA, it's really hairy for NUMA plus the "alien caches" eat 
> tons of memory (which is why Christoph wrote SLUB in the first place, 
> the current code in SLAB is mostly unfixable due to its *queuing* 
> nature).

i'm curious about the real facts behind this "alien cache problem". I 
heard about it and asked around and was told that there's some sort of 
bad quadratic behavior of memory consumption on NUMA - but i cannot 
actually see that in the code.

The alien caches feature of SLAB i see as a spread out clustered 
index/cache of objects on other nodes. It's not increasing the average 
per object memory consumption per se! The number of alien caches 
increases with increasing number of nodes, but _of course_, as memory 
size increases too so there's more stuff and a larger expected spreadout 
of memory to keep track of. ("Fixing" that would be like reintroducing a 
single runqueue for the scheduler, based on the argument that it's an 
O(1) number of runqueues against O(N) number of runqueues - which would 
be complete nonsense.)

so i see SLAB alien caches as an automatic self-partitioning mechanism 
... which has complexities but which also has _obvious_ performance 
benefits. Yes, it has some disadvantages like all caching schemes do - 
there's more cached memory tied up in the allocator at any given moment 
- but arguing against that would be like arguing against a 2MB L2 cache 
purely on the basis that a 1MB L2 cache is smaller and hence more 
space-efficient. Caches are there to cache stuff, and more caches ... 
use more memory. It's all a question of proportion and tuning, but the 
_design_ should be based on having as thorough caching as possible.

	Ingo

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-09  8:50                   ` Ingo Molnar
@ 2007-12-09  9:18                     ` Pekka Enberg
  2007-12-09 11:51                       ` Ingo Molnar
  2007-12-13 22:07                     ` Christoph Lameter
  1 sibling, 1 reply; 103+ messages in thread
From: Pekka Enberg @ 2007-12-09  9:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andrew Morton, Matt Mackall, Rafael J. Wysocki,
	LKML, Christoph Lameter

Hi Ingo,

On Dec 9, 2007 10:50 AM, Ingo Molnar <mingo@elte.hu> wrote:
> yes, i understand the initial announcement (and the Kconfig entry still
> says the same), but that is not matched up by the reality i see in the
> actual code - SLUB clearly uses a queue/list of objects (as cited in my
> previous mail), for obvious performance reasons.
>
> unless i'm missing something obvious (and i easily might), i see SLUB as
> SLAB reimplemented with a different queueing model. Not "without
> queueing".

Sure, the emphasis is on "no *such* queues."

On Dec 9, 2007 10:50 AM, Ingo Molnar <mingo@elte.hu> wrote:
> i'm curious about the real facts behind this "alien cache problem". I
> heard about it and asked around and was told that there's some sort of
> bad quadratic behavior of memory consumption on NUMA - but i cannot
> actually see that in the code.

I mostly live in the legacy 32-bit UMA/UP land still so I cannot verify this
myself but the kind folks at SGI claim the following (again from the
announcement):

"On our systems with 1k nodes / processors we have several gigabytes
 just tied up for storing references to objects for those queues  This does
 not include the objects that could be on those queues. One fears that the
 whole memory of the machine could one day be consumed by those
 queues."

The problem is that for each cache, you have an "per-node alien queues"
for each node (see struct kmem_cache nodelists -> struct kmem_list3 alien).
Moving slab metadata to struct page solves this but now you can only have
one "queue" thats part of the same struct.

                        Pekka

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-09  9:18                     ` Pekka Enberg
@ 2007-12-09 11:51                       ` Ingo Molnar
  2007-12-09 12:34                         ` Ingo Molnar
  0 siblings, 1 reply; 103+ messages in thread
From: Ingo Molnar @ 2007-12-09 11:51 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Linus Torvalds, Andrew Morton, Matt Mackall, Rafael J. Wysocki,
	LKML, Christoph Lameter


* Pekka Enberg <penberg@cs.helsinki.fi> wrote:

> I mostly live in the legacy 32-bit UMA/UP land still so I cannot 
> verify this myself but the kind folks at SGI claim the following 
> (again from the announcement):
> 
> "On our systems with 1k nodes / processors we have several gigabytes
>  just tied up for storing references to objects for those queues This 
>  does not include the objects that could be on those queues. One fears 
>  that the whole memory of the machine could one day be consumed by 
>  those queues."

Yes, you can find gigs tied up on systems that have 100 GB of RAM, or 
you can have gigs tied up if you over-size your caches. I'd like to see 
an accurate calculation done on this.

> The problem is that for each cache, you have an "per-node alien 
> queues" for each node (see struct kmem_cache nodelists -> struct 
> kmem_list3 alien). Moving slab metadata to struct page solves this but 
> now you can only have one "queue" thats part of the same struct.

yes, it's what i referred to as "distributed, per node cache". It has no 
"quadratic overhead". It has SLAB memory spread out amongst nodes. I.e. 
1 million pages are distributed amongst 1k nodes with 1000 pages per 
node with each node having 1 page.

But that memory is not lost and it's disingenous to call it 'overhead' 
and it very much comes handy for performance _IF_ there's global 
workload that involves cross-node allocations. It's simply a cache that 
is mis-sized and mis-constructed on large node count systems but i bet 
it makes quite a performance difference on low-node-count systems.

On high node-count systems it might make sense to reduce the amount of 
cross-node caching and to _structure_ the distributed NUMA SLAB cache in 
an intelligent way (perhaps along cpuset boundaries) - but a total, 
design level _elimination_ of this caching concept, using very 
misleading arguments, just looks stupid to me ...

	Ingo

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  2:40 2.6.24-rc4-git5: Reported regressions from 2.6.23 Rafael J. Wysocki
                   ` (7 preceding siblings ...)
  2007-12-08 10:44 ` Richard Purdie
@ 2007-12-09 11:54 ` Andrew Morton
  2007-12-09 12:05   ` Ingo Molnar
  2007-12-09 14:24   ` Rafael J. Wysocki
  2007-12-10 20:42 ` Ingo Molnar
  9 siblings, 2 replies; 103+ messages in thread
From: Andrew Morton @ 2007-12-09 11:54 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, Linus Torvalds, Ingo Molnar

On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> This message contains a list of some regressions from 2.6.23 which have been
> reported since 2.6.24-rc1 was released and for which there are no fixes in the
> mainline that I know of.

Here's one for you - I have a new Lenovo t61p with which to irritate
everyone. 

suspend-to-ram is a wipeout, but suspend-to-disk works OK under
2.6.23.

However under 2.6.24-rc1 and -rc4 the machine reboots right at the end of
resume-from-disk.

Am trying to do a git-disect on it but it seems that someone has been
screwing with ata Kconfig and I'm hitting a pile of cant-find-root-disk
bisection points and I can't immediately work out why.  I'll try to find
time to look at it again next week.


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 11:54 ` Andrew Morton
@ 2007-12-09 12:05   ` Ingo Molnar
  2007-12-09 14:24   ` Rafael J. Wysocki
  1 sibling, 0 replies; 103+ messages in thread
From: Ingo Molnar @ 2007-12-09 12:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rafael J. Wysocki, LKML, Linus Torvalds


* Andrew Morton <akpm@linux-foundation.org> wrote:

> Am trying to do a git-disect on it but it seems that someone has been 
> screwing with ata Kconfig and I'm hitting a pile of 
> cant-find-root-disk bisection points and I can't immediately work out 
> why.  I'll try to find time to look at it again next week.

the way i solve such bisection problems is to have the patch like below 
applied by a "git-bisect run" scriptlet (and popped off after the test). 
This way all must-have drivers and kernel features are selected for that 
particular testbox, no matter what Kconfig complication there are. 
(except outright config option renaming but those are rare)

	Ingo

Index: linux/arch/x86/Kconfig.needed
===================================================================
--- /dev/null
+++ linux/arch/x86/Kconfig.needed
@@ -0,0 +1,88 @@
+config FORCE_MINIMAL_CONFIG
+	bool
+	default y
+
+select EXPERIMENTAL
+
+select EXT3_FS
+select EXT3_FS_XATTR
+select EXT3_FS_POSIX_ACL
+select EXT3_FS_SECURITY
+select BLOCK
+select HOTPLUG
+#select INOTIFY
+#select INOTIFY_USER
+
+# so that capset() works (sudo, etc.):
+select SECURITY
+select SECURITY_CAPABILITIES
+
+select BINFMT_ELF
+select MSDOS_PARTITION
+select PARTITION_ADVANCED
+select BSD_DISKLABEL
+
+select SYSFS
+select SYSFS_DEPRECATED
+select PROC_FS
+select FUTEX
+
+select ATA
+select SATA_AHCI
+select ATA_PIIX
+select PATA_AMD
+select PATA_OLDPIIX
+select BLK_DEV_SD
+
+select E100
+select E1000
+select NET_ETHERNET
+select NET_PCI
+select MII
+select CRC32
+
+select 8139TOO
+select FORCEDETH
+
+select PACKET
+
+select NETPOLL
+select NETCONSOLE
+select NET_POLL_CONTROLLER
+select INET
+select NET
+select UNIX
+select NETDEVICES
+
+select SERIAL_8250
+select SERIAL_8250_CONSOLE
+select MAGIC_SYSRQ
+
+select INPUT
+select INPUT_MOUSEDEV
+select INPUT_POLLDEV
+select INPUT_KEYBOARD
+select KEYBOARD_ATKBD
+select SERIO
+select SERIO_I8042
+
+select VT
+select VT_CONSOLE
+select HW_CONSOLE
+select VGA_CONSOLE
+select EARLY_PRINTK
+select PRINTK
+select UNIX98_PTYS
+
+select USB
+select USB_MOUSE
+select USB_EHCI_HCD
+select USB_OHCI_HCD
+select USB_UHCI_HCD
+select USB_SUPPORT
+
+select PCI
+
+select STANDALONE
+select PREVENT_FIRMWARE_BUILD
+
Index: linux/lib/Kconfig
===================================================================
--- linux.orig/lib/Kconfig
+++ linux/lib/Kconfig
@@ -142,3 +142,6 @@ config CHECK_SIGNATURE
 	bool
 
 endmenu
+
+source "arch/x86/Kconfig.needed"
+
Index: linux/lib/Kconfig.debug

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-09 11:51                       ` Ingo Molnar
@ 2007-12-09 12:34                         ` Ingo Molnar
  0 siblings, 0 replies; 103+ messages in thread
From: Ingo Molnar @ 2007-12-09 12:34 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Linus Torvalds, Andrew Morton, Matt Mackall, Rafael J. Wysocki,
	LKML, Christoph Lameter


* Ingo Molnar <mingo@elte.hu> wrote:

> > The problem is that for each cache, you have an "per-node alien 
> > queues" for each node (see struct kmem_cache nodelists -> struct 
> > kmem_list3 alien). Moving slab metadata to struct page solves this 
> > but now you can only have one "queue" thats part of the same struct.
> 
> yes, it's what i referred to as "distributed, per node cache". It has 
> no "quadratic overhead". It has SLAB memory spread out amongst nodes. 
> I.e. 1 million pages are distributed amongst 1k nodes with 1000 pages 
> per node with each node having 1 page.
> 
> But that memory is not lost and it's disingenous to call it 'overhead' 
> and it very much comes handy for performance _IF_ there's global 
> workload that involves cross-node allocations. It's simply a cache 
> that is mis-sized and mis-constructed on large node count systems but 
> i bet it makes quite a performance difference on low-node-count 
> systems.
> 
> On high node-count systems it might make sense to reduce the amount of 
> cross-node caching and to _structure_ the distributed NUMA SLAB cache 
> in an intelligent way (perhaps along cpuset boundaries) [...]

i think much of this could be achieved by delaying the creation of alien 
caches up until the point a { node X } -> { node Y } alloc/free 
relationship gets established. So if a system is partitioned along 
cpusets, vast areas of the NxN matrix never gets populated with alien 
caches.

	Ingo

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09  7:00   ` Tejun Heo
@ 2007-12-09 13:42     ` Alan Cox
  2007-12-09 15:09       ` Tejun Heo
                         ` (2 more replies)
  0 siblings, 3 replies; 103+ messages in thread
From: Alan Cox @ 2007-12-09 13:42 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar

> If we fail to find out the solution in time, we always have the
> alternative of backing out the ATAPI transfer chunk size update.  This

Which will break far more controllers and drives than it fixes, so
backing it out is nonsensical and not in the general good.

> will break some other cases which were fixed by the change but those
> won't be regressions at least and we can add transfer chunk size
> update with other changes to 2.6.25.

Great, make everyone else wait another three months for a working CD
drive. The one off regression appears far less harmful than a revert.

Tejun - instead of backing out important updates for 2.6.24 we should
just blacklist that specific drive for now and sort it nicely in 2.6.25,
not revert stuff and break everyone elses ATAPI devices.

Alan

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-09  7:58           ` Ingo Molnar
@ 2007-12-09 14:17             ` Rafael J. Wysocki
  0 siblings, 0 replies; 103+ messages in thread
From: Rafael J. Wysocki @ 2007-12-09 14:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Linus Torvalds, Matt Mackall, LKML, Christoph Lameter

On Sunday, 9 of December 2007, Ingo Molnar wrote:
> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > It's a kmap_atomic() debugging patch which I wrote ages ago and whcih 
> > Ingo sucked into his tree.  I don't _think_ this warning is present in 
> > your tree at all.
> > 
> > http://lkml.org/lkml/2007/11/29/157 is where it starts.
> > 
> > I had a lenghty back-and-forth with Christoph on this within the past 
> > couple of months and I cannot locate the thread and I don't recall 
> > what the upshot was and Christoph is still offline.
> > 
> > Knocking out __GFP_ZERO at the point where the slab allocator(s) call 
> > the page allocator seems like a good approach to me.
> > 
> > But I don't think we need to do anything for 2.6.24..
> 
> ok - Rafael, please strike this off the regressions list, there is no 
> problem in .24 other than the double memset for some SLOB metadata.

OK, dropped.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09  6:52   ` Tejun Heo
@ 2007-12-09 14:20     ` Rafael J. Wysocki
  2007-12-09 15:11       ` Tejun Heo
  0 siblings, 1 reply; 103+ messages in thread
From: Rafael J. Wysocki @ 2007-12-09 14:20 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Andrew Morton, LKML, Linus Torvalds, Ingo Molnar, linux-ide

On Sunday, 9 of December 2007, Tejun Heo wrote:
> Hello,
> 
> Andrew Morton wrote:
> >> Subject		: PATA scan: ACPI Exception AE_AML_PACKAGE_LIMIT... is beyond end of object
> >> Submitter	: Hans de Bruin <bruinjm@xs4all.nl>
> >> References	: http://bugzilla.kernel.org/show_bug.cgi?id=9320
> >> Handled-By	: Robert Moore <Robert.Moore@intel.com>
> >> 		  Tejun Heo <htejun@gmail.com>
> >> 		  Fu Michael <michael.fu@intel.com>
> >> Patch		: 
> >>
> > 
> > A number of other people are seeing the same thing and Tejun is
> > putting in a blacklist of machines which cannot use libata+acpi.
> > That patch is not yet in any git tree which I pull.
> > 
> > AFACIT the machines kepe working OK - there's just some nasty dmesg
> > spew.
> > 
> > If any machines _are_ breaking then this could cause real problems
> > and I'd prefer that we either go for a whitelist or arrange to
> > detect the condition and fall back to non-acpi ata.
> 
> The pending patchset should make ATA ACPI quite resistant to failures.

Are you going to push it for 2.6.24?

> Known bad boards can be blacklisted (currently only one is on the
> list), ATA ACPI is disabled quicker if ACPI evalution fails, execution
> errors are handled better and commands which are intended to help the
> vendor instead of the user are filtered.  So, I think we have enough
> safety nets.

Sounds good.  :-)

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 11:54 ` Andrew Morton
  2007-12-09 12:05   ` Ingo Molnar
@ 2007-12-09 14:24   ` Rafael J. Wysocki
  1 sibling, 0 replies; 103+ messages in thread
From: Rafael J. Wysocki @ 2007-12-09 14:24 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, Linus Torvalds, Ingo Molnar

On Sunday, 9 of December 2007, Andrew Morton wrote:
> On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > This message contains a list of some regressions from 2.6.23 which have been
> > reported since 2.6.24-rc1 was released and for which there are no fixes in the
> > mainline that I know of.
> 
> Here's one for you - I have a new Lenovo t61p with which to irritate
> everyone. 
> 
> suspend-to-ram is a wipeout, but suspend-to-disk works OK under
> 2.6.23.
> 
> However under 2.6.24-rc1 and -rc4 the machine reboots right at the end of
> resume-from-disk.

It's http://bugzilla.kernel.org/show_bug.cgi?id=9258 , I think.

Does it do that if you unload ehci-hcd before the hibernation?

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 13:42     ` Alan Cox
@ 2007-12-09 15:09       ` Tejun Heo
  2007-12-09 15:25         ` Alan Cox
  2007-12-09 18:36       ` Linus Torvalds
  2007-12-09 18:41       ` Linus Torvalds
  2 siblings, 1 reply; 103+ messages in thread
From: Tejun Heo @ 2007-12-09 15:09 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar

Hello, Alan.

Alan Cox wrote:
>> will break some other cases which were fixed by the change but those
>> won't be regressions at least and we can add transfer chunk size
>> update with other changes to 2.6.25.
> 
> Great, make everyone else wait another three months for a working CD
> drive. The one off regression appears far less harmful than a revert.

Newly broken ones will be regressions.  How many do we fix by the
change?  On SATA, setting the correct transfer chunk size doesn't seem
to fix many.

> Tejun - instead of backing out important updates for 2.6.24 we should
> just blacklist that specific drive for now and sort it nicely in 2.6.25,
> not revert stuff and break everyone elses ATAPI devices.

We'll need to blacklist setting transfer chunk size, eek, and let's
leave that as the last resort and hope that we find the solution soon.
Blacklist takes time to develop and temporary blacklist for just one
release doesn't sound like a good idea.

-- 
tejun

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 14:20     ` Rafael J. Wysocki
@ 2007-12-09 15:11       ` Tejun Heo
  0 siblings, 0 replies; 103+ messages in thread
From: Tejun Heo @ 2007-12-09 15:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andrew Morton, LKML, Linus Torvalds, Ingo Molnar, linux-ide

Rafael J. Wysocki wrote:
>>> If any machines _are_ breaking then this could cause real problems
>>> and I'd prefer that we either go for a whitelist or arrange to
>>> detect the condition and fall back to non-acpi ata.
>> The pending patchset should make ATA ACPI quite resistant to failures.
> 
> Are you going to push it for 2.6.24?

Yeah, I'm hoping so.  Maybe command filtering should wait till 2.6.25
but the rest, yeap.

-- 
tejun

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 15:09       ` Tejun Heo
@ 2007-12-09 15:25         ` Alan Cox
  2007-12-09 15:39           ` Tejun Heo
  0 siblings, 1 reply; 103+ messages in thread
From: Alan Cox @ 2007-12-09 15:25 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar

> Newly broken ones will be regressions.  How many do we fix by the
> change?  On SATA, setting the correct transfer chunk size doesn't seem
> to fix many.

Regressions are not some kind of grand evil. Better to regress the odd
device than continue to break entire controllers.

> > Tejun - instead of backing out important updates for 2.6.24 we should
> > just blacklist that specific drive for now and sort it nicely in 2.6.25,
> > not revert stuff and break everyone elses ATAPI devices.
> 
> We'll need to blacklist setting transfer chunk size, eek, and let's
> leave that as the last resort and hope that we find the solution soon.
> Blacklist takes time to develop and temporary blacklist for just one
> release doesn't sound like a good idea.

It seems to be sensible to me *if* it is just this one device we are
somehow confusing and that one device is holding up fixing everything
else.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 15:25         ` Alan Cox
@ 2007-12-09 15:39           ` Tejun Heo
  0 siblings, 0 replies; 103+ messages in thread
From: Tejun Heo @ 2007-12-09 15:39 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar

Alan Cox wrote:
>> Newly broken ones will be regressions.  How many do we fix by the
>> change?  On SATA, setting the correct transfer chunk size doesn't seem
>> to fix many.
> 
> Regressions are not some kind of grand evil. Better to regress the odd
> device than continue to break entire controllers.

We need to put more weight on regressions as it at least makes releases
predictable to users.  Anyways, I wasn't saying it was some absolute
maxim.  I was literally asking how many so that we can evaluate the
trade off.

>>> Tejun - instead of backing out important updates for 2.6.24 we should
>>> just blacklist that specific drive for now and sort it nicely in 2.6.25,
>>> not revert stuff and break everyone elses ATAPI devices.
>> We'll need to blacklist setting transfer chunk size, eek, and let's
>> leave that as the last resort and hope that we find the solution soon.
>> Blacklist takes time to develop and temporary blacklist for just one
>> release doesn't sound like a good idea.
> 
> It seems to be sensible to me *if* it is just this one device we are
> somehow confusing and that one device is holding up fixing everything
> else.

Yeah, if it's this one device, I fully agree.  Let's see how debugging
turns out.

-- 
tejun

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 10:55       ` Andreas Mohr
@ 2007-12-09 15:46         ` Tejun Heo
  2007-12-09 19:59           ` Andreas Mohr
  0 siblings, 1 reply; 103+ messages in thread
From: Tejun Heo @ 2007-12-09 15:46 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds,
	Ingo Molnar, linux-ide, Len Brown, linux-acpi

Andreas Mohr wrote:
> Hi,
> 
> On Sat, Dec 08, 2007 at 02:20:01AM -0800, Andrew Morton wrote:
>>> Does this report now win me the lucky draw, pretty please? ;)
>> nah, you have to cc the acpi guys to get a prize ;)
> 
> Thought so shortly, but missed it.
> 
>> Andreas, please do separately report that WOL problem too..
> 
> Local setup issue only, at least this one *isn't* a 2.6.24-rc regression. ;)
> 
>> Our list just reached 30.
> 
> Oh, so this is in fact a separate issue? Wasn't sure, couldn't do
> enough analysis of similar cases.
> 
> Will test any (already submitted!) suggestions ASAP.

Please post full kernel boot log and the result of 'lspci -nn'.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-09  8:20                 ` Pekka Enberg
  2007-12-09  8:50                   ` Ingo Molnar
@ 2007-12-09 15:59                   ` Arjan van de Ven
  1 sibling, 0 replies; 103+ messages in thread
From: Arjan van de Ven @ 2007-12-09 15:59 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Matt Mackall,
	Rafael J. Wysocki, LKML, Christoph Lameter

On Sun, 9 Dec 2007 10:20:19 +0200
"Pekka Enberg" <penberg@cs.helsinki.fi> wrote:
\
> Now, while SLAB code is "pleasant and straightforward code" (thanks,
> btw) for UMA, it's really hairy for NUMA plus the "alien caches" eat
> tons of memory 

.. and they make slab slower on numa systems for database workloads
(I'm sure they do fine for SGI's customers HPC load though))

>(which is why Christoph wrote SLUB in the first place,
> the current code in SLAB is mostly unfixable due to its *queuing*
> nature).

To be honest, a SLAB without the alien stuff might be one of the best
performers today .... ;)
(on database loads.. where slub is quite a disaster as everyone knows)

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 13:42     ` Alan Cox
  2007-12-09 15:09       ` Tejun Heo
@ 2007-12-09 18:36       ` Linus Torvalds
  2007-12-09 21:54         ` Alan Cox
  2007-12-09 18:41       ` Linus Torvalds
  2 siblings, 1 reply; 103+ messages in thread
From: Linus Torvalds @ 2007-12-09 18:36 UTC (permalink / raw)
  To: Alan Cox; +Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar



On Sun, 9 Dec 2007, Alan Cox wrote:
>
> > If we fail to find out the solution in time, we always have the
> > alternative of backing out the ATAPI transfer chunk size update.  This
> 
> Which will break far more controllers and drives than it fixes, so
> backing it out is nonsensical and not in the general good.

No.

Regressions are worse. It doesn't matter AT ALL if you think that it 
breaks ten times more devices, if it's a regression and those devices 
didn't work in the past, they simply DO NOT COUNT.

				Linus

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 13:42     ` Alan Cox
  2007-12-09 15:09       ` Tejun Heo
  2007-12-09 18:36       ` Linus Torvalds
@ 2007-12-09 18:41       ` Linus Torvalds
  2007-12-09 22:01         ` Alan Cox
  2 siblings, 1 reply; 103+ messages in thread
From: Linus Torvalds @ 2007-12-09 18:41 UTC (permalink / raw)
  To: Alan Cox; +Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar



On Sun, 9 Dec 2007, Alan Cox wrote:
> 
> Great, make everyone else wait another three months for a working CD
> drive. The one off regression appears far less harmful than a revert.

Btw, Alan, that "math" is total and utter BULLSH*T, and you should know 
that.

"The one off regression" is likely the tip of an iceberg. If something 
regresses for one person, for that one person who tested and noticed and 
made a bug-report, there's probably a thousand people who haven't even 
tested the development kernel, or who had problems and just went back to 
the previous version.

In contrast, reverting something will be guaranteed to not have those 
kinds of issues, since the only people who could notice are people for who 
it never worked in the first place. There's no "silent mass of people" 
that can be affected.

This is why regressions are so important. They don't trump _everything_, 
but basically ignoring and letting them slide is *much* more painful than 
just reverting it.

The biggest reason to ignore a regression is if nobody can even figure 
out where it came from, or reverting simply isn't an option for some 
really deep and fundamental issue. That doesn't seem to be the case here. 

So we should revert unless there is some known acceptable real fix.

		Linus


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 15:46         ` Tejun Heo
@ 2007-12-09 19:59           ` Andreas Mohr
  0 siblings, 0 replies; 103+ messages in thread
From: Andreas Mohr @ 2007-12-09 19:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andreas Mohr, Andrew Morton, Rafael J. Wysocki, LKML,
	Linus Torvalds, Ingo Molnar, linux-ide, Len Brown, linux-acpi

Hi,

On Mon, Dec 10, 2007 at 12:46:57AM +0900, Tejun Heo wrote:
> Please post full kernel boot log and the result of 'lspci -nn'.

Done, on #9530.

Will try some of the promising patches/suggestions now, hopefully this will
show me what's up. Will add further results there.

Andreas Mohr

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 18:36       ` Linus Torvalds
@ 2007-12-09 21:54         ` Alan Cox
  0 siblings, 0 replies; 103+ messages in thread
From: Alan Cox @ 2007-12-09 21:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar

> Regressions are worse. It doesn't matter AT ALL if you think that it 
> breaks ten times more devices, if it's a regression and those devices 
> didn't work in the past, they simply DO NOT COUNT.

Must be time for an -ac tree again.

Alan

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 18:41       ` Linus Torvalds
@ 2007-12-09 22:01         ` Alan Cox
  2007-12-09 22:51           ` Ray Lee
  2007-12-10  1:57           ` Linus Torvalds
  0 siblings, 2 replies; 103+ messages in thread
From: Alan Cox @ 2007-12-09 22:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar

> Btw, Alan, that "math" is total and utter BULLSH*T, and you should know 
> that.

The one off regression is probably not one off, but this is IDE so
actually its quite probable its a single broken firmware. 

The alternative is that you cripple just about every user of various
other standards compliant devices and controllers whose hardware we
finally fixed.

Finally you need to remember that the 'regression' is caused by the fact
we now do the _right_ thing both in terms of 'old IDE' and specs.

Believe it or not I did actually think in quite some detail about this
case, and the relative probabilities, and go back and re-review the old
IDE code (whose behaviour we now follow) and the spec. I spend a
measurable amount of my time reviewing code and weighing risks,
regressions and progress for an enterprise Linux vendor, so its something
I do every day of the week.

To blindly argue regressions are critical is sometimes (as in this case)
to argue that "this freeway is no longer compatible with a horse and
cart" means the freeway should be turned back into a dirt road.

The horse and cart happened to work by chance because the road was quiet
that day. We clearly need to add a horse & cart lane in the long term,
but for 2.6.24 it may well be the right thing to do just to blacklist
that specific drive back to old behaviour until we can tidy it more
nicely.

Alan

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 22:01         ` Alan Cox
@ 2007-12-09 22:51           ` Ray Lee
  2007-12-10  1:57           ` Linus Torvalds
  1 sibling, 0 replies; 103+ messages in thread
From: Ray Lee @ 2007-12-09 22:51 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Tejun Heo, Andrew Morton, Rafael J. Wysocki,
	LKML, Ingo Molnar

On Dec 9, 2007 2:01 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > Btw, Alan, that "math" is total and utter BULLSH*T, and you should know
> > that.
>
> To blindly argue regressions are critical is sometimes (as in this case)
> to argue that "this freeway is no longer compatible with a horse and
> cart" means the freeway should be turned back into a dirt road.

Honest question: If you allow regressions, then how does one guarantee
forward progress? (If it were a finite set of systems, all within one
group's control, then the answer is simple: count how many work.
However, in this case we only have a statistical sampling available to
us.)

Ray

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 22:01         ` Alan Cox
  2007-12-09 22:51           ` Ray Lee
@ 2007-12-10  1:57           ` Linus Torvalds
  2007-12-10  3:28             ` Alan Cox
                               ` (2 more replies)
  1 sibling, 3 replies; 103+ messages in thread
From: Linus Torvalds @ 2007-12-10  1:57 UTC (permalink / raw)
  To: Alan Cox; +Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar



On Sun, 9 Dec 2007, Alan Cox wrote:
> 
> The one off regression is probably not one off, but this is IDE so
> actually its quite probable its a single broken firmware. 
>
> The alternative is that you cripple just about every user of various
> other standards compliant devices and controllers whose hardware we
> finally fixed.

Alan, you're so full of shit that it's not even funny.

Have you even *read* the thread?

Tejun already reported that this apparently gets fixed _properly_ with the 
more extensive cleanups and fixes that are pending for 2.6.25.

In other words, the stuff you call so critically important (yet we've been 
able to live without it until now!) is apparently simply NOT YET READY. 
It's breaking things.

In this case, Tejun seems to be right on the money.  I also agree 100% 
with him when he says

   "Blacklist takes time to develop and temporary blacklist for just one
    release doesn't sound like a good idea."

because if we create some blacklist for that one reported device, not only 
is it likely going to be wrong (it's almost never just one firmware or one 
chip that has a particular issue), but we tend to create thee blacklists 
and later realize that we shouldn't have blacklisted things at all, we 
should just have done things differently.

For examples of that, see the NCQ blacklist that was just _us_ doing 
things wrong (over-reacting to things we shouldn't care about), and 
there's currently another totally unrelated discussion on a very similar 
thing wrt libata and the ACPI startup commands for an unused controller 
port.

> Finally you need to remember that the 'regression' is caused by the fact
> we now do the _right_ thing both in terms of 'old IDE' and specs.

.. and what the hell does that matter? If the code doesn't work, it 
doesn't work, and you might as well point to some random scribblings done 
by a three-year-old on toilet paper rather than any "specs".

Real life matters more. Regressions matter more.

We apparently do have a full fix, but it seems to be too invasive for 
2.6.24, which means that the thing that currently DOES NOT WORK and 
causes regressions should be reverted, so that 2.6.24 is at least no worse 
than 2.6.23 (and all earlier kernels) in this respect.

And then we should just hope that the more complete fix that Tejun has 
doesn't cause any issues on its own. I would suggest that if you care so 
deeply about this issue, you press Fedora into putting Tejun's tree into 
Fedora testing, and get that thing tested out extensively.

So the fact is, we have a way forward, but we should *not* take steps 
backwards just because you want to push something out that isn't quite 
ready. We should revert the change that causes the current trouble, safe 
in the knowledge (or at least "strong hope") that we have a way forward 
that makes *both* 2.6.24 and 2.6.25 be continual improvements.

We used to allow regressions. It was really painful. It's hard to debug 
things when things sometimes break. It's much better to have a nice 
constant monotonic improvement.

It's better for users, but it's much better also for developers, even if 
you may be frustrated right now because some new code effectively gets 
shut down until it works for everybody.

		Linus

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  1:57           ` Linus Torvalds
@ 2007-12-10  3:28             ` Alan Cox
  2007-12-10  3:38             ` Alan Cox
  2007-12-10  8:21             ` Ingo Molnar
  2 siblings, 0 replies; 103+ messages in thread
From: Alan Cox @ 2007-12-10  3:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar

Its your kernel. Its your call, and your privilege to be wrong.

And anyone with ATAPI problems should probably test the -mm tree before
reporting anything.

Alan

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  1:57           ` Linus Torvalds
  2007-12-10  3:28             ` Alan Cox
@ 2007-12-10  3:38             ` Alan Cox
  2007-12-10 15:38               ` Linus Torvalds
  2007-12-10  8:21             ` Ingo Molnar
  2 siblings, 1 reply; 103+ messages in thread
From: Alan Cox @ 2007-12-10  3:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar

> Have you even *read* the thread?

In detail, as it unfolds and while testing variants of Tejun's code on
the hardware I have access to - none of which has this bug making it
rather trickier to help.

> In other words, the stuff you call so critically important (yet we've been 
> able to live without it until now!) is apparently simply NOT YET READY. 
> It's breaking things.

And as I keep pointing out but you keep ignoring - not doing it breaks
even more things, by a factor of quite a lot.

> .. and what the hell does that matter? If the code doesn't work, it 
> doesn't work, and you might as well point to some random scribblings done 
> by a three-year-old on toilet paper rather than any "specs".

The code without the changes doesn't work either. So pick your toilet
paper.. by your argument both are toilet paper.

> causes regressions should be reverted, so that 2.6.24 is at least no worse 
> than 2.6.23 (and all earlier kernels) in this respect.

Which as the distro bug lists for ATAPI will tell you - aint good. Still
distro vendors can ship patches.

> We used to allow regressions. It was really painful. It's hard to debug 
> things when things sometimes break. It's much better to have a nice 
> constant monotonic improvement.

Linus, the kernel regresses all over the place every release. If it
didn't do that you'd never get any changes in. Your kernel would
fossilize like RHEL or SLES and you'd be spending weeks analysing each
changeset for possible side effects, or - as happens by neccessity -
adding code paths so a fix vital to one driver ceases to share core code
with another driver - to reduce regression risk. Been there, done that
and its not the way progress happens.

> It's better for users, but it's much better also for developers, even if 
> you may be frustrated right now because some new code effectively gets 
> shut down until it works for everybody.

Have fun. I trust you'll be fixing the other 11 I think it was listed
regressions before 2.6.24  - or backing out every changeset that could be
responsible ?

No I thought not - because that wouldn't be sensible either.

Alan

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  1:57           ` Linus Torvalds
  2007-12-10  3:28             ` Alan Cox
  2007-12-10  3:38             ` Alan Cox
@ 2007-12-10  8:21             ` Ingo Molnar
  2007-12-10  8:27               ` Tejun Heo
  2 siblings, 1 reply; 103+ messages in thread
From: Ingo Molnar @ 2007-12-10  8:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Tejun already reported that this apparently gets fixed _properly_ with 
> the more extensive cleanups and fixes that are pending for 2.6.25.

btw., how extensive are those cleanups and fixes in reality, is there a 
rollup somewhere one could take a look at? Those fixes and cleanups were 
deferred to v2.6.25 in the knowledge of having the current code included 
in v2.6.24 - but now that the current approach seems to regress, maybe 
those cleanups are still safe enough. (compared to an outright revert)

	Ingo

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  8:21             ` Ingo Molnar
@ 2007-12-10  8:27               ` Tejun Heo
  2007-12-10  8:41                 ` Ingo Molnar
  0 siblings, 1 reply; 103+ messages in thread
From: Tejun Heo @ 2007-12-10  8:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Alan Cox, Andrew Morton, Rafael J. Wysocki, LKML

Ingo Molnar wrote:
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
>> Tejun already reported that this apparently gets fixed _properly_ with 
>> the more extensive cleanups and fixes that are pending for 2.6.25.
> 
> btw., how extensive are those cleanups and fixes in reality, is there a 
> rollup somewhere one could take a look at? Those fixes and cleanups were 
> deferred to v2.6.25 in the knowledge of having the current code included 
> in v2.6.24 - but now that the current approach seems to regress, maybe 
> those cleanups are still safe enough. (compared to an outright revert)

The following git tree contains patches pending review for 2.6.25.

http://git.kernel.org/?p=linux/kernel/git/tj/libata-dev.git;a=shortlog;h=improve-ATAPI-data-transfer-no-pio

And we're getting close to fixing the regression.  I don't think there's
too much worry about this one.  Just need a bit more time to test few
more things.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  8:27               ` Tejun Heo
@ 2007-12-10  8:41                 ` Ingo Molnar
  0 siblings, 0 replies; 103+ messages in thread
From: Ingo Molnar @ 2007-12-10  8:41 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linus Torvalds, Alan Cox, Andrew Morton, Rafael J. Wysocki, LKML


* Tejun Heo <htejun@gmail.com> wrote:

> The following git tree contains patches pending review for 2.6.25.
> 
> http://git.kernel.org/?p=linux/kernel/git/tj/libata-dev.git;a=shortlog;h=improve-ATAPI-data-transfer-no-pio
> 
> And we're getting close to fixing the regression.  I don't think 
> there's too much worry about this one.  Just need a bit more time to 
> test few more things.

ah, i see, the joys of the kernel running BIOS written code (AML):

  http://bugzilla.kernel.org/attachment.cgi?id=13932&action=view

cute!

	Ingo

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  3:38             ` Alan Cox
@ 2007-12-10 15:38               ` Linus Torvalds
  0 siblings, 0 replies; 103+ messages in thread
From: Linus Torvalds @ 2007-12-10 15:38 UTC (permalink / raw)
  To: Alan Cox; +Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar



On Mon, 10 Dec 2007, Alan Cox wrote:
> 
> And as I keep pointing out but you keep ignoring - not doing it breaks
> even more things, by a factor of quite a lot.

But we've never done it before in libata, right?

So the "not doing it breaks" argument is about stuff that isn't 
regressions.

Can you really not see the difference?

		Linus

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  2:40 2.6.24-rc4-git5: Reported regressions from 2.6.23 Rafael J. Wysocki
                   ` (8 preceding siblings ...)
  2007-12-09 11:54 ` Andrew Morton
@ 2007-12-10 20:42 ` Ingo Molnar
  2007-12-10 20:57   ` Guillaume Chazarain
  2007-12-10 20:59   ` Andrew Morton
  9 siblings, 2 replies; 103+ messages in thread
From: Ingo Molnar @ 2007-12-10 20:42 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, Andrew Morton, Linus Torvalds


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> Subject		: jiffies counter leaps in 2.6.24-rc3
> Submitter	: Stefano Brivio <stefano.brivio@polimi.it>
> References	: http://lkml.org/lkml/2007/11/24/53
> 		  http://bugzilla.kernel.org/show_bug.cgi?id=9475
> Handled-By	: Ingo Molnar <mingo@elte.hu>
> Patch		: http://lkml.org/lkml/2007/12/7/132

Linus, Andrew, i need some help deciding what to do about this 
regression. The fixes for this have been tested and resolve the 
regression, but they change printk and other code that runs by default 
and is thus rather invasive so late in the v2.6.24 cycle. This bug 
should only affect CONFIG_PRINTK_TIME=y kernels (a non-default debug 
option) - although some claimed effect was on udelay()/mdelay() too.

i've attached below the queue of 5 patches that fix this problem. They 
have been build and boot tested with more than 1000 random kernels in 
the past few days, so i certainly trust the core and x86 bits of this.

what do you think? Right now i've got them queued up for 2.6.25 in both 
the scheduler-devel and the x86-devel git trees - but can submit them 
for 2.6.24 if it's better if we did them there. I've got no strong 
opinion either way.

	Ingo

-------------------->
Subject: x86: scale cyc_2_nsec according to CPU frequency
From: "Guillaume Chazarain" <guichaz@yahoo.fr>

scale the sched_clock() cyc_2_nsec scaling factor according to
CPU frequency changes.

[ mingo@elte.hu: simplified it and fixed it for SMP. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/tsc_32.c |   43 ++++++++++++++++++++++++++++++-----
 arch/x86/kernel/tsc_64.c |   57 ++++++++++++++++++++++++++++++++++++++---------
 include/asm-x86/timer.h  |   23 ++++++++++++++----
 3 files changed, 102 insertions(+), 21 deletions(-)

Index: linux/arch/x86/kernel/tsc_32.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_32.c
+++ linux/arch/x86/kernel/tsc_32.c
@@ -5,6 +5,7 @@
 #include <linux/jiffies.h>
 #include <linux/init.h>
 #include <linux/dmi.h>
+#include <linux/percpu.h>
 
 #include <asm/delay.h>
 #include <asm/tsc.h>
@@ -80,13 +81,31 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable);
  *
  *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
  */
-unsigned long cyc2ns_scale __read_mostly;
 
-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
+DEFINE_PER_CPU(unsigned long, cyc2ns);
 
-static inline void set_cyc2ns_scale(unsigned long cpu_khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 {
-	cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR)/cpu_khz;
+	unsigned long flags, prev_scale, *scale;
+	unsigned long long tsc_now, ns_now;
+
+	local_irq_save(flags);
+	sched_clock_idle_sleep_event();
+
+	scale = &per_cpu(cyc2ns, cpu);
+
+	rdtscll(tsc_now);
+	ns_now = __cycles_2_ns(tsc_now);
+
+	prev_scale = *scale;
+	if (cpu_khz)
+		*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+	/*
+	 * Start smoothly with the new frequency:
+	 */
+	sched_clock_idle_wakeup_event(0);
+	local_irq_restore(flags);
 }
 
 /*
@@ -239,7 +258,9 @@ time_cpufreq_notifier(struct notifier_bl
 						ref_freq, freq->new);
 			if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
 				tsc_khz = cpu_khz;
-				set_cyc2ns_scale(cpu_khz);
+				preempt_disable();
+				set_cyc2ns_scale(cpu_khz, smp_processor_id());
+				preempt_enable();
 				/*
 				 * TSC based sched_clock turns
 				 * to junk w/ cpufreq
@@ -367,6 +388,8 @@ static inline void check_geode_tsc_relia
 
 void __init tsc_init(void)
 {
+	int cpu;
+
 	if (!cpu_has_tsc || tsc_disable)
 		goto out_no_tsc;
 
@@ -380,7 +403,15 @@ void __init tsc_init(void)
 				(unsigned long)cpu_khz / 1000,
 				(unsigned long)cpu_khz % 1000);
 
-	set_cyc2ns_scale(cpu_khz);
+	/*
+	 * Secondary CPUs do not run through tsc_init(), so set up
+	 * all the scale factors for all CPUs, assuming the same
+	 * speed as the bootup CPU. (cpufreq notifiers will fix this
+	 * up if their speed diverges)
+	 */
+	for_each_possible_cpu(cpu)
+		set_cyc2ns_scale(cpu_khz, cpu);
+
 	use_tsc_delay();
 
 	/* Check and install the TSC clocksource */
Index: linux/arch/x86/kernel/tsc_64.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_64.c
+++ linux/arch/x86/kernel/tsc_64.c
@@ -10,6 +10,7 @@
 
 #include <asm/hpet.h>
 #include <asm/timex.h>
+#include <asm/timer.h>
 
 static int notsc __initdata = 0;
 
@@ -18,16 +19,48 @@ EXPORT_SYMBOL(cpu_khz);
 unsigned int tsc_khz;
 EXPORT_SYMBOL(tsc_khz);
 
-static unsigned int cyc2ns_scale __read_mostly;
+/* Accelerators for sched_clock()
+ * convert from cycles(64bits) => nanoseconds (64bits)
+ *  basic equation:
+ *		ns = cycles / (freq / ns_per_sec)
+ *		ns = cycles * (ns_per_sec / freq)
+ *		ns = cycles * (10^9 / (cpu_khz * 10^3))
+ *		ns = cycles * (10^6 / cpu_khz)
+ *
+ *	Then we use scaling math (suggested by george@mvista.com) to get:
+ *		ns = cycles * (10^6 * SC / cpu_khz) / SC
+ *		ns = cycles * cyc2ns_scale / SC
+ *
+ *	And since SC is a constant power of two, we can convert the div
+ *  into a shift.
+ *
+ *  We can use khz divisor instead of mhz to keep a better precision, since
+ *  cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
+ *  (mathieu.desnoyers@polymtl.ca)
+ *
+ *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
+ */
+DEFINE_PER_CPU(unsigned long, cyc2ns);
 
-static inline void set_cyc2ns_scale(unsigned long khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 {
-	cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz;
-}
+	unsigned long flags, prev_scale, *scale;
+	unsigned long long tsc_now, ns_now;
 
-static unsigned long long cycles_2_ns(unsigned long long cyc)
-{
-	return (cyc * cyc2ns_scale) >> NS_SCALE;
+	local_irq_save(flags);
+	sched_clock_idle_sleep_event();
+
+	scale = &per_cpu(cyc2ns, cpu);
+
+	rdtscll(tsc_now);
+	ns_now = __cycles_2_ns(tsc_now);
+
+	prev_scale = *scale;
+	if (cpu_khz)
+		*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+	sched_clock_idle_wakeup_event(0);
+	local_irq_restore(flags);
 }
 
 unsigned long long sched_clock(void)
@@ -100,7 +133,9 @@ static int time_cpufreq_notifier(struct 
 			mark_tsc_unstable("cpufreq changes");
 	}
 
-	set_cyc2ns_scale(tsc_khz_ref);
+	preempt_disable();
+	set_cyc2ns_scale(tsc_khz_ref, smp_processor_id());
+	preempt_enable();
 
 	return 0;
 }
@@ -151,7 +186,7 @@ static unsigned long __init tsc_read_ref
 void __init tsc_calibrate(void)
 {
 	unsigned long flags, tsc1, tsc2, tr1, tr2, pm1, pm2, hpet1, hpet2;
-	int hpet = is_hpet_enabled();
+	int hpet = is_hpet_enabled(), cpu;
 
 	local_irq_save(flags);
 
@@ -206,7 +241,9 @@ void __init tsc_calibrate(void)
 	}
 
 	tsc_khz = tsc2 / tsc1;
-	set_cyc2ns_scale(tsc_khz);
+
+	for_each_possible_cpu(cpu)
+		set_cyc2ns_scale(tsc_khz, cpu);
 }
 
 /*
Index: linux/include/asm-x86/timer.h
===================================================================
--- linux.orig/include/asm-x86/timer.h
+++ linux/include/asm-x86/timer.h
@@ -2,6 +2,7 @@
 #define _ASMi386_TIMER_H
 #include <linux/init.h>
 #include <linux/pm.h>
+#include <linux/percpu.h>
 
 #define TICK_SIZE (tick_nsec / 1000)
 
@@ -16,7 +17,7 @@ extern int recalibrate_cpu_khz(void);
 #define calculate_cpu_khz() native_calculate_cpu_khz()
 #endif
 
-/* Accellerators for sched_clock()
+/* Accelerators for sched_clock()
  * convert from cycles(64bits) => nanoseconds (64bits)
  *  basic equation:
  *		ns = cycles / (freq / ns_per_sec)
@@ -31,20 +32,32 @@ extern int recalibrate_cpu_khz(void);
  *	And since SC is a constant power of two, we can convert the div
  *  into a shift.
  *
- *  We can use khz divisor instead of mhz to keep a better percision, since
+ *  We can use khz divisor instead of mhz to keep a better precision, since
  *  cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
  *  (mathieu.desnoyers@polymtl.ca)
  *
  *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
  */
-extern unsigned long cyc2ns_scale __read_mostly;
+
+DECLARE_PER_CPU(unsigned long, cyc2ns);
 
 #define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
 
-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
 {
-	return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
+	return cyc * per_cpu(cyc2ns, smp_processor_id()) >> CYC2NS_SCALE_FACTOR;
 }
 
+static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+	unsigned long long ns;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	ns = __cycles_2_ns(cyc);
+	local_irq_restore(flags);
+
+	return ns;
+}
 
 #endif
--------------->
Subject: x86: idle wakeup event in the HLT loop
From: Ingo Molnar <mingo@elte.hu>

do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC
in HLT too, not just when going through the ACPI methods.

(the ACPI idle code already does this.)

[ update the 64-bit side too, as noticed by Jiri Slaby. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/process_32.c |   15 ++++++++++++---
 arch/x86/kernel/process_64.c |   13 ++++++++++---
 2 files changed, 22 insertions(+), 6 deletions(-)

Index: linux-x86.q/arch/x86/kernel/process_32.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/process_32.c
+++ linux-x86.q/arch/x86/kernel/process_32.c
@@ -113,10 +113,19 @@ void default_idle(void)
 		smp_mb();
 
 		local_irq_disable();
-		if (!need_resched())
+		if (!need_resched()) {
+			ktime_t t0, t1;
+			u64 t0n, t1n;
+
+			t0 = ktime_get();
+			t0n = ktime_to_ns(t0);
 			safe_halt();	/* enables interrupts racelessly */
-		else
-			local_irq_enable();
+			local_irq_disable();
+			t1 = ktime_get();
+			t1n = ktime_to_ns(t1);
+			sched_clock_idle_wakeup_event(t1n - t0n);
+		}
+		local_irq_enable();
 		current_thread_info()->status |= TS_POLLING;
 	} else {
 		/* loop is done by the caller */
Index: linux-x86.q/arch/x86/kernel/process_64.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/process_64.c
+++ linux-x86.q/arch/x86/kernel/process_64.c
@@ -116,9 +116,16 @@ static void default_idle(void)
 	smp_mb();
 	local_irq_disable();
 	if (!need_resched()) {
-		/* Enables interrupts one instruction before HLT.
-		   x86 special cases this so there is no race. */
-		safe_halt();
+		ktime_t t0, t1;
+		u64 t0n, t1n;
+
+		t0 = ktime_get();
+		t0n = ktime_to_ns(t0);
+		safe_halt();	/* enables interrupts racelessly */
+		local_irq_disable();
+		t1 = ktime_get();
+		t1n = ktime_to_ns(t1);
+		sched_clock_idle_wakeup_event(t1n - t0n);
 	} else
 		local_irq_enable();
 	current_thread_info()->status |= TS_POLLING;
--------------->
Subject: printk: make printk more robust by not allowing recursion
From: Ingo Molnar <mingo@elte.hu>

make printk more robust by allowing recursion only if there's a crash
going on. Also add recursion detection.

I've tested it with an artificially injected printk recursion - instead
of a lockup or spontaneous reboot or other crash, the output was a well
controlled:

[   41.057335] SysRq : <2>BUG: recent printk recursion!
[   41.057335] loglevel0-8 reBoot Crashdump show-all-locks(D) tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks

also do all this printk-debug logic with irqs disabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/printk.c |   48 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 10 deletions(-)

Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -628,30 +628,57 @@ asmlinkage int printk(const char *fmt, .
 /* cpu currently holding logbuf_lock */
 static volatile unsigned int printk_cpu = UINT_MAX;
 
+const char printk_recursion_bug_msg [] =
+			KERN_CRIT "BUG: recent printk recursion!\n";
+static int printk_recursion_bug;
+
 asmlinkage int vprintk(const char *fmt, va_list args)
 {
+	static int log_level_unknown = 1;
+	static char printk_buf[1024];
+
 	unsigned long flags;
-	int printed_len;
+	int printed_len = 0;
+	int this_cpu;
 	char *p;
-	static char printk_buf[1024];
-	static int log_level_unknown = 1;
 
 	boot_delay_msec();
 
 	preempt_disable();
-	if (unlikely(oops_in_progress) && printk_cpu == smp_processor_id())
-		/* If a crash is occurring during printk() on this CPU,
-		 * make sure we can't deadlock */
-		zap_locks();
-
 	/* This stops the holder of console_sem just where we want him */
 	raw_local_irq_save(flags);
+	this_cpu = smp_processor_id();
+
+	/*
+	 * Ouch, printk recursed into itself!
+	 */
+	if (unlikely(printk_cpu == this_cpu)) {
+		/*
+		 * If a crash is occurring during printk() on this CPU,
+		 * then try to get the crash message out but make sure
+		 * we can't deadlock. Otherwise just return to avoid the
+		 * recursion and return - but flag the recursion so that
+		 * it can be printed at the next appropriate moment:
+		 */
+		if (!oops_in_progress) {
+			printk_recursion_bug = 1;
+			goto out_restore_irqs;
+		}
+		zap_locks();
+	}
+
 	lockdep_off();
 	spin_lock(&logbuf_lock);
-	printk_cpu = smp_processor_id();
+	printk_cpu = this_cpu;
 
+	if (printk_recursion_bug) {
+		printk_recursion_bug = 0;
+		strcpy(printk_buf, printk_recursion_bug_msg);
+		printed_len = sizeof(printk_recursion_bug_msg);
+	}
 	/* Emit the output into the temporary buffer */
-	printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args);
+	printed_len += vscnprintf(printk_buf + printed_len,
+				  sizeof(printk_buf), fmt, args);
 
 	/*
 	 * Copy the output into log_buf.  If the caller didn't provide
@@ -744,6 +771,7 @@ asmlinkage int vprintk(const char *fmt, 
 		printk_cpu = UINT_MAX;
 		spin_unlock(&logbuf_lock);
 		lockdep_on();
+out_restore_irqs:
 		raw_local_irq_restore(flags);
 	}
 
--------------->
Subject: sched: fix CONFIG_PRINT_TIME's reliance on sched_clock()
From: Ingo Molnar <mingo@elte.hu>

Stefano Brivio reported weird printk timestamp behavior during
CPU frequency changes:

  http://bugzilla.kernel.org/show_bug.cgi?id=9475

fix CONFIG_PRINT_TIME's reliance on sched_clock() and use cpu_clock()
instead.

Reported-and-bisected-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/printk.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -707,7 +707,7 @@ asmlinkage int vprintk(const char *fmt, 
 					loglev_char = default_message_loglevel
 						+ '0';
 				}
-				t = printk_clock();
+				t = cpu_clock(printk_cpu);
 				nanosec_rem = do_div(t, 1000000000);
 				tlen = sprintf(tbuf,
 						"<%c>[%5lu.%06lu] ",
--------------->
Subject: sched: remove printk_clock()
From: Ingo Molnar <mingo@elte.hu>

printk_clock() is obsolete - it has been replaced with cpu_clock().

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/arm/kernel/time.c  |   11 -----------
 arch/ia64/kernel/time.c |   27 ---------------------------
 kernel/printk.c         |    5 -----
 3 files changed, 43 deletions(-)

Index: linux/arch/arm/kernel/time.c
===================================================================
--- linux.orig/arch/arm/kernel/time.c
+++ linux/arch/arm/kernel/time.c
@@ -79,17 +79,6 @@ static unsigned long dummy_gettimeoffset
 }
 #endif
 
-/*
- * An implementation of printk_clock() independent from
- * sched_clock().  This avoids non-bootable kernels when
- * printk_clock is enabled.
- */
-unsigned long long printk_clock(void)
-{
-	return (unsigned long long)(jiffies - INITIAL_JIFFIES) *
-			(1000000000 / HZ);
-}
-
 static unsigned long next_rtc_update;
 
 /*
Index: linux/arch/ia64/kernel/time.c
===================================================================
--- linux.orig/arch/ia64/kernel/time.c
+++ linux/arch/ia64/kernel/time.c
@@ -344,33 +344,6 @@ udelay (unsigned long usecs)
 }
 EXPORT_SYMBOL(udelay);
 
-static unsigned long long ia64_itc_printk_clock(void)
-{
-	if (ia64_get_kr(IA64_KR_PER_CPU_DATA))
-		return sched_clock();
-	return 0;
-}
-
-static unsigned long long ia64_default_printk_clock(void)
-{
-	return (unsigned long long)(jiffies_64 - INITIAL_JIFFIES) *
-		(1000000000/HZ);
-}
-
-unsigned long long (*ia64_printk_clock)(void) = &ia64_default_printk_clock;
-
-unsigned long long printk_clock(void)
-{
-	return ia64_printk_clock();
-}
-
-void __init
-ia64_setup_printk_clock(void)
-{
-	if (!(sal_platform_features & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT))
-		ia64_printk_clock = ia64_itc_printk_clock;
-}
-
 /* IA64 doesn't cache the timezone */
 void update_vsyscall_tz(void)
 {
Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -573,11 +573,6 @@ static int __init printk_time_setup(char
 
 __setup("time", printk_time_setup);
 
-__attribute__((weak)) unsigned long long printk_clock(void)
-{
-	return sched_clock();
-}
-
 /* Check if we have any console registered that can be called early in boot. */
 static int have_callable_console(void)
 {

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 20:42 ` Ingo Molnar
@ 2007-12-10 20:57   ` Guillaume Chazarain
  2007-12-10 20:59   ` Andrew Morton
  1 sibling, 0 replies; 103+ messages in thread
From: Guillaume Chazarain @ 2007-12-10 20:57 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rafael J. Wysocki, LKML, Andrew Morton, Linus Torvalds

On Dec 10, 2007 9:42 PM, Ingo Molnar <mingo@elte.hu> wrote:
> although some claimed effect was on udelay()/mdelay() too.

Any specific report?
The jumping sched_clock on frequency change caused some
scheduling oddities for me, but CFS attenuated the effect.

Thanks.

-- 
Guillaume

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 20:42 ` Ingo Molnar
  2007-12-10 20:57   ` Guillaume Chazarain
@ 2007-12-10 20:59   ` Andrew Morton
  2007-12-10 22:45     ` Ingo Molnar
  1 sibling, 1 reply; 103+ messages in thread
From: Andrew Morton @ 2007-12-10 20:59 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: rjw, linux-kernel, torvalds

On Mon, 10 Dec 2007 21:42:12 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > Subject		: jiffies counter leaps in 2.6.24-rc3
> > Submitter	: Stefano Brivio <stefano.brivio@polimi.it>
> > References	: http://lkml.org/lkml/2007/11/24/53
> > 		  http://bugzilla.kernel.org/show_bug.cgi?id=9475
> > Handled-By	: Ingo Molnar <mingo@elte.hu>
> > Patch		: http://lkml.org/lkml/2007/12/7/132
> 
> Linus, Andrew, i need some help deciding what to do about this 
> regression. The fixes for this have been tested and resolve the 
> regression, but they change printk and other code that runs by default 
> and is thus rather invasive so late in the v2.6.24 cycle. This bug 
> should only affect CONFIG_PRINTK_TIME=y kernels (a non-default debug 
> option) - although some claimed effect was on udelay()/mdelay() too.
> 
> i've attached below the queue of 5 patches that fix this problem. They 
> have been build and boot tested with more than 1000 random kernels in 
> the past few days, so i certainly trust the core and x86 bits of this.
> 
> what do you think? Right now i've got them queued up for 2.6.25 in both 
> the scheduler-devel and the x86-devel git trees - but can submit them 
> for 2.6.24 if it's better if we did them there. I've got no strong 
> opinion either way.

printk_clock() doesn't seem terribly important but what's this stuff about
effects on udelay/mdelay?  That can be serious if they're getting
shortened.


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 20:59   ` Andrew Morton
@ 2007-12-10 22:45     ` Ingo Molnar
  2007-12-10 23:04       ` Ingo Molnar
  0 siblings, 1 reply; 103+ messages in thread
From: Ingo Molnar @ 2007-12-10 22:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: rjw, linux-kernel, torvalds


* Andrew Morton <akpm@linux-foundation.org> wrote:

> > what do you think? Right now i've got them queued up for 2.6.25 in 
> > both the scheduler-devel and the x86-devel git trees - but can 
> > submit them for 2.6.24 if it's better if we did them there. I've got 
> > no strong opinion either way.
> 
> printk_clock() doesn't seem terribly important but what's this stuff 
> about effects on udelay/mdelay?  That can be serious if they're 
> getting shortened.

since udelay depends on loops_per_jiffy, which is fixed up 
time_cpufreq_notifier(), i dont see how it could be affected by 
frequency changes. (but that's the theory - practice might be different)

	Ingo

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 22:45     ` Ingo Molnar
@ 2007-12-10 23:04       ` Ingo Molnar
  2007-12-10 23:34         ` Stefano Brivio
  0 siblings, 1 reply; 103+ messages in thread
From: Ingo Molnar @ 2007-12-10 23:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: rjw, linux-kernel, torvalds, Stefano Brivio, Guillaume Chazarain


* Ingo Molnar <mingo@elte.hu> wrote:

> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > > what do you think? Right now i've got them queued up for 2.6.25 in 
> > > both the scheduler-devel and the x86-devel git trees - but can 
> > > submit them for 2.6.24 if it's better if we did them there. I've got 
> > > no strong opinion either way.
> > 
> > printk_clock() doesn't seem terribly important but what's this stuff 
> > about effects on udelay/mdelay?  That can be serious if they're 
> > getting shortened.
> 
> since udelay depends on loops_per_jiffy, which is fixed up 
> time_cpufreq_notifier(), i dont see how it could be affected by 
> frequency changes. (but that's the theory - practice might be 
> different)

Stefano Brivio reported udelay()/mdelay() effects in the b43 driver. 
(and it caused driver failures for him.)

Stefano, could you please try to sum up your experiences with that 
issue? Is it reproducable, and the 5 patches i did fix it? (if yes, 
could you try to re-do the mdelay verifications perhaps, to make sure 
it's not some other effect interacting here. In theory sched-clock 
scaling has no effect on udelay behavior.)

	Ingo

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 23:04       ` Ingo Molnar
@ 2007-12-10 23:34         ` Stefano Brivio
  2007-12-10 23:53           ` Guillaume Chazarain
                             ` (2 more replies)
  0 siblings, 3 replies; 103+ messages in thread
From: Stefano Brivio @ 2007-12-10 23:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, rjw, linux-kernel, torvalds, Guillaume Chazarain

On Tue, 11 Dec 2007 00:04:25 +0100
Ingo Molnar <mingo@elte.hu> wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > > what do you think? Right now i've got them queued up for 2.6.25 in 
> > > > both the scheduler-devel and the x86-devel git trees - but can 
> > > > submit them for 2.6.24 if it's better if we did them there. I've got 
> > > > no strong opinion either way.
> > > 
> > > printk_clock() doesn't seem terribly important but what's this stuff 
> > > about effects on udelay/mdelay?  That can be serious if they're 
> > > getting shortened.
> > 
> > since udelay depends on loops_per_jiffy, which is fixed up 
> > time_cpufreq_notifier(), i dont see how it could be affected by 
> > frequency changes. (but that's the theory - practice might be 
> > different)
> 
> Stefano Brivio reported udelay()/mdelay() effects in the b43 driver. 
> (and it caused driver failures for him.)
> 
> Stefano, could you please try to sum up your experiences with that 
> issue? Is it reproducable, and the 5 patches i did fix it? (if yes, 
> could you try to re-do the mdelay verifications perhaps, to make sure 
> it's not some other effect interacting here. In theory sched-clock 
> scaling has no effect on udelay behavior.)

Sorry for disappearing. Anyway, yes, those patches fixed it. Precision in
delays isn't that good when using my crappy unstable TSC (mdelay(2000)
causes delays between 2 and 2.9 seconds) but it's not depending on frequency
changes anymore. So I'd say it's fixed, but please tell me if you want me
to do any other test so as to be sure it is.


--
Ciao
Stefano

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 23:34         ` Stefano Brivio
@ 2007-12-10 23:53           ` Guillaume Chazarain
  2007-12-11  8:48             ` Ingo Molnar
  2007-12-10 23:56           ` Arjan van de Ven
  2007-12-11  9:01           ` Ingo Molnar
  2 siblings, 1 reply; 103+ messages in thread
From: Guillaume Chazarain @ 2007-12-10 23:53 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Ingo Molnar, Andrew Morton, rjw, linux-kernel, torvalds

Stefano Brivio <stefano.brivio@polimi.it> wrote:

> Sorry for disappearing. Anyway, yes, those patches fixed it. Precision in
> delays isn't that good when using my crappy unstable TSC (mdelay(2000)
> causes delays between 2 and 2.9 seconds) but it's not depending on frequency
> changes anymore. So I'd say it's fixed, but please tell me if you want me
> to do any other test so as to be sure it is.

Ingo,

it seems you dropped http://lkml.org/lkml/2007/12/7/100 (cpu_clock()
based udelay), so how udelay can be affected by your proposed changes?

Thanks.

-- 
Guillaume

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 23:34         ` Stefano Brivio
  2007-12-10 23:53           ` Guillaume Chazarain
@ 2007-12-10 23:56           ` Arjan van de Ven
  2007-12-11  0:01             ` Guillaume Chazarain
  2007-12-11  9:01           ` Ingo Molnar
  2 siblings, 1 reply; 103+ messages in thread
From: Arjan van de Ven @ 2007-12-10 23:56 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: Ingo Molnar, Andrew Morton, rjw, linux-kernel, torvalds,
	Guillaume Chazarain

On Tue, 11 Dec 2007 00:34:33 +0100
Stefano Brivio <stefano.brivio@polimi.it> wrote:

> On Tue, 11 Dec 2007 00:04:25 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > * Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > > 
> > > > > what do you think? Right now i've got them queued up for
> > > > > 2.6.25 in both the scheduler-devel and the x86-devel git
> > > > > trees - but can submit them for 2.6.24 if it's better if we
> > > > > did them there. I've got no strong opinion either way.
> > > > 
> > > > printk_clock() doesn't seem terribly important but what's this
> > > > stuff about effects on udelay/mdelay?  That can be serious if
> > > > they're getting shortened.
> > > 
> > > since udelay depends on loops_per_jiffy, which is fixed up 
> > > time_cpufreq_notifier(), i dont see how it could be affected by 
> > > frequency changes. (but that's the theory - practice might be 
> > > different)
> > 
> > Stefano Brivio reported udelay()/mdelay() effects in the b43
> > driver. (and it caused driver failures for him.)
> > 
> > Stefano, could you please try to sum up your experiences with that 
> > issue? Is it reproducable, and the 5 patches i did fix it? (if yes, 
> > could you try to re-do the mdelay verifications perhaps, to make
> > sure it's not some other effect interacting here. In theory
> > sched-clock scaling has no effect on udelay behavior.)
> 
> Sorry for disappearing. Anyway, yes, those patches fixed it.
> Precision in delays isn't that good when using my crappy unstable TSC
> (mdelay(2000) causes delays between 2 and 2.9 seconds) but it's not
> depending on frequency changes anymore. So I'd say it's fixed, but
> please tell me if you want me to do any other test so as to be sure
> it is.
> 
> 
I'm still quite concerned about this in dual/quad core scenarios;
the frequency of both cores is the maximum of what linux sets each core to;
this means that if you're THIS sensitive to that there still is quite a nasty issue there.

I wonder if the various delay functions (maybe only in .25) should use the maximum observed loops_per_jiffie instead always (across cpus) to be super safe here.

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 23:56           ` Arjan van de Ven
@ 2007-12-11  0:01             ` Guillaume Chazarain
  2007-12-11  1:06               ` Arjan van de Ven
  0 siblings, 1 reply; 103+ messages in thread
From: Guillaume Chazarain @ 2007-12-11  0:01 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Stefano Brivio, Ingo Molnar, Andrew Morton, rjw, linux-kernel, torvalds

Arjan van de Ven <arjan@infradead.org> wrote:

> the frequency of both cores is the maximum of what linux sets each core to;

Do you mean that the cpufreq code can be confused about the actual
frequency of the cores? That sounds like a big problem.

Thanks for any insight.

-- 
Guillaume

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-11  0:01             ` Guillaume Chazarain
@ 2007-12-11  1:06               ` Arjan van de Ven
  2007-12-11  8:43                 ` Ingo Molnar
  0 siblings, 1 reply; 103+ messages in thread
From: Arjan van de Ven @ 2007-12-11  1:06 UTC (permalink / raw)
  To: Guillaume Chazarain
  Cc: Stefano Brivio, Ingo Molnar, Andrew Morton, rjw, linux-kernel, torvalds

On Tue, 11 Dec 2007 01:01:25 +0100
Guillaume Chazarain <guichaz@yahoo.fr> wrote:

> Arjan van de Ven <arjan@infradead.org> wrote:
> 
> > the frequency of both cores is the maximum of what linux sets each
> > core to;
> 
> Do you mean that the cpufreq code can be confused about the actual
> frequency of the cores? 

it means that cpufreq doesn't know the actual frequency (although bios sometimes tells us about the relationship, often the bios just lies through it's teeth); it only knows what it asks for, not what it gets. We know it'll get at least what it asks for, but it can get more than it asks for basically.

>That sounds like a big problem.

it'll get way worse going forward.
(but even on todays systems, the tsc no longer represents frequency, but is some fixed clock totally unrelated to cpu frequency)

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-08 19:52             ` Ingo Molnar
  2007-12-08 20:29               ` Ingo Molnar
@ 2007-12-11  6:27               ` Dave Jones
  2007-12-11  8:52                 ` Ingo Molnar
  2007-12-14  4:07               ` Christoph Lameter
  2 siblings, 1 reply; 103+ messages in thread
From: Dave Jones @ 2007-12-11  6:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andrew Morton, Matt Mackall, Rafael J. Wysocki,
	LKML, Christoph Lameter

On Sat, Dec 08, 2007 at 08:52:11PM +0100, Ingo Molnar wrote:

 > so even today's upstream kernel, which has 'ancient' SLUB code, SLAB and 
 > SLUB have essentially the same linecount:
 > 
 >   $ wc -l mm/slab.c mm/slub.c
 >   4478 mm/slab.c
 >   4125 mm/slub.c
 > 
 > (and while linecount != complexity, there is a strong relationship.)
 > 
 > With SLAB having 10 years more test coverage and tuning.

FWIW, the one thing slub does that slab doesn't that I find really nice
is being enable to enable debugging at boot time rather than compile time.

We don't get many people running benchmarks against the Fedora kernel,
so any scalability differences between slub/slab probably won't reach us
until we start shipping betas of the next RHEL based on the same kernel.

Which leaves my only other gripe.  It broke slabtop.
There's an alternative implementation in Documentation/vm/slabinfo.c
(why there not say, util-linux, home of current slabtop?) 

	Dave

-- 
http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-11  1:06               ` Arjan van de Ven
@ 2007-12-11  8:43                 ` Ingo Molnar
  0 siblings, 0 replies; 103+ messages in thread
From: Ingo Molnar @ 2007-12-11  8:43 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Guillaume Chazarain, Stefano Brivio, Andrew Morton, rjw,
	linux-kernel, torvalds


* Arjan van de Ven <arjan@infradead.org> wrote:

> > That sounds like a big problem.
> 
> it'll get way worse going forward. (but even on todays systems, the 
> tsc no longer represents frequency, but is some fixed clock totally 
> unrelated to cpu frequency)

X86_FEATURE_CONSTANT_TSC CPUs (all modern Intel CPUs) should be fine - 
we dont do any TSC frequency fixups for them. The loops_per_jiffy fixup 
looks like this:

                if (!(freq->flags & CPUFREQ_CONST_LOOPS))
                        cpu_data(freq->cpu).loops_per_jiffy =
                                cpufreq_scale(loops_per_jiffy_ref,
                                                ref_freq, freq->new);

i.e. X86_FEATURE_CONSTANT_TSC excluded. The sched_clock() scaling factor 
is modified like this:

                        if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
                                tsc_khz = cpu_khz;
                                preempt_disable();
                                set_cyc2ns_scale(cpu_khz, smp_processor_id());

so here X86_FEATURE_CONSTANT_TSC is excluded again. So the whole 
frequency scaling issue will become a pure legacy issue only with time.

	Ingo

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 23:53           ` Guillaume Chazarain
@ 2007-12-11  8:48             ` Ingo Molnar
  0 siblings, 0 replies; 103+ messages in thread
From: Ingo Molnar @ 2007-12-11  8:48 UTC (permalink / raw)
  To: Guillaume Chazarain
  Cc: Stefano Brivio, Andrew Morton, rjw, linux-kernel, torvalds


* Guillaume Chazarain <guichaz@yahoo.fr> wrote:

> Stefano Brivio <stefano.brivio@polimi.it> wrote:
> 
> > Sorry for disappearing. Anyway, yes, those patches fixed it. Precision in
> > delays isn't that good when using my crappy unstable TSC (mdelay(2000)
> > causes delays between 2 and 2.9 seconds) but it's not depending on frequency
> > changes anymore. So I'd say it's fixed, but please tell me if you want me
> > to do any other test so as to be sure it is.
> 
> Ingo,
> 
> it seems you dropped http://lkml.org/lkml/2007/12/7/100 (cpu_clock() 
> based udelay), so how udelay can be affected by your proposed changes?

was this needed for you to get stable udelay()? (that cpu_clock() based 
udelay patch was buggy, i got the units wrong. udelay does wacky 
conversions between various units. So i dropped it for the time being.)

the last rollup you tested didnt show udelay problems, and it didnt 
include the sched_clock() based udelay patch.

so it would be nice if you could re-examine exactly what is needed. 
Please try latest -git and the concatenation of the 4 patches below. 
What would be the best info is to see which (if any!) patches are needed 
against latest -git to get a stable udelay() on your box.

	Ingo

--------------------------------------->
* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> Subject		: jiffies counter leaps in 2.6.24-rc3
> Submitter	: Stefano Brivio <stefano.brivio@polimi.it>
> References	: http://lkml.org/lkml/2007/11/24/53
> 		  http://bugzilla.kernel.org/show_bug.cgi?id=9475
> Handled-By	: Ingo Molnar <mingo@elte.hu>
> Patch		: http://lkml.org/lkml/2007/12/7/132

Linus, Andrew, i need some help deciding what to do about this 
regression. The fixes for this have been tested and resolve the 
regression, but they change printk and other code that runs by default 
and is thus rather invasive so late in the v2.6.24 cycle. This bug 
should only affect CONFIG_PRINTK_TIME=y kernels (a non-default debug 
option) - although some claimed effect was on udelay()/mdelay() too.

i've attached below the queue of 5 patches that fix this problem. They 
have been build and boot tested with more than 1000 random kernels in 
the past few days, so i certainly trust the core and x86 bits of this.

what do you think? Right now i've got them queued up for 2.6.25 in both 
the scheduler-devel and the x86-devel git trees - but can submit them 
for 2.6.24 if it's better if we did them there. I've got no strong 
opinion either way.

	Ingo

-------------------->
Subject: x86: scale cyc_2_nsec according to CPU frequency
From: "Guillaume Chazarain" <guichaz@yahoo.fr>

scale the sched_clock() cyc_2_nsec scaling factor according to
CPU frequency changes.

[ mingo@elte.hu: simplified it and fixed it for SMP. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/tsc_32.c |   43 ++++++++++++++++++++++++++++++-----
 arch/x86/kernel/tsc_64.c |   57 ++++++++++++++++++++++++++++++++++++++---------
 include/asm-x86/timer.h  |   23 ++++++++++++++----
 3 files changed, 102 insertions(+), 21 deletions(-)

Index: linux/arch/x86/kernel/tsc_32.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_32.c
+++ linux/arch/x86/kernel/tsc_32.c
@@ -5,6 +5,7 @@
 #include <linux/jiffies.h>
 #include <linux/init.h>
 #include <linux/dmi.h>
+#include <linux/percpu.h>
 
 #include <asm/delay.h>
 #include <asm/tsc.h>
@@ -80,13 +81,31 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable);
  *
  *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
  */
-unsigned long cyc2ns_scale __read_mostly;
 
-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
+DEFINE_PER_CPU(unsigned long, cyc2ns);
 
-static inline void set_cyc2ns_scale(unsigned long cpu_khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 {
-	cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR)/cpu_khz;
+	unsigned long flags, prev_scale, *scale;
+	unsigned long long tsc_now, ns_now;
+
+	local_irq_save(flags);
+	sched_clock_idle_sleep_event();
+
+	scale = &per_cpu(cyc2ns, cpu);
+
+	rdtscll(tsc_now);
+	ns_now = __cycles_2_ns(tsc_now);
+
+	prev_scale = *scale;
+	if (cpu_khz)
+		*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+	/*
+	 * Start smoothly with the new frequency:
+	 */
+	sched_clock_idle_wakeup_event(0);
+	local_irq_restore(flags);
 }
 
 /*
@@ -239,7 +258,9 @@ time_cpufreq_notifier(struct notifier_bl
 						ref_freq, freq->new);
 			if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
 				tsc_khz = cpu_khz;
-				set_cyc2ns_scale(cpu_khz);
+				preempt_disable();
+				set_cyc2ns_scale(cpu_khz, smp_processor_id());
+				preempt_enable();
 				/*
 				 * TSC based sched_clock turns
 				 * to junk w/ cpufreq
@@ -367,6 +388,8 @@ static inline void check_geode_tsc_relia
 
 void __init tsc_init(void)
 {
+	int cpu;
+
 	if (!cpu_has_tsc || tsc_disable)
 		goto out_no_tsc;
 
@@ -380,7 +403,15 @@ void __init tsc_init(void)
 				(unsigned long)cpu_khz / 1000,
 				(unsigned long)cpu_khz % 1000);
 
-	set_cyc2ns_scale(cpu_khz);
+	/*
+	 * Secondary CPUs do not run through tsc_init(), so set up
+	 * all the scale factors for all CPUs, assuming the same
+	 * speed as the bootup CPU. (cpufreq notifiers will fix this
+	 * up if their speed diverges)
+	 */
+	for_each_possible_cpu(cpu)
+		set_cyc2ns_scale(cpu_khz, cpu);
+
 	use_tsc_delay();
 
 	/* Check and install the TSC clocksource */
Index: linux/arch/x86/kernel/tsc_64.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_64.c
+++ linux/arch/x86/kernel/tsc_64.c
@@ -10,6 +10,7 @@
 
 #include <asm/hpet.h>
 #include <asm/timex.h>
+#include <asm/timer.h>
 
 static int notsc __initdata = 0;
 
@@ -18,16 +19,48 @@ EXPORT_SYMBOL(cpu_khz);
 unsigned int tsc_khz;
 EXPORT_SYMBOL(tsc_khz);
 
-static unsigned int cyc2ns_scale __read_mostly;
+/* Accelerators for sched_clock()
+ * convert from cycles(64bits) => nanoseconds (64bits)
+ *  basic equation:
+ *		ns = cycles / (freq / ns_per_sec)
+ *		ns = cycles * (ns_per_sec / freq)
+ *		ns = cycles * (10^9 / (cpu_khz * 10^3))
+ *		ns = cycles * (10^6 / cpu_khz)
+ *
+ *	Then we use scaling math (suggested by george@mvista.com) to get:
+ *		ns = cycles * (10^6 * SC / cpu_khz) / SC
+ *		ns = cycles * cyc2ns_scale / SC
+ *
+ *	And since SC is a constant power of two, we can convert the div
+ *  into a shift.
+ *
+ *  We can use khz divisor instead of mhz to keep a better precision, since
+ *  cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
+ *  (mathieu.desnoyers@polymtl.ca)
+ *
+ *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
+ */
+DEFINE_PER_CPU(unsigned long, cyc2ns);
 
-static inline void set_cyc2ns_scale(unsigned long khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 {
-	cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz;
-}
+	unsigned long flags, prev_scale, *scale;
+	unsigned long long tsc_now, ns_now;
 
-static unsigned long long cycles_2_ns(unsigned long long cyc)
-{
-	return (cyc * cyc2ns_scale) >> NS_SCALE;
+	local_irq_save(flags);
+	sched_clock_idle_sleep_event();
+
+	scale = &per_cpu(cyc2ns, cpu);
+
+	rdtscll(tsc_now);
+	ns_now = __cycles_2_ns(tsc_now);
+
+	prev_scale = *scale;
+	if (cpu_khz)
+		*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+	sched_clock_idle_wakeup_event(0);
+	local_irq_restore(flags);
 }
 
 unsigned long long sched_clock(void)
@@ -100,7 +133,9 @@ static int time_cpufreq_notifier(struct 
 			mark_tsc_unstable("cpufreq changes");
 	}
 
-	set_cyc2ns_scale(tsc_khz_ref);
+	preempt_disable();
+	set_cyc2ns_scale(tsc_khz_ref, smp_processor_id());
+	preempt_enable();
 
 	return 0;
 }
@@ -151,7 +186,7 @@ static unsigned long __init tsc_read_ref
 void __init tsc_calibrate(void)
 {
 	unsigned long flags, tsc1, tsc2, tr1, tr2, pm1, pm2, hpet1, hpet2;
-	int hpet = is_hpet_enabled();
+	int hpet = is_hpet_enabled(), cpu;
 
 	local_irq_save(flags);
 
@@ -206,7 +241,9 @@ void __init tsc_calibrate(void)
 	}
 
 	tsc_khz = tsc2 / tsc1;
-	set_cyc2ns_scale(tsc_khz);
+
+	for_each_possible_cpu(cpu)
+		set_cyc2ns_scale(tsc_khz, cpu);
 }
 
 /*
Index: linux/include/asm-x86/timer.h
===================================================================
--- linux.orig/include/asm-x86/timer.h
+++ linux/include/asm-x86/timer.h
@@ -2,6 +2,7 @@
 #define _ASMi386_TIMER_H
 #include <linux/init.h>
 #include <linux/pm.h>
+#include <linux/percpu.h>
 
 #define TICK_SIZE (tick_nsec / 1000)
 
@@ -16,7 +17,7 @@ extern int recalibrate_cpu_khz(void);
 #define calculate_cpu_khz() native_calculate_cpu_khz()
 #endif
 
-/* Accellerators for sched_clock()
+/* Accelerators for sched_clock()
  * convert from cycles(64bits) => nanoseconds (64bits)
  *  basic equation:
  *		ns = cycles / (freq / ns_per_sec)
@@ -31,20 +32,32 @@ extern int recalibrate_cpu_khz(void);
  *	And since SC is a constant power of two, we can convert the div
  *  into a shift.
  *
- *  We can use khz divisor instead of mhz to keep a better percision, since
+ *  We can use khz divisor instead of mhz to keep a better precision, since
  *  cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
  *  (mathieu.desnoyers@polymtl.ca)
  *
  *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
  */
-extern unsigned long cyc2ns_scale __read_mostly;
+
+DECLARE_PER_CPU(unsigned long, cyc2ns);
 
 #define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
 
-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
 {
-	return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
+	return cyc * per_cpu(cyc2ns, smp_processor_id()) >> CYC2NS_SCALE_FACTOR;
 }
 
+static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+	unsigned long long ns;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	ns = __cycles_2_ns(cyc);
+	local_irq_restore(flags);
+
+	return ns;
+}
 
 #endif
--------------->
Subject: x86: idle wakeup event in the HLT loop
From: Ingo Molnar <mingo@elte.hu>

do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC
in HLT too, not just when going through the ACPI methods.

(the ACPI idle code already does this.)

[ update the 64-bit side too, as noticed by Jiri Slaby. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/process_32.c |   15 ++++++++++++---
 arch/x86/kernel/process_64.c |   13 ++++++++++---
 2 files changed, 22 insertions(+), 6 deletions(-)

Index: linux-x86.q/arch/x86/kernel/process_32.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/process_32.c
+++ linux-x86.q/arch/x86/kernel/process_32.c
@@ -113,10 +113,19 @@ void default_idle(void)
 		smp_mb();
 
 		local_irq_disable();
-		if (!need_resched())
+		if (!need_resched()) {
+			ktime_t t0, t1;
+			u64 t0n, t1n;
+
+			t0 = ktime_get();
+			t0n = ktime_to_ns(t0);
 			safe_halt();	/* enables interrupts racelessly */
-		else
-			local_irq_enable();
+			local_irq_disable();
+			t1 = ktime_get();
+			t1n = ktime_to_ns(t1);
+			sched_clock_idle_wakeup_event(t1n - t0n);
+		}
+		local_irq_enable();
 		current_thread_info()->status |= TS_POLLING;
 	} else {
 		/* loop is done by the caller */
Index: linux-x86.q/arch/x86/kernel/process_64.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/process_64.c
+++ linux-x86.q/arch/x86/kernel/process_64.c
@@ -116,9 +116,16 @@ static void default_idle(void)
 	smp_mb();
 	local_irq_disable();
 	if (!need_resched()) {
-		/* Enables interrupts one instruction before HLT.
-		   x86 special cases this so there is no race. */
-		safe_halt();
+		ktime_t t0, t1;
+		u64 t0n, t1n;
+
+		t0 = ktime_get();
+		t0n = ktime_to_ns(t0);
+		safe_halt();	/* enables interrupts racelessly */
+		local_irq_disable();
+		t1 = ktime_get();
+		t1n = ktime_to_ns(t1);
+		sched_clock_idle_wakeup_event(t1n - t0n);
 	} else
 		local_irq_enable();
 	current_thread_info()->status |= TS_POLLING;
--------------->
Subject: printk: make printk more robust by not allowing recursion
From: Ingo Molnar <mingo@elte.hu>

make printk more robust by allowing recursion only if there's a crash
going on. Also add recursion detection.

I've tested it with an artificially injected printk recursion - instead
of a lockup or spontaneous reboot or other crash, the output was a well
controlled:

[   41.057335] SysRq : <2>BUG: recent printk recursion!
[   41.057335] loglevel0-8 reBoot Crashdump show-all-locks(D) tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks

also do all this printk-debug logic with irqs disabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/printk.c |   48 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 10 deletions(-)

Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -628,30 +628,57 @@ asmlinkage int printk(const char *fmt, .
 /* cpu currently holding logbuf_lock */
 static volatile unsigned int printk_cpu = UINT_MAX;
 
+const char printk_recursion_bug_msg [] =
+			KERN_CRIT "BUG: recent printk recursion!\n";
+static int printk_recursion_bug;
+
 asmlinkage int vprintk(const char *fmt, va_list args)
 {
+	static int log_level_unknown = 1;
+	static char printk_buf[1024];
+
 	unsigned long flags;
-	int printed_len;
+	int printed_len = 0;
+	int this_cpu;
 	char *p;
-	static char printk_buf[1024];
-	static int log_level_unknown = 1;
 
 	boot_delay_msec();
 
 	preempt_disable();
-	if (unlikely(oops_in_progress) && printk_cpu == smp_processor_id())
-		/* If a crash is occurring during printk() on this CPU,
-		 * make sure we can't deadlock */
-		zap_locks();
-
 	/* This stops the holder of console_sem just where we want him */
 	raw_local_irq_save(flags);
+	this_cpu = smp_processor_id();
+
+	/*
+	 * Ouch, printk recursed into itself!
+	 */
+	if (unlikely(printk_cpu == this_cpu)) {
+		/*
+		 * If a crash is occurring during printk() on this CPU,
+		 * then try to get the crash message out but make sure
+		 * we can't deadlock. Otherwise just return to avoid the
+		 * recursion and return - but flag the recursion so that
+		 * it can be printed at the next appropriate moment:
+		 */
+		if (!oops_in_progress) {
+			printk_recursion_bug = 1;
+			goto out_restore_irqs;
+		}
+		zap_locks();
+	}
+
 	lockdep_off();
 	spin_lock(&logbuf_lock);
-	printk_cpu = smp_processor_id();
+	printk_cpu = this_cpu;
 
+	if (printk_recursion_bug) {
+		printk_recursion_bug = 0;
+		strcpy(printk_buf, printk_recursion_bug_msg);
+		printed_len = sizeof(printk_recursion_bug_msg);
+	}
 	/* Emit the output into the temporary buffer */
-	printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args);
+	printed_len += vscnprintf(printk_buf + printed_len,
+				  sizeof(printk_buf), fmt, args);
 
 	/*
 	 * Copy the output into log_buf.  If the caller didn't provide
@@ -744,6 +771,7 @@ asmlinkage int vprintk(const char *fmt, 
 		printk_cpu = UINT_MAX;
 		spin_unlock(&logbuf_lock);
 		lockdep_on();
+out_restore_irqs:
 		raw_local_irq_restore(flags);
 	}
 
--------------->
Subject: sched: fix CONFIG_PRINT_TIME's reliance on sched_clock()
From: Ingo Molnar <mingo@elte.hu>

Stefano Brivio reported weird printk timestamp behavior during
CPU frequency changes:

  http://bugzilla.kernel.org/show_bug.cgi?id=9475

fix CONFIG_PRINT_TIME's reliance on sched_clock() and use cpu_clock()
instead.

Reported-and-bisected-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/printk.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -707,7 +707,7 @@ asmlinkage int vprintk(const char *fmt, 
 					loglev_char = default_message_loglevel
 						+ '0';
 				}
-				t = printk_clock();
+				t = cpu_clock(printk_cpu);
 				nanosec_rem = do_div(t, 1000000000);
 				tlen = sprintf(tbuf,
 						"<%c>[%5lu.%06lu] ",

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-11  6:27               ` Dave Jones
@ 2007-12-11  8:52                 ` Ingo Molnar
  2007-12-11 19:03                   ` Peter Zijlstra
  0 siblings, 1 reply; 103+ messages in thread
From: Ingo Molnar @ 2007-12-11  8:52 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Andrew Morton, Matt Mackall,
	Rafael J. Wysocki, LKML, Christoph Lameter
  Cc: Rafael J. Wysocki


* Dave Jones <davej@redhat.com> wrote:

> On Sat, Dec 08, 2007 at 08:52:11PM +0100, Ingo Molnar wrote:
> 
>  > so even today's upstream kernel, which has 'ancient' SLUB code, SLAB and 
>  > SLUB have essentially the same linecount:
>  > 
>  >   $ wc -l mm/slab.c mm/slub.c
>  >   4478 mm/slab.c
>  >   4125 mm/slub.c
>  > 
>  > (and while linecount != complexity, there is a strong relationship.)
>  > 
>  > With SLAB having 10 years more test coverage and tuning.
> 
> FWIW, the one thing slub does that slab doesn't that I find really 
> nice is being enable to enable debugging at boot time rather than 
> compile time.

yes, but that's largely due to "dont change SLAB because we've got SLUB" 
resistence to SLAB patches. It's a 2 minute hack to implement this for 
SLAB.

> We don't get many people running benchmarks against the Fedora kernel, 
> so any scalability differences between slub/slab probably won't reach 
> us until we start shipping betas of the next RHEL based on the same 
> kernel.
> 
> Which leaves my only other gripe.  It broke slabtop.

that's actually a _bad_ ABI regression. Rafael, could you please add 
this to the regressions list?

> There's an alternative implementation in Documentation/vm/slabinfo.c 
> (why there not say, util-linux, home of current slabtop?)

the kernel should output /proc/slabinfo data with the same formatting, 
period.

	Ingo

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 23:34         ` Stefano Brivio
  2007-12-10 23:53           ` Guillaume Chazarain
  2007-12-10 23:56           ` Arjan van de Ven
@ 2007-12-11  9:01           ` Ingo Molnar
  2007-12-11 21:10             ` Stefano Brivio
  2007-12-19  0:58             ` Stefano Brivio
  2 siblings, 2 replies; 103+ messages in thread
From: Ingo Molnar @ 2007-12-11  9:01 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: Andrew Morton, rjw, linux-kernel, torvalds, Guillaume Chazarain

[-- Attachment #1: Type: text/plain, Size: 1791 bytes --]


* Stefano Brivio <stefano.brivio@polimi.it> wrote:

> > Stefano, could you please try to sum up your experiences with that 
> > issue? Is it reproducable, and the 5 patches i did fix it? (if yes, 
> > could you try to re-do the mdelay verifications perhaps, to make 
> > sure it's not some other effect interacting here. In theory 
> > sched-clock scaling has no effect on udelay behavior.)
> 
> Sorry for disappearing. Anyway, yes, those patches fixed it. Precision 
> in delays isn't that good when using my crappy unstable TSC 
> (mdelay(2000) causes delays between 2 and 2.9 seconds) but it's not 
> depending on frequency changes anymore. So I'd say it's fixed, but 
> please tell me if you want me to do any other test so as to be sure it 
> is.

ok, just to make sure we are all synced up. I made 8 patches related to 
this problem category (and all the trickle effects). 3 are upstream 
already, 5 are pending for v2.6.25. One out of those 5 is an immaterial 
cleanup patch - which leaves us 4 patches to sort out.

So i'd suggest for you to try latest -git - that will tell us whether 
udelay() is acceptable on your box right now.

i've attached those 4 patches:

 x86-sched_clock-re-scheduler-fix-x86-regression-in-native-sched-clock.patch
 x86-cpu-clock-idle-event.patch
 sched-printk-recursion-fix.patch
 sched-printk-clock-fix.patch

none of them is _supposed_ to have any effect on udelay(), but the 
interactions in this area are weird.

[ note: CONFIG_PRINTK_TIME will be broken and only fixed in v2.6.25, so 
  use some other time metric for determining mdelay quality. ]

plus then there's this patch:

  http://lkml.org/lkml/2007/12/7/100

is it perhaps this one that fixed udelay for you? [ which would be much 
more expected, as this patch changes udelay ;-) ]

	Ingo

[-- Attachment #2: x86-sched_clock-re-scheduler-fix-x86-regression-in-native-sched-clock.patch --]
[-- Type: text/plain, Size: 7410 bytes --]

Subject: x86: scale cyc_2_nsec according to CPU frequency
From: "Guillaume Chazarain" <guichaz@yahoo.fr>

scale the sched_clock() cyc_2_nsec scaling factor according to
CPU frequency changes.

[ mingo@elte.hu: simplified it and fixed it for SMP. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/tsc_32.c |   43 ++++++++++++++++++++++++++++++-----
 arch/x86/kernel/tsc_64.c |   57 ++++++++++++++++++++++++++++++++++++++---------
 include/asm-x86/timer.h  |   23 ++++++++++++++----
 3 files changed, 102 insertions(+), 21 deletions(-)

Index: linux/arch/x86/kernel/tsc_32.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_32.c
+++ linux/arch/x86/kernel/tsc_32.c
@@ -5,6 +5,7 @@
 #include <linux/jiffies.h>
 #include <linux/init.h>
 #include <linux/dmi.h>
+#include <linux/percpu.h>
 
 #include <asm/delay.h>
 #include <asm/tsc.h>
@@ -80,13 +81,31 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable);
  *
  *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
  */
-unsigned long cyc2ns_scale __read_mostly;
 
-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
+DEFINE_PER_CPU(unsigned long, cyc2ns);
 
-static inline void set_cyc2ns_scale(unsigned long cpu_khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 {
-	cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR)/cpu_khz;
+	unsigned long flags, prev_scale, *scale;
+	unsigned long long tsc_now, ns_now;
+
+	local_irq_save(flags);
+	sched_clock_idle_sleep_event();
+
+	scale = &per_cpu(cyc2ns, cpu);
+
+	rdtscll(tsc_now);
+	ns_now = __cycles_2_ns(tsc_now);
+
+	prev_scale = *scale;
+	if (cpu_khz)
+		*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+	/*
+	 * Start smoothly with the new frequency:
+	 */
+	sched_clock_idle_wakeup_event(0);
+	local_irq_restore(flags);
 }
 
 /*
@@ -239,7 +258,9 @@ time_cpufreq_notifier(struct notifier_bl
 						ref_freq, freq->new);
 			if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
 				tsc_khz = cpu_khz;
-				set_cyc2ns_scale(cpu_khz);
+				preempt_disable();
+				set_cyc2ns_scale(cpu_khz, smp_processor_id());
+				preempt_enable();
 				/*
 				 * TSC based sched_clock turns
 				 * to junk w/ cpufreq
@@ -367,6 +388,8 @@ static inline void check_geode_tsc_relia
 
 void __init tsc_init(void)
 {
+	int cpu;
+
 	if (!cpu_has_tsc || tsc_disable)
 		goto out_no_tsc;
 
@@ -380,7 +403,15 @@ void __init tsc_init(void)
 				(unsigned long)cpu_khz / 1000,
 				(unsigned long)cpu_khz % 1000);
 
-	set_cyc2ns_scale(cpu_khz);
+	/*
+	 * Secondary CPUs do not run through tsc_init(), so set up
+	 * all the scale factors for all CPUs, assuming the same
+	 * speed as the bootup CPU. (cpufreq notifiers will fix this
+	 * up if their speed diverges)
+	 */
+	for_each_possible_cpu(cpu)
+		set_cyc2ns_scale(cpu_khz, cpu);
+
 	use_tsc_delay();
 
 	/* Check and install the TSC clocksource */
Index: linux/arch/x86/kernel/tsc_64.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_64.c
+++ linux/arch/x86/kernel/tsc_64.c
@@ -10,6 +10,7 @@
 
 #include <asm/hpet.h>
 #include <asm/timex.h>
+#include <asm/timer.h>
 
 static int notsc __initdata = 0;
 
@@ -18,16 +19,48 @@ EXPORT_SYMBOL(cpu_khz);
 unsigned int tsc_khz;
 EXPORT_SYMBOL(tsc_khz);
 
-static unsigned int cyc2ns_scale __read_mostly;
+/* Accelerators for sched_clock()
+ * convert from cycles(64bits) => nanoseconds (64bits)
+ *  basic equation:
+ *		ns = cycles / (freq / ns_per_sec)
+ *		ns = cycles * (ns_per_sec / freq)
+ *		ns = cycles * (10^9 / (cpu_khz * 10^3))
+ *		ns = cycles * (10^6 / cpu_khz)
+ *
+ *	Then we use scaling math (suggested by george@mvista.com) to get:
+ *		ns = cycles * (10^6 * SC / cpu_khz) / SC
+ *		ns = cycles * cyc2ns_scale / SC
+ *
+ *	And since SC is a constant power of two, we can convert the div
+ *  into a shift.
+ *
+ *  We can use khz divisor instead of mhz to keep a better precision, since
+ *  cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
+ *  (mathieu.desnoyers@polymtl.ca)
+ *
+ *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
+ */
+DEFINE_PER_CPU(unsigned long, cyc2ns);
 
-static inline void set_cyc2ns_scale(unsigned long khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 {
-	cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz;
-}
+	unsigned long flags, prev_scale, *scale;
+	unsigned long long tsc_now, ns_now;
 
-static unsigned long long cycles_2_ns(unsigned long long cyc)
-{
-	return (cyc * cyc2ns_scale) >> NS_SCALE;
+	local_irq_save(flags);
+	sched_clock_idle_sleep_event();
+
+	scale = &per_cpu(cyc2ns, cpu);
+
+	rdtscll(tsc_now);
+	ns_now = __cycles_2_ns(tsc_now);
+
+	prev_scale = *scale;
+	if (cpu_khz)
+		*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+	sched_clock_idle_wakeup_event(0);
+	local_irq_restore(flags);
 }
 
 unsigned long long sched_clock(void)
@@ -100,7 +133,9 @@ static int time_cpufreq_notifier(struct 
 			mark_tsc_unstable("cpufreq changes");
 	}
 
-	set_cyc2ns_scale(tsc_khz_ref);
+	preempt_disable();
+	set_cyc2ns_scale(tsc_khz_ref, smp_processor_id());
+	preempt_enable();
 
 	return 0;
 }
@@ -151,7 +186,7 @@ static unsigned long __init tsc_read_ref
 void __init tsc_calibrate(void)
 {
 	unsigned long flags, tsc1, tsc2, tr1, tr2, pm1, pm2, hpet1, hpet2;
-	int hpet = is_hpet_enabled();
+	int hpet = is_hpet_enabled(), cpu;
 
 	local_irq_save(flags);
 
@@ -206,7 +241,9 @@ void __init tsc_calibrate(void)
 	}
 
 	tsc_khz = tsc2 / tsc1;
-	set_cyc2ns_scale(tsc_khz);
+
+	for_each_possible_cpu(cpu)
+		set_cyc2ns_scale(tsc_khz, cpu);
 }
 
 /*
Index: linux/include/asm-x86/timer.h
===================================================================
--- linux.orig/include/asm-x86/timer.h
+++ linux/include/asm-x86/timer.h
@@ -2,6 +2,7 @@
 #define _ASMi386_TIMER_H
 #include <linux/init.h>
 #include <linux/pm.h>
+#include <linux/percpu.h>
 
 #define TICK_SIZE (tick_nsec / 1000)
 
@@ -16,7 +17,7 @@ extern int recalibrate_cpu_khz(void);
 #define calculate_cpu_khz() native_calculate_cpu_khz()
 #endif
 
-/* Accellerators for sched_clock()
+/* Accelerators for sched_clock()
  * convert from cycles(64bits) => nanoseconds (64bits)
  *  basic equation:
  *		ns = cycles / (freq / ns_per_sec)
@@ -31,20 +32,32 @@ extern int recalibrate_cpu_khz(void);
  *	And since SC is a constant power of two, we can convert the div
  *  into a shift.
  *
- *  We can use khz divisor instead of mhz to keep a better percision, since
+ *  We can use khz divisor instead of mhz to keep a better precision, since
  *  cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
  *  (mathieu.desnoyers@polymtl.ca)
  *
  *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
  */
-extern unsigned long cyc2ns_scale __read_mostly;
+
+DECLARE_PER_CPU(unsigned long, cyc2ns);
 
 #define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
 
-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
 {
-	return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
+	return cyc * per_cpu(cyc2ns, smp_processor_id()) >> CYC2NS_SCALE_FACTOR;
 }
 
+static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+	unsigned long long ns;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	ns = __cycles_2_ns(cyc);
+	local_irq_restore(flags);
+
+	return ns;
+}
 
 #endif

[-- Attachment #3: x86-cpu-clock-idle-event.patch --]
[-- Type: text/plain, Size: 2055 bytes --]

Subject: x86: idle wakeup event in the HLT loop
From: Ingo Molnar <mingo@elte.hu>

do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC
in HLT too, not just when going through the ACPI methods.

(the ACPI idle code already does this.)

[ update the 64-bit side too, as noticed by Jiri Slaby. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/process_32.c |   15 ++++++++++++---
 arch/x86/kernel/process_64.c |   13 ++++++++++---
 2 files changed, 22 insertions(+), 6 deletions(-)

Index: linux-x86.q/arch/x86/kernel/process_32.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/process_32.c
+++ linux-x86.q/arch/x86/kernel/process_32.c
@@ -113,10 +113,19 @@ void default_idle(void)
 		smp_mb();
 
 		local_irq_disable();
-		if (!need_resched())
+		if (!need_resched()) {
+			ktime_t t0, t1;
+			u64 t0n, t1n;
+
+			t0 = ktime_get();
+			t0n = ktime_to_ns(t0);
 			safe_halt();	/* enables interrupts racelessly */
-		else
-			local_irq_enable();
+			local_irq_disable();
+			t1 = ktime_get();
+			t1n = ktime_to_ns(t1);
+			sched_clock_idle_wakeup_event(t1n - t0n);
+		}
+		local_irq_enable();
 		current_thread_info()->status |= TS_POLLING;
 	} else {
 		/* loop is done by the caller */
Index: linux-x86.q/arch/x86/kernel/process_64.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/process_64.c
+++ linux-x86.q/arch/x86/kernel/process_64.c
@@ -116,9 +116,16 @@ static void default_idle(void)
 	smp_mb();
 	local_irq_disable();
 	if (!need_resched()) {
-		/* Enables interrupts one instruction before HLT.
-		   x86 special cases this so there is no race. */
-		safe_halt();
+		ktime_t t0, t1;
+		u64 t0n, t1n;
+
+		t0 = ktime_get();
+		t0n = ktime_to_ns(t0);
+		safe_halt();	/* enables interrupts racelessly */
+		local_irq_disable();
+		t1 = ktime_get();
+		t1n = ktime_to_ns(t1);
+		sched_clock_idle_wakeup_event(t1n - t0n);
 	} else
 		local_irq_enable();
 	current_thread_info()->status |= TS_POLLING;

[-- Attachment #4: sched-printk-recursion-fix.patch --]
[-- Type: text/plain, Size: 3181 bytes --]

Subject: printk: make printk more robust by not allowing recursion
From: Ingo Molnar <mingo@elte.hu>

make printk more robust by allowing recursion only if there's a crash
going on. Also add recursion detection.

I've tested it with an artificially injected printk recursion - instead
of a lockup or spontaneous reboot or other crash, the output was a well
controlled:

[   41.057335] SysRq : <2>BUG: recent printk recursion!
[   41.057335] loglevel0-8 reBoot Crashdump show-all-locks(D) tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks

also do all this printk-debug logic with irqs disabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/printk.c |   48 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 10 deletions(-)

Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -628,30 +628,57 @@ asmlinkage int printk(const char *fmt, .
 /* cpu currently holding logbuf_lock */
 static volatile unsigned int printk_cpu = UINT_MAX;
 
+const char printk_recursion_bug_msg [] =
+			KERN_CRIT "BUG: recent printk recursion!\n";
+static int printk_recursion_bug;
+
 asmlinkage int vprintk(const char *fmt, va_list args)
 {
+	static int log_level_unknown = 1;
+	static char printk_buf[1024];
+
 	unsigned long flags;
-	int printed_len;
+	int printed_len = 0;
+	int this_cpu;
 	char *p;
-	static char printk_buf[1024];
-	static int log_level_unknown = 1;
 
 	boot_delay_msec();
 
 	preempt_disable();
-	if (unlikely(oops_in_progress) && printk_cpu == smp_processor_id())
-		/* If a crash is occurring during printk() on this CPU,
-		 * make sure we can't deadlock */
-		zap_locks();
-
 	/* This stops the holder of console_sem just where we want him */
 	raw_local_irq_save(flags);
+	this_cpu = smp_processor_id();
+
+	/*
+	 * Ouch, printk recursed into itself!
+	 */
+	if (unlikely(printk_cpu == this_cpu)) {
+		/*
+		 * If a crash is occurring during printk() on this CPU,
+		 * then try to get the crash message out but make sure
+		 * we can't deadlock. Otherwise just return to avoid the
+		 * recursion and return - but flag the recursion so that
+		 * it can be printed at the next appropriate moment:
+		 */
+		if (!oops_in_progress) {
+			printk_recursion_bug = 1;
+			goto out_restore_irqs;
+		}
+		zap_locks();
+	}
+
 	lockdep_off();
 	spin_lock(&logbuf_lock);
-	printk_cpu = smp_processor_id();
+	printk_cpu = this_cpu;
 
+	if (printk_recursion_bug) {
+		printk_recursion_bug = 0;
+		strcpy(printk_buf, printk_recursion_bug_msg);
+		printed_len = sizeof(printk_recursion_bug_msg);
+	}
 	/* Emit the output into the temporary buffer */
-	printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args);
+	printed_len += vscnprintf(printk_buf + printed_len,
+				  sizeof(printk_buf), fmt, args);
 
 	/*
 	 * Copy the output into log_buf.  If the caller didn't provide
@@ -744,6 +771,7 @@ asmlinkage int vprintk(const char *fmt, 
 		printk_cpu = UINT_MAX;
 		spin_unlock(&logbuf_lock);
 		lockdep_on();
+out_restore_irqs:
 		raw_local_irq_restore(flags);
 	}
 

[-- Attachment #5: sched-printk-clock-fix.patch --]
[-- Type: text/plain, Size: 943 bytes --]

Subject: sched: fix CONFIG_PRINT_TIME's reliance on sched_clock()
From: Ingo Molnar <mingo@elte.hu>

Stefano Brivio reported weird printk timestamp behavior during
CPU frequency changes:

  http://bugzilla.kernel.org/show_bug.cgi?id=9475

fix CONFIG_PRINT_TIME's reliance on sched_clock() and use cpu_clock()
instead.

Reported-and-bisected-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/printk.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -707,7 +707,7 @@ asmlinkage int vprintk(const char *fmt, 
 					loglev_char = default_message_loglevel
 						+ '0';
 				}
-				t = printk_clock();
+				t = cpu_clock(printk_cpu);
 				nanosec_rem = do_div(t, 1000000000);
 				tlen = sprintf(tbuf,
 						"<%c>[%5lu.%06lu] ",

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-11  8:52                 ` Ingo Molnar
@ 2007-12-11 19:03                   ` Peter Zijlstra
  0 siblings, 0 replies; 103+ messages in thread
From: Peter Zijlstra @ 2007-12-11 19:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dave Jones, Linus Torvalds, Andrew Morton, Matt Mackall,
	Rafael J. Wysocki, LKML, Christoph Lameter


On Tue, 2007-12-11 at 09:52 +0100, Ingo Molnar wrote:
> * Dave Jones <davej@redhat.com> wrote:

> > Which leaves my only other gripe.  It broke slabtop.
> 
> that's actually a _bad_ ABI regression. Rafael, could you please add 
> this to the regressions list?
> 
> > There's an alternative implementation in Documentation/vm/slabinfo.c 
> > (why there not say, util-linux, home of current slabtop?)
> 
> the kernel should output /proc/slabinfo data with the same formatting, 
> period.

I can remember it being said that would be one of the things to-do
before SLUB could become default.


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-11  9:01           ` Ingo Molnar
@ 2007-12-11 21:10             ` Stefano Brivio
  2007-12-19  0:58             ` Stefano Brivio
  1 sibling, 0 replies; 103+ messages in thread
From: Stefano Brivio @ 2007-12-11 21:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, rjw, linux-kernel, torvalds, Guillaume Chazarain

On Tue, 11 Dec 2007 10:01:20 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> ok, just to make sure we are all synced up. I made 8 patches related to 
> this problem category (and all the trickle effects). 3 are upstream 
> already, 5 are pending for v2.6.25. One out of those 5 is an immaterial 
> cleanup patch - which leaves us 4 patches to sort out.
> 
> So i'd suggest for you to try latest -git - that will tell us whether 
> udelay() is acceptable on your box right now.

Yes, it is (msleep(2000), as said, gives delays between 2 and 2.9s on my
box, and drivers are happy).

The commit which fixed this (it seems) is
fa2dd441df28b9fdfc68f84ae66f1b507cfff0e4. I'll bisect and tell you more in the
next days.

> i've attached those 4 patches:
> 
>  x86-sched_clock-re-scheduler-fix-x86-regression-in-native-sched-clock.patch
>  x86-cpu-clock-idle-event.patch
>  sched-printk-recursion-fix.patch
>  sched-printk-clock-fix.patch
> 
> none of them is _supposed_ to have any effect on udelay(), but the 
> interactions in this area are weird.

No effects here IIRC.

> [ note: CONFIG_PRINTK_TIME will be broken and only fixed in v2.6.25, so 
>   use some other time metric for determining mdelay quality. ]
> 
> plus then there's this patch:
> 
>   http://lkml.org/lkml/2007/12/7/100
> 
> is it perhaps this one that fixed udelay for you? [ which would be much 
> more expected, as this patch changes udelay ;-) ]

Will try it ASAP, again, in the next few days anyway.


--
Ciao
Stefano

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09  2:15       ` Theodore Tso
@ 2007-12-13 10:49         ` Takashi Iwai
  2007-12-20 15:42           ` Takashi Iwai
  0 siblings, 1 reply; 103+ messages in thread
From: Takashi Iwai @ 2007-12-13 10:49 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Rafael J. Wysocki, Andrew Morton, LKML, Linus Torvalds,
	Ingo Molnar, Roland Dreier, perex

[Sorry for the late response as I've been on vacation]

At Sat, 8 Dec 2007 21:15:44 -0500,
Theodore Tso wrote:
> 
> On Sat, Dec 08, 2007 at 11:30:53PM +0100, Rafael J. Wysocki wrote:
> > On Saturday, 8 of December 2007, Theodore Tso wrote:
> > > However, as far as I am concerned, Ingo's patch, first posted to LKML
> > > here: http://lkml.org/lkml/2007/11/16/66 should be listed as fixing
> > > the above regression.  Rafael, could you please make a note of this in
> > > your regression list,
> > 
> > Done, thanks.
> 
> Great, thanks.  I should add that technically this wasn't a regression
> since I had been seeing this since before 2.6.23.  Also, it isn't a
> big deal, since aside from noise in the syslog, falling back to
> polling more doesn't make any functional or user-visible difference
> (although I guess it's less efficient).  
> 
> Regardless of whether it is a regression, it would be nice to get the
> patch applied and and this issue fixed for 2.6.25!

You mean 2.6.24 ? ;-)

Yes, if it solves the problem, not only improves the latency, it's
definitely nice to have now.  I was just too conservative to mark it
for 2.6.24 merge although it looks safe.

Jaroslav, could you prepare this for the push?  It corresponds to
alsa-kernel HG changeset 5557.


thanks,

Takashi

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-08 17:47     ` Linus Torvalds
  2007-12-08 17:54       ` Linus Torvalds
@ 2007-12-13 22:03       ` Christoph Lameter
  1 sibling, 0 replies; 103+ messages in thread
From: Christoph Lameter @ 2007-12-13 22:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matt Mackall, Ingo Molnar, Rafael J. Wysocki, LKML, Andrew Morton

On Sat, 8 Dec 2007, Linus Torvalds wrote:

> On that note, shouldn't we also do this for slub.c? Christoph?

SLUB does not pass __GFP_ZERO to the page allocator.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-09  8:50                   ` Ingo Molnar
  2007-12-09  9:18                     ` Pekka Enberg
@ 2007-12-13 22:07                     ` Christoph Lameter
  1 sibling, 0 replies; 103+ messages in thread
From: Christoph Lameter @ 2007-12-13 22:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Linus Torvalds, Andrew Morton, Matt Mackall,
	Rafael J. Wysocki, LKML

On Sun, 9 Dec 2007, Ingo Molnar wrote:

> unless i'm missing something obvious (and i easily might), i see SLUB as 
> SLAB reimplemented with a different queueing model. Not "without 
> queueing".

The "queue" that you are talking about is the freelist of a slab. It exist 
for each slab. SLAB uses a table of free entries there. The freelist is 
limited to a slab and thus the locality of the freelist is bound to 
the same page which avoids NUMA locality checks.



^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-08 19:52             ` Ingo Molnar
  2007-12-08 20:29               ` Ingo Molnar
  2007-12-11  6:27               ` Dave Jones
@ 2007-12-14  4:07               ` Christoph Lameter
  2007-12-14  7:16                 ` Eric Dumazet
  2007-12-14 12:49                 ` Ingo Molnar
  2 siblings, 2 replies; 103+ messages in thread
From: Christoph Lameter @ 2007-12-14  4:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andrew Morton, Matt Mackall, Rafael J. Wysocki, LKML

On Sat, 8 Dec 2007, Ingo Molnar wrote:

> 
> > Good. Although we should perhaps look at that reported performance 
> > problem with SLUB. It looks like SLUB will do a memclear() for the 
> > area twice (first for the whole page, then for the thing it allocated) 
> > for the slow case. Maybe that exacerbates the problem.
> 
> i dont think the SLUB problem could be explained purely via a double 
> memset(). [which ought to be extremely fast anyway] We are talking about 
> a 10 times slowdown on a 64-way box of a workload that is fairly 
> common-sense. (tasks sending messages to each other via bog standard 
> means)
> 
> while i dont want to jump to conclusions without looking at some 
> profiles, i think the SLUB performance regression is indicative of the 
> following fallacy: "SLAB can be done significantly simpler while keeping 
> the same performance".

Well this is double crap. First of all SLUB does not do memclear twice. 
There is no reason to assume that SLUB has the problem just because SLOB 
hat that. A "fix" for that nonexistent problem went into Linus tree. WTH 
is going on?

SLUB was done because of a series of problem with the basic concepts of 
SLAB that treaten it usability in the future.
 
> I couldnt point to any particular aspect of SLAB that i could 
> characterise as "needless bloat".

I agree, SLABs architecture is pretty tight and I was one of those who 
helped it along to be that way.

However, SLAB is just fundamentally wrong for todays machine. The key 
problem today is cacheline fetch latency and that problem will increase 
significantly in the future. Sure under some circumstances that exploit 
the fact that SLAB sometimes gets its guesses on the cpu cache right SLAB 
can still win but the more processors and nodes we get the more it will 
become difficult to keep SLAB around and the more it will become 
difficult to establish what cachelines are in the cpu cache.

> I think we should we make SLAB the default for v2.6.24 ...

If you guarantee that all the regression of SLAB vs. SLUB are addressed 
then thats fine but AFAICT that is not possible.

Here is a list of some of the benefits of SLUB just in case we forgot:


- SLUB is performance wise much faster than SLAB. This can be more than a
  factor of 10 (case of concurrent allocations / frees on multiple
  processors). See http://lkml.org/lkml/2007/10/27/245

- Single threaded allocation speed is up to double that of SLAB

- Remote freeing of objectcs in a NUMA systems is typically 30% faster.

- Debugging on SLAB is difficult. Requires recompile of the kernel
  and the resulting output is difficult to interpret. SLUB can apply
  debugging options to a subset of the slabcaches in order to allow
  the system to work with maximum speed. This is necessary to detect
  difficult to reproduce race conditions.

- SLAB can capture huge amounts of memory in its queues. The problem
  gets worse the more processors and NUMA nodes are in the system. The 
  amount of memory limits the number of per cpu objects one can configure.

- SLAB requires a pass through all slab caches every 2 seconds to
  expire objects. This is a problem both for realtime and MPI jobs
  that cannot take such a processor outage.

- SLAB does not have a sophisticated slabinfo tool to report the
  state of slab objects on the system. Can provide details of
  object use.

- SLAB requires the update of two words for freeing
  and allocation. SLUB can do that by updating a single
  word which allows to avoid enabling and disabling interrupts if
  the processor supports an atomic instruction for that purpose.
  This is important for realtime kernels where special measures
  may have to be implemented if one wants to disable interrupts.

- SLAB requires memory to be set aside for queues (processors
  times number of slabs times queue size). SLUB requires none of that.

- SLUB merges slab caches with similar characteristics to
  reduce the memory footprint even further.

- SLAB performs object level NUMA management which creates
  a complex allocator complexity. SLUB manages NUMA on the level of
  slab pages reducing object management overhead.

- SLUB allows remote node defragmentation to avoid the buildup
  of large partial lists on a single node.

- SLUB can actively reduce the fragmentation of slabs through
  slab cache specific callbacks (not merged yet)

- SLUB has resiliency features that allow it to isolate a problem
  object and continue after diagnostics have been performed.

- SLUB creates rarely used DMA caches on demand instead of creating
  them all on bootup (SLAB).


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-14  4:07               ` Christoph Lameter
@ 2007-12-14  7:16                 ` Eric Dumazet
  2007-12-14 12:49                 ` Ingo Molnar
  1 sibling, 0 replies; 103+ messages in thread
From: Eric Dumazet @ 2007-12-14  7:16 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Matt Mackall,
	Rafael J. Wysocki, LKML

Christoph Lameter a écrit :
> On Sat, 8 Dec 2007, Ingo Molnar wrote:
> 
>>> Good. Although we should perhaps look at that reported performance 
>>> problem with SLUB. It looks like SLUB will do a memclear() for the 
>>> area twice (first for the whole page, then for the thing it allocated) 
>>> for the slow case. Maybe that exacerbates the problem.
>> i dont think the SLUB problem could be explained purely via a double 
>> memset(). [which ought to be extremely fast anyway] We are talking about 
>> a 10 times slowdown on a 64-way box of a workload that is fairly 
>> common-sense. (tasks sending messages to each other via bog standard 
>> means)
>>
>> while i dont want to jump to conclusions without looking at some 
>> profiles, i think the SLUB performance regression is indicative of the 
>> following fallacy: "SLAB can be done significantly simpler while keeping 
>> the same performance".
> 
> Well this is double crap. First of all SLUB does not do memclear twice. 
> There is no reason to assume that SLUB has the problem just because SLOB 
> hat that. A "fix" for that nonexistent problem went into Linus tree. WTH 
> is going on?
> 
> SLUB was done because of a series of problem with the basic concepts of 
> SLAB that treaten it usability in the future.
>  
>> I couldnt point to any particular aspect of SLAB that i could 
>> characterise as "needless bloat".
> 
> I agree, SLABs architecture is pretty tight and I was one of those who 
> helped it along to be that way.
> 
> However, SLAB is just fundamentally wrong for todays machine. The key 
> problem today is cacheline fetch latency and that problem will increase 
> significantly in the future. Sure under some circumstances that exploit 
> the fact that SLAB sometimes gets its guesses on the cpu cache right SLAB 
> can still win but the more processors and nodes we get the more it will 
> become difficult to keep SLAB around and the more it will become 
> difficult to establish what cachelines are in the cpu cache.
> 
>> I think we should we make SLAB the default for v2.6.24 ...
> 
> If you guarantee that all the regression of SLAB vs. SLUB are addressed 
> then thats fine but AFAICT that is not possible.
> 
> Here is a list of some of the benefits of SLUB just in case we forgot:
> 
> 
> - SLUB is performance wise much faster than SLAB. This can be more than a
>   factor of 10 (case of concurrent allocations / frees on multiple
>   processors). See http://lkml.org/lkml/2007/10/27/245
> 
> - Single threaded allocation speed is up to double that of SLAB
> 
> - Remote freeing of objectcs in a NUMA systems is typically 30% faster.
> 
> - Debugging on SLAB is difficult. Requires recompile of the kernel
>   and the resulting output is difficult to interpret. SLUB can apply
>   debugging options to a subset of the slabcaches in order to allow
>   the system to work with maximum speed. This is necessary to detect
>   difficult to reproduce race conditions.
> 
> - SLAB can capture huge amounts of memory in its queues. The problem
>   gets worse the more processors and NUMA nodes are in the system. The 
>   amount of memory limits the number of per cpu objects one can configure.
> 
> - SLAB requires a pass through all slab caches every 2 seconds to
>   expire objects. This is a problem both for realtime and MPI jobs
>   that cannot take such a processor outage.
> 
> - SLAB does not have a sophisticated slabinfo tool to report the
>   state of slab objects on the system. Can provide details of
>   object use.
> 
> - SLAB requires the update of two words for freeing
>   and allocation. SLUB can do that by updating a single
>   word which allows to avoid enabling and disabling interrupts if
>   the processor supports an atomic instruction for that purpose.
>   This is important for realtime kernels where special measures
>   may have to be implemented if one wants to disable interrupts.
> 
> - SLAB requires memory to be set aside for queues (processors
>   times number of slabs times queue size). SLUB requires none of that.
> 
> - SLUB merges slab caches with similar characteristics to
>   reduce the memory footprint even further.
> 
> - SLAB performs object level NUMA management which creates
>   a complex allocator complexity. SLUB manages NUMA on the level of
>   slab pages reducing object management overhead.
> 
> - SLUB allows remote node defragmentation to avoid the buildup
>   of large partial lists on a single node.
> 
> - SLUB can actively reduce the fragmentation of slabs through
>   slab cache specific callbacks (not merged yet)
> 
> - SLUB has resiliency features that allow it to isolate a problem
>   object and continue after diagnostics have been performed.
> 
> - SLUB creates rarely used DMA caches on demand instead of creating
>   them all on bootup (SLAB).
> 

Yes, SLUB should be the way to go, but some issues are not yet solved.

I had to switch back to SLAB on a production NUMA server, with 2 nodes and 8GB 
ram. Using a lot of sockets, so a large part of memory was used by kernel.

SLUB kernel was hitting OOM after 2 or 3 days of uptime.
SLAB kernel never hit this.

Unfortunatly I dont have a test machine to reproduce the setup.

Maybe the problem is not related to SLUB at all, but an underlying VM/NUMA bug.

The /proc/buddyinfo showed that :

Node 0 contained two zones (DMA and DMA32) total 4 GB
Node 1 contained one zone (Normal) total 4 GB

So Node 0 contained no (Normal) zone

part of /proc/meminfo

Slab:          3338512 kB
SReclaimable:   789716 kB
SUnreclaim:    2548796 kB

I remember network interrupts were taken by CPU 1, so most allocations were 
done by CPU 1 (node 1), and many freeing were done on CPU 0

Hope this helps


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-14  4:07               ` Christoph Lameter
  2007-12-14  7:16                 ` Eric Dumazet
@ 2007-12-14 12:49                 ` Ingo Molnar
  2007-12-17 19:54                   ` Christoph Lameter
  1 sibling, 1 reply; 103+ messages in thread
From: Ingo Molnar @ 2007-12-14 12:49 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Linus Torvalds, Andrew Morton, Matt Mackall, Rafael J. Wysocki, LKML


* Christoph Lameter <clameter@sgi.com> wrote:

> > I think we should we make SLAB the default for v2.6.24 ...
> 
> If you guarantee that all the regression of SLAB vs. SLUB are 
> addressed then thats fine but AFAICT that is not possible.

huh? You got the ordering wrong ;-) SLUB needs to resolve all 
regressions relative to SLAB. (or at least have a really good 
explanation about why it regresses)

> Here is a list of some of the benefits of SLUB just in case we forgot:
> 
> - SLUB is performance wise much faster than SLAB. This can be more than a
>   factor of 10 (case of concurrent allocations / frees on multiple
>   processors). See http://lkml.org/lkml/2007/10/27/245

which is of little help if it regresses on other workloads. As we've 
seen it, SLUB can be more than 10 times slower on hackbench. You can 
tune SLUB to use 2MB pages but of course that's not a production level 
system. OTOH, have you tried to tune SLAB in the above benchmark?

> - Single threaded allocation speed is up to double that of SLAB

link?

> - Remote freeing of objectcs in a NUMA systems is typically 30% faster.
> 
> - Debugging on SLAB is difficult. Requires recompile of the kernel
>   and the resulting output is difficult to interpret. SLUB can apply
>   debugging options to a subset of the slabcaches in order to allow
>   the system to work with maximum speed. This is necessary to detect
>   difficult to reproduce race conditions.

that's not a fundamental property of SLAB. It would be an about 10 lines 
hack to enable SLAB debugging switchable-on runtime, with the boot flag 
defaulting to 'off'.

> - SLAB can capture huge amounts of memory in its queues. The problem
>   gets worse the more processors and NUMA nodes are in the system. The 
>   amount of memory limits the number of per cpu objects one can 
>   configure.

well that's the nature of caches, but it could be improved: restrict 
alien caches along cpusets and demand-allocate them.

> - SLAB requires a pass through all slab caches every 2 seconds to
>   expire objects. This is a problem both for realtime and MPI jobs 
>   that cannot take such a processor outage.

the moment you start capturing more memory in SLUB's per cpu queues 
(which do exist), you will have the same sort of problem.

> - SLAB does not have a sophisticated slabinfo tool to report the
>   state of slab objects on the system. Can provide details of object 
>   use.

again, not a fundamental property of SLAB.

> - SLAB requires the update of two words for freeing
>   and allocation. SLUB can do that by updating a single word which 
>   allows to avoid enabling and disabling interrupts if the processor 
>   supports an atomic instruction for that purpose. This is important 
>   for realtime kernels where special measures may have to be 
>   implemented if one wants to disable interrupts.

i do appreciate that :-) SLUB was rather easy to "port" to PREEMPT_RT: 
it did not need a single line of change. The SLAB portion is a lot 
scarier:

  dione:~linux-rt.q> diffstat patches/rt-slab-new.patch
    1 file changed, 319 insertions(+), 177 deletions(-)

> - SLAB requires memory to be set aside for queues (processors
>   times number of slabs times queue size). SLUB requires none of that.
>
> - SLUB merges slab caches with similar characteristics to
>   reduce the memory footprint even further.
> 
> - SLAB performs object level NUMA management which creates
>   a complex allocator complexity. SLUB manages NUMA on the level of 
>   slab pages reducing object management overhead.
> 
> - SLUB allows remote node defragmentation to avoid the buildup
>   of large partial lists on a single node.
> 
> - SLUB can actively reduce the fragmentation of slabs through
>   slab cache specific callbacks (not merged yet)
> 
> - SLUB has resiliency features that allow it to isolate a problem
>   object and continue after diagnostics have been performed.

all of these are neat.

How about renaming it to SLAB2 instead of SLUB? The "unqueued" bit is 
just stupid NIH syndrome. It's _of course_ queued because it has to. "It 
does not have _THAT_ queue as SLAB used to have" is just a silly excuse.

> - SLUB creates rarely used DMA caches on demand instead of creating
>   them all on bootup (SLAB).

actually, this might be a bug. the DMA caches should be created right 
away and filled with a small amount of objects due to stupid 16MB 
limitations with certain hardware. Later on a GFP_DMA request might not 
be fulfillable. (because that zone is filled up pretty quickly)

	Ingo

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23]
  2007-12-14 12:49                 ` Ingo Molnar
@ 2007-12-17 19:54                   ` Christoph Lameter
  0 siblings, 0 replies; 103+ messages in thread
From: Christoph Lameter @ 2007-12-17 19:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andrew Morton, Matt Mackall, Rafael J. Wysocki, LKML

On Fri, 14 Dec 2007, Ingo Molnar wrote:

> which is of little help if it regresses on other workloads. As we've 
> seen it, SLUB can be more than 10 times slower on hackbench. You can 
> tune SLUB to use 2MB pages but of course that's not a production level 
> system. OTOH, have you tried to tune SLAB in the above benchmark?

Hackbench is one special use case and I was not aware of there being an 
issue there. AFAICT other workloads are fine. I still do not understand 
why the measures in SLUB to avoid lock contention do not take in this 
case. Need to run some more tests.

> > - Single threaded allocation speed is up to double that of SLAB
> 
> link?

Same as link as for the earlier numbers.

> > - Debugging on SLAB is difficult. Requires recompile of the kernel
> >   and the resulting output is difficult to interpret. SLUB can apply
> >   debugging options to a subset of the slabcaches in order to allow
> >   the system to work with maximum speed. This is necessary to detect
> >   difficult to reproduce race conditions.
> 
> that's not a fundamental property of SLAB. It would be an about 10 lines 
> hack to enable SLAB debugging switchable-on runtime, with the boot flag 
> defaulting to 'off'.

Well try it. Note that you need to avoid the runtime debugging result in a 
negative performance impact.
 
> > - SLAB can capture huge amounts of memory in its queues. The problem
> >   gets worse the more processors and NUMA nodes are in the system. The 
> >   amount of memory limits the number of per cpu objects one can 
> >   configure.
> 
> well that's the nature of caches, but it could be improved: restrict 
> alien caches along cpusets and demand-allocate them.

Maybe but that adds additional complexity. There are other issues with 
queues too.

> > - SLAB requires a pass through all slab caches every 2 seconds to
> >   expire objects. This is a problem both for realtime and MPI jobs 
> >   that cannot take such a processor outage.
> 
> the moment you start capturing more memory in SLUB's per cpu queues 
> (which do exist), you will have the same sort of problem.

There are no queues and thus no problem in SLUB. The per cpu slab is 
exactly one slab and cannot grow beyond that.

> > - SLAB requires the update of two words for freeing
> >   and allocation. SLUB can do that by updating a single word which 
> >   allows to avoid enabling and disabling interrupts if the processor 
> >   supports an atomic instruction for that purpose. This is important 
> >   for realtime kernels where special measures may have to be 
> >   implemented if one wants to disable interrupts.
> 
> i do appreciate that :-) SLUB was rather easy to "port" to PREEMPT_RT: 
> it did not need a single line of change. The SLAB portion is a lot 
> scarier:

Finally something positive. I think we can get to a point where SLUB can 
be the same on RT and non RT. 
 
> How about renaming it to SLAB2 instead of SLUB? The "unqueued" bit is 
> just stupid NIH syndrome. It's _of course_ queued because it has to. "It 
> does not have _THAT_ queue as SLAB used to have" is just a silly excuse.

Hmmm yes. At some point I want to remove SLAB and rename SLUB SLAB. Note 
that the queues (if you want to call the per slab page freelist queues) 
are significantly different.

> > - SLUB creates rarely used DMA caches on demand instead of creating
> >   them all on bootup (SLAB).
> 
> actually, this might be a bug. the DMA caches should be created right 
> away and filled with a small amount of objects due to stupid 16MB 
> limitations with certain hardware. Later on a GFP_DMA request might not 
> be fulfillable. (because that zone is filled up pretty quickly)

Use of SLAB DMA memory are exceedingly rare. Andi Kleen has removed 
almost all uses of slab DMA. The DMA must remain allocatable in order to 
allow allocations for legacy device drivers. If it fills up then we will 
have other issues.


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-11  9:01           ` Ingo Molnar
  2007-12-11 21:10             ` Stefano Brivio
@ 2007-12-19  0:58             ` Stefano Brivio
  1 sibling, 0 replies; 103+ messages in thread
From: Stefano Brivio @ 2007-12-19  0:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, rjw, linux-kernel, torvalds, Guillaume Chazarain

On Tue, 11 Dec 2007 10:01:20 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> ok, just to make sure we are all synced up. I made 8 patches related to 
> this problem category (and all the trickle effects). 3 are upstream 
> already, 5 are pending for v2.6.25. One out of those 5 is an immaterial 
> cleanup patch - which leaves us 4 patches to sort out.
> 
> So i'd suggest for you to try latest -git - that will tell us whether 
> udelay() is acceptable on your box right now.
> 
> i've attached those 4 patches:
> 
>  x86-sched_clock-re-scheduler-fix-x86-regression-in-native-sched-clock.patch
>  x86-cpu-clock-idle-event.patch
>  sched-printk-recursion-fix.patch
>  sched-printk-clock-fix.patch
> 
> none of them is _supposed_ to have any effect on udelay(), but the 
> interactions in this area are weird.

Exactly, none of them have any effect on udelay().

> [ note: CONFIG_PRINTK_TIME will be broken and only fixed in v2.6.25, so 
>   use some other time metric for determining mdelay quality. ]
> 
> plus then there's this patch:
> 
>   http://lkml.org/lkml/2007/12/7/100
> 
> is it perhaps this one that fixed udelay for you? [ which would be much 
> more expected, as this patch changes udelay ;-) ]

Yes, this one did. mdelay(2000) still gives delays between 2 and 2.9s, which is
acceptable. I have marked the regression as CODE_FIX.


--
Ciao
Stefano

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-13 10:49         ` Takashi Iwai
@ 2007-12-20 15:42           ` Takashi Iwai
  0 siblings, 0 replies; 103+ messages in thread
From: Takashi Iwai @ 2007-12-20 15:42 UTC (permalink / raw)
  To: perex
  Cc: Theodore Tso, Rafael J. Wysocki, Andrew Morton, LKML,
	Linus Torvalds, Ingo Molnar, Roland Dreier

At Thu, 13 Dec 2007 11:49:51 +0100,
I wrote:
> 
> [Sorry for the late response as I've been on vacation]
> 
> At Sat, 8 Dec 2007 21:15:44 -0500,
> Theodore Tso wrote:
> > 
> > On Sat, Dec 08, 2007 at 11:30:53PM +0100, Rafael J. Wysocki wrote:
> > > On Saturday, 8 of December 2007, Theodore Tso wrote:
> > > > However, as far as I am concerned, Ingo's patch, first posted to LKML
> > > > here: http://lkml.org/lkml/2007/11/16/66 should be listed as fixing
> > > > the above regression.  Rafael, could you please make a note of this in
> > > > your regression list,
> > > 
> > > Done, thanks.
> > 
> > Great, thanks.  I should add that technically this wasn't a regression
> > since I had been seeing this since before 2.6.23.  Also, it isn't a
> > big deal, since aside from noise in the syslog, falling back to
> > polling more doesn't make any functional or user-visible difference
> > (although I guess it's less efficient).  
> > 
> > Regardless of whether it is a regression, it would be nice to get the
> > patch applied and and this issue fixed for 2.6.25!
> 
> You mean 2.6.24 ? ;-)
> 
> Yes, if it solves the problem, not only improves the latency, it's
> definitely nice to have now.  I was just too conservative to mark it
> for 2.6.24 merge although it looks safe.
> 
> Jaroslav, could you prepare this for the push?  It corresponds to
> alsa-kernel HG changeset 5557.

Jaroslav, what about this now?


Takashi

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  2:25                   ` Tejun Heo
@ 2007-12-10  3:20                     ` Robert Hancock
  0 siblings, 0 replies; 103+ messages in thread
From: Robert Hancock @ 2007-12-10  3:20 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andreas Mohr, Matthew Garrett, Andrew Morton, Rafael J. Wysocki,
	LKML, Linus Torvalds, Ingo Molnar, linux-ide, Len Brown,
	linux-acpi

Tejun Heo wrote:
> Robert Hancock wrote:
>> And you're quite right in your comment that we are often too quick to
>> blacklist hardware instead of looking into why it really is failing.
>> ACPI is one of those areas where we often just need to figure out how to
>> be bug-to-bug compatibile with what Windows is doing..
> 
> In the spirit of not blacklisting without looking deep into ACPI code,
> can somebody familiar with ASL take a look at comment 11 of bug 9320?
> 
>   http://bugzilla.kernel.org/show_bug.cgi?id=9320#c11
> 
> This is libata calling _GTM to find out how the BIOS configured the
> device to determine cable type.
> 
> Thanks.

I suspect it's somewhat similar (though perhaps a different cause), the 
code is trying to lookup a value (presumably register contents) in a 
table using Match, gets a value that's not in the table (which makes 
Match return the ONES value FFFFFFFF meaning not found) and so the 
lookup of the corresponding output value with that index fails. We'd 
need the full ASL dump to know exactly what's going on there.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  1:28                 ` Robert Hancock
@ 2007-12-10  2:25                   ` Tejun Heo
  2007-12-10  3:20                     ` Robert Hancock
  0 siblings, 1 reply; 103+ messages in thread
From: Tejun Heo @ 2007-12-10  2:25 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Andreas Mohr, Matthew Garrett, Andrew Morton, Rafael J. Wysocki,
	LKML, Linus Torvalds, Ingo Molnar, linux-ide, Len Brown,
	linux-acpi

Robert Hancock wrote:
> And you're quite right in your comment that we are often too quick to
> blacklist hardware instead of looking into why it really is failing.
> ACPI is one of those areas where we often just need to figure out how to
> be bug-to-bug compatibile with what Windows is doing..

In the spirit of not blacklisting without looking deep into ACPI code,
can somebody familiar with ASL take a look at comment 11 of bug 9320?

  http://bugzilla.kernel.org/show_bug.cgi?id=9320#c11

This is libata calling _GTM to find out how the BIOS configured the
device to determine cable type.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  0:49               ` Andreas Mohr
  2007-12-10  1:28                 ` Robert Hancock
@ 2007-12-10  2:20                 ` Tejun Heo
  1 sibling, 0 replies; 103+ messages in thread
From: Tejun Heo @ 2007-12-10  2:20 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Robert Hancock, Matthew Garrett, Andrew Morton,
	Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide,
	Len Brown, linux-acpi

Andreas Mohr wrote:
> As such one can conclude that this BIOS is rather very confused when being called for _GTM on an entirely
> unused controller port. And this is either because the BIOS is dumb or because ACPI doesn't really
> expect anyone to call _GTM on an unused physical port. I'd bet on the latter...
> (however I haven't found ACPI 3.0b explicitly mentioning this somewhere yet)

Thanks a lot for finding this out.  One of the two reports in bug 9320
seems to be the same problem although the other doesn't seem to be.  So,
it seems we'll have to check that both primary and secondary slots are
empty and skip _GTM if so.  :-(

Also, right, there's no need to fail suspend on _GTM failure whatever
the error is.  That was me being anal again.  Will incorporate both into
the ACPI fixes patchset.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  0:49               ` Andreas Mohr
@ 2007-12-10  1:28                 ` Robert Hancock
  2007-12-10  2:25                   ` Tejun Heo
  2007-12-10  2:20                 ` Tejun Heo
  1 sibling, 1 reply; 103+ messages in thread
From: Robert Hancock @ 2007-12-10  1:28 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Matthew Garrett, Andrew Morton, Rafael J. Wysocki, LKML,
	Linus Torvalds, Ingo Molnar, linux-ide, Tejun Heo, Len Brown,
	linux-acpi

Andreas Mohr wrote:
> On Mon, Dec 10, 2007 at 01:04:31AM +0100, Andreas Mohr wrote:
>> IOW, it seems very likely that _GTM on these BIOSes (VIA chipsets) isn't
>> actually wrongly implemented but simply expects IDE controller values
>> to have been set up ""differently"".
>>
>>
>> Or... one could possibly even infer from this that - maybe -
>> the _GTM invocation spot is wrong, it should be done somewhere
>> different during bootup. Or whatever.
> 
> "Whatever" indeed:
> 
> There's an ASL Match() for a "PMPT" (Primary Master PorT) PCI register,
> and the possible register values are:
> 
>                     Package (0x04)
>                     {
>                         0x20,
>                         0x31,
>                         0x65,
>                         0xA8
>                     },
> 
> and from
> 
>                 OperationRegion (CFG2, PCI_Config, 0x40, 0x20)
>                 Field (CFG2, DWordAcc, NoLock, Preserve)
>                 {
>                             Offset (0x08),·
>                     SSPT,   8,·
>                     SMPT,   8,·
>                     PSPT,   8,·
>                     PMPT,   8,·
>                             Offset (0x10),·
> ...
> we can infer that at PCI_Config offset 0x48 those values should be located.
> However after bootup or resume there are:
> 
> # lspci -s 00:11.1 -xxx
> 00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
> 00: 06 11 71 05 07 00 90 02 06 8a 01 01 00 20 00 00
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 01 e4 00 00 00 00 00 00 00 00 00 00 06 11 71 05
> 30: 00 00 00 00 c0 00 00 00 00 00 00 00 ff 01 00 00
> 40: 0b 32 09 0a 18 1c c0 00 99 99 20 20 ff 00 a8 20
> 50: 07 07 f6 f1 14 03 00 00 a8 a8 a8 a8 00 00 00 00
> 60: 00 02 00 00 00 00 00 00 00 02 00 00 00 00 00 00
> 70: 02 01 00 00 00 00 00 00 82 01 00 00 00 00 00 00
> 80: 00 e0 a1 1f 00 00 00 00 00 00 00 00 00 00 00 00
> 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
> d0: 06 00 71 05 06 11 71 05 00 00 00 00 00 00 00 00
> e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> f0: 00 00 00 00 00 00 07 00 00 00 00 00 00 00 00 00
> 
> 
> As one can see, the relevant values for SSPT, SMPT, PSPT and PMPT are
> 99 99 20 20, which are not quite entirely valid judging from the array above,
> and this is because the secondary port is unused, as can also be seen
> from my bootup log:
> 
> scsi0 : pata_via
> scsi1 : pata_via
> ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xe400 irq 14
> ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xe408 irq 15
> ata1.00: ATA-5: WDC WD1200JB-00CRA1, 17.07W17, max UDMA/100
> ata1.00: 234441648 sectors, multi 16: LBA
> ata1.01: ATAPI: TOSHIBA DVD-ROM SD-M1612, 1004, max UDMA/33
> Switched to high resolution mode on CPU 0
> ata1.00: configured for UDMA/100
> ata1.01: configured for UDMA/33
> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTM_] (Node df80b9a8), AE_AML_PACKAGE_LIM
> IT
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN1._GTM] (Node df80b8d0), AE_AML_PACKAG
> E_LIMIT
> ata2: ACPI get timing mode failed (AE 0x300d)
> 
> 
> Manually tweaking the values to 20 20 20 20 truly does skip the _GTM failure message on suspend -
> only to reappear right on resume due to 99 99 20 20 combo happening again.
> If I don't tweak, I get _GTM failure at both suspend and resume.
> 
> 
> As such one can conclude that this BIOS is rather very confused when being called for _GTM on an entirely
> unused controller port. And this is either because the BIOS is dumb or because ACPI doesn't really
> expect anyone to call _GTM on an unused physical port. I'd bet on the latter...
> (however I haven't found ACPI 3.0b explicitly mentioning this somewhere yet)
> 
> Andreas Mohr
> 

Probably Windows doesn't call _GTM on a port with no devices connected, 
and so the BIOS people never tested that case. Likely we can just avoid 
doing this - if no devices are connected the timing settings for that 
channel are irrelevant..

And you're quite right in your comment that we are often too quick to 
blacklist hardware instead of looking into why it really is failing. 
ACPI is one of those areas where we often just need to figure out how to 
be bug-to-bug compatibile with what Windows is doing..

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/



^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  0:04             ` Andreas Mohr
@ 2007-12-10  0:49               ` Andreas Mohr
  2007-12-10  1:28                 ` Robert Hancock
  2007-12-10  2:20                 ` Tejun Heo
  0 siblings, 2 replies; 103+ messages in thread
From: Andreas Mohr @ 2007-12-10  0:49 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Robert Hancock, Matthew Garrett, Andrew Morton,
	Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide,
	Tejun Heo, Len Brown, linux-acpi

On Mon, Dec 10, 2007 at 01:04:31AM +0100, Andreas Mohr wrote:
> IOW, it seems very likely that _GTM on these BIOSes (VIA chipsets) isn't
> actually wrongly implemented but simply expects IDE controller values
> to have been set up ""differently"".
> 
> 
> Or... one could possibly even infer from this that - maybe -
> the _GTM invocation spot is wrong, it should be done somewhere
> different during bootup. Or whatever.

"Whatever" indeed:

There's an ASL Match() for a "PMPT" (Primary Master PorT) PCI register,
and the possible register values are:

                    Package (0x04)
                    {
                        0x20,
                        0x31,
                        0x65,
                        0xA8
                    },

and from

                OperationRegion (CFG2, PCI_Config, 0x40, 0x20)
                Field (CFG2, DWordAcc, NoLock, Preserve)
                {
                            Offset (0x08),·
                    SSPT,   8,·
                    SMPT,   8,·
                    PSPT,   8,·
                    PMPT,   8,·
                            Offset (0x10),·
...
we can infer that at PCI_Config offset 0x48 those values should be located.
However after bootup or resume there are:

# lspci -s 00:11.1 -xxx
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00: 06 11 71 05 07 00 90 02 06 8a 01 01 00 20 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 e4 00 00 00 00 00 00 00 00 00 00 06 11 71 05
30: 00 00 00 00 c0 00 00 00 00 00 00 00 ff 01 00 00
40: 0b 32 09 0a 18 1c c0 00 99 99 20 20 ff 00 a8 20
50: 07 07 f6 f1 14 03 00 00 a8 a8 a8 a8 00 00 00 00
60: 00 02 00 00 00 00 00 00 00 02 00 00 00 00 00 00
70: 02 01 00 00 00 00 00 00 82 01 00 00 00 00 00 00
80: 00 e0 a1 1f 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 06 00 71 05 06 11 71 05 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 07 00 00 00 00 00 00 00 00 00


As one can see, the relevant values for SSPT, SMPT, PSPT and PMPT are
99 99 20 20, which are not quite entirely valid judging from the array above,
and this is because the secondary port is unused, as can also be seen
from my bootup log:

scsi0 : pata_via
scsi1 : pata_via
ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xe400 irq 14
ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xe408 irq 15
ata1.00: ATA-5: WDC WD1200JB-00CRA1, 17.07W17, max UDMA/100
ata1.00: 234441648 sectors, multi 16: LBA
ata1.01: ATAPI: TOSHIBA DVD-ROM SD-M1612, 1004, max UDMA/33
Switched to high resolution mode on CPU 0
ata1.00: configured for UDMA/100
ata1.01: configured for UDMA/33
ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTM_] (Node df80b9a8), AE_AML_PACKAGE_LIM
IT
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN1._GTM] (Node df80b8d0), AE_AML_PACKAG
E_LIMIT
ata2: ACPI get timing mode failed (AE 0x300d)


Manually tweaking the values to 20 20 20 20 truly does skip the _GTM failure message on suspend -
only to reappear right on resume due to 99 99 20 20 combo happening again.
If I don't tweak, I get _GTM failure at both suspend and resume.


As such one can conclude that this BIOS is rather very confused when being called for _GTM on an entirely
unused controller port. And this is either because the BIOS is dumb or because ACPI doesn't really
expect anyone to call _GTM on an unused physical port. I'd bet on the latter...
(however I haven't found ACPI 3.0b explicitly mentioning this somewhere yet)

Andreas Mohr

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 21:36           ` Andreas Mohr
@ 2007-12-10  0:04             ` Andreas Mohr
  2007-12-10  0:49               ` Andreas Mohr
  0 siblings, 1 reply; 103+ messages in thread
From: Andreas Mohr @ 2007-12-10  0:04 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Robert Hancock, Matthew Garrett, Andrew Morton,
	Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide,
	Tejun Heo, Len Brown, linux-acpi

Hi,

On Sun, Dec 09, 2007 at 10:36:42PM +0100, Andreas Mohr wrote:
> And the second, possibly much more lucrative, question would be
> whether we're actually doing something wrong with our ACPI _GTM execution
> which triggers the AE_AML_PACKAGE_LIMIT problem.
> 
> This might help here, perhaps (relevant snippets of AML dump):

Indeed, after looking over this horrid ASL stuff for ages I'm now starting
to believe that our IDE controller state is wrong,
since the Match()ing etc. in this particular _GTM implementation
is heavily dependant on actual PCI values
(it references some PCI_Config OperationRegion:s),
and some indexing seems to go wrong due to this.

IOW, it seems very likely that _GTM on these BIOSes (VIA chipsets) isn't
actually wrongly implemented but simply expects IDE controller values
to have been set up ""differently"".


Or... one could possibly even infer from this that - maybe -
the _GTM invocation spot is wrong, it should be done somewhere
different during bootup. Or whatever.



This seems to tell me again that we're often quick to blacklist
or whitelist things left and right when instead fundamental problems
are hidden somewhere.

Still investigating,

Andreas Mohr

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 18:24         ` Robert Hancock
  2007-12-09  5:59           ` Tejun Heo
@ 2007-12-09 21:36           ` Andreas Mohr
  2007-12-10  0:04             ` Andreas Mohr
  1 sibling, 1 reply; 103+ messages in thread
From: Andreas Mohr @ 2007-12-09 21:36 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Matthew Garrett, Andrew Morton, Andreas Mohr, Rafael J. Wysocki,
	LKML, Linus Torvalds, Ingo Molnar, linux-ide, Tejun Heo,
	Len Brown, linux-acpi

Hi,

[ACPI _GTM suspend issue sorta fixed, read below]

On Sat, Dec 08, 2007 at 12:24:16PM -0600, Robert Hancock wrote:
> Matthew Garrett wrote:
>> On Sat, Dec 08, 2007 at 02:20:01AM -0800, Andrew Morton wrote:
>>> On Sat, 8 Dec 2007 11:12:57 +0100 Andreas Mohr <andi@lisas.de> wrote:
>>>> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
>>>> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
>>>> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT
>>>> ata1.01: _GTF evaluation failed (AE 0x300d)
>>
>> 037f6bb79f753c014bc84bca0de9bf98bb5ab169 ought to have fixed this?
>>
>
> I should think it should have.

Yup, the _GTF problem is certainly fixed, but this is a dead-end
since I showed the -rc1 vs. -rc2 behaviour, whereas I still have
failing suspend in -rc4 with this patch confirmed to be applied
(source does contain the changes) and confirmed to apparently be working
(no errors in dmesg any more).

IOW, what I'm concerned about is not a _GTF error on boot any more,
but a seemingly fatally handled _GTM error on suspend.


...OK, dug some more into this, and now I managed to get it to work,
and it was indeed _GTM which broke my whole suspend:

Since _GTM is failing on me (and the point is, it's failing
catastrophically, not "normally"!), ata_acpi_on_suspend()
calling ata_acpi_gtm() fails with -EINVAL instead of -ENOENT,
however ata_acpi_on_suspend() has the following correction only:

        if (rc == -ENOENT)
                rc = 0;

to make sure a suspend doesn't get aborted (fatal error) when
_GTM is simply empty.

Changing this into

        if ((rc == -ENOENT) || (rc == -EINVAL))
                rc = 0;

to additionally account for _invalid_ _GTM execution makes my suspend
(and resume!) work again on -rc4.


Now the question is whether this error code correction is ok, or whether a
catastrophically failing _GTM should have been truly registered on boot
already (where it does gtm to fetch cable timings) to subsequently avoid
doing any ATA ACPI things on suspend at all.


And the second, possibly much more lucrative, question would be
whether we're actually doing something wrong with our ACPI _GTM execution
which triggers the AE_AML_PACKAGE_LIMIT problem.

This might help here, perhaps (relevant snippets of AML dump):

                Device (CHN0)
                {
                    Name (_ADR, 0x00)
                    Method (_GTM, 0, NotSerialized)
                    {
                        Return (GTM (PMPT, PMUE, PMUT, PSPT, PSUE, PSUT))
                    }

                    Method (_STM, 3, NotSerialized)
                    {
                        Store (Arg0, TMD0)
                        Store (PMPT, GMPT)
                        Store (PMUE, GMUE)
                        Store (PMUT, GMUT)
                        Store (PSPT, GSPT)
                        Store (PSUE, GSUE)
                        Store (PSUT, GSUT)
                        STM ()
                        Store (GMPT, PMPT)
                        Store (GMUE, PMUE)
                        Store (GMUT, PMUT)
                        Store (GSPT, PSPT)
                        Store (GSUE, PSUE)
                        Store (GSUT, PSUT)
                    }

                Device (CHN1)
                {
                    Name (_ADR, 0x01)
                    Method (_GTM, 0, NotSerialized)
                    {
                        Return (GTM (SMPT, SMUE, SMUT, SSPT, SSUE, SSUT))
                    }

                    Method (_STM, 3, NotSerialized)
                    {
                        Store (Arg0, TMD0)
                        Store (SMPT, GMPT)
                        Store (SMUE, GMUE)
                        Store (SMUT, GMUT)
                        Store (SSPT, GSPT)
                        Store (SSUE, GSUE)
                        Store (SSUT, GSUT)
                        STM ()
                        Store (GMPT, SMPT)
                        Store (GMUE, SMUE)
                        Store (GMUT, SMUT)
                        Store (GSPT, SSPT)
                        Store (GSUE, SSUE)
                        Store (GSUT, SSUT)
                    }


                Method (GTM, 6, Serialized)
                {
                    Store (Ones, PIO0)
                    Store (Ones, PIO1)
                    Store (Ones, DMA0)
                    Store (Ones, DMA1)
                    Store (0x10, CHNF)
                    If (REGF) {}
                    Else
                    {
                        Return (TMD0)
                    }

                    Store (Match (DerefOf (Index (TIM0, 0x01)), MEQ, Arg0, MTR,
                        0x00, 0x00), Local6)
                    Store (DerefOf (Index (DerefOf (Index (TIM0, 0x00)), Local6)
),
                        Local7)
                    Store (Local7, DMA0)
                    Store (Local7, PIO0)
                    Store (Match (DerefOf (Index (TIM0, 0x01)), MEQ, Arg3, MTR,
                        0x00, 0x00), Local6)
                    Store (DerefOf (Index (DerefOf (Index (TIM0, 0x00)), Local6)
),
                        Local7)
                    Store (Local7, DMA1)
                    Store (Local7, PIO1)
                    If (Arg1)
                    {
                        If (A133 ())
                        {
                            Store (DerefOf (Index (DerefOf (Index (TIM0, 0x0D)),
 Arg2)),
                                Local5)
                        }
                        Else
                        {
                            Store (DerefOf (Index (DerefOf (Index (TIM0, 0x0A)),
 Arg2)),
                                Local5)
                        }

                        If (A133 ())
                        {
                            Store (DerefOf (Index (DerefOf (Index (TIM0, 0x0C)),
 Local5)),
                                DMA0)
                        }
                        Else
                        {
                            Store (DerefOf (Index (DerefOf (Index (TIM0, 0x04)),
 Local5)),
                                DMA0)
                        }

                        Or (CHNF, 0x01, CHNF)
                    }

                    If (Arg4)
                    {
                        If (A133 ())
                        {
                            Store (DerefOf (Index (DerefOf (Index (TIM0, 0x0D)), Arg5)),
                                Local5)
                        }
                        Else
                        {
                            Store (DerefOf (Index (DerefOf (Index (TIM0, 0x0A)), Arg5)),
                                Local5)
                        }

                        If (A133 ())
                        {
                            Store (DerefOf (Index (DerefOf (Index (TIM0, 0x0C)), Local5)),
                                DMA1)
                        }
                        Else
                        {
                            Store (DerefOf (Index (DerefOf (Index (TIM0, 0x04)), Local5)),
                                DMA1)
                        }

                        Or (CHNF, 0x04, CHNF)
                    }

                    Return (TMD0)
                }


Reminder: issue tracked at #9530.


Thanks,

Andreas Mohr

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 18:24         ` Robert Hancock
@ 2007-12-09  5:59           ` Tejun Heo
  2007-12-09 21:36           ` Andreas Mohr
  1 sibling, 0 replies; 103+ messages in thread
From: Tejun Heo @ 2007-12-09  5:59 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Matthew Garrett, Andrew Morton, Andreas Mohr, Rafael J. Wysocki,
	LKML, Linus Torvalds, Ingo Molnar, linux-ide, Len Brown,
	linux-acpi

Robert Hancock wrote:
> Matthew Garrett wrote:
>> On Sat, Dec 08, 2007 at 02:20:01AM -0800, Andrew Morton wrote:
>>> On Sat, 8 Dec 2007 11:12:57 +0100 Andreas Mohr <andi@lisas.de> wrote:
>>>> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index
>>>> (0FFFFFFFF) is beyond end of object [20070126]
>>>> ACPI Error (psparse-0537): Method parse/execution failed
>>>> [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
>>>> ACPI Error (psparse-0537): Method parse/execution failed
>>>> [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT
>>>> ata1.01: _GTF evaluation failed (AE 0x300d)
>>
>> 037f6bb79f753c014bc84bca0de9bf98bb5ab169 ought to have fixed this?
>>
> 
> I should think it should have.
> 
> I think we're too aggressive about disabling the libata ACPI support,
> even. One of my laptop's _GTF commands on resume is a DEVICE
> CONFIGURATION FREEZE LOCK command, which gets rejected by the drive
> (maybe it worked on the original Hitachi disk, but I've upgraded it to a
>  newer Samsung). I'd say if the drive returns command aborted on one of
> these, we should just ignore that command and continue to the next one
> without trying to retry or disabling the ACPI support entirely.

Yeap, my pending patchset does exactly that.  It's currently being
tested by but reporters.  I'll soon post the patchset.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
       [not found]       ` <fa.5o6E6S0UWnARbQPxLe30TvLQIiY@ifi.uio.no>
@ 2007-12-08 18:24         ` Robert Hancock
  2007-12-09  5:59           ` Tejun Heo
  2007-12-09 21:36           ` Andreas Mohr
  0 siblings, 2 replies; 103+ messages in thread
From: Robert Hancock @ 2007-12-08 18:24 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Andrew Morton, Andreas Mohr, Rafael J. Wysocki, LKML,
	Linus Torvalds, Ingo Molnar, linux-ide, Tejun Heo, Len Brown,
	linux-acpi

Matthew Garrett wrote:
> On Sat, Dec 08, 2007 at 02:20:01AM -0800, Andrew Morton wrote:
>> On Sat, 8 Dec 2007 11:12:57 +0100 Andreas Mohr <andi@lisas.de> wrote:
>>> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
>>> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
>>> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT
>>> ata1.01: _GTF evaluation failed (AE 0x300d)
> 
> 037f6bb79f753c014bc84bca0de9bf98bb5ab169 ought to have fixed this?
> 

I should think it should have.

I think we're too aggressive about disabling the libata ACPI support, 
even. One of my laptop's _GTF commands on resume is a DEVICE 
CONFIGURATION FREEZE LOCK command, which gets rejected by the drive 
(maybe it worked on the original Hitachi disk, but I've upgraded it to a 
  newer Samsung). I'd say if the drive returns command aborted on one of 
these, we should just ignore that command and continue to the next one 
without trying to retry or disabling the ACPI support entirely.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/


^ permalink raw reply	[flat|nested] 103+ messages in thread

end of thread, other threads:[~2007-12-20 17:41 UTC | newest]

Thread overview: 103+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-12-08  2:40 2.6.24-rc4-git5: Reported regressions from 2.6.23 Rafael J. Wysocki
2007-12-08  6:53 ` Fabio Comolli
2007-12-08  8:28   ` Ingo Molnar
2007-12-08  9:23     ` Andrew Morton
2007-12-08 22:11       ` Rafael J. Wysocki
2007-12-08  9:29 ` Andrew Morton
2007-12-08 22:17   ` Rafael J. Wysocki
2007-12-08  9:30 ` tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52, [2.6.24-rc4-git5: Reported regressions from 2.6.23] Ingo Molnar
2007-12-08 10:11   ` Andrew Morton
2007-12-08 16:37   ` Matt Mackall
2007-12-08 17:47     ` Linus Torvalds
2007-12-08 17:54       ` Linus Torvalds
2007-12-08 18:09         ` Andrew Morton
2007-12-08 18:37           ` Linus Torvalds
2007-12-08 19:52             ` Ingo Molnar
2007-12-08 20:29               ` Ingo Molnar
2007-12-09  8:20                 ` Pekka Enberg
2007-12-09  8:50                   ` Ingo Molnar
2007-12-09  9:18                     ` Pekka Enberg
2007-12-09 11:51                       ` Ingo Molnar
2007-12-09 12:34                         ` Ingo Molnar
2007-12-13 22:07                     ` Christoph Lameter
2007-12-09 15:59                   ` Arjan van de Ven
2007-12-11  6:27               ` Dave Jones
2007-12-11  8:52                 ` Ingo Molnar
2007-12-11 19:03                   ` Peter Zijlstra
2007-12-14  4:07               ` Christoph Lameter
2007-12-14  7:16                 ` Eric Dumazet
2007-12-14 12:49                 ` Ingo Molnar
2007-12-17 19:54                   ` Christoph Lameter
2007-12-09  7:58           ` Ingo Molnar
2007-12-09 14:17             ` Rafael J. Wysocki
2007-12-08 18:33         ` Matt Mackall
2007-12-08 19:00         ` Matt Mackall
2007-12-09  8:33         ` Pekka Enberg
2007-12-13 22:03       ` Christoph Lameter
2007-12-08  9:36 ` 2.6.24-rc4-git5: Reported regressions from 2.6.23 Andrew Morton
2007-12-08 10:12   ` Andreas Mohr
2007-12-08 10:20     ` Andrew Morton
2007-12-08 10:28       ` Matthew Garrett
2007-12-08 10:55       ` Andreas Mohr
2007-12-09 15:46         ` Tejun Heo
2007-12-09 19:59           ` Andreas Mohr
2007-12-09  6:52   ` Tejun Heo
2007-12-09 14:20     ` Rafael J. Wysocki
2007-12-09 15:11       ` Tejun Heo
2007-12-08  9:42 ` Andrew Morton
2007-12-08 18:57   ` Roland Dreier
2007-12-08 19:40   ` Theodore Tso
2007-12-08 19:55     ` Ingo Molnar
2007-12-08 22:30     ` Rafael J. Wysocki
2007-12-09  2:15       ` Theodore Tso
2007-12-13 10:49         ` Takashi Iwai
2007-12-20 15:42           ` Takashi Iwai
2007-12-08  9:46 ` Andrew Morton
2007-12-08 15:49   ` Alan Stern
2007-12-08  9:52 ` Andrew Morton
2007-12-09  7:00   ` Tejun Heo
2007-12-09 13:42     ` Alan Cox
2007-12-09 15:09       ` Tejun Heo
2007-12-09 15:25         ` Alan Cox
2007-12-09 15:39           ` Tejun Heo
2007-12-09 18:36       ` Linus Torvalds
2007-12-09 21:54         ` Alan Cox
2007-12-09 18:41       ` Linus Torvalds
2007-12-09 22:01         ` Alan Cox
2007-12-09 22:51           ` Ray Lee
2007-12-10  1:57           ` Linus Torvalds
2007-12-10  3:28             ` Alan Cox
2007-12-10  3:38             ` Alan Cox
2007-12-10 15:38               ` Linus Torvalds
2007-12-10  8:21             ` Ingo Molnar
2007-12-10  8:27               ` Tejun Heo
2007-12-10  8:41                 ` Ingo Molnar
2007-12-08 10:44 ` Richard Purdie
2007-12-08 22:32   ` Rafael J. Wysocki
2007-12-09 11:54 ` Andrew Morton
2007-12-09 12:05   ` Ingo Molnar
2007-12-09 14:24   ` Rafael J. Wysocki
2007-12-10 20:42 ` Ingo Molnar
2007-12-10 20:57   ` Guillaume Chazarain
2007-12-10 20:59   ` Andrew Morton
2007-12-10 22:45     ` Ingo Molnar
2007-12-10 23:04       ` Ingo Molnar
2007-12-10 23:34         ` Stefano Brivio
2007-12-10 23:53           ` Guillaume Chazarain
2007-12-11  8:48             ` Ingo Molnar
2007-12-10 23:56           ` Arjan van de Ven
2007-12-11  0:01             ` Guillaume Chazarain
2007-12-11  1:06               ` Arjan van de Ven
2007-12-11  8:43                 ` Ingo Molnar
2007-12-11  9:01           ` Ingo Molnar
2007-12-11 21:10             ` Stefano Brivio
2007-12-19  0:58             ` Stefano Brivio
     [not found] <fa.qQhD8aJpiTOFaZqjRYwoaG7YT1c@ifi.uio.no>
     [not found] ` <fa.yaP86AixGhz5Q7eXSu04pIQp6ho@ifi.uio.no>
     [not found]   ` <fa.c3k8VKWAx4HIo9zXWbL5Ek0oSBw@ifi.uio.no>
     [not found]     ` <fa.C8ACnOhs8bXB++vmugf5F34JcJg@ifi.uio.no>
     [not found]       ` <fa.5o6E6S0UWnARbQPxLe30TvLQIiY@ifi.uio.no>
2007-12-08 18:24         ` Robert Hancock
2007-12-09  5:59           ` Tejun Heo
2007-12-09 21:36           ` Andreas Mohr
2007-12-10  0:04             ` Andreas Mohr
2007-12-10  0:49               ` Andreas Mohr
2007-12-10  1:28                 ` Robert Hancock
2007-12-10  2:25                   ` Tejun Heo
2007-12-10  3:20                     ` Robert Hancock
2007-12-10  2:20                 ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).