linux-next.vger.kernel.org archive mirror
* linux-next: Tree for June 5
@ 2008-06-05  7:52 Stephen Rothwell
  2008-06-06  2:56 ` Andrew Morton
  0 siblings, 1 reply; 54+ messages in thread
From: Stephen Rothwell @ 2008-06-05  7:52 UTC
  To: linux-next; +Cc: LKML

[-- Attachment #1: Type: text/plain, Size: 7491 bytes --]

Hi all,

Changes since next-20080604:

The hid tree fixed the conflicts with Linus' tree.

The v4l-dvb tree no longer needed the fixup patch.

The galak tree gained a conflict with the net tree.

The wireless tree now has the same conflict (in the ps3_gelic driver) as
the semaphore-removal tree.

The rr tree gained a conflict with the net-current tree.

The ldp tree suffered from the v4l-dvb struct members renaming.

I have applied the following temporary patch for known build problems:

	"Fix various 8390 builds" - the net tree broke builds on various
architectures - hopefully this patch will go into the net tree shortly.
	"firmware: build fixes 2" - the firmware tree broke some arm builds.

----------------------------------------------------------------------------

I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git
(patches at
http://www.kernel.org/pub/linux/kernel/people/sfr/linux-next/).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the
final fixups, it is also built with powerpc allnoconfig,
44x_defconfig and allyesconfig and i386, sparc and sparc64 defconfig.

Below is a summary of the state of the merge.

We are up to 87 trees (counting Linus' and 13 trees of patches pending for
Linus' tree), more are welcome (even if they are currently empty).
Thanks to those who have contributed, and to those who haven't, please do.

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to adding
more builds.

Thanks to Jan Dittmer for adding the linux-next tree to his build tests
at http://l4x.org/k/ , the guys at http://test.kernel.org/ and Randy
Dunlap for doing many randconfig builds.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master
Merging powerpc-merge/merge
Merging scsi-rc-fixes/master
Merging net-current/master
Merging sparc-current/master
Merging sound-current/for-linus
Merging arm-current/master
Merging pci-current/for-linus
Merging wireless-current/master
Merging kbuild-current/master
Merging quilt/driver-core.current
Merging quilt/usb.current
Merging cpufreq-current/fixes
Merging input-current/for-linus
Merging quilt/driver-core
CONFLICT (content): Merge conflict in drivers/s390/kvm/kvm_virtio.c
CONFLICT (content): Merge conflict in drivers/virtio/virtio.c
CONFLICT (content): Merge conflict in drivers/virtio/virtio_pci.c
Merging quilt/usb
Merging x86/auto-x86-next
Merging sched/auto-sched-next
Merging ftrace/auto-ftrace-next
Applying ftrace: fix rculist split fallout
Merging pci/linux-next
CONFLICT (content): Merge conflict in drivers/base/power/main.c
CONFLICT (content): Merge conflict in include/linux/device.h
Merging quilt/device-mapper
Merging hid/mm
Merging quilt/i2c
CONFLICT (content): Merge conflict in drivers/i2c/i2c-core.c
Merging quilt/kernel-doc
Merging avr32/avr32-arch
Merging v4l-dvb/stable
Merging s390/features
CONFLICT (content): Merge conflict in drivers/s390/block/dasd.c
CONFLICT (content): Merge conflict in drivers/s390/block/dasd_eckd.c
CONFLICT (content): Merge conflict in drivers/s390/block/dasd_fba.c
CONFLICT (content): Merge conflict in drivers/s390/char/tape_core.c
CONFLICT (content): Merge conflict in drivers/s390/cio/device_fsm.c
CONFLICT (content): Merge conflict in drivers/s390/net/claw.c
CONFLICT (content): Merge conflict in drivers/s390/net/ctcm_main.c
CONFLICT (content): Merge conflict in drivers/s390/net/lcs.c
Merging sh/master
Merging jfs/next
Merging kbuild/master
Merging quilt/ide
Merging libata/NEXT
Merging nfs/linux-next
Merging xfs/master
Merging infiniband/for-next
Merging acpi/test
Merging blackfin/for-linus
Merging nfsd/nfsd-next
Merging ieee1394/for-next
Merging hwmon/testing
Merging ubi/master
Merging kvm/master
Merging dlm/next
Merging scsi/master
Applying scsi: fix fallout from KOBJ_NAME_LEN removal
Merging ia64/test
Merging tests/master
CONFLICT (content): Merge conflict in lib/Kconfig.debug
Merging ocfs2/linux-next
Merging selinux/for-akpm
Merging quilt/m68k
Merging powerpc/powerpc-next
Merging hrt/mm
Merging lblnet/master
Merging ext4/next
Merging 4xx/next
Merging async_tx/next
Merging udf/for_next
Merging security-testing/next
Merging net/master
Merging sparc/master
Merging galak/powerpc-next
CONFLICT (content): Merge conflict in Documentation/powerpc/booting-without-of.txt
Merging mtd/master
Merging wireless/master
CONFLICT (content): Merge conflict in drivers/net/ps3_gelic_wireless.c
CONFLICT (content): Merge conflict in drivers/net/wireless/libertas/main.c
CONFLICT (content): Merge conflict in drivers/net/wireless/rt2x00/rt2x00dev.c
Merging crypto/master
Merging vfs/vfs-2.6.25
Merging sound/master
Merging arm/devel
CONFLICT (content): Merge conflict in arch/arm/mach-pxa/tosa.c
Merging cpufreq/next
Merging v9fs/for-next
Merging quilt/rr
CONFLICT (content): Merge conflict in drivers/net/virtio_net.c
Merging cifs/master
Merging mmc/next
Merging gfs2/master
Merging rcu/core/rcu
Merging locking/core/locking
Merging safe-poison-pointers/safe-poison-pointers
Merging stackprotector/stackprotector
Merging input/next
Merging semaphore/semaphore
Merging semaphore-removal/semaphore-removal
CONFLICT (content): Merge conflict in drivers/net/ps3_gelic_wireless.c
CONFLICT (content): Merge conflict in drivers/scsi/qla2xxx/qla_attr.c
CONFLICT (content): Merge conflict in drivers/scsi/qla2xxx/qla_def.h
CONFLICT (content): Merge conflict in drivers/scsi/qla2xxx/qla_mbx.c
CONFLICT (content): Merge conflict in drivers/scsi/qla2xxx/qla_mid.c
CONFLICT (content): Merge conflict in drivers/scsi/qla2xxx/qla_os.c
Merging quilt/ldp.next
Applying ldp: fix fallout from v4l struct element renaming
Merging bkl-removal/bkl-removal
Merging trivial/next
Merging ubifs/for_andrew
Merging lsm/for-next
Merging block/for-next
Merging embedded/master
Merging firmware/master
CONFLICT (content): Merge conflict in drivers/usb/serial/Kconfig
CONFLICT (delete/modify): drivers/usb/serial/ti_fw_3410.h deleted in firmware/master and modified in HEAD. Version HEAD of drivers/usb/serial/ti_fw_3410.h left in tree.
CONFLICT (delete/modify): drivers/usb/serial/ti_fw_5052.h deleted in firmware/master and modified in HEAD. Version HEAD of drivers/usb/serial/ti_fw_5052.h left in tree.
CONFLICT (content): Merge conflict in drivers/usb/serial/ti_usb_3410_5052.c
CONFLICT (content): Merge conflict in sound/pci/Kconfig
CONFLICT (content): Merge conflict in sound/pci/maestro3.c
CONFLICT (content): Merge conflict in sound/pci/ymfpci/ymfpci_main.c
Applying Fix various 8390 builds
Applying firmware: build fixes 2

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: linux-next: Tree for June 5
  2008-06-05  7:52 linux-next: Tree for June 5 Stephen Rothwell
@ 2008-06-06  2:56 ` Andrew Morton
  2008-06-06  3:46   ` Andrew Morton
  2008-06-06  7:17   ` Ingo Molnar
  0 siblings, 2 replies; 54+ messages in thread
From: Andrew Morton @ 2008-06-06  2:56 UTC
  To: Stephen Rothwell; +Cc: linux-next, LKML, Ingo Molnar

On Thu, 5 Jun 2008 17:52:17 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:

> I have created today's linux-next tree at
> git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git

Instantly oopses on two x86_64 boxes with this config:
http://userweb.kernel.org/~akpm/config-akpm2.txt

oops: http://userweb.kernel.org/~akpm/p6056454.jpg

At a guess I'd say the sched_domains code is calling into slab before
slab is initialised.  Something like that.


I had to do this:

--- a/arch/x86/kernel/traps_64.c~a
+++ a/arch/x86/kernel/traps_64.c
@@ -504,6 +504,7 @@ void show_registers(struct pt_regs *regs
 		}
 	}
 	printk("\n");
+	for (  ;; );
 }	
 
 int is_valid_bugaddr(unsigned long ip)
_

to collect that oops.  Otherwise it scrolled away due to "trying to
kill init" doing a dump_stack.  pause_on_oops seems to not be working
properly any more.  It used to.


* Re: linux-next: Tree for June 5
  2008-06-06  2:56 ` Andrew Morton
@ 2008-06-06  3:46   ` Andrew Morton
  2008-06-06  7:17   ` Ingo Molnar
  1 sibling, 0 replies; 54+ messages in thread
From: Andrew Morton @ 2008-06-06  3:46 UTC
  To: Stephen Rothwell, linux-next, LKML, Ingo Molnar

On Thu, 5 Jun 2008 19:56:04 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> On Thu, 5 Jun 2008 17:52:17 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> 
> > I have created today's linux-next tree at
> > git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git
> 
> Instantly oopses on two x86_64 boxes with this config:
> http://userweb.kernel.org/~akpm/config-akpm2.txt
> 
> oops: http://userweb.kernel.org/~akpm/p6056454.jpg
> 
> At a guess I'd say the sched_domains code is calling into slab before
> slab is initialised.  Something like that.

With CONFIG_SLUB=y it dies differently:

http://userweb.kernel.org/~akpm/p6056455.jpg

* Re: linux-next: Tree for June 5
  2008-06-06  2:56 ` Andrew Morton
  2008-06-06  3:46   ` Andrew Morton
@ 2008-06-06  7:17   ` Ingo Molnar
  2008-06-06  7:25     ` Ingo Molnar
                       ` (2 more replies)
  1 sibling, 3 replies; 54+ messages in thread
From: Ingo Molnar @ 2008-06-06  7:17 UTC
  To: Andrew Morton; +Cc: Stephen Rothwell, linux-next, LKML


* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Thu, 5 Jun 2008 17:52:17 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> 
> > I have created today's linux-next tree at
> > git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git
> 
> Instantly oopses on two x86_64 boxes with this config:
> http://userweb.kernel.org/~akpm/config-akpm2.txt
> 
> oops: http://userweb.kernel.org/~akpm/p6056454.jpg
>
> At a guess I'd say the sched_domains code is calling into slab before 
> slab is initialised.  Something like that.

did SLUB change in linux-next? There is no such problem in -tip.

> I had to do this:
> 
> --- a/arch/x86/kernel/traps_64.c~a
> +++ a/arch/x86/kernel/traps_64.c
> @@ -504,6 +504,7 @@ void show_registers(struct pt_regs *regs
>  		}
>  	}
>  	printk("\n");
> +	for (  ;; );
>  }	
>  
>  int is_valid_bugaddr(unsigned long ip)
> _
> 
> to collect that oops.  Otherwise it scrolled away due to "trying to 
> kill init" doing a dump_stack.  pause_on_oops seems to not be working 
> properly any more.  It used to.

hm, perhaps mdelay(1) does not loop for 1 msec anymore? You'll probably 
be able to work it around via pause_on_oops=5000000 or so.

	Ingo

* Re: linux-next: Tree for June 5
  2008-06-06  7:17   ` Ingo Molnar
@ 2008-06-06  7:25     ` Ingo Molnar
  2008-06-06  7:33       ` Andrew Morton
  2008-06-06  7:29     ` Andrew Morton
  2008-06-06  7:33     ` Stephen Rothwell
  2 siblings, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2008-06-06  7:25 UTC
  To: Andrew Morton; +Cc: Stephen Rothwell, linux-next, LKML


* Ingo Molnar <mingo@elte.hu> wrote:

> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Thu, 5 Jun 2008 17:52:17 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> > 
> > > I have created today's linux-next tree at
> > > git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git
> > 
> > Instantly oopses on two x86_64 boxes with this config:
> > http://userweb.kernel.org/~akpm/config-akpm2.txt
> > 
> > oops: http://userweb.kernel.org/~akpm/p6056454.jpg
> >
> > At a guess I'd say the sched_domains code is calling into slab before 
> > slab is initialised.  Something like that.
> 
> did SLUB change in linux-next? There is no such problem in -tip.

i just successfully booted your config on 4 separate 64-bit test-systems 
with latest -tip. (two dual-core boxes, a quad and a 16way box) Latest 
-tip includes sched-next and x86-next as well.

	Ingo

* Re: linux-next: Tree for June 5
  2008-06-06  7:17   ` Ingo Molnar
  2008-06-06  7:25     ` Ingo Molnar
@ 2008-06-06  7:29     ` Andrew Morton
  2008-06-06  9:48       ` Andrew Morton
  2008-06-06  7:33     ` Stephen Rothwell
  2 siblings, 1 reply; 54+ messages in thread
From: Andrew Morton @ 2008-06-06  7:29 UTC
  To: Ingo Molnar; +Cc: Stephen Rothwell, linux-next, LKML

On Fri, 6 Jun 2008 09:17:07 +0200 Ingo Molnar <mingo@elte.hu> wrote:

> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Thu, 5 Jun 2008 17:52:17 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> > 
> > > I have created today's linux-next tree at
> > > git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git
> > 
> > Instantly oopses on two x86_64 boxes with this config:
> > http://userweb.kernel.org/~akpm/config-akpm2.txt
> > 
> > oops: http://userweb.kernel.org/~akpm/p6056454.jpg
> >
> > At a guess I'd say the sched_domains code is calling into slab before 
> > slab is initialised.  Something like that.
> 
> did SLUB change in linux-next? There is no such problem in -tip.

It crashes on two quite different machines with both slab and slub.

* Re: linux-next: Tree for June 5
  2008-06-06  7:17   ` Ingo Molnar
  2008-06-06  7:25     ` Ingo Molnar
  2008-06-06  7:29     ` Andrew Morton
@ 2008-06-06  7:33     ` Stephen Rothwell
  2 siblings, 0 replies; 54+ messages in thread
From: Stephen Rothwell @ 2008-06-06  7:33 UTC
  To: Ingo Molnar; +Cc: Andrew Morton, linux-next, LKML

[-- Attachment #1: Type: text/plain, Size: 990 bytes --]

Hi Ingo,

On Fri, 6 Jun 2008 09:17:07 +0200 Ingo Molnar <mingo@elte.hu> wrote:
>
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > Instantly oopses on two x86_64 boxes with this config:
> > http://userweb.kernel.org/~akpm/config-akpm2.txt
> > 
> > oops: http://userweb.kernel.org/~akpm/p6056454.jpg
> >
> > At a guess I'd say the sched_domains code is calling into slab before 
> > slab is initialised.  Something like that.
> 
> did SLUB change in linux-next? There is no such problem in -tip.

$ git rev-list stable..next-20080605 -- mm/slub.c include/linux/slub_def.h
139e2551f25697a242de2b9c61d4514c2f762ca8
0bb08241ce68aaa70b7b804b4d6319d8bad3ae24

The first is the merge of the trivial tree and the second is in that tree
and only changes some comments in slub.c

$ git rev-list stable..next-20080605 -- mm/slab.c include/linux/slab*
(nothing)

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

* Re: linux-next: Tree for June 5
  2008-06-06  7:25     ` Ingo Molnar
@ 2008-06-06  7:33       ` Andrew Morton
  2008-06-06  7:41         ` Ingo Molnar
  0 siblings, 1 reply; 54+ messages in thread
From: Andrew Morton @ 2008-06-06  7:33 UTC
  To: Ingo Molnar; +Cc: Stephen Rothwell, linux-next, LKML

On Fri, 6 Jun 2008 09:25:36 +0200 Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > On Thu, 5 Jun 2008 17:52:17 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> > > 
> > > > I have created today's linux-next tree at
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git
> > > 
> > > Instantly oopses on two x86_64 boxes with this config:
> > > http://userweb.kernel.org/~akpm/config-akpm2.txt
> > > 
> > > oops: http://userweb.kernel.org/~akpm/p6056454.jpg
> > >
> > > At a guess I'd say the sched_domains code is calling into slab before 
> > > slab is initialised.  Something like that.
> > 
> > did SLUB change in linux-next? There is no such problem in -tip.
> 
> i just successfully booted your config on 4 separate 64-bit test-systems 
> with latest -tip. (two dual-core boxes, a quad and a 16way box) Latest 
> -tip includes sched-next and x86-next as well.

What's the point in testing a radically different kernel from the one
which is known to be crashing?

* Re: linux-next: Tree for June 5
  2008-06-06  7:33       ` Andrew Morton
@ 2008-06-06  7:41         ` Ingo Molnar
  2008-06-06  7:47           ` Andrew Morton
  0 siblings, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2008-06-06  7:41 UTC
  To: Andrew Morton
  Cc: Stephen Rothwell, linux-next, LKML, the arch/x86 maintainers


* Andrew Morton <akpm@linux-foundation.org> wrote:

> > > did SLUB change in linux-next? There is no such problem in -tip.
> > 
> > i just successfully booted your config on 4 separate 64-bit 
> > test-systems with latest -tip. (two dual-core boxes, a quad and a 
> > 16way box) Latest -tip includes sched-next and x86-next as well.
> 
> What's the point in testing a radically different kernel from the one
> which is known to be crashing?

well, you Cc:-ed me, so i wanted to exclude -tip's 750+ commits in this 
area (scheduling, 64-bit x86) in the first step.

	Ingo

* Re: linux-next: Tree for June 5
  2008-06-06  7:41         ` Ingo Molnar
@ 2008-06-06  7:47           ` Andrew Morton
  2008-06-06  7:53             ` Stephen Rothwell
  2008-06-06  8:23             ` Ingo Molnar
  0 siblings, 2 replies; 54+ messages in thread
From: Andrew Morton @ 2008-06-06  7:47 UTC
  To: Ingo Molnar; +Cc: Stephen Rothwell, linux-next, LKML, the arch/x86 maintainers

On Fri, 6 Jun 2008 09:41:37 +0200 Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > > > did SLUB change in linux-next? There is no such problem in -tip.
> > > 
> > > i just successfully booted your config on 4 separate 64-bit 
> > > test-systems with latest -tip. (two dual-core boxes, a quad and a 
> > > 16way box) Latest -tip includes sched-next and x86-next as well.
> > 
> > What's the point in testing a radically different kernel from the one
> > which is known to be crashing?
> 
> well, you Cc:-ed me, so i wanted to exclude -tip's 750+ commits in this 
> area (scheduling, 64-bit x86) in the first step.
> 

What's the relationship between -tip and linux-next?

The crash seems to be due to sched_domains startup ordering, at a guess.

My third bisect iteration has hit this:

arch/x86/mm/kmmio.c: In function 'get_kmmio_probe':
arch/x86/mm/kmmio.c:85: error: implicit declaration of function 'list_for_each_entry_rcu'
arch/x86/mm/kmmio.c:85: error: 'list' undeclared (first use in this function)
arch/x86/mm/kmmio.c:85: error: (Each undeclared identifier is reported only once
arch/x86/mm/kmmio.c:85: error: for each function it appears in.)
arch/x86/mm/kmmio.c:85: error: syntax error before '{' token
arch/x86/mm/kmmio.c:88: warning: no return statement in function returning non-void
arch/x86/mm/kmmio.c: In function 'get_kmmio_fault_page':
arch/x86/mm/kmmio.c:100: error: 'list' undeclared (first use in this function)
arch/x86/mm/kmmio.c:100: error: syntax error before '{' token
arch/x86/mm/kmmio.c:103: warning: no return statement in function returning non-void
arch/x86/mm/kmmio.c: In function 'add_kmmio_fault_page':
arch/x86/mm/kmmio.c:328: error: implicit declaration of function 'list_add_rcu'
arch/x86/mm/kmmio.c: In function 'remove_kmmio_fault_pages':
arch/x86/mm/kmmio.c:420: error: implicit declaration of function 'list_del_rcu'


* Re: linux-next: Tree for June 5
  2008-06-06  7:47           ` Andrew Morton
@ 2008-06-06  7:53             ` Stephen Rothwell
  2008-06-06  8:01               ` Andrew Morton
  2008-06-06  8:27               ` Ingo Molnar
  2008-06-06  8:23             ` Ingo Molnar
  1 sibling, 2 replies; 54+ messages in thread
From: Stephen Rothwell @ 2008-06-06  7:53 UTC
  To: Andrew Morton; +Cc: Ingo Molnar, linux-next, LKML, the arch/x86 maintainers

Hi Andrew,

On Fri, 6 Jun 2008 00:47:43 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
>
> My third bisect iteration has hit this:
> 
> arch/x86/mm/kmmio.c: In function 'get_kmmio_probe':
> arch/x86/mm/kmmio.c:85: error: implicit declaration of function 'list_for_each_entry_rcu'

You need the following patch from linux-next.  Which should be the commit
immediately after the merge of the ftrace tree.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

>From ee19aa543ada9ce11a0b3b8480f3a268ff86cb02 Mon Sep 17 00:00:00 2001
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Tue, 27 May 2008 12:53:04 +1000
Subject: [PATCH] ftrace: fix rculist split fallout

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
 arch/x86/mm/kmmio.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/kmmio.c b/arch/x86/mm/kmmio.c
index b65871e..7bfdad7 100644
--- a/arch/x86/mm/kmmio.c
+++ b/arch/x86/mm/kmmio.c
@@ -23,6 +23,7 @@
 #include <linux/errno.h>
 #include <asm/debugreg.h>
 #include <linux/mmiotrace.h>
+#include <linux/rculist.h>
 
 #define KMMIO_PAGE_HASH_BITS 4
 #define KMMIO_PAGE_TABLE_SIZE (1 << KMMIO_PAGE_HASH_BITS)
-- 
1.5.5.1


* Re: linux-next: Tree for June 5
  2008-06-06  7:53             ` Stephen Rothwell
@ 2008-06-06  8:01               ` Andrew Morton
  2008-06-06  8:22                 ` Stephen Rothwell
  2008-06-06  8:27               ` Ingo Molnar
  1 sibling, 1 reply; 54+ messages in thread
From: Andrew Morton @ 2008-06-06  8:01 UTC
  To: Stephen Rothwell; +Cc: Ingo Molnar, linux-next, LKML, the arch/x86 maintainers

On Fri, 6 Jun 2008 17:53:58 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:

> On Fri, 6 Jun 2008 00:47:43 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > My third bisect iteration has hit this:
> > 
> > arch/x86/mm/kmmio.c: In function 'get_kmmio_probe':
> > arch/x86/mm/kmmio.c:85: error: implicit declaration of function 'list_for_each_entry_rcu'
> 
> You need the following patch from linux-next.  Which should be the commit
> immediately after the merge of the ftrace tree.

Well yes - I just bodged it by hand then unbodged it later.  But we
have a bisection break there.  Admittedly a minor one, unless the bug
you're bisecting for requires that kprobes be configured.  But it would
be nice to squish it.

I hope Ingo isn't following this
once-you've-checked-it-in-you-can't-fix-it stupidity :(


* Re: linux-next: Tree for June 5
  2008-06-06  8:01               ` Andrew Morton
@ 2008-06-06  8:22                 ` Stephen Rothwell
  2008-06-06  8:30                   ` Andrew Morton
  0 siblings, 1 reply; 54+ messages in thread
From: Stephen Rothwell @ 2008-06-06  8:22 UTC
  To: Andrew Morton; +Cc: Ingo Molnar, linux-next, LKML, the arch/x86 maintainers

[-- Attachment #1: Type: text/plain, Size: 991 bytes --]

Hi Andrew,

On Fri, 6 Jun 2008 01:01:49 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Well yes - I just bodged it by hand then unbodged it later.  But we
> have a bisection break there.  Admittedly a minor one, unless the bug
> you're bisecting for requires that kprobes be configured.  But it would
> be nice to squish it.
> 
> I hope Ingo isn't following this
> once-you've-checked-it-in-you-can't-fix-it stupidity :(

It's a break caused by the merge of the ftrace tree into the linux-next
tree (because at the point I merge the ftrace tree, linux-next contains
the rcu tree which moves stuff into rculist.h), so logically that
patch should become part of the merge commit.  If it was part of the
merge, you could never bisect to a point where you got this build
breakage.

Each tree is fine on its own if you go one step back from the merge.
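To make the bisection argument concrete, here is a toy model in C. It is a hypothetical sketch, not code from git or linux-next: `struct commit`, `count_unbuildable()` and the two example histories are invented for illustration. Each commit a bisection could check out is marked buildable or not, and the rculist.h fixup is either a separate commit after the ftrace merge (as in this release) or folded into the merge commit itself.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the linearised history that "git bisect" walks:
 * one entry per commit it could check out, recording whether that
 * commit builds with a config that compiles arch/x86/mm/kmmio.c.
 */
struct commit {
    const char *subject;
    bool builds;
};

/* Count the commits a bisection could land on but not build. */
static int count_unbuildable(const struct commit *history, int n)
{
    assert(n >= 0);
    int broken = 0;
    for (int i = 0; i < n; i++)
        if (!history[i].builds)
            broken++;
    return broken;
}

/* Fixup applied as its own commit after the merge. */
static const struct commit separate_fixup[] = {
    { "tree already containing the rculist.h split", true },
    { "Merge ftrace tree", false },  /* kmmio.c lacks <linux/rculist.h> */
    { "ftrace: fix rculist split fallout", true },
};

/* The same fixup folded into the merge commit instead. */
static const struct commit folded_fixup[] = {
    { "tree already containing the rculist.h split", true },
    { "Merge ftrace tree (with the rculist.h include)", true },
};
```

In this model `count_unbuildable(separate_fixup, 3)` is 1 (the merge commit itself), while `count_unbuildable(folded_fixup, 2)` is 0: once the fix is part of the merge, there is no commit in which the two trees are combined without it.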

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

* Re: linux-next: Tree for June 5
  2008-06-06  7:47           ` Andrew Morton
  2008-06-06  7:53             ` Stephen Rothwell
@ 2008-06-06  8:23             ` Ingo Molnar
  2008-06-06  8:28               ` Stephen Rothwell
  2008-06-06  8:38               ` Andrew Morton
  1 sibling, 2 replies; 54+ messages in thread
From: Ingo Molnar @ 2008-06-06  8:23 UTC
  To: Andrew Morton
  Cc: Stephen Rothwell, linux-next, LKML, the arch/x86 maintainers


* Andrew Morton <akpm@linux-foundation.org> wrote:

> > > > i just successfully booted your config on 4 separate 64-bit 
> > > > test-systems with latest -tip. (two dual-core boxes, a quad and a 
> > > > 16way box) Latest -tip includes sched-next and x86-next as well.
> > > 
> > > What's the point in testing a radically different kernel from the one
> > > which is known to be crashing?
> > 
> > well, you Cc:-ed me, so i wanted to exclude -tip's 750+ commits in this 
> > area (scheduling, 64-bit x86) in the first step.
> 
> What's the relationship between -tip and linux-next?

most of the -tip topics (there are 75 of them currently) are present in
linux-next - about 70% of all -tip commits are in linux-next already.
The stuff that is not in linux-next yet is either miscellaneous fixes
(i.e. intentionally grabbed out-of-tree to make our tests work better),
not cooked enough yet, or still being worked out - tip is still less
than a month old.

in general the rule is that if there's anything we want to push 
upstream, it will show up in linux-next.

> The crash seems to be due to sched_domains startup ordering, at a guess.
> 
> My third bisect iteration has hit this:
> 
> arch/x86/mm/kmmio.c: In function 'get_kmmio_probe':
> arch/x86/mm/kmmio.c:85: error: implicit declaration of function 'list_for_each_entry_rcu'
> arch/x86/mm/kmmio.c:85: error: 'list' undeclared (first use in this function)

hm, which commit is this exactly? I've never hit it myself in bisection 
(and there are days when i bisect -tip several times). We'll respin 
tip/tracing/mmiotrace if it's bisection-hostile. You can probably nudge 
it into building via "git-bisect next".
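Stepping around an unbuildable commit can be sketched as a bisection that treats some commits as untestable. The C below is a simplified, hypothetical model (in the spirit of git's `bisect skip` subcommand, but not git's actual algorithm); `enum outcome` and `find_first_bad()` are invented names. It assumes the first commit is known good, the last is known bad, and the regression is monotonic across testable commits.

```c
#include <assert.h>

/* Outcome of building and booting one commit (hypothetical model). */
enum outcome { GOOD, BAD, SKIP };

/*
 * Return the index of the first BAD commit, assuming history[0] is GOOD,
 * history[n - 1] is BAD, and every testable commit before the culprit is
 * GOOD while every testable commit from the culprit on is BAD.
 * SKIP marks commits that cannot be tested (e.g. they do not build).
 * Returns -1 if skipped commits make the answer ambiguous.
 */
static int find_first_bad(const enum outcome *history, int n)
{
    assert(n >= 2 && history[0] == GOOD && history[n - 1] == BAD);
    int good = 0, bad = n - 1;

    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        int probe = -1;

        /* Look outward from the midpoint for a commit we can test. */
        for (int d = 0; probe < 0 && (mid - d > good || mid + d < bad); d++) {
            if (mid + d < bad && history[mid + d] != SKIP)
                probe = mid + d;
            else if (mid - d > good && history[mid - d] != SKIP)
                probe = mid - d;
        }
        if (probe < 0)
            return -1;  /* only untestable commits left between good and bad */

        if (history[probe] == GOOD)
            good = probe;
        else
            bad = probe;
    }
    return bad;
}
```

Returning -1 instead of guessing reflects the real limitation Andrew points at: if only unbuildable commits separate the last good and the first bad commit, any of them could be the culprit, which is why removing such commits from the history matters.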

	Ingo

* Re: linux-next: Tree for June 5
  2008-06-06  7:53             ` Stephen Rothwell
  2008-06-06  8:01               ` Andrew Morton
@ 2008-06-06  8:27               ` Ingo Molnar
  1 sibling, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2008-06-06  8:27 UTC
  To: Stephen Rothwell
  Cc: Andrew Morton, linux-next, LKML, the arch/x86 maintainers


* Stephen Rothwell <sfr@canb.auug.org.au> wrote:

> > arch/x86/mm/kmmio.c: In function 'get_kmmio_probe':
> > arch/x86/mm/kmmio.c:85: error: implicit declaration of function 'list_for_each_entry_rcu'
> 
> You need the following patch from linux-next.  Which should be the commit
> immediately after the merge of the ftrace tree.
> 
> -- 
> Cheers,
> Stephen Rothwell                    sfr@canb.auug.org.au
> http://www.canb.auug.org.au/~sfr/
> 
> >From ee19aa543ada9ce11a0b3b8480f3a268ff86cb02 Mon Sep 17 00:00:00 2001
> From: Stephen Rothwell <sfr@canb.auug.org.au>
> Date: Tue, 27 May 2008 12:53:04 +1000
> Subject: [PATCH] ftrace: fix rculist split fallout

ah, that one - that's in the tip/tracing/mmiotrace-mergefixups branch. 
ideally this should be embedded in the merge commit of 
tip/core/rcu+tip/auto-ftrace-next [so that no bisection can ever hit the 
combined trees without also getting the merge fixup], but haven't found a
good Git way of doing that yet.

	Ingo

----------->
commit 668a6c3654560aef8741642478973e205a4f02bf
Author: Ingo Molnar <mingo@elte.hu>
Date:   Mon May 19 13:35:24 2008 +0200

    - fix mmioftrace + rcu merge interaction
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

diff --git a/arch/x86/mm/kmmio.c b/arch/x86/mm/kmmio.c
index b65871e..93d8203 100644
--- a/arch/x86/mm/kmmio.c
+++ b/arch/x86/mm/kmmio.c
@@ -6,6 +6,7 @@
  */
 
 #include <linux/list.h>
+#include <linux/rculist.h>
 #include <linux/spinlock.h>
 #include <linux/hash.h>
 #include <linux/init.h>

* Re: linux-next: Tree for June 5
  2008-06-06  8:23             ` Ingo Molnar
@ 2008-06-06  8:28               ` Stephen Rothwell
  2008-06-06  8:33                 ` Ingo Molnar
  2008-06-06  8:38               ` Andrew Morton
  1 sibling, 1 reply; 54+ messages in thread
From: Stephen Rothwell @ 2008-06-06  8:28 UTC
  To: Ingo Molnar; +Cc: Andrew Morton, linux-next, LKML, the arch/x86 maintainers

[-- Attachment #1: Type: text/plain, Size: 747 bytes --]

On Fri, 6 Jun 2008 10:23:25 +0200 Ingo Molnar <mingo@elte.hu> wrote:
>
> hm, which commit is this exactly? I've never hit it myself in bisection 
> (and there are days when i bisect -tip several times). We'll respin 
> tip/tracing/mmiotrace if it's bisection-hostile. You can probably nudge 
> it into building via "git-bisect next".

See my other email.  This is because the ftrace tree does not merge well
with the rcu tree.

I may start merging such build breakage fixes into the actual merge
commits that cause them.  That will make linux-next more bisectable, but
means I have to remember that I did it for Linus' sake.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

* Re: linux-next: Tree for June 5
  2008-06-06  8:22                 ` Stephen Rothwell
@ 2008-06-06  8:30                   ` Andrew Morton
  2008-06-06  8:36                     ` Ingo Molnar
  2008-06-06 11:50                     ` Paul Mackerras
  0 siblings, 2 replies; 54+ messages in thread
From: Andrew Morton @ 2008-06-06  8:30 UTC
  To: Stephen Rothwell; +Cc: Ingo Molnar, linux-next, LKML, the arch/x86 maintainers

On Fri, 6 Jun 2008 18:22:06 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:

> Hi Andrew,
> 
> On Fri, 6 Jun 2008 01:01:49 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > Well yes - I just bodged it by hand then unbodged it later.  But we
> > have a bisection break there.  Admittedly a minor one, unless the bug
> > you're bisecting for requires that kprobes be configured.  But it would
> > be nice to squish it.
> > 
> > I hope Ingo isn't following this
> > once-you've-checked-it-in-you-can't-fix-it stupidity :(
> 
> It's a break caused by the merge of the ftrace tree into the linux-next
> tree (because at the point I merge the ftrace tree, linux-next contains
> the rcu tree which moves stuff into rculist.h), so logically that
> patch should become part of the merge commit.  If it was part of the
> merge, you could never bisect to a point where you got this build
> breakage.
> 
> Each tree is fine on its own if you go one step back from the merge.

Well OK.  But patches in fact _do_ go into Linux as a single linear
stream of commits.  But the whole git model ignores that reality and
here we see the result.

And saying "git doesn't work like that - you don't understand" just
doesn't cut it.  It is a tool's job to permit humans to implement the
workflow which they wish to follow.  Not to go and force them into
doing something inferior.

Sigh.

/usualrant


* Re: linux-next: Tree for June 5
  2008-06-06  8:28               ` Stephen Rothwell
@ 2008-06-06  8:33                 ` Ingo Molnar
  0 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2008-06-06  8:33 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Andrew Morton, linux-next, LKML, the arch/x86 maintainers


* Stephen Rothwell <sfr@canb.auug.org.au> wrote:

> On Fri, 6 Jun 2008 10:23:25 +0200 Ingo Molnar <mingo@elte.hu> wrote:
> >
> > hm, which commit is this exactly? I've never hit it myself in 
> > bisection (and there are days when i bisect -tip several times). 
> > We'll respin tip/tracing/mmiotrace if it's bisection-hostile. You 
> > can probably nudge it into building via "git-bisect next".
> 
> See my other email.  This is because the ftrace tree does not merge 
> well with the rcu tree.

yeah, we have this fixup in -tip as well, in a structured way: you might 
want to start tracking the tip/tracing/mmiotrace-mergefixups branch to 
pick it up.

or we could offer you a full auto-tip-next plug-and-play branch as well. 
(there's no reason to redo all these integration steps)

	Ingo


* Re: linux-next: Tree for June 5
  2008-06-06  8:30                   ` Andrew Morton
@ 2008-06-06  8:36                     ` Ingo Molnar
  2008-06-06 11:50                     ` Paul Mackerras
  1 sibling, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2008-06-06  8:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Stephen Rothwell, linux-next, LKML, the arch/x86 maintainers


* Andrew Morton <akpm@linux-foundation.org> wrote:

> > Each tree is fine on its own if you go one step back from the merge.
> 
> Well OK.  But patches in fact _do_ go into Linux as a single linear 
> stream of commits.  But the whole git model ignores that reality and 
> here we see the result.

it's fixable via "git-merge -n" and then doing a second git-merge, to 
create only a single commit. OTOH, it's more transparent to have such 
manual fixups in a followup commit.

	Ingo


* Re: linux-next: Tree for June 5
  2008-06-06  8:23             ` Ingo Molnar
  2008-06-06  8:28               ` Stephen Rothwell
@ 2008-06-06  8:38               ` Andrew Morton
  2008-06-06  8:49                 ` Ingo Molnar
  1 sibling, 1 reply; 54+ messages in thread
From: Andrew Morton @ 2008-06-06  8:38 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Stephen Rothwell, linux-next, LKML, the arch/x86 maintainers

On Fri, 6 Jun 2008 10:23:25 +0200 Ingo Molnar <mingo@elte.hu> wrote:

> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > > > > i just successfully booted your config on 4 separate 64-bit 
> > > > > test-systems with latest -tip. (two dual-core boxes, a quad and a 
> > > > > 16way box) Latest -tip includes sched-next and x86-next as well.
> > > > 
> > > > What's the point in testing a radically different kernel from the one 
> > > > which is known to be crashing?
> > > 
> > > well, you Cc:-ed me, so i wanted to exclude -tip's 750+ commits in this 
> > > area (scheduling, 64-bit x86) in the first step.
> > 
> > What's the relationship between -tip and linux-next?
> 
> most of the -tip topics (there are 75 of them currently) are present in 
> linux-next - about ~70% of all -tip commits are in linux-next already. 
> The stuff that is not in linux-next yet is either because it's: 
> miscellany fixes (i.e. intentionally grabbed out-of-tree to make our 
> tests work better), not cooked enough yet, or because we are still 
> working it out - tip is less than a month old still.
> 
> in general the rule is that if there's anything we want to push 
> upstream, it will show up in linux-next.

I don't think it's a good idea for you guys to be off working on 2.6.28
material when we're trying to stabilise 2.6.25, 2.6.26 and preparing
for 2.6.27.

What's especially regrettable is that, afaik, you are expending testing
resources on a tree which nobody will ever run rather than upon the
tree which everyone _will_ run :(  We'd all be better off if that testing
was being performed against linux-next.  Or at least some (most) of it.


ho hum.

Bisecting: 23 revisions left to test after this
[919b0a2702e5a0284094f63215da65539f6ef692] Merge branch 'x86/ptemask' into auto-x86-next

No -mm today...
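The "Bisecting: N revisions left to test" lines come from git-bisect halving the suspect range after every good/bad verdict. A minimal sketch of that search in C, assuming a linear history in which every commit after the first bad one is also bad (git bisect itself also copes with merge-heavy histories, which this sketch does not). Each lookup of `bad[]` stands in for one build-and-boot cycle; `bisect_first_bad()` is a hypothetical helper, not part of git.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Binary search for the first bad commit in a linear range [0, n).
 * Assumes commit 0 is known good, commit n-1 is known bad, and that the
 * verdicts are monotone (good ... good bad ... bad).
 * *tests receives the number of commits that had to be built and booted.
 */
size_t bisect_first_bad(const bool *bad, size_t n, unsigned *tests)
{
	size_t good = 0, first_bad = n - 1;   /* invariant: good is good, first_bad is bad */

	assert(n >= 2 && !bad[0] && bad[n - 1]);
	*tests = 0;
	while (first_bad - good > 1) {
		size_t mid = good + (first_bad - good) / 2;

		(*tests)++;                    /* one build-and-boot cycle */
		if (bad[mid])
			first_bad = mid;       /* regression is at mid or earlier */
		else
			good = mid;            /* regression is after mid */
	}
	return first_bad;
}
```

For 24 commits with the regression entering at index 17, the loop needs 5 probes, which is the roughly log2(n) cost that makes bisection attractive compared with testing every commit.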


* Re: linux-next: Tree for June 5
  2008-06-06  8:38               ` Andrew Morton
@ 2008-06-06  8:49                 ` Ingo Molnar
  2008-06-06  9:01                   ` Andrew Morton
  0 siblings, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2008-06-06  8:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Stephen Rothwell, linux-next, LKML, the arch/x86 maintainers


* Andrew Morton <akpm@linux-foundation.org> wrote:

> > most of the -tip topics (there are 75 of them currently) are present 
> > in linux-next - about ~70% of all -tip commits are in linux-next 
> > already. The stuff that is not in linux-next yet is either because 
> > it's: miscellany fixes (i.e. intentionally grabbed out-of-tree to 
> > make our tests work better), not cooked enough yet, or because we 
> > are still working it out - tip is less than a month old still.
> > 
> > in general the rule is that if there's anything we want to push 
> > upstream, it will show up in linux-next.
> 
> I don't think it's a good idea for you guys to be off working on 
> 2.6.28 material when we're trying to stabilise 2.6.25, 2.6.26 and 
> preparing for 2.6.27.
> 
> What's especially regrettable is that, afaik, you are expending 
> testing resources on a tree which nobody will ever run rather than 
> upon the tree which everyone _will_ run :( [...]

what do you mean? We are testing commits that everybody will run and are 
pre-filtering them for sanity and stability before they hit linux-next.

	Ingo


* Re: linux-next: Tree for June 5
  2008-06-06  8:49                 ` Ingo Molnar
@ 2008-06-06  9:01                   ` Andrew Morton
  2008-06-06 10:47                     ` Ingo Molnar
  0 siblings, 1 reply; 54+ messages in thread
From: Andrew Morton @ 2008-06-06  9:01 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Stephen Rothwell, linux-next, LKML, the arch/x86 maintainers

On Fri, 6 Jun 2008 10:49:49 +0200 Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > > most of the -tip topics (there are 75 of them currently) are present 
> > > in linux-next - about ~70% of all -tip commits are in linux-next 
> > > already. The stuff that is not in linux-next yet is either because 
> > > it's: miscellany fixes (i.e. intentionally grabbed out-of-tree to 
> > > make our tests work better), not cooked enough yet, or because we 
> > > are still working it out - tip is less than a month old still.
> > > 
> > > in general the rule is that if there's anything we want to push 
> > > upstream, it will show up in linux-next.
> > 
> > I don't think it's a good idea for you guys to be off working on 
> > 2.6.28 material when we're trying to stabilise 2.6.25, 2.6.26 and 
> > preparing for 2.6.27.
> > 
> > What's especially regrettable is that, afaik, you are expending 
> > testing resources on a tree which nobody will ever run rather than 
> > upon the tree which everyone _will_ run :( [...]
> 
> what do you mean? We are testing commits that everybody will run and are 
> pre-filtering them for sanity and stability before they hit linux-next.
> 

One doesn't test commits - one tests a tree.  And the -tip tree is
2.6.26-rc5 plus a bunch of x86 changes.  That tree will never be run by
anyone.  Testing -tip fails to pick up problems which are caused by
integration of the x86 changes with everyone else's work and it fails
to pick up problems which lie wholly outside the x86 changes.

For both these reasons it would be more valuable were that testing
effort to be expended on our 2.6.27 candidate tree.

Plus, of course, there's the risk that linux-next contains x86-only
regressions which were fixed or avoided in -tip.




Bisecting: 5 revisions left to test after this
[29657a44f8660acd8751d7e9f5aac06ec8633481] x86: cleanup early per cpu variables/accesses v4



* Re: linux-next: Tree for June 5
  2008-06-06  7:29     ` Andrew Morton
@ 2008-06-06  9:48       ` Andrew Morton
  2008-06-06  9:54         ` Andrew Morton
  2008-06-06 10:54         ` Andrew Morton
  0 siblings, 2 replies; 54+ messages in thread
From: Andrew Morton @ 2008-06-06  9:48 UTC (permalink / raw)
  To: Ingo Molnar, Stephen Rothwell, linux-next, LKML

On Fri, 6 Jun 2008 00:29:57 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> It crashes on two quite different machines with both slab and slub.

OK, I seem to be screwed here.

Five commits to go and my bisection point is at

Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:13
Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:08:00
Parent: a9ad585c8a18f7ba754b85f5786976609b9d7d29 (x86: remove the static 256k node_to_cpumask_map)
Branch: 
Follows: v2.6.26-rc2
Precedes: next-20080526

But here I'm getting a totally different crash - an early exception.

I'll try a linear search starting at

Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:12
Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:05:25
Parent: b65e04b53ffcb4002737a5346c9ff8865c37be58 (x86: don't call pxm_to_node again)
Child:  dfdf1d75efee39e9396f8384c6f3bf555349ed60 (x86: modify Kconfig to allow up to 4096 cpus)
Branch: 

and ending at

Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:13
Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:08:00
Parent: a9ad585c8a18f7ba754b85f5786976609b9d7d29 (x86: remove the static 256k node_to_cpumask_map)
Branch: 
Follows: v2.6.26-rc2
Precedes: next-20080526





* Re: linux-next: Tree for June 5
  2008-06-06  9:48       ` Andrew Morton
@ 2008-06-06  9:54         ` Andrew Morton
  2008-06-06 10:10           ` Ingo Molnar
  2008-06-06 10:54         ` Andrew Morton
  1 sibling, 1 reply; 54+ messages in thread
From: Andrew Morton @ 2008-06-06  9:54 UTC (permalink / raw)
  To: Ingo Molnar, Stephen Rothwell, linux-next, LKML

On Fri, 6 Jun 2008 02:48:11 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> On Fri, 6 Jun 2008 00:29:57 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > It crashes on two quite different machines with both slab and slub.
> 
> OK, I seem to be screwed here.
> 
> Five commits to go and my bisection point is at

argh, that was copy-n-pasted from gitk which doesn't give the commit IDs.

> Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:13
> Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:08:00
> Parent: a9ad585c8a18f7ba754b85f5786976609b9d7d29 (x86: remove the static 256k node_to_cpumask_map)
> Branch: 
> Follows: v2.6.26-rc2
> Precedes: next-20080526

29657a44f8660acd8751d7e9f5aac06ec8633481
   x86: cleanup early per cpu variables/accesses v4
 
> But here I'm getting a totally different crash - an early exception.
> 
> I'll try a linear search starting at
> 
> Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:12
> Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:05:25
> Parent: b65e04b53ffcb4002737a5346c9ff8865c37be58 (x86: don't call pxm_to_node again)
> Child:  dfdf1d75efee39e9396f8384c6f3bf555349ed60 (x86: modify Kconfig to allow up to 4096 cpus)
> Branch: 

ff0e010ef613b0e7136f2f40ec4b51273676b085
   x86: fix remove cpu_pda table patch
 
> and ending at
> 
> Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:13
> Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:08:00
> Parent: a9ad585c8a18f7ba754b85f5786976609b9d7d29 (x86: remove the static 256k node_to_cpumask_map)
> Branch: 
> Follows: v2.6.26-rc2
> Precedes: next-20080526


78d49c6d890aee9cf8aea371011c9d7b0121b822
    x86: remove static boot_cpu_pda array v2
 


* Re: linux-next: Tree for June 5
  2008-06-06  9:54         ` Andrew Morton
@ 2008-06-06 10:10           ` Ingo Molnar
  0 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2008-06-06 10:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Stephen Rothwell, linux-next, LKML, Mike Travis


* Andrew Morton <akpm@linux-foundation.org> wrote:

> > Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:13
> > Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:08:00
> > Parent: a9ad585c8a18f7ba754b85f5786976609b9d7d29 (x86: remove the static 256k node_to_cpumask_map)
> > Branch: 
> > Follows: v2.6.26-rc2
> > Precedes: next-20080526
> 
> 29657a44f8660acd8751d7e9f5aac06ec8633481
>    x86: cleanup early per cpu variables/accesses v4

hm, these commits have not caused problems in testing before. Mike 
Cc:-ed.

	Ingo


* Re: linux-next: Tree for June 5
  2008-06-06  9:01                   ` Andrew Morton
@ 2008-06-06 10:47                     ` Ingo Molnar
  2008-06-06 16:37                       ` Ingo Molnar
  0 siblings, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2008-06-06 10:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Stephen Rothwell, linux-next, LKML, the arch/x86 maintainers


* Andrew Morton <akpm@linux-foundation.org> wrote:

> > what do you mean? We are testing commits that everybody will run and 
> > are pre-filtering them for sanity and stability before they hit 
> > linux-next.
> 
> One doesn't test commits - one tests a tree.  And the -tip tree is 
> 2.6.26-rc5 plus a bunch of x86 changes. [...]

no, 90%+ of all bugs are not due to tree interaction effects but are 
caused by individual commits, triggerable on a particular 
system/workload. (Our historic regression list is the proof for that, 
can give you itemized statistics if you want.)

also, the -tip tree is not "2.6.26-rc5 plus a bunch of x86 changes" but 
v2.6.26-rc5-84-g39b945a plus 75 topic trees we maintain:

build, core/futex-64bit, core/kill-the-BKL, core/locking, core/percpu, 
core/printk, core/rcu, core/rodata, core/softirq, core/softlockup, 
core/stacktrace, core/urgent, cpus4096, genirq, hrtimers, kmemcheck, 
out-of-tree, pci-for-jesse, safe-poison-pointers, sched, sched-devel, 
scratch, stackprotector, timers/clockevents, timers/hpet, 
timers/hrtimers, timers/nohz, timers/posixtimers, tip, tracing/ftrace, 
tracing/ftrace-mergefixups, tracing/immediates, tracing/markers, 
tracing/mmiotrace, tracing/mmiotrace-mergefixups, tracing/nmisafe, 
tracing/sched_markers, tracing/stopmachine-allcpus, tracing/sysprof, 
tracing/textedit, x86/apic, x86/apm, x86/bitops, x86/build, x86/checkme, 
x86/cleanups, x86/cpa, x86/cpu, x86/defconfig, x86/gart, x86/i8259, 
x86/intel, x86/irq, x86/irqstats, x86/kconfig, x86/ldt, x86/mce, 
x86/memtest, x86/mmio, x86/mpparse, x86/nmi, x86/numa, x86/numa-fixes, 
x86/pat, x86/pebs, x86/ptemask, x86/resumetrace, x86/scratch, x86/setup, 
x86/threadinfo, x86/timers, x86/urgent, x86/uv, x86/vdso, x86/xen, 
x86/xsave.

most of which are in linux-next (around 70%), or will be shortly in 
linux-next (more than 90%).

> [...]  That tree will never be run by anyone.  Testing -tip fails to 
> pick up problems which are caused by integration of the x86 changes 
> with everyone else's work and it fails to pick up problems which lie 
> wholly outside the x86 changes.

that's wrong, and here's a very clear counter-example: 95% of the trees 
we all test during a bisection session are executed for the first time 
ever and won't ever be run by anyone else. If the integration aspects 
mattered as much as you claim then bisection would almost never work in 
practice.

Don't get me wrong, integration _does_ matter (and hence we do it 
ourselves, instead of dumping 70+ trees on you!), but the reality is 
that 90% of the bugs are introduced by a single commit and go away if 
the change done by that commit is removed.

The real benefit of integration is not the technical effects of 
integration but the testing effects: people are enabled to test more 
commits at once.

> For both these reasons it would be more valuable were that testing 
> effort to be expended on our 2.6.27 candidate tree.

but that's blatantly wrong: my testing would only be wasted if my test 
capacity was unused. In reality it's fully utilized: half of it is spent 
on general upstream problems we trigger [9381 commits since v2.6.25 and 
counting], the other half of it is spent on our incoming -tip flow of 
patches for v2.6.27 [750 commits and counting].

If there's spare capacity we do volunteer to debug whatever problem that 
comes up. In fact i'd say i still test way more than i should ;-)

> Plus, of course, there's the risk that linux-next contains x86-only 
> regressions which were fixed or avoided in -tip.

there's risk from every single line of source code difference. There's 
risk from having just a single binary bit of difference between two 
user-space installations. The question is always the amount of risk and 
how to manage that risk.

	Ingo


* Re: linux-next: Tree for June 5
  2008-06-06  9:48       ` Andrew Morton
  2008-06-06  9:54         ` Andrew Morton
@ 2008-06-06 10:54         ` Andrew Morton
  2008-06-06 11:21           ` Vegard Nossum
                             ` (3 more replies)
  1 sibling, 4 replies; 54+ messages in thread
From: Andrew Morton @ 2008-06-06 10:54 UTC (permalink / raw)
  To: Ingo Molnar, Stephen Rothwell, linux-next, LKML, Mike Travis

On Fri, 6 Jun 2008 02:48:11 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> I'll try a linear search starting at

	ff0e010ef613b0e7136f2f40ec4b51273676b085
	Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:12
	Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:05:25
	Parent: b65e04b53ffcb4002737a5346c9ff8865c37be58 (x86: don't call pxm_to_node again)
	Child:  dfdf1d75efee39e9396f8384c6f3bf555349ed60 (x86: modify Kconfig to allow up to 4096 cpus)
	Branch: 
	Follows: v2.6.26-rc2
	Precedes: next-20080526

	    x86: fix remove cpu_pda table patch


Good

	dfdf1d75efee39e9396f8384c6f3bf555349ed60
	Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:12
	Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:05:39
	Parent: ff0e010ef613b0e7136f2f40ec4b51273676b085 (x86: fix remove cpu_pda table patch)
	Child:  29657a44f8660acd8751d7e9f5aac06ec8633481 (x86: cleanup early per cpu variables/accesses v4)
	Branch: 
	Follows: v2.6.26-rc2
	Precedes: next-20080526

	    x86: modify Kconfig to allow up to 4096 cpus

Good

	29657a44f8660acd8751d7e9f5aac06ec8633481
	Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:12
	Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:07:23
	Parent: dfdf1d75efee39e9396f8384c6f3bf555349ed60 (x86: modify Kconfig to allow up to 4096 cpus)
	Child:  543e21916497be5a4005fd5820264ce1de9bd56d (x86: restore pda nodenumber field)
	Branch: 
	Follows: v2.6.26-rc2
	Precedes: next-20080526

	    x86: cleanup early per cpu variables/accesses v4

Good


	543e21916497be5a4005fd5820264ce1de9bd56d
	Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:12
	Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:07:37
	Parent: 29657a44f8660acd8751d7e9f5aac06ec8633481 (x86: cleanup early per cpu variables/accesses v4)
	Child:  a9ad585c8a18f7ba754b85f5786976609b9d7d29 (x86: remove the static 256k node_to_cpumask_map)
	Branch: 
	Follows: v2.6.26-rc2
	Precedes: next-20080526

	    x86: restore pda nodenumber field

Good

	a9ad585c8a18f7ba754b85f5786976609b9d7d29
	Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:12
	Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:07:47
	Parent: 543e21916497be5a4005fd5820264ce1de9bd56d (x86: restore pda nodenumber field)
	Child:  78d49c6d890aee9cf8aea371011c9d7b0121b822 (x86: remove static boot_cpu_pda array v2)
	Branch: 
	Follows: v2.6.26-rc2
	Precedes: next-20080526

	    x86: remove the static 256k node_to_cpumask_map

crash, as described earlier.
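When the bisection point showed an unrelated early exception, the search switched to a linear walk over the five remaining commits. A sketch of that walk in C, with the verdicts copied from the results above (good, good, good, good, crash). The `struct commit` type, the `andrew_walk[]` table and `linear_first_bad()` are hypothetical names used only for illustration; a linear walk costs up to n builds instead of about log2(n), in exchange for looking at every single-commit step directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct commit {
	const char *id;   /* abbreviated SHA-1 */
	bool crashes;     /* verdict from building and booting this commit */
};

/* The five commits walked above, oldest first, with their verdicts. */
static const struct commit andrew_walk[] = {
	{ "ff0e010ef613", false },   /* x86: fix remove cpu_pda table patch */
	{ "dfdf1d75efee", false },   /* x86: modify Kconfig to allow up to 4096 cpus */
	{ "29657a44f866", false },   /* x86: cleanup early per cpu variables/accesses v4 */
	{ "543e21916497", false },   /* x86: restore pda nodenumber field */
	{ "a9ad585c8a18", true  },   /* x86: remove the static 256k node_to_cpumask_map */
};

/* Test commits oldest-first; return the first one that crashes, or NULL. */
const struct commit *linear_first_bad(const struct commit *c, size_t n,
				      unsigned *tests)
{
	assert(tests != NULL);
	*tests = 0;
	for (size_t i = 0; i < n; i++) {
		(*tests)++;               /* one build-and-boot cycle */
		if (c[i].crashes)
			return &c[i];
	}
	return NULL;
}
```

Called as `linear_first_bad(andrew_walk, 5, &t)`, it returns the a9ad585c8a18 entry after five builds, matching the walk above.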

I don't know what happened to that early exception - it didn't come back.

The below revert gets linux-next working for me.



From: Andrew Morton <akpm@linux-foundation.org>

Revert

commit a9ad585c8a18f7ba754b85f5786976609b9d7d29
Author: Mike Travis <travis@sgi.com>
Date:   Mon May 12 21:21:12 2008 +0200

    x86: remove the static 256k node_to_cpumask_map
    
      * Consolidate node_to_cpumask operations and remove the 256k
        byte node_to_cpumask_map.  This is done by allocating the
        node_to_cpumask_map array after the number of possible nodes
        (nr_node_ids) is known.
    
      * Debug printouts when CONFIG_DEBUG_PER_CPU_MAPS is active have
        been increased.  It now shows faults when calling node_to_cpumask()
        and node_to_cpumask_ptr().
    
    For inclusion into sched-devel/latest tree.
    
    Based on:
    	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
        +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git
    
    Signed-off-by: Mike Travis <travis@sgi.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Cc: Mike Travis <travis@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86/kernel/setup.c    |  132 +----------------------------------
 arch/x86/mm/numa_64.c      |    6 +
 include/asm-x86/topology.h |   25 ++----
 3 files changed, 19 insertions(+), 144 deletions(-)

diff -puN arch/x86/kernel/setup.c~revert-x86-remove-the-static-256k-node_to_cpumask_map arch/x86/kernel/setup.c
--- a/arch/x86/kernel/setup.c~revert-x86-remove-the-static-256k-node_to_cpumask_map
+++ a/arch/x86/kernel/setup.c
@@ -35,16 +35,6 @@ EXPORT_EARLY_PER_CPU_SYMBOL(x86_bios_cpu
 /* map cpu index to node index */
 DEFINE_EARLY_PER_CPU(int, x86_cpu_to_node_map, NUMA_NO_NODE);
 EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_node_map);
-
-/* which logical CPUs are on which nodes */
-cpumask_t *node_to_cpumask_map;
-EXPORT_SYMBOL(node_to_cpumask_map);
-
-/* setup node_to_cpumask_map */
-static void __init setup_node_to_cpumask_map(void);
-
-#else
-static inline void setup_node_to_cpumask_map(void) { }
 #endif
 
 #if defined(CONFIG_HAVE_SETUP_PER_CPU_AREA) && defined(CONFIG_SMP)
@@ -191,15 +181,11 @@ void __init setup_per_cpu_areas(void)
 
 	}
 
-	printk(KERN_DEBUG "NR_CPUS: %d, nr_cpu_ids: %d, nr_node_ids %d\n",
-		NR_CPUS, nr_cpu_ids, nr_node_ids);
+	printk(KERN_DEBUG "NR_CPUS: %d, nr_cpu_ids: %d\n", NR_CPUS, nr_cpu_ids);
 
 	/* Setup percpu data maps */
 	setup_per_cpu_maps();
 
-	/* Setup node to cpumask map */
-	setup_node_to_cpumask_map();
-
 	/* Setup cpumask_of_cpu map */
 	setup_cpumask_of_cpu();
 }
@@ -220,35 +206,6 @@ void __cpuinit amd_enable_pci_ext_cfg(st
 #endif
 
 #ifdef X86_64_NUMA
-
-/*
- * Allocate node_to_cpumask_map based on number of available nodes
- * Requires node_possible_map to be valid.
- *
- * Note: node_to_cpumask() is not valid until after this is done.
- */
-static void __init setup_node_to_cpumask_map(void)
-{
-	unsigned int node, num = 0;
-	cpumask_t *map;
-
-	/* setup nr_node_ids if not done yet */
-	if (nr_node_ids == MAX_NUMNODES) {
-		for_each_node_mask(node, node_possible_map)
-			num = node;
-		nr_node_ids = num + 1;
-	}
-
-	/* allocate the map */
-	map = alloc_bootmem_low(nr_node_ids * sizeof(cpumask_t));
-
-	Dprintk(KERN_DEBUG "Node to cpumask map at %p for %d nodes\n",
-		map, nr_node_ids);
-
-	/* node_to_cpumask() will now work */
-	node_to_cpumask_map = map;
-}
-
 void __cpuinit numa_set_node(int cpu, int node)
 {
 	int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);
@@ -271,8 +228,6 @@ void __cpuinit numa_clear_node(int cpu)
 	numa_set_node(cpu, NUMA_NO_NODE);
 }
 
-#ifndef CONFIG_DEBUG_PER_CPU_MAPS
-
 void __cpuinit numa_add_cpu(int cpu)
 {
 	cpu_set(cpu, node_to_cpumask_map[early_cpu_to_node(cpu)]);
@@ -282,44 +237,9 @@ void __cpuinit numa_remove_cpu(int cpu)
 {
 	cpu_clear(cpu, node_to_cpumask_map[cpu_to_node(cpu)]);
 }
+#endif /* CONFIG_NUMA */
 
-#else /* CONFIG_DEBUG_PER_CPU_MAPS */
-
-/*
- * --------- debug versions of the numa functions ---------
- */
-static void __cpuinit numa_set_cpumask(int cpu, int enable)
-{
-	int node = cpu_to_node(cpu);
-	cpumask_t *mask;
-	char buf[64];
-
-	if (node_to_cpumask_map == NULL) {
-		printk(KERN_ERR "node_to_cpumask_map NULL\n");
-		dump_stack();
-		return;
-	}
-
-	mask = &node_to_cpumask_map[node];
-	if (enable)
-		cpu_set(cpu, *mask);
-	else
-		cpu_clear(cpu, *mask);
-
-	cpulist_scnprintf(buf, sizeof(buf), *mask);
-	printk(KERN_DEBUG "%s cpu %d node %d: mask now %s\n",
-		enable? "numa_add_cpu":"numa_remove_cpu", cpu, node, buf);
- }
-
-void __cpuinit numa_add_cpu(int cpu)
-{
-	numa_set_cpumask(cpu, 1);
-}
-
-void __cpuinit numa_remove_cpu(int cpu)
-{
-	numa_set_cpumask(cpu, 0);
-}
+#if defined(CONFIG_DEBUG_PER_CPU_MAPS) && defined(CONFIG_X86_64)
 
 int cpu_to_node(int cpu)
 {
@@ -333,10 +253,6 @@ int cpu_to_node(int cpu)
 }
 EXPORT_SYMBOL(cpu_to_node);
 
-/*
- * Same function as cpu_to_node() but used if called before the
- * per_cpu areas are setup.
- */
 int early_cpu_to_node(int cpu)
 {
 	if (early_per_cpu_ptr(x86_cpu_to_node_map))
@@ -345,47 +261,9 @@ int early_cpu_to_node(int cpu)
 	if (!per_cpu_offset(cpu)) {
 		printk(KERN_WARNING
 			"early_cpu_to_node(%d): no per_cpu area!\n", cpu);
-		dump_stack();
+			dump_stack();
 		return NUMA_NO_NODE;
 	}
 	return per_cpu(x86_cpu_to_node_map, cpu);
 }
-
-/*
- * Returns a pointer to the bitmask of CPUs on Node 'node'.
- */
-cpumask_t *_node_to_cpumask_ptr(int node)
-{
-	if (node_to_cpumask_map == NULL) {
-		printk(KERN_WARNING
-			"_node_to_cpumask_ptr(%d): no node_to_cpumask_map!\n",
-			node);
-		dump_stack();
-		return &cpu_online_map;
-	}
-	return &node_to_cpumask_map[node];
-}
-EXPORT_SYMBOL(_node_to_cpumask_ptr);
-
-/*
- * Returns a bitmask of CPUs on Node 'node'.
- */
-cpumask_t node_to_cpumask(int node)
-{
-	if (node_to_cpumask_map == NULL) {
-		printk(KERN_WARNING
-			"node_to_cpumask(%d): no node_to_cpumask_map!\n", node);
-		dump_stack();
-		return cpu_online_map;
-	}
-	return node_to_cpumask_map[node];
-}
-EXPORT_SYMBOL(node_to_cpumask);
-
-/*
- * --------- end of debug versions of the numa functions ---------
- */
-
-#endif /* CONFIG_DEBUG_PER_CPU_MAPS */
-
-#endif /* X86_64_NUMA */
+#endif
diff -puN arch/x86/mm/numa_64.c~revert-x86-remove-the-static-256k-node_to_cpumask_map arch/x86/mm/numa_64.c
--- a/arch/x86/mm/numa_64.c~revert-x86-remove-the-static-256k-node_to_cpumask_map
+++ a/arch/x86/mm/numa_64.c
@@ -35,6 +35,9 @@ s16 apicid_to_node[MAX_LOCAL_APIC] __cpu
 	[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
 };
 
+cpumask_t node_to_cpumask_map[MAX_NUMNODES] __read_mostly;
+EXPORT_SYMBOL(node_to_cpumask_map);
+
 int numa_off __initdata;
 static unsigned long __initdata nodemap_addr;
 static unsigned long __initdata nodemap_size;
@@ -557,6 +560,9 @@ void __init numa_initmem_init(unsigned l
 	node_set(0, node_possible_map);
 	for (i = 0; i < NR_CPUS; i++)
 		numa_set_node(i, 0);
+	/* cpumask_of_cpu() may not be available during early startup */
+	memset(&node_to_cpumask_map[0], 0, sizeof(node_to_cpumask_map[0]));
+	cpu_set(0, node_to_cpumask_map[0]);
 	e820_register_active_regions(0, start_pfn, last_pfn);
 	setup_node_bootmem(0, start_pfn << PAGE_SHIFT, last_pfn << PAGE_SHIFT);
 }
diff -puN include/asm-x86/topology.h~revert-x86-remove-the-static-256k-node_to_cpumask_map include/asm-x86/topology.h
--- a/include/asm-x86/topology.h~revert-x86-remove-the-static-256k-node_to_cpumask_map
+++ a/include/asm-x86/topology.h
@@ -57,16 +57,10 @@ static inline int cpu_to_node(int cpu)
 }
 #define early_cpu_to_node(cpu)	cpu_to_node(cpu)
 
-/* Returns a bitmask of CPUs on Node 'node'. */
-static inline cpumask_t node_to_cpumask(int node)
-{
-	return node_to_cpumask_map[node];
-}
-
 #else /* CONFIG_X86_64 */
 
 /* Mappings between node number and cpus on that node. */
-extern cpumask_t *node_to_cpumask_map;
+extern cpumask_t node_to_cpumask_map[];
 
 /* Mappings between logical cpu number and node number */
 DECLARE_EARLY_PER_CPU(int, x86_cpu_to_node_map);
@@ -110,6 +104,7 @@ static inline cpumask_t node_to_cpumask(
 }
 
 #endif /* !CONFIG_DEBUG_PER_CPU_MAPS */
+#endif /* CONFIG_X86_64 */
 
 /* Replace default node_to_cpumask_ptr with optimized version */
 #define node_to_cpumask_ptr(v, node)		\
@@ -118,7 +113,12 @@ static inline cpumask_t node_to_cpumask(
 #define node_to_cpumask_ptr_next(v, node)	\
 			   v = _node_to_cpumask_ptr(node)
 
-#endif /* CONFIG_X86_64 */
+/* Returns the number of the first CPU on Node 'node'. */
+static inline int node_to_first_cpu(int node)
+{
+	node_to_cpumask_ptr(mask, node);
+	return first_cpu(*mask);
+}
 
 /*
  * Returns the number of the node containing Node 'node'. This
@@ -204,15 +204,6 @@ static inline int node_to_first_cpu(int 
 
 #include <asm-generic/topology.h>
 
-#ifdef CONFIG_NUMA
-/* Returns the number of the first CPU on Node 'node'. */
-static inline int node_to_first_cpu(int node)
-{
-	node_to_cpumask_ptr(mask, node);
-	return first_cpu(*mask);
-}
-#endif
-
 extern cpumask_t cpu_coregroup_map(int cpu);
 
 #ifdef ENABLE_TOPO_DEFINES
_
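The reverted commit replaced a static array of MAX_NUMNODES cpumasks with one sized at boot from nr_node_ids. A userspace sketch of the arithmetic and of the sizing rule in setup_node_to_cpumask_map(), assuming NR_CPUS = 4096 (the limit from the Kconfig commit in this series) and MAX_NUMNODES = 512 (an assumed NODES_SHIFT of 9). Under those assumptions the static map is 512 masks of 512 bytes each, which would account for the "256k" in the commit title. `alloc_node_map()` and its `possible[]` argument are hypothetical stand-ins for node_possible_map and the boot-time allocator.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS       4096   /* assumed: the 4096-cpu Kconfig limit */
#define MAX_NUMNODES  512    /* assumed: NODES_SHIFT = 9 */
#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/* Userspace stand-in for the kernel's cpumask_t: one bit per possible CPU. */
typedef struct {
	unsigned long bits[(NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG];
} cpumask_t;

/* Bytes taken by a static node_to_cpumask_map[MAX_NUMNODES]. */
size_t static_map_bytes(void)
{
	return (size_t)MAX_NUMNODES * sizeof(cpumask_t);
}

/*
 * Same sizing rule as setup_node_to_cpumask_map(): nr_node_ids is the
 * highest possible node number plus one, and only that many masks are
 * allocated. calloc() zeroes them; the kernel used the bootmem allocator.
 */
cpumask_t *alloc_node_map(const int *possible, int max_nodes, int *nr_node_ids)
{
	int node, last = 0;

	assert(max_nodes > 0);
	for (node = 0; node < max_nodes; node++)
		if (possible[node])
			last = node;
	*nr_node_ids = last + 1;
	return calloc((size_t)*nr_node_ids, sizeof(cpumask_t));
}
```

On a two-node machine this allocates 2 masks (1024 bytes) instead of 256 KiB. Unlike this sketch, the kernel version only recomputes nr_node_ids when it still holds its MAX_NUMNODES default, which is one place where an ordering problem could hide.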



* Re: linux-next: Tree for June 5
  2008-06-06 10:54         ` Andrew Morton
@ 2008-06-06 11:21           ` Vegard Nossum
  2008-06-06 11:57           ` Ingo Molnar
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 54+ messages in thread
From: Vegard Nossum @ 2008-06-06 11:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Stephen Rothwell, linux-next, LKML, Mike Travis

On Fri, Jun 6, 2008 at 12:54 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Fri, 6 Jun 2008 02:48:11 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

...

> commit a9ad585c8a18f7ba754b85f5786976609b9d7d29
> Author: Mike Travis <travis@sgi.com>
> Date:   Mon May 12 21:21:12 2008 +0200
>
>    x86: remove the static 256k node_to_cpumask_map
>
>      * Consolidate node_to_cpumask operations and remove the 256k
>        byte node_to_cpumask_map.  This is done by allocating the
>        node_to_cpumask_map array after the number of possible nodes
>        (nr_node_ids) is known.
>
>      * Debug printouts when CONFIG_DEBUG_PER_CPU_MAPS is active have
>        been increased.  It now shows faults when calling node_to_cpumask()
>        and node_to_cpumask_ptr().

This might be obvious, but maybe enabling CONFIG_DEBUG_PER_CPU_MAPS
will give you some more (valuable) info? It looks to me like maybe
nr_node_ids is returning inconsistent numbers or used somewhere before
it's properly initialized.
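The debug versions that the commit added (and the revert removes) check for a NULL node_to_cpumask_map and fall back to cpu_online_map instead of dereferencing it. A simplified sketch of that guard in C, with an extra bounds check against nr_node_ids (an assumption, not in the original code) so that a node number beyond the allocated array is reported too. The one-word cpumask, the online-mask value, and the names `install_node_map()` and `node_to_cpumask_ptr_checked()` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

typedef unsigned long cpumask_t;          /* simplified: one bit per CPU, one word */

static cpumask_t cpu_online_map = 0xfUL;  /* hypothetical: CPUs 0-3 online */
static cpumask_t *node_to_cpumask_map;    /* NULL until the map is allocated */
static int nr_node_ids;                   /* 0 until it has been computed */

/* Publish an allocated map and its size (stand-in for the setup code). */
void install_node_map(cpumask_t *map, int nodes)
{
	assert(map != NULL && nodes > 0);
	node_to_cpumask_map = map;
	nr_node_ids = nodes;
}

/* Complain and fall back to the online map instead of faulting. */
const cpumask_t *node_to_cpumask_ptr_checked(int node)
{
	if (node_to_cpumask_map == NULL) {
		fprintf(stderr, "node_to_cpumask_ptr(%d): no node_to_cpumask_map!\n",
			node);
		return &cpu_online_map;
	}
	if (node < 0 || node >= nr_node_ids) {
		fprintf(stderr, "node_to_cpumask_ptr(%d): beyond nr_node_ids (%d)\n",
			node, nr_node_ids);
		return &cpu_online_map;
	}
	return &node_to_cpumask_map[node];
}
```

With such a guard in place, an early caller or a stale nr_node_ids produces a warning on the console rather than a crash, which is the kind of extra information CONFIG_DEBUG_PER_CPU_MAPS is meant to surface.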


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036


* Re: linux-next: Tree for June 5
  2008-06-06  8:30                   ` Andrew Morton
  2008-06-06  8:36                     ` Ingo Molnar
@ 2008-06-06 11:50                     ` Paul Mackerras
  1 sibling, 0 replies; 54+ messages in thread
From: Paul Mackerras @ 2008-06-06 11:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Stephen Rothwell, Ingo Molnar, linux-next, LKML,
	the arch/x86 maintainers

Andrew Morton writes:

> Well OK.  But patches in fact _do_ go into Linux as a single linear
> stream of commits.

Well no, they don't.  Multiple people work on things independently and
then put their stuff together.  Sometimes there are then conflicts
that have to be sorted out.  That's what merging is all about.

>  But the whole git model ignores that reality and
> here we see the result.

No, the git model (and the BK model before it) expresses the reality
that there is lots of development going on in parallel in many
different places.

> And saying "git doesn't work like that - you don't understand" just
> doesn't cut it.  It is a tool's job to permit humans to implement the
> workflow which they wish to follow.  Not to go and force them into
> doing something inferior.

You'd prefer to be the bunny that keeps every single subsystem's
string of patches all bundled together in a single humungous quilt
series?  With all due respect (and with a sense of admiration at how
much patch-wrangling you already do), I don't think you'd scale that
far. :)

Paul.

* Re: linux-next: Tree for June 5
  2008-06-06 10:54         ` Andrew Morton
  2008-06-06 11:21           ` Vegard Nossum
@ 2008-06-06 11:57           ` Ingo Molnar
  2008-06-06 12:33             ` Vegard Nossum
  2008-06-06 13:28           ` Mike Travis
  2008-06-06 17:15           ` Ingo Molnar
  3 siblings, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2008-06-06 11:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Stephen Rothwell, linux-next, LKML, Mike Travis


* Andrew Morton <akpm@linux-foundation.org> wrote:

> Good
> 
> 	a9ad585c8a18f7ba754b85f5786976609b9d7d29
> 	Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:12
> 	Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:07:47
> 	Parent: 543e21916497be5a4005fd5820264ce1de9bd56d (x86: restore pda nodenumber field)
> 	Child:  78d49c6d890aee9cf8aea371011c9d7b0121b822 (x86: remove static boot_cpu_pda array v2)
> 	Branch: 
> 	Follows: v2.6.26-rc2
> 	Precedes: next-20080526
> 
> 	    x86: remove the static 256k node_to_cpumask_map
> 
> crash, as described earlier.

thanks for tracking it down! This was the origin of the commit:

 # tip/x86/numa: a9ad585: x86: remove the static 256k node_to_cpumask_map

which has been in -tip since May 12 and in linux-next for two weeks 
AFAICS, which is beyond the point of being something freshly wrong.

So i suspect something more subtle here. What compiler version are you 
using? This crash is not something that has been found in testing before 
- i use rather new compilers, gcc 4.2.2 most of the time. Previous
compilers miscompile the kernel seriously, so they are not usable for our
regression testing grid.

until more is found out i've put the revert into tip/x86/numa for now. 
Note, you'll also need the commit below for 32-bit NUMA.

	Ingo

---------------->
commit f418f2b4a9b6ef4035cc8c9a166873a2b275e4ef
Author: Ingo Molnar <mingo@elte.hu>
Date:   Fri Jun 6 13:54:52 2008 +0200

    x86: fix revert side-effects
    
    fix 32-bit NUMA.

diff --git a/include/asm-x86/topology.h b/include/asm-x86/topology.h
index c0e6ff7..abd3aa8 100644
--- a/include/asm-x86/topology.h
+++ b/include/asm-x86/topology.h
@@ -57,6 +57,18 @@ static inline int cpu_to_node(int cpu)
 }
 #define early_cpu_to_node(cpu)	cpu_to_node(cpu)
 
+/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
+static inline cpumask_t *_node_to_cpumask_ptr(int node)
+{
+	return &node_to_cpumask_map[node];
+}
+
+/* Returns a bitmask of CPUs on Node 'node'. */
+static inline cpumask_t node_to_cpumask(int node)
+{
+	return node_to_cpumask_map[node];
+}
+
 #else /* CONFIG_X86_64 */
 
 /* Mappings between node number and cpus on that node. */

* Re: linux-next: Tree for June 5
  2008-06-06 11:57           ` Ingo Molnar
@ 2008-06-06 12:33             ` Vegard Nossum
  2008-06-06 13:33               ` Mike Travis
  0 siblings, 1 reply; 54+ messages in thread
From: Vegard Nossum @ 2008-06-06 12:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Stephen Rothwell, linux-next, LKML, Mike Travis

On Fri, Jun 6, 2008 at 1:57 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Andrew Morton <akpm@linux-foundation.org> wrote:
>
>> Good
>>
>>       a9ad585c8a18f7ba754b85f5786976609b9d7d29
>>       Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:12
>>       Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:07:47
>>       Parent: 543e21916497be5a4005fd5820264ce1de9bd56d (x86: restore pda nodenumber field)
>>       Child:  78d49c6d890aee9cf8aea371011c9d7b0121b822 (x86: remove static boot_cpu_pda array v2)
>>       Branch:
>>       Follows: v2.6.26-rc2
>>       Precedes: next-20080526
>>
>>           x86: remove the static 256k node_to_cpumask_map
>>
>> crash, as described earlier.
>
> thanks for tracking it down! This was the origin of the commit:
>
>  # tip/x86/numa: a9ad585: x86: remove the static 256k node_to_cpumask_map
>
> which has been in -tip since May 12 and in linux-next for two weeks
> AFAICS, which is beyond the point of being something freshly wrong.
>
> So i suspect something more subtle here. What compiler version are you
> using? This crash is not something that has been found in testing before
> - i use rather new compilers, gcc 4.2.2 most of the time. Previous
> compilers miscompile the kernel seriously, so they are not usable for our
> regression testing grid.
>

Hi,

I reproduced it with gcc 4.1.2. I think the error is somewhere in kernel/sched.c.

static int __build_sched_domains(const cpumask_t *cpu_map,
                                 struct sched_domain_attr *attr)
{
...
        for (i = 0; i < MAX_NUMNODES; i++) {
...
                sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
...

This code is calling into the allocator with a spurious value of i,
which causes SLAB to use an index (of 4 in my case) that is out of
bounds for its nodelist array (at least it hasn't been initialized).

This bit of code (a bit further down, inside the same loop) is also dubious:

                        sg = kmalloc_node(sizeof(struct sched_group),
                                          GFP_KERNEL, i);
                        if (!sg) {
                                printk(KERN_WARNING
                                "Can not alloc domain group for node %d\n", j);
                                goto error;
                        }

Where it passes i to kmalloc_node() but reports an allocation for node
j. Which one is correct?

Hope this helps, will send an update if I find out more.


Vegard

* Re: linux-next: Tree for June 5
  2008-06-06 10:54         ` Andrew Morton
  2008-06-06 11:21           ` Vegard Nossum
  2008-06-06 11:57           ` Ingo Molnar
@ 2008-06-06 13:28           ` Mike Travis
  2008-06-06 17:15           ` Ingo Molnar
  3 siblings, 0 replies; 54+ messages in thread
From: Mike Travis @ 2008-06-06 13:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, Stephen Rothwell, linux-next, LKML

Andrew Morton wrote:
> On Fri, 6 Jun 2008 02:48:11 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> 
>> I'll try a linear search starting at
> 
> 	ff0e010ef613b0e7136f2f40ec4b51273676b085
> 	Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:12
> 	Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:05:25
> 	Parent: b65e04b53ffcb4002737a5346c9ff8865c37be58 (x86: don't call pxm_to_node again)
> 	Child:  dfdf1d75efee39e9396f8384c6f3bf555349ed60 (x86: modify Kconfig to allow up to 4096 cpus)
> 	Branch: 
> 	Follows: v2.6.26-rc2
> 	Precedes: next-20080526
> 
> 	    x86: fix remove cpu_pda table patch
> 
> 
> Good
> 
> 	dfdf1d75efee39e9396f8384c6f3bf555349ed60
> 	Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:12
> 	Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:05:39
> 	Parent: ff0e010ef613b0e7136f2f40ec4b51273676b085 (x86: fix remove cpu_pda table patch)
> 	Child:  29657a44f8660acd8751d7e9f5aac06ec8633481 (x86: cleanup early per cpu variables/accesses v4)
> 	Branch: 
> 	Follows: v2.6.26-rc2
> 	Precedes: next-20080526
> 
> 	    x86: modify Kconfig to allow up to 4096 cpus
> 
> Good
> 
> 	29657a44f8660acd8751d7e9f5aac06ec8633481
> 	Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:12
> 	Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:07:23
> 	Parent: dfdf1d75efee39e9396f8384c6f3bf555349ed60 (x86: modify Kconfig to allow up to 4096 cpus)
> 	Child:  543e21916497be5a4005fd5820264ce1de9bd56d (x86: restore pda nodenumber field)
> 	Branch: 
> 	Follows: v2.6.26-rc2
> 	Precedes: next-20080526
> 
> 	    x86: cleanup early per cpu variables/accesses v4
> 
> Good
> 
> 
> 	543e21916497be5a4005fd5820264ce1de9bd56d
> 	Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:12
> 	Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:07:37
> 	Parent: 29657a44f8660acd8751d7e9f5aac06ec8633481 (x86: cleanup early per cpu variables/accesses v4)
> 	Child:  a9ad585c8a18f7ba754b85f5786976609b9d7d29 (x86: remove the static 256k node_to_cpumask_map)
> 	Branch: 
> 	Follows: v2.6.26-rc2
> 	Precedes: next-20080526
> 
> 	    x86: restore pda nodenumber field
> 
> Good
> 
> 	a9ad585c8a18f7ba754b85f5786976609b9d7d29
> 	Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:12
> 	Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:07:47
> 	Parent: 543e21916497be5a4005fd5820264ce1de9bd56d (x86: restore pda nodenumber field)
> 	Child:  78d49c6d890aee9cf8aea371011c9d7b0121b822 (x86: remove static boot_cpu_pda array v2)
> 	Branch: 
> 	Follows: v2.6.26-rc2
> 	Precedes: next-20080526
> 
> 	    x86: remove the static 256k node_to_cpumask_map
> 
> crash, as described earlier.
> 
> I don't know what happened to that early exception - it didn't come back.
> 
> The below revert gets linux-next working for me.

Did you try using the DEBUG_PER_CPU_MAPS option?  This should trigger on any
use of the node_to_cpumask map before it's been allocated.  (Hmm, I should
check if it also validates the node number - as when MAX_NUMNODES is used
instead of nr_node_ids.)

Also, could you send me the config file and a short description of what kind
of system you are testing on?

Thanks,
Mike

* Re: linux-next: Tree for June 5
  2008-06-06 12:33             ` Vegard Nossum
@ 2008-06-06 13:33               ` Mike Travis
  2008-06-06 13:50                 ` Vegard Nossum
  0 siblings, 1 reply; 54+ messages in thread
From: Mike Travis @ 2008-06-06 13:33 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Ingo Molnar, Andrew Morton, Stephen Rothwell, linux-next, LKML

Vegard Nossum wrote:
> On Fri, Jun 6, 2008 at 1:57 PM, Ingo Molnar <mingo@elte.hu> wrote:
>> * Andrew Morton <akpm@linux-foundation.org> wrote:
>>
>>> Good
>>>
>>>       a9ad585c8a18f7ba754b85f5786976609b9d7d29
>>>       Author: Mike Travis <travis@sgi.com>  2008-05-12 12:21:12
>>>       Committer: Thomas Gleixner <tglx@linutronix.de>  2008-05-23 09:07:47
>>>       Parent: 543e21916497be5a4005fd5820264ce1de9bd56d (x86: restore pda nodenumber field)
>>>       Child:  78d49c6d890aee9cf8aea371011c9d7b0121b822 (x86: remove static boot_cpu_pda array v2)
>>>       Branch:
>>>       Follows: v2.6.26-rc2
>>>       Precedes: next-20080526
>>>
>>>           x86: remove the static 256k node_to_cpumask_map
>>>
>>> crash, as described earlier.
>> thanks for tracking it down! This was the origin of the commit:
>>
>>  # tip/x86/numa: a9ad585: x86: remove the static 256k node_to_cpumask_map
>>
>> which has been in -tip since May 12 and in linux-next for two weeks
>> AFAICS, which is beyond the point of being something freshly wrong.
>>
>> So i suspect something more subtle here. What compiler version are you
>> using? This crash is not something that has been found in testing before
>> - i use rather new compilers, gcc 4.2.2 most of the time. Previous
>> compilers miscompile the kernel seriously, so they are not usable for our
>> regression testing grid.
>>
> 
> Hi,
> 
> I reproduced it with gcc 4.1.2. I think the error is somewhere in kernel/sched.c.
> 
> static int __build_sched_domains(const cpumask_t *cpu_map,
>                                  struct sched_domain_attr *attr)
> {
> ...
>         for (i = 0; i < MAX_NUMNODES; i++) {
> ...
>                 sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
> ...
> 
> This code is calling into the allocator with a spurious value of i,
> which causes SLAB to use an index (of 4 in my case) that is out of
> bounds for its nodelist array (at least it hasn't been initialized).
> 
> This bit of code (a bit further down, inside the same loop) is also dubious:
> 
>                         sg = kmalloc_node(sizeof(struct sched_group),
>                                           GFP_KERNEL, i);
>                         if (!sg) {
>                                 printk(KERN_WARNING
>                                 "Can not alloc domain group for node %d\n", j);
>                                 goto error;
>                         }
> 
> Where it passes i to kmalloc_node() but reports an allocation for node
> j. Which one is correct?
> 
> Hope this helps, will send an update if I find out more.
> 
> 
> Vegard
> 

Thanks Vegard for tracking this down.  My thoughts were along the same
wavelength... ;-)

Mike

* Re: linux-next: Tree for June 5
  2008-06-06 13:33               ` Mike Travis
@ 2008-06-06 13:50                 ` Vegard Nossum
  2008-06-06 14:07                   ` Vegard Nossum
  2008-06-06 14:13                   ` Mike Travis
  0 siblings, 2 replies; 54+ messages in thread
From: Vegard Nossum @ 2008-06-06 13:50 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, Andrew Morton, Stephen Rothwell, linux-next, LKML

On Fri, Jun 6, 2008 at 3:33 PM, Mike Travis <travis@sgi.com> wrote:
> Vegard Nossum wrote:
>>
>> I reproduced it with gcc 4.1.2. I think the error is somewhere in kernel/sched.c.
>>
>> static int __build_sched_domains(const cpumask_t *cpu_map,
>>                                  struct sched_domain_attr *attr)
>> {
>> ...
>>         for (i = 0; i < MAX_NUMNODES; i++) {
>> ...
>>                 sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
>> ...
>>
>> This code is calling into the allocator with a spurious value of i,
>> which causes SLAB to use an index (of 4 in my case) that is out of
>> bounds for its nodelist array (at least it hasn't been initialized).
>>
>> This bit of code (a bit further down, inside the same loop) is also dubious:
>>
>>                         sg = kmalloc_node(sizeof(struct sched_group),
>>                                           GFP_KERNEL, i);
>>                         if (!sg) {
>>                                 printk(KERN_WARNING
>>                                 "Can not alloc domain group for node %d\n", j);
>>                                 goto error;
>>                         }
>>
>> Where it passes i to kmalloc_node() but reports an allocation for node
>> j. Which one is correct?
>>

Hm, I think I'm wrong and the code is correct. However...

>> Hope this helps, will send an update if I find out more.
>>
>>
>> Vegard
>>
>
> Thanks Vegard for tracking this down.  My thoughts were along the same
> wavelength... ;-)

I applied this patch:
@@ -7133,6 +7133,14 @@ static int __build_sched_domains(const cpumask_t *cpu_map,
                cpus_clear(*covered);

                cpus_and(*nodemask, *nodemask, *cpu_map);
+
+               printk("node %d\n", i);
+               for (j = 0; j < NR_CPUS; ++j)
+                       printk("%c", cpu_isset(j, *nodemask) ? 'X' : '.');
+               printk("\n");
+
+               printk("empty = %d\n", cpus_empty(*nodemask));
+
                if (cpus_empty(*nodemask)) {
                        sched_group_nodes[i] = NULL;
                        continue;

and it shows some really strange output, maybe it makes sense to you:

(the X means cpu is in the node)

Total of 2 processors activated (11976.24 BogoMIPS).
node 0
XX..............................................................................
................................................................................
................................................................................
...............
empty = 0
node 1
XX..............................................................................
................................................................................
................................................................................
...............
empty = 0
l3 = cachep->nodelists[0] (size-64) = ffff81003f824340
node 2
................................................................................
................................................................................
................................................................................
...............
empty = 1
node 3
................................................................................
................................................................................
................................................................................
...............
empty = 1
node 4
X...............................................................................
................................................................................
................................................................................
...............
empty = 0

This is a P4 3.0GHz with 1 physical CPU (but HT, so two logical CPUs).
Yet node 4 is claimed to have a cpu too. That's bogus!

(But I don't think it's an error in sched.c any more, probably the
code that sets up the node maps.)


Vegard

* Re: linux-next: Tree for June 5
  2008-06-06 13:50                 ` Vegard Nossum
@ 2008-06-06 14:07                   ` Vegard Nossum
  2008-06-06 14:20                     ` Mike Travis
  2008-06-06 14:13                   ` Mike Travis
  1 sibling, 1 reply; 54+ messages in thread
From: Vegard Nossum @ 2008-06-06 14:07 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, Andrew Morton, Stephen Rothwell, linux-next, LKML

On Fri, Jun 6, 2008 at 3:50 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> On Fri, Jun 6, 2008 at 3:33 PM, Mike Travis <travis@sgi.com> wrote:
>> Vegard Nossum wrote:
>>>
>>> I reproduced it with gcc 4.1.2. I think the error is somewhere in kernel/sched.c.
>>>
>>> static int __build_sched_domains(const cpumask_t *cpu_map,
>>>                                  struct sched_domain_attr *attr)
>>> {
>>> ...
>>>         for (i = 0; i < MAX_NUMNODES; i++) {
>>> ...
>>>                 sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
>>> ...
>>>
>>> This code is calling into the allocator with a spurious value of i,
>>> which causes SLAB to use an index (of 4 in my case) that is out of
>>> bounds for its nodelist array (at least it hasn't been initialized).
>>>
>>> This bit of code (a bit further down, inside the same loop) is also dubious:
>>>
>>>                         sg = kmalloc_node(sizeof(struct sched_group),
>>>                                           GFP_KERNEL, i);
>>>                         if (!sg) {
>>>                                 printk(KERN_WARNING
>>>                                 "Can not alloc domain group for node %d\n", j);
>>>                                 goto error;
>>>                         }
>>>
>>> Where it passes i to kmalloc_node() but reports an allocation for node
>>> j. Which one is correct?
>>>
>
> Hm, I think I'm wrong and the code is correct. However...
>
>>> Hope this helps, will send an update if I find out more.
>>>
>>>
>>> Vegard
>>>
>>
>> Thanks Vegard for tracking this down.  My thoughts were along the same
>> wavelength... ;-)

...

>
> This is a P4 3.0GHz with 1 physical CPU (but HT, so two logical CPUs).
> Yet node 4 is claimed to have a cpu too. That's bogus!
>
> (But I don't think it's an error in sched.c any more, probably the
> code that sets up the node maps.)

Aha.

The error is of course that the node masks for nodes >= nr_node_ids are
not valid, but this function ignores that:

cpumask_t *_node_to_cpumask_ptr(int node)
{
        if (node_to_cpumask_map == NULL) {
                printk(KERN_WARNING
                        "_node_to_cpumask_ptr(%d): no node_to_cpumask_map!\n",
                        node);
                dump_stack();
                return &cpu_online_map;
        }
        return &node_to_cpumask_map[node];
}
EXPORT_SYMBOL(_node_to_cpumask_ptr);

Notice the return statement. It needs to check if node < nr_node_ids.
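
A minimal userspace sketch of the check being asked for: the lookup rejects any node that is negative or not below nr_node_ids. The names (node_map, node_to_mask_checked(), empty_mask) are hypothetical, a single unsigned long stands in for cpumask_t, and returning an empty mask for a bad node is just one possible policy, not necessarily what the kernel should do.

```c
/*
 * Userspace sketch of a bounds-checked node-to-cpumask lookup. All names
 * are hypothetical stand-ins; this is not the kernel's _node_to_cpumask_ptr().
 */
#include <stddef.h>

typedef unsigned long cpumask_word_t;	/* one word is enough for this sketch */

static cpumask_word_t *node_map;	/* array with nr_node_ids entries */
static int nr_node_ids;
static cpumask_word_t empty_mask;	/* returned for invalid nodes */

/* Return the mask for 'node', or an empty mask if 'node' is out of range. */
static const cpumask_word_t *node_to_mask_checked(int node)
{
	if (node_map == NULL || node < 0 || node >= nr_node_ids)
		return &empty_mask;
	return &node_map[node];
}
```

With nr_node_ids == 2, node_to_mask_checked(4) yields an empty mask, so a caller looping to MAX_NUMNODES sees node 4 as having no CPUs instead of reading whatever lies past the end of the array. Stopping loudly with a BUG-style assertion is the stricter alternative if the goal is to find such callers.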


Vegard

* Re: linux-next: Tree for June 5
  2008-06-06 13:50                 ` Vegard Nossum
  2008-06-06 14:07                   ` Vegard Nossum
@ 2008-06-06 14:13                   ` Mike Travis
  1 sibling, 0 replies; 54+ messages in thread
From: Mike Travis @ 2008-06-06 14:13 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Ingo Molnar, Andrew Morton, Stephen Rothwell, linux-next, LKML

Vegard Nossum wrote:
> On Fri, Jun 6, 2008 at 3:33 PM, Mike Travis <travis@sgi.com> wrote:
>> Vegard Nossum wrote:
>>> I reproduced it with gcc 4.1.2. I think the error is somewhere in kernel/sched.c.
>>>
>>> static int __build_sched_domains(const cpumask_t *cpu_map,
>>>                                  struct sched_domain_attr *attr)
>>> {
>>> ...
>>>         for (i = 0; i < MAX_NUMNODES; i++) {
>>> ...
>>>                 sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
>>> ...
>>>
>>> This code is calling into the allocator with a spurious value of i,
>>> which causes SLAB to use an index (of 4 in my case) that is out of
>>> bounds for its nodelist array (at least it hasn't been initialized).
>>>
>>> This bit of code (a bit further down, inside the same loop) is also dubious:
>>>
>>>                         sg = kmalloc_node(sizeof(struct sched_group),
>>>                                           GFP_KERNEL, i);
>>>                         if (!sg) {
>>>                                 printk(KERN_WARNING
>>>                                 "Can not alloc domain group for node %d\n", j);
>>>                                 goto error;
>>>                         }
>>>
>>> Where it passes i to kmalloc_node() but reports an allocation for node
>>> j. Which one is correct?
>>>
> 
> Hm, I think I'm wrong and the code is correct. However...
> 
>>> Hope this helps, will send an update if I find out more.
>>>
>>>
>>> Vegard
>>>
>> Thanks Vegard for tracking this down.  My thoughts were along the same
>> wavelength... ;-)
> 
> I applied this patch:
> @@ -7133,6 +7133,14 @@ static int __build_sched_domains(const cpumask_t *cpu_map,
>                 cpus_clear(*covered);
> 
>                 cpus_and(*nodemask, *nodemask, *cpu_map);
> +
> +               printk("node %d\n", i);
> +               for (j = 0; j < NR_CPUS; ++j)
> +                       printk("%c", cpu_isset(j, *nodemask) ? 'X' : '.');
> +               printk("\n");
> +
> +               printk("empty = %d\n", cpus_empty(*nodemask));
> +
>                 if (cpus_empty(*nodemask)) {
>                         sched_group_nodes[i] = NULL;
>                         continue;
> 
> and it shows some really strange output, maybe it makes sense to you:
> 
> (the X means cpu is in the node)
> 
> Total of 2 processors activated (11976.24 BogoMIPS).
> node 0
> XX..............................................................................
> ................................................................................
> ................................................................................
> ...............
> empty = 0
> node 1
> XX..............................................................................
> ................................................................................
> ................................................................................
> ...............
> empty = 0
> l3 = cachep->nodelists[0] (size-64) = ffff81003f824340
> node 2
> ................................................................................
> ................................................................................
> ................................................................................
> ...............
> empty = 1
> node 3
> ................................................................................
> ................................................................................
> ................................................................................
> ...............
> empty = 1
> node 4
> X...............................................................................
> ................................................................................
> ................................................................................
> ...............
> empty = 0
> 
> This is a P4 3.0GHz with 1 physical CPU (but HT, so two logical CPUs).
> Yet node 4 is claimed to have a cpu too. That's bogus!
> 
> (But I don't think it's an error in sched.c any more, probably the
> code that sets up the node maps.)
> 
> 
> Vegard
> 

Could you send me the full console log and your config file?  The setup of
the node_to_cpumask map is dependent on the early discovery (usually in the
apic code) and there's been some changes in that area recently.

Thanks,
Mike

* Re: linux-next: Tree for June 5
  2008-06-06 14:07                   ` Vegard Nossum
@ 2008-06-06 14:20                     ` Mike Travis
  2008-06-06 14:36                       ` Vegard Nossum
  0 siblings, 1 reply; 54+ messages in thread
From: Mike Travis @ 2008-06-06 14:20 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Ingo Molnar, Andrew Morton, Stephen Rothwell, linux-next, LKML

Vegard Nossum wrote:
> On Fri, Jun 6, 2008 at 3:50 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
>> On Fri, Jun 6, 2008 at 3:33 PM, Mike Travis <travis@sgi.com> wrote:
>>> Vegard Nossum wrote:
>>>> I reproduced it with gcc 4.1.2. I think the error is somewhere in kernel/sched.c.
>>>>
>>>> static int __build_sched_domains(const cpumask_t *cpu_map,
>>>>                                  struct sched_domain_attr *attr)
>>>> {
>>>> ...
>>>>         for (i = 0; i < MAX_NUMNODES; i++) {
>>>> ...
>>>>                 sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
>>>> ...
>>>>
>>>> This code is calling into the allocator with a spurious value of i,
>>>> which causes SLAB to use an index (of 4 in my case) that is out of
>>>> bounds for its nodelist array (at least it hasn't been initialized).
>>>>
>>>> This bit of code (a bit further down, inside the same loop) is also dubious:
>>>>
>>>>                         sg = kmalloc_node(sizeof(struct sched_group),
>>>>                                           GFP_KERNEL, i);
>>>>                         if (!sg) {
>>>>                                 printk(KERN_WARNING
>>>>                                 "Can not alloc domain group for node %d\n", j);
>>>>                                 goto error;
>>>>                         }
>>>>
>>>> Where it passes i to kmalloc_node() but reports an allocation for node
>>>> j. Which one is correct?
>>>>
>> Hm, I think I'm wrong and the code is correct. However...
>>
>>>> Hope this helps, will send an update if I find out more.
>>>>
>>>>
>>>> Vegard
>>>>
>>> Thanks Vegard for tracking this down.  My thoughts were along the same
>>> wavelength... ;-)
> 
> ...
> 
>> This is a P4 3.0GHz with 1 physical CPU (but HT, so two logical CPUs).
>> Yet node 4 is claimed to have a cpu too. That's bogus!
>>
>> (But I don't think it's an error in sched.c any more, probably the
>> code that sets up the node maps.)
> 
> Aha.
> 
> The error is of course that the node masks for nodes >= nr_node_ids are
> not valid, but this function ignores that:
> 
> cpumask_t *_node_to_cpumask_ptr(int node)
> {
>         if (node_to_cpumask_map == NULL) {
>                 printk(KERN_WARNING
>                         "_node_to_cpumask_ptr(%d): no node_to_cpumask_map!\n",
>                         node);
>                 dump_stack();
>                 return &cpu_online_map;
>         }
>         return &node_to_cpumask_map[node];
> }
> EXPORT_SYMBOL(_node_to_cpumask_ptr);
> 
> Notice the return statement. It needs to check if node < nr_node_ids.
> 
> 
> Vegard
> 


Thanks, yes, I had that same afterthought.  It should check the node
index if CONFIG_DEBUG_PER_CPU_MAPS is enabled.  One gotcha is that
nr_node_ids is initialized to MAX_NUMNODES until setup_node_to_cpumask_map()
sets it to the correct value.  So uses before that should be caught by
the earlier check.
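
The ordering problem can be sketched in userspace C as below. setup_map(), lookup() and the DEBUG_PER_CPU_MAPS macro are hypothetical stand-ins for setup_node_to_cpumask_map(), _node_to_cpumask_ptr() and CONFIG_DEBUG_PER_CPU_MAPS, and fallback_mask stands in for cpu_online_map. Until setup runs, nr_node_ids still equals MAX_NUMNODES, so a range check alone would accept node 4; it is the NULL-map check that catches those early callers.

```c
/*
 * Userspace sketch of the init-ordering gotcha. All names are
 * hypothetical stand-ins for the kernel symbols named in the text.
 */
#include <stdlib.h>

#define MAX_NUMNODES       512
#define DEBUG_PER_CPU_MAPS 1

static int nr_node_ids = MAX_NUMNODES;	/* placeholder until setup runs */
static unsigned long *node_map;		/* NULL until setup runs */
static unsigned long fallback_mask = ~0UL;	/* like cpu_online_map */
static unsigned long none_mask;		/* empty mask for bad nodes */

/* Called once the real number of possible nodes has been discovered. */
static int setup_map(int possible_nodes)
{
	node_map = calloc((size_t)possible_nodes, sizeof(*node_map));
	if (node_map == NULL)
		return -1;
	nr_node_ids = possible_nodes;
	return 0;
}

static unsigned long *lookup(int node)
{
	/* Early callers: the map does not exist yet, so this check fires. */
	if (node_map == NULL)
		return &fallback_mask;
#if DEBUG_PER_CPU_MAPS
	/* Only effective after setup, once nr_node_ids is the real count. */
	if (node < 0 || node >= nr_node_ids)
		return &none_mask;
#endif
	return &node_map[node];
}
```

After setup_map(2), lookup(4) is caught by the debug range check instead of the NULL check.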

Mike

* Re: linux-next: Tree for June 5
  2008-06-06 14:20                     ` Mike Travis
@ 2008-06-06 14:36                       ` Vegard Nossum
  2008-06-06 14:41                         ` Mike Travis
  2008-06-06 14:57                         ` Ingo Molnar
  0 siblings, 2 replies; 54+ messages in thread
From: Vegard Nossum @ 2008-06-06 14:36 UTC (permalink / raw)
  To: Mike Travis, Ingo Molnar
  Cc: Andrew Morton, Stephen Rothwell, linux-next, LKML

[-- Attachment #1: Type: text/plain, Size: 2658 bytes --]

On Fri, Jun 6, 2008 at 4:20 PM, Mike Travis <travis@sgi.com> wrote:
> Vegard Nossum wrote:
>> On Fri, Jun 6, 2008 at 3:50 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
>>> On Fri, Jun 6, 2008 at 3:33 PM, Mike Travis <travis@sgi.com> wrote:
>>>> Vegard Nossum wrote:
>>>>> I reproduced it with gcc 4.1.2. I think the error is somewhere in kernel/sched.c.
>>>>>
>>>>> static int __build_sched_domains(const cpumask_t *cpu_map,
>>>>>                                  struct sched_domain_attr *attr)
>>>>> {
>>>>> ...
>>>>>         for (i = 0; i < MAX_NUMNODES; i++) {
>>>>> ...
>>>>>                 sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
>>>>> ...
>>>>>
>>>>> This code is calling into the allocator with a spurious value of i,
>>>>> which causes SLAB to use an index (of 4 in my case) that is out of
>>>>> bounds for its nodelist array (at least it hasn't been initialized).
>>>>>

...

>> The error is of course that the node masks for nodes >= nr_node_ids are
>> not valid, but this function ignores that:
>>
>> cpumask_t *_node_to_cpumask_ptr(int node)
>> {
>>         if (node_to_cpumask_map == NULL) {
>>                 printk(KERN_WARNING
>>                         "_node_to_cpumask_ptr(%d): no node_to_cpumask_map!\n",
>>                         node);
>>                 dump_stack();
>>                 return &cpu_online_map;
>>         }
>>         return &node_to_cpumask_map[node];
>> }
>> EXPORT_SYMBOL(_node_to_cpumask_ptr);
>>
>> Notice the return statement. It needs to check if node < nr_node_ids.
>>

...

>
> Thanks, yes, I had that same afterthought.  It should check the node
> index if CONFIG_DEBUG_PER_CPU_MAPS is enabled.  One gotcha is that
> nr_node_ids is initialized to MAX_NUMNODES until setup_node_to_cpumask_map()
> sets it to the correct value.  So uses before that should be caught by
> the earlier check.

I think it should always check the node index. The code in
kernel/sched.c (see above) calls node_to_cpumask(i) on nodes 0 <= i <
MAX_NUMNODES and it WILL use invalid pointers. Or should
kernel/sched.c be changed to use nr_node_ids instead of MAX_NUMNODES?
I believe there are more places that do this than just sched.c.

I have attached two patches. The sched one fixes Andrew's boot
problem. The x86 one is untested, but I believe it is better to BUG
than silently corrupt some arbitrary memory. (Then the callers can be
found easily and fixed at least.)
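A minimal userspace sketch of why the MAX_NUMNODES loop bound goes wrong. It assumes a hypothetical kernel built with MAX_NUMNODES = 64 booted on a 4-node machine (nr_node_ids = 4), and assumes the node map only has nr_node_ids valid entries once the static map is gone. `invalid_indices()` is a made-up helper that only counts the out-of-range indices instead of dereferencing them:

```c
/* Hypothetical configuration: a kernel built for up to 64 nodes
 * running on a machine that only has nodes 0..3. */
#define MAX_NUMNODES 64

/* Assumed number of entries actually backed by the node map. */
static int nr_node_ids = 4;

/*
 * Count the loop indices in [0, limit) that have no backing entry in a
 * map sized for nr_node_ids nodes.  A real loop would dereference
 * node_to_cpumask_map[i] for each of them; here they are only counted.
 */
static int invalid_indices(int limit)
{
	int i, invalid = 0;

	for (i = 0; i < limit; i++)
		if (i >= nr_node_ids)
			invalid++;
	return invalid;
}
```

With these assumed values, invalid_indices(MAX_NUMNODES) returns 60 while invalid_indices(nr_node_ids) returns 0, which is the difference the sched patch below makes to each loop.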


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

[-- Attachment #2: 0001-sched-don-t-call-node_to_cpumask-on-nodes-nr_no.patch --]
[-- Type: text/x-patch; name=0001-sched-don-t-call-node_to_cpumask-on-nodes-nr_no.patch, Size: 1830 bytes --]

From 216dcbdec79d76c4d738f2c0aad41061f80564e4 Mon Sep 17 00:00:00 2001
From: Vegard Nossum <vegardno@ben.ifi.uio.no>
Date: Fri, 6 Jun 2008 16:31:19 +0200
Subject: [PATCH] sched: don't call node_to_cpumask() on nodes > nr_node_ids

Signed-off-by: Vegard Nossum <vegardno@ben.ifi.uio.no>
---
 kernel/sched.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index fc9ba90..8ab9cd6 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6770,7 +6770,7 @@ static void free_sched_groups(const cpumask_t *cpu_map, cpumask_t *nodemask)
 		if (!sched_group_nodes)
 			continue;
 
-		for (i = 0; i < MAX_NUMNODES; i++) {
+		for (i = 0; i < nr_node_ids; i++) {
 			struct sched_group *oldsg, *sg = sched_group_nodes[i];
 
 			*nodemask = node_to_cpumask(i);
@@ -7097,7 +7097,7 @@ static int __build_sched_domains(const cpumask_t *cpu_map,
 #endif
 
 	/* Set up physical groups */
-	for (i = 0; i < MAX_NUMNODES; i++) {
+	for (i = 0; i < nr_node_ids; i++) {
 		SCHED_CPUMASK_VAR(nodemask, allmasks);
 		SCHED_CPUMASK_VAR(send_covered, allmasks);
 
@@ -7121,7 +7121,7 @@ static int __build_sched_domains(const cpumask_t *cpu_map,
 					send_covered, tmpmask);
 	}
 
-	for (i = 0; i < MAX_NUMNODES; i++) {
+	for (i = 0; i < nr_node_ids; i++) {
 		/* Set up node groups */
 		struct sched_group *sg, *prev;
 		SCHED_CPUMASK_VAR(nodemask, allmasks);
@@ -7160,9 +7160,9 @@ static int __build_sched_domains(const cpumask_t *cpu_map,
 		cpus_or(*covered, *covered, *nodemask);
 		prev = sg;
 
-		for (j = 0; j < MAX_NUMNODES; j++) {
+		for (j = 0; j < nr_node_ids; j++) {
 			SCHED_CPUMASK_VAR(notcovered, allmasks);
-			int n = (i + j) % MAX_NUMNODES;
+			int n = (i + j) % nr_node_ids;
 			node_to_cpumask_ptr(pnodemask, n);
 
 			cpus_complement(*notcovered, *covered);
-- 
1.5.3.1


[-- Attachment #3: 0001-x86-don-t-return-invalid-pointers-from-node_to_cpum.patch --]
[-- Type: text/x-patch; name=0001-x86-don-t-return-invalid-pointers-from-node_to_cpum.patch, Size: 966 bytes --]

From b993e7349b954555715c2adad690711465bbd60c Mon Sep 17 00:00:00 2001
From: Vegard Nossum <vegard.nossum@gmail.com>
Date: Fri, 6 Jun 2008 16:33:25 +0200
Subject: [PATCH] x86: don't return invalid pointers from node_to_cpumask()

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
---
 arch/x86/kernel/setup.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 8ecf7b4..8411c55 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -385,6 +385,7 @@ cpumask_t *_node_to_cpumask_ptr(int node)
 		dump_stack();
 		return &cpu_online_map;
 	}
+	BUG_ON(node >= nr_node_ids);
 	return &node_to_cpumask_map[node];
 }
 EXPORT_SYMBOL(_node_to_cpumask_ptr);
@@ -400,6 +401,7 @@ cpumask_t node_to_cpumask(int node)
 		dump_stack();
 		return cpu_online_map;
 	}
+	BUG_ON(node >= nr_node_ids);
 	return node_to_cpumask_map[node];
 }
 EXPORT_SYMBOL(node_to_cpumask);
-- 
1.5.4.1



* Re: linux-next: Tree for June 5
  2008-06-06 14:36                       ` Vegard Nossum
@ 2008-06-06 14:41                         ` Mike Travis
  2008-06-06 14:51                           ` Mike Travis
  2008-06-06 14:57                         ` Ingo Molnar
  1 sibling, 1 reply; 54+ messages in thread
From: Mike Travis @ 2008-06-06 14:41 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Ingo Molnar, Andrew Morton, Stephen Rothwell, linux-next, LKML

Vegard Nossum wrote:
> On Fri, Jun 6, 2008 at 4:20 PM, Mike Travis <travis@sgi.com> wrote:
>> Vegard Nossum wrote:
>>> On Fri, Jun 6, 2008 at 3:50 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
>>>> On Fri, Jun 6, 2008 at 3:33 PM, Mike Travis <travis@sgi.com> wrote:
>>>>> Vegard Nossum wrote:
>>>>>> I reproduced it with gcc 4.1.2. I think the error is somewhere in kernel/sched.c.
>>>>>>
>>>>>> static int __build_sched_domains(const cpumask_t *cpu_map,
>>>>>>                                  struct sched_domain_attr *attr)
>>>>>> {
>>>>>> ...
>>>>>>         for (i = 0; i < MAX_NUMNODES; i++) {
>>>>>> ...
>>>>>>                 sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
>>>>>> ...
>>>>>>
>>>>>> This code is calling into the allocator with a spurious value of i,
>>>>>> which causes SLAB to use an index (of 4 in my case) that is out of
>>>>>> bounds for its nodelist array (at least it hasn't been initialized).
>>>>>>
> 
> ...
> 
>>> The error is of course that the node masks for nodes >= nr_node_ids are
>>> not valid. While this function ignores that:
>>>
>>> cpumask_t *_node_to_cpumask_ptr(int node)
>>> {
>>>         if (node_to_cpumask_map == NULL) {
>>>                 printk(KERN_WARNING
>>>                         "_node_to_cpumask_ptr(%d): no node_to_cpumask_map!\n",
>>>                         node);
>>>                 dump_stack();
>>>                 return &cpu_online_map;
>>>         }
>>>         return &node_to_cpumask_map[node];
>>> }
>>> EXPORT_SYMBOL(_node_to_cpumask_ptr);
>>>
>>> Notice the return statement. It needs to check if node < nr_node_ids.
>>>
> 
> ...
> 
>> Thanks, yes, I had that same afterthought.  It should check the node
>> index if CONFIG_DEBUG_PER_CPU_MAPS is enabled.  One gotcha is that
>> nr_node_ids is initialized to MAX_NUMNODES until setup_node_to_cpumask_map()
>> sets it to the correct value.  So uses before that should be caught by
>> the earlier check.
> 
> I think it should always check the node index. The code in
> kernel/sched.c (see above) calls node_to_cpumask(i) on nodes 0 <= i <
> MAX_NUMNODES and it WILL use invalid pointers. Or should
> kernel/sched.c be changed to use nr_node_ids instead of MAX_NUMNODES?
> I believe there are more places that do this than just sched.c.

Yes, using MAX_NUMNODES is usually incorrect (the same for NR_CPUS).
When I originally submitted the patch I searched for all usages to
make sure they were correct.  Unfortunately, later changes might not
have been validated.  (Hmm, maybe adding to checkpatch.pl a similar
warning as it now does for NR_CPUS...?)

> 
> I have attached two patches. The sched one fixes Andrew's boot
> problem. The x86 one is untested, but I believe it is better to BUG
> than silently corrupt some arbitrary memory. (Then the callers can be
> found easily and fixed at least.)

Andrew (or maybe it was Ingo) had suggested that instead of BUG use
dump_stack() and continue whenever possible.  In this case returning
an empty cpumask would be correct.

Thanks,
Mike


* Re: linux-next: Tree for June 5
  2008-06-06 14:41                         ` Mike Travis
@ 2008-06-06 14:51                           ` Mike Travis
  2008-06-06 14:54                             ` Mike Travis
  0 siblings, 1 reply; 54+ messages in thread
From: Mike Travis @ 2008-06-06 14:51 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Ingo Molnar, Andrew Morton, Stephen Rothwell, linux-next, LKML

Mike Travis wrote:
> Vegard Nossum wrote:
>> On Fri, Jun 6, 2008 at 4:20 PM, Mike Travis <travis@sgi.com> wrote:
>>> Vegard Nossum wrote:
>>>> On Fri, Jun 6, 2008 at 3:50 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
>>>>> On Fri, Jun 6, 2008 at 3:33 PM, Mike Travis <travis@sgi.com> wrote:
>>>>>> Vegard Nossum wrote:
>>>>>>> I reproduced it with gcc 4.1.2. I think the error is somewhere in kernel/sched.c.
>>>>>>>
>>>>>>> static int __build_sched_domains(const cpumask_t *cpu_map,
>>>>>>>                                  struct sched_domain_attr *attr)
>>>>>>> {
>>>>>>> ...
>>>>>>>         for (i = 0; i < MAX_NUMNODES; i++) {
>>>>>>> ...
>>>>>>>                 sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
>>>>>>> ...
>>>>>>>
>>>>>>> This code is calling into the allocator with a spurious value of i,
>>>>>>> which causes SLAB to use an index (of 4 in my case) that is out of
>>>>>>> bounds for its nodelist array (at least it hasn't been initialized).
>>>>>>>
>> ...
>>
>>>> The error is of course that the node masks for nodes >= nr_node_ids are
>>>> not valid. While this function ignores that:
>>>>
>>>> cpumask_t *_node_to_cpumask_ptr(int node)
>>>> {
>>>>         if (node_to_cpumask_map == NULL) {
>>>>                 printk(KERN_WARNING
>>>>                         "_node_to_cpumask_ptr(%d): no node_to_cpumask_map!\n",
>>>>                         node);
>>>>                 dump_stack();
>>>>                 return &cpu_online_map;
>>>>         }
>>>>         return &node_to_cpumask_map[node];
>>>> }
>>>> EXPORT_SYMBOL(_node_to_cpumask_ptr);
>>>>
>>>> Notice the return statement. It needs to check if node < nr_node_ids.
>>>>
>> ...
>>
>>> Thanks, yes, I had that same afterthought.  It should check the node
>>> index if CONFIG_DEBUG_PER_CPU_MAPS is enabled.  One gotcha is that
>>> nr_node_ids is initialized to MAX_NUMNODES until setup_node_to_cpumask_map()
>>> sets it to the correct value.  So uses before that should be caught by
>>> the earlier check.
>> I think it should always check the node index. The code in
>> kernel/sched.c (see above) calls node_to_cpumask(i) on nodes 0 <= i <
>> MAX_NUMNODES and it WILL use invalid pointers. Or should
>> kernel/sched.c be changed to use nr_node_ids instead of MAX_NUMNODES?
>> I believe there are more places that do this than just sched.c.
> 
> Yes, using MAX_NUMNODES is usually incorrect (the same for NR_CPUS).
> When I originally submitted the patch I searched for all usages to
> make sure they were correct.  Unfortunately, later changes might not
> have been validated.  (Hmm, maybe adding to checkpatch.pl a similar
> warning as it now does for NR_CPUS...?)
> 
>> I have attached two patches. The sched one fixes Andrew's boot
>> problem. The x86 one is untested, but I believe it is better to BUG
>> than silently corrupt some arbitrary memory. (Then the callers can be
>> found easily and fixed at least.)
> 
> Andrew (or maybe it was Ingo) had suggested that instead of BUG use
> dump_stack() and continue whenever possible.  In this case returning
> an empty cpumask would be correct.
> 
> Thanks,
> Mike

Aha, here's the missing patch:

a953e4597abd51b74c99e0e3b7074532a60fd031




* Re: linux-next: Tree for June 5
  2008-06-06 14:51                           ` Mike Travis
@ 2008-06-06 14:54                             ` Mike Travis
  0 siblings, 0 replies; 54+ messages in thread
From: Mike Travis @ 2008-06-06 14:54 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Ingo Molnar, Andrew Morton, Stephen Rothwell, linux-next, LKML

Mike Travis wrote:
> Mike Travis wrote:
>> Vegard Nossum wrote:
>>> On Fri, Jun 6, 2008 at 4:20 PM, Mike Travis <travis@sgi.com> wrote:
>>>> Vegard Nossum wrote:
>>>>> On Fri, Jun 6, 2008 at 3:50 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
>>>>>> On Fri, Jun 6, 2008 at 3:33 PM, Mike Travis <travis@sgi.com> wrote:
>>>>>>> Vegard Nossum wrote:
>>>>>>>> I reproduced it with gcc 4.1.2. I think the error is somewhere in kernel/sched.c.
>>>>>>>>
>>>>>>>> static int __build_sched_domains(const cpumask_t *cpu_map,
>>>>>>>>                                  struct sched_domain_attr *attr)
>>>>>>>> {
>>>>>>>> ...
>>>>>>>>         for (i = 0; i < MAX_NUMNODES; i++) {
>>>>>>>> ...
>>>>>>>>                 sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
>>>>>>>> ...
>>>>>>>>
>>>>>>>> This code is calling into the allocator with a spurious value of i,
>>>>>>>> which causes SLAB to use an index (of 4 in my case) that is out of
>>>>>>>> bounds for its nodelist array (at least it hasn't been initialized).
>>>>>>>>
>>> ...
>>>
>>>>> The error is of course that the node masks for nodes >= nr_node_ids are
>>>>> not valid. While this function ignores that:
>>>>>
>>>>> cpumask_t *_node_to_cpumask_ptr(int node)
>>>>> {
>>>>>         if (node_to_cpumask_map == NULL) {
>>>>>                 printk(KERN_WARNING
>>>>>                         "_node_to_cpumask_ptr(%d): no node_to_cpumask_map!\n",
>>>>>                         node);
>>>>>                 dump_stack();
>>>>>                 return &cpu_online_map;
>>>>>         }
>>>>>         return &node_to_cpumask_map[node];
>>>>> }
>>>>> EXPORT_SYMBOL(_node_to_cpumask_ptr);
>>>>>
>>>>> Notice the return statement. It needs to check if node < nr_node_ids.
>>>>>
>>> ...
>>>
>>>> Thanks, yes, I had that same afterthought.  It should check the node
>>>> index if CONFIG_DEBUG_PER_CPU_MAPS is enabled.  One gotcha is that
>>>> nr_node_ids is initialized to MAX_NUMNODES until setup_node_to_cpumask_map()
>>>> sets it to the correct value.  So uses before that should be caught by
>>>> the earlier check.
>>> I think it should always check the node index. The code in
>>> kernel/sched.c (see above) calls node_to_cpumask(i) on nodes 0 <= i <
>>> MAX_NUMNODES and it WILL use invalid pointers. Or should
>>> kernel/sched.c be changed to use nr_node_ids instead of MAX_NUMNODES?
>>> I believe there are more places that do this than just sched.c.
>> Yes, using MAX_NUMNODES is usually incorrect (the same for NR_CPUS).
>> When I originally submitted the patch I searched for all usages to
>> make sure they were correct.  Unfortunately, later changes might not
>> have been validated.  (Hmm, maybe adding to checkpatch.pl a similar
>> warning as it now does for NR_CPUS...?)
>>
>>> I have attached two patches. The sched one fixes Andrew's boot
>>> problem. The x86 one is untested, but I believe it is better to BUG
>>> than silently corrupt some arbitrary memory. (Then the callers can be
>>> found easily and fixed at least.)
>> Andrew (or maybe it was Ingo) had suggested that instead of BUG use
>> dump_stack() and continue whenever possible.  In this case returning
>> an empty cpumask would be correct.
>>
>> Thanks,
>> Mike
> 
> Aha, here's the missing patch:
> 
> a953e4597abd51b74c99e0e3b7074532a60fd031
> 

Oops, message got away from me prematurely... ;-)

Ingo - can we push this from tip to linux-next?

Thanks,
Mike


* Re: linux-next: Tree for June 5
  2008-06-06 14:36                       ` Vegard Nossum
  2008-06-06 14:41                         ` Mike Travis
@ 2008-06-06 14:57                         ` Ingo Molnar
  2008-06-06 15:01                           ` Ingo Molnar
                                             ` (2 more replies)
  1 sibling, 3 replies; 54+ messages in thread
From: Ingo Molnar @ 2008-06-06 14:57 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Mike Travis, Andrew Morton, Stephen Rothwell, linux-next, LKML


* Vegard Nossum <vegard.nossum@gmail.com> wrote:

> > Thanks, yes, I had that same afterthought.  It should check the node
> > index if CONFIG_DEBUG_PER_CPU_MAPS is enabled.  One gotcha is that
> > nr_node_ids is initialized to MAX_NUMNODES until
> > setup_node_to_cpumask_map() sets it to the correct value.  So uses 
> > before that should be caught by the earlier check.
> 
> I think it should always check the node index. The code in 
> kernel/sched.c (see above) calls node_to_cpumask(i) on nodes 0 <= i <
> MAX_NUMNODES and it WILL use invalid pointers. Or should 
> kernel/sched.c be changed to use nr_node_ids instead of MAX_NUMNODES? 
> I believe there are more places that do this than just sched.c.
> 
> I have attached two patches. The sched one fixes Andrew's boot 
> problem. The x86 one is untested, but I believe it is better to BUG 
> than silently corrupt some arbitrary memory. (Then the callers can be 
> found easily and fixed at least.)

nice fixes! I have applied both of them to -tip, this one to 
tip/sched-devel:

> Subject: [PATCH] sched: don't call node_to_cpumask() on nodes > nr_node_ids

AFAICS this is not yet required for v2.6.26, as the requirement to never 
iterate to MAX_NUMNODES and call nr_cpus_node() with the index only got 
introduced by Mike's patch.

and this one to tip/x86/numa:

> Subject: [PATCH] x86: don't return invalid pointers from node_to_cpumask()

and i've undone the revert of "x86: remove the static 256k 
node_to_cpumask_map" as well.

agreed?

	Ingo


* Re: linux-next: Tree for June 5
  2008-06-06 14:57                         ` Ingo Molnar
@ 2008-06-06 15:01                           ` Ingo Molnar
  2008-06-06 15:13                             ` Vegard Nossum
  2008-06-06 15:04                           ` Mike Travis
  2008-06-06 15:13                           ` Ingo Molnar
  2 siblings, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2008-06-06 15:01 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Mike Travis, Andrew Morton, Stephen Rothwell, linux-next, LKML


* Ingo Molnar <mingo@elte.hu> wrote:

> > Subject: [PATCH] sched: don't call node_to_cpumask() on nodes > 
> > nr_node_ids
> 
> AFAICS this is not yet required for v2.6.26, as the requirement to 
> never iterate to MAX_NUMNODES and call nr_cpus_node() with the index 
> only got introduced by Mike's patch.

the one below is needed as well i think.

	Ingo
---
 kernel/sched.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: tip/kernel/sched.c
===================================================================
--- tip.orig/kernel/sched.c
+++ tip/kernel/sched.c
@@ -6576,9 +6576,9 @@ static int find_next_best_node(int node,
 
 	min_val = INT_MAX;
 
-	for (i = 0; i < MAX_NUMNODES; i++) {
+	for (i = 0; i < nr_node_ids; i++) {
 		/* Start at @node */
-		n = (node + i) % MAX_NUMNODES;
+		n = (node + i) % nr_node_ids;
 
 		if (!nr_cpus_node(n))
 			continue;
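The wrap-around in find_next_best_node() is where the modulus matters: with `% MAX_NUMNODES` a scan starting at `node` also produces ids from nr_node_ids up to MAX_NUMNODES - 1, which are then handed to nr_cpus_node(). A small sketch of both scan orders, assuming a hypothetical 4-node machine and MAX_NUMNODES = 64; `scan_order()` and `count_invalid()` are made-up helpers, not kernel code:

```c
/* Hypothetical compile-time limit; the machine itself has fewer nodes. */
#define MAX_NUMNODES 64

/*
 * Fill out[0..limit-1] with the order in which a scan starting at
 * 'start' visits node ids, wrapping modulo 'limit' (the patched loop
 * uses limit == nr_node_ids).
 */
static void scan_order(int start, int limit, int *out)
{
	int i;

	for (i = 0; i < limit; i++)
		out[i] = (start + i) % limit;
}

/* Count how many of the first n visited ids are not real nodes. */
static int count_invalid(const int *ids, int n, int nr_nodes)
{
	int i, invalid = 0;

	for (i = 0; i < n; i++)
		if (ids[i] >= nr_nodes)
			invalid++;
	return invalid;
}
```

With 4 nodes and start = 2 the bounded scan visits 2, 3, 0, 1; the same scan modulo MAX_NUMNODES reaches id 4 on its third step and yields 60 ids that are not real nodes in total.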


* Re: linux-next: Tree for June 5
  2008-06-06 14:57                         ` Ingo Molnar
  2008-06-06 15:01                           ` Ingo Molnar
@ 2008-06-06 15:04                           ` Mike Travis
  2008-06-06 15:20                             ` Mike Travis
  2008-06-06 15:13                           ` Ingo Molnar
  2 siblings, 1 reply; 54+ messages in thread
From: Mike Travis @ 2008-06-06 15:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Vegard Nossum, Andrew Morton, Stephen Rothwell, linux-next, LKML

Ingo Molnar wrote:
> * Vegard Nossum <vegard.nossum@gmail.com> wrote:
> 
>>> Thanks, yes, I had that same afterthought.  It should check the node
>>> index if CONFIG_DEBUG_PER_CPU_MAPS is enabled.  One gotcha is that
>>> nr_node_ids is initialized to MAX_NUMNODES until
>>> setup_node_to_cpumask_map() sets it to the correct value.  So uses 
>>> before that should be caught by the earlier check.
>> I think it should always check the node index. The code in 
>> kernel/sched.c (see above) calls node_to_cpumask(i) on nodes 0 <= i <
>> MAX_NUMNODES and it WILL use invalid pointers. Or should 
>> kernel/sched.c be changed to use nr_node_ids instead of MAX_NUMNODES? 
>> I believe there are more places that do this than just sched.c.
>>
>> I have attached two patches. The sched one fixes Andrew's boot 
>> problem. The x86 one is untested, but I believe it is better to BUG 
>> than silently corrupt some arbitrary memory. (Then the callers can be 
>> found easily and fixed at least.)
> 
> nice fixes! I have applied both of them to -tip, this one to 
> tip/sched-devel:
> 
>> Subject: [PATCH] sched: don't call node_to_cpumask() on nodes > nr_node_ids
> 
> AFAICS this is not yet required for v2.6.26, as the requirement to never 
> iterate to MAX_NUMNODES and call nr_cpus_node() with the index only got 
> introduced by Mike's patch.
> 
> and this one to tip/x86/numa:
> 
>> Subject: [PATCH] x86: don't return invalid pointers from node_to_cpumask()
> 
> and i've undone the revert of "x86: remove the static 256k 
> node_to_cpumask_map" as well.
> 
> agreed?
> 
> 	Ingo

Hi Ingo,

My -tip branch has:

	a953e4597abd51b74c99e0e3b7074532a60fd031

	sched: replace MAX_NUMNODES with nr_node_ids in kernel/sched.c
	committed: 2008-05-23 09:22:17

The check for node >= nr_node_ids should still be included, however (at
least when CONFIG_DEBUG_PER_CPU_MAPS is enabled).

Thanks,
Mike


* Re: linux-next: Tree for June 5
  2008-06-06 14:57                         ` Ingo Molnar
  2008-06-06 15:01                           ` Ingo Molnar
  2008-06-06 15:04                           ` Mike Travis
@ 2008-06-06 15:13                           ` Ingo Molnar
  2 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2008-06-06 15:13 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Mike Travis, Andrew Morton, Stephen Rothwell, linux-next, LKML


* Ingo Molnar <mingo@elte.hu> wrote:

> > Subject: [PATCH] sched: don't call node_to_cpumask() on nodes > nr_node_ids
> 
> AFAICS this is not yet required for v2.6.26, as the requirement to 
> never iterate to MAX_NUMNODES and call nr_cpus_node() with the index 
> only got introduced by Mike's patch.

note that we already had these changes in tip/cpus4096; the fact that
PERCPU_DEBUG didn't properly detect the problem was the icing on the cake.

	Ingo


* Re: linux-next: Tree for June 5
  2008-06-06 15:01                           ` Ingo Molnar
@ 2008-06-06 15:13                             ` Vegard Nossum
  2008-06-06 15:23                               ` Ingo Molnar
  0 siblings, 1 reply; 54+ messages in thread
From: Vegard Nossum @ 2008-06-06 15:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mike Travis, Andrew Morton, Stephen Rothwell, linux-next, LKML

On Fri, Jun 6, 2008 at 5:01 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Ingo Molnar <mingo@elte.hu> wrote:
>
>> > Subject: [PATCH] sched: don't call node_to_cpumask() on nodes >
>> > nr_node_ids
>>
>> AFAICS this is not yet required for v2.6.26, as the requirement to
>> never iterate to MAX_NUMNODES and call nr_cpus_node() with the index
>> only got introduced by Mike's patch.
>
> the one below is needed as well i think.

Yeah. I think you had better take Mike's patches; I don't even trust
that my patch and your fixlet do everything correctly.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036


* Re: linux-next: Tree for June 5
  2008-06-06 15:04                           ` Mike Travis
@ 2008-06-06 15:20                             ` Mike Travis
  2008-06-06 15:33                               ` Ingo Molnar
  0 siblings, 1 reply; 54+ messages in thread
From: Mike Travis @ 2008-06-06 15:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Vegard Nossum, Andrew Morton, Stephen Rothwell, linux-next, LKML

Mike Travis wrote:
> 
> Hi Ingo,
> 
> My -tip branch has:
> 
> 	a953e4597abd51b74c99e0e3b7074532a60fd031
> 
> 	sched: replace MAX_NUMNODES with nr_node_ids in kernel/sched.c
> 	committed: 2008-05-23 09:22:17
> 
> The check for node >= nr_node_ids should still be included, however (at
> least when CONFIG_DEBUG_PER_CPU_MAPS is enabled).
> 
> Thanks,
> Mike

Note this was in the following set of patches:


 -      Subject: [PATCH 01/10] percpu: Use a kconfig variable to signal arch specific percpu setup
 -      Subject: [PATCH 00/11] x86: cleanup early per cpu variables/accesses v5-folded
 -      Subject: [PATCH 04/11] x86: remove the static 256k node_to_cpumask_map
 -      Subject: [PATCH 03/11] x86: restore pda nodenumber field
 -      Subject: [PATCH 08/11] x86: Add performance variants of cpumask operators
 -      Subject: [PATCH 09/11] x86: Use performance variant for_each_cpu_mask_nr
 -      Subject: [PATCH 02/11] x86: cleanup early per cpu variables/accesses v4
 -      Subject: [PATCH 06/11] cpu: change some globals to statics in drivers/base/cpu.c v2
 -      Subject: [PATCH 07/11] x86: remove static boot_cpu_pda array
 -      Subject: [PATCH 05/11] sched: replace MAX_NUMNODES with nr_node_ids in kernel/sched.c
 -      Subject: [PATCH 11/11] net: Pass reference to cpumask variable in net/sunrpc/svc.c

 -	Date: Fri, 25 Apr 2008 17:15:48 -0700

(Some have been further modified by later patches.)

The patch ordering was incorrect: I removed the node_to_cpumask_map before I
replaced MAX_NUMNODES with nr_node_ids; it should have been the other way around.




* Re: linux-next: Tree for June 5
  2008-06-06 15:13                             ` Vegard Nossum
@ 2008-06-06 15:23                               ` Ingo Molnar
  2008-06-06 15:52                                 ` Mike Travis
  0 siblings, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2008-06-06 15:23 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Mike Travis, Andrew Morton, Stephen Rothwell, linux-next, LKML,
	Thomas Gleixner


* Vegard Nossum <vegard.nossum@gmail.com> wrote:

> >> AFAICS this is not yet required for v2.6.26, as the requirement to 
> >> never iterate to MAX_NUMNODES and call nr_cpus_node() with the 
> >> index only got introduced by Mike's patch.
> >
> > the one below is needed as well i think.
> 
> Yeah. I think you had better take Mike's patches; I don't even trust
> that my patch and your fixlet do everything correctly.

yep, just discovered that we had them already ;-)

Thomas has just scripted up a new "detect if a commit is not in 
linux-next yet" script that should avoid such problems in the future.

your second patch is still wanted, it would have detected the problem 
earlier.

	Ingo


* Re: linux-next: Tree for June 5
  2008-06-06 15:20                             ` Mike Travis
@ 2008-06-06 15:33                               ` Ingo Molnar
  0 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2008-06-06 15:33 UTC (permalink / raw)
  To: Mike Travis
  Cc: Vegard Nossum, Andrew Morton, Stephen Rothwell, linux-next, LKML


* Mike Travis <travis@sgi.com> wrote:

> The patch ordering was incorrect: I removed the node_to_cpumask_map
> before I replaced MAX_NUMNODES with nr_node_ids; it should have been
> the other way around.

It took a combination of 4 failures along the line: the debug check was
not complete, the ordering was bad and thus the split-up was bad as well,
then one component went missing in linux-next, and the combined effect
created this bug, which needed a bisection by Andrew and Vegard to figure
out.

the moral: we now tightened the debug check, fixed the integration bug 
and tightened the checks we have for patch propagation. (Thomas just 
added the new tip-check-integration script to tip/tip that implements 
this)

	Ingo


* Re: linux-next: Tree for June 5
  2008-06-06 15:23                               ` Ingo Molnar
@ 2008-06-06 15:52                                 ` Mike Travis
  2008-06-18  8:26                                   ` Ingo Molnar
  0 siblings, 1 reply; 54+ messages in thread
From: Mike Travis @ 2008-06-06 15:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Vegard Nossum, Andrew Morton, Stephen Rothwell, linux-next, LKML,
	Thomas Gleixner

Ingo Molnar wrote:
> * Vegard Nossum <vegard.nossum@gmail.com> wrote:
> 
>>>> AFAICS this is not yet required for v2.6.26, as the requirement to 
>>>> never iterate to MAX_NUMNODES and call nr_cpus_node() with the 
>>>> index only got introduced by Mike's patch.
>>> the one below is needed as well i think.
>> Yeah. I think you had better take Mike's patches; I don't even trust
>> that my patch and your fixlet do everything correctly.
> 
> yep, just discovered that we had them already ;-)
> 
> Thomas has just scripted up a new "detect if a commit is not in 
> linux-next yet" script that should avoid such problems in the future.
> 
> your second patch is still wanted, it would have detected the problem 
> earlier.
> 
> 	Ingo

Thanks, yes, I agree.  However I would like to modify it slightly:
---
Subject: [PATCH 1/1] x86: Add check for node passed to node_to_cpumask

  * When CONFIG_DEBUG_PER_CPU_MAPS is set, the node passed to
    node_to_cpumask and node_to_cpumask_ptr should be validated.

Signed-off-by: Mike Travis <travis@sgi.com>
---
 arch/x86/kernel/setup.c |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

--- linux-2.6.tip.orig/arch/x86/kernel/setup.c
+++ linux-2.6.tip/arch/x86/kernel/setup.c
@@ -399,6 +399,10 @@ int early_cpu_to_node(int cpu)
 	return per_cpu(x86_cpu_to_node_map, cpu);
 }
 
+
+/* empty cpumask */
+static cpumask_t cpu_mask_none;
+
 /*
  * Returns a pointer to the bitmask of CPUs on Node 'node'.
  */
@@ -411,6 +415,13 @@ cpumask_t *_node_to_cpumask_ptr(int node
 		dump_stack();
 		return &cpu_online_map;
 	}
+	if (node >= nr_node_ids) {
+		printk(KERN_WARNING
+			"_node_to_cpumask_ptr(%d): node > nr_node_ids(%d)\n",
+			node, nr_node_ids);
+		dump_stack();
+		return &cpu_mask_none;
+	}
 	return &node_to_cpumask_map[node];
 }
 EXPORT_SYMBOL(_node_to_cpumask_ptr);
@@ -426,6 +437,13 @@ cpumask_t node_to_cpumask(int node)
 		dump_stack();
 		return cpu_online_map;
 	}
+	if (node >= nr_node_ids) {
+		printk(KERN_WARNING
+			"node_to_cpumask(%d): node > nr_node_ids(%d)\n",
+			node, nr_node_ids);
+		dump_stack();
+		return cpu_mask_none;
+	}
 	return node_to_cpumask_map[node];
 }
 EXPORT_SYMBOL(node_to_cpumask);
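A userspace model of the behaviour this version aims for (warn, then hand back an empty mask instead of calling BUG()). It is a sketch under several assumptions: cpumask_t is reduced to an unsigned long with one bit per CPU, dump_stack() is replaced by a message on stderr, the 2-node layout and its masks are made up, and the `node < 0` test is an extra guard that the patch itself does not have:

```c
#include <stdio.h>

/* Stand-in for the kernel type: one bit per CPU, enough for this example. */
typedef unsigned long cpumask_t;

static cpumask_t cpu_online_map = 0xffUL;             /* hypothetical: CPUs 0-7 online */
static cpumask_t cpu_mask_none;                        /* all bits clear, as in the patch */
static cpumask_t example_map[2] = { 0x0fUL, 0xf0UL };  /* hypothetical nodes 0 and 1 */
static cpumask_t *node_to_cpumask_map = example_map;
static int nr_node_ids = 2;

/* Model of the patched _node_to_cpumask_ptr(): degrade instead of crashing. */
static cpumask_t *node_to_cpumask_ptr_model(int node)
{
	if (node_to_cpumask_map == NULL) {
		/* Map not set up yet: fall back to all online CPUs. */
		fprintf(stderr, "node_to_cpumask_ptr(%d): no node_to_cpumask_map!\n", node);
		return &cpu_online_map;
	}
	if (node < 0 || node >= nr_node_ids) {
		/* Out-of-range node: warn and report "no CPUs on this node". */
		fprintf(stderr, "node_to_cpumask_ptr(%d): node >= nr_node_ids(%d)\n",
			node, nr_node_ids);
		return &cpu_mask_none;
	}
	return &node_to_cpumask_map[node];
}
```

Returning the empty mask means a caller iterating over that node's CPUs simply finds none, which matches Mike's point that an empty cpumask is the correct answer here; the trade-off against BUG_ON() is that the faulty caller keeps running and shows up only as a warning with a stack trace in the log.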


* Re: linux-next: Tree for June 5
  2008-06-06 10:47                     ` Ingo Molnar
@ 2008-06-06 16:37                       ` Ingo Molnar
  0 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2008-06-06 16:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Stephen Rothwell, linux-next, LKML, the arch/x86 maintainers


* Ingo Molnar <mingo@elte.hu> wrote:

> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > > what do you mean? We are testing commits that everybody will run and 
> > > are pre-filtering them for sanity and stability before they hit 
> > > linux-next.
> > 
> > One doesn't test commits - one tests a tree.  And the -tip tree is 
> > 2.6.26-rc5 plus a bunch of x86 changes. [...]
> 
> no, 90%+ of all bugs are not due to tree interaction effects but are 
> caused by individual commits, triggerable on a particular 
> system/workload. (Our historic regression list is the proof for that, 
> can give you itemized statistics if you want.)
> 
> also, the -tip tree is not "2.6.26-rc5 plus a bunch of x86 changes" but 
> v2.6.26-rc5-84-g39b945a plus 75 topic trees we maintain:
> 
> build, core/futex-64bit, core/kill-the-BKL, core/locking, core/percpu, 
> core/printk, core/rcu, core/rodata, core/softirq, core/softlockup, 
> core/stacktrace, core/urgent, cpus4096, genirq, hrtimers, kmemcheck, 
> out-of-tree, pci-for-jesse, safe-poison-pointers, sched, sched-devel, 
> scratch, stackprotector, timers/clockevents, timers/hpet, 
> timers/hrtimers, timers/nohz, timers/posixtimers, tip, tracing/ftrace, 
> tracing/ftrace-mergefixups, tracing/immediates, tracing/markers, 
> tracing/mmiotrace, tracing/mmiotrace-mergefixups, tracing/nmisafe, 
> tracing/sched_markers, tracing/stopmachine-allcpus, tracing/sysprof, 
> tracing/textedit, x86/apic, x86/apm, x86/bitops, x86/build, x86/checkme, 
> x86/cleanups, x86/cpa, x86/cpu, x86/defconfig, x86/gart, x86/i8259, 
> x86/intel, x86/irq, x86/irqstats, x86/kconfig, x86/ldt, x86/mce, 
> x86/memtest, x86/mmio, x86/mpparse, x86/nmi, x86/numa, x86/numa-fixes, 
> x86/pat, x86/pebs, x86/ptemask, x86/resumetrace, x86/scratch, x86/setup, 
> x86/threadinfo, x86/timers, x86/urgent, x86/uv, x86/vdso, x86/xen, 
> x86/xsave.
> 
> most of which are in linux-next (around 70%), or will be shortly in 
> linux-next (more than 90%).

we created some stats and in fact not 70% but 80% of all -tip commits 
are in linux-next right now.

Here are the full -tip commit stats (merge commits excluded):

  total commits in auto-next-branches: 617
  auto-branches commits in linux-next: 553
  total commits in tip/auto-latest:    686
  total commits in tip/master:         699

that proportion should go up to 90% on the next linux-next iteration.
(barring any problems with the new topics)
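
For reference, the percentages implied by the counts above, assuming the 80% figure uses tip/auto-latest as the denominator (the message does not say which total was used):

```latex
\frac{553}{686} \approx 80.6\% \ (\text{tip/auto-latest}), \qquad
\frac{553}{699} \approx 79.1\% \ (\text{tip/master}), \qquad
\frac{553}{617} \approx 89.6\% \ (\text{auto-next branches})
```

Either whole-tree denominator gives roughly 80%; measured against the auto-next branches alone, the share is already close to 90%.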

	Ingo

* Re: linux-next: Tree for June 5
  2008-06-06 10:54         ` Andrew Morton
                             ` (2 preceding siblings ...)
  2008-06-06 13:28           ` Mike Travis
@ 2008-06-06 17:15           ` Ingo Molnar
  3 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2008-06-06 17:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Stephen Rothwell, linux-next, LKML, Mike Travis


* Andrew Morton <akpm@linux-foundation.org> wrote:

> 	    x86: remove the static 256k node_to_cpumask_map
> 
> crash, as described earlier.
> 
> I don't know what happened to that early exception - it didn't come back.

hm, early exception, there's one commit i can think of right now:

 # tip/x86/xen: b20aecc: xen: fix early bootup crash on native hardware

but that got propagated to linux-next:

 # linux-next: b20aecc: xen: fix early bootup crash on native hardware

	Ingo

* Re: linux-next: Tree for June 5
  2008-06-06 15:52                                 ` Mike Travis
@ 2008-06-18  8:26                                   ` Ingo Molnar
  0 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2008-06-18  8:26 UTC (permalink / raw)
  To: Mike Travis
  Cc: Vegard Nossum, Andrew Morton, Stephen Rothwell, linux-next, LKML,
	Thomas Gleixner


* Mike Travis <travis@sgi.com> wrote:

> Ingo Molnar wrote:
> > * Vegard Nossum <vegard.nossum@gmail.com> wrote:
> > 
> >>>> AFAICS this is not yet required for v2.6.26, as the requirement to 
> >>>> never iterate to MAX_NUMNODES and call nr_cpus_node() with the 
> >>>> index only got introduced by Mike's patch.
> >>> the one below is needed as well i think.
> >> Yeah. I think you had better take Mike's patches, I don't trust even 
> >> that my patch and your fixlet does everything correctly.
> > 
> > yep, just discovered that we had them already ;-)
> > 
> > Thomas has just scripted up a new "detect if a commit is not in 
> > linux-next yet" script that should avoid such problems in the future.
> > 
> > your second patch is still wanted, it would have detected the problem 
> > earlier.
> > 
> > 	Ingo
> 
> Thanks, yes, I agree.  However I would like to modify it slightly:
> ---
> Subject: [PATCH 1/1] x86: Add check for node passed to node_to_cpumask
> 
>   * When CONFIG_DEBUG_PER_CPU_MAPS is set, the node passed to
>     node_to_cpumask and node_to_cpumask_ptr should be validated.
> 
> Signed-off-by: Mike Travis <travis@sgi.com>
> ---
>  arch/x86/kernel/setup.c |   18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> --- linux-2.6.tip.orig/arch/x86/kernel/setup.c
> +++ linux-2.6.tip/arch/x86/kernel/setup.c
> @@ -399,6 +399,10 @@ int early_cpu_to_node(int cpu)
>  	return per_cpu(x86_cpu_to_node_map, cpu);
>  }
>  
> +
> +/* empty cpumask */
> +static cpumask_t cpu_mask_none;

hm, this should be __read_mostly, maybe even const so that it becomes 
readonly?
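
One way to follow the `const` suggestion, sketched in a hypothetical userspace setting (the `sketch_*` names are illustrative stand-ins, and the kernel-only `__read_mostly` section annotation is not reproduced): make the empty mask `const` and have the lookup return a pointer to `const`, so the compiler rejects writes through the returned pointer. Whether the kernel function's return type could change without updating its callers is not addressed here.

```c
#include <assert.h>

/* Hypothetical stand-in for cpumask_t: one bit per CPU, up to 64 CPUs. */
typedef struct {
	unsigned long long bits;
} sketch_cpumask_t;

/* Hypothetical per-node table; stays writable because setup code fills it. */
static sketch_cpumask_t sketch_node_map[2];
static const int sketch_nr_node_ids = 2;

/*
 * The empty mask is const: writing to it, even through a cast, is undefined
 * behaviour, so the compiler may place it in read-only storage.
 */
static const sketch_cpumask_t sketch_mask_none = { 0 };

/* Stands in for boot-time initialisation of the per-node table. */
static void sketch_setup(void)
{
	assert(sketch_nr_node_ids ==
	       (int)(sizeof(sketch_node_map) / sizeof(sketch_node_map[0])));
	sketch_node_map[0].bits = 0x0fULL;	/* node 0: CPUs 0-3 */
	sketch_node_map[1].bits = 0xf0ULL;	/* node 1: CPUs 4-7 */
}

/*
 * Returning a pointer to const is required once the empty mask is const,
 * and it also stops callers from writing through the result.
 */
static const sketch_cpumask_t *sketch_node_to_cpumask_ptr(int node)
{
	if (node < 0 || node >= sketch_nr_node_ids)
		return &sketch_mask_none;
	return &sketch_node_map[node];
}
```

With this shape, a statement such as `sketch_node_to_cpumask_ptr(7)->bits = 1;` fails to compile instead of silently turning the shared empty mask into a non-empty one.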

	Ingo

* linux-next: Tree for June 5
@ 2009-06-05  6:41 Stephen Rothwell
  0 siblings, 0 replies; 54+ messages in thread
From: Stephen Rothwell @ 2009-06-05  6:41 UTC (permalink / raw)
  To: linux-next; +Cc: LKML

Hi all,

Changes since 20090604:

This tree fails to build for powerpc allyesconfig.

The mmc tree lost its build failure.

The md tree lost its build failure.

The sound tree lost its build failure.

The usb tree gained 2 conflicts against the ttydev tree.

----------------------------------------------------------------------------

I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git
(patches at
http://www.kernel.org/pub/linux/kernel/people/sfr/linux-next/).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the
final fixups (if any), it is also built with powerpc allnoconfig (32 and
64 bit), ppc44x_defconfig and allyesconfig (minus
CONFIG_PROFILE_ALL_BRANCHES) and i386, sparc and sparc64 defconfig.
These builds also have CONFIG_ENABLE_WARN_DEPRECATED,
CONFIG_ENABLE_MUST_CHECK and CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

We are up to 142 trees (counting Linus' and 19 trees of patches pending for
Linus' tree); more are welcome (even if they are currently empty).
Thanks to those who have contributed, and to those who haven't, please do.

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to adding
more builds.

Thanks to Jan Dittmer for adding the linux-next tree to his build tests
at http://l4x.org/k/ , the guys at http://test.kernel.org/ and Randy
Dunlap for doing many randconfig builds.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master
Merging fixes/fixes
Merging arm-current/master
Merging m68k-current/for-linus
Merging powerpc-merge/merge
Merging sparc-current/master
Merging scsi-rc-fixes/master
Merging net-current/master
Merging sound-current/for-linus
Merging pci-current/for-linus
Merging wireless-current/master
Merging kbuild-current/master
Merging quilt/driver-core.current
Merging quilt/usb.current
Merging cpufreq-current/fixes
Merging input-current/for-linus
Merging md-current/for-linus
Merging audit-current/for-linus
Merging crypto-current/master
Merging dwmw2/master
Merging arm/devel
Merging avr32/avr32-arch
Merging blackfin/for-linus
Merging cris/for-next
Merging ia64/test
Merging m68k/for-next
Merging m68knommu/for-next
Merging microblaze/next
Merging mips/mips-for-linux-next
CONFLICT (content): Merge conflict in drivers/char/hw_random/Kconfig
CONFLICT (content): Merge conflict in drivers/char/hw_random/Makefile
Merging parisc/master
Merging powerpc/next
Merging 4xx/next
Merging galak/next
Merging pxa/for-next
CONFLICT (content): Merge conflict in arch/arm/mach-pxa/viper.c
Merging s390/features
Merging sh/master
Merging sparc/master
Merging x86/auto-x86-next
Merging xtensa/master
Merging configfs/linux-next
Merging ext4/next
Merging fatfs/master
Merging fuse/for-next
Merging gfs2/master
Merging jfs/next
Merging nfs/linux-next
Merging nfsd/nfsd-next
Merging nilfs2/for-next
Merging ocfs2/linux-next
Merging squashfs/master
Merging v9fs/for-next
CONFLICT (content): Merge conflict in net/9p/protocol.c
Merging ubifs/linux-next
Merging xfs/master
Merging reiserfs-bkl/reiserfs/kill-bkl-rc6
Merging vfs/for-next
CONFLICT (content): Merge conflict in fs/ext4/super.c
CONFLICT (content): Merge conflict in fs/fuse/inode.c
CONFLICT (delete/modify): fs/gfs2/ops_super.c deleted in HEAD and modified in vfs/for-next. Version vfs/for-next of fs/gfs2/ops_super.c left in tree.
CONFLICT (content): Merge conflict in fs/reiserfs/super.c
$ git rm -f fs/gfs2/ops_super.c
Applying: vfs/gfs2: fixup merge for file removal
Applying: vfs: fix mismerge of fs/reiserfs/xattr.c
Merging pci/linux-next
Merging hid/for-next
Merging quilt/i2c
Merging quilt/jdelvare-hwmon
Merging quilt/kernel-doc
Merging v4l-dvb/master
Merging quota/for_next
Merging kbuild/master
Merging ide/for-next
Merging libata/NEXT
Merging infiniband/for-next
Merging acpi/test
[master 305cb8e] Revert "Merge branch 'oqo-wmi-EXPECT-REFRESH' into test"
Merging ieee1394/for-next
Merging ubi/linux-next
Merging kvm/master
CONFLICT (content): Merge conflict in arch/x86/include/asm/mce.h
Merging dlm/next
Merging scsi/master
Merging async_tx/next
Merging udf/for_next
Merging net/master
CONFLICT (content): Merge conflict in include/linux/mmc/sdio_ids.h
Merging wireless/master
CONFLICT (content): Merge conflict in Documentation/feature-removal-schedule.txt
CONFLICT (content): Merge conflict in drivers/platform/x86/Kconfig
CONFLICT (content): Merge conflict in drivers/platform/x86/toshiba_acpi.c
Merging mtd/master
CONFLICT (content): Merge conflict in drivers/mtd/nand/mxc_nand.c
Merging crypto/master
Merging sound/for-next
Merging cpufreq/next
Merging quilt/rr
CONFLICT (content): Merge conflict in arch/x86/kernel/cpu/cpufreq/powernow-k8.c
Merging cifs/master
Merging mmc/next
Merging input/next
Merging bkl-removal/bkl-removal
Merging lsm/for-next
Merging block/for-next
CONFLICT (content): Merge conflict in drivers/ide/ide-atapi.c
CONFLICT (content): Merge conflict in drivers/ide/ide-cd.c
CONFLICT (content): Merge conflict in drivers/ide/ide-floppy.c
CONFLICT (content): Merge conflict in drivers/ide/ide-tape.c
Merging quilt/device-mapper
Merging embedded/master
Merging firmware/master
Merging pcmcia/master
Merging battery/master
Merging leds/for-mm
Merging backlight/for-mm
Merging kgdb/kgdb-next
Merging slab/for-next
Merging uclinux/for-next
Merging md/for-next
CONFLICT (content): Merge conflict in drivers/md/md.c
Merging mfd/for-next
Merging hdlc/hdlc-next
Merging drm/drm-next
Merging voltage/for-next
Merging security-testing/next
Merging lblnet/master
Merging quilt/ttydev
Merging agp/agp-next
Merging tip-core/auto-core-next
Merging cpus4096/auto-cpus4096-next
Merging tracing/auto-tracing-next
CONFLICT (content): Merge conflict in block/blk-sysfs.c
CONFLICT (content): Merge conflict in net/core/drop_monitor.c
CONFLICT (content): Merge conflict in net/core/net-traces.c
Merging genirq/auto-genirq-next
Merging safe-poison-pointers/auto-safe-poison-pointers-next
Merging sched/auto-sched-next
Merging stackprotector/auto-stackprotector-next
Merging timers/auto-timers-next
CONFLICT (content): Merge conflict in kernel/sched.c
Merging generic-ipi/auto-generic-ipi-next
Merging oprofile/auto-oprofile-next
Merging fastboot/auto-fastboot-next
Merging sparseirq/auto-sparseirq-next
Merging iommu/auto-iommu-next
Merging uwb/for-upstream
Merging watchdog/master
Merging bdev/master
Merging dwmw2-iommu/master
CONFLICT (content): Merge conflict in drivers/pci/intel-iommu.c
CONFLICT (content): Merge conflict in drivers/pci/intr_remapping.c
Merging cputime/cputime
Merging osd/linux-next
Merging jc_docs/docs-next
Merging nommu/master
Merging trivial/for-next
Merging audit/for-next
Merging omap/for-next
Merging quilt/aoe
Merging kmemleak/kmemleak
CONFLICT (delete/modify): arch/x86/kernel/vmlinux_32.lds.S deleted in HEAD and modified in kmemleak/kmemleak. Version kmemleak/kmemleak of arch/x86/kernel/vmlinux_32.lds.S left in tree.
CONFLICT (delete/modify): arch/x86/kernel/vmlinux_64.lds.S deleted in HEAD and modified in kmemleak/kmemleak. Version kmemleak/kmemleak of arch/x86/kernel/vmlinux_64.lds.S left in tree.
CONFLICT (content): Merge conflict in lib/Kconfig.debug
CONFLICT (content): Merge conflict in mm/slob.c
$ git rm -f arch/x86/kernel/vmlinux_32.lds.S arch/x86/kernel/vmlinux_64.lds.S
Merging kmemcheck/auto-kmemcheck-next
CONFLICT (content): Merge conflict in arch/x86/mm/fault.c
CONFLICT (content): Merge conflict in include/linux/ring_buffer.h
CONFLICT (content): Merge conflict in include/linux/slab.h
CONFLICT (content): Merge conflict in kernel/trace/ring_buffer.c
CONFLICT (content): Merge conflict in mm/Makefile
CONFLICT (content): Merge conflict in mm/slab.c
CONFLICT (content): Merge conflict in mm/slub.c
Merging suspend/linux-next
Merging bluetooth/master
Merging edac-amd/for-next
Merging fsnotify/for-next
Merging asm-generic/next
CONFLICT (content): Merge conflict in arch/arm/include/asm/page.h
Merging quilt/driver-core
CONFLICT (content): Merge conflict in arch/x86/kernel/microcode_core.c
CONFLICT (content): Merge conflict in drivers/base/firmware_class.c
CONFLICT (content): Merge conflict in init/main.c
Merging quilt/usb
CONFLICT (content): Merge conflict in drivers/usb/class/cdc-acm.c
CONFLICT (content): Merge conflict in drivers/usb/serial/cp210x.c
CONFLICT (content): Merge conflict in drivers/usb/serial/ftdi_sio.c
CONFLICT (content): Merge conflict in drivers/usb/serial/kobil_sct.c
CONFLICT (content): Merge conflict in drivers/usb/serial/sierra.c
Merging quilt/staging
CONFLICT (content): Merge conflict in drivers/staging/rt2860/common/mlme.c
CONFLICT (content): Merge conflict in drivers/staging/rt2870/common/mlme.c
CONFLICT (content): Merge conflict in drivers/staging/rt3070/common/mlme.c
Merging scsi-post-merge/master
CONFLICT (content): Merge conflict in include/Kbuild

Thread overview: 54+ messages
2008-06-05  7:52 linux-next: Tree for June 5 Stephen Rothwell
2008-06-06  2:56 ` Andrew Morton
2008-06-06  3:46   ` Andrew Morton
2008-06-06  7:17   ` Ingo Molnar
2008-06-06  7:25     ` Ingo Molnar
2008-06-06  7:33       ` Andrew Morton
2008-06-06  7:41         ` Ingo Molnar
2008-06-06  7:47           ` Andrew Morton
2008-06-06  7:53             ` Stephen Rothwell
2008-06-06  8:01               ` Andrew Morton
2008-06-06  8:22                 ` Stephen Rothwell
2008-06-06  8:30                   ` Andrew Morton
2008-06-06  8:36                     ` Ingo Molnar
2008-06-06 11:50                     ` Paul Mackerras
2008-06-06  8:27               ` Ingo Molnar
2008-06-06  8:23             ` Ingo Molnar
2008-06-06  8:28               ` Stephen Rothwell
2008-06-06  8:33                 ` Ingo Molnar
2008-06-06  8:38               ` Andrew Morton
2008-06-06  8:49                 ` Ingo Molnar
2008-06-06  9:01                   ` Andrew Morton
2008-06-06 10:47                     ` Ingo Molnar
2008-06-06 16:37                       ` Ingo Molnar
2008-06-06  7:29     ` Andrew Morton
2008-06-06  9:48       ` Andrew Morton
2008-06-06  9:54         ` Andrew Morton
2008-06-06 10:10           ` Ingo Molnar
2008-06-06 10:54         ` Andrew Morton
2008-06-06 11:21           ` Vegard Nossum
2008-06-06 11:57           ` Ingo Molnar
2008-06-06 12:33             ` Vegard Nossum
2008-06-06 13:33               ` Mike Travis
2008-06-06 13:50                 ` Vegard Nossum
2008-06-06 14:07                   ` Vegard Nossum
2008-06-06 14:20                     ` Mike Travis
2008-06-06 14:36                       ` Vegard Nossum
2008-06-06 14:41                         ` Mike Travis
2008-06-06 14:51                           ` Mike Travis
2008-06-06 14:54                             ` Mike Travis
2008-06-06 14:57                         ` Ingo Molnar
2008-06-06 15:01                           ` Ingo Molnar
2008-06-06 15:13                             ` Vegard Nossum
2008-06-06 15:23                               ` Ingo Molnar
2008-06-06 15:52                                 ` Mike Travis
2008-06-18  8:26                                   ` Ingo Molnar
2008-06-06 15:04                           ` Mike Travis
2008-06-06 15:20                             ` Mike Travis
2008-06-06 15:33                               ` Ingo Molnar
2008-06-06 15:13                           ` Ingo Molnar
2008-06-06 14:13                   ` Mike Travis
2008-06-06 13:28           ` Mike Travis
2008-06-06 17:15           ` Ingo Molnar
2008-06-06  7:33     ` Stephen Rothwell
2009-06-05  6:41 Stephen Rothwell
