Hopefully the attached revised patch addresses all concerns mentioned
(except that it intentionally continues to not use alloc_percpu()).

Jan

>>> Andrew Morton 23.01.06 11:57:02 >>>
"Jan Beulich" wrote:
>
> >>> Andrew Morton 21.01.06 08:25:00 >>>
> >"Jan Beulich" wrote:
> >>
> >> The biggest arch-independent consumer is tvec_bases (over 4k on
> >> 32-bit archs, over 8k on 64-bit ones), which now gets converted to
> >> use dynamically allocated memory instead.
> >
> >ho hum, another pointer hop.
> >
> >Did you consider using alloc_percpu()?
>
> I did, but I saw drawbacks with that (most notably the fact that all
> instances are allocated at once, possibly wasting a lot of memory).

It's 4k for each cpu which is in the possible_map but which will never be
brought online. I don't think that'll be a lot of memory - are there
machines which have a lot of possible-but-not-really-there CPUs?

> >The patch does trickery in init_timers_cpu() which, from my reading,
> >defers the actual per-cpu allocation until the second CPU comes online.
> >Presumably because of some ordering issue which you discovered.
> >Readers of the code need to know what that issue was.
>
> No, I don't see any trickery there (on-demand allocation in
> CPU_UP_PREPARE is being done elsewhere in very similar ways), and I
> also didn't see any ordering issues. Hence I also didn't see any need
> to explain this in detail.

There _must_ be ordering issues. Otherwise we'd just dynamically allocate
all the structs up-front and be done with it.

Presumably the ordering issue is that init_timers() is called before
kmem_cache_init(). That's non-obvious and should be commented.

> >And boot_tvec_bases will always be used for the BP, and hence one slot
> >in the per-cpu array will forever be unused. Until the BP is taken
> >down and brought back up, in which case it will suddenly start to use
> >a dynamically allocated structure.
>
> Why? Each slot is allocated at most once, the BP's is never allocated
> (it will continue to use the static one even when brought down and
> back up).

OK, I missed the `if (likely(!base))' test in there.

Patch seems OK from that POV and we now seem to know what the ordering
problem is.

- The `#ifdef CONFIG_NUMA' in init_timers_cpu() seems to be unnecessary -
  kmalloc_node() will use kmalloc() if !NUMA.

- The likely()s in init_timers_cpu() seem fairly pointless - it's not a
  fastpath.

- We prefer to do this:

	if (expr) {
		...
	} else {
		...
	}

  and not

	if (expr) {
		...
	}
	else {
		...
	}
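
For reference, here is a minimal sketch of the on-demand allocation
pattern being discussed, with the review comments above folded in (no
#ifdef CONFIG_NUMA, no likely(), cuddled else). Names follow the thread
(tvec_bases, boot_tvec_bases, init_timers_cpu()); the boot_done flag and
the GFP flag are illustrative assumptions, not necessarily the exact
submitted code:

	static tvec_base_t boot_tvec_bases;	/* static base for the boot CPU */
	static DEFINE_PER_CPU(tvec_base_t *, tvec_bases);

	static int __devinit init_timers_cpu(int cpu)
	{
		tvec_base_t *base = per_cpu(tvec_bases, cpu);

		/* Each CPU's base is allocated at most once, at CPU_UP_PREPARE. */
		if (!base) {
			static int boot_done;	/* assumed helper flag */

			if (!boot_done) {
				/*
				 * init_timers() runs before the slab allocator
				 * is initialized, so the first (boot) CPU must
				 * use the static base - the ordering issue
				 * noted above.
				 */
				base = &boot_tvec_bases;
				boot_done = 1;
			} else {
				/*
				 * No #ifdef CONFIG_NUMA needed here:
				 * kmalloc_node() falls back to kmalloc()
				 * when !NUMA.
				 */
				base = kmalloc_node(sizeof(*base), GFP_ATOMIC,
						    cpu_to_node(cpu));
				if (!base)
					return -ENOMEM;
				memset(base, 0, sizeof(*base));
			}
			per_cpu(tvec_bases, cpu) = base;
		}
		/* ... initialize the timer vectors hanging off *base ... */
		return 0;
	}

Because the BP's slot is filled with &boot_tvec_bases the first time
through, the `if (!base)' test ensures it keeps the static base even if it
is later brought down and back up, which is the point Jan makes above.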