linux-kernel.vger.kernel.org archive mirror
* #tj-percpu has been rebased
@ 2009-01-30 17:05 Tejun Heo
  2009-01-31  5:46 ` Tejun Heo
  0 siblings, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2009-01-30 17:05 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Rusty Russell, x86,
	Linux Kernel Mailing List

Hello,

tj-percpu has been rebased.  Sorry, the previous tree was discarded
because it was missing several patches in the middle, but rebasing on
top of the new misc/tj-percpu shouldn't be too difficult.  What has
happened is...

* Rebased on top of the current 2.6/master[1].

* hardirq unification patches restored.

* linker bug fix and xen fix patches applied.

* misc/tj-cpus4096 created as tip/cpus4096 + uv_flush_tlb_others()
  cleanup patch (w/ Cliff Wickman's fixes folded in) and got merged
  into misc/tj-percpu.

* misc/tj-stackprotector created as tip/stackprotector^ + three
  patches.  The backtracked commit is the merge with the previous
  core/percpu.  misc/tj-stackprotector got merged into misc/tj-percpu.

So, all in all, the rebase is complete.  No patch is lost but in
hindsight it should really have been done by pulling in 2.6/master.
The end result is the same anyway, sans a slightly simpler merge tree,
and it definitely doesn't feel like it was worth the trouble.  Well,
what's done is done.  Please keep sending patches.

The git vector is as always...

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git tj-percpu

Thanks.

-- 
tejun

[1] f2257b70b0f9b2fe8f2afd83fc6798dca75930b8


* Re: #tj-percpu has been rebased
  2009-01-30 17:05 #tj-percpu has been rebased Tejun Heo
@ 2009-01-31  5:46 ` Tejun Heo
  2009-01-31 13:28   ` Ingo Molnar
  2009-02-02  9:04   ` Rusty Russell
  0 siblings, 2 replies; 20+ messages in thread
From: Tejun Heo @ 2009-01-31  5:46 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Rusty Russell, x86,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, cpw

Hello, again.

Please disregard the previous two rebased trees.  There has been a
major communication misunderstanding between Ingo and me leading to
the rebased branches.  Here is yet another, hopefully final, merged
tree which builds on top of tip/core/percpu and took only about ten
minutes as opposed to hours.  :-)

* Three xen fix related patches merged.  The later three conflicted in
  a non-minor way.  Refresh requested.

* Cliff Wickman's UV flush fix patch revived and merged.

* Pull in linus#master[1].

Anyways, please pull from

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git tj-percpu

The commit ID should be da2c0b021cde94866f1e492f940aad29e8f61258.

Sorry about the mess but this one should be the safe one to post
patches against.

Thanks.

-- 
tejun

[1] 33bfad54b58cf05cfe6678c3ec9235d4bc8db4c2


* Re: #tj-percpu has been rebased
  2009-01-31  5:46 ` Tejun Heo
@ 2009-01-31 13:28   ` Ingo Molnar
  2009-02-02  9:04   ` Rusty Russell
  1 sibling, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2009-01-31 13:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: H. Peter Anvin, Thomas Gleixner, Rusty Russell, x86,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, cpw


* Tejun Heo <tj@kernel.org> wrote:

> Hello, again.
> 
> Please disregard the previous two rebased trees.  There has been a
> major communication misunderstanding between Ingo and me leading to
> the rebased branches.  Here is yet another, hopefully final, merged
> tree which builds on top of tip/core/percpu and took only about ten
> minutes as opposed to hours.  :-)
> 
> * Three xen fix related patches merged.  The later three conflicted in
>   a non-minor way.  Refresh requested.
> 
> * Cliff Wickman's UV flush fix patch revived and merged.
> 
> * Pull in linus#master[1].
> 
> Anyways, please pull from
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git tj-percpu
> 
> The commit ID should be da2c0b021cde94866f1e492f940aad29e8f61258.
> 
> Sorry about the mess but this one should be the safe one to post
> patches against.

Very nice - pulled into tip/core/percpu, thanks Tejun!

	Ingo


* Re: #tj-percpu has been rebased
  2009-01-31  5:46 ` Tejun Heo
  2009-01-31 13:28   ` Ingo Molnar
@ 2009-02-02  9:04   ` Rusty Russell
  2009-02-04  3:18     ` Tejun Heo
  1 sibling, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2009-02-02  9:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, x86,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, cpw

On Saturday 31 January 2009 16:16:59 Tejun Heo wrote:
> Anyways, please pull from
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git tj-percpu

Thanks.  cpualloc patches rebased on top of it (I had to include the module
patch I've got queued for 2.6.29, otherwise gratuitous conflicts):

	git://git.kernel.org/pub/scm/linux/kernel/git/rusty/tj-percpu-cpualloc master

Boot tested here, but that's no guarantee that something didn't break.

I'm going to be occupied with finishing off cpumask_t this cycle, so I'd be
happy for you to take ownership of any and all of these that you want to keep.
I'll try to keep up with other patches as you post them.

Thanks!
Rusty.


* Re: #tj-percpu has been rebased
  2009-02-02  9:04   ` Rusty Russell
@ 2009-02-04  3:18     ` Tejun Heo
  2009-02-12  3:37       ` Tejun Heo
  0 siblings, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2009-02-04  3:18 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, x86,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, cpw

Rusty Russell wrote:
> On Saturday 31 January 2009 16:16:59 Tejun Heo wrote:
>> Anyways, please pull from
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git tj-percpu
> 
> Thanks.  cpualloc patches rebased on top of it (I had to include the module
> patch I've got queued for 2.6.29, otherwise gratuitous conflicts):
> 
> 	git://git.kernel.org/pub/scm/linux/kernel/git/rusty/tj-percpu-cpualloc master
> 
> Boot tested here, but that's no guarantee that something didn't break.
> 
> I'm going to be occupied with finishing off cpumask_t this cycle, so I'd be
> happy for you to take ownership of any and all of these that you want to keep.
> I'll try to keep up with other patches as you post them.

Will merge it in this week.

Thanks.

-- 
tejun


* Re: #tj-percpu has been rebased
  2009-02-04  3:18     ` Tejun Heo
@ 2009-02-12  3:37       ` Tejun Heo
  2009-02-12  3:44         ` Tejun Heo
  0 siblings, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2009-02-12  3:37 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, x86,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, cpw

Hello,

Tejun Heo wrote:
> Rusty Russell wrote:
>> On Saturday 31 January 2009 16:16:59 Tejun Heo wrote:
>>> Anyways, please pull from
>>>
>>>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git tj-percpu
>> Thanks.  cpualloc patches rebased on top of it (I had to include the module
>> patch I've got queued for 2.6.29, otherwise gratuitous conflicts):
>>
>> 	git://git.kernel.org/pub/scm/linux/kernel/git/rusty/tj-percpu-cpualloc master
>>
>> Boot tested here, but that's no guarantee that something didn't break.
>>
>> I'm going to be occupied with finishing off cpumask_t this cycle, so I'd be
>> happy for you to take ownership of any and all of these that you want to keep.
>> I'll try to keep up with other patches as you post them.
> 
> Will merge it in this week.

Okay, just went through the patchset.  It generally looks nice and
clean but I'm still a bit hung up on the idea of making percpu area
resizable.  I'll look into Christoph Lameter's cpu_alloc patchset too
and think about it more.

Thanks.

-- 
tejun


* Re: #tj-percpu has been rebased
  2009-02-12  3:37       ` Tejun Heo
@ 2009-02-12  3:44         ` Tejun Heo
  2009-02-13 20:58           ` Rusty Russell
  0 siblings, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2009-02-12  3:44 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, x86,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, cpw

Tejun Heo wrote:
> Hello,
> 
> Tejun Heo wrote:
>> Rusty Russell wrote:
>>> On Saturday 31 January 2009 16:16:59 Tejun Heo wrote:
>>>> Anyways, please pull from
>>>>
>>>>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git tj-percpu
>>> Thanks.  cpualloc patches rebased on top of it (I had to include the module
>>> patch I've got queued for 2.6.29, otherwise gratuitous conflicts):
>>>
>>> 	git://git.kernel.org/pub/scm/linux/kernel/git/rusty/tj-percpu-cpualloc master
>>>
>>> Boot tested here, but that's no guarantee that something didn't break.
>>>
>>> I'm going to be occupied with finishing off cpumask_t this cycle, so I'd be
>>> happy for you to take ownership of any and all of these that you want to keep.
>>> I'll try to keep up with other patches as you post them.
>> Will merge it in this week.
> 
> Okay, just went through the patchset.  It generally looks nice and
> clean but I'm still a bit hung up on the idea of making percpu area
> resizable.  I'll look into Christoph Lameter's cpu_alloc patchset too
> and think about it more.

Oops, those are the same ones.  I'll give a shot at cooking up
something which can be dynamically sized before going forward with
this one.

Thanks.

-- 
tejun


* Re: #tj-percpu has been rebased
  2009-02-12  3:44         ` Tejun Heo
@ 2009-02-13 20:58           ` Rusty Russell
  2009-02-13 21:17             ` Jeremy Fitzhardinge
                               ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Rusty Russell @ 2009-02-13 20:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, x86,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, cpw

On Thursday 12 February 2009 14:14:08 Tejun Heo wrote:
> Oops, those are the same ones.  I'll give a shot at cooking up
> something which can be dynamically sized before going forward with
> this one.

That's why I handed it to you! :)

Just remember we waited over 5 years for this to happen: the point of these
is that Christoph showed it's still useful.

(And I really like the idea of allocing congruent areas rather than remapping
 if someone can show that it's semi-reliable.  Good luck!)

Thanks!
Rusty. 


* Re: #tj-percpu has been rebased
  2009-02-13 20:58           ` Rusty Russell
@ 2009-02-13 21:17             ` Jeremy Fitzhardinge
  2009-02-13 22:59             ` H. Peter Anvin
  2009-02-14  0:45             ` Tejun Heo
  2 siblings, 0 replies; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-13 21:17 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Tejun Heo, Ingo Molnar, H. Peter Anvin, Thomas Gleixner, x86,
	Linux Kernel Mailing List, cpw

Rusty Russell wrote:
> That's why I handed it to you! :)
>
> Just remember we waited over 5 years for this to happen: the point of these
> is that Christoph showed it's still useful.
>
> (And I really like the idea of allocing congruent areas rather than remapping
>  if someone can show that it's semi-reliable.  Good luck!)

I really don't like the idea of making percpu memory (physically) move 
around.  It would break Xen's usage of it for a start...  Virtual 
remapping would be OK.

    J


* Re: #tj-percpu has been rebased
  2009-02-13 20:58           ` Rusty Russell
  2009-02-13 21:17             ` Jeremy Fitzhardinge
@ 2009-02-13 22:59             ` H. Peter Anvin
  2009-02-14  0:45             ` Tejun Heo
  2 siblings, 0 replies; 20+ messages in thread
From: H. Peter Anvin @ 2009-02-13 22:59 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Tejun Heo, Ingo Molnar, Thomas Gleixner, x86,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, cpw

Rusty Russell wrote:
> On Thursday 12 February 2009 14:14:08 Tejun Heo wrote:
>> Oops, those are the same ones.  I'll give a shot at cooking up
>> something which can be dynamically sized before going forward with
>> this one.
> 
> That's why I handed it to you! :)
> 
> Just remember we waited over 5 years for this to happen: the point of these
> is that Christoph showed it's still useful.
> 
> (And I really like the idea of allocing congruent areas rather than remapping
>  if someone can show that it's semi-reliable.  Good luck!)

Well, it's easy to do in virtual space.  Trying to do it in physical 
space seems nearly impossible.

	-hpa


* Re: #tj-percpu has been rebased
  2009-02-13 20:58           ` Rusty Russell
  2009-02-13 21:17             ` Jeremy Fitzhardinge
  2009-02-13 22:59             ` H. Peter Anvin
@ 2009-02-14  0:45             ` Tejun Heo
  2009-02-14  1:53               ` H. Peter Anvin
  2009-02-16  7:23               ` Rusty Russell
  2 siblings, 2 replies; 20+ messages in thread
From: Tejun Heo @ 2009-02-14  0:45 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, x86,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, cpw

Rusty Russell wrote:
> On Thursday 12 February 2009 14:14:08 Tejun Heo wrote:
>> Oops, those are the same ones.  I'll give a shot at cooking up
>> something which can be dynamically sized before going forward with
>> this one.
> 
> That's why I handed it to you! :)
> 
> Just remember we waited over 5 years for this to happen: the point of these
> is that Christoph showed it's still useful.
> 
> (And I really like the idea of allocing congruent areas rather than remapping
>  if someone can show that it's semi-reliable.  Good luck!)

I finished writing up the first draft last night.  Somehow I can feel
long grueling debugging hours ahead of me but it generally goes like
the following.

Percpu areas are allocated in chunks in the vmalloc area.  Each chunk
consists of num_possible_cpus() units and the first chunk is used for
static percpu variables in the kernel image (special boot time
alloc/init handling is necessary as these areas need to be brought up
before allocation services are running).  Units grow as necessary and
all units grow or shrink in unison.  When a chunk is filled up,
another chunk is allocated, i.e. in the vmalloc area:

  c0                           c1                         c2           
   -------------------          -------------------        ------------
  | u0 | u1 | u2 | u3 |        | u0 | u1 | u2 | u3 |      | u0 | u1 | u
   -------------------  ......  -------------------  ....  ------------

Allocation is done in offset-size areas of single unit space.  I.e.,
when UNIT_SIZE is 128k, a 512-byte area at 134k occupies 512 bytes at
6k of c1:u0, c1:u1, c1:u2 and c1:u3.  Percpu access can be done by
configuring percpu base registers UNIT_SIZE apart.
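
As a rough illustration of the arithmetic above (not the actual patch;
the helper names, the chunk base address and the 4-unit layout are
assumptions made for this sketch), the mapping from an offset in the
single-unit allocation space to a chunk, an in-unit offset and a
per-CPU address could look like this in C:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only: 128k units, 4 possible CPUs per chunk. */
#define SKETCH_UNIT_SIZE ((size_t)128 * 1024)
#define SKETCH_NR_UNITS  4

/* Which chunk holds an area that starts at area_off in unit space. */
static size_t sketch_chunk_index(size_t area_off)
{
	return area_off / SKETCH_UNIT_SIZE;
}

/* Offset of that area inside every unit of its chunk. */
static size_t sketch_unit_offset(size_t area_off)
{
	return area_off % SKETCH_UNIT_SIZE;
}

/*
 * Address of one CPU's copy of the area.  chunk_base is the
 * (hypothetical) start address of the chunk that holds the area;
 * units for consecutive CPUs sit SKETCH_UNIT_SIZE apart.
 */
static uintptr_t sketch_addr(uintptr_t chunk_base, unsigned int cpu,
			     size_t area_off)
{
	assert(cpu < SKETCH_NR_UNITS);
	return chunk_base + (uintptr_t)cpu * SKETCH_UNIT_SIZE +
	       sketch_unit_offset(area_off);
}
```

With these values, an area at 134k lands in chunk 1 at in-unit offset
6k, and the copies for two neighbouring CPUs are exactly one unit size
apart, which is what allows a single per-CPU base register to select
the right copy.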

Currently it uses pte mappings but by using a larger UNIT_SIZE, it can
be modified to use pmd mappings.  I'm a bit skeptical about this tho.
Percpu pages are allocated with HIGHMEM | COLD, so they won't
interfere with the physical mapping and on !NUMA it lifts load from
pgd tlb by not having stuff for different cpus occupying the same pgd
page.  What we can also do is to use a large page sized UNIT_SIZE but
default to pte mappings and convert to a pmd mapping if a chunk becomes
fully occupied.  Anyways, we can think about this later.

I'm going back to get this thing working.

Happy Valentine's everyone. :-)

-- 
tejun


* Re: #tj-percpu has been rebased
  2009-02-14  0:45             ` Tejun Heo
@ 2009-02-14  1:53               ` H. Peter Anvin
  2009-02-14  2:10                 ` Tejun Heo
  2009-02-16  7:23               ` Rusty Russell
  1 sibling, 1 reply; 20+ messages in thread
From: H. Peter Anvin @ 2009-02-14  1:53 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Rusty Russell, Ingo Molnar, Thomas Gleixner, x86,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, cpw

Tejun Heo wrote:
> 
> Percpu areas are allocated in chunks in the vmalloc area.  Each chunk
> consists of num_possible_cpus() units and the first chunk is used for
> static percpu variables in the kernel image (special boot time
> alloc/init handling is necessary as these areas need to be brought up
> before allocation services are running).  Units grow as necessary and
> all units grow or shrink in unison.  When a chunk is filled up,
> another chunk is allocated, i.e. in the vmalloc area:
> 
>   c0                           c1                         c2           
>    -------------------          -------------------        ------------
>   | u0 | u1 | u2 | u3 |        | u0 | u1 | u2 | u3 |      | u0 | u1 | u
>    -------------------  ......  -------------------  ....  ------------
> 
> Allocation is done in offset-size areas of single unit space.  I.e.,
> when UNIT_SIZE is 128k, a 512-byte area at 134k occupies 512 bytes at
> 6k of c1:u0, c1:u1, c1:u2 and c1:u3.  Percpu access can be done by
> configuring percpu base registers UNIT_SIZE apart.
> 

Okay, let's think about this a bit.

At least for x86, there are two cases:

- 32 bits.  The vmalloc area is *extremely* constrained, and has the 
same class of fragmentation issues as main memory.  In fact, it might 
have *more* just by virtue of being larger.

- 64 bits.  At this point, we have with current memory sizes(*) an 
astronomically large virtual space.  Here we have no real problem 
allocating linearly in virtual space, either by giving each CPU some 
very large hunk of virtual address space (which means each percpu area 
is contiguous in virtual space) or by doing large contiguous allocations 
out of another range.

At first glance I don't see any advantage to interlacing the CPUs.
Quite the contrary, it seems to utterly preclude ever doing PMDs with a
win, since (a) you'd be allocating real memory for CPUs which aren't
actually there and (b) you'd have the wrong NUMA associativity.

	-hpa


(*) In about 20 years we better get the remaining virtual address bits...


* Re: #tj-percpu has been rebased
  2009-02-14  1:53               ` H. Peter Anvin
@ 2009-02-14  2:10                 ` Tejun Heo
  0 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2009-02-14  2:10 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Rusty Russell, Ingo Molnar, Thomas Gleixner, x86,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, cpw

Hello,

H. Peter Anvin wrote:
> Okay, let's think about this a bit.
> 
> At least for x86, there are two cases:
> 
> - 32 bits.  The vmalloc area is *extremely* constrained, and has the
> same class of fragmentation issues as main memory.  In fact, it might
> have *more* just by virtue of being larger.

We can go for smaller chunks but I don't really see any perfect
solution here.  If a machine is doing 16 way SMP on 32bit, it's not
gonna scale very well anyway.

> - 64 bits.  At this point, we have with current memory sizes(*) an
> astronomically large virtual space.  Here we have no real problem
> allocating linearly in virtual space, either by giving each CPU some
> very large hunk of virtual address space (which means each percpu area
> is contiguous in virtual space) or by doing large contiguous allocations
> out of another range.
> 
> At first glance I don't see any advantage to interlacing the CPUs.
> Quite the contrary, it seems to utterly preclude ever doing PMDs with a
> win, since (a) you'd be allocating real memory for CPUs which aren't
> actually there and (b) you'd have the wrong NUMA associativity.

For (a), we can do hotplug online/offline thing for dynamic areas if
necessary.  (b) why would it have the wrong NUMA associativity?

Thanks.

-- 
tejun


* Re: #tj-percpu has been rebased
  2009-02-14  0:45             ` Tejun Heo
  2009-02-14  1:53               ` H. Peter Anvin
@ 2009-02-16  7:23               ` Rusty Russell
  2009-02-16 17:28                 ` H. Peter Anvin
  1 sibling, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2009-02-16  7:23 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, x86,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, cpw

On Saturday 14 February 2009 11:15:14 Tejun Heo wrote:
> Rusty Russell wrote:
> > On Thursday 12 February 2009 14:14:08 Tejun Heo wrote:
> >> Oops, those are the same ones.  I'll give a shot at cooking up
> >> something which can be dynamically sized before going forward with
> >> this one.
> > 
> > That's why I handed it to you! :)
> > 
> > Just remember we waited over 5 years for this to happen: the point of these
> > is that Christoph showed it's still useful.
> > 
> > (And I really like the idea of allocing congruent areas rather than remapping
> >  if someone can show that it's semi-reliable.  Good luck!)
> 
> I finished writing up the first draft last night.  Somehow I can feel
> long grueling debugging hours ahead of me but it generally goes like
> the following.
> 
> > Percpu areas are allocated in chunks in the vmalloc area.  Each chunk
> > consists of num_possible_cpus() units and the first chunk is used for
> > static percpu variables in the kernel image (special boot time
> > alloc/init handling is necessary as these areas need to be brought up
> > before allocation services are running).  Units grow as necessary and
> > all units grow or shrink in unison.  When a chunk is filled up,
> > another chunk is allocated, i.e. in the vmalloc area:
> 
>   c0                           c1                         c2           
>    -------------------          -------------------        ------------
>   | u0 | u1 | u2 | u3 |        | u0 | u1 | u2 | u3 |      | u0 | u1 | u
>    -------------------  ......  -------------------  ....  ------------
> 
> > Allocation is done in offset-size areas of single unit space.  I.e.,
> > when UNIT_SIZE is 128k, a 512-byte area at 134k occupies 512 bytes at
> > 6k of c1:u0, c1:u1, c1:u2 and c1:u3.  Percpu access can be done by
> > configuring percpu base registers UNIT_SIZE apart.
> > 
> > Currently it uses pte mappings but by using a larger UNIT_SIZE, it can
> > be modified to use pmd mappings.  I'm a bit skeptical about this tho.
> > Percpu pages are allocated with HIGHMEM | COLD, so they won't
> > interfere with the physical mapping and on !NUMA it lifts load from
> > pgd tlb by not having stuff for different cpus occupying the same pgd
> > page.

Not sure I understand all of this, but it sounds like a straight virtual
mapping with some chosen separation between the mappings.

But note that for the non-NUMA case, you can just use kmalloc/__get_free_pages
and no remapping tricks are necessary at all.

Thanks,
Rusty.


* Re: #tj-percpu has been rebased
  2009-02-16  7:23               ` Rusty Russell
@ 2009-02-16 17:28                 ` H. Peter Anvin
  2009-02-16 23:22                   ` Rusty Russell
  0 siblings, 1 reply; 20+ messages in thread
From: H. Peter Anvin @ 2009-02-16 17:28 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Tejun Heo, Ingo Molnar, Thomas Gleixner, x86,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, cpw

Rusty Russell wrote:
> 
> But note that for the non-NUMA case, you can just use kmalloc/__get_free_pages
> and no remapping tricks are necessary at all.
> 

Only if your chunks are really small.  Keep in mind that
num_possible_cpus() may be 4096, and so it is unlikely you'll be able to
get enough contiguous pages unless you're using the largepage pool, and
even then you only get 512 bytes per cpu.

All in all I think a dedicated virtual zone per CPU as opposed to
interleaving them seems to make more sense.  Even with 4096 CPUs and
reserving, say, 256 MB per CPU it's not that much address space in the
context of a 47-bit kernel space.  On 32 bits I don't think anything but
the most trivial amount of percpu space is going to fly no matter what.
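
The numbers in the two paragraphs above can be checked with a small C
sketch.  The helper names are made up for this illustration, and the
2 MB large page size is an assumption (the usual x86-64 PMD page size),
chosen because it reproduces the 512 bytes per CPU figure:

```c
#include <assert.h>
#include <stdint.h>

/* Total virtual space if every possible CPU gets its own fixed zone. */
static uint64_t sketch_total_virt(uint64_t nr_cpus, uint64_t bytes_per_zone)
{
	return nr_cpus * bytes_per_zone;
}

/* Share of one physically contiguous block that each CPU would get. */
static uint64_t sketch_bytes_per_cpu(uint64_t contig_bytes, uint64_t nr_cpus)
{
	assert(nr_cpus != 0);
	return contig_bytes / nr_cpus;
}
```

For 4096 CPUs at 256 MB each, sketch_total_virt() gives 2^40 bytes
(1 TB), which is 1/128 of a 47-bit (2^47 byte) space.  Splitting one
assumed 2 MB large page across 4096 CPUs with sketch_bytes_per_cpu()
leaves 512 bytes per CPU.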

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: #tj-percpu has been rebased
  2009-02-16 17:28                 ` H. Peter Anvin
@ 2009-02-16 23:22                   ` Rusty Russell
  2009-02-16 23:28                     ` H. Peter Anvin
  0 siblings, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2009-02-16 23:22 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Tejun Heo, Ingo Molnar, Thomas Gleixner, x86,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, cpw

On Tuesday 17 February 2009 03:58:22 H. Peter Anvin wrote:
> Rusty Russell wrote:
> > 
> > But note that for the non-NUMA case, you can just use kmalloc/__get_free_pages
> > and no remapping tricks are necessary at all.
> > 
> 
> Only if your chunks are really small.  Keep in mind that
> num_possible_cpus() may be 4096, and so it is unlikely you'll be able to
> get enough contiguous pages unless you're using the largepage pool, and
> even then you only get 512 bytes per cpu.
> 
> All in all I think a dedicated virtual zone per CPU as opposed to
> interleaving them seems to make more sense.  Even with 4096 CPUs and
> reserving, say, 256 MB per CPU it's not that much address space in the
> context of a 47-bit kernel space.  On 32 bits I don't think anything but
> the most trivial amount of percpu space is going to fly no matter what.

It's the TLB cost which I really don't want to pay; num_possible_cpus()
4096 non-NUMA is a little silly (currently impossible).

I'm happy to limit per-cpu allocations to pagesize, then you only need to
find num_possible_cpus() contig pages, and if you can't, you fall back to
vmalloc.
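
A minimal userspace sketch of that fallback decision follows.  It is
not kernel code: the names are hypothetical, the page allocator's
maximum order of 10 is an assumption about a typical configuration,
and whether the contiguous allocation succeeds is passed in as a flag
rather than actually attempted:

```c
#include <assert.h>

/* Assumed largest order the page allocator will serve (2^10 pages). */
#define SKETCH_MAX_ORDER 10

enum sketch_backing { SKETCH_CONTIG_PAGES, SKETCH_VMALLOC };

/* Smallest order such that 2^order pages cover nr_pages. */
static unsigned int sketch_order_for(unsigned long nr_pages)
{
	unsigned int order = 0;

	assert(nr_pages > 0);
	while ((1UL << order) < nr_pages)
		order++;
	return order;
}

/*
 * One page per possible CPU: use physically contiguous pages when the
 * request fits the allocator and the allocation succeeded, otherwise
 * fall back to a vmalloc-style mapping.
 */
static enum sketch_backing sketch_pick_backing(unsigned long nr_possible_cpus,
					       int contig_alloc_ok)
{
	if (sketch_order_for(nr_possible_cpus) > SKETCH_MAX_ORDER)
		return SKETCH_VMALLOC;
	return contig_alloc_ok ? SKETCH_CONTIG_PAGES : SKETCH_VMALLOC;
}
```

Under these assumptions a request for 4096 contiguous pages needs
order 12 and can never be served directly, so it always takes the
vmalloc path, while a small num_possible_cpus() only falls back when
memory is too fragmented.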

Thanks,
Rusty.


* Re: #tj-percpu has been rebased
  2009-02-16 23:22                   ` Rusty Russell
@ 2009-02-16 23:28                     ` H. Peter Anvin
  2009-02-18  4:25                       ` Rusty Russell
  0 siblings, 1 reply; 20+ messages in thread
From: H. Peter Anvin @ 2009-02-16 23:28 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Tejun Heo, Ingo Molnar, Thomas Gleixner, x86,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, cpw

Rusty Russell wrote:
>>
>> All in all I think a dedicated virtual zone per CPU as opposed to
>> interleaving them seems to make more sense.  Even with 4096 CPUs and
>> reserving, say, 256 MB per CPU it's not that much address space in the
>> context of a 47-bit kernel space.  On 32 bits I don't think anything but
>> the most trivial amount of percpu space is going to fly no matter what.
> 
> It's the TLB cost which I really don't want to pay; num_possible_cpus()
> 4096 non-NUMA is a little silly (currently impossible).
> 
> I'm happy to limit per-cpu allocations to pagesize, then you only need to
> find num_possible_cpus() contig pages, and if you can't, you fall back to
> vmalloc.
> 

num_possible_cpus() can be very large though, so in many cases the
likelihood of finding that many pages approaches zero.  Furthermore,
num_possible_cpus() may be quite a bit larger than the actual number of
CPUs in the system.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: #tj-percpu has been rebased
  2009-02-16 23:28                     ` H. Peter Anvin
@ 2009-02-18  4:25                       ` Rusty Russell
  2009-02-18  6:40                         ` H. Peter Anvin
  0 siblings, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2009-02-18  4:25 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Tejun Heo, Ingo Molnar, Thomas Gleixner, x86,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, cpw

On Tuesday 17 February 2009 09:58:19 H. Peter Anvin wrote:
> Rusty Russell wrote:
> >>
> >> All in all I think a dedicated virtual zone per CPU as opposed to
> >> interleaving them seems to make more sense.  Even with 4096 CPUs and
> >> reserving, say, 256 MB per CPU it's not that much address space in the
> >> context of a 47-bit kernel space.  On 32 bits I don't think anything but
> >> the most trivial amount of percpu space is going to fly no matter what.
> > 
> > It's the TLB cost which I really don't want to pay; num_possible_cpus()
> > 4096 non-NUMA is a little silly (currently impossible).
> > 
> > I'm happy to limit per-cpu allocations to pagesize, then you only need to
> > find num_possible_cpus() contig pages, and if you can't, you fall back to
> > vmalloc.
> > 
> 
> num_possible_cpus() can be very large though, so in many cases the
> likelihood of finding that many pages approaches zero.  Furthermore,
> num_possible_cpus() may be quite a bit larger than the actual number of
> CPUs in the system.

Sure, so we end up at vmalloc.  No worse, but simpler and much better if we
*can* do it.

Rusty.


* Re: #tj-percpu has been rebased
  2009-02-18  4:25                       ` Rusty Russell
@ 2009-02-18  6:40                         ` H. Peter Anvin
  2009-02-18  7:11                           ` Rusty Russell
  0 siblings, 1 reply; 20+ messages in thread
From: H. Peter Anvin @ 2009-02-18  6:40 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Tejun Heo, Ingo Molnar, Thomas Gleixner, x86,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, cpw

Rusty Russell wrote:
>>>
>> num_possible_cpus() can be very large though, so in many cases the
>> likelihood of finding that many pages approaches zero.  Furthermore,
>> num_possible_cpus() may be quite a bit larger than the actual number of
>> CPUs in the system.
> 
> Sure, so we end up at vmalloc.  No worse, but simpler and much better if we
> *can* do it.
> 

If the likelihood is near zero, then you're wasting opportunities to do
it better.  If we have compact per-cpu virtual areas then we can use
large pages if we know we'll have large percpu areas.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: #tj-percpu has been rebased
  2009-02-18  6:40                         ` H. Peter Anvin
@ 2009-02-18  7:11                           ` Rusty Russell
  0 siblings, 0 replies; 20+ messages in thread
From: Rusty Russell @ 2009-02-18  7:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Tejun Heo, Ingo Molnar, Thomas Gleixner, x86,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, cpw

On Wednesday 18 February 2009 17:10:20 H. Peter Anvin wrote:
> Rusty Russell wrote:
> >>>
> >> num_possible_cpus() can be very large though, so in many cases the
> >> likelihood of finding that many pages approaches zero.  Furthermore,
> >> num_possible_cpus() may be quite a bit larger than the actual number of
> >> CPUs in the system.
> > 
> > Sure, so we end up at vmalloc.  No worse, but simpler and much better if we
> > *can* do it.
> 
> If the likelihood is near zero, then you're wasting opportunities to do
> it better.  If we have compact per-cpu virtual areas then we can use
> large pages if we know we'll have large percpu areas.

You're right; we'd need that defrag wonderness people keep speculating about.

What finally convinced me is that the per-cpu chunks have to be at least the
size of the .data.percpu section (24k here).  7*num_possible_cpus() is even
worse.

Thanks,
Rusty.


Thread overview: 20+ messages
2009-01-30 17:05 #tj-percpu has been rebased Tejun Heo
2009-01-31  5:46 ` Tejun Heo
2009-01-31 13:28   ` Ingo Molnar
2009-02-02  9:04   ` Rusty Russell
2009-02-04  3:18     ` Tejun Heo
2009-02-12  3:37       ` Tejun Heo
2009-02-12  3:44         ` Tejun Heo
2009-02-13 20:58           ` Rusty Russell
2009-02-13 21:17             ` Jeremy Fitzhardinge
2009-02-13 22:59             ` H. Peter Anvin
2009-02-14  0:45             ` Tejun Heo
2009-02-14  1:53               ` H. Peter Anvin
2009-02-14  2:10                 ` Tejun Heo
2009-02-16  7:23               ` Rusty Russell
2009-02-16 17:28                 ` H. Peter Anvin
2009-02-16 23:22                   ` Rusty Russell
2009-02-16 23:28                     ` H. Peter Anvin
2009-02-18  4:25                       ` Rusty Russell
2009-02-18  6:40                         ` H. Peter Anvin
2009-02-18  7:11                           ` Rusty Russell
