linuxppc-dev.lists.ozlabs.org archive mirror
* ppc44x - how do i optimize driver for tlb hits
@ 2010-09-23 15:12 Ayman El-Khashab
  2010-09-23 22:01 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Ayman El-Khashab @ 2010-09-23 15:12 UTC (permalink / raw)
  To: linuxppc-dev

I've implemented a working driver on my 460EX.  it allocates a couple
of buffers of 4MB each.  I have a custom memcmp algorithm in asm that
is extremely fast in user space, but 1/2 as fast when run on these
buffers.

my tests are showing that the algorithm seems to be memory bandwidth
bound.  my guess is that i am having tlb or cache misses (my algo
uses the dcbt) that is slowing performance.  curiously when in user
space, i can affect the performance by small changes in the size of
the buffer, i.e. 4MB + 32B is fast, 4MB + 4K is much worse.

Can i adjust my driver code that is using kmalloc to make sure that
the ppc44x has 4MB tlb entries for these and that they stay put?
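
For reference, a minimal sketch of the kind of allocation in question
(names like cmp_alloc_buffers and CMP_BUF_SIZE are made up; error
handling and cleanup omitted):

#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/io.h>

#define CMP_BUF_SIZE (4 * 1024 * 1024)  /* 4MB per buffer */

static void *buf_a, *buf_b;

static int cmp_alloc_buffers(void)
{
        /* kmalloc returns physically contiguous lowmem */
        buf_a = kmalloc(CMP_BUF_SIZE, GFP_KERNEL);
        buf_b = kmalloc(CMP_BUF_SIZE, GFP_KERNEL);
        if (!buf_a || !buf_b)
                return -ENOMEM;

        /* the physical addresses are what later get handed to the DMA engine */
        pr_info("bufs at %08lx and %08lx\n",
                (unsigned long)virt_to_phys(buf_a),
                (unsigned long)virt_to_phys(buf_b));
        return 0;
}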

thanks
ayman

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ppc44x - how do i optimize driver for tlb hits
  2010-09-23 15:12 ppc44x - how do i optimize driver for tlb hits Ayman El-Khashab
@ 2010-09-23 22:01 ` Benjamin Herrenschmidt
  2010-09-23 22:35   ` Ayman El-Khashab
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2010-09-23 22:01 UTC (permalink / raw)
  To: Ayman El-Khashab; +Cc: linuxppc-dev

On Thu, 2010-09-23 at 10:12 -0500, Ayman El-Khashab wrote:
> I've implemented a working driver on my 460EX.  it allocates a couple
> of buffers of 4MB each.  I have a custom memcmp algorithm in asm that
> is extremely fast in user space, but 1/2 as fast when run on these
> buffers.
> 
> my tests are showing that the algorithm seems to be memory bandwidth
> bound.  my guess is that i am having tlb or cache misses (my algo
> uses the dcbt) that is slowing performance.  curiously when in user
> space, i can affect the performance by small changes in the size of
> the buffer, i.e. 4MB + 32B is fast, 4MB + 4K is much worse.
> 
> Can i adjust my driver code that is using kmalloc to make sure that
> the ppc44x has 4MB tlb entries for these and that they stay put?

Anything you allocate with kmalloc() is going to be mapped by bolted
256M TLB entries, so there should be no TLB misses happening in the
kernel case.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ppc44x - how do i optimize driver for tlb hits
  2010-09-23 22:01 ` Benjamin Herrenschmidt
@ 2010-09-23 22:35   ` Ayman El-Khashab
  2010-09-24  1:07     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Ayman El-Khashab @ 2010-09-23 22:35 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

On Fri, Sep 24, 2010 at 08:01:04AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2010-09-23 at 10:12 -0500, Ayman El-Khashab wrote:
> > I've implemented a working driver on my 460EX.  it allocates a couple
> > of buffers of 4MB each.  I have a custom memcmp algorithm in asm that
> > is extremely fast in user space, but 1/2 as fast when run on these
> > buffers.
> > 
> > my tests are showing that the algorithm seems to be memory bandwidth
> > bound.  my guess is that i am having tlb or cache misses (my algo
> > uses the dcbt) that is slowing performance.  curiously when in user
> > space, i can affect the performance by small changes in the size of
> > the buffer, i.e. 4MB + 32B is fast, 4MB + 4K is much worse.
> > 
> > Can i adjust my driver code that is using kmalloc to make sure that
> > the ppc44x has 4MB tlb entries for these and that they stay put?
> 
> Anything you allocate with kmalloc() is going to be mapped by bolted
> 256M TLB entries, so there should be no TLB misses happening in the
> kernel case.
> 

Hi Ben, can you or somebody elaborate?  I saw the pinned tlb in 44x_mmu.c.
Perhaps I don't understand the code fully, but it appears to map 256MB
of "lowmem" into a pinned tlb.  I am not sure what phys address lowmem
means, but I assumed (possibly incorrectly) that it is 0-256MB.  When I
get the physical addresses for my buffers after kmalloc, they all have
addresses that are within my DRAM but start at about the 440MB mark. I
end up passing those phys addresses to my DMA engine.

When my compare runs it takes a huge amount of time in the assembly code
doing memory fetches which makes me think that there are either tons of
cache misses (despite the prefetching) or the entries have been purged
from the TLB and must be obtained again.  As an experiment, I disabled
my cache prefetch code and the algo took forever.  Next I altered the
asm to process the same total amount of data, but from a smaller region
over and over, so that less is fetched from main memory.  That executed
very quickly.  From that I drew the conclusion that the algorithm is
memory bandwidth limited.
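
As an illustration of the kind of explicit prefetching meant here, a
simplified C version of a dcbt-prefetched compare loop (the real code is
hand-written asm; PREFETCH_AHEAD is a made-up tuning parameter):

#include <linux/types.h>
#include <linux/string.h>

#define LINE_SIZE      32                /* 440 L1 D-cache line size */
#define PREFETCH_AHEAD (8 * LINE_SIZE)   /* made-up prefetch distance */

static int cmp_prefetched(const char *a, const char *b, size_t len)
{
        size_t i;

        for (i = 0; i < len; i += LINE_SIZE) {
                /* dcbt is only a hint: touch the lines we will need shortly */
                __asm__ __volatile__("dcbt 0,%0" : : "r"(a + i + PREFETCH_AHEAD));
                __asm__ __volatile__("dcbt 0,%0" : : "r"(b + i + PREFETCH_AHEAD));

                if (memcmp(a + i, b + i, LINE_SIZE))
                        return 1;
        }
        return 0;
}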

In a standalone configuration (i.e. algorithm just using user memory,
everything else identical), the speedup is 2-3x.  So the limitation 
is not a hardware limit, it must be something that is happening when
I execute the loads.  (it is a compare algorithm, so it only does
loads).

Thanks
Ayman

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ppc44x - how do i optimize driver for tlb hits
  2010-09-23 22:35   ` Ayman El-Khashab
@ 2010-09-24  1:07     ` Benjamin Herrenschmidt
  2010-09-24  2:58       ` Ayman El-Khashab
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2010-09-24  1:07 UTC (permalink / raw)
  To: Ayman El-Khashab; +Cc: linuxppc-dev

On Thu, 2010-09-23 at 17:35 -0500, Ayman El-Khashab wrote:
> > Anything you allocate with kmalloc() is going to be mapped by bolted
> > 256M TLB entries, so there should be no TLB misses happening in the
> > kernel case.
> > 
> 
> Hi Ben, can you or somebody elaborate?  I saw the pinned tlb in
> 44x_mmu.c.
> Perhaps I don't understand the code fully, but it appears to map 256MB
> of "lowmem" into a pinned tlb.  I am not sure what phys address lowmem
> means, but I assumed (possibly incorrectly) that it is 0-256MB. 

No. The first pinned entry (0...256M) is inserted by the asm code in
head_44x.S. The code in 44x_mmu.c will later map the rest of lowmem
(typically up to 768M but various settings can change that) using more
256M entries.

Basically, all of lowmem is permanently mapped with such entries. 

> When I get the physical addresses for my buffers after kmalloc, they
> all have addresses that are within my DRAM but start at about the
> 440MB mark. I end up passing those phys addresses to my DMA engine.

Anything you get from kmalloc is going to come from lowmem, and thus be
covered by those bolted TLB entries.

> When my compare runs it takes a huge amount of time in the assembly
> code doing memory fetches which makes me think that there are either
> tons of cache misses (despite the prefetching) or the entries have
> been purged

What prefetching?  I.e. the DMA operation -will- flush things out of the
cache due to the DMA being not cache coherent on 44x. The 440 also
doesn't have a working HW prefetch engine afaik (it should be disabled
in FW or early asm on 440 cores and fused out in HW on 460 cores afaik).

So only explicit SW prefetching will help.

> from the TLB and must be obtained again.  As an experiment, I disabled
> my cache prefetch code and the algo took forever.  Next I altered the
> asm to process the same total amount of data, but from a smaller region
> over and over, so that less is fetched from main memory.  That executed
> very quickly.  From that I drew the conclusion that the algorithm is
> memory bandwidth limited.

I don't know what exactly is going on, maybe your prefetch stride isn't
right for the HW setup, or something like that. You can use xmon 'u'
command to look at the TLB content. Check that we have the 256M entries
mapping your data, they should be there.
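
(For reference: booting with xmon=on and breaking into xmon, e.g. via
sysrq-x on a console, should give a prompt where 'u' dumps the TLB;
exact behaviour depends on kernel version and config.)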

> In a standalone configuration (i.e. algorithm just using user memory,
> everything else identical), the speedup is 2-3x.  So the limitation 
> is not a hardware limit, it must be something that is happening when
> I execute the loads.  (it is a compare algorithm, so it only does
> loads). 

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ppc44x - how do i optimize driver for tlb hits
  2010-09-24  1:07     ` Benjamin Herrenschmidt
@ 2010-09-24  2:58       ` Ayman El-Khashab
  2010-09-24  4:43         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Ayman El-Khashab @ 2010-09-24  2:58 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

On Fri, Sep 24, 2010 at 11:07:24AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2010-09-23 at 17:35 -0500, Ayman El-Khashab wrote:
> > > Anything you allocate with kmalloc() is going to be mapped by bolted
> > > 256M TLB entries, so there should be no TLB misses happening in the
> > > kernel case.
> > > 
> > 
> > Hi Ben, can you or somebody elaborate?  I saw the pinned tlb in
> > 44x_mmu.c.
> > Perhaps I don't understand the code fully, but it appears to map 256MB
> > of "lowmem" into a pinned tlb.  I am not sure what phys address lowmem
> > means, but I assumed (possibly incorrectly) that it is 0-256MB. 
> 
> No. The first pinned entry (0...256M) is inserted by the asm code in
> head_44x.S. The code in 44x_mmu.c will later map the rest of lowmem
> (typically up to 768M but various settings can change that) using more
> 256M entries.

Thanks Ben, appreciate all your wisdom and insight.

Ok, so my 460ex board has 512MB total, so how does that figure into 
the 768M?  Is there some other heuristic that determines how these
are mapped? 

> Basically, all of lowmem is permanently mapped with such entries. 
> 
> > When I get the physical addresses for my buffers after kmalloc, they
> > all have addresses that are within my DRAM but start at about the
> > 440MB mark. I end up passing those phys addresses to my DMA engine.
> 
> Anything you get from kmalloc is going to come from lowmem, and thus be
> covered by those bolted TLB entries.

So is it reasonable to assume that everything on my system will come from
pinned TLB entries?

> 
> > When my compare runs it takes a huge amount of time in the assembly
> > code doing memory fetches which makes me think that there are either
> > tons of cache misses (despite the prefetching) or the entries have
> > been purged
> 
> What prefetching?  I.e. the DMA operation -will- flush things out of the
> cache due to the DMA being not cache coherent on 44x. The 440 also
> doesn't have a working HW prefetch engine afaik (it should be disabled
> in FW or early asm on 440 cores and fused out in HW on 460 cores afaik).
>
> So only explicit SW prefetching will help.
> 

The DMA is what I use in the "real world case" to get data into and out 
of these buffers.  However, I can disable the DMA completely and do only
the kmalloc.  In this case I still see the same poor performance.  My
prefetching is part of my algo using the dcbt instructions.  I know the
instructions are effective b/c without them the algo is much less 
performant.  So yes, my prefetches are explicit.

> > from the TLB and must be obtained again.  As an experiment, I disabled
> > my cache prefetch code and the algo took forever.  Next I altered the
> > asm to process the same total amount of data, but from a smaller region
> > over and over, so that less is fetched from main memory.  That executed
> > very quickly.  From that I drew the conclusion that the algorithm is
> > memory bandwidth limited.
> 
> I don't know what exactly is going on, maybe your prefetch stride isn't
> right for the HW setup, or something like that. You can use xmon 'u'
> command to look at the TLB content. Check that we have the 256M entries
> mapping your data, they should be there.

Ok, I will give that a try ... in addition, is there an easy way to use
any sort of gprof like tool to see the system performance?  What about
looking at the 44x performance counters in some meaningful way?  All
the experiments point to the fetching being slower in the full program
as opposed to the algo in a testbench, so I want to determine what it is
that could cause that.

thanks
ayman

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ppc44x - how do i optimize driver for tlb hits
  2010-09-24  2:58       ` Ayman El-Khashab
@ 2010-09-24  4:43         ` Benjamin Herrenschmidt
  2010-09-24 10:30           ` Josh Boyer
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2010-09-24  4:43 UTC (permalink / raw)
  To: Ayman El-Khashab; +Cc: linuxppc-dev


> > No. The first pinned entry (0...256M) is inserted by the asm code in
> > head_44x.S. The code in 44x_mmu.c will later map the rest of lowmem
> > (typically up to 768M but various settings can change that) using more
> > 256M entries.
> 
> Thanks Ben, appreciate all your wisdom and insight.
> 
> Ok, so my 460ex board has 512MB total, so how does that figure into 
> the 768M?  Is there some other heuristic that determines how these
> are mapped? 

Not really, it all fits in lowmem so it will be mapped with two pinned
256M entries.

Basically, we try to map all memory with those entries in the linear
mapping. But since we only have 1G of address space available when
PAGE_OFFSET is c0000000, and we need some of that for vmalloc, ioremap,
etc... we thus limit that mapping to 768M currently.
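
(In other words, roughly: PAGE_OFFSET 0xc0000000 + 768M ends at
0xf0000000, leaving the top 256M of the kernel's 1G address space for
vmalloc, ioremap and friends.)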

If you have more memory, you will see only 768M unless you use
CONFIG_HIGHMEM, which allows the kernel to exploit more physical
memory. 

In this case, only the first 768M are permanently mapped (and
accessible), but you can allocate pages in "highmem" which can still be
mapped into user space and need kmap/kunmap calls to be accessed by the
kernel.
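
For illustration, a minimal sketch of what kernel access to such a
highmem page looks like (assuming a page allocated with GFP_HIGHUSER):

#include <linux/gfp.h>
#include <linux/highmem.h>
#include <linux/string.h>

static int touch_highmem_page(void)
{
        struct page *page = alloc_page(GFP_HIGHUSER);
        void *va;

        if (!page)
                return -ENOMEM;
        va = kmap(page);          /* temporarily map into kernel space */
        memset(va, 0, PAGE_SIZE);
        kunmap(page);             /* drop the temporary mapping */
        __free_page(page);
        return 0;
}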

However, in your case you don't need highmem, everything fits in lowmem,
so the kernel will just use 2x256M of bolted TLB entries to map that
permanently.

Note also that kmalloc() always returns lowmem.

> So is it reasonable to assume that everything on my system will come from
> pinned TLB entries?

Yes.

> The DMA is what I use in the "real world case" to get data into and out 
> of these buffers.  However, I can disable the DMA completely and do only
> the kmalloc.  In this case I still see the same poor performance.  My
> prefetching is part of my algo using the dcbt instructions.  I know the
> instructions are effective b/c without them the algo is much less 
> performant.  So yes, my prefetches are explicit.

Could be some "effect" of the cache structure, L2 cache, cache geometry
(number of ways etc...). You might be able to alleviate that by changing
the "stride" of your prefetch.

Unfortunately, I'm not familiar enough with the 440 micro architecture
and its caches to be able to help you much here.

> Ok, I will give that a try ... in addition, is there an easy way to use
> any sort of gprof like tool to see the system performance?  What about
> looking at the 44x performance counters in some meaningful way?  All
> the experiments point to the fetching being slower in the full program
> as opposed to the algo in a testbench, so I want to determine what it is
> that could cause that.

Does it have any useful performance counters?  I didn't think it did but
I may be mistaken.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ppc44x - how do i optimize driver for tlb hits
  2010-09-24  4:43         ` Benjamin Herrenschmidt
@ 2010-09-24 10:30           ` Josh Boyer
  2010-09-24 13:08             ` Ayman El-Khashab
  0 siblings, 1 reply; 11+ messages in thread
From: Josh Boyer @ 2010-09-24 10:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Ayman El-Khashab

On Fri, Sep 24, 2010 at 02:43:52PM +1000, Benjamin Herrenschmidt wrote:
>> The DMA is what I use in the "real world case" to get data into and out 
>> of these buffers.  However, I can disable the DMA completely and do only
>> the kmalloc.  In this case I still see the same poor performance.  My
>> prefetching is part of my algo using the dcbt instructions.  I know the
>> instructions are effective b/c without them the algo is much less 
>> performant.  So yes, my prefetches are explicit.
>
>Could be some "effect" of the cache structure, L2 cache, cache geometry
>(number of ways etc...). You might be able to alleviate that by changing
>the "stride" of your prefetch.
>
>Unfortunately, I'm not familiar enough with the 440 micro architecture
>and its caches to be able to help you much here.

Also, doesn't kmalloc have a limit to the size of the request it will
let you allocate?  I know in the distant past you could allocate 128K
with kmalloc, and 2M with an explicit call to get_free_pages.  Anything
larger than that had to use vmalloc.  The limit might indeed be higher
now, but a 4MB kmalloc buffer sounds very large, given that it would be
contiguous pages.  Two of them even less so.
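
For comparison, a rough sketch of the three allocation paths being
discussed (sizes are only illustrative):

#include <linux/slab.h>
#include <linux/gfp.h>
#include <linux/vmalloc.h>
#include <linux/mm.h>

static void alloc_examples(void)
{
        void *small, *large;
        unsigned long mid;

        /* physically contiguous, but size-limited by the slab allocator */
        small = kmalloc(128 * 1024, GFP_KERNEL);

        /* physically contiguous pages, limited by MAX_ORDER */
        mid = __get_free_pages(GFP_KERNEL, get_order(2 * 1024 * 1024));

        /* virtually contiguous only -- fine for the CPU, but not for a
         * DMA engine that needs one contiguous physical range */
        large = vmalloc(4 * 1024 * 1024);

        vfree(large);
        free_pages(mid, get_order(2 * 1024 * 1024));
        kfree(small);
}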

>> Ok, I will give that a try ... in addition, is there an easy way to use
>> any sort of gprof like tool to see the system performance?  What about
>> looking at the 44x performance counters in some meaningful way?  All
>> the experiments point to the fetching being slower in the full program
>> as opposed to the algo in a testbench, so I want to determine what it is
>> that could cause that.
>
>Does it have any useful performance counters ? I didn't think it did but
>I may be mistaken.

No, it doesn't.

josh

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ppc44x - how do i optimize driver for tlb hits
  2010-09-24 10:30           ` Josh Boyer
@ 2010-09-24 13:08             ` Ayman El-Khashab
  2010-09-24 22:11               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Ayman El-Khashab @ 2010-09-24 13:08 UTC (permalink / raw)
  To: Josh Boyer; +Cc: linuxppc-dev

On Fri, Sep 24, 2010 at 06:30:34AM -0400, Josh Boyer wrote:
> On Fri, Sep 24, 2010 at 02:43:52PM +1000, Benjamin Herrenschmidt wrote:
> >> The DMA is what I use in the "real world case" to get data into and out 
> >> of these buffers.  However, I can disable the DMA completely and do only
> >> the kmalloc.  In this case I still see the same poor performance.  My
> >> prefetching is part of my algo using the dcbt instructions.  I know the
> >> instructions are effective b/c without them the algo is much less 
> >> performant.  So yes, my prefetches are explicit.
> >
> >Could be some "effect" of the cache structure, L2 cache, cache geometry
> >(number of ways etc...). You might be able to alleviate that by changing
> >the "stride" of your prefetch.

My original theory was that it was having lots of cache misses.  But since
the algorithm is fast standalone and uses large enough buffers (4MB),
much of the cache is flushed and replaced with my data.  The cache is 32K,
8-way, 32B/line.  I've crafted the algorithm to use those parameters.
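
(For reference, that geometry works out to 32K / (8 ways x 32B lines) =
128 sets, so each way spans 4K and addresses 4K apart compete for the
same set.)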

> >
> >Unfortunately, I'm not familiar enough with the 440 micro architecture
> >and its caches to be able to help you much here.
> 
> Also, doesn't kmalloc have a limit to the size of the request it will
> let you allocate?  I know in the distant past you could allocate 128K
> with kmalloc, and 2M with an explicit call to get_free_pages.  Anything
> larger than that had to use vmalloc.  The limit might indeed be higher
> now, but a 4MB kmalloc buffer sounds very large, given that it would be
> contiguous pages.  Two of them even less so.

I thought so too, but at least in the current implementation we found
empirically that we could kmalloc up to but no more than 4MB.  We have 
also tried an approach in user memory and then using "get_user_pages"
and building a scatter-gather.  We found that the compare code doesn't 
perform any better. 
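
For reference, a rough sketch of that get_user_pages / scatter-gather
path (2.6.3x-era API; map_user_buf is a made-up name and error/cleanup
paths are omitted):

#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/scatterlist.h>
#include <linux/dma-mapping.h>

static int map_user_buf(struct device *dev, unsigned long uaddr,
                        int npages, struct page **pages,
                        struct scatterlist *sgl)
{
        int i, got;

        down_read(&current->mm->mmap_sem);
        got = get_user_pages(current, current->mm, uaddr, npages,
                             1 /* write */, 0 /* force */, pages, NULL);
        up_read(&current->mm->mmap_sem);
        if (got < npages)
                return -EFAULT;

        sg_init_table(sgl, npages);
        for (i = 0; i < npages; i++)
                sg_set_page(&sgl[i], pages[i], PAGE_SIZE, 0);

        /* hand the list to the DMA engine via the streaming API */
        return dma_map_sg(dev, sgl, npages, DMA_BIDIRECTIONAL);
}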

I suppose another option is to use the kernel profiling option I 
always see but have never used.  Is that a viable option to figure out
what is happening here?  

ayman

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ppc44x - how do i optimize driver for tlb hits
  2010-09-24 13:08             ` Ayman El-Khashab
@ 2010-09-24 22:11               ` Benjamin Herrenschmidt
  2010-10-03 19:13                 ` Ayman El-Khashab
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2010-09-24 22:11 UTC (permalink / raw)
  To: Ayman El-Khashab; +Cc: linuxppc-dev

On Fri, 2010-09-24 at 08:08 -0500, Ayman El-Khashab wrote:
> 
> I suppose another option is to use the kernel profiling option I 
> always see but have never used.  Is that a viable option to figure out
> what is happening here?  

With perf and stochastic sampling?  If you sample fast enough... but
you'll mostly point to your routine I suppose... though it might tell
you statistically where in your code, which -might- help.
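
(For what it's worth, something along the lines of "perf record -e
cpu-clock -F 997 ./prog" followed by "perf report" does timer-based
sampling even without hardware counters, assuming perf is available for
the target kernel.)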

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ppc44x - how do i optimize driver for tlb hits
  2010-09-24 22:11               ` Benjamin Herrenschmidt
@ 2010-10-03 19:13                 ` Ayman El-Khashab
  2010-10-03 22:38                   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Ayman El-Khashab @ 2010-10-03 19:13 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

On Sat, Sep 25, 2010 at 08:11:04AM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2010-09-24 at 08:08 -0500, Ayman El-Khashab wrote:
> > 
> > I suppose another option is to use the kernel profiling option I 
> > always see but have never used.  Is that a viable option to figure out
> > what is happening here?  
> 
> With perf and stochastic sampling?  If you sample fast enough... but
> you'll mostly point to your routine I suppose... though it might tell
> you statistically where in your code, which -might- help.
> 

Thanks, I didn't end up profiling it b/c we found the biggest culprit. 
Basically we were mapping this memory in kernel space and as long as we
did ONLY that, everything was ok.  But then we would mmap the physical
addresses into user space.  Using MAP_SHARED made it extremely slow. 
Using MAP_PRIVATE made it very fast.  So it works, but why is MAP_SHARED
that much slower?
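
For context, a minimal sketch of the two sides of that mapping (buf_a and
cmp_mmap are hypothetical names; no size or offset checking):

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/io.h>

static void *buf_a;     /* the kmalloc'ed compare buffer */

/* driver side: expose the buffer's physical pages to user space */
static int cmp_mmap(struct file *filp, struct vm_area_struct *vma)
{
        unsigned long pfn = virt_to_phys(buf_a) >> PAGE_SHIFT;

        return remap_pfn_range(vma, vma->vm_start, pfn,
                               vma->vm_end - vma->vm_start,
                               vma->vm_page_prot);
}

/* user side: the only difference between the two cases is the flag:
 *   p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,  fd, 0);
 *   p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
 */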

The other optimization was a change in the algorithm to take advantage
of the L2 prefetching.  Since we were operating on many simultaneous
streams it seems that the cache performance was not good.  

thanks
ame

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ppc44x - how do i optimize driver for tlb hits
  2010-10-03 19:13                 ` Ayman El-Khashab
@ 2010-10-03 22:38                   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-03 22:38 UTC (permalink / raw)
  To: Ayman El-Khashab; +Cc: linuxppc-dev

On Sun, 2010-10-03 at 14:13 -0500, Ayman El-Khashab wrote:
> On Sat, Sep 25, 2010 at 08:11:04AM +1000, Benjamin Herrenschmidt wrote:
> > On Fri, 2010-09-24 at 08:08 -0500, Ayman El-Khashab wrote:
> > > 
> > > I suppose another option is to use the kernel profiling option I 
> > > always see but have never used.  Is that a viable option to figure out
> > > what is happening here?  
> > 
> > With perf and stochastic sampling?  If you sample fast enough... but
> > you'll mostly point to your routine I suppose... though it might tell
> > you statistically where in your code, which -might- help.
> > 
> 
> Thanks, I didn't end up profiling it b/c we found the biggest culprit. 
> Basically we were mapping this memory in kernel space and as long as we
> did ONLY that, everything was ok.  But then we would mmap the physical
> addresses into user space.  Using MAP_SHARED made it extremely slow. 
> Using MAP_PRIVATE made it very fast.  So it works, but why is MAP_SHARED
> that much slower?

I don't see any reason off hand why this would be the case. Can you
inspect the content of the TLB with either xmon or whatever HW debugger
you may have at hand and show me what difference you have between an
entry for your workload coming from MAP_SHARED vs. one coming from
MAP_PRIVATE?

> The other optimization was a change in the algorithm to take advantage
> of the L2 prefetching.  Since we were operating on many simultaneous
> streams it seems that the cache performance was not good.  

Cheers,
Ben.

> thanks
> ame

^ permalink raw reply	[flat|nested] 11+ messages in thread

