linux-kernel.vger.kernel.org archive mirror
* [patch 0/9] Critical Mempools
@ 2006-01-25 19:39 Matthew Dobson
  2006-01-26 17:57 ` Christoph Lameter
  0 siblings, 1 reply; 16+ messages in thread
From: Matthew Dobson @ 2006-01-25 19:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: sri, andrea, pavel, linux-mm

The following is a new patch series designed to solve the same problems as the
"Critical Page Pool" patches that were sent out in December.  I've tried to
incorporate as much of the feedback that I received as possible into this new,
redesigned version.

Rather than inserting hooks directly into the page allocator, I've tried to
piggyback on the existing mempool infrastructure.  What I've done is create
a new "common" mempool allocator for whole pages.  I've also made some changes
to the mempool code to add more NUMA awareness.  Lastly, I've made some
changes to the slab allocator to allow a single mempool to act as the critical
pool for an entire subsystem.  All of these changes should be completely
transparent to existing users of mempools and the slab allocator.

Using this new approach, a subsystem can create a mempool and then pass a
pointer to this mempool to all its slab allocations.  Any time one of its
slab allocations needs to allocate memory, that memory will be allocated
through the specified mempool rather than through alloc_pages_node() directly.
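
To make the intended usage concrete, here is a rough sketch of how a
subsystem might set such a pool up.  The page-backed alloc/free callbacks
are spelled out by hand, and the final slab call is only a placeholder
name; the real interface is whatever the patches themselves define:

#include <linux/mempool.h>
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/init.h>
#include <linux/errno.h>

/* Back the mempool with single pages from the buddy allocator. */
static void *crit_page_alloc(gfp_t gfp_mask, void *pool_data)
{
        return alloc_pages(gfp_mask, 0);
}

static void crit_page_free(void *element, void *pool_data)
{
        __free_pages(element, 0);
}

static mempool_t *crit_pool;

static int __init crit_pool_init(void)
{
        /* Keep 32 whole pages in reserve for emergency allocations. */
        crit_pool = mempool_create(32, crit_page_alloc, crit_page_free, NULL);
        return crit_pool ? 0 : -ENOMEM;
}

/*
 * A critical slab allocation would then name the pool explicitly,
 * e.g. (placeholder name only):
 *
 *      obj = kmem_cache_alloc_using_pool(cachep, GFP_ATOMIC, crit_pool);
 */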

Feedback on these patches (against 2.6.16-rc1) would be greatly appreciated.

Thanks!

-Matt



* Re: [patch 0/9] Critical Mempools
  2006-01-25 19:39 [patch 0/9] Critical Mempools Matthew Dobson
@ 2006-01-26 17:57 ` Christoph Lameter
  2006-01-26 23:01   ` Matthew Dobson
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Lameter @ 2006-01-26 17:57 UTC (permalink / raw)
  To: Matthew Dobson; +Cc: linux-kernel, sri, andrea, pavel, linux-mm

On Wed, 25 Jan 2006, Matthew Dobson wrote:

> Using this new approach, a subsystem can create a mempool and then pass a
> pointer to this mempool on to all its slab allocations.  Anytime one of its
> slab allocations needs to allocate memory that memory will be allocated
> through the specified mempool, rather than through alloc_pages_node() directly.

All subsystems will now get more complicated by having to add this 
emergency functionality?

> Feedback on these patches (against 2.6.16-rc1) would be greatly appreciated.

There surely must be a better way than revising all subsystems for 
critical allocations.


* Re: [patch 0/9] Critical Mempools
  2006-01-26 17:57 ` Christoph Lameter
@ 2006-01-26 23:01   ` Matthew Dobson
  2006-01-26 23:18     ` Christoph Lameter
  0 siblings, 1 reply; 16+ messages in thread
From: Matthew Dobson @ 2006-01-26 23:01 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-kernel, sri, andrea, pavel, linux-mm

Christoph Lameter wrote:
> On Wed, 25 Jan 2006, Matthew Dobson wrote:
> 
> 
>>Using this new approach, a subsystem can create a mempool and then pass a
>>pointer to this mempool on to all its slab allocations.  Anytime one of its
>>slab allocations needs to allocate memory that memory will be allocated
>>through the specified mempool, rather than through alloc_pages_node() directly.
> 
> 
> All subsystems will now get more complicated by having to add this 
> emergency functionality?

Certainly not.  Only subsystems that want to use emergency pools will get
more complicated.  If you have a suggestion as to how to implement a
similar feature that is completely transparent to its users, I would *love*
to hear it.  I have tried to keep the changes to implement this
functionality to a minimum.  As the patches currently stand, existing slab
allocator and mempool users can continue using these subsystems without
modification.


>>Feedback on these patches (against 2.6.16-rc1) would be greatly appreciated.
> 
> 
> There surely must be a better way than revising all subsystems for 
> critical allocations.

Again, I could not find any way to implement this functionality without
forcing the users of the functionality to make some, albeit very minor,
changes.  Specific suggestions are more than welcome! :)

Thanks!

-Matt


* Re: [patch 0/9] Critical Mempools
  2006-01-26 23:01   ` Matthew Dobson
@ 2006-01-26 23:18     ` Christoph Lameter
  2006-01-26 23:32       ` Matthew Dobson
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Lameter @ 2006-01-26 23:18 UTC (permalink / raw)
  To: Matthew Dobson; +Cc: linux-kernel, sri, andrea, pavel, linux-mm

On Thu, 26 Jan 2006, Matthew Dobson wrote:

> > All subsystems will now get more complicated by having to add this 
> > emergency functionality?
> 
> Certainly not.  Only subsystems that want to use emergency pools will get
> more complicated.  If you have a suggestion as to how to implement a
> similar feature that is completely transparent to its users, I would *love*

I thought the earlier __GFP_CRITICAL was a good idea.

> to hear it.  I have tried to keep the changes to implement this
> functionality to a minimum.  As the patches currently stand, existing slab
> allocator and mempool users can continue using these subsystems without
> modification.

The patches are extensive and the required changes to subsystems in order 
to use these pools are also extensive.

> > There surely must be a better way than revising all subsystems for 
> > critical allocations.
> Again, I could not find any way to implement this functionality without
> forcing the users of the functionality to make some, albeit very minor,
> changes.  Specific suggestions are more than welcome! :)

Gfp flag? Better memory reclaim functionality?


* Re: [patch 0/9] Critical Mempools
  2006-01-26 23:18     ` Christoph Lameter
@ 2006-01-26 23:32       ` Matthew Dobson
  2006-01-27  0:03         ` Benjamin LaHaise
  2006-01-27  8:29         ` Sridhar Samudrala
  0 siblings, 2 replies; 16+ messages in thread
From: Matthew Dobson @ 2006-01-26 23:32 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-kernel, sri, andrea, pavel, linux-mm

Christoph Lameter wrote:
> On Thu, 26 Jan 2006, Matthew Dobson wrote:
> 
> 
>>>All subsystems will now get more complicated by having to add this 
>>>emergency functionality?
>>
>>Certainly not.  Only subsystems that want to use emergency pools will get
>>more complicated.  If you have a suggestion as to how to implement a
>>similar feature that is completely transparent to its users, I would *love*
> 
> 
> I thought the earlier __GFP_CRITICAL was a good idea.

Well, I certainly could have used that feedback a month ago! ;)  The
general response to that patchset was overwhelmingly negative.  Yours is
the first vote in favor of that approach that I'm aware of.


>>to hear it.  I have tried to keep the changes to implement this
>>functionality to a minimum.  As the patches currently stand, existing slab
>>allocator and mempool users can continue using these subsystems without
>>modification.
> 
> 
> The patches are extensive and the required changes to subsystems in order 
> to use these pools are also extensive.

I can't really argue with your first point, but the changes required to use
the pools should actually be quite small.  Sridhar (cc'd on this thread) is
working on the changes required for the networking subsystem to use these
pools, and it looks like the patches will be no larger than the ones from
the last attempt.


>>>There surely must be a better way than revising all subsystems for 
>>>critical allocations.
>>
>>Again, I could not find any way to implement this functionality without
>>forcing the users of the functionality to make some, albeit very minor,
>>changes.  Specific suggestions are more than welcome! :)
> 
> 
> Gfp flag? Better memory reclaim functionality?

Well, I've got patches that implement the GFP flag approach, but as I
mentioned above, that was poorly received.  Better memory reclaim is a
broad and general approach that I agree is useful, but will not necessarily
solve the same set of problems (though it would likely lessen the severity
somewhat).

-Matt


* Re: [patch 0/9] Critical Mempools
  2006-01-26 23:32       ` Matthew Dobson
@ 2006-01-27  0:03         ` Benjamin LaHaise
  2006-01-27  0:27           ` Matthew Dobson
  2006-01-27  8:34           ` Sridhar Samudrala
  2006-01-27  8:29         ` Sridhar Samudrala
  1 sibling, 2 replies; 16+ messages in thread
From: Benjamin LaHaise @ 2006-01-27  0:03 UTC (permalink / raw)
  To: Matthew Dobson
  Cc: Christoph Lameter, linux-kernel, sri, andrea, pavel, linux-mm

On Thu, Jan 26, 2006 at 03:32:14PM -0800, Matthew Dobson wrote:
> > I thought the earlier __GFP_CRITICAL was a good idea.
> 
> Well, I certainly could have used that feedback a month ago! ;)  The
> general response to that patchset was overwhelmingly negative.  Yours is
> the first vote in favor of that approach, that I'm aware of.

Personally, I'm more in favour of a proper reservation system.  mempools 
are pretty inefficient.  Reservations have useful properties, too -- one 
could reserve memory for a critical process to use, but allow the system 
to use that memory for easy-to-reclaim caches or to help with memory 
defragmentation (more free pages really helps the buddy allocator).

> > Gfp flag? Better memory reclaim functionality?
> 
> Well, I've got patches that implement the GFP flag approach, but as I
> mentioned above, that was poorly received.  Better memory reclaim is a
> broad and general approach that I agree is useful, but will not necessarily
> solve the same set of problems (though it would likely lessen the severity
> somewhat).

Which areas are the priorities for getting this functionality into?  
Networking over particular sockets?  A GFP_ flag would plug into the current 
network stack trivially, as sockets already have a field to store the memory 
allocation flags.
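
For illustration only (none of this is from the posted patches): struct
sock already carries its allocation flags in sk->sk_allocation, so a
per-socket critical flag could be attached roughly like this, where
__GFP_CRITICAL is the flag from the earlier, unmerged patch set rather
than a mainline GFP flag:

#include <net/sock.h>

static void sock_mark_critical(struct sock *sk)
{
        /* All subsequent allocations done on behalf of this socket
         * would then carry the (hypothetical) critical flag. */
        sk->sk_allocation |= __GFP_CRITICAL;
}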

		-ben
-- 
"Ladies and gentlemen, I'm sorry to interrupt, but the police are here 
and they've asked us to stop the party."  Don't Email: <dont@kvack.org>.


* Re: [patch 0/9] Critical Mempools
  2006-01-27  0:03         ` Benjamin LaHaise
@ 2006-01-27  0:27           ` Matthew Dobson
  2006-01-27  7:35             ` Pekka Enberg
  2006-01-27 15:36             ` Jan Kiszka
  2006-01-27  8:34           ` Sridhar Samudrala
  1 sibling, 2 replies; 16+ messages in thread
From: Matthew Dobson @ 2006-01-27  0:27 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Christoph Lameter, linux-kernel, sri, andrea, pavel, linux-mm

Benjamin LaHaise wrote:
> On Thu, Jan 26, 2006 at 03:32:14PM -0800, Matthew Dobson wrote:
> 
>>>I thought the earlier __GFP_CRITICAL was a good idea.
>>
>>Well, I certainly could have used that feedback a month ago! ;)  The
>>general response to that patchset was overwhelmingly negative.  Yours is
>>the first vote in favor of that approach, that I'm aware of.
> 
> 
> Personally, I'm more in favour of a proper reservation system.  mempools 
> are pretty inefficient.  Reservations have useful properties, too -- one 
> could reserve memory for a critical process to use, but allow the system 
> to use that memory for easy to reclaim caches or to help with memory 
> defragmentation (more free pages really helps the buddy allocator).

That's an interesting idea...  Keep track of the number of pages "reserved"
but allow them to be used as something like read-only pagecache...  Something
along those lines would most certainly be easier on the page allocator,
since it wouldn't have chunks of pages "missing" for long periods of time.


>>>Gfp flag? Better memory reclaim functionality?
>>
>>Well, I've got patches that implement the GFP flag approach, but as I
>>mentioned above, that was poorly received.  Better memory reclaim is a
>>broad and general approach that I agree is useful, but will not necessarily
>>solve the same set of problems (though it would likely lessen the severity
>>somewhat).
> 
> 
> Which areas are the priorities for getting this functionality into?  
> Networking over particular sockets?  A GFP_ flag would plug into the current 
> network stack trivially, as sockets already have a field to store the memory 
> allocation flags.

The impetus for this work was getting this functionality into the
networking stack, to keep the network alive during periods of extreme VM
pressure.  Keeping track of 'criticalness' on a per-socket basis is good,
but the problem is the receive side.  Networking packets are received and
put into skbuffs before there is any concept of what socket they belong to.
So to really handle incoming traffic under extreme memory pressure would
require something beyond just a per-socket flag.
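
A simplified, generic receive path (not taken from any particular driver)
shows where the difficulty lies:

#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/skbuff.h>
#include <linux/string.h>

/* The skb is allocated with GFP_ATOMIC in the interrupt/NAPI path,
 * long before the packet is demuxed to a socket, so a per-socket
 * "critical" flag cannot influence this allocation. */
static void example_rx(struct net_device *dev, void *frame, unsigned int len)
{
        struct sk_buff *skb = dev_alloc_skb(len + 2);

        if (!skb)
                return;                 /* packet lost under memory pressure */

        skb_reserve(skb, 2);            /* align the IP header */
        memcpy(skb_put(skb, len), frame, len);
        skb->protocol = eth_type_trans(skb, dev);
        netif_rx(skb);                  /* socket lookup happens much later */
}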

I have to say I'm somewhat amused by how much support the old approach is
getting now that I've spent a few weeks going back to the drawing board and
coming up with what I thought was a more general solution! :\

-Matt


* Re: [patch 0/9] Critical Mempools
  2006-01-27  0:27           ` Matthew Dobson
@ 2006-01-27  7:35             ` Pekka Enberg
  2006-01-27 10:10               ` Paul Jackson
  2006-01-27 15:36             ` Jan Kiszka
  1 sibling, 1 reply; 16+ messages in thread
From: Pekka Enberg @ 2006-01-27  7:35 UTC (permalink / raw)
  To: Matthew Dobson
  Cc: Benjamin LaHaise, Christoph Lameter, linux-kernel, sri, andrea,
	pavel, linux-mm

Hi,

Benjamin LaHaise wrote:
> > Personally, I'm more in favour of a proper reservation system.  mempools
> > are pretty inefficient.  Reservations have useful properties, too -- one
> > could reserve memory for a critical process to use, but allow the system
> > to use that memory for easy to reclaim caches or to help with memory
> > defragmentation (more free pages really helps the buddy allocator).

On 1/27/06, Matthew Dobson <colpatch@us.ibm.com> wrote:
> That's an interesting idea...  Keep track of the number of pages "reserved"
> but allow them to be used something like read-only pagecache...  Something
> along those lines would most certainly be easier on the page allocator,
> since it wouldn't have chunks of pages "missing" for long periods of time.

Any thoughts on what kind of allocation patterns we have for those
critical callers? The worst case is of course that for just one 32-byte
critical allocation we steal away a complete page from the reserves,
which doesn't sound like a good idea under extreme VM pressure. For a
general solution, I don't think it's enough that you simply flag an
allocation GFP_CRITICAL and let the page allocator do the allocation.
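
To put rough numbers on that worst case (assuming 4 KiB pages and a
kmalloc-32 style cache): sixteen critical 32-byte allocations that each
happen to land on a fresh slab page would pin 64 KiB of the reserve
while holding only 512 bytes of actually critical data.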

As a side note, we already have __GFP_NOFAIL. How is it different
from GFP_CRITICAL and why aren't we improving that?

                                  Pekka


* Re: [patch 0/9] Critical Mempools
  2006-01-26 23:32       ` Matthew Dobson
  2006-01-27  0:03         ` Benjamin LaHaise
@ 2006-01-27  8:29         ` Sridhar Samudrala
  1 sibling, 0 replies; 16+ messages in thread
From: Sridhar Samudrala @ 2006-01-27  8:29 UTC (permalink / raw)
  To: Matthew Dobson; +Cc: Christoph Lameter, linux-kernel, andrea, pavel, linux-mm

Matthew Dobson wrote:
> Christoph Lameter wrote:
>   
>> On Thu, 26 Jan 2006, Matthew Dobson wrote:
>>
>>
>>     
>>>> All subsystems will now get more complicated by having to add this 
>>>> emergency functionality?
>>>>         
>>> Certainly not.  Only subsystems that want to use emergency pools will get
>>> more complicated.  If you have a suggestion as to how to implement a
>>> similar feature that is completely transparent to its users, I would *love*
>>>       
>> I thought the earlier __GFP_CRITICAL was a good idea.
>>     
>
> Well, I certainly could have used that feedback a month ago! ;)  The
> general response to that patchset was overwhelmingly negative.  Yours is
> the first vote in favor of that approach, that I'm aware of.
>
>
>   
>>> to hear it.  I have tried to keep the changes to implement this
>>> functionality to a minimum.  As the patches currently stand, existing slab
>>> allocator and mempool users can continue using these subsystems without
>>> modification.
>>>       
>> The patches are extensive and the required changes to subsystems in order 
>> to use these pools are also extensive.
>>     
>
> I can't really argue with your first point, but the changes required to use
> the pools should actually be quite small.  Sridhar (cc'd on this thread) is
> working on the changes required for the networking subsystem to use these
> pools, and it looks like the patches will be no larger than the ones from
> the last attempt.
>   
I would say that the patches to support critical sockets will be
slightly more complex with mempools than the earlier patches that used
the global critical page pool with a new GFP_CRITICAL flag.

Basically, we need a facility to mark an allocation request as critical
and to satisfy this request without any blocking in an emergency
situation.

Thanks
Sridhar
>
>   
>>>> There surely must be a better way than revising all subsystems for 
>>>> critical allocations.
>>>>         
>>> Again, I could not find any way to implement this functionality without
>>> forcing the users of the functionality to make some, albeit very minor,
>>> changes.  Specific suggestions are more than welcome! :)
>>>       
>> Gfp flag? Better memory reclaim functionality?
>>     
>
> Well, I've got patches that implement the GFP flag approach, but as I
> mentioned above, that was poorly received.  Better memory reclaim is a
> broad and general approach that I agree is useful, but will not necessarily
> solve the same set of problems (though it would likely lessen the severity
> somewhat).
>
> -Matt
>   




* Re: [patch 0/9] Critical Mempools
  2006-01-27  0:03         ` Benjamin LaHaise
  2006-01-27  0:27           ` Matthew Dobson
@ 2006-01-27  8:34           ` Sridhar Samudrala
  1 sibling, 0 replies; 16+ messages in thread
From: Sridhar Samudrala @ 2006-01-27  8:34 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Matthew Dobson, Christoph Lameter, linux-kernel, andrea, pavel, linux-mm

Benjamin LaHaise wrote:
> On Thu, Jan 26, 2006 at 03:32:14PM -0800, Matthew Dobson wrote:
>   
>>> I thought the earlier __GFP_CRITICAL was a good idea.
>>>       
>> Well, I certainly could have used that feedback a month ago! ;)  The
>> general response to that patchset was overwhelmingly negative.  Yours is
>> the first vote in favor of that approach, that I'm aware of.
>>     
>
> Personally, I'm more in favour of a proper reservation system.  mempools 
> are pretty inefficient.  Reservations have useful properties, too -- one 
> could reserve memory for a critical process to use, but allow the system 
> to use that memory for easy to reclaim caches or to help with memory 
> defragmentation (more free pages really helps the buddy allocator).
>
>   
>>> Gfp flag? Better memory reclaim functionality?
>>>       
>> Well, I've got patches that implement the GFP flag approach, but as I
>> mentioned above, that was poorly received.  Better memory reclaim is a
>> broad and general approach that I agree is useful, but will not necessarily
>> solve the same set of problems (though it would likely lessen the severity
>> somewhat).
>>     
>
> Which areas are the priorities for getting this functionality into?  
> Networking over particular sockets?  A GFP_ flag would plug into the current 
> network stack trivially, as sockets already have a field to store the memory 
> allocation flags.
>   
Yes, I posted patches last month that use this exact approach: a
critical page pool with a GFP_CRITICAL flag.
      http://lkml.org/lkml/2005/12/14/65
      http://lkml.org/lkml/2005/12/14/66

Thanks
Sridhar



* Re: [patch 0/9] Critical Mempools
  2006-01-27  7:35             ` Pekka Enberg
@ 2006-01-27 10:10               ` Paul Jackson
  2006-01-27 11:07                 ` Pekka Enberg
  0 siblings, 1 reply; 16+ messages in thread
From: Paul Jackson @ 2006-01-27 10:10 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: colpatch, bcrl, clameter, linux-kernel, sri, andrea, pavel, linux-mm

Pekka wrote:
> As as side note, we already have __GFP_NOFAIL. How is it different
> from GFP_CRITICAL and why aren't we improving that?

Don't these two flags invoke two different mechanisms?
  __GFP_NOFAIL can sleep for HZ/50 then retry, rather than return failure.
  __GFP_CRITICAL can steal from the emergency pool rather than fail.

I would favor renaming at least the __GFP_CRITICAL to something
like __GFP_EMERGPOOL, to highlight the relevant distinction.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: [patch 0/9] Critical Mempools
  2006-01-27 10:10               ` Paul Jackson
@ 2006-01-27 11:07                 ` Pekka Enberg
  2006-01-28  0:41                   ` Matthew Dobson
  0 siblings, 1 reply; 16+ messages in thread
From: Pekka Enberg @ 2006-01-27 11:07 UTC (permalink / raw)
  To: Paul Jackson
  Cc: colpatch, bcrl, clameter, linux-kernel, sri, andrea, pavel, linux-mm

Hi,

Pekka wrote:
> > As as side note, we already have __GFP_NOFAIL. How is it different
> > from GFP_CRITICAL and why aren't we improving that?

On 1/27/06, Paul Jackson <pj@sgi.com> wrote:
> Don't these two flags invoke two different mechanisms.
>   __GFP_NOFAIL can sleep for HZ/50 then retry, rather than return failure.
>   __GFP_CRITICAL can steal from the emergency pool rather than fail.
>
> I would favor renaming at least the __GFP_CRITICAL to something
> like __GFP_EMERGPOOL, to highlight the relevant distinction.

Yeah, you're right. __GFP_NOFAIL guarantees never to return failure,
but it doesn't guarantee that the allocation will actually succeed
either; it just keeps retrying. I think the suggested semantics for
__GFP_EMERGPOOL are that while it can fail, it tries to avoid that by
dipping into page reserves. However, I do still think it's a bad idea
to allow the slab allocator to steal whole pages for critical
allocations, because in a low-memory condition it should be fairly
easy to exhaust the reserves and waste most of that memory at the
same time.

                            Pekka


* Re: [patch 0/9] Critical Mempools
  2006-01-27  0:27           ` Matthew Dobson
  2006-01-27  7:35             ` Pekka Enberg
@ 2006-01-27 15:36             ` Jan Kiszka
  1 sibling, 0 replies; 16+ messages in thread
From: Jan Kiszka @ 2006-01-27 15:36 UTC (permalink / raw)
  To: Matthew Dobson
  Cc: Benjamin LaHaise, Christoph Lameter, linux-kernel, sri, andrea,
	pavel, linux-mm

2006/1/27, Matthew Dobson <colpatch@us.ibm.com>:

> The impetus for this work was getting this functionality into the
> networking stack, to keep the network alive under periods of extreme VM
> pressure.  Keeping track of 'criticalness' on a per-socket basis is good,
> but the problem is the receive side.  Networking packets are received and
> put into skbuffs before there is any concept of what socket they belong to.
>  So to really handle incoming traffic under extreme memory pressure would
> require something beyond just a per-socket flag.

As an interesting exercise, you may want to study how we handle this in
the deterministic network stack RTnet (www.rtnet.org): full (rt-)skbs
are exchanged for empty ones between per-user packet pools. Every
packet producer or consumer (socket, NIC, in-kernel networking service)
has its own pool of pre-allocated, fixed-size packets. Incoming packets
are first stored at the expense of the NIC, but as soon as the real
receiver is known, that receiver has to hand over an empty buffer in
order to get the full one. Otherwise, the packet is dropped. It's a
rather hard policy, but it prevents any local user from starving the
system of skbs. Additionally, for full determinism, remote users have
to be controlled via bandwidth management (to avoid exhausting the
NIC's pool), in our case a TDMA mechanism.
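
A very rough sketch of that exchange discipline, in generic terms (this
is not RTnet's actual API, just the idea):

#include <linux/list.h>
#include <linux/spinlock.h>

struct pkt_buf {
        struct list_head link;
        void *data;
        unsigned int len;
};

struct pkt_pool {
        spinlock_t lock;
        struct list_head free;          /* pre-allocated, fixed-size buffers */
};

static struct pkt_buf *pool_get(struct pkt_pool *pool)
{
        struct pkt_buf *buf = NULL;
        unsigned long flags;

        spin_lock_irqsave(&pool->lock, flags);
        if (!list_empty(&pool->free)) {
                buf = list_entry(pool->free.next, struct pkt_buf, link);
                list_del(&buf->link);
        }
        spin_unlock_irqrestore(&pool->lock, flags);
        return buf;
}

static void pool_put(struct pkt_pool *pool, struct pkt_buf *buf)
{
        unsigned long flags;

        spin_lock_irqsave(&pool->lock, flags);
        list_add_tail(&buf->link, &pool->free);
        spin_unlock_irqrestore(&pool->lock, flags);
}

/*
 * The NIC received a frame into one of its own buffers ('full').  The
 * receiving socket only gets it if it can trade back an empty buffer
 * from its own pool; otherwise the frame is dropped and the NIC keeps
 * its buffer, so no consumer can ever drain the NIC's pool.
 */
static struct pkt_buf *pool_exchange(struct pkt_pool *nic_pool,
                                     struct pkt_pool *sock_pool,
                                     struct pkt_buf *full)
{
        struct pkt_buf *empty = pool_get(sock_pool);

        if (!empty) {
                pool_put(nic_pool, full);
                return NULL;
        }
        pool_put(nic_pool, empty);
        return full;
}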

I'm not suggesting that this is something easy to adopt into a
general-purpose networking stack (this is /one/ reason why we maintain
a separate project for it). But maybe the concept can inspire something
in this direction. It would be funny to have "native" RTnet in the
kernel one day :). Separate memory pools for critical allocations are
an interesting step that may help us as well.

Jan


* Re: [patch 0/9] Critical Mempools
  2006-01-27 11:07                 ` Pekka Enberg
@ 2006-01-28  0:41                   ` Matthew Dobson
  2006-01-28 10:21                     ` Pekka Enberg
  0 siblings, 1 reply; 16+ messages in thread
From: Matthew Dobson @ 2006-01-28  0:41 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Paul Jackson, bcrl, clameter, linux-kernel, sri, andrea, pavel, linux-mm

Pekka Enberg wrote:
> Hi,
> 
> Pekka wrote:
> 
>>>As as side note, we already have __GFP_NOFAIL. How is it different
>>>from GFP_CRITICAL and why aren't we improving that?
> 
> 
> On 1/27/06, Paul Jackson <pj@sgi.com> wrote:
> 
>>Don't these two flags invoke two different mechanisms.
>>  __GFP_NOFAIL can sleep for HZ/50 then retry, rather than return failure.
>>  __GFP_CRITICAL can steal from the emergency pool rather than fail.
>>
>>I would favor renaming at least the __GFP_CRITICAL to something
>>like __GFP_EMERGPOOL, to highlight the relevant distinction.
> 
> 
> Yeah you're right. __GFP_NOFAIL guarantees to never fail but it
> doesn't guarantee to actually succeed either. I think the suggested
> semantics for __GFP_EMERGPOOL are that while it can fail, it tries to
> avoid that by dipping into page reserves. However, I do still think
> it's a bad idea to allow the slab allocator to steal whole pages for
> critical allocations because in low-memory condition, it should be
> fairly easy to exhaust the reserves and waste most of that memory at
> the same time.

The main pushback I got on my previous attempt at something like
__GFP_EMERGPOOL was that a single, system-wide pool was unacceptable.
Determining the appropriate size for such a pool would be next to
impossible, particularly as the number of users of __GFP_EMERGPOOL grows.
The general consensus was that per-subsystem or dynamically created pools
would be a more useful addition to the kernel.  Do any of you who are now
requesting the single pool approach have any suggestions as to how to
appropriately size a pool with potentially dozens of users so as to offer
any kind of useful guarantee?  The fewer users a single pool has, the
easier it is, obviously, to appropriately size that pool...

As far as allowing the slab allocator to steal a whole page from the
critical pool to satisfy a single slab request, I think that is ok.  The
only other suggestion I've heard is to insert a SLOB layer between the
critical pool's page allocator and the slab allocator, and have this SLOB
layer chop pages up into pieces to handle slab requests that cannot be
satisfied through the normal slab/page allocator combo.  This involves
adding a fair bit of code and complexity for the benefit of a few pages of
memory.  Now, a few pages of memory could be incredibly crucial, since
we're discussing an emergency (presumably) low-mem situation, but if we're
going to be getting several requests for the same slab/kmalloc size then
we're probably better off giving a whole page to the slab allocator.  This
is pure speculation, of course... :)

-Matt


* Re: [patch 0/9] Critical Mempools
  2006-01-28  0:41                   ` Matthew Dobson
@ 2006-01-28 10:21                     ` Pekka Enberg
  2006-01-30 22:38                       ` Matthew Dobson
  0 siblings, 1 reply; 16+ messages in thread
From: Pekka Enberg @ 2006-01-28 10:21 UTC (permalink / raw)
  To: Matthew Dobson
  Cc: Paul Jackson, bcrl, clameter, linux-kernel, sri, andrea, pavel, linux-mm

Hi,

On Fri, 2006-01-27 at 16:41 -0800, Matthew Dobson wrote:
> Now, a few pages of memory could be incredibly crucial, since
> we're discussing an emergency (presumably) low-mem situation, but if
> we're going to be getting several requests for the same
> slab/kmalloc-size then we're probably better of giving a whole page to
> the slab allocator.  This is pure speculation, of course... :)

Yeah, but even then there's no guarantee that the critical allocations
will be serviced first. The slab allocator could just as well be giving
away bits of the fresh page to non-critical allocations. For the exact
same reason, I don't think it's enough that you pass a subsystem-specific
page pool to the slab allocator.

Sorry if this has been explained before, but why aren't mempools
sufficient for your purposes? Also, one more alternative would be to
create a separate object cache for each subsystem-specific critical
allocation and implement an internal "page pool" for the slab allocator,
so that you could specify the number of pages an object cache
guarantees to always hold on to.
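
Something along those lines might look like this; the reserve knob is
purely hypothetical, nothing like it exists in the slab allocator today:

#include <linux/slab.h>
#include <linux/skbuff.h>
#include <linux/init.h>
#include <linux/errno.h>

static struct kmem_cache *skb_crit_cache;

static int __init skb_crit_init(void)
{
        skb_crit_cache = kmem_cache_create("skb_crit_cache",
                                           sizeof(struct sk_buff), 0,
                                           SLAB_HWCACHE_ALIGN, NULL, NULL);
        if (!skb_crit_cache)
                return -ENOMEM;

        /* Hypothetical knob: ask the slab allocator to always keep at
         * least 16 slab pages pinned for this cache. */
        kmem_cache_reserve_pages(skb_crit_cache, 16);
        return 0;
}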

				Pekka



* Re: [patch 0/9] Critical Mempools
  2006-01-28 10:21                     ` Pekka Enberg
@ 2006-01-30 22:38                       ` Matthew Dobson
  0 siblings, 0 replies; 16+ messages in thread
From: Matthew Dobson @ 2006-01-30 22:38 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Paul Jackson, bcrl, clameter, linux-kernel, sri, andrea, pavel, linux-mm

Pekka Enberg wrote:
> Hi,
> 
> On Fri, 2006-01-27 at 16:41 -0800, Matthew Dobson wrote:
> 
>>Now, a few pages of memory could be incredibly crucial, since
>>we're discussing an emergency (presumably) low-mem situation, but if
>>we're going to be getting several requests for the same
>>slab/kmalloc-size then we're probably better of giving a whole page to
>>the slab allocator.  This is pure speculation, of course... :)
> 
> 
> Yeah but even then there's no guarantee that the critical allocations
> will be serviced first. The slab allocator can as well be giving away
> bits of the fresh page to non-critical allocations. For the exact same
> reason, I don't think it's enough that you pass a subsystem-specific
> page pool to the slab allocator.

Well, it would give at least one object from the new slab to the critical
request, but you're right, the rest of the slab could be allocated to
non-critical users.  I had planned on a small follow-on patch to add
exclusivity to mempool/critical slab pages, but going a different route
seems to be the consensus.


> Sorry if this has been explained before but why aren't mempools
> sufficient for your purposes? Also one more alternative would be to
> create a separate object cache for each subsystem-specific critical
> allocation and implement a internal "page pool" for the slab allocator
> so that you could specify for the number of pages an object cache
> guarantees to always hold on to.

Mempools aren't sufficient because in order to create a real critical pool
for the whole networking subsystem, we'd have to create dozens of mempools,
one each for all the different slabs & kmalloc sizes the networking stack
requires, plus another for whole pages.  Not impossible, but U-G-L-Y.  And
wasteful.  Creating all those mempools is surely more wasteful than
creating one reasonably sized pool to back ALL the allocations.  Or, at
least, such was my rationale... :)

-Matt

