linux-fsdevel.vger.kernel.org archive mirror
* [ATTEND] many topics
@ 2017-01-18  5:49 Matthew Wilcox
  2017-01-18 10:13 ` [Lsf-pc] " Jan Kara
  2017-01-18 13:32 ` Michal Hocko
  0 siblings, 2 replies; 23+ messages in thread
From: Matthew Wilcox @ 2017-01-18  5:49 UTC (permalink / raw)
  To: lsf-pc; +Cc: linux-fsdevel, linux-mm


o/~ There are many things that I would like to say to you ... o/~

mostly around MM and FS interaction.

1. Exploiting multiorder radix tree entries.  I believe we would do well
to attempt to allocate compound pages, insert them into the page cache,
and expect filesystems to be able to handle filling compound pages with
->readpage.  It will be more efficient because alloc_pages() can return
large entries out of the buddy list rather than breaking them down,
and it'll help reduce fragmentation.

2. Supporting filesystem block sizes > page size.  Once we do the above
for efficiency, I think it then becomes trivial to support, eg 16k block
size filesystems on x86 machines with 4k pages.

3. Moving slab objects.  I've been working with Christoph Lameter
this week on implementing a reclaim operation for radix tree nodes.
It seems feasible.  We should probably talk about reclaiming dentries
and inodes as well.

4. Pretty much anything relating to DAX.  See other thread.

5. I have discovered a newfound fascination with CIFS which is totally
unrelated to my new employer.  Honest.  I should have some interesting
patches for CIFS by LSFMM.

6. Overhauling vmap to use a radix tree instead of a possibly recursive
vmalloc of an array to store pointers to the pages.

7. Using alloc_pages_exact() to kmalloc objects larger than PAGE_SIZE*2

8. Nailing down exactly what GFP_TEMPORARY means

9. Adding malloc()/free() as a kernel API

I have more things in my IDEAS file, but I think that will do for now.
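
As a rough sketch of the arithmetic behind items 2 and 7 above, the
following user-space C works out which page order a given block size
needs, and how many pages a power-of-two allocation wastes compared
with an exact page count.  PAGE_SIZE_BYTES, order_for_size(),
pages_exact() and pages_wasted_by_order() are names made up for this
illustration; they only mimic what get_order() and alloc_pages_exact()
are for, and 4k base pages are assumed.

```c
#include <stddef.h>

/* Assumption for this sketch: 4k base pages, as on x86. */
#define PAGE_SIZE_BYTES 4096UL

/* Smallest order such that (PAGE_SIZE_BYTES << order) >= size. */
static unsigned int order_for_size(size_t size)
{
	unsigned int order = 0;

	while ((PAGE_SIZE_BYTES << order) < size)
		order++;
	return order;
}

/* Number of whole pages needed to hold size bytes. */
static size_t pages_exact(size_t size)
{
	return (size + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/* Pages lost by rounding up to a power of two instead of an exact count. */
static size_t pages_wasted_by_order(size_t size)
{
	return ((size_t)1 << order_for_size(size)) - pages_exact(size);
}
```

Under these assumptions a 16k block needs an order-2 page and a 64k
block an order-4 page, while a 20k object occupies 8 pages when rounded
up to order 3 but only 5 when the page count is exact.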

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Lsf-pc] [ATTEND] many topics
  2017-01-18  5:49 [ATTEND] many topics Matthew Wilcox
@ 2017-01-18 10:13 ` Jan Kara
  2017-01-18 11:26   ` willy
  2017-01-18 13:32 ` Michal Hocko
  1 sibling, 1 reply; 23+ messages in thread
From: Jan Kara @ 2017-01-18 10:13 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: lsf-pc, linux-fsdevel, linux-mm

On Tue 17-01-17 21:49:45, Matthew Wilcox wrote:
> 1. Exploiting multiorder radix tree entries.  I believe we would do well
> to attempt to allocate compound pages, insert them into the page cache,
> and expect filesystems to be able to handle filling compound pages with
> ->readpage.  It will be more efficient because alloc_pages() can return
> large entries out of the buddy list rather than breaking them down,
> and it'll help reduce fragmentation.

Kirill has patches to do this and I don't like the complexity it adds to
pagecache handling code and each filesystem that would like to support
this. I don't have objections to the general idea but the complexity of the
current implementation just looks too big to me...

> 2. Supporting filesystem block sizes > page size.  Once we do the above
> for efficiency, I think it then becomes trivial to support, eg 16k block
> size filesystems on x86 machines with 4k pages.

Heh, you wish... :) There's a big difference between opportunistically
allocating a huge page and having to reliably provide a high-order page.
Memory fragmentation issues will be difficult to deal with...
 
								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Lsf-pc] [ATTEND] many topics
  2017-01-18 10:13 ` [Lsf-pc] " Jan Kara
@ 2017-01-18 11:26   ` willy
  0 siblings, 0 replies; 23+ messages in thread
From: willy @ 2017-01-18 11:26 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Wilcox, lsf-pc, linux-fsdevel, linux-mm

On Wed, Jan 18, 2017 at 11:13:43AM +0100, Jan Kara wrote:
> On Tue 17-01-17 21:49:45, Matthew Wilcox wrote:
> > 1. Exploiting multiorder radix tree entries.  I believe we would do well
> > to attempt to allocate compound pages, insert them into the page cache,
> > and expect filesystems to be able to handle filling compound pages with
> > ->readpage.  It will be more efficient because alloc_pages() can return
> > large entries out of the buddy list rather than breaking them down,
> > and it'll help reduce fragmentation.
> 
> Kirill has patches to do this and I don't like the complexity it adds to
> pagecache handling code and each filesystem that would like to support
> this. I don't have objections to the general idea but the complexity of the
> current implementation just looks too big to me...

Interesting.  Dave Chinner opined to me today that it was about 20 lines
of code in XFS, so somebody is missing something.

> > 2. Supporting filesystem block sizes > page size.  Once we do the above
> > for efficiency, I think it then becomes trivial to support, eg 16k block
> > size filesystems on x86 machines with 4k pages.
> 
> Heh, you wish... :) There's a big difference between opportunistically
> allocating a huge page and having to reliably provide a high-order page.
> Memory fragmentation issues will be difficult to deal with...

If you're mixing a lot of order-0 allocations with a few order-4
allocations, then yes memory fragmentation may become a problem.  But if
you're doing a lot of order-4 allocations, then it should be possible
to free an order-4 allocation from the inactive list of one of the files
on the 64k filesystem.

Somewhat related, and this question was asked during my talk today so
I should have mentioned it in the email, should order-9 pages on the
inactive list be treated differently from order-0 entries?  I suspect
the answer is yes, because there's probably little point in freeing
order-0 pages off the LRU list in order to satisfy an order-9 allocation;
we should just find an order-9 page and free it.  Likewise, freeing an
order-9 page in order to satisfy an order-0 allocation is going to lead
to fragmentation and should probably be avoided.

I suspect order-0 and order-9 entries can be profitably mixed on the
active list, but it might be better to have separate LRU lists for normal
and huge pages.  Does it make sense to have one LRU list per order?
Maybe not go quite that far, but organising the inactive list by size
seems to have some merit.
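
A minimal sketch of what "organising the inactive list by size" could
mean for choosing a reclaim victim, assuming one inactive-page count
per order.  MAX_ORDER_MODEL and pick_reclaim_order() are hypothetical
names for this illustration, not existing kernel interfaces, and the
model ignores the possibility that freeing many small pages might
coalesce into a larger one.

```c
#include <stddef.h>

/* Hypothetical constant for this sketch: model orders 0..9. */
#define MAX_ORDER_MODEL 10

/*
 * nr_inactive[o] is the number of inactive pages of order o.
 * Returns the order to reclaim from for a request of request_order,
 * or -1 if no single inactive page is large enough.
 */
static int pick_reclaim_order(const size_t nr_inactive[MAX_ORDER_MODEL],
			      unsigned int request_order)
{
	unsigned int order;

	/*
	 * Try an exact match first, so an order-9 request frees an
	 * order-9 page and an order-0 request leaves order-9 pages
	 * alone.  Otherwise take the smallest larger order, which
	 * needs the least splitting.
	 */
	for (order = request_order; order < MAX_ORDER_MODEL; order++) {
		if (nr_inactive[order] > 0)
			return (int)order;
	}
	return -1;
}
```

With order-0 and order-9 pages both on the inactive lists, an order-0
request is served from order 0 and an order-9 request from order 9;
a huge page is only split when nothing smaller fits.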


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [ATTEND] many topics
  2017-01-18  5:49 [ATTEND] many topics Matthew Wilcox
  2017-01-18 10:13 ` [Lsf-pc] " Jan Kara
@ 2017-01-18 13:32 ` Michal Hocko
  2017-01-19 11:05   ` willy
  1 sibling, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2017-01-18 13:32 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: lsf-pc, linux-fsdevel, linux-mm

On Tue 17-01-17 21:49:45, Matthew Wilcox wrote:
[...]
> 8. Nailing down exactly what GFP_TEMPORARY means

It's a hint that the page allocator should group those pages together
for better fragmentation avoidance. Have a look at e12ba74d8ff3 ("Group
short-lived and reclaimable kernel allocations"). Basically it is
something like __GFP_MOVABLE for kernel allocations which cannot go to
the movable zones.
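
To make the grouping hint concrete, here is a small user-space model
of how mobility bits in a gfp mask might select a free-list group.
All MODEL_* names and bit values are assumptions chosen for the
illustration; the real definitions live in include/linux/gfp.h and
differ between kernel versions.

```c
/* Illustrative bit values: an assumption, not taken from a specific kernel. */
#define MODEL_GFP_MOVABLE	0x08u
#define MODEL_GFP_RECLAIMABLE	0x10u
#define MODEL_MOVABLE_MASK	(MODEL_GFP_MOVABLE | MODEL_GFP_RECLAIMABLE)
#define MODEL_MOVABLE_SHIFT	3

/* Groups the page allocator keeps apart to limit fragmentation. */
enum model_migratetype {
	MODEL_MIGRATE_UNMOVABLE = 0,
	MODEL_MIGRATE_MOVABLE = 1,
	MODEL_MIGRATE_RECLAIMABLE = 2,
};

/* Only the two mobility bits matter; every other flag is ignored. */
static enum model_migratetype model_gfp_to_migratetype(unsigned int gfp)
{
	return (enum model_migratetype)
		((gfp & MODEL_MOVABLE_MASK) >> MODEL_MOVABLE_SHIFT);
}
```

In this model a mask carrying the reclaimable bit lands in its own
group, separate from both ordinary (unmovable) kernel allocations and
movable ones.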

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [ATTEND] many topics
  2017-01-18 13:32 ` Michal Hocko
@ 2017-01-19 11:05   ` willy
  2017-01-19 11:33     ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: willy @ 2017-01-19 11:05 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Matthew Wilcox, lsf-pc, linux-fsdevel, linux-mm

On Wed, Jan 18, 2017 at 02:32:43PM +0100, Michal Hocko wrote:
> On Tue 17-01-17 21:49:45, Matthew Wilcox wrote:
> [...]
> > 8. Nailing down exactly what GFP_TEMPORARY means
> 
> It's a hint that the page allocator should group those pages together
> for better fragmentation avoidance. Have a look at e12ba74d8ff3 ("Group
> short-lived and reclaimable kernel allocations"). Basically it is
> something like __GFP_MOVABLE for kernel allocations which cannot go to
> the movable zones.

Let me rephrase the topic ... Under what conditions should somebody use
the GFP_TEMPORARY gfp_t?

Example usages that I have questions about:

1. Is it permissible to call kmalloc(GFP_TEMPORARY), or is it only
for alloc_pages?  I ask because if the slab allocator is unaware of
GFP_TEMPORARY, then a non-GFP_TEMPORARY allocation may be placed in a
page allocated with GFP_TEMPORARY and we've just made it meaningless.

2. Is it permissible to sleep while holding a GFP_TEMPORARY allocation?
eg, take a mutex, or wait_for_completion()?

3. Can I make one GFP_TEMPORARY allocation, and then another one?

4. Should I disable preemption while holding a GFP_TEMPORARY allocation,
or are we OK with a task being preempted?

5. What about something even longer duration like allocating a kiocb?
That might take an arbitrary length of time to be freed, but eventually
the command will be timed out (eg 30 seconds for something that ends up
going through SCSI).

6. Or shorter duration like doing a GFP_TEMPORARY allocation, then taking
a spinlock, which *probably* isn't contended, but you never know.

7. I can see it includes __GFP_WAIT so it's not suitable for using from
interrupt context, but interrupt context might be the place which can
benefit from it the most.  Or does GFP_ATOMIC's __GFP_HIGH also allow for
allocation from the movable zone?  Should we have a GFP_TEMPORARY_ATOMIC?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [ATTEND] many topics
  2017-01-19 11:05   ` willy
@ 2017-01-19 11:33     ` Michal Hocko
  2017-01-19 11:52       ` willy
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2017-01-19 11:33 UTC (permalink / raw)
  To: willy; +Cc: lsf-pc, linux-fsdevel, linux-mm

On Thu 19-01-17 03:05:13, willy@infradead.org wrote:
> On Wed, Jan 18, 2017 at 02:32:43PM +0100, Michal Hocko wrote:
> > On Tue 17-01-17 21:49:45, Matthew Wilcox wrote:
> > [...]
> > > 8. Nailing down exactly what GFP_TEMPORARY means
> > 
> > It's a hint that the page allocator should group those pages together
> > for better fragmentation avoidance. Have a look at e12ba74d8ff3 ("Group
> > short-lived and reclaimable kernel allocations"). Basically it is
> > something like __GFP_MOVABLE for kernel allocations which cannot go to
> > the movable zones.
> 
> Let me rephrase the topic ... Under what conditions should somebody use
> the GFP_TEMPORARY gfp_t?

Most users of slab (kmalloc) do not really have to care. Slab will add
__GFP_RECLAIMABLE to all reclaimable caches automagically AFAIR. The
remaining would have to implement some kind of shrinker to allow the
reclaim.

> Example usages that I have questions about:
> 
> 1. Is it permissible to call kmalloc(GFP_TEMPORARY), or is it only
> for alloc_pages?

kmalloc will use it internally as mentioned above.  I am not even sure
whether using kmalloc(GFP_TEMPORARY) directly is ok.  I would have to
check the code but I guess it would be just wrong unless you know your
cache is reclaimable.

> I ask because if the slab allocator is unaware of
> GFP_TEMPORARY, then a non-GFP_TEMPORARY allocation may be placed in a
> page allocated with GFP_TEMPORARY and we've just made it meaningless.
> 
> 2. Is it permissible to sleep while holding a GFP_TEMPORARY allocation?
> eg, take a mutex, or wait_for_completion()?

Yes, GFP_TEMPORARY has ___GFP_DIRECT_RECLAIM set so this is by
definition sleepable allocation request.

> 3. Can I make one GFP_TEMPORARY allocation, and then another one?

Not sure I understand. Why would that be a problem?

> 4. Should I disable preemption while holding a GFP_TEMPORARY allocation,
> or are we OK with a task being preempted?

no, it can sleep.

> 5. What about something even longer duration like allocating a kiocb?
> That might take an arbitrary length of time to be freed, but eventually
> the command will be timed out (eg 30 seconds for something that ends up
> going through SCSI).

I do not understand. The reclaimability of the object is in hands of the
respective shrinker...
 
> 6. Or shorter duration like doing a GFP_TEMPORARY allocation, then taking
> a spinlock, which *probably* isn't contended, but you never know.
> 
> 7. I can see it includes __GFP_WAIT so it's not suitable for using from
> interrupt context, but interrupt context might be the place which can
> benefit from it the most.  Or does GFP_ATOMIC's __GFP_HIGH also allow for
> allocation from the movable zone?  Should we have a GFP_TEMPORARY_ATOMIC?

This is where __GFP_RECLAIMABLE should be used as this is the core of
the functionality.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [ATTEND] many topics
  2017-01-19 11:33     ` Michal Hocko
@ 2017-01-19 11:52       ` willy
  2017-01-19 12:11         ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: willy @ 2017-01-19 11:52 UTC (permalink / raw)
  To: Michal Hocko; +Cc: willy, lsf-pc, linux-fsdevel, linux-mm

On Thu, Jan 19, 2017 at 12:33:17PM +0100, Michal Hocko wrote:
> On Thu 19-01-17 03:05:13, willy@infradead.org wrote:
> > Let me rephrase the topic ... Under what conditions should somebody use
> > the GFP_TEMPORARY gfp_t?
> 
> Most users of slab (kmalloc) do not really have to care. Slab will add
> __GFP_RECLAIMABLE to all reclaimable caches automagically AFAIR. The
> remaining would have to implement some kind of shrinker to allow the
> reclaim.

I seem to be not making myself clear.  Picture me writing a device driver.
When should I use GFP_TEMPORARY?

> > Example usages that I have questions about:
> > 
> > 1. Is it permissible to call kmalloc(GFP_TEMPORARY), or is it only
> > for alloc_pages?
> 
> kmalloc will use it internally as mentioned above.  I am not even sure
> whether direct using of kmalloc(GFP_TEMPORARY) is ok.  I would have to
> check the code but I guess it would be just wrong unless you know your
> cache is reclaimable.

You're not using words that have any meaning to a device driver writer.
Here's my code:

int foo_ioctl(..)
{
	struct foo *foo = kmalloc(sizeof(*foo), GFP_TEMPORARY);
}

Does this work?  If not, should it?  Or should slab be checking for
this and calling WARN()?

> > I ask because if the slab allocator is unaware of
> > GFP_TEMPORARY, then a non-GFP_TEMPORARY allocation may be placed in a
> > page allocated with GFP_TEMPORARY and we've just made it meaningless.
> > 
> > 2. Is it permissible to sleep while holding a GFP_TEMPORARY allocation?
> > eg, take a mutex, or wait_for_completion()?
> 
> Yes, GFP_TEMPORARY has ___GFP_DIRECT_RECLAIM set so this is by
> definition sleepable allocation request.

Again, we're talking past each other.  Can foo_ioctl() sleep before
releasing its GFP_TEMPORARY allocation, or will that make the memory
allocator unhappy?

> > 3. Can I make one GFP_TEMPORARY allocation, and then another one?
> 
> Not sure I understand. Why would that be a problem?

As you say above, GFP_TEMPORARY may sleep, so this is a variation on the
"can I sleep while holding a GFP_TEMPORARY allocation" question.

> > 4. Should I disable preemption while holding a GFP_TEMPORARY allocation,
> > or are we OK with a task being preempted?
> 
> no, it can sleep.
> 
> > 5. What about something even longer duration like allocating a kiocb?
> > That might take an arbitrary length of time to be freed, but eventually
> > the command will be timed out (eg 30 seconds for something that ends up
> > going through SCSI).
> 
> I do not understand. The reclaimability of the object is in hands of the
> respective shrinker...

There is no shrinker here.  This is about the object being "temporary",
for some value of temporary.  I want to nail down what the MM is willing
to tolerate in terms of length of time an object is allocated for.

> > 6. Or shorter duration like doing a GFP_TEMPORARY allocation, then taking
> > a spinlock, which *probably* isn't contended, but you never know.
> > 
> > 7. I can see it includes __GFP_WAIT so it's not suitable for using from
> > interrupt context, but interrupt context might be the place which can
> > benefit from it the most.  Or does GFP_ATOMIC's __GFP_HIGH also allow for
> > allocation from the movable zone?  Should we have a GFP_TEMPORARY_ATOMIC?
> 
> This is where __GFP_RECLAIMABLE should be used as this is the core of
> the functionality.

This response also doesn't make sense to me.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [ATTEND] many topics
  2017-01-19 11:52       ` willy
@ 2017-01-19 12:11         ` Michal Hocko
  2017-01-21  0:11           ` NeilBrown
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2017-01-19 12:11 UTC (permalink / raw)
  To: willy; +Cc: willy, lsf-pc, linux-fsdevel, linux-mm

On Thu 19-01-17 03:52:43, willy@bombadil.infradead.org wrote:
> On Thu, Jan 19, 2017 at 12:33:17PM +0100, Michal Hocko wrote:
> > On Thu 19-01-17 03:05:13, willy@infradead.org wrote:
> > > Let me rephrase the topic ... Under what conditions should somebody use
> > > the GFP_TEMPORARY gfp_t?
> > 
> > Most users of slab (kmalloc) do not really have to care. Slab will add
> > __GFP_RECLAIMABLE to all reclaimable caches automagically AFAIR. The
> > remaining would have to implement some kind of shrinker to allow the
> > reclaim.
> 
> I seem to be not making myself clear.  Picture me writing a device driver.
> When should I use GFP_TEMPORARY?

I guess the original intention was to use this flag for allocations
which will either be freed shortly or are reclaimable.
 
> > > Example usages that I have questions about:
> > > 
> > > 1. Is it permissible to call kmalloc(GFP_TEMPORARY), or is it only
> > > for alloc_pages?
> > 
> > kmalloc will use it internally as mentioned above.  I am not even sure
> > whether direct using of kmalloc(GFP_TEMPORARY) is ok.  I would have to
> > check the code but I guess it would be just wrong unless you know your
> > cache is reclaimable.
> 
> You're not using words that have any meaning to a device driver writer.
> Here's my code:
> 
> int foo_ioctl(..)
> {
> 	struct foo *foo = kmalloc(sizeof(*foo), GFP_TEMPORARY);
> }
> 
> Does this work?  If not, should it?  Or should slab be checking for
> this and calling WARN()?

I would have to check the code but I believe that this shouldn't be
harmful other than increasing fragmentation.

> > > I ask because if the slab allocator is unaware of
> > > GFP_TEMPORARY, then a non-GFP_TEMPORARY allocation may be placed in a
> > > page allocated with GFP_TEMPORARY and we've just made it meaningless.
> > > 
> > > 2. Is it permissible to sleep while holding a GFP_TEMPORARY allocation?
> > > eg, take a mutex, or wait_for_completion()?
> > 
> > Yes, GFP_TEMPORARY has ___GFP_DIRECT_RECLAIM set so this is by
> > definition sleepable allocation request.
> 
> Again, we're talking past each other.  Can foo_ioctl() sleep before
> releasing its GFP_TEMPORARY allocation, or will that make the memory
> allocator unhappy?

I do not think it would make the allocator unhappy as long as the sleep
is not for ever...

> > > 3. Can I make one GFP_TEMPORARY allocation, and then another one?
> > 
> > Not sure I understand. Why would that be a problem?
> 
> As you say above, GFP_TEMPORARY may sleep, so this is a variation on the
> "can I sleep while holding a GFP_TEMPORARY allocation" question.
> 
> > > 4. Should I disable preemption while holding a GFP_TEMPORARY allocation,
> > > or are we OK with a task being preempted?
> > 
> > no, it can sleep.
> > 
> > > 5. What about something even longer duration like allocating a kiocb?
> > > That might take an arbitrary length of time to be freed, but eventually
> > > the command will be timed out (eg 30 seconds for something that ends up
> > > going through SCSI).
> > 
> > I do not understand. The reclaimability of the object is in hands of the
> > respective shrinker...
> 
> There is no shrinker here.  This is about the object being "temporary",
> for some value of temporary.  I want to nail down what the MM is willing
> to tolerate in terms of length of time an object is allocated for.

From my understanding, the MM uses this information to optimize object
placement, and the longer the user holds on to the memory, the worse this
optimization works. I do not think the (ab)use would be fatal...
 
> > > 6. Or shorter duration like doing a GFP_TEMPORARY allocation, then taking
> > > a spinlock, which *probably* isn't contended, but you never know.
> > > 
> > > 7. I can see it includes __GFP_WAIT so it's not suitable for using from
> > > interrupt context, but interrupt context might be the place which can
> > > benefit from it the most.  Or does GFP_ATOMIC's __GFP_HIGH also allow for
> > > allocation from the movable zone?  Should we have a GFP_TEMPORARY_ATOMIC?
> > 
> > This is where __GFP_RECLAIMABLE should be used as this is the core of
> > the functionality.
> 
> This response also doesn't make sense to me.

I meant to say that such an allocation can use
__GFP_RECLAIMABLE | GFP_NOWAIT.
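
A hedged user-space sketch of the distinction being drawn here: whether
a request may sleep is governed by the direct-reclaim bit, while the
grouping hint comes from the reclaimable bit, so the two can be combined
independently.  The MODEL_* names, the bit values, and the composition
of MODEL_GFP_TEMPORARY are assumptions made for the illustration and
should be checked against include/linux/gfp.h.

```c
#include <stdbool.h>

/* Illustrative bit values: an assumption, not taken from a specific kernel. */
#define MODEL_GFP_RECLAIMABLE		0x010u
#define MODEL_GFP_IO			0x040u
#define MODEL_GFP_FS			0x080u
#define MODEL_GFP_DIRECT_RECLAIM	0x400u
#define MODEL_GFP_KSWAPD_RECLAIM	0x800u

/* Assumed composition: full reclaim, I/O and FS allowed, plus the hint. */
#define MODEL_GFP_TEMPORARY \
	(MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_KSWAPD_RECLAIM | \
	 MODEL_GFP_IO | MODEL_GFP_FS | MODEL_GFP_RECLAIMABLE)

/* Non-sleeping request: may wake kswapd but never enters direct reclaim. */
#define MODEL_GFP_NOWAIT		MODEL_GFP_KSWAPD_RECLAIM

/* The non-sleeping combination: same hint, no direct reclaim. */
#define MODEL_GFP_TEMPORARY_NOWAIT \
	(MODEL_GFP_RECLAIMABLE | MODEL_GFP_NOWAIT)

/* True if the allocation request itself is allowed to sleep. */
static bool model_allow_blocking(unsigned int gfp)
{
	return (gfp & MODEL_GFP_DIRECT_RECLAIM) != 0;
}

/* True if the request carries the reclaimable grouping hint. */
static bool model_has_reclaimable_hint(unsigned int gfp)
{
	return (gfp & MODEL_GFP_RECLAIMABLE) != 0;
}
```

In this model both masks carry the grouping hint, but only
MODEL_GFP_TEMPORARY may block, which is why it cannot be used from
interrupt context.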


-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [ATTEND] many topics
  2017-01-19 12:11         ` Michal Hocko
@ 2017-01-21  0:11           ` NeilBrown
  2017-01-21 13:16             ` Theodore Ts'o
  0 siblings, 1 reply; 23+ messages in thread
From: NeilBrown @ 2017-01-21  0:11 UTC (permalink / raw)
  To: Michal Hocko, willy; +Cc: willy, lsf-pc, linux-fsdevel, linux-mm


On Thu, Jan 19 2017, Michal Hocko wrote:

> On Thu 19-01-17 03:52:43, willy@bombadil.infradead.org wrote:
>> On Thu, Jan 19, 2017 at 12:33:17PM +0100, Michal Hocko wrote:
>> > On Thu 19-01-17 03:05:13, willy@infradead.org wrote:
>> > > Let me rephrase the topic ... Under what conditions should somebody use
>> > > the GFP_TEMPORARY gfp_t?
>> > 
>> > Most users of slab (kmalloc) do not really have to care. Slab will add
>> > __GFP_RECLAIMABLE to all reclaimable caches automagically AFAIR. The
>> > remaining would have to implement some kind of shrinker to allow the
>> > reclaim.
>> 
>> I seem to be not making myself clear.  Picture me writing a device driver.
>> When should I use GFP_TEMPORARY?
>
> I guess the original intention was to use this flag for allocations
> which will be either freed shortly or they are reclaimable.

I would really like to see GFP_TEMPORARY described as a contract, rather
than in terms of implementation details.
What are the benefits of using it, and what are the costs?

For example, with GFP_NOFS, we know that the benefits are "no recursion
into the filesystem for reclaim" and hence no deadlocks.  The costs are
that failure is more likely.  So it is easy to know when to use it, and
it is easy to see if either side breaks the contract.

What are the benefits of GFP_TEMPORARY?  Presumably it doesn't guarantee
success any more than GFP_KERNEL does, but maybe it is slightly less
likely to fail, and somewhat less likely to block for a long time??  But
without some sort of promise, I wonder why anyone would use the
flag.  Is there a promise?  Or is it just "you can be nice to the MM
layer by setting this flag sometimes". ???

And what, exactly, are the costs?  How soon is "shortly"?  Below you say
"not forever", which is very, very different from "shortly", at least it is
on my calendar.

I would like to suggest:

  GFP_TEMPORARY should be used when the memory allocated will either be
  freed, or will be placed in a reclaimable cache, before the process
  which allocated it enters a TASK_INTERRUPTIBLE sleep or returns to
  user-space.  It allows access to memory which is usually reserved for
  XXX and so can be expected to succeed more quickly during times of
  high memory pressure.

Using GFP_TEMPORARY would then help make the code self-documenting and
might improve behaviour under memory pressure in some cases.  It would
also be clear whether a particular use was incorrect, and whether a change
in the behaviour of the MM would be consistent with the contract.

The rules given here might be stricter than necessary with the current
implementation, but they are clear and measurable.  This gives room for
code to change in the future without breaking things.
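
One way such a contract could be made measurable, sketched in
user-space C: count outstanding temporary allocations and check the
count at the points the contract names.  struct temp_tracker and the
temp_*() helpers are hypothetical names invented for this illustration;
nothing like them is claimed to exist in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Per-task bookkeeping for the sketch. */
struct temp_tracker {
	size_t outstanding;	/* temporary allocations not yet released */
};

/* Allocate and record a temporary object. */
static void *temp_alloc(struct temp_tracker *t, size_t size)
{
	void *p = malloc(size);

	if (p)
		t->outstanding++;
	return p;
}

/* Free a temporary object and drop it from the count. */
static void temp_free(struct temp_tracker *t, void *p)
{
	if (!p)
		return;
	free(p);
	t->outstanding--;
}

/* The object now belongs to a reclaimable cache, not to the task. */
static void temp_handoff_to_cache(struct temp_tracker *t)
{
	t->outstanding--;
}

/*
 * To be checked before an interruptible sleep or a return to user
 * space: the contract holds only if nothing temporary is still held.
 */
static bool temp_contract_held(const struct temp_tracker *t)
{
	return t->outstanding == 0;
}
```

A debugging build could warn whenever temp_contract_held() is false at
one of those points, which is what would make a violation by either
side visible.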

NeilBrown




>  
>> > > Example usages that I have questions about:
>> > > 
>> > > 1. Is it permissible to call kmalloc(GFP_TEMPORARY), or is it only
>> > > for alloc_pages?
>> > 
>> > kmalloc will use it internally as mentioned above.  I am not even sure
>> > whether direct using of kmalloc(GFP_TEMPORARY) is ok.  I would have to
>> > check the code but I guess it would be just wrong unless you know your
>> > cache is reclaimable.
>> 
>> You're not using words that have any meaning to a device driver writer.
>> Here's my code:
>> 
>> int foo_ioctl(..)
>> {
>> 	struct foo *foo = kmalloc(sizeof(*foo), GFP_TEMPORARY);
>> }
>> 
>> Does this work?  If not, should it?  Or should slab be checking for
>> this and calling WARN()?
>
> I would have to check the code but I believe that this shouldn't be
> harmful other than increase the fragmentation.
>
>> > > I ask because if the slab allocator is unaware of
>> > > GFP_TEMPORARY, then a non-GFP_TEMPORARY allocation may be placed in a
>> > > page allocated with GFP_TEMPORARY and we've just made it meaningless.
>> > > 
>> > > 2. Is it permissible to sleep while holding a GFP_TEMPORARY allocation?
>> > > eg, take a mutex, or wait_for_completion()?
>> > 
>> > Yes, GFP_TEMPORARY has ___GFP_DIRECT_RECLAIM set so this is by
>> > definition sleepable allocation request.
>> 
>> Again, we're talking past each other.  Can foo_ioctl() sleep before
>> releasing its GFP_TEMPORARY allocation, or will that make the memory
>> allocator unhappy?
>
> I do not think it would make the allocator unhappy as long as the sleep
> is not for ever...
>
>> > > 3. Can I make one GFP_TEMPORARY allocation, and then another one?
>> > 
>> > Not sure I understand. Why would that be a problem?
>> 
>> As you say above, GFP_TEMPORARY may sleep, so this is a variation on the
>> "can I sleep while holding a GFP_TEMPORARY allocation" question.
>> 
>> > > 4. Should I disable preemption while holding a GFP_TEMPORARY allocation,
>> > > or are we OK with a task being preempted?
>> > 
>> > no, it can sleep.
>> > 
>> > > 5. What about something even longer duration like allocating a kiocb?
>> > > That might take an arbitrary length of time to be freed, but eventually
>> > > the command will be timed out (eg 30 seconds for something that ends up
>> > > going through SCSI).
>> > 
>> > I do not understand. The reclaimability of the object is in hands of the
>> > respective shrinker...
>> 
>> There is no shrinker here.  This is about the object being "temporary",
>> for some value of temporary.  I want to nail down what the MM is willing
>> to tolerate in terms of length of time an object is allocated for.
>
> From my understanding MM will use the information for optimizing objects
> placing and the longer the user will use that memory the worse this
> optimization works. I do not think the (ab)use would be fatal...
>  
>> > > 6. Or shorter duration like doing a GFP_TEMPORARY allocation, then taking
>> > > a spinlock, which *probably* isn't contended, but you never know.
>> > > 
>> > > 7. I can see it includes __GFP_WAIT so it's not suitable for using from
>> > > interrupt context, but interrupt context might be the place which can
>> > > benefit from it the most.  Or does GFP_ATOMIC's __GFP_HIGH also allow for
>> > > allocation from the movable zone?  Should we have a GFP_TEMPORARY_ATOMIC?
>> > 
>> > This is where __GFP_RECLAIMABLE should be used as this is the core of
>> > the functionality.
>> 
>> This response also doesn't make sense to me.
>
> I meant to say that such an allocation can use __GFP_RECLAIMABLE | __GFP_NOWAIT.
>
>
> -- 
> Michal Hocko
> SUSE Labs
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [ATTEND] many topics
  2017-01-21  0:11           ` NeilBrown
@ 2017-01-21 13:16             ` Theodore Ts'o
  2017-01-22  4:45               ` NeilBrown
  0 siblings, 1 reply; 23+ messages in thread
From: Theodore Ts'o @ 2017-01-21 13:16 UTC (permalink / raw)
  To: NeilBrown; +Cc: Michal Hocko, willy, willy, lsf-pc, linux-fsdevel, linux-mm

On Sat, Jan 21, 2017 at 11:11:41AM +1100, NeilBrown wrote:
> What are the benefits of GFP_TEMPORARY?  Presumably it doesn't guarantee
> success any more than GFP_KERNEL does, but maybe it is slightly less
> likely to fail, and somewhat less likely to block for a long time??  But
> without some sort of promise, I wonder why anyone would use the
> flag.  Is there a promise?  Or is it just "you can be nice to the MM
> layer by setting this flag sometimes". ???

My understanding is that the idea is to allow short-term use cases not
to be mixed with long-term use cases --- in the Java world, to declare
that a particular object will never be promoted from the "nursery"
arena to the "tenured" arena, so that we don't end up with a situation
where a page is used 90% for temporary objects, and 10% for a tenured
object, such that later on we have a page which is 90% unused.
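
A toy calculation of that effect, assuming fixed-size objects and a
fixed number of slots per page.  pinned_pages_mixed() and
pinned_pages_segregated() are names made up for this sketch; the first
models the worst case where every long-lived object ends up on a
different page among temporaries, the second models keeping long-lived
objects together.

```c
#include <stddef.h>

/*
 * Worst-case mixing: each tenured object sits on its own page, so once
 * the temporaries are freed, one page stays pinned per tenured object
 * (bounded by the number of pages in use).
 */
static size_t pinned_pages_mixed(size_t tenured_objects, size_t total_pages)
{
	return tenured_objects < total_pages ? tenured_objects : total_pages;
}

/*
 * Segregated placement: tenured objects are packed together, so they
 * pin only as many pages as they fill.  slots_per_page must be nonzero.
 */
static size_t pinned_pages_segregated(size_t tenured_objects,
				      size_t slots_per_page)
{
	return (tenured_objects + slots_per_page - 1) / slots_per_page;
}
```

With 10 slots per page and 10 long-lived objects spread over 10 pages,
mixing leaves all 10 pages pinned at 10% use each, whereas segregation
pins a single full page and lets the other nine be freed.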

Many of the existing users may in fact be for things like a temporary
bounce buffer for I/O, where declaring this to the mm system could
lead to less fragmented pages, but which would violate your proposed
contract:

>   GFP_TEMPORARY should be used when the memory allocated will either be
>   freed, or will be placed in a reclaimable cache, before the process
>   which allocated it enters a TASK_INTERRUPTIBLE sleep or returns to
>   user-space.  It allows access to memory which is usually reserved for
>   XXX and so can be expected to succeed more quickly during times of
>   high memory pressure.

I think what you are suggesting is something very different, where you
are thinking that for *very* short-term usage perhaps we could have a
pool of memory, perhaps the same as the GFP_ATOMIC memory, or at least
similar in mechanism, where such usage could be handy.

Are there enough use cases where this would be useful?  In the local
disk backed file system world, I doubt it.  But maybe in the (for
example) NFS world, such a use would in fact be common enough that it
would be useful.

I'd suggest doing this though as a new category, perhaps
GFP_REALLY_SHORT_TERM, or GFP_MAYFLY for short.  :-)

		       	  	     	 	 - Ted

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [ATTEND] many topics
  2017-01-21 13:16             ` Theodore Ts'o
@ 2017-01-22  4:45               ` NeilBrown
  2017-01-23  6:05                 ` Matthew Wilcox
  0 siblings, 1 reply; 23+ messages in thread
From: NeilBrown @ 2017-01-22  4:45 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Michal Hocko, willy, willy, lsf-pc, linux-fsdevel, linux-mm

[-- Attachment #1: Type: text/plain, Size: 3889 bytes --]

On Sun, Jan 22 2017, Theodore Ts'o wrote:

> On Sat, Jan 21, 2017 at 11:11:41AM +1100, NeilBrown wrote:
>> What are the benefits of GFP_TEMPORARY?  Presumably it doesn't guarantee
>> success any more than GFP_KERNEL does, but maybe it is slightly less
>> likely to fail, and somewhat less likely to block for a long time??  But
>> without some sort of promise, I wonder why anyone would use the
>> flag.  Is there a promise?  Or is it just "you can be nice to the MM
>> layer by setting this flag sometimes". ???
>
> My understanding is that the idea is to allow short-term use cases not
> to be mixed with long-term use cases --- in the Java world, to declare
> that a particular object will never be promoted from the "nursery"
> arena to the "tenured" arena, so that we don't end up with a situation
> where a page is used 90% for temporary objects, and 10% for a tenured
> object, such that later on we have a page which is 90% unused.
>
> Many of the existing users may in fact be for things like a temporary
> bounce buffer for I/O, where declaring this to the mm system could
> lead to less fragmented pages, but which would violate your proposed
> contract:
>
>>   GFP_TEMPORARY should be used when the memory allocated will either be
>>   freed, or will be placed in a reclaimable cache, before the process
>>   which allocated it enters a TASK_INTERRUPTIBLE sleep or returns to
>>   user-space.  It allows access to memory which is usually reserved for
>>   XXX and so can be expected to succeed more quickly during times of
>>   high memory pressure.
>
> I think what you are suggesting is something very different, where you
> are thinking that for *very* short-term usage perhaps we could have a
> pool of memory, perhaps the same as the GFP_ATOMIC memory, or at least
> similar in mechanism, where such usage could be handy.
>
> Are there enough use cases where this would be useful?  In the local
> disk backed file system world, I doubt it.  But maybe in the (for
> example) NFS world, such a use would in fact be common enough that it
> would be useful.
>
> I'd suggest doing this though as a new category, perhaps
> GFP_REALLY_SHORT_TERM, or GFP_MAYFLY for short.  :-)

I'm not suggesting this particular contract is necessarily a good thing
to have.  I just suggested it as a possible definition of
"GFP_TEMPORARY".
If you are correct, then I was clearly wrong - which nicely serves to
demonstrate that a clear definition is needed.

You have used terms like "nursery" and "tenured" which don't really help
without definitions of those terms.
How about

   GFP_TEMPORARY should be used when the memory allocated will either be
   freed, or will be placed in a reclaimable cache, after some sequence
   of events which is time-limited. i.e. there must be no indefinite
   wait on the path from allocation to freeing-or-caching.
   The memory will typically be allocated from a region dedicated to
   GFP_TEMPORARY allocations, thus ensuring that this region does not
   become fragmented.  Consequently, the delay imposed on GFP_TEMPORARY
   allocations is likely to be less than for non-TEMPORARY allocations
   when memory pressure is high.

??
I think that for this definition to work, we would need to make it "a
movable cache", meaning that any item can be either freed or
re-allocated (presumably to a "tenured" location).  I don't think we
currently have that concept for slabs do we?  That implies that this
flag would only apply to whole-page allocations  (which was part of the
original question).  We could presumably add movability to
slab-shrinkers if this seemed like a good idea.

I think that it would also make sense to require that the path from
allocation to freeing (or caching) of GFP_TEMPORARY allocation must not
wait for a non-TEMPORARY allocation, as that becomes an indefinite wait.

Is that any closer to your understanding?

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [ATTEND] many topics
  2017-01-22  4:45               ` NeilBrown
@ 2017-01-23  6:05                 ` Matthew Wilcox
  2017-01-23  6:30                   ` NeilBrown
  2017-01-23 17:09                   ` Theodore Ts'o
  0 siblings, 2 replies; 23+ messages in thread
From: Matthew Wilcox @ 2017-01-23  6:05 UTC (permalink / raw)
  To: NeilBrown
  Cc: Theodore Ts'o, Michal Hocko, lsf-pc, linux-fsdevel, linux-mm

On Sun, Jan 22, 2017 at 03:45:01PM +1100, NeilBrown wrote:
> On Sun, Jan 22 2017, Theodore Ts'o wrote:
> > On Sat, Jan 21, 2017 at 11:11:41AM +1100, NeilBrown wrote:
> >> What are the benefits of GFP_TEMPORARY?  Presumably it doesn't guarantee
> >> success any more than GFP_KERNEL does, but maybe it is slightly less
> >> likely to fail, and somewhat less likely to block for a long time??  But
> >> without some sort of promise, I wonder why anyone would use the
> >> flag.  Is there a promise?  Or is it just "you can be nice to the MM
> >> layer by setting this flag sometimes". ???
> >
> > My understanding is that the idea is to allow short-term use cases not
> > to be mixed with long-term use cases --- in the Java world, to declare
> > that a particular object will never be promoted from the "nursery"
> > arena to the "tenured" arena, so that we don't end up with a situation
> > where a page is used 90% for temporary objects, and 10% for a tenured
> > object, such that later on we have a page which is 90% unused.
> >
> > Many of the existing users may in fact be for things like a temporary
> > bounce buffer for I/O, where declaring this to the mm system could
> > lead to less fragmented pages, but which would violate your proposed
> > contract:

I don't have a clear picture in my mind of when Java promotes objects
from nursery to tenure ... which is not too different from my lack of
understanding of what the MM layer considers "temporary" :-)  Is it
acceptable usage to allocate a SCSI command (guaranteed to be freed
within 30 seconds) from the temporary area?  Or should it only be used
for allocations where the thread of control is not going to sleep between
allocation and freeing?

> You have used terms like "nursery" and "tenured" which don't really help
> without definitions of those terms.
> How about
> 
>    GFP_TEMPORARY should be used when the memory allocated will either be
>    freed, or will be placed in a reclaimable cache, after some sequence
>    of events which is time-limited. i.e. there must be no indefinite
>    wait on the path from allocation to freeing-or-caching.
>    The memory will typically be allocated from a region dedicated to
>    GFP_TEMPORARY allocations, thus ensuring that this region does not
>    become fragmented.  Consequently, the delay imposed on GFP_TEMPORARY
>    allocations is likely to be less than for non-TEMPORARY allocations
>    when memory pressure is high.

I think you're overcomplicating your proposed contract by allowing for
the "adding to a reclaimable cache" case.  If that will happen, the
code should be using GFP_RECLAIMABLE, not GFP_TEMPORARY as a matter of
good documentation.  And to allow the definitions to differ in future.
Maybe they will always be the same bit pattern, but the code should
distinguish the two cases (obviously there is no problem with allocating
memory with GFP_RECLAIMABLE, then deciding you didn't need it after all
and freeing it).

> ??
> I think that for this definition to work, we would need to make it "a
> movable cache", meaning that any item can be either freed or
> re-allocated (presumably to a "tenured" location).  I don't think we
> currently have that concept for slabs do we?  That implies that this
> flag would only apply to whole-page allocations  (which was part of the
> original question).  We could presumably add movability to
> slab-shrinkers if this seemed like a good idea.

Funnily, Christoph Lameter and I are working on just such a proposal.
He put it up as a topic discussion at the LCA Kernel Miniconf, and I've
done a proof of concept implementation for radix tree nodes.  It needs
changes to the radix tree API to make it work, so it's not published yet,
but it's a useful proof of concept for things which can probably work
and be more effective, like the dentry & inode caches.

> I think that it would also make sense to require that the path from
> allocation to freeing (or caching) of GFP_TEMPORARY allocation must not
> wait for a non-TEMPORARY allocation, as that becomes an indefinite wait.

... can it even wait for *another* TEMPORARY allocation?  I really think
this discussion needs to take place in a room with more people present
so we can get misunderstandings hammered out and general acceptance of
the consensus.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [ATTEND] many topics
  2017-01-23  6:05                 ` Matthew Wilcox
@ 2017-01-23  6:30                   ` NeilBrown
  2017-01-23  6:35                     ` Matthew Wilcox
  2017-01-23 17:09                   ` Theodore Ts'o
  1 sibling, 1 reply; 23+ messages in thread
From: NeilBrown @ 2017-01-23  6:30 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Theodore Ts'o, Michal Hocko, lsf-pc, linux-fsdevel, linux-mm

[-- Attachment #1: Type: text/plain, Size: 5049 bytes --]

On Mon, Jan 23 2017, Matthew Wilcox wrote:

> On Sun, Jan 22, 2017 at 03:45:01PM +1100, NeilBrown wrote:
>> On Sun, Jan 22 2017, Theodore Ts'o wrote:
>> > On Sat, Jan 21, 2017 at 11:11:41AM +1100, NeilBrown wrote:
>> >> What are the benefits of GFP_TEMPORARY?  Presumably it doesn't guarantee
>> >> success any more than GFP_KERNEL does, but maybe it is slightly less
>> >> likely to fail, and somewhat less likely to block for a long time??  But
>> >> without some sort of promise, I wonder why anyone would use the
>> >> flag.  Is there a promise?  Or is it just "you can be nice to the MM
>> >> layer by setting this flag sometimes". ???
>> >
>> > My understanding is that the idea is to allow short-term use cases not
>> > to be mixed with long-term use cases --- in the Java world, to declare
>> > that a particular object will never be promoted from the "nursery"
>> > arena to the "tenured" arena, so that we don't end up with a situation
>> > where a page is used 90% for temporary objects, and 10% for a tenured
>> > object, such that later on we have a page which is 90% unused.
>> >
>> > Many of the existing users may in fact be for things like a temporary
>> > bounce buffer for I/O, where declaring this to the mm system could
>> > lead to less fragmented pages, but which would violate your proposed
>> > contract:
>
> I don't have a clear picture in my mind of when Java promotes objects
> from nursery to tenure ... which is not too different from my lack of
> understanding of what the MM layer considers "temporary" :-)  Is it
> acceptable usage to allocate a SCSI command (guaranteed to be freed
> within 30 seconds) from the temporary area?  Or should it only be used
> for allocations where the thread of control is not going to sleep between
> allocation and freeing?
>
>> You have used terms like "nursery" and "tenured" which don't really help
>> without definitions of those terms.
>> How about
>> 
>>    GFP_TEMPORARY should be used when the memory allocated will either be
>>    freed, or will be placed in a reclaimable cache, after some sequence
>>    of events which is time-limited. i.e. there must be no indefinite
>>    wait on the path from allocation to freeing-or-caching.
>>    The memory will typically be allocated from a region dedicated to
>>    GFP_TEMPORARY allocations, thus ensuring that this region does not
>>    become fragmented.  Consequently, the delay imposed on GFP_TEMPORARY
>>    allocations is likely to be less than for non-TEMPORARY allocations
>>    when memory pressure is high.
>
> I think you're overcomplicating your proposed contract by allowing for
> the "adding to a reclaimable cache" case.  If that will happen, the
> code should be using GFP_RECLAIMABLE, not GFP_TEMPORARY as a matter of
> good documentation.  And to allow the definitions to differ in future.
> Maybe they will always be the same bit pattern, but the code should
> distinguish the two cases (obviously there is no problem with allocating
> memory with GFP_RECLAIMABLE, then deciding you didn't need it after all
> and freeing it).

I only included the "Reclaimable cache" possibility because Michal said:

   I guess the original intention was to use this flag for allocations
   which will be either freed shortly or they are reclaimable.


>
>> ??
>> I think that for this definition to work, we would need to make it "a
>> movable cache", meaning that any item can be either freed or
>> re-allocated (presumably to a "tenured" location).  I don't think we
>> currently have that concept for slabs do we?  That implies that this
>> flag would only apply to whole-page allocations  (which was part of the
>> original question).  We could presumably add movability to
>> slab-shrinkers if this seemed like a good idea.
>
> Funnily, Christoph Lameter and I are working on just such a proposal.
> He put it up as a topic discussion at the LCA Kernel Miniconf, and I've
> done a proof of concept implementation for radix tree nodes.  It needs
> changes to the radix tree API to make it work, so it's not published yet,
> but it's a useful proof of concept for things which can probably work
> and be more effective, like the dentry & inode caches.

Awesome!

>
>> I think that it would also make sense to require that the path from
>> allocation to freeing (or caching) of GFP_TEMPORARY allocation must not
>> wait for a non-TEMPORARY allocation, as that becomes an indefinite wait.
>
> ... can it even wait for *another* TEMPORARY allocation?  I really think
> this discussion needs to take place in a room with more people present
> so we can get misunderstandings hammered out and general acceptance of
> the consensus.

I suspect you are right, but throwing around some thoughts in advance,
to spark new ideas, can't hurt?  I hate going to meetings where the
agenda has a topic, but no background discussion.  It means that I have
to do all my thinking on my feet (not that I'll be at this meeting).

NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [ATTEND] many topics
  2017-01-23  6:30                   ` NeilBrown
@ 2017-01-23  6:35                     ` Matthew Wilcox
  0 siblings, 0 replies; 23+ messages in thread
From: Matthew Wilcox @ 2017-01-23  6:35 UTC (permalink / raw)
  To: NeilBrown
  Cc: Theodore Ts'o, Michal Hocko, lsf-pc, linux-fsdevel, linux-mm

On Mon, Jan 23, 2017 at 05:30:39PM +1100, NeilBrown wrote:
> On Mon, Jan 23 2017, Matthew Wilcox wrote:
> > ... can it even wait for *another* TEMPORARY allocation?  I really think
> > this discussion needs to take place in a room with more people present
> > so we can get misunderstandings hammered out and general acceptance of
> > the consensus.
> 
> I suspect you are right, but throwing around some thoughts in advance,
> to spark new ideas, can't hurt?  I hate going to meetings where the
> agenda has a topic, but no background discussion.  It means that I have
> to do all my thinking on my feet (not that I'll be at this meeting).

I have no desire to shut down discussion on the list!  I just wished to
point out that I don't think we're going to reach consensus on this issue,
and so it'll be worth having on the agenda.

Also, I am sad that I won't see you in Boston, and that you weren't able
to be in Hobart either.  Maybe Sydney next year?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [ATTEND] many topics
  2017-01-23  6:05                 ` Matthew Wilcox
  2017-01-23  6:30                   ` NeilBrown
@ 2017-01-23 17:09                   ` Theodore Ts'o
  2017-01-23 19:34                     ` NeilBrown
  1 sibling, 1 reply; 23+ messages in thread
From: Theodore Ts'o @ 2017-01-23 17:09 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: NeilBrown, Michal Hocko, lsf-pc, linux-fsdevel, linux-mm

On Sun, Jan 22, 2017 at 10:05:44PM -0800, Matthew Wilcox wrote:
> 
> I don't have a clear picture in my mind of when Java promotes objects
> from nursery to tenure

It's typically on the order of minutes.   :-)

> ... which is not too different from my lack of
> understanding of what the MM layer considers "temporary" :-)  Is it
> acceptable usage to allocate a SCSI command (guaranteed to be freed
> within 30 seconds) from the temporary area?  Or should it only be used
> for allocations where the thread of control is not going to sleep between
> allocation and freeing?

What the mm folks have said is that it's to prevent fragmentation.  If
that's the optimization, whether or not the process allocating
the memory sleeps for a few hundred milliseconds, or even seconds, is
really in the noise compared with the average lifetime of an inode in
the inode cache, or a page in the page cache....

Why do you think it matters whether or not we sleep?  I've not heard
any explanation for the assumption for why this might be important.

    		    		       	   - Ted


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [ATTEND] many topics
  2017-01-23 17:09                   ` Theodore Ts'o
@ 2017-01-23 19:34                     ` NeilBrown
  2017-01-25 14:36                       ` Vlastimil Babka
  0 siblings, 1 reply; 23+ messages in thread
From: NeilBrown @ 2017-01-23 19:34 UTC (permalink / raw)
  To: Theodore Ts'o, Matthew Wilcox
  Cc: Michal Hocko, lsf-pc, linux-fsdevel, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1928 bytes --]

On Tue, Jan 24 2017, Theodore Ts'o wrote:

> On Sun, Jan 22, 2017 at 10:05:44PM -0800, Matthew Wilcox wrote:
>> 
>> I don't have a clear picture in my mind of when Java promotes objects
>> from nursery to tenure
>
> It's typically on the order of minutes.   :-)
>
>> ... which is not too different from my lack of
>> understanding of what the MM layer considers "temporary" :-)  Is it
>> acceptable usage to allocate a SCSI command (guaranteed to be freed
>> within 30 seconds) from the temporary area?  Or should it only be used
>> for allocations where the thread of control is not going to sleep between
>> allocation and freeing?
>
> What the mm folks have said is that it's to prevent fragmentation.  If
> that's the optimization, whether or not the process allocating
> the memory sleeps for a few hundred milliseconds, or even seconds, is
> really in the noise compared with the average lifetime of an inode in
> the inode cache, or a page in the page cache....
>
> Why do you think it matters whether or not we sleep?  I've not heard
> any explanation for the assumption for why this might be important.

Because "TEMPORARY" implies a limit to the amount of time, and sleeping
is the thing that causes a process to take a large amount of time.  It
seems like an obvious connection to me.

Imagine I want to allocate a large contiguous region in the
ZONE_MOVABLE region.  I find a mostly free region, so I just need to
move those last few pages.  If there is a limit on how long a process
can sleep while holding an allocation from ZONE_MOVABLE, then I know
how long, at most, I need to wait before those pages become either free
or movable.  If those processes can wait indefinitely, then I might have
to wait indefinitely to get this large region.

"temporary" doesn't mean anything without a well defined time limit.

But maybe I completely misunderstand.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [ATTEND] many topics
  2017-01-23 19:34                     ` NeilBrown
@ 2017-01-25 14:36                       ` Vlastimil Babka
  2017-01-25 20:36                         ` Matthew Wilcox
  2017-01-25 23:19                         ` NeilBrown
  0 siblings, 2 replies; 23+ messages in thread
From: Vlastimil Babka @ 2017-01-25 14:36 UTC (permalink / raw)
  To: NeilBrown, Theodore Ts'o, Matthew Wilcox
  Cc: Michal Hocko, lsf-pc, linux-fsdevel, linux-mm

On 01/23/2017 08:34 PM, NeilBrown wrote:
> On Tue, Jan 24 2017, Theodore Ts'o wrote:
>
>> On Sun, Jan 22, 2017 at 10:05:44PM -0800, Matthew Wilcox wrote:
>>>
>>> I don't have a clear picture in my mind of when Java promotes objects
>>> from nursery to tenure
>>
>> It's typically on the order of minutes.   :-)
>>
>>> ... which is not too different from my lack of
>>> understanding of what the MM layer considers "temporary" :-)  Is it
>>> acceptable usage to allocate a SCSI command (guaranteed to be freed
>>> within 30 seconds) from the temporary area?  Or should it only be used
>>> for allocations where the thread of control is not going to sleep between
>>> allocation and freeing?
>>
>> What the mm folks have said is that it's to prevent fragmentation.  If
>> that's the optimization, whether or not the process allocating
>> the memory sleeps for a few hundred milliseconds, or even seconds, is
>> really in the noise compared with the average lifetime of an inode in
>> the inode cache, or a page in the page cache....
>>
>> Why do you think it matters whether or not we sleep?  I've not heard
>> any explanation for the assumption for why this might be important.
>
> Because "TEMPORARY" implies a limit to the amount of time, and sleeping
> is the thing that causes a process to take a large amount of time.  It
> seems like an obvious connection to me.

There's no simple connection to time, it depends on the larger picture - what's 
the state of the allocator and what other allocations/frees are happening 
around this one. Perhaps let me try to explain what the flag does and what 
benefits are expected.

GFP_TEMPORARY, compared to GFP_KERNEL, adds __GFP_RECLAIMABLE, which tries to 
place the allocation within MIGRATE_RECLAIMABLE pageblocks - GFP_KERNEL implies 
MIGRATE_UNMOVABLE pageblocks, and userspace allocations are typically 
MIGRATE_MOVABLE. The main goal of this "mobility grouping" is to prevent the 
unmovable pages spreading all over the memory, making it impossible to get 
larger blocks by defragmentation (compaction). Ideally we would have all these 
problematic pages fit neatly into the smallest possible number of pageblocks 
that can accommodate them. But we can't know in advance how many, and we don't 
know their lifetimes, so there are various heuristics for relabeling pageblocks 
between the 3 types as we exceed the existing ones.

Now GFP_TEMPORARY means we tell the allocator about the relatively shorter 
lifetime, so it places the allocation within the RECLAIMABLE pageblocks, which 
are also used for slab caches that have shrinkers. The expected benefit of this 
is that we potentially prevent growing the number of UNMOVABLE pageblocks 
(either directly by this allocation, or a subsequent GFP_KERNEL one, that would 
otherwise fit within the existing pageblocks). While the RECLAIMABLE pages also 
cannot be defragmented (at least currently, there are some proposals for the 
slab caches...), we can at least shrink them, so the negative impact on 
compaction is considered less severe in the longer term.
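
As a rough illustration of the mobility grouping described above, the
mapping from allocation flags to pageblock type can be modelled in a
few lines of user-space C.  This is a sketch under stated assumptions,
not the kernel's implementation: every MODEL_* name and every bit value
below is invented for the example, and only the grouping itself
(GFP_KERNEL -> unmovable, GFP_TEMPORARY -> reclaimable, userspace ->
movable) is taken from the explanation above.

```c
/* Illustrative bit values for this model only; not kernel header values. */
#define MODEL_GFP_KERNEL       0x01u  /* stands in for GFP_KERNEL's bits */
#define MODEL_GFP_MOVABLE      0x08u  /* stands in for __GFP_MOVABLE */
#define MODEL_GFP_RECLAIMABLE  0x10u  /* stands in for __GFP_RECLAIMABLE */

/* GFP_TEMPORARY modelled as GFP_KERNEL plus the reclaimable hint. */
#define MODEL_GFP_TEMPORARY    (MODEL_GFP_KERNEL | MODEL_GFP_RECLAIMABLE)
/* A typical userspace page allocation modelled as movable. */
#define MODEL_GFP_USER_MOVABLE (MODEL_GFP_KERNEL | MODEL_GFP_MOVABLE)

enum model_migratetype {
    MODEL_MIGRATE_UNMOVABLE,
    MODEL_MIGRATE_MOVABLE,
    MODEL_MIGRATE_RECLAIMABLE,
};

/* Pick the pageblock type an allocation with these flags would target. */
static enum model_migratetype model_gfp_to_migratetype(unsigned int gfp)
{
    if (gfp & MODEL_GFP_MOVABLE)
        return MODEL_MIGRATE_MOVABLE;
    if (gfp & MODEL_GFP_RECLAIMABLE)
        return MODEL_MIGRATE_RECLAIMABLE;
    return MODEL_MIGRATE_UNMOVABLE;
}
```

With these definitions, MODEL_GFP_KERNEL maps to the unmovable group,
MODEL_GFP_TEMPORARY to the reclaimable group, and
MODEL_GFP_USER_MOVABLE to the movable group.  Nothing in the mapping
depends on how long the caller holds the memory.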

> Imagine I want to allocate a large contiguous region in the
> ZONE_MOVABLE region.  I find a mostly free region, so I just need to
> move those last few pages.  If there is a limit on how long a process
> can sleep while holding an allocation from ZONE_MOVABLE, then I know
> how long, at most, I need to wait before those pages become either free
> or movable.  If those processes can wait indefinitely, then I might have
> to wait indefinitely to get this large region.

Yeah so this is not relevant, because GFP_TEMPORARY does not make the allocation 
__GFP_MOVABLE, so it still is not allowed to end up within a ZONE_MOVABLE zone. 
Unfortunately the issue similar to that you mention does still exist due to 
uncontrolled pinning of the movable pages, which affects both ZONE_MOVABLE and 
CMA, but that's another story...

> "temporary" doesn't mean anything without a well defined time limit.
>
> But maybe I completely misunderstand.

HTH,
Vlastimil

> NeilBrown
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [ATTEND] many topics
  2017-01-25 14:36                       ` Vlastimil Babka
@ 2017-01-25 20:36                         ` Matthew Wilcox
  2017-01-25 21:15                           ` Vlastimil Babka
  2017-01-25 23:19                         ` NeilBrown
  1 sibling, 1 reply; 23+ messages in thread
From: Matthew Wilcox @ 2017-01-25 20:36 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: NeilBrown, Theodore Ts'o, Michal Hocko, lsf-pc,
	linux-fsdevel, linux-mm

On Wed, Jan 25, 2017 at 03:36:15PM +0100, Vlastimil Babka wrote:
> On 01/23/2017 08:34 PM, NeilBrown wrote:
> > Because "TEMPORARY" implies a limit to the amount of time, and sleeping
> > is the thing that causes a process to take a large amount of time.  It
> > seems like an obvious connection to me.
> 
> There's no simple connection to time, it depends on the larger picture -
> what's the state of the allocator and what other allocations/frees are
> happening around this one. Perhaps let me try to explain what the flag does
> and what benefits are expected.

The explanations of what GFP_TEMPORARY /does/ keep getting better and
better.  And thank you for that, it really is interesting.  But what
we're asking for is guidelines for the user of this interface; what is
the contract between the caller and the MM system?

So far, I think we've answered a few questions:

 - Using GFP_TEMPORARY in calls to kmalloc() is not currently supported
   because slab will happily allocate non-TEMPORARY allocations from the
   same page.
 - GFP_TEMPORARY allocations may be held on to for a considerable length
   of time; certainly seconds and maybe minutes.
 - The advantage of marking one's allocation as TEMPORARY is twofold:
   - This allocation is more likely to succeed due to being allowed to
     access more memory.
   - Other higher-order allocations are more likely to succeed due to
     the segregation of short and long lived allocations from each other.

I'd like to see us add a tmalloc() / tmalloc_atomic() / tfree() API
for allocating temporary memory, then hook that up to SLAB as a way to
allocate small amounts of memory (... although maybe we shouldn't try
too hard to allocate multiple objects from a single page if they're all
temporary ...)

In any case, we need to ensure that GFP_TEMPORARY is not accepted by
slab ... that's not as straightforward as adding __GFP_RECLAIMABLE to
GFP_SLAB_BUG_MASK because SLAB_RECLAIMABLE slabs will reasonably add
__GFP_RECLAIMABLE before the check.  So a good place to check it is ...
kmalloc_slab()?  That hits all three slab allocators.
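
The ordering problem can be sketched in user-space C.  This is only a
model of the idea, not a patch: struct model_cache,
MODEL_CACHE_RECLAIMABLE, MODEL_GFP_RECLAIMABLE and
model_prepare_gfp() are hypothetical names with made-up values.  The
point it shows is that the check has to look at the flags the caller
passed, before a reclaimable cache adds the bit on its own behalf.

```c
#include <stdbool.h>

/* Illustrative values for this model only; not kernel header values. */
#define MODEL_GFP_RECLAIMABLE   0x10u  /* stands in for __GFP_RECLAIMABLE */
#define MODEL_CACHE_RECLAIMABLE 0x01u  /* cache is marked reclaimable */

struct model_cache {
    unsigned int flags;
};

/*
 * Validate the caller's flags first, then let the cache add its own.
 * Returns false if the caller asked for a reclaimable (i.e. TEMPORARY)
 * allocation; otherwise stores the final flags in *final_gfp.
 */
static bool model_prepare_gfp(const struct model_cache *cache,
                              unsigned int caller_gfp,
                              unsigned int *final_gfp)
{
    /* Reject the bit only when it comes from the caller. */
    if (caller_gfp & MODEL_GFP_RECLAIMABLE)
        return false;

    *final_gfp = caller_gfp;

    /* A reclaimable cache legitimately adds the bit after the check. */
    if (cache->flags & MODEL_CACHE_RECLAIMABLE)
        *final_gfp |= MODEL_GFP_RECLAIMABLE;

    return true;
}
```

Checking the final flags instead would reject every allocation from a
reclaimable cache, which is why a mask applied late in the path is not
sufficient.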


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [ATTEND] many topics
  2017-01-25 20:36                         ` Matthew Wilcox
@ 2017-01-25 21:15                           ` Vlastimil Babka
  0 siblings, 0 replies; 23+ messages in thread
From: Vlastimil Babka @ 2017-01-25 21:15 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: NeilBrown, Theodore Ts'o, Michal Hocko, lsf-pc,
	linux-fsdevel, linux-mm

On 01/25/2017 09:36 PM, Matthew Wilcox wrote:
> On Wed, Jan 25, 2017 at 03:36:15PM +0100, Vlastimil Babka wrote:
>> On 01/23/2017 08:34 PM, NeilBrown wrote:
>> > Because "TEMPORARY" implies a limit to the amount of time, and sleeping
>> > is the thing that causes a process to take a large amount of time.  It
>> > seems like an obvious connection to me.
>>
>> There's no simple connection to time, it depends on the larger picture -
>> what's the state of the allocator and what other allocations/frees are
>> happening around this one. Perhaps let me try to explain what the flag does
>> and what benefits are expected.
>
> The explanations of what GFP_TEMPORARY /does/ keep getting better and
> better.  And thank you for that, it really is interesting.  But what
> we're asking for is guidelines for the user of this interface; what is
> the contract between the caller and the MM system?
>
> So far, I think we've answered a few questions:
>
>  - Using GFP_TEMPORARY in calls to kmalloc() is not currently supported
>    because slab will happily allocate non-TEMPORARY allocations from the
>    same page.

Sounds right, AFAIK there's no smarts in slab about this.

>  - GFP_TEMPORARY allocations may be held on to for a considerable length
>    of time; certainly seconds and maybe minutes.

I'd agree.

>  - The advantage of marking one's allocation as TEMPORARY is twofold:
>    - This allocation is more likely to succeed due to being allowed to
>      access more memory.

There's no such provision in the current implementation.

>    - Other higher-order allocations are more likely to succeed due to
>      the segregation of short and long lived allocations from each other.

Right.

> I'd like to see us add a tmalloc() / tmalloc_atomic() / tfree() API
> for allocating temporary memory, then hook that up to SLAB as a way to
> allocate small amounts of memory (... although maybe we shouldn't try
> too hard to allocate multiple objects from a single page if they're all
> temporary ...)

Before doing things like that, we should evaluate whether the benefits are
really worth it. I only know how the mobility grouping and related heuristics
work, but haven't measured or seen any results wrt GFP_TEMPORARY. Also, are
there some large potential users you have in mind? If there's always some
constant small amount of temporary allocations in the system, then the benefits
should be rather small, as that amount will be effectively non-defragmentable at
any given point in time. I would expect the most benefit when there are some
less frequent but large bursts of temporary allocations concurrently with
long-term unmovable allocations that will result in permanently polluting new
pageblocks.

> In any case, we need to ensure that GFP_TEMPORARY is not accepted by
> slab ... that's not as straightforward as adding __GFP_RECLAIMABLE to
> GFP_SLAB_BUG_MASK because SLAB_RECLAIMABLE slabs will reasonably add
> __GFP_RECLAIMABLE before the check.  So a good place to check it is ...
> kmalloc_slab()?  That hits all three slab allocators.
>

* Re: [ATTEND] many topics
  2017-01-25 14:36                       ` Vlastimil Babka
  2017-01-25 20:36                         ` Matthew Wilcox
@ 2017-01-25 23:19                         ` NeilBrown
  2017-01-26  8:56                           ` Michal Hocko
  1 sibling, 1 reply; 23+ messages in thread
From: NeilBrown @ 2017-01-25 23:19 UTC (permalink / raw)
  To: Vlastimil Babka, Theodore Ts'o, Matthew Wilcox
  Cc: Michal Hocko, lsf-pc, linux-fsdevel, linux-mm

On Wed, Jan 25 2017, Vlastimil Babka wrote:

> On 01/23/2017 08:34 PM, NeilBrown wrote:
>> On Tue, Jan 24 2017, Theodore Ts'o wrote:
>>
>>> On Sun, Jan 22, 2017 at 10:05:44PM -0800, Matthew Wilcox wrote:
>>>>
>>>> I don't have a clear picture in my mind of when Java promotes objects
>>>> from nursery to tenure
>>>
>>> It's typically on the order of minutes.   :-)
>>>
>>>> ... which is not too different from my lack of
>>>> understanding of what the MM layer considers "temporary" :-)  Is it
>>>> acceptable usage to allocate a SCSI command (guaranteed to be freed
>>>> within 30 seconds) from the temporary area?  Or should it only be used
>>>> for allocations where the thread of control is not going to sleep between
>>>> allocation and freeing?
>>>
>>> What the mm folks have said is that it's to prevent fragmentation.  If
>>> that's the optimization, whether or not the process allocating
>>> the memory sleeps for a few hundred milliseconds, or even seconds, is
>>> really in the noise compared with the average lifetime of an inode in
>>> the inode cache, or a page in the page cache....
>>>
>>> Why do you think it matters whether or not we sleep?  I've not heard
>>> any explanation for the assumption for why this might be important.
>>
>> Because "TEMPORARY" implies a limit to the amount of time, and sleeping
>> is the thing that causes a process to take a large amount of time.  It
>> seems like an obvious connection to me.
>
> There's no simple connection to time, it depends on the larger picture - what's 
> the state of the allocator and what other allocations/free's are happening 
> around this one. Perhaps let me try to explain what the flag does and what 
> benefits are expected.

If there is no simple connection to time, then I would discourage use of
the word "TEMPORARY" as that has a strong connection with the concept of time.

>
> GFP_TEMPORARY, compared to GFP_KERNEL, adds __GFP_RECLAIMABLE, which tries to 
> place the allocation within MIGRATE_RECLAIMABLE pageblocks - GFP_KERNEL implies 
> MIGRATE_UNMOVABLE pageblocks, and userspace allocations are typically 
> MIGRATE_MOVABLE. The main goal of this "mobility grouping" is to prevent the 
> unmovable pages spreading all over the memory, making it impossible to get 
> larger blocks by defragmentation (compaction). Ideally we would have all these 
> problematic pages fit neatly into the smallest possible number of pageblocks 
> that can accommodate them. But we can't know in advance how many, and we don't
> know their lifetimes, so there are various heuristics for relabeling pageblocks 
> between the 3 types as we exceed the existing ones.
>
> Now GFP_TEMPORARY means we tell the allocator about the relatively shorter 
> lifetime, so it places the allocation within the RECLAIMABLE pageblocks, which 
> are also used for slab caches that have shrinkers. The expected benefit of this 
> is that we potentially prevent growing the number of UNMOVABLE pageblocks 
> (either directly by this allocation, or a subsequent GFP_KERNEL one, that would 
> otherwise fit within the existing pageblocks). While the RECLAIMABLE pages also 
> cannot be defragmented (at least currently, there are some proposals for the 
> slab caches...), we can at least shrink them, so the negative impact on 
> compaction is considered less severe in the longer term.
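
The flag-to-pageblock mapping described in the quoted text can be modelled
in a few lines of userspace C. This is only a sketch under stated
assumptions: every MODEL_* name and every bit value below is made up for
illustration and is not taken from a kernel tree; the only thing carried
over from the discussion is that GFP_TEMPORARY is GFP_KERNEL plus
__GFP_RECLAIMABLE, and that the two mobility bits select one of three
pageblock types.

```c
#include <stdio.h>

/* Illustrative bit values (assumptions), chosen so the mobility bits are adjacent. */
#define MODEL_GFP_MOVABLE      0x08u
#define MODEL_GFP_RECLAIMABLE  0x10u
#define MODEL_MOBILITY_MASK    (MODEL_GFP_MOVABLE | MODEL_GFP_RECLAIMABLE)
#define MODEL_MOBILITY_SHIFT   3

/* Only the mobility bits matter in this model, so GFP_KERNEL is modelled as 0. */
#define MODEL_GFP_KERNEL        0x00u
#define MODEL_GFP_TEMPORARY     (MODEL_GFP_KERNEL | MODEL_GFP_RECLAIMABLE)
#define MODEL_GFP_USER_MOVABLE  (MODEL_GFP_KERNEL | MODEL_GFP_MOVABLE)

enum model_migratetype {
    MODEL_MIGRATE_UNMOVABLE   = 0,
    MODEL_MIGRATE_MOVABLE     = 1,
    MODEL_MIGRATE_RECLAIMABLE = 2,
    MODEL_MIGRATE_INVALID     = 3   /* both mobility bits set at once */
};

/* Derive the pageblock type an allocation would be grouped into. */
enum model_migratetype model_gfp_to_migratetype(unsigned int gfp)
{
    return (enum model_migratetype)
        ((gfp & MODEL_MOBILITY_MASK) >> MODEL_MOBILITY_SHIFT);
}

const char *model_migratetype_name(enum model_migratetype mt)
{
    switch (mt) {
    case MODEL_MIGRATE_UNMOVABLE:   return "UNMOVABLE";
    case MODEL_MIGRATE_MOVABLE:     return "MOVABLE";
    case MODEL_MIGRATE_RECLAIMABLE: return "RECLAIMABLE";
    default:                        return "INVALID";
    }
}

/* Print which pageblock type a given set of flags selects. */
void model_print_grouping(const char *label, unsigned int gfp)
{
    printf("%-22s -> %s\n", label,
           model_migratetype_name(model_gfp_to_migratetype(gfp)));
}
```

Calling model_print_grouping() with MODEL_GFP_KERNEL, MODEL_GFP_TEMPORARY
and MODEL_GFP_USER_MOVABLE shows the three groupings side by side. Note
that nothing in the mapping looks at how long the memory is held, which is
the point the rest of this reply turns on.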

Hmmm...  this seems like a fuzzy heuristic.
I can use GFP_TEMPORARY as long as I'll free the memory eventually, or
there is some way for you to ask me to free the memory, though I don't
have to succeed - ever.

If this heuristic actually works, and reduces fragmentation, then I
suspect it is more luck than good management.  You have maybe added
GFP_TEMPORARY in a few places which fit with your understanding of what
you want and which don't ruin the outcomes in your tests.  But without a
strong definition of when it can and cannot be used, it seems quite
likely that someone else will start using it in a way that fits within
your vague statement of requirements, but actually results in much more
fragmentation.

i.e. I think this is a fragile heuristic and not a long term solution
for anything.

I think it would be better if we could discard the idea of "reclaimable"
and just stick with "movable" and "unmovable".  Lots of things are not
movable at present, but could be made movable with relatively little
effort.  Once the interfaces are in place to allow arbitrary kernel code
to find out when things should be moved, I suspect that a lot of
allocations could become movable.

Before we reach that point, there might be some value in the heuristic
that "reclaimable" is sort-of close to "movable", but I don't think
that heuristic should appear in the public interface.  i.e. just 'or' in
__GFP_RECLAIMABLE where you think it is a good idea, and leave a big
comment explaining why, and how it can be removed when we have proper
interfaces for moving things.

Thanks,
NeilBrown


>
>> Imagine I want to allocate a large contiguous region in the
>> ZONE_MOVABLE region.  I find a mostly free region, so I just need to
>> move those last few pages.  If there is a limit on how long a process
>> can sleep while holding an allocation from ZONE_MOVABLE, then I know
>> how long, at most, I need to wait before those pages become either free
>> or movable.  If those processes can wait indefinitely, then I might have
>> to wait indefinitely to get this large region.
>
> Yeah so this is not relevant, because GFP_TEMPORARY does not make the allocation 
> __GFP_MOVABLE, so it still is not allowed to end up within a ZONE_MOVABLE zone. 
> Unfortunately the issue similar to that you mention does still exist due to 
> uncontrolled pinning of the movable pages, which affects both ZONE_MOVABLE and 
> CMA, but that's another story...
>
>> "temporary" doesn't mean anything without a well defined time limit.
>>
>> But maybe I completely misunderstand.
>
> HTH,
> Vlastimil
>
>> NeilBrown
>>


* Re: [ATTEND] many topics
  2017-01-25 23:19                         ` NeilBrown
@ 2017-01-26  8:56                           ` Michal Hocko
  2017-01-26 21:20                             ` NeilBrown
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2017-01-26  8:56 UTC (permalink / raw)
  To: NeilBrown
  Cc: Vlastimil Babka, Theodore Ts'o, Matthew Wilcox, lsf-pc,
	linux-fsdevel, linux-mm

On Thu 26-01-17 10:19:31, NeilBrown wrote:
> On Wed, Jan 25 2017, Vlastimil Babka wrote:
> 
> > On 01/23/2017 08:34 PM, NeilBrown wrote:
> >> On Tue, Jan 24 2017, Theodore Ts'o wrote:
> >>
> >>> On Sun, Jan 22, 2017 at 10:05:44PM -0800, Matthew Wilcox wrote:
> >>>>
> >>>> I don't have a clear picture in my mind of when Java promotes objects
> >>>> from nursery to tenure
> >>>
> >>> It's typically on the order of minutes.   :-)
> >>>
> >>>> ... which is not too different from my lack of
> >>>> understanding of what the MM layer considers "temporary" :-)  Is it
> >>>> acceptable usage to allocate a SCSI command (guaranteed to be freed
> >>>> within 30 seconds) from the temporary area?  Or should it only be used
> >>>> for allocations where the thread of control is not going to sleep between
> >>>> allocation and freeing?
> >>>
> >>> What the mm folks have said is that it's to prevent fragmentation.  If
> >>> that's the optimization, whether or not the process allocating
> >>> the memory sleeps for a few hundred milliseconds, or even seconds, is
> >>> really in the noise compared with the average lifetime of an inode in
> >>> the inode cache, or a page in the page cache....
> >>>
> >>> Why do you think it matters whether or not we sleep?  I've not heard
> >>> any explanation for the assumption for why this might be important.
> >>
> >> Because "TEMPORARY" implies a limit to the amount of time, and sleeping
> >> is the thing that causes a process to take a large amount of time.  It
> >> seems like an obvious connection to me.
> >
> > There's no simple connection to time, it depends on the larger picture - what's 
> > the state of the allocator and what other allocations/free's are happening 
> > around this one. Perhaps let me try to explain what the flag does and what 
> > benefits are expected.
> 
> If there is no simple connection to time, then I would discourage use of
> the word "TEMPORARY" as that has a strong connection with the concept of time.
> 
> >
> > GFP_TEMPORARY, compared to GFP_KERNEL, adds __GFP_RECLAIMABLE, which tries to 
> > place the allocation within MIGRATE_RECLAIMABLE pageblocks - GFP_KERNEL implies 
> > MIGRATE_UNMOVABLE pageblocks, and userspace allocations are typically 
> > MIGRATE_MOVABLE. The main goal of this "mobility grouping" is to prevent the 
> > unmovable pages spreading all over the memory, making it impossible to get 
> > larger blocks by defragmentation (compaction). Ideally we would have all these 
> > problematic pages fit neatly into the smallest possible number of pageblocks 
> > that can accommodate them. But we can't know in advance how many, and we don't
> > know their lifetimes, so there are various heuristics for relabeling pageblocks 
> > between the 3 types as we exceed the existing ones.
> >
> > Now GFP_TEMPORARY means we tell the allocator about the relatively shorter 
> > lifetime, so it places the allocation within the RECLAIMABLE pageblocks, which 
> > are also used for slab caches that have shrinkers. The expected benefit of this 
> > is that we potentially prevent growing the number of UNMOVABLE pageblocks 
> > (either directly by this allocation, or a subsequent GFP_KERNEL one, that would 
> > otherwise fit within the existing pageblocks). While the RECLAIMABLE pages also 
> > cannot be defragmented (at least currently, there are some proposals for the 
> > slab caches...), we can at least shrink them, so the negative impact on 
> > compaction is considered less severe in the longer term.
> 
> Hmmm...  this seems like a fuzzy heuristic.
> I can use GFP_TEMPORARY as long as I'll free the memory eventually, or
> there is some way for you to ask me to free the memory, though I don't
> have to succeed - ever.

I guess this was the original motivation. If you look at current users
then the pattern seems to be
	object = alloc(GFP_TEMPORARY);
	do_something_that_terminates_shortly();
	free(object);

Another pattern is
	cache = kmem_cache_create(SLAB_RECLAIM_ACCOUNT)
	[...]
	object = kmem_cache_alloc(cache, GFP_KERNEL)

so the latter one is an implicit GFP_TEMPORARY.
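
A rough userspace model of why the second pattern is an implicit
GFP_TEMPORARY: a cache created with the reclaim-account flag remembers an
extra allocation bit and ORs it into every request for a backing page. The
MODEL_* names, the struct layout and the bit values are assumptions made up
for this sketch, not the real slab internals.

```c
/* Illustrative flag values (assumptions). */
#define MODEL_GFP_KERNEL            0x00u
#define MODEL_GFP_RECLAIMABLE       0x10u
#define MODEL_GFP_TEMPORARY         (MODEL_GFP_KERNEL | MODEL_GFP_RECLAIMABLE)
#define MODEL_SLAB_RECLAIM_ACCOUNT  0x01u

/* Hypothetical stand-in for a slab cache descriptor. */
struct model_cache {
    unsigned int flags;       /* flags passed at cache creation */
    unsigned int allocflags;  /* bits added to every backing-page allocation */
};

/* Create a cache; reclaim-accounted caches always ask for reclaimable pages. */
struct model_cache model_cache_create(unsigned int flags)
{
    struct model_cache c = { .flags = flags, .allocflags = 0 };

    if (flags & MODEL_SLAB_RECLAIM_ACCOUNT)
        c.allocflags |= MODEL_GFP_RECLAIMABLE;
    return c;
}

/* Flags that would reach the page allocator when the cache needs a new page. */
unsigned int model_cache_page_flags(const struct model_cache *c,
                                    unsigned int gfp)
{
    return gfp | c->allocflags;
}
```

In this model, a GFP_KERNEL allocation from a reclaim-accounted cache ends
up with the same page-level flags as an explicit GFP_TEMPORARY request,
while the same call on a plain cache stays GFP_KERNEL.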

I completely agree that GFP_TEMPORARY is confusing and it needs a much
better documentation.

> If this heuristic actually works, and reduces fragmentation, then I
> suspect it is more luck than good management.  You have maybe added
> GFP_TEMPORARY in a few places which fit with your understanding of what
> you want and which don't ruin the outcomes in your tests.  But without a
> strong definition of when it can and cannot be used, it seems quite
> likely that someone else will start using it in a way that fits within
> your vague statement of requirements, but actually results in much more
> fragmentation.

After more thinking about this I completely agree. And it wouldn't
be the first time this has happened. I actually think that
we should simply remove GFP_TEMPORARY. I seriously doubt those few
users would change anything wrt. memory fragmentation.
SLAB_RECLAIM_ACCOUNT and __GFP_RECLAIMABLE, respectively, make perfect
sense, but the explicit usage of GFP_TEMPORARY without any contract just
calls for problems.
 
> i.e. I think this is a fragile heuristic and not a long term solution
> for anything.

Agreed!

> I think it would be better if we could discard the idea of "reclaimable"
> and just stick with "movable" and "unmovable".  Lots of things are not
> movable at present, but could be made movable with relatively little
> effort.  Once the interfaces are in place to allow arbitrary kernel code
> to find out when things should be moved, I suspect that a lot of
> allocations could become movable.

I believe we need both. There will be many objects which are hard to
make movable yet are reclaimable, which can help to reduce
fragmentation in the long term.
-- 
Michal Hocko
SUSE Labs

* Re: [ATTEND] many topics
  2017-01-26  8:56                           ` Michal Hocko
@ 2017-01-26 21:20                             ` NeilBrown
  2017-01-27 13:12                               ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: NeilBrown @ 2017-01-26 21:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vlastimil Babka, Theodore Ts'o, Matthew Wilcox, lsf-pc,
	linux-fsdevel, linux-mm

On Thu, Jan 26 2017, Michal Hocko wrote:

> On Thu 26-01-17 10:19:31, NeilBrown wrote:
>
>> I think it would be better if we could discard the idea of "reclaimable"
>> and just stick with "movable" and "unmovable".  Lots of things are not
>> movable at present, but could be made movable with relatively little
>> effort.  Once the interfaces are in place to allow arbitrary kernel code
>> to find out when things should be moved, I suspect that a lot of
>> allocations could become movable.
>
> I believe we need both. There will be many objects which are hard to
> make movable yet are reclaimable, which can help to reduce
> fragmentation in the long term.

Do we?  Any "reclaimable" objects which are "busy", are really
"unmovable" objects, and so contribute to fragmentation.

I've been thinking about inodes and dentries - which usually come up as
problematic objects in this context.
It would be quite complex to support moving arbitrary inodes or dentries
given the current design.  But maybe we don't need to.
Suppose these objects were allocated as 'movable', but when the first
long-term reference was taken (i.e. the first non-movable reference),
they were first moved to the "non-movable" region?
Then we only need to be able to move a subset of these, which will often
account for the bulk of the memory usage.
There would be costs of course, but I think it might be worth pursuing.
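
To make the idea concrete, here is a toy userspace sketch of "movable
until first pinned". Everything in it is hypothetical: the struct, the
function names and the use of a region tag plus a fresh malloc() to stand
in for moving an object between memory regions. It says nothing about how
inodes or dentries would really be handled.

```c
#include <stdlib.h>
#include <string.h>

enum model_region { MODEL_REGION_MOVABLE, MODEL_REGION_UNMOVABLE };

/* Hypothetical object that starts life in the movable region. */
struct model_object {
    enum model_region region;
    int long_term_refs;
    char payload[32];
};

/* Allocate an object as movable; no long-term references exist yet. */
struct model_object *model_object_alloc(const char *payload)
{
    struct model_object *obj = calloc(1, sizeof(*obj));

    if (!obj)
        return NULL;
    obj->region = MODEL_REGION_MOVABLE;
    strncpy(obj->payload, payload, sizeof(obj->payload) - 1);
    return obj;
}

/*
 * Take a long-term reference.  On the first one, the object is copied into
 * the "unmovable" region and the old copy is freed, so the caller must use
 * the returned pointer.  Returns NULL (leaving obj untouched) if the copy
 * cannot be allocated.
 */
struct model_object *model_object_pin(struct model_object *obj)
{
    if (obj->long_term_refs == 0 && obj->region == MODEL_REGION_MOVABLE) {
        struct model_object *copy = malloc(sizeof(*copy));

        if (!copy)
            return NULL;
        memcpy(copy, obj, sizeof(*copy));
        copy->region = MODEL_REGION_UNMOVABLE;
        free(obj);
        obj = copy;
    }
    obj->long_term_refs++;
    return obj;
}

/* Drop a long-term reference; the object stays where it is. */
void model_object_unpin(struct model_object *obj)
{
    if (obj->long_term_refs > 0)
        obj->long_term_refs--;
}
```

Objects that are never pinned stay in the movable region and can be
relocated freely; only the pinned subset pays the one-off cost of the copy.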

NeilBrown


* Re: [ATTEND] many topics
  2017-01-26 21:20                             ` NeilBrown
@ 2017-01-27 13:12                               ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2017-01-27 13:12 UTC (permalink / raw)
  To: NeilBrown
  Cc: Vlastimil Babka, Theodore Ts'o, Matthew Wilcox, lsf-pc,
	linux-fsdevel, linux-mm

On Fri 27-01-17 08:20:00, NeilBrown wrote:
> On Thu, Jan 26 2017, Michal Hocko wrote:
> 
> > On Thu 26-01-17 10:19:31, NeilBrown wrote:
> >
> >> I think it would be better if we could discard the idea of "reclaimable"
> >> and just stick with "movable" and "unmovable".  Lots of things are not
> >> movable at present, but could be made movable with relatively little
> >> effort.  Once the interfaces are in place to allow arbitrary kernel code
> >> to find out when things should be moved, I suspect that a lot of
> >> allocations could become movable.
> >
> > I believe we need both. There will be many objects which are hard to
> > make movable yet are reclaimable, which can help to reduce
> > fragmentation in the long term.
> 
> Do we?  Any "reclaimable" objects which are "busy", are really
> "unmovable" objects, and so contribute to fragmentation.

True, and not much different from other reclaimable or movable objects.
E.g. a pinned LRU page is also unmovable.

> I've been thinking about inodes and dentries - which usually come up as
> problematic objects in this context.
> It would be quite complex to support moving arbitrary inodes or dentries
> given the current design.  But maybe we don't need to.
> Suppose these objects were allocated as 'movable', but when the first
> long-term reference was taken (i.e. the first non-movable reference),
> they were first moved to the "non-movable" region?

I am not familiar enough with the [di]cache to comment on how easy it
would be to move those objects around. But there were already suggestions
that LRU pages be migrated before long-term pins are taken, so as not to
block migration. Anyway, this sounds like a topic of its own. From the
discussion so far, it seems that it would be really hard to define
sensible semantics for GFP_TEMPORARY with the current implementation, so I
will send a patch to simply drop this flag. If we want to have such a
flag then we should start over by defining the semantics first and
think this thing over properly.
-- 
Michal Hocko
SUSE Labs

end of thread, other threads:[~2017-01-27 13:12 UTC | newest]

Thread overview: 23+ messages
2017-01-18  5:49 [ATTEND] many topics Matthew Wilcox
2017-01-18 10:13 ` [Lsf-pc] " Jan Kara
2017-01-18 11:26   ` willy
2017-01-18 13:32 ` Michal Hocko
2017-01-19 11:05   ` willy
2017-01-19 11:33     ` Michal Hocko
2017-01-19 11:52       ` willy
2017-01-19 12:11         ` Michal Hocko
2017-01-21  0:11           ` NeilBrown
2017-01-21 13:16             ` Theodore Ts'o
2017-01-22  4:45               ` NeilBrown
2017-01-23  6:05                 ` Matthew Wilcox
2017-01-23  6:30                   ` NeilBrown
2017-01-23  6:35                     ` Matthew Wilcox
2017-01-23 17:09                   ` Theodore Ts'o
2017-01-23 19:34                     ` NeilBrown
2017-01-25 14:36                       ` Vlastimil Babka
2017-01-25 20:36                         ` Matthew Wilcox
2017-01-25 21:15                           ` Vlastimil Babka
2017-01-25 23:19                         ` NeilBrown
2017-01-26  8:56                           ` Michal Hocko
2017-01-26 21:20                             ` NeilBrown
2017-01-27 13:12                               ` Michal Hocko
