From: Michal Hocko <mhocko@kernel.org>
To: NeilBrown <neilb@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>, Theodore Ts'o <tytso@mit.edu>,
	Matthew Wilcox <willy@infradead.org>,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [ATTEND] many topics
Date: Thu, 26 Jan 2017 09:56:39 +0100
Message-ID: <20170126085639.GA6590@dhcp22.suse.cz>
In-Reply-To: <8760l2vibg.fsf@notabene.neil.brown.name>

On Thu 26-01-17 10:19:31, NeilBrown wrote:
> On Wed, Jan 25 2017, Vlastimil Babka wrote:
> 
> > On 01/23/2017 08:34 PM, NeilBrown wrote:
> >> On Tue, Jan 24 2017, Theodore Ts'o wrote:
> >>
> >>> On Sun, Jan 22, 2017 at 10:05:44PM -0800, Matthew Wilcox wrote:
> >>>>
> >>>> I don't have a clear picture in my mind of when Java promotes objects
> >>>> from nursery to tenure
> >>>
> >>> It's typically on the order of minutes.   :-)
> >>>
> >>>> ... which is not too different from my lack of
> >>>> understanding of what the MM layer considers "temporary" :-)  Is it
> >>>> acceptable usage to allocate a SCSI command (guaranteed to be freed
> >>>> within 30 seconds) from the temporary area?  Or should it only be used
> >>>> for allocations where the thread of control is not going to sleep between
> >>>> allocation and freeing?
> >>>
> >>> What the mm folks have said is that it's to prevent fragmentation.  If
> >>> that's the optimization, whether or not the process allocating
> >>> the memory sleeps for a few hundred milliseconds, or even seconds, is
> >>> really in the noise compared with the average lifetime of an inode in
> >>> the inode cache, or a page in the page cache....
> >>>
> >>> Why do you think it matters whether or not we sleep?  I've not heard
> >>> any explanation for the assumption for why this might be important.
> >>
> >> Because "TEMPORARY" implies a limit to the amount of time, and sleeping
> >> is the thing that causes a process to take a large amount of time.  It
> >> seems like an obvious connection to me.
> >
> > There's no simple connection to time, it depends on the larger picture - what's 
> > the state of the allocator and what other allocations/frees are happening 
> > around this one. Perhaps let me try to explain what the flag does and what 
> > benefits are expected.
> 
> If there is no simple connection to time, then I would discourage use of
> the word "TEMPORARY" as that has a strong connection with the concept of time.
> 
> >
> > GFP_TEMPORARY, compared to GFP_KERNEL, adds __GFP_RECLAIMABLE, which tries to 
> > place the allocation within MIGRATE_RECLAIMABLE pageblocks - GFP_KERNEL implies 
> > MIGRATE_UNMOVABLE pageblocks, and userspace allocations are typically 
> > MIGRATE_MOVABLE. The main goal of this "mobility grouping" is to prevent the 
> > unmovable pages spreading all over the memory, making it impossible to get 
> > larger blocks by defragmentation (compaction). Ideally we would have all these 
> > problematic pages fit neatly into the smallest possible number of pageblocks 
> > that can accommodate them. But we can't know in advance how many, and we don't 
> > know their lifetimes, so there are various heuristics for relabeling pageblocks 
> > between the 3 types as we exceed the existing ones.
> >
> > Now GFP_TEMPORARY means we tell the allocator about the relatively shorter 
> > lifetime, so it places the allocation within the RECLAIMABLE pageblocks, which 
> > are also used for slab caches that have shrinkers. The expected benefit of this 
> > is that we potentially prevent growing the number of UNMOVABLE pageblocks 
> > (either directly by this allocation, or a subsequent GFP_KERNEL one, that would 
> > otherwise fit within the existing pageblocks). While the RECLAIMABLE pages also 
> > cannot be defragmented (at least currently, there are some proposals for the 
> > slab caches...), we can at least shrink them, so the negative impact on 
> > compaction is considered less severe in the longer term.
> 
> Hmmm...  this seems like a fuzzy heuristic.
> I can use GFP_TEMPORARY as long as I'll free the memory eventually, or
> there is some way for you to ask me to free the memory, though I don't
> have to succeed - ever.

I guess this was the original motivation. If you look at current users
then the pattern seems to be
	object = alloc(GFP_TEMPORARY);
	do_something_that_terminates_shortly();
	free(object);

Another pattern is
	cache = kmem_cache_create(SLAB_RECLAIM_ACCOUNT)
	[...]
	object = kmem_cache_alloc(cache, GFP_KERNEL)

so the latter one is an implicit GFP_TEMPORARY.
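
To make the "implicit" part concrete, here is a minimal user-space sketch
of how the two patterns end up in the same pageblock type. It is a model,
not kernel code: every MODEL_* name, the bit values and the struct are
made up for illustration, and only the general idea (a cache created with
SLAB_RECLAIM_ACCOUNT adds __GFP_RECLAIMABLE to its allocations, and that
bit selects MIGRATE_RECLAIMABLE) is taken from the discussion above.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit values only; these are NOT the kernel's definitions. */
#define MODEL_GFP_KERNEL       0x01u  /* stands in for the "may reclaim/IO/FS" bits */
#define MODEL_GFP_MOVABLE      0x08u
#define MODEL_GFP_RECLAIMABLE  0x10u
#define MODEL_GFP_TEMPORARY    (MODEL_GFP_KERNEL | MODEL_GFP_RECLAIMABLE)

#define MODEL_SLAB_RECLAIM_ACCOUNT 0x1u

enum model_migratetype {
	MODEL_MIGRATE_UNMOVABLE,
	MODEL_MIGRATE_MOVABLE,
	MODEL_MIGRATE_RECLAIMABLE,
};

/* Pick the pageblock type from the mobility bits of the gfp mask. */
static enum model_migratetype model_gfp_to_migratetype(unsigned int gfp)
{
	if (gfp & MODEL_GFP_RECLAIMABLE)
		return MODEL_MIGRATE_RECLAIMABLE;
	if (gfp & MODEL_GFP_MOVABLE)
		return MODEL_MIGRATE_MOVABLE;
	return MODEL_MIGRATE_UNMOVABLE;
}

/* Hypothetical stand-in for a slab cache: only the extra gfp bits matter here. */
struct model_cache {
	unsigned int allocflags;
};

/* A cache with reclaim accounting marks all of its allocations reclaimable. */
static struct model_cache model_cache_create(unsigned int slab_flags)
{
	struct model_cache cache = { .allocflags = 0 };

	if (slab_flags & MODEL_SLAB_RECLAIM_ACCOUNT)
		cache.allocflags |= MODEL_GFP_RECLAIMABLE;
	return cache;
}

/* The caller's gfp mask is combined with the cache's own flags. */
static enum model_migratetype
model_cache_alloc_migratetype(const struct model_cache *cache, unsigned int gfp)
{
	assert(cache != NULL);
	return model_gfp_to_migratetype(gfp | cache->allocflags);
}
```

In this model a plain MODEL_GFP_KERNEL allocation from a reclaim-accounted
cache lands in the same place as an explicit MODEL_GFP_TEMPORARY one, while
the same MODEL_GFP_KERNEL allocation from an ordinary cache stays unmovable.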

I completely agree that GFP_TEMPORARY is confusing and it needs much
better documentation.

> If this heuristic actually works, and reduces fragmentation, then I
> suspect it is more luck than good management.  You have maybe added
> GFP_TEMPORARY in a few places which fit with your understanding of what
> you want and which don't ruin the outcomes in your tests.  But without a
> strong definition of when it can and cannot be used, it seems quite
> likely that someone else will start using it in a way that fits within
> your vague statement of requirements, but actually results in much more
> fragmentation.

After more thinking about this I completely agree. And it wouldn't
be the first time this has happened. I actually think that
we should simply remove GFP_TEMPORARY. I seriously doubt those few
users make any difference wrt. memory fragmentation.
SLAB_RECLAIM_ACCOUNT resp. __GFP_RECLAIMABLE make perfect sense, but
the explicit usage of GFP_TEMPORARY without any contract just calls for
problems.
 
> i.e. I think this is a fragile heuristic and not a long term solution
> for anything.

Agreed!

> I think it would be better if we could discard the idea of "reclaimable"
> and just stick with "movable" and "unmovable".  Lots of things are not
> movable at present, but could be made movable with relatively little
> effort.  Once the interfaces are in place to allow arbitrary kernel code
> to find out when things should be moved, I suspect that a lot of
> allocations could become movable.

I believe we need both. There will be many objects which are hard to
make movable yet are reclaimable, which can help to reduce
fragmentation in the long term.
-- 
Michal Hocko
SUSE Labs

