linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	jlayton@poochiereds.net, bfields@fieldses.org,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	koct9i@gmail.com, aquini@redhat.com,
	virtualization@lists.linux-foundation.org,
	Mel Gorman <mgorman@suse.de>, Hugh Dickins <hughd@google.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Rik van Riel <riel@redhat.com>,
	rknize@motorola.com, Gioh Kim <gi-oh.kim@profitbricks.com>,
	Sangseok Lee <sangseok.lee@lge.com>,
	Chan Gyun Jeong <chan.jeong@lge.com>,
	Al Viro <viro@ZenIV.linux.org.uk>,
	YiPing Xu <xuyiping@hisilicon.com>,
	Jonathan Corbet <corbet@lwn.net>
Subject: Re: [PATCH v3 03/16] mm: add non-lru movable page support document
Date: Mon, 4 Apr 2016 15:09:22 +0200	[thread overview]
Message-ID: <57026782.3020201@suse.cz> (raw)
In-Reply-To: <20160404022552.GD6543@bbox>

On 04/04/2016 04:25 AM, Minchan Kim wrote:
>>
>> Ah, I see, so it's designed with page lock to handle the concurrent isolations etc.
>>
>> In http://marc.info/?l=linux-mm&m=143816716511904&w=2 Mel has warned
>> about doing this in general under page_lock and suggested that each
>> user handles concurrent calls to isolate_page() internally. Might be
>> more generic that way, even if all current implementers will
>> actually use the page lock.
>
> We need PG_lock for two reasons.
>
> Firstly, it guarantees page's flags operation(i.e., PG_movable, PG_isolated)
> atomicity. Another thing is for stability for page->mapping->a_ops.
>
> For example,
>
> isolate_migratepages_block
>          if (PageMovable(page))
>                  isolate_movable_page
>                          get_page_unless_zero <--- 1
>                          trylock_page
>                          page->mapping->a_ops->isolate_page <--- 2
>
> Between 1 and 2, driver can nullify page->mapping so we need PG_lock

Hmm I see, that really doesn't seem easily solvable without page_lock.
My idea is that compaction code would just check PageMovable() and 
PageIsolated() to find a candidate. page->mapping->a_ops->isolate_page 
would do the driver-specific necessary locking, revalidate if the page 
state and succeed isolation, or fail. It would need to handle the 
possibility that the page already doesn't belong to the mapping, which 
is probably not a problem. But what if the driver is a module that was 
already unloaded, and even though we did NULL-check each part from page 
to isolate_page, it points to a function that's already gone? That would 
need some extra handling to prevent that, hm...

>>
>>> +2. Migration
>>> +
>>> +After successful isolation, VM calls migratepage. The migratepage's goal is
>>> +to move content of the old page to new page and set up struct page fields
>>> +of new page. If migration is successful, subsystem should release old page's
>>> +refcount to free. Keep in mind that subsystem should clear PG_movable and
>>> +PG_isolated before releasing the refcount.  If everything are done, user
>>> +should return MIGRATEPAGE_SUCCESS. If subsystem cannot migrate the page
>>> +at the moment, migratepage can return -EAGAIN. On -EAGAIN, VM will retry page
>>> +migration because VM interprets -EAGAIN as "temporal migration failure".
>>> +
>>> +3. Putback
>>> +
>>> +If migration was unsuccessful, VM calls putback_page. The subsystem should
>>> +insert isolated page to own data structure again if it has. And subsystem
>>> +should clear PG_isolated which was marked in isolation step.
>>> +
>>> +Note about releasing page:
>>> +
>>> +Subsystem can release pages whenever it want but if it releses the page
>>> +which is already isolated, it should clear PG_isolated but doesn't touch
>>> +PG_movable under PG_lock. Instead of it, VM will clear PG_movable after
>>> +his job done. Otherweise, subsystem should clear both page flags before
>>> +releasing the page.
>>
>> I don't understand this right now. But maybe I will get it after
>> reading the patches and suggest some improved wording here.
>
> I will try to explain why such rule happens in there.
>
> The problem is that put_page is aware of PageLRU. So, if someone releases
> last refcount of LRU page, __put_page checks PageLRU and then, clear the
> flags and detatch the page in LRU list(i.e., data structure).
> But in case of driver page, data structure like LRU among drivers is not only one.
> IOW, we should add following code in put_page to handle various requirements
> of driver page.
>
> void __put_page(struct page *page)
> {
>          if (PageMovable(page)) {
>                  /*
>                   * It will tity up driver's data structure like LRU
>                   * and reset page's flags. And it should be atomic
>                   * and always successful
>                   */
>                  page->put(page);
>                  __ClearPageMovable(page);
>          } else if (PageCompound(page))
>                  __put_compound_page(page);
>          else
>                  __put_single_page(page);
>
> }
>
> I'd like to avoid add new branch for not popular job in put_page which is hot.
> (Might change in future but not popular at the moment)
> So, rule of driver is as follows.
>
> When the driver releases the page and he found the page is PG_isolated,
> he should unmark only PG_isolated, not PG_movable so migration side of
> VM can catch it up "Hmm, the isolated non-lru page doesn't have PG_isolated
> any more. It means drivers releases the page. So, let's put the page
> instead of putback operation".
>
> When the driver releases the page and he doesn't see PG_isolated mark
> of the page, driver should reset both PG_isolated and PG_movable.

Yeah think I understand now, thanks for the explanation. But since I 
found the "freeing isolated page" part to be racy in the 02/16 
subthread, it might be premature now to improve the wording now :/

  reply	other threads:[~2016-04-04 13:09 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-30  7:11 [PATCH v3 00/16] Support non-lru page migration Minchan Kim
2016-03-30  7:12 ` [PATCH v3 01/16] mm: use put_page to free page instead of putback_lru_page Minchan Kim
2016-04-01 12:58   ` Vlastimil Babka
2016-04-04  1:39     ` Minchan Kim
2016-04-04  4:45       ` Naoya Horiguchi
2016-04-04 14:46         ` Vlastimil Babka
2016-04-05  1:54           ` Naoya Horiguchi
2016-04-05  8:20             ` Vlastimil Babka
2016-04-06  0:54               ` Naoya Horiguchi
2016-04-06  7:57                 ` Vlastimil Babka
2016-04-04  5:53   ` Balbir Singh
2016-04-04  6:01     ` Minchan Kim
2016-04-05  3:10       ` Balbir Singh
2016-03-30  7:12 ` [PATCH v3 02/16] mm/compaction: support non-lru movable page migration Minchan Kim
2016-04-01 21:29   ` Vlastimil Babka
2016-04-04  5:12     ` Minchan Kim
2016-04-04 13:24       ` Vlastimil Babka
2016-04-07  2:35         ` Minchan Kim
2016-04-12  8:00   ` Chulmin Kim
2016-04-12 14:25     ` Minchan Kim
2016-03-30  7:12 ` [PATCH v3 03/16] mm: add non-lru movable page support document Minchan Kim
2016-04-01 14:38   ` Vlastimil Babka
2016-04-04  2:25     ` Minchan Kim
2016-04-04 13:09       ` Vlastimil Babka [this message]
2016-04-07  2:27         ` Minchan Kim
2016-03-30  7:12 ` [PATCH v3 04/16] mm/balloon: use general movable page feature into balloon Minchan Kim
2016-04-05 12:03   ` Vlastimil Babka
2016-04-11  4:29     ` Minchan Kim
2016-03-30  7:12 ` [PATCH v3 05/16] zsmalloc: keep max_object in size_class Minchan Kim
2016-04-17 15:08   ` Sergey Senozhatsky
2016-03-30  7:12 ` [PATCH v3 06/16] zsmalloc: squeeze inuse into page->mapping Minchan Kim
2016-04-17 15:08   ` Sergey Senozhatsky
2016-04-19  7:40     ` Minchan Kim
2016-03-30  7:12 ` [PATCH v3 07/16] zsmalloc: remove page_mapcount_reset Minchan Kim
2016-04-17 15:11   ` Sergey Senozhatsky
2016-03-30  7:12 ` [PATCH v3 08/16] zsmalloc: squeeze freelist into page->mapping Minchan Kim
2016-04-17 15:56   ` Sergey Senozhatsky
2016-04-19  7:42     ` Minchan Kim
2016-03-30  7:12 ` [PATCH v3 09/16] zsmalloc: move struct zs_meta from mapping to freelist Minchan Kim
2016-04-17 15:22   ` Sergey Senozhatsky
2016-03-30  7:12 ` [PATCH v3 10/16] zsmalloc: factor page chain functionality out Minchan Kim
2016-04-18  0:33   ` Sergey Senozhatsky
2016-04-19  7:46     ` Minchan Kim
2016-03-30  7:12 ` [PATCH v3 11/16] zsmalloc: separate free_zspage from putback_zspage Minchan Kim
2016-04-18  1:04   ` Sergey Senozhatsky
2016-04-19  7:51     ` Minchan Kim
2016-04-19  7:53       ` Sergey Senozhatsky
2016-03-30  7:12 ` [PATCH v3 12/16] zsmalloc: zs_compact refactoring Minchan Kim
2016-04-04  8:04   ` Chulmin Kim
2016-04-04  9:01     ` Minchan Kim
2016-03-30  7:12 ` [PATCH v3 13/16] zsmalloc: migrate head page of zspage Minchan Kim
2016-04-06 13:01   ` Chulmin Kim
2016-04-07  0:34     ` Chulmin Kim
2016-04-07  0:43     ` Minchan Kim
2016-04-19  6:08   ` Chulmin Kim
2016-04-19  6:15     ` Minchan Kim
2016-03-30  7:12 ` [PATCH v3 14/16] zsmalloc: use single linked list for page chain Minchan Kim
2016-03-30  7:12 ` [PATCH v3 15/16] zsmalloc: migrate tail pages in zspage Minchan Kim
2016-03-30  7:12 ` [PATCH v3 16/16] zram: use __GFP_MOVABLE for memory allocation Minchan Kim
2016-03-30 23:11 ` [PATCH v3 00/16] Support non-lru page migration Andrew Morton
2016-03-31  0:29   ` Sergey Senozhatsky
2016-03-31  0:57     ` Minchan Kim
2016-03-31  0:57   ` Minchan Kim
2016-04-04 13:17 ` John Einar Reitan
2016-04-11  4:35   ` Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57026782.3020201@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=aquini@redhat.com \
    --cc=bfields@fieldses.org \
    --cc=chan.jeong@lge.com \
    --cc=corbet@lwn.net \
    --cc=gi-oh.kim@profitbricks.com \
    --cc=hughd@google.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=jlayton@poochiereds.net \
    --cc=koct9i@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=minchan@kernel.org \
    --cc=riel@redhat.com \
    --cc=rknize@motorola.com \
    --cc=sangseok.lee@lge.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=viro@ZenIV.linux.org.uk \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=xuyiping@hisilicon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).