From: Shaohua Li <shli@fb.com>
To: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>
Cc: <Kernel-team@fb.com>, <mhocko@suse.com>, <minchan@kernel.org>,
	<hughd@google.com>, <hannes@cmpxchg.org>, <riel@redhat.com>,
	<mgorman@techsingularity.net>
Subject: [RFC 0/6]mm: add new LRU list for MADV_FREE pages
Date: Sun, 29 Jan 2017 21:51:17 -0800
Message-ID: <cover.1485748619.git.shli@fb.com>

Hi,

We are trying to use MADV_FREE in jemalloc and have found several issues. Until
they are solved, jemalloc can't use the MADV_FREE feature.
- No support for systems without swap enabled. If swap is off, we can't age
  anonymous pages, or can't age them efficiently. And since MADV_FREE pages are
  mixed with other anonymous pages, we can't reclaim MADV_FREE pages either. In
  the current implementation, MADV_FREE falls back to MADV_DONTNEED when swap is
  not enabled. But in our environment, a lot of machines don't enable swap, so
  this prevents our setup from using MADV_FREE.
- Increased memory pressure. Page reclaim biases toward reclaiming file pages
  over anonymous pages. This doesn't make sense for MADV_FREE pages, because
  those pages can be freed easily and refilled with a very slight penalty. Even
  if page reclaim didn't favor file pages, there would still be an issue,
  because MADV_FREE pages and other anonymous pages are mixed together. To
  reclaim a MADV_FREE page, we probably must scan a lot of other anonymous
  pages, which is inefficient. In our tests, we usually see OOM with MADV_FREE
  enabled and none without it.
- RSS accounting. MADV_FREE pages are accounted as normal anon pages and
  reclaimed lazily, so an application's RSS becomes bigger. This confuses our
  workloads. We have a monitoring daemon running, and if it finds that an
  application's RSS has become abnormal, it kills the application even though
  the kernel could reclaim the memory easily. Currently we don't export separate
  RSS accounting for MADV_FREE pages. This prevents our setup from using
  MADV_FREE too.
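
For context, the allocator-side call being discussed looks roughly like the
sketch below. This is a minimal userspace illustration, not jemalloc's actual
code: the helper name purge_lazily() is hypothetical, and retrying with
MADV_DONTNEED when MADV_FREE is rejected is an assumption about how an
allocator might cope with kernels or mappings that don't support it.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: tell the kernel that [addr, addr + len) no longer
 * holds useful data. Returns 0 on success, -1 on failure (errno is set).
 */
static int purge_lazily(void *addr, size_t len)
{
#ifdef MADV_FREE
	/* Lazy free: pages stay mapped until the kernel decides to reclaim. */
	if (madvise(addr, len, MADV_FREE) == 0)
		return 0;
	/* EINVAL usually means the kernel or mapping lacks MADV_FREE support. */
	if (errno != EINVAL)
		return -1;
#endif
	/* Eager free: pages are dropped now and refault as zero-filled pages. */
	return madvise(addr, len, MADV_DONTNEED);
}
```

After a successful call the range is still mapped and may be written again;
with MADV_FREE the old contents may or may not survive until that write.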

For the first two issues, introducing a new LRU list for MADV_FREE pages could
solve them. We can reclaim MADV_FREE pages directly without writing them out
to swap, so the first issue would be fixed. If only MADV_FREE pages are on the
new list, page reclaim can easily reclaim such pages without interference from
file or anonymous pages, and the memory pressure issue disappears.
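
To make the scanning argument concrete, here is a toy userspace model, not
kernel code. The struct toy_page type and the pages_scanned() helper are
invented for illustration. It counts how many list entries must be examined
to find a given number of MADV_FREE pages, first on a mixed list and then on
a list that holds only such pages.

```c
#include <stddef.h>

/* Toy page descriptor; only the one bit this example needs. */
struct toy_page {
	int lazyfree;	/* 1 if the page was marked with MADV_FREE */
};

/*
 * Walk a list from its tail, as reclaim would, and return how many pages
 * had to be examined to find `want` lazy-free pages. If the list holds
 * fewer than `want` such pages, the whole list ends up being scanned.
 */
static size_t pages_scanned(const struct toy_page *list, size_t n, size_t want)
{
	size_t scanned = 0, found = 0;

	while (scanned < n && found < want) {
		if (list[n - 1 - scanned].lazyfree)
			found++;
		scanned++;
	}
	return scanned;
}
```

On a mixed list the scan count grows with the number of ordinary anonymous
pages sitting between the MADV_FREE ones; on a dedicated list it equals the
number of pages wanted.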

Minchan actually posted patches to add the LRU list before, but he didn't
pursue them. So I picked them up, and these patches are based on Minchan's
previous patches. The main difference between my patches and Minchan's is the
page reclaim policy. Minchan's patches introduce a knob to balance the reclaim
of MADV_FREE pages against anon/file pages, while these patches always reclaim
MADV_FREE pages first if there are any. I described the reason in patch 5.
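
The "MADV_FREE pages first" policy can be pictured with the toy model below.
It is only a sketch under stated assumptions: the toy_lru enum, struct
toy_lruvec and toy_reclaim() are made-up names, the lists are reduced to page
counts, and the file-before-anon order after the lazy-free list is a
simplification rather than how the kernel balances those two lists.

```c
#include <stddef.h>

/* Toy stand-ins for the LRU lists; not the kernel's enum lru_list. */
enum toy_lru { TOY_LAZYFREE, TOY_FILE, TOY_ANON, TOY_NR_LISTS };

struct toy_lruvec {
	size_t nr_pages[TOY_NR_LISTS];
};

/*
 * Reclaim up to `target` pages, draining the lazy-free list before touching
 * file or anon pages. Returns the number of pages reclaimed.
 */
static size_t toy_reclaim(struct toy_lruvec *v, size_t target)
{
	size_t reclaimed = 0;

	for (int lru = TOY_LAZYFREE; lru < TOY_NR_LISTS && reclaimed < target; lru++) {
		size_t take = target - reclaimed;

		if (take > v->nr_pages[lru])
			take = v->nr_pages[lru];
		v->nr_pages[lru] -= take;
		reclaimed += take;
	}
	return reclaimed;
}
```

With no knob, the other lists are touched only once the lazy-free list is
empty.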

For the third issue, we can add a separate RSS count for MADV_FREE pages. The
count will be increased in the madvise syscall and decreased in page reclaim
(e.g., unmap). One issue is activate_page(). A MADV_FREE page can be promoted
to an active page there, but there is no mm_struct context at that point, and
iterating over the vmas there sounds too silly. The patchset doesn't fix this
issue yet. Hopefully somebody can share a hint on how to fix it.
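
The proposed accounting can be modelled as below. Again this is a hedged toy,
not the kernel's per-mm counters: struct toy_mm and the three helpers are
hypothetical names, and it assumes that reclaiming a lazy-free page drops it
from the anon count as well. The model has no helper for the activate_page()
case, because that is the part left open above.

```c
/* Toy per-process counters, in pages. */
struct toy_mm {
	long anon_rss;		/* all resident anonymous pages */
	long lazyfree_rss;	/* subset of anon_rss marked with MADV_FREE */
};

/* madvise(MADV_FREE) on n resident pages: still resident, now lazy-free. */
static void toy_madvise_free(struct toy_mm *mm, long n)
{
	mm->lazyfree_rss += n;
}

/* Reclaim unmaps n lazy-free pages: both counters go down. */
static void toy_reclaim_lazyfree(struct toy_mm *mm, long n)
{
	mm->lazyfree_rss -= n;
	mm->anon_rss -= n;
}

/* What a monitoring daemon could treat as the "real" anonymous footprint. */
static long toy_unreclaimable_rss(const struct toy_mm *mm)
{
	return mm->anon_rss - mm->lazyfree_rss;
}
```

A monitor that reads both values can subtract the lazy-free count instead of
treating the whole RSS as memory the application still needs.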

Thanks,
Shaohua

Minchan's previous patches:
http://marc.info/?l=linux-mm&m=144800657002763&w=2

Shaohua Li (6):
  mm: add wrap for page accouting index
  mm: add lazyfree page flag
  mm: add LRU_LAZYFREE lru list
  mm: move MADV_FREE pages into LRU_LAZYFREE list
  mm: reclaim lazyfree pages
  mm: enable MADV_FREE for swapless system

 drivers/base/node.c                       |  2 +
 drivers/staging/android/lowmemorykiller.c |  3 +-
 fs/proc/meminfo.c                         |  1 +
 fs/proc/task_mmu.c                        |  8 ++-
 include/linux/mm_inline.h                 | 41 +++++++++++++
 include/linux/mmzone.h                    |  9 +++
 include/linux/page-flags.h                |  6 ++
 include/linux/swap.h                      |  2 +-
 include/linux/vm_event_item.h             |  2 +-
 include/trace/events/mmflags.h            |  1 +
 include/trace/events/vmscan.h             | 31 +++++-----
 kernel/power/snapshot.c                   |  1 +
 mm/compaction.c                           | 11 ++--
 mm/huge_memory.c                          |  6 +-
 mm/khugepaged.c                           |  6 +-
 mm/madvise.c                              | 11 +---
 mm/memcontrol.c                           |  4 ++
 mm/memory-failure.c                       |  3 +-
 mm/memory_hotplug.c                       |  3 +-
 mm/mempolicy.c                            |  3 +-
 mm/migrate.c                              | 29 ++++------
 mm/page_alloc.c                           | 10 ++++
 mm/rmap.c                                 |  7 ++-
 mm/swap.c                                 | 51 +++++++++-------
 mm/vmscan.c                               | 96 +++++++++++++++++++++++--------
 mm/vmstat.c                               |  4 ++
 26 files changed, 242 insertions(+), 109 deletions(-)

-- 
2.9.3
