From: Yang Shi <shy828301@gmail.com>
To: js1304@gmail.com
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linux MM <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Hugh Dickins <hughd@google.com>, Minchan Kim <minchan@kernel.org>,
Vlastimil Babka <vbabka@suse.cz>,
Mel Gorman <mgorman@techsingularity.net>,
kernel-team@lge.com, Joonsoo Kim <iamjoonsoo.kim@lge.com>
Subject: Re: [PATCH v5 05/10] mm/swap: charge the page when adding to the swap cache
Date: Fri, 3 Apr 2020 11:29:40 -0700
Message-ID: <CAHbLzkqdupWUv7vPpqDpOARuYkBiTxmQxNi-zaw_TWVB1FsNjQ@mail.gmail.com>
In-Reply-To: <1585892447-32059-6-git-send-email-iamjoonsoo.kim@lge.com>

On Thu, Apr 2, 2020 at 10:41 PM <js1304@gmail.com> wrote:
>
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> Currently, some swapped-in pages are not charged to the memcg until
> actual access to the page happens. I checked the code and found that
> it could cause a problem. In this implementation, even if the memcg
> is enabled, one can consume a lot of memory in the system by exploiting
> this hole. For example, one can make all the pages swapped out and
> then call madvise_willneed() to load the all swapped-out pages without
> pressing the memcg. Although actual access requires charging, it's really
> big benefit to load the swapped-out pages to the memory without pressing
> the memcg.
>
> And, for workingset detection which is implemented on the following patch,
> a memcg should be committed before the workingset detection is executed.
> For this purpose, the best solution, I think, is charging the page when
> adding to the swap cache. Charging there is not that hard. Caller of
> adding the page to the swap cache has enough information about the charged
> memcg. So, what we need to do is just passing this information to
> the right place.
>
> With this patch, specific memcg could be pressured more since readahead
> pages are also charged to it now. This would result in performance
> degradation to that user but it would be fair since that readahead is for
> that user.

If I read the code correctly, the readahead pages may *not* be charged
to it at all, but to other memcgs, since mem_cgroup_try_charge() would
retrieve the target memcg id from the swap entry and then charge that
memcg (generally the memcg from which the page was swapped out). So,
could this open a backdoor that lets one memcg stress other memcgs?
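
To make the concern concrete, here is a toy userspace model of the
behavior as I read it. It is not kernel code: every toy_* name is made
up for illustration, and it assumes the charge target is looked up from
the owner id recorded for the swap entry, falling back to the faulting
memcg only when no owner is recorded.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_SWAP_SLOTS 8

/* Hypothetical stand-in for a memcg; not the kernel's struct mem_cgroup. */
struct toy_memcg {
	int id;			/* nonzero identifier */
	long charged_pages;	/* pages charged to this group so far */
};

/* Owner id recorded per swap slot at swap-out time; 0 means "none". */
int toy_swap_owner[TOY_NR_SWAP_SLOTS];

/* Record which group a slot was swapped out from. */
void toy_record_swapout(size_t slot, int memcg_id)
{
	assert(slot < TOY_NR_SWAP_SLOTS);
	toy_swap_owner[slot] = memcg_id;
}

/*
 * Assumed policy: prefer the owner recorded for the swap slot and use
 * the faulting group only if no owner is recorded or it is not found.
 */
struct toy_memcg *toy_pick_charge_target(struct toy_memcg *groups,
					 size_t ngroups, size_t slot,
					 struct toy_memcg *faulting)
{
	int owner;
	size_t i;

	assert(slot < TOY_NR_SWAP_SLOTS);
	owner = toy_swap_owner[slot];
	if (owner != 0) {
		for (i = 0; i < ngroups; i++)
			if (groups[i].id == owner)
				return &groups[i];
	}
	return faulting;
}

/* Charge one page per slot in the readahead window [first, last]. */
void toy_readahead_charge(struct toy_memcg *groups, size_t ngroups,
			  size_t first, size_t last,
			  struct toy_memcg *faulting)
{
	size_t slot;

	for (slot = first; slot <= last; slot++)
		toy_pick_charge_target(groups, ngroups, slot,
				       faulting)->charged_pages++;
}
```

In this model, if slots 0-3 were swapped out by group A and slots 4-7
by group B, a fault in A whose readahead window covers slots 2-5
charges two pages to A and two pages to B, although B never asked for
them.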
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
> include/linux/swap.h | 4 ++--
> mm/shmem.c | 18 ++++++++++--------
> mm/swap_state.c | 25 +++++++++++++++++++++----
> 3 files changed, 33 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 273de48..eea0700 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -409,7 +409,7 @@ extern unsigned long total_swapcache_pages(void);
> extern void show_swap_cache_info(void);
> extern int add_to_swap(struct page *page);
> extern int add_to_swap_cache(struct page *page, swp_entry_t entry,
> - gfp_t gfp, void **shadowp);
> + struct vm_area_struct *vma, gfp_t gfp, void **shadowp);
> extern int __add_to_swap_cache(struct page *page, swp_entry_t entry);
> extern void __delete_from_swap_cache(struct page *page,
> swp_entry_t entry, void *shadow);
> @@ -567,7 +567,7 @@ static inline int add_to_swap(struct page *page)
> }
>
> static inline int add_to_swap_cache(struct page *page, swp_entry_t entry,
> - gfp_t gfp_mask, void **shadowp)
> + struct vm_area_struct *vma, gfp_t gfp, void **shadowp)
> {
> return -1;
> }
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 9e34b4e..8e28c1f 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1369,7 +1369,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> if (list_empty(&info->swaplist))
> list_add(&info->swaplist, &shmem_swaplist);
>
> - if (add_to_swap_cache(page, swap,
> + if (add_to_swap_cache(page, swap, NULL,
> __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN,
> NULL) == 0) {
> spin_lock_irq(&info->lock);
> @@ -1434,10 +1434,11 @@ static inline struct mempolicy *shmem_get_sbmpol(struct shmem_sb_info *sbinfo)
> #endif
>
> static void shmem_pseudo_vma_init(struct vm_area_struct *vma,
> - struct shmem_inode_info *info, pgoff_t index)
> + struct mm_struct *mm, struct shmem_inode_info *info,
> + pgoff_t index)
> {
> /* Create a pseudo vma that just contains the policy */
> - vma_init(vma, NULL);
> + vma_init(vma, mm);
> /* Bias interleave by inode number to distribute better across nodes */
> vma->vm_pgoff = index + info->vfs_inode.i_ino;
> vma->vm_policy = mpol_shared_policy_lookup(&info->policy, index);
> @@ -1450,13 +1451,14 @@ static void shmem_pseudo_vma_destroy(struct vm_area_struct *vma)
> }
>
> static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp,
> - struct shmem_inode_info *info, pgoff_t index)
> + struct mm_struct *mm, struct shmem_inode_info *info,
> + pgoff_t index)
> {
> struct vm_area_struct pvma;
> struct page *page;
> struct vm_fault vmf;
>
> - shmem_pseudo_vma_init(&pvma, info, index);
> + shmem_pseudo_vma_init(&pvma, mm, info, index);
> vmf.vma = &pvma;
> vmf.address = 0;
> page = swap_cluster_readahead(swap, gfp, &vmf);
> @@ -1481,7 +1483,7 @@ static struct page *shmem_alloc_hugepage(gfp_t gfp,
> XA_PRESENT))
> return NULL;
>
> - shmem_pseudo_vma_init(&pvma, info, hindex);
> + shmem_pseudo_vma_init(&pvma, NULL, info, hindex);
> page = alloc_pages_vma(gfp | __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN,
> HPAGE_PMD_ORDER, &pvma, 0, numa_node_id(), true);
> shmem_pseudo_vma_destroy(&pvma);
> @@ -1496,7 +1498,7 @@ static struct page *shmem_alloc_page(gfp_t gfp,
> struct vm_area_struct pvma;
> struct page *page;
>
> - shmem_pseudo_vma_init(&pvma, info, index);
> + shmem_pseudo_vma_init(&pvma, NULL, info, index);
> page = alloc_page_vma(gfp, &pvma, 0);
> shmem_pseudo_vma_destroy(&pvma);
>
> @@ -1652,7 +1654,7 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index,
> count_memcg_event_mm(charge_mm, PGMAJFAULT);
> }
> /* Here we actually start the io */
> - page = shmem_swapin(swap, gfp, info, index);
> + page = shmem_swapin(swap, gfp, charge_mm, info, index);
> if (!page) {
> error = -ENOMEM;
> goto failed;
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index f06af84..1db73a2 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -112,7 +112,7 @@ void show_swap_cache_info(void)
> * but sets SwapCache flag and private instead of mapping and index.
> */
> int add_to_swap_cache(struct page *page, swp_entry_t entry,
> - gfp_t gfp, void **shadowp)
> + struct vm_area_struct *vma, gfp_t gfp, void **shadowp)
> {
> struct address_space *address_space = swap_address_space(entry);
> pgoff_t idx = swp_offset(entry);
> @@ -120,14 +120,26 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry,
> unsigned long i, nr = compound_nr(page);
> unsigned long nrexceptional = 0;
> void *old;
> + bool compound = !!compound_order(page);
> + int error;
> + struct mm_struct *mm = vma ? vma->vm_mm : current->mm;
> + struct mem_cgroup *memcg;
>
> VM_BUG_ON_PAGE(!PageLocked(page), page);
> VM_BUG_ON_PAGE(PageSwapCache(page), page);
> VM_BUG_ON_PAGE(!PageSwapBacked(page), page);
>
> page_ref_add(page, nr);
> + /* PageSwapCache() prevent the page from being re-charged */
> SetPageSwapCache(page);
>
> + error = mem_cgroup_try_charge(page, mm, gfp, &memcg, compound);
> + if (error) {
> + ClearPageSwapCache(page);
> + page_ref_sub(page, nr);
> + return error;
> + }
> +
> do {
> xas_lock_irq(&xas);
> xas_create_range(&xas);
> @@ -153,11 +165,16 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry,
> xas_unlock_irq(&xas);
> } while (xas_nomem(&xas, gfp));
>
> - if (!xas_error(&xas))
> + if (!xas_error(&xas)) {
> + mem_cgroup_commit_charge(page, memcg, false, compound);
> return 0;
> + }
> +
> + mem_cgroup_cancel_charge(page, memcg, compound);
>
> ClearPageSwapCache(page);
> page_ref_sub(page, nr);
> +
> return xas_error(&xas);
> }
>
> @@ -221,7 +238,7 @@ int add_to_swap(struct page *page)
> /*
> * Add it to the swap cache.
> */
> - err = add_to_swap_cache(page, entry,
> + err = add_to_swap_cache(page, entry, NULL,
> __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN, NULL);
> if (err)
> /*
> @@ -431,7 +448,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
> /* May fail (-ENOMEM) if XArray node allocation failed. */
> __SetPageLocked(new_page);
> __SetPageSwapBacked(new_page);
> - err = add_to_swap_cache(new_page, entry,
> + err = add_to_swap_cache(new_page, entry, vma,
> gfp_mask & GFP_KERNEL, NULL);
> if (likely(!err)) {
> /* Initiate read into locked page */
> --
> 2.7.4
>
>