* rmap for shmem pages
@ 2019-12-11 23:51 Joel Fernandes
From: Joel Fernandes @ 2019-12-11 23:51 UTC
  To: linux-mm, akpm, mhocko, riel, minchan, hannes, kirill.shutemov

Hi,
I am looking into a weird behavior of the VmRSS counter in /proc/pid/status
which I think is a bug.

The issue is that if a memfd is mapped into the same process twice with
MAP_SHARED and you write through both mappings, RSS doubles even though the
same pages were already mapped elsewhere in that address space. We expect
RSS to increase only once in this case, not twice.
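
Something along these lines should reproduce it (only a sketch: it assumes
glibc 2.27+ for memfd_create() and trims most error handling). It prints the
VmRSS/RssShmem lines from /proc/self/status after touching each mapping:

/* Reproducer sketch: needs glibc 2.27+ for memfd_create(). */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/* Print the VmRSS and RssShmem lines from /proc/self/status. */
static void print_rss(const char *tag)
{
	char line[256];
	FILE *f = fopen("/proc/self/status", "r");

	if (!f)
		return;
	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "VmRSS", 5) || !strncmp(line, "RssShmem", 8))
			printf("%s: %s", tag, line);
	fclose(f);
}

int main(void)
{
	size_t len = 16 * 1024 * 1024;
	char *a, *b;
	int fd;

	fd = memfd_create("rss-test", 0);
	if (fd < 0 || ftruncate(fd, len))
		return 1;

	/* Two MAP_SHARED mappings of the same memfd in one address space. */
	a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (a == MAP_FAILED || b == MAP_FAILED)
		return 1;

	memset(a, 1, len);		/* fault in through the first mapping */
	print_rss("after first write");

	memset(b, 2, len);		/* same pages, second VMA */
	print_rss("after second write");

	return 0;
}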

Digging further, I see that RssShmem is what contributes the increase, and
the increment happens in alloc_set_pte() during the page fault. I tried to
write a patch that checks whether the page is already mapped within the same
process and, if it is, skips the increment, since the accounting was already
done. However, for shmem pages the rmap walk does not appear to find the
existing mapping, so it cannot detect the case I am interested in.

The patch below is obviously incomplete since I didn't handle the unmap case,
but could anyone let me know why rmap does not seem to be working for shmem
here, and whether this is expected?

I was expecting the RSS increment to be skipped, since the code should find
that the page was already mapped elsewhere in the same address space.

---8<-----------------------

diff --git a/mm/memory.c b/mm/memory.c
index 606da187d1de..4fca5d7e9d66 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3360,6 +3361,8 @@ static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
  *
  * Return: %0 on success, %VM_FAULT_ code in case of error.
  */
+bool page_has_single_mm(struct page *page, struct mm_struct *mm);
+
 vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
 		struct page *page)
 {
@@ -3399,8 +3402,14 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
 		mem_cgroup_commit_charge(page, memcg, false, false);
 		lru_cache_add_active_or_unevictable(page, vma);
 	} else {
-		inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
 		page_add_file_rmap(page, false);
+
+		if (page_has_single_mm(page, vma->vm_mm)) {
+			trace_printk("inc 2 mm_counter_file counter=%d\n", mm_counter_file(page));
+			inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
+		} else {
+			trace_printk("inc 2 mm_counter_file counter=%d skipped\n", mm_counter_file(page));
+		}
 	}
 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
 
diff --git a/mm/rmap.c b/mm/rmap.c
index b3e381919835..580da8469f0d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1273,6 +1273,46 @@ static void page_remove_file_rmap(struct page *page, bool compound)
 	unlock_page_memcg(page);
 }
 
+struct page_single_mm_ctx {
+	struct mm_struct *mm;
+	bool single;
+};
+
+/*
+ * rmap_one callback: returning true continues the rmap walk, returning
+ * false aborts it.  Stop at the first mapping that belongs to another mm.
+ */
+static bool __page_has_single_mm(struct page *page, struct vm_area_struct *vma,
+				 unsigned long addr, void *ctx_in)
+{
+	struct page_single_mm_ctx *ctx = ctx_in;
+
+	if (vma->vm_mm == ctx->mm)
+		return true;
+
+	ctx->single = false;
+	return false;
+}
+
+/*
+ * Return true if @page is currently mapped only by @mm.  Note this does a
+ * full rmap walk, so it is expensive.
+ */
+bool page_has_single_mm(struct page *page, struct mm_struct *mm)
+{
+	struct page_single_mm_ctx ctx = {
+		.mm = mm,
+		.single = true,
+	};
+	struct rmap_walk_control rwc = {
+		.rmap_one = __page_has_single_mm,
+		.arg = &ctx,
+	};
+
+	rmap_walk(page, &rwc);
+	return ctx.single;
+}
+
 static void page_remove_anon_compound_rmap(struct page *page)
 {
 	int i, nr;
-- 
2.24.0.525.g8f36a354ae-goog




* Re: rmap for shmem pages
From: Kirill A. Shutemov @ 2019-12-12 10:55 UTC
  To: Joel Fernandes
  Cc: linux-mm, akpm, mhocko, riel, minchan, hannes, kirill.shutemov

On Wed, Dec 11, 2019 at 06:51:08PM -0500, Joel Fernandes wrote:
> Hi,
> I am looking into a weird behavior of the VmRSS counter in /proc/pid/status
> which I think is a bug.
> 
> The issue is that if a memfd is mapped into the same process twice with
> MAP_SHARED and you write through both mappings, RSS doubles even though the
> same pages were already mapped elsewhere in that address space. We expect
> RSS to increase only once in this case, not twice.

VmRSS is a best effort. Meeting your expectation would require an rmap walk
(as your patch does). It's way too expensive for the fault path (especially
under the page table lock) just to keep precise statistics. Sorry, but you
will not get it.

-- 
 Kirill A. Shutemov



* Re: rmap for shmem pages
From: Joel Fernandes @ 2019-12-12 14:43 UTC
  To: Kirill A. Shutemov
  Cc: linux-mm, akpm, mhocko, riel, minchan, hannes, kirill.shutemov

On Thu, Dec 12, 2019 at 01:55:21PM +0300, Kirill A. Shutemov wrote:
> On Wed, Dec 11, 2019 at 06:51:08PM -0500, Joel Fernandes wrote:
> > Hi,
> > I am looking into a weird behavior of the VmRSS counter in /proc/pid/status
> > which I think is a bug.
> > 
> > The issue is that if a memfd is mapped into the same process twice with
> > MAP_SHARED and you write through both mappings, RSS doubles even though the
> > same pages were already mapped elsewhere in that address space. We expect
> > RSS to increase only once in this case, not twice.
> 
> VmRSS is a best effort. Meeting your expectation would require an rmap walk
> (as your patch does). It's way too expensive for the fault path (especially
> under the page table lock) just to keep precise statistics. Sorry, but you
> will not get it.

No problem. I was under the same impression that this is difficult to do in
a scalable way. I am glad I reached out, but if anyone has any other ideas,
let me know.
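
For this particular case, summing Pss from /proc/<pid>/smaps_rollup comes
closer: Pss divides each resident page by its total map count, so two
MAP_SHARED mappings of the same memfd page within one process add back up to
a single page's worth, as long as no other process maps it. A rough sketch,
assuming a v4.14+ kernel that provides smaps_rollup:

/* Rough sketch: print the Rss and Pss lines from /proc/self/smaps_rollup.
 * Assumes a v4.14+ kernel; Pss becomes an approximation again as soon as
 * other processes map the same pages. */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[256];
	FILE *f = fopen("/proc/self/smaps_rollup", "r");

	if (!f) {
		perror("smaps_rollup");
		return 1;
	}

	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "Rss:", 4) || !strncmp(line, "Pss:", 4))
			fputs(line, stdout);

	fclose(f);
	return 0;
}

With the memfd mapped twice in one process, the Rss line here still
double-counts while Pss should stay at roughly one copy's worth.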

thanks,

 - Joel


> 
> -- 
>  Kirill A. Shutemov


