linux-kernel.vger.kernel.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Brian Geffon <bgeffon@google.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Kalesh Singh <kaleshsingh@google.com>,
	Nicolas Geoffray <ngeoffray@google.com>,
	Jared Duke <jdduke@google.com>,
	android-mm <android-mm@google.com>,
	Blake Caldwell <blake.caldwell@colorado.edu>,
	Mike Rapoport <rppt@kernel.org>
Subject: Re: RFC for new feature to move pages from one vma to another without split
Date: Thu, 13 Apr 2023 10:10:44 +0200
Message-ID: <3059388f-1604-c326-c66f-c2e0f9bb6cbf@redhat.com>
In-Reply-To: <ZDbVMk0trT5UaqaA@x1n>

On 12.04.23 17:58, Peter Xu wrote:
> On Wed, Apr 12, 2023 at 10:47:52AM +0200, David Hildenbrand wrote:
>>> Personally it was always a mystery to me on how vm_pgoff works with
>>> anonymous vmas and why it needs to be setup with vm_start >> PAGE_SHIFT.
>>>
>>> Just now I tried to apply below oneliner change:
>>>
>>> @@ -1369,7 +1369,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>>>                           /*
>>>                            * Set pgoff according to addr for anon_vma.
>>>                            */
>>> -                       pgoff = addr >> PAGE_SHIFT;
>>> +                       pgoff = 0;
>>>                           break;
>>>                   default:
>>>                           return -EINVAL;
>>>
>>> The kernel even boots without a major problem so far..
>>
>> I think it's for RMAP purposes.
>>
>> Take a look at linear_page_index() and how it's, for example, used in
>> ksm_might_need_to_copy() alongside page->index.
> 
>  From what I read, the vma's vm_pgoff is set before setup any page->index
> within the vma, while the latter will be calculated out of the vma pgoff
> with linear_page_index() (in __page_set_anon_rmap()).
> 
> 	folio->index = linear_page_index(vma, address);
> 
> I think I missed something, but it seems to me any comparisons between
> page->index and linear_page_index() will just keep working for anonymous
> even if we change vma pgoff to 0 when vma is mapped.
> 
> Do you perhaps mean this is needed for ksm only?  I really am not familiar
> enough with ksm, especially when it's swapped out.  I do see that
> ksm_might_need_to_copy() wants to avoid reusing a page if anon_vma is setup
> not for current vma, but I don't know when it'll happen.
> 
> 	if (PageKsm(page)) {
> 		if (page_stable_node(page) &&
> 		    !(ksm_run & KSM_RUN_UNMERGE))
> 			return page;	/* no need to copy it */
> 	} else if (!anon_vma) {
> 		return page;		/* no need to copy it */
> 	} else if (page->index == linear_page_index(vma, address) &&
> 			anon_vma->root == vma->anon_vma->root) {
> 		return page;		/* still no need to copy it */
> 	}
> 
> I think when all these paths don't trigger (aka, we need to copy) it means
> there's anon_vma assigned to the page but not the right one (even though I
> don't know how that could happen..).  Meanwhile I don't see either on how
> vma pg_off affects this (and I assume a real KSM page ignores page->index
> completely).

I think you are right with folio->index = linear_page_index(vma, address).

I did not check the code yet, but thinking about it I figured out why we
want to set pgoff to the start of the VMA in the address space for
anonymous memory:

For RMAP and friends (relying on linear_page_index), folio->index has to
match the index within the VMA. If we set pgoff to something else,
we'd have fewer VMA merging opportunities. So your system might work, but
you'd end up with many anon VMAs.


Imagine the following:

[ anon0 ][  fd   ][ anon1 ]

Unmap the fd:

[ anon0 ][ hole  ][ anon1 ]

Mmap anon:

[ anon0 ][ anon2 ][ anon1 ]


We can now merge all 3 VMAs into one, even if the first and last
already map pages.


A simpler and more common example is probably:

[ anon0 ]

Mmap anon1 before the existing one:

[ anon1 ][ anon0 ]

Which we can merge into a single one.



Mapping after an existing one could work, but one would have to
carefully set pgoff based on the size of the previous anon VMA ... which
is more complicated.

So instead, we consider the whole address space as a virtual, anon file, 
starting at offset 0. The pgoff of a VMA is then simply the offset in 
that virtual file (easily computed from the start of the VMA), and VMA 
merging is just the same as for an ordinary file.
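
To make the "virtual anon file" idea concrete, here is a rough user-space
sketch, not kernel code. struct toy_vma and the toy_* helpers are made-up
names for illustration, and PAGE_SHIFT = 12 is an assumption. The index
arithmetic mirrors what linear_page_index() does for small pages, and
toy_pgoff_mergeable() models only the offset-continuity part of the merge
check (the real one also compares flags, anon_vma, and so on):

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Toy stand-in for struct vm_area_struct (hypothetical, illustration only). */
struct toy_vma {
	unsigned long vm_start;	/* first byte of the mapping */
	unsigned long vm_end;	/* one past the last byte */
	unsigned long vm_pgoff;	/* page offset of vm_start in the backing "file" */
};

/* Same arithmetic as linear_page_index() for small pages. */
static unsigned long toy_linear_page_index(const struct toy_vma *vma,
					   unsigned long address)
{
	assert(address >= vma->vm_start && address < vma->vm_end);
	return ((address - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
}

/* pgoff as do_mmap() picks it for private anonymous mappings. */
static unsigned long toy_anon_pgoff(unsigned long addr)
{
	return addr >> PAGE_SHIFT;
}

/*
 * Offset-continuity part of a merge check: prev must end where next
 * starts, and next's pgoff must continue right after prev's last page.
 */
static bool toy_pgoff_mergeable(const struct toy_vma *prev,
				const struct toy_vma *next)
{
	unsigned long prev_pages = (prev->vm_end - prev->vm_start) >> PAGE_SHIFT;

	return prev->vm_end == next->vm_start &&
	       prev->vm_pgoff + prev_pages == next->vm_pgoff;
}
```

With pgoff = addr >> PAGE_SHIFT, any two adjacent anon VMAs pass the
continuity check no matter which one was mapped first, and the index
computed for an already-mapped page does not change when the VMAs are
merged. With pgoff = 0 for every new anon VMA, two adjacent VMAs would
both start at offset 0 and fail the check.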

-- 
Thanks,

David / dhildenb

