From: Mike Kravetz <mike.kravetz@oracle.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, linux-api@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Aaron Lu <aaron.lu@intel.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
Date: Tue, 11 Jul 2017 11:23:19 -0700
Message-ID: <7f14334f-81d1-7698-d694-37278f05a78e@oracle.com>
In-Reply-To: <20170711123642.GC11936@dhcp22.suse.cz>

On 07/11/2017 05:36 AM, Michal Hocko wrote:
> On Thu 06-07-17 09:17:26, Mike Kravetz wrote:
>> The mremap system call has the ability to 'mirror' parts of an existing
>> mapping.  To do so, it creates a new mapping that maps the same pages as
>> the original mapping, just at a different virtual address.  This
>> functionality has existed since at least the 2.6 kernel.
>>
>> This patch simply adds a new flag to mremap which will make this
>> functionality part of the API.  It maintains backward compatibility with
>> the existing way of requesting mirroring (old_size == 0).
>>
>> If this new MREMAP_MIRROR flag is specified, then new_size must equal
>> old_size.  In addition, the MREMAP_MAYMOVE flag must be specified.
> 
> I have to admit that this came as a surprise to me. There is no mention
> of this special case in the man page and the mremap code is so
> convoluted that I simply didn't see it there. I guess the only
> reasonable use case is when you do not have a fd for the shared memory.

I was surprised as well when a JVM developer pointed this out.

From the old e-mail thread, here is the original use case:
shmget(IPC_PRIVATE, 31498240, 0x1c0|0600) = 11337732
shmat(11337732, 0, 0)                   = 0x40299000
shmctl(11337732, IPC_RMID, 0)           = 0
mremap(0x402a9000, 0, 65536, MREMAP_MAYMOVE|MREMAP_FIXED, 0) = 0
mremap(0x402a9000, 0, 65536, MREMAP_MAYMOVE|MREMAP_FIXED, 0x100000) = 0x100000

The JVM team wants to do something similar.  They are using
mmap(MAP_ANONYMOUS|MAP_SHARED) to create the initial mapping instead
of shmget/shmat.  As Vlastimil mentioned previously, one would not
expect a shared mapping for parts of the JVM heap.  I am working
to get clarification from the JVM team.
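
As a rough illustration (not taken from the JVM code), a minimal
user-space sketch of that pattern might look like the following.  The
function name duplicate_shared_mapping() is made up for this example,
and the sketch assumes a kernel that accepts old_size == 0 for shared
anonymous mappings:

```c
#define _GNU_SOURCE		/* for mremap() and MREMAP_MAYMOVE */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a shared anonymous page, ask mremap() for
 * a second mapping of it by passing old_size == 0, and check that a
 * write through the first address is visible through the second.
 * Returns 0 if both addresses see the same page, -1 otherwise.
 */
int duplicate_shared_mapping(void)
{
	static const char msg[] = "written via original";
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	char *orig, *dup;
	int same;

	orig = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (orig == MAP_FAILED) {
		perror("mmap");
		return -1;
	}

	/* old_size == 0: keep the original mapping and add another one. */
	dup = mremap(orig, 0, len, MREMAP_MAYMOVE);
	if (dup == MAP_FAILED) {
		perror("mremap");
		munmap(orig, len);
		return -1;
	}

	strcpy(orig, msg);
	same = (dup != orig) && strcmp(dup, msg) == 0;

	munmap(dup, len);
	munmap(orig, len);
	return same ? 0 : -1;
}
```

Calling duplicate_shared_mapping() from main() and checking for a zero
return is enough to try this out.  No fd is involved at any point, which
is why mmap() of the same object a second time is not an option here.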

> Anyway the patch should fail with -EINVAL on private mappings as Kirill
> already pointed out

Yes.  I think this should be a separate patch.  As mentioned earlier,
mremap today creates a new/additional private mapping if called in this
way with old_size == 0.  To me, this is a bug.

>                     and this should go along with an update to the
> man page which also describes the historical behavior.

Yes, man page updates are a must.

One reason for the RFC was to determine if people thought we should:
1) Just document the existing old_size == 0 functionality
2) Create a more explicit interface such as a new mremap flag for this
   functionality

I am waiting to see what direction people prefer before making any
man page updates.
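
To make option 2 concrete, the argument checks in the RFC hunk quoted
below can be modeled in user space roughly as follows.  This is only a
sketch: the RFC_-prefixed names and rfc_check_mremap_args() are
hypothetical, the lengths are assumed to be page aligned already, and
only the flag and length checks are modeled (not the address checks).
MREMAP_MIRROR exists only in this RFC, so the sketch defines its own
constants instead of relying on <sys/mman.h>:

```c
#include <errno.h>

/* Flag values as they appear in the RFC patch; illustration only. */
#define RFC_MREMAP_MAYMOVE	0x01UL
#define RFC_MREMAP_FIXED	0x02UL
#define RFC_MREMAP_MIRROR	0x04UL	/* proposed by the RFC */

/*
 * Hypothetical user-space model of the checks the RFC adds to mremap().
 * On success it returns 0 and may update *old_len and *flags the same
 * way the patched syscall entry would; on failure it returns -EINVAL.
 */
int rfc_check_mremap_args(unsigned long *old_len, unsigned long new_len,
			  unsigned long *flags)
{
	if (*flags & ~(RFC_MREMAP_FIXED | RFC_MREMAP_MAYMOVE |
		       RFC_MREMAP_MIRROR))
		return -EINVAL;

	/* Both FIXED and MIRROR require MAYMOVE. */
	if ((*flags & (RFC_MREMAP_FIXED | RFC_MREMAP_MIRROR)) &&
	    !(*flags & RFC_MREMAP_MAYMOVE))
		return -EINVAL;

	/* A zero new length is nonsensical. */
	if (!new_len)
		return -EINVAL;

	if (!*old_len) {
		/* Backward compatible request: old_len == 0 implies MIRROR. */
		*flags |= RFC_MREMAP_MIRROR;
	} else if (*flags & RFC_MREMAP_MIRROR) {
		/* Explicit request: sizes must match, then reuse old path. */
		if (*old_len != new_len)
			return -EINVAL;
		*old_len = 0;
	}
	return 0;
}
```

The point of the last branch is that an explicit flag is converted back
into the existing old_len == 0 case, so the rest of mremap() would not
need to change.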

>                                                        Make sure you
> document that this is not really a mirroring (e.g. faulting page in one
> address will automatically map it to the other mapping(s)) but merely a
> copy of the range. Maybe MREMAP_COPY would be a more appropriate name.

Good point.  mirror is the first word that came to mind, but it does
not exactly apply.

-- 
Mike Kravetz

> 
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>>  include/uapi/linux/mman.h       |  5 +++--
>>  mm/mremap.c                     | 23 ++++++++++++++++-------
>>  tools/include/uapi/linux/mman.h |  5 +++--
>>  3 files changed, 22 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
>> index ade4acd..6b3e0df 100644
>> --- a/include/uapi/linux/mman.h
>> +++ b/include/uapi/linux/mman.h
>> @@ -3,8 +3,9 @@
>>  
>>  #include <asm/mman.h>
>>  
>> -#define MREMAP_MAYMOVE	1
>> -#define MREMAP_FIXED	2
>> +#define MREMAP_MAYMOVE	0x01
>> +#define MREMAP_FIXED	0x02
>> +#define MREMAP_MIRROR	0x04
>>  
>>  #define OVERCOMMIT_GUESS		0
>>  #define OVERCOMMIT_ALWAYS		1
>> diff --git a/mm/mremap.c b/mm/mremap.c
>> index cd8a1b1..f18ab36 100644
>> --- a/mm/mremap.c
>> +++ b/mm/mremap.c
>> @@ -516,10 +516,11 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
>>  	struct vm_userfaultfd_ctx uf = NULL_VM_UFFD_CTX;
>>  	LIST_HEAD(uf_unmap);
>>  
>> -	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
>> +	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_MIRROR))
>>  		return ret;
>>  
>> -	if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE))
>> +	if ((flags & MREMAP_FIXED || flags & MREMAP_MIRROR) &&
>> +	    !(flags & MREMAP_MAYMOVE))
>>  		return ret;
>>  
>>  	if (offset_in_page(addr))
>> @@ -528,14 +529,22 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
>>  	old_len = PAGE_ALIGN(old_len);
>>  	new_len = PAGE_ALIGN(new_len);
>>  
>> -	/*
>> -	 * We allow a zero old-len as a special case
>> -	 * for DOS-emu "duplicate shm area" thing. But
>> -	 * a zero new-len is nonsensical.
>> -	 */
>> +	/* A zero new-len is nonsensical. */
>>  	if (!new_len)
>>  		return ret;
>>  
>> +	/*
>> +	 * For backward compatibility, we allow a zero old-len to imply
>> +	 * mirroring.  This was originally a special case for DOS-emu.
>> +	 */
>> +	if (!old_len)
>> +		flags |= MREMAP_MIRROR;
>> +	else if (flags & MREMAP_MIRROR) {
>> +		if (old_len != new_len)
>> +			return ret;
>> +		old_len = 0;
>> +	}
>> +
>>  	if (down_write_killable(&current->mm->mmap_sem))
>>  		return -EINTR;
>>  
>> diff --git a/tools/include/uapi/linux/mman.h b/tools/include/uapi/linux/mman.h
>> index 81d8edf..069f7a5 100644
>> --- a/tools/include/uapi/linux/mman.h
>> +++ b/tools/include/uapi/linux/mman.h
>> @@ -3,8 +3,9 @@
>>  
>>  #include <uapi/asm/mman.h>
>>  
>> -#define MREMAP_MAYMOVE	1
>> -#define MREMAP_FIXED	2
>> +#define MREMAP_MAYMOVE	0x01
>> +#define MREMAP_FIXED	0x02
>> +#define MREMAP_MIRROR	0x04
>>  
>>  #define OVERCOMMIT_GUESS		0
>>  #define OVERCOMMIT_ALWAYS		1
>> -- 
>> 2.7.5
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 
