linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@amacapital.net>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Arnd Bergmann <arnd@arndb.de>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	X86 ML <x86@kernel.org>, Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Andi Kleen <ak@linux.intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	linux-arch <linux-arch@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will.deacon@arm.com>
Subject: Re: [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR
Date: Tue, 3 Jan 2017 10:27:22 -0800	[thread overview]
Message-ID: <CALCETrW3=SsQeC-gWOVqwTtg022+gHei=xfUc2ei3kkX0CACpg@mail.gmail.com> (raw)
In-Reply-To: <20170103160457.GB17319@node.shutemov.name>

On Tue, Jan 3, 2017 at 8:04 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> On Mon, Jan 02, 2017 at 10:08:28PM -0800, Andy Lutomirski wrote:
>> On Mon, Jan 2, 2017 at 12:44 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > On Tuesday, December 27, 2016 4:54:13 AM CET Kirill A. Shutemov wrote:
>> >> As with other resources you can set the limit lower than current usage.
>> >> It would affect only future virtual address space allocations.
>>
>> I still don't buy all these use cases:
>>
>> >>
>> >> Use-cases for new rlimit:
>> >>
>> >>   - Bumping the soft limit to RLIM_INFINITY, allows current process all
>> >>     its children to use addresses above 47-bits.
>>
>> OK, I get this, but only as a workaround for programs that make
>> assumptions about the address space and don't use some mechanism (to
>> be designed?) to work correctly in spite of a larger address space.
>
> I guess you've misread the case. It's opt-in for large adrress space, not
> other way around.
>
> I believe 47-bit VA by default is right way to go to make the transition
> without breaking userspace.

What I meant was: setting the rlimit to anything other than -1ULL is a
workaround, but otherwise I agree.  This still makes little sense if
set by PAM or other conventional rlimit tools.

>> >>
>> >>   - Lowering the hard limit to 47-bits would prevent current process all
>> >>     its children to use addresses above 47-bits, unless a process has
>> >>     CAP_SYS_RESOURCES.
>>
>> I've tried and I can't imagine any reason to do this.
>
> That's just if something went wrong and we want to stop an application
> from use addresses above 47-bit.

But CAP_SYS_RESOURCES still makes no sense in this context.

>
>> >>   - It’s also can be handy to lower hard or soft limit to arbitrary
>> >>     address. User-mode emulation in QEMU may lower the limit to 32-bit
>> >>     to emulate 32-bit machine on 64-bit host.
>>
>> I don't understand.  QEMU user-mode emulation intercepts all syscalls.
>> What QEMU would *actually* want is a way to say "allocate me some
>> memory with the high N bits clear".  mmap-via-int80 on x86 should be
>> fixed to do this, but a new syscall with an explicit parameter would
>> work, as would a prctl changing the current limit.
>
> Look at mess in mmap_find_vma(). QEmu has to guess where is free virtual
> memory. That's unnessesary complex.
>
> prctl would work for this too. new-mmap would *not*: there are more ways
> to allocate vitual address space: shmat(), mremap(). Changing all of them
> just for this is stupid.

Fair enough.

Except that mmap-via-int80, shmat-via-int80, etc should still work (if
I understand what qemu needs correctly), as would the prctl.

>
>> >>
>> >> TODO:
>> >>   - port to non-x86;
>> >>
>> >> Not-yet-signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> >> Cc: linux-api@vger.kernel.org
>> >
>> > This seems to nicely address the same problem on arm64, which has
>> > run into the same issue due to the various page table formats
>> > that can currently be chosen at compile time.
>>
>> On further reflection, I think this has very little to do with paging
>> formats except insofar as paging formats make us notice the problem.
>> The issue is that user code wants to be able to assume an upper limit
>> on an address, and it gets an upper limit right now that depends on
>> architecture due to paging formats.  But someone really might want to
>> write a *portable* 64-bit program that allocates memory with the high
>> 16 bits clear.  So let's add such a mechanism directly.
>>
>> As a thought experiment, what if x86_64 simply never allocated "high"
>> (above 2^47-1) addresses unless a new mmap-with-explicit-limit syscall
>> were used?  Old glibc would continue working.  Old VMs would work.
>> New programs that want to use ginormous mappings would have to use the
>> new syscall.  This would be totally stateless and would have no issues
>> with CRIU.
>
> Except, we need more than mmap as I mentioned.
>
> And what about stack? I'm not sure that everybody would be happy with
> stack in the middle of address space.

I would, personally.  I think that, for very large address spaces, we
should allocate a large block of stack and get rid of the "stack grows
down forever" legacy idea.  Then we would never need to worry about
the stack eventually hitting some other allocation.  And 2^57 bytes is
hilariously large for a default stack.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-01-03 18:27 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-27  1:53 [PATCHv2 00/29] 5-level paging Kirill A. Shutemov
2016-12-27  1:53 ` [PATCHv2 01/29] x86/cpufeature: Add 5-level paging detecton Kirill A. Shutemov
2016-12-27  1:53 ` [PATCHv2 02/29] asm-generic: introduce 5level-fixup.h Kirill A. Shutemov
2017-01-27 11:06   ` Vlastimil Babka
2017-01-27 11:30     ` Kirill A. Shutemov
2016-12-27  1:53 ` [PATCHv2 03/29] asm-generic: introduce __ARCH_USE_5LEVEL_HACK Kirill A. Shutemov
2017-01-27 13:24   ` Vlastimil Babka
2017-01-27 13:55     ` Kirill A. Shutemov
2016-12-27  1:53 ` [PATCHv2 04/29] arch, mm: convert all architectures to use 5level-fixup.h Kirill A. Shutemov
2016-12-27  1:53 ` [PATCHv2 05/29] asm-generic: introduce <asm-generic/pgtable-nop4d.h> Kirill A. Shutemov
2016-12-27  1:53 ` [PATCHv2 06/29] mm: convert generic code to 5-level paging Kirill A. Shutemov
2016-12-27  1:53 ` [PATCHv2 07/29] mm: introduce __p4d_alloc() Kirill A. Shutemov
2016-12-27  1:53 ` [PATCHv2 08/29] x86: basic changes into headers for 5-level paging Kirill A. Shutemov
2016-12-27  1:53 ` [PATCHv2 09/29] x86: trivial portion of 5-level paging conversion Kirill A. Shutemov
2016-12-27  1:53 ` [PATCHv2 10/29] x86/gup: add 5-level paging support Kirill A. Shutemov
2016-12-27  1:53 ` [PATCHv2 11/29] x86/ident_map: " Kirill A. Shutemov
2016-12-27  1:53 ` [PATCHv2 12/29] x86/mm: add support of p4d_t in vmalloc_fault() Kirill A. Shutemov
2016-12-27  1:53 ` [PATCHv2 13/29] x86/power: support p4d_t in hibernate code Kirill A. Shutemov
2016-12-27  1:53 ` [PATCHv2 14/29] x86/kexec: support p4d_t Kirill A. Shutemov
2016-12-27  1:53 ` [PATCHv2 15/29] x86: convert the rest of the code to " Kirill A. Shutemov
2016-12-27  1:54 ` [PATCHv2 16/29] x86: detect 5-level paging support Kirill A. Shutemov
2016-12-27  1:54 ` [PATCHv2 17/29] x86/asm: remove __VIRTUAL_MASK_SHIFT==47 assert Kirill A. Shutemov
2016-12-27  1:54 ` [PATCHv2 18/29] x86/mm: define virtual memory map for 5-level paging Kirill A. Shutemov
2016-12-27  1:54 ` [PATCHv2 19/29] x86/paravirt: make paravirt code support " Kirill A. Shutemov
2016-12-27  1:54 ` [PATCHv2 20/29] x86/mm: basic defines/helpers for CONFIG_X86_5LEVEL Kirill A. Shutemov
2016-12-27  1:54 ` [PATCHv2 21/29] x86/dump_pagetables: support 5-level paging Kirill A. Shutemov
2016-12-27  1:54 ` [PATCHv2 22/29] x86/mm: extend kasan to " Kirill A. Shutemov
2016-12-27  1:54 ` [PATCHv2 23/29] x86/espfix: " Kirill A. Shutemov
2016-12-27  1:54 ` [PATCHv2 24/29] x86/mm: add support of additional page table level during early boot Kirill A. Shutemov
2016-12-27  1:54 ` [PATCHv2 25/29] x86/mm: add sync_global_pgds() for configuration with 5-level paging Kirill A. Shutemov
2016-12-27  1:54 ` [PATCHv2 26/29] x86/mm: make kernel_physical_mapping_init() support " Kirill A. Shutemov
2016-12-27  1:54 ` [PATCHv2 27/29] x86/mm: add support for 5-level paging for KASLR Kirill A. Shutemov
2016-12-27  1:54 ` [PATCHv2 28/29] x86: enable 5-level paging support Kirill A. Shutemov
2016-12-27  1:54 ` [RFC, PATCHv2 29/29] mm, x86: introduce RLIMIT_VADDR Kirill A. Shutemov
2016-12-27  2:06   ` Andy Lutomirski
2016-12-27  2:24     ` Kirill A. Shutemov
2016-12-27  3:22       ` Andy Lutomirski
2017-01-02  9:09         ` Kirill A. Shutemov
2016-12-29  2:53       ` Carlos O'Donell
2016-12-31  2:08         ` Andy Lutomirski
2017-01-02  8:35           ` Kirill A. Shutemov
2017-01-13 20:11             ` H.J. Lu
2017-01-02  8:44   ` Arnd Bergmann
2017-01-03  6:08     ` Andy Lutomirski
2017-01-03 13:18       ` Arnd Bergmann
2017-01-03 18:29         ` Andy Lutomirski
2017-01-03 22:07           ` Arnd Bergmann
2017-01-03 22:09             ` Andy Lutomirski
2017-01-04 13:55               ` Arnd Bergmann
2017-01-03 16:04       ` Kirill A. Shutemov
2017-01-03 18:27         ` Andy Lutomirski [this message]
2017-01-04 14:19           ` Kirill A. Shutemov
2017-01-05 17:53             ` Andy Lutomirski
2017-01-05 19:13   ` Dave Hansen
2017-01-05 19:29     ` Kirill A. Shutemov
2017-01-05 19:39       ` Dave Hansen
2017-01-05 20:11         ` Kirill A. Shutemov
2017-01-05 20:14         ` Andy Lutomirski
2017-01-05 20:49           ` Dave Hansen
2017-01-05 21:27             ` Andy Lutomirski
2017-01-05 23:17               ` Dave Hansen
2017-01-11 14:29             ` Kirill A. Shutemov
2017-01-11 18:09               ` Andy Lutomirski
2017-01-11 18:37                 ` Kirill A. Shutemov
2017-01-11 18:49                   ` Dave Hansen
2017-01-11 19:20                     ` Andy Lutomirski
2017-01-11 19:31                       ` Linus Torvalds
2017-01-11 21:46                         ` Andi Kleen
2017-01-11 19:32                       ` Kirill A. Shutemov
2017-01-11 19:39                         ` Linus Torvalds
2017-01-11 18:26               ` Dave Hansen
2017-01-05 16:57 ` [PATCHv2 00/29] 5-level paging Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALCETrW3=SsQeC-gWOVqwTtg022+gHei=xfUc2ei3kkX0CACpg@mail.gmail.com' \
    --to=luto@amacapital.net \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=catalin.marinas@arm.com \
    --cc=dave.hansen@intel.com \
    --cc=hpa@zytor.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=will.deacon@arm.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).