From: Ingo Molnar <email@example.com> To: Andy Lutomirski <firstname.lastname@example.org> Cc: Alexey Dobriyan <email@example.com>, "H. Peter Anvin" <firstname.lastname@example.org>, Thomas Gleixner <email@example.com>, Ingo Molnar <firstname.lastname@example.org>, Borislav Petkov <email@example.com>, LKML <firstname.lastname@example.org>, X86 ML <email@example.com>, Peter Zijlstra <firstname.lastname@example.org>, Linus Torvalds <email@example.com>, Al Viro <firstname.lastname@example.org> Subject: Re: [PATCH] x86_64: uninline TASK_SIZE Date: Tue, 23 Apr 2019 13:12:58 +0200 [thread overview] Message-ID: <20190423111258.GA23410@gmail.com> (raw) In-Reply-To: <CALCETrWqPStQqv2xCJZLzsqTE-9L9U08CMApWyKD1zwjrrH4ow@mail.gmail.com> * Andy Lutomirski <email@example.com> wrote: > > Saving 2KB on a defconfig is quite a lot. > > Saving 2kB of text by adding 8 bytes to thread_info seems rather > dubious to me. You only need 256 tasks before you lose. My > not-particularly-loaded laptop has 865 tasks right now. I was suggesting current->task_size or thread_info->task_size as a way to 100% avoid the function call overhead. Worth a tiny amount of RAM - even with 1 million tasks it's only 4MB of RAM. ;-) Some TASK_SIZE users are prominent syscalls: mmap(), > As a general principle, the mere existence of TIF_ADDR32 is a bug. The > value of that flag is *wrong* under the 32-bit variant of CRIU. How > about instead making some more progress toward getting rid of dubious > TASK_SIZE users? I'm working on a little series to get rid of most of > them. Meanwhile: it sure looks like a large fraction of the users are > confused as to whether TASK_SIZE is the highest user address or the > lowest non-user address. I really like that, replacing TASK_SIZE with *nothing* would be even faster. In fact instead of just reducing its usage I'd suggest removing TASK_SIZE altogether and renaming TASK_SIZE_MAX back to TASK_SIZE, or something like that - the confusion from the deceptively macro-ish naming of TASK_SIZE is real. The original commit of making TASK_SIZE dynamic on the task's compat flag was done in: 84929801e14d: [PATCH] x86_64: TASK_SIZE fixes for compatibility mode processes Here's the justification given: Appended patch will setup compatibility mode TASK_SIZE properly. This will fix atleast three known bugs that can be encountered while running compatibility mode apps. a) A malicious 32bit app can have an elf section at 0xffffe000. During exec of this app, we will have a memory leak as insert_vm_struct() is not checking for return value in syscall32_setup_pages() and thus not freeing the vma allocated for the vsyscall page. And instead of exec failing (as it has addresses > TASK_SIZE), we were allowing it to succeed previously. b) With a 32bit app, hugetlb_get_unmapped_area/arch_get_unmapped_area may return addresses beyond 32bits, ultimately causing corruption because of wrap-around and resulting in SEGFAULT, instead of returning ENOMEM. c) 32bit app doing this below mmap will now fail. mmap((void *)(0xFFFFE000UL), 0x10000UL, PROT_READ|PROT_WRITE, MAP_FIXED|MAP_PRIVATE|MAP_ANON, 0, 0); I believe a) is addressed by getting rid of the vsyscall page - but it might also not be a current problem because the vsyscall page has its own gate-vma now. b) shouldn't be an issue if the mmap allocator correctly treats the compat bit - this doesn't require generic TASK_SIZE variations though, as the mmap allocator is already specific to arch/x86/. c) is a variant of a) I believe, which should be fixed by now. I just looked through some of the current TASK_SIZE users, and *ALL* of them seem dubious to me, with the exception of the mmap allocators. In fact some of them seem to be active bugs: kernel/: - PR_SET_MM_MAP, PR_SET_MM_MAP_SIZE, prctl_set_mm() et al. Ugh, what a nasty prctl()! But besides that, the TASK_SIZE restriction to the ABI looks questionable: if we offer this CRIU functionality then why should it be impossible for a 32-bit CRIU task to switch to 64-bit? - kernel/events/core.c: TASK_SIZE_MAX should be a fine filter here, in fact it's probably *wrong* to restrict the profiling data here just because the task happens to be in 32-bit compat mode. - kernel/rseq.c: is this TASK_SIZE restriction even required, wouldn't TASK_SIZE_MAX be sufficient? mm/: - GUP's get_get_area() et al looks really weird - why do we special-case vsyscalls: - Can we get rid of the vsyscall page in modern kernels? - I don't think anyone runs those ancient glibc versions with a fresh kernel anymore - can we start generating a WARN()ing perhaps to see whether there's any complaints? - Or at least pretend it doesn't exist in terms of a GUP target page? - mm/kasan/generic_report.c:get_wild_bug_type() - this can use TASK_SIZE_MAX just fine IMHO. - mm/mempolicy.c:mpol_shared_policy_init() - unsure, but I suspect we can just create the pseudo-vma with a TASK_SIZE_MAX vm_end just fine. - mm/mlock.c:mlockall() - I believe it could be considered an outright *bug* if there any pages outside the 32-bit area and don't get mlocked by mlockall, just because this is a compat task. Especially with the CRIU prctl() having 64-bit vmas outside the 32-bit mappings is a real possibility, right? I.e. TASK_SIZE_MAX would be the right solution here. To turn the argument around: beyond the memory allocators, which includes the mmap and huge-mmap variants plus the SysV shmem allocator, can we list all the places that absolutely *rely* on TASK_SIZE being TIF_ADDR32 restricted on compat tasks? I couldn't find any. So I concur 100% that most TASK_SIZE uses are questionable. In fact think 84929801e14d was a mistake, and we should effectively revert it carefully, by: - First by moving almost all TASK_SIZE users over to TASK_SIZE_MAX, analyzing and justifying the impact case by case. - Then making the mmap allocators compat compatible (ha) without relying on TASK_SIZE. - Renaming TASK_SIZE back to TASK_SIZE_MAX and getting rid of the TASK_SIZE and TASK_SIZE_MAX differentiation. Or am I missing some complication? Thanks, Ingo
next prev parent reply other threads:[~2019-04-23 11:13 UTC|newest] Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-04-21 16:06 Alexey Dobriyan 2019-04-21 18:28 ` Ingo Molnar 2019-04-21 20:07 ` hpa 2019-04-21 21:10 ` Alexey Dobriyan 2019-04-22 10:34 ` Ingo Molnar 2019-04-22 14:30 ` Andy Lutomirski 2019-04-22 22:09 ` Alexey Dobriyan 2019-04-23 0:54 ` Andy Lutomirski 2019-04-23 11:12 ` Ingo Molnar [this message] 2019-04-23 17:21 ` Andy Lutomirski 2019-04-24 10:38 ` Ingo Molnar 2019-04-22 22:04 ` Alexey Dobriyan 2019-04-21 22:07 ` [PATCH v2] " Alexey Dobriyan 2019-04-22 22:40 ` [PATCH] " Linus Torvalds
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20190423111258.GA23410@gmail.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --subject='Re: [PATCH] x86_64: uninline TASK_SIZE' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).