From: Andy Lutomirski <email@example.com> To: Ingo Molnar <firstname.lastname@example.org> Cc: Andy Lutomirski <email@example.com>, Alexey Dobriyan <firstname.lastname@example.org>, "H. Peter Anvin" <email@example.com>, Thomas Gleixner <firstname.lastname@example.org>, Ingo Molnar <email@example.com>, Borislav Petkov <firstname.lastname@example.org>, LKML <email@example.com>, X86 ML <firstname.lastname@example.org>, Peter Zijlstra <email@example.com>, Linus Torvalds <firstname.lastname@example.org>, Al Viro <email@example.com> Subject: Re: [PATCH] x86_64: uninline TASK_SIZE Date: Tue, 23 Apr 2019 10:21:59 -0700 [thread overview] Message-ID: <CALCETrWtK4G7-o=pQoeyzLFTpiR9giGy2jYTU10xt6S8FsMk+Q@mail.gmail.com> (raw) In-Reply-To: <20190423111258.GA23410@gmail.com> On Tue, Apr 23, 2019 at 4:13 AM Ingo Molnar <firstname.lastname@example.org> wrote: > > > * Andy Lutomirski <email@example.com> wrote: > > > > Saving 2KB on a defconfig is quite a lot. > > > > Saving 2kB of text by adding 8 bytes to thread_info seems rather > > dubious to me. You only need 256 tasks before you lose. My > > not-particularly-loaded laptop has 865 tasks right now. > > I was suggesting current->task_size or thread_info->task_size as a way to > 100% avoid the function call overhead. Worth a tiny amount of RAM - even > with 1 million tasks it's only 4MB of RAM. ;-) > > Some TASK_SIZE users are prominent syscalls: mmap(), > > > As a general principle, the mere existence of TIF_ADDR32 is a bug. The > > value of that flag is *wrong* under the 32-bit variant of CRIU. How > > about instead making some more progress toward getting rid of dubious > > TASK_SIZE users? I'm working on a little series to get rid of most of > > them. Meanwhile: it sure looks like a large fraction of the users are > > confused as to whether TASK_SIZE is the highest user address or the > > lowest non-user address. > > I really like that, replacing TASK_SIZE with *nothing* would be even > faster. > > In fact instead of just reducing its usage I'd suggest removing TASK_SIZE > altogether and renaming TASK_SIZE_MAX back to TASK_SIZE, or something > like that - the confusion from the deceptively macro-ish naming of > TASK_SIZE is real. > > The original commit of making TASK_SIZE dynamic on the task's compat flag > was done in: > > 84929801e14d: [PATCH] x86_64: TASK_SIZE fixes for compatibility mode processes > > Here's the justification given: > > Appended patch will setup compatibility mode TASK_SIZE properly. This will > fix atleast three known bugs that can be encountered while running > compatibility mode apps. > > a) A malicious 32bit app can have an elf section at 0xffffe000. During > exec of this app, we will have a memory leak as insert_vm_struct() is > not checking for return value in syscall32_setup_pages() and thus not > freeing the vma allocated for the vsyscall page. And instead of exec > failing (as it has addresses > TASK_SIZE), we were allowing it to > succeed previously. > > b) With a 32bit app, hugetlb_get_unmapped_area/arch_get_unmapped_area > may return addresses beyond 32bits, ultimately causing corruption > because of wrap-around and resulting in SEGFAULT, instead of returning > ENOMEM. > > c) 32bit app doing this below mmap will now fail. > > mmap((void *)(0xFFFFE000UL), 0x10000UL, PROT_READ|PROT_WRITE, > MAP_FIXED|MAP_PRIVATE|MAP_ANON, 0, 0); > > I believe a) is addressed by getting rid of the vsyscall page - but it > might also not be a current problem because the vsyscall page has its own > gate-vma now. > I suspect that whatever issue this is predates my involvement in Linux :) The "gate" VMA, aka vsyscall, is at 0xffffffffff600000, and the vDSO setup code shouldn't anywhere near that fragile. Also, if this really a bug, we have it for the 64-bit case, too, and TASK_SIZE isn't going to help. > b) shouldn't be an issue if the mmap allocator correctly treats the > compat bit - this doesn't require generic TASK_SIZE variations though, as > the mmap allocator is already specific to arch/x86/. This was mostly fixed by the CRIU folks a while back, I think. > > c) is a variant of a) I believe, which should be fixed by now. > > I just looked through some of the current TASK_SIZE users, and *ALL* of > them seem dubious to me, with the exception of the mmap allocators. In Even the munmap one seems to be a bug. > fact some of them seem to be active bugs: > > kernel/: > > - PR_SET_MM_MAP, PR_SET_MM_MAP_SIZE, prctl_set_mm() et al. Ugh, what a > nasty prctl()! But besides that, the TASK_SIZE restriction to the ABI > looks questionable: if we offer this CRIU functionality then why > should it be impossible for a 32-bit CRIU task to switch to 64-bit? > > - kernel/events/core.c: TASK_SIZE_MAX should be a fine filter here, in > fact it's probably *wrong* to restrict the profiling data here just > because the task happens to be in 32-bit compat mode. Yep. I have a patch fo rthis. > > - kernel/rseq.c: is this TASK_SIZE restriction even required, wouldn't > TASK_SIZE_MAX be sufficient? Presumably. > So I concur 100% that most TASK_SIZE uses are questionable. In fact think > 84929801e14d was a mistake, and we should effectively revert it > carefully, by: > > - First by moving almost all TASK_SIZE users over to TASK_SIZE_MAX, > analyzing and justifying the impact case by case. > > - Then making the mmap allocators compat compatible (ha) without relying > on TASK_SIZE. I have a somewhat hacky patch for this right now. > > - Renaming TASK_SIZE back to TASK_SIZE_MAX and getting rid of the > TASK_SIZE and TASK_SIZE_MAX differentiation. > > Or am I missing some complication? Seems like a great idea to me. BTW, what the heck is up with get_gate_page()? I'm struggling to understand what it's even trying to do. If there's an architecture that allows a user program to mremap() or otherwise position its gate VMA between TASK_SIZE and TASK_SIZE_MAX, then that code is going to explode horribly. A whole bunch of work in this direction is here: https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/fixes It's almost entirely untested.
next prev parent reply other threads:[~2019-04-23 17:22 UTC|newest] Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-04-21 16:06 Alexey Dobriyan 2019-04-21 18:28 ` Ingo Molnar 2019-04-21 20:07 ` hpa 2019-04-21 21:10 ` Alexey Dobriyan 2019-04-22 10:34 ` Ingo Molnar 2019-04-22 14:30 ` Andy Lutomirski 2019-04-22 22:09 ` Alexey Dobriyan 2019-04-23 0:54 ` Andy Lutomirski 2019-04-23 11:12 ` Ingo Molnar 2019-04-23 17:21 ` Andy Lutomirski [this message] 2019-04-24 10:38 ` Ingo Molnar 2019-04-22 22:04 ` Alexey Dobriyan 2019-04-21 22:07 ` [PATCH v2] " Alexey Dobriyan 2019-04-22 22:40 ` [PATCH] " Linus Torvalds
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to='CALCETrWtK4G7-o=pQoeyzLFTpiR9giGy2jYTU10xt6S8FsMk+Q@mail.gmail.com' \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --subject='Re: [PATCH] x86_64: uninline TASK_SIZE' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).