From: Andy Lutomirski
To: x86@kernel.org
Cc: Borislav Petkov, linux-kernel@vger.kernel.org, Brian Gerst,
    Jann Horn, Andy Lutomirski
Subject: [PATCH 00/12] thread_info cleanups and stack caching
Date: Tue, 13 Sep 2016 14:29:20 -0700

[Sorry this is late.  I apparently never hit enter on the git
send-email command.  This is what I meant to send, except that I
folded in the collect_syscall() fix and redid the rebase (which was
uneventful).]

This series extensively cleans up thread_info.  thread_info has been
partially redundant with thread_struct for a long time -- both are
places for arch code to add additional per-task variables.
thread_struct is much cleaner: it's always in task_struct, and there's
nothing particularly magical about it.  So this series moves x86's
status field from thread_info to thread_struct and removes x86's
dependence on thread_info's position on the stack.  Then it opts x86
into a new config option THREAD_INFO_IN_TASK to get rid of
arch-specific thread_info entirely and simply embed a defanged
thread_info (containing only flags) and 'int cpu' into task_struct.

Once thread_info stops being magical, there's another benefit: we can
free the thread stack as soon as the task is dead (without waiting for
RCU) and then, if vmapped stacks are in use, cache the entire stack
for reuse on the same cpu.
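As a rough illustration of the caching idea (this is not code from the
series), a per-cpu cache of two stacks can be modeled in userspace C
along these lines.  Every name here (stack_alloc, stack_free,
cached_stacks, NR_CPUS, STACK_SIZE) is made up for the sketch, malloc()
stands in for the vmalloc-based stack allocation, and the model ignores
preemption and concurrency, which real per-cpu code has to deal with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names come from the series. */
#define NR_CPUS           4
#define NR_CACHED_STACKS  2            /* "two thread stacks per cpu" */
#define STACK_SIZE        (16 * 1024)  /* assumed size, for illustration only */

/* One small array of cached stacks per CPU; NULL marks an empty slot. */
static void *cached_stacks[NR_CPUS][NR_CACHED_STACKS];

/* Take a stack from this CPU's cache if possible, else allocate a new one. */
static void *stack_alloc(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);

    for (int i = 0; i < NR_CACHED_STACKS; i++) {
        void *s = cached_stacks[cpu][i];

        if (s) {
            cached_stacks[cpu][i] = NULL;  /* cache hit: reuse the stack */
            return s;
        }
    }

    return malloc(STACK_SIZE);             /* cache miss: slow path */
}

/* Park a dead task's stack in this CPU's cache, or free it if the cache is full. */
static void stack_free(int cpu, void *stack)
{
    assert(cpu >= 0 && cpu < NR_CPUS);

    for (int i = 0; i < NR_CACHED_STACKS; i++) {
        if (!cached_stacks[cpu][i]) {
            cached_stacks[cpu][i] = stack;  /* keep it for the next fork */
            return;
        }
    }

    free(stack);                            /* both slots taken */
}
```

In this model, freeing two stacks on a cpu and then allocating twice on
that cpu hands back the same two buffers without touching malloc(); a
third free on a full cache falls through to free().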

Caching the stack this way seems to be an overall speedup of about
0.5-1 µs per pthread_create/join compared to the old
CONFIG_VMAP_STACK=n baseline in a simple test -- a percpu cache of
vmalloced stacks appears to be a bit faster than a high-order stack
allocation, at least when the cache hits.  (I expect that workloads
with a low cache hit rate are likely to be dominated by other effects
anyway.)

Changes from before:
 - A bunch of the series is already in 4.8-rc.
 - Added the get_wchan() and collect_syscall() patches.
 - Rebased.

Andy Lutomirski (9):
  x86/asm: Move 'status' from struct thread_info to struct thread_struct
  sched: Allow putting thread_info into task_struct
  x86: Move thread_info into task_struct
  sched: Add try_get_task_stack() and put_task_stack()
  x86/dumpstack: Pin the target stack in save_stack_trace_tsk()
  x86/process: Pin the target stack in get_wchan()
  lib/syscall: Pin the task stack in collect_syscall()
  sched: Free the stack early if CONFIG_THREAD_INFO_IN_TASK
  fork: Cache two thread stacks per cpu if CONFIG_VMAP_STACK is set

Linus Torvalds (2):
  x86/entry: Get rid of pt_regs_to_thread_info()
  um: Stop conflating task_struct::stack with thread_info

Oleg Nesterov (1):
  kthread: to_live_kthread() needs try_get_task_stack()

 arch/x86/Kconfig                   |  1 +
 arch/x86/entry/common.c            | 24 ++++------
 arch/x86/entry/entry_64.S          |  7 ++-
 arch/x86/include/asm/processor.h   | 12 +++++
 arch/x86/include/asm/syscall.h     | 20 ++------
 arch/x86/include/asm/thread_info.h | 69 ++-------------------------
 arch/x86/kernel/asm-offsets.c      |  5 +-
 arch/x86/kernel/fpu/init.c         |  1 -
 arch/x86/kernel/irq_64.c           |  3 +-
 arch/x86/kernel/process.c          | 28 ++++++-----
 arch/x86/kernel/process_64.c       |  4 +-
 arch/x86/kernel/ptrace.c           |  2 +-
 arch/x86/kernel/signal.c           |  2 +-
 arch/x86/kernel/stacktrace.c       |  5 ++
 arch/x86/um/ptrace_32.c            |  8 ++--
 include/linux/init_task.h          | 11 +++++
 include/linux/sched.h              | 66 +++++++++++++++++++++++++-
 include/linux/thread_info.h        | 15 ++++++
 init/Kconfig                       | 10 ++++
 init/init_task.c                   |  7 ++-
 kernel/fork.c                      | 97 ++++++++++++++++++++++++++++++++++----
 kernel/kthread.c                   |  8 +++-
 kernel/sched/core.c                |  4 ++
 kernel/sched/sched.h               |  4 ++
 lib/syscall.c                      | 15 +++++-
 25 files changed, 286 insertions(+), 142 deletions(-)

-- 
2.7.4