* [PATCH] x86_64: uninline TASK_SIZE @ 2019-04-21 16:06 Alexey Dobriyan 2019-04-21 18:28 ` Ingo Molnar 2019-04-22 22:40 ` [PATCH] " Linus Torvalds 0 siblings, 2 replies; 14+ messages in thread From: Alexey Dobriyan @ 2019-04-21 16:06 UTC (permalink / raw) To: tglx, mingo, bp, hpa; +Cc: linux-kernel, x86 TASK_SIZE macro is quite deceptive: it looks like a constant but in fact compiles to 50+ bytes. Space savings on x86_64 defconfig: add/remove: 1/0 grow/shrink: 3/24 up/down: 77/-2247 (-2170) Function old new delta _task_size - 52 +52 mpol_shared_policy_init 344 363 +19 shmem_get_unmapped_area 92 97 +5 __rseq_handle_notify_resume.cold 34 35 +1 copy_from_user_nmi 123 113 -10 mmap_address_hint_valid 92 56 -36 arch_get_unmapped_area_topdown 471 435 -36 tlb_gather_mmu 164 126 -38 hugetlb_get_unmapped_area 774 736 -38 __create_xol_area 497 458 -39 arch_tlb_gather_mmu 160 120 -40 setup_new_exec 380 336 -44 __x64_sys_mlockall 378 333 -45 __ia32_sys_mlockall 378 333 -45 tlb_flush_mmu 235 189 -46 unmap_page_range 2098 2048 -50 copy_mount_options 518 465 -53 __get_user_pages 1737 1675 -62 get_unmapped_area 270 204 -66 perf_prepare_sample 1176 1098 -78 perf_callchain_user 549 469 -80 mremap_to.isra 545 457 -88 arch_tlb_finish_mmu 394 305 -89 __do_munmap 1039 927 -112 elf_map 527 409 -118 prctl_set_mm 1509 1335 -174 __rseq_handle_notify_resume 1116 906 -210 load_elf_binary 11761 11111 -650 Total: Before=14121337, After=14119167, chg -0.02% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> --- arch/x86/include/asm/processor.h | 4 ++-- arch/x86/kernel/Makefile | 1 + arch/x86/kernel/task_size_64.c | 9 +++++++++ 3 files changed, 12 insertions(+), 2 deletions(-) --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -887,8 +887,8 @@ static inline void spin_lock_prefetch(const void *x) #define TASK_SIZE_LOW (test_thread_flag(TIF_ADDR32) ? \ IA32_PAGE_OFFSET : DEFAULT_MAP_WINDOW) -#define TASK_SIZE (test_thread_flag(TIF_ADDR32) ? \ - IA32_PAGE_OFFSET : TASK_SIZE_MAX) +unsigned long _task_size(void); +#define TASK_SIZE _task_size() #define TASK_SIZE_OF(child) ((test_tsk_thread_flag(child, TIF_ADDR32)) ? \ IA32_PAGE_OFFSET : TASK_SIZE_MAX) --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -46,6 +46,7 @@ CFLAGS_irq.o := -I$(src)/../include/asm/trace obj-y := process_$(BITS).o signal.o obj-$(CONFIG_COMPAT) += signal_compat.o +obj-$(CONFIG_X86_64) += task_size_64.o obj-y += traps.o idt.o irq.o irq_$(BITS).o dumpstack_$(BITS).o obj-y += time.o ioport.o dumpstack.o nmi.o obj-$(CONFIG_MODIFY_LDT_SYSCALL) += ldt.o new file mode 100644 --- /dev/null +++ b/arch/x86/kernel/task_size_64.c @@ -0,0 +1,9 @@ +#include <linux/export.h> +#include <linux/sched.h> +#include <linux/thread_info.h> + +unsigned long _task_size(void) +{ + return test_thread_flag(TIF_ADDR32) ? IA32_PAGE_OFFSET : TASK_SIZE_MAX; +} +EXPORT_SYMBOL(_task_size); ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86_64: uninline TASK_SIZE 2019-04-21 16:06 [PATCH] x86_64: uninline TASK_SIZE Alexey Dobriyan @ 2019-04-21 18:28 ` Ingo Molnar 2019-04-21 20:07 ` hpa 2019-04-21 22:07 ` [PATCH v2] " Alexey Dobriyan 2019-04-22 22:40 ` [PATCH] " Linus Torvalds 1 sibling, 2 replies; 14+ messages in thread From: Ingo Molnar @ 2019-04-21 18:28 UTC (permalink / raw) To: Alexey Dobriyan; +Cc: tglx, mingo, bp, hpa, linux-kernel, x86 * Alexey Dobriyan <adobriyan@gmail.com> wrote: > TASK_SIZE macro is quite deceptive: it looks like a constant but in fact > compiles to 50+ bytes. > > Space savings on x86_64 defconfig: > > add/remove: 1/0 grow/shrink: 3/24 up/down: 77/-2247 (-2170) > Function old new delta > _task_size - 52 +52 > mpol_shared_policy_init 344 363 +19 > shmem_get_unmapped_area 92 97 +5 > __rseq_handle_notify_resume.cold 34 35 +1 > copy_from_user_nmi 123 113 -10 > mmap_address_hint_valid 92 56 -36 > arch_get_unmapped_area_topdown 471 435 -36 > tlb_gather_mmu 164 126 -38 > hugetlb_get_unmapped_area 774 736 -38 > __create_xol_area 497 458 -39 > arch_tlb_gather_mmu 160 120 -40 > setup_new_exec 380 336 -44 > __x64_sys_mlockall 378 333 -45 > __ia32_sys_mlockall 378 333 -45 > tlb_flush_mmu 235 189 -46 > unmap_page_range 2098 2048 -50 > copy_mount_options 518 465 -53 > __get_user_pages 1737 1675 -62 > get_unmapped_area 270 204 -66 > perf_prepare_sample 1176 1098 -78 > perf_callchain_user 549 469 -80 > mremap_to.isra 545 457 -88 > arch_tlb_finish_mmu 394 305 -89 > __do_munmap 1039 927 -112 > elf_map 527 409 -118 > prctl_set_mm 1509 1335 -174 > __rseq_handle_notify_resume 1116 906 -210 > load_elf_binary 11761 11111 -650 > Total: Before=14121337, After=14119167, chg -0.02% > > Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> > --- > > arch/x86/include/asm/processor.h | 4 ++-- > arch/x86/kernel/Makefile | 1 + > arch/x86/kernel/task_size_64.c | 9 +++++++++ > 3 files changed, 12 insertions(+), 2 deletions(-) > > --- a/arch/x86/include/asm/processor.h > +++ b/arch/x86/include/asm/processor.h > @@ -887,8 +887,8 @@ static inline void spin_lock_prefetch(const void *x) > > #define TASK_SIZE_LOW (test_thread_flag(TIF_ADDR32) ? \ > IA32_PAGE_OFFSET : DEFAULT_MAP_WINDOW) > -#define TASK_SIZE (test_thread_flag(TIF_ADDR32) ? \ > - IA32_PAGE_OFFSET : TASK_SIZE_MAX) > +unsigned long _task_size(void); > +#define TASK_SIZE _task_size() > #define TASK_SIZE_OF(child) ((test_tsk_thread_flag(child, TIF_ADDR32)) ? \ > IA32_PAGE_OFFSET : TASK_SIZE_MAX) > > --- a/arch/x86/kernel/Makefile > +++ b/arch/x86/kernel/Makefile > @@ -46,6 +46,7 @@ CFLAGS_irq.o := -I$(src)/../include/asm/trace > > obj-y := process_$(BITS).o signal.o > obj-$(CONFIG_COMPAT) += signal_compat.o > +obj-$(CONFIG_X86_64) += task_size_64.o > obj-y += traps.o idt.o irq.o irq_$(BITS).o dumpstack_$(BITS).o > obj-y += time.o ioport.o dumpstack.o nmi.o > obj-$(CONFIG_MODIFY_LDT_SYSCALL) += ldt.o > new file mode 100644 > --- /dev/null > +++ b/arch/x86/kernel/task_size_64.c > @@ -0,0 +1,9 @@ > +#include <linux/export.h> > +#include <linux/sched.h> > +#include <linux/thread_info.h> > + > +unsigned long _task_size(void) > +{ > + return test_thread_flag(TIF_ADDR32) ? IA32_PAGE_OFFSET : TASK_SIZE_MAX; > +} > +EXPORT_SYMBOL(_task_size); Good idea - but instead of adding yet another compilation unit, why not stick _task_size() into arch/x86/kernel/process_64.c, which is the canonical place for process management related arch functions? Thanks, Ingo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86_64: uninline TASK_SIZE 2019-04-21 18:28 ` Ingo Molnar @ 2019-04-21 20:07 ` hpa 2019-04-21 21:10 ` Alexey Dobriyan 2019-04-21 22:07 ` [PATCH v2] " Alexey Dobriyan 1 sibling, 1 reply; 14+ messages in thread From: hpa @ 2019-04-21 20:07 UTC (permalink / raw) To: Ingo Molnar, Alexey Dobriyan; +Cc: tglx, mingo, bp, linux-kernel, x86 On April 21, 2019 11:28:42 AM PDT, Ingo Molnar <mingo@kernel.org> wrote: > >* Alexey Dobriyan <adobriyan@gmail.com> wrote: > >> TASK_SIZE macro is quite deceptive: it looks like a constant but in >fact >> compiles to 50+ bytes. >> >> Space savings on x86_64 defconfig: >> >> add/remove: 1/0 grow/shrink: 3/24 up/down: 77/-2247 (-2170) >> Function old new delta >> _task_size - 52 +52 >> mpol_shared_policy_init 344 363 +19 >> shmem_get_unmapped_area 92 97 +5 >> __rseq_handle_notify_resume.cold 34 35 +1 >> copy_from_user_nmi 123 113 -10 >> mmap_address_hint_valid 92 56 -36 >> arch_get_unmapped_area_topdown 471 435 -36 >> tlb_gather_mmu 164 126 -38 >> hugetlb_get_unmapped_area 774 736 -38 >> __create_xol_area 497 458 -39 >> arch_tlb_gather_mmu 160 120 -40 >> setup_new_exec 380 336 -44 >> __x64_sys_mlockall 378 333 -45 >> __ia32_sys_mlockall 378 333 -45 >> tlb_flush_mmu 235 189 -46 >> unmap_page_range 2098 2048 -50 >> copy_mount_options 518 465 -53 >> __get_user_pages 1737 1675 -62 >> get_unmapped_area 270 204 -66 >> perf_prepare_sample 1176 1098 -78 >> perf_callchain_user 549 469 -80 >> mremap_to.isra 545 457 -88 >> arch_tlb_finish_mmu 394 305 -89 >> __do_munmap 1039 927 -112 >> elf_map 527 409 -118 >> prctl_set_mm 1509 1335 -174 >> __rseq_handle_notify_resume 1116 906 -210 >> load_elf_binary 11761 11111 -650 >> Total: Before=14121337, After=14119167, chg -0.02% >> >> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> >> --- >> >> arch/x86/include/asm/processor.h | 4 ++-- >> arch/x86/kernel/Makefile | 1 + >> arch/x86/kernel/task_size_64.c | 9 +++++++++ >> 3 files changed, 12 insertions(+), 2 deletions(-) >> >> --- a/arch/x86/include/asm/processor.h >> +++ b/arch/x86/include/asm/processor.h >> @@ -887,8 +887,8 @@ static inline void spin_lock_prefetch(const void >*x) >> >> #define TASK_SIZE_LOW (test_thread_flag(TIF_ADDR32) ? \ >> IA32_PAGE_OFFSET : DEFAULT_MAP_WINDOW) >> -#define TASK_SIZE (test_thread_flag(TIF_ADDR32) ? \ >> - IA32_PAGE_OFFSET : TASK_SIZE_MAX) >> +unsigned long _task_size(void); >> +#define TASK_SIZE _task_size() >> #define TASK_SIZE_OF(child) ((test_tsk_thread_flag(child, >TIF_ADDR32)) ? \ >> IA32_PAGE_OFFSET : TASK_SIZE_MAX) >> >> --- a/arch/x86/kernel/Makefile >> +++ b/arch/x86/kernel/Makefile >> @@ -46,6 +46,7 @@ CFLAGS_irq.o := -I$(src)/../include/asm/trace >> >> obj-y := process_$(BITS).o signal.o >> obj-$(CONFIG_COMPAT) += signal_compat.o >> +obj-$(CONFIG_X86_64) += task_size_64.o >> obj-y += traps.o idt.o irq.o irq_$(BITS).o dumpstack_$(BITS).o >> obj-y += time.o ioport.o dumpstack.o nmi.o >> obj-$(CONFIG_MODIFY_LDT_SYSCALL) += ldt.o >> new file mode 100644 >> --- /dev/null >> +++ b/arch/x86/kernel/task_size_64.c >> @@ -0,0 +1,9 @@ >> +#include <linux/export.h> >> +#include <linux/sched.h> >> +#include <linux/thread_info.h> >> + >> +unsigned long _task_size(void) >> +{ >> + return test_thread_flag(TIF_ADDR32) ? IA32_PAGE_OFFSET : >TASK_SIZE_MAX; >> +} >> +EXPORT_SYMBOL(_task_size); > >Good idea - but instead of adding yet another compilation unit, why not > >stick _task_size() into arch/x86/kernel/process_64.c, which is the >canonical place for process management related arch functions? > >Thanks, > > Ingo Better yet... since TIF_ADDR32 isn't something that changes randomly, perhaps this should be a separate variable? -- Sent from my Android device with K-9 Mail. Please excuse my brevity. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86_64: uninline TASK_SIZE 2019-04-21 20:07 ` hpa @ 2019-04-21 21:10 ` Alexey Dobriyan 2019-04-22 10:34 ` Ingo Molnar 0 siblings, 1 reply; 14+ messages in thread From: Alexey Dobriyan @ 2019-04-21 21:10 UTC (permalink / raw) To: hpa; +Cc: Ingo Molnar, tglx, mingo, bp, linux-kernel, x86 On Sun, Apr 21, 2019 at 01:07:08PM -0700, hpa@zytor.com wrote: > On April 21, 2019 11:28:42 AM PDT, Ingo Molnar <mingo@kernel.org> wrote: > > > >* Alexey Dobriyan <adobriyan@gmail.com> wrote: > > > >> TASK_SIZE macro is quite deceptive: it looks like a constant but in > >fact > >> compiles to 50+ bytes. > >> > >> Space savings on x86_64 defconfig: > >> > >> add/remove: 1/0 grow/shrink: 3/24 up/down: 77/-2247 (-2170) > >> Function old new delta > >> _task_size - 52 +52 > >> mpol_shared_policy_init 344 363 +19 > >> shmem_get_unmapped_area 92 97 +5 > >> __rseq_handle_notify_resume.cold 34 35 +1 > >> copy_from_user_nmi 123 113 -10 > >> mmap_address_hint_valid 92 56 -36 > >> arch_get_unmapped_area_topdown 471 435 -36 > >> tlb_gather_mmu 164 126 -38 > >> hugetlb_get_unmapped_area 774 736 -38 > >> __create_xol_area 497 458 -39 > >> arch_tlb_gather_mmu 160 120 -40 > >> setup_new_exec 380 336 -44 > >> __x64_sys_mlockall 378 333 -45 > >> __ia32_sys_mlockall 378 333 -45 > >> tlb_flush_mmu 235 189 -46 > >> unmap_page_range 2098 2048 -50 > >> copy_mount_options 518 465 -53 > >> __get_user_pages 1737 1675 -62 > >> get_unmapped_area 270 204 -66 > >> perf_prepare_sample 1176 1098 -78 > >> perf_callchain_user 549 469 -80 > >> mremap_to.isra 545 457 -88 > >> arch_tlb_finish_mmu 394 305 -89 > >> __do_munmap 1039 927 -112 > >> elf_map 527 409 -118 > >> prctl_set_mm 1509 1335 -174 > >> __rseq_handle_notify_resume 1116 906 -210 > >> load_elf_binary 11761 11111 -650 > >> Total: Before=14121337, After=14119167, chg -0.02% > >> > >> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> > >> --- > >> > >> arch/x86/include/asm/processor.h | 4 ++-- > >> arch/x86/kernel/Makefile | 1 + > >> arch/x86/kernel/task_size_64.c | 9 +++++++++ > >> 3 files changed, 12 insertions(+), 2 deletions(-) > >> > >> --- a/arch/x86/include/asm/processor.h > >> +++ b/arch/x86/include/asm/processor.h > >> @@ -887,8 +887,8 @@ static inline void spin_lock_prefetch(const void > >*x) > >> > >> #define TASK_SIZE_LOW (test_thread_flag(TIF_ADDR32) ? \ > >> IA32_PAGE_OFFSET : DEFAULT_MAP_WINDOW) > >> -#define TASK_SIZE (test_thread_flag(TIF_ADDR32) ? \ > >> - IA32_PAGE_OFFSET : TASK_SIZE_MAX) > >> +unsigned long _task_size(void); > >> +#define TASK_SIZE _task_size() > >> #define TASK_SIZE_OF(child) ((test_tsk_thread_flag(child, > >TIF_ADDR32)) ? \ > >> IA32_PAGE_OFFSET : TASK_SIZE_MAX) > >> > >> --- a/arch/x86/kernel/Makefile > >> +++ b/arch/x86/kernel/Makefile > >> @@ -46,6 +46,7 @@ CFLAGS_irq.o := -I$(src)/../include/asm/trace > >> > >> obj-y := process_$(BITS).o signal.o > >> obj-$(CONFIG_COMPAT) += signal_compat.o > >> +obj-$(CONFIG_X86_64) += task_size_64.o > >> obj-y += traps.o idt.o irq.o irq_$(BITS).o dumpstack_$(BITS).o > >> obj-y += time.o ioport.o dumpstack.o nmi.o > >> obj-$(CONFIG_MODIFY_LDT_SYSCALL) += ldt.o > >> new file mode 100644 > >> --- /dev/null > >> +++ b/arch/x86/kernel/task_size_64.c > >> @@ -0,0 +1,9 @@ > >> +#include <linux/export.h> > >> +#include <linux/sched.h> > >> +#include <linux/thread_info.h> > >> + > >> +unsigned long _task_size(void) > >> +{ > >> + return test_thread_flag(TIF_ADDR32) ? IA32_PAGE_OFFSET : > >TASK_SIZE_MAX; > >> +} > >> +EXPORT_SYMBOL(_task_size); > > > >Good idea - but instead of adding yet another compilation unit, why not > > > >stick _task_size() into arch/x86/kernel/process_64.c, which is the > >canonical place for process management related arch functions? > > > >Thanks, > > > > Ingo > > Better yet... since TIF_ADDR32 isn't something that changes randomly, perhaps this should be a separate variable? Maybe. I only thought about putting every 32-bit related flag under CONFIG_COMPAT to further eradicate bloat (and force everyone else to keep an eye on it, ha-ha). ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86_64: uninline TASK_SIZE 2019-04-21 21:10 ` Alexey Dobriyan @ 2019-04-22 10:34 ` Ingo Molnar 2019-04-22 14:30 ` Andy Lutomirski 2019-04-22 22:04 ` Alexey Dobriyan 0 siblings, 2 replies; 14+ messages in thread From: Ingo Molnar @ 2019-04-22 10:34 UTC (permalink / raw) To: Alexey Dobriyan Cc: hpa, tglx, mingo, bp, linux-kernel, x86, Andy Lutomirski, Peter Zijlstra, Linus Torvalds, Al Viro * Alexey Dobriyan <adobriyan@gmail.com> wrote: > > >> +++ b/arch/x86/kernel/task_size_64.c > > >> @@ -0,0 +1,9 @@ > > >> +#include <linux/export.h> > > >> +#include <linux/sched.h> > > >> +#include <linux/thread_info.h> > > >> + > > >> +unsigned long _task_size(void) > > >> +{ > > >> + return test_thread_flag(TIF_ADDR32) ? IA32_PAGE_OFFSET : > > >TASK_SIZE_MAX; > > >> +} > > >> +EXPORT_SYMBOL(_task_size); > > > > > >Good idea - but instead of adding yet another compilation unit, why not > > > > > >stick _task_size() into arch/x86/kernel/process_64.c, which is the > > >canonical place for process management related arch functions? > > > > > >Thanks, > > > > > > Ingo > > > > Better yet... since TIF_ADDR32 isn't something that changes randomly, > > perhaps this should be a separate variable? > > Maybe. I only thought about putting every 32-bit related flag under > CONFIG_COMPAT to further eradicate bloat (and force everyone else to > keep an eye on it, ha-ha). Basically TIF_ADDR32 is only set for a task if set_personality_ia32() is called, which function is called in the following circumstances: - arch/x86/ia32/ia32_aout.c:load_aout_binary() This is in exec(), when a new binary is loaded for the current task, via search_binary_handler() and exec_binprm(). Ordering is synchronous, AFAICS there can be no race between TASK_SIZE users and the set_personality_ia32() call which is always for the current task. - in COMPAT_SET_PERSONALITY(), which through macro detours ends up being in SET_PERSONALITY2(), which is used in fs/compat_binfmt_elf.c's load_elf_binary(), used in a similar fashion in exec() as the AOUT case above. One particular macro detour of note is that fs/compat_binfmt_elf.c #includes fs/binfmt_elf.c and re-defines the personality setting method to map to set_personality_ia32(). When set_personality_ia32() is called then TIF_ADDR32 is set unconditionally, without any Kconfig variations. TIF_ADDR32 is cleared: - In set_personality_64bit(), when a 64-bit binary is loaded via fs/binfmt_elf.c. - It also defaults to clear in the init task, which is inherited by the initial kernel threads and any user-space task they might end up executing. So the conclusion is that IMO we can safely put TASK_SIZE into a new thread_info()->task_size field, and: - change ->task_size to the 32-bit address space in set_personality_ia32() - change ->task_size to teh 64-bit address space in the init task and in set_personality_64bit(). This should cover it I think, unless I missed something. Thanks, Ingo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86_64: uninline TASK_SIZE 2019-04-22 10:34 ` Ingo Molnar @ 2019-04-22 14:30 ` Andy Lutomirski 2019-04-22 22:09 ` Alexey Dobriyan 2019-04-22 22:04 ` Alexey Dobriyan 1 sibling, 1 reply; 14+ messages in thread From: Andy Lutomirski @ 2019-04-22 14:30 UTC (permalink / raw) To: Ingo Molnar Cc: Alexey Dobriyan, hpa, tglx, mingo, bp, linux-kernel, x86, Andy Lutomirski, Peter Zijlstra, Linus Torvalds, Al Viro > On Apr 22, 2019, at 3:34 AM, Ingo Molnar <mingo@kernel.org> wrote: > > > * Alexey Dobriyan <adobriyan@gmail.com> wrote: > >>>>> +++ b/arch/x86/kernel/task_size_64.c >>>>> @@ -0,0 +1,9 @@ >>>>> +#include <linux/export.h> >>>>> +#include <linux/sched.h> >>>>> +#include <linux/thread_info.h> >>>>> + >>>>> +unsigned long _task_size(void) >>>>> +{ >>>>> + return test_thread_flag(TIF_ADDR32) ? IA32_PAGE_OFFSET : >>>> TASK_SIZE_MAX; >>>>> +} >>>>> +EXPORT_SYMBOL(_task_size); >>>> >>>> Good idea - but instead of adding yet another compilation unit, why not >>>> >>>> stick _task_size() into arch/x86/kernel/process_64.c, which is the >>>> canonical place for process management related arch functions? >>>> >>>> Thanks, >>>> >>>> Ingo >>> >>> Better yet... since TIF_ADDR32 isn't something that changes randomly, >>> perhaps this should be a separate variable? >> >> Maybe. I only thought about putting every 32-bit related flag under >> CONFIG_COMPAT to further eradicate bloat (and force everyone else to >> keep an eye on it, ha-ha). > > Basically TIF_ADDR32 is only set for a task if set_personality_ia32() is > called, which function is called in the following circumstances: > > - arch/x86/ia32/ia32_aout.c:load_aout_binary() > > This is in exec(), when a new binary is loaded for the current task, > via search_binary_handler() and exec_binprm(). Ordering is > synchronous, AFAICS there can be no race between TASK_SIZE users and > the set_personality_ia32() call which is always for the current task. > > - in COMPAT_SET_PERSONALITY(), which through macro detours ends up being > in SET_PERSONALITY2(), which is used in fs/compat_binfmt_elf.c's > load_elf_binary(), used in a similar fashion in exec() as the AOUT > case above. One particular macro detour of note is that > fs/compat_binfmt_elf.c #includes fs/binfmt_elf.c and re-defines the > personality setting method to map to set_personality_ia32(). > > When set_personality_ia32() is called then TIF_ADDR32 is set > unconditionally, without any Kconfig variations. > > TIF_ADDR32 is cleared: > > - In set_personality_64bit(), when a 64-bit binary is loaded via > fs/binfmt_elf.c. > > - It also defaults to clear in the init task, which is inherited by the > initial kernel threads and any user-space task they might end up > executing. > > So the conclusion is that IMO we can safely put TASK_SIZE into a new > thread_info()->task_size field, and: > > - change ->task_size to the 32-bit address space in > set_personality_ia32() > > - change ->task_size to teh 64-bit address space in the init task and in > set_personality_64bit(). > > This should cover it I think, unless I missed something. > Are there really enough TASK_SIZE users to justify any of this? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86_64: uninline TASK_SIZE 2019-04-22 14:30 ` Andy Lutomirski @ 2019-04-22 22:09 ` Alexey Dobriyan 2019-04-23 0:54 ` Andy Lutomirski 0 siblings, 1 reply; 14+ messages in thread From: Alexey Dobriyan @ 2019-04-22 22:09 UTC (permalink / raw) To: Andy Lutomirski Cc: Ingo Molnar, hpa, tglx, mingo, bp, linux-kernel, x86, Andy Lutomirski, Peter Zijlstra, Linus Torvalds, Al Viro On Mon, Apr 22, 2019 at 07:30:40AM -0700, Andy Lutomirski wrote: > > > > On Apr 22, 2019, at 3:34 AM, Ingo Molnar <mingo@kernel.org> wrote: > > > > > > * Alexey Dobriyan <adobriyan@gmail.com> wrote: > > > >>>>> +++ b/arch/x86/kernel/task_size_64.c > >>>>> @@ -0,0 +1,9 @@ > >>>>> +#include <linux/export.h> > >>>>> +#include <linux/sched.h> > >>>>> +#include <linux/thread_info.h> > >>>>> + > >>>>> +unsigned long _task_size(void) > >>>>> +{ > >>>>> + return test_thread_flag(TIF_ADDR32) ? IA32_PAGE_OFFSET : > >>>> TASK_SIZE_MAX; > >>>>> +} > >>>>> +EXPORT_SYMBOL(_task_size); > >>>> > >>>> Good idea - but instead of adding yet another compilation unit, why not > >>>> > >>>> stick _task_size() into arch/x86/kernel/process_64.c, which is the > >>>> canonical place for process management related arch functions? > >>>> > >>>> Thanks, > >>>> > >>>> Ingo > >>> > >>> Better yet... since TIF_ADDR32 isn't something that changes randomly, > >>> perhaps this should be a separate variable? > >> > >> Maybe. I only thought about putting every 32-bit related flag under > >> CONFIG_COMPAT to further eradicate bloat (and force everyone else to > >> keep an eye on it, ha-ha). > > > > Basically TIF_ADDR32 is only set for a task if set_personality_ia32() is > > called, which function is called in the following circumstances: > > > > - arch/x86/ia32/ia32_aout.c:load_aout_binary() > > > > This is in exec(), when a new binary is loaded for the current task, > > via search_binary_handler() and exec_binprm(). Ordering is > > synchronous, AFAICS there can be no race between TASK_SIZE users and > > the set_personality_ia32() call which is always for the current task. > > > > - in COMPAT_SET_PERSONALITY(), which through macro detours ends up being > > in SET_PERSONALITY2(), which is used in fs/compat_binfmt_elf.c's > > load_elf_binary(), used in a similar fashion in exec() as the AOUT > > case above. One particular macro detour of note is that > > fs/compat_binfmt_elf.c #includes fs/binfmt_elf.c and re-defines the > > personality setting method to map to set_personality_ia32(). > > > > When set_personality_ia32() is called then TIF_ADDR32 is set > > unconditionally, without any Kconfig variations. > > > > TIF_ADDR32 is cleared: > > > > - In set_personality_64bit(), when a 64-bit binary is loaded via > > fs/binfmt_elf.c. > > > > - It also defaults to clear in the init task, which is inherited by the > > initial kernel threads and any user-space task they might end up > > executing. > > > > So the conclusion is that IMO we can safely put TASK_SIZE into a new > > thread_info()->task_size field, and: > > > > - change ->task_size to the 32-bit address space in > > set_personality_ia32() > > > > - change ->task_size to teh 64-bit address space in the init task and in > > set_personality_64bit(). > > > > This should cover it I think, unless I missed something. > > > > Are there really enough TASK_SIZE users to justify any of this? Saving 2KB on a defconfig is quite a lot. If put into thread_info, ->task_size can be pulled using just RAX which in turn allows to do asm volatile "call %P" ... "=a" (...) saving even more space. But it is late here so don't quote me. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86_64: uninline TASK_SIZE 2019-04-22 22:09 ` Alexey Dobriyan @ 2019-04-23 0:54 ` Andy Lutomirski 2019-04-23 11:12 ` Ingo Molnar 0 siblings, 1 reply; 14+ messages in thread From: Andy Lutomirski @ 2019-04-23 0:54 UTC (permalink / raw) To: Alexey Dobriyan Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov, LKML, X86 ML, Andy Lutomirski, Peter Zijlstra, Linus Torvalds, Al Viro On Mon, Apr 22, 2019 at 3:09 PM Alexey Dobriyan <adobriyan@gmail.com> wrote: > > On Mon, Apr 22, 2019 at 07:30:40AM -0700, Andy Lutomirski wrote: > > > > > > > On Apr 22, 2019, at 3:34 AM, Ingo Molnar <mingo@kernel.org> wrote: > > > > > > > > > * Alexey Dobriyan <adobriyan@gmail.com> wrote: > > > > > >>>>> +++ b/arch/x86/kernel/task_size_64.c > > >>>>> @@ -0,0 +1,9 @@ > > >>>>> +#include <linux/export.h> > > >>>>> +#include <linux/sched.h> > > >>>>> +#include <linux/thread_info.h> > > >>>>> + > > >>>>> +unsigned long _task_size(void) > > >>>>> +{ > > >>>>> + return test_thread_flag(TIF_ADDR32) ? IA32_PAGE_OFFSET : > > >>>> TASK_SIZE_MAX; > > >>>>> +} > > >>>>> +EXPORT_SYMBOL(_task_size); > > >>>> > > >>>> Good idea - but instead of adding yet another compilation unit, why not > > >>>> > > >>>> stick _task_size() into arch/x86/kernel/process_64.c, which is the > > >>>> canonical place for process management related arch functions? > > >>>> > > >>>> Thanks, > > >>>> > > >>>> Ingo > > >>> > > >>> Better yet... since TIF_ADDR32 isn't something that changes randomly, > > >>> perhaps this should be a separate variable? > > >> > > >> Maybe. I only thought about putting every 32-bit related flag under > > >> CONFIG_COMPAT to further eradicate bloat (and force everyone else to > > >> keep an eye on it, ha-ha). > > > > > > Basically TIF_ADDR32 is only set for a task if set_personality_ia32() is > > > called, which function is called in the following circumstances: > > > > > > - arch/x86/ia32/ia32_aout.c:load_aout_binary() > > > > > > This is in exec(), when a new binary is loaded for the current task, > > > via search_binary_handler() and exec_binprm(). Ordering is > > > synchronous, AFAICS there can be no race between TASK_SIZE users and > > > the set_personality_ia32() call which is always for the current task. > > > > > > - in COMPAT_SET_PERSONALITY(), which through macro detours ends up being > > > in SET_PERSONALITY2(), which is used in fs/compat_binfmt_elf.c's > > > load_elf_binary(), used in a similar fashion in exec() as the AOUT > > > case above. One particular macro detour of note is that > > > fs/compat_binfmt_elf.c #includes fs/binfmt_elf.c and re-defines the > > > personality setting method to map to set_personality_ia32(). > > > > > > When set_personality_ia32() is called then TIF_ADDR32 is set > > > unconditionally, without any Kconfig variations. > > > > > > TIF_ADDR32 is cleared: > > > > > > - In set_personality_64bit(), when a 64-bit binary is loaded via > > > fs/binfmt_elf.c. > > > > > > - It also defaults to clear in the init task, which is inherited by the > > > initial kernel threads and any user-space task they might end up > > > executing. > > > > > > So the conclusion is that IMO we can safely put TASK_SIZE into a new > > > thread_info()->task_size field, and: > > > > > > - change ->task_size to the 32-bit address space in > > > set_personality_ia32() > > > > > > - change ->task_size to teh 64-bit address space in the init task and in > > > set_personality_64bit(). > > > > > > This should cover it I think, unless I missed something. > > > > > > > Are there really enough TASK_SIZE users to justify any of this? > > Saving 2KB on a defconfig is quite a lot. Saving 2kB of text by adding 8 bytes to thread_info seems rather dubious to me. You only need 256 tasks before you lose. My not-particularly-loaded laptop has 865 tasks right now. As a general principle, the mere existence of TIF_ADDR32 is a bug. The value of that flag is *wrong* under the 32-bit variant of CRIU. How about instead making some more progress toward getting rid of dubious TASK_SIZE users? I'm working on a little series to get rid of most of them. Meanwhile: it sure looks like a large fraction of the users are confused as to whether TASK_SIZE is the highest user address or the lowest non-user address. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86_64: uninline TASK_SIZE 2019-04-23 0:54 ` Andy Lutomirski @ 2019-04-23 11:12 ` Ingo Molnar 2019-04-23 17:21 ` Andy Lutomirski 0 siblings, 1 reply; 14+ messages in thread From: Ingo Molnar @ 2019-04-23 11:12 UTC (permalink / raw) To: Andy Lutomirski Cc: Alexey Dobriyan, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov, LKML, X86 ML, Peter Zijlstra, Linus Torvalds, Al Viro * Andy Lutomirski <luto@kernel.org> wrote: > > Saving 2KB on a defconfig is quite a lot. > > Saving 2kB of text by adding 8 bytes to thread_info seems rather > dubious to me. You only need 256 tasks before you lose. My > not-particularly-loaded laptop has 865 tasks right now. I was suggesting current->task_size or thread_info->task_size as a way to 100% avoid the function call overhead. Worth a tiny amount of RAM - even with 1 million tasks it's only 4MB of RAM. ;-) Some TASK_SIZE users are prominent syscalls: mmap(), > As a general principle, the mere existence of TIF_ADDR32 is a bug. The > value of that flag is *wrong* under the 32-bit variant of CRIU. How > about instead making some more progress toward getting rid of dubious > TASK_SIZE users? I'm working on a little series to get rid of most of > them. Meanwhile: it sure looks like a large fraction of the users are > confused as to whether TASK_SIZE is the highest user address or the > lowest non-user address. I really like that, replacing TASK_SIZE with *nothing* would be even faster. In fact instead of just reducing its usage I'd suggest removing TASK_SIZE altogether and renaming TASK_SIZE_MAX back to TASK_SIZE, or something like that - the confusion from the deceptively macro-ish naming of TASK_SIZE is real. The original commit of making TASK_SIZE dynamic on the task's compat flag was done in: 84929801e14d: [PATCH] x86_64: TASK_SIZE fixes for compatibility mode processes Here's the justification given: Appended patch will setup compatibility mode TASK_SIZE properly. This will fix atleast three known bugs that can be encountered while running compatibility mode apps. a) A malicious 32bit app can have an elf section at 0xffffe000. During exec of this app, we will have a memory leak as insert_vm_struct() is not checking for return value in syscall32_setup_pages() and thus not freeing the vma allocated for the vsyscall page. And instead of exec failing (as it has addresses > TASK_SIZE), we were allowing it to succeed previously. b) With a 32bit app, hugetlb_get_unmapped_area/arch_get_unmapped_area may return addresses beyond 32bits, ultimately causing corruption because of wrap-around and resulting in SEGFAULT, instead of returning ENOMEM. c) 32bit app doing this below mmap will now fail. mmap((void *)(0xFFFFE000UL), 0x10000UL, PROT_READ|PROT_WRITE, MAP_FIXED|MAP_PRIVATE|MAP_ANON, 0, 0); I believe a) is addressed by getting rid of the vsyscall page - but it might also not be a current problem because the vsyscall page has its own gate-vma now. b) shouldn't be an issue if the mmap allocator correctly treats the compat bit - this doesn't require generic TASK_SIZE variations though, as the mmap allocator is already specific to arch/x86/. c) is a variant of a) I believe, which should be fixed by now. I just looked through some of the current TASK_SIZE users, and *ALL* of them seem dubious to me, with the exception of the mmap allocators. In fact some of them seem to be active bugs: kernel/: - PR_SET_MM_MAP, PR_SET_MM_MAP_SIZE, prctl_set_mm() et al. Ugh, what a nasty prctl()! But besides that, the TASK_SIZE restriction to the ABI looks questionable: if we offer this CRIU functionality then why should it be impossible for a 32-bit CRIU task to switch to 64-bit? - kernel/events/core.c: TASK_SIZE_MAX should be a fine filter here, in fact it's probably *wrong* to restrict the profiling data here just because the task happens to be in 32-bit compat mode. - kernel/rseq.c: is this TASK_SIZE restriction even required, wouldn't TASK_SIZE_MAX be sufficient? mm/: - GUP's get_get_area() et al looks really weird - why do we special-case vsyscalls: - Can we get rid of the vsyscall page in modern kernels? - I don't think anyone runs those ancient glibc versions with a fresh kernel anymore - can we start generating a WARN()ing perhaps to see whether there's any complaints? - Or at least pretend it doesn't exist in terms of a GUP target page? - mm/kasan/generic_report.c:get_wild_bug_type() - this can use TASK_SIZE_MAX just fine IMHO. - mm/mempolicy.c:mpol_shared_policy_init() - unsure, but I suspect we can just create the pseudo-vma with a TASK_SIZE_MAX vm_end just fine. - mm/mlock.c:mlockall() - I believe it could be considered an outright *bug* if there any pages outside the 32-bit area and don't get mlocked by mlockall, just because this is a compat task. Especially with the CRIU prctl() having 64-bit vmas outside the 32-bit mappings is a real possibility, right? I.e. TASK_SIZE_MAX would be the right solution here. To turn the argument around: beyond the memory allocators, which includes the mmap and huge-mmap variants plus the SysV shmem allocator, can we list all the places that absolutely *rely* on TASK_SIZE being TIF_ADDR32 restricted on compat tasks? I couldn't find any. So I concur 100% that most TASK_SIZE uses are questionable. In fact think 84929801e14d was a mistake, and we should effectively revert it carefully, by: - First by moving almost all TASK_SIZE users over to TASK_SIZE_MAX, analyzing and justifying the impact case by case. - Then making the mmap allocators compat compatible (ha) without relying on TASK_SIZE. - Renaming TASK_SIZE back to TASK_SIZE_MAX and getting rid of the TASK_SIZE and TASK_SIZE_MAX differentiation. Or am I missing some complication? Thanks, Ingo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86_64: uninline TASK_SIZE 2019-04-23 11:12 ` Ingo Molnar @ 2019-04-23 17:21 ` Andy Lutomirski 2019-04-24 10:38 ` Ingo Molnar 0 siblings, 1 reply; 14+ messages in thread From: Andy Lutomirski @ 2019-04-23 17:21 UTC (permalink / raw) To: Ingo Molnar Cc: Andy Lutomirski, Alexey Dobriyan, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov, LKML, X86 ML, Peter Zijlstra, Linus Torvalds, Al Viro On Tue, Apr 23, 2019 at 4:13 AM Ingo Molnar <mingo@kernel.org> wrote: > > > * Andy Lutomirski <luto@kernel.org> wrote: > > > > Saving 2KB on a defconfig is quite a lot. > > > > Saving 2kB of text by adding 8 bytes to thread_info seems rather > > dubious to me. You only need 256 tasks before you lose. My > > not-particularly-loaded laptop has 865 tasks right now. > > I was suggesting current->task_size or thread_info->task_size as a way to > 100% avoid the function call overhead. Worth a tiny amount of RAM - even > with 1 million tasks it's only 4MB of RAM. ;-) > > Some TASK_SIZE users are prominent syscalls: mmap(), > > > As a general principle, the mere existence of TIF_ADDR32 is a bug. The > > value of that flag is *wrong* under the 32-bit variant of CRIU. How > > about instead making some more progress toward getting rid of dubious > > TASK_SIZE users? I'm working on a little series to get rid of most of > > them. Meanwhile: it sure looks like a large fraction of the users are > > confused as to whether TASK_SIZE is the highest user address or the > > lowest non-user address. > > I really like that, replacing TASK_SIZE with *nothing* would be even > faster. > > In fact instead of just reducing its usage I'd suggest removing TASK_SIZE > altogether and renaming TASK_SIZE_MAX back to TASK_SIZE, or something > like that - the confusion from the deceptively macro-ish naming of > TASK_SIZE is real. > > The original commit of making TASK_SIZE dynamic on the task's compat flag > was done in: > > 84929801e14d: [PATCH] x86_64: TASK_SIZE fixes for compatibility mode processes > > Here's the justification given: > > Appended patch will setup compatibility mode TASK_SIZE properly. This will > fix atleast three known bugs that can be encountered while running > compatibility mode apps. > > a) A malicious 32bit app can have an elf section at 0xffffe000. During > exec of this app, we will have a memory leak as insert_vm_struct() is > not checking for return value in syscall32_setup_pages() and thus not > freeing the vma allocated for the vsyscall page. And instead of exec > failing (as it has addresses > TASK_SIZE), we were allowing it to > succeed previously. > > b) With a 32bit app, hugetlb_get_unmapped_area/arch_get_unmapped_area > may return addresses beyond 32bits, ultimately causing corruption > because of wrap-around and resulting in SEGFAULT, instead of returning > ENOMEM. > > c) 32bit app doing this below mmap will now fail. > > mmap((void *)(0xFFFFE000UL), 0x10000UL, PROT_READ|PROT_WRITE, > MAP_FIXED|MAP_PRIVATE|MAP_ANON, 0, 0); > > I believe a) is addressed by getting rid of the vsyscall page - but it > might also not be a current problem because the vsyscall page has its own > gate-vma now. > I suspect that whatever issue this is predates my involvement in Linux :) The "gate" VMA, aka vsyscall, is at 0xffffffffff600000, and the vDSO setup code shouldn't anywhere near that fragile. Also, if this really a bug, we have it for the 64-bit case, too, and TASK_SIZE isn't going to help. > b) shouldn't be an issue if the mmap allocator correctly treats the > compat bit - this doesn't require generic TASK_SIZE variations though, as > the mmap allocator is already specific to arch/x86/. This was mostly fixed by the CRIU folks a while back, I think. > > c) is a variant of a) I believe, which should be fixed by now. > > I just looked through some of the current TASK_SIZE users, and *ALL* of > them seem dubious to me, with the exception of the mmap allocators. In Even the munmap one seems to be a bug. > fact some of them seem to be active bugs: > > kernel/: > > - PR_SET_MM_MAP, PR_SET_MM_MAP_SIZE, prctl_set_mm() et al. Ugh, what a > nasty prctl()! But besides that, the TASK_SIZE restriction to the ABI > looks questionable: if we offer this CRIU functionality then why > should it be impossible for a 32-bit CRIU task to switch to 64-bit? > > - kernel/events/core.c: TASK_SIZE_MAX should be a fine filter here, in > fact it's probably *wrong* to restrict the profiling data here just > because the task happens to be in 32-bit compat mode. Yep. I have a patch fo rthis. > > - kernel/rseq.c: is this TASK_SIZE restriction even required, wouldn't > TASK_SIZE_MAX be sufficient? Presumably. > So I concur 100% that most TASK_SIZE uses are questionable. In fact think > 84929801e14d was a mistake, and we should effectively revert it > carefully, by: > > - First by moving almost all TASK_SIZE users over to TASK_SIZE_MAX, > analyzing and justifying the impact case by case. > > - Then making the mmap allocators compat compatible (ha) without relying > on TASK_SIZE. I have a somewhat hacky patch for this right now. > > - Renaming TASK_SIZE back to TASK_SIZE_MAX and getting rid of the > TASK_SIZE and TASK_SIZE_MAX differentiation. > > Or am I missing some complication? Seems like a great idea to me. BTW, what the heck is up with get_gate_page()? I'm struggling to understand what it's even trying to do. If there's an architecture that allows a user program to mremap() or otherwise position its gate VMA between TASK_SIZE and TASK_SIZE_MAX, then that code is going to explode horribly. A whole bunch of work in this direction is here: https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/fixes It's almost entirely untested. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86_64: uninline TASK_SIZE 2019-04-23 17:21 ` Andy Lutomirski @ 2019-04-24 10:38 ` Ingo Molnar 0 siblings, 0 replies; 14+ messages in thread From: Ingo Molnar @ 2019-04-24 10:38 UTC (permalink / raw) To: Andy Lutomirski Cc: Alexey Dobriyan, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov, LKML, X86 ML, Peter Zijlstra, Linus Torvalds, Al Viro * Andy Lutomirski <luto@kernel.org> wrote: > > Or am I missing some complication? > > Seems like a great idea to me. > > BTW, what the heck is up with get_gate_page()? I'm struggling to > understand what it's even trying to do. If there's an architecture > that allows a user program to mremap() or otherwise position its gate > VMA between TASK_SIZE and TASK_SIZE_MAX, then that code is going to > explode horribly. I believe it was an old attempt from the times when the vsyscall area *didn't* have a vma, at all, and only get_gate_page() kept the mmap allocator from overlapping it with a user vma? Should IMHO be entirely solved by the vma-ification of all things vsyscall and vdso, and we can remove this remnant. > A whole bunch of work in this direction is here: > > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/fixes > > It's almost entirely untested. Please post it as patches once you are somewhat confident in the outcome and general direction. Thanks, Ingo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86_64: uninline TASK_SIZE 2019-04-22 10:34 ` Ingo Molnar 2019-04-22 14:30 ` Andy Lutomirski @ 2019-04-22 22:04 ` Alexey Dobriyan 1 sibling, 0 replies; 14+ messages in thread From: Alexey Dobriyan @ 2019-04-22 22:04 UTC (permalink / raw) To: Ingo Molnar Cc: hpa, tglx, mingo, bp, linux-kernel, x86, Andy Lutomirski, Peter Zijlstra, Linus Torvalds, Al Viro On Mon, Apr 22, 2019 at 12:34:49PM +0200, Ingo Molnar wrote: > When set_personality_ia32() is called then TIF_ADDR32 is set > unconditionally, without any Kconfig variations. Indeed. personality(PER_LINUX32) = 0 (PER_LINUX) I only wasted about half an evening ifdefing TIF_ flags. Thanks for saving a lot of time! ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2] x86_64: uninline TASK_SIZE 2019-04-21 18:28 ` Ingo Molnar 2019-04-21 20:07 ` hpa @ 2019-04-21 22:07 ` Alexey Dobriyan 1 sibling, 0 replies; 14+ messages in thread From: Alexey Dobriyan @ 2019-04-21 22:07 UTC (permalink / raw) To: Ingo Molnar; +Cc: tglx, mingo, bp, hpa, linux-kernel, x86 TASK_SIZE macro is quite deceptive: it looks like a constant but in fact compiles to 50+ bytes. Space savings on x86_64 defconfig: add/remove: 1/0 grow/shrink: 3/24 up/down: 77/-2247 (-2170) Function old new delta _task_size - 52 +52 mpol_shared_policy_init 344 363 +19 shmem_get_unmapped_area 92 97 +5 __rseq_handle_notify_resume.cold 34 35 +1 copy_from_user_nmi 123 113 -10 mmap_address_hint_valid 92 56 -36 arch_get_unmapped_area_topdown 471 435 -36 tlb_gather_mmu 164 126 -38 hugetlb_get_unmapped_area 774 736 -38 __create_xol_area 497 458 -39 arch_tlb_gather_mmu 160 120 -40 setup_new_exec 380 336 -44 __x64_sys_mlockall 378 333 -45 __ia32_sys_mlockall 378 333 -45 tlb_flush_mmu 235 189 -46 unmap_page_range 2098 2048 -50 copy_mount_options 518 465 -53 __get_user_pages 1737 1675 -62 get_unmapped_area 270 204 -66 perf_prepare_sample 1176 1098 -78 perf_callchain_user 549 469 -80 mremap_to.isra 545 457 -88 arch_tlb_finish_mmu 394 305 -89 __do_munmap 1039 927 -112 elf_map 527 409 -118 prctl_set_mm 1509 1335 -174 __rseq_handle_notify_resume 1116 906 -210 load_elf_binary 11761 11111 -650 <=== Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> --- arch/x86/include/asm/processor.h | 4 ++-- arch/x86/kernel/process_64.c | 6 ++++++ 2 files changed, 8 insertions(+), 2 deletions(-) --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -887,8 +887,8 @@ static inline void spin_lock_prefetch(const void *x) #define TASK_SIZE_LOW (test_thread_flag(TIF_ADDR32) ? \ IA32_PAGE_OFFSET : DEFAULT_MAP_WINDOW) -#define TASK_SIZE (test_thread_flag(TIF_ADDR32) ? \ - IA32_PAGE_OFFSET : TASK_SIZE_MAX) +unsigned long _task_size(void); +#define TASK_SIZE _task_size() #define TASK_SIZE_OF(child) ((test_tsk_thread_flag(child, TIF_ADDR32)) ? \ IA32_PAGE_OFFSET : TASK_SIZE_MAX) --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -827,3 +827,9 @@ unsigned long KSTK_ESP(struct task_struct *task) { return task_pt_regs(task)->sp; } + +unsigned long _task_size(void) +{ + return test_thread_flag(TIF_ADDR32) ? IA32_PAGE_OFFSET : TASK_SIZE_MAX; +} +EXPORT_SYMBOL(_task_size); ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86_64: uninline TASK_SIZE 2019-04-21 16:06 [PATCH] x86_64: uninline TASK_SIZE Alexey Dobriyan 2019-04-21 18:28 ` Ingo Molnar @ 2019-04-22 22:40 ` Linus Torvalds 1 sibling, 0 replies; 14+ messages in thread From: Linus Torvalds @ 2019-04-22 22:40 UTC (permalink / raw) To: Alexey Dobriyan Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Anvin, Linux List Kernel Mailing, the arch/x86 maintainers On Sun, Apr 21, 2019 at 9:06 AM Alexey Dobriyan <adobriyan@gmail.com> wrote: > > TASK_SIZE macro is quite deceptive: it looks like a constant but in fact > compiles to 50+ bytes. Honestly, if you are interested in improving TASK_SIZE, I'd really like to see you try to go even further than this. TASK_SIZE _used_ to just be a fixed constant, which is why it has that name and why the usage patterns are what they are. But since that isn't true any more, I'd much rather fix the _name_, and I'd much rather fix the nasty complex hidden behavior, rather than just keep the name and keep the behavior, but turning it from an inline macro to a function call. And as Ingo points out, we should be able to just make it a field of its own, instead of that complex dance of TIF_ADDR32 etc. However, I think it would be better if that field would be in "struct mm_struct" instead of Ingo's suggestion of the thread. Because while it's currently a per-thread flag, I think it is only set by execve(), so it always ends up being the same per-mm. No? Also, we could/should just make the existing *users* of TASK_SIZE know that it's no longer a simple constant, so all those functions that use it many times could just do unsigned long task_size = TASK_SIZE; rather than re-compute it multiple times like they do now. In fact, making it a function call in many ways makes things *worse*, although maybe we could at least mark the function "pure" so that gcc would be able to cache the end result. But that would actually be wrong for the sequences that maybe do change the thread flags, so I hate that idea too. Much better to just cache it explicitly in the cases where we see that it's currently generating bad code. Linus ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2019-04-24 10:38 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-04-21 16:06 [PATCH] x86_64: uninline TASK_SIZE Alexey Dobriyan 2019-04-21 18:28 ` Ingo Molnar 2019-04-21 20:07 ` hpa 2019-04-21 21:10 ` Alexey Dobriyan 2019-04-22 10:34 ` Ingo Molnar 2019-04-22 14:30 ` Andy Lutomirski 2019-04-22 22:09 ` Alexey Dobriyan 2019-04-23 0:54 ` Andy Lutomirski 2019-04-23 11:12 ` Ingo Molnar 2019-04-23 17:21 ` Andy Lutomirski 2019-04-24 10:38 ` Ingo Molnar 2019-04-22 22:04 ` Alexey Dobriyan 2019-04-21 22:07 ` [PATCH v2] " Alexey Dobriyan 2019-04-22 22:40 ` [PATCH] " Linus Torvalds
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).