From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752559AbdCNTIi (ORCPT ); Tue, 14 Mar 2017 15:08:38 -0400 Received: from mail-yw0-f193.google.com ([209.85.161.193]:34879 "EHLO mail-yw0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751226AbdCNTId (ORCPT ); Tue, 14 Mar 2017 15:08:33 -0400 MIME-Version: 1.0 In-Reply-To: <30f2ec3e-d0c8-8dd2-837f-3380237d843c@zytor.com> References: <20161108183956.4521-1-khuey@kylehuey.com> <20161108183956.4521-7-khuey@kylehuey.com> <30f2ec3e-d0c8-8dd2-837f-3380237d843c@zytor.com> From: Kyle Huey Date: Tue, 14 Mar 2017 12:08:31 -0700 Message-ID: Subject: Re: [PATCH v10 6/7] x86/arch_prctl: Add ARCH_[GET|SET]_CPUID To: "H. Peter Anvin" Cc: "Robert O'Callahan" , Thomas Gleixner , Andy Lutomirski , Ingo Molnar , "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" , Paolo Bonzini , =?UTF-8?B?UmFkaW0gS3LEjW3DocWZ?= , Jeff Dike , Richard Weinberger , Alexander Viro , Shuah Khan , Dave Hansen , Borislav Petkov , Peter Zijlstra , Boris Ostrovsky , Len Brown , "Rafael J. Wysocki" , Dmitry Safonov , David Matlack , open list , "open list:USER-MODE LINUX (UML)" , "open list:USER-MODE LINUX (UML)" , "open list:FILESYSTEMS (VFS and infrastructure)" , "open list:KERNEL SELFTEST FRAMEWORK" , kvm list Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Mar 14, 2017 at 12:01 PM, H. Peter Anvin wrote: > On 11/08/16 10:39, Kyle Huey wrote: >> } >> >> + if (test_tsk_thread_flag(prev_p, TIF_NOCPUID) ^ >> + test_tsk_thread_flag(next_p, TIF_NOCPUID)) { >> + set_cpuid_faulting(test_tsk_thread_flag(next_p, TIF_NOCPUID)); >> + } >> + >> if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^ >> test_tsk_thread_flag(next_p, TIF_NOTSC)) { >> /* prev and next are different */ >> if (test_tsk_thread_flag(next_p, TIF_NOTSC)) >> hard_disable_TSC(); >> else >> hard_enable_TSC(); >> } > > I'm unhappy about this part: we already do two XORs on these after bit > extraction, which is quite inefficient; and at least theoretically we > could be indirecting though the ->stack pointer for every one if gcc > can't tell it won't have changed (we really need to get thread_info > moved into the task_struct allocation and away from the kernel stack, > especially since on x86 the pointer is the same size as the vestigial > structure it points to.) > > It would be so much saner to do one xor and then go onto a common slow path: > > struct thread_info *prev_ti = task_thread_info(prev_p); > struct thread_info *next_ti = task_thread_info(next_p); > > tif_flipped = prev_ti->flags ^ next_ti->flags; > > if (unlikely(tif_flipped & > (_TIF_BLOCKSTEP | _TIF_NOTSC | _TIF_NOCPUID))) { > if (tif_flipped & _TIF_BLOCKSTEP) { > ... > } > if (tif_flipped & _TIF_NOTSC) { > ... > } > if (tif_flipped & _TIF_NOCPUID) { > ... > } > } > > Then we can also replace test_tsk_thread_flag() with > test_ti_thread_flag() in other places in this function. That's largely what we ended up doing. See https://lkml.org/lkml/2017/2/14/80 and the latest version of this patch, https://lkml.org/lkml/2017/3/11/197. - Kyle