From: Luca Barbieri
To: "H. Peter Anvin"
Cc: Andi Kleen, mingo@elte.hu, a.p.zijlstra@chello.nl, akpm@linux-foundation.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 09/10] x86-32: use SSE for atomic64_read/set if available
Date: Thu, 18 Feb 2010 19:42:11 +0100
In-Reply-To: <4B7D86BC.10000@zytor.com>

> We already do that kind of stuff, using
> kernel_fpu_begin()..kernel_fpu_end().  We went through some pain a bit
> ago to clean up "private hacks" that complicated things substantially.

But that saves the whole FPU state on the first use, and also triggers
a fault when userspace attempts to use the FPU again.
Additionally it does a clts/stts every time, which is slow for small
algorithms (like the atomic64 routines). The first issue can be solved
by using SSE and saving only the registers actually used, and the
second with lazy TS flag restoring.

How about something like:

static inline unsigned long kernel_sse_begin(void)
{
	struct thread_info *me = current_thread_info();

	preempt_disable();
	if (unlikely(!(me->status & TS_USEDFPU))) {
		unsigned long cr0 = read_cr0();
		if (unlikely(cr0 & X86_CR0_TS)) {
			clts();
			return cr0;
		}
	}
	return 0;
}

static inline void kernel_sse_end(unsigned long cr0)
{
	if (unlikely(cr0))
		write_cr0(cr0);
	preempt_enable();
}

to be improved with lazy TS restoring instead of the read_cr0/write_cr0 pair?