From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1759438AbbA2Xkg (ORCPT ); Thu, 29 Jan 2015 18:40:36 -0500
Received: from mail-ie0-f175.google.com ([209.85.223.175]:36504 "EHLO
        mail-ie0-f175.google.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1752562AbbA2Xke (ORCPT );
        Thu, 29 Jan 2015 18:40:34 -0500
MIME-Version: 1.0
In-Reply-To: <1422504685-7864-1-git-send-email-airlied@redhat.com>
References: <1422504685-7864-1-git-send-email-airlied@redhat.com>
Date: Thu, 29 Jan 2015 15:40:33 -0800
X-Google-Sender-Auth: XGh4YfxW0KrGNovFWljASuDvRWw
Message-ID:
Subject: Re: [PATCH] vt_buffer: drop console buffer copying optimisations
From: Linus Torvalds
To: Dave Airlie
Cc: Linux Kernel Mailing List, dri-devel@lists.sf.net,
        Greg Kroah-Hartman, Tomi Valkeinen
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 28, 2015 at 8:11 PM, Dave Airlie wrote:
>
> Linus, this came up a while back I finally got some confirmation
> that it fixes those servers.

I'm certainly ok with this. which way should it go in? The users are:

 - drivers/tty/vt/vt.c (Greg KH, "tty layer")

 - drivers/video/console/* (fbcon people: Tomi Valkeinen and friends)

and it might make sense to have *some* indication of how much worse
this makes fbcon performance in particular..

Greg/Tomi - the patch is removing this:

    #define scr_memcpyw(d, s, c) memcpy(d, s, c)
    #define scr_memmovew(d, s, c) memmove(d, s, c)
    #define VT_BUF_HAVE_MEMCPYW
    #define VT_BUF_HAVE_MEMMOVEW

from , because some stupid graphics cards apparently cannot handle
64-bit accesses of regular memcpy/memmove.

And on other setups, this will be the reverse: 8-bit accesses due to
using "rep movsb", which is the fast way to move/clear memory on
modern Intel CPU's, but is really wrong for MMIO where it will be slow
as hell.

So just getting rid of the memcpy/memmove is likely the right thing in
general, since the fallbacks go the traditional 16-bit-at-a-time way.
And getting rid of the memcpy _may_ speed things up.

But if it slows things down, we might have to try something else. Like
saying "all cards we've ever seen have been ok with aligned 32-bit
accesses", and extend the open-coded scr_memcpyw/scr_memmovew functions
to do that.

Hmm?

                  Linus
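
[Editorial illustration, not part of the original message.] The two
copy strategies discussed above can be sketched in user-space C. This
is a minimal sketch under stated assumptions, not the kernel's code:
copy16() and copy32_aligned() are hypothetical names, plain volatile
pointers stand in for whatever accessors the real console code uses,
and the byte count convention is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy `count` bytes one 16-bit cell at a time. The volatile
 * qualifier keeps the compiler from merging the accesses into wider
 * (or narrower) ones, which is the whole point for MMIO.
 */
static void copy16(volatile uint16_t *d, const volatile uint16_t *s,
                   size_t count)
{
    count /= 2;                 /* bytes -> 16-bit cells */
    while (count--)
        *d++ = *s++;
}

/*
 * Hypothetical variant: use aligned 32-bit accesses where possible,
 * falling back to 16-bit accesses for the unaligned head and tail.
 * Assumes both pointers are at least 2-byte aligned and that the
 * regions do not overlap. The pointer casts assume a build without
 * strict-aliasing optimisations (e.g. -fno-strict-aliasing).
 */
static void copy32_aligned(volatile uint16_t *d,
                           const volatile uint16_t *s, size_t count)
{
    volatile uint32_t *d32;
    const volatile uint32_t *s32;

    /* If source and destination differ in alignment mod 4, no
     * 32-bit access can be aligned on both sides. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & 3) != 0) {
        copy16(d, s, count);
        return;
    }

    count /= 2;                 /* bytes -> 16-bit cells */

    /* One leading 16-bit cell to reach 4-byte alignment. */
    if (((uintptr_t)d & 2) && count > 0) {
        *d++ = *s++;
        count--;
    }

    d32 = (volatile uint32_t *)d;
    s32 = (const volatile uint32_t *)s;
    while (count >= 2) {        /* two cells per 32-bit access */
        *d32++ = *s32++;
        count -= 2;
    }

    /* At most one trailing 16-bit cell remains. */
    if (count) {
        d = (volatile uint16_t *)d32;
        s = (const volatile uint16_t *)s32;
        *d = *s;
    }
}
```

A memmove-style variant would additionally need to copy backwards when
the destination overlaps the source from above; that is left out here
to keep the sketch short.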