From: Michael Ellerman
To: Segher Boessenkool, Nathan Chancellor
Cc: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	clang-built-linux@googlegroups.com
Subject: Re: [PATCH v2] powerpc: slightly improve cache helpers
Date: Mon, 22 Jul 2019 20:15:14 +1000
Message-ID: <87imru74ul.fsf@concordia.ellerman.id.au>
In-Reply-To: <20190721180150.GN20882@gate.crashing.org>
References: <45hnfp6SlLz9sP0@ozlabs.org>
	<20190708191416.GA21442@archlinux-threadripper>
	<20190709064952.GA40851@archlinux-threadripper>
	<20190719032456.GA14108@archlinux-threadripper>
	<20190719152303.GA20882@gate.crashing.org>
	<20190719160455.GA12420@archlinux-threadripper>
	<20190721075846.GA97701@archlinux-threadripper>
	<20190721180150.GN20882@gate.crashing.org>

Segher Boessenkool writes:
> On Sun, Jul 21, 2019 at 12:58:46AM -0700, Nathan Chancellor wrote:
>> I have attached the disassembly of arch/powerpc/kernel/mem.o with
>> clear_page (working) and broken_clear_page (broken), along with the side
>> by side diff. My assembly knowledge is fairly limited as it stands and
>> it is certainly not up to snuff on PowerPC so I have no idea what I am
>> looking for. Please let me know if anything immediately looks off or if
>> there is anything else I can do to help out.
>
> You might want to use a disassembler that shows most simplified mnemonics,
> and you crucially should show the relocations. "objdump -dr" works nicely.
>
>> 0000017c clear_user_page:
>>      17c: 38 80 00 80   li 4, 128
>>      180: 7c 89 03 a6   mtctr 4
>>      184: 7c 00 1f ec   dcbz 0, 3
>>      188: 38 63 00 20   addi 3, 3, 32
>>      18c: 42 00 ff f8   bdnz .+65528
>
> That offset is incorrectly disassembled, btw (it's a signed field, not
> unsigned).
>
>> 0000017c clear_user_page:
>>      17c: 94 21 ff f0   stwu 1, -16(1)
>>      180: 38 80 00 80   li 4, 128
>>      184: 38 63 ff e0   addi 3, 3, -32
>>      188: 7c 89 03 a6   mtctr 4
>>      18c: 38 81 00 0f   addi 4, 1, 15
>>      190: 8c c3 00 20   lbzu 6, 32(3)
>>      194: 98 c1 00 0f   stb 6, 15(1)
>>      198: 7c 00 27 ec   dcbz 0, 4
>>      19c: 42 00 ff f4   bdnz .+65524
>
> Uh, yeah, well, I have no idea what clang tried here, but that won't
> work. It's copying a byte from each target cache line to the stack,
> and then clears the cache line containing that byte on the stack.

So it seems like this is a clang bug.

None of the distros we support use clang, but we would still like to
keep it working if we can.
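As an aside on the mis-disassembled bdnz offsets quoted above: the
displacement in a B-form branch is a signed field, so it can be decoded by
hand from the instruction word. The following is only a minimal sketch of
that arithmetic. The helper names are made up for illustration and do not
exist in the kernel or in any disassembler, and branch_target() assumes a
relative branch (AA = 0) on a 32-bit address space.

```c
#include <stdint.h>

/*
 * Hypothetical helper: byte displacement of a PowerPC B-form branch
 * (bc, bdnz, ...). The 14-bit BD field sits just above the AA and LK
 * bits; BD with two zero bits appended is a signed 16-bit byte offset.
 */
static int32_t bform_displacement(uint32_t insn)
{
	/* Keep BD || 0b00, i.e. the low 16 bits with AA and LK masked off. */
	uint32_t raw = insn & 0xfffcu;

	/* Sign-extend from 16 bits without relying on narrowing casts. */
	return (raw & 0x8000u) ? (int32_t)raw - 0x10000 : (int32_t)raw;
}

/*
 * Hypothetical helper: target of a relative B-form branch located at
 * addr. Unsigned wrap-around gives the right result for negative
 * displacements.
 */
static uint32_t branch_target(uint32_t addr, uint32_t insn)
{
	return addr + (uint32_t)bform_displacement(insn);
}
```

With the words from the two listings, bform_displacement(0x4200fff8)
gives -8 and bform_displacement(0x4200fff4) gives -12, so the branches at
0x18c and 0x19c go back to 0x184 (the dcbz) and 0x190 (the lbzu)
respectively, rather than forward by 65528 or 65524 bytes.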

Looking at the original patch, the only upside is that the compiler can
use both RA and RB to compute the address, rather than us forcing RA to
0.

But at least with my compiler here (GCC 8 vintage) I don't actually see
GCC ever using both GPRs even with the patch. Or at least, there's no
difference before/after the patch as far as I can see.

So my inclination is to revert the original patch. We can try again in a
few years :D

Thoughts?

cheers