From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20210721135926.602840-1-nborisov@suse.com>
From: Linus Torvalds
Date: Wed, 21 Jul 2021 11:45:36 -0700
Subject: Re: [PATCH] lib/string: Bring optimized memcmp from glibc
To: Nikolay Borisov
Cc: Linux Kernel Mailing List, Nick Desaulniers, linux-fsdevel, Dave Chinner
Content-Type: multipart/mixed; boundary="000000000000741c3905c7a694cd"
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

--000000000000741c3905c7a694cd
Content-Type: text/plain; charset="UTF-8"

On Wed, Jul 21, 2021 at 11:17 AM Nikolay Borisov wrote:
>
> I find it somewhat arbitrary that we choose to align the 2nd pointer and
> not the first.

Yeah, that's a bit odd, but I don't think it matters.
The hope is obviously that they are mutually aligned, and in that case
it doesn't matter which one you aim to align.

> So you are saying that the current memcmp could indeed use improvement
> but you don't want it to be based on the glibc's code due to the ugly
> misalignment handling?

Yeah. I suspect that this (very simple) patch gives you the same
performance improvement that the glibc code does.

NOTE! I'm not saying this patch is perfect. This one doesn't even
_try_ to do the mutual alignment, because it's really silly. But I'm
throwing this out here for discussion, because

 - it's really simple

 - I suspect it gets you 99% of the way there

 - the code generation is actually quite good with both gcc and clang.
   This is gcc:

        memcmp:
                jmp     .L60
        .L52:
                movq    (%rsi), %rax
                cmpq    %rax, (%rdi)
                jne     .L53
                addq    $8, %rdi
                addq    $8, %rsi
                subq    $8, %rdx
        .L60:
                cmpq    $7, %rdx
                ja      .L52
                testq   %rdx, %rdx
                je      .L61
        .L53:
                xorl    %ecx, %ecx
                jmp     .L56
        .L62:
                addq    $1, %rcx
                cmpq    %rcx, %rdx
                je      .L51
        .L56:
                movzbl  (%rdi,%rcx), %eax
                movzbl  (%rsi,%rcx), %r8d
                subl    %r8d, %eax
                je      .L62
        .L51:
                ret
        .L61:
                xorl    %eax, %eax
                ret

and notice how there are no spills, no extra garbage, just simple and
straightforward code.

Those things end up mattering too - it's good for I$, it's good for the
small cases, and it's good for debugging and reading the code.

If this is "good enough" for your test-case, I really would prefer
something like this. "Make it as simple as possible, but no simpler"

I can do the mutual alignment too, but I'd actually prefer to do it as
a separate patch, for when there are numbers for that.

And I wouldn't do it as a byte-by-byte case, because that's just stupid.
I'd do it using a separate first single "get unaligned word from both
sources, compare them for equality, and then only add enough bytes to
align"

              Linus

--000000000000741c3905c7a694cd
Content-Type: text/x-patch; charset="US-ASCII"; name="patch.diff"
Content-Disposition: attachment; filename="patch.diff"
Content-Transfer-Encoding: 7bit
X-Attachment-Id: f_krdu1hal0

 lib/string.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/string.c b/lib/string.c
index 77bd0b1d3296..b2de45a581f4 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -29,6 +29,7 @@
 #include <linux/errno.h>
 #include <linux/slab.h>
 
+#include <asm/unaligned.h>
 #include <asm/byteorder.h>
 #include <asm/word-at-a-time.h>
 #include <asm/page.h>
@@ -935,6 +936,21 @@ __visible int memcmp(const void *cs, const void *ct, size_t count)
 	const unsigned char *su1, *su2;
 	int res = 0;
 
+#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+	if (count >= sizeof(unsigned long)) {
+		const unsigned long *u1 = cs;
+		const unsigned long *u2 = ct;
+		do {
+			if (get_unaligned(u1) != get_unaligned(u2))
+				break;
+			u1++;
+			u2++;
+			count -= sizeof(unsigned long);
+		} while (count >= sizeof(unsigned long));
+		cs = u1;
+		ct = u2;
+	}
+#endif
 	for (su1 = cs, su2 = ct; 0 < count; ++su1, ++su2, count--)
 		if ((res = *su1 - *su2) != 0)
 			break;
--000000000000741c3905c7a694cd--
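
The "separate first step" for mutual alignment that the message
describes (load one unaligned word from both sources, compare for
equality, then advance only enough bytes to align) might look roughly
like the userspace sketch below. It is an illustration under stated
assumptions, not code from this thread: memcmp_sketch(), load_word()
and bytewise_cmp() are hypothetical names, memcpy() stands in for the
kernel's get_unaligned(), and aligning the first pointer rather than
the second is an arbitrary choice.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unaligned word load, userspace stand-in for get_unaligned(). */
static unsigned long load_word(const void *p)
{
	unsigned long v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Byte-by-byte compare, used for the tail and to locate a differing byte. */
static int bytewise_cmp(const unsigned char *a, const unsigned char *b, size_t n)
{
	for (; n; a++, b++, n--) {
		int res = *a - *b;

		if (res)
			return res;
	}
	return 0;
}

/* Sketch only: assumes sizeof(unsigned long) is a power of two. */
int memcmp_sketch(const void *cs, const void *ct, size_t count)
{
	const unsigned char *a = cs, *b = ct;
	const size_t w = sizeof(unsigned long);

	if (count >= w) {
		size_t skip;

		/* First step: a single unaligned word from both sources. */
		if (load_word(a) != load_word(b))
			return bytewise_cmp(a, b, w);

		/* Advance only far enough (1..w bytes) to align 'a'. */
		skip = w - ((uintptr_t)a & (w - 1));
		a += skip;
		b += skip;
		count -= skip;

		/* 'a' is now aligned; 'b' is too if the two were mutually aligned. */
		while (count >= w) {
			if (load_word(a) != load_word(b))
				break;
			a += w;
			b += w;
			count -= w;
		}
	}
	/* Tail bytes, or the word in which the main loop saw a difference. */
	return bytewise_cmp(a, b, count);
}
```

Since the first word has already compared equal, skipping up to a full
word past it cannot miss a difference, and no byte-by-byte alignment
loop is needed. Whether this is worth the extra code over the simple
loop in the patch is a question for benchmark numbers.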