From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:53792 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752170AbdGKGBy (ORCPT ); Tue, 11 Jul 2017 02:01:54 -0400 From: Nick Terrell To: "Austin S. Hemmelgarn" , Adam Borowski CC: Kernel Team , Chris Mason , Yann Collet , David Sterba , "linux-btrfs@vger.kernel.org" , "linux-kernel@vger.kernel.org" Subject: Re: [PATCH v2 3/4] btrfs: Add zstd support Date: Tue, 11 Jul 2017 06:01:38 +0000 Message-ID: <816FA2EC-C072-41D3-A14D-9891BBFEB7DB@fb.com> References: <20170629194108.1674498-1-terrelln@fb.com> <20170629194108.1674498-4-terrelln@fb.com> <20170706163225.xbluc2gi2nlaafzo@angband.pl> <20170707234018.syyfaktjxyxvwglc@angband.pl> <20170708030706.tle2mpfe376jneft@angband.pl> <046c7611-be6d-ec4d-6489-e8f49a4453f6@gmail.com> <15FC42E2-5823-4AF4-A945-0F7C60E7751C@fb.com> In-Reply-To: <15FC42E2-5823-4AF4-A945-0F7C60E7751C@fb.com> Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org List-ID: On 7/10/17, 9:57 PM, "Nick Terrell" wrote: > The problem is caused by a gcc-7 bug [1]. It miscompiles > ZSTD_wildcopy(void *dst, void const *src, ptrdiff_t len) when len is 0. > It only happens when it can't analyze ZSTD_copy8(), which is the case in > the kernel, because memcpy() is implemented with inline assembly. The > generated code is slow anyways, so I propose this workaround, which will > be included in the next patch set. I've confirmed that it fixes the bug for > me. This alternative implementation is also 10-20x faster, and compiles to > the same x86 assembly as the original ZSTD_wildcopy() with the userland > memcpy() implementation [2]. > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81388#add_comment > [2] https://godbolt.org/g/q5YpLx > > Signed-off-by: Nick Terrell > --- > lib/zstd/zstd_internal.h | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/lib/zstd/zstd_internal.h b/lib/zstd/zstd_internal.h > index 6748719..ade0365 100644 > --- a/lib/zstd/zstd_internal.h > +++ b/lib/zstd/zstd_internal.h > @@ -126,7 +126,9 @@ static const U32 OF_defaultNormLog = OF_DEFAULTNORMLOG; > /*-******************************************* > * Shared functions to include for inlining > *********************************************/ > -static void ZSTD_copy8(void *dst, const void *src) { memcpy(dst, src, 8); } > +static void ZSTD_copy8(void *dst, const void *src) { > + ZSTD_write64(dst, ZSTD_read64(src)); > +} Sorry, my patch still triggered the gcc bug, I used the wrong compiler. This patch works, and runs about the same speed as before the patch for small inputs, and slightly faster for larger inputs (100+ bytes). I'll look for a faster workaround if benchmarks show it matters. Signed-off-by: Nick Terrell --- lib/zstd/zstd_internal.h | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/lib/zstd/zstd_internal.h b/lib/zstd/zstd_internal.h index 6748719..839014d 100644 --- a/lib/zstd/zstd_internal.h +++ b/lib/zstd/zstd_internal.h @@ -139,12 +139,8 @@ static void ZSTD_copy8(void *dst, const void *src) { memcpy(dst, src, 8); } #define WILDCOPY_OVERLENGTH 8 ZSTD_STATIC void ZSTD_wildcopy(void *dst, const void *src, ptrdiff_t length) { - const BYTE *ip = (const BYTE *)src; - BYTE *op = (BYTE *)dst; - BYTE *const oend = op + length; - do - COPY8(op, ip) - while (op < oend); + if (length > 0) + memcpy(dst, src, length); } ZSTD_STATIC void ZSTD_wildcopy_e(void *dst, const void *src, void *dstEnd) /* should be faster for decoding, but strangely, not verified on all platform */ -- 2.9.3 {.n++%ݶw{.n+{k~^nrzh&zzޗ++zfh~iz_j:+v)ߣm From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755269AbdGKGB4 (ORCPT ); Tue, 11 Jul 2017 02:01:56 -0400 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:53792 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752170AbdGKGBy (ORCPT ); Tue, 11 Jul 2017 02:01:54 -0400 From: Nick Terrell To: "Austin S. Hemmelgarn" , Adam Borowski CC: Kernel Team , Chris Mason , Yann Collet , David Sterba , "linux-btrfs@vger.kernel.org" , "linux-kernel@vger.kernel.org" Subject: Re: [PATCH v2 3/4] btrfs: Add zstd support Thread-Topic: [PATCH v2 3/4] btrfs: Add zstd support Thread-Index: AQHS8Q/LbUGGKKAiq0CVg+3Wq3AYu6JHCMaAgAGOPwCAAHujAIAAOccAgAPD0QCAAJzEgIAAEdQA Date: Tue, 11 Jul 2017 06:01:38 +0000 Message-ID: <816FA2EC-C072-41D3-A14D-9891BBFEB7DB@fb.com> References: <20170629194108.1674498-1-terrelln@fb.com> <20170629194108.1674498-4-terrelln@fb.com> <20170706163225.xbluc2gi2nlaafzo@angband.pl> <20170707234018.syyfaktjxyxvwglc@angband.pl> <20170708030706.tle2mpfe376jneft@angband.pl> <046c7611-be6d-ec4d-6489-e8f49a4453f6@gmail.com> <15FC42E2-5823-4AF4-A945-0F7C60E7751C@fb.com> In-Reply-To: <15FC42E2-5823-4AF4-A945-0F7C60E7751C@fb.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [2620:10d:c090:180::1:6c79] x-ms-publictraffictype: Email x-microsoft-exchange-diagnostics: 1;DM5PR15MB1660;20:8spY8Ynn6IgeLHq5a5Y07IA69V4uDg7ccijDpF5Q55ZflpPD0RnjCvYzQJaujBFxprIdCs4pS7YhhylLyJvDn9tImxxZ+ns8UIkdBmR/5RVsdvryu7a1W5H11lOCLm0hl4erKTwtWj2q9qGorYDvafkH/nAIJHDM5HuNpQ8SBBE= x-ms-office365-filtering-correlation-id: 0c461ea8-6ecc-43fc-0171-08d4c8224bba x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:(300000500095)(300135000095)(300000501095)(300135300095)(22001)(300000502095)(300135100095)(2017030254075)(300000503095)(300135400095)(2017052603031)(201703131423075)(201703031133081)(300000504095)(300135200095)(300000505095)(300135600095)(300000506095)(300135500095);SRVR:DM5PR15MB1660; x-ms-traffictypediagnostic: DM5PR15MB1660: x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:(133145235818549)(22074186197030)(236129657087228)(67672495146484)(183786458502308); x-exchange-antispam-report-cfa-test: BCL:0;PCL:0;RULEID:(100000700101)(100105000095)(100000701101)(100105300095)(100000702101)(100105100095)(6040450)(601004)(2401047)(5005006)(8121501046)(2017060910075)(93006095)(93001095)(10201501046)(3002001)(100000703101)(100105400095)(6041248)(20161123562025)(20161123555025)(20161123560025)(201703131423075)(201702281528075)(201703061421075)(201703061406153)(20161123558100)(20161123564025)(6072148)(100000704101)(100105200095)(100000705101)(100105500095);SRVR:DM5PR15MB1660;BCL:0;PCL:0;RULEID:(100000800101)(100110000095)(100000801101)(100110300095)(100000802101)(100110100095)(100000803101)(100110400095)(100000804101)(100110200095)(100000805101)(100110500095);SRVR:DM5PR15MB1660; x-forefront-prvs: 0365C0E14B x-forefront-antispam-report: SFV:NSPM;SFS:(10019020)(6009001)(39410400002)(39850400002)(39840400002)(39450400003)(39400400002)(377454003)(24454002)(5660300001)(8936002)(82746002)(25786009)(189998001)(7736002)(3280700002)(3660700001)(53546010)(478600001)(14454004)(81166006)(33656002)(305945005)(8676002)(966005)(2906002)(54356999)(4326008)(6506006)(6436002)(93886004)(76176999)(99286003)(6512007)(53936002)(86362001)(229853002)(2900100001)(54906002)(6116002)(102836003)(6306002)(6486002)(2950100002)(38730400002)(83716003)(6246003)(36756003)(39060400002)(77096006)(50986999);DIR:OUT;SFP:1102;SCL:1;SRVR:DM5PR15MB1660;H:DM5PR15MB1753.namprd15.prod.outlook.com;FPR:;SPF:None;MLV:ovrnspm;PTR:InfoNoRecords;LANG:en; spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="utf-8" Content-ID: MIME-Version: 1.0 X-MS-Exchange-CrossTenant-originalarrivaltime: 11 Jul 2017 06:01:38.9288 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 8ae927fe-1255-47a7-a2af-5f3a069daaa2 X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM5PR15MB1660 X-OriginatorOrg: fb.com X-Proofpoint-Spam-Reason: safe X-FB-Internal: Safe X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-07-11_02:,, signatures=0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by nfs id v6B622su016549 On 7/10/17, 9:57 PM, "Nick Terrell" wrote: > The problem is caused by a gcc-7 bug [1]. It miscompiles > ZSTD_wildcopy(void *dst, void const *src, ptrdiff_t len) when len is 0. > It only happens when it can't analyze ZSTD_copy8(), which is the case in > the kernel, because memcpy() is implemented with inline assembly. The > generated code is slow anyways, so I propose this workaround, which will > be included in the next patch set. I've confirmed that it fixes the bug for > me. This alternative implementation is also 10-20x faster, and compiles to > the same x86 assembly as the original ZSTD_wildcopy() with the userland > memcpy() implementation [2]. > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81388#add_comment > [2] https://godbolt.org/g/q5YpLx > > Signed-off-by: Nick Terrell > --- > lib/zstd/zstd_internal.h | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/lib/zstd/zstd_internal.h b/lib/zstd/zstd_internal.h > index 6748719..ade0365 100644 > --- a/lib/zstd/zstd_internal.h > +++ b/lib/zstd/zstd_internal.h > @@ -126,7 +126,9 @@ static const U32 OF_defaultNormLog = OF_DEFAULTNORMLOG; > /*-******************************************* > * Shared functions to include for inlining > *********************************************/ > -static void ZSTD_copy8(void *dst, const void *src) { memcpy(dst, src, 8); } > +static void ZSTD_copy8(void *dst, const void *src) { > + ZSTD_write64(dst, ZSTD_read64(src)); > +} Sorry, my patch still triggered the gcc bug, I used the wrong compiler. This patch works, and runs about the same speed as before the patch for small inputs, and slightly faster for larger inputs (100+ bytes). I'll look for a faster workaround if benchmarks show it matters. Signed-off-by: Nick Terrell --- lib/zstd/zstd_internal.h | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/lib/zstd/zstd_internal.h b/lib/zstd/zstd_internal.h index 6748719..839014d 100644 --- a/lib/zstd/zstd_internal.h +++ b/lib/zstd/zstd_internal.h @@ -139,12 +139,8 @@ static void ZSTD_copy8(void *dst, const void *src) { memcpy(dst, src, 8); } #define WILDCOPY_OVERLENGTH 8 ZSTD_STATIC void ZSTD_wildcopy(void *dst, const void *src, ptrdiff_t length) { - const BYTE *ip = (const BYTE *)src; - BYTE *op = (BYTE *)dst; - BYTE *const oend = op + length; - do - COPY8(op, ip) - while (op < oend); + if (length > 0) + memcpy(dst, src, length); } ZSTD_STATIC void ZSTD_wildcopy_e(void *dst, const void *src, void *dstEnd) /* should be faster for decoding, but strangely, not verified on all platform */ -- 2.9.3