On 7/10/17, 9:57 PM, "Nick Terrell" wrote:
> The problem is caused by a gcc-7 bug [1]. It miscompiles
> ZSTD_wildcopy(void *dst, void const *src, ptrdiff_t len) when len is 0.
> It only happens when it can't analyze ZSTD_copy8(), which is the case in
> the kernel, because memcpy() is implemented with inline assembly. The
> generated code is slow anyways, so I propose this workaround, which will
> be included in the next patch set. I've confirmed that it fixes the bug for
> me. This alternative implementation is also 10-20x faster, and compiles to
> the same x86 assembly as the original ZSTD_wildcopy() with the userland
> memcpy() implementation [2].
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81388#add_comment
> [2] https://godbolt.org/g/q5YpLx
>
> Signed-off-by: Nick Terrell
> ---
>  lib/zstd/zstd_internal.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/lib/zstd/zstd_internal.h b/lib/zstd/zstd_internal.h
> index 6748719..ade0365 100644
> --- a/lib/zstd/zstd_internal.h
> +++ b/lib/zstd/zstd_internal.h
> @@ -126,7 +126,9 @@ static const U32 OF_defaultNormLog = OF_DEFAULTNORMLOG;
>  /*-*******************************************
>  * Shared functions to include for inlining
>  *********************************************/
> -static void ZSTD_copy8(void *dst, const void *src) { memcpy(dst, src, 8); }
> +static void ZSTD_copy8(void *dst, const void *src) {
> +	ZSTD_write64(dst, ZSTD_read64(src));
> +}

Sorry, my patch still triggered the gcc bug, I used the wrong compiler.
This patch works, and runs about the same speed as before the patch for
small inputs, and slightly faster for larger inputs (100+ bytes). I'll
look for a faster workaround if benchmarks show it matters.

Signed-off-by: Nick Terrell
---
 lib/zstd/zstd_internal.h | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/lib/zstd/zstd_internal.h b/lib/zstd/zstd_internal.h
index 6748719..839014d 100644
--- a/lib/zstd/zstd_internal.h
+++ b/lib/zstd/zstd_internal.h
@@ -139,12 +139,8 @@ static void ZSTD_copy8(void *dst, const void *src) { memcpy(dst, src, 8); }
 #define WILDCOPY_OVERLENGTH 8
 ZSTD_STATIC void ZSTD_wildcopy(void *dst, const void *src, ptrdiff_t length)
 {
-	const BYTE *ip = (const BYTE *)src;
-	BYTE *op = (BYTE *)dst;
-	BYTE *const oend = op + length;
-	do
-		COPY8(op, ip)
-	while (op < oend);
+	if (length > 0)
+		memcpy(dst, src, length);
 }
 
 ZSTD_STATIC void ZSTD_wildcopy_e(void *dst, const void *src, void *dstEnd) /* should be faster for decoding, but strangely, not verified on all platform */
-- 
2.9.3
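
For anyone comparing the two versions of ZSTD_wildcopy() in the patch above, here is a rough userland sketch of how their behavior differs. It does not reproduce the gcc-7 miscompile. It only shows that the do/while loop always writes at least one 8-byte chunk (even when length is 0) and rounds up to a multiple of 8, while the guarded memcpy() writes exactly length bytes. The names copy8(), wildcopy_loop(), wildcopy_guarded() and bytes_touched() are made up for this illustration, and the expansion of COPY8 as "copy 8 bytes, then advance both pointers by 8" is my assumption about the macro, since the patch does not show it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userland stand-in for ZSTD_copy8(): copy exactly 8 bytes. */
static void copy8(void *dst, const void *src) { memcpy(dst, src, 8); }

/*
 * Shape of the loop removed by the patch. COPY8 is assumed to copy
 * 8 bytes and advance both pointers. The body runs at least once,
 * so 8 bytes are written even when length == 0.
 */
static void wildcopy_loop(void *dst, const void *src, ptrdiff_t length)
{
	const unsigned char *ip = src;
	unsigned char *op = dst;
	unsigned char *const oend = op + length;

	do {
		copy8(op, ip);
		op += 8;
		ip += 8;
	} while (op < oend);
}

/* Shape of the replacement: copy exactly length bytes, nothing for 0. */
static void wildcopy_guarded(void *dst, const void *src, ptrdiff_t length)
{
	if (length > 0)
		memcpy(dst, src, (size_t)length);
}

typedef void (*copy_fn)(void *, const void *, ptrdiff_t);

/*
 * Count how many destination bytes a copy function overwrites.
 * The buffers are 32 bytes, so length must be in [0, 32].
 */
static size_t bytes_touched(copy_fn fn, ptrdiff_t length)
{
	unsigned char src[32], dst[32];
	size_t i, n = 0;

	assert(length >= 0 && length <= 32);
	memset(src, 0xAA, sizeof(src));
	memset(dst, 0x00, sizeof(dst));
	fn(dst, src, length);
	for (i = 0; i < sizeof(dst); i++)
		n += (dst[i] == 0xAA);
	return n;
}
```

With this sketch, bytes_touched(wildcopy_loop, 0) is 8 and bytes_touched(wildcopy_guarded, 0) is 0. For a length of 13 the loop touches 16 bytes and the guarded version touches 13. Callers that relied on the over-copy only needed the spare room in the destination (WILDCOPY_OVERLENGTH), not the extra bytes themselves, so copying exactly length bytes should be safe for them.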