Subject: Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.
From: Vaneet Narang <v.narang@samsung.com>
Reply-To: v.narang@samsung.com
To: Maninder Singh, Yann Collet, Sergey Senozhatsky
Cc: linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org, Nick Terrell, PANKAJ MISHRA, AMIT SAHRAWAT
Date: Fri, 23 Mar 2018 18:51:19 +0530
Message-ID: <20180323132119epcms5p8699c2b53a3bab062013e2b3238d0088c@epcms5p8>
In-Reply-To: <20180322042821epcms5p16b4dd6fc12a1563b5aea547f398dda40@epcms5p1>
References: <20180322042821epcms5p16b4dd6fc12a1563b5aea547f398dda40@epcms5p1>
	<20180321195906.1260480-1-terrelln@fb.com>
	<1521607242-3968-2-git-send-email-maninder1.s@samsung.com>

Hi Nick,

Thanks for your comments. Please find my replies to a few of them below; I
will share the benchmarking figures separately.

>> +		if (curr_offset > 127) {
>> +			curr_offset = (curr_offset << 1) | DYN_BIT;
>> +			LZ4_writeLE16(op, (U16)curr_offset);
>> +			op += 2;
>> +		} else {
>> +			curr_offset = curr_offset << 1;
>> +			*op = (BYTE)curr_offset;
>> +			op++;
>> +		}
>
> The standard way to do variable sized integers is to use the high-bit as
> the control bit, not the low-bit. Do you have benchmarks to show that
> using the low-bit is faster than using the high-bit? If so, let's comment
> it in the code; if not, let's use the high-bit.

We are not sure about the performance difference between the low bit and the
high bit. However, as per the LZ4 specification the offset has to be stored
in little-endian format, so keeping the low bit as the control bit makes it
easy to retrieve the offset when it is spread across two bytes:

	offset = LZ4_readLE16(ip);
	if (offset & 0x1) {
		offset = offset >> 1;	/* two-byte offset: just one right shift */
		ip += 2;
	} else {
		offset = (offset & 0xff) >> 1;	/* one-byte offset: only two operations */
		ip += 1;
	}

So we are not sure that keeping the high bit would make much of a difference,
since the same number of operations, or more, would be needed to extract the
offset in that case.
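For comparison, here is a rough sketch of how the decode side could look with
the control bit in the high bit of the first byte (hypothetical code, not
part of this patch; the exact byte layout of the two-byte case is our
assumption). Since the control bit would no longer land in the low bit of the
little-endian value, the two-byte case could not reuse a single
LZ4_readLE16() plus shift and would have to be reassembled byte by byte:

	/* hypothetical high-bit-control decoding, for comparison only */
	if (*ip & 0x80) {
		/* two-byte offset: mask off the control bit, merge the second byte */
		offset = ((size_t)(*ip & 0x7f) << 8) | (size_t)ip[1];
		ip += 2;
	} else {
		offset = *ip;	/* one-byte offset; control bit is already zero */
		ip += 1;
	}

Either way a single test on the first byte decides the offset width; the
difference is only whether the two-byte case can reuse the little-endian
helper.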
>> 		/* copy literals */
>> 		cpy = op + length;
>> 		if (((endOnInput) && ((cpy > (partialDecoding ? oexit : oend - MFLIMIT))
>> -			|| (ip + length > iend - (2 + 1 + LASTLITERALS))))
>> -			|| ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH))) {
>> +			|| (ip + length > iend - (2 + LASTLITERALS))))
>> +			|| ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH - 1))) {
>> 			if (partialDecoding) {
>> 				if (cpy > oend) {
>> 					/*
>> @@ -188,13 +190,31 @@ static FORCE_INLINE int LZ4_decompress_generic(
>> 			break;
>> 		}
>>
>> -		LZ4_wildCopy(op, ip, cpy);
>> +		if (dynOffset && length < 4)
>> +			LZ4_copy4(op, ip);
>> +		else
>> +			LZ4_wildCopy(op, ip, cpy);
>> +
>
> The LZ4 format enforces that the last 5 bytes are literals so that
> LZ4_wildCopy() can be used here. I suspect that having this extra branch
> here for `dynOffset` mode hurts decompression speed.

This check is made to handle a one-byte read overflow when decompressing the
last frame, since LZ4_wildCopy() blindly copies 8 bytes at a time.

Issue case: a zero-length literal followed by a one-byte offset, with the
stream ending in a one-byte token and the 5-byte last literals. In this
combination only 7 bytes (1-byte offset + 1-byte token + 5-byte literals)
remain at the end of the input buffer, so reading 8 bytes from the input
buffer results in a one-byte overflow. Since a one-byte offset does not exist
in the original LZ4 format, this issue is not possible there.

To avoid the overhead of this check, we plan to use 6 bytes as the
last-literal length:

-#define LASTLITERALS 5
+#define LASTLITERALS 6

>>
>>  int LZ4_decompress_safe(const char *source, char *dest,
>> -	int compressedSize, int maxDecompressedSize)
>> +	int compressedSize, int maxDecompressedSize, bool dynOffset)
>>  {
>>  	return LZ4_decompress_generic(source, dest, compressedSize,
>>  		maxDecompressedSize, endOnInputSize, full, 0,
>> -		noDict, (BYTE *)dest, NULL, 0);
>> +		noDict, (BYTE *)dest, NULL, 0, dynOffset);
>
> You'll need to use the same trick that LZ4_compress_fast() uses, by
> hard-coding `dynOffset`. We want the compiler to generate two versions of
> LZ4_decompress_generic(), one with `dynOffset` and one with `!dynOffset`.
> That way the tight loop won't have the branches that check `dynOffset`.
>
>	if (dynOffset)
>		return LZ4_decompress_generic(..., true);
>	else
>		return LZ4_decompress_generic(..., false);
>
> Without this trick, I expect that this patch causes a regression to both
> LZ4 and LZ4_DYN decompression speed.

Since our solution is not backward compatible with the original LZ4, we are a
bit confused: should we make it a separate API, which would avoid the
overhead of the dynOffset checks everywhere in the code and also avoid
changing the prototypes of the exported functions
LZ4_decompress/LZ4_compress, or should we keep these checks to avoid
redundant code? Kindly suggest.

Thanks & Regards,
Vaneet Narang
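P.S. To make the question concrete, here is a rough sketch of the
separate-API option we have in mind (the LZ4_decompress_safe_dyn name is
hypothetical, for illustration only). Both entry points hard-code the
dynOffset argument, so the compiler can specialize LZ4_decompress_generic()
as you suggested:

	/* existing prototype unchanged; dynOffset hard-coded to false */
	int LZ4_decompress_safe(const char *source, char *dest,
		int compressedSize, int maxDecompressedSize)
	{
		return LZ4_decompress_generic(source, dest, compressedSize,
			maxDecompressedSize, endOnInputSize, full, 0,
			noDict, (BYTE *)dest, NULL, 0, false);
	}

	/* hypothetical new entry point for the dynamic-offset format */
	int LZ4_decompress_safe_dyn(const char *source, char *dest,
		int compressedSize, int maxDecompressedSize)
	{
		return LZ4_decompress_generic(source, dest, compressedSize,
			maxDecompressedSize, endOnInputSize, full, 0,
			noDict, (BYTE *)dest, NULL, 0, true);
	}

This would keep the existing LZ4 decompression path completely free of
dynOffset branches while still sharing the body of LZ4_decompress_generic().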