From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S272432AbTHIROo (ORCPT ); Sat, 9 Aug 2003 13:14:44 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S272932AbTHIROk (ORCPT ); Sat, 9 Aug 2003 13:14:40 -0400 Received: from mail.jlokier.co.uk ([81.29.64.88]:41348 "EHLO mail.jlokier.co.uk") by vger.kernel.org with ESMTP id S272432AbTHIRNo (ORCPT ); Sat, 9 Aug 2003 13:13:44 -0400 Date: Sat, 9 Aug 2003 18:13:18 +0100 From: Jamie Lokier To: Matt Mackall Cc: linux-kernel , Andrew Morton , jmorris@intercode.com.au, davem@redhat.com Subject: Re: [RFC][PATCH] Make cryptoapi non-optional? Message-ID: <20030809171318.GD29647@mail.jlokier.co.uk> References: <20030809074459.GQ31810@waste.org> <20030809143314.GT31810@waste.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030809143314.GT31810@waste.org> User-Agent: Mutt/1.4.1i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Matt Mackall wrote: > This code is already at about 125% of baseline throughput, and can > probably reach 250% with some tweaking of cryptoapi's redundant > padding (in case anyone else cares about being able to get 120Mb/s > of cryptographically strong random data). Why would the cryptoapi version be faster, I wondered? So I had a look. No conclusions, just observations. - random.c defines a constant, SHA_CODE_SIZE. This selects between 4 implementations, from smaller to (presumably) faster by progressively unrolling more and more. This is set to SHA_CODE_SIZE == 0, for smallest code. In crypto/sha1.c, the code is roughly equivalent to SHA_CODE_SIZE == 3, random.c's _largest_ implementation. So you seem to be replacing a small version with a large version, albeit faster. - One of the optimisations in crypto/sha1.c is: /* blk0() and blk() perform the initial expand. */ /* I got the idea of expanding during the round function from SSLeay */ Yet, random.c has the opposite: /* * Do the preliminary expansion of 16 to 80 words. Doing it * out-of-line line this is faster than doing it in-line on * register-starved machines like the x86, and not really any * slower on real processors. */ I wonder if the random.c method is faster on x86? - sha1_transform() in crypto/sha1.c ends with this misleading comment. Was the author trying to prevent data leakage? If so, he failed: /* Wipe variables */ a = b = c = d = e = 0; memset (block32, 0x00, sizeof block32); The second line will definitely be optimised away. The third could be, although GCC doesn't. Enjoy, -- Jamie