From mboxrd@z Thu Jan 1 00:00:00 1970
From: Gilad Ben-Yossef
Subject: Re: [RFC PATCH v4] IV Generation algorithms for dm-crypt
Date: Thu, 2 Mar 2017 16:01:03 +0200
References: <1486463731-6224-1-git-send-email-binoy.jayan@linaro.org>
 <68f70534-a309-46ba-a84d-8acc1e6620e5@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Cc: Binoy Jayan, Rajendra, Herbert Xu, Oded, Mike Snitzer,
 Linux kernel mailing list, Milan Broz, linux-raid@vger.kernel.org,
 dm-devel@redhat.com, Mark Brown, Arnd Bergmann,
 linux-crypto@vger.kernel.org, Shaohua Li, "David S. Miller",
 Alasdair Kergon, Ofir
To: Ondrej Mosnacek
Sender: dm-devel-bounces@redhat.com
Errors-To: dm-devel-bounces@redhat.com
List-Id: linux-crypto.vger.kernel.org

On Wed, Mar 1, 2017 at 3:21 PM, Ondrej Mosnacek wrote:
> 2017-03-01 13:42 GMT+01:00 Gilad Ben-Yossef:
>
> Wouldn't adopting a bulk request API (something like what I tried to
> do here [1]) that allows users to supply multiple messages, each with
> their own IV, fulfill this purpose? That way, we wouldn't need to
> introduce any new modes into Crypto API and the drivers/accelerators
> would only need to provide bulk implementations of common modes
> (xts(aes), cbc(aes), ...) to provide better performance for dm-crypt
> (and possibly other users, too).
>
> I'm not sure how exactly these crypto accelerators work, but wouldn't
> it help if the drivers simply get more messages (in our case sectors)
> in a single call? I wonder, would (efficiently) supporting such a
> scheme require changes in the HW itself or could it be achieved just
> by modifying driver code (let's say specifically for your CryptoCell
> accelerator)?
>
> [1] https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg23007.html
>

From a general perspective - that is, things that are expected to be
true not just for CryptoCell but for most HW crypto engines - you want
two things: for the HW engine to be able to work in long bursts and
then rest for a long time, rather than a stop-and-go scheme (engine
utilization), and for the average IO transaction to be relatively long
(bus utilization).

So, a big cluster size (i.e. Milan's proposal) works great - you get
both.

Submitting a series of sequential small clusters where the HW can
calculate the IV (e.g. Binoy's proposal) works great if the HW
supports it - you get both.

A batched series of small clusters + IV is less favorable - if your HW
engine has lots of parallel context processing (this is expensive for
HW) you might enjoy good engine utilization, but the bus utilization
will be low - lots of small transactions.

Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru