From: Sage Weil
Subject: RE: Adding compression/checksum support for bluestore.
Date: Tue, 5 Apr 2016 17:14:48 -0400 (EDT)
To: Allen Samuels
Cc: Chris Dunlop, Igor Fedotov, ceph-devel

On Tue, 5 Apr 2016, Allen Samuels wrote:
> > -----Original Message-----
> > From: Sage Weil [mailto:sage@newdream.net]
> > Sent: Tuesday, April 05, 2016 5:36 AM
> > To: Allen Samuels
> > Cc: Chris Dunlop; Igor Fedotov; ceph-devel
> > Subject: RE: Adding compression/checksum support for bluestore.
> >
> > On Mon, 4 Apr 2016, Allen Samuels wrote:
> > > But there's an approximation that gets the job done for us.
> > >
> > > When U is VERY SMALL (this will always be true for us :)),
> > > you can approximate 1-(1-U)^D as D * U. (For even modest values
> > > of U (say 10^-5), this is a very good approximation.)
> > >
> > > Now the math is easy.
> > >
> > > The odds of failure for reading a block of size D is now D * U; with
> > > checksum correction it becomes (D * U) / (2^C).
> > >
> > > It's now clear that if you double the data size, you need to add one
> > > bit to your checksum to compensate.
> > >
> > > (Again, the actual math is less than 1 bit, but in the range we care
> > > about 1 bit will always do it.)
> > >
> > > Anyways, that's what we worked out.
> >
> > D = block size, U = hw UBER, C = checksum bits. Let's add N = number of
> > bits you actually want to read. In that case, we have to read (N / D)
> > blocks of D bits, and we get
> >
> > P(reading N bits and getting some bad data and not knowing it)
> >   = (D * U) / (2^C) * (N / D)
> >   = U * N / 2^C
> >
> > and the D term (block size) disappears. IIUC this is what Chris was
> > originally getting at. The block size affects the probability I get an
> > error on one block, but if I am a user reading something, I don't care
> > about block size--I care about how much data I want to read. I think in
> > that case it doesn't really matter (modulo rounding error, minimum read
> > size, how precisely we can locate the error, etc.).
> >
> > Is that right?
>
> It's a "Bit Error Rate", not an "I/O error rate" -- it doesn't matter
> how you chunk the bits into blocks and I/O operations.

Right. And you use it to calculate "the odds of failure for reading a
block of size D", but I'm saying that the user doesn't care about D (which
is an implementation detail). They care about N, the amount of data they
want to read. And when you calculate the probability of getting bad data
after reading *N* bits, it has nothing to do with D.

Does that make sense?

sage
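
As a quick numerical sanity check of the two points above -- the small-U
approximation of 1-(1-U)^D, and the fact that D cancels out of the
per-read probability -- here is a short sketch. The values of U, D, C,
and N are made up purely for illustration and are not taken from any real
device:

# Sanity check for the formulas discussed above.
# U = uncorrected bit error rate (UBER), D = block size in bits,
# C = checksum bits per block, N = total bits the user wants to read.
# All numbers below are illustrative assumptions, not real device specs.

U = 1e-15          # per-bit error rate
C = 32             # checksum bits per block
N = 8 * 2**40      # read 1 TiB worth of bits

for D in (4096 * 8, 65536 * 8, 2**20 * 8):   # 4 KiB, 64 KiB, 1 MiB blocks
    exact_block = 1 - (1 - U) ** D           # P(block has >= 1 bad bit)
    approx_block = D * U                     # small-U approximation
    # P(some block in the read is bad AND the checksum misses it),
    # using the approximation and (N / D) blocks per read:
    p_read = (approx_block / 2**C) * (N / D)
    print(f"D={D:>10} bits  exact={exact_block:.3e}  "
          f"approx={approx_block:.3e}  P(bad read of N bits)={p_read:.3e}")

# Expected: exact and approx agree closely, and the last column is the
# same for every D -- it equals U * N / 2**C, independent of block size.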