From mboxrd@z Thu Jan  1 00:00:00 1970
From: merez@codeaurora.org
Subject: Re: [PATCH v8 2/2] mmc: support packed write command for eMMC4.5 device
Date: Wed, 14 Nov 2012 00:14:58 -0800 (PST)
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Received: from wolverine02.qualcomm.com ([199.106.114.251]:47320 "EHLO
	wolverine02.qualcomm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755450Ab2KNIPO (ORCPT ); Wed, 14 Nov 2012 03:15:14 -0500
Sender: linux-mmc-owner@vger.kernel.org
List-Id: linux-mmc@vger.kernel.org
To: Chris Ball
Cc: merez@codeaurora.org, Seungwon Jeon, linux-mmc@vger.kernel.org,
	'Subhash Jadavani', "'S, Venkatraman'", 'Saugata Das', 'Namjae Jeon'

Hi Chris,

The amount of improvement from packed commands, as with any other
eMMC4.5 feature, depends on several parameters:

1. The card's support of this feature. If the card supports only the
   feature interface, then you'll see no improvement when using the
   feature.
2. The benchmark tool used. Since the packed command preparation is
   stopped by a FLUSH request, a benchmark that issues many FLUSH
   requests can result in a small amount of packing, and you will see
   no improvement.

You can use the following patch to get the packed commands statistics:
http://marc.info/?l=linux-mmc&m=134374508625826&w=2
With this patch you will be able to see the amount of packing and what
caused the packed preparation to stop.

We tested the packed commands feature with SanDisk cards and got an
improvement of 30% when using lmdd and tiotest. We don't use iozone for
sequential tests, but if you send me the exact command that you use, we
can try it as well.

It is true that packed commands can cause degradation of reads in
read-write collisions. However, it is only natural that with a longer
write request, a read request has to wait longer and its latency
increases.
I believe that it is not our duty to decide whether this is a reason to
exclude this feature. Everyone should make their own decision on whether
to benefit from the write improvement while risking the read-write
collision scenarios.

eMMC4.5 introduces HPI and stop transmission to overcome the degradation
of read latency due to writes (regardless of packed commands).

The packing control is our own enhancement that we believe can also be
used to overcome this degradation. It is tunable and requires specific
enabling, so it can be the developer's decision whether to use it or
not. Since it is not a standard feature, we can discuss separately
whether it should be accepted and what is the best way to use it.

Packed commands is not the only eMMC4.5 feature that can cause
degradation in specific scenarios. If we look at the cache feature, it
causes degradation by almost a half in random operations when FLUSH is
being used. With the following iozone command and the cache enabled,
you will see degradation in the iozone results:
./data/iozone -i0 -i2 -r4k -s50m -O -o -I -f /data/mmc0/file3
However, cache support was accepted regardless of this degradation, and
it is the developer's responsibility to decide whether to use this
feature or not.

To summarize, all eMMC4.5 features that were added are tunable and
disabled by default. I believe that anyone who enables a certain feature
will do all the testing required to determine whether they can benefit
from this feature in their own environment.

Thanks,
Maya

On Tue, November 13, 2012 6:54 pm, Chris Ball wrote:
> Hi Maya,
>
> On Sun, Nov 04 2012, merez@codeaurora.org wrote:
>> Packed commands is a mandatory eMMC4.5 feature and is supported by
>> all the card vendors.
>
> We're still only talking about using packed writes, though, right?
>
>> It was proven to be beneficial for eMMC4.5 cards and harmless for non
>> eMMC4.5 cards.
>
> My understanding is that write packing causes a regression in read
> performance that can be tuned/fixed by your
> num_wr_reqs_to_start_packing tunable (and read packing causes a read
> regression with current eMMC 4.5 cards). Is that wrong?
>
>> I don't see a point to hold it back while it can be enabled or
>> disabled by a flag and most of the code it adds is guarded in
>> specific functions and is not active when packed commands is
>> disabled.
>
> Earlier in the thread I wrote:
>
>>> * I still don't have a good set of representative benchmarks showing
>>>   what kind of performance changes come with this patchset. It seems
>>>   like we've had a small amount of testing on one controller/eMMC
>>>   part combo from Seungwon, and an entirely different test from
>>>   Maya, and the results aren't documented fully anywhere to the
>>>   level of describing what the hardware was, what the test was, and
>>>   what the results were before and after the patchset.
>
> I still feel this way. I'm worried that we might be merging code that
> works well on your controller/card but causes large regressions for
> everyone else. I don't want to handle this by making a tunable that
> everyone has to tune for their system, because I don't think anyone
> will tune it. I don't think that shipping a capability that will
> probably lead to performance regressions if you turn it on is a good
> idea.
>
> I'm in a better position to help now, though -- I have some
> motherboards with Marvell SoCs and a socketed eMMC slot, and I have
> eMMC 4.5 parts from Sandisk and Toshiba. So I can try to help work out
> how generalizable your results are across other controllers and cards.
>
> So far I've only tried the Sandisk part, but it didn't show any write
> improvement with write packing. I've verified that the switch command
> to turn on packed_event_en happens and succeeds, and that the caps are
> set correctly, so I'm not sure what's wrong yet.
> With iozone I get:
>
>                      KB  reclen   write rewrite
> Unpacked writes:  10240    8192   17250   16794
> Packed writes:    10240    8192   16930   17353
>
> I'll try the Toshiba part next, and I'll start using lmdd as well as
> iozone. Any ideas on why I might not be seeing improvements with
> Sandisk?
>
> I'm not opposed to merging packed write support in principle, I just
> want to be convinced that we're not causing regressions for most users
> who turn it on. (And more than that, I want to see that it leads to
> improvements that make it worth adding the code complexity for.)
>
> Thanks,
>
> - Chris.
> --
> Chris Ball
> One Laptop Per Child
>

-- 
QUALCOMM ISRAEL, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
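
For reference, the relative change between the two iozone rows quoted
above can be worked out with a short script. This is only a sketch: it
assumes the write and rewrite columns are iozone's default KB/s
throughput figures, and the names percent_change and compare are
illustrative, not part of any tool mentioned in this thread.

```python
# iozone figures quoted in the thread (assumed to be KB/s throughput).
unpacked = {"write": 17250, "rewrite": 16794}
packed = {"write": 16930, "rewrite": 17353}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def compare(before, after):
    """Percent change for every column present in `before`."""
    return {name: percent_change(before[name], after[name]) for name in before}


if __name__ == "__main__":
    # Positive values mean the packed run was faster than the unpacked run.
    for name, delta in compare(unpacked, packed).items():
        print(f"{name:8s} {delta:+.2f}%")
```

With the quoted figures this comes out at roughly -1.9% for write and
+3.3% for rewrite. Differences this small may well be within run-to-run
noise, which fits the observation that the Sandisk part showed no clear
gain from write packing.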