* [PATCH v2 0/1] mmc: block: Add write packing control
@ 2012-06-01 18:55 Maya Erez
  2012-06-01 18:55   ` Maya Erez
  0 siblings, 1 reply; 44+ messages in thread
From: Maya Erez @ 2012-06-01 18:55 UTC (permalink / raw)
  To: linux-mmc; +Cc: linux-arm-msm, Maya Erez

Our experiments showed that the write packing causes degradation of the read
throughput, in parallel read and write operations.
Since the read latency is critical for user experience we added a write packing control
mechanism that disables the write packing in case of read requests.
This will ensure that read requests latency is not increased due to long write packed commands.

The trigger for enabling the write packing is managing to pack several write requests.
The number of potential packed requests that will trigger the packing can be configured via sysfs.
The trigger for disabling the write packing is a fetch of a read request.

This patch depends on the following patches:
  [PATCH v6 1/3] mmc: core: Add packed command feature of eMMC4.5
  [PATCH v6 2/3] mmc: core: Support packed write command for eMMC4.5 device

Changes in v2:
    - Move the attribute for setting the packing enabling trigger to the block device
    - Add documentation of the new attribute

Maya Erez (1):
  mmc: block: Add write packing control

 Documentation/mmc/mmc-dev-attrs.txt |   17 ++++++
 drivers/mmc/card/block.c            |  100 ++++++++++++++++++++++++++++++++++-
 drivers/mmc/card/queue.c            |    8 +++
 drivers/mmc/card/queue.h            |    3 +
 include/linux/mmc/host.h            |    1 +
 5 files changed, 128 insertions(+), 1 deletions(-)

--
1.7.3.3
-- 
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

* Re: [PATCH v2 1/1] mmc: block: Add write packing control
@ 2012-06-12 19:05 merez
  0 siblings, 0 replies; 44+ messages in thread
From: merez @ 2012-06-12 19:05 UTC (permalink / raw)
  To: S, Venkatraman
  Cc: merez, Seungwon Jeon, linux-mmc, linux-arm-msm,
	DOCUMENTATION',
	open list

> On Tue, Jun 12, 2012 at 1:40 AM,  <merez@codeaurora.org> wrote:
>>> On Mon, Jun 11, 2012 at 7:25 PM,  <merez@codeaurora.org> wrote:
>>>>> Maya Erez <merez@codeaurora.org> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > How can we check the effect?
>>>>>> > Do you have any result?
>>>>>> We ran parallel lmdd read and write operations and found out that the
>>>>>> write packing causes the read throughput to drop from 24MB/s to
>>>>>> 12MB/s. The write packing control managed to increase the read
>>>>>> throughput back to the original value.
>>>>>> We also examined "real life" scenarios, such as performing a big push
>>>>>> operation in parallel to launching several applications. We measured
>>>>>> the read latency and found out that with the write packing control the
>>>>>> worst case of the read latency was smaller.
>>>>>> > Please check the several comments below.
>>>>>> >
>>>>>> > Maya Erez <merez@codeaurora.org> wrote:
>>>>>> >> The write packing control will ensure that read requests latency is
>>>>>> >> not increased due to long write packed commands.
>>>>>> >>
>>>>>> >> The trigger for enabling the write packing is managing to pack
>>>>>> >> several write requests. The number of potential packed requests
>>>>>> >> that will trigger the packing can be configured via sysfs by
>>>>>> >> writing the required value to:
>>>>>> >> /sys/block/<block_dev_name>/num_wr_reqs_to_start_packing.
>>>>>> >> The trigger for disabling the write packing is fetching a read
>>>>>> >> request.
>>>>>> >>
>>>>>> >> ---
>>>>>> >>  Documentation/mmc/mmc-dev-attrs.txt |   17 ++++++
>>>>>> >>  drivers/mmc/card/block.c            |  100
>>>>>> >> ++++++++++++++++++++++++++++++++++-
>>>>>> >>  drivers/mmc/card/queue.c            |    8 +++
>>>>>> >>  drivers/mmc/card/queue.h            |    3 +
>>>>>> >>  include/linux/mmc/host.h            |    1 +
>>>>>> >>  5 files changed, 128 insertions(+), 1 deletions(-)
>>>>>> >>
>>>>>> >> diff --git a/Documentation/mmc/mmc-dev-attrs.txt
>>>>>> >> b/Documentation/mmc/mmc-dev-attrs.txt
>>>>>> >> index 22ae844..08f7312 100644
>>>>>> >> --- a/Documentation/mmc/mmc-dev-attrs.txt
>>>>>> >> +++ b/Documentation/mmc/mmc-dev-attrs.txt
>>>>>> >> @@ -8,6 +8,23 @@ The following attributes are read/write.
>>>>>> >>
>>>>>> >>   force_ro                Enforce read-only access even if write
>>>>>> >> protect switch is off.
>>>>>> >>
>>>>>> >> + num_wr_reqs_to_start_packing    This attribute is used to
>>>>>> >> + determine the trigger for activating the write packing, in case
>>>>>> >> + the write packing control feature is enabled.
>>>>>> >> +
>>>>>> >> + When the MMC manages to reach a point where
>>>>>> >> + num_wr_reqs_to_start_packing write requests could be packed, it
>>>>>> >> + enables the write packing feature. This allows us to start the
>>>>>> >> + write packing only when it is beneficial and has minimum effect
>>>>>> >> + on the read latency.
>>>>>> >> +
>>>>>> >> + The number of potential packed requests that will trigger the
>>>>>> >> + packing can be configured via sysfs by writing the required
>>>>>> >> + value to:
>>>>>> >> + /sys/block/<block_dev_name>/num_wr_reqs_to_start_packing.
>>>>>> >> +
>>>>>> >> + The default value of num_wr_reqs_to_start_packing was determined
>>>>>> >> + by running parallel lmdd write and lmdd read operations and
>>>>>> >> + calculating the max number of packed write requests.
>>>>>> >> +
>>>>>> >>  SD and MMC Device Attributes
>>>>>> >>  ============================
>>>>>> >>
>>>>>> >> diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
>>>>>> >> index 2785fd4..ef192fb 100644
>>>>>> >> --- a/drivers/mmc/card/block.c
>>>>>> >> +++ b/drivers/mmc/card/block.c
>>>>>> >> @@ -114,6 +114,7 @@ struct mmc_blk_data {
>>>>>> >>   struct device_attribute force_ro;
>>>>>> >>   struct device_attribute power_ro_lock;
>>>>>> >>   int     area_type;
>>>>>> >> + struct device_attribute num_wr_reqs_to_start_packing;
>>>>>> >>  };
>>>>>> >>
>>>>>> >>  static DEFINE_MUTEX(open_lock);
>>>>>> >> @@ -281,6 +282,38 @@ out:
>>>>>> >>   return ret;
>>>>>> >>  }
>>>>>> >>
>>>>>> >> +static ssize_t
>>>>>> >> +num_wr_reqs_to_start_packing_show(struct device *dev,
>>>>>> >> +                           struct device_attribute *attr, char *buf)
>>>>>> >> +{
>>>>>> >> + struct mmc_blk_data *md = mmc_blk_get(dev_to_disk(dev));
>>>>>> >> + int num_wr_reqs_to_start_packing;
>>>>>> >> + int ret;
>>>>>> >> +
>>>>>> >> + num_wr_reqs_to_start_packing = md->queue.num_wr_reqs_to_start_packing;
>>>>>> >> +
>>>>>> >> + ret = snprintf(buf, PAGE_SIZE, "%d\n", num_wr_reqs_to_start_packing);
>>>>>> >> +
>>>>>> >> + mmc_blk_put(md);
>>>>>> >> + return ret;
>>>>>> >> +}
>>>>>> >> +
>>>>>> >> +static ssize_t
>>>>>> >> +num_wr_reqs_to_start_packing_store(struct device *dev,
>>>>>> >> +                          struct device_attribute *attr,
>>>>>> >> +                          const char *buf, size_t count)
>>>>>> >> +{
>>>>>> >> + int value;
>>>>>> >> + struct mmc_blk_data *md = mmc_blk_get(dev_to_disk(dev));
>>>>>> >> +
>>>>>> >> + sscanf(buf, "%d", &value);
>>>>>> >> + if (value >= 0)
>>>>>> >> +         md->queue.num_wr_reqs_to_start_packing = value;
>>>>>> >> +
>>>>>> >> + mmc_blk_put(md);
>>>>>> >> + return count;
>>>>>> >> +}
>>>>>> >> +
>>>>>> >>  static int mmc_blk_open(struct block_device *bdev, fmode_t mode)
>>>>>> >>  {
>>>>>> >>   struct mmc_blk_data *md = mmc_blk_get(bdev->bd_disk);
>>>>>> >> @@ -1313,6 +1346,48 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
>>>>>> >>   mmc_queue_bounce_pre(mqrq);
>>>>>> >>  }
>>>>>> >>
>>>>>> >> +static void mmc_blk_write_packing_control(struct mmc_queue *mq,
>>>>>> >> +                                   struct request *req)
>>>>>> >> +{
>>>>>> >> + struct mmc_host *host = mq->card->host;
>>>>>> >> + int data_dir;
>>>>>> >> +
>>>>>> >> + if (!(host->caps2 & MMC_CAP2_PACKED_WR))
>>>>>> >> +         return;
>>>>>> >> +
>>>>>> >> + /*
>>>>>> >> +  * In case the packing control is not supported by the host, it
>>>>>> >> +  * should not have an effect on the write packing. Therefore we
>>>>>> >> +  * have to enable the write packing.
>>>>>> >> +  */
>>>>>> >> + if (!(host->caps2 & MMC_CAP2_PACKED_WR_CONTROL)) {
>>>>>> >> +         mq->wr_packing_enabled = true;
>>>>>> >> +         return;
>>>>>> >> + }
>>>>>> >> +
>>>>>> >> + if (!req || (req && (req->cmd_flags & REQ_FLUSH))) {
>>>>>> >> +         if (mq->num_of_potential_packed_wr_reqs >
>>>>>> >> +                         mq->num_wr_reqs_to_start_packing)
>>>>>> >> +                 mq->wr_packing_enabled = true;
>>>>>> >> +         return;
>>>>>> >> + }
>>>>>> >> +
>>>>>> >> + data_dir = rq_data_dir(req);
>>>>>> >> +
>>>>>> >> + if (data_dir == READ) {
>>>>>> >> +         mq->num_of_potential_packed_wr_reqs = 0;
>>>>>> >> +         mq->wr_packing_enabled = false;
>>>>>> >> +         return;
>>>>>> >> + } else if (data_dir == WRITE) {
>>>>>> >> +         mq->num_of_potential_packed_wr_reqs++;
>>>>>> >> + }
>>>>>> >> +
>>>>>> >> + if (mq->num_of_potential_packed_wr_reqs >
>>>>>> >> +                 mq->num_wr_reqs_to_start_packing)
>>>>>> >> +         mq->wr_packing_enabled = true;
>>>>>> > Write Packing is available only if continuing write requests are
>>>>>> > over num_wr_reqs_to_start_packing?
>>>>>> > That means individual requests (1...17) will be issued with
>>>>>> > non-packing.
>>>>>> > Could you explain your policy more?
>>>>>> We try to identify the case where there are parallel read and write
>>>>>> operations. In our experiments we found out that the number of write
>>>>>> requests between read requests in parallel read and write operations
>>>>>> doesn't exceed 17 requests. Therefore, we can assume that fetching
>>>>>> more than 17 write requests without hitting a read request can
>>>>>> indicate that there is no read activity.
>>>>> We can apply this experiment regardless I/O scheduler?
>>>>> Which I/O scheduler was used with this experiment?
>>>> The experiment was performed with the CFQ scheduler. Since deadline
>>>> uses a batch of 16 requests it should also fit the deadline scheduler.
>>>> In case another value is required, this value can be changed via sysfs.
>>>>>> You are right that this affects the write throughput a bit but the
>>>>>> goal of this algorithm is to make sure the read throughput and
>>>>>> latency are not decreased due to write. If this is not the desired
>>>>>> result, this algorithm can be disabled.
>>>>>> >> +
>>>>>> >> +}
>>>>>> >> +
>>>>>> >>  static u8 mmc_blk_prep_packed_list(struct mmc_queue *mq, struct
>>>>>> request
>>>>>> >> *req)
>>>>>> >>  {
>>>>>> >>   struct request_queue *q = mq->queue;
>>>>>> >> @@ -1332,6 +1407,9 @@ static u8 mmc_blk_prep_packed_list(struct mmc_queue *mq, struct request *req)
>>>>>> >>                   !card->ext_csd.packed_event_en)
>>>>>> >>           goto no_packed;
>>>>>> >>
>>>>>> >> + if (!mq->wr_packing_enabled)
>>>>>> >> +         goto no_packed;
>>>>>> > If wr_packing_enabled is set to true, several write requests can
>>>>>> > be packed.
>>>>>> > We don't need to consider read requests since packed write?
>>>>>> I'm not sure I understand the question. We check if there was a read
>>>>>> request in mmc_blk_write_packing_control, and in such a case set
>>>>>> mq->wr_packing_enabled to false.
>>>>>> If I didn't answer the question, please explain it again.
>>>>> Packed write can be possible after exceeding 17 requests.
>>>>> Is it assured that a read request doesn't follow immediately after a
>>>>> packed write?
>>>>> I wonder about this case.
>>>> Currently in such a case we will send the packed command followed by
>>>> the read request. The latency of this read request will be high due to
>>>> waiting for the completion of the packed write. However, since we will
>>>> disable the write packing, the latency of the following read requests
>>>> will be low.
>>>> We are working on a solution where the read request will bypass the
>>>> write requests in such a case. This change requires modification of
>>>> the scheduler in order to re-insert the write requests to the
>>>> scheduler.
>>> That's the precise reason for using foreground HPI (shameless plug :-))
>>> I understand the intent of write packing control, but using the number
>>> of requests as a metric is too coarse. Some writes could be for only
>>> one sector (512B) and others could be 512KB or more, giving a 1000x
>>> variance.
>>> Foreground HPI solves this problem by interrupting only on a wait
>>> threshold.
>>> Another aspect is that if a packed write is in progress, and you have
>>> a read request, you will most likely disable packing for the _next_
>>> write, not the ongoing one, right?
>>> That's too late an intervention IMHO.
>> If a write request is in progress and a read is fetched we plan to use
>> HPI to stop it and re-insert the remainder of the write packed command
>> back to the scheduler for a later dispatch.
> IIUC, there were 2 reasons mentioned by you for introducing write
> packing control -
> 1) Read bandwidth drop
> 2) Use case "latency" or, if I were to guess, "sluggish UI".
>
> So if (2) is solved by HPI, we can investigate the reason for (1) and
> fix that, rather than adding another functionality (which belongs in
> the I/O scheduler anyway) to MMC.
According to our measurements the stop transmission (CMD12) + HPI is a
heavy operation that takes several milliseconds. Intensive usage of HPI
will cause degradation of the performance.
When there is a packed write followed by a read request, we will stop the
packed write and issue the read request. Disabling the write packing due
to this read request (by the packing control function) will eliminate the
need for an additional HPI operation to stop the next packed write
request in case of a flow of read requests. When the flow of read
requests ends, the write packing will be enabled again once there is a
flow of write requests.

Regarding the degradation of the read throughput in case of mixed
read/write operations:
Our tests showed that there is a degradation of ~40% in the read
throughput in mixed read/write operations even without packing. Enabling
the write packing increases this degradation to ~60%. Those numbers are
applicable to the CFQ scheduler, while the deadline scheduler resulted in
even lower read throughput in mixed operations.
While investigating this degradation we found out that the main cause for
it is the way the application issues the read requests and the policy of
the scheduler.
Therefore, the write packing control is only one piece of the complete
solution that we are working on. The complete solution will be released
in a later phase and will include:
- Usage of stop transmission in order to stop a packed write in case of a
read request
- Issuing a read request that is fetched at the end of a packed list
preparation before the packed write
- The ability to re-insert the write requests back to the scheduler in
the above cases (re-queuing them back to the dispatch queue will not give
a real solution in case of a flow of read requests)
- Modifying the scheduler to prefer reads over writes in mixed read/write
scenarios

I hope this makes things a bit clearer. Let me know if you have additional
questions.
>
>> Regarding the packing control trigger, we also tried using a trigger of
>> an amount of write bytes between reads. However, the number of
>> potential packed requests seemed like the reasonable trigger since we
>> would like to activate the packing only when it will be beneficial,
>> regardless of the write request sizes.
> Why? How do you know "when it will be beneficial"? As I mentioned, the
> number of blocks per request would vary over time, and also depends on
> the filesystem. OTOH, even small writes could take a lot longer than
> usual (>500ms) due to garbage collection etc.
>
>
Based on our experiments the write packing is mostly beneficial in long
sequential write scenarios. Therefore, we would still like the write
packing to be enabled in such cases (as long as there are no parallel
reads). The trigger of the number of potential packed requests ensures
that the packing will be enabled in case of a flow of writes.

Can you please explain why you refer to the number of blocks per request
when the write packing solution doesn't take the request size into
account?

Thanks,
Maya Erez


* Re: [PATCH v2 1/1] mmc: block: Add write packing control
@ 2012-07-23 11:43 ` merez
  0 siblings, 0 replies; 44+ messages in thread
From: merez @ 2012-07-23 11:43 UTC (permalink / raw)
  To: Chris Ball
  Cc: merez, Muthu Kumar, linux-mmc, linux-arm-msm, open list, S,
	Venkatraman, Seungwon Jeon

On Wed, July 18, 2012 12:26 am, Chris Ball wrote:
> Hi,  [removing Jens and the documentation list, since now we're
> talking about the MMC side only]
>
> On Wed, Jul 18 2012, merez@codeaurora.org wrote:
>> Is there anything else that holds this patch from being pushed to
>> mmc-next?
>
> Yes, I'm still uncomfortable with the write packing patchsets for a
> couple of reasons, and I suspect that the sum of those reasons means
> that we should probably plan on holding off merging it until after 3.6.
>
> Here are the open issues; please correct any misunderstandings:
>
> With Seungwon's patchset ("Support packed write command"):
>
> * I still don't have a good set of representative benchmarks showing
>   what kind of performance changes come with this patchset.  It seems
>   like we've had a small amount of testing on one controller/eMMC part
>   combo from Seungwon, and an entirely different test from Maya, and
>   the results aren't documented fully anywhere to the level of
>   describing what the hardware was, what the test was, and what the
>   results were before and after the patchset.

Currently, there is only one card vendor that supports packed commands.
Following are our sequential write (LMDD) test results on 2 of our targets
(in MB/s):
                       No packing        packing
Target 1 (SDR 50MHz)     15               25
Target 2 (DDR 50MHz)     20               30

>
> With the reads-during-writes regression:
>
> * Venkat still has open questions about the nature of the read
>   regression, and thinks we should understand it with blktrace before
>   trying to fix it.  Maya has a theory about writes overwhelming reads,
>   but Venkat doesn't understand why this would explain the observed
>   bandwidth drop.

The degradation of read due to writes is not a new behavior and exists
also without the write packing feature (which only increases the
degradation). Our investigation of this phenomenon led us to the
conclusion that a new scheduling policy should be used for mobile
devices, but this is not related to the current discussion of the write
packing feature.

The write packing feature increases the degradation of read due to write
since it allows the MMC to fetch many write requests in a row, instead of
fetching only one at a time.  Therefore some of the read requests will
have to wait for the completion of more write requests before they can be
issued.

To overcome this behavior, the solution would be to stop the write packing
when a read request is fetched, and this is the algorithm suggested by the
write packing control.

Let's also keep in mind that lmdd benchmarking doesn't fully reflect real
life, in which there are not many scenarios that cause massive read and
write operations. In our user-common-scenarios tests we saw that in many
cases the write packing decreases the read latency. This can happen in
cases where the same amount of write requests is fetched with and without
packing. In such a case the write packing decreases the transfer time of
the write requests and causes the read request to wait for a shorter time.

>
> With Maya's patchset ("write packing control"):
>
> * Venkat thinks that HPI should be used, and the number-of-requests
>   metric is too coarse, and it doesn't let you disable packing at the
>   right time, and you're essentially implementing a new I/O scheduler
>   inside the MMC subsystem without understanding the root cause for why
>   that's necessary.

According to our measurements the stop transmission (CMD12) + HPI is a
heavy operation that may take up to several milliseconds. Therefore, a
massive usage of HPI can cause a degradation of performance.
In addition, it doesn’t provide a complete solution for read during write
since it doesn’t solve the problem of “what to do with the interrupted
write request remainder?”. That is, a common interrupting read request
will usually be followed by another one. If we just continue to write the
interrupted write request remainder we will probably get another HPI due
to the second read request, so eventually we may end up with lots of HPIs
and write retries. A complete solution would be: stop the current write,
change packing mode to non-packing, serve the read request, push the
write remainders back to the block I/O scheduler and let it schedule them
again, probably after the read burst ends (this requires block layer
support of course).

Regarding the packing control, there seems to be some confusion, since
the number-of-requests is the trigger for *enabling* the packing (after
it was disabled), while a single read request disables packing.
Therefore, the packing is stopped at the right time.

The packing control doesn't add any scheduling policy to the MMC layer.
The write packing feature is the one changing the scheduling policy by
fetching many write requests in a row without a delay that allows read
requests to come in the middle.
By disabling the write packing, the write packing control returns the old
scheduling policy. It causes the MMC to fetch the requests one by one,
thus read requests are served as before.

It is correct that the trigger for enabling the write packing control
should be adjusted per platform and doesn't give a complete solution. As
I mentioned above, the complete solution will include the usage of write
packing control, re-inserting the packed writes into the scheduler when a
read request is fetched, and usage of HPI to stop a packing that is
already being transferred.

To summarize -
We recommend including the write packing in 3.6 due to the following reasons:
1. It significantly improves the write throughput
2. In some of the cases it even decreases the read latency
3. The read degradation in simultaneous read-write flows already exists,
even without this feature

As for the write packing control, it can be included in 3.6 to supply a
partial solution for the read degradation or we can postpone it to 3.7 and
integrate it as part of the complete solution.

Thanks,
Maya

>
> My sense is that there's no way we can solve all of these to
> satisfaction in the next week (which is when the merge window will
> open), but that by waiting a cycle we might come up with some good
> answers.
>
> What do other people think?  If you're excited about these patchsets,
> now would be a fine time to come forward with your benchmarking results
> and to help understand the reads-during-writes regression.
>
> Thanks!
>
> - Chris.
> --
> Chris Ball   <cjb@laptop.org>   <http://printf.net/>
> One Laptop Per Child
>



* Re: [PATCH v2 1/1] mmc: block: Add write packing control
@ 2012-07-24  8:44 merez
  2012-07-24 20:23   ` merez
  2012-07-26 15:28 ` S, Venkatraman
  0 siblings, 2 replies; 44+ messages in thread
From: merez @ 2012-07-24  8:44 UTC (permalink / raw)
  To: S, Venkatraman
  Cc: merez, Chris Ball, Muthu Kumar, linux-mmc, linux-arm-msm,
	open list, Seungwon Jeon

On Mon, July 23, 2012 5:22 am, S, Venkatraman wrote:
> On Mon, Jul 23, 2012 at 5:13 PM,  <merez@codeaurora.org> wrote:
>> On Wed, July 18, 2012 12:26 am, Chris Ball wrote:
>>> Hi,  [removing Jens and the documentation list, since now we're
>>> talking about the MMC side only]
>>> On Wed, Jul 18 2012, merez@codeaurora.org wrote:
>>>> Is there anything else that holds this patch from being pushed to
>>>> mmc-next?
>>> Yes, I'm still uncomfortable with the write packing patchsets for a
>>> couple of reasons, and I suspect that the sum of those reasons means
>>> that we should probably plan on holding off merging it until after 3.6.
>>> Here are the open issues; please correct any misunderstandings:
>>> With Seungwon's patchset ("Support packed write command"):
>>> * I still don't have a good set of representative benchmarks showing
>>>   what kind of performance changes come with this patchset.  It seems
>>>   like we've had a small amount of testing on one controller/eMMC part
>>>   combo from Seungwon, and an entirely different test from Maya, and
>>>   the results aren't documented fully anywhere to the level of
>>>   describing what the hardware was, what the test was, and what the
>>>   results were before and after the patchset.
>> Currently, there is only one card vendor that supports packed commands.
>> Following are our sequential write (LMDD) test results on 2 of our
>> targets (in MB/s):
>>                        No packing        packing
>> Target 1 (SDR 50MHz)     15               25
>> Target 2 (DDR 50MHz)     20               30
>>> With the reads-during-writes regression:
>>> * Venkat still has open questions about the nature of the read
>>>   regression, and thinks we should understand it with blktrace before
>>>   trying to fix it.  Maya has a theory about writes overwhelming
>>>   reads, but Venkat doesn't understand why this would explain the
>>>   observed bandwidth drop.
>> The degradation of read due to writes is not a new behavior and exists
>> also without the write packing feature (which only increases the
>> degradation). Our investigation of this phenomenon led us to the
>> conclusion that a new scheduling policy should be used for mobile
>> devices, but this is not related to the current discussion of the
>> write packing feature.
>> The write packing feature increases the degradation of read due to
>> write since it allows the MMC to fetch many write requests in a row,
>> instead of fetching only one at a time.  Therefore some of the read
>> requests will have to wait for the completion of more write requests
>> before they can be issued.
>
> I am a bit puzzled by this claim. One thing I checked carefully when
> reviewing write packing patches from SJeon was that the code didn't
> plough through a mixed list of reads and writes and selected only
> writes.
> This section of the code in "mmc_blk_prep_packed_list()", from the v8
> patchset..
> <Quote>
> +               if (rq_data_dir(cur) != rq_data_dir(next)) {
> +                       put_back = 1;
> +                       break;
> +               }
> </Quote>
>
> means that once a read is encountered in the middle of write packing,
> the packing is stopped at that point and it is executed. Then the next
> blk_fetch_request should get the next read and continue as before.
>
> IOW, the ordering of reads and writes is _not_ altered when using
> packed commands.
> For example if there were 5 write requests, followed by 1 read,
> followed by 5 more write requests in the request_queue, the first 5
> writes will be executed as one "packed command", then the read will be
> executed, and then the remaining 5 writes will be executed as one
> "packed command". So the read does not have to wait any more than it
> waited before (packing feature)

Let me try to better explain with your example.
Without packing the MMC layer will fetch 2 write requests and wait for the
first write request completion before fetching another write request.
During this time the read request could be inserted into the CFQ and since
it has higher priority than the async write it will be dispatched in the
next fetch. So, the result would be 2 write requests followed by one read
request and the read would have to wait for completion of only 2 write
requests.
With packing, all the 5 write requests will be fetched in a row, and then
the read will arrive and be dispatched in the next fetch. Then the read
will have to wait for the completion of 5 write requests.

Few more clarifications:
Due to the plug list mechanism in the block layer the applications can
"aggregate" several requests to be inserted into the scheduler before
waking the MMC queue thread.
This leads to a situation where there are several write requests in the
CFQ queue when MMC starts to do the fetches.

If the read was inserted while we are building the packed command then I
agree that we should have seen less effect on the read performance.
However, the write packing statistics show that in most of the cases the
packing stopped due to an empty queue, meaning that the read was inserted
to the CFQ after all the pending write requests were fetched and packed.

Following is an example for write packing statistics of a READ/WRITE
parallel scenario:
write packing statistics:
Packed 1 reqs - 448 times
Packed 2 reqs - 38 times
Packed 3 reqs - 23 times
Packed 4 reqs - 30 times
Packed 5 reqs - 14 times
Packed 6 reqs - 8 times
Packed 7 reqs - 4 times
Packed 8 reqs - 1 times
Packed 10 reqs - 1 times
Packed 34 reqs - 1 times
stopped packing due to the following reasons:
2 times: wrong data direction (meaning a READ was fetched and stopped the
packing)
1 times: flush or discard
565 times: empty queue (meaning blk_fetch_request returned NULL)

>
> And I requested blktrace to confirm that this is indeed the behaviour.

The trace logs show that in case of no packing, a maximum of 3-4 write
requests is issued before a read request, while with packing there are
also cases of 6 and 7 requests dispatched before a read request.

I'm waiting for an approval for sharing the block trace logs.
Since this is a simple test to run you can collect the trace logs and let
us know if you reach other conclusions.

Thanks,
Maya

>
> Your rest of the arguments anyway depend on this assertion, so can you
> please clarify this.
>
>> To overcome this behavior, the solution would be to stop the write
>> packing when a read request is fetched, and this is the algorithm
>> suggested by the write packing control.
>> Let's also keep in mind that lmdd benchmarking doesn't fully reflect
>> real life, in which there are not many scenarios that cause massive
>> read and write operations. In our user-common-scenarios tests we saw
>> that in many cases the write packing decreases the read latency. It
>> can happen in cases where the same amount of write requests is fetched
>> with and without packing. In such a case the write packing decreases
>> the transfer time of the write requests and causes the read request to
>> wait for a shorter time.
>>> With Maya's patchset ("write packing control"):
>>> * Venkat thinks that HPI should be used, and the number-of-requests
>>>   metric is too coarse, and it doesn't let you disable packing at the
>>>   right time, and you're essentially implementing a new I/O scheduler
>>>   inside the MMC subsystem without understanding the root cause for
>>>   why that's necessary.
>> According to our measurements the stop transmission (CMD12) + HPI is a
>> heavy operation that may take up to several milliseconds. Therefore, a
>> massive usage of HPI can cause a degradation of performance.
>> In addition, it doesn’t provide a complete solution for read during
>> write since it doesn’t solve the problem of “what to do with the
>> interrupted write request remainder?”.  That is, a common interrupting
>> read request will usually be followed by another one. If we just
>> continue to write the interrupted write request remainder we will
>> probably get another HPI due to the second read request, so eventually
>> we may end up with lots of HPIs and write retries. A complete solution
>> would be: stop the current write, change packing mode to non-packing,
>> serve the read request, push the write remainders back to the block
>> I/O scheduler and let it schedule them again, probably after the read
>> burst ends (this requires block layer support of course).
>> Regarding the packing control, there seems to be some confusion: the
>> number-of-requests is the trigger for *enabling* the packing (after it
>> was disabled), while a single read request disables packing. Therefore,
>> the packing is stopped at the right time.
>> The packing control doesn't add any scheduling policy to the MMC layer.
>> The write packing feature is the one changing the scheduling policy, by
>> fetching many write requests in a row without a delay that allows read
>> requests to come in the middle.
>> By disabling the write packing, the write packing control restores the
>> old scheduling policy. It causes the MMC to fetch the requests one by
>> one, thus read requests are served as before.
>> It is correct that the trigger for enabling the write packing control
>> should be adjusted per platform and doesn't give a complete solution.
>> As I mentioned above, the complete solution will include the usage of
>> write packing control, a re-insert of the packed writes into the
>> scheduler when a read request is fetched, and usage of HPI to stop a
>> packed write that is already being transferred.
>> To summarize -
>> We recommend including the write packing in 3.6 for the following
>> reasons:
>> 1. It significantly improves the write throughput
>> 2. In some of the cases it even decreases the read latency
>> 3. The read degradation in simultaneous read-write flows already
>>    exists, even without this feature
>> As for the write packing control, it can be included in 3.6 to supply
>> a partial solution for the read degradation, or we can postpone it to
>> 3.7 and integrate it as part of the complete solution.
>> Thanks,
>> Maya
>>> My sense is that there's no way we can solve all of these to
>>> satisfaction in the next week (which is when the merge window will
>>> open), but that by waiting a cycle we might come up with some good
>>> answers.
>>> What do other people think?  If you're excited about these patchsets,
>>> now would be a fine time to come forward with your benchmarking
>>> results and to help understand the reads-during-writes regression.
>


-- 
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


end of thread, other threads:[~2012-09-06  5:17 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-01 18:55 [PATCH v2 0/1] mmc: block: Add write packing control Maya Erez
2012-06-01 18:55 ` [PATCH v2 1/1] " Maya Erez
2012-06-01 18:55   ` Maya Erez
2012-06-08  9:37   ` Seungwon Jeon
2012-06-09 14:46     ` merez
2012-06-11  9:10       ` Seungwon Jeon
2012-06-11 13:55         ` merez
2012-06-11 14:39           ` S, Venkatraman
2012-06-11 20:10             ` merez
2012-06-12  4:16               ` S, Venkatraman
2012-06-11 17:19       ` S, Venkatraman
2012-06-11 20:19         ` merez
2012-06-12  4:07           ` S, Venkatraman
2012-06-11 21:19   ` Muthu Kumar
2012-06-12  0:28     ` Muthu Kumar
2012-06-12 20:08       ` merez
2012-06-13 19:52       ` merez
2012-06-13 22:21         ` Muthu Kumar
2012-06-14  7:46           ` merez
2012-07-14 19:12           ` Chris Ball
2012-07-16  1:49             ` Muthu Kumar
2012-07-16  2:46               ` Chris Ball
2012-07-16 16:44                 ` Muthu Kumar
2012-07-17 22:50                   ` Chris Ball
2012-07-18  6:34                     ` merez
2012-07-18  7:26                       ` Chris Ball
2012-07-18  7:26                         ` Chris Ball
2012-07-19  0:33                     ` Muthu Kumar
2012-07-17  4:15                 ` S, Venkatraman
2012-06-12 19:05 merez
2012-07-23 11:43 merez
2012-07-23 11:43 ` merez
2012-07-23 12:22 ` S, Venkatraman
2012-07-24  8:44 merez
2012-07-24 20:23 ` merez
2012-07-24 20:23   ` merez
2012-07-24 20:52   ` merez
2012-07-24 20:52     ` merez
2012-07-26 15:28 ` S, Venkatraman
2012-07-26 18:54   ` merez
2012-07-27  9:07     ` S, Venkatraman
2012-08-27 18:28       ` merez
2012-08-28 17:40         ` S, Venkatraman
2012-09-06  5:17           ` merez
