linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE
@ 2015-09-23  1:54 Grant Grundler
  2015-09-24 17:39 ` Grant Grundler
  0 siblings, 1 reply; 15+ messages in thread
From: Grant Grundler @ 2015-09-23  1:54 UTC (permalink / raw)
  To: Jens Axboe, Ulf Hansson; +Cc: LKML, linux-mmc

Jens, Ulf,

I've run into a basic issue: BLK_SECDISCARD takes 15-35 minutes
perform a secure erase of ~23GB (mostly empty) partition on a 32GB
eMMC part (happens with two vendors). One of the vendors says it
should take less than 60 seconds. I've confirmed erasing 2GB takes
only ~6 seconds - so the vendor estimate seems reasonable.

I'm looking for (a) advice on better v3.18 solutions and (b) start
discuss future solutions for upstream kernel.

Two problems:
1) 3.18 kernel block layer wants to split up the transaction into
"max_discard_sectors" chunks. This would be fine if
max_discard_sectors was more than one or two erase groups (EG). For
one of the eMMC parts, EG size is 512KB and thus the kernel sends
46,000 commands (DISCARD w/SECURE_ERASE arg) to the eMMC device. At
the bottom I discuss why this is such a bad idea for SECURE_ERASE.

My hack to fix (1): ignore max_discard_sectors for SECURE_ERASE
(BLK_SECDISCARD):
    https://chromium-review.googlesource.com/#/c/301460/

I'm trying to recover the 9 bits that blk_rq_sectors() discards and
don't care sub-512byte block offsets.


2) Unfortunately, with the above hack, the block layer is truncating
the request to 2GB. :( Turns out several fields in the block layer are
32-bit. Notably:
    struct request.__data_len
    struct bvec_iter.bi_size
    struct bio_vec.bv_len

This is despite ioctl(BLK_SECDISCARD) having a 64-bit API to pass in
"number of bytes to erase" and the emmc spec allowing 32-bits to
specify the LBA. I looked at changing the above data structures (which
normally only need  20-bits or so) to 64-bit and it's a systemic
change. "Non-trivial" doesn't even begin to describe the scope here.

So I have an even worse hack to use 9 "zero" bits here (and this works
for 3.18 - not sure it would for 4.x kernels):
    https://chromium-review.googlesource.com/#/c/301349

9-bits gives me a bit of time until eMMC devices are > 1TB in
capacity. Maybe a few more months. :)


Last code change: I'm also not sure I need to set MMC_CAP2_HC_ERASE_SZ
but I do in this 3rd patch:
   https://chromium-review.googlesource.com/#/c/301461

sdhci-acpi and sdhci-pci drivers already do this. I didn't see an
obvious place to set this for the sdhci-tegra variant or what that
code is required to do/support when setting HC_ERASE_SZ capability.



--- Why is splitting up BLK_SECDISCARD such a bad idea? ----
Command overhead in the degenerate case is bad and probably causing
the majority of the performance loss here. I'm assuming the device can
issue lots of erase transaction in parallel. Thus sending a secure
erase command for each erase group of LBAs seems like a recipe for
very long serial sequence (which is what we are seeing).

I believe there is another problem besides command overhead: migration
of "live" data from one erase group (that we just told the device it
has to erase) to another erase group (that we will tell the device to
erase in the future).

If I'm going to erase 16GB of 32GB device, we have NO clue what data
has to move and where it will move to.  However, If I tell the device
to evacuate and erase a particular physical block, the more chunks I
have to erase, the more opportunity for the evacuated data to land in
an erase group that will get erased in a future command.

Lastly, for Android, the system is in recovery mode and doing nothing
else. So there is no reason to split up the commands into "smaller
chunks" like a normal IO (where it clearly makes sense up to a point).
Secure Erase performance is critical here. The longer Secure Erase
takes, the less likely people will do it. And I don't think the world
needs more papers on linux products that don't properly erase data.

cheers,
grant

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE
  2015-09-23  1:54 RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE Grant Grundler
@ 2015-09-24 17:39 ` Grant Grundler
       [not found]   ` <CANEJEGukOKUgRJNPVaeEXpx-REzj8ooSfWdrnRMMCQr6EVaNqQ@mail.gmail.com>
  0 siblings, 1 reply; 15+ messages in thread
From: Grant Grundler @ 2015-09-24 17:39 UTC (permalink / raw)
  To: Grant Grundler; +Cc: Jens Axboe, Ulf Hansson, LKML, linux-mmc

Some followup.

1) I re-read the original proposals to support TRIM/DISCARD here:
    https://lwn.net/Articles/293658/

Note the original API asked for "unsigned nr_sects" - which is
essentially what my proposed hack does.  I'll need to find out
where/why this got dropped.

2) I've been able to test this hack on an eMMC device:
[   13.147747] mmc..._secdiscard_rq(mmc1) ERASE from 14116864 cnt
0x2c00000 (size 22528 MiB)
[   13.155964] sdhci cmd: 35/0x1a arg 0xd76800
[   13.160266] sdhci cmd: 36/0x1a arg 0x39767ff
[   13.164593] sdhci cmd: 38/0x1b arg 0x80000000
[   13.803360] random: nonblocking pool is initialized
[   14.567735] sdhci cmd: 13/0x1a arg 0x10000
[   14.573324] mmc..._secdiscard_rq(mmc1) err 0

This was with ~15K files and about 5GB written to the device. 1.4
seconds compared to about 20 minutes to secure erase the same region
with original v3.18 code.

3) I've started reviewing all references to REQ_DISCARD. SCSI
subsystem cripples "count" to 32 bits as well (See
skd_prep_discard_cdb()) and that's not going to be easy to fix. :(
A few other places that use blk_req_sectors() are easily replaced with
blk_req_bytes(). I'm not sure I need to do anything about references
to bio_iter.bi_size.

cheers,
grant

On Tue, Sep 22, 2015 at 6:54 PM, Grant Grundler <grundler@chromium.org> wrote:
> Jens, Ulf,
>
> I've run into a basic issue: BLK_SECDISCARD takes 15-35 minutes
> perform a secure erase of ~23GB (mostly empty) partition on a 32GB
> eMMC part (happens with two vendors). One of the vendors says it
> should take less than 60 seconds. I've confirmed erasing 2GB takes
> only ~6 seconds - so the vendor estimate seems reasonable.
>
> I'm looking for (a) advice on better v3.18 solutions and (b) start
> discuss future solutions for upstream kernel.
>
> Two problems:
> 1) 3.18 kernel block layer wants to split up the transaction into
> "max_discard_sectors" chunks. This would be fine if
> max_discard_sectors was more than one or two erase groups (EG). For
> one of the eMMC parts, EG size is 512KB and thus the kernel sends
> 46,000 commands (DISCARD w/SECURE_ERASE arg) to the eMMC device. At
> the bottom I discuss why this is such a bad idea for SECURE_ERASE.
>
> My hack to fix (1): ignore max_discard_sectors for SECURE_ERASE
> (BLK_SECDISCARD):
>     https://chromium-review.googlesource.com/#/c/301460/
>
> I'm trying to recover the 9 bits that blk_rq_sectors() discards and
> don't care sub-512byte block offsets.
>
>
> 2) Unfortunately, with the above hack, the block layer is truncating
> the request to 2GB. :( Turns out several fields in the block layer are
> 32-bit. Notably:
>     struct request.__data_len
>     struct bvec_iter.bi_size
>     struct bio_vec.bv_len
>
> This is despite ioctl(BLK_SECDISCARD) having a 64-bit API to pass in
> "number of bytes to erase" and the emmc spec allowing 32-bits to
> specify the LBA. I looked at changing the above data structures (which
> normally only need  20-bits or so) to 64-bit and it's a systemic
> change. "Non-trivial" doesn't even begin to describe the scope here.
>
> So I have an even worse hack to use 9 "zero" bits here (and this works
> for 3.18 - not sure it would for 4.x kernels):
>     https://chromium-review.googlesource.com/#/c/301349
>
> 9-bits gives me a bit of time until eMMC devices are > 1TB in
> capacity. Maybe a few more months. :)
>
>
> Last code change: I'm also not sure I need to set MMC_CAP2_HC_ERASE_SZ
> but I do in this 3rd patch:
>    https://chromium-review.googlesource.com/#/c/301461
>
> sdhci-acpi and sdhci-pci drivers already do this. I didn't see an
> obvious place to set this for the sdhci-tegra variant or what that
> code is required to do/support when setting HC_ERASE_SZ capability.
>
>
>
> --- Why is splitting up BLK_SECDISCARD such a bad idea? ----
> Command overhead in the degenerate case is bad and probably causing
> the majority of the performance loss here. I'm assuming the device can
> issue lots of erase transaction in parallel. Thus sending a secure
> erase command for each erase group of LBAs seems like a recipe for
> very long serial sequence (which is what we are seeing).
>
> I believe there is another problem besides command overhead: migration
> of "live" data from one erase group (that we just told the device it
> has to erase) to another erase group (that we will tell the device to
> erase in the future).
>
> If I'm going to erase 16GB of 32GB device, we have NO clue what data
> has to move and where it will move to.  However, If I tell the device
> to evacuate and erase a particular physical block, the more chunks I
> have to erase, the more opportunity for the evacuated data to land in
> an erase group that will get erased in a future command.
>
> Lastly, for Android, the system is in recovery mode and doing nothing
> else. So there is no reason to split up the commands into "smaller
> chunks" like a normal IO (where it clearly makes sense up to a point).
> Secure Erase performance is critical here. The longer Secure Erase
> takes, the less likely people will do it. And I don't think the world
> needs more papers on linux products that don't properly erase data.
>
> cheers,
> grant

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Fwd: RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE
       [not found]   ` <CANEJEGukOKUgRJNPVaeEXpx-REzj8ooSfWdrnRMMCQr6EVaNqQ@mail.gmail.com>
@ 2015-09-28 21:45     ` Grant Grundler
  2015-09-29 12:56       ` Gwendal Grignou
  2015-10-20 17:53       ` Grant Grundler
  0 siblings, 2 replies; 15+ messages in thread
From: Grant Grundler @ 2015-09-28 21:45 UTC (permalink / raw)
  To: Jens Axboe, Ulf Hansson, LKML, linux-mmc

[resending...I forgot to switch gmail back to text-only mode. grrrh..]

---------- Forwarded message ----------
From: Grant Grundler <grundler@chromium.org>
Date: Mon, Sep 28, 2015 at 2:42 PM
Subject: Re: RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE
To: Grant Grundler <grundler@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>, Ulf Hansson
<ulf.hansson@linaro.org>, LKML <linux-kernel@vger.kernel.org>,
"linux-mmc@vger.kernel.org" <linux-mmc@vger.kernel.org>


On Thu, Sep 24, 2015 at 10:39 AM, Grant Grundler <grundler@chromium.org> wrote:
>
> Some followup.
...
>
> 2) I've been able to test this hack on an eMMC device:
> [   13.147747] mmc..._secdiscard_rq(mmc1) ERASE from 14116864 cnt
> 0x2c00000 (size 22528 MiB)
> [   13.155964] sdhci cmd: 35/0x1a arg 0xd76800
> [   13.160266] sdhci cmd: 36/0x1a arg 0x39767ff
> [   13.164593] sdhci cmd: 38/0x1b arg 0x80000000
> [   13.803360] random: nonblocking pool is initialized
> [   14.567735] sdhci cmd: 13/0x1a arg 0x10000
> [   14.573324] mmc..._secdiscard_rq(mmc1) err 0
>
> This was with ~15K files and about 5GB written to the device. 1.4
> seconds compared to about 20 minutes to secure erase the same region
> with original v3.18 code.


To put a few more numbers on the "chunk size vs perf":
 1EG (512KB) -> 44K commands -> ~20 minutes
32EG (16MB) -> 1375 commands -> ~1 minute
128EG (64MB) -> 344 commands -> ~30 seconds
8191EG (~4GB) -> 6 commands -> 2 seconds + ~8 seconds mkfs
(I'm assuming times above include about 6-10 seconds of mkfs as part
of writing a new file system)

This is with only ~300MB of data written to the partition. I'm fully
aware that times will vary depending on how much data needs to be
migrated (and in this case very little or none). I'm certain the
difference will only get worse for the smaller the "chunk size" used
to Secure Erase due to repeated data migration.

Given the different use model for secure erase (legal/contractually
required behavior), is using 4GB chunk size acceptable?

Would anyone be terribly offended if I used the recently added
"MMC_IOC_MULTI_CMD" to send the cmd 35/36/38 sequence to the eMMC
device to securely erase the offending partition?

thanks,
grant

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE
  2015-09-28 21:45     ` Fwd: " Grant Grundler
@ 2015-09-29 12:56       ` Gwendal Grignou
  2015-10-20 17:53       ` Grant Grundler
  1 sibling, 0 replies; 15+ messages in thread
From: Gwendal Grignou @ 2015-09-29 12:56 UTC (permalink / raw)
  To: Grant Grundler; +Cc: Jens Axboe, Ulf Hansson, LKML, linux-mmc

The issue is mmc_init_erase() needs to be updated:
On modern eMMC, we set pref_erase to >ext_csd.hc_erase_size.
hc_erase_size, aka High Capacity Erase Group Size, is the minimum
amount of data that can be erased on that device, not the optimal
amount. It is used to be sure a partition ends on group boundaries.
That why we ended up with a preg_size at 512kB.
If hc_erase_size was not defined, we would have pref_erase size to
4MB, because the device 1GB or bigger. Given new eMMC can be found in
128GB capacity, we should increase the scale, maybe 16MB over 4GB and
64MB over 16GB.

Gwendal.

On Mon, Sep 28, 2015 at 2:45 PM, Grant Grundler <grundler@chromium.org> wrote:
> [resending...I forgot to switch gmail back to text-only mode. grrrh..]
>
> ---------- Forwarded message ----------
> From: Grant Grundler <grundler@chromium.org>
> Date: Mon, Sep 28, 2015 at 2:42 PM
> Subject: Re: RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE
> To: Grant Grundler <grundler@chromium.org>
> Cc: Jens Axboe <axboe@kernel.dk>, Ulf Hansson
> <ulf.hansson@linaro.org>, LKML <linux-kernel@vger.kernel.org>,
> "linux-mmc@vger.kernel.org" <linux-mmc@vger.kernel.org>
>
>
> On Thu, Sep 24, 2015 at 10:39 AM, Grant Grundler <grundler@chromium.org> wrote:
>>
>> Some followup.
> ...
>>
>> 2) I've been able to test this hack on an eMMC device:
>> [   13.147747] mmc..._secdiscard_rq(mmc1) ERASE from 14116864 cnt
>> 0x2c00000 (size 22528 MiB)
>> [   13.155964] sdhci cmd: 35/0x1a arg 0xd76800
>> [   13.160266] sdhci cmd: 36/0x1a arg 0x39767ff
>> [   13.164593] sdhci cmd: 38/0x1b arg 0x80000000
>> [   13.803360] random: nonblocking pool is initialized
>> [   14.567735] sdhci cmd: 13/0x1a arg 0x10000
>> [   14.573324] mmc..._secdiscard_rq(mmc1) err 0
>>
>> This was with ~15K files and about 5GB written to the device. 1.4
>> seconds compared to about 20 minutes to secure erase the same region
>> with original v3.18 code.
>
>
> To put a few more numbers on the "chunk size vs perf":
>  1EG (512KB) -> 44K commands -> ~20 minutes
> 32EG (16MB) -> 1375 commands -> ~1 minute
> 128EG (64MB) -> 344 commands -> ~30 seconds
> 8191EG (~4GB) -> 6 commands -> 2 seconds + ~8 seconds mkfs
> (I'm assuming times above include about 6-10 seconds of mkfs as part
> of writing a new file system)
>
> This is with only ~300MB of data written to the partition. I'm fully
> aware that times will vary depending on how much data needs to be
> migrated (and in this case very little or none). I'm certain the
> difference will only get worse for the smaller the "chunk size" used
> to Secure Erase due to repeated data migration.
>
> Given the different use model for secure erase (legal/contractually
> required behavior), is using 4GB chunk size acceptable?
>
> Would anyone be terribly offended if I used the recently added
> "MMC_IOC_MULTI_CMD" to send the cmd 35/36/38 sequence to the eMMC
> device to securely erase the offending partition?
>
> thanks,
> grant
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE
  2015-09-28 21:45     ` Fwd: " Grant Grundler
  2015-09-29 12:56       ` Gwendal Grignou
@ 2015-10-20 17:53       ` Grant Grundler
  2015-10-20 18:57         ` Jeff Moyer
  1 sibling, 1 reply; 15+ messages in thread
From: Grant Grundler @ 2015-10-20 17:53 UTC (permalink / raw)
  To: Ulf Hansson, Jens Axboe, linux-mmc; +Cc: LKML, Gwendal Grignou

Ping? Does no one care how long BLK_SECDISCARD takes?

ChromeOS has landed this change as a compromise between "fast" (<10
seconds) and "minimize risk" (~90 seconds) for a 23GB partition on
eMMC:
    https://chromium-review.googlesource.com/#/c/302413/

This is a generic problem if we care about data privacy since
consumers won't expect a "secure erase" operation to take 1/2h or more
and think the device is hung.

cheers,
grant

On Mon, Sep 28, 2015 at 2:45 PM, Grant Grundler <grundler@chromium.org> wrote:
> [resending...I forgot to switch gmail back to text-only mode. grrrh..]
>
> ---------- Forwarded message ----------
> From: Grant Grundler <grundler@chromium.org>
> Date: Mon, Sep 28, 2015 at 2:42 PM
> Subject: Re: RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE
> To: Grant Grundler <grundler@chromium.org>
> Cc: Jens Axboe <axboe@kernel.dk>, Ulf Hansson
> <ulf.hansson@linaro.org>, LKML <linux-kernel@vger.kernel.org>,
> "linux-mmc@vger.kernel.org" <linux-mmc@vger.kernel.org>
>
>
> On Thu, Sep 24, 2015 at 10:39 AM, Grant Grundler <grundler@chromium.org> wrote:
>>
>> Some followup.
> ...
>>
>> 2) I've been able to test this hack on an eMMC device:
>> [   13.147747] mmc..._secdiscard_rq(mmc1) ERASE from 14116864 cnt
>> 0x2c00000 (size 22528 MiB)
>> [   13.155964] sdhci cmd: 35/0x1a arg 0xd76800
>> [   13.160266] sdhci cmd: 36/0x1a arg 0x39767ff
>> [   13.164593] sdhci cmd: 38/0x1b arg 0x80000000
>> [   13.803360] random: nonblocking pool is initialized
>> [   14.567735] sdhci cmd: 13/0x1a arg 0x10000
>> [   14.573324] mmc..._secdiscard_rq(mmc1) err 0
>>
>> This was with ~15K files and about 5GB written to the device. 1.4
>> seconds compared to about 20 minutes to secure erase the same region
>> with original v3.18 code.
>
>
> To put a few more numbers on the "chunk size vs perf":
>  1EG (512KB) -> 44K commands -> ~20 minutes
> 32EG (16MB) -> 1375 commands -> ~1 minute
> 128EG (64MB) -> 344 commands -> ~30 seconds
> 8191EG (~4GB) -> 6 commands -> 2 seconds + ~8 seconds mkfs
> (I'm assuming times above include about 6-10 seconds of mkfs as part
> of writing a new file system)
>
> This is with only ~300MB of data written to the partition. I'm fully
> aware that times will vary depending on how much data needs to be
> migrated (and in this case very little or none). I'm certain the
> difference will only get worse for the smaller the "chunk size" used
> to Secure Erase due to repeated data migration.
>
> Given the different use model for secure erase (legal/contractually
> required behavior), is using 4GB chunk size acceptable?
>
> Would anyone be terribly offended if I used the recently added
> "MMC_IOC_MULTI_CMD" to send the cmd 35/36/38 sequence to the eMMC
> device to securely erase the offending partition?
>
> thanks,
> grant

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE
  2015-10-20 17:53       ` Grant Grundler
@ 2015-10-20 18:57         ` Jeff Moyer
  2015-10-21  9:00           ` Ulf Hansson
  2015-10-21 17:38           ` RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE Grant Grundler
  0 siblings, 2 replies; 15+ messages in thread
From: Jeff Moyer @ 2015-10-20 18:57 UTC (permalink / raw)
  To: Grant Grundler; +Cc: Ulf Hansson, Jens Axboe, linux-mmc, LKML, Gwendal Grignou

Hi Grant,

Grant Grundler <grundler@chromium.org> writes:

> Ping? Does no one care how long BLK_SECDISCARD takes?
>
> ChromeOS has landed this change as a compromise between "fast" (<10
> seconds) and "minimize risk" (~90 seconds) for a 23GB partition on
> eMMC:
>     https://chromium-review.googlesource.com/#/c/302413/

Including the patch would be helpful.  I believe this is it.  My
comments are inline.

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 8411be3..43943c7 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c

@@ -60,21 +60,37 @@
 	granularity = max(q->limits.discard_granularity >> 9, 1U);
 	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
 
-	/*
-	 * Ensure that max_discard_sectors is of the proper
-	 * granularity, so that requests stay aligned after a split.
-	 */
-	max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
-	max_discard_sectors -= max_discard_sectors % granularity;
-	if (unlikely(!max_discard_sectors)) {
-		/* Avoid infinite loop below. Being cautious never hurts. */
-		return -EOPNOTSUPP;
-	}
+	max_discard_sectors = min(q->limits.max_discard_sectors,
+						UINT_MAX >> 9);

Unnecessary reformatting.
 
 	if (flags & BLKDEV_DISCARD_SECURE) {
 		if (!blk_queue_secdiscard(q))
 			return -EOPNOTSUPP;
 		type |= REQ_SECURE;
+		/*
+		 * Secure erase performs better by telling the device
+		 * about the largest range possible.  Secure erase
+		 * piecemeal will likely result in mapped sectors
+		 * getting evacuated from one range and parked in
+		 * another range that will get erased by a future
+		 * erase command.  This does NOT happen for normal
+		 * TRIM or DISCARD operations.
+		 *
+		 * 32GB was a compromise to avoid blocking the device
+		 * for potentially minute(s) at a time.
+		 */
+		if (max_discard_sectors < (1 << (25-9)))	/* 32GiB */
+			max_discard_sectors = 1 << (25-9);

And here you're ignoring q->limits.max_discard_sectors.  I'm surprised
this worked!

+	}
+
+	/*
+	 * Ensure that max_discard_sectors is of the proper
+	 * granularity, so that requests stay aligned after a split.
+	 */
+	max_discard_sectors -= max_discard_sectors % granularity;
+	if (unlikely(!max_discard_sectors)) {
+		/* Avoid infinite loop below. Being cautious never hurts. */
+		return -EOPNOTSUPP;
 	}
 
 	atomic_set(&bb.done, 1);

Grant, can we start over with the problem description? (Sorry, I didn't
see the previous posts.)  I'd like to know the values of discard_granularity
and discard_max_bytes for your device.  Additionally, it would be
interesting to know how the discards are being initiatied.  Is it via a
userspace utility such as mkfs, online discard via some file system
mounted with -o discard, or something else?  Finally, can you post
binary blktrace data somewhere for the slow case?

Thanks!
Jeff




> On Mon, Sep 28, 2015 at 2:45 PM, Grant Grundler <grundler@chromium.org> wrote:
>> [resending...I forgot to switch gmail back to text-only mode. grrrh..]
>>
>> ---------- Forwarded message ----------
>> From: Grant Grundler <grundler@chromium.org>
>> Date: Mon, Sep 28, 2015 at 2:42 PM
>> Subject: Re: RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE
>> To: Grant Grundler <grundler@chromium.org>
>> Cc: Jens Axboe <axboe@kernel.dk>, Ulf Hansson
>> <ulf.hansson@linaro.org>, LKML <linux-kernel@vger.kernel.org>,
>> "linux-mmc@vger.kernel.org" <linux-mmc@vger.kernel.org>
>>
>>
>> On Thu, Sep 24, 2015 at 10:39 AM, Grant Grundler <grundler@chromium.org> wrote:
>>>
>>> Some followup.
>> ...
>>>
>>> 2) I've been able to test this hack on an eMMC device:
>>> [   13.147747] mmc..._secdiscard_rq(mmc1) ERASE from 14116864 cnt
>>> 0x2c00000 (size 22528 MiB)
>>> [   13.155964] sdhci cmd: 35/0x1a arg 0xd76800
>>> [   13.160266] sdhci cmd: 36/0x1a arg 0x39767ff
>>> [   13.164593] sdhci cmd: 38/0x1b arg 0x80000000
>>> [   13.803360] random: nonblocking pool is initialized
>>> [   14.567735] sdhci cmd: 13/0x1a arg 0x10000
>>> [   14.573324] mmc..._secdiscard_rq(mmc1) err 0
>>>
>>> This was with ~15K files and about 5GB written to the device. 1.4
>>> seconds compared to about 20 minutes to secure erase the same region
>>> with original v3.18 code.
>>
>>
>> To put a few more numbers on the "chunk size vs perf":
>>  1EG (512KB) -> 44K commands -> ~20 minutes
>> 32EG (16MB) -> 1375 commands -> ~1 minute
>> 128EG (64MB) -> 344 commands -> ~30 seconds
>> 8191EG (~4GB) -> 6 commands -> 2 seconds + ~8 seconds mkfs
>> (I'm assuming times above include about 6-10 seconds of mkfs as part
>> of writing a new file system)
>>
>> This is with only ~300MB of data written to the partition. I'm fully
>> aware that times will vary depending on how much data needs to be
>> migrated (and in this case very little or none). I'm certain the
>> difference will only get worse for the smaller the "chunk size" used
>> to Secure Erase due to repeated data migration.
>>
>> Given the different use model for secure erase (legal/contractually
>> required behavior), is using 4GB chunk size acceptable?
>>
>> Would anyone be terribly offended if I used the recently added
>> "MMC_IOC_MULTI_CMD" to send the cmd 35/36/38 sequence to the eMMC
>> device to securely erase the offending partition?
>>
>> thanks,
>> grant
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE
  2015-10-20 18:57         ` Jeff Moyer
@ 2015-10-21  9:00           ` Ulf Hansson
  2015-10-21 17:06             ` Grant Grundler
  2015-10-28 22:15             ` Jeff Moyer
  2015-10-21 17:38           ` RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE Grant Grundler
  1 sibling, 2 replies; 15+ messages in thread
From: Ulf Hansson @ 2015-10-21  9:00 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Grant Grundler, Jens Axboe, linux-mmc, LKML, Gwendal Grignou

On 20 October 2015 at 20:57, Jeff Moyer <jmoyer@redhat.com> wrote:
> Hi Grant,
>
> Grant Grundler <grundler@chromium.org> writes:
>
>> Ping? Does no one care how long BLK_SECDISCARD takes?
>>
>> ChromeOS has landed this change as a compromise between "fast" (<10
>> seconds) and "minimize risk" (~90 seconds) for a 23GB partition on
>> eMMC:
>>     https://chromium-review.googlesource.com/#/c/302413/
>
> Including the patch would be helpful.  I believe this is it.  My
> comments are inline.
>
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 8411be3..43943c7 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
>
> @@ -60,21 +60,37 @@
>         granularity = max(q->limits.discard_granularity >> 9, 1U);
>         alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
>
> -       /*
> -        * Ensure that max_discard_sectors is of the proper
> -        * granularity, so that requests stay aligned after a split.
> -        */
> -       max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
> -       max_discard_sectors -= max_discard_sectors % granularity;
> -       if (unlikely(!max_discard_sectors)) {
> -               /* Avoid infinite loop below. Being cautious never hurts. */
> -               return -EOPNOTSUPP;
> -       }
> +       max_discard_sectors = min(q->limits.max_discard_sectors,
> +                                               UINT_MAX >> 9);
>
> Unnecessary reformatting.
>
>         if (flags & BLKDEV_DISCARD_SECURE) {
>                 if (!blk_queue_secdiscard(q))
>                         return -EOPNOTSUPP;
>                 type |= REQ_SECURE;
> +               /*
> +                * Secure erase performs better by telling the device
> +                * about the largest range possible.  Secure erase
> +                * piecemeal will likely result in mapped sectors
> +                * getting evacuated from one range and parked in
> +                * another range that will get erased by a future
> +                * erase command.  This does NOT happen for normal
> +                * TRIM or DISCARD operations.
> +                *
> +                * 32GB was a compromise to avoid blocking the device
> +                * for potentially minute(s) at a time.
> +                */
> +               if (max_discard_sectors < (1 << (25-9)))        /* 32GiB */
> +                       max_discard_sectors = 1 << (25-9);
>
> And here you're ignoring q->limits.max_discard_sectors.  I'm surprised
> this worked!
>
> +       }
> +
> +       /*
> +        * Ensure that max_discard_sectors is of the proper
> +        * granularity, so that requests stay aligned after a split.
> +        */
> +       max_discard_sectors -= max_discard_sectors % granularity;
> +       if (unlikely(!max_discard_sectors)) {
> +               /* Avoid infinite loop below. Being cautious never hurts. */
> +               return -EOPNOTSUPP;
>         }
>
>         atomic_set(&bb.done, 1);
>
> Grant, can we start over with the problem description? (Sorry, I didn't
> see the previous posts.)  I'd like to know the values of discard_granularity
> and discard_max_bytes for your device.  Additionally, it would be
> interesting to know how the discards are being initiatied.  Is it via a
> userspace utility such as mkfs, online discard via some file system
> mounted with -o discard, or something else?  Finally, can you post
> binary blktrace data somewhere for the slow case?
>
> Thanks!
> Jeff
>
>
>
>
>> On Mon, Sep 28, 2015 at 2:45 PM, Grant Grundler <grundler@chromium.org> wrote:
>>> [resending...I forgot to switch gmail back to text-only mode. grrrh..]
>>>
>>> ---------- Forwarded message ----------
>>> From: Grant Grundler <grundler@chromium.org>
>>> Date: Mon, Sep 28, 2015 at 2:42 PM
>>> Subject: Re: RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE
>>> To: Grant Grundler <grundler@chromium.org>
>>> Cc: Jens Axboe <axboe@kernel.dk>, Ulf Hansson
>>> <ulf.hansson@linaro.org>, LKML <linux-kernel@vger.kernel.org>,
>>> "linux-mmc@vger.kernel.org" <linux-mmc@vger.kernel.org>
>>>
>>>
>>> On Thu, Sep 24, 2015 at 10:39 AM, Grant Grundler <grundler@chromium.org> wrote:
>>>>
>>>> Some followup.
>>> ...
>>>>
>>>> 2) I've been able to test this hack on an eMMC device:
>>>> [   13.147747] mmc..._secdiscard_rq(mmc1) ERASE from 14116864 cnt
>>>> 0x2c00000 (size 22528 MiB)
>>>> [   13.155964] sdhci cmd: 35/0x1a arg 0xd76800
>>>> [   13.160266] sdhci cmd: 36/0x1a arg 0x39767ff
>>>> [   13.164593] sdhci cmd: 38/0x1b arg 0x80000000
>>>> [   13.803360] random: nonblocking pool is initialized
>>>> [   14.567735] sdhci cmd: 13/0x1a arg 0x10000
>>>> [   14.573324] mmc..._secdiscard_rq(mmc1) err 0
>>>>
>>>> This was with ~15K files and about 5GB written to the device. 1.4
>>>> seconds compared to about 20 minutes to secure erase the same region
>>>> with original v3.18 code.
>>>
>>>
>>> To put a few more numbers on the "chunk size vs perf":
>>>  1EG (512KB) -> 44K commands -> ~20 minutes
>>> 32EG (16MB) -> 1375 commands -> ~1 minute
>>> 128EG (64MB) -> 344 commands -> ~30 seconds
>>> 8191EG (~4GB) -> 6 commands -> 2 seconds + ~8 seconds mkfs
>>> (I'm assuming times above include about 6-10 seconds of mkfs as part
>>> of writing a new file system)
>>>
>>> This is with only ~300MB of data written to the partition. I'm fully
>>> aware that times will vary depending on how much data needs to be
>>> migrated (and in this case very little or none). I'm certain the
>>> difference will only get worse for the smaller the "chunk size" used
>>> to Secure Erase due to repeated data migration.
>>>
>>> Given the different use model for secure erase (legal/contractually
>>> required behavior), is using 4GB chunk size acceptable?
>>>
>>> Would anyone be terribly offended if I used the recently added
>>> "MMC_IOC_MULTI_CMD" to send the cmd 35/36/38 sequence to the eMMC
>>> device to securely erase the offending partition?
>>>
>>> thanks,
>>> grant
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/

I am not sure if this issue is the same as been discussed earlier on
the mmc list regarding "discard/erase".

Anyway, there have been several attempts to fix bugs related to this.
One of these discussion kind of pointed out a viable solution, but
unfortunate no patches that adopts that solution have been posted yet.

You might want to read up on this.
https://www.mail-archive.com/linux-mmc@vger.kernel.org/msg23643.html
http://linux-mmc.vger.kernel.narkive.com/Wp31G953/patch-mmc-core-don-t-return-1-for-max-discard

So this is an old issue, which should have been fixed long long long time ago...

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE
  2015-10-21  9:00           ` Ulf Hansson
@ 2015-10-21 17:06             ` Grant Grundler
  2015-10-28 22:15             ` Jeff Moyer
  1 sibling, 0 replies; 15+ messages in thread
From: Grant Grundler @ 2015-10-21 17:06 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Jeff Moyer, Grant Grundler, Jens Axboe, linux-mmc, LKML, Gwendal Grignou

On Wed, Oct 21, 2015 at 2:00 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
....
>>>> To put a few more numbers on the "chunk size vs perf":
>>>>  1EG (512KB) -> 44K commands -> ~20 minutes
>>>> 32EG (16MB) -> 1375 commands -> ~1 minute
>>>> 128EG (64MB) -> 344 commands -> ~30 seconds
>>>> 8191EG (~4GB) -> 6 commands -> 2 seconds + ~8 seconds mkfs
>>>> (I'm assuming times above include about 6-10 seconds of mkfs as part
>>>> of writing a new file system)
>>>>
>>>> This is with only ~300MB of data written to the partition. I'm fully
>>>> aware that times will vary depending on how much data needs to be
>>>> migrated (and in this case very little or none). I'm certain the
>>>> difference will only get worse for the smaller the "chunk size" used
>>>> to Secure Erase due to repeated data migration.
....
> I am not sure if this issue is the same as been discussed earlier on
> the mmc list regarding "discard/erase".
>
> Anyway, there have been several attempts to fix bugs related to this.
> One of these discussion kind of pointed out a viable solution, but
> unfortunate no patches that adopts that solution have been posted yet.
>
> You might want to read up on this.
> https://www.mail-archive.com/linux-mmc@vger.kernel.org/msg23643.html
> http://linux-mmc.vger.kernel.narkive.com/Wp31G953/patch-mmc-core-don-t-return-1-for-max-discard
>
> So this is an old issue, which should have been fixed long long long time ago...

Agreed. :)  I'll read the references but hope that Gwendal (or someone
on Android Team?) can follow up on this shorter term. I've moved to a
different team (Google Onhub) and currently have a whole new set of
(wireless) issues to deal with. :(

At some point I expect I'll be circling back to mmc issues - storage
keeps following me like a hungry puppy where ever I go. /o\

thank you!
grant

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE
  2015-10-20 18:57         ` Jeff Moyer
  2015-10-21  9:00           ` Ulf Hansson
@ 2015-10-21 17:38           ` Grant Grundler
  1 sibling, 0 replies; 15+ messages in thread
From: Grant Grundler @ 2015-10-21 17:38 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Grant Grundler, Ulf Hansson, Jens Axboe, linux-mmc, LKML,
	Gwendal Grignou

On Tue, Oct 20, 2015 at 11:57 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
> Hi Grant,
>
> Grant Grundler <grundler@chromium.org> writes:
>
>> Ping? Does no one care how long BLK_SECDISCARD takes?
>>
>> ChromeOS has landed this change as a compromise between "fast" (<10
>> seconds) and "minimize risk" (~90 seconds) for a 23GB partition on
>> eMMC:
>>     https://chromium-review.googlesource.com/#/c/302413/
>
> Including the patch would be helpful.  I believe this is it.

Thanks Jeff!  Gerrit does provide easy mechanisms to review or pull
the patch - easy to use - not easy to find though. :/

> My comments are inline.
>
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 8411be3..43943c7 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
>
> @@ -60,21 +60,37 @@
>         granularity = max(q->limits.discard_granularity >> 9, 1U);
>         alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
>
> -       /*
> -        * Ensure that max_discard_sectors is of the proper
> -        * granularity, so that requests stay aligned after a split.
> -        */
> -       max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
> -       max_discard_sectors -= max_discard_sectors % granularity;
> -       if (unlikely(!max_discard_sectors)) {
> -               /* Avoid infinite loop below. Being cautious never hurts. */
> -               return -EOPNOTSUPP;
> -       }
> +       max_discard_sectors = min(q->limits.max_discard_sectors,
> +                                               UINT_MAX >> 9);
>
> Unnecessary reformatting.
>
>         if (flags & BLKDEV_DISCARD_SECURE) {
>                 if (!blk_queue_secdiscard(q))
>                         return -EOPNOTSUPP;
>                 type |= REQ_SECURE;
> +               /*
> +                * Secure erase performs better by telling the device
> +                * about the largest range possible.  Secure erase
> +                * piecemeal will likely result in mapped sectors
> +                * getting evacuated from one range and parked in
> +                * another range that will get erased by a future
> +                * erase command.  This does NOT happen for normal
> +                * TRIM or DISCARD operations.
> +                *
> +                * 32GB was a compromise to avoid blocking the device
> +                * for potentially minute(s) at a time.
> +                */
> +               if (max_discard_sectors < (1 << (25-9)))        /* 32GiB */
> +                       max_discard_sectors = 1 << (25-9);
>
> And here you're ignoring q->limits.max_discard_sectors.  I'm surprised
> this worked!

See Gwendal's earlier reply. Here is the entire thread:
   https://lkml.org/lkml/2015/9/22/1235

>
> +       }
> +
> +       /*
> +        * Ensure that max_discard_sectors is of the proper
> +        * granularity, so that requests stay aligned after a split.
> +        */
> +       max_discard_sectors -= max_discard_sectors % granularity;
> +       if (unlikely(!max_discard_sectors)) {
> +               /* Avoid infinite loop below. Being cautious never hurts. */
> +               return -EOPNOTSUPP;
>         }
>
>         atomic_set(&bb.done, 1);
>
> Grant, can we start over with the problem description? (Sorry, I didn't
> see the previous posts.)

First/second posting in https://lkml.org/lkml/2015/9/22/1235 should
provide this.

>  I'd like to know the values of discard_granularity
> and discard_max_bytes for your device.

Gwendal might be able to provide those. I no longer have possession the HW.

>  Additionally, it would be
> interesting to know how the discards are being initiatied.  Is it via a
> userspace utility such as mkfs, online discard via some file system
> mounted with -o discard, or something else?

BLK_SECDISCARD ioctl with parameters to describe the /data partition
on an android device.

> Finally, can you post
> binary blktrace data somewhere for the slow case?

Sorry,  -ENOHW.
 second
I only have a snippet of printk output from the original code with
slow performance:
[   13.409334] sdhci-cmd:  CMD 0x231a arg 0x3976800
[   13.414150] sdhci-cmd:  CMD 0x241a arg 0x3976bff
[   13.418790] sdhci-cmd:  CMD 0x261b arg 0x80000000
[   13.424488] sdhci-cmd:  CMD 0xd1a arg 0x10000
[   13.429622] sdhci-cmd:  CMD 0x231a arg 0x3976c00
[   13.434333] sdhci-cmd:  CMD 0x241a arg 0x3976fff
[   13.438968] sdhci-cmd:  CMD 0x261b arg 0x80000000
[   13.443717] sdhci-cmd:  CMD 0xd1a arg 0x10000
[   13.448113] sdhci-cmd:  CMD 0x231a arg 0x3977000
[   13.453087] sdhci-cmd:  CMD 0x241a arg 0x39773ff
[   13.457780] sdhci-cmd:  CMD 0x261b arg 0x80000000
[   13.462839] sdhci-cmd:  CMD 0xd1a arg 0x10000
[   13.468237] sdhci-cmd:  CMD 0x231a arg 0x3977400
[   13.472980] sdhci-cmd:  CMD 0x241a arg 0x39777ff
[   13.477619] sdhci-cmd:  CMD 0x261b arg 0x80000000
[   13.482352] sdhci-cmd:  CMD 0xd1a arg 0x10000

"CMD" is 35/36/38/13 (but in hex) + flags (IIRC)

Each command is taking ~20ms. But multiple that by 46k to erase the
entire 23GB partition == 15 minutes.

I will assert this is "best case" since I usually tested with very
little "live" data (< 300MB) that would need to be evacuated from any
given erase block.

> Thanks!

Thanks for the feedback! :)

cheers,
grant

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE
  2015-10-21  9:00           ` Ulf Hansson
  2015-10-21 17:06             ` Grant Grundler
@ 2015-10-28 22:15             ` Jeff Moyer
  2016-06-02 22:56               ` [PATCH] mmc: Set pref erase size based on size Gwendal Grignou
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff Moyer @ 2015-10-28 22:15 UTC (permalink / raw)
  To: Ulf Hansson; +Cc: Grant Grundler, Jens Axboe, linux-mmc, LKML, Gwendal Grignou

Ulf Hansson <ulf.hansson@linaro.org> writes:

> I am not sure if this issue is the same as been discussed earlier on
> the mmc list regarding "discard/erase".
>
> Anyway, there have been several attempts to fix bugs related to this.
> One of these discussion kind of pointed out a viable solution, but
> unfortunate no patches that adopts that solution have been posted yet.
>
> You might want to read up on this.
> https://www.mail-archive.com/linux-mmc@vger.kernel.org/msg23643.html
> http://linux-mmc.vger.kernel.narkive.com/Wp31G953/patch-mmc-core-don-t-return-1-for-max-discard
>
> So this is an old issue, which should have been fixed long long long time ago...

Thanks Ulf.  After reading all of the linked discussions, it's my
understanding that this is an emmc-specific issue that doesn't require
any block layer changes.  If that's wrong, please let me know.

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] mmc: Set pref erase size based on size.
  2015-10-28 22:15             ` Jeff Moyer
@ 2016-06-02 22:56               ` Gwendal Grignou
  2016-06-03  0:27                 ` [PATCH v2] " Gwendal Grignou
  0 siblings, 1 reply; 15+ messages in thread
From: Gwendal Grignou @ 2016-06-02 22:56 UTC (permalink / raw)
  To: ulf.hansson; +Cc: grundler, axboe, linux-mmc, linux-kernel

If available, eMMC stack uses HC_ERASE_GRP_SIZE as preferred erase size.
However, that size is the minimal size we must use, not the optimal size.
Calculate the optimal size based on whole device size and fall back to
HC_ERASE_GRP_SIZE if too small.

Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
---
 drivers/mmc/core/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index ef76b1c..2874160 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -1996,8 +1996,6 @@ void mmc_init_erase(struct mmc_card *card)
 	if (mmc_card_sd(card) && card->ssr.au) {
 		card->pref_erase = card->ssr.au;
 		card->erase_shift = ffs(card->ssr.au) - 1;
-	} else if (card->ext_csd.hc_erase_size) {
-		card->pref_erase = card->ext_csd.hc_erase_size;
 	} else if (card->erase_size) {
 		sz = (card->csd.capacity << (card->csd.read_blkbits - 9)) >> 11;
 		if (sz < 128)
@@ -2015,6 +2013,8 @@ void mmc_init_erase(struct mmc_card *card)
 			if (sz)
 				card->pref_erase += card->erase_size - sz;
 		}
+		if (card->ext_csd.hc_erase_size > card->pref_erase)
+			card->pref_erase = card->ext_csd.hc_erase_size;
 	} else
 		card->pref_erase = 0;
 }
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2] mmc: Set pref erase size based on size.
  2016-06-02 22:56               ` [PATCH] mmc: Set pref erase size based on size Gwendal Grignou
@ 2016-06-03  0:27                 ` Gwendal Grignou
  2016-06-03 16:08                   ` [PATCH v3] CHROMIUM: " Gwendal Grignou
  2016-06-10 17:02                   ` [PATCH v2] " Grant Grundler
  0 siblings, 2 replies; 15+ messages in thread
From: Gwendal Grignou @ 2016-06-03  0:27 UTC (permalink / raw)
  To: ulf.hansson; +Cc: grundler, axboe, linux-mmc, linux-kernel

If available, eMMC stack uses HC_ERASE_GRP_SIZE as preferred erase size.
However, that size is the minimal size we must use, not the optimal size.
Calculate the optimal size based on whole device size and fall back to
HC_ERASE_GRP_SIZE if too small.

Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
---
 v2: fix comments

 drivers/mmc/core/core.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index ef76b1c..b858907 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -1987,17 +1987,17 @@ void mmc_init_erase(struct mmc_card *card)
 	 * to that size and alignment.
 	 *
 	 * For SD cards that define Allocation Unit size, limit erases to one
-	 * Allocation Unit at a time.  For MMC cards that define High Capacity
-	 * Erase Size, whether it is switched on or not, limit to that size.
-	 * Otherwise just have a stab at a good value.  For modern cards it
-	 * will end up being 4MiB.  Note that if the value is too small, it
-	 * can end up taking longer to erase.
+	 * Allocation Unit at a time.
+	 * For MMC, have a stab at a good value.  For modern cards it
+	 * will end up being 4MiB.
+	 * Be sure the size is at least the High Capacity Erase Size, as long
+	 * as it is defined, even if not used.
+	 * Note that if the value is too small, it can end up taking longer to
+	 * erase.
 	 */
 	if (mmc_card_sd(card) && card->ssr.au) {
 		card->pref_erase = card->ssr.au;
 		card->erase_shift = ffs(card->ssr.au) - 1;
-	} else if (card->ext_csd.hc_erase_size) {
-		card->pref_erase = card->ext_csd.hc_erase_size;
 	} else if (card->erase_size) {
 		sz = (card->csd.capacity << (card->csd.read_blkbits - 9)) >> 11;
 		if (sz < 128)
@@ -2015,6 +2015,8 @@ void mmc_init_erase(struct mmc_card *card)
 			if (sz)
 				card->pref_erase += card->erase_size - sz;
 		}
+		if (card->ext_csd.hc_erase_size > card->pref_erase)
+			card->pref_erase = card->ext_csd.hc_erase_size;
 	} else
 		card->pref_erase = 0;
 }
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3] CHROMIUM: mmc: Set pref erase size based on size.
  2016-06-03  0:27                 ` [PATCH v2] " Gwendal Grignou
@ 2016-06-03 16:08                   ` Gwendal Grignou
  2016-06-22 15:23                     ` Ulf Hansson
  2016-06-10 17:02                   ` [PATCH v2] " Grant Grundler
  1 sibling, 1 reply; 15+ messages in thread
From: Gwendal Grignou @ 2016-06-03 16:08 UTC (permalink / raw)
  To: ulf.hansson; +Cc: grundler, axboe, linux-mmc, linux-kernel

If available, eMMC stack uses HC_ERASE_GRP_SIZE as erase size.
The perfererred erase size may need to be bigger to ensure discard
operations are not too slow.
Some high capacity eMMC (64MB) reports erase size to 512kB, resulting in
long discard time.
Calculate prefeerred erase size based on eMMC size.

Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
---
 v2: fix comments
 v3: rereading the patch, we should not compare to hc_erase_size again,
     it has already been set properly in the caller function.

 drivers/mmc/core/core.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index ef76b1c..bd881a0 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -1987,17 +1987,17 @@ void mmc_init_erase(struct mmc_card *card)
 	 * to that size and alignment.
 	 *
 	 * For SD cards that define Allocation Unit size, limit erases to one
-	 * Allocation Unit at a time.  For MMC cards that define High Capacity
-	 * Erase Size, whether it is switched on or not, limit to that size.
-	 * Otherwise just have a stab at a good value.  For modern cards it
-	 * will end up being 4MiB.  Note that if the value is too small, it
-	 * can end up taking longer to erase.
+	 * Allocation Unit at a time.
+	 * For MMC, have a stab at a good value.  For modern cards it
+	 * will end up being 4MiB.
+	 * Note that if the value is too small, it can end up taking longer to
+	 * erase.
+	 * erase_size is already set to High Capacity Erase Size if available
+	 * when this function is called.
 	 */
 	if (mmc_card_sd(card) && card->ssr.au) {
 		card->pref_erase = card->ssr.au;
 		card->erase_shift = ffs(card->ssr.au) - 1;
-	} else if (card->ext_csd.hc_erase_size) {
-		card->pref_erase = card->ext_csd.hc_erase_size;
 	} else if (card->erase_size) {
 		sz = (card->csd.capacity << (card->csd.read_blkbits - 9)) >> 11;
 		if (sz < 128)
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] mmc: Set pref erase size based on size.
  2016-06-03  0:27                 ` [PATCH v2] " Gwendal Grignou
  2016-06-03 16:08                   ` [PATCH v3] CHROMIUM: " Gwendal Grignou
@ 2016-06-10 17:02                   ` Grant Grundler
  1 sibling, 0 replies; 15+ messages in thread
From: Grant Grundler @ 2016-06-10 17:02 UTC (permalink / raw)
  To: Ulf Hansson, Jens Axboe; +Cc: Grant Grundler, linux-mmc, LKML, Gwendal Grignou

On Thu, Jun 2, 2016 at 5:27 PM, Gwendal Grignou <gwendal@chromium.org> wrote:
> If available, eMMC stack uses HC_ERASE_GRP_SIZE as preferred erase size.
> However, that size is the minimal size we must use, not the optimal size.
> Calculate the optimal size based on whole device size and fall back to
> HC_ERASE_GRP_SIZE if too small.
>
> Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
> ---
>  v2: fix comments
>
>  drivers/mmc/core/core.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index ef76b1c..b858907 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -1987,17 +1987,17 @@ void mmc_init_erase(struct mmc_card *card)
>          * to that size and alignment.
>          *
>          * For SD cards that define Allocation Unit size, limit erases to one
> -        * Allocation Unit at a time.  For MMC cards that define High Capacity
> -        * Erase Size, whether it is switched on or not, limit to that size.
> -        * Otherwise just have a stab at a good value.  For modern cards it
> -        * will end up being 4MiB.  Note that if the value is too small, it
> -        * can end up taking longer to erase.
> +        * Allocation Unit at a time.
> +        * For MMC, have a stab at a good value.  For modern cards it
> +        * will end up being 4MiB.
> +        * Be sure the size is at least the High Capacity Erase Size, as long
> +        * as it is defined, even if not used.
> +        * Note that if the value is too small, it can end up taking longer to
> +        * erase.

The last sentence is a key point.

I want to point out the difference in erasing ~23GB of a 32GB device
as specified by the number of secure erase commands sent:
   ~44K commands (default: single erase block): ~30 minutes
   ~700 commands (32MB chunks) : ~90 seconds
   1 command: < 6 seconds

cheers,
grant

>          */
>         if (mmc_card_sd(card) && card->ssr.au) {
>                 card->pref_erase = card->ssr.au;
>                 card->erase_shift = ffs(card->ssr.au) - 1;
> -       } else if (card->ext_csd.hc_erase_size) {
> -               card->pref_erase = card->ext_csd.hc_erase_size;
>         } else if (card->erase_size) {
>                 sz = (card->csd.capacity << (card->csd.read_blkbits - 9)) >> 11;
>                 if (sz < 128)
> @@ -2015,6 +2015,8 @@ void mmc_init_erase(struct mmc_card *card)
>                         if (sz)
>                                 card->pref_erase += card->erase_size - sz;
>                 }
> +               if (card->ext_csd.hc_erase_size > card->pref_erase)
> +                       card->pref_erase = card->ext_csd.hc_erase_size;
>         } else
>                 card->pref_erase = 0;
>  }
> --
> 2.8.0.rc3.226.g39d4020
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] CHROMIUM: mmc: Set pref erase size based on size.
  2016-06-03 16:08                   ` [PATCH v3] CHROMIUM: " Gwendal Grignou
@ 2016-06-22 15:23                     ` Ulf Hansson
  0 siblings, 0 replies; 15+ messages in thread
From: Ulf Hansson @ 2016-06-22 15:23 UTC (permalink / raw)
  To: Gwendal Grignou; +Cc: Grant Grundler, Jens Axboe, linux-mmc, linux-kernel

On 3 June 2016 at 18:08, Gwendal Grignou <gwendal@chromium.org> wrote:
> If available, eMMC stack uses HC_ERASE_GRP_SIZE as erase size.
> The perfererred erase size may need to be bigger to ensure discard
> operations are not too slow.
> Some high capacity eMMC (64MB) reports erase size to 512kB, resulting in
> long discard time.
> Calculate prefeerred erase size based on eMMC size.
>
> Signed-off-by: Gwendal Grignou <gwendal@chromium.org>

Thanks, applied for next! I did update the changelog as well as the
comment in the code.

Please next time, try to work a bit more on the changelog. It really
helps when reviewing to understand what the change is all about.

Kind regards
Uffe

> ---
>  v2: fix comments
>  v3: rereading the patch, we should not compare to hc_erase_size again,
>      it has already been set properly in the caller function.
>
>  drivers/mmc/core/core.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index ef76b1c..bd881a0 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -1987,17 +1987,17 @@ void mmc_init_erase(struct mmc_card *card)
>          * to that size and alignment.
>          *
>          * For SD cards that define Allocation Unit size, limit erases to one
> -        * Allocation Unit at a time.  For MMC cards that define High Capacity
> -        * Erase Size, whether it is switched on or not, limit to that size.
> -        * Otherwise just have a stab at a good value.  For modern cards it
> -        * will end up being 4MiB.  Note that if the value is too small, it
> -        * can end up taking longer to erase.
> +        * Allocation Unit at a time.
> +        * For MMC, have a stab at a good value.  For modern cards it
> +        * will end up being 4MiB.
> +        * Note that if the value is too small, it can end up taking longer to
> +        * erase.
> +        * erase_size is already set to High Capacity Erase Size if available
> +        * when this function is called.
>          */
>         if (mmc_card_sd(card) && card->ssr.au) {
>                 card->pref_erase = card->ssr.au;
>                 card->erase_shift = ffs(card->ssr.au) - 1;
> -       } else if (card->ext_csd.hc_erase_size) {
> -               card->pref_erase = card->ext_csd.hc_erase_size;
>         } else if (card->erase_size) {
>                 sz = (card->csd.capacity << (card->csd.read_blkbits - 9)) >> 11;
>                 if (sz < 128)
> --
> 2.8.0.rc3.226.g39d4020
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-06-22 15:23 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-23  1:54 RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE Grant Grundler
2015-09-24 17:39 ` Grant Grundler
     [not found]   ` <CANEJEGukOKUgRJNPVaeEXpx-REzj8ooSfWdrnRMMCQr6EVaNqQ@mail.gmail.com>
2015-09-28 21:45     ` Fwd: " Grant Grundler
2015-09-29 12:56       ` Gwendal Grignou
2015-10-20 17:53       ` Grant Grundler
2015-10-20 18:57         ` Jeff Moyer
2015-10-21  9:00           ` Ulf Hansson
2015-10-21 17:06             ` Grant Grundler
2015-10-28 22:15             ` Jeff Moyer
2016-06-02 22:56               ` [PATCH] mmc: Set pref erase size based on size Gwendal Grignou
2016-06-03  0:27                 ` [PATCH v2] " Gwendal Grignou
2016-06-03 16:08                   ` [PATCH v3] CHROMIUM: " Gwendal Grignou
2016-06-22 15:23                     ` Ulf Hansson
2016-06-10 17:02                   ` [PATCH v2] " Grant Grundler
2015-10-21 17:38           ` RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE Grant Grundler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).