* configurable discard parameters @ 2015-06-20 17:12 Tom Yan 2015-06-21 0:20 ` Martin K. Petersen 0 siblings, 1 reply; 18+ messages in thread From: Tom Yan @ 2015-06-20 17:12 UTC (permalink / raw) To: linux-scsi, linux-ide [-- Attachment #1: Type: text/plain, Size: 2716 bytes --] Today I check if blkdiscard really does a full device trim/wipe for my Intel 530 SSD (240gb) with hexdump. I end up found that it fail to do so because it report garbage info on its block limits VPD. [tom@localhost ~]$ sudo sg_vpd -p 0xb0 /dev/sda Block limits VPD page (SBC): Write same non-zero (WSNZ): 0 Maximum compare and write length: 0 blocks Optimal transfer length granularity: 1 blocks Maximum transfer length: 0 blocks Optimal transfer length: 0 blocks Maximum prefetch length: 0 blocks Maximum unmap LBA count: 0 Maximum unmap block descriptor count: 0 Optimal unmap granularity: 1 Unmap granularity alignment valid: 0 Unmap granularity alignment: 0 Maximum write same length: 0x3fffc0 blocks Maximum atomic transfer length: 0 Atomic alignment: 0 Atomic transfer length granularity: 0 The fact is, in each iteration, for this drive, blkdiscard can only trim a maximum of 65528 sectors, which is the largest multiple of 8 sectors, which is the minimum possible. (In fact 65535 sectors seems to be some sort of limit of ATA TRIM commands. It's also the maximum of WRITE SAME (10).) With `blkdiscard`, I can still workaround this with the "--step" option, but for discard mount options, I don't think I have any way to deal with it. Although it's basically the reponsibility of Intel (Well I tried to find a way to complain to them but even their website is borked as always), but still I would like to know why the kernel doesn't allow "discard_granularity" and "discard_max_bytes" to be configurable for users. Also wanna share about this in case nobody ever noticed. Another case is, I have a SanDisk Extreme USB (32gb, w/o UASP), which is sort of a SATA SSD with USB bridge. I can basically TRIM the drive with hdparm (although there's a minor issue: https://sourceforge.net/p/hdparm/bugs/63/), but I can't do it with blkdiscard. At first I thought it was only because its VPD doesn't provide limits of discard: [tom@localhost ~]$ sudo sg_vpd -p 0xb0 /dev/sdc Block limits VPD page (SBC): Write same non-zero (WSNZ): 0 Maximum compare and write length: 0 blocks Optimal transfer length granularity: 0 blocks Maximum transfer length: 8388607 blocks Optimal transfer length: 8388607 blocks Maximum prefetch length: 0 blocks but then I try to use the attached patch to force some value for it, blkdiscard still gives me "I/O error" instead of "NOT SUPPROTED". So what's the requirement of the current discard way in the kernel? Does it work differently on SATA drives than on USB drives because of the controller (like involvement of libata)? Does it require capability of one of the three scsi unmap ways? Or did I just miss some other dirty hacking bits? [-- Attachment #2: discard.patch --] [-- Type: text/x-patch, Size: 1570 bytes --] diff --git a/drivers/scsi/sd.c~ b/drivers/scsi/sd.c index a661d33..a9cdfb6 100644 --- a/drivers/scsi/sd.c~ +++ b/drivers/scsi/sd.c @@ -99,6 +99,7 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC); #endif static void sd_config_discard(struct scsi_disk *, unsigned int); +static void sd_config_discard_novpd(struct scsi_disk *); static void sd_config_write_same(struct scsi_disk *); static int sd_revalidate_disk(struct gendisk *); static void sd_unlock_native_capacity(struct gendisk *disk); @@ -700,6 +701,22 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode) queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q); } +static void sd_config_discard_novpd(struct scsi_disk *sdkp) +{ + struct request_queue *q = sdkp->disk->queue; + unsigned int logical_block_size = sdkp->device->sector_size; + unsigned int max_blocks = (u32)SD_MAX_WS16_BLOCKS; + + q->limits.discard_zeroes_data = 0; + q->limits.discard_alignment = 0; + q->limits.discard_granularity = sdkp->physical_block_size; + + sdkp->provisioning_mode = SD_LBP_WS16; + + q->limits.max_discard_sectors = max_blocks * (logical_block_size >> 9); + queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q); +} + /** * sd_setup_discard_cmnd - unmap blocks on thinly provisioned device * @sdp: scsi device to operate one @@ -2776,6 +2793,9 @@ static int sd_revalidate_disk(struct gendisk *disk) sd_read_block_limits(sdkp); sd_read_block_characteristics(sdkp); } + else { + sd_config_discard_novpd(sdkp); + } sd_read_write_protect_flag(sdkp, buffer); sd_read_cache_type(sdkp, buffer); ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: configurable discard parameters 2015-06-20 17:12 configurable discard parameters Tom Yan @ 2015-06-21 0:20 ` Martin K. Petersen 2015-06-21 7:03 ` Tom Yan 0 siblings, 1 reply; 18+ messages in thread From: Martin K. Petersen @ 2015-06-21 0:20 UTC (permalink / raw) To: Tom Yan; +Cc: linux-scsi, linux-ide >>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes: Tom> Today I check if blkdiscard really does a full device trim/wipe for Tom> my Intel 530 SSD (240gb) with hexdump. I end up found that it fail Tom> to do so because it report garbage info on its block limits VPD. It is a SATA-attached drive, it has no block limits VPD. What you are seeing is information prepared by libata's SATL. Tom> The fact is, in each iteration, for this drive, blkdiscard can only Tom> trim a maximum of 65528 sectors, which is the largest multiple of 8 Tom> sectors, which is the minimum possible. Maximum write same length: 0x3fffc0 blocks That's 2GB-e per request for a drive with 512-byte logical blocks. Tom> but still I would like to know why the kernel doesn't allow Tom> "discard_granularity" and "discard_max_bytes" to be configurable Tom> for users. Because if the vendor got these trivial values wrong there is little to no chance that they implemented discard correctly in their firmware. -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: configurable discard parameters 2015-06-21 0:20 ` Martin K. Petersen @ 2015-06-21 7:03 ` Tom Yan 2015-06-21 8:05 ` Tom Yan 0 siblings, 1 reply; 18+ messages in thread From: Tom Yan @ 2015-06-21 7:03 UTC (permalink / raw) To: Martin K. Petersen; +Cc: linux-scsi, linux-ide On 21 June 2015 at 08:20, Martin K. Petersen <martin.petersen@oracle.com> wrote: > It is a SATA-attached drive, it has no block limits VPD. What you are > seeing is information prepared by libata's SATL. > > Because if the vendor got these trivial values wrong there is little to > no chance that they implemented discard correctly in their firmware. I don't get it. So there's a chance that the VPDs is not purely from the hardware? Then how can I differentiate them? But then you said "the vendor got these trivial values wrong", so were you talking about this drive or just for real SCSI drives? -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: configurable discard parameters 2015-06-21 7:03 ` Tom Yan @ 2015-06-21 8:05 ` Tom Yan 2015-06-21 12:36 ` Tom Yan 0 siblings, 1 reply; 18+ messages in thread From: Tom Yan @ 2015-06-21 8:05 UTC (permalink / raw) To: Martin K. Petersen; +Cc: linux-scsi, linux-ide By the way do you think it could be a bug of libata's SATL anyway? Like perhaps it should break a single unmap request to multiple ATA commands? I am not totally sure about it but it looks like there's a limit of addressed blocks in a single ATA DSM/TRIM command (4 bytes, which is 65535). On 21 June 2015 at 15:03, Tom Yan <tom.ty89@gmail.com> wrote: > On 21 June 2015 at 08:20, Martin K. Petersen <martin.petersen@oracle.com> wrote: >> It is a SATA-attached drive, it has no block limits VPD. What you are >> seeing is information prepared by libata's SATL. >> >> Because if the vendor got these trivial values wrong there is little to >> no chance that they implemented discard correctly in their firmware. > > I don't get it. So there's a chance that the VPDs is not purely from > the hardware? Then how can I differentiate them? But then you said > "the vendor got these trivial values wrong", so were you talking about > this drive or just for real SCSI drives? -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: configurable discard parameters 2015-06-21 8:05 ` Tom Yan @ 2015-06-21 12:36 ` Tom Yan 2015-06-21 20:30 ` Tom Yan 0 siblings, 1 reply; 18+ messages in thread From: Tom Yan @ 2015-06-21 12:36 UTC (permalink / raw) To: Martin K. Petersen; +Cc: linux-scsi, linux-ide https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/ata/libata-scsi.c?id=18f0f97850059303ed73b1f02084f55ca330a80c https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/ata/libata-scsi.c?id=e78db4dfb1355a895f7ea50133b702b55b8ed184 So I think I found the answer myself. But I don't see the rationale here at all. So values (arbitrary?) of granularity and maximum has been hardcoded in the kernel for ALL devices which make use of libata, and the parameter cannot be changed by user in runtime? So does it expect all devices to be of the same spec? Or is it just like a draft/example which people can only consider it as "may work"? On 21 June 2015 at 16:05, Tom Yan <tom.ty89@gmail.com> wrote: > By the way do you think it could be a bug of libata's SATL anyway? > Like perhaps it should break a single unmap request to multiple ATA > commands? I am not totally sure about it but it looks like there's a > limit of addressed blocks in a single ATA DSM/TRIM command (4 bytes, > which is 65535). > > On 21 June 2015 at 15:03, Tom Yan <tom.ty89@gmail.com> wrote: >> On 21 June 2015 at 08:20, Martin K. Petersen <martin.petersen@oracle.com> wrote: >>> It is a SATA-attached drive, it has no block limits VPD. What you are >>> seeing is information prepared by libata's SATL. >>> >>> Because if the vendor got these trivial values wrong there is little to >>> no chance that they implemented discard correctly in their firmware. >> >> I don't get it. So there's a chance that the VPDs is not purely from >> the hardware? Then how can I differentiate them? But then you said >> "the vendor got these trivial values wrong", so were you talking about >> this drive or just for real SCSI drives? -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: configurable discard parameters 2015-06-21 12:36 ` Tom Yan @ 2015-06-21 20:30 ` Tom Yan 2015-06-22 20:57 ` Martin K. Petersen 0 siblings, 1 reply; 18+ messages in thread From: Tom Yan @ 2015-06-21 20:30 UTC (permalink / raw) To: Martin K. Petersen; +Cc: linux-scsi, linux-ide After rechecking the result of blkdiscard (without specifying --step) with hexdump, I see that the single ioctl is actually broken into multiple TRIM commands/ranges by 65535 sector. I wonder if it will be further tuned by ANY KIND of granularity though. Though it will probably only be a problem for device TRIM, and for that you can probably always rely on blkdiscard with --step (and avoid device TRIM with things like mkfs.btrfs) so I guess it won't be a big problem. And it will be fine for filesystem TRIM anyway I guess. So it seems that "Maximum write same length: 0x3fffc0 blocks" doesn't really matter (In fact when I ran `sg_write_same -U --lba=0 --num=0x7fffffff`, the command also trim the first 65528 sectors of the drive). However, I am curious what is the meaning of "Our current TRIM payload is a single sector that can accommodate 64 * 65535 blocks being unmapped. Report this value in the Block Limits Maximum Unmap LBA count field." in the commit message. What does the value actually mean/affect? Could it cause trouble to certain drives? By the way I think I got the answer for my USB TRIM question. Basically for USB->SATA drives the SATL is implemented in the bridges, so it must be able to "SAT" one of the three SCSI unmap methods to ATA TRIM, just like libata does, so that it can handle whatever requested through the scsi disk driver, otherwise one can only TRIM it through SCSI's ATA pass-through commands if the bridge supports them, e.g. hdparm. So my dirty hacking was silly. P.S.: [tom@localhost ~]$ sudo sg_vpd -p ai /dev/sda ATA information VPD page: SAT Vendor identification: linux SAT Product identification: libata SAT Product revision level: 3.00 So it actually tells. And I hope that the kernel wouldn't "falsify" anything for devices which do provide some VPD(s). : \ Another P.S.: I made a mistake in one of my last mails. 65535 is two bytes, not four bytes. : / On 21 June 2015 at 20:36, Tom Yan <tom.ty89@gmail.com> wrote: > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/ata/libata-scsi.c?id=18f0f97850059303ed73b1f02084f55ca330a80c > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/ata/libata-scsi.c?id=e78db4dfb1355a895f7ea50133b702b55b8ed184 > > So I think I found the answer myself. But I don't see the rationale > here at all. So values (arbitrary?) of granularity and maximum has > been hardcoded in the kernel for ALL devices which make use of libata, > and the parameter cannot be changed by user in runtime? So does it > expect all devices to be of the same spec? Or is it just like a > draft/example which people can only consider it as "may work"? > > On 21 June 2015 at 16:05, Tom Yan <tom.ty89@gmail.com> wrote: >> By the way do you think it could be a bug of libata's SATL anyway? >> Like perhaps it should break a single unmap request to multiple ATA >> commands? I am not totally sure about it but it looks like there's a >> limit of addressed blocks in a single ATA DSM/TRIM command (4 bytes, >> which is 65535). >> >> On 21 June 2015 at 15:03, Tom Yan <tom.ty89@gmail.com> wrote: >>> On 21 June 2015 at 08:20, Martin K. Petersen <martin.petersen@oracle.com> wrote: >>>> It is a SATA-attached drive, it has no block limits VPD. What you are >>>> seeing is information prepared by libata's SATL. >>>> >>>> Because if the vendor got these trivial values wrong there is little to >>>> no chance that they implemented discard correctly in their firmware. >>> >>> I don't get it. So there's a chance that the VPDs is not purely from >>> the hardware? Then how can I differentiate them? But then you said >>> "the vendor got these trivial values wrong", so were you talking about >>> this drive or just for real SCSI drives? -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: configurable discard parameters 2015-06-21 20:30 ` Tom Yan @ 2015-06-22 20:57 ` Martin K. Petersen 2015-06-23 14:16 ` Tom Yan 0 siblings, 1 reply; 18+ messages in thread From: Martin K. Petersen @ 2015-06-22 20:57 UTC (permalink / raw) To: Tom Yan; +Cc: Martin K. Petersen, linux-scsi, linux-ide >>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes: Tom> However, I am curious what is the meaning of "Our current TRIM Tom> payload is a single sector that can accommodate 64 * 65535 blocks Tom> being unmapped. Report this value in the Block Limits Maximum Unmap Tom> LBA count field." in the commit message. What does the value Tom> actually mean/affect? Could it cause trouble to certain drives? DSM Trim takes a payload of <block nr, block count> ranges. The block count is constrained to 65535 by virtue of being 16 bits. We can fit 64 8-byte range descriptors in a single 512-byte sector payload. That's why we set the limit for a single, contiguous discard request to 0x3fffc0. That results in 2GB minus change for a single command. Tom> By the way I think I got the answer for my USB TRIM question. Tom> Basically for USB->SATA drives the SATL is implemented in the Tom> bridges, so it must be able to "SAT" one of the three SCSI unmap Tom> methods to ATA TRIM, just like libata does, Yep. I have set to see one that gets that right, though. There are even many enterprise SAS/RAID controllers that don't support UNMAP-TRIM translation. Linux' libata was one of the first implementations of SCSI-ATA translation for TRIM. Initially we bent the rules for UNMAP a bit. When the standards caught up we switched to WRITE SAME. Tom> So it actually tells. And I hope that the kernel wouldn't "falsify" Tom> anything for devices which do provide some VPD(s). : \ SATA doesn't have VPDs. -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: configurable discard parameters 2015-06-22 20:57 ` Martin K. Petersen @ 2015-06-23 14:16 ` Tom Yan 2015-06-23 15:36 ` Martin K. Petersen 0 siblings, 1 reply; 18+ messages in thread From: Tom Yan @ 2015-06-23 14:16 UTC (permalink / raw) To: Martin K. Petersen; +Cc: linux-scsi, linux-ide Still, I wonder if the kernel should allow the user to configure the real granularity and send ATA commands that are rounded off by it (which works just like --step in blkdiscard). For example, (64 * 65535) % 8 = 0 but 65535 % 8 = 7 So the block count limit for the ATA commands would be 65535 - (65535 % real_gran) per range. Also, by experimenting on my SanDisk Extreme USB with hdparm TRIM and hexdump, I can see that the maximum block/sector allowed to be discarded per ATA command is 65536, inspite of the payload size. I don't know whether the USB bridging or the way hdparm does TRIM matters, but it seems that some devices can't really handle limit like 0x3fffc0 blocks. So I still think that the kernel should allow users to configure limits that can make libata send reliable ATA TRIM commands. On 23 June 2015 at 04:57, Martin K. Petersen <martin.petersen@oracle.com> wrote: > > Tom> So it actually tells. And I hope that the kernel wouldn't "falsify" > Tom> anything for devices which do provide some VPD(s). : \ > > SATA doesn't have VPDs. > I know now, that's why I think it's "alright" for libata to do so. I just don't know whether the kernel would tamper with them for devices like USB or real SCSI drives. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: configurable discard parameters 2015-06-23 14:16 ` Tom Yan @ 2015-06-23 15:36 ` Martin K. Petersen 2015-06-23 16:41 ` Tom Yan 0 siblings, 1 reply; 18+ messages in thread From: Martin K. Petersen @ 2015-06-23 15:36 UTC (permalink / raw) To: Tom Yan; +Cc: Martin K. Petersen, linux-scsi, linux-ide >>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes: Tom> I don't know whether the USB bridging or the way hdparm does TRIM Tom> matters, but it seems that some devices can't really handle limit Tom> like 0x3fffc0 blocks. The 0x3fffc0 limit is for SATA devices connected through Linux' libata SCSI ATA translation. This number corresponds to the biggest range we can discard while maintaining a 1:1 mapping between the SCSI command and the ATA ditto. If you connect the SATA drive to a USB/UAS bridge you are entirely at the bridge's whim when it comes to translation of WRITE SAME(10/16) w/ UNMAP or the UNMAP command to DSM TRIM. libata is not involved and neither is the 0x3fffc0 limit (although chances are that the drive has the same limit internally). Even when using hdparm the bridge device still needs to correctly translate the ATA PASSTHROUGH commands. Tom> So I still think that the kernel should allow users to configure Tom> limits that can make libata send reliable ATA TRIM commands. You haven't given us any reason to. We are not aware of any ATA drives that put constraints on the range block count. Again, if you want to help please focus on your bridge devices and what, if anything, can be done to uniquely identify them and override any incorrect values they might report to the SCSI stack. Up to and including us disabling discard entirely on these devices if their translation isn't verifiable and bullet proof. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: configurable discard parameters 2015-06-23 15:36 ` Martin K. Petersen @ 2015-06-23 16:41 ` Tom Yan 2015-06-23 17:03 ` Martin K. Petersen 0 siblings, 1 reply; 18+ messages in thread From: Tom Yan @ 2015-06-23 16:41 UTC (permalink / raw) To: Martin K. Petersen; +Cc: linux-scsi, linux-ide On 23 June 2015 at 23:36, Martin K. Petersen <martin.petersen@oracle.com> wrote: > > You haven't given us any reason to. We are not aware of any ATA drives > that put constraints on the range block count. > What I have been doing is trying to show you example of those constraints. When I talked about the block count limit of the SanDisk Extreme USB, I am not asking you to fix anything for the drive I HAVE (because I can only use hdparm to TRIM it anyway), but to show you that some devices MIGHT have such limit. I just can't be 100% sure of it, even though from what I see in the different results on payload size and range sector count, I don't think the bridge is intervening at all (but just the limit of the SATA controller behind). But for my Intel 530 SSD, the granularity problem is obvious and solid enough because it's connected directly with SATA. Currently what the kernel does is assume all devices support 1 sector granularity. And the problem doesn't even lie on the current "hardcoded" granularity in libata, because blkdev_issue_discard() only does "mod check" against the granularity on the maximum sector counts, so even if it's allowed to be configured, it wouldn't really help. And I have yet to see anything which actually does "mod check" against ANY granularity on the sector count per range. This is what the example in my last mail was about. And the only info I have ever saw about TRIM block limit of a SATA drive is in `hdparm -I`. For my Intel SSD, which actually has a lower limit of 8 sectors, it shows: Data Set Management TRIM supported (limit 1 block) while for the SanDisk Extreme USB, which actually has a lower limit of 256 sectors, it shows: Data Set Management TRIM supported (limit 8 blocks) Inspite of the accuracy, I don't see the kernel actually read this limit. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: configurable discard parameters 2015-06-23 16:41 ` Tom Yan @ 2015-06-23 17:03 ` Martin K. Petersen 2015-06-23 17:24 ` Tom Yan 0 siblings, 1 reply; 18+ messages in thread From: Martin K. Petersen @ 2015-06-23 17:03 UTC (permalink / raw) To: Tom Yan; +Cc: Martin K. Petersen, linux-scsi, linux-ide >>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes: Tom> Currently what the kernel does is assume all devices support 1 Tom> sector granularity. The ATA Command Set does not allow for any other granularity than 1 sector. Bigger granularity reporting is supported in SCSI SBC to allow for thinly provisioned disk arrays with a bigger allocation unit. It is common for disk arrays to provision LUN space in units of 1MB or more. Tom> For my Intel SSD, which actually has a lower limit of 8 sectors, it Tom> shows: Data Set Management TRIM supported (limit 1 block) while for Tom> the SanDisk Extreme USB, which actually has a lower limit of 256 Tom> sectors, it shows: Data Set Management TRIM supported (limit 8 Tom> blocks) It don't know what these "lower limits" you are talking about are. What hdparm tells you is that DSM TRIM on the Intel drive supports 1 block (512 bytes) of range payload data. Whereas the SanDisk drive supports a full 4KB page of range payload data (8 * 512 bytes). We did try to enable payloads bigger than 512 bytes back in the day but most drives that reported supporting the bigger payload actually didn't and all hell broke loose. So we reverted to a single sector payload for libata. I still have the payload patch in my SSD test branch and regularly test with it. But there are still drives that fail in mysterious ways with it in place and so far I haven't felt compelled to maintain yet another blacklist. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: configurable discard parameters 2015-06-23 17:03 ` Martin K. Petersen @ 2015-06-23 17:24 ` Tom Yan 2015-06-23 18:26 ` Martin K. Petersen 0 siblings, 1 reply; 18+ messages in thread From: Tom Yan @ 2015-06-23 17:24 UTC (permalink / raw) To: Martin K. Petersen; +Cc: linux-scsi, linux-ide On 24 June 2015 at 01:03, Martin K. Petersen <martin.petersen@oracle.com> wrote: > > It don't know what these "lower limits" you are talking about are. > [tom@localhost ~]$ sudo shred -v -n 1 /dev/sda3 [tom@localhost ~]$ sudo blkdiscard -l 512 /dev/sda3 [tom@localhost ~]$ sudo hexdump -n 4096 /dev/sda3 | head 0000000 58c7 f3b0 0f97 2a9a 2da8 7c44 11f5 7852 0000010 4f1d d19c 682b a3d8 6be9 61df 4eb3 a4ee 0000020 f28b f2f6 ef36 c244 f1b0 eadd 5ad0 c5ec 0000030 54c4 abb4 f9c3 7ca0 d807 63ad 81cd 45e6 0000040 2b30 709c 550a 5e52 6928 4468 78c9 f671 0000050 eef1 dfa7 2a6d 3150 64f8 f9a2 240d bbf4 0000060 2cff 02e0 c893 16a7 67c2 8cd9 af76 71b6 0000070 4740 83cc 5b15 3db0 64b2 1a6d 3f40 5a98 0000080 09d0 035c b5ae 2cd2 1850 ea70 7a09 0491 0000090 6b27 9d81 9d74 a087 afd9 e75a 85ea adf6 [tom@localhost ~]$ sudo blkdiscard -l 3584 /dev/sda3 [tom@localhost ~]$ sudo hexdump -n 4096 /dev/sda3 | head 0000000 58c7 f3b0 0f97 2a9a 2da8 7c44 11f5 7852 0000010 4f1d d19c 682b a3d8 6be9 61df 4eb3 a4ee 0000020 f28b f2f6 ef36 c244 f1b0 eadd 5ad0 c5ec 0000030 54c4 abb4 f9c3 7ca0 d807 63ad 81cd 45e6 0000040 2b30 709c 550a 5e52 6928 4468 78c9 f671 0000050 eef1 dfa7 2a6d 3150 64f8 f9a2 240d bbf4 0000060 2cff 02e0 c893 16a7 67c2 8cd9 af76 71b6 0000070 4740 83cc 5b15 3db0 64b2 1a6d 3f40 5a98 0000080 09d0 035c b5ae 2cd2 1850 ea70 7a09 0491 0000090 6b27 9d81 9d74 a087 afd9 e75a 85ea adf6 [tom@localhost ~]$ sudo blkdiscard -l 4096 /dev/sda3 [tom@localhost ~]$ sudo hexdump -n 4096 /dev/sda3 | head 0000000 0000 0000 0000 0000 0000 0000 0000 0000 * 0001000 So when libata issue ATA commands with ranges of 65535 sectors, only 65535-(65535%8) = 65528 sectors are discarded, yet it wouldn't know about that so the next range will start from LBA 65536. I can workaround this by specifying --step in blkdiscard, but I think the kernel should have a param configurable for general. > > What hdparm tells you is that DSM TRIM on the Intel drive supports 1 > block (512 bytes) of range payload data. Whereas the SanDisk drive > supports a full 4KB page of range payload data (8 * 512 bytes). > Oh so they are not reporting garbage. That's good to know. I guess that's why hdparm actually splits the range list I provided by 512-range per ATA command (which works) for the SanDisk drive. > > We did try to enable payloads bigger than 512 bytes back in the day but > most drives that reported supporting the bigger payload actually didn't > and all hell broke loose. So we reverted to a single sector payload for > libata. > > I still have the payload patch in my SSD test branch and regularly test > with it. But there are still drives that fail in mysterious ways with it > in place and so far I haven't felt compelled to maintain yet another > blacklist. > For now I have no problem with the single sector payload thing. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: configurable discard parameters 2015-06-23 17:24 ` Tom Yan @ 2015-06-23 18:26 ` Martin K. Petersen 2015-06-23 21:25 ` Tom Yan 0 siblings, 1 reply; 18+ messages in thread From: Martin K. Petersen @ 2015-06-23 18:26 UTC (permalink / raw) To: Tom Yan; +Cc: Martin K. Petersen, linux-scsi, linux-ide >>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes: Tom> So when libata issue ATA commands with ranges of 65535 sectors, Tom> only 65535-(65535%8) = 65528 sectors are discarded, That's unfortunate but TRIM is advisory so the drive is free to ignore all or parts of the request. What happens if you discard sectors 0-6 and then sector 7? Tom> I can workaround this by specifying --step in blkdiscard, but I Tom> think the kernel should have a param configurable for general. This is on the Intel 530? What does the drive report in /sys/block/sdN/queue/discard_zeroes_data? -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: configurable discard parameters 2015-06-23 18:26 ` Martin K. Petersen @ 2015-06-23 21:25 ` Tom Yan 2015-06-24 2:55 ` Martin K. Petersen 0 siblings, 1 reply; 18+ messages in thread From: Tom Yan @ 2015-06-23 21:25 UTC (permalink / raw) To: Martin K. Petersen; +Cc: linux-scsi, linux-ide [-- Attachment #1: Type: text/plain, Size: 1608 bytes --] First of all let me add another "statistic" about the issue: [tom@localhost ~]$ sudo shred -n 1 /dev/sda3 [tom@localhost ~]$ sudo blkdiscard /dev/sda3 [tom@localhost ~]$ sudo hexdump /dev/sda3 | wc -l 310635 [tom@localhost ~]$ sudo hexdump /dev/sda3 | pcregrep -M '0000 0000 0000 0000 0000 0000 0000 0000\n\*' | wc -l 2410 total untrimmed sectors: (310635 - 1 - 2410) / 32 = 9632 total ranges: (43 GiB * (1024 ^ 3) / 512) / 65535 ~= 1376 average untrimmed sectors per range: (9632 / 1376) = 7 = (65535 % 8) Also, FWIW: [tom@localhost ~]$ sudo shred -n 1 /dev/sda3 [tom@localhost ~]$ let step=(65535-65535%8)*512 [tom@localhost ~]$ echo $step 33550336 [tom@localhost ~]$ sudo blkdiscard -p "$step" /dev/sda3 [tom@localhost ~]$ sudo hexdump /dev/sda3 0000000 0000 0000 0000 0000 0000 0000 0000 0000 * ac0000000 On 24 June 2015 at 02:26, Martin K. Petersen <martin.petersen@oracle.com> wrote: > > What happens if you discard sectors 0-6 and then sector 7? > [tom@localhost ~]$ sudo shred -n 1 /dev/sda3 [tom@localhost ~]$ sudo blkdiscard -l 3584 /dev/sda3 [tom@localhost ~]$ sudo blkdiscard -o 3584 -l 512 /dev/sda3 [tom@localhost ~]$ sudo hexdump -n 4096 /dev/sda3 0000000 f06d 8365 5e1b 616c 7362 4d61 2182 02fb ... 0000ff0 54ef 9579 51bc 9042 115a 375e c28f 4dcc 0001000 > > This is on the Intel 530? What does the drive report in > /sys/block/sdN/queue/discard_zeroes_data? > Yes. It reports 0. In `hdparm -I`: * Data Set Management TRIM supported (limit 1 block) * Deterministic read data after TRIM You may also want to check out and compare the two attached test case files. [-- Attachment #2: sandisk_test --] [-- Type: application/octet-stream, Size: 3176 bytes --] [tom@localhost ~]$ sudo shred -n 1 /dev/sdc1 [tom@localhost ~]$ sudo hdparm --please-destroy-my-drive --trim-sector-ranges 256:1 /dev/sdc /dev/sdc: trimming 1 sectors from 1 ranges succeeded [tom@localhost ~]$ sudo hexdump /dev/sdc1 | head -n 3 0000000 2c1c 1ef7 b9ed bcc3 9bce cf70 3849 c601 0000010 63bb 6423 204f 95af e414 64c6 48c9 6a35 0000020 6914 f4c2 be6e 9c93 d561 9bf4 c76b 0bce [tom@localhost ~]$ sudo shred -n 1 /dev/sdc1 [tom@localhost ~]$ sudo hdparm --please-destroy-my-drive --trim-sector-ranges 256:128 /dev/sdc /dev/sdc: trimming 128 sectors from 1 ranges succeeded [tom@localhost ~]$ sudo hexdump /dev/sdc1 | head -n 3 0000000 de94 1b97 e60f 416c 6dce e138 a5a6 7cb9 0000010 5412 9f00 d573 d9c1 6fdf e4df 65ad 1e26 0000020 e6fd 24dd 0907 9471 b5f4 3b67 e7cf c165 [tom@localhost ~]$ sudo shred -n 1 /dev/sdc1 [tom@localhost ~]$ sudo hdparm --please-destroy-my-drive --trim-sector-ranges 256:256 /dev/sdc /dev/sdc: trimming 256 sectors from 1 ranges succeeded [tom@localhost ~]$ sudo hexdump /dev/sdc1 | head -n 3 0000000 0000 0000 0000 0000 0000 0000 0000 0000 * 0020000 8fdb 7a20 c3ee 9e29 c325 29b2 c323 29dd [tom@localhost ~]$ sudo shred -n 1 /dev/sdc1 [tom@localhost ~]$ for (( i=256; i<=256; i+=1 )); do echo "$i:1"; done | sudo hdparm --please-destroy-my-drive --trim-sector-ranges-stdin /dev/sdc /dev/sdc: trimming 1 sectors from 1 ranges succeeded [tom@localhost ~]$ sudo hexdump /dev/sdc1 | head -n 3 0000000 ae7d d079 e9c1 4f5c 3c83 8989 c1b7 3f5a 0000010 8c41 5197 66a6 cfee 1d2f 2468 0c95 32b9 0000020 4679 fdea 4b33 b86d 34c7 6721 a921 f35c [tom@localhost ~]$ sudo shred -n 1 /dev/sdc1 [tom@localhost ~]$ for (( i=256; i<=383; i+=1 )); do echo "$i:1"; done | sudo hdparm --please-destroy-my-drive --trim-sector-ranges-stdin /dev/sdc /dev/sdc: trimming 128 sectors from 128 ranges succeeded [tom@localhost ~]$ sudo hexdump /dev/sdc1 | head -n 3 0000000 6067 e2c3 8fe2 475c 2cf7 b7d1 5c9f e7df 0000010 9f12 1a09 8c15 d54d 0b9a 6973 c3a1 2bc1 0000020 7e30 4b1c e427 7608 9f1f 8d5a f390 ff1a [tom@localhost ~]$ sudo shred -n 1 /dev/sdc1 [tom@localhost ~]$ for (( i=256; i<=511; i+=1 )); do echo "$i:1"; done | sudo hdparm --please-destroy-my-drive --trim-sector-ranges-stdin /dev/sdc /dev/sdc: trimming 256 sectors from 256 ranges succeeded [tom@localhost ~]$ sudo hexdump /dev/sdc1 | head -n 3 0000000 0000 0000 0000 0000 0000 0000 0000 0000 * 0020000 e90c 9dd1 9f7c 0504 afc3 24ec e07d d93a [tom@localhost ~]$ sudo shred -n 1 /dev/sdc1 [tom@localhost ~]$ sudo hdparm --please-destroy-my-drive --trim-sector-ranges 256:65535 65791:65025 /dev/sdc /dev/sdc: trimming 130560 sectors from 2 ranges succeeded [tom@localhost ~]$ sudo hexdump /dev/sdc1 | head -n 3 0000000 0000 0000 0000 0000 0000 0000 0000 0000 * 1fe0000 f634 ce74 02e9 43a0 164d 7388 ea26 6767 [tom@localhost ~]$ sudo shred -n 1 /dev/sdc1 [tom@localhost ~]$ sudo hdparm --please-destroy-my-drive --trim-sector-ranges 256:65280 65536:65280 /dev/sdc /dev/sdc: trimming 130560 sectors from 2 ranges succeeded [tom@localhost ~]$ sudo hexdump /dev/sdc1 | head -n 3 0000000 0000 0000 0000 0000 0000 0000 0000 0000 * 3fc0000 62f4 c0a3 2789 ec72 f555 994b 449f 9ff2 [tom@localhost ~]$ [-- Attachment #3: intel_test --] [-- Type: application/octet-stream, Size: 1339 bytes --] [tom@localhost ~]$ sudo sh -c 'for (( i=512; i<=4096; i+=512 )); do blkdiscard -p "$i" -l 4096 /dev/sda3; sleep 5; echo "${i}:"; hexdump /dev/sda3 | head -n 3; done' 512: 0000000 8537 38ae 0ed2 1442 36c0 fab6 0bbf 9b96 0000010 3d5e 138f 8890 3af4 644e 04a6 062b 0625 0000020 9f62 e65c cb8d 03c2 36df 37c7 999c bd16 1024: 0000000 8537 38ae 0ed2 1442 36c0 fab6 0bbf 9b96 0000010 3d5e 138f 8890 3af4 644e 04a6 062b 0625 0000020 9f62 e65c cb8d 03c2 36df 37c7 999c bd16 1536: 0000000 8537 38ae 0ed2 1442 36c0 fab6 0bbf 9b96 0000010 3d5e 138f 8890 3af4 644e 04a6 062b 0625 0000020 9f62 e65c cb8d 03c2 36df 37c7 999c bd16 2048: 0000000 8537 38ae 0ed2 1442 36c0 fab6 0bbf 9b96 0000010 3d5e 138f 8890 3af4 644e 04a6 062b 0625 0000020 9f62 e65c cb8d 03c2 36df 37c7 999c bd16 2560: 0000000 8537 38ae 0ed2 1442 36c0 fab6 0bbf 9b96 0000010 3d5e 138f 8890 3af4 644e 04a6 062b 0625 0000020 9f62 e65c cb8d 03c2 36df 37c7 999c bd16 3072: 0000000 8537 38ae 0ed2 1442 36c0 fab6 0bbf 9b96 0000010 3d5e 138f 8890 3af4 644e 04a6 062b 0625 0000020 9f62 e65c cb8d 03c2 36df 37c7 999c bd16 3584: 0000000 8537 38ae 0ed2 1442 36c0 fab6 0bbf 9b96 0000010 3d5e 138f 8890 3af4 644e 04a6 062b 0625 0000020 9f62 e65c cb8d 03c2 36df 37c7 999c bd16 4096: 0000000 0000 0000 0000 0000 0000 0000 0000 0000 * 0001000 2196 b091 70e4 24eb 3dab 2834 a1cf 249a [tom@localhost ~]$ ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: configurable discard parameters 2015-06-23 21:25 ` Tom Yan @ 2015-06-24 2:55 ` Martin K. Petersen 2015-06-24 12:46 ` Tom Yan 0 siblings, 1 reply; 18+ messages in thread From: Martin K. Petersen @ 2015-06-24 2:55 UTC (permalink / raw) To: Tom Yan; +Cc: Martin K. Petersen, linux-scsi, linux-ide >>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes: Tom> total untrimmed sectors: (310635 - 1 - 2410) / 32 = 9632 total Tom> ranges: (43 GiB * (1024 ^ 3) / 512) / 65535 ~= 1376 average Tom> untrimmed sectors per range: (9632 / 1376) = 7 = (65535 % 8) Every type of drive has its own internal restrictions. Unless the drive guarantees zero after TRIM it is perfectly normal for heads, tails or random pieces in-between to be left untouched. I am surprised that the common case of contiguous range entries was not handled properly by your drive. Most of them deal with that just fine (regardless of their internal granularity and alignment constraints). It is normally only the partial block tracking between command invocations that causes grief. In any case. Unless discard_zeroes_data is 1 (and that requires guarantees above and beyond what can be discovered via the ATA Command Set) all bets are off wrt. dependable behavior. The code below is an untested proof of concept to see what it would take to alleviate your particular issue. I am, however, not at all convinced that introducing such a change is worth the risk. I know of at least a couple of devices that would blow up as a result of this patch... -- Martin K. Petersen Oracle Linux Engineering diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index 3131adcc1f87..9c270a303f0b 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -2238,8 +2238,24 @@ static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf) * with the unmap bit set. */ if (ata_id_has_trim(args->id)) { - put_unaligned_be64(65535 * 512 / 8, &rbuf[36]); - put_unaligned_be32(1, &rbuf[28]); + /* + * If the drive supports reliable zero after trim we set + * the granularity to 1 and the blocks per range to + * 0xffff. Otherwise we set set the granularity to 8 and + * restrict the blocks per range to 0xfff8. This is done + * to accommodate older drives which prefer 4K-alignment. + */ + + if (ata_id_has_zero_after_trim(args->id) && + args->dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM) { + put_unaligned_be64(ATA_MAX_TRIM_RANGE * 512 / 8, + &rbuf[36]); + put_unaligned_be32(1, &rbuf[28]); + } else { + put_unaligned_be64(ATA_ALIGNED_TRIM_RANGE * 512 / 8, + &rbuf[36]); + put_unaligned_be32(8, &rbuf[28]); + } } return 0; @@ -3168,7 +3184,14 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc) goto invalid_fld; buf = page_address(sg_page(scsi_sglist(scmd))); - size = ata_set_lba_range_entries(buf, 512, block, n_block); + + if (ata_id_has_zero_after_trim(dev->id) && + dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM) + size = ata_set_lba_range_entries(buf, 512, block, n_block, + ATA_MAX_TRIM_RANGE); + else + size = ata_set_lba_range_entries(buf, 512, block, n_block, + ATA_ALIGNED_TRIM_RANGE); if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) { /* Newer devices support queued TRIM commands */ diff --git a/include/linux/ata.h b/include/linux/ata.h index fed36418dd1c..8a17fa22cdbe 100644 --- a/include/linux/ata.h +++ b/include/linux/ata.h @@ -47,6 +47,8 @@ enum { ATA_MAX_SECTORS = 256, ATA_MAX_SECTORS_LBA48 = 65535,/* TODO: 65536? */ ATA_MAX_SECTORS_TAPE = 65535, + ATA_MAX_TRIM_RANGE = 0xffff, + ATA_ALIGNED_TRIM_RANGE = 0xfff8, ATA_ID_WORDS = 256, ATA_ID_CONFIG = 0, @@ -1012,19 +1014,20 @@ static inline void ata_id_to_hd_driveid(u16 *id) * TO NV CACHE PINNED SET. */ static inline unsigned ata_set_lba_range_entries(void *_buffer, - unsigned buf_size, u64 sector, unsigned long count) + unsigned buf_size, u64 sector, unsigned long count, + u16 bpe) { __le64 *buffer = _buffer; unsigned i = 0, used_bytes; while (i < buf_size / 8 ) { /* 6-byte LBA + 2-byte range per entry */ u64 entry = sector | - ((u64)(count > 0xffff ? 0xffff : count) << 48); + ((u64)(count > bpe ? bpe : count) << 48); buffer[i++] = __cpu_to_le64(entry); - if (count <= 0xffff) + if (count <= bpe) break; - count -= 0xffff; - sector += 0xffff; + count -= bpe; + sector += bpe; } used_bytes = ALIGN(i * 8, 512); ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: configurable discard parameters 2015-06-24 2:55 ` Martin K. Petersen @ 2015-06-24 12:46 ` Tom Yan 2015-06-25 1:15 ` Martin K. Petersen 0 siblings, 1 reply; 18+ messages in thread From: Tom Yan @ 2015-06-24 12:46 UTC (permalink / raw) To: Martin K. Petersen; +Cc: linux-scsi, linux-ide To be honest I don't think this patch should get in as well. First of all it wouldn't really solve problem for all devices which have different limits. Secondly I doubt it's related to Deterministic Zero AT ALL. (For one the SanDisk drive I have shows Deterministic Zero, still it behaves similarly as the Intel drive.) IHMO adding "switch" base on characteristics which is uncertain to be / not at all relevant is the worst thing to do ever. (Yeah I recall how Intel "follows" HDMI/DP color range spec senselessly again.) The only patching which would really mean something is to allow user to configure blocks per range and ranges per command, so that for users can tune kernel TRIM per device if they really want to, while leaving the "safe default" intact. However I am really curious how the drives "blow up" on less blocks per range. Isn't that even more of a firmware bug than the problem I have? If they blow up even with single range with less then 65535 blocks addressed, wouldn't they blow up anytime if they do filesystem TRIM? On 24 June 2015 at 10:55, Martin K. Petersen <martin.petersen@oracle.com> wrote: >>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes: > > Tom> total untrimmed sectors: (310635 - 1 - 2410) / 32 = 9632 total > Tom> ranges: (43 GiB * (1024 ^ 3) / 512) / 65535 ~= 1376 average > Tom> untrimmed sectors per range: (9632 / 1376) = 7 = (65535 % 8) > > Every type of drive has its own internal restrictions. Unless the drive > guarantees zero after TRIM it is perfectly normal for heads, tails or > random pieces in-between to be left untouched. > > I am surprised that the common case of contiguous range entries was not > handled properly by your drive. Most of them deal with that just fine > (regardless of their internal granularity and alignment constraints). It > is normally only the partial block tracking between command invocations > that causes grief. > > In any case. Unless discard_zeroes_data is 1 (and that requires > guarantees above and beyond what can be discovered via the ATA Command > Set) all bets are off wrt. dependable behavior. > > The code below is an untested proof of concept to see what it would take > to alleviate your particular issue. I am, however, not at all convinced > that introducing such a change is worth the risk. I know of at least a > couple of devices that would blow up as a result of this patch... > > -- > Martin K. Petersen Oracle Linux Engineering > > diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c > index 3131adcc1f87..9c270a303f0b 100644 > --- a/drivers/ata/libata-scsi.c > +++ b/drivers/ata/libata-scsi.c > @@ -2238,8 +2238,24 @@ static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf) > * with the unmap bit set. > */ > if (ata_id_has_trim(args->id)) { > - put_unaligned_be64(65535 * 512 / 8, &rbuf[36]); > - put_unaligned_be32(1, &rbuf[28]); > + /* > + * If the drive supports reliable zero after trim we set > + * the granularity to 1 and the blocks per range to > + * 0xffff. Otherwise we set set the granularity to 8 and > + * restrict the blocks per range to 0xfff8. This is done > + * to accommodate older drives which prefer 4K-alignment. > + */ > + > + if (ata_id_has_zero_after_trim(args->id) && > + args->dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM) { > + put_unaligned_be64(ATA_MAX_TRIM_RANGE * 512 / 8, > + &rbuf[36]); > + put_unaligned_be32(1, &rbuf[28]); > + } else { > + put_unaligned_be64(ATA_ALIGNED_TRIM_RANGE * 512 / 8, > + &rbuf[36]); > + put_unaligned_be32(8, &rbuf[28]); > + } > } > > return 0; > @@ -3168,7 +3184,14 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc) > goto invalid_fld; > > buf = page_address(sg_page(scsi_sglist(scmd))); > - size = ata_set_lba_range_entries(buf, 512, block, n_block); > + > + if (ata_id_has_zero_after_trim(dev->id) && > + dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM) > + size = ata_set_lba_range_entries(buf, 512, block, n_block, > + ATA_MAX_TRIM_RANGE); > + else > + size = ata_set_lba_range_entries(buf, 512, block, n_block, > + ATA_ALIGNED_TRIM_RANGE); > > if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) { > /* Newer devices support queued TRIM commands */ > diff --git a/include/linux/ata.h b/include/linux/ata.h > index fed36418dd1c..8a17fa22cdbe 100644 > --- a/include/linux/ata.h > +++ b/include/linux/ata.h > @@ -47,6 +47,8 @@ enum { > ATA_MAX_SECTORS = 256, > ATA_MAX_SECTORS_LBA48 = 65535,/* TODO: 65536? */ > ATA_MAX_SECTORS_TAPE = 65535, > + ATA_MAX_TRIM_RANGE = 0xffff, > + ATA_ALIGNED_TRIM_RANGE = 0xfff8, > > ATA_ID_WORDS = 256, > ATA_ID_CONFIG = 0, > @@ -1012,19 +1014,20 @@ static inline void ata_id_to_hd_driveid(u16 *id) > * TO NV CACHE PINNED SET. > */ > static inline unsigned ata_set_lba_range_entries(void *_buffer, > - unsigned buf_size, u64 sector, unsigned long count) > + unsigned buf_size, u64 sector, unsigned long count, > + u16 bpe) > { > __le64 *buffer = _buffer; > unsigned i = 0, used_bytes; > > while (i < buf_size / 8 ) { /* 6-byte LBA + 2-byte range per entry */ > u64 entry = sector | > - ((u64)(count > 0xffff ? 0xffff : count) << 48); > + ((u64)(count > bpe ? bpe : count) << 48); > buffer[i++] = __cpu_to_le64(entry); > - if (count <= 0xffff) > + if (count <= bpe) > break; > - count -= 0xffff; > - sector += 0xffff; > + count -= bpe; > + sector += bpe; > } > > used_bytes = ALIGN(i * 8, 512); ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: configurable discard parameters 2015-06-24 12:46 ` Tom Yan @ 2015-06-25 1:15 ` Martin K. Petersen 2015-06-26 7:05 ` Tom Yan 0 siblings, 1 reply; 18+ messages in thread From: Martin K. Petersen @ 2015-06-25 1:15 UTC (permalink / raw) To: Tom Yan; +Cc: Martin K. Petersen, linux-scsi, linux-ide >>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes: Tom> First of all it wouldn't really solve problem for all devices which Tom> have different limits. Only by virtue of us generally aligning on large power of two boundaries. The only reason I am entertaining this in the first place is that I have one drive that behaves in a way similar to yours. And if we can make things slightly better (but not perfect) for several drives without causing any regressions then that's worth exploring. Tom> Secondly I doubt it's related to Deterministic Zero AT ALL. (For Tom> one the SanDisk drive I have shows Deterministic Zero, still it Tom> behaves similarly as the Intel drive.) The drive reporting deterministic zero is not enough. It needs to be explicitly whitelisted before we report discard_zeroes_data=1. Tom> The only patching which would really mean something is to allow Tom> user to configure blocks per range and ranges per command, so that Tom> for users can tune kernel TRIM per device if they really want to, Tom> while leaving the "safe default" intact. I'll think about it. Tom> However I am really curious how the drives "blow up" on less blocks Tom> per range. Isn't that even more of a firmware bug than the problem Tom> I have? I have several older drives that expect a single contiguous LBA range. They don't handle multiple discontiguous ranges at all. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: configurable discard parameters 2015-06-25 1:15 ` Martin K. Petersen @ 2015-06-26 7:05 ` Tom Yan 0 siblings, 0 replies; 18+ messages in thread From: Tom Yan @ 2015-06-26 7:05 UTC (permalink / raw) To: Martin K. Petersen; +Cc: linux-scsi, linux-ide On 25 June 2015 at 09:15, Martin K. Petersen <martin.petersen@oracle.com> wrote: > > The drive reporting deterministic zero is not enough. It needs to be > explicitly whitelisted before we report discard_zeroes_data=1. > Not sure what you mean here. I just meant I don't think the "lower limit" thing I am experiencing is related to the "type" of TRIM, at least not for any Deterministic TRIM. (Although I don't really know what exactly does Deterministic Read Data means. Perhaps it means vendor would make sure the LBA after TRIM reads either zero or a specific pattern of data?) > > I have several older drives that expect a single contiguous LBA > range. They don't handle multiple discontiguous ranges at all. > Well discontiguous is somewhat inaccurate here. I guess what you mean is they simply don't handle the next range until the previous one is used up. For example, "0:65528, 65528:65528" are definitely contiguous, but not fulfiling the requirement of your drives; while "0:65535 131072:65535" are definitely not contiguous, but your drives take them fine. By the way what are the consequences? Data loss? System hang? ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2015-06-26 7:05 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-06-20 17:12 configurable discard parameters Tom Yan 2015-06-21 0:20 ` Martin K. Petersen 2015-06-21 7:03 ` Tom Yan 2015-06-21 8:05 ` Tom Yan 2015-06-21 12:36 ` Tom Yan 2015-06-21 20:30 ` Tom Yan 2015-06-22 20:57 ` Martin K. Petersen 2015-06-23 14:16 ` Tom Yan 2015-06-23 15:36 ` Martin K. Petersen 2015-06-23 16:41 ` Tom Yan 2015-06-23 17:03 ` Martin K. Petersen 2015-06-23 17:24 ` Tom Yan 2015-06-23 18:26 ` Martin K. Petersen 2015-06-23 21:25 ` Tom Yan 2015-06-24 2:55 ` Martin K. Petersen 2015-06-24 12:46 ` Tom Yan 2015-06-25 1:15 ` Martin K. Petersen 2015-06-26 7:05 ` Tom Yan
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.