* Problem with DISCARD and RAID10
@ 2012-11-06 9:32 Brad Campbell
2012-11-06 11:40 ` Shaohua Li
0 siblings, 1 reply; 7+ messages in thread
From: Brad Campbell @ 2012-11-06 9:32 UTC (permalink / raw)
To: linux RAID, Shaohua Li
G'day Shaohua,
I'm testing Vanilla 3.7.0-rc4 and bumping up against squillions of these :
[ 41.094726] request botched: dev sdc: type=1, flags=122d8081
[ 41.094774] sector 28317178, nr/cnr 0/32
[ 41.094815] bio ffff8807fe885300, biotail ffff8807fe887300, buffer (null), len 0
[ 41.100045] request botched: dev sda: type=1, flags=122d8081
[ 41.100094] sector 28317403, nr/cnr 0/32
[ 41.100134] bio ffff8807fe885840, biotail ffff8807fe887840, buffer (null), len 0
[ 41.100718] request botched: dev sdb: type=1, flags=122d8081
[ 41.100767] sector 28317179, nr/cnr 0/224
[ 41.100808] bio ffff8807fe885a80, biotail ffff8807fe887d80, buffer (null), len 0
[ 41.104649] request botched: dev sdc: type=1, flags=122d8081
[ 41.104697] sector 28317179, nr/cnr 0/224
[ 41.104738] bio ffff8807fe886000, biotail ffff8807fe887300, buffer (null), len 0
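With this many reports it can help to count them per device instead of reading them one by one. The following is a minimal sketch, not part of the original report: it assumes the two line shapes shown above ("request botched: dev ..." followed by "sector ..., nr/cnr .../..."), and the function name summarize_botched is made up for illustration.

```python
import re
from collections import Counter

# Patterns for the two informative lines of each "request botched" report.
BOTCHED = re.compile(r"request botched: dev (\w+): type=(\w+), flags=(\w+)")
SECTOR = re.compile(r"sector (\d+), nr/cnr (\d+)/(\d+)")


def summarize_botched(log_text):
    """Return (reports per device, [(dev, sector, nr, cnr), ...]) from dmesg text."""
    per_dev = Counter()
    records = []
    current_dev = None
    for line in log_text.splitlines():
        m = BOTCHED.search(line)
        if m:
            current_dev = m.group(1)
            per_dev[current_dev] += 1
            continue
        m = SECTOR.search(line)
        if m and current_dev is not None:
            sector, nr, cnr = (int(g) for g in m.groups())
            records.append((current_dev, sector, nr, cnr))
            current_dev = None  # wait for the next "request botched" header
    return per_dev, records


# Sample taken from the messages quoted above (bio lines omitted).
SAMPLE = """\
[ 41.094726] request botched: dev sdc: type=1, flags=122d8081
[ 41.094774] sector 28317178, nr/cnr 0/32
[ 41.100045] request botched: dev sda: type=1, flags=122d8081
[ 41.100094] sector 28317403, nr/cnr 0/32
[ 41.100718] request botched: dev sdb: type=1, flags=122d8081
[ 41.100767] sector 28317179, nr/cnr 0/224
[ 41.104649] request botched: dev sdc: type=1, flags=122d8081
[ 41.104697] sector 28317179, nr/cnr 0/224
"""

counts, records = summarize_botched(SAMPLE)
for dev, n in sorted(counts.items()):
    print(dev, n)
```

Feeding it the full dmesg output instead of SAMPLE would show whether the reports are spread evenly across the member drives or concentrated on one controller.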
This is a staging system that is eventually intended for production use,
however it's not important at the moment and might make a good test mule
for a while.
I'll lay out my whole background and config.
I have 6 x 240GB SSDs on a test bench (3 Intel 330 & 3 Samsung 830). I
have the three Samsungs connected to the on-board AHCI ports and the
three Intels on a Marvell PCIe board serviced by sata_mv.
System is an AMD FX8350 with 32GB of RAM. Kernel is x86_64. Nothing else of
note.
All drives pass individual read/write and filesystem trim tests (if I
just create the filesystem on the individual drive).
All six drives are partitioned identically.
root@test:~# sfdisk -d /dev/sda
# partition table of /dev/sda
unit: sectors
/dev/sda1 : start= 63, size= 273042, Id=83, bootable
/dev/sda2 : start= 273105, size=419441085, Id=83
/dev/sda3 : start= 0, size= 0, Id= 0
/dev/sda4 : start= 0, size= 0, Id= 0
Partition 1 on all drives is a bootable 6 way RAID-1 and not relevant
here (gets mounted as /boot and is ext2).
The second partitions are configured in a RAID10 near 2, so there are
three pairs of mirrors that are striped together (Intel/Samsung x 3).
root@test:~# mdadm --detail /dev/md2
/dev/md2:
Version : 1.2
Creation Time : Thu Nov 1 20:11:38 2012
Raid Level : raid10
Array Size : 628767744 (599.64 GiB 643.86 GB)
Used Dev Size : 209589248 (199.88 GiB 214.62 GB)
Raid Devices : 6
Total Devices : 6
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Tue Nov 6 17:07:13 2012
State : active
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0
Layout : near=2
Chunk Size : 128K
Name : test:2 (local to host test)
UUID : abe7511b:5eb834e1:f425f2a9:3d3ebd56
Events : 842
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 8 66 1 active sync /dev/sde2
2 8 18 2 active sync /dev/sdb2
3 8 82 3 active sync /dev/sdf2
4 8 34 4 active sync /dev/sdc2
5 8 98 5 active sync /dev/sdg2
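The near=2 layout described above can be sketched as follows. This is a simplified model for illustration only, not the md driver's code: it assumes the device count is a multiple of the copy count, ignores the superblock data offset, and the function name near_layout is made up.

```python
# Device order as listed by `mdadm --detail /dev/md2` above (RaidDevice 0..5).
DEVICES = ["sda2", "sde2", "sdb2", "sdf2", "sdc2", "sdg2"]
NEAR_COPIES = 2
CHUNK_SECTORS = 128 * 1024 // 512  # 128K chunk = 256 sectors of 512 bytes


def near_layout(array_sector, devices=DEVICES, copies=NEAR_COPIES,
                chunk_sectors=CHUNK_SECTORS):
    """Map an array sector to [(device, sector within the member's data area)].

    Simplified: assumes len(devices) is a multiple of `copies` and
    ignores the data offset reserved for the md superblock.
    """
    chunk, offset = divmod(array_sector, chunk_sectors)
    slot = chunk * copies                      # position of the first copy
    stripe, first_dev = divmod(slot, len(devices))
    dev_sector = stripe * chunk_sectors + offset
    return [(devices[first_dev + c], dev_sector) for c in range(copies)]


# Show where the first four chunks of the array land.
for chunk in range(4):
    print(chunk, near_layout(chunk * CHUNK_SECTORS))
```

In this model consecutive chunks rotate across the pairs sda2/sde2, sdb2/sdf2 and sdc2/sdg2, so a discard covering more than one chunk is split across several member drives.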
The array is partitioned :
root@test:~# sfdisk -d /dev/md2
# partition table of /dev/md2
unit: sectors
/dev/md2p1 : start= 3072, size= 41942016, Id=83
/dev/md2p2 : start= 41945088, size= 83887104, Id=83
/dev/md2p3 : start=125832192, size=1131703296, Id=83
/dev/md2p4 : start= 0, size= 0, Id= 0
All three partitions are default ext4 created with mke2fs -t ext4 /dev/blah
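As a sanity check on the numbers above, the partition table can be compared with the array size and chunk size reported by mdadm. This is a small arithmetic sketch using only the values quoted in this mail; it assumes 512-byte sectors, that "Array Size" is in KiB, and that a full stripe is three chunks (six devices, two near copies). The function name check_partitions is made up.

```python
SECTOR_BYTES = 512
ARRAY_KIB = 628767744                       # "Array Size" from mdadm --detail
ARRAY_SECTORS = ARRAY_KIB * 1024 // SECTOR_BYTES
CHUNK_SECTORS = 128 * 1024 // SECTOR_BYTES  # 128K chunk -> 256 sectors
STRIPE_SECTORS = CHUNK_SECTORS * 3          # 6 devices / 2 copies = 3 chunks

# (name, start sector, size in sectors) from `sfdisk -d /dev/md2`
PARTITIONS = [
    ("md2p1", 3072, 41942016),
    ("md2p2", 41945088, 83887104),
    ("md2p3", 125832192, 1131703296),
]


def check_partitions(partitions=PARTITIONS):
    """Return one dict per partition with its end sector and alignment flags."""
    report = []
    for name, start, size in partitions:
        report.append({
            "name": name,
            "end": start + size,
            "chunk_aligned": start % CHUNK_SECTORS == 0,
            "stripe_aligned": start % STRIPE_SECTORS == 0,
            "fits": start + size <= ARRAY_SECTORS,
        })
    return report


for row in check_partitions():
    print(row)
```

All three partitions start on a chunk boundary and fit inside the array, so partition misalignment does not look like a factor here.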
The Intel drives support :
* Data Set Management TRIM supported (limit 1 block)
* Deterministic read data after TRIM
The Samsung Drives support :
* Data Set Management TRIM supported (limit 8 blocks)
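To put the two "limit N blocks" figures in perspective, here is a rough calculation of how much a single TRIM command could cover. It assumes the standard ATA Data Set Management payload format (512-byte blocks of 8-byte range entries, each with a 16-bit sector count) and 512-byte logical sectors. What the kernel actually sends per command may well be lower, and the function name max_trim_sectors is made up.

```python
SECTOR_BYTES = 512                          # assumed logical sector size
ENTRY_BYTES = 8                             # 48-bit LBA + 16-bit sector count
ENTRIES_PER_BLOCK = 512 // ENTRY_BYTES      # 64 range entries per payload block
MAX_SECTORS_PER_ENTRY = 0xFFFF              # largest 16-bit sector count


def max_trim_sectors(limit_blocks):
    """Upper bound on sectors one TRIM command can cover for a given block limit."""
    return limit_blocks * ENTRIES_PER_BLOCK * MAX_SECTORS_PER_ENTRY


for name, blocks in (("Intel 330", 1), ("Samsung 830", 8)):
    sectors = max_trim_sectors(blocks)
    gib = sectors * SECTOR_BYTES / 2**30
    print(f"{name}: {sectors} sectors ({gib:.2f} GiB) per command")
```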
I don't use, test or intend to use discard as a filesystem option;
however, on my other machines (with single or multiple non-RAID SSDs) I
batch run fstrim once a week or so.
Kernel version is vanilla git 3.7.0-rc4.
When I run fstrim on a partition in the array,
e.g. fstrim -v /home (where /home is on /dev/md2p2),
I get a dmesg full of the messages quoted at the top of the mail.
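Before running fstrim against an md device it can be useful to see which discard limits the block layer advertises for the array and for its members. The sketch below reads the discard_granularity and discard_max_bytes queue attributes from sysfs. The helper names and the sysfs_root parameter (there so the function can be pointed at a test directory) are made up for illustration, and the device names in the loop are just the ones from this report.

```python
import os


def discard_limits(dev, sysfs_root="/sys/block"):
    """Return the discard queue limits for a block device, or None where unreadable."""
    queue = os.path.join(sysfs_root, dev, "queue")
    limits = {}
    for attr in ("discard_granularity", "discard_max_bytes"):
        try:
            with open(os.path.join(queue, attr)) as f:
                limits[attr] = int(f.read().strip())
        except (OSError, ValueError):
            limits[attr] = None  # attribute missing or not a number
    return limits


def supports_discard(limits):
    """A device with a non-zero discard_max_bytes accepts discard requests."""
    return bool(limits.get("discard_max_bytes"))


# Device names from this report; on another machine they may not exist,
# in which case the values are simply reported as None.
for dev in ("md2", "sda", "sdb", "sdc"):
    limits = discard_limits(dev)
    print(dev, limits, "discard" if supports_discard(limits) else "no discard")
```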
I did see some data corruption on one of the partitions that required a
re-format and re-load at one point, but I have been unable to reproduce
that.
As this is a test system, a complete reformat and reload is mostly
automated and therefore loss or corruption is of little overall consequence.
Please let me know if there is anything I can do to assist.
Regards,
Brad
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Problem with DISCARD and RAID10
2012-11-06 9:32 Problem with DISCARD and RAID10 Brad Campbell
@ 2012-11-06 11:40 ` Shaohua Li
2012-11-06 19:55 ` Holger Kiehl
2012-11-06 22:39 ` Martin K. Petersen
0 siblings, 2 replies; 7+ messages in thread
From: Shaohua Li @ 2012-11-06 11:40 UTC (permalink / raw)
To: Brad Campbell; +Cc: linux RAID, martin.petersen
On Tue, Nov 06, 2012 at 05:32:26PM +0800, Brad Campbell wrote:
> G'day Shaohua,
>
> I'm testing Vanilla 3.7.0-rc4 and bumping up against squillions of these :
>
> [ 41.094726] request botched: dev sdc: type=1, flags=122d8081
> [ 41.094774] sector 28317178, nr/cnr 0/32
> [ 41.094815] bio ffff8807fe885300, biotail ffff8807fe887300,
> buffer (null), len 0
> [ 41.100045] request botched: dev sda: type=1, flags=122d8081
> [ 41.100094] sector 28317403, nr/cnr 0/32
> [ 41.100134] bio ffff8807fe885840, biotail ffff8807fe887840,
> buffer (null), len 0
> [ 41.100718] request botched: dev sdb: type=1, flags=122d8081
> [ 41.100767] sector 28317179, nr/cnr 0/224
> [ 41.100808] bio ffff8807fe885a80, biotail ffff8807fe887d80,
> buffer (null), len 0
> [ 41.104649] request botched: dev sdc: type=1, flags=122d8081
> [ 41.104697] sector 28317179, nr/cnr 0/224
> [ 41.104738] bio ffff8807fe886000, biotail ffff8807fe887300,
> buffer (null), len 0
CCing Martin to check whether he has any ideas.
Hmm, I thought this problem had already been fixed by Martin's discard merge patches,
and my last test on a SATA SSD didn't show it. Maybe there is still
a corner case where we don't handle discard request merging well. I've run out of time today;
I'll check tomorrow.
Thanks,
Shaohua
* Re: Problem with DISCARD and RAID10
2012-11-06 11:40 ` Shaohua Li
@ 2012-11-06 19:55 ` Holger Kiehl
2012-11-06 22:39 ` Martin K. Petersen
1 sibling, 0 replies; 7+ messages in thread
From: Holger Kiehl @ 2012-11-06 19:55 UTC (permalink / raw)
To: Shaohua Li; +Cc: Brad Campbell, linux RAID, martin.petersen
Hello,
just wanted to report that I see the same messages with 3.7.0-rc4 and a
RAID0 on two SSDs (Crucial M4):
Nov 6 20:46:06 yoda kernel: [ 5583.447345] bio ffff8803fe73aa80, biotail ffff8803fe73aa80, buffer (null), len 0
Nov 6 20:46:06 yoda kernel: [ 5583.455443] request botched: dev sda: type=1, flags=122d8081
Nov 6 20:46:06 yoda kernel: [ 5583.455446] sector 11433985, nr/cnr 0/1024
Nov 6 20:46:06 yoda kernel: [ 5583.455448] bio ffff8803fe73a9c0, biotail ffff8803d6a3b9c0, buffer (null), len 0
Nov 6 20:46:06 yoda kernel: [ 5583.456452] request botched: dev sdb: type=1, flags=122d8081
Nov 6 20:46:06 yoda kernel: [ 5583.456454] sector 11433985, nr/cnr 0/1024
Nov 6 20:46:06 yoda kernel: [ 5583.456456] bio ffff8803fe73a900, biotail ffff8803d6a3b540, buffer (null), len 0
Nov 6 20:46:06 yoda kernel: [ 5583.457256] request botched: dev sda: type=1, flags=122d8081
Nov 6 20:46:06 yoda kernel: [ 5583.457258] sector 11433986, nr/cnr 0/1024
Nov 6 20:46:06 yoda kernel: [ 5583.457260] bio ffff8803fe73a840, biotail ffff8803d6a3b9c0, buffer (null), len 0
Nov 6 20:46:06 yoda kernel: [ 5583.458315] request botched: dev sdb: type=1, flags=122d8081
Nov 6 20:46:06 yoda kernel: [ 5583.458317] sector 11433986, nr/cnr 0/1024
Regards,
Holger
* Re: Problem with DISCARD and RAID10
2012-11-06 11:40 ` Shaohua Li
2012-11-06 19:55 ` Holger Kiehl
@ 2012-11-06 22:39 ` Martin K. Petersen
2012-11-07 3:49 ` Brad Campbell
1 sibling, 1 reply; 7+ messages in thread
From: Martin K. Petersen @ 2012-11-06 22:39 UTC (permalink / raw)
To: Shaohua Li; +Cc: Brad Campbell, linux RAID, martin.petersen, James.Bottomley
>>>>> "Shaohua" == Shaohua Li <shli@kernel.org> writes:
Shaohua> cc Martin to check if he has idea.
Shaohua> Hmm, I thought such problem is already fixed by Martin's
Shaohua> discard merge patches and my last test in SATA SSD doesn't show
Shaohua> this problem.
Discard is broken in 3.7 because the relevant block layer patches went
in but James didn't queue the matching SCSI changes.
--
Martin K. Petersen Oracle Linux Engineering
* Re: Problem with DISCARD and RAID10
2012-11-06 22:39 ` Martin K. Petersen
@ 2012-11-07 3:49 ` Brad Campbell
2012-11-07 15:39 ` Martin K. Petersen
0 siblings, 1 reply; 7+ messages in thread
From: Brad Campbell @ 2012-11-07 3:49 UTC (permalink / raw)
To: Martin K. Petersen; +Cc: Shaohua Li, linux RAID, James.Bottomley
On 07/11/12 06:39, Martin K. Petersen wrote:
>>>>>> "Shaohua" == Shaohua Li <shli@kernel.org> writes:
>
> Shaohua> cc Martin to check if he has idea.
>
> Shaohua> Hmm, I thought such problem is already fixed by Martin's
> Shaohua> discard merge patches and my last test in SATA SSD doesn't show
> Shaohua> this problem.
>
> Discard is broken in 3.7 because the relevant block layer patches went
> in but James didn't queue the matching SCSI changes.
>
http://www.spinics.net/lists/linux-scsi/msg61874.html
To be clear, if I add patches 6-8 to the current -rc4 I should be good
to go?
Regards,
Brad
* Re: Problem with DISCARD and RAID10
2012-11-07 3:49 ` Brad Campbell
@ 2012-11-07 15:39 ` Martin K. Petersen
2012-11-07 17:14 ` Brad Campbell
0 siblings, 1 reply; 7+ messages in thread
From: Martin K. Petersen @ 2012-11-07 15:39 UTC (permalink / raw)
To: Brad Campbell; +Cc: Martin K. Petersen, Shaohua Li, linux RAID, James.Bottomley
>>>>> "Brad" == Brad Campbell <brad@fnarfbargle.com> writes:
>> Discard is broken in 3.7 because the relevant block layer patches
>> went in but James didn't queue the matching SCSI changes.
Brad> http://www.spinics.net/lists/linux-scsi/msg61874.html
Brad> To be clear, if I add patches 6-8 to the current -rc4 I should be
Brad> good to go?
Yep!
--
Martin K. Petersen Oracle Linux Engineering
* Re: Problem with DISCARD and RAID10
2012-11-07 15:39 ` Martin K. Petersen
@ 2012-11-07 17:14 ` Brad Campbell
0 siblings, 0 replies; 7+ messages in thread
From: Brad Campbell @ 2012-11-07 17:14 UTC (permalink / raw)
To: Martin K. Petersen; +Cc: Shaohua Li, linux RAID, James.Bottomley
On 07/11/12 23:39, Martin K. Petersen wrote:
>>>>>> "Brad" == Brad Campbell <brad@fnarfbargle.com> writes:
>
>>> Discard is broken in 3.7 because the relevant block layer patches
>>> went in but James didn't queue the matching SCSI changes.
>
> Brad> http://www.spinics.net/lists/linux-scsi/msg61874.html
>
> Brad> To be clear, if I add patches 6-8 to the current -rc4 I should be
> Brad> good to go?
>
> Yep!
>
Can confirm that with those patches applied DISCARD behaves in a
predictable and verifiable fashion on my RAID10. Cheers!
Regards,
Brad
Thread overview: 7+ messages
2012-11-06 9:32 Problem with DISCARD and RAID10 Brad Campbell
2012-11-06 11:40 ` Shaohua Li
2012-11-06 19:55 ` Holger Kiehl
2012-11-06 22:39 ` Martin K. Petersen
2012-11-07 3:49 ` Brad Campbell
2012-11-07 15:39 ` Martin K. Petersen
2012-11-07 17:14 ` Brad Campbell