I think it is high time to get siimage.c 1.06 bug-free. It seems to have some problems with my HD + SATA converter, mainly performance-wise. I think it is due to this error coming constantly in dmesg unless I comment it out in the sources:

hde: sata_error = 0x00000000, watchdog = 0, siimage_mmio_ide_dma_test_irq

What is the problem? I think it may be because I have an SiI3112A controller onboard and the driver detects it without the A revision (just as SiI3112). And/or it is because I connected a PATA drive via a SATA converter to the controller.

Now with 2.6-test10 the performance got a bit better compared to test9 and earlier 2.6 kernels. Before it was at most 22MB/s and now it is 25MB/s. But it is still far away from the 2.4.22-ac4 kernel, which managed 37MB/s, and that is itself bad compared to Windows, which reaches 50MB/s. It is NOT a problem of read-ahead. I tried various hdparm parameters and they didn't improve the situation.

What makes the situation even worse: when I try hdparm -d1 /dev/hde (though hdparm states DMA is already on), the drive stops working and I get some lines of errors like drive-seek error and some IRQ-related stuff, so I have to push the reset button. Someone else using a native SATA Maxtor on a SiI3112 (dunno whether A or not) has no problems; hdparm -d works as well and he gets 40MB/s with test10.

So what may be the problem, and how do I get rid of it? (1. error message, 2. bad performance, 3. hdparm -d1 malfunctioning.) 1 and 3 were also present with 2.4.22-ac4, and 2 wasn't as bad there, as stated above, so except for 2 there is no regression, but also no fix yet. Changing max_kb_per_request didn't help either. If you need more info, just ask me.
Here is the relevant part of dmesg:

SiI3112 Serial ATA: IDE controller at PCI slot 0000:01:0b.0
SiI3112 Serial ATA: chipset revision 2
SiI3112 Serial ATA: 100% native mode on irq 11
ide2: MMIO-DMA at 0xf9844000-0xf9844007, BIOS settings: hde:pio, hdf:pio
ide3: MMIO-DMA at 0xf9844008-0xf984400f, BIOS settings: hdg:pio, hdh:pio
hde: SAMSUNG SP1614N, ATA DISK drive
ide2 at 0xf9844080-0xf9844087,0xf984408a on irq 11
hde: max request size: 7KiB
hde: 312581808 sectors (160041 MB) w/8192KiB Cache, CHS=19457/255/63, UDMA(100)
 /dev/ide/host2/bus0/target0/lun0:<4>hde: sata_error = 0x00000000, watchdog = 0, siimage_mmio_ide_dma_test_irq
 p1 p2 p3 <<4>hde: sata_error = 0x00000000, watchdog = 0, siimage_mmio_ide_dma_test_irq
 p5<4>hde: sata_error = 0x00000000, watchdog = 0, siimage_mmio_ide_dma_test_irq
 p6<4>hde: sata_error = 0x00000000, watchdog = 0, siimage_mmio_ide_dma_test_irq
 p7<4>hde: sata_error = 0x00000000, watchdog = 0, siimage_mmio_ide_dma_test_irq
 p8<4>hde: sata_error = 0x00000000, watchdog = 0, siimage_mmio_ide_dma_test_irq
 p9

Here is hdparm -iI /dev/hde:

/dev/hde:

 Model=SAMSUNG SP1614N, FwRev=TM100-24, SerialNo=0735J1FW702444
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4
 BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: (null):

 * signifies the current active mode

ATA device, with non-removable media
	Model Number:       SAMSUNG SP1614N
	Serial Number:      0735J1FW702444
	Firmware Revision:  TM100-24
Standards:
	Supported: 7 6 5 4
	Likely used: 7
Configuration:
	Logical		max	current
	cylinders	16383	65535
	heads		16	1
	sectors/track	63	63
	--
	CHS current addressable sectors:    4128705
	LBA    user addressable sectors:  268435455
	LBA48  user addressable sectors:  312581808
	device size with M = 1024*1024:      152627 MBytes
	device size with M = 1000*1000:      160041 MBytes (160 GB)
Capabilities:
	LBA, IORDY(can be disabled)
	Queue depth: 1
	Standby timer values: spec'd by Standard, no device specific minimum
	R/W multiple sector transfer: Max = 16	Current = 16
	Recommended acoustic management value: 254, current value: 0
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4
	     Cycle time: no flow control=240ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	READ BUFFER cmd
	   *	WRITE BUFFER cmd
	   *	Host Protected Area feature set
	   *	Look-ahead
	   *	Write cache
	   *	Power Management feature set
		Security Mode feature set
	   *	SMART feature set
	   *	FLUSH CACHE EXT command
	   *	Mandatory FLUSH CACHE command
	   *	Device Configuration Overlay feature set
	   *	48-bit Address feature set
		Automatic Acoustic Management feature set
		SET MAX security extension
	   *	DOWNLOAD MICROCODE cmd
	   *	SMART self-test
	   *	SMART error logging
Security:
	Master password revision code = 65534
		supported
	not	enabled
	not	locked
	not	frozen
	not	expired: security count
		supported: enhanced erase
	96min for SECURITY ERASE UNIT. 96min for ENHANCED SECURITY ERASE UNIT.
HW reset results:
	CBLID- below Vih
	Device num = 0 determined by the jumper
Checksum: correct

And here is the complete dmesg:

Linux version 2.6.0-gentoo (root@tachyon) (gcc-Version 3.3.2 20031022 (Gentoo Linux 3.3.2-r2, propolice)) #6 Tue Nov 25 14:40:13 CET 2003
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
 BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
 BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
 BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
127MB HIGHMEM available.
896MB LOWMEM available.
On node 0 totalpages: 262128
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 225280 pages, LIFO batch:16
  HighMem zone: 32752 pages, LIFO batch:7
DMI 2.2 present.
ACPI: RSDP (v000 Nvidia ) @ 0x000f6b60
ACPI: RSDT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x3fff3000
ACPI: FADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x3fff3040
ACPI: MADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x3fff79c0
ACPI: DSDT (v001 NVIDIA AWRDACPI 0x00001000 MSFT 0x0100000d) @ 0x00000000
Building zonelist for node : 0
Kernel command line: root=/dev/hde6 hdg=none vga=0x51A video=vesa:mtrr,ywrap
ide_setup: hdg=none
Initializing CPU#0
PID hash table entries: 4096 (order 12: 32768 bytes)
Detected 1904.513 MHz processor.
Console: colour dummy device 80x25
Memory: 1032280k/1048512k available (3128k kernel code, 15280k reserved, 1007k data, 160k init, 131008k highmem)
Calibrating delay loop... 3768.32 BogoMIPS
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
checking if image is initramfs...it isn't (ungzip failed); looks like an initrd
Freeing initrd memory: 304k freed
CPU: After generic identify, caps: 0383fbff c1c3fbff 00000000 00000000
CPU: After vendor identify, caps: 0383fbff c1c3fbff 00000000 00000000
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 256K (64 bytes/line)
CPU: After all inits, caps: 0383fbff c1c3fbff 00000000 00000020
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: AMD Athlon(tm) stepping 01
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xfb420, last bus=2
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20031002
ACPI: IRQ 9 was Edge Triggered, setting to Level Triggerd
ACPI: Interpreter enabled
ACPI: Using PIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.HUB0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.AGPB._PRT]
ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK2] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK3] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNK4] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNK5] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LUBA] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LUBB] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LMAC] (IRQs 3 4 5 6 7 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LAPU] (IRQs 3 4 5 6 7 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LACI] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LMCI] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LSMB] (IRQs 3 4 5 6 7 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LUB2] (IRQs 3 4 5 6 7 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LFIR] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [L3CM] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LIDE] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [APC1] (IRQs *16)
ACPI: PCI Interrupt Link [APC2] (IRQs 17)
ACPI: PCI Interrupt Link [APC3] (IRQs *18)
ACPI: PCI Interrupt Link [APC4] (IRQs *19)
ACPI: PCI Interrupt Link [APCE] (IRQs 16)
ACPI: PCI Interrupt Link [APCF] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCG] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCH] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCI] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCJ] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCK] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCS] (IRQs *23)
ACPI: PCI Interrupt Link [APCL] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCM] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [AP3C] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCZ] (IRQs 20 21 22)
Linux Plug and Play Support v0.97 (c) Adam Belay
SCSI subsystem initialized
drivers/usb/core/usb.c: registered new driver usbfs
drivers/usb/core/usb.c: registered new driver hub
ACPI: PCI Interrupt Link [LSMB] enabled at IRQ 10
ACPI: PCI Interrupt Link [LUBA] enabled at IRQ 11
ACPI: PCI Interrupt Link [LUBB] enabled at IRQ 5
ACPI: PCI Interrupt Link [LUB2] enabled at IRQ 10
ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 10
ACPI: PCI Interrupt Link [LAPU] enabled at IRQ 10
ACPI: PCI Interrupt Link [LACI] enabled at IRQ 5
ACPI: PCI Interrupt Link [LFIR] enabled at IRQ 11
ACPI: PCI Interrupt Link [LNK4] enabled at IRQ 11
ACPI: PCI Interrupt Link [LNK1] enabled at IRQ 5
ACPI: PCI Interrupt Link [LNK3] enabled at IRQ 11
PCI: Using ACPI for IRQ routing
PCI: if you experience problems, try using option 'pci=noacpi' or even 'acpi=off'
vesafb: framebuffer at 0xc0000000, mapped to 0xf8808000, size 16384k
vesafb: mode is 1280x1024x16, linelength=2560, pages=1
vesafb: protected mode interface info at c000:ea60
vesafb: scrolling: redraw
vesafb: directcolor: size=0:5:6:5, shift=0:11:5:0
fb0: VESA VGA frame buffer device
Machine check exception polling timer started.
IA-32 Microcode Update Driver: v1.13 <tigran@veritas.com>
apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16ac)
apm: overridden by ACPI.
highmem bounce pool size: 64 pages
devfs: v1.22 (20021013) Richard Gooch (rgooch@atnf.csiro.au)
devfs: boot_options: 0x1
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NTFS driver 2.1.5 [Flags: R/W].
udf: registering filesystem
SGI XFS for Linux with large block numbers, no debug enabled
ACPI: Power Button (FF) [PWRF]
ACPI: Fan [FAN] (on)
ACPI: Processor [CPU0] (supports C1)
ACPI: Thermal Zone [THRM] (43 C)
bootsplash 3.1.3-2003/11/14: looking for picture.... silentjpeg size 155838 bytes, found (1280x1024, 155850 bytes, v3).
Console: switching to colour frame buffer device 153x58
pty: 256 Unix98 ptys configured
Real Time Clock Driver v1.12
Non-volatile memory driver v1.2
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
Using anticipatory io scheduler
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
loop: loaded (max 8 devices)
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.18.
PCI: Setting latency timer of device 0000:00:04.0 to 64
eth0: forcedeth.c: subsystem: 0147b:1c00
Linux video capture interface: v1.00
DriverInitialize MAC address = ff:ff:ff:ff:ff:ff:00:00
DriverInitialize key = ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
DVB: registering new adapter (Technisat SkyStar2 driver).
DVB: registering frontend 0:0 (Zarlink MT312)...
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
NFORCE2: IDE controller at PCI slot 0000:00:09.0
NFORCE2: chipset revision 162
NFORCE2: not 100% native mode: will probe irqs later
NFORCE2: 0000:00:09.0 (rev a2) UDMA133 controller
ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA
hda: _NEC DV-5800A, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: LITE-ON LTR-16102B, ATAPI CD/DVD-ROM drive
hdd: IOMEGA ZIP 100 ATAPI, ATAPI FLOPPY drive
ide1 at 0x170-0x177,0x376 on irq 15
SiI3112 Serial ATA: IDE controller at PCI slot 0000:01:0b.0
SiI3112 Serial ATA: chipset revision 2
SiI3112 Serial ATA: 100% native mode on irq 11
ide2: MMIO-DMA at 0xf9844000-0xf9844007, BIOS settings: hde:pio, hdf:pio
ide3: MMIO-DMA at 0xf9844008-0xf984400f, BIOS settings: hdg:pio, hdh:pio
hde: SAMSUNG SP1614N, ATA DISK drive
ide2 at 0xf9844080-0xf9844087,0xf984408a on irq 11
hde: max request size: 7KiB
hde: 312581808 sectors (160041 MB) w/8192KiB Cache, CHS=19457/255/63, UDMA(100)
 /dev/ide/host2/bus0/target0/lun0:<4>hde: sata_error = 0x00000000, watchdog = 0, siimage_mmio_ide_dma_test_irq
 p1 p2 p3 <<4>hde: sata_error = 0x00000000, watchdog = 0, siimage_mmio_ide_dma_test_irq
 p5<4>hde: sata_error = 0x00000000, watchdog = 0, siimage_mmio_ide_dma_test_irq
 p6<4>hde: sata_error = 0x00000000, watchdog = 0, siimage_mmio_ide_dma_test_irq
 p7<4>hde: sata_error = 0x00000000, watchdog = 0, siimage_mmio_ide_dma_test_irq
 p8<4>hde: sata_error = 0x00000000, watchdog = 0, siimage_mmio_ide_dma_test_irq
 p9
hda: ATAPI 48X DVD-ROM drive, 512kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.12
hdc: ATAPI 40X CD-ROM CD-R/RW drive, 2048kB Cache, DMA
ide-floppy driver 0.99.newide
hdd: No disk in drive
hdd: 98304kB, 32/64/96 CHS, 4096 kBps, 512 sector size, 2941 rpm
ohci1394: $Rev: 1045 $ Ben Collins <bcollins@debian.org>
PCI: Setting latency timer of device 0000:00:0d.0 to 64
ohci1394_0: OHCI-1394 1.1 (PCI): IRQ=[11] MMIO=[cc084000-cc0847ff] Max Packet=[2048]
ohci1394_0: SelfID received outside of bus reset sequence
video1394: Installed video1394 module
raw1394: /dev/raw1394 device initialized
Console: switching to colour frame buffer device 153x58
ehci_hcd 0000:00:02.2: EHCI Host Controller
PCI: Setting latency timer of device 0000:00:02.2 to 64
ehci_hcd 0000:00:02.2: irq 10, pci mem f984c000
ehci_hcd 0000:00:02.2: new USB bus registered, assigned bus number 1
PCI: cache line size of 64 is not supported by device 0000:00:02.2
ehci_hcd 0000:00:02.2: USB 2.0 enabled, EHCI 1.00, driver 2003-Jun-13
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 6 ports detected
ohci_hcd: 2003 Oct 13 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
ohci_hcd: block sizes: ed 64 td 64
ohci_hcd 0000:00:02.0: OHCI Host Controller
PCI: Setting latency timer of device 0000:00:02.0 to 64
ohci_hcd 0000:00:02.0: irq 11, pci mem f984e000
ohci_hcd 0000:00:02.0: new USB bus registered, assigned bus number 2
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
ohci_hcd 0000:00:02.1: OHCI Host Controller
PCI: Setting latency timer of device 0000:00:02.1 to 64
ohci_hcd 0000:00:02.1: irq 5, pci mem f9850000
ohci_hcd 0000:00:02.1: new USB bus registered, assigned bus number 3
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 3 ports detected
drivers/usb/host/uhci-hcd.c: USB Universal Host Controller Interface driver v2.1
drivers/usb/core/usb.c: registered new driver usblp
drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver
Initializing USB Mass Storage driver...
drivers/usb/core/usb.c: registered new driver usb-storage
USB Mass Storage support registered.
drivers/usb/core/usb.c: registered new driver usbscanner
drivers/usb/image/scanner.c: 0.4.15:USB Scanner Driver
mice: PS/2 mouse device common for all mice
input: ImPS/2 Generic Wheel Mouse on isa0060/serio1
serio: i8042 AUX port at 0x60,0x64 irq 12
input: AT Translated Set 2 keyboard on isa0060/serio0
serio: i8042 KBD port at 0x60,0x64 irq 1
I2O Core - (C) Copyright 1999 Red Hat Software
I2O: Event thread created as pid 15
i2o: Checking for PCI I2O controllers...
I2O configuration manager v 0.04. (C) Copyright 1999 Red Hat Software
i2c /dev entries driver
i2c_adapter i2c-0: nForce2 SMBus adapter at 0x5000
i2c_adapter i2c-1: nForce2 SMBus adapter at 0x5100
ieee1394: Host added: ID:BUS[0-00:1023] GUID[000000508df0fbe3]
hub 2-0:1.0: new USB device on port 1, assigned address 2
Advanced Linux Sound Architecture Driver Version 0.9.7 (Thu Sep 25 19:16:36 2003 UTC).
request_module: failed /sbin/modprobe -- snd-card-0. error = -16
PCI: Setting latency timer of device 0000:00:06.0 to 64
drivers/usb/class/usblp.c: usblp0: USB Bidirectional printer dev 2 if 0 alt 1 proto 2 vid 0x03F0 pid 0x1004
intel8x0: clocking to 47482
ALSA device list:
  #0: NVidia nForce2 at 0xcc081000, irq 5
NET: Registered protocol family 2
IP: routing cache hash table of 8192 buckets, 64Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
NET: Registered protocol family 1
NET: Registered protocol family 17
ACPI: (supports S0 S3 S4 S5)
hde: sata_error = 0x00000000, watchdog = 0, siimage_mmio_ide_dma_test_irq
hde: sata_error = 0x00000000, watchdog = 0, siimage_mmio_ide_dma_test_irq
hde: sata_error = 0x00000000, watchdog = 0, siimage_mmio_ide_dma_test_irq
hde: sata_error = 0x00000000, watchdog = 0, siimage_mmio_ide_dma_test_irq
UDF-fs DEBUG fs/udf/lowlevel.c:65:udf_get_last_session: CDROMMULTISESSION not supported: rc=-22
UDF-fs DEBUG fs/udf/super.c:1550:udf_fill_super: Multi-session=0
UDF-fs DEBUG fs/udf/super.c:538:udf_vrs: Starting at sector 16 (2048 byte sectors)
UDF-fs: No VRS found
XFS mounting filesystem hde6
Ending clean XFS mount for filesystem: hde6
VFS: Mounted root (xfs filesystem) readonly.
Mounted devfs on /dev
Freeing unused kernel memory: 160k freed
NTFS volume version 3.1.
NTFS volume version 3.1.

Prakash
Holy shit! I just tried the libata driver and it ROCKS! So far, at least. I already wrote about the crappy SiI3112 IDE driver; now with libata I get >60MB/s!!!! More than I get with Windows. Also tested with dd. This rocks. Let's see whether it likes swsusp as well...

So folks, try libata as well. I dunno what all is actually needed. I enabled SCSI, SCSI disk, SCSI generic, SATA and its driver. In grub I appended "doataraid noraid". YES!

Prakash
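For anyone trying to reproduce this, the options Prakash lists map roughly onto the following .config fragment. This is a sketch, not taken from the thread: the option names are assumed from 2.6.0-test-era Kconfig, and CONFIG_SCSI_SATA_SIL (the Silicon Image libata driver discussed below) sits behind CONFIG_BROKEN.

```
# Sketch of a 2.6.0-test .config fragment (option names assumed)
CONFIG_BROKEN=y            # required to see the SiI libata entry at all
CONFIG_SCSI=y              # "SCSI device support"
CONFIG_BLK_DEV_SD=y        # "SCSI disk support"
CONFIG_CHR_DEV_SG=y        # "SCSI generic support"
CONFIG_SCSI_SATA=y         # "Serial ATA (SATA) support"
CONFIG_SCSI_SATA_SIL=y     # Silicon Image SATA (sata_sil)
```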
"Prakash K. Cheemplavam" <prakashpublic@gmx.de> writes:

Hello Prakash,

> Holy shit! I just tried the libata driver and it ROCKS! So far, at least.
> I already wrote about the crappy SiI3112 IDE driver; now with libata I
> get >60MB/s!!!! More than I get with Windows.
> Also tested with dd. This rocks. Let's see whether it likes swsusp as well...

Sounds GREAT!

> So folks, try libata as well. I dunno what all is actually needed. I
> enabled SCSI, SCSI disk, SCSI generic, SATA and its driver. In grub I
> appended "doataraid noraid". YES!

I can't find the Silicon Image driver under

"SCSI low-level drivers" -> "Serial ATA (SATA) support"

under 2.6.0-test11. Just the following are there:

ServerWorks Frodo
Intel PIIX/ICH
Promise SATA
VIA SATA

So, which kernel do I need?

Regards,
Julien
On Sat, Nov 29, 2003 at 05:38:37PM +0100, Julien Oster wrote:
> I can't find the Silicon Image driver under
>
> "SCSI low-level drivers" -> "Serial ATA (SATA) support"
>
> under 2.6.0-test11. Just the following are there:
>
> ServerWorks Frodo
> Intel PIIX/ICH
> Promise SATA
> VIA SATA
You need to enable CONFIG_BROKEN :)
Jeff
On Sat, Nov 29, 2003 at 04:39:34PM +0100, Prakash K. Cheemplavam wrote:
> Holy Shit!
>
> I just tried the libata driver and it ROCKSSSS! So far, at least.
>
> I already wrote about the crappy SiI3112 ide driver, now with libata I
> get >60mb/sec!!!! More then I get with windows.
>
> Also tests with dd. This rocks. Lets see whether it likes swsusp, as well...
>
> So folks, try libata, as well.
Thanks :)
Note that (speaking technically) the SII libata driver doesn't mask all
interrupt conditions, which is why it's listed under CONFIG_BROKEN. So
this translates to "you might get a random lockup", which some users
certainly do see.
For other users, the libata SII driver works flawlessly for them...
Jeff
> I can't find the Silicon Image driver under
>
> "SCSI low-level drivers" -> "Serial ATA (SATA) support"
>
> under 2.6.0-test11. Just the following are there:
>
> ServerWorks Frodo
> Intel PIIX/ICH
> Promise SATA
> VIA SATA
>
Try under ATA/ATAPI/MFM/RLL support
Silicon Image Chipset Support
CONFIG_BLK_DEV_SIIMAGE: This driver adds PIO/(U)DMA support for the SI CMD680 and SII 3112 (Serial ATA) chips.
Craig
In article <20031129165648.GB14704@gtf.org>,
Jeff Garzik <jgarzik@pobox.com> wrote:
>Note that (speaking technically) the SII libata driver doesn't mask all
>interrupt conditions, which is why it's listed under CONFIG_BROKEN. So
>this translates to "you might get a random lockup", which some users
>certainly do see.
That begs the question: is that going to be fixed ?
Also, the low performance of the IDE SII driver is because of
the bug with request-size (and the bad workaround). Was that
fixed in the libata version and if so is someone working on
porting that fix to the IDE version of the driver ?
Mike.
On Sat, Nov 29, 2003 at 05:41:14PM +0000, Miquel van Smoorenburg wrote:
> In article <20031129165648.GB14704@gtf.org>,
> Jeff Garzik <jgarzik@pobox.com> wrote:
> >Note that (speaking technically) the SII libata driver doesn't mask all
> >interrupt conditions, which is why it's listed under CONFIG_BROKEN. So
> >this translates to "you might get a random lockup", which some users
> >certainly do see.
>
> That begs the question: is that going to be fixed ?

Certainly.

> Also, the low performance of the IDE SII driver is because of
> the bug with request-size (and the bad workaround). Was that
> fixed in the libata version and if so is someone working on
> porting that fix to the IDE version of the driver ?

It is fixed in the libata version.

	Jeff
Jeff Garzik wrote:
> Note that (speaking technically) the SII libata driver doesn't mask all
> interrupt conditions, which is why it's listed under CONFIG_BROKEN. So
> this translates to "you might get a random lockup", which some users
> certainly do see.

However, I've also tested it with my new Maxtor SATA, and I must say: many thanks, well done! Now I can use 2.6.0-test under Fedora with a fine speed of ~50MB/s in disk reads. And with GNOME2 under 2.6.0-test11 I can compile the kernel, watch a movie trailer, play 2 OpenGL screensavers, and download a Knoppix ISO, and the desktop still performs well, as if nothing else were running. Cool!

<http://www.marcush.de/screen-2.6.jpg> (250kb)

Jeff, if you ever come to Germany, Hamburg, I will invite you for a good drink in a fine location. :-)

Greetings,

Marcus
Craig Bradney wrote:
>>I can't find the Silicon Image driver under
>>
>>"SCSI low-level drivers" -> "Serial ATA (SATA) support"
>>
>>under 2.6.0-test11. Just the following are there:
>>
>>ServerWorks Frodo
>>Intel PIIX/ICH
>>Promise SATA
>>VIA SATA
>>
>
>
> Try under ATA/ATAPI/MFM/RLL support
>
> Silicon Image Chipset Support
> CONFIG_BLK_DEV_SIIMAGE: This driver adds PIO/(U)DMA support for the SI CMD680 and SII 3112 (Serial ATA) chips.
No, that is the ide driver that sucks big time.
Prakash
Jeff Garzik wrote:
> On Sat, Nov 29, 2003 at 04:39:34PM +0100, Prakash K. Cheemplavam wrote:
>> I just tried the libata driver and it ROCKSSSS! So far, at least.
>>
>> I already wrote about the crappy SiI3112 ide driver, now with libata I
>> get >60mb/sec!!!! More then I get with windows.
>
> Thanks :)

Come on, we must thank you. You can't imagine how frustrated I became with the SiI bugger. :-)

Prakash
Okay, stop bashing IDE driver... three mails is enough...
Apply this patch and you should get similar performance from IDE driver.
You are probably seeing big improvements with libata driver because you are
using Samsung and IBM/Hitachi drives only, for Seagate it probably sucks just
like IDE driver...
IDE driver limits requests to 15kB for all SATA drives...
libata driver limits requests to 15kB only for Seagate SATA drives...
Both drivers still need proper fix for Seagate drives...
--bart
On Sunday 30 of November 2003 03:00, Prakash K. Cheemplavam wrote:
> Jeff Garzik wrote:
> > On Sat, Nov 29, 2003 at 04:39:34PM +0100, Prakash K. Cheemplavam wrote:
> >>I just tried the libata driver and it ROCKSSSS! So far, at least.
> >>
> >>I already wrote about the crappy SiI3112 ide driver, now with libata I
> >>get >60mb/sec!!!! More then I get with windows.
> >
> > Thanks :)
>
> Come on, we must thank you. You don't imagine how frustrated I became of
> the SiI bugger. :-)
>
> Prakash
[IDE] siimage.c: limit requests to 15kB only for Seagate SATA drives
Fix from jgarzik's sata_sil.c libata driver.
drivers/ide/pci/siimage.c | 23 ++++++++++++++++++++++-
1 files changed, 22 insertions(+), 1 deletion(-)
diff -puN drivers/ide/pci/siimage.c~ide-siimage-seagate drivers/ide/pci/siimage.c
--- linux-2.6.0-test11/drivers/ide/pci/siimage.c~ide-siimage-seagate 2003-11-30 15:38:48.512585200 +0100
+++ linux-2.6.0-test11-root/drivers/ide/pci/siimage.c 2003-11-30 15:38:48.516584592 +0100
@@ -1047,6 +1047,27 @@ static void __init init_mmio_iops_siimag
hwif->mmio = 2;
}
+static int is_dev_seagate_sata(ide_drive_t *drive)
+{
+ const char *s = &drive->id->model[0];
+ unsigned len;
+
+ if (!drive->present)
+ return 0;
+
+ len = strnlen(s, sizeof(drive->id->model));
+
+ if ((len > 4) && (!memcmp(s, "ST", 2))) {
+ if ((!memcmp(s + len - 2, "AS", 2)) ||
+ (!memcmp(s + len - 3, "ASL", 3))) {
+ printk(KERN_INFO "%s: applying pessimistic Seagate "
+ "errata fix\n", drive->name);
+ return 1;
+ }
+ }
+ return 0;
+}
+
/**
* init_iops_siimage - set up iops
* @hwif: interface to set up
@@ -1068,7 +1089,7 @@ static void __init init_iops_siimage (id
 	hwif->hwif_data = 0;
 	hwif->rqsize = 128;
-	if (is_sata(hwif))
+	if (is_sata(hwif) && is_dev_seagate_sata(&hwif->drives[0]))
 		hwif->rqsize = 15;
if (pci_get_drvdata(dev) == NULL)
_
Bartlomiej Zolnierkiewicz wrote:
> Okay, stop bashing IDE driver... three mails is enough...
>
> Apply this patch and you should get similar performance from IDE driver.
> You are probably seeing big improvements with libata driver because you are
> using Samsung and IBM/Hitachi drives only, for Seagate it probably sucks just
> like IDE driver...
>
> IDE driver limits requests to 15kB for all SATA drives...
> libata driver limits requests to 15kB only for Seagate SATA drives...

If you had read my message closely, then you would have understood that setting the request size higher *didn't* help, i.e.

echo "max_kb_per_request:128" > /proc/ide/hde/settings

made *no* difference, so I won't even try that patch. As far as I have understood, this is exactly the thing you changed in the patch. If I am mistaken, then I take it back.

Prakash
I read it _very_ closely; here is your original mail, with the subject "Re: 2.6.0-test9 /-mm3 SATA siimage - bad disk performance":

On Saturday 15 of November 2003 10:11, Prakash K. Cheemplavam wrote:
> Marcus Hartig wrote:
> > Hello all,
> >
> > with the Fedora 1 kernel 2.4.22-1.2115.nptl I get with hdparm -t
> > (Timing buffered disk reads) 34 MB/sec. Its very slow for this drive.
> >
> > With 2.6.0-test9 and -mm3 I get around "62 MB in 3.05 = 20,31". Wow"
> > Back to ~1998?
>
> I have a similar problem: With 2.4.22-ac3 I had 37mb/sec with my Samsung
> HD and 49MB/sec with IBM/Hitachi, now with 2.6 (all I tried, including
  ^^^^^^^^^^^^
> test9-mm2) I had only 20mb/sec for Samsung and about 39mb/sec for the
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> IBM. Motherboard is Abit NF7-S Rev2.0, as well, so same situation with
  ^^^^
> the siimage 1.06 driver. I wanted to run some dd tests as well, but it
> is a real performance hit. Playing with readahead or other hdparm
> options didn't help either.
>
> Prakash

In 2.6.x there is no max_kb_per_request setting in /proc/ide/hdx/settings. Therefore

echo "max_kb_per_request:128" > /proc/ide/hde/settings

does not work.

Hmm, actually I was under the impression that we have generic ioctls in 2.6.x, but I can find only BLKSECTGET; BLKSECTSET was somehow lost. Jens?

Prakash, please try the patch and maybe you will have 2 working drivers now :-).

--bart

On Sunday 30 of November 2003 16:52, Prakash K. Cheemplavam wrote:
> Bartlomiej Zolnierkiewicz wrote:
> > Okay, stop bashing IDE driver... three mails is enough...
> >
> > Apply this patch and you should get similar performance from IDE driver.
> > You are probably seeing big improvements with libata driver because
> > you are using Samsung and IBM/Hitachi drives only, for Seagate it
> > probably sucks just like IDE driver...
> >
> > IDE driver limits requests to 15kB for all SATA drives...
> > libata driver limits requests to 15kB only for Seagate SATA drives...
>
> If you had read my message closely, then you would have understood that
> setting the request size higher *didn't* help, i.e.
>
> echo "max_kb_per_request:128" > /proc/ide/hde/settings
>
> made *no* difference, so I won't even try that patch. As far as I have
> understood, this is exactly the thing you changed in the patch. If I am
> mistaken, then I take it back.
>
> Prakash
On Sun, Nov 30 2003, Bartlomiej Zolnierkiewicz wrote:
>
> I read it _very_ closely, here is your original mail with subject
> "Re: 2.6.0-test9 /-mm3 SATA siimage - bad disk performance":
>
> On Saturday 15 of November 2003 10:11, Prakash K. Cheemplavam wrote:
> > Marcus Hartig wrote:
> > > Hello all,
> > >
> > > with the Fedora 1 kernel 2.4.22-1.2115.nptl I get with hdparm -t
> > > (Timing buffered disk reads) 34 MB/sec. Its very slow for this drive.
> > >
> > > With 2.6.0-test9 and -mm3 I get around "62 MB in 3.05 = 20,31". Wow"
> > > Back to ~1998?
> >
> > I have a similar problem: With 2.4.22-ac3 I had 37mb/sec with my Samsung
> > HD and 49MB/sec with IBM/Hitachi, now with 2.6 (all I tried, including
> ^^^^^^^^^^^^
> > test9-mm2) I had only 20mb/sec for Samsung and about 39mb/sec for the
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > IBM. Motherboard is Abit NF7-S Rev2.0, as well, so same situation with
> ^^^^
> > the siimage 1.06 driver. I wanted to run some dd tests as well, but it
> > is a real performance hit. Playing with readahead or other hdparm
> > options didn't help either.
> >
> > Prakash
>
> In 2.6.x there is no max_kb_per_request setting in /proc/ide/hdx/settings.
> Therefore
> echo "max_kb_per_request:128" > /proc/ide/hde/settings
> does not work.
>
> Hmm. actually I was under influence that we have generic ioctls in 2.6.x,
> but I can find only BLKSECTGET, BLKSECTSET was somehow lost. Jens?
Probably because it's very dangerous to expose, echo something too big
and watch your data disappear.
--
Jens Axboe
Bartlomiej Zolnierkiewicz wrote:
> Apply this patch and you should get similar performance from IDE driver.
> You are probably seeing big improvements with libata driver because you are
> using Samsung and IBM/Hitachi drives only, for Seagate it probably sucks just
> like IDE driver...

Looks good to me.

> IDE driver limits requests to 15kB for all SATA drives...
> libata driver limits requests to 15kB only for Seagate SATA drives...
>
> Both drivers still need proper fix for Seagate drives...

Yep. Do you have the Maxtor fix as well? It's in libata's SII driver, though it should be noted that the Maxtor errata only occurs with PATA<->SATA bridges, and not with real Maxtor SATA drives.

	Jeff
Yes, siimage.c contains the Maxtor fix as well; there is even a comment
from Alan about Marvell PATA<->SATA bridges...
--bart
On Sunday 30 of November 2003 17:27, Jeff Garzik wrote:
> Bartlomiej Zolnierkiewicz wrote:
> > Apply this patch and you should get similar performance from IDE driver.
> > You are probably seeing big improvements with libata driver because you
> > are using Samsung and IBM/Hitachi drives only, for Seagate it probably
> > sucks just like IDE driver...
>
> Looks good to me.
>
> > IDE driver limits requests to 15kB for all SATA drives...
> > libata driver limits requests to 15kB only for Seagate SATA drives...
> >
> > Both drivers still need proper fix for Seagate drives...
>
> Yep. Do you have the Maxtor fix, as well? It's in libata's SII driver,
> though it should be noted that the Maxtor errata only occurs for
> PATA<->SATA bridges, and not for real Maxtor SATA drives.
>
> Jeff
Jens Axboe wrote:
> On Sun, Nov 30 2003, Bartlomiej Zolnierkiewicz wrote:
>>Hmm, actually I was under the impression that we have generic ioctls in
>>2.6.x, but I can find only BLKSECTGET; BLKSECTSET was somehow lost. Jens?
>
>
> Probably because it's very dangerous to expose, echo something too big
> and watch your data disappear.
IMO, agreed.
Max KB per request really should be set by the driver, as it's a
hardware-specific thing that (as we see :)) is often errata-dependent.
Tangent: My non-pessimistic fix will involve submitting a single sector
DMA r/w taskfile manually, then proceeding with the remaining sectors in
another r/w taskfile. This doubles the interrupts on the affected
chipset/drive combos, but still allows large requests. I'm not terribly
fond of partial completions, as I feel they add complexity, particularly
so in my case: I can simply use the same error paths for both the
single-sector taskfile and the "everything else" taskfile, regardless of
which taskfile throws the error.
(thinking out loud) Though best for simplicity, I am curious if a
succession of "tiny/huge" transaction pairs are efficient? I am hoping
that the drive's cache, coupled with the fact that each pair of
taskfiles is sequentially contiguous, will not hurt speed too much over
a non-errata configuration...
Jeff
Mark Hahn wrote:
>>>So folks, try libata, as well.
>>
>>Thanks :)
>
> what do you think the chances are of libata becoming the primary ata
> interface for 2.4 and 2.6? there have always been major changes even
> to stable releases in the past, at least when the change seems to be a
> big improvement.

"primary ata interface" is a bit tough to define. Serial ATA will
become _the_ ATA interface on motherboards of the future. From a
software perspective, it really only matters what hardware driver you
load...

> incidentally, can you give me any clues to description/discussion you
> might have engaged in about libata? I saw your prog-ref pdf, but it
> doesn't really describe the motivation, issues of going scsi, etc.
> (I looked at lkml and google, but couldn't filter well enough...)

Mostly just design in my head, plus a bit of discussion at the Kernel
Summit earlier this year.

> feel free to reply to lkml. libata design/status/future is clearly of
> general interest...

I'm putting together a "Serial ATA status report", to be posted to lkml
and linux-scsi, which should hopefully cover all that. Your email kicks
me into action again, for that report, for which I should thank you :)

Jeff
On Sun, Nov 30 2003, Jeff Garzik wrote:
> Jens Axboe wrote:
> >On Sun, Nov 30 2003, Bartlomiej Zolnierkiewicz wrote:
> >>Hmm, actually I was under the impression that we have generic ioctls
> >>in 2.6.x, but I can find only BLKSECTGET; BLKSECTSET was somehow
> >>lost. Jens?
> >
> >Probably because it's very dangerous to expose, echo something too big
> >and watch your data disappear.
>
> IMO, agreed.
>
> Max KB per request really should be set by the driver, as it's a
> hardware-specific thing that (as we see :)) is often errata-dependent.

Yes, it would be better to have a per-drive (or hwif) extra limiting
factor if it is needed. For this case it really isn't, so probably not
the best idea :)

> Tangent: My non-pessimistic fix will involve submitting a single sector
> DMA r/w taskfile manually, then proceeding with the remaining sectors in
> another r/w taskfile. This doubles the interrupts on the affected
> chipset/drive combos, but still allows large requests. I'm not terribly

Or split the request 50/50.

> fond of partial completions, as I feel they add complexity, particularly
> so in my case: I can simply use the same error paths for both the
> single-sector taskfile and the "everything else" taskfile, regardless of
> which taskfile throws the error.

It's just a question of maintaining the proper request state so you
know how much and what part of a request is pending. Requests have been
handled this way ever since clustered requests, that is why
current_nr_sectors differs from nr_sectors. And with hard_* duplicates,
it's pretty easy to extend this a bit. I don't see this as something
complex, and if the alternative you are suggesting (your implementation
idea is not clear to me...) is to fork another request then I think it's
a lot better.

Say you receive a request that violates the magic sector count rule.
You decide to do the first half of the request, and setup your taskfile
for that. You can diminish nr_sectors appropriately, or you can keep
this sector count in the associated taskfile - whatever you prefer.
The end_io path that covers both "normal" and partial IO is basically:

	if (!end_that_request_first(rq, 1, sectors))
		rq is done
	else
		rq state is now correctly the 2nd half

In the not-done case, you simply fall out of your isr as you would a
complete request, and let your request_fn just start it again. You
don't even know this request has already been processed. Depending on
whether you remove the request from the queue or not, you just push the
request to the top of the request queue so you are certain that you
start this one next.

So there's really nothing special about partial completions, rather
full completions are a one-shot partial completion :)

> (thinking out loud) Though best for simplicity, I am curious if a
> succession of "tiny/huge" transaction pairs are efficient? I am hoping
> that the drive's cache, coupled with the fact that each pair of
> taskfiles is sequentially contiguous, will not hurt speed too much over
> a non-errata configuration...

My gut would say rather two 64kb than a 124 and 4kb. But you should do
the numbers, of course :). I'd be surprised if the former wouldn't be
more efficient.

--
Jens Axboe
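The contract Jens describes can be modeled outside the kernel. The sketch below uses made-up names (`toy_request`, `toy_end_that_request_first`; not real block-layer code) to mimic what `end_that_request_first()` does for the driver: complete `sectors` worth of the request, then either report it done or leave its state describing the remaining half:

```c
#include <assert.h>

/* Toy model of a request: only the fields the discussion touches. */
struct toy_request {
    unsigned long sector;       /* next sector to transfer */
    unsigned long nr_sectors;   /* sectors still pending */
};

/* Mimics end_that_request_first(rq, uptodate, sectors):
 * returns 0 when the request is fully done, nonzero when the
 * request state now correctly describes the pending remainder. */
static int toy_end_that_request_first(struct toy_request *rq,
                                      unsigned long sectors)
{
    if (sectors >= rq->nr_sectors) {
        rq->sector += rq->nr_sectors;
        rq->nr_sectors = 0;
        return 0;               /* rq is done */
    }
    rq->sector += sectors;      /* advance past the completed part */
    rq->nr_sectors -= sectors;  /* 2nd half is what remains */
    return 1;                   /* requeue at head, restart from request_fn */
}
```

In the nonzero case the ISR simply falls out as it would for a complete request, and the request function restarts the same (now smaller) request.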
On Sunday 30 of November 2003 17:51, Jens Axboe wrote:
> On Sun, Nov 30 2003, Jeff Garzik wrote:
> > Jens Axboe wrote:
> > >On Sun, Nov 30 2003, Bartlomiej Zolnierkiewicz wrote:
> > >>Hmm, actually I was under the impression that we have generic ioctls
> > >> in 2.6.x, but I can find only BLKSECTGET; BLKSECTSET was somehow
> > >> lost. Jens?
> > >
> > >Probably because it's very dangerous to expose, echo something too big
> > >and watch your data disappear.
> >
> > IMO, agreed.
> >
> > Max KB per request really should be set by the driver, as it's a
> > hardware-specific thing that (as we see :)) is often errata-dependent.

Yep.

> Yes, it would be better to have a per-drive (or hwif) extra limiting
> factor if it is needed. For this case it really isn't, so probably not
> the best idea :)
>
> > Tangent: My non-pessimistic fix will involve submitting a single sector
> > DMA r/w taskfile manually, then proceeding with the remaining sectors in
> > another r/w taskfile. This doubles the interrupts on the affected
> > chipset/drive combos, but still allows large requests. I'm not terribly
>
> Or split the request 50/50.

We can't - hardware will lock up.

--bart
Bartlomiej Zolnierkiewicz wrote:
> On Sunday 30 of November 2003 17:51, Jens Axboe wrote:
>>>Tangent: My non-pessimistic fix will involve submitting a single sector
>>>DMA r/w taskfile manually, then proceeding with the remaining sectors in
>>>another r/w taskfile. This doubles the interrupts on the affected
>>>chipset/drive combos, but still allows large requests. I'm not terribly
>>
>>Or split the request 50/50.
>
>
> We can't - hardware will lock up.
Well, the constraint we must satisfy is
sector_count % 15 != 1
(i.e. "== 1" causes the lockup)
Beyond that, any request ratio should be ok AFAIK...
Jeff
On Sun, Nov 30 2003, Bartlomiej Zolnierkiewicz wrote:
> > Yes, it would be better to have a per-drive (or hwif) extra limiting
> > factor if it is needed. For this case it really isn't, so probably not
> > the best idea :)
> >
> > > Tangent: My non-pessimistic fix will involve submitting a single sector
> > > DMA r/w taskfile manually, then proceeding with the remaining sectors in
> > > another r/w taskfile. This doubles the interrupts on the affected
> > > chipset/drive combos, but still allows large requests. I'm not terribly
> >
> > Or split the request 50/50.
>
> We can't - hardware will lock up.
I know the problem. Then don't split 50/50 to the word, my point was to
split it closer to 50/50 than 1 sector + the rest.
--
Jens Axboe
On Sun, Nov 30 2003, Jeff Garzik wrote:
> Bartlomiej Zolnierkiewicz wrote:
> >On Sunday 30 of November 2003 17:51, Jens Axboe wrote:
> >>>Tangent: My non-pessimistic fix will involve submitting a single sector
> >>>DMA r/w taskfile manually, then proceeding with the remaining sectors in
> >>>another r/w taskfile. This doubles the interrupts on the affected
> >>>chipset/drive combos, but still allows large requests. I'm not terribly
> >>
> >>Or split the request 50/50.
> >
> >
> >We can't - hardware will lock up.
>
> Well, the constraint we must satisfy is
>
> sector_count % 15 != 1
(sector_count % 15 != 1) && (sector_count != 1)
to be more precise :)
--
Jens Axboe
On Sun, Nov 30 2003, Bartlomiej Zolnierkiewicz wrote:
>
> On Sunday 30 of November 2003 18:08, Jens Axboe wrote:
> > On Sun, Nov 30 2003, Bartlomiej Zolnierkiewicz wrote:
> > > > Yes, it would be better to have a per-drive (or hwif) extra limiting
> > > > factor if it is needed. For this case it really isn't, so probably not
> > > > the best idea :)
> > > >
> > > > > Tangent: My non-pessimistic fix will involve submitting a single
> > > > > sector DMA r/w taskfile manually, then proceeding with the remaining
> > > > > sectors in another r/w taskfile. This doubles the interrupts on the
> > > > > affected chipset/drive combos, but still allows large requests. I'm
> > > > > not terribly
> > > >
> > > > Or split the request 50/50.
> > >
> > > We can't - hardware will lock up.
> >
> > I know the problem. Then don't split 50/50 to the word, my point was to
> > split it closer to 50/50 than 1 sector + the rest.
>
> Oh, I understand now and agree.
Cool. BTW to make myself 100% clear, I don't mean "split" as in split
the request, merely the amount issued to the hardware. Request splitting
has such an ugly ring to it :)
--
Jens Axboe
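As an illustration only (a hypothetical helper, not driver code), a near-50/50 first chunk that keeps both the issued amount and the remainder off the bad sector counts could be chosen like this, assuming the constraint discussed above (counts with `n % 15 == 1` lock the hardware up, a single sector aside):

```c
#include <assert.h>

/* Safe per the thread's constraint: n % 15 == 1 locks up,
 * except a bare single-sector transfer. Hypothetical sketch. */
static int chunk_ok(unsigned long n)
{
    return n == 1 || n % 15 != 1;
}

/* Pick how many sectors of an n-sector request to issue first,
 * aiming near n/2 while keeping both parts safe. */
static unsigned long pick_first_chunk(unsigned long n)
{
    unsigned long half = n / 2;
    unsigned long d;

    if (chunk_ok(n))
        return n;               /* whole request is safe, no split */
    for (d = 0; d <= half; d++) {
        if (half + d < n && chunk_ok(half + d) && chunk_ok(n - half - d))
            return half + d;    /* try just above the midpoint */
        if (half > d && chunk_ok(half - d) && chunk_ok(n - half + d))
            return half - d;    /* ...then just below it */
    }
    return 1;                   /* fallback: one sector is always safe */
}
```

For a safe count the request goes out whole; for an unsafe one (16, 31, 46... sectors) the split lands within a sector or two of 50/50, which is Jens's point versus "1 sector + the rest".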
On Sunday 30 of November 2003 18:08, Jens Axboe wrote:
> On Sun, Nov 30 2003, Bartlomiej Zolnierkiewicz wrote:
> > > Yes, it would be better to have a per-drive (or hwif) extra limiting
> > > factor if it is needed. For this case it really isn't, so probably not
> > > the best idea :)
> > >
> > > > Tangent: My non-pessimistic fix will involve submitting a single
> > > > sector DMA r/w taskfile manually, then proceeding with the remaining
> > > > sectors in another r/w taskfile. This doubles the interrupts on the
> > > > affected chipset/drive combos, but still allows large requests. I'm
> > > > not terribly
> > >
> > > Or split the request 50/50.
> >
> > We can't - hardware will lock up.
>
> I know the problem. Then don't split 50/50 to the word, my point was to
> split it closer to 50/50 than 1 sector + the rest.
Oh, I understand now and agree.
--bart
On the topic of speeds: hdparm -t gives me 56MB/s on my Maxtor 80GB
PATA drive with 8MB cache. I got that with 2.4.23-pre8, which was
ATA100, and get just a little more on ATA133 with 2.6. Not sure what
people are expecting on SATA.
Craig
On Sun, 2003-11-30 at 18:52, Luis Miguel García wrote:
> hello:
>
> I have a Seagate Barracuda IV (80 Gb) connected to parallel ata on a
> nforce-2 motherboard.
>
> If any of you want for me to test any patch to fix the "seagate issue",
> please, count on me. I have a SATA sis3112 and a parallel-to-serial
> converter. If I'm of any help to you, drop me an email.
>
> By the way, I'm only getting 32 MB/s (hdparm -tT /dev/hda) on my
> actual parallel ata. Is this enough for an ATA-100 device?
>
> Thanks a lot.
>
> LuisMi García
> Spain
Jens Axboe wrote:
> On Sun, Nov 30 2003, Jeff Garzik wrote:
>>fond of partial completions, as I feel they add complexity, particularly
>>so in my case: I can simply use the same error paths for both the
>>single-sector taskfile and the "everything else" taskfile, regardless of
>>which taskfile throws the error.
>
> It's just a question of maintaining the proper request state so you
> know how much and what part of a request is pending. Requests have been
> handled this way ever since clustered requests, that is why
> current_nr_sectors differs from nr_sectors. And with hard_* duplicates,
> it's pretty easy to extend this a bit. I don't see this as something
> complex, and if the alternative you are suggesting (your implementation
> idea is not clear to me...) is to fork another request then I think it's
> a lot better.

[snip howto]

Yeah, I know how to do partial completions. The increased complexity
arises in my driver. It's simply less code in my driver to treat each
transaction as an "all or none" affair.

For the vastly common case, it's less i-cache and fewer interrupts to do
all-or-none. In the future I'll probably want to put partial
completions in the error path...

>>(thinking out loud) Though best for simplicity, I am curious if a
>>succession of "tiny/huge" transaction pairs are efficient? I am hoping
>>that the drive's cache, coupled with the fact that each pair of
>>taskfiles is sequentially contiguous, will not hurt speed too much over
>>a non-errata configuration...
>
> My gut would say rather two 64kb than a 124 and 4kb. But you should do
> the numbers, of course :). I'd be surprised if the former wouldn't be
> more efficient.

That's why I was thinking out loud, and also why I CC'd Eric :) We'll
see. I'll implement whichever is easier first, which will certainly be
better than the current sledgehammer limit. Any improvement over the
current code will provide dramatic performance increases, and we can
tune after that...

Jeff
Bartlomiej Zolnierkiewicz wrote:
> In 2.6.x there is no max_kb_per_request setting in /proc/ide/hdx/settings.
> Therefore
> echo "max_kb_per_request:128" > /proc/ide/hde/settings
> does not work.
>
> Hmm, actually I was under the impression that we have generic ioctls in
> 2.6.x, but I can find only BLKSECTGET; BLKSECTSET was somehow lost. Jens?
>
> Prakash, please try patch and maybe you will have 2 working drivers now :-).
OK, this driver fixes the transfer rate problem. Nice, so I wanted to do
the right thing, but it didn't work, as you explained... Thanks.
Nevertheless there is still the issue left:
hdparm -d1 /dev/hde throws the drive into major havoc (something like:
ide: dma_intr: status=0x58 { DriveReady, SeekComplete, DataRequest }
ide status timeout=0xd8 { Busy }; messages taken from swsusp's kernel
panic). I have to do a hard reset. I guess it is the same reason why
swsusp gets a kernel panic when it sends PM commands to siimage.c.
(Maybe the same error is in libata, causing the same kernel panic on
swsusp.)
Any clues?
Nice that at least the siimage driver has got some improvement after me
getting on your nerves. ;-)
Prakash
Jens Axboe wrote:
>>Well, the constraint we must satisfy is
>>
>> sector_count % 15 != 1
>
>
> (sector_count % 15 != 1) && (sector_count != 1)
>
> to be more precise :)
Thanks for the clarification, I did not know that.
Avoiding sector_count==1 requires additional code :( Valid requests
might be a single sector. With page-based blkdevs requests smaller than
a page would certainly be infrequent, but are still possible, with bsg
for example...
Jeff
On Sun, Nov 30 2003, Jeff Garzik wrote:
> Jens Axboe wrote:
> >On Sun, Nov 30 2003, Jeff Garzik wrote:
> >>fond of partial completions, as I feel they add complexity, particularly
> >>so in my case: I can simply use the same error paths for both the
> >>single-sector taskfile and the "everything else" taskfile, regardless of
> >>which taskfile throws the error.
> >
> >It's just a question of maintaining the proper request state so you
> >know how much and what part of a request is pending. Requests have been
> >handled this way ever since clustered requests, that is why
> >current_nr_sectors differs from nr_sectors. And with hard_* duplicates,
> >it's pretty easy to extend this a bit. I don't see this as something
> >complex, and if the alternative you are suggesting (your implementation
> >idea is not clear to me...) is to fork another request then I think it's
> >a lot better.
>
> [snip howto]
>
> Yeah, I know how to do partial completions. The increased complexity
> arises in my driver. It's simply less code in my driver to treat each
> transaction as an "all or none" affair.
>
> For the vastly common case, it's less i-cache and fewer interrupts to do
> all-or-none. In the future I'll probably want to put partial
> completions in the error path...

Oh come on, i-cache? We're doing IO here, a cache line more or less in
request handling is absolutely so much in the noise.

What are the "increased complexity" involved with doing partial
completions? You don't even have to know it's a partial request in the
error handling, it's "just the request" state. Honestly, I don't see a
problem there. You'll have to expand on what exactly you see as added
complexity. To me it still seems like the fastest and most elegant way
to handle it. It requires no special attention on request buildup, it
requires no extra request and ugly split-code in the request handling.
And the partial-completions come for free with the block layer code.

> >>(thinking out loud) Though best for simplicity, I am curious if a
> >>succession of "tiny/huge" transaction pairs are efficient? I am hoping
> >>that the drive's cache, coupled with the fact that each pair of
> >>taskfiles is sequentially contiguous, will not hurt speed too much over
> >>a non-errata configuration...
> >
> >My gut would say rather two 64kb than a 124 and 4kb. But you should do
> >the numbers, of course :). I'd be surprised if the former wouldn't be
> >more efficient.
>
> That's why I was thinking out loud, and also why I CC'd Eric :) We'll

Numbers are better than Eric :)

> see. I'll implement whichever is easier first, which will certainly be
> better than the current sledgehammer limit. Any improvement over the

Definitely, the current static limit completely sucks...

> current code will provide dramatic performance increases, and we can
> tune after that...

A path needs to be chosen first, though.

--
Jens Axboe
On Sun, Nov 30 2003, Jeff Garzik wrote:
> Jens Axboe wrote:
> >>Well, the constraint we must satisfy is
> >>
> >> sector_count % 15 != 1
> >
> >
> > (sector_count % 15 != 1) && (sector_count != 1)
> >
> >to be more precise :)
>
>
> Thanks for the clarification, I did not know that.
>
> Avoiding sector_count==1 requires additional code :( Valid requests
> might be a single sector. With page-based blkdevs requests smaller than
> a page would certainly be infrequent, but are still possible, with bsg
> for example...
You misread it... sector_count == 1 is fine; sector_count % 15 == 1 is
only ok when sector_count is 1 (it would have to be, or sector_count == 1
would not be ok :)
--
Jens Axboe
Jens Axboe wrote:
> On Sun, Nov 30 2003, Jeff Garzik wrote:
>
>>Jens Axboe wrote:
>>
>>>On Sun, Nov 30 2003, Jeff Garzik wrote:
>>>
>>>>fond of partial completions, as I feel they add complexity, particularly
>>>>so in my case: I can simply use the same error paths for both the
>>>>single-sector taskfile and the "everything else" taskfile, regardless of
>>>>which taskfile throws the error.
>>>
>>>It's just a question of maintaining the proper request state so you
>>>know how much and what part of a request is pending. Requests have been
>>>handled this way ever since clustered requests, that is why
>>>current_nr_sectors differs from nr_sectors. And with hard_* duplicates,
>>>it's pretty easy to extend this a bit. I don't see this as something
>>>complex, and if the alternative you are suggesting (your implementation
>>>idea is not clear to me...) is to fork another request then I think it's
>>>a lot better.
>>
>>[snip howto]
>>
>>Yeah, I know how to do partial completions. The increased complexity
>>arises in my driver. It's simply less code in my driver to treat each
>>transaction as an "all or none" affair.
>>
>>For the vastly common case, it's less i-cache and fewer interrupts to do
>>all-or-none. In the future I'll probably want to put partial
>>completions in the error path...
>
> Oh come on, i-cache? We're doing IO here, a cache line more or less in
> request handling is absolutely so much in the noise.
>
> What are the "increased complexity" involved with doing partial
> completions? You don't even have to know it's a partial request in the
> error handling, it's "just the request" state. Honestly, I don't see a
> problem there. You'll have to expand on what exactly you see as added
> complexity. To me it still seems like the fastest and most elegant way
> to handle it. It requires no special attention on request buildup, it
> requires no extra request and ugly split-code in the request handling.
> And the partial-completions come for free with the block layer code.

libata, drivers/ide, and SCSI all must provide an internal "submit this
taskfile/cdb" API that is decoupled from struct request. Therefore,
submitting a transaction pair, or for ATAPI submitting the internal
REQUEST SENSE, is quite simple and only a few lines of code.

Any extra diddling of the hardware, and struct request, to provide
partial completions is extra code. The hardware is currently set up to
provide only "it's done" or "it failed" information. Logically, then,
partial completions must be more code than the current <none> ;-)

WRT error handling, according to ATA specs I can look at the error
information to determine how much of the request, if any, completed
successfully. (dunno if this is also doable on ATAPI) That's why
partial completions in the error path make sense to me.

>>>>(thinking out loud) Though best for simplicity, I am curious if a
>>>>succession of "tiny/huge" transaction pairs are efficient? I am hoping
>>>>that the drive's cache, coupled with the fact that each pair of
>>>>taskfiles is sequentially contiguous, will not hurt speed too much over
>>>>a non-errata configuration...
>>>
>>>My gut would say rather two 64kb than a 124 and 4kb. But you should do
>>>the numbers, of course :). I'd be surprised if the former wouldn't be
>>>more efficient.
>>
>>That's why I was thinking out loud, and also why I CC'd Eric :) We'll
>
> Numbers are better than Eric :)

Agreed.

>>see. I'll implement whichever is easier first, which will certainly be
>>better than the current sledgehammer limit. Any improvement over the
>
> Definitely, the current static limit completely sucks...
>
>>current code will provide dramatic performance increases, and we can
>>tune after that...
>
> A path needs to be chosen first, though.

The path has been chosen: the "it works" solution first, then tune.
:)

Jeff
On Sun, Nov 30 2003, Jeff Garzik wrote:
> Jens Axboe wrote:
> >On Sun, Nov 30 2003, Jeff Garzik wrote:
> >
> >>Jens Axboe wrote:
> >>
> >>>On Sun, Nov 30 2003, Jeff Garzik wrote:
> >>>
> >>>>fond of partial completions, as I feel they add complexity,
> >>>>particularly so in my case: I can simply use the same error paths for
> >>>>both the single-sector taskfile and the "everything else" taskfile,
> >>>>regardless of which taskfile throws the error.
> >>>
> >>>It's just a question of maintaining the proper request state so you
> >>>know how much and what part of a request is pending. Requests have been
> >>>handled this way ever since clustered requests, that is why
> >>>current_nr_sectors differs from nr_sectors. And with hard_* duplicates,
> >>>it's pretty easy to extend this a bit. I don't see this as something
> >>>complex, and if the alternative you are suggesting (your implementation
> >>>idea is not clear to me...) is to fork another request then I think it's
> >>>a lot better.
> >>
> >>[snip howto]
> >>
> >>Yeah, I know how to do partial completions. The increased complexity
> >>arises in my driver. It's simply less code in my driver to treat each
> >>transaction as an "all or none" affair.
> >>
> >>For the vastly common case, it's less i-cache and fewer interrupts to do
> >>all-or-none. In the future I'll probably want to put partial
> >>completions in the error path...
> >
> >Oh come on, i-cache? We're doing IO here, a cache line more or less in
> >request handling is absolutely so much in the noise.
> >
> >What are the "increased complexity" involved with doing partial
> >completions? You don't even have to know it's a partial request in the
> >error handling, it's "just the request" state. Honestly, I don't see a
> >problem there. You'll have to expand on what exactly you see as added
> >complexity. To me it still seems like the fastest and most elegant way
> >to handle it. It requires no special attention on request buildup, it
> >requires no extra request and ugly split-code in the request handling.
> >And the partial-completions come for free with the block layer code.
>
> libata, drivers/ide, and SCSI all must provide an internal "submit this
> taskfile/cdb" API that is decoupled from struct request. Therefore,

Yes

> submitting a transaction pair, or for ATAPI submitting the internal
> REQUEST SENSE, is quite simple and only a few lines of code.

SCSI already does these partial completions...

> Any extra diddling of the hardware, and struct request, to provide
> partial completions is extra code. The hardware is currently set up to
> provide only "it's done" or "it failed" information. Logically, then,
> partial completions must be more code than the current <none> ;-)

That's not a valid argument. Whatever you do, you have to add some lines
of code.

> WRT error handling, according to ATA specs I can look at the error
> information to determine how much of the request, if any, completed
> successfully. (dunno if this is also doable on ATAPI) That's why
> partial completions in the error path make sense to me.

... so if you do partial completions in the normal paths (or rather
allow them), error handling will be simpler. And we all know where the
hard and stupid bugs are - the basically never tested error handling.

> >>see. I'll implement whichever is easier first, which will certainly
> >>be better than the current sledgehammer limit. Any improvement over
> >>the
> >
> >Definitely, the current static limit completely sucks...
> >
> >>current code will provide dramatic performance increases, and we can
> >>tune after that...
> >
> >A path needs to be chosen first, though.
>
> The path has been chosen: the "it works" solution first, then tune.
> :)

Since one path excludes the other, you must choose a path first. Tuning
is honing a path, not rewriting that code.

--
Jens Axboe
Jens Axboe wrote:
> On Sun, Nov 30 2003, Jeff Garzik wrote:
>
>>Jens Axboe wrote:
>>
>>>>Well, the constraint we must satisfy is
>>>>
>>>> sector_count % 15 != 1
>>>
>>>
>>> (sector_count % 15 != 1) && (sector_count != 1)
>>>
>>>to be more precise :)
>>
>>
>>Thanks for the clarification, I did not know that.
>>
>>Avoiding sector_count==1 requires additional code :( Valid requests
>>might be a single sector. With page-based blkdevs requests smaller than
>>a page would certainly be infrequent, but are still possible, with bsg
>>for example...
>
>
> You misread it... sector_count == 1 is fine, sector_count % 15 == 1 is
> ok when sector_count is 1 (it would have to be, or sector_count == 1
> would not be ok :)
Ahh, duh. Thanks again. Yeah, it makes sense since the bug arises from
too-large Serial ATA data transactions on the SATA bus...
Jeff
hello:

I have a Seagate Barracuda IV (80 Gb) connected to parallel ata on a
nforce-2 motherboard.

If any of you want for me to test any patch to fix the "seagate issue",
please, count on me. I have a SATA sis3112 and a parallel-to-serial
converter. If I'm of any help to you, drop me an email.

By the way, I'm only getting 32 MB/s (hdparm -tT /dev/hda) on my
actual parallel ata. Is this enough for an ATA-100 device?

Thanks a lot.

LuisMi García
Spain
On Sun, Nov 30, 2003 at 06:10:06PM +0100, Jens Axboe wrote:
> On Sun, Nov 30 2003, Jeff Garzik wrote:
> > Bartlomiej Zolnierkiewicz wrote:
> > >On Sunday 30 of November 2003 17:51, Jens Axboe wrote:
> > >>>Tangent: My non-pessimistic fix will involve submitting a single sector
> > >>>DMA r/w taskfile manually, then proceeding with the remaining sectors in
> > >>>another r/w taskfile. This doubles the interrupts on the affected
> > >>>chipset/drive combos, but still allows large requests. I'm not terribly
> > >>
> > >>Or split the request 50/50.
> > >
> > >
> > >We can't - hardware will lock up.
> >
> > Well, the constraint we must satisfy is
> >
> > sector_count % 15 != 1
>
> (sector_count % 15 != 1) && (sector_count != 1)
>
> to be more precise :)
I think you wanted to say:
(sector_count % 15 != 1) || (sector_count == 1)
--
Vojtech Pavlik
SuSE Labs, SuSE CR
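Vojtech's form reads: a count is safe when its remainder mod 15 is not 1, or when it is exactly one sector. A throwaway check (hypothetical helper name) makes the difference from the earlier phrasing concrete:

```c
#include <assert.h>

/* Safe sector counts per the corrected constraint:
 * (n % 15 != 1) || (n == 1)
 * Hypothetical helper, illustration only. */
static int sata_count_safe(unsigned long n)
{
    return (n % 15 != 1) || (n == 1);
}
```

With `&&` and `!= 1` as originally written, n == 1 would be reported unsafe, contradicting Jens's own clarification that a single sector is fine; with `||`, exactly the counts 16, 31, 46, ... remain forbidden.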
Jens Axboe wrote:
> On Sun, Nov 30 2003, Jeff Garzik wrote:
>
>>Jens Axboe wrote:
>>
>>>On Sun, Nov 30 2003, Jeff Garzik wrote:
>>>
>>>>Jens Axboe wrote:
>>>>
>>>>>On Sun, Nov 30 2003, Jeff Garzik wrote:
>>>>>
>>>>>>fond of partial completions, as I feel they add complexity,
>>>>>>particularly so in my case: I can simply use the same error paths for
>>>>>>both the single-sector taskfile and the "everything else" taskfile,
>>>>>>regardless of which taskfile throws the error.
>>>>>
>>>>>It's just a question of maintaining the proper request state so you
>>>>>know how much and what part of a request is pending. Requests have been
>>>>>handled this way ever since clustered requests, that is why
>>>>>current_nr_sectors differs from nr_sectors. And with hard_* duplicates,
>>>>>it's pretty easy to extend this a bit. I don't see this as something
>>>>>complex, and if the alternative you are suggesting (your implementation
>>>>>idea is not clear to me...) is to fork another request then I think it's
>>>>>a lot better.
>>>>
>>>>[snip howto]
>>>>
>>>>Yeah, I know how to do partial completions. The increased complexity
>>>>arises in my driver. It's simply less code in my driver to treat each
>>>>transaction as an "all or none" affair.
>>>>
>>>>For the vastly common case, it's less i-cache and fewer interrupts to do
>>>>all-or-none. In the future I'll probably want to put partial
>>>>completions in the error path...
>>>
>>>Oh come on, i-cache? We're doing IO here, a cache line more or less in
>>>request handling is absolutely so much in the noise.
>>>
>>>What are the "increased complexity" involved with doing partial
>>>completions? You don't even have to know it's a partial request in the
>>>error handling, it's "just the request" state. Honestly, I don't see a
>>>problem there. You'll have to expand on what exactly you see as added
>>>complexity. To me it still seems like the fastest and most elegant way
>>>to handle it. It requires no special attention on request buildup, it
>>>requires no extra request and ugly split-code in the request handling.
>>>And the partial-completions come for free with the block layer code.
>>
>>libata, drivers/ide, and SCSI all must provide an internal "submit this
>>taskfile/cdb" API that is decoupled from struct request. Therefore,
>
> Yes
>
>>submitting a transaction pair, or for ATAPI submitting the internal
>>REQUEST SENSE, is quite simple and only a few lines of code.
>
> SCSI already does these partial completions...
>
>>Any extra diddling of the hardware, and struct request, to provide
>>partial completions is extra code. The hardware is currently set up to
>>provide only "it's done" or "it failed" information. Logically, then,
>>partial completions must be more code than the current <none> ;-)
>
> That's not a valid argument. Whatever you do, you have to add some lines
> of code.

Right. But the point with mentioning "decouple[...]" above was that the
most simple path is to submit two requests to hardware, and then a
single function call into {scsi|block} to complete the transaction.

Current non-errata case: 1 taskfile, 1 completion func call
Upcoming errata solution: 2 taskfiles, 1 completion func call
Your errata suggestion seems to be: 2 taskfiles, 2 completion func calls

That's obviously more work and more code for the errata case. And for
the non-errata case, partial completions don't make any sense at all.

>>WRT error handling, according to ATA specs I can look at the error
>>information to determine how much of the request, if any, completed
>>successfully. (dunno if this is also doable on ATAPI) That's why
>>partial completions in the error path make sense to me.
>
> ... so if you do partial completions in the normal paths (or rather
> allow them), error handling will be simpler. And we all know where the

In the common non-errata case, there is never a partial completion.

> hard and stupid bugs are - the basically never tested error handling.

I have :) libata error handling is stupid and simple, but it's also
solid and easy to verify.

Yet another path to be honed, of course :)

>>>>see. I'll implement whichever is easier first, which will certainly
>>>>be better than the current sledgehammer limit. Any improvement over
>>>>the
>>>
>>>Definitely, the current static limit completely sucks...
>>>
>>>>current code will provide dramatic performance increases, and we can
>>>>tune after that...
>>>
>>>A path needs to be chosen first, though.
>>
>>The path has been chosen: the "it works" solution first, then tune.
>>:)
>
> Since one path excludes the other, you must choose a path first. Tuning
> is honing a path, not rewriting that code.

The first depends on the second. The "it works" solution creates the
path to be honed.

Jeff
On Sunday 30 of November 2003 18:19, Prakash K. Cheemplavam wrote:
> Bartlomiej Zolnierkiewicz wrote:
>> In 2.6.x there is no max_kb_per_request setting in
>> /proc/ide/hdx/settings. Therefore
>> 	echo "max_kb_per_request:128" > /proc/ide/hde/settings
>> does not work.
>>
>> Hmm, actually I was under the impression that we have generic ioctls in
>> 2.6.x, but I can find only BLKSECTGET; BLKSECTSET was somehow lost. Jens?
>>
>> Prakash, please try the patch and maybe you will have 2 working drivers
>> now :-).
>
> OK, this driver fixes the transfer rate problem. Nice, so I wanted to do
> the right thing, but it didn't work, as you explained... Thanks.

Cool.

> Nevertheless there is still the issue left:
>
> hdparm -d1 /dev/hde makes the drive get major havoc (something like:
> ide: dma_intr: status=0x58 { DriveReady, SeekComplete, DataRequest }
> ide status timeout=0xd8 { Busy }; messages taken from swsusp's kernel
> panic). Have to do a hard reset. I guess it is the same reason why swsusp
> gets a kernel panic when it sends PM commands to siimage.c. (Maybe the
> same error is in libata, causing the same kernel panic on swsusp.)
>
> Any clues?

Strange. While doing 'hdparm -d1 /dev/hde' the same code path is executed
as during boot, so probably the device is in a different state or you hit
some weird driver bug :/.

And you are right, that's the reason why swsusp panics.

--bart
On Sun, Nov 30 2003, Vojtech Pavlik wrote:
> On Sun, Nov 30, 2003 at 06:10:06PM +0100, Jens Axboe wrote:
>
> > On Sun, Nov 30 2003, Jeff Garzik wrote:
> > > Bartlomiej Zolnierkiewicz wrote:
> > > >On Sunday 30 of November 2003 17:51, Jens Axboe wrote:
> > > >>>Tangent: My non-pessimistic fix will involve submitting a single sector
> > > >>>DMA r/w taskfile manually, then proceeding with the remaining sectors in
> > > >>>another r/w taskfile. This doubles the interrupts on the affected
> > > >>>chipset/drive combos, but still allows large requests. I'm not terribly
> > > >>
> > > >>Or split the request 50/50.
> > > >
> > > >
> > > >We can't - hardware will lock up.
> > >
> > > Well, the constraint we must satisfy is
> > >
> > > sector_count % 15 != 1
> >
> > (sector_count % 15 != 1) && (sector_count != 1)
> >
> > to be more precise :)
>
> I think you wanted to say:
>
> (sector_count % 15 != 1) || (sector_count == 1)
Ehm no, I don't think so... To my knowledge, sector_count == 1 is ok. If
not, the hardware would be seriously screwed (ok it is already) beyond
software fixups.
--
Jens Axboe
Jens Axboe wrote:
> On Sun, Nov 30 2003, Vojtech Pavlik wrote:
>
>>On Sun, Nov 30, 2003 at 06:10:06PM +0100, Jens Axboe wrote:
>>
>>
>>>On Sun, Nov 30 2003, Jeff Garzik wrote:
>>>
>>>>Bartlomiej Zolnierkiewicz wrote:
>>>>
>>>>>On Sunday 30 of November 2003 17:51, Jens Axboe wrote:
>>>>>
>>>>>>>Tangent: My non-pessimistic fix will involve submitting a single sector
>>>>>>>DMA r/w taskfile manually, then proceeding with the remaining sectors in
>>>>>>>another r/w taskfile. This doubles the interrupts on the affected
>>>>>>>chipset/drive combos, but still allows large requests. I'm not terribly
>>>>>>
>>>>>>Or split the request 50/50.
>>>>>
>>>>>
>>>>>We can't - hardware will lock up.
>>>>
>>>>Well, the constraint we must satisfy is
>>>>
>>>> sector_count % 15 != 1
>>>
>>> (sector_count % 15 != 1) && (sector_count != 1)
>>>
>>>to be more precise :)
>>
>>I think you wanted to say:
>>
>> (sector_count % 15 != 1) || (sector_count == 1)
>
>
> Ehm no, I don't think so... To my knowledge, sector_count == 1 is ok. If
> not, the hardware would be seriously screwed (ok it is already) beyond
> software fixups.
Now that you've kicked my brain into action, yes, sector_count==1 is ok.
It's all about limiting the data FIS... and with sector_count==1
there is no worry about the data FIS in this case.
Jeff
On Sun, Nov 30 2003, Jeff Garzik wrote:
> Jens Axboe wrote:
>> On Sun, Nov 30 2003, Jeff Garzik wrote:
>>
>>> Jens Axboe wrote:
>>>
>>>> On Sun, Nov 30 2003, Jeff Garzik wrote:
>>>>
>>>>> Jens Axboe wrote:
>>>>>
>>>>>> On Sun, Nov 30 2003, Jeff Garzik wrote:
>>>>>>
>>>>>>> fond of partial completions, as I feel they add complexity,
>>>>>>> particularly so in my case: I can simply use the same error paths
>>>>>>> for both the single-sector taskfile and the "everything else"
>>>>>>> taskfile, regardless of which taskfile throws the error.
>>>>>>
>>>>>> It's just a question of maintaining the proper request state so you
>>>>>> know how much and what part of a request is pending. Requests have
>>>>>> been handled this way ever since clustered requests, that is why
>>>>>> current_nr_sectors differs from nr_sectors. And with hard_*
>>>>>> duplicates, it's pretty easy to extend this a bit. I don't see this
>>>>>> as something complex, and if the alternative you are suggesting
>>>>>> (your implementation idea is not clear to me...) is to fork another
>>>>>> request then I think it's a lot better.
>>>>>
>>>>> [snip howto]
>>>>>
>>>>> Yeah, I know how to do partial completions. The increased complexity
>>>>> arises in my driver. It's simply less code in my driver to treat
>>>>> each transaction as an "all or none" affair.
>>>>>
>>>>> For the vastly common case, it's less i-cache and fewer interrupts
>>>>> to do all-or-none. In the future I'll probably want to put partial
>>>>> completions in the error path...
>>>>
>>>> Oh come on, i-cache? We're doing IO here, a cache line more or less
>>>> in request handling is absolutely in the noise.
>>>>
>>>> What is the "increased complexity" involved with doing partial
>>>> completions? You don't even have to know it's a partial request in
>>>> the error handling, it's "just the request" state. Honestly, I don't
>>>> see a problem there. You'll have to expand on what exactly you see as
>>>> added complexity. To me it still seems like the fastest and most
>>>> elegant way to handle it. It requires no special attention on request
>>>> buildup, it requires no extra request and ugly split-code in the
>>>> request handling. And the partial completions come for free with the
>>>> block layer code.
>>>
>>> libata, drivers/ide, and SCSI all must provide an internal "submit
>>> this taskfile/cdb" API that is decoupled from struct request.
>>> Therefore,
>>
>> Yes
>>
>>> submitting a transaction pair, or for ATAPI submitting the internal
>>> REQUEST SENSE, is quite simple and only a few lines of code.
>>
>> SCSI already does these partial completions...
>>
>>> Any extra diddling of the hardware, and struct request, to provide
>>> partial completions is extra code. The hardware is currently set up
>>> to provide only "it's done" or "it failed" information. Logically,
>>> then, partial completions must be more code than the current <none> ;-)
>>
>> That's not a valid argument. Whatever you do, you have to add some
>> lines of code.
>
> Right. But the point with mentioning "decouple[...]" above was that the
> most simple path is to submit two requests to hardware, and then a
> single function call into {scsi|block} to complete the transaction.
>
> Current non-errata case: 1 taskfile, 1 completion func call
> Upcoming errata solution: 2 taskfiles, 1 completion func call
> Your errata suggestion seems to be: 2 taskfiles, 2 completion func calls
>
> That's obviously more work and more code for the errata case.

I don't see why, it's exactly 2 x the non-errata case.

> And for the non-errata case, partial completions don't make any sense
> at all.

Of course, you would always complete these fully. But having partial
completions at the lowest layer gives it to you for free. The non-errata
case uses the exact same path, it just happens to complete 100% of the
request all the time.

>>> WRT error handling, according to ATA specs I can look at the error
>>> information to determine how much of the request, if any, completed
>>> successfully. (dunno if this is also doable on ATAPI) That's why
>>> partial completions in the error path make sense to me.
>>
>> ... so if you do partial completions in the normal paths (or rather
>> allow them), error handling will be simpler. And we all know where the
>
> In the common non-errata case, there is never a partial completion.

Right. But as you mention, error handling is a partial completion by
nature (almost always).

>> hard and stupid bugs are - the basically never tested error handling.
>
> I have :) libata error handling is stupid and simple, but it's also
> solid and easy to verify. Yet another path to be honed, of course :)

That's good :). But even given that, error handling is usually the less
tested path (by far). I do commend your 'keep it simple', I think that's
key there.

>>>>> see. I'll implement whichever is easier first, which will certainly
>>>>> be better than the current sledgehammer limit. Any improvement over
>>>>> the
>>>>
>>>> Definitely, the current static limit completely sucks...
>>>>
>>>>> current code will provide dramatic performance increases, and we
>>>>> can tune after that...
>>>>
>>>> A path needs to be chosen first, though.
>>>
>>> The path has been chosen: the "it works" solution first, then tune. :)
>>
>> Since one path excludes the other, you must choose a path first.
>> Tuning is honing a path, not rewriting that code.
>
> The first depends on the second. The "it works" solution creates the
> path to be honed.

Precisely. But there is more than one workable way to fix it :)

--
Jens Axboe
On Sun, Nov 30 2003, Jeff Garzik wrote:
> Jens Axboe wrote:
> >On Sun, Nov 30 2003, Vojtech Pavlik wrote:
> >
> >>On Sun, Nov 30, 2003 at 06:10:06PM +0100, Jens Axboe wrote:
> >>
> >>
> >>>On Sun, Nov 30 2003, Jeff Garzik wrote:
> >>>
> >>>>Bartlomiej Zolnierkiewicz wrote:
> >>>>
> >>>>>On Sunday 30 of November 2003 17:51, Jens Axboe wrote:
> >>>>>
> >>>>>>>Tangent: My non-pessimistic fix will involve submitting a single
> >>>>>>>sector
> >>>>>>>DMA r/w taskfile manually, then proceeding with the remaining
> >>>>>>>sectors in
> >>>>>>>another r/w taskfile. This doubles the interrupts on the affected
> >>>>>>>chipset/drive combos, but still allows large requests. I'm not
> >>>>>>>terribly
> >>>>>>
> >>>>>>Or split the request 50/50.
> >>>>>
> >>>>>
> >>>>>We can't - hardware will lock up.
> >>>>
> >>>>Well, the constraint we must satisfy is
> >>>>
> >>>> sector_count % 15 != 1
> >>>
> >>> (sector_count % 15 != 1) && (sector_count != 1)
> >>>
> >>>to be more precise :)
> >>
> >>I think you wanted to say:
> >>
> >> (sector_count % 15 != 1) || (sector_count == 1)
> >
> >
> >Ehm no, I don't think so... To my knowledge, sector_count == 1 is ok. If
> >not, the hardware would be seriously screwed (ok it is already) beyond
> >software fixups.
>
>
> Now that you've kicked my brain into action, yes, sector_count==1 is ok.
> It's all about limiting the data FIS... and with sector_count==1
> there is no worry about the data FIS in this case.
Ah, my line wasn't completely clear (to say the least)... So to clear
all doubts:
if ((sector_count % 15 == 1) && (sector_count != 1))
errata path
Agree?
--
Jens Axboe
Craig Bradney wrote:
> On the topic of speeds.. hdparm -t gives me 56Mb/s on my Maxtor 80Mb 8mb
> cache PATA drive. I got that with 2.4.23 pre 8 which was ATA100 and get
> just a little more on ATA133 with 2.6. Not sure what people are
> expecting on SATA.
Serial ATA merely changes the bus, a.k.a. the interface between drive
and system.
This doesn't mean that the drive itself will be any faster... most
first-gen SATA drives are just PATA drives with a new circuit board and
new firmware. Just like some SCSI and IDE drives are exactly the same
platters, but have differing circuit boards and connectors...
Jeff
Jens Axboe wrote:
> Ah, my line wasn't completely clear (to say the least)... So to clear
> all doubts:
>
> 	if ((sector_count % 15 == 1) && (sector_count != 1))
> 		errata path
>
> Agree?

Agreed.

The confusion here is most likely my fault, as my original post
intentionally inverted the logic for illustrative purposes (hah!)...

> Well, the constraint we must satisfy is
>
> 	sector_count % 15 != 1
>
> (i.e. "== 1" causes the lockup)

And to think, English is my only language...

	Jeff
so definitely, 32 MB/s is almost half the speed that you get. I'm on
2.6-test11. I don't know more options to try. The next will be booting
with "noapic nolapic". Some people reported better results with this.
By the way, I have booted with "doataraid noraid" (no drives connected,
only SATA support in the BIOS), and nothing is shown in the boot messages
(nor dmesg) about libata being loaded. I don't know if I must connect a
hard drive for the driver to show up, but I don't think so.
Thanks!
LuisMi Garcia
Craig Bradney wrote:
> On the topic of speeds.. hdparm -t gives me 56Mb/s on my Maxtor 80Mb 8mb
> cache PATA drive. I got that with 2.4.23 pre 8 which was ATA100 and get
> just a little more on ATA133 with 2.6. Not sure what people are
> expecting on SATA.
>
> Craig
>
> On Sun, 2003-11-30 at 18:52, Luis Miguel García wrote:
>
>
>> hello:
>>
>> I have a Seagate Barracuda IV (80 Gb) connected to parallel ata on a
>> nforce-2 motherboard.
>>
>> If any of you want for me to test any patch to fix the "seagate
>> issue", please, count on me. I have a SATA sis3112 and a
>> parallel-to-serial converter. If I'm of any help to you, drop me an
>> email.
>>
>> By the way, I'm only getting 32 MB/s (hdparm -tT /dev/hda) on my
>> actual parallel ata. Is this enough for an ATA-100 device?
>>
>> Thanks a lot.
>>
>> LuisMi García
>> Spain
>>
>> -
>> To unsubscribe from this list: send the line "unsubscribe
>> linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>>
>>
>
>
>
>
>
Jens Axboe wrote:
> On Sun, Nov 30 2003, Jeff Garzik wrote:
>> Current non-errata case: 1 taskfile, 1 completion func call
>> Upcoming errata solution: 2 taskfiles, 1 completion func call
>> Your errata suggestion seems to be: 2 taskfiles, 2 completion func calls
>>
>> That's obviously more work and more code for the errata case.
>
> I don't see why, it's exactly 2 x non-errata case.

Since the hardware request API is (and must be) completely decoupled from
the struct request API, I can achieve 1.5 x the non-errata case.

>> And for the non-errata case, partial completions don't make any sense
>> at all.
>
> Of course, you would always complete these fully. But having partial
> completions at the lowest layer gives it to you for free. non-errata
> case uses the exact same path, it just happens to complete 100% of the
> request all the time.

[editor's note: I wonder if I've broken a grammar rule using so many
"non"s in a single email]

If I completely ignore partial completions on the normal [non-error]
paths, the current errata and non-errata struct request completion paths
would be exactly the same. Only the error path would differ. The lowest
[hardware req API] layer's request granularity is a single taskfile, so
it will never know about partial completions.

>>>> WRT error handling, according to ATA specs I can look at the error
>>>> information to determine how much of the request, if any, completed
>>>> successfully. (dunno if this is also doable on ATAPI) That's why
>>>> partial completions in the error path make sense to me.
>>>
>>> ... so if you do partial completions in the normal paths (or rather
>>> allow them), error handling will be simpler. And we all know where the
>>
>> In the common non-errata case, there is never a partial completion.
>
> Right. But as you mention, error handling is a partial completion by
> nature (almost always).

Agreed. Just in case I transposed a word or something, I wish to clarify:
both errata and error paths are almost always partial completions.

However... for the case where both errata taskfiles complete
_successfully_, it is better to have only 1 completion on the hot path
(the "1.5 x" mentioned above). Particularly considering that errata
taskfiles are contiguous, and the second taskfile will complete fairly
quickly after the first...

The slow, error path is a whole different matter. Ignoring partial
completions in the normal path keeps the error path simple, for errata
and non-errata cases. Handling partial completions in the error code, for
both errata and non-errata cases, is definitely something I want to do in
the future.

>>> hard and stupid bugs are - the basically never tested error handling.
>>
>> I have :) libata error handling is stupid and simple, but it's also
>> solid and easy to verify. Yet another path to be honed, of course :)
>
> That's good :). But even given that, error handling is usually the less
> tested path (by far). I do commend your 'keep it simple', I think that's
> key there.

As a tangent, I'm hoping to convince some drive manufacturers (under NDA
most likely, unfortunately) to provide special drive firmwares that will
simulate read and write errors, i.e. fault injection.

	Jeff
On Sun, Nov 30 2003, Jeff Garzik wrote:
> Jens Axboe wrote:
>> On Sun, Nov 30 2003, Jeff Garzik wrote:
>>> Current non-errata case: 1 taskfile, 1 completion func call
>>> Upcoming errata solution: 2 taskfiles, 1 completion func call
>>> Your errata suggestion seems to be: 2 taskfiles, 2 completion func
>>> calls
>>>
>>> That's obviously more work and more code for the errata case.
>>
>> I don't see why, it's exactly 2 x non-errata case.
>
> Since the hardware request API is (and must be) completely decoupled
> from struct request API, I can achieve 1.5 x non-errata case.

Hmm, I don't follow that... Being a bit clever, you could even send off
both A and B parts of the request in one go. Probably not worth it
though, that would add some complexity (things like not spanning a page,
stuff you probably don't want to bother the driver with).

>>> And for the non-errata case, partial completions don't make any sense
>>> at all.
>>
>> Of course, you would always complete these fully. But having partial
>> completions at the lowest layer gives it to you for free. non-errata
>> case uses the exact same path, it just happens to complete 100% of the
>> request all the time.
>
> [editor's note: I wonder if I've broken a grammar rule using so many
> "non"s in a single email]

Hehe

> If I completely ignore partial completions on the normal [non-error]
> paths, the current errata and non-errata struct request completion
> paths would be exactly the same. Only the error path would differ. The
> lowest [hardware req API] layer's request granularity is a single
> taskfile, so it will never know about partial completions.

Indeed. The partial completions only exist at the driver -> block layer
(or -> scsi) layer, not talking to the hardware. The hardware always
gets 'a request'; if that just happens to be only a part of a struct
request, so be it.

>>>>> WRT error handling, according to ATA specs I can look at the error
>>>>> information to determine how much of the request, if any, completed
>>>>> successfully. (dunno if this is also doable on ATAPI) That's why
>>>>> partial completions in the error path make sense to me.
>>>>
>>>> ... so if you do partial completions in the normal paths (or rather
>>>> allow them), error handling will be simpler. And we all know where
>>>> the
>>>
>>> In the common non-errata case, there is never a partial completion.
>>
>> Right. But as you mention, error handling is a partial completion by
>> nature (almost always).
>
> Agreed. Just in case I transposed a word or something, I wish to
> clarify: both errata and error paths are almost always partial
> completions.

Yup, agree.

> However... for the case where both errata taskfiles complete
> _successfully_, it is better to have only 1 completion on the hot path
> (the "1.5 x" mentioned above). Particularly considering that errata
> taskfiles are contiguous, and the second taskfile will complete fairly
> quickly after the first...

Sure yes, the fewer completions the better. Where do you get the 1.5
from? You need to split the request handling no matter what for the
errata path, I would count that as 2 completions.

> The slow, error path is a whole different matter. Ignoring partial
> completions in the normal path keeps the error path simple, for errata
> and non-errata cases. Handling partial completions in the error code,

How so?? There are no partial completions in the normal path. In fact,
ignore the term 'partial completion'. Just think completion count or
something like that. At end_io time, you look at how much io has
completed, and you complete that back to the layer above you (block or
scsi). The normal path would always have count == total request, and you
are done.

The errata (and error) path would have count <= total request, which
just means you have a line or two of C to tell the layer above you not
to put the request back at the front. That's about it for added code.

I think we are talking somewhat past each other. I don't mean to imply
that you want partial completions in the non-errata path. Of course you
don't. I'm purely talking about completion of a count of data which
doesn't necessarily have to be the total struct request size. Your
taskfile tells you how much.

> for both errata and non-errata cases, is definitely something I want
> to do in the future.

Well yes, you have to.

>>>> hard and stupid bugs are - the basically never tested error handling.
>>>
>>> I have :) libata error handling is stupid and simple, but it's also
>>> solid and easy to verify. Yet another path to be honed, of course :)
>>
>> That's good :). But even given that, error handling is usually the
>> less tested path (by far). I do commend your 'keep it simple', I think
>> that's key there.
>
> As a tangent, I'm hoping to convince some drive manufacturers (under
> NDA most likely, unfortunately) to provide special drive firmwares that
> will simulate read and write errors, i.e. fault injection.

Would be nice to have ways of doing that which are better than 'mark
this drive bad with a label and pull it out of the drawer for testing
error handling' :)

--
Jens Axboe
On Sun, Nov 30, 2003 at 01:31:35PM -0500, Jeff Garzik wrote:
> >Ah, my line wasn't completely clear (to say the least)... So to clear
> >all doubts:
> >
> > if ((sector_count % 15 == 1) && (sector_count != 1))
> > errata path
> >
> >Agree?
>
>
> Agreed.
>
>
> The confusion here is most likely my fault, as my original post
> intentionally inverted the logic for illustrative purposes (hah!)...
Yeah, and there was an error in the inversion, since if you invert the
above statement, it looks like this:
if ((sector_count % 15 != 1) || (sector_count == 1))
ok path
else
errata path
Logic can be a bitch at times.
--
Vojtech Pavlik
SuSE Labs, SuSE CR
Jens Axboe wrote:
> On Sun, Nov 30 2003, Jeff Garzik wrote:
>> Since the hardware request API is (and must be) completely decoupled
>> from struct request API, I can achieve 1.5 x non-errata case.
>
> Hmm I don't follow that... Being a bit clever, you could even send off
> both A and B parts of the request in one go. Probably not worth it
> though, that would add some complexity (things like not spanning a
> page, stuff you probably don't want to bother the driver with).
[...]
> Indeed. The partial completions only exist at the driver -> block layer
> (or -> scsi) layer, not talking to the hardware. The hardware always
> gets 'a request', if that just happens to be only a part of a struct
> request so be it.
[...]
> Sure yes, the fewer completions the better. Where do you get the 1.5
> from? You need to split the request handling no matter what for the
> errata path, I would count that as 2 completions.

Taskfile completion and struct request completion are separate. That
results in

* struct request received by libata
* libata detects errata
* libata creates 2 struct ata_queued_cmd's
* libata calls ata_qc_push() 2 times
* Time passes
* ata_qc_complete called 2 times

Option 1: {scsi|block} complete called 2 times, once for each taskfile
Option 2: {scsi|block} complete called 1 time, when both taskfiles are done

one way:     2 h/w completions, 1 struct request completion == 1.5
another way: 2 h/w completions, 2 struct request completions == 2.0

Maybe another way of looking at it: It's a question of where the state is
stored -- in ata_queued_cmd or entirely in struct request -- and what the
benefits/downsides of each are.

When a single struct request causes the initiation of multiple
ata_queued_cmd's, libata must be capable of knowing when multiple
ata_queued_cmds forming a whole have completed. struct request must also
know this. _But_. The key distinction is that the multiple requests
libata must handle might not be based on sector progress.

For this SII errata, I _could_ do this at the block layer:

	ata_qc_complete() -> blk_end_io(first half of sectors)
	ata_qc_complete() -> blk_end_io(some more sectors)

And the request would be completed by the block layer (right?). But under
the hood, libata has to handle these situations:

* One or more ATA commands must complete in succession, before the
  struct request may be end_io'd.
* One or more ATA commands must complete asynchronously, before the
  struct request may be end_io'd.
* These ATA commands might not be sector based: sometimes aggressive
  power management means that libata must issue and complete a PM-related
  taskfile, before issuing the {READ|WRITE} DMA passed to it in the
  struct request.

I'm already storing and handling this stuff at the hardware-queue level.
(remember hardware queues often bottleneck at the host and/or bus levels,
not necessarily the request_queue level)

So what all this hopefully boils down to is: if I have to do "internal
completions" anyway, it's just more work for libata to separate out the 2
taskfiles into 2 block layer completions. For both errata and non-errata
paths, I can just say "the last taskfile is done, clean up".

Yet another way of looking at it: In order for all state to be kept at
the block layer level, you would need this check:

	if ((rq->expected_taskfiles == rq->completed_taskfiles) &&
	    (rq->expected_sectors == rq->completed_sectors))
		the struct request is "complete"

and each call to end_io would require both a taskfile count and a sector
count, which would increment ->completed_taskfiles and
->completed_sectors.

Note1: s/taskfile/cdb/ if that's your fancy :)
Note2: ->completed_sectors exists today under another name, yes, I know :)

	Jeff
Jeff Garzik wrote:
> Jens Axboe wrote:
>
>> Ah, my line wasn't completely clear (to say the least)... So to clear
>> all doubts:
>>
>> if ((sector_count % 15 == 1) && (sector_count != 1))
>> errata path
>>
>> Agree?
>
>
>
> Agreed.
>
>
> The confusion here is most likely my fault, as my original post
> intentionally inverted the logic for illustrative purposes (hah!)...
>
>> Well, the constraint we must satisfy is
>>
>> sector_count % 15 != 1
>>
>> (i.e. "== 1" causes the lockup)
Hi, I just rebuilt my kernel with libata support.
I have a 3112 Silicon Image controller with an IDE drive attached.
(I have 2 IDE drives, so I bought a second controller to split the load between them. One is connected to the MB IDE controller.)
When I run the hdparm command, I can see strange behaviour of the libata driver:
[root@shrike root]# hdparm -t /dev/hda
/dev/hda:
Timing buffered disk reads: 64 MB in 1.19 seconds = 53.78 MB/sec
[root@shrike root]# hdparm -t /dev/sda1
/dev/sda1:
Timing buffered disk reads: read(1048576) returned 921600 bytes
[root@shrike root]#
Meanwhile -T switch works normally.
I know that siimage support is broken; any ideas what could possibly cause such errors?
Is it somehow linked to the error discussed in this message thread?
WBR.
On Sun, 2003-11-30 at 19:41, Luis Miguel García wrote:
> so definitely, 32 MB/s is almost half the speed that you get. I'm on
> 2.6-test11. I don't know more options to try. The next will be booting
> with "noapic nolapic". Some people reported better results with this.
>
> by the way, I have booted with "doataraid noraid" (no drives connected,
> only SATA support in bios), and nothing is shown in the boot messages
> (nor dmesg) about libata being loaded. I don't know if I must connect a
> hard drive and then the driver shows up, but I don't think so.

Depends on a lot of things, especially what else is on that controller.
The way I found things out was my involvement in the Scribus DTP project.
I upgraded to the Athlon 2600 from a Duron 900. That took me from 30+
mins for a Scribus compile to 2 mins (!) if there is no secondary drive.
When I had my old 20gb drive plugged in to do copies off of it... 11
mins. Do you have anything else plugged in there?

Of course, I guess in that case the 8mb drive cache is helping a lot.

Craig

> Thanks!
>
> LuisMi Garcia
>
> Craig Bradney wrote:
>
>> On the topic of speeds.. hdparm -t gives me 56Mb/s on my Maxtor 80Mb
>> 8mb cache PATA drive. I got that with 2.4.23-pre8 which was ATA100 and
>> get just a little more on ATA133 with 2.6. Not sure what people are
>> expecting on SATA.
>>
>> Craig
>>
>> On Sun, 2003-11-30 at 18:52, Luis Miguel García wrote:
>>
>>> hello:
>>>
>>> I have a Seagate Barracuda IV (80 Gb) connected to parallel ata on a
>>> nforce-2 motherboard.
>>>
>>> If any of you want me to test any patch to fix the "seagate issue",
>>> please count on me. I have a SATA sis3112 and a parallel-to-serial
>>> converter. If I'm of any help to you, drop me an email.
>>>
>>> By the way, I'm only getting 32 MB/s (hdparm -tT /dev/hda) on my
>>> actual parallel ata. Is this enough for an ATA-100 device?
>>>
>>> Thanks a lot.
>>>
>>> LuisMi García
>>> Spain
Bartlomiej Zolnierkiewicz wrote:
> On Sunday 30 of November 2003 18:19, Prakash K. Cheemplavam wrote:
>
>>Bartlomiej Zolnierkiewicz wrote:
>>
>>>In 2.6.x there is no max_kb_per_request setting in
>>>/proc/ide/hdx/settings. Therefore
>>> echo "max_kb_per_request:128" > /proc/ide/hde/settings
>>>does not work.
>>>
>>>Hmm, actually I was under the impression that we have generic ioctls in
>>>2.6.x, but I can find only BLKSECTGET; BLKSECTSET was somehow lost. Jens?
>>>
>>>Prakash, please try patch and maybe you will have 2 working drivers now
>>>:-).
>>
>>OK, this driver fixes the transfer rate problem. Nice, so I wanted to do
>>the right thing, but it didn't work, as you explained... Thanks.
>
>
> Cool.
>
>
>>Nevertheless there is still the issue left:
>>
>>hdparm -d1 /dev/hde makes the drive get major havoc (something like:
>>ide: dma_intr: status=0x58 { DriveReady, SeekComplete, DataRequest }
>>ide status timeout=0xd8 { Busy }; messages taken from swsusp's kernel
>>panic). Have to do a hard reset. I guess it is the same reason why
>>swsusp gets a kernel panic when it sends PM commands to siimage.c.
>>(Maybe the same error is in libata, causing the same kernel panic on
>>swsusp.)
>>
>>Any clues?
>
>
> Strange. When doing 'hdparm -d1 /dev/hde' the same code path is executed
> as during boot, so probably the device is in a different state or you
> hit some weird driver bug :/.
>
> And you are right, that's the reason why swsusp panics.
I think the bug is that the driver specifically doesn't like my
controller + SATA converter + HD combination. As I stated in my very first
message, on HD access siimage.c constantly calls:
static int siimage_mmio_ide_dma_test_irq (ide_drive_t *drive)
{
	ide_hwif_t *hwif	= HWIF(drive);
	unsigned long base	= (unsigned long)hwif->hwif_data;
	unsigned long addr	= siimage_selreg(hwif, 0x1);

	if (SATA_ERROR_REG) {
		u32 ext_stat = hwif->INL(base + 0x10);
		u8 watchdog = 0;
		if (ext_stat & ((hwif->channel) ? 0x40 : 0x10)) {
//			u32 sata_error = hwif->INL(SATA_ERROR_REG);
//			hwif->OUTL(sata_error, SATA_ERROR_REG);
//			watchdog = (sata_error & 0x00680000) ? 1 : 0;
//#if 1
//			printk(KERN_WARNING "%s: sata_error = 0x%08x, "
//				"watchdog = %d, %s\n",
//				drive->name, sata_error, watchdog,
//				__FUNCTION__);
//#endif
		} else {
That's why I commented the above portions out; otherwise my dmesg gets
flooded. What is strange: when I compile the kernel to *not* enable DMA
at boot, siimage enables DMA nevertheless, so I am not sure whether
hdparm -d1 and kernel boot take the same path to enable DMA. It seems
some sort of hack within siimage.c is used to enable DMA on my drive.
Remember, I have no native SATA drive; maybe that's the problem.
Prakash
On Sun, Nov 30 2003, Jeff Garzik wrote:
> Jens Axboe wrote:
> >On Sun, Nov 30 2003, Jeff Garzik wrote:
> >>Since the hardware request API is (and must be) completely decoupled
> >>from the struct request API, I can achieve 1.5 x the non-errata case.
> >
> >Hmm, I don't follow that... Being a bit clever, you could even send off
> >both A and B parts of the request in one go. Probably not worth it
> >though, that would add some complexity (things like not spanning a page,
> >stuff you probably don't want to bother the driver with).
[...]
> >Indeed. The partial completions only exist at the driver -> block layer
> >(or -> scsi) layer, not talking to the hardware. The hardware always
> >gets 'a request'; if that just happens to be only a part of a struct
> >request, so be it.
[...]
> >Sure, yes, the fewer completions the better. Where do you get the 1.5
> >from? You need to split the request handling no matter what for the
> >errata path; I would count that as 2 completions.
>
> Taskfile completion and struct request completion are separate.

Of course.

> That results in:
>
> * struct request received by libata
> * libata detects errata
> * libata creates 2 struct ata_queued_cmd's
> * libata calls ata_qc_push() 2 times
> * Time passes
> * ata_qc_complete called 2 times
> Option 1: {scsi|block} complete called 2 times, once for each taskfile
> Option 2: {scsi|block} complete called 1 time, when both taskfiles are done
>
> one way: 2 h/w completions, 1 struct request completion == 1.5
> another way: 2 h/w completions, 2 struct request completions == 2.0

Well, that's an implementation detail in your driver; I meant hardware
completions. Doing one or two calls to scsi/block completion handling is
not expensive, at least when compared to the hardware completion. And I'd
greatly prefer option 1, so your end_io doesn't have to know about the
request being partial or not.
> Maybe another way of looking at it:
> It's a question of where the state is stored -- in ata_queued_cmd or
> entirely in struct request -- and what are the benefits/downsides of each.

If you mangle the request, you only have to do it in one place. If your
private command is easier to handle, that won't be tricky either. Given
that I don't really know your libata, I can't say for sure :)

> When a single struct request causes the initiation of multiple
> ata_queued_cmd's, libata must be capable of knowing when multiple
> ata_queued_cmds forming a whole have completed. struct request must

This is where I disagree. If you accept serializing the request for the
errata case, then it does _not_ need to know! That would be adding some
complexity.

> also know this. _But_. The key distinction is that libata must handle
> multiple requests that might not be based on sector progress.
>
> For this SII errata, I _could_ do this at the block layer:
> ata_qc_complete() -> blk_end_io(first half of sectors)
> ata_qc_complete() -> blk_end_io(some more sectors)
>
> And the request would be completed by the block layer (right?).

Yes.

> But under the hood, libata has to handle these situations:
> * One or more ATA commands must complete in succession, before the
> struct request may be end_io'd.

That's not tricky; as I've mentioned, it's 1-2 lines of code in your
end_io handling.

> * One or more ATA commands must complete asynchronously, before the
> struct request may be end_io'd.

Depends on whether you want to accept serialization of that request for
the errata path.

> * These ATA commands might not be sector based: sometimes aggressive
> power management means that libata must issue and complete a PM-related
> taskfile, before issuing the {READ|WRITE} DMA passed to it in the struct
> request.

What's your point?

> I'm already storing and handling this stuff at the hardware-queue level.
> (remember hardware queues often bottleneck at the host and/or bus
> levels, not necessarily the request_queue level)
>
> So what all this hopefully boils down to is: if I have to do "internal
> completions" anyway, it's just more work for libata to separate out the
> 2 taskfiles into 2 block layer completions. For both errata and
> non-errata paths, I can just say "the last taskfile is done, clean up".

I think you should just serialize the handling of that errata request. If
you need to provide truly partial completions of a request (where X+1 can
complete before X), then you do need to clone the actual struct request
and do other magic. If you serialize it, it's basically no added
complexity.

> Yet another way of looking at it:
> In order for all state to be kept at the block layer level, you would
> need this check:
>
> if ((rq->expected_taskfiles == rq->completed_taskfiles) &&
>     (rq->expected_sectors == rq->completed_sectors))
>         the struct request is "complete"
>
> and each call to end_io would require both a taskfile count and a sector
> count, which would increment ->completed_taskfiles and ->completed_sectors.

No, it still doesn't work at all; you can't complete arbitrary parts of a
request like that. I don't see why you would need completed_taskfiles (and
expected): if you complete more sectors per taskfile than you should, your
driver is broken. So struct request already has enough info for the above
'if', but it doesn't have (and will never have) enough state to complete
random parts of it non-sequentially. That can't work for all requests
anyway; consider barriers. The actual implementation doesn't allow it
either, with the one-way stringing of bios and partial completions (and
wakeups) of those as well.

> Note1: s/taskfile/cdb/ if that's your fancy :)

A taskfile contains way more info than a cdb, so I'll stick to that :)

--
Jens Axboe