* access beyond end of device again @ 2002-06-24 14:35 Kevin 2002-06-24 14:35 ` Oleg Drokin 2002-06-24 14:37 ` Robert Brockway 0 siblings, 2 replies; 25+ messages in thread From: Kevin @ 2002-06-24 14:35 UTC (permalink / raw) To: reiserfs-list I'm getting these errors again: attempt to access beyond end of device 38:01: rw=0, want=2052028788, limit=58633312 anyone know what causes them? and more importantly a way to stop them from coming back? ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-24 14:35 access beyond end of device again Kevin @ 2002-06-24 14:35 ` Oleg Drokin 2002-06-24 14:45 ` Dirk Mueller 2002-06-24 14:37 ` Robert Brockway 1 sibling, 1 reply; 25+ messages in thread From: Oleg Drokin @ 2002-06-24 14:35 UTC (permalink / raw) To: Kevin; +Cc: reiserfs-list Hello! Do you get these during normal operations? Then it seems some of the unformatted pointers were corrupted and you need to run reiserfsck to clear these. Bye, Oleg On Mon, Jun 24, 2002 at 07:35:24AM -0700, Kevin wrote: > I'm getting these errors again: > attempt to access beyond end of device > 38:01: rw=0, want=2052028788, limit=58633312 > > anyone know what causes them? and more importantly a way to stop them > from coming back? > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-24 14:35 ` Oleg Drokin @ 2002-06-24 14:45 ` Dirk Mueller 2002-06-24 14:49 ` Oleg Drokin 0 siblings, 1 reply; 25+ messages in thread From: Dirk Mueller @ 2002-06-24 14:45 UTC (permalink / raw) To: reiserfs-list On Mon, 24 Jun 2002, Oleg Drokin wrote: > > Do you get these during normal operations? > Then it seems some of the unformatted pointers were corrupted and you need > to run reiserfsck to clear these. most likely the partition is indeed too small (played with fsck lately?) Dirk ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-24 14:45 ` Dirk Mueller @ 2002-06-24 14:49 ` Oleg Drokin 2002-06-24 16:46 ` Hans Reiser 2002-06-24 16:59 ` Dirk Mueller 0 siblings, 2 replies; 25+ messages in thread From: Oleg Drokin @ 2002-06-24 14:49 UTC (permalink / raw) To: Dirk Mueller; +Cc: reiserfs-list Hello! On Mon, Jun 24, 2002 at 04:45:01PM +0200, Dirk Mueller wrote: > > Do you get these during normal operations? > > Then it seems some of the unformatted pointers were corrupted and you need > > to run reiserfsck to clear these. > most likely the partition is indeed too small (played with fsck lately?) reiserfs does not allocate any blocks past the partition size, so I cannot even imagine what you are speaking about ;) Bye, Oleg ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-24 14:49 ` Oleg Drokin @ 2002-06-24 16:46 ` Hans Reiser 2002-06-25 5:11 ` Oleg Drokin 2002-06-24 16:59 ` Dirk Mueller 1 sibling, 1 reply; 25+ messages in thread From: Hans Reiser @ 2002-06-24 16:46 UTC (permalink / raw) To: Oleg Drokin; +Cc: Dirk Mueller, reiserfs-list Oleg Drokin wrote: >Hello! > >On Mon, Jun 24, 2002 at 04:45:01PM +0200, Dirk Mueller wrote: > >>> Do you get these during normal operations? >>> Then it seems some of the unformatted pointers were corrupted and you need >>> to run reiserfsck to clear these. >>> >>most likely the partition is indeed too small (played with fsck lately?) >> > >reiserfs does not allocate any blocks past the partition size, >so I cannot even imagine what you are speaking about ;) > >Bye, > Oleg > This can be caused by fdisk followed by mkreiserfs without a reboot between fdisk and mkreiserfs, yes? -- Hans ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-24 16:46 ` Hans Reiser @ 2002-06-25 5:11 ` Oleg Drokin 0 siblings, 0 replies; 25+ messages in thread From: Oleg Drokin @ 2002-06-25 5:11 UTC (permalink / raw) To: Hans Reiser; +Cc: Dirk Mueller, reiserfs-list Hello! On Mon, Jun 24, 2002 at 08:46:14PM +0400, Hans Reiser wrote: > >reiserfs does not allocate any blocks past the partition size, > >so I cannot even imagine what you are speaking about ;) > This can be caused by fdisk followed by mkreiserfs without a reboot > between fdisk and mkreiserfs, yes? Similar problem, but not this exact one, I'd say. The sequence of events would have to be this: have an HDD with several partitions. Have some of the partitions mounted (or at least one). Destroy one of the partitions and create a smaller one instead (or just resize the partition down). mkfs the partition without rebooting. But if the resizing has removed more than 132 MB of space, then such a partition won't mount on the next reboot, simply because not all of the bitmaps can be read. (And the messages indicated that the requested sector is far away from the partition end.) The main thing there is for the HDD to be mounted (at least one partition). If nothing was mounted off the HDD, then the kernel is able to correctly re-read the partition table. Bye, Oleg ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-24 14:49 ` Oleg Drokin 2002-06-24 16:46 ` Hans Reiser @ 2002-06-24 16:59 ` Dirk Mueller 1 sibling, 0 replies; 25+ messages in thread From: Dirk Mueller @ 2002-06-24 16:59 UTC (permalink / raw) To: reiserfs-list On Mon, 24 Jun 2002, Oleg Drokin wrote: > > most likely the partition is indeed too small (played with fsck lately?) fdisk I meant. sorry. Dirk ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-24 14:35 access beyond end of device again Kevin 2002-06-24 14:35 ` Oleg Drokin @ 2002-06-24 14:37 ` Robert Brockway 2002-06-24 14:48 ` Oleg Drokin 1 sibling, 1 reply; 25+ messages in thread From: Robert Brockway @ 2002-06-24 14:37 UTC (permalink / raw) To: Kevin; +Cc: reiserfs-list On Mon, 24 Jun 2002, Kevin wrote: > I'm getting these errors again: > attempt to access beyond end of device > 38:01: rw=0, want=2052028788, limit=58633312 What sort of device is this supposed to be? :) 38:01 is either a "Myricom PCI Myrinet board" or something "reserved for Linux/AP+" (and I'm assuming we're talking about a block device rather than a character device here :) I could be way off but could you confirm what you think the filesystem is on? Rob -- Robert Brockway B.Sc. email: robert@timetraveller.org ICQ: 104781119 Linux counter project ID #16440 (http://counter.li.org) avon: up 16 days, 23:28, 1 user, load average: 0.00, 0.02, 0.00 "The earth is but one country and mankind its citizens" -Baha'u'llah ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-24 14:37 ` Robert Brockway @ 2002-06-24 14:48 ` Oleg Drokin 2002-06-24 14:49 ` Robert Brockway 2002-07-10 18:04 ` Tim Small 0 siblings, 2 replies; 25+ messages in thread From: Oleg Drokin @ 2002-06-24 14:48 UTC (permalink / raw) To: Robert Brockway; +Cc: Kevin, reiserfs-list Hello! On Tue, Jun 25, 2002 at 12:37:02AM +1000, Robert Brockway wrote: > > I'm getting these errors again: > > attempt to access beyond end of device > > 38:01: rw=0, want=2052028788, limit=58633312 > What sort of device is this supposed to be? :) 38:01 is either a "Myricom > PCI Myrinet board" or something "reserved for Linux/AP+" (and I'm assuming > we're talking about a block device rather than a character device here :)

Numbers printed are in hex, so this is block major 0x38 = 56:

    block   Fifth IDE hard disk/CD-ROM interface
              0 = /dev/hdi    Master: whole disk (or CD-ROM)
             64 = /dev/hdj    Slave: whole disk (or CD-ROM)

            Partitions are handled the same way as for the first
            interface (see major number 3).

Bye, Oleg ^ permalink raw reply [flat|nested] 25+ messages in thread
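The decoding described above can be sketched in a few lines of Python. This is an illustration only: the names decode_kdev and IDE_MAJORS are made up for the example, and the table of IDE majors is taken from the kernel's devices.txt list rather than from this thread, which quotes only major 56.

```python
# IDE block majors and their (master, slave) drive letters, per the
# kernel's devices.txt. Only major 56 is quoted in this thread; the
# other entries are included for context.
IDE_MAJORS = {3: "ab", 22: "cd", 33: "ef", 34: "gh", 56: "ij", 57: "kl"}


def decode_kdev(text):
    """Turn a hex 'MM:mm' pair such as '38:01' into (major, minor, name).

    name is None when the major is not one of the IDE majors above.
    """
    major_hex, minor_hex = text.split(":")
    major, minor = int(major_hex, 16), int(minor_hex, 16)
    letters = IDE_MAJORS.get(major)
    # Minors 0-63 belong to the master, 64-127 to the slave; the low
    # six bits are the partition number (0 means the whole disk).
    unit, partition = divmod(minor, 64)
    if letters is None or unit > 1:
        return major, minor, None
    name = "/dev/hd" + letters[unit] + (str(partition) if partition else "")
    return major, minor, name


print(decode_kdev("38:01"))
```

For the message in question this gives major 56, minor 1, which corresponds to the first partition on the master of the fifth IDE interface.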
* Re: access beyond end of device again 2002-06-24 14:48 ` Oleg Drokin @ 2002-06-24 14:49 ` Robert Brockway 2002-06-24 17:30 ` Kevin 2002-07-10 18:04 ` Tim Small 1 sibling, 1 reply; 25+ messages in thread From: Robert Brockway @ 2002-06-24 14:49 UTC (permalink / raw) To: reiserfs-list On Mon, 24 Jun 2002, Oleg Drokin wrote: > Numbers printed are in hex, so this is: Damn, they are too (*blushes* :) Sorry...ahhh..late here...slinks back into hole :) ObContent: I have seen this error before. A number of different causes right up to & including actual disk problems. Has there been any messing with the partition table of late? Can we see an fdisk -l of the relevant disk to see how big the filesystem *should be* :) Rob -- Robert Brockway B.Sc. email: robert@timetraveller.org ICQ: 104781119 Linux counter project ID #16440 (http://counter.li.org) avon: up 16 days, 23:38, 1 user, load average: 0.07, 0.03, 0.01 "The earth is but one country and mankind its citizens" -Baha'u'llah ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-24 14:49 ` Robert Brockway @ 2002-06-24 17:30 ` Kevin 2002-06-25 5:54 ` Oleg Drokin 0 siblings, 1 reply; 25+ messages in thread From: Kevin @ 2002-06-24 17:30 UTC (permalink / raw) To: reiserfs-list > On Mon, 24 Jun 2002, Oleg Drokin wrote: > Can we see an fdisk -l of the relevant disk to see how big the filesystem > *should be* :) I reformatted the partition since the last time it rebooted, but I haven't changed the actual partition. The only time I see the error is when trying to read certain files. Everything else seems to work fine. The drive is a 60g maxtor attached to an hpt366. It is the master on its channel, and there is no slave. I'm running 2.4.18 with reiserfsprogs 3.x.1c-pre4.

fdisk -l /dev/hdi1

Disk /dev/hdi: 16 heads, 63 sectors, 116336 cylinders
Units = cylinders of 1008 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdi1             1    116336  58633312+  83  Linux

df /dev/hdi1

Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hdi1             58631516  39128508  19503008  67% /opt

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-24 17:30 ` Kevin @ 2002-06-25 5:54 ` Oleg Drokin 2002-06-25 6:08 ` Kevin 0 siblings, 1 reply; 25+ messages in thread From: Oleg Drokin @ 2002-06-25 5:54 UTC (permalink / raw) To: Kevin; +Cc: reiserfs-list Hello! On Mon, Jun 24, 2002 at 10:30:45AM -0700, Kevin wrote: > > Can we see an fdisk -l of the relevant disk to see how big the filesystem > > *should be* :) > I reformatted the partition since the last time it rebooted, but I Reformatted as in mkreiserfs? > haven't changed the actual partition. The only time I see the error > is when trying to read certain files. Everything else seems to work Yeah, the ones that contain invalid block numbers in their metadata. > fine. The drive is a 60g maxtor attached to an hpt366. It is the > master on its channel, and there is no slave. I'm running 2.4.18 with > reiserfsprogs 3.x.1c-pre4. > fdisk -l /dev/hdi1 > Disk /dev/hdi: 16 heads, 63 sectors, 116336 cylinders > Units = cylinders of 1008 * 512 bytes > Device Boot Start End Blocks Id System > /dev/hdi1 1 116336 58633312+ 83 Linux > /dev/hdi1 58631516 39128508 19503008 67% /opt The numbers look correct. Bye, Oleg ^ permalink raw reply [flat|nested] 25+ messages in thread
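The comparison behind that conclusion can be spelled out as a small Python sketch. It assumes that want and limit in the kernel message are in the same 1 KiB units as the Blocks column of fdisk; the thread itself only shows that limit equals that column. The function name parse_beyond_eod is hypothetical.

```python
import re

# Detail line of the kernel message as posted at the top of the thread.
MESSAGE = "38:01: rw=0, want=2052028788, limit=58633312"

# "Blocks" column reported by fdisk for /dev/hdi1 (the trailing '+' dropped).
FDISK_BLOCKS = 58633312


def parse_beyond_eod(line):
    """Extract device, rw, want and limit from the detail line."""
    m = re.search(
        r"([0-9a-f]{2}:[0-9a-f]{2}): rw=(\d+), want=(\d+), limit=(\d+)", line
    )
    if m is None:
        raise ValueError("not an 'access beyond end of device' detail line")
    dev, rw, want, limit = m.groups()
    return {"dev": dev, "rw": int(rw), "want": int(want), "limit": int(limit)}


info = parse_beyond_eod(MESSAGE)
print("limit matches partition size:", info["limit"] == FDISK_BLOCKS)
print("blocks past the end:", info["want"] - info["limit"])
print("want is %.1f times the device size" % (info["want"] / info["limit"]))
```

Since limit agrees with the partition table while want is many times larger than the whole device, the request looks like a garbage block number read from metadata rather than a partition that is slightly too small.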
* Re: access beyond end of device again 2002-06-25 5:54 ` Oleg Drokin @ 2002-06-25 6:08 ` Kevin 2002-06-25 6:15 ` Oleg Drokin 0 siblings, 1 reply; 25+ messages in thread From: Kevin @ 2002-06-25 6:08 UTC (permalink / raw) To: Oleg Drokin; +Cc: reiserfs-list > Reformatted as in mkreiserfs? Correct. That's the only change I've made to it though. Would rebuilding the superblocks help? Last time it happened, I rebuilt the tree, and it deleted the files that I had trouble with. So it's hard to say if it really fixed it, or if it was just waiting for me to write to that spot again. I'm about to run the Maxtor diags on the drive and do a factory recertification, just to make sure. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-25 6:08 ` Kevin @ 2002-06-25 6:15 ` Oleg Drokin 2002-06-25 7:46 ` Kevin 0 siblings, 1 reply; 25+ messages in thread From: Oleg Drokin @ 2002-06-25 6:15 UTC (permalink / raw) To: Kevin; +Cc: reiserfs-list Hello! On Wed, Jun 26, 2002 at 11:09:35PM -0700, Kevin wrote: > > Reformatted as in mkreiserfs? > Correct. That's the only change I've made to it though. Would > rebuilding the superblocks help? Last time it happened, I rebuilt the No. The superblock is fine in your case. > tree, and it deleted the files that I had trouble with. So it's hard --fix-fixable can fix such errors (it just zeroes the offending pointers). It would be nice if you first ran reiserfsck --check and showed us the output, though, because you may have other kinds of corruption as well. Bye, Oleg ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-25 6:15 ` Oleg Drokin @ 2002-06-25 7:46 ` Kevin 2002-06-25 7:55 ` Oleg Drokin 0 siblings, 1 reply; 25+ messages in thread From: Kevin @ 2002-06-25 7:46 UTC (permalink / raw) To: Oleg Drokin; +Cc: reiserfs-list > --fix-fixable can fix such errors. (it just zeroes offending pointers), > It would be nice if you first run reiserfsck --check and show us the output, > though. Because you may have other kinds corruptions as well. > Bye, > Oleg I put the log here as it's somewhat long: http://redefine.org/~coggy/hdi.txt Maxtor diags found nothing wrong with the drive itself. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-25 7:46 ` Kevin @ 2002-06-25 7:55 ` Oleg Drokin 2002-06-25 8:07 ` Kevin 0 siblings, 1 reply; 25+ messages in thread From: Oleg Drokin @ 2002-06-25 7:55 UTC (permalink / raw) To: Kevin; +Cc: reiserfs-list Hello! On Thu, Jun 27, 2002 at 12:47:46AM -0700, Kevin wrote: > > --fix-fixable can fix such errors. (it just zeroes offending pointers), > > It would be nice if you first run reiserfsck --check and show us the output, > > though. Because you may have other kinds corruptions as well. > I put the log here as its somewhat long: > http://redefine.org/~coggy/hdi.txt Well, you have some corrupted leaves in conjunction with corrupted unformatted pointers. So reiserfsck --rebuild-tree seems to be needed. Are you sure there is no data corruption when writing to disk? There were reports that VIA chipsets have problems with more than 3 IDE channels being in use simultaneously. I am sure there is even a test suite that triggers these bugs reliably. (You have not told us anything about your motherboard/system, so maybe you do not use a VIA chipset of course, but finding one of those tools that uses several HDDs simultaneously is advisable.) Bye, Oleg ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-25 7:55 ` Oleg Drokin @ 2002-06-25 8:07 ` Kevin 2002-06-25 8:13 ` Oleg Drokin 0 siblings, 1 reply; 25+ messages in thread From: Kevin @ 2002-06-25 8:07 UTC (permalink / raw) To: Oleg Drokin; +Cc: reiserfs-list > Well, you have some corrupted leaves in conjunction with corrupted unformatted > pointers. So reiserfsck --rebuild-tree seems to be needed. > Are you sure there is no data corruption when writing to disk? > There were reports that VIA chipsets have problems with more than 3 IDE > channels being in use simultaneously. > I am sure there is even a test suite that triggers these bugs reliably. > (You have not told us anything about your motherboard/system, so maybe > you do not use a VIA chipset of course, but finding one of those tools > that uses several HDDs simultaneously is advisable.) > Bye, > Oleg It is a 2x400 celeron on an Abit BP6. There are 5 hdd's total, spread across 3 controllers. All the drives are the master on their channel, with no slaves. As far as the testing, do you know of any such tools? It's worth a try. However, when the file that triggered the error was written, the system was not under any stress at all. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-25 8:07 ` Kevin @ 2002-06-25 8:13 ` Oleg Drokin 2002-06-25 8:46 ` Hans Reiser [not found] ` <353485111.20020627013212@redefine.org> 0 siblings, 2 replies; 25+ messages in thread From: Oleg Drokin @ 2002-06-25 8:13 UTC (permalink / raw) To: Kevin; +Cc: reiserfs-list [-- Attachment #1: Type: text/plain, Size: 1260 bytes --] Hello! On Thu, Jun 27, 2002 at 01:08:47AM -0700, Kevin wrote: > > Well, you have some corrupted leaves in conjunction to corrupted unformatted > > pointers. So reiserfsck --rebuild-tree seems to be needed. > > Are you sure there is no data corruption when writing to disk? > > There were reports that VIA chipsets have problems with more than 3 IDE > > channels being in use simultaneously. > > I am sure there is even a test suite that trigger these bugs reliable. > > (you have not told us anything about your motherboard/system so may > > be you do not use VIA chipset of course, but finding one of such tools > > that uses several HDDs simultaneously is advisable). > It is a 2x400 celeron on an Abit BP6. There are 5 hdd's total, spread The Abit BP6 is a particularly bad motherboard, you know. And running celerons in SMP mode is not supported by Intel. > across 3 controllers. All the drives are the master on their channel, > with no slaves. As far as the testing, do you know of any such tools? > It's worth a try. However, when the file that triggered the error was > written, the system was not under any stress at all. E.g. http://www.bit-net.com/~rmiller/dt.html Also take a look at the two messages from lkml that I have attached. 
Bye, Oleg

[-- Attachment #2: m1 --]
[-- Type: text/plain, Size: 6085 bytes --]

From: "P. Breuer" <bPObject@gmx.ch>
Reply-To: <bPObject@axelero.hu>
To: <andre@linux-ide.org>
Cc: <linux-kernel@vger.kernel.org>
Subject: PROBLEM: silent data corruption using HPT370 on an ABIT VP6
Date: Wed, 8 May 2002 03:43:59 +0200
Message-ID: <EGEOJJNFHLHGOKNADENLOEGCCFAA.bPObject@gmx.ch>

1. Silent disk corruption using HPT370 on an ABIT VP6

2. I have tracked down a crooked bug somewhere in the IDE driver leading to a slow and silent data corruption, which is a most alarming threat for the incautious. The case is simple: "cp file1 file2; diff file1 file2" shows differences under certain conditions.

3. Keywords: kernel, driver, ide, data corruption, i386

4. Kernel versions: 2.4.16 or 2.4.18 (error reproducible in both versions)

5. Hardware environment (details see below):
   ABIT VP6 motherboard including:
     dual Pentium III,
     VIA APOLLO PRO chipset
     VIA onboard EIDE controller,
     HPT370 "raid" UDMA/100 controller, integrated on board
   Promise TX2 (PDC) UDMA/100 PCI controller card
   Hard disks (all masters):
     2 x 6GB Quantum Fireball EX6.4A on VIA,
     2 x 40GB Quantum FireballP AS40.0 on PDC,
     2 x 40GB Quantum FireballP AS40.0 on HPT

6. Software environment:
   IDE driver (kernel-integrated)
   raidtools-0.90-5 (optional)
   General:
     four 40GB disks of identical geometry have three partitions each,
     same partitioning, identified by /dev/hd[e,g,i,k][1-3],
     /dev/md[0-2] are three RAID-5 arrays defined on the four disks accordingly
     each out of three raid partitions are formatted ext3 with internal journal

7. ERROR description: Let "file1" be a "large" data file, e.g. 1GB, on a RAID array described above. 
Then "cp file1 file2; cmp -l file1 file2" shows (subtle) differences. There are random differences on several random spots between the files. The "spots" occur usually as blocks of few bytes in succession. The difference is up to several dozens of bytes at a 1GB file copy.

8. Tracking down the error: I have conducted over 100 test cases: the error is consistent, though random. First I excluded an error in the raid software: umount /dev/md[0-2]; raidstop /dev/md[0-2]. I used a script to read all four raw disks concurrently:

    for d in e g i k; do
      (for i in 1 2 3 4 5; do
         dd if=/dev/hd"$d"1 count=2500000 2> /dev/null | md5sum
       done) >> trc"$d".md5sum
    done

I found NO differences in trce.md5sum and trcg.md5sum (both disks are on the Promise controller), but significant differences in trci.md5sum and trck.md5sum, displaying 3 and 5 different read results out of 5 identical reads, resp. (both disks are on the HPT370 controller). Oops!!! I stayed focused on the HPT370 controller, and compiled a small test environment with a single processor motherboard and a HPT370A PCI controller card, which, in addition, has the same HPT BIOS version (1.0.3b1) as the integrated one. I found no problem using this configuration, so the error might well be related only to the SMP architecture.

9. Solution or workaround? I browsed through the HighPoint Software web pages and found a remarkable replacement for the kernel IDE-driver. This is a SCSI IDE emulation module, called hpt37x2.o, that can be built for "any" 2.4.x kernel. And IT WORKS, at least for me, since at least two days ;) The only drawback is, that it is not GPL-d and the complete source is not available. The existence of a working driver is a profound proof for the kernel driver to be in error!

10. Attachments: I have saved several files out of /proc, boot log, etc. from the test period, i.e. by using the faulty driver. They are available upon request. 
Due to the fact, that the HPT driver is not a native IDE-driver, but a SCSI-emulation, it is not possible to switch between booting the old and new kernels very easily. One example, the raid arrays are not recognised from the foreign configuration.

Peter Breuer [P.Breuer@freemail.hu]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

[-- Attachment #3: m2 --]
[-- Type: text/plain, Size: 7189 bytes --]

From: Henning Schroeder <hgs@anna-strasse.de>
To: linux-kernel@vger.kernel.org
Subject: IDE *data corruption* VIA VT8367
Date: Tue, 14 May 2002 19:55:33 +0200
Message-ID: <379487051.20020514195533@anna-strasse.de>

Hello,

I'm not quite sure whether this is a kernel issue, but I can't think of another evildoer :-)

ASUS A7V266-E Mainboard (VT8367 [KT266] Chipset, with VIA IDE and Promise 20265 IDE Controller on board), 4x MAXTOR 6L020J1 (20GB ATA-100) attached at the four ports (resulting in hda, hdc, hde, hdg).

Robin Miller's Data Test Program (dt) from http://www.bit-net.com/~rmiller/dt.html reports data errors on (and only on) hdg when tests are run in parallel. This is especially nasty because I plan to use the drives in a RAID-0 fashion which results in data errors as well.

These combinations give errors: (hda hdc hde hdg), (hdc hde hdg)
These combinations run flawlessly: (hda hdc hde), (hde hdg), (hda hdc hdg).
I did not test more combinations because every test takes some hours.

Attaching hdg as a slave drive to the first promise port (which gives me hdf instead and the second promise port empty) makes the array run fine, but performance drops to a figure comparable to a single drive.

There are no error logs whatsoever (except for the dt output). Without RAID-array and without heavy IDE access, the machine runs stable.

Kernels tested: 2.4.18, 2.4.19pre8

Has anybody seen this before? Any info would be appreciated. I would be happy to provide more information. Diagnostics attached below. 
------- output from dt (this is actually output from testing the raid array) ----------------

Command Line:

    % dt.d/dt of=/data/test limit=1g min=512 max=32k align=rotate procs=15 log=dtlog runtime=12h

    --> Date: June 2nd, Version: 14.10, Author: Robin T. Miller <--

[...]

dt (2150): Error number 1 occurred on Wed May 8 20:16:40 2002
dt (2150): Data compare error at byte 5116 in record number 36
dt (2150): Relative block number where the error occcured is 639 (offset 508)
dt (2150): Data expected = 0xde, data found = 0x33, byte count = 18432
dt (2150): The incorrect data starts at address 0x80b1688 (marked by asterisk '*')
dt (2150): Dumping Pattern Buffer (base = 0x80b1688, offset = 0, limit = 4 bytes):

0x80b1688 *de c6 de c6

dt (2150): The incorrect data starts at address 0x80b33ff (marked by asterisk '*')
dt (2150): Dumping Data Buffer (base = 0x80b2003, offset = 5116, limit = 64 bytes):

0x80b33df  de c6 de c6 de c6 de c6 de c6 de c6 de c6 de c6
0x80b33ef  de c6 de c6 de c6 de c6 de c6 de c6 de c6 de c6
0x80b33ff *33 33 33 33 de c6 de c6 de c6 de c6 de c6 de c6
0x80b340f  de c6 de c6 de c6 de c6 de c6 de c6 de c6 de c6

[...]

dt (2148): Error number 1 occurred on Wed May 8 20:16:42 2002
dt (2148): Data compare error at byte 2044 in record number 857
dt (2148): Relative block number where the error occcured is 27343 (offset 508)
dt (2148): Data expected = 0xff, data found = 0x26, byte count = 12800
dt (2148): The incorrect data starts at address 0x80b1688 (marked by asterisk '*')
dt (2148): Dumping Pattern Buffer (base = 0x80b1688, offset = 0, limit = 4 bytes):

0x80b1688 *ff 00 ff 00

dt (2148): The incorrect data starts at address 0x80b27fc (marked by asterisk '*')
dt (2148): Dumping Data Buffer (base = 0x80b2000, offset = 2044, limit = 64 bytes):

0x80b27dc  ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 ff 00
0x80b27ec  ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 ff 00
0x80b27fc *26 33 67 66 ff 00 ff 00 ff 00 ff 00 ff 00 ff 00
0x80b280c  ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 ff 00

[...]

dt (2160): Error number 1 occurred on Wed May 8 20:16:46 2002
dt (2160): Data compare error at byte 24572 in record number 49
dt (2160): Relative block number where the error occcured is 1223 (offset 508)
dt (2160): Data expected = 0x39, data found = 0xff, byte count = 25088
dt (2160): The incorrect data starts at address 0x80b1688 (marked by asterisk '*')
dt (2160): Dumping Pattern Buffer (base = 0x80b1688, offset = 0, limit = 4 bytes):

0x80b1688 *39 9c c3 39

dt (2160): The incorrect data starts at address 0x80b7ffc (marked by asterisk '*')
dt (2160): Dumping Data Buffer (base = 0x80b2000, offset = 24572, limit = 64 bytes):

0x80b7fdc  39 9c c3 39 39 9c c3 39 39 9c c3 39 39 9c c3 39
0x80b7fec  39 9c c3 39 39 9c c3 39 39 9c c3 39 39 9c c3 39
0x80b7ffc *ff 00 ff 00 39 9c c3 39 39 9c c3 39 39 9c c3 39
0x80b800c  39 9c c3 39 39 9c c3 39 39 9c c3 39 39 9c c3 39

[.... ad nauseaum]

-------------- lspci output ------------

00:00.0 Host bridge: VIA Technologies, Inc. VT8367 [KT266]
00:01.0 PCI bridge: VIA Technologies, Inc. VT8367 [KT266 AGP]
00:06.0 Unknown mass storage controller: Promise Technology, Inc. 20265 (rev 02)
00:0c.0 VGA compatible unclassified device: S3 Inc. 86c864 [Vision 864 DRAM] vers 0
00:0e.0 Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (rev 0c)
00:0f.0 Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (rev 0c)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8233 PCI to ISA Bridge
00:11.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)

--
Best regards, Henning mailto:hgs@anna-strasse.de

^ permalink raw reply [flat|nested] 25+ messages in thread
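The repeated-read check described in the first attached message (read the same region several times and compare checksums) can be sketched in Python as follows. This is a minimal illustration, not the script from that message: the function name read_digests is hypothetical, SHA-256 stands in for md5sum, and the demonstration runs on a scratch file rather than a raw device such as /dev/hdi1. On a real disk the region read should be larger than RAM, otherwise repeat reads may be served from the page cache and never reach the controller.

```python
import hashlib
import os
import tempfile


def read_digests(path, passes=5, length=None, chunk=1 << 20):
    """Read the same region `passes` times and return one digest per pass.

    length limits how many bytes are read per pass (None = to end of file).
    """
    digests = []
    for _ in range(passes):
        h = hashlib.sha256()
        remaining = length
        with open(path, "rb") as f:
            while remaining is None or remaining > 0:
                size = chunk if remaining is None else min(chunk, remaining)
                data = f.read(size)
                if not data:
                    break
                h.update(data)
                if remaining is not None:
                    remaining -= len(data)
        digests.append(h.hexdigest())
    return digests


# Demonstration on a scratch file; pointing `path` at a block device
# would need root and is left to the reader.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(256 * 1024))
    demo_path = tmp.name
demo = read_digests(demo_path, passes=5)
os.unlink(demo_path)

# A healthy read path yields one distinct digest; more than one means
# identical reads returned different data.
print("distinct digests over 5 passes:", len(set(demo)))
```

Any count above one for the same unchanged region is the kind of result the attached report attributes to the controller or driver rather than to the filesystem.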
* Re: access beyond end of device again 2002-06-25 8:13 ` Oleg Drokin @ 2002-06-25 8:46 ` Hans Reiser 2002-06-25 8:58 ` Oleg Drokin [not found] ` <353485111.20020627013212@redefine.org> 1 sibling, 1 reply; 25+ messages in thread From: Hans Reiser @ 2002-06-25 8:46 UTC (permalink / raw) To: Oleg Drokin; +Cc: Kevin, reiserfs-list Oleg Drokin wrote: > > >Abit BP6 is particularly bad motherboard, you know. >And running celerons in SMP mode is not supported by Intel. > All of Namesys used to use BP6s running celerons in SMP mode.... It was a poor motherboard though, as I dimly remember. Not as bad as the current Tyans we use though..... Hans ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-25 8:46 ` Hans Reiser @ 2002-06-25 8:58 ` Oleg Drokin 2002-06-25 9:06 ` Hans Reiser 0 siblings, 1 reply; 25+ messages in thread From: Oleg Drokin @ 2002-06-25 8:58 UTC (permalink / raw) To: Hans Reiser; +Cc: Kevin, reiserfs-list Hello! On Tue, Jun 25, 2002 at 12:46:52PM +0400, Hans Reiser wrote: > >Abit BP6 is particularly bad motherboard, you know. > >And running celerons in SMP mode is not supported by Intel. > All of Namesys used to use BP6s running celerons in SMP mode.... Yes, ask Vitaly about his unpleasant experiences, and logs full of errors. Bye, Oleg ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-25 8:58 ` Oleg Drokin @ 2002-06-25 9:06 ` Hans Reiser 2002-06-25 9:41 ` Oleg Drokin 0 siblings, 1 reply; 25+ messages in thread From: Hans Reiser @ 2002-06-25 9:06 UTC (permalink / raw) To: Oleg Drokin; +Cc: Kevin, reiserfs-list Oleg Drokin wrote: >Hello! > >On Tue, Jun 25, 2002 at 12:46:52PM +0400, Hans Reiser wrote: > > >>>Abit BP6 is particularly bad motherboard, you know. >>>And running celerons in SMP mode is not supported by Intel. >>> >>> >>All of Namesys used to use BP6s running celerons in SMP mode.... >> >> > >Yes, ask Vitaly about his unpleasant experiences, and logs full of errors. > >Bye, > Oleg > > > > How fortunate that it was the fsck specialist who had the controller go bad on him;-)...... -- Hans ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-25 9:06 ` Hans Reiser @ 2002-06-25 9:41 ` Oleg Drokin 0 siblings, 0 replies; 25+ messages in thread From: Oleg Drokin @ 2002-06-25 9:41 UTC (permalink / raw) To: Hans Reiser; +Cc: Kevin, reiserfs-list Hello! On Tue, Jun 25, 2002 at 01:06:09PM +0400, Hans Reiser wrote: > >Yes, ask Vitaly about his unpleasant experiences, and logs full of errors. > How fortunate that it was the fsck specialist who had the controller go > bad on him;-)...... Yes, we already figured that out ;) Bye, Oleg ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <353485111.20020627013212@redefine.org>]
* Re: access beyond end of device again [not found] ` <353485111.20020627013212@redefine.org> @ 2002-06-25 23:22 ` Kevin 2002-06-26 4:53 ` Oleg Drokin 0 siblings, 1 reply; 25+ messages in thread From: Kevin @ 2002-06-25 23:22 UTC (permalink / raw) To: Oleg Drokin; +Cc: reiserfs-list You think moving the drive to a different controller (not hpt) would help? I've got an extra channel on a promise ata66 card that I could use. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-25 23:22 ` Kevin @ 2002-06-26 4:53 ` Oleg Drokin 0 siblings, 0 replies; 25+ messages in thread From: Oleg Drokin @ 2002-06-26 4:53 UTC (permalink / raw) To: Kevin; +Cc: reiserfs-list Hello! On Tue, Jun 25, 2002 at 04:22:42PM -0700, Kevin wrote: > You think moving the drive to a different controller (not hpt) would > help? I've got an extra channel on a promise ata66 card that I could > use. I am not sure, but it seems promise controllers have even more problems with VIA chipsets than HPT controllers. But it is impossible to say until you try that combination and see what happens. Bye, Oleg ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: access beyond end of device again 2002-06-24 14:48 ` Oleg Drokin 2002-06-24 14:49 ` Robert Brockway @ 2002-07-10 18:04 ` Tim Small 1 sibling, 0 replies; 25+ messages in thread From: Tim Small @ 2002-07-10 18:04 UTC (permalink / raw) To: reiserfs-list I've seen this sort of problem before, with a 160G disk. I'd installed a kernel with IDE patches to enable access to all 160G of the device, and created a reiserfs on it. Then one of my colleagues decided to install our 'standard' kernel on it. Sadly, the entire filesystem was turned to cheese, and reiserfsck didn't get much back. However, this looks to be a LONG way beyond the end of the device, but "cat /proc/partitions" might still be worth doing... Tim. Oleg Drokin wrote: >Hello! > >On Tue, Jun 25, 2002 at 12:37:02AM +1000, Robert Brockway wrote: > > >>>I'm getting these errors again: >>> attempt to access beyond end of device >>> 38:01: rw=0, want=2052028788, limit=58633312 >>> >>> >>What sort of device is this supposed to be? :) 38:01 is either a "Myricom >>PCI Myrinet board" or something "reserved for Linux/AP+" (and I'm assuming >>we're talking about a block device rather than a character device here :) >> >> > >Numbers printed are in hex, so this is: > block Fifth IDE hard disk/CD-ROM interface > 0 = /dev/hdi Master: whole disk (or CD-ROM) > 64 = /dev/hdj Slave: whole disk (or CD-ROM) > > Partitions are handled the same way as for the first > interface (see major number 3). > >Bye, > Oleg > > ^ permalink raw reply [flat|nested] 25+ messages in thread
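The decoding discussed above (device numbers printed in hex, and a "want" value far past "limit") can be sketched in a few lines. This is only an illustration, not anything posted to the list: the function names `decode_detail` and `ide_device_name` are hypothetical, only the major number mentioned in the thread is mapped, and the layout of 64 minors per drive is taken from the device-list excerpt quoted in the message. The units of "want" and "limit" are left as the kernel printed them, so only their ratio is computed.

```python
import re

# Matches the detail line of the kernel message quoted in this thread, e.g.
#   38:01: rw=0, want=2052028788, limit=58633312
DETAIL_RE = re.compile(
    r"(?P<major>[0-9a-fA-F]+):(?P<minor>[0-9a-fA-F]+):\s*"
    r"rw=(?P<rw>\d+),\s*want=(?P<want>\d+),\s*limit=(?P<limit>\d+)"
)

# Assumption: only the major number discussed in the thread is mapped.
# 0x38 (decimal 56) is the fifth IDE interface: hdi (master), hdj (slave).
IDE_MAJORS = {56: ("hdi", "hdj")}
# Per the quoted device list: minor 0 = hdi, minor 64 = hdj.
MINORS_PER_DRIVE = 64


def decode_detail(line):
    """Parse the 'major:minor: rw=..., want=..., limit=...' line.

    major and minor are read as hexadecimal; want and limit as decimal.
    """
    m = DETAIL_RE.search(line)
    if m is None:
        raise ValueError("no 'major:minor: rw=..., want=..., limit=...' found")
    want = int(m.group("want"))
    limit = int(m.group("limit"))
    return {
        "major": int(m.group("major"), 16),
        "minor": int(m.group("minor"), 16),
        "rw": int(m.group("rw")),
        "want": want,
        "limit": limit,
        # How many times past the end of the device the request lies.
        "overshoot": want / limit if limit else float("inf"),
    }


def ide_device_name(major, minor):
    """Map a (major, minor) pair to a /dev name for the majors in IDE_MAJORS."""
    if major not in IDE_MAJORS:
        raise KeyError("major %d is not in this sketch's table" % major)
    drive, partition = divmod(minor, MINORS_PER_DRIVE)
    names = IDE_MAJORS[major]
    if drive >= len(names):
        raise ValueError("minor %d is out of range for major %d" % (minor, major))
    name = "/dev/" + names[drive]
    return name if partition == 0 else name + str(partition)


if __name__ == "__main__":
    info = decode_detail("38:01: rw=0, want=2052028788, limit=58633312")
    print(ide_device_name(info["major"], info["minor"]))
    print("request is about %.1fx the device limit" % info["overshoot"])
```

For the message in this thread the request lands at roughly 35 times the reported limit, which fits the observation that it is a long way beyond the end of the device, more than a partition-size mismatch alone would be expected to produce.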