linux-kernel.vger.kernel.org archive mirror
* [KEXEC][2.5.69] kexec for 2.5.69 available
@ 2003-05-09 19:04 Andy Pfiffer
  2003-05-09 20:04 ` ext3/lilo/2.5.6[89] (was: [KEXEC][2.5.69] kexec for 2.5.69 available) Christophe Saout
  0 siblings, 1 reply; 28+ messages in thread
From: Andy Pfiffer @ 2003-05-09 19:04 UTC (permalink / raw)
  To: Eric Biederman; +Cc: fastboot, linux-kernel, Suparna Bhattacharya

Eric,

I have a patch set available for kexec for 2.5.69.  I had an unrelated
delay in posting this due to some strange behavior of late with LILO and
my ext3-mounted /boot partition (/sbin/lilo would say that it updated,
but a subsequent reboot would not include my new kernel).

The patches are available for download from OSDL's patch lifecycle
manager ( http://www.osdl.org/cgi-bin/plm/ ).

Patch stack for kexec for 2.5.69:

        kexec base for 2.5.69:
        http://www.osdl.org/cgi-bin/plm?module=patch_info&patch_id=1828

        kexec hwfixes for 2.5.69:
        http://www.osdl.org/cgi-bin/plm?module=patch_info&patch_id=1829

        kexec usemm change (allowed 2-way to work for me):
        http://www.osdl.org/cgi-bin/plm?module=patch_info&patch_id=1830

        optional change to defconfig (sets CONFIG_KEXEC=y):
        http://www.osdl.org/cgi-bin/plm?module=patch_info&patch_id=1831

The patches are also available (with matching kexec-tools-1.8) from this
link pending a crontab update:
http://www.osdl.org/archive/andyp/kexec/2.5.69/

Andrew Morton's tree now also contains kexec, and you can pick up his
patch here:
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.69/

I'll put together another release area for a matching kexec-tools for
-mm trees (different kexec syscall number between 2.5.* and 2.5.*-mm*)
as soon as I get -mm trees built and booted on my kexec test machines.


Regards,
Andy

To All: if you try kexec, a quick reply of success or failure to
fastboot@osdl.org would be appreciated.  If it doesn't work for you,
please include the output of lspci in your email.

Kexec has worked for me on these systems:

    single P3-800MHz, 640MB:
        00:00.0 Host bridge: ServerWorks CNB20LE Host Bridge (rev 06)
        00:00.1 Host bridge: ServerWorks CNB20LE Host Bridge (rev 06)
        00:01.0 VGA compatible controller: S3 Inc. Savage 4 (rev 04)
        00:09.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
        00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 50)
        00:0f.1 IDE interface: ServerWorks OSB4 IDE Controller
        00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 04)
        01:03.0 SCSI storage controller: Adaptec AIC-7892P U160/m (rev 02)
        
    dual P3-866MHz, 256MB:
        00:00.0 Host bridge: ServerWorks CNB20LE Host Bridge (rev 05)
        00:00.1 Host bridge: ServerWorks CNB20LE Host Bridge (rev 05)
        00:02.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
        00:04.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
        00:07.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 65)
        00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 4f)
        00:0f.1 IDE interface: ServerWorks OSB4 IDE Controller
        00:0f.2 USB Controller: ServerWorks OSB4/CSB5 USB Controller (rev 04)
        01:05.0 SCSI storage controller: LSI Logic / Symbios Logic 53c896 (rev 07)
        01:05.1 SCSI storage controller: LSI Logic / Symbios Logic 53c896 (rev 07)

    dual P4-1.7GHz Xeon, 512MB:
        00:00.0 Host bridge: Intel Corp. 82850 860 (Wombat) Chipset Host Bridge (MCH) (rev 04)
        00:01.0 PCI bridge: Intel Corp. 82850/82860 850/860 (Tehama/Wombat) Chipset AGP Bridge (rev 04)
        00:02.0 PCI bridge: Intel Corp. 82860 860 (Wombat) Chipset PCI Bridge (rev 04)
        00:1e.0 PCI bridge: Intel Corp. 82801BA/CA PCI Bridge (rev 04)
        00:1f.0 ISA bridge: Intel Corp. 82801BA ISA Bridge (LPC) (rev 04)
        00:1f.1 IDE interface: Intel Corp. 82801BA IDE U100 (rev 04)
        00:1f.2 USB Controller: Intel Corp. 82801BA/BAM USB (Hub  (rev 04)
        00:1f.3 SMBus: Intel Corp. 82801BA/BAM SMBus (rev 04)
        00:1f.5 Multimedia audio controller: Intel Corp. 82801BA/BAM AC'97 Audio (rev 04)
        01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev 85)
        02:1f.0 PCI bridge: Intel Corp. 82806AA PCI64 Hub PCI Bridge (rev 03)
        03:00.0 PIC: Intel Corp. 82806AA PCI64 Hub Advanced Programmable Interrupt Controller (rev 01)
        04:04.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 0c)







* ext3/lilo/2.5.6[89] (was: [KEXEC][2.5.69] kexec for 2.5.69 available)
  2003-05-09 19:04 [KEXEC][2.5.69] kexec for 2.5.69 available Andy Pfiffer
@ 2003-05-09 20:04 ` Christophe Saout
  2003-05-09 20:55   ` Andy Pfiffer
  0 siblings, 1 reply; 28+ messages in thread
From: Christophe Saout @ 2003-05-09 20:04 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andy Pfiffer

On Fri, 2003-05-09 at 21:04, Andy Pfiffer wrote:

> [...]
>  I had an unrelated
> delay in posting this due to some strange behavior of late with LILO and
> my ext3-mounted /boot partition (/sbin/lilo would say that it updated,
> but a subsequent reboot would not include my new kernel)

So I'm not the only one having this problem... I think I first saw this
with 2.5.68 but I'm not sure.

My boot partition is a small ext3 partition on an LVM2 volume accessed
through device-mapper (I've written a lilo patch for that, and the patch
is working), but I don't think that has anything to do with the
problem.

If I sync, unmount and wait some time after running lilo, the changes
sometimes seem to be written to disk correctly; I don't know exactly
when.

Could it be that the location of /boot/map is not written to the
partition sector of /dev/hda? Or not flushed correctly or something?

After reboot the old kernel came up again (though it was moved to
vmlinuz.old).

-- 
Christophe Saout <christophe@saout.de>



* RE: ext3/lilo/2.5.6[89] (was: [KEXEC][2.5.69] kexec for 2.5.69available)
  2003-05-09 20:55   ` Andy Pfiffer
@ 2003-05-09 20:46     ` Riley Williams
  2003-05-09 22:39       ` Joe Korty
  2003-05-09 23:39       ` Andy Pfiffer
  2003-06-11 22:08     ` ext[23]/lilo/2.5.{68,69,70} -- IDE Problem? Andy Pfiffer
  1 sibling, 2 replies; 28+ messages in thread
From: Riley Williams @ 2003-05-09 20:46 UTC (permalink / raw)
  To: Andy Pfiffer, Christophe Saout; +Cc: linux-kernel

Hi Andy, Christophe.

 >>> I had an unrelated delay in posting this due to some strange
 >>> behavior of late with LILO and my ext3-mounted /boot partition
 >>> (/sbin/lilo would say that it updated, but a subsequent reboot
 >>> would not include my new kernel)

 >> So I'm not the only one having this problem... I think I first
 >> saw this with 2.5.68 but I'm not sure.

 > Well, that makes two of us for sure.

 >> My boot partition is a small ext3 partition on a lvm2 volume
 >> accessed over device-mapper (I've written a lilo patch for
 >> that, but the patch is working and) but I don't think that has
 >> something to do with the problem.
 >>
 >> When syncing, unmounting and waiting some time after running
 >> lilo, the changes sometimes seem correctly written to disk, I
 >> don't know when exactly.
 >
 > My /boot is an ext3 partition on an IDE disk. My symptoms and
 > your symptoms match -- wait awhile, and it works okay. If you
 > don't wait "long enough" the changes made in /etc/lilo.conf are
 > not reflected in the after running /sbin/lilo and rebooting
 > normally.

One suggestion: ext3 is a journalled version of ext2, so if you can
boot with whatever is needed to specify that the boot partition is
to be mounted as ext2 rather than ext3, you can isolate the journal
system: If the problem's still there in ext2 then the journal is
not involved, but if the problem vanishes there, it's something to
do with the journal.

I have to admit that the above sounds very much like the details
are being recorded in the journal, but the journal isn't being
played back to update the actual files.

Best wishes from Riley.
---
 * Nothing as pretty as a smile, nothing as ugly as a frown.




* Re: ext3/lilo/2.5.6[89] (was: [KEXEC][2.5.69] kexec for 2.5.69 available)
  2003-05-09 20:04 ` ext3/lilo/2.5.6[89] (was: [KEXEC][2.5.69] kexec for 2.5.69 available) Christophe Saout
@ 2003-05-09 20:55   ` Andy Pfiffer
  2003-05-09 20:46     ` ext3/lilo/2.5.6[89] (was: [KEXEC][2.5.69] kexec for 2.5.69available) Riley Williams
  2003-06-11 22:08     ` ext[23]/lilo/2.5.{68,69,70} -- IDE Problem? Andy Pfiffer
  0 siblings, 2 replies; 28+ messages in thread
From: Andy Pfiffer @ 2003-05-09 20:55 UTC (permalink / raw)
  To: Christophe Saout; +Cc: linux-kernel

On Fri, 2003-05-09 at 13:04, Christophe Saout wrote:
> On Fri, 2003-05-09 at 21:04, Andy Pfiffer wrote:
> 
> > [...]
> >  I had an unrelated
> > delay in posting this due to some strange behavior of late with LILO and
> > my ext3-mounted /boot partition (/sbin/lilo would say that it updated,
> > but a subsequent reboot would not include my new kernel)
> 
> So I'm not the only one having this problem... I think I first saw this
> with 2.5.68 but I'm not sure.

Well, that makes two of us for sure.

> 
> My boot partition is a small ext3 partition on a lvm2 volume accessed
> over device-mapper (I've written a lilo patch for that, but the patch is
> working and) but I don't think that has something to do with the
> problem.
> 
> When syncing, unmounting and waiting some time after running lilo, the
> changes sometimes seem correctly written to disk, I don't know when
> exactly.

My /boot is an ext3 partition on an IDE disk.  My symptoms and your
symptoms match -- wait awhile, and it works okay.  If you don't wait
"long enough", the changes made in /etc/lilo.conf are not reflected
after running /sbin/lilo and rebooting normally.

I have been unable to reproduce this on a uniproc system with SCSI
disks.

2.5.67 seems to work in this regard as expected.

> Could it be that the location of /boot/map is not written to the
> partition sector of /dev/hda? Or not flushed correctly or something?
> 
> After reboot the old kernel came up again (though it was moved to
> vmlinuz.old).

I don't know -- I haven't isolated it yet.

Anyone else?





* Re: ext3/lilo/2.5.6[89] (was: [KEXEC][2.5.69] kexec for 2.5.69available)
  2003-05-09 20:46     ` ext3/lilo/2.5.6[89] (was: [KEXEC][2.5.69] kexec for 2.5.69available) Riley Williams
@ 2003-05-09 22:39       ` Joe Korty
  2003-05-09 23:39       ` Andy Pfiffer
  1 sibling, 0 replies; 28+ messages in thread
From: Joe Korty @ 2003-05-09 22:39 UTC (permalink / raw)
  To: Riley Williams; +Cc: Andy Pfiffer, Christophe Saout, linux-kernel

> One suggestion: ext3 is a journalled version of ext2, so if you can
> boot with whatever is needed to specify that the boot partition is
> to be mounted as ext2 rather than ext3, you can isolate the journal
> system: If the problem's still there in ext2 then the journal is
> not involved, but if the problem vanishes there, it's something to
> do with the journal.
> 
> I have to admit that the above sounds very much like the details
> are being recorded in the journal, but the journal isn't being
> played back to update the actual files.

I recall reading on lkml once that an ext3 sync(2) merely pushes volatile
data/metadata out to the journal rather than to the files themselves.

Joe


* RE: ext3/lilo/2.5.6[89] (was: [KEXEC][2.5.69] kexec for 2.5.69available)
  2003-05-09 20:46     ` ext3/lilo/2.5.6[89] (was: [KEXEC][2.5.69] kexec for 2.5.69available) Riley Williams
  2003-05-09 22:39       ` Joe Korty
@ 2003-05-09 23:39       ` Andy Pfiffer
  1 sibling, 0 replies; 28+ messages in thread
From: Andy Pfiffer @ 2003-05-09 23:39 UTC (permalink / raw)
  To: Riley Williams; +Cc: Christophe Saout, linux-kernel

On Fri, 2003-05-09 at 13:46, Riley Williams wrote:
> Hi Andy, Christophe.
> 
>  >>> I had an unrelated delay in posting this due to some strange
>  >>> behavior of late with LILO and my ext3-mounted /boot partition
>  >>> (/sbin/lilo would say that it updated, but a subsequent reboot
>  >>> would not include my new kernel)
> 
>  >> So I'm not the only one having this problem... I think I first
>  >> saw this with 2.5.68 but I'm not sure.
> 
>  > Well, that makes two of us for sure.
> 
>  >> My boot partition is a small ext3 partition on a lvm2 volume
>  >> accessed over device-mapper (I've written a lilo patch for
>  >> that, but the patch is working and) but I don't think that has
>  >> something to do with the problem.
>  >>
>  >> When syncing, unmounting and waiting some time after running
>  >> lilo, the changes sometimes seem correctly written to disk, I
>  >> don't know when exactly.
>  >
>  > My /boot is an ext3 partition on an IDE disk. My symptoms and
>  > your symptoms match -- wait awhile, and it works okay. If you
>  > don't wait "long enough" the changes made in /etc/lilo.conf are
>  > not reflected in the after running /sbin/lilo and rebooting
>  > normally.
> 
> One suggestion: ext3 is a journalled version of ext2, so if you can
> boot with whatever is needed to specify that the boot partition is
> to be mounted as ext2 rather than ext3, you can isolate the journal
> system: If the problem's still there in ext2 then the journal is
> not involved, but if the problem vanishes there, it's something to
> do with the journal.

Changing the "ext3" to "ext2" in /etc/fstab and rebooting did not change
the behavior (ie, edit /etc/lilo.conf, run /sbin/lilo, reboot cleanly,
changes not there).  I did see the warning about mounting an ext3
filesystem as ext2, however.
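
For anyone repeating this test, it may be worth confirming that /boot
really came up as ext2 after the fstab change. A minimal sketch, assuming
the mount table is read from /proc/mounts into a string first; the
function name fstype_for_mountpoint is made up for illustration:

```c
#include <stdio.h>
#include <string.h>

/*
 * Find the filesystem type for `mountpoint` in text laid out like
 * /proc/mounts ("device mountpoint fstype options dump pass" per line).
 * The last matching line wins, since later mounts shadow earlier ones.
 * Returns 0 and fills `out` on success, -1 if the mount point is absent.
 */
int fstype_for_mountpoint(const char *text, const char *mountpoint,
                          char *out, size_t outlen)
{
    int found = -1;
    const char *line = text;

    while (*line) {
        size_t len = strcspn(line, "\n");
        char buf[768], dev[256], dir[256], type[64];

        /* Parse one line at a time so fields never spill across lines. */
        if (len < sizeof(buf)) {
            memcpy(buf, line, len);
            buf[len] = '\0';
            if (sscanf(buf, "%255s %255s %63s", dev, dir, type) == 3 &&
                strcmp(dir, mountpoint) == 0) {
                snprintf(out, outlen, "%s", type);
                found = 0;
            }
        }
        line += len;
        if (*line == '\n')
            line++;
    }
    return found;
}
```

Called with the contents of /proc/mounts and "/boot", this should report
"ext2" if the change took effect and "ext3" if the old entry is still in
use.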

Strange.





* Re: ext[23]/lilo/2.5.{68,69,70} -- IDE Problem?
  2003-05-09 20:55   ` Andy Pfiffer
  2003-05-09 20:46     ` ext3/lilo/2.5.6[89] (was: [KEXEC][2.5.69] kexec for 2.5.69available) Riley Williams
@ 2003-06-11 22:08     ` Andy Pfiffer
  2003-06-11 23:21       ` Christophe Saout
  1 sibling, 1 reply; 28+ messages in thread
From: Andy Pfiffer @ 2003-06-11 22:08 UTC (permalink / raw)
  To: Christophe Saout; +Cc: linux-kernel

On Fri, 2003-05-09 at 13:55, Andy Pfiffer wrote:
> On Fri, 2003-05-09 at 13:04, Christophe Saout wrote:
> > > On Fri, 2003-05-09 at 21:04, Andy Pfiffer wrote:
> > 
> > > [...]
> > >  I had an unrelated
> > > delay in posting this due to some strange behavior of late with LILO and
> > > my ext3-mounted /boot partition (/sbin/lilo would say that it updated,
> > > but a subsequent reboot would not include my new kernel)
> > 
> > So I'm not the only one having this problem... I think I first saw this
> > with 2.5.68 but I'm not sure.
> 
> Well, that makes two of us for sure.
> 
> > 
> > My boot partition is a small ext3 partition on a lvm2 volume accessed
> > over device-mapper (I've written a lilo patch for that, but the patch is
> > working and) but I don't think that has something to do with the
> > problem.
> > 
> > When syncing, unmounting and waiting some time after running lilo, the
> > changes sometimes seem correctly written to disk, I don't know when
> > exactly.
> 
> My /boot is an ext3 partition on an IDE disk.  My symptoms and your
> symptoms match -- wait awhile, and it works okay.  If you don't wait
> "long enough" the changes made in /etc/lilo.conf are not reflected in
> the after running /sbin/lilo and rebooting normally.
> 
> I have been unable to reproduce this on a uniproc system with SCSI
> disks.
> 
> 2.5.67 seems to work in this regard as expected.
> 
> > Could it be that the location of /boot/map is not written to the
> > partition sector of /dev/hda? Or not flushed correctly or something?
> > 
> > After reboot the old kernel came up again (though it was moved to
> > vmlinuz.old).
> 
> I don't know -- I haven't isolated it yet.
> 
> Anyone else?

I have taken another look at this, and can confirm the following:

1. 2.5.67 works as expected.
2. 2.5.68, 2.5.69, and 2.5.70 do not.
3. ext2 vs. ext3 for /boot: no effect (ie, .68, .69, .70 demonstrate the
problem independent of the filesystem used for /boot).

Relative to a 2.5.68 pure BK tree, the deltas from 2.5.67 to 2.5.68 are:
1.971.76.10	/* 2.5.67 */
1.1124		/* 2.5.68 */

The patch exported by BK between these 2 revs is 297K lines ( a sizeable
haystack ).  Any ideas about where I should dig for my needle first
would be welcomed...


Gory details about my hardware & software follow...

% lilo -v
LILO version 22.1, Copyright (C) 1992-1998 Werner Almesberger
Development beyond version 21 Copyright (C) 1999-2001 John Coffman
Released 31-Oct-2001 and compiled at 20:50:13 on Mar 25 2002.
MAX_IMAGES = 27


CPUs:
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 1
model name	: Intel(R) Xeon(TM) CPU 1.70GHz
stepping	: 2
cpu MHz		: 1685.926
cache size	: 256 KB
physical id	: 0
siblings	: 1
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips	: 3317.76

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 15
model		: 0
model name	: Intel(R) Xeon(TM) CPU 1700MHz
stepping	: 10
cpu MHz		: 1685.926
cache size	: 256 KB
physical id	: 0
siblings	: 1
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips	: 3366.91


Two IDE hard drives (I haven't cracked the case to identify the
manufacturer):

/dev/hda:
 HDIO_GETGEO_BIG failed: Inappropriate ioctl for device

 Model=CI530L04VARE700-                        , FwRev=REO44AA5,
SerialNo=        S PXXTYH2351
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=40
 BuffType=DualPortCache, BuffSize=1916kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=-66060037, LBA=yes, LBAsects=78156288
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4 
 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 
 AdvancedPM=yes: disabled (255)
 Drive Supports : ATA/ATAPI-5 T13 1321D revision 1 : ATA-2 ATA-3 ATA-4
ATA-5

/dev/hdb:
 HDIO_GETGEO_BIG failed: Inappropriate ioctl for device

 Model=CI530L02VARE700-                        , FwRev=REO24AA5,
SerialNo=        S PVVTFT0B17
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=40
 BuffType=DualPortCache, BuffSize=1916kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=-66060037, LBA=yes, LBAsects=39876480
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4 
 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 
 AdvancedPM=yes: disabled (255)
 Drive Supports : ATA/ATAPI-5 T13 1321D revision 1 : ATA-2 ATA-3 ATA-4
ATA-5

The PCI hardware on this system:
00:00.0 Host bridge: Intel Corp. 82850 860 (Wombat) Chipset Host Bridge
(MCH) (rev 04)
	Subsystem: IBM: Unknown device 2531
	Flags: bus master, fast devsel, latency 0
	Memory at f0000000 (32-bit, prefetchable) [size=64M]
	Capabilities: [a0] AGP version 2.0

00:01.0 PCI bridge: Intel Corp. 82850/82860 850/860 (Tehama/Wombat)
Chipset AGP Bridge (rev 04) (prog-if 00 [Normal decode])
	Flags: bus master, 66Mhz, fast devsel, latency 64
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
	Memory behind bridge: f6000000-f8ffffff
	Prefetchable memory behind bridge: f4000000-f5ffffff

00:02.0 PCI bridge: Intel Corp. 82860 860 (Wombat) Chipset PCI Bridge
(rev 04) (prog-if 00 [Normal decode])
	Flags: bus master, 66Mhz, fast devsel, latency 32
	Bus: primary=00, secondary=02, subordinate=03, sec-latency=0
	Memory behind bridge: fb000000-fb0fffff

00:1e.0 PCI bridge: Intel Corp. 82801BA/CA PCI Bridge (rev 04) (prog-if
00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=04, subordinate=04, sec-latency=32
	I/O behind bridge: 0000c000-0000cfff
	Memory behind bridge: f9000000-faffffff

00:1f.0 ISA bridge: Intel Corp. 82801BA ISA Bridge (LPC) (rev 04)
	Flags: bus master, medium devsel, latency 0

00:1f.1 IDE interface: Intel Corp. 82801BA IDE U100 (rev 04) (prog-if 80
[Master])
	Subsystem: IBM: Unknown device 2442
	Flags: bus master, medium devsel, latency 0
	I/O ports at f000 [size=16]

00:1f.2 USB Controller: Intel Corp. 82801BA/BAM USB (Hub  (rev 04)
(prog-if 00 [UHCI])
	Subsystem: IBM: Unknown device 2442
	Flags: bus master, medium devsel, latency 0, IRQ 19
	I/O ports at d000 [size=32]

00:1f.3 SMBus: Intel Corp. 82801BA/BAM SMBus (rev 04)
	Subsystem: IBM: Unknown device 2442
	Flags: medium devsel, IRQ 17
	I/O ports at 5000 [size=16]

00:1f.5 Multimedia audio controller: Intel Corp. 82801BA/BAM AC'97 Audio
(rev 04)
	Subsystem: IBM: Unknown device 0224
	Flags: bus master, medium devsel, latency 0, IRQ 17
	I/O ports at d800 [size=256]
	I/O ports at dc00 [size=64]

01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP
(rev 85) (prog-if 00 [VGA])
	Subsystem: Matrox Graphics, Inc. Millennium G450 Dual Head
	Flags: bus master, medium devsel, latency 64, IRQ 22
	Memory at f4000000 (32-bit, prefetchable) [size=32M]
	Memory at f6000000 (32-bit, non-prefetchable) [size=16K]
	Memory at f7000000 (32-bit, non-prefetchable) [size=8M]
	Expansion ROM at <unassigned> [disabled] [size=128K]
	Capabilities: [dc] Power Management version 2
	Capabilities: [f0] AGP version 2.0

02:1f.0 PCI bridge: Intel Corp. 82806AA PCI64 Hub PCI Bridge (rev 03)
(prog-if 00 [Normal decode])
	Flags: bus master, 66Mhz, fast devsel, latency 0
	Bus: primary=02, secondary=03, subordinate=03, sec-latency=32
	Memory behind bridge: fb000000-fb0fffff

03:00.0 PIC: Intel Corp. 82806AA PCI64 Hub Advanced Programmable
Interrupt Controller (rev 01) (prog-if 20 [IO(X)-APIC])
	Subsystem: Intel Corp. 82806AA PCI64 Hub Advanced Programmable
Interrupt Controller
	Flags: bus master, fast devsel, latency 0
	Memory at fb000000 (32-bit, non-prefetchable) [size=4K]

04:04.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100]
(rev 0c)
	Subsystem: IBM: Unknown device 0207
	Flags: bus master, medium devsel, latency 32, IRQ 16
	Memory at fa020000 (32-bit, non-prefetchable) [size=4K]
	I/O ports at c000 [size=64]
	Memory at fa000000 (32-bit, non-prefetchable) [size=128K]
	Expansion ROM at <unassigned> [disabled] [size=64K]
	Capabilities: [dc] Power Management version 2







* Re: ext[23]/lilo/2.5.{68,69,70} -- IDE Problem?
  2003-06-11 22:08     ` ext[23]/lilo/2.5.{68,69,70} -- IDE Problem? Andy Pfiffer
@ 2003-06-11 23:21       ` Christophe Saout
  2003-06-11 23:40         ` Bartlomiej Zolnierkiewicz
  2003-06-12  0:20         ` ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem? Andy Pfiffer
  0 siblings, 2 replies; 28+ messages in thread
From: Christophe Saout @ 2003-06-11 23:21 UTC (permalink / raw)
  To: Andy Pfiffer; +Cc: linux-kernel

On Thu, 2003-06-12 at 00:08, Andy Pfiffer wrote:
> On Fri, 2003-05-09 at 13:55, Andy Pfiffer wrote:
> > On Fri, 2003-05-09 at 13:04, Christophe Saout wrote:
> > > > On Fri, 2003-05-09 at 21:04, Andy Pfiffer wrote:
> > > 
> > > > [...]
> > > >  I had an unrelated
> > > > delay in posting this due to some strange behavior of late with LILO and
> > > > my ext3-mounted /boot partition (/sbin/lilo would say that it updated,
> > > > but a subsequent reboot would not include my new kernel)
> > > 
> > > So I'm not the only one having this problem... I think I first saw this
> > > with 2.5.68 but I'm not sure.
> > 
> > Well, that makes two of us for sure.
> > 
> > > 
> > > My boot partition is a small ext3 partition on a lvm2 volume accessed
> > > over device-mapper (I've written a lilo patch for that, but the patch is
> > > working and) but I don't think that has something to do with the
> > > problem.
> > > 
> > > When syncing, unmounting and waiting some time after running lilo, the
> > > changes sometimes seem correctly written to disk, I don't know when
> > > exactly.
> > 
> > My /boot is an ext3 partition on an IDE disk.  My symptoms and your
> > symptoms match -- wait awhile, and it works okay.  If you don't wait
> > "long enough" the changes made in /etc/lilo.conf are not reflected in
> > the after running /sbin/lilo and rebooting normally.
> > 
> > I have been unable to reproduce this on a uniproc system with SCSI
> > disks.
> > 
> > 2.5.67 seems to work in this regard as expected.
> > 
> > > Could it be that the location of /boot/map is not written to the
> > > partition sector of /dev/hda? Or not flushed correctly or something?
> > > 
> > > After reboot the old kernel came up again (though it was moved to
> > > vmlinuz.old).
> > 
> > I don't know -- I haven't isolated it yet.
> > 
> > Anyone else?
> 
> I have taken another look at this, and can confirm the following:
> 
> 1. 2.5.67 works as expected.
> 2. 2.5.68, 2.5.69, and 2.5.70 do not.
> 3. ext2 vs. ext3 for /boot: no effect (ie, .68, .69, .70 demonstrate the
> problem independent of the filesystem used for /boot).

I found out that flushb /dev/<boot_device> helps, syncing doesn't. Not
100% sure if that's right, because right now I'm always doing both, but
I remember having only synced before and that didn't help.
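
For reference, a per-device flush of this kind can be sketched in a few
lines of C. This assumes flushb boils down to the BLKFLSBUF ioctl (the
same request that blockdev --flushbufs issues); I have not checked the
flushb source, and the function name flush_blockdev is made up:

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>      /* BLKFLSBUF */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Ask the kernel to write out and drop the buffers it has cached for one
 * block device. Returns 0 on success, -1 on failure with errno describing
 * the failing open() or ioctl(). Usually needs root.
 */
int flush_blockdev(const char *devpath)
{
    int fd = open(devpath, O_RDONLY);
    int ret, saved;

    if (fd < 0)
        return -1;

    ret = ioctl(fd, BLKFLSBUF, 0);
    saved = errno;          /* close() may overwrite errno */
    close(fd);
    errno = saved;

    return ret < 0 ? -1 : 0;
}
```

Unlike sync(2), this targets a single device, which may explain why it
behaves differently here.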

> Relative to a 2.5.68 pure BK tree, the deltas from 2.5.67 to 2.5.68 are:
> 1.971.76.10	/* 2.5.67 */
> 1.1124		/* 2.5.68 */
> 
> The patch exported by BK between these 2 revs is 297K lines ( a sizeable
> haystack ).  Any ideas about where I should dig for my needle first
> would be welcomed...

There don't seem to be many changes in /drivers/block or /fs, mostly
cleanups. I personally have no idea where to start, except trying out
each -bk version in between. Hmmm. And I'm not going to do that now...
:-/
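
Trying the snapshots in between does not have to be linear. A minimal
sketch of a bisection over them, assuming the snapshots are ordered, the
first is known good, the last is known bad, and the bug never goes away
once introduced. Both function names are hypothetical, and
bad_from_threshold only stands in for the real test (build, boot, run
lilo, reboot):

```c
#include <stddef.h>

/* Returns nonzero if snapshot `idx` shows the bug. */
typedef int (*is_bad_fn)(size_t idx, void *ctx);

/*
 * Snapshots 0..n-1 are in chronological order, snapshot 0 is known good
 * and snapshot n-1 is known bad (so n must be at least 2).
 * Returns the index of the first bad snapshot.
 */
size_t first_bad_snapshot(size_t n, is_bad_fn is_bad, void *ctx)
{
    size_t good = 0, bad = n - 1;

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (is_bad(mid, ctx))
            bad = mid;      /* bug already present: look earlier */
        else
            good = mid;     /* still fine: look later */
    }
    return bad;
}

/* Stand-in predicate for demonstration: bad from a fixed index onward. */
int bad_from_threshold(size_t idx, void *ctx)
{
    return idx >= *(size_t *)ctx;
}
```

With n snapshots this needs roughly log2(n) build-and-boot cycles instead
of n.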

-- 
Christophe Saout <christophe@saout.de>



* Re: ext[23]/lilo/2.5.{68,69,70} -- IDE Problem?
  2003-06-11 23:21       ` Christophe Saout
@ 2003-06-11 23:40         ` Bartlomiej Zolnierkiewicz
  2003-06-12  0:20         ` ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem? Andy Pfiffer
  1 sibling, 0 replies; 28+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2003-06-11 23:40 UTC (permalink / raw)
  To: Christophe Saout; +Cc: Andy Pfiffer, linux-kernel


mm/msync.c:
<...>
 * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
<...>

You can revert changes to mm/msync.c from 2.5.68 patch
and see whether it helps.
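
To illustrate the distinction, here is a minimal userspace sketch that
writes through a shared mapping and then waits for writeback with
MS_SYNC. Whether lilo actually goes through msync() at all is an
assumption I have not verified; the function name write_mapped_sync is
made up:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Make `path` exactly `len` bytes long, fill it through a shared mapping
 * and wait for the data to be written back.
 * Returns 0 on success, -1 on error.
 */
int write_mapped_sync(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    void *map;
    int ret = -1;

    if (fd < 0)
        return -1;
    if (len == 0 || ftruncate(fd, (off_t)len) < 0)
        goto out_close;

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out_close;

    memcpy(map, data, len);

    /*
     * MS_SYNC blocks until the dirty pages have been written.
     * MS_ASYNC would return at once and, per the comment quoted above,
     * no longer starts I/O as of 2.5.68.
     */
    if (msync(map, len, MS_SYNC) == 0)
        ret = 0;

    munmap(map, len);
out_close:
    close(fd);
    return ret;
}
```

A program that relied on MS_ASYNC to kick off the write would see its
data linger in memory on 2.5.68 and later.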

Regards,
--
Bartlomiej

On 12 Jun 2003, Christophe Saout wrote:

> On Thu, 2003-06-12 at 00:08, Andy Pfiffer wrote:
> > On Fri, 2003-05-09 at 13:55, Andy Pfiffer wrote:
> > > On Fri, 2003-05-09 at 13:04, Christophe Saout wrote:
> > > > On Fri, 2003-05-09 at 21:04, Andy Pfiffer wrote:
> > > >
> > > > > [...]
> > > > >  I had an unrelated
> > > > > delay in posting this due to some strange behavior of late with LILO and
> > > > > my ext3-mounted /boot partition (/sbin/lilo would say that it updated,
> > > > > but a subsequent reboot would not include my new kernel)
> > > >
> > > > So I'm not the only one having this problem... I think I first saw this
> > > > with 2.5.68 but I'm not sure.
> > >
> > > Well, that makes two of us for sure.
> > >
> > > >
> > > > My boot partition is a small ext3 partition on a lvm2 volume accessed
> > > > over device-mapper (I've written a lilo patch for that, but the patch is
> > > > working and) but I don't think that has something to do with the
> > > > problem.
> > > >
> > > > When syncing, unmounting and waiting some time after running lilo, the
> > > > changes sometimes seem correctly written to disk, I don't know when
> > > > exactly.
> > >
> > > My /boot is an ext3 partition on an IDE disk.  My symptoms and your
> > > symptoms match -- wait awhile, and it works okay.  If you don't wait
> > > "long enough" the changes made in /etc/lilo.conf are not reflected in
> > > the after running /sbin/lilo and rebooting normally.
> > >
> > > I have been unable to reproduce this on a uniproc system with SCSI
> > > disks.
> > >
> > > 2.5.67 seems to work in this regard as expected.
> > >
> > > > Could it be that the location of /boot/map is not written to the
> > > > partition sector of /dev/hda? Or not flushed correctly or something?
> > > >
> > > > After reboot the old kernel came up again (though it was moved to
> > > > vmlinuz.old).
> > >
> > > I don't know -- I haven't isolated it yet.
> > >
> > > Anyone else?
> >
> > I have taken another look at this, and can confirm the following:
> >
> > 1. 2.5.67 works as expected.
> > 2. 2.5.68, 2.5.69, and 2.5.70 do not.
> > 3. ext2 vs. ext3 for /boot: no effect (ie, .68, .69, .70 demonstrate the
> > problem independent of the filesystem used for /boot).
>
> I found out that flushb /dev/<boot_device> helps, syncing doesn't. Not
> 100% sure if that's right, because right now I'm always doing both, but
> I remember having only synced before and that didn't help.
>
> > Relative to a 2.5.68 pure BK tree, the deltas from 2.5.67 to 2.5.68 are:
> > 1.971.76.10	/* 2.5.67 */
> > 1.1124		/* 2.5.68 */
> >
> > The patch exported by BK between these 2 revs is 297K lines ( a sizeable
> > haystack ).  Any ideas about where I should dig for my needle first
> > would be welcomed...
>
> There don't seem to be too much changes in /drivers/block or /fs, mostly
> cleanups. I personally have no idea where to start, except trying out
> each -bk version inbetween. Hmmm. And I'm not going to do that now...
> :-/
>
> --
> Christophe Saout <christophe@saout.de>





* Re: ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem?
  2003-06-11 23:21       ` Christophe Saout
  2003-06-11 23:40         ` Bartlomiej Zolnierkiewicz
@ 2003-06-12  0:20         ` Andy Pfiffer
  2003-06-12  0:29           ` Andrew Morton
  1 sibling, 1 reply; 28+ messages in thread
From: Andy Pfiffer @ 2003-06-12  0:20 UTC (permalink / raw)
  To: Christophe Saout, adam; +Cc: linux-kernel

On Wed, 2003-06-11 at 16:21, Christophe Saout wrote:
> On Thu, 2003-06-12 at 00:08, Andy Pfiffer wrote:
> > On Fri, 2003-05-09 at 13:55, Andy Pfiffer wrote:
> > > On Fri, 2003-05-09 at 13:04, Christophe Saout wrote:
> > > > On Fri, 2003-05-09 at 21:04, Andy Pfiffer wrote:
> > > > 
> > > > > [...]
> > > > >  I had an unrelated
> > > > > delay in posting this due to some strange behavior of late with LILO and
> > > > > my ext3-mounted /boot partition (/sbin/lilo would say that it updated,
> > > > > but a subsequent reboot would not include my new kernel)
> > > > 
> > > > So I'm not the only one having this problem... I think I first saw this
> > > > with 2.5.68 but I'm not sure.
<snip>
> > I have taken another look at this, and can confirm the following:
> > 
> > 1. 2.5.67 works as expected.
> > 2. 2.5.68, 2.5.69, and 2.5.70 do not.
> > 3. ext2 vs. ext3 for /boot: no effect (ie, .68, .69, .70 demonstrate the
> > problem independent of the filesystem used for /boot).
> 
> I found out that flushb /dev/<boot_device> helps, syncing doesn't. Not
> 100% sure if that's right, because right now I'm always doing both, but
> I remember having only synced before and that didn't help.

<snip>

A little more digging reveals this thread from May 14, 2003:
http://marc.theaimsgroup.com/?l=linux-kernel&m=105296774516509&w=2

Applying the kludge in Adam's message:

--- linux-2.5.69/fs/block_dev.c.orig	2003-05-14 17:43:40.000000000 -0700
+++ linux-2.5.69/fs/block_dev.c	2003-05-14 17:44:29.000000000 -0700
@@ -635,14 +635,24 @@
 int blkdev_put(struct block_device *bdev, int kind)
 {
 	int ret = 0;
 	struct inode *bd_inode = bdev->bd_inode;
 	struct gendisk *disk = bdev->bd_disk;
 
 	down(&bdev->bd_sem);
+
+	/* AJR start */
+	switch (kind) {
+	case BDEV_FILE:
+	case BDEV_FS:
+		sync_blockdev(bd_inode->i_bdev);
+		break;
+	}
+	/* AJR end */
+
 	lock_kernel();
 	if (!--bdev->bd_openers) {
 		switch (kind) {
 		case BDEV_FILE:
 		case BDEV_FS:
 			sync_blockdev(bd_inode->i_bdev);
 			break;

made things work for me in 2.5.68.
I suspect it will make things work for .70 as well.

So now the important question: is it wrong to not sync_blockdev() until
the count drops to 0?

Andy




* Re: ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem?
  2003-06-12  0:20         ` ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem? Andy Pfiffer
@ 2003-06-12  0:29           ` Andrew Morton
  2003-06-12 10:42             ` Christophe Saout
  2003-06-12 17:27             ` Andy Pfiffer
  0 siblings, 2 replies; 28+ messages in thread
From: Andrew Morton @ 2003-06-12  0:29 UTC (permalink / raw)
  To: Andy Pfiffer; +Cc: christophe, adam, linux-kernel

Andy Pfiffer <andyp@osdl.org> wrote:
>
> So now the important question: is it wrong to not sync_blockdev() until
> the count drops to 0?

Should be OK.  The close will not sync anything if someone else has the
blockdev open (ie: there's a filesystem mounted there).
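
As a toy illustration of that rule, here is a user-space sketch (not kernel
code). The struct and function names are made up for this example; only the
"sync on last close" logic is taken from the blkdev_put() hunk quoted earlier
in the thread.

```c
#include <assert.h>

/* Toy stand-in for struct block_device; field names are illustrative. */
struct toy_bdev {
	int openers;		/* plays the role of bdev->bd_openers */
	int dirty_pages;	/* pages not yet written back */
};

/* Hypothetical open: just take another reference. */
void toy_get(struct toy_bdev *b)
{
	b->openers++;
}

/*
 * Mirrors the unpatched blkdev_put(): the device is only synced when
 * the last opener goes away.
 */
void toy_put(struct toy_bdev *b)
{
	assert(b->openers > 0);
	if (--b->openers == 0)
		b->dirty_pages = 0;	/* stands in for sync_blockdev() */
}
```

With a mounted filesystem holding one reference and lilo holding a second,
lilo's close leaves dirty_pages untouched in this model, so any flushing has
to come from sync() instead.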

But sync() should certainly write everything out, and lilo does perform a
sync.

I'd be interested in seeing the contents of /proc/meminfo immediately after
the lilo run, see if there's any dirty memory left around.



* Re: ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem?
  2003-06-12  0:29           ` Andrew Morton
@ 2003-06-12 10:42             ` Christophe Saout
  2003-06-12 10:54               ` Andrew Morton
  2003-06-12 17:27             ` Andy Pfiffer
  1 sibling, 1 reply; 28+ messages in thread
From: Christophe Saout @ 2003-06-12 10:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andy Pfiffer, adam, linux-kernel

On Thu, 2003-06-12 at 02:29, Andrew Morton wrote:

> But sync() should certainly write everything out, and lilo does perform a
> sync.

Yep.

> I'd be interested in seeing the contents of /proc/meminfo immediately after
> the lilo run, see if there's any dirty memory left around.

Yes, one page. After running lilo, there are 4k dirty, running sync
doesn't get it below 4k. Only flushb /dev/hda does (or waiting several
minutes).

If you're interested, I've put an annotated version of

( cat /proc/meminfo; lilo; cat /proc/meminfo; sync; cat /proc/meminfo;
flushb /dev/hda; cat /proc/meminfo ) | buffer > meminfo.out.txt

on my web space: http://www.saout.de/files/meminfo.out.txt

(the kernel used was 2.5.70-mm7 with some unrelated patches backed out)

BTW: I found out that now strace lilo freezes the machine...

-- 
Christophe Saout <christophe@saout.de>



* Re: ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem?
  2003-06-12 10:42             ` Christophe Saout
@ 2003-06-12 10:54               ` Andrew Morton
  2003-06-12 11:12                 ` Christophe Saout
                                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Andrew Morton @ 2003-06-12 10:54 UTC (permalink / raw)
  To: Christophe Saout; +Cc: andyp, adam, linux-kernel

Christophe Saout <christophe@saout.de> wrote:
>
> On Thu, 2003-06-12 at 02:29, Andrew Morton wrote:
> 
> > But sync() should certainly write everything out, and lilo does perform a
> > sync.
> 
> Yep.
> 
> > I'd be interested in seeing the contents of /proc/meminfo immediately after
> > the lilo run, see if there's any dirty memory left around.
> 
> Yes, one page. After running lilo, there are 4k dirty, running sync
> doesn't get it below 4k.

That would tend to imply that a page got onto the wrong list.  But if that
were so, nothing would be able to write it.

> Only flushb /dev/hda does (or waiting several minutes).

What is flushb?

I use `lilo ; reboot -f' about 1000 times a day, no probs.  There's
something different.

Adam was doing strange things with an initrd and pivot_root.  Are you doing
anything unconventional?

> 
> BTW: I found out that now strace lilo freezes the machine...

Works OK here.  Try `strace strace lilo' ;)




* Re: ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem?
  2003-06-12 10:54               ` Andrew Morton
@ 2003-06-12 11:12                 ` Christophe Saout
  2003-06-12 11:24                 ` ext[23]/lilo/2.5.{68,69,70} -- strace lilo - freeze Christophe Saout
  2003-06-12 12:44                 ` ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem? Herbert Xu
  2 siblings, 0 replies; 28+ messages in thread
From: Christophe Saout @ 2003-06-12 11:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: andyp, adam, linux-kernel

On Thu, 2003-06-12 at 12:54, Andrew Morton wrote:

> Christophe Saout <christophe@saout.de> wrote:
> >
> > On Thu, 2003-06-12 at 02:29, Andrew Morton wrote:
> >
> > > I'd be interested in seeing the contents of /proc/meminfo immediately after
> > > the lilo run, see if there's any dirty memory left around.
> > 
> > Yes, one page. After running lilo, there are 4k dirty, running sync
> > doesn't get it below 4k.
> 
> That would tend to imply that a page got onto the wrong list.  But if that
> were so, nothing would be able to write it.
> 
> > Only flushb /dev/hda does (or waiting several minutes).
> 
> What is flushb?

A program that does a flush ioctl on a block device:

open("/dev/hda", O_RDONLY)              = 3
ioctl(3, BLKFLSBUF, 0)                  = 0
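
For anyone without flushb at hand, the two system calls above can be
reproduced with a few lines of C. This is only a sketch derived from the
strace output, not the actual flushb source; the function name
flush_block_device is made up for illustration, and the ioctl normally
needs root.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* BLKFLSBUF */

/*
 * Hypothetical helper (not the real flushb): open the device read-only
 * and issue the BLKFLSBUF ioctl on it.
 * Returns 0 on success, -1 on failure with errno set.
 */
int flush_block_device(const char *path)
{
	int fd, ret, saved_errno;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, BLKFLSBUF, 0);

	/* Keep the ioctl's errno even if close() changes it. */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;

	return ret < 0 ? -1 : 0;
}
```

Calling flush_block_device("/dev/hda") as root would correspond to the
"flushb /dev/hda" step used in the measurements in this thread.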

> I use `lilo ; reboot -f' about 1000 times a day, no probs.  There's
> something different.
> 
> Adam was doing strange things with an initrd and pivot_root.  Are you doing
> anything unconventional?

I'm using an initrd (but no pivot_root) that initializes my LVM2 volumes
(using device-mapper).

/boot and / are on device-mapper devices.

> > BTW: I found out that now strace lilo freezes the machine...
> 
> Works OK here.  Try `strace strace lilo' ;)

I'll try to find out what happens. Not interested in crashing my system
while answering emails now. ;)

-- 
Christophe Saout <christophe@saout.de>



* Re: ext[23]/lilo/2.5.{68,69,70} -- strace lilo - freeze
  2003-06-12 10:54               ` Andrew Morton
  2003-06-12 11:12                 ` Christophe Saout
@ 2003-06-12 11:24                 ` Christophe Saout
  2003-06-12 12:44                 ` ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem? Herbert Xu
  2 siblings, 0 replies; 28+ messages in thread
From: Christophe Saout @ 2003-06-12 11:24 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Thu, 2003-06-12 at 12:54, Andrew Morton wrote:

> > BTW: I found out that now strace lilo freezes the machine...
> Works OK here.  Try `strace strace lilo' ;)

Since we are already talking about syncing...

The last thing "strace lilo" shows is:

fsync(5

-- 
Christophe Saout <christophe@saout.de>



* Re: ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem?
  2003-06-12 10:54               ` Andrew Morton
  2003-06-12 11:12                 ` Christophe Saout
  2003-06-12 11:24                 ` ext[23]/lilo/2.5.{68,69,70} -- strace lilo - freeze Christophe Saout
@ 2003-06-12 12:44                 ` Herbert Xu
  2 siblings, 0 replies; 28+ messages in thread
From: Herbert Xu @ 2003-06-12 12:44 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel

Andrew Morton <akpm@digeo.com> wrote:
> 
> I use `lilo ; reboot -f' about 1000 times a day, no probs.  There's
> something different.
> 
> Adam was doing strange things with an initrd and pivot_root.  Are you doing
> anything unconventional?

I see exactly the same problem with lilo and I too use initrd + pivot_root.
I think Adam's post referred to elsewhere in this thread already identified
the problem as initrd-only.
-- 
Debian GNU/Linux 3.0 is out! ( http://www.debian.org/ )
Email:  Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem?
  2003-06-12  0:29           ` Andrew Morton
  2003-06-12 10:42             ` Christophe Saout
@ 2003-06-12 17:27             ` Andy Pfiffer
  2003-06-12 17:53               ` Andrew Morton
  1 sibling, 1 reply; 28+ messages in thread
From: Andy Pfiffer @ 2003-06-12 17:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: christophe, adam, linux-kernel

On Wed, 2003-06-11 at 17:29, Andrew Morton wrote:
> Andy Pfiffer <andyp@osdl.org> wrote:
> >
> > So now the important question: is it wrong to not sync_blockdev() until
> > the count drops to 0?
> 
> Should be OK.  The close will not sync anything if someone else has the
> blockdev open (ie: there's a filesystem mounted there).
> 
> But sync() should certainly write everything out, and lilo does perform a
> sync.
> 
> I'd be interested in seeing the contents of /proc/meminfo immediately after
> the lilo run, see if there's any dirty memory left around.

How I measured:

	# cat measure
	#!/bin/sh
	sync
	cat /proc/meminfo
	/sbin/lilo
	cat /proc/meminfo

	# ./measure | dd bs=1024k > out
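
To pull just one figure (such as Dirty) out of a dump like "out" instead of
comparing whole listings by eye, something along these lines would do. It is
a sketch only: the helper name meminfo_kb is made up for this example, and it
assumes the "Key:   value kB" layout shown in the dumps in this message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan a /proc/meminfo-style stream for the line
 * that starts with "<key>:" and return its value in kB.
 * Returns -1 if the key is not found.
 */
long meminfo_kb(FILE *fp, const char *key)
{
	char line[256];
	size_t klen;

	assert(fp != NULL && key != NULL);
	klen = strlen(key);

	while (fgets(line, sizeof(line), fp)) {
		long val;

		/* Require an exact key match followed by a colon. */
		if (strncmp(line, key, klen) != 0 || line[klen] != ':')
			continue;
		if (sscanf(line + klen + 1, "%ld", &val) == 1)
			return val;
	}
	return -1;
}
```

Opening /proc/meminfo with fopen() before and after the lilo run and calling
meminfo_kb(fp, "Dirty") on each gives the two numbers marked with arrows in
the listings that follow.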

2.5.68 pure:

Before:				After:
MemTotal:       514304 kB	MemTotal:       514304 kB
MemFree:        388720 kB	MemFree:        385648 kB
Buffers:         10056 kB	Buffers:         12092 kB
Cached:          54000 kB	Cached:          54956 kB
SwapCached:          0 kB	SwapCached:          0 kB
Active:          62928 kB	Active:          63240 kB
Inactive:        34416 kB	Inactive:        37120 kB
HighTotal:           0 kB	HighTotal:           0 kB
HighFree:            0 kB	HighFree:            0 kB
LowTotal:       514304 kB	LowTotal:       514304 kB
LowFree:        388720 kB	LowFree:        385648 kB
SwapTotal:      787144 kB	SwapTotal:      787144 kB
SwapFree:       787144 kB	SwapFree:       787144 kB
Dirty:               0 kB	Dirty:               8 kB <---
Writeback:           0 kB	Writeback:           0 kB
Mapped:          45484 kB	Mapped:          45488 kB
Slab:            11880 kB	Slab:            12100 kB
Committed_AS:   146184 kB	Committed_AS:   146184 kB
PageTables:        656 kB	PageTables:        656 kB
VmallocTotal:   516040 kB	VmallocTotal:   516040 kB
VmallocUsed:     42608 kB	VmallocUsed:     42608 kB
VmallocChunk:   473432 kB	VmallocChunk:   473432 kB

2.5.68+kludge:

Before:				After:
MemTotal:       514304 kB	MemTotal:       514304 kB
MemFree:        390416 kB	MemFree:        387216 kB
Buffers:          9844 kB	Buffers:         11892 kB
Cached:          52920 kB	Cached:          53864 kB
SwapCached:          0 kB	SwapCached:          0 kB
Active:          62136 kB	Active:          62452 kB
Inactive:        33908 kB	Inactive:        36600 kB
HighTotal:           0 kB	HighTotal:           0 kB
HighFree:            0 kB	HighFree:            0 kB
LowTotal:       514304 kB	LowTotal:       514304 kB
LowFree:        390416 kB	LowFree:        387216 kB
SwapTotal:      787144 kB	SwapTotal:      787144 kB
SwapFree:       787144 kB	SwapFree:       787144 kB
Dirty:               0 kB	Dirty:               4 kB <---
Writeback:           0 kB	Writeback:           0 kB
Mapped:          45448 kB	Mapped:          45452 kB
Slab:            11564 kB	Slab:            11756 kB
Committed_AS:   146192 kB	Committed_AS:   146132 kB
PageTables:        656 kB	PageTables:        656 kB
VmallocTotal:   516040 kB	VmallocTotal:   516040 kB
VmallocUsed:     42608 kB	VmallocUsed:     42608 kB
VmallocChunk:   473432 kB	VmallocChunk:   473432 kB






* Re: ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem?
  2003-06-12 17:27             ` Andy Pfiffer
@ 2003-06-12 17:53               ` Andrew Morton
  2003-06-12 18:03                 ` Andy Pfiffer
  0 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2003-06-12 17:53 UTC (permalink / raw)
  To: Andy Pfiffer; +Cc: christophe, adam, linux-kernel

Andy Pfiffer <andyp@osdl.org> wrote:
>
> Dirty:               0 kB	Dirty:               4 kB <---

OK.  And are you using initrd as well?


* Re: ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem?
  2003-06-12 17:53               ` Andrew Morton
@ 2003-06-12 18:03                 ` Andy Pfiffer
  2003-06-12 18:10                   ` Andrew Morton
  2003-06-12 18:25                   ` Andy Pfiffer
  0 siblings, 2 replies; 28+ messages in thread
From: Andy Pfiffer @ 2003-06-12 18:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: christophe, adam, linux-kernel

On Thu, 2003-06-12 at 10:53, Andrew Morton wrote:
> Andy Pfiffer <andyp@osdl.org> wrote:
> >
> > Dirty:               0 kB	Dirty:               4 kB <---
> 
> OK.  And are you using initrd as well?

It is specified in lilo.conf (SuSE 8.0 distro) but I don't see any
reason to keep it.  I'll yank it and see if it makes a difference.

Andy




* Re: ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem?
  2003-06-12 18:03                 ` Andy Pfiffer
@ 2003-06-12 18:10                   ` Andrew Morton
  2003-06-12 18:53                     ` Christophe Saout
  2003-06-12 18:25                   ` Andy Pfiffer
  1 sibling, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2003-06-12 18:10 UTC (permalink / raw)
  To: Andy Pfiffer; +Cc: christophe, adam, linux-kernel

Andy Pfiffer <andyp@osdl.org> wrote:
>
> On Thu, 2003-06-12 at 10:53, Andrew Morton wrote:
> > Andy Pfiffer <andyp@osdl.org> wrote:
> > >
> > > Dirty:               0 kB	Dirty:               4 kB <---
> > 
> > OK.  And are you using initrd as well?
> 
> It is specified in lilo.conf (SuSE 8.0 distro) but I don't see any
> reason to keep it.  I'll yank it and see if it makes a difference.
> 

That would be interesting.

Also, what about this shot in the dark?


--- 25/fs/fs-writeback.c~a	2003-06-12 11:08:34.000000000 -0700
+++ 25-akpm/fs/fs-writeback.c	2003-06-12 11:08:39.000000000 -0700
@@ -368,7 +368,7 @@ void sync_inodes_sb(struct super_block *
 	};
 
 	get_page_state(&ps);
-	wbc.nr_to_write = ps.nr_dirty + ps.nr_unstable +
+	wbc.nr_to_write = ps.nr_dirty + ps.nr_unstable + 1024 +
 		(ps.nr_dirty + ps.nr_unstable) / 4;
 	spin_lock(&inode_lock);
 	sync_sb_inodes(sb, &wbc);

_



* Re: ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem?
  2003-06-12 18:03                 ` Andy Pfiffer
  2003-06-12 18:10                   ` Andrew Morton
@ 2003-06-12 18:25                   ` Andy Pfiffer
  2003-06-13  8:01                     ` Andrew Morton
  1 sibling, 1 reply; 28+ messages in thread
From: Andy Pfiffer @ 2003-06-12 18:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: christophe, adam, linux-kernel

On Thu, 2003-06-12 at 11:03, Andy Pfiffer wrote:
> On Thu, 2003-06-12 at 10:53, Andrew Morton wrote:
> > Andy Pfiffer <andyp@osdl.org> wrote:
> > >
> > > Dirty:               0 kB	Dirty:               4 kB <---
> > 
> > OK.  And are you using initrd as well?
> 
> It is specified in lilo.conf (SuSE 8.0 distro) but I don't see any
> reason to keep it.  I'll yank it and see if it makes a difference.

pure == 2.5.68
kludge == 2.5.68+kludge in blkdev_put()

% grep Dirt =noinitrd-*
=noinitrd-kludge=:Dirty:               0 kB   # before
=noinitrd-kludge=:Dirty:               4 kB   # after
=noinitrd-pure=:Dirty:               0 kB     # before
=noinitrd-pure=:Dirty:               4 kB     # after

So it would appear to me that initrd is the common denominator among
those of us reporting similar symptoms.

Andy






* Re: ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem?
  2003-06-12 18:10                   ` Andrew Morton
@ 2003-06-12 18:53                     ` Christophe Saout
  0 siblings, 0 replies; 28+ messages in thread
From: Christophe Saout @ 2003-06-12 18:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andy Pfiffer, adam, linux-kernel

On Thu, 2003-06-12 at 20:10, Andrew Morton wrote:

> Also, what about this shot in the dark?

> -	wbc.nr_to_write = ps.nr_dirty + ps.nr_unstable +
> +	wbc.nr_to_write = ps.nr_dirty + ps.nr_unstable + 1024 +

Nope, still 4k dirty left after lilo.

-- 
Christophe Saout <christophe@saout.de>



* Re: ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem?
  2003-06-12 18:25                   ` Andy Pfiffer
@ 2003-06-13  8:01                     ` Andrew Morton
  2003-06-13  9:57                       ` Herbert Xu
                                         ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Andrew Morton @ 2003-06-13  8:01 UTC (permalink / raw)
  To: Andy Pfiffer
  Cc: christophe, adam, linux-kernel, Herbert Xu, Unai Garro Arrazola,
	Max Valdez, Eduardo Pereira Habkost


This should fix it.



Once the blockdev inode for /dev/ram0 is dirtied we have a memory-backed
inode on the blockdev superblock's s_dirty list.

sync_sb_inodes() sees the memory-backed inode on the superblock and assumes
that all the other inodes on the superblock are also memory-backed.  This is
not true for the blockdev superblock!  We forget to write out dirty pages
against the following blockdevs.

Fix this by just leaving the inode dirty and moving on to inspect the other
blockdev inodes on sb->s_io.

(This is a little inefficient: an alternative is to leave dirtied
memory-backed inodes on inode_in_use, so nobody ever even considers them for
writeout.  But that introduces an inconsistency and is a bit kludgey).



 fs/fs-writeback.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletion(-)

diff -puN fs/fs-writeback.c~writeback-memory-backed-fix fs/fs-writeback.c
--- 25/fs/fs-writeback.c~writeback-memory-backed-fix	2003-06-12 23:12:28.000000000 -0700
+++ 25-akpm/fs/fs-writeback.c	2003-06-12 23:14:07.000000000 -0700
@@ -260,8 +260,21 @@ sync_sb_inodes(struct super_block *sb, s
 		struct address_space *mapping = inode->i_mapping;
 		struct backing_dev_info *bdi = mapping->backing_dev_info;
 
-		if (bdi->memory_backed)
+		if (bdi->memory_backed) {
+			if (sb == blockdev_superblock) {
+				/*
+				 * Dirty memory-backed blockdev: the ramdisk
+				 * driver does this.
+				 */
+				list_move(&inode->i_list, &sb->s_dirty);
+				continue;
+			}
+			/*
+			 * Assume that all inodes on this superblock are memory
+			 * backed.  Skip the superblock.
+			 */
 			break;
+		}
 
 		if (wbc->nonblocking && bdi_write_congested(bdi)) {
 			wbc->encountered_congestion = 1;

_
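
The behaviour change in the patch above can be modelled in user space. The
following is a simplified sketch, not the kernel loop: the toy_inode struct,
the function name and the "patched" switch are all invented for illustration,
and real writeback does far more than clear a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an inode on a superblock's dirty list. */
struct toy_inode {
	int memory_backed;	/* plays the role of bdi->memory_backed */
	int dirty;
};

/*
 * Walk the inodes of one superblock and "write out" the dirty,
 * disk-backed ones.  With patched == 0 the loop stops at the first
 * memory-backed inode, as before the fix.  With patched != 0 and
 * is_blockdev_sb != 0 it skips that inode and keeps going.
 * Returns the number of inodes written.
 */
int toy_sync_sb_inodes(struct toy_inode *inodes, size_t n,
		       int is_blockdev_sb, int patched)
{
	int written = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct toy_inode *inode = &inodes[i];

		if (inode->memory_backed) {
			if (patched && is_blockdev_sb)
				continue;	/* leave it dirty, move on */
			break;			/* skip the whole superblock */
		}
		if (inode->dirty) {
			inode->dirty = 0;	/* stands in for real writeout */
			written++;
		}
	}
	return written;
}
```

With a dirty ramdisk inode listed ahead of a dirty disk inode, the unpatched
walk writes nothing, while the patched walk on the blockdev superblock still
reaches the disk inode and leaves the ramdisk inode dirty.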



* Re: ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem?
  2003-06-13  8:01                     ` Andrew Morton
@ 2003-06-13  9:57                       ` Herbert Xu
  2003-06-13 14:42                       ` Eduardo Pereira Habkost
                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 28+ messages in thread
From: Herbert Xu @ 2003-06-13  9:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Fri, Jun 13, 2003 at 01:01:49AM -0700, Andrew Morton wrote:
> 
> Fix this by just leaving the inode dirty and moving on to inspect the other
> blockdev inodes on sb->s_io.

This fixes it for me.  Thanks Andrew.
-- 
Debian GNU/Linux 3.0 is out! ( http://www.debian.org/ )
Email:  Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem?
  2003-06-13  8:01                     ` Andrew Morton
  2003-06-13  9:57                       ` Herbert Xu
@ 2003-06-13 14:42                       ` Eduardo Pereira Habkost
  2003-06-13 17:17                       ` Andy Pfiffer
  2003-06-13 22:12                       ` Unai Garro Arrazola
  3 siblings, 0 replies; 28+ messages in thread
From: Eduardo Pereira Habkost @ 2003-06-13 14:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Pfiffer, christophe, adam, linux-kernel, Herbert Xu,
	Unai Garro Arrazola, Max Valdez


On Fri, Jun 13, 2003 at 01:01:49AM -0700, Andrew Morton wrote:
> 
> This should fix it.
> 

It worked here. Thanks!

-- 
Eduardo



* Re: ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem?
  2003-06-13  8:01                     ` Andrew Morton
  2003-06-13  9:57                       ` Herbert Xu
  2003-06-13 14:42                       ` Eduardo Pereira Habkost
@ 2003-06-13 17:17                       ` Andy Pfiffer
  2003-06-13 22:12                       ` Unai Garro Arrazola
  3 siblings, 0 replies; 28+ messages in thread
From: Andy Pfiffer @ 2003-06-13 17:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: christophe, adam, linux-kernel, Herbert Xu, Unai Garro Arrazola,
	Max Valdez, Eduardo Pereira Habkost

On Fri, 2003-06-13 at 01:01, Andrew Morton wrote:
> Fix this by just leaving the inode dirty and moving on to inspect the other
> blockdev inodes on sb->s_io.

Yup, this fixed it for me, too.  Thanks for your help.  --Andy





* Re: ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem?
  2003-06-13  8:01                     ` Andrew Morton
                                         ` (2 preceding siblings ...)
  2003-06-13 17:17                       ` Andy Pfiffer
@ 2003-06-13 22:12                       ` Unai Garro Arrazola
  2003-06-13 22:28                         ` Andrew Morton
  3 siblings, 1 reply; 28+ messages in thread
From: Unai Garro Arrazola @ 2003-06-13 22:12 UTC (permalink / raw)
  To: Andrew Morton, Andy Pfiffer
  Cc: christophe, adam, linux-kernel, Herbert Xu, Max Valdez,
	Eduardo Pereira Habkost

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I just got the time to check. It works great, thanks! Where can I send this
box of chocolates as gratitude? ;-)

On Friday 13 June 2003 09:01, Andrew Morton wrote:
> This should fix it.
>
>
>
> Once the blockdev inode for /dev/ram0 is dirtied we have a memory-backed
> inode on the blockdev superblock's s_dirty list.
>
> sync_sb_inodes() sees the memory-backed inode on the superblock and assumes
> that all the other inodes on the superblock are also memory-backed.  This
> is not true for the blockdev superblock!  We forget to write out dirty
> pages against the following blockdevs.
>
> Fix this by just leaving the inode dirty and moving on to inspect the other
> blockdev inodes on sb->s_io.
>
> (This is a little inefficient: an alternative is to leave dirtied
> memory-backed inodes on inode_in_use, so nobody ever even considers them
> for writeout.  But that introduces an inconsistency and is a bit kludgey).
>
>
>
>  fs/fs-writeback.c |   15 ++++++++++++++-
>  1 files changed, 14 insertions(+), 1 deletion(-)
>
> diff -puN fs/fs-writeback.c~writeback-memory-backed-fix fs/fs-writeback.c
> --- 25/fs/fs-writeback.c~writeback-memory-backed-fix	2003-06-12 23:12:28.000000000 -0700
> +++ 25-akpm/fs/fs-writeback.c	2003-06-12 23:14:07.000000000 -0700
> @@ -260,8 +260,21 @@ sync_sb_inodes(struct super_block *sb, s
>  		struct address_space *mapping = inode->i_mapping;
>  		struct backing_dev_info *bdi = mapping->backing_dev_info;
>
> -		if (bdi->memory_backed)
> +		if (bdi->memory_backed) {
> +			if (sb == blockdev_superblock) {
> +				/*
> +				 * Dirty memory-backed blockdev: the ramdisk
> +				 * driver does this.
> +				 */
> +				list_move(&inode->i_list, &sb->s_dirty);
> +				continue;
> +			}
> +			/*
> +			 * Assume that all inodes on this superblock are memory
> +			 * backed.  Skip the superblock.
> +			 */
>  			break;
> +		}
>
>  		if (wbc->nonblocking && bdi_write_congested(bdi)) {
>  			wbc->encountered_congestion = 1;
>
> _

- -- 
Coincidences are spiritual puns.
		-- G.K. Chesterton
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+6kxjhxDfDIoZlaURAsHrAKCRFnHCpzdBbtJ8C9vrY6P7T9+dYACgg+fL
XYizhhJD8KZ3bO4O/YzXr2c=
=Rwik
-----END PGP SIGNATURE-----


* Re: ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem?
  2003-06-13 22:12                       ` Unai Garro Arrazola
@ 2003-06-13 22:28                         ` Andrew Morton
  0 siblings, 0 replies; 28+ messages in thread
From: Andrew Morton @ 2003-06-13 22:28 UTC (permalink / raw)
  To: Unai Garro Arrazola
  Cc: andyp, christophe, adam, linux-kernel, herbert, maxvalde, ehabkost

Unai Garro Arrazola <Unai.Garro@ee.ed.ac.uk> wrote:
>
> I just got the time to check. It works great, thanks!

Thanks for following up.

> Where can I send this box of chocolates as gratitude? ;-)

Not to the guy who broke it in the first place ;)




end of thread, other threads:[~2003-06-13 22:18 UTC | newest]

Thread overview: 28+ messages
2003-05-09 19:04 [KEXEC][2.5.69] kexec for 2.5.69 available Andy Pfiffer
2003-05-09 20:04 ` ext3/lilo/2.5.6[89] (was: [KEXEC][2.5.69] kexec for 2.5.69 available) Christophe Saout
2003-05-09 20:55   ` Andy Pfiffer
2003-05-09 20:46     ` ext3/lilo/2.5.6[89] (was: [KEXEC][2.5.69] kexec for 2.5.69available) Riley Williams
2003-05-09 22:39       ` Joe Korty
2003-05-09 23:39       ` Andy Pfiffer
2003-06-11 22:08     ` ext[23]/lilo/2.5.{68,69,70} -- IDE Problem? Andy Pfiffer
2003-06-11 23:21       ` Christophe Saout
2003-06-11 23:40         ` Bartlomiej Zolnierkiewicz
2003-06-12  0:20         ` ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem? Andy Pfiffer
2003-06-12  0:29           ` Andrew Morton
2003-06-12 10:42             ` Christophe Saout
2003-06-12 10:54               ` Andrew Morton
2003-06-12 11:12                 ` Christophe Saout
2003-06-12 11:24                 ` ext[23]/lilo/2.5.{68,69,70} -- strace lilo - freeze Christophe Saout
2003-06-12 12:44                 ` ext[23]/lilo/2.5.{68,69,70} -- blkdev_put() problem? Herbert Xu
2003-06-12 17:27             ` Andy Pfiffer
2003-06-12 17:53               ` Andrew Morton
2003-06-12 18:03                 ` Andy Pfiffer
2003-06-12 18:10                   ` Andrew Morton
2003-06-12 18:53                     ` Christophe Saout
2003-06-12 18:25                   ` Andy Pfiffer
2003-06-13  8:01                     ` Andrew Morton
2003-06-13  9:57                       ` Herbert Xu
2003-06-13 14:42                       ` Eduardo Pereira Habkost
2003-06-13 17:17                       ` Andy Pfiffer
2003-06-13 22:12                       ` Unai Garro Arrazola
2003-06-13 22:28                         ` Andrew Morton
