linux-kernel.vger.kernel.org archive mirror
* Re: Swap doesn't work
@ 2002-10-27 19:41 Tim Tassonis
  2002-10-27 20:03 ` Christoph Hellwig
  2002-10-27 21:17 ` Alan Cox
  0 siblings, 2 replies; 6+ messages in thread
From: Tim Tassonis @ 2002-10-27 19:41 UTC (permalink / raw)
  To: linux-kernel

Hi Alan

> On Sun, 2002-10-27 at 14:48, Vladimír Třebický wrote:
> > > > That's not a badblock. That's a kernel IDE bug. Andre Hedrick and
> > > > Alan Cox will love to see this.
> > >
> > > Not on a kernel built with an untrusted hand built tool chain
> > >
> > Well, I don't know what could possibly cause this kind of error except
> > the kernel.
> > No matter what application I use to read or write /dev/hda6. Which
> > part of my tool chain do you have in mind?
> 
> gcc and binutils. I get so many weird never duplicated reports from
> linux from scratch people that don't happen to anyone else that I treat
> them with deep suspicion.  Especially because it sometimes goes away if
> they instead build the same kernel with Debian/Red Hat/.. binutils/gcc

Not that I would know better or have an idea why this bug happens, but to
say "Bugger off if you have an lfs system" is a bit lousy, I think. After
all, lfs does not really have an "untrusted toolchain", as compared to
RH/SuSE/Debian "trustworthy computing toolchains":

lfs has a manual with clearly specified package versions, patches and
order of "toolchaining". It might well be a bug in that chain, but other
distros have bugs, too. Signing software doesn't make it superior, after
all.

However, the error does not happen on my crappy lfs system, but then
again, I run it in VMware, with the virtual disks set up as SCSI...

Bye
Tim





* Re: Swap doesn't work
  2002-10-27 19:41 Swap doesn't work Tim Tassonis
@ 2002-10-27 20:03 ` Christoph Hellwig
  2002-10-27 20:13   ` Tim Tassonis
  2002-10-27 21:17 ` Alan Cox
  1 sibling, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2002-10-27 20:03 UTC (permalink / raw)
  To: Tim Tassonis; +Cc: linux-kernel

On Sun, Oct 27, 2002 at 08:41:10PM +0100, Tim Tassonis wrote:
> Not that I would know better or have an idea why this bug happens, but to
> say "Bugger off if you have an lfs system" is a bit lousy, I think. After
> all, lfs does not really have an "untrusted toolchain", as compared to
> RH/SuSE/Debian "trustworthy computing toolchains":

Sorry, tons of people who have absolutely no clue about the package
internals set up their systems themselves and make mistakes.  Nothing
spectacular, but they just don't have those people who know the
packages in detail and can notice and fix the bugs.  Just get a binary
rpm/deb/whatever of the toolchain and reproduce.

> lfs has a manual with clearly specified package versions, patches and
> order of "toolchaining". It might well be a bug in that chain, but other
> distros have bugs, too. Signing software doesn't make it superior, after
> all.

but having people who understand the software maintain the
packages sometimes helps :)



* Re: Swap doesn't work
  2002-10-27 20:03 ` Christoph Hellwig
@ 2002-10-27 20:13   ` Tim Tassonis
  0 siblings, 0 replies; 6+ messages in thread
From: Tim Tassonis @ 2002-10-27 20:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel

> Sorry, tons of people who have absolutely no clue about the package
> internals set up their systems themselves and make mistakes.  Nothing
> spectacular, but they just don't have those people who know the
> packages in detail and can notice and fix the bugs.  Just get a binary
> rpm/deb/whatever of the toolchain and reproduce.

As I said, I can't reproduce it even on my lfs system, maybe because my
disks are scsi. So reproducing on my Red Hat wouldn't really help, would
it?

> > lfs has a manual with clearly specified package versions, patches and
> > order of "toolchaining". It might well be a bug in that chain, but
> > other distros have bugs, too. Signing software doesn't make it
> > superior, after all.
> 
> but having people who understand the software maintain the
> packages sometimes helps :)

As far as I know, lfs does not maintain the packages. To my knowledge,
binutils and gcc are maintained by the FSF.

Bye
Tim


* Re: Swap doesn't work
  2002-10-27 19:41 Swap doesn't work Tim Tassonis
  2002-10-27 20:03 ` Christoph Hellwig
@ 2002-10-27 21:17 ` Alan Cox
  2002-10-28 12:22   ` Tim Tassonis
  1 sibling, 1 reply; 6+ messages in thread
From: Alan Cox @ 2002-10-27 21:17 UTC (permalink / raw)
  To: Tim Tassonis; +Cc: Linux Kernel Mailing List

On Sun, 2002-10-27 at 19:41, Tim Tassonis wrote:
> Not that I would know better or have an idea why this bug happens, but to
> say "Bugger off if you have an lfs system" is a bit lousy, I think. After
> all, lfs does not really have an "untrusted toolchain", as compared to
> RH/SuSE/Debian "trustworthy computing toolchains":

I get bugs that are clearly caused by miscompiled tool chains from Linux
From Scratch people. I trust the RH, SuSE and Debian tool chains because
they have any necessary patches applied for compiler bugs and they are
running against a properly built glibc and binutils.

If you simply grab the latest and greatest of everything from
ftp.gnu.org then quite often it won't work. 

If you'd like me to spend hours debugging an LFS system where it's
probably a tool error, then you can ask for current hourly rates.


Alan


* Re: Swap doesn't work
  2002-10-27 21:17 ` Alan Cox
@ 2002-10-28 12:22   ` Tim Tassonis
  2002-10-29 12:15     ` kernel BUG in page_alloc.c (rmqueue function) Steffen Persvold
  0 siblings, 1 reply; 6+ messages in thread
From: Tim Tassonis @ 2002-10-28 12:22 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

On 27 Oct 2002 21:17:34 +0000
Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> On Sun, 2002-10-27 at 19:41, Tim Tassonis wrote:
> > Not that I would know better or have an idea why this bug happens, but
> > to say "Bugger off if you have an lfs system" is a bit lousy, I think.
> > After all, lfs does not really have an "untrusted toolchain", as
> > compared to RH/SuSE/Debian "trustworthy computing toolchains":
> 
> I get bugs that are clearly caused by miscompiled tool chains from Linux
> From Scratch people. I trust the RH, SuSE and Debian tool chains because
> they have any necessary patches applied for compiler bugs and they are
> running against a properly built glibc and binutils.
> 
> If you simply grab the latest and greatest of everything from
> ftp.gnu.org then quite often it won't work. 

That's certainly true, and before claiming a kernel bug I would
personally try against a Red Hat system. Still, lfs does include gcc
patches; it's not just a cvs checkout of the relevant packages. It also
seems to have a sane order of compiling everything.

> If you'd like me to spend hours debugging an LFS system where it's
> probably a tool error, then you can ask for current hourly rates.

That wasn't actually my idea. And you are right, before claiming a kernel
bug one should probably always try to reproduce it against a different
system.

Bye
Tim



* kernel BUG in page_alloc.c (rmqueue function)
  2002-10-28 12:22   ` Tim Tassonis
@ 2002-10-29 12:15     ` Steffen Persvold
  0 siblings, 0 replies; 6+ messages in thread
From: Steffen Persvold @ 2002-10-29 12:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2100 bytes --]

Hi all,

Lately I've been struggling with kernels Oopsing in page_alloc.c, in the
rmqueue() function. The line which triggers the Oops is actually in
expand(), which is inlined into rmqueue() (and others). In my kernel source
(2.4.20-pre11), expand() looks like this:

#define MARK_USED(index, order, area) \
        __change_bit((index) >> (1+(order)), (area)->map)

static inline struct page * expand (zone_t *zone, struct page *page,
         unsigned long index, int low, int high, free_area_t * area)
{
        unsigned long size = 1 << high;

        while (high > low) {
                if (BAD_RANGE(zone,page))
                        BUG();
                area--;
                high--;
                size >>= 1;
                list_add(&(page)->list, &(area)->free_list);
                MARK_USED(index, high, area);
                index += size;
                page += size;
        }
        if (BAD_RANGE(zone,page))
                BUG();
        return page;
}

The line that triggers the BUG is the last BAD_RANGE check. The module
that ends up calling __alloc_pages() does so in the following way:

addr = __get_free_page(GFP_KERNEL);

and frees the page with free_page(addr);
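
To see which page that final BAD_RANGE check actually inspects, the splitting loop of expand() can be modeled in user space. The following is only a sketch under stated assumptions: model_expand() and struct split_step are hypothetical names, a "page" is reduced to a plain index, and the free lists and bitmap are left out. It shows that each iteration puts the lower half of the block back on a free list and keeps the upper half, so the page finally returned is the starting page plus an offset. If the block taken from the free list was already bogus (for example because a free list was corrupted), that offset page is the one that would trip the last check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one split performed by the modeled loop. */
struct split_step {
    unsigned long free_index; /* first page of the half put back on a free list */
    int order;                /* order of the free list that half would go to */
};

/*
 * User-space model of the loop in expand(): start with a free block of
 * order `high` at `index` and halve it until order `low` is reached.
 * Each iteration records the lower half as freed and continues with the
 * upper half. Returns the index of the block handed to the caller.
 * The caller must provide room for (high - low) entries in `steps`.
 */
static unsigned long model_expand(unsigned long index, int low, int high,
                                  struct split_step *steps, size_t *nsteps)
{
    unsigned long size = 1UL << high;
    size_t n = 0;

    assert(low >= 0 && high >= low);

    while (high > low) {
        high--;
        size >>= 1;
        steps[n].free_index = index; /* lower half goes back to the allocator */
        steps[n].order = high;
        n++;
        index += size;               /* keep splitting the upper half */
    }
    *nsteps = n;
    return index;
}
```

For an order-0 request served from an order-3 block at index 0, this model frees the halves at index 0 (order 2), 4 (order 1) and 6 (order 0), and returns index 7.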

The machine configuration is RedHat 7.3 (gcc-2.96-110, 
binutils-2.11.93.0.2-11), 2 Xeon processors @ 2.2 GHz and 2GB RAM.

The module in question is not part of the kernel tree, but the source is
available if someone is interested. However, I'm really interested in
situations that could cause BAD_RANGE() to fail (since it is commented
with a "Temporary debugging check") given the above usage (which
seems very straightforward to me).

The really strange thing is that the problem seems to disappear if I apply
the per_cpu_pages patch by Ingo Molnar as found in the RedHat
2.4.18-17.7.x kernel, with a few modifications to make it fit on
2.4.20-pre11 (attached).
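
For reference, the decisions the attached patch makes can be summarized in a small user-space sketch. It is a simplified model, not the patch itself: pool_capacity() and alloc_uses_pool() are hypothetical helper names, and locking and the list handling are omitted. The constants and formulas are taken from the attachment. Note that an order-0 request served from the per-CPU pool returns before the buddy lists are searched, so it never reaches expand(); it is possible that this only hides the symptom rather than fixing its cause.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PER_CPU_PAGES 512 /* same cap as in the attached patch */

/*
 * Per-CPU pool capacity as computed in the patch at zone setup:
 * realsize / number of CPUs / 128, clamped to [1, MAX_PER_CPU_PAGES].
 */
static int pool_capacity(unsigned long realsize, unsigned int ncpus)
{
    unsigned long n;

    assert(ncpus > 0);

    n = realsize / ncpus / 128;
    if (n > MAX_PER_CPU_PAGES)
        return MAX_PER_CPU_PAGES;
    if (n == 0)
        return 1;
    return (int)n;
}

/*
 * Fast-path test from the patched rmqueue(): only order-0 requests use
 * the pool, and only while it holds more pages than a threshold. The
 * threshold is max_nr_pages / 8, or 0 for PF_MEMALLOC tasks.
 */
static bool alloc_uses_pool(unsigned int order, int nr_pages,
                            int max_nr_pages, bool memalloc)
{
    int threshold = memalloc ? 0 : max_nr_pages / 8;

    return order == 0 && nr_pages > threshold;
}
```

With 2 CPUs, a zone of 524288 pages hits the cap of 512 pages per CPU, so ordinary tasks take the fast path only while more than 64 pages are pooled.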

Any help greatly appreciated.

Thanks,
 -- 
  Steffen Persvold   |       Scali AS      
 mailto:sp@scali.com |  http://www.scali.com
Tel: (+47) 2262 8950 |   Olaf Helsets vei 6
Fax: (+47) 2262 8951 |   N0621 Oslo, NORWAY




[-- Attachment #2: Type: TEXT/PLAIN, Size: 6230 bytes --]

--- linux-2.4.20-pre11/include/linux/mmzone.h.old	Mon Oct 28 17:32:34 2002
+++ linux-2.4.20-pre11/include/linux/mmzone.h	Mon Oct 28 06:40:59 2002
@@ -26,6 +26,14 @@
 
 struct pglist_data;
 
+#define MAX_PER_CPU_PAGES 512
+
+typedef struct per_cpu_pages_s {
+	int nr_pages;
+	int max_nr_pages;
+	struct list_head head;
+} __attribute__((aligned(L1_CACHE_BYTES))) per_cpu_t;
+
 /*
  * On machines where it is needed (eg PCs) we divide physical memory
  * into multiple physical zones. On a PC we have 3 zones:
@@ -38,6 +46,7 @@
 	/*
 	 * Commonly accessed fields:
 	 */
+	per_cpu_t		cpu_pages [NR_CPUS];
 	spinlock_t		lock;
 	unsigned long		free_pages;
 	unsigned long		pages_min, pages_low, pages_high;
--- linux-2.4.20-pre11/mm/page_alloc.c.old	Mon Oct 28 06:18:22 2002
+++ linux-2.4.20-pre11/mm/page_alloc.c	Mon Oct 28 06:35:24 2002
@@ -10,6 +10,7 @@
  *  Reshaped it to be a zoned allocator, Ingo Molnar, Red Hat, 1999
  *  Discontiguous memory support, Kanoj Sarcar, SGI, Nov 1999
  *  Zone balancing, Kanoj Sarcar, SGI, Jan 2000
+ *  Per-CPU page pool, Ingo Molnar, Red Hat, 2001, 2002
  */
 
 #include <linux/config.h>
@@ -21,6 +22,7 @@
 #include <linux/bootmem.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#include <linux/smp.h>
 
 int nr_swap_pages;
 int nr_active_pages;
@@ -54,6 +56,73 @@
 )
 
 /*
+ * Inline functions to control some balancing in the VM.
+ *
+ * Note that we do both global and per-zone balancing, with
+ * most of the balancing done globally.
+ */
+#define PLENTY_FACTOR   2
+#define ALL_ZONES       NULL
+#define ANY_ZONE        (struct zone_struct *)(~0UL)
+#define INACTIVE_FACTOR 5
+
+#define VM_MIN  0
+#define VM_LOW  1
+#define VM_HIGH 2
+#define VM_PLENTY 3
+static inline int zone_free_limit(struct zone_struct * zone, int limit)
+{
+	int free, target, delta;
+
+	/* This is really nasty, but GCC should completely optimise it away. */
+	if (limit == VM_MIN)
+		target = zone->pages_min;
+	else if (limit == VM_LOW)
+		target = zone->pages_low;
+	else if (limit == VM_HIGH)
+		target = zone->pages_high;
+	else
+		target = zone->pages_high * PLENTY_FACTOR;
+
+	free = zone->free_pages;
+	delta = target - free;
+
+	return delta;
+}
+
+static inline int free_limit(struct zone_struct * zone, int limit)
+{
+	int shortage = 0, local;
+
+	if (zone == ALL_ZONES) {
+		for_each_zone(zone)
+			shortage += zone_free_limit(zone, limit);
+	} else if (zone == ANY_ZONE) {
+		for_each_zone(zone) {
+			local = zone_free_limit(zone, limit);
+			shortage += max(local, 0);
+		}
+	} else {
+		shortage = zone_free_limit(zone, limit);
+	}
+
+	return shortage;
+}
+
+/*
+ * free_high - test if amount of free pages is less than ideal
+ * @zone: zone to test, ALL_ZONES to test memory globally
+ *
+ * Returns a positive value if the number of free and clean
+ * pages is below kswapd's target, zero or negative if we
+ * have more than enough free and clean pages.
+ */
+static inline int free_high(struct zone_struct * zone)
+{
+	return free_limit(zone, VM_HIGH);
+}
+
+/*
  * Freeing function for a buddy system allocator.
  * Contrary to prior comments, this is *NOT* hairy, and there
  * is no reason for anyone not to understand it.
@@ -84,6 +153,7 @@
 	unsigned long index, page_idx, mask, flags;
 	free_area_t *area;
 	struct page *base;
+	per_cpu_t *per_cpu;
 	zone_t *zone;
 
 	/*
@@ -96,6 +166,13 @@
 		lru_cache_del(page);
 	}
 
+	/*
+	 * This late check is safe because reserved pages do not
+	 * have a valid page->count. This trick avoids overhead
+	 * in __free_pages().
+	 */
+	if (PageReserved(page))
+		return;
 	if (page->buffers)
 		BUG();
 	if (page->mapping)
@@ -123,8 +200,19 @@
 
 	area = zone->free_area + order;
 
-	spin_lock_irqsave(&zone->lock, flags);
+	per_cpu = zone->cpu_pages + smp_processor_id();
+
+	__save_flags(flags);
+	__cli();
+	if (!order && (per_cpu->nr_pages < per_cpu->max_nr_pages) && (free_high(zone)<=0)) {
+		list_add(&page->list, &per_cpu->head);
+		per_cpu->nr_pages++;
+		__restore_flags(flags);
+		return;
+	}
 
+	spin_lock(&zone->lock);
+	
 	zone->free_pages -= mask;
 
 	while (mask + (1 << (MAX_ORDER-1))) {
@@ -198,13 +286,31 @@
 static FASTCALL(struct page * rmqueue(zone_t *zone, unsigned int order));
 static struct page * rmqueue(zone_t *zone, unsigned int order)
 {
+	per_cpu_t *per_cpu = zone->cpu_pages + smp_processor_id();
 	free_area_t * area = zone->free_area + order;
 	unsigned int curr_order = order;
 	struct list_head *head, *curr;
 	unsigned long flags;
 	struct page *page;
+	int threshold = 0;
+
+	if (!(current->flags & PF_MEMALLOC)) threshold = per_cpu->max_nr_pages/8;
+	__save_flags(flags);
+	__cli();
+
+	if (!order && (per_cpu->nr_pages>threshold)) {
+		if (list_empty(&per_cpu->head))
+			BUG();
+		page = list_entry(per_cpu->head.next, struct page, list);
+		list_del(&page->list);
+		per_cpu->nr_pages--;
+		__restore_flags(flags);
 
-	spin_lock_irqsave(&zone->lock, flags);
+		set_page_count(page, 1);
+		return page;
+	}
+
+	spin_lock(&zone->lock);
 	do {
 		head = &area->free_list;
 		curr = head->next;
@@ -450,7 +556,7 @@
 
 void __free_pages(struct page *page, unsigned int order)
 {
-	if (!PageReserved(page) && put_page_testzero(page))
+	if (put_page_testzero(page))
 		__free_pages_ok(page, order);
 }
 
@@ -726,6 +832,7 @@
 
 	offset = lmem_map - mem_map;	
 	for (j = 0; j < MAX_NR_ZONES; j++) {
+		int k;
 		zone_t *zone = pgdat->node_zones + j;
 		unsigned long mask;
 		unsigned long size, realsize;
@@ -738,6 +845,18 @@
 		printk("zone(%lu): %lu pages.\n", j, size);
 		zone->size = size;
 		zone->name = zone_names[j];
+
+		for (k = 0; k < NR_CPUS; k++) {
+			per_cpu_t *per_cpu = zone->cpu_pages + k;
+
+			INIT_LIST_HEAD(&per_cpu->head);
+			per_cpu->max_nr_pages = realsize / smp_num_cpus / 128;
+			if (per_cpu->max_nr_pages > MAX_PER_CPU_PAGES)
+				per_cpu->max_nr_pages = MAX_PER_CPU_PAGES;
+			else
+				if (!per_cpu->max_nr_pages)
+					per_cpu->max_nr_pages = 1;
+		}
 		zone->lock = SPIN_LOCK_UNLOCKED;
 		zone->zone_pgdat = pgdat;
 		zone->free_pages = 0;

