* Re: SWSUSP Discontiguous pagedirs
       [not found] ` <20030227132024.GB27084@atrey.karlin.mff.cuni.cz>
@ 2003-02-27 18:42   ` Nigel Cunningham
  2003-03-01  4:22   ` SWSUSP Discontiguous pagedir patch Nigel Cunningham
  1 sibling, 0 replies; 42+ messages in thread
From: Nigel Cunningham @ 2003-02-27 18:42 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Linux Kernel Mailing List

On Fri, 2003-02-28 at 02:20, Pavel Machek wrote:
> Hi!
>
> > SPAM: Content analysis details: (6.30 hits, 5 required)
> > SPAM: SUBJ_HAS_SPACES (2.6 points) Subject contains lots of white space
> Spam assassin clearly does not like you :-(.

I'll make my subject lines shorter :>

> > Well, I might ask how many people you know with 4GB of swap and 4GB of
> > RAM they want to suspend to disk :> Don't forget we still aren't
> > handling himem anyway (at least not last time I checked). As y
>
> Well, on x86-64 it should be able to suspend 8GB machine just fine --
> being 64bit means you don't have to deal with himem. Plus it would
> only be 2GB limit on x86-64.

I was thinking about this before I got up. If the code were a hybrid of
what we have now and my changes, there wouldn't need to be such a limit.
If I am thinking straight, the number of pages to be copied back using
suspend_asm.S will always be within the current limit, because no
highmem pages will be needed during the suspend process, so they can all
be put in the second pageset. The only issue then is storage of the data
for those pageset 2 pages. We could just add another layer of
indirection(!), but that would result in quite inefficient memory usage
in the pagedir struct beyond the end of pageset 1. Still, the
alternative is more complicated code, and if you have that much memory
anyway... I'll put some more thought into this.

>
> > If you still doubt the usefulness, perhaps you might try loading up 2.4,
> > first with beta 16 applied and then with beta 18. In both cases, compare
> > performance after loading up a bunch of applications and doing a suspend
> > to disk cycle. Depending of course on what the applications are, beta 16
> > will be sluggish to respond (since it has to access disk a lot) whereas
> > beta 18 will be much more responsive - as if you'd never suspended. To
> > think in marketing terms for a moment, which would you rather have a
> > reviewer comparing Linux and Windows see?
>
> As I'm used to machine pushed to swap, I can tolerate it quite
> easily. shell/emacs/mutt is what I use, anyway...

Mmm, but not all of us do. I'm using Evolution, Win4Lin...

>
> I don't know. I'd let Linus decide. I don't like hard limit on ammount
> of mem, through.
>
> Is it possible to use some userspace app to page it back it?

All things are possible, but not everything is beneficial :> (Bible).
Actually, I'm not sure it's possible in this case. We can't restart
processes when most of their memory space, along with all of the page
cache and swap cache, is still on disk and they think it's in RAM.

Regards,

Nigel

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: SWSUSP Discontiguous pagedir patch
       [not found] ` <20030227132024.GB27084@atrey.karlin.mff.cuni.cz>
  2003-02-27 18:42   ` SWSUSP Discontiguous pagedirs Nigel Cunningham
@ 2003-03-01  4:22   ` Nigel Cunningham
  2003-03-02 23:55     ` Patrick Mochel
  1 sibling, 1 reply; 42+ messages in thread
From: Nigel Cunningham @ 2003-03-01 4:22 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Linux Kernel Mailing List

On Fri, 2003-02-28 at 02:20, Pavel Machek wrote:
> > > b) introduces hard limit on how much pages you can save (4GB).
> >
> > Well, I might ask how many people you know with 4GB of swap and 4GB of
> > RAM they want to suspend to disk :> Don't forget we still aren't
> > handling himem anyway (at least not last time I checked). As y
>
> Well, on x86-64 it should be able to suspend 8GB machine just fine --
> being 64bit means you don't have to deal with himem. Plus it would
> only be 2GB limit on x86-64.
> [deletia]
>
> I don't know. I'd let Linus decide. I don't like hard limit on ammount
> of mem, through.

Hi again.

I've thought things through some more. We need to keep in mind that
other patches I intend to submit save the pages that aren't needed for
the suspend process itself separately. Since this includes all the
highmem pages and a reasonable proportion of the normal pages (easily
more than half when we're talking high usage), we don't need to eat
memory and we don't really have a hard limit on the size of the image.
Presumably the same conditions will apply under x86-64.

Thus, I still think we can go with the patch I submitted before. I've
rediffed it against 2.5.63 (less the bits already applied).

Regards, Nigel diff -ruN linux-2.5.63/arch/i386/kernel/Makefile linux-2.5.63-01/arch/i386/kernel/Makefile --- linux-2.5.63/arch/i386/kernel/Makefile 2003-03-01 15:10:16.000000000 +1300 +++ linux-2.5.63-01/arch/i386/kernel/Makefile 2003-03-01 15:14:28.000000000 +1300 @@ -23,7 +23,7 @@ obj-$(CONFIG_X86_MPPARSE) += mpparse.o obj-$(CONFIG_X86_LOCAL_APIC) += apic.o nmi.o obj-$(CONFIG_X86_IO_APIC) += io_apic.o -obj-$(CONFIG_SOFTWARE_SUSPEND) += suspend.o suspend_asm.o +obj-$(CONFIG_SOFTWARE_SUSPEND) += suspend.o obj-$(CONFIG_X86_NUMAQ) += numaq.o obj-$(CONFIG_EDD) += edd.o obj-$(CONFIG_MODULES) += module.o diff -ruN linux-2.5.63/arch/i386/kernel/suspend.c linux-2.5.63-01/arch/i386/kernel/suspend.c --- linux-2.5.63/arch/i386/kernel/suspend.c 2003-02-20 08:25:26.000000000 +1300 +++ linux-2.5.63-01/arch/i386/kernel/suspend.c 2003-02-20 08:27:36.000000000 +1300 @@ -133,3 +133,84 @@ } } + +/* Local variables for do_magic */ +static int loop __nosavedata = 0; +static int loop2 __nosavedata = 0; + +/* + * FIXME: This function should really be written in assembly. Actually + * requirement is that it does not touch stack, because %esp will be + * wrong during resume before restore_processor_context(). Check + * assembly if you modify this. 
+ */ +void do_magic(int resume) +{ + if (!resume) { + do_magic_suspend_1(); + save_processor_state(); /* We need to capture registers and memory at "same time" */ + asm ( "movl %esp, saved_context_esp\n\t" + "movl %eax, saved_context_eax\n\t" + "movl %ebx, saved_context_ebx\n\t" + "movl %ecx, saved_context_ecx\n\t" + "movl %edx, saved_context_edx\n\t" + "movl %ebp, saved_context_ebp\n\t" + "movl %esi, saved_context_esi\n\t" + "movl %edi, saved_context_edi\n\t" + "pushfl ; popl saved_context_eflags\n\t"); + + do_magic_suspend_2(); /* If everything goes okay, this function does not return */ + return; + } + + /* We want to run from swapper_pg_dir, since swapper_pg_dir is stored in constant + * place in memory + */ + + __asm__( "movl %%ecx,%%cr3\n" ::"c"(__pa(swapper_pg_dir))); + +/* + * Final function for resuming: after copying the pages to their original + * position, it restores the register state. + * + * What about page tables? Writing data pages may toggle + * accessed/dirty bits in our page tables. That should be no problems + * with 4MB page tables. That's why we require have_pse. + * + * This loops destroys stack from under itself, so it better should + * not use any stack space, itself. When this function is entered at + * resume time, we move stack to _old_ place. This is means that this + * function must use no stack and no local variables in registers, + * until calling restore_processor_context(); + * + * Critical section here: noone should touch saved memory after + * do_magic_resume_1; copying works, because nr_copy_pages, + * pagedir_nosave, loop and loop2 are nosavedata. 
+ */ + do_magic_resume_1(); + + for (loop=0; loop < nr_copy_pages; loop++) { + /* You may not call something (like copy_page) here: see above */ + for (loop2=0; loop2 < PAGE_SIZE; loop2++) { + *(((char *)(PAGEDIR_ENTRY(pagedir_nosave,loop)->orig_address))+loop2) = + *(((char *)(PAGEDIR_ENTRY(pagedir_nosave,loop)->address))+loop2); + __flush_tlb(); + } + } + + asm( "movl saved_context_esp, %esp\n\t" + "movl saved_context_ebp, %ebp\n\t" + "movl saved_context_eax, %eax\n\t" + "movl saved_context_ebx, %ebx\n\t" + "movl saved_context_ecx, %ecx\n\t" + "movl saved_context_edx, %edx\n\t" + "movl saved_context_esi, %esi\n\t" + "movl saved_context_edi, %edi\n\t"); + restore_processor_state(); + asm("pushl saved_context_eflags ; popfl\n\t"); + +/* Ahah, we now run with our old stack, and with registers copied from + suspend time */ + + do_magic_resume_2(); +} diff -ruN linux-2.5.63/include/linux/page-flags.h linux-2.5.63-01/include/linux/page-flags.h --- linux-2.5.63/include/linux/page-flags.h 2003-02-20 07:59:33.000000000 +1300 +++ linux-2.5.63-01/include/linux/page-flags.h 2003-02-20 08:28:31.000000000 +1300 @@ -74,6 +74,7 @@ #define PG_mappedtodisk 17 /* Has blocks allocated on-disk */ #define PG_reclaim 18 /* To be reclaimed asap */ #define PG_compound 19 /* Part of a compound page */ +#define PG_collides 20 /* swsusp - page used in save image */ /* * Global page accounting. One instance per CPU. Only unsigned longs are @@ -256,6 +257,9 @@ #define SetPageCompound(page) set_bit(PG_compound, &(page)->flags) #define ClearPageCompound(page) clear_bit(PG_compound, &(page)->flags) +#define PageCollides(page) test_bit(PG_collides, &(page)->flags) +#define SetPageCollides(page) set_bit(PG_collides, &(page)->flags) +#define ClearPageCollides(page) clear_bit(PG_collides, &(page)->flags) /* * The PageSwapCache predicate doesn't use a PG_flag at this time, * but it may again do so one day. 
diff -ruN linux-2.5.63/include/linux/suspend.h linux-2.5.63-01/include/linux/suspend.h --- linux-2.5.63/include/linux/suspend.h 2003-01-15 17:00:58.000000000 +1300 +++ linux-2.5.63-01/include/linux/suspend.h 2003-02-20 08:27:36.000000000 +1300 @@ -34,7 +34,7 @@ char version[20]; int num_cpus; int page_size; - suspend_pagedir_t *suspend_pagedir; + suspend_pagedir_t **suspend_pagedir; unsigned int num_pbes; struct swap_location { char filename[SWAP_FILENAME_MAXLENGTH]; @@ -42,6 +42,8 @@ }; #define SUSPEND_PD_PAGES(x) (((x)*sizeof(struct pbe))/PAGE_SIZE+1) +#define PAGEDIR_CAPACITY(x) (((x)*PAGE_SIZE/sizeof(struct pbe))) +#define PAGEDIR_ENTRY(pagedir, i) (pagedir[i/PAGEDIR_CAPACITY(1)] + (i%PAGEDIR_CAPACITY(1))) /* mm/vmscan.c */ extern int shrink_mem(void); @@ -61,7 +63,7 @@ extern void thaw_processes(void); extern unsigned int nr_copy_pages __nosavedata; -extern suspend_pagedir_t *pagedir_nosave __nosavedata; +extern suspend_pagedir_t **pagedir_nosave __nosavedata; /* Communication between kernel/suspend.c and arch/i386/suspend.c */ diff -ruN linux-2.5.63/kernel/suspend.c linux-2.5.63-01/kernel/suspend.c --- linux-2.5.63/kernel/suspend.c 2003-02-20 07:59:34.000000000 +1300 +++ linux-2.5.63-01/kernel/suspend.c 2003-02-20 10:42:52.000000000 +1300 @@ -96,7 +96,6 @@ static int new_loglevel = 7; static int orig_loglevel = 0; static int orig_fgconsole, orig_kmsg; -static int pagedir_order_check; static int nr_copy_pages_check; static int resume_status = 0; @@ -116,9 +115,9 @@ allocated at time of resume, that travels through memory not to collide with anything. 
*/ -suspend_pagedir_t *pagedir_nosave __nosavedata = NULL; -static suspend_pagedir_t *pagedir_save; -static int pagedir_order __nosavedata = 0; +suspend_pagedir_t **pagedir_nosave __nosavedata = NULL; +static suspend_pagedir_t **pagedir_save = NULL; +static int pagedir_size __nosavedata = 0; struct link { char dummy[PAGE_SIZE - sizeof(swp_entry_t)]; @@ -395,7 +394,7 @@ { int i; swp_entry_t entry, prev = { 0 }; - int nr_pgdir_pages = SUSPEND_PD_PAGES(nr_copy_pages); + int pagedir_size = SUSPEND_PD_PAGES(nr_copy_pages); union diskpage *cur, *buffer = (union diskpage *)get_zeroed_page(GFP_ATOMIC); unsigned long address; struct page *page; @@ -410,16 +409,15 @@ if (swapfile_used[swp_type(entry)] != SWAPFILE_SUSPEND) panic("\nPage %d: not enough swapspace on suspend device", i ); - address = (pagedir_nosave+i)->address; + address = PAGEDIR_ENTRY(pagedir_nosave,i)->address; page = virt_to_page(address); rw_swap_page_sync(WRITE, entry, page); - (pagedir_nosave+i)->swap_address = entry; + PAGEDIR_ENTRY(pagedir_nosave,i)->swap_address = entry; } printk( "|\n" ); - printk( "Writing pagedir (%d pages): ", nr_pgdir_pages); - for (i=0; i<nr_pgdir_pages; i++) { - cur = (union diskpage *)((char *) pagedir_nosave)+i; - BUG_ON ((char *) cur != (((char *) pagedir_nosave) + i*PAGE_SIZE)); + printk( "Writing pagedir (%d pages): ", pagedir_size); + for (i=0; i<pagedir_size; i++) { + cur = (union diskpage *) pagedir_nosave[i]; printk( "." ); if (!(entry = get_swap_page()).val) { printk(KERN_CRIT "Not enough swapspace when writing pgdir\n" ); @@ -467,7 +465,7 @@ } /* if pagedir_p != NULL it also copies the counted pages */ -static int count_and_copy_data_pages(struct pbe *pagedir_p) +static int count_and_copy_data_pages(struct pbe **pagedir_p) { int chunk_size; int nr_copy_pages = 0; @@ -507,65 +505,88 @@ critical bios data? 
*/ } else BUG(); - nr_copy_pages++; if (pagedir_p) { - pagedir_p->orig_address = ADDRESS(pfn); - copy_page((void *) pagedir_p->address, (void *) pagedir_p->orig_address); - pagedir_p++; + PAGEDIR_ENTRY(pagedir_p, nr_copy_pages)->orig_address = ADDRESS(pfn); + copy_page((void *) PAGEDIR_ENTRY(pagedir_p, nr_copy_pages)->address, (void *) PAGEDIR_ENTRY(pagedir_p, nr_copy_pages)->orig_address); } + nr_copy_pages++; } return nr_copy_pages; } -static void free_suspend_pagedir(unsigned long this_pagedir) +static void free_suspend_pagedir(struct pbe ** this_pagedir) { - struct page *page; - int pfn; - unsigned long this_pagedir_end = this_pagedir + - (PAGE_SIZE << pagedir_order); + int i; + int rangestart = -1, rangeend = -1; - for(pfn = 0; pfn < num_physpages; pfn++) { - page = pfn_to_page(pfn); - if (!TestClearPageNosave(page)) - continue; + if (pagedir_size == 0) + return; - if (ADDRESS(pfn) >= this_pagedir && ADDRESS(pfn) < this_pagedir_end) - continue; /* old pagedir gets freed in one */ - - free_page(ADDRESS(pfn)); + for(i = 0; i < nr_copy_pages; i++) { + if (PAGEDIR_ENTRY(this_pagedir,i)->address) { + if (rangestart > -1) { + printk("Pagedir entry %d-%d address2 not set!\n", rangestart, rangeend); + rangestart = -1; + } + ClearPageNosave(virt_to_page(PAGEDIR_ENTRY(this_pagedir,i)->address)); + free_page(PAGEDIR_ENTRY(this_pagedir,i)->address); + } else { + if (rangestart == -1) + rangestart = i; + rangeend = i; + } } - free_pages(this_pagedir, pagedir_order); + + if (rangestart > -1) + printk("Pagedir entry %d-%d address not set!\n", rangestart, nr_copy_pages - 1); + + for(i = 0; i < pagedir_size; i++) + free_page((unsigned long) this_pagedir[i]); + + free_page((unsigned long) this_pagedir); + this_pagedir = NULL; + nr_copy_pages = 0; + pagedir_size = 0; } -static suspend_pagedir_t *create_suspend_pagedir(int nr_copy_pages) +static suspend_pagedir_t **create_suspend_pagedir(int nr_copy_pages) { + suspend_pagedir_t **pagedir; + struct pbe **p; int i; - 
suspend_pagedir_t *pagedir; - struct pbe *p; - struct page *page; - pagedir_order = get_bitmask_order(SUSPEND_PD_PAGES(nr_copy_pages)); + pagedir_size = SUSPEND_PD_PAGES(nr_copy_pages); - p = pagedir = (suspend_pagedir_t *)__get_free_pages(GFP_ATOMIC | __GFP_COLD, pagedir_order); - if(!pagedir) + p = pagedir = (suspend_pagedir_t **)__get_free_pages(GFP_ATOMIC | __GFP_COLD, 0); + if(!p) return NULL; - page = virt_to_page(pagedir); - for(i=0; i < 1<<pagedir_order; i++) - SetPageNosave(page++); - + /* We aren't setting the pagedir itself Nosave because we have to be able + * to free it during resume, after restoring the image. This means nr_copy_pages + * needs to be adjusted */ + + for (i = 0; i < pagedir_size; i++) { + p[i] = (suspend_pagedir_t *)__get_free_pages(GFP_ATOMIC, 0); + if (!p[i]) { + int j; + for (j = 0; j < i; j++) { + free_page((unsigned long) p[j]); + } + free_page((unsigned long) p); + return NULL; + } + } + while(nr_copy_pages--) { - p->address = get_zeroed_page(GFP_ATOMIC | __GFP_COLD); - if(!p->address) { - free_suspend_pagedir((unsigned long) pagedir); + PAGEDIR_ENTRY(p, nr_copy_pages)->address = get_zeroed_page(GFP_ATOMIC | __GFP_COLD); + if(!PAGEDIR_ENTRY(p, nr_copy_pages)->address) { + free_suspend_pagedir(p); return NULL; } - printk("."); - SetPageNosave(virt_to_page(p->address)); - p->orig_address = 0; - p++; + SetPageNosave(virt_to_page(PAGEDIR_ENTRY(p, nr_copy_pages)->address)); + PAGEDIR_ENTRY(p, nr_copy_pages)->orig_address = 0; } - return pagedir; + return p; } static int prepare_suspend_console(void) @@ -604,12 +625,13 @@ static int prepare_suspend_processes(void) { + PRINTK("Syncing...\n"); + sys_sync(); if (freeze_processes()) { printk( KERN_ERR "Suspend failed: Not all processes stopped!\n" ); thaw_processes(); return 1; } - sys_sync(); return 0; } @@ -684,6 +706,7 @@ pagedir_nosave = NULL; printk( "/critical section: Counting pages to copy" ); nr_copy_pages = count_and_copy_data_pages(NULL); + nr_copy_pages += 1 + 
SUSPEND_PD_PAGES(nr_copy_pages); nr_needed_pages = nr_copy_pages + PAGES_FOR_IO; printk(" (pages needed: %d+%d=%d free: %d)\n",nr_copy_pages,PAGES_FOR_IO,nr_needed_pages,nr_free_pages()); @@ -713,7 +736,6 @@ return 1; } nr_copy_pages_check = nr_copy_pages; - pagedir_order_check = pagedir_order; drain_local_pages(); /* During allocating of suspend pagedir, new cold pages may appear. Kill them */ if (nr_copy_pages != count_and_copy_data_pages(pagedir_nosave)) /* copy */ @@ -789,12 +811,11 @@ void do_magic_resume_2(void) { BUG_ON (nr_copy_pages_check != nr_copy_pages); - BUG_ON (pagedir_order_check != pagedir_order); __flush_tlb_global(); /* Even mappings of "global" things (vmalloc) need to be fixed */ PRINTK( "Freeing prev allocated pagedir\n" ); - free_suspend_pagedir((unsigned long) pagedir_save); + free_suspend_pagedir(pagedir_save); spin_unlock_irq(&suspend_pagedir_lock); drivers_resume(RESUME_ALL_PHASES); @@ -831,7 +852,7 @@ spin_lock_irq(&suspend_pagedir_lock); /* Done to disable interrupts */ mdelay(1000); - free_pages((unsigned long) pagedir_nosave, pagedir_order); + free_suspend_pagedir(pagedir_nosave); spin_unlock_irq(&suspend_pagedir_lock); mark_swapfiles(((swp_entry_t) {0}), MARK_SWAP_RESUME); PRINTK(KERN_WARNING "%sLeaving do_magic_suspend_2...\n", name_suspend); @@ -894,37 +915,23 @@ /* More restore stuff */ -/* FIXME: Why not memcpy(to, from, 1<<pagedir_order*PAGE_SIZE)? 
*/ -static void copy_pagedir(suspend_pagedir_t *to, suspend_pagedir_t *from) -{ - int i; - char *topointer=(char *)to, *frompointer=(char *)from; - - for(i=0; i < 1 << pagedir_order; i++) { - copy_page(topointer, frompointer); - topointer += PAGE_SIZE; - frompointer += PAGE_SIZE; - } -} - -#define does_collide(addr) does_collide_order(pagedir_nosave, addr, 0) - -/* - * Returns true if given address/order collides with any orig_address - */ -static int does_collide_order(suspend_pagedir_t *pagedir, unsigned long addr, - int order) -{ +static void warmup_collision_cache(suspend_pagedir_t **pagedir) { int i; - unsigned long addre = addr + (PAGE_SIZE<<order); - for(i=0; i < nr_copy_pages; i++) - if((pagedir+i)->orig_address >= addr && - (pagedir+i)->orig_address < addre) - return 1; + PRINTK("Setting up pagedir cache"); + for (i = 0; i < max_pfn; i++) + ClearPageCollides(pfn_to_page(i)); - return 0; + for(i=0; i < nr_copy_pages; i++) { + SetPageCollides(virt_to_page(PAGEDIR_ENTRY(pagedir, i)->orig_address)); + if (!(i%800)) { + PRINTK("."); + } + } + PRINTK("%d", i); + PRINTK("|\n"); } +#define does_collide(address) (PageCollides(virt_to_page(address))) /* * We check here that pagedir & pages it points to won't collide with pages @@ -932,64 +939,106 @@ */ static int check_pagedir(void) { - int i; + int i, nrdone = 0; + void **eaten_memory = NULL; + void **c = eaten_memory, *f, *addr; for(i=0; i < nr_copy_pages; i++) { - unsigned long addr; - - do { - addr = get_zeroed_page(GFP_ATOMIC); - if(!addr) - return -ENOMEM; - } while (does_collide(addr)); - - (pagedir_nosave+i)->address = addr; + while ((addr = (void *) get_zeroed_page(GFP_ATOMIC))) { + memset(addr, 0, PAGE_SIZE); + if (!does_collide((unsigned long) addr)) { + break; + } + eaten_memory = addr; + *eaten_memory = c; + c = eaten_memory; + } + PAGEDIR_ENTRY(pagedir_nosave,i)->address = (unsigned long) addr; + nrdone++; + } + + // Free unwanted memory + c = eaten_memory; + while(c) { + f = c; + c = *c; + if (f) + 
free_page((unsigned long) f); } + eaten_memory = NULL; + return 0; } static int relocate_pagedir(void) { + void **eaten_memory = NULL; + void **c = eaten_memory, *m = NULL, *f; + int oom = 0, i, numeaten = 0; + int pagedir_size = SUSPEND_PD_PAGES(nr_copy_pages); + /* * We have to avoid recursion (not to overflow kernel stack), * and that's why code looks pretty cryptic */ - suspend_pagedir_t *new_pagedir, *old_pagedir = pagedir_nosave; - void **eaten_memory = NULL; - void **c = eaten_memory, *m, *f; - - printk("Relocating pagedir"); - if(!does_collide_order(old_pagedir, (unsigned long)old_pagedir, pagedir_order)) { - printk("not neccessary\n"); - return 0; - } + PRINTK("Relocating conflicting parts of pagedir.\n"); - while ((m = (void *) __get_free_pages(GFP_ATOMIC, pagedir_order))) { - memset(m, 0, PAGE_SIZE); - if (!does_collide_order(old_pagedir, (unsigned long)m, pagedir_order)) - break; - eaten_memory = m; - printk( "." ); - *eaten_memory = c; - c = eaten_memory; - } + for (i = -1; i < pagedir_size; i++) { + int this_collides = 0; - if (!m) - return -ENOMEM; - - pagedir_nosave = new_pagedir = m; - copy_pagedir(new_pagedir, old_pagedir); + if (i == -1) + this_collides = does_collide((unsigned long) pagedir_nosave); + else + this_collides = does_collide((unsigned long) pagedir_nosave[i]); + + if (this_collides) { + while ((m = (void *) __get_free_pages(GFP_ATOMIC, 0))) { + memset(m, 0, PAGE_SIZE); + if (!does_collide((unsigned long)m)) { + if (i == -1) { + copy_page(m, pagedir_nosave); + free_page((unsigned long) pagedir_nosave); + pagedir_nosave = m; + } + else { + copy_page(m, (void *) pagedir_nosave[i]); + free_page((unsigned long) pagedir_nosave[i]); + pagedir_nosave[i] = m; + } + break; + } + numeaten++; + eaten_memory = m; + PRINTK("Eaten: %d. 
Still to try:%d\r", numeaten, nr_free_pages()); + *eaten_memory = c; + c = eaten_memory; + } + if (!m) { + printk("\nRan out of memory trying to relocate pagedir (tried %d pages).\n", numeaten); + oom = 1; + break; + } + } + } + + PRINTK("\nFreeing rejected memory locations..."); c = eaten_memory; while(c) { - printk(":"); - f = *c; + f = c; c = *c; if (f) - free_pages((unsigned long)f, pagedir_order); + free_pages((unsigned long) f, 0); } - printk("|\n"); + eaten_memory = NULL; + + PRINTK("\n"); + + if (oom) + return -ENOMEM; + else + return 0; return 0; } @@ -1062,7 +1111,7 @@ static int __read_suspend_image(struct block_device *bdev, union diskpage *cur, int noresume) { swp_entry_t next; - int i, nr_pgdir_pages; + int i, pagedir_size; #define PREPARENEXT \ { next = cur->link.next; \ @@ -1110,24 +1159,39 @@ pagedir_save = cur->sh.suspend_pagedir; nr_copy_pages = cur->sh.num_pbes; - nr_pgdir_pages = SUSPEND_PD_PAGES(nr_copy_pages); - pagedir_order = get_bitmask_order(nr_pgdir_pages); + pagedir_size = SUSPEND_PD_PAGES(nr_copy_pages); - pagedir_nosave = (suspend_pagedir_t *)__get_free_pages(GFP_ATOMIC, pagedir_order); + pagedir_nosave = (suspend_pagedir_t **)__get_free_pages(GFP_ATOMIC, 0); if (!pagedir_nosave) return -ENOMEM; + { + int i; + for (i = 0; i < pagedir_size; i++) { + pagedir_nosave[i] = (suspend_pagedir_t *)__get_free_pages(GFP_ATOMIC, 0); + if (!pagedir_nosave[i]) { + int j; + for (j = 0; j < i; j++) + free_page((unsigned long) pagedir_nosave[j]); + free_page((unsigned long) pagedir_nosave); + spin_unlock_irq(&suspend_pagedir_lock); + return -ENOMEM; + } + } + } PRINTK( "%sReading pagedir, ", name_resume ); /* We get pages in reverse order of saving! 
*/ - for (i=nr_pgdir_pages-1; i>=0; i--) { + for (i=pagedir_size-1; i>=0; i--) { BUG_ON (!next.val); - cur = (union diskpage *)((char *) pagedir_nosave)+i; + cur = (union diskpage *) pagedir_nosave[i]; if (bdev_read_page(bdev, next.val, cur)) return -EIO; PREPARENEXT; } BUG_ON (next.val); + warmup_collision_cache(pagedir_nosave); + if (relocate_pagedir()) return -ENOMEM; if (check_pagedir()) @@ -1135,12 +1199,12 @@ printk( "Reading image data (%d pages): ", nr_copy_pages ); for(i=0; i < nr_copy_pages; i++) { - swp_entry_t swap_address = (pagedir_nosave+i)->swap_address; + swp_entry_t swap_address = PAGEDIR_ENTRY(pagedir_nosave,i)->swap_address; if (!(i%100)) printk( "." ); /* You do not need to check for overlaps... ... check_pagedir already did this work */ - if (bdev_read_page(bdev, swp_offset(swap_address) * PAGE_SIZE, (char *)((pagedir_nosave+i)->address))) + if (bdev_read_page(bdev, swp_offset(swap_address) * PAGE_SIZE, (char *)(PAGEDIR_ENTRY(pagedir_nosave,i)->address))) return -EIO; } printk( "|\n" ); ^ permalink raw reply [flat|nested] 42+ messages in thread
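
The core of the patch above is the switch from one contiguous array of struct pbe to a page of pointers, each pointing at a separately allocated page of entries, with PAGEDIR_ENTRY() doing the lookup. A minimal userspace sketch of that two-level layout follows. It is an illustration only, not the kernel code: the struct pbe fields are a simplified stand-in, calloc() replaces __get_free_pages(), and the helper names (pd_pages, pagedir_entry, alloc_pagedir, free_pagedir) are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE 4096UL

/* Simplified stand-in for the kernel's struct pbe; the real layout may differ. */
struct pbe {
    unsigned long address;      /* where the saved copy lives */
    unsigned long orig_address; /* where the page has to go back to */
    unsigned long swap_address; /* stand-in for swp_entry_t */
};

/* Entries that fit in one directory page. */
#define PBES_PER_PAGE (PAGE_SIZE / sizeof(struct pbe))

/* Directory pages needed for n entries (rounds up; not the kernel macro). */
static size_t pd_pages(size_t n)
{
    return (n + PBES_PER_PAGE - 1) / PBES_PER_PAGE;
}

/* Two-level lookup: choose the directory page, then the slot inside it. */
static struct pbe *pagedir_entry(struct pbe **pagedir, size_t i)
{
    return pagedir[i / PBES_PER_PAGE] + (i % PBES_PER_PAGE);
}

/* Free the first npages directory pages, then the top-level pointer table. */
static void free_pagedir(struct pbe **pagedir, size_t npages)
{
    size_t k;

    if (!pagedir)
        return;
    for (k = 0; k < npages; k++)
        free(pagedir[k]);
    free(pagedir);
}

/* Allocate each directory page separately; none need be adjacent in memory. */
static struct pbe **alloc_pagedir(size_t n)
{
    size_t npages = pd_pages(n), k;
    struct pbe **pagedir = calloc(npages ? npages : 1, sizeof(*pagedir));

    if (!pagedir)
        return NULL;
    for (k = 0; k < npages; k++) {
        pagedir[k] = calloc(1, PAGE_SIZE);
        if (!pagedir[k]) {
            free_pagedir(pagedir, k); /* undo the pages obtained so far */
            return NULL;
        }
    }
    return pagedir;
}
```

A caller would use alloc_pagedir(nr) once, reach entry i through pagedir_entry(pd, i), and release everything with free_pagedir(pd, pd_pages(nr)). One design note: the sketch uses an inline function rather than a macro because PAGEDIR_ENTRY() as posted does not parenthesise its i argument, so an expression such as a+b passed as the index would be parsed incorrectly; the call sites in the patch only pass plain identifiers, so this appears harmless there.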
* Re: SWSUSP Discontiguous pagedir patch 2003-03-01 4:22 ` SWSUSP Discontiguous pagedir patch Nigel Cunningham @ 2003-03-02 23:55 ` Patrick Mochel 2003-03-03 2:06 ` Nigel Cunningham ` (3 more replies) 0 siblings, 4 replies; 42+ messages in thread From: Patrick Mochel @ 2003-03-02 23:55 UTC (permalink / raw) To: Nigel Cunningham; +Cc: Pavel Machek, Linux Kernel Mailing List Hi there. > Thus, I still think we can go with the patch I submitted before. I've > rediffed it against 2.5.63 (less the bits already applied). I've spent the last week reading, reviewing, and rewriting major portions of swsusp. I've actually been reasonably impressed, once I was able to get the code into a much more readable state. All in all, I think the idea of saving state to swap is dangerous for various reasons. However, I like some of the other concepts of the code, and will use them in developing a more palatable mechanism of doing STDs (hehe, I love saying that). Once I've successfully broken out the pieces I want to reuse, I'll post the cumulative patch. In the meantime, the incremental diffs can be viewed here: http://ldm.bkbits.net:8080/linux-2.5-power In the meantime, I do have some comments on your patch.. > diff -ruN linux-2.5.63/arch/i386/kernel/suspend.c linux-2.5.63-01/arch/i386/kernel/suspend.c > --- linux-2.5.63/arch/i386/kernel/suspend.c 2003-02-20 08:25:26.000000000 +1300 > +++ linux-2.5.63-01/arch/i386/kernel/suspend.c 2003-02-20 08:27:36.000000000 +1300 Thank you for putting this back in C, it's much appreciated. 
> +void do_magic(int resume) > +{ > + if (!resume) { > + do_magic_suspend_1(); > + save_processor_state(); /* We need to capture registers and memory at "same time" */ > + asm ( "movl %esp, saved_context_esp\n\t" > + "movl %eax, saved_context_eax\n\t" > + "movl %ebx, saved_context_ebx\n\t" > + "movl %ecx, saved_context_ecx\n\t" > + "movl %edx, saved_context_edx\n\t" > + "movl %ebp, saved_context_ebp\n\t" > + "movl %esi, saved_context_esi\n\t" > + "movl %edi, saved_context_edi\n\t" On x86, %eax, %ecx, and %edx are local scratch registers, and don't need to be saved. Note that gcc may use them, so check the assembly output. > +/* > + * Final function for resuming: after copying the pages to their original > + * position, it restores the register state. > + * > + * What about page tables? Writing data pages may toggle > + * accessed/dirty bits in our page tables. That should be no problems > + * with 4MB page tables. That's why we require have_pse. > + * > + * This loops destroys stack from under itself, so it better should > + * not use any stack space, itself. When this function is entered at > + * resume time, we move stack to _old_ place. This is means that this > + * function must use no stack and no local variables in registers, > + * until calling restore_processor_context(); > + * > + * Critical section here: noone should touch saved memory after > + * do_magic_resume_1; copying works, because nr_copy_pages, > + * pagedir_nosave, loop and loop2 are nosavedata. > + */ Do you have something against indenting comments? 
;) > + for (loop=0; loop < nr_copy_pages; loop++) { > + /* You may not call something (like copy_page) here: see above */ > + for (loop2=0; loop2 < PAGE_SIZE; loop2++) { > + *(((char *)(PAGEDIR_ENTRY(pagedir_nosave,loop)->orig_address))+loop2) = > + *(((char *)(PAGEDIR_ENTRY(pagedir_nosave,loop)->address))+loop2); > + __flush_tlb(); > + } > + } This is better done as for (loop = 0; loop < nr_copy_pagse; loop++) { memcpy((char *)pagedir_nosave[loop].orig_address, (char *)pagedir_nosave[loop].address, PAGE_SIZE); __flush_tlb(); } Is __flush_tlb() really necessary? > diff -ruN linux-2.5.63/include/linux/page-flags.h linux-2.5.63-01/include/linux/page-flags.h > --- linux-2.5.63/include/linux/page-flags.h 2003-02-20 07:59:33.000000000 +1300 > +++ linux-2.5.63-01/include/linux/page-flags.h 2003-02-20 08:28:31.000000000 +1300 > @@ -74,6 +74,7 @@ > #define PG_mappedtodisk 17 /* Has blocks allocated on-disk */ > #define PG_reclaim 18 /* To be reclaimed asap */ > #define PG_compound 19 /* Part of a compound page */ > +#define PG_collides 20 /* swsusp - page used in save image */ > > /* > * Global page accounting. One instance per CPU. Only unsigned longs are > @@ -256,6 +257,9 @@ > #define SetPageCompound(page) set_bit(PG_compound, &(page)->flags) > #define ClearPageCompound(page) clear_bit(PG_compound, &(page)->flags) > > +#define PageCollides(page) test_bit(PG_collides, &(page)->flags) > +#define SetPageCollides(page) set_bit(PG_collides, &(page)->flags) > +#define ClearPageCollides(page) clear_bit(PG_collides, &(page)->flags) > /* > * The PageSwapCache predicate doesn't use a PG_flag at this time, > * but it may again do so one day. 
> diff -ruN linux-2.5.63/include/linux/suspend.h linux-2.5.63-01/include/linux/suspend.h > --- linux-2.5.63/include/linux/suspend.h 2003-01-15 17:00:58.000000000 +1300 > +++ linux-2.5.63-01/include/linux/suspend.h 2003-02-20 08:27:36.000000000 +1300 > @@ -34,7 +34,7 @@ > char version[20]; > int num_cpus; > int page_size; > - suspend_pagedir_t *suspend_pagedir; > + suspend_pagedir_t **suspend_pagedir; > unsigned int num_pbes; > struct swap_location { > char filename[SWAP_FILENAME_MAXLENGTH]; > @@ -42,6 +42,8 @@ > }; > > #define SUSPEND_PD_PAGES(x) (((x)*sizeof(struct pbe))/PAGE_SIZE+1) > +#define PAGEDIR_CAPACITY(x) (((x)*PAGE_SIZE/sizeof(struct pbe))) > +#define PAGEDIR_ENTRY(pagedir, i) (pagedir[i/PAGEDIR_CAPACITY(1)] + (i%PAGEDIR_CAPACITY(1))) > > /* mm/vmscan.c */ > extern int shrink_mem(void); > @@ -61,7 +63,7 @@ > extern void thaw_processes(void); > > extern unsigned int nr_copy_pages __nosavedata; > -extern suspend_pagedir_t *pagedir_nosave __nosavedata; > +extern suspend_pagedir_t **pagedir_nosave __nosavedata; > > /* Communication between kernel/suspend.c and arch/i386/suspend.c */ > This, and the rest of the deleted patch, are dubious. Once you start adding - more page flag bits - functions that use double pointers big warning alarms start going off I haven't looked that far into it yet, but I suspect there are some design issues there that should get resolved. -pat ^ permalink raw reply [flat|nested] 42+ messages in thread
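
The copy loop suggested in the review above indexes pagedir_nosave as a flat array, while the patch turns it into a suspend_pagedir_t **. If the memcpy() form were adopted on top of the patch, the lookup would presumably have to go through the directory page first. A rough userspace sketch of that adaptation is below. It assumes a simplified struct pbe with pointer fields, and it ignores the constraint stated in the patch comment that the real resume loop must not touch the stack or call other functions; restore_pages is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096UL

/* Hypothetical, simplified entry: pointers instead of unsigned long addresses. */
struct pbe {
    void *address;      /* saved copy of the page */
    void *orig_address; /* original location to restore to */
};

#define PBES_PER_PAGE (PAGE_SIZE / sizeof(struct pbe))

/*
 * Copy every saved page back to where it came from, walking a two-level
 * directory: pagedir[k] is one page holding PBES_PER_PAGE entries.
 */
static void restore_pages(struct pbe **pagedir, size_t nr_copy_pages)
{
    size_t loop;

    for (loop = 0; loop < nr_copy_pages; loop++) {
        struct pbe *e = pagedir[loop / PBES_PER_PAGE] + (loop % PBES_PER_PAGE);

        memcpy(e->orig_address, e->address, PAGE_SIZE);
    }
}
```

Whether memcpy() is actually safe at that point in resume, and whether __flush_tlb() is needed, are the open questions from the review; the sketch only shows the indexing change.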
* Re: SWSUSP Discontiguous pagedir patch 2003-03-02 23:55 ` Patrick Mochel @ 2003-03-03 2:06 ` Nigel Cunningham 2003-03-03 2:31 ` Nigel Cunningham ` (2 subsequent siblings) 3 siblings, 0 replies; 42+ messages in thread From: Nigel Cunningham @ 2003-03-03 2:06 UTC (permalink / raw) To: Patrick Mochel; +Cc: Pavel Machek, Linux Kernel Mailing List Hi. Thanks for your comments. I'll take a look at your rewrite. I'm currently working on a port of the 2.4 beta, so I'm hoping we're not going at cross-purposes here. > Thank you for putting this back in C, it's much appreciated. I know nothing about x86 assembly and have just been following the existing code, so I can't claim any credit for this. do_magic in assembly was just a cut and paste from suspend_asm.S so that I could get things going with the PAGEDIR_ENTRY macro. > Do you have something against indenting comments? ;) Cut and paste from the original - not my comment :> > This is better done as > > for (loop = 0; loop < nr_copy_pagse; loop++) { > memcpy((char *)pagedir_nosave[loop].orig_address, > (char *)pagedir_nosave[loop].address, > PAGE_SIZE); > __flush_tlb(); > } > > Is __flush_tlb() really necessary? Pass. Once again, I'm blindly following the comment that says you can't use memcpy. All of my changes are algorithm rewrites, not changes to the 'magic'. > This, and the rest of the deleted patch, are dubious. Once you start > adding > > - more page flag bits > - functions that use double pointers > > big warning alarms start going off I haven't looked that far into it yet, > but I suspect there are some design issues there that should get resolved. Longer term, I don't want to add page_flags. A page_flag was just here because it was the simplest way of getting a working implementation. In the long term I would use a dynamically allocated bitmap instead. The double pointers where the true point to the patch. They are necessary because my aim in future patches is to get 2.5 to the same point as 2.4. 
Under 2.4, you can now suspend to disk without needing to eat any memory
(assuming enough swap etc). To achieve this, the pages of the pagedir
must be able to be scattered around memory. With the existing code, you
have to be able to allocate a contiguous set of pages for the whole
pagedir.

Take for an example a suspend cycle I did this morning:

Mar 3 07:45:07 laptop-linux kernel: Free:1343. Sets:7720(7891),21527. PD:170. Swap:29589/53139. RAM to suspend:29930; resume:17459. Limits:30592,0
[deletia]
Mar 3 07:45:07 laptop-linux kernel: - SWSUSP Version : beta 18
Mar 3 07:45:07 laptop-linux kernel: - Swap available : 53139 (amount unused when preparing image).
Mar 3 07:45:07 laptop-linux kernel: - Pageset sizes : 7891 and 21527. (Pagedir size: 170)
Mar 3 07:45:07 laptop-linux kernel: - Expected sizes : 7891 and 21527.
Mar 3 07:45:07 laptop-linux kernel: - Parameters : 1 0 255 255 0
Mar 3 07:45:07 laptop-linux kernel: - Calculations : Image size: 29760. Ram to suspend: 30101. To resume: 17801.
Mar 3 07:45:07 laptop-linux kernel: - Limits : 30592 pages RAM. Initial boot: 29435. Current boot: 0.

In order to save 29760 pages, I needed a pagedir of 170 pages. Using the
old code, I'd have to allocate 256 contiguous pages. With only 1343
available, what do you think my chances are? With the new functionality,
it's no problem.

(I should mention that I've seen a way in which I can reduce the pagedir
size that this code uses - the struct this is using is different to the
one currently in 2.5, and I would keep using the 2.5 version. This might
mean we would need 128 pages instead of 256 for the same image size, but
I'm sure you'll appreciate that the argument still stands).

Regards,

Nigel

^ permalink raw reply [flat|nested] 42+ messages in thread
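The jump from 170 to 256 pages in the example above is consistent with contiguous allocations being handed out in power-of-two blocks ("orders"), which is how the kernel page allocator works. A minimal sketch of that rounding follows; pages_to_order() and contiguous_block_pages() are hypothetical helper names used for illustration, not kernel functions.

```c
/* Smallest order such that 2^order pages are enough to hold nr_pages. */
unsigned int pages_to_order(unsigned long nr_pages)
{
	unsigned int order = 0;

	while ((1UL << order) < nr_pages)
		order++;
	return order;
}

/* Pages actually claimed by one contiguous power-of-two allocation. */
unsigned long contiguous_block_pages(unsigned long nr_pages)
{
	return 1UL << pages_to_order(nr_pages);
}
```

For the figures in the message, contiguous_block_pages(170) gives 256 and contiguous_block_pages(128) gives 128. With a discontiguous pagedir the same 170 pages can instead be 170 separate single-page allocations, which is the case Nigel is arguing for when only 1343 pages are free.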
* Re: SWSUSP Discontiguous pagedir patch 2003-03-02 23:55 ` Patrick Mochel 2003-03-03 2:06 ` Nigel Cunningham @ 2003-03-03 2:31 ` Nigel Cunningham 2003-03-03 12:30 ` Pavel Machek 2003-03-05 18:02 ` SWSUSP Discontiguous pagedir patch Pavel Machek 3 siblings, 0 replies; 42+ messages in thread From: Nigel Cunningham @ 2003-03-03 2:31 UTC (permalink / raw) To: Patrick Mochel; +Cc: Pavel Machek, Linux Kernel Mailing List On Mon, 2003-03-03 at 12:55, Patrick Mochel wrote: > http://ldm.bkbits.net:8080/linux-2.5-power Hi. again. I've taken a look at the comments for your changesets, and our changes do indeed conflict in a number of places. I'm happy to wait until your cleanups get included and then merge from there. I've had a brief go at using BK, but I'm only on a 56K connection, so I'm not sure how practical it is for me to do pulls etc. I guess for the moment the best path for me to take might be to continue to port 2.4 and then merge with you once I'm done. Regards, Nigel ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: SWSUSP Discontiguous pagedir patch 2003-03-02 23:55 ` Patrick Mochel 2003-03-03 2:06 ` Nigel Cunningham 2003-03-03 2:31 ` Nigel Cunningham @ 2003-03-03 12:30 ` Pavel Machek 2003-03-04 20:36 ` Patrick Mochel 2003-03-05 18:02 ` SWSUSP Discontiguous pagedir patch Pavel Machek 3 siblings, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2003-03-03 12:30 UTC (permalink / raw)
To: Patrick Mochel; +Cc: Nigel Cunningham, Linux Kernel Mailing List

Hi!

> > Thus, I still think we can go with the patch I submitted before. I've
> > rediffed it against 2.5.63 (less the bits already applied).
>
> I've spent the last week reading, reviewing, and rewriting major portions
> of swsusp. I've actually been reasonably impressed, once I was able to get
> the code into a much more readable state.

:-).

> > diff -ruN linux-2.5.63/arch/i386/kernel/suspend.c linux-2.5.63-01/arch/i386/kernel/suspend.c
> > --- linux-2.5.63/arch/i386/kernel/suspend.c 2003-02-20 08:25:26.000000000 +1300
> > +++ linux-2.5.63-01/arch/i386/kernel/suspend.c 2003-02-20 08:27:36.000000000 +1300
>
> Thank you for putting this back in C, it's much appreciated.

Actually, it can not be put back in C. Manipulating stack pointer from
gcc inline assembly is just undefined. It's back in C so we can edit
it, but it needs to get back to assembly before merging with Linus.

> > +    for (loop=0; loop < nr_copy_pages; loop++) {
> > +        /* You may not call something (like copy_page) here: see above */
> > +        for (loop2=0; loop2 < PAGE_SIZE; loop2++) {
> > +            *(((char *)(PAGEDIR_ENTRY(pagedir_nosave,loop)->orig_address))+loop2) =
> > +                *(((char *)(PAGEDIR_ENTRY(pagedir_nosave,loop)->address))+loop2);
> > +            __flush_tlb();
> > +        }
> > +    }
>
> This is better done as
>
>     for (loop = 0; loop < nr_copy_pages; loop++) {
>         memcpy((char *)pagedir_nosave[loop].orig_address,
>                (char *)pagedir_nosave[loop].address,
>                PAGE_SIZE);
>         __flush_tlb();
>     }

Hehe, try it.

You may not do a function call at this point, because you are
overwriting your stack. See mails with Andi Kleen. This *needs* to be
in assembly.

> Is __flush_tlb() really necessary?

It's there to prevent Heisenbugs.

                                Pavel
-- 
Horseback riding is like software...
...vgf orggre jura vgf serr.

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: SWSUSP Discontiguous pagedir patch 2003-03-03 12:30 ` Pavel Machek @ 2003-03-04 20:36 ` Patrick Mochel 2003-03-05 20:50 ` Pavel Machek 0 siblings, 1 reply; 42+ messages in thread
From: Patrick Mochel @ 2003-03-04 20:36 UTC (permalink / raw)
To: Pavel Machek; +Cc: Nigel Cunningham, Linux Kernel Mailing List

> > > --- linux-2.5.63/arch/i386/kernel/suspend.c 2003-02-20 08:25:26.000000000 +1300
> > > +++ linux-2.5.63-01/arch/i386/kernel/suspend.c 2003-02-20 08:27:36.000000000 +1300
> >
> > Thank you for putting this back in C, it's much appreciated.
>
> Actually, it can not be put back in C. Manipulating stack pointer from
> gcc inline assembly is just undefined. It's back in C so we can edit
> it, but it needs to get back to assembly before merging with Linus.

Noted. I'll convert it back.

> > This is better done as
> >
> >     for (loop = 0; loop < nr_copy_pages; loop++) {
> >         memcpy((char *)pagedir_nosave[loop].orig_address,
> >                (char *)pagedir_nosave[loop].address,
> >                PAGE_SIZE);
> >         __flush_tlb();
> >     }
>
> Hehe, try it.
>
> You may not do a function call at this point, because you are
> overwriting your stack. See mails with Andi Kleen. This *needs* to be
> in assembly.

memcpy() is inlined, at least on x86, and it seems to work fine for me
here. Besides, even if memcpy is not safe, you could at least copy 4 bytes
at a time. ;)

    -pat

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: SWSUSP Discontiguous pagedir patch 2003-03-04 20:36 ` Patrick Mochel @ 2003-03-05 20:50 ` Pavel Machek 2003-03-05 21:52 ` Linux vs Windows temperature anomaly Jonathan Lundell 0 siblings, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2003-03-05 20:50 UTC (permalink / raw)
To: Patrick Mochel; +Cc: Nigel Cunningham, Linux Kernel Mailing List

Hi!

> > > > --- linux-2.5.63/arch/i386/kernel/suspend.c 2003-02-20 08:25:26.000000000 +1300
> > > > +++ linux-2.5.63-01/arch/i386/kernel/suspend.c 2003-02-20 08:27:36.000000000 +1300
> > >
> > > Thank you for putting this back in C, it's much appreciated.
> >
> > Actually, it can not be put back in C. Manipulating stack pointer from
> > gcc inline assembly is just undefined. It's back in C so we can edit
> > it, but it needs to get back to assembly before merging with Linus.
>
> Noted. I'll convert it back.

Okay.

> > > This is better done as
> > >
> > >     for (loop = 0; loop < nr_copy_pages; loop++) {
> > >         memcpy((char *)pagedir_nosave[loop].orig_address,
> > >                (char *)pagedir_nosave[loop].address,
> > >                PAGE_SIZE);
> > >         __flush_tlb();
> > >     }
> >
> > Hehe, try it.
> >
> > You may not do a function call at this point, because you are
> > overwriting your stack. See mails with Andi Kleen. This *needs* to be
> > in assembly.
>
> memcpy() is inlined, at least on x86, and it seems to work fine for me
> here. Besides, even if memcpy is not safe, you could at least copy 4 bytes
> at a time. ;)

Well, this whole thing needs to be in assembly anyway. I decided it is
not performance critical, and copied it byte-by-byte. That can be
changed...

                                Pavel
-- 
Horseback riding is like software...
...vgf orggre jura vgf serr.

^ permalink raw reply [flat|nested] 42+ messages in thread
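For the "copy 4 bytes at a time" suggestion, the loop shape would be roughly the following. This is a user-space illustration only, under loud assumptions: a simplified two-field struct pbe, a flat (single-pointer) pagedir, 4 KiB pages, and a hypothetical function name restore_pages_wordwise(). It does nothing about the real constraints discussed in the thread (the stack being overwritten during resume, or the __flush_tlb() call), so it is not a candidate replacement for the assembly.

```c
#define PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/* Simplified stand-in for the kernel's struct pbe (assumed fields). */
struct pbe {
	unsigned long address;      /* where the saved copy lives */
	unsigned long orig_address; /* where the page must be copied back to */
};

/* Copy every saved page back one machine word at a time, with no
 * explicit function call inside the loops. */
void restore_pages_wordwise(const struct pbe *pagedir,
			    unsigned long nr_copy_pages)
{
	unsigned long loop, w;

	for (loop = 0; loop < nr_copy_pages; loop++) {
		unsigned long *dst = (unsigned long *)pagedir[loop].orig_address;
		const unsigned long *src =
			(const unsigned long *)pagedir[loop].address;

		/* PAGE_SIZE / sizeof(unsigned long) words per page. */
		for (w = 0; w < PAGE_SIZE / sizeof(unsigned long); w++)
			dst[w] = src[w];
	}
}
```

One caveat that supports Pavel's position: an optimising compiler may recognise a copy loop like this and emit a call to memcpy() anyway, so writing the loop by hand in C does not by itself guarantee that no function call happens at this point.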
* Linux vs Windows temperature anomaly 2003-03-05 20:50 ` Pavel Machek @ 2003-03-05 21:52 ` Jonathan Lundell 2003-03-05 23:11 ` Herman Oosthuysen ` (3 more replies) 0 siblings, 4 replies; 42+ messages in thread From: Jonathan Lundell @ 2003-03-05 21:52 UTC (permalink / raw) To: Linux Kernel Mailing List We've been seeing a curious phenomenon on some PIII/ServerWorks CNB30-LE systems. The systems fail at relatively low temperatures. While the failures are not specifically memory related (ECC errors are never a factor), we have a memory test that's pretty good at triggering them. Data is apparently getting corrupted on the front-side bus. Here's the curious thing: when we run the same memory test on a Windows 2000 system (same hardware; we just swap the disk), we can run the ambient temperature up to 60C with no problem at all; the test will run for days. (It occurred to us to try Win2K because the hardware vendor was using it to test systems at temperature without seeing problems.) Swap in the Linux disk, and at that temperature it'll barely run at all. The memory test fails quickly at 40C ambient. FWIW, CPU cooling is pretty good in this box. So, the puzzle: what might account for temperature sensitivity, of all things, under Linux 2.4.9-31 (RH 7.2), but not Win2K? -- /Jonathan Lundell. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Linux vs Windows temperature anomaly 2003-03-05 21:52 ` Linux vs Windows temperature anomaly Jonathan Lundell @ 2003-03-05 23:11 ` Herman Oosthuysen 2003-03-05 23:38 ` Con Kolivas 2003-03-06 2:57 ` David Rees ` (2 subsequent siblings) 3 siblings, 1 reply; 42+ messages in thread From: Herman Oosthuysen @ 2003-03-05 23:11 UTC (permalink / raw) To: Linux Kernel Mailing List Jonathan Lundell wrote: > We've been seeing a curious phenomenon on some PIII/ServerWorks CNB30-LE > systems. > > The systems fail at relatively low temperatures. While the failures are > So, the puzzle: what might account for temperature sensitivity, of all > things, under Linux 2.4.9-31 (RH 7.2), but not Win2K? Linux is more 'busy' than windoze and I have heard of boxes frying when running Linux. The solution is to find a better motherboard manufacturer... Cheers, -- ------------------------------------------------------------------------ Herman Oosthuysen B.Eng.(E), Member of IEEE Wireless Networks Inc. http://www.WirelessNetworksInc.com E-mail: Herman@WirelessNetworksInc.com Phone: 1.403.569-5687, Fax: 1.403.235-3965 ------------------------------------------------------------------------ ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Linux vs Windows temperature anomaly 2003-03-05 23:11 ` Herman Oosthuysen @ 2003-03-05 23:38 ` Con Kolivas 2003-03-05 23:50 ` Russell King 2003-03-06 7:18 ` Corvus Corax 0 siblings, 2 replies; 42+ messages in thread From: Con Kolivas @ 2003-03-05 23:38 UTC (permalink / raw) To: Herman Oosthuysen, Linux Kernel Mailing List On Thu, 6 Mar 2003 10:11 am, Herman Oosthuysen wrote: > Jonathan Lundell wrote: > > We've been seeing a curious phenomenon on some PIII/ServerWorks CNB30-LE > > systems. > > > > The systems fail at relatively low temperatures. While the failures are > > So, the puzzle: what might account for temperature sensitivity, of all > > things, under Linux 2.4.9-31 (RH 7.2), but not Win2K? > > Linux is more 'busy' than windoze and I have heard of boxes frying when > running Linux. The solution is to find a better motherboard > manufacturer... That doesn't make sense. His post said the temperature was 20 degrees lower when it failed. Con ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Linux vs Windows temperature anomaly 2003-03-05 23:38 ` Con Kolivas @ 2003-03-05 23:50 ` Russell King 2003-03-06 0:29 ` Ed Sweetman 2003-03-06 7:18 ` Corvus Corax 1 sibling, 1 reply; 42+ messages in thread
From: Russell King @ 2003-03-05 23:50 UTC (permalink / raw)
To: Con Kolivas; +Cc: Herman Oosthuysen, Linux Kernel Mailing List

On Thu, Mar 06, 2003 at 10:38:44AM +1100, Con Kolivas wrote:
> On Thu, 6 Mar 2003 10:11 am, Herman Oosthuysen wrote:
> > Linux is more 'busy' than windoze and I have heard of boxes frying when
> > running Linux. The solution is to find a better motherboard
> > manufacturer...
>
> That doesn't make sense. His post said the temperature was 20 degrees lower
> when it failed.

It makes perfect sense. Components drawing power produce heat, which
causes a temperature rise above ambient. Put simply, if a chip fails
at a case temperature of 50C and you have a 10C rise, it'll fail
at 40C. If you have a 20C rise, it'll fail at 30C.

PS, the efficiency of heatsinks is measured in degC/W - how many degrees
Celsius the temperature rises for each watt of power dissipated. Double
the dissipated power, double the temperature rise.

-- 
Russell King (rmk@arm.linux.org.uk) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

^ permalink raw reply [flat|nested] 42+ messages in thread
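Russell's arithmetic can be written out as a small sketch. The function names are hypothetical, and the 1 degC/W thermal resistance and the 10 W and 20 W loads used in the note below are made-up figures, chosen only because they reproduce the 10C and 20C rises in the message; they are not measurements from the systems under discussion.

```c
/* Temperature rise above ambient, in degC, for a heatsink rated at
 * theta_c_per_w degC/W dissipating power_w watts. */
double temp_rise_c(double theta_c_per_w, double power_w)
{
	return theta_c_per_w * power_w;
}

/* Highest ambient temperature, in degC, at which the part stays at or
 * below the case temperature where it starts to fail. */
double max_ambient_c(double fail_temp_c, double theta_c_per_w, double power_w)
{
	return fail_temp_c - temp_rise_c(theta_c_per_w, power_w);
}
```

With a part that fails at a 50C case temperature, max_ambient_c(50.0, 1.0, 10.0) is 40.0 and max_ambient_c(50.0, 1.0, 20.0) is 30.0: doubling the dissipated power doubles the rise and lowers the ambient temperature at which the failure shows up.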
* Re: Linux vs Windows temperature anomaly 2003-03-05 23:50 ` Russell King @ 2003-03-06 0:29 ` Ed Sweetman 2003-03-06 0:47 ` Trever L. Adams 2003-03-06 1:58 ` Jonathan Lundell 0 siblings, 2 replies; 42+ messages in thread
From: Ed Sweetman @ 2003-03-06 0:29 UTC (permalink / raw)
To: Russell King; +Cc: Con Kolivas, Herman Oosthuysen, Linux Kernel Mailing List

Russell King wrote:
> On Thu, Mar 06, 2003 at 10:38:44AM +1100, Con Kolivas wrote:
>
>>On Thu, 6 Mar 2003 10:11 am, Herman Oosthuysen wrote:
>>
>>>Linux is more 'busy' than windoze and I have heard of boxes frying when
>>>running Linux. The solution is to find a better motherboard
>>>manufacturer...
>>
>>That doesn't make sense. His post said the temperature was 20 degrees lower
>>when it failed.
>
>
> It makes perfect sense. Components drawing power produce heat, which
> causes a temperature rise above ambient. Put simply, if a chip fails
> at a case temperature of 50C and you have a 10C rise, it'll fail
> at 40C. If you have a 20C rise, it'll fail at 30C.
>
> PS, the efficiency of heatsinks is measured in degC/W - how many degrees
> Celsius the temperature rises for each watt of power dissipated. Double
> the dissipated power, double the temperature rise.
>

that doesn't make much sense. a chip for a given power output fails at a
certain chip temperature, this temperature doesn't vary by the case
temp. If the case temp increases then the chip temp will increase as
long as the cooling system on the chip doesn't change. Hence if the case
temp increases the chip temperature will increase and that could put it
into the range of failure. If the case temp decreases then the chip temp
decreases.

The behavior you describe is when you increase the power output of a
chip beyond normal specifications (overclocking), then the temperature of
failure is lowered. e.g. a chip that would run normally at 50C now can
only run stable at 45-40.

chip temp sensors are usually located in a relatively cool area of the
chip, hence chip failure temps occur usually around 60C (max) when in
fact it's around 80-90C. Unfortunately for us, chip temperature is not
uniform across the chip. Here is a nice little site to get some info on
that stuff.

http://users.erols.com/chare/elec.htm

that being said, I've never heard of running linux frying someone's cpu.
I could see frying a power supply because cheap power supplies will fail
after a while of idle/load cycles that linux is good at using. I really
don't see how else linux could be more "busy" than windows, especially
since windows has 5 or 6 spyware ad programs running behind the scenes
all the time anyway, and the virus scanner having to check every
instruction would definitely lead to a higher cpu average than a linux
box doing the same things minus the spyware and virus scanner. It just
doesn't make any sense. Erroring out more in linux than
windows...possibly yes depending on which version, but not hardware
damage under normal use.

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Linux vs Windows temperature anomaly 2003-03-06 0:29 ` Ed Sweetman @ 2003-03-06 0:47 ` Trever L. Adams 2003-03-06 9:45 ` Russell King 2003-03-06 1:58 ` Jonathan Lundell 1 sibling, 1 reply; 42+ messages in thread From: Trever L. Adams @ 2003-03-06 0:47 UTC (permalink / raw) To: Ed Sweetman Cc: Russell King, Con Kolivas, Herman Oosthuysen, Linux Kernel Mailing List > The behavior you describe is when you increase the power output of a > chip beyond normal specifications (overclocking) then the temperature of > failure is lowered. eg. A chip that would run normally at 50C now can > only run stable at 45-40. You are the one mistaken. Most CPUs don't dissipate a constant amount of power as heat. That depends on what the CPU is doing. For example, even the Athlon without disconnect will cool some when it is 'halt'ed. If a CPU is working more, accomplishing more than it was at another time, it will be needing to rid itself of more heat. Hence, the fact that the external temperature becomes the limiting factor (along with how good the heat exchange system is [i.e. heat sink/fan]). I do believe the previous poster was incorrect about the mathematical relationship between case and CPU temperatures. They are NOT a 1:1. However, he is right, they are mathematically related. Just as the heat dissipated and the work done are related. You do not need to overclock a CPU to get this kind of a change. The change in the efficiency (memory management, task switching, etc.) of how the work is done can cause the CPU to be worked harder... and when the CPU is worked harder, so is memory and quite often just about everything else. Trever -- One O.S. to rule them all, One O.S. to find them. One O.S. to bring them all and in the darkness bind them. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Linux vs Windows temperature anomaly 2003-03-06 0:47 ` Trever L. Adams @ 2003-03-06 9:45 ` Russell King 0 siblings, 0 replies; 42+ messages in thread From: Russell King @ 2003-03-06 9:45 UTC (permalink / raw) To: Trever L. Adams Cc: Ed Sweetman, Con Kolivas, Herman Oosthuysen, Linux Kernel Mailing List On Wed, Mar 05, 2003 at 07:47:05PM -0500, Trever L. Adams wrote: > You are the one mistaken. Most CPUs don't dissipate a constant amount > of power as heat. That depends on what the CPU is doing. Correct - each time a gate in the CPU switches state, it produces a small amount of heat. Have enough gates switching, and you produce a lot of heat (and your current consumption goes up.) This is basic CMOS operation. > I do believe the previous poster was incorrect about the mathematical > relationship between case and CPU temperatures. I never said there was a 1:1 relationship here - you misread my mail. I talked about _heat sinks_, not the relationship between the temperature on the silicon die and the external case temperature, with or without a heatsink, with or without a fan. If you want to talk about the silicon die, then you need to take into account thermal resistance between the die and the case, the case and the heatsink, the heatsink and the surrounding air, the fact that the heatsink is attached to one side only, etc. However, going into it in minute detail with all the maths is NOT a subject for this list. -- Russell King (rmk@arm.linux.org.uk) The developer of ARM Linux http://www.arm.linux.org.uk/personal/aboutme.html ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Linux vs Windows temperature anomaly 2003-03-06 0:29 ` Ed Sweetman 2003-03-06 0:47 ` Trever L. Adams @ 2003-03-06 1:58 ` Jonathan Lundell 1 sibling, 0 replies; 42+ messages in thread
From: Jonathan Lundell @ 2003-03-06 1:58 UTC (permalink / raw)
To: linux-kernel

At 7:29pm -0500 3/5/03, Ed Sweetman wrote:
>I've never heard of running linux frying someone's cpu. I could see
>frying a power supply because cheap power supplies will fail after a
>while of idle/load cycles that linux is good at using. I really don't
>see how else linux could be more "busy" than windows, especially since
>windows has 5 or 6 spyware ad programs running behind the scenes all
>the time anyway, and the virus scanner having to check every
>instruction would definitely lead to a higher cpu average than a
>linux box doing the same things minus the spyware and virus scanner.
>It just doesn't make any sense. Erroring out more in linux than
>windows...possibly yes depending on which version, but not hardware
>damage under normal use.

I don't think it's a case of "busy" per se. Both systems are 100%
occupied with a userland memory test (it just mallocs and locks a
biggish buffer, and does reads and writes of various patterns). One pass
of the test takes about 104 seconds on both systems (presumably it's
memory-bound, so compiler differences aren't showing up).

It was suggested off-list that I compare the chipset config registers to
see if anything is different. I've been meaning to do that, but just
looking at the registers, I don't see anything that would affect FSB
timing, or the FSB at all, for that matter. Nonetheless, I'll do the
comparison as soon as I dig up an lspci equivalent for Win2K.

As for temperature differences, the heat sink temperature (at least)
doesn't seem to differ appreciably between the systems, which is what
I'd expect with essentially the same load on each.
I'm wondering, somewhat ignorantly, if there might be some kind of CPU configuration that Windows is adjusting, as some kind of workaround or the like. I don't suppose that this is the best place to ask how to read MSRs from Windows.... -- /Jonathan Lundell. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Linux vs Windows temperature anomaly 2003-03-05 23:38 ` Con Kolivas 2003-03-05 23:50 ` Russell King @ 2003-03-06 7:18 ` Corvus Corax 2003-03-06 7:57 ` Ed Sweetman 2003-03-06 14:27 ` Jesse Pollard 1 sibling, 2 replies; 42+ messages in thread
From: Corvus Corax @ 2003-03-06 7:18 UTC (permalink / raw)
To: linux-kernel

On Thu, 6 Mar 2003 10:38:44 +1100, Con Kolivas <kernel@kolivas.org> wrote:

>
> That doesn't make sense. His post said the temperature was 20 degrees lower
> when it failed.
>
> Con

I think it does,

look at this:

[ASCII diagram, layout lost in extraction. Recoverable labels: CPU, with
"CPU TEMP" and "voltage" lines; RAM; north bridge (MEM CTRL) between CPU
and RAM; south bridge (BUS CTRL) below it, leading to "PCI & other BUS";
a "Mainboard TEMP" sensor; and a "TEMP & VOLTAGE ctrl chip" with a "data"
link to the south bridge.]

the sensor for the system temperature (somewhere on the board) is
connected to a driver chip (usually on the i2c bus) like the w83781d (on
my board)

if something now causes the (often badly cooled) bridge to get hot (by
more load between some periphery and the RAM for example), the system
temperature doesn't necessarily have to increase.

if the bridge has only a heatsink, its temperature is somewhat like
(system TEMP) + (produced heat per time / heat given to the air by heatsink per time)
where the heatsink's capacity is dependent on the delta temperature,
too, gets complicated ;)

in short, the chip's hotter than the rest of the system and if it has
high load it gets even hotter, but its temp is still dependent on the
main system TEMP. ;)

blahrgh forget what i talk, watch the ASCII art, and imagine the effect
of much data running between BUS and RAM ;-) (or BUS and BUS if north
and southbridge are on the same chip)

CvC

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Linux vs Windows temperature anomaly 2003-03-06 7:18 ` Corvus Corax @ 2003-03-06 7:57 ` Ed Sweetman 2003-03-06 8:18 ` Corvus Corax 2003-03-06 14:27 ` Jesse Pollard 1 sibling, 1 reply; 42+ messages in thread
From: Ed Sweetman @ 2003-03-06 7:57 UTC (permalink / raw)
To: Corvus Corax; +Cc: linux-kernel

This is getting kicked like a dead horse by now i think but.

Unlike your cpu, which gets idle commands from the OS and thus has an
idle loop where it turns off certain circuits, and which can get acpi
commands to turn completely off, the other chips in the computer do not
have such a luxury. they are always on like the cpus of yesteryear used
to be. It doesn't matter if they have data moving in them or not, no
big difference. The reason why it seems like this is the case is, for
you HSF cooled cpu guys, load on the system bus usually means high cpu
load, and that means more heat put into the surrounding air, and the
little, usually passive cooled but regardless less hot, system bus gets
hotter along with the cpu and cooler when the cpu is idle. People
cooled by other methods that do not dump heat into the surrounding air
inside the case will notice that the system bus temp only varies with
ambient air temp changes, not data transfer going on between ram and cpu.

Corvus Corax wrote:
> On Thu, 6 Mar 2003 10:38:44 +1100, Con Kolivas <kernel@kolivas.org> wrote:
>
>
>>That doesn't make sense. His post said the temperature was 20 degrees lower
>>when it failed.
>>
>>Con
>
>
> I think it does,
>
> look at this:
>
> [ASCII diagram, layout lost in extraction. Recoverable labels: CPU, with
> "CPU TEMP" and "voltage" lines; RAM; north bridge (MEM CTRL) between CPU
> and RAM; south bridge (BUS CTRL) below it, leading to "PCI & other BUS";
> a "Mainboard TEMP" sensor; and a "TEMP & VOLTAGE ctrl chip" with a "data"
> link to the south bridge.]
>
>
> the sensor for the system temperature (somewhere on the board) is connected to a driver chip (usually on the i2c bus)
> like the w83781d (on my board)
>
> if something now causes the (often badly cooled) bridge to get hot (by more load between some periphery and the RAM for example)
> , the system temperature doesn't necessarily have to increase.
>
> if the bridge has only a heatsink, its temperature is somewhat like
> (system TEMP) + (produced heat per time / heat given to the air by heatsink per time)
> where the heatsink's capacity is dependent on the delta temperature, too, gets complicated ;)
>
> in short, the chip's hotter than the rest of the system and if it has high load it gets even hotter,
> but its temp is still dependent on the main system TEMP. ;)
>
> blahrgh forget what i talk, watch the ASCII art, and imagine the effect of much data running between
> BUS and RAM ;-) (or BUS and BUS if north and southbridge are on the same chip)
>
> CvC
>

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Linux vs Windows temperature anomaly 2003-03-06 7:57 ` Ed Sweetman @ 2003-03-06 8:18 ` Corvus Corax 2003-03-06 8:58 ` Ed Sweetman 0 siblings, 1 reply; 42+ messages in thread
From: Corvus Corax @ 2003-03-06 8:18 UTC (permalink / raw)
To: Ed Sweetman; +Cc: linux-kernel

On Thu, 06 Mar 2003 02:57:31 -0500, Ed Sweetman <ed.sweetman@wmich.edu> wrote:

> This is getting kicked like a dead horse by now i think but.
>

even dead horses have the right of being kicked ;) but:

> Unlike your cpu, which gets idle commands from the OS and thus has an
> idle loop where it turns off certain circuits, and which can get acpi
> commands to turn completely off, the other chips in the computer do not
> have such a luxury. they are always on like the cpus of yesteryear used
> to be. It doesn't matter if they have data moving in them or not, no
> big difference.

I think this is not right, due to 2 reasons:

1st, i burned my finger on my bridge often enough that i should know that
its temperature varies, at least on some chips to really huge amounts ;-)
(on my new board they don't even get hand-warm)

2nd, the fact that the chip (or circuit) is on doesn't mean that current
flows.

all halfway new microchips (including those bridges of course) are built
in CMOS or similar technology, meaning that there is no more static
current flowing through the transistors, but only capacitance-dependent
current, when the transistors change their state.

if no data flows, few transistors change their state
(only clock signals and some other idle work that is done),
and the output drivers to the bus systems are turned low.

so there is no static current on the bus and no dynamic current in the
chip --> low overall current --> low temperature

on the other hand if data flows, it's being processed, and many
transistors change their state with the data flow, as do the output
driver blocks --> high static and dynamic current --> high temperature.

> The reason why it seems like this is the case is, for
> you HSF cooled cpu guys, load on the system bus usually means high cpu
> load, and that means more heat put into the surrounding air, and the
> little, usually passive cooled but regardless less hot, system bus gets
> hotter along with the cpu and cooler when the cpu is idle. People
> cooled by other methods that do not dump heat into the surrounding air
> inside the case will notice that the system bus temp only varies with
> ambient air temp changes, not data transfer going on between ram and cpu.
>

then this would be measurable as a higher mainboard or system
temperature, which is not so in our case, as described in the other mails

greetings,

Corvus V Corax ;)

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Linux vs Windows temperature anomaly 2003-03-06 8:18 ` Corvus Corax @ 2003-03-06 8:58 ` Ed Sweetman 2003-03-06 15:41 ` Jesse Pollard 0 siblings, 1 reply; 42+ messages in thread From: Ed Sweetman @ 2003-03-06 8:58 UTC (permalink / raw) To: Corvus Corax; +Cc: linux-kernel Corvus Corax wrote: > Am Thu, 06 Mar 2003 02:57:31 -0500 > schrieb Ed Sweetman <ed.sweetman@wmich.edu>: > > >>This is getting kicked like a deadhorse by now i think but. >> > > even dead horses have the right of being kicked ;) but: > > >>Unlike your cpu which gets idle commands from the OS and thus has an >>idle loop where it turns off certain circuits and which can get acpi >>commands to turn completely off the other chips in the computer do not >>have such a luxury. they are always on like the cpus of yesteryear used >>to be. It doesn't matter if they have data moving in them or not, no >>big difference. > > > I think this is not right so, doe to 2 reasons: > > 1st, i burned my finger on my bridge often enough that i should know that > its temperature varies, at least on some chips to really huge amounts ;-) > (on my new borad they dont even get hand warm) > > 2nd. the fact that the chip (or circuit) is on, doesnt mean that there flows current. > > all halfway new microchips (including those bridges of course) are build in CMOS > or similar technology, meaning that there is no more static current flowing through > the transistors, but only capacity dependant current, when the transistors change > their state. > > if no data flows, little transistors change their state > (only clock signals and some other idle work that is done), > and the output drivers to the bus systems are turned low. 
> > so there is no static current on the bus and no dynamic current in the chip --> low overall current > --> low temperature > > on the other hand if data flows, its being processed, and many transistors change their state with the data flow, > as do the output driver blocks --> high static and dynamic current --> high temperature. > > > >>The reason why it seems like this is the case is for >>you HSF cooled cpu guys, load on the system bus usually means high cpu >>load and that means more heat put into the surrounding air and the >>little usually passive cooled but regardless, less hot system bus gets >>hotter along with the cpu and cooler when the cpu is idle. People >>cooled by other methods that do not dump heat into the surrounding air >>inside the case will notice that the system bus temp only varies with >>ambient air temp changes, not data transfer going on between ram and cpu. >> > > > than this would be measurable as an higher mainboard or system temperature, > which is not in our case, as described in the other mails higher than what, the bus power output is constant, the internal temperature of the case is dictated by the heat given off by all the components, they're not separable. That's like saying i would be able to tell if my cpu is putting a constant power output by seeing a higher ambient air temp in the computer case...well no you wouldn't it becomes a constant and all the other components that do have varying power outputs dictate the fluctuations of ambient air. How you get a constant as contributing to changes in ambient air is beyond me, and finding a comparison in order to say it makes the air hotter is further beyond me. The ambient air temp is the temp it is because of all the components inside the case, including the system bus. The only way the system bus's power output would be able to be measured as a higher ambient air temp is if it worked the way you suggested (which i really dont think most do). 
If you mean that the system bus should then be hotter if it's always on, the way I suggest... well, you'd be wrong. They don't get hot if the ambient temperature around the bus's heatsink stays cool. Most people just run them passive when they have watercooling, because without all the heat from the CPU's heatsink the ambient air around the bus is sufficiently cool. Otherwise a fan is usually needed.

I know for a fact my Abit Athlon motherboard's bus chip doesn't change temperature due to load in the system. The only time it fluctuates is when the temperature of the room changes, and that change is not due to the chip (unless I have no air circulation in the room; then the computer as a whole heats up all the air and that feeds back on itself).

I believe the originator of the thread went back to check whether he can find out exactly what his Windows drivers are enabling. The rest of the thread has been arguing over whether Linux can load hardware more than Windows can, and what puts off heat and what doesn't, which is stupid. I think the topic of the thread is a bunch of BS, because unless he has a driver that is for some reason changing the frequency or the voltage of something, Linux is not going to stress the system more than Windows. The whole thing reeks of FUD, whether intentional or not.

> greetings,
>
>
> Corvus V Corax ;)
> -

^ permalink raw reply [flat|nested] 42+ messages in thread
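The CMOS argument quoted in this message (current flows mainly when transistors change state) is usually summarized by the switching-power relation P = a * C * V^2 * f. A minimal sketch of that relation follows. The capacitance, voltage, clock and activity figures are made-up illustrative numbers, not measurements of any chipset discussed in this thread, and static leakage is ignored.

```python
def dynamic_power_watts(activity, capacitance_f, voltage_v, clock_hz):
    """CMOS switching power: P = activity * C * V^2 * f.

    activity is the fraction of the switched capacitance that toggles
    per clock cycle (0.0 .. 1.0). Static leakage is ignored.
    """
    return activity * capacitance_f * voltage_v ** 2 * clock_hz


# Hypothetical bridge chip: 20 nF switched capacitance, 2.5 V, 133 MHz.
CAP_F, VOLT_V, CLOCK_HZ = 20e-9, 2.5, 133e6

idle_w = dynamic_power_watts(0.05, CAP_F, VOLT_V, CLOCK_HZ)  # little traffic
busy_w = dynamic_power_watts(0.50, CAP_F, VOLT_V, CLOCK_HZ)  # heavy traffic

print(f"idle: {idle_w:.2f} W, busy: {busy_w:.2f} W")
```

In this model power scales linearly with activity. How much a real bridge varies depends on how large the constant part of its power is (clock distribution, I/O termination), which is the point under dispute here.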
* Re: Linux vs Windows temperature anomaly 2003-03-06 8:58 ` Ed Sweetman @ 2003-03-06 15:41 ` Jesse Pollard 0 siblings, 0 replies; 42+ messages in thread From: Jesse Pollard @ 2003-03-06 15:41 UTC (permalink / raw) To: Ed Sweetman, Corvus Corax; +Cc: linux-kernel On Thursday 06 March 2003 02:58 am, Ed Sweetman wrote: snip > i know for a fact my abit athlon motherboard's bus chip doesn't change > temperature due to load in the system. The only time it fluctuates is > when the temperature of the room changes and that change is not due to > the chip (unless i got no air circulation in the room then the computer > as a whole will heat up all the air and that feeds back on itself) Only because you are removing the heat as fast as it is being generated. Which speaks for a good motherboard, heat sink, and fan combination, along with decent AC for the room. Additional heat generation with the use of Linux has been documented going back to the 486 days, when problems were traced to an insufficient heat sink. (system works with windows, crashes with Linux... replaced heat sink and all is well). The entire thread has been about a burst of activity that causes a thermal spike in one or two possible locations not in the CPU. The internal ambient temperature takes at least 3-5 seconds to change before the sensor can report it. If the chip is already operating just below it's critical temperature (and that varies among chips, even in the same lot) then it will work with windows. Linux has a much higher demand on the hardware, partially due to the ability to generate DMA requests faster. This adds extra heat to the bridges, and COULD push the chip over the critical temperature for brief times (I would guess it would be in the millisecond range). Sustained DMA activity would be a suspect in something like this. 
It would be an interesting research topic to put high precision sensors on all of the important chips on a motherboard (say between the chip and heat sink) and come up with a time sequence and thermal map of a collection of motherboards.... -- ------------------------------------------------------------------------- Jesse I Pollard, II Email: pollard@navo.hpc.mil Any opinions expressed are solely my own. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Linux vs Windows temperature anomaly 2003-03-06 7:18 ` Corvus Corax 2003-03-06 7:57 ` Ed Sweetman @ 2003-03-06 14:27 ` Jesse Pollard 1 sibling, 0 replies; 42+ messages in thread
From: Jesse Pollard @ 2003-03-06 14:27 UTC (permalink / raw)
To: Corvus Corax, linux-kernel

On Thursday 06 March 2003 01:18 am, Corvus Corax wrote:
> On Thu, 6 Mar 2003 10:38:44 +1100
> Con Kolivas <kernel@kolivas.org> wrote:
> > That doesn't make sense. His post said the temperature was 20 degrees
> > lower when it failed.
> >
> > Con
>
> I think it does,

snip

> if the bridge has only a heatsink, its temperature is roughly
> (system temp) + (heat produced per unit time / heat given to the air by
> the heatsink per unit time), where the heatsink's capacity also depends
> on the temperature difference, so it gets complicated ;)
>
> In short, the chip is hotter than the rest of the system, and under high
> load it gets even hotter, but its temperature still depends on the main
> system temperature. ;)

It is also referred to as thermal inertia. It takes time for the heat sink to

1. heat up
2. start transferring that heat out

During that time delay the chip may easily overheat in a burst of activity.

The same thing happens to fuses... a "slow blow" fuse will blow faster in higher ambient temperature, under conditions that are normal because the AC was turned on...

--
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@navo.hpc.mil

Any opinions expressed are solely my own.

^ permalink raw reply [flat|nested] 42+ messages in thread
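The thermal lag described in this message can be illustrated with a two-node lumped model: a small die coupled to a much larger heat sink, which in turn loses heat to the ambient air. This is only a sketch under stated assumptions. The thermal resistances and capacities are hypothetical values chosen to make the effect visible, and the function name is invented for this example.

```python
def simulate(power_w, ambient_c, seconds, dt=0.001,
             c_die=0.05, r_die_sink=2.0, c_sink=20.0, r_sink_air=3.0):
    """Euler integration of a die + heat sink model.

    c_* are heat capacities in J/K, r_* are thermal resistances in K/W.
    All default values are hypothetical. Returns (die_temp, sink_temp) in C.
    """
    t_die = t_sink = ambient_c
    for _ in range(int(round(seconds / dt))):
        q_die_sink = (t_die - t_sink) / r_die_sink      # W flowing die -> sink
        q_sink_air = (t_sink - ambient_c) / r_sink_air  # W flowing sink -> air
        t_die += dt * (power_w - q_die_sink) / c_die
        t_sink += dt * (q_die_sink - q_sink_air) / c_sink
    return t_die, t_sink


# A 5 W load at 40 C ambient: one second into a burst, and after settling.
burst = simulate(5.0, 40.0, seconds=1.0)
settled = simulate(5.0, 40.0, seconds=600.0)

print("after 1 s:   die %.1f C, sink %.1f C" % burst)
print("after 600 s: die %.1f C, sink %.1f C" % settled)
```

With these numbers the die responds within a fraction of a second while the heat sink takes on the order of a minute, so a sensor on or near the sink would see very little of a short burst. The steady state matches the quoted rule of thumb: ambient plus power times the total thermal resistance.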
* Re: Linux vs Windows temperature anomaly 2003-03-05 21:52 ` Linux vs Windows temperature anomaly Jonathan Lundell 2003-03-05 23:11 ` Herman Oosthuysen @ 2003-03-06 2:57 ` David Rees 2003-03-06 6:12 ` Matthias Schniedermeyer 2003-03-07 0:40 ` Horst von Brand 3 siblings, 0 replies; 42+ messages in thread
From: David Rees @ 2003-03-06 2:57 UTC (permalink / raw)
To: Linux Kernel Mailing List

On Wed, Mar 05, 2003 at 01:52:16PM -0800, Jonathan Lundell wrote:
> We've been seeing a curious phenomenon on some PIII/ServerWorks
> CNB30-LE systems.
>
> The systems fail at relatively low temperatures. While the failures
> are not specifically memory related (ECC errors are never a factor),
> we have a memory test that's pretty good at triggering them. Data is
> apparently getting corrupted on the front-side bus.
>
> Here's the curious thing: when we run the same memory test on a
> Windows 2000 system (same hardware; we just swap the disk), we can
> run the ambient temperature up to 60C with no problem at all; the
> test will run for days. (It occurred to us to try Win2K because the
> hardware vendor was using it to test systems at temperature without
> seeing problems.)
>
> Swap in the Linux disk, and at that temperature it'll barely run at
> all. The memory test fails quickly at 40C ambient.
>
> FWIW, CPU cooling is pretty good in this box.
>
> So, the puzzle: what might account for temperature sensitivity, of
> all things, under Linux 2.4.9-31 (RH 7.2), but not Win2K?

Since it doesn't sound like this is a memory error but a chipset driver error, it could be a Linux driver bug. You are running a very old kernel; at the least, upgrade to the latest errata (which is currently 2.4.18-26.7). You are running the latest security updates as well, right?

-Dave

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Linux vs Windows temperature anomaly 2003-03-05 21:52 ` Linux vs Windows temperature anomaly Jonathan Lundell 2003-03-05 23:11 ` Herman Oosthuysen 2003-03-06 2:57 ` David Rees @ 2003-03-06 6:12 ` Matthias Schniedermeyer 2003-03-06 16:07 ` Jonathan Lundell 2003-03-07 0:40 ` Horst von Brand 3 siblings, 1 reply; 42+ messages in thread From: Matthias Schniedermeyer @ 2003-03-06 6:12 UTC (permalink / raw) To: Jonathan Lundell; +Cc: Linux Kernel Mailing List On Wed, Mar 05, 2003 at 01:52:16PM -0800, Jonathan Lundell wrote: > We've been seeing a curious phenomenon on some PIII/ServerWorks > CNB30-LE systems. > > So, the puzzle: what might account for temperature sensitivity, of > all things, under Linux 2.4.9-31 (RH 7.2), but not Win2K? Hmmm. Wasn't there something with IDE and the LE-Chipset. Maybe you should try a current kernel. Don't know if this old-kernel has the fix. Bis denn -- Real Programmers consider "what you see is what you get" to be just as bad a concept in Text Editors as it is in women. No, the Real Programmer wants a "you asked for it, you got it" text editor -- complicated, cryptic, powerful, unforgiving, dangerous. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Linux vs Windows temperature anomaly 2003-03-06 6:12 ` Matthias Schniedermeyer @ 2003-03-06 16:07 ` Jonathan Lundell 0 siblings, 0 replies; 42+ messages in thread From: Jonathan Lundell @ 2003-03-06 16:07 UTC (permalink / raw) To: Linux Kernel Mailing List At 7:12am +0100 3/6/03, Matthias Schniedermeyer wrote: >Hmmm. Wasn't there something with IDE and the LE-Chipset. > >Maybe you should try a current kernel. Don't know if this old-kernel has >the fix. It involved DMA, I think; I've disabled IDE DMA altogether. My current plan is to run the tests with a more recent kernel, and to compare PIII MSRs between a Linux and Windows boot. -- /Jonathan Lundell. ^ permalink raw reply [flat|nested] 42+ messages in thread
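The plan above to compare PIII MSRs between a Linux and a Windows boot comes down to diffing two register dumps. A minimal sketch of such a comparison follows. How the dumps are collected on each OS is left out, and the register addresses and values below are placeholders invented for illustration, not real PIII MSR contents.

```python
def diff_msrs(dump_a, dump_b):
    """Compare two MSR dumps given as {address: 64-bit value} dicts.

    Returns {address: (value_a, value_b, changed_bits)} for every register
    that differs. A register missing from one dump is reported with None
    on that side and changed_bits set to None.
    """
    result = {}
    for addr in sorted(set(dump_a) | set(dump_b)):
        a, b = dump_a.get(addr), dump_b.get(addr)
        if a == b:
            continue
        if a is None or b is None:
            bits = None
        else:
            bits = [i for i in range(64) if ((a ^ b) >> i) & 1]
        result[addr] = (a, b, bits)
    return result


# Placeholder dumps: addresses and values are made up.
linux_dump = {0x100: 0x40, 0x101: 0x1}
win2k_dump = {0x100: 0x40, 0x101: 0x9}

for addr, (a, b, bits) in diff_msrs(linux_dump, win2k_dump).items():
    print(f"MSR {addr:#x}: linux={a:#x} win2k={b:#x} differing bits={bits}")
```

Reporting the differing bit positions rather than only the raw values makes it easier to look each one up in the processor documentation.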
* Re: Linux vs Windows temperature anomaly 2003-03-05 21:52 ` Linux vs Windows temperature anomaly Jonathan Lundell ` (2 preceding siblings ...) 2003-03-06 6:12 ` Matthias Schniedermeyer @ 2003-03-07 0:40 ` Horst von Brand 3 siblings, 0 replies; 42+ messages in thread From: Horst von Brand @ 2003-03-07 0:40 UTC (permalink / raw) To: Jonathan Lundell; +Cc: Linux Kernel Mailing List Jonathan Lundell <linux@lundell-bros.com> said: > We've been seeing a curious phenomenon on some PIII/ServerWorks > CNB30-LE systems. > > The systems fail at relatively low temperatures. While the failures > are not specifically memory related (ECC errors are never a factor), > we have a memory test that's pretty good at triggering them. Data is > apparently getting corrupted on the front-side bus. > > Here's the curious thing: when we run the same memory test on a > Windows 2000 system (same hardware; we just swap the disk), we can > run the ambient temperature up to 60C with no problem at all; the > test will run for days. (It occurred to us to try Win2K because the > hardware vendor was using it to test systems at temperature without > seeing problems.) > > Swap in the Linux disk, and at that temperature it'll barely run at > all. The memory test fails quickly at 40C ambient. Linux gives the hardware a _much_ harder workout than Windows. My first PC was a P/100, overclocked to /120. WinNT worked fine, Linux wouldn't even finish booting. -- Dr. Horst H. von Brand User #22616 counter.li.org Departamento de Informatica Fono: +56 32 654431 Universidad Tecnica Federico Santa Maria +56 32 654239 Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513 ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: SWSUSP Discontiguous pagedir patch 2003-03-02 23:55 ` Patrick Mochel ` (2 preceding siblings ...) 2003-03-03 12:30 ` Pavel Machek @ 2003-03-05 18:02 ` Pavel Machek 2003-03-07 17:14 ` Patrick Mochel 3 siblings, 1 reply; 42+ messages in thread From: Pavel Machek @ 2003-03-05 18:02 UTC (permalink / raw) To: Patrick Mochel; +Cc: Nigel Cunningham, Pavel Machek, Linux Kernel Mailing List Hi! > > Thus, I still think we can go with the patch I submitted before. I've > > rediffed it against 2.5.63 (less the bits already applied). > > I've spent the last week reading, reviewing, and rewriting major portions > of swsusp. I've actually been reasonably impressed, once I was able to get > the code into a much more readable state. > > All in all, I think the idea of saving state to swap is dangerous for > various reasons. However, I like some of the other concepts of the code, Can you elaborate? I believe writing to swap is good for user; and it works. > and will use them in developing a more palatable mechanism of doing STDs What is STD? > http://ldm.bkbits.net:8080/linux-2.5-power > Can you post cumulative diff of work-in-progress? I am not permitted to use bk. Also please make sure that you post the diff before you merge it (and please Cc me). Pavel -- Pavel Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need... ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: SWSUSP Discontiguous pagedir patch 2003-03-05 18:02 ` SWSUSP Discontiguous pagedir patch Pavel Machek @ 2003-03-07 17:14 ` Patrick Mochel 2003-03-07 20:27 ` Pavel Machek ` (2 more replies) 0 siblings, 3 replies; 42+ messages in thread
From: Patrick Mochel @ 2003-03-07 17:14 UTC (permalink / raw)
To: Pavel Machek; +Cc: Nigel Cunningham, Linux Kernel Mailing List

> > All in all, I think the idea of saving state to swap is dangerous for
> > various reasons. However, I like some of the other concepts of the code,
>
> Can you elaborate? I believe writing
> to swap is good for user; and it works.

It does work, but there are uncertainties inherently present when using such a solution. Some of them were just the behavior of the current code, which I fixed, like:

- Only ever using the first swap partition, regardless of space left.
- Not resetting swap signature if a resume failed.
- Almost complete lack of a recovery path if anything failed (i.e. trying to back out of what has happened, instead of calling BUG() or panic()).
- Function names like do_magic() and friends.

These types of things don't instill any confidence in a user or other developer looking at the code. They give the impression that the code is the result of blind guesswork in the dark. After looking at the code, it was a shock to me that it worked at all.

I understand that getting it to work involves dealing with the uncertainties. However, there is no reason to pass them on to other users. There were no comments as to what the do_magic*() functions did, let alone why they were 'magic', and there were 5 of them.

There are uncertainties still present in the code, like

- #warning about waiting for data to reach the disk.
- "Waiting for DMAs to settle down" delay on resume.

I respect the paranoia. However, it's things like these that should be dealt with before anything else.

The general problems that I see with the solution are:

- It simply won't work if you're low on swap or memory.
- It won't work if your swap is not persistent across reboots.
- It won't work if you don't use swap.
- It's dependent on the same exact kernel being loaded.

It should only be dependent on the binary format of the written metadata.

It also shouldn't be waiting until all the devices are probed and initialized, but that problem is out of your hands.

Another problem I see in the future is initramfs, and when things start executing in there. It's currently unpacked by populate_rootfs() in init/main.c, long before software_resume() is called. Though it doesn't cause any explicit problems ATM, it does introduce more uncertainties.

I don't want to cast the entire project in a negative light, though. It does work, and I'm fairly impressed by it. I do not want to take the feature away. I see it coexisting and sharing code nicely with any other solutions.

I've created a registration mechanism for PM 'drivers', and a way for users to select which driver they want to use for the different PM states. In the patch, swsusp is just another driver. It can coexist with ACPI or APM (theoretically) just fine, without requiring a kernel rebuild or reboot.

This also involves a generic framework for doing system-wide power management. In this, I've begun extracting bits from swsusp that are useful for any PM sequence. My goal is to reduce swsusp to just a small layer that writes/reads the saved pages from swap. The rest of the sequence, including memory and device handling, happens in generic code.

> > and will use them in developing a more palatable mechanism of doing STDs
>
> What is STD?

Suspend-to-disk.

> > http://ldm.bkbits.net:8080/linux-2.5-power
> >
>
> Can you post cumulative diff of work-in-progress?
> I am not permitted to use bk. Also please
> make sure that you post the diff before
> you merge it (and please Cc me).

Sure. From the above link, you can view the individual patches.
I would hope that you could use wget to snarf them, though I don't know if that's legally ok (nor do I want to know).

The cumulative patch is here:

http://kernel.org/pub/linux/kernel/people/mochel/power/pm-2.5.64.diff.gz

If I get a chance in the next few days, I'll post incremental diffs. Without them, the gradual changes are not so obvious.

I understand you may not want a rewrite of swsusp (regardless of how much cleaner the code is), and I respect that. I'm completely willing to leave kernel/suspend.c intact, and let you work on the integration into the generic PM model, and/or simply rename the new code something like swsusp2, swsusp-XP, or swsusp-pat. ;)

	-pat

^ permalink raw reply [flat|nested] 42+ messages in thread
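The point in this message that resume "should only be dependent on the binary format of the written metadata" can be sketched as a versioned image header that the loader validates instead of comparing kernel versions. The layout, field names and magic string below are hypothetical and are not swsusp's on-disk format. The sketch also says nothing about device state, which a header check cannot cover.

```python
import struct

# Hypothetical header: magic, format version, page size, page count.
HEADER = struct.Struct("<8sIIQ")
MAGIC = b"HYPOIMG\0"
FORMAT_VERSION = 1


def pack_header(page_size, page_count):
    """Build the header that would precede the saved pages."""
    return HEADER.pack(MAGIC, FORMAT_VERSION, page_size, page_count)


def check_header(raw, page_size):
    """Validate a header against the running system; return the page count.

    Raises ValueError if the image cannot be loaded. Only the metadata
    format and the page size are checked, not the kernel version.
    """
    if len(raw) < HEADER.size:
        raise ValueError("truncated header")
    magic, version, image_page_size, page_count = HEADER.unpack_from(raw)
    if magic != MAGIC:
        raise ValueError("not a suspend image")
    if version != FORMAT_VERSION:
        raise ValueError("unsupported image format version %d" % version)
    if image_page_size != page_size:
        raise ValueError("image page size does not match this system")
    return page_count


header = pack_header(page_size=4096, page_count=1000)
print(check_header(header, page_size=4096))
```

Bumping FORMAT_VERSION only when the layout changes would let images survive kernel updates that leave the format alone.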
* Re: SWSUSP Discontiguous pagedir patch 2003-03-07 17:14 ` Patrick Mochel @ 2003-03-07 20:27 ` Pavel Machek 2003-03-09 19:39 ` Benjamin Herrenschmidt 2003-03-10 16:49 ` Patrick Mochel 2003-03-07 20:36 ` Pavel Machek 2003-03-07 20:41 ` Pavel Machek 2 siblings, 2 replies; 42+ messages in thread From: Pavel Machek @ 2003-03-07 20:27 UTC (permalink / raw) To: Patrick Mochel; +Cc: Nigel Cunningham, Linux Kernel Mailing List Hi! > > > All in all, I think the idea of saving state to swap is dangerous for > > > various reasons. However, I like some of the other concepts of the code, > > > > Can you elaborate? I believe writing > > to swap is good for user; and it works. > > It does work, but there are uncertainties inherently present when using > such a solution. Some of them were just the behavior of the current code, > which I fixed, like: > > - Only ever using the first swap partition, regardless of space > left. But your solution would also only support *one* suspend partition, right? (And patches for using more than one swap partition are available for 2.4.X; I don't like them due to added complexity). > - Not resetting swap signature if a resume failed. Can be fixed in userland. Add option -s that for mkswap that fixes signature only if it was overwritten by suspend, and add mkswap -s /swap/partiton in your init scripts. > - Almost complete lack of a recovery path if anything failed (i.e. trying > to back out of what has happened, instead of calling BUG() or > panic()). Those BUGs / panics should be impossible to trigger. [And this has nothing to do with fact we suspend-to-swap]. > - Function names like do_magic() and friends. It is pretty magical operation, so you are at least warned. [And this has nothing to do with fact we suspend-to-swap]. > This types of things don't instill any confidence in a user or other > developer looking at the code. It gives the impression that the code is > the result of blind guess work in the dark. 
After looking at the code, it > was a shock to me that it worked at all. > > I understand that getting it to work involves dealing with the > uncertainties. However, there is no reason to pass them on to other users. > There were no comments as to what the do_magic*() functions did, let alone > why they were 'magic', and there were 5 of them. do_magic() replaces one kernel with another. That seems magical enough to me. [It is in 5 functions so that the real hard part can be in assembly]. > There are uncertainties still present in the code, like > > - #warning about waiting for data to reach the disk. > > - "Waiting for DMAs to settle down" delay on resume. > > I respect the paranoia. Howver, it's things like these that should be > dealt with before anything else. Feel free to fix them. [I believe both warning and waiting for DMA can be safely killed, but...] > The general problems that I see with the solution are: > > - It simply won't work if you're low on swap or memory. Your solution will not work with too small suspend partition too. Being low on memory... You'd have to have > 50% of your memory allocated by kernel for swsusp to fail. I do not think it can be sanely done other way. [Having separate disk drivers just for suspend is *not* sane.] > - It won't work if you're swap is not persistant across reboots. Your solution will not work if your suspend partition is not persistant across reboots. AND WHAT? > - It won't work if you don't use swap. Your solution will not work if your suspend partition is not there. > - It's dependent on the same exact kernel being loaded. > > It should only be dependent on the binary format of the written metadata. ...which leads to simpler design and few megabytes less transfered to / from disk. I do not think there's easy way to do it with different kernels. State of devices before switching to new kernel is important... 
> It also shouldn't be waiting until all the devices are probed and > initialized, but that problem is out of your hands. > > Another problem I see in the future is initramfs, and when things start > executing in there. It's currently unpacked by populate_rootfs() in > init/main.c, long before software_resume() is called. Though it doesn't > cause any explicit problems ATM, it does introduce more > uncertainties. Oops, I have not seen that one. Yep that may turn nasty in future. software_resume() should really be done before userland starts. > I don't want to cast the entire project in a negative light, though. It > does work, and I'm fairly impressed by it. Thanx. > I've created a registration mechanism for PM 'drivers', and a way for > users to select which driver they want to use for the different PM states. > In the patch, swsusp is just another driver. It can coexist with ACPI or > APM (theoretically) just fine, without requiring a kernel rebuild or > reboot. I believe it can coexist with ACPI and APM already just okay. You can echo 4b to /proc/acpi/sleep to trigger S4bios. > This also involves a generic framework for doing system-wide power > management. In this, I've begun extracting bits from swsusp that are > useful for any PM sequence. My goal is to reduce swsusp to just a small > layer that writes/reads the saved pages from swap. The rest of the > sequence, including memory and device handling, happens in generic > code. So you don't really want to create separate "suspend partition"? Good. More sharing between S3 and S4 is certainly good (but I do not think much more can be shared). > > > http://ldm.bkbits.net:8080/linux-2.5-power > > > > > > > Can you post cumulative diff of work-in-progress? > > I am not permitted to use bk. Also please > > make sure that you post the diff before > > you merge it (and please Cc me). > > Sure. From the above link, you can view the individual patches. 
I would
> hope that you could use wget to snarf them, though I don't know if that's
> legally ok (nor do I want to know).
>
> The cumulative patch is here:
>
> http://kernel.org/pub/linux/kernel/people/mochel/power/pm-2.5.64.diff.gz

Thanks.

> If I get a chance in the next few days, I'll post incremental diffs.
> Without them, the gradual changes are not so obvious.
>
> I understand you may not want a rewrite of swsusp (regardless of how much
> cleaner the code is), and I respect that. I'm completely willing to leave
> kernel/suspend.c intact, and let you work on the integration into the
> generic PM model, and/or simply rename the new code something like
> swsusp2, swsusp-XP, or swsusp-pat. ;)

So you want to develop swsusp-pat that will suspend to a partition, allow another kernel version, and you think you can suspend when 90% of your memory is kmalloc()-ed? Do you agree that separate disk drivers for suspend are a bad idea?

								Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

^ permalink raw reply [flat|nested] 42+ messages in thread
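The `mkswap -s` idea earlier in this message (restore the swap signature only if suspend overwrote it) can be sketched on an in-memory copy of the first swap page. The swap signature lives in the last 10 bytes of that page. The suspend marker strings used here are an assumption made for this example and should be checked against kernel/suspend.c. The 4096-byte page size is also assumed, and no `-s` option exists in mkswap as far as this sketch knows.

```python
PAGE_SIZE = 4096   # assumed page size (i386)
SIG_LEN = 10       # the swap signature is the last 10 bytes of the first page

# ASSUMPTION: markers written by suspend, mapped to the signature they replaced.
SUSPEND_TO_SWAP = {
    b"S1SUSP....": b"SWAP-SPACE",
    b"S2SUSP....": b"SWAPSPACE2",
}


def restore_swap_signature(first_page):
    """Return (page, changed) for the first page of a swap area.

    The signature is rewritten only when it holds a known suspend marker;
    a valid swap signature or unrelated data is left untouched.
    """
    if len(first_page) != PAGE_SIZE:
        raise ValueError("expected exactly one page")
    signature = bytes(first_page[-SIG_LEN:])
    original = SUSPEND_TO_SWAP.get(signature)
    if original is None:
        return bytes(first_page), False
    return bytes(first_page[:-SIG_LEN]) + original, True


# Example: a page left behind by a failed resume (contents are zero filler).
stale_page = bytes(PAGE_SIZE - SIG_LEN) + b"S2SUSP...."
fixed_page, changed = restore_swap_signature(stale_page)
print(changed, fixed_page[-SIG_LEN:])
```

Refusing to touch anything that is not a known marker is what makes this safe to run unconditionally from init scripts, which is the property the proposal relies on.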
* Re: SWSUSP Discontiguous pagedir patch 2003-03-07 20:27 ` Pavel Machek @ 2003-03-09 19:39 ` Benjamin Herrenschmidt 2003-03-09 20:12 ` Pavel Machek 2003-03-10 16:49 ` Patrick Mochel 1 sibling, 1 reply; 42+ messages in thread From: Benjamin Herrenschmidt @ 2003-03-09 19:39 UTC (permalink / raw) To: Pavel Machek; +Cc: Patrick Mochel, Nigel Cunningham, Linux Kernel Mailing List > > - It's dependent on the same exact kernel being loaded. > > > > It should only be dependent on the binary format of the written metadata. > > ...which leads to simpler design and few megabytes less transfered to > / from disk. I do not think there's easy way to do it with different > kernels. State of devices before switching to new kernel is important... I don't think so. IMHO, the "old" kernel (used for loading the suspend image) should quiesce devices in a pretty "normal" way in the exact same way kexec does (and using the same code path/driver notifiers). I see no reason why there should be any kind of dependency between the "loader" kernel and the "loaded" kernel in this regard. In fact, I'm considering for PPC to just trash the "loader" kernel when possible and directly load the suspend image from the bootloader Ben. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: SWSUSP Discontiguous pagedir patch 2003-03-09 19:39 ` Benjamin Herrenschmidt @ 2003-03-09 20:12 ` Pavel Machek 0 siblings, 0 replies; 42+ messages in thread From: Pavel Machek @ 2003-03-09 20:12 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Patrick Mochel, Nigel Cunningham, Linux Kernel Mailing List Hi! > > > - It's dependent on the same exact kernel being loaded. > > > > > > It should only be dependent on the binary format of the written metadata. > > > > ...which leads to simpler design and few megabytes less transfered to > > / from disk. I do not think there's easy way to do it with different > > kernels. State of devices before switching to new kernel is important... > > I don't think so. > > IMHO, the "old" kernel (used for loading the suspend image) should > quiesce devices in a pretty "normal" way in the exact same way > kexec does (and using the same code path/driver notifiers). I see > no reason why there should be any kind of dependency between the > "loader" kernel and the "loaded" kernel in this regard. But if you add support for quiescing matrox in 2.6.5, you will not be able to resume 2.6.5 from 2.6.4 kernel. And as bugs are going to be in that area for a while I'd prefer people to suspend and resume with same kernel. Pavel -- Horseback riding is like software... ...vgf orggre jura vgf serr. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: SWSUSP Discontiguous pagedir patch 2003-03-07 20:27 ` Pavel Machek 2003-03-09 19:39 ` Benjamin Herrenschmidt @ 2003-03-10 16:49 ` Patrick Mochel 2003-03-10 19:23 ` Pavel Machek 1 sibling, 1 reply; 42+ messages in thread
From: Patrick Mochel @ 2003-03-10 16:49 UTC (permalink / raw)
To: Pavel Machek; +Cc: Nigel Cunningham, Linux Kernel Mailing List

> But your solution would also only support *one* suspend partition,
> right? (And patches for using more than one swap partition are
> available for 2.4.X; I don't like them due to added complexity).

Having a dedicated partition has an advantage in just that: it's dedicated to saving system state. Users must consciously create it, and must make it as big as the size of memory they have (or will have). Plus, it's not tied to the amount of memory being used when you suspend.

Swap space has a specific purpose; I see it as a detriment to overload its intended usage. Of course, that's just my opinion, and I don't have code to back it up.

> > - Not resetting swap signature if a resume failed.
>
> Can be fixed in userland. Add an option -s for mkswap that fixes the
> signature only if it was overwritten by suspend, and add mkswap -s
> /swap/partition in your init scripts.

That's wrong, IMO. If the kernel modifies it, it should reset it. You shouldn't impose extra burden on the users because your code failed. Besides, it's fixed anyway.

> > - Almost complete lack of a recovery path if anything failed (i.e. trying
> > to back out of what has happened, instead of calling BUG() or
> > panic()).
>
> Those BUGs / panics should be impossible to trigger. [And this has
> nothing to do with fact we suspend-to-swap].

[ I know these are not suspend-to-swap specific; sorry for implying that. ]

If they're really impossible to trigger, then they shouldn't be there at all. If you can recover from them, then you should, instead of giving up.
Besides, a lot of them were completely bogus things like

	BUG_ON(sizeof(foo) != sizeof(bar))

which are known at compile time, but were buried in the code to read/write the data, and only convoluted the code even more.

> > - Function names like do_magic() and friends.
>
> It is pretty magical operation, so you are at least warned. [And this has
> nothing to do with fact we suspend-to-swap].

IMO, warnings should be conveyed in comments, not in cryptic function names. Besides, there is nothing magical about it, unless that sequence of instructions actually does make your computer glow, levitate, or turn into a mermaid. In which case, I would like to know where I can find one. ;)

Seriously, you described below what it does, which helps a lot more than anything named 'magic'.

> > The general problems that I see with the solution are:
> >
> > - It simply won't work if you're low on swap or memory.
>
> Your solution will not work with too small suspend partition
> too. Being low on memory... You'd have to have > 50% of your memory
> allocated by kernel for swsusp to fail. I do not think it can be
> sanely done other way. [Having separate disk drivers just for suspend
> is *not* sane.]
>
> > - It won't work if your swap is not persistent across reboots.
>
> Your solution will not work if your suspend partition is not
> persistent across reboots. AND WHAT?
>
> > - It won't work if you don't use swap.
>
> Your solution will not work if your suspend partition is not there.

I didn't mean to sound like a hypocrite; I apologize. The advantage of using a dedicated partition over swap is that in order to create the partition, the user must make a conscious decision to do so. There are parameters that can be enforced when making the partition, like the size and its existence on a persistent medium.

These can be enforced by a user making a swap partition, but it places extra burden on the user.

> > I've created a registration mechanism for PM 'drivers', and a way for
> > users to select which driver they want to use for the different PM states.
> > In the patch, swsusp is just another driver. It can coexist with ACPI or
> > APM (theoretically) just fine, without requiring a kernel rebuild or
> > reboot.
>
> I believe it can coexist with ACPI and APM already just okay. You can
> echo 4b to /proc/acpi/sleep to trigger S4bios.
>
> > This also involves a generic framework for doing system-wide power
> > management. In this, I've begun extracting bits from swsusp that are
> > useful for any PM sequence. My goal is to reduce swsusp to just a small
> > layer that writes/reads the saved pages from swap. The rest of the
> > sequence, including memory and device handling, happens in generic
> > code.
>
> So you don't really want to create separate "suspend partition"? Good.

Sorry, the patch included a few distinct things, and I should have made it
a bit more clear. It includes:

- A generic PM framework which PM drivers can register with.

Users can specify which handler they wish to use for different states,
based on their preference or the capabilities of their systems.

They can also use one mechanism for entering power states:
/sys/power/power_state, instead of relying on different mechanisms for
different PM drivers (/proc/acpi/sleep vs. apm(1) vs. sys_reboot()).

- Generic sequence for entering sleep states, in drivers/power/main.c

- Clean up of swsusp.

- Conversion of swsusp and ACPI to register with the PM model.

- Extraction of swsusp-specific features into the generic PM framework, so
  they can be shared with everyone.

In the long run, I'd like to develop a solution using a dedicated
partition. But, that wouldn't necessarily obviate the use of swsusp. It
would coexist alongside it.

> > I understand you may not want a rewrite of swsusp (regardless of how much
> > cleaner the code is), and I respect that. I'm completely willing to leave
> > kernel/suspend.c intact, and let you work in the integration into the
> > generic PM model, and/or simply rename the new code something like
> > swsusp2, swsusp-XP, or swsusp-pat. ;)
>
> So you want to develop swsusp-pat that will suspend to partition,
> allow another kernel version, and you think you can suspend when 90%
> of your memory is kmalloc()-ed? Do you agree that separate disk
> drivers for suspend is bad idea?

Yes.

        -pat

^ permalink raw reply [flat|nested] 42+ messages in thread
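
The message above objects to runtime checks of the form
BUG_ON(sizeof(foo) != sizeof(bar)), since both sizes are known at compile
time. A minimal sketch of the alternative, in plain C11 rather than kernel
code, is shown below. The struct names and the helper header_to_disk() are
hypothetical and are not taken from swsusp or from the patch under
discussion. Later kernels provide BUILD_BUG_ON() for the same purpose.

```c
#include <stdint.h>

/* Hypothetical on-disk header and its in-memory twin (illustration only). */
struct disk_header {
    uint32_t magic;
    uint32_t num_pages;
};

struct mem_header {
    uint32_t magic;
    uint32_t num_pages;
};

/*
 * Checked by the compiler: if the two layouts ever diverge in size, the
 * build fails, and nothing is left to trip over while suspending.
 */
_Static_assert(sizeof(struct disk_header) == sizeof(struct mem_header),
               "disk_header and mem_header must have the same size");

/* Copies the fields; no runtime size check is needed in the I/O path. */
static void header_to_disk(struct disk_header *dst,
                           const struct mem_header *src)
{
    dst->magic = src->magic;
    dst->num_pages = src->num_pages;
}
```

With the size check hoisted out, the read/write code contains only the copy
itself, which is the simplification the message argues for.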
* Re: SWSUSP Discontiguous pagedir patch
  2003-03-10 16:49 ` Patrick Mochel
@ 2003-03-10 19:23 ` Pavel Machek
  2003-03-10 19:05 ` Patrick Mochel
  2003-03-10 22:17 ` Nigel Cunningham
  0 siblings, 2 replies; 42+ messages in thread
From: Pavel Machek @ 2003-03-10 19:23 UTC (permalink / raw)
To: Patrick Mochel; +Cc: Nigel Cunningham, Linux Kernel Mailing List

Hi!

> > But your solution would also only support *one* suspend partition,
> > right? (And patches for using more than one swap partition are
> > available for 2.4.X; I don't like them due to added complexity).
>
> Having a dedicated partition has an advantage in just that - it's
> dedicated to saving system state. Users must consciously create it, and
> must make it as big as the size of memory they have (or will have). Plus,
> it's not tied to the amount of memory being used when you suspend.

That's a problem. Users do not have suspend partitions, but they do
have swap partition. And repartitioning existing installation is very
painful.

OTOH it is true that if we want
"emergency-suspend-to-disk-when-battery-low", dedicated partition
makes some sense....

... Well. You can always do swapoff, swapon, swsusp. Maybe some
processes will die, but that's life ;-). [But for that you'd have to
guarantee that suspend always works, which is hard, anyway.]

> Swap space has a specific purpose, I see it as a detriment to overload its
> intended usage. Of course, that's just my opinion, and I don't have code to
> back it up.

Well, I see it as advantage because I have swap space anyway (rarely
really used), so why not reuse it for swsusp?

> > It is pretty magical operation, so you are at least warned. [And this has
> > nothing to do with fact we suspend-to-swap].
>
> IMO, warnings should be conveyed in comments, not in cryptic function
> names. Besides, there is nothing magical about it, unless that sequence of
> instructions actually does make your computer glow, levitate, or turn into
> a mermaid. In which case, I would like to know where I can find one. ;)

:-). Well, comments were getting out of date because code was in
permanent flux. It makes sense to comment it now.

> > Your solution will not work if your suspend partition is not there.
>
> I didn't mean to sound like a hypocrite, I apologize. The advantage of
> using a dedicated partition over swap is that in order to create the
> partition, the user must make a conscious decision to do so.
>
> There are parameters that can be enforced when making the partition, like
> the size and its existence on a persistent medium. These can be enforced
> by a user making a swap partition, but it places extra burden on the user.

Well, IMO checklist like:

if you want to use swsusp you have to

a) check swap is on persistent medium

b) make sure swap is at least as big as memory/2 [not really
necessary, we might be lucky and swsusp with 30MB of swap...]

is easier for the user than repartitioning their harddrives.

[I'd like to see someone running swap on floppy ;-)]

> > So you don't really want to create separate "suspend partition"? Good.
>
> Sorry, the patch included a few distinct things, and I should have made it
> a bit more clear. It includes:
>
> - A generic PM framework which PM drivers can register with.
>
> Users can specify which handler they wish to use for different states,
> based on their preference or the capabilities of their systems.
>
> They can also use one mechanism for entering power states:
> /sys/power/power_state, instead of relying on different mechanisms for
> different PM drivers (/proc/acpi/sleep vs. apm(1)
> vs. sys_reboot()).

I believe sys_reboot() is the right way to do that. /sys/power/...
needs sysfs mounted etc. /proc/acpi/sleep just happened to already be
there and be very convenient.

> In the long run, I'd like to develop a solution using a dedicated
> partition. But, that wouldn't necessarily obviate the use of swsusp. It
> would coexist alongside it.

Actually "dedicated partition" vs. "swap partition" is quite a small
detail. It only affects disk allocation routines. Basic stuff like
"atomic copy" stays the same...

> > > I understand you may not want a rewrite of swsusp (regardless of how much
> > > cleaner the code is), and I respect that. I'm completely willing to leave
> > > kernel/suspend.c intact, and let you work in the integration into the
> > > generic PM model, and/or simply rename the new code something like
> > > swsusp2, swsusp-XP, or swsusp-pat. ;)
> >
> > So you want to develop swsusp-pat that will suspend to partition,
> > allow another kernel version, and you think you can suspend when 90%
> > of your memory is kmalloc()-ed? Do you agree that separate disk
> > drivers for suspend is bad idea?
>
> Yes.

Do you think you can suspend with 90% memory kmalloc()-ed?

                                                                Pavel
--
Horseback riding is like software...
...vgf orggre jura vgf serr.

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: SWSUSP Discontiguous pagedir patch
  2003-03-10 19:23 ` Pavel Machek
@ 2003-03-10 19:05 ` Patrick Mochel
  2003-03-10 22:17 ` Nigel Cunningham
  1 sibling, 0 replies; 42+ messages in thread
From: Patrick Mochel @ 2003-03-10 19:05 UTC (permalink / raw)
To: Pavel Machek; +Cc: Nigel Cunningham, Linux Kernel Mailing List

> Do you think you can suspend with 90% memory kmalloc()-ed?

Dunno. I need to iron out some other details before I get to play with
this..

        -pat

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: SWSUSP Discontiguous pagedir patch
  2003-03-10 19:23 ` Pavel Machek
  2003-03-10 19:05 ` Patrick Mochel
@ 2003-03-10 22:17 ` Nigel Cunningham
  2003-03-10 23:20 ` Pavel Machek
  1 sibling, 1 reply; 42+ messages in thread
From: Nigel Cunningham @ 2003-03-10 22:17 UTC (permalink / raw)
To: Pavel Machek; +Cc: Patrick Mochel, Linux Kernel Mailing List

Hi.

On Tue, 2003-03-11 at 08:23, Pavel Machek wrote:
> Do you think you can suspend with 90% memory kmalloc()-ed?

Is that a fair question? Would 90% of memory ever be kmalloced? If the
question is can you suspend with 90% of memory used, then I can answer
yes. I do it all the time under the code I'm porting to 2.5. (Nearly
there, by the way).

Regards,

Nigel

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: SWSUSP Discontiguous pagedir patch
  2003-03-10 22:17 ` Nigel Cunningham
@ 2003-03-10 23:20 ` Pavel Machek
  0 siblings, 0 replies; 42+ messages in thread
From: Pavel Machek @ 2003-03-10 23:20 UTC (permalink / raw)
To: Nigel Cunningham; +Cc: Patrick Mochel, Linux Kernel Mailing List

Hi!

> > Do you think you can suspend with 90% memory kmalloc()-ed?
>
> Is that a fair question? Would 90% of memory ever be kmalloced? If the
> question is can you suspend with 90% of memory used, then I can answer
> yes. I do it all the time under the code I'm porting to 2.5. (Nearly
> there, by the way).

No, it was not fair question, not at all. If he'd replied with yes,
I'd tell him I don't believe that ;-).

                                                                Pavel
--
Horseback riding is like software...
...vgf orggre jura vgf serr.

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: SWSUSP Discontiguous pagedir patch
  2003-03-07 17:14 ` Patrick Mochel
  2003-03-07 20:27 ` Pavel Machek
@ 2003-03-07 20:36 ` Pavel Machek
  2003-03-10 16:51 ` Patrick Mochel
  2003-03-07 20:41 ` Pavel Machek
  2 siblings, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2003-03-07 20:36 UTC (permalink / raw)
To: Patrick Mochel; +Cc: Nigel Cunningham, Linux Kernel Mailing List

Hi!

> The cumulative patch is here:
>
> http://kernel.org/pub/linux/kernel/people/mochel/power/pm-2.5.64.diff.gz

Hmm, I am not sure if drivers/power is the right place for stuff like
fridge.c. That might be useful for other stuff, too.

I do not think placing swsusp.h in drivers/power/swsusp is right. It
should be in include/linux or include/linux/power.

                                                                Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: SWSUSP Discontiguous pagedir patch
  2003-03-07 20:36 ` Pavel Machek
@ 2003-03-10 16:51 ` Patrick Mochel
  2003-03-10 19:12 ` Pavel Machek
  0 siblings, 1 reply; 42+ messages in thread
From: Patrick Mochel @ 2003-03-10 16:51 UTC (permalink / raw)
To: Pavel Machek; +Cc: Nigel Cunningham, Linux Kernel Mailing List

> > The cumulative patch is here:
> >
> > http://kernel.org/pub/linux/kernel/people/mochel/power/pm-2.5.64.diff.gz
>
> Hmm, I am not sure if drivers/power is the right place for stuff like
> fridge.c. That might be useful for other stuff, too.

That's fine. If it proves useful for other things, we can move it.

> I do not think placing swsusp.h in drivers/power/swsusp is right. It
> should be in include/linux or include/linux/power.

That header is only for the shared functions between
drivers/power/swsusp/*.c. There's no need to export it to everyone.

Under the new model, nothing would call swsusp directly. It would call the
model's functions, which would delegate the call to the user-specified
handler for the action.

        -pat

^ permalink raw reply [flat|nested] 42+ messages in thread
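
The delegation described in the message above (callers go through the
model, which forwards to whichever handler the user selected for a state)
can be sketched roughly as follows. This is a user-space illustration under
stated assumptions, not the interface from pm-2.5.64.diff.gz: every name
here (struct pm_driver, pm_register_driver(), pm_select_handler(),
pm_enter_state(), the demo driver) is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sleep states; the real patch may name these differently. */
enum pm_state { PM_STANDBY, PM_SUSPEND_MEM, PM_SUSPEND_DISK, PM_NUM_STATES };

/* A PM "driver": a name, the states it supports, and an entry hook. */
struct pm_driver {
    const char *name;
    unsigned int states;                 /* bitmask of (1u << pm_state) */
    int (*enter)(enum pm_state state);   /* returns 0 on success */
};

#define PM_MAX_DRIVERS 8

static struct pm_driver *pm_drivers[PM_MAX_DRIVERS]; /* registered drivers */
static struct pm_driver *pm_handler[PM_NUM_STATES];  /* chosen per state */

static struct pm_driver *pm_find_driver(const char *name)
{
    size_t i;

    for (i = 0; i < PM_MAX_DRIVERS; i++)
        if (pm_drivers[i] && strcmp(pm_drivers[i]->name, name) == 0)
            return pm_drivers[i];
    return NULL;
}

/* Register a driver; fails on bad input, duplicate name, or full table. */
int pm_register_driver(struct pm_driver *drv)
{
    size_t i;

    if (!drv || !drv->name || !drv->enter || pm_find_driver(drv->name))
        return -1;
    for (i = 0; i < PM_MAX_DRIVERS; i++) {
        if (!pm_drivers[i]) {
            pm_drivers[i] = drv;
            return 0;
        }
    }
    return -1;
}

/* User policy: pick which registered driver handles a given state. */
int pm_select_handler(enum pm_state state, const char *name)
{
    struct pm_driver *drv;

    if ((unsigned int)state >= PM_NUM_STATES || !name)
        return -1;
    drv = pm_find_driver(name);
    if (!drv || !(drv->states & (1u << state)))
        return -1;
    pm_handler[state] = drv;
    return 0;
}

/* The single entry point: callers never invoke a driver directly. */
int pm_enter_state(enum pm_state state)
{
    if ((unsigned int)state >= PM_NUM_STATES || !pm_handler[state])
        return -1;
    return pm_handler[state]->enter(state);
}

/* Demo driver standing in for something like swsusp (illustration only). */
static int demo_calls;

static int demo_disk_enter(enum pm_state state)
{
    (void)state;
    demo_calls++;
    return 0;
}

static struct pm_driver demo_disk_driver = {
    .name = "demo-disk",
    .states = 1u << PM_SUSPEND_DISK,
    .enter = demo_disk_enter,
};
```

In this sketch a caller would register demo_disk_driver, select it with
pm_select_handler(PM_SUSPEND_DISK, "demo-disk"), and then only ever call
pm_enter_state(PM_SUSPEND_DISK). Because the driver's hook is reached only
through the table, a header shared among the driver's own files would not
need to be visible to the rest of the tree.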
* Re: SWSUSP Discontiguous pagedir patch
  2003-03-10 16:51 ` Patrick Mochel
@ 2003-03-10 19:12 ` Pavel Machek
  2003-03-10 18:59 ` Patrick Mochel
  0 siblings, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2003-03-10 19:12 UTC (permalink / raw)
To: Patrick Mochel; +Cc: Nigel Cunningham, Linux Kernel Mailing List

Hi!

> > > The cumulative patch is here:
> > >
> > > http://kernel.org/pub/linux/kernel/people/mochel/power/pm-2.5.64.diff.gz
> >
> > Hmm, I am not sure if drivers/power is the right place for stuff like
> > fridge.c. That might be useful for other stuff, too.
>
> That's fine. If it proves useful for other things, we can move it.

Actually, I'd like driver model to specify that things are
refrigerated when device_suspend() and friends are being run. That
should make drivers a lot simpler. [And as non-bitkeeper-capable user
I fear moves ;-)]

> > I do not think placing swsusp.h in drivers/power/swsusp is right. It
> > should be in include/linux or include/linux/power.
>
> That header is only for the shared functions between
> drivers/power/swsusp/*.c. There's no need to export it to everyone.

Well, last time acpi introduced its private include/ directory, it was
a disaster.

                                                                Pavel
--
Horseback riding is like software...
...vgf orggre jura vgf serr.

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: SWSUSP Discontiguous pagedir patch
  2003-03-10 19:12 ` Pavel Machek
@ 2003-03-10 18:59 ` Patrick Mochel
  0 siblings, 0 replies; 42+ messages in thread
From: Patrick Mochel @ 2003-03-10 18:59 UTC (permalink / raw)
To: Pavel Machek; +Cc: Nigel Cunningham, Linux Kernel Mailing List

> > > Hmm, I am not sure if drivers/power is the right place for stuff like
> > > fridge.c. That might be useful for other stuff, too.
> >
> > That's fine. If it proves useful for other things, we can move it.
>
> Actually, I'd like driver model to specify that things are
> refrigerated when device_suspend() and friends are being run. That
> should make drivers a lot simpler. [And as non-bitkeeper-capable user
> I fear moves ;-)]

That's a policy decision outside of the scope of the driver model. It is,
however, inside the scope of the PM model, and by using the generic
framework, this decision can be guaranteed to be made.

> > > I do not think placing swsusp.h in drivers/power/swsusp is right. It
> > > should be in include/linux or include/linux/power.
> >
> > That header is only for the shared functions between
> > drivers/power/swsusp/*.c. There's no need to export it to everyone.
>
> Well, last time acpi introduced its private include/ directory, it was
> a disaster.

I don't necessarily agree. IMO, putting things in include/whatever/ makes
it easy for other code to directly access those functions, some of which
you never want people calling directly. And, if it's there, it's likely
someone will use it someday.

But, in the end it's your code, so I don't really care.

        -pat

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: SWSUSP Discontiguous pagedir patch
  2003-03-07 17:14 ` Patrick Mochel
  2003-03-07 20:27 ` Pavel Machek
  2003-03-07 20:36 ` Pavel Machek
@ 2003-03-07 20:41 ` Pavel Machek
  2 siblings, 0 replies; 42+ messages in thread
From: Pavel Machek @ 2003-03-07 20:41 UTC (permalink / raw)
To: Patrick Mochel; +Cc: Pavel Machek, Nigel Cunningham, Linux Kernel Mailing List

Hi!

> http://kernel.org/pub/linux/kernel/people/mochel/power/pm-2.5.64.diff.gz

+static inline void suspend_restore_mem(void)

This has to be in assembly. You can't trust gcc not to move stack
pointer.

                                                                Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: Linux vs Windows temperature anomaly
@ 2003-03-06 17:29 Ed Vance
0 siblings, 0 replies; 42+ messages in thread
From: Ed Vance @ 2003-03-06 17:29 UTC (permalink / raw)
To: 'Ed Sweetman'; +Cc: Corvus Corax, linux-kernel
On Thu, March 06, 2003 at 12:58 AM, Ed Sweetman wrote:
>
> I believe the originator of the thread went back to check and
> see if he can find out exactly what his Windows drivers are
> enabling. the rest of the thread has been arguing over if
> linux can load hardware more than windows can and what puts
> off heat and what doesn't which is stupid.
>
> i think the topic of the thread is a bunch of BS because
> unless he has a driver that is for some reason changing the
> frequency of something or the voltage then linux is not going
> to stress the system more than windows. The whole thing reeks
> of FUD whether intentional or not.
>
Well, here's _my_ stupid BS and (intentional) FUD question: 8)
Does anybody know if XP actively performs progressive power
management actions as the CPU temperature increases inside the
normal operating range? If this were done somewhat linearly
starting at "medium rare", instead of only at "well done" to
save the hardware, wouldn't it look like the reported anomaly?
Always suspicious ;)
Ed
----------------------------------------------------------------
Ed Vance edv (at) macrolink (dot) com
Macrolink, Inc. 1500 N. Kellogg Dr Anaheim, CA 92807
----------------------------------------------------------------
^ permalink raw reply [flat|nested] 42+ messages in thread