linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* BUG in tmpfs
@ 2006-01-19  0:14 Grzegorz Kulewski
  2006-01-19  3:14 ` Greg KH
  0 siblings, 1 reply; 4+ messages in thread
From: Grzegorz Kulewski @ 2006-01-19  0:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: Con Kolivas, ck, Linus Torvalds

Hi,

I managed to get something like this:

kernel BUG at mm/shmem.c:836!
Jan 17 23:31:42 kangur [4312977.009000] CPU:    0
Jan 17 23:31:42 kangur [4312977.009000] EIP:    0060:[<b014a429>] 
Tainted: P      VLI
Jan 17 23:31:42 kangur [4312977.009000] EFLAGS: 00210206 
(2.6.15-rc6-git4-ck1)
Jan 17 23:31:42 kangur [4312977.009000] EIP is at 
shmem_writepage+0xa6/0x15a
Jan 17 23:31:42 kangur [4312977.009000] eax: e45a5778   ebx: e45a5778 
ecx: 00000000   edx: 00000001
Jan 17 23:31:42 kangur [4312977.009000] esi: b3641000   edi: b1644000 
ebp: e45a57bc   esp: b3641d40
Jan 17 23:31:42 kangur [4312977.009000] ds: 007b   es: 007b   ss: 0068
Jan 17 23:31:42 kangur [4312977.009000] Process cc1 (pid: 2054, 
threadinfo=b3641000 task=b4173050)
Jan 17 23:31:42 kangur [4312977.009000] Stack: e45a5760 00013b24 00013b24 
b1644000 b1644000 e45a585c b3641e1c b013e168
Jan 17 23:31:42 kangur [4312977.009000]        00000000 00000000 00000000 
00000020 00000000 00000000 00000000 00000000
Jan 17 23:31:42 kangur [4312977.009000]        00000000 00000009 00000060 
00200246 b1644000 e45a585c e45a585c b013e3d1
Jan 17 23:31:42 kangur [4312977.009000] Call Trace:
Jan 17 23:31:42 kangur [4312977.009000]  [<b013e168>] pageout+0xbd/0xfa
Jan 17 23:31:42 kangur [4312977.009000]  [<b013e3d1>] 
shrink_list+0x22c/0x439
Jan 17 23:31:42 kangur [4312977.009000]  [<b0110db3>] 
smp_apic_timer_interrupt+0x34/0x77
Jan 17 23:31:42 kangur [4312977.009000]  [<b0103590>] 
apic_timer_interrupt+0x1c/0x24
Jan 17 23:31:42 kangur [4312977.009000]  [<b013d5e4>] 
__pagevec_release+0x15/0x1d
Jan 17 23:31:42 kangur [4312977.009000]  [<b013e77f>] 
shrink_cache+0x108/0x262
Jan 17 23:31:42 kangur [4312977.009000]  [<b013ed4c>] 
shrink_zone+0xa6/0xbc
Jan 17 23:31:42 kangur [4312977.009000]  [<b013ee19>] 
shrink_caches+0x4c/0x57
Jan 17 23:31:42 kangur [4312977.009000]  [<b013ef15>] 
try_to_free_pages+0xf1/0x1b9
Jan 17 23:31:42 kangur [4312977.009000]  [<b0137cc6>] 
__alloc_pages+0x171/0x266
Jan 17 23:31:42 kangur [4312977.009000]  [<b0142041>] 
do_anonymous_page+0x3e/0x143
Jan 17 23:31:42 kangur [4312977.009000]  [<b0142436>] 
__handle_mm_fault+0xa4/0x17f
Jan 17 23:31:42 kangur [4312977.009000]  [<b01134b9>] 
do_page_fault+0x176/0x4c1
Jan 17 23:31:42 kangur [4312977.009000]  [<b011babe>] irq_exit+0x29/0x34
Jan 17 23:31:42 kangur [4312977.009000]  [<b0104ade>] do_IRQ+0x52/0x5c
Jan 17 23:31:42 kangur [4312977.009000]  [<b0113343>] 
do_page_fault+0x0/0x4c1
Jan 17 23:31:42 kangur [4312977.009000]  [<b010365b>] error_code+0x4f/0x54
Jan 17 23:31:42 kangur [4312977.009000] Code: 0b 3f 03 13 e2 28 b0 e9 a0 
00 00 00 8b 04 24 31 c9 89 da e8 61 f3 ff ff 85 c0 89 c3 75 08 0f 0b 43 03 
13 e2 28 b0 8
3 3b 00 74 08 <0f> 0b 44 03 13 e2 28 b0 8b 54 24 04 89 f8 e8 7e cd ff ff 
85 c0
Jan 17 23:31:42 kangur [4312977.009000]  <6>note: cc1[2054] exited with 
preempt_count 1


after about half an hour of compiling 4.1.0_beta20060113 (from Gentoo) 
with all compilation files at tmpfs.

It does not look like it is easily reproductible and takes long time so I 
don't know if I will be able to reproduce it again in any sane time. But 
maybe somebody who knows mm better than me will be able to do something 
with this or will be able to say - "Hey it is fixed already in ...".

This is on kernel 2.6.15-rc6-git4-ck1 (so ck list in CC but I am not 
saying that -ck is responsible - there are some patches like swap prefetch 
in it so maybe...?) on Athlon XP 2000MHz machine with 1GB of RAM and 4GB 
of SWAP (in file on ext3 partition). I am rather sure that the hardware is 
100% ok because it was build 5 weeks ago and tested with memtest and prime 
for hours. Also there were no such problems with this kernel. Of course 
(it is Gentoo) whole system was compiled from source (with all compilation 
files on this tmpfs). So I don't know what could cause this.

In case somebody wonders what this line looks like in my kernel, here it 
is:

         BUG_ON(!PageLocked(page));
         BUG_ON(page_mapped(page));

         mapping = page->mapping;
         index = page->index;
         inode = mapping->host;
         info = SHMEM_I(inode);
         if (info->flags & VM_LOCKED)
                 goto redirty;
         swap = get_swap_page();
         if (!swap.val)
                 goto redirty;

         spin_lock(&info->lock);
         shmem_recalc_inode(inode);
         if (index >= info->next_index) {
                 BUG_ON(!(info->flags & SHMEM_TRUNCATE));
                 goto unlock;
         }
         entry = shmem_swp_entry(info, index, NULL);
         BUG_ON(!entry);
THIS LINE =>=>=>        BUG_ON(entry->val);

         if (move_to_swap_cache(page, swap) == 0) {
                 shmem_swp_set(info, entry, swap.val);
                 shmem_swp_unmap(entry);
                 spin_unlock(&info->lock);
                 if (list_empty(&info->swaplist)) {
                         spin_lock(&shmem_swaplist_lock);
                         /* move instead of add in case we're racing */
                         list_move_tail(&info->swaplist, &shmem_swaplist);
                         spin_unlock(&shmem_swaplist_lock);
                 }
                 unlock_page(page);
                 return 0;
         }

         shmem_swp_unmap(entry);



Thanks in advance,

Grzegorz Kulewski


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: BUG in tmpfs
  2006-01-19  0:14 BUG in tmpfs Grzegorz Kulewski
@ 2006-01-19  3:14 ` Greg KH
  2006-01-19 13:50   ` Grzegorz Kulewski
  0 siblings, 1 reply; 4+ messages in thread
From: Greg KH @ 2006-01-19  3:14 UTC (permalink / raw)
  To: Grzegorz Kulewski; +Cc: linux-kernel, Con Kolivas, ck, Linus Torvalds

On Thu, Jan 19, 2006 at 01:14:56AM +0100, Grzegorz Kulewski wrote:
> Hi,
> 
> I managed to get something like this:
> 
> kernel BUG at mm/shmem.c:836!
> Jan 17 23:31:42 kangur [4312977.009000] CPU:    0
> Jan 17 23:31:42 kangur [4312977.009000] EIP:    0060:[<b014a429>] 
> Tainted: P      VLI

Can you duplicate this on a 2.6.15 or newer kernel, without a closed
source driver loaded?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: BUG in tmpfs
  2006-01-19  3:14 ` Greg KH
@ 2006-01-19 13:50   ` Grzegorz Kulewski
  2006-01-19 18:12     ` Hugh Dickins
  0 siblings, 1 reply; 4+ messages in thread
From: Grzegorz Kulewski @ 2006-01-19 13:50 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, Con Kolivas, ck, Linus Torvalds

On Wed, 18 Jan 2006, Greg KH wrote:

> On Thu, Jan 19, 2006 at 01:14:56AM +0100, Grzegorz Kulewski wrote:
>> Hi,
>>
>> I managed to get something like this:
>>
>> kernel BUG at mm/shmem.c:836!
>> Jan 17 23:31:42 kangur [4312977.009000] CPU:    0
>> Jan 17 23:31:42 kangur [4312977.009000] EIP:    0060:[<b014a429>]
>> Tainted: P      VLI
>
> Can you duplicate this on a 2.6.15 or newer kernel, without a closed
> source driver loaded?

1. Well, as I said in my message I am not able to reproduce this on demand 
at all. It is the first time it ever happened to me in two months. If 
somebody is interested I can try very hardly to reproduce it but I would 
not expect success.

2. As to the tainted kernel. Well, I know it is tainted (by something 
like madwifi or nvidia loaded by coldplug and not even used from bootup). 
So this is very unlikely that this is the cause. But of course I can not 
guarantee that. But because of #1 -  I am not able to reproduce it with 
untainted kernel - I sent it to this list. Maybe it will make some bells 
ring in some smart head, especially if that head knows how those functions 
and data structures are supposed to work. If no - well - what could I do 
more?

I don't have infrastructure for making 24/7 stress tests of tmpfs (or 
anything other). But maybe if somebody has some spare machines he could 
try doing it because I am seeing different oopses/crashes/bugs in tmpfs on 
different kernel (on x86 and x86-64) from time to time. They are _very_ 
hard to reproduce and are probably also fixed from time to time (maybe by 
other not related changes not because somebody spoted bug in real life). 
Something like making whole system compiles and upgrades in a loop till 
error happens will proably be enough to catch most such errors after some 
time and doing it continously could prevent any regressions in newer 
kernels. All I could do is to try compiling that gcc 3 to 5 times a day 
with all proprietary modules not loaded (and that means real pain on 
desktop system btw).


Thanks,

Grzegorz Kulewski

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: BUG in tmpfs
  2006-01-19 13:50   ` Grzegorz Kulewski
@ 2006-01-19 18:12     ` Hugh Dickins
  0 siblings, 0 replies; 4+ messages in thread
From: Hugh Dickins @ 2006-01-19 18:12 UTC (permalink / raw)
  To: Grzegorz Kulewski; +Cc: Greg KH, linux-kernel, Con Kolivas, ck, Linus Torvalds

On Thu, 19 Jan 2006, Grzegorz Kulewski wrote:
> > On Thu, Jan 19, 2006 at 01:14:56AM +0100, Grzegorz Kulewski wrote:
> > > 
> > > kernel BUG at mm/shmem.c:836!
> > > Jan 17 23:31:42 kangur [4312977.009000] CPU:    0
> > > Jan 17 23:31:42 kangur [4312977.009000] EIP:    0060:[<b014a429>]
> > > Tainted: P      VLI
> > > Jan 17 23:31:42 kangur [4312977.009000] EFLAGS: 00210206 
> > > (2.6.15-rc6-git4-ck1)
> > > Jan 17 23:31:42 kangur [4312977.009000] EIP is at 
> > > shmem_writepage+0xa6/0x15a
>... 
> I don't have infrastructure for making 24/7 stress tests of tmpfs (or anything
> other). But maybe if somebody has some spare machines he could try doing it
> because I am seeing different oopses/crashes/bugs in tmpfs on different kernel
> (on x86 and x86-64) from time to time.

If you still have the logs from those, please do send me.  But nobody
else has reported any such tmpfs crashes for two or three years - though
I bet just the saying of that will help someone to remind me I'm wrong ;)

While it's easy to imagine some nasty interaction between swap prefetch
and shmem_writepage's peculiar use of swap, a quick glance at that patch
didn't ring any alarm bells for me, and it's not obvious how a problem
there would result in your particular "BUG_ON(entry->val);" - which could
as well come from random corruption (and GregKH has pointed out that your
kernel is P Tainted).

I've not been testing tmpfs on -ck, but the swap prefetch patch was in
-mm for a while, and I didn't see any such problem when testing there.
Please let me know if you see more such, or can unearth earlier logs.

Hugh

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2006-01-19 18:12 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-01-19  0:14 BUG in tmpfs Grzegorz Kulewski
2006-01-19  3:14 ` Greg KH
2006-01-19 13:50   ` Grzegorz Kulewski
2006-01-19 18:12     ` Hugh Dickins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).