* Debug: sleeping function called from invalid context @ 2003-08-15 17:18 Randy.Dunlap 2003-08-15 17:32 ` Dave Jones 0 siblings, 1 reply; 22+ messages in thread From: Randy.Dunlap @ 2003-08-15 17:18 UTC (permalink / raw) To: lkml Wrote some files to a Zip ppa device, did sync and umount: ppa: Version 2.07 (for Linux 2.4.x) ppa: Found device at ID 6, Attempting to use EPP 16 bit ppa: Found device at ID 6, Attempting to use SPP ppa: Communication established with ID 6 using SPP scsi1 : Iomega VPI0 (ppa) interface Vendor: IOMEGA Model: ZIP 100 Rev: J.03 Type: Direct-Access ANSI SCSI revision: 02 SCSI device sda: 196608 512-byte hdwr sectors (101 MB) sda: Write Protect is off sda: Mode Sense: 25 00 00 08 sda: cache data unavailable sda: assuming drive cache: write through sda: sda4 Attached scsi removable disk sda at scsi1, channel 0, id 6, lun 0 Attached scsi generic sg0 at scsi1, channel 0, id 6, lun 0, type 0 Debug: sleeping function called from invalid context at include/asm/uaccess.h:473 Call Trace: [<c0120d94>] __might_sleep+0x54/0x5b [<c010d001>] save_v86_state+0x71/0x1f0 [<c010dbd5>] handle_vm86_fault+0xc5/0xa90 [<c019cab8>] ext3_file_write+0x28/0xc0 [<c011cd96>] __change_page_attr+0x26/0x220 [<c010b310>] do_general_protection+0x0/0x90 [<c010a69d>] error_code+0x2d/0x40 [<c0109657>] syscall_call+0x7/0xb -- ~Randy ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-15 17:18 Debug: sleeping function called from invalid context Randy.Dunlap @ 2003-08-15 17:32 ` Dave Jones 2003-08-15 17:42 ` Randy.Dunlap 2003-08-15 19:30 ` Randy.Dunlap 0 siblings, 2 replies; 22+ messages in thread From: Dave Jones @ 2003-08-15 17:32 UTC (permalink / raw) To: Randy.Dunlap; +Cc: lkml On Fri, Aug 15, 2003 at 10:18:56AM -0700, Randy.Dunlap wrote: > Debug: sleeping function called from invalid context at include/asm/uaccess.h:473 > Call Trace: > [<c0120d94>] __might_sleep+0x54/0x5b > [<c010d001>] save_v86_state+0x71/0x1f0 > [<c010dbd5>] handle_vm86_fault+0xc5/0xa90 > [<c019cab8>] ext3_file_write+0x28/0xc0 > [<c011cd96>] __change_page_attr+0x26/0x220 > [<c010b310>] do_general_protection+0x0/0x90 > [<c010a69d>] error_code+0x2d/0x40 > [<c0109657>] syscall_call+0x7/0xb That's one really wierd looking backtrace. What else was that machine up to at the time ? Dave -- Dave Jones http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-15 17:32 ` Dave Jones @ 2003-08-15 17:42 ` Randy.Dunlap 2003-08-15 19:30 ` Randy.Dunlap 1 sibling, 0 replies; 22+ messages in thread From: Randy.Dunlap @ 2003-08-15 17:42 UTC (permalink / raw) To: Dave Jones; +Cc: linux-kernel On Fri, 15 Aug 2003 18:32:47 +0100 Dave Jones <davej@redhat.com> wrote: | On Fri, Aug 15, 2003 at 10:18:56AM -0700, Randy.Dunlap wrote: | | > Debug: sleeping function called from invalid context at include/asm/uaccess.h:473 | > Call Trace: | > [<c0120d94>] __might_sleep+0x54/0x5b | > [<c010d001>] save_v86_state+0x71/0x1f0 | > [<c010dbd5>] handle_vm86_fault+0xc5/0xa90 | > [<c019cab8>] ext3_file_write+0x28/0xc0 | > [<c011cd96>] __change_page_attr+0x26/0x220 | > [<c010b310>] do_general_protection+0x0/0x90 | > [<c010a69d>] error_code+0x2d/0x40 | > [<c0109657>] syscall_call+0x7/0xb | | That's one really wierd looking backtrace. What else was that | machine up to at the time ? Mostly I was just testing the ppa driver, so mostly "nothing." As I said: Wrote some files to a Zip ppa device, did sync and umount: Then I exited xfce and saw those messages. X and xfce (window manager) were loaded and some console windows were open. Nothing else other than daemons. -- ~Randy ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-15 17:32 ` Dave Jones 2003-08-15 17:42 ` Randy.Dunlap @ 2003-08-15 19:30 ` Randy.Dunlap 2003-08-16 7:06 ` Matt Mackall 1 sibling, 1 reply; 22+ messages in thread From: Randy.Dunlap @ 2003-08-15 19:30 UTC (permalink / raw) To: Dave Jones; +Cc: linux-kernel On Fri, 15 Aug 2003 18:32:47 +0100 Dave Jones <davej@redhat.com> wrote: | On Fri, Aug 15, 2003 at 10:18:56AM -0700, Randy.Dunlap wrote: | | > Debug: sleeping function called from invalid context at include/asm/uaccess.h:473 | > Call Trace: | > [<c0120d94>] __might_sleep+0x54/0x5b | > [<c010d001>] save_v86_state+0x71/0x1f0 | > [<c010dbd5>] handle_vm86_fault+0xc5/0xa90 | > [<c019cab8>] ext3_file_write+0x28/0xc0 | > [<c011cd96>] __change_page_attr+0x26/0x220 | > [<c010b310>] do_general_protection+0x0/0x90 | > [<c010a69d>] error_code+0x2d/0x40 | > [<c0109657>] syscall_call+0x7/0xb | | That's one really wierd looking backtrace. What else was that | machine up to at the time ? Some parts of it are explainable (to me), some not. I don't know what caused a GP fault or why ext3 shows up. But I can follow from do_general_protection() to handle_vm86_fault() to [inline] return_to_32bit() to save_v86_state() to __might_sleep(). And __might_sleep() is correct if change_page_attr() was called, since it takes a spinlock. I just can't connect quite all of the dots. -- ~Randy ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-15 19:30 ` Randy.Dunlap @ 2003-08-16 7:06 ` Matt Mackall 2003-08-18 21:07 ` Randy.Dunlap 0 siblings, 1 reply; 22+ messages in thread From: Matt Mackall @ 2003-08-16 7:06 UTC (permalink / raw) To: Randy.Dunlap; +Cc: Dave Jones, Andrew Morton, linux-kernel On Fri, Aug 15, 2003 at 12:30:53PM -0700, Randy.Dunlap wrote: > On Fri, 15 Aug 2003 18:32:47 +0100 Dave Jones <davej@redhat.com> wrote: > > | On Fri, Aug 15, 2003 at 10:18:56AM -0700, Randy.Dunlap wrote: > | > | > Debug: sleeping function called from invalid context at include/asm/uaccess.h:473 > | > Call Trace: > | > [<c0120d94>] __might_sleep+0x54/0x5b > | > [<c010d001>] save_v86_state+0x71/0x1f0 > | > [<c010dbd5>] handle_vm86_fault+0xc5/0xa90 > | > [<c019cab8>] ext3_file_write+0x28/0xc0 > | > [<c011cd96>] __change_page_attr+0x26/0x220 > | > [<c010b310>] do_general_protection+0x0/0x90 > | > [<c010a69d>] error_code+0x2d/0x40 > | > [<c0109657>] syscall_call+0x7/0xb > | > | That's one really wierd looking backtrace. What else was that > | machine up to at the time ? > > Some parts of it are explainable (to me), some not. > I don't know what caused a GP fault or why ext3 shows up. > > But I can follow from do_general_protection() to handle_vm86_fault() > to [inline] return_to_32bit() to save_v86_state() to __might_sleep(). > > And __might_sleep() is correct if change_page_attr() was called, > since it takes a spinlock. I just can't connect quite all of the dots. Ok, there's some stack noise here with the ext3_file_write and __change_page_attr. Here's what I've figured out so far. You probably have something like X running with a driver that calls an image of its own 16-bit BIOS to get something done (I think Matrox does this) via sys_vm86. One of the arguments to sys_vm86 is a pointer to a vm86plus_struct in userspace that gets stashed away in tsk->thread.vm86_info. When, for whatever reason, a fixup is needed in vm86 mode, we find ourselves in handle_vm86_fault and save_v86_state copied various junk out to this userspace struct we've kept a pointer to. Now as far as I can tell, there's nothing guaranteeing this struct is pinned down or in any way guaranteed to be around when the fault occurs. If the page with the struct _does_ get swapped out, we can be in trouble when we hit this fault. If this is actually a valid analysis, we should probably just pin the page for the duration of vm86 as it's already bordering on magical. If there's some reason this whole thing is safe, let me know and I'll try to get might_sleep to stop complaining about these. I suppose we could test it by hacking a program guaranteed to fault in vm86 mode and hacking the syscall to force the page out before calling do_sys_vm86. -- Matt Mackall : http://www.selenic.com : of or relating to the moon ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-16 7:06 ` Matt Mackall @ 2003-08-18 21:07 ` Randy.Dunlap 2003-08-18 21:26 ` Matt Mackall 2003-08-19 0:13 ` Dave Jones 0 siblings, 2 replies; 22+ messages in thread From: Randy.Dunlap @ 2003-08-18 21:07 UTC (permalink / raw) To: Matt Mackall; +Cc: davej, akpm, linux-kernel On Sat, 16 Aug 2003 02:06:52 -0500 Matt Mackall <mpm@selenic.com> wrote: | On Fri, Aug 15, 2003 at 12:30:53PM -0700, Randy.Dunlap wrote: | > On Fri, 15 Aug 2003 18:32:47 +0100 Dave Jones <davej@redhat.com> wrote: | > | > | On Fri, Aug 15, 2003 at 10:18:56AM -0700, Randy.Dunlap wrote: | > | | > | > Debug: sleeping function called from invalid context at include/asm/uaccess.h:473 | > | > Call Trace: | > | > [<c0120d94>] __might_sleep+0x54/0x5b | > | > [<c010d001>] save_v86_state+0x71/0x1f0 | > | > [<c010dbd5>] handle_vm86_fault+0xc5/0xa90 | > | > [<c019cab8>] ext3_file_write+0x28/0xc0 | > | > [<c011cd96>] __change_page_attr+0x26/0x220 | > | > [<c010b310>] do_general_protection+0x0/0x90 | > | > [<c010a69d>] error_code+0x2d/0x40 | > | > [<c0109657>] syscall_call+0x7/0xb | > | | > | That's one really wierd looking backtrace. What else was that | > | machine up to at the time ? | > | > Some parts of it are explainable (to me), some not. | > I don't know what caused a GP fault or why ext3 shows up. | > | > But I can follow from do_general_protection() to handle_vm86_fault() | > to [inline] return_to_32bit() to save_v86_state() to __might_sleep(). | > | > And __might_sleep() is correct if change_page_attr() was called, | > since it takes a spinlock. I just can't connect quite all of the dots. | | Ok, there's some stack noise here with the ext3_file_write and | __change_page_attr. | | Here's what I've figured out so far. You probably have something like | X running with a driver that calls an image of its own 16-bit BIOS to | get something done (I think Matrox does this) via sys_vm86. One of the | arguments to sys_vm86 is a pointer to a vm86plus_struct in userspace | that gets stashed away in tsk->thread.vm86_info. | | When, for whatever reason, a fixup is needed in vm86 mode, we find | ourselves in handle_vm86_fault and save_v86_state copied various junk | out to this userspace struct we've kept a pointer to. Now as far as I | can tell, there's nothing guaranteeing this struct is pinned down or | in any way guaranteed to be around when the fault occurs. If the page | with the struct _does_ get swapped out, we can be in trouble when we | hit this fault. | | If this is actually a valid analysis, we should probably just pin the | page for the duration of vm86 as it's already bordering on magical. | | If there's some reason this whole thing is safe, let me know and I'll | try to get might_sleep to stop complaining about these. | | I suppose we could test it by hacking a program guaranteed to fault in | vm86 mode and hacking the syscall to force the page out before calling | do_sys_vm86. I had another occurrence of this yesterday. It doesn't seem to be an error condition AFAICT. -- ~Randy "Everything is relative." ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-18 21:07 ` Randy.Dunlap @ 2003-08-18 21:26 ` Matt Mackall 2003-08-19 0:13 ` Dave Jones 1 sibling, 0 replies; 22+ messages in thread From: Matt Mackall @ 2003-08-18 21:26 UTC (permalink / raw) To: Randy.Dunlap; +Cc: davej, akpm, linux-kernel On Mon, Aug 18, 2003 at 02:07:29PM -0700, Randy.Dunlap wrote: > On Sat, 16 Aug 2003 02:06:52 -0500 Matt Mackall <mpm@selenic.com> wrote: > > | On Fri, Aug 15, 2003 at 12:30:53PM -0700, Randy.Dunlap wrote: > | > On Fri, 15 Aug 2003 18:32:47 +0100 Dave Jones <davej@redhat.com> wrote: > | > > | > | On Fri, Aug 15, 2003 at 10:18:56AM -0700, Randy.Dunlap wrote: > | > | > | > | > Debug: sleeping function called from invalid context at include/asm/uaccess.h:473 > | > | > Call Trace: > | > | > [<c0120d94>] __might_sleep+0x54/0x5b > | > | > [<c010d001>] save_v86_state+0x71/0x1f0 > | > | > [<c010dbd5>] handle_vm86_fault+0xc5/0xa90 > | > | > [<c019cab8>] ext3_file_write+0x28/0xc0 > | > | > [<c011cd96>] __change_page_attr+0x26/0x220 > | > | > [<c010b310>] do_general_protection+0x0/0x90 > | > | > [<c010a69d>] error_code+0x2d/0x40 > | > | > [<c0109657>] syscall_call+0x7/0xb > | > | > | > | That's one really wierd looking backtrace. What else was that > | > | machine up to at the time ? > | > > | > Some parts of it are explainable (to me), some not. > | > I don't know what caused a GP fault or why ext3 shows up. > | > > | > But I can follow from do_general_protection() to handle_vm86_fault() > | > to [inline] return_to_32bit() to save_v86_state() to __might_sleep(). > | > > | > And __might_sleep() is correct if change_page_attr() was called, > | > since it takes a spinlock. I just can't connect quite all of the dots. > | > | Ok, there's some stack noise here with the ext3_file_write and > | __change_page_attr. > | > | Here's what I've figured out so far. You probably have something like > | X running with a driver that calls an image of its own 16-bit BIOS to > | get something done (I think Matrox does this) via sys_vm86. One of the > | arguments to sys_vm86 is a pointer to a vm86plus_struct in userspace > | that gets stashed away in tsk->thread.vm86_info. > | > | When, for whatever reason, a fixup is needed in vm86 mode, we find > | ourselves in handle_vm86_fault and save_v86_state copied various junk > | out to this userspace struct we've kept a pointer to. Now as far as I > | can tell, there's nothing guaranteeing this struct is pinned down or > | in any way guaranteed to be around when the fault occurs. If the page > | with the struct _does_ get swapped out, we can be in trouble when we > | hit this fault. > | > | If this is actually a valid analysis, we should probably just pin the > | page for the duration of vm86 as it's already bordering on magical. > | > | If there's some reason this whole thing is safe, let me know and I'll > | try to get might_sleep to stop complaining about these. > | > | I suppose we could test it by hacking a program guaranteed to fault in > | vm86 mode and hacking the syscall to force the page out before calling > | do_sys_vm86. > > I had another occurrence of this yesterday. It doesn't seem to be > an error condition AFAICT. No, it's only a warning (which I recently added to the copy_* path) that something is used in a suspicious context. The question is "is the warning bogus or is there something that can actually go wrong here?". As far as I can tell, the answer is "yes, there's a good chance this could blow up under load". We need two things to happen: the caller's page containing the vm86plus_struct to get swapped out, and to take any sort of fault (including using virtualized instructions) in vm86 mode. Result: you'll hit the double-fault handler and the system will loop forever with interrupts disabled after spitting out a few printks that you won't see because you're in X. -- Matt Mackall : http://www.selenic.com : of or relating to the moon ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-18 21:07 ` Randy.Dunlap 2003-08-18 21:26 ` Matt Mackall @ 2003-08-19 0:13 ` Dave Jones 2003-08-19 0:15 ` Randy.Dunlap ` (2 more replies) 1 sibling, 3 replies; 22+ messages in thread From: Dave Jones @ 2003-08-19 0:13 UTC (permalink / raw) To: Randy.Dunlap; +Cc: Matt Mackall, akpm, linux-kernel On Mon, Aug 18, 2003 at 02:07:29PM -0700, Randy.Dunlap wrote: > | > | > Debug: sleeping function called from invalid context at include/asm/uaccess.h:473 > | > | > Call Trace: > | > | > [<c0120d94>] __might_sleep+0x54/0x5b > | > | > [<c010d001>] save_v86_state+0x71/0x1f0 > | > | > [<c010dbd5>] handle_vm86_fault+0xc5/0xa90 > | > | > [<c019cab8>] ext3_file_write+0x28/0xc0 > | > | > [<c011cd96>] __change_page_attr+0x26/0x220 > | > | > [<c010b310>] do_general_protection+0x0/0x90 > | > | > [<c010a69d>] error_code+0x2d/0x40 > | > | > [<c0109657>] syscall_call+0x7/0xb > | > | > > I had another occurrence of this yesterday. It doesn't seem to be > an error condition AFAICT. How spooky. now I got one too, (minus the noise). Call Trace: [<c0120022>] __might_sleep+0x5b/0x5f [<c010cf8a>] save_v86_state+0x6a/0x1f3 [<c010dad2>] handle_vm86_fault+0xa7/0x897 [<c010b2ed>] do_general_protection+0x0/0x93 [<c010a651>] error_code+0x2d/0x38 [<c0109623>] syscall_call+0x7/0xb By the looks of the logs, it happened as I restarted X, as theres a bunch of mtrr messages right after this.. Dave -- Dave Jones http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-19 0:13 ` Dave Jones @ 2003-08-19 0:15 ` Randy.Dunlap 2003-08-19 1:02 ` Zwane Mwaikambo 2003-08-19 0:15 ` Andrew Morton 2003-08-19 1:01 ` Matt Mackall 2 siblings, 1 reply; 22+ messages in thread From: Randy.Dunlap @ 2003-08-19 0:15 UTC (permalink / raw) To: Dave Jones; +Cc: mpm, akpm, linux-kernel On Tue, 19 Aug 2003 01:13:16 +0100 Dave Jones <davej@redhat.com> wrote: | On Mon, Aug 18, 2003 at 02:07:29PM -0700, Randy.Dunlap wrote: | > | > | > Debug: sleeping function called from invalid context at include/asm/uaccess.h:473 | > | > | > Call Trace: | > | > | > [<c0120d94>] __might_sleep+0x54/0x5b | > | > | > [<c010d001>] save_v86_state+0x71/0x1f0 | > | > | > [<c010dbd5>] handle_vm86_fault+0xc5/0xa90 | > | > | > [<c019cab8>] ext3_file_write+0x28/0xc0 | > | > | > [<c011cd96>] __change_page_attr+0x26/0x220 | > | > | > [<c010b310>] do_general_protection+0x0/0x90 | > | > | > [<c010a69d>] error_code+0x2d/0x40 | > | > | > [<c0109657>] syscall_call+0x7/0xb | > | > | | > | > I had another occurrence of this yesterday. It doesn't seem to be | > an error condition AFAICT. | | How spooky. now I got one too, (minus the noise). | | Call Trace: | [<c0120022>] __might_sleep+0x5b/0x5f | [<c010cf8a>] save_v86_state+0x6a/0x1f3 | [<c010dad2>] handle_vm86_fault+0xa7/0x897 | [<c010b2ed>] do_general_protection+0x0/0x93 | [<c010a651>] error_code+0x2d/0x38 | [<c0109623>] syscall_call+0x7/0xb | | By the looks of the logs, it happened as I restarted X, as theres | a bunch of mtrr messages right after this.. I don't recall mtrr messages, but it is happening just after I start X and the window manager with me also. -- ~Randy "Everything is relative." ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-19 0:15 ` Randy.Dunlap @ 2003-08-19 1:02 ` Zwane Mwaikambo 0 siblings, 0 replies; 22+ messages in thread From: Zwane Mwaikambo @ 2003-08-19 1:02 UTC (permalink / raw) To: Randy.Dunlap; +Cc: Dave Jones, mpm, akpm, linux-kernel On Mon, 18 Aug 2003, Randy.Dunlap wrote: > | How spooky. now I got one too, (minus the noise). > | > | Call Trace: > | [<c0120022>] __might_sleep+0x5b/0x5f > | [<c010cf8a>] save_v86_state+0x6a/0x1f3 > | [<c010dad2>] handle_vm86_fault+0xa7/0x897 > | [<c010b2ed>] do_general_protection+0x0/0x93 > | [<c010a651>] error_code+0x2d/0x38 > | [<c0109623>] syscall_call+0x7/0xb > | > | By the looks of the logs, it happened as I restarted X, as theres > | a bunch of mtrr messages right after this.. > > I don't recall mtrr messages, but it is happening just after I start > X and the window manager with me also. It most probably is the XFree86 video bios thing running in vm86 mode. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-19 0:13 ` Dave Jones 2003-08-19 0:15 ` Randy.Dunlap @ 2003-08-19 0:15 ` Andrew Morton 2003-08-19 1:27 ` Matt Mackall 2003-08-19 3:24 ` Randy.Dunlap 2003-08-19 1:01 ` Matt Mackall 2 siblings, 2 replies; 22+ messages in thread From: Andrew Morton @ 2003-08-19 0:15 UTC (permalink / raw) To: Dave Jones; +Cc: rddunlap, mpm, linux-kernel Dave Jones <davej@redhat.com> wrote: > > How spooky. now I got one too, (minus the noise). > > Call Trace: > [<c0120022>] __might_sleep+0x5b/0x5f It would be useful to know whether this was triggered by in_atomic() or by irqs_disabled(). We're suspecting the latter. diff -puN kernel/sched.c~might_sleep-diags kernel/sched.c --- 25/kernel/sched.c~might_sleep-diags Mon Aug 18 14:09:41 2003 +++ 25-akpm/kernel/sched.c Mon Aug 18 14:11:55 2003 @@ -2795,13 +2795,19 @@ void __might_sleep(char *file, int line) { #if defined(in_atomic) static unsigned long prev_jiffy; /* ratelimiting */ + char *msg = NULL; - if (in_atomic() || irqs_disabled()) { + if (in_atomic()) + msg = "in atomic section"; + else if (irqs_disabled()) + msg = "with interrupts disabled"; + + if (msg) { if (time_before(jiffies, prev_jiffy + HZ)) return; prev_jiffy = jiffies; - printk(KERN_ERR "Debug: sleeping function called from invalid" - " context at %s:%d\n", file, line); + printk(KERN_ERR "Debug: sleeping function " + "called %s at %s:%d\n", msg, file, line); dump_stack(); } #endif _ ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-19 0:15 ` Andrew Morton @ 2003-08-19 1:27 ` Matt Mackall 2003-08-19 3:24 ` Randy.Dunlap 1 sibling, 0 replies; 22+ messages in thread From: Matt Mackall @ 2003-08-19 1:27 UTC (permalink / raw) To: Andrew Morton; +Cc: Dave Jones, rddunlap, linux-kernel On Mon, Aug 18, 2003 at 05:15:45PM -0700, Andrew Morton wrote: > Dave Jones <davej@redhat.com> wrote: > > > > How spooky. now I got one too, (minus the noise). > > > > Call Trace: > > [<c0120022>] __might_sleep+0x5b/0x5f > > It would be useful to know whether this was triggered by in_atomic() or by > irqs_disabled(). We're suspecting the latter. Everything points to it being a fault handler. Here's my current understanding: some part of X calls sys_vm86() sys_vm86 stashes pointer to userspace structure do_sys_vm86 fiddles with register structures to setup 16-bit transition do_sys_vm86 goes to 16-bit mode _through the userspace return path_ fault occurs in 16-bit code handle_vm86_fault invoked through interrupt save_v86_state writes into stashed userspace structure (might_sleep) return_to_32bit swaps register sets around return_to_32bit returns to the original userspace context Because we never return to the context of the sys_vm86 syscall, we're never again in an appropriate place to copy the registers over. A cleaner way to do this is to setup return_to_32bit to return to the point just after where sys_vm86 returns to 16bit mode and copy the registers to userspace in normal process context. -- Matt Mackall : http://www.selenic.com : of or relating to the moon ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-19 0:15 ` Andrew Morton 2003-08-19 1:27 ` Matt Mackall @ 2003-08-19 3:24 ` Randy.Dunlap 2003-08-19 3:35 ` Andrew Morton 1 sibling, 1 reply; 22+ messages in thread From: Randy.Dunlap @ 2003-08-19 3:24 UTC (permalink / raw) To: akpm; +Cc: davej, mpm, linux-kernel > Dave Jones <davej@redhat.com> wrote: >> >> How spooky. now I got one too, (minus the noise). >> >> Call Trace: >> [<c0120022>] __might_sleep+0x5b/0x5f > > It would be useful to know whether this was triggered by in_atomic() or by > irqs_disabled(). We're suspecting the latter. > > > diff -puN kernel/sched.c~might_sleep-diags kernel/sched.c > --- 25/kernel/sched.c~might_sleep-diags Mon Aug 18 14:09:41 2003 > +++ 25-akpm/kernel/sched.c Mon Aug 18 14:11:55 2003 > @@ -2795,13 +2795,19 @@ void __might_sleep(char *file, int line) > { > #if defined(in_atomic) > static unsigned long prev_jiffy; /* ratelimiting */ > + char *msg = NULL; > > - if (in_atomic() || irqs_disabled()) { > + if (in_atomic()) > + msg = "in atomic section"; > + else if (irqs_disabled()) > + msg = "with interrupts disabled"; > + > + if (msg) { > if (time_before(jiffies, prev_jiffy + HZ)) > return; > prev_jiffy = jiffies; > - printk(KERN_ERR "Debug: sleeping function called from invalid" > - " context at %s:%d\n", file, line); > + printk(KERN_ERR "Debug: sleeping function " > + "called %s at %s:%d\n", msg, file, line); > dump_stack(); > } > #endif Took 5 tries of loading X to get it, and you got it. (I'm using VGA_CONSOLE.) Debug: sleeping function called with interrupts disabled at include/asm/uaccess.h:473 Call Trace: [<c0120d93>] __might_sleep+0x53/0x74 [<c010d001>] save_v86_state+0x71/0x1f0 [<c010dbd5>] handle_vm86_fault+0xc5/0xa90 [<c019cac8>] ext3_file_write+0x28/0xc0 [<c011cd96>] __change_page_attr+0x26/0x220 [<c010b310>] do_general_protection+0x0/0x90 [<c010a69d>] error_code+0x2d/0x40 [<c0109657>] syscall_call+0x7/0xb ~Randy ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-19 3:24 ` Randy.Dunlap @ 2003-08-19 3:35 ` Andrew Morton 2003-08-19 3:39 ` Randy.Dunlap ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Andrew Morton @ 2003-08-19 3:35 UTC (permalink / raw) To: Randy.Dunlap; +Cc: davej, mpm, linux-kernel "Randy.Dunlap" <rddunlap@osdl.org> wrote: > > Debug: sleeping function called with interrupts disabled at > include/asm/uaccess.h:473 OK, now my vague understanding of what's going on is that the app has chosen to disable local interupts (via iopl()) and has taken a vm86 trap. I guess we'd see the same thing if the app performed some sleeping syscall while interrupts are disabled. If that is correct then it really is just a false positive. It could also point at a bug in the application; it is presumably disabling interrupts for some form of locking, atomicity or timing guarantee. But it will not lock against other CPUs and the fact that it trapped into the kernel indicates tat it may not be getting the atomicity which it desires. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-19 3:35 ` Andrew Morton @ 2003-08-19 3:39 ` Randy.Dunlap 2003-08-19 4:26 ` Matt Mackall 2003-08-19 4:29 ` Andrew Morton 2003-08-19 4:14 ` Matt Mackall 2003-08-19 5:14 ` Matt Mackall 2 siblings, 2 replies; 22+ messages in thread From: Randy.Dunlap @ 2003-08-19 3:39 UTC (permalink / raw) To: akpm; +Cc: davej, mpm, linux-kernel > "Randy.Dunlap" <rddunlap@osdl.org> wrote: >> >> Debug: sleeping function called with interrupts disabled at >> include/asm/uaccess.h:473 > > OK, now my vague understanding of what's going on is that the app has chosen > to disable local interupts (via iopl()) and has taken a vm86 trap. I guess > we'd see the same thing if the app performed some sleeping syscall while > interrupts are disabled. > > If that is correct then it really is just a false positive. > > It could also point at a bug in the application; it is presumably disabling > interrupts for some form of locking, atomicity or timing guarantee. But it > will not lock against other CPUs and the fact that it trapped into the > kernel indicates tat it may not be getting the atomicity which it desires. Call Trace: [<c0120d93>] __might_sleep+0x53/0x74 [<c010d001>] save_v86_state+0x71/0x1f0 [<c010dbd5>] handle_vm86_fault+0xc5/0xa90 [<c019cac8>] ext3_file_write+0x28/0xc0 [<c011cd96>] __change_page_attr+0x26/0x220 [<c010b310>] do_general_protection+0x0/0x90 [<c010a69d>] error_code+0x2d/0x40 [<c0109657>] syscall_call+0x7/0xb My (more) vague understanding is that X(?) got the kernel to do_general_protection() somehow, but change_page_attr() does this: spin_lock_irqsave(&cpa_lock, flags); in arch/i386/mm/pageattr.c (I'm on a UP box), so irqs are disabled by the kernel and then we calls put_user() with a spinlock held. ~Randy [betting I understand it less than Andrew] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-19 3:39 ` Randy.Dunlap @ 2003-08-19 4:26 ` Matt Mackall 2003-08-19 4:29 ` Andrew Morton 1 sibling, 0 replies; 22+ messages in thread From: Matt Mackall @ 2003-08-19 4:26 UTC (permalink / raw) To: Randy.Dunlap; +Cc: akpm, davej, linux-kernel On Mon, Aug 18, 2003 at 08:39:29PM -0700, Randy.Dunlap wrote: > > "Randy.Dunlap" <rddunlap@osdl.org> wrote: > >> > >> Debug: sleeping function called with interrupts disabled at > >> include/asm/uaccess.h:473 > > > > OK, now my vague understanding of what's going on is that the app has chosen > > to disable local interupts (via iopl()) and has taken a vm86 trap. I guess > > we'd see the same thing if the app performed some sleeping syscall while > > interrupts are disabled. > > > > If that is correct then it really is just a false positive. > > > > It could also point at a bug in the application; it is presumably disabling > > interrupts for some form of locking, atomicity or timing guarantee. But it > > will not lock against other CPUs and the fact that it trapped into the > > kernel indicates tat it may not be getting the atomicity which it desires. > > Call Trace: > [<c0120d93>] __might_sleep+0x53/0x74 > [<c010d001>] save_v86_state+0x71/0x1f0 > [<c010dbd5>] handle_vm86_fault+0xc5/0xa90 > [<c019cac8>] ext3_file_write+0x28/0xc0 > [<c011cd96>] __change_page_attr+0x26/0x220 > [<c010b310>] do_general_protection+0x0/0x90 > [<c010a69d>] error_code+0x2d/0x40 > [<c0109657>] syscall_call+0x7/0xb > > My (more) vague understanding is that X(?) got the kernel to > do_general_protection() somehow, but change_page_attr() does this: > spin_lock_irqsave(&cpa_lock, flags); > in arch/i386/mm/pageattr.c (I'm on a UP box), > so irqs are disabled by the kernel and then we calls put_user() > with a spinlock held. My suspicion is that the X driver (MGA, right?) copies its BIOS into its address space, sets up io permissions for it to fiddle with your card, and then calls sys_vm86 with the appropriate registers to call into the interrupt handler when sys_vm86 swaps those registers in and does "jmp resume_userspace". When the video BIOS software interrupt handler does iret, we get the fault, copy the registers to our 32-bit usermode (tripping might_sleep), and then swap the 32-bit registers back in for another jump to resume_userspace. The fix is making return_to_32_bit return back to the end of do_sys_vm86 rather than straight to userspace, at which point we can safely handle the registers. -- Matt Mackall : http://www.selenic.com : of or relating to the moon ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-19 3:39 ` Randy.Dunlap 2003-08-19 4:26 ` Matt Mackall @ 2003-08-19 4:29 ` Andrew Morton 1 sibling, 0 replies; 22+ messages in thread From: Andrew Morton @ 2003-08-19 4:29 UTC (permalink / raw) To: Randy.Dunlap; +Cc: davej, mpm, linux-kernel "Randy.Dunlap" <rddunlap@osdl.org> wrote: > > Call Trace: > [<c0120d93>] __might_sleep+0x53/0x74 > [<c010d001>] save_v86_state+0x71/0x1f0 > [<c010dbd5>] handle_vm86_fault+0xc5/0xa90 > [<c019cac8>] ext3_file_write+0x28/0xc0 > [<c011cd96>] __change_page_attr+0x26/0x220 > [<c010b310>] do_general_protection+0x0/0x90 > [<c010a69d>] error_code+0x2d/0x40 > [<c0109657>] syscall_call+0x7/0xb > > My (more) vague understanding is that X(?) got the kernel to > do_general_protection() somehow, but change_page_attr() does this: > spin_lock_irqsave(&cpa_lock, flags); > in arch/i386/mm/pageattr.c (I'm on a UP box), > so irqs are disabled by the kernel and then we calls put_user() > with a spinlock held. The __change_page_attr() there looks like stack gunk. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-19 3:35 ` Andrew Morton 2003-08-19 3:39 ` Randy.Dunlap @ 2003-08-19 4:14 ` Matt Mackall 2003-08-19 5:14 ` Matt Mackall 2 siblings, 0 replies; 22+ messages in thread From: Matt Mackall @ 2003-08-19 4:14 UTC (permalink / raw) To: Andrew Morton; +Cc: Randy.Dunlap, davej, linux-kernel On Mon, Aug 18, 2003 at 08:35:13PM -0700, Andrew Morton wrote: > "Randy.Dunlap" <rddunlap@osdl.org> wrote: > > > > Debug: sleeping function called with interrupts disabled at > > include/asm/uaccess.h:473 > > OK, now my vague understanding of what's going on is that the app has > chosen to disable local interupts (via iopl()) and has taken a vm86 trap. Are you suggesting that whatever's calling sys_vm86 has disabled interrupts beforehand? I don't see why that's necessary at all. The vm86 fault handler is called via do_general_protection in any case and as such is in_interrupt() by definition. And a vm86 general protection fault can be caused by any of cli, sti, pushf, popf, intx, or iret. In fact, typical usage of vm86 mode is to setup a call to a 16-bit software interrupt handler and return via fault on the iret. I'm increasingly convinced it's actually broken. -- Matt Mackall : http://www.selenic.com : of or relating to the moon ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-19 3:35 ` Andrew Morton 2003-08-19 3:39 ` Randy.Dunlap 2003-08-19 4:14 ` Matt Mackall @ 2003-08-19 5:14 ` Matt Mackall 2 siblings, 0 replies; 22+ messages in thread From: Matt Mackall @ 2003-08-19 5:14 UTC (permalink / raw) To: Andrew Morton; +Cc: Randy.Dunlap, davej, linux-kernel On Mon, Aug 18, 2003 at 08:35:13PM -0700, Andrew Morton wrote: > "Randy.Dunlap" <rddunlap@osdl.org> wrote: > > > > Debug: sleeping function called with interrupts disabled at > > include/asm/uaccess.h:473 > > OK, now my vague understanding of what's going on is that the app has > chosen to disable local interupts (via iopl()) and has taken a vm86 trap. > I guess we'd see the same thing if the app performed some sleeping syscall > while interrupts are disabled. Ok, I think I've managed to reproduce this and show that it's not a case of calling syscalls with interrupt disabled. There's a utility called savetextmode that's part of svgalib (apt-get svgalib-bin for Debian folks). If you configure it to use VESA, it will call out to your video card bios and reproduce the error, just as I expected. I had kgdb handy and already running on a machine, so I set a breakpoint in sys_vm86old (X would use sys_vm86, but the results would be the same) and traced into it. About the first thing it does is call copy_from_user, which checks might_sleep, which remained silent. -- Matt Mackall : http://www.selenic.com : of or relating to the moon ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-19 0:13 ` Dave Jones 2003-08-19 0:15 ` Randy.Dunlap 2003-08-19 0:15 ` Andrew Morton @ 2003-08-19 1:01 ` Matt Mackall 2003-08-19 1:04 ` Dave Jones 2 siblings, 1 reply; 22+ messages in thread From: Matt Mackall @ 2003-08-19 1:01 UTC (permalink / raw) To: Dave Jones, Randy.Dunlap, akpm, linux-kernel On Tue, Aug 19, 2003 at 01:13:16AM +0100, Dave Jones wrote: > On Mon, Aug 18, 2003 at 02:07:29PM -0700, Randy.Dunlap wrote: > > | > | > Debug: sleeping function called from invalid context at include/asm/uaccess.h:473 > > | > | > Call Trace: > > | > | > [<c0120d94>] __might_sleep+0x54/0x5b > > | > | > [<c010d001>] save_v86_state+0x71/0x1f0 > > | > | > [<c010dbd5>] handle_vm86_fault+0xc5/0xa90 > > | > | > [<c019cab8>] ext3_file_write+0x28/0xc0 > > | > | > [<c011cd96>] __change_page_attr+0x26/0x220 > > | > | > [<c010b310>] do_general_protection+0x0/0x90 > > | > | > [<c010a69d>] error_code+0x2d/0x40 > > | > | > [<c0109657>] syscall_call+0x7/0xb > > | > | > > > > I had another occurrence of this yesterday. It doesn't seem to be > > an error condition AFAICT. > > How spooky. now I got one too, (minus the noise). > > Call Trace: > [<c0120022>] __might_sleep+0x5b/0x5f > [<c010cf8a>] save_v86_state+0x6a/0x1f3 > [<c010dad2>] handle_vm86_fault+0xa7/0x897 > [<c010b2ed>] do_general_protection+0x0/0x93 > [<c010a651>] error_code+0x2d/0x38 > [<c0109623>] syscall_call+0x7/0xb > > By the looks of the logs, it happened as I restarted X, as theres > a bunch of mtrr messages right after this.. What video driver do you use? -- Matt Mackall : http://www.selenic.com : of or relating to the moon ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-19 1:01 ` Matt Mackall @ 2003-08-19 1:04 ` Dave Jones 2003-08-19 1:09 ` Matt Mackall 0 siblings, 1 reply; 22+ messages in thread From: Dave Jones @ 2003-08-19 1:04 UTC (permalink / raw) To: Matt Mackall; +Cc: Randy.Dunlap, akpm, linux-kernel On Mon, Aug 18, 2003 at 08:01:19PM -0500, Matt Mackall wrote: > > > > By the looks of the logs, it happened as I restarted X, as theres > > a bunch of mtrr messages right after this.. > > What video driver do you use? mga Dave -- Dave Jones http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Debug: sleeping function called from invalid context 2003-08-19 1:04 ` Dave Jones @ 2003-08-19 1:09 ` Matt Mackall 0 siblings, 0 replies; 22+ messages in thread From: Matt Mackall @ 2003-08-19 1:09 UTC (permalink / raw) To: Dave Jones, Randy.Dunlap, akpm, linux-kernel On Tue, Aug 19, 2003 at 02:04:57AM +0100, Dave Jones wrote: > On Mon, Aug 18, 2003 at 08:01:19PM -0500, Matt Mackall wrote: > > > > > > By the looks of the logs, it happened as I restarted X, as theres > > > a bunch of mtrr messages right after this.. > > > > What video driver do you use? > > mga I had someone else send me a strings(1) of that driver, and it looked like it was running a copy of its BIOS. -- Matt Mackall : http://www.selenic.com : of or relating to the moon ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2003-08-19 5:17 UTC | newest] Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-08-15 17:18 Debug: sleeping function called from invalid context Randy.Dunlap 2003-08-15 17:32 ` Dave Jones 2003-08-15 17:42 ` Randy.Dunlap 2003-08-15 19:30 ` Randy.Dunlap 2003-08-16 7:06 ` Matt Mackall 2003-08-18 21:07 ` Randy.Dunlap 2003-08-18 21:26 ` Matt Mackall 2003-08-19 0:13 ` Dave Jones 2003-08-19 0:15 ` Randy.Dunlap 2003-08-19 1:02 ` Zwane Mwaikambo 2003-08-19 0:15 ` Andrew Morton 2003-08-19 1:27 ` Matt Mackall 2003-08-19 3:24 ` Randy.Dunlap 2003-08-19 3:35 ` Andrew Morton 2003-08-19 3:39 ` Randy.Dunlap 2003-08-19 4:26 ` Matt Mackall 2003-08-19 4:29 ` Andrew Morton 2003-08-19 4:14 ` Matt Mackall 2003-08-19 5:14 ` Matt Mackall 2003-08-19 1:01 ` Matt Mackall 2003-08-19 1:04 ` Dave Jones 2003-08-19 1:09 ` Matt Mackall
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).