linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Debug: sleeping function called from invalid context
@ 2003-08-15 17:18 Randy.Dunlap
  2003-08-15 17:32 ` Dave Jones
  0 siblings, 1 reply; 22+ messages in thread
From: Randy.Dunlap @ 2003-08-15 17:18 UTC (permalink / raw)
  To: lkml


Wrote some files to a Zip ppa device, did sync and umount:

ppa: Version 2.07 (for Linux 2.4.x)
ppa: Found device at ID 6, Attempting to use EPP 16 bit
ppa: Found device at ID 6, Attempting to use SPP
ppa: Communication established with ID 6 using SPP
scsi1 : Iomega VPI0 (ppa) interface
  Vendor: IOMEGA    Model: ZIP 100           Rev: J.03
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sda: 196608 512-byte hdwr sectors (101 MB)
sda: Write Protect is off
sda: Mode Sense: 25 00 00 08
sda: cache data unavailable
sda: assuming drive cache: write through
 sda: sda4
Attached scsi removable disk sda at scsi1, channel 0, id 6, lun 0
Attached scsi generic sg0 at scsi1, channel 0, id 6, lun 0,  type 0
Debug: sleeping function called from invalid context at include/asm/uaccess.h:473
Call Trace:
 [<c0120d94>] __might_sleep+0x54/0x5b
 [<c010d001>] save_v86_state+0x71/0x1f0
 [<c010dbd5>] handle_vm86_fault+0xc5/0xa90
 [<c019cab8>] ext3_file_write+0x28/0xc0
 [<c011cd96>] __change_page_attr+0x26/0x220
 [<c010b310>] do_general_protection+0x0/0x90
 [<c010a69d>] error_code+0x2d/0x40
 [<c0109657>] syscall_call+0x7/0xb

--
~Randy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-15 17:18 Debug: sleeping function called from invalid context Randy.Dunlap
@ 2003-08-15 17:32 ` Dave Jones
  2003-08-15 17:42   ` Randy.Dunlap
  2003-08-15 19:30   ` Randy.Dunlap
  0 siblings, 2 replies; 22+ messages in thread
From: Dave Jones @ 2003-08-15 17:32 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: lkml

On Fri, Aug 15, 2003 at 10:18:56AM -0700, Randy.Dunlap wrote:

 > Debug: sleeping function called from invalid context at include/asm/uaccess.h:473
 > Call Trace:
 >  [<c0120d94>] __might_sleep+0x54/0x5b
 >  [<c010d001>] save_v86_state+0x71/0x1f0
 >  [<c010dbd5>] handle_vm86_fault+0xc5/0xa90
 >  [<c019cab8>] ext3_file_write+0x28/0xc0
 >  [<c011cd96>] __change_page_attr+0x26/0x220
 >  [<c010b310>] do_general_protection+0x0/0x90
 >  [<c010a69d>] error_code+0x2d/0x40
 >  [<c0109657>] syscall_call+0x7/0xb

That's one really wierd looking backtrace. What else was that
machine up to at the time ?

		Dave

-- 
 Dave Jones     http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-15 17:32 ` Dave Jones
@ 2003-08-15 17:42   ` Randy.Dunlap
  2003-08-15 19:30   ` Randy.Dunlap
  1 sibling, 0 replies; 22+ messages in thread
From: Randy.Dunlap @ 2003-08-15 17:42 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-kernel

On Fri, 15 Aug 2003 18:32:47 +0100 Dave Jones <davej@redhat.com> wrote:

| On Fri, Aug 15, 2003 at 10:18:56AM -0700, Randy.Dunlap wrote:
| 
|  > Debug: sleeping function called from invalid context at include/asm/uaccess.h:473
|  > Call Trace:
|  >  [<c0120d94>] __might_sleep+0x54/0x5b
|  >  [<c010d001>] save_v86_state+0x71/0x1f0
|  >  [<c010dbd5>] handle_vm86_fault+0xc5/0xa90
|  >  [<c019cab8>] ext3_file_write+0x28/0xc0
|  >  [<c011cd96>] __change_page_attr+0x26/0x220
|  >  [<c010b310>] do_general_protection+0x0/0x90
|  >  [<c010a69d>] error_code+0x2d/0x40
|  >  [<c0109657>] syscall_call+0x7/0xb
| 
| That's one really wierd looking backtrace. What else was that
| machine up to at the time ?

Mostly I was just testing the ppa driver, so mostly "nothing."
As I said:
Wrote some files to a Zip ppa device, did sync and umount:
Then I exited xfce and saw those messages.

X and xfce (window manager) were loaded and some console windows
were open.  Nothing else other than daemons.

--
~Randy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-15 17:32 ` Dave Jones
  2003-08-15 17:42   ` Randy.Dunlap
@ 2003-08-15 19:30   ` Randy.Dunlap
  2003-08-16  7:06     ` Matt Mackall
  1 sibling, 1 reply; 22+ messages in thread
From: Randy.Dunlap @ 2003-08-15 19:30 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-kernel

On Fri, 15 Aug 2003 18:32:47 +0100 Dave Jones <davej@redhat.com> wrote:

| On Fri, Aug 15, 2003 at 10:18:56AM -0700, Randy.Dunlap wrote:
| 
|  > Debug: sleeping function called from invalid context at include/asm/uaccess.h:473
|  > Call Trace:
|  >  [<c0120d94>] __might_sleep+0x54/0x5b
|  >  [<c010d001>] save_v86_state+0x71/0x1f0
|  >  [<c010dbd5>] handle_vm86_fault+0xc5/0xa90
|  >  [<c019cab8>] ext3_file_write+0x28/0xc0
|  >  [<c011cd96>] __change_page_attr+0x26/0x220
|  >  [<c010b310>] do_general_protection+0x0/0x90
|  >  [<c010a69d>] error_code+0x2d/0x40
|  >  [<c0109657>] syscall_call+0x7/0xb
| 
| That's one really wierd looking backtrace. What else was that
| machine up to at the time ?

Some parts of it are explainable (to me), some not.
I don't know what caused a GP fault or why ext3 shows up.

But I can follow from do_general_protection() to handle_vm86_fault()
to [inline] return_to_32bit() to save_v86_state() to __might_sleep().

And __might_sleep() is correct if change_page_attr() was called,
since it takes a spinlock.  I just can't connect quite all of the dots.

--
~Randy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-15 19:30   ` Randy.Dunlap
@ 2003-08-16  7:06     ` Matt Mackall
  2003-08-18 21:07       ` Randy.Dunlap
  0 siblings, 1 reply; 22+ messages in thread
From: Matt Mackall @ 2003-08-16  7:06 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: Dave Jones, Andrew Morton, linux-kernel

On Fri, Aug 15, 2003 at 12:30:53PM -0700, Randy.Dunlap wrote:
> On Fri, 15 Aug 2003 18:32:47 +0100 Dave Jones <davej@redhat.com> wrote:
> 
> | On Fri, Aug 15, 2003 at 10:18:56AM -0700, Randy.Dunlap wrote:
> | 
> |  > Debug: sleeping function called from invalid context at include/asm/uaccess.h:473
> |  > Call Trace:
> |  >  [<c0120d94>] __might_sleep+0x54/0x5b
> |  >  [<c010d001>] save_v86_state+0x71/0x1f0
> |  >  [<c010dbd5>] handle_vm86_fault+0xc5/0xa90
> |  >  [<c019cab8>] ext3_file_write+0x28/0xc0
> |  >  [<c011cd96>] __change_page_attr+0x26/0x220
> |  >  [<c010b310>] do_general_protection+0x0/0x90
> |  >  [<c010a69d>] error_code+0x2d/0x40
> |  >  [<c0109657>] syscall_call+0x7/0xb
> | 
> | That's one really wierd looking backtrace. What else was that
> | machine up to at the time ?
> 
> Some parts of it are explainable (to me), some not.
> I don't know what caused a GP fault or why ext3 shows up.
> 
> But I can follow from do_general_protection() to handle_vm86_fault()
> to [inline] return_to_32bit() to save_v86_state() to __might_sleep().
> 
> And __might_sleep() is correct if change_page_attr() was called,
> since it takes a spinlock.  I just can't connect quite all of the dots.

Ok, there's some stack noise here with the ext3_file_write and
__change_page_attr.

Here's what I've figured out so far. You probably have something like
X running with a driver that calls an image of its own 16-bit BIOS to
get something done (I think Matrox does this) via sys_vm86. One of the
arguments to sys_vm86 is a pointer to a vm86plus_struct in userspace
that gets stashed away in tsk->thread.vm86_info.

When, for whatever reason, a fixup is needed in vm86 mode, we find
ourselves in handle_vm86_fault and save_v86_state copied various junk
out to this userspace struct we've kept a pointer to. Now as far as I
can tell, there's nothing guaranteeing this struct is pinned down or
in any way guaranteed to be around when the fault occurs. If the page
with the struct _does_ get swapped out, we can be in trouble when we
hit this fault.

If this is actually a valid analysis, we should probably just pin the
page for the duration of vm86 as it's already bordering on magical.

If there's some reason this whole thing is safe, let me know and I'll
try to get might_sleep to stop complaining about these.

I suppose we could test it by hacking a program guaranteed to fault in
vm86 mode and hacking the syscall to force the page out before calling
do_sys_vm86.

-- 
Matt Mackall : http://www.selenic.com : of or relating to the moon

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-16  7:06     ` Matt Mackall
@ 2003-08-18 21:07       ` Randy.Dunlap
  2003-08-18 21:26         ` Matt Mackall
  2003-08-19  0:13         ` Dave Jones
  0 siblings, 2 replies; 22+ messages in thread
From: Randy.Dunlap @ 2003-08-18 21:07 UTC (permalink / raw)
  To: Matt Mackall; +Cc: davej, akpm, linux-kernel

On Sat, 16 Aug 2003 02:06:52 -0500 Matt Mackall <mpm@selenic.com> wrote:

| On Fri, Aug 15, 2003 at 12:30:53PM -0700, Randy.Dunlap wrote:
| > On Fri, 15 Aug 2003 18:32:47 +0100 Dave Jones <davej@redhat.com> wrote:
| > 
| > | On Fri, Aug 15, 2003 at 10:18:56AM -0700, Randy.Dunlap wrote:
| > | 
| > |  > Debug: sleeping function called from invalid context at include/asm/uaccess.h:473
| > |  > Call Trace:
| > |  >  [<c0120d94>] __might_sleep+0x54/0x5b
| > |  >  [<c010d001>] save_v86_state+0x71/0x1f0
| > |  >  [<c010dbd5>] handle_vm86_fault+0xc5/0xa90
| > |  >  [<c019cab8>] ext3_file_write+0x28/0xc0
| > |  >  [<c011cd96>] __change_page_attr+0x26/0x220
| > |  >  [<c010b310>] do_general_protection+0x0/0x90
| > |  >  [<c010a69d>] error_code+0x2d/0x40
| > |  >  [<c0109657>] syscall_call+0x7/0xb
| > | 
| > | That's one really wierd looking backtrace. What else was that
| > | machine up to at the time ?
| > 
| > Some parts of it are explainable (to me), some not.
| > I don't know what caused a GP fault or why ext3 shows up.
| > 
| > But I can follow from do_general_protection() to handle_vm86_fault()
| > to [inline] return_to_32bit() to save_v86_state() to __might_sleep().
| > 
| > And __might_sleep() is correct if change_page_attr() was called,
| > since it takes a spinlock.  I just can't connect quite all of the dots.
| 
| Ok, there's some stack noise here with the ext3_file_write and
| __change_page_attr.
| 
| Here's what I've figured out so far. You probably have something like
| X running with a driver that calls an image of its own 16-bit BIOS to
| get something done (I think Matrox does this) via sys_vm86. One of the
| arguments to sys_vm86 is a pointer to a vm86plus_struct in userspace
| that gets stashed away in tsk->thread.vm86_info.
| 
| When, for whatever reason, a fixup is needed in vm86 mode, we find
| ourselves in handle_vm86_fault and save_v86_state copied various junk
| out to this userspace struct we've kept a pointer to. Now as far as I
| can tell, there's nothing guaranteeing this struct is pinned down or
| in any way guaranteed to be around when the fault occurs. If the page
| with the struct _does_ get swapped out, we can be in trouble when we
| hit this fault.
| 
| If this is actually a valid analysis, we should probably just pin the
| page for the duration of vm86 as it's already bordering on magical.
| 
| If there's some reason this whole thing is safe, let me know and I'll
| try to get might_sleep to stop complaining about these.
| 
| I suppose we could test it by hacking a program guaranteed to fault in
| vm86 mode and hacking the syscall to force the page out before calling
| do_sys_vm86.

I had another occurrence of this yesterday.  It doesn't seem to be
an error condition AFAICT.

--
~Randy
"Everything is relative."

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-18 21:07       ` Randy.Dunlap
@ 2003-08-18 21:26         ` Matt Mackall
  2003-08-19  0:13         ` Dave Jones
  1 sibling, 0 replies; 22+ messages in thread
From: Matt Mackall @ 2003-08-18 21:26 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: davej, akpm, linux-kernel

On Mon, Aug 18, 2003 at 02:07:29PM -0700, Randy.Dunlap wrote:
> On Sat, 16 Aug 2003 02:06:52 -0500 Matt Mackall <mpm@selenic.com> wrote:
> 
> | On Fri, Aug 15, 2003 at 12:30:53PM -0700, Randy.Dunlap wrote:
> | > On Fri, 15 Aug 2003 18:32:47 +0100 Dave Jones <davej@redhat.com> wrote:
> | > 
> | > | On Fri, Aug 15, 2003 at 10:18:56AM -0700, Randy.Dunlap wrote:
> | > | 
> | > |  > Debug: sleeping function called from invalid context at include/asm/uaccess.h:473
> | > |  > Call Trace:
> | > |  >  [<c0120d94>] __might_sleep+0x54/0x5b
> | > |  >  [<c010d001>] save_v86_state+0x71/0x1f0
> | > |  >  [<c010dbd5>] handle_vm86_fault+0xc5/0xa90
> | > |  >  [<c019cab8>] ext3_file_write+0x28/0xc0
> | > |  >  [<c011cd96>] __change_page_attr+0x26/0x220
> | > |  >  [<c010b310>] do_general_protection+0x0/0x90
> | > |  >  [<c010a69d>] error_code+0x2d/0x40
> | > |  >  [<c0109657>] syscall_call+0x7/0xb
> | > | 
> | > | That's one really wierd looking backtrace. What else was that
> | > | machine up to at the time ?
> | > 
> | > Some parts of it are explainable (to me), some not.
> | > I don't know what caused a GP fault or why ext3 shows up.
> | > 
> | > But I can follow from do_general_protection() to handle_vm86_fault()
> | > to [inline] return_to_32bit() to save_v86_state() to __might_sleep().
> | > 
> | > And __might_sleep() is correct if change_page_attr() was called,
> | > since it takes a spinlock.  I just can't connect quite all of the dots.
> | 
> | Ok, there's some stack noise here with the ext3_file_write and
> | __change_page_attr.
> | 
> | Here's what I've figured out so far. You probably have something like
> | X running with a driver that calls an image of its own 16-bit BIOS to
> | get something done (I think Matrox does this) via sys_vm86. One of the
> | arguments to sys_vm86 is a pointer to a vm86plus_struct in userspace
> | that gets stashed away in tsk->thread.vm86_info.
> | 
> | When, for whatever reason, a fixup is needed in vm86 mode, we find
> | ourselves in handle_vm86_fault and save_v86_state copied various junk
> | out to this userspace struct we've kept a pointer to. Now as far as I
> | can tell, there's nothing guaranteeing this struct is pinned down or
> | in any way guaranteed to be around when the fault occurs. If the page
> | with the struct _does_ get swapped out, we can be in trouble when we
> | hit this fault.
> | 
> | If this is actually a valid analysis, we should probably just pin the
> | page for the duration of vm86 as it's already bordering on magical.
> | 
> | If there's some reason this whole thing is safe, let me know and I'll
> | try to get might_sleep to stop complaining about these.
> | 
> | I suppose we could test it by hacking a program guaranteed to fault in
> | vm86 mode and hacking the syscall to force the page out before calling
> | do_sys_vm86.
> 
> I had another occurrence of this yesterday.  It doesn't seem to be
> an error condition AFAICT.

No, it's only a warning (which I recently added to the copy_* path)
that something is used in a suspicious context. The question is "is
the warning bogus or is there something that can actually go wrong
here?".

As far as I can tell, the answer is "yes, there's a good chance this
could blow up under load". We need two things to happen: the caller's
page containing the vm86plus_struct to get swapped out, and to take
any sort of fault (including using virtualized instructions) in vm86
mode. Result: you'll hit the double-fault handler and the system will
loop forever with interrupts disabled after spitting out a few printks
that you won't see because you're in X.

-- 
Matt Mackall : http://www.selenic.com : of or relating to the moon

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-18 21:07       ` Randy.Dunlap
  2003-08-18 21:26         ` Matt Mackall
@ 2003-08-19  0:13         ` Dave Jones
  2003-08-19  0:15           ` Randy.Dunlap
                             ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Dave Jones @ 2003-08-19  0:13 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: Matt Mackall, akpm, linux-kernel

On Mon, Aug 18, 2003 at 02:07:29PM -0700, Randy.Dunlap wrote:
 > | > |  > Debug: sleeping function called from invalid context at include/asm/uaccess.h:473
 > | > |  > Call Trace:
 > | > |  >  [<c0120d94>] __might_sleep+0x54/0x5b
 > | > |  >  [<c010d001>] save_v86_state+0x71/0x1f0
 > | > |  >  [<c010dbd5>] handle_vm86_fault+0xc5/0xa90
 > | > |  >  [<c019cab8>] ext3_file_write+0x28/0xc0
 > | > |  >  [<c011cd96>] __change_page_attr+0x26/0x220
 > | > |  >  [<c010b310>] do_general_protection+0x0/0x90
 > | > |  >  [<c010a69d>] error_code+0x2d/0x40
 > | > |  >  [<c0109657>] syscall_call+0x7/0xb
 > | > | 
 > 
 > I had another occurrence of this yesterday.  It doesn't seem to be
 > an error condition AFAICT.

How spooky. now I got one too, (minus the noise).

Call Trace:
 [<c0120022>] __might_sleep+0x5b/0x5f
 [<c010cf8a>] save_v86_state+0x6a/0x1f3
 [<c010dad2>] handle_vm86_fault+0xa7/0x897
 [<c010b2ed>] do_general_protection+0x0/0x93
 [<c010a651>] error_code+0x2d/0x38
 [<c0109623>] syscall_call+0x7/0xb

By the looks of the logs, it happened as I restarted X, as theres
a bunch of mtrr messages right after this..

		Dave

-- 
 Dave Jones     http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-19  0:13         ` Dave Jones
@ 2003-08-19  0:15           ` Randy.Dunlap
  2003-08-19  1:02             ` Zwane Mwaikambo
  2003-08-19  0:15           ` Andrew Morton
  2003-08-19  1:01           ` Matt Mackall
  2 siblings, 1 reply; 22+ messages in thread
From: Randy.Dunlap @ 2003-08-19  0:15 UTC (permalink / raw)
  To: Dave Jones; +Cc: mpm, akpm, linux-kernel

On Tue, 19 Aug 2003 01:13:16 +0100 Dave Jones <davej@redhat.com> wrote:

| On Mon, Aug 18, 2003 at 02:07:29PM -0700, Randy.Dunlap wrote:
|  > | > |  > Debug: sleeping function called from invalid context at include/asm/uaccess.h:473
|  > | > |  > Call Trace:
|  > | > |  >  [<c0120d94>] __might_sleep+0x54/0x5b
|  > | > |  >  [<c010d001>] save_v86_state+0x71/0x1f0
|  > | > |  >  [<c010dbd5>] handle_vm86_fault+0xc5/0xa90
|  > | > |  >  [<c019cab8>] ext3_file_write+0x28/0xc0
|  > | > |  >  [<c011cd96>] __change_page_attr+0x26/0x220
|  > | > |  >  [<c010b310>] do_general_protection+0x0/0x90
|  > | > |  >  [<c010a69d>] error_code+0x2d/0x40
|  > | > |  >  [<c0109657>] syscall_call+0x7/0xb
|  > | > | 
|  > 
|  > I had another occurrence of this yesterday.  It doesn't seem to be
|  > an error condition AFAICT.
| 
| How spooky. now I got one too, (minus the noise).
| 
| Call Trace:
|  [<c0120022>] __might_sleep+0x5b/0x5f
|  [<c010cf8a>] save_v86_state+0x6a/0x1f3
|  [<c010dad2>] handle_vm86_fault+0xa7/0x897
|  [<c010b2ed>] do_general_protection+0x0/0x93
|  [<c010a651>] error_code+0x2d/0x38
|  [<c0109623>] syscall_call+0x7/0xb
| 
| By the looks of the logs, it happened as I restarted X, as theres
| a bunch of mtrr messages right after this..

I don't recall mtrr messages, but it is happening just after I start
X and the window manager with me also.

--
~Randy
"Everything is relative."

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-19  0:13         ` Dave Jones
  2003-08-19  0:15           ` Randy.Dunlap
@ 2003-08-19  0:15           ` Andrew Morton
  2003-08-19  1:27             ` Matt Mackall
  2003-08-19  3:24             ` Randy.Dunlap
  2003-08-19  1:01           ` Matt Mackall
  2 siblings, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2003-08-19  0:15 UTC (permalink / raw)
  To: Dave Jones; +Cc: rddunlap, mpm, linux-kernel

Dave Jones <davej@redhat.com> wrote:
>
> How spooky. now I got one too, (minus the noise).
> 
> Call Trace:
>  [<c0120022>] __might_sleep+0x5b/0x5f

It would be useful to know whether this was triggered by in_atomic() or by
irqs_disabled().  We're suspecting the latter.


diff -puN kernel/sched.c~might_sleep-diags kernel/sched.c
--- 25/kernel/sched.c~might_sleep-diags	Mon Aug 18 14:09:41 2003
+++ 25-akpm/kernel/sched.c	Mon Aug 18 14:11:55 2003
@@ -2795,13 +2795,19 @@ void __might_sleep(char *file, int line)
 {
 #if defined(in_atomic)
 	static unsigned long prev_jiffy;	/* ratelimiting */
+	char *msg = NULL;
 
-	if (in_atomic() || irqs_disabled()) {
+	if (in_atomic())
+		msg = "in atomic section";
+	else if (irqs_disabled())
+		msg = "with interrupts disabled";
+
+	if (msg) {
 		if (time_before(jiffies, prev_jiffy + HZ))
 			return;
 		prev_jiffy = jiffies;
-		printk(KERN_ERR "Debug: sleeping function called from invalid"
-				" context at %s:%d\n", file, line);
+		printk(KERN_ERR "Debug: sleeping function "
+				"called %s at %s:%d\n", msg, file, line);
 		dump_stack();
 	}
 #endif

_


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-19  0:13         ` Dave Jones
  2003-08-19  0:15           ` Randy.Dunlap
  2003-08-19  0:15           ` Andrew Morton
@ 2003-08-19  1:01           ` Matt Mackall
  2003-08-19  1:04             ` Dave Jones
  2 siblings, 1 reply; 22+ messages in thread
From: Matt Mackall @ 2003-08-19  1:01 UTC (permalink / raw)
  To: Dave Jones, Randy.Dunlap, akpm, linux-kernel

On Tue, Aug 19, 2003 at 01:13:16AM +0100, Dave Jones wrote:
> On Mon, Aug 18, 2003 at 02:07:29PM -0700, Randy.Dunlap wrote:
>  > | > |  > Debug: sleeping function called from invalid context at include/asm/uaccess.h:473
>  > | > |  > Call Trace:
>  > | > |  >  [<c0120d94>] __might_sleep+0x54/0x5b
>  > | > |  >  [<c010d001>] save_v86_state+0x71/0x1f0
>  > | > |  >  [<c010dbd5>] handle_vm86_fault+0xc5/0xa90
>  > | > |  >  [<c019cab8>] ext3_file_write+0x28/0xc0
>  > | > |  >  [<c011cd96>] __change_page_attr+0x26/0x220
>  > | > |  >  [<c010b310>] do_general_protection+0x0/0x90
>  > | > |  >  [<c010a69d>] error_code+0x2d/0x40
>  > | > |  >  [<c0109657>] syscall_call+0x7/0xb
>  > | > | 
>  > 
>  > I had another occurrence of this yesterday.  It doesn't seem to be
>  > an error condition AFAICT.
> 
> How spooky. now I got one too, (minus the noise).
> 
> Call Trace:
>  [<c0120022>] __might_sleep+0x5b/0x5f
>  [<c010cf8a>] save_v86_state+0x6a/0x1f3
>  [<c010dad2>] handle_vm86_fault+0xa7/0x897
>  [<c010b2ed>] do_general_protection+0x0/0x93
>  [<c010a651>] error_code+0x2d/0x38
>  [<c0109623>] syscall_call+0x7/0xb
> 
> By the looks of the logs, it happened as I restarted X, as theres
> a bunch of mtrr messages right after this..

What video driver do you use?

-- 
Matt Mackall : http://www.selenic.com : of or relating to the moon

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-19  0:15           ` Randy.Dunlap
@ 2003-08-19  1:02             ` Zwane Mwaikambo
  0 siblings, 0 replies; 22+ messages in thread
From: Zwane Mwaikambo @ 2003-08-19  1:02 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: Dave Jones, mpm, akpm, linux-kernel

On Mon, 18 Aug 2003, Randy.Dunlap wrote:

> | How spooky. now I got one too, (minus the noise).
> | 
> | Call Trace:
> |  [<c0120022>] __might_sleep+0x5b/0x5f
> |  [<c010cf8a>] save_v86_state+0x6a/0x1f3
> |  [<c010dad2>] handle_vm86_fault+0xa7/0x897
> |  [<c010b2ed>] do_general_protection+0x0/0x93
> |  [<c010a651>] error_code+0x2d/0x38
> |  [<c0109623>] syscall_call+0x7/0xb
> | 
> | By the looks of the logs, it happened as I restarted X, as theres
> | a bunch of mtrr messages right after this..
> 
> I don't recall mtrr messages, but it is happening just after I start
> X and the window manager with me also.

It most probably is the XFree86 video bios thing running in vm86 mode.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-19  1:01           ` Matt Mackall
@ 2003-08-19  1:04             ` Dave Jones
  2003-08-19  1:09               ` Matt Mackall
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Jones @ 2003-08-19  1:04 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Randy.Dunlap, akpm, linux-kernel

On Mon, Aug 18, 2003 at 08:01:19PM -0500, Matt Mackall wrote:
 > > 
 > > By the looks of the logs, it happened as I restarted X, as theres
 > > a bunch of mtrr messages right after this..
 > 
 > What video driver do you use?

mga

		Dave

-- 
 Dave Jones     http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-19  1:04             ` Dave Jones
@ 2003-08-19  1:09               ` Matt Mackall
  0 siblings, 0 replies; 22+ messages in thread
From: Matt Mackall @ 2003-08-19  1:09 UTC (permalink / raw)
  To: Dave Jones, Randy.Dunlap, akpm, linux-kernel

On Tue, Aug 19, 2003 at 02:04:57AM +0100, Dave Jones wrote:
> On Mon, Aug 18, 2003 at 08:01:19PM -0500, Matt Mackall wrote:
>  > > 
>  > > By the looks of the logs, it happened as I restarted X, as theres
>  > > a bunch of mtrr messages right after this..
>  > 
>  > What video driver do you use?
> 
> mga

I had someone else send me a strings(1) of that driver, and it looked
like it was running a copy of its BIOS.

-- 
Matt Mackall : http://www.selenic.com : of or relating to the moon

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-19  0:15           ` Andrew Morton
@ 2003-08-19  1:27             ` Matt Mackall
  2003-08-19  3:24             ` Randy.Dunlap
  1 sibling, 0 replies; 22+ messages in thread
From: Matt Mackall @ 2003-08-19  1:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dave Jones, rddunlap, linux-kernel

On Mon, Aug 18, 2003 at 05:15:45PM -0700, Andrew Morton wrote:
> Dave Jones <davej@redhat.com> wrote:
> >
> > How spooky. now I got one too, (minus the noise).
> > 
> > Call Trace:
> >  [<c0120022>] __might_sleep+0x5b/0x5f
> 
> It would be useful to know whether this was triggered by in_atomic() or by
> irqs_disabled().  We're suspecting the latter.

Everything points to it being a fault handler.

Here's my current understanding:

 some part of X calls sys_vm86()
 sys_vm86 stashes pointer to userspace structure
 do_sys_vm86 fiddles with register structures to setup 16-bit transition
 do_sys_vm86 goes to 16-bit mode _through the userspace return path_
 fault occurs in 16-bit code
 handle_vm86_fault invoked through interrupt
 save_v86_state writes into stashed userspace structure (might_sleep)
 return_to_32bit swaps register sets around
 return_to_32bit returns to the original userspace context

Because we never return to the context of the sys_vm86 syscall, we're
never again in an appropriate place to copy the registers over. A
cleaner way to do this is to setup return_to_32bit to return to the
point just after where sys_vm86 returns to 16bit mode and copy the
registers to userspace in normal process context.

-- 
Matt Mackall : http://www.selenic.com : of or relating to the moon

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-19  0:15           ` Andrew Morton
  2003-08-19  1:27             ` Matt Mackall
@ 2003-08-19  3:24             ` Randy.Dunlap
  2003-08-19  3:35               ` Andrew Morton
  1 sibling, 1 reply; 22+ messages in thread
From: Randy.Dunlap @ 2003-08-19  3:24 UTC (permalink / raw)
  To: akpm; +Cc: davej, mpm, linux-kernel

> Dave Jones <davej@redhat.com> wrote:
>>
>> How spooky. now I got one too, (minus the noise).
>>
>> Call Trace:
>>  [<c0120022>] __might_sleep+0x5b/0x5f
>
> It would be useful to know whether this was triggered by in_atomic() or by
> irqs_disabled().  We're suspecting the latter.
>
>
> diff -puN kernel/sched.c~might_sleep-diags kernel/sched.c
> --- 25/kernel/sched.c~might_sleep-diags	Mon Aug 18 14:09:41 2003
> +++ 25-akpm/kernel/sched.c	Mon Aug 18 14:11:55 2003
> @@ -2795,13 +2795,19 @@ void __might_sleep(char *file, int line)
>  {
>  #if defined(in_atomic)
>  	static unsigned long prev_jiffy;	/* ratelimiting */
> +	char *msg = NULL;
>
> -	if (in_atomic() || irqs_disabled()) {
> +	if (in_atomic())
> +		msg = "in atomic section";
> +	else if (irqs_disabled())
> +		msg = "with interrupts disabled";
> +
> +	if (msg) {
>  		if (time_before(jiffies, prev_jiffy + HZ))
>  			return;
>  		prev_jiffy = jiffies;
> -		printk(KERN_ERR "Debug: sleeping function called from invalid"
> -				" context at %s:%d\n", file, line);
> +		printk(KERN_ERR "Debug: sleeping function "
> +				"called %s at %s:%d\n", msg, file, line);
>  		dump_stack();
>  	}
>  #endif

Took 5 tries of loading X to get it, and you got it.
(I'm using VGA_CONSOLE.)

Debug: sleeping function called with interrupts disabled at
include/asm/uaccess.h:473
Call Trace:
 [<c0120d93>] __might_sleep+0x53/0x74
 [<c010d001>] save_v86_state+0x71/0x1f0
 [<c010dbd5>] handle_vm86_fault+0xc5/0xa90
 [<c019cac8>] ext3_file_write+0x28/0xc0
 [<c011cd96>] __change_page_attr+0x26/0x220
 [<c010b310>] do_general_protection+0x0/0x90
 [<c010a69d>] error_code+0x2d/0x40
 [<c0109657>] syscall_call+0x7/0xb

~Randy




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-19  3:24             ` Randy.Dunlap
@ 2003-08-19  3:35               ` Andrew Morton
  2003-08-19  3:39                 ` Randy.Dunlap
                                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Andrew Morton @ 2003-08-19  3:35 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: davej, mpm, linux-kernel

"Randy.Dunlap" <rddunlap@osdl.org> wrote:
>
> Debug: sleeping function called with interrupts disabled at
>  include/asm/uaccess.h:473

OK, now my vague understanding of what's going on is that the app has
chosen to disable local interupts (via iopl()) and has taken a vm86 trap. 
I guess we'd see the same thing if the app performed some sleeping syscall
while interrupts are disabled.

If that is correct then it really is just a false positive.

It could also point at a bug in the application; it is presumably disabling
interrupts for some form of locking, atomicity or timing guarantee.  But it
will not lock against other CPUs and the fact that it trapped into the
kernel indicates tat it may not be getting the atomicity which it desires.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-19  3:35               ` Andrew Morton
@ 2003-08-19  3:39                 ` Randy.Dunlap
  2003-08-19  4:26                   ` Matt Mackall
  2003-08-19  4:29                   ` Andrew Morton
  2003-08-19  4:14                 ` Matt Mackall
  2003-08-19  5:14                 ` Matt Mackall
  2 siblings, 2 replies; 22+ messages in thread
From: Randy.Dunlap @ 2003-08-19  3:39 UTC (permalink / raw)
  To: akpm; +Cc: davej, mpm, linux-kernel

> "Randy.Dunlap" <rddunlap@osdl.org> wrote:
>>
>> Debug: sleeping function called with interrupts disabled at
>>  include/asm/uaccess.h:473
>
> OK, now my vague understanding of what's going on is that the app has chosen
> to disable local interupts (via iopl()) and has taken a vm86 trap.  I guess
> we'd see the same thing if the app performed some sleeping syscall while
> interrupts are disabled.
>
> If that is correct then it really is just a false positive.
>
> It could also point at a bug in the application; it is presumably disabling
> interrupts for some form of locking, atomicity or timing guarantee.  But it
> will not lock against other CPUs and the fact that it trapped into the
> kernel indicates tat it may not be getting the atomicity which it desires.

Call Trace:
 [<c0120d93>] __might_sleep+0x53/0x74
 [<c010d001>] save_v86_state+0x71/0x1f0
 [<c010dbd5>] handle_vm86_fault+0xc5/0xa90
 [<c019cac8>] ext3_file_write+0x28/0xc0
 [<c011cd96>] __change_page_attr+0x26/0x220
 [<c010b310>] do_general_protection+0x0/0x90
 [<c010a69d>] error_code+0x2d/0x40
 [<c0109657>] syscall_call+0x7/0xb

My (more) vague understanding is that X(?) got the kernel to
do_general_protection() somehow, but change_page_attr() does this:
	spin_lock_irqsave(&cpa_lock, flags);
in arch/i386/mm/pageattr.c (I'm on a UP box),
so irqs are disabled by the kernel and then we calls put_user()
with a spinlock held.

~Randy  [betting I understand it less than Andrew]




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-19  3:35               ` Andrew Morton
  2003-08-19  3:39                 ` Randy.Dunlap
@ 2003-08-19  4:14                 ` Matt Mackall
  2003-08-19  5:14                 ` Matt Mackall
  2 siblings, 0 replies; 22+ messages in thread
From: Matt Mackall @ 2003-08-19  4:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Randy.Dunlap, davej, linux-kernel

On Mon, Aug 18, 2003 at 08:35:13PM -0700, Andrew Morton wrote:
> "Randy.Dunlap" <rddunlap@osdl.org> wrote:
> >
> > Debug: sleeping function called with interrupts disabled at
> >  include/asm/uaccess.h:473
> 
> OK, now my vague understanding of what's going on is that the app has
> chosen to disable local interupts (via iopl()) and has taken a vm86 trap. 

Are you suggesting that whatever's calling sys_vm86 has disabled
interrupts beforehand? I don't see why that's necessary at all. The
vm86 fault handler is called via do_general_protection in any case and
as such is in_interrupt() by definition. And a vm86 general protection
fault can be caused by any of cli, sti, pushf, popf, intx, or iret. In
fact, typical usage of vm86 mode is to setup a call to a 16-bit
software interrupt handler and return via fault on the iret.

I'm increasingly convinced it's actually broken.

-- 
Matt Mackall : http://www.selenic.com : of or relating to the moon

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-19  3:39                 ` Randy.Dunlap
@ 2003-08-19  4:26                   ` Matt Mackall
  2003-08-19  4:29                   ` Andrew Morton
  1 sibling, 0 replies; 22+ messages in thread
From: Matt Mackall @ 2003-08-19  4:26 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: akpm, davej, linux-kernel

On Mon, Aug 18, 2003 at 08:39:29PM -0700, Randy.Dunlap wrote:
> > "Randy.Dunlap" <rddunlap@osdl.org> wrote:
> >>
> >> Debug: sleeping function called with interrupts disabled at
> >>  include/asm/uaccess.h:473
> >
> > OK, now my vague understanding of what's going on is that the app has chosen
> > to disable local interupts (via iopl()) and has taken a vm86 trap.  I guess
> > we'd see the same thing if the app performed some sleeping syscall while
> > interrupts are disabled.
> >
> > If that is correct then it really is just a false positive.
> >
> > It could also point at a bug in the application; it is presumably disabling
> > interrupts for some form of locking, atomicity or timing guarantee.  But it
> > will not lock against other CPUs and the fact that it trapped into the
> > kernel indicates tat it may not be getting the atomicity which it desires.
> 
> Call Trace:
>  [<c0120d93>] __might_sleep+0x53/0x74
>  [<c010d001>] save_v86_state+0x71/0x1f0
>  [<c010dbd5>] handle_vm86_fault+0xc5/0xa90
>  [<c019cac8>] ext3_file_write+0x28/0xc0
>  [<c011cd96>] __change_page_attr+0x26/0x220
>  [<c010b310>] do_general_protection+0x0/0x90
>  [<c010a69d>] error_code+0x2d/0x40
>  [<c0109657>] syscall_call+0x7/0xb
> 
> My (more) vague understanding is that X(?) got the kernel to
> do_general_protection() somehow, but change_page_attr() does this:
> 	spin_lock_irqsave(&cpa_lock, flags);
> in arch/i386/mm/pageattr.c (I'm on a UP box),
> so irqs are disabled by the kernel and then we calls put_user()
> with a spinlock held.

My suspicion is that the X driver (MGA, right?) copies its BIOS into
its address space, sets up io permissions for it to fiddle with your
card, and then calls sys_vm86 with the appropriate registers to call
into the interrupt handler when sys_vm86 swaps those registers in and
does "jmp resume_userspace". When the video BIOS software interrupt
handler does iret, we get the fault, copy the registers to our 32-bit
usermode (tripping might_sleep), and then swap the 32-bit registers
back in for another jump to resume_userspace.

The fix is making return_to_32_bit return back to the end of
do_sys_vm86 rather than straight to userspace, at which point we can
safely handle the registers.

-- 
Matt Mackall : http://www.selenic.com : of or relating to the moon

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-19  3:39                 ` Randy.Dunlap
  2003-08-19  4:26                   ` Matt Mackall
@ 2003-08-19  4:29                   ` Andrew Morton
  1 sibling, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2003-08-19  4:29 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: davej, mpm, linux-kernel

"Randy.Dunlap" <rddunlap@osdl.org> wrote:
>
> Call Trace:
>   [<c0120d93>] __might_sleep+0x53/0x74
>   [<c010d001>] save_v86_state+0x71/0x1f0
>   [<c010dbd5>] handle_vm86_fault+0xc5/0xa90
>   [<c019cac8>] ext3_file_write+0x28/0xc0
>   [<c011cd96>] __change_page_attr+0x26/0x220
>   [<c010b310>] do_general_protection+0x0/0x90
>   [<c010a69d>] error_code+0x2d/0x40
>   [<c0109657>] syscall_call+0x7/0xb
> 
>  My (more) vague understanding is that X(?) got the kernel to
>  do_general_protection() somehow, but change_page_attr() does this:
>  	spin_lock_irqsave(&cpa_lock, flags);
>  in arch/i386/mm/pageattr.c (I'm on a UP box),
>  so irqs are disabled by the kernel and then we calls put_user()
>  with a spinlock held.

The __change_page_attr() there looks like stack gunk.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Debug: sleeping function called from invalid context
  2003-08-19  3:35               ` Andrew Morton
  2003-08-19  3:39                 ` Randy.Dunlap
  2003-08-19  4:14                 ` Matt Mackall
@ 2003-08-19  5:14                 ` Matt Mackall
  2 siblings, 0 replies; 22+ messages in thread
From: Matt Mackall @ 2003-08-19  5:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Randy.Dunlap, davej, linux-kernel

On Mon, Aug 18, 2003 at 08:35:13PM -0700, Andrew Morton wrote:
> "Randy.Dunlap" <rddunlap@osdl.org> wrote:
> >
> > Debug: sleeping function called with interrupts disabled at
> >  include/asm/uaccess.h:473
> 
> OK, now my vague understanding of what's going on is that the app has
> chosen to disable local interupts (via iopl()) and has taken a vm86 trap. 
> I guess we'd see the same thing if the app performed some sleeping syscall
> while interrupts are disabled.

Ok, I think I've managed to reproduce this and show that it's not a
case of calling syscalls with interrupt disabled.

There's a utility called savetextmode that's part of svgalib (apt-get
svgalib-bin for Debian folks). If you configure it to use VESA, it
will call out to your video card bios and reproduce the error, just as
I expected.

I had kgdb handy and already running on a machine, so I set a
breakpoint in sys_vm86old (X would use sys_vm86, but the results would
be the same) and traced into it. About the first thing it does is call
copy_from_user, which checks might_sleep, which remained silent.

-- 
Matt Mackall : http://www.selenic.com : of or relating to the moon

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2003-08-19  5:17 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-08-15 17:18 Debug: sleeping function called from invalid context Randy.Dunlap
2003-08-15 17:32 ` Dave Jones
2003-08-15 17:42   ` Randy.Dunlap
2003-08-15 19:30   ` Randy.Dunlap
2003-08-16  7:06     ` Matt Mackall
2003-08-18 21:07       ` Randy.Dunlap
2003-08-18 21:26         ` Matt Mackall
2003-08-19  0:13         ` Dave Jones
2003-08-19  0:15           ` Randy.Dunlap
2003-08-19  1:02             ` Zwane Mwaikambo
2003-08-19  0:15           ` Andrew Morton
2003-08-19  1:27             ` Matt Mackall
2003-08-19  3:24             ` Randy.Dunlap
2003-08-19  3:35               ` Andrew Morton
2003-08-19  3:39                 ` Randy.Dunlap
2003-08-19  4:26                   ` Matt Mackall
2003-08-19  4:29                   ` Andrew Morton
2003-08-19  4:14                 ` Matt Mackall
2003-08-19  5:14                 ` Matt Mackall
2003-08-19  1:01           ` Matt Mackall
2003-08-19  1:04             ` Dave Jones
2003-08-19  1:09               ` Matt Mackall

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).