* More linux-2.6.9 module problems
From: linux-os @ 2004-11-08 16:50 UTC
  To: Linux kernel


I have a memory-test procedure that tests
memory on a board, accessed via the PCI bus.
There is a lot of RAM and it's bank-switched
into some 64k windows so it takes a lot of
time to test, about 60 seconds.

This is in a module, therefore inside the kernel.
When it is invoked via an ioctl() call, the
kernel is frozen for the whole test-time. The
test procedure does not use any spin-locks nor
does it even use any semaphores. It just does a
bunch of read/write operations over the PCI/Bus.

I thought that I could enable the preemptible-
kernel option and the machine would then respond
normally. Not so. Even with 4 CPUs, when one
ioctl() is busy in the kernel, nothing else
happens until it's done. Even keyboard activity
is gone, no Caps Lock and no Num Lock, no `ping`
response over the network. However, the machine
comes back to life when the memory-test is done.

This is kernel version 2.6.9. Is it possible that
somebody left on the BKL when calling a module
ioctl() on this version? If not, what do I do
to be able to execute a time-consuming procedure
from inside the kernel? Do I break it up into
sections and execute schedule() periodically
(temporary work-around --works)??
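
A minimal user-space sketch of the "break it up into sections" idea
from the question above. It is not the module's code: test_window() is
a hypothetical stand-in for the assembly helpers, WINDOW_WORDS is an
assumed chunk size, and sched_yield() only stands in for the schedule()
call a module would make between sections.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW_WORDS 1024u  /* hypothetical chunk size, in 32-bit words */

/* Number of times the loop handed the CPU back; kept for illustration. */
static unsigned long yield_count;

/*
 * Test one window: XOR each word with the pattern, read it back,
 * then restore the old value. Returns the first failing word or NULL.
 */
static volatile uint32_t *test_window(volatile uint32_t *mem, size_t words,
                                      uint32_t pattern)
{
    for (size_t i = 0; i < words; i++) {
        uint32_t old = mem[i];

        mem[i] = old ^ pattern;
        if (mem[i] != (old ^ pattern))
            return &mem[i];
        mem[i] = old;
    }
    return NULL;
}

/* Walk the whole region one window at a time, yielding between windows. */
static volatile uint32_t *test_in_chunks(volatile uint32_t *mem, size_t words,
                                         uint32_t pattern)
{
    assert(mem != NULL);
    yield_count = 0;
    while (words > 0) {
        size_t n = words < WINDOW_WORDS ? words : WINDOW_WORDS;
        volatile uint32_t *bad = test_window(mem, n, pattern);

        if (bad)
            return bad;
        mem += n;
        words -= n;
        sched_yield();  /* user-space stand-in for schedule() */
        yield_count++;
    }
    return NULL;
}
```

The point of the shape is that no single stretch of work runs longer
than one window, so other tasks get a chance to run between windows.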

Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 BogoMips).
  Notice : All mail here is now cached for review by John Ashcroft.
                  98.36% of all statistics are fiction.

* Re: More linux-2.6.9 module problems
From: Mike Waychison @ 2004-11-09 19:51 UTC
  To: linux-os; +Cc: Linux kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

linux-os wrote:
> 
> I have a memory-test procedure that tests
> memory on a board, accessed via the PCI bus.
> There is a lot of RAM and it's bank-switched
> into some 64k windows so it takes a lot of
> time to test, about 60 seconds.
> 
> This is in a module, therefore inside the kernel.
> When it is invoked via an ioctl() call, the
> kernel is frozen for the whole test-time. The
> test procedure does not use any spin-locks nor
> does it even use any semaphores. It just does a
> bunch of read/write operations over the PCI/Bus.
> 
> I thought that I could enable the preemptible-
> kernel option and the machine would then respond
> normally. Not so. Even with 4 CPUs, when one
> ioctl() is busy in the kernel, nothing else
> happens until it's done. Even keyboard activity
> is gone, no Caps Lock and no Num Lock, no `ping`
> response over the network. However, the machine
> comes back to life when the memory-test is done.
> 
> This is kernel version 2.6.9. Is it possible that
> somebody left on the BKL when calling a module
> ioctl() on this version? If not, what do I do
> to be able to execute a time-consuming procedure
> from inside the kernel? Do I break it up into
> sections and execute schedule() periodically
> (temporary work-around --works)??
> 

The BKL has always been grabbed across ioctls.  Drop the lock when you
enter your f_op->ioctl call and grab it again upon completion.
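
The pattern described above can be sketched in user space. This is a
toy model only: big_lock is an atomic flag standing in for the BKL,
dispatch_ioctl() stands in for whatever calls the handler with the lock
held, and all of the names are hypothetical rather than kernel API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for the big kernel lock; not the kernel's implementation. */
static atomic_flag big_lock = ATOMIC_FLAG_INIT;

/* Set by the long-running work if it found the big lock free. */
static bool lock_was_free_during_work;

static bool big_trylock(void)
{
    /* test_and_set returns the previous state: false means we took it. */
    return !atomic_flag_test_and_set(&big_lock);
}

static void big_lock_acquire(void)
{
    while (!big_trylock())
        ;  /* spin until the flag is ours */
}

static void big_unlock(void)
{
    atomic_flag_clear(&big_lock);
}

/* Stands in for the 60-second test; it only probes the lock state. */
static int long_running_test(void)
{
    if (big_trylock()) {
        lock_was_free_during_work = true;
        big_unlock();
    }
    return 0;
}

/* The driver's handler: entered with the big lock held by the caller. */
static int handler(bool drop_lock)
{
    int ret;

    if (drop_lock)
        big_unlock();          /* drop on entry */
    ret = long_running_test();
    if (drop_lock)
        big_lock_acquire();    /* retake before returning */
    return ret;
}

/* Models the caller that wraps every handler in the big lock. */
static int dispatch_ioctl(bool drop_lock)
{
    int ret;

    lock_was_free_during_work = false;
    big_lock_acquire();
    ret = handler(drop_lock);
    /* The lock can only have been free if the handler dropped it. */
    assert(!lock_was_free_during_work || drop_lock);
    big_unlock();
    return ret;
}
```

With drop_lock false the long-running work never sees the lock free;
with drop_lock true it does, and the caller still finds the lock held
when the handler returns.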

- --
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBkR/UdQs4kOxk3/MRAqYmAJwM4wQFhGis831m50lzqOKnCY0BEgCeOtXY
4TmvEX9mmXfsT2L9EinlwiM=
=fiO5
-----END PGP SIGNATURE-----

* Re: More linux-2.6.9 module problems
From: linux-os @ 2004-11-09 20:25 UTC
  To: Mike Waychison; +Cc: Linux kernel

On Tue, 9 Nov 2004, Mike Waychison wrote:

> The BKL has always been grabbed across ioctls.  Drop the lock when you
> enter your f_op->ioctl call and grab it again upon completion.
>

Hmmm. I get 'scheduling while atomic' screaming across the screen!
There are no atomic operations in my ioctl functions so I don't
know what it's complaining about. I think I shouldn't have tried
to do anything with BKL because I (my task) doesn't own it.

Cheers,
Dick Johnson

* Re: More linux-2.6.9 module problems
From: Mike Waychison @ 2004-11-09 21:43 UTC
  To: linux-os; +Cc: Linux kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

linux-os wrote:
> Hmmm. I get 'scheduling while atomic' screaming across the screen!
> There are no atomic operations in my ioctl functions so I don't
> know what it's complaining about. I think I shouldn't have tried
> to do anything with BKL because I (my task) doesn't own it.
> 

'Scheduling while atomic' means you called some function that may
schedule itself out while you are holding a spinlock.  Note that the BKL
is not a regular spinlock, and scheduling is allowed while holding it.

Please see
http://james.bond.edu.au/courses/inft73626@033/Assigs/Papers/kernel_locking_techniques.html
by Robert Love, the section titled "The Big Kernel Lock"

Something else is wrong with your code.

- --
Mike Waychison
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBkToWdQs4kOxk3/MRAl2KAJ0e3Eg72MnrTWwJrctdN9YAY4T8ngCeN8p/
7G2IkrNjDaHpkYIi0dUdoQY=
=JUUw
-----END PGP SIGNATURE-----

* Re: More linux-2.6.9 module problems
From: linux-os @ 2004-11-09 22:17 UTC
  To: Mike Waychison; +Cc: Linux kernel

On Tue, 9 Nov 2004, Mike Waychison wrote:

> 'Scheduling while atomic' means you called some function that may
> schedule itself out while you are holding a spinlock.  Note that the BKL
> is not a regular spinlock, and scheduling is allowed while holding it.
>
> Please see
> http://james.bond.edu.au/courses/inft73626@033/Assigs/Papers/kernel_locking_techniques.html
> by Robert Love, the section titled "The Big Kernel Lock"
>
> Something else is wrong with your code.

Not quite. Something is wrong with the e100 network driver used in
2.6.9. When I do:

int ioctl(,,,,)
{
    int ret;
    unlock_kernel();
    ret = original_ioctl(...);
    lock_kernel();
    return ret;
}

In my driver, completely unrelated to the network... It's
something in the e100 network driver that the kernel's
complaining about. If I shut down the network and remove
the network driver module I don't have any problems while
enabling BKL. Everything runs fine.

The code that runs is:

/*
  *   Copyright(c)  2004  Analogic Corporation
  *
  *   This program may be distributed under the GNU Public License
  *   version 2, as published by the Free Software Foundation, Inc.,
  *   59 Temple Place, Suite 330 Boston, MA, 02111.
  *
  *   File ram_test.c	Created 10-MAY-2001	Richard B. Johnson
  */

#include <linux/kernel.h>

/*-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
/*
  *   The following are in file rwcheck.S
  */
extern void xorlw(volatile void *mem, size_t wrd, size_t len);
extern void fill_rnd(volatile void *mem, size_t len);
extern unsigned char *check_rnd(volatile void *mem, size_t len);
extern void set_seed(int);

/*-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
/*
  *   This tests RAM to make sure it is read/writable, and uniquely-
  *   addressable i.e., working.
  *   If the RAM is not working, this returns the address of the
  *   first failing location, otherwise it returns NULL.
  */

#define SEED 0x12345678

unsigned char *testram(volatile void *mem, size_t len)
{
     len /= sizeof(size_t);
     set_seed(SEED);
     fill_rnd(mem, len);
     xorlw(mem, 0x55555555, len);
     xorlw(mem, 0xaaaaaaaa, len);
     xorlw(mem, 0xa5555555, len);
     xorlw(mem, 0x5a555555, len);
     xorlw(mem, 0x55a55555, len);
     xorlw(mem, 0x555a5555, len);
     xorlw(mem, 0x5555a555, len);
     xorlw(mem, 0x55555a55, len);
     xorlw(mem, 0x555555a5, len);
     xorlw(mem, 0x5555555a, len);
     xorlw(mem, 0x5aaaaaaa, len);
     xorlw(mem, 0xa5aaaaaa, len);
     xorlw(mem, 0xaa5aaaaa, len);
     xorlw(mem, 0xaaa5aaaa, len);
     xorlw(mem, 0xaaaa5aaa, len);
     xorlw(mem, 0xaaaaa5aa, len);
     xorlw(mem, 0xaaaaaa5a, len);
     xorlw(mem, 0xaaaaaaa5, len);
     xorlw(mem, 0xaa55aa55, len);
     xorlw(mem, 0x55aa55aa, len);
     xorlw(mem, 0xaa55aa55, len);
     xorlw(mem, 0x55aa55aa, len);
     xorlw(mem, 0xaaaaaaaa, len);
     xorlw(mem, 0x5aaaaaaa, len);
     xorlw(mem, 0xa5aaaaaa, len);
     xorlw(mem, 0xaa5aaaaa, len);
     xorlw(mem, 0xaaa5aaaa, len);
     xorlw(mem, 0xaaaa5aaa, len);
     xorlw(mem, 0xaaaaa5aa, len);
     xorlw(mem, 0xaaaaaa5a, len);
     xorlw(mem, 0xaaaaaaa5, len);
     xorlw(mem, 0xa5555555, len);
     xorlw(mem, 0x5a555555, len);
     xorlw(mem, 0x55a55555, len);
     xorlw(mem, 0x555a5555, len);
     xorlw(mem, 0x5555a555, len);
     xorlw(mem, 0x55555a55, len);
     xorlw(mem, 0x555555a5, len);
     xorlw(mem, 0x5555555a, len);
     xorlw(mem, 0x55555555, len);
     set_seed(SEED);
     return check_rnd(mem, len);
}

The 60 seconds is a very long time to not have a responsive machine.
Once I removed the BKL, the machine was responsive as long as I
removed the network driver. There must be something in that network
driver that is timing-sensitive and I just ticked it off.

I will try a 3-COM board in a few minutes. The 'real' target machines
don't use either of these so it might just be a non-event although
the maintainer of the e100 should know that I've got an interesting
test platform if he's got a patch!


Cheers,
Dick Johnson

* Re: More linux-2.6.9 module problems
From: Mike Waychison @ 2004-11-09 22:32 UTC
  To: linux-os; +Cc: Linux kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

linux-os wrote:
> Not quite. Something is wrong with the e100 network driver used in
> 2.6.9. When I do:
> 
> int ioctl(,,,,)
> {
>    int ret;
>    unlock_kernel();
>    ret = original_ioctl(...);
>    lock_kernel();
>    return ret;
> }
> In my driver,  completely unrelated to the network.... It's
> something in the e100 network driver that the kernel's
> complaining about. If I shut down the network and remove
> the network driver module I don't have any problems while
> enabling BKL. Everything runs fine.
> 

Don't do that. ioctls rightly assume that the BKL is held when they are
called.

When I said drop the lock, I meant for _your_ ioctl code.

- --
Mike Waychison
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBkUVvdQs4kOxk3/MRAscGAKCa51vEk6sXl9zc/mNf+2i6ntvhfACeORkF
YlqcKKfN/5Y++pY4Ws6Kgpw=
=LsgB
-----END PGP SIGNATURE-----

* Re: More linux-2.6.9 module problems
From: linux-os @ 2004-11-09 23:11 UTC
  To: Mike Waychison; +Cc: Linux kernel

On Tue, 9 Nov 2004, Mike Waychison wrote:

> Don't do that. ioctls rightly assume that the BKL is held when they are
> called.
>
> When I said drop the lock, I meant for _your_ ioctl code.
>

Hmmm. My code didn't do any locking, therefore I don't know
how to, as you say "drop the lock", except how other kernel drivers
do it. If I had any semaphores (which I don't here), or spin-locks
(which I don't), I could certainly unlock anything my code locked.

However, the kernel did something before my code was called.
Therefore, I have no way of undoing it except by calling
unlock_kernel().

Is there some other way?


Cheers,
Dick Johnson

* Re: More linux-2.6.9 module problems
From: linux-os @ 2004-11-10 0:10 UTC
  To: Mike Waychison; +Cc: Linux kernel

On Tue, 9 Nov 2004, linux-os wrote:

> Hmmm. My code didn't do any locking, therefore I don't know
> how to, as you say "drop the lock", except how other kernel drivers
> do it. If I had any semaphores (which I don't here), or spin-locks
> (which I don't), I could certainly unlock anything my code locked.
>
> However, the kernel did something before my code was called.
> Therefore, I have no way of undoing it except by calling
> unlock_kernel().
>
> Is there some other way?

I experimented with:

    release_kernel_lock(current);
    do_ioctl();
    reacquire_kernel_lock(current);

The results were truly spectacular crashes when a copy_to_user
happens in the ioctl(), returning the results. The starting
error is:
    sleeping function called from invalid context at arch/i386/lib/usercopy.c:599.

It says in_atomic():1, irqs_disabled():0 so something
makes __might_sleep() think that it's "in_atomic".
Looking at that, I see where !kernel_locked() is going
to cause problems in ../include/linux/hardirq.h if
we've been preempted.

Then a few hundred thousand lines of unrelated stuff:
smp_apic_timer_interrupt, etc.

FYI, there are no spin-locks and no semaphores in the
ioctl() code, and it all works if I don't muck with the
kernel lock.

So maybe I can't do copy_to_user unless the kernel lock
is held? Seems strange.
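
One way to picture the in_atomic() report above is a toy model. It
assumes, as the message suggests, that the check compares the
preemption count against whether the task holds the BKL, and that
release_kernel_lock() gives up the lock without resetting the task's
lock depth. Both are assumptions made for this sketch, not quotes from
the 2.6.9 source, and every model_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-task state; the fields echo kernel names but this is a model. */
struct task_model {
    int preempt_count;  /* assumed to be bumped while the BKL is held */
    int lock_depth;     /* -1 when the task does not hold the BKL */
};

static bool model_kernel_locked(const struct task_model *t)
{
    return t->lock_depth >= 0;
}

/* Assumed check: atomic unless the count is fully explained by the BKL. */
static bool model_in_atomic(const struct task_model *t)
{
    return t->preempt_count != (model_kernel_locked(t) ? 1 : 0);
}

static void model_lock_kernel(struct task_model *t)
{
    if (++t->lock_depth == 0)
        t->preempt_count++;
}

static void model_unlock_kernel(struct task_model *t)
{
    assert(t->lock_depth >= 0);
    if (--t->lock_depth < 0)
        t->preempt_count--;
}

/* Assumed: gives up the lock but leaves lock_depth untouched. */
static void model_release_kernel_lock(struct task_model *t)
{
    if (t->lock_depth >= 0)
        t->preempt_count--;
}

/* Assumed: retakes the lock for a task whose lock_depth says it had it. */
static void model_reacquire_kernel_lock(struct task_model *t)
{
    if (t->lock_depth >= 0)
        t->preempt_count++;
}
```

In this model the check turns true as soon as
model_release_kernel_lock() runs, because the depth still says "locked"
while the count says otherwise. The unlock/lock pair keeps the two in
step. If the real helpers behave the same way, that would be consistent
with the in_atomic():1 line above.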


Cheers,
Dick Johnson
