* More linux-2.6.9 module problems
From: linux-os @ 2004-11-08 16:50 UTC
To: Linux kernel

I have a memory-test procedure that tests memory on a board,
accessed via the PCI bus. There is a lot of RAM and it's
bank-switched into some 64k windows, so it takes a lot of
time to test, about 60 seconds.

This is in a module, therefore inside the kernel. When it is
invoked via an ioctl() call, the kernel is frozen for the whole
test time. The test procedure does not use any spin-locks, nor
does it even use any semaphores. It just does a bunch of
read/write operations over the PCI bus.

I thought that I could enable the preemptible-kernel option and
the machine would then respond normally. Not so. Even with 4 CPUs,
when one ioctl() is busy in the kernel, nothing else happens until
it's done. Even keyboard activity is gone: no Caps Lock, no Num Lock,
and no `ping` response over the network. However, the machine
comes back to life when the memory test is done.

This is kernel version 2.6.9. Is it possible that somebody left
the BKL held when calling a module ioctl() on this version? If not,
what do I do to be able to execute a time-consuming procedure
from inside the kernel? Do I break it up into sections and execute
schedule() periodically (a temporary work-around that does work)?

Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by John Ashcroft.
98.36% of all statistics are fiction.
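The work-around mentioned above (breaking the test into sections and yielding between them) can be sketched in portable C. This is only a user-space model under stated assumptions: `test_in_windows`, `write_read_window` and `count_yield` are hypothetical names that do not appear in the thread, the buffer is an ordinary array rather than bank-switched PCI memory, and the yield callback stands in for whatever the module would call between sections, such as schedule().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook called between windows. In a kernel module this is
 * where schedule() would go; here it is a plain callback so the sketch
 * builds in user space. */
typedef void (*yield_fn)(void *ctx);

/* Hypothetical per-window test: returns 0 on success, -1 on failure. */
typedef int (*window_test_fn)(volatile uint32_t *win, size_t words);

/* Example window test: write an index-dependent value, then read it back. */
int write_read_window(volatile uint32_t *win, size_t words)
{
    size_t i;

    for (i = 0; i < words; i++)
        win[i] = 0xA5A5A5A5u ^ (uint32_t)i;
    for (i = 0; i < words; i++)
        if (win[i] != (0xA5A5A5A5u ^ (uint32_t)i))
            return -1;
    return 0;
}

/* Example yield hook: only counts how often it was called. */
void count_yield(void *ctx)
{
    int *count = ctx;

    (*count)++;
}

/*
 * Test total_words of memory in windows of at most window_words each,
 * calling yield (if not NULL) after every window that passes.
 * Returns the index of the first failing window, or -1 if all pass.
 */
long test_in_windows(volatile uint32_t *mem, size_t total_words,
                     size_t window_words, window_test_fn test,
                     yield_fn yield, void *ctx)
{
    size_t off;
    long idx = 0;

    if (window_words == 0)
        window_words = total_words;   /* treat the range as one window */

    for (off = 0; off < total_words; off += window_words, idx++) {
        size_t n = total_words - off;

        if (n > window_words)
            n = window_words;
        if (test(mem + off, n) != 0)
            return idx;
        if (yield)
            yield(ctx);
    }
    return -1;
}
```

The window size is the tuning knob: smaller windows give other tasks a chance to run more often, at the cost of more calls to the yield hook over the whole test.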
* Re: More linux-2.6.9 module problems
From: Mike Waychison @ 2004-11-09 19:51 UTC
To: linux-os; +Cc: Linux kernel

linux-os wrote:
> This is kernel version 2.6.9. Is it possible that
> somebody left on the BKL when calling a module
> ioctl() on this version? If not, what do I do
> to be able to execute a time-consuming procedure
> from inside the kernel? Do I break it up into
> sections and execute schedule() periodically
> (temporary work-around --works)??

The BKL has always been grabbed across ioctls. Drop the lock when you
enter your f_op->ioctl call and grab it again upon completion.

--
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice

NOTICE: The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
* Re: More linux-2.6.9 module problems
From: linux-os @ 2004-11-09 20:25 UTC
To: Mike Waychison; +Cc: Linux kernel

On Tue, 9 Nov 2004, Mike Waychison wrote:
> The BKL has always been grabbed across ioctls. Drop the lock when you
> enter your f_op->ioctl call and grab it again upon completion.

Hmmm. I get 'scheduling while atomic' screaming across the screen!
There are no atomic operations in my ioctl functions so I don't
know what it's complaining about. I think I shouldn't have tried
to do anything with the BKL because I (my task) don't own it.

Cheers,
Dick Johnson
* Re: More linux-2.6.9 module problems
From: Mike Waychison @ 2004-11-09 21:43 UTC
To: linux-os; +Cc: Linux kernel

linux-os wrote:
> Hmmm. I get 'scheduling while atomic' screaming across the screen!
> There are no atomic operations in my ioctl functions so I don't
> know what it's complaining about. I think I shouldn't have tried
> to do anything with the BKL because I (my task) don't own it.

'Scheduling while atomic' means you called some function that may
schedule itself out while you are holding a spinlock. Note that the BKL
is not a regular spinlock, and scheduling is allowed while holding it.

Please see
http://james.bond.edu.au/courses/inft73626@033/Assigs/Papers/kernel_locking_techniques.html
by Robert Love, the section titled "The Big Kernel Lock".

Something else is wrong with your code.

--
Mike Waychison
* Re: More linux-2.6.9 module problems
From: linux-os @ 2004-11-09 22:17 UTC
To: Mike Waychison; +Cc: Linux kernel

On Tue, 9 Nov 2004, Mike Waychison wrote:
> 'Scheduling while atomic' means you called some function that may
> schedule itself out while you are holding a spinlock. Note that the BKL
> is not a regular spinlock, and scheduling is allowed while holding it.
>
> Please see
> http://james.bond.edu.au/courses/inft73626@033/Assigs/Papers/kernel_locking_techniques.html
> by Robert Love, the section titled "The Big Kernel Lock"
>
> Something else is wrong with your code.

Not quite. Something is wrong with the e100 network driver used in
2.6.9. When I do:

    int ioctl(,,,,)
    {
        int ret;
        unlock_kernel();
        ret = original_ioctl(...);
        lock_kernel();
        return ret;
    }

in my driver, completely unrelated to the network, it's
something in the e100 network driver that the kernel's
complaining about. If I shut down the network and remove
the network driver module I don't have any problems while
enabling BKL. Everything runs fine.

The code that runs is:

    /*
     *   Copyright(c) 2004 Analogic Corporation
     *
     *   This program may be distributed under the GNU Public License
     *   version 2, as published by the Free Software Foundation, Inc.,
     *   59 Temple Place, Suite 330 Boston, MA, 02111.
     *
     *   File ram_test.c        Created 10-MAY-2001     Richard B. Johnson
     */
    #include <linux/kernel.h>
    /*-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
    /*
     *   The following are in file rwcheck.S
     */
    extern void xorlw(volatile void *mem, size_t wrd, size_t len);
    extern void fill_rnd(volatile void *mem, size_t len);
    extern unsigned char *check_rnd(volatile void *mem, size_t len);
    extern void set_seed(int);
    /*-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
    /*
     *   This tests RAM to make sure it is read/writable, and uniquely-
     *   addressable i.e., working.
     *   If the RAM is not working, this returns the address of the
     *   first failing location, otherwise it returns NULL.
     */
    #define SEED 0x12345678
    unsigned char *testram(volatile void *mem, size_t len)
    {
        len /= sizeof(size_t);      /* byte count -> word count */
        set_seed(SEED);
        fill_rnd(mem, len);
        xorlw(mem, 0x55555555, len);
        xorlw(mem, 0xaaaaaaaa, len);
        xorlw(mem, 0xa5555555, len);
        xorlw(mem, 0x5a555555, len);
        xorlw(mem, 0x55a55555, len);
        xorlw(mem, 0x555a5555, len);
        xorlw(mem, 0x5555a555, len);
        xorlw(mem, 0x55555a55, len);
        xorlw(mem, 0x555555a5, len);
        xorlw(mem, 0x5555555a, len);
        xorlw(mem, 0x5aaaaaaa, len);
        xorlw(mem, 0xa5aaaaaa, len);
        xorlw(mem, 0xaa5aaaaa, len);
        xorlw(mem, 0xaaa5aaaa, len);
        xorlw(mem, 0xaaaa5aaa, len);
        xorlw(mem, 0xaaaaa5aa, len);
        xorlw(mem, 0xaaaaaa5a, len);
        xorlw(mem, 0xaaaaaaa5, len);
        xorlw(mem, 0xaa55aa55, len);
        xorlw(mem, 0x55aa55aa, len);
        xorlw(mem, 0xaa55aa55, len);
        xorlw(mem, 0x55aa55aa, len);
        xorlw(mem, 0xaaaaaaaa, len);
        xorlw(mem, 0x5aaaaaaa, len);
        xorlw(mem, 0xa5aaaaaa, len);
        xorlw(mem, 0xaa5aaaaa, len);
        xorlw(mem, 0xaaa5aaaa, len);
        xorlw(mem, 0xaaaa5aaa, len);
        xorlw(mem, 0xaaaaa5aa, len);
        xorlw(mem, 0xaaaaaa5a, len);
        xorlw(mem, 0xaaaaaaa5, len);
        xorlw(mem, 0xa5555555, len);
        xorlw(mem, 0x5a555555, len);
        xorlw(mem, 0x55a55555, len);
        xorlw(mem, 0x555a5555, len);
        xorlw(mem, 0x5555a555, len);
        xorlw(mem, 0x55555a55, len);
        xorlw(mem, 0x555555a5, len);
        xorlw(mem, 0x5555555a, len);
        xorlw(mem, 0x55555555, len);
        set_seed(SEED);
        return check_rnd(mem, len);
    }

Sixty seconds is a very long time to not have a responsive
machine. Once I removed the BKL, the machine was responsive as
long as I removed the network driver. There must be something in
that network driver that is timing-sensitive and I just ticked it
off. I will try a 3-COM board in a few minutes. The 'real' target
machines don't use either of these, so it might just be a non-event,
although the maintainer of the e100 should know that I've got an
interesting test platform if he's got a patch!

Cheers,
Dick Johnson
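One property of the pattern list in testram() is that every XOR pattern appears exactly twice, so on working memory the XOR passes cancel and the buffer ends up holding the original pseudo-random fill, which check_rnd() can then verify by regenerating the sequence from the same seed. The routines in rwcheck.S are not shown in the thread, so the following is only a user-space sketch of that idea under assumptions: `seed_random`, `fill_random`, `xor_words`, `check_random` and `testram_sketch` are hypothetical stand-ins, the generator is a plain xorshift32 rather than whatever rwcheck.S uses, and only a handful of the patterns are included.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SEED 0x12345678u

static uint32_t rnd_state = 1;

/* Reset the generator; xorshift32 must never be seeded with zero. */
void seed_random(uint32_t seed)
{
    rnd_state = seed ? seed : 1;
}

/* xorshift32: a stand-in generator, not the one from rwcheck.S. */
static uint32_t next_random(void)
{
    uint32_t x = rnd_state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    rnd_state = x;
    return x;
}

/* Fill the buffer with the pseudo-random sequence. */
void fill_random(volatile uint32_t *mem, size_t words)
{
    size_t i;

    for (i = 0; i < words; i++)
        mem[i] = next_random();
}

/* Read-modify-write every word with the given pattern. */
void xor_words(volatile uint32_t *mem, uint32_t pattern, size_t words)
{
    size_t i;

    for (i = 0; i < words; i++)
        mem[i] ^= pattern;
}

/* Return the address of the first word that differs from the regenerated
 * sequence, or NULL if every word matches. */
volatile uint32_t *check_random(volatile uint32_t *mem, size_t words)
{
    size_t i;

    for (i = 0; i < words; i++)
        if (mem[i] != next_random())
            return &mem[i];
    return NULL;
}

/* Fill, apply each pattern twice (forward, then in reverse order) so the
 * XORs cancel, then verify against the same seed. */
volatile uint32_t *testram_sketch(volatile uint32_t *mem, size_t words)
{
    static const uint32_t patterns[] = {
        0x55555555u, 0xaaaaaaaau, 0xa5555555u,
        0x5aaaaaaau, 0xaa55aa55u, 0x55aa55aau,
    };
    const size_t n = sizeof(patterns) / sizeof(patterns[0]);
    size_t i;

    seed_random(SKETCH_SEED);
    fill_random(mem, words);
    for (i = 0; i < n; i++)
        xor_words(mem, patterns[i], words);
    for (i = n; i-- > 0;)
        xor_words(mem, patterns[i], words);
    seed_random(SKETCH_SEED);
    return check_random(mem, words);
}
```

Because each pass reads and rewrites every word, a bit that fails to hold its value during any pass leaves the final contents different from the regenerated sequence, and the first such word is what gets reported.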
* Re: More linux-2.6.9 module problems
From: Mike Waychison @ 2004-11-09 22:32 UTC
To: linux-os; +Cc: Linux kernel

linux-os wrote:
> Not quite. Something is wrong with the e100 network driver used in
> 2.6.9. When I do:
>
>     int ioctl(,,,,)
>     {
>         int ret;
>         unlock_kernel();
>         ret = original_ioctl(...);
>         lock_kernel();
>         return ret;
>     }
>
> in my driver, completely unrelated to the network, it's
> something in the e100 network driver that the kernel's
> complaining about. If I shut down the network and remove
> the network driver module I don't have any problems while
> enabling BKL. Everything runs fine.

Don't do that. ioctls rightly assume that the BKL is held when they are
called.

When I said drop the lock, I meant for _your_ ioctl code.

--
Mike Waychison
* Re: More linux-2.6.9 module problems
From: linux-os @ 2004-11-09 23:11 UTC
To: Mike Waychison; +Cc: Linux kernel

On Tue, 9 Nov 2004, Mike Waychison wrote:
> Don't do that. ioctls rightly assume that the BKL is held when they are
> called.
>
> When I said drop the lock, I meant for _your_ ioctl code.

Hmmm. My code didn't do any locking, therefore I don't know
how to, as you say, "drop the lock", except how other kernel drivers
do it. If I had any semaphores (which I don't here) or spin-locks
(which I don't), I could certainly unlock anything my code locked.

However, the kernel did something before my code was called.
Therefore, I have no way of undoing it except by calling
unlock_kernel().

Is there some other way?

Cheers,
Dick Johnson
* Re: More linux-2.6.9 module problems
From: linux-os @ 2004-11-10 0:10 UTC
To: Mike Waychison; +Cc: Linux kernel

On Tue, 9 Nov 2004, linux-os wrote:
> However, the kernel did something before my code was called.
> Therefore, I have no way of undoing it except by calling
> unlock_kernel().
>
> Is there some other way?

I experimented with:

    release_kernel_lock(current);
    do_ioctl();
    reacquire_kernel_lock(current);

The results were truly spectacular crashes when a copy_to_user
happens in the ioctl(), returning the results. The starting
error is:

    sleeping function called from invalid context at arch/i386/lib/usercopy.c:599

It says in_atomic():1, irqs_disabled():0 so something makes
__might_sleep() think that it's "in_atomic". Looking at that,
I see where !kernel_locked() is going to cause problems in
../include/linux/hardirq.h if we've been preempted.

Then come a few hundred thousand lines of unrelated stuff:
smp_apic_timer_interrupt, etc.

FYI, there are no spin-locks and no semaphores in the ioctl()
code, and it all works if I don't muck with the kernel lock.

So maybe I can't do copy_to_user unless the kernel lock is
held? Seems strange.

Cheers,
Dick Johnson
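One possible reading of the in_atomic():1 report above can be modelled in user space. The sketch assumes, as best I recall and not as something confirmed in the thread, that a preemptible 2.6-era kernel computes in_atomic() by comparing the preempt count (minus the PREEMPT_ACTIVE bit) with kernel_locked(), that kernel_locked() just tests the task's lock_depth, and that release_kernel_lock() gives up the lock without touching lock_depth because it is meant for the scheduler's own use. Every name below is a hypothetical model of that behaviour, not kernel code, and the MODEL_PREEMPT_ACTIVE value is a placeholder.

```c
#include <stdbool.h>

#define MODEL_PREEMPT_ACTIVE 0x10000000   /* placeholder bit, assumption */

/* Only the two per-task fields the check depends on. */
struct task_model {
    int preempt_count;   /* raised while the BKL spinlock is held   */
    int lock_depth;      /* -1 means "this task does not hold BKL"  */
};

bool model_kernel_locked(const struct task_model *t)
{
    return t->lock_depth >= 0;
}

/* Assumed form of the check: holding the BKL accounts for exactly one
 * preempt count, so anything else means "atomic". */
bool model_in_atomic(const struct task_model *t)
{
    return (t->preempt_count & ~MODEL_PREEMPT_ACTIVE)
           != (model_kernel_locked(t) ? 1 : 0);
}

/* lock_kernel()/unlock_kernel(): keep both fields in step. */
void model_lock_kernel(struct task_model *t)
{
    if (++t->lock_depth == 0)
        t->preempt_count++;
}

void model_unlock_kernel(struct task_model *t)
{
    if (--t->lock_depth < 0)
        t->preempt_count--;
}

/* Scheduler-internal pair: drops and retakes the lock but leaves
 * lock_depth alone, so the two fields disagree in between. */
void model_release_kernel_lock(struct task_model *t)
{
    if (t->lock_depth >= 0)
        t->preempt_count--;
}

void model_reacquire_kernel_lock(struct task_model *t)
{
    if (t->lock_depth >= 0)
        t->preempt_count++;
}
```

If those assumptions hold, the model says in_atomic() is false with the BKL held and false after unlock_kernel(), but true between release_kernel_lock() and reacquire_kernel_lock(), which would match a sleeping-function warning from copy_to_user in that window.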
Thread overview: 8+ messages (end of thread, newest: 2004-11-10 0:14 UTC)

2004-11-08 16:50 More linux-2.6.9 module problems linux-os
2004-11-09 19:51 ` Mike Waychison
2004-11-09 20:25   ` linux-os
2004-11-09 21:43     ` Mike Waychison
2004-11-09 22:17       ` linux-os
2004-11-09 22:32         ` Mike Waychison
2004-11-09 23:11           ` linux-os
2004-11-10  0:10             ` linux-os