* nvidia bug or RT bug?
From: jordan @ 2012-05-07 1:17 UTC
To: linux-rt-users

Hey everyone.

I've been using linux-rt-3.0 series, which has been very stable using
the nvidia proprietary drivers (pretty much flawlessly, actually). I
had used rt-3.0 with nvidia all the way up to nvidia version 290.35. I
never experienced any problems relating to nvidia, at all... But for
external/other reasons, recently I have upgraded to rt-3.2 which also
seems to be working quite well. At the same time, i also upgraded my
nvidia driver to the latest available driver, which is 302.07 (beta).

I know many of the RT devs aren't huge fans of the nvidia blob, but i
would like to know whether this is an nvidia bug or rt bug;

[143335.564097] BUG: scheduling while atomic: irq/19-nvidia/1141/0x00000002
[143335.564099] Modules linked in: snd_usb_audio ipv6 snd_seq_midi snd_seq_midi_event snd_seq_dummy snd_hrtimer snd_seq joydev wacom(O) vmnet(O) fuse vsock(O) vmci(O) vmmon(O) bnep bluetooth rfkill hwmon_vid snd_usb_us122l snd_usbmidi_lib snd_rawmidi snd_seq_device snd_hda_codec_via snd_hda_codec snd_hwdep snd_pcm forcedeth evdev snd_page_alloc edac_mce_amd firewire_ohci edac_core firewire_core wmi k10temp crc_itu_t video asus_atk0110 psmouse serio_raw snd_timer snd soundcore button i2c_nforce2 processor nvidia(P) i2c_core ext4 crc16 jbd2 mbcache usbhid hid sd_mod pata_amd pata_acpi ahci uhci_hcd libahci ata_generic ohci_hcd libata ehci_hcd scsi_mod usbcore usb_common [last unloaded: snd_hda_codec_hdmi]
[143335.564127] Pid: 1141, comm: irq/19-nvidia Tainted: P O 3.2.16-rt27-1-rt #1
[143335.564128] Call Trace:
[143335.564134] [<ffffffff81415afa>] __schedule_bug+0x5f/0x63
[143335.564137] [<ffffffff8141bb92>] __schedule+0x842/0x8b0
[143335.564140] [<ffffffff8141dde0>] ? _raw_spin_unlock_irqrestore+0x10/0x40
[143335.564141] [<ffffffff8141de07>] ? _raw_spin_unlock_irqrestore+0x37/0x40
[143335.564144] [<ffffffff8109a755>] ? task_blocks_on_rt_mutex+0x1a5/0x210
[143335.564146] [<ffffffff8141c0ae>] schedule+0x2e/0xa0
[143335.564147] [<ffffffff8141d71c>] rt_spin_lock_slowlock+0x114/0x1f8
[143335.564149] [<ffffffff8141d996>] rt_spin_lock+0x26/0x30
[143335.564151] [<ffffffff8105a4a6>] __wake_up+0x36/0x70
[143335.564259] [<ffffffffa0837783>] nv_post_event+0xe3/0x120 [nvidia]
[143335.564297] [<ffffffffa080c8ce>] _nv014670rm+0xe8/0x141 [nvidia]
[143335.564367] [<ffffffffa04d3928>] ? _nv006157rm+0xc0/0xe9 [nvidia]
[143335.564436] [<ffffffffa04d3be2>] ? _nv013751rm+0xdf/0xf8 [nvidia]
[143335.564504] [<ffffffffa04d3b62>] ? _nv013751rm+0x5f/0xf8 [nvidia]
[143335.564581] [<ffffffffa0642bdf>] ? _nv010689rm+0xece/0x10ec [nvidia]
[143335.564657] [<ffffffffa0642bf8>] ? _nv010689rm+0xee7/0x10ec [nvidia]
[143335.564714] [<ffffffffa06b378b>] ? _nv013099rm+0x8b8/0xc74 [nvidia]
[143335.564770] [<ffffffffa06bc932>] ? _nv013078rm+0xf2/0x137 [nvidia]
[143335.564805] [<ffffffffa08106b3>] ? _nv001098rm+0x143/0x1c2 [nvidia]
[143335.564840] [<ffffffffa08172d7>] ? rm_isr_bh+0x23/0x66 [nvidia]
[143335.564874] [<ffffffffa0835532>] ? nv_kern_isr_bh+0x42/0x70 [nvidia]
[143335.564876] [<ffffffff81068f11>] ? __tasklet_action.isra.9+0x71/0x150
[143335.564878] [<ffffffff8106909e>] ? tasklet_action+0x5e/0x60
[143335.564880] [<ffffffff8106826b>] ? __do_softirq_common+0xcb/0x240
[143335.564882] [<ffffffff81069360>] ? __do_softirq+0x10/0x20
[143335.564883] [<ffffffff810694ad>] ? local_bh_enable+0x13d/0x160
[143335.564885] [<ffffffff810c657b>] ? irq_forced_thread_fn+0x4b/0x70
[143335.564887] [<ffffffff810c6438>] ? irq_thread+0x158/0x200
[143335.564888] [<ffffffff810c6530>] ? irq_thread_fn+0x50/0x50
[143335.564889] [<ffffffff810c62e0>] ? irq_finalize_oneshot+0x110/0x110
[143335.564891] [<ffffffff8108414c>] ? kthread+0x8c/0xa0
[143335.564893] [<ffffffff81050449>] ? finish_task_switch+0x49/0xf0
[143335.564896] [<ffffffff81420b34>] ? kernel_thread_helper+0x4/0x10
[143335.564898] [<ffffffff810840c0>] ? __init_kthread_worker+0x60/0x60
[143335.564899] [<ffffffff81420b30>] ? gs_change+0x13/0x13
[ninez@ninez ~]$

if this is an nvidia problem (which i assume it is), i would like to
report it to them (and obviously, if it is an rt related bug - i would
like to report it here).

I am using an rt-patch for the nvidia driver. I simply adapted an
existing package for my distro (Archlinux). the patch is available
here, in case it matters;

https://aur.archlinux.org/packages.php?ID=12132

the patch can be viewed by downloading the 'tarball'. I only mention
the patch in case it matters.

any other info you may need, just ask and i will do my best to provide it.

thank you.

jordan
* Re: nvidia bug or RT bug?
From: Steven Rostedt @ 2012-05-08 15:15 UTC
To: jordan; +Cc: linux-rt-users

On Sun, 2012-05-06 at 21:17 -0400, jordan wrote:
> Hey everyone.
>
> I've been using linux-rt-3.0 series, which has been very stable using
> the nvidia proprietary drivers (pretty much flawlessly, actually). I
> had used rt-3.0 with nvidia all the way up to nvidia version 290.35. I
> never experienced any problems relating to nvidia, at all... But for
> external/other reasons, recently I have upgraded to rt-3.2 which also
> seems to be working quite well. At the same time, i also upgraded my
> nvidia driver to the latest available driver, which is 302.07 (beta).
>
> I know many of the RT devs aren't huge fans of the nvidia blob, but i
> would like to know whether this is an nvidia bug or rt bug;
>
> [143335.564097] BUG: scheduling while atomic: irq/19-nvidia/1141/0x00000002

[..]

>
> if this is an nvidia problem (which i assume it is), i would like to
> report it to them (and obviously, if it is an rt related bug - i would
> like to report it here).

It's both, but we don't care. Sorry. Seems that the nvidia driver is not
compatible with some of the changes that -rt has done. One is that you
can not call spinlocks after disabling preemption. If the nvidia driver
does this, it will break.

>
> I am using an rt-patch for the nvidia driver. I simply adapted an
> existing package for my distro (Archlinux). the patch is available
> here, in case it matters;

Bring up the bug with the nvidia rt patch maintainer. That's about all
I'll say on this matter.

-- Steve
* Re: nvidia bug or RT bug?
From: jordan @ 2012-05-08 15:46 UTC
To: Steven Rostedt; +Cc: linux-rt-users

Thanks for the reply, Steven. (i didn't even think i would get that)

> It's both, but we don't care. Sorry. Seems that the nvidia driver is not
> compatible with some of the changes that -rt has done. One is that you
> can not call spinlocks after disabling preemption. If the nvidia driver
> does this, it will break.

I see. I know how to reproduce this issue, and how to keep it from
affecting me, anyway - so it's not a huge deal. It's only happened
twice; in both circumstances i was using VDPAU with Adobe flash
<sarcasm>surprise, surprise!</sarcasm>, and in both circumstances my
uptime had been several days. ie: it doesn't happen very often, at all.

I understand why you don't care about nvidia, but if it is a bug in RT
- should it not be fixed? anyways, I will report it to nvidia once i
get a chance to. it would be nice if they addressed RT-related problems
in their driver, being as some folks do require proper/good 3d
acceleration / real performance / OpenGL 4.2 which (obviously) Nouveau
does not provide, nor is it anywhere near providing in the foreseeable
future :\ (not that this is news to anyone who has used them both).

> Bring up the bug with the nvidia rt patch maintainer. That's about all
> I'll say on this matter.

No problem. I know the author of the patch, anyway. I just wanted some
verification, rather than just making my own assumptions.

take care, and thanks again.

jordan
* Re: nvidia bug or RT bug?
From: Steven Rostedt @ 2012-05-08 15:58 UTC
To: jordan; +Cc: linux-rt-users

On Tue, 2012-05-08 at 11:46 -0400, jordan wrote:

> I understand why you don't care about nvidia,

Because it's a proprietary driver and I can't see some of the code nor
can I fix it.

> but if it is a bug in RT

It's not a bug in RT. It's a bug in how nvidia interacts with RT. When I
said it was both, I meant that it won't break in a vanilla kernel, but
only breaks when nvidia is used with RT. But that's because RT changes
things, and drivers must adapt to be part of RT.

> - should it not be fixed?

There's nothing to fix.

> anyways, I will report it to nvidia once i
> get a chance to. it would be nice if they addressed RT-related problems

That's up to nvidia to decide.

> in their driver, being as some folks do require proper/good 3d
> acceleration / real performance / OpenGL 4.2 which (obviously) Nouveau
> does not provide, nor is it anywhere near providing in the foreseeable
> future :\ (not that this is news to anyone who has used them both).
>
> > Bring up the bug with the nvidia rt patch maintainer. That's about all
> > I'll say on this matter.
>
> No problem. I know the author of the patch, anyway. I just wanted some
> verification, rather than just making my own assumptions.

Thanks,

-- Steve
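[Editor's sketch] The rule Steven describes (no ordinary spinlocks after preemption has been disabled on -rt) can be illustrated with a toy userspace model in C. This is not kernel code and makes no claim about what the nvidia driver actually does internally. Every toy_* name is hypothetical. The model only assumes what the posted trace suggests: on -rt, taking a contended spinlock may need to schedule (rt_spin_lock_slowlock calling schedule), and scheduling with a non-zero preempt count is reported as "scheduling while atomic".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy userspace model of the rule; NOT kernel code. All names are hypothetical. */

int toy_preempt_count = 0;  /* > 0 means "atomic": the task must not sleep */
int toy_bug_reports = 0;    /* how many times the toy check has fired */

void toy_preempt_disable(void)
{
    toy_preempt_count++;
}

void toy_preempt_enable(void)
{
    assert(toy_preempt_count > 0);  /* an unbalanced enable is a model bug */
    toy_preempt_count--;
}

/*
 * Models taking a contended spinlock on -rt, where the slow path has to
 * sleep. Returns true if sleeping was legal, false if the toy
 * "scheduling while atomic" check fired.
 */
bool toy_rt_spin_lock_contended(void)
{
    if (toy_preempt_count > 0) {
        toy_bug_reports++;
        printf("toy BUG: scheduling while atomic, count=0x%08x\n",
               (unsigned int)toy_preempt_count);
        return false;
    }
    return true;
}

/* Pattern that breaks: sleeping lock taken inside a preempt-disabled region. */
bool toy_wake_up_while_atomic(void)
{
    bool ok;

    toy_preempt_disable();
    ok = toy_rt_spin_lock_contended();
    toy_preempt_enable();
    return ok;
}

/* Pattern that is fine: the same lock taken with preemption enabled. */
bool toy_wake_up_preemptible(void)
{
    return toy_rt_spin_lock_contended();
}
```

Calling toy_wake_up_preemptible() succeeds, while toy_wake_up_while_atomic() trips the check. That is the shape of the failure in the trace, where __wake_up reaches rt_spin_lock from a context that was already atomic.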
* Re: nvidia bug or RT bug?
From: Glenn Elliott @ 2012-06-05 23:32 UTC
To: linux-rt-users

jordan <triplesquarednine <at> gmail.com> writes:

> I've been using linux-rt-3.0 series, which has been very stable using
> the nvidia proprietary drivers (pretty much flawlessly, actually). I
> had used rt-3.0 with nvidia all the way up to nvidia version 290.35. I
> never experienced any problems relating to nvidia, at all... But for
> external/other reasons, recently I have upgraded to rt-3.2 which also
> seems to be working quite well. At the same time, i also upgraded my
> nvidia driver to the latest available driver, which is 302.07 (beta).

I've used the 270 drivers with an older version of PREEMPT_RT. However,
I had to modify the GPL layer code to make the compilation/install go
through. Did you have to do the same?

In the GPL layer (which you can extract from the *.run driver package),
I had to edit kernel/nv-linux.h and update the NV_SPIN_*LOCK() to call
raw_spin_*lock(). (Using raw spin locks seemed like the only safe thing
to do, given the closed-source nature of the driver.)

I haven't looked at the 290 GPL layer, so I don't know if this
spinlock edit is still necessary.

Another interesting edit is to enable MSI interrupts.
In kernel/nv-reg.h, change the line
"NV_DEFINE_REG_ENTRY(__NV_ENABLE_MSI, 0);"
to
"NV_DEFINE_REG_ENTRY(__NV_ENABLE_MSI, 1);"

After your edits, simply do "make module; make install" from within the
kernel directory to install the custom driver to the currently running
kernel (you'll probably want to run the full installer first to pick up
the various shared libraries and X configurations).

I've found that the GPL layer code changes rarely, so I believe there
is a good chance that these edits will still be valid.

-Glenn
* Re: nvidia bug or RT bug?
From: Glenn Elliott @ 2012-06-06 0:03 UTC
To: linux-rt-users

Glenn Elliott <gelliott <at> cs.unc.edu> writes:

>
> jordan <triplesquarednine <at> gmail.com> writes:
>
> > I've been using linux-rt-3.0 series, which has been very stable using
> > the nvidia proprietary drivers (pretty much flawlessly, actually). I
> > had used rt-3.0 with nvidia all the way up to nvidia version 290.35. I
> > never experienced any problems relating to nvidia, at all... But for
> > external/other reasons, recently I have upgraded to rt-3.2 which also
> > seems to be working quite well. At the same time, i also upgraded my
> > nvidia driver to the latest available driver, which is 302.07 (beta).
>
> I've used the 270 drivers with an older version of PREEMPT_RT. However,
> I had to modify the GPL layer code to make the compilation/install go
> through. Did you have to do the same?
>
> In the GPL layer (which you can extract from the *.run driver package),
> I had to edit kernel/nv-linux.h and update the NV_SPIN_*LOCK() to call
> raw_spin_*lock(). (Using raw spin locks seemed like the only safe thing
> to do, given the closed-source nature of the driver.)
>
> I haven't looked at the 290 GPL layer, so I don't know if this
> spinlock edit is still necessary.
>
> Another interesting edit is to enable MSI interrupts.
> In kernel/nv-reg.h, change the line
> "NV_DEFINE_REG_ENTRY(__NV_ENABLE_MSI, 0);"
> to
> "NV_DEFINE_REG_ENTRY(__NV_ENABLE_MSI, 1);"
>
> After your edits, simply do "make module; make install" from within the
> kernel directory to install the custom driver to the currently running
> kernel (you'll probably want to run the full installer first to pick up
> the various shared libraries and X configurations).
>
> I've found that the GPL layer code changes rarely, so I believe there
> is a good chance that these edits will still be valid.
>
> -Glenn

Here is a patch carried forward for 302.06.03 (the beta driver I
downloaded with CUDA 5.0 preview a few weeks ago). The edits were
the same for the 270 driver. This has worked for me; maybe it will
work for you.

diff -rupN NVIDIA-Linux-x86_64-302.06.03/kernel/nv-linux.h NVIDIA-Linux-x86_64-302.06.03.rawspin.msi/kernel/nv-linux.h
--- NVIDIA-Linux-x86_64-302.06.03/kernel/nv-linux.h 2012-05-03 21:19:21.000000000 -0400
+++ NVIDIA-Linux-x86_64-302.06.03.rawspin.msi/kernel/nv-linux.h 2012-06-05 19:44:01.642339831 -0400
@@ -291,28 +291,15 @@ extern int nv_pat_mode;
 #endif
 #endif
-#if defined(CONFIG_PREEMPT_RT)
-typedef atomic_spinlock_t nv_spinlock_t;
-#define NV_SPIN_LOCK_INIT(lock) atomic_spin_lock_init(lock)
-#define NV_SPIN_LOCK_IRQ(lock) atomic_spin_lock_irq(lock)
-#define NV_SPIN_UNLOCK_IRQ(lock) atomic_spin_unlock_irq(lock)
-#define NV_SPIN_LOCK_IRQSAVE(lock,flags) atomic_spin_lock_irqsave(lock,flags)
-#define NV_SPIN_UNLOCK_IRQRESTORE(lock,flags) \
- atomic_spin_unlock_irqrestore(lock,flags)
-#define NV_SPIN_LOCK(lock) atomic_spin_lock(lock)
-#define NV_SPIN_UNLOCK(lock) atomic_spin_unlock(lock)
-#define NV_SPIN_UNLOCK_WAIT(lock) atomic_spin_unlock_wait(lock)
-#else
-typedef spinlock_t nv_spinlock_t;
-#define NV_SPIN_LOCK_INIT(lock) spin_lock_init(lock)
-#define NV_SPIN_LOCK_IRQ(lock) spin_lock_irq(lock)
-#define NV_SPIN_UNLOCK_IRQ(lock) spin_unlock_irq(lock)
-#define NV_SPIN_LOCK_IRQSAVE(lock,flags) spin_lock_irqsave(lock,flags)
-#define NV_SPIN_UNLOCK_IRQRESTORE(lock,flags) spin_unlock_irqrestore(lock,flags)
-#define NV_SPIN_LOCK(lock) spin_lock(lock)
-#define NV_SPIN_UNLOCK(lock) spin_unlock(lock)
-#define NV_SPIN_UNLOCK_WAIT(lock) spin_unlock_wait(lock)
-#endif
+typedef raw_spinlock_t nv_spinlock_t;
+#define NV_SPIN_LOCK_INIT(lock) raw_spin_lock_init(lock)
+#define NV_SPIN_LOCK_IRQ(lock) raw_spin_lock_irq(lock)
+#define NV_SPIN_UNLOCK_IRQ(lock) raw_spin_unlock_irq(lock)
+#define NV_SPIN_LOCK_IRQSAVE(lock,flags) raw_spin_lock_irqsave(lock,flags)
+#define NV_SPIN_UNLOCK_IRQRESTORE(lock,flags) raw_spin_unlock_irqrestore(lock,flags)
+#define NV_SPIN_LOCK(lock) raw_spin_lock(lock)
+#define NV_SPIN_UNLOCK(lock) raw_spin_unlock(lock)
+#define NV_SPIN_UNLOCK_WAIT(lock) raw_spin_unlock_wait(lock)
 #if defined(NVCPU_X86)
 #ifndef write_cr4
@@ -954,9 +941,6 @@ static inline int nv_execute_on_all_cpus
 return ret;
 }
-#if defined(CONFIG_PREEMPT_RT)
-#define NV_INIT_MUTEX(mutex) semaphore_init(mutex)
-#else
 #if !defined(__SEMAPHORE_INITIALIZER) && defined(__COMPAT_SEMAPHORE_INITIALIZER)
 #define __SEMAPHORE_INITIALIZER __COMPAT_SEMAPHORE_INITIALIZER
 #endif
@@ -966,7 +950,6 @@ static inline int nv_execute_on_all_cpus
 __SEMAPHORE_INITIALIZER(*(mutex), 1); \
 *(mutex) = __mutex; \
 }
-#endif
 #if defined (KERNEL_2_4)
 # define NV_IS_SUSER() suser()
diff -rupN NVIDIA-Linux-x86_64-302.06.03/kernel/nv-reg.h NVIDIA-Linux-x86_64-302.06.03.rawspin.msi/kernel/nv-reg.h
--- NVIDIA-Linux-x86_64-302.06.03/kernel/nv-reg.h 2012-05-03 21:19:21.000000000 -0400
+++ NVIDIA-Linux-x86_64-302.06.03.rawspin.msi/kernel/nv-reg.h 2012-06-05 19:41:47.338336880 -0400
@@ -607,7 +607,7 @@ NV_DEFINE_REG_ENTRY(__NV_INITIALIZE_SYST
 NV_DEFINE_REG_ENTRY(__NV_USE_VBIOS, 1);
 NV_DEFINE_REG_ENTRY(__NV_RM_EDGE_INTR_CHECK, 1);
 NV_DEFINE_REG_ENTRY(__NV_USE_PAGE_ATTRIBUTE_TABLE, ~0);
-NV_DEFINE_REG_ENTRY(__NV_ENABLE_MSI, 0);
+NV_DEFINE_REG_ENTRY(__NV_ENABLE_MSI, 1);
 NV_DEFINE_REG_ENTRY(__NV_MAP_REGISTERS_EARLY, 0);
 NV_DEFINE_REG_ENTRY(__NV_REGISTER_FOR_ACPI_EVENTS, 1);

I am a little uncertain about my removal of "NV_INIT_MUTEX(mutex)
semaphore_init(mutex)". I can't remember what my rationale was.
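[Editor's sketch] The patch above is small because, as the diff shows, nv-linux.h reaches the kernel's locks only through the nv_spinlock_t typedef and the NV_SPIN_* macros, so retargeting those few lines changes every call site at once. The userspace C analogy below shows that indirection pattern. It is an assumption-laden illustration, not driver code: the toy_* and TOY_* names are hypothetical, and a C11 atomic_flag stands in for the kernel lock type.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace analogy only; toy_* and TOY_* names are hypothetical. */

/* Single point of indirection: retarget these lines and all callers follow. */
typedef atomic_flag toy_spinlock_t;
#define TOY_SPIN_LOCK_INIT(lock) atomic_flag_clear(lock)
#define TOY_SPIN_LOCK(lock) \
    do { while (atomic_flag_test_and_set(lock)) { /* spin */ } } while (0)
#define TOY_SPIN_UNLOCK(lock) atomic_flag_clear(lock)

struct toy_counter {
    toy_spinlock_t lock;
    long value;
};

void toy_counter_init(struct toy_counter *c)
{
    assert(c != NULL);
    TOY_SPIN_LOCK_INIT(&c->lock);
    c->value = 0;
}

/* Callers use only the TOY_SPIN_* names, never the underlying primitive. */
long toy_counter_add(struct toy_counter *c, long delta)
{
    long result;

    TOY_SPIN_LOCK(&c->lock);
    c->value += delta;
    result = c->value;
    TOY_SPIN_UNLOCK(&c->lock);
    return result;
}
```

Swapping the lock implementation here means editing the typedef and three macros, with toy_counter_add() left untouched. The patch does the same with nv_spinlock_t and the NV_SPIN_* macros, dropping the CONFIG_PREEMPT_RT conditional and mapping everything to raw_spin_*.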
* Re: nvidia bug or RT bug?
From: jordan @ 2012-06-06 0:25 UTC
To: Glenn Elliott; +Cc: linux-rt-users

> Here is a patch carried forward for 302.06.03 (the beta driver I
> downloaded with CUDA 5.0 preview a few weeks ago). They were
> the same for the 270 driver. This has worked for me; maybe it will
> work for you.

I've actually got 302.11 running, nicely. I haven't had the crash i
originally posted about in a while (that doesn't mean it was fixed,
necessarily).

here is a link (pastebin) to the patch that i am using;

http://pastebin.com/CXWtt3E1

do you see any problems, or something that may be missing? (aside from
the MSI stuff?) I ask because i suppose it is possible that nvidia has
changed or added something; the patch that i use is for the 290 series,
and i just adapted the PKGBUILD (build-script) to use the latest version.

So if you see any issues let me know, and i could forward that
information to the packager who is maintaining this patch.

> I am a little uncertain about my removal of "NV_INIT_MUTEX(mutex)
> semaphore_init(mutex)". I can't remember what my rationale was.

I have no idea. ;)

cheerz

Jordan