* nvidia bug or RT bug?
@ 2012-05-07  1:17 jordan
  2012-05-08 15:15 ` Steven Rostedt
  2012-06-05 23:32 ` Glenn Elliott
  0 siblings, 2 replies; 7+ messages in thread
From: jordan @ 2012-05-07  1:17 UTC (permalink / raw)
  To: linux-rt-users

Hey everyone.

I've been using the linux-rt-3.0 series, which has been very stable using
the nvidia proprietary drivers (pretty much flawlessly, actually). I
had used rt-3.0 with nvidia all the way up to nvidia version 290.35. I
never experienced any problems relating to nvidia at all. But for
external/other reasons, I recently upgraded to rt-3.2, which also
seems to be working quite well. At the same time, I also upgraded my
nvidia driver to the latest available driver, which is 302.07 (beta).

I know many of the RT devs aren't huge fans of the nvidia blob, but I
would like to know whether this is an nvidia bug or an RT bug:

[143335.564097] BUG: scheduling while atomic: irq/19-nvidia/1141/0x00000002
[143335.564099] Modules linked in: snd_usb_audio ipv6 snd_seq_midi
snd_seq_midi_event snd_seq_dummy snd_hrtimer snd_seq joydev wacom(O)
vmnet(O) fuse vsock(O) vmci(O) vmmon(O) bnep bluetooth rfkill
hwmon_vid snd_usb_us122l snd_usbmidi_lib snd_rawmidi snd_seq_device
snd_hda_codec_via snd_hda_codec snd_hwdep snd_pcm forcedeth evdev
snd_page_alloc edac_mce_amd firewire_ohci edac_core firewire_core wmi
k10temp crc_itu_t video asus_atk0110 psmouse serio_raw snd_timer snd
soundcore button i2c_nforce2 processor nvidia(P) i2c_core ext4 crc16
jbd2 mbcache usbhid hid sd_mod pata_amd pata_acpi ahci uhci_hcd
libahci ata_generic ohci_hcd libata ehci_hcd scsi_mod usbcore
usb_common [last unloaded: snd_hda_codec_hdmi]
[143335.564127] Pid: 1141, comm: irq/19-nvidia Tainted: P           O 3.2.16-rt27-1-rt #1
[143335.564128] Call Trace:
[143335.564134]  [<ffffffff81415afa>] __schedule_bug+0x5f/0x63
[143335.564137]  [<ffffffff8141bb92>] __schedule+0x842/0x8b0
[143335.564140]  [<ffffffff8141dde0>] ? _raw_spin_unlock_irqrestore+0x10/0x40
[143335.564141]  [<ffffffff8141de07>] ? _raw_spin_unlock_irqrestore+0x37/0x40
[143335.564144]  [<ffffffff8109a755>] ? task_blocks_on_rt_mutex+0x1a5/0x210
[143335.564146]  [<ffffffff8141c0ae>] schedule+0x2e/0xa0
[143335.564147]  [<ffffffff8141d71c>] rt_spin_lock_slowlock+0x114/0x1f8
[143335.564149]  [<ffffffff8141d996>] rt_spin_lock+0x26/0x30
[143335.564151]  [<ffffffff8105a4a6>] __wake_up+0x36/0x70
[143335.564259]  [<ffffffffa0837783>] nv_post_event+0xe3/0x120 [nvidia]
[143335.564297]  [<ffffffffa080c8ce>] _nv014670rm+0xe8/0x141 [nvidia]
[143335.564367]  [<ffffffffa04d3928>] ? _nv006157rm+0xc0/0xe9 [nvidia]
[143335.564436]  [<ffffffffa04d3be2>] ? _nv013751rm+0xdf/0xf8 [nvidia]
[143335.564504]  [<ffffffffa04d3b62>] ? _nv013751rm+0x5f/0xf8 [nvidia]
[143335.564581]  [<ffffffffa0642bdf>] ? _nv010689rm+0xece/0x10ec [nvidia]
[143335.564657]  [<ffffffffa0642bf8>] ? _nv010689rm+0xee7/0x10ec [nvidia]
[143335.564714]  [<ffffffffa06b378b>] ? _nv013099rm+0x8b8/0xc74 [nvidia]
[143335.564770]  [<ffffffffa06bc932>] ? _nv013078rm+0xf2/0x137 [nvidia]
[143335.564805]  [<ffffffffa08106b3>] ? _nv001098rm+0x143/0x1c2 [nvidia]
[143335.564840]  [<ffffffffa08172d7>] ? rm_isr_bh+0x23/0x66 [nvidia]
[143335.564874]  [<ffffffffa0835532>] ? nv_kern_isr_bh+0x42/0x70 [nvidia]
[143335.564876]  [<ffffffff81068f11>] ? __tasklet_action.isra.9+0x71/0x150
[143335.564878]  [<ffffffff8106909e>] ? tasklet_action+0x5e/0x60
[143335.564880]  [<ffffffff8106826b>] ? __do_softirq_common+0xcb/0x240
[143335.564882]  [<ffffffff81069360>] ? __do_softirq+0x10/0x20
[143335.564883]  [<ffffffff810694ad>] ? local_bh_enable+0x13d/0x160
[143335.564885]  [<ffffffff810c657b>] ? irq_forced_thread_fn+0x4b/0x70
[143335.564887]  [<ffffffff810c6438>] ? irq_thread+0x158/0x200
[143335.564888]  [<ffffffff810c6530>] ? irq_thread_fn+0x50/0x50
[143335.564889]  [<ffffffff810c62e0>] ? irq_finalize_oneshot+0x110/0x110
[143335.564891]  [<ffffffff8108414c>] ? kthread+0x8c/0xa0
[143335.564893]  [<ffffffff81050449>] ? finish_task_switch+0x49/0xf0
[143335.564896]  [<ffffffff81420b34>] ? kernel_thread_helper+0x4/0x10
[143335.564898]  [<ffffffff810840c0>] ? __init_kthread_worker+0x60/0x60
[143335.564899]  [<ffffffff81420b30>] ? gs_change+0x13/0x13

If this is an nvidia problem (which I assume it is), I would like to
report it to them (and obviously, if it is an RT-related bug, I would
like to report it here).

I am using an RT patch for the nvidia driver. I simply adapted an
existing package for my distro (Arch Linux). The patch is available
here, in case it matters:

https://aur.archlinux.org/packages.php?ID=12132

The patch can be viewed by downloading the 'tarball'.

Any other info you may need, just ask and I will do my best to provide it.

Thank you.

jordan


* Re: nvidia bug or RT bug?
  2012-05-07  1:17 nvidia bug or RT bug? jordan
@ 2012-05-08 15:15 ` Steven Rostedt
  2012-05-08 15:46   ` jordan
  2012-06-05 23:32 ` Glenn Elliott
  1 sibling, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2012-05-08 15:15 UTC (permalink / raw)
  To: jordan; +Cc: linux-rt-users

On Sun, 2012-05-06 at 21:17 -0400, jordan wrote:
> Hey everyone.
> 
> I've been using the linux-rt-3.0 series, which has been very stable using
> the nvidia proprietary drivers (pretty much flawlessly, actually). I
> had used rt-3.0 with nvidia all the way up to nvidia version 290.35. I
> never experienced any problems relating to nvidia at all. But for
> external/other reasons, I recently upgraded to rt-3.2, which also
> seems to be working quite well. At the same time, I also upgraded my
> nvidia driver to the latest available driver, which is 302.07 (beta).
> 
> I know many of the RT devs aren't huge fans of the nvidia blob, but I
> would like to know whether this is an nvidia bug or an RT bug:
> 
> [143335.564097] BUG: scheduling while atomic: irq/19-nvidia/1141/0x00000002

[..]

> 
> If this is an nvidia problem (which I assume it is), I would like to
> report it to them (and obviously, if it is an RT-related bug, I would
> like to report it here).

It's both, but we don't care. Sorry. It seems that the nvidia driver is not
compatible with some of the changes that -rt has made. One is that you
cannot take spinlocks after disabling preemption. If the nvidia driver
does this, it will break.
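A minimal userspace sketch of the failure mode described above, under a deliberately simplified model: on PREEMPT_RT a contended spinlock_t blocks (the trace shows rt_spin_lock() going through rt_spin_lock_slowlock() into schedule()), while a raw_spinlock_t only busy-waits and never schedules. This is not kernel code; every model_* name is hypothetical and merely stands in for the real primitive, and the printed message is the model's own, not the kernel's exact output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy model of one CPU's state (not the kernel's structures). */
struct model_cpu {
    int preempt_count;  /* > 0 means atomic context: sleeping is forbidden */
    int schedule_bugs;  /* number of "scheduling while atomic" reports */
};

/* Hypothetical toy lock: raw locks spin, RT spinlock_t locks may sleep. */
struct model_lock {
    bool is_raw;        /* true models raw_spinlock_t, false models spinlock_t on RT */
    bool contended;     /* another task currently holds the lock */
};

static void model_preempt_disable(struct model_cpu *cpu) { cpu->preempt_count++; }
static void model_preempt_enable(struct model_cpu *cpu)  { cpu->preempt_count--; }

/* Models the scheduler's sanity check: blocking while atomic is a bug. */
static void model_schedule(struct model_cpu *cpu)
{
    if (cpu->preempt_count != 0) {
        cpu->schedule_bugs++;
        printf("model: scheduling while atomic (preempt_count=%d)\n",
               cpu->preempt_count);
    }
}

/* A contended RT spinlock_t blocks, i.e. schedules; a raw lock just spins. */
static void model_spin_lock(struct model_cpu *cpu, const struct model_lock *lock)
{
    if (lock->contended && !lock->is_raw)
        model_schedule(cpu);
}

/* Runs one contended acquisition and returns how many bugs were reported. */
int model_run(bool preempt_off, bool use_raw_lock)
{
    struct model_cpu cpu = { 0, 0 };
    struct model_lock lock = { use_raw_lock, true };

    if (preempt_off)
        model_preempt_disable(&cpu);
    model_spin_lock(&cpu, &lock);
    if (preempt_off)
        model_preempt_enable(&cpu);
    return cpu.schedule_bugs;
}
```

In this model, model_run(true, false) reports one bug, while model_run(true, true) and model_run(false, false) report none. Note that in the trace above the sleeping lock is the one taken inside __wake_up(), so preemption had already been disabled somewhere further up that call chain; making only the driver's own locks raw would not, by itself, make a call like that safe from atomic context.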

> 
> I am using an RT patch for the nvidia driver. I simply adapted an
> existing package for my distro (Arch Linux). The patch is available
> here, in case it matters:

Bring up the bug with the nvidia rt patch maintainer. That's about all
I'll say on this matter.

-- Steve





* Re: nvidia bug or RT bug?
  2012-05-08 15:15 ` Steven Rostedt
@ 2012-05-08 15:46   ` jordan
  2012-05-08 15:58     ` Steven Rostedt
  0 siblings, 1 reply; 7+ messages in thread
From: jordan @ 2012-05-08 15:46 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-rt-users

Thanks for the reply, Steven. (I didn't even think I would get one.)

> It's both, but we don't care. Sorry. It seems that the nvidia driver is not
> compatible with some of the changes that -rt has made. One is that you
> cannot take spinlocks after disabling preemption. If the nvidia driver
> does this, it will break.

I see. I know how to reproduce this issue, and how to completely avoid
it affecting me anyway, so it's not a huge deal. It has only
happened twice; in both cases I was using VDPAU with Adobe
Flash <sarcasm>surprise, surprise!</sarcasm>, and in both
cases my uptime had been several days, i.e. it doesn't happen
very often at all.

I understand why you don't care about nvidia, but if it is a bug in RT
- should it not be fixed? Anyway, I will report it to nvidia once I
get a chance to. It would be nice if they addressed RT-related problems
in their driver, since some folks do require proper/good 3D
acceleration / real performance / OpenGL 4.2, which (obviously) Nouveau
does not provide, nor is it anywhere near providing in the foreseeable
future :\ (not that this is news to anyone who has used them both).

> Bring up the bug with the nvidia rt patch maintainer. That's about all
> I'll say on this matter.

No problem. I know the author of the patch, anyway. I just wanted some
verification, rather than just making my own assumptions.

take care, and thanks again.

jordan


* Re: nvidia bug or RT bug?
  2012-05-08 15:46   ` jordan
@ 2012-05-08 15:58     ` Steven Rostedt
  0 siblings, 0 replies; 7+ messages in thread
From: Steven Rostedt @ 2012-05-08 15:58 UTC (permalink / raw)
  To: jordan; +Cc: linux-rt-users

On Tue, 2012-05-08 at 11:46 -0400, jordan wrote:

> I understand why you don't care about nvidia,

Because it's a proprietary driver and I can't see some of the code nor
can I fix it.


>  but if it is a bug in RT

It's not a bug in RT. It's a bug in how nvidia interacts with RT. When I
said it was both, I meant that it won't break in the vanilla kernel, but only
breaks when nvidia is used with RT. But that's because RT changes
things, and drivers must adapt to be part of RT.

> - should it not be fixed?

There's nothing to fix.


>  Anyway, I will report it to nvidia once I
> get a chance to. It would be nice if they addressed RT-related problems

That's up to nvidia to decide.

> in their driver, since some folks do require proper/good 3D
> acceleration / real performance / OpenGL 4.2, which (obviously) Nouveau
> does not provide, nor is it anywhere near providing in the foreseeable
> future :\ (not that this is news to anyone who has used them both).
> 
> > Bring up the bug with the nvidia rt patch maintainer. That's about all
> > I'll say on this matter.
> 
> No problem. I know the author of the patch, anyway. I just wanted some
> verification, rather than just making my own assumptions.

Thanks,

-- Steve




* Re: nvidia bug or RT bug?
  2012-05-07  1:17 nvidia bug or RT bug? jordan
  2012-05-08 15:15 ` Steven Rostedt
@ 2012-06-05 23:32 ` Glenn Elliott
  2012-06-06  0:03   ` Glenn Elliott
  1 sibling, 1 reply; 7+ messages in thread
From: Glenn Elliott @ 2012-06-05 23:32 UTC (permalink / raw)
  To: linux-rt-users

jordan <triplesquarednine <at> gmail.com> writes:

> I've been using the linux-rt-3.0 series, which has been very stable using
> the nvidia proprietary drivers (pretty much flawlessly, actually). I
> had used rt-3.0 with nvidia all the way up to nvidia version 290.35. I
> never experienced any problems relating to nvidia at all. But for
> external/other reasons, I recently upgraded to rt-3.2, which also
> seems to be working quite well. At the same time, I also upgraded my
> nvidia driver to the latest available driver, which is 302.07 (beta).

I've used the 270 drivers with an older version of PREEMPT_RT.  However,
I had to modify the GPL layer code to make the compilation/install go
through.  Did you have to do the same?

In the GPL layer (which you can extract from the *.run driver package),
I had to edit kernel/nv-linux.h and update the NV_SPIN_*LOCK() macros to
call raw_spin_*lock(). (Using raw spin locks seemed like the only safe thing
to do, given the closed-source nature of the driver.)

I haven't looked at the 290 GPL layer, so I don't know if this
spinlock edit is still necessary.

Another interesting edit is to enable MSI interrupts.
In kernel/nv-reg.h, change the line 
"NV_DEFINE_REG_ENTRY(__NV_ENABLE_MSI, 0);"
to
"NV_DEFINE_REG_ENTRY(__NV_ENABLE_MSI, 1);"

After your edits, simply run "make module; make install" from within the
kernel directory to install the custom driver for the currently running
kernel (you'll probably want to run the full installer first to pick up
the various shared libraries and X configurations).

I've found that the GPL layer code changes rarely, so I believe there
is a good chance that these edits will still be valid.

-Glenn



* Re: nvidia bug or RT bug?
  2012-06-05 23:32 ` Glenn Elliott
@ 2012-06-06  0:03   ` Glenn Elliott
  2012-06-06  0:25     ` jordan
  0 siblings, 1 reply; 7+ messages in thread
From: Glenn Elliott @ 2012-06-06  0:03 UTC (permalink / raw)
  To: linux-rt-users

Glenn Elliott <gelliott <at> cs.unc.edu> writes:

[..]


Here is a patch carried forward for 302.06.03 (the beta driver I
downloaded with the CUDA 5.0 preview a few weeks ago). The edits were
the same for the 270 driver. This has worked for me; maybe it will
work for you.

diff -rupN NVIDIA-Linux-x86_64-302.06.03/kernel/nv-linux.h NVIDIA-Linux-x86_64-302.06.03.rawspin.msi/kernel/nv-linux.h
--- NVIDIA-Linux-x86_64-302.06.03/kernel/nv-linux.h	2012-05-03 21:19:21.000000000 -0400
+++ NVIDIA-Linux-x86_64-302.06.03.rawspin.msi/kernel/nv-linux.h	2012-06-05 19:44:01.642339831 -0400
@@ -291,28 +291,15 @@ extern int nv_pat_mode;
 #endif
 #endif
 
-#if defined(CONFIG_PREEMPT_RT)
-typedef atomic_spinlock_t         nv_spinlock_t;
-#define NV_SPIN_LOCK_INIT(lock)   atomic_spin_lock_init(lock)
-#define NV_SPIN_LOCK_IRQ(lock)    atomic_spin_lock_irq(lock)
-#define NV_SPIN_UNLOCK_IRQ(lock)  atomic_spin_unlock_irq(lock)
-#define NV_SPIN_LOCK_IRQSAVE(lock,flags) atomic_spin_lock_irqsave(lock,flags)
-#define NV_SPIN_UNLOCK_IRQRESTORE(lock,flags) \
-  atomic_spin_unlock_irqrestore(lock,flags)
-#define NV_SPIN_LOCK(lock)        atomic_spin_lock(lock)
-#define NV_SPIN_UNLOCK(lock)      atomic_spin_unlock(lock)
-#define NV_SPIN_UNLOCK_WAIT(lock) atomic_spin_unlock_wait(lock)
-#else
-typedef spinlock_t                nv_spinlock_t;
-#define NV_SPIN_LOCK_INIT(lock)   spin_lock_init(lock)
-#define NV_SPIN_LOCK_IRQ(lock)    spin_lock_irq(lock)
-#define NV_SPIN_UNLOCK_IRQ(lock)  spin_unlock_irq(lock)
-#define NV_SPIN_LOCK_IRQSAVE(lock,flags) spin_lock_irqsave(lock,flags)
-#define NV_SPIN_UNLOCK_IRQRESTORE(lock,flags) spin_unlock_irqrestore(lock,flags)
-#define NV_SPIN_LOCK(lock)        spin_lock(lock)
-#define NV_SPIN_UNLOCK(lock)      spin_unlock(lock)
-#define NV_SPIN_UNLOCK_WAIT(lock) spin_unlock_wait(lock)
-#endif
+typedef raw_spinlock_t                nv_spinlock_t;
+#define NV_SPIN_LOCK_INIT(lock)   raw_spin_lock_init(lock)
+#define NV_SPIN_LOCK_IRQ(lock)    raw_spin_lock_irq(lock)
+#define NV_SPIN_UNLOCK_IRQ(lock)  raw_spin_unlock_irq(lock)
+#define NV_SPIN_LOCK_IRQSAVE(lock,flags) raw_spin_lock_irqsave(lock,flags)
+#define NV_SPIN_UNLOCK_IRQRESTORE(lock,flags) raw_spin_unlock_irqrestore(lock,flags)
+#define NV_SPIN_LOCK(lock)        raw_spin_lock(lock)
+#define NV_SPIN_UNLOCK(lock)      raw_spin_unlock(lock)
+#define NV_SPIN_UNLOCK_WAIT(lock) raw_spin_unlock_wait(lock)
 
 #if defined(NVCPU_X86)
 #ifndef write_cr4
@@ -954,9 +941,6 @@ static inline int nv_execute_on_all_cpus
     return ret;
 }
 
-#if defined(CONFIG_PREEMPT_RT)
-#define NV_INIT_MUTEX(mutex) semaphore_init(mutex)
-#else
 #if !defined(__SEMAPHORE_INITIALIZER) && defined(__COMPAT_SEMAPHORE_INITIALIZER)
 #define __SEMAPHORE_INITIALIZER __COMPAT_SEMAPHORE_INITIALIZER
 #endif
@@ -966,7 +950,6 @@ static inline int nv_execute_on_all_cpus
             __SEMAPHORE_INITIALIZER(*(mutex), 1);  \
         *(mutex) = __mutex;                        \
     }
-#endif
 
 #if defined (KERNEL_2_4)
 #  define NV_IS_SUSER()                 suser()
diff -rupN NVIDIA-Linux-x86_64-302.06.03/kernel/nv-reg.h NVIDIA-Linux-x86_64-302.06.03.rawspin.msi/kernel/nv-reg.h
--- NVIDIA-Linux-x86_64-302.06.03/kernel/nv-reg.h	2012-05-03 21:19:21.000000000 -0400
+++ NVIDIA-Linux-x86_64-302.06.03.rawspin.msi/kernel/nv-reg.h	2012-06-05 19:41:47.338336880 -0400
@@ -607,7 +607,7 @@ NV_DEFINE_REG_ENTRY(__NV_INITIALIZE_SYST
 NV_DEFINE_REG_ENTRY(__NV_USE_VBIOS, 1);
 NV_DEFINE_REG_ENTRY(__NV_RM_EDGE_INTR_CHECK, 1);
 NV_DEFINE_REG_ENTRY(__NV_USE_PAGE_ATTRIBUTE_TABLE, ~0);
-NV_DEFINE_REG_ENTRY(__NV_ENABLE_MSI, 0);
+NV_DEFINE_REG_ENTRY(__NV_ENABLE_MSI, 1);
 NV_DEFINE_REG_ENTRY(__NV_MAP_REGISTERS_EARLY, 0);
 NV_DEFINE_REG_ENTRY(__NV_REGISTER_FOR_ACPI_EVENTS, 1);


I am a little uncertain about my removal of "NV_INIT_MUTEX(mutex)
semaphore_init(mutex)". I can't remember what my rationale was.



* Re: nvidia bug or RT bug?
  2012-06-06  0:03   ` Glenn Elliott
@ 2012-06-06  0:25     ` jordan
  0 siblings, 0 replies; 7+ messages in thread
From: jordan @ 2012-06-06  0:25 UTC (permalink / raw)
  To: Glenn Elliott; +Cc: linux-rt-users

> Here is a patch carried forward for 302.06.03 (the beta driver I
> downloaded with the CUDA 5.0 preview a few weeks ago). The edits were
> the same for the 270 driver. This has worked for me; maybe it will
> work for you.

I've actually got 302.11 running nicely. I haven't had the crash I
originally posted about in a while (that doesn't necessarily mean it
was fixed).

Here is a link (pastebin) to the patch that I am using:

http://pastebin.com/CXWtt3E1

Do you see any problems, or anything that may be missing (aside from
the MSI stuff)? I ask because it is possible that nvidia has
changed or added something, and the patch that I use is for the 290
series; I just adapted the PKGBUILD (build script) to use the
latest version. So if you see any issues, let me know, and I can
forward that information to the packager who is maintaining this
patch.

> I am a little uncertain about my removal of "NV_INIT_MUTEX(mutex)
> semaphore_init(mutex)". I can't remember what my rationale was.

I have no idea. ;)

cheerz

Jordan

