* Re: kernel 3.11.6 general protection fault
@ 2013-11-13 19:58 ` MPhil. Emanoil Kotsev
From: MPhil. Emanoil Kotsev @ 2013-11-13 19:58 UTC (permalink / raw)
To: Borislav Petkov; +Cc: linux-kernel, intel-gfx
(Sorry, it replies automatically only to the sender; I have now added the list.)
What do the intel-gfx people think?
====== original mail follows =======
Hi, sorry for bothering you once again.
I noticed most of the issues are coming from drm (I have the stupid "Intel
Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics
Controller (rev 03)").
So I checked the logs again today and found out it crashed in the morning when
I turned on the notebook in the office.
Is there something you can conclude from the trace below? And another
question: why is it checking the CRTC when I have LVDS, VGA1 and DVI1, and am
actually using only the LVDS and DVI outputs?
Thanks again for taking your time.
Nov 13 09:36:21 maistor kernel: [ 40.447271] ------------[ cut
here ]------------
Nov 13 09:36:21 maistor kernel: [ 40.447311] WARNING: CPU: 1 PID: 4142 at
drivers/gpu/drm/i915/intel_display.c:8292 check_crtc_state+0x5cf/0xa60 [i915]
()
Nov 13 09:36:21 maistor kernel: [ 40.447313] pipe state doesn't match!
Nov 13 09:36:21 maistor kernel: [ 40.447315] Modules linked in: snd_hrtimer
acpi_pad sbs sbshc fan binfmt_misc uinput fuse af_packet ipv6 firewire_sbp2
snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss
snd_pcm snd_page_alloc snd_seq_dummy snd_seq_oss snd_seq_midi
snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer arc4 iTCO_wdt snd gpio_ich
iwl3945 dell_wmi sparse_keymap i2c_i801 iTCO_vendor_support ehci_pci iwlegacy
mac80211 cfg80211 soundcore rfkill dell_laptop lpc_ich yenta_socket pcmcia_rsrc
irda 8250 evdev wmi processor dcdbas rtc_cmos battery crc_ccitt ac joydev
sha256_ssse3 sha256_generic cbc hid_generic usbhid hid loop dm_crypt dm_mod sg b44
sr_mod cdrom ssb i915 cfbfillrect cfbimgblt mmc_core mii pcmcia pcmcia_core
uhci_hcd i2c_algo_bit cfbcopyarea firewire_ohci video backlight firewire_core
crc_itu_t drm_kms_helper drm ehci_hcd sd_mod i2c_core thermal thermal_sys
freq_table usbcore usb_common button intel_agp intel_gtt agpgart
Nov 13 09:36:21 maistor kernel: [ 40.447384] CPU: 1 PID: 4142 Comm: Xorg
Tainted: P 3.11.6eko2 #3
Nov 13 09:36:21 maistor kernel: [ 40.447386] Hardware name: Dell Inc.
Latitude D520 /0NF743, BIOS A04 12/18/2006
Nov 13 09:36:21 maistor kernel: [ 40.447388] 0000000000000000
0000000000000009 ffffffff813ce8ab ffff880079c8f888
Nov 13 09:36:21 maistor kernel: [ 40.447392] ffffffff81038001
ffff88007a2596d8 ffff880079c8f900 ffff880037f3a000
Nov 13 09:36:21 maistor kernel: [ 40.447395] 0000000000000001
ffff880037f3a488 ffffffff810380e5 ffffffffa0295531
Nov 13 09:36:21 maistor kernel: [ 40.447398] Call Trace:
Nov 13 09:36:21 maistor kernel: [ 40.447407] [<ffffffff813ce8ab>] ?
dump_stack+0x41/0x51
Nov 13 09:36:21 maistor kernel: [ 40.447412] [<ffffffff81038001>] ?
warn_slowpath_common+0x81/0xb0
Nov 13 09:36:21 maistor kernel: [ 40.447415] [<ffffffff810380e5>] ?
warn_slowpath_fmt+0x45/0x50
Nov 13 09:36:21 maistor kernel: [ 40.447427] [<ffffffffa024338f>] ?
check_crtc_state+0x5cf/0xa60 [i915]
Nov 13 09:36:21 maistor kernel: [ 40.447440] [<ffffffffa024db7d>] ?
intel_modeset_check_state+0x2bd/0x730 [i915]
Nov 13 09:36:21 maistor kernel: [ 40.447445] [<ffffffff811ec219>] ?
snprintf+0x39/0x40
Nov 13 09:36:21 maistor kernel: [ 40.447456] [<ffffffffa024e05d>] ?
intel_set_mode+0x1d/0x30 [i915]
Nov 13 09:36:21 maistor kernel: [ 40.447467] [<ffffffffa024e81a>] ?
intel_crtc_set_config+0x7aa/0x980 [i915]
Nov 13 09:36:21 maistor kernel: [ 40.447481] [<ffffffffa00f9155>] ?
drm_mode_set_config_internal+0x55/0xd0 [drm]
Nov 13 09:36:21 maistor kernel: [ 40.447490] [<ffffffffa00fb118>] ?
drm_mode_setcrtc+0x118/0x640 [drm]
Nov 13 09:36:21 maistor kernel: [ 40.447497] [<ffffffffa00ec11d>] ?
drm_ioctl+0x4ed/0x5f0 [drm]
Nov 13 09:36:21 maistor kernel: [ 40.447507] [<ffffffffa00fb000>] ?
drm_mode_setplane+0x3a0/0x3a0 [drm]
Nov 13 09:36:21 maistor kernel: [ 40.447512] [<ffffffff8111428b>] ?
do_vfs_ioctl+0x8b/0x520
Nov 13 09:36:21 maistor kernel: [ 40.447515] [<ffffffff8111476d>] ?
SyS_ioctl+0x4d/0xa0
Nov 13 09:36:21 maistor kernel: [ 40.447519] [<ffffffff813d4c56>] ?
system_call_fastpath+0x1a/0x1f
Nov 13 09:36:21 maistor kernel: [ 40.447521] ---[ end trace
307df46ce6dc8ed1 ]---
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
2013-11-13 19:58 ` MPhil. Emanoil Kotsev
@ 2013-11-13 20:09 ` Daniel Vetter
From: Daniel Vetter @ 2013-11-13 20:09 UTC (permalink / raw)
To: MPhil. Emanoil Kotsev; +Cc: Borislav Petkov, intel-gfx, linux-kernel
On Wed, Nov 13, 2013 at 08:58:29PM +0100, MPhil. Emanoil Kotsev wrote:
> (Sorry, it replies automatically only to the sender; I have now added the list.)
>
> What do the intel-gfx people think?
>
> ====== original mail follows =======
> Hi, sorry for bothering you once again.
>
> I noticed most of the issues are coming from drm (I have the stupid "Intel
> Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics
> Controller (rev 03)").
>
> So I checked the logs again today and found out it crashed in the morning when
> I turned on the notebook in the office.
>
> Is there something you can conclude from the trace below? And another
> question: why is it checking the CRTC when I have LVDS, VGA1 and DVI1, and am
> actually using only the LVDS and DVI outputs?
>
> Thanks again for taking your time.
Please test on the latest drm-intel-nightly from
http://cgit.freedesktop.org/~danvet/drm-intel/
If that doesn't help, then please boot with drm.debug=0xe, reproduce the
issue and attach the complete dmesg. Please make sure everything
starting from the boot messages is in there; increase the dmesg buffer with
log_buf_len=4M or so if that isn't the case.
-Daniel
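For reference, drm.debug is a bitmask of log categories. The sketch below
decodes a mask such as 0xe, assuming the category values that 3.11-era
kernels define in include/drm/drmP.h (CORE 0x01, DRIVER 0x02, KMS 0x04,
PRIME 0x08); please check them against your own tree. The table and the
helper drm_debug_describe() are made up for illustration and are not kernel
functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed values, as recalled from include/drm/drmP.h in 3.11-era kernels. */
#define DRM_UT_CORE   0x01
#define DRM_UT_DRIVER 0x02
#define DRM_UT_KMS    0x04
#define DRM_UT_PRIME  0x08

/* Hypothetical lookup table used only by this sketch. */
struct drm_debug_bit {
    unsigned int bit;
    const char *name;
};

static const struct drm_debug_bit drm_debug_bits[] = {
    { DRM_UT_CORE,   "CORE"   },
    { DRM_UT_DRIVER, "DRIVER" },
    { DRM_UT_KMS,    "KMS"    },
    { DRM_UT_PRIME,  "PRIME"  },
};

/*
 * Writes the space-separated names of the categories set in mask into buf
 * and returns how many categories were written. Stops early if buf is too
 * small for the next name.
 */
static int drm_debug_describe(unsigned int mask, char *buf, size_t len)
{
    size_t used = 0;
    size_t i;
    int count = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (i = 0; i < sizeof(drm_debug_bits) / sizeof(drm_debug_bits[0]); i++) {
        int n;

        if (!(mask & drm_debug_bits[i].bit))
            continue;

        n = snprintf(buf + used, len - used, "%s%s",
                     count ? " " : "", drm_debug_bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* no room left for this name */

        used += (size_t)n;
        count++;
    }
    return count;
}
```

Under that assumption, calling drm_debug_describe(0xe, buf, sizeof(buf))
fills buf with the DRIVER, KMS and PRIME categories and leaves out CORE. Both
parameters go on the kernel command line, e.g. "drm.debug=0xe log_buf_len=4M".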
>
>
>
> Nov 13 09:36:21 maistor kernel: [ 40.447271] ------------[ cut
> here ]------------
> Nov 13 09:36:21 maistor kernel: [ 40.447311] WARNING: CPU: 1 PID: 4142 at
> drivers/gpu/drm/i915/intel_display.c:8292 check_crtc_state+0x5cf/0xa60 [i915]
> ()
> Nov 13 09:36:21 maistor kernel: [ 40.447313] pipe state doesn't match!
> Nov 13 09:36:21 maistor kernel: [ 40.447315] Modules linked in: snd_hrtimer
> acpi_pad sbs sbshc fan binfmt_misc uinput fuse af_packet ipv6 firewire_sbp2
> snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss
> snd_pcm snd_page_alloc snd_seq_dummy snd_seq_oss snd_seq_midi
> snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer arc4 iTCO_wdt snd gpio_ich
> iwl3945 dell_wmi sparse_keymap i2c_i801 iTCO_vendor_support ehci_pci iwlegacy
> mac80211 cfg80211 soundcore rfkill dell_laptop lpc_ich yenta_socket pcmcia_rsrc
> irda 8250 evdev wmi processor dcdbas rtc_cmos battery crc_ccitt ac joydev
> sha256_ssse3 sha256_generic cbc hid_generic usbhid hid loop dm_crypt dm_mod sg b44
> sr_mod cdrom ssb i915 cfbfillrect cfbimgblt mmc_core mii pcmcia pcmcia_core
> uhci_hcd i2c_algo_bit cfbcopyarea firewire_ohci video backlight firewire_core
> crc_itu_t drm_kms_helper drm ehci_hcd sd_mod i2c_core thermal thermal_sys
> freq_table usbcore usb_common button intel_agp intel_gtt agpgart
> Nov 13 09:36:21 maistor kernel: [ 40.447384] CPU: 1 PID: 4142 Comm: Xorg
> Tainted: P 3.11.6eko2 #3
> Nov 13 09:36:21 maistor kernel: [ 40.447386] Hardware name: Dell Inc.
> Latitude D520 /0NF743, BIOS A04 12/18/2006
> Nov 13 09:36:21 maistor kernel: [ 40.447388] 0000000000000000
> 0000000000000009 ffffffff813ce8ab ffff880079c8f888
> Nov 13 09:36:21 maistor kernel: [ 40.447392] ffffffff81038001
> ffff88007a2596d8 ffff880079c8f900 ffff880037f3a000
> Nov 13 09:36:21 maistor kernel: [ 40.447395] 0000000000000001
> ffff880037f3a488 ffffffff810380e5 ffffffffa0295531
> Nov 13 09:36:21 maistor kernel: [ 40.447398] Call Trace:
> Nov 13 09:36:21 maistor kernel: [ 40.447407] [<ffffffff813ce8ab>] ?
> dump_stack+0x41/0x51
> Nov 13 09:36:21 maistor kernel: [ 40.447412] [<ffffffff81038001>] ?
> warn_slowpath_common+0x81/0xb0
> Nov 13 09:36:21 maistor kernel: [ 40.447415] [<ffffffff810380e5>] ?
> warn_slowpath_fmt+0x45/0x50
> Nov 13 09:36:21 maistor kernel: [ 40.447427] [<ffffffffa024338f>] ?
> check_crtc_state+0x5cf/0xa60 [i915]
> Nov 13 09:36:21 maistor kernel: [ 40.447440] [<ffffffffa024db7d>] ?
> intel_modeset_check_state+0x2bd/0x730 [i915]
> Nov 13 09:36:21 maistor kernel: [ 40.447445] [<ffffffff811ec219>] ?
> snprintf+0x39/0x40
> Nov 13 09:36:21 maistor kernel: [ 40.447456] [<ffffffffa024e05d>] ?
> intel_set_mode+0x1d/0x30 [i915]
> Nov 13 09:36:21 maistor kernel: [ 40.447467] [<ffffffffa024e81a>] ?
> intel_crtc_set_config+0x7aa/0x980 [i915]
> Nov 13 09:36:21 maistor kernel: [ 40.447481] [<ffffffffa00f9155>] ?
> drm_mode_set_config_internal+0x55/0xd0 [drm]
> Nov 13 09:36:21 maistor kernel: [ 40.447490] [<ffffffffa00fb118>] ?
> drm_mode_setcrtc+0x118/0x640 [drm]
> Nov 13 09:36:21 maistor kernel: [ 40.447497] [<ffffffffa00ec11d>] ?
> drm_ioctl+0x4ed/0x5f0 [drm]
> Nov 13 09:36:21 maistor kernel: [ 40.447507] [<ffffffffa00fb000>] ?
> drm_mode_setplane+0x3a0/0x3a0 [drm]
> Nov 13 09:36:21 maistor kernel: [ 40.447512] [<ffffffff8111428b>] ?
> do_vfs_ioctl+0x8b/0x520
> Nov 13 09:36:21 maistor kernel: [ 40.447515] [<ffffffff8111476d>] ?
> SyS_ioctl+0x4d/0xa0
> Nov 13 09:36:21 maistor kernel: [ 40.447519] [<ffffffff813d4c56>] ?
> system_call_fastpath+0x1a/0x1f
> Nov 13 09:36:21 maistor kernel: [ 40.447521] ---[ end trace
> 307df46ce6dc8ed1 ]---
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
2013-11-13 20:09 ` Daniel Vetter
@ 2013-11-13 20:33 ` Borislav Petkov
From: Borislav Petkov @ 2013-11-13 20:33 UTC (permalink / raw)
To: MPhil. Emanoil Kotsev; +Cc: intel-gfx, linux-kernel, Daniel Vetter
Some more suggestions, in addition to Daniel's:
On Wed, Nov 13, 2013 at 09:09:14PM +0100, Daniel Vetter wrote:
> > Nov 13 09:36:21 maistor kernel: [ 40.447271] ------------[ cut
> > here ]------------
> > Nov 13 09:36:21 maistor kernel: [ 40.447311] WARNING: CPU: 1 PID: 4142 at
> > drivers/gpu/drm/i915/intel_display.c:8292 check_crtc_state+0x5cf/0xa60 [i915]
> > ()
> > Nov 13 09:36:21 maistor kernel: [ 40.447313] pipe state doesn't match!
That's
        if (active &&
            !intel_pipe_config_compare(dev, &crtc->config, &pipe_config)) {
                WARN(1, "pipe state doesn't match!\n"); <---
                intel_dump_pipe_config(crtc, &pipe_config,
                                       "[hw state]");
                intel_dump_pipe_config(crtc, &crtc->config,
                                       "[sw state]");
        }
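To illustrate what this check does, going by the snippet and the call trace:
after a modeset the driver reads the pipe configuration back from the
hardware and compares it with the configuration it believes it programmed;
the WARN fires when the two differ on an active pipe. (A CRTC is the display
pipe that scans out to outputs such as LVDS or DVI, so it gets checked
whichever connectors are in use.) The userspace sketch below mirrors that
pattern only. The struct, its three fields and both function names are
hypothetical simplifications, not the i915 definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily reduced stand-in for a pipe configuration. */
struct pipe_config_sketch {
    int hdisplay;
    int vdisplay;
    int pixel_clock_khz;
};

/* Field-by-field comparison of the tracked (sw) and read-back (hw) state. */
static bool pipe_config_equal(const struct pipe_config_sketch *sw,
                              const struct pipe_config_sketch *hw)
{
    assert(sw != NULL && hw != NULL);
    return sw->hdisplay == hw->hdisplay &&
           sw->vdisplay == hw->vdisplay &&
           sw->pixel_clock_khz == hw->pixel_clock_khz;
}

static void dump_pipe_config(const struct pipe_config_sketch *cfg,
                             const char *label)
{
    fprintf(stderr, "%s %dx%d @ %d kHz\n", label,
            cfg->hdisplay, cfg->vdisplay, cfg->pixel_clock_khz);
}

/*
 * Returns true when a mismatch was reported. Inactive pipes are skipped,
 * as in the quoted condition.
 */
static bool check_pipe_state(bool active,
                             const struct pipe_config_sketch *sw,
                             const struct pipe_config_sketch *hw)
{
    if (active && !pipe_config_equal(sw, hw)) {
        fprintf(stderr, "pipe state doesn't match!\n");
        dump_pipe_config(hw, "[hw state]");
        dump_pipe_config(sw, "[sw state]");
        return true;
    }
    return false;
}
```

In the real driver the two dumps are emitted as debug output, which is why a
complete dmesg with drm.debug enabled is the useful thing to attach: it would
show which field of the hardware state differs from the software state.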
> > Nov 13 09:36:21 maistor kernel: [ 40.447315] Modules linked in: snd_hrtimer
> > acpi_pad sbs sbshc fan binfmt_misc uinput fuse af_packet ipv6 firewire_sbp2
> > snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss
> > snd_pcm snd_page_alloc snd_seq_dummy snd_seq_oss snd_seq_midi
> > snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer arc4 iTCO_wdt snd gpio_ich
> > iwl3945 dell_wmi sparse_keymap i2c_i801 iTCO_vendor_support ehci_pci iwlegacy
> > mac80211 cfg80211 soundcore rfkill dell_laptop lpc_ich yenta_socket pcmcia_rsrc
> > irda 8250 evdev wmi processor dcdbas rtc_cmos battery crc_ccitt ac joydev
> > sha256_ssse3 sha256_generic cbc hid_generic usbhid hid loop dm_crypt dm_mod sg b44
> > sr_mod cdrom ssb i915 cfbfillrect cfbimgblt mmc_core mii pcmcia pcmcia_core
> > uhci_hcd i2c_algo_bit cfbcopyarea firewire_ohci video backlight firewire_core
> > crc_itu_t drm_kms_helper drm ehci_hcd sd_mod i2c_core thermal thermal_sys
> > freq_table usbcore usb_common button intel_agp intel_gtt agpgart
> > Nov 13 09:36:21 maistor kernel: [ 40.447384] CPU: 1 PID: 4142 Comm: Xorg
> > Tainted: P 3.11.6eko2 #3
And there's that taint P again due to the vmware modules.
I know that you tried without the vmware modules where your kernel
wasn't tainted but then you got a #GP which could be something entirely
different. But now you're hitting some sanity-checking code which could
mean there's some corruption happening.
So, can you reproduce that exact same warning, i.e. this one:
WARNING: CPU: 1 PID: 4142 at drivers/gpu/drm/i915/intel_display.c:8292 check_crtc_state+0x5cf/0xa60 [i915]()
pipe state doesn't match!
*without* the vmware modules installed?
Also, it wouldn't hurt to try the shiny new 3.12.
HTH.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
2013-11-13 20:33 ` Borislav Petkov
@ 2013-11-13 21:19 ` MPhil. Emanoil Kotsev
From: MPhil. Emanoil Kotsev @ 2013-11-13 21:19 UTC (permalink / raw)
To: Borislav Petkov; +Cc: intel-gfx, linux-kernel, Daniel Vetter
Hi
On Wednesday 13 November 2013 21:33:19 Borislav Petkov wrote:
> Some more suggestions, in addition to Daniel's:
>
> On Wed, Nov 13, 2013 at 09:09:14PM +0100, Daniel Vetter wrote:
> > > Nov 13 09:36:21 maistor kernel: [ 40.447271] ------------[ cut
> > > here ]------------
> > > Nov 13 09:36:21 maistor kernel: [ 40.447311] WARNING: CPU: 1 PID:
> > > 4142 at drivers/gpu/drm/i915/intel_display.c:8292
> > > check_crtc_state+0x5cf/0xa60 [i915] ()
> > > Nov 13 09:36:21 maistor kernel: [ 40.447313] pipe state doesn't
> > > match!
>
> That's
>
>         if (active &&
>             !intel_pipe_config_compare(dev, &crtc->config, &pipe_config)) {
>                 WARN(1, "pipe state doesn't match!\n"); <---
>                 intel_dump_pipe_config(crtc, &pipe_config,
>                                        "[hw state]");
>                 intel_dump_pipe_config(crtc, &crtc->config,
>                                        "[sw state]");
>         }
>
I looked there, but it would have taken more time than I had available to get
an idea of what exactly it is trying to do.
> > > Nov 13 09:36:21 maistor kernel: [ 40.447315] Modules linked in:
> > > snd_hrtimer acpi_pad sbs sbshc fan binfmt_misc uinput fuse af_packet
> > > ipv6 firewire_sbp2 snd_hda_codec_idt snd_hda_intel snd_hda_codec
> > > snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc
> > > snd_seq_dummy snd_seq_oss snd_seq_midi snd_seq_midi_event
> > > snd_rawmidi snd_seq snd_seq_device snd_timer arc4 iTCO_wdt snd
> > > gpio_ich iwl3945 dell_wmi sparse_keymap i2c_i801 iTCO_vendor_support
> > > ehci_pci iwlegacy mac80211 cfg80211 soundcore rfkill dell_laptop
> > > lpc_ich yenta_socket pcmcia_rsrc irda 8250 evdev wmi processor dcdbas
> > > rtc_cmos battery crc_ccitt ac joydev sha256_ssse3 sha256_generic cbc
> > > hid_generic usbhid hid loop dm_crypt dm_mod sg b44 sr_mod cdrom ssb
> > > i915 cfbfillrect cfbimgblt mmc_core mii pcmcia pcmcia_core uhci_hcd
> > > i2c_algo_bit cfbcopyarea firewire_ohci video backlight firewire_core
> > > crc_itu_t drm_kms_helper drm ehci_hcd sd_mod i2c_core thermal
> > > thermal_sys freq_table usbcore usb_common button intel_agp intel_gtt
> > > agpgart
> > > Nov 13 09:36:21 maistor kernel: [ 40.447384] CPU: 1 PID: 4142 Comm:
> > > Xorg Tainted: P 3.11.6eko2 #3
>
> And there's that taint P again due to the vmware modules.
>
> I know that you tried without the vmware modules where your kernel
> wasn't tainted but then you got a #GP which could be something entirely
> different. But now you're hitting some sanity-checking code which could
> mean there's some corruption happening.
Yes, with the #GP the machine locks up, and this time it didn't.
>
> So, can you reproduce that exact same warning, i.e. this one:
>
> WARNING: CPU: 1 PID: 4142 at drivers/gpu/drm/i915/intel_display.c:8292
> check_crtc_state+0x5cf/0xa60 [i915]() pipe state doesn't match!
>
> *without* the vmware modules installed?
I'm not sure; as you know, it happens randomly.
>
> Also, it wouldn't hurt to try the shiny new 3.12.
I was thinking of doing so, but let's be honest: I would save everybody's time
if I were 100% sure it is a hardware issue, and then I would buy a new
notebook. This one has served me well for the past 7 years and paid for itself
long ago.
I could try 3.12, also in combination with the git drm-intel as Daniel
suggested.
I still think that this has something to do with the graphics, but I am rather
guessing from intuition.
I'm not sure if it helps, but when I grep as follows I find only the tainted
errors. It's not visible which of them were GPs, but it still shows where the
issue was hit:
zgrep 'Comm:' messages* | more
messages:Nov 11 10:52:42 maistor kernel: [ 43.961984] CPU: 1 PID: 4103 Comm:
Xorg Tainted: P 3.11.6eko2 #3
messages:Nov 11 10:52:54 maistor kernel: [ 55.759687] CPU: 1 PID: 4103 Comm:
Xorg Tainted: P W 3.11.6eko2 #3
messages:Nov 12 10:35:42 maistor kernel: [ 28.626271] CPU: 0 PID: 3895 Comm:
Xorg Tainted: P 3.11.6eko2 #3
messages:Nov 12 10:35:55 maistor kernel: [ 41.618447] CPU: 1 PID: 3895 Comm:
Xorg Tainted: P W 3.11.6eko2 #3
messages:Nov 13 09:36:21 maistor kernel: [ 40.447384] CPU: 1 PID: 4142 Comm:
Xorg Tainted: P 3.11.6eko2 #3
messages:Nov 13 09:36:34 maistor kernel: [ 53.624754] CPU: 0 PID: 4142 Comm:
Xorg Tainted: P W 3.11.6eko2 #3
messages.1:Nov 4 11:21:19 maistor kernel: [ 38.497643] CPU: 1 PID: 4104
Comm: Xorg Tainted: P 3.11.6eko2 #3
messages.1:Nov 4 11:21:31 maistor kernel: [ 50.844193] CPU: 0 PID: 4104
Comm: Xorg Tainted: P W 3.11.6eko2 #3
messages.1:Nov 5 10:28:49 maistor kernel: [ 39.545474] CPU: 1 PID: 4253
Comm: Xorg Tainted: P 3.11.6eko2 #3
messages.1:Nov 5 10:29:02 maistor kernel: [ 52.078761] CPU: 0 PID: 4253
Comm: Xorg Tainted: P W 3.11.6eko2 #3
messages.1:Nov 6 10:33:01 maistor kernel: [ 38.876587] CPU: 0 PID: 4128
Comm: Xorg Tainted: P 3.11.6eko2 #3
messages.1:Nov 6 10:33:12 maistor kernel: [ 49.777082] CPU: 0 PID: 4128
Comm: Xorg Tainted: P W 3.11.6eko2 #3
messages.1:Nov 7 10:27:40 maistor kernel: [ 38.771546] CPU: 0 PID: 4110
Comm: Xorg Tainted: P 3.11.6eko2 #3
messages.1:Nov 7 10:27:53 maistor kernel: [ 51.896606] CPU: 1 PID: 4110
Comm: Xorg Tainted: P W 3.11.6eko2 #3
messages.1:Nov 8 10:35:13 maistor kernel: [ 42.021333] CPU: 1 PID: 4224
Comm: Xorg Tainted: P 3.11.6eko2 #3
messages.1:Nov 8 10:35:22 maistor kernel: [ 51.699993] CPU: 0 PID: 4224
Comm: Xorg Tainted: P W 3.11.6eko2 #3
messages.2.gz:Oct 27 19:01:29 maistor kernel: CPU: 1 PID: 6111 Comm:
plugin-containe Tainted: P O 3.11.6eko2 #1
messages.2.gz:Oct 27 22:15:14 maistor kernel: CPU: 1 PID: 9024 Comm:
plugin-containe Tainted: P O 3.11.6eko2 #1
messages.2.gz:Oct 28 10:33:34 maistor kernel: CPU: 0 PID: 4195 Comm: Xorg
Tainted: P 3.11.6eko2 #1
messages.2.gz:Oct 28 10:33:43 maistor kernel: CPU: 0 PID: 4195 Comm: Xorg
Tainted: P W 3.11.6eko2 #1
messages.2.gz:Oct 29 10:34:29 maistor kernel: CPU: 1 PID: 4633 Comm: Xorg
Tainted: P 3.11.6eko2 #1
messages.2.gz:Oct 29 10:34:41 maistor kernel: CPU: 1 PID: 4633 Comm: Xorg
Tainted: P W 3.11.6eko2 #1
messages.2.gz:Oct 30 10:30:55 maistor kernel: CPU: 1 PID: 4030 Comm: Xorg
Tainted: P 3.11.6eko2 #1
messages.2.gz:Oct 30 10:31:06 maistor kernel: CPU: 0 PID: 4030 Comm: Xorg
Tainted: P W 3.11.6eko2 #1
messages.2.gz:Oct 31 10:51:01 maistor kernel: CPU: 1 PID: 6441 Comm: Xorg
Tainted: P O 3.11.6eko2 #1
messages.2.gz:Oct 31 10:51:08 maistor kernel: CPU: 1 PID: 6441 Comm: Xorg
Tainted: P W O 3.11.6eko2 #1
messages.2.gz:Nov 2 06:32:48 maistor kernel: CPU: 0 PID: 5925 Comm: Socket
Thread Tainted: P 3.11.6eko2 #1
messages.3.gz:Oct 20 02:37:01 maistor kernel: CPU: 0 PID: 5952 Comm: Socket
Thread Tainted: P 3.11.6eko2 #1
messages.3.gz:Oct 20 23:34:00 maistor kernel: CPU: 0 PID: 14534 Comm:
plugin-containe Tainted: P 3.11.6eko2 #1
messages.3.gz:Oct 20 23:34:24 maistor kernel: CPU: 1 PID: 14535 Comm:
plugin-containe Tainted: P D 3.11.6eko2 #1
messages.3.gz:Oct 20 23:34:52 maistor kernel: CPU: 1 PID: 14535 Comm:
plugin-containe Tainted: P D 3.11.6eko2 #1
messages.3.gz:Oct 20 23:35:20 maistor kernel: CPU: 1 PID: 14535 Comm:
plugin-containe Tainted: P D 3.11.6eko2 #1
messages.3.gz:Oct 20 23:35:48 maistor kernel: CPU: 1 PID: 14535 Comm:
plugin-containe Tainted: P D 3.11.6eko2 #1
messages.3.gz:Oct 20 23:36:16 maistor kernel: CPU: 1 PID: 14535 Comm:
plugin-containe Tainted: P D 3.11.6eko2 #1
messages.3.gz:Oct 20 23:36:44 maistor kernel: CPU: 1 PID: 14535 Comm:
plugin-containe Tainted: P D 3.11.6eko2 #1
messages.3.gz:Oct 20 23:37:12 maistor kernel: CPU: 1 PID: 14535 Comm:
plugin-containe Tainted: P D 3.11.6eko2 #1
messages.3.gz:Oct 21 10:42:33 maistor kernel: CPU: 0 PID: 4002 Comm: Xorg
Tainted: P 3.11.6eko2 #1
messages.3.gz:Oct 21 10:42:45 maistor kernel: CPU: 1 PID: 4002 Comm: Xorg
Tainted: P W 3.11.6eko2 #1
messages.3.gz:Oct 22 11:24:20 maistor kernel: CPU: 1 PID: 4129 Comm: Xorg
Tainted: P 3.11.6eko2 #1
messages.3.gz:Oct 22 11:24:30 maistor kernel: CPU: 1 PID: 4129 Comm: Xorg
Tainted: P W 3.11.6eko2 #1
messages.3.gz:Oct 23 11:09:10 maistor kernel: CPU: 1 PID: 4197 Comm: Xorg
Tainted: P 3.11.6eko2 #1
messages.3.gz:Oct 23 11:09:18 maistor kernel: CPU: 1 PID: 4197 Comm: Xorg
Tainted: P W 3.11.6eko2 #1
messages.3.gz:Oct 24 11:00:12 maistor kernel: CPU: 0 PID: 3981 Comm: Xorg
Tainted: P 3.11.6eko2 #1
messages.3.gz:Oct 24 11:00:23 maistor kernel: CPU: 1 PID: 3981 Comm: Xorg
Tainted: P W 3.11.6eko2 #1
messages.3.gz:Oct 25 12:55:49 maistor kernel: CPU: 0 PID: 4564 Comm: Xorg
Tainted: P 3.11.6eko2 #1
messages.3.gz:Oct 25 12:56:01 maistor kernel: CPU: 1 PID: 4564 Comm: Xorg
Tainted: P W 3.11.6eko2 #1
messages.3.gz:Oct 26 21:57:03 maistor kernel: CPU: 0 PID: 30118 Comm:
plugin-containe Tainted: P O 3.11.6eko2 #1
messages.3.gz:Oct 26 21:57:27 maistor kernel: CPU: 1 PID: 30117 Comm:
plugin-containe Tainted: P D O 3.11.6eko2 #1
messages.3.gz:Oct 26 21:57:55 maistor kernel: CPU: 1 PID: 30117 Comm:
plugin-containe Tainted: P D O 3.11.6eko2 #1
messages.3.gz:Oct 26 21:58:23 maistor kernel: CPU: 1 PID: 30117 Comm:
plugin-containe Tainted: P D O 3.11.6eko2 #1
messages.3.gz:Oct 26 21:58:51 maistor kernel: CPU: 1 PID: 30117 Comm:
plugin-containe Tainted: P D O 3.11.6eko2 #1
messages.3.gz:Oct 26 21:59:19 maistor kernel: CPU: 1 PID: 30117 Comm:
plugin-containe Tainted: P D O 3.11.6eko2 #1
messages.4.gz:Oct 14 19:40:28 maistor kernel: CPU: 1 PID: 19163 Comm: konsole
Tainted: P O 3.10.9eko2 #4
messages.4.gz:Oct 14 19:42:04 maistor kernel: CPU: 0 PID: 26225 Comm: wfica
Tainted: P D O 3.10.9eko2 #4
messages.4.gz:Oct 14 20:17:55 maistor kernel: CPU: 1 PID: 390 Comm: kswapd0
Tainted: P D O 3.10.9eko2 #4
messages.4.gz:Oct 15 20:16:45 maistor kernel: CPU: 0 PID: 4058 Comm: Xorg
Tainted: P O 3.10.9eko2 #4
messages.4.gz:Oct 17 10:40:44 maistor kernel: CPU: 1 PID: 6417 Comm:
plugin-containe Tainted: P O 3.10.9eko2 #4
messages.4.gz:Oct 17 12:42:09 maistor kernel: CPU: 1 PID: 390 Comm: kswapd0
Tainted: P D O 3.10.9eko2 #4
messages.4.gz:Oct 17 13:16:14 maistor kernel: CPU: 0 PID: 6108 Comm: kmix
Tainted: P O 3.10.9eko2 #4
messages.4.gz:Oct 17 13:17:33 maistor kernel: CPU: 1 PID: 20690 Comm:
udisks-daemon Tainted: P D O 3.10.9eko2 #4
messages.4.gz:Oct 17 13:17:33 maistor kernel: CPU: 1 PID: 20690 Comm:
udisks-daemon Tainted: P D W O 3.10.9eko2 #4
messages.4.gz:Oct 17 13:56:58 maistor kernel: CPU: 1 PID: 13731 Comm:
plugin-containe Tainted: P O 3.10.9eko2 #4
messages.4.gz:Oct 17 13:57:04 maistor kernel: CPU: 0 PID: 4494 Comm: Xorg
Tainted: P W O 3.10.9eko2 #4
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: kernel 3.11.6 general protection fault
@ 2013-11-13 21:19 ` MPhil. Emanoil Kotsev
0 siblings, 0 replies; 26+ messages in thread
From: MPhil. Emanoil Kotsev @ 2013-11-13 21:19 UTC (permalink / raw)
To: Borislav Petkov; +Cc: intel-gfx, linux-kernel
Hi
On Wednesday 13 November 2013 21:33:19 Borislav Petkov wrote:
> Some more suggestions, in addition to Daniel's:
>
> On Wed, Nov 13, 2013 at 09:09:14PM +0100, Daniel Vetter wrote:
> > > Nov 13 09:36:21 maistor kernel: [ 40.447271] ------------[ cut
> > > here ]------------
> > > Nov 13 09:36:21 maistor kernel: [ 40.447311] WARNING: CPU: 1 PID:
> > > 4142 at drivers/gpu/drm/i915/intel_display.c:8292
> > > check_crtc_state+0x5cf/0xa60 [i915] ()
> > > Nov 13 09:36:21 maistor kernel: [ 40.447313] pipe state doesn't
> > > match!
>
> That's
>
>     if (active &&
>         !intel_pipe_config_compare(dev, &crtc->config, &pipe_config)) {
>             WARN(1, "pipe state doesn't match!\n"); <---
>             intel_dump_pipe_config(crtc, &pipe_config,
>                                    "[hw state]");
>             intel_dump_pipe_config(crtc, &crtc->config,
>                                    "[sw state]");
>     }
>
I looked there, but it would have taken more time than I had available to
work out what exactly it is trying to do.
> > > Nov 13 09:36:21 maistor kernel: [ 40.447315] Modules linked in:
> > > snd_hrtimer acpi_pad sbs sbshc fan binfmt_misc uinput fuse af_packet
> > > ipv6 firewire_sbp2 snd_hda_codec_idt snd_hda_intel snd_hda_codec
> > > snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc
> > > snd_seq_dummy snd_seq_oss snd_seq_midi snd_seq_midi_event snd_rawmidi
> > > snd_seq snd_seq_device snd_timer arc4 iTCO_wdt snd gpio_ich iwl3945
> > > dell_wmi sparse_keymap i2c_i801 iTCO_vendor_support ehci_pci iwlegacy
> > > mac80211 cfg80211 soundcore rfkill dell_laptop lpc_ich yenta_socket
> > > pcmcia_rsrc irda 8250 evdev wmi processor dcdbas rtc_cmos battery
> > > crc_ccitt ac joydev sha256_ssse3 sha256_generic cbc hid_generic usbhid
> > > hid loop dm_crypt dm_mod sg b44 sr_mod cdrom ssb i915 cfbfillrect
> > > cfbimgblt mmc_core mii pcmcia pcmcia_core uhci_hcd i2c_algo_bit
> > > cfbcopyarea firewire_ohci video backlight firewire_core crc_itu_t
> > > drm_kms_helper drm ehci_hcd sd_mod i2c_core thermal thermal_sys
> > > freq_table usbcore usb_common button intel_agp intel_gtt agpgart
> > > Nov 13 09:36:21 maistor kernel: [ 40.447384] CPU: 1 PID: 4142 Comm:
> > > Xorg Tainted: P 3.11.6eko2 #3
>
> And there's that taint P again due to the vmware modules.
>
> I know that you tried without the vmware modules where your kernel
> wasn't tainted but then you got a #GP which could be something entirely
> different. But now you're hitting some sanity-checking code which could
> mean there's some corruption happening.
Yes, with the #GP the machine locks up, and this time it didn't.
>
> So, can you reproduce that exact same warning, i.e. this one:
>
> WARNING: CPU: 1 PID: 4142 at drivers/gpu/drm/i915/intel_display.c:8292
> check_crtc_state+0x5cf/0xa60 [i915]() pipe state doesn't match!
>
> *without* the vmware modules installed?
I'm not sure - you know it happens randomly.
>
> Also, it wouldn't hurt to try the shiny new 3.12.
I was thinking of doing so - but let's be honest: I would save everybody's
time if I were 100% sure it is a hardware issue, and I would buy a new
notebook. This one has served me well for the past 7 years and paid for itself
long ago.
I could try 3.12, and also try it in combination with the git drm-intel tree,
as Daniel suggested.
I still think that this has something to do with the graphics, but that is
mostly a guess based on intuition.
I'm not sure if it helps, but when I grep as follows I find only the
tainted errors - it's not visible which of them were #GP, but it still shows
where the issue hit:
zgrep 'Comm:' messages* | more
messages:Nov 11 10:52:42 maistor kernel: [ 43.961984] CPU: 1 PID: 4103 Comm:
Xorg Tainted: P 3.11.6eko2 #3
messages:Nov 11 10:52:54 maistor kernel: [ 55.759687] CPU: 1 PID: 4103 Comm:
Xorg Tainted: P W 3.11.6eko2 #3
messages:Nov 12 10:35:42 maistor kernel: [ 28.626271] CPU: 0 PID: 3895 Comm:
Xorg Tainted: P 3.11.6eko2 #3
messages:Nov 12 10:35:55 maistor kernel: [ 41.618447] CPU: 1 PID: 3895 Comm:
Xorg Tainted: P W 3.11.6eko2 #3
messages:Nov 13 09:36:21 maistor kernel: [ 40.447384] CPU: 1 PID: 4142 Comm:
Xorg Tainted: P 3.11.6eko2 #3
messages:Nov 13 09:36:34 maistor kernel: [ 53.624754] CPU: 0 PID: 4142 Comm:
Xorg Tainted: P W 3.11.6eko2 #3
messages.1:Nov 4 11:21:19 maistor kernel: [ 38.497643] CPU: 1 PID: 4104
Comm: Xorg Tainted: P 3.11.6eko2 #3
messages.1:Nov 4 11:21:31 maistor kernel: [ 50.844193] CPU: 0 PID: 4104
Comm: Xorg Tainted: P W 3.11.6eko2 #3
messages.1:Nov 5 10:28:49 maistor kernel: [ 39.545474] CPU: 1 PID: 4253
Comm: Xorg Tainted: P 3.11.6eko2 #3
messages.1:Nov 5 10:29:02 maistor kernel: [ 52.078761] CPU: 0 PID: 4253
Comm: Xorg Tainted: P W 3.11.6eko2 #3
messages.1:Nov 6 10:33:01 maistor kernel: [ 38.876587] CPU: 0 PID: 4128
Comm: Xorg Tainted: P 3.11.6eko2 #3
messages.1:Nov 6 10:33:12 maistor kernel: [ 49.777082] CPU: 0 PID: 4128
Comm: Xorg Tainted: P W 3.11.6eko2 #3
messages.1:Nov 7 10:27:40 maistor kernel: [ 38.771546] CPU: 0 PID: 4110
Comm: Xorg Tainted: P 3.11.6eko2 #3
messages.1:Nov 7 10:27:53 maistor kernel: [ 51.896606] CPU: 1 PID: 4110
Comm: Xorg Tainted: P W 3.11.6eko2 #3
messages.1:Nov 8 10:35:13 maistor kernel: [ 42.021333] CPU: 1 PID: 4224
Comm: Xorg Tainted: P 3.11.6eko2 #3
messages.1:Nov 8 10:35:22 maistor kernel: [ 51.699993] CPU: 0 PID: 4224
Comm: Xorg Tainted: P W 3.11.6eko2 #3
messages.2.gz:Oct 27 19:01:29 maistor kernel: CPU: 1 PID: 6111 Comm:
plugin-containe Tainted: P O 3.11.6eko2 #1
messages.2.gz:Oct 27 22:15:14 maistor kernel: CPU: 1 PID: 9024 Comm:
plugin-containe Tainted: P O 3.11.6eko2 #1
messages.2.gz:Oct 28 10:33:34 maistor kernel: CPU: 0 PID: 4195 Comm: Xorg
Tainted: P 3.11.6eko2 #1
messages.2.gz:Oct 28 10:33:43 maistor kernel: CPU: 0 PID: 4195 Comm: Xorg
Tainted: P W 3.11.6eko2 #1
messages.2.gz:Oct 29 10:34:29 maistor kernel: CPU: 1 PID: 4633 Comm: Xorg
Tainted: P 3.11.6eko2 #1
messages.2.gz:Oct 29 10:34:41 maistor kernel: CPU: 1 PID: 4633 Comm: Xorg
Tainted: P W 3.11.6eko2 #1
messages.2.gz:Oct 30 10:30:55 maistor kernel: CPU: 1 PID: 4030 Comm: Xorg
Tainted: P 3.11.6eko2 #1
messages.2.gz:Oct 30 10:31:06 maistor kernel: CPU: 0 PID: 4030 Comm: Xorg
Tainted: P W 3.11.6eko2 #1
messages.2.gz:Oct 31 10:51:01 maistor kernel: CPU: 1 PID: 6441 Comm: Xorg
Tainted: P O 3.11.6eko2 #1
messages.2.gz:Oct 31 10:51:08 maistor kernel: CPU: 1 PID: 6441 Comm: Xorg
Tainted: P W O 3.11.6eko2 #1
messages.2.gz:Nov 2 06:32:48 maistor kernel: CPU: 0 PID: 5925 Comm: Socket
Thread Tainted: P 3.11.6eko2 #1
messages.3.gz:Oct 20 02:37:01 maistor kernel: CPU: 0 PID: 5952 Comm: Socket
Thread Tainted: P 3.11.6eko2 #1
messages.3.gz:Oct 20 23:34:00 maistor kernel: CPU: 0 PID: 14534 Comm:
plugin-containe Tainted: P 3.11.6eko2 #1
messages.3.gz:Oct 20 23:34:24 maistor kernel: CPU: 1 PID: 14535 Comm:
plugin-containe Tainted: P D 3.11.6eko2 #1
messages.3.gz:Oct 20 23:34:52 maistor kernel: CPU: 1 PID: 14535 Comm:
plugin-containe Tainted: P D 3.11.6eko2 #1
messages.3.gz:Oct 20 23:35:20 maistor kernel: CPU: 1 PID: 14535 Comm:
plugin-containe Tainted: P D 3.11.6eko2 #1
messages.3.gz:Oct 20 23:35:48 maistor kernel: CPU: 1 PID: 14535 Comm:
plugin-containe Tainted: P D 3.11.6eko2 #1
messages.3.gz:Oct 20 23:36:16 maistor kernel: CPU: 1 PID: 14535 Comm:
plugin-containe Tainted: P D 3.11.6eko2 #1
messages.3.gz:Oct 20 23:36:44 maistor kernel: CPU: 1 PID: 14535 Comm:
plugin-containe Tainted: P D 3.11.6eko2 #1
messages.3.gz:Oct 20 23:37:12 maistor kernel: CPU: 1 PID: 14535 Comm:
plugin-containe Tainted: P D 3.11.6eko2 #1
messages.3.gz:Oct 21 10:42:33 maistor kernel: CPU: 0 PID: 4002 Comm: Xorg
Tainted: P 3.11.6eko2 #1
messages.3.gz:Oct 21 10:42:45 maistor kernel: CPU: 1 PID: 4002 Comm: Xorg
Tainted: P W 3.11.6eko2 #1
messages.3.gz:Oct 22 11:24:20 maistor kernel: CPU: 1 PID: 4129 Comm: Xorg
Tainted: P 3.11.6eko2 #1
messages.3.gz:Oct 22 11:24:30 maistor kernel: CPU: 1 PID: 4129 Comm: Xorg
Tainted: P W 3.11.6eko2 #1
messages.3.gz:Oct 23 11:09:10 maistor kernel: CPU: 1 PID: 4197 Comm: Xorg
Tainted: P 3.11.6eko2 #1
messages.3.gz:Oct 23 11:09:18 maistor kernel: CPU: 1 PID: 4197 Comm: Xorg
Tainted: P W 3.11.6eko2 #1
messages.3.gz:Oct 24 11:00:12 maistor kernel: CPU: 0 PID: 3981 Comm: Xorg
Tainted: P 3.11.6eko2 #1
messages.3.gz:Oct 24 11:00:23 maistor kernel: CPU: 1 PID: 3981 Comm: Xorg
Tainted: P W 3.11.6eko2 #1
messages.3.gz:Oct 25 12:55:49 maistor kernel: CPU: 0 PID: 4564 Comm: Xorg
Tainted: P 3.11.6eko2 #1
messages.3.gz:Oct 25 12:56:01 maistor kernel: CPU: 1 PID: 4564 Comm: Xorg
Tainted: P W 3.11.6eko2 #1
messages.3.gz:Oct 26 21:57:03 maistor kernel: CPU: 0 PID: 30118 Comm:
plugin-containe Tainted: P O 3.11.6eko2 #1
messages.3.gz:Oct 26 21:57:27 maistor kernel: CPU: 1 PID: 30117 Comm:
plugin-containe Tainted: P D O 3.11.6eko2 #1
messages.3.gz:Oct 26 21:57:55 maistor kernel: CPU: 1 PID: 30117 Comm:
plugin-containe Tainted: P D O 3.11.6eko2 #1
messages.3.gz:Oct 26 21:58:23 maistor kernel: CPU: 1 PID: 30117 Comm:
plugin-containe Tainted: P D O 3.11.6eko2 #1
messages.3.gz:Oct 26 21:58:51 maistor kernel: CPU: 1 PID: 30117 Comm:
plugin-containe Tainted: P D O 3.11.6eko2 #1
messages.3.gz:Oct 26 21:59:19 maistor kernel: CPU: 1 PID: 30117 Comm:
plugin-containe Tainted: P D O 3.11.6eko2 #1
messages.4.gz:Oct 14 19:40:28 maistor kernel: CPU: 1 PID: 19163 Comm: konsole
Tainted: P O 3.10.9eko2 #4
messages.4.gz:Oct 14 19:42:04 maistor kernel: CPU: 0 PID: 26225 Comm: wfica
Tainted: P D O 3.10.9eko2 #4
messages.4.gz:Oct 14 20:17:55 maistor kernel: CPU: 1 PID: 390 Comm: kswapd0
Tainted: P D O 3.10.9eko2 #4
messages.4.gz:Oct 15 20:16:45 maistor kernel: CPU: 0 PID: 4058 Comm: Xorg
Tainted: P O 3.10.9eko2 #4
messages.4.gz:Oct 17 10:40:44 maistor kernel: CPU: 1 PID: 6417 Comm:
plugin-containe Tainted: P O 3.10.9eko2 #4
messages.4.gz:Oct 17 12:42:09 maistor kernel: CPU: 1 PID: 390 Comm: kswapd0
Tainted: P D O 3.10.9eko2 #4
messages.4.gz:Oct 17 13:16:14 maistor kernel: CPU: 0 PID: 6108 Comm: kmix
Tainted: P O 3.10.9eko2 #4
messages.4.gz:Oct 17 13:17:33 maistor kernel: CPU: 1 PID: 20690 Comm:
udisks-daemon Tainted: P D O 3.10.9eko2 #4
messages.4.gz:Oct 17 13:17:33 maistor kernel: CPU: 1 PID: 20690 Comm:
udisks-daemon Tainted: P D W O 3.10.9eko2 #4
messages.4.gz:Oct 17 13:56:58 maistor kernel: CPU: 1 PID: 13731 Comm:
plugin-containe Tainted: P O 3.10.9eko2 #4
messages.4.gz:Oct 17 13:57:04 maistor kernel: CPU: 0 PID: 4494 Comm: Xorg
Tainted: P W O 3.10.9eko2 #4
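A long listing like the one above is easier to read as a per-process tally. The following is only a sketch, not something from the thread: the function name `tally_comm` is made up, and it assumes each log entry is on a single line as in the real log files (the listing above is wrapped by the mail client).

```shell
# Tally "CPU: ... Comm: <name> Tainted: ..." header lines per process name.
# Reads log lines on stdin, prints "<count> <name>", most frequent first.
# tally_comm is a hypothetical helper name, not part of any package.
tally_comm() {
    awk '
    {
        for (i = 1; i < NF; i++) {
            if ($i != "Comm:") continue
            name = $(i + 1)
            # Names such as "Socket Thread" contain a space: keep reading
            # until the taint field starts.
            for (j = i + 2; j <= NF && $j != "Tainted:" && $j != "Not"; j++)
                name = name " " $j
            count[name]++
            break
        }
    }
    END { for (n in count) printf "%d %s\n", count[n], n }
    ' | sort -k1,1nr -k2
}

# Example (assumes the rotated logs are in the current directory):
#   zgrep 'Comm:' messages* | tally_comm
```

In the listing above this kind of tally would put Xorg and plugin-containe at the top, which matches the impression that most hits come from graphics-heavy processes.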
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
2013-11-13 20:33 ` Borislav Petkov
@ 2013-11-17 11:35 ` MPhil. Emanoil Kotsev
-1 siblings, 0 replies; 26+ messages in thread
From: MPhil. Emanoil Kotsev @ 2013-11-17 11:35 UTC (permalink / raw)
To: Borislav Petkov; +Cc: intel-gfx, linux-kernel, Daniel Vetter
Hi
I followed your advice and installed a 3.12 kernel (with no other modules on
top that would taint the kernel, like vmware/player).
It turned out I have to enable /proc/acpi (deprecated) and acpi_cpufreq
to get proper support for cooling and frequency scaling.
$ acpi -t
Thermal 0: ok, 50.5 degrees C
$ acpi -c
Cooling 0: Processor 0 of 10
Cooling 1: Processor 0 of 10
Cooling 2: LCD 3 of 7
$ lsmod | grep cpu
cpufreq_ondemand 8085 2
cpufreq_powersave 926 0
cpufreq_performance 930 0
cpufreq_conservative 6305 0
acpi_cpufreq 6955 0
processor 23167 3 acpi_cpufreq
After doing all of this I was able to reproduce the issue by overloading the
system with the following simple steps:
1. start a compilation of something (e.g. the kernel)
2. run another process-hungry application (flashplayer in firefox)
=> the system locks up in about 3-5 minutes
I also noticed that the board gets pretty hot, so in my opinion it locks up
because of a thermal issue.
I think this would also explain why I see errors in different processes
(mostly Xorg), but with 3.12 I do not get any trace messages in the log files.
Could you advise which option should be enabled in the kernel, or how I could
log/trace when the system locks up?
How can I make sure that the cooling/temp works properly?
Perhaps since the upgrade in September the system has been working under
heavier load and therefore I started having the issue, or something broke in
software or hardware and it cannot cool down properly. I don't think the
kernel is the issue, because I see the same with older kernels that were
working fine before.
The fan looks clean and there is no dust or anything else in the cooling area
that would prevent cooling. The physical position of the notebook (docking
station) also did not change.
I don't know where to look or where to start, so any advice is appreciated.
Thanks in advance and kind regards
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
2013-11-17 11:35 ` MPhil. Emanoil Kotsev
@ 2013-11-17 12:07 ` Borislav Petkov
-1 siblings, 0 replies; 26+ messages in thread
From: Borislav Petkov @ 2013-11-17 12:07 UTC (permalink / raw)
To: MPhil. Emanoil Kotsev; +Cc: intel-gfx, linux-kernel, Daniel Vetter
On Sun, Nov 17, 2013 at 12:35:16PM +0100, MPhil. Emanoil Kotsev wrote:
> After doing all of this I was able to reproduce the issue by
> overloading the system with following simple steps:
> 1. start a compilation of something (ex. kernel)
> 2. run another process hungry application (flashplayer in firefox)
> => system locks in about 3-5mins
Ha, so we're getting somewhere :)
> I also noticed that the board gets pretty hot, so in my opinion it
> locks because of thermal issue.
The symptoms we're seeing so far are very much consistent with a thermal
issue.
> I think this also would explain why I see errors at different
> processes (mostly Xorg), but with 3.12 I do not get any trace message
> in the log files. Could you advise which option should be enabled in
> the kernel or how I could log/trace if system locks.
Try enabling CONFIG_LOCKUP_DETECTOR, that could tell us where we're
hanging.
But, make sure to be on a console and not in X in order to get a chance
to see the message. What I do is reroute all log messages to /dev/tty8,
i.e. have
*.* |/dev/tty8
in syslog.conf and switch to it with Ctrl-Alt-F8.
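Before rebuilding, it may be worth checking whether the running kernel already has the option set. This is a minimal sketch: `config_enabled` is a made-up helper name, and it assumes the distribution installs the build configuration as a plain-text file (Debian-style `/boot/config-<version>`).

```shell
# Succeeds if option $2 is built in (=y) in the kernel config file $1.
# config_enabled is a hypothetical helper name.
config_enabled() {
    grep -q "^$2=y\$" "$1"
}

# Example (assumes a Debian-style config file in /boot):
#   if config_enabled "/boot/config-$(uname -r)" CONFIG_LOCKUP_DETECTOR; then
#       echo "lockup detector is built in"
#   else
#       echo "lockup detector is not enabled"
#   fi
```

On kernels built with CONFIG_IKCONFIG_PROC the same information is available in /proc/config.gz, which would need zgrep instead of grep.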
> How can I make sure that the cooling/temp works properly?
>
> Perhaps after upgrading in september the system is working under
What kind of upgrade exactly did you do to a laptop?
> heavier load and therefore I started having the issue, or something
> broke in software or hardware and it can not cool down properly. I
> don't think the kernel is the issue, because I had the same with older
> kernels that were working fine before.
>
> The fan looks clean and there is no dust or whatever in the cooling
> area, that would prevent cooling. The physical position of the
> notebook (docking station) also did not change.
Does the issue happen if the laptop is not in the docking station?
In any case, you need to retrace the steps of your upgrade to get at
least a clue about what causes the overheating.
Can you revert the upgrade and see whether it still happens?
Also, do you have sensors support for your hardware? IOW, can you
monitor the temperature of some hardware elements by running
$ sensors
?
For example, I see this on my box here:
$ sensors
fam15h_power-pci-00c4
Adapter: PCI adapter
power1: 45.64 W (crit = 125.19 W)
k10temp-pci-00c3
Adapter: PCI adapter
temp1: +19.2°C (high = +70.0°C)
(crit = +90.0°C, hyst = +87.0°C)
radeon-pci-0100
Adapter: PCI adapter
temp1: +80.0°C
so when something overheats, running "watch -n 1 sensors" could give
some hints.
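Since the screen output of watch is lost when the machine locks up, one option is to append readings to a file and flush them to disk each time. The sketch below is an illustration under assumptions: it reads the generic sysfs thermal zones (values in millidegrees Celsius) rather than calling sensors, and the function names and the default log file name are made up.

```shell
# Convert a sysfs millidegree reading (e.g. 50500) to degrees C (50.5).
millideg_to_c() {
    LC_ALL=C awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 1000 }'
}

# Append one timestamped line per thermal zone found under $1
# (default /sys/class/thermal) to the log file $2, then sync so the
# last reading has a chance to survive a hard lockup.
log_thermal_once() {
    base=${1:-/sys/class/thermal}
    log=${2:-thermal.log}
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        printf '%s %s %s\n' "$(date +%T)" "${zone##*/}" \
            "$(millideg_to_c "$(cat "$zone/temp")")" >> "$log"
    done
    sync
}

# Example: one reading per second until the machine locks up
#   while :; do log_thermal_once; sleep 1; done
```

After a reboot, the tail of the log file then shows roughly how hot the zone was shortly before the hang.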
Also, what does
$ grep . -EriIn /sys/devices/system/cpu/cpu0/cpufreq
give?
Also, can you connect your laptop to a serial or netconsole to collect
dmesg before and while the lockup happens?
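For the netconsole route, the module takes a single parameter naming the local port, address and interface and the remote port, address and MAC. The helper below is only a sketch that assembles that string; the function name is made up and every address in the example is a placeholder to be replaced with real values.

```shell
# Build the netconsole= module parameter in the form
#   <src-port>@<src-ip>/<dev>,<tgt-port>@<tgt-ip>/<tgt-mac>
# netconsole_param is a hypothetical helper name.
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Example on the laptop (all addresses below are placeholders):
#   modprobe netconsole \
#       "$(netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20 00:11:22:33:44:55)"
# On the receiving machine, something like the following collects the
# messages (the exact flags differ between netcat variants):
#   nc -u -l 6666
```

Kernel messages then travel over UDP to the second machine, so whatever is printed just before the lockup is not lost with the local disk or display.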
Basically, we're looking for a hint about which part of the hw causes
the overheating...
HTH.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
2013-11-17 12:07 ` Borislav Petkov
@ 2013-11-17 14:45 ` MPhil. Emanoil Kotsev
-1 siblings, 0 replies; 26+ messages in thread
From: MPhil. Emanoil Kotsev @ 2013-11-17 14:45 UTC (permalink / raw)
To: Borislav Petkov; +Cc: intel-gfx, linux-kernel, Daniel Vetter
Hi,
On Sunday 17 November 2013 13:07:34 Borislav Petkov wrote:
> On Sun, Nov 17, 2013 at 12:35:16PM +0100, MPhil. Emanoil Kotsev wrote:
> > After doing all of this I was able to reproduce the issue by
> > overloading the system with following simple steps:
> > 1. start a compilation of something (ex. kernel)
> > 2. run another process hungry application (flashplayer in firefox)
> > => system locks in about 3-5mins
>
> Ha, so we're getting somewhere :)
yes looks like :)
>
> > I also noticed that the board gets pretty hot, so in my opinion it
> > locks because of thermal issue.
>
> The symptoms we're seeing so far are very much consistent with a thermal
> issue.
That is true as well - which makes me sad, as the notebook has been working
great for the past 7 years.
>
> > I think this also would explain why I see errors at different
> > processes (mostly Xorg), but with 3.12 I do not get any trace message
> > in the log files. Could you advise which option should be enabled in
> > the kernel or how I could log/trace if system locks.
>
> Try enabling CONFIG_LOCKUP_DETECTOR, that could tell us where we're
> hanging.
>
> But, make sure to be on a console and not in X in order to get a chance
> to see the message. What I do is reroute all log messages to /dev/tty8,
> i.e. have
>
> *.* |/dev/tty8
>
> in syslog.conf and switch to it with Ctrl-Alt-F8.
Thanks for the advice. I'll do so.
>
> > How can I make sure that the cooling/temp works properly?
> >
> > Perhaps after upgrading in september the system is working under
>
> What kind of upgrade exactly did you do to a laptop?
I was using Debian squeeze with the Trinity desktop (KDE 3.5.10) and upgraded
to Debian wheezy with TDE (3.5.13).
>
> > heavier load and therefore I started having the issue, or something
> > broke in software or hardware and it can not cool down properly. I
> > don't think the kernel is the issue, because I had the same with older
> > kernels that were working fine before.
> >
> > The fan looks clean and there is no dust or whatever in the cooling
> > area, that would prevent cooling. The physical position of the
> > notebook (docking station) also did not change.
>
> Does the issue happen if the laptop is not in the docking station?
I wanted to test this, but as I would have to replug a lot I haven't done it
so far, also because it has been working with this docking station for the
past 2 years.
>
> In any case, you need to follow your steps back of the upgrade to have
> at least a clue what causes the overheating.
>
> Can you revert the upgrade and see whether it still happens?
This would be hard - not impossible, as I have a backup, but it would be
time-consuming.
>
> Also, do you have sensors support for your hardware? IOW, can you
> monitor the temperature of some hardware elements by running
>
> $ sensors
$ sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +47.5°C (crit = +126.0°C)
>
> ?
>
> For example, I see this on my box here:
>
> $ sensors
> fam15h_power-pci-00c4
> Adapter: PCI adapter
> power1: 45.64 W (crit = 125.19 W)
>
> k10temp-pci-00c3
> Adapter: PCI adapter
> temp1: +19.2°C (high = +70.0°C)
> (crit = +90.0°C, hyst = +87.0°C)
>
> radeon-pci-0100
> Adapter: PCI adapter
> temp1: +80.0°C
>
> so when something overheats, running "watch -n 1 sensors" could give
> some hints.
>
> Also, what does
>
> $ grep . -EriIn /sys/devices/system/cpu/cpu0/cpufreq
>
> give?
grep . -EriIn /sys/devices/system/cpu/cpu0/cpufreq
/sys/devices/system/cpu/cpu0/cpufreq/bios_limit:1:2000000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:1:ondemand
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency:1:10000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies:1:2000000
1667000 1333000 1000000
/sys/devices/system/cpu/cpu0/cpufreq/freqdomain_cpus:1:0 1
/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:1:acpi-cpufreq
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:1:1000000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors:1:ondemand
powersave performance conservative userspace
/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:1:1000000
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq:1:2000000
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq:1:1000000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:1:2000000
/sys/devices/system/cpu/cpu0/cpufreq/affected_cpus:1:0
/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:1:1000000
/sys/devices/system/cpu/cpu0/cpufreq/related_cpus:1:0
/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed:1:<unsupported>
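The frequency values in this listing are in kHz, so the CPU is sitting at 1000 MHz out of a possible 2000 MHz under the ondemand governor. A small sketch for watching that during the stress test follows; `cpufreq_summary` is a made-up name, and it reads only the three sysfs files that appear in the listing above.

```shell
# Print "<governor>: <current> of <max> MHz" for one cpufreq directory
# (default: cpu0). sysfs reports the frequencies in kHz.
# cpufreq_summary is a hypothetical helper name.
cpufreq_summary() {
    dir=${1:-/sys/devices/system/cpu/cpu0/cpufreq}
    cur=$(cat "$dir/scaling_cur_freq")
    max=$(cat "$dir/scaling_max_freq")
    gov=$(cat "$dir/scaling_governor")
    echo "$gov: $((cur / 1000)) of $((max / 1000)) MHz"
}

# Example: refresh once per second while the compile job runs
#   while :; do cpufreq_summary; sleep 1; done
```

If scaling_max_freq drops below 2000 MHz under load, that would suggest something is limiting the CPU, for instance passive thermal cooling.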
>
> Also, can you connect your laptop to a serial or netconsole to collect
> dmesg before and while the lockup happens?
I could try this. I guess it means I need another machine running in
parallel, but that can be arranged with a little effort.
>
> Basically, we're looking for a hint about which part of the hw causes
> the overheating...
>
> HTH.
Thanks for the hints. As I have never had to deal with overheating or similar
issues, your help is very valuable to me. Unfortunately we have a small child
on board and time is limited :) to a couple of hours a day when I can work at
home, which means even less time for debugging. But I never give up. I just
want to be sure that it is not a hardware issue.
Thanks again and kind regards. I'll post when I have some useful input.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
2013-11-17 14:45 ` MPhil. Emanoil Kotsev
@ 2013-11-17 15:06 ` Borislav Petkov
-1 siblings, 0 replies; 26+ messages in thread
From: Borislav Petkov @ 2013-11-17 15:06 UTC (permalink / raw)
To: MPhil. Emanoil Kotsev; +Cc: intel-gfx, linux-kernel, Daniel Vetter
On Sun, Nov 17, 2013 at 03:45:34PM +0100, MPhil. Emanoil Kotsev wrote:
> this is also true - which makes me sad as the notebook was working
> great in the past 7y
Hmm, maybe it is heading slowly for the eternal hunting fields... :-)
> > What kind of upgrade exactly did you do to a laptop?
>
> I was using debian squeeze with trinity desktop (KDE 3.5.10) and upgraded to
> debian wheezy with TDE (3.5.13)
Oh ok, so I thought you were talking about a hw upgrade, like adding
more RAM, new hdd, etc.
Ok, can you try this: boot without X and try overloading the machine on
the console, i.e. do
while true; do make clean && make -j64; done
or similar in your kernel repository. Does it trigger then?
Although I can't imagine how a software upgrade would cause the
overheating... :-\.
> > Can you revert the upgrade and see whether it still happens?
> This would be hard - not impossible, as I have a backup, but it will be
> time consuming
You could try booting a distro from a livecd and see whether anything changes there...
> $ sensors
> acpitz-virtual-0
> Adapter: Virtual device
> temp1: +47.5°C (crit = +126.0°C)
That's some ACPI thermal zone thing. So what happens if you do
$ watch -n 1 sensors
and you incur the load? Do you hit the critical temperature?
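A rough sketch of how the comparison against the critical temperature could be scripted rather than read off `watch` by eye. It assumes the ACPI thermal zone is exposed as a directory such as /sys/class/thermal/thermal_zone0 and that it has a trip point of type "critical"; both vary per machine. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: zone path and function names are assumptions, not from the thread.

# Print the critical trip temperature (millidegrees C) of a thermal zone directory.
# Returns non-zero if the zone has no trip point of type "critical".
crit_trip() {
    zone=$1
    for t in "$zone"/trip_point_*_type; do
        [ -r "$t" ] || continue
        if [ "$(cat "$t")" = "critical" ]; then
            cat "${t%_type}_temp"
            return 0
        fi
    done
    return 1
}

# Remaining margin in whole degrees C.
# Usage: headroom_c <current_millidegrees> <critical_millidegrees>
headroom_c() {
    echo "$(( ($2 - $1) / 1000 ))"
}

# Example use on a live system (not executed here):
#   zone=/sys/class/thermal/thermal_zone0
#   headroom_c "$(cat "$zone/temp")" "$(crit_trip "$zone")"
```

With the readings quoted above (47.5°C current, 126.0°C critical), `headroom_c 47500 126000` prints 78, so a lockup at that reading would not be explained by this sensor crossing its critical trip point.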
> grep . -EriIn /sys/devices/system/cpu/cpu0/cpufreq
> /sys/devices/system/cpu/cpu0/cpufreq/bios_limit:1:2000000
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:1:ondemand
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency:1:10000
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies:1:2000000 1667000 1333000 1000000
> /sys/devices/system/cpu/cpu0/cpufreq/freqdomain_cpus:1:0 1
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:1:acpi-cpufreq
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:1:1000000
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors:1:ondemand powersave performance conservative userspace
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:1:1000000
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq:1:2000000
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq:1:1000000
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:1:2000000
> /sys/devices/system/cpu/cpu0/cpufreq/affected_cpus:1:0
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:1:1000000
> /sys/devices/system/cpu/cpu0/cpufreq/related_cpus:1:0
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed:1:<unsupported>
Yeah, I don't see anything wrong with that output.
> I could try this. I guess this assumes I have to have another machine
> running in parallel, but this can be arranged with a little effort
Yep.
> Thanks for the hints. As I never had to deal with overheating or
> similar issues, your help is very precious to me. Unfortunately we
> have a little child on board and time is limited :) to a couple of
> hours daily, where I can work at home which means even less time for
> debugging. But I never give up. I just want to be sure that it is not
> a hardware issue
No worries, take care of the child first - the laptop and everyone else
can wait :-)
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
2013-11-17 15:06 ` Borislav Petkov
@ 2013-11-17 16:45 ` MPhil. Emanoil Kotsev
-1 siblings, 0 replies; 26+ messages in thread
From: MPhil. Emanoil Kotsev @ 2013-11-17 16:45 UTC (permalink / raw)
To: Borislav Petkov; +Cc: intel-gfx, linux-kernel, Daniel Vetter
Hi
On Sunday 17 November 2013 16:06:07 you wrote:
> On Sun, Nov 17, 2013 at 03:45:34PM +0100, MPhil. Emanoil Kotsev wrote:
> > this is also true - which makes me sad as the notebook was working
> > great in the past 7y
>
> Hmm, maybe it is heading slowly for the eternal hunting fields... :-)
Maybe, but I am a bit of an academic, so until it is 100% proven I doubt it, which
does not mean that I cannot purchase a new one :)
>
> > > What kind of upgrade exactly did you do to a laptop?
> >
> > I was using debian squeeze with trinity desktop (KDE 3.5.10) and upgraded
> > to debian wheezy with TDE (3.5.13)
>
> Oh ok, so I thought you were talking about a hw upgrade, like adding
> more RAM, new hdd, etc.
>
> Ok, can you try this: boot without X and try overloading the machine on
> the console, i.e. do
>
> while true; do make clean && make -j64; done
>
> or similar in your kernel repository. Does it trigger then?
I'll try - I'm also curious what will happen!
>
> Although I can't imagine how a software upgrade would cause the
> overheating... :-\.
How? New libraries, more demanding algorithms, higher CPU usage, etc. These are
some of the things M$ is doing on purpose to force you to upgrade your hardware
every 2-3 years.
>
> > > Can you revert the upgrade and see whether it still happens?
> >
> > This would be hard - not impossible, as I have a backup, but it will be
> > time consuming
>
> You could try booting a distro from a livecd and see whether anything changes there...
>
> > $ sensors
> > acpitz-virtual-0
> > Adapter: Virtual device
> > temp1: +47.5°C (crit = +126.0°C)
>
> That's some ACPI thermal zone thing. So what happens if you do
>
> $ watch -n 1 sensors
>
> and you incur the load? Do you hit the critical temperature?
I wanted to first compile the kernel with the debug option you mentioned, but
while compiling it went to about 75°C.
>
> > grep . -EriIn /sys/devices/system/cpu/cpu0/cpufreq
> > /sys/devices/system/cpu/cpu0/cpufreq/bios_limit:1:2000000
> > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:1:ondemand
> > /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency:1:10000
> > /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies:1:2000000 1667000 1333000 1000000
> > /sys/devices/system/cpu/cpu0/cpufreq/freqdomain_cpus:1:0 1
> > /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:1:acpi-cpufreq
> > /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:1:1000000
> > /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors:1:ondemand powersave performance conservative userspace
> > /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:1:1000000
> > /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq:1:2000000
> > /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq:1:1000000
> > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:1:2000000
> > /sys/devices/system/cpu/cpu0/cpufreq/affected_cpus:1:0
> > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:1:1000000
> > /sys/devices/system/cpu/cpu0/cpufreq/related_cpus:1:0
> > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed:1:<unsupported>
>
> Yeah, I don't see anything wrong with that output.
yes looks nice
>
> > I could try this. I guess this assumes I have to have another machine
> > running in parallel, but this can be arranged with a little effort
>
> Yep.
>
> > Thanks for the hints. As I never had to deal with overheating or
> > similar issues, your help is very precious to me. Unfortunately we
> > have a little child on board and time is limited :) to a couple of
> > hours daily, where I can work at home which means even less time for
> > debugging. But I never give up. I just want to be sure that it is not
> > a hardware issue
>
> No worries, take care of the child first - the laptop and everyone else
> can wait :-)
yes - we do load balancing with my wife :)
I'll post back with some data (I hope)
regards
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
2013-11-17 16:45 ` MPhil. Emanoil Kotsev
@ 2013-11-17 20:05 ` Borislav Petkov
-1 siblings, 0 replies; 26+ messages in thread
From: Borislav Petkov @ 2013-11-17 20:05 UTC (permalink / raw)
To: MPhil. Emanoil Kotsev; +Cc: intel-gfx, linux-kernel, Daniel Vetter
On Sun, Nov 17, 2013 at 05:45:18PM +0100, MPhil. Emanoil Kotsev wrote:
> How? New libraries, more demanding algorithms, higher CPU usage,
> etc. These are some of the things M$ is doing on purpose to force you
> to upgrade your hardware every 2-3 years.
That would be too easy and machines would be dying left and right of
overheating. Actually, sane hardware is much more robust than that and
it throttles itself in case of critical temperature levels. And, IMHO
your Dell Latitude D520 should be fine, in that respect. But we'll see.
:-)
> I wanted to first compile the kernel with the debug option you
> mentioned, but while compiling it went to about 75°C.
Yeah, that's still ok if we trust the output saying that 126°C is the
critical temp.
It would be interesting to see what this sensor says right before the
machine locks up.
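One possible way to capture that last reading, sketched under the assumption that the sensor is readable from sysfs in millidegrees Celsius (for example /sys/class/thermal/thermal_zone0/temp). The log path and the function names are hypothetical. Each sample is flushed with `sync` so the final lines have a chance of surviving a hard lockup.

```shell
#!/bin/sh
# Sketch only: paths and function names are assumptions, not from the thread.

# Convert a sysfs reading in millidegrees C to degrees with one decimal place.
# (Does not handle negative readings.)
fmt_temp() {
    mc=$1
    printf '%d.%d\n' "$(( mc / 1000 ))" "$(( (mc % 1000) / 100 ))"
}

# Append one timestamped reading per second to a log file and flush it to disk.
# Usage: log_temp <sysfs-temp-file> <log-file>   (runs until interrupted)
log_temp() {
    src=$1; out=$2
    while :; do
        printf '%s %s\n' "$(date +%T)" "$(fmt_temp "$(cat "$src")")" >> "$out"
        sync
        sleep 1
    done
}

# Example (not executed here); start it before generating the load:
#   log_temp /sys/class/thermal/thermal_zone0/temp /var/tmp/temp.log &
```

After the next lockup and reboot, the tail of the log shows how close the sensor got to the 126°C critical value. Calling `sync` every second is heavy-handed and only meant for a short debugging session.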
HTH.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
2013-11-17 20:05 ` Borislav Petkov
@ 2013-11-19 9:21 ` MPhil. Emanoil Kotsev
-1 siblings, 0 replies; 26+ messages in thread
From: MPhil. Emanoil Kotsev @ 2013-11-19 9:21 UTC (permalink / raw)
To: Borislav Petkov; +Cc: intel-gfx, linux-kernel, Daniel Vetter
Hi
On Sunday 17 November 2013 21:05:46 Borislav Petkov wrote:
> On Sun, Nov 17, 2013 at 05:45:18PM +0100, MPhil. Emanoil Kotsev wrote:
> > How? New libraries, more demanding algorithms, higher CPU usage,
> > etc. These are some of the things M$ is doing on purpose to force you
> > to upgrade your hardware every 2-3 years.
>
> That would be too easy and machines would be dying left and right of
> overheating. Actually, sane hardware is much more robust than that and
> it throttles itself in case of critical temperature levels. And, IMHO
> your Dell Latitude D520 should be fine, in that respect. But we'll see.
>
I was thinking the same - but started to despair
> :-)
>
> > I wanted to first compile the kernel with the debug option you
> > mentioned, but while compiling it went to about 75°C.
>
> Yeah, that's still ok if we trust the output saying that 126°C is the
> critical temp.
>
> It would be interesting to see what this sensor says right before the
> machine locks up.
This test is still outstanding; it has to wait for a moment when I have more free
time to reproduce and log everything.
I did something else yesterday evening before going to bed at ~00:30:
I closed the notebook cover just so that it would switch off the LCD display.
In the morning I opened it up and found the notebook with blinking LED lights.
http://www.dell.com/support/troubleshooting/us/en/19/KCS/KcsArticles/ArticleView?c=us&l=en&s=dhs&docid=DSN_DBECF64CFEDA449398CB9E859D4944A5
Unfortunately I can't find the pattern in the link above:
the left one was on and the other two were blinking.
After shutting down (keeping the power button pressed) and turning it on again,
only the two LEDs (middle and right) were blinking, which according to the link
above means "Configuring PCI bridges" / "Replacing the system board".
After waiting for about 1-2 minutes the notebook starts normally - another hint
pointing to heating issues.
At the moment I have a lot to do on pretty much all levels, so I cannot test any
further.
This is just an update. I'll post again when more results are available.
I'm thinking of opening it up and inspecting it from the inside - perhaps the
cooling system is clogged somewhere or something.
regards
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
2013-11-17 20:05 ` Borislav Petkov
@ 2013-12-18 20:59 ` MPhil. Emanoil Kotsev
-1 siblings, 0 replies; 26+ messages in thread
From: MPhil. Emanoil Kotsev @ 2013-12-18 20:59 UTC (permalink / raw)
To: Borislav Petkov; +Cc: intel-gfx, linux-kernel, Daniel Vetter
Hi again,
Sorry for writing after such a long silence, but I was busy with a
project (and with family as well).
On Sunday 17 November 2013 21:05:46 you wrote:
> On Sun, Nov 17, 2013 at 05:45:18PM +0100, MPhil. Emanoil Kotsev wrote:
> > How? New libraries, more demanding algorithms, higher CPU usage,
> > etc. These are some of the things M$ is doing on purpose to force you
> > to upgrade your hardware every 2-3 years.
>
> That would be too easy and machines would be dying left and right of
> overheating. Actually, sane hardware is much more robust than that and
> it throttles itself in case of critical temperature levels. And, IMHO
> your Dell Latitude D520 should be fine, in that respect. But we'll see.
>
I was able to solve the issue by removing some of the modules I had in
xorg.conf.
I noticed that it is not the CPU that is overheating, but rather the
video/graphics card. The area around the "Dell" logo on the front of the
display is still pretty hot, but the system seems to be working fine now and I
cannot reproduce the issue any more.
Someone might ask why I'm using xorg.conf at all. The reason is that without
it X automatically loads the GL driver for 3D support and I am not able to
use a second display.
Perhaps it is worth trying the latest intel driver as suggested before. However,
with the current one it is working fine, so I would consider the issue
solved.
I would like to thank you for your precious support and ideas once again.
regards
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Intel-gfx] kernel 3.11.6 general protection fault
2013-12-18 20:59 ` MPhil. Emanoil Kotsev
@ 2013-12-18 21:22 ` Borislav Petkov
-1 siblings, 0 replies; 26+ messages in thread
From: Borislav Petkov @ 2013-12-18 21:22 UTC (permalink / raw)
To: MPhil. Emanoil Kotsev; +Cc: intel-gfx, linux-kernel, Daniel Vetter
On Wed, Dec 18, 2013 at 09:59:22PM +0100, MPhil. Emanoil Kotsev wrote:
> I was able to solve the issue by removing some of the modules I had in
> xorg.conf. I noticed that it is not the cpu that is overheating, but
> rather the video/graphics card. The area around the "Dell" logo on the
> front of the display is still pretty hot, but the system seems to be
> working fine now and I cannot reproduce the issue any more.
Interesting. Which module was that?
It was probably making your GPU go nuts. The more interesting question
is whether this module would behave on your machine normally and only
some buggy incarnation of it would cause the overheating... I.e., it
could be you upgraded X and with the new version the issue started
appearing. Fun.
> I would like to thank you for your precious support and ideas once
> again.
Sure, you're welcome!
:-)
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2013-12-18 21:22 UTC | newest]
Thread overview: 26+ messages
2013-11-13 19:58 kernel 3.11.6 general protection fault MPhil. Emanoil Kotsev
2013-11-13 19:58 ` MPhil. Emanoil Kotsev
2013-11-13 20:09 ` [Intel-gfx] " Daniel Vetter
2013-11-13 20:09 ` Daniel Vetter
2013-11-13 20:33 ` [Intel-gfx] " Borislav Petkov
2013-11-13 20:33 ` Borislav Petkov
2013-11-13 21:19 ` [Intel-gfx] " MPhil. Emanoil Kotsev
2013-11-13 21:19 ` MPhil. Emanoil Kotsev
2013-11-17 11:35 ` [Intel-gfx] " MPhil. Emanoil Kotsev
2013-11-17 11:35 ` MPhil. Emanoil Kotsev
2013-11-17 12:07 ` [Intel-gfx] " Borislav Petkov
2013-11-17 12:07 ` Borislav Petkov
2013-11-17 14:45 ` [Intel-gfx] " MPhil. Emanoil Kotsev
2013-11-17 14:45 ` MPhil. Emanoil Kotsev
2013-11-17 15:06 ` [Intel-gfx] " Borislav Petkov
2013-11-17 15:06 ` Borislav Petkov
2013-11-17 16:45 ` [Intel-gfx] " MPhil. Emanoil Kotsev
2013-11-17 16:45 ` MPhil. Emanoil Kotsev
2013-11-17 20:05 ` [Intel-gfx] " Borislav Petkov
2013-11-17 20:05 ` Borislav Petkov
2013-11-19 9:21 ` [Intel-gfx] " MPhil. Emanoil Kotsev
2013-11-19 9:21 ` MPhil. Emanoil Kotsev
2013-12-18 20:59 ` [Intel-gfx] " MPhil. Emanoil Kotsev
2013-12-18 20:59 ` MPhil. Emanoil Kotsev
2013-12-18 21:22 ` [Intel-gfx] " Borislav Petkov
2013-12-18 21:22 ` Borislav Petkov