* v4.20-rc1: list_del corruption on thinkpad x220
From: Pavel Machek @ 2018-11-08 17:58 UTC
To: kernel list, tglx, mingo, bp, hpa, x86
Cc: jani.nikula, joonas.lahtinen, rodrigo.vivi, intel-gfx

Hi!

My machine locked hard (thinkpad x220). After reboot, I found this in
syslog:

Sounds like memory corruption..? Does not sound easy to debug.

...otoh, it still looks like an address, so maybe it is "just" a race in
GPU drivers?

Any ideas?
Pavel

Nov 8 18:35:01 duo CRON[28511]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 8 18:42:57 duo kernel: list_del corruption. prev->next should be ffff8801742b8178, but was ffffc9000192fec8
Nov 8 18:42:57 duo kernel: ------------[ cut here ]------------
Nov 8 18:42:57 duo kernel: kernel BUG at /data/fast/l/k/lib/list_debug.c:53!
Nov 8 18:42:57 duo kernel: invalid opcode: 0000 [#1] SMP PTI
Nov 8 18:42:57 duo kernel: CPU: 2 PID: 1082 Comm: i915/signal:1 Not tainted 4.20.0-rc1+ #3
Nov 8 18:42:57 duo kernel: Hardware name: LENOVO 42872WU/42872WU, BIOS 8DET74WW (1.44 ) 03/13/2018
Nov 8 18:42:57 duo kernel: RIP: 0010:__list_del_entry_valid+0x8e/0x90
Nov 8 18:42:57 duo kernel: Code: 66 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 90 74 5e 85 e8 53 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 c8 74 5e 85 e8 40 88 d1 ff <0f> 0b 55 48 89 d0 48 8b 52 08 48 89 e5 48 39 f2 75 19 48 8b 32 48
Nov 8 18:42:57 duo kernel: RSP: 0000:ffffc9000196be78 EFLAGS: 00210086
Nov 8 18:42:57 duo kernel: RAX: 0000000000000054 RBX: ffff8801742b8178 RCX: 0000000000000000
Nov 8 18:42:57 duo kernel: RDX: 0000000000000000 RSI: ffff88019e2a53d8 RDI: ffff88019e2a53d8
Nov 8 18:42:57 duo kernel: RBP: ffffc9000196be78 R08: ffff880196e2cd10 R09: 0000000000000000
Nov 8 18:42:57 duo kernel: R10: 00000000e7684eb9 R11: 3863656632393101 R12: ffffc9000196bec8
Nov 8 18:42:57 duo kernel: R13: ffff88019707e000 R14: ffff8801742b8080 R15: ffffc9000192fdd0
Nov 8 18:42:57 duo kernel: FS: 0000000000000000(0000) GS:ffff88019e280000(0000) knlGS:0000000000000000
Nov 8 18:42:57 duo kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 8 18:42:57 duo kernel: CR2: 00000000ed2bf000 CR3: 000000000581e001 CR4: 00000000000606a0
Nov 8 18:42:57 duo kernel: Call Trace:
Nov 8 18:42:57 duo kernel: intel_breadcrumbs_signaler+0x162/0x330
Nov 8 18:42:57 duo kernel: kthread+0x116/0x150
Nov 8 18:42:57 duo kernel: ? intel_engine_wakeup+0x40/0x40
Nov 8 18:42:57 duo kernel: ? kthread_park+0x90/0x90
Nov 8 18:42:57 duo kernel: ret_from_fork+0x35/0x40
Nov 8 18:42:57 duo kernel: Modules linked in:
Nov 8 18:42:57 duo kernel: ---[ end trace 2f8da183a56f80f6 ]---
Nov 8 18:42:57 duo kernel: RIP: 0010:__list_del_entry_valid+0x8e/0x90
Nov 8 18:42:57 duo kernel: Code: 66 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 90 74 5e 85 e8 53 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 c8 74 5e 85 e8 40 88 d1 ff <0f> 0b 55 48 89 d0 48 8b 52 08 48 89 e5 48 39 f2 75 19 48 8b 32 48
Nov 8 18:42:57 duo kernel: RSP: 0000:ffffc9000196be78 EFLAGS: 00210086
Nov 8 18:42:57 duo kernel: RAX: 0000000000000054 RBX: ffff8801742b8178 RCX: 0000000000000000
Nov 8 18:42:57 duo kernel: RDX: 0000000000000000 RSI: ffff88019e2a53d8 RDI: ffff88019e2a53d8
Nov 8 18:42:57 duo kernel: RBP: ffffc9000196be78 R08: ffff880196e2cd10 R09: 0000000000000000
Nov 8 18:42:57 duo kernel: R10: 00000000e7684eb9 R11: 3863656632393101 R12: ffffc9000196bec8
Nov 8 18:42:57 duo kernel: R13: ffff88019707e000 R14: ffff8801742b8080 R15: ffffc9000192fdd0
Nov 8 18:42:57 duo kernel: FS: 0000000000000000(0000) GS:ffff88019e280000(0000) knlGS:0000000000000000
Nov 8 18:42:57 duo kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 8 18:42:57 duo kernel: CR2: 00000000ed2bf000 CR3: 000000000581e001 CR4: 00000000000606a0

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
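For readers not familiar with the first line of the oops: with list debugging (CONFIG_DEBUG_LIST) enabled, the kernel checks an entry's neighbours before unlinking it and complains if `prev->next` or `next->prev` no longer points back at the entry. Below is a minimal Python model of that invariant, offered as a sketch only. It is not the kernel's C code, and `Node`, `list_add` and `check_del` are hypothetical names made up for this illustration.

```python
class Node:
    """Stand-in for the kernel's struct list_head (circular, doubly linked)."""

    def __init__(self, name):
        self.name = name
        self.prev = self
        self.next = self


def list_add(new, head):
    """Insert `new` right after `head`."""
    new.prev = head
    new.next = head.next
    head.next.prev = new
    head.next = new


def check_del(entry):
    """Return None if `entry` can be unlinked safely, otherwise a string
    describing the broken invariant (loosely modelled on the syslog text)."""
    if entry.prev.next is not entry:
        return "prev->next should be %s, but was %s" % (
            entry.name, entry.prev.next.name)
    if entry.next.prev is not entry:
        return "next->prev should be %s, but was %s" % (
            entry.name, entry.next.prev.name)
    return None


head, a, b = Node("head"), Node("a"), Node("b")
list_add(a, head)  # head <-> a
list_add(b, a)     # head <-> a <-> b

ok = check_del(b)  # the list is consistent at this point

# Simulate a racing writer (hypothetical) that relinks a.next
# without updating b; b's predecessor no longer points at b.
stray = Node("stray")
a.next = stray
broken = check_del(b)

print(ok)
print(broken)
```

In this model the second check fails because something rewrote the predecessor's `next` pointer behind the entry's back. That is the same class of failure the report shows, whether the real cause is a race or stray memory corruption.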
* Re: v4.20-rc1: list_del corruption on thinkpad x220
From: Joonas Lahtinen @ 2018-11-21 11:19 UTC
To: Pavel Machek, bp, hpa, kernel list, mingo, tglx, x86
Cc: jani.nikula, rodrigo.vivi, intel-gfx, chris

+ Chris

Quoting Pavel Machek (2018-11-08 19:58:03)
> Hi!
>
> My machine locked hard (thinkpad x220). After reboot, I found this in
> syslog:
>
> Sounds like memory corruption..? Does not sound easy to debug.

Were you doing something GPU intensive when you experienced the hard hang?

And if so, have you been able to hit the issue more than once? At this
point it doesn't look like anything we've hit previously, so it would be
great to have some more insight into how we could reproduce it.

There's a similar one for nouveau in Bugzilla, but it seems like a genuine
memory corruption (1 bit flipped):

https://bugs.freedesktop.org/show_bug.cgi?id=84880

Any extra information would be of use :)

Regards, Joonas

PS. Could you open a bug in Bugzilla, it'll help to collect the
information in one consolidated place:

https://01.org/linuxgraphics/documentation/how-report-bugs
* Re: v4.20-rc1: list_del corruption on thinkpad x220
From: Pavel Machek @ 2018-11-21 11:54 UTC
To: Joonas Lahtinen
Cc: bp, hpa, kernel list, mingo, tglx, x86, jani.nikula, rodrigo.vivi, intel-gfx, chris

Hi!

> > My machine locked hard (thinkpad x220). After reboot, I found this in
> > syslog:
> >
> > Sounds like memory corruption..? Does not sound easy to debug.
>
> Were you doing something GPU intense when you experienced the hard hang?
>
> And if so, have you been able to hit the issue more than once? At this
> point it doesn't look like anything we've hit previously, so would be
> great to have some more insight into how we could reproduce.

I have seen another crash since then, but I don't think it counts as
"easily reproducible".

I may have been running flightgear at that point. That's fairly GPU
intensive.

> There's one similar for nouveau in Bugzilla, but it seems like a genuine
> memory corruption (1 bit flipped):
>
> https://bugs.freedesktop.org/show_bug.cgi?id=84880
>
> Any extra information would be of use :)
>
> Regards, Joonas
>
> PS. Could you open a bug to Bugzilla, it'll help to collect the
> information in one consolidated place:
>
> https://01.org/linuxgraphics/documentation/how-report-bugs

I prefer email... certainly for bugs that can't be reproduced.

Best regards,
Pavel
* Re: v4.20-rc1: list_del corruption on thinkpad x220
From: Joonas Lahtinen @ 2018-11-23 8:17 UTC
To: Pavel Machek
Cc: bp, hpa, kernel list, mingo, tglx, x86, jani.nikula, rodrigo.vivi, intel-gfx, chris

Quoting Pavel Machek (2018-11-21 13:54:49)
> Hi!
>
> > > My machine locked hard (thinkpad x220). After reboot, I found this in
> > > syslog:
> > >
> > > Sounds like memory corruption..? Does not sound easy to debug.
> >
> > Were you doing something GPU intense when you experienced the hard hang?
> >
> > And if so, have you been able to hit the issue more than once? At this
> > point it doesn't look like anything we've hit previously, so would be
> > great to have some more insight into how we could reproduce.
>
> I have seen another crash since then, but I don't think it counts as
> "easily reproducible".
>
> I may have been running flightgear at that point. That's fairly GPU
> intensive.
>
> > There's one similar for nouveau in Bugzilla, but it seems like a genuine
> > memory corruption (1 bit flipped):
> >
> > https://bugs.freedesktop.org/show_bug.cgi?id=84880
> >
> > Any extra information would be of use :)
> >
> > Regards, Joonas
> >
> > PS. Could you open a bug to Bugzilla, it'll help to collect the
> > information in one consolidated place:
> >
> > https://01.org/linuxgraphics/documentation/how-report-bugs
>
> I prefer email... certainly for bugs that can't be reproduced.

By adding it to Bugzilla it may be recognized by somebody else
who is experiencing a similar issue. Internet points are not deducted
for submitting bugs in good faith, even if they get closed as
NOTABUG.

It sounds like you've hit the same signature twice, so it may very well
be reproducible. Does flightgear have some demo mode where you could
leave it running a heavy scene overnight?

Were you running a 4.19 kernel previously, a distro one or vanilla? A full
dmesg from a boot would be appreciated (from a kernel where you didn't
experience issues, and from one where you do).

We actually have a well defined process and personnel to look into the
Bugzilla entries, so it'd still be helpful to have this logged in
Bugzilla.

Regards, Joonas
* Re: v4.20-rc1: list_del corruption on thinkpad x220
From: Pavel Machek @ 2018-11-24 15:23 UTC
To: Joonas Lahtinen
Cc: bp, hpa, kernel list, mingo, tglx, x86, jani.nikula, rodrigo.vivi, intel-gfx, chris

Hi!

> > > There's one similar for nouveau in Bugzilla, but it seems like a genuine
> > > memory corruption (1 bit flipped):
> > >
> > > https://bugs.freedesktop.org/show_bug.cgi?id=84880
> > >
> > > Any extra information would be of use :)
> > >
> > > Regards, Joonas
> > >
> > > PS. Could you open a bug to Bugzilla, it'll help to collect the
> > > information in one consolidated place:
> > >
> > > https://01.org/linuxgraphics/documentation/how-report-bugs
> >
> > I prefer email... certainly for bugs that can't be reproduced.
>
> By adding it to the Bugzilla it may be recognized by somebody else
> who is experiencing a similar issue. Internet points are not deducted
> for submitting bugs in good faith, even if they get closed as
> NOTABUG.

Feel free to copy from email to bugzilla :-).

> It sounds like you've hit the same signature twice, so it may very well
> be reproducible. Does flightgear have some demo mode where you could
> leave it running a heavy scene overnight?

I'm not sure if it was same signature twice. I had two lockups, but
IIRC only investigated one.

Not really a demo mode. I can put plane on autopilot, but eventually
gas runs out. (And I guess window needs to be visible for test to be
effective.) I tried today, but it did not crash.

Do you have something else I could run to do the testing?

> Were you running 4.19 kernel previously, distro one or vanilla? A full
> dmesg from a boot would be appreciated (from kernel where you didn't
> experience issues, and from one where you do).

Recent kernels I'm running are self-compiled.

> We actually have a well defined process and personnel to look into the
> Bugzilla entries, so it'd still be helpful to have this logged to
> Bugzilla.

If I can reproduce it, it makes sense to create bugzilla entry.

Best regards,
Pavel
* Re: v4.20-rc1: list_del corruption on thinkpad x220, graphics related?
From: Pavel Machek @ 2018-12-08 11:13 UTC
To: Joonas Lahtinen
Cc: bp, hpa, kernel list, mingo, tglx, x86, jani.nikula, rodrigo.vivi, intel-gfx, chris

Hi!

> > > > There's one similar for nouveau in Bugzilla, but it seems like a genuine
> > > > memory corruption (1 bit flipped):
> > > >
> > > > https://bugs.freedesktop.org/show_bug.cgi?id=84880
> > > >
> > > > Any extra information would be of use :)
> > > >
> > > > Regards, Joonas
> > > >
> > > > PS. Could you open a bug to Bugzilla, it'll help to collect the
> > > > information in one consolidated place:
> > > >
> > > > https://01.org/linuxgraphics/documentation/how-report-bugs
> > >
> > > I prefer email... certainly for bugs that can't be reproduced.
> >
> > By adding it to the Bugzilla it may be recognized by somebody else
> > who is experiencing a similar issue. Internet points are not deducted
> > for submitting bugs in good faith, even if they get closed as
> > NOTABUG.

Well, your documentation suggests you'll deduct my internet points:

    Before filing the bug, please try to reproduce your issue with the
    latest kernel. Use the latest drm-tip branch from
    http://cgit.freedesktop.org/drm-tip and build as instructed on our
    Build Guide.

:-)

> Feel free to copy from email to bugzilla :-).

Hmm, so it seems it happened again today:

Dec 8 11:45:01 duo CRON[29325]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 8 11:46:42 duo org.mate.panel.applet.MateWeatherAppletFactory[3983]: (mateweather-applet-2:4242): GLib-CRITICAL **: Source ID 14603 was not found when attempting to remove it
Dec 8 11:54:59 duo kernel: list_del corruption. prev->next should be ffff88019283ea28, but was ffff8801411a1c68
Dec 8 11:54:59 duo kernel: ------------[ cut here ]------------
Dec 8 11:54:59 duo kernel: kernel BUG at /data/fast/l/k/lib/list_debug.c:53!
Dec 8 11:54:59 duo kernel: invalid opcode: 0000 [#1] SMP PTI
Dec 8 11:54:59 duo kernel: CPU: 1 PID: 3428 Comm: Xorg Not tainted 4.20.0-rc1+ #4
Dec 8 11:54:59 duo kernel: Hardware name: LENOVO 42872WU/42872WU, BIOS 8DET74WW (1.44 ) 03/13/2018
Dec 8 11:54:59 duo kernel: RIP: 0010:__list_del_entry_valid+0x8e/0x90
Dec 8 11:54:59 duo kernel: Code: 16 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 08 75 5e 85 e8 03 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 40 75 5e 85 e8 f0 87 d1 ff <0f> 0b 55 48 89 d0 48 8b 52 08 48 89 e5 48 39 f2 75 19 48 8b 32 48
Dec 8 11:54:59 duo kernel: RSP: 0000:ffffc90000223ac0 EFLAGS: 00213282
Dec 8 11:54:59 duo kernel: RAX: 0000000000000054 RBX: ffff880115a07c40 RCX: 0000000000000000
Dec 8 11:54:59 duo kernel: RDX: 0000000000000000 RSI: ffff88019e2653d8 RDI: ffff88019e2653d8
Dec 8 11:54:59 duo kernel: RBP: ffffc90000223ac0 R08: ffff880193a2ad10 R09: 0000000000000000
Dec 8 11:54:59 duo kernel: R10: 00000000008e9088 R11: 2e6e6f6974707501 R12: ffff8801960cb240
Dec 8 11:54:59 duo kernel: R13: ffff88019283e900 R14: ffff880115a07ec0 R15: ffff88019283ea28
Dec 8 11:54:59 duo kernel: FS: 0000000000000000(0000) GS:ffff88019e240000(0063) knlGS:00000000f79c4880
Dec 8 11:54:59 duo kernel: CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
Dec 8 11:54:59 duo kernel: CR2: 00000000086b0df8 CR3: 00000001939f6004 CR4: 00000000000606a0
Dec 8 11:54:59 duo kernel: Call Trace:
Dec 8 11:54:59 duo kernel: i915_vma_move_to_active+0x1c3/0x510
Dec 8 11:54:59 duo kernel: ? i915_request_await_object+0xf4/0x280
Dec 8 11:54:59 duo kernel: i915_gem_do_execbuffer+0xe2f/0x10a0
Dec 8 11:54:59 duo kernel: ? find_held_lock+0x39/0xb0
Dec 8 11:54:59 duo kernel: ? kvmalloc_node+0x26/0x70
Dec 8 11:54:59 duo kernel: i915_gem_execbuffer2_ioctl+0x1b4/0x360
Dec 8 11:54:59 duo kernel: ? i915_gem_execbuffer_ioctl+0x290/0x290
Dec 8 11:54:59 duo kernel: drm_ioctl_kernel+0xaa/0xf0
Dec 8 11:54:59 duo kernel: drm_ioctl+0x323/0x3d0
Dec 8 11:54:59 duo kernel: ? i915_gem_execbuffer_ioctl+0x290/0x290
Dec 8 11:54:59 duo kernel: ? posix_ktime_get_ts+0xc/0x10
Dec 8 11:54:59 duo kernel: i915_compat_ioctl+0x37/0x40
Dec 8 11:54:59 duo kernel: __ia32_compat_sys_ioctl+0x429/0xe90
Dec 8 11:54:59 duo kernel: ? put_old_timespec32+0x9/0x10
Dec 8 11:54:59 duo kernel: ? __ia32_compat_sys_clock_gettime+0x67/0x90
Dec 8 11:54:59 duo kernel: do_int80_syscall_32+0x50/0x100
Dec 8 11:54:59 duo kernel: entry_INT80_compat+0x7d/0x82
Dec 8 11:54:59 duo kernel: RIP: 0023:0xf7fd5c42
Dec 8 11:54:59 duo kernel: Code: 65 8b 15 04 00 00 00 8b 0e 8b 0c ca 83 f9 ff 75 0c 89 04 24 89 f0 e8 b3 fe ff ff eb 05 8b 46 04 01 c8 83 c4 14 5b 5e c3 cd 80 <c3> 8d b6 00 00 00 00 8d bc 27 00 00 00 00 8b 1c 24 c3 8d b6 00 00
Dec 8 11:54:59 duo kernel: RSP: 002b:00000000fff1a014 EFLAGS: 00203292 ORIG_RAX: 0000000000000036
Dec 8 11:54:59 duo kernel: RAX: ffffffffffffffda RBX: 000000000000000a RCX: 0000000040406469
Dec 8 11:54:59 duo kernel: RDX: 00000000fff1a0bc RSI: 0000000000000000 RDI: 0000000040406469
Dec 8 11:54:59 duo kernel: RBP: 000000000000000a R08: 0000000000000000 R09: 0000000000000000
Dec 8 11:54:59 duo kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Dec 8 11:54:59 duo kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Dec 8 11:54:59 duo kernel: Modules linked in:
Dec 8 11:54:59 duo kernel: ---[ end trace 0c1e74ccc719c763 ]---
Dec 8 11:54:59 duo kernel: RIP: 0010:__list_del_entry_valid+0x8e/0x90
Dec 8 11:54:59 duo kernel: Code: 16 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 08 75 5e 85 e8 03 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 40 75 5e 85 e8 f0 87 d1 ff <0f> 0b 55 48 89 d0 48 8b 52 08 48 89 e5 48 39 f2 75 19 48 8b 32 48
Dec 8 11:54:59 duo kernel: RSP: 0000:ffffc90000223ac0 EFLAGS: 00213282
Dec 8 11:54:59 duo kernel: RAX: 0000000000000054 RBX: ffff880115a07c40 RCX: 0000000000000000
Dec 8 11:54:59 duo kernel: RDX: 0000000000000000 RSI: ffff88019e2653d8 RDI: ffff88019e2653d8
Dec 8 11:54:59 duo kernel: RBP: ffffc90000223ac0 R08: ffff880193a2ad10 R09: 0000000000000000
Dec 8 11:54:59 duo kernel: R10: 00000000008e9088 R11: 2e6e6f6974707501 R12: ffff8801960cb240
Dec 8 11:54:59 duo kernel: R13: ffff88019283e900 R14: ffff880115a07ec0 R15: ffff88019283ea28
Dec 8 11:54:59 duo kernel: FS: 0000000000000000(0000) GS:ffff88019e240000(0063) knlGS:00000000f79c4880
Dec 8 11:54:59 duo kernel: CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
Dec 8 11:54:59 duo kernel: CR2: 00000000086b0df8 CR3: 00000001939f6004 CR4: 00000000000606a0
Dec 8 11:54:59 duo org.mate.panel.applet.WnckletFactory[3983]: wnck-applet: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Dec 8 11:54:59 duo org.mate.panel.applet.MateWeatherAppletFactory[3983]: mateweather-applet-2: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Dec 8 11:55:00 duo org.mate.panel.applet.CommandAppletFactory[3983]: command-applet: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Dec 8 11:55:00 duo org.mate.panel.applet.NotificationAreaAppletFactory[3983]: notification-area-applet: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Dec 8 11:55:00 duo org.mate.panel.applet.ClockAppletFactory[3983]: clock-applet: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Dec 8 11:55:01 duo CRON[30056]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 8 11:55:02 duo org.mate.panel.applet.InhibitAppletFactory[3983]: mate-inhibit-applet: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Dec 8 11:55:09 duo org.a11y.atspi.Registry[4114]: XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"

Do you see a high chance of this being a DRM/Intel issue?

> > It sounds like you've hit the same signature twice, so it may very well
> > be reproducible. Does flightgear have some demo mode where you could
> > leave it running a heavy scene overnight?
>
> I'm not sure if it was same signature twice. I had two lockups, but
> IIRC only investigated one.

So it is twice now.

> Not really a demo mode. I can put plane on autopilot, but eventually
> gas runs out. (And I guess window needs to be visible for test to be
> effective.) I tried today, but it did not crash.
>
> Do you have something else I could run to do the testing?

This time I was not really running anything graphics heavy, except for
chromium playing a youtube video.

Best regards,
Pavel
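Since the question of whether the bad pointer "still looks like an address" comes up in this thread, here is a small Python sketch that pulls the two pointers out of a `list_del corruption` line and reports which kernel address range each falls in. The range table is an assumption: it uses the default x86_64 4-level layout for kernels of this era without memory-region randomisation, and the helper names (`region_of`, `classify`) are hypothetical. Treat the result as a hint, not a diagnosis.

```python
import re

# ASSUMPTION: default x86_64 4-level layout of this era, no randomised
# region bases. These constants are not taken from the thread itself.
REGIONS = [
    ("direct map", 0xffff880000000000, 0xffffc7ffffffffff),
    ("vmalloc/ioremap", 0xffffc90000000000, 0xffffe8ffffffffff),
]

PATTERN = re.compile(
    r"list_del corruption\. (?:prev->next|next->prev) should be "
    r"([0-9a-f]{16}), but was ([0-9a-f]{16})"
)


def region_of(addr):
    """Name the assumed region containing `addr`, or 'unknown'."""
    for name, lo, hi in REGIONS:
        if lo <= addr <= hi:
            return name
    return "unknown"


def classify(line):
    """Return (region of expected pointer, region of found pointer),
    or None if the line is not a list_del corruption report."""
    m = PATTERN.search(line)
    if m is None:
        return None
    expected = int(m.group(1), 16)
    found = int(m.group(2), 16)
    return region_of(expected), region_of(found)


# The two report lines quoted in this thread.
first = classify(
    "Nov 8 18:42:57 duo kernel: list_del corruption. prev->next should be "
    "ffff8801742b8178, but was ffffc9000192fec8"
)
second = classify(
    "Dec 8 11:54:59 duo kernel: list_del corruption. prev->next should be "
    "ffff88019283ea28, but was ffffc9000192fec8".replace(
        "ffffc9000192fec8", "ffff8801411a1c68")
)

print(first)
print(second)
```

Under these assumptions both stray values land in plausible kernel ranges rather than looking like random bit patterns. The first falls in the vmalloc range, which is also where RSP points in that oops, and the second falls in the direct map. That fits the "still looks like an address" observation but does not by itself distinguish a driver race from other corruption.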
* Re: v4.20-rc1: list_del corruption on thinkpad x220, graphics related?
  2018-12-08 11:13 ` v4.20-rc1: list_del corruption on thinkpad x220, graphics related? Pavel Machek
@ 2018-12-08 11:24 ` Pavel Machek
  2018-12-09 11:18 ` v4.20-rc5+ on x220: Resetting chip for hang on rcs0 Pavel Machek
  2018-12-10  8:28 ` v4.20-rc1: list_del corruption on thinkpad x220, graphics related? Joonas Lahtinen
  0 siblings, 2 replies; 15+ messages in thread
From: Pavel Machek @ 2018-12-08 11:24 UTC
To: Joonas Lahtinen
Cc: bp, hpa, kernel list, mingo, tglx, x86, jani.nikula, rodrigo.vivi, intel-gfx, chris

[-- Attachment #1: Type: text/plain, Size: 1596 bytes --]

On Sat 2018-12-08 12:13:46, Pavel Machek wrote:
> Hi!
>
> > > > > There's one similar for nouveau in Bugzilla, but it seems like a genuine
> > > > > memory corruption (1 bit flipped):
> > > > >
> > > > > https://bugs.freedesktop.org/show_bug.cgi?id=84880
> > > > >
> > > > > Any extra information would be of use :)
> > > > >
> > > > > Regards, Joonas
> > > > >
> > > > > PS. Could you open a bug to Bugzilla, it'll help to collect the
> > > > > information in one consolidated place:
> > > > >
> > > > > https://01.org/linuxgraphics/documentation/how-report-bugs
> > > >
> > > > I prefer email... certainly for bugs that can't be reproduced.
> > >
> > > By adding it to the Bugzilla it may be recognized by somebody else
> > > who is experiencing a similar issue. Internet points are not deducted
> > > for submitting bugs in good faith, even if they get closed as
> > > NOTABUG.
>
> Well, your documentation suggests you'll deduct my internet points:
>
> Before filing the bug, please try to reproduce your issue with the
> latest kernel. Use the latest drm-tip branch from
> http://cgit.freedesktop.org/drm-tip and build as instructed on our
> Build Guide.
>
> :-)

I'd prefer not to run drm-tip. I'll update to 4.20-rc5+ and see if
it re-appears (but it takes long time to reproduce :-().

If you think it is useful, I can try to update my machine to
linux-next.

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

* v4.20-rc5+ on x220: Resetting chip for hang on rcs0
  2018-12-08 11:24 ` Pavel Machek
@ 2018-12-09 11:18 ` Pavel Machek
  2018-12-10  8:30 ` Joonas Lahtinen
  2018-12-10  8:28 ` v4.20-rc1: list_del corruption on thinkpad x220, graphics related? Joonas Lahtinen
  1 sibling, 1 reply; 15+ messages in thread
From: Pavel Machek @ 2018-12-09 11:18 UTC
To: Joonas Lahtinen; +Cc: kernel list, jani.nikula, rodrigo.vivi, intel-gfx, chris

[-- Attachment #1.1: Type: text/plain, Size: 1286 bytes --]

Hi!

Another day, another problem... but this one is different from the
previous hang, as machine survives.

Chromium was running with youtube video playing.

[31850.666274] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[31850.666277] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[31850.666279] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[31850.666282] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[31850.666285] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[31850.666394] i915 0000:00:02.0: Resetting chip for hang on rcs0
[31850.668474] WARNING: CPU: 0 PID: 13675 at /data/fast/l/k/include/linux/dma-fence.h:503 i915_request_skip+0x71/0x80
[31850.668478] Modules linked in:
[31850.668484] CPU: 0 PID: 13675 Comm: kworker/0:3 Not tainted 4.20.0-rc5+ #5
[31850.668487] Hardware name: LENOVO 42872WU/42872WU, BIOS 8DET74WW (1.44 ) 03/13/2018

Dmesg and /sys/class/drm/card0/error are attached.

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #1.2: delme.gz --]
[-- Type: application/gzip, Size: 22379 bytes --]

[-- Attachment #1.3: delme2.gz --]
[-- Type: application/gzip, Size: 2546 bytes --]

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

* Re: v4.20-rc5+ on x220: Resetting chip for hang on rcs0
  2018-12-09 11:18 ` v4.20-rc5+ on x220: Resetting chip for hang on rcs0 Pavel Machek
@ 2018-12-10  8:30 ` Joonas Lahtinen
  0 siblings, 0 replies; 15+ messages in thread
From: Joonas Lahtinen @ 2018-12-10 8:30 UTC
To: Pavel Machek; +Cc: kernel list, jani.nikula, rodrigo.vivi, intel-gfx, chris

On Sun, 2018-12-09 at 12:18 +0100, Pavel Machek wrote:
> Hi!
>
> Another day, another problem... but this one is different from the
> previous hang, as machine survives.

Please, file a bug. It says so even in the splat...

Regards, Joonas

>
> Chromium was running with youtube video playing.
>
> [31850.666274] [drm] GPU hangs can indicate a bug anywhere in the
> entire gfx stack, including userspace.
> [31850.666277] [drm] Please file a _new_ bug report on
> bugs.freedesktop.org against DRI -> DRM/Intel
> [31850.666279] [drm] drm/i915 developers can then reassign to the
> right component if it's not a kernel issue.
> [31850.666282] [drm] The gpu crash dump is required to analyze gpu
> hangs, so please always attach it.
> [31850.666285] [drm] GPU crash dump saved to
> /sys/class/drm/card0/error
> [31850.666394] i915 0000:00:02.0: Resetting chip for hang on rcs0
> [31850.668474] WARNING: CPU: 0 PID: 13675 at
> /data/fast/l/k/include/linux/dma-fence.h:503
> i915_request_skip+0x71/0x80
> [31850.668478] Modules linked in:
> [31850.668484] CPU: 0 PID: 13675 Comm: kworker/0:3 Not tainted
> 4.20.0-rc5+ #5
> [31850.668487] Hardware name: LENOVO 42872WU/42872WU, BIOS 8DET74WW
> (1.44 ) 03/13/2018
>
> Dmesg and /sys/class/drm/card0/error are attached.
>
> Best regards,
> Pavel

--
Joonas Lahtinen
Open Source Graphics Center
Intel Corporation

* Re: v4.20-rc1: list_del corruption on thinkpad x220, graphics related?
  2018-12-08 11:24 ` Pavel Machek
  2018-12-09 11:18 ` v4.20-rc5+ on x220: Resetting chip for hang on rcs0 Pavel Machek
@ 2018-12-10  8:28 ` Joonas Lahtinen
  2018-12-12 18:29 ` 4.20.0-rc6-next-20181210, " Pavel Machek
  1 sibling, 1 reply; 15+ messages in thread
From: Joonas Lahtinen @ 2018-12-10 8:28 UTC
To: Pavel Machek
Cc: bp, hpa, kernel list, mingo, tglx, x86, jani.nikula, rodrigo.vivi, intel-gfx, chris

On Sat, 2018-12-08 at 12:24 +0100, Pavel Machek wrote:
> On Sat 2018-12-08 12:13:46, Pavel Machek wrote:
> > Hi!
> >
> > > > > > There's one similar for nouveau in Bugzilla, but it seems like a genuine
> > > > > > memory corruption (1 bit flipped):
> > > > > >
> > > > > > https://bugs.freedesktop.org/show_bug.cgi?id=84880
> > > > > >
> > > > > > Any extra information would be of use :)
> > > > > >
> > > > > > Regards, Joonas
> > > > > >
> > > > > > PS. Could you open a bug to Bugzilla, it'll help to collect the
> > > > > > information in one consolidated place:
> > > > > >
> > > > > > https://01.org/linuxgraphics/documentation/how-report-bugs
> > > > >
> > > > > I prefer email... certainly for bugs that can't be reproduced.
> > > >
> > > > By adding it to the Bugzilla it may be recognized by somebody else
> > > > who is experiencing a similar issue. Internet points are not deducted
> > > > for submitting bugs in good faith, even if they get closed as
> > > > NOTABUG.
> >
> > Well, your documentation suggests you'll deduct my internet points:
> >
> > Before filing the bug, please try to reproduce your issue with the
> > latest kernel. Use the latest drm-tip branch from
> > http://cgit.freedesktop.org/drm-tip and build as instructed on our
> > Build Guide.
> >
> > :-)
>
> I'd prefer not to run drm-tip. I'll update to 4.20-rc5+ and see if
> it re-appears (but it takes long time to reproduce :-().

If we can or can not reproduce the issue with drm-tip, is a very useful
datapoint for us. If we can not reproduce, it'll be possible to bisect
which commit fixed it, and backport that. On the other hand, if it's
still reproducible, we know we're not spending time on something we
already fixed, and the priority gets a bump.

> If you think it is useful, I can try to update my machine to
> linux-next.

linux-next is closer to drm-tip, so it's better. Do you have some
specific reason for not wanting to run drm-tip (but linux-next is still
ok)?

Regards, Joonas

>
> Best regards,
> Pavel
>

--
Joonas Lahtinen
Open Source Graphics Center
Intel Corporation

* 4.20.0-rc6-next-20181210, v4.20-rc1: list_del corruption on thinkpad x220, graphics related?
  2018-12-10  8:28 ` v4.20-rc1: list_del corruption on thinkpad x220, graphics related? Joonas Lahtinen
@ 2018-12-12 18:29 ` Pavel Machek
  2018-12-13  8:29 ` Joonas Lahtinen
  0 siblings, 1 reply; 15+ messages in thread
From: Pavel Machek @ 2018-12-12 18:29 UTC
To: Joonas Lahtinen
Cc: bp, hpa, kernel list, mingo, tglx, x86, jani.nikula, rodrigo.vivi, intel-gfx, chris

[-- Attachment #1.1: Type: text/plain, Size: 3377 bytes --]

Hi!

> > > > > > > There's one similar for nouveau in Bugzilla, but it seems like a genuine
> > > > > > > memory corruption (1 bit flipped):
> > > > > > >
> > > > > > > https://bugs.freedesktop.org/show_bug.cgi?id=84880
> > > > > > >
> > > > > > > Any extra information would be of use :)
> > > > > > >
> > > > > > > Regards, Joonas
> > > > > > >
> > > > > > > PS. Could you open a bug to Bugzilla, it'll help to collect the
> > > > > > > information in one consolidated place:
> > > > > > >
> > > > > > > https://01.org/linuxgraphics/documentation/how-report-bugs
> > > > > >
> > > > > > I prefer email... certainly for bugs that can't be reproduced.
> > > > >
> > > > > By adding it to the Bugzilla it may be recognized by somebody else
> > > > > who is experiencing a similar issue. Internet points are not deducted
> > > > > for submitting bugs in good faith, even if they get closed as
> > > > > NOTABUG.
> > >
> > > Well, your documentation suggests you'll deduct my internet points:
> > >
> > > Before filing the bug, please try to reproduce your issue with the
> > > latest kernel. Use the latest drm-tip branch from
> > > http://cgit.freedesktop.org/drm-tip and build as instructed on our
> > > Build Guide.
> > >
> > > :-)
> >
> > I'd prefer not to run drm-tip. I'll update to 4.20-rc5+ and see if
> > it re-appears (but it takes long time to reproduce :-().
>
> If we can or can not reproduce the issue with drm-tip, is a very useful
> datapoint for us. If we can not reproduce, it'll be possible to bisect
> which commit fixed it, and backport that. On the other hand, if it's
> still reproducible, we know we're not spending time on something we
> already fixed, and the priority gets a bump.

bisect ... is not practical on something that takes 2 days to reproduce.

> > If you think it is useful, I can try to update my machine to
> > linux-next.
>
> linux-next is closer to drm-tip, so it's better. Do you have some
> specific reason for not wanting to run drm-tip (but linux-next is still
> ok)?

I already have build/update scripts for -next, and I trust -next not
to store screenshots of my desktop in my master boot record :-).

Anyway, it does happen with -next. This time, chromiums were running,
and crash happened minute? after I exited flightgear. It can be seen
in the logs.

Oh and I might want to mention -- machine was rather deep in swap this
time, as in "mouse jumping when starting fgfs" and "could feel the
chromium being swapped back in". I might have had this situation
before, and just powercycled the machine "because it is so deep in
swap that it will not recover".

top says:

top - 19:18:24 up 2 days, 8:03, 2 users, load average: 3.02, 3.45, 3.21
Tasks: 141 total, 1 running, 86 sleeping, 0 stopped, 2 zombie
%Cpu(s): 18.8 us, 7.6 sy, 3.0 ni, 68.4 id, 1.3 wa, 0.0 hi, 0.9 si, 0.0 st
KiB Mem: 5967968 total, 663244 used, 5304724 free, 48876 buffers
KiB Swap: 1681428 total, 170904 used, 1510524 free. 446280 cached Mem

....but of course that memory is free once everything died.

Any ideas? Should I go back to v4.19 to see if it happens there, too?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #1.2: delme.gz --]
[-- Type: application/gzip, Size: 19286 bytes --]

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

* Re: 4.20.0-rc6-next-20181210, v4.20-rc1: list_del corruption on thinkpad x220, graphics related?
  2018-12-12 18:29 ` 4.20.0-rc6-next-20181210, " Pavel Machek
@ 2018-12-13  8:29 ` Joonas Lahtinen
  2018-12-27  8:34 ` [regression from v4.19] " Pavel Machek
  0 siblings, 1 reply; 15+ messages in thread
From: Joonas Lahtinen @ 2018-12-13 8:29 UTC
To: Pavel Machek
Cc: bp, hpa, kernel list, mingo, tglx, x86, jani.nikula, rodrigo.vivi, intel-gfx, chris

Quoting Pavel Machek (2018-12-12 20:29:02)
> Hi!
>
> > > > > > > > There's one similar for nouveau in Bugzilla, but it seems like a genuine
> > > > > > > > memory corruption (1 bit flipped):
> > > > > > > >
> > > > > > > > https://bugs.freedesktop.org/show_bug.cgi?id=84880
> > > > > > > >
> > > > > > > > Any extra information would be of use :)
> > > > > > > >
> > > > > > > > Regards, Joonas
> > > > > > > >
> > > > > > > > PS. Could you open a bug to Bugzilla, it'll help to collect the
> > > > > > > > information in one consolidated place:
> > > > > > > >
> > > > > > > > https://01.org/linuxgraphics/documentation/how-report-bugs
> > > > > > >
> > > > > > > I prefer email... certainly for bugs that can't be reproduced.
> > > > > >
> > > > > > By adding it to the Bugzilla it may be recognized by somebody else
> > > > > > who is experiencing a similar issue. Internet points are not deducted
> > > > > > for submitting bugs in good faith, even if they get closed as
> > > > > > NOTABUG.
> > > >
> > > > Well, your documentation suggests you'll deduct my internet points:
> > > >
> > > > Before filing the bug, please try to reproduce your issue with the
> > > > latest kernel. Use the latest drm-tip branch from
> > > > http://cgit.freedesktop.org/drm-tip and build as instructed on our
> > > > Build Guide.
> > > >
> > > > :-)
> > >
> > > I'd prefer not to run drm-tip. I'll update to 4.20-rc5+ and see if
> > > it re-appears (but it takes long time to reproduce :-().
> >
> > If we can or can not reproduce the issue with drm-tip, is a very useful
> > datapoint for us. If we can not reproduce, it'll be possible to bisect
> > which commit fixed it, and backport that. On the other hand, if it's
> > still reproducible, we know we're not spending time on something we
> > already fixed, and the priority gets a bump.
>
> bisect ... is not practical on something that takes 2 days to reproduce.
>
> > > If you think it is useful, I can try to update my machine to
> > > linux-next.
> >
> > linux-next is closer to drm-tip, so it's better. Do you have some
> > specific reason for not wanting to run drm-tip (but linux-next is still
> > ok)?
>
> I already have build/update scripts for -next, and I trust -next not
> to store screenshots of my desktop in my master boot record :-).
>
> Anyway, it does happen with -next. This time, chromiums were running,
> and crash happened minute? after I exited flightgear. It can be seen
> in the logs.
>
> Oh and I might want to mention -- machine was rather deep in swap this
> time, as in "mouse jumping when starting fgfs" and "could feel the
> chromium being swapped back in". I might have had this situation
> before, and just powercycled the machine "because it is so deep in
> swap that it will not recover".
>
> top says:
>
> top - 19:18:24 up 2 days, 8:03, 2 users, load average: 3.02, 3.45,
> 3.21
> Tasks: 141 total, 1 running, 86 sleeping, 0 stopped, 2 zombie
> %Cpu(s): 18.8 us, 7.6 sy, 3.0 ni, 68.4 id, 1.3 wa, 0.0 hi, 0.9
> si, 0.0 st
> KiB Mem: 5967968 total, 663244 used, 5304724 free, 48876
> buffers
> KiB Swap: 1681428 total, 170904 used, 1510524 free. 446280
> cached Mem
>
> ....but of course that memory is free once everything died.
>
> Any ideas? Should I go back to v4.19 to see if it happens there, too?

linux-next includes very much the same code as drm-tip. There's nobody
magically reviewing the code more than it is reviewed for inclusion into
drm-tip, when it is fed into linux-next. So thinking linux-next would be
some way safer is an illusion.

It sounds like having memory pressure expedites the corruption, which
should make it easier to reproduce and thus fix.

So if you could please try drm-tip reproducing AND open a bug in Bugzilla.
If you are unwilling to do that, it is very difficult to help you
more.

Regards, Joonas

>
>
> Pavel
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* [regression from v4.19] Re: 4.20.0-rc6-next-20181210, v4.20-rc1: list_del corruption on thinkpad x220, graphics related?
  2018-12-13  8:29 ` Joonas Lahtinen
@ 2018-12-27  8:34 ` Pavel Machek
  2019-01-02  9:34 ` Joonas Lahtinen
  0 siblings, 1 reply; 15+ messages in thread
From: Pavel Machek @ 2018-12-27 8:34 UTC
To: Joonas Lahtinen
Cc: bp, hpa, kernel list, mingo, tglx, x86, jani.nikula, rodrigo.vivi, intel-gfx, chris

[-- Attachment #1: Type: text/plain, Size: 2614 bytes --]

Hi!

> > > > If you think it is useful, I can try to update my machine to
> > > > linux-next.
> > >
> > > linux-next is closer to drm-tip, so it's better. Do you have some
> > > specific reason for not wanting to run drm-tip (but linux-next is still
> > > ok)?
> >
> > I already have build/update scripts for -next, and I trust -next not
> > to store screenshots of my desktop in my master boot record :-).
> >
> > Anyway, it does happen with -next. This time, chromiums were running,
> > and crash happened minute? after I exited flightgear. It can be seen
> > in the logs.
> >
> > Oh and I might want to mention -- machine was rather deep in swap this
> > time, as in "mouse jumping when starting fgfs" and "could feel the
> > chromium being swapped back in". I might have had this situation
> > before, and just powercycled the machine "because it is so deep in
> > swap that it will not recover".
> >
> > top says:
> >
> > top - 19:18:24 up 2 days, 8:03, 2 users, load average: 3.02, 3.45,
> > 3.21
> > Tasks: 141 total, 1 running, 86 sleeping, 0 stopped, 2 zombie
> > %Cpu(s): 18.8 us, 7.6 sy, 3.0 ni, 68.4 id, 1.3 wa, 0.0 hi, 0.9
> > si, 0.0 st
> > KiB Mem: 5967968 total, 663244 used, 5304724 free, 48876
> > buffers
> > KiB Swap: 1681428 total, 170904 used, 1510524 free. 446280
> > cached Mem
> >
> > ....but of course that memory is free once everything died.
> >
> > Any ideas? Should I go back to v4.19 to see if it happens there, too?
>
> linux-next includes very much the same code as drm-tip. There's nobody
> magically reviewing the code more than it is reviewed for inclusion into
> drm-tip, when it is fed into linux-next. So thinking linux-next would be
> some way safer is an illusion.
>
> It sounds like having memory pressure expedites the corruption, which
> should make it easier to reproduce and thus fix.
>
> So if you could please try drm-tip reproducing AND open a bug in Bugzilla.
> If you are unwilling to do that, it is very difficult to help you
> more.

Website says I have to read and agree to two different pieces of
legalese, and I'd need to keep track of yet another password... so
you can "communicate" with me.

But you can already communicate with me, over email.

I verified v4.19 is stable -- it worked ok for way more than two days
it usually takes to crash.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

* Re: [regression from v4.19] Re: 4.20.0-rc6-next-20181210, v4.20-rc1: list_del corruption on thinkpad x220, graphics related?
  2018-12-27  8:34 ` [regression from v4.19] " Pavel Machek
@ 2019-01-02  9:34 ` Joonas Lahtinen
  2019-01-02 21:02 ` Pavel Machek
  0 siblings, 1 reply; 15+ messages in thread
From: Joonas Lahtinen @ 2019-01-02 9:34 UTC
To: Pavel Machek
Cc: bp, hpa, kernel list, mingo, tglx, x86, jani.nikula, rodrigo.vivi, intel-gfx, chris

Quoting Pavel Machek (2018-12-27 10:34:39)
> Hi!
>
> > > > > If you think it is useful, I can try to update my machine to
> > > > > linux-next.
> > > >
> > > > linux-next is closer to drm-tip, so it's better. Do you have some
> > > > specific reason for not wanting to run drm-tip (but linux-next is still
> > > > ok)?
> > >
> > > I already have build/update scripts for -next, and I trust -next not
> > > to store screenshots of my desktop in my master boot record :-).
> > >
> > > Anyway, it does happen with -next. This time, chromiums were running,
> > > and crash happened minute? after I exited flightgear. It can be seen
> > > in the logs.
> > >
> > > Oh and I might want to mention -- machine was rather deep in swap this
> > > time, as in "mouse jumping when starting fgfs" and "could feel the
> > > chromium being swapped back in". I might have had this situation
> > > before, and just powercycled the machine "because it is so deep in
> > > swap that it will not recover".
> > >
> > > top says:
> > >
> > > top - 19:18:24 up 2 days, 8:03, 2 users, load average: 3.02, 3.45,
> > > 3.21
> > > Tasks: 141 total, 1 running, 86 sleeping, 0 stopped, 2 zombie
> > > %Cpu(s): 18.8 us, 7.6 sy, 3.0 ni, 68.4 id, 1.3 wa, 0.0 hi, 0.9
> > > si, 0.0 st
> > > KiB Mem: 5967968 total, 663244 used, 5304724 free, 48876
> > > buffers
> > > KiB Swap: 1681428 total, 170904 used, 1510524 free. 446280
> > > cached Mem
> > >
> > > ....but of course that memory is free once everything died.
> > >
> > > Any ideas? Should I go back to v4.19 to see if it happens there, too?
> >
> > linux-next includes very much the same code as drm-tip. There's nobody
> > magically reviewing the code more than it is reviewed for inclusion into
> > drm-tip, when it is fed into linux-next. So thinking linux-next would be
> > some way safer is an illusion.
> >
> > It sounds like having memory pressure expedites the corruption, which
> > should make it easier to reproduce and thus fix.
> >
> > So if you could please try drm-tip reproducing AND open a bug in Bugzilla.
> > If you are unwilling to do that, it is very difficult to help you
> > more.
>
> Website says I have to read and agree to two different pieces of
> legalese, and I'd need to keep track of yet another password... so
> you can "communicate" with me.
>
> But you can already communicate with me, over email.

I've listed all the reasons why our bug handling process is what it is.

If registering to the Bugzilla is too much of an effort for you, then I
won't be able to help you further on this.

Regards, Joonas

> I verified v4.19 is stable -- it worked ok for way more than two days
> it usually takes to crash.
>
> Pavel
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [regression from v4.19] Re: 4.20.0-rc6-next-20181210, v4.20-rc1: list_del corruption on thinkpad x220, graphics related?
  2019-01-02  9:34 ` Joonas Lahtinen
@ 2019-01-02 21:02 ` Pavel Machek
  0 siblings, 0 replies; 15+ messages in thread
From: Pavel Machek @ 2019-01-02 21:02 UTC
To: Joonas Lahtinen
Cc: bp, hpa, kernel list, mingo, tglx, x86, jani.nikula, rodrigo.vivi, intel-gfx, chris

[-- Attachment #1: Type: text/plain, Size: 1065 bytes --]

Hi!

> > > So if you could please try drm-tip reproducing AND open a bug in Bugzilla.
> > > If you are unwilling to do that, it is very difficult to help you
> > > more.
> >
> > Website says I have to read and agree to two different pieces of
> > legalese, and I'd need to keep track of yet another password... so
> > you can "communicate" with me.
> >
> > But you can already communicate with me, over email.
>
> I've listed all the reasons why our bug handling process is what it is.
>
> If registering to the Bugzilla is too much of an effort for you, then I
> won't be able to help you further on this.

Actually I did register at the bugzilla. Only useful help there was
that CONFIG_DRM_I915_DEBUG_GEM might be useful. Unfortunately that one
seems to make it panic() and impossible to get anything useful.

https://bugs.freedesktop.org/show_bug.cgi?id=109175

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

Thread overview: 15+ messages
2018-11-08 17:58 v4.20-rc1: list_del corruption on thinkpad x220 Pavel Machek
2018-11-21 11:19 ` Joonas Lahtinen
2018-11-21 11:54 ` Pavel Machek
2018-11-23  8:17 ` Joonas Lahtinen
2018-11-24 15:23 ` Pavel Machek
2018-12-08 11:13 ` v4.20-rc1: list_del corruption on thinkpad x220, graphics related? Pavel Machek
2018-12-08 11:24 ` Pavel Machek
2018-12-09 11:18 ` v4.20-rc5+ on x220: Resetting chip for hang on rcs0 Pavel Machek
2018-12-10  8:30 ` Joonas Lahtinen
2018-12-10  8:28 ` v4.20-rc1: list_del corruption on thinkpad x220, graphics related? Joonas Lahtinen
2018-12-12 18:29 ` 4.20.0-rc6-next-20181210, " Pavel Machek
2018-12-13  8:29 ` Joonas Lahtinen
2018-12-27  8:34 ` [regression from v4.19] " Pavel Machek
2019-01-02  9:34 ` Joonas Lahtinen
2019-01-02 21:02 ` Pavel Machek