* Guest soft lockups with "xen: make xen_qlock_wait() nestable"
@ 2018-11-07  9:30 Sander Eikelenboom
  2018-11-07 22:34 ` Boris Ostrovsky
  2018-11-08  7:08 ` Juergen Gross
  0 siblings, 2 replies; 14+ messages in thread
From: Sander Eikelenboom @ 2018-11-07  9:30 UTC (permalink / raw)
  To: Juergen Gross, Boris Ostrovsky; +Cc: xen-devel

Hi Juergen / Boris,

Last week I tested Linux kernel 4.19.0 stable with the Xen "for-linus-4.20" branch pulled on top.
Unfortunately I was seeing guests lock up after some time; see below for the logging from one of the guests
which I was able to capture.
Reverting "xen: make xen_qlock_wait() nestable" (commit 7250f6d35681dfc44749d90598a2d51a118ce2b8)
made the lockups disappear.

These guests are stressed quite hard in both CPU and networking,
so they are probably more susceptible to locking issues.

The system is an AMD Phenom X6, running Xen-unstable.

Any ideas?

--
Sander


serveerstertje:~# xl console security
[ 6045.805396] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ml1:mon-front-i:20428]
[ 6045.826995] Modules linked in:
[ 6045.836310] CPU: 1 PID: 20428 Comm: ml1:mon-front-i Tainted: G             L    4.19.0-20181022-doflr-xennext-vlan-ppp+ #1
[ 6045.865526] Hardware name: Xen HVM domU, BIOS 4.12-unstable 10/27/2018
[ 6045.882784] RIP: 0010:__pv_queued_spin_lock_slowpath+0xda/0x280
[ 6045.897019] Code: 44 41 bc 01 00 00 00 41 bd 00 01 00 00 3c 02 0f 94 c0 0f b6 c0 48 89 04 24 c6 45 44 00 ba 00 80 00 00 c6 43 01 01 eb 0b f3 90 <83> ea 01 0f 84 49 01 00 00 0f b6 03 84 c0 75 ee 44 89 e8 f0 66 44
[ 6045.902111] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [ml1:mon-front-i:20429]
[ 6045.945207] RSP: 0000:ffffc90003ba7d38 EFLAGS: 00000202
[ 6045.965743] Modules linked in:
[ 6045.965748] CPU: 2 PID: 20429 Comm: ml1:mon-front-i Tainted: G             L    4.19.0-20181022-doflr-xennext-vlan-ppp+ #1
[ 6045.965748] Hardware name: Xen HVM domU, BIOS 4.12-unstable 10/27/2018
[ 6045.965753] RIP: 0010:smp_call_function_many+0x1db/0x240
[ 6045.965756] Code: c7 e8 99 ab d1 00 3b 05 67 2e a0 01 0f 83 a3 fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 00 47 8a 82 8b 51 18 83 e2 01 74 0a f3 90 <8b> 51 18 83 e2 01 75 f6 eb c8 48 c7 c2 a0 e0 b3 82 48 89 ee 89 df
[ 6045.965757] RSP: 0000:ffffc90003bafc70 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff0c
[ 6045.980003]  ORIG_RAX: ffffffffffffff0c
[ 6045.988675] RAX: 0000000000000000 RBX: ffff88007bd21d80 RCX: ffff88007bc25ae0
[ 6045.995387] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [nc4:mon-front-i:3291]
[ 6045.995388] Modules linked in:
[ 6045.995392] CPU: 3 PID: 3291 Comm: nc4:mon-front-i Tainted: G             L    4.19.0-20181022-doflr-xennext-vlan-ppp+ #1
[ 6045.995392] Hardware name: Xen HVM domU, BIOS 4.12-unstable 10/27/2018
[ 6045.995397] RIP: 0010:smp_call_function_many+0x1db/0x240
[ 6045.995400] Code: c7 e8 99 ab d1 00 3b 05 67 2e a0 01 0f 83 a3 fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 00 47 8a 82 8b 51 18 83 e2 01 74 0a f3 90 <8b> 51 18 83 e2 01 75 f6 eb c8 48 c7 c2 a0 e0 b3 82 48 89 ee 89 df
[ 6045.995401] RSP: 0000:ffffc90002307c70 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff0c
[ 6045.995402] RAX: 0000000000000000 RBX: ffff88007bda1d80 RCX: ffff88007bc25c20
[ 6045.995403] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88007bda1d88
[ 6045.995403] RBP: ffff88007bda1d88 R08: 0000000000000000 R09: 0000000000000000
[ 6045.995404] R10: 0000000000000000 R11: 0000000000000040 R12: ffffffff81057cb0
[ 6045.995404] R13: ffffc90002307cc0 R14: 0000000000000001 R15: 0000000000000006
[ 6045.995413] FS:  00007f4eafa0f700(0000) GS:ffff88007bd80000(0000) knlGS:0000000000000000
[ 6045.995414] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6045.995415] CR2: 00007f4eaca182c0 CR3: 000000007462a000 CR4: 00000000000006e0
[ 6045.995416] Call Trace:
[ 6045.995422]  flush_tlb_mm_range+0xb7/0x120
[ 6045.995425]  ? ptep_clear_flush+0x30/0x40
[ 6045.995427]  ? mem_cgroup_throttle_swaprate+0x12/0x110
[ 6045.995429]  ? mem_cgroup_try_charge_delay+0x2c/0x40
[ 6045.995430]  ptep_clear_flush+0x30/0x40
[ 6045.995432]  wp_page_copy+0x311/0x6c0
[ 6045.995434]  do_wp_page+0x111/0x4c0
[ 6045.995435]  __handle_mm_fault+0x445/0xbd0
[ 6045.995437]  handle_mm_fault+0xf8/0x200
[ 6045.995438]  __do_page_fault+0x231/0x460
[ 6045.995441]  ? page_fault+0x8/0x30
[ 6045.995441]  page_fault+0x1e/0x30
[ 6045.995443] RIP: 0033:0x7f4ee1243476
[ 6045.995444] Code: 87 f3 c3 90 4c 8d 04 52 f3 0f 6f 06 f3 0f 6f 0c 16 f3 0f 6f 14 56 f3 42 0f 6f 1c 06 66 0f 7f 07 66 0f 7f 0c 17 66 0f 7f 14 57 <66> 42 0f 7f 1c 07 83 e9 04 48 8d 34 96 48 8d 3c 97 75 cb f3 c3 90
[ 6045.995445] RSP: 002b:00007f4eafa0df38 EFLAGS: 00010202
[ 6045.995446] RAX: 0000000000000000 RBX: 00007f4ea8014500 RCX: 0000000000000008
[ 6045.995447] RDX: 0000000000000780 RSI: 00007f4ea8287500 RDI: 00007f4eaca16c40
[ 6045.995447] RBP: 0000000000000d80 R08: 0000000000001680 R09: 0000000000000360
[ 6045.995448] R10: 0000000000000000 R11: 0000000000000440 R12: 0000000000000000
[ 6045.995448] R13: 0000000000000000 R14: 0000000000000440 R15: 0000000000000000
[ 6046.017424] RAX: 0000000000000001 RBX: ffffea0001d194e8 RCX: ffff88007b06e000
[ 6046.017428] RDX: 0000000000003c23 RSI: 000000000000003b RDI: 0000000000000246
[ 6046.036278] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88007bd21d88
[ 6046.036279] RBP: ffff88007bd21d88 R08: 0000000000000000 R09: 0000000000000000
[ 6046.036280] R10: 0000000000000000 R11: 0000000000000040 R12: ffffffff81057cb0
[ 6046.036281] R13: ffffc90003bafcc0 R14: 0000000000000001 R15: 0000000000000006
[ 6046.036289] FS:  00007f4e77ded700(0000) GS:ffff88007bd00000(0000) knlGS:0000000000000000
[ 6046.052202] RBP: ffff88007bca1a00 R08: ffff88007b06e000 R09: 0000000000000000
[ 6046.052203] R10: 0000000000000000 R11: 0000000000000040 R12: 0000000000000001
[ 6046.052204] R13: 0000000000000100 R14: ffff88007bc21a00 R15: 0000000000080000
[ 6046.052211] FS:  00007f4e775ec700(0000) GS:ffff88007bc80000(0000) knlGS:0000000000000000
[ 6046.103484] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6046.103485] CR2: 00007f4ec1d81660 CR3: 000000007462a000 CR4: 00000000000006e0
[ 6046.103488] Call Trace:
[ 6046.103494]  flush_tlb_mm_range+0xb7/0x120
[ 6046.103497]  ? ptep_clear_flush+0x30/0x40
[ 6046.103498]  ptep_clear_flush+0x30/0x40
[ 6046.103500]  wp_page_copy+0x311/0x6c0
[ 6046.103502]  do_wp_page+0x111/0x4c0
[ 6046.103504]  __handle_mm_fault+0x445/0xbd0
[ 6046.103506]  handle_mm_fault+0xf8/0x200
[ 6046.103508]  __do_page_fault+0x231/0x460
[ 6046.103510]  ? page_fault+0x8/0x30
[ 6046.103511]  page_fault+0x1e/0x30
[ 6046.103513] RIP: 0033:0x7f4edb7bfe52
[ 6046.103516] Code: 1f 84 00 00 00 00 00 90 48 8d 04 49 4c 8d 0c 76 0f 28 02 0f 28 0c 0a 0f 28 14 4a 0f 28 1c 02 0f 29 07 0f 29 0c 37 0f 29 14 77 <42> 0f 29 1c 0f 48 8d 14 8a 48 8d 3c b7 41 83 e8 04 7f d3 f3 c3 0f
[ 6046.103517] RSP: 002b:00007f4e77dea3b8 EFLAGS: 00010202
[ 6046.103518] RAX: 0000000000000060 RBX: 00007f4ec011dd40 RCX: 0000000000000020
[ 6046.103518] RDX: 00007f4ec01245c0 RSI: 00000000000007c0 RDI: 00007f4ec1d7ff20
[ 6046.103519] RBP: 0000000000000002 R08: 0000000000000008 R09: 0000000000001740
[ 6046.103520] R10: 0000000000000000 R11: 0000000000000000 R12: 00007f4ec0735180
[ 6046.103520] R13: 0000000000000528 R14: 0000000000000528 R15: 00007f4ec0718440
[ 6047.332569] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6047.342645] CR2: 00007f4ec1dad760 CR3: 000000007462a000 CR4: 00000000000006e0
[ 6047.355740] Call Trace:
[ 6047.360760]  _raw_spin_lock+0x18/0x20
[ 6047.368569]  wp_page_copy+0x209/0x6c0
[ 6047.377198]  do_wp_page+0x111/0x4c0
[ 6047.386664]  __handle_mm_fault+0x445/0xbd0
[ 6047.397824]  handle_mm_fault+0xf8/0x200
[ 6047.408422]  __do_page_fault+0x231/0x460
[ 6047.419309]  ? page_fault+0x8/0x30
[ 6047.429095]  page_fault+0x1e/0x30
[ 6047.438655] RIP: 0033:0x7f4edb6b5880
[ 6047.448848] Code: 08 49 63 c0 48 01 d0 f6 c2 04 74 06 89 30 41 83 c0 04 89 f7 48 89 f8 48 c1 e0 20 48 01 c7 49 63 c0 66 0f 1f 84 00 00 00 00 00 <48> 89 3c 02 48 83 c0 08 39 c1 7f f4 8b 44 24 48 44 29 c0 83 e0 f8
[ 6047.496447] RSP: 002b:00007f4e775e9280 EFLAGS: 00010246
[ 6047.511096] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000019
[ 6047.529908] RDX: 00007f4ec1dad760 RSI: 0000000000000000 RDI: 0000000000000000
[ 6047.547875] RBP: 00007f4ec1daa920 R08: 0000000000000000 R09: 0000000000000000
[ 6047.567437] R10: 0000000000000000 R11: 0000000000000005 R12: 0000000000000020
[ 6047.585753] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
[ 6073.805174] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ml1:mon-front-i:20428]
[ 6073.827576] Modules linked in:
[ 6073.836754] CPU: 1 PID: 20428 Comm: ml1:mon-front-i Tainted: G             L    4.19.0-20181022-doflr-xennext-vlan-ppp+ #1
[ 6073.866318] Hardware name: Xen HVM domU, BIOS 4.12-unstable 10/27/2018
[ 6073.885331] RIP: 0010:__pv_queued_spin_lock_slowpath+0xda/0x280
[ 6073.901825] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [ml1:mon-front-i:20429]
[ 6073.905547] Code: 44 41 bc 01 00 00 00 41 bd 00 01 00 00 3c 02 0f 94 c0 0f b6 c0 48 89 04 24 c6 45 44 00 ba 00 80 00 00 c6 43 01 01 eb 0b f3 90 <83> ea 01 0f 84 49 01 00 00 0f b6 03 84 c0 75 ee 44 89 e8 f0 66 44
[ 6073.905548] RSP: 0000:ffffc90003ba7d38 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff0c
[ 6073.905550] RAX: 0000000000000001 RBX: ffffea0001d194e8 RCX: ffff88007b06e000
[ 6073.905550] RDX: 0000000000005395 RSI: 000000000000003b RDI: 0000000000000246
[ 6073.905551] RBP: ffff88007bca1a00 R08: ffff88007b06e000 R09: 0000000000000000
[ 6073.905552] R10: 0000000000000000 R11: 0000000000000040 R12: 0000000000000001
[ 6073.905552] R13: 0000000000000100 R14: ffff88007bc21a00 R15: 0000000000080000
[ 6073.905561] FS:  00007f4e775ec700(0000) GS:ffff88007bc80000(0000) knlGS:0000000000000000
[ 6073.905562] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6073.905563] CR2: 00007f4ec1dad760 CR3: 000000007462a000 CR4: 00000000000006e0
[ 6073.905565] Call Trace:
[ 6073.905571]  _raw_spin_lock+0x18/0x20
[ 6073.905573]  wp_page_copy+0x209/0x6c0
[ 6073.905575]  do_wp_page+0x111/0x4c0
[ 6073.905577]  __handle_mm_fault+0x445/0xbd0
[ 6073.905578]  handle_mm_fault+0xf8/0x200
[ 6073.905580]  __do_page_fault+0x231/0x460
[ 6073.905582]  ? page_fault+0x8/0x30
[ 6073.905583]  page_fault+0x1e/0x30
[ 6073.905584] RIP: 0033:0x7f4edb6b5880
[ 6073.905586] Code: 08 49 63 c0 48 01 d0 f6 c2 04 74 06 89 30 41 83 c0 04 89 f7 48 89 f8 48 c1 e0 20 48 01 c7 49 63 c0 66 0f 1f 84 00 00 00 00 00 <48> 89 3c 02 48 83 c0 08 39 c1 7f f4 8b 44 24 48 44 29 c0 83 e0 f8
[ 6073.905586] RSP: 002b:00007f4e775e9280 EFLAGS: 00010246
[ 6073.905587] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000019
[ 6073.905588] RDX: 00007f4ec1dad760 RSI: 0000000000000000 RDI: 0000000000000000
[ 6073.905588] RBP: 00007f4ec1daa920 R08: 0000000000000000 R09: 0000000000000000
[ 6073.905589] R10: 0000000000000000 R11: 0000000000000005 R12: 0000000000000020
[ 6073.905590] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
[ 6073.995165] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [nc4:mon-front-i:3291]
[ 6074.004733] Modules linked in:
[ 6074.004737] CPU: 2 PID: 20429 Comm: ml1:mon-front-i Tainted: G             L    4.19.0-20181022-doflr-xennext-vlan-ppp+ #1
[ 6074.004738] Hardware name: Xen HVM domU, BIOS 4.12-unstable 10/27/2018
[ 6074.004742] RIP: 0010:smp_call_function_many+0x1db/0x240
[ 6074.004744] Code: c7 e8 99 ab d1 00 3b 05 67 2e a0 01 0f 83 a3 fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 00 47 8a 82 8b 51 18 83 e2 01 74 0a f3 90 <8b> 51 18 83 e2 01 75 f6 eb c8 48 c7 c2 a0 e0 b3 82 48 89 ee 89 df
[ 6074.004745] RSP: 0000:ffffc90003bafc70 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff0c
[ 6074.004747] RAX: 0000000000000000 RBX: ffff88007bd21d80 RCX: ffff88007bc25ae0
[ 6074.004747] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88007bd21d88
[ 6074.004748] RBP: ffff88007bd21d88 R08: 0000000000000000 R09: 0000000000000000
[ 6074.004749] R10: 0000000000000000 R11: 0000000000000040 R12: ffffffff81057cb0
[ 6074.004749] R13: ffffc90003bafcc0 R14: 0000000000000001 R15: 0000000000000006
[ 6074.004757] FS:  00007f4e77ded700(0000) GS:ffff88007bd00000(0000) knlGS:0000000000000000
[ 6074.004757] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6074.004758] CR2: 00007f4ec1d81660 CR3: 000000007462a000 CR4: 00000000000006e0
[ 6074.004760] Call Trace:
[ 6074.004764]  flush_tlb_mm_range+0xb7/0x120
[ 6074.004767]  ? ptep_clear_flush+0x30/0x40
[ 6074.004768]  ptep_clear_flush+0x30/0x40
[ 6074.004770]  wp_page_copy+0x311/0x6c0
[ 6074.004771]  do_wp_page+0x111/0x4c0
[ 6074.004773]  __handle_mm_fault+0x445/0xbd0
[ 6074.004774]  handle_mm_fault+0xf8/0x200
[ 6074.004776]  __do_page_fault+0x231/0x460
[ 6074.004778]  ? page_fault+0x8/0x30
[ 6074.004779]  page_fault+0x1e/0x30
[ 6074.004780] RIP: 0033:0x7f4edb7bfe52
[ 6074.004781] Code: 1f 84 00 00 00 00 00 90 48 8d 04 49 4c 8d 0c 76 0f 28 02 0f 28 0c 0a 0f 28 14 4a 0f 28 1c 02 0f 29 07 0f 29 0c 37 0f 29 14 77 <42> 0f 29 1c 0f 48 8d 14 8a 48 8d 3c b7 41 83 e8 04 7f d3 f3 c3 0f
[ 6074.004782] RSP: 002b:00007f4e77dea3b8 EFLAGS: 00010202
[ 6074.004783] RAX: 0000000000000060 RBX: 00007f4ec011dd40 RCX: 0000000000000020
[ 6074.004784] RDX: 00007f4ec01245c0 RSI: 00000000000007c0 RDI: 00007f4ec1d7ff20
[ 6074.004784] RBP: 0000000000000002 R08: 0000000000000008 R09: 0000000000001740
[ 6074.004785] R10: 0000000000000000 R11: 0000000000000000 R12: 00007f4ec0735180
[ 6074.004785] R13: 0000000000000528 R14: 0000000000000528 R15: 00007f4ec0718440
[ 6075.121332] Modules linked in:
[ 6075.130018] CPU: 3 PID: 3291 Comm: nc4:mon-front-i Tainted: G             L    4.19.0-20181022-doflr-xennext-vlan-ppp+ #1
[ 6075.157613] Hardware name: Xen HVM domU, BIOS 4.12-unstable 10/27/2018
[ 6075.173412] RIP: 0010:smp_call_function_many+0x1db/0x240
[ 6075.186886] Code: c7 e8 99 ab d1 00 3b 05 67 2e a0 01 0f 83 a3 fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 00 47 8a 82 8b 51 18 83 e2 01 74 0a f3 90 <8b> 51 18 83 e2 01 75 f6 eb c8 48 c7 c2 a0 e0 b3 82 48 89 ee 89 df
[ 6075.241527] RSP: 0000:ffffc90002307c70 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff0c
[ 6075.262894] RAX: 0000000000000000 RBX: ffff88007bda1d80 RCX: ffff88007bc25c20
[ 6075.281203] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88007bda1d88
[ 6075.295600] RBP: ffff88007bda1d88 R08: 0000000000000000 R09: 0000000000000000
[ 6075.311119] R10: 0000000000000000 R11: 0000000000000040 R12: ffffffff81057cb0
[ 6075.329119] R13: ffffc90002307cc0 R14: 0000000000000001 R15: 0000000000000006
[ 6075.348512] FS:  00007f4eafa0f700(0000) GS:ffff88007bd80000(0000) knlGS:0000000000000000
[ 6075.371745] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6075.388596] CR2: 00007f4eaca182c0 CR3: 000000007462a000 CR4: 00000000000006e0
[ 6075.409252] Call Trace:
[ 6075.417519]  flush_tlb_mm_range+0xb7/0x120
[ 6075.428618]  ? ptep_clear_flush+0x30/0x40
[ 6075.438124]  ? mem_cgroup_throttle_swaprate+0x12/0x110
[ 6075.450947]  ? mem_cgroup_try_charge_delay+0x2c/0x40
[ 6075.463553]  ptep_clear_flush+0x30/0x40
[ 6075.472902]  wp_page_copy+0x311/0x6c0
[ 6075.484953]  do_wp_page+0x111/0x4c0
[ 6075.495653]  __handle_mm_fault+0x445/0xbd0
[ 6075.505234]  handle_mm_fault+0xf8/0x200
[ 6075.515135]  __do_page_fault+0x231/0x460
[ 6075.526681]  ? page_fault+0x8/0x30
[ 6075.538490]  page_fault+0x1e/0x30
[ 6075.549356] RIP: 0033:0x7f4ee1243476
[ 6075.591973] Code: 87 f3 c3 90 4c 8d 04 52 f3 0f 6f 06 f3 0f 6f 0c 16 f3 0f 6f 14 56 f3 42 0f 6f 1c 06 66 0f 7f 07 66 0f 7f 0c 17 66 0f 7f 14 57 <66> 42 0f 7f 1c 07 83 e9 04 48 8d 34 96 48 8d 3c 97 75 cb f3 c3 90
[ 6075.640506] RSP: 002b:00007f4eafa0df38 EFLAGS: 00010202
[ 6075.654685] RAX: 0000000000000000 RBX: 00007f4ea8014500 RCX: 0000000000000008
[ 6075.670418] RDX: 0000000000000780 RSI: 00007f4ea8287500 RDI: 00007f4eaca16c40
[ 6075.684779] RBP: 0000000000000d80 R08: 0000000000001680 R09: 0000000000000360
[ 6075.698332] R10: 0000000000000000 R11: 0000000000000440 R12: 0000000000000000
[ 6075.713357] R13: 0000000000000000 R14: 0000000000000440 R15: 0000000000000000
[ 6101.804934] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ml1:mon-front-i:20428]
[ 6101.818433] Modules linked in:
[ 6101.823779] CPU: 1 PID: 20428 Comm: ml1:mon-front-i Tainted: G             L    4.19.0-20181022-doflr-xennext-vlan-ppp+ #1
[ 6101.843151] Hardware name: Xen HVM domU, BIOS 4.12-unstable 10/27/2018
[ 6101.854055] RIP: 0010:__pv_queued_spin_lock_slowpath+0xda/0x280
[ 6101.868370] Code: 44 41 bc 01 00 00 00 41 bd 00 01 00 00 3c 02 0f 94 c0 0f b6 c0 48 89 04 24 c6 45 44 00 ba 00 80 00 00 c6 43 01 01 eb 0b f3 90 <83> ea 01 0f 84 49 01 00 00 0f b6 03 84 c0 75 ee 44 89 e8 f0 66 44
[ 6101.901598] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [ml1:mon-front-i:20429]
[ 6101.909332] RSP: 0000:ffffc90003ba7d38 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff0c
[ 6101.930379] Modules linked in:
[ 6101.950302] RAX: 0000000000000001 RBX: ffffea0001d194e8 RCX: ffff88007b06e000
[ 6101.950303] RDX: 00000000000040ec RSI: 000000000000003b RDI: 0000000000000246
[ 6101.950304] RBP: ffff88007bca1a00 R08: ffff88007b06e000 R09: 0000000000000000
[ 6101.950305] R10: 0000000000000000 R11: 0000000000000040 R12: 0000000000000001
[ 6101.950305] R13: 0000000000000100 R14: ffff88007bc21a00 R15: 0000000000080000
[ 6101.950313] FS:  00007f4e775ec700(0000) GS:ffff88007bc80000(0000) knlGS:0000000000000000
[ 6101.950314] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6101.950315] CR2: 00007f4ec1dad760 CR3: 000000007462a000 CR4: 00000000000006e0
[ 6101.950317] Call Trace:
[ 6101.950324]  _raw_spin_lock+0x18/0x20
[ 6101.950327]  wp_page_copy+0x209/0x6c0
[ 6101.950328]  do_wp_page+0x111/0x4c0
[ 6101.950330]  __handle_mm_fault+0x445/0xbd0
[ 6101.950332]  handle_mm_fault+0xf8/0x200
[ 6101.950333]  __do_page_fault+0x231/0x460
[ 6101.950335]  ? page_fault+0x8/0x30
[ 6101.950336]  page_fault+0x1e/0x30
[ 6101.950337] RIP: 0033:0x7f4edb6b5880
[ 6101.950341] Code: 08 49 63 c0 48 01 d0 f6 c2 04 74 06 89 30 41 83 c0 04 89 f7 48 89 f8 48 c1 e0 20 48 01 c7 49 63 c0 66 0f 1f 84 00 00 00 00 00 <48> 89 3c 02 48 83 c0 08 39 c1 7f f4 8b 44 24 48 44 29 c0 83 e0 f8
[ 6101.950342] RSP: 002b:00007f4e775e9280 EFLAGS: 00010246
[ 6101.950343] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000019
[ 6101.950344] RDX: 00007f4ec1dad760 RSI: 0000000000000000 RDI: 0000000000000000
[ 6101.950344] RBP: 00007f4ec1daa920 R08: 0000000000000000 R09: 0000000000000000
[ 6101.950345] R10: 0000000000000000 R11: 0000000000000005 R12: 0000000000000020
[ 6101.950345] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
[ 6101.994931] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [nc4:mon-front-i:3291]
[ 6102.005033] CPU: 2 PID: 20429 Comm: ml1:mon-front-i Tainted: G             L    4.19.0-20181022-doflr-xennext-vlan-ppp+ #1
[ 6102.025674] Modules linked in:
[ 6102.025680] CPU: 3 PID: 3291 Comm: nc4:mon-front-i Tainted: G             L    4.19.0-20181022-doflr-xennext-vlan-ppp+ #1
[ 6102.045621] Hardware name: Xen HVM domU, BIOS 4.12-unstable 10/27/2018
[ 6102.045629] RIP: 0010:smp_call_function_many+0x1db/0x240
[ 6102.065819] Hardware name: Xen HVM domU, BIOS 4.12-unstable 10/27/2018
[ 6102.065824] RIP: 0010:smp_call_function_many+0x1db/0x240
[ 6102.065826] Code: c7 e8 99 ab d1 00 3b 05 67 2e a0 01 0f 83 a3 fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 00 47 8a 82 8b 51 18 83 e2 01 74 0a f3 90 <8b> 51 18 83 e2 01 75 f6 eb c8 48 c7 c2 a0 e0 b3 82 48 89 ee 89 df
[ 6102.065827] RSP: 0000:ffffc90002307c70 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff0c
[ 6102.065829] RAX: 0000000000000000 RBX: ffff88007bda1d80 RCX: ffff88007bc25c20
[ 6102.065829] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88007bda1d88
[ 6102.065830] RBP: ffff88007bda1d88 R08: 0000000000000000 R09: 0000000000000000
[ 6102.065831] R10: 0000000000000000 R11: 0000000000000040 R12: ffffffff81057cb0
[ 6102.065831] R13: ffffc90002307cc0 R14: 0000000000000001 R15: 0000000000000006
[ 6102.065839] FS:  00007f4eafa0f700(0000) GS:ffff88007bd80000(0000) knlGS:0000000000000000
[ 6102.065839] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6102.065840] CR2: 00007f4eaca182c0 CR3: 000000007462a000 CR4: 00000000000006e0
[ 6102.065841] Call Trace:
[ 6102.065847]  flush_tlb_mm_range+0xb7/0x120
[ 6102.065850]  ? ptep_clear_flush+0x30/0x40
[ 6102.065852]  ? mem_cgroup_throttle_swaprate+0x12/0x110
[ 6102.065854]  ? mem_cgroup_try_charge_delay+0x2c/0x40
[ 6102.065855]  ptep_clear_flush+0x30/0x40
[ 6102.065856]  wp_page_copy+0x311/0x6c0
[ 6102.065858]  do_wp_page+0x111/0x4c0
[ 6102.065859]  __handle_mm_fault+0x445/0xbd0
[ 6102.065861]  handle_mm_fault+0xf8/0x200
[ 6102.065862]  __do_page_fault+0x231/0x460
[ 6102.065864]  ? page_fault+0x8/0x30
[ 6102.065865]  page_fault+0x1e/0x30
[ 6102.065866] RIP: 0033:0x7f4ee1243476
[ 6102.065868] Code: 87 f3 c3 90 4c 8d 04 52 f3 0f 6f 06 f3 0f 6f 0c 16 f3 0f 6f 14 56 f3 42 0f 6f 1c 06 66 0f 7f 07 66 0f 7f 0c 17 66 0f 7f 14 57 <66> 42 0f 7f 1c 07 83 e9 04 48 8d 34 96 48 8d 3c 97 75 cb f3 c3 90
[ 6102.065868] RSP: 002b:00007f4eafa0df38 EFLAGS: 00010202
[ 6102.065869] RAX: 0000000000000000 RBX: 00007f4ea8014500 RCX: 0000000000000008
[ 6102.065870] RDX: 0000000000000780 RSI: 00007f4ea8287500 RDI: 00007f4eaca16c40
[ 6102.065870] RBP: 0000000000000d80 R08: 0000000000001680 R09: 0000000000000360
[ 6102.065871] R10: 0000000000000000 R11: 0000000000000440 R12: 0000000000000000
[ 6102.065871] R13: 0000000000000000 R14: 0000000000000440 R15: 0000000000000000
[ 6103.031440] Code: c7 e8 99 ab d1 00 3b 05 67 2e a0 01 0f 83 a3 fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 00 47 8a 82 8b 51 18 83 e2 01 74 0a f3 90 <8b> 51 18 83 e2 01 75 f6 eb c8 48 c7 c2 a0 e0 b3 82 48 89 ee 89 df
[ 6103.086011] RSP: 0000:ffffc90003bafc70 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff0c
[ 6103.106153] RAX: 0000000000000000 RBX: ffff88007bd21d80 RCX: ffff88007bc25ae0
[ 6103.125746] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88007bd21d88
[ 6103.145416] RBP: ffff88007bd21d88 R08: 0000000000000000 R09: 0000000000000000
[ 6103.165330] R10: 0000000000000000 R11: 0000000000000040 R12: ffffffff81057cb0
[ 6103.184039] R13: ffffc90003bafcc0 R14: 0000000000000001 R15: 0000000000000006
[ 6103.202199] FS:  00007f4e77ded700(0000) GS:ffff88007bd00000(0000) knlGS:0000000000000000
[ 6103.223752] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6103.238817] CR2: 00007f4ec1d81660 CR3: 000000007462a000 CR4: 00000000000006e0
[ 6103.257574] Call Trace:
[ 6103.264574]  flush_tlb_mm_range+0xb7/0x120
[ 6103.276453]  ? ptep_clear_flush+0x30/0x40
[ 6103.287404]  ptep_clear_flush+0x30/0x40
[ 6103.297723]  wp_page_copy+0x311/0x6c0
[ 6103.308219]  do_wp_page+0x111/0x4c0
[ 6103.317656]  __handle_mm_fault+0x445/0xbd0
[ 6103.328956]  handle_mm_fault+0xf8/0x200
[ 6103.336587]  __do_page_fault+0x231/0x460
[ 6103.346720]  ? page_fault+0x8/0x30
[ 6103.353781]  page_fault+0x1e/0x30
[ 6103.361789] RIP: 0033:0x7f4edb7bfe52
[ 6103.368692] Code: 1f 84 00 00 00 00 00 90 48 8d 04 49 4c 8d 0c 76 0f 28 02 0f 28 0c 0a 0f 28 14 4a 0f 28 1c 02 0f 29 07 0f 29 0c 37 0f 29 14 77 <42> 0f 29 1c 0f 48 8d 14 8a 48 8d 3c b7 41 83 e8 04 7f d3 f3 c3 0f
[ 6103.411892] RSP: 002b:00007f4e77dea3b8 EFLAGS: 00010202
[ 6103.425641] RAX: 0000000000000060 RBX: 00007f4ec011dd40 RCX: 0000000000000020
[ 6103.443081] RDX: 00007f4ec01245c0 RSI: 00000000000007c0 RDI: 00007f4ec1d7ff20
[ 6103.460544] RBP: 0000000000000002 R08: 0000000000000008 R09: 0000000000001740
[ 6103.477320] R10: 0000000000000000 R11: 0000000000000000 R12: 00007f4ec0735180
[ 6103.490583] R13: 0000000000000528 R14: 0000000000000528 R15: 00007f4ec0718440
[ 6129.804708] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ml1:mon-front-i:20428]
[ 6129.821850] Modules linked in:
[ 6129.829479] CPU: 1 PID: 20428 Comm: ml1:mon-front-i Tainted: G             L    4.19.0-20181022-doflr-xennext-vlan-ppp+ #1
[ 6129.854952] Hardware name: Xen HVM domU, BIOS 4.12-unstable 10/27/2018
[ 6129.871686] RIP: 0010:__pv_queued_spin_lock_slowpath+0xda/0x280
[ 6129.887378] Code: 44 41 bc 01 00 00 00 41 bd 00 01 00 00 3c 02 0f 94 c0 0f b6 c0 48 89 04 24 c6 45 44 00 ba 00 80 00 00 c6 43 01 01 eb 0b f3 90 <83> ea 01 0f 84 49 01 00 00 0f b6 03 84 c0 75 ee 44 89 e8 f0 66 44
[ 6129.901375] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [ml1:mon-front-i:20429]
[ 6129.931499] RSP: 0000:ffffc90003ba7d38 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff0c
[ 6129.950822] Modules linked in:
[ 6129.950826] CPU: 2 PID: 20429 Comm: ml1:mon-front-i Tainted: G             L    4.19.0-20181022-doflr-xennext-vlan-ppp+ #1
[ 6129.950827] Hardware name: Xen HVM domU, BIOS 4.12-unstable 10/27/2018
[ 6129.950831] RIP: 0010:smp_call_function_many+0x1db/0x240
[ 6129.950833] Code: c7 e8 99 ab d1 00 3b 05 67 2e a0 01 0f 83 a3 fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 00 47 8a 82 8b 51 18 83 e2 01 74 0a f3 90 <8b> 51 18 83 e2 01 75 f6 eb c8 48 c7 c2 a0 e0 b3 82 48 89 ee 89 df
[ 6129.950834] RSP: 0000:ffffc90003bafc70 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff0c
[ 6129.950835] RAX: 0000000000000000 RBX: ffff88007bd21d80 RCX: ffff88007bc25ae0
[ 6129.950836] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88007bd21d88
[ 6129.950836] RBP: ffff88007bd21d88 R08: 0000000000000000 R09: 0000000000000000
[ 6129.950837] R10: 0000000000000000 R11: 0000000000000040 R12: ffffffff81057cb0
[ 6129.950838] R13: ffffc90003bafcc0 R14: 0000000000000001 R15: 0000000000000006
[ 6129.950844] FS:  00007f4e77ded700(0000) GS:ffff88007bd00000(0000) knlGS:0000000000000000
[ 6129.950845] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6129.950846] CR2: 00007f4ec1d81660 CR3: 000000007462a000 CR4: 00000000000006e0
[ 6129.950847] Call Trace:
[ 6129.950853]  flush_tlb_mm_range+0xb7/0x120
[ 6129.950855]  ? ptep_clear_flush+0x30/0x40
[ 6129.950856]  ptep_clear_flush+0x30/0x40
[ 6129.950857]  wp_page_copy+0x311/0x6c0
[ 6129.950859]  do_wp_page+0x111/0x4c0
[ 6129.950861]  __handle_mm_fault+0x445/0xbd0
[ 6129.950862]  handle_mm_fault+0xf8/0x200
[ 6129.950864]  __do_page_fault+0x231/0x460
[ 6129.950865]  ? page_fault+0x8/0x30
[ 6129.950866]  page_fault+0x1e/0x30
[ 6129.950869] RIP: 0033:0x7f4edb7bfe52
[ 6129.950870] Code: 1f 84 00 00 00 00 00 90 48 8d 04 49 4c 8d 0c 76 0f 28 02 0f 28 0c 0a 0f 28 14 4a 0f 28 1c 02 0f 29 07 0f 29 0c 37 0f 29 14 77 <42> 0f 29 1c 0f 48 8d 14 8a 48 8d 3c b7 41 83 e8 04 7f d3 f3 c3 0f
[ 6129.950871] RSP: 002b:00007f4e77dea3b8 EFLAGS: 00010202
[ 6129.950871] RAX: 0000000000000060 RBX: 00007f4ec011dd40 RCX: 0000000000000020
[ 6129.950872] RDX: 00007f4ec01245c0 RSI: 00000000000007c0 RDI: 00007f4ec1d7ff20
[ 6129.950873] RBP: 0000000000000002 R08: 0000000000000008 R09: 0000000000001740
[ 6129.950873] R10: 0000000000000000 R11: 0000000000000000 R12: 00007f4ec0735180
[ 6129.950874] R13: 0000000000000528 R14: 0000000000000528 R15: 00007f4ec0718440
[ 6129.994712] watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [nc4:mon-front-i:3291]
[ 6130.016750] RAX: 0000000000000001 RBX: ffffea0001d194e8 RCX: ffff88007b06e000
[ 6130.016754] RDX: 0000000000005e93 RSI: 000000000000003b RDI: 0000000000000246
[ 6130.032446] Modules linked in:
[ 6130.032450] CPU: 3 PID: 3291 Comm: nc4:mon-front-i Tainted: G             L    4.19.0-20181022-doflr-xennext-vlan-ppp+ #1
[ 6130.032451] Hardware name: Xen HVM domU, BIOS 4.12-unstable 10/27/2018
[ 6130.032455] RIP: 0010:smp_call_function_many+0x1db/0x240
[ 6130.032458] Code: c7 e8 99 ab d1 00 3b 05 67 2e a0 01 0f 83 a3 fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 00 47 8a 82 8b 51 18 83 e2 01 74 0a f3 90 <8b> 51 18 83 e2 01 75 f6 eb c8 48 c7 c2 a0 e0 b3 82 48 89 ee 89 df
[ 6130.032459] RSP: 0000:ffffc90002307c70 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff0c
[ 6130.032460] RAX: 0000000000000000 RBX: ffff88007bda1d80 RCX: ffff88007bc25c20
[ 6130.032460] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88007bda1d88
[ 6130.032461] RBP: ffff88007bda1d88 R08: 0000000000000000 R09: 0000000000000000
[ 6130.032462] R10: 0000000000000000 R11: 0000000000000040 R12: ffffffff81057cb0
[ 6130.032462] R13: ffffc90002307cc0 R14: 0000000000000001 R15: 0000000000000006
[ 6130.032469] FS:  00007f4eafa0f700(0000) GS:ffff88007bd80000(0000) knlGS:0000000000000000
[ 6130.032470] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6130.032471] CR2: 00007f4eaca182c0 CR3: 000000007462a000 CR4: 00000000000006e0
[ 6130.032472] Call Trace:
[ 6130.032477]  flush_tlb_mm_range+0xb7/0x120
[ 6130.032479]  ? ptep_clear_flush+0x30/0x40
[ 6130.032481]  ? mem_cgroup_throttle_swaprate+0x12/0x110
[ 6130.032484]  ? mem_cgroup_try_charge_delay+0x2c/0x40
[ 6130.032485]  ptep_clear_flush+0x30/0x40
[ 6130.032487]  wp_page_copy+0x311/0x6c0
[ 6130.032488]  do_wp_page+0x111/0x4c0
[ 6130.032489]  __handle_mm_fault+0x445/0xbd0
[ 6130.032491]  handle_mm_fault+0xf8/0x200
[ 6130.032492]  __do_page_fault+0x231/0x460
[ 6130.032494]  ? page_fault+0x8/0x30
[ 6130.032495]  page_fault+0x1e/0x30
[ 6130.032496] RIP: 0033:0x7f4ee1243476
[ 6130.032498] Code: 87 f3 c3 90 4c 8d 04 52 f3 0f 6f 06 f3 0f 6f 0c 16 f3 0f 6f 14 56 f3 42 0f 6f 1c 06 66 0f 7f 07 66 0f 7f 0c 17 66 0f 7f 14 57 <66> 42 0f 7f 1c 07 83 e9 04 48 8d 34 96 48 8d 3c 97 75 cb f3 c3 90
[ 6130.032498] RSP: 002b:00007f4eafa0df38 EFLAGS: 00010202
[ 6130.032499] RAX: 0000000000000000 RBX: 00007f4ea8014500 RCX: 0000000000000008
[ 6130.032500] RDX: 0000000000000780 RSI: 00007f4ea8287500 RDI: 00007f4eaca16c40
[ 6130.032500] RBP: 0000000000000d80 R08: 0000000000001680 R09: 0000000000000360
[ 6130.032501] R10: 0000000000000000 R11: 0000000000000440 R12: 0000000000000000
[ 6130.032501] R13: 0000000000000000 R14: 0000000000000440 R15: 0000000000000000
[ 6131.291047] RBP: ffff88007bca1a00 R08: ffff88007b06e000 R09: 0000000000000000
[ 6131.309139] R10: 0000000000000000 R11: 0000000000000040 R12: 0000000000000001
[ 6131.327972] R13: 0000000000000100 R14: ffff88007bc21a00 R15: 0000000000080000
[ 6131.346497] FS:  00007f4e775ec700(0000) GS:ffff88007bc80000(0000) knlGS:0000000000000000
[ 6131.368028] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6131.382547] CR2: 00007f4ec1dad760 CR3: 000000007462a000 CR4: 00000000000006e0
[ 6131.400462] Call Trace:
[ 6131.407719]  _raw_spin_lock+0x18/0x20
[ 6131.416243]  wp_page_copy+0x209/0x6c0
[ 6131.426001]  do_wp_page+0x111/0x4c0
[ 6131.435944]  __handle_mm_fault+0x445/0xbd0
[ 6131.446758]  handle_mm_fault+0xf8/0x200
[ 6131.456932]  __do_page_fault+0x231/0x460
[ 6131.468114]  ? page_fault+0x8/0x30
[ 6131.477575]  page_fault+0x1e/0x30
[ 6131.486773] RIP: 0033:0x7f4edb6b5880
[ 6131.496623] Code: 08 49 63 c0 48 01 d0 f6 c2 04 74 06 89 30 41 83 c0 04 89 f7 48 89 f8 48 c1 e0 20 48 01 c7 49 63 c0 66 0f 1f 84 00 00 00 00 00 <48> 89 3c 02 48 83 c0 08 39 c1 7f f4 8b 44 24 48 44 29 c0 83 e0 f8
[ 6131.543172] RSP: 002b:00007f4e775e9280 EFLAGS: 00010246
[ 6131.556827] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000019
[ 6131.575884] RDX: 00007f4ec1dad760 RSI: 0000000000000000 RDI: 0000000000000000
[ 6131.594898] RBP: 00007f4ec1daa920 R08: 0000000000000000 R09: 0000000000000000
[ 6131.614826] R10: 0000000000000000 R11: 0000000000000005 R12: 0000000000000020
[ 6131.633727] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Guest soft lockups with "xen: make xen_qlock_wait() nestable"
  2018-11-07  9:30 Guest soft lockups with "xen: make xen_qlock_wait() nestable" Sander Eikelenboom
@ 2018-11-07 22:34 ` Boris Ostrovsky
  2018-11-07 22:45   ` Sander Eikelenboom
  2018-11-08  7:08 ` Juergen Gross
  1 sibling, 1 reply; 14+ messages in thread
From: Boris Ostrovsky @ 2018-11-07 22:34 UTC (permalink / raw)
  To: Sander Eikelenboom, Juergen Gross; +Cc: xen-devel

On 11/7/18 4:30 AM, Sander Eikelenboom wrote:
> Hi Juergen / Boris,
>
> Last week I tested Linux kernel 4.19.0 stable with the Xen "for-linus-4.20" branch pulled on top.
> Unfortunately I was seeing guests lock up after some time; see below for the logging from one of the guests
> which I was able to capture.
> Reverting "xen: make xen_qlock_wait() nestable" (commit 7250f6d35681dfc44749d90598a2d51a118ce2b8)
> made the lockups disappear.
>
> These guests are stressed quite hard in both CPU and networking,
> so they are probably more susceptible to locking issues.
>
> The system is an AMD Phenom X6, running Xen-unstable.
>
> Any ideas?


By any chance, is VPMU on?


-boris





^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Guest soft lockups with "xen: make xen_qlock_wait() nestable"
  2018-11-07 22:34 ` Boris Ostrovsky
@ 2018-11-07 22:45   ` Sander Eikelenboom
  2018-11-07 22:53     ` Boris Ostrovsky
  0 siblings, 1 reply; 14+ messages in thread
From: Sander Eikelenboom @ 2018-11-07 22:45 UTC (permalink / raw)
  To: Boris Ostrovsky, Juergen Gross; +Cc: xen-devel

On 07/11/18 23:34, Boris Ostrovsky wrote:
> On 11/7/18 4:30 AM, Sander Eikelenboom wrote:
>> Hi Juergen / Boris,
>>
>> Last week I tested Linux kernel 4.19.0 stable with the Xen "for-linus-4.20" branch pulled on top.
>> Unfortunately I was seeing guests lock up after some time; see below for the logging from one of the guests
>> which I was able to capture.
>> Reverting "xen: make xen_qlock_wait() nestable" (commit 7250f6d35681dfc44749d90598a2d51a118ce2b8)
>> made the lockups disappear.
>>
>> These guests are stressed quite hard in both CPU and networking,
>> so they are probably more susceptible to locking issues.
>>
>> The system is an AMD Phenom X6, running Xen-unstable.
>>
>> Any ideas?
> 
> 
> By any chance, is VPMU on?
> 
> 
> -boris
> 

Had to look up what that is :), but it seems only applicable to PV guests, if I'm correct?

I'm only running PVH and HVM guests at the moment, except for dom0 of course,
which reports:
    [    0.941407] VPMU disabled by hypervisor.

These soft lockups were in an HVM guest. If I remember correctly, I have seen
a PVH guest lock up as well after a while (also one quite heavily stressed in CPU and networking).

--
Sander




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Guest soft lockups with "xen: make xen_qlock_wait() nestable"
  2018-11-07 22:45   ` Sander Eikelenboom
@ 2018-11-07 22:53     ` Boris Ostrovsky
  0 siblings, 0 replies; 14+ messages in thread
From: Boris Ostrovsky @ 2018-11-07 22:53 UTC (permalink / raw)
  To: Sander Eikelenboom, Juergen Gross; +Cc: xen-devel

On 11/7/18 5:45 PM, Sander Eikelenboom wrote:
> On 07/11/18 23:34, Boris Ostrovsky wrote:
>> On 11/7/18 4:30 AM, Sander Eikelenboom wrote:
>>> Hi Juergen / Boris,
>>>
>>> Last week I tested Linux kernel 4.19.0 stable with the Xen "for-linus-4.20" branch pulled on top.
>>> Unfortunately I was seeing guests lock up after some time; see below for the logging from one of the guests
>>> which I was able to capture.
>>> Reverting "xen: make xen_qlock_wait() nestable" (commit 7250f6d35681dfc44749d90598a2d51a118ce2b8)
>>> made the lockups disappear.
>>>
>>> These guests are stressed quite hard in both CPU and networking,
>>> so they are probably more susceptible to locking issues.
>>>
>>> The system is an AMD Phenom X6, running Xen-unstable.
>>>
>>> Any ideas?
>>
>> By any chance, is VPMU on?
>>
>>
>> -boris
>>
> Had to look up what that is :), but it seems only applicable to PV guests, if I'm correct?

No, it is applicable to HVM guests as well.

>
> I'm only running PVH and HVM guests at the moment, except for dom0 of course,
> which reports:
>     [    0.941407] VPMU disabled by hypervisor.

OK, you don't have it. (I asked because I was thinking of NMIs)

-boris

>
> These soft lockups were in an HVM guest. If I remember correctly, I have seen
> a PVH guest lock up as well after a while (also one quite heavily stressed in CPU and networking).
>
> --
> Sander
>
>



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Guest soft lockups with "xen: make xen_qlock_wait() nestable"
  2018-11-07  9:30 Guest soft lockups with "xen: make xen_qlock_wait() nestable" Sander Eikelenboom
  2018-11-07 22:34 ` Boris Ostrovsky
@ 2018-11-08  7:08 ` Juergen Gross
  2018-11-08  8:14   ` Sander Eikelenboom
  1 sibling, 1 reply; 14+ messages in thread
From: Juergen Gross @ 2018-11-08  7:08 UTC (permalink / raw)
  To: Sander Eikelenboom, Boris Ostrovsky; +Cc: xen-devel

On 07/11/2018 10:30, Sander Eikelenboom wrote:
> Hi Juergen / Boris,
> 
> Last week I tested Linux kernel 4.19.0 stable with the Xen "for-linus-4.20" branch pulled on top.
> Unfortunately I was seeing guests lock up after some time; see below for the logging from one of the guests
> which I was able to capture.
> Reverting "xen: make xen_qlock_wait() nestable" (commit 7250f6d35681dfc44749d90598a2d51a118ce2b8)
> made the lockups disappear.
>
> These guests are stressed quite hard in both CPU and networking,
> so they are probably more susceptible to locking issues.
>
> The system is an AMD Phenom X6, running Xen-unstable.
>
> Any ideas?

Just checked the hypervisor again: it seems a pending interrupt for a
HVM/PVH vcpu won't let SCHEDOP_poll return in case interrupts are
disabled.

I need to rework the patch for that scenario. Until then I'll revert
it.
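
For reference, the path in question in arch/x86/xen/spinlock.c after that
commit looks roughly like this (a simplified sketch, not a verbatim quote
of the file):

static void xen_qlock_wait(u8 *byte, u8 val)
{
	unsigned long flags;
	int irq = __this_cpu_read(lock_kicker_irq);

	/* If kicker interrupts not initialized yet, just spin. */
	if (irq == -1 || in_nmi())
		return;

	/* Guard against reentry by masking interrupts ... */
	local_irq_save(flags);

	/* If irq pending already clear it. */
	if (xen_test_irq_pending(irq)) {
		xen_clear_irq_pending(irq);
	} else if (READ_ONCE(*byte) == val) {
		/*
		 * ... but with interrupts disabled a pending event won't make
		 * the SCHEDOP_poll hypercall behind xen_poll_irq() return for
		 * an HVM/PVH vcpu, so the vcpu can sit here forever and the
		 * guest's soft lockup watchdog eventually fires.
		 */
		xen_poll_irq(irq);
	}

	local_irq_restore(flags);
}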


Juergen


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Guest soft lockups with "xen: make xen_qlock_wait() nestable"
  2018-11-08  7:08 ` Juergen Gross
@ 2018-11-08  8:14   ` Sander Eikelenboom
  2018-11-08  8:18     ` Juergen Gross
  0 siblings, 1 reply; 14+ messages in thread
From: Sander Eikelenboom @ 2018-11-08  8:14 UTC (permalink / raw)
  To: Juergen Gross, Boris Ostrovsky; +Cc: xen-devel

On 08/11/18 08:08, Juergen Gross wrote:
> On 07/11/2018 10:30, Sander Eikelenboom wrote:
>> Hi Juergen / Boris,
>>
>> Last week I tested Linux kernel 4.19.0 stable with the Xen "for-linus-4.20" branch pulled on top.
>> Unfortunately I was seeing guests lock up after some time; see below for the logging from one of the guests
>> which I was able to capture.
>> Reverting "xen: make xen_qlock_wait() nestable" (commit 7250f6d35681dfc44749d90598a2d51a118ce2b8)
>> made the lockups disappear.
>>
>> These guests are stressed quite hard in both CPU and networking,
>> so they are probably more susceptible to locking issues.
>>
>> The system is an AMD Phenom X6, running Xen-unstable.
>>
>> Any ideas?
> 
> Just checked the hypervisor again: it seems a pending interrupt for a
> HVM/PVH vcpu won't let SCHEDOP_poll return in case interrupts are
> disabled.
> 
> I need to rework the patch for that scenario. Until then I'll revert
> it.

Thanks for looking into it.

--
Sander

> 
> Juergen
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Guest soft lockups with "xen: make xen_qlock_wait() nestable"
  2018-11-08  8:14   ` Sander Eikelenboom
@ 2018-11-08  8:18     ` Juergen Gross
  2018-11-08  9:57       ` Sander Eikelenboom
  0 siblings, 1 reply; 14+ messages in thread
From: Juergen Gross @ 2018-11-08  8:18 UTC (permalink / raw)
  To: Sander Eikelenboom, Boris Ostrovsky; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 1096 bytes --]

On 08/11/2018 09:14, Sander Eikelenboom wrote:
> On 08/11/18 08:08, Juergen Gross wrote:
>> On 07/11/2018 10:30, Sander Eikelenboom wrote:
>>> Hi Juergen / Boris,
>>>
>>> Last week I tested Linux kernel 4.19.0 stable with the Xen "for-linus-4.20" branch pulled on top.
>>> Unfortunately I was seeing guests lock up after some time; see below for the logging from one of the guests
>>> which I was able to capture.
>>> Reverting "xen: make xen_qlock_wait() nestable" (commit 7250f6d35681dfc44749d90598a2d51a118ce2b8)
>>> made the lockups disappear.
>>>
>>> These guests are stressed quite hard in both CPU and networking,
>>> so they are probably more susceptible to locking issues.
>>>
>>> The system is an AMD Phenom X6, running Xen-unstable.
>>>
>>> Any ideas?
>>
>> Just checked the hypervisor again: it seems a pending interrupt for a
>> HVM/PVH vcpu won't let SCHEDOP_poll return in case interrupts are
>> disabled.
>>
>> I need to rework the patch for that scenario. Until then I'll revert
>> it.
> 
> Thanks for looking into it.

Could you try the attached patch (on top of 7250f6d35681df)?


Juergen


[-- Attachment #2: 0001-xen-fix-xen_qlock_wait.patch --]
[-- Type: text/x-patch, Size: 2358 bytes --]

From 4f2d04b321d4eb50dab5cdfaa025336f9360618a Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Thu, 8 Nov 2018 08:35:06 +0100
Subject: [PATCH] xen: fix xen_qlock_wait()

Commit a856531951dc80 ("xen: make xen_qlock_wait() nestable")
introduced a regression for Xen guests running fully virtualized
(HVM or PVH mode). The Xen hypervisor wouldn't return from the poll
hypercall with interrupts disabled in case of an interrupt (for PV
guests it does).

So instead of disabling interrupts in xen_qlock_wait() use a nesting
counter to avoid calling xen_clear_irq_pending() in case
xen_qlock_wait() is nested.

Fixes: a856531951dc80 ("xen: make xen_qlock_wait() nestable")
Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/xen/spinlock.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 441c88262169..22f3baa67a25 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -9,6 +9,7 @@
 #include <linux/log2.h>
 #include <linux/gfp.h>
 #include <linux/slab.h>
+#include <linux/atomic.h>
 
 #include <asm/paravirt.h>
 #include <asm/qspinlock.h>
@@ -21,6 +22,7 @@
 
 static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
 static DEFINE_PER_CPU(char *, irq_name);
+static DEFINE_PER_CPU(atomic_t, xen_qlock_wait_nest);
 static bool xen_pvspin = true;
 
 static void xen_qlock_kick(int cpu)
@@ -39,25 +41,25 @@ static void xen_qlock_kick(int cpu)
  */
 static void xen_qlock_wait(u8 *byte, u8 val)
 {
-	unsigned long flags;
 	int irq = __this_cpu_read(lock_kicker_irq);
 
 	/* If kicker interrupts not initialized yet, just spin */
 	if (irq == -1 || in_nmi())
 		return;
 
-	/* Guard against reentry. */
-	local_irq_save(flags);
+	/* Detect reentry. */
+	atomic_inc(&xen_qlock_wait_nest);
 
-	/* If irq pending already clear it. */
-	if (xen_test_irq_pending(irq)) {
+	/* If irq pending already and no nested call clear it. */
+	if (atomic_read(&xen_qlock_wait_nest) == 1 &&
+	    xen_test_irq_pending(irq)) {
 		xen_clear_irq_pending(irq);
 	} else if (READ_ONCE(*byte) == val) {
 		/* Block until irq becomes pending (or a spurious wakeup) */
 		xen_poll_irq(irq);
 	}
 
-	local_irq_restore(flags);
+	atomic_dec(&xen_qlock_wait_nest);
 }
 
 static irqreturn_t dummy_handler(int irq, void *dev_id)
-- 
2.16.4



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: Guest soft lockups with "xen: make xen_qlock_wait() nestable"
  2018-11-08  8:18     ` Juergen Gross
@ 2018-11-08  9:57       ` Sander Eikelenboom
  2018-11-08 10:18         ` Juergen Gross
  0 siblings, 1 reply; 14+ messages in thread
From: Sander Eikelenboom @ 2018-11-08  9:57 UTC (permalink / raw)
  To: Juergen Gross, Boris Ostrovsky; +Cc: xen-devel

On 08/11/18 09:18, Juergen Gross wrote:
> On 08/11/2018 09:14, Sander Eikelenboom wrote:
>> On 08/11/18 08:08, Juergen Gross wrote:
>>> On 07/11/2018 10:30, Sander Eikelenboom wrote:
>>>> Hi Juergen / Boris,
>>>>
>>>> Last week I tested Linux kernel 4.19.0 stable with the Xen "for-linus-4.20" branch pulled on top.
>>>> Unfortunately I was seeing guests lock up after some time; see below for the logging from one of the guests
>>>> which I was able to capture.
>>>> Reverting "xen: make xen_qlock_wait() nestable" (commit 7250f6d35681dfc44749d90598a2d51a118ce2b8)
>>>> made the lockups disappear.
>>>>
>>>> These guests are stressed quite hard in both CPU and networking,
>>>> so they are probably more susceptible to locking issues.
>>>>
>>>> The system is an AMD Phenom X6, running Xen-unstable.
>>>>
>>>> Any ideas?
>>>
>>> Just checked the hypervisor again: it seems a pending interrupt for a
>>> HVM/PVH vcpu won't let SCHEDOP_poll return in case interrupts are
>>> disabled.
>>>
>>> I need to rework the patch for that scenario. Until then I'll revert
>>> it.
>>
>> Thanks for looking into it.
> 
> Could you try the attached patch (on top of 7250f6d35681df)?

That blows up while booting the guest:

[    1.792870] installing Xen timer for CPU 1
[    1.796171] x86: Booting SMP configuration:
[    1.799410] .... node  #0, CPUs:      #1
[    1.882922] cpu 1 spinlock event irq 59
[    1.899446] installing Xen timer for CPU 2
[    1.902864]  #2
[    1.986248] cpu 2 spinlock event irq 65
[    1.996200] installing Xen timer for CPU 3
[    1.999522]  #3
[    2.082921] cpu 3 spinlock event irq 71
[    2.092749] smp: Brought up 1 node, 4 CPUs
[    2.096079] smpboot: Max logical packages: 1
[    2.099410] smpboot: Total of 4 processors activated (25688.36 BogoMIPS)
[    2.102893] BUG: unable to handle kernel paging request at 0000000000014f90
[    2.106063] PGD 0 P4D 0 
[    2.106063] Oops: 0002 [#1] SMP NOPTI
[    2.106063] CPU: 1 PID: 16 Comm: migration/1 Not tainted 4.19.0-20181108-doflr-xennext-vlan-ppp-blkmq-qlockpatch+ #1
[    2.106063] Hardware name: Xen HVM domU, BIOS 4.12-unstable 10/30/2018
[    2.106063] RIP: 0010:xen_qlock_wait+0x23/0x70
[    2.106063] Code: 1f 84 00 00 00 00 00 55 53 48 83 ec 08 65 8b 2d 63 33 ff 7e 83 fd ff 74 32 65 8b 05 47 3f ff 7e a9 00 00 10 00 75 24 48 89 fb <f0> ff 05 36 33 ff 7e 8b 05 30 33 ff 7e 83 f8 01 74 16 0f b6 03 40
[    2.106063] RSP: 0018:ffffc900006d3dc0 EFLAGS: 00010046
[    2.106063] RAX: 0000000080000001 RBX: ffffffff831a5a68 RCX: 0000000000000008
[    2.106063] RDX: ffff88010f7ef700 RSI: 0000000000000003 RDI: ffffffff831a5a68
[    2.106063] RBP: 000000000000003b R08: 0000000000000008 R09: 000000000000006c
[    2.106063] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[    2.106063] R13: 0000000000000100 R14: 0000000000000000 R15: 0000000000080000
[    2.106063] FS:  0000000000000000(0000) GS:ffff88010b280000(0000) knlGS:0000000000000000
[    2.106063] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.106063] CR2: 0000000000014f90 CR3: 0000000002a24000 CR4: 00000000000006e0
[    2.106063] Call Trace:
[    2.106063]  ? __switch_to_asm+0x40/0x70
[    2.106063]  __pv_queued_spin_lock_slowpath+0x248/0x280
[    2.106063]  _raw_spin_lock+0x18/0x20
[    2.106063]  prepare_set+0xc/0x90
[    2.106063]  generic_set_all+0x26/0x2e0
[    2.106063]  ? __switch_to_asm+0x40/0x70
[    2.106063]  mtrr_rendezvous_handler+0x34/0x60
[    2.106063]  multi_cpu_stop+0xb6/0xe0
[    2.106063]  ? cpu_stop_queue_work+0xd0/0xd0
[    2.106063]  cpu_stopper_thread+0x86/0x100
[    2.106063]  smpboot_thread_fn+0x109/0x160
[    2.106063]  kthread+0xee/0x120
[    2.106063]  ? sort_range+0x20/0x20
[    2.106063]  ? kthread_park+0x80/0x80
[    2.106063]  ret_from_fork+0x22/0x40
[    2.106063] Modules linked in:
[    2.106063] CR2: 0000000000014f90
[    2.106063] BUG: unable to handle kernel paging request at 0000000000014f90
[    2.106063] ---[ end trace e5be82cfc3e40a5e ]---
[    2.106063] PGD 0 
[    2.106063] RIP: 0010:xen_qlock_wait+0x23/0x70
[    2.106063] P4D 0 
[    2.106063] Code: 1f 84 00 00 00 00 00 55 53 48 83 ec 08 65 8b 2d 63 33 ff 7e 83 fd ff 74 32 65 8b 05 47 3f ff 7e a9 00 00 10 00 75 24 48 89 fb <f0> ff 05 36 33 ff 7e 8b 05 30 33 ff 7e 83 f8 01 74 16 0f b6 03 40
[    2.106063] Oops: 0002 [#2] SMP NOPTI
[    2.106063] RSP: 0018:ffffc900006d3dc0 EFLAGS: 00010046


> 
> Juergen
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Guest soft lockups with "xen: make xen_qlock_wait() nestable"
  2018-11-08  9:57       ` Sander Eikelenboom
@ 2018-11-08 10:18         ` Juergen Gross
  2018-11-08 11:56           ` Sander Eikelenboom
  2018-11-14 23:22           ` David Woodhouse
  0 siblings, 2 replies; 14+ messages in thread
From: Juergen Gross @ 2018-11-08 10:18 UTC (permalink / raw)
  To: Sander Eikelenboom, Boris Ostrovsky; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 1457 bytes --]

On 08/11/2018 10:57, Sander Eikelenboom wrote:
> On 08/11/18 09:18, Juergen Gross wrote:
>> On 08/11/2018 09:14, Sander Eikelenboom wrote:
>>> On 08/11/18 08:08, Juergen Gross wrote:
>>>> On 07/11/2018 10:30, Sander Eikelenboom wrote:
>>>>> Hi Juergen / Boris,
>>>>>
>>>>> Last week I tested Linux kernel 4.19.0 stable with the Xen "for-linus-4.20" branch pulled on top.
>>>>> Unfortunately I was seeing guests lock up after some time; see below for the logging from one of the guests
>>>>> which I was able to capture.
>>>>> Reverting "xen: make xen_qlock_wait() nestable" (commit 7250f6d35681dfc44749d90598a2d51a118ce2b8)
>>>>> made the lockups disappear.
>>>>>
>>>>> These guests are stressed quite hard in both CPU and networking,
>>>>> so they are probably more susceptible to locking issues.
>>>>>
>>>>> The system is an AMD Phenom X6, running Xen-unstable.
>>>>>
>>>>> Any ideas?
>>>>
>>>> Just checked the hypervisor again: it seems a pending interrupt for a
>>>> HVM/PVH vcpu won't let SCHEDOP_poll return in case interrupts are
>>>> disabled.
>>>>
>>>> I need to rework the patch for that scenario. Until then I'll revert
>>>> it.
>>>
>>> Thanks for looking into it.
>>
>> Could you try the attached patch (on top of 7250f6d35681df)?
> 
> That blows up while booting the guest:

Oh, sorry. Of course it does. Dereferencing a percpu variable
directly can't work. How silly of me.

The attached variant should repair that. Tested to not break booting.
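
The difference is only in how the per-CPU nesting counter is addressed;
a minimal before/after of the two attempts (illustration only):

	/* first attempt: takes the address of the per-CPU template, i.e. a
	 * raw per-CPU offset -- dereferencing it faults, hence the paging
	 * oops above */
	atomic_inc(&xen_qlock_wait_nest);

	/* attached variant: resolve this CPU's instance of the variable */
	atomic_inc(this_cpu_ptr(&xen_qlock_wait_nest));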


Juergen

[-- Attachment #2: 0001-xen-fix-xen_qlock_wait.patch --]
[-- Type: text/x-patch, Size: 2400 bytes --]

From 861c47480be2ef5cc301d3c4c2ca083c1160e39d Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Thu, 8 Nov 2018 08:35:06 +0100
Subject: [PATCH] xen: fix xen_qlock_wait()

Commit a856531951dc80 ("xen: make xen_qlock_wait() nestable")
introduced a regression for Xen guests running fully virtualized
(HVM or PVH mode). The Xen hypervisor wouldn't return from the poll
hypercall with interrupts disabled in case of an interrupt (for PV
guests it does).

So instead of disabling interrupts in xen_qlock_wait() use a nesting
counter to avoid calling xen_clear_irq_pending() in case
xen_qlock_wait() is nested.

Fixes: a856531951dc80 ("xen: make xen_qlock_wait() nestable")
Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/xen/spinlock.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 441c88262169..5b25f8e9b619 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -9,6 +9,7 @@
 #include <linux/log2.h>
 #include <linux/gfp.h>
 #include <linux/slab.h>
+#include <linux/atomic.h>
 
 #include <asm/paravirt.h>
 #include <asm/qspinlock.h>
@@ -21,6 +22,7 @@
 
 static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
 static DEFINE_PER_CPU(char *, irq_name);
+static DEFINE_PER_CPU(atomic_t, xen_qlock_wait_nest);
 static bool xen_pvspin = true;
 
 static void xen_qlock_kick(int cpu)
@@ -39,25 +41,25 @@ static void xen_qlock_kick(int cpu)
  */
 static void xen_qlock_wait(u8 *byte, u8 val)
 {
-	unsigned long flags;
 	int irq = __this_cpu_read(lock_kicker_irq);
 
 	/* If kicker interrupts not initialized yet, just spin */
 	if (irq == -1 || in_nmi())
 		return;
 
-	/* Guard against reentry. */
-	local_irq_save(flags);
+	/* Detect reentry. */
+	atomic_inc(this_cpu_ptr(&xen_qlock_wait_nest));
 
-	/* If irq pending already clear it. */
-	if (xen_test_irq_pending(irq)) {
+	/* If irq pending already and no nested call clear it. */
+	if (atomic_read(this_cpu_ptr(&xen_qlock_wait_nest)) == 1 &&
+	    xen_test_irq_pending(irq)) {
 		xen_clear_irq_pending(irq);
 	} else if (READ_ONCE(*byte) == val) {
 		/* Block until irq becomes pending (or a spurious wakeup) */
 		xen_poll_irq(irq);
 	}
 
-	local_irq_restore(flags);
+	atomic_dec(this_cpu_ptr(&xen_qlock_wait_nest));
 }
 
 static irqreturn_t dummy_handler(int irq, void *dev_id)
-- 
2.16.4



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: Guest soft lockups with "xen: make xen_qlock_wait() nestable"
  2018-11-08 10:18         ` Juergen Gross
@ 2018-11-08 11:56           ` Sander Eikelenboom
  2018-11-14 23:22           ` David Woodhouse
  1 sibling, 0 replies; 14+ messages in thread
From: Sander Eikelenboom @ 2018-11-08 11:56 UTC (permalink / raw)
  To: Juergen Gross, Boris Ostrovsky; +Cc: xen-devel

On 08/11/18 11:18, Juergen Gross wrote:
> On 08/11/2018 10:57, Sander Eikelenboom wrote:
>> On 08/11/18 09:18, Juergen Gross wrote:
>>> On 08/11/2018 09:14, Sander Eikelenboom wrote:
>>>> On 08/11/18 08:08, Juergen Gross wrote:
>>>>> On 07/11/2018 10:30, Sander Eikelenboom wrote:
>>>>>> Hi Juergen / Boris,
>>>>>>
>>>>>> Last week I tested Linux kernel 4.19.0 stable with the Xen "for-linus-4.20" branch pulled on top.
>>>>>> Unfortunately I was seeing guests lock up after some time; see below for the logging from one of the guests
>>>>>> which I was able to capture.
>>>>>> Reverting "xen: make xen_qlock_wait() nestable" (commit 7250f6d35681dfc44749d90598a2d51a118ce2b8)
>>>>>> made the lockups disappear.
>>>>>>
>>>>>> These guests are stressed quite hard in both CPU and networking,
>>>>>> so they are probably more susceptible to locking issues.
>>>>>>
>>>>>> The system is an AMD Phenom X6, running Xen-unstable.
>>>>>>
>>>>>> Any ideas?
>>>>>
>>>>> Just checked the hypervisor again: it seems a pending interrupt for a
>>>>> HVM/PVH vcpu won't let SCHEDOP_poll return in case interrupts are
>>>>> disabled.
>>>>>
>>>>> I need to rework the patch for that scenario. Until then I'll revert
>>>>> it.
>>>>
>>>> Thanks for looking into it.
>>>
>>> Could you try the attached patch (on top of 7250f6d35681df)?
>>
>> That blows up while booting the guest:
> 
> Oh, sorry. Of course it does. Dereferencing a percpu variable
> directly can't work. How silly of me.
> 
> The attached variant should repair that. Tested to not break booting.

This one boots. Will report back either when I find issues or
when I'm comfortable enough to give a "Tested-by" in a few days.

Thanks again.

--
Sander


> 
> Juergen
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Guest soft lockups with "xen: make xen_qlock_wait() nestable"
  2018-11-08 10:18         ` Juergen Gross
  2018-11-08 11:56           ` Sander Eikelenboom
@ 2018-11-14 23:22           ` David Woodhouse
  2018-11-19  7:05             ` Juergen Gross
  1 sibling, 1 reply; 14+ messages in thread
From: David Woodhouse @ 2018-11-14 23:22 UTC (permalink / raw)
  To: Juergen Gross, Sander Eikelenboom, Boris Ostrovsky; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 321 bytes --]

On Thu, 2018-11-08 at 11:18 +0100, Juergen Gross wrote:
> Oh, sorry. Of course it does. Dereferencing a percpu variable
> directly can't work. How silly of me.
> 
> The attached variant should repair that. Tested to not break booting.

Strictly speaking, shouldn't you have an atomic_init() in there
somewhere?



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Guest soft lockups with "xen: make xen_qlock_wait() nestable"
  2018-11-14 23:22           ` David Woodhouse
@ 2018-11-19  7:05             ` Juergen Gross
  2018-11-19  9:39               ` David Woodhouse
  0 siblings, 1 reply; 14+ messages in thread
From: Juergen Gross @ 2018-11-19  7:05 UTC (permalink / raw)
  To: David Woodhouse, Sander Eikelenboom, Boris Ostrovsky; +Cc: xen-devel

On 15/11/2018 00:22, David Woodhouse wrote:
> On Thu, 2018-11-08 at 11:18 +0100, Juergen Gross wrote:
>> Oh, sorry. Of course it does. Dereferencing a percpu variable
>> directly can't work. How silly of me.
>>
>> The attached variant should repair that. Tested to not break booting.
> 
> Strictly speaking, shouldn't you have an atomic_init() in there
> somewhere?

atomic_t variables initialized with 0 (e.g. static ones) seem not to
require atomic_init() (or ATOMIC_INIT). Documentation/atomic_t.txt
doesn't mention the need to use it in this case. So I guess it is a
matter of taste.


Juergen


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Guest soft lockups with "xen: make xen_qlock_wait() nestable"
  2018-11-19  7:05             ` Juergen Gross
@ 2018-11-19  9:39               ` David Woodhouse
  2018-11-19  9:46                 ` Juergen Gross
  0 siblings, 1 reply; 14+ messages in thread
From: David Woodhouse @ 2018-11-19  9:39 UTC (permalink / raw)
  To: Juergen Gross, Sander Eikelenboom, Boris Ostrovsky; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1078 bytes --]

On Mon, 2018-11-19 at 08:05 +0100, Juergen Gross wrote:
> On 15/11/2018 00:22, David Woodhouse wrote:
> > On Thu, 2018-11-08 at 11:18 +0100, Juergen Gross wrote:
> > > Oh, sorry. Of course it does. Dereferencing a percpu variable
> > > directly can't work. How silly of me.
> > > 
> > > The attached variant should repair that. Tested to not break
> > > booting.
> > 
> > Strictly speaking, shouldn't you have an atomic_init() in there
> > somewhere?
> 
> atomic_t variables initialized with 0 (e.g. static ones) seem not to
> require atomic_init() (or ATOMIC_INIT). Documentation/atomic_t.txt
> doesn't mention the need to use it in this case. So I guess it is a
> matter of taste.

Yeah, we have '#define ATOMIC_INIT(i) { (i) }' fairly much everywhere
now, even on SPARC (not that this code runs on SPARC).

So it doesn't really matter, and it's fairly unlikely that the atomic_t
implementation is going to *change*.

But still, there's no harm in doing the 'tasteful' version:

static DEFINE_PER_CPU(atomic_t, xen_qlock_wait_nest) = ATOMIC_INIT(0);


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Guest soft lockups with "xen: make xen_qlock_wait() nestable"
  2018-11-19  9:39               ` David Woodhouse
@ 2018-11-19  9:46                 ` Juergen Gross
  0 siblings, 0 replies; 14+ messages in thread
From: Juergen Gross @ 2018-11-19  9:46 UTC (permalink / raw)
  To: David Woodhouse, Sander Eikelenboom, Boris Ostrovsky; +Cc: xen-devel

On 19/11/2018 10:39, David Woodhouse wrote:
> On Mon, 2018-11-19 at 08:05 +0100, Juergen Gross wrote:
>> On 15/11/2018 00:22, David Woodhouse wrote:
>>> On Thu, 2018-11-08 at 11:18 +0100, Juergen Gross wrote:
>>>> Oh, sorry. Of course it does. Dereferencing a percpu variable
>>>> directly can't work. How silly of me.
>>>>
>>>> The attached variant should repair that. Tested to not break
>>>> booting.
>>>
>>> Strictly speaking, shouldn't you have an atomic_init() in there
>>> somewhere?
>>
>> atomic_t variables initialized with 0 (e.g. static ones) seem not to
>> require atomic_init() (or ATOMIC_INIT). Documentation/atomic_t.txt
>> doesn't mention the need to use it in this case. So I guess it is a
>> matter of taste.
> 
> Yeah, we have '#define ATOMIC_INIT(i) { (i) }' fairly much everywhere
> now, even on SPARC (not that this code runs on SPARC).
> 
> So it doesn't really matter, and it's fairly unlikely that the atomic_t
> implementation is going to *change*.
> 
> But still, there's no harm in doing the 'tasteful' version:
> 
> static DEFINE_PER_CPU(atomic_t, xen_qlock_wait_nest) = ATOMIC_INIT(0);

Feel free to send a patch (my patch is already in rc2).


Juergen


^ permalink raw reply	[flat|nested] 14+ messages in thread


Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
2018-11-07  9:30 Guest soft lockups with "xen: make xen_qlock_wait() nestable" Sander Eikelenboom
2018-11-07 22:34 ` Boris Ostrovsky
2018-11-07 22:45   ` Sander Eikelenboom
2018-11-07 22:53     ` Boris Ostrovsky
2018-11-08  7:08 ` Juergen Gross
2018-11-08  8:14   ` Sander Eikelenboom
2018-11-08  8:18     ` Juergen Gross
2018-11-08  9:57       ` Sander Eikelenboom
2018-11-08 10:18         ` Juergen Gross
2018-11-08 11:56           ` Sander Eikelenboom
2018-11-14 23:22           ` David Woodhouse
2018-11-19  7:05             ` Juergen Gross
2018-11-19  9:39               ` David Woodhouse
2018-11-19  9:46                 ` Juergen Gross
