* Dom0 kernel 4.14 with SMP randomly crashing
@ 2018-11-04 13:07 Rishi
  2018-11-04 14:24 ` Rishi
  2018-11-05 10:19 ` Wei Liu
  0 siblings, 2 replies; 18+ messages in thread
From: Rishi @ 2018-11-04 13:07 UTC (permalink / raw)
  To: xen-devel



I've built a dom0 kernel 4.14 with SMP support. The dom0 kernel crashes
while I'm downloading a large file on the host. It does not crash if I pass
the nosmp boot option on the Xen command line.

My .config SMP options are:

[root@f6029920339a wip-kernel-4.14.78]# grep SMP .config
CONFIG_X86_64_SMP=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_SMP=y
CONFIG_XEN_PV_SMP=y
CONFIG_XEN_PVHVM_SMP=y
# CONFIG_MAXSMP is not set
CONFIG_PM_SLEEP_SMP=y
CONFIG_SCSI_SAS_HOST_SMP=y


Compared to earlier kernels, the options

CONFIG_XEN_PV_SMP=y
CONFIG_XEN_PVHVM_SMP=y

are newly introduced. Does Xen need any update to handle them better?

My Xen version is 4.7.6.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: Dom0 kernel 4.14 with SMP randomly crashing
  2018-11-04 13:07 Dom0 kernel 4.14 with SMP randomly crashing Rishi
@ 2018-11-04 14:24 ` Rishi
  2018-11-05 10:19 ` Wei Liu
  1 sibling, 0 replies; 18+ messages in thread
From: Rishi @ 2018-11-04 14:24 UTC (permalink / raw)
  To: xen-devel



I found references to these options in
https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg03120.html,
the proposal to split the PV, PVHVM, and PVH code in the kernel.

There is no mention of any change or requirement from the Xen side. Is
there any other way to track this problem down?

On Sun, Nov 4, 2018 at 6:37 PM Rishi <2rushikeshj@gmail.com> wrote:

> I've built a dom0 kernel 4.14 with SMP support. The dom0 kernel crashes
> when I'm downloading a large file on host. It does not crash if I have
> nosmp boot option on xen command line.
>
> my .config SMP options are
>
> [root@f6029920339a wip-kernel-4.14.78]# grep SMP .config
> CONFIG_X86_64_SMP=y
> CONFIG_GENERIC_SMP_IDLE_THREAD=y
> CONFIG_SMP=y
> CONFIG_XEN_PV_SMP=y
> CONFIG_XEN_PVHVM_SMP=y
> # CONFIG_MAXSMP is not set
> CONFIG_PM_SLEEP_SMP=y
> CONFIG_SCSI_SAS_HOST_SMP=y
>
> While comparing it to earlier kernels the options
>
> CONFIG_XEN_PV_SMP=y
> CONFIG_XEN_PVHVM_SMP=y
>
> are newly introduced. Does xen need any update to better handle them?
>
> My xen versions is : 4.7.6
>


* Re: Dom0 kernel 4.14 with SMP randomly crashing
  2018-11-04 13:07 Dom0 kernel 4.14 with SMP randomly crashing Rishi
  2018-11-04 14:24 ` Rishi
@ 2018-11-05 10:19 ` Wei Liu
  2018-11-05 10:56   ` Rishi
  1 sibling, 1 reply; 18+ messages in thread
From: Wei Liu @ 2018-11-05 10:19 UTC (permalink / raw)
  To: Rishi; +Cc: Wei Liu, xen-devel

On Sun, Nov 04, 2018 at 06:37:36PM +0530, Rishi wrote:
> I've built a dom0 kernel 4.14 with SMP support. The dom0 kernel crashes
> when I'm downloading a large file on host. It does not crash if I have
> nosmp boot option on xen command line.
> 
> my .config SMP options are
>
> [root@f6029920339a wip-kernel-4.14.78]# grep SMP .config
> CONFIG_X86_64_SMP=y
> CONFIG_GENERIC_SMP_IDLE_THREAD=y
> CONFIG_SMP=y
> CONFIG_XEN_PV_SMP=y
> CONFIG_XEN_PVHVM_SMP=y
> # CONFIG_MAXSMP is not set
> CONFIG_PM_SLEEP_SMP=y
> CONFIG_SCSI_SAS_HOST_SMP=y
>
> While comparing it to earlier kernels the options
>
> CONFIG_XEN_PV_SMP=y
> CONFIG_XEN_PVHVM_SMP=y
>
> are newly introduced. Does xen need any update to better handle them?
> 

Please post the kernel log when it crashed. I'm afraid Xen developers
can't read your mind to figure out what went wrong. :-)

Wei.


* Re: Dom0 kernel 4.14 with SMP randomly crashing
  2018-11-05 10:19 ` Wei Liu
@ 2018-11-05 10:56   ` Rishi
  2018-11-05 10:58     ` Wei Liu
  0 siblings, 1 reply; 18+ messages in thread
From: Rishi @ 2018-11-05 10:56 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel



Nothing is getting logged. I suspect it's because the kernel just goes into
a halt state.

/var/log/kern.log, /var/log/messages, /var/log/xen/hypervisor.log

None of these files has anything appended.

I can fetch info from the kernel debugfs if you point me to the required
area.

The dom0 kernel works fine without the split SMP code, or with the "nosmp"
option on the Xen command line.

On Mon, Nov 5, 2018 at 3:49 PM Wei Liu <wei.liu2@citrix.com> wrote:

> On Sun, Nov 04, 2018 at 06:37:36PM +0530, Rishi wrote:
> > I've built a dom0 kernel 4.14 with SMP support. The dom0 kernel crashes
> > when I'm downloading a large file on host. It does not crash if I have
> > nosmp boot option on xen command line.
> >
> > my .config SMP options are
> >
> > [root@f6029920339a wip-kernel-4.14.78]# grep SMP .config
> > CONFIG_X86_64_SMP=y
> > CONFIG_GENERIC_SMP_IDLE_THREAD=y
> > CONFIG_SMP=y
> > CONFIG_XEN_PV_SMP=y
> > CONFIG_XEN_PVHVM_SMP=y
> > # CONFIG_MAXSMP is not set
> > CONFIG_PM_SLEEP_SMP=y
> > CONFIG_SCSI_SAS_HOST_SMP=y
> >
> > While comparing it to earlier kernels the options
> >
> > CONFIG_XEN_PV_SMP=y
> > CONFIG_XEN_PVHVM_SMP=y
> >
> > are newly introduced. Does xen need any update to better handle them?
> >
>
> Please post the kernel log when it crashed. I'm afraid Xen developers
> can't read your mind to figure out what went wrong. :-)
>
> Wei.
>


* Re: Dom0 kernel 4.14 with SMP randomly crashing
  2018-11-05 10:56   ` Rishi
@ 2018-11-05 10:58     ` Wei Liu
  2018-11-05 11:28       ` Rishi
  0 siblings, 1 reply; 18+ messages in thread
From: Wei Liu @ 2018-11-05 10:58 UTC (permalink / raw)
  To: Rishi; +Cc: Wei Liu, xen-devel

On Mon, Nov 05, 2018 at 04:26:03PM +0530, Rishi wrote:
> Nothing is getting logged. I suspect its because the kernel just goes in
> halt state.
> 
> /var/log/kern.log /var/log/messages /var/log/xen/hypervisor.log
> 
> All of these files have nothing appended.
> 
> I can fetch info from kernel debugfs, if you point me to required area,
> 
> 
> The dom0 kernel works fine without spilt of SMP code or with "nosmp" option
> at xen command line.

I think you will need to set up a serial console. I wouldn't expect to
see anything in /var/log if the system has crashed.

Without a serial log there is nothing we can do to help.
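
For reference, a minimal serial-console setup for a Xen host looks like the
following; the port, baud rate, and exact boot-loader file are assumptions
here, so adjust them to the actual hardware:

```shell
# Xen command line (e.g. in the GRUB entry) -- COM1 settings are illustrative
com1=115200,8n1 console=com1,vga

# dom0 kernel command line -- route kernel messages to the Xen console
console=hvc0 earlyprintk=xen
```

With that in place, connecting a client to the serial port (or a virtual
serial device) captures the panic output that never reaches /var/log.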

Wei.


* Re: Dom0 kernel 4.14 with SMP randomly crashing
  2018-11-05 10:58     ` Wei Liu
@ 2018-11-05 11:28       ` Rishi
  2018-11-05 11:41         ` Wei Liu
  0 siblings, 1 reply; 18+ messages in thread
From: Rishi @ 2018-11-05 11:28 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel



Alright, I got the serial console working, and the following is the crash
log. Thank you for pointing that out.

[  133.594852] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [ksoftirqd/2:22]
[  133.599232] Kernel panic - not syncing: softlockup: hung tasks
[  133.602275] CPU: 2 PID: 22 Comm: ksoftirqd/2 Tainted: G             L    4.19.1 #1
[  133.606620] Hardware name: Xen HVM domU, BIOS 4.4.1-xs132257 12/12/2016
[  133.610376] Call Trace:
[  133.611827]  <IRQ>
[  133.612942]  dump_stack+0x5a/0x73
[  133.614676]  panic+0xe8/0x249
[  133.616183]  watchdog_timer_fn+0x200/0x230
[  133.618366]  ? softlockup_fn+0x40/0x40
[  133.620561]  __hrtimer_run_queues+0x133/0x270
[  133.623056]  hrtimer_interrupt+0xfb/0x260
[  133.625580]  xen_timer_interrupt+0x1b/0x30
[  133.628011]  __handle_irq_event_percpu+0x69/0x1a0
[  133.630642]  handle_irq_event_percpu+0x30/0x70
[  133.633826]  handle_percpu_irq+0x34/0x50
[  133.636012]  generic_handle_irq+0x1e/0x30
[  133.638573]  __evtchn_fifo_handle_events+0x163/0x1a0
[  133.641444]  __xen_evtchn_do_upcall+0x41/0x70
[  133.643772]  xen_evtchn_do_upcall+0x27/0x50
[  133.645868]  xen_do_hypervisor_callback+0x29/0x40
[  133.648231]  </IRQ>
[  133.649279] RIP: e030:xen_hypercall_xen_version+0xa/0x20
[  133.652259] Code: 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[  133.662574] RSP: e02b:ffffc900400dbd18 EFLAGS: 00000246
[  133.665557] RAX: 0000000000040007 RBX: ffff88003315a898 RCX: ffffffff8100122a
[  133.670005] RDX: ffffc900400080ff RSI: 0000000000000000 RDI: 0000000000000000
[  133.674606] RBP: ffff88003315a890 R08: 0000000000000000 R09: 0000000000000000
[  133.678567] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000080000600
[  133.682552] R13: 000000000000002a R14: 00000000000002a0 R15: 000000000000002a
[  133.686851]  ? xen_hypercall_xen_version+0xa/0x20
[  133.689681]  ? xen_force_evtchn_callback+0x9/0x10
[  133.692189]  ? check_events+0x12/0x20
[  133.694227]  ? xen_restore_fl_direct+0x1f/0x20
[  133.696878]  ? _raw_spin_unlock_irqrestore+0x14/0x20
[  133.699894]  ? cp_rx_poll+0x427/0x4d0 [8139cp]
[  133.702421]  ? net_rx_action+0x171/0x3a0
[  133.704693]  ? __do_softirq+0x11e/0x295
[  133.706672]  ? run_ksoftirqd+0x1e/0x40
[  133.708867]  ? smpboot_thread_fn+0x13c/0x160
[  133.711320]  ? kthread+0x115/0x120
[  133.713350]  ? sort_range+0x20/0x20
[  133.715421]  ? __kthread_cancel_work+0x80/0x80
[  133.717969]  ? ret_from_fork+0x35/0x40
[  133.720238] Kernel Offset: disabled



I'll try to debug it further from here.
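
As an aside, and only as an assumption-level sketch not verified on this
setup: the panic above is raised by the soft-lockup watchdog itself
(softlockup_panic appears to be enabled), and the watchdog is tunable via
sysctl. Relaxing it can keep the machine alive long enough to collect more
traces. Illustrative values:

```ini
# /etc/sysctl.d/90-debug-watchdog.conf -- illustrative debugging values
# Log soft lockups but do not panic the kernel
kernel.softlockup_panic = 0
# Widen the detection window (the default is typically 10 seconds)
kernel.watchdog_thresh = 30
```

This does not fix the underlying stall; it only trades an immediate panic
for repeated lockup reports, which is often more useful while bisecting.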



On Mon, Nov 5, 2018 at 4:28 PM Wei Liu <wei.liu2@citrix.com> wrote:

> On Mon, Nov 05, 2018 at 04:26:03PM +0530, Rishi wrote:
> > Nothing is getting logged. I suspect its because the kernel just goes in
> > halt state.
> >
> > /var/log/kern.log /var/log/messages /var/log/xen/hypervisor.log
> >
> > All of these files have nothing appended.
> >
> > I can fetch info from kernel debugfs, if you point me to required area,
> >
> >
> > The dom0 kernel works fine without spilt of SMP code or with "nosmp"
> option
> > at xen command line.
>
> I think you will need to set up a serial console. I wouldn't expect to
> see anything in /var/log if the system has crashed.
>
> Without serial log there is nothing we can do to help.
>
> Wei.
>


* Re: Dom0 kernel 4.14 with SMP randomly crashing
  2018-11-05 11:28       ` Rishi
@ 2018-11-05 11:41         ` Wei Liu
  2018-11-05 11:48           ` Rishi
  0 siblings, 1 reply; 18+ messages in thread
From: Wei Liu @ 2018-11-05 11:41 UTC (permalink / raw)
  To: Rishi; +Cc: Wei Liu, xen-devel

On Mon, Nov 05, 2018 at 04:58:35PM +0530, Rishi wrote:
> Alright, I got the serial console and following is the crash log. Thank you
> for pointing that out.
> 
> [  133.594852] watchdog: BUG: soft lockup - CPU#2 stuck for 22s!
> [ksoftirqd/2:22]
> 
> [  133.599232] Kernel panic - not syncing: softlockup: hung tasks
> 
> [  133.602275] CPU: 2 PID: 22 Comm: ksoftirqd/2 Tainted: G
> L    4.19.1
> #1
> 
> [  133.606620] Hardware name: Xen HVM domU, BIOS 4.4.1-xs132257 12/12/2016

Is this serial log from the host? It says it is running as an HVM DomU.
Maybe you have mistaken the guest serial log for the host serial log?

This indicates your machine runs XenServer, which has its own patch
queues on top of upstream Xen. You may also want to report this to the
xs-devel mailing list.

Wei.


* Re: Dom0 kernel 4.14 with SMP randomly crashing
  2018-11-05 11:41         ` Wei Liu
@ 2018-11-05 11:48           ` Rishi
  2018-11-05 11:58             ` Wei Liu
  0 siblings, 1 reply; 18+ messages in thread
From: Rishi @ 2018-11-05 11:48 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel



Yes, I'm running it in an HVM domU for development purposes.

On Mon, Nov 5, 2018 at 5:11 PM Wei Liu <wei.liu2@citrix.com> wrote:

> On Mon, Nov 05, 2018 at 04:58:35PM +0530, Rishi wrote:
> > Alright, I got the serial console and following is the crash log. Thank
> you
> > for pointing that out.
> >
> > [  133.594852] watchdog: BUG: soft lockup - CPU#2 stuck for 22s!
> > [ksoftirqd/2:22]
> >
> > [  133.599232] Kernel panic - not syncing: softlockup: hung tasks
> >
> > [  133.602275] CPU: 2 PID: 22 Comm: ksoftirqd/2 Tainted: G
> > L    4.19.1
> > #1
> >
> > [  133.606620] Hardware name: Xen HVM domU, BIOS 4.4.1-xs132257
> 12/12/2016
>
> Is this serial log from the host? It says it is running as a HVM DomU.
> Maybe you have mistaken guest serial log with host serial log?
>
> This indicates your machine runs XenServer, which has its own patch
> queues on top of upstream Xen. You may also want to report to xs-devel
> mailing list.
>
> Wei.
>


* Re: Dom0 kernel 4.14 with SMP randomly crashing
  2018-11-05 11:48           ` Rishi
@ 2018-11-05 11:58             ` Wei Liu
  2018-11-05 12:30               ` Rishi
  0 siblings, 1 reply; 18+ messages in thread
From: Wei Liu @ 2018-11-05 11:58 UTC (permalink / raw)
  To: Rishi; +Cc: Wei Liu, xen-devel

On Mon, Nov 05, 2018 at 05:18:43PM +0530, Rishi wrote:
> Yes, I'm running it in a HVM domU for development purpose.

What is your exact setup?

Wei.

> 
> On Mon, Nov 5, 2018 at 5:11 PM Wei Liu <wei.liu2@citrix.com> wrote:
> 
> > On Mon, Nov 05, 2018 at 04:58:35PM +0530, Rishi wrote:
> > > Alright, I got the serial console and following is the crash log. Thank
> > you
> > > for pointing that out.
> > >
> > > [  133.594852] watchdog: BUG: soft lockup - CPU#2 stuck for 22s!
> > > [ksoftirqd/2:22]
> > >
> > > [  133.599232] Kernel panic - not syncing: softlockup: hung tasks
> > >
> > > [  133.602275] CPU: 2 PID: 22 Comm: ksoftirqd/2 Tainted: G
> > > L    4.19.1
> > > #1
> > >
> > > [  133.606620] Hardware name: Xen HVM domU, BIOS 4.4.1-xs132257
> > 12/12/2016
> >
> > Is this serial log from the host? It says it is running as a HVM DomU.
> > Maybe you have mistaken guest serial log with host serial log?
> >
> > This indicates your machine runs XenServer, which has its own patch
> > queues on top of upstream Xen. You may also want to report to xs-devel
> > mailing list.
> >
> > Wei.
> >


* Re: Dom0 kernel 4.14 with SMP randomly crashing
  2018-11-05 11:58             ` Wei Liu
@ 2018-11-05 12:30               ` Rishi
  2018-11-05 12:49                 ` Wei Liu
  0 siblings, 1 reply; 18+ messages in thread
From: Rishi @ 2018-11-05 12:30 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel



I'm using a XenServer host, with XCP-NG running on it as an HVM guest. I
added xencons=tty console=ttyS0 to the XCP-NG dom0 kernel line to obtain
the serial console. I'm working on building a more recent dom0 kernel for
improved Ceph support in XenServer/XCP-NG.



On Mon, Nov 5, 2018 at 5:28 PM Wei Liu <wei.liu2@citrix.com> wrote:

> On Mon, Nov 05, 2018 at 05:18:43PM +0530, Rishi wrote:
> > Yes, I'm running it in a HVM domU for development purpose.
>
> What is your exact setup?
>
> Wei.
>
> >
> > On Mon, Nov 5, 2018 at 5:11 PM Wei Liu <wei.liu2@citrix.com> wrote:
> >
> > > On Mon, Nov 05, 2018 at 04:58:35PM +0530, Rishi wrote:
> > > > Alright, I got the serial console and following is the crash log.
> Thank
> > > you
> > > > for pointing that out.
> > > >
> > > > [  133.594852] watchdog: BUG: soft lockup - CPU#2 stuck for 22s!
> > > > [ksoftirqd/2:22]
> > > >
> > > > [  133.599232] Kernel panic - not syncing: softlockup: hung tasks
> > > >
> > > > [  133.602275] CPU: 2 PID: 22 Comm: ksoftirqd/2 Tainted: G
> > > > L    4.19.1
> > > > #1
> > > >
> > > > [  133.606620] Hardware name: Xen HVM domU, BIOS 4.4.1-xs132257
> > > 12/12/2016
> > >
> > > Is this serial log from the host? It says it is running as a HVM DomU.
> > > Maybe you have mistaken guest serial log with host serial log?
> > >
> > > This indicates your machine runs XenServer, which has its own patch
> > > queues on top of upstream Xen. You may also want to report to xs-devel
> > > mailing list.
> > >
> > > Wei.
> > >
>


* Re: Dom0 kernel 4.14 with SMP randomly crashing
  2018-11-05 12:30               ` Rishi
@ 2018-11-05 12:49                 ` Wei Liu
  2018-11-05 12:59                   ` Rishi
  0 siblings, 1 reply; 18+ messages in thread
From: Wei Liu @ 2018-11-05 12:49 UTC (permalink / raw)
  To: Rishi; +Cc: Wei Liu, xen-devel

I forgot to say: please don't top-post.

On Mon, Nov 05, 2018 at 06:00:10PM +0530, Rishi wrote:
> I'm using a XenServer Host and XCP-NG on it as HVM. I used xencons=tty
> console=ttyS0 on XCP-NG dom0 kernel line, to obtain serial console.
> I'm working on to build a more recent dom0 kernel for improved support of
> Ceph in XenServer/XCP-NG.

This is an interesting setup. I don't think you can expect to just drop
a new kernel into XenServer/XCP-NG and have it work flawlessly. What did
you do with the patch queue XenServer carries for 4.4?

Also, have you got a working baseline? I.e. did the stock 4.4 kernel
work?

Wei.

> 
> 
> 
> On Mon, Nov 5, 2018 at 5:28 PM Wei Liu <wei.liu2@citrix.com> wrote:
> 
> > On Mon, Nov 05, 2018 at 05:18:43PM +0530, Rishi wrote:
> > > Yes, I'm running it in a HVM domU for development purpose.
> >
> > What is your exact setup?
> >
> > Wei.
> >
> > >
> > > On Mon, Nov 5, 2018 at 5:11 PM Wei Liu <wei.liu2@citrix.com> wrote:
> > >
> > > > On Mon, Nov 05, 2018 at 04:58:35PM +0530, Rishi wrote:
> > > > > Alright, I got the serial console and following is the crash log.
> > Thank
> > > > you
> > > > > for pointing that out.
> > > > >
> > > > > [  133.594852] watchdog: BUG: soft lockup - CPU#2 stuck for 22s!
> > > > > [ksoftirqd/2:22]
> > > > >
> > > > > [  133.599232] Kernel panic - not syncing: softlockup: hung tasks
> > > > >
> > > > > [  133.602275] CPU: 2 PID: 22 Comm: ksoftirqd/2 Tainted: G
> > > > > L    4.19.1
> > > > > #1
> > > > >
> > > > > [  133.606620] Hardware name: Xen HVM domU, BIOS 4.4.1-xs132257
> > > > 12/12/2016
> > > >
> > > > Is this serial log from the host? It says it is running as a HVM DomU.
> > > > Maybe you have mistaken guest serial log with host serial log?
> > > >
> > > > This indicates your machine runs XenServer, which has its own patch
> > > > queues on top of upstream Xen. You may also want to report to xs-devel
> > > > mailing list.
> > > >
> > > > Wei.
> > > >
> >


* Re: Dom0 kernel 4.14 with SMP randomly crashing
  2018-11-05 12:49                 ` Wei Liu
@ 2018-11-05 12:59                   ` Rishi
  2018-11-05 13:02                     ` Rishi
  0 siblings, 1 reply; 18+ messages in thread
From: Rishi @ 2018-11-05 12:59 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel



Yes, I'm taking patches out of the 4.4 queue, and I actually do have a
working 4.9 kernel along with blktap. I have tested networking and disk IO
in it.

There are roughly 415 patches for 4.4, of which some ~210+ are already
applied in 4.9 and ~220+ are already applied in 4.14. I don't have numbers
for 4.19 yet.

Essentially, I'm down to a single-digit number of patches at the moment for
a working kernel 4.9 setup. I know there will be mishaps since I'm not
applying all the patches, but my experiment is to see how close we can stay
to the mainline kernel, plus which of these patches kernel.org could
accept.



On Mon, Nov 5, 2018 at 6:19 PM Wei Liu <wei.liu2@citrix.com> wrote:

> I forgot to say: please don't top-post.
>
> On Mon, Nov 05, 2018 at 06:00:10PM +0530, Rishi wrote:
> > I'm using a XenServer Host and XCP-NG on it as HVM. I used xencons=tty
> > console=ttyS0 on XCP-NG dom0 kernel line, to obtain serial console.
> > I'm working on to build a more recent dom0 kernel for improved support of
> > Ceph in XenServer/XCP-NG.
>
> This is an interesting setup. I don't think you can expect to just drop
> in a new kernel to XenServer/XCP-NG and then it works flawlessly. What
> did you do to the patch queue XenServer carries for 4.4?
>
> Also, have you got a working baseline? I.e. did the stock 4.4 kernel
> work?
>
> Wei.
>
> >
> >
> >
> > On Mon, Nov 5, 2018 at 5:28 PM Wei Liu <wei.liu2@citrix.com> wrote:
> >
> > > On Mon, Nov 05, 2018 at 05:18:43PM +0530, Rishi wrote:
> > > > Yes, I'm running it in a HVM domU for development purpose.
> > >
> > > What is your exact setup?
> > >
> > > Wei.
> > >
> > > >
> > > > On Mon, Nov 5, 2018 at 5:11 PM Wei Liu <wei.liu2@citrix.com> wrote:
> > > >
> > > > > On Mon, Nov 05, 2018 at 04:58:35PM +0530, Rishi wrote:
> > > > > > Alright, I got the serial console and following is the crash log.
> > > Thank
> > > > > you
> > > > > > for pointing that out.
> > > > > >
> > > > > > [  133.594852] watchdog: BUG: soft lockup - CPU#2 stuck for 22s!
> > > > > > [ksoftirqd/2:22]
> > > > > >
> > > > > > [  133.599232] Kernel panic - not syncing: softlockup: hung tasks
> > > > > >
> > > > > > [  133.602275] CPU: 2 PID: 22 Comm: ksoftirqd/2 Tainted: G
> > > > > > L    4.19.1
> > > > > > #1
> > > > > >
> > > > > > [  133.606620] Hardware name: Xen HVM domU, BIOS 4.4.1-xs132257
> > > > > 12/12/2016
> > > > >
> > > > > Is this serial log from the host? It says it is running as a HVM
> DomU.
> > > > > Maybe you have mistaken guest serial log with host serial log?
> > > > >
> > > > > This indicates your machine runs XenServer, which has its own patch
> > > > > queues on top of upstream Xen. You may also want to report to
> xs-devel
> > > > > mailing list.
> > > > >
> > > > > Wei.
> > > > >
> > >
>


* Re: Dom0 kernel 4.14 with SMP randomly crashing
  2018-11-05 12:59                   ` Rishi
@ 2018-11-05 13:02                     ` Rishi
  2018-11-06 10:01                       ` Rishi
  0 siblings, 1 reply; 18+ messages in thread
From: Rishi @ 2018-11-05 13:02 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel



On Mon, Nov 5, 2018 at 6:29 PM Rishi <2rushikeshj@gmail.com> wrote:

> Yes, I'm taking out patches from 4.4 and actually do have a working 4.9
> kernel along with blktap. Tested networking and disk IO in it.
>
> There are roughly 415 patches to 4.4 out of which some ~210+ are already
> applied in 4.9 and ~220+ are already applied in 4.14. I dont have numbers
> for 4.19 yet.
>
> Essentially I'm down to single digit number of patches atm to have a
> working setup for kernel 4.9. I know there would be mishaps since I'm not
> applying all patches but my experiment is to see how close can we stay near
> mainline kernel + what can be the patches that kernel.org can accept.
>
>
>
> On Mon, Nov 5, 2018 at 6:19 PM Wei Liu <wei.liu2@citrix.com> wrote:
>
>> I forgot to say: please don't top-post.
>>
>> On Mon, Nov 05, 2018 at 06:00:10PM +0530, Rishi wrote:
>> > I'm using a XenServer Host and XCP-NG on it as HVM. I used xencons=tty
>> > console=ttyS0 on XCP-NG dom0 kernel line, to obtain serial console.
>> > I'm working on to build a more recent dom0 kernel for improved support
>> of
>> > Ceph in XenServer/XCP-NG.
>>
>> This is an interesting setup. I don't think you can expect to just drop
>> in a new kernel to XenServer/XCP-NG and then it works flawlessly. What
>> did you do to the patch queue XenServer carries for 4.4?
>>
>> Also, have you got a working baseline? I.e. did the stock 4.4 kernel
>> work?
>>
>> Wei.
>>
>> >
>> >
>> >
>> > On Mon, Nov 5, 2018 at 5:28 PM Wei Liu <wei.liu2@citrix.com> wrote:
>> >
>> > > On Mon, Nov 05, 2018 at 05:18:43PM +0530, Rishi wrote:
>> > > > Yes, I'm running it in a HVM domU for development purpose.
>> > >
>> > > What is your exact setup?
>> > >
>> > > Wei.
>> > >
>> > > >
>> > > > On Mon, Nov 5, 2018 at 5:11 PM Wei Liu <wei.liu2@citrix.com> wrote:
>> > > >
>> > > > > On Mon, Nov 05, 2018 at 04:58:35PM +0530, Rishi wrote:
>> > > > > > Alright, I got the serial console and following is the crash
>> log.
>> > > Thank
>> > > > > you
>> > > > > > for pointing that out.
>> > > > > >
>> > > > > > [  133.594852] watchdog: BUG: soft lockup - CPU#2 stuck for 22s!
>> > > > > > [ksoftirqd/2:22]
>> > > > > >
>> > > > > > [  133.599232] Kernel panic - not syncing: softlockup: hung
>> tasks
>> > > > > >
>> > > > > > [  133.602275] CPU: 2 PID: 22 Comm: ksoftirqd/2 Tainted: G
>> > > > > > L    4.19.1
>> > > > > > #1
>> > > > > >
>> > > > > > [  133.606620] Hardware name: Xen HVM domU, BIOS 4.4.1-xs132257
>> > > > > 12/12/2016
>> > > > >
>> > > > > Is this serial log from the host? It says it is running as a HVM
>> DomU.
>> > > > > Maybe you have mistaken guest serial log with host serial log?
>> > > > >
>> > > > > This indicates your machine runs XenServer, which has its own
>> patch
>> > > > > queues on top of upstream Xen. You may also want to report to
>> xs-devel
>> > > > > mailing list.
>> > > > >
>> > > > > Wei.
>> > > > >
>> > >
>>
>

Sorry, I'll take care not to top-post from now on.


* Re: Dom0 kernel 4.14 with SMP randomly crashing
  2018-11-05 13:02                     ` Rishi
@ 2018-11-06 10:01                       ` Rishi
  2018-11-06 12:17                         ` Wei Liu
  0 siblings, 1 reply; 18+ messages in thread
From: Rishi @ 2018-11-06 10:01 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu



On Mon, Nov 5, 2018 at 6:32 PM Rishi <2rushikeshj@gmail.com> wrote:

>
>
> On Mon, Nov 5, 2018 at 6:29 PM Rishi <2rushikeshj@gmail.com> wrote:
>
>> Yes, I'm taking out patches from 4.4 and actually do have a working 4.9
>> kernel along with blktap. Tested networking and disk IO in it.
>>
>> There are roughly 415 patches to 4.4 out of which some ~210+ are already
>> applied in 4.9 and ~220+ are already applied in 4.14. I dont have numbers
>> for 4.19 yet.
>>
>> Essentially I'm down to single digit number of patches atm to have a
>> working setup for kernel 4.9. I know there would be mishaps since I'm not
>> applying all patches but my experiment is to see how close can we stay near
>> mainline kernel + what can be the patches that kernel.org can accept.
>>
>>
>>
>> On Mon, Nov 5, 2018 at 6:19 PM Wei Liu <wei.liu2@citrix.com> wrote:
>>
>>> I forgot to say: please don't top-post.
>>>
>>> On Mon, Nov 05, 2018 at 06:00:10PM +0530, Rishi wrote:
>>> > I'm using a XenServer Host and XCP-NG on it as HVM. I used xencons=tty
>>> > console=ttyS0 on XCP-NG dom0 kernel line, to obtain serial console.
>>> > I'm working on to build a more recent dom0 kernel for improved support
>>> of
>>> > Ceph in XenServer/XCP-NG.
>>>
>>> This is an interesting setup. I don't think you can expect to just drop
>>> in a new kernel to XenServer/XCP-NG and then it works flawlessly. What
>>> did you do to the patch queue XenServer carries for 4.4?
>>>
>>> Also, have you got a working baseline? I.e. did the stock 4.4 kernel
>>> work?
>>>
>>> Wei.
>>>
>>> >
>>> >
>>> >
>>> > On Mon, Nov 5, 2018 at 5:28 PM Wei Liu <wei.liu2@citrix.com> wrote:
>>> >
>>> > > On Mon, Nov 05, 2018 at 05:18:43PM +0530, Rishi wrote:
>>> > > > Yes, I'm running it in a HVM domU for development purpose.
>>> > >
>>> > > What is your exact setup?
>>> > >
>>> > > Wei.
>>> > >
>>> > > >
>>> > > > On Mon, Nov 5, 2018 at 5:11 PM Wei Liu <wei.liu2@citrix.com>
>>> wrote:
>>> > > >
>>> > > > > On Mon, Nov 05, 2018 at 04:58:35PM +0530, Rishi wrote:
>>> > > > > > Alright, I got the serial console and following is the crash
>>> log.
>>> > > Thank
>>> > > > > you
>>> > > > > > for pointing that out.
>>> > > > > >
>>> > > > > > [  133.594852] watchdog: BUG: soft lockup - CPU#2 stuck for
>>> 22s!
>>> > > > > > [ksoftirqd/2:22]
>>> > > > > >
>>> > > > > > [  133.599232] Kernel panic - not syncing: softlockup: hung
>>> tasks
>>> > > > > >
>>> > > > > > [  133.602275] CPU: 2 PID: 22 Comm: ksoftirqd/2 Tainted: G
>>> > > > > > L    4.19.1
>>> > > > > > #1
>>> > > > > >
>>> > > > > > [  133.606620] Hardware name: Xen HVM domU, BIOS 4.4.1-xs132257
>>> > > > > 12/12/2016
>>> > > > >
>>> > > > > Is this serial log from the host? It says it is running as a HVM
>>> DomU.
>>> > > > > Maybe you have mistaken guest serial log with host serial log?
>>> > > > >
>>> > > > > This indicates your machine runs XenServer, which has its own
>>> patch
>>> > > > > queues on top of upstream Xen. You may also want to report to
>>> xs-devel
>>> > > > > mailing list.
>>> > > > >
>>> > > > > Wei.
>>> > > > >
>>> > >
>>>
>>
>
> Sorry, I'll take care of top post from onwards.
>

So, having seen the stack trace, it appears that the CPU was getting stuck
in xen_hypercall_xen_version:

watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]

[30569.582740] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
[30569.588186] Kernel panic - not syncing: softlockup: hung tasks
[30569.591307] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L    4.19.1 #1
[30569.595110] Hardware name: Xen HVM domU, BIOS 4.4.1-xs132257 12/12/2016
[30569.598356] Call Trace:
[30569.599597]  <IRQ>
[30569.600920]  dump_stack+0x5a/0x73
[30569.602998]  panic+0xe8/0x249
[30569.604806]  watchdog_timer_fn+0x200/0x230
[30569.607029]  ? softlockup_fn+0x40/0x40
[30569.609246]  __hrtimer_run_queues+0x133/0x270
[30569.611712]  hrtimer_interrupt+0xfb/0x260
[30569.613800]  xen_timer_interrupt+0x1b/0x30
[30569.616972]  __handle_irq_event_percpu+0x69/0x1a0
[30569.619831]  handle_irq_event_percpu+0x30/0x70
[30569.622382]  handle_percpu_irq+0x34/0x50
[30569.625048]  generic_handle_irq+0x1e/0x30
[30569.627216]  __evtchn_fifo_handle_events+0x163/0x1a0
[30569.629955]  __xen_evtchn_do_upcall+0x41/0x70
[30569.632612]  xen_evtchn_do_upcall+0x27/0x50
[30569.635136]  xen_do_hypervisor_callback+0x29/0x40
[30569.638181] RIP: e030:xen_hypercall_xen_version+0xa/0x20
[30569.641302] Code: 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[30569.651998] RSP: e02b:ffff8800b6203e10 EFLAGS: 00000246
[30569.655077] RAX: 0000000000040007 RBX: ffff8800ae41a898 RCX: ffffffff8100122a
[30569.659226] RDX: ffffc900400080ff RSI: 0000000000000000 RDI: 0000000000000000
[30569.663480] RBP: ffff8800ae41a890 R08: 0000000000000000 R09: 0000000000000000
[30569.667943] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000080000600
[30569.672057] R13: 000000000000001d R14: 00000000000001d0 R15: 000000000000001d
[30569.675911]  ? xen_hypercall_xen_version+0xa/0x20
[30569.678470]  ? xen_force_evtchn_callback+0x9/0x10
[30569.681495]  ? check_events+0x12/0x20
[30569.683738]  ? xen_restore_fl_direct+0x1f/0x20
[30569.686632]  ? _raw_spin_unlock_irqrestore+0x14/0x20
[30569.689166]  ? cp_rx_poll+0x427/0x4d0 [8139cp]
[30569.691519]  ? net_rx_action+0x171/0x3a0
[30569.694219]  ? __do_softirq+0x11e/0x295
[30569.696442]  ? irq_exit+0x62/0xb0
[30569.698251]  ? xen_evtchn_do_upcall+0x2c/0x50
[30569.701037]  ? xen_do_hypervisor_callback+0x29/0x40
[30569.704439]  </IRQ>
[30569.705731]  ? xen_hypercall_sched_op+0xa/0x20
[30569.708766]  ? xen_hypercall_sched_op+0xa/0x20
[30569.711344]  ? xen_safe_halt+0xc/0x20
[30569.713353]  ? default_idle+0x80/0x140
[30569.715345]  ? do_idle+0x13a/0x250
[30569.717216]  ? cpu_startup_entry+0x6f/0x80
[30569.719511]  ? start_kernel+0x4f6/0x516
[30569.721681]  ? set_init_arg+0x57/0x57
[30569.723985]  ? xen_start_kernel+0x575/0x57f
[30569.726453] Kernel Offset: disabled



So I wrote a kernel module that calls xen_hypercall_xen_version; it ran
successfully and returned the version.


What else should I be checking for?

[-- Attachment #1.2: Type: text/html, Size: 26059 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Dom0 kernel 4.14 with SMP randomly crashing
  2018-11-06 10:01                       ` Rishi
@ 2018-11-06 12:17                         ` Wei Liu
  2018-11-06 17:11                           ` Rishi
  0 siblings, 1 reply; 18+ messages in thread
From: Wei Liu @ 2018-11-06 12:17 UTC (permalink / raw)
  To: Rishi; +Cc: Wei Liu, xen-devel

On Tue, Nov 06, 2018 at 03:31:31PM +0530, Rishi wrote:
> 
> So after knowing the stack trace, it appears that the CPU was getting stuck
> for xen_hypercall_xen_version

That hypercall is used when a PV kernel (re-)enables interrupts. See
xen_irq_enable. The purpose is to force the kernel to switch into the
hypervisor.

> 
> watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
> 
> 
> [30569.582740] watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
> [swapper/0:0]
> 
> [30569.588186] Kernel panic - not syncing: softlockup: hung tasks
> 
> [30569.591307] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L    4.19.1
> #1
> 
> [30569.595110] Hardware name: Xen HVM domU, BIOS 4.4.1-xs132257 12/12/2016
> 
> [30569.598356] Call Trace:
> 
> [30569.599597]  <IRQ>
> 
> [30569.600920]  dump_stack+0x5a/0x73
> 
> [30569.602998]  panic+0xe8/0x249
> 
> [30569.604806]  watchdog_timer_fn+0x200/0x230
> 
> [30569.607029]  ? softlockup_fn+0x40/0x40
> 
> [30569.609246]  __hrtimer_run_queues+0x133/0x270
> 
> [30569.611712]  hrtimer_interrupt+0xfb/0x260
> 
> [30569.613800]  xen_timer_interrupt+0x1b/0x30
> 
> [30569.616972]  __handle_irq_event_percpu+0x69/0x1a0
> 
> [30569.619831]  handle_irq_event_percpu+0x30/0x70
> 
> [30569.622382]  handle_percpu_irq+0x34/0x50
> 
> [30569.625048]  generic_handle_irq+0x1e/0x30
> 
> [30569.627216]  __evtchn_fifo_handle_events+0x163/0x1a0
> 
> [30569.629955]  __xen_evtchn_do_upcall+0x41/0x70
> 
> [30569.632612]  xen_evtchn_do_upcall+0x27/0x50
> 
> [30569.635136]  xen_do_hypervisor_callback+0x29/0x40
> 
> [30569.638181] RIP: e030:xen_hypercall_xen_version+0xa/0x20

What is the asm code for this RIP?


Wei.



* Re: Dom0 kernel 4.14 with SMP randomly crashing
  2018-11-06 12:17                         ` Wei Liu
@ 2018-11-06 17:11                           ` Rishi
  2018-11-06 18:46                             ` Rishi
  0 siblings, 1 reply; 18+ messages in thread
From: Rishi @ 2018-11-06 17:11 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2394 bytes --]

On Tue, Nov 6, 2018 at 5:47 PM Wei Liu <wei.liu2@citrix.com> wrote:

> On Tue, Nov 06, 2018 at 03:31:31PM +0530, Rishi wrote:
> >
> > So after knowing the stack trace, it appears that the CPU was getting
> stuck
> > for xen_hypercall_xen_version
>
> That hypercall is used when a PV kernel (re-)enables interrupts. See
> xen_irq_enable. The purpose is to force the kernel to switch to
> hypervisor.
>
> >
> > watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
> >
> >
> > [30569.582740] watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
> > [swapper/0:0]
> >
> > [30569.588186] Kernel panic - not syncing: softlockup: hung tasks
> >
> > [30569.591307] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L
> 4.19.1
> > #1
> >
> > [30569.595110] Hardware name: Xen HVM domU, BIOS 4.4.1-xs132257
> 12/12/2016
> >
> > [30569.598356] Call Trace:
> >
> > [30569.599597]  <IRQ>
> >
> > [30569.600920]  dump_stack+0x5a/0x73
> >
> > [30569.602998]  panic+0xe8/0x249
> >
> > [30569.604806]  watchdog_timer_fn+0x200/0x230
> >
> > [30569.607029]  ? softlockup_fn+0x40/0x40
> >
> > [30569.609246]  __hrtimer_run_queues+0x133/0x270
> >
> > [30569.611712]  hrtimer_interrupt+0xfb/0x260
> >
> > [30569.613800]  xen_timer_interrupt+0x1b/0x30
> >
> > [30569.616972]  __handle_irq_event_percpu+0x69/0x1a0
> >
> > [30569.619831]  handle_irq_event_percpu+0x30/0x70
> >
> > [30569.622382]  handle_percpu_irq+0x34/0x50
> >
> > [30569.625048]  generic_handle_irq+0x1e/0x30
> >
> > [30569.627216]  __evtchn_fifo_handle_events+0x163/0x1a0
> >
> > [30569.629955]  __xen_evtchn_do_upcall+0x41/0x70
> >
> > [30569.632612]  xen_evtchn_do_upcall+0x27/0x50
> >
> > [30569.635136]  xen_do_hypervisor_callback+0x29/0x40
> >
> > [30569.638181] RIP: e030:xen_hypercall_xen_version+0xa/0x20
>
> What is the asm code for this RIP?
>
>
> Wei.
>

The crash is resolved by appending "noirqbalance" to the Xen command line.
This way all dom0 CPUs are available, but IRQs are not balanced by Xen.

Even though I'm running the irqbalance service in dom0, the IRQs do not seem
to be moving. <- this is the dom0 perspective; I do not yet know whether it
follows Xen's IRQs.

I tried objdump; while the function is present in the output, there is no
asm code for it. It's just "...":

ffffffff81001220 <xen_hypercall_xen_version>:
        ...

ffffffff81001240 <xen_hypercall_console_io>:
        ...

All "hypercalls" appear similarly.

[-- Attachment #1.2: Type: text/html, Size: 4831 bytes --]



* Re: Dom0 kernel 4.14 with SMP randomly crashing
  2018-11-06 17:11                           ` Rishi
@ 2018-11-06 18:46                             ` Rishi
  2018-11-06 20:21                               ` Rishi
  0 siblings, 1 reply; 18+ messages in thread
From: Rishi @ 2018-11-06 18:46 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2995 bytes --]

On Tue, Nov 6, 2018 at 10:41 PM Rishi <2rushikeshj@gmail.com> wrote:

>
>
> On Tue, Nov 6, 2018 at 5:47 PM Wei Liu <wei.liu2@citrix.com> wrote:
>
>> On Tue, Nov 06, 2018 at 03:31:31PM +0530, Rishi wrote:
>> >
>> > So after knowing the stack trace, it appears that the CPU was getting
>> stuck
>> > for xen_hypercall_xen_version
>>
>> That hypercall is used when a PV kernel (re-)enables interrupts. See
>> xen_irq_enable. The purpose is to force the kernel to switch to
>> hypervisor.
>>
>> >
>> > watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
>> >
>> >
>> > [30569.582740] watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
>> > [swapper/0:0]
>> >
>> > [30569.588186] Kernel panic - not syncing: softlockup: hung tasks
>> >
>> > [30569.591307] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L
>>   4.19.1
>> > #1
>> >
>> > [30569.595110] Hardware name: Xen HVM domU, BIOS 4.4.1-xs132257
>> 12/12/2016
>> >
>> > [30569.598356] Call Trace:
>> >
>> > [30569.599597]  <IRQ>
>> >
>> > [30569.600920]  dump_stack+0x5a/0x73
>> >
>> > [30569.602998]  panic+0xe8/0x249
>> >
>> > [30569.604806]  watchdog_timer_fn+0x200/0x230
>> >
>> > [30569.607029]  ? softlockup_fn+0x40/0x40
>> >
>> > [30569.609246]  __hrtimer_run_queues+0x133/0x270
>> >
>> > [30569.611712]  hrtimer_interrupt+0xfb/0x260
>> >
>> > [30569.613800]  xen_timer_interrupt+0x1b/0x30
>> >
>> > [30569.616972]  __handle_irq_event_percpu+0x69/0x1a0
>> >
>> > [30569.619831]  handle_irq_event_percpu+0x30/0x70
>> >
>> > [30569.622382]  handle_percpu_irq+0x34/0x50
>> >
>> > [30569.625048]  generic_handle_irq+0x1e/0x30
>> >
>> > [30569.627216]  __evtchn_fifo_handle_events+0x163/0x1a0
>> >
>> > [30569.629955]  __xen_evtchn_do_upcall+0x41/0x70
>> >
>> > [30569.632612]  xen_evtchn_do_upcall+0x27/0x50
>> >
>> > [30569.635136]  xen_do_hypervisor_callback+0x29/0x40
>> >
>> > [30569.638181] RIP: e030:xen_hypercall_xen_version+0xa/0x20
>>
>> What is the asm code for this RIP?
>>
>>
>> Wei.
>>
>
> The issue of crash is getting resolved with appending "noirqbalance" at
> xen command line. This way all dom0 cpus are available but not irq balanced
> at xen.
>
> Even though I'm running irqbalance service in dom0 the irqs seems to be
> not moving. <- this is dom0 perspective, I do not know yet, if it follows
> Xen irq.
>
> I tried objdump, while I have  have the function in out but there is no
> asm code of it. Its just "..."
>
> ffffffff81001220 <xen_hypercall_xen_version>:
>
>         ...
>
>
> ffffffff81001240 <xen_hypercall_console_io>:
>
>         ...
>
> All "hypercalls" appear similarly.
>

How frequent can that hypercall/xen_irq_enable() path be? Many times per
second, or only once in a while?
During my tests, the system runs stable unless I'm downloading a large file.
Files of around 1 GB download without a crash, but the system crashes once
the size goes above that. I'm using a 2.1 GB file and wget to download.

Is there a way I can simulate a PV kernel (re-)enabling interrupts from a
kernel module, in a controlled fashion?

[-- Attachment #1.2: Type: text/html, Size: 5926 bytes --]



* Re: Dom0 kernel 4.14 with SMP randomly crashing
  2018-11-06 18:46                             ` Rishi
@ 2018-11-06 20:21                               ` Rishi
  0 siblings, 0 replies; 18+ messages in thread
From: Rishi @ 2018-11-06 20:21 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 4548 bytes --]

On Wed, Nov 7, 2018 at 12:16 AM Rishi <2rushikeshj@gmail.com> wrote:

>
>
> On Tue, Nov 6, 2018 at 10:41 PM Rishi <2rushikeshj@gmail.com> wrote:
>
>>
>>
>> On Tue, Nov 6, 2018 at 5:47 PM Wei Liu <wei.liu2@citrix.com> wrote:
>>
>>> On Tue, Nov 06, 2018 at 03:31:31PM +0530, Rishi wrote:
>>> >
>>> > So after knowing the stack trace, it appears that the CPU was getting
>>> stuck
>>> > for xen_hypercall_xen_version
>>>
>>> That hypercall is used when a PV kernel (re-)enables interrupts. See
>>> xen_irq_enable. The purpose is to force the kernel to switch to
>>> hypervisor.
>>>
>>> >
>>> > watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
>>> >
>>> >
>>> > [30569.582740] watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
>>> > [swapper/0:0]
>>> >
>>> > [30569.588186] Kernel panic - not syncing: softlockup: hung tasks
>>> >
>>> > [30569.591307] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L
>>>   4.19.1
>>> > #1
>>> >
>>> > [30569.595110] Hardware name: Xen HVM domU, BIOS 4.4.1-xs132257
>>> 12/12/2016
>>> >
>>> > [30569.598356] Call Trace:
>>> >
>>> > [30569.599597]  <IRQ>
>>> >
>>> > [30569.600920]  dump_stack+0x5a/0x73
>>> >
>>> > [30569.602998]  panic+0xe8/0x249
>>> >
>>> > [30569.604806]  watchdog_timer_fn+0x200/0x230
>>> >
>>> > [30569.607029]  ? softlockup_fn+0x40/0x40
>>> >
>>> > [30569.609246]  __hrtimer_run_queues+0x133/0x270
>>> >
>>> > [30569.611712]  hrtimer_interrupt+0xfb/0x260
>>> >
>>> > [30569.613800]  xen_timer_interrupt+0x1b/0x30
>>> >
>>> > [30569.616972]  __handle_irq_event_percpu+0x69/0x1a0
>>> >
>>> > [30569.619831]  handle_irq_event_percpu+0x30/0x70
>>> >
>>> > [30569.622382]  handle_percpu_irq+0x34/0x50
>>> >
>>> > [30569.625048]  generic_handle_irq+0x1e/0x30
>>> >
>>> > [30569.627216]  __evtchn_fifo_handle_events+0x163/0x1a0
>>> >
>>> > [30569.629955]  __xen_evtchn_do_upcall+0x41/0x70
>>> >
>>> > [30569.632612]  xen_evtchn_do_upcall+0x27/0x50
>>> >
>>> > [30569.635136]  xen_do_hypervisor_callback+0x29/0x40
>>> >
>>> > [30569.638181] RIP: e030:xen_hypercall_xen_version+0xa/0x20
>>>
>>> What is the asm code for this RIP?
>>>
>>>
>>> Wei.
>>>
>>
>> The issue of crash is getting resolved with appending "noirqbalance" at
>> xen command line. This way all dom0 cpus are available but not irq balanced
>> at xen.
>>
>> Even though I'm running irqbalance service in dom0 the irqs seems to be
>> not moving. <- this is dom0 perspective, I do not know yet, if it follows
>> Xen irq.
>>
>> I tried objdump, while I have  have the function in out but there is no
>> asm code of it. Its just "..."
>>
>> ffffffff81001220 <xen_hypercall_xen_version>:
>>
>>         ...
>>
>>
>> ffffffff81001240 <xen_hypercall_console_io>:
>>
>>         ...
>>
>> All "hypercalls" appear similarly.
>>
>
> How frequent can be that hypercall/xen_irq_enable()? Like n/s or once a
> while?
> During my tests, the system runs stable unless I'm downloading a large
> file. Files around a GB size are getting downloaded without crash, but
> system crash comes when its above it. I'm using a 2.1GB file & wget to
> download.
>
> Is there a way I can simulate PV kernel (re-)enable of interrupt using a
> kernel module with a controlled fashion?
>

In case this is on the right track:

ffffffff8101ab70 <xen_force_evtchn_callback>:
ffffffff8101ab70:       31 ff                   xor    %edi,%edi
ffffffff8101ab72:       31 f6                   xor    %esi,%esi
ffffffff8101ab74:       e8 a7 66 fe ff          callq  ffffffff81001220 <xen_hypercall_xen_version>
ffffffff8101ab79:       c3                      retq
ffffffff8101ab7a:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

It seems I'm hitting the following code from xen_irq_enable:

        barrier(); /* unmask then check (avoid races) */

        if (unlikely(vcpu->evtchn_upcall_pending))
                xen_force_evtchn_callback();

The code says unlikely, yet it is being called. And I found the following
structure:

struct vcpu_info {
        /*
         * 'evtchn_upcall_pending' is written non-zero by Xen to indicate
         * a pending notification for a particular VCPU. It is then cleared
         * by the guest OS /before/ checking for pending work, thus avoiding
         * a set-and-check race. Note that the mask is only accessed by Xen
         * on the CPU that is currently hosting the VCPU. This means that the
         * pending and mask flags can be updated by the guest without special
         * synchronisation (i.e., no need for the x86 LOCK prefix).

Let me know if I'm spamming the thread with such intermediate findings.

[-- Attachment #1.2: Type: text/html, Size: 15490 bytes --]



end of thread, other threads:[~2018-11-06 20:21 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-04 13:07 Dom0 kernel 4.14 with SMP randomly crashing Rishi
2018-11-04 14:24 ` Rishi
2018-11-05 10:19 ` Wei Liu
2018-11-05 10:56   ` Rishi
2018-11-05 10:58     ` Wei Liu
2018-11-05 11:28       ` Rishi
2018-11-05 11:41         ` Wei Liu
2018-11-05 11:48           ` Rishi
2018-11-05 11:58             ` Wei Liu
2018-11-05 12:30               ` Rishi
2018-11-05 12:49                 ` Wei Liu
2018-11-05 12:59                   ` Rishi
2018-11-05 13:02                     ` Rishi
2018-11-06 10:01                       ` Rishi
2018-11-06 12:17                         ` Wei Liu
2018-11-06 17:11                           ` Rishi
2018-11-06 18:46                             ` Rishi
2018-11-06 20:21                               ` Rishi
