All of lore.kernel.org
 help / color / mirror / Atom feed
* [Xenomai-help] Xenomai scheduling while atomic bug--debugging parameters
@ 2009-12-09 16:56 Josh Karch
  2009-12-09 17:45 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 10+ messages in thread
From: Josh Karch @ 2009-12-09 16:56 UTC (permalink / raw)
  To: xenomai

Hello,

I have been working on a complex Xenomai based control system for Wind Turbine research and I recently encountered a bug, which after much testing I am unable to isolate, only delay the onset of the bug, which comes more quickly when "top" is run.

My system is a PIII 1GHz, 1GB RAM, uniprocessor PC-104 stack with two PEAK PCI PCAN cards (4 channels) and an Exar 16L788 Quad 16550A compatible UART which is an ISA board (the PCI boards conflicted with the USB IRQs so I installed this board instead).  Our software is a daemon that uses three CAN channels controlled from a master RT-Task using message queues and RT-Alarms; our data logging system logs all data using a Pthread reading from an RT-Pipe, and communication/control of the control system is established through a daemon command line interface that uses a send and receive RT-Pipe to ensure that all RT-Tasks stay in primary mode, while all linux control functions happen in the forked process and one logging pthread.   RT-Mutexes are used throughout.

Depending on how many parts of my program are enabled: (logging code, serial communications) it is only a matter of time before I get "scheduling while atomic" errors. I am running Linux 2.6.30.8 and Xenomai 2.4.10.  With logging (heavy rt-pipe stream) disabled it took 14 hours for the bug to appear, with logging enabled it took 1,5 hours.  During that time I was trying to break the code;  starting and stopping "top" in another window seemed to break it.  In addition, I may have also encountered this bug in another form while improperly using the xeno_16550A driver.  Instead of using setserial to set the quad uarts to none, I now just run as a kernel parameter 8250.nr_uarts=0.

Then I saw this section online:
http://www.mail-archive.com/xenomai@xenomai.org

Seems like a very similar problem. I will try to disable all of these parameters and see if that resolves the problem:  Any ideas on whether that issue was addressed in 2.4.10?

Thank you for your help with this,

Sincerely,

Joshua Karch

Here is the most recent trace:

Dec  9 00:39:35 debian kernel: [56261.755997] BUG: scheduling while atomic: sshd/1294/0x00000003
Dec  9 00:39:35 debian kernel: [56261.756750] Modules linked in: ipv6 nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc loop e100 mii pcan ehci_hcd uhci_hcd shpchp i2c_i801 iTCO_wdt i2c_core serio_raw pci_hotplug intel_agp rng_core piix usbcore agpgart
Dec  9 00:39:35 debian kernel: [56261.768704] Pid: 1294, comm: sshd Not tainted 2.6.30.8 #1
Dec  9 00:39:35 debian kernel: [56261.772700] Call Trace:
Dec  9 00:39:35 debian kernel: [56261.776709]  [<c0318291>] ? __schedule+0x8d/0x505
Dec  9 00:39:35 debian kernel: [56261.780706]  [<c0163c6a>] ? xnpod_schedule+0x6b9/0x987
Dec  9 00:39:35 debian kernel: [56261.784706]  [<c0318718>] ? schedule+0xf/0x28
Dec  9 00:39:35 debian kernel: [56261.788706]  [<c031936d>] ? schedule_hrtimeout_range+0x30/0x121
Dec  9 00:39:35 debian kernel: [56261.792707]  [<c011a56d>] ? __wake_up+0x41/0x7c
Dec  9 00:39:35 debian kernel: [56261.796704]  [<c011a58b>] ? __wake_up+0x5f/0x7c
Dec  9 00:39:35 debian kernel: [56261.800706]  [<c027503d>] ? tty_ldisc_deref+0x5f/0x97
Dec  9 00:39:35 debian kernel: [56261.804705]  [<c027505e>] ? tty_ldisc_deref+0x80/0x97
Dec  9 00:39:35 debian kernel: [56261.808705]  [<c01b0013>] ? poll_schedule_timeout+0x25/0x3e
Dec  9 00:39:35 debian kernel: [56261.812705]  [<c01b095e>] ? do_select+0x3e5/0x41d
Dec  9 00:39:35 debian kernel: [56261.816707]  [<c01b0dff>] ? __pollwait+0x0/0x9c
Dec  9 00:39:35 debian kernel: [56261.820710]  [<c01b0e9b>] ? pollwake+0x0/0x55
Dec  9 00:39:35 debian kernel: [56261.824705]  [<c01b0e9b>] ? pollwake+0x0/0x55
Dec  9 00:39:35 debian kernel: [56261.828706]  [<c01b0e9b>] ? pollwake+0x0/0x55
Dec  9 00:39:35 debian kernel: [56261.832704]  [<c01b0e9b>] ? pollwake+0x0/0x55
Dec  9 00:39:35 debian kernel: [56261.836707]  [<c02c55f7>] ? __qdisc_run+0x10b/0x1b1
Dec  9 00:39:35 debian kernel: [56261.840712]  [<c01228b1>] ? local_bh_enable+0x7c/0x8f
Dec  9 00:39:35 debian kernel: [56261.844706]  [<c02b687d>] ? dev_queue_xmit+0x33e/0x36e
Dec  9 00:39:35 debian kernel: [56261.848706]  [<c02d76d8>] ? ip_finish_output+0x1f2/0x22a
Dec  9 00:39:35 debian kernel: [56261.852705]  [<c02d6988>] ? ip_local_out+0x15/0x17
Dec  9 00:39:35 debian kernel: [56261.856705]  [<c02d7179>] ? ip_queue_xmit+0x2f4/0x355
Dec  9 00:39:35 debian kernel: [56261.860704]  [<c02d7179>] ? ip_queue_xmit+0x2f4/0x355
Dec  9 00:39:35 debian kernel: [56261.864705]  [<c0185c2a>] ? get_page_from_freelist+0x2e4/0x439
Dec  9 00:39:35 debian kernel: [56261.868713]  [<c0163c6a>] ? xnpod_schedule+0x6b9/0x987
Dec  9 00:39:35 debian kernel: [56261.872704]  [<c02e9f2a>] ? tcp_v4_send_check+0x7a/0xb0
Dec  9 00:39:35 debian kernel: [56261.876705]  [<c02e68fa>] ? tcp_transmit_skb+0x5ab/0x5e2
Dec  9 00:39:35 debian kernel: [56261.880706]  [<c0125e5a>] ? lock_timer_base+0x32/0x73
Dec  9 00:39:35 debian kernel: [56261.884705]  [<c02e8a10>] ? tcp_write_xmit+0x7a4/0x864
Dec  9 00:39:35 debian kernel: [56261.888703]  [<c02e8a10>] ? tcp_write_xmit+0x7a4/0x864
Dec  9 00:39:35 debian kernel: [56261.892706]  [<c01536c4>] ? __ipipe_dispatch_wired_nocheck+0x36/0x6d
Dec  9 00:39:35 debian kernel: [56261.896707]  [<c02b07aa>] ? __alloc_skb+0x46/0x10b
Dec  9 00:39:35 debian kernel: [56261.900704]  [<c012242a>] ? local_bh_disable+0xa/0xb
Dec  9 00:39:35 debian kernel: [56261.904722]  [<c02ac1d9>] ? release_sock+0xa0/0xa8
Dec  9 00:39:35 debian kernel: [56261.908705]  [<c01228b1>] ? local_bh_enable+0x7c/0x8f
Dec  9 00:39:35 debian kernel: [56261.912705]  [<c02de7f8>] ? tcp_sendmsg+0x7b0/0x88a
Dec  9 00:39:35 debian kernel: [56261.916706]  [<c01b0b86>] ? core_sys_select+0x1f0/0x287
Dec  9 00:39:35 debian kernel: [56261.920715]  [<c01a3c3a>] ? do_sync_write+0xbf/0x106
Dec  9 00:39:35 debian kernel: [56261.924705]  [<c012f172>] ? remove_wait_queue+0x58/0x6f
Dec  9 00:39:35 debian kernel: [56261.928706]  [<c012ee74>] ? autoremove_wake_function+0x0/0x2d
Dec  9 00:39:35 debian kernel: [56261.932707]  [<c01b0dde>] ? sys_select+0x6e/0x8f
Dec  9 00:39:35 debian kernel: [56261.936704]  [<c01a4722>] ? sys_write+0x3c/0x63
Dec  9 00:39:35 debian kernel: [56261.940706]  [<c01027f5>] ? sysenter_do_call+0x12/0x16
Dec  9 00:39:35 debian kernel: [56261.949288] BUG: scheduling while atomic: top/1503/0x00000003
Dec  9 00:39:35 debian kernel: [56261.952701] Modules linked in: ipv6 nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc loop e100 mii pcan ehci_hcd uhci_hcd shpchp i2c_i801 iTCO_wdt i2c_core serio_raw pci_hotplug intel_agp rng_core piix usbcore agpgart
Dec  9 00:39:35 debian kernel: [56261.960702] Pid: 1503, comm: top Not tainted 2.6.30.8 #1
Dec  9 00:39:35 debian kernel: [56261.964700] Call Trace:
Dec  9 00:39:35 debian kernel: [56261.968715]  [<c0318291>] ? __schedule+0x8d/0x505
Dec  9 00:39:35 debian kernel: [56261.972705]  [<c0131daf>] ? __hrtimer_start_range_ns+0x246/0x263
Dec  9 00:39:35 debian kernel: [56261.976712]  [<c0131dec>] ? hrtimer_start_range_ns+0xf/0x13
Dec  9 00:39:35 debian kernel: [56261.980706]  [<c0318718>] ? schedule+0xf/0x28
Dec  9 00:39:35 debian kernel: [56261.984705]  [<c0319436>] ? schedule_hrtimeout_range+0xf9/0x121
Dec  9 00:39:35 debian kernel: [56261.988705]  [<c0131549>] ? hrtimer_wakeup+0x0/0x18
Dec  9 00:39:35 debian kernel: [56261.992719]  [<c0131dec>] ? hrtimer_start_range_ns+0xf/0x13
Dec  9 00:39:35 debian kernel: [56261.996712]  [<c01b0013>] ? poll_schedule_timeout+0x25/0x3e
Dec  9 00:39:35 debian kernel: [56262.000705]  [<c01b095e>] ? do_select+0x3e5/0x41d
Dec  9 00:39:35 debian kernel: [56262.004708]  [<c01b0dff>] ? __pollwait+0x0/0x9c
Dec  9 00:39:35 debian kernel: [56262.008705]  [<c01b0e9b>] ? pollwake+0x0/0x55
Dec  9 00:39:35 debian kernel: [56262.012706]  [<c01b0e9b>] ? pollwake+0x0/0x55
Dec  9 00:39:35 debian kernel: [56262.016707]  [<c0163c6a>] ? xnpod_schedule+0x6b9/0x987
Dec  9 00:39:35 debian kernel: [56262.020722]  [<f838247e>] ? pcan_chardev_rx+0x4a/0x60 [pcan]
Dec  9 00:39:35 debian kernel: [56262.024707]  [<c017f6ae>] ? rtdm_event_signal+0x20/0x41
Dec  9 00:39:35 debian kernel: [56262.028713]  [<f838444e>] ? sja1000_irqhandler_common+0x4e8/0x54c [pcan]
Dec  9 00:39:35 debian kernel: [56262.032706]  [<c0160705>] ? xnintr_irq_handler+0xc5/0x122
Dec  9 00:39:35 debian kernel: [56262.036707]  [<c0106a36>] ? sched_clock+0x3d/0x5d
Dec  9 00:39:35 debian kernel: [56262.040705]  [<c01334e8>] ? sched_clock_cpu+0x8a/0x247
Dec  9 00:39:35 debian kernel: [56262.044705]  [<c0106a36>] ? sched_clock+0x3d/0x5d
Dec  9 00:39:35 debian kernel: [56262.048706]  [<c01334e8>] ? sched_clock_cpu+0x8a/0x247
Dec  9 00:39:35 debian kernel: [56262.052705]  [<c0119139>] ? try_to_wake_up+0x178/0x194
Dec  9 00:39:35 debian kernel: [56262.056705]  [<c01b0ee5>] ? pollwake+0x4a/0x55
Dec  9 00:39:35 debian kernel: [56262.060705]  [<c0119155>] ? default_wake_function+0x0/0x8
Dec  9 00:39:35 debian kernel: [56262.064706]  [<c0116c3e>] ? __wake_up_common+0x2d/0x52
Dec  9 00:39:35 debian kernel: [56262.068705]  [<c0106a36>] ? sched_clock+0x3d/0x5d
Dec  9 00:39:35 debian kernel: [56262.072705]  [<c01334e8>] ? sched_clock_cpu+0x8a/0x247
Dec  9 00:39:35 debian kernel: [56262.076705]  [<c011a58b>] ? __wake_up+0x5f/0x7c
Dec  9 00:39:35 debian kernel: [56262.080706]  [<c0119139>] ? try_to_wake_up+0x178/0x194
Dec  9 00:39:35 debian kernel: [56262.084705]  [<c01b900e>] ? seq_printf+0x25/0x41
Dec  9 00:39:35 debian kernel: [56262.088706]  [<c01b0b86>] ? core_sys_select+0x1f0/0x287
Dec  9 00:39:35 debian kernel: [56262.092713]  [<c0116c3e>] ? __wake_up_common+0x2d/0x52
Dec  9 00:39:35 debian kernel: [56262.096705]  [<c011a56d>] ? __wake_up+0x41/0x7c
Dec  9 00:39:35 debian kernel: [56262.100704]  [<c011a58b>] ? __wake_up+0x5f/0x7c
Dec  9 00:39:35 debian kernel: [56262.104706]  [<c011a56d>] ? __wake_up+0x41/0x7c
Dec  9 00:39:35 debian kernel: [56262.108703]  [<c011a58b>] ? __wake_up+0x5f/0x7c
Dec  9 00:39:35 debian kernel: [56262.112713]  [<c027503d>] ? tty_ldisc_deref+0x5f/0x97
Dec  9 00:39:35 debian kernel: [56262.116711]  [<c027505e>] ? tty_ldisc_deref+0x80/0x97
Dec  9 00:39:35 debian kernel: [56262.120709]  [<c0271f9e>] ? n_tty_ioctl+0x0/0xa5
Dec  9 00:39:35 debian kernel: [56262.124705]  [<c0270b03>] ? tty_ioctl+0x744/0x767
Dec  9 00:39:35 debian kernel: [56262.128704]  [<c02703bf>] ? tty_ioctl+0x0/0x767
Dec  9 00:39:35 debian kernel: [56262.132704]  [<c01af0bf>] ? vfs_ioctl+0x1c/0x5f
Dec  9 00:39:35 debian kernel: [56262.136706]  [<c01af578>] ? do_vfs_ioctl+0x476/0x4b6
Dec  9 00:39:35 debian kernel: [56262.140711]  [<c027503d>] ? tty_ldisc_deref+0x5f/0x97
Dec  9 00:39:35 debian kernel: [56262.144709]  [<c027505e>] ? tty_ldisc_deref+0x80/0x97
Dec  9 00:39:35 debian kernel: [56262.148705]  [<c026eb49>] ? tty_read+0x6b/0x99
Dec  9 00:39:35 debian kernel: [56262.152705]  [<c01358c5>] ? getnstimeofday+0x4d/0xca
Dec  9 00:39:35 debian kernel: [56262.156705]  [<c01b0dde>] ? sys_select+0x6e/0x8f
Dec  9 00:39:35 debian kernel: [56262.160705]  [<c01027f5>] ? sysenter_do_call+0x12/0x16
Dec  9 00:39:38 debian kernel: [56264.536732] BUG: scheduling while atomic: top/1503/0x00000003
Dec  9 00:39:38 debian kernel: [56264.540697] Modules linked in: ipv6 nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc loop e100 mii pcan ehci_hcd uhci_hcd shpchp i2c_i801 iTCO_wdt i2c_core serio_raw pci_hotplug intel_agp rng_core piix usbcore agpgart
Dec  9 00:39:38 debian kernel: [56264.540697] Pid: 1503, comm: top Not tainted 2.6.30.8 #1
Dec  9 00:39:38 debian kernel: [56264.540697] Call Trace:
Dec  9 00:39:38 debian kernel: [56264.540697]  [<c0318291>] ? __schedule+0x8d/0x505
Dec  9 00:39:38 debian kernel: [56264.540697]  [<c01a47ca>] ? vfs_read+0x81/0x11c
Dec  9 00:39:38 debian kernel: [56264.540697]  [<c01af5f9>] ? sys_ioctl+0x41/0x58
Dec  9 00:39:38 debian kernel: [56264.540697]  [<c0318718>] ? schedule+0xf/0x28
Dec  9 00:39:38 debian kernel: [56264.540697]  [<c0102963>] ? work_resched+0x6/0x20
Dec  9 00:39:38 debian kernel: [56264.540710] BUG: scheduling while atomic: sshd/1294/0x00000003
Dec  9 00:39:38 debian kernel: [56264.544710] Modules linked in: ipv6 nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc loop e100 mii pcan ehci_hcd uhci_hcd shpchp i2c_i801 iTCO_wdt i2c_core serio_raw pci_hotplug intel_agp rng_core piix usbcore agpgart
Dec  9 00:39:38 debian kernel: [56264.552708] Pid: 1294, comm: sshd Not tainted 2.6.30.8 #1
Dec  9 00:39:38 debian kernel: [56264.556700] Call Trace:
Dec  9 00:39:38 debian kernel: [56264.560705]  [<c0318291>] ? __schedule+0x8d/0x505
Dec  9 00:39:38 debian kernel: [56264.564706]  [<c0163c6a>] ? xnpod_schedule+0x6b9/0x987
Dec  9 00:39:38 debian kernel: [56264.568706]  [<c0318718>] ? schedule+0xf/0x28
Dec  9 00:39:38 debian kernel: [56264.572705]  [<c031936d>] ? schedule_hrtimeout_range+0x30/0x121
Dec  9 00:39:38 debian kernel: [56264.576706]  [<c011a56d>] ? __wake_up+0x41/0x7c
Dec  9 00:39:38 debian kernel: [56264.580705]  [<c011a58b>] ? __wake_up+0x5f/0x7c
Dec  9 00:39:38 debian kernel: [56264.584707]  [<c027503d>] ? tty_ldisc_deref+0x5f/0x97
Dec  9 00:39:38 debian kernel: [56264.588705]  [<c027505e>] ? tty_ldisc_deref+0x80/0x97
Dec  9 00:39:38 debian kernel: [56264.592705]  [<c01b0013>] ? poll_schedule_timeout+0x25/0x3e
Dec  9 00:39:38 debian kernel: [56264.596705]  [<c01b095e>] ? do_select+0x3e5/0x41d
Dec  9 00:39:38 debian kernel: [56264.600707]  [<c01b0dff>] ? __pollwait+0x0/0x9c
Dec  9 00:39:38 debian kernel: [56264.604704]  [<c01b0e9b>] ? pollwake+0x0/0x55
Dec  9 00:39:38 debian kernel: [56264.608705]  [<c01b0e9b>] ? pollwake+0x0/0x55
Dec  9 00:39:38 debian kernel: [56264.612705]  [<c01b0e9b>] ? pollwake+0x0/0x55
Dec  9 00:39:38 debian kernel: [56264.616704]  [<c01b0e9b>] ? pollwake+0x0/0x55
Dec  9 00:39:38 debian kernel: [56264.620708]  [<c02c55f7>] ? __qdisc_run+0x10b/0x1b1
Dec  9 00:39:38 debian kernel: [56264.624705]  [<c01228b1>] ? local_bh_enable+0x7c/0x8f
Dec  9 00:39:38 debian kernel: [56264.628705]  [<c02b687d>] ? dev_queue_xmit+0x33e/0x36e
Dec  9 00:39:38 debian kernel: [56264.632706]  [<c02d76d8>] ? ip_finish_output+0x1f2/0x22a
Dec  9 00:39:38 debian kernel: [56264.636706]  [<c02d6988>] ? ip_local_out+0x15/0x17
Dec  9 00:39:38 debian kernel: [56264.640704]  [<c02d7179>] ? ip_queue_xmit+0x2f4/0x355
Dec  9 00:39:38 debian kernel: [56264.644705]  [<c02d7179>] ? ip_queue_xmit+0x2f4/0x355
Dec  9 00:39:38 debian kernel: [56264.648705]  [<c0185c2a>] ? get_page_from_freelist+0x2e4/0x439
Dec  9 00:39:38 debian kernel: [56264.652706]  [<c0163c6a>] ? xnpod_schedule+0x6b9/0x987
Dec  9 00:39:38 debian kernel: [56264.656706]  [<c02e9f2a>] ? tcp_v4_send_check+0x7a/0xb0
Dec  9 00:39:38 debian kernel: [56264.660710]  [<c02e68fa>] ? tcp_transmit_skb+0x5ab/0x5e2
Dec  9 00:39:38 debian kernel: [56264.664705]  [<c0125e5a>] ? lock_timer_base+0x32/0x73
Dec  9 00:39:38 debian kernel: [56264.668706]  [<c02e8a10>] ? tcp_write_xmit+0x7a4/0x864
Dec  9 00:39:38 debian kernel: [56264.672703]  [<c02e8a10>] ? tcp_write_xmit+0x7a4/0x864
Dec  9 00:39:38 debian kernel: [56264.676706]  [<c01536c4>] ? __ipipe_dispatch_wired_nocheck+0x36/0x6d
Dec  9 00:39:38 debian kernel: [56264.680712]  [<c02b07aa>] ? __alloc_skb+0x46/0x10b
Dec  9 00:39:38 debian kernel: [56264.684704]  [<c012242a>] ? local_bh_disable+0xa/0xb
Dec  9 00:39:38 debian kernel: [56264.688707]  [<c02ac1d9>] ? release_sock+0xa0/0xa8
Dec  9 00:39:38 debian kernel: [56264.692704]  [<c01228b1>] ? local_bh_enable+0x7c/0x8f
Dec  9 00:39:38 debian kernel: [56264.696706]  [<c02de7f8>] ? tcp_sendmsg+0x7b0/0x88a
Dec  9 00:39:38 debian kernel: [56264.700707]  [<c01b0b86>] ? core_sys_select+0x1f0/0x287
Dec  9 00:39:38 debian kernel: [56264.704707]  [<c01a3c3a>] ? do_sync_write+0xbf/0x106
Dec  9 00:39:38 debian kernel: [56264.708706]  [<c012f172>] ? remove_wait_queue+0x58/0x6f
Dec  9 00:39:38 debian kernel: [56264.712713]  [<c012ee74>] ? autoremove_wake_function+0x0/0x2d
Dec  9 00:39:38 debian kernel: [56264.716706]  [<c01b0dde>] ? sys_select+0x6e/0x8f
Dec  9 00:39:38 debian kernel: [56264.720722]  [<c01a4722>] ? sys_write+0x3c/0x63
Dec  9 00:39:38 debian kernel: [56264.724706]  [<c01027f5>] ? sysenter_do_call+0x12/0x16
Dec  9 00:39:38 debian kernel: [56264.735792] BUG: scheduling while atomic: top/1503/0x00000003
Dec  9 00:39:38 debian kernel: [56264.736697] Modules linked in: ipv6 nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc loop e100 mii pcan ehci_hcd uhci_hcd shpchp i2c_i801 iTCO_wdt i2c_core serio_raw pci_hotplug intel_agp rng_core piix usbcore agpgart




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Xenomai-help] Xenomai scheduling while atomic bug--debugging parameters
  2009-12-09 16:56 [Xenomai-help] Xenomai scheduling while atomic bug--debugging parameters Josh Karch
@ 2009-12-09 17:45 ` Gilles Chanteperdrix
  2009-12-09 18:04   ` Josh Karch
  0 siblings, 1 reply; 10+ messages in thread
From: Gilles Chanteperdrix @ 2009-12-09 17:45 UTC (permalink / raw)
  To: Josh Karch; +Cc: xenomai

Josh Karch wrote:
> parameters and see if that resolves the problem:  Any ideas on
> whether that issue was addressed in 2.4.10?

Looking a the thread you show us, the user which had the original
problem seem not to have sent any further data, so chances are that the
issue was not addressed.

The data we are interested in are:
- better stack traces (frame pointer are possibly disabled in your
kernel, it would probably be better if you enabled them)
- kernel configuration
- a run with all xenomai and I-pipe debug options enabled

TIA.

-- 
                                          Gilles



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Xenomai-help] Xenomai scheduling while atomic bug--debugging parameters
  2009-12-09 17:45 ` Gilles Chanteperdrix
@ 2009-12-09 18:04   ` Josh Karch
  2009-12-09 18:42     ` Jan Kiszka
  0 siblings, 1 reply; 10+ messages in thread
From: Josh Karch @ 2009-12-09 18:04 UTC (permalink / raw)
  To: xenomai

Gilles,

sounds good-- for now I'm recompiling the kernel with all debug options disabled to see if my program will run stable, but I can make another compiling with all options enabled for kernel debugging, ipipe debugging, and xenomai debugging, run it, send a config.  I have a couple of questions:
1) is it possible to disable all of these debugging parameters in the kernel boot parameter or do I have to recompile
2) What is the best way to collect a trace?  Right now I have copied from kern.log, messages, and syslog (in /var/log), although they are all similar.
3) Is there a quick way to recompile the kernel with these options without having to rebuild all the kernel modules or do I need to do a clean and rebuild?

Thank you,

Joshua Karch
________________________________________
From: Gilles Chanteperdrix [gilles.chanteperdrix@xenomai.org]
Sent: Wednesday, December 09, 2009 9:45 AM
To: Josh Karch
Cc: 
Subject: Re: [Xenomai-help] Xenomai scheduling while atomic bug--debugging parameters

Josh Karch wrote:
> parameters and see if that resolves the problem:  Any ideas on
> whether that issue was addressed in 2.4.10?

Looking a the thread you show us, the user which had the original
problem seem not to have sent any further data, so chances are that the
issue was not addressed.

The data we are interested in are:
- better stack traces (frame pointer are possibly disabled in your
kernel, it would probably be better if you enabled them)
- kernel configuration
- a run with all xenomai and I-pipe debug options enabled

TIA.

--
                                          Gilles



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Xenomai-help] Xenomai scheduling while atomic bug--debugging parameters
  2009-12-09 18:04   ` Josh Karch
@ 2009-12-09 18:42     ` Jan Kiszka
  2009-12-09 20:34       ` Josh Karch
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2009-12-09 18:42 UTC (permalink / raw)
  To: Josh Karch; +Cc: xenomai

Josh Karch wrote:
> Gilles,
> 
> sounds good-- for now I'm recompiling the kernel with all debug options disabled to see if my program will run stable, but I can make another compiling with all options enabled for kernel debugging, ipipe debugging, and xenomai debugging, run it, send a config.  I have a couple of questions:
> 1) is it possible to disable all of these debugging parameters in the kernel boot parameter or do I have to recompile

No for most of them. You can compile some tracing features in and leave
them disabled during runtime until needed, but e.g. frame pointers are
too basic.

> 2) What is the best way to collect a trace?  Right now I have copied from kern.log, messages, and syslog (in /var/log), although they are all similar.

As long as dmesg works or syslogd/klogd is able to grab the kernel
console, one of such sources suffices. If the system somehow behaves
strange, drops the first oops, or even hard-locks, collect them via
serial console/netconsole on a remote system.

> 3) Is there a quick way to recompile the kernel with these options without having to rebuild all the kernel modules or do I need to do a clean and rebuild?

Don't do that, always rebuild properly or you create a bunch of
additional sources for troubles. Some config changes do allow this, but
unless you know them and their restrictions, better don't try to "find"
them. Having a proper build environment, rebuilding a tailored embedded
kernel + modules shouldn't take longer than a few minutes.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Xenomai-help] Xenomai scheduling while atomic bug--debugging parameters
  2009-12-09 18:42     ` Jan Kiszka
@ 2009-12-09 20:34       ` Josh Karch
  2009-12-10  9:56         ` Jan Kiszka
  0 siblings, 1 reply; 10+ messages in thread
From: Josh Karch @ 2009-12-09 20:34 UTC (permalink / raw)
  To: xenomai

Hi Jan,

I did a complete make mrproper on the system and recompiled without debugging parameters, which took a few hours on the machine because the modules take up a lot. I probably should prune all unused modules (eg video, pcmcia, sound, etc) that I'm not using.

I guess I will have to back up my current /lib/modules/2.6.30.8 when I recompile with all debug support for that experiment, grab the data, and then get that to you once my testing (hopefully) is successful on this change without kernel debugging options.

Cheers,

Joshua Karch
________________________________________
From: Jan Kiszka [jan.kiszka@domain.hid]
Sent: Wednesday, December 09, 2009 10:42 AM
To: Josh Karch
Cc: xenomai@xenomai.org
Subject: Re: Xenomai scheduling while atomic bug--debugging   parameters

Josh Karch wrote:
> Gilles,
>
> sounds good-- for now I'm recompiling the kernel with all debug options disabled to see if my program will run stable, but I can make another compiling with all options enabled for kernel debugging, ipipe debugging, and xenomai debugging, run it, send a config.  I have a couple of questions:
> 1) is it possible to disable all of these debugging parameters in the kernel boot parameter or do I have to recompile

No for most of them. You can compile some tracing features in and leave
them disabled during runtime until needed, but e.g. frame pointers are
too basic.

> 2) What is the best way to collect a trace?  Right now I have copied from kern.log, messages, and syslog (in /var/log), although they are all similar.

As long as dmesg works or syslogd/klogd is able to grab the kernel
console, one of such sources suffices. If the system somehow behaves
strange, drops the first oops, or even hard-locks, collect them via
serial console/netconsole on a remote system.

> 3) Is there a quick way to recompile the kernel with these options without having to rebuild all the kernel modules or do I need to do a clean and rebuild?

Don't do that, always rebuild properly or you create a bunch of
additional sources for troubles. Some config changes do allow this, but
unless you know them and their restrictions, better don't try to "find"
them. Having a proper build environment, rebuilding a tailored embedded
kernel + modules shouldn't take longer than a few minutes.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Xenomai-help] Xenomai scheduling while atomic bug--debugging parameters
  2009-12-09 20:34       ` Josh Karch
@ 2009-12-10  9:56         ` Jan Kiszka
  2009-12-10 16:30           ` Josh Karch
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2009-12-10  9:56 UTC (permalink / raw)
  To: Josh Karch; +Cc: xenomai

Josh Karch wrote:
> Hi Jan,
> 
> I did a complete make mrproper on the system and recompiled without debugging parameters, 

mrproper is not required, the kernel is supposed to detect what needs to
be rebuild based on the .config changes.

> which took a few hours on the machine because the modules take up a lot. I probably should prune all unused modules (eg video, pcmcia, sound, etc) that I'm not using.

Yep.

> 
> I guess I will have to back up my current /lib/modules/2.6.30.8 when I recompile with all debug support for that experiment, grab the data, and then get that to you once my testing (hopefully) is successful on this change without kernel debugging options.
> 

Gilles already said this: We also need your .config to check if you are
stressing an uncommon configuration or if we are even able to reproduce
it locally.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Xenomai-help] Xenomai scheduling while atomic bug--debugging parameters
  2009-12-10  9:56         ` Jan Kiszka
@ 2009-12-10 16:30           ` Josh Karch
  2009-12-11  9:43             ` Jan Kiszka
  0 siblings, 1 reply; 10+ messages in thread
From: Josh Karch @ 2009-12-10 16:30 UTC (permalink / raw)
  To: xenomai

Jan,

Our program ran without interruption last night with no bug, except it appears one still exists possibly regarding either ethernet (Intel e100), sshd, nfsd, or as a result of running top.  It seems excessive network requests and calls to top and dmesg can trigger it.

Today I will recompile the kernel with all debugging options enabled,  make the system run until the bug occurs, send you that .config file as well as the current .config that still crashes on occasion (though the xenomai application runs rock solid in spite of the linux scheduling while atomic bugs). since the logs may be big, is there an ftp I can send them to or should I email those directly?

Thank you,

Joshua Karch


________________________________________
From: Jan Kiszka [jan.kiszka@domain.hid]
Sent: Thursday, December 10, 2009 1:56 AM
To: Josh Karch
Cc: xenomai@xenomai.org
Subject: Re: Xenomai scheduling while atomic bug--debugging   parameters

Josh Karch wrote:
> Hi Jan,
>
> I did a complete make mrproper on the system and recompiled without debugging parameters,

mrproper is not required, the kernel is supposed to detect what needs to
be rebuild based on the .config changes.

> which took a few hours on the machine because the modules take up a lot. I probably should prune all unused modules (eg video, pcmcia, sound, etc) that I'm not using.

Yep.

>
> I guess I will have to back up my current /lib/modules/2.6.30.8 when I recompile with all debug support for that experiment, grab the data, and then get that to you once my testing (hopefully) is successful on this change without kernel debugging options.
>

Gilles already said this: We also need your .config to check if you are
stressing an uncommon configuration or if we are even able to reproduce
it locally.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Xenomai-help] Xenomai scheduling while atomic bug--debugging parameters
  2009-12-10 16:30           ` Josh Karch
@ 2009-12-11  9:43             ` Jan Kiszka
  2009-12-11 16:45               ` Josh Karch
  2009-12-14 16:33               ` Josh Karch
  0 siblings, 2 replies; 10+ messages in thread
From: Jan Kiszka @ 2009-12-11  9:43 UTC (permalink / raw)
  To: Josh Karch; +Cc: xenomai

Josh Karch wrote:
> Jan,
> 
> Our program ran without interruption last night with no bug, except it appears one still exists possibly regarding either ethernet (Intel e100), sshd, nfsd, or as a result of running top.  It seems excessive network requests and calls to top and dmesg can trigger it.
> 
> Today I will recompile the kernel with all debugging options enabled,  make the system run until the bug occurs, send you that .config file as well as the current .config that still crashes on occasion (though the xenomai application runs rock solid in spite of the linux scheduling while atomic bugs). since the logs may be big, is there an ftp I can send them to or should I email those directly?
> 

>From the "fulldebugging" trace you sent to Gilles and me:

> [  412.520716]  |  # func                    0 ipipe_trace_panic_freeze+0x4 (ipipe_check_context+0x41)
> [  412.520716]  |  # func                    0 ipipe_check_context+0x5 (add_preempt_count+0x10)
> [  412.520716]  |  # func                    0 delay_tsc+0x9 (__const_udelay+0x1d)
> [  412.520716]  |  # func                   -1 __const_udelay+0x3 (sja1000_irqhandler_common+0x2bd [pcan])

The pcan driver invokes a Linux service that is not supposed to be used
from RT context (udelay). Better switch to the CAN driver in upstream
Xenomai (it also comes with a standard API for better portability across
CAN vendors).

> [  412.520716]  |  # func                   -9 xnarch_tsc_to_ns+0x5 (xnarch_get_cpu_time+0xf)
> [  412.520716]  |  # func                   -9 xnarch_get_cpu_time+0x3 (sja1000_irqhandler_common+0x102 [pcan])
> [  412.520716]  |  # func                  -10 xnintr_edge_shirq_handler+0x9 (__ipipe_dispatch_wired_nocheck+0x3e)
> [  412.520716]  |   +func                  -11 __ipipe_dispatch_wired_nocheck+0x6 (__ipipe_dispatch_wired+0x4f)
> [  412.520716]  |   +func                  -11 __ipipe_dispatch_wired+0x5 (__ipipe_handle_irq+0x93)
> [  412.520716]  |   +func                  -11 native_apic_mem_write+0x3 (ack_apic_level+0x141)
> [  412.520716]  |   +func                  -13 native_apic_mem_read+0x3 (ack_apic_level+0x3f)
> [  412.520716]  |   +func                  -13 ack_apic_level+0x6 (__ipipe_ack_fasteoi_irq+0xe)
> [  412.520716]  |   +func                  -13 __ipipe_ack_fasteoi_irq+0x3 (__ipipe_handle_irq+0x8a)
> [  412.520716]  |   +func                  -14 __ipipe_handle_irq+0x9 (common_interrupt+0x36)
> [  412.520716]  |   +begin   0xffffffb6    -14 common_interrupt+0x2f ()

Besides that, there might be a bug in our bug detection infrastructure
with generates a confusing output instead of a proper warning about the
core issue. Will have a look.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Xenomai-help] Xenomai scheduling while atomic bug--debugging parameters
  2009-12-11  9:43             ` Jan Kiszka
@ 2009-12-11 16:45               ` Josh Karch
  2009-12-14 16:33               ` Josh Karch
  1 sibling, 0 replies; 10+ messages in thread
From: Josh Karch @ 2009-12-11 16:45 UTC (permalink / raw)
  To: xenomai

Jan, thank you for your finding this unusual CAN bug. At the time I first put my application together, there was no support for PEAK CAN PCI cards for Xenomai.  I will now search for some example code that I can use to run three CAN channels simultaneously.

Sincerely,

Joshua Karch.
________________________________________
From: Jan Kiszka [jan.kiszka@domain.hid]
Sent: Friday, December 11, 2009 1:43 AM
To: Josh Karch
Cc: xenomai@xenomai.org
Subject: Re: Xenomai scheduling while atomic bug--debugging   parameters

Josh Karch wrote:
> Jan,
>
> Our program ran without interruption last night with no bug, except it appears one still exists possibly regarding either ethernet (Intel e100), sshd, nfsd, or as a result of running top.  It seems excessive network requests and calls to top and dmesg can trigger it.
>
> Today I will recompile the kernel with all debugging options enabled,  make the system run until the bug occurs, send you that .config file as well as the current .config that still crashes on occasion (though the xenomai application runs rock solid in spite of the linux scheduling while atomic bugs). since the logs may be big, is there an ftp I can send them to or should I email those directly?
>

>From the "fulldebugging" trace you sent to Gilles and me:

> [  412.520716]  |  # func                    0 ipipe_trace_panic_freeze+0x4 (ipipe_check_context+0x41)
> [  412.520716]  |  # func                    0 ipipe_check_context+0x5 (add_preempt_count+0x10)
> [  412.520716]  |  # func                    0 delay_tsc+0x9 (__const_udelay+0x1d)
> [  412.520716]  |  # func                   -1 __const_udelay+0x3 (sja1000_irqhandler_common+0x2bd [pcan])

The pcan driver invokes a Linux service that is not supposed to be used
from RT context (udelay). Better switch to the CAN driver in upstream
Xenomai (it also comes with a standard API for better portability across
CAN vendors).

> [  412.520716]  |  # func                   -9 xnarch_tsc_to_ns+0x5 (xnarch_get_cpu_time+0xf)
> [  412.520716]  |  # func                   -9 xnarch_get_cpu_time+0x3 (sja1000_irqhandler_common+0x102 [pcan])
> [  412.520716]  |  # func                  -10 xnintr_edge_shirq_handler+0x9 (__ipipe_dispatch_wired_nocheck+0x3e)
> [  412.520716]  |   +func                  -11 __ipipe_dispatch_wired_nocheck+0x6 (__ipipe_dispatch_wired+0x4f)
> [  412.520716]  |   +func                  -11 __ipipe_dispatch_wired+0x5 (__ipipe_handle_irq+0x93)
> [  412.520716]  |   +func                  -11 native_apic_mem_write+0x3 (ack_apic_level+0x141)
> [  412.520716]  |   +func                  -13 native_apic_mem_read+0x3 (ack_apic_level+0x3f)
> [  412.520716]  |   +func                  -13 ack_apic_level+0x6 (__ipipe_ack_fasteoi_irq+0xe)
> [  412.520716]  |   +func                  -13 __ipipe_ack_fasteoi_irq+0x3 (__ipipe_handle_irq+0x8a)
> [  412.520716]  |   +func                  -14 __ipipe_handle_irq+0x9 (common_interrupt+0x36)
> [  412.520716]  |   +begin   0xffffffb6    -14 common_interrupt+0x2f ()

Besides that, there might be a bug in our bug detection infrastructure
with generates a confusing output instead of a proper warning about the
core issue. Will have a look.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Xenomai-help] Xenomai scheduling while atomic bug--debugging parameters
  2009-12-11  9:43             ` Jan Kiszka
  2009-12-11 16:45               ` Josh Karch
@ 2009-12-14 16:33               ` Josh Karch
  1 sibling, 0 replies; 10+ messages in thread
From: Josh Karch @ 2009-12-14 16:33 UTC (permalink / raw)
  To: xenomai

Jan,
I left my application running over the weekend with the new RT-Socket CAN and the system is still running stable today.  Thank you for pointing out that bug-- you've made my December vacation!

Sincerely,

Joshua Karch
________________________________________
From: Jan Kiszka [jan.kiszka@domain.hid]
Sent: Friday, December 11, 2009 1:43 AM
To: Josh Karch
Cc: xenomai@xenomai.org
Subject: Re: Xenomai scheduling while atomic bug--debugging   parameters

Josh Karch wrote:
> Jan,
>
> Our program ran without interruption last night with no bug, except it appears one still exists possibly regarding either ethernet (Intel e100), sshd, nfsd, or as a result of running top.  It seems excessive network requests and calls to top and dmesg can trigger it.
>
> Today I will recompile the kernel with all debugging options enabled,  make the system run until the bug occurs, send you that .config file as well as the current .config that still crashes on occasion (though the xenomai application runs rock solid in spite of the linux scheduling while atomic bugs). since the logs may be big, is there an ftp I can send them to or should I email those directly?
>

>From the "fulldebugging" trace you sent to Gilles and me:

> [  412.520716]  |  # func                    0 ipipe_trace_panic_freeze+0x4 (ipipe_check_context+0x41)
> [  412.520716]  |  # func                    0 ipipe_check_context+0x5 (add_preempt_count+0x10)
> [  412.520716]  |  # func                    0 delay_tsc+0x9 (__const_udelay+0x1d)
> [  412.520716]  |  # func                   -1 __const_udelay+0x3 (sja1000_irqhandler_common+0x2bd [pcan])

The pcan driver invokes a Linux service that is not supposed to be used
from RT context (udelay). Better switch to the CAN driver in upstream
Xenomai (it also comes with a standard API for better portability across
CAN vendors).

> [  412.520716]  |  # func                   -9 xnarch_tsc_to_ns+0x5 (xnarch_get_cpu_time+0xf)
> [  412.520716]  |  # func                   -9 xnarch_get_cpu_time+0x3 (sja1000_irqhandler_common+0x102 [pcan])
> [  412.520716]  |  # func                  -10 xnintr_edge_shirq_handler+0x9 (__ipipe_dispatch_wired_nocheck+0x3e)
> [  412.520716]  |   +func                  -11 __ipipe_dispatch_wired_nocheck+0x6 (__ipipe_dispatch_wired+0x4f)
> [  412.520716]  |   +func                  -11 __ipipe_dispatch_wired+0x5 (__ipipe_handle_irq+0x93)
> [  412.520716]  |   +func                  -11 native_apic_mem_write+0x3 (ack_apic_level+0x141)
> [  412.520716]  |   +func                  -13 native_apic_mem_read+0x3 (ack_apic_level+0x3f)
> [  412.520716]  |   +func                  -13 ack_apic_level+0x6 (__ipipe_ack_fasteoi_irq+0xe)
> [  412.520716]  |   +func                  -13 __ipipe_ack_fasteoi_irq+0x3 (__ipipe_handle_irq+0x8a)
> [  412.520716]  |   +func                  -14 __ipipe_handle_irq+0x9 (common_interrupt+0x36)
> [  412.520716]  |   +begin   0xffffffb6    -14 common_interrupt+0x2f ()

Besides that, there might be a bug in our bug detection infrastructure
with generates a confusing output instead of a proper warning about the
core issue. Will have a look.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2009-12-14 16:33 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-12-09 16:56 [Xenomai-help] Xenomai scheduling while atomic bug--debugging parameters Josh Karch
2009-12-09 17:45 ` Gilles Chanteperdrix
2009-12-09 18:04   ` Josh Karch
2009-12-09 18:42     ` Jan Kiszka
2009-12-09 20:34       ` Josh Karch
2009-12-10  9:56         ` Jan Kiszka
2009-12-10 16:30           ` Josh Karch
2009-12-11  9:43             ` Jan Kiszka
2009-12-11 16:45               ` Josh Karch
2009-12-14 16:33               ` Josh Karch

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.