* Linux 3.1-rc9
@ 2011-10-05  1:40 Linus Torvalds
  2011-10-07  7:08 ` Simon Kirby
  2011-10-09 20:51 ` Arkadiusz Miśkiewicz
  0 siblings, 2 replies; 98+ messages in thread
From: Linus Torvalds @ 2011-10-05  1:40 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Another week, another -rc.

On the kernel front, not a huge amount of changes. That said, by now,
there had better not be - and I definitely wouldn't have minded having
even fewer changes. But the fixes that are here are generally pretty
small, and the diffstat really doesn't look all that scary - there
really aren't *big* changes anywhere.

The things that do stand out a bit: some DRM fixes (radeon and i915),
various network drivers, some ceph fixes - and just lots of random
small stuff. The sparc updates are tiny (T4/T5 detection), but even so
they are the bulk of the arch changes; things really have been that quiet.

The more noticeable change isn't actually to the code at all, it's
that kernel.org is starting to have parts of it come up again, so you
can now find the kernel sources back in the traditional location:

    git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

(although I am also updating github in case you cloned from there, so
you don't *have* to change).
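
If you do want to point an existing clone back at kernel.org, the usual
remote switch does it (this assumes your remote is named "origin"):

    git remote set-url origin git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git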

Also, since I now have a new stronger gpg key, and that new key
actually ends up being signed by more people than the old one ever was
(at least people I *know*), I decided I might as well switch to that
one. So if you are the kind of person who verifies tags, you may want
to do

   gpg --recv-keys 00411886

to get my new key, so that "git verify-tag" will work for you.

(Key fingerprint = ABAF 11C6 5A29 70B1 30AB  E3C4 79BE 3E43 0041 1886)

Obviously, in order to trust that that is actually really my key
(rather than just blindly believe this email, which could easily have
been faked by some Linux wannabe), you'd need to check it. But since
it's signed with my old tag signing key, you shouldn't have any more
trust issues with the new one than you should have with the old key.
Or you can try to follow the chain of trust of the other key signers -
some of them have way more signatures than I ever had and are pretty
well connected.
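
For the actual verification, a minimal sketch (key ID and fingerprint as
above, whatever keyserver you have configured is assumed to carry it):

    gpg --recv-keys 00411886
    gpg --fingerprint 00411886     # compare against the fingerprint quoted above
    git verify-tag v3.1-rc9        # should report a good signature from that key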

Anything else worth mentioning? Hopefully the PCIe issues with MPS
tuning are all behind us, for the simple reason that we just disabled
them for now, and will revisit it for 3.2. And the occasional oopses
with USB disk removal should now be fixed once and for all (knock
wood). Anything else should be really esoteric and device-specific,
you can get a feel for it in the shortlog.

Go forth and test,

                     Linus

---

Alex Deucher (4):
      drm/radeon/kms: fix regression in DP aux defer handling
      drm/radeon/kms: add retry limits for native DP aux defer
      drm/radeon/kms: Fix logic error in DP HPD handler
      drm/radeon/kms: fix channel_remap setup (v2)

Andy Gospodarek (1):
      bonding: properly stop queuing work when requested

Antonio Quartulli (1):
      batman-adv: do_bcast has to be true for broadcast packets only

Archit Taneja (1):
      [media] OMAP_VOUT: Fix build break caused by update_mode removal in DSS2

Arnd Bergmann (2):
      ASoC: use a valid device for dev_err() in Zylonite
      ASoC: omap_mcpdm_remove cannot be __devexit

Axel Lin (1):
      ASoC: Fix setting update bits for WM8753_LADC and WM8753_RADC

Ben Greear (2):
      ipv6-multicast: Fix memory leak in input path.
      ipv6-multicast: Fix memory leak in IPv6 multicast.

Benjamin Herrenschmidt (1):
      powerpc: Fix device-tree matching for Apple U4 bridge

Borislav Petkov (1):
      ide-disk: Fix request requeuing

Brian King (1):
      ibmveth: Fix oops on request_irq failure

Carsten Otte (1):
      [S390] gmap: always up mmap_sem properly

Dave Young (1):
      [media] v4l: Make sure we hold a reference to the v4l2_device
before using it

David S. Miller (3):
      sparc64: Future proof Niagara cpu detection.
      sparc: Make '-p' boot option meaningful again.
      sparc64: Force the execute bit in OpenFirmware's translation entries.

David Vrabel (1):
      net: xen-netback: correctly restart Tx after a VM restore/migrate

Divy Le Ray (1):
      cxgb4: Fix EEH on IBM P7IOC

Dmitry Kravkov (2):
      bnx2x: fix hw attention handling
      bnx2x: fix WOL by enablement PME in config space

Guenter Roeck (1):
      hwmon: (coretemp) Avoid leaving around dangling pointer

Hannes Reinecke (1):
      block: Free queue resources at blk_release_queue()

Hans Verkuil (1):
      [media] v4l: Fix use-after-free case in v4l2_device_release

Ian Campbell (1):
      MAINTAINERS: tehuti: Alexander Indenbaum's address bounces

James Bottomley (1):
      [SCSI] 3w-9xxx: fix iommu_iova leak

Jason Wang (1):
      net: fix a typo in Documentation/networking/scaling.txt

Jean Delvare (1):
      hwmon: (coretemp) Fixup platform device ID change

Jim Schutt (1):
      libceph: initialize ack_stamp to avoid unnecessary connection reset

Jiri Olsa (1):
      perf tools: Fix raw sample reading

Joerg Roedel (1):
      [media] omap3isp: Fix build error in ispccdc.c

Johannes Berg (1):
      iwlagn: fix dangling scan request

Jon Mason (1):
      PCI: Disable MPS configuration by default

Jonathan Lallinger (1):
      RDSRDMA: Fix cleanup of rds_iw_mr_pool

Josef Bacik (1):
      Btrfs: force a page fault if we have a shorty copy on a page boundary

Jouni Malinen (1):
      cfg80211: Fix validation of AKM suites

Keith Packard (2):
      drm/i915: Enable dither whenever display bpc < frame buffer bpc
      drm/i915: FBC off for ironlake and older, otherwise on by default

Larry Finger (1):
      rtlwifi: rtl8192cu: Fix unitialized struct

Lars-Peter Clausen (1):
      mfd: Fix generic irq chip ack function name for jz4740-adc

Laurent Pinchart (1):
      [media] uvcvideo: Fix crash when linking entities

Linus Torvalds (2):
      bootup: move 'usermodehelper_enable()' to the end of do_basic_setup()
      Linux 3.1-rc9

Madalin Bucur (2):
      net: check return value for dst_alloc
      ipv6: check return value for dst_alloc

Mark Salyzyn (1):
      [SCSI] libsas: fix failure to revalidate domain for anything but
the first expander child.

Martin Schwidefsky (1):
      [S390] Do not clobber personality flags on exec

Mathias Krause (1):
      sparc, exec: remove redundant addr_limit assignment

Matt Fleming (1):
      x86/rtc: Don't recursively acquire rtc_lock

Michel Dänzer (3):
      drm/radeon: Simplify cursor x/yorigin calculation.
      drm/radeon: Update AVIVO cursor coordinate origin before
x/yorigin calculation.
      drm/radeon: Set cursor x/y to 0 when x/yorigin > 0.

Ming Lei (1):
      [media] uvcvideo: Set alternate setting 0 on resume if the bus
has been reset

Mohammed Shafi Shajakhan (1):
      ath9k: Fix a dma warning/memory leak

Neil Horman (1):
      [SCSI] cxgb3i: convert cdev->l2opt to use rcu to prevent NULL dereference

Nicholas Miell (1):
      drm/radeon/kms: fix cursor image off-by-one error

Noah Watkins (1):
      libceph: fix parse options memory leak

Oliver Hartkopp (2):
      can bcm: fix tx_setup off-by-one errors
      can bcm: fix incomplete tx_setup fix

Peter Oberparleiter (1):
      [S390] cio: fix cio_tpi ignoring adapter interrupts

Peter Zijlstra (1):
      posix-cpu-timers: Cure SMP wobbles

Rajkumar Manoharan (1):
      ath9k_hw: Fix Rx DMA stuck for AR9003 chips

Ram Pai (1):
      Resource: fix wrong resource window calculation

Randy Dunlap (1):
      [SCSI] scsi: qla4xxx needs libiscsi.o

Richard Cochran (2):
      ptp: fix L2 event message recognition
      dp83640: reduce driver noise

Rob Herring (2):
      irq: Add declaration of irq_domain_simple_ops to irqdomain.h
      irq: Fix check for already initialized irq_domain in irq_domain_add

Roy.Li (1):
      net: Documentation: Fix type of variables

Sage Weil (3):
      libceph: fix linger request requeuing
      libceph: fix pg_temp mapping calculation
      libceph: fix pg_temp mapping update

Shawn Bohrer (1):
      sched/rt: Migrate equal priority tasks to available CPUs

Shmulik Ravid (1):
      bnx2x: add missing break in bnx2x_dcbnl_get_cap

Simon Farnsworth (1):
      drm/i915: Enable SDVO hotplug interrupts for HDMI and DVI

Simon Kirby (1):
      sched: Fix up wchan borkage

Stanislaw Gruszka (2):
      iwlegacy: fix command queue timeout
      iwlegacy: do not use interruptible waits

Takashi Iwai (2):
      ALSA: hda - Fix a regression of the position-buffer check
      lis3: fix regression of HP DriveGuard with 8bit chip

Tomoya MORINAGA (5):
      spi-topcliff-pch: add tx-memory clear after complete transmitting
      spi-topcliff-pch: Fix SSN Control issue
      spi-topcliff-pch: Fix CPU read complete condition issue
      spi-topcliff-pch: Add recovery processing in case FIFO overrun
error occurs
      spi-topcliff-pch: Fix overrun issue

Toshiharu Okada (2):
      pch_gbe: Fixed the issue on which PC was frozen when link was downed.
      pch_gbe: Fixed the issue on which a network freezes

Vasily Averin (1):
      [SCSI] aacraid: reset should disable MSI interrupt

Willem de Bruijn (1):
      make PACKET_STATISTICS getsockopt report consistently between
ring and non-ring

Wu Fengguang (1):
      writeback: show raw dirtied_when in trace writeback_single_inode

Yan, Zheng (1):
      ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket

wangyanqing (1):
      bootup: move 'usermodehelper_enable()' a little earlier


* Re: Linux 3.1-rc9
  2011-10-05  1:40 Linux 3.1-rc9 Linus Torvalds
@ 2011-10-07  7:08 ` Simon Kirby
  2011-10-07 17:48   ` Simon Kirby
  2011-10-09 20:51 ` Arkadiusz Miśkiewicz
  1 sibling, 1 reply; 98+ messages in thread
From: Simon Kirby @ 2011-10-07  7:08 UTC (permalink / raw)
  To: Linus Torvalds, Peter Zijlstra; +Cc: Linux Kernel Mailing List

On Tue, Oct 04, 2011 at 06:40:14PM -0700, Linus Torvalds wrote:

> Peter Zijlstra (1):
>       posix-cpu-timers: Cure SMP wobbles

Hello!

I upgraded a few boxes from 3.1-rc6+fixes to 3.1-rc9 (actually 538d2882),
and now they're hard locking every 15 minutes. Below is a serial console
capture of the lockup. I suspect this is from d670ec13. I'll confirm that
they stop crashing with that commit reverted...
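
(For reference, that confirmation step will roughly be the following,
before the console capture below; this assumes the revert applies
cleanly to our -rc9 tree, and -j8 is arbitrary:)

    git revert d670ec13            # back out the suspected commit
    make oldconfig && make -j8     # then rebuild and reboot the boxes as usual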

[ 1717.560007] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[ 1717.560007] Pid: 18034, comm: php Not tainted 3.1.0-rc9-hw+ #45
[ 1717.560007] Call Trace:
[ 1717.560007]  <NMI>  [<ffffffff816b3544>] panic+0xba/0x1fb
[ 1717.560007]  [<ffffffff81018d70>] ? native_sched_clock+0x20/0x80
[ 1717.560007]  [<ffffffff81018dd9>] ? sched_clock+0x9/0x10
[ 1717.560007]  [<ffffffff810a4751>] watchdog_overflow_callback+0xb1/0xc0
[ 1717.560007]  [<ffffffff810d0a12>] __perf_event_overflow+0xa2/0x1f0
[ 1717.560007]  [<ffffffff810c9f11>] ? perf_event_update_userpage+0x11/0xc0
[ 1717.560007]  [<ffffffff810d0f64>] perf_event_overflow+0x14/0x20
[ 1717.560007]  [<ffffffff81025a11>] intel_pmu_handle_irq+0x351/0x5f0
[ 1717.560007]  [<ffffffff816b7ff6>] perf_event_nmi_handler+0x36/0xb0
[ 1717.560007]  [<ffffffff816ba21f>] notifier_call_chain+0x3f/0x80
[ 1717.560007]  [<ffffffff816ba285>] atomic_notifier_call_chain+0x15/0x20
[ 1717.560007]  [<ffffffff816ba2be>] notify_die+0x2e/0x30
[ 1717.560007]  [<ffffffff816b76e2>] do_nmi+0xa2/0x250
[ 1717.560007]  [<ffffffff816b7080>] nmi+0x20/0x30
[ 1717.560007]  [<ffffffff8137e20d>] ? __write_lock_failed+0xd/0x20
[ 1717.560007]  <<EOE>>  [<ffffffff816b6819>] _raw_write_lock_irq+0x19/0x20
[ 1717.560007]  [<ffffffff810587c3>] copy_process+0xb23/0x1270
[ 1717.560007]  [<ffffffff81058fc2>] do_fork+0xb2/0x2f0
[ 1717.560007]  [<ffffffff8101a7e3>] sys_clone+0x23/0x30
[ 1717.560007]  [<ffffffff816be533>] stub_clone+0x13/0x20
[ 1717.560007]  [<ffffffff816be292>] ? system_call_fastpath+0x16/0x1b
[ 1717.560005] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
[ 1717.560005] Pid: 18038, comm: httpd Not tainted 3.1.0-rc9-hw+ #45
[ 1717.560005] Call Trace:
[ 1717.560005]  <NMI>  [<ffffffff816b3544>] panic+0xba/0x1fb
[ 1717.560005]  [<ffffffff81018d70>] ? native_sched_clock+0x20/0x80
[ 1717.560005]  [<ffffffff81018dd9>] ? sched_clock+0x9/0x10
[ 1717.560005]  [<ffffffff810a4751>] watchdog_overflow_callback+0xb1/0xc0
[ 1717.560005]  [<ffffffff810d0a12>] __perf_event_overflow+0xa2/0x1f0
[ 1717.560005]  [<ffffffff810c9f11>] ? perf_event_update_userpage+0x11/0xc0
[ 1717.560005]  [<ffffffff810d0f64>] perf_event_overflow+0x14/0x20
[ 1717.560005]  [<ffffffff81025a11>] intel_pmu_handle_irq+0x351/0x5f0
[ 1717.560005]  [<ffffffff816b7ff6>] perf_event_nmi_handler+0x36/0xb0
[ 1717.560005]  [<ffffffff816ba21f>] notifier_call_chain+0x3f/0x80
[ 1717.560005]  [<ffffffff816ba285>] atomic_notifier_call_chain+0x15/0x20
[ 1717.560005]  [<ffffffff816ba2be>] notify_die+0x2e/0x30
[ 1717.560005]  [<ffffffff816b76e2>] do_nmi+0xa2/0x250
[ 1717.560005]  [<ffffffff816b7080>] nmi+0x20/0x30
[ 1717.560005]  [<ffffffff816b6644>] ? _raw_spin_lock+0x14/0x20
[ 1717.560005]  <<EOE>>  [<ffffffff8104b4e5>] task_rq_lock+0x55/0xa0
[ 1717.560005]  [<ffffffff8104b8d4>] task_sched_runtime+0x24/0x90
[ 1717.560005]  [<ffffffff8107c924>] thread_group_cputime+0x74/0xb0
[ 1717.560005]  [<ffffffff8107d126>] thread_group_cputimer+0xa6/0xf0
[ 1717.560005]  [<ffffffff8107d198>] cpu_timer_sample_group+0x28/0x90
[ 1717.560005]  [<ffffffff8107d3c3>] set_process_cpu_timer+0x33/0x110
[ 1717.560005]  [<ffffffff8107d4da>] update_rlimit_cpu+0x3a/0x60
[ 1717.560005]  [<ffffffff8106fe9e>] do_prlimit+0xfe/0x1f0
[ 1717.560005]  [<ffffffff8106ffd6>] sys_setrlimit+0x46/0x60
[ 1717.560005]  [<ffffffff816be292>] system_call_fastpath+0x16/0x1b
[ 1717.564005] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
[ 1717.564005] Pid: 8, comm: migration/1 Not tainted 3.1.0-rc9-hw+ #45
[ 1717.564005] Call Trace:
[ 1717.564005]  <NMI>  [<ffffffff816b3544>] panic+0xba/0x1fb
[ 1717.564005]  [<ffffffff81018d70>] ? native_sched_clock+0x20/0x80
[ 1717.564005]  [<ffffffff81018dd9>] ? sched_clock+0x9/0x10
[ 1717.564005]  [<ffffffff810a4751>] watchdog_overflow_callback+0xb1/0xc0
[ 1717.564005]  [<ffffffff810d0a12>] __perf_event_overflow+0xa2/0x1f0
[ 1717.564005]  [<ffffffff810c9f11>] ? perf_event_update_userpage+0x11/0xc0
[ 1717.564005]  [<ffffffff810d0f64>] perf_event_overflow+0x14/0x20
[ 1717.564005]  [<ffffffff81025a11>] intel_pmu_handle_irq+0x351/0x5f0
[ 1717.564005]  [<ffffffff816b7ff6>] perf_event_nmi_handler+0x36/0xb0
[ 1717.564005]  [<ffffffff816ba21f>] notifier_call_chain+0x3f/0x80
[ 1717.564005]  [<ffffffff816ba285>] atomic_notifier_call_chain+0x15/0x20
[ 1717.564005]  [<ffffffff816ba2be>] notify_die+0x2e/0x30
[ 1717.564005]  [<ffffffff816b76e2>] do_nmi+0xa2/0x250
[ 1717.564005]  [<ffffffff816b7080>] nmi+0x20/0x30
[ 1717.564005]  [<ffffffff816b6640>] ? _raw_spin_lock+0x10/0x20
[ 1717.564005]  <<EOE>>  [<ffffffff81048cfd>] double_rq_lock+0x4d/0x60
[ 1717.564005]  [<ffffffff8104fee8>] __migrate_task+0x78/0x120
[ 1717.564005]  [<ffffffff8104ff90>] ? __migrate_task+0x120/0x120
[ 1717.564005]  [<ffffffff8104ffae>] migration_cpu_stop+0x1e/0x30
[ 1717.564005]  [<ffffffff810a370c>] cpu_stopper_thread+0xcc/0x190
[ 1717.564005]  [<ffffffff8105049d>] ? default_wake_function+0xd/0x10
[ 1717.564005]  [<ffffffff81043e0a>] ? __wake_up_common+0x5a/0x90
[ 1717.564005]  [<ffffffff810a3640>] ? cgroup_release_agent+0x1d0/0x1d0
[ 1717.564005]  [<ffffffff810a3640>] ? cgroup_release_agent+0x1d0/0x1d0
[ 1717.564005]  [<ffffffff8107adb6>] kthread+0x96/0xb0
[ 1717.564005]  [<ffffffff816c0374>] kernel_thread_helper+0x4/0x10
[ 1717.564005]  [<ffffffff8107ad20>] ? kthread_worker_fn+0x190/0x190
[ 1717.564005]  [<ffffffff816c0370>] ? gs_change+0x13/0x13
[ 1717.560007] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
[ 1717.560007] Pid: 15190, comm: httpd Not tainted 3.1.0-rc9-hw+ #45
[ 1717.560007] Call Trace:
[ 1717.560007]  <NMI>  [<ffffffff816b3544>] panic+0xba/0x1fb
[ 1717.560007]  [<ffffffff81018d70>] ? native_sched_clock+0x20/0x80
[ 1717.560007]  [<ffffffff81018dd9>] ? sched_clock+0x9/0x10
[ 1717.560007]  [<ffffffff810a4751>] watchdog_overflow_callback+0xb1/0xc0
[ 1717.560007]  [<ffffffff810d0a12>] __perf_event_overflow+0xa2/0x1f0
[ 1717.560007]  [<ffffffff810c9f11>] ? perf_event_update_userpage+0x11/0xc0
[ 1717.560007]  [<ffffffff810d0f64>] perf_event_overflow+0x14/0x20
[ 1717.560007]  [<ffffffff81025a11>] intel_pmu_handle_irq+0x351/0x5f0
[ 1717.560007]  [<ffffffff816b7ff6>] perf_event_nmi_handler+0x36/0xb0
[ 1717.560007]  [<ffffffff816ba21f>] notifier_call_chain+0x3f/0x80
[ 1717.560007]  [<ffffffff816ba285>] atomic_notifier_call_chain+0x15/0x20
[ 1717.560007]  [<ffffffff816ba2be>] notify_die+0x2e/0x30
[ 1717.560007]  [<ffffffff816b76e2>] do_nmi+0xa2/0x250
[ 1717.560007]  [<ffffffff816b7080>] nmi+0x20/0x30
[ 1717.560007]  [<ffffffff816b6644>] ? _raw_spin_lock+0x14/0x20
[ 1717.560007]  <<EOE>>  [<ffffffff81048064>] update_curr+0x174/0x1a0
[ 1717.560007]  [<ffffffff8104c75c>] enqueue_task_fair+0x5c/0x520
[ 1717.560007]  [<ffffffff81048ea1>] enqueue_task+0x61/0x70
[ 1717.560007]  [<ffffffff81048ed9>] activate_task+0x29/0x40
[ 1717.560007]  [<ffffffff81050589>] wake_up_new_task+0xb9/0x160
[ 1717.560007]  [<ffffffff81059056>] do_fork+0x146/0x2f0
[ 1717.560007]  [<ffffffff81114d80>] ? fd_install+0x30/0x60
[ 1717.560007]  [<ffffffff8101a7e3>] sys_clone+0x23/0x30
[ 1717.560007]  [<ffffffff816be533>] stub_clone+0x13/0x20
[ 1717.560007]  [<ffffffff816be292>] ? system_call_fastpath+0x16/0x1b

Config: http://0x.ca/sim/ref/3.1-rc9/config

Simon-


* Re: Linux 3.1-rc9
  2011-10-07  7:08 ` Simon Kirby
@ 2011-10-07 17:48   ` Simon Kirby
  2011-10-07 18:01     ` Peter Zijlstra
  0 siblings, 1 reply; 98+ messages in thread
From: Simon Kirby @ 2011-10-07 17:48 UTC (permalink / raw)
  To: Linus Torvalds, Peter Zijlstra; +Cc: Linux Kernel Mailing List

On Fri, Oct 07, 2011 at 12:08:42AM -0700, Simon Kirby wrote:

> On Tue, Oct 04, 2011 at 06:40:14PM -0700, Linus Torvalds wrote:
> 
> > Peter Zijlstra (1):
> >       posix-cpu-timers: Cure SMP wobbles
> 
> Hello!
> 
> I upgraded a few boxes from 3.1-rc6+fixes to 3.1-rc9 (actually 538d2882),
> and now they're hard locking every 15 minutes. Below is a serial console
> capture of the lockup. I suspect this is from d670ec13. I'll confirm that
> they stop crashing with that commit reverted...

Yes, they stopped locking up with d670ec13 reverted.

Simon-

> [ 1717.560007] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
> [ 1717.560007] Pid: 18034, comm: php Not tainted 3.1.0-rc9-hw+ #45
> [ 1717.560007] Call Trace:
> [ 1717.560007]  <NMI>  [<ffffffff816b3544>] panic+0xba/0x1fb
> [ 1717.560007]  [<ffffffff81018d70>] ? native_sched_clock+0x20/0x80
> [ 1717.560007]  [<ffffffff81018dd9>] ? sched_clock+0x9/0x10
> [ 1717.560007]  [<ffffffff810a4751>] watchdog_overflow_callback+0xb1/0xc0
> [ 1717.560007]  [<ffffffff810d0a12>] __perf_event_overflow+0xa2/0x1f0
> [ 1717.560007]  [<ffffffff810c9f11>] ? perf_event_update_userpage+0x11/0xc0
> [ 1717.560007]  [<ffffffff810d0f64>] perf_event_overflow+0x14/0x20
> [ 1717.560007]  [<ffffffff81025a11>] intel_pmu_handle_irq+0x351/0x5f0
> [ 1717.560007]  [<ffffffff816b7ff6>] perf_event_nmi_handler+0x36/0xb0
> [ 1717.560007]  [<ffffffff816ba21f>] notifier_call_chain+0x3f/0x80
> [ 1717.560007]  [<ffffffff816ba285>] atomic_notifier_call_chain+0x15/0x20
> [ 1717.560007]  [<ffffffff816ba2be>] notify_die+0x2e/0x30
> [ 1717.560007]  [<ffffffff816b76e2>] do_nmi+0xa2/0x250
> [ 1717.560007]  [<ffffffff816b7080>] nmi+0x20/0x30
> [ 1717.560007]  [<ffffffff8137e20d>] ? __write_lock_failed+0xd/0x20
> [ 1717.560007]  <<EOE>>  [<ffffffff816b6819>] _raw_write_lock_irq+0x19/0x20
> [ 1717.560007]  [<ffffffff810587c3>] copy_process+0xb23/0x1270
> [ 1717.560007]  [<ffffffff81058fc2>] do_fork+0xb2/0x2f0
> [ 1717.560007]  [<ffffffff8101a7e3>] sys_clone+0x23/0x30
> [ 1717.560007]  [<ffffffff816be533>] stub_clone+0x13/0x20
> [ 1717.560007]  [<ffffffff816be292>] ? system_call_fastpath+0x16/0x1b
> [ 1717.560005] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
> [ 1717.560005] Pid: 18038, comm: httpd Not tainted 3.1.0-rc9-hw+ #45
> [ 1717.560005] Call Trace:
> [ 1717.560005]  <NMI>  [<ffffffff816b3544>] panic+0xba/0x1fb
> [ 1717.560005]  [<ffffffff81018d70>] ? native_sched_clock+0x20/0x80
> [ 1717.560005]  [<ffffffff81018dd9>] ? sched_clock+0x9/0x10
> [ 1717.560005]  [<ffffffff810a4751>] watchdog_overflow_callback+0xb1/0xc0
> [ 1717.560005]  [<ffffffff810d0a12>] __perf_event_overflow+0xa2/0x1f0
> [ 1717.560005]  [<ffffffff810c9f11>] ? perf_event_update_userpage+0x11/0xc0
> [ 1717.560005]  [<ffffffff810d0f64>] perf_event_overflow+0x14/0x20
> [ 1717.560005]  [<ffffffff81025a11>] intel_pmu_handle_irq+0x351/0x5f0
> [ 1717.560005]  [<ffffffff816b7ff6>] perf_event_nmi_handler+0x36/0xb0
> [ 1717.560005]  [<ffffffff816ba21f>] notifier_call_chain+0x3f/0x80
> [ 1717.560005]  [<ffffffff816ba285>] atomic_notifier_call_chain+0x15/0x20
> [ 1717.560005]  [<ffffffff816ba2be>] notify_die+0x2e/0x30
> [ 1717.560005]  [<ffffffff816b76e2>] do_nmi+0xa2/0x250
> [ 1717.560005]  [<ffffffff816b7080>] nmi+0x20/0x30
> [ 1717.560005]  [<ffffffff816b6644>] ? _raw_spin_lock+0x14/0x20
> [ 1717.560005]  <<EOE>>  [<ffffffff8104b4e5>] task_rq_lock+0x55/0xa0
> [ 1717.560005]  [<ffffffff8104b8d4>] task_sched_runtime+0x24/0x90
> [ 1717.560005]  [<ffffffff8107c924>] thread_group_cputime+0x74/0xb0
> [ 1717.560005]  [<ffffffff8107d126>] thread_group_cputimer+0xa6/0xf0
> [ 1717.560005]  [<ffffffff8107d198>] cpu_timer_sample_group+0x28/0x90
> [ 1717.560005]  [<ffffffff8107d3c3>] set_process_cpu_timer+0x33/0x110
> [ 1717.560005]  [<ffffffff8107d4da>] update_rlimit_cpu+0x3a/0x60
> [ 1717.560005]  [<ffffffff8106fe9e>] do_prlimit+0xfe/0x1f0
> [ 1717.560005]  [<ffffffff8106ffd6>] sys_setrlimit+0x46/0x60
> [ 1717.560005]  [<ffffffff816be292>] system_call_fastpath+0x16/0x1b
> [ 1717.564005] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
> [ 1717.564005] Pid: 8, comm: migration/1 Not tainted 3.1.0-rc9-hw+ #45
> [ 1717.564005] Call Trace:
> [ 1717.564005]  <NMI>  [<ffffffff816b3544>] panic+0xba/0x1fb
> [ 1717.564005]  [<ffffffff81018d70>] ? native_sched_clock+0x20/0x80
> [ 1717.564005]  [<ffffffff81018dd9>] ? sched_clock+0x9/0x10
> [ 1717.564005]  [<ffffffff810a4751>] watchdog_overflow_callback+0xb1/0xc0
> [ 1717.564005]  [<ffffffff810d0a12>] __perf_event_overflow+0xa2/0x1f0
> [ 1717.564005]  [<ffffffff810c9f11>] ? perf_event_update_userpage+0x11/0xc0
> [ 1717.564005]  [<ffffffff810d0f64>] perf_event_overflow+0x14/0x20
> [ 1717.564005]  [<ffffffff81025a11>] intel_pmu_handle_irq+0x351/0x5f0
> [ 1717.564005]  [<ffffffff816b7ff6>] perf_event_nmi_handler+0x36/0xb0
> [ 1717.564005]  [<ffffffff816ba21f>] notifier_call_chain+0x3f/0x80
> [ 1717.564005]  [<ffffffff816ba285>] atomic_notifier_call_chain+0x15/0x20
> [ 1717.564005]  [<ffffffff816ba2be>] notify_die+0x2e/0x30
> [ 1717.564005]  [<ffffffff816b76e2>] do_nmi+0xa2/0x250
> [ 1717.564005]  [<ffffffff816b7080>] nmi+0x20/0x30
> [ 1717.564005]  [<ffffffff816b6640>] ? _raw_spin_lock+0x10/0x20
> [ 1717.564005]  <<EOE>>  [<ffffffff81048cfd>] double_rq_lock+0x4d/0x60
> [ 1717.564005]  [<ffffffff8104fee8>] __migrate_task+0x78/0x120
> [ 1717.564005]  [<ffffffff8104ff90>] ? __migrate_task+0x120/0x120
> [ 1717.564005]  [<ffffffff8104ffae>] migration_cpu_stop+0x1e/0x30
> [ 1717.564005]  [<ffffffff810a370c>] cpu_stopper_thread+0xcc/0x190
> [ 1717.564005]  [<ffffffff8105049d>] ? default_wake_function+0xd/0x10
> [ 1717.564005]  [<ffffffff81043e0a>] ? __wake_up_common+0x5a/0x90
> [ 1717.564005]  [<ffffffff810a3640>] ? cgroup_release_agent+0x1d0/0x1d0
> [ 1717.564005]  [<ffffffff810a3640>] ? cgroup_release_agent+0x1d0/0x1d0
> [ 1717.564005]  [<ffffffff8107adb6>] kthread+0x96/0xb0
> [ 1717.564005]  [<ffffffff816c0374>] kernel_thread_helper+0x4/0x10
> [ 1717.564005]  [<ffffffff8107ad20>] ? kthread_worker_fn+0x190/0x190
> [ 1717.564005]  [<ffffffff816c0370>] ? gs_change+0x13/0x13
> [ 1717.560007] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
> [ 1717.560007] Pid: 15190, comm: httpd Not tainted 3.1.0-rc9-hw+ #45
> [ 1717.560007] Call Trace:
> [ 1717.560007]  <NMI>  [<ffffffff816b3544>] panic+0xba/0x1fb
> [ 1717.560007]  [<ffffffff81018d70>] ? native_sched_clock+0x20/0x80
> [ 1717.560007]  [<ffffffff81018dd9>] ? sched_clock+0x9/0x10
> [ 1717.560007]  [<ffffffff810a4751>] watchdog_overflow_callback+0xb1/0xc0
> [ 1717.560007]  [<ffffffff810d0a12>] __perf_event_overflow+0xa2/0x1f0
> [ 1717.560007]  [<ffffffff810c9f11>] ? perf_event_update_userpage+0x11/0xc0
> [ 1717.560007]  [<ffffffff810d0f64>] perf_event_overflow+0x14/0x20
> [ 1717.560007]  [<ffffffff81025a11>] intel_pmu_handle_irq+0x351/0x5f0
> [ 1717.560007]  [<ffffffff816b7ff6>] perf_event_nmi_handler+0x36/0xb0
> [ 1717.560007]  [<ffffffff816ba21f>] notifier_call_chain+0x3f/0x80
> [ 1717.560007]  [<ffffffff816ba285>] atomic_notifier_call_chain+0x15/0x20
> [ 1717.560007]  [<ffffffff816ba2be>] notify_die+0x2e/0x30
> [ 1717.560007]  [<ffffffff816b76e2>] do_nmi+0xa2/0x250
> [ 1717.560007]  [<ffffffff816b7080>] nmi+0x20/0x30
> [ 1717.560007]  [<ffffffff816b6644>] ? _raw_spin_lock+0x14/0x20
> [ 1717.560007]  <<EOE>>  [<ffffffff81048064>] update_curr+0x174/0x1a0
> [ 1717.560007]  [<ffffffff8104c75c>] enqueue_task_fair+0x5c/0x520
> [ 1717.560007]  [<ffffffff81048ea1>] enqueue_task+0x61/0x70
> [ 1717.560007]  [<ffffffff81048ed9>] activate_task+0x29/0x40
> [ 1717.560007]  [<ffffffff81050589>] wake_up_new_task+0xb9/0x160
> [ 1717.560007]  [<ffffffff81059056>] do_fork+0x146/0x2f0
> [ 1717.560007]  [<ffffffff81114d80>] ? fd_install+0x30/0x60
> [ 1717.560007]  [<ffffffff8101a7e3>] sys_clone+0x23/0x30
> [ 1717.560007]  [<ffffffff816be533>] stub_clone+0x13/0x20
> [ 1717.560007]  [<ffffffff816be292>] ? system_call_fastpath+0x16/0x1b
> 
> Config: http://0x.ca/sim/ref/3.1-rc9/config


* Re: Linux 3.1-rc9
  2011-10-07 17:48   ` Simon Kirby
@ 2011-10-07 18:01     ` Peter Zijlstra
  2011-10-08  0:33       ` Simon Kirby
  2011-10-08  0:50       ` Simon Kirby
  0 siblings, 2 replies; 98+ messages in thread
From: Peter Zijlstra @ 2011-10-07 18:01 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Linus Torvalds, Linux Kernel Mailing List, Dave Jones, Thomas Gleixner

On Fri, 2011-10-07 at 10:48 -0700, Simon Kirby wrote:

> Yes, they stopped locking up with d670ec13 reverted.

> > [ 1717.560005]  [<ffffffff8104b8d4>] task_sched_runtime+0x24/0x90
> > [ 1717.560005]  [<ffffffff8107c924>] thread_group_cputime+0x74/0xb0
> > [ 1717.560005]  [<ffffffff8107d126>] thread_group_cputimer+0xa6/0xf0
> > [ 1717.560005]  [<ffffffff8107d198>] cpu_timer_sample_group+0x28/0x90
> > [ 1717.560005]  [<ffffffff8107d3c3>] set_process_cpu_timer+0x33/0x110
> > [ 1717.560005]  [<ffffffff8107d4da>] update_rlimit_cpu+0x3a/0x60
> > [ 1717.560005]  [<ffffffff8106fe9e>] do_prlimit+0xfe/0x1f0
> > [ 1717.560005]  [<ffffffff8106ffd6>] sys_setrlimit+0x46/0x60
> > [ 1717.560005]  [<ffffffff816be292>] system_call_fastpath+0x16/0x1b


OK so that cputimer stuff is horrid and the worst part is that I cannot
seem to trigger this. You guys must have some weird userspace stuff that
I simply don't have.

I tried running some LTP tests, and it was suggested I find some glibc
tests as well, but I haven't got that far yet.

Now the problem isn't new, but the referenced patch does make it _MUCH_
more likely.

Both Thomas and I have tried to come up with solutions, but the only
thing that stands a chance of working, other than using atomic64_t, is
giving task_cputime::cputime.sum_exec_runtime its own lock.

Clearly this is all very ugly and I'm really hesitant to even post
this, but here goes...

---
 include/linux/sched.h     |    3 +++
 kernel/posix-cpu-timers.c |    6 +++++-
 kernel/sched_stats.h      |    4 ++--
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f3c5273..fbbe5eb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -504,6 +504,7 @@ struct task_cputime {
  * @running:		non-zero when there are timers running and
  * 			@cputime receives updates.
  * @lock:		lock for fields in this struct.
+ * @runtime_lock:	lock for cputime.sum_exec_runtime
  *
  * This structure contains the version of task_cputime, above, that is
  * used for thread group CPU timer calculations.
@@ -512,6 +513,7 @@ struct thread_group_cputimer {
 	struct task_cputime cputime;
 	int running;
 	raw_spinlock_t lock;
+	raw_spinlock_t runtime_lock;
 };
 
 #include <linux/rwsem.h>
@@ -2571,6 +2573,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
 static inline void thread_group_cputime_init(struct signal_struct *sig)
 {
 	raw_spin_lock_init(&sig->cputimer.lock);
+	raw_spin_lock_init(&sig->cputimer.runtime_lock);
 }
 
 /*
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index d20586b..bf760b4 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -284,9 +284,13 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 		 * it.
 		 */
 		thread_group_cputime(tsk, &sum);
+		raw_spin_lock(&cputimer->runtime_lock);
 		update_gt_cputime(&cputimer->cputime, &sum);
-	}
+	} else
+		raw_spin_lock(&cputimer->runtime_lock);
+
 	*times = cputimer->cputime;
+	raw_spin_unlock(&cputimer->runtime_lock);
 	raw_spin_unlock_irqrestore(&cputimer->lock, flags);
 }
 
diff --git a/kernel/sched_stats.h b/kernel/sched_stats.h
index 87f9e36..f9751c1 100644
--- a/kernel/sched_stats.h
+++ b/kernel/sched_stats.h
@@ -330,7 +330,7 @@ static inline void account_group_exec_runtime(struct task_struct *tsk,
 	if (!cputimer->running)
 		return;
 
-	raw_spin_lock(&cputimer->lock);
+	raw_spin_lock(&cputimer->runtime_lock);
 	cputimer->cputime.sum_exec_runtime += ns;
-	raw_spin_unlock(&cputimer->lock);
+	raw_spin_unlock(&cputimer->runtime_lock);
 }



* Re: Linux 3.1-rc9
  2011-10-07 18:01     ` Peter Zijlstra
@ 2011-10-08  0:33       ` Simon Kirby
  2011-10-08  0:50       ` Simon Kirby
  1 sibling, 0 replies; 98+ messages in thread
From: Simon Kirby @ 2011-10-08  0:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Linux Kernel Mailing List, Dave Jones, Thomas Gleixner

On Fri, Oct 07, 2011 at 08:01:55PM +0200, Peter Zijlstra wrote:

> On Fri, 2011-10-07 at 10:48 -0700, Simon Kirby wrote:
> 
> > Yes, they stopped locking up with d670ec13 reverted.
> 
> > > [ 1717.560005]  [<ffffffff8104b8d4>] task_sched_runtime+0x24/0x90
> > > [ 1717.560005]  [<ffffffff8107c924>] thread_group_cputime+0x74/0xb0
> > > [ 1717.560005]  [<ffffffff8107d126>] thread_group_cputimer+0xa6/0xf0
> > > [ 1717.560005]  [<ffffffff8107d198>] cpu_timer_sample_group+0x28/0x90
> > > [ 1717.560005]  [<ffffffff8107d3c3>] set_process_cpu_timer+0x33/0x110
> > > [ 1717.560005]  [<ffffffff8107d4da>] update_rlimit_cpu+0x3a/0x60
> > > [ 1717.560005]  [<ffffffff8106fe9e>] do_prlimit+0xfe/0x1f0
> > > [ 1717.560005]  [<ffffffff8106ffd6>] sys_setrlimit+0x46/0x60
> > > [ 1717.560005]  [<ffffffff816be292>] system_call_fastpath+0x16/0x1b
> 
> OK so that cputimer stuff is horrid and the worst part is that I cannot
> seem to trigger this. You guys must have some weird userspace stuff that
> I simply don't have.

I haven't tried your patch yet, but it might help to mention that on
this particular cluster, we are using CONFIG_TASK_IO_ACCOUNTING under
CONFIG_TASKSTATS, and we have process accounting enabled (w/"accton").
Perhaps that enables some other path that makes it difficult to hit
otherwise.
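
(Roughly how that gets switched on here, in case it matters; the pacct
path is just an example and not necessarily what these boxes use, and
the file has to exist already:)

    grep -E 'TASKSTATS|TASK_IO_ACCOUNTING' .config   # in the kernel build tree
    accton /var/log/account/pacct                    # turn on BSD process accounting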

You can't have clouds without weather reporting, of course. :)

Other than that, it's just a typical shared web environment.

Simon-


* Re: Linux 3.1-rc9
  2011-10-07 18:01     ` Peter Zijlstra
  2011-10-08  0:33       ` Simon Kirby
@ 2011-10-08  0:50       ` Simon Kirby
  2011-10-08  7:55         ` Peter Zijlstra
  1 sibling, 1 reply; 98+ messages in thread
From: Simon Kirby @ 2011-10-08  0:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Linux Kernel Mailing List, Dave Jones, Thomas Gleixner

On Fri, Oct 07, 2011 at 08:01:55PM +0200, Peter Zijlstra wrote:

> @@ -2571,6 +2573,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
>  static inline void thread_group_cputime_init(struct signal_struct *sig)
>  {
>  	raw_spin_lock_init(&sig->cputimer.lock);
> +	raw_spin_lock_init(&sig->cputimer.runtime_lock);

My 3.1-rc9 tree has just spin_lock_init() here, not raw_*.

Which tree is your patch against? -next or something?

It applies with some cooking like this, but will it be right?

> sed s/raw_// ../sched-patch-noraw.diff | patch -p1 --dry
patching file include/linux/sched.h
Hunk #1 succeeded at 503 (offset -1 lines).
Hunk #2 succeeded at 512 (offset -1 lines).
Hunk #3 succeeded at 2568 (offset -5 lines).
patching file kernel/posix-cpu-timers.c
patching file kernel/sched_stats.h

Simon-


* Re: Linux 3.1-rc9
  2011-10-08  0:50       ` Simon Kirby
@ 2011-10-08  7:55         ` Peter Zijlstra
  2011-10-12 21:35           ` Simon Kirby
  0 siblings, 1 reply; 98+ messages in thread
From: Peter Zijlstra @ 2011-10-08  7:55 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Linus Torvalds, Linux Kernel Mailing List, Dave Jones, Thomas Gleixner

On Fri, 2011-10-07 at 17:50 -0700, Simon Kirby wrote:
> On Fri, Oct 07, 2011 at 08:01:55PM +0200, Peter Zijlstra wrote:
> 
> > @@ -2571,6 +2573,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
> >  static inline void thread_group_cputime_init(struct signal_struct *sig)
> >  {
> >       raw_spin_lock_init(&sig->cputimer.lock);
> > +     raw_spin_lock_init(&sig->cputimer.runtime_lock);
> 
> My 3.1-rc9 tree has just spin_lock_init() here, not raw_*.
> 
> Which tree is your patch against? -next or something?

or something yeah.. tip/master I think.

> It applies with some cooking like this, but will it be right?
> 
> > sed s/raw_// ../sched-patch-noraw.diff | patch -p1 --dry
> patching file include/linux/sched.h
> Hunk #1 succeeded at 503 (offset -1 lines).
> Hunk #2 succeeded at 512 (offset -1 lines).
> Hunk #3 succeeded at 2568 (offset -5 lines).
> patching file kernel/posix-cpu-timers.c
> patching file kernel/sched_stats.h 

yes that would be fine.


* Re: Linux 3.1-rc9
  2011-10-05  1:40 Linux 3.1-rc9 Linus Torvalds
  2011-10-07  7:08 ` Simon Kirby
@ 2011-10-09 20:51 ` Arkadiusz Miśkiewicz
  2011-10-10  2:29   ` [tpmdd-devel] " Stefan Berger
  1 sibling, 1 reply; 98+ messages in thread
From: Arkadiusz Miśkiewicz @ 2011-10-09 20:51 UTC (permalink / raw)
  To: linux-kernel, tpmdd-devel, Debora Velarde, Rajiv Andrade,
	Marcel Selhorst

On Wednesday 05 of October 2011, Linus Torvalds wrote:
> Another week, another -rc.

The suspend-to-RAM regression is annoying (still visible in rc9;
https://lkml.org/lkml/2011/9/24/76), but unfortunately the maintainers are silent.

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/


* Re: [tpmdd-devel] Linux 3.1-rc9
  2011-10-09 20:51 ` Arkadiusz Miśkiewicz
@ 2011-10-10  2:29   ` Stefan Berger
  2011-10-10 16:23     ` Rajiv Andrade
  0 siblings, 1 reply; 98+ messages in thread
From: Stefan Berger @ 2011-10-10  2:29 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz
  Cc: linux-kernel, tpmdd-devel, Debora Velarde, Rajiv Andrade,
	Marcel Selhorst

On 10/09/2011 04:51 PM, Arkadiusz Miśkiewicz wrote:
> On Wednesday 05 of October 2011, Linus Torvalds wrote:
>> Another week, another -rc.
> suspend to ram regression is annoying (still visible on rc9;
> https://lkml.org/lkml/2011/9/24/76) but unfortunately maintainers are silent.
>
I tried -rc9 on my Lenovo W500 with that same TPM. I cannot reproduce 
the 'scheduling while atomic' problem you had reported earlier. I also 
could suspend / resume fine as long as I did the following:

- suspended with the tpm_tis driver as module in the kernel
- once a suspend was done without the tpm_tis driver, all subsequent 
suspends were also done without the tpm_tis driver

Once I had done a suspend/resume with the tpm_tis driver *not* in the 
kernel and then again a suspend with the tpm_tis driver in the kernel, 
it did not resume anymore. I believe previously (previous version of 
kernel and/or Fedora) it refused to even suspend. The reason why this 
doesn't work properly is that the driver has to send a command to the 
TPM upon suspend and the BIOS then sends the corresponding wakeup command.

Did you maybe previously suspend/resume without a tpm_tis driver and 
then try to suspend with it ?
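
(In case it helps anyone reproduce, a minimal sketch of the two
scenarios above; module name and sysfs path as on a stock modular
kernel:)

    modprobe tpm_tis               # scenario 1: suspend with the driver present
    echo mem > /sys/power/state
    modprobe -r tpm_tis            # scenario 2: suspend without it
    echo mem > /sys/power/state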

Also, my Lenovo W500 shows particularly odd behavior when I switch from 
Windows to Linux. The first suspend with a Linux booted after Windows 
(with or without tpm_tis driver) does *not* resume (reboot required). A 
subsequently rebooted Linux makes the suspend/resume work fine.

    Stefan






* Re: Linux 3.1-rc9
  2011-10-10  2:29   ` [tpmdd-devel] " Stefan Berger
@ 2011-10-10 16:23     ` Rajiv Andrade
  2011-10-10 17:05       ` Arkadiusz Miśkiewicz
  0 siblings, 1 reply; 98+ messages in thread
From: Rajiv Andrade @ 2011-10-10 16:23 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Arkadiusz Miśkiewicz, linux-kernel, tpmdd-devel,
	Debora Velarde, Marcel Selhorst

On 09/10/11 23:29, Stefan Berger wrote:
> On 10/09/2011 04:51 PM, Arkadiusz Miśkiewicz wrote:
>> On Wednesday 05 of October 2011, Linus Torvalds wrote:
>>> Another week, another -rc.
>> suspend to ram regression is annoying (still visible on rc9;
>> https://lkml.org/lkml/2011/9/24/76) but unfortunately maintainers are 
>> silent.
>>
> I tried -rc9 on my Lenovo W500 with that same TPM. I cannot reproduce 
> the 'scheduling while atomic' problem you had reported earlier. I also 
> could suspend / resume fine as long as I did the following:
>
> - suspended with the tpm_tis driver as module in the kernel
> - once a suspend was done without the tpm_tis driver the subsequent 
> suspends were all done without the tpm_tis driver
>
> Once I had done a suspend/resume with the tpm_tis driver *not* in the 
> kernel and then again a suspend with the tpm_tis driver in the kernel, 
> it did not resume anymore. I believe previously (previous version of 
> kernel and/or Fedora) it refused to even suspend. The reason why this 
> doesn't work properly is that the driver has to send a command to the 
> TPM upon suspend and the BIOS then sends the corresponding wakeup 
> command.
>
> Did you maybe previously suspend/resume without a tpm_tis driver and 
> then try to suspend with it ?
>
> Also, my Lenovo W500 shows particularly odd behavior when I switch 
> from Windows to Linux. The first suspend with a Linux booted after 
> Windows (with or without tpm_tis driver) does *not* resume (reboot 
> required). A subsequently rebooted Linux makes the suspend/resume work 
> fine.
>
>    Stefan
>
Arkadiusz,

Do you still see the issue with this patch [1][2] applied?

[1] - http://marc.info/?l=linux-kernel&m=131824905826280&w=2
[2] - github.com/srajiv/tpm.git for-james

Thanks,
Rajiv



* Re: Linux 3.1-rc9
  2011-10-10 16:23     ` Rajiv Andrade
@ 2011-10-10 17:05       ` Arkadiusz Miśkiewicz
  2011-10-10 17:22         ` Stefan Berger
  0 siblings, 1 reply; 98+ messages in thread
From: Arkadiusz Miśkiewicz @ 2011-10-10 17:05 UTC (permalink / raw)
  To: Rajiv Andrade
  Cc: Stefan Berger, linux-kernel, tpmdd-devel, Debora Velarde,
	Marcel Selhorst

On Monday 10 of October 2011, Rajiv Andrade wrote:
> On 09/10/11 23:29, Stefan Berger wrote:
> > On 10/09/2011 04:51 PM, Arkadiusz Miśkiewicz wrote:
> >> On Wednesday 05 of October 2011, Linus Torvalds wrote:
> >>> Another week, another -rc.
> >> 
> >> suspend to ram regression is annoying (still visible on rc9;
> >> https://lkml.org/lkml/2011/9/24/76) but unfortunately maintainers are
> >> silent.
> > 
> > I tried -rc9 on my Lenovo W500 with that same TPM. I cannot reproduce
> > the 'scheduling while atomic' problem you had reported earlier. I also
> > could suspend / resume fine as long as I did the following:
> > 
> > - suspended with the tpm_tis driver as module in the kernel
> > - once a suspend was done without the tpm_tis driver the subsequent
> > suspends were all done without the tpm_tis driver
> > 
> > Once I had done a suspend/resume with the tpm_tis driver *not* in the
> > kernel and then again a suspend with the tpm_tis driver in the kernel,
> > it did not resume anymore. I believe previously (previous version of
> > kernel and/or Fedora) it refused to even suspend. The reason why this
> > doesn't work properly is that the driver has to send a command to the
> > TPM upon suspend and the BIOS then sends the corresponding wakeup
> > command.
> > 
> > Did you maybe previously suspend/resume without a tpm_tis driver and
> > then try to suspend with it ?
> > 
> > Also, my Lenovo W500 shows particularly odd behavior when I switch
> > from Windows to Linux. The first suspend with a Linux booted after
> > Windows (with or without tpm_tis driver) does *not* resume (reboot
> > required). A subsequently rebooted Linux makes the suspend/resume work
> > fine.
> > 
> >    Stefan
> 
> Arkadiusz,
> 
> Do you still see the issue with this patch [1][2] applied?

The issue doesn't happen with this patch, but the error condition "Could not 
read PCR 0. TPM is not working correctly." is triggered immediately at boot, 
even before suspend is used.

$ dmesg|grep -iE "(tpm|suspend)"
[   12.640039] tpm_tis 00:0a: 1.2 TPM (device-id 0x1020, rev-id 6)
[   12.640048] tpm_tis 00:0a: Intel iTPM workaround enabled
[   12.768057] tpm_tis 00:0a: Could not read PCR 0. TPM is not working correctly.
[   12.768066] tpm_tis 00:0a: Was machine previously suspended without TPM driver present?
[   88.512117] Suspending console(s) (use no_console_suspend to debug)


> 
> [1] - http://marc.info/?l=linux-kernel&m=131824905826280&w=2
> [2] - github.com/srajiv/tpm.git for-james
> 
> Thanks,
> Rajiv


-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/


* Re: Linux 3.1-rc9
  2011-10-10 17:05       ` Arkadiusz Miśkiewicz
@ 2011-10-10 17:22         ` Stefan Berger
  2011-10-10 17:57           ` Arkadiusz Miśkiewicz
  0 siblings, 1 reply; 98+ messages in thread
From: Stefan Berger @ 2011-10-10 17:22 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz
  Cc: Rajiv Andrade, linux-kernel, tpmdd-devel, Debora Velarde,
	Marcel Selhorst

On 10/10/2011 01:05 PM, Arkadiusz Miśkiewicz wrote:
> On Monday 10 of October 2011, Rajiv Andrade wrote:
>> On 09/10/11 23:29, Stefan Berger wrote:
>>> On 10/09/2011 04:51 PM, Arkadiusz Miśkiewicz wrote:
>>>> On Wednesday 05 of October 2011, Linus Torvalds wrote:
>>>>> Another week, another -rc.
>>>> suspend to ram regression is annoying (still visible on rc9;
>>>> https://lkml.org/lkml/2011/9/24/76) but unfortunately maintainers are
>>>> silent.
>>> I tried -rc9 on my Lenovo W500 with that same TPM. I cannot reproduce
>>> the 'scheduling while atomic' problem you had reported earlier. I also
>>> could suspend / resume fine as long as I did the following:
>>>
>>> - suspended with the tpm_tis driver as module in the kernel
>>> - once a suspend was done without the tpm_tis driver the subsequent
>>> suspends were all done without the tpm_tis driver
>>>
>>> Once I had done a suspend/resume with the tpm_tis driver *not* in the
>>> kernel and then again a suspend with the tpm_tis driver in the kernel,
>>> it did not resume anymore. I believe previously (previous version of
>>> kernel and/or Fedora) it refused to even suspend. The reason why this
>>> doesn't work properly is that the driver has to send a command to the
>>> TPM upon suspend and the BIOS then sends the corresponding wakeup
>>> command.
>>>
>>> Did you maybe previously suspend/resume without a tpm_tis driver and
>>> then try to suspend with it ?
>>>
>>> Also, my Lenovo W500 shows particularly odd behavior when I switch
>>> from Windows to Linux. The first suspend with a Linux booted after
>>> Windows (with or without tpm_tis driver) does *not* resume (reboot
>>> required). A subsequently rebooted Linux makes the suspend/resume work
>>> fine.
>>>
>>>     Stefan
>> Arkadiusz,
>>
>> Do you still see the issue with this patch [1][2] applied?
> The issue doesn't happen with this patch but error condition with "Could not
> read PCR 0. TPM is not working correctly." is triggered immediately at boot,
> even before suspend is used.
>
> $ dmesg|grep -iE "(tpm|suspend)"
> [   12.640039] tpm_tis 00:0a: 1.2 TPM (device-id 0x1020, rev-id 6)
> [   12.640048] tpm_tis 00:0a: Intel iTPM workaround enabled
> [   12.768057] tpm_tis 00:0a: Could not read PCR 0. TPM is not working
> correctly.
> [   12.768066] tpm_tis 00:0a: Was machine previously suspended without TPM
> driver present?
> [   88.512117] Suspending console(s) (use no_console_suspend to debug)
>
Though I suppose that now your suspend/resume cycles always work?
I guess the BIOS seems not to be initializing the TPM correctly. Any 
chance you can get a hold of a BIOS update for your machine?

    Stefan



* Re: Linux 3.1-rc9
  2011-10-10 17:22         ` Stefan Berger
@ 2011-10-10 17:57           ` Arkadiusz Miśkiewicz
  2011-10-10 21:08             ` Arkadiusz Miśkiewicz
  2011-10-11  7:09             ` [tpmdd-devel] " Peter.Huewe
  0 siblings, 2 replies; 98+ messages in thread
From: Arkadiusz Miśkiewicz @ 2011-10-10 17:57 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Rajiv Andrade, linux-kernel, tpmdd-devel, Debora Velarde,
	Marcel Selhorst

On Monday 10 of October 2011, Stefan Berger wrote:
> On 10/10/2011 01:05 PM, Arkadiusz Miśkiewicz wrote:
> > On Monday 10 of October 2011, Rajiv Andrade wrote:
> >> On 09/10/11 23:29, Stefan Berger wrote:
> >>> On 10/09/2011 04:51 PM, Arkadiusz Miśkiewicz wrote:
> >>>> On Wednesday 05 of October 2011, Linus Torvalds wrote:
> >>>>> Another week, another -rc.
> >>>> 
> >>>> suspend to ram regression is annoying (still visible on rc9;
> >>>> https://lkml.org/lkml/2011/9/24/76) but unfortunately maintainers are
> >>>> silent.
> >>> 
> >>> I tried -rc9 on my Lenovo W500 with that same TPM. I cannot reproduce
> >>> the 'scheduling while atomic' problem you had reported earlier. I also
> >>> could suspend / resume fine as long as I did the following:
> >>> 
> >>> - suspended with the tpm_tis driver as module in the kernel
> >>> - once a suspend was done without the tpm_tis driver the subsequent
> >>> suspends were all done without the tpm_tis driver
> >>> 
> >>> Once I had done a suspend/resume with the tpm_tis driver *not* in the
> >>> kernel and then again a suspend with the tpm_tis driver in the kernel,
> >>> it did not resume anymore. I believe previously (previous version of
> >>> kernel and/or Fedora) it refused to even suspend. The reason why this
> >>> doesn't work properly is that the driver has to send a command to the
> >>> TPM upon suspend and the BIOS then sends the corresponding wakeup
> >>> command.
> >>> 
> >>> Did you maybe previously suspend/resume without a tpm_tis driver and
> >>> then try to suspend with it ?
> >>> 
> >>> Also, my Lenovo W500 shows particularly odd behavior when I switch
> >>> from Windows to Linux. The first suspend with a Linux booted after
> >>> Windows (with or without tpm_tis driver) does *not* resume (reboot
> >>> required). A subsequently rebooted Linux makes the suspend/resume work
> >>> fine.
> >>> 
> >>>     Stefan
> >> 
> >> Arkadiusz,
> >> 
> >> Do you still see the issue with this patch [1][2] applied?
> > 
> > The issue doesn't happen with this patch but error condition with "Could
> > not read PCR 0. TPM is not working correctly." is triggered immediately
> > at boot, even before suspend is used.
> > 
> > $ dmesg|grep -iE "(tpm|suspend)"
> > [   12.640039] tpm_tis 00:0a: 1.2 TPM (device-id 0x1020, rev-id 6)
> > [   12.640048] tpm_tis 00:0a: Intel iTPM workaround enabled
> > [   12.768057] tpm_tis 00:0a: Could not read PCR 0. TPM is not working
> > correctly.
> > [   12.768066] tpm_tis 00:0a: Was machine previously suspended without
> > TPM driver present?
> > [   88.512117] Suspending console(s) (use no_console_suspend to debug)
> 
> Though I suppose that now your suspend/resume cycles always work?

Tried several times and it always worked, so probably yes. Longer testing 
will give a definitive answer.

> I guess the BIOS seems not to be initializing the TPM correctly. Any
> chance you can get a hold of a BIOS update for your machine?

Then I looked into the BIOS options on this ThinkPad T400, and there are 3 
possible TPM settings: Enabled, Invisible, Disabled.

Invisible means "visible but not working", according to the BIOS help. No 
idea why such an option exists, but I had it enabled.

Right now I've set it to "Enabled" and ran a few suspend/resume cycles - no 
problems so far.

I guess there should also be some way to handle the "Invisible" mode 
properly in Linux.

>     Stefan

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/


* Re: Linux 3.1-rc9
  2011-10-10 17:57           ` Arkadiusz Miśkiewicz
@ 2011-10-10 21:08             ` Arkadiusz Miśkiewicz
  2011-10-11  7:09             ` [tpmdd-devel] " Peter.Huewe
  1 sibling, 0 replies; 98+ messages in thread
From: Arkadiusz Miśkiewicz @ 2011-10-10 21:08 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Rajiv Andrade, linux-kernel, tpmdd-devel, Debora Velarde,
	Marcel Selhorst

On Monday 10 of October 2011, Arkadiusz Miśkiewicz wrote:
> On Monday 10 of October 2011, Stefan Berger wrote:
> > On 10/10/2011 01:05 PM, Arkadiusz Miśkiewicz wrote:
> > > On Monday 10 of October 2011, Rajiv Andrade wrote:
> > >> On 09/10/11 23:29, Stefan Berger wrote:
> > >>> On 10/09/2011 04:51 PM, Arkadiusz Miśkiewicz wrote:
> > >>>> On Wednesday 05 of October 2011, Linus Torvalds wrote:
> > >>>>> Another week, another -rc.
> > >>>> 
> > >>>> suspend to ram regression is annoying (still visible on rc9;
> > >>>> https://lkml.org/lkml/2011/9/24/76) but unfortunately maintainers
> > >>>> are silent.
> > >>> 
> > >>> I tried -rc9 on my Lenovo W500 with that same TPM. I cannot reproduce
> > >>> the 'scheduling while atomic' problem you had reported earlier. I
> > >>> also could suspend / resume fine as long as I did the following:
> > >>> 
> > >>> - suspended with the tpm_tis driver as module in the kernel
> > >>> - once a suspend was done without the tpm_tis driver the subsequent
> > >>> suspends were all done without the tpm_tis driver
> > >>> 
> > >>> Once I had done a suspend/resume with the tpm_tis driver *not* in the
> > >>> kernel and then again a suspend with the tpm_tis driver in the
> > >>> kernel, it did not resume anymore. I believe previously (previous
> > >>> version of kernel and/or Fedora) it refused to even suspend. The
> > >>> reason why this doesn't work properly is that the driver has to send
> > >>> a command to the TPM upon suspend and the BIOS then sends the
> > >>> corresponding wakeup command.
> > >>> 
> > >>> Did you maybe previously suspend/resume without a tpm_tis driver and
> > >>> then try to suspend with it ?
> > >>> 
> > >>> Also, my Lenovo W500 shows particularly odd behavior when I switch
> > >>> from Windows to Linux. The first suspend with a Linux booted after
> > >>> Windows (with or without tpm_tis driver) does *not* resume (reboot
> > >>> required). A subsequently rebooted Linux makes the suspend/resume
> > >>> work fine.
> > >>> 
> > >>>     Stefan
> > >> 
> > >> Arkadiusz,
> > >> 
> > >> Do you still see the issue with this patch [1][2] applied?
> > > 
> > > The issue doesn't happen with this patch but error condition with
> > > "Could not read PCR 0. TPM is not working correctly." is triggered
> > > immediately at boot, even before suspend is used.
> > > 
> > > $ dmesg|grep -iE "(tpm|suspend)"
> > > [   12.640039] tpm_tis 00:0a: 1.2 TPM (device-id 0x1020, rev-id 6)
> > > [   12.640048] tpm_tis 00:0a: Intel iTPM workaround enabled
> > > [   12.768057] tpm_tis 00:0a: Could not read PCR 0. TPM is not working
> > > correctly.
> > > [   12.768066] tpm_tis 00:0a: Was machine previously suspended without
> > > TPM driver present?
> > > [   88.512117] Suspending console(s) (use no_console_suspend to debug)
> > 
> > Though I suppose that now your suspend/resume cycles always work?
> 
> Tried several times and it always worked, so probably yes. Longer testing
> will give definitive answer.
> 
> > I guess the BIOS seems not to be initializing the TPM correctly. Any
> > chance you can get a hold of a BIOS update for your machine?
> 
> Then I looked into bios options on this thinkpad t400 and there are 3
> possible TPM settings: Enabled, Invisible, Disabled.
> 
> Invisible is - visible but not working - according to bios help. No idea
> why such option exists but I had it enabled.
> 
> Right now I've set that to "Enabled" and ran few suspend/resume cycles - no
> problems so far.

Unfortunately, with the TPM enabled in the BIOS, and a kernel 
(3.1.0-rc9-00064-g65112dc-dirty) with the patch applied, I get:

[11629.922643] legacy_resume(): pnp_bus_resume+0x0/0x67 returns -19
[11629.922646] PM: Device 00:0a failed to resume: error -19

and there is no "Could not read PCR 0. TPM is not working correctly." 
message, so this check doesn't seem to be good enough.

> 
> I guess there is some way to make "Invisible" mode properly handled in
> Linux, too.
> 
> >     Stefan


-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/


* RE: [tpmdd-devel] Linux 3.1-rc9
  2011-10-10 17:57           ` Arkadiusz Miśkiewicz
  2011-10-10 21:08             ` Arkadiusz Miśkiewicz
@ 2011-10-11  7:09             ` Peter.Huewe
  1 sibling, 0 replies; 98+ messages in thread
From: Peter.Huewe @ 2011-10-11  7:09 UTC (permalink / raw)
  To: a.miskiewicz, stefanb; +Cc: m.selhorst, tpmdd-devel, linux-kernel


-----Original Message-----
From: Arkadiusz Miśkiewicz [mailto:a.miskiewicz@gmail.com]
>> I guess the BIOS seems not to be initializing the TPM correctly. Any
>> chance you can get a hold of a BIOS update for your machine?

> Then I looked into bios options on this thinkpad t400 and there are 3 possible
> TPM settings: Enabled, Invisible, Disabled.

> Invisible is - visible but not working - according to bios help. No idea why
> such option exists but I had it enabled.

> I guess there is some way to make "Invisible" mode properly handled in Linux,
> too.

Invisible here probably means that the BIOS simply does not send a TPM_Startup, which is needed to get the TPM running
(and maybe it even leaves physical presence untouched, too).
If the driver sent a TPM_Startup(STATE) (which usually should not cause any problems, since if you send it twice the second one simply gets 'ignored' with an "invalid postinit" return code),
the TPM would probably work in the invisible case too.
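
(Purely to illustrate what such a command looks like on the wire, not
what the driver actually does: a sketch of handing the chip a raw
TPM 1.2 TPM_Startup(ST_STATE) through the character device. The byte
values are from the TPM 1.2 spec; whether /dev/tpm0 is free to be
opened like this on a given box is an assumption.)

    exec 3<> /dev/tpm0
    # tag TPM_TAG_RQU_COMMAND (0x00c1), length 12,
    # ordinal TPM_ORD_Startup (0x0099), startup type TPM_ST_STATE (0x0002)
    printf '\x00\xc1\x00\x00\x00\x0c\x00\x00\x00\x99\x00\x02' >&3
    dd bs=1 count=10 <&3 2>/dev/null | od -An -tx1   # response header incl. return code
    exec 3>&-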


Thanks,
Peter





* Re: Linux 3.1-rc9
  2011-10-08  7:55         ` Peter Zijlstra
@ 2011-10-12 21:35           ` Simon Kirby
  2011-10-13 23:25             ` Simon Kirby
  2011-10-18  5:40             ` Simon Kirby
  0 siblings, 2 replies; 98+ messages in thread
From: Simon Kirby @ 2011-10-12 21:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Linux Kernel Mailing List, Dave Jones, Thomas Gleixner

On Sat, Oct 08, 2011 at 09:55:51AM +0200, Peter Zijlstra wrote:

> On Fri, 2011-10-07 at 17:50 -0700, Simon Kirby wrote:
> > On Fri, Oct 07, 2011 at 08:01:55PM +0200, Peter Zijlstra wrote:
> > 
> > > @@ -2571,6 +2573,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
> > >  static inline void thread_group_cputime_init(struct signal_struct *sig)
> > >  {
> > >       raw_spin_lock_init(&sig->cputimer.lock);
> > > +     raw_spin_lock_init(&sig->cputimer.runtime_lock);
> > 
> > My 3.1-rc9 tree has just spin_lock_init() here, not raw_*.
> > 
> > Which tree is your patch against? -next or something?
> 
> or something yeah.. tip/master I think.
> 
> > It applies with some cooking like this, but will it be right?
> > 
> > > sed s/raw_// ../sched-patch-noraw.diff | patch -p1 --dry
> > patching file include/linux/sched.h
> > Hunk #1 succeeded at 503 (offset -1 lines).
> > Hunk #2 succeeded at 512 (offset -1 lines).
> > Hunk #3 succeeded at 2568 (offset -5 lines).
> > patching file kernel/posix-cpu-timers.c
> > patching file kernel/sched_stats.h 
> 
> yes that would be fine.

This patch (s/raw_//) has been stable on 5 boxes for a day. I'll push to
another 15 shortly and confirm tomorrow. Meanwhile, we had another ~4
boxes lock up on 3.1-rc9 _with_ d670ec13 reverted (all CPUs spinning),
but there weren't enough serial cables to log all of them and we haven't
been lucky enough to capture anything other than what fits on 80x25.
I'm hoping it's just the same bug you've already fixed. Strangely, boxes
on -rc6 and -rc7 haven't hit it, but those are across clusters with
different workloads.

Thanks!

Simon-

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-12 21:35           ` Simon Kirby
@ 2011-10-13 23:25             ` Simon Kirby
  2011-10-17  1:39               ` Linus Torvalds
  2011-10-18  5:40             ` Simon Kirby
  1 sibling, 1 reply; 98+ messages in thread
From: Simon Kirby @ 2011-10-13 23:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Linux Kernel Mailing List, Dave Jones, Thomas Gleixner

On Wed, Oct 12, 2011 at 02:35:55PM -0700, Simon Kirby wrote:

> On Sat, Oct 08, 2011 at 09:55:51AM +0200, Peter Zijlstra wrote:
> 
> > On Fri, 2011-10-07 at 17:50 -0700, Simon Kirby wrote:
> > > On Fri, Oct 07, 2011 at 08:01:55PM +0200, Peter Zijlstra wrote:
> > > 
> > > > @@ -2571,6 +2573,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
> > > >  static inline void thread_group_cputime_init(struct signal_struct *sig)
> > > >  {
> > > >       raw_spin_lock_init(&sig->cputimer.lock);
> > > > +     raw_spin_lock_init(&sig->cputimer.runtime_lock);
> > > 
> > > My 3.1-rc9 tree has just spin_lock_init() here, not raw_*.
> > > 
> > > Which tree is your patch against? -next or something?
> > 
> > or something yeah.. tip/master I think.
> > 
> > > It applies with some cooking like this, but will it be right?
> > > 
> > > > sed s/raw_// ../sched-patch-noraw.diff | patch -p1 --dry
> > > patching file include/linux/sched.h
> > > Hunk #1 succeeded at 503 (offset -1 lines).
> > > Hunk #2 succeeded at 512 (offset -1 lines).
> > > Hunk #3 succeeded at 2568 (offset -5 lines).
> > > patching file kernel/posix-cpu-timers.c
> > > patching file kernel/sched_stats.h 
> > 
> > yes that would be fine.
> 
> This patch (s/raw_//) has been stable on 5 boxes for a day. I'll push to
> another 15 shortly and confirm tomorrow. Meanwhile, we had another ~4
> boxes lock up on 3.1-rc9 _with_ d670ec13 reverted (all CPUs spinning),
> but there weren't enough serial cables to log all of them and we haven't
> been lucky enough to capture anything other than what fits on 80x25.
> I'm hoping it's just the same bug you've already fixed. Strangely, boxes
> on -rc6 and -rc7 haven't hit it, but those are across clusters with
> different workloads.

Looks good. No hangs or crashes for two days on any of them running
3.1-rc9 plus this patch. Not sure if you want to deuglify it, but it
seems to work...

Tested-by: Simon Kirby <sim@hostway.ca>

diff against Linus reproduced below.

Simon-

 include/linux/sched.h     |    3 +++
 kernel/posix-cpu-timers.c |    6 +++++-
 kernel/sched_stats.h      |    4 ++--
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 41d0237..ad9eafc 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -503,6 +503,7 @@ struct task_cputime {
  * @running:		non-zero when there are timers running and
  * 			@cputime receives updates.
  * @lock:		lock for fields in this struct.
+ * @runtime_lock:	lock for cputime.sum_exec_runtime
  *
  * This structure contains the version of task_cputime, above, that is
  * used for thread group CPU timer calculations.
@@ -511,6 +512,7 @@ struct thread_group_cputimer {
 	struct task_cputime cputime;
 	int running;
 	spinlock_t lock;
+	spinlock_t runtime_lock;
 };
 
 #include <linux/rwsem.h>
@@ -2566,6 +2568,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
 static inline void thread_group_cputime_init(struct signal_struct *sig)
 {
 	spin_lock_init(&sig->cputimer.lock);
+	spin_lock_init(&sig->cputimer.runtime_lock);
 }
 
 /*
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index c8008dd..fa189a6 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -284,9 +284,13 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 		 * it.
 		 */
 		thread_group_cputime(tsk, &sum);
+		spin_lock(&cputimer->runtime_lock);
 		update_gt_cputime(&cputimer->cputime, &sum);
-	}
+	} else
+		spin_lock(&cputimer->runtime_lock);
+
 	*times = cputimer->cputime;
+	spin_unlock(&cputimer->runtime_lock);
 	spin_unlock_irqrestore(&cputimer->lock, flags);
 }
 
diff --git a/kernel/sched_stats.h b/kernel/sched_stats.h
index 331e01b..a7e2c1a 100644
--- a/kernel/sched_stats.h
+++ b/kernel/sched_stats.h
@@ -330,7 +330,7 @@ static inline void account_group_exec_runtime(struct task_struct *tsk,
 	if (!cputimer->running)
 		return;
 
-	spin_lock(&cputimer->lock);
+	spin_lock(&cputimer->runtime_lock);
 	cputimer->cputime.sum_exec_runtime += ns;
-	spin_unlock(&cputimer->lock);
+	spin_unlock(&cputimer->runtime_lock);
 }

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-13 23:25             ` Simon Kirby
@ 2011-10-17  1:39               ` Linus Torvalds
  2011-10-17  4:58                 ` Ingo Molnar
                                   ` (3 more replies)
  0 siblings, 4 replies; 98+ messages in thread
From: Linus Torvalds @ 2011-10-17  1:39 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Peter Zijlstra, Linux Kernel Mailing List, Dave Jones,
	Thomas Gleixner, Martin Schwidefsky, Ingo Molnar

On Thu, Oct 13, 2011 at 4:25 PM, Simon Kirby <sim@hostway.ca> wrote:
>
> Looks good. No hangs or crashes for two days on any of them running
> 3.1-rc9 plus this patch. Not sure if you want to deuglify it, but it
> seems to work...
>
> Tested-by: Simon Kirby <sim@hostway.ca>

Peter, what's the status of this one?

Quite frankly, I personally consider it to be broken - why are we
introducing this new lock for this very special thing? A spinlock to
protect a *single* word of counter seems broken.

It seems more likely that the real bug is that kernel/sched_stats.h
currently takes cputimer->lock without disabling interrupts. Everybody
else uses irq-safe locking, why would sched_stats.h not need that?

However, I don't see why that spinlock is needed at all. Why aren't
those fields just atomics (or at least just "sum_exec_runtime")? And
why does "cputime_add()" exist at all? It seems to always be just a
plain add, and nothing else would seem to ever make sense *anyway*?

In other words, none of that code makes any sense to me at all. And
the patch in question that fixes a hang for Simon seems to make it
even worse. Can somebody explain to me why it looks that crappy?

Please?

That stupid definition of cputime_add() has apparently existed as-is
since it was introduced in 2005. Why do we have code like this:

    times->utime = cputime_add(times->utime, t->utime);

instead of just

    times->utime += t->utime;

which seems not just shorter, but more readable too? The reason is not
some type safety in the cputime_add() thing, it's just a macro.
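
(If I read include/asm-generic/cputime.h right, the "definition" is literally
just

    #define cputime_add(__a, __b)	((__a) + (__b))

and the arch-specific overrides seem to do the same plain add.)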

Added Martin and Ingo to the discussion - Martin because he added that
cputime_add in the first place, and Ingo because he gets the most hits
on kernel/sched_stats.h. Guys - you can see the history on lkml.

                                 Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17  1:39               ` Linus Torvalds
@ 2011-10-17  4:58                 ` Ingo Molnar
  2011-10-17  9:03                   ` Thomas Gleixner
  2011-10-17  7:55                 ` Martin Schwidefsky
                                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 98+ messages in thread
From: Ingo Molnar @ 2011-10-17  4:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Simon Kirby, Peter Zijlstra, Linux Kernel Mailing List,
	Dave Jones, Thomas Gleixner, Martin Schwidefsky


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> However, I don't see why that spinlock is needed at all. Why aren't 
> those fields just atomics (or at least just "sum_exec_runtime")? 
> And why does "cputime_add()" exist at all? [...]

Agreed, atomic64_t is the best choice here. (When the lock was added 
to struct *_cputimer this should probably have been done already - 
but we didn't have atomic64_t back then yet.)

> That stupid definition of cputime_add() has apparently existed 
> as-is since it was introduced in 2005. Why do we have code like 
> this:
> 
>     times->utime = cputime_add(times->utime, t->utime);
> 
> instead of just
> 
>     times->utime += t->utime;
> 
> which seems not just shorter, but more readable too? The reason is 
> not some type safety in the cputime_add() thing, it's just a macro.

Yes. This was in fact how the old scheduler accounting code looked:

-                               utime += t->utime;
-                               stime += t->stime;
+                               utime = cputime_add(utime, t->utime);
+                               stime = cputime_add(stime, t->stime);

before the pointless looking cputime_t wrappery was added in 2005:

 0a71336: [PATCH] cputime: introduce cputime

For the record, i absolutely hate much of the other time related type 
obfuscation we do as well.

For example the ktime_t obfuscation - we only do it to avoid a divide 
on 32-bit architectures that cannot do fast 64/32 divisions ...

It makes the time code a *lot* less obvious than it could be.

I think we should use one flat u64 nanoseconds time type in the 
kernel (preparing for it by using KTIME_SCALAR on all architectures for 
a release or so), used with very simple and obvious C arithmetics.

That simple time type could then trickle down as well: we could use 
it everywhere in kernel code and limit the hodge-podge of ABI time 
units to the syscall boundary. (and convert the internal time unit to 
whatever ABI unit there is close to the syscall boundary)

There's a point where micro-optimized 32-bit support related 
maintenance overhead (and the resulting loss of 
robustness/flexibility) becomes too expensive IMO.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17  1:39               ` Linus Torvalds
  2011-10-17  4:58                 ` Ingo Molnar
@ 2011-10-17  7:55                 ` Martin Schwidefsky
  2011-10-17  9:12                   ` Peter Zijlstra
  2011-10-17 20:48                   ` H. Peter Anvin
  2011-10-17 10:34                 ` Peter Zijlstra
  2011-10-20 14:36                 ` Linux 3.1-rc9 Martin Schwidefsky
  3 siblings, 2 replies; 98+ messages in thread
From: Martin Schwidefsky @ 2011-10-17  7:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Simon Kirby, Peter Zijlstra, Linux Kernel Mailing List,
	Dave Jones, Thomas Gleixner, Ingo Molnar

On Sun, 16 Oct 2011 18:39:57 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> That stupid definition of cputime_add() has apparently existed as-is
> since it was introduced in 2005. Why do we have code like this:
> 
>     times->utime = cputime_add(times->utime, t->utime);
> 
> instead of just
> 
>     times->utime += t->utime;
> 
> which seems not just shorter, but more readable too? The reason is not
> some type safety in the cputime_add() thing, it's just a macro.
> 
> Added Martin and Ingo to the discussion - Martin because he added that
> cputime_add in the first place, and Ingo because he gets the most hits
> on kernel/sched_stats.h. Guys - you can see the history on lkml.

I introduced those macros to find all the places in the kernel operating
on a cputime value. The additional debug patch defined cputime_t as a
struct which contained a single u64. That way I got a compiler error
for every place I missed.

The reason for the cputime_xxx primitives has been my fear that people
ignore the cputime_t type and just use unsigned long (as they always
have). That would break s390 which needs a u64 for its cputime value.
Dunno if we still need it, seems like we got used to using cputime_t.
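
For illustration, the debug variant was essentially of this shape (a sketch,
not the literal patch):

/* debug version: make cputime_t opaque so that any open-coded
 * arithmetic on it becomes a compile error */
typedef struct { u64 val; } cputime_t;

#define cputime_zero		((cputime_t){ 0 })
#define cputime_add(a, b)	((cputime_t){ (a).val + (b).val })
#define cputime_gt(a, b)	((a).val > (b).val)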

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17  4:58                 ` Ingo Molnar
@ 2011-10-17  9:03                   ` Thomas Gleixner
  2011-10-17 10:40                     ` Peter Zijlstra
  2011-10-17 18:49                     ` Ingo Molnar
  0 siblings, 2 replies; 98+ messages in thread
From: Thomas Gleixner @ 2011-10-17  9:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky

On Mon, 17 Oct 2011, Ingo Molnar wrote:
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> For the record, i absolutely hate much of the other time related type 
> obfuscation we do as well.
> 
> For example the ktime_t obfuscation - we only do it to avoid a divide 
> on 32-bit architectures that cannot do fast 64/32 divisions ...
> 
> It makes the time code a *lot* less obvious than it could be.
> 
> I think we should use one flat u64 nanoseconds time type in the 
> kernel (preparing it with using KTIME_SCALAR on all architectures for 
> a release or so), used with very simple and obvious C arithmetics.

It'd be nice, but this simply will not fly.
 
> That simple time type could then trickle down as well: we could use 
> it everywhere in kernel code and limit the hodge-podge of ABI time 
> units to the syscall boundary. (and convert the internal time unit to 
> whatever ABI unit there is close to the syscall boundary)
> 
> There's a point where micro-optimized 32-bit support related 
> maintenance overhead (and the resulting loss of 
> robustness/flexibility) becomes too expensive IMO.

That's not a micro optimization, it's a massive performance hit if you
force those 32bit archs to do 64/32 all over the place.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17  7:55                 ` Martin Schwidefsky
@ 2011-10-17  9:12                   ` Peter Zijlstra
  2011-10-17  9:18                     ` Martin Schwidefsky
  2011-10-17 20:48                   ` H. Peter Anvin
  1 sibling, 1 reply; 98+ messages in thread
From: Peter Zijlstra @ 2011-10-17  9:12 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Linus Torvalds, Simon Kirby, Linux Kernel Mailing List,
	Dave Jones, Thomas Gleixner, Ingo Molnar

On Mon, 2011-10-17 at 09:55 +0200, Martin Schwidefsky wrote:
> 
> The reason for the cputime_xxx primitives has been my fear that people
> ignore the cputime_t type and just use unsigned long (as they always
> have). That would break s390 which needs a u64 for its cputime value.
> Dunno if we still need it, seems like we got used to using cputime_t. 

Right, and as mentioned last time this came up, we could possibly make
use of sparse to ensure things don't fail on 32bit s390.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17  9:12                   ` Peter Zijlstra
@ 2011-10-17  9:18                     ` Martin Schwidefsky
  0 siblings, 0 replies; 98+ messages in thread
From: Martin Schwidefsky @ 2011-10-17  9:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Simon Kirby, Linux Kernel Mailing List,
	Dave Jones, Thomas Gleixner, Ingo Molnar

On Mon, 17 Oct 2011 11:12:51 +0200
Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Mon, 2011-10-17 at 09:55 +0200, Martin Schwidefsky wrote:
> > 
> > The reason for the cputime_xxx primitives has been my fear that people
> > ignore the cputime_t type and just use unsigned long (as they always
> > have). That would break s390 which needs a u64 for its cputime value.
> > Dunno if we still need it, seems like we got used to using cputime_t. 
> 
> Right, and as mentioned last time this came up, we could possibly make
> use of sparse to ensure things don't fail on 32bit s390.

Indeed. No progress on the sparse check so far I'm afraid. 


-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17  1:39               ` Linus Torvalds
  2011-10-17  4:58                 ` Ingo Molnar
  2011-10-17  7:55                 ` Martin Schwidefsky
@ 2011-10-17 10:34                 ` Peter Zijlstra
  2011-10-17 14:07                   ` Martin Schwidefsky
  2011-10-17 14:57                   ` Linus Torvalds
  2011-10-20 14:36                 ` Linux 3.1-rc9 Martin Schwidefsky
  3 siblings, 2 replies; 98+ messages in thread
From: Peter Zijlstra @ 2011-10-17 10:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Simon Kirby, Linux Kernel Mailing List, Dave Jones,
	Thomas Gleixner, Martin Schwidefsky, Ingo Molnar

On Sun, 2011-10-16 at 18:39 -0700, Linus Torvalds wrote:

> Quite frankly, I personally consider it to be broken - why are we
> introducing this new lock for this very special thing? A spinlock to
> protect a *single* word of counter seems broken.

Well, I thought atomic64_t would be more expensive on 32bit archs, i386
uses the horridly expensive cmpxchg8b thing to implement it.

That said, I'm more than glad to use it.

> However, I don't see why that spinlock is needed at all. Why aren't
> those fields just atomics (or at least just "sum_exec_runtime")? 

Done.

> And
> why does "cputime_add()" exist at all? It seems to always be just a
> plain add, and nothing else would seem to ever make sense *anyway*?

Martin and I were discussing the merit of that only a few weeks ago ;-)

BTW what would we all think about a coccinelle generated patch that
fixes atomic*_add()'s argument order?

---
Subject: cputimer: Cure lock inversion
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Mon Oct 17 11:50:30 CEST 2011

There's a lock inversion between the cputimer->lock and rq->lock; notably
the two callchains involved are:

 update_rlimit_cpu()
   sighand->siglock
   set_process_cpu_timer()
     cpu_timer_sample_group()
       thread_group_cputimer()
         cputimer->lock
         thread_group_cputime()
           task_sched_runtime()
             ->pi_lock
             rq->lock

 scheduler_tick()
   rq->lock
   task_tick_fair()
     update_curr()
       account_group_exec()
         cputimer->lock

Where the first one is enabling a CLOCK_PROCESS_CPUTIME_ID timer, and the
second one is keeping up-to-date.

Note that e8abccb7193 ("posix-cpu-timers: Cure SMP accounting oddities") didn't
introduce this problem, but merely made it much more likely to happen, see how
cpu_timer_sample_group() for the CPUCLOCK_SCHED case also takes rq->lock.

Cure this inversion by removing the need to acquire cputimer->lock in the
update path by converting task_cputime::sum_exec_runtime to an atomic64_t.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/sched.h     |    4 ++--
 kernel/fork.c             |    2 +-
 kernel/posix-cpu-timers.c |   41 ++++++++++++++++++++++++-----------------
 kernel/sched.c            |    2 +-
 kernel/sched_rt.c         |    6 ++++--
 kernel/sched_stats.h      |    4 +---
 6 files changed, 33 insertions(+), 26 deletions(-)
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -474,7 +474,7 @@ struct cpu_itimer {
 struct task_cputime {
 	cputime_t utime;
 	cputime_t stime;
-	unsigned long long sum_exec_runtime;
+	atomic64_t sum_exec_runtime;
 };
 /* Alternate field names when used to cache expirations. */
 #define prof_exp	stime
@@ -485,7 +485,7 @@ struct task_cputime {
 	(struct task_cputime) {					\
 		.utime = cputime_zero,				\
 		.stime = cputime_zero,				\
-		.sum_exec_runtime = 0,				\
+		.sum_exec_runtime = ATOMIC64_INIT(0),		\
 	}
 
 /*
Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c
+++ linux-2.6/kernel/fork.c
@@ -1033,7 +1033,7 @@ static void posix_cpu_timers_init(struct
 {
 	tsk->cputime_expires.prof_exp = cputime_zero;
 	tsk->cputime_expires.virt_exp = cputime_zero;
-	tsk->cputime_expires.sched_exp = 0;
+	atomic64_set(&tsk->cputime_expires.sched_exp, 0);
 	INIT_LIST_HEAD(&tsk->cpu_timers[0]);
 	INIT_LIST_HEAD(&tsk->cpu_timers[1]);
 	INIT_LIST_HEAD(&tsk->cpu_timers[2]);
Index: linux-2.6/kernel/posix-cpu-timers.c
===================================================================
--- linux-2.6.orig/kernel/posix-cpu-timers.c
+++ linux-2.6/kernel/posix-cpu-timers.c
@@ -239,7 +239,7 @@ void thread_group_cputime(struct task_st
 
 	times->utime = sig->utime;
 	times->stime = sig->stime;
-	times->sum_exec_runtime = sig->sum_sched_runtime;
+	atomic64_set(&times->sum_exec_runtime, sig->sum_sched_runtime);
 
 	rcu_read_lock();
 	/* make sure we can trust tsk->thread_group list */
@@ -250,7 +250,7 @@ void thread_group_cputime(struct task_st
 	do {
 		times->utime = cputime_add(times->utime, t->utime);
 		times->stime = cputime_add(times->stime, t->stime);
-		times->sum_exec_runtime += task_sched_runtime(t);
+		atomic64_add(task_sched_runtime(t), &times->sum_exec_runtime);
 	} while_each_thread(tsk, t);
 out:
 	rcu_read_unlock();
@@ -264,8 +264,11 @@ static void update_gt_cputime(struct tas
 	if (cputime_gt(b->stime, a->stime))
 		a->stime = b->stime;
 
-	if (b->sum_exec_runtime > a->sum_exec_runtime)
-		a->sum_exec_runtime = b->sum_exec_runtime;
+	if (atomic64_read(&b->sum_exec_runtime) >
+			atomic64_read(&a->sum_exec_runtime)) {
+		atomic64_set(&a->sum_exec_runtime,
+				atomic64_read(&b->sum_exec_runtime));
+	}
 }
 
 void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
@@ -287,6 +290,8 @@ void thread_group_cputimer(struct task_s
 		update_gt_cputime(&cputimer->cputime, &sum);
 	}
 	*times = cputimer->cputime;
+	atomic64_set(&times->sum_exec_runtime,
+			atomic64_read(&cputimer->cputime.sum_exec_runtime));
 	spin_unlock_irqrestore(&cputimer->lock, flags);
 }
 
@@ -313,7 +318,7 @@ static int cpu_clock_sample_group(const 
 		break;
 	case CPUCLOCK_SCHED:
 		thread_group_cputime(p, &cputime);
-		cpu->sched = cputime.sum_exec_runtime;
+		cpu->sched = atomic64_read(&cputime.sum_exec_runtime);
 		break;
 	}
 	return 0;
@@ -593,9 +598,9 @@ static void arm_timer(struct k_itimer *t
 				cputime_expires->virt_exp = exp->cpu;
 			break;
 		case CPUCLOCK_SCHED:
-			if (cputime_expires->sched_exp == 0 ||
-			    cputime_expires->sched_exp > exp->sched)
-				cputime_expires->sched_exp = exp->sched;
+			if (atomic64_read(&cputime_expires->sched_exp) == 0 ||
+			    atomic64_read(&cputime_expires->sched_exp) > exp->sched)
+				atomic64_set(&cputime_expires->sched_exp, exp->sched);
 			break;
 		}
 	}
@@ -656,7 +661,7 @@ static int cpu_timer_sample_group(const 
 		cpu->cpu = cputime.utime;
 		break;
 	case CPUCLOCK_SCHED:
-		cpu->sched = cputime.sum_exec_runtime + task_delta_exec(p);
+		cpu->sched = atomic64_read(&cputime.sum_exec_runtime) + task_delta_exec(p);
 		break;
 	}
 	return 0;
@@ -947,13 +952,14 @@ static void check_thread_timers(struct t
 
 	++timers;
 	maxfire = 20;
-	tsk->cputime_expires.sched_exp = 0;
+	atomic64_set(&tsk->cputime_expires.sched_exp, 0);
 	while (!list_empty(timers)) {
 		struct cpu_timer_list *t = list_first_entry(timers,
 						      struct cpu_timer_list,
 						      entry);
 		if (!--maxfire || tsk->se.sum_exec_runtime < t->expires.sched) {
-			tsk->cputime_expires.sched_exp = t->expires.sched;
+			atomic64_set(&tsk->cputime_expires.sched_exp,
+				     t->expires.sched);
 			break;
 		}
 		t->firing = 1;
@@ -1049,7 +1055,7 @@ static inline int task_cputime_zero(cons
 {
 	if (cputime_eq(cputime->utime, cputime_zero) &&
 	    cputime_eq(cputime->stime, cputime_zero) &&
-	    cputime->sum_exec_runtime == 0)
+	    atomic64_read(&cputime->sum_exec_runtime) == 0)
 		return 1;
 	return 0;
 }
@@ -1076,7 +1082,7 @@ static void check_process_timers(struct 
 	thread_group_cputimer(tsk, &cputime);
 	utime = cputime.utime;
 	ptime = cputime_add(utime, cputime.stime);
-	sum_sched_runtime = cputime.sum_exec_runtime;
+	sum_sched_runtime = atomic64_read(&cputime.sum_exec_runtime);
 	maxfire = 20;
 	prof_expires = cputime_zero;
 	while (!list_empty(timers)) {
@@ -1161,7 +1167,7 @@ static void check_process_timers(struct 
 
 	sig->cputime_expires.prof_exp = prof_expires;
 	sig->cputime_expires.virt_exp = virt_expires;
-	sig->cputime_expires.sched_exp = sched_expires;
+	atomic64_set(&sig->cputime_expires.sched_exp, sched_expires);
 	if (task_cputime_zero(&sig->cputime_expires))
 		stop_process_timers(sig);
 }
@@ -1255,8 +1261,9 @@ static inline int task_cputime_expired(c
 	    cputime_ge(cputime_add(sample->utime, sample->stime),
 		       expires->stime))
 		return 1;
-	if (expires->sum_exec_runtime != 0 &&
-	    sample->sum_exec_runtime >= expires->sum_exec_runtime)
+	if (atomic64_read(&expires->sum_exec_runtime) != 0 &&
+	    atomic64_read(&sample->sum_exec_runtime) >=
+			atomic64_read(&expires->sum_exec_runtime))
 		return 1;
 	return 0;
 }
@@ -1279,7 +1286,7 @@ static inline int fastpath_timer_check(s
 		struct task_cputime task_sample = {
 			.utime = tsk->utime,
 			.stime = tsk->stime,
-			.sum_exec_runtime = tsk->se.sum_exec_runtime
+			.sum_exec_runtime = ATOMIC64_INIT(tsk->se.sum_exec_runtime),
 		};
 
 		if (task_cputime_expired(&task_sample, &tsk->cputime_expires))
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -4075,7 +4075,7 @@ void thread_group_times(struct task_stru
 	thread_group_cputime(p, &cputime);
 
 	total = cputime_add(cputime.utime, cputime.stime);
-	rtime = nsecs_to_cputime(cputime.sum_exec_runtime);
+	rtime = nsecs_to_cputime(atomic64_read(&cputime.sum_exec_runtime));
 
 	if (total) {
 		u64 temp = rtime;
Index: linux-2.6/kernel/sched_rt.c
===================================================================
--- linux-2.6.orig/kernel/sched_rt.c
+++ linux-2.6/kernel/sched_rt.c
@@ -1763,8 +1763,10 @@ static void watchdog(struct rq *rq, stru
 
 		p->rt.timeout++;
 		next = DIV_ROUND_UP(min(soft, hard), USEC_PER_SEC/HZ);
-		if (p->rt.timeout > next)
-			p->cputime_expires.sched_exp = p->se.sum_exec_runtime;
+		if (p->rt.timeout > next) {
+			atomic64_set(&p->cputime_expires.sched_exp,
+					p->se.sum_exec_runtime);
+		}
 	}
 }
 
Index: linux-2.6/kernel/sched_stats.h
===================================================================
--- linux-2.6.orig/kernel/sched_stats.h
+++ linux-2.6/kernel/sched_stats.h
@@ -330,7 +330,5 @@ static inline void account_group_exec_ru
 	if (!cputimer->running)
 		return;
 
-	spin_lock(&cputimer->lock);
-	cputimer->cputime.sum_exec_runtime += ns;
-	spin_unlock(&cputimer->lock);
+	atomic64_add(ns, &cputimer->cputime.sum_exec_runtime);
 }


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17  9:03                   ` Thomas Gleixner
@ 2011-10-17 10:40                     ` Peter Zijlstra
  2011-10-17 11:40                       ` Alan Cox
  2011-10-17 18:49                     ` Ingo Molnar
  1 sibling, 1 reply; 98+ messages in thread
From: Peter Zijlstra @ 2011-10-17 10:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Linus Torvalds, Simon Kirby,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky

On Mon, 2011-10-17 at 11:03 +0200, Thomas Gleixner wrote:
> That's not a micro optimization, it's a massive performance hit if you
> force those 32bit archs to do 64/32 all over the place.
> 
Linus could just say he doesn't care about 32bit and everybody sane
should just get a 64bit machine.. but I suspect that's a few more years.

Although I hope not too long, even my phone could do with more than 2G
of memory.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17 10:40                     ` Peter Zijlstra
@ 2011-10-17 11:40                       ` Alan Cox
  0 siblings, 0 replies; 98+ messages in thread
From: Alan Cox @ 2011-10-17 11:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Linus Torvalds, Simon Kirby,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky

On Mon, 17 Oct 2011 12:40:01 +0200
Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Mon, 2011-10-17 at 11:03 +0200, Thomas Gleixner wrote:
> > That's not a micro optimization, it's a massive performance hit if you
> > force those 32bit archs to do 64/32 all over the place.
> > 
> Linus could just say he doesn't care about 32bit and everybody sane
> should just get a 64bit machine.. but I suspect that's a few more years.
> 
> Although I hope not too long, even my phone could do with more than 2G
> of memory.

Perhaps it wouldn't, if people didn't keep adding junk to the system ;)


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17 10:34                 ` Peter Zijlstra
@ 2011-10-17 14:07                   ` Martin Schwidefsky
  2011-10-17 14:57                   ` Linus Torvalds
  1 sibling, 0 replies; 98+ messages in thread
From: Martin Schwidefsky @ 2011-10-17 14:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Simon Kirby, Linux Kernel Mailing List,
	Dave Jones, Thomas Gleixner, Ingo Molnar

On Mon, 17 Oct 2011 12:34:18 +0200
Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> > And
> > why does "cputime_add()" exist at all? It seems to always be just a
> > plain add, and nothing else would seem to ever make sense *anyway*?
> 
> Martin and me were discussing the merit of that only a few weeks ago ;-)

I took my old cputime debug patch and compiled the latest git tree with it.
The compiler found a few places where fishy things happen:

1) fs/proc/uptime.c
static int uptime_proc_show(struct seq_file *m, void *v)
{
	...
        cputime_t idletime = cputime_zero;

        for_each_possible_cpu(i)
                idletime = cputime64_add(idletime, kstat_cpu(i).cpustat.idle);
	...
        cputime_to_timespec(idletime, &idle);
	...
}

idletime is a 32-bit integer on x86-32. The sum of the idle time over all
cpus will quickly overflow, e.g. consider HZ=1000 on a quad-core. It would
overflow after 12.42 days (2^32 / 1000 / 4 / 86400).
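
One way out (a sketch only, assuming the usual jiffies-based cputime, not a
tested patch) would be to accumulate in a 64-bit local and convert by hand:

	u64 idle_ticks = 0;
	struct timespec idle;
	u32 rem;
	int i;

	for_each_possible_cpu(i)
		idle_ticks += kstat_cpu(i).cpustat.idle;

	/* convert the 64-bit tick count by hand; the 32-bit
	 * cputime_to_timespec() would truncate it */
	idle.tv_sec = div_u64_rem(idle_ticks, HZ, &rem);
	idle.tv_nsec = rem * (NSEC_PER_SEC / HZ);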

2) kernel/posix-cpu-timers.c
/*                                                                              
 * Divide and limit the result to res >= 1                                      
 *                                                                              
 * This is necessary to prevent signal delivery starvation, when the result of  
 * the division would be rounded down to 0.                                     
 */
static inline cputime_t cputime_div_non_zero(cputime_t time, unsigned long div)
{
        cputime_t res = cputime_div(time, div);

        return max_t(cputime_t, res, 1);
}

A cputime of 1 on s390 is 0.244 nanoseconds; I have my doubts that this will
prevent signal starvation. Fortunately the function is unused and can be
removed.

3) kernel/itimer
enum hrtimer_restart it_real_fn(struct hrtimer *timer)
{
        struct signal_struct *sig =
                container_of(timer, struct signal_struct, real_timer);

        trace_itimer_expire(ITIMER_REAL, sig->leader_pid, 0);
        kill_pid_info(SIGALRM, SEND_SIG_PRIV, sig->leader_pid);

        return HRTIMER_NORESTART;
}

trace_itimer_expire takes a cputime as its third argument. That should be
cputime_zero in the current notation, same in do_setitimer. After the
conversion all cputime_zero occurrences would be replaced with 0.

4) kernel/sched.c
#define CPUACCT_BATCH   \
        min_t(long, percpu_counter_batch * cputime_one_jiffy, INT_MAX)

If cputime_t is defined as an 64-bit type on a 32-bit architecture the
CPUACCT_BATCH definition can break. Should work for the existing code
though.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17 10:34                 ` Peter Zijlstra
  2011-10-17 14:07                   ` Martin Schwidefsky
@ 2011-10-17 14:57                   ` Linus Torvalds
  2011-10-17 17:54                     ` Peter Zijlstra
  1 sibling, 1 reply; 98+ messages in thread
From: Linus Torvalds @ 2011-10-17 14:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Simon Kirby, Linux Kernel Mailing List, Dave Jones,
	Thomas Gleixner, Martin Schwidefsky, Ingo Molnar

On Mon, Oct 17, 2011 at 3:34 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> Well, I thought atomic64_t would be more expensive on 32bit archs, i386
> uses the horridly expensive cmpxchg8b thing to implement it.

Ugh, yes. And some of those paths seem to be hot-paths too.

Perhaps more importantly, there are way more accesses to that
'sum_exec_runtime' than the spinlock-variant of the patch implied.

So now with the atomic64 variant, the readers are protected too, and
that ends up being really expensive. That may be the "right thing" to
do, but I'm not sure if it's really acceptable. Also, I see that some
of the atomic regions (that weren't protected by the spinlock *either*)
aren't just simple adds: they are code like

+			if (atomic64_read(&cputime_expires->sched_exp) == 0 ||
+			    atomic64_read(&cputime_expires->sched_exp) > exp->sched)
+				atomic64_set(&cputime_expires->sched_exp, exp->sched);

in arm_timer(), which was apparently totally unprotected before, and
which is just inappropriate with atomic accesses.

So seeing this, I'm not confident that atomic64 works at all, after all.

Grrr..

                               Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17 14:57                   ` Linus Torvalds
@ 2011-10-17 17:54                     ` Peter Zijlstra
  2011-10-17 18:31                       ` Linus Torvalds
  0 siblings, 1 reply; 98+ messages in thread
From: Peter Zijlstra @ 2011-10-17 17:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Simon Kirby, Linux Kernel Mailing List, Dave Jones,
	Thomas Gleixner, Martin Schwidefsky, Ingo Molnar

On Mon, 2011-10-17 at 07:57 -0700, Linus Torvalds wrote:

> So seeing this, I'm not confident that atomic64 works at all, after all.

I could of course propose this... but I really won't since I'm half
retching by now.. ;-)


---
 include/linux/sched.h     |    7 +++++--
 kernel/posix-cpu-timers.c |    8 +++++---
 kernel/sched_stats.h      |    4 +---
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 41d0237..94bf16f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -474,7 +474,10 @@ struct cpu_itimer {
 struct task_cputime {
 	cputime_t utime;
 	cputime_t stime;
-	unsigned long long sum_exec_runtime;
+	union {
+		unsigned long long sum_exec_runtime;
+		atomic64_t _sum_exec_runtime;
+	};
 };
 /* Alternate field names when used to cache expirations. */
 #define prof_exp	stime
@@ -485,7 +488,7 @@ struct task_cputime {
 	(struct task_cputime) {					\
 		.utime = cputime_zero,				\
 		.stime = cputime_zero,				\
-		.sum_exec_runtime = 0,				\
+		{ .sum_exec_runtime = 0, },			\
 	}
 
 /*
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index c8008dd..4808c0d 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -264,8 +264,8 @@ static void update_gt_cputime(struct task_cputime *a, struct task_cputime *b)
 	if (cputime_gt(b->stime, a->stime))
 		a->stime = b->stime;
 
-	if (b->sum_exec_runtime > a->sum_exec_runtime)
-		a->sum_exec_runtime = b->sum_exec_runtime;
+	if (b->sum_exec_runtime > atomic64_read(&a->_sum_exec_runtime))
+		atomic64_set(&a->_sum_exec_runtime, b->sum_exec_runtime);
 }
 
 void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
@@ -287,6 +287,8 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 		update_gt_cputime(&cputimer->cputime, &sum);
 	}
 	*times = cputimer->cputime;
+	times->sum_exec_runtime = 
+		atomic64_read(&cputimer->cputime._sum_exec_runtime);
 	spin_unlock_irqrestore(&cputimer->lock, flags);
 }
 
@@ -1279,7 +1281,7 @@ static inline int fastpath_timer_check(struct task_struct *tsk)
 		struct task_cputime task_sample = {
 			.utime = tsk->utime,
 			.stime = tsk->stime,
-			.sum_exec_runtime = tsk->se.sum_exec_runtime
+			{ .sum_exec_runtime = tsk->se.sum_exec_runtime, },
 		};
 
 		if (task_cputime_expired(&task_sample, &tsk->cputime_expires))
diff --git a/kernel/sched_stats.h b/kernel/sched_stats.h
index 331e01b..65dcb76 100644
--- a/kernel/sched_stats.h
+++ b/kernel/sched_stats.h
@@ -330,7 +330,5 @@ static inline void account_group_exec_runtime(struct task_struct *tsk,
 	if (!cputimer->running)
 		return;
 
-	spin_lock(&cputimer->lock);
-	cputimer->cputime.sum_exec_runtime += ns;
-	spin_unlock(&cputimer->lock);
+	atomic64_add(ns, &cputimer->cputime._sum_exec_runtime);
 }


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17 17:54                     ` Peter Zijlstra
@ 2011-10-17 18:31                       ` Linus Torvalds
  2011-10-17 19:23                         ` Peter Zijlstra
  0 siblings, 1 reply; 98+ messages in thread
From: Linus Torvalds @ 2011-10-17 18:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Simon Kirby, Linux Kernel Mailing List, Dave Jones,
	Thomas Gleixner, Martin Schwidefsky, Ingo Molnar

On Mon, Oct 17, 2011 at 10:54 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> I could of course propose this... but I really won't since I'm half
> retching by now.. ;-)

Wow. Is this "ugly and fragile code week" and I just didn't get the memo?

I do wonder if we might not fix the problem by just taking the
*existing* lock in the right order?

IOW, how nasty would it be to make "scheduler_tick()" just get the
cputimer->lock outside of rq->lock?

Sure, we'd hold that lock *much* longer than we need, but how much do
we care? Is that a lock that gets contention? It might be the simple
solution for now - I *would* like to get 3.1 out..

                        Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17  9:03                   ` Thomas Gleixner
  2011-10-17 10:40                     ` Peter Zijlstra
@ 2011-10-17 18:49                     ` Ingo Molnar
  2011-10-17 20:35                       ` H. Peter Anvin
  1 sibling, 1 reply; 98+ messages in thread
From: Ingo Molnar @ 2011-10-17 18:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky


* Thomas Gleixner <tglx@linutronix.de> wrote:

> > That simple time type could then trickle down as well: we could 
> > use it everywhere in kernel code and limit the hodge-podge of ABI 
> > time units to the syscall boundary. (and convert the internal 
> > time unit to whatever ABI unit there is close to the syscall 
> > boundary)
> > 
> > There's a point where micro-optimized 32-bit support related 
> > maintenance overhead (and the resulting loss of 
> > robustness/flexibility) becomes too expensive IMO.
> 
> That's not a micro optimization, it's a massive performance hit if 
> you force those 32bit archs to do 64/32 all over the place.

Do we have some hard data on this, which we could put into comments 
in include/linux/ktime.h and such? Older versions of GCC used to do a 
bad job of long long handling on 32-bit systems - that might be a 
factor in the performance figures.

But i suspect you are right that the cost is still very much there 
...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17 18:31                       ` Linus Torvalds
@ 2011-10-17 19:23                         ` Peter Zijlstra
  2011-10-17 21:00                           ` Thomas Gleixner
  0 siblings, 1 reply; 98+ messages in thread
From: Peter Zijlstra @ 2011-10-17 19:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Simon Kirby, Linux Kernel Mailing List, Dave Jones,
	Thomas Gleixner, Martin Schwidefsky, Ingo Molnar

On Mon, 2011-10-17 at 11:31 -0700, Linus Torvalds wrote:
> On Mon, Oct 17, 2011 at 10:54 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >
> > I could of course propose this... but I really won't since I'm half
> > retching by now.. ;-)
> 
> Wow. Is this "ugly and fragile code week" and I just didn't get the memo?

Do I get a prize?

> I do wonder if we might not fix the problem by just taking the
> *existing* lock in the right order?
> 
> IOW, how nasty would it be to make "scheduler_tick()" just get the
> cputimer->lock outside of rq->lock?
> 
> Sure, we'd hold that lock *much* longer than we need, but how much do
> we care? Is that a lock that gets contention? It might be the simple
> solution for now - I *would* like to get 3.1 out..

Ah, sadly the tick isn't the only one with the inverted callchain,
pretty much every callchain in the scheduler ends up in update_curr()
one way or another.

The easier way around might be something like this... even when two
threads in a process race to enable this clock, the wasted time is
pretty much of the same order as we would otherwise have wasted spinning
on the lock, and the update_gt_cputime() thing would end up moving the
clock fwd to the latest outcome any which way.

Humm,. Thomas anything?


---
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index c8008dd..640ded8 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -274,9 +274,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 	struct task_cputime sum;
 	unsigned long flags;
 
-	spin_lock_irqsave(&cputimer->lock, flags);
 	if (!cputimer->running) {
-		cputimer->running = 1;
 		/*
 		 * The POSIX timer interface allows for absolute time expiry
 		 * values through the TIMER_ABSTIME flag, therefore we have
@@ -284,8 +282,11 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 		 * it.
 		 */
 		thread_group_cputime(tsk, &sum);
+		spin_lock_irqsave(&cputimer->lock, flags);
+		cputimer->running = 1;
 		update_gt_cputime(&cputimer->cputime, &sum);
-	}
+	} else
+		spin_lock_irqsave(&cputimer->lock, flags);
 	*times = cputimer->cputime;
 	spin_unlock_irqrestore(&cputimer->lock, flags);
 }


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17 18:49                     ` Ingo Molnar
@ 2011-10-17 20:35                       ` H. Peter Anvin
  2011-10-17 21:19                         ` Ingo Molnar
  0 siblings, 1 reply; 98+ messages in thread
From: H. Peter Anvin @ 2011-10-17 20:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky

On 10/17/2011 11:49 AM, Ingo Molnar wrote:
> Do we have some hard data on this, which we could put into comments 
> in include/linux/ktime.h and such? Older versions of GCC used to do a 
> bad job of long long handling on 32-bit systems - that might be a 
> factor in the performance figures.
> 
> But i suspect you are right that the cost is still very much there 

64/64 division is done bit by bit on most (all?) 32-bit architectures.

64/32 division can be done in hardware on some architectures, e.g. x86.
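
(For reference, the usual in-kernel idiom for the 64/32 case is do_div() from
asm/div64.h - it divides a u64 in place by a 32-bit value and hands back the
remainder; "ns" below is just an example variable:)

	u64 ns = sample_nsecs;		/* some nanosecond count */
	u32 rem;

	rem = do_div(ns, NSEC_PER_SEC);	/* ns now holds whole seconds,  */
					/* rem the leftover nanoseconds */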

	-hpa

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17  7:55                 ` Martin Schwidefsky
  2011-10-17  9:12                   ` Peter Zijlstra
@ 2011-10-17 20:48                   ` H. Peter Anvin
  2011-10-18  7:20                     ` Martin Schwidefsky
  1 sibling, 1 reply; 98+ messages in thread
From: H. Peter Anvin @ 2011-10-17 20:48 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Thomas Gleixner,
	Ingo Molnar

On 10/17/2011 12:55 AM, Martin Schwidefsky wrote:
> 
> I introduced those macros to find all the places in the kernel operating
> on a cputime value. The additional debug patch defined cputime_t as a
> struct which contained a single u64. That way I got a compiler error
> for every place I missed.
> 

And was there a reason that that structure thingy didn't get merged?

	-hpa

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17 19:23                         ` Peter Zijlstra
@ 2011-10-17 21:00                           ` Thomas Gleixner
  2011-10-18  8:39                             ` Thomas Gleixner
  0 siblings, 1 reply; 98+ messages in thread
From: Thomas Gleixner @ 2011-10-17 21:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Simon Kirby, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar

On Mon, 17 Oct 2011, Peter Zijlstra wrote:

> On Mon, 2011-10-17 at 11:31 -0700, Linus Torvalds wrote:
> > On Mon, Oct 17, 2011 at 10:54 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > >
> > > I could of course propose this... but I really won't since I'm half
> > > retching by now.. ;-)
> > 
> > Wow. Is this "ugly and fragile code week" and I just didn't get the memo?
> 
> Do I get a price?
> 
> > I do wonder if we might not fix the problem by just taking the
> > *existing* lock in the right order?
> > 
> > IOW, how nasty would it be to make "scheduler_tick()" just get the
> > cputimer->lock outside of rq->lock?
> > 
> > Sure, we'd hold that lock *much* longer than we need, but how much do
> > we care? Is that a lock that gets contention? It might be the simple
> > solution for now - I *would* like to get 3.1 out..
> 
> Ah, sadly the tick isn't the only one with the inverted callchain,
> pretty much every callchain in the scheduler ends up in update_curr()
> one way or another.
> 
> The easier way around might be something like this... even when two
> threads in a process race to enable this clock, the wasted time is
> pretty much of the same order as we would otherwise have wasted spinning
> on the lock, and the update_gt_cputime() thing would end up moving the
> clock fwd to the latest outcome any which way.
> 
> Humm,. Thomas anything?
 
No, that should work. It does not make that call path more racy
against exit, which is another trainwreck at least on 32bit machines
which I discovered while looking for the problems with your patch.

thread_group_cputime() reads task->signal->utime/stime/sum_sched_runtime

These fields are updated in __exit_signal() w/o holding
task->signal->cputimer.lock. So nothing prevents that these values
change while we read them.

All callers of thread_group_cputime() except the scheduler callpath
hold sighand lock, which is also taken in __exit_signal().

So your patch does not make that particular case worse.

That said, I really need some sleep before I can make a final
judgement on that horror. The call paths are such an intermingled mess
that it's not funny anymore. I'll do that tomorrow morning first thing.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17 20:35                       ` H. Peter Anvin
@ 2011-10-17 21:19                         ` Ingo Molnar
  2011-10-17 21:22                           ` H. Peter Anvin
  2011-10-17 21:31                           ` Ingo Molnar
  0 siblings, 2 replies; 98+ messages in thread
From: Ingo Molnar @ 2011-10-17 21:19 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 10/17/2011 11:49 AM, Ingo Molnar wrote:
> > Do we have some hard data on this, which we could put into comments 
> > in include/linux/ktime.h and such? Older versions of GCC used to do a 
> > bad job of long long handling on 32-bit systems - that might be a 
> > factor in the performance figures.
> > 
> > But i suspect you are right that the cost is still very much there 
> 
> 64/64 division is done bit by bit on most (all?) 32-bit architectures.
> 
> 64/32 division can be done in hardware on some architectures, e.g. x86.

it's 64/32 division - it's the /1000000000 /1000000 /1000 divisions 
in the large majority of cases, to convert between 
seconds/milliseconds/microseconds and scalar nanoseconds.

the kernel-internal ktime_t in the 32-bit optimized case is:

union ktime {
        s32     sec, nsec;
};

which is the same as timespec and arithmetically close to timeval, 
which many ABIs use. So conversion is easy in that case - but 
arithmetics gets a bit harder.
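
As a rough illustration (not the exact ktime.h code), adding two values in the
two representations looks like:

	/* scalar form: plain 64-bit nanoseconds, one add */
	res = a + b;

	/* sec/nsec pair (sketch): add the halves, then normalize the carry */
	res.sec  = a.sec + b.sec;
	res.nsec = a.nsec + b.nsec;
	if (res.nsec >= NSEC_PER_SEC) {
		res.nsec -= NSEC_PER_SEC;
		res.sec++;
	}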

If we used a scalar 64-bit form for all kernel internal time 
representations:

	s64	nsecs;

then conversions back to timespec/timeval would involve dividing this 
64-bit value with 1000000000 or 1000000.

Is there no faster approximation for those than bit by bit?

In particular we could try something like:

	(high*2^32 + low)/1e9 ~==  ( high * (2^64/1e9) ) / 2^32

... which reduces it all to a 64-bit multiplication (or two 32-bit 
multiplications) with a known constant, at the cost of 1 nsec 
imprecision of the result - but that's an OK approximation in my 
opinion.

But it's late here and math is hard - let's go shopping ;-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17 21:19                         ` Ingo Molnar
@ 2011-10-17 21:22                           ` H. Peter Anvin
  2011-10-17 21:39                             ` Ingo Molnar
  2011-10-17 21:31                           ` Ingo Molnar
  1 sibling, 1 reply; 98+ messages in thread
From: H. Peter Anvin @ 2011-10-17 21:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky

On 10/17/2011 02:19 PM, Ingo Molnar wrote:
> 
> it's 64/32 division - it's the /1000000000 /1000000 /1000 divisions 
> in the large majority of cases, to convert between 
> seconds/milliseconds/microseconds and scalar nanoseconds.
> 
> the kernel-internal ktime_t in the 32-bit optimized case is:
> 
> union ktime {
>         s32     sec, nsec;
> };
> 
> which is the same as timespec and arithmetically close to timeval, 
> which many ABIs use. So conversion is easy in that case - but 
> arithmetics gets a bit harder.
> 
> If we used a scalar 64-bit form for all kernel internal time 
> representations:
> 
> 	s64	nsecs;
> 
> then conversions back to timespec/timeval would involve dividing this 
> 64-bit value with 1000000000 or 1000000.
> 
> Is there no faster approximation for those than bit by bit?
> 
> In particular we could try something like:
> 
> 	(high*2^32 + low)/1e9 ~==  ( high * (2^64/1e9) ) / 2^32
> 
> ... which reduces it all to a 64-bit multiplication (or two 32-bit 
> multiplications) with a known constant, at the cost of 1 nsec 
> imprecision of the result - but that's an OK approximation in my 
> opinion.
> 

We can do much better than that with reciprocal multiplication.  We're
already playing reciprocal multiplication tricks for jiffies conversion,
and in this case it's much easier because the constant is already known.

	-hpa


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17 21:19                         ` Ingo Molnar
  2011-10-17 21:22                           ` H. Peter Anvin
@ 2011-10-17 21:31                           ` Ingo Molnar
  1 sibling, 0 replies; 98+ messages in thread
From: Ingo Molnar @ 2011-10-17 21:31 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky


* Ingo Molnar <mingo@elte.hu> wrote:

> If we used a scalar 64-bit form for all kernel internal time 
> representations:
> 
> 	s64	nsecs;
> 
> then conversions back to timespec/timeval would involve dividing 
> this 64-bit value with 1000000000 or 1000000.
> 
> Is there no faster approximation for those than bit by bit?
> 
> In particular we could try something like:
> 
> 	(high*2^32 + low)/1e9 ~==  ( high * (2^64/1e9) ) / 2^32
> 
> ... which reduces it all to a 64-bit multiplication (or two 32-bit 
> multiplications) with a known constant, at the cost of 1 nsec 
> imprecision of the result - but that's an OK approximation in my 
> opinion.

Hm, no, the numeric error would be in the *seconds* result, and would 
be 0-3 seconds - which is obviously not acceptable.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17 21:22                           ` H. Peter Anvin
@ 2011-10-17 21:39                             ` Ingo Molnar
  2011-10-17 22:03                               ` Ingo Molnar
  2011-10-17 22:08                               ` H. Peter Anvin
  0 siblings, 2 replies; 98+ messages in thread
From: Ingo Molnar @ 2011-10-17 21:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 10/17/2011 02:19 PM, Ingo Molnar wrote:
> > 
> > it's 64/32 division - it's the /1000000000 /1000000 /1000 divisions 
> > in the large majority of cases, to convert between 
> > seconds/milliseconds/microseconds and scalar nanoseconds.
> > 
> > the kernel-internal ktime_t in the 32-bit optimized case is:
> > 
> > union ktime {
> >         s32     sec, nsec;
> > };
> > 
> > which is the same as timespec and arithmetically close to timeval, 
> > which many ABIs use. So conversion is easy in that case - but 
> > arithmetics gets a bit harder.
> > 
> > If we used a scalar 64-bit form for all kernel internal time 
> > representations:
> > 
> > 	s64	nsecs;
> > 
> > then conversions back to timespec/timeval would involve dividing this 
> > 64-bit value with 1000000000 or 1000000.
> > 
> > Is there no faster approximation for those than bit by bit?
> > 
> > In particular we could try something like:
> > 
> > 	(high*2^32 + low)/1e9 ~==  ( high * (2^64/1e9) ) / 2^32
> > 
> > ... which reduces it all to a 64-bit multiplication (or two 32-bit 
> > multiplications) with a known constant, at the cost of 1 nsec 
> > imprecision of the result - but that's an OK approximation in my 
> > opinion.
> > 
> 
> We can do much better than that with reciprocal multiplication.  

Yes, 2^64/1e9 is the reciprocal.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17 21:39                             ` Ingo Molnar
@ 2011-10-17 22:03                               ` Ingo Molnar
  2011-10-17 22:04                                 ` Ingo Molnar
  2011-10-17 22:08                               ` H. Peter Anvin
  1 sibling, 1 reply; 98+ messages in thread
From: Ingo Molnar @ 2011-10-17 22:03 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky


* Ingo Molnar <mingo@elte.hu> wrote:

> > > In particular we could try something like:
> > > 
> > > 	(high*2^32 + low)/1e9 ~==  ( high * (2^64/1e9) ) / 2^32
> > > 
> > > ... which reduces it all to a 64-bit multiplication (or two 
> > > 32-bit multiplications) with a known constant, at the cost of 1 
> > > nsec imprecision of the result - but that's an OK approximation 
> > > in my opinion.
> > > 
> > 
> > We can do much better than that with reciprocal multiplication.
> 
> Yes, 2^64/1e9 is the reciprocal.

So basically, to extend on the pseudocode above, we could do the 
equivalent of:

/* 2^64/1e9: */
#define MAGIC 18446744073ULL

        secs_fast = ((nsecs >> 32) * MAGIC) >> 32;
        secs_fast += (nsecs & 0xFFFFFFFF)/1000000000;

to get to the precise 'timeval.secs' field - these are all 32-bit 
operations: a 32-bit multiplication and a 32-bit division if i 
counted it right.

(Likewise we can get the remainder as well, for timeval.nsecs.)

So I think if we add 32-bit optimized reciprocal multiplication based 
timeval and timespec routines, we can change ktime_t to a simple 
scalar type on 64-bit and 32-bit architectures alike.

It would likely be faster as well: the 32-bit ktime operations are 
more complex than straightforward u64 operations.

Thomas, what do you think?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17 22:03                               ` Ingo Molnar
@ 2011-10-17 22:04                                 ` Ingo Molnar
  0 siblings, 0 replies; 98+ messages in thread
From: Ingo Molnar @ 2011-10-17 22:04 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky


* Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > > > In particular we could try something like:
> > > > 
> > > > 	(high*2^32 + low)/1e9 ~==  ( high * (2^64/1e9) ) / 2^32
> > > > 
> > > > ... which reduces it all to a 64-bit multiplication (or two 
> > > > 32-bit multiplications) with a known constant, at the cost of 1 
> > > > nsec imprecision of the result - but that's an OK approximation 
> > > > in my opinion.
> > > > 
> > > 
> > > We can do much better than that with reciprocal multiplication.
> > 
> > Yes, 2^64/1e9 is the reciprocal.
> 
> So basically, to extend on the pseudocode above, we could do the 
> equivalent of:
> 
> /* 2^64/1e9: */
> #define MAGIC 18446744073ULL
> 
>         secs_fast = ((nsecs >> 32) * MAGIC) >> 32;
>         secs_fast += (nsecs & 0xFFFFFFFF)/1000000000;
> 
> to get to the precise 'timeval.secs' field - these are all 32-bit 
> operations: a 32-bit multiplication and a 32-bit division if i 
> counted it right.
> 
> (Likewise we can get the remainder as well, for timeval.nsecs.)

that's timespec.nsecs - there it's timeval.usecs. The same argument
applies in both cases.

This would deobfuscate a rather important data type in the timer 
code.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17 21:39                             ` Ingo Molnar
  2011-10-17 22:03                               ` Ingo Molnar
@ 2011-10-17 22:08                               ` H. Peter Anvin
  2011-10-18  6:01                                 ` Ingo Molnar
  2011-10-18  7:12                                 ` Geert Uytterhoeven
  1 sibling, 2 replies; 98+ messages in thread
From: H. Peter Anvin @ 2011-10-17 22:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky

On 10/17/2011 02:39 PM, Ingo Molnar wrote:
>>
>> We can do much better than that with reciprocal multiplication.  
> 
> Yes, 2^64/1e9 is the reciprocal.
> 

What I mean is that it's pretty easy to work it so it doesn't have the
errors.  We have 32*32 = 64 multiplication on all 32-bit platforms I'm
99.9% sure.
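
(What this refers to, presumably, is the widening multiply the compiler
can emit when both operands are known to be only 32 bits wide -- an
illustrative sketch, with a made-up helper name:)

#include <stdint.h>

/*
 * 32x32 -> 64 widening multiply: on typical 32-bit targets this becomes
 * a single mul/umull-style instruction rather than a call to the full
 * 64x64 __muldi3 libgcc routine.
 */
static inline uint64_t mul_u32_u32(uint32_t a, uint32_t b)
{
	return (uint64_t)a * b;
}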

	-hpa


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-12 21:35           ` Simon Kirby
  2011-10-13 23:25             ` Simon Kirby
@ 2011-10-18  5:40             ` Simon Kirby
  1 sibling, 0 replies; 98+ messages in thread
From: Simon Kirby @ 2011-10-18  5:40 UTC (permalink / raw)
  To: Linux Kernel Mailing List, netdev

On Wed, Oct 12, 2011 at 02:35:55PM -0700, Simon Kirby wrote:

> > > patching file kernel/posix-cpu-timers.c
> > > patching file kernel/sched_stats.h 
> > 
> > yes that would be fine.
> 
> This patch (s/raw_//) has been stable on 5 boxes for a day. I'll push to
> another 15 shortly and confirm tomorrow. Meanwhile, we had another ~4
> boxes lock up on 3.1-rc9 _with_ d670ec13 reverted (all CPUs spinning),
> but there weren't enough serial cables to log all of them and we haven't
> been lucky enough to capture anything other than what fits on 80x25.
> I'm hoping it's just the same bug you've already fixed.

Looks to be a different bug. It just happened on a box with serial
console logging, on the same build I was testing the above patch on --
Linus master circa Oct 7th. This seems to be specific to TCP. I'm not
sure what is with all of the doubled backtraces. I've only seen this on
a couple of different boxes so far.

Full log at http://0x.ca/sim/ref/3.1-rc9/3.1-rc9-tcp-lockup.log

First 100 lines:

[516112.140013] BUG: soft lockup - CPU#0 stuck for 22s! [swapper:0]
[516112.144001] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[516112.144001] CPU 0 
[516112.144001] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[516112.144001] 
[516112.144001] Pid: 0, comm: swapper Not tainted 3.1.0-rc9-hw+ #48 Dell Inc. PowerEdge 1950/0UR033
[516112.144001] RIP: 0010:[<ffffffff816b6694>]  [<ffffffff816b6694>] _raw_spin_lock+0x14/0x20
[516112.144001] RSP: 0018:ffff88022fc03e10  EFLAGS: 00000297
[516112.144001] RAX: 0000000000000100 RBX: ffffffff81022674 RCX: ffffffff81b4df20
[516112.144001] RDX: ffff8801002aebe0 RSI: dead000000200200 RDI: ffff8801002ad188
[516112.144001] RBP: ffff88022fc03e10 R08: 00000000000000f7 R09: 0000000000000000
[516112.144001] R10: 0000000000000000 R11: 0000000000000010 R12: ffff88022fc03d88
[516112.144001] R13: ffffffff816bed1e R14: ffff88022fc03e10 R15: ffffffff81b4df00
[516112.144001] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
[516112.244020] BUG: soft lockup - CPU#1 stuck for 22s! [kworker/0:0:0]
[516112.244024] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[516112.244033] CPU 1 
[516112.244035] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[516112.244041] 
[516112.244044] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc9-hw+ #48 Dell Inc. PowerEdge 1950/0UR033
[516112.244048] RIP: 0010:[<ffffffff816b6694>]  [<ffffffff816b6694>] _raw_spin_lock+0x14/0x20
[516112.244057] RSP: 0018:ffff88022fc43e10  EFLAGS: 00000297
[516112.244059] RAX: 0000000000000100 RBX: ffffffff81022674 RCX: ffff880226888020
[516112.244062] RDX: ffff88001ece1aa0 RSI: dead000000200200 RDI: ffff88001ece1f88
[516112.244064] RBP: ffff88022fc43e10 R08: 00000000000000df R09: 0000000000000000
[516112.244066] R10: 0000000000000000 R11: 0000000000000010 R12: ffff88022fc43d88
[516112.244068] R13: ffffffff816bed1e R14: ffff88022fc43e10 R15: ffff880226888000
[516112.244071] FS:  0000000000000000(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
[516112.244074] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[516112.244076] CR2: ffffffffff600400 CR3: 0000000126d93000 CR4: 00000000000006e0
[516112.244078] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[516112.244081] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[516112.244083] Process kworker/0:0 (pid: 0, threadinfo ffff880226918000, task ffff880226911640)
[516112.244085] Stack:
[516112.244086]  ffff88022fc43e40 ffffffff8162a613 0000000000000000 0000000000000000
[516112.244090]  ffff880226888000 ffff88001ece20e0 ffff88022fc43ee0 ffffffff810692dc
[516112.244094]  0000000000000000 ffff880226919fd8 ffff880226919fd8 ffff880226919fd8
[516112.244098] Call Trace:
[516112.244099]  <IRQ> 
[516112.244105]  [<ffffffff8162a613>] tcp_keepalive_timer+0x23/0x260
[516112.244110]  [<ffffffff810692dc>] run_timer_softirq+0x1ac/0x310
[516112.244113]  [<ffffffff8162a5f0>] ? tcp_init_xmit_timers+0x20/0x20
[516112.244118]  [<ffffffff8102e838>] ? lapic_next_event+0x18/0x20
[516112.244121]  [<ffffffff81060bf0>] __do_softirq+0xe0/0x1d0
[516112.244125]  [<ffffffff816c04ac>] call_softirq+0x1c/0x30
[516112.244129]  [<ffffffff81014255>] do_softirq+0x65/0xa0
[516112.244132]  [<ffffffff810608fd>] irq_exit+0xad/0xe0
[516112.244135]  [<ffffffff8102f569>] smp_apic_timer_interrupt+0x69/0xa0
[516112.244139]  [<ffffffff816bed1e>] apic_timer_interrupt+0x6e/0x80
[516112.244140]  <EOI> 
[516112.244144]  [<ffffffff8101a337>] ? mwait_idle+0x117/0x120
[516112.244147]  [<ffffffff810120c6>] cpu_idle+0x86/0xe0
[516112.244151]  [<ffffffff816ae77c>] start_secondary+0x1a3/0x1e7
[516112.244153] Code: 0f b6 c2 85 c0 c9 0f 95 c0 0f b6 c0 c3 66 2e 0f 1f 84 00 00 00 00 00 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 38 e0 74 06 f3 90 <8a> 07 eb f6 c9 c3 66 0f 1f 44 00 00 55 48 89 e5 9c 58 66 66 90 
[516112.244173] Call Trace:
[516112.244174]  <IRQ>  [<ffffffff8162a613>] tcp_keepalive_timer+0x23/0x260
[516112.244179]  [<ffffffff810692dc>] run_timer_softirq+0x1ac/0x310
[516112.244182]  [<ffffffff8162a5f0>] ? tcp_init_xmit_timers+0x20/0x20
[516112.244185]  [<ffffffff8102e838>] ? lapic_next_event+0x18/0x20
[516112.244188]  [<ffffffff81060bf0>] __do_softirq+0xe0/0x1d0
[516112.244191]  [<ffffffff816c04ac>] call_softirq+0x1c/0x30
[516112.244194]  [<ffffffff81014255>] do_softirq+0x65/0xa0
[516112.244197]  [<ffffffff810608fd>] irq_exit+0xad/0xe0
[516112.244199]  [<ffffffff8102f569>] smp_apic_timer_interrupt+0x69/0xa0
[516112.244202]  [<ffffffff816bed1e>] apic_timer_interrupt+0x6e/0x80
[516112.244204]  <EOI>  [<ffffffff8101a337>] ? mwait_idle+0x117/0x120
[516112.244209]  [<ffffffff810120c6>] cpu_idle+0x86/0xe0
[516112.244212]  [<ffffffff816ae77c>] start_secondary+0x1a3/0x1e7
[516112.344023] BUG: soft lockup - CPU#2 stuck for 23s! [php:1486]
[516112.344025] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[516112.344033] CPU 2 
[516112.344034] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[516112.344040] 
[516112.344042] Pid: 1486, comm: php Not tainted 3.1.0-rc9-hw+ #48 Dell Inc. PowerEdge 1950/0UR033
[516112.344046] RIP: 0010:[<ffffffff816b6694>]  [<ffffffff816b6694>] _raw_spin_lock+0x14/0x20
[516112.344051] RSP: 0000:ffff88022fc83e10  EFLAGS: 00000297
[516112.344053] RAX: 0000000000000100 RBX: ffffffff81022674 RCX: ffff880226920020
[516112.344056] RDX: ffff88022198c660 RSI: dead000000200200 RDI: ffff8800ac758cc8
[516112.344058] RBP: ffff88022fc83e10 R08: 00000000000000ef R09: 0000000000000000
[516112.344060] R10: 000000000000018b R11: 0000000000000010 R12: ffff88022fc83d88
[516112.344062] R13: ffffffff816bed1e R14: ffff88022fc83e10 R15: ffff880226920000
[516112.344065] FS:  00007faafda03720(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
[516112.344068] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[516112.344070] CR2: ffffffffff600400 CR3: 00000002223de000 CR4: 00000000000006e0
[516112.344072] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[516112.344075] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[516112.344077] Process php (pid: 1486, threadinfo ffff880039262000, task ffff88003e675900)
[516112.344079] Stack:
[516112.344081]  ffff88022fc83e40 ffffffff8162a613 0000000000000000 0000000000000000
[516112.344084]  ffff880226920000 ffff8800ac758e20 ffff88022fc83ee0 ffffffff810692dc
[516112.344088]  0000000000000001 ffff880039263fd8 ffff880039263fd8 ffff880039263fd8
[516112.344091] Call Trace:
[516112.344093]  <IRQ> 
[516112.344099]  [<ffffffff8162a613>] tcp_keepalive_timer+0x23/0x260
[516112.344104]  [<ffffffff810692dc>] run_timer_softirq+0x1ac/0x310
[516112.344107]  [<ffffffff8162a5f0>] ? tcp_init_xmit_timers+0x20/0x20
[516112.344111]  [<ffffffff8102e838>] ? lapic_next_event+0x18/0x20
[516112.344115]  [<ffffffff81060bf0>] __do_softirq+0xe0/0x1d0
[516112.344119]  [<ffffffff816c04ac>] call_softirq+0x1c/0x30
[516112.344123]  [<ffffffff81014255>] do_softirq+0x65/0xa0

Simon-

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17 22:08                               ` H. Peter Anvin
@ 2011-10-18  6:01                                 ` Ingo Molnar
  2011-10-18  7:12                                 ` Geert Uytterhoeven
  1 sibling, 0 replies; 98+ messages in thread
From: Ingo Molnar @ 2011-10-18  6:01 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 10/17/2011 02:39 PM, Ingo Molnar wrote:
> >>
> >> We can do much better than that with reciprocal multiplication.  
> > 
> > Yes, 2^64/1e9 is the reciprocal.
> > 
> 
> What I mean is that it's pretty easy to work it so it doesn't have 
> the errors. [...]

Yeah - the second pseudocode i gave will do that with no errors.

> [...] We have 32*32 = 64 multiplication on all 32-bit platforms I'm 
> 99.9% sure.

Good.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17 22:08                               ` H. Peter Anvin
  2011-10-18  6:01                                 ` Ingo Molnar
@ 2011-10-18  7:12                                 ` Geert Uytterhoeven
  2011-10-18 18:50                                   ` H. Peter Anvin
  1 sibling, 1 reply; 98+ messages in thread
From: Geert Uytterhoeven @ 2011-10-18  7:12 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds, Simon Kirby,
	Peter Zijlstra, Linux Kernel Mailing List, Dave Jones,
	Martin Schwidefsky

On Tue, Oct 18, 2011 at 00:08, H. Peter Anvin <hpa@zytor.com> wrote:
> On 10/17/2011 02:39 PM, Ingo Molnar wrote:
>>> We can do much better than that with reciprocal multiplication.
>>
>> Yes, 2^64/1e9 is the reciprocal.
>
> What I mean is that it's pretty easy to work it so it doesn't have the
> errors.  We have 32*32 = 64 multiplication on all 32-bit platforms I'm
> 99.9% sure.

I assume you mean "we have in hardware"?

Is that muldi3?

$ git ls-files "*muldi3*"
arch/arm/lib/muldi3.S
arch/blackfin/lib/muldi3.S
arch/frv/lib/__muldi3.S
arch/m68k/lib/muldi3.c
arch/microblaze/lib/muldi3.c
arch/sparc/lib/muldi3.S
$

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17 20:48                   ` H. Peter Anvin
@ 2011-10-18  7:20                     ` Martin Schwidefsky
  0 siblings, 0 replies; 98+ messages in thread
From: Martin Schwidefsky @ 2011-10-18  7:20 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Thomas Gleixner,
	Ingo Molnar

On Mon, 17 Oct 2011 13:48:33 -0700
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On 10/17/2011 12:55 AM, Martin Schwidefsky wrote:
> > 
> > I introduced those macros to find all the places in the kernel operating
> > on a cputime value. The additional debug patch defined cputime_t as a
> > struct which contained a single u64. That way I got a compiler error
> > for every place I missed.
> > 
> 
> And was there a reason that that structure thingy didn't get merged?

Oh yes, it is fragile, hackish and ugly as hell.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17 21:00                           ` Thomas Gleixner
@ 2011-10-18  8:39                             ` Thomas Gleixner
  2011-10-18  9:05                               ` Peter Zijlstra
  0 siblings, 1 reply; 98+ messages in thread
From: Thomas Gleixner @ 2011-10-18  8:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Simon Kirby, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar

On Mon, 17 Oct 2011, Thomas Gleixner wrote:
> That said, I really need some sleep before I can make a final
> judgement on that horror. The call paths are such an intermingled mess
> that it's not funny anymore. I do that tomorrow morning first thing.

The patch is safe and the exit race just existed in my confused tired
brain. Peter, can you please provide a changelog. That wants a cc
stable as well, because that deadlock causing commit hit 3.0.7 :(

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-18  8:39                             ` Thomas Gleixner
@ 2011-10-18  9:05                               ` Peter Zijlstra
  2011-10-18 14:59                                 ` Linus Torvalds
                                                   ` (2 more replies)
  0 siblings, 3 replies; 98+ messages in thread
From: Peter Zijlstra @ 2011-10-18  9:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Simon Kirby, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar

On Tue, 2011-10-18 at 10:39 +0200, Thomas Gleixner wrote:
> On Mon, 17 Oct 2011, Thomas Gleixner wrote:
> > That said, I really need some sleep before I can make a final
> > judgement on that horror. The call paths are such an intermingled mess
> > that it's not funny anymore. I do that tomorrow morning first thing.
> 
> The patch is safe and the exit race just existed in my confused tired
> brain. Peter, can you please provide a changelog. That wants a cc
> stable as well, because that deadlock causing commit hit 3.0.7 :( 

---
Subject: cputimer: Cure lock inversion
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Mon Oct 17 11:50:30 CEST 2011

There's a lock inversion between the cputimer->lock and rq->lock; notably
the two callchains involved are:

 update_rlimit_cpu()
   sighand->siglock
   set_process_cpu_timer()
     cpu_timer_sample_group()
       thread_group_cputimer()
         cputimer->lock
         thread_group_cputime()
           task_sched_runtime()
             ->pi_lock
             rq->lock

 scheduler_tick()
   rq->lock
   task_tick_fair()
     update_curr()
       account_group_exec()
         cputimer->lock

Where the first one is enabling a CLOCK_PROCESS_CPUTIME_ID timer, and
the second one is keeping up-to-date.

This problem was introduced by e8abccb7193 ("posix-cpu-timers: Cure
SMP accounting oddities").

Cure the problem by removing the cputimer->lock and rq->lock nesting;
this leaves concurrent enablers doing duplicate work, but the time
wasted should be on the same order as that otherwise wasted spinning
on the lock, and the greater-than assignment filter should ensure we
preserve monotonicity.

Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Simon Kirby <sim@hostway.ca>
Cc: stable@kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/posix-cpu-timers.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
Index: linux-2.6/kernel/posix-cpu-timers.c
===================================================================
--- linux-2.6.orig/kernel/posix-cpu-timers.c
+++ linux-2.6/kernel/posix-cpu-timers.c
@@ -274,9 +274,7 @@ void thread_group_cputimer(struct task_s
 	struct task_cputime sum;
 	unsigned long flags;
 
-	spin_lock_irqsave(&cputimer->lock, flags);
 	if (!cputimer->running) {
-		cputimer->running = 1;
 		/*
 		 * The POSIX timer interface allows for absolute time expiry
 		 * values through the TIMER_ABSTIME flag, therefore we have
@@ -284,8 +282,11 @@ void thread_group_cputimer(struct task_s
 		 * it.
 		 */
 		thread_group_cputime(tsk, &sum);
+		spin_lock_irqsave(&cputimer->lock, flags);
+		cputimer->running = 1;
 		update_gt_cputime(&cputimer->cputime, &sum);
-	}
+	} else
+		spin_lock_irqsave(&cputimer->lock, flags);
 	*times = cputimer->cputime;
 	spin_unlock_irqrestore(&cputimer->lock, flags);
 }
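
The inversion described in the changelog is the classic AB-BA pattern;
a hypothetical stripped-down version (just the shape of the bug, not
the real code paths):

#include <linux/spinlock.h>

/* hypothetical stand-ins for the two locks from the changelog: */
static DEFINE_SPINLOCK(timer_lock);		/* cputimer->lock */
static DEFINE_RAW_SPINLOCK(runq_lock);		/* rq->lock */

static void arm_timer_path(void)	/* like thread_group_cputimer() before the fix */
{
	unsigned long flags;

	spin_lock_irqsave(&timer_lock, flags);
	raw_spin_lock(&runq_lock);		/* A -> B */
	raw_spin_unlock(&runq_lock);
	spin_unlock_irqrestore(&timer_lock, flags);
}

static void tick_path(void)		/* like scheduler_tick() -> account_group_exec() */
{
	unsigned long flags;

	raw_spin_lock_irqsave(&runq_lock, flags);
	spin_lock(&timer_lock);			/* B -> A: inverted ordering */
	spin_unlock(&timer_lock);
	raw_spin_unlock_irqrestore(&runq_lock, flags);
}

With the patch, thread_group_cputime() runs before cputimer->lock is
taken, so the A -> B nesting in the first path goes away.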


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-18  9:05                               ` Peter Zijlstra
@ 2011-10-18 14:59                                 ` Linus Torvalds
  2011-10-18 15:26                                   ` Thomas Gleixner
                                                     ` (2 more replies)
  2011-10-18 16:13                                 ` Linux 3.1-rc9 Dave Jones
  2011-10-18 18:20                                 ` Simon Kirby
  2 siblings, 3 replies; 98+ messages in thread
From: Linus Torvalds @ 2011-10-18 14:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Simon Kirby, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar

On Tue, Oct 18, 2011 at 2:05 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> Subject: cputimer: Cure lock inversion
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Mon Oct 17 11:50:30 CEST 2011
>
> There's a lock inversion between the cputimer->lock and rq->lock; notably
> the two callchains involved are:

Thanks, looks nice and small. Simon - can you check that this works for you?

Thomas/Ingo - once confirmed by Simon, should I take it directly or
will this come through your trees?

                                 Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-18 14:59                                 ` Linus Torvalds
@ 2011-10-18 15:26                                   ` Thomas Gleixner
  2011-10-18 18:07                                   ` Ingo Molnar
  2011-10-18 18:14                                   ` [GIT PULL] timer fix Ingo Molnar
  2 siblings, 0 replies; 98+ messages in thread
From: Thomas Gleixner @ 2011-10-18 15:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Simon Kirby, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar

On Tue, 18 Oct 2011, Linus Torvalds wrote:

> On Tue, Oct 18, 2011 at 2:05 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >
> > Subject: cputimer: Cure lock inversion
> > From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Date: Mon Oct 17 11:50:30 CEST 2011
> >
> > There's a lock inversion between the cputimer->lock and rq->lock; notably
> > the two callchains involved are:
> 
> Thanks, looks nice and small. Simon - can you check that this works for you?
> 
> Thomas/Ingo - once confirmed by Simon, should I take it directly or
> will this come through your trees?

I have it queued in timers/urgent and run it through testing at the
moment. Will send a pull request once confirmed.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-18  9:05                               ` Peter Zijlstra
  2011-10-18 14:59                                 ` Linus Torvalds
@ 2011-10-18 16:13                                 ` Dave Jones
  2011-10-18 18:20                                 ` Simon Kirby
  2 siblings, 0 replies; 98+ messages in thread
From: Dave Jones @ 2011-10-18 16:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby,
	Linux Kernel Mailing List, Martin Schwidefsky, Ingo Molnar

On Tue, Oct 18, 2011 at 11:05:13AM +0200, Peter Zijlstra wrote:
 
 > Reported-by: Dave Jones <davej@redhat.com>

Ok, feel free to add a 
 
Tested-by: Dave Jones <davej@redhat.com>

too. Seems to do the right thing here.

	Dave

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-18 14:59                                 ` Linus Torvalds
  2011-10-18 15:26                                   ` Thomas Gleixner
@ 2011-10-18 18:07                                   ` Ingo Molnar
  2011-10-18 18:14                                   ` [GIT PULL] timer fix Ingo Molnar
  2 siblings, 0 replies; 98+ messages in thread
From: Ingo Molnar @ 2011-10-18 18:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Thomas Gleixner, Simon Kirby,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Oct 18, 2011 at 2:05 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >
> > Subject: cputimer: Cure lock inversion
> > From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Date: Mon Oct 17 11:50:30 CEST 2011
> >
> > There's a lock inversion between the cputimer->lock and rq->lock; notably
> > the two callchains involved are:
> 
> Thanks, looks nice and small. Simon - can you check that this works for you?
> 
> Thomas/Ingo - once confirmed by Simon, should I take it directly or
> will this come through your trees?

Yeah, we have it in -tip already and it was tested all day, lemme 
cook up a pull request to not hold up the v3.1 release much longer..

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [GIT PULL] timer fix
  2011-10-18 14:59                                 ` Linus Torvalds
  2011-10-18 15:26                                   ` Thomas Gleixner
  2011-10-18 18:07                                   ` Ingo Molnar
@ 2011-10-18 18:14                                   ` Ingo Molnar
  2 siblings, 0 replies; 98+ messages in thread
From: Ingo Molnar @ 2011-10-18 18:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Thomas Gleixner, Simon Kirby,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Andrew Morton


Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://tesla.tglx.de/git/linux-2.6-tip.git timers-urgent-for-linus

 Thanks,

	Ingo

------------------>
Peter Zijlstra (1):
      cputimer: Cure lock inversion


 kernel/posix-cpu-timers.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index c8008dd..640ded8 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -274,9 +274,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 	struct task_cputime sum;
 	unsigned long flags;
 
-	spin_lock_irqsave(&cputimer->lock, flags);
 	if (!cputimer->running) {
-		cputimer->running = 1;
 		/*
 		 * The POSIX timer interface allows for absolute time expiry
 		 * values through the TIMER_ABSTIME flag, therefore we have
@@ -284,8 +282,11 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 		 * it.
 		 */
 		thread_group_cputime(tsk, &sum);
+		spin_lock_irqsave(&cputimer->lock, flags);
+		cputimer->running = 1;
 		update_gt_cputime(&cputimer->cputime, &sum);
-	}
+	} else
+		spin_lock_irqsave(&cputimer->lock, flags);
 	*times = cputimer->cputime;
 	spin_unlock_irqrestore(&cputimer->lock, flags);
 }

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-18  9:05                               ` Peter Zijlstra
  2011-10-18 14:59                                 ` Linus Torvalds
  2011-10-18 16:13                                 ` Linux 3.1-rc9 Dave Jones
@ 2011-10-18 18:20                                 ` Simon Kirby
  2011-10-18 19:48                                   ` Thomas Gleixner
  2 siblings, 1 reply; 98+ messages in thread
From: Simon Kirby @ 2011-10-18 18:20 UTC (permalink / raw)
  To: Peter Zijlstra, Linus Torvalds
  Cc: Thomas Gleixner, Linux Kernel Mailing List, Dave Jones,
	Martin Schwidefsky, Ingo Molnar

On Tue, Oct 18, 2011 at 11:05:13AM +0200, Peter Zijlstra wrote:

> Subject: cputimer: Cure lock inversion
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Mon Oct 17 11:50:30 CEST 2011
> 
> There's a lock inversion between the cputimer->lock and rq->lock; notably
> the two callchains involved are:
> 
>  update_rlimit_cpu()
>    sighand->siglock
>    set_process_cpu_timer()
>      cpu_timer_sample_group()
>        thread_group_cputimer()
>          cputimer->lock
>          thread_group_cputime()
>            task_sched_runtime()
>              ->pi_lock
>              rq->lock
> 
>  scheduler_tick()
>    rq->lock
>    task_tick_fair()
>      update_curr()
>        account_group_exec()
>          cputimer->lock
> 
> Where the first one is enabling a CLOCK_PROCESS_CPUTIME_ID timer, and
> the second one is keeping up-to-date.
> 
> This problem was introduced by e8abccb7193 ("posix-cpu-timers: Cure
> SMP accounting oddities").
> 
> Cure the problem by removing the cputimer->lock and rq->lock nesting;
> this leaves concurrent enablers doing duplicate work, but the time
> wasted should be on the same order as that otherwise wasted spinning
> on the lock, and the greater-than assignment filter should ensure we
> preserve monotonicity.
> 
> Reported-by: Dave Jones <davej@redhat.com>
> Reported-by: Simon Kirby <sim@hostway.ca>
> Cc: stable@kernel.org
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  kernel/posix-cpu-timers.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> Index: linux-2.6/kernel/posix-cpu-timers.c
> ===================================================================
> --- linux-2.6.orig/kernel/posix-cpu-timers.c
> +++ linux-2.6/kernel/posix-cpu-timers.c
> @@ -274,9 +274,7 @@ void thread_group_cputimer(struct task_s
>  	struct task_cputime sum;
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&cputimer->lock, flags);
>  	if (!cputimer->running) {
> -		cputimer->running = 1;
>  		/*
>  		 * The POSIX timer interface allows for absolute time expiry
>  		 * values through the TIMER_ABSTIME flag, therefore we have
> @@ -284,8 +282,11 @@ void thread_group_cputimer(struct task_s
>  		 * it.
>  		 */
>  		thread_group_cputime(tsk, &sum);
> +		spin_lock_irqsave(&cputimer->lock, flags);
> +		cputimer->running = 1;
>  		update_gt_cputime(&cputimer->cputime, &sum);
> -	}
> +	} else
> +		spin_lock_irqsave(&cputimer->lock, flags);
>  	*times = cputimer->cputime;
>  	spin_unlock_irqrestore(&cputimer->lock, flags);
>  }
> 

Tested-by: Simon Kirby <sim@hostway.ca>

Looks good running on three boxes since this morning (unpatched kernel
hangs in ~15 minutes).

While I have your eyes, does this hang trace make any sense (which
happened a couple of times with your previous patch applied)?

http://0x.ca/sim/ref/3.1-rc9/3.1-rc9-tcp-lockup.log

I don't see how all CPUs could be spinning on the same lock without
reentry, and I don't see any in the backtraces.

Simon-

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-18  7:12                                 ` Geert Uytterhoeven
@ 2011-10-18 18:50                                   ` H. Peter Anvin
  0 siblings, 0 replies; 98+ messages in thread
From: H. Peter Anvin @ 2011-10-18 18:50 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds, Simon Kirby,
	Peter Zijlstra, Linux Kernel Mailing List, Dave Jones,
	Martin Schwidefsky

On 10/18/2011 12:12 AM, Geert Uytterhoeven wrote:
> 
> I assume you mean "we have in hardware"?
> 

No.

> Is that muldi3?

No.

	-hpa

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-18 18:20                                 ` Simon Kirby
@ 2011-10-18 19:48                                   ` Thomas Gleixner
  2011-10-18 20:12                                     ` Linus Torvalds
  2011-10-24 19:02                                     ` Simon Kirby
  0 siblings, 2 replies; 98+ messages in thread
From: Thomas Gleixner @ 2011-10-18 19:48 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Peter Zijlstra, Linus Torvalds, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar, David Miller

On Tue, 18 Oct 2011, Simon Kirby wrote:
> Looks good running on three boxes since this morning (unpatched kernel
> hangs in ~15 minutes).
> 
> While I have your eyes, does this hang trace make any sense (which
> happened a couple of times with your previous patch applied)?
> 
> http://0x.ca/sim/ref/3.1-rc9/3.1-rc9-tcp-lockup.log
> 
> I don't see how all CPUs could be spinning on the same lock without
> reentry, and I don't see the any in the backtraces.

Weird.

Which version of Peter's patches was this, the extra lock or the
atomic64 thingy?

It does not look related. Could you try to reproduce that problem with
lockdep enabled? lockdep might make it go away, but it's definitely
worth a try.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-18 19:48                                   ` Thomas Gleixner
@ 2011-10-18 20:12                                     ` Linus Torvalds
  2011-10-25 15:26                                       ` Simon Kirby
  2011-10-24 19:02                                     ` Simon Kirby
  1 sibling, 1 reply; 98+ messages in thread
From: Linus Torvalds @ 2011-10-18 20:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Simon Kirby, Peter Zijlstra, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar, David Miller

On Tue, Oct 18, 2011 at 12:48 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> It does not look related.

Yeah, the only lock held there seems to be the socket lock, and it
looks like all CPUs are spinning on it.

> Could you try to reproduce that problem with
> lockdep enabled? lockdep might make it go away, but it's definitely
> worth a try.

And DEBUG_SPINLOCK / DEBUG_SPINLOCK_SLEEP too. Maybe you're triggering
some odd networking thing.  It sounds unlikely, but maybe some error
case you get into doesn't release the socket lock.

I think PROVE_LOCKING already enables DEBUG_SPINLOCK, but the sleeping
lock thing is separate, iirc.

                    Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-17  1:39               ` Linus Torvalds
                                   ` (2 preceding siblings ...)
  2011-10-17 10:34                 ` Peter Zijlstra
@ 2011-10-20 14:36                 ` Martin Schwidefsky
  2011-10-23 11:34                   ` Ingo Molnar
  3 siblings, 1 reply; 98+ messages in thread
From: Martin Schwidefsky @ 2011-10-20 14:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Simon Kirby, Peter Zijlstra, Linux Kernel Mailing List,
	Dave Jones, Thomas Gleixner, Ingo Molnar

On Sun, 16 Oct 2011 18:39:57 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> That stupid definition of cputime_add() has apparently existed as-is
> since it was introduced in 2005. Why do we have code like this:
> 
>     times->utime = cputime_add(times->utime, t->utime);
> 
> instead of just
> 
>     times->utime += t->utime;
> 
> which seems not just shorter, but more readable too? The reason is not
> some type safety in the cputime_add() thing, it's just a macro.
> 
> Added Martin and Ingo to the discussion - Martin because he added that
> cputime_add in the first place, and Ingo because he gets the most hits
> on kernel/sched_stats.h. Guys - you can see the history on lkml.

I tried my luck with cputime and sparse. It seems to work, I've added
sparse __nocast to the typedefs of cputime_t and cputime64_t and removed
all cputime macros for simple scalar operations on cputime.

Compiling a x86-64 tree with C=1 still gives a few warnings:

1) sparse creates a warning if a pointer to a nocast variable is created.
   Is that intentional?
2) uptime_proc_show is borked, it uses a cputime_t to accumulate the idle
   time over the processors. This will overflow on x86-32 after 12.45 days
   of idle time. The __nocast check of sparse correctly identifies this
   as a problem:
   fs/proc/uptime.c:18:49: warning: implicit cast to/from nocast type.
3) cpufreq governors do strange things with cputime, e.g. wall time that
   is kept in a cputime64_t.

The patch is quite big. Comments ?
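
As a rough illustration of the kind of misuse the __nocast annotation
catches under "make C=1" (a standalone sketch; the variable names are
made up and not from the patch):

#include <linux/compiler.h>
#include <linux/types.h>

/* sketch only -- the real typedef lives in the asm cputime.h headers: */
typedef u64 __nocast cputime_t;

static u64 nsecs;
static cputime_t total;

static void example(cputime_t delta)
{
	total += delta;			/* same type on both sides: fine */
	nsecs += delta;			/* sparse: implicit cast to/from nocast type */
	nsecs += (__force u64) delta;	/* explicit override: fine */
}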

---
Subject: [PATCH] cputime: add sparse checking and cleanup

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

Make cputime_t and cputime64_t nocast to enable sparse checking to
detect incorrect use of cputime. Drop the cputime macros for simple
scalar operations. The conversion macros are still needed as well as
the cputime_zero, cputime64_zero and cputime_one_jiffy defines.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 arch/ia64/include/asm/cputime.h        |   72 ++++++++---------
 arch/powerpc/include/asm/cputime.h     |   71 ++++++----------
 arch/s390/include/asm/cputime.h        |  139 +++++++++++++++------------------
 drivers/cpufreq/cpufreq_conservative.c |   27 +++---
 drivers/cpufreq/cpufreq_ondemand.c     |   31 +++----
 drivers/cpufreq/cpufreq_stats.c        |    5 -
 drivers/macintosh/rack-meter.c         |   11 --
 fs/proc/array.c                        |    4 
 fs/proc/stat.c                         |   25 ++---
 fs/proc/uptime.c                       |    2 
 include/asm-generic/cputime.h          |   64 +++++++--------
 kernel/acct.c                          |    4 
 kernel/cpu.c                           |    3 
 kernel/exit.c                          |   22 +----
 kernel/itimer.c                        |   19 ++--
 kernel/posix-cpu-timers.c              |  125 ++++++++++++-----------------
 kernel/sched.c                         |   80 ++++++++----------
 kernel/sched_stats.h                   |    6 -
 kernel/signal.c                        |    6 -
 kernel/sys.c                           |    4 
 kernel/tsacct.c                        |    4 
 21 files changed, 318 insertions(+), 406 deletions(-)

--- a/arch/ia64/include/asm/cputime.h
+++ b/arch/ia64/include/asm/cputime.h
@@ -26,59 +26,54 @@
 #include <linux/jiffies.h>
 #include <asm/processor.h>
 
-typedef u64 cputime_t;
-typedef u64 cputime64_t;
+typedef u64 __nocast cputime_t;
+typedef u64 __nocast cputime64_t;
 
-#define cputime_zero			((cputime_t)0)
+#define cputime_zero			((__force cputime_t) 0ULL)
 #define cputime_one_jiffy		jiffies_to_cputime(1)
-#define cputime_max			((~((cputime_t)0) >> 1) - 1)
-#define cputime_add(__a, __b)		((__a) +  (__b))
-#define cputime_sub(__a, __b)		((__a) -  (__b))
-#define cputime_div(__a, __n)		((__a) /  (__n))
-#define cputime_halve(__a)		((__a) >> 1)
-#define cputime_eq(__a, __b)		((__a) == (__b))
-#define cputime_gt(__a, __b)		((__a) >  (__b))
-#define cputime_ge(__a, __b)		((__a) >= (__b))
-#define cputime_lt(__a, __b)		((__a) <  (__b))
-#define cputime_le(__a, __b)		((__a) <= (__b))
-
-#define cputime64_zero			((cputime64_t)0)
-#define cputime64_add(__a, __b)		((__a) + (__b))
-#define cputime64_sub(__a, __b)		((__a) - (__b))
-#define cputime_to_cputime64(__ct)	(__ct)
+
+#define cputime64_zero			((__force cputime64_t) 0ULL)
 
 /*
  * Convert cputime <-> jiffies (HZ)
  */
-#define cputime_to_jiffies(__ct)	((__ct) / (NSEC_PER_SEC / HZ))
-#define jiffies_to_cputime(__jif)	((__jif) * (NSEC_PER_SEC / HZ))
-#define cputime64_to_jiffies64(__ct)	((__ct) / (NSEC_PER_SEC / HZ))
-#define jiffies64_to_cputime64(__jif)	((__jif) * (NSEC_PER_SEC / HZ))
+#define cputime_to_jiffies(__ct)	\
+	((__force u64)(__ct) / (NSEC_PER_SEC / HZ))
+#define jiffies_to_cputime(__jif)	\
+	(__force cputime_t)((__jif) * (NSEC_PER_SEC / HZ))
+#define cputime64_to_jiffies64(__ct)	\
+	((__force u64)(__ct) / (NSEC_PER_SEC / HZ))
+#define jiffies64_to_cputime64(__jif)	\
+	(__force cputime64_t)((__jif) * (NSEC_PER_SEC / HZ))
 
 /*
  * Convert cputime <-> microseconds
  */
-#define cputime_to_usecs(__ct)		((__ct) / NSEC_PER_USEC)
-#define usecs_to_cputime(__usecs)	((__usecs) * NSEC_PER_USEC)
+#define cputime_to_usecs(__ct)		\
+	((__force u64)(__ct) / NSEC_PER_USEC)
+#define usecs_to_cputime(__usecs)	\
+	(__force cputime_t)((__usecs) * NSEC_PER_USEC)
 
 /*
  * Convert cputime <-> seconds
  */
-#define cputime_to_secs(__ct)		((__ct) / NSEC_PER_SEC)
-#define secs_to_cputime(__secs)		((__secs) * NSEC_PER_SEC)
+#define cputime_to_secs(__ct)		\
+	((__force u64)(__ct) / NSEC_PER_SEC)
+#define secs_to_cputime(__secs)		\
+	(__force cputime_t)((__secs) * NSEC_PER_SEC)
 
 /*
  * Convert cputime <-> timespec (nsec)
  */
 static inline cputime_t timespec_to_cputime(const struct timespec *val)
 {
-	cputime_t ret = val->tv_sec * NSEC_PER_SEC;
-	return (ret + val->tv_nsec);
+	u64 ret = val->tv_sec * NSEC_PER_SEC + val->tv_nsec;
+	return (__force cputime_t) ret;
 }
 static inline void cputime_to_timespec(const cputime_t ct, struct timespec *val)
 {
-	val->tv_sec  = ct / NSEC_PER_SEC;
-	val->tv_nsec = ct % NSEC_PER_SEC;
+	val->tv_sec  = (__force u64) ct / NSEC_PER_SEC;
+	val->tv_nsec = (__force u64) ct % NSEC_PER_SEC;
 }
 
 /*
@@ -86,25 +81,28 @@ static inline void cputime_to_timespec(c
  */
 static inline cputime_t timeval_to_cputime(struct timeval *val)
 {
-	cputime_t ret = val->tv_sec * NSEC_PER_SEC;
-	return (ret + val->tv_usec * NSEC_PER_USEC);
+	u64 ret = val->tv_sec * NSEC_PER_SEC + val->tv_usec * NSEC_PER_USEC;
+	return (__force cputime_t) ret;
 }
 static inline void cputime_to_timeval(const cputime_t ct, struct timeval *val)
 {
-	val->tv_sec = ct / NSEC_PER_SEC;
-	val->tv_usec = (ct % NSEC_PER_SEC) / NSEC_PER_USEC;
+	val->tv_sec = (__force u64) ct / NSEC_PER_SEC;
+	val->tv_usec = ((__force u64) ct % NSEC_PER_SEC) / NSEC_PER_USEC;
 }
 
 /*
  * Convert cputime <-> clock (USER_HZ)
  */
-#define cputime_to_clock_t(__ct)	((__ct) / (NSEC_PER_SEC / USER_HZ))
-#define clock_t_to_cputime(__x)		((__x) * (NSEC_PER_SEC / USER_HZ))
+#define cputime_to_clock_t(__ct)	\
+	((__force u64)(__ct) / (NSEC_PER_SEC / USER_HZ))
+#define clock_t_to_cputime(__x)		\
+	(__force cputime_t)((__x) * (NSEC_PER_SEC / USER_HZ))
 
 /*
  * Convert cputime64 to clock.
  */
-#define cputime64_to_clock_t(__ct)      cputime_to_clock_t((cputime_t)__ct)
+#define cputime64_to_clock_t(__ct)      \
+	cputime_to_clock_t((__force cputime_t)__ct)
 
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING */
 #endif /* __IA64_CPUTIME_H */
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -29,25 +29,11 @@ static inline void setup_cputime_one_jif
 #include <asm/time.h>
 #include <asm/param.h>
 
-typedef u64 cputime_t;
-typedef u64 cputime64_t;
+typedef u64 __nocast cputime_t;
+typedef u64 __nocast cputime64_t;
 
-#define cputime_zero			((cputime_t)0)
-#define cputime_max			((~((cputime_t)0) >> 1) - 1)
-#define cputime_add(__a, __b)		((__a) +  (__b))
-#define cputime_sub(__a, __b)		((__a) -  (__b))
-#define cputime_div(__a, __n)		((__a) /  (__n))
-#define cputime_halve(__a)		((__a) >> 1)
-#define cputime_eq(__a, __b)		((__a) == (__b))
-#define cputime_gt(__a, __b)		((__a) >  (__b))
-#define cputime_ge(__a, __b)		((__a) >= (__b))
-#define cputime_lt(__a, __b)		((__a) <  (__b))
-#define cputime_le(__a, __b)		((__a) <= (__b))
-
-#define cputime64_zero			((cputime64_t)0)
-#define cputime64_add(__a, __b)		((__a) + (__b))
-#define cputime64_sub(__a, __b)		((__a) - (__b))
-#define cputime_to_cputime64(__ct)	(__ct)
+#define cputime_zero			((__force cputime_t) 0ULL)
+#define cputime64_zero			((__force cputime64_t) 0ULL)
 
 #ifdef __KERNEL__
 
@@ -65,7 +51,7 @@ DECLARE_PER_CPU(unsigned long, cputime_s
 
 static inline unsigned long cputime_to_jiffies(const cputime_t ct)
 {
-	return mulhdu(ct, __cputime_jiffies_factor);
+	return mulhdu((__force u64) ct, __cputime_jiffies_factor);
 }
 
 /* Estimate the scaled cputime by scaling the real cputime based on
@@ -74,14 +60,15 @@ static inline cputime_t cputime_to_scale
 {
 	if (cpu_has_feature(CPU_FTR_SPURR) &&
 	    __get_cpu_var(cputime_last_delta))
-		return ct * __get_cpu_var(cputime_scaled_last_delta) /
-			    __get_cpu_var(cputime_last_delta);
+		return (__force u64) ct *
+			__get_cpu_var(cputime_scaled_last_delta) /
+			__get_cpu_var(cputime_last_delta);
 	return ct;
 }
 
 static inline cputime_t jiffies_to_cputime(const unsigned long jif)
 {
-	cputime_t ct;
+	u64 ct;
 	unsigned long sec;
 
 	/* have to be a little careful about overflow */
@@ -93,7 +80,7 @@ static inline cputime_t jiffies_to_cputi
 	}
 	if (sec)
 		ct += (cputime_t) sec * tb_ticks_per_sec;
-	return ct;
+	return (__force cputime_t) ct;
 }
 
 static inline void setup_cputime_one_jiffy(void)
@@ -103,7 +90,7 @@ static inline void setup_cputime_one_jif
 
 static inline cputime64_t jiffies64_to_cputime64(const u64 jif)
 {
-	cputime_t ct;
+	u64 ct;
 	u64 sec;
 
 	/* have to be a little careful about overflow */
@@ -114,13 +101,13 @@ static inline cputime64_t jiffies64_to_c
 		do_div(ct, HZ);
 	}
 	if (sec)
-		ct += (cputime_t) sec * tb_ticks_per_sec;
-	return ct;
+		ct += (u64) sec * tb_ticks_per_sec;
+	return (__force cputime64_t) ct;
 }
 
 static inline u64 cputime64_to_jiffies64(const cputime_t ct)
 {
-	return mulhdu(ct, __cputime_jiffies_factor);
+	return mulhdu((__force u64) ct, __cputime_jiffies_factor);
 }
 
 /*
@@ -130,12 +117,12 @@ extern u64 __cputime_msec_factor;
 
 static inline unsigned long cputime_to_usecs(const cputime_t ct)
 {
-	return mulhdu(ct, __cputime_msec_factor) * USEC_PER_MSEC;
+	return mulhdu((__force u64) ct, __cputime_msec_factor) * USEC_PER_MSEC;
 }
 
 static inline cputime_t usecs_to_cputime(const unsigned long us)
 {
-	cputime_t ct;
+	u64 ct;
 	unsigned long sec;
 
 	/* have to be a little careful about overflow */
@@ -147,7 +134,7 @@ static inline cputime_t usecs_to_cputime
 	}
 	if (sec)
 		ct += (cputime_t) sec * tb_ticks_per_sec;
-	return ct;
+	return (__force cputime_t) ct;
 }
 
 /*
@@ -157,12 +144,12 @@ extern u64 __cputime_sec_factor;
 
 static inline unsigned long cputime_to_secs(const cputime_t ct)
 {
-	return mulhdu(ct, __cputime_sec_factor);
+	return mulhdu((__force u64) ct, __cputime_sec_factor);
 }
 
 static inline cputime_t secs_to_cputime(const unsigned long sec)
 {
-	return (cputime_t) sec * tb_ticks_per_sec;
+	return (__force cputime_t)((u64) sec * tb_ticks_per_sec);
 }
 
 /*
@@ -170,7 +157,7 @@ static inline cputime_t secs_to_cputime(
  */
 static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
 {
-	u64 x = ct;
+	u64 x = (__force u64) ct;
 	unsigned int frac;
 
 	frac = do_div(x, tb_ticks_per_sec);
@@ -182,11 +169,11 @@ static inline void cputime_to_timespec(c
 
 static inline cputime_t timespec_to_cputime(const struct timespec *p)
 {
-	cputime_t ct;
+	u64 ct;
 
 	ct = (u64) p->tv_nsec * tb_ticks_per_sec;
 	do_div(ct, 1000000000);
-	return ct + (u64) p->tv_sec * tb_ticks_per_sec;
+	return (__force cputime_t)(ct + (u64) p->tv_sec * tb_ticks_per_sec);
 }
 
 /*
@@ -194,7 +181,7 @@ static inline cputime_t timespec_to_cput
  */
 static inline void cputime_to_timeval(const cputime_t ct, struct timeval *p)
 {
-	u64 x = ct;
+	u64 x = (__force u64) ct;
 	unsigned int frac;
 
 	frac = do_div(x, tb_ticks_per_sec);
@@ -206,11 +193,11 @@ static inline void cputime_to_timeval(co
 
 static inline cputime_t timeval_to_cputime(const struct timeval *p)
 {
-	cputime_t ct;
+	u64 ct;
 
 	ct = (u64) p->tv_usec * tb_ticks_per_sec;
 	do_div(ct, 1000000);
-	return ct + (u64) p->tv_sec * tb_ticks_per_sec;
+	return (__force cputime_t)(ct + (u64) p->tv_sec * tb_ticks_per_sec);
 }
 
 /*
@@ -220,12 +207,12 @@ extern u64 __cputime_clockt_factor;
 
 static inline unsigned long cputime_to_clock_t(const cputime_t ct)
 {
-	return mulhdu(ct, __cputime_clockt_factor);
+	return mulhdu((__force u64) ct, __cputime_clockt_factor);
 }
 
 static inline cputime_t clock_t_to_cputime(const unsigned long clk)
 {
-	cputime_t ct;
+	u64 ct;
 	unsigned long sec;
 
 	/* have to be a little careful about overflow */
@@ -236,8 +223,8 @@ static inline cputime_t clock_t_to_cputi
 		do_div(ct, USER_HZ);
 	}
 	if (sec)
-		ct += (cputime_t) sec * tb_ticks_per_sec;
-	return ct;
+		ct += (u64) sec * tb_ticks_per_sec;
+	return (__force cputime_t) ct;
 }
 
 #define cputime64_to_clock_t(ct)	cputime_to_clock_t((cputime_t)(ct))
--- a/arch/s390/include/asm/cputime.h
+++ b/arch/s390/include/asm/cputime.h
@@ -16,114 +16,101 @@
 
 /* We want to use full resolution of the CPU timer: 2**-12 micro-seconds. */
 
-typedef unsigned long long cputime_t;
-typedef unsigned long long cputime64_t;
+typedef unsigned long long __nocast cputime_t;
+typedef unsigned long long __nocast cputime64_t;
 
-#ifndef __s390x__
-
-static inline unsigned int
-__div(unsigned long long n, unsigned int base)
+static inline unsigned long __div(unsigned long long n, unsigned long base)
 {
+#ifndef __s390x__
 	register_pair rp;
 
 	rp.pair = n >> 1;
 	asm ("dr %0,%1" : "+d" (rp) : "d" (base >> 1));
 	return rp.subreg.odd;
+#else /* __s390x__ */
+	return n / base;
+#endif /* __s390x__ */
 }
 
-#else /* __s390x__ */
+#define cputime_zero			((__force cputime_t) 0ULL)
+#define cputime_one_jiffy		jiffies_to_cputime(1)
+
+#define cputime64_zero			((__force cputime64_t) 0ULL)
 
-static inline unsigned int
-__div(unsigned long long n, unsigned int base)
+/*
+ * Convert cputime to jiffies and back.
+ */
+static inline unsigned long cputime_to_jiffies(const cputime_t cputime)
 {
-	return n / base;
+	return __div((__force unsigned long long) cputime, 4096000000ULL / HZ);
 }
 
-#endif /* __s390x__ */
+static inline cputime_t jiffies_to_cputime(const unsigned int jif)
+{
+	return (__force cputime_t)(jif * (4096000000ULL / HZ));
+}
 
-#define cputime_zero			(0ULL)
-#define cputime_one_jiffy		jiffies_to_cputime(1)
-#define cputime_max			((~0UL >> 1) - 1)
-#define cputime_add(__a, __b)		((__a) +  (__b))
-#define cputime_sub(__a, __b)		((__a) -  (__b))
-#define cputime_div(__a, __n) ({		\
-	unsigned long long __div = (__a);	\
-	do_div(__div,__n);			\
-	__div;					\
-})
-#define cputime_halve(__a)		((__a) >> 1)
-#define cputime_eq(__a, __b)		((__a) == (__b))
-#define cputime_gt(__a, __b)		((__a) >  (__b))
-#define cputime_ge(__a, __b)		((__a) >= (__b))
-#define cputime_lt(__a, __b)		((__a) <  (__b))
-#define cputime_le(__a, __b)		((__a) <= (__b))
-#define cputime_to_jiffies(__ct)	(__div((__ct), 4096000000ULL / HZ))
-#define cputime_to_scaled(__ct)		(__ct)
-#define jiffies_to_cputime(__hz)	((cputime_t)(__hz) * (4096000000ULL / HZ))
-
-#define cputime64_zero			(0ULL)
-#define cputime64_add(__a, __b)		((__a) + (__b))
-#define cputime_to_cputime64(__ct)	(__ct)
+static inline u64 cputime64_to_jiffies64(cputime64_t cputime)
+{
+	unsigned long long jif = (__force unsigned long long) cputime;
+	do_div(jif, 4096000000ULL / HZ);
+	return jif;
+}
 
-static inline u64
-cputime64_to_jiffies64(cputime64_t cputime)
+static inline cputime64_t jiffies64_to_cputime64(const u64 jif)
 {
-	do_div(cputime, 4096000000ULL / HZ);
-	return cputime;
+	return (__force cputime64_t)(jif * (4096000000ULL / HZ));
 }
 
 /*
  * Convert cputime to microseconds and back.
  */
-static inline unsigned int
-cputime_to_usecs(const cputime_t cputime)
+static inline unsigned int cputime_to_usecs(const cputime_t cputime)
 {
-	return cputime_div(cputime, 4096);
+	return (__force unsigned long long) cputime >> 12;
 }
 
-static inline cputime_t
-usecs_to_cputime(const unsigned int m)
+static inline cputime_t usecs_to_cputime(const unsigned int m)
 {
-	return (cputime_t) m * 4096;
+	return (__force cputime_t)(m * 4096ULL);
 }
 
 /*
  * Convert cputime to milliseconds and back.
  */
-static inline unsigned int
-cputime_to_secs(const cputime_t cputime)
+static inline unsigned int cputime_to_secs(const cputime_t cputime)
 {
-	return __div(cputime, 2048000000) >> 1;
+	return __div((__force unsigned long long) cputime, 2048000000) >> 1;
 }
 
-static inline cputime_t
-secs_to_cputime(const unsigned int s)
+static inline cputime_t secs_to_cputime(const unsigned int s)
 {
-	return (cputime_t) s * 4096000000ULL;
+	return (__force cputime_t)(s * 4096000000ULL);
 }
 
 /*
  * Convert cputime to timespec and back.
  */
-static inline cputime_t
-timespec_to_cputime(const struct timespec *value)
+static inline cputime_t timespec_to_cputime(const struct timespec *value)
 {
-	return value->tv_nsec * 4096 / 1000 + (u64) value->tv_sec * 4096000000ULL;
+	unsigned long long ret = value->tv_sec * 4096000000ULL;
+	return (__force cputime_t)(ret + value->tv_nsec * 4096 / 1000);
 }
 
-static inline void
-cputime_to_timespec(const cputime_t cputime, struct timespec *value)
+static inline void cputime_to_timespec(const cputime_t cputime,
+				       struct timespec *value)
 {
+	unsigned long long __cputime = (__force unsigned long long) cputime;
 #ifndef __s390x__
 	register_pair rp;
 
-	rp.pair = cputime >> 1;
+	rp.pair = __cputime >> 1;
 	asm ("dr %0,%1" : "+d" (rp) : "d" (2048000000UL));
 	value->tv_nsec = rp.subreg.even * 1000 / 4096;
 	value->tv_sec = rp.subreg.odd;
 #else
-	value->tv_nsec = (cputime % 4096000000ULL) * 1000 / 4096;
-	value->tv_sec = cputime / 4096000000ULL;
+	value->tv_nsec = (__cputime % 4096000000ULL) * 1000 / 4096;
+	value->tv_sec = __cputime / 4096000000ULL;
 #endif
 }
 
@@ -132,50 +119,52 @@ cputime_to_timespec(const cputime_t cput
  * Since cputime and timeval have the same resolution (microseconds)
  * this is easy.
  */
-static inline cputime_t
-timeval_to_cputime(const struct timeval *value)
+static inline cputime_t timeval_to_cputime(const struct timeval *value)
 {
-	return value->tv_usec * 4096 + (u64) value->tv_sec * 4096000000ULL;
+	unsigned long long ret = value->tv_sec * 4096000000ULL;
+	return (__force cputime_t)(ret + value->tv_usec * 4096ULL);
 }
 
-static inline void
-cputime_to_timeval(const cputime_t cputime, struct timeval *value)
+static inline void cputime_to_timeval(const cputime_t cputime,
+				      struct timeval *value)
 {
+	unsigned long long __cputime = (__force unsigned long long) cputime;
 #ifndef __s390x__
 	register_pair rp;
 
-	rp.pair = cputime >> 1;
+	rp.pair = __cputime >> 1;
 	asm ("dr %0,%1" : "+d" (rp) : "d" (2048000000UL));
 	value->tv_usec = rp.subreg.even / 4096;
 	value->tv_sec = rp.subreg.odd;
 #else
-	value->tv_usec = (cputime % 4096000000ULL) / 4096;
-	value->tv_sec = cputime / 4096000000ULL;
+	value->tv_usec = (__cputime % 4096000000ULL) / 4096;
+	value->tv_sec = __cputime / 4096000000ULL;
 #endif
 }
 
 /*
  * Convert cputime to clock and back.
  */
-static inline clock_t
-cputime_to_clock_t(cputime_t cputime)
+static inline clock_t cputime_to_clock_t(cputime_t cputime)
 {
-	return cputime_div(cputime, 4096000000ULL / USER_HZ);
+	unsigned long long clock = (__force unsigned long long) cputime;
+	do_div(clock, 4096000000ULL / USER_HZ);
+	return clock;
 }
 
-static inline cputime_t
-clock_t_to_cputime(unsigned long x)
+static inline cputime_t clock_t_to_cputime(unsigned long x)
 {
-	return (cputime_t) x * (4096000000ULL / USER_HZ);
+	return (__force cputime_t)(x * (4096000000ULL / USER_HZ));
 }
 
 /*
  * Convert cputime64 to clock.
  */
-static inline clock_t
-cputime64_to_clock_t(cputime64_t cputime)
+static inline clock_t cputime64_to_clock_t(cputime64_t cputime)
 {
-       return cputime_div(cputime, 4096000000ULL / USER_HZ);
+	unsigned long long clock = (__force unsigned long long) cputime;
+	do_div(clock, 4096000000ULL / USER_HZ);
+	return clock;
 }
 
 struct s390_idle_data {
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -103,15 +103,14 @@ static inline cputime64_t get_cpu_idle_t
 	cputime64_t busy_time;
 
 	cur_wall_time = jiffies64_to_cputime64(get_jiffies_64());
-	busy_time = cputime64_add(kstat_cpu(cpu).cpustat.user,
-			kstat_cpu(cpu).cpustat.system);
+	busy_time  = kstat_cpu(cpu).cpustat.user;
+	busy_time += kstat_cpu(cpu).cpustat.system;
+	busy_time += kstat_cpu(cpu).cpustat.irq;
+	busy_time += kstat_cpu(cpu).cpustat.softirq;
+	busy_time += kstat_cpu(cpu).cpustat.steal;
+	busy_time += kstat_cpu(cpu).cpustat.nice;
 
-	busy_time = cputime64_add(busy_time, kstat_cpu(cpu).cpustat.irq);
-	busy_time = cputime64_add(busy_time, kstat_cpu(cpu).cpustat.softirq);
-	busy_time = cputime64_add(busy_time, kstat_cpu(cpu).cpustat.steal);
-	busy_time = cputime64_add(busy_time, kstat_cpu(cpu).cpustat.nice);
-
-	idle_time = cputime64_sub(cur_wall_time, busy_time);
+	idle_time = cur_wall_time - busy_time;
 	if (wall)
 		*wall = (cputime64_t)jiffies_to_usecs(cur_wall_time);
 
@@ -351,20 +350,20 @@ static void dbs_check_cpu(struct cpu_dbs
 
 		cur_idle_time = get_cpu_idle_time(j, &cur_wall_time);
 
-		wall_time = (unsigned int) cputime64_sub(cur_wall_time,
-				j_dbs_info->prev_cpu_wall);
+		wall_time = (unsigned int)
+			cur_wall_time - j_dbs_info->prev_cpu_wall;
 		j_dbs_info->prev_cpu_wall = cur_wall_time;
 
-		idle_time = (unsigned int) cputime64_sub(cur_idle_time,
-				j_dbs_info->prev_cpu_idle);
+		idle_time = (unsigned int)
+			cur_idle_time - j_dbs_info->prev_cpu_idle;
 		j_dbs_info->prev_cpu_idle = cur_idle_time;
 
 		if (dbs_tuners_ins.ignore_nice) {
 			cputime64_t cur_nice;
 			unsigned long cur_nice_jiffies;
 
-			cur_nice = cputime64_sub(kstat_cpu(j).cpustat.nice,
-					 j_dbs_info->prev_cpu_nice);
+			cur_nice = kstat_cpu(j).cpustat.nice -
+					j_dbs_info->prev_cpu_nice;
 			/*
 			 * Assumption: nice time between sampling periods will
 			 * be less than 2^32 jiffies for 32 bit sys
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -127,15 +127,14 @@ static inline cputime64_t get_cpu_idle_t
 	cputime64_t busy_time;
 
 	cur_wall_time = jiffies64_to_cputime64(get_jiffies_64());
-	busy_time = cputime64_add(kstat_cpu(cpu).cpustat.user,
-			kstat_cpu(cpu).cpustat.system);
+	busy_time  = kstat_cpu(cpu).cpustat.user;
+	busy_time += kstat_cpu(cpu).cpustat.system;
+	busy_time += kstat_cpu(cpu).cpustat.irq;
+	busy_time += kstat_cpu(cpu).cpustat.softirq;
+	busy_time += kstat_cpu(cpu).cpustat.steal;
+	busy_time += kstat_cpu(cpu).cpustat.nice;
 
-	busy_time = cputime64_add(busy_time, kstat_cpu(cpu).cpustat.irq);
-	busy_time = cputime64_add(busy_time, kstat_cpu(cpu).cpustat.softirq);
-	busy_time = cputime64_add(busy_time, kstat_cpu(cpu).cpustat.steal);
-	busy_time = cputime64_add(busy_time, kstat_cpu(cpu).cpustat.nice);
-
-	idle_time = cputime64_sub(cur_wall_time, busy_time);
+	idle_time = cur_wall_time - busy_time;
 	if (wall)
 		*wall = (cputime64_t)jiffies_to_usecs(cur_wall_time);
 
@@ -440,24 +439,24 @@ static void dbs_check_cpu(struct cpu_dbs
 		cur_idle_time = get_cpu_idle_time(j, &cur_wall_time);
 		cur_iowait_time = get_cpu_iowait_time(j, &cur_wall_time);
 
-		wall_time = (unsigned int) cputime64_sub(cur_wall_time,
-				j_dbs_info->prev_cpu_wall);
+		wall_time = (unsigned int)
+			cur_wall_time - j_dbs_info->prev_cpu_wall;
 		j_dbs_info->prev_cpu_wall = cur_wall_time;
 
-		idle_time = (unsigned int) cputime64_sub(cur_idle_time,
-				j_dbs_info->prev_cpu_idle);
+		idle_time = (unsigned int)
+			cur_idle_time - j_dbs_info->prev_cpu_idle;
 		j_dbs_info->prev_cpu_idle = cur_idle_time;
 
-		iowait_time = (unsigned int) cputime64_sub(cur_iowait_time,
-				j_dbs_info->prev_cpu_iowait);
+		iowait_time = (unsigned int)
+			cur_iowait_time - j_dbs_info->prev_cpu_iowait;
 		j_dbs_info->prev_cpu_iowait = cur_iowait_time;
 
 		if (dbs_tuners_ins.ignore_nice) {
 			cputime64_t cur_nice;
 			unsigned long cur_nice_jiffies;
 
-			cur_nice = cputime64_sub(kstat_cpu(j).cpustat.nice,
-					 j_dbs_info->prev_cpu_nice);
+			cur_nice = kstat_cpu(j).cpustat.nice -
+					j_dbs_info->prev_cpu_nice;
 			/*
 			 * Assumption: nice time between sampling periods will
 			 * be less than 2^32 jiffies for 32 bit sys
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -60,9 +60,8 @@ static int cpufreq_stats_update(unsigned
 	spin_lock(&cpufreq_stats_lock);
 	stat = per_cpu(cpufreq_stats_table, cpu);
 	if (stat->time_in_state)
-		stat->time_in_state[stat->last_index] =
-			cputime64_add(stat->time_in_state[stat->last_index],
-				      cputime_sub(cur_time, stat->last_time));
+		stat->time_in_state[stat->last_index] +=
+			cur_time - stat->last_time;
 	stat->last_time = cur_time;
 	spin_unlock(&cpufreq_stats_lock);
 	return 0;
--- a/drivers/macintosh/rack-meter.c
+++ b/drivers/macintosh/rack-meter.c
@@ -83,11 +83,10 @@ static inline cputime64_t get_cpu_idle_t
 {
 	cputime64_t retval;
 
-	retval = cputime64_add(kstat_cpu(cpu).cpustat.idle,
-			kstat_cpu(cpu).cpustat.iowait);
+	retval = kstat_cpu(cpu).cpustat.idle + kstat_cpu(cpu).cpustat.iowait;
 
 	if (rackmeter_ignore_nice)
-		retval = cputime64_add(retval, kstat_cpu(cpu).cpustat.nice);
+		retval = retval + kstat_cpu(cpu).cpustat.nice;
 
 	return retval;
 }
@@ -220,13 +219,11 @@ static void rackmeter_do_timer(struct wo
 	int i, offset, load, cumm, pause;
 
 	cur_jiffies = jiffies64_to_cputime64(get_jiffies_64());
-	total_ticks = (unsigned int)cputime64_sub(cur_jiffies,
-						  rcpu->prev_wall);
+	total_ticks = (unsigned int) (cur_jiffies - rcpu->prev_wall);
 	rcpu->prev_wall = cur_jiffies;
 
 	total_idle_ticks = get_cpu_idle_time(cpu);
-	idle_ticks = (unsigned int) cputime64_sub(total_idle_ticks,
-				rcpu->prev_idle);
+	idle_ticks = (unsigned int) (total_idle_ticks - rcpu->prev_idle);
 	rcpu->prev_idle = total_idle_ticks;
 
 	/* We do a very dumb calculation to update the LEDs for now,
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -423,14 +423,14 @@ static int do_task_stat(struct seq_file
 			do {
 				min_flt += t->min_flt;
 				maj_flt += t->maj_flt;
-				gtime = cputime_add(gtime, t->gtime);
+				gtime += t->gtime;
 				t = next_thread(t);
 			} while (t != task);
 
 			min_flt += sig->min_flt;
 			maj_flt += sig->maj_flt;
 			thread_group_times(task, &utime, &stime);
-			gtime = cputime_add(gtime, sig->gtime);
+			gtime += sig->gtime;
 		}
 
 		sid = task_session_nr_ns(task, ns);
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -39,18 +39,16 @@ static int show_stat(struct seq_file *p,
 	jif = boottime.tv_sec;
 
 	for_each_possible_cpu(i) {
-		user = cputime64_add(user, kstat_cpu(i).cpustat.user);
-		nice = cputime64_add(nice, kstat_cpu(i).cpustat.nice);
-		system = cputime64_add(system, kstat_cpu(i).cpustat.system);
-		idle = cputime64_add(idle, kstat_cpu(i).cpustat.idle);
-		idle = cputime64_add(idle, arch_idle_time(i));
-		iowait = cputime64_add(iowait, kstat_cpu(i).cpustat.iowait);
-		irq = cputime64_add(irq, kstat_cpu(i).cpustat.irq);
-		softirq = cputime64_add(softirq, kstat_cpu(i).cpustat.softirq);
-		steal = cputime64_add(steal, kstat_cpu(i).cpustat.steal);
-		guest = cputime64_add(guest, kstat_cpu(i).cpustat.guest);
-		guest_nice = cputime64_add(guest_nice,
-			kstat_cpu(i).cpustat.guest_nice);
+		user += kstat_cpu(i).cpustat.user;
+		nice += kstat_cpu(i).cpustat.nice;
+		system += kstat_cpu(i).cpustat.system;
+		idle += kstat_cpu(i).cpustat.idle + arch_idle_time(i);
+		iowait += kstat_cpu(i).cpustat.iowait;
+		irq += kstat_cpu(i).cpustat.irq;
+		softirq += kstat_cpu(i).cpustat.softirq;
+		steal += kstat_cpu(i).cpustat.steal;
+		guest += kstat_cpu(i).cpustat.guest;
+		guest_nice += kstat_cpu(i).cpustat.guest_nice;
 		sum += kstat_cpu_irqs_sum(i);
 		sum += arch_irq_stat_cpu(i);
 
@@ -81,8 +79,7 @@ static int show_stat(struct seq_file *p,
 		user = kstat_cpu(i).cpustat.user;
 		nice = kstat_cpu(i).cpustat.nice;
 		system = kstat_cpu(i).cpustat.system;
-		idle = kstat_cpu(i).cpustat.idle;
-		idle = cputime64_add(idle, arch_idle_time(i));
+		idle = kstat_cpu(i).cpustat.idle + arch_idle_time(i);
 		iowait = kstat_cpu(i).cpustat.iowait;
 		irq = kstat_cpu(i).cpustat.irq;
 		softirq = kstat_cpu(i).cpustat.softirq;
--- a/fs/proc/uptime.c
+++ b/fs/proc/uptime.c
@@ -15,7 +15,7 @@ static int uptime_proc_show(struct seq_f
 	cputime_t idletime = cputime_zero;
 
 	for_each_possible_cpu(i)
-		idletime = cputime64_add(idletime, kstat_cpu(i).cpustat.idle);
+		idletime += kstat_cpu(i).cpustat.idle;
 
 	do_posix_clock_monotonic_gettime(&uptime);
 	monotonic_to_bootbased(&uptime);
--- a/include/asm-generic/cputime.h
+++ b/include/asm-generic/cputime.h
@@ -4,70 +4,66 @@
 #include <linux/time.h>
 #include <linux/jiffies.h>
 
-typedef unsigned long cputime_t;
+typedef unsigned long __nocast cputime_t;
 
-#define cputime_zero			(0UL)
+#define cputime_zero			((__force cputime_t) 0UL)
 #define cputime_one_jiffy		jiffies_to_cputime(1)
-#define cputime_max			((~0UL >> 1) - 1)
-#define cputime_add(__a, __b)		((__a) +  (__b))
-#define cputime_sub(__a, __b)		((__a) -  (__b))
-#define cputime_div(__a, __n)		((__a) /  (__n))
-#define cputime_halve(__a)		((__a) >> 1)
-#define cputime_eq(__a, __b)		((__a) == (__b))
-#define cputime_gt(__a, __b)		((__a) >  (__b))
-#define cputime_ge(__a, __b)		((__a) >= (__b))
-#define cputime_lt(__a, __b)		((__a) <  (__b))
-#define cputime_le(__a, __b)		((__a) <= (__b))
-#define cputime_to_jiffies(__ct)	(__ct)
+#define cputime_to_jiffies(__ct)	(__force unsigned long)(__ct)
 #define cputime_to_scaled(__ct)		(__ct)
-#define jiffies_to_cputime(__hz)	(__hz)
+#define jiffies_to_cputime(__hz)	(__force cputime_t)(__hz)
 
-typedef u64 cputime64_t;
+typedef u64 __nocast cputime64_t;
 
-#define cputime64_zero (0ULL)
-#define cputime64_add(__a, __b)		((__a) + (__b))
-#define cputime64_sub(__a, __b)		((__a) - (__b))
-#define cputime64_to_jiffies64(__ct)	(__ct)
-#define jiffies64_to_cputime64(__jif)	(__jif)
-#define cputime_to_cputime64(__ct)	((u64) __ct)
-#define cputime64_gt(__a, __b)		((__a) >  (__b))
+#define cputime64_zero			((__force cputime64_t) 0ULL)
+#define cputime64_to_jiffies64(__ct)	(__force u64)(__ct)
+#define jiffies64_to_cputime64(__jif)	(__force cputime64_t)(__jif)
 
-#define nsecs_to_cputime64(__ct)	nsecs_to_jiffies64(__ct)
+#define nsecs_to_cputime64(__ct)	\
+	jiffies64_to_cputime64(nsecs_to_jiffies64(__ct))
 
 
 /*
  * Convert cputime to microseconds and back.
  */
-#define cputime_to_usecs(__ct)		jiffies_to_usecs(__ct);
-#define usecs_to_cputime(__msecs)	usecs_to_jiffies(__msecs);
+#define cputime_to_usecs(__ct)		\
+	jiffies_to_usecs(cputime_to_jiffies(__ct));
+#define usecs_to_cputime(__msecs)	\
+	jiffies_to_cputime(usecs_to_jiffies(__msecs));
 
 /*
  * Convert cputime to seconds and back.
  */
-#define cputime_to_secs(jif)		((jif) / HZ)
-#define secs_to_cputime(sec)		((sec) * HZ)
+#define cputime_to_secs(jif)		(cputime_to_jiffies(jif) / HZ)
+#define secs_to_cputime(sec)		jiffies_to_cputime((sec) * HZ)
 
 /*
  * Convert cputime to timespec and back.
  */
-#define timespec_to_cputime(__val)	timespec_to_jiffies(__val)
-#define cputime_to_timespec(__ct,__val)	jiffies_to_timespec(__ct,__val)
+#define timespec_to_cputime(__val)	\
+	jiffies_to_cputime(timespec_to_jiffies(__val))
+#define cputime_to_timespec(__ct,__val)	\
+	jiffies_to_timespec(cputime_to_jiffies(__ct),__val)
 
 /*
  * Convert cputime to timeval and back.
  */
-#define timeval_to_cputime(__val)	timeval_to_jiffies(__val)
-#define cputime_to_timeval(__ct,__val)	jiffies_to_timeval(__ct,__val)
+#define timeval_to_cputime(__val)	\
+	jiffies_to_cputime(timeval_to_jiffies(__val))
+#define cputime_to_timeval(__ct,__val)	\
+	jiffies_to_timeval(cputime_to_jiffies(__ct),__val)
 
 /*
  * Convert cputime to clock and back.
  */
-#define cputime_to_clock_t(__ct)	jiffies_to_clock_t(__ct)
-#define clock_t_to_cputime(__x)		clock_t_to_jiffies(__x)
+#define cputime_to_clock_t(__ct)	\
+	jiffies_to_clock_t(cputime_to_jiffies(__ct))
+#define clock_t_to_cputime(__x)		\
+	jiffies_to_cputime(clock_t_to_jiffies(__x))
 
 /*
  * Convert cputime64 to clock.
  */
-#define cputime64_to_clock_t(__ct)	jiffies_64_to_clock_t(__ct)
+#define cputime64_to_clock_t(__ct)	\
+	jiffies_64_to_clock_t(cputime64_to_jiffies64(__ct))
 
 #endif
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -613,8 +613,8 @@ void acct_collect(long exitcode, int gro
 		pacct->ac_flag |= ACORE;
 	if (current->flags & PF_SIGNALED)
 		pacct->ac_flag |= AXSIG;
-	pacct->ac_utime = cputime_add(pacct->ac_utime, current->utime);
-	pacct->ac_stime = cputime_add(pacct->ac_stime, current->stime);
+	pacct->ac_utime += current->utime;
+	pacct->ac_stime += current->stime;
 	pacct->ac_minflt += current->min_flt;
 	pacct->ac_majflt += current->maj_flt;
 	spin_unlock_irq(&current->sighand->siglock);
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -177,8 +177,7 @@ static inline void check_for_tasks(int c
 	write_lock_irq(&tasklist_lock);
 	for_each_process(p) {
 		if (task_cpu(p) == cpu && p->state == TASK_RUNNING &&
-		    (!cputime_eq(p->utime, cputime_zero) ||
-		     !cputime_eq(p->stime, cputime_zero)))
+		    (p->utime != cputime_zero || p->stime != cputime_zero))
 			printk(KERN_WARNING "Task %s (pid = %d) is on cpu %d "
 				"(state = %ld, flags = %x)\n",
 				p->comm, task_pid_nr(p), cpu,
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -121,9 +121,9 @@ static void __exit_signal(struct task_st
 		 * We won't ever get here for the group leader, since it
 		 * will have been the last reference on the signal_struct.
 		 */
-		sig->utime = cputime_add(sig->utime, tsk->utime);
-		sig->stime = cputime_add(sig->stime, tsk->stime);
-		sig->gtime = cputime_add(sig->gtime, tsk->gtime);
+		sig->utime += tsk->utime;
+		sig->stime += tsk->stime;
+		sig->gtime += tsk->gtime;
 		sig->min_flt += tsk->min_flt;
 		sig->maj_flt += tsk->maj_flt;
 		sig->nvcsw += tsk->nvcsw;
@@ -1257,19 +1257,9 @@ static int wait_task_zombie(struct wait_
 		spin_lock_irq(&p->real_parent->sighand->siglock);
 		psig = p->real_parent->signal;
 		sig = p->signal;
-		psig->cutime =
-			cputime_add(psig->cutime,
-			cputime_add(tgutime,
-				    sig->cutime));
-		psig->cstime =
-			cputime_add(psig->cstime,
-			cputime_add(tgstime,
-				    sig->cstime));
-		psig->cgtime =
-			cputime_add(psig->cgtime,
-			cputime_add(p->gtime,
-			cputime_add(sig->gtime,
-				    sig->cgtime)));
+		psig->cutime += tgutime + sig->cutime;
+		psig->cstime += tgstime + sig->cstime;
+		psig->cgtime += p->gtime + sig->gtime + sig->cgtime;
 		psig->cmin_flt +=
 			p->min_flt + sig->min_flt + sig->cmin_flt;
 		psig->cmaj_flt +=
--- a/kernel/itimer.c
+++ b/kernel/itimer.c
@@ -52,22 +52,22 @@ static void get_cpu_itimer(struct task_s
 
 	cval = it->expires;
 	cinterval = it->incr;
-	if (!cputime_eq(cval, cputime_zero)) {
+	if (cval != cputime_zero) {
 		struct task_cputime cputime;
 		cputime_t t;
 
 		thread_group_cputimer(tsk, &cputime);
 		if (clock_id == CPUCLOCK_PROF)
-			t = cputime_add(cputime.utime, cputime.stime);
+			t = cputime.utime + cputime.stime;
 		else
 			/* CPUCLOCK_VIRT */
 			t = cputime.utime;
 
-		if (cputime_le(cval, t))
+		if (cval < t)
 			/* about to fire */
 			cval = cputime_one_jiffy;
 		else
-			cval = cputime_sub(cval, t);
+			cval = cval - t;
 	}
 
 	spin_unlock_irq(&tsk->sighand->siglock);
@@ -123,7 +123,7 @@ enum hrtimer_restart it_real_fn(struct h
 	struct signal_struct *sig =
 		container_of(timer, struct signal_struct, real_timer);
 
-	trace_itimer_expire(ITIMER_REAL, sig->leader_pid, 0);
+	trace_itimer_expire(ITIMER_REAL, sig->leader_pid, cputime_zero);
 	kill_pid_info(SIGALRM, SEND_SIG_PRIV, sig->leader_pid);
 
 	return HRTIMER_NORESTART;
@@ -161,10 +161,9 @@ static void set_cpu_itimer(struct task_s
 
 	cval = it->expires;
 	cinterval = it->incr;
-	if (!cputime_eq(cval, cputime_zero) ||
-	    !cputime_eq(nval, cputime_zero)) {
-		if (cputime_gt(nval, cputime_zero))
-			nval = cputime_add(nval, cputime_one_jiffy);
+	if (cval != cputime_zero || nval != cputime_zero) {
+		if (nval > cputime_zero)
+			nval += cputime_one_jiffy;
 		set_process_cpu_timer(tsk, clock_id, &nval, &cval);
 	}
 	it->expires = nval;
@@ -224,7 +223,7 @@ again:
 		} else
 			tsk->signal->it_real_incr.tv64 = 0;
 
-		trace_itimer_state(ITIMER_REAL, value, 0);
+		trace_itimer_state(ITIMER_REAL, value, cputime_zero);
 		spin_unlock_irq(&tsk->sighand->siglock);
 		break;
 	case ITIMER_VIRTUAL:
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -78,7 +78,7 @@ static inline int cpu_time_before(const
 	if (CPUCLOCK_WHICH(which_clock) == CPUCLOCK_SCHED) {
 		return now.sched < then.sched;
 	}  else {
-		return cputime_lt(now.cpu, then.cpu);
+		return now.cpu < then.cpu;
 	}
 }
 static inline void cpu_time_add(const clockid_t which_clock,
@@ -88,7 +88,7 @@ static inline void cpu_time_add(const cl
 	if (CPUCLOCK_WHICH(which_clock) == CPUCLOCK_SCHED) {
 		acc->sched += val.sched;
 	}  else {
-		acc->cpu = cputime_add(acc->cpu, val.cpu);
+		acc->cpu += val.cpu;
 	}
 }
 static inline union cpu_time_count cpu_time_sub(const clockid_t which_clock,
@@ -98,25 +98,12 @@ static inline union cpu_time_count cpu_t
 	if (CPUCLOCK_WHICH(which_clock) == CPUCLOCK_SCHED) {
 		a.sched -= b.sched;
 	}  else {
-		a.cpu = cputime_sub(a.cpu, b.cpu);
+		a.cpu -= b.cpu;
 	}
 	return a;
 }
 
 /*
- * Divide and limit the result to res >= 1
- *
- * This is necessary to prevent signal delivery starvation, when the result of
- * the division would be rounded down to 0.
- */
-static inline cputime_t cputime_div_non_zero(cputime_t time, unsigned long div)
-{
-	cputime_t res = cputime_div(time, div);
-
-	return max_t(cputime_t, res, 1);
-}
-
-/*
  * Update expiry time from increment, and increase overrun count,
  * given the current clock sample.
  */
@@ -148,28 +135,26 @@ static void bump_cpu_timer(struct k_itim
 	} else {
 		cputime_t delta, incr;
 
-		if (cputime_lt(now.cpu, timer->it.cpu.expires.cpu))
+		if (now.cpu < timer->it.cpu.expires.cpu)
 			return;
 		incr = timer->it.cpu.incr.cpu;
-		delta = cputime_sub(cputime_add(now.cpu, incr),
-				    timer->it.cpu.expires.cpu);
+		delta = now.cpu + incr - timer->it.cpu.expires.cpu;
 		/* Don't use (incr*2 < delta), incr*2 might overflow. */
-		for (i = 0; cputime_lt(incr, cputime_sub(delta, incr)); i++)
-			     incr = cputime_add(incr, incr);
-		for (; i >= 0; incr = cputime_halve(incr), i--) {
-			if (cputime_lt(delta, incr))
+		for (i = 0; incr < delta - incr; i++)
+			     incr += incr;
+		for (; i >= 0; incr = incr >> 1, i--) {
+			if (delta < incr)
 				continue;
-			timer->it.cpu.expires.cpu =
-				cputime_add(timer->it.cpu.expires.cpu, incr);
+			timer->it.cpu.expires.cpu += incr;
 			timer->it_overrun += 1 << i;
-			delta = cputime_sub(delta, incr);
+			delta -= incr;
 		}
 	}
 }
 
 static inline cputime_t prof_ticks(struct task_struct *p)
 {
-	return cputime_add(p->utime, p->stime);
+	return p->utime + p->stime;
 }
 static inline cputime_t virt_ticks(struct task_struct *p)
 {
@@ -248,8 +233,8 @@ void thread_group_cputime(struct task_st
 
 	t = tsk;
 	do {
-		times->utime = cputime_add(times->utime, t->utime);
-		times->stime = cputime_add(times->stime, t->stime);
+		times->utime += t->utime;
+		times->stime += t->stime;
 		times->sum_exec_runtime += task_sched_runtime(t);
 	} while_each_thread(tsk, t);
 out:
@@ -258,10 +243,10 @@ out:
 
 static void update_gt_cputime(struct task_cputime *a, struct task_cputime *b)
 {
-	if (cputime_gt(b->utime, a->utime))
+	if (b->utime > a->utime)
 		a->utime = b->utime;
 
-	if (cputime_gt(b->stime, a->stime))
+	if (b->stime > a->stime)
 		a->stime = b->stime;
 
 	if (b->sum_exec_runtime > a->sum_exec_runtime)
@@ -306,7 +291,7 @@ static int cpu_clock_sample_group(const
 		return -EINVAL;
 	case CPUCLOCK_PROF:
 		thread_group_cputime(p, &cputime);
-		cpu->cpu = cputime_add(cputime.utime, cputime.stime);
+		cpu->cpu = cputime.utime + cputime.stime;
 		break;
 	case CPUCLOCK_VIRT:
 		thread_group_cputime(p, &cputime);
@@ -470,26 +455,24 @@ static void cleanup_timers(struct list_h
 			   unsigned long long sum_exec_runtime)
 {
 	struct cpu_timer_list *timer, *next;
-	cputime_t ptime = cputime_add(utime, stime);
+	cputime_t ptime = utime + stime;
 
 	list_for_each_entry_safe(timer, next, head, entry) {
 		list_del_init(&timer->entry);
-		if (cputime_lt(timer->expires.cpu, ptime)) {
+		if (timer->expires.cpu < ptime) {
 			timer->expires.cpu = cputime_zero;
 		} else {
-			timer->expires.cpu = cputime_sub(timer->expires.cpu,
-							 ptime);
+			timer->expires.cpu = timer->expires.cpu - ptime;
 		}
 	}
 
 	++head;
 	list_for_each_entry_safe(timer, next, head, entry) {
 		list_del_init(&timer->entry);
-		if (cputime_lt(timer->expires.cpu, utime)) {
+		if (timer->expires.cpu < utime) {
 			timer->expires.cpu = cputime_zero;
 		} else {
-			timer->expires.cpu = cputime_sub(timer->expires.cpu,
-							 utime);
+			timer->expires.cpu = timer->expires.cpu - utime;
 		}
 	}
 
@@ -520,8 +503,7 @@ void posix_cpu_timers_exit_group(struct
 	struct signal_struct *const sig = tsk->signal;
 
 	cleanup_timers(tsk->signal->cpu_timers,
-		       cputime_add(tsk->utime, sig->utime),
-		       cputime_add(tsk->stime, sig->stime),
+		       tsk->utime + sig->utime, tsk->stime + sig->stime,
 		       tsk->se.sum_exec_runtime + sig->sum_sched_runtime);
 }
 
@@ -540,8 +522,7 @@ static void clear_dead_task(struct k_iti
 
 static inline int expires_gt(cputime_t expires, cputime_t new_exp)
 {
-	return cputime_eq(expires, cputime_zero) ||
-	       cputime_gt(expires, new_exp);
+	return expires == cputime_zero || expires > new_exp;
 }
 
 /*
@@ -651,7 +632,7 @@ static int cpu_timer_sample_group(const
 	default:
 		return -EINVAL;
 	case CPUCLOCK_PROF:
-		cpu->cpu = cputime_add(cputime.utime, cputime.stime);
+		cpu->cpu = cputime.utime + cputime.stime;
 		break;
 	case CPUCLOCK_VIRT:
 		cpu->cpu = cputime.utime;
@@ -923,7 +904,7 @@ static void check_thread_timers(struct t
 		struct cpu_timer_list *t = list_first_entry(timers,
 						      struct cpu_timer_list,
 						      entry);
-		if (!--maxfire || cputime_lt(prof_ticks(tsk), t->expires.cpu)) {
+		if (!--maxfire || prof_ticks(tsk) < t->expires.cpu) {
 			tsk->cputime_expires.prof_exp = t->expires.cpu;
 			break;
 		}
@@ -938,7 +919,7 @@ static void check_thread_timers(struct t
 		struct cpu_timer_list *t = list_first_entry(timers,
 						      struct cpu_timer_list,
 						      entry);
-		if (!--maxfire || cputime_lt(virt_ticks(tsk), t->expires.cpu)) {
+		if (!--maxfire || virt_ticks(tsk) < t->expires.cpu) {
 			tsk->cputime_expires.virt_exp = t->expires.cpu;
 			break;
 		}
@@ -1009,16 +990,15 @@ static u32 onecputick;
 static void check_cpu_itimer(struct task_struct *tsk, struct cpu_itimer *it,
 			     cputime_t *expires, cputime_t cur_time, int signo)
 {
-	if (cputime_eq(it->expires, cputime_zero))
+	if (it->expires == cputime_zero)
 		return;
 
-	if (cputime_ge(cur_time, it->expires)) {
-		if (!cputime_eq(it->incr, cputime_zero)) {
-			it->expires = cputime_add(it->expires, it->incr);
+	if (cur_time >= it->expires) {
+		if (it->incr != cputime_zero) {
+			it->expires += it->incr;
 			it->error += it->incr_error;
 			if (it->error >= onecputick) {
-				it->expires = cputime_sub(it->expires,
-							  cputime_one_jiffy);
+				it->expires -= cputime_one_jiffy;
 				it->error -= onecputick;
 			}
 		} else {
@@ -1031,9 +1011,8 @@ static void check_cpu_itimer(struct task
 		__group_send_sig_info(signo, SEND_SIG_PRIV, tsk);
 	}
 
-	if (!cputime_eq(it->expires, cputime_zero) &&
-	    (cputime_eq(*expires, cputime_zero) ||
-	     cputime_lt(it->expires, *expires))) {
+	if (it->expires != cputime_zero &&
+	    (*expires == cputime_zero || it->expires < *expires)) {
 		*expires = it->expires;
 	}
 }
@@ -1048,8 +1027,8 @@ static void check_cpu_itimer(struct task
  */
 static inline int task_cputime_zero(const struct task_cputime *cputime)
 {
-	if (cputime_eq(cputime->utime, cputime_zero) &&
-	    cputime_eq(cputime->stime, cputime_zero) &&
+	if (cputime->utime == cputime_zero &&
+	    cputime->stime == cputime_zero &&
 	    cputime->sum_exec_runtime == 0)
 		return 1;
 	return 0;
@@ -1076,7 +1055,7 @@ static void check_process_timers(struct
 	 */
 	thread_group_cputimer(tsk, &cputime);
 	utime = cputime.utime;
-	ptime = cputime_add(utime, cputime.stime);
+	ptime = utime + cputime.stime;
 	sum_sched_runtime = cputime.sum_exec_runtime;
 	maxfire = 20;
 	prof_expires = cputime_zero;
@@ -1084,7 +1063,7 @@ static void check_process_timers(struct
 		struct cpu_timer_list *tl = list_first_entry(timers,
 						      struct cpu_timer_list,
 						      entry);
-		if (!--maxfire || cputime_lt(ptime, tl->expires.cpu)) {
+		if (!--maxfire || ptime < tl->expires.cpu) {
 			prof_expires = tl->expires.cpu;
 			break;
 		}
@@ -1099,7 +1078,7 @@ static void check_process_timers(struct
 		struct cpu_timer_list *tl = list_first_entry(timers,
 						      struct cpu_timer_list,
 						      entry);
-		if (!--maxfire || cputime_lt(utime, tl->expires.cpu)) {
+		if (!--maxfire || utime < tl->expires.cpu) {
 			virt_expires = tl->expires.cpu;
 			break;
 		}
@@ -1154,8 +1133,7 @@ static void check_process_timers(struct
 			}
 		}
 		x = secs_to_cputime(soft);
-		if (cputime_eq(prof_expires, cputime_zero) ||
-		    cputime_lt(x, prof_expires)) {
+		if (prof_expires == cputime_zero || x < prof_expires) {
 			prof_expires = x;
 		}
 	}
@@ -1249,12 +1227,11 @@ out:
 static inline int task_cputime_expired(const struct task_cputime *sample,
 					const struct task_cputime *expires)
 {
-	if (!cputime_eq(expires->utime, cputime_zero) &&
-	    cputime_ge(sample->utime, expires->utime))
+	if (expires->utime != cputime_zero &&
+	    sample->utime >= expires->utime)
 		return 1;
-	if (!cputime_eq(expires->stime, cputime_zero) &&
-	    cputime_ge(cputime_add(sample->utime, sample->stime),
-		       expires->stime))
+	if (expires->stime != cputime_zero &&
+	    sample->utime + sample->stime >= expires->stime)
 		return 1;
 	if (expires->sum_exec_runtime != 0 &&
 	    sample->sum_exec_runtime >= expires->sum_exec_runtime)
@@ -1389,18 +1366,18 @@ void set_process_cpu_timer(struct task_s
 		 * it to be relative, *newval argument is relative and we update
 		 * it to be absolute.
 		 */
-		if (!cputime_eq(*oldval, cputime_zero)) {
-			if (cputime_le(*oldval, now.cpu)) {
+		if (*oldval != cputime_zero) {
+			if (*oldval <= now.cpu) {
 				/* Just about to fire. */
 				*oldval = cputime_one_jiffy;
 			} else {
-				*oldval = cputime_sub(*oldval, now.cpu);
+				*oldval -= now.cpu;
 			}
 		}
 
-		if (cputime_eq(*newval, cputime_zero))
+		if (*newval == cputime_zero)
 			return;
-		*newval = cputime_add(*newval, now.cpu);
+		*newval += now.cpu;
 	}
 
 	/*
@@ -1409,11 +1386,11 @@ void set_process_cpu_timer(struct task_s
 	 */
 	switch (clock_idx) {
 	case CPUCLOCK_PROF:
-		if (expires_gt(tsk->signal->cputime_expires.prof_exp, *newval))
+		if (tsk->signal->cputime_expires.prof_exp > *newval)
 			tsk->signal->cputime_expires.prof_exp = *newval;
 		break;
 	case CPUCLOCK_VIRT:
-		if (expires_gt(tsk->signal->cputime_expires.virt_exp, *newval))
+		if (tsk->signal->cputime_expires.virt_exp >  *newval)
 			tsk->signal->cputime_expires.virt_exp = *newval;
 		break;
 	}
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2011,7 +2011,7 @@ static int irqtime_account_hi_update(voi
 
 	local_irq_save(flags);
 	latest_ns = this_cpu_read(cpu_hardirq_time);
-	if (cputime64_gt(nsecs_to_cputime64(latest_ns), cpustat->irq))
+	if (nsecs_to_cputime64(latest_ns) > cpustat->irq)
 		ret = 1;
 	local_irq_restore(flags);
 	return ret;
@@ -2026,7 +2026,7 @@ static int irqtime_account_si_update(voi
 
 	local_irq_save(flags);
 	latest_ns = this_cpu_read(cpu_softirq_time);
-	if (cputime64_gt(nsecs_to_cputime64(latest_ns), cpustat->softirq))
+	if (nsecs_to_cputime64(latest_ns) > cpustat->softirq)
 		ret = 1;
 	local_irq_restore(flags);
 	return ret;
@@ -3734,19 +3734,17 @@ void account_user_time(struct task_struc
 		       cputime_t cputime_scaled)
 {
 	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
-	cputime64_t tmp;
 
 	/* Add user time to process. */
-	p->utime = cputime_add(p->utime, cputime);
-	p->utimescaled = cputime_add(p->utimescaled, cputime_scaled);
+	p->utime += cputime;
+	p->utimescaled += cputime_scaled;
 	account_group_user_time(p, cputime);
 
 	/* Add user time to cpustat. */
-	tmp = cputime_to_cputime64(cputime);
 	if (TASK_NICE(p) > 0)
-		cpustat->nice = cputime64_add(cpustat->nice, tmp);
+		cpustat->nice += (__force cputime64_t) cputime;
 	else
-		cpustat->user = cputime64_add(cpustat->user, tmp);
+		cpustat->user += (__force cputime64_t) cputime;
 
 	cpuacct_update_stats(p, CPUACCT_STAT_USER, cputime);
 	/* Account for user time used */
@@ -3762,24 +3760,21 @@ void account_user_time(struct task_struc
 static void account_guest_time(struct task_struct *p, cputime_t cputime,
 			       cputime_t cputime_scaled)
 {
-	cputime64_t tmp;
 	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
 
-	tmp = cputime_to_cputime64(cputime);
-
 	/* Add guest time to process. */
-	p->utime = cputime_add(p->utime, cputime);
-	p->utimescaled = cputime_add(p->utimescaled, cputime_scaled);
+	p->utime += cputime;
+	p->utimescaled += cputime_scaled;
 	account_group_user_time(p, cputime);
-	p->gtime = cputime_add(p->gtime, cputime);
+	p->gtime += cputime;
 
 	/* Add guest time to cpustat. */
 	if (TASK_NICE(p) > 0) {
-		cpustat->nice = cputime64_add(cpustat->nice, tmp);
-		cpustat->guest_nice = cputime64_add(cpustat->guest_nice, tmp);
+		cpustat->nice += (__force cputime64_t) cputime;
+		cpustat->guest_nice += (__force cputime64_t) cputime;
 	} else {
-		cpustat->user = cputime64_add(cpustat->user, tmp);
-		cpustat->guest = cputime64_add(cpustat->guest, tmp);
+		cpustat->user += (__force cputime64_t) cputime;
+		cpustat->guest += (__force cputime64_t) cputime;
 	}
 }
 
@@ -3794,15 +3789,13 @@ static inline
 void __account_system_time(struct task_struct *p, cputime_t cputime,
 			cputime_t cputime_scaled, cputime64_t *target_cputime64)
 {
-	cputime64_t tmp = cputime_to_cputime64(cputime);
-
 	/* Add system time to process. */
-	p->stime = cputime_add(p->stime, cputime);
-	p->stimescaled = cputime_add(p->stimescaled, cputime_scaled);
+	p->stime += cputime;
+	p->stimescaled += cputime_scaled;
 	account_group_system_time(p, cputime);
 
 	/* Add system time to cpustat. */
-	*target_cputime64 = cputime64_add(*target_cputime64, tmp);
+	*target_cputime64 += (__force cputime64_t) cputime;
 	cpuacct_update_stats(p, CPUACCT_STAT_SYSTEM, cputime);
 
 	/* Account for system time used */
@@ -3844,9 +3837,8 @@ void account_system_time(struct task_str
 void account_steal_time(cputime_t cputime)
 {
 	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
-	cputime64_t cputime64 = cputime_to_cputime64(cputime);
 
-	cpustat->steal = cputime64_add(cpustat->steal, cputime64);
+	cpustat->steal += (__force cputime64_t) cputime;
 }
 
 /*
@@ -3856,13 +3848,12 @@ void account_steal_time(cputime_t cputim
 void account_idle_time(cputime_t cputime)
 {
 	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
-	cputime64_t cputime64 = cputime_to_cputime64(cputime);
 	struct rq *rq = this_rq();
 
 	if (atomic_read(&rq->nr_iowait) > 0)
-		cpustat->iowait = cputime64_add(cpustat->iowait, cputime64);
+		cpustat->iowait += (__force cputime64_t) cputime;
 	else
-		cpustat->idle = cputime64_add(cpustat->idle, cputime64);
+		cpustat->idle += (__force cputime64_t) cputime;
 }
 
 static __always_inline bool steal_account_process_tick(void)
@@ -3912,16 +3903,15 @@ static void irqtime_account_process_tick
 						struct rq *rq)
 {
 	cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
-	cputime64_t tmp = cputime_to_cputime64(cputime_one_jiffy);
 	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
 
 	if (steal_account_process_tick())
 		return;
 
 	if (irqtime_account_hi_update()) {
-		cpustat->irq = cputime64_add(cpustat->irq, tmp);
+		cpustat->irq += (__force cputime64_t) cputime_one_jiffy;
 	} else if (irqtime_account_si_update()) {
-		cpustat->softirq = cputime64_add(cpustat->softirq, tmp);
+		cpustat->softirq += (__force cputime64_t) cputime_one_jiffy;
 	} else if (this_cpu_ksoftirqd() == p) {
 		/*
 		 * ksoftirqd time do not get accounted in cpu_softirq_time.
@@ -4037,7 +4027,7 @@ void thread_group_times(struct task_stru
 
 void task_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
 {
-	cputime_t rtime, utime = p->utime, total = cputime_add(utime, p->stime);
+	cputime_t rtime, utime = p->utime, total = utime + p->stime;
 
 	/*
 	 * Use CFS's precise accounting:
@@ -4045,11 +4035,11 @@ void task_times(struct task_struct *p, c
 	rtime = nsecs_to_cputime(p->se.sum_exec_runtime);
 
 	if (total) {
-		u64 temp = rtime;
+		u64 temp = (__force u64) rtime;
 
-		temp *= utime;
-		do_div(temp, total);
-		utime = (cputime_t)temp;
+		temp *= (__force u64) utime;
+		do_div(temp, (__force u32) total);
+		utime = (__force cputime_t) temp;
 	} else
 		utime = rtime;
 
@@ -4057,7 +4047,7 @@ void task_times(struct task_struct *p, c
 	 * Compare with previous values, to keep monotonicity:
 	 */
 	p->prev_utime = max(p->prev_utime, utime);
-	p->prev_stime = max(p->prev_stime, cputime_sub(rtime, p->prev_utime));
+	p->prev_stime = max(p->prev_stime, rtime - p->prev_utime);
 
 	*ut = p->prev_utime;
 	*st = p->prev_stime;
@@ -4074,21 +4064,20 @@ void thread_group_times(struct task_stru
 
 	thread_group_cputime(p, &cputime);
 
-	total = cputime_add(cputime.utime, cputime.stime);
+	total = cputime.utime + cputime.stime;
 	rtime = nsecs_to_cputime(cputime.sum_exec_runtime);
 
 	if (total) {
-		u64 temp = rtime;
+		u64 temp = (__force u64) rtime;
 
-		temp *= cputime.utime;
-		do_div(temp, total);
-		utime = (cputime_t)temp;
+		temp *= (__force u64) cputime.utime;
+		do_div(temp, (__force u32) total);
+		utime = (__force cputime_t) temp;
 	} else
 		utime = rtime;
 
 	sig->prev_utime = max(sig->prev_utime, utime);
-	sig->prev_stime = max(sig->prev_stime,
-			      cputime_sub(rtime, sig->prev_utime));
+	sig->prev_stime = max(sig->prev_stime, rtime - sig->prev_utime);
 
 	*ut = sig->prev_utime;
 	*st = sig->prev_stime;
@@ -9330,7 +9319,8 @@ static void cpuacct_update_stats(struct
 	ca = task_ca(tsk);
 
 	do {
-		__percpu_counter_add(&ca->cpustat[idx], val, batch);
+		__percpu_counter_add(&ca->cpustat[idx],
+				     (__force s64) val, batch);
 		ca = ca->parent;
 	} while (ca);
 	rcu_read_unlock();
--- a/kernel/sched_stats.h
+++ b/kernel/sched_stats.h
@@ -283,8 +283,7 @@ static inline void account_group_user_ti
 		return;
 
 	spin_lock(&cputimer->lock);
-	cputimer->cputime.utime =
-		cputime_add(cputimer->cputime.utime, cputime);
+	cputimer->cputime.utime += cputime;
 	spin_unlock(&cputimer->lock);
 }
 
@@ -307,8 +306,7 @@ static inline void account_group_system_
 		return;
 
 	spin_lock(&cputimer->lock);
-	cputimer->cputime.stime =
-		cputime_add(cputimer->cputime.stime, cputime);
+	cputimer->cputime.stime += cputime;
 	spin_unlock(&cputimer->lock);
 }
 
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1621,10 +1621,8 @@ bool do_notify_parent(struct task_struct
 	info.si_uid = __task_cred(tsk)->uid;
 	rcu_read_unlock();
 
-	info.si_utime = cputime_to_clock_t(cputime_add(tsk->utime,
-				tsk->signal->utime));
-	info.si_stime = cputime_to_clock_t(cputime_add(tsk->stime,
-				tsk->signal->stime));
+	info.si_utime = cputime_to_clock_t(tsk->utime + tsk->signal->utime);
+	info.si_stime = cputime_to_clock_t(tsk->stime + tsk->signal->stime);
 
 	info.si_status = tsk->exit_code & 0x7f;
 	if (tsk->exit_code & 0x80)
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1632,8 +1632,8 @@ static void k_getrusage(struct task_stru
 
 		case RUSAGE_SELF:
 			thread_group_times(p, &tgutime, &tgstime);
-			utime = cputime_add(utime, tgutime);
-			stime = cputime_add(stime, tgstime);
+			utime += tgutime;
+			stime += tgstime;
 			r->ru_nvcsw += p->signal->nvcsw;
 			r->ru_nivcsw += p->signal->nivcsw;
 			r->ru_minflt += p->signal->min_flt;
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -127,7 +127,7 @@ void acct_update_integrals(struct task_s
 
 		local_irq_save(flags);
 		time = tsk->stime + tsk->utime;
-		dtime = cputime_sub(time, tsk->acct_timexpd);
+		dtime = time - tsk->acct_timexpd;
 		jiffies_to_timeval(cputime_to_jiffies(dtime), &value);
 		delta = value.tv_sec;
 		delta = delta * USEC_PER_SEC + value.tv_usec;
@@ -148,7 +148,7 @@ void acct_update_integrals(struct task_s
  */
 void acct_clear_integrals(struct task_struct *tsk)
 {
-	tsk->acct_timexpd = 0;
+	tsk->acct_timexpd = cputime_zero;
 	tsk->acct_rss_mem1 = 0;
 	tsk->acct_vm_mem1 = 0;
 }


-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-20 14:36                 ` Linux 3.1-rc9 Martin Schwidefsky
@ 2011-10-23 11:34                   ` Ingo Molnar
  2011-10-24  7:48                     ` Martin Schwidefsky
  0 siblings, 1 reply; 98+ messages in thread
From: Ingo Molnar @ 2011-10-23 11:34 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Thomas Gleixner


* Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:

> +#define cputime_zero			((__force cputime_t) 0ULL)
> +#define cputime64_zero			((__force cputime64_t) 0ULL)

Hm, why are these still needed?

This:

		if (*newval == cputime_zero)
			return;

Could be written as the much simpler:

		if (!*newval)
 			return;

with no ill effect that i can see.

Thanks,
	Ingo

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-23 11:34                   ` Ingo Molnar
@ 2011-10-24  7:48                     ` Martin Schwidefsky
  2011-10-24  7:51                       ` Linus Torvalds
  0 siblings, 1 reply; 98+ messages in thread
From: Martin Schwidefsky @ 2011-10-24  7:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Thomas Gleixner

On Sun, 23 Oct 2011 13:34:22 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:
> 
> > +#define cputime_zero			((__force cputime_t) 0ULL)
> > +#define cputime64_zero			((__force cputime64_t) 0ULL)
> 
> Hm, why are these still needed?
> 
> This:
> 
> 		if (*newval == cputime_zero)
> 			return;
> 
> Could be written as the much simpler:
> 
> 		if (!*newval)
>  			return;
> 
> with no ill effect that i can see.

These defines are still there because cputime_t can be u32 or u64. E.g. this

  timer->expires.cpu = 0;

will give the following sparse warning

  kernel/posix-cpu-timers.c:463:46: warning: implicit cast to nocast type

if your architecture happens to have a u64 as cputime_t.
We could get rid of cputime64_t as it always should be a u64. To keep
things symmetrical I chose to keep both defines.
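
For illustration, a minimal standalone sketch of the difference (simplified,
not the real <asm/cputime.h>; the __nocast/__force defines below just mimic
the kernel's sparse annotations, which expand to nothing for gcc and are only
meaningful to sparse itself):

  #ifdef __CHECKER__			/* defined by sparse */
  # define __nocast	__attribute__((nocast))
  # define __force	__attribute__((force))
  #else
  # define __nocast
  # define __force
  #endif

  typedef unsigned long long __nocast cputime_t;

  #define cputime_zero	((__force cputime_t) 0ULL)

  struct cpu_timer_list {
  	cputime_t expires;
  };

  static void clear_expires(struct cpu_timer_list *timer)
  {
  	timer->expires = 0;		/* sparse: implicit cast to nocast type */
  	timer->expires = cputime_zero;	/* no warning */
  }

Running sparse over a file like that should flag only the first assignment,
which is exactly the warning quoted above.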

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-24  7:48                     ` Martin Schwidefsky
@ 2011-10-24  7:51                       ` Linus Torvalds
  2011-10-24  8:08                         ` Martin Schwidefsky
  0 siblings, 1 reply; 98+ messages in thread
From: Linus Torvalds @ 2011-10-24  7:51 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Ingo Molnar, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Thomas Gleixner

On Mon, Oct 24, 2011 at 9:48 AM, Martin Schwidefsky
<schwidefsky@de.ibm.com> wrote:
>
> These types are still there because cputime_t can be u32 or u64. E.g. this
>
>  timer->expires.cpu = 0;
>
> will give the following sparse warning
>
>  kernel/posix-cpu-timers.c:463:46: warning: implicit cast to nocast type

Ok, we should probably special-case zero for that case too (we
consider zero to be very special - it's not only the NULL pointer, but
0 is special for the bitwise types etc). So this is very arguably a
sparse issue: casting zero is special.
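
A rough sketch of the asymmetry, with made-up type names that only mirror
the behaviour described in this thread:

  #ifdef __CHECKER__			/* defined by sparse */
  # define __bitwise	__attribute__((bitwise))
  # define __nocast	__attribute__((nocast))
  #else
  # define __bitwise
  # define __nocast
  #endif

  typedef unsigned int  __bitwise flags_t;	/* gfp_t-style bitwise type */
  typedef unsigned long __nocast  cpu_t;	/* cputime_t-style nocast type */

  static flags_t f = 0;		/* accepted: 0 is already special for bitwise */
  static cpu_t   c = 0;		/* warns: "implicit cast to nocast type";
  				 * special-casing 0 would make this clean too */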

                  Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-24  7:51                       ` Linus Torvalds
@ 2011-10-24  8:08                         ` Martin Schwidefsky
  0 siblings, 0 replies; 98+ messages in thread
From: Martin Schwidefsky @ 2011-10-24  8:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Thomas Gleixner

On Mon, 24 Oct 2011 09:51:09 +0200
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Oct 24, 2011 at 9:48 AM, Martin Schwidefsky
> <schwidefsky@de.ibm.com> wrote:
> >
> > These types are still there because cputime_t can be u32 or u64. E.g. this
> >
> >  timer->expires.cpu = 0;
> >
> > will give the following sparse warning
> >
> >  kernel/posix-cpu-timers.c:463:46: warning: implicit cast to nocast type
> 
> Ok, we should probably special-case zero for that case too (we
> consider zero to be very special - it's not only the NULL pointer, but
> 0 is special for the bitwise types etc). So this is very arguably a
> sparse issue: casting zero is special.

Ok, cool. In that case I'll cook up a patch without cputime_zero & cputime64_zero
and put it on the cputime branch on git390.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-18 19:48                                   ` Thomas Gleixner
  2011-10-18 20:12                                     ` Linus Torvalds
@ 2011-10-24 19:02                                     ` Simon Kirby
  2011-10-25  7:13                                       ` Linus Torvalds
  2011-10-25 20:20                                       ` Simon Kirby
  1 sibling, 2 replies; 98+ messages in thread
From: Simon Kirby @ 2011-10-24 19:02 UTC (permalink / raw)
  To: Thomas Gleixner, David Miller
  Cc: Peter Zijlstra, Linus Torvalds, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar

On Tue, Oct 18, 2011 at 09:48:51PM +0200, Thomas Gleixner wrote:

> On Tue, 18 Oct 2011, Simon Kirby wrote:
> > Looks good running on three boxes since this morning (unpatched kernel
> > hangs in ~15 minutes).
> > 
> > While I have your eyes, does this hang trace make any sense (which
> > happened a couple of times with your previous patch applied)?
> > 
> > http://0x.ca/sim/ref/3.1-rc9/3.1-rc9-tcp-lockup.log
> > 
> > I don't see how all CPUs could be spinning on the same lock without
> > reentry, and I don't see any in the backtraces.
> 
> Weird.
> 
> Which version of Peters patches was this, the extra lock or the
> atomic64 thingy?

The first one with the extra lock. I never tried the atomic64 one.
Anyway, that's fixed now.

> It does not look related. Could you try to reproduce that problem with
> lockdep enabled? lockdep might make it go away, but it's definitely
> worth a try.

Trying now...

...Whoops, never sent this email.

Ok, hit the hang about 4 more times, but only this morning on a box with
a serial cable attached. Yay!

Simon-

[216695.579770] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216695.589435] 
[216695.589437] =======================================================
[216695.593380] [ INFO: possible circular locking dependency detected ]
[216695.593380] 3.1.0-rc10-hw-lockdep+ #51
[216695.593380] -------------------------------------------------------
[216695.593380] kworker/0:1/0 is trying to acquire lock:
[216695.593380]  (&icsk->icsk_retransmit_timer){+.-.-.}, at: [<ffffffff8106cc88>] run_timer_softirq+0x198/0x410
[216695.593380] 
[216695.593380] but task is already holding lock:
[216695.593380]  (slock-AF_INET){+.-.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
[216695.593380] 
[216695.593380] which lock already depends on the new lock.
[216695.593380] 
[216695.593380] 
[216695.593380] the existing dependency chain (in reverse order) is:
[216695.593380] 
[216695.593380] -> #1 (slock-AF_INET){+.-.-.}:
[216695.593380]        [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[216695.593380]        [<ffffffff816f55fc>] _raw_spin_lock+0x3c/0x50
[216695.593380]        [<ffffffff81661cc3>] tcp_write_timer+0x23/0x230
[216695.682901]        [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[216695.682901]        [<ffffffff810640b8>] __do_softirq+0x138/0x250
[216695.682901]        [<ffffffff817002bc>] call_softirq+0x1c/0x30
[216695.682901]        [<ffffffff810153c5>] do_softirq+0x95/0xd0
[216695.682901]        [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[216695.682901]        [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[216695.682901]        [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[216695.682901]        [<ffffffff81013156>] cpu_idle+0x96/0xf0
[216695.682901]        [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[216695.682901] 
[216695.682901] -> #0 (&icsk->icsk_retransmit_timer){+.-.-.}:
[216695.682901]        [<ffffffff8109a000>] __lock_acquire+0x2040/0x2180
[216695.682901]        [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[216695.682901]        [<ffffffff8106cd09>] run_timer_softirq+0x219/0x410
[216695.682901]        [<ffffffff810640b8>] __do_softirq+0x138/0x250
[216695.682901]        [<ffffffff817002bc>] call_softirq+0x1c/0x30
[216695.682901]        [<ffffffff810153c5>] do_softirq+0x95/0xd0
[216695.682901]        [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[216695.682901]        [<ffffffff81014b74>] do_IRQ+0x64/0xe0
[216695.682901]        [<ffffffff816f6273>] ret_from_intr+0x0/0x1a
[216695.682901]        [<ffffffff81013156>] cpu_idle+0x96/0xf0
[216695.682901]        [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[216695.682901] 
[216695.682901] other info that might help us debug this:
[216695.682901] 
[216695.682901]  Possible unsafe locking scenario:
[216695.682901] 
[216695.682901]        CPU0                    CPU1
[216695.682901]        ----                    ----
[216695.682901]   lock(slock-AF_INET);
[216695.682901]                                lock(&icsk->icsk_retransmit_timer);
[216695.682901]                                lock(slock-AF_INET);
[216695.682901]   lock(&icsk->icsk_retransmit_timer);
[216695.682901] 
[216695.682901]  *** DEADLOCK ***
[216695.682901] 
[216695.682901] 1 lock held by kworker/0:1/0:
[216695.682901]  #0:  (slock-AF_INET){+.-.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
[216695.682901] 
[216695.682901] stack backtrace:
[216695.682901] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51
[216695.682901] Call Trace:
[216695.682901]  <IRQ>  [<ffffffff81097eab>] print_circular_bug+0x21b/0x330
[216695.682901]  [<ffffffff8109a000>] __lock_acquire+0x2040/0x2180
[216695.682901]  [<ffffffffa001d6e2>] ? nf_conntrack_free+0x42/0x50 [nf_conntrack]
[216695.682901]  [<ffffffffa001d6e2>] ? nf_conntrack_free+0x42/0x50 [nf_conntrack]
[216695.682901]  [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[216695.682901]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[216695.682901]  [<ffffffff81096b4c>] ? trace_hardirqs_on_caller+0x7c/0x1c0
[216695.682901]  [<ffffffff8106cd09>] run_timer_softirq+0x219/0x410
[216695.682901]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[216695.682901]  [<ffffffff816f16c1>] ? printk+0x67/0x69
[216695.682901]  [<ffffffff81661ca0>] ? tcp_delack_timer+0x230/0x230
[216695.682901]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[216695.682901]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[216695.682901]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[216695.682901]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[216695.682901]  [<ffffffff81014b74>] do_IRQ+0x64/0xe0
[216695.682901]  [<ffffffff816f6273>] common_interrupt+0x73/0x73
[216695.682901]  <EOI>  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[216695.682901]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[216695.682901]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[216695.682901]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[216696.019296] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000105?
[216697.762956] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216698.597297] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216701.489681] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216701.667999] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216704.580592] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[216709.468971] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[216712.845904] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216716.588502] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[216725.072958] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[216725.603879] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216725.828374] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216727.588978] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216735.513864] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216740.581530] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216756.278571] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[218855.312903] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[218855.323133] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[218858.293355] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[218864.301938] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[218876.333821] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[218885.332651] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[218900.313590] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[220821.012017] TCP: Peer 32.176.160.153:49226/80 unexpectedly shrunk window 665256753:665268993 (repaired)
[221075.224300] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221075.234579] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221075.277593] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221075.780515] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221075.780713] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221077.349279] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221077.905587] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[221077.915567] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[221081.498430] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221081.703277] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[221082.088513] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221082.167985] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221089.772578] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221090.487927] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[221090.686394] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221094.587131] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221105.255699] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[221105.280699] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221105.291634] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221106.325794] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221107.286029] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221107.622736] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221107.734471] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221120.381643] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[223936.264020] BUG: soft lockup - CPU#0 stuck for 22s! [swapper:0]
[223936.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223936.268002] irq event stamp: 2595159887
[223936.268002] hardirqs last  enabled at (2595159887): [<ffffffff8101b805>] mwait_idle+0x145/0x170
[223936.268002] hardirqs last disabled at (2595159886): [<ffffffff81013139>] cpu_idle+0x79/0xf0
[223936.268002] softirqs last  enabled at (2595159878): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
[223936.268002] softirqs last disabled at (2595159873): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.268002] CPU 0 
[223936.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223936.268002] 
[223936.268002] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223936.268002] RIP: 0010:[<ffffffff813a4ee3>]  [<ffffffff813a4ee3>] delay_tsc+0x73/0xd0
[223936.268002] RSP: 0018:ffff88022fc03cf0  EFLAGS: 00000202
[223936.268002] RAX: 00017b5d5932dd02 RBX: ffffffff816f6334 RCX: 000000005932dd02
[223936.372028] BUG: soft lockup - CPU#1 stuck for 22s! [kworker/0:0:0]
[223936.372031] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223936.372042] irq event stamp: 2598787699
[223936.372044] hardirqs last  enabled at (2598787699): [<ffffffff8101b805>] mwait_idle+0x145/0x170
[223936.372054] hardirqs last disabled at (2598787698): [<ffffffff81013139>] cpu_idle+0x79/0xf0
[223936.372058] softirqs last  enabled at (2598787696): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
[223936.372063] softirqs last disabled at (2598787681): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.372069] CPU 1 
[223936.372070] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223936.372079] 
[223936.372081] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223936.372086] RIP: 0010:[<ffffffff8101afab>]  [<ffffffff8101afab>] native_read_tsc+0xb/0x20
[223936.372091] RSP: 0018:ffff88022fc43ce0  EFLAGS: 00000202
[223936.372093] RAX: 0000000000017b5d RBX: ffffffff816f6334 RCX: 00000000652f810e
[223936.372096] RDX: 0000000000017b5d RSI: 0000000000000000 RDI: 0000000000000001
[223936.372098] RBP: ffff88022fc43ce0 R08: 00000000652f80c8 R09: 0000000000000000
[223936.372101] R10: ffff88022690dd60 R11: 0000000000000000 R12: ffff88022fc43c58
[223936.372103] R13: ffffffff816feb33 R14: ffff88022fc43ce0 R15: 00000000180bbeb8
[223936.372106] FS:  0000000000000000(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
[223936.372108] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223936.372111] CR2: 00007f472ba6a6fc CR3: 0000000126bb7000 CR4: 00000000000006e0
[223936.372113] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223936.372116] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223936.372119] Process kworker/0:0 (pid: 0, threadinfo ffff880226928000, task ffff88022690dd60)
[223936.372121] Stack:
[223936.372123]  ffff88022fc43d30 ffffffff813a4eaf ffff880226928000 00000000652f8090
[223936.372128]  000000012fc43d18 ffff88002e90e348 00000000180bbeb8 000000006efcdc62
[223936.372132]  0000000000000001 ffff88022690dd60 ffff88022fc43d40 ffffffff813a4f6a
[223936.372136] Call Trace:
[223936.372139]  <IRQ> 
[223936.372144]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
[223936.372148]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223936.372153]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223936.372159]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223936.372164]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223936.372168]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223936.372174]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223936.372178]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223936.372182]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223936.372186]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223936.372190]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.372194]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223936.372198]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223936.372203]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223936.372208]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223936.372210]  <EOI> 
[223936.372214]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.372218]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223936.372222]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.372226]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223936.372230]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223936.372233] Code: a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 89 c1 48 89 d0 
[223936.372253]  c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 
[223936.372262] Call Trace:
[223936.372264]  <IRQ>  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
[223936.372269]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223936.372272]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223936.372276]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223936.372280]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223936.372283]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223936.372286]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223936.372289]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223936.372293]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223936.372297]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223936.372300]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.372303]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223936.372307]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223936.372310]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223936.372313]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223936.372315]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.372321]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223936.372324]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.372327]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223936.372331]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223936.476032] BUG: soft lockup - CPU#2 stuck for 23s! [kworker/0:1:0]
[223936.476034] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223936.476043] irq event stamp: 2613824057
[223936.476045] hardirqs last  enabled at (2613824057): [<ffffffff8101b805>] mwait_idle+0x145/0x170
[223936.476050] hardirqs last disabled at (2613824056): [<ffffffff81013139>] cpu_idle+0x79/0xf0
[223936.476054] softirqs last  enabled at (2613824048): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
[223936.476058] softirqs last disabled at (2613824031): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.476062] CPU 2 
[223936.476063] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223936.476071] 
[223936.476073] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223936.476077] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
[223936.476082] RSP: 0018:ffff88022fc83ce0  EFLAGS: 00000202
[223936.476084] RAX: 0000000070ba7dfc RBX: ffffffff813a60ae RCX: 0000000070ba7dc4
[223936.476086] RDX: 0000000000017b5d RSI: 0000000000000000 RDI: 0000000000000001
[223936.476089] RBP: ffff88022fc83ce0 R08: 0000000070ba7d7e R09: 0000000000000000
[223936.476091] R10: ffff880226948000 R11: 0000000000000000 R12: ffff88022fc83c58
[223936.476093] R13: ffffffff816feb33 R14: ffff88022fc83ce0 R15: 00000000182285f9
[223936.476096] FS:  0000000000000000(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
[223936.476099] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223936.476101] CR2: 00007f25e7874d7f CR3: 0000000124c0d000 CR4: 00000000000006e0
[223936.476104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223936.476106] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223936.476109] Process kworker/0:1 (pid: 0, threadinfo ffff88022693e000, task ffff880226948000)
[223936.476111] Stack:
[223936.476113]  ffff88022fc83d30 ffffffff813a4ee8 ffff88022693e000 0000000070ba7dc4
[223936.476117]  00000002ffffff10 ffff88006afd8948 00000000182285f9 000000006efcdc62
[223936.476121]  0000000000000001 ffff880226948000 ffff88022fc83d40 ffffffff813a4f6a
[223936.476126] Call Trace:
[223936.476128]  <IRQ> 
[223936.476132]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223936.476136]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223936.476141]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223936.476147]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223936.476153]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223936.476157]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223936.476163]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223936.476167]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223936.476171]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223936.476176]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223936.476180]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.476184]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223936.476187]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223936.476193]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223936.476197]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223936.476199]  <EOI> 
[223936.476203]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.476207]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223936.476211]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.476215]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223936.476219]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223936.476222] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 
[223936.476241]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 
[223936.476251] Call Trace:
[223936.476252]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223936.476257]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223936.476261]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223936.476265]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223936.476268]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223936.476272]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223936.476275]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223936.476278]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223936.476282]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223936.476286]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223936.476289]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.476292]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223936.476295]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223936.476299]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223936.476302]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223936.476304]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.476310]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223936.476313]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.476316]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223936.476320]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223936.580039] BUG: soft lockup - CPU#3 stuck for 23s! [kworker/0:1:0]
[223936.580041] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223936.580050] irq event stamp: 2615464042
[223936.580052] hardirqs last  enabled at (2615464042): [<ffffffff816f5edb>] _raw_spin_unlock_irq+0x2b/0x50
[223936.580057] hardirqs last disabled at (2615464041): [<ffffffff816f56a8>] _raw_spin_lock_irq+0x18/0x60
[223936.580061] softirqs last  enabled at (2615463964): [<ffffffff81063cce>] _local_bh_enable+0xe/0x10
[223936.580065] softirqs last disabled at (2615463965): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.580069] CPU 3 
[223936.580070] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223936.580078] 
[223936.580080] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223936.580085] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
[223936.580090] RSP: 0018:ffff88022fcc3ce0  EFLAGS: 00000202
[223936.580092] RAX: 000000007c457b06 RBX: ffffffff816f6334 RCX: 000000007c457ad5
[223936.580094] RDX: 0000000000017b5d RSI: ffffffff818f9896 RDI: 0000000000000001
[223936.580097] RBP: ffff88022fcc3ce0 R08: 000000007c457a88 R09: 0000000000000000
[223936.580099] R10: ffff880226981f20 R11: 0000000000000000 R12: ffff88022fcc3c58
[223936.580101] R13: ffffffff816feb33 R14: ffff88022fcc3ce0 R15: 00000000183a1380
[223936.580104] FS:  0000000000000000(0000) GS:ffff88022fcc0000(0000) knlGS:0000000000000000
[223936.580107] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223936.580109] CR2: 0000000000f38820 CR3: 0000000104b52000 CR4: 00000000000006e0
[223936.580112] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223936.580114] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223936.580117] Process kworker/0:1 (pid: 0, threadinfo ffff880226988000, task ffff880226981f20)
[223936.580119] Stack:
[223936.580120]  ffff88022fcc3d30 ffffffff813a4ee8 ffff880226988000 000000007c457ad5
[223936.580125]  00000003ffffff10 ffff880031438948 00000000183a1380 000000006efcdc62
[223936.580129]  0000000000000001 ffff880226981f20 ffff88022fcc3d40 ffffffff813a4f6a
[223936.580133] Call Trace:
[223936.580135]  <IRQ> 
[223936.580138]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223936.580142]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223936.580147]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223936.580151]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223936.580156]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223936.580160]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223936.580164]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223936.580168]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223936.580172]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223936.580176]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223936.580181]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.580185]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223936.580188]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223936.580192]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223936.580196]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223936.580199]  <EOI> 
[223936.580202]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.580206]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223936.580211]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.580214]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223936.580218]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223936.580221] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 
[223936.580240]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 
[223936.580250] Call Trace:
[223936.580251]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223936.580256]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223936.580260]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223936.580264]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223936.580267]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223936.580270]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223936.580274]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223936.580277]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223936.580280]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223936.580284]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223936.580288]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.580291]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223936.580294]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223936.580297]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223936.580300]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223936.580302]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.580308]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223936.580312]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.580315]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223936.580318]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223936.268002] RDX: 000000005932dd02 RSI: ffffffff818f9896 RDI: 0000000000000001
[223936.268002] RBP: ffff88022fc03d30 R08: 000000005932dcb5 R09: 0000000000000000
[223936.268002] R10: ffffffff81a2b020 R11: 0000000000000000 R12: ffff88022fc03c68
[223936.268002] R13: ffffffff816feb33 R14: ffff88022fc03d30 R15: 0000000017f328cd
[223936.268002] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
[223936.268002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223936.268002] CR2: 00007f25e7bc13a0 CR3: 00000001426fc000 CR4: 00000000000006f0
[223936.268002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223936.268002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223936.268002] Process swapper (pid: 0, threadinfo ffffffff819a6000, task ffffffff81a2b020)
[223936.268002] Stack:
[223936.268002]  ffffffff819a6000 000000005932dd02 000000002fc03d18 ffff8801f6c22448
[223936.268002]  0000000017f328cd 000000006efcdc62 0000000000000001 ffffffff81a2b020
[223936.268002]  ffff88022fc03d40 ffffffff813a4f6a ffff88022fc03d80 ffffffff813ac2ab
[223936.268002] Call Trace:
[223936.268002]  <IRQ> 
[223936.268002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223936.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223936.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223936.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223936.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223936.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223936.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223936.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223936.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223936.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223936.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223936.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223936.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223936.268002]  <EOI> 
[223936.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223936.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223936.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
[223936.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
[223936.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
[223936.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
[223936.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
[223936.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
[223936.268002] Code: 4c 89 7d c8 eb 1f 66 90 48 8b 45 c0 83 68 1c 01 f3 90 83 40 1c 01 65 44 8b 3c 25 50 d3 00 00 44 3b 7d d4 75 3b 66 66 90 0f ae e8 <e8> b8 60 c7 ff 66 90 4c 63 c0 4c 89 c0 48 2b 45 c8 48 39 d8 72 
[223936.268002] Call Trace:
[223936.268002]  <IRQ>  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223936.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223936.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223936.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223936.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223936.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223936.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223936.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223936.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223936.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223936.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223936.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223936.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223936.268002]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223936.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223936.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
[223936.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
[223936.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
[223936.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
[223936.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
[223936.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
[223964.264018] BUG: soft lockup - CPU#0 stuck for 22s! [swapper:0]
[223964.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223964.268002] irq event stamp: 2595159887
[223964.268002] hardirqs last  enabled at (2595159887): [<ffffffff8101b805>] mwait_idle+0x145/0x170
[223964.268002] hardirqs last disabled at (2595159886): [<ffffffff81013139>] cpu_idle+0x79/0xf0
[223964.268002] softirqs last  enabled at (2595159878): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
[223964.268002] softirqs last disabled at (2595159873): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.268002] CPU 0 
[223964.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223964.268002] 
[223964.268002] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223964.268002] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
[223964.268002] RSP: 0018:ffff88022fc03ce0  EFLAGS: 00000202
[223964.268002] RAX: 000000007cb6c61b RBX: ffffffff816f6334 RCX: 000000007cb6c5e3
[223964.372025] BUG: soft lockup - CPU#1 stuck for 23s! [kworker/0:0:0]
[223964.372027] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223964.372036] irq event stamp: 2598787699
[223964.372037] hardirqs last  enabled at (2598787699): [<ffffffff8101b805>] mwait_idle+0x145/0x170
[223964.372042] hardirqs last disabled at (2598787698): [<ffffffff81013139>] cpu_idle+0x79/0xf0
[223964.372045] softirqs last  enabled at (2598787696): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
[223964.372049] softirqs last disabled at (2598787681): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.372052] CPU 1 
[223964.372053] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223964.372061] 
[223964.372063] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223964.372067] RIP: 0010:[<ffffffff8101afa0>]  [<ffffffff8101afa0>] read_persistent_clock+0x30/0x30
[223964.372072] RSP: 0018:ffff88022fc43ce8  EFLAGS: 00000202
[223964.372074] RAX: 0000000000000001 RBX: ffff88022fc43c68 RCX: 0000000088b369fd
[223964.372076] RDX: 0000000000000200 RSI: 0000000000000000 RDI: 0000000000000001
[223964.372078] RBP: ffff88022fc43d30 R08: ffffffff88b369fd R09: 0000000000000000
[223964.372081] R10: ffff88022690dd60 R11: 0000000000000000 R12: ffff88022fc43c58
[223964.372083] R13: ffffffff816feb33 R14: ffff88022fc43d30 R15: 00000000307e58b4
[223964.372086] FS:  0000000000000000(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
[223964.372089] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223964.372091] CR2: 00007f472ba6a6fc CR3: 0000000126bb7000 CR4: 00000000000006e0
[223964.372093] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223964.372096] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223964.372098] Process kworker/0:0 (pid: 0, threadinfo ffff880226928000, task ffff88022690dd60)
[223964.372100] Stack:
[223964.372102]  ffffffff813a4eaf ffff880226928000 ffffffff88b369c5 000000012fc43d18
[223964.372106]  ffff88002e90e348 00000000307e58b4 000000006efcdc62 0000000000000001
[223964.372111]  ffff88022690dd60 ffff88022fc43d40 ffffffff813a4f6a ffff88022fc43d80
[223964.372115] Call Trace:
[223964.372116]  <IRQ> 
[223964.372119]  [<ffffffff813a4eaf>] ? delay_tsc+0x3f/0xd0
[223964.372123]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223964.372127]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223964.372132]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223964.372136]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223964.372140]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223964.372144]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223964.372148]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223964.372153]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223964.372158]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223964.372162]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.372166]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223964.372170]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223964.372174]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223964.372178]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223964.372180]  <EOI> 
[223964.372184]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.372188]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223964.372192]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.372196]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223964.372200]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223964.372203] Code: 48 89 fb 48 83 ec 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 
[223964.372221]  48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 
[223964.372231] Call Trace:
[223964.372232]  <IRQ>  [<ffffffff813a4eaf>] ? delay_tsc+0x3f/0xd0
[223964.372237]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223964.372241]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223964.372245]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223964.372248]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223964.372251]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223964.372255]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223964.372258]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223964.372261]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223964.372265]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223964.372268]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.372271]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223964.372275]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223964.372278]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223964.372281]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223964.372282]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.372288]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223964.372292]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.372295]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223964.372298]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223964.476031] BUG: soft lockup - CPU#2 stuck for 23s! [kworker/0:1:0]
[223964.476033] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223964.476042] irq event stamp: 2613824057
[223964.476043] hardirqs last  enabled at (2613824057): [<ffffffff8101b805>] mwait_idle+0x145/0x170
[223964.476048] hardirqs last disabled at (2613824056): [<ffffffff81013139>] cpu_idle+0x79/0xf0
[223964.476051] softirqs last  enabled at (2613824048): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
[223964.476055] softirqs last disabled at (2613824031): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.476059] CPU 2 
[223964.476060] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223964.476067] 
[223964.476070] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223964.476074] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
[223964.476078] RSP: 0018:ffff88022fc83ce0  EFLAGS: 00000206
[223964.476080] RAX: 00000000943e6715 RBX: ffffffff816f6334 RCX: 00000000943e66dd
[223964.476083] RDX: 0000000000017b69 RSI: 0000000000000000 RDI: 0000000000000001
[223964.476085] RBP: ffff88022fc83ce0 R08: ffffffff943e6697 R09: 0000000000000000
[223964.476087] R10: ffff880226948000 R11: 0000000000000000 R12: ffff88022fc83c58
[223964.476090] R13: ffffffff816feb33 R14: ffff88022fc83ce0 R15: 000000003094ad30
[223964.476092] FS:  0000000000000000(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
[223964.476095] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223964.476097] CR2: 00007f25e7874d7f CR3: 0000000124c0d000 CR4: 00000000000006e0
[223964.476100] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223964.476102] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223964.476105] Process kworker/0:1 (pid: 0, threadinfo ffff88022693e000, task ffff880226948000)
[223964.476107] Stack:
[223964.476108]  ffff88022fc83d30 ffffffff813a4ee8 ffff88022693e000 ffffffff943e66dd
[223964.476113]  00000002ffffff10 ffff88006afd8948 000000003094ad30 000000006efcdc62
[223964.476117]  0000000000000001 ffff880226948000 ffff88022fc83d40 ffffffff813a4f6a
[223964.476121] Call Trace:
[223964.476123]  <IRQ> 
[223964.476126]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223964.476130]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223964.476134]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223964.476139]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223964.476143]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223964.476147]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223964.476151]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223964.476155]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223964.476159]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223964.476164]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223964.476168]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.476172]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223964.476176]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223964.476180]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223964.476184]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223964.476186]  <EOI> 
[223964.476190]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.476194]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223964.476198]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.476202]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223964.476206]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223964.476208] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 
[223964.476227]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 
[223964.476236] Call Trace:
[223964.476238]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223964.476243]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223964.476246]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223964.476250]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223964.476254]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223964.476257]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223964.476260]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223964.476264]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223964.476267]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223964.476271]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223964.476274]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.476277]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223964.476281]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223964.476284]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223964.476287]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223964.476289]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.476295]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223964.476298]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.476301]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223964.476304]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223964.580038] BUG: soft lockup - CPU#3 stuck for 23s! [kworker/0:1:0]
[223964.580040] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223964.580049] irq event stamp: 2615464042
[223964.580050] hardirqs last  enabled at (2615464042): [<ffffffff816f5edb>] _raw_spin_unlock_irq+0x2b/0x50
[223964.580054] hardirqs last disabled at (2615464041): [<ffffffff816f56a8>] _raw_spin_lock_irq+0x18/0x60
[223964.580058] softirqs last  enabled at (2615463964): [<ffffffff81063cce>] _local_bh_enable+0xe/0x10
[223964.580062] softirqs last disabled at (2615463965): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.580066] CPU 3 
[223964.580067] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223964.580075] 
[223964.580077] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223964.580081] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
[223964.580086] RSP: 0018:ffff88022fcc3ce0  EFLAGS: 00000206
[223964.580088] RAX: 000000009fc963af RBX: ffffffff816f6334 RCX: 000000009fc96377
[223964.580090] RDX: 0000000000017b69 RSI: ffffffff818f9896 RDI: 0000000000000001
[223964.580093] RBP: ffff88022fcc3ce0 R08: ffffffff9fc96331 R09: 0000000000000000
[223964.580095] R10: ffff880226981f20 R11: 0000000000000000 R12: ffff88022fcc3c58
[223964.580097] R13: ffffffff816feb33 R14: ffff88022fcc3ce0 R15: 0000000030ac88b0
[223964.580100] FS:  0000000000000000(0000) GS:ffff88022fcc0000(0000) knlGS:0000000000000000
[223964.580103] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223964.580105] CR2: 0000000000f38820 CR3: 0000000104b52000 CR4: 00000000000006e0
[223964.580107] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223964.580110] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223964.580112] Process kworker/0:1 (pid: 0, threadinfo ffff880226988000, task ffff880226981f20)
[223964.580114] Stack:
[223964.580116]  ffff88022fcc3d30 ffffffff813a4ee8 ffff880226988000 ffffffff9fc96377
[223964.580120]  000000039c3b34d8 ffff880031438948 0000000030ac88b0 000000006efcdc62
[223964.580124]  0000000000000001 ffff880226981f20 ffff88022fcc3d40 ffffffff813a4f6a
[223964.580128] Call Trace:
[223964.580130]  <IRQ> 
[223964.580133]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223964.580137]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223964.580141]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223964.580146]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223964.580150]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223964.580154]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223964.580158]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223964.580162]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223964.580167]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223964.580171]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223964.580176]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.580180]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223964.580184]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223964.580188]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223964.580192]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223964.580194]  <EOI> 
[223964.580198]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.580202]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223964.580206]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.580210]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223964.580214]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223964.580217] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 
[223964.580235]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 
[223964.580245] Call Trace:
[223964.580246]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223964.580252]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223964.580255]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223964.580259]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223964.580262]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223964.580265]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223964.580269]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223964.580272]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223964.580276]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223964.580279]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223964.580283]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.580286]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223964.580289]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223964.580292]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223964.580295]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223964.580297]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.580303]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223964.580307]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.580310]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223964.580313]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223964.268002] RDX: 0000000000017b69 RSI: ffffffff818f9896 RDI: 0000000000000001
[223964.268002] RBP: ffff88022fc03ce0 R08: 000000007cb6c596 R09: 0000000000000000
[223964.268002] R10: ffffffff81a2b020 R11: 0000000000000000 R12: ffff88022fc03c58
[223964.268002] R13: ffffffff816feb33 R14: ffff88022fc03ce0 R15: 000000002eb85d38
[223964.268002] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
[223964.268002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223964.268002] CR2: 00007f25e7bc13a0 CR3: 00000001426fc000 CR4: 00000000000006f0
[223964.268002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223964.268002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223964.268002] Process swapper (pid: 0, threadinfo ffffffff819a6000, task ffffffff81a2b020)
[223964.268002] Stack:
[223964.268002]  ffff88022fc03d30 ffffffff813a4ee8 ffffffff819a6000 000000007cb6c5e3
[223964.268002]  000000007c44ac9c ffff8801f6c22448 000000002eb85d38 000000006efcdc62
[223964.268002]  0000000000000001 ffffffff81a2b020 ffff88022fc03d40 ffffffff813a4f6a
[223964.268002] Call Trace:
[223964.268002]  <IRQ> 
[223964.268002]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223964.268002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223964.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223964.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223964.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223964.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223964.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223964.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223964.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223964.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223964.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223964.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223964.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223964.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223964.268002]  <EOI> 
[223964.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223964.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223964.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
[223964.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
[223964.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
[223964.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
[223964.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
[223964.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
[223964.268002] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 
[223964.268002]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 
[223964.268002] Call Trace:
[223964.268002]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223964.268002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223964.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223964.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223964.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223964.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223964.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223964.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223964.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223964.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223964.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223964.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223964.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223964.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223964.268002]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223964.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223964.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
[223964.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
[223964.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
[223964.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
[223964.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
[223964.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
[223968.815995] INFO: rcu_sched_state detected stall on CPU 1 (t=15000 jiffies)
[223968.819995] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1} (detected by 3, t=15002 jiffies)
[223968.820000] sending NMI to all CPUs:
[223968.820002] NMI backtrace for cpu 3
[223968.820002] CPU 3 
[223968.820002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223968.820002] 
[223968.820002] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223968.820002] RIP: 0010:[<ffffffff813a4f86>]  [<ffffffff813a4f86>] __const_udelay+0x16/0x40
[223968.820002] RSP: 0018:ffff88022fcc3a90  EFLAGS: 00000002
[223968.820002] RAX: 0000000000e34d8a RBX: 0000000000000001 RCX: 0000000001062560
[223968.820002] RDX: 000000000071a6c5 RSI: 0000000000000002 RDI: 0000000000418958
[223968.820002] RBP: ffff88022fcc3ab0 R08: 0000000000000002 R09: 0000000000000000
[223968.820002] R10: 0000000000000006 R11: 000000000000000a R12: ffffffff81a40d80
[223968.820002] R13: 0000000000000010 R14: ffffffff81a40e40 R15: ffffffff81a40fc0
[223968.820002] FS:  0000000000000000(0000) GS:ffff88022fcc0000(0000) knlGS:0000000000000000
[223968.820002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223968.820002] CR2: 0000000000f38820 CR3: 0000000104b52000 CR4: 00000000000006e0
[223968.820002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223968.820002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223968.820002] Process kworker/0:1 (pid: 0, threadinfo ffff880226988000, task ffff880226981f20)
[223968.820002] Stack:
[223968.820002]  ffff88022fcc3ab0 ffffffff81031695 ffff88022fccdfa0 ffff88022fccdfa0
[223968.820002]  ffff88022fcc3af0 ffffffff810bb9d2 ffffffff81a40fc0 0000000000000003
[223968.820002]  0000000000000003 ffff880226981f20 ffffffff810921f0 ffff88022fcc3be0
[223968.820002] Call Trace:
[223968.820002]  <IRQ> 
[223968.820002]  [<ffffffff81031695>] ? arch_trigger_all_cpu_backtrace+0x65/0x90
[223968.820002]  [<ffffffff810bb9d2>] __rcu_pending+0x382/0x3b0
[223968.820002]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
[223968.820002]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
[223968.820002]  [<ffffffff8106d861>] update_process_times+0x41/0x80
[223968.820002]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
[223968.820002]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
[223968.820002]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
[223968.820002]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
[223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.820002]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
[223968.820002]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
[223968.820002]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
[223968.820002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223968.820002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223968.820002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223968.820002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223968.820002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223968.820002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223968.820002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223968.820002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223968.820002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223968.820002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223968.820002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223968.820002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223968.820002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.820002]  <EOI> 
[223968.820002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223968.820002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223968.820002]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223968.820002] Code: 00 00 00 00 00 55 48 89 e5 ff 15 8e a5 6c 00 c9 c3 0f 1f 40 00 55 48 8d 0c bd 00 00 00 00 65 48 8b 14 25 58 2d 01 00 48 8d 04 12 
[223968.820002]  c1 e2 06 48 89 e5 48 29 c2 48 89 c8 f7 e2 48 8d 7a 01 ff 15 
[223968.820002] Call Trace:
[223968.820002]  <IRQ>  [<ffffffff81031695>] ? arch_trigger_all_cpu_backtrace+0x65/0x90
[223968.820002]  [<ffffffff810bb9d2>] __rcu_pending+0x382/0x3b0
[223968.820002]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
[223968.820002]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
[223968.820002]  [<ffffffff8106d861>] update_process_times+0x41/0x80
[223968.820002]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
[223968.820002]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
[223968.820002]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
[223968.820002]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
[223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.820002]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
[223968.820002]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
[223968.820002]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
[223968.820002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223968.820002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223968.820002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223968.820002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223968.820002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223968.820002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223968.820002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223968.820002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223968.820002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223968.820002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223968.820002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223968.820002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223968.820002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.820002]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223968.820002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223968.820002]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223968.820335] NMI backtrace for cpu 0
[223968.820337] CPU 0 
[223968.820338] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223968.820347] 
[223968.820349] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223968.820353] RIP: 0010:[<ffffffff813a4ef0>]  [<ffffffff813a4ef0>] delay_tsc+0x80/0xd0
[223968.820358] RSP: 0018:ffff88022fc03cf0  EFLAGS: 00000206
[223968.820360] RAX: 000000007659b10f RBX: 0000000000000001 RCX: 000000007659b10f
[223968.820363] RDX: 000000007659b10f RSI: ffffffff818f9896 RDI: 0000000000000001
[223968.820365] RBP: ffff88022fc03d30 R08: 000000007659b10f R09: 0000000000000000
[223968.820367] R10: ffffffff81a2b020 R11: 0000000000000000 R12: 0000000031026962
[223968.820370] R13: 000000006efcdc62 R14: ffffffff819a6000 R15: 000000007659b0de
[223968.820373] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
[223968.820375] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223968.820377] CR2: 00007f25e7bc13a0 CR3: 00000001426fc000 CR4: 00000000000006f0
[223968.820380] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223968.820382] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223968.820385] Process swapper (pid: 0, threadinfo ffffffff819a6000, task ffffffff81a2b020)
[223968.820387] Stack:
[223968.820388]  ffffffff819a6000 000000007659b0de 00000000818f9896 ffff8801f6c22448
[223968.820393]  0000000031026962 000000006efcdc62 0000000000000001 ffffffff81a2b020
[223968.820397]  ffff88022fc03d40 ffffffff813a4f6a ffff88022fc03d80 ffffffff813ac2ab
[223968.820401] Call Trace:
[223968.820402]  <IRQ> 
[223968.820406]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223968.820410]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223968.820414]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223968.820417]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223968.820420]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223968.820424]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223968.820427]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223968.820430]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223968.820434]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223968.820437]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223968.820441]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223968.820444]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223968.820447]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223968.820450]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.820452]  <EOI> 
[223968.820455]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820459]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223968.820462]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820465]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223968.820468]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
[223968.820471]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
[223968.820475]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
[223968.820478]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
[223968.820481]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
[223968.820484]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
[223968.820486] Code: 68 1c 01 f3 90 83 40 1c 01 65 44 8b 3c 25 50 d3 00 00 44 3b 7d d4 75 3b 66 66 90 0f ae e8 e8 b8 60 c7 ff 66 90 4c 63 c0 4c 89 c0 
[223968.820504]  2b 45 c8 48 39 d8 72 c7 65 48 8b 04 25 08 c4 00 00 83 a8 44 
[223968.820514] Call Trace:
[223968.820515]  <IRQ>  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223968.820521]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223968.820525]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223968.820528]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223968.820532]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223968.820535]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223968.820538]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223968.820542]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223968.820546]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223968.820549]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223968.820552]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223968.820556]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223968.820559]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223968.820562]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.820564]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820570]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223968.820573]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820576]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223968.820579]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
[223968.820583]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
[223968.820586]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
[223968.820589]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
[223968.820593]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
[223968.820596]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
[223968.820599] NMI backtrace for cpu 2
[223968.820600] CPU 2 
[223968.820602] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223968.820610] 
[223968.820612] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223968.820616] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
[223968.820621] RSP: 0018:ffff88022fc83ce0  EFLAGS: 00000202
[223968.820623] RAX: 000000007659b116 RBX: 0000000000000001 RCX: 000000007659b0e5
[223968.820625] RDX: 0000000000017b6b RSI: 0000000000000000 RDI: 0000000000000001
[223968.820628] RBP: ffff88022fc83ce0 R08: 000000007659b098 R09: 0000000000000000
[223968.820630] R10: ffff880226948000 R11: 0000000000000000 R12: 00000000345f87d7
[223968.820632] R13: 000000006efcdc62 R14: ffff88022693e000 R15: 000000007659b0e5
[223968.820635] FS:  0000000000000000(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
[223968.820638] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223968.820640] CR2: 00007f25e7874d7f CR3: 0000000124c0d000 CR4: 00000000000006e0
[223968.820642] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223968.820645] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223968.820647] Process kworker/0:1 (pid: 0, threadinfo ffff88022693e000, task ffff880226948000)
[223968.820649] Stack:
[223968.820651]  ffff88022fc83d30 ffffffff813a4ee8 ffff88022693e000 000000007659b0e5
[223968.820655]  000000026b4044c5 ffff88006afd8948 00000000345f87d7 000000006efcdc62
[223968.820659]  0000000000000001 ffff880226948000 ffff88022fc83d40 ffffffff813a4f6a
[223968.820663] Call Trace:
[223968.820665]  <IRQ> 
[223968.820668]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223968.820671]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223968.820674]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223968.820678]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223968.820682]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223968.820685]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223968.820688]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223968.820691]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223968.820695]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223968.820699]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223968.820702]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223968.820705]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223968.820708]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223968.820712]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223968.820715]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.820717]  <EOI> 
[223968.820720]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820723]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223968.820727]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820730]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223968.820733]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223968.820735] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 
[223968.820753]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 
[223968.820763] Call Trace:
[223968.820764]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223968.820769]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223968.820773]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223968.820777]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223968.820780]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223968.820783]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223968.820787]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223968.820790]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223968.820793]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223968.820797]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223968.820801]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223968.820804]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223968.820807]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223968.820810]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223968.820813]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.820815]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820821]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223968.820824]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820827]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223968.820831]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223968.816001] NMI backtrace for cpu 1
[223968.816001] CPU 1 
[223968.816001] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223968.816001] 
[223968.816001] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223968.816001] RIP: 0010:[<ffffffff81440955>]  [<ffffffff81440955>] io_serial_out+0x15/0x20
[223968.816001] RSP: 0018:ffff88022fc437f0  EFLAGS: 00000002
[223968.816001] RAX: 0000000000000073 RBX: ffffffff8243eec0 RCX: 0000000000000000
[223968.816001] RDX: 00000000000003f8 RSI: 00000000000003f8 RDI: ffffffff8243eec0
[223968.816001] RBP: ffff88022fc437f0 R08: 000000007659a435 R09: 0000000000000000
[223968.816001] R10: 0000000000000001 R11: 000000000000000a R12: 0000000000000073
[223968.816001] R13: ffffffff81bc648d R14: 0000000000000050 R15: ffffffff8243eec0
[223968.816001] FS:  0000000000000000(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
[223968.816001] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223968.816001] CR2: 00007f472ba6a6fc CR3: 0000000126bb7000 CR4: 00000000000006e0
[223968.816001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223968.816001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223968.816001] Process kworker/0:0 (pid: 0, threadinfo ffff880226928000, task ffff88022690dd60)
[223968.816001] Stack:
[223968.816001]  ffff88022fc43810 ffffffff814410dc 0000000000000030 ffffffff814410b0
[223968.816001]  ffff88022fc43850 ffffffff8143cdb5 0000000000000087 0000000000000000
[223968.816001]  ffffffff8243eec0 0000000000000001 0000000000000087 000000000000000d
[223968.816001] Call Trace:
[223968.816001]  <IRQ> 
[223968.816001]  [<ffffffff814410dc>] serial8250_console_putchar+0x2c/0x40
[223968.816001]  [<ffffffff814410b0>] ? wait_for_xmitr+0xa0/0xa0
[223968.816001]  [<ffffffff8143cdb5>] uart_console_write+0x35/0x70
[223968.816001]  [<ffffffff814417be>] serial8250_console_write+0xbe/0x1a0
[223968.816001]  [<ffffffff8105c78e>] __call_console_drivers+0x8e/0xb0
[223968.816001]  [<ffffffff8105c7f5>] _call_console_drivers+0x45/0x70
[223968.816001]  [<ffffffff8105d02f>] console_unlock+0x17f/0x2b0
[223968.816001]  [<ffffffff8105d64d>] vprintk+0x1fd/0x520
[223968.816001]  [<ffffffff816f16c1>] printk+0x67/0x69
[223968.816001]  [<ffffffff816f5fa6>] ? _raw_spin_unlock+0x26/0x40
[223968.816001]  [<ffffffff8105388b>] ? account_system_time+0xab/0x190
[223968.816001]  [<ffffffff810bb7e4>] __rcu_pending+0x194/0x3b0
[223968.816001]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
[223968.816001]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
[223968.816001]  [<ffffffff8106d861>] update_process_times+0x41/0x80
[223968.816001]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
[223968.816001]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
[223968.816001]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
[223968.816001]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
[223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.816001]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
[223968.816001]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
[223968.816001]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
[223968.816001]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223968.816001]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223968.816001]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223968.816001]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223968.816001]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223968.816001]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223968.816001]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223968.816001]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223968.816001]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223968.816001]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223968.816001]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223968.816001]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223968.816001]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.816001]  <EOI> 
[223968.816001]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.816001]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223968.816001]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.816001]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223968.816001]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223968.816001] Code: 48 89 e5 d3 e2 03 57 38 ec 0f b6 c0 c9 c3 0f 1f 84 00 00 00 00 00 0f b6 8f 81 00 00 00 55 89 d0 48 89 e5 d3 e6 03 77 38 89 f2 ee <c9> c3 66 0f 1f 84 00 00 00 00 00 55 80 bf 82 00 00 00 08 48 89 
[223968.816001] Call Trace:
[223968.816001]  <IRQ>  [<ffffffff814410dc>] serial8250_console_putchar+0x2c/0x40
[223968.816001]  [<ffffffff814410b0>] ? wait_for_xmitr+0xa0/0xa0
[223968.816001]  [<ffffffff8143cdb5>] uart_console_write+0x35/0x70
[223968.816001]  [<ffffffff814417be>] serial8250_console_write+0xbe/0x1a0
[223968.816001]  [<ffffffff8105c78e>] __call_console_drivers+0x8e/0xb0
[223968.816001]  [<ffffffff8105c7f5>] _call_console_drivers+0x45/0x70
[223968.816001]  [<ffffffff8105d02f>] console_unlock+0x17f/0x2b0
[223968.816001]  [<ffffffff8105d64d>] vprintk+0x1fd/0x520
[223968.816001]  [<ffffffff816f16c1>] printk+0x67/0x69
[223968.816001]  [<ffffffff816f5fa6>] ? _raw_spin_unlock+0x26/0x40
[223968.816001]  [<ffffffff8105388b>] ? account_system_time+0xab/0x190
[223968.816001]  [<ffffffff810bb7e4>] __rcu_pending+0x194/0x3b0
[223968.816001]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
[223968.816001]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
[223968.816001]  [<ffffffff8106d861>] update_process_times+0x41/0x80
[223968.816001]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
[223968.816001]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
[223968.816001]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
[223968.816001]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
[223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.816001]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
[223968.816001]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
[223968.816001]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
[223968.816001]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223968.816001]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223968.816001]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223968.816001]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223968.816001]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223968.816001]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223968.816001]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223968.816001]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223968.816001]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223968.816001]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223968.816001]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223968.816001]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223968.816001]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.816001]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.816001]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223968.816001]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.816001]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223968.816001]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff

[ goes on for another ~300kB, trimmed ]


* Re: Linux 3.1-rc9
  2011-10-24 19:02                                     ` Simon Kirby
@ 2011-10-25  7:13                                       ` Linus Torvalds
  2011-10-25  9:01                                         ` David Miller
  2011-10-25 20:20                                       ` Simon Kirby
  1 sibling, 1 reply; 98+ messages in thread
From: Linus Torvalds @ 2011-10-25  7:13 UTC (permalink / raw)
  To: Simon Kirby, Network Development
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar

Added netdev, because this seems to be a generic networking bug (an ABBA
deadlock between sk_lock and icsk_retransmit_timer, if my quick scan is
correct).
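
(For anyone unfamiliar with the pattern: below is a minimal userspace
sketch of the same ABBA inversion, not kernel code. The mutexes "sk" and
"timer" merely stand in for slock-AF_INET and the retransmit-timer lock,
and path_a/path_b loosely mirror the sk_clone vs. tcp_write_timer paths
in the lockdep report further down. Build with "gcc -pthread abba.c";
it wedges by design.)

/* Illustrative only: two threads take the same pair of locks in
 * opposite order, which is exactly the inversion lockdep complains
 * about. Running this hangs forever, which is the point. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t sk    = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t timer = PTHREAD_MUTEX_INITIALIZER;

static void *path_a(void *arg)          /* sk first, then timer */
{
        (void)arg;
        pthread_mutex_lock(&sk);
        sleep(1);                       /* widen the race window */
        pthread_mutex_lock(&timer);     /* blocks forever: B holds it */
        pthread_mutex_unlock(&timer);
        pthread_mutex_unlock(&sk);
        return NULL;
}

static void *path_b(void *arg)          /* timer first, then sk */
{
        (void)arg;
        pthread_mutex_lock(&timer);
        sleep(1);
        pthread_mutex_lock(&sk);        /* blocks forever: A holds it */
        pthread_mutex_unlock(&sk);
        pthread_mutex_unlock(&timer);
        return NULL;
}

int main(void)
{
        pthread_t a, b;
        pthread_create(&a, NULL, path_a, NULL);
        pthread_create(&b, NULL, path_b, NULL);
        pthread_join(a, NULL);          /* never returns: classic ABBA */
        pthread_join(b, NULL);
        printf("not reached\n");
        return 0;
}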

Davem?

               Linus

On Mon, Oct 24, 2011 at 9:02 PM, Simon Kirby <sim@hostway.ca> wrote:
>
> Ok, hit the hang about 4 more times, but only this morning did it happen
> on a box with a serial cable attached. Yay!
>
> Simon-
>
> [216695.579770] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216695.589435]
> [216695.589437] =======================================================
> [216695.593380] [ INFO: possible circular locking dependency detected ]
> [216695.593380] 3.1.0-rc10-hw-lockdep+ #51
> [216695.593380] -------------------------------------------------------
> [216695.593380] kworker/0:1/0 is trying to acquire lock:
> [216695.593380]  (&icsk->icsk_retransmit_timer){+.-.-.}, at: [<ffffffff8106cc88>] run_timer_softirq+0x198/0x410
> [216695.593380]
> [216695.593380] but task is already holding lock:
> [216695.593380]  (slock-AF_INET){+.-.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
> [216695.593380]
> [216695.593380] which lock already depends on the new lock.
> [216695.593380]
> [216695.593380]
> [216695.593380] the existing dependency chain (in reverse order) is:
> [216695.593380]
> [216695.593380] -> #1 (slock-AF_INET){+.-.-.}:
> [216695.593380]        [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
> [216695.593380]        [<ffffffff816f55fc>] _raw_spin_lock+0x3c/0x50
> [216695.593380]        [<ffffffff81661cc3>] tcp_write_timer+0x23/0x230
> [216695.682901]        [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [216695.682901]        [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [216695.682901]        [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [216695.682901]        [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [216695.682901]        [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [216695.682901]        [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [216695.682901]        [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [216695.682901]        [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [216695.682901]        [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [216695.682901]
> [216695.682901] -> #0 (&icsk->icsk_retransmit_timer){+.-.-.}:
> [216695.682901]        [<ffffffff8109a000>] __lock_acquire+0x2040/0x2180
> [216695.682901]        [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
> [216695.682901]        [<ffffffff8106cd09>] run_timer_softirq+0x219/0x410
> [216695.682901]        [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [216695.682901]        [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [216695.682901]        [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [216695.682901]        [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [216695.682901]        [<ffffffff81014b74>] do_IRQ+0x64/0xe0
> [216695.682901]        [<ffffffff816f6273>] ret_from_intr+0x0/0x1a
> [216695.682901]        [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [216695.682901]        [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [216695.682901]
> [216695.682901] other info that might help us debug this:
> [216695.682901]
> [216695.682901]  Possible unsafe locking scenario:
> [216695.682901]
> [216695.682901]        CPU0                    CPU1
> [216695.682901]        ----                    ----
> [216695.682901]   lock(slock-AF_INET);
> [216695.682901]                                lock(&icsk->icsk_retransmit_timer);
> [216695.682901]                                lock(slock-AF_INET);
> [216695.682901]   lock(&icsk->icsk_retransmit_timer);
> [216695.682901]
> [216695.682901]  *** DEADLOCK ***
> [216695.682901]
> [216695.682901] 1 lock held by kworker/0:1/0:
> [216695.682901]  #0:  (slock-AF_INET){+.-.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
> [216695.682901]
> [216695.682901] stack backtrace:
> [216695.682901] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51
> [216695.682901] Call Trace:
> [216695.682901]  <IRQ>  [<ffffffff81097eab>] print_circular_bug+0x21b/0x330
> [216695.682901]  [<ffffffff8109a000>] __lock_acquire+0x2040/0x2180
> [216695.682901]  [<ffffffffa001d6e2>] ? nf_conntrack_free+0x42/0x50 [nf_conntrack]
> [216695.682901]  [<ffffffffa001d6e2>] ? nf_conntrack_free+0x42/0x50 [nf_conntrack]
> [216695.682901]  [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
> [216695.682901]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [216695.682901]  [<ffffffff81096b4c>] ? trace_hardirqs_on_caller+0x7c/0x1c0
> [216695.682901]  [<ffffffff8106cd09>] run_timer_softirq+0x219/0x410
> [216695.682901]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [216695.682901]  [<ffffffff816f16c1>] ? printk+0x67/0x69
> [216695.682901]  [<ffffffff81661ca0>] ? tcp_delack_timer+0x230/0x230
> [216695.682901]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [216695.682901]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [216695.682901]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [216695.682901]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [216695.682901]  [<ffffffff81014b74>] do_IRQ+0x64/0xe0
> [216695.682901]  [<ffffffff816f6273>] common_interrupt+0x73/0x73
> [216695.682901]  <EOI>  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [216695.682901]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [216695.682901]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [216695.682901]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [216696.019296] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000105?
> [216697.762956] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216698.597297] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216701.489681] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216701.667999] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216704.580592] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [216709.468971] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [216712.845904] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216716.588502] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [216725.072958] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [216725.603879] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216725.828374] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216727.588978] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216735.513864] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216740.581530] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216756.278571] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [218855.312903] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [218855.323133] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [218858.293355] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [218864.301938] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [218876.333821] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [218885.332651] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [218900.313590] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [220821.012017] TCP: Peer 32.176.160.153:49226/80 unexpectedly shrunk window 665256753:665268993 (repaired)
> [221075.224300] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221075.234579] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221075.277593] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221075.780515] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221075.780713] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221077.349279] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221077.905587] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [221077.915567] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [221081.498430] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221081.703277] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [221082.088513] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221082.167985] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221089.772578] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221090.487927] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [221090.686394] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221094.587131] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221105.255699] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [221105.280699] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221105.291634] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221106.325794] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221107.286029] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221107.622736] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221107.734471] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221120.381643] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [223936.264020] BUG: soft lockup - CPU#0 stuck for 22s! [swapper:0]
> [223936.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.268002] irq event stamp: 2595159887
> [223936.268002] hardirqs last  enabled at (2595159887): [<ffffffff8101b805>] mwait_idle+0x145/0x170
> [223936.268002] hardirqs last disabled at (2595159886): [<ffffffff81013139>] cpu_idle+0x79/0xf0
> [223936.268002] softirqs last  enabled at (2595159878): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
> [223936.268002] softirqs last disabled at (2595159873): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.268002] CPU 0
> [223936.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.268002]
> [223936.268002] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223936.268002] RIP: 0010:[<ffffffff813a4ee3>]  [<ffffffff813a4ee3>] delay_tsc+0x73/0xd0
> [223936.268002] RSP: 0018:ffff88022fc03cf0  EFLAGS: 00000202
> [223936.268002] RAX: 00017b5d5932dd02 RBX: ffffffff816f6334 RCX: 000000005932dd02
> [223936.372028] BUG: soft lockup - CPU#1 stuck for 22s! [kworker/0:0:0]
> [223936.372031] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.372042] irq event stamp: 2598787699
> [223936.372044] hardirqs last  enabled at (2598787699): [<ffffffff8101b805>] mwait_idle+0x145/0x170
> [223936.372054] hardirqs last disabled at (2598787698): [<ffffffff81013139>] cpu_idle+0x79/0xf0
> [223936.372058] softirqs last  enabled at (2598787696): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
> [223936.372063] softirqs last disabled at (2598787681): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.372069] CPU 1
> [223936.372070] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.372079]
> [223936.372081] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223936.372086] RIP: 0010:[<ffffffff8101afab>]  [<ffffffff8101afab>] native_read_tsc+0xb/0x20
> [223936.372091] RSP: 0018:ffff88022fc43ce0  EFLAGS: 00000202
> [223936.372093] RAX: 0000000000017b5d RBX: ffffffff816f6334 RCX: 00000000652f810e
> [223936.372096] RDX: 0000000000017b5d RSI: 0000000000000000 RDI: 0000000000000001
> [223936.372098] RBP: ffff88022fc43ce0 R08: 00000000652f80c8 R09: 0000000000000000
> [223936.372101] R10: ffff88022690dd60 R11: 0000000000000000 R12: ffff88022fc43c58
> [223936.372103] R13: ffffffff816feb33 R14: ffff88022fc43ce0 R15: 00000000180bbeb8
> [223936.372106] FS:  0000000000000000(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
> [223936.372108] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223936.372111] CR2: 00007f472ba6a6fc CR3: 0000000126bb7000 CR4: 00000000000006e0
> [223936.372113] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223936.372116] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223936.372119] Process kworker/0:0 (pid: 0, threadinfo ffff880226928000, task ffff88022690dd60)
> [223936.372121] Stack:
> [223936.372123]  ffff88022fc43d30 ffffffff813a4eaf ffff880226928000 00000000652f8090
> [223936.372128]  000000012fc43d18 ffff88002e90e348 00000000180bbeb8 000000006efcdc62
> [223936.372132]  0000000000000001 ffff88022690dd60 ffff88022fc43d40 ffffffff813a4f6a
> [223936.372136] Call Trace:
> [223936.372139]  <IRQ>
> [223936.372144]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
> [223936.372148]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.372153]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.372159]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.372164]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.372168]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.372174]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.372178]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.372182]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.372186]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.372190]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.372194]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.372198]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.372203]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.372208]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.372210]  <EOI>
> [223936.372214]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.372218]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.372222]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.372226]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.372230]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223936.372233] Code: a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 89 c1 48 89 d0
> [223936.372253]  c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 00 00 00 00 00
> [223936.372262] Call Trace:
> [223936.372264]  <IRQ>  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
> [223936.372269]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.372272]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.372276]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.372280]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.372283]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.372286]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.372289]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.372293]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.372297]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.372300]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.372303]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.372307]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.372310]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.372313]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.372315]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.372321]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.372324]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.372327]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.372331]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223936.476032] BUG: soft lockup - CPU#2 stuck for 23s! [kworker/0:1:0]
> [223936.476034] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.476043] irq event stamp: 2613824057
> [223936.476045] hardirqs last  enabled at (2613824057): [<ffffffff8101b805>] mwait_idle+0x145/0x170
> [223936.476050] hardirqs last disabled at (2613824056): [<ffffffff81013139>] cpu_idle+0x79/0xf0
> [223936.476054] softirqs last  enabled at (2613824048): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
> [223936.476058] softirqs last disabled at (2613824031): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.476062] CPU 2
> [223936.476063] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.476071]
> [223936.476073] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223936.476077] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
> [223936.476082] RSP: 0018:ffff88022fc83ce0  EFLAGS: 00000202
> [223936.476084] RAX: 0000000070ba7dfc RBX: ffffffff813a60ae RCX: 0000000070ba7dc4
> [223936.476086] RDX: 0000000000017b5d RSI: 0000000000000000 RDI: 0000000000000001
> [223936.476089] RBP: ffff88022fc83ce0 R08: 0000000070ba7d7e R09: 0000000000000000
> [223936.476091] R10: ffff880226948000 R11: 0000000000000000 R12: ffff88022fc83c58
> [223936.476093] R13: ffffffff816feb33 R14: ffff88022fc83ce0 R15: 00000000182285f9
> [223936.476096] FS:  0000000000000000(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
> [223936.476099] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223936.476101] CR2: 00007f25e7874d7f CR3: 0000000124c0d000 CR4: 00000000000006e0
> [223936.476104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223936.476106] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223936.476109] Process kworker/0:1 (pid: 0, threadinfo ffff88022693e000, task ffff880226948000)
> [223936.476111] Stack:
> [223936.476113]  ffff88022fc83d30 ffffffff813a4ee8 ffff88022693e000 0000000070ba7dc4
> [223936.476117]  00000002ffffff10 ffff88006afd8948 00000000182285f9 000000006efcdc62
> [223936.476121]  0000000000000001 ffff880226948000 ffff88022fc83d40 ffffffff813a4f6a
> [223936.476126] Call Trace:
> [223936.476128]  <IRQ>
> [223936.476132]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223936.476136]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.476141]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.476147]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.476153]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.476157]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.476163]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.476167]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.476171]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.476176]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.476180]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.476184]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.476187]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.476193]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.476197]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.476199]  <EOI>
> [223936.476203]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.476207]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.476211]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.476215]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.476219]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223936.476222] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31
> [223936.476241]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84
> [223936.476251] Call Trace:
> [223936.476252]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223936.476257]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.476261]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.476265]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.476268]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.476272]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.476275]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.476278]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.476282]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.476286]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.476289]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.476292]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.476295]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.476299]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.476302]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.476304]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.476310]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.476313]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.476316]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.476320]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223936.580039] BUG: soft lockup - CPU#3 stuck for 23s! [kworker/0:1:0]
> [223936.580041] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.580050] irq event stamp: 2615464042
> [223936.580052] hardirqs last  enabled at (2615464042): [<ffffffff816f5edb>] _raw_spin_unlock_irq+0x2b/0x50
> [223936.580057] hardirqs last disabled at (2615464041): [<ffffffff816f56a8>] _raw_spin_lock_irq+0x18/0x60
> [223936.580061] softirqs last  enabled at (2615463964): [<ffffffff81063cce>] _local_bh_enable+0xe/0x10
> [223936.580065] softirqs last disabled at (2615463965): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.580069] CPU 3
> [223936.580070] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.580078]
> [223936.580080] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223936.580085] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
> [223936.580090] RSP: 0018:ffff88022fcc3ce0  EFLAGS: 00000202
> [223936.580092] RAX: 000000007c457b06 RBX: ffffffff816f6334 RCX: 000000007c457ad5
> [223936.580094] RDX: 0000000000017b5d RSI: ffffffff818f9896 RDI: 0000000000000001
> [223936.580097] RBP: ffff88022fcc3ce0 R08: 000000007c457a88 R09: 0000000000000000
> [223936.580099] R10: ffff880226981f20 R11: 0000000000000000 R12: ffff88022fcc3c58
> [223936.580101] R13: ffffffff816feb33 R14: ffff88022fcc3ce0 R15: 00000000183a1380
> [223936.580104] FS:  0000000000000000(0000) GS:ffff88022fcc0000(0000) knlGS:0000000000000000
> [223936.580107] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223936.580109] CR2: 0000000000f38820 CR3: 0000000104b52000 CR4: 00000000000006e0
> [223936.580112] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223936.580114] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223936.580117] Process kworker/0:1 (pid: 0, threadinfo ffff880226988000, task ffff880226981f20)
> [223936.580119] Stack:
> [223936.580120]  ffff88022fcc3d30 ffffffff813a4ee8 ffff880226988000 000000007c457ad5
> [223936.580125]  00000003ffffff10 ffff880031438948 00000000183a1380 000000006efcdc62
> [223936.580129]  0000000000000001 ffff880226981f20 ffff88022fcc3d40 ffffffff813a4f6a
> [223936.580133] Call Trace:
> [223936.580135]  <IRQ>
> [223936.580138]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223936.580142]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.580147]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.580151]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.580156]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.580160]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.580164]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.580168]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.580172]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.580176]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.580181]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.580185]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.580188]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.580192]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.580196]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.580199]  <EOI>
> [223936.580202]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.580206]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.580211]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.580214]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.580218]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223936.580221] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31
> [223936.580240]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84
> [223936.580250] Call Trace:
> [223936.580251]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223936.580256]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.580260]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.580264]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.580267]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.580270]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.580274]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.580277]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.580280]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.580284]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.580288]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.580291]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.580294]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.580297]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.580300]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.580302]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.580308]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.580312]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.580315]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.580318]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223936.268002] RDX: 000000005932dd02 RSI: ffffffff818f9896 RDI: 0000000000000001
> [223936.268002] RBP: ffff88022fc03d30 R08: 000000005932dcb5 R09: 0000000000000000
> [223936.268002] R10: ffffffff81a2b020 R11: 0000000000000000 R12: ffff88022fc03c68
> [223936.268002] R13: ffffffff816feb33 R14: ffff88022fc03d30 R15: 0000000017f328cd
> [223936.268002] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
> [223936.268002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223936.268002] CR2: 00007f25e7bc13a0 CR3: 00000001426fc000 CR4: 00000000000006f0
> [223936.268002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223936.268002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223936.268002] Process swapper (pid: 0, threadinfo ffffffff819a6000, task ffffffff81a2b020)
> [223936.268002] Stack:
> [223936.268002]  ffffffff819a6000 000000005932dd02 000000002fc03d18 ffff8801f6c22448
> [223936.268002]  0000000017f328cd 000000006efcdc62 0000000000000001 ffffffff81a2b020
> [223936.268002]  ffff88022fc03d40 ffffffff813a4f6a ffff88022fc03d80 ffffffff813ac2ab
> [223936.268002] Call Trace:
> [223936.268002]  <IRQ>
> [223936.268002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.268002]  <EOI>
> [223936.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
> [223936.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
> [223936.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
> [223936.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
> [223936.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
> [223936.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
> [223936.268002] Code: 4c 89 7d c8 eb 1f 66 90 48 8b 45 c0 83 68 1c 01 f3 90 83 40 1c 01 65 44 8b 3c 25 50 d3 00 00 44 3b 7d d4 75 3b 66 66 90 0f ae e8 <e8> b8 60 c7 ff 66 90 4c 63 c0 4c 89 c0 48 2b 45 c8 48 39 d8 72
> [223936.268002] Call Trace:
> [223936.268002]  <IRQ>  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.268002]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
> [223936.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
> [223936.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
> [223936.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
> [223936.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
> [223936.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
> [223964.264018] BUG: soft lockup - CPU#0 stuck for 22s! [swapper:0]
> [223964.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.268002] irq event stamp: 2595159887
> [223964.268002] hardirqs last  enabled at (2595159887): [<ffffffff8101b805>] mwait_idle+0x145/0x170
> [223964.268002] hardirqs last disabled at (2595159886): [<ffffffff81013139>] cpu_idle+0x79/0xf0
> [223964.268002] softirqs last  enabled at (2595159878): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
> [223964.268002] softirqs last disabled at (2595159873): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.268002] CPU 0
> [223964.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.268002]
> [223964.268002] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223964.268002] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
> [223964.268002] RSP: 0018:ffff88022fc03ce0  EFLAGS: 00000202
> [223964.268002] RAX: 000000007cb6c61b RBX: ffffffff816f6334 RCX: 000000007cb6c5e3
> [223964.372025] BUG: soft lockup - CPU#1 stuck for 23s! [kworker/0:0:0]
> [223964.372027] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.372036] irq event stamp: 2598787699
> [223964.372037] hardirqs last  enabled at (2598787699): [<ffffffff8101b805>] mwait_idle+0x145/0x170
> [223964.372042] hardirqs last disabled at (2598787698): [<ffffffff81013139>] cpu_idle+0x79/0xf0
> [223964.372045] softirqs last  enabled at (2598787696): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
> [223964.372049] softirqs last disabled at (2598787681): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.372052] CPU 1
> [223964.372053] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.372061]
> [223964.372063] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223964.372067] RIP: 0010:[<ffffffff8101afa0>]  [<ffffffff8101afa0>] read_persistent_clock+0x30/0x30
> [223964.372072] RSP: 0018:ffff88022fc43ce8  EFLAGS: 00000202
> [223964.372074] RAX: 0000000000000001 RBX: ffff88022fc43c68 RCX: 0000000088b369fd
> [223964.372076] RDX: 0000000000000200 RSI: 0000000000000000 RDI: 0000000000000001
> [223964.372078] RBP: ffff88022fc43d30 R08: ffffffff88b369fd R09: 0000000000000000
> [223964.372081] R10: ffff88022690dd60 R11: 0000000000000000 R12: ffff88022fc43c58
> [223964.372083] R13: ffffffff816feb33 R14: ffff88022fc43d30 R15: 00000000307e58b4
> [223964.372086] FS:  0000000000000000(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
> [223964.372089] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223964.372091] CR2: 00007f472ba6a6fc CR3: 0000000126bb7000 CR4: 00000000000006e0
> [223964.372093] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223964.372096] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223964.372098] Process kworker/0:0 (pid: 0, threadinfo ffff880226928000, task ffff88022690dd60)
> [223964.372100] Stack:
> [223964.372102]  ffffffff813a4eaf ffff880226928000 ffffffff88b369c5 000000012fc43d18
> [223964.372106]  ffff88002e90e348 00000000307e58b4 000000006efcdc62 0000000000000001
> [223964.372111]  ffff88022690dd60 ffff88022fc43d40 ffffffff813a4f6a ffff88022fc43d80
> [223964.372115] Call Trace:
> [223964.372116]  <IRQ>
> [223964.372119]  [<ffffffff813a4eaf>] ? delay_tsc+0x3f/0xd0
> [223964.372123]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.372127]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.372132]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.372136]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.372140]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.372144]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.372148]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.372153]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.372158]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.372162]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.372166]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.372170]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.372174]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.372178]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.372180]  <EOI>
> [223964.372184]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.372188]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.372192]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.372196]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.372200]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223964.372203] Code: 48 89 fb 48 83 ec 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00
> [223964.372221]  48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9
> [223964.372231] Call Trace:
> [223964.372232]  <IRQ>  [<ffffffff813a4eaf>] ? delay_tsc+0x3f/0xd0
> [223964.372237]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.372241]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.372245]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.372248]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.372251]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.372255]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.372258]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.372261]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.372265]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.372268]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.372271]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.372275]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.372278]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.372281]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.372282]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.372288]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.372292]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.372295]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.372298]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223964.476031] BUG: soft lockup - CPU#2 stuck for 23s! [kworker/0:1:0]
> [223964.476033] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.476042] irq event stamp: 2613824057
> [223964.476043] hardirqs last  enabled at (2613824057): [<ffffffff8101b805>] mwait_idle+0x145/0x170
> [223964.476048] hardirqs last disabled at (2613824056): [<ffffffff81013139>] cpu_idle+0x79/0xf0
> [223964.476051] softirqs last  enabled at (2613824048): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
> [223964.476055] softirqs last disabled at (2613824031): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.476059] CPU 2
> [223964.476060] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.476067]
> [223964.476070] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223964.476074] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
> [223964.476078] RSP: 0018:ffff88022fc83ce0  EFLAGS: 00000206
> [223964.476080] RAX: 00000000943e6715 RBX: ffffffff816f6334 RCX: 00000000943e66dd
> [223964.476083] RDX: 0000000000017b69 RSI: 0000000000000000 RDI: 0000000000000001
> [223964.476085] RBP: ffff88022fc83ce0 R08: ffffffff943e6697 R09: 0000000000000000
> [223964.476087] R10: ffff880226948000 R11: 0000000000000000 R12: ffff88022fc83c58
> [223964.476090] R13: ffffffff816feb33 R14: ffff88022fc83ce0 R15: 000000003094ad30
> [223964.476092] FS:  0000000000000000(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
> [223964.476095] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223964.476097] CR2: 00007f25e7874d7f CR3: 0000000124c0d000 CR4: 00000000000006e0
> [223964.476100] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223964.476102] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223964.476105] Process kworker/0:1 (pid: 0, threadinfo ffff88022693e000, task ffff880226948000)
> [223964.476107] Stack:
> [223964.476108]  ffff88022fc83d30 ffffffff813a4ee8 ffff88022693e000 ffffffff943e66dd
> [223964.476113]  00000002ffffff10 ffff88006afd8948 000000003094ad30 000000006efcdc62
> [223964.476117]  0000000000000001 ffff880226948000 ffff88022fc83d40 ffffffff813a4f6a
> [223964.476121] Call Trace:
> [223964.476123]  <IRQ>
> [223964.476126]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223964.476130]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.476134]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.476139]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.476143]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.476147]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.476151]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.476155]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.476159]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.476164]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.476168]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.476172]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.476176]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.476180]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.476184]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.476186]  <EOI>
> [223964.476190]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.476194]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.476198]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.476202]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.476206]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223964.476208] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31
> [223964.476227]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84
> [223964.476236] Call Trace:
> [223964.476238]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223964.476243]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.476246]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.476250]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.476254]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.476257]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.476260]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.476264]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.476267]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.476271]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.476274]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.476277]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.476281]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.476284]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.476287]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.476289]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.476295]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.476298]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.476301]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.476304]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223964.580038] BUG: soft lockup - CPU#3 stuck for 23s! [kworker/0:1:0]
> [223964.580040] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.580049] irq event stamp: 2615464042
> [223964.580050] hardirqs last  enabled at (2615464042): [<ffffffff816f5edb>] _raw_spin_unlock_irq+0x2b/0x50
> [223964.580054] hardirqs last disabled at (2615464041): [<ffffffff816f56a8>] _raw_spin_lock_irq+0x18/0x60
> [223964.580058] softirqs last  enabled at (2615463964): [<ffffffff81063cce>] _local_bh_enable+0xe/0x10
> [223964.580062] softirqs last disabled at (2615463965): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.580066] CPU 3
> [223964.580067] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.580075]
> [223964.580077] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223964.580081] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
> [223964.580086] RSP: 0018:ffff88022fcc3ce0  EFLAGS: 00000206
> [223964.580088] RAX: 000000009fc963af RBX: ffffffff816f6334 RCX: 000000009fc96377
> [223964.580090] RDX: 0000000000017b69 RSI: ffffffff818f9896 RDI: 0000000000000001
> [223964.580093] RBP: ffff88022fcc3ce0 R08: ffffffff9fc96331 R09: 0000000000000000
> [223964.580095] R10: ffff880226981f20 R11: 0000000000000000 R12: ffff88022fcc3c58
> [223964.580097] R13: ffffffff816feb33 R14: ffff88022fcc3ce0 R15: 0000000030ac88b0
> [223964.580100] FS:  0000000000000000(0000) GS:ffff88022fcc0000(0000) knlGS:0000000000000000
> [223964.580103] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223964.580105] CR2: 0000000000f38820 CR3: 0000000104b52000 CR4: 00000000000006e0
> [223964.580107] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223964.580110] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223964.580112] Process kworker/0:1 (pid: 0, threadinfo ffff880226988000, task ffff880226981f20)
> [223964.580114] Stack:
> [223964.580116]  ffff88022fcc3d30 ffffffff813a4ee8 ffff880226988000 ffffffff9fc96377
> [223964.580120]  000000039c3b34d8 ffff880031438948 0000000030ac88b0 000000006efcdc62
> [223964.580124]  0000000000000001 ffff880226981f20 ffff88022fcc3d40 ffffffff813a4f6a
> [223964.580128] Call Trace:
> [223964.580130]  <IRQ>
> [223964.580133]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223964.580137]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.580141]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.580146]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.580150]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.580154]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.580158]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.580162]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.580167]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.580171]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.580176]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.580180]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.580184]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.580188]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.580192]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.580194]  <EOI>
> [223964.580198]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.580202]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.580206]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.580210]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.580214]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223964.580217] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31
> [223964.580235]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84
> [223964.580245] Call Trace:
> [223964.580246]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223964.580252]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.580255]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.580259]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.580262]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.580265]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.580269]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.580272]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.580276]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.580279]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.580283]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.580286]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.580289]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.580292]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.580295]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.580297]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.580303]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.580307]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.580310]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.580313]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223964.268002] RDX: 0000000000017b69 RSI: ffffffff818f9896 RDI: 0000000000000001
> [223964.268002] RBP: ffff88022fc03ce0 R08: 000000007cb6c596 R09: 0000000000000000
> [223964.268002] R10: ffffffff81a2b020 R11: 0000000000000000 R12: ffff88022fc03c58
> [223964.268002] R13: ffffffff816feb33 R14: ffff88022fc03ce0 R15: 000000002eb85d38
> [223964.268002] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
> [223964.268002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223964.268002] CR2: 00007f25e7bc13a0 CR3: 00000001426fc000 CR4: 00000000000006f0
> [223964.268002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223964.268002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223964.268002] Process swapper (pid: 0, threadinfo ffffffff819a6000, task ffffffff81a2b020)
> [223964.268002] Stack:
> [223964.268002]  ffff88022fc03d30 ffffffff813a4ee8 ffffffff819a6000 000000007cb6c5e3
> [223964.268002]  000000007c44ac9c ffff8801f6c22448 000000002eb85d38 000000006efcdc62
> [223964.268002]  0000000000000001 ffffffff81a2b020 ffff88022fc03d40 ffffffff813a4f6a
> [223964.268002] Call Trace:
> [223964.268002]  <IRQ>
> [223964.268002]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223964.268002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.268002]  <EOI>
> [223964.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
> [223964.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
> [223964.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
> [223964.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
> [223964.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
> [223964.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
> [223964.268002] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31
> [223964.268002]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84
> [223964.268002] Call Trace:
> [223964.268002]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223964.268002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.268002]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
> [223964.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
> [223964.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
> [223964.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
> [223964.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
> [223964.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
> [223968.815995] INFO: rcu_sched_state detected stall on CPU 1 (t=15000 jiffies)
> [223968.819995] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1} (detected by 3, t=15002 jiffies)
> [223968.820000] sending NMI to all CPUs:
> [223968.820002] NMI backtrace for cpu 3
> [223968.820002] CPU 3
> [223968.820002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223968.820002]
> [223968.820002] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223968.820002] RIP: 0010:[<ffffffff813a4f86>]  [<ffffffff813a4f86>] __const_udelay+0x16/0x40
> [223968.820002] RSP: 0018:ffff88022fcc3a90  EFLAGS: 00000002
> [223968.820002] RAX: 0000000000e34d8a RBX: 0000000000000001 RCX: 0000000001062560
> [223968.820002] RDX: 000000000071a6c5 RSI: 0000000000000002 RDI: 0000000000418958
> [223968.820002] RBP: ffff88022fcc3ab0 R08: 0000000000000002 R09: 0000000000000000
> [223968.820002] R10: 0000000000000006 R11: 000000000000000a R12: ffffffff81a40d80
> [223968.820002] R13: 0000000000000010 R14: ffffffff81a40e40 R15: ffffffff81a40fc0
> [223968.820002] FS:  0000000000000000(0000) GS:ffff88022fcc0000(0000) knlGS:0000000000000000
> [223968.820002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223968.820002] CR2: 0000000000f38820 CR3: 0000000104b52000 CR4: 00000000000006e0
> [223968.820002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223968.820002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223968.820002] Process kworker/0:1 (pid: 0, threadinfo ffff880226988000, task ffff880226981f20)
> [223968.820002] Stack:
> [223968.820002]  ffff88022fcc3ab0 ffffffff81031695 ffff88022fccdfa0 ffff88022fccdfa0
> [223968.820002]  ffff88022fcc3af0 ffffffff810bb9d2 ffffffff81a40fc0 0000000000000003
> [223968.820002]  0000000000000003 ffff880226981f20 ffffffff810921f0 ffff88022fcc3be0
> [223968.820002] Call Trace:
> [223968.820002]  <IRQ>
> [223968.820002]  [<ffffffff81031695>] ? arch_trigger_all_cpu_backtrace+0x65/0x90
> [223968.820002]  [<ffffffff810bb9d2>] __rcu_pending+0x382/0x3b0
> [223968.820002]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
> [223968.820002]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
> [223968.820002]  [<ffffffff8106d861>] update_process_times+0x41/0x80
> [223968.820002]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
> [223968.820002]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
> [223968.820002]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
> [223968.820002]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
> [223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820002]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
> [223968.820002]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
> [223968.820002]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
> [223968.820002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.820002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.820002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.820002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.820002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.820002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.820002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.820002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.820002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.820002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.820002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.820002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.820002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820002]  <EOI>
> [223968.820002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.820002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.820002]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223968.820002] Code: 00 00 00 00 00 55 48 89 e5 ff 15 8e a5 6c 00 c9 c3 0f 1f 40 00 55 48 8d 0c bd 00 00 00 00 65 48 8b 14 25 58 2d 01 00 48 8d 04 12
> [223968.820002]  c1 e2 06 48 89 e5 48 29 c2 48 89 c8 f7 e2 48 8d 7a 01 ff 15
> [223968.820002] Call Trace:
> [223968.820002]  <IRQ>  [<ffffffff81031695>] ? arch_trigger_all_cpu_backtrace+0x65/0x90
> [223968.820002]  [<ffffffff810bb9d2>] __rcu_pending+0x382/0x3b0
> [223968.820002]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
> [223968.820002]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
> [223968.820002]  [<ffffffff8106d861>] update_process_times+0x41/0x80
> [223968.820002]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
> [223968.820002]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
> [223968.820002]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
> [223968.820002]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
> [223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820002]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
> [223968.820002]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
> [223968.820002]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
> [223968.820002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.820002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.820002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.820002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.820002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.820002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.820002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.820002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.820002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.820002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.820002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.820002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.820002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820002]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.820002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.820002]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223968.820335] NMI backtrace for cpu 0
> [223968.820337] CPU 0
> [223968.820338] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223968.820347]
> [223968.820349] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223968.820353] RIP: 0010:[<ffffffff813a4ef0>]  [<ffffffff813a4ef0>] delay_tsc+0x80/0xd0
> [223968.820358] RSP: 0018:ffff88022fc03cf0  EFLAGS: 00000206
> [223968.820360] RAX: 000000007659b10f RBX: 0000000000000001 RCX: 000000007659b10f
> [223968.820363] RDX: 000000007659b10f RSI: ffffffff818f9896 RDI: 0000000000000001
> [223968.820365] RBP: ffff88022fc03d30 R08: 000000007659b10f R09: 0000000000000000
> [223968.820367] R10: ffffffff81a2b020 R11: 0000000000000000 R12: 0000000031026962
> [223968.820370] R13: 000000006efcdc62 R14: ffffffff819a6000 R15: 000000007659b0de
> [223968.820373] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
> [223968.820375] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223968.820377] CR2: 00007f25e7bc13a0 CR3: 00000001426fc000 CR4: 00000000000006f0
> [223968.820380] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223968.820382] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223968.820385] Process swapper (pid: 0, threadinfo ffffffff819a6000, task ffffffff81a2b020)
> [223968.820387] Stack:
> [223968.820388]  ffffffff819a6000 000000007659b0de 00000000818f9896 ffff8801f6c22448
> [223968.820393]  0000000031026962 000000006efcdc62 0000000000000001 ffffffff81a2b020
> [223968.820397]  ffff88022fc03d40 ffffffff813a4f6a ffff88022fc03d80 ffffffff813ac2ab
> [223968.820401] Call Trace:
> [223968.820402]  <IRQ>
> [223968.820406]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.820410]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.820414]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.820417]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.820420]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.820424]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.820427]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.820430]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.820434]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.820437]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.820441]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.820444]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.820447]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.820450]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820452]  <EOI>
> [223968.820455]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820459]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.820462]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820465]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.820468]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
> [223968.820471]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
> [223968.820475]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
> [223968.820478]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
> [223968.820481]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
> [223968.820484]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
> [223968.820486] Code: 68 1c 01 f3 90 83 40 1c 01 65 44 8b 3c 25 50 d3 00 00 44 3b 7d d4 75 3b 66 66 90 0f ae e8 e8 b8 60 c7 ff 66 90 4c 63 c0 4c 89 c0
> [223968.820504]  2b 45 c8 48 39 d8 72 c7 65 48 8b 04 25 08 c4 00 00 83 a8 44
> [223968.820514] Call Trace:
> [223968.820515]  <IRQ>  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.820521]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.820525]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.820528]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.820532]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.820535]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.820538]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.820542]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.820546]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.820549]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.820552]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.820556]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.820559]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.820562]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820564]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820570]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.820573]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820576]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.820579]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
> [223968.820583]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
> [223968.820586]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
> [223968.820589]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
> [223968.820593]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
> [223968.820596]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
> [223968.820599] NMI backtrace for cpu 2
> [223968.820600] CPU 2
> [223968.820602] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223968.820610]
> [223968.820612] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223968.820616] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
> [223968.820621] RSP: 0018:ffff88022fc83ce0  EFLAGS: 00000202
> [223968.820623] RAX: 000000007659b116 RBX: 0000000000000001 RCX: 000000007659b0e5
> [223968.820625] RDX: 0000000000017b6b RSI: 0000000000000000 RDI: 0000000000000001
> [223968.820628] RBP: ffff88022fc83ce0 R08: 000000007659b098 R09: 0000000000000000
> [223968.820630] R10: ffff880226948000 R11: 0000000000000000 R12: 00000000345f87d7
> [223968.820632] R13: 000000006efcdc62 R14: ffff88022693e000 R15: 000000007659b0e5
> [223968.820635] FS:  0000000000000000(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
> [223968.820638] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223968.820640] CR2: 00007f25e7874d7f CR3: 0000000124c0d000 CR4: 00000000000006e0
> [223968.820642] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223968.820645] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223968.820647] Process kworker/0:1 (pid: 0, threadinfo ffff88022693e000, task ffff880226948000)
> [223968.820649] Stack:
> [223968.820651]  ffff88022fc83d30 ffffffff813a4ee8 ffff88022693e000 000000007659b0e5
> [223968.820655]  000000026b4044c5 ffff88006afd8948 00000000345f87d7 000000006efcdc62
> [223968.820659]  0000000000000001 ffff880226948000 ffff88022fc83d40 ffffffff813a4f6a
> [223968.820663] Call Trace:
> [223968.820665]  <IRQ>
> [223968.820668]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223968.820671]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.820674]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.820678]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.820682]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.820685]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.820688]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.820691]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.820695]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.820699]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.820702]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.820705]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.820708]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.820712]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.820715]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820717]  <EOI>
> [223968.820720]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820723]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.820727]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820730]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.820733]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223968.820735] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31
> [223968.820753]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84
> [223968.820763] Call Trace:
> [223968.820764]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223968.820769]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.820773]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.820777]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.820780]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.820783]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.820787]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.820790]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.820793]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.820797]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.820801]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.820804]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.820807]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.820810]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.820813]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820815]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820821]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.820824]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820827]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.820831]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223968.816001] NMI backtrace for cpu 1
> [223968.816001] CPU 1
> [223968.816001] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223968.816001]
> [223968.816001] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223968.816001] RIP: 0010:[<ffffffff81440955>]  [<ffffffff81440955>] io_serial_out+0x15/0x20
> [223968.816001] RSP: 0018:ffff88022fc437f0  EFLAGS: 00000002
> [223968.816001] RAX: 0000000000000073 RBX: ffffffff8243eec0 RCX: 0000000000000000
> [223968.816001] RDX: 00000000000003f8 RSI: 00000000000003f8 RDI: ffffffff8243eec0
> [223968.816001] RBP: ffff88022fc437f0 R08: 000000007659a435 R09: 0000000000000000
> [223968.816001] R10: 0000000000000001 R11: 000000000000000a R12: 0000000000000073
> [223968.816001] R13: ffffffff81bc648d R14: 0000000000000050 R15: ffffffff8243eec0
> [223968.816001] FS:  0000000000000000(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
> [223968.816001] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223968.816001] CR2: 00007f472ba6a6fc CR3: 0000000126bb7000 CR4: 00000000000006e0
> [223968.816001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223968.816001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223968.816001] Process kworker/0:0 (pid: 0, threadinfo ffff880226928000, task ffff88022690dd60)
> [223968.816001] Stack:
> [223968.816001]  ffff88022fc43810 ffffffff814410dc 0000000000000030 ffffffff814410b0
> [223968.816001]  ffff88022fc43850 ffffffff8143cdb5 0000000000000087 0000000000000000
> [223968.816001]  ffffffff8243eec0 0000000000000001 0000000000000087 000000000000000d
> [223968.816001] Call Trace:
> [223968.816001]  <IRQ>
> [223968.816001]  [<ffffffff814410dc>] serial8250_console_putchar+0x2c/0x40
> [223968.816001]  [<ffffffff814410b0>] ? wait_for_xmitr+0xa0/0xa0
> [223968.816001]  [<ffffffff8143cdb5>] uart_console_write+0x35/0x70
> [223968.816001]  [<ffffffff814417be>] serial8250_console_write+0xbe/0x1a0
> [223968.816001]  [<ffffffff8105c78e>] __call_console_drivers+0x8e/0xb0
> [223968.816001]  [<ffffffff8105c7f5>] _call_console_drivers+0x45/0x70
> [223968.816001]  [<ffffffff8105d02f>] console_unlock+0x17f/0x2b0
> [223968.816001]  [<ffffffff8105d64d>] vprintk+0x1fd/0x520
> [223968.816001]  [<ffffffff816f16c1>] printk+0x67/0x69
> [223968.816001]  [<ffffffff816f5fa6>] ? _raw_spin_unlock+0x26/0x40
> [223968.816001]  [<ffffffff8105388b>] ? account_system_time+0xab/0x190
> [223968.816001]  [<ffffffff810bb7e4>] __rcu_pending+0x194/0x3b0
> [223968.816001]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
> [223968.816001]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
> [223968.816001]  [<ffffffff8106d861>] update_process_times+0x41/0x80
> [223968.816001]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
> [223968.816001]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
> [223968.816001]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
> [223968.816001]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
> [223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.816001]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
> [223968.816001]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
> [223968.816001]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
> [223968.816001]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.816001]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.816001]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.816001]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.816001]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.816001]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.816001]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.816001]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.816001]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.816001]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.816001]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.816001]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.816001]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.816001]  <EOI>
> [223968.816001]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.816001]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.816001]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.816001]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.816001]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223968.816001] Code: 48 89 e5 d3 e2 03 57 38 ec 0f b6 c0 c9 c3 0f 1f 84 00 00 00 00 00 0f b6 8f 81 00 00 00 55 89 d0 48 89 e5 d3 e6 03 77 38 89 f2 ee <c9> c3 66 0f 1f 84 00 00 00 00 00 55 80 bf 82 00 00 00 08 48 89
> [223968.816001] Call Trace:
> [223968.816001]  <IRQ>  [<ffffffff814410dc>] serial8250_console_putchar+0x2c/0x40
> [223968.816001]  [<ffffffff814410b0>] ? wait_for_xmitr+0xa0/0xa0
> [223968.816001]  [<ffffffff8143cdb5>] uart_console_write+0x35/0x70
> [223968.816001]  [<ffffffff814417be>] serial8250_console_write+0xbe/0x1a0
> [223968.816001]  [<ffffffff8105c78e>] __call_console_drivers+0x8e/0xb0
> [223968.816001]  [<ffffffff8105c7f5>] _call_console_drivers+0x45/0x70
> [223968.816001]  [<ffffffff8105d02f>] console_unlock+0x17f/0x2b0
> [223968.816001]  [<ffffffff8105d64d>] vprintk+0x1fd/0x520
> [223968.816001]  [<ffffffff816f16c1>] printk+0x67/0x69
> [223968.816001]  [<ffffffff816f5fa6>] ? _raw_spin_unlock+0x26/0x40
> [223968.816001]  [<ffffffff8105388b>] ? account_system_time+0xab/0x190
> [223968.816001]  [<ffffffff810bb7e4>] __rcu_pending+0x194/0x3b0
> [223968.816001]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
> [223968.816001]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
> [223968.816001]  [<ffffffff8106d861>] update_process_times+0x41/0x80
> [223968.816001]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
> [223968.816001]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
> [223968.816001]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
> [223968.816001]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
> [223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.816001]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
> [223968.816001]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
> [223968.816001]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
> [223968.816001]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.816001]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.816001]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.816001]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.816001]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.816001]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.816001]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.816001]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.816001]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.816001]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.816001]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.816001]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.816001]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.816001]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.816001]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.816001]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.816001]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.816001]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
>
> [ goes on for another ~300kB, trimmed ]
>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-25  7:13                                       ` Linus Torvalds
@ 2011-10-25  9:01                                         ` David Miller
  2011-10-25 12:30                                           ` Thomas Gleixner
  0 siblings, 1 reply; 98+ messages in thread
From: David Miller @ 2011-10-25  9:01 UTC (permalink / raw)
  To: torvalds
  Cc: sim, netdev, tglx, a.p.zijlstra, linux-kernel, davej, schwidefsky, mingo

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue, 25 Oct 2011 09:13:48 +0200

> Added netdev, because this seems to be a generic networking bug (ABBA
> between sk_lock and icsk_retransmit_timer if my quick scan looks
> correct).
> 
> Davem?

I suspect that's all just a side effect of whatever is creating the
preempt_count imbalance.
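
(For readers skimming the thread: "ABBA" in the quoted mail above is the
classic two-lock ordering inversion. A minimal generic sketch, with
hypothetical lock names -- not the actual sk_lock / retransmit-timer
code paths:)

	/* Context 1 (e.g. process context) */
	spin_lock(&lock_a);
	spin_lock(&lock_b);	/* waits if context 2 already holds lock_b */

	/* Context 2 (e.g. a timer softirq), opposite order */
	spin_lock(&lock_b);
	spin_lock(&lock_a);	/* waits for context 1 -> neither side can progress */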


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-25  9:01                                         ` David Miller
@ 2011-10-25 12:30                                           ` Thomas Gleixner
  2011-10-25 23:18                                             ` David Miller
  0 siblings, 1 reply; 98+ messages in thread
From: Thomas Gleixner @ 2011-10-25 12:30 UTC (permalink / raw)
  To: David Miller
  Cc: torvalds, sim, netdev, a.p.zijlstra, linux-kernel, davej,
	schwidefsky, mingo

On Tue, 25 Oct 2011, David Miller wrote:

> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Tue, 25 Oct 2011 09:13:48 +0200
> 
> > Added netdev, because this seems to be a generic networking bug (ABBA
> > between sk_lock and icsk_retransmit_timer if my quick scan looks
> > correct).
> > 
> > Davem?
> 
> I suspect that's all just a side effect of whatever is creating the
> preempt_count imbalance.

Something is holding the socket lock, and it was acquired in sk_clone(),
which does bh_lock_sock() and returns with the lock held, though I got
completely lost in the gazillions of possible callchains ...

While staring at it I found a missing unlock in sk_clone() itself,
but that's not the one which causes the leak. Lockdep would have
complained about that separately :)
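
(For reference, a paraphrased sketch of the helpers involved, assuming
the usual include/net/sock.h definitions rather than a verbatim copy:
bh_lock_sock() is just a spin_lock() on the per-socket slock -- the
"slock-AF_INET" class in the lockdep reports -- so whatever keeps it
held also keeps that spinlock and the preempt count elevated:)

        /* sketch of the include/net/sock.h helpers (paraphrased) */
        #define bh_lock_sock(__sk)        spin_lock(&((__sk)->sk_lock.slock))
        #define bh_lock_sock_nested(__sk) spin_lock_nested(&((__sk)->sk_lock.slock), \
                                                           SINGLE_DEPTH_NESTING)
        #define bh_unlock_sock(__sk)      spin_unlock(&((__sk)->sk_lock.slock))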

Thanks,

	tglx

--------->
Subject: net: Unlock sock before calling sk_free()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Index: linux-2.6/net/core/sock.c
===================================================================
--- linux-2.6.orig/net/core/sock.c
+++ linux-2.6/net/core/sock.c
@@ -1260,6 +1260,7 @@ struct sock *sk_clone(const struct sock 
 			/* It is still raw copy of parent, so invalidate
 			 * destructor and make plain sk_free() */
 			newsk->sk_destruct = NULL;
+			bh_unlock_sock(newsk);
 			sk_free(newsk);
 			newsk = NULL;
 			goto out;


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-18 20:12                                     ` Linus Torvalds
@ 2011-10-25 15:26                                       ` Simon Kirby
  2011-10-26  1:47                                         ` Yong Zhang
  0 siblings, 1 reply; 98+ messages in thread
From: Simon Kirby @ 2011-10-25 15:26 UTC (permalink / raw)
  To: Linus Torvalds, Peter Zijlstra, Ingo Molnar
  Cc: Thomas Gleixner, Linux Kernel Mailing List, Dave Jones,
	Martin Schwidefsky, Ingo Molnar, David Miller

On Tue, Oct 18, 2011 at 01:12:41PM -0700, Linus Torvalds wrote:

> On Tue, Oct 18, 2011 at 12:48 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > It does not look related.
> 
> Yeah, the only lock held there seems to be the socket lock, and it
> looks like all CPU's are spinning on it.
> 
> > Could you try to reproduce that problem with
> > lockdep enabled? lockdep might make it go away, but it's definitely
> > worth a try.
> 
> And DEBUG_SPINLOCK / DEBUG_SPINLOCK_SLEEP too. Maybe you're triggering
> some odd networking thing.  It sounds unlikely, but maybe some error
> case you get into doesn't release the socket lock.
> 
> I think PROVE_LOCKING already enables DEBUG_SPINLOCK, but the sleeping
> lock thing is separate, iirc.

I think the config option you were trying to think of is
CONFIG_DEBUG_ATOMIC_SLEEP, which enables CONFIG_PREEMPT_COUNT.
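
(A minimal sketch of the debug options being discussed, assuming the
usual Kconfig names -- PROVE_LOCKING already pulls in DEBUG_SPINLOCK,
and DEBUG_ATOMIC_SLEEP is what selects PREEMPT_COUNT:)

        CONFIG_PROVE_LOCKING=y
        CONFIG_DEBUG_SPINLOCK=y
        CONFIG_DEBUG_ATOMIC_SLEEP=y
        CONFIG_PREEMPT_COUNT=y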

By the way, we got this WARN_ON_ONCE while running lockdep elsewhere:

       /*
        * We can walk the hash lockfree, because the hash only
        * grows, and we are careful when adding entries to the end:
        */
       list_for_each_entry(class, hash_head, hash_entry) {
               if (class->key == key) {
                       WARN_ON_ONCE(class->name != lock->name);
                       return class;
               }
       }

[19274.691090] ------------[ cut here ]------------
[19274.691107] WARNING: at kernel/lockdep.c:690 __lock_acquire+0xfd6/0x2180()
[19274.691112] Hardware name: PowerEdge 2950
[19274.691115] Modules linked in: drbd lru_cache cn ipmi_devintf ipmi_si ipmi_msghandler sata_sil24 bnx2
[19274.691137] Pid: 4416, comm: heartbeat Not tainted 3.1.0-hw-lockdep+ #52
[19274.691141] Call Trace:
[19274.691149]  [<ffffffff81098f96>] ? __lock_acquire+0xfd6/0x2180
[19274.691156]  [<ffffffff8105c4f0>] warn_slowpath_common+0x80/0xc0
[19274.691163]  [<ffffffff8105c545>] warn_slowpath_null+0x15/0x20
[19274.691169]  [<ffffffff81098f96>] __lock_acquire+0xfd6/0x2180
[19274.691175]  [<ffffffff8109a2e9>] ? lock_release_non_nested+0x1a9/0x340
[19274.691181]  [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[19274.691185]  [<ffffffff8104a302>] ? double_rq_lock+0x52/0x80
[19274.691191]  [<ffffffff813a4f8a>] ? __delay+0xa/0x10
[19274.691197]  [<ffffffff816f55fa>] _raw_spin_lock_nested+0x3a/0x50
[19274.691201]  [<ffffffff8104a302>] ? double_rq_lock+0x52/0x80
[19274.691205]  [<ffffffff8104a302>] double_rq_lock+0x52/0x80
[19274.691210]  [<ffffffff81058167>] load_balance+0x897/0x16e0
[19274.691215]  [<ffffffff81058199>] ? load_balance+0x8c9/0x16e0
[19274.691219]  [<ffffffff8104d172>] ? update_shares+0xd2/0x150
[19274.691226]  [<ffffffff816f2572>] ? __schedule+0x842/0xa20
[19274.691232]  [<ffffffff816f2608>] __schedule+0x8d8/0xa20
[19274.691238]  [<ffffffff816f2572>] ? __schedule+0x842/0xa20
[19274.691243]  [<ffffffff81063e87>] ? local_bh_enable+0xa7/0x110
[19274.691249]  [<ffffffff8169c008>] ? unix_stream_recvmsg+0x1d8/0x7f0
[19274.691254]  [<ffffffff81614c88>] ? dev_queue_xmit+0x1a8/0x8a0
[19274.691258]  [<ffffffff816f282a>] schedule+0x3a/0x60
[19274.691265]  [<ffffffff816f4515>] schedule_hrtimeout_range_clock+0x105/0x120
[19274.691270]  [<ffffffff81096c9d>] ? trace_hardirqs_on+0xd/0x10
[19274.691276]  [<ffffffff81080d89>] ? add_wait_queue+0x49/0x60
[19274.691282]  [<ffffffff816f453e>] schedule_hrtimeout_range+0xe/0x10
[19274.691291]  [<ffffffff8113dc04>] poll_schedule_timeout+0x44/0x70
[19274.691297]  [<ffffffff8113e29c>] do_sys_poll+0x33c/0x4f0
[19274.691303]  [<ffffffff8113dcf0>] ? poll_freewait+0xc0/0xc0
[19274.691309]  [<ffffffff8113ddf0>] ? __pollwait+0x100/0x100
[19274.691317]  [<ffffffff81602c3d>] ? sock_update_classid+0xfd/0x140
[19274.691323]  [<ffffffff81602bb0>] ? sock_update_classid+0x70/0x140
[19274.691330]  [<ffffffff815ff1f7>] ? sock_recvmsg+0xf7/0x130
[19274.691336]  [<ffffffff81098450>] ? __lock_acquire+0x490/0x2180
[19274.691343]  [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
[19274.691351]  [<ffffffff8101a129>] ? sched_clock+0x9/0x10
[19274.691356]  [<ffffffff810944cd>] ? trace_hardirqs_off+0xd/0x10
[19274.691363]  [<ffffffff815ffb0b>] ? sys_recvfrom+0xbb/0x120
[19274.691370]  [<ffffffff81082540>] ? process_cpu_clock_getres+0x10/0x10
[19274.691376]  [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
[19274.691383]  [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
[19274.691390]  [<ffffffff816fe0ca>] ? sysret_check+0x2e/0x69
[19274.691396]  [<ffffffff8113e647>] sys_poll+0x77/0x110
[19274.691402]  [<ffffffff816fe092>] system_call_fastpath+0x16/0x1b
[19274.691407] ---[ end trace 74fbaae9066aadcc ]---

Simon-

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-24 19:02                                     ` Simon Kirby
  2011-10-25  7:13                                       ` Linus Torvalds
@ 2011-10-25 20:20                                       ` Simon Kirby
  2011-10-31 17:32                                         ` Simon Kirby
  2011-11-18 23:11                                         ` [tip:perf/core] lockdep: Show subclass in pretty print of lockdep output tip-bot for Steven Rostedt
  1 sibling, 2 replies; 98+ messages in thread
From: Simon Kirby @ 2011-10-25 20:20 UTC (permalink / raw)
  To: Thomas Gleixner, David Miller
  Cc: Peter Zijlstra, Linus Torvalds, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar, Network Development

On Mon, Oct 24, 2011 at 12:02:03PM -0700, Simon Kirby wrote:

> Ok, hit the hang about 4 more times, but only this morning on a box with
> a serial cable attached. Yay!

Here's lockdep output from another box. This one looks a bit different.

Simon-

[583223.799383] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[583223.805083] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[583223.805093] 
[583223.805094] =================================
[583223.805096] [ INFO: inconsistent lock state ]
[583223.805098] 3.1.0-rc10-hw-lockdep+ #51
[583223.805100] ---------------------------------
[583223.805102] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[583223.805105] swapper/0 [HC0[0]:SC0[0]:HE1:SE1] takes:
[583223.805107]  (slock-AF_INET){+.?.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
[583223.805116] {IN-SOFTIRQ-W} state was registered at:
[583223.805117]   [<ffffffff81098c7c>] __lock_acquire+0xcbc/0x2180
[583223.805123]   [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[583223.805126]   [<ffffffff816f55fc>] _raw_spin_lock+0x3c/0x50
[583223.805131]   [<ffffffff8166bd3d>] udp_queue_rcv_skb+0x26d/0x4b0
[583223.805135]   [<ffffffff8166c6a3>] __udp4_lib_rcv+0x2f3/0x910
[583223.805139]   [<ffffffff8166ccd5>] udp_rcv+0x15/0x20
[583223.805142]   [<ffffffff81641960>] ip_local_deliver_finish+0x100/0x2f0
[583223.805146]   [<ffffffff81641bdd>] ip_local_deliver+0x8d/0xa0
[583223.805149]   [<ffffffff81641203>] ip_rcv_finish+0x1a3/0x510
[583223.805152]   [<ffffffff816417e2>] ip_rcv+0x272/0x2f0
[583223.805154]   [<ffffffff81610d67>] __netif_receive_skb+0x4d7/0x560
[583223.805158]   [<ffffffff81610ec0>] process_backlog+0xd0/0x1e0
[583223.805161]   [<ffffffff81613880>] net_rx_action+0x140/0x2c0
[583223.805164]   [<ffffffff810640b8>] __do_softirq+0x138/0x250
[583223.805168]   [<ffffffff817002bc>] call_softirq+0x1c/0x30
[583223.805172]   [<ffffffff810153c5>] do_softirq+0x95/0xd0
[583223.805176]   [<ffffffff81063ecd>] local_bh_enable+0xed/0x110
[583223.805179]   [<ffffffff81614c48>] dev_queue_xmit+0x1a8/0x8a0
[583223.805181]   [<ffffffff8161f1aa>] neigh_resolve_output+0x17a/0x220
[583223.805185]   [<ffffffff81647d4c>] ip_finish_output+0x2ec/0x590
[583223.805188]   [<ffffffff81648078>] ip_output+0x88/0xe0
[583223.805191]   [<ffffffff81646cd8>] ip_local_out+0x28/0x80
[583223.805194]   [<ffffffff81646d39>] ip_send_skb+0x9/0x40
[583223.805197]   [<ffffffff8166aeb2>] udp_send_skb+0x122/0x390
[583223.805200]   [<ffffffff8166db0c>] udp_sendmsg+0x7dc/0x920
[583223.805203]   [<ffffffff81675e1f>] inet_sendmsg+0xbf/0x120
[583223.805207]   [<ffffffff815ff333>] sock_sendmsg+0xe3/0x110
[583223.805209]   [<ffffffff815ffc55>] sys_sendto+0x105/0x140
[583223.805212]   [<ffffffff816fe052>] system_call_fastpath+0x16/0x1b
[583223.805217] irq event stamp: 4284605374
[583223.805219] hardirqs last  enabled at (4284605372): [<ffffffff816101ad>] net_rps_action_and_irq_enable+0x8d/0xa0
[583223.805222] hardirqs last disabled at (4284605373): [<ffffffff8106412d>] __do_softirq+0x1ad/0x250
[583223.805226] softirqs last  enabled at (4284605374): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
[583223.805230] softirqs last disabled at (4284605313): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[583223.805233] 
[583223.805233] other info that might help us debug this:
[583223.805235]  Possible unsafe locking scenario:
[583223.805236] 
[583223.805237]        CPU0
[583223.805238]        ----
[583223.805239]   lock(slock-AF_INET);
[583223.805241]   <Interrupt>
[583223.805242]     lock(slock-AF_INET);
[583223.805244] 
[583223.805245]  *** DEADLOCK ***
[583223.805246] 
[583223.805248] 1 lock held by swapper/0:
[583223.805249]  #0:  (slock-AF_INET){+.?.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
[583223.805254] 
[583223.805254] stack backtrace:
[583223.805257] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51
[583223.805259] Call Trace:
[583223.805264]  [<ffffffff81096033>] print_usage_bug+0x243/0x310
[583223.805267]  [<ffffffff810965b4>] mark_lock+0x4b4/0x6c0
[583223.805271]  [<ffffffff81097400>] ? check_usage_forwards+0x110/0x110
[583223.805275]  [<ffffffff81096862>] mark_held_locks+0xa2/0x130
[583223.805278]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
[583223.805281]  [<ffffffff81096c0d>] trace_hardirqs_on_caller+0x13d/0x1c0
[583223.805286]  [<ffffffff813a60ae>] trace_hardirqs_on_thunk+0x3a/0x3f
[583223.805290]  [<ffffffff81092b8e>] ? tick_nohz_stop_sched_tick+0x2fe/0x430
[583223.805293]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
[583223.805297]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[583223.805301]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[583223.805304]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[583223.805307]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
[583223.805310]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
[583223.805315]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
[583223.805318]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
[583223.805321]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
[583223.805325]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
[583226.813848] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[583232.802948] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[583244.833571] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[583253.849631] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[583268.837126] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[587843.931805] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[587846.165584] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[587850.602316] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[587859.482841] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[587873.940136] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[587877.240624] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[590476.272022] BUG: soft lockup - CPU#0 stuck for 22s! [swapper:0]
[590476.276002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[590476.276002] irq event stamp: 4284605374
[590476.276002] hardirqs last  enabled at (4284605372): [<ffffffff816101ad>] net_rps_action_and_irq_enable+0x8d/0xa0
[590476.276002] hardirqs last disabled at (4284605373): [<ffffffff8106412d>] __do_softirq+0x1ad/0x250
[590476.276002] softirqs last  enabled at (4284605374): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
[590476.276002] softirqs last disabled at (4284605313): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[590476.276002] CPU 0 
[590476.276002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[590476.276002] 
[590476.276002] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0UR033
[590476.276002] RIP: 0010:[<ffffffff813a4ee3>]  [<ffffffff813a4ee3>] delay_tsc+0x73/0xd0
[590476.276002] RSP: 0018:ffff88022fc03cf0  EFLAGS: 00000206
[590476.276002] RAX: 00042f884dcdaa24 RBX: ffff88022fc0d3c0 RCX: 000000004dcdaa24
[590476.380029] BUG: soft lockup - CPU#1 stuck for 22s! [php:10828]
[590476.380033] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[590476.380044] irq event stamp: 0
[590476.380045] hardirqs last  enabled at (0): [<          (null)>]           (null)
[590476.380048] hardirqs last disabled at (0): [<ffffffff8105aa8b>] copy_process+0x65b/0x1450
[590476.380056] softirqs last  enabled at (0): [<ffffffff8105aa8b>] copy_process+0x65b/0x1450
[590476.380060] softirqs last disabled at (0): [<          (null)>]           (null)
[590476.380063] CPU 1 
[590476.380064] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[590476.380072] 
[590476.380075] Pid: 10828, comm: php Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0UR033
[590476.380079] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
[590476.380086] RSP: 0000:ffff88022fc43ce0  EFLAGS: 00000206
[590476.380088] RAX: 000000005aa56d04 RBX: ffffffff816f6334 RCX: 000000005aa56c92
[590476.380091] RDX: 0000000000042f88 RSI: ffffffff818f9896 RDI: 0000000000000001
[590476.380093] RBP: ffff88022fc43ce0 R08: 000000005aa56c92 R09: 0000000000000000
[590476.380096] R10: ffff88014b9a9f20 R11: 0000000000000000 R12: ffff88022fc43c58
[590476.380098] R13: ffffffff816feb33 R14: ffff88022fc43ce0 R15: 000000000e27878c
[590476.380101] FS:  00007fb61c8fa720(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
[590476.380103] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[590476.380106] CR2: 00000000027914a0 CR3: 000000013a070000 CR4: 00000000000006e0
[590476.380108] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[590476.380110] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[590476.380113] Process php (pid: 10828, threadinfo ffff88014a1f2000, task ffff88014b9a9f20)
[590476.380115] Stack:
[590476.380117]  ffff88022fc43d30 ffffffff813a4eaf ffff88014a1f2000 000000005aa56c38
[590476.380121]  00000001818f9896 ffff88001db58048 000000000e27878c 0000000076e96800
[590476.380125]  0000000000000001 ffff88014b9a9f20 ffff88022fc43d40 ffffffff813a4f6a
[590476.380129] Call Trace:
[590476.380132]  <IRQ> 
[590476.380137]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
[590476.380141]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[590476.380145]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[590476.380151]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[590476.380157]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[590476.380161]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[590476.380166]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[590476.380169]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[590476.380174]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[590476.380179]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[590476.380184]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[590476.380188]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[590476.380191]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[590476.380196]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[590476.380200]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[590476.380203]  <EOI> 
[590476.380206]  [<ffffffff816f6319>] ? retint_swapgs+0x13/0x1b
[590476.380208] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 
[590476.380227]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 
[590476.380236] Call Trace:
[590476.380237]  <IRQ>  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
[590476.380242]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[590476.380246]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[590476.380249]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[590476.380252]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[590476.380256]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[590476.380259]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[590476.380262]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[590476.380265]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[590476.380268]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[590476.380271]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[590476.380274]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[590476.380277]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[590476.380280]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[590476.380283]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[590476.380285]  <EOI>  [<ffffffff816f6319>] ? retint_swapgs+0x13/0x1b
[590476.484032] BUG: soft lockup - CPU#2 stuck for 23s! [suexec:10831]
...

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-25 12:30                                           ` Thomas Gleixner
@ 2011-10-25 23:18                                             ` David Miller
  0 siblings, 0 replies; 98+ messages in thread
From: David Miller @ 2011-10-25 23:18 UTC (permalink / raw)
  To: tglx
  Cc: torvalds, sim, netdev, a.p.zijlstra, linux-kernel, davej,
	schwidefsky, mingo

From: Thomas Gleixner <tglx@linutronix.de>
Date: Tue, 25 Oct 2011 14:30:50 +0200 (CEST)

> Subject: net: Unlock sock before calling sk_free()
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Good spotting, applied, thanks Thomas!

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-25 15:26                                       ` Simon Kirby
@ 2011-10-26  1:47                                         ` Yong Zhang
  0 siblings, 0 replies; 98+ messages in thread
From: Yong Zhang @ 2011-10-26  1:47 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	David Miller

On Tue, Oct 25, 2011 at 08:26:31AM -0700, Simon Kirby wrote:
> On Tue, Oct 18, 2011 at 01:12:41PM -0700, Linus Torvalds wrote:
> 
> > On Tue, Oct 18, 2011 at 12:48 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > It does not look related.
> > 
> > Yeah, the only lock held there seems to be the socket lock, and it
> > looks like all CPU's are spinning on it.
> > 
> > > Could you try to reproduce that problem with
> > > lockdep enabled? lockdep might make it go away, but it's definitely
> > > worth a try.
> > 
> > And DEBUG_SPINLOCK / DEBUG_SPINLOCK_SLEEP too. Maybe you're triggering
> > some odd networking thing.  It sounds unlikely, but maybe some error
> > case you get into doesn't release the socket lock.
> > 
> > I think PROVE_LOCKING already enables DEBUG_SPINLOCK, but the sleeping
> > lock thing is separate, iirc.
> 
> I think the config option you were trying to think of is
> CONFIG_DEBUG_ATOMIC_SLEEP, which enables CONFIG_PREEMPT_COUNT.
> 
> By the way, we got this WARN_ON_ONCE while running lockdep elsewhere:
> 
>        /*
>         * We can walk the hash lockfree, because the hash only
>         * grows, and we are careful when adding entries to the end:
>         */
>        list_for_each_entry(class, hash_head, hash_entry) {
>                if (class->key == key) {
>                        WARN_ON_ONCE(class->name != lock->name);

Someone has hit this before; maybe you can try the patch in:
http://marc.info/?l=linux-kernel&m=131919035525533

Thanks,
Yong

>                        return class;
>                }
>        }
> 
> [19274.691090] ------------[ cut here ]------------
> [19274.691107] WARNING: at kernel/lockdep.c:690 __lock_acquire+0xfd6/0x2180()
> [19274.691112] Hardware name: PowerEdge 2950
> [19274.691115] Modules linked in: drbd lru_cache cn ipmi_devintf ipmi_si ipmi_msghandler sata_sil24 bnx2
> [19274.691137] Pid: 4416, comm: heartbeat Not tainted 3.1.0-hw-lockdep+ #52
> [19274.691141] Call Trace:
> [19274.691149]  [<ffffffff81098f96>] ? __lock_acquire+0xfd6/0x2180
> [19274.691156]  [<ffffffff8105c4f0>] warn_slowpath_common+0x80/0xc0
> [19274.691163]  [<ffffffff8105c545>] warn_slowpath_null+0x15/0x20
> [19274.691169]  [<ffffffff81098f96>] __lock_acquire+0xfd6/0x2180
> [19274.691175]  [<ffffffff8109a2e9>] ? lock_release_non_nested+0x1a9/0x340
> [19274.691181]  [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
> [19274.691185]  [<ffffffff8104a302>] ? double_rq_lock+0x52/0x80
> [19274.691191]  [<ffffffff813a4f8a>] ? __delay+0xa/0x10
> [19274.691197]  [<ffffffff816f55fa>] _raw_spin_lock_nested+0x3a/0x50
> [19274.691201]  [<ffffffff8104a302>] ? double_rq_lock+0x52/0x80
> [19274.691205]  [<ffffffff8104a302>] double_rq_lock+0x52/0x80
> [19274.691210]  [<ffffffff81058167>] load_balance+0x897/0x16e0
> [19274.691215]  [<ffffffff81058199>] ? load_balance+0x8c9/0x16e0
> [19274.691219]  [<ffffffff8104d172>] ? update_shares+0xd2/0x150
> [19274.691226]  [<ffffffff816f2572>] ? __schedule+0x842/0xa20
> [19274.691232]  [<ffffffff816f2608>] __schedule+0x8d8/0xa20
> [19274.691238]  [<ffffffff816f2572>] ? __schedule+0x842/0xa20
> [19274.691243]  [<ffffffff81063e87>] ? local_bh_enable+0xa7/0x110
> [19274.691249]  [<ffffffff8169c008>] ? unix_stream_recvmsg+0x1d8/0x7f0
> [19274.691254]  [<ffffffff81614c88>] ? dev_queue_xmit+0x1a8/0x8a0
> [19274.691258]  [<ffffffff816f282a>] schedule+0x3a/0x60
> [19274.691265]  [<ffffffff816f4515>] schedule_hrtimeout_range_clock+0x105/0x120
> [19274.691270]  [<ffffffff81096c9d>] ? trace_hardirqs_on+0xd/0x10
> [19274.691276]  [<ffffffff81080d89>] ? add_wait_queue+0x49/0x60
> [19274.691282]  [<ffffffff816f453e>] schedule_hrtimeout_range+0xe/0x10
> [19274.691291]  [<ffffffff8113dc04>] poll_schedule_timeout+0x44/0x70
> [19274.691297]  [<ffffffff8113e29c>] do_sys_poll+0x33c/0x4f0
> [19274.691303]  [<ffffffff8113dcf0>] ? poll_freewait+0xc0/0xc0
> [19274.691309]  [<ffffffff8113ddf0>] ? __pollwait+0x100/0x100
> [19274.691317]  [<ffffffff81602c3d>] ? sock_update_classid+0xfd/0x140
> [19274.691323]  [<ffffffff81602bb0>] ? sock_update_classid+0x70/0x140
> [19274.691330]  [<ffffffff815ff1f7>] ? sock_recvmsg+0xf7/0x130
> [19274.691336]  [<ffffffff81098450>] ? __lock_acquire+0x490/0x2180
> [19274.691343]  [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
> [19274.691351]  [<ffffffff8101a129>] ? sched_clock+0x9/0x10
> [19274.691356]  [<ffffffff810944cd>] ? trace_hardirqs_off+0xd/0x10
> [19274.691363]  [<ffffffff815ffb0b>] ? sys_recvfrom+0xbb/0x120
> [19274.691370]  [<ffffffff81082540>] ? process_cpu_clock_getres+0x10/0x10
> [19274.691376]  [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
> [19274.691383]  [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
> [19274.691390]  [<ffffffff816fe0ca>] ? sysret_check+0x2e/0x69
> [19274.691396]  [<ffffffff8113e647>] sys_poll+0x77/0x110
> [19274.691402]  [<ffffffff816fe092>] system_call_fastpath+0x16/0x1b
> [19274.691407] ---[ end trace 74fbaae9066aadcc ]---
> 
> Simon-

-- 
Only stand for myself

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-25 20:20                                       ` Simon Kirby
@ 2011-10-31 17:32                                         ` Simon Kirby
  2011-11-02 16:40                                           ` Thomas Gleixner
  2011-11-02 22:10                                           ` Steven Rostedt
  2011-11-18 23:11                                         ` [tip:perf/core] lockdep: Show subclass in pretty print of lockdep output tip-bot for Steven Rostedt
  1 sibling, 2 replies; 98+ messages in thread
From: Simon Kirby @ 2011-10-31 17:32 UTC (permalink / raw)
  To: Thomas Gleixner, David Miller
  Cc: Peter Zijlstra, Linus Torvalds, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar, Network Development

On Tue, Oct 25, 2011 at 01:20:49PM -0700, Simon Kirby wrote:

> On Mon, Oct 24, 2011 at 12:02:03PM -0700, Simon Kirby wrote:
> 
> > Ok, hit the hang about 4 more times, but only this morning on a box with
> > a serial cable attached. Yay!
> 
> Here's lockdep output from another box. This one looks a bit different.

One more, again a bit different. The last few lockups have looked like
this. Not sure why, but we're hitting this a few times a day now. Thomas,
this is without your patch, but as you said, that's right before a free
and should print a separate lockdep warning.

No "huh" lines until after the trace on this one. I'll move to 3.1 with
cherry-picked b0691c8e now.

Simon-

[104661.173798] 
[104661.173801] =======================================================
[104661.179922] [ INFO: possible circular locking dependency detected ]
[104661.179922] 3.1.0-rc10-hw-lockdep+ #51
[104661.179922] -------------------------------------------------------
[104661.179922] watchdog.pl/29331 is trying to acquire lock:
[104661.179922]  (slock-AF_INET/1){+.-.-.}, at: [<ffffffff81664887>] tcp_v4_rcv+0x867/0xc10
[104661.179922] 
[104661.179922] but task is already holding lock:
[104661.179922]  (slock-AF_INET){+.-.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
[104661.179922] 
[104661.179922] which lock already depends on the new lock.
[104661.179922] 
[104661.179922] 
[104661.179922] the existing dependency chain (in reverse order) is:
[104661.239412] 
[104661.239412] -> #1 (slock-AF_INET){+.-.-.}:
[104661.244767]        [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[104661.244767]        [<ffffffff816f55fc>] _raw_spin_lock+0x3c/0x50
[104661.244767]        [<ffffffff81604540>] sk_clone+0x120/0x420
[104661.244767]        [<ffffffff8164cb33>] inet_csk_clone+0x13/0x90
[104661.244767]        [<ffffffff816669a5>] tcp_create_openreq_child+0x25/0x4d0
[104661.244767]        [<ffffffff81664c78>] tcp_v4_syn_recv_sock+0x48/0x2c0
[104661.244767]        [<ffffffff816667f5>] tcp_check_req+0x335/0x4c0
[104661.244767]        [<ffffffff81663e5e>] tcp_v4_do_rcv+0x29e/0x460
[104661.244767]        [<ffffffff816648ac>] tcp_v4_rcv+0x88c/0xc10   
[104661.244767]        [<ffffffff81641960>] ip_local_deliver_finish+0x100/0x2f0
[104661.244767]        [<ffffffff81641bdd>] ip_local_deliver+0x8d/0xa0
[104661.244767]        [<ffffffff81641203>] ip_rcv_finish+0x1a3/0x510 
[104661.244767]        [<ffffffff816417e2>] ip_rcv+0x272/0x2f0
[104661.244767]        [<ffffffff81610d67>] __netif_receive_skb+0x4d7/0x560
[104661.244767]        [<ffffffff81610ec0>] process_backlog+0xd0/0x1e0
[104661.244767]        [<ffffffff81613880>] net_rx_action+0x140/0x2c0 
[104661.244767]        [<ffffffff810640b8>] __do_softirq+0x138/0x250  
[104661.244767]        [<ffffffff817002bc>] call_softirq+0x1c/0x30    
[104661.244767]        [<ffffffff810153c5>] do_softirq+0x95/0xd0      
[104661.244767]        [<ffffffff81063dbd>] local_bh_enable_ip+0xed/0x110
[104661.244767]        [<ffffffff816f5e9f>] _raw_spin_unlock_bh+0x3f/0x50
[104661.244767]        [<ffffffff81602e41>] release_sock+0x161/0x1d0
[104661.244767]        [<ffffffff816762ed>] inet_stream_connect+0x6d/0x2f0
[104661.244767]        [<ffffffff815fcfeb>] kernel_connect+0xb/0x10
[104661.244767]        [<ffffffff816aaf86>] xs_tcp_setup_socket+0x2a6/0x4c0
[104661.244767]        [<ffffffff81078cf9>] process_one_work+0x1e9/0x560   
[104661.244767]        [<ffffffff81079403>] worker_thread+0x193/0x420      
[104661.244767]        [<ffffffff81080466>] kthread+0x96/0xb0
[104661.244767]        [<ffffffff817001c4>] kernel_thread_helper+0x4/0x10
[104661.244767] 
[104661.244767] -> #0 (slock-AF_INET/1){+.-.-.}:
[104661.244767]        [<ffffffff8109a000>] __lock_acquire+0x2040/0x2180
[104661.244767]        [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[104661.244767]        [<ffffffff816f55aa>] _raw_spin_lock_nested+0x3a/0x50
[104661.244767]        [<ffffffff81664887>] tcp_v4_rcv+0x867/0xc10
[104661.244767]        [<ffffffff81641960>] ip_local_deliver_finish+0x100/0x2f0
[104661.244767]        [<ffffffff81641bdd>] ip_local_deliver+0x8d/0xa0
[104661.244767]        [<ffffffff81641203>] ip_rcv_finish+0x1a3/0x510 
[104661.244767]        [<ffffffff816417e2>] ip_rcv+0x272/0x2f0
[104661.244767]        [<ffffffff81610d67>] __netif_receive_skb+0x4d7/0x560
[104661.244767]        [<ffffffff81612e24>] netif_receive_skb+0x104/0x120  
[104661.244767]        [<ffffffff81612f70>] napi_skb_finish+0x50/0x70
[104661.244767]        [<ffffffff81613635>] napi_gro_receive+0xc5/0xd0
[104661.244767]        [<ffffffffa000ad50>] bnx2_poll_work+0x610/0x1560 [bnx2]
[104661.244767]        [<ffffffffa000bde6>] bnx2_poll+0x66/0x250 [bnx2]
[104661.244767]        [<ffffffff81613880>] net_rx_action+0x140/0x2c0  
[104661.244767]        [<ffffffff810640b8>] __do_softirq+0x138/0x250   
[104661.244767]        [<ffffffff817002bc>] call_softirq+0x1c/0x30     
[104661.244767]        [<ffffffff810153c5>] do_softirq+0x95/0xd0       
[104661.244767]        [<ffffffff81063c8d>] irq_exit+0xdd/0x110        
[104661.244767]        [<ffffffff81014b74>] do_IRQ+0x64/0xe0           
[104661.244767]        [<ffffffff816f6273>] ret_from_intr+0x0/0x1a     
[104661.244767]        [<ffffffff816f65b5>] page_fault+0x25/0x30     
[104661.244767] 
[104661.244767] other info that might help us debug this:
[104661.244767] 
[104661.244767]  Possible unsafe locking scenario:
[104661.244767]        
[104661.244767]        CPU0                    CPU1
[104661.244767]        ----                    ----
[104661.244767]   lock(slock-AF_INET);
[104661.244767]                                lock(slock-AF_INET);
[104661.244767]                                lock(slock-AF_INET);
[104661.244767]   lock(slock-AF_INET);
[104661.244767] 
[104661.244767]  *** DEADLOCK ***
[104661.244767] 
[104661.244767] 3 locks held by watchdog.pl/29331:
[104661.244767]  #0:  (slock-AF_INET){+.-.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
[104661.244767]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff816109f5>] __netif_receive_skb+0x165/0x560
[104661.244767]  #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff816418a0>] ip_local_deliver_finish+0x40/0x2f0
[104661.244767] 
[104661.244767] stack backtrace:
[104661.244767] Pid: 29331, comm: watchdog.pl Not tainted 3.1.0-rc10-hw-lockdep+ #51
[104661.244767] Call Trace:
[104661.244767]  <IRQ>  [<ffffffff81097eab>] print_circular_bug+0x21b/0x330
[104661.244767]  [<ffffffff8109a000>] __lock_acquire+0x2040/0x2180
[104661.244767]  [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[104661.244767]  [<ffffffff81664887>] ? tcp_v4_rcv+0x867/0xc10
[104661.244767]  [<ffffffff816f55aa>] _raw_spin_lock_nested+0x3a/0x50
[104661.244767]  [<ffffffff81664887>] ? tcp_v4_rcv+0x867/0xc10
[104661.244767]  [<ffffffff81664887>] tcp_v4_rcv+0x867/0xc10  
[104661.244767]  [<ffffffff816418a0>] ? ip_local_deliver_finish+0x40/0x2f0
[104661.244767]  [<ffffffff81636978>] ? nf_hook_slow+0x148/0x1a0
[104661.244767]  [<ffffffff81641960>] ip_local_deliver_finish+0x100/0x2f0
[104661.244767]  [<ffffffff816418a0>] ? ip_local_deliver_finish+0x40/0x2f0
[104661.244767]  [<ffffffff81641bdd>] ip_local_deliver+0x8d/0xa0
[104661.244767]  [<ffffffff81641203>] ip_rcv_finish+0x1a3/0x510 
[104661.244767]  [<ffffffff816417e2>] ip_rcv+0x272/0x2f0
[104661.244767]  [<ffffffff81610d67>] __netif_receive_skb+0x4d7/0x560
[104661.244767]  [<ffffffff816109f5>] ? __netif_receive_skb+0x165/0x560
[104661.244767]  [<ffffffff81612e24>] netif_receive_skb+0x104/0x120
[104661.244767]  [<ffffffff81612d43>] ? netif_receive_skb+0x23/0x120
[104661.244767]  [<ffffffff816133ab>] ? dev_gro_receive+0x29b/0x380 
[104661.244767]  [<ffffffff816132a2>] ? dev_gro_receive+0x192/0x380 
[104661.244767]  [<ffffffff81612f70>] napi_skb_finish+0x50/0x70
[104661.244767]  [<ffffffff81613635>] napi_gro_receive+0xc5/0xd0
[104661.244767]  [<ffffffffa000ad50>] bnx2_poll_work+0x610/0x1560 [bnx2]
[104661.244767]  [<ffffffffa000bde6>] bnx2_poll+0x66/0x250 [bnx2]
[104661.244767]  [<ffffffff81613880>] net_rx_action+0x140/0x2c0  
[104661.244767]  [<ffffffff810640b8>] __do_softirq+0x138/0x250   
[104661.244767]  [<ffffffff817002bc>] call_softirq+0x1c/0x30     
[104661.244767]  [<ffffffff810153c5>] do_softirq+0x95/0xd0       
[104661.244767]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110        
[104661.244767]  [<ffffffff81014b74>] do_IRQ+0x64/0xe0           
[104661.244767]  [<ffffffff816f6273>] common_interrupt+0x73/0x73
[104661.244767]  <EOI>  [<ffffffff816f99b3>] ? do_page_fault+0x93/0x520
[104661.244767]  [<ffffffff816f99af>] ? do_page_fault+0x8f/0x520
[104661.244767]  [<ffffffff81149afc>] ? vfsmount_lock_local_unlock+0x1c/0x40
[104661.244767]  [<ffffffff8114a79b>] ? mntput_no_expire+0x3b/0x150
[104661.244767]  [<ffffffff8114a8ca>] ? mntput+0x1a/0x30
[104661.244767]  [<ffffffff8112c540>] ? fput+0x190/0x230
[104661.244767]  [<ffffffff813a60ed>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[104661.244767]  [<ffffffff816f65b5>] page_fault+0x25/0x30
[104661.897577] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104661.923653] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104663.418206] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104666.420003] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104672.425159] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104684.423542] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[104691.206752] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-31 17:32                                         ` Simon Kirby
@ 2011-11-02 16:40                                           ` Thomas Gleixner
  2011-11-02 17:27                                             ` Eric Dumazet
  2011-11-02 18:28                                             ` Simon Kirby
  2011-11-02 22:10                                           ` Steven Rostedt
  1 sibling, 2 replies; 98+ messages in thread
From: Thomas Gleixner @ 2011-11-02 16:40 UTC (permalink / raw)
  To: Simon Kirby
  Cc: David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Mon, 31 Oct 2011, Simon Kirby wrote:
> On Tue, Oct 25, 2011 at 01:20:49PM -0700, Simon Kirby wrote:
> 
> > On Mon, Oct 24, 2011 at 12:02:03PM -0700, Simon Kirby wrote:
> > 
> > > Ok, hit the hang about 4 more times, but only this morning on a box with
> > > a serial cable attached. Yay!
> > 
> > Here's lockdep output from another box. This one looks a bit different.
> 
> One more, again a bit different. The last few lockups have looked like
> this. Not sure why, but we're hitting this at a few a day now. Thomas,
> this is without your patch, but as you said, that's right before a free
> and should print a separate lockdep warning.
> 
> No "huh" lines until after the trace on this one. I'll move to 3.1 with

That means that the lockdep warning hit in the same net_rx cycle
before the leak was detected by the softirq code.

> cherry-picked b0691c8e now.

Can you please add the debug patch below and try the following:

Enable CONFIG_FUNCTION_TRACER & CONFIG_FUNCTION_GRAPH_TRACER

# cd $DEBUGFSMOUNTPOINT/tracing
# echo sk_clone >set_ftrace_filter
# echo function >current_tracer
# echo 1 >options/func_stack_trace

Now wait until it reproduces (which stops the trace) and read out

# cat trace >/tmp/trace.txt

Please provide the trace file along with the lockdep splat. That
should tell us which callchain is responsible for the spinlock
leakage.

Thanks,

	tglx

--------------->
 kernel/softirq.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6/kernel/softirq.c
===================================================================
--- linux-2.6.orig/kernel/softirq.c
+++ linux-2.6/kernel/softirq.c
@@ -238,6 +238,7 @@ restart:
 			h->action(h);
 			trace_softirq_exit(vec_nr);
 			if (unlikely(prev_count != preempt_count())) {
+				tracing_off();
 				printk(KERN_ERR "huh, entered softirq %u %s %p"
 				       "with preempt_count %08x,"
 				       " exited with %08x?\n", vec_nr,

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 16:40                                           ` Thomas Gleixner
@ 2011-11-02 17:27                                             ` Eric Dumazet
  2011-11-02 17:46                                               ` Linus Torvalds
                                                                 ` (2 more replies)
  2011-11-02 18:28                                             ` Simon Kirby
  1 sibling, 3 replies; 98+ messages in thread
From: Eric Dumazet @ 2011-11-02 17:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Simon Kirby, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wednesday, 2 November 2011 at 17:40 +0100, Thomas Gleixner wrote:
> On Mon, 31 Oct 2011, Simon Kirby wrote:
> > On Tue, Oct 25, 2011 at 01:20:49PM -0700, Simon Kirby wrote:
> > 
> > > On Mon, Oct 24, 2011 at 12:02:03PM -0700, Simon Kirby wrote:
> > > 
> > > > Ok, hit the hang about 4 more times, but only this morning on a box with
> > > > a serial cable attached. Yay!
> > > 
> > > Here's lockdep output from another box. This one looks a bit different.
> > 
> > One more, again a bit different. The last few lockups have looked like
> > this. Not sure why, but we're hitting this at a few a day now. Thomas,
> > this is without your patch, but as you said, that's right before a free
> > and should print a separate lockdep warning.
> > 
> > No "huh" lines until after the trace on this one. I'll move to 3.1 with
> 
> That means that the lockdep warning hit in the same net_rx cycle
> before the leak was detected by the softirq code.
> 
> > cherry-picked b0691c8e now.
> 
> Can you please add the debug patch below and try the following:
> 
> Enable CONFIG_FUNCTION_TRACER & CONFIG_FUNCTION_GRAPH_TRACER
> 
> # cd $DEBUGFSMOUNTPOINT/tracing
> # echo sk_clone >set_ftrace_filter
> # echo function >current_tracer
> # echo 1 >options/func_stack_trace
> 
> Now wait until it reproduces (which stops the trace) and read out
> 
> # cat trace >/tmp/trace.txt
> 
> Please provide the trace file along with the lockdep splat. That
> should tell us which callchain is responsible for the spinlock
> leakage.
> 
> Thanks,
> 
> 	tglx
> 
> --------------->
>  kernel/softirq.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> Index: linux-2.6/kernel/softirq.c
> ===================================================================
> --- linux-2.6.orig/kernel/softirq.c
> +++ linux-2.6/kernel/softirq.c
> @@ -238,6 +238,7 @@ restart:
>  			h->action(h);
>  			trace_softirq_exit(vec_nr);
>  			if (unlikely(prev_count != preempt_count())) {
> +				tracing_off();
>  				printk(KERN_ERR "huh, entered softirq %u %s %p"
>  				       "with preempt_count %08x,"
>  				       " exited with %08x?\n", vec_nr,


I believe it might come from commit 0e734419
(ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)

In case inet_csk_route_child_sock() returns NULL, we don't release the
socket lock.
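
The tail of tcp_v4_syn_recv_sock() looks roughly like this (abridged from
memory, so treat it as a sketch rather than a verbatim copy of the 3.1
code):

	if (!dst && (dst = inet_csk_route_child_sock(sk, newsk, req)) == NULL)
		goto put_and_exit;
	/* ... more setup, then the success path returns newsk ... */
exit:
	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
	return NULL;
put_and_exit:
	sock_put(newsk);	/* newsk is still bh-locked here */
	goto exit;

tcp_create_openreq_child() hands newsk back with its spinlock held, so
bailing out through put_and_exit drops the reference without ever calling
bh_unlock_sock(newsk).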




^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:27                                             ` Eric Dumazet
@ 2011-11-02 17:46                                               ` Linus Torvalds
  2011-11-02 17:53                                                 ` Eric Dumazet
  2011-11-02 17:49                                               ` Eric Dumazet
  2011-11-02 17:54                                               ` Thomas Gleixner
  2 siblings, 1 reply; 98+ messages in thread
From: Linus Torvalds @ 2011-11-02 17:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, Simon Kirby, David Miller, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 2, 2011 at 10:27 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> I believe it might come from commit 0e734419
> (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
>
> In case inet_csk_route_child_sock() returns NULL, we dont release socket
> lock.

Hmm. I'm not seeing it. We're not even taking the socket lock there.
Or is it hidden somehow?

                    Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:27                                             ` Eric Dumazet
  2011-11-02 17:46                                               ` Linus Torvalds
@ 2011-11-02 17:49                                               ` Eric Dumazet
  2011-11-02 17:58                                                 ` Eric Dumazet
  2011-11-02 17:54                                               ` Thomas Gleixner
  2 siblings, 1 reply; 98+ messages in thread
From: Eric Dumazet @ 2011-11-02 17:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Simon Kirby, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wednesday, 2 November 2011 at 18:27 +0100, Eric Dumazet wrote:

> I believe it might come from commit 0e734419
> (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> 
> In case inet_csk_route_child_sock() returns NULL, we dont release socket
> lock.
> 
> 

Yes, that's the problem. I am testing the following patch:

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 0ea10ee..683d97a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1510,6 +1510,7 @@ exit:
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
 	return NULL;
 put_and_exit:
+	bh_unlock_sock(newsk);
 	sock_put(newsk);
 	goto exit;
 }



^ permalink raw reply related	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:46                                               ` Linus Torvalds
@ 2011-11-02 17:53                                                 ` Eric Dumazet
  2011-11-02 18:00                                                   ` Linus Torvalds
  0 siblings, 1 reply; 98+ messages in thread
From: Eric Dumazet @ 2011-11-02 17:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Simon Kirby, David Miller, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wednesday, 2 November 2011 at 10:46 -0700, Linus Torvalds wrote:
> On Wed, Nov 2, 2011 at 10:27 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > I believe it might come from commit 0e734419
> > (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> >
> > In case inet_csk_route_child_sock() returns NULL, we dont release socket
> > lock.
> 
> Hmm. I'm not seeing it. We're not even taking the socket lock there.
> Or is it hidden somehow?
> 
>                     Linus

tcp_v4_syn_recv_sock()
{
	newsk = tcp_create_openreq_child(sk, req, skb);

...
	

}

newsk is locked at this point.
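
(tcp_create_openreq_child() ends up calling sk_clone(), which does --
quoting net/core/sock.c from memory -- right after allocating the new
socket:

	sock_lock_init(newsk);
	bh_lock_sock(newsk);

so whoever gets newsk back owns its bh lock and has to drop it on every
exit path.)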




^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:27                                             ` Eric Dumazet
  2011-11-02 17:46                                               ` Linus Torvalds
  2011-11-02 17:49                                               ` Eric Dumazet
@ 2011-11-02 17:54                                               ` Thomas Gleixner
  2011-11-02 18:04                                                 ` Eric Dumazet
  2 siblings, 1 reply; 98+ messages in thread
From: Thomas Gleixner @ 2011-11-02 17:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Simon Kirby, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, 2 Nov 2011, Eric Dumazet wrote:

> On Wednesday, 2 November 2011 at 17:40 +0100, Thomas Gleixner wrote:
> > On Mon, 31 Oct 2011, Simon Kirby wrote:
> > > On Tue, Oct 25, 2011 at 01:20:49PM -0700, Simon Kirby wrote:
> > > 
> > > > On Mon, Oct 24, 2011 at 12:02:03PM -0700, Simon Kirby wrote:
> > > > 
> > > > > Ok, hit the hang about 4 more times, but only this morning on a box with
> > > > > a serial cable attached. Yay!
> > > > 
> > > > Here's lockdep output from another box. This one looks a bit different.
> > > 
> > > One more, again a bit different. The last few lockups have looked like
> > > this. Not sure why, but we're hitting this at a few a day now. Thomas,
> > > this is without your patch, but as you said, that's right before a free
> > > and should print a separate lockdep warning.
> > > 
> > > No "huh" lines until after the trace on this one. I'll move to 3.1 with
> > 
> > That means that the lockdep warning hit in the same net_rx cycle
> > before the leak was detected by the softirq code.
> > 
> > > cherry-picked b0691c8e now.
> > 
> > Can you please add the debug patch below and try the following:
> > 
> > Enable CONFIG_FUNCTION_TRACER & CONFIG_FUNCTION_GRAPH_TRACER
> > 
> > # cd $DEBUGFSMOUNTPOINT/tracing
> > # echo sk_clone >set_ftrace_filter
> > # echo function >current_tracer
> > # echo 1 >options/func_stack_trace
> > 
> > Now wait until it reproduces (which stops the trace) and read out
> > 
> > # cat trace >/tmp/trace.txt
> > 
> > Please provide the trace file along with the lockdep splat. That
> > should tell us which callchain is responsible for the spinlock
> > leakage.
> > 
> > Thanks,
> > 
> > 	tglx
> > 
> > --------------->
> >  kernel/softirq.c |    1 +
> >  1 file changed, 1 insertion(+)
> > 
> > Index: linux-2.6/kernel/softirq.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/softirq.c
> > +++ linux-2.6/kernel/softirq.c
> > @@ -238,6 +238,7 @@ restart:
> >  			h->action(h);
> >  			trace_softirq_exit(vec_nr);
> >  			if (unlikely(prev_count != preempt_count())) {
> > +				tracing_off();
> >  				printk(KERN_ERR "huh, entered softirq %u %s %p"
> >  				       "with preempt_count %08x,"
> >  				       " exited with %08x?\n", vec_nr,
> 
> 
> I believe it might come from commit 0e734419
> (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> 
> In case inet_csk_route_child_sock() returns NULL, we dont release socket
> lock.

The same applies to the "if (__inet_inherit_port(sk, newsk) < 0)" case a
few lines further down, but that part was leaking the lock before that
commit already.

Just for the record, the locking in that code is mind-boggling. It took
me some detective work even to find the place where the success code
path unlocks the lock :(

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:49                                               ` Eric Dumazet
@ 2011-11-02 17:58                                                 ` Eric Dumazet
  2011-11-02 19:16                                                   ` Simon Kirby
  0 siblings, 1 reply; 98+ messages in thread
From: Eric Dumazet @ 2011-11-02 17:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Simon Kirby, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wednesday, 2 November 2011 at 18:49 +0100, Eric Dumazet wrote:
> On Wednesday, 2 November 2011 at 18:27 +0100, Eric Dumazet wrote:
> 
> > I believe it might come from commit 0e734419
> > (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> > 
> > In case inet_csk_route_child_sock() returns NULL, we dont release socket
> > lock.
> > 
> > 
> 
> Yes, thats the problem. I am testing following patch :
> 
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 0ea10ee..683d97a 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1510,6 +1510,7 @@ exit:
>  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
>  	return NULL;
>  put_and_exit:
> +	bh_unlock_sock(newsk);
>  	sock_put(newsk);
>  	goto exit;
>  }
> 


This indeed solves the problem, but closer inspection is needed to
close all the bugs, not just this one.

# netstat -s
Ip:
    6961157 total packets received
    0 forwarded
    0 incoming packets discarded
    6961157 incoming packets delivered
    6961049 requests sent out
    2 dropped because of missing route    //// HERE, this is the origin




^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:53                                                 ` Eric Dumazet
@ 2011-11-02 18:00                                                   ` Linus Torvalds
  2011-11-02 18:05                                                     ` Eric Dumazet
  0 siblings, 1 reply; 98+ messages in thread
From: Linus Torvalds @ 2011-11-02 18:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, Simon Kirby, David Miller, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 2, 2011 at 10:53 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> tcp_v4_syn_recv_sock()
> {
>        newsk = tcp_create_openreq_child(sk, req, skb);
>
> newsk is locked at this point.

Umm, if that is the case, then the bug predates the commit you point
to. There were exit paths before that too.

                   Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:54                                               ` Thomas Gleixner
@ 2011-11-02 18:04                                                 ` Eric Dumazet
  0 siblings, 0 replies; 98+ messages in thread
From: Eric Dumazet @ 2011-11-02 18:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Simon Kirby, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wednesday, 2 November 2011 at 18:54 +0100, Thomas Gleixner wrote:

> The same applies for if (__inet_inherit_port(sk, newsk) < 0) a few
> lines further down, but that part was leaking the lock before that
> commit already.
> 

Yes, but under normal conditions this never happened, which is why this
problem was never noticed.

tproxy is probably very seldom used, and when it is used the error path
is probably never reached...



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 18:00                                                   ` Linus Torvalds
@ 2011-11-02 18:05                                                     ` Eric Dumazet
  2011-11-02 18:10                                                       ` Linus Torvalds
  0 siblings, 1 reply; 98+ messages in thread
From: Eric Dumazet @ 2011-11-02 18:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Simon Kirby, David Miller, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wednesday, 2 November 2011 at 11:00 -0700, Linus Torvalds wrote:
> On Wed, Nov 2, 2011 at 10:53 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > tcp_v4_syn_recv_sock()
> > {
> >        newsk = tcp_create_openreq_child(sk, req, skb);
> >
> > newsk is locked at this point.
> 
> Umm, if that is the case, then the bug predates the commit you point
> to. There were exit paths before that too.
> 

Yes, but only when tproxy is used, and in some obscure error
conditions... Probably nobody ever hit them or complained.




^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 18:05                                                     ` Eric Dumazet
@ 2011-11-02 18:10                                                       ` Linus Torvalds
  0 siblings, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2011-11-02 18:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, Simon Kirby, David Miller, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 2, 2011 at 11:05 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Yes, but only when tproxy is used, and in some obscure error
> conditions... Probably nobody ever hit them or complained.

Yes, I'm not disputing that. However, it does show how incredibly
fragile that code is.

May I suggest renaming those "clone_sk()" kinds of functions
"clone_sk_lock()" or something? So that you *see* that it's locked as
it is cloned. That might have made the bug not happen in the first
place..

Of course, maybe it's obvious to most net people - just not me looking
at the code - that the new socket ended up being locked at allocation.
But considering the bug happened twice, that "obvious" part is clearly
debatable..
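
Something like this, purely illustrative (the name is made up on the
spot, no such function exists):

	newsk = clone_sk_lock(sk, priority);	/* "returns newsk locked" */
	if (newsk == NULL)
		goto exit_nonewsk;
	/* ... and every later bail-out has an obvious
	 * bh_unlock_sock(newsk) to pair with the clone */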

                          Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 16:40                                           ` Thomas Gleixner
  2011-11-02 17:27                                             ` Eric Dumazet
@ 2011-11-02 18:28                                             ` Simon Kirby
  2011-11-02 18:30                                               ` Thomas Gleixner
  1 sibling, 1 reply; 98+ messages in thread
From: Simon Kirby @ 2011-11-02 18:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 02, 2011 at 05:40:53PM +0100, Thomas Gleixner wrote:

> On Mon, 31 Oct 2011, Simon Kirby wrote:
> 
> > One more, again a bit different. The last few lockups have looked like
> > this. Not sure why, but we're hitting this at a few a day now. Thomas,
> > this is without your patch, but as you said, that's right before a free
> > and should print a separate lockdep warning.
> > 
> > No "huh" lines until after the trace on this one. I'll move to 3.1 with
> 
> That means that the lockdep warning hit in the same net_rx cycle
> before the leak was detected by the softirq code.
> 
> > cherry-picked b0691c8e now.
> 
> Can you please add the debug patch below and try the following:
> 
> Enable CONFIG_FUNCTION_TRACER & CONFIG_FUNCTION_GRAPH_TRACER
> 
> # cd $DEBUGFSMOUNTPOINT/tracing
> # echo sk_clone >set_ftrace_filter
> # echo function >current_tracer
> # echo 1 >options/func_stack_trace
> 
> Now wait until it reproduces (which stops the trace) and read out
> 
> # cat trace >/tmp/trace.txt
> 
> Please provide the trace file along with the lockdep splat. That
> should tell us which callchain is responsible for the spinlock
> leakage.
> Thanks,
> 
> 	tglx
> 
> --------------->
>  kernel/softirq.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> Index: linux-2.6/kernel/softirq.c
> ===================================================================
> --- linux-2.6.orig/kernel/softirq.c
> +++ linux-2.6/kernel/softirq.c
> @@ -238,6 +238,7 @@ restart:
>  			h->action(h);
>  			trace_softirq_exit(vec_nr);
>  			if (unlikely(prev_count != preempt_count())) {
> +				tracing_off();
>  				printk(KERN_ERR "huh, entered softirq %u %s %p"
>  				       "with preempt_count %08x,"
>  				       " exited with %08x?\n", vec_nr,

Ok, I'll try this. Hmm, all CPUs typically try to grab the lock fairly
quickly after it happens, which could make it difficult to cat the file.
I'll try ftrace_dump(DUMP_ALL); in there instead.
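
i.e. swapping the tracing_off() in your hunk for something like this
(untested, typed from memory):

	if (unlikely(prev_count != preempt_count())) {
		ftrace_dump(DUMP_ALL);	/* spew the buffer to the console
					 * right away instead of freezing it */
		printk(KERN_ERR "huh, entered softirq %u %s %p"
		       "with preempt_count %08x,"
		       " exited with %08x?\n", vec_nr,
		       softirq_to_name[vec_nr], h->action,
		       prev_count, preempt_count());
		preempt_count() = prev_count;
	}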

Simon-

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 18:28                                             ` Simon Kirby
@ 2011-11-02 18:30                                               ` Thomas Gleixner
  0 siblings, 0 replies; 98+ messages in thread
From: Thomas Gleixner @ 2011-11-02 18:30 UTC (permalink / raw)
  To: Simon Kirby
  Cc: David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, 2 Nov 2011, Simon Kirby wrote:
> On Wed, Nov 02, 2011 at 05:40:53PM +0100, Thomas Gleixner wrote:
> Ok, I'll try this. Hmm, all CPUs typically try to grab the lock fairly
> quickly after it happens, which could make it difficult to cat the file.
> I'll try ftrace_dump(DUMP_ALL); in there instead.

Eric has spotted the source of the trouble already. Can you try his patch
first? If it still persists, we can still resort to hardcore tracing :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:58                                                 ` Eric Dumazet
@ 2011-11-02 19:16                                                   ` Simon Kirby
  2011-11-02 22:42                                                     ` Eric Dumazet
  0 siblings, 1 reply; 98+ messages in thread
From: Simon Kirby @ 2011-11-02 19:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 02, 2011 at 06:58:21PM +0100, Eric Dumazet wrote:

> On Wednesday, 2 November 2011 at 18:49 +0100, Eric Dumazet wrote:
> > On Wednesday, 2 November 2011 at 18:27 +0100, Eric Dumazet wrote:
> > 
> > > I believe it might come from commit 0e734419
> > > (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> > > 
> > > In case inet_csk_route_child_sock() returns NULL, we dont release socket
> > > lock.
> > > 
> > > 
> > 
> > Yes, thats the problem. I am testing following patch :
> > 
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index 0ea10ee..683d97a 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -1510,6 +1510,7 @@ exit:
> >  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
> >  	return NULL;
> >  put_and_exit:
> > +	bh_unlock_sock(newsk);
> >  	sock_put(newsk);
> >  	goto exit;
> >  }
> > 
> 
> 
> This indeed solves the problem, but more closer inspection is needed to
> close all bugs, not this only one.
> 
> # netstat -s
> Ip:
>     6961157 total packets received
>     0 forwarded
>     0 incoming packets discarded
>     6961157 incoming packets delivered
>     6961049 requests sent out
>     2 dropped because of missing route    //// HERE, this is the origin

Actually, we have an anti-abuse daemon that injects blackhole routes, so
this makes sense. (The daemon was written before ipsets were merged and
normal netfilter rules make it fall over under attack.)

I'll try with this patch. Thanks!

Simon-

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-31 17:32                                         ` Simon Kirby
  2011-11-02 16:40                                           ` Thomas Gleixner
@ 2011-11-02 22:10                                           ` Steven Rostedt
  2011-11-02 23:00                                             ` Steven Rostedt
  1 sibling, 1 reply; 98+ messages in thread
From: Steven Rostedt @ 2011-11-02 22:10 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

Thomas pointed me here.

On Mon, Oct 31, 2011 at 10:32:46AM -0700, Simon Kirby wrote:
> [104661.244767] 
> [104661.244767]  Possible unsafe locking scenario:
> [104661.244767]        
> [104661.244767]        CPU0                    CPU1
> [104661.244767]        ----                    ----
> [104661.244767]   lock(slock-AF_INET);
> [104661.244767]                                lock(slock-AF_INET);
> [104661.244767]                                lock(slock-AF_INET);
> [104661.244767]   lock(slock-AF_INET);
> [104661.244767] 
> [104661.244767]  *** DEADLOCK ***
> [104661.244767] 

Bah, I used the __print_lock_name() function to show the lock names in
the above, which leaves off the subclass number. I'll go write up a
patch that fixes that.

Thanks,

-- Steve


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 19:16                                                   ` Simon Kirby
@ 2011-11-02 22:42                                                     ` Eric Dumazet
  2011-11-03  0:24                                                       ` Thomas Gleixner
                                                                         ` (2 more replies)
  0 siblings, 3 replies; 98+ messages in thread
From: Eric Dumazet @ 2011-11-02 22:42 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development, Balazs Scheidler,
	KOVACS Krisztian

On 02/11/2011 20:16, Simon Kirby wrote:

 
> Actually, we have an anti-abuse daemon that injects blackhole routes, so
> this makes sense. (The daemon was written before ipsets were merged and
> normal netfilter rules make it fall over under attack.)
> 
> I'll try with this patch. Thanks!
> 


Thanks!

Here is the official submission; please add your 'Tested-by' signature
when you can confirm the problem goes away.

(It did here, when I injected random NULL returns from
inet_csk_route_child_sock(), so I am confident this is the problem you hit.)

[PATCH] net: add missing bh_unlock_sock() calls

Simon Kirby reported lockdep warnings and the following messages:

[104661.897577] huh, entered softirq 3 NET_RX ffffffff81613740
preempt_count 00000101, exited with 00000102?

[104661.923653] huh, entered softirq 3 NET_RX ffffffff81613740
preempt_count 00000101, exited with 00000102?

The problem comes from commit 0e734419
(ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)

If inet_csk_route_child_sock() returns NULL, we should release the socket
lock before freeing it.

Another lock imbalance exists if __inet_inherit_port() returns an error;
that one has been present since commit 093d282321da (tproxy: fix hash
locking issue when using port redirection in __inet_inherit_port()), so a
backport is also needed for >= 2.6.37 kernels.

Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Balazs Scheidler <bazsi@balabit.hu>
CC: KOVACS Krisztian <hidden@balabit.hu>
---
 net/dccp/ipv4.c     |    1 +
 net/ipv4/tcp_ipv4.c |    1 +
 2 files changed, 2 insertions(+)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 332639b..90a919a 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -433,6 +433,7 @@ exit:
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
 	return NULL;
 put_and_exit:
+	bh_unlock_sock(newsk);
 	sock_put(newsk);
 	goto exit;
 }
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 0ea10ee..683d97a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1510,6 +1510,7 @@ exit:
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
 	return NULL;
 put_and_exit:
+	bh_unlock_sock(newsk);
 	sock_put(newsk);
 	goto exit;
 }

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 22:10                                           ` Steven Rostedt
@ 2011-11-02 23:00                                             ` Steven Rostedt
  2011-11-03  0:09                                               ` Simon Kirby
  0 siblings, 1 reply; 98+ messages in thread
From: Steven Rostedt @ 2011-11-02 23:00 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 02, 2011 at 06:10:23PM -0400, Steven Rostedt wrote:
> Thomas pointed me here.
> 
> On Mon, Oct 31, 2011 at 10:32:46AM -0700, Simon Kirby wrote:
> > [104661.244767] 
> > [104661.244767]  Possible unsafe locking scenario:
> > [104661.244767]        
> > [104661.244767]        CPU0                    CPU1
> > [104661.244767]        ----                    ----
> > [104661.244767]   lock(slock-AF_INET);
> > [104661.244767]                                lock(slock-AF_INET);
> > [104661.244767]                                lock(slock-AF_INET);
> > [104661.244767]   lock(slock-AF_INET);
> > [104661.244767] 
> > [104661.244767]  *** DEADLOCK ***
> > [104661.244767] 
> 
> Bah, I used the __print_lock_name() function to show the lock names in
> the above, which leaves off the subclass number. I'll go write up a
> patch that fixes that.
> 

Simon,

If you are still triggering the bug, could you do me a favor and apply
the following patch? Just to make sure it fixes the confusing output
from above.

Thanks,

-- Steve


diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 91d67ce..d821ac9 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -490,16 +490,22 @@ void get_usage_chars(struct lock_class *class, char usage[LOCK_USAGE_CHARS])
 	usage[i] = '\0';
 }
 
-static int __print_lock_name(struct lock_class *class)
+static void __print_lock_name(struct lock_class *class)
 {
 	char str[KSYM_NAME_LEN];
 	const char *name;
 
 	name = class->name;
-	if (!name)
+	if (!name) {
 		name = __get_key_name(class->key, str);
-
-	return printk("%s", name);
+		printk("%s", name);
+	} else {
+		printk("%s", name);
+		if (class->name_version > 1)
+			printk("#%d", class->name_version);
+		if (class->subclass)
+			printk("/%d", class->subclass);
+	}
 }
 
 static void print_lock_name(struct lock_class *class)
@@ -509,17 +515,8 @@ static void print_lock_name(struct lock_class *class)
 
 	get_usage_chars(class, usage);
 
-	name = class->name;
-	if (!name) {
-		name = __get_key_name(class->key, str);
-		printk(" (%s", name);
-	} else {
-		printk(" (%s", name);
-		if (class->name_version > 1)
-			printk("#%d", class->name_version);
-		if (class->subclass)
-			printk("/%d", class->subclass);
-	}
+	printk(" (");
+	__print_lock_name(class);
 	printk("){%s}", usage);
 }
 

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 23:00                                             ` Steven Rostedt
@ 2011-11-03  0:09                                               ` Simon Kirby
  2011-11-03  0:15                                                 ` Steven Rostedt
  0 siblings, 1 reply; 98+ messages in thread
From: Simon Kirby @ 2011-11-03  0:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 02, 2011 at 07:00:10PM -0400, Steven Rostedt wrote:

> On Wed, Nov 02, 2011 at 06:10:23PM -0400, Steven Rostedt wrote:
> > Thomas pointed me here.
> > 
> > On Mon, Oct 31, 2011 at 10:32:46AM -0700, Simon Kirby wrote:
> > > [104661.244767] 
> > > [104661.244767]  Possible unsafe locking scenario:
> > > [104661.244767]        
> > > [104661.244767]        CPU0                    CPU1
> > > [104661.244767]        ----                    ----
> > > [104661.244767]   lock(slock-AF_INET);
> > > [104661.244767]                                lock(slock-AF_INET);
> > > [104661.244767]                                lock(slock-AF_INET);
> > > [104661.244767]   lock(slock-AF_INET);
> > > [104661.244767] 
> > > [104661.244767]  *** DEADLOCK ***
> > > [104661.244767] 
> > 
> > Bah, I used the __print_lock_name() function to show the lock names in
> > the above, which leaves off the subclass number. I'll go write up a
> > patch that fixes that.
> > 
> 
> Simon,
> 
> If you are still triggering the bug. Could you do me a favor and apply
> the following patch. Just to make sure it fixes the confusing output
> from above.
> 
> Thanks,
> 
> -- Steve
> 
> 
> diff --git a/kernel/lockdep.c b/kernel/lockdep.c
> index 91d67ce..d821ac9 100644
> --- a/kernel/lockdep.c
> +++ b/kernel/lockdep.c
> @@ -490,16 +490,22 @@ void get_usage_chars(struct lock_class *class, char usage[LOCK_USAGE_CHARS])
>  	usage[i] = '\0';
>  }
>  
> -static int __print_lock_name(struct lock_class *class)
> +static void __print_lock_name(struct lock_class *class)
>  {
>  	char str[KSYM_NAME_LEN];
>  	const char *name;
>  
>  	name = class->name;
> -	if (!name)
> +	if (!name) {
>  		name = __get_key_name(class->key, str);
> -
> -	return printk("%s", name);
> +		printk("%s", name);
> +	} else {
> +		printk("%s", name);
> +		if (class->name_version > 1)
> +			printk("#%d", class->name_version);
> +		if (class->subclass)
> +			printk("/%d", class->subclass);
> +	}
>  }
>  
>  static void print_lock_name(struct lock_class *class)
> @@ -509,17 +515,8 @@ static void print_lock_name(struct lock_class *class)
>  
>  	get_usage_chars(class, usage);
>  
> -	name = class->name;
> -	if (!name) {
> -		name = __get_key_name(class->key, str);
> -		printk(" (%s", name);
> -	} else {
> -		printk(" (%s", name);
> -		if (class->name_version > 1)
> -			printk("#%d", class->name_version);
> -		if (class->subclass)
> -			printk("/%d", class->subclass);
> -	}
> +	printk(" (");
> +	__print_lock_name(class);
>  	printk("){%s}", usage);
>  }

Hello!

I am now able to reproduce on demand by just starting an "ab" from
another box and "ip route add blackhole <other machine>" on the target
box while the ab is running. The first time, I tried this without your
patch and got the trace I had before. With your patch, I got this:

[  366.198866] huh, entered softirq 3 NET_RX ffffffff81616560 preempt_count 00000102, exited with 00000103?
[  366.198981] 
[  366.198982] =================================
[  366.199118] [ INFO: inconsistent lock state ]
[  366.199189] 3.1.0-hw-lockdep+ #58
[  366.199259] ---------------------------------
[  366.199331] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[  366.199407] kworker/0:1/0 [HC0[0]:SC0[0]:HE1:SE1] takes:
[  366.199480]  (slock-AF_INET){+.?...}, at: [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[  366.199773] {IN-SOFTIRQ-W} state was registered at:
[  366.199846]   [<ffffffff81098b7c>] __lock_acquire+0xcbc/0x2180
[  366.199973]   [<ffffffff8109a149>] lock_acquire+0x109/0x140
[  366.200096]   [<ffffffff816f842c>] _raw_spin_lock+0x3c/0x50
[  366.200220]   [<ffffffff8166eb6d>] udp_queue_rcv_skb+0x26d/0x4b0
[  366.200346]   [<ffffffff8166f4d3>] __udp4_lib_rcv+0x2f3/0x910
[  366.200470]   [<ffffffff8166fb05>] udp_rcv+0x15/0x20
[  366.200592]   [<ffffffff81644790>] ip_local_deliver_finish+0x100/0x2f0
[  366.200718]   [<ffffffff81644a0d>] ip_local_deliver+0x8d/0xa0
[  366.200841]   [<ffffffff81644033>] ip_rcv_finish+0x1a3/0x510
[  366.200965]   [<ffffffff81644612>] ip_rcv+0x272/0x2f0
[  366.201086]   [<ffffffff81613b87>] __netif_receive_skb+0x4d7/0x560
[  366.201211]   [<ffffffff81613ce0>] process_backlog+0xd0/0x1e0
[  366.201335]   [<ffffffff816166a0>] net_rx_action+0x140/0x2c0
[  366.201458]   [<ffffffff810640e8>] __do_softirq+0x138/0x250
[  366.201582]   [<ffffffff817030fc>] call_softirq+0x1c/0x30
[  366.201706]   [<ffffffff810153c5>] do_softirq+0x95/0xd0
[  366.202822]   [<ffffffff81063efd>] local_bh_enable+0xed/0x110
[  366.202822]   [<ffffffff81617a68>] dev_queue_xmit+0x1a8/0x8a0
[  366.202822]   [<ffffffff81621fca>] neigh_resolve_output+0x17a/0x220
[  366.202822]   [<ffffffff8164ab7c>] ip_finish_output+0x2ec/0x590
[  366.202822]   [<ffffffff8164aea8>] ip_output+0x88/0xe0
[  366.202822]   [<ffffffff81649b08>] ip_local_out+0x28/0x80
[  366.202822]   [<ffffffff81649b69>] ip_send_skb+0x9/0x40
[  366.202822]   [<ffffffff8166dce8>] udp_send_skb+0x108/0x370
[  366.202822]   [<ffffffff8167093c>] udp_sendmsg+0x7dc/0x920
[  366.202822]   [<ffffffff81678c4f>] inet_sendmsg+0xbf/0x120
[  366.202822]   [<ffffffff81602193>] sock_sendmsg+0xe3/0x110
[  366.202822]   [<ffffffff81602ab5>] sys_sendto+0x105/0x140
[  366.202822]   [<ffffffff81700e92>] system_call_fastpath+0x16/0x1b
[  366.202822] irq event stamp: 1175966
[  366.202822] hardirqs last  enabled at (1175964): [<ffffffff816f9174>] restore_args+0x0/0x30
[  366.202822] hardirqs last disabled at (1175965): [<ffffffff8106415d>] __do_softirq+0x1ad/0x250
[  366.202822] softirqs last  enabled at (1175966): [<ffffffff810641a6>] __do_softirq+0x1f6/0x250
[  366.202822] softirqs last disabled at (1175907): [<ffffffff817030fc>] call_softirq+0x1c/0x30
[  366.202822] 
[  366.202822] other info that might help us debug this:
[  366.202822]  Possible unsafe locking scenario:
[  366.202822] 
[  366.202822]        CPU0
[  366.202822]        ----
[  366.202822]   lock(slock-AF_INET);
[  366.202822]   <Interrupt>
[  366.202822]     lock(slock-AF_INET);
[  366.202822] 
[  366.202822]  *** DEADLOCK ***
[  366.202822] 
[  366.202822] 1 lock held by kworker/0:1/0:
[  366.202822]  #0:  (slock-AF_INET){+.?...}, at: [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[  366.202822] 
[  366.202822] stack backtrace:
[  366.202822] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-hw-lockdep+ #58
[  366.202822] Call Trace:
[  366.202822]  [<ffffffff81095f31>] print_usage_bug+0x241/0x310
[  366.202822]  [<ffffffff810964b4>] mark_lock+0x4b4/0x6c0
[  366.202822]  [<ffffffff81097300>] ? check_usage_forwards+0x110/0x110
[  366.202822]  [<ffffffff81096762>] mark_held_locks+0xa2/0x130
[  366.202822]  [<ffffffff816f9174>] ? retint_restore_args+0x13/0x13
[  366.202822]  [<ffffffff81096b0d>] trace_hardirqs_on_caller+0x13d/0x1c0
[  366.202822]  [<ffffffff813a60ee>] trace_hardirqs_on_thunk+0x3a/0x3f
[  366.202822]  [<ffffffff816f9174>] ? retint_restore_args+0x13/0x13
[  366.202822]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[  366.202822]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[  366.202822]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[  366.202822]  [<ffffffff816ef2eb>] start_secondary+0x1ca/0x1ff

...which of course is a different splat, so I ran it again:

[   49.028097] =======================================================
[   49.028244] [ INFO: possible circular locking dependency detected ]
[   49.028321] 3.1.0-hw-lockdep+ #58
[   49.028391] -------------------------------------------------------
[   49.028466] tcsh/2490 is trying to acquire lock:
[   49.028539]  (slock-AF_INET/1){+.-...}, at: [<ffffffff816676b7>] tcp_v4_rcv+0x867/0xc10
[   49.028882] 
[   49.028883] but task is already holding lock:
[   49.029018]  (slock-AF_INET){+.-...}, at: [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[   49.029310] 
[   49.029310] which lock already depends on the new lock.
[   49.029312] 
[   49.029513] 
[   49.029514] the existing dependency chain (in reverse order) is:
[   49.029653] 
[   49.029654] -> #1 (slock-AF_INET){+.-...}:
[   49.029986]        [<ffffffff8109a149>] lock_acquire+0x109/0x140
[   49.030115]        [<ffffffff816f842c>] _raw_spin_lock+0x3c/0x50
[   49.030242]        [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[   49.031959]        [<ffffffff8164f963>] inet_csk_clone+0x13/0x90
[   49.032008]        [<ffffffff816697d5>] tcp_create_openreq_child+0x25/0x4d0
[   49.032008]        [<ffffffff81667aa8>] tcp_v4_syn_recv_sock+0x48/0x2c0
[   49.032008]        [<ffffffff81669625>] tcp_check_req+0x335/0x4c0
[   49.032008]        [<ffffffff81666c8e>] tcp_v4_do_rcv+0x29e/0x460
[   49.032008]        [<ffffffff816676dc>] tcp_v4_rcv+0x88c/0xc10
[   49.032008]        [<ffffffff81644790>] ip_local_deliver_finish+0x100/0x2f0
[   49.032008]        [<ffffffff81644a0d>] ip_local_deliver+0x8d/0xa0
[   49.032008]        [<ffffffff81644033>] ip_rcv_finish+0x1a3/0x510
[   49.032008]        [<ffffffff81644612>] ip_rcv+0x272/0x2f0
[   49.032008]        [<ffffffff81613b87>] __netif_receive_skb+0x4d7/0x560
[   49.032008]        [<ffffffff81613ce0>] process_backlog+0xd0/0x1e0
[   49.032008]        [<ffffffff816166a0>] net_rx_action+0x140/0x2c0
[   49.032008]        [<ffffffff810640e8>] __do_softirq+0x138/0x250
[   49.032008]        [<ffffffff817030fc>] call_softirq+0x1c/0x30
[   49.032008]        [<ffffffff810153c5>] do_softirq+0x95/0xd0
[   49.032008]        [<ffffffff81063ded>] local_bh_enable_ip+0xed/0x110
[   49.032008]        [<ffffffff816f8ccf>] _raw_spin_unlock_bh+0x3f/0x50
[   49.032008]        [<ffffffff81605ca1>] release_sock+0x161/0x1d0
[   49.032008]        [<ffffffff8167911d>] inet_stream_connect+0x6d/0x2f0
[   49.032008]        [<ffffffff815ffe4b>] kernel_connect+0xb/0x10
[   49.032008]        [<ffffffff816addb6>] xs_tcp_setup_socket+0x2a6/0x4c0
[   49.032008]        [<ffffffff81078d29>] process_one_work+0x1e9/0x560
[   49.032008]        [<ffffffff81079433>] worker_thread+0x193/0x420
[   49.032008]        [<ffffffff81080496>] kthread+0x96/0xb0
[   49.032008]        [<ffffffff81703004>] kernel_thread_helper+0x4/0x10
[   49.032008] 
[   49.032008] -> #0 (slock-AF_INET/1){+.-...}:
[   49.032008]        [<ffffffff81099f00>] __lock_acquire+0x2040/0x2180
[   49.032008]        [<ffffffff8109a149>] lock_acquire+0x109/0x140
[   49.032008]        [<ffffffff816f83da>] _raw_spin_lock_nested+0x3a/0x50
[   49.032008]        [<ffffffff816676b7>] tcp_v4_rcv+0x867/0xc10
[   49.032008]        [<ffffffff81644790>] ip_local_deliver_finish+0x100/0x2f0
[   49.032008]        [<ffffffff81644a0d>] ip_local_deliver+0x8d/0xa0
[   49.032008]        [<ffffffff81644033>] ip_rcv_finish+0x1a3/0x510
[   49.032008]        [<ffffffff81644612>] ip_rcv+0x272/0x2f0
[   49.032008]        [<ffffffff81613b87>] __netif_receive_skb+0x4d7/0x560
[   49.032008]        [<ffffffff81615c44>] netif_receive_skb+0x104/0x120
[   49.032008]        [<ffffffff81615d90>] napi_skb_finish+0x50/0x70
[   49.032008]        [<ffffffff81616455>] napi_gro_receive+0xc5/0xd0
[   49.032008]        [<ffffffffa000ad50>] bnx2_poll_work+0x610/0x1560 [bnx2]
[   49.032008]        [<ffffffffa000bde6>] bnx2_poll+0x66/0x250 [bnx2]
[   49.032008]        [<ffffffff816166a0>] net_rx_action+0x140/0x2c0
[   49.032008]        [<ffffffff810640e8>] __do_softirq+0x138/0x250
[   49.032008]        [<ffffffff817030fc>] call_softirq+0x1c/0x30
[   49.032008]        [<ffffffff810153c5>] do_softirq+0x95/0xd0
[   49.032008]        [<ffffffff81063cbd>] irq_exit+0xdd/0x110
[   49.032008]        [<ffffffff81014b74>] do_IRQ+0x64/0xe0
[   49.032008]        [<ffffffff816f90b3>] ret_from_intr+0x0/0x1a
[   49.032008]        [<ffffffff8105f63f>] release_task+0x24f/0x4c0
[   49.032008]        [<ffffffff810601de>] wait_consider_task+0x92e/0xb90
[   49.032008]        [<ffffffff81060590>] do_wait+0x150/0x270
[   49.032008]        [<ffffffff81060751>] sys_wait4+0xa1/0xf0
[   49.032008]        [<ffffffff81700e92>] system_call_fastpath+0x16/0x1b
[   49.032008] 
[   49.032008] other info that might help us debug this:
[   49.032008] 
[   49.032008]  Possible unsafe locking scenario:
[   49.032008] 
[   49.032008]        CPU0                    CPU1
[   49.032008]        ----                    ----
[   49.032008]   lock(slock-AF_INET);
[   49.039565]                                lock(slock-AF_INET/1);
[   49.039565]                                lock(slock-AF_INET);
[   49.039565]   lock(slock-AF_INET/1);
[   49.039565] 
[   49.039565]  *** DEADLOCK ***
[   49.039565] 
[   49.039565] 3 locks held by tcsh/2490:
[   49.039565]  #0:  (slock-AF_INET){+.-...}, at: [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[   49.039565]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81613815>] __netif_receive_skb+0x165/0x560
[   49.039565]  #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff816446d0>] ip_local_deliver_finish+0x40/0x2f0
[   49.039565] 
[   49.039565] stack backtrace:
[   49.039565] Pid: 2490, comm: tcsh Not tainted 3.1.0-hw-lockdep+ #58
[   49.039565] Call Trace:
[   49.039565]  <IRQ>  [<ffffffff81097dab>] print_circular_bug+0x21b/0x330
[   49.039565]  [<ffffffff81099f00>] __lock_acquire+0x2040/0x2180
[   49.039565]  [<ffffffff8109a149>] lock_acquire+0x109/0x140
[   49.039565]  [<ffffffff816676b7>] ? tcp_v4_rcv+0x867/0xc10
[   49.039565]  [<ffffffff816f83da>] _raw_spin_lock_nested+0x3a/0x50
[   49.039565]  [<ffffffff816676b7>] ? tcp_v4_rcv+0x867/0xc10
[   49.039565]  [<ffffffff816676b7>] tcp_v4_rcv+0x867/0xc10
[   49.039565]  [<ffffffff816446d0>] ? ip_local_deliver_finish+0x40/0x2f0
[   49.039565]  [<ffffffff81644790>] ip_local_deliver_finish+0x100/0x2f0
[   49.039565]  [<ffffffff816446d0>] ? ip_local_deliver_finish+0x40/0x2f0
[   49.039565]  [<ffffffff81644a0d>] ip_local_deliver+0x8d/0xa0
[   49.039565]  [<ffffffff81644033>] ip_rcv_finish+0x1a3/0x510
[   49.039565]  [<ffffffff81644612>] ip_rcv+0x272/0x2f0
[   49.039565]  [<ffffffff81613b87>] __netif_receive_skb+0x4d7/0x560
[   49.039565]  [<ffffffff81613815>] ? __netif_receive_skb+0x165/0x560
[   49.039565]  [<ffffffff81615c44>] netif_receive_skb+0x104/0x120
[   49.039565]  [<ffffffff81615b63>] ? netif_receive_skb+0x23/0x120
[   49.039565]  [<ffffffff816161cb>] ? dev_gro_receive+0x29b/0x380
[   49.039565]  [<ffffffff816160c2>] ? dev_gro_receive+0x192/0x380
[   49.039565]  [<ffffffff81615d90>] napi_skb_finish+0x50/0x70
[   49.039565]  [<ffffffff81616455>] napi_gro_receive+0xc5/0xd0
[   49.039565]  [<ffffffffa000ad50>] bnx2_poll_work+0x610/0x1560 [bnx2]
[   49.039565]  [<ffffffffa000bde6>] bnx2_poll+0x66/0x250 [bnx2]
[   49.039565]  [<ffffffff816166a0>] net_rx_action+0x140/0x2c0
[   49.039565]  [<ffffffff810640e8>] __do_softirq+0x138/0x250
[   49.039565]  [<ffffffff817030fc>] call_softirq+0x1c/0x30
[   49.039565]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[   49.039565]  [<ffffffff81063cbd>] irq_exit+0xdd/0x110
[   49.039565]  [<ffffffff81014b74>] do_IRQ+0x64/0xe0
[   49.039565]  [<ffffffff816f90b3>] common_interrupt+0x73/0x73
[   49.039565]  <EOI>  [<ffffffff810944fd>] ? trace_hardirqs_off+0xd/0x10
[   49.039565]  [<ffffffff816f864f>] ? _raw_write_unlock_irq+0x2f/0x50
[   49.039565]  [<ffffffff816f864b>] ? _raw_write_unlock_irq+0x2b/0x50
[   49.039565]  [<ffffffff8105f63f>] release_task+0x24f/0x4c0
[   49.039565]  [<ffffffff8105f414>] ? release_task+0x24/0x4c0
[   49.039565]  [<ffffffff810601de>] wait_consider_task+0x92e/0xb90
[   49.039565]  [<ffffffff81096b0d>] ? trace_hardirqs_on_caller+0x13d/0x1c0
[   49.039565]  [<ffffffff81060590>] do_wait+0x150/0x270
[   49.039565]  [<ffffffff81096b9d>] ? trace_hardirqs_on+0xd/0x10
[   49.039565]  [<ffffffff81060751>] sys_wait4+0xa1/0xf0
[   49.039565]  [<ffffffff8105e9b0>] ? wait_noreap_copyout+0x150/0x150
[   49.039565]  [<ffffffff81700e92>] system_call_fastpath+0x16/0x1b
[   49.045277] huh, entered softirq 3 NET_RX ffffffff81616560 preempt_count 00000102, exited with 00000103?

Did that help? I'm not sure if that's what you wanted to see...

Simon-

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-03  0:09                                               ` Simon Kirby
@ 2011-11-03  0:15                                                 ` Steven Rostedt
  2011-11-03  0:17                                                   ` Simon Kirby
  0 siblings, 1 reply; 98+ messages in thread
From: Steven Rostedt @ 2011-11-03  0:15 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, 2011-11-02 at 17:09 -0700, Simon Kirby wrote:
>  
> [   49.032008] other info that might help us debug this:
> [   49.032008] 
> [   49.032008]  Possible unsafe locking scenario:
> [   49.032008] 
> [   49.032008]        CPU0                    CPU1
> [   49.032008]        ----                    ----
> [   49.032008]   lock(slock-AF_INET);
> [   49.039565]                                lock(slock-AF_INET/1);
> [   49.039565]                                lock(slock-AF_INET);
> [   49.039565]   lock(slock-AF_INET/1);
> [   49.039565] 
> [   49.039565]  *** DEADLOCK ***
> [   49.039565] 

> Did that help? I'm not sure if that's what you wanted to see...


Yes, this looks much better than what you previously showed. The added
"/1" makes a world of difference.

Thanks!

I'll add your "Tested-by". Seems rather strange as we didn't fix the bug
you are chasing, but instead fixed the output of what the bug
produced ;)

-- Steve
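
As an aside for anyone reading along: the "/1" is lockdep's subclass
number. In this splat it comes from the softirq receive path taking the
socket lock through its nested helper. A minimal sketch of that pattern,
assuming the usual bh_lock_sock_nested() definition from
include/net/sock.h (the function below and its body are invented for
illustration):

#include <net/sock.h>

/*
 * Sketch only: the nested acquisition carries subclass 1, which the
 * fixed pretty-printer now renders as "slock-AF_INET/1" instead of a
 * second, identical-looking "slock-AF_INET".
 */
static void rx_deliver_sketch(struct sock *sk)
{
	/* expands to spin_lock_nested(&sk->sk_lock.slock,
	 * SINGLE_DEPTH_NESTING), i.e. subclass 1 */
	bh_lock_sock_nested(sk);

	/* ... deliver the segment to the socket ... */

	bh_unlock_sock(sk);
}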



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-03  0:15                                                 ` Steven Rostedt
@ 2011-11-03  0:17                                                   ` Simon Kirby
  0 siblings, 0 replies; 98+ messages in thread
From: Simon Kirby @ 2011-11-03  0:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 02, 2011 at 08:15:51PM -0400, Steven Rostedt wrote:

> On Wed, 2011-11-02 at 17:09 -0700, Simon Kirby wrote:
> >  
> > [   49.032008] other info that might help us debug this:
> > [   49.032008] 
> > [   49.032008]  Possible unsafe locking scenario:
> > [   49.032008] 
> > [   49.032008]        CPU0                    CPU1
> > [   49.032008]        ----                    ----
> > [   49.032008]   lock(slock-AF_INET);
> > [   49.039565]                                lock(slock-AF_INET/1);
> > [   49.039565]                                lock(slock-AF_INET);
> > [   49.039565]   lock(slock-AF_INET/1);
> > [   49.039565] 
> > [   49.039565]  *** DEADLOCK ***
> > [   49.039565] 
> 
> > Did that help? I'm not sure if that's what you wanted to see...
> 
> 
> Yes, this looks much better than what you previously showed. The added
> "/1" makes a world of difference.
> 
> Thanks!
> 
> I'll add your "Tested-by". Seems rather strange as we didn't fix the bug
> you are chasing, but instead fixed the output of what the bug
> produced ;)

Well, I was testing this without Eric's patch as I figured you wanted to
see the splat. :) Testing again with Eric's patch now.

Simon-

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 22:42                                                     ` Eric Dumazet
@ 2011-11-03  0:24                                                       ` Thomas Gleixner
  2011-11-03  0:52                                                       ` Simon Kirby
  2011-11-03  6:06                                                       ` Jörg-Volker Peetz
  2 siblings, 0 replies; 98+ messages in thread
From: Thomas Gleixner @ 2011-11-03  0:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Simon Kirby, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development, Balazs Scheidler,
	KOVACS Krisztian

On Wed, 2 Nov 2011, Eric Dumazet wrote:
> On 02/11/2011 20:16, Simon Kirby wrote:
> 
>  
> > Actually, we have an anti-abuse daemon that injects blackhole routes, so
> > this makes sense. (The daemon was written before ipsets were merged and
> > normal netfilter rules make it fall over under attack.)
> > 
> > I'll try with this patch. Thanks!
> > 
> 
> 
> Thanks !
> 
> Here is the official submission, please add your 'Tested-by' signature
> when you can confirm problem goes away.
> 
> (It did here, when I injected random NULL returns from
> inet_csk_route_child_sock(), so I am confident this is the problem you hit )
> 
> [PATCH] net: add missing bh_unlock_sock() calls
> 
> Simon Kirby reported lockdep warnings and the following messages:
> 
> [104661.897577] huh, entered softirq 3 NET_RX ffffffff81613740
> preempt_count 00000101, exited with 00000102?
> 
> [104661.923653] huh, entered softirq 3 NET_RX ffffffff81613740
> preempt_count 00000101, exited with 00000102?
> 
> Problem comes from commit 0e734419
> (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> 
> If inet_csk_route_child_sock() returns NULL, we should release socket
> lock before freeing it.
> 
> Another lock imbalance exists if __inet_inherit_port() returns an error,
> introduced by commit 093d282321da (tproxy: fix hash locking issue when using
> port redirection in __inet_inherit_port()); a backport is also needed for
> >= 2.6.37 kernels.
> 
> Reported-by: Dimon Kirby <sim@hostway.ca>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Balazs Scheidler <bazsi@balabit.hu>
> CC: KOVACS Krisztian <hidden@balabit.hu>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

You probably also want: CC: stable@vger.kernel.org

Thanks,

	tglx

> ---
>  net/dccp/ipv4.c     |    1 +
>  net/ipv4/tcp_ipv4.c |    1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
> index 332639b..90a919a 100644
> --- a/net/dccp/ipv4.c
> +++ b/net/dccp/ipv4.c
> @@ -433,6 +433,7 @@ exit:
>  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
>  	return NULL;
>  put_and_exit:
> +	bh_unlock_sock(newsk);
>  	sock_put(newsk);
>  	goto exit;
>  }
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 0ea10ee..683d97a 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1510,6 +1510,7 @@ exit:
>  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
>  	return NULL;
>  put_and_exit:
> +	bh_unlock_sock(newsk);
>  	sock_put(newsk);
>  	goto exit;
>  }
> 
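
For context, a simplified sketch of the error path the patch closes,
following the 3.1-era tcp_v4_syn_recv_sock() flow. Everything other than
the bh_unlock_sock() line the patch adds (shown here for the TCP side;
the DCCP side is analogous) is paraphrased rather than copied from the
tree:

	/* The child socket comes back from the clone path with its
	 * bh lock already held, so every error exit taken after this
	 * point must drop that lock before sock_put().  Otherwise the
	 * softirq returns with the spinlock still held, which is what
	 * the "huh, entered softirq ... exited with ..." warnings
	 * quoted in the changelog were flagging. */
	newsk = tcp_create_openreq_child(sk, req, skb);
	if (newsk == NULL)
		goto exit_nonewsk;

	dst = inet_csk_route_child_sock(sk, newsk, req);
	if (dst == NULL)		/* e.g. blackhole/unreach route */
		goto put_and_exit;
	/* ... */
put_and_exit:
	bh_unlock_sock(newsk);		/* the call the patch adds */
	sock_put(newsk);
	goto exit;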

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 22:42                                                     ` Eric Dumazet
  2011-11-03  0:24                                                       ` Thomas Gleixner
@ 2011-11-03  0:52                                                       ` Simon Kirby
  2011-11-03 22:07                                                         ` David Miller
  2011-11-03  6:06                                                       ` Jörg-Volker Peetz
  2 siblings, 1 reply; 98+ messages in thread
From: Simon Kirby @ 2011-11-03  0:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development, Balazs Scheidler,
	KOVACS Krisztian

On Wed, Nov 02, 2011 at 11:42:56PM +0100, Eric Dumazet wrote:

> On 02/11/2011 20:16, Simon Kirby wrote:
> 
>  
> > Actually, we have an anti-abuse daemon that injects blackhole routes, so
> > this makes sense. (The daemon was written before ipsets were merged and
> > normal netfilter rules make it fall over under attack.)
> > 
> > I'll try with this patch. Thanks!
> > 
> 
> 
> Thanks !
> 
> Here is the official submission, please add your 'Tested-by' signature
> when you can confirm problem goes away.
> 
> (It did here, when I injected random NULL returns from
> inet_csk_route_child_sock(), so I am confident this is the problem you hit )
> 
> [PATCH] net: add missing bh_unlock_sock() calls
> 
> Simon Kirby reported lockdep warnings and the following messages:
> 
> [104661.897577] huh, entered softirq 3 NET_RX ffffffff81613740
> preempt_count 00000101, exited with 00000102?
> 
> [104661.923653] huh, entered softirq 3 NET_RX ffffffff81613740
> preempt_count 00000101, exited with 00000102?
> 
> Problem comes from commit 0e734419
> (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> 
> If inet_csk_route_child_sock() returns NULL, we should release socket
> lock before freeing it.
> 
> Another lock imbalance exists if __inet_inherit_port() returns an error,
> introduced by commit 093d282321da (tproxy: fix hash locking issue when using
> port redirection in __inet_inherit_port()); a backport is also needed for
> >= 2.6.37 kernels.
> 
> Reported-by: Dimon Kirby <sim@hostway.ca>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Balazs Scheidler <bazsi@balabit.hu>
> CC: KOVACS Krisztian <hidden@balabit.hu>
> ---
>  net/dccp/ipv4.c     |    1 +
>  net/ipv4/tcp_ipv4.c |    1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
> index 332639b..90a919a 100644
> --- a/net/dccp/ipv4.c
> +++ b/net/dccp/ipv4.c
> @@ -433,6 +433,7 @@ exit:
>  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
>  	return NULL;
>  put_and_exit:
> +	bh_unlock_sock(newsk);
>  	sock_put(newsk);
>  	goto exit;
>  }
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 0ea10ee..683d97a 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1510,6 +1510,7 @@ exit:
>  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
>  	return NULL;
>  put_and_exit:
> +	bh_unlock_sock(newsk);
>  	sock_put(newsk);
>  	goto exit;
>  }

Tested-by: Simon Kirby <sim@hostway.ca>

I tried many times, with route unreach/blackhole, and could not reproduce
the issue with this patch applied.

Thanks!

Simon-

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 22:42                                                     ` Eric Dumazet
  2011-11-03  0:24                                                       ` Thomas Gleixner
  2011-11-03  0:52                                                       ` Simon Kirby
@ 2011-11-03  6:06                                                       ` Jörg-Volker Peetz
  2011-11-03  6:26                                                         ` Eric Dumazet
  2 siblings, 1 reply; 98+ messages in thread
From: Jörg-Volker Peetz @ 2011-11-03  6:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev

Eric Dumazet wrote, on 11/02/11 23:42:
<snip>
> Reported-by: Dimon Kirby <sim@hostway.ca>
??             Simon                     ??
<snip>
-- 
Best regards,
Jörg-Volker.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-03  6:06                                                       ` Jörg-Volker Peetz
@ 2011-11-03  6:26                                                         ` Eric Dumazet
  2011-11-03  6:43                                                           ` David Miller
  0 siblings, 1 reply; 98+ messages in thread
From: Eric Dumazet @ 2011-11-03  6:26 UTC (permalink / raw)
  To: Jörg-Volker Peetz
  Cc: Simon Kirby, Thomas Gleixner, Linux Kernel Mailing List

On 03/11/2011 07:06, Jörg-Volker Peetz wrote:

> Eric Dumazet wrote, on 11/02/11 23:42:
> <snip>
>> Reported-by: Dimon Kirby <sim@hostway.ca>
> ??             Simon                     ??
> <snip>


Oops, sorry. David, could you please fix Simon's name?

Reported-by: Simon Kirby <sim@hostway.ca>

Thanks


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-03  6:26                                                         ` Eric Dumazet
@ 2011-11-03  6:43                                                           ` David Miller
  0 siblings, 0 replies; 98+ messages in thread
From: David Miller @ 2011-11-03  6:43 UTC (permalink / raw)
  To: eric.dumazet; +Cc: jvpeetz, sim, tglx, linux-kernel

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 03 Nov 2011 07:26:37 +0100

> On 03/11/2011 07:06, Jörg-Volker Peetz wrote:
> 
>> Eric Dumazet wrote, on 11/02/11 23:42:
>> <snip>
>>> Reported-by: Dimon Kirby <sim@hostway.ca>
>> ??             Simon                     ??
>> <snip>
> 
> 
> Oops, sorry. David, could you please fix Simon's name?

Will do.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-03  0:52                                                       ` Simon Kirby
@ 2011-11-03 22:07                                                         ` David Miller
  0 siblings, 0 replies; 98+ messages in thread
From: David Miller @ 2011-11-03 22:07 UTC (permalink / raw)
  To: sim
  Cc: eric.dumazet, tglx, a.p.zijlstra, torvalds, linux-kernel, davej,
	schwidefsky, mingo, netdev, bazsi, hidden

From: Simon Kirby <sim@hostway.ca>
Date: Wed, 2 Nov 2011 17:52:55 -0700

>> [PATCH] net: add missing bh_unlock_sock() calls
 ...
> Tested-by: Simon Kirby <sim@hostway.ca>
> 
> I tried many times, with route unreach/blackhole, and could not reproduce
> the issue with this patch applied.

Applied and queued up for -stable, thanks everyone!

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [tip:perf/core] lockdep: Show subclass in pretty print of lockdep output
  2011-10-25 20:20                                       ` Simon Kirby
  2011-10-31 17:32                                         ` Simon Kirby
@ 2011-11-18 23:11                                         ` tip-bot for Steven Rostedt
  1 sibling, 0 replies; 98+ messages in thread
From: tip-bot for Steven Rostedt @ 2011-11-18 23:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, rostedt, srostedt, tglx, sim

Commit-ID:  e5e78d08f3ab3094783b8df08a5b6d1d1a56a58f
Gitweb:     http://git.kernel.org/tip/e5e78d08f3ab3094783b8df08a5b6d1d1a56a58f
Author:     Steven Rostedt <srostedt@redhat.com>
AuthorDate: Wed, 2 Nov 2011 20:24:16 -0400
Committer:  Steven Rostedt <rostedt@goodmis.org>
CommitDate: Mon, 7 Nov 2011 11:01:46 -0500

lockdep: Show subclass in pretty print of lockdep output

The pretty print of the lockdep debug splat uses just the lock name
to show how the locking scenario happens. But when it comes to
nested locks, the output becomes confusing, which takes away the
point of the pretty printing of the lock scenario.

Without displaying the subclass info, we get the following output:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(slock-AF_INET);
                                lock(slock-AF_INET);
                                lock(slock-AF_INET);
   lock(slock-AF_INET);

  *** DEADLOCK ***

The above looks more like an A->A locking bug than an A->B, B->A one.
By adding the subclass to the output, we can see what really happened:

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(slock-AF_INET);
                                lock(slock-AF_INET/1);
                                lock(slock-AF_INET);
   lock(slock-AF_INET/1);

  *** DEADLOCK ***

This bug was discovered while tracking down a real bug caught by lockdep.

Link: http://lkml.kernel.org/r/20111025202049.GB25043@hostway.ca

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/lockdep.c |   30 +++++++++++++-----------------
 1 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 91d67ce..6bd915d 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -490,36 +490,32 @@ void get_usage_chars(struct lock_class *class, char usage[LOCK_USAGE_CHARS])
 	usage[i] = '\0';
 }
 
-static int __print_lock_name(struct lock_class *class)
+static void __print_lock_name(struct lock_class *class)
 {
 	char str[KSYM_NAME_LEN];
 	const char *name;
 
 	name = class->name;
-	if (!name)
-		name = __get_key_name(class->key, str);
-
-	return printk("%s", name);
-}
-
-static void print_lock_name(struct lock_class *class)
-{
-	char str[KSYM_NAME_LEN], usage[LOCK_USAGE_CHARS];
-	const char *name;
-
-	get_usage_chars(class, usage);
-
-	name = class->name;
 	if (!name) {
 		name = __get_key_name(class->key, str);
-		printk(" (%s", name);
+		printk("%s", name);
 	} else {
-		printk(" (%s", name);
+		printk("%s", name);
 		if (class->name_version > 1)
 			printk("#%d", class->name_version);
 		if (class->subclass)
 			printk("/%d", class->subclass);
 	}
+}
+
+static void print_lock_name(struct lock_class *class)
+{
+	char usage[LOCK_USAGE_CHARS];
+
+	get_usage_chars(class, usage);
+
+	printk(" (");
+	__print_lock_name(class);
 	printk("){%s}", usage);
 }
 

^ permalink raw reply related	[flat|nested] 98+ messages in thread
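
To make the effect of the subclass concrete, here is a minimal,
self-contained sketch of the kind of same-class nesting the new output
disambiguates. It assumes only the standard spinlock/lockdep API; the
structure and function names are invented for illustration:

#include <linux/spinlock.h>
#include <linux/lockdep.h>

struct node {
	spinlock_t	 lock;
	struct node	*child;
};

/*
 * If both locks were set up at the same spin_lock_init() call site,
 * lockdep puts them in one class.  Annotating the inner acquisition
 * with SINGLE_DEPTH_NESTING (subclass 1) is what lets the patched
 * pretty-printer show "lock" vs "lock/1" rather than two identical
 * names that read like an A->A self-deadlock.
 */
static void lock_parent_and_child(struct node *parent)
{
	spin_lock(&parent->lock);
	spin_lock_nested(&parent->child->lock, SINGLE_DEPTH_NESTING);

	/* ... operate on both nodes while both are locked ... */

	spin_unlock(&parent->child->lock);
	spin_unlock(&parent->lock);
}

With the change above, a lockdep splat involving this pair prints the
inner acquisition as ".../1", making the ordering read as A->B rather
than an apparent recursion on a single lock.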

end of thread, other threads:[~2011-11-18 23:11 UTC | newest]

Thread overview: 98+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-05  1:40 Linux 3.1-rc9 Linus Torvalds
2011-10-07  7:08 ` Simon Kirby
2011-10-07 17:48   ` Simon Kirby
2011-10-07 18:01     ` Peter Zijlstra
2011-10-08  0:33       ` Simon Kirby
2011-10-08  0:50       ` Simon Kirby
2011-10-08  7:55         ` Peter Zijlstra
2011-10-12 21:35           ` Simon Kirby
2011-10-13 23:25             ` Simon Kirby
2011-10-17  1:39               ` Linus Torvalds
2011-10-17  4:58                 ` Ingo Molnar
2011-10-17  9:03                   ` Thomas Gleixner
2011-10-17 10:40                     ` Peter Zijlstra
2011-10-17 11:40                       ` Alan Cox
2011-10-17 18:49                     ` Ingo Molnar
2011-10-17 20:35                       ` H. Peter Anvin
2011-10-17 21:19                         ` Ingo Molnar
2011-10-17 21:22                           ` H. Peter Anvin
2011-10-17 21:39                             ` Ingo Molnar
2011-10-17 22:03                               ` Ingo Molnar
2011-10-17 22:04                                 ` Ingo Molnar
2011-10-17 22:08                               ` H. Peter Anvin
2011-10-18  6:01                                 ` Ingo Molnar
2011-10-18  7:12                                 ` Geert Uytterhoeven
2011-10-18 18:50                                   ` H. Peter Anvin
2011-10-17 21:31                           ` Ingo Molnar
2011-10-17  7:55                 ` Martin Schwidefsky
2011-10-17  9:12                   ` Peter Zijlstra
2011-10-17  9:18                     ` Martin Schwidefsky
2011-10-17 20:48                   ` H. Peter Anvin
2011-10-18  7:20                     ` Martin Schwidefsky
2011-10-17 10:34                 ` Peter Zijlstra
2011-10-17 14:07                   ` Martin Schwidefsky
2011-10-17 14:57                   ` Linus Torvalds
2011-10-17 17:54                     ` Peter Zijlstra
2011-10-17 18:31                       ` Linus Torvalds
2011-10-17 19:23                         ` Peter Zijlstra
2011-10-17 21:00                           ` Thomas Gleixner
2011-10-18  8:39                             ` Thomas Gleixner
2011-10-18  9:05                               ` Peter Zijlstra
2011-10-18 14:59                                 ` Linus Torvalds
2011-10-18 15:26                                   ` Thomas Gleixner
2011-10-18 18:07                                   ` Ingo Molnar
2011-10-18 18:14                                   ` [GIT PULL] timer fix Ingo Molnar
2011-10-18 16:13                                 ` Linux 3.1-rc9 Dave Jones
2011-10-18 18:20                                 ` Simon Kirby
2011-10-18 19:48                                   ` Thomas Gleixner
2011-10-18 20:12                                     ` Linus Torvalds
2011-10-25 15:26                                       ` Simon Kirby
2011-10-26  1:47                                         ` Yong Zhang
2011-10-24 19:02                                     ` Simon Kirby
2011-10-25  7:13                                       ` Linus Torvalds
2011-10-25  9:01                                         ` David Miller
2011-10-25 12:30                                           ` Thomas Gleixner
2011-10-25 23:18                                             ` David Miller
2011-10-25 20:20                                       ` Simon Kirby
2011-10-31 17:32                                         ` Simon Kirby
2011-11-02 16:40                                           ` Thomas Gleixner
2011-11-02 17:27                                             ` Eric Dumazet
2011-11-02 17:46                                               ` Linus Torvalds
2011-11-02 17:53                                                 ` Eric Dumazet
2011-11-02 18:00                                                   ` Linus Torvalds
2011-11-02 18:05                                                     ` Eric Dumazet
2011-11-02 18:10                                                       ` Linus Torvalds
2011-11-02 17:49                                               ` Eric Dumazet
2011-11-02 17:58                                                 ` Eric Dumazet
2011-11-02 19:16                                                   ` Simon Kirby
2011-11-02 22:42                                                     ` Eric Dumazet
2011-11-03  0:24                                                       ` Thomas Gleixner
2011-11-03  0:52                                                       ` Simon Kirby
2011-11-03 22:07                                                         ` David Miller
2011-11-03  6:06                                                       ` Jörg-Volker Peetz
2011-11-03  6:26                                                         ` Eric Dumazet
2011-11-03  6:43                                                           ` David Miller
2011-11-02 17:54                                               ` Thomas Gleixner
2011-11-02 18:04                                                 ` Eric Dumazet
2011-11-02 18:28                                             ` Simon Kirby
2011-11-02 18:30                                               ` Thomas Gleixner
2011-11-02 22:10                                           ` Steven Rostedt
2011-11-02 23:00                                             ` Steven Rostedt
2011-11-03  0:09                                               ` Simon Kirby
2011-11-03  0:15                                                 ` Steven Rostedt
2011-11-03  0:17                                                   ` Simon Kirby
2011-11-18 23:11                                         ` [tip:perf/core] lockdep: Show subclass in pretty print of lockdep output tip-bot for Steven Rostedt
2011-10-20 14:36                 ` Linux 3.1-rc9 Martin Schwidefsky
2011-10-23 11:34                   ` Ingo Molnar
2011-10-24  7:48                     ` Martin Schwidefsky
2011-10-24  7:51                       ` Linus Torvalds
2011-10-24  8:08                         ` Martin Schwidefsky
2011-10-18  5:40             ` Simon Kirby
2011-10-09 20:51 ` Arkadiusz Miśkiewicz
2011-10-10  2:29   ` [tpmdd-devel] " Stefan Berger
2011-10-10 16:23     ` Rajiv Andrade
2011-10-10 17:05       ` Arkadiusz Miśkiewicz
2011-10-10 17:22         ` Stefan Berger
2011-10-10 17:57           ` Arkadiusz Miśkiewicz
2011-10-10 21:08             ` Arkadiusz Miśkiewicz
2011-10-11  7:09             ` [tpmdd-devel] " Peter.Huewe
