linux-kernel.vger.kernel.org archive mirror
* Linux 3.1-rc9
@ 2011-10-05  1:40 Linus Torvalds
  2011-10-07  7:08 ` Simon Kirby
  2011-10-09 20:51 ` Arkadiusz Miśkiewicz
  0 siblings, 2 replies; 156+ messages in thread
From: Linus Torvalds @ 2011-10-05  1:40 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Another week, another -rc.

On the kernel front, not a huge amount of changes. That said, by now,
there had better not be - and I definitely wouldn't have minded having
even fewer changes. But the fixes that are here are generally pretty
small, and the diffstat really doesn't look all that scary - there
really aren't *big* changes anywhere.

The things that do stand out a bit: some DRM fixes (radeon and i915),
various network drivers, some ceph fixes - and just lots of random
small stuff. The sparc updates are tiny (T4/T5 detection), but even so
are the bulk of the arch changes: things really have been that quiet.

The more noticeable change isn't actually to the code at all, it's
that kernel.org is starting to have parts of it come up again, so you
can now find the kernel sources back in the traditional location:

    git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

(although I am also updating github in case you cloned from there, so
you don't *have* to change).

Also, since I now have a new stronger gpg key, and that new key
actually ends up being signed by more people than the old one ever was
(at least people I *know*), I decided I might as well switch to that
one. So if you are the kind of person who verifies tags, you may want
to do

   gpg --recv-keys 00411886

to get my new key, so that "git verify-tag" will work for you.

(Key fingerprint = ABAF 11C6 5A29 70B1 30AB  E3C4 79BE 3E43 0041 1886)
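
The eight-digit ID given to gpg --recv-keys is simply the last eight hex
digits of the fingerprint quoted above. A minimal C sketch of that
relationship, assuming a 40-digit (v4) fingerprint; fpr_normalize() and
short_id_matches() are hypothetical helper names used for illustration and
are not part of gpg:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy only the hex digits of a printed fingerprint into out (upper-cased),
 * dropping the spaces used for grouping. Returns the number of digits
 * written, or 0 if out is too small. Hypothetical helper, not a gpg API.
 */
size_t fpr_normalize(const char *printed, char *out, size_t outsz)
{
	size_t n = 0;

	if (outsz == 0)
		return 0;
	for (; *printed; printed++) {
		if (!isxdigit((unsigned char)*printed))
			continue;
		if (n + 1 >= outsz)
			return 0;	/* no room for digit plus NUL */
		out[n++] = (char)toupper((unsigned char)*printed);
	}
	out[n] = '\0';
	return n;
}

/*
 * Return 1 if short_id (8 hex digits) equals the last 8 hex digits of a
 * 40-digit fingerprint, 0 otherwise.
 */
int short_id_matches(const char *printed_fpr, const char *short_id)
{
	char fpr[64];
	char id[16];
	size_t flen = fpr_normalize(printed_fpr, fpr, sizeof(fpr));
	size_t ilen = fpr_normalize(short_id, id, sizeof(id));

	if (flen != 40 || ilen != 8)
		return 0;
	return memcmp(fpr + flen - 8, id, 8) == 0;
}
```

A matching short ID is only a sanity check that the key fetched is the one
meant; short IDs can collide, so comparing the full fingerprint is the
stronger check.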

Obviously, in order to trust that that is actually really my key
(rather than just blindly believe this email that could easily have
been faked by some Linux wannabe), you'd need to check it. But since
it's signed with my old tag signing key, you shouldn't have any more
trust issues with the new one than you should have with the old key.
Or you can try to follow the chain of trust of the other key signers -
some of them have way more signatures than I ever had and are pretty
well connected.

Anything else worth mentioning? Hopefully the PCIe issues with MPS
tuning are all behind us, for the simple reason that we just disabled
them for now, and will revisit it for 3.2. And the occasional oopses
with USB disk removal should now be fixed once and for all (knock
wood). Anything else should be really esoteric and device-specific,
you can get a feel for it in the shortlog.

Go forth and test,

                     Linus

---

Alex Deucher (4):
      drm/radeon/kms: fix regression in DP aux defer handling
      drm/radeon/kms: add retry limits for native DP aux defer
      drm/radeon/kms: Fix logic error in DP HPD handler
      drm/radeon/kms: fix channel_remap setup (v2)

Andy Gospodarek (1):
      bonding: properly stop queuing work when requested

Antonio Quartulli (1):
      batman-adv: do_bcast has to be true for broadcast packets only

Archit Taneja (1):
      [media] OMAP_VOUT: Fix build break caused by update_mode removal in DSS2

Arnd Bergmann (2):
      ASoC: use a valid device for dev_err() in Zylonite
      ASoC: omap_mcpdm_remove cannot be __devexit

Axel Lin (1):
      ASoC: Fix setting update bits for WM8753_LADC and WM8753_RADC

Ben Greear (2):
      ipv6-multicast: Fix memory leak in input path.
      ipv6-multicast: Fix memory leak in IPv6 multicast.

Benjamin Herrenschmidt (1):
      powerpc: Fix device-tree matching for Apple U4 bridge

Borislav Petkov (1):
      ide-disk: Fix request requeuing

Brian King (1):
      ibmveth: Fix oops on request_irq failure

Carsten Otte (1):
      [S390] gmap: always up mmap_sem properly

Dave Young (1):
      [media] v4l: Make sure we hold a reference to the v4l2_device before using it

David S. Miller (3):
      sparc64: Future proof Niagara cpu detection.
      sparc: Make '-p' boot option meaningful again.
      sparc64: Force the execute bit in OpenFirmware's translation entries.

David Vrabel (1):
      net: xen-netback: correctly restart Tx after a VM restore/migrate

Divy Le Ray (1):
      cxgb4: Fix EEH on IBM P7IOC

Dmitry Kravkov (2):
      bnx2x: fix hw attention handling
      bnx2x: fix WOL by enablement PME in config space

Guenter Roeck (1):
      hwmon: (coretemp) Avoid leaving around dangling pointer

Hannes Reinecke (1):
      block: Free queue resources at blk_release_queue()

Hans Verkuil (1):
      [media] v4l: Fix use-after-free case in v4l2_device_release

Ian Campbell (1):
      MAINTAINERS: tehuti: Alexander Indenbaum's address bounces

James Bottomley (1):
      [SCSI] 3w-9xxx: fix iommu_iova leak

Jason Wang (1):
      net: fix a typo in Documentation/networking/scaling.txt

Jean Delvare (1):
      hwmon: (coretemp) Fixup platform device ID change

Jim Schutt (1):
      libceph: initialize ack_stamp to avoid unnecessary connection reset

Jiri Olsa (1):
      perf tools: Fix raw sample reading

Joerg Roedel (1):
      [media] omap3isp: Fix build error in ispccdc.c

Johannes Berg (1):
      iwlagn: fix dangling scan request

Jon Mason (1):
      PCI: Disable MPS configuration by default

Jonathan Lallinger (1):
      RDSRDMA: Fix cleanup of rds_iw_mr_pool

Josef Bacik (1):
      Btrfs: force a page fault if we have a shorty copy on a page boundary

Jouni Malinen (1):
      cfg80211: Fix validation of AKM suites

Keith Packard (2):
      drm/i915: Enable dither whenever display bpc < frame buffer bpc
      drm/i915: FBC off for ironlake and older, otherwise on by default

Larry Finger (1):
      rtlwifi: rtl8192cu: Fix unitialized struct

Lars-Peter Clausen (1):
      mfd: Fix generic irq chip ack function name for jz4740-adc

Laurent Pinchart (1):
      [media] uvcvideo: Fix crash when linking entities

Linus Torvalds (2):
      bootup: move 'usermodehelper_enable()' to the end of do_basic_setup()
      Linux 3.1-rc9

Madalin Bucur (2):
      net: check return value for dst_alloc
      ipv6: check return value for dst_alloc

Mark Salyzyn (1):
      [SCSI] libsas: fix failure to revalidate domain for anything but the first expander child.

Martin Schwidefsky (1):
      [S390] Do not clobber personality flags on exec

Mathias Krause (1):
      sparc, exec: remove redundant addr_limit assignment

Matt Fleming (1):
      x86/rtc: Don't recursively acquire rtc_lock

Michel Dänzer (3):
      drm/radeon: Simplify cursor x/yorigin calculation.
      drm/radeon: Update AVIVO cursor coordinate origin before x/yorigin calculation.
      drm/radeon: Set cursor x/y to 0 when x/yorigin > 0.

Ming Lei (1):
      [media] uvcvideo: Set alternate setting 0 on resume if the bus has been reset

Mohammed Shafi Shajakhan (1):
      ath9k: Fix a dma warning/memory leak

Neil Horman (1):
      [SCSI] cxgb3i: convert cdev->l2opt to use rcu to prevent NULL dereference

Nicholas Miell (1):
      drm/radeon/kms: fix cursor image off-by-one error

Noah Watkins (1):
      libceph: fix parse options memory leak

Oliver Hartkopp (2):
      can bcm: fix tx_setup off-by-one errors
      can bcm: fix incomplete tx_setup fix

Peter Oberparleiter (1):
      [S390] cio: fix cio_tpi ignoring adapter interrupts

Peter Zijlstra (1):
      posix-cpu-timers: Cure SMP wobbles

Rajkumar Manoharan (1):
      ath9k_hw: Fix Rx DMA stuck for AR9003 chips

Ram Pai (1):
      Resource: fix wrong resource window calculation

Randy Dunlap (1):
      [SCSI] scsi: qla4xxx needs libiscsi.o

Richard Cochran (2):
      ptp: fix L2 event message recognition
      dp83640: reduce driver noise

Rob Herring (2):
      irq: Add declaration of irq_domain_simple_ops to irqdomain.h
      irq: Fix check for already initialized irq_domain in irq_domain_add

Roy.Li (1):
      net: Documentation: Fix type of variables

Sage Weil (3):
      libceph: fix linger request requeuing
      libceph: fix pg_temp mapping calculation
      libceph: fix pg_temp mapping update

Shawn Bohrer (1):
      sched/rt: Migrate equal priority tasks to available CPUs

Shmulik Ravid (1):
      bnx2x: add missing break in bnx2x_dcbnl_get_cap

Simon Farnsworth (1):
      drm/i915: Enable SDVO hotplug interrupts for HDMI and DVI

Simon Kirby (1):
      sched: Fix up wchan borkage

Stanislaw Gruszka (2):
      iwlegacy: fix command queue timeout
      iwlegacy: do not use interruptible waits

Takashi Iwai (2):
      ALSA: hda - Fix a regression of the position-buffer check
      lis3: fix regression of HP DriveGuard with 8bit chip

Tomoya MORINAGA (5):
      spi-topcliff-pch: add tx-memory clear after complete transmitting
      spi-topcliff-pch: Fix SSN Control issue
      spi-topcliff-pch: Fix CPU read complete condition issue
      spi-topcliff-pch: Add recovery processing in case FIFO overrun error occurs
      spi-topcliff-pch: Fix overrun issue

Toshiharu Okada (2):
      pch_gbe: Fixed the issue on which PC was frozen when link was downed.
      pch_gbe: Fixed the issue on which a network freezes

Vasily Averin (1):
      [SCSI] aacraid: reset should disable MSI interrupt

Willem de Bruijn (1):
      make PACKET_STATISTICS getsockopt report consistently between ring and non-ring

Wu Fengguang (1):
      writeback: show raw dirtied_when in trace writeback_single_inode

Yan, Zheng (1):
      ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket

wangyanqing (1):
      bootup: move 'usermodehelper_enable()' a little earlier


* Re: Linux 3.1-rc9
  2011-10-05  1:40 Linux 3.1-rc9 Linus Torvalds
@ 2011-10-07  7:08 ` Simon Kirby
  2011-10-07 17:48   ` Simon Kirby
  2011-10-09 20:51 ` Arkadiusz Miśkiewicz
  1 sibling, 1 reply; 156+ messages in thread
From: Simon Kirby @ 2011-10-07  7:08 UTC (permalink / raw)
  To: Linus Torvalds, Peter Zijlstra; +Cc: Linux Kernel Mailing List

On Tue, Oct 04, 2011 at 06:40:14PM -0700, Linus Torvalds wrote:

> Peter Zijlstra (1):
>       posix-cpu-timers: Cure SMP wobbles

Hello!

I upgraded a few boxes from 3.1-rc6+fixes to 3.1-rc9 (actually 538d2882),
and now they're hard locking every 15 minutes. Below is a serial console
capture of the lockup. I suspect this is from d670ec13. I'll confirm that
they stop crashing with that commit reverted...

[ 1717.560007] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[ 1717.560007] Pid: 18034, comm: php Not tainted 3.1.0-rc9-hw+ #45
[ 1717.560007] Call Trace:
[ 1717.560007]  <NMI>  [<ffffffff816b3544>] panic+0xba/0x1fb
[ 1717.560007]  [<ffffffff81018d70>] ? native_sched_clock+0x20/0x80
[ 1717.560007]  [<ffffffff81018dd9>] ? sched_clock+0x9/0x10
[ 1717.560007]  [<ffffffff810a4751>] watchdog_overflow_callback+0xb1/0xc0
[ 1717.560007]  [<ffffffff810d0a12>] __perf_event_overflow+0xa2/0x1f0
[ 1717.560007]  [<ffffffff810c9f11>] ? perf_event_update_userpage+0x11/0xc0
[ 1717.560007]  [<ffffffff810d0f64>] perf_event_overflow+0x14/0x20
[ 1717.560007]  [<ffffffff81025a11>] intel_pmu_handle_irq+0x351/0x5f0
[ 1717.560007]  [<ffffffff816b7ff6>] perf_event_nmi_handler+0x36/0xb0
[ 1717.560007]  [<ffffffff816ba21f>] notifier_call_chain+0x3f/0x80
[ 1717.560007]  [<ffffffff816ba285>] atomic_notifier_call_chain+0x15/0x20
[ 1717.560007]  [<ffffffff816ba2be>] notify_die+0x2e/0x30
[ 1717.560007]  [<ffffffff816b76e2>] do_nmi+0xa2/0x250
[ 1717.560007]  [<ffffffff816b7080>] nmi+0x20/0x30
[ 1717.560007]  [<ffffffff8137e20d>] ? __write_lock_failed+0xd/0x20
[ 1717.560007]  <<EOE>>  [<ffffffff816b6819>] _raw_write_lock_irq+0x19/0x20
[ 1717.560007]  [<ffffffff810587c3>] copy_process+0xb23/0x1270
[ 1717.560007]  [<ffffffff81058fc2>] do_fork+0xb2/0x2f0
[ 1717.560007]  [<ffffffff8101a7e3>] sys_clone+0x23/0x30
[ 1717.560007]  [<ffffffff816be533>] stub_clone+0x13/0x20
[ 1717.560007]  [<ffffffff816be292>] ? system_call_fastpath+0x16/0x1b
[ 1717.560005] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
[ 1717.560005] Pid: 18038, comm: httpd Not tainted 3.1.0-rc9-hw+ #45
[ 1717.560005] Call Trace:
[ 1717.560005]  <NMI>  [<ffffffff816b3544>] panic+0xba/0x1fb
[ 1717.560005]  [<ffffffff81018d70>] ? native_sched_clock+0x20/0x80
[ 1717.560005]  [<ffffffff81018dd9>] ? sched_clock+0x9/0x10
[ 1717.560005]  [<ffffffff810a4751>] watchdog_overflow_callback+0xb1/0xc0
[ 1717.560005]  [<ffffffff810d0a12>] __perf_event_overflow+0xa2/0x1f0
[ 1717.560005]  [<ffffffff810c9f11>] ? perf_event_update_userpage+0x11/0xc0
[ 1717.560005]  [<ffffffff810d0f64>] perf_event_overflow+0x14/0x20
[ 1717.560005]  [<ffffffff81025a11>] intel_pmu_handle_irq+0x351/0x5f0
[ 1717.560005]  [<ffffffff816b7ff6>] perf_event_nmi_handler+0x36/0xb0
[ 1717.560005]  [<ffffffff816ba21f>] notifier_call_chain+0x3f/0x80
[ 1717.560005]  [<ffffffff816ba285>] atomic_notifier_call_chain+0x15/0x20
[ 1717.560005]  [<ffffffff816ba2be>] notify_die+0x2e/0x30
[ 1717.560005]  [<ffffffff816b76e2>] do_nmi+0xa2/0x250
[ 1717.560005]  [<ffffffff816b7080>] nmi+0x20/0x30
[ 1717.560005]  [<ffffffff816b6644>] ? _raw_spin_lock+0x14/0x20
[ 1717.560005]  <<EOE>>  [<ffffffff8104b4e5>] task_rq_lock+0x55/0xa0
[ 1717.560005]  [<ffffffff8104b8d4>] task_sched_runtime+0x24/0x90
[ 1717.560005]  [<ffffffff8107c924>] thread_group_cputime+0x74/0xb0
[ 1717.560005]  [<ffffffff8107d126>] thread_group_cputimer+0xa6/0xf0
[ 1717.560005]  [<ffffffff8107d198>] cpu_timer_sample_group+0x28/0x90
[ 1717.560005]  [<ffffffff8107d3c3>] set_process_cpu_timer+0x33/0x110
[ 1717.560005]  [<ffffffff8107d4da>] update_rlimit_cpu+0x3a/0x60
[ 1717.560005]  [<ffffffff8106fe9e>] do_prlimit+0xfe/0x1f0
[ 1717.560005]  [<ffffffff8106ffd6>] sys_setrlimit+0x46/0x60
[ 1717.560005]  [<ffffffff816be292>] system_call_fastpath+0x16/0x1b
[ 1717.564005] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
[ 1717.564005] Pid: 8, comm: migration/1 Not tainted 3.1.0-rc9-hw+ #45
[ 1717.564005] Call Trace:
[ 1717.564005]  <NMI>  [<ffffffff816b3544>] panic+0xba/0x1fb
[ 1717.564005]  [<ffffffff81018d70>] ? native_sched_clock+0x20/0x80
[ 1717.564005]  [<ffffffff81018dd9>] ? sched_clock+0x9/0x10
[ 1717.564005]  [<ffffffff810a4751>] watchdog_overflow_callback+0xb1/0xc0
[ 1717.564005]  [<ffffffff810d0a12>] __perf_event_overflow+0xa2/0x1f0
[ 1717.564005]  [<ffffffff810c9f11>] ? perf_event_update_userpage+0x11/0xc0
[ 1717.564005]  [<ffffffff810d0f64>] perf_event_overflow+0x14/0x20
[ 1717.564005]  [<ffffffff81025a11>] intel_pmu_handle_irq+0x351/0x5f0
[ 1717.564005]  [<ffffffff816b7ff6>] perf_event_nmi_handler+0x36/0xb0
[ 1717.564005]  [<ffffffff816ba21f>] notifier_call_chain+0x3f/0x80
[ 1717.564005]  [<ffffffff816ba285>] atomic_notifier_call_chain+0x15/0x20
[ 1717.564005]  [<ffffffff816ba2be>] notify_die+0x2e/0x30
[ 1717.564005]  [<ffffffff816b76e2>] do_nmi+0xa2/0x250
[ 1717.564005]  [<ffffffff816b7080>] nmi+0x20/0x30
[ 1717.564005]  [<ffffffff816b6640>] ? _raw_spin_lock+0x10/0x20
[ 1717.564005]  <<EOE>>  [<ffffffff81048cfd>] double_rq_lock+0x4d/0x60
[ 1717.564005]  [<ffffffff8104fee8>] __migrate_task+0x78/0x120
[ 1717.564005]  [<ffffffff8104ff90>] ? __migrate_task+0x120/0x120
[ 1717.564005]  [<ffffffff8104ffae>] migration_cpu_stop+0x1e/0x30
[ 1717.564005]  [<ffffffff810a370c>] cpu_stopper_thread+0xcc/0x190
[ 1717.564005]  [<ffffffff8105049d>] ? default_wake_function+0xd/0x10
[ 1717.564005]  [<ffffffff81043e0a>] ? __wake_up_common+0x5a/0x90
[ 1717.564005]  [<ffffffff810a3640>] ? cgroup_release_agent+0x1d0/0x1d0
[ 1717.564005]  [<ffffffff810a3640>] ? cgroup_release_agent+0x1d0/0x1d0
[ 1717.564005]  [<ffffffff8107adb6>] kthread+0x96/0xb0
[ 1717.564005]  [<ffffffff816c0374>] kernel_thread_helper+0x4/0x10
[ 1717.564005]  [<ffffffff8107ad20>] ? kthread_worker_fn+0x190/0x190
[ 1717.564005]  [<ffffffff816c0370>] ? gs_change+0x13/0x13
[ 1717.560007] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
[ 1717.560007] Pid: 15190, comm: httpd Not tainted 3.1.0-rc9-hw+ #45
[ 1717.560007] Call Trace:
[ 1717.560007]  <NMI>  [<ffffffff816b3544>] panic+0xba/0x1fb
[ 1717.560007]  [<ffffffff81018d70>] ? native_sched_clock+0x20/0x80
[ 1717.560007]  [<ffffffff81018dd9>] ? sched_clock+0x9/0x10
[ 1717.560007]  [<ffffffff810a4751>] watchdog_overflow_callback+0xb1/0xc0
[ 1717.560007]  [<ffffffff810d0a12>] __perf_event_overflow+0xa2/0x1f0
[ 1717.560007]  [<ffffffff810c9f11>] ? perf_event_update_userpage+0x11/0xc0
[ 1717.560007]  [<ffffffff810d0f64>] perf_event_overflow+0x14/0x20
[ 1717.560007]  [<ffffffff81025a11>] intel_pmu_handle_irq+0x351/0x5f0
[ 1717.560007]  [<ffffffff816b7ff6>] perf_event_nmi_handler+0x36/0xb0
[ 1717.560007]  [<ffffffff816ba21f>] notifier_call_chain+0x3f/0x80
[ 1717.560007]  [<ffffffff816ba285>] atomic_notifier_call_chain+0x15/0x20
[ 1717.560007]  [<ffffffff816ba2be>] notify_die+0x2e/0x30
[ 1717.560007]  [<ffffffff816b76e2>] do_nmi+0xa2/0x250
[ 1717.560007]  [<ffffffff816b7080>] nmi+0x20/0x30
[ 1717.560007]  [<ffffffff816b6644>] ? _raw_spin_lock+0x14/0x20
[ 1717.560007]  <<EOE>>  [<ffffffff81048064>] update_curr+0x174/0x1a0
[ 1717.560007]  [<ffffffff8104c75c>] enqueue_task_fair+0x5c/0x520
[ 1717.560007]  [<ffffffff81048ea1>] enqueue_task+0x61/0x70
[ 1717.560007]  [<ffffffff81048ed9>] activate_task+0x29/0x40
[ 1717.560007]  [<ffffffff81050589>] wake_up_new_task+0xb9/0x160
[ 1717.560007]  [<ffffffff81059056>] do_fork+0x146/0x2f0
[ 1717.560007]  [<ffffffff81114d80>] ? fd_install+0x30/0x60
[ 1717.560007]  [<ffffffff8101a7e3>] sys_clone+0x23/0x30
[ 1717.560007]  [<ffffffff816be533>] stub_clone+0x13/0x20
[ 1717.560007]  [<ffffffff816be292>] ? system_call_fastpath+0x16/0x1b

Config: http://0x.ca/sim/ref/3.1-rc9/config

Simon-
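
One possible reading of the traces above (an interpretation, not something
the report itself states): CPU 3 is spinning in task_rq_lock() underneath
thread_group_cputimer(), while CPU 2 is spinning in _raw_spin_lock()
underneath update_curr(). If the first path already holds the cputimer lock
while it waits for a runqueue lock, and the second holds a runqueue lock
while it waits, presumably, for the cputimer lock, neither can make
progress. A small userspace C sketch of such a lock-order conflict follows;
the toy_* helpers and lock names are hypothetical stand-ins built on C11
atomic_flag, not kernel code, and trylock is used so the demo reports the
conflict instead of hanging:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy test-and-set lock operations; a stand-in for a kernel spinlock. */
static bool toy_trylock(atomic_flag *l)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static atomic_flag timer_lock = ATOMIC_FLAG_INIT;	/* plays cputimer->lock */
static atomic_flag rq_lock = ATOMIC_FLAG_INIT;		/* plays rq->lock */

/*
 * Replay both lock orders from a single thread. Returns true when each
 * path holds its first lock and cannot get its second one.
 */
bool lock_order_conflict(void)
{
	bool a_has_first, b_has_first, a_gets_second, b_gets_second;

	/* Path A (timer sampling): timer_lock first, rq_lock second. */
	a_has_first = toy_trylock(&timer_lock);
	/* Path B (scheduler accounting): rq_lock first, timer_lock second. */
	b_has_first = toy_trylock(&rq_lock);
	assert(a_has_first && b_has_first);

	a_gets_second = toy_trylock(&rq_lock);		/* held by path B */
	b_gets_second = toy_trylock(&timer_lock);	/* held by path A */

	toy_unlock(&rq_lock);
	toy_unlock(&timer_lock);
	return !a_gets_second && !b_gets_second;
}
```

With real spinlocks in place of trylock, both paths would spin forever with
interrupts off, which is the kind of state a hard-lockup watchdog reports.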


* Re: Linux 3.1-rc9
  2011-10-07  7:08 ` Simon Kirby
@ 2011-10-07 17:48   ` Simon Kirby
  2011-10-07 18:01     ` Peter Zijlstra
  0 siblings, 1 reply; 156+ messages in thread
From: Simon Kirby @ 2011-10-07 17:48 UTC (permalink / raw)
  To: Linus Torvalds, Peter Zijlstra; +Cc: Linux Kernel Mailing List

On Fri, Oct 07, 2011 at 12:08:42AM -0700, Simon Kirby wrote:

> On Tue, Oct 04, 2011 at 06:40:14PM -0700, Linus Torvalds wrote:
> 
> > Peter Zijlstra (1):
> >       posix-cpu-timers: Cure SMP wobbles
> 
> Hello!
> 
> I upgraded a few boxes from 3.1-rc6+fixes to 3.1-rc9 (actually 538d2882),
> and now they're hard locking every 15 minutes. Below is a serial console
> capture of the lockup. I suspect this is from d670ec13. I'll confirm that
> they stop crashing with that commit reverted...

Yes, they stopped locking up with d670ec13 reverted.

Simon-



* Re: Linux 3.1-rc9
  2011-10-07 17:48   ` Simon Kirby
@ 2011-10-07 18:01     ` Peter Zijlstra
  2011-10-08  0:33       ` Simon Kirby
  2011-10-08  0:50       ` Simon Kirby
  0 siblings, 2 replies; 156+ messages in thread
From: Peter Zijlstra @ 2011-10-07 18:01 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Linus Torvalds, Linux Kernel Mailing List, Dave Jones, Thomas Gleixner

On Fri, 2011-10-07 at 10:48 -0700, Simon Kirby wrote:

> Yes, they stopped locking up with d670ec13 reverted.

> > [ 1717.560005]  [<ffffffff8104b8d4>] task_sched_runtime+0x24/0x90
> > [ 1717.560005]  [<ffffffff8107c924>] thread_group_cputime+0x74/0xb0
> > [ 1717.560005]  [<ffffffff8107d126>] thread_group_cputimer+0xa6/0xf0
> > [ 1717.560005]  [<ffffffff8107d198>] cpu_timer_sample_group+0x28/0x90
> > [ 1717.560005]  [<ffffffff8107d3c3>] set_process_cpu_timer+0x33/0x110
> > [ 1717.560005]  [<ffffffff8107d4da>] update_rlimit_cpu+0x3a/0x60
> > [ 1717.560005]  [<ffffffff8106fe9e>] do_prlimit+0xfe/0x1f0
> > [ 1717.560005]  [<ffffffff8106ffd6>] sys_setrlimit+0x46/0x60
> > [ 1717.560005]  [<ffffffff816be292>] system_call_fastpath+0x16/0x1b


OK so that cputimer stuff is horrid and the worst part is that I cannot
seem to trigger this. You guys must have some weird userspace stuff that
I simply don't have.

I tried running some LTP tests, and it was suggested I find some glibc
tests as well, but I haven't got that far yet.

Now the problem isn't new, but the referenced patch does make it _MUCH_
more likely.

Both Thomas and I have tried to come up with solutions, but the only
thing that stands a chance of working, other than using atomic64_t, is
giving task_cputime::cputime.sum_exec_runtime its own lock.

Clearly this is all very ugly and I'm really hesitant to even post
this, but here goes...

---
 include/linux/sched.h     |    3 +++
 kernel/posix-cpu-timers.c |    6 +++++-
 kernel/sched_stats.h      |    4 ++--
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f3c5273..fbbe5eb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -504,6 +504,7 @@ struct task_cputime {
  * @running:		non-zero when there are timers running and
  * 			@cputime receives updates.
  * @lock:		lock for fields in this struct.
+ * @runtime_lock:	lock for cputime.sum_exec_runtime
  *
  * This structure contains the version of task_cputime, above, that is
  * used for thread group CPU timer calculations.
@@ -512,6 +513,7 @@ struct thread_group_cputimer {
 	struct task_cputime cputime;
 	int running;
 	raw_spinlock_t lock;
+	raw_spinlock_t runtime_lock;
 };
 
 #include <linux/rwsem.h>
@@ -2571,6 +2573,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
 static inline void thread_group_cputime_init(struct signal_struct *sig)
 {
 	raw_spin_lock_init(&sig->cputimer.lock);
+	raw_spin_lock_init(&sig->cputimer.runtime_lock);
 }
 
 /*
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index d20586b..bf760b4 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -284,9 +284,13 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 		 * it.
 		 */
 		thread_group_cputime(tsk, &sum);
+		raw_spin_lock(&cputimer->runtime_lock);
 		update_gt_cputime(&cputimer->cputime, &sum);
-	}
+	} else
+		raw_spin_lock(&cputimer->runtime_lock);
+
 	*times = cputimer->cputime;
+	raw_spin_unlock(&cputimer->runtime_lock);
 	raw_spin_unlock_irqrestore(&cputimer->lock, flags);
 }
 
diff --git a/kernel/sched_stats.h b/kernel/sched_stats.h
index 87f9e36..f9751c1 100644
--- a/kernel/sched_stats.h
+++ b/kernel/sched_stats.h
@@ -330,7 +330,7 @@ static inline void account_group_exec_runtime(struct task_struct *tsk,
 	if (!cputimer->running)
 		return;
 
-	raw_spin_lock(&cputimer->lock);
+	raw_spin_lock(&cputimer->runtime_lock);
 	cputimer->cputime.sum_exec_runtime += ns;
-	raw_spin_unlock(&cputimer->lock);
+	raw_spin_unlock(&cputimer->runtime_lock);
 }
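
The diff above gives sum_exec_runtime a lock of its own, so the accounting
path no longer has to take cputimer->lock. A userspace C sketch of that
shape, assuming C11 atomic_flag in place of raw_spinlock_t; every toy_* name
is hypothetical, and trylock is used only so the demo cannot hang:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static bool toy_trylock(atomic_flag *l)
{
	return !atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Userspace model of the patched thread_group_cputimer layout. */
struct toy_cputimer {
	uint64_t sum_exec_runtime;
	atomic_flag lock;		/* guards the struct as a whole */
	atomic_flag runtime_lock;	/* guards only sum_exec_runtime */
};

/* Mirrors the init hunk: both locks start out released. */
void toy_cputimer_init(struct toy_cputimer *t)
{
	t->sum_exec_runtime = 0;
	atomic_flag_clear(&t->lock);
	atomic_flag_clear(&t->runtime_lock);
}

/* Models the patched accounting path: it touches runtime_lock only. */
bool toy_account_runtime(struct toy_cputimer *t, uint64_t ns)
{
	if (!toy_trylock(&t->runtime_lock))
		return false;	/* a real spinlock would spin here */
	t->sum_exec_runtime += ns;
	toy_unlock(&t->runtime_lock);
	return true;
}

/*
 * Hold the outer lock, as the sampling path does while it sums the
 * threads, and check that accounting still goes through.
 */
bool accounting_proceeds_while_sampling(void)
{
	struct toy_cputimer t;
	bool sampler_holds_lock, accounted;

	toy_cputimer_init(&t);
	sampler_holds_lock = toy_trylock(&t.lock);
	assert(sampler_holds_lock);

	accounted = toy_account_runtime(&t, 1000);

	toy_unlock(&t.lock);
	return accounted && t.sum_exec_runtime == 1000;
}
```

In this model the sampler takes runtime_lock only for the final copy, after
the per-thread summing is done, which matches where the diff places
raw_spin_lock(&cputimer->runtime_lock).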

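The message also names atomic64_t as the other option that could work. A
rough userspace analogue using a C11 64-bit atomic is sketched below; the
toy_* names are hypothetical, and the kernel's atomic64_t API is not used
here:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Model of a cputimer whose runtime sum is a lone atomic counter. */
struct toy_cputimer_atomic {
	_Atomic uint64_t sum_exec_runtime;
};

/* Accounting path: a single atomic add, no lock taken. */
void toy_account_runtime_atomic(struct toy_cputimer_atomic *t, uint64_t ns)
{
	atomic_fetch_add_explicit(&t->sum_exec_runtime, ns,
				  memory_order_relaxed);
}

/* Sampling path: a single atomic load. */
uint64_t toy_read_runtime_atomic(struct toy_cputimer_atomic *t)
{
	return atomic_load_explicit(&t->sum_exec_runtime,
				    memory_order_relaxed);
}

/* Two accounting updates followed by one sample. */
uint64_t atomic_accounting_demo(void)
{
	struct toy_cputimer_atomic t;

	atomic_init(&t.sum_exec_runtime, 0);
	toy_account_runtime_atomic(&t, 1000);
	toy_account_runtime_atomic(&t, 234);
	return toy_read_runtime_atomic(&t);
}
```

Whether a 64-bit atomic is lock-free depends on the target, and this sketch
covers only the one counter; the other fields of the structure would still
need their own protection.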


* Re: Linux 3.1-rc9
  2011-10-07 18:01     ` Peter Zijlstra
@ 2011-10-08  0:33       ` Simon Kirby
  2011-10-08  0:50       ` Simon Kirby
  1 sibling, 0 replies; 156+ messages in thread
From: Simon Kirby @ 2011-10-08  0:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Linux Kernel Mailing List, Dave Jones, Thomas Gleixner

On Fri, Oct 07, 2011 at 08:01:55PM +0200, Peter Zijlstra wrote:

> On Fri, 2011-10-07 at 10:48 -0700, Simon Kirby wrote:
> 
> > Yes, they stopped locking up with d670ec13 reverted.
> 
> > > [ 1717.560005]  [<ffffffff8104b8d4>] task_sched_runtime+0x24/0x90
> > > [ 1717.560005]  [<ffffffff8107c924>] thread_group_cputime+0x74/0xb0
> > > [ 1717.560005]  [<ffffffff8107d126>] thread_group_cputimer+0xa6/0xf0
> > > [ 1717.560005]  [<ffffffff8107d198>] cpu_timer_sample_group+0x28/0x90
> > > [ 1717.560005]  [<ffffffff8107d3c3>] set_process_cpu_timer+0x33/0x110
> > > [ 1717.560005]  [<ffffffff8107d4da>] update_rlimit_cpu+0x3a/0x60
> > > [ 1717.560005]  [<ffffffff8106fe9e>] do_prlimit+0xfe/0x1f0
> > > [ 1717.560005]  [<ffffffff8106ffd6>] sys_setrlimit+0x46/0x60
> > > [ 1717.560005]  [<ffffffff816be292>] system_call_fastpath+0x16/0x1b
> 
> OK so that cputimer stuff is horrid and the worst part is that I cannot
> seem to trigger this. You guys must have some weird userspace stuff that
> I simply don't have.

I haven't tried your patch yet, but it might help to mention that on
this particular cluster, we are using CONFIG_TASK_IO_ACCOUNTING under
CONFIG_TASKSTATS, and we have process accounting enabled (w/"accton").
Perhaps that enables some other path that makes it difficult to hit
otherwise.

You can't have clouds without weather reporting, of course. :)

Other than that, it's just a typical shared web environment.

Simon-


* Re: Linux 3.1-rc9
  2011-10-07 18:01     ` Peter Zijlstra
  2011-10-08  0:33       ` Simon Kirby
@ 2011-10-08  0:50       ` Simon Kirby
  2011-10-08  7:55         ` Peter Zijlstra
  1 sibling, 1 reply; 156+ messages in thread
From: Simon Kirby @ 2011-10-08  0:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Linux Kernel Mailing List, Dave Jones, Thomas Gleixner

On Fri, Oct 07, 2011 at 08:01:55PM +0200, Peter Zijlstra wrote:

> @@ -2571,6 +2573,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
>  static inline void thread_group_cputime_init(struct signal_struct *sig)
>  {
>  	raw_spin_lock_init(&sig->cputimer.lock);
> +	raw_spin_lock_init(&sig->cputimer.runtime_lock);

My 3.1-rc9 tree has just spin_lock_init() here, not raw_*.

Which tree is your patch against? -next or something?

It applies with some cooking like this, but will it be right?

> sed s/raw_// ../sched-patch-noraw.diff | patch -p1 --dry
patching file include/linux/sched.h
Hunk #1 succeeded at 503 (offset -1 lines).
Hunk #2 succeeded at 512 (offset -1 lines).
Hunk #3 succeeded at 2568 (offset -5 lines).
patching file kernel/posix-cpu-timers.c
patching file kernel/sched_stats.h

Simon-

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-08  0:50       ` Simon Kirby
@ 2011-10-08  7:55         ` Peter Zijlstra
  2011-10-12 21:35           ` Simon Kirby
  0 siblings, 1 reply; 156+ messages in thread
From: Peter Zijlstra @ 2011-10-08  7:55 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Linus Torvalds, Linux Kernel Mailing List, Dave Jones, Thomas Gleixner

On Fri, 2011-10-07 at 17:50 -0700, Simon Kirby wrote:
> On Fri, Oct 07, 2011 at 08:01:55PM +0200, Peter Zijlstra wrote:
> 
> > @@ -2571,6 +2573,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
> >  static inline void thread_group_cputime_init(struct signal_struct *sig)
> >  {
> >       raw_spin_lock_init(&sig->cputimer.lock);
> > +     raw_spin_lock_init(&sig->cputimer.runtime_lock);
> 
> My 3.1-rc9 tree has just spin_lock_init() here, not raw_*.
> 
> Which tree is your patch against? -next or something?

or something yeah.. tip/master I think.

> It applies with some cooking like this, but will it be right?
> 
> > sed s/raw_// ../sched-patch-noraw.diff | patch -p1 --dry
> patching file include/linux/sched.h
> Hunk #1 succeeded at 503 (offset -1 lines).
> Hunk #2 succeeded at 512 (offset -1 lines).
> Hunk #3 succeeded at 2568 (offset -5 lines).
> patching file kernel/posix-cpu-timers.c
> patching file kernel/sched_stats.h 

yes that would be fine.

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-05  1:40 Linux 3.1-rc9 Linus Torvalds
  2011-10-07  7:08 ` Simon Kirby
@ 2011-10-09 20:51 ` Arkadiusz Miśkiewicz
  2011-10-10  2:29   ` [tpmdd-devel] " Stefan Berger
  1 sibling, 1 reply; 156+ messages in thread
From: Arkadiusz Miśkiewicz @ 2011-10-09 20:51 UTC (permalink / raw)
  To: linux-kernel, tpmdd-devel, Debora Velarde, Rajiv Andrade,
	Marcel Selhorst

On Wednesday 05 of October 2011, Linus Torvalds wrote:
> Another week, another -rc.

suspend to ram regression is annoying (still visible on rc9; 
https://lkml.org/lkml/2011/9/24/76) but unfortunately maintainers are silent.

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [tpmdd-devel] Linux 3.1-rc9
  2011-10-09 20:51 ` Arkadiusz Miśkiewicz
@ 2011-10-10  2:29   ` Stefan Berger
  2011-10-10 16:23     ` Rajiv Andrade
  0 siblings, 1 reply; 156+ messages in thread
From: Stefan Berger @ 2011-10-10  2:29 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz
  Cc: linux-kernel, tpmdd-devel, Debora Velarde, Rajiv Andrade,
	Marcel Selhorst

On 10/09/2011 04:51 PM, Arkadiusz Miśkiewicz wrote:
> On Wednesday 05 of October 2011, Linus Torvalds wrote:
>> Another week, another -rc.
> suspend to ram regression is annoying (still visible on rc9;
> https://lkml.org/lkml/2011/9/24/76) but unfortunately maintainers are silent.
>
I tried -rc9 on my Lenovo W500 with that same TPM. I cannot reproduce 
the 'scheduling while atomic' problem you had reported earlier. I also 
could suspend / resume fine as long as I did the following:

- suspended with the tpm_tis driver as module in the kernel
- once a suspend was done without the tpm_tis driver the subsequent 
suspends were all done without the tpm_tis driver

Once I had done a suspend/resume with the tpm_tis driver *not* in the 
kernel and then again a suspend with the tpm_tis driver in the kernel, 
it did not resume anymore. I believe previously (previous version of 
kernel and/or Fedora) it refused to even suspend. The reason why this 
doesn't work properly is that the driver has to send a command to the 
TPM upon suspend and the BIOS then sends the corresponding wakeup command.

Did you maybe previously suspend/resume without a tpm_tis driver and 
then try to suspend with it ?

Also, my Lenovo W500 shows particularly odd behavior when I switch from 
Windows to Linux. The first suspend with a Linux booted after Windows 
(with or without tpm_tis driver) does *not* resume (reboot required). A 
subsequently rebooted Linux makes the suspend/resume work fine.

    Stefan





^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-10  2:29   ` [tpmdd-devel] " Stefan Berger
@ 2011-10-10 16:23     ` Rajiv Andrade
  2011-10-10 17:05       ` Arkadiusz Miśkiewicz
  0 siblings, 1 reply; 156+ messages in thread
From: Rajiv Andrade @ 2011-10-10 16:23 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Arkadiusz Miśkiewicz, linux-kernel, tpmdd-devel,
	Debora Velarde, Marcel Selhorst

On 09/10/11 23:29, Stefan Berger wrote:
> On 10/09/2011 04:51 PM, Arkadiusz Miśkiewicz wrote:
>> On Wednesday 05 of October 2011, Linus Torvalds wrote:
>>> Another week, another -rc.
>> suspend to ram regression is annoying (still visible on rc9;
>> https://lkml.org/lkml/2011/9/24/76) but unfortunately maintainers are 
>> silent.
>>
> I tried -rc9 on my Lenovo W500 with that same TPM. I cannot reproduce 
> the 'scheduling while atomic' problem you had reported earlier. I also 
> could suspend / resume fine as long as I did the following:
>
> - suspended with the tpm_tis driver as module in the kernel
> - once a suspend was done without the tpm_tis driver the subsequent 
> suspends were all done without the tpm_tis driver
>
> Once I had done a suspend/resume with the tpm_tis driver *not* in the 
> kernel and then again a suspend with the tpm_tis driver in the kernel, 
> it did not resume anymore. I believe previously (previous version of 
> kernel and/or Fedora) it refused to even suspend. The reason why this 
> doesn't work properly is that the driver has to send a command to the 
> TPM upon suspend and the BIOS then sends the corresponding wakeup 
> command.
>
> Did you maybe previously suspend/resume without a tpm_tis driver and 
> then try to suspend with it ?
>
> Also, my Lenovo W500 shows particularly odd behavior when I switch 
> from Windows to Linux. The first suspend with a Linux booted after 
> Windows (with or without tpm_tis driver) does *not* resume (reboot 
> required). A subsequently rebooted Linux makes the suspend/resume work 
> fine.
>
>    Stefan
>
Arkadiusz,

Do you still see the issue with this patch [1][2] applied?

[1] - http://marc.info/?l=linux-kernel&m=131824905826280&w=2
[2] - github.com/srajiv/tpm.git for-james

Thanks,
Rajiv


^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-10 16:23     ` Rajiv Andrade
@ 2011-10-10 17:05       ` Arkadiusz Miśkiewicz
  2011-10-10 17:22         ` Stefan Berger
  0 siblings, 1 reply; 156+ messages in thread
From: Arkadiusz Miśkiewicz @ 2011-10-10 17:05 UTC (permalink / raw)
  To: Rajiv Andrade
  Cc: Stefan Berger, linux-kernel, tpmdd-devel, Debora Velarde,
	Marcel Selhorst

On Monday 10 of October 2011, Rajiv Andrade wrote:
> On 09/10/11 23:29, Stefan Berger wrote:
> > On 10/09/2011 04:51 PM, Arkadiusz Miśkiewicz wrote:
> >> On Wednesday 05 of October 2011, Linus Torvalds wrote:
> >>> Another week, another -rc.
> >> 
> >> suspend to ram regression is annoying (still visible on rc9;
> >> https://lkml.org/lkml/2011/9/24/76) but unfortunately maintainers are
> >> silent.
> > 
> > I tried -rc9 on my Lenovo W500 with that same TPM. I cannot reproduce
> > the 'scheduling while atomic' problem you had reported earlier. I also
> > could suspend / resume fine as long as I did the following:
> > 
> > - suspended with the tpm_tis driver as module in the kernel
> > - once a suspend was done without the tpm_tis driver the subsequent
> > suspends were all done without the tpm_tis driver
> > 
> > Once I had done a suspend/resume with the tpm_tis driver *not* in the
> > kernel and then again a suspend with the tpm_tis driver in the kernel,
> > it did not resume anymore. I believe previously (previous version of
> > kernel and/or Fedora) it refused to even suspend. The reason why this
> > doesn't work properly is that the driver has to send a command to the
> > TPM upon suspend and the BIOS then sends the corresponding wakeup
> > command.
> > 
> > Did you maybe previously suspend/resume without a tpm_tis driver and
> > then try to suspend with it ?
> > 
> > Also, my Lenovo W500 shows particularly odd behavior when I switch
> > from Windows to Linux. The first suspend with a Linux booted after
> > Windows (with or without tpm_tis driver) does *not* resume (reboot
> > required). A subsequently rebooted Linux makes the suspend/resume work
> > fine.
> > 
> >    Stefan
> 
> Arkadiusz,
> 
> Do you still see the issue with this patch [1][2] applied?

The issue doesn't happen with this patch, but the error condition "Could not
read PCR 0. TPM is not working correctly." is triggered immediately at boot,
even before suspend is used.

$ dmesg|grep -iE "(tpm|suspend)"
[   12.640039] tpm_tis 00:0a: 1.2 TPM (device-id 0x1020, rev-id 6)
[   12.640048] tpm_tis 00:0a: Intel iTPM workaround enabled
[   12.768057] tpm_tis 00:0a: Could not read PCR 0. TPM is not working correctly.
[   12.768066] tpm_tis 00:0a: Was machine previously suspended without TPM driver present?
[   88.512117] Suspending console(s) (use no_console_suspend to debug)


> 
> [1] - http://marc.info/?l=linux-kernel&m=131824905826280&w=2
> [2] - github.com/srajiv/tpm.git for-james
> 
> Thanks,
> Rajiv


-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-10 17:05       ` Arkadiusz Miśkiewicz
@ 2011-10-10 17:22         ` Stefan Berger
  2011-10-10 17:57           ` Arkadiusz Miśkiewicz
  0 siblings, 1 reply; 156+ messages in thread
From: Stefan Berger @ 2011-10-10 17:22 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz
  Cc: Rajiv Andrade, linux-kernel, tpmdd-devel, Debora Velarde,
	Marcel Selhorst

On 10/10/2011 01:05 PM, Arkadiusz Miśkiewicz wrote:
> On Monday 10 of October 2011, Rajiv Andrade wrote:
>> On 09/10/11 23:29, Stefan Berger wrote:
>>> On 10/09/2011 04:51 PM, Arkadiusz Miśkiewicz wrote:
>>>> On Wednesday 05 of October 2011, Linus Torvalds wrote:
>>>>> Another week, another -rc.
>>>> suspend to ram regression is annoying (still visible on rc9;
>>>> https://lkml.org/lkml/2011/9/24/76) but unfortunately maintainers are
>>>> silent.
>>> I tried -rc9 on my Lenovo W500 with that same TPM. I cannot reproduce
>>> the 'scheduling while atomic' problem you had reported earlier. I also
>>> could suspend / resume fine as long as I did the following:
>>>
>>> - suspended with the tpm_tis driver as module in the kernel
>>> - once a suspend was done without the tpm_tis driver the subsequent
>>> suspends were all done without the tpm_tis driver
>>>
>>> Once I had done a suspend/resume with the tpm_tis driver *not* in the
>>> kernel and then again a suspend with the tpm_tis driver in the kernel,
>>> it did not resume anymore. I believe previously (previous version of
>>> kernel and/or Fedora) it refused to even suspend. The reason why this
>>> doesn't work properly is that the driver has to send a command to the
>>> TPM upon suspend and the BIOS then sends the corresponding wakeup
>>> command.
>>>
>>> Did you maybe previously suspend/resume without a tpm_tis driver and
>>> then try to suspend with it ?
>>>
>>> Also, my Lenovo W500 shows particularly odd behavior when I switch
>>> from Windows to Linux. The first suspend with a Linux booted after
>>> Windows (with or without tpm_tis driver) does *not* resume (reboot
>>> required). A subsequently rebooted Linux makes the suspend/resume work
>>> fine.
>>>
>>>     Stefan
>> Arkadiusz,
>>
>> Do you still see the issue with this patch [1][2] applied?
> The issue doesn't happen with this patch but error condition with "Could not
> read PCR 0. TPM is not working correctly." is triggered immediately at boot,
> even before suspend is used.
>
> $ dmesg|grep -iE "(tpm|suspend)"
> [   12.640039] tpm_tis 00:0a: 1.2 TPM (device-id 0x1020, rev-id 6)
> [   12.640048] tpm_tis 00:0a: Intel iTPM workaround enabled
> [   12.768057] tpm_tis 00:0a: Could not read PCR 0. TPM is not working
> correctly.
> [   12.768066] tpm_tis 00:0a: Was machine previously suspended without TPM
> driver present?
> [   88.512117] Suspending console(s) (use no_console_suspend to debug)
>
Though I suppose that now your suspend/resume cycles always work?
I guess the BIOS seems not to be initializing the TPM correctly. Any 
chance you can get a hold of a BIOS update for your machine?

    Stefan


^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-10 17:22         ` Stefan Berger
@ 2011-10-10 17:57           ` Arkadiusz Miśkiewicz
  2011-10-10 21:08             ` Arkadiusz Miśkiewicz
  2011-10-11  7:09             ` [tpmdd-devel] " Peter.Huewe
  0 siblings, 2 replies; 156+ messages in thread
From: Arkadiusz Miśkiewicz @ 2011-10-10 17:57 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Rajiv Andrade, linux-kernel, tpmdd-devel, Debora Velarde,
	Marcel Selhorst

On Monday 10 of October 2011, Stefan Berger wrote:
> On 10/10/2011 01:05 PM, Arkadiusz Miśkiewicz wrote:
> > On Monday 10 of October 2011, Rajiv Andrade wrote:
> >> On 09/10/11 23:29, Stefan Berger wrote:
> >>> On 10/09/2011 04:51 PM, Arkadiusz Miśkiewicz wrote:
> >>>> On Wednesday 05 of October 2011, Linus Torvalds wrote:
> >>>>> Another week, another -rc.
> >>>> 
> >>>> suspend to ram regression is annoying (still visible on rc9;
> >>>> https://lkml.org/lkml/2011/9/24/76) but unfortunately maintainers are
> >>>> silent.
> >>> 
> >>> I tried -rc9 on my Lenovo W500 with that same TPM. I cannot reproduce
> >>> the 'scheduling while atomic' problem you had reported earlier. I also
> >>> could suspend / resume fine as long as I did the following:
> >>> 
> >>> - suspended with the tpm_tis driver as module in the kernel
> >>> - once a suspend was done without the tpm_tis driver the subsequent
> >>> suspends were all done without the tpm_tis driver
> >>> 
> >>> Once I had done a suspend/resume with the tpm_tis driver *not* in the
> >>> kernel and then again a suspend with the tpm_tis driver in the kernel,
> >>> it did not resume anymore. I believe previously (previous version of
> >>> kernel and/or Fedora) it refused to even suspend. The reason why this
> >>> doesn't work properly is that the driver has to send a command to the
> >>> TPM upon suspend and the BIOS then sends the corresponding wakeup
> >>> command.
> >>> 
> >>> Did you maybe previously suspend/resume without a tpm_tis driver and
> >>> then try to suspend with it ?
> >>> 
> >>> Also, my Lenovo W500 shows particularly odd behavior when I switch
> >>> from Windows to Linux. The first suspend with a Linux booted after
> >>> Windows (with or without tpm_tis driver) does *not* resume (reboot
> >>> required). A subsequently rebooted Linux makes the suspend/resume work
> >>> fine.
> >>> 
> >>>     Stefan
> >> 
> >> Arkadiusz,
> >> 
> >> Do you still see the issue with this patch [1][2] applied?
> > 
> > The issue doesn't happen with this patch but error condition with "Could
> > not read PCR 0. TPM is not working correctly." is triggered immediately
> > at boot, even before suspend is used.
> > 
> > $ dmesg|grep -iE "(tpm|suspend)"
> > [   12.640039] tpm_tis 00:0a: 1.2 TPM (device-id 0x1020, rev-id 6)
> > [   12.640048] tpm_tis 00:0a: Intel iTPM workaround enabled
> > [   12.768057] tpm_tis 00:0a: Could not read PCR 0. TPM is not working
> > correctly.
> > [   12.768066] tpm_tis 00:0a: Was machine previously suspended without
> > TPM driver present?
> > [   88.512117] Suspending console(s) (use no_console_suspend to debug)
> 
> Though I suppose that now your suspend/resume cycles always work?

Tried several times and it always worked, so probably yes. Longer testing will 
give definitive answer.

> I guess the BIOS seems not to be initializing the TPM correctly. Any
> chance you can get a hold of a BIOS update for your machine?

Then I looked into bios options on this thinkpad t400 and there are 3 possible 
TPM settings: Enabled, Invisible, Disabled.

Invisible is - visible but not working - according to bios help. No idea why 
such option exists but I had it enabled.

Right now I've set that to "Enabled" and ran a few suspend/resume cycles - no
problems so far.

I guess there is some way to make "Invisible" mode properly handled in Linux, 
too.

>     Stefan

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-10 17:57           ` Arkadiusz Miśkiewicz
@ 2011-10-10 21:08             ` Arkadiusz Miśkiewicz
  2011-10-11  7:09             ` [tpmdd-devel] " Peter.Huewe
  1 sibling, 0 replies; 156+ messages in thread
From: Arkadiusz Miśkiewicz @ 2011-10-10 21:08 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Rajiv Andrade, linux-kernel, tpmdd-devel, Debora Velarde,
	Marcel Selhorst

On Monday 10 of October 2011, Arkadiusz Miśkiewicz wrote:
> On Monday 10 of October 2011, Stefan Berger wrote:
> > On 10/10/2011 01:05 PM, Arkadiusz Miśkiewicz wrote:
> > > On Monday 10 of October 2011, Rajiv Andrade wrote:
> > >> On 09/10/11 23:29, Stefan Berger wrote:
> > >>> On 10/09/2011 04:51 PM, Arkadiusz Miśkiewicz wrote:
> > >>>> On Wednesday 05 of October 2011, Linus Torvalds wrote:
> > >>>>> Another week, another -rc.
> > >>>> 
> > >>>> suspend to ram regression is annoying (still visible on rc9;
> > >>>> https://lkml.org/lkml/2011/9/24/76) but unfortunately maintainers
> > >>>> are silent.
> > >>> 
> > >>> I tried -rc9 on my Lenovo W500 with that same TPM. I cannot reproduce
> > >>> the 'scheduling while atomic' problem you had reported earlier. I
> > >>> also could suspend / resume fine as long as I did the following:
> > >>> 
> > >>> - suspended with the tpm_tis driver as module in the kernel
> > >>> - once a suspend was done without the tpm_tis driver the subsequent
> > >>> suspends were all done without the tpm_tis driver
> > >>> 
> > >>> Once I had done a suspend/resume with the tpm_tis driver *not* in the
> > >>> kernel and then again a suspend with the tpm_tis driver in the
> > >>> kernel, it did not resume anymore. I believe previously (previous
> > >>> version of kernel and/or Fedora) it refused to even suspend. The
> > >>> reason why this doesn't work properly is that the driver has to send
> > >>> a command to the TPM upon suspend and the BIOS then sends the
> > >>> corresponding wakeup command.
> > >>> 
> > >>> Did you maybe previously suspend/resume without a tpm_tis driver and
> > >>> then try to suspend with it ?
> > >>> 
> > >>> Also, my Lenovo W500 shows particularly odd behavior when I switch
> > >>> from Windows to Linux. The first suspend with a Linux booted after
> > >>> Windows (with or without tpm_tis driver) does *not* resume (reboot
> > >>> required). A subsequently rebooted Linux makes the suspend/resume
> > >>> work fine.
> > >>> 
> > >>>     Stefan
> > >> 
> > >> Arkadiusz,
> > >> 
> > >> Do you still see the issue with this patch [1][2] applied?
> > > 
> > > The issue doesn't happen with this patch but error condition with
> > > "Could not read PCR 0. TPM is not working correctly." is triggered
> > > immediately at boot, even before suspend is used.
> > > 
> > > $ dmesg|grep -iE "(tpm|suspend)"
> > > [   12.640039] tpm_tis 00:0a: 1.2 TPM (device-id 0x1020, rev-id 6)
> > > [   12.640048] tpm_tis 00:0a: Intel iTPM workaround enabled
> > > [   12.768057] tpm_tis 00:0a: Could not read PCR 0. TPM is not working
> > > correctly.
> > > [   12.768066] tpm_tis 00:0a: Was machine previously suspended without
> > > TPM driver present?
> > > [   88.512117] Suspending console(s) (use no_console_suspend to debug)
> > 
> > Though I suppose that now your suspend/resume cycles always work?
> 
> Tried several times and it always worked, so probably yes. Longer testing
> will give definitive answer.
> 
> > I guess the BIOS seems not to be initializing the TPM correctly. Any
> > chance you can get a hold of a BIOS update for your machine?
> 
> Then I looked into bios options on this thinkpad t400 and there are 3
> possible TPM settings: Enabled, Invisible, Disabled.
> 
> Invisible is - visible but not working - according to bios help. No idea
> why such option exists but I had it enabled.
> 
> Right now I've set that to "Enabled" and ran few suspend/resume cycles - no
> problems so far.

Unfortunately, with TPM enabled in the bios and a kernel
(3.1.0-rc9-00064-g65112dc-dirty) with the patch applied, resume still fails:

[11629.922643] legacy_resume(): pnp_bus_resume+0x0/0x67 returns -19
[11629.922646] PM: Device 00:0a failed to resume: error -19


and there is no "Could not read PCR 0. TPM is not working correctly." message,
so this check doesn't seem to be good enough.

> 
> I guess there is some way to make "Invisible" mode properly handled in
> Linux, too.
> 
> >     Stefan


-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/

^ permalink raw reply	[flat|nested] 156+ messages in thread

* RE: [tpmdd-devel] Linux 3.1-rc9
  2011-10-10 17:57           ` Arkadiusz Miśkiewicz
  2011-10-10 21:08             ` Arkadiusz Miśkiewicz
@ 2011-10-11  7:09             ` Peter.Huewe
  1 sibling, 0 replies; 156+ messages in thread
From: Peter.Huewe @ 2011-10-11  7:09 UTC (permalink / raw)
  To: a.miskiewicz, stefanb; +Cc: m.selhorst, tpmdd-devel, linux-kernel

-----Original Message-----
From: Arkadiusz Miśkiewicz [mailto:a.miskiewicz@gmail.com]
>> I guess the BIOS seems not to be initializing the TPM correctly. Any
>> chance you can get a hold of a BIOS update for your machine?

> Then I looked into bios options on this thinkpad t400 and there are 3 possible
> TPM settings: Enabled, Invisible, Disabled.

> Invisible is - visible but not working - according to bios help. No idea why
> such option exists but I had it enabled.

> I guess there is some way to make "Invisible" mode properly handled in Linux,
> too.

Invisible here probably means that the bios simply does not send a
TPM_Startup, which is needed to get the TPM running (and maybe it even leaves
physical presence untouched too).
If the driver sent a TPM_Startup(STATE) (which usually should not cause any
problems, since if you send it twice the second one simply gets 'ignored' with
an "invalid postinit" return code), the tpm would probably work in the
invisible case too.
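
For illustration only, here is roughly what the TPM_Startup(STATE) request
mentioned above looks like on the wire. This is a userspace sketch, not driver
code: the helper name build_tpm_startup() is made up, and the tag, ordinal and
startup-type values are the TPM 1.2 constants as I recall them, so check them
against the TCG specification before relying on them.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed TPM 1.2 constants (verify against the TCG spec). */
#define TPM_TAG_RQU_COMMAND	0x00C1		/* request without auth */
#define TPM_ORD_STARTUP		0x00000099	/* TPM_Startup ordinal */
#define TPM_ST_CLEAR		0x0001
#define TPM_ST_STATE		0x0002		/* restore saved state */
#define TPM_STARTUP_CMD_LEN	12		/* tag + size + ordinal + type */

/* TPM 1.2 commands are big-endian on the wire. */
static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/*
 * Hypothetical helper: fill buf with a TPM_Startup request.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t build_tpm_startup(uint8_t *buf, size_t len, uint16_t startup_type)
{
	if (len < TPM_STARTUP_CMD_LEN)
		return 0;

	put_be16(buf, TPM_TAG_RQU_COMMAND);	/* bytes 0-1: tag */
	put_be32(buf + 2, TPM_STARTUP_CMD_LEN);	/* bytes 2-5: total size */
	put_be32(buf + 6, TPM_ORD_STARTUP);	/* bytes 6-9: ordinal */
	put_be16(buf + 10, startup_type);	/* bytes 10-11: startup type */
	return TPM_STARTUP_CMD_LEN;
}
```

A caller would pass TPM_ST_STATE to get the variant discussed here. What the
driver does with the reply (including treating the "invalid postinit" code as
harmless) is not shown.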


Thanks,
Peter




^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-08  7:55         ` Peter Zijlstra
@ 2011-10-12 21:35           ` Simon Kirby
  2011-10-13 23:25             ` Simon Kirby
  2011-10-18  5:40             ` Simon Kirby
  0 siblings, 2 replies; 156+ messages in thread
From: Simon Kirby @ 2011-10-12 21:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Linux Kernel Mailing List, Dave Jones, Thomas Gleixner

On Sat, Oct 08, 2011 at 09:55:51AM +0200, Peter Zijlstra wrote:

> On Fri, 2011-10-07 at 17:50 -0700, Simon Kirby wrote:
> > On Fri, Oct 07, 2011 at 08:01:55PM +0200, Peter Zijlstra wrote:
> > 
> > > @@ -2571,6 +2573,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
> > >  static inline void thread_group_cputime_init(struct signal_struct *sig)
> > >  {
> > >       raw_spin_lock_init(&sig->cputimer.lock);
> > > +     raw_spin_lock_init(&sig->cputimer.runtime_lock);
> > 
> > My 3.1-rc9 tree has just spin_lock_init() here, not raw_*.
> > 
> > Which tree is your patch against? -next or something?
> 
> or something yeah.. tip/master I think.
> 
> > It applies with some cooking like this, but will it be right?
> > 
> > > sed s/raw_// ../sched-patch-noraw.diff | patch -p1 --dry
> > patching file include/linux/sched.h
> > Hunk #1 succeeded at 503 (offset -1 lines).
> > Hunk #2 succeeded at 512 (offset -1 lines).
> > Hunk #3 succeeded at 2568 (offset -5 lines).
> > patching file kernel/posix-cpu-timers.c
> > patching file kernel/sched_stats.h 
> 
> yes that would be fine.

This patch (s/raw_//) has been stable on 5 boxes for a day. I'll push to
another 15 shortly and confirm tomorrow. Meanwhile, we had another ~4
boxes lock up on 3.1-rc9 _with_ d670ec13 reverted (all CPUs spinning),
but there weren't enough serial cables to log all of them and we haven't
been lucky enough to capture anything other than what fits on 80x25.
I'm hoping it's just the same bug you've already fixed. Strangely, boxes
on -rc6 and -rc7 haven't hit it, but those are across clusters with
different workloads.

Thanks!

Simon-

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-12 21:35           ` Simon Kirby
@ 2011-10-13 23:25             ` Simon Kirby
  2011-10-17  1:39               ` Linus Torvalds
  2011-10-18  5:40             ` Simon Kirby
  1 sibling, 1 reply; 156+ messages in thread
From: Simon Kirby @ 2011-10-13 23:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Linux Kernel Mailing List, Dave Jones, Thomas Gleixner

On Wed, Oct 12, 2011 at 02:35:55PM -0700, Simon Kirby wrote:

> On Sat, Oct 08, 2011 at 09:55:51AM +0200, Peter Zijlstra wrote:
> 
> > On Fri, 2011-10-07 at 17:50 -0700, Simon Kirby wrote:
> > > On Fri, Oct 07, 2011 at 08:01:55PM +0200, Peter Zijlstra wrote:
> > > 
> > > > @@ -2571,6 +2573,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
> > > >  static inline void thread_group_cputime_init(struct signal_struct *sig)
> > > >  {
> > > >       raw_spin_lock_init(&sig->cputimer.lock);
> > > > +     raw_spin_lock_init(&sig->cputimer.runtime_lock);
> > > 
> > > My 3.1-rc9 tree has just spin_lock_init() here, not raw_*.
> > > 
> > > Which tree is your patch against? -next or something?
> > 
> > or something yeah.. tip/master I think.
> > 
> > > It applies with some cooking like this, but will it be right?
> > > 
> > > > sed s/raw_// ../sched-patch-noraw.diff | patch -p1 --dry
> > > patching file include/linux/sched.h
> > > Hunk #1 succeeded at 503 (offset -1 lines).
> > > Hunk #2 succeeded at 512 (offset -1 lines).
> > > Hunk #3 succeeded at 2568 (offset -5 lines).
> > > patching file kernel/posix-cpu-timers.c
> > > patching file kernel/sched_stats.h 
> > 
> > yes that would be fine.
> 
> This patch (s/raw_//) has been stable on 5 boxes for a day. I'll push to
> another 15 shortly and confirm tomorrow. Meanwhile, we had another ~4
> boxes lock up on 3.1-rc9 _with_ d670ec13 reverted (all CPUs spinning),
> but there weren't enough serial cables to log all of them and we haven't
> been lucky enough to capture anything other than what fits on 80x25.
> I'm hoping it's just the same bug you've already fixed. Strangely, boxes
> on -rc6 and -rc7 haven't hit it, but those are across clusters with
> different workloads.

Looks good. No hangs or crashes for two days on any of them running
3.1-rc9 plus this patch. Not sure if you want to deuglify it, but it
seems to work...

Tested-by: Simon Kirby <sim@hostway.ca>

diff against Linus reproduced below.

Simon-

 include/linux/sched.h     |    3 +++
 kernel/posix-cpu-timers.c |    6 +++++-
 kernel/sched_stats.h      |    4 ++--
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 41d0237..ad9eafc 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -503,6 +503,7 @@ struct task_cputime {
  * @running:		non-zero when there are timers running and
  * 			@cputime receives updates.
  * @lock:		lock for fields in this struct.
+ * @runtime_lock:	lock for cputime.sum_exec_runtime
  *
  * This structure contains the version of task_cputime, above, that is
  * used for thread group CPU timer calculations.
@@ -511,6 +512,7 @@ struct thread_group_cputimer {
 	struct task_cputime cputime;
 	int running;
 	spinlock_t lock;
+	spinlock_t runtime_lock;
 };
 
 #include <linux/rwsem.h>
@@ -2566,6 +2568,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
 static inline void thread_group_cputime_init(struct signal_struct *sig)
 {
 	spin_lock_init(&sig->cputimer.lock);
+	spin_lock_init(&sig->cputimer.runtime_lock);
 }
 
 /*
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index c8008dd..fa189a6 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -284,9 +284,13 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 		 * it.
 		 */
 		thread_group_cputime(tsk, &sum);
+		spin_lock(&cputimer->runtime_lock);
 		update_gt_cputime(&cputimer->cputime, &sum);
-	}
+	} else
+		spin_lock(&cputimer->runtime_lock);
+
 	*times = cputimer->cputime;
+	spin_unlock(&cputimer->runtime_lock);
 	spin_unlock_irqrestore(&cputimer->lock, flags);
 }
 
diff --git a/kernel/sched_stats.h b/kernel/sched_stats.h
index 331e01b..a7e2c1a 100644
--- a/kernel/sched_stats.h
+++ b/kernel/sched_stats.h
@@ -330,7 +330,7 @@ static inline void account_group_exec_runtime(struct task_struct *tsk,
 	if (!cputimer->running)
 		return;
 
-	spin_lock(&cputimer->lock);
+	spin_lock(&cputimer->runtime_lock);
 	cputimer->cputime.sum_exec_runtime += ns;
-	spin_unlock(&cputimer->lock);
+	spin_unlock(&cputimer->runtime_lock);
 }

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-13 23:25             ` Simon Kirby
@ 2011-10-17  1:39               ` Linus Torvalds
  2011-10-17  4:58                 ` Ingo Molnar
                                   ` (3 more replies)
  0 siblings, 4 replies; 156+ messages in thread
From: Linus Torvalds @ 2011-10-17  1:39 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Peter Zijlstra, Linux Kernel Mailing List, Dave Jones,
	Thomas Gleixner, Martin Schwidefsky, Ingo Molnar

On Thu, Oct 13, 2011 at 4:25 PM, Simon Kirby <sim@hostway.ca> wrote:
>
> Looks good. No hangs or crashes for two days on any of them running
> 3.1-rc9 plus this patch. Not sure if you want to deuglify it, but it
> seems to work...
>
> Tested-by: Simon Kirby <sim@hostway.ca>

Peter, what's the status of this one?

Quite frankly, I personally consider it to be broken - why are we
introducing this new lock for this very special thing? A spinlock to
protect a *single* word of counter seems broken.

It seems more likely that the real bug is that kernel/sched_stats.h
currently takes cputimer->lock without disabling interrupts. Everybody
else uses irq-safe locking, why would sched_stats.h not need that?

However, I don't see why that spinlock is needed at all. Why aren't
those fields just atomics (or at least just "sum_exec_runtime")? And
why does "cputime_add()" exist at all? It seems to always be just a
plain add, and nothing else would seem to ever make sense *anyway*?

In other words, none of that code makes any sense to me at all. And
the patch in question that fixes a hang for Simon seems to make it
even worse. Can somebody explain to me why it looks that crappy?

Please?

That stupid definition of cputime_add() has apparently existed as-is
since it was introduced in 2005. Why do we have code like this:

    times->utime = cputime_add(times->utime, t->utime);

instead of just

    times->utime += t->utime;

which seems not just shorter, but more readable too? The reason is not
some type safety in the cputime_add() thing, it's just a macro.

Added Martin and Ingo to the discussion - Martin because he added that
cputime_add in the first place, and Ingo because he gets the most hits
on kernel/sched_stats.h. Guys - you can see the history on lkml.

                                 Linus


* Re: Linux 3.1-rc9
  2011-10-17  1:39               ` Linus Torvalds
@ 2011-10-17  4:58                 ` Ingo Molnar
  2011-10-17  9:03                   ` Thomas Gleixner
  2011-10-17  7:55                 ` Martin Schwidefsky
                                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 156+ messages in thread
From: Ingo Molnar @ 2011-10-17  4:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Simon Kirby, Peter Zijlstra, Linux Kernel Mailing List,
	Dave Jones, Thomas Gleixner, Martin Schwidefsky


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> However, I don't see why that spinlock is needed at all. Why aren't 
> those fields just atomics (or at least just "sum_exec_runtime")? 
> And why does "cputime_add()" exist at all? [...]

Agreed, atomic64_t is the best choice here. (When the lock was added 
to struct *_cputimer this should probably have been done already - 
but we didn't have atomic64_t back then yet.)

> That stupid definition of cputime_add() has apparently existed 
> as-is since it was introduced in 2005. Why do we have code like 
> this:
> 
>     times->utime = cputime_add(times->utime, t->utime);
> 
> instead of just
> 
>     times->utime += t->utime;
> 
> which seems not just shorter, but more readable too? The reason is 
> not some type safety in the cputime_add() thing, it's just a macro.

Yes. This was in fact what the old scheduler accounting code looked 
like:

-                               utime += t->utime;
-                               stime += t->stime;
+                               utime = cputime_add(utime, t->utime);
+                               stime = cputime_add(stime, t->stime);

before the pointless looking cputime_t wrappery was added in 2005:

 0a71336: [PATCH] cputime: introduce cputime

For the record, i absolutely hate much of the other time related type 
obfuscation we do as well.

For example the ktime_t obfuscation - we only do it to avoid a divide 
on 32-bit architectures that cannot do fast 64/32 divisions ...

It makes the time code a *lot* less obvious than it could be.

I think we should use one flat u64 nanoseconds time type in the 
kernel (preparing it with using KTIME_SCALAR on all architectures for 
a release or so), used with very simple and obvious C arithmetics.

That simple time type could then trickle down as well: we could use 
it everywhere in kernel code and limit the hodge-podge of ABI time 
units to the syscall boundary. (and convert the internal time unit to 
whatever ABI unit there is close to the syscall boundary)

There's a point where micro-optimized 32-bit support related 
maintenance overhead (and the resulting loss of 
robustness/flexibility) becomes too expensive IMO.
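To make the proposal concrete, here is a minimal user-space sketch of a flat nanosecond type that is converted only at the boundary. The struct and function names are hypothetical and are not kernel APIs; the divide in the conversion is the 64-bit operation whose cost on 32-bit CPUs is at issue.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH UINT64_C(1000000000)

/* Hypothetical stand-in for an ABI time structure such as struct timespec. */
struct abi_time_sketch {
	int64_t sec;
	int32_t nsec;
};

/* Convert the flat internal nanosecond count only at the boundary. */
static inline struct abi_time_sketch ns_to_abi_time(uint64_t ns)
{
	struct abi_time_sketch t;

	t.sec  = (int64_t)(ns / NSEC_PER_SEC_SKETCH);	/* 64-bit divide */
	t.nsec = (int32_t)(ns % NSEC_PER_SEC_SKETCH);
	return t;
}

/* Internally, arithmetic on the flat type is plain C. */
static inline uint64_t ns_elapsed(uint64_t start_ns, uint64_t end_ns)
{
	assert(end_ns >= start_ns);
	return end_ns - start_ns;
}
```

With this shape, only ns_to_abi_time() needs the division; everything else is ordinary integer arithmetic.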

Thanks,

	Ingo


* Re: Linux 3.1-rc9
  2011-10-17  1:39               ` Linus Torvalds
  2011-10-17  4:58                 ` Ingo Molnar
@ 2011-10-17  7:55                 ` Martin Schwidefsky
  2011-10-17  9:12                   ` Peter Zijlstra
  2011-10-17 20:48                   ` H. Peter Anvin
  2011-10-17 10:34                 ` Peter Zijlstra
  2011-10-20 14:36                 ` Linux 3.1-rc9 Martin Schwidefsky
  3 siblings, 2 replies; 156+ messages in thread
From: Martin Schwidefsky @ 2011-10-17  7:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Simon Kirby, Peter Zijlstra, Linux Kernel Mailing List,
	Dave Jones, Thomas Gleixner, Ingo Molnar

On Sun, 16 Oct 2011 18:39:57 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> That stupid definition of cputime_add() has apparently existed as-is
> since it was introduced in 2005. Why do we have code like this:
> 
>     times->utime = cputime_add(times->utime, t->utime);
> 
> instead of just
> 
>     times->utime += t->utime;
> 
> which seems not just shorter, but more readable too? The reason is not
> some type safety in the cputime_add() thing, it's just a macro.
> 
> Added Martin and Ingo to the discussion - Martin because he added that
> cputime_add in the first place, and Ingo because he gets the most hits
> on kernel/sched_stats.h. Guys - you can see the history on lkml.

I introduced those macros to find all the places in the kernel operating
on a cputime value. The additional debug patch defined cputime_t as a
struct which contained a single u64. That way I got a compiler error
for every place I missed.

The reason for the cputime_xxx primitives has been my fear that people
ignore the cputime_t type and just use unsigned long (as they always
have). That would break s390 which needs a u64 for its cputime value.
Dunno if we still need it, seems like we got used to using cputime_t.
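A minimal sketch of the debug trick described above, assuming a struct-wrapped type; the names cputime_dbg_t, cputime_dbg_add and cputime_dbg_gt are hypothetical and are not taken from the actual debug patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical debug-only definition: a struct instead of a bare integer. */
typedef struct { uint64_t val; } cputime_dbg_t;

/* The wrapper must not change the size of the value it carries. */
static_assert(sizeof(cputime_dbg_t) == sizeof(uint64_t),
	      "wrapper adds no padding");

#define cputime_dbg_zero ((cputime_dbg_t){ 0 })

static inline cputime_dbg_t cputime_dbg_add(cputime_dbg_t a, cputime_dbg_t b)
{
	return (cputime_dbg_t){ a.val + b.val };
}

static inline int cputime_dbg_gt(cputime_dbg_t a, cputime_dbg_t b)
{
	return a.val > b.val;
}

/*
 * With this definition, "total += t" or "unsigned long x = t" no longer
 * compiles, so every open-coded use of the value shows up as an error.
 */
```

The accessor functions are the only way to touch the value, which is what turns a missed conversion into a compiler error.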

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* Re: Linux 3.1-rc9
  2011-10-17  4:58                 ` Ingo Molnar
@ 2011-10-17  9:03                   ` Thomas Gleixner
  2011-10-17 10:40                     ` Peter Zijlstra
  2011-10-17 18:49                     ` Ingo Molnar
  0 siblings, 2 replies; 156+ messages in thread
From: Thomas Gleixner @ 2011-10-17  9:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky

On Mon, 17 Oct 2011, Ingo Molnar wrote:
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> For the record, i absolutely hate much of the other time related type 
> obfuscation we do as well.
> 
> For example the ktime_t obfuscation - we only do it to avoid a divide 
> on 32-bit architectures that cannot do fast 64/32 divisions ...
> 
> It makes the time code a *lot* less obvious than it could be.
> 
> I think we should use one flat u64 nanoseconds time type in the 
> kernel (preparing it with using KTIME_SCALAR on all architectures for 
> a release or so), used with very simple and obvious C arithmetics.

It'd be nice, but this simply will not fly.
 
> That simple time type could then trickle down as well: we could use 
> it everywhere in kernel code and limit the hodge-podge of ABI time 
> units to the syscall boundary. (and convert the internal time unit to 
> whatever ABI unit there is close to the syscall boundary)
> 
> There's a point where micro-optimized 32-bit support related 
> maintenance overhead (and the resulting loss of 
> robustness/flexibility) becomes too expensive IMO.

That's not a micro optimization, it's a massive performance hit if you
force those 32bit archs to do 64/32 all over the place.

Thanks,

	tglx


* Re: Linux 3.1-rc9
  2011-10-17  7:55                 ` Martin Schwidefsky
@ 2011-10-17  9:12                   ` Peter Zijlstra
  2011-10-17  9:18                     ` Martin Schwidefsky
  2011-10-17 20:48                   ` H. Peter Anvin
  1 sibling, 1 reply; 156+ messages in thread
From: Peter Zijlstra @ 2011-10-17  9:12 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Linus Torvalds, Simon Kirby, Linux Kernel Mailing List,
	Dave Jones, Thomas Gleixner, Ingo Molnar

On Mon, 2011-10-17 at 09:55 +0200, Martin Schwidefsky wrote:
> 
> The reason for the cputime_xxx primitives has been my fear that people
> ignore the cputime_t type and just use unsigned long (as they always
> have). That would break s390 which needs a u64 for its cputime value.
> Dunno if we still need it, seems like we got used to using cputime_t. 

Right, and like mentioned last time this came up, we could possibly make
use of sparse to ensure things don't go fail on 32bit s390.


* Re: Linux 3.1-rc9
  2011-10-17  9:12                   ` Peter Zijlstra
@ 2011-10-17  9:18                     ` Martin Schwidefsky
  0 siblings, 0 replies; 156+ messages in thread
From: Martin Schwidefsky @ 2011-10-17  9:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Simon Kirby, Linux Kernel Mailing List,
	Dave Jones, Thomas Gleixner, Ingo Molnar

On Mon, 17 Oct 2011 11:12:51 +0200
Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Mon, 2011-10-17 at 09:55 +0200, Martin Schwidefsky wrote:
> > 
> > The reason for the cputime_xxx primitives has been my fear that people
> > ignore the cputime_t type and just use unsigned long (as they always
> > have). That would break s390 which needs a u64 for its cputime value.
> > Dunno if we still need it, seems like we got used to using cputime_t. 
> 
> Right, and like mentioned last time this came up, we could possibly make
> use of sparse to ensure things don't go fail on 32bit s390.

Indeed. No progress on the sparse check so far I'm afraid. 


-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* Re: Linux 3.1-rc9
  2011-10-17  1:39               ` Linus Torvalds
  2011-10-17  4:58                 ` Ingo Molnar
  2011-10-17  7:55                 ` Martin Schwidefsky
@ 2011-10-17 10:34                 ` Peter Zijlstra
  2011-10-17 14:07                   ` Martin Schwidefsky
  2011-10-17 14:57                   ` Linus Torvalds
  2011-10-20 14:36                 ` Linux 3.1-rc9 Martin Schwidefsky
  3 siblings, 2 replies; 156+ messages in thread
From: Peter Zijlstra @ 2011-10-17 10:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Simon Kirby, Linux Kernel Mailing List, Dave Jones,
	Thomas Gleixner, Martin Schwidefsky, Ingo Molnar

On Sun, 2011-10-16 at 18:39 -0700, Linus Torvalds wrote:

> Quite frankly, I personally consider it to be broken - why are we
> introducing this new lock for this very special thing? A spinlock to
> protect a *single* word of counter seems broken.

Well, I thought atomic64_t would be more expensive on 32bit archs, i386
uses the horridly expensive cmpxchg8b thing to implement it.

That said, I'm more than glad to use it.

> However, I don't see why that spinlock is needed at all. Why aren't
> those fields just atomics (or at least just "sum_exec_runtime")? 

Done.

> And
> why does "cputime_add()" exist at all? It seems to always be just a
> plain add, and nothing else would seem to ever make sense *anyway*?

Martin and me were discussing the merit of that only a few weeks ago ;-)

BTW what would we all think about a coccinelle generated patch that
fixes atomic*_add()'s argument order?

---
Subject: cputimer: Cure lock inversion
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Mon Oct 17 11:50:30 CEST 2011

There's a lock inversion between the cputimer->lock and rq->lock; notably
the two callchains involved are:

 update_rlimit_cpu()
   sighand->siglock
   set_process_cpu_timer()
     cpu_timer_sample_group()
       thread_group_cputimer()
         cputimer->lock
         thread_group_cputime()
           task_sched_runtime()
             ->pi_lock
             rq->lock

 scheduler_tick()
   rq->lock
   task_tick_fair()
     update_curr()
       account_group_exec()
         cputimer->lock

Where the first one is enabling a CLOCK_PROCESS_CPUTIME_ID timer, and the
second one is keeping up-to-date.

Note that e8abccb7193 ("posix-cpu-timers: Cure SMP accounting oddities") didn't
introduce this problem, but merely made it much more likely to happen, see how
cpu_timer_sample_group() for the CPUCLOCK_SCHED case also takes rq->lock.

Cure this inversion by removing the need to acquire cputimer->lock in the
update path by converting task_cputime::sum_exec_runtime to an atomic64_t.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/sched.h     |    4 ++--
 kernel/fork.c             |    2 +-
 kernel/posix-cpu-timers.c |   41 ++++++++++++++++++++++++-----------------
 kernel/sched.c            |    2 +-
 kernel/sched_rt.c         |    6 ++++--
 kernel/sched_stats.h      |    4 +---
 6 files changed, 33 insertions(+), 26 deletions(-)
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -474,7 +474,7 @@ struct cpu_itimer {
 struct task_cputime {
 	cputime_t utime;
 	cputime_t stime;
-	unsigned long long sum_exec_runtime;
+	atomic64_t sum_exec_runtime;
 };
 /* Alternate field names when used to cache expirations. */
 #define prof_exp	stime
@@ -485,7 +485,7 @@ struct task_cputime {
 	(struct task_cputime) {					\
 		.utime = cputime_zero,				\
 		.stime = cputime_zero,				\
-		.sum_exec_runtime = 0,				\
+		.sum_exec_runtime = ATOMIC64_INIT(0),		\
 	}
 
 /*
Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c
+++ linux-2.6/kernel/fork.c
@@ -1033,7 +1033,7 @@ static void posix_cpu_timers_init(struct
 {
 	tsk->cputime_expires.prof_exp = cputime_zero;
 	tsk->cputime_expires.virt_exp = cputime_zero;
-	tsk->cputime_expires.sched_exp = 0;
+	atomic64_set(&tsk->cputime_expires.sched_exp, 0);
 	INIT_LIST_HEAD(&tsk->cpu_timers[0]);
 	INIT_LIST_HEAD(&tsk->cpu_timers[1]);
 	INIT_LIST_HEAD(&tsk->cpu_timers[2]);
Index: linux-2.6/kernel/posix-cpu-timers.c
===================================================================
--- linux-2.6.orig/kernel/posix-cpu-timers.c
+++ linux-2.6/kernel/posix-cpu-timers.c
@@ -239,7 +239,7 @@ void thread_group_cputime(struct task_st
 
 	times->utime = sig->utime;
 	times->stime = sig->stime;
-	times->sum_exec_runtime = sig->sum_sched_runtime;
+	atomic64_set(&times->sum_exec_runtime, sig->sum_sched_runtime);
 
 	rcu_read_lock();
 	/* make sure we can trust tsk->thread_group list */
@@ -250,7 +250,7 @@ void thread_group_cputime(struct task_st
 	do {
 		times->utime = cputime_add(times->utime, t->utime);
 		times->stime = cputime_add(times->stime, t->stime);
-		times->sum_exec_runtime += task_sched_runtime(t);
+		atomic64_add(task_sched_runtime(t), &times->sum_exec_runtime);
 	} while_each_thread(tsk, t);
 out:
 	rcu_read_unlock();
@@ -264,8 +264,11 @@ static void update_gt_cputime(struct tas
 	if (cputime_gt(b->stime, a->stime))
 		a->stime = b->stime;
 
-	if (b->sum_exec_runtime > a->sum_exec_runtime)
-		a->sum_exec_runtime = b->sum_exec_runtime;
+	if (atomic64_read(&b->sum_exec_runtime) >
+			atomic64_read(&a->sum_exec_runtime)) {
+		atomic64_set(&a->sum_exec_runtime,
+				atomic64_read(&b->sum_exec_runtime));
+	}
 }
 
 void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
@@ -287,6 +290,8 @@ void thread_group_cputimer(struct task_s
 		update_gt_cputime(&cputimer->cputime, &sum);
 	}
 	*times = cputimer->cputime;
+	atomic64_set(&times->sum_exec_runtime,
+			atomic64_read(&cputimer->cputime.sum_exec_runtime));
 	spin_unlock_irqrestore(&cputimer->lock, flags);
 }
 
@@ -313,7 +318,7 @@ static int cpu_clock_sample_group(const 
 		break;
 	case CPUCLOCK_SCHED:
 		thread_group_cputime(p, &cputime);
-		cpu->sched = cputime.sum_exec_runtime;
+		cpu->sched = atomic64_read(&cputime.sum_exec_runtime);
 		break;
 	}
 	return 0;
@@ -593,9 +598,9 @@ static void arm_timer(struct k_itimer *t
 				cputime_expires->virt_exp = exp->cpu;
 			break;
 		case CPUCLOCK_SCHED:
-			if (cputime_expires->sched_exp == 0 ||
-			    cputime_expires->sched_exp > exp->sched)
-				cputime_expires->sched_exp = exp->sched;
+			if (atomic64_read(&cputime_expires->sched_exp) == 0 ||
+			    atomic64_read(&cputime_expires->sched_exp) > exp->sched)
+				atomic64_set(&cputime_expires->sched_exp, exp->sched);
 			break;
 		}
 	}
@@ -656,7 +661,7 @@ static int cpu_timer_sample_group(const 
 		cpu->cpu = cputime.utime;
 		break;
 	case CPUCLOCK_SCHED:
-		cpu->sched = cputime.sum_exec_runtime + task_delta_exec(p);
+		cpu->sched = atomic64_read(&cputime.sum_exec_runtime) + task_delta_exec(p);
 		break;
 	}
 	return 0;
@@ -947,13 +952,14 @@ static void check_thread_timers(struct t
 
 	++timers;
 	maxfire = 20;
-	tsk->cputime_expires.sched_exp = 0;
+	atomic64_set(&tsk->cputime_expires.sched_exp, 0);
 	while (!list_empty(timers)) {
 		struct cpu_timer_list *t = list_first_entry(timers,
 						      struct cpu_timer_list,
 						      entry);
 		if (!--maxfire || tsk->se.sum_exec_runtime < t->expires.sched) {
-			tsk->cputime_expires.sched_exp = t->expires.sched;
+			atomic64_set(&tsk->cputime_expires.sched_exp,
+				     t->expires.sched);
 			break;
 		}
 		t->firing = 1;
@@ -1049,7 +1055,7 @@ static inline int task_cputime_zero(cons
 {
 	if (cputime_eq(cputime->utime, cputime_zero) &&
 	    cputime_eq(cputime->stime, cputime_zero) &&
-	    cputime->sum_exec_runtime == 0)
+	    atomic64_read(&cputime->sum_exec_runtime) == 0)
 		return 1;
 	return 0;
 }
@@ -1076,7 +1082,7 @@ static void check_process_timers(struct 
 	thread_group_cputimer(tsk, &cputime);
 	utime = cputime.utime;
 	ptime = cputime_add(utime, cputime.stime);
-	sum_sched_runtime = cputime.sum_exec_runtime;
+	sum_sched_runtime = atomic64_read(&cputime.sum_exec_runtime);
 	maxfire = 20;
 	prof_expires = cputime_zero;
 	while (!list_empty(timers)) {
@@ -1161,7 +1167,7 @@ static void check_process_timers(struct 
 
 	sig->cputime_expires.prof_exp = prof_expires;
 	sig->cputime_expires.virt_exp = virt_expires;
-	sig->cputime_expires.sched_exp = sched_expires;
+	atomic64_set(&sig->cputime_expires.sched_exp, sched_expires);
 	if (task_cputime_zero(&sig->cputime_expires))
 		stop_process_timers(sig);
 }
@@ -1255,8 +1261,9 @@ static inline int task_cputime_expired(c
 	    cputime_ge(cputime_add(sample->utime, sample->stime),
 		       expires->stime))
 		return 1;
-	if (expires->sum_exec_runtime != 0 &&
-	    sample->sum_exec_runtime >= expires->sum_exec_runtime)
+	if (atomic64_read(&expires->sum_exec_runtime) != 0 &&
+	    atomic64_read(&sample->sum_exec_runtime) >=
+			atomic64_read(&expires->sum_exec_runtime))
 		return 1;
 	return 0;
 }
@@ -1279,7 +1286,7 @@ static inline int fastpath_timer_check(s
 		struct task_cputime task_sample = {
 			.utime = tsk->utime,
 			.stime = tsk->stime,
-			.sum_exec_runtime = tsk->se.sum_exec_runtime
+			.sum_exec_runtime = ATOMIC64_INIT(tsk->se.sum_exec_runtime),
 		};
 
 		if (task_cputime_expired(&task_sample, &tsk->cputime_expires))
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -4075,7 +4075,7 @@ void thread_group_times(struct task_stru
 	thread_group_cputime(p, &cputime);
 
 	total = cputime_add(cputime.utime, cputime.stime);
-	rtime = nsecs_to_cputime(cputime.sum_exec_runtime);
+	rtime = nsecs_to_cputime(atomic64_read(&cputime.sum_exec_runtime));
 
 	if (total) {
 		u64 temp = rtime;
Index: linux-2.6/kernel/sched_rt.c
===================================================================
--- linux-2.6.orig/kernel/sched_rt.c
+++ linux-2.6/kernel/sched_rt.c
@@ -1763,8 +1763,10 @@ static void watchdog(struct rq *rq, stru
 
 		p->rt.timeout++;
 		next = DIV_ROUND_UP(min(soft, hard), USEC_PER_SEC/HZ);
-		if (p->rt.timeout > next)
-			p->cputime_expires.sched_exp = p->se.sum_exec_runtime;
+		if (p->rt.timeout > next) {
+			atomic64_set(&p->cputime_expires.sched_exp,
+					p->se.sum_exec_runtime);
+		}
 	}
 }
 
Index: linux-2.6/kernel/sched_stats.h
===================================================================
--- linux-2.6.orig/kernel/sched_stats.h
+++ linux-2.6/kernel/sched_stats.h
@@ -330,7 +330,5 @@ static inline void account_group_exec_ru
 	if (!cputimer->running)
 		return;
 
-	spin_lock(&cputimer->lock);
-	cputimer->cputime.sum_exec_runtime += ns;
-	spin_unlock(&cputimer->lock);
+	atomic64_add(ns, &cputimer->cputime.sum_exec_runtime);
 }
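The idea in the changelog above (take no lock in the update path, so there is no lock order to invert) can be sketched in user space with C11 atomics. This is only an analogue under stated assumptions: the struct and function names are hypothetical, and the kernel patch uses atomic64_t rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the group timer's runtime counter. */
struct cputimer_sketch {
	_Atomic uint64_t sum_exec_runtime;
};

/* Update path: no lock is taken, so it cannot nest inside another lock wrongly. */
static inline void account_exec_runtime(struct cputimer_sketch *t, uint64_t ns)
{
	assert(t != NULL);
	atomic_fetch_add_explicit(&t->sum_exec_runtime, ns,
				  memory_order_relaxed);
}

/* Read path: a single atomic load of the 64-bit value. */
static inline uint64_t read_exec_runtime(struct cputimer_sketch *t)
{
	assert(t != NULL);
	return atomic_load_explicit(&t->sum_exec_runtime,
				    memory_order_relaxed);
}
```

Relaxed ordering is used here because the sketch only needs the counter itself to be updated atomically; whether that is sufficient for the real code is not something this example establishes.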



* Re: Linux 3.1-rc9
  2011-10-17  9:03                   ` Thomas Gleixner
@ 2011-10-17 10:40                     ` Peter Zijlstra
  2011-10-17 11:40                       ` Alan Cox
  2011-10-17 18:49                     ` Ingo Molnar
  1 sibling, 1 reply; 156+ messages in thread
From: Peter Zijlstra @ 2011-10-17 10:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Linus Torvalds, Simon Kirby,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky

On Mon, 2011-10-17 at 11:03 +0200, Thomas Gleixner wrote:
> That's not a micro optimization, it's a massive performance hit if you
> force those 32bit archs to do 64/32 all over the place.
> 
Linus could just say he doesn't care about 32bit and everybody sane
should just get a 64bit machine.. but I suspect that's a few more years.

Although I hope not too long, even my phone could do with more than 2G
of memory.


* Re: Linux 3.1-rc9
  2011-10-17 10:40                     ` Peter Zijlstra
@ 2011-10-17 11:40                       ` Alan Cox
  0 siblings, 0 replies; 156+ messages in thread
From: Alan Cox @ 2011-10-17 11:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Linus Torvalds, Simon Kirby,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky

On Mon, 17 Oct 2011 12:40:01 +0200
Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Mon, 2011-10-17 at 11:03 +0200, Thomas Gleixner wrote:
> > That's not a micro optimization, it's a massive performance hit if you
> > force those 32bit archs to do 64/32 all over the place.
> > 
> Linus could just say he doesn't care about 32bit and everybody sane
> should just get a 64bit machine.. but I suspect that's a few more years.
> 
> Although I hope not too long, even my phone could do with more than 2G
> of memory.

Perhaps it wouldn't, if people didn't keep adding junk to the system ;)



* Re: Linux 3.1-rc9
  2011-10-17 10:34                 ` Peter Zijlstra
@ 2011-10-17 14:07                   ` Martin Schwidefsky
  2011-10-17 14:57                   ` Linus Torvalds
  1 sibling, 0 replies; 156+ messages in thread
From: Martin Schwidefsky @ 2011-10-17 14:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Simon Kirby, Linux Kernel Mailing List,
	Dave Jones, Thomas Gleixner, Ingo Molnar

On Mon, 17 Oct 2011 12:34:18 +0200
Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> > And
> > why does "cputime_add()" exist at all? It seems to always be just a
> > plain add, and nothing else would seem to ever make sense *anyway*?
> 
> Martin and me were discussing the merit of that only a few weeks ago ;-)

I took my old cputime debug patch and compiled the latest git tree with it.
The compiler found a few places where fishy things happen:

1) fs/proc/uptime.c
static int uptime_proc_show(struct seq_file *m, void *v)
{
	...
        cputime_t idletime = cputime_zero;

        for_each_possible_cpu(i)
                idletime = cputime64_add(idletime, kstat_cpu(i).cpustat.idle);
	...
        cputime_to_timespec(idletime, &idle);
	...
}

idletime is a 32-bit integer on x86-32. The sum of the idle time over all
cpus will quickly overflow, e.g. consider HZ=1000 on a quad-core. It would
overflow after 12.42 days (2^32 / 1000 / 4 / 86400).

2) kernel/posix-cpu-timers.c
/*                                                                              
 * Divide and limit the result to res >= 1                                      
 *                                                                              
 * This is necessary to prevent signal delivery starvation, when the result of  
 * the division would be rounded down to 0.                                     
 */
static inline cputime_t cputime_div_non_zero(cputime_t time, unsigned long div)
{
        cputime_t res = cputime_div(time, div);

        return max_t(cputime_t, res, 1);
}

A cputime of 1 on s390 is 0.244 nanoseconds; I doubt that will
prevent signal starvation. Fortunately the function is unused and can be
removed.

3) kernel/itimer.c
enum hrtimer_restart it_real_fn(struct hrtimer *timer)
{
        struct signal_struct *sig =
                container_of(timer, struct signal_struct, real_timer);

        trace_itimer_expire(ITIMER_REAL, sig->leader_pid, 0);
        kill_pid_info(SIGALRM, SEND_SIG_PRIV, sig->leader_pid);

        return HRTIMER_NORESTART;
}

trace_itimer_expire takes a cputime as its third argument. That should be
cputime_zero in the current notation, same in do_setitimer. After the
conversion all cputime_zero occurrences would be replaced with 0.

4) kernel/sched.c
#define CPUACCT_BATCH   \
        min_t(long, percpu_counter_batch * cputime_one_jiffy, INT_MAX)

If cputime_t is defined as a 64-bit type on a 32-bit architecture the
CPUACCT_BATCH definition can break. Should work for the existing code
though.
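The overflow estimate in item 1 can be checked with a few lines of C. This is a sketch only; the helper name is hypothetical, and it assumes every CPU is idle the whole time, as the estimate does.

```c
#include <assert.h>
#include <stdint.h>

/* Days until a tick counter of the given width wraps, summed over all CPUs. */
static inline double days_until_wrap(unsigned int bits, unsigned int hz,
				     unsigned int cpus)
{
	assert(bits < 64 && hz > 0 && cpus > 0);
	double ticks = (double)(UINT64_C(1) << bits);	/* counter range */
	return ticks / hz / cpus / 86400.0;		/* 86400 s per day */
}
```

Calling days_until_wrap(32, 1000, 4) gives a value between 12.42 and 12.43 days, in line with the figure quoted in item 1.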

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* Re: Linux 3.1-rc9
  2011-10-17 10:34                 ` Peter Zijlstra
  2011-10-17 14:07                   ` Martin Schwidefsky
@ 2011-10-17 14:57                   ` Linus Torvalds
  2011-10-17 17:54                     ` Peter Zijlstra
  1 sibling, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2011-10-17 14:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Simon Kirby, Linux Kernel Mailing List, Dave Jones,
	Thomas Gleixner, Martin Schwidefsky, Ingo Molnar

On Mon, Oct 17, 2011 at 3:34 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> Well, I thought atomic64_t would be more expensive on 32bit archs, i386
> uses the horridly expensive cmpxchg8b thing to implement it.

Ugh, yes. And some of those paths seem to be hot-paths too.

Perhaps more importantly, there are way more accesses to that
'sum_exec_runtime' than the spinlock-variant of the patch implied.

So now with the atomic64 variant, the readers are protected too, and
that ends up being really expensive. That may be the "right thing" to
do, but I'm not sure if it's really acceptable. Also, I see that some
of the atomic regions (that weren't protected by the spinlock *either*)
aren't just simple adds: they are code like

+                       if (atomic64_read(&cputime_expires->sched_exp) == 0 ||
+                           atomic64_read(&cputime_expires->sched_exp) > exp->sched)
+                               atomic64_set(&cputime_expires->sched_exp, exp->sched);

in arm_timer(), which was apparently totally unprotected before, and
which is just inappropriate with atomic accesses.
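To illustrate the point: separate atomic reads followed by an atomic set do not form one atomic operation, because the value can change between the steps. A compound update would need something like a compare-and-swap loop, sketched below in user-space C11 with a hypothetical function name. Nobody in the thread proposes this, and a cmpxchg loop would be at least as costly on i386 as the accesses being complained about.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Store 'candidate' if *exp is 0 (unset) or later than 'candidate'. */
static inline void set_earlier_expiry(_Atomic uint64_t *exp, uint64_t candidate)
{
	assert(exp != NULL);
	uint64_t old = atomic_load(exp);

	while (old == 0 || old > candidate) {
		/*
		 * On failure 'old' is refreshed with the current value,
		 * and the loop condition is evaluated again.
		 */
		if (atomic_compare_exchange_weak(exp, &old, candidate))
			break;
	}
}
```

Here the check and the store succeed or fail together, which is exactly what the read/read/set sequence cannot guarantee.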

So seeing this, I'm not confident that atomic64 works at all, after all.

Grrr..

                               Linus


* Re: Linux 3.1-rc9
  2011-10-17 14:57                   ` Linus Torvalds
@ 2011-10-17 17:54                     ` Peter Zijlstra
  2011-10-17 18:31                       ` Linus Torvalds
  0 siblings, 1 reply; 156+ messages in thread
From: Peter Zijlstra @ 2011-10-17 17:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Simon Kirby, Linux Kernel Mailing List, Dave Jones,
	Thomas Gleixner, Martin Schwidefsky, Ingo Molnar

On Mon, 2011-10-17 at 07:57 -0700, Linus Torvalds wrote:

> So seeing this, I'm not confident that atomic64 works at all, after all.

I could of course propose this... but I really won't since I'm half
retching by now.. ;-)


---
 include/linux/sched.h     |    7 +++++--
 kernel/posix-cpu-timers.c |    8 +++++---
 kernel/sched_stats.h      |    4 +---
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 41d0237..94bf16f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -474,7 +474,10 @@ struct cpu_itimer {
 struct task_cputime {
 	cputime_t utime;
 	cputime_t stime;
-	unsigned long long sum_exec_runtime;
+	union {
+		unsigned long long sum_exec_runtime;
+		atomic64_t _sum_exec_runtime;
+	};
 };
 /* Alternate field names when used to cache expirations. */
 #define prof_exp	stime
@@ -485,7 +488,7 @@ struct task_cputime {
 	(struct task_cputime) {					\
 		.utime = cputime_zero,				\
 		.stime = cputime_zero,				\
-		.sum_exec_runtime = 0,				\
+		{ .sum_exec_runtime = 0, },			\
 	}
 
 /*
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index c8008dd..4808c0d 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -264,8 +264,8 @@ static void update_gt_cputime(struct task_cputime *a, struct task_cputime *b)
 	if (cputime_gt(b->stime, a->stime))
 		a->stime = b->stime;
 
-	if (b->sum_exec_runtime > a->sum_exec_runtime)
-		a->sum_exec_runtime = b->sum_exec_runtime;
+	if (b->sum_exec_runtime > atomic64_read(&a->_sum_exec_runtime))
+		atomic64_set(&a->_sum_exec_runtime, b->sum_exec_runtime);
 }
 
 void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
@@ -287,6 +287,8 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 		update_gt_cputime(&cputimer->cputime, &sum);
 	}
 	*times = cputimer->cputime;
+	times->sum_exec_runtime = 
+		atomic64_read(&cputimer->cputime._sum_exec_runtime);
 	spin_unlock_irqrestore(&cputimer->lock, flags);
 }
 
@@ -1279,7 +1281,7 @@ static inline int fastpath_timer_check(struct task_struct *tsk)
 		struct task_cputime task_sample = {
 			.utime = tsk->utime,
 			.stime = tsk->stime,
-			.sum_exec_runtime = tsk->se.sum_exec_runtime
+			{ .sum_exec_runtime = tsk->se.sum_exec_runtime, },
 		};
 
 		if (task_cputime_expired(&task_sample, &tsk->cputime_expires))
diff --git a/kernel/sched_stats.h b/kernel/sched_stats.h
index 331e01b..65dcb76 100644
--- a/kernel/sched_stats.h
+++ b/kernel/sched_stats.h
@@ -330,7 +330,5 @@ static inline void account_group_exec_runtime(struct task_struct *tsk,
 	if (!cputimer->running)
 		return;
 
-	spin_lock(&cputimer->lock);
-	cputimer->cputime.sum_exec_runtime += ns;
-	spin_unlock(&cputimer->lock);
+	atomic64_add(ns, &cputimer->cputime._sum_exec_runtime);
 }



* Re: Linux 3.1-rc9
  2011-10-17 17:54                     ` Peter Zijlstra
@ 2011-10-17 18:31                       ` Linus Torvalds
  2011-10-17 19:23                         ` Peter Zijlstra
  0 siblings, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2011-10-17 18:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Simon Kirby, Linux Kernel Mailing List, Dave Jones,
	Thomas Gleixner, Martin Schwidefsky, Ingo Molnar

On Mon, Oct 17, 2011 at 10:54 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> I could of course propose this... but I really won't since I'm half
> retching by now.. ;-)

Wow. Is this "ugly and fragile code week" and I just didn't get the memo?

I do wonder if we might not fix the problem by just taking the
*existing* lock in the right order?

IOW, how nasty would it be to make "scheduler_tick()" just get the
cputimer->lock outside of rq->lock?

Sure, we'd hold that lock *much* longer than we need, but how much do
we care? Is that a lock that gets contention? It might be the simple
solution for now - I *would* like to get 3.1 out..

                        Linus


* Re: Linux 3.1-rc9
  2011-10-17  9:03                   ` Thomas Gleixner
  2011-10-17 10:40                     ` Peter Zijlstra
@ 2011-10-17 18:49                     ` Ingo Molnar
  2011-10-17 20:35                       ` H. Peter Anvin
  1 sibling, 1 reply; 156+ messages in thread
From: Ingo Molnar @ 2011-10-17 18:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky


* Thomas Gleixner <tglx@linutronix.de> wrote:

> > That simple time type could then trickle down as well: we could 
> > use it everywhere in kernel code and limit the hodge-podge of ABI 
> > time units to the syscall boundary. (and convert the internal 
> > time unit to whatever ABI unit there is close to the syscall 
> > boundary)
> > 
> > There's a point where micro-optimized 32-bit support related 
> > maintenance overhead (and the resulting loss of 
> > robustness/flexibility) becomes too expensive IMO.
> 
> That's not a micro optimization, it's a massive performance hit if 
> you force those 32bit archs to do 64/32 all over the place.

Do we have some hard data on this, which we could put into comments 
in include/linux/ktime.h and such? Older versions of GCC used to do a 
bad job of long long handling on 32-bit systems - that might be a 
factor in the performance figures.

But i suspect you are right that the cost is still very much there 
...

Thanks,

	Ingo


* Re: Linux 3.1-rc9
  2011-10-17 18:31                       ` Linus Torvalds
@ 2011-10-17 19:23                         ` Peter Zijlstra
  2011-10-17 21:00                           ` Thomas Gleixner
  0 siblings, 1 reply; 156+ messages in thread
From: Peter Zijlstra @ 2011-10-17 19:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Simon Kirby, Linux Kernel Mailing List, Dave Jones,
	Thomas Gleixner, Martin Schwidefsky, Ingo Molnar

On Mon, 2011-10-17 at 11:31 -0700, Linus Torvalds wrote:
> On Mon, Oct 17, 2011 at 10:54 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >
> > I could of course propose this... but I really won't since I'm half
> > retching by now.. ;-)
> 
> Wow. Is this "ugly and fragile code week" and I just didn't get the memo?

Do I get a prize?

> I do wonder if we might not fix the problem by just taking the
> *existing* lock in the right order?
> 
> IOW, how nasty would it be to make "scheduler_tick()" just get the
> cputimer->lock outside of rq->lock?
> 
> Sure, we'd hold that lock *much* longer than we need, but how much do
> we care? Is that a lock that gets contention? It might be the simple
> solution for now - I *would* like to get 3.1 out..

Ah, sadly the tick isn't the only one with the inverted callchain,
pretty much every callchain in the scheduler ends up in update_curr()
one way or another.

The easier way around might be something like this... even when two
threads in a process race to enable this clock, the wasted time is
pretty much of the same order as we would otherwise have wasted spinning
on the lock, and the update_gt_cputime() thing would end up moving the
clock fwd to the latest outcome any which way.

Hmm.. Thomas, anything?


---
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index c8008dd..640ded8 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -274,9 +274,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 	struct task_cputime sum;
 	unsigned long flags;
 
-	spin_lock_irqsave(&cputimer->lock, flags);
 	if (!cputimer->running) {
-		cputimer->running = 1;
 		/*
 		 * The POSIX timer interface allows for absolute time expiry
 		 * values through the TIMER_ABSTIME flag, therefore we have
@@ -284,8 +282,11 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 		 * it.
 		 */
 		thread_group_cputime(tsk, &sum);
+		spin_lock_irqsave(&cputimer->lock, flags);
+		cputimer->running = 1;
 		update_gt_cputime(&cputimer->cputime, &sum);
-	}
+	} else
+		spin_lock_irqsave(&cputimer->lock, flags);
 	*times = cputimer->cputime;
 	spin_unlock_irqrestore(&cputimer->lock, flags);
 }
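
The pattern in the patch above is: do the expensive summation with no
lock held, then take the lock only to publish the result, and only ever
move the published value forward so that racing enablers converge. A
rough userspace sketch of that pattern, assuming a toy C11 spinlock in
place of cputimer->lock; every name here is hypothetical:

```c
#include <stdatomic.h>
#include <stdint.h>

struct cputimer {
	atomic_flag lock;  /* toy spinlock standing in for cputimer->lock */
	int running;       /* nonzero once the group timer is enabled */
	uint64_t cputime;  /* published group CPU time */
};

static void timer_lock(struct cputimer *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void timer_unlock(struct cputimer *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Stand-in for the expensive walk over every thread in the group. */
static uint64_t slow_group_sum(const uint64_t *per_thread, int n)
{
	uint64_t sum = 0;
	for (int i = 0; i < n; i++)
		sum += per_thread[i];
	return sum;
}

static uint64_t group_cputimer_read(struct cputimer *t,
				    const uint64_t *per_thread, int n)
{
	uint64_t out;

	/* Unlocked check, as in the patch: two racing callers may both
	 * take the slow path, which wastes time but stays correct. */
	if (!t->running) {
		uint64_t sum = slow_group_sum(per_thread, n); /* no lock held */

		timer_lock(t);
		t->running = 1;
		if (sum > t->cputime) /* only move the clock forward */
			t->cputime = sum;
	} else {
		timer_lock(t);
	}
	out = t->cputime;
	timer_unlock(t);
	return out;
}
```

Once running is set, later reads return the published value rather than
re-summing; in the kernel the accounting hooks keep that value current.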



* Re: Linux 3.1-rc9
  2011-10-17 18:49                     ` Ingo Molnar
@ 2011-10-17 20:35                       ` H. Peter Anvin
  2011-10-17 21:19                         ` Ingo Molnar
  0 siblings, 1 reply; 156+ messages in thread
From: H. Peter Anvin @ 2011-10-17 20:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky

On 10/17/2011 11:49 AM, Ingo Molnar wrote:
> Do we have some hard data on this, which we could put into comments 
> in include/linux/ktime.h and such? Older versions of GCC used to do a 
> bad job of long long handling on 32-bit systems - that might be a 
> factor in the performance figures.
> 
> But i suspect you are right that the cost is still very much there 

64/64 division is done bit by bit on most (all?) 32-bit architectures.

64/32 division can be done in hardware on some architectures, e.g. x86.

	-hpa
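
To illustrate the 64/32 case: a 64-bit dividend can be divided by a
32-bit divisor in two steps whose quotients each fit in 32 bits, which
is the shape a hardware 64-by-32 divide (such as x86 "div") handles
directly. The sketch below is portable C with a hypothetical function
name; it is not the kernel's do_div(), and a C compiler may still lower
the second step to a library call on some 32-bit targets.

```c
#include <stdint.h>

/* Divide n by d (d must be nonzero); store the remainder in *rem. */
static uint64_t div_u64_by_u32(uint64_t n, uint32_t d, uint32_t *rem)
{
	uint32_t hi = (uint32_t)(n >> 32);
	uint32_t lo = (uint32_t)n;

	/* Step 1: plain 32/32 division of the high word. */
	uint32_t q_hi = hi / d;
	uint32_t r_hi = hi % d;

	/* Step 2: r_hi < d, so (r_hi:lo) / d is guaranteed to fit in
	 * 32 bits. This is the operation a 64/32 hardware divide does. */
	uint64_t rest = ((uint64_t)r_hi << 32) | lo;
	uint32_t q_lo = (uint32_t)(rest / d);

	*rem = (uint32_t)(rest % d);
	return ((uint64_t)q_hi << 32) | q_lo;
}
```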


* Re: Linux 3.1-rc9
  2011-10-17  7:55                 ` Martin Schwidefsky
  2011-10-17  9:12                   ` Peter Zijlstra
@ 2011-10-17 20:48                   ` H. Peter Anvin
  2011-10-18  7:20                     ` Martin Schwidefsky
  1 sibling, 1 reply; 156+ messages in thread
From: H. Peter Anvin @ 2011-10-17 20:48 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Thomas Gleixner,
	Ingo Molnar

On 10/17/2011 12:55 AM, Martin Schwidefsky wrote:
> 
> I introduced those macros to find all the places in the kernel operating
> on a cputime value. The additional debug patch defined cputime_t as a
> struct which contained a single u64. That way I got a compiler error
> for every place I missed.
> 

And was there a reason that that structure thingy didn't get merged?

	-hpa
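
For readers unfamiliar with the trick quoted above: wrapping a scalar
in a single-member struct makes the compiler reject any open-coded
arithmetic on it, so every user has to go through accessor helpers. A
minimal sketch, assuming a debug-only definition; cputime_add and
cputime_gt mirror the macro names discussed in the thread, while the
from/to helpers are hypothetical.

```c
#include <stdint.h>

/* Debug-only definition: "a + b" or "a > b" on two cputime_t values
 * is now a compile error instead of silently working. */
typedef struct { uint64_t val; } cputime_t;

static inline cputime_t cputime_from_u64(uint64_t v)
{
	cputime_t c = { v };
	return c;
}

static inline uint64_t cputime_to_u64(cputime_t c)
{
	return c.val;
}

static inline cputime_t cputime_add(cputime_t a, cputime_t b)
{
	return cputime_from_u64(a.val + b.val);
}

static inline int cputime_gt(cputime_t a, cputime_t b)
{
	return a.val > b.val;
}
```

Since the struct holds one u64, it should cost nothing at runtime on
common ABIs; the price is the extra helper boilerplate at every use.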


* Re: Linux 3.1-rc9
  2011-10-17 19:23                         ` Peter Zijlstra
@ 2011-10-17 21:00                           ` Thomas Gleixner
  2011-10-18  8:39                             ` Thomas Gleixner
  0 siblings, 1 reply; 156+ messages in thread
From: Thomas Gleixner @ 2011-10-17 21:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Simon Kirby, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar

On Mon, 17 Oct 2011, Peter Zijlstra wrote:

> On Mon, 2011-10-17 at 11:31 -0700, Linus Torvalds wrote:
> > On Mon, Oct 17, 2011 at 10:54 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > >
> > > I could of course propose this... but I really won't since I'm half
> > > retching by now.. ;-)
> > 
> > Wow. Is this "ugly and fragile code week" and I just didn't get the memo?
> 
> Do I get a prize?
> 
> > I do wonder if we might not fix the problem by just taking the
> > *existing* lock in the right order?
> > 
> > IOW, how nasty would it be to make "scheduler_tick()" just get the
> > cputimer->lock outside of rq->lock?
> > 
> > Sure, we'd hold that lock *much* longer than we need, but how much do
> > we care? Is that a lock that gets contention? It might be the simple
> > solution for now - I *would* like to get 3.1 out..
> 
> Ah, sadly the tick isn't the only one with the inverted callchain,
> pretty much every callchain in the scheduler ends up in update_curr()
> one way or another.
> 
> The easier way around might be something like this... even when two
> threads in a process race to enable this clock, the wasted time is
> pretty much of the same order as we would otherwise have wasted spinning
> on the lock, and the update_gt_cputime() thing would end up moving the
> clock fwd to the latest outcome any which way.
> 
> Hmm.. Thomas, anything?
 
No, that should work. It does not make that call path more racy
against exit, which is another trainwreck at least on 32bit machines
which I discovered while looking for the problems with your patch.

thread_group_cputime() reads task->signal->utime/stime/sum_sched_runtime

These fields are updated in __exit_signal() w/o holding
task->signal->cputimer.lock. So nothing prevents these values from
changing while we read them.

All callers of thread_group_cputime() except the scheduler callpath
hold sighand lock, which is also taken in __exit_signal().

So your patch does not make that particular case worse.

That said, I really need some sleep before I can make a final
judgement on that horror. The call paths are such an intermingled mess
that it's not funny anymore. I'll do that first thing tomorrow morning.

Thanks,

	tglx


* Re: Linux 3.1-rc9
  2011-10-17 20:35                       ` H. Peter Anvin
@ 2011-10-17 21:19                         ` Ingo Molnar
  2011-10-17 21:22                           ` H. Peter Anvin
  2011-10-17 21:31                           ` Ingo Molnar
  0 siblings, 2 replies; 156+ messages in thread
From: Ingo Molnar @ 2011-10-17 21:19 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 10/17/2011 11:49 AM, Ingo Molnar wrote:
> > Do we have some hard data on this, which we could put into comments 
> > in include/linux/ktime.h and such? Older versions of GCC used to do a 
> > bad job of long long handling on 32-bit systems - that might be a 
> > factor in the performance figures.
> > 
> > But i suspect you are right that the cost is still very much there 
> 
> 64/64 division is done bit by bit on most (all?) 32-bit architectures.
> 
> 64/32 division can be done in hardware on some architectures, e.g. x86.

it's 64/32 division - it's the /1000000000 /1000000 /1000 divisions 
in the large majority of cases, to convert between 
seconds/milliseconds/microseconds and scalar nanoseconds.

the kernel-internal ktime_t in the 32-bit optimized case is:

union ktime {
        s32     sec, nsec;
};

which is the same as timespec and arithmetically close to timeval, 
which many ABIs use. So conversion is easy in that case - but 
arithmetic gets a bit harder.

If we used a scalar 64-bit form for all kernel internal time 
representations:

	s64	nsecs;

then conversions back to timespec/timeval would involve dividing this 
64-bit value by 1000000000 or 1000000.

Is there no faster approximation for those than bit by bit?

In particular we could try something like:

	(high*2^32 + low)/1e9 ~==  ( high * (2^64/1e9) ) / 2^32

... which reduces it all to a 64-bit multiplication (or two 32-bit 
multiplications) with a known constant, at the cost of 1 nsec 
imprecision of the result - but that's an OK approximation in my 
opinion.

But it's late here and math is hard - let's go shopping ;-)

Thanks,

	Ingo


* Re: Linux 3.1-rc9
  2011-10-17 21:19                         ` Ingo Molnar
@ 2011-10-17 21:22                           ` H. Peter Anvin
  2011-10-17 21:39                             ` Ingo Molnar
  2011-10-17 21:31                           ` Ingo Molnar
  1 sibling, 1 reply; 156+ messages in thread
From: H. Peter Anvin @ 2011-10-17 21:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky

On 10/17/2011 02:19 PM, Ingo Molnar wrote:
> 
> it's 64/32 division - it's the /1000000000 /1000000 /1000 divisions 
> in the large majority of cases, to convert between 
> seconds/milliseconds/microseconds and scalar nanoseconds.
> 
> the kernel-internal ktime_t in the 32-bit optimized case is:
> 
> union ktime {
>         s32     sec, nsec;
> };
> 
> which is the same as timespec and arithmetically close to timeval, 
> which many ABIs use. So conversion is easy in that case - but 
> arithmetic gets a bit harder.
> 
> If we used a scalar 64-bit form for all kernel internal time 
> representations:
> 
> 	s64	nsecs;
> 
> then conversions back to timespec/timeval would involve dividing this 
> 64-bit value by 1000000000 or 1000000.
> 
> Is there no faster approximation for those than bit by bit?
> 
> In particular we could try something like:
> 
> 	(high*2^32 + low)/1e9 ~==  ( high * (2^64/1e9) ) / 2^32
> 
> ... which reduces it all to a 64-bit multiplication (or two 32-bit 
> multiplications) with a known constant, at the cost of 1 nsec 
> imprecision of the result - but that's an OK approximation in my 
> opinion.
> 

We can do much better than that with reciprocal multiplication.  We're
already playing reciprocal multiplication tricks for jiffies conversion,
and in this case it's much easier because the constant is already known.

	-hpa



* Re: Linux 3.1-rc9
  2011-10-17 21:19                         ` Ingo Molnar
  2011-10-17 21:22                           ` H. Peter Anvin
@ 2011-10-17 21:31                           ` Ingo Molnar
  1 sibling, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2011-10-17 21:31 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky


* Ingo Molnar <mingo@elte.hu> wrote:

> If we used a scalar 64-bit form for all kernel internal time 
> representations:
> 
> 	s64	nsecs;
> 
> then conversions back to timespec/timeval would involve dividing 
> this 64-bit value by 1000000000 or 1000000.
> 
> Is there no faster approximation for those than bit by bit?
> 
> In particular we could try something like:
> 
> 	(high*2^32 + low)/1e9 ~==  ( high * (2^64/1e9) ) / 2^32
> 
> ... which reduces it all to a 64-bit multiplication (or two 32-bit 
> multiplications) with a known constant, at the cost of 1 nsec 
> imprecision of the result - but that's an OK approximation in my 
> opinion.

Hm, no, the numeric error would be in the *seconds* result, and would 
be 0-3 seconds - which is obviously not acceptable.

Thanks,

	Ingo
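
A small check of why the high-word-only estimate lands in the seconds
rather than the nanoseconds: the low 32 bits of a nanosecond count can
alone hold over four seconds, and the estimate discards them. The
sketch below assumes MAGIC = floor(2^64 / 1e9); the function names are
hypothetical, and the estimate is only valid while high * MAGIC fits in
64 bits (roughly high < 2^29).

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define MAGIC        18446744073ULL /* floor(2^64 / 1e9) */

/* Reference result using an ordinary 64-bit division. */
static uint64_t secs_exact(uint64_t nsecs)
{
	return nsecs / NSEC_PER_SEC;
}

/* Estimate from the high word only; the low word is ignored. */
static uint64_t secs_high_only(uint64_t nsecs)
{
	uint64_t high = nsecs >> 32;

	return (high * MAGIC) >> 32;
}
```

For example, nsecs = 0xFFFFFFFF is a little over 4.29 seconds, yet the
estimate returns 0 because the high word is zero.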


* Re: Linux 3.1-rc9
  2011-10-17 21:22                           ` H. Peter Anvin
@ 2011-10-17 21:39                             ` Ingo Molnar
  2011-10-17 22:03                               ` Ingo Molnar
  2011-10-17 22:08                               ` H. Peter Anvin
  0 siblings, 2 replies; 156+ messages in thread
From: Ingo Molnar @ 2011-10-17 21:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 10/17/2011 02:19 PM, Ingo Molnar wrote:
> > 
> > it's 64/32 division - it's the /1000000000 /1000000 /1000 divisions 
> > in the large majority of cases, to convert between 
> > seconds/milliseconds/microseconds and scalar nanoseconds.
> > 
> > the kernel-internal ktime_t in the 32-bit optimized case is:
> > 
> > union ktime {
> >         s32     sec, nsec;
> > };
> > 
> > which is the same as timespec and arithmetically close to timeval, 
> > which many ABIs use. So conversion is easy in that case - but 
> > arithmetic gets a bit harder.
> > 
> > If we used a scalar 64-bit form for all kernel internal time 
> > representations:
> > 
> > 	s64	nsecs;
> > 
> > then conversions back to timespec/timeval would involve dividing this 
> > 64-bit value by 1000000000 or 1000000.
> > 
> > Is there no faster approximation for those than bit by bit?
> > 
> > In particular we could try something like:
> > 
> > 	(high*2^32 + low)/1e9 ~==  ( high * (2^64/1e9) ) / 2^32
> > 
> > ... which reduces it all to a 64-bit multiplication (or two 32-bit 
> > multiplications) with a known constant, at the cost of 1 nsec 
> > imprecision of the result - but that's an OK approximation in my 
> > opinion.
> > 
> 
> We can do much better than that with reciprocal multiplication.  

Yes, 2^64/1e9 is the reciprocal.

Thanks,

	Ingo


* Re: Linux 3.1-rc9
  2011-10-17 21:39                             ` Ingo Molnar
@ 2011-10-17 22:03                               ` Ingo Molnar
  2011-10-17 22:04                                 ` Ingo Molnar
  2011-10-17 22:08                               ` H. Peter Anvin
  1 sibling, 1 reply; 156+ messages in thread
From: Ingo Molnar @ 2011-10-17 22:03 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky


* Ingo Molnar <mingo@elte.hu> wrote:

> > > In particular we could try something like:
> > > 
> > > 	(high*2^32 + low)/1e9 ~==  ( high * (2^64/1e9) ) / 2^32
> > > 
> > > ... which reduces it all to a 64-bit multiplication (or two 
> > > 32-bit multiplications) with a known constant, at the cost of 1 
> > > nsec imprecision of the result - but that's an OK approximation 
> > > in my opinion.
> > > 
> > 
> > We can do much better than that with reciprocal multiplication.
> 
> Yes, 2^64/1e9 is the reciprocal.

So basically, to extend on the pseudocode above, we could do the 
equivalent of:

/* 2^64/1e9: */
#define MAGIC 18446744073ULL

        secs_fast = ((nsecs >> 32) * MAGIC) >> 32;
        secs_fast += (nsecs & 0xFFFFFFFF)/1000000000;

to get to the precise 'timeval.secs' field - these are all 32-bit 
operations: a 32-bit multiplication and a 32-bit division if i 
counted it right.

(Likewise we can get the remainder as well, for timeval.nsecs.)

So I think if we add 32-bit optimized reciprocal multiplication based 
timeval and timespec routines, we can change ktime_t to a simple 
scalar type on 64-bit and 32-bit architectures alike.

It would likely be faster as well: the 32-bit ktime operations are 
more complex than straightforward u64 operations.

Thomas, what do you think?

Thanks,

	Ingo
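
A hedged sketch of an exact reciprocal-multiplication divide by 1e9,
built only from 32x32->64 multiplies. It is not the pseudocode above: a
quick check suggests the two-term form can come out one second low when
the fractional parts of the two halves add up past a full second (for
instance nsecs = 3*2^32 + 200000000 is 13.08 s, but 12 + 0 = 12), and
MAGIC there is wider than 32 bits. The variant below instead uses
1e9 = 2^9 * 1953125 and the constant ceil(2^75 / 1953125). The function
names and the choice of constant are assumptions of this sketch.

```c
#include <stdint.h>

/* ceil(2^75 / 1953125), where 1953125 = 1e9 / 2^9 */
#define RECIP_1953125 19342813113834067ULL

/* High 64 bits of a 64x64 product, using four 32x32->64 multiplies. */
static uint64_t mulhi_u64(uint64_t a, uint64_t b)
{
	uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
	uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

	uint64_t lo_lo = a_lo * b_lo;
	uint64_t hi_lo = a_hi * b_lo;
	uint64_t lo_hi = a_lo * b_hi;
	uint64_t hi_hi = a_hi * b_hi;

	/* Sum the middle column; this cannot overflow 64 bits. */
	uint64_t cross = (lo_lo >> 32) + (uint32_t)hi_lo + lo_hi;

	return hi_hi + (hi_lo >> 32) + (cross >> 32);
}

/* Whole seconds in nsecs: floor(nsecs / 1e9), with no divide. */
static uint64_t ns_to_secs(uint64_t nsecs)
{
	/* (nsecs >> 9) / 1953125 == ((nsecs >> 9) * RECIP) >> 75 */
	return mulhi_u64(nsecs >> 9, RECIP_1953125) >> 11;
}

/* Leftover nanoseconds, always below 1e9. */
static uint32_t ns_remainder(uint64_t nsecs)
{
	return (uint32_t)(nsecs - ns_to_secs(nsecs) * 1000000000ULL);
}
```

The result is exact here because (nsecs >> 9) is below 2^55 and the
rounding excess of the constant (399807 / 2^75 per unit) is too small
to push any quotient across an integer boundary in that range.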


* Re: Linux 3.1-rc9
  2011-10-17 22:03                               ` Ingo Molnar
@ 2011-10-17 22:04                                 ` Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2011-10-17 22:04 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky


* Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > > > In particular we could try something like:
> > > > 
> > > > 	(high*2^32 + low)/1e9 ~==  ( high * (2^64/1e9) ) / 2^32
> > > > 
> > > > ... which reduces it all to a 64-bit multiplication (or two 
> > > > 32-bit multiplications) with a known constant, at the cost of 1 
> > > > nsec imprecision of the result - but that's an OK approximation 
> > > > in my opinion.
> > > > 
> > > 
> > > We can do much better than that with reciprocal multiplication.
> > 
> > Yes, 2^64/1e9 is the reciprocal.
> 
> So basically, to extend on the pseudocode above, we could do the 
> equivalent of:
> 
> /* 2^64/1e9: */
> #define MAGIC 18446744073ULL
> 
>         secs_fast = ((nsecs >> 32) * MAGIC) >> 32;
>         secs_fast += (nsecs & 0xFFFFFFFF)/1000000000;
> 
> to get to the precise 'timeval.secs' field - these are all 32-bit 
> operations: a 32-bit multiplication and a 32-bit division if i 
> counted it right.
> 
> (Likewise we can get the remainder as well, for timeval.nsecs.)

that's timespec.nsecs - there's timeval.usecs. The same argument 
applies in both cases.

This would deobfuscate a rather important data type in the timer 
code.

Thanks,

	Ingo


* Re: Linux 3.1-rc9
  2011-10-17 21:39                             ` Ingo Molnar
  2011-10-17 22:03                               ` Ingo Molnar
@ 2011-10-17 22:08                               ` H. Peter Anvin
  2011-10-18  6:01                                 ` Ingo Molnar
  2011-10-18  7:12                                 ` Geert Uytterhoeven
  1 sibling, 2 replies; 156+ messages in thread
From: H. Peter Anvin @ 2011-10-17 22:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky

On 10/17/2011 02:39 PM, Ingo Molnar wrote:
>>
>> We can do much better than that with reciprocal multiplication.  
> 
> Yes, 2^64/1e9 is the reciprocal.
> 

What I mean is that it's pretty easy to work it so it doesn't have the
errors.  We have 32*32 = 64 multiplication on all 32-bit platforms I'm
99.9% sure.

	-hpa



* Re: Linux 3.1-rc9
  2011-10-12 21:35           ` Simon Kirby
  2011-10-13 23:25             ` Simon Kirby
@ 2011-10-18  5:40             ` Simon Kirby
  1 sibling, 0 replies; 156+ messages in thread
From: Simon Kirby @ 2011-10-18  5:40 UTC (permalink / raw)
  To: Linux Kernel Mailing List, netdev

On Wed, Oct 12, 2011 at 02:35:55PM -0700, Simon Kirby wrote:

> > > patching file kernel/posix-cpu-timers.c
> > > patching file kernel/sched_stats.h 
> > 
> > yes that would be fine.
> 
> This patch (s/raw_//) has been stable on 5 boxes for a day. I'll push to
> another 15 shortly and confirm tomorrow. Meanwhile, we had another ~4
> boxes lock up on 3.1-rc9 _with_ d670ec13 reverted (all CPUs spinning),
> but there weren't enough serial cables to log all of them and we haven't
> been lucky enough to capture anything other than what fits on 80x25.
> I'm hoping it's just the same bug you've already fixed.

Looks to be a different bug. It just happened on a box with serial
console logging, on the same build I was testing the above patch on --
Linus master circa Oct 7th. This seems to be specific to TCP. I'm not
sure what is with all of the doubled backtraces. I've only seen this on
a couple of different boxes so far.

Full log at http://0x.ca/sim/ref/3.1-rc9/3.1-rc9-tcp-lockup.log

First 100 lines:

[516112.140013] BUG: soft lockup - CPU#0 stuck for 22s! [swapper:0]
[516112.144001] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[516112.144001] CPU 0 
[516112.144001] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[516112.144001] 
[516112.144001] Pid: 0, comm: swapper Not tainted 3.1.0-rc9-hw+ #48 Dell Inc. PowerEdge 1950/0UR033
[516112.144001] RIP: 0010:[<ffffffff816b6694>]  [<ffffffff816b6694>] _raw_spin_lock+0x14/0x20
[516112.144001] RSP: 0018:ffff88022fc03e10  EFLAGS: 00000297
[516112.144001] RAX: 0000000000000100 RBX: ffffffff81022674 RCX: ffffffff81b4df20
[516112.144001] RDX: ffff8801002aebe0 RSI: dead000000200200 RDI: ffff8801002ad188
[516112.144001] RBP: ffff88022fc03e10 R08: 00000000000000f7 R09: 0000000000000000
[516112.144001] R10: 0000000000000000 R11: 0000000000000010 R12: ffff88022fc03d88
[516112.144001] R13: ffffffff816bed1e R14: ffff88022fc03e10 R15: ffffffff81b4df00
[516112.144001] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
[516112.244020] BUG: soft lockup - CPU#1 stuck for 22s! [kworker/0:0:0]
[516112.244024] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[516112.244033] CPU 1 
[516112.244035] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[516112.244041] 
[516112.244044] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc9-hw+ #48 Dell Inc. PowerEdge 1950/0UR033
[516112.244048] RIP: 0010:[<ffffffff816b6694>]  [<ffffffff816b6694>] _raw_spin_lock+0x14/0x20
[516112.244057] RSP: 0018:ffff88022fc43e10  EFLAGS: 00000297
[516112.244059] RAX: 0000000000000100 RBX: ffffffff81022674 RCX: ffff880226888020
[516112.244062] RDX: ffff88001ece1aa0 RSI: dead000000200200 RDI: ffff88001ece1f88
[516112.244064] RBP: ffff88022fc43e10 R08: 00000000000000df R09: 0000000000000000
[516112.244066] R10: 0000000000000000 R11: 0000000000000010 R12: ffff88022fc43d88
[516112.244068] R13: ffffffff816bed1e R14: ffff88022fc43e10 R15: ffff880226888000
[516112.244071] FS:  0000000000000000(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
[516112.244074] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[516112.244076] CR2: ffffffffff600400 CR3: 0000000126d93000 CR4: 00000000000006e0
[516112.244078] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[516112.244081] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[516112.244083] Process kworker/0:0 (pid: 0, threadinfo ffff880226918000, task ffff880226911640)
[516112.244085] Stack:
[516112.244086]  ffff88022fc43e40 ffffffff8162a613 0000000000000000 0000000000000000
[516112.244090]  ffff880226888000 ffff88001ece20e0 ffff88022fc43ee0 ffffffff810692dc
[516112.244094]  0000000000000000 ffff880226919fd8 ffff880226919fd8 ffff880226919fd8
[516112.244098] Call Trace:
[516112.244099]  <IRQ> 
[516112.244105]  [<ffffffff8162a613>] tcp_keepalive_timer+0x23/0x260
[516112.244110]  [<ffffffff810692dc>] run_timer_softirq+0x1ac/0x310
[516112.244113]  [<ffffffff8162a5f0>] ? tcp_init_xmit_timers+0x20/0x20
[516112.244118]  [<ffffffff8102e838>] ? lapic_next_event+0x18/0x20
[516112.244121]  [<ffffffff81060bf0>] __do_softirq+0xe0/0x1d0
[516112.244125]  [<ffffffff816c04ac>] call_softirq+0x1c/0x30
[516112.244129]  [<ffffffff81014255>] do_softirq+0x65/0xa0
[516112.244132]  [<ffffffff810608fd>] irq_exit+0xad/0xe0
[516112.244135]  [<ffffffff8102f569>] smp_apic_timer_interrupt+0x69/0xa0
[516112.244139]  [<ffffffff816bed1e>] apic_timer_interrupt+0x6e/0x80
[516112.244140]  <EOI> 
[516112.244144]  [<ffffffff8101a337>] ? mwait_idle+0x117/0x120
[516112.244147]  [<ffffffff810120c6>] cpu_idle+0x86/0xe0
[516112.244151]  [<ffffffff816ae77c>] start_secondary+0x1a3/0x1e7
[516112.244153] Code: 0f b6 c2 85 c0 c9 0f 95 c0 0f b6 c0 c3 66 2e 0f 1f 84 00 00 00 00 00 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 38 e0 74 06 f3 90 <8a> 07 eb f6 c9 c3 66 0f 1f 44 00 00 55 48 89 e5 9c 58 66 66 90 
[516112.244173] Call Trace:
[516112.244174]  <IRQ>  [<ffffffff8162a613>] tcp_keepalive_timer+0x23/0x260
[516112.244179]  [<ffffffff810692dc>] run_timer_softirq+0x1ac/0x310
[516112.244182]  [<ffffffff8162a5f0>] ? tcp_init_xmit_timers+0x20/0x20
[516112.244185]  [<ffffffff8102e838>] ? lapic_next_event+0x18/0x20
[516112.244188]  [<ffffffff81060bf0>] __do_softirq+0xe0/0x1d0
[516112.244191]  [<ffffffff816c04ac>] call_softirq+0x1c/0x30
[516112.244194]  [<ffffffff81014255>] do_softirq+0x65/0xa0
[516112.244197]  [<ffffffff810608fd>] irq_exit+0xad/0xe0
[516112.244199]  [<ffffffff8102f569>] smp_apic_timer_interrupt+0x69/0xa0
[516112.244202]  [<ffffffff816bed1e>] apic_timer_interrupt+0x6e/0x80
[516112.244204]  <EOI>  [<ffffffff8101a337>] ? mwait_idle+0x117/0x120
[516112.244209]  [<ffffffff810120c6>] cpu_idle+0x86/0xe0
[516112.244212]  [<ffffffff816ae77c>] start_secondary+0x1a3/0x1e7
[516112.344023] BUG: soft lockup - CPU#2 stuck for 23s! [php:1486]
[516112.344025] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[516112.344033] CPU 2 
[516112.344034] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[516112.344040] 
[516112.344042] Pid: 1486, comm: php Not tainted 3.1.0-rc9-hw+ #48 Dell Inc. PowerEdge 1950/0UR033
[516112.344046] RIP: 0010:[<ffffffff816b6694>]  [<ffffffff816b6694>] _raw_spin_lock+0x14/0x20
[516112.344051] RSP: 0000:ffff88022fc83e10  EFLAGS: 00000297
[516112.344053] RAX: 0000000000000100 RBX: ffffffff81022674 RCX: ffff880226920020
[516112.344056] RDX: ffff88022198c660 RSI: dead000000200200 RDI: ffff8800ac758cc8
[516112.344058] RBP: ffff88022fc83e10 R08: 00000000000000ef R09: 0000000000000000
[516112.344060] R10: 000000000000018b R11: 0000000000000010 R12: ffff88022fc83d88
[516112.344062] R13: ffffffff816bed1e R14: ffff88022fc83e10 R15: ffff880226920000
[516112.344065] FS:  00007faafda03720(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
[516112.344068] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[516112.344070] CR2: ffffffffff600400 CR3: 00000002223de000 CR4: 00000000000006e0
[516112.344072] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[516112.344075] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[516112.344077] Process php (pid: 1486, threadinfo ffff880039262000, task ffff88003e675900)
[516112.344079] Stack:
[516112.344081]  ffff88022fc83e40 ffffffff8162a613 0000000000000000 0000000000000000
[516112.344084]  ffff880226920000 ffff8800ac758e20 ffff88022fc83ee0 ffffffff810692dc
[516112.344088]  0000000000000001 ffff880039263fd8 ffff880039263fd8 ffff880039263fd8
[516112.344091] Call Trace:
[516112.344093]  <IRQ> 
[516112.344099]  [<ffffffff8162a613>] tcp_keepalive_timer+0x23/0x260
[516112.344104]  [<ffffffff810692dc>] run_timer_softirq+0x1ac/0x310
[516112.344107]  [<ffffffff8162a5f0>] ? tcp_init_xmit_timers+0x20/0x20
[516112.344111]  [<ffffffff8102e838>] ? lapic_next_event+0x18/0x20
[516112.344115]  [<ffffffff81060bf0>] __do_softirq+0xe0/0x1d0
[516112.344119]  [<ffffffff816c04ac>] call_softirq+0x1c/0x30
[516112.344123]  [<ffffffff81014255>] do_softirq+0x65/0xa0

Simon-


* Re: Linux 3.1-rc9
  2011-10-17 22:08                               ` H. Peter Anvin
@ 2011-10-18  6:01                                 ` Ingo Molnar
  2011-10-18  7:12                                 ` Geert Uytterhoeven
  1 sibling, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2011-10-18  6:01 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 10/17/2011 02:39 PM, Ingo Molnar wrote:
> >>
> >> We can do much better than that with reciprocal multiplication.  
> > 
> > Yes, 2^64/1e9 is the reciprocal.
> > 
> 
> What I mean is that it's pretty easy to work it so it doesn't have 
> the errors. [...]

Yeah - the second pseudocode i gave will do that with no errors.

> [...] We have 32*32 = 64 multiplication on all 32-bit platforms I'm 
> 99.9% sure.

Good.

Thanks,

	Ingo


* Re: Linux 3.1-rc9
  2011-10-17 22:08                               ` H. Peter Anvin
  2011-10-18  6:01                                 ` Ingo Molnar
@ 2011-10-18  7:12                                 ` Geert Uytterhoeven
  2011-10-18 18:50                                   ` H. Peter Anvin
  1 sibling, 1 reply; 156+ messages in thread
From: Geert Uytterhoeven @ 2011-10-18  7:12 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds, Simon Kirby,
	Peter Zijlstra, Linux Kernel Mailing List, Dave Jones,
	Martin Schwidefsky

On Tue, Oct 18, 2011 at 00:08, H. Peter Anvin <hpa@zytor.com> wrote:
> On 10/17/2011 02:39 PM, Ingo Molnar wrote:
>>> We can do much better than that with reciprocal multiplication.
>>
>> Yes, 2^64/1e9 is the reciprocal.
>
> What I mean is that it's pretty easy to work it so it doesn't have the
> errors.  We have 32*32 = 64 multiplication on all 32-bit platforms I'm
> 99.9% sure.

I assume you mean "we have in hardware"?

Is that muldi3?

$ git ls-files "*muldi3*"
arch/arm/lib/muldi3.S
arch/blackfin/lib/muldi3.S
arch/frv/lib/__muldi3.S
arch/m68k/lib/muldi3.c
arch/microblaze/lib/muldi3.c
arch/sparc/lib/muldi3.S
$

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: Linux 3.1-rc9
  2011-10-17 20:48                   ` H. Peter Anvin
@ 2011-10-18  7:20                     ` Martin Schwidefsky
  0 siblings, 0 replies; 156+ messages in thread
From: Martin Schwidefsky @ 2011-10-18  7:20 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Thomas Gleixner,
	Ingo Molnar

On Mon, 17 Oct 2011 13:48:33 -0700
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On 10/17/2011 12:55 AM, Martin Schwidefsky wrote:
> > 
> > I introduced those macros to find all the places in the kernel operating
> > on a cputime value. The additional debug patch defined cputime_t as a
> > struct which contained a single u64. That way I got a compiler error
> > for every place I missed.
> > 
> 
> And was there a reason that that structure thingy didn't get merged?

Oh yes, it is fragile, hackish and ugly as hell.
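
For readers unfamiliar with the trick described in the quoted text, a minimal sketch of what such a debug-only definition might look like. This is not the unmerged patch itself; the field name `val` and the use of inline helpers rather than macros are assumptions for illustration.

```c
#include <stdint.h>

/*
 * Debug-only definition: wrapping the scalar in a struct makes plain
 * arithmetic on it ill-formed, so an open-coded "a + b" or "a > b"
 * on cputime values fails to compile and points at the missed spot.
 */
typedef struct { uint64_t val; } cputime_t;

/* Every operation has to go through an accessor like these. */
static inline cputime_t cputime_add(cputime_t a, cputime_t b)
{
	cputime_t sum = { a.val + b.val };

	return sum;
}

static inline int cputime_gt(cputime_t a, cputime_t b)
{
	return a.val > b.val;
}
```

With this in place, `times->utime += t->utime` would be rejected by the compiler, while `times->utime = cputime_add(times->utime, t->utime)` still builds.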

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* Re: Linux 3.1-rc9
  2011-10-17 21:00                           ` Thomas Gleixner
@ 2011-10-18  8:39                             ` Thomas Gleixner
  2011-10-18  9:05                               ` Peter Zijlstra
  0 siblings, 1 reply; 156+ messages in thread
From: Thomas Gleixner @ 2011-10-18  8:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Simon Kirby, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar

On Mon, 17 Oct 2011, Thomas Gleixner wrote:
> That said, I really need some sleep before I can make a final
> judgement on that horror. The call paths are such an intermingled mess
> that it's not funny anymore. I do that tomorrow morning first thing.

The patch is safe and the exit race just existed in my confused tired
brain. Peter, can you please provide a changelog. That wants a cc
stable as well, because that deadlock causing commit hit 3.0.7 :(

Thanks,

	tglx


* Re: Linux 3.1-rc9
  2011-10-18  8:39                             ` Thomas Gleixner
@ 2011-10-18  9:05                               ` Peter Zijlstra
  2011-10-18 14:59                                 ` Linus Torvalds
                                                   ` (2 more replies)
  0 siblings, 3 replies; 156+ messages in thread
From: Peter Zijlstra @ 2011-10-18  9:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Simon Kirby, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar

On Tue, 2011-10-18 at 10:39 +0200, Thomas Gleixner wrote:
> On Mon, 17 Oct 2011, Thomas Gleixner wrote:
> > That said, I really need some sleep before I can make a final
> > judgement on that horror. The call paths are such an intermingled mess
> > that it's not funny anymore. I do that tomorrow morning first thing.
> 
> The patch is safe and the exit race just existed in my confused tired
> brain. Peter, can you please provide a changelog. That wants a cc
> stable as well, because that deadlock causing commit hit 3.0.7 :( 

---
Subject: cputimer: Cure lock inversion
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Mon Oct 17 11:50:30 CEST 2011

There's a lock inversion between the cputimer->lock and rq->lock; notably
the two callchains involved are:

 update_rlimit_cpu()
   sighand->siglock
   set_process_cpu_timer()
     cpu_timer_sample_group()
       thread_group_cputimer()
         cputimer->lock
         thread_group_cputime()
           task_sched_runtime()
             ->pi_lock
             rq->lock

 scheduler_tick()
   rq->lock
   task_tick_fair()
     update_curr()
       account_group_exec()
         cputimer->lock

Where the first one is enabling a CLOCK_PROCESS_CPUTIME_ID timer, and
the second one is keeping it up-to-date.

This problem was introduced by e8abccb7193 ("posix-cpu-timers: Cure
SMP accounting oddities").

Cure the problem by removing the cputimer->lock and rq->lock nesting.
This leaves concurrent enablers doing duplicate work, but the time
wasted should be on the same order as the time otherwise wasted
spinning on the lock, and the greater-than assignment filter should
ensure we preserve monotonicity.

Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Simon Kirby <sim@hostway.ca>
Cc: stable@kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/posix-cpu-timers.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
Index: linux-2.6/kernel/posix-cpu-timers.c
===================================================================
--- linux-2.6.orig/kernel/posix-cpu-timers.c
+++ linux-2.6/kernel/posix-cpu-timers.c
@@ -274,9 +274,7 @@ void thread_group_cputimer(struct task_s
 	struct task_cputime sum;
 	unsigned long flags;
 
-	spin_lock_irqsave(&cputimer->lock, flags);
 	if (!cputimer->running) {
-		cputimer->running = 1;
 		/*
 		 * The POSIX timer interface allows for absolute time expiry
 		 * values through the TIMER_ABSTIME flag, therefore we have
@@ -284,8 +282,11 @@ void thread_group_cputimer(struct task_s
 		 * it.
 		 */
 		thread_group_cputime(tsk, &sum);
+		spin_lock_irqsave(&cputimer->lock, flags);
+		cputimer->running = 1;
 		update_gt_cputime(&cputimer->cputime, &sum);
-	}
+	} else
+		spin_lock_irqsave(&cputimer->lock, flags);
 	*times = cputimer->cputime;
 	spin_unlock_irqrestore(&cputimer->lock, flags);
 }
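
A userspace sketch of the reordering the patch above performs, for illustration only. It assumes a pthread mutex in place of cputimer->lock and a caller-supplied sampling callback in place of thread_group_cputime(); struct group_timer, sample_from_counter() and group_timer_read() are hypothetical names, not kernel code.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's thread-group cputimer. */
struct group_timer {
	pthread_mutex_t lock;	/* plays the role of cputimer->lock */
	atomic_int running;
	uint64_t runtime;	/* protected by lock */
};

/* Stand-in for thread_group_cputime(), which ends up taking rq->lock. */
typedef uint64_t (*sample_fn)(void *ctx);

/* Trivial sampler used for demonstration: reads a counter. */
static uint64_t sample_from_counter(void *ctx)
{
	return *(uint64_t *)ctx;
}

static uint64_t group_timer_read(struct group_timer *t, sample_fn sample,
				 void *ctx)
{
	uint64_t now;

	if (!atomic_load(&t->running)) {
		/* Sample with t->lock NOT held, so nothing nests inside it. */
		uint64_t sum = sample(ctx);

		pthread_mutex_lock(&t->lock);
		atomic_store(&t->running, 1);
		/* Greater-than filter: a stale sample never moves time back. */
		if (sum > t->runtime)
			t->runtime = sum;
	} else {
		pthread_mutex_lock(&t->lock);
	}
	now = t->runtime;
	pthread_mutex_unlock(&t->lock);
	return now;
}
```

Two callers racing through the not-running branch both pay for a sample, which is the duplicate work the changelog accepts; the greater-than update is what keeps the stored value monotonic in that case.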



* Re: Linux 3.1-rc9
  2011-10-18  9:05                               ` Peter Zijlstra
@ 2011-10-18 14:59                                 ` Linus Torvalds
  2011-10-18 15:26                                   ` Thomas Gleixner
                                                     ` (2 more replies)
  2011-10-18 16:13                                 ` Linux 3.1-rc9 Dave Jones
  2011-10-18 18:20                                 ` Simon Kirby
  2 siblings, 3 replies; 156+ messages in thread
From: Linus Torvalds @ 2011-10-18 14:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Simon Kirby, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar

On Tue, Oct 18, 2011 at 2:05 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> Subject: cputimer: Cure lock inversion
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Mon Oct 17 11:50:30 CEST 2011
>
> There's a lock inversion between the cputimer->lock and rq->lock; notably
> the two callchains involved are:

Thanks, looks nice and small. Simon - can you check that this works for you?

Thomas/Ingo - once confirmed by Simon, should I take it directly or
will this come through your trees?

                                 Linus


* Re: Linux 3.1-rc9
  2011-10-18 14:59                                 ` Linus Torvalds
@ 2011-10-18 15:26                                   ` Thomas Gleixner
  2011-10-18 18:07                                   ` Ingo Molnar
  2011-10-18 18:14                                   ` [GIT PULL] timer fix Ingo Molnar
  2 siblings, 0 replies; 156+ messages in thread
From: Thomas Gleixner @ 2011-10-18 15:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Simon Kirby, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar

On Tue, 18 Oct 2011, Linus Torvalds wrote:

> On Tue, Oct 18, 2011 at 2:05 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >
> > Subject: cputimer: Cure lock inversion
> > From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Date: Mon Oct 17 11:50:30 CEST 2011
> >
> > There's a lock inversion between the cputimer->lock and rq->lock; notably
> > the two callchains involved are:
> 
> Thanks, looks nice and small. Simon - can you check that this works for you?
> 
> Thomas/Ingo - once confirmed by Simon, should I take it directly or
> will this come through your trees?

I have it queued in timers/urgent and run it through testing at the
moment. Will send a pull request once confirmed.

Thanks,

	tglx



* Re: Linux 3.1-rc9
  2011-10-18  9:05                               ` Peter Zijlstra
  2011-10-18 14:59                                 ` Linus Torvalds
@ 2011-10-18 16:13                                 ` Dave Jones
  2011-10-18 18:20                                 ` Simon Kirby
  2 siblings, 0 replies; 156+ messages in thread
From: Dave Jones @ 2011-10-18 16:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Linus Torvalds, Simon Kirby,
	Linux Kernel Mailing List, Martin Schwidefsky, Ingo Molnar

On Tue, Oct 18, 2011 at 11:05:13AM +0200, Peter Zijlstra wrote:
 
 > Reported-by: Dave Jones <davej@redhat.com>

Ok, feel free to add a 
 
Tested-by: Dave Jones <davej@redhat.com>

too. Seems to do the right thing here.

	Dave


* Re: Linux 3.1-rc9
  2011-10-18 14:59                                 ` Linus Torvalds
  2011-10-18 15:26                                   ` Thomas Gleixner
@ 2011-10-18 18:07                                   ` Ingo Molnar
  2011-10-18 18:14                                   ` [GIT PULL] timer fix Ingo Molnar
  2 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2011-10-18 18:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Thomas Gleixner, Simon Kirby,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Oct 18, 2011 at 2:05 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >
> > Subject: cputimer: Cure lock inversion
> > From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Date: Mon Oct 17 11:50:30 CEST 2011
> >
> > There's a lock inversion between the cputimer->lock and rq->lock; notably
> > the two callchains involved are:
> 
> Thanks, looks nice and small. Simon - can you check that this works for you?
> 
> Thomas/Ingo - once confirmed by Simon, should I take it directly or
> will this come through your trees?

Yeah, we have it in -tip already and it was tested all day, lemme 
cook up a pull request to not hold up the v3.1 release much longer..

Thanks,

	Ingo


* [GIT PULL] timer fix
  2011-10-18 14:59                                 ` Linus Torvalds
  2011-10-18 15:26                                   ` Thomas Gleixner
  2011-10-18 18:07                                   ` Ingo Molnar
@ 2011-10-18 18:14                                   ` Ingo Molnar
  2 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2011-10-18 18:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Thomas Gleixner, Simon Kirby,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Andrew Morton


Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://tesla.tglx.de/git/linux-2.6-tip.git timers-urgent-for-linus

 Thanks,

	Ingo

------------------>
Peter Zijlstra (1):
      cputimer: Cure lock inversion


 kernel/posix-cpu-timers.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index c8008dd..640ded8 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -274,9 +274,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 	struct task_cputime sum;
 	unsigned long flags;
 
-	spin_lock_irqsave(&cputimer->lock, flags);
 	if (!cputimer->running) {
-		cputimer->running = 1;
 		/*
 		 * The POSIX timer interface allows for absolute time expiry
 		 * values through the TIMER_ABSTIME flag, therefore we have
@@ -284,8 +282,11 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 		 * it.
 		 */
 		thread_group_cputime(tsk, &sum);
+		spin_lock_irqsave(&cputimer->lock, flags);
+		cputimer->running = 1;
 		update_gt_cputime(&cputimer->cputime, &sum);
-	}
+	} else
+		spin_lock_irqsave(&cputimer->lock, flags);
 	*times = cputimer->cputime;
 	spin_unlock_irqrestore(&cputimer->lock, flags);
 }


* Re: Linux 3.1-rc9
  2011-10-18  9:05                               ` Peter Zijlstra
  2011-10-18 14:59                                 ` Linus Torvalds
  2011-10-18 16:13                                 ` Linux 3.1-rc9 Dave Jones
@ 2011-10-18 18:20                                 ` Simon Kirby
  2011-10-18 19:48                                   ` Thomas Gleixner
  2 siblings, 1 reply; 156+ messages in thread
From: Simon Kirby @ 2011-10-18 18:20 UTC (permalink / raw)
  To: Peter Zijlstra, Linus Torvalds
  Cc: Thomas Gleixner, Linux Kernel Mailing List, Dave Jones,
	Martin Schwidefsky, Ingo Molnar

On Tue, Oct 18, 2011 at 11:05:13AM +0200, Peter Zijlstra wrote:

> Subject: cputimer: Cure lock inversion
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Mon Oct 17 11:50:30 CEST 2011
> 
> There's a lock inversion between the cputimer->lock and rq->lock; notably
> the two callchains involved are:
> 
>  update_rlimit_cpu()
>    sighand->siglock
>    set_process_cpu_timer()
>      cpu_timer_sample_group()
>        thread_group_cputimer()
>          cputimer->lock
>          thread_group_cputime()
>            task_sched_runtime()
>              ->pi_lock
>              rq->lock
> 
>  scheduler_tick()
>    rq->lock
>    task_tick_fair()
>      update_curr()
>        account_group_exec()
>          cputimer->lock
> 
> Where the first one is enabling a CLOCK_PROCESS_CPUTIME_ID timer, and
> the second one is keeping it up-to-date.
> 
> This problem was introduced by e8abccb7193 ("posix-cpu-timers: Cure
> SMP accounting oddities").
> 
> Cure the problem by removing the cputimer->lock and rq->lock nesting.
> This leaves concurrent enablers doing duplicate work, but the time
> wasted should be on the same order as the time otherwise wasted
> spinning on the lock, and the greater-than assignment filter should
> ensure we preserve monotonicity.
> 
> Reported-by: Dave Jones <davej@redhat.com>
> Reported-by: Simon Kirby <sim@hostway.ca>
> Cc: stable@kernel.org
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  kernel/posix-cpu-timers.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> Index: linux-2.6/kernel/posix-cpu-timers.c
> ===================================================================
> --- linux-2.6.orig/kernel/posix-cpu-timers.c
> +++ linux-2.6/kernel/posix-cpu-timers.c
> @@ -274,9 +274,7 @@ void thread_group_cputimer(struct task_s
>  	struct task_cputime sum;
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&cputimer->lock, flags);
>  	if (!cputimer->running) {
> -		cputimer->running = 1;
>  		/*
>  		 * The POSIX timer interface allows for absolute time expiry
>  		 * values through the TIMER_ABSTIME flag, therefore we have
> @@ -284,8 +282,11 @@ void thread_group_cputimer(struct task_s
>  		 * it.
>  		 */
>  		thread_group_cputime(tsk, &sum);
> +		spin_lock_irqsave(&cputimer->lock, flags);
> +		cputimer->running = 1;
>  		update_gt_cputime(&cputimer->cputime, &sum);
> -	}
> +	} else
> +		spin_lock_irqsave(&cputimer->lock, flags);
>  	*times = cputimer->cputime;
>  	spin_unlock_irqrestore(&cputimer->lock, flags);
>  }
> 

Tested-by: Simon Kirby <sim@hostway.ca>

Looks good running on three boxes since this morning (unpatched kernel
hangs in ~15 minutes).

While I have your eyes, does this hang trace make any sense (which
happened a couple of times with your previous patch applied)?

http://0x.ca/sim/ref/3.1-rc9/3.1-rc9-tcp-lockup.log

I don't see how all CPUs could be spinning on the same lock without
reentry, and I don't see any in the backtraces.

Simon-


* Re: Linux 3.1-rc9
  2011-10-18  7:12                                 ` Geert Uytterhoeven
@ 2011-10-18 18:50                                   ` H. Peter Anvin
  0 siblings, 0 replies; 156+ messages in thread
From: H. Peter Anvin @ 2011-10-18 18:50 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds, Simon Kirby,
	Peter Zijlstra, Linux Kernel Mailing List, Dave Jones,
	Martin Schwidefsky

On 10/18/2011 12:12 AM, Geert Uytterhoeven wrote:
> 
> I assume you mean "we have in hardware"?
> 

No.

> Is that muldi3?

No.

	-hpa


* Re: Linux 3.1-rc9
  2011-10-18 18:20                                 ` Simon Kirby
@ 2011-10-18 19:48                                   ` Thomas Gleixner
  2011-10-18 20:12                                     ` Linus Torvalds
  2011-10-24 19:02                                     ` Simon Kirby
  0 siblings, 2 replies; 156+ messages in thread
From: Thomas Gleixner @ 2011-10-18 19:48 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Peter Zijlstra, Linus Torvalds, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar, David Miller

On Tue, 18 Oct 2011, Simon Kirby wrote:
> Looks good running on three boxes since this morning (unpatched kernel
> hangs in ~15 minutes).
> 
> While I have your eyes, does this hang trace make any sense (which
> happened a couple of times with your previous patch applied)?
> 
> http://0x.ca/sim/ref/3.1-rc9/3.1-rc9-tcp-lockup.log
> 
> I don't see how all CPUs could be spinning on the same lock without
> reentry, and I don't see any in the backtraces.

Weird.

Which version of Peters patches was this, the extra lock or the
atomic64 thingy?

It does not look related. Could you try to reproduce that problem with
lockdep enabled? lockdep might make it go away, but it's definitely
worth a try.

Thanks,

	tglx


* Re: Linux 3.1-rc9
  2011-10-18 19:48                                   ` Thomas Gleixner
@ 2011-10-18 20:12                                     ` Linus Torvalds
  2011-10-25 15:26                                       ` Simon Kirby
  2011-10-24 19:02                                     ` Simon Kirby
  1 sibling, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2011-10-18 20:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Simon Kirby, Peter Zijlstra, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar, David Miller

On Tue, Oct 18, 2011 at 12:48 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> It does not look related.

Yeah, the only lock held there seems to be the socket lock, and it
looks like all CPUs are spinning on it.

> Could you try to reproduce that problem with
> lockdep enabled? lockdep might make it go away, but it's definitely
> worth a try.

And DEBUG_SPINLOCK / DEBUG_SPINLOCK_SLEEP too. Maybe you're triggering
some odd networking thing.  It sounds unlikely, but maybe some error
case you get into doesn't release the socket lock.

I think PROVE_LOCKING already enables DEBUG_SPINLOCK, but the sleeping
lock thing is separate, iirc.

                    Linus


* Re: Linux 3.1-rc9
  2011-10-17  1:39               ` Linus Torvalds
                                   ` (2 preceding siblings ...)
  2011-10-17 10:34                 ` Peter Zijlstra
@ 2011-10-20 14:36                 ` Martin Schwidefsky
  2011-10-23 11:34                   ` Ingo Molnar
  3 siblings, 1 reply; 156+ messages in thread
From: Martin Schwidefsky @ 2011-10-20 14:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Simon Kirby, Peter Zijlstra, Linux Kernel Mailing List,
	Dave Jones, Thomas Gleixner, Ingo Molnar

On Sun, 16 Oct 2011 18:39:57 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> That stupid definition of cputime_add() has apparently existed as-is
> since it was introduced in 2005. Why do we have code like this:
> 
>     times->utime = cputime_add(times->utime, t->utime);
> 
> instead of just
> 
>     times->utime += t->utime;
> 
> which seems not just shorter, but more readable too? The reason is not
> some type safety in the cputime_add() thing, it's just a macro.
> 
> Added Martin and Ingo to the discussion - Martin because he added that
> cputime_add in the first place, and Ingo because he gets the most hits
> on kernel/sched_stats.h. Guys - you can see the history on lkml.

I tried my luck with cputime and sparse, and it seems to work. I've added
sparse __nocast to the typedefs of cputime_t and cputime64_t and removed
all cputime macros for simple scalar operations on cputime.

Compiling an x86-64 tree with C=1 still gives a few warnings:

1) sparse creates a warning if a pointer to a __nocast variable is created.
   Is that intentional?
2) uptime_proc_show is borked: it uses a cputime_t to accumulate the idle
   time over the processors. This will overflow on x86-32 after 12.45 days
   of idle time. The __nocast check of sparse correctly identifies this
   as a problem:
   fs/proc/uptime.c:18:49: warning: implicit cast to/from nocast type.
3) the cpufreq governors do strange things with cputime, e.g. wall time that
   is kept in a cputime64_t.

The patch is quite big. Comments ?
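
For orientation before the diff, a minimal standalone sketch of the annotation pattern it applies. The __CHECKER__ guard mirrors how the kernel defines these attributes for sparse; the userspace wrapper, the uint64_t typedef and the two helper functions are assumptions for this example and are not taken from the patch.

```c
#include <stdint.h>

/* sparse defines __CHECKER__; a normal compiler sees empty macros. */
#ifdef __CHECKER__
# define __nocast	__attribute__((nocast))
# define __force	__attribute__((force))
#else
# define __nocast
# define __force
#endif

#define NSEC_PER_USEC	1000ULL

/* Implicit conversions to or from this type should draw a sparse warning. */
typedef uint64_t __nocast cputime_t;

/* Conversions are spelled out with __force so sparse accepts them. */
static inline uint64_t cputime_to_usecs(cputime_t ct)
{
	return (__force uint64_t)ct / NSEC_PER_USEC;
}

static inline cputime_t usecs_to_cputime(uint64_t usecs)
{
	return (__force cputime_t)(usecs * NSEC_PER_USEC);
}
```

Built with a regular compiler the annotations vanish and cputime_t is an ordinary 64-bit integer, so `+=` and comparisons keep working without helper macros; only a sparse run is expected to flag mixing it with other integer types.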

---
Subject: [PATCH] cputime: add sparse checking and cleanup

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

Make cputime_t and cputime64_t nocast to enable sparse checking to
detect incorrect use of cputime. Drop the cputime macros for simple
scalar operations. The conversion macros are still needed as well as
the cputime_zero, cputime64_zero and cputime_one_jiffy defines.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 arch/ia64/include/asm/cputime.h        |   72 ++++++++---------
 arch/powerpc/include/asm/cputime.h     |   71 ++++++----------
 arch/s390/include/asm/cputime.h        |  139 +++++++++++++++------------------
 drivers/cpufreq/cpufreq_conservative.c |   27 +++---
 drivers/cpufreq/cpufreq_ondemand.c     |   31 +++----
 drivers/cpufreq/cpufreq_stats.c        |    5 -
 drivers/macintosh/rack-meter.c         |   11 --
 fs/proc/array.c                        |    4 
 fs/proc/stat.c                         |   25 ++---
 fs/proc/uptime.c                       |    2 
 include/asm-generic/cputime.h          |   64 +++++++--------
 kernel/acct.c                          |    4 
 kernel/cpu.c                           |    3 
 kernel/exit.c                          |   22 +----
 kernel/itimer.c                        |   19 ++--
 kernel/posix-cpu-timers.c              |  125 ++++++++++++-----------------
 kernel/sched.c                         |   80 ++++++++----------
 kernel/sched_stats.h                   |    6 -
 kernel/signal.c                        |    6 -
 kernel/sys.c                           |    4 
 kernel/tsacct.c                        |    4 
 21 files changed, 318 insertions(+), 406 deletions(-)

--- a/arch/ia64/include/asm/cputime.h
+++ b/arch/ia64/include/asm/cputime.h
@@ -26,59 +26,54 @@
 #include <linux/jiffies.h>
 #include <asm/processor.h>
 
-typedef u64 cputime_t;
-typedef u64 cputime64_t;
+typedef u64 __nocast cputime_t;
+typedef u64 __nocast cputime64_t;
 
-#define cputime_zero			((cputime_t)0)
+#define cputime_zero			((__force cputime_t) 0ULL)
 #define cputime_one_jiffy		jiffies_to_cputime(1)
-#define cputime_max			((~((cputime_t)0) >> 1) - 1)
-#define cputime_add(__a, __b)		((__a) +  (__b))
-#define cputime_sub(__a, __b)		((__a) -  (__b))
-#define cputime_div(__a, __n)		((__a) /  (__n))
-#define cputime_halve(__a)		((__a) >> 1)
-#define cputime_eq(__a, __b)		((__a) == (__b))
-#define cputime_gt(__a, __b)		((__a) >  (__b))
-#define cputime_ge(__a, __b)		((__a) >= (__b))
-#define cputime_lt(__a, __b)		((__a) <  (__b))
-#define cputime_le(__a, __b)		((__a) <= (__b))
-
-#define cputime64_zero			((cputime64_t)0)
-#define cputime64_add(__a, __b)		((__a) + (__b))
-#define cputime64_sub(__a, __b)		((__a) - (__b))
-#define cputime_to_cputime64(__ct)	(__ct)
+
+#define cputime64_zero			((__force cputime64_t) 0ULL)
 
 /*
  * Convert cputime <-> jiffies (HZ)
  */
-#define cputime_to_jiffies(__ct)	((__ct) / (NSEC_PER_SEC / HZ))
-#define jiffies_to_cputime(__jif)	((__jif) * (NSEC_PER_SEC / HZ))
-#define cputime64_to_jiffies64(__ct)	((__ct) / (NSEC_PER_SEC / HZ))
-#define jiffies64_to_cputime64(__jif)	((__jif) * (NSEC_PER_SEC / HZ))
+#define cputime_to_jiffies(__ct)	\
+	((__force u64)(__ct) / (NSEC_PER_SEC / HZ))
+#define jiffies_to_cputime(__jif)	\
+	(__force cputime_t)((__jif) * (NSEC_PER_SEC / HZ))
+#define cputime64_to_jiffies64(__ct)	\
+	((__force u64)(__ct) / (NSEC_PER_SEC / HZ))
+#define jiffies64_to_cputime64(__jif)	\
+	(__force cputime64_t)((__jif) * (NSEC_PER_SEC / HZ))
 
 /*
  * Convert cputime <-> microseconds
  */
-#define cputime_to_usecs(__ct)		((__ct) / NSEC_PER_USEC)
-#define usecs_to_cputime(__usecs)	((__usecs) * NSEC_PER_USEC)
+#define cputime_to_usecs(__ct)		\
+	((__force u64)(__ct) / NSEC_PER_USEC)
+#define usecs_to_cputime(__usecs)	\
+	(__force cputime_t)((__usecs) * NSEC_PER_USEC)
 
 /*
  * Convert cputime <-> seconds
  */
-#define cputime_to_secs(__ct)		((__ct) / NSEC_PER_SEC)
-#define secs_to_cputime(__secs)		((__secs) * NSEC_PER_SEC)
+#define cputime_to_secs(__ct)		\
+	((__force u64)(__ct) / NSEC_PER_SEC)
+#define secs_to_cputime(__secs)		\
+	(__force cputime_t)((__secs) * NSEC_PER_SEC)
 
 /*
  * Convert cputime <-> timespec (nsec)
  */
 static inline cputime_t timespec_to_cputime(const struct timespec *val)
 {
-	cputime_t ret = val->tv_sec * NSEC_PER_SEC;
-	return (ret + val->tv_nsec);
+	u64 ret = val->tv_sec * NSEC_PER_SEC + val->tv_nsec;
+	return (__force cputime_t) ret;
 }
 static inline void cputime_to_timespec(const cputime_t ct, struct timespec *val)
 {
-	val->tv_sec  = ct / NSEC_PER_SEC;
-	val->tv_nsec = ct % NSEC_PER_SEC;
+	val->tv_sec  = (__force u64) ct / NSEC_PER_SEC;
+	val->tv_nsec = (__force u64) ct % NSEC_PER_SEC;
 }
 
 /*
@@ -86,25 +81,28 @@ static inline void cputime_to_timespec(c
  */
 static inline cputime_t timeval_to_cputime(struct timeval *val)
 {
-	cputime_t ret = val->tv_sec * NSEC_PER_SEC;
-	return (ret + val->tv_usec * NSEC_PER_USEC);
+	u64 ret = val->tv_sec * NSEC_PER_SEC + val->tv_usec * NSEC_PER_USEC;
+	return (__force cputime_t) ret;
 }
 static inline void cputime_to_timeval(const cputime_t ct, struct timeval *val)
 {
-	val->tv_sec = ct / NSEC_PER_SEC;
-	val->tv_usec = (ct % NSEC_PER_SEC) / NSEC_PER_USEC;
+	val->tv_sec = (__force u64) ct / NSEC_PER_SEC;
+	val->tv_usec = ((__force u64) ct % NSEC_PER_SEC) / NSEC_PER_USEC;
 }
 
 /*
  * Convert cputime <-> clock (USER_HZ)
  */
-#define cputime_to_clock_t(__ct)	((__ct) / (NSEC_PER_SEC / USER_HZ))
-#define clock_t_to_cputime(__x)		((__x) * (NSEC_PER_SEC / USER_HZ))
+#define cputime_to_clock_t(__ct)	\
+	((__force u64)(__ct) / (NSEC_PER_SEC / USER_HZ))
+#define clock_t_to_cputime(__x)		\
+	(__force cputime_t)((__x) * (NSEC_PER_SEC / USER_HZ))
 
 /*
  * Convert cputime64 to clock.
  */
-#define cputime64_to_clock_t(__ct)      cputime_to_clock_t((cputime_t)__ct)
+#define cputime64_to_clock_t(__ct)      \
+	cputime_to_clock_t((__force cputime_t)__ct)
 
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING */
 #endif /* __IA64_CPUTIME_H */
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -29,25 +29,11 @@ static inline void setup_cputime_one_jif
 #include <asm/time.h>
 #include <asm/param.h>
 
-typedef u64 cputime_t;
-typedef u64 cputime64_t;
+typedef u64 __nocast cputime_t;
+typedef u64 __nocast cputime64_t;
 
-#define cputime_zero			((cputime_t)0)
-#define cputime_max			((~((cputime_t)0) >> 1) - 1)
-#define cputime_add(__a, __b)		((__a) +  (__b))
-#define cputime_sub(__a, __b)		((__a) -  (__b))
-#define cputime_div(__a, __n)		((__a) /  (__n))
-#define cputime_halve(__a)		((__a) >> 1)
-#define cputime_eq(__a, __b)		((__a) == (__b))
-#define cputime_gt(__a, __b)		((__a) >  (__b))
-#define cputime_ge(__a, __b)		((__a) >= (__b))
-#define cputime_lt(__a, __b)		((__a) <  (__b))
-#define cputime_le(__a, __b)		((__a) <= (__b))
-
-#define cputime64_zero			((cputime64_t)0)
-#define cputime64_add(__a, __b)		((__a) + (__b))
-#define cputime64_sub(__a, __b)		((__a) - (__b))
-#define cputime_to_cputime64(__ct)	(__ct)
+#define cputime_zero			((__force cputime_t) 0ULL)
+#define cputime64_zero			((__force cputime64_t) 0ULL)
 
 #ifdef __KERNEL__
 
@@ -65,7 +51,7 @@ DECLARE_PER_CPU(unsigned long, cputime_s
 
 static inline unsigned long cputime_to_jiffies(const cputime_t ct)
 {
-	return mulhdu(ct, __cputime_jiffies_factor);
+	return mulhdu((__force u64) ct, __cputime_jiffies_factor);
 }
 
 /* Estimate the scaled cputime by scaling the real cputime based on
@@ -74,14 +60,15 @@ static inline cputime_t cputime_to_scale
 {
 	if (cpu_has_feature(CPU_FTR_SPURR) &&
 	    __get_cpu_var(cputime_last_delta))
-		return ct * __get_cpu_var(cputime_scaled_last_delta) /
-			    __get_cpu_var(cputime_last_delta);
+		return (__force u64) ct *
+			__get_cpu_var(cputime_scaled_last_delta) /
+			__get_cpu_var(cputime_last_delta);
 	return ct;
 }
 
 static inline cputime_t jiffies_to_cputime(const unsigned long jif)
 {
-	cputime_t ct;
+	u64 ct;
 	unsigned long sec;
 
 	/* have to be a little careful about overflow */
@@ -93,7 +80,7 @@ static inline cputime_t jiffies_to_cputi
 	}
 	if (sec)
 		ct += (cputime_t) sec * tb_ticks_per_sec;
-	return ct;
+	return (__force cputime_t) ct;
 }
 
 static inline void setup_cputime_one_jiffy(void)
@@ -103,7 +90,7 @@ static inline void setup_cputime_one_jif
 
 static inline cputime64_t jiffies64_to_cputime64(const u64 jif)
 {
-	cputime_t ct;
+	u64 ct;
 	u64 sec;
 
 	/* have to be a little careful about overflow */
@@ -114,13 +101,13 @@ static inline cputime64_t jiffies64_to_c
 		do_div(ct, HZ);
 	}
 	if (sec)
-		ct += (cputime_t) sec * tb_ticks_per_sec;
-	return ct;
+		ct += (u64) sec * tb_ticks_per_sec;
+	return (__force cputime64_t) ct;
 }
 
 static inline u64 cputime64_to_jiffies64(const cputime_t ct)
 {
-	return mulhdu(ct, __cputime_jiffies_factor);
+	return mulhdu((__force u64) ct, __cputime_jiffies_factor);
 }
 
 /*
@@ -130,12 +117,12 @@ extern u64 __cputime_msec_factor;
 
 static inline unsigned long cputime_to_usecs(const cputime_t ct)
 {
-	return mulhdu(ct, __cputime_msec_factor) * USEC_PER_MSEC;
+	return mulhdu((__force u64) ct, __cputime_msec_factor) * USEC_PER_MSEC;
 }
 
 static inline cputime_t usecs_to_cputime(const unsigned long us)
 {
-	cputime_t ct;
+	u64 ct;
 	unsigned long sec;
 
 	/* have to be a little careful about overflow */
@@ -147,7 +134,7 @@ static inline cputime_t usecs_to_cputime
 	}
 	if (sec)
 		ct += (cputime_t) sec * tb_ticks_per_sec;
-	return ct;
+	return (__force cputime_t) ct;
 }
 
 /*
@@ -157,12 +144,12 @@ extern u64 __cputime_sec_factor;
 
 static inline unsigned long cputime_to_secs(const cputime_t ct)
 {
-	return mulhdu(ct, __cputime_sec_factor);
+	return mulhdu((__force u64) ct, __cputime_sec_factor);
 }
 
 static inline cputime_t secs_to_cputime(const unsigned long sec)
 {
-	return (cputime_t) sec * tb_ticks_per_sec;
+	return (__force cputime_t)((u64) sec * tb_ticks_per_sec);
 }
 
 /*
@@ -170,7 +157,7 @@ static inline cputime_t secs_to_cputime(
  */
 static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
 {
-	u64 x = ct;
+	u64 x = (__force u64) ct;
 	unsigned int frac;
 
 	frac = do_div(x, tb_ticks_per_sec);
@@ -182,11 +169,11 @@ static inline void cputime_to_timespec(c
 
 static inline cputime_t timespec_to_cputime(const struct timespec *p)
 {
-	cputime_t ct;
+	u64 ct;
 
 	ct = (u64) p->tv_nsec * tb_ticks_per_sec;
 	do_div(ct, 1000000000);
-	return ct + (u64) p->tv_sec * tb_ticks_per_sec;
+	return (__force cputime_t)(ct + (u64) p->tv_sec * tb_ticks_per_sec);
 }
 
 /*
@@ -194,7 +181,7 @@ static inline cputime_t timespec_to_cput
  */
 static inline void cputime_to_timeval(const cputime_t ct, struct timeval *p)
 {
-	u64 x = ct;
+	u64 x = (__force u64) ct;
 	unsigned int frac;
 
 	frac = do_div(x, tb_ticks_per_sec);
@@ -206,11 +193,11 @@ static inline void cputime_to_timeval(co
 
 static inline cputime_t timeval_to_cputime(const struct timeval *p)
 {
-	cputime_t ct;
+	u64 ct;
 
 	ct = (u64) p->tv_usec * tb_ticks_per_sec;
 	do_div(ct, 1000000);
-	return ct + (u64) p->tv_sec * tb_ticks_per_sec;
+	return (__force cputime_t)(ct + (u64) p->tv_sec * tb_ticks_per_sec);
 }
 
 /*
@@ -220,12 +207,12 @@ extern u64 __cputime_clockt_factor;
 
 static inline unsigned long cputime_to_clock_t(const cputime_t ct)
 {
-	return mulhdu(ct, __cputime_clockt_factor);
+	return mulhdu((__force u64) ct, __cputime_clockt_factor);
 }
 
 static inline cputime_t clock_t_to_cputime(const unsigned long clk)
 {
-	cputime_t ct;
+	u64 ct;
 	unsigned long sec;
 
 	/* have to be a little careful about overflow */
@@ -236,8 +223,8 @@ static inline cputime_t clock_t_to_cputi
 		do_div(ct, USER_HZ);
 	}
 	if (sec)
-		ct += (cputime_t) sec * tb_ticks_per_sec;
-	return ct;
+		ct += (u64) sec * tb_ticks_per_sec;
+	return (__force cputime_t) ct;
 }
 
 #define cputime64_to_clock_t(ct)	cputime_to_clock_t((cputime_t)(ct))
--- a/arch/s390/include/asm/cputime.h
+++ b/arch/s390/include/asm/cputime.h
@@ -16,114 +16,101 @@
 
 /* We want to use full resolution of the CPU timer: 2**-12 micro-seconds. */
 
-typedef unsigned long long cputime_t;
-typedef unsigned long long cputime64_t;
+typedef unsigned long long __nocast cputime_t;
+typedef unsigned long long __nocast cputime64_t;
 
-#ifndef __s390x__
-
-static inline unsigned int
-__div(unsigned long long n, unsigned int base)
+static inline unsigned long __div(unsigned long long n, unsigned long base)
 {
+#ifndef __s390x__
 	register_pair rp;
 
 	rp.pair = n >> 1;
 	asm ("dr %0,%1" : "+d" (rp) : "d" (base >> 1));
 	return rp.subreg.odd;
+#else /* __s390x__ */
+	return n / base;
+#endif /* __s390x__ */
 }
 
-#else /* __s390x__ */
+#define cputime_zero			((__force cputime_t) 0ULL)
+#define cputime_one_jiffy		jiffies_to_cputime(1)
+
+#define cputime64_zero			((__force cputime64_t) 0ULL)
 
-static inline unsigned int
-__div(unsigned long long n, unsigned int base)
+/*
+ * Convert cputime to jiffies and back.
+ */
+static inline unsigned long cputime_to_jiffies(const cputime_t cputime)
 {
-	return n / base;
+	return __div((__force unsigned long long) cputime, 4096000000ULL / HZ);
 }
 
-#endif /* __s390x__ */
+static inline cputime_t jiffies_to_cputime(const unsigned int jif)
+{
+	return (__force cputime_t)(jif * (4096000000ULL / HZ));
+}
 
-#define cputime_zero			(0ULL)
-#define cputime_one_jiffy		jiffies_to_cputime(1)
-#define cputime_max			((~0UL >> 1) - 1)
-#define cputime_add(__a, __b)		((__a) +  (__b))
-#define cputime_sub(__a, __b)		((__a) -  (__b))
-#define cputime_div(__a, __n) ({		\
-	unsigned long long __div = (__a);	\
-	do_div(__div,__n);			\
-	__div;					\
-})
-#define cputime_halve(__a)		((__a) >> 1)
-#define cputime_eq(__a, __b)		((__a) == (__b))
-#define cputime_gt(__a, __b)		((__a) >  (__b))
-#define cputime_ge(__a, __b)		((__a) >= (__b))
-#define cputime_lt(__a, __b)		((__a) <  (__b))
-#define cputime_le(__a, __b)		((__a) <= (__b))
-#define cputime_to_jiffies(__ct)	(__div((__ct), 4096000000ULL / HZ))
-#define cputime_to_scaled(__ct)		(__ct)
-#define jiffies_to_cputime(__hz)	((cputime_t)(__hz) * (4096000000ULL / HZ))
-
-#define cputime64_zero			(0ULL)
-#define cputime64_add(__a, __b)		((__a) + (__b))
-#define cputime_to_cputime64(__ct)	(__ct)
+static inline u64 cputime64_to_jiffies64(cputime64_t cputime)
+{
+	unsigned long long jif = (__force unsigned long long) cputime;
+	do_div(jif, 4096000000ULL / HZ);
+	return jif;
+}
 
-static inline u64
-cputime64_to_jiffies64(cputime64_t cputime)
+static inline cputime64_t jiffies64_to_cputime64(const u64 jif)
 {
-	do_div(cputime, 4096000000ULL / HZ);
-	return cputime;
+	return (__force cputime64_t)(jif * (4096000000ULL / HZ));
 }
 
 /*
  * Convert cputime to microseconds and back.
  */
-static inline unsigned int
-cputime_to_usecs(const cputime_t cputime)
+static inline unsigned int cputime_to_usecs(const cputime_t cputime)
 {
-	return cputime_div(cputime, 4096);
+	return (__force unsigned long long) cputime >> 12;
 }
 
-static inline cputime_t
-usecs_to_cputime(const unsigned int m)
+static inline cputime_t usecs_to_cputime(const unsigned int m)
 {
-	return (cputime_t) m * 4096;
+	return (__force cputime_t)(m * 4096ULL);
 }
 
 /*
  * Convert cputime to milliseconds and back.
  */
-static inline unsigned int
-cputime_to_secs(const cputime_t cputime)
+static inline unsigned int cputime_to_secs(const cputime_t cputime)
 {
-	return __div(cputime, 2048000000) >> 1;
+	return __div((__force unsigned long long) cputime, 2048000000) >> 1;
 }
 
-static inline cputime_t
-secs_to_cputime(const unsigned int s)
+static inline cputime_t secs_to_cputime(const unsigned int s)
 {
-	return (cputime_t) s * 4096000000ULL;
+	return (__force cputime_t)(s * 4096000000ULL);
 }
 
 /*
  * Convert cputime to timespec and back.
  */
-static inline cputime_t
-timespec_to_cputime(const struct timespec *value)
+static inline cputime_t timespec_to_cputime(const struct timespec *value)
 {
-	return value->tv_nsec * 4096 / 1000 + (u64) value->tv_sec * 4096000000ULL;
+	unsigned long long ret = value->tv_sec * 4096000000ULL;
+	return (__force cputime_t)(ret + value->tv_nsec * 4096 / 1000);
 }
 
-static inline void
-cputime_to_timespec(const cputime_t cputime, struct timespec *value)
+static inline void cputime_to_timespec(const cputime_t cputime,
+				       struct timespec *value)
 {
+	unsigned long long __cputime = (__force unsigned long long) cputime;
 #ifndef __s390x__
 	register_pair rp;
 
-	rp.pair = cputime >> 1;
+	rp.pair = __cputime >> 1;
 	asm ("dr %0,%1" : "+d" (rp) : "d" (2048000000UL));
 	value->tv_nsec = rp.subreg.even * 1000 / 4096;
 	value->tv_sec = rp.subreg.odd;
 #else
-	value->tv_nsec = (cputime % 4096000000ULL) * 1000 / 4096;
-	value->tv_sec = cputime / 4096000000ULL;
+	value->tv_nsec = (__cputime % 4096000000ULL) * 1000 / 4096;
+	value->tv_sec = __cputime / 4096000000ULL;
 #endif
 }
 
@@ -132,50 +119,52 @@ cputime_to_timespec(const cputime_t cput
  * Since cputime and timeval have the same resolution (microseconds)
  * this is easy.
  */
-static inline cputime_t
-timeval_to_cputime(const struct timeval *value)
+static inline cputime_t timeval_to_cputime(const struct timeval *value)
 {
-	return value->tv_usec * 4096 + (u64) value->tv_sec * 4096000000ULL;
+	unsigned long long ret = value->tv_sec * 4096000000ULL;
+	return (__force cputime_t)(ret + value->tv_usec * 4096ULL);
 }
 
-static inline void
-cputime_to_timeval(const cputime_t cputime, struct timeval *value)
+static inline void cputime_to_timeval(const cputime_t cputime,
+				      struct timeval *value)
 {
+	unsigned long long __cputime = (__force unsigned long long) cputime;
 #ifndef __s390x__
 	register_pair rp;
 
-	rp.pair = cputime >> 1;
+	rp.pair = __cputime >> 1;
 	asm ("dr %0,%1" : "+d" (rp) : "d" (2048000000UL));
 	value->tv_usec = rp.subreg.even / 4096;
 	value->tv_sec = rp.subreg.odd;
 #else
-	value->tv_usec = (cputime % 4096000000ULL) / 4096;
-	value->tv_sec = cputime / 4096000000ULL;
+	value->tv_usec = (__cputime % 4096000000ULL) / 4096;
+	value->tv_sec = __cputime / 4096000000ULL;
 #endif
 }
 
 /*
  * Convert cputime to clock and back.
  */
-static inline clock_t
-cputime_to_clock_t(cputime_t cputime)
+static inline clock_t cputime_to_clock_t(cputime_t cputime)
 {
-	return cputime_div(cputime, 4096000000ULL / USER_HZ);
+	unsigned long long clock = (__force unsigned long long) cputime;
+	do_div(clock, 4096000000ULL / USER_HZ);
+	return clock;
 }
 
-static inline cputime_t
-clock_t_to_cputime(unsigned long x)
+static inline cputime_t clock_t_to_cputime(unsigned long x)
 {
-	return (cputime_t) x * (4096000000ULL / USER_HZ);
+	return (__force cputime_t)(x * (4096000000ULL / USER_HZ));
 }
 
 /*
  * Convert cputime64 to clock.
  */
-static inline clock_t
-cputime64_to_clock_t(cputime64_t cputime)
+static inline clock_t cputime64_to_clock_t(cputime64_t cputime)
 {
-       return cputime_div(cputime, 4096000000ULL / USER_HZ);
+	unsigned long long clock = (__force unsigned long long) cputime;
+	do_div(clock, 4096000000ULL / USER_HZ);
+	return clock;
 }
 
 struct s390_idle_data {
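
For readers following the s390 arithmetic above: the header comment gives the cputime resolution as 2**-12 microseconds, so one microsecond is 4096 units and one second is 4096000000 units. The following is only a user-space sketch of those conversions, not the kernel code; the demo_* names and struct demo_timeval are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* One microsecond is 4096 cputime units (resolution 2**-12 us). */
#define CPUTIME_PER_USEC	4096ULL
#define CPUTIME_PER_SEC		4096000000ULL

/* Hypothetical stand-in for struct timeval, for user space only. */
struct demo_timeval {
	long tv_sec;
	long tv_usec;
};

/* Widen to 64 bit before multiplying so neither term can overflow. */
uint64_t demo_timeval_to_cputime(const struct demo_timeval *tv)
{
	return (uint64_t)tv->tv_sec * CPUTIME_PER_SEC +
	       (uint64_t)tv->tv_usec * CPUTIME_PER_USEC;
}

/* Split into whole seconds, then convert the remainder to microseconds. */
void demo_cputime_to_timeval(uint64_t ct, struct demo_timeval *tv)
{
	tv->tv_sec = (long)(ct / CPUTIME_PER_SEC);
	tv->tv_usec = (long)((ct % CPUTIME_PER_SEC) / CPUTIME_PER_USEC);
}

/* Shifting right by 12 is the same as dividing by 4096. */
uint64_t demo_cputime_to_usecs(uint64_t ct)
{
	return ct >> 12;
}
```

With 2 s and 500000 us as input, the intermediate value is 10240000000 units and the round trip returns the same timeval.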
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -103,15 +103,14 @@ static inline cputime64_t get_cpu_idle_t
 	cputime64_t busy_time;
 
 	cur_wall_time = jiffies64_to_cputime64(get_jiffies_64());
-	busy_time = cputime64_add(kstat_cpu(cpu).cpustat.user,
-			kstat_cpu(cpu).cpustat.system);
+	busy_time  = kstat_cpu(cpu).cpustat.user;
+	busy_time += kstat_cpu(cpu).cpustat.system;
+	busy_time += kstat_cpu(cpu).cpustat.irq;
+	busy_time += kstat_cpu(cpu).cpustat.softirq;
+	busy_time += kstat_cpu(cpu).cpustat.steal;
+	busy_time += kstat_cpu(cpu).cpustat.nice;
 
-	busy_time = cputime64_add(busy_time, kstat_cpu(cpu).cpustat.irq);
-	busy_time = cputime64_add(busy_time, kstat_cpu(cpu).cpustat.softirq);
-	busy_time = cputime64_add(busy_time, kstat_cpu(cpu).cpustat.steal);
-	busy_time = cputime64_add(busy_time, kstat_cpu(cpu).cpustat.nice);
-
-	idle_time = cputime64_sub(cur_wall_time, busy_time);
+	idle_time = cur_wall_time - busy_time;
 	if (wall)
 		*wall = (cputime64_t)jiffies_to_usecs(cur_wall_time);
 
@@ -351,20 +350,20 @@ static void dbs_check_cpu(struct cpu_dbs
 
 		cur_idle_time = get_cpu_idle_time(j, &cur_wall_time);
 
-		wall_time = (unsigned int) cputime64_sub(cur_wall_time,
-				j_dbs_info->prev_cpu_wall);
+		wall_time = (unsigned int)
+			(cur_wall_time - j_dbs_info->prev_cpu_wall);
 		j_dbs_info->prev_cpu_wall = cur_wall_time;
 
-		idle_time = (unsigned int) cputime64_sub(cur_idle_time,
-				j_dbs_info->prev_cpu_idle);
+		idle_time = (unsigned int)
+			(cur_idle_time - j_dbs_info->prev_cpu_idle);
 		j_dbs_info->prev_cpu_idle = cur_idle_time;
 
 		if (dbs_tuners_ins.ignore_nice) {
 			cputime64_t cur_nice;
 			unsigned long cur_nice_jiffies;
 
-			cur_nice = cputime64_sub(kstat_cpu(j).cpustat.nice,
-					 j_dbs_info->prev_cpu_nice);
+			cur_nice = kstat_cpu(j).cpustat.nice -
+					j_dbs_info->prev_cpu_nice;
 			/*
 			 * Assumption: nice time between sampling periods will
 			 * be less than 2^32 jiffies for 32 bit sys
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -127,15 +127,14 @@ static inline cputime64_t get_cpu_idle_t
 	cputime64_t busy_time;
 
 	cur_wall_time = jiffies64_to_cputime64(get_jiffies_64());
-	busy_time = cputime64_add(kstat_cpu(cpu).cpustat.user,
-			kstat_cpu(cpu).cpustat.system);
+	busy_time  = kstat_cpu(cpu).cpustat.user;
+	busy_time += kstat_cpu(cpu).cpustat.system;
+	busy_time += kstat_cpu(cpu).cpustat.irq;
+	busy_time += kstat_cpu(cpu).cpustat.softirq;
+	busy_time += kstat_cpu(cpu).cpustat.steal;
+	busy_time += kstat_cpu(cpu).cpustat.nice;
 
-	busy_time = cputime64_add(busy_time, kstat_cpu(cpu).cpustat.irq);
-	busy_time = cputime64_add(busy_time, kstat_cpu(cpu).cpustat.softirq);
-	busy_time = cputime64_add(busy_time, kstat_cpu(cpu).cpustat.steal);
-	busy_time = cputime64_add(busy_time, kstat_cpu(cpu).cpustat.nice);
-
-	idle_time = cputime64_sub(cur_wall_time, busy_time);
+	idle_time = cur_wall_time - busy_time;
 	if (wall)
 		*wall = (cputime64_t)jiffies_to_usecs(cur_wall_time);
 
@@ -440,24 +439,24 @@ static void dbs_check_cpu(struct cpu_dbs
 		cur_idle_time = get_cpu_idle_time(j, &cur_wall_time);
 		cur_iowait_time = get_cpu_iowait_time(j, &cur_wall_time);
 
-		wall_time = (unsigned int) cputime64_sub(cur_wall_time,
-				j_dbs_info->prev_cpu_wall);
+		wall_time = (unsigned int)
+			(cur_wall_time - j_dbs_info->prev_cpu_wall);
 		j_dbs_info->prev_cpu_wall = cur_wall_time;
 
-		idle_time = (unsigned int) cputime64_sub(cur_idle_time,
-				j_dbs_info->prev_cpu_idle);
+		idle_time = (unsigned int)
+			(cur_idle_time - j_dbs_info->prev_cpu_idle);
 		j_dbs_info->prev_cpu_idle = cur_idle_time;
 
-		iowait_time = (unsigned int) cputime64_sub(cur_iowait_time,
-				j_dbs_info->prev_cpu_iowait);
+		iowait_time = (unsigned int)
+			(cur_iowait_time - j_dbs_info->prev_cpu_iowait);
 		j_dbs_info->prev_cpu_iowait = cur_iowait_time;
 
 		if (dbs_tuners_ins.ignore_nice) {
 			cputime64_t cur_nice;
 			unsigned long cur_nice_jiffies;
 
-			cur_nice = cputime64_sub(kstat_cpu(j).cpustat.nice,
-					 j_dbs_info->prev_cpu_nice);
+			cur_nice = kstat_cpu(j).cpustat.nice -
+					j_dbs_info->prev_cpu_nice;
 			/*
 			 * Assumption: nice time between sampling periods will
 			 * be less than 2^32 jiffies for 32 bit sys
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -60,9 +60,8 @@ static int cpufreq_stats_update(unsigned
 	spin_lock(&cpufreq_stats_lock);
 	stat = per_cpu(cpufreq_stats_table, cpu);
 	if (stat->time_in_state)
-		stat->time_in_state[stat->last_index] =
-			cputime64_add(stat->time_in_state[stat->last_index],
-				      cputime_sub(cur_time, stat->last_time));
+		stat->time_in_state[stat->last_index] +=
+			cur_time - stat->last_time;
 	stat->last_time = cur_time;
 	spin_unlock(&cpufreq_stats_lock);
 	return 0;
--- a/drivers/macintosh/rack-meter.c
+++ b/drivers/macintosh/rack-meter.c
@@ -83,11 +83,10 @@ static inline cputime64_t get_cpu_idle_t
 {
 	cputime64_t retval;
 
-	retval = cputime64_add(kstat_cpu(cpu).cpustat.idle,
-			kstat_cpu(cpu).cpustat.iowait);
+	retval = kstat_cpu(cpu).cpustat.idle + kstat_cpu(cpu).cpustat.iowait;
 
 	if (rackmeter_ignore_nice)
-		retval = cputime64_add(retval, kstat_cpu(cpu).cpustat.nice);
+		retval = retval + kstat_cpu(cpu).cpustat.nice;
 
 	return retval;
 }
@@ -220,13 +219,11 @@ static void rackmeter_do_timer(struct wo
 	int i, offset, load, cumm, pause;
 
 	cur_jiffies = jiffies64_to_cputime64(get_jiffies_64());
-	total_ticks = (unsigned int)cputime64_sub(cur_jiffies,
-						  rcpu->prev_wall);
+	total_ticks = (unsigned int) (cur_jiffies - rcpu->prev_wall);
 	rcpu->prev_wall = cur_jiffies;
 
 	total_idle_ticks = get_cpu_idle_time(cpu);
-	idle_ticks = (unsigned int) cputime64_sub(total_idle_ticks,
-				rcpu->prev_idle);
+	idle_ticks = (unsigned int) (total_idle_ticks - rcpu->prev_idle);
 	rcpu->prev_idle = total_idle_ticks;
 
 	/* We do a very dumb calculation to update the LEDs for now,
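
A side note on the tick deltas above: a cast binds tighter than binary minus, so the parentheses around the subtraction decide what actually gets truncated. The sketch below is a user-space illustration only; the function names are invented, and uint32_t stands in for unsigned int on the assumption that it is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Cast applies to the whole difference: the result is a 32-bit delta. */
uint64_t delta_cast_after(uint64_t cur, uint64_t prev)
{
	return (uint32_t)(cur - prev);
}

/* Cast applies to cur only: prev is then subtracted in 64-bit arithmetic. */
uint64_t delta_cast_before(uint64_t cur, uint64_t prev)
{
	return (uint32_t)cur - prev;
}
```

The two forms agree in the low 32 bits, so they give the same value once the result is stored in a 32-bit variable, but they differ as soon as the destination is wider.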
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -423,14 +423,14 @@ static int do_task_stat(struct seq_file
 			do {
 				min_flt += t->min_flt;
 				maj_flt += t->maj_flt;
-				gtime = cputime_add(gtime, t->gtime);
+				gtime += t->gtime;
 				t = next_thread(t);
 			} while (t != task);
 
 			min_flt += sig->min_flt;
 			maj_flt += sig->maj_flt;
 			thread_group_times(task, &utime, &stime);
-			gtime = cputime_add(gtime, sig->gtime);
+			gtime += sig->gtime;
 		}
 
 		sid = task_session_nr_ns(task, ns);
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -39,18 +39,16 @@ static int show_stat(struct seq_file *p,
 	jif = boottime.tv_sec;
 
 	for_each_possible_cpu(i) {
-		user = cputime64_add(user, kstat_cpu(i).cpustat.user);
-		nice = cputime64_add(nice, kstat_cpu(i).cpustat.nice);
-		system = cputime64_add(system, kstat_cpu(i).cpustat.system);
-		idle = cputime64_add(idle, kstat_cpu(i).cpustat.idle);
-		idle = cputime64_add(idle, arch_idle_time(i));
-		iowait = cputime64_add(iowait, kstat_cpu(i).cpustat.iowait);
-		irq = cputime64_add(irq, kstat_cpu(i).cpustat.irq);
-		softirq = cputime64_add(softirq, kstat_cpu(i).cpustat.softirq);
-		steal = cputime64_add(steal, kstat_cpu(i).cpustat.steal);
-		guest = cputime64_add(guest, kstat_cpu(i).cpustat.guest);
-		guest_nice = cputime64_add(guest_nice,
-			kstat_cpu(i).cpustat.guest_nice);
+		user += kstat_cpu(i).cpustat.user;
+		nice += kstat_cpu(i).cpustat.nice;
+		system += kstat_cpu(i).cpustat.system;
+		idle += kstat_cpu(i).cpustat.idle + arch_idle_time(i);
+		iowait += kstat_cpu(i).cpustat.iowait;
+		irq += kstat_cpu(i).cpustat.irq;
+		softirq += kstat_cpu(i).cpustat.softirq;
+		steal += kstat_cpu(i).cpustat.steal;
+		guest += kstat_cpu(i).cpustat.guest;
+		guest_nice += kstat_cpu(i).cpustat.guest_nice;
 		sum += kstat_cpu_irqs_sum(i);
 		sum += arch_irq_stat_cpu(i);
 
@@ -81,8 +79,7 @@ static int show_stat(struct seq_file *p,
 		user = kstat_cpu(i).cpustat.user;
 		nice = kstat_cpu(i).cpustat.nice;
 		system = kstat_cpu(i).cpustat.system;
-		idle = kstat_cpu(i).cpustat.idle;
-		idle = cputime64_add(idle, arch_idle_time(i));
+		idle = kstat_cpu(i).cpustat.idle + arch_idle_time(i);
 		iowait = kstat_cpu(i).cpustat.iowait;
 		irq = kstat_cpu(i).cpustat.irq;
 		softirq = kstat_cpu(i).cpustat.softirq;
--- a/fs/proc/uptime.c
+++ b/fs/proc/uptime.c
@@ -15,7 +15,7 @@ static int uptime_proc_show(struct seq_f
 	cputime_t idletime = cputime_zero;
 
 	for_each_possible_cpu(i)
-		idletime = cputime64_add(idletime, kstat_cpu(i).cpustat.idle);
+		idletime += kstat_cpu(i).cpustat.idle;
 
 	do_posix_clock_monotonic_gettime(&uptime);
 	monotonic_to_bootbased(&uptime);
--- a/include/asm-generic/cputime.h
+++ b/include/asm-generic/cputime.h
@@ -4,70 +4,66 @@
 #include <linux/time.h>
 #include <linux/jiffies.h>
 
-typedef unsigned long cputime_t;
+typedef unsigned long __nocast cputime_t;
 
-#define cputime_zero			(0UL)
+#define cputime_zero			((__force cputime_t) 0UL)
 #define cputime_one_jiffy		jiffies_to_cputime(1)
-#define cputime_max			((~0UL >> 1) - 1)
-#define cputime_add(__a, __b)		((__a) +  (__b))
-#define cputime_sub(__a, __b)		((__a) -  (__b))
-#define cputime_div(__a, __n)		((__a) /  (__n))
-#define cputime_halve(__a)		((__a) >> 1)
-#define cputime_eq(__a, __b)		((__a) == (__b))
-#define cputime_gt(__a, __b)		((__a) >  (__b))
-#define cputime_ge(__a, __b)		((__a) >= (__b))
-#define cputime_lt(__a, __b)		((__a) <  (__b))
-#define cputime_le(__a, __b)		((__a) <= (__b))
-#define cputime_to_jiffies(__ct)	(__ct)
+#define cputime_to_jiffies(__ct)	((__force unsigned long)(__ct))
 #define cputime_to_scaled(__ct)		(__ct)
-#define jiffies_to_cputime(__hz)	(__hz)
+#define jiffies_to_cputime(__hz)	((__force cputime_t)(__hz))
 
-typedef u64 cputime64_t;
+typedef u64 __nocast cputime64_t;
 
-#define cputime64_zero (0ULL)
-#define cputime64_add(__a, __b)		((__a) + (__b))
-#define cputime64_sub(__a, __b)		((__a) - (__b))
-#define cputime64_to_jiffies64(__ct)	(__ct)
-#define jiffies64_to_cputime64(__jif)	(__jif)
-#define cputime_to_cputime64(__ct)	((u64) __ct)
-#define cputime64_gt(__a, __b)		((__a) >  (__b))
+#define cputime64_zero			((__force cputime64_t) 0ULL)
+#define cputime64_to_jiffies64(__ct)	((__force u64)(__ct))
+#define jiffies64_to_cputime64(__jif)	((__force cputime64_t)(__jif))
 
-#define nsecs_to_cputime64(__ct)	nsecs_to_jiffies64(__ct)
+#define nsecs_to_cputime64(__ct)	\
+	jiffies64_to_cputime64(nsecs_to_jiffies64(__ct))
 
 
 /*
  * Convert cputime to microseconds and back.
  */
-#define cputime_to_usecs(__ct)		jiffies_to_usecs(__ct);
-#define usecs_to_cputime(__msecs)	usecs_to_jiffies(__msecs);
+#define cputime_to_usecs(__ct)		\
+	jiffies_to_usecs(cputime_to_jiffies(__ct))
+#define usecs_to_cputime(__msecs)	\
+	jiffies_to_cputime(usecs_to_jiffies(__msecs))
 
 /*
  * Convert cputime to seconds and back.
  */
-#define cputime_to_secs(jif)		((jif) / HZ)
-#define secs_to_cputime(sec)		((sec) * HZ)
+#define cputime_to_secs(jif)		(cputime_to_jiffies(jif) / HZ)
+#define secs_to_cputime(sec)		jiffies_to_cputime((sec) * HZ)
 
 /*
  * Convert cputime to timespec and back.
  */
-#define timespec_to_cputime(__val)	timespec_to_jiffies(__val)
-#define cputime_to_timespec(__ct,__val)	jiffies_to_timespec(__ct,__val)
+#define timespec_to_cputime(__val)	\
+	jiffies_to_cputime(timespec_to_jiffies(__val))
+#define cputime_to_timespec(__ct,__val)	\
+	jiffies_to_timespec(cputime_to_jiffies(__ct),__val)
 
 /*
  * Convert cputime to timeval and back.
  */
-#define timeval_to_cputime(__val)	timeval_to_jiffies(__val)
-#define cputime_to_timeval(__ct,__val)	jiffies_to_timeval(__ct,__val)
+#define timeval_to_cputime(__val)	\
+	jiffies_to_cputime(timeval_to_jiffies(__val))
+#define cputime_to_timeval(__ct,__val)	\
+	jiffies_to_timeval(cputime_to_jiffies(__ct),__val)
 
 /*
  * Convert cputime to clock and back.
  */
-#define cputime_to_clock_t(__ct)	jiffies_to_clock_t(__ct)
-#define clock_t_to_cputime(__x)		clock_t_to_jiffies(__x)
+#define cputime_to_clock_t(__ct)	\
+	jiffies_to_clock_t(cputime_to_jiffies(__ct))
+#define clock_t_to_cputime(__x)		\
+	jiffies_to_cputime(clock_t_to_jiffies(__x))
 
 /*
  * Convert cputime64 to clock.
  */
-#define cputime64_to_clock_t(__ct)	jiffies_64_to_clock_t(__ct)
+#define cputime64_to_clock_t(__ct)	\
+	jiffies_64_to_clock_t(cputime64_to_jiffies64(__ct))
 
 #endif
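
To make the pattern in the generic header above easier to try out, here is a small user-space sketch of a __nocast typedef with __force conversion macros. It is not the kernel header: HZ is set to 100 purely for illustration, demo_total_jiffies() is a made-up helper, and the annotations are defined as empty unless sparse (__CHECKER__) is running, so an ordinary compiler sees a plain unsigned long.

```c
#include <assert.h>

/* The annotations only mean something to sparse; otherwise they vanish. */
#ifdef __CHECKER__
# define __nocast	__attribute__((nocast))
# define __force	__attribute__((force))
#else
# define __nocast
# define __force
#endif

#define HZ 100	/* assumed tick rate, for illustration only */

typedef unsigned long __nocast cputime_t;

/* Every crossing between cputime_t and plain integers is spelled out. */
#define cputime_to_jiffies(ct)	((__force unsigned long)(ct))
#define jiffies_to_cputime(j)	((__force cputime_t)(j))
#define cputime_to_secs(ct)	(cputime_to_jiffies(ct) / HZ)
#define secs_to_cputime(sec)	jiffies_to_cputime((sec) * HZ)

/* Hypothetical helper: add two cputime values with plain '+'. */
unsigned long demo_total_jiffies(unsigned long secs, unsigned long extra)
{
	cputime_t utime = secs_to_cputime(secs);
	cputime_t stime = jiffies_to_cputime(extra);
	cputime_t total = utime + stime;

	return cputime_to_jiffies(total);
}
```

Arithmetic and comparisons between two cputime_t values need no helper macros; only the conversions at the boundary carry a __force cast, which is where sparse can flag an accidental mix with plain integers.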
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -613,8 +613,8 @@ void acct_collect(long exitcode, int gro
 		pacct->ac_flag |= ACORE;
 	if (current->flags & PF_SIGNALED)
 		pacct->ac_flag |= AXSIG;
-	pacct->ac_utime = cputime_add(pacct->ac_utime, current->utime);
-	pacct->ac_stime = cputime_add(pacct->ac_stime, current->stime);
+	pacct->ac_utime += current->utime;
+	pacct->ac_stime += current->stime;
 	pacct->ac_minflt += current->min_flt;
 	pacct->ac_majflt += current->maj_flt;
 	spin_unlock_irq(&current->sighand->siglock);
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -177,8 +177,7 @@ static inline void check_for_tasks(int c
 	write_lock_irq(&tasklist_lock);
 	for_each_process(p) {
 		if (task_cpu(p) == cpu && p->state == TASK_RUNNING &&
-		    (!cputime_eq(p->utime, cputime_zero) ||
-		     !cputime_eq(p->stime, cputime_zero)))
+		    (p->utime != cputime_zero || p->stime != cputime_zero))
 			printk(KERN_WARNING "Task %s (pid = %d) is on cpu %d "
 				"(state = %ld, flags = %x)\n",
 				p->comm, task_pid_nr(p), cpu,
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -121,9 +121,9 @@ static void __exit_signal(struct task_st
 		 * We won't ever get here for the group leader, since it
 		 * will have been the last reference on the signal_struct.
 		 */
-		sig->utime = cputime_add(sig->utime, tsk->utime);
-		sig->stime = cputime_add(sig->stime, tsk->stime);
-		sig->gtime = cputime_add(sig->gtime, tsk->gtime);
+		sig->utime += tsk->utime;
+		sig->stime += tsk->stime;
+		sig->gtime += tsk->gtime;
 		sig->min_flt += tsk->min_flt;
 		sig->maj_flt += tsk->maj_flt;
 		sig->nvcsw += tsk->nvcsw;
@@ -1257,19 +1257,9 @@ static int wait_task_zombie(struct wait_
 		spin_lock_irq(&p->real_parent->sighand->siglock);
 		psig = p->real_parent->signal;
 		sig = p->signal;
-		psig->cutime =
-			cputime_add(psig->cutime,
-			cputime_add(tgutime,
-				    sig->cutime));
-		psig->cstime =
-			cputime_add(psig->cstime,
-			cputime_add(tgstime,
-				    sig->cstime));
-		psig->cgtime =
-			cputime_add(psig->cgtime,
-			cputime_add(p->gtime,
-			cputime_add(sig->gtime,
-				    sig->cgtime)));
+		psig->cutime += tgutime + sig->cutime;
+		psig->cstime += tgstime + sig->cstime;
+		psig->cgtime += p->gtime + sig->gtime + sig->cgtime;
 		psig->cmin_flt +=
 			p->min_flt + sig->min_flt + sig->cmin_flt;
 		psig->cmaj_flt +=
--- a/kernel/itimer.c
+++ b/kernel/itimer.c
@@ -52,22 +52,22 @@ static void get_cpu_itimer(struct task_s
 
 	cval = it->expires;
 	cinterval = it->incr;
-	if (!cputime_eq(cval, cputime_zero)) {
+	if (cval != cputime_zero) {
 		struct task_cputime cputime;
 		cputime_t t;
 
 		thread_group_cputimer(tsk, &cputime);
 		if (clock_id == CPUCLOCK_PROF)
-			t = cputime_add(cputime.utime, cputime.stime);
+			t = cputime.utime + cputime.stime;
 		else
 			/* CPUCLOCK_VIRT */
 			t = cputime.utime;
 
-		if (cputime_le(cval, t))
+		if (cval <= t)
 			/* about to fire */
 			cval = cputime_one_jiffy;
 		else
-			cval = cputime_sub(cval, t);
+			cval = cval - t;
 	}
 
 	spin_unlock_irq(&tsk->sighand->siglock);
@@ -123,7 +123,7 @@ enum hrtimer_restart it_real_fn(struct h
 	struct signal_struct *sig =
 		container_of(timer, struct signal_struct, real_timer);
 
-	trace_itimer_expire(ITIMER_REAL, sig->leader_pid, 0);
+	trace_itimer_expire(ITIMER_REAL, sig->leader_pid, cputime_zero);
 	kill_pid_info(SIGALRM, SEND_SIG_PRIV, sig->leader_pid);
 
 	return HRTIMER_NORESTART;
@@ -161,10 +161,9 @@ static void set_cpu_itimer(struct task_s
 
 	cval = it->expires;
 	cinterval = it->incr;
-	if (!cputime_eq(cval, cputime_zero) ||
-	    !cputime_eq(nval, cputime_zero)) {
-		if (cputime_gt(nval, cputime_zero))
-			nval = cputime_add(nval, cputime_one_jiffy);
+	if (cval != cputime_zero || nval != cputime_zero) {
+		if (nval > cputime_zero)
+			nval += cputime_one_jiffy;
 		set_process_cpu_timer(tsk, clock_id, &nval, &cval);
 	}
 	it->expires = nval;
@@ -224,7 +223,7 @@ again:
 		} else
 			tsk->signal->it_real_incr.tv64 = 0;
 
-		trace_itimer_state(ITIMER_REAL, value, 0);
+		trace_itimer_state(ITIMER_REAL, value, cputime_zero);
 		spin_unlock_irq(&tsk->sighand->siglock);
 		break;
 	case ITIMER_VIRTUAL:
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -78,7 +78,7 @@ static inline int cpu_time_before(const
 	if (CPUCLOCK_WHICH(which_clock) == CPUCLOCK_SCHED) {
 		return now.sched < then.sched;
 	}  else {
-		return cputime_lt(now.cpu, then.cpu);
+		return now.cpu < then.cpu;
 	}
 }
 static inline void cpu_time_add(const clockid_t which_clock,
@@ -88,7 +88,7 @@ static inline void cpu_time_add(const cl
 	if (CPUCLOCK_WHICH(which_clock) == CPUCLOCK_SCHED) {
 		acc->sched += val.sched;
 	}  else {
-		acc->cpu = cputime_add(acc->cpu, val.cpu);
+		acc->cpu += val.cpu;
 	}
 }
 static inline union cpu_time_count cpu_time_sub(const clockid_t which_clock,
@@ -98,25 +98,12 @@ static inline union cpu_time_count cpu_t
 	if (CPUCLOCK_WHICH(which_clock) == CPUCLOCK_SCHED) {
 		a.sched -= b.sched;
 	}  else {
-		a.cpu = cputime_sub(a.cpu, b.cpu);
+		a.cpu -= b.cpu;
 	}
 	return a;
 }
 
 /*
- * Divide and limit the result to res >= 1
- *
- * This is necessary to prevent signal delivery starvation, when the result of
- * the division would be rounded down to 0.
- */
-static inline cputime_t cputime_div_non_zero(cputime_t time, unsigned long div)
-{
-	cputime_t res = cputime_div(time, div);
-
-	return max_t(cputime_t, res, 1);
-}
-
-/*
  * Update expiry time from increment, and increase overrun count,
  * given the current clock sample.
  */
@@ -148,28 +135,26 @@ static void bump_cpu_timer(struct k_itim
 	} else {
 		cputime_t delta, incr;
 
-		if (cputime_lt(now.cpu, timer->it.cpu.expires.cpu))
+		if (now.cpu < timer->it.cpu.expires.cpu)
 			return;
 		incr = timer->it.cpu.incr.cpu;
-		delta = cputime_sub(cputime_add(now.cpu, incr),
-				    timer->it.cpu.expires.cpu);
+		delta = now.cpu + incr - timer->it.cpu.expires.cpu;
 		/* Don't use (incr*2 < delta), incr*2 might overflow. */
-		for (i = 0; cputime_lt(incr, cputime_sub(delta, incr)); i++)
-			     incr = cputime_add(incr, incr);
-		for (; i >= 0; incr = cputime_halve(incr), i--) {
-			if (cputime_lt(delta, incr))
+		for (i = 0; incr < delta - incr; i++)
+			     incr += incr;
+		for (; i >= 0; incr = incr >> 1, i--) {
+			if (delta < incr)
 				continue;
-			timer->it.cpu.expires.cpu =
-				cputime_add(timer->it.cpu.expires.cpu, incr);
+			timer->it.cpu.expires.cpu += incr;
 			timer->it_overrun += 1 << i;
-			delta = cputime_sub(delta, incr);
+			delta -= incr;
 		}
 	}
 }
 
 static inline cputime_t prof_ticks(struct task_struct *p)
 {
-	return cputime_add(p->utime, p->stime);
+	return p->utime + p->stime;
 }
 static inline cputime_t virt_ticks(struct task_struct *p)
 {
@@ -248,8 +233,8 @@ void thread_group_cputime(struct task_st
 
 	t = tsk;
 	do {
-		times->utime = cputime_add(times->utime, t->utime);
-		times->stime = cputime_add(times->stime, t->stime);
+		times->utime += t->utime;
+		times->stime += t->stime;
 		times->sum_exec_runtime += task_sched_runtime(t);
 	} while_each_thread(tsk, t);
 out:
@@ -258,10 +243,10 @@ out:
 
 static void update_gt_cputime(struct task_cputime *a, struct task_cputime *b)
 {
-	if (cputime_gt(b->utime, a->utime))
+	if (b->utime > a->utime)
 		a->utime = b->utime;
 
-	if (cputime_gt(b->stime, a->stime))
+	if (b->stime > a->stime)
 		a->stime = b->stime;
 
 	if (b->sum_exec_runtime > a->sum_exec_runtime)
@@ -306,7 +291,7 @@ static int cpu_clock_sample_group(const
 		return -EINVAL;
 	case CPUCLOCK_PROF:
 		thread_group_cputime(p, &cputime);
-		cpu->cpu = cputime_add(cputime.utime, cputime.stime);
+		cpu->cpu = cputime.utime + cputime.stime;
 		break;
 	case CPUCLOCK_VIRT:
 		thread_group_cputime(p, &cputime);
@@ -470,26 +455,24 @@ static void cleanup_timers(struct list_h
 			   unsigned long long sum_exec_runtime)
 {
 	struct cpu_timer_list *timer, *next;
-	cputime_t ptime = cputime_add(utime, stime);
+	cputime_t ptime = utime + stime;
 
 	list_for_each_entry_safe(timer, next, head, entry) {
 		list_del_init(&timer->entry);
-		if (cputime_lt(timer->expires.cpu, ptime)) {
+		if (timer->expires.cpu < ptime) {
 			timer->expires.cpu = cputime_zero;
 		} else {
-			timer->expires.cpu = cputime_sub(timer->expires.cpu,
-							 ptime);
+			timer->expires.cpu = timer->expires.cpu - ptime;
 		}
 	}
 
 	++head;
 	list_for_each_entry_safe(timer, next, head, entry) {
 		list_del_init(&timer->entry);
-		if (cputime_lt(timer->expires.cpu, utime)) {
+		if (timer->expires.cpu < utime) {
 			timer->expires.cpu = cputime_zero;
 		} else {
-			timer->expires.cpu = cputime_sub(timer->expires.cpu,
-							 utime);
+			timer->expires.cpu = timer->expires.cpu - utime;
 		}
 	}
 
@@ -520,8 +503,7 @@ void posix_cpu_timers_exit_group(struct
 	struct signal_struct *const sig = tsk->signal;
 
 	cleanup_timers(tsk->signal->cpu_timers,
-		       cputime_add(tsk->utime, sig->utime),
-		       cputime_add(tsk->stime, sig->stime),
+		       tsk->utime + sig->utime, tsk->stime + sig->stime,
 		       tsk->se.sum_exec_runtime + sig->sum_sched_runtime);
 }
 
@@ -540,8 +522,7 @@ static void clear_dead_task(struct k_iti
 
 static inline int expires_gt(cputime_t expires, cputime_t new_exp)
 {
-	return cputime_eq(expires, cputime_zero) ||
-	       cputime_gt(expires, new_exp);
+	return expires == cputime_zero || expires > new_exp;
 }
 
 /*
@@ -651,7 +632,7 @@ static int cpu_timer_sample_group(const
 	default:
 		return -EINVAL;
 	case CPUCLOCK_PROF:
-		cpu->cpu = cputime_add(cputime.utime, cputime.stime);
+		cpu->cpu = cputime.utime + cputime.stime;
 		break;
 	case CPUCLOCK_VIRT:
 		cpu->cpu = cputime.utime;
@@ -923,7 +904,7 @@ static void check_thread_timers(struct t
 		struct cpu_timer_list *t = list_first_entry(timers,
 						      struct cpu_timer_list,
 						      entry);
-		if (!--maxfire || cputime_lt(prof_ticks(tsk), t->expires.cpu)) {
+		if (!--maxfire || prof_ticks(tsk) < t->expires.cpu) {
 			tsk->cputime_expires.prof_exp = t->expires.cpu;
 			break;
 		}
@@ -938,7 +919,7 @@ static void check_thread_timers(struct t
 		struct cpu_timer_list *t = list_first_entry(timers,
 						      struct cpu_timer_list,
 						      entry);
-		if (!--maxfire || cputime_lt(virt_ticks(tsk), t->expires.cpu)) {
+		if (!--maxfire || virt_ticks(tsk) < t->expires.cpu) {
 			tsk->cputime_expires.virt_exp = t->expires.cpu;
 			break;
 		}
@@ -1009,16 +990,15 @@ static u32 onecputick;
 static void check_cpu_itimer(struct task_struct *tsk, struct cpu_itimer *it,
 			     cputime_t *expires, cputime_t cur_time, int signo)
 {
-	if (cputime_eq(it->expires, cputime_zero))
+	if (it->expires == cputime_zero)
 		return;
 
-	if (cputime_ge(cur_time, it->expires)) {
-		if (!cputime_eq(it->incr, cputime_zero)) {
-			it->expires = cputime_add(it->expires, it->incr);
+	if (cur_time >= it->expires) {
+		if (it->incr != cputime_zero) {
+			it->expires += it->incr;
 			it->error += it->incr_error;
 			if (it->error >= onecputick) {
-				it->expires = cputime_sub(it->expires,
-							  cputime_one_jiffy);
+				it->expires -= cputime_one_jiffy;
 				it->error -= onecputick;
 			}
 		} else {
@@ -1031,9 +1011,8 @@ static void check_cpu_itimer(struct task
 		__group_send_sig_info(signo, SEND_SIG_PRIV, tsk);
 	}
 
-	if (!cputime_eq(it->expires, cputime_zero) &&
-	    (cputime_eq(*expires, cputime_zero) ||
-	     cputime_lt(it->expires, *expires))) {
+	if (it->expires != cputime_zero &&
+	    (*expires == cputime_zero || it->expires < *expires)) {
 		*expires = it->expires;
 	}
 }
@@ -1048,8 +1027,8 @@ static void check_cpu_itimer(struct task
  */
 static inline int task_cputime_zero(const struct task_cputime *cputime)
 {
-	if (cputime_eq(cputime->utime, cputime_zero) &&
-	    cputime_eq(cputime->stime, cputime_zero) &&
+	if (cputime->utime == cputime_zero &&
+	    cputime->stime == cputime_zero &&
 	    cputime->sum_exec_runtime == 0)
 		return 1;
 	return 0;
@@ -1076,7 +1055,7 @@ static void check_process_timers(struct
 	 */
 	thread_group_cputimer(tsk, &cputime);
 	utime = cputime.utime;
-	ptime = cputime_add(utime, cputime.stime);
+	ptime = utime + cputime.stime;
 	sum_sched_runtime = cputime.sum_exec_runtime;
 	maxfire = 20;
 	prof_expires = cputime_zero;
@@ -1084,7 +1063,7 @@ static void check_process_timers(struct
 		struct cpu_timer_list *tl = list_first_entry(timers,
 						      struct cpu_timer_list,
 						      entry);
-		if (!--maxfire || cputime_lt(ptime, tl->expires.cpu)) {
+		if (!--maxfire || ptime < tl->expires.cpu) {
 			prof_expires = tl->expires.cpu;
 			break;
 		}
@@ -1099,7 +1078,7 @@ static void check_process_timers(struct
 		struct cpu_timer_list *tl = list_first_entry(timers,
 						      struct cpu_timer_list,
 						      entry);
-		if (!--maxfire || cputime_lt(utime, tl->expires.cpu)) {
+		if (!--maxfire || utime < tl->expires.cpu) {
 			virt_expires = tl->expires.cpu;
 			break;
 		}
@@ -1154,8 +1133,7 @@ static void check_process_timers(struct
 			}
 		}
 		x = secs_to_cputime(soft);
-		if (cputime_eq(prof_expires, cputime_zero) ||
-		    cputime_lt(x, prof_expires)) {
+		if (prof_expires == cputime_zero || x < prof_expires) {
 			prof_expires = x;
 		}
 	}
@@ -1249,12 +1227,11 @@ out:
 static inline int task_cputime_expired(const struct task_cputime *sample,
 					const struct task_cputime *expires)
 {
-	if (!cputime_eq(expires->utime, cputime_zero) &&
-	    cputime_ge(sample->utime, expires->utime))
+	if (expires->utime != cputime_zero &&
+	    sample->utime >= expires->utime)
 		return 1;
-	if (!cputime_eq(expires->stime, cputime_zero) &&
-	    cputime_ge(cputime_add(sample->utime, sample->stime),
-		       expires->stime))
+	if (expires->stime != cputime_zero &&
+	    sample->utime + sample->stime >= expires->stime)
 		return 1;
 	if (expires->sum_exec_runtime != 0 &&
 	    sample->sum_exec_runtime >= expires->sum_exec_runtime)
@@ -1389,18 +1366,18 @@ void set_process_cpu_timer(struct task_s
 		 * it to be relative, *newval argument is relative and we update
 		 * it to be absolute.
 		 */
-		if (!cputime_eq(*oldval, cputime_zero)) {
-			if (cputime_le(*oldval, now.cpu)) {
+		if (*oldval != cputime_zero) {
+			if (*oldval <= now.cpu) {
 				/* Just about to fire. */
 				*oldval = cputime_one_jiffy;
 			} else {
-				*oldval = cputime_sub(*oldval, now.cpu);
+				*oldval -= now.cpu;
 			}
 		}
 
-		if (cputime_eq(*newval, cputime_zero))
+		if (*newval == cputime_zero)
 			return;
-		*newval = cputime_add(*newval, now.cpu);
+		*newval += now.cpu;
 	}
 
 	/*
@@ -1409,11 +1386,11 @@ void set_process_cpu_timer(struct task_s
 	 */
 	switch (clock_idx) {
 	case CPUCLOCK_PROF:
-		if (expires_gt(tsk->signal->cputime_expires.prof_exp, *newval))
+		if (tsk->signal->cputime_expires.prof_exp > *newval)
 			tsk->signal->cputime_expires.prof_exp = *newval;
 		break;
 	case CPUCLOCK_VIRT:
-		if (expires_gt(tsk->signal->cputime_expires.virt_exp, *newval))
+		if (tsk->signal->cputime_expires.virt_exp >  *newval)
 			tsk->signal->cputime_expires.virt_exp = *newval;
 		break;
 	}
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2011,7 +2011,7 @@ static int irqtime_account_hi_update(voi
 
 	local_irq_save(flags);
 	latest_ns = this_cpu_read(cpu_hardirq_time);
-	if (cputime64_gt(nsecs_to_cputime64(latest_ns), cpustat->irq))
+	if (nsecs_to_cputime64(latest_ns) > cpustat->irq)
 		ret = 1;
 	local_irq_restore(flags);
 	return ret;
@@ -2026,7 +2026,7 @@ static int irqtime_account_si_update(voi
 
 	local_irq_save(flags);
 	latest_ns = this_cpu_read(cpu_softirq_time);
-	if (cputime64_gt(nsecs_to_cputime64(latest_ns), cpustat->softirq))
+	if (nsecs_to_cputime64(latest_ns) > cpustat->softirq)
 		ret = 1;
 	local_irq_restore(flags);
 	return ret;
@@ -3734,19 +3734,17 @@ void account_user_time(struct task_struc
 		       cputime_t cputime_scaled)
 {
 	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
-	cputime64_t tmp;
 
 	/* Add user time to process. */
-	p->utime = cputime_add(p->utime, cputime);
-	p->utimescaled = cputime_add(p->utimescaled, cputime_scaled);
+	p->utime += cputime;
+	p->utimescaled += cputime_scaled;
 	account_group_user_time(p, cputime);
 
 	/* Add user time to cpustat. */
-	tmp = cputime_to_cputime64(cputime);
 	if (TASK_NICE(p) > 0)
-		cpustat->nice = cputime64_add(cpustat->nice, tmp);
+		cpustat->nice += (__force cputime64_t) cputime;
 	else
-		cpustat->user = cputime64_add(cpustat->user, tmp);
+		cpustat->user += (__force cputime64_t) cputime;
 
 	cpuacct_update_stats(p, CPUACCT_STAT_USER, cputime);
 	/* Account for user time used */
@@ -3762,24 +3760,21 @@ void account_user_time(struct task_struc
 static void account_guest_time(struct task_struct *p, cputime_t cputime,
 			       cputime_t cputime_scaled)
 {
-	cputime64_t tmp;
 	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
 
-	tmp = cputime_to_cputime64(cputime);
-
 	/* Add guest time to process. */
-	p->utime = cputime_add(p->utime, cputime);
-	p->utimescaled = cputime_add(p->utimescaled, cputime_scaled);
+	p->utime += cputime;
+	p->utimescaled += cputime_scaled;
 	account_group_user_time(p, cputime);
-	p->gtime = cputime_add(p->gtime, cputime);
+	p->gtime += cputime;
 
 	/* Add guest time to cpustat. */
 	if (TASK_NICE(p) > 0) {
-		cpustat->nice = cputime64_add(cpustat->nice, tmp);
-		cpustat->guest_nice = cputime64_add(cpustat->guest_nice, tmp);
+		cpustat->nice += (__force cputime64_t) cputime;
+		cpustat->guest_nice += (__force cputime64_t) cputime;
 	} else {
-		cpustat->user = cputime64_add(cpustat->user, tmp);
-		cpustat->guest = cputime64_add(cpustat->guest, tmp);
+		cpustat->user += (__force cputime64_t) cputime;
+		cpustat->guest += (__force cputime64_t) cputime;
 	}
 }
 
@@ -3794,15 +3789,13 @@ static inline
 void __account_system_time(struct task_struct *p, cputime_t cputime,
 			cputime_t cputime_scaled, cputime64_t *target_cputime64)
 {
-	cputime64_t tmp = cputime_to_cputime64(cputime);
-
 	/* Add system time to process. */
-	p->stime = cputime_add(p->stime, cputime);
-	p->stimescaled = cputime_add(p->stimescaled, cputime_scaled);
+	p->stime += cputime;
+	p->stimescaled += cputime_scaled;
 	account_group_system_time(p, cputime);
 
 	/* Add system time to cpustat. */
-	*target_cputime64 = cputime64_add(*target_cputime64, tmp);
+	*target_cputime64 += (__force cputime64_t) cputime;
 	cpuacct_update_stats(p, CPUACCT_STAT_SYSTEM, cputime);
 
 	/* Account for system time used */
@@ -3844,9 +3837,8 @@ void account_system_time(struct task_str
 void account_steal_time(cputime_t cputime)
 {
 	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
-	cputime64_t cputime64 = cputime_to_cputime64(cputime);
 
-	cpustat->steal = cputime64_add(cpustat->steal, cputime64);
+	cpustat->steal += (__force cputime64_t) cputime;
 }
 
 /*
@@ -3856,13 +3848,12 @@ void account_steal_time(cputime_t cputim
 void account_idle_time(cputime_t cputime)
 {
 	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
-	cputime64_t cputime64 = cputime_to_cputime64(cputime);
 	struct rq *rq = this_rq();
 
 	if (atomic_read(&rq->nr_iowait) > 0)
-		cpustat->iowait = cputime64_add(cpustat->iowait, cputime64);
+		cpustat->iowait += (__force cputime64_t) cputime;
 	else
-		cpustat->idle = cputime64_add(cpustat->idle, cputime64);
+		cpustat->idle += (__force cputime64_t) cputime;
 }
 
 static __always_inline bool steal_account_process_tick(void)
@@ -3912,16 +3903,15 @@ static void irqtime_account_process_tick
 						struct rq *rq)
 {
 	cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
-	cputime64_t tmp = cputime_to_cputime64(cputime_one_jiffy);
 	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
 
 	if (steal_account_process_tick())
 		return;
 
 	if (irqtime_account_hi_update()) {
-		cpustat->irq = cputime64_add(cpustat->irq, tmp);
+		cpustat->irq += (__force cputime64_t) cputime_one_jiffy;
 	} else if (irqtime_account_si_update()) {
-		cpustat->softirq = cputime64_add(cpustat->softirq, tmp);
+		cpustat->softirq += (__force cputime64_t) cputime_one_jiffy;
 	} else if (this_cpu_ksoftirqd() == p) {
 		/*
 		 * ksoftirqd time do not get accounted in cpu_softirq_time.
@@ -4037,7 +4027,7 @@ void thread_group_times(struct task_stru
 
 void task_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
 {
-	cputime_t rtime, utime = p->utime, total = cputime_add(utime, p->stime);
+	cputime_t rtime, utime = p->utime, total = utime + p->stime;
 
 	/*
 	 * Use CFS's precise accounting:
@@ -4045,11 +4035,11 @@ void task_times(struct task_struct *p, c
 	rtime = nsecs_to_cputime(p->se.sum_exec_runtime);
 
 	if (total) {
-		u64 temp = rtime;
+		u64 temp = (__force u64) rtime;
 
-		temp *= utime;
-		do_div(temp, total);
-		utime = (cputime_t)temp;
+		temp *= (__force u64) utime;
+		do_div(temp, (__force u32) total);
+		utime = (__force cputime_t) temp;
 	} else
 		utime = rtime;
 
@@ -4057,7 +4047,7 @@ void task_times(struct task_struct *p, c
 	 * Compare with previous values, to keep monotonicity:
 	 */
 	p->prev_utime = max(p->prev_utime, utime);
-	p->prev_stime = max(p->prev_stime, cputime_sub(rtime, p->prev_utime));
+	p->prev_stime = max(p->prev_stime, rtime - p->prev_utime);
 
 	*ut = p->prev_utime;
 	*st = p->prev_stime;
@@ -4074,21 +4064,20 @@ void thread_group_times(struct task_stru
 
 	thread_group_cputime(p, &cputime);
 
-	total = cputime_add(cputime.utime, cputime.stime);
+	total = cputime.utime + cputime.stime;
 	rtime = nsecs_to_cputime(cputime.sum_exec_runtime);
 
 	if (total) {
-		u64 temp = rtime;
+		u64 temp = (__force u64) rtime;
 
-		temp *= cputime.utime;
-		do_div(temp, total);
-		utime = (cputime_t)temp;
+		temp *= (__force u64) cputime.utime;
+		do_div(temp, (__force u32) total);
+		utime = (__force cputime_t) temp;
 	} else
 		utime = rtime;
 
 	sig->prev_utime = max(sig->prev_utime, utime);
-	sig->prev_stime = max(sig->prev_stime,
-			      cputime_sub(rtime, sig->prev_utime));
+	sig->prev_stime = max(sig->prev_stime, rtime - sig->prev_utime);
 
 	*ut = sig->prev_utime;
 	*st = sig->prev_stime;
@@ -9330,7 +9319,8 @@ static void cpuacct_update_stats(struct
 	ca = task_ca(tsk);
 
 	do {
-		__percpu_counter_add(&ca->cpustat[idx], val, batch);
+		__percpu_counter_add(&ca->cpustat[idx],
+				     (__force s64) val, batch);
 		ca = ca->parent;
 	} while (ca);
 	rcu_read_unlock();
--- a/kernel/sched_stats.h
+++ b/kernel/sched_stats.h
@@ -283,8 +283,7 @@ static inline void account_group_user_ti
 		return;
 
 	spin_lock(&cputimer->lock);
-	cputimer->cputime.utime =
-		cputime_add(cputimer->cputime.utime, cputime);
+	cputimer->cputime.utime += cputime;
 	spin_unlock(&cputimer->lock);
 }
 
@@ -307,8 +306,7 @@ static inline void account_group_system_
 		return;
 
 	spin_lock(&cputimer->lock);
-	cputimer->cputime.stime =
-		cputime_add(cputimer->cputime.stime, cputime);
+	cputimer->cputime.stime += cputime;
 	spin_unlock(&cputimer->lock);
 }
 
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1621,10 +1621,8 @@ bool do_notify_parent(struct task_struct
 	info.si_uid = __task_cred(tsk)->uid;
 	rcu_read_unlock();
 
-	info.si_utime = cputime_to_clock_t(cputime_add(tsk->utime,
-				tsk->signal->utime));
-	info.si_stime = cputime_to_clock_t(cputime_add(tsk->stime,
-				tsk->signal->stime));
+	info.si_utime = cputime_to_clock_t(tsk->utime + tsk->signal->utime);
+	info.si_stime = cputime_to_clock_t(tsk->stime + tsk->signal->stime);
 
 	info.si_status = tsk->exit_code & 0x7f;
 	if (tsk->exit_code & 0x80)
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1632,8 +1632,8 @@ static void k_getrusage(struct task_stru
 
 		case RUSAGE_SELF:
 			thread_group_times(p, &tgutime, &tgstime);
-			utime = cputime_add(utime, tgutime);
-			stime = cputime_add(stime, tgstime);
+			utime += tgutime;
+			stime += tgstime;
 			r->ru_nvcsw += p->signal->nvcsw;
 			r->ru_nivcsw += p->signal->nivcsw;
 			r->ru_minflt += p->signal->min_flt;
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -127,7 +127,7 @@ void acct_update_integrals(struct task_s
 
 		local_irq_save(flags);
 		time = tsk->stime + tsk->utime;
-		dtime = cputime_sub(time, tsk->acct_timexpd);
+		dtime = time - tsk->acct_timexpd;
 		jiffies_to_timeval(cputime_to_jiffies(dtime), &value);
 		delta = value.tv_sec;
 		delta = delta * USEC_PER_SEC + value.tv_usec;
@@ -148,7 +148,7 @@ void acct_update_integrals(struct task_s
  */
 void acct_clear_integrals(struct task_struct *tsk)
 {
-	tsk->acct_timexpd = 0;
+	tsk->acct_timexpd = cputime_zero;
 	tsk->acct_rss_mem1 = 0;
 	tsk->acct_vm_mem1 = 0;
 }

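A note on the two set_process_cpu_timer() hunks above, offered as a sketch rather than a statement of what the final patch does: expires_gt(), whose body appears earlier in this diff, treats a stored expiry of cputime_zero as "nothing armed yet", while a bare `>` comparison does not. The user-space sketch below shows where the two forms diverge. It assumes a uint64_t stand-in for cputime_t, and cache_earliest() and cache_earliest_plain() are hypothetical helper names, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's cputime_t; the real type is arch-specific. */
typedef uint64_t cputime_t;
#define cputime_zero ((cputime_t) 0)

/* Mirrors the helper body shown in the diff: zero means "no expiry armed". */
static inline bool expires_gt(cputime_t expires, cputime_t new_exp)
{
	return expires == cputime_zero || expires > new_exp;
}

/* Hypothetical: cache new_exp if nothing is armed or if it fires earlier. */
static inline void cache_earliest(cputime_t *cached, cputime_t new_exp)
{
	if (expires_gt(*cached, new_exp))
		*cached = new_exp;
}

/* Hypothetical: the same update written with a bare comparison. */
static inline void cache_earliest_plain(cputime_t *cached, cputime_t new_exp)
{
	if (*cached > new_exp)
		*cached = new_exp;
}
```

With a cached expiry of zero, cache_earliest() installs the new value while cache_earliest_plain() leaves it at zero; for any nonzero cached expiry the two agree.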

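The task_times() and thread_group_times() hunks above keep the same arithmetic and only add explicit casts: the precise runtime is split in the ratio of the tick-sampled utime to utime + stime, and the results are clamped so they never go backwards. A minimal user-space sketch of that calculation follows. It assumes a uint64_t stand-in for cputime_t and plain 64-bit division in place of do_div(); scale_times(), struct prev_times and cputime_max() are hypothetical names.

```c
#include <stdint.h>

/* Stand-in for the kernel's cputime_t. */
typedef uint64_t cputime_t;

/* Hypothetical container for the previously reported values. */
struct prev_times {
	cputime_t utime;
	cputime_t stime;
};

static inline cputime_t cputime_max(cputime_t a, cputime_t b)
{
	return a > b ? a : b;
}

/*
 * Split rtime into user and system parts in the ratio utime : stime.
 * Assumes rtime * utime fits in 64 bits and that rtime never decreases
 * between calls, so rtime - prev->utime cannot wrap.
 */
static void scale_times(cputime_t utime, cputime_t stime, cputime_t rtime,
			struct prev_times *prev,
			cputime_t *ut, cputime_t *st)
{
	cputime_t total = utime + stime;

	if (total)
		utime = rtime * utime / total;
	else
		utime = rtime;

	/* Compare with previous values, to keep monotonicity. */
	prev->utime = cputime_max(prev->utime, utime);
	prev->stime = cputime_max(prev->stime, rtime - prev->utime);

	*ut = prev->utime;
	*st = prev->stime;
}
```

For example, with utime = 30, stime = 10 and rtime = 80 the scaled user time is 80 * 30 / 40 = 60 and the system time is the remaining 20.
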
-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* Re: Linux 3.1-rc9
  2011-10-20 14:36                 ` Linux 3.1-rc9 Martin Schwidefsky
@ 2011-10-23 11:34                   ` Ingo Molnar
  2011-10-24  7:48                     ` Martin Schwidefsky
  0 siblings, 1 reply; 156+ messages in thread
From: Ingo Molnar @ 2011-10-23 11:34 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Thomas Gleixner


* Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:

> +#define cputime_zero			((__force cputime_t) 0ULL)
> +#define cputime64_zero			((__force cputime64_t) 0ULL)

Hm, why are these still needed?

This:

		if (*newval == cputime_zero)
			return;

Could be written as the much simpler:

		if (!*newval)
 			return;

with no ill effect that i can see.

Thanks,
	Ingo


* Re: Linux 3.1-rc9
  2011-10-23 11:34                   ` Ingo Molnar
@ 2011-10-24  7:48                     ` Martin Schwidefsky
  2011-10-24  7:51                       ` Linus Torvalds
  0 siblings, 1 reply; 156+ messages in thread
From: Martin Schwidefsky @ 2011-10-24  7:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Thomas Gleixner

On Sun, 23 Oct 2011 13:34:22 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:
> 
> > +#define cputime_zero			((__force cputime_t) 0ULL)
> > +#define cputime64_zero			((__force cputime64_t) 0ULL)
> 
> Hm, why are these still needed?
> 
> This:
> 
> 		if (*newval == cputime_zero)
> 			return;
> 
> Could be written as the much simpler:
> 
> 		if (!*newval)
>  			return;
> 
> with no ill effect that i can see.

These types are still there because cputime_t can be u32 or u64. E.g. this

  timer->expires.cpu = 0;

will give the following sparse warning

  kernel/posix-cpu-timers.c:463:46: warning: implicit cast to nocast type

if your architecture happens to have a u64 as cputime_t.
We could get rid of cputime64_t as it always should be a u64. To keep
things symmetrical I chose to keep both defines.
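
To make the warning above concrete, here is a minimal sketch of how the annotated type and the zero define fit together. It assumes the usual arrangement where __nocast and __force are checker attributes under sparse and expand to nothing for a normal compile; uint64_t stands in for the u64 case, and ticks_to_cputime() and timer_is_disarmed() are hypothetical helpers for illustration only.

```c
#include <stdint.h>

/*
 * Under sparse (__CHECKER__ defined) these become checker attributes;
 * for a normal compile they expand to nothing.
 */
#ifdef __CHECKER__
# define __nocast	__attribute__((nocast))
# define __force	__attribute__((force))
#else
# define __nocast
# define __force
#endif

/* Stand-in for an architecture where cputime_t is 64 bits wide. */
typedef uint64_t __nocast cputime_t;

/* The explicit __force cast keeps the checker quiet about the constant. */
#define cputime_zero	((__force cputime_t) 0ULL)

/* Hypothetical helper: wrap a raw tick count into cputime_t explicitly. */
static inline cputime_t ticks_to_cputime(uint64_t ticks)
{
	return (__force cputime_t) ticks;
}

/* Hypothetical helper: zero is the "timer not armed" value. */
static inline int timer_is_disarmed(cputime_t expires)
{
	return expires == cputime_zero;
}
```

A regular compiler sees only plain integer arithmetic here; the annotations matter only when the code is run through sparse.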

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* Re: Linux 3.1-rc9
  2011-10-24  7:48                     ` Martin Schwidefsky
@ 2011-10-24  7:51                       ` Linus Torvalds
  2011-10-24  8:08                         ` Martin Schwidefsky
  0 siblings, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2011-10-24  7:51 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Ingo Molnar, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Thomas Gleixner

On Mon, Oct 24, 2011 at 9:48 AM, Martin Schwidefsky
<schwidefsky@de.ibm.com> wrote:
>
> These types are still there because cputime_t can be u32 or u64. E.g. this
>
>  timer->expires.cpu = 0;
>
> will give the following sparse warning
>
>  kernel/posix-cpu-timers.c:463:46: warning: implicit cast to nocast type

Ok, we should probably special-case zero for that case too (we
consider zero to be very special - it's not only the NULL pointer, but
0 is special for the bitwise types etc). So this is very arguably a
sparse issue: casting zero is special.

                  Linus


* Re: Linux 3.1-rc9
  2011-10-24  7:51                       ` Linus Torvalds
@ 2011-10-24  8:08                         ` Martin Schwidefsky
  0 siblings, 0 replies; 156+ messages in thread
From: Martin Schwidefsky @ 2011-10-24  8:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Simon Kirby, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Thomas Gleixner

On Mon, 24 Oct 2011 09:51:09 +0200
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Oct 24, 2011 at 9:48 AM, Martin Schwidefsky
> <schwidefsky@de.ibm.com> wrote:
> >
> > These types are still there because cputime_t can be u32 or u64. E.g. this
> >
> >  timer->expires.cpu = 0;
> >
> > will give the following sparse warning
> >
> >  kernel/posix-cpu-timers.c:463:46: warning: implicit cast to nocast type
> 
> Ok, we should probably special-case zero for that case too (we
> consider zero to be very special - it's not only the NULL pointer, but
> 0 is special for the bitwise types etc). So this is very arguably a
> sparse issue: casting zero is special.

Ok, cool. In that case I'll cook up a patch without cputime_zero &
cputime64_zero and put it on the cputime branch on git390.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



* Re: Linux 3.1-rc9
  2011-10-18 19:48                                   ` Thomas Gleixner
  2011-10-18 20:12                                     ` Linus Torvalds
@ 2011-10-24 19:02                                     ` Simon Kirby
  2011-10-25  7:13                                       ` Linus Torvalds
  2011-10-25 20:20                                       ` Simon Kirby
  1 sibling, 2 replies; 156+ messages in thread
From: Simon Kirby @ 2011-10-24 19:02 UTC (permalink / raw)
  To: Thomas Gleixner, David Miller
  Cc: Peter Zijlstra, Linus Torvalds, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar

On Tue, Oct 18, 2011 at 09:48:51PM +0200, Thomas Gleixner wrote:

> On Tue, 18 Oct 2011, Simon Kirby wrote:
> > Looks good running on three boxes since this morning (unpatched kernel
> > hangs in ~15 minutes).
> > 
> > While I have your eyes, does this hang trace make any sense (which
> > happened a couple of times with your previous patch applied)?
> > 
> > http://0x.ca/sim/ref/3.1-rc9/3.1-rc9-tcp-lockup.log
> > 
> > I don't see how all CPUs could be spinning on the same lock without
> > reentry, and I don't see any in the backtraces.
> 
> Weird.
> 
> Which version of Peter's patches was this, the extra lock or the
> atomic64 thingy?

The first one with the extra lock. I never tried the atomic64 one.
Anyway, that's fixed now.

> It does not look related. Could you try to reproduce that problem with
> lockdep enabled? lockdep might make it go away, but it's definitely
> worth a try.

Trying now...

...Whoops, never sent this email.

Ok, hit the hang about 4 more times, but only this morning on a box with
a serial cable attached. Yay!

Simon-

[216695.579770] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216695.589435] 
[216695.589437] =======================================================
[216695.593380] [ INFO: possible circular locking dependency detected ]
[216695.593380] 3.1.0-rc10-hw-lockdep+ #51
[216695.593380] -------------------------------------------------------
[216695.593380] kworker/0:1/0 is trying to acquire lock:
[216695.593380]  (&icsk->icsk_retransmit_timer){+.-.-.}, at: [<ffffffff8106cc88>] run_timer_softirq+0x198/0x410
[216695.593380] 
[216695.593380] but task is already holding lock:
[216695.593380]  (slock-AF_INET){+.-.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
[216695.593380] 
[216695.593380] which lock already depends on the new lock.
[216695.593380] 
[216695.593380] 
[216695.593380] the existing dependency chain (in reverse order) is:
[216695.593380] 
[216695.593380] -> #1 (slock-AF_INET){+.-.-.}:
[216695.593380]        [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[216695.593380]        [<ffffffff816f55fc>] _raw_spin_lock+0x3c/0x50
[216695.593380]        [<ffffffff81661cc3>] tcp_write_timer+0x23/0x230
[216695.682901]        [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[216695.682901]        [<ffffffff810640b8>] __do_softirq+0x138/0x250
[216695.682901]        [<ffffffff817002bc>] call_softirq+0x1c/0x30
[216695.682901]        [<ffffffff810153c5>] do_softirq+0x95/0xd0
[216695.682901]        [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[216695.682901]        [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[216695.682901]        [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[216695.682901]        [<ffffffff81013156>] cpu_idle+0x96/0xf0
[216695.682901]        [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[216695.682901] 
[216695.682901] -> #0 (&icsk->icsk_retransmit_timer){+.-.-.}:
[216695.682901]        [<ffffffff8109a000>] __lock_acquire+0x2040/0x2180
[216695.682901]        [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[216695.682901]        [<ffffffff8106cd09>] run_timer_softirq+0x219/0x410
[216695.682901]        [<ffffffff810640b8>] __do_softirq+0x138/0x250
[216695.682901]        [<ffffffff817002bc>] call_softirq+0x1c/0x30
[216695.682901]        [<ffffffff810153c5>] do_softirq+0x95/0xd0
[216695.682901]        [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[216695.682901]        [<ffffffff81014b74>] do_IRQ+0x64/0xe0
[216695.682901]        [<ffffffff816f6273>] ret_from_intr+0x0/0x1a
[216695.682901]        [<ffffffff81013156>] cpu_idle+0x96/0xf0
[216695.682901]        [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[216695.682901] 
[216695.682901] other info that might help us debug this:
[216695.682901] 
[216695.682901]  Possible unsafe locking scenario:
[216695.682901] 
[216695.682901]        CPU0                    CPU1
[216695.682901]        ----                    ----
[216695.682901]   lock(slock-AF_INET);
[216695.682901]                                lock(&icsk->icsk_retransmit_timer);
[216695.682901]                                lock(slock-AF_INET);
[216695.682901]   lock(&icsk->icsk_retransmit_timer);
[216695.682901] 
[216695.682901]  *** DEADLOCK ***
[216695.682901] 
[216695.682901] 1 lock held by kworker/0:1/0:
[216695.682901]  #0:  (slock-AF_INET){+.-.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
[216695.682901] 
[216695.682901] stack backtrace:
[216695.682901] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51
[216695.682901] Call Trace:
[216695.682901]  <IRQ>  [<ffffffff81097eab>] print_circular_bug+0x21b/0x330
[216695.682901]  [<ffffffff8109a000>] __lock_acquire+0x2040/0x2180
[216695.682901]  [<ffffffffa001d6e2>] ? nf_conntrack_free+0x42/0x50 [nf_conntrack]
[216695.682901]  [<ffffffffa001d6e2>] ? nf_conntrack_free+0x42/0x50 [nf_conntrack]
[216695.682901]  [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[216695.682901]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[216695.682901]  [<ffffffff81096b4c>] ? trace_hardirqs_on_caller+0x7c/0x1c0
[216695.682901]  [<ffffffff8106cd09>] run_timer_softirq+0x219/0x410
[216695.682901]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[216695.682901]  [<ffffffff816f16c1>] ? printk+0x67/0x69
[216695.682901]  [<ffffffff81661ca0>] ? tcp_delack_timer+0x230/0x230
[216695.682901]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[216695.682901]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[216695.682901]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[216695.682901]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[216695.682901]  [<ffffffff81014b74>] do_IRQ+0x64/0xe0
[216695.682901]  [<ffffffff816f6273>] common_interrupt+0x73/0x73
[216695.682901]  <EOI>  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[216695.682901]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[216695.682901]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[216695.682901]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[216696.019296] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000105?
[216697.762956] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216698.597297] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216701.489681] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216701.667999] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216704.580592] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[216709.468971] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[216712.845904] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216716.588502] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[216725.072958] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[216725.603879] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216725.828374] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216727.588978] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216735.513864] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216740.581530] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[216756.278571] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[218855.312903] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[218855.323133] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[218858.293355] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[218864.301938] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[218876.333821] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[218885.332651] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[218900.313590] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[220821.012017] TCP: Peer 32.176.160.153:49226/80 unexpectedly shrunk window 665256753:665268993 (repaired)
[221075.224300] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221075.234579] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221075.277593] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221075.780515] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221075.780713] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221077.349279] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221077.905587] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[221077.915567] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[221081.498430] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221081.703277] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[221082.088513] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221082.167985] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221089.772578] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221090.487927] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[221090.686394] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221094.587131] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221105.255699] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[221105.280699] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221105.291634] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221106.325794] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221107.286029] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221107.622736] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221107.734471] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[221120.381643] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[223936.264020] BUG: soft lockup - CPU#0 stuck for 22s! [swapper:0]
[223936.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223936.268002] irq event stamp: 2595159887
[223936.268002] hardirqs last  enabled at (2595159887): [<ffffffff8101b805>] mwait_idle+0x145/0x170
[223936.268002] hardirqs last disabled at (2595159886): [<ffffffff81013139>] cpu_idle+0x79/0xf0
[223936.268002] softirqs last  enabled at (2595159878): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
[223936.268002] softirqs last disabled at (2595159873): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.268002] CPU 0 
[223936.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223936.268002] 
[223936.268002] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223936.268002] RIP: 0010:[<ffffffff813a4ee3>]  [<ffffffff813a4ee3>] delay_tsc+0x73/0xd0
[223936.268002] RSP: 0018:ffff88022fc03cf0  EFLAGS: 00000202
[223936.268002] RAX: 00017b5d5932dd02 RBX: ffffffff816f6334 RCX: 000000005932dd02
[223936.372028] BUG: soft lockup - CPU#1 stuck for 22s! [kworker/0:0:0]
[223936.372031] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223936.372042] irq event stamp: 2598787699
[223936.372044] hardirqs last  enabled at (2598787699): [<ffffffff8101b805>] mwait_idle+0x145/0x170
[223936.372054] hardirqs last disabled at (2598787698): [<ffffffff81013139>] cpu_idle+0x79/0xf0
[223936.372058] softirqs last  enabled at (2598787696): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
[223936.372063] softirqs last disabled at (2598787681): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.372069] CPU 1 
[223936.372070] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223936.372079] 
[223936.372081] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223936.372086] RIP: 0010:[<ffffffff8101afab>]  [<ffffffff8101afab>] native_read_tsc+0xb/0x20
[223936.372091] RSP: 0018:ffff88022fc43ce0  EFLAGS: 00000202
[223936.372093] RAX: 0000000000017b5d RBX: ffffffff816f6334 RCX: 00000000652f810e
[223936.372096] RDX: 0000000000017b5d RSI: 0000000000000000 RDI: 0000000000000001
[223936.372098] RBP: ffff88022fc43ce0 R08: 00000000652f80c8 R09: 0000000000000000
[223936.372101] R10: ffff88022690dd60 R11: 0000000000000000 R12: ffff88022fc43c58
[223936.372103] R13: ffffffff816feb33 R14: ffff88022fc43ce0 R15: 00000000180bbeb8
[223936.372106] FS:  0000000000000000(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
[223936.372108] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223936.372111] CR2: 00007f472ba6a6fc CR3: 0000000126bb7000 CR4: 00000000000006e0
[223936.372113] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223936.372116] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223936.372119] Process kworker/0:0 (pid: 0, threadinfo ffff880226928000, task ffff88022690dd60)
[223936.372121] Stack:
[223936.372123]  ffff88022fc43d30 ffffffff813a4eaf ffff880226928000 00000000652f8090
[223936.372128]  000000012fc43d18 ffff88002e90e348 00000000180bbeb8 000000006efcdc62
[223936.372132]  0000000000000001 ffff88022690dd60 ffff88022fc43d40 ffffffff813a4f6a
[223936.372136] Call Trace:
[223936.372139]  <IRQ> 
[223936.372144]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
[223936.372148]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223936.372153]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223936.372159]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223936.372164]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223936.372168]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223936.372174]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223936.372178]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223936.372182]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223936.372186]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223936.372190]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.372194]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223936.372198]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223936.372203]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223936.372208]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223936.372210]  <EOI> 
[223936.372214]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.372218]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223936.372222]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.372226]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223936.372230]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223936.372233] Code: a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 89 c1 48 89 d0 
[223936.372253]  c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 
[223936.372262] Call Trace:
[223936.372264]  <IRQ>  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
[223936.372269]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223936.372272]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223936.372276]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223936.372280]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223936.372283]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223936.372286]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223936.372289]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223936.372293]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223936.372297]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223936.372300]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.372303]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223936.372307]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223936.372310]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223936.372313]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223936.372315]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.372321]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223936.372324]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.372327]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223936.372331]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223936.476032] BUG: soft lockup - CPU#2 stuck for 23s! [kworker/0:1:0]
[223936.476034] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223936.476043] irq event stamp: 2613824057
[223936.476045] hardirqs last  enabled at (2613824057): [<ffffffff8101b805>] mwait_idle+0x145/0x170
[223936.476050] hardirqs last disabled at (2613824056): [<ffffffff81013139>] cpu_idle+0x79/0xf0
[223936.476054] softirqs last  enabled at (2613824048): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
[223936.476058] softirqs last disabled at (2613824031): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.476062] CPU 2 
[223936.476063] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223936.476071] 
[223936.476073] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223936.476077] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
[223936.476082] RSP: 0018:ffff88022fc83ce0  EFLAGS: 00000202
[223936.476084] RAX: 0000000070ba7dfc RBX: ffffffff813a60ae RCX: 0000000070ba7dc4
[223936.476086] RDX: 0000000000017b5d RSI: 0000000000000000 RDI: 0000000000000001
[223936.476089] RBP: ffff88022fc83ce0 R08: 0000000070ba7d7e R09: 0000000000000000
[223936.476091] R10: ffff880226948000 R11: 0000000000000000 R12: ffff88022fc83c58
[223936.476093] R13: ffffffff816feb33 R14: ffff88022fc83ce0 R15: 00000000182285f9
[223936.476096] FS:  0000000000000000(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
[223936.476099] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223936.476101] CR2: 00007f25e7874d7f CR3: 0000000124c0d000 CR4: 00000000000006e0
[223936.476104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223936.476106] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223936.476109] Process kworker/0:1 (pid: 0, threadinfo ffff88022693e000, task ffff880226948000)
[223936.476111] Stack:
[223936.476113]  ffff88022fc83d30 ffffffff813a4ee8 ffff88022693e000 0000000070ba7dc4
[223936.476117]  00000002ffffff10 ffff88006afd8948 00000000182285f9 000000006efcdc62
[223936.476121]  0000000000000001 ffff880226948000 ffff88022fc83d40 ffffffff813a4f6a
[223936.476126] Call Trace:
[223936.476128]  <IRQ> 
[223936.476132]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223936.476136]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223936.476141]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223936.476147]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223936.476153]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223936.476157]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223936.476163]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223936.476167]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223936.476171]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223936.476176]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223936.476180]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.476184]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223936.476187]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223936.476193]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223936.476197]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223936.476199]  <EOI> 
[223936.476203]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.476207]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223936.476211]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.476215]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223936.476219]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223936.476222] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 
[223936.476241]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 
[223936.476251] Call Trace:
[223936.476252]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223936.476257]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223936.476261]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223936.476265]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223936.476268]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223936.476272]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223936.476275]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223936.476278]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223936.476282]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223936.476286]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223936.476289]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.476292]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223936.476295]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223936.476299]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223936.476302]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223936.476304]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.476310]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223936.476313]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.476316]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223936.476320]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223936.580039] BUG: soft lockup - CPU#3 stuck for 23s! [kworker/0:1:0]
[223936.580041] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223936.580050] irq event stamp: 2615464042
[223936.580052] hardirqs last  enabled at (2615464042): [<ffffffff816f5edb>] _raw_spin_unlock_irq+0x2b/0x50
[223936.580057] hardirqs last disabled at (2615464041): [<ffffffff816f56a8>] _raw_spin_lock_irq+0x18/0x60
[223936.580061] softirqs last  enabled at (2615463964): [<ffffffff81063cce>] _local_bh_enable+0xe/0x10
[223936.580065] softirqs last disabled at (2615463965): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.580069] CPU 3 
[223936.580070] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223936.580078] 
[223936.580080] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223936.580085] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
[223936.580090] RSP: 0018:ffff88022fcc3ce0  EFLAGS: 00000202
[223936.580092] RAX: 000000007c457b06 RBX: ffffffff816f6334 RCX: 000000007c457ad5
[223936.580094] RDX: 0000000000017b5d RSI: ffffffff818f9896 RDI: 0000000000000001
[223936.580097] RBP: ffff88022fcc3ce0 R08: 000000007c457a88 R09: 0000000000000000
[223936.580099] R10: ffff880226981f20 R11: 0000000000000000 R12: ffff88022fcc3c58
[223936.580101] R13: ffffffff816feb33 R14: ffff88022fcc3ce0 R15: 00000000183a1380
[223936.580104] FS:  0000000000000000(0000) GS:ffff88022fcc0000(0000) knlGS:0000000000000000
[223936.580107] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223936.580109] CR2: 0000000000f38820 CR3: 0000000104b52000 CR4: 00000000000006e0
[223936.580112] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223936.580114] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223936.580117] Process kworker/0:1 (pid: 0, threadinfo ffff880226988000, task ffff880226981f20)
[223936.580119] Stack:
[223936.580120]  ffff88022fcc3d30 ffffffff813a4ee8 ffff880226988000 000000007c457ad5
[223936.580125]  00000003ffffff10 ffff880031438948 00000000183a1380 000000006efcdc62
[223936.580129]  0000000000000001 ffff880226981f20 ffff88022fcc3d40 ffffffff813a4f6a
[223936.580133] Call Trace:
[223936.580135]  <IRQ> 
[223936.580138]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223936.580142]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223936.580147]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223936.580151]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223936.580156]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223936.580160]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223936.580164]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223936.580168]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223936.580172]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223936.580176]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223936.580181]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.580185]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223936.580188]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223936.580192]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223936.580196]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223936.580199]  <EOI> 
[223936.580202]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.580206]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223936.580211]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.580214]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223936.580218]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223936.580221] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 
[223936.580240]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 
[223936.580250] Call Trace:
[223936.580251]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223936.580256]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223936.580260]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223936.580264]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223936.580267]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223936.580270]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223936.580274]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223936.580277]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223936.580280]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223936.580284]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223936.580288]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.580291]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223936.580294]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223936.580297]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223936.580300]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223936.580302]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.580308]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223936.580312]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.580315]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223936.580318]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223936.268002] RDX: 000000005932dd02 RSI: ffffffff818f9896 RDI: 0000000000000001
[223936.268002] RBP: ffff88022fc03d30 R08: 000000005932dcb5 R09: 0000000000000000
[223936.268002] R10: ffffffff81a2b020 R11: 0000000000000000 R12: ffff88022fc03c68
[223936.268002] R13: ffffffff816feb33 R14: ffff88022fc03d30 R15: 0000000017f328cd
[223936.268002] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
[223936.268002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223936.268002] CR2: 00007f25e7bc13a0 CR3: 00000001426fc000 CR4: 00000000000006f0
[223936.268002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223936.268002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223936.268002] Process swapper (pid: 0, threadinfo ffffffff819a6000, task ffffffff81a2b020)
[223936.268002] Stack:
[223936.268002]  ffffffff819a6000 000000005932dd02 000000002fc03d18 ffff8801f6c22448
[223936.268002]  0000000017f328cd 000000006efcdc62 0000000000000001 ffffffff81a2b020
[223936.268002]  ffff88022fc03d40 ffffffff813a4f6a ffff88022fc03d80 ffffffff813ac2ab
[223936.268002] Call Trace:
[223936.268002]  <IRQ> 
[223936.268002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223936.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223936.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223936.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223936.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223936.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223936.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223936.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223936.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223936.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223936.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223936.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223936.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223936.268002]  <EOI> 
[223936.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223936.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223936.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
[223936.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
[223936.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
[223936.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
[223936.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
[223936.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
[223936.268002] Code: 4c 89 7d c8 eb 1f 66 90 48 8b 45 c0 83 68 1c 01 f3 90 83 40 1c 01 65 44 8b 3c 25 50 d3 00 00 44 3b 7d d4 75 3b 66 66 90 0f ae e8 <e8> b8 60 c7 ff 66 90 4c 63 c0 4c 89 c0 48 2b 45 c8 48 39 d8 72 
[223936.268002] Call Trace:
[223936.268002]  <IRQ>  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223936.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223936.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223936.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223936.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223936.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223936.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223936.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223936.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223936.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223936.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223936.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223936.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223936.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223936.268002]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223936.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223936.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223936.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
[223936.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
[223936.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
[223936.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
[223936.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
[223936.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
[223964.264018] BUG: soft lockup - CPU#0 stuck for 22s! [swapper:0]
[223964.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223964.268002] irq event stamp: 2595159887
[223964.268002] hardirqs last  enabled at (2595159887): [<ffffffff8101b805>] mwait_idle+0x145/0x170
[223964.268002] hardirqs last disabled at (2595159886): [<ffffffff81013139>] cpu_idle+0x79/0xf0
[223964.268002] softirqs last  enabled at (2595159878): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
[223964.268002] softirqs last disabled at (2595159873): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.268002] CPU 0 
[223964.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223964.268002] 
[223964.268002] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223964.268002] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
[223964.268002] RSP: 0018:ffff88022fc03ce0  EFLAGS: 00000202
[223964.268002] RAX: 000000007cb6c61b RBX: ffffffff816f6334 RCX: 000000007cb6c5e3
[223964.372025] BUG: soft lockup - CPU#1 stuck for 23s! [kworker/0:0:0]
[223964.372027] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223964.372036] irq event stamp: 2598787699
[223964.372037] hardirqs last  enabled at (2598787699): [<ffffffff8101b805>] mwait_idle+0x145/0x170
[223964.372042] hardirqs last disabled at (2598787698): [<ffffffff81013139>] cpu_idle+0x79/0xf0
[223964.372045] softirqs last  enabled at (2598787696): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
[223964.372049] softirqs last disabled at (2598787681): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.372052] CPU 1 
[223964.372053] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223964.372061] 
[223964.372063] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223964.372067] RIP: 0010:[<ffffffff8101afa0>]  [<ffffffff8101afa0>] read_persistent_clock+0x30/0x30
[223964.372072] RSP: 0018:ffff88022fc43ce8  EFLAGS: 00000202
[223964.372074] RAX: 0000000000000001 RBX: ffff88022fc43c68 RCX: 0000000088b369fd
[223964.372076] RDX: 0000000000000200 RSI: 0000000000000000 RDI: 0000000000000001
[223964.372078] RBP: ffff88022fc43d30 R08: ffffffff88b369fd R09: 0000000000000000
[223964.372081] R10: ffff88022690dd60 R11: 0000000000000000 R12: ffff88022fc43c58
[223964.372083] R13: ffffffff816feb33 R14: ffff88022fc43d30 R15: 00000000307e58b4
[223964.372086] FS:  0000000000000000(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
[223964.372089] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223964.372091] CR2: 00007f472ba6a6fc CR3: 0000000126bb7000 CR4: 00000000000006e0
[223964.372093] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223964.372096] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223964.372098] Process kworker/0:0 (pid: 0, threadinfo ffff880226928000, task ffff88022690dd60)
[223964.372100] Stack:
[223964.372102]  ffffffff813a4eaf ffff880226928000 ffffffff88b369c5 000000012fc43d18
[223964.372106]  ffff88002e90e348 00000000307e58b4 000000006efcdc62 0000000000000001
[223964.372111]  ffff88022690dd60 ffff88022fc43d40 ffffffff813a4f6a ffff88022fc43d80
[223964.372115] Call Trace:
[223964.372116]  <IRQ> 
[223964.372119]  [<ffffffff813a4eaf>] ? delay_tsc+0x3f/0xd0
[223964.372123]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223964.372127]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223964.372132]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223964.372136]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223964.372140]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223964.372144]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223964.372148]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223964.372153]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223964.372158]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223964.372162]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.372166]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223964.372170]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223964.372174]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223964.372178]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223964.372180]  <EOI> 
[223964.372184]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.372188]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223964.372192]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.372196]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223964.372200]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223964.372203] Code: 48 89 fb 48 83 ec 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 
[223964.372221]  48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 
[223964.372231] Call Trace:
[223964.372232]  <IRQ>  [<ffffffff813a4eaf>] ? delay_tsc+0x3f/0xd0
[223964.372237]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223964.372241]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223964.372245]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223964.372248]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223964.372251]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223964.372255]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223964.372258]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223964.372261]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223964.372265]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223964.372268]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.372271]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223964.372275]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223964.372278]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223964.372281]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223964.372282]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.372288]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223964.372292]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.372295]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223964.372298]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223964.476031] BUG: soft lockup - CPU#2 stuck for 23s! [kworker/0:1:0]
[223964.476033] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223964.476042] irq event stamp: 2613824057
[223964.476043] hardirqs last  enabled at (2613824057): [<ffffffff8101b805>] mwait_idle+0x145/0x170
[223964.476048] hardirqs last disabled at (2613824056): [<ffffffff81013139>] cpu_idle+0x79/0xf0
[223964.476051] softirqs last  enabled at (2613824048): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
[223964.476055] softirqs last disabled at (2613824031): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.476059] CPU 2 
[223964.476060] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223964.476067] 
[223964.476070] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223964.476074] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
[223964.476078] RSP: 0018:ffff88022fc83ce0  EFLAGS: 00000206
[223964.476080] RAX: 00000000943e6715 RBX: ffffffff816f6334 RCX: 00000000943e66dd
[223964.476083] RDX: 0000000000017b69 RSI: 0000000000000000 RDI: 0000000000000001
[223964.476085] RBP: ffff88022fc83ce0 R08: ffffffff943e6697 R09: 0000000000000000
[223964.476087] R10: ffff880226948000 R11: 0000000000000000 R12: ffff88022fc83c58
[223964.476090] R13: ffffffff816feb33 R14: ffff88022fc83ce0 R15: 000000003094ad30
[223964.476092] FS:  0000000000000000(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
[223964.476095] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223964.476097] CR2: 00007f25e7874d7f CR3: 0000000124c0d000 CR4: 00000000000006e0
[223964.476100] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223964.476102] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223964.476105] Process kworker/0:1 (pid: 0, threadinfo ffff88022693e000, task ffff880226948000)
[223964.476107] Stack:
[223964.476108]  ffff88022fc83d30 ffffffff813a4ee8 ffff88022693e000 ffffffff943e66dd
[223964.476113]  00000002ffffff10 ffff88006afd8948 000000003094ad30 000000006efcdc62
[223964.476117]  0000000000000001 ffff880226948000 ffff88022fc83d40 ffffffff813a4f6a
[223964.476121] Call Trace:
[223964.476123]  <IRQ> 
[223964.476126]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223964.476130]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223964.476134]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223964.476139]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223964.476143]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223964.476147]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223964.476151]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223964.476155]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223964.476159]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223964.476164]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223964.476168]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.476172]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223964.476176]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223964.476180]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223964.476184]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223964.476186]  <EOI> 
[223964.476190]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.476194]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223964.476198]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.476202]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223964.476206]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223964.476208] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 
[223964.476227]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 
[223964.476236] Call Trace:
[223964.476238]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223964.476243]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223964.476246]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223964.476250]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223964.476254]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223964.476257]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223964.476260]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223964.476264]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223964.476267]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223964.476271]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223964.476274]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.476277]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223964.476281]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223964.476284]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223964.476287]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223964.476289]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.476295]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223964.476298]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.476301]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223964.476304]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223964.580038] BUG: soft lockup - CPU#3 stuck for 23s! [kworker/0:1:0]
[223964.580040] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223964.580049] irq event stamp: 2615464042
[223964.580050] hardirqs last  enabled at (2615464042): [<ffffffff816f5edb>] _raw_spin_unlock_irq+0x2b/0x50
[223964.580054] hardirqs last disabled at (2615464041): [<ffffffff816f56a8>] _raw_spin_lock_irq+0x18/0x60
[223964.580058] softirqs last  enabled at (2615463964): [<ffffffff81063cce>] _local_bh_enable+0xe/0x10
[223964.580062] softirqs last disabled at (2615463965): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.580066] CPU 3 
[223964.580067] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223964.580075] 
[223964.580077] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223964.580081] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
[223964.580086] RSP: 0018:ffff88022fcc3ce0  EFLAGS: 00000206
[223964.580088] RAX: 000000009fc963af RBX: ffffffff816f6334 RCX: 000000009fc96377
[223964.580090] RDX: 0000000000017b69 RSI: ffffffff818f9896 RDI: 0000000000000001
[223964.580093] RBP: ffff88022fcc3ce0 R08: ffffffff9fc96331 R09: 0000000000000000
[223964.580095] R10: ffff880226981f20 R11: 0000000000000000 R12: ffff88022fcc3c58
[223964.580097] R13: ffffffff816feb33 R14: ffff88022fcc3ce0 R15: 0000000030ac88b0
[223964.580100] FS:  0000000000000000(0000) GS:ffff88022fcc0000(0000) knlGS:0000000000000000
[223964.580103] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223964.580105] CR2: 0000000000f38820 CR3: 0000000104b52000 CR4: 00000000000006e0
[223964.580107] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223964.580110] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223964.580112] Process kworker/0:1 (pid: 0, threadinfo ffff880226988000, task ffff880226981f20)
[223964.580114] Stack:
[223964.580116]  ffff88022fcc3d30 ffffffff813a4ee8 ffff880226988000 ffffffff9fc96377
[223964.580120]  000000039c3b34d8 ffff880031438948 0000000030ac88b0 000000006efcdc62
[223964.580124]  0000000000000001 ffff880226981f20 ffff88022fcc3d40 ffffffff813a4f6a
[223964.580128] Call Trace:
[223964.580130]  <IRQ> 
[223964.580133]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223964.580137]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223964.580141]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223964.580146]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223964.580150]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223964.580154]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223964.580158]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223964.580162]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223964.580167]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223964.580171]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223964.580176]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.580180]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223964.580184]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223964.580188]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223964.580192]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223964.580194]  <EOI> 
[223964.580198]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.580202]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223964.580206]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.580210]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223964.580214]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223964.580217] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 
[223964.580235]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 
[223964.580245] Call Trace:
[223964.580246]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223964.580252]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223964.580255]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223964.580259]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223964.580262]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223964.580265]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223964.580269]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223964.580272]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223964.580276]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223964.580279]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223964.580283]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.580286]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223964.580289]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223964.580292]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223964.580295]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223964.580297]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.580303]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223964.580307]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.580310]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223964.580313]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223964.268002] RDX: 0000000000017b69 RSI: ffffffff818f9896 RDI: 0000000000000001
[223964.268002] RBP: ffff88022fc03ce0 R08: 000000007cb6c596 R09: 0000000000000000
[223964.268002] R10: ffffffff81a2b020 R11: 0000000000000000 R12: ffff88022fc03c58
[223964.268002] R13: ffffffff816feb33 R14: ffff88022fc03ce0 R15: 000000002eb85d38
[223964.268002] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
[223964.268002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223964.268002] CR2: 00007f25e7bc13a0 CR3: 00000001426fc000 CR4: 00000000000006f0
[223964.268002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223964.268002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223964.268002] Process swapper (pid: 0, threadinfo ffffffff819a6000, task ffffffff81a2b020)
[223964.268002] Stack:
[223964.268002]  ffff88022fc03d30 ffffffff813a4ee8 ffffffff819a6000 000000007cb6c5e3
[223964.268002]  000000007c44ac9c ffff8801f6c22448 000000002eb85d38 000000006efcdc62
[223964.268002]  0000000000000001 ffffffff81a2b020 ffff88022fc03d40 ffffffff813a4f6a
[223964.268002] Call Trace:
[223964.268002]  <IRQ> 
[223964.268002]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223964.268002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223964.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223964.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223964.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223964.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223964.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223964.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223964.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223964.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223964.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223964.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223964.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223964.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223964.268002]  <EOI> 
[223964.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223964.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223964.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
[223964.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
[223964.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
[223964.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
[223964.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
[223964.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
[223964.268002] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 
[223964.268002]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 
[223964.268002] Call Trace:
[223964.268002]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223964.268002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223964.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223964.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223964.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223964.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223964.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223964.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223964.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223964.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223964.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223964.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223964.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223964.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223964.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223964.268002]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223964.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223964.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223964.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
[223964.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
[223964.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
[223964.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
[223964.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
[223964.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
[223968.815995] INFO: rcu_sched_state detected stall on CPU 1 (t=15000 jiffies)
[223968.819995] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1} (detected by 3, t=15002 jiffies)
[223968.820000] sending NMI to all CPUs:
[223968.820002] NMI backtrace for cpu 3
[223968.820002] CPU 3 
[223968.820002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223968.820002] 
[223968.820002] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223968.820002] RIP: 0010:[<ffffffff813a4f86>]  [<ffffffff813a4f86>] __const_udelay+0x16/0x40
[223968.820002] RSP: 0018:ffff88022fcc3a90  EFLAGS: 00000002
[223968.820002] RAX: 0000000000e34d8a RBX: 0000000000000001 RCX: 0000000001062560
[223968.820002] RDX: 000000000071a6c5 RSI: 0000000000000002 RDI: 0000000000418958
[223968.820002] RBP: ffff88022fcc3ab0 R08: 0000000000000002 R09: 0000000000000000
[223968.820002] R10: 0000000000000006 R11: 000000000000000a R12: ffffffff81a40d80
[223968.820002] R13: 0000000000000010 R14: ffffffff81a40e40 R15: ffffffff81a40fc0
[223968.820002] FS:  0000000000000000(0000) GS:ffff88022fcc0000(0000) knlGS:0000000000000000
[223968.820002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223968.820002] CR2: 0000000000f38820 CR3: 0000000104b52000 CR4: 00000000000006e0
[223968.820002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223968.820002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223968.820002] Process kworker/0:1 (pid: 0, threadinfo ffff880226988000, task ffff880226981f20)
[223968.820002] Stack:
[223968.820002]  ffff88022fcc3ab0 ffffffff81031695 ffff88022fccdfa0 ffff88022fccdfa0
[223968.820002]  ffff88022fcc3af0 ffffffff810bb9d2 ffffffff81a40fc0 0000000000000003
[223968.820002]  0000000000000003 ffff880226981f20 ffffffff810921f0 ffff88022fcc3be0
[223968.820002] Call Trace:
[223968.820002]  <IRQ> 
[223968.820002]  [<ffffffff81031695>] ? arch_trigger_all_cpu_backtrace+0x65/0x90
[223968.820002]  [<ffffffff810bb9d2>] __rcu_pending+0x382/0x3b0
[223968.820002]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
[223968.820002]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
[223968.820002]  [<ffffffff8106d861>] update_process_times+0x41/0x80
[223968.820002]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
[223968.820002]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
[223968.820002]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
[223968.820002]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
[223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.820002]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
[223968.820002]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
[223968.820002]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
[223968.820002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223968.820002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223968.820002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223968.820002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223968.820002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223968.820002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223968.820002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223968.820002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223968.820002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223968.820002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223968.820002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223968.820002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223968.820002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.820002]  <EOI> 
[223968.820002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223968.820002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223968.820002]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223968.820002] Code: 00 00 00 00 00 55 48 89 e5 ff 15 8e a5 6c 00 c9 c3 0f 1f 40 00 55 48 8d 0c bd 00 00 00 00 65 48 8b 14 25 58 2d 01 00 48 8d 04 12 
[223968.820002]  c1 e2 06 48 89 e5 48 29 c2 48 89 c8 f7 e2 48 8d 7a 01 ff 15 
[223968.820002] Call Trace:
[223968.820002]  <IRQ>  [<ffffffff81031695>] ? arch_trigger_all_cpu_backtrace+0x65/0x90
[223968.820002]  [<ffffffff810bb9d2>] __rcu_pending+0x382/0x3b0
[223968.820002]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
[223968.820002]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
[223968.820002]  [<ffffffff8106d861>] update_process_times+0x41/0x80
[223968.820002]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
[223968.820002]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
[223968.820002]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
[223968.820002]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
[223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.820002]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
[223968.820002]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
[223968.820002]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
[223968.820002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223968.820002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223968.820002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223968.820002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223968.820002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223968.820002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223968.820002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223968.820002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223968.820002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223968.820002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223968.820002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223968.820002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223968.820002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.820002]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223968.820002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223968.820002]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223968.820335] NMI backtrace for cpu 0
[223968.820337] CPU 0 
[223968.820338] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223968.820347] 
[223968.820349] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223968.820353] RIP: 0010:[<ffffffff813a4ef0>]  [<ffffffff813a4ef0>] delay_tsc+0x80/0xd0
[223968.820358] RSP: 0018:ffff88022fc03cf0  EFLAGS: 00000206
[223968.820360] RAX: 000000007659b10f RBX: 0000000000000001 RCX: 000000007659b10f
[223968.820363] RDX: 000000007659b10f RSI: ffffffff818f9896 RDI: 0000000000000001
[223968.820365] RBP: ffff88022fc03d30 R08: 000000007659b10f R09: 0000000000000000
[223968.820367] R10: ffffffff81a2b020 R11: 0000000000000000 R12: 0000000031026962
[223968.820370] R13: 000000006efcdc62 R14: ffffffff819a6000 R15: 000000007659b0de
[223968.820373] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
[223968.820375] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223968.820377] CR2: 00007f25e7bc13a0 CR3: 00000001426fc000 CR4: 00000000000006f0
[223968.820380] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223968.820382] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223968.820385] Process swapper (pid: 0, threadinfo ffffffff819a6000, task ffffffff81a2b020)
[223968.820387] Stack:
[223968.820388]  ffffffff819a6000 000000007659b0de 00000000818f9896 ffff8801f6c22448
[223968.820393]  0000000031026962 000000006efcdc62 0000000000000001 ffffffff81a2b020
[223968.820397]  ffff88022fc03d40 ffffffff813a4f6a ffff88022fc03d80 ffffffff813ac2ab
[223968.820401] Call Trace:
[223968.820402]  <IRQ> 
[223968.820406]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223968.820410]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223968.820414]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223968.820417]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223968.820420]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223968.820424]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223968.820427]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223968.820430]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223968.820434]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223968.820437]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223968.820441]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223968.820444]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223968.820447]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223968.820450]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.820452]  <EOI> 
[223968.820455]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820459]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223968.820462]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820465]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223968.820468]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
[223968.820471]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
[223968.820475]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
[223968.820478]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
[223968.820481]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
[223968.820484]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
[223968.820486] Code: 68 1c 01 f3 90 83 40 1c 01 65 44 8b 3c 25 50 d3 00 00 44 3b 7d d4 75 3b 66 66 90 0f ae e8 e8 b8 60 c7 ff 66 90 4c 63 c0 4c 89 c0 
[223968.820504]  2b 45 c8 48 39 d8 72 c7 65 48 8b 04 25 08 c4 00 00 83 a8 44 
[223968.820514] Call Trace:
[223968.820515]  <IRQ>  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223968.820521]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223968.820525]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223968.820528]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223968.820532]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223968.820535]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223968.820538]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223968.820542]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223968.820546]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223968.820549]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223968.820552]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223968.820556]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223968.820559]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223968.820562]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.820564]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820570]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223968.820573]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820576]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223968.820579]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
[223968.820583]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
[223968.820586]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
[223968.820589]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
[223968.820593]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
[223968.820596]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
[223968.820599] NMI backtrace for cpu 2
[223968.820600] CPU 2 
[223968.820602] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223968.820610] 
[223968.820612] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223968.820616] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
[223968.820621] RSP: 0018:ffff88022fc83ce0  EFLAGS: 00000202
[223968.820623] RAX: 000000007659b116 RBX: 0000000000000001 RCX: 000000007659b0e5
[223968.820625] RDX: 0000000000017b6b RSI: 0000000000000000 RDI: 0000000000000001
[223968.820628] RBP: ffff88022fc83ce0 R08: 000000007659b098 R09: 0000000000000000
[223968.820630] R10: ffff880226948000 R11: 0000000000000000 R12: 00000000345f87d7
[223968.820632] R13: 000000006efcdc62 R14: ffff88022693e000 R15: 000000007659b0e5
[223968.820635] FS:  0000000000000000(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
[223968.820638] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223968.820640] CR2: 00007f25e7874d7f CR3: 0000000124c0d000 CR4: 00000000000006e0
[223968.820642] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223968.820645] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223968.820647] Process kworker/0:1 (pid: 0, threadinfo ffff88022693e000, task ffff880226948000)
[223968.820649] Stack:
[223968.820651]  ffff88022fc83d30 ffffffff813a4ee8 ffff88022693e000 000000007659b0e5
[223968.820655]  000000026b4044c5 ffff88006afd8948 00000000345f87d7 000000006efcdc62
[223968.820659]  0000000000000001 ffff880226948000 ffff88022fc83d40 ffffffff813a4f6a
[223968.820663] Call Trace:
[223968.820665]  <IRQ> 
[223968.820668]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223968.820671]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223968.820674]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223968.820678]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223968.820682]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223968.820685]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223968.820688]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223968.820691]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223968.820695]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223968.820699]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223968.820702]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223968.820705]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223968.820708]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223968.820712]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223968.820715]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.820717]  <EOI> 
[223968.820720]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820723]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223968.820727]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820730]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223968.820733]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223968.820735] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 
[223968.820753]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 
[223968.820763] Call Trace:
[223968.820764]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
[223968.820769]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223968.820773]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223968.820777]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223968.820780]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223968.820783]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223968.820787]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223968.820790]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223968.820793]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223968.820797]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223968.820801]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223968.820804]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223968.820807]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223968.820810]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223968.820813]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.820815]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820821]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223968.820824]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.820827]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223968.820831]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223968.816001] NMI backtrace for cpu 1
[223968.816001] CPU 1 
[223968.816001] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[223968.816001] 
[223968.816001] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
[223968.816001] RIP: 0010:[<ffffffff81440955>]  [<ffffffff81440955>] io_serial_out+0x15/0x20
[223968.816001] RSP: 0018:ffff88022fc437f0  EFLAGS: 00000002
[223968.816001] RAX: 0000000000000073 RBX: ffffffff8243eec0 RCX: 0000000000000000
[223968.816001] RDX: 00000000000003f8 RSI: 00000000000003f8 RDI: ffffffff8243eec0
[223968.816001] RBP: ffff88022fc437f0 R08: 000000007659a435 R09: 0000000000000000
[223968.816001] R10: 0000000000000001 R11: 000000000000000a R12: 0000000000000073
[223968.816001] R13: ffffffff81bc648d R14: 0000000000000050 R15: ffffffff8243eec0
[223968.816001] FS:  0000000000000000(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
[223968.816001] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[223968.816001] CR2: 00007f472ba6a6fc CR3: 0000000126bb7000 CR4: 00000000000006e0
[223968.816001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[223968.816001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[223968.816001] Process kworker/0:0 (pid: 0, threadinfo ffff880226928000, task ffff88022690dd60)
[223968.816001] Stack:
[223968.816001]  ffff88022fc43810 ffffffff814410dc 0000000000000030 ffffffff814410b0
[223968.816001]  ffff88022fc43850 ffffffff8143cdb5 0000000000000087 0000000000000000
[223968.816001]  ffffffff8243eec0 0000000000000001 0000000000000087 000000000000000d
[223968.816001] Call Trace:
[223968.816001]  <IRQ> 
[223968.816001]  [<ffffffff814410dc>] serial8250_console_putchar+0x2c/0x40
[223968.816001]  [<ffffffff814410b0>] ? wait_for_xmitr+0xa0/0xa0
[223968.816001]  [<ffffffff8143cdb5>] uart_console_write+0x35/0x70
[223968.816001]  [<ffffffff814417be>] serial8250_console_write+0xbe/0x1a0
[223968.816001]  [<ffffffff8105c78e>] __call_console_drivers+0x8e/0xb0
[223968.816001]  [<ffffffff8105c7f5>] _call_console_drivers+0x45/0x70
[223968.816001]  [<ffffffff8105d02f>] console_unlock+0x17f/0x2b0
[223968.816001]  [<ffffffff8105d64d>] vprintk+0x1fd/0x520
[223968.816001]  [<ffffffff816f16c1>] printk+0x67/0x69
[223968.816001]  [<ffffffff816f5fa6>] ? _raw_spin_unlock+0x26/0x40
[223968.816001]  [<ffffffff8105388b>] ? account_system_time+0xab/0x190
[223968.816001]  [<ffffffff810bb7e4>] __rcu_pending+0x194/0x3b0
[223968.816001]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
[223968.816001]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
[223968.816001]  [<ffffffff8106d861>] update_process_times+0x41/0x80
[223968.816001]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
[223968.816001]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
[223968.816001]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
[223968.816001]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
[223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.816001]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
[223968.816001]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
[223968.816001]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
[223968.816001]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223968.816001]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223968.816001]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223968.816001]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223968.816001]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223968.816001]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223968.816001]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223968.816001]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223968.816001]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223968.816001]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223968.816001]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223968.816001]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223968.816001]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.816001]  <EOI> 
[223968.816001]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.816001]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223968.816001]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.816001]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223968.816001]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
[223968.816001] Code: 48 89 e5 d3 e2 03 57 38 ec 0f b6 c0 c9 c3 0f 1f 84 00 00 00 00 00 0f b6 8f 81 00 00 00 55 89 d0 48 89 e5 d3 e6 03 77 38 89 f2 ee <c9> c3 66 0f 1f 84 00 00 00 00 00 55 80 bf 82 00 00 00 08 48 89 
[223968.816001] Call Trace:
[223968.816001]  <IRQ>  [<ffffffff814410dc>] serial8250_console_putchar+0x2c/0x40
[223968.816001]  [<ffffffff814410b0>] ? wait_for_xmitr+0xa0/0xa0
[223968.816001]  [<ffffffff8143cdb5>] uart_console_write+0x35/0x70
[223968.816001]  [<ffffffff814417be>] serial8250_console_write+0xbe/0x1a0
[223968.816001]  [<ffffffff8105c78e>] __call_console_drivers+0x8e/0xb0
[223968.816001]  [<ffffffff8105c7f5>] _call_console_drivers+0x45/0x70
[223968.816001]  [<ffffffff8105d02f>] console_unlock+0x17f/0x2b0
[223968.816001]  [<ffffffff8105d64d>] vprintk+0x1fd/0x520
[223968.816001]  [<ffffffff816f16c1>] printk+0x67/0x69
[223968.816001]  [<ffffffff816f5fa6>] ? _raw_spin_unlock+0x26/0x40
[223968.816001]  [<ffffffff8105388b>] ? account_system_time+0xab/0x190
[223968.816001]  [<ffffffff810bb7e4>] __rcu_pending+0x194/0x3b0
[223968.816001]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
[223968.816001]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
[223968.816001]  [<ffffffff8106d861>] update_process_times+0x41/0x80
[223968.816001]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
[223968.816001]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
[223968.816001]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
[223968.816001]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
[223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.816001]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
[223968.816001]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
[223968.816001]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
[223968.816001]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[223968.816001]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[223968.816001]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[223968.816001]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[223968.816001]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[223968.816001]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[223968.816001]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[223968.816001]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[223968.816001]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[223968.816001]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[223968.816001]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[223968.816001]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[223968.816001]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[223968.816001]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.816001]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[223968.816001]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[223968.816001]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[223968.816001]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff

[ goes on for another ~300kB, trimmed ]


* Re: Linux 3.1-rc9
  2011-10-24 19:02                                     ` Simon Kirby
@ 2011-10-25  7:13                                       ` Linus Torvalds
  2011-10-25  9:01                                         ` David Miller
  2011-10-25 20:20                                       ` Simon Kirby
  1 sibling, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2011-10-25  7:13 UTC
  To: Simon Kirby, Network Development
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar

Added netdev, because this seems to be a generic networking bug (an
ABBA lock-order inversion between sk_lock and icsk_retransmit_timer,
if my quick scan is correct).
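
For readers unfamiliar with the term, here is a minimal sketch of how an
ABBA inversion can be flagged by recording lock ordering, loosely in the
spirit of what lockdep reports in the quoted log. It is not the kernel's
implementation: the class ids SLOCK and RETRANS_TIMER, the dep[] table and
the record_order() helper are all hypothetical names made up for this
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock-class ids, named after the two classes in the report. */
enum { SLOCK, RETRANS_TIMER, NCLASSES };

/* dep[a][b] is true once class b has been taken while class a was held. */
static bool dep[NCLASSES][NCLASSES];

/* Depth-first search: can we get from `from` to `to` along recorded edges? */
bool reachable(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int n = 0; n < NCLASSES; n++)
        if (dep[from][n] && !seen[n] && reachable(n, to, seen))
            return true;
    return false;
}

/*
 * Record "next is taken while held is held".
 * Returns false, without recording, if an existing chain already leads from
 * next back to held, i.e. the new edge would close a cycle (A->B plus B->A).
 */
bool record_order(int held, int next)
{
    bool seen[NCLASSES] = { false };

    assert(held >= 0 && held < NCLASSES);
    assert(next >= 0 && next < NCLASSES);

    if (reachable(next, held, seen))
        return false;
    dep[held][next] = true;
    return true;
}
```

With this sketch, record_order(RETRANS_TIMER, SLOCK) succeeds the first
time (timer, then socket lock), and a later record_order(SLOCK,
RETRANS_TIMER) is rejected because it would close the cycle. That is the
same shape as the "Possible unsafe locking scenario" in the report below.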

Davem?

               Linus

On Mon, Oct 24, 2011 at 9:02 PM, Simon Kirby <sim@hostway.ca> wrote:
>
> Ok, hit the hang about 4 more times, but only this morning on a box with
> a serial cable attached. Yay!
>
> Simon-
>
> [216695.579770] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216695.589435]
> [216695.589437] =======================================================
> [216695.593380] [ INFO: possible circular locking dependency detected ]
> [216695.593380] 3.1.0-rc10-hw-lockdep+ #51
> [216695.593380] -------------------------------------------------------
> [216695.593380] kworker/0:1/0 is trying to acquire lock:
> [216695.593380]  (&icsk->icsk_retransmit_timer){+.-.-.}, at: [<ffffffff8106cc88>] run_timer_softirq+0x198/0x410
> [216695.593380]
> [216695.593380] but task is already holding lock:
> [216695.593380]  (slock-AF_INET){+.-.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
> [216695.593380]
> [216695.593380] which lock already depends on the new lock.
> [216695.593380]
> [216695.593380]
> [216695.593380] the existing dependency chain (in reverse order) is:
> [216695.593380]
> [216695.593380] -> #1 (slock-AF_INET){+.-.-.}:
> [216695.593380]        [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
> [216695.593380]        [<ffffffff816f55fc>] _raw_spin_lock+0x3c/0x50
> [216695.593380]        [<ffffffff81661cc3>] tcp_write_timer+0x23/0x230
> [216695.682901]        [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [216695.682901]        [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [216695.682901]        [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [216695.682901]        [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [216695.682901]        [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [216695.682901]        [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [216695.682901]        [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [216695.682901]        [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [216695.682901]        [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [216695.682901]
> [216695.682901] -> #0 (&icsk->icsk_retransmit_timer){+.-.-.}:
> [216695.682901]        [<ffffffff8109a000>] __lock_acquire+0x2040/0x2180
> [216695.682901]        [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
> [216695.682901]        [<ffffffff8106cd09>] run_timer_softirq+0x219/0x410
> [216695.682901]        [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [216695.682901]        [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [216695.682901]        [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [216695.682901]        [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [216695.682901]        [<ffffffff81014b74>] do_IRQ+0x64/0xe0
> [216695.682901]        [<ffffffff816f6273>] ret_from_intr+0x0/0x1a
> [216695.682901]        [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [216695.682901]        [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [216695.682901]
> [216695.682901] other info that might help us debug this:
> [216695.682901]
> [216695.682901]  Possible unsafe locking scenario:
> [216695.682901]
> [216695.682901]        CPU0                    CPU1
> [216695.682901]        ----                    ----
> [216695.682901]   lock(slock-AF_INET);
> [216695.682901]                                lock(&icsk->icsk_retransmit_timer);
> [216695.682901]                                lock(slock-AF_INET);
> [216695.682901]   lock(&icsk->icsk_retransmit_timer);
> [216695.682901]
> [216695.682901]  *** DEADLOCK ***
> [216695.682901]
> [216695.682901] 1 lock held by kworker/0:1/0:
> [216695.682901]  #0:  (slock-AF_INET){+.-.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
> [216695.682901]
> [216695.682901] stack backtrace:
> [216695.682901] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51
> [216695.682901] Call Trace:
> [216695.682901]  <IRQ>  [<ffffffff81097eab>] print_circular_bug+0x21b/0x330
> [216695.682901]  [<ffffffff8109a000>] __lock_acquire+0x2040/0x2180
> [216695.682901]  [<ffffffffa001d6e2>] ? nf_conntrack_free+0x42/0x50 [nf_conntrack]
> [216695.682901]  [<ffffffffa001d6e2>] ? nf_conntrack_free+0x42/0x50 [nf_conntrack]
> [216695.682901]  [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
> [216695.682901]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [216695.682901]  [<ffffffff81096b4c>] ? trace_hardirqs_on_caller+0x7c/0x1c0
> [216695.682901]  [<ffffffff8106cd09>] run_timer_softirq+0x219/0x410
> [216695.682901]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [216695.682901]  [<ffffffff816f16c1>] ? printk+0x67/0x69
> [216695.682901]  [<ffffffff81661ca0>] ? tcp_delack_timer+0x230/0x230
> [216695.682901]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [216695.682901]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [216695.682901]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [216695.682901]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [216695.682901]  [<ffffffff81014b74>] do_IRQ+0x64/0xe0
> [216695.682901]  [<ffffffff816f6273>] common_interrupt+0x73/0x73
> [216695.682901]  <EOI>  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [216695.682901]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [216695.682901]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [216695.682901]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [216696.019296] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000105?
> [216697.762956] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216698.597297] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216701.489681] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216701.667999] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216704.580592] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [216709.468971] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [216712.845904] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216716.588502] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [216725.072958] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [216725.603879] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216725.828374] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216727.588978] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216735.513864] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216740.581530] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [216756.278571] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [218855.312903] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [218855.323133] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [218858.293355] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [218864.301938] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [218876.333821] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [218885.332651] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [218900.313590] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [220821.012017] TCP: Peer 32.176.160.153:49226/80 unexpectedly shrunk window 665256753:665268993 (repaired)
> [221075.224300] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221075.234579] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221075.277593] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221075.780515] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221075.780713] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221077.349279] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221077.905587] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [221077.915567] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [221081.498430] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221081.703277] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [221082.088513] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221082.167985] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221089.772578] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221090.487927] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [221090.686394] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221094.587131] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221105.255699] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
> [221105.280699] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221105.291634] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221106.325794] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221107.286029] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221107.622736] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221107.734471] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [221120.381643] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
> [223936.264020] BUG: soft lockup - CPU#0 stuck for 22s! [swapper:0]
> [223936.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.268002] irq event stamp: 2595159887
> [223936.268002] hardirqs last  enabled at (2595159887): [<ffffffff8101b805>] mwait_idle+0x145/0x170
> [223936.268002] hardirqs last disabled at (2595159886): [<ffffffff81013139>] cpu_idle+0x79/0xf0
> [223936.268002] softirqs last  enabled at (2595159878): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
> [223936.268002] softirqs last disabled at (2595159873): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.268002] CPU 0
> [223936.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.268002]
> [223936.268002] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223936.268002] RIP: 0010:[<ffffffff813a4ee3>]  [<ffffffff813a4ee3>] delay_tsc+0x73/0xd0
> [223936.268002] RSP: 0018:ffff88022fc03cf0  EFLAGS: 00000202
> [223936.268002] RAX: 00017b5d5932dd02 RBX: ffffffff816f6334 RCX: 000000005932dd02
> [223936.372028] BUG: soft lockup - CPU#1 stuck for 22s! [kworker/0:0:0]
> [223936.372031] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.372042] irq event stamp: 2598787699
> [223936.372044] hardirqs last  enabled at (2598787699): [<ffffffff8101b805>] mwait_idle+0x145/0x170
> [223936.372054] hardirqs last disabled at (2598787698): [<ffffffff81013139>] cpu_idle+0x79/0xf0
> [223936.372058] softirqs last  enabled at (2598787696): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
> [223936.372063] softirqs last disabled at (2598787681): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.372069] CPU 1
> [223936.372070] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.372079]
> [223936.372081] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223936.372086] RIP: 0010:[<ffffffff8101afab>]  [<ffffffff8101afab>] native_read_tsc+0xb/0x20
> [223936.372091] RSP: 0018:ffff88022fc43ce0  EFLAGS: 00000202
> [223936.372093] RAX: 0000000000017b5d RBX: ffffffff816f6334 RCX: 00000000652f810e
> [223936.372096] RDX: 0000000000017b5d RSI: 0000000000000000 RDI: 0000000000000001
> [223936.372098] RBP: ffff88022fc43ce0 R08: 00000000652f80c8 R09: 0000000000000000
> [223936.372101] R10: ffff88022690dd60 R11: 0000000000000000 R12: ffff88022fc43c58
> [223936.372103] R13: ffffffff816feb33 R14: ffff88022fc43ce0 R15: 00000000180bbeb8
> [223936.372106] FS:  0000000000000000(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
> [223936.372108] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223936.372111] CR2: 00007f472ba6a6fc CR3: 0000000126bb7000 CR4: 00000000000006e0
> [223936.372113] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223936.372116] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223936.372119] Process kworker/0:0 (pid: 0, threadinfo ffff880226928000, task ffff88022690dd60)
> [223936.372121] Stack:
> [223936.372123]  ffff88022fc43d30 ffffffff813a4eaf ffff880226928000 00000000652f8090
> [223936.372128]  000000012fc43d18 ffff88002e90e348 00000000180bbeb8 000000006efcdc62
> [223936.372132]  0000000000000001 ffff88022690dd60 ffff88022fc43d40 ffffffff813a4f6a
> [223936.372136] Call Trace:
> [223936.372139]  <IRQ>
> [223936.372144]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
> [223936.372148]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.372153]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.372159]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.372164]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.372168]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.372174]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.372178]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.372182]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.372186]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.372190]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.372194]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.372198]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.372203]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.372208]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.372210]  <EOI>
> [223936.372214]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.372218]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.372222]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.372226]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.372230]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223936.372233] Code: a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 89 c1 48 89 d0
> [223936.372253]  c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 00 00 00 00 00
> [223936.372262] Call Trace:
> [223936.372264]  <IRQ>  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
> [223936.372269]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.372272]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.372276]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.372280]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.372283]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.372286]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.372289]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.372293]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.372297]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.372300]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.372303]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.372307]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.372310]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.372313]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.372315]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.372321]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.372324]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.372327]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.372331]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223936.476032] BUG: soft lockup - CPU#2 stuck for 23s! [kworker/0:1:0]
> [223936.476034] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.476043] irq event stamp: 2613824057
> [223936.476045] hardirqs last  enabled at (2613824057): [<ffffffff8101b805>] mwait_idle+0x145/0x170
> [223936.476050] hardirqs last disabled at (2613824056): [<ffffffff81013139>] cpu_idle+0x79/0xf0
> [223936.476054] softirqs last  enabled at (2613824048): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
> [223936.476058] softirqs last disabled at (2613824031): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.476062] CPU 2
> [223936.476063] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.476071]
> [223936.476073] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223936.476077] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
> [223936.476082] RSP: 0018:ffff88022fc83ce0  EFLAGS: 00000202
> [223936.476084] RAX: 0000000070ba7dfc RBX: ffffffff813a60ae RCX: 0000000070ba7dc4
> [223936.476086] RDX: 0000000000017b5d RSI: 0000000000000000 RDI: 0000000000000001
> [223936.476089] RBP: ffff88022fc83ce0 R08: 0000000070ba7d7e R09: 0000000000000000
> [223936.476091] R10: ffff880226948000 R11: 0000000000000000 R12: ffff88022fc83c58
> [223936.476093] R13: ffffffff816feb33 R14: ffff88022fc83ce0 R15: 00000000182285f9
> [223936.476096] FS:  0000000000000000(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
> [223936.476099] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223936.476101] CR2: 00007f25e7874d7f CR3: 0000000124c0d000 CR4: 00000000000006e0
> [223936.476104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223936.476106] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223936.476109] Process kworker/0:1 (pid: 0, threadinfo ffff88022693e000, task ffff880226948000)
> [223936.476111] Stack:
> [223936.476113]  ffff88022fc83d30 ffffffff813a4ee8 ffff88022693e000 0000000070ba7dc4
> [223936.476117]  00000002ffffff10 ffff88006afd8948 00000000182285f9 000000006efcdc62
> [223936.476121]  0000000000000001 ffff880226948000 ffff88022fc83d40 ffffffff813a4f6a
> [223936.476126] Call Trace:
> [223936.476128]  <IRQ>
> [223936.476132]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223936.476136]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.476141]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.476147]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.476153]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.476157]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.476163]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.476167]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.476171]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.476176]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.476180]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.476184]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.476187]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.476193]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.476197]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.476199]  <EOI>
> [223936.476203]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.476207]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.476211]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.476215]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.476219]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223936.476222] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31
> [223936.476241]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84
> [223936.476251] Call Trace:
> [223936.476252]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223936.476257]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.476261]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.476265]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.476268]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.476272]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.476275]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.476278]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.476282]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.476286]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.476289]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.476292]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.476295]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.476299]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.476302]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.476304]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.476310]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.476313]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.476316]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.476320]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223936.580039] BUG: soft lockup - CPU#3 stuck for 23s! [kworker/0:1:0]
> [223936.580041] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.580050] irq event stamp: 2615464042
> [223936.580052] hardirqs last  enabled at (2615464042): [<ffffffff816f5edb>] _raw_spin_unlock_irq+0x2b/0x50
> [223936.580057] hardirqs last disabled at (2615464041): [<ffffffff816f56a8>] _raw_spin_lock_irq+0x18/0x60
> [223936.580061] softirqs last  enabled at (2615463964): [<ffffffff81063cce>] _local_bh_enable+0xe/0x10
> [223936.580065] softirqs last disabled at (2615463965): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.580069] CPU 3
> [223936.580070] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223936.580078]
> [223936.580080] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223936.580085] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
> [223936.580090] RSP: 0018:ffff88022fcc3ce0  EFLAGS: 00000202
> [223936.580092] RAX: 000000007c457b06 RBX: ffffffff816f6334 RCX: 000000007c457ad5
> [223936.580094] RDX: 0000000000017b5d RSI: ffffffff818f9896 RDI: 0000000000000001
> [223936.580097] RBP: ffff88022fcc3ce0 R08: 000000007c457a88 R09: 0000000000000000
> [223936.580099] R10: ffff880226981f20 R11: 0000000000000000 R12: ffff88022fcc3c58
> [223936.580101] R13: ffffffff816feb33 R14: ffff88022fcc3ce0 R15: 00000000183a1380
> [223936.580104] FS:  0000000000000000(0000) GS:ffff88022fcc0000(0000) knlGS:0000000000000000
> [223936.580107] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223936.580109] CR2: 0000000000f38820 CR3: 0000000104b52000 CR4: 00000000000006e0
> [223936.580112] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223936.580114] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223936.580117] Process kworker/0:1 (pid: 0, threadinfo ffff880226988000, task ffff880226981f20)
> [223936.580119] Stack:
> [223936.580120]  ffff88022fcc3d30 ffffffff813a4ee8 ffff880226988000 000000007c457ad5
> [223936.580125]  00000003ffffff10 ffff880031438948 00000000183a1380 000000006efcdc62
> [223936.580129]  0000000000000001 ffff880226981f20 ffff88022fcc3d40 ffffffff813a4f6a
> [223936.580133] Call Trace:
> [223936.580135]  <IRQ>
> [223936.580138]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223936.580142]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.580147]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.580151]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.580156]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.580160]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.580164]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.580168]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.580172]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.580176]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.580181]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.580185]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.580188]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.580192]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.580196]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.580199]  <EOI>
> [223936.580202]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.580206]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.580211]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.580214]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.580218]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223936.580221] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31
> [223936.580240]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84
> [223936.580250] Call Trace:
> [223936.580251]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223936.580256]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.580260]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.580264]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.580267]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.580270]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.580274]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.580277]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.580280]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.580284]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.580288]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.580291]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.580294]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.580297]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.580300]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.580302]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.580308]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.580312]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.580315]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.580318]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223936.268002] RDX: 000000005932dd02 RSI: ffffffff818f9896 RDI: 0000000000000001
> [223936.268002] RBP: ffff88022fc03d30 R08: 000000005932dcb5 R09: 0000000000000000
> [223936.268002] R10: ffffffff81a2b020 R11: 0000000000000000 R12: ffff88022fc03c68
> [223936.268002] R13: ffffffff816feb33 R14: ffff88022fc03d30 R15: 0000000017f328cd
> [223936.268002] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
> [223936.268002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223936.268002] CR2: 00007f25e7bc13a0 CR3: 00000001426fc000 CR4: 00000000000006f0
> [223936.268002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223936.268002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223936.268002] Process swapper (pid: 0, threadinfo ffffffff819a6000, task ffffffff81a2b020)
> [223936.268002] Stack:
> [223936.268002]  ffffffff819a6000 000000005932dd02 000000002fc03d18 ffff8801f6c22448
> [223936.268002]  0000000017f328cd 000000006efcdc62 0000000000000001 ffffffff81a2b020
> [223936.268002]  ffff88022fc03d40 ffffffff813a4f6a ffff88022fc03d80 ffffffff813ac2ab
> [223936.268002] Call Trace:
> [223936.268002]  <IRQ>
> [223936.268002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.268002]  <EOI>
> [223936.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
> [223936.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
> [223936.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
> [223936.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
> [223936.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
> [223936.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
> [223936.268002] Code: 4c 89 7d c8 eb 1f 66 90 48 8b 45 c0 83 68 1c 01 f3 90 83 40 1c 01 65 44 8b 3c 25 50 d3 00 00 44 3b 7d d4 75 3b 66 66 90 0f ae e8 <e8> b8 60 c7 ff 66 90 4c 63 c0 4c 89 c0 48 2b 45 c8 48 39 d8 72
> [223936.268002] Call Trace:
> [223936.268002]  <IRQ>  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223936.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223936.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223936.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223936.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223936.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223936.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223936.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223936.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223936.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223936.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223936.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223936.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223936.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223936.268002]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223936.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223936.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223936.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
> [223936.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
> [223936.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
> [223936.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
> [223936.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
> [223936.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
> [223964.264018] BUG: soft lockup - CPU#0 stuck for 22s! [swapper:0]
> [223964.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.268002] irq event stamp: 2595159887
> [223964.268002] hardirqs last  enabled at (2595159887): [<ffffffff8101b805>] mwait_idle+0x145/0x170
> [223964.268002] hardirqs last disabled at (2595159886): [<ffffffff81013139>] cpu_idle+0x79/0xf0
> [223964.268002] softirqs last  enabled at (2595159878): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
> [223964.268002] softirqs last disabled at (2595159873): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.268002] CPU 0
> [223964.268002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.268002]
> [223964.268002] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223964.268002] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
> [223964.268002] RSP: 0018:ffff88022fc03ce0  EFLAGS: 00000202
> [223964.268002] RAX: 000000007cb6c61b RBX: ffffffff816f6334 RCX: 000000007cb6c5e3
> [223964.372025] BUG: soft lockup - CPU#1 stuck for 23s! [kworker/0:0:0]
> [223964.372027] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.372036] irq event stamp: 2598787699
> [223964.372037] hardirqs last  enabled at (2598787699): [<ffffffff8101b805>] mwait_idle+0x145/0x170
> [223964.372042] hardirqs last disabled at (2598787698): [<ffffffff81013139>] cpu_idle+0x79/0xf0
> [223964.372045] softirqs last  enabled at (2598787696): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
> [223964.372049] softirqs last disabled at (2598787681): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.372052] CPU 1
> [223964.372053] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.372061]
> [223964.372063] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223964.372067] RIP: 0010:[<ffffffff8101afa0>]  [<ffffffff8101afa0>] read_persistent_clock+0x30/0x30
> [223964.372072] RSP: 0018:ffff88022fc43ce8  EFLAGS: 00000202
> [223964.372074] RAX: 0000000000000001 RBX: ffff88022fc43c68 RCX: 0000000088b369fd
> [223964.372076] RDX: 0000000000000200 RSI: 0000000000000000 RDI: 0000000000000001
> [223964.372078] RBP: ffff88022fc43d30 R08: ffffffff88b369fd R09: 0000000000000000
> [223964.372081] R10: ffff88022690dd60 R11: 0000000000000000 R12: ffff88022fc43c58
> [223964.372083] R13: ffffffff816feb33 R14: ffff88022fc43d30 R15: 00000000307e58b4
> [223964.372086] FS:  0000000000000000(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
> [223964.372089] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223964.372091] CR2: 00007f472ba6a6fc CR3: 0000000126bb7000 CR4: 00000000000006e0
> [223964.372093] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223964.372096] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223964.372098] Process kworker/0:0 (pid: 0, threadinfo ffff880226928000, task ffff88022690dd60)
> [223964.372100] Stack:
> [223964.372102]  ffffffff813a4eaf ffff880226928000 ffffffff88b369c5 000000012fc43d18
> [223964.372106]  ffff88002e90e348 00000000307e58b4 000000006efcdc62 0000000000000001
> [223964.372111]  ffff88022690dd60 ffff88022fc43d40 ffffffff813a4f6a ffff88022fc43d80
> [223964.372115] Call Trace:
> [223964.372116]  <IRQ>
> [223964.372119]  [<ffffffff813a4eaf>] ? delay_tsc+0x3f/0xd0
> [223964.372123]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.372127]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.372132]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.372136]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.372140]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.372144]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.372148]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.372153]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.372158]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.372162]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.372166]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.372170]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.372174]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.372178]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.372180]  <EOI>
> [223964.372184]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.372188]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.372192]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.372196]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.372200]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223964.372203] Code: 48 89 fb 48 83 ec 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00
> [223964.372221]  48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9
> [223964.372231] Call Trace:
> [223964.372232]  <IRQ>  [<ffffffff813a4eaf>] ? delay_tsc+0x3f/0xd0
> [223964.372237]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.372241]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.372245]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.372248]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.372251]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.372255]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.372258]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.372261]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.372265]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.372268]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.372271]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.372275]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.372278]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.372281]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.372282]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.372288]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.372292]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.372295]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.372298]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223964.476031] BUG: soft lockup - CPU#2 stuck for 23s! [kworker/0:1:0]
> [223964.476033] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.476042] irq event stamp: 2613824057
> [223964.476043] hardirqs last  enabled at (2613824057): [<ffffffff8101b805>] mwait_idle+0x145/0x170
> [223964.476048] hardirqs last disabled at (2613824056): [<ffffffff81013139>] cpu_idle+0x79/0xf0
> [223964.476051] softirqs last  enabled at (2613824048): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
> [223964.476055] softirqs last disabled at (2613824031): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.476059] CPU 2
> [223964.476060] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.476067]
> [223964.476070] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223964.476074] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
> [223964.476078] RSP: 0018:ffff88022fc83ce0  EFLAGS: 00000206
> [223964.476080] RAX: 00000000943e6715 RBX: ffffffff816f6334 RCX: 00000000943e66dd
> [223964.476083] RDX: 0000000000017b69 RSI: 0000000000000000 RDI: 0000000000000001
> [223964.476085] RBP: ffff88022fc83ce0 R08: ffffffff943e6697 R09: 0000000000000000
> [223964.476087] R10: ffff880226948000 R11: 0000000000000000 R12: ffff88022fc83c58
> [223964.476090] R13: ffffffff816feb33 R14: ffff88022fc83ce0 R15: 000000003094ad30
> [223964.476092] FS:  0000000000000000(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
> [223964.476095] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223964.476097] CR2: 00007f25e7874d7f CR3: 0000000124c0d000 CR4: 00000000000006e0
> [223964.476100] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223964.476102] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223964.476105] Process kworker/0:1 (pid: 0, threadinfo ffff88022693e000, task ffff880226948000)
> [223964.476107] Stack:
> [223964.476108]  ffff88022fc83d30 ffffffff813a4ee8 ffff88022693e000 ffffffff943e66dd
> [223964.476113]  00000002ffffff10 ffff88006afd8948 000000003094ad30 000000006efcdc62
> [223964.476117]  0000000000000001 ffff880226948000 ffff88022fc83d40 ffffffff813a4f6a
> [223964.476121] Call Trace:
> [223964.476123]  <IRQ>
> [223964.476126]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223964.476130]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.476134]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.476139]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.476143]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.476147]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.476151]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.476155]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.476159]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.476164]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.476168]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.476172]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.476176]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.476180]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.476184]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.476186]  <EOI>
> [223964.476190]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.476194]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.476198]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.476202]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.476206]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223964.476208] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31
> [223964.476227]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84
> [223964.476236] Call Trace:
> [223964.476238]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223964.476243]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.476246]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.476250]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.476254]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.476257]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.476260]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.476264]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.476267]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.476271]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.476274]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.476277]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.476281]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.476284]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.476287]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.476289]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.476295]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.476298]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.476301]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.476304]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223964.580038] BUG: soft lockup - CPU#3 stuck for 23s! [kworker/0:1:0]
> [223964.580040] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.580049] irq event stamp: 2615464042
> [223964.580050] hardirqs last  enabled at (2615464042): [<ffffffff816f5edb>] _raw_spin_unlock_irq+0x2b/0x50
> [223964.580054] hardirqs last disabled at (2615464041): [<ffffffff816f56a8>] _raw_spin_lock_irq+0x18/0x60
> [223964.580058] softirqs last  enabled at (2615463964): [<ffffffff81063cce>] _local_bh_enable+0xe/0x10
> [223964.580062] softirqs last disabled at (2615463965): [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.580066] CPU 3
> [223964.580067] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223964.580075]
> [223964.580077] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223964.580081] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
> [223964.580086] RSP: 0018:ffff88022fcc3ce0  EFLAGS: 00000206
> [223964.580088] RAX: 000000009fc963af RBX: ffffffff816f6334 RCX: 000000009fc96377
> [223964.580090] RDX: 0000000000017b69 RSI: ffffffff818f9896 RDI: 0000000000000001
> [223964.580093] RBP: ffff88022fcc3ce0 R08: ffffffff9fc96331 R09: 0000000000000000
> [223964.580095] R10: ffff880226981f20 R11: 0000000000000000 R12: ffff88022fcc3c58
> [223964.580097] R13: ffffffff816feb33 R14: ffff88022fcc3ce0 R15: 0000000030ac88b0
> [223964.580100] FS:  0000000000000000(0000) GS:ffff88022fcc0000(0000) knlGS:0000000000000000
> [223964.580103] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223964.580105] CR2: 0000000000f38820 CR3: 0000000104b52000 CR4: 00000000000006e0
> [223964.580107] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223964.580110] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223964.580112] Process kworker/0:1 (pid: 0, threadinfo ffff880226988000, task ffff880226981f20)
> [223964.580114] Stack:
> [223964.580116]  ffff88022fcc3d30 ffffffff813a4ee8 ffff880226988000 ffffffff9fc96377
> [223964.580120]  000000039c3b34d8 ffff880031438948 0000000030ac88b0 000000006efcdc62
> [223964.580124]  0000000000000001 ffff880226981f20 ffff88022fcc3d40 ffffffff813a4f6a
> [223964.580128] Call Trace:
> [223964.580130]  <IRQ>
> [223964.580133]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223964.580137]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.580141]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.580146]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.580150]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.580154]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.580158]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.580162]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.580167]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.580171]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.580176]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.580180]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.580184]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.580188]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.580192]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.580194]  <EOI>
> [223964.580198]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.580202]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.580206]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.580210]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.580214]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223964.580217] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31
> [223964.580235]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84
> [223964.580245] Call Trace:
> [223964.580246]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223964.580252]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.580255]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.580259]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.580262]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.580265]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.580269]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.580272]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.580276]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.580279]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.580283]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.580286]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.580289]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.580292]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.580295]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.580297]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.580303]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.580307]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.580310]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.580313]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223964.268002] RDX: 0000000000017b69 RSI: ffffffff818f9896 RDI: 0000000000000001
> [223964.268002] RBP: ffff88022fc03ce0 R08: 000000007cb6c596 R09: 0000000000000000
> [223964.268002] R10: ffffffff81a2b020 R11: 0000000000000000 R12: ffff88022fc03c58
> [223964.268002] R13: ffffffff816feb33 R14: ffff88022fc03ce0 R15: 000000002eb85d38
> [223964.268002] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
> [223964.268002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223964.268002] CR2: 00007f25e7bc13a0 CR3: 00000001426fc000 CR4: 00000000000006f0
> [223964.268002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223964.268002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223964.268002] Process swapper (pid: 0, threadinfo ffffffff819a6000, task ffffffff81a2b020)
> [223964.268002] Stack:
> [223964.268002]  ffff88022fc03d30 ffffffff813a4ee8 ffffffff819a6000 000000007cb6c5e3
> [223964.268002]  000000007c44ac9c ffff8801f6c22448 000000002eb85d38 000000006efcdc62
> [223964.268002]  0000000000000001 ffffffff81a2b020 ffff88022fc03d40 ffffffff813a4f6a
> [223964.268002] Call Trace:
> [223964.268002]  <IRQ>
> [223964.268002]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223964.268002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.268002]  <EOI>
> [223964.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
> [223964.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
> [223964.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
> [223964.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
> [223964.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
> [223964.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
> [223964.268002] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31
> [223964.268002]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84
> [223964.268002] Call Trace:
> [223964.268002]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223964.268002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223964.268002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223964.268002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223964.268002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223964.268002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223964.268002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223964.268002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223964.268002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223964.268002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223964.268002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223964.268002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223964.268002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223964.268002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223964.268002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223964.268002]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.268002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223964.268002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223964.268002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223964.268002]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
> [223964.268002]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
> [223964.268002]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
> [223964.268002]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
> [223964.268002]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
> [223964.268002]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
> [223968.815995] INFO: rcu_sched_state detected stall on CPU 1 (t=15000 jiffies)
> [223968.819995] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1} (detected by 3, t=15002 jiffies)
> [223968.820000] sending NMI to all CPUs:
> [223968.820002] NMI backtrace for cpu 3
> [223968.820002] CPU 3
> [223968.820002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223968.820002]
> [223968.820002] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223968.820002] RIP: 0010:[<ffffffff813a4f86>]  [<ffffffff813a4f86>] __const_udelay+0x16/0x40
> [223968.820002] RSP: 0018:ffff88022fcc3a90  EFLAGS: 00000002
> [223968.820002] RAX: 0000000000e34d8a RBX: 0000000000000001 RCX: 0000000001062560
> [223968.820002] RDX: 000000000071a6c5 RSI: 0000000000000002 RDI: 0000000000418958
> [223968.820002] RBP: ffff88022fcc3ab0 R08: 0000000000000002 R09: 0000000000000000
> [223968.820002] R10: 0000000000000006 R11: 000000000000000a R12: ffffffff81a40d80
> [223968.820002] R13: 0000000000000010 R14: ffffffff81a40e40 R15: ffffffff81a40fc0
> [223968.820002] FS:  0000000000000000(0000) GS:ffff88022fcc0000(0000) knlGS:0000000000000000
> [223968.820002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223968.820002] CR2: 0000000000f38820 CR3: 0000000104b52000 CR4: 00000000000006e0
> [223968.820002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223968.820002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223968.820002] Process kworker/0:1 (pid: 0, threadinfo ffff880226988000, task ffff880226981f20)
> [223968.820002] Stack:
> [223968.820002]  ffff88022fcc3ab0 ffffffff81031695 ffff88022fccdfa0 ffff88022fccdfa0
> [223968.820002]  ffff88022fcc3af0 ffffffff810bb9d2 ffffffff81a40fc0 0000000000000003
> [223968.820002]  0000000000000003 ffff880226981f20 ffffffff810921f0 ffff88022fcc3be0
> [223968.820002] Call Trace:
> [223968.820002]  <IRQ>
> [223968.820002]  [<ffffffff81031695>] ? arch_trigger_all_cpu_backtrace+0x65/0x90
> [223968.820002]  [<ffffffff810bb9d2>] __rcu_pending+0x382/0x3b0
> [223968.820002]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
> [223968.820002]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
> [223968.820002]  [<ffffffff8106d861>] update_process_times+0x41/0x80
> [223968.820002]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
> [223968.820002]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
> [223968.820002]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
> [223968.820002]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
> [223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820002]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
> [223968.820002]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
> [223968.820002]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
> [223968.820002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.820002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.820002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.820002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.820002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.820002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.820002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.820002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.820002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.820002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.820002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.820002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.820002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820002]  <EOI>
> [223968.820002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.820002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.820002]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223968.820002] Code: 00 00 00 00 00 55 48 89 e5 ff 15 8e a5 6c 00 c9 c3 0f 1f 40 00 55 48 8d 0c bd 00 00 00 00 65 48 8b 14 25 58 2d 01 00 48 8d 04 12
> [223968.820002]  c1 e2 06 48 89 e5 48 29 c2 48 89 c8 f7 e2 48 8d 7a 01 ff 15
> [223968.820002] Call Trace:
> [223968.820002]  <IRQ>  [<ffffffff81031695>] ? arch_trigger_all_cpu_backtrace+0x65/0x90
> [223968.820002]  [<ffffffff810bb9d2>] __rcu_pending+0x382/0x3b0
> [223968.820002]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
> [223968.820002]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
> [223968.820002]  [<ffffffff8106d861>] update_process_times+0x41/0x80
> [223968.820002]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
> [223968.820002]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
> [223968.820002]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
> [223968.820002]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
> [223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820002]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
> [223968.820002]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
> [223968.820002]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
> [223968.820002]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.820002]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.820002]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.820002]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.820002]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.820002]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.820002]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.820002]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.820002]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.820002]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.820002]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.820002]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.820002]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.820002]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820002]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820002]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.820002]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820002]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.820002]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223968.820335] NMI backtrace for cpu 0
> [223968.820337] CPU 0
> [223968.820338] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223968.820347]
> [223968.820349] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223968.820353] RIP: 0010:[<ffffffff813a4ef0>]  [<ffffffff813a4ef0>] delay_tsc+0x80/0xd0
> [223968.820358] RSP: 0018:ffff88022fc03cf0  EFLAGS: 00000206
> [223968.820360] RAX: 000000007659b10f RBX: 0000000000000001 RCX: 000000007659b10f
> [223968.820363] RDX: 000000007659b10f RSI: ffffffff818f9896 RDI: 0000000000000001
> [223968.820365] RBP: ffff88022fc03d30 R08: 000000007659b10f R09: 0000000000000000
> [223968.820367] R10: ffffffff81a2b020 R11: 0000000000000000 R12: 0000000031026962
> [223968.820370] R13: 000000006efcdc62 R14: ffffffff819a6000 R15: 000000007659b0de
> [223968.820373] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
> [223968.820375] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223968.820377] CR2: 00007f25e7bc13a0 CR3: 00000001426fc000 CR4: 00000000000006f0
> [223968.820380] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223968.820382] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223968.820385] Process swapper (pid: 0, threadinfo ffffffff819a6000, task ffffffff81a2b020)
> [223968.820387] Stack:
> [223968.820388]  ffffffff819a6000 000000007659b0de 00000000818f9896 ffff8801f6c22448
> [223968.820393]  0000000031026962 000000006efcdc62 0000000000000001 ffffffff81a2b020
> [223968.820397]  ffff88022fc03d40 ffffffff813a4f6a ffff88022fc03d80 ffffffff813ac2ab
> [223968.820401] Call Trace:
> [223968.820402]  <IRQ>
> [223968.820406]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.820410]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.820414]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.820417]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.820420]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.820424]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.820427]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.820430]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.820434]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.820437]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.820441]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.820444]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.820447]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.820450]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820452]  <EOI>
> [223968.820455]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820459]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.820462]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820465]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.820468]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
> [223968.820471]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
> [223968.820475]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
> [223968.820478]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
> [223968.820481]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
> [223968.820484]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
> [223968.820486] Code: 68 1c 01 f3 90 83 40 1c 01 65 44 8b 3c 25 50 d3 00 00 44 3b 7d d4 75 3b 66 66 90 0f ae e8 e8 b8 60 c7 ff 66 90 4c 63 c0 4c 89 c0
> [223968.820504]  2b 45 c8 48 39 d8 72 c7 65 48 8b 04 25 08 c4 00 00 83 a8 44
> [223968.820514] Call Trace:
> [223968.820515]  <IRQ>  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.820521]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.820525]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.820528]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.820532]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.820535]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.820538]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.820542]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.820546]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.820549]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.820552]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.820556]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.820559]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.820562]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820564]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820570]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.820573]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820576]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.820579]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
> [223968.820583]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
> [223968.820586]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
> [223968.820589]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
> [223968.820593]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
> [223968.820596]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
> [223968.820599] NMI backtrace for cpu 2
> [223968.820600] CPU 2
> [223968.820602] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223968.820610]
> [223968.820612] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223968.820616] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
> [223968.820621] RSP: 0018:ffff88022fc83ce0  EFLAGS: 00000202
> [223968.820623] RAX: 000000007659b116 RBX: 0000000000000001 RCX: 000000007659b0e5
> [223968.820625] RDX: 0000000000017b6b RSI: 0000000000000000 RDI: 0000000000000001
> [223968.820628] RBP: ffff88022fc83ce0 R08: 000000007659b098 R09: 0000000000000000
> [223968.820630] R10: ffff880226948000 R11: 0000000000000000 R12: 00000000345f87d7
> [223968.820632] R13: 000000006efcdc62 R14: ffff88022693e000 R15: 000000007659b0e5
> [223968.820635] FS:  0000000000000000(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
> [223968.820638] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223968.820640] CR2: 00007f25e7874d7f CR3: 0000000124c0d000 CR4: 00000000000006e0
> [223968.820642] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223968.820645] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223968.820647] Process kworker/0:1 (pid: 0, threadinfo ffff88022693e000, task ffff880226948000)
> [223968.820649] Stack:
> [223968.820651]  ffff88022fc83d30 ffffffff813a4ee8 ffff88022693e000 000000007659b0e5
> [223968.820655]  000000026b4044c5 ffff88006afd8948 00000000345f87d7 000000006efcdc62
> [223968.820659]  0000000000000001 ffff880226948000 ffff88022fc83d40 ffffffff813a4f6a
> [223968.820663] Call Trace:
> [223968.820665]  <IRQ>
> [223968.820668]  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223968.820671]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.820674]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.820678]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.820682]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.820685]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.820688]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.820691]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.820695]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.820699]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.820702]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.820705]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.820708]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.820712]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.820715]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820717]  <EOI>
> [223968.820720]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820723]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.820727]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820730]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.820733]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223968.820735] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31
> [223968.820753]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84
> [223968.820763] Call Trace:
> [223968.820764]  <IRQ>  [<ffffffff813a4ee8>] delay_tsc+0x78/0xd0
> [223968.820769]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.820773]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.820777]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.820780]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.820783]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.820787]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.820790]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.820793]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.820797]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.820801]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.820804]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.820807]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.820810]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.820813]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.820815]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820821]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.820824]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.820827]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.820831]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223968.816001] NMI backtrace for cpu 1
> [223968.816001] CPU 1
> [223968.816001] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
> [223968.816001]
> [223968.816001] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0M788G
> [223968.816001] RIP: 0010:[<ffffffff81440955>]  [<ffffffff81440955>] io_serial_out+0x15/0x20
> [223968.816001] RSP: 0018:ffff88022fc437f0  EFLAGS: 00000002
> [223968.816001] RAX: 0000000000000073 RBX: ffffffff8243eec0 RCX: 0000000000000000
> [223968.816001] RDX: 00000000000003f8 RSI: 00000000000003f8 RDI: ffffffff8243eec0
> [223968.816001] RBP: ffff88022fc437f0 R08: 000000007659a435 R09: 0000000000000000
> [223968.816001] R10: 0000000000000001 R11: 000000000000000a R12: 0000000000000073
> [223968.816001] R13: ffffffff81bc648d R14: 0000000000000050 R15: ffffffff8243eec0
> [223968.816001] FS:  0000000000000000(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
> [223968.816001] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [223968.816001] CR2: 00007f472ba6a6fc CR3: 0000000126bb7000 CR4: 00000000000006e0
> [223968.816001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [223968.816001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [223968.816001] Process kworker/0:0 (pid: 0, threadinfo ffff880226928000, task ffff88022690dd60)
> [223968.816001] Stack:
> [223968.816001]  ffff88022fc43810 ffffffff814410dc 0000000000000030 ffffffff814410b0
> [223968.816001]  ffff88022fc43850 ffffffff8143cdb5 0000000000000087 0000000000000000
> [223968.816001]  ffffffff8243eec0 0000000000000001 0000000000000087 000000000000000d
> [223968.816001] Call Trace:
> [223968.816001]  <IRQ>
> [223968.816001]  [<ffffffff814410dc>] serial8250_console_putchar+0x2c/0x40
> [223968.816001]  [<ffffffff814410b0>] ? wait_for_xmitr+0xa0/0xa0
> [223968.816001]  [<ffffffff8143cdb5>] uart_console_write+0x35/0x70
> [223968.816001]  [<ffffffff814417be>] serial8250_console_write+0xbe/0x1a0
> [223968.816001]  [<ffffffff8105c78e>] __call_console_drivers+0x8e/0xb0
> [223968.816001]  [<ffffffff8105c7f5>] _call_console_drivers+0x45/0x70
> [223968.816001]  [<ffffffff8105d02f>] console_unlock+0x17f/0x2b0
> [223968.816001]  [<ffffffff8105d64d>] vprintk+0x1fd/0x520
> [223968.816001]  [<ffffffff816f16c1>] printk+0x67/0x69
> [223968.816001]  [<ffffffff816f5fa6>] ? _raw_spin_unlock+0x26/0x40
> [223968.816001]  [<ffffffff8105388b>] ? account_system_time+0xab/0x190
> [223968.816001]  [<ffffffff810bb7e4>] __rcu_pending+0x194/0x3b0
> [223968.816001]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
> [223968.816001]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
> [223968.816001]  [<ffffffff8106d861>] update_process_times+0x41/0x80
> [223968.816001]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
> [223968.816001]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
> [223968.816001]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
> [223968.816001]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
> [223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.816001]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
> [223968.816001]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
> [223968.816001]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
> [223968.816001]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.816001]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.816001]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.816001]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.816001]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.816001]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.816001]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.816001]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.816001]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.816001]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.816001]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.816001]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.816001]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.816001]  <EOI>
> [223968.816001]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.816001]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.816001]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.816001]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.816001]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
> [223968.816001] Code: 48 89 e5 d3 e2 03 57 38 ec 0f b6 c0 c9 c3 0f 1f 84 00 00 00 00 00 0f b6 8f 81 00 00 00 55 89 d0 48 89 e5 d3 e6 03 77 38 89 f2 ee <c9> c3 66 0f 1f 84 00 00 00 00 00 55 80 bf 82 00 00 00 08 48 89
> [223968.816001] Call Trace:
> [223968.816001]  <IRQ>  [<ffffffff814410dc>] serial8250_console_putchar+0x2c/0x40
> [223968.816001]  [<ffffffff814410b0>] ? wait_for_xmitr+0xa0/0xa0
> [223968.816001]  [<ffffffff8143cdb5>] uart_console_write+0x35/0x70
> [223968.816001]  [<ffffffff814417be>] serial8250_console_write+0xbe/0x1a0
> [223968.816001]  [<ffffffff8105c78e>] __call_console_drivers+0x8e/0xb0
> [223968.816001]  [<ffffffff8105c7f5>] _call_console_drivers+0x45/0x70
> [223968.816001]  [<ffffffff8105d02f>] console_unlock+0x17f/0x2b0
> [223968.816001]  [<ffffffff8105d64d>] vprintk+0x1fd/0x520
> [223968.816001]  [<ffffffff816f16c1>] printk+0x67/0x69
> [223968.816001]  [<ffffffff816f5fa6>] ? _raw_spin_unlock+0x26/0x40
> [223968.816001]  [<ffffffff8105388b>] ? account_system_time+0xab/0x190
> [223968.816001]  [<ffffffff810bb7e4>] __rcu_pending+0x194/0x3b0
> [223968.816001]  [<ffffffff810921f0>] ? tick_nohz_handler+0x100/0x100
> [223968.816001]  [<ffffffff810bba67>] rcu_check_callbacks+0x67/0x130
> [223968.816001]  [<ffffffff8106d861>] update_process_times+0x41/0x80
> [223968.816001]  [<ffffffff81092256>] tick_sched_timer+0x66/0xc0
> [223968.816001]  [<ffffffff810845ee>] __run_hrtimer+0xfe/0x1e0
> [223968.816001]  [<ffffffff8108491d>] hrtimer_interrupt+0xcd/0x1f0
> [223968.816001]  [<ffffffff810310c4>] smp_apic_timer_interrupt+0x64/0xa0
> [223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.816001]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
> [223968.816001]  [<ffffffff8101afa6>] ? native_read_tsc+0x6/0x20
> [223968.816001]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
> [223968.816001]  [<ffffffff813a4f6a>] __delay+0xa/0x10
> [223968.816001]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
> [223968.816001]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
> [223968.816001]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
> [223968.816001]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
> [223968.816001]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
> [223968.816001]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
> [223968.816001]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
> [223968.816001]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
> [223968.816001]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
> [223968.816001]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
> [223968.816001]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
> [223968.816001]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
> [223968.816001]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
> [223968.816001]  <EOI>  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.816001]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
> [223968.816001]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
> [223968.816001]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
> [223968.816001]  [<ffffffff816ec4bb>] start_secondary+0x1ca/0x1ff
>
> [ goes on for another ~300kB, trimmed ]
>


* Re: Linux 3.1-rc9
  2011-10-25  7:13                                       ` Linus Torvalds
@ 2011-10-25  9:01                                         ` David Miller
  2011-10-25 12:30                                           ` Thomas Gleixner
  0 siblings, 1 reply; 156+ messages in thread
From: David Miller @ 2011-10-25  9:01 UTC (permalink / raw)
  To: torvalds
  Cc: sim, netdev, tglx, a.p.zijlstra, linux-kernel, davej, schwidefsky, mingo

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue, 25 Oct 2011 09:13:48 +0200

> Added netdev, because this seems to be a generic networking bug (ABBA
> between sk_lock and icsk_retransmit_timer if my quick scan looks
> correct).
> 
> Davem?

I suspect that's all just a side effect of whatever is creating the
preempt_count imbalance.
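
For readers following along, the imbalance shows up in this thread as log
lines of the form "entered softirq 3 NET_RX ... preempt_count 00000102,
exited with 00000103?". Below is a minimal sketch of how such a pair of
values can be compared. It assumes the low 8 bits of preempt_count hold the
preemption-disable depth and the next 8 bits hold the softirq count. That
layout and all toy_* names are assumptions of this sketch, not something
stated in the thread.

```c
/* Assumed layout of preempt_count (an assumption of this sketch, not taken
 * from the thread): bits 0-7 hold the preemption-disable depth, bits 8-15
 * hold the softirq count. Higher bits are ignored here. */
#define TOY_PREEMPT_MASK  0x000000ffu
#define TOY_SOFTIRQ_MASK  0x0000ff00u
#define TOY_SOFTIRQ_SHIFT 8

/* Preemption-disable depth encoded in a raw preempt_count value. */
static unsigned int toy_preempt_depth(unsigned int pc)
{
	return pc & TOY_PREEMPT_MASK;
}

/* Raw softirq field encoded in a raw preempt_count value. */
static unsigned int toy_softirq_count(unsigned int pc)
{
	return (pc & TOY_SOFTIRQ_MASK) >> TOY_SOFTIRQ_SHIFT;
}

/* Preemption-disable levels gained (positive) or lost (negative) between
 * handler entry and exit; 0 means the handler was balanced. */
static int toy_preempt_leak(unsigned int entered, unsigned int exited)
{
	return (int)toy_preempt_depth(exited) - (int)toy_preempt_depth(entered);
}
```

With the values from the log, toy_preempt_leak(0x102, 0x103) evaluates to 1
while the softirq field stays the same. Under the assumed layout, that is
consistent with one preemption-disable level (for example a spinlock that
was taken but not released) being leaked per occurrence.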



* Re: Linux 3.1-rc9
  2011-10-25  9:01                                         ` David Miller
@ 2011-10-25 12:30                                           ` Thomas Gleixner
  2011-10-25 23:18                                             ` David Miller
  0 siblings, 1 reply; 156+ messages in thread
From: Thomas Gleixner @ 2011-10-25 12:30 UTC (permalink / raw)
  To: David Miller
  Cc: torvalds, sim, netdev, a.p.zijlstra, linux-kernel, davej,
	schwidefsky, mingo

On Tue, 25 Oct 2011, David Miller wrote:

> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Tue, 25 Oct 2011 09:13:48 +0200
> 
> > Added netdev, because this seems to be a generic networking bug (ABBA
> > between sk_lock and icsk_retransmit_timer if my quick scan looks
> > correct).
> > 
> > Davem?
> 
> I suspect that's all just a side effect of whatever is creating the
> preempt_count imbalance.

Something is holding socket lock and it was acquired in sk_clone()
which does bh_lock_sock() and returns with the lock held, though I got
completely lost in the gazillions of possible callchains ...

While staring at it I found a missing unlock in sk_clone() itself,
but that's not the one which causes the leak. Lockdep would have
complained about that separately :)
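
The shape of that error path can be modelled in userspace. The following is
only a toy sketch under stated assumptions: struct toy_sock, toy_clone() and
the toy_locks_held counter are hypothetical stand-ins invented for
illustration, not kernel code. It shows why an error path that frees a
still-locked object leaves the "locks held" accounting one too high.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a socket: only the lock state is modelled. */
struct toy_sock {
	bool locked;
};

/* Models the accounting that a real lock acquisition would bump. */
static int toy_locks_held;

static void toy_lock(struct toy_sock *sk)
{
	sk->locked = true;
	toy_locks_held++;
}

static void toy_unlock(struct toy_sock *sk)
{
	assert(sk->locked);	/* unlocking an unlocked object is a bug */
	sk->locked = false;
	toy_locks_held--;
}

/* Same shape as the function discussed above: on success the new object is
 * returned with its lock held; on failure it is freed and NULL is returned.
 * unlock_on_error selects between the fixed and the unfixed error path. */
static struct toy_sock *toy_clone(bool setup_fails, bool unlock_on_error)
{
	struct toy_sock *newsk = calloc(1, sizeof(*newsk));

	if (!newsk)
		return NULL;

	toy_lock(newsk);
	if (setup_fails) {
		if (unlock_on_error)
			toy_unlock(newsk);	/* the step the patch adds */
		free(newsk);
		return NULL;
	}
	return newsk;	/* caller is responsible for unlocking */
}
```

Calling toy_clone(true, false) returns NULL but leaves toy_locks_held
incremented, whereas toy_clone(true, true) leaves it unchanged.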

Thanks,

	tglx

--------->
Subject: net: Unlock sock before calling sk_free()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Index: linux-2.6/net/core/sock.c
===================================================================
--- linux-2.6.orig/net/core/sock.c
+++ linux-2.6/net/core/sock.c
@@ -1260,6 +1260,7 @@ struct sock *sk_clone(const struct sock 
 			/* It is still raw copy of parent, so invalidate
 			 * destructor and make plain sk_free() */
 			newsk->sk_destruct = NULL;
+			bh_unlock_sock(newsk);
 			sk_free(newsk);
 			newsk = NULL;
 			goto out;



* Re: Linux 3.1-rc9
  2011-10-18 20:12                                     ` Linus Torvalds
@ 2011-10-25 15:26                                       ` Simon Kirby
  2011-10-26  1:47                                         ` Yong Zhang
  0 siblings, 1 reply; 156+ messages in thread
From: Simon Kirby @ 2011-10-25 15:26 UTC (permalink / raw)
  To: Linus Torvalds, Peter Zijlstra, Ingo Molnar
  Cc: Thomas Gleixner, Linux Kernel Mailing List, Dave Jones,
	Martin Schwidefsky, Ingo Molnar, David Miller

On Tue, Oct 18, 2011 at 01:12:41PM -0700, Linus Torvalds wrote:

> On Tue, Oct 18, 2011 at 12:48 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > It does not look related.
> 
> Yeah, the only lock held there seems to be the socket lock, and it
> looks like all CPU's are spinning on it.
> 
> > Could you try to reproduce that problem with
> > lockdep enabled? lockdep might make it go away, but it's definitely
> > worth a try.
> 
> And DEBUG_SPINLOCK / DEBUG_SPINLOCK_SLEEP too. Maybe you're triggering
> some odd networking thing.  It sounds unlikely, but maybe some error
> case you get into doesn't release the socket lock.
> 
> I think PROVE_LOCKING already enables DEBUG_SPINLOCK, but the sleeping
> lock thing is separate, iirc.

I think the config option you were trying to think of is
CONFIG_DEBUG_ATOMIC_SLEEP, which enables CONFIG_PREEMPT_COUNT.

By the way, we got this WARN_ON_ONCE while running lockdep elsewhere:

       /*
        * We can walk the hash lockfree, because the hash only
        * grows, and we are careful when adding entries to the end:
        */
       list_for_each_entry(class, hash_head, hash_entry) {
               if (class->key == key) {
                       WARN_ON_ONCE(class->name != lock->name);
                       return class;
               }
       }

[19274.691090] ------------[ cut here ]------------
[19274.691107] WARNING: at kernel/lockdep.c:690 __lock_acquire+0xfd6/0x2180()
[19274.691112] Hardware name: PowerEdge 2950
[19274.691115] Modules linked in: drbd lru_cache cn ipmi_devintf ipmi_si ipmi_msghandler sata_sil24 bnx2
[19274.691137] Pid: 4416, comm: heartbeat Not tainted 3.1.0-hw-lockdep+ #52
[19274.691141] Call Trace:
[19274.691149]  [<ffffffff81098f96>] ? __lock_acquire+0xfd6/0x2180
[19274.691156]  [<ffffffff8105c4f0>] warn_slowpath_common+0x80/0xc0
[19274.691163]  [<ffffffff8105c545>] warn_slowpath_null+0x15/0x20
[19274.691169]  [<ffffffff81098f96>] __lock_acquire+0xfd6/0x2180
[19274.691175]  [<ffffffff8109a2e9>] ? lock_release_non_nested+0x1a9/0x340
[19274.691181]  [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[19274.691185]  [<ffffffff8104a302>] ? double_rq_lock+0x52/0x80
[19274.691191]  [<ffffffff813a4f8a>] ? __delay+0xa/0x10
[19274.691197]  [<ffffffff816f55fa>] _raw_spin_lock_nested+0x3a/0x50
[19274.691201]  [<ffffffff8104a302>] ? double_rq_lock+0x52/0x80
[19274.691205]  [<ffffffff8104a302>] double_rq_lock+0x52/0x80
[19274.691210]  [<ffffffff81058167>] load_balance+0x897/0x16e0
[19274.691215]  [<ffffffff81058199>] ? load_balance+0x8c9/0x16e0
[19274.691219]  [<ffffffff8104d172>] ? update_shares+0xd2/0x150
[19274.691226]  [<ffffffff816f2572>] ? __schedule+0x842/0xa20
[19274.691232]  [<ffffffff816f2608>] __schedule+0x8d8/0xa20
[19274.691238]  [<ffffffff816f2572>] ? __schedule+0x842/0xa20
[19274.691243]  [<ffffffff81063e87>] ? local_bh_enable+0xa7/0x110
[19274.691249]  [<ffffffff8169c008>] ? unix_stream_recvmsg+0x1d8/0x7f0
[19274.691254]  [<ffffffff81614c88>] ? dev_queue_xmit+0x1a8/0x8a0
[19274.691258]  [<ffffffff816f282a>] schedule+0x3a/0x60
[19274.691265]  [<ffffffff816f4515>] schedule_hrtimeout_range_clock+0x105/0x120
[19274.691270]  [<ffffffff81096c9d>] ? trace_hardirqs_on+0xd/0x10
[19274.691276]  [<ffffffff81080d89>] ? add_wait_queue+0x49/0x60
[19274.691282]  [<ffffffff816f453e>] schedule_hrtimeout_range+0xe/0x10
[19274.691291]  [<ffffffff8113dc04>] poll_schedule_timeout+0x44/0x70
[19274.691297]  [<ffffffff8113e29c>] do_sys_poll+0x33c/0x4f0
[19274.691303]  [<ffffffff8113dcf0>] ? poll_freewait+0xc0/0xc0
[19274.691309]  [<ffffffff8113ddf0>] ? __pollwait+0x100/0x100
[19274.691317]  [<ffffffff81602c3d>] ? sock_update_classid+0xfd/0x140
[19274.691323]  [<ffffffff81602bb0>] ? sock_update_classid+0x70/0x140
[19274.691330]  [<ffffffff815ff1f7>] ? sock_recvmsg+0xf7/0x130
[19274.691336]  [<ffffffff81098450>] ? __lock_acquire+0x490/0x2180
[19274.691343]  [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
[19274.691351]  [<ffffffff8101a129>] ? sched_clock+0x9/0x10
[19274.691356]  [<ffffffff810944cd>] ? trace_hardirqs_off+0xd/0x10
[19274.691363]  [<ffffffff815ffb0b>] ? sys_recvfrom+0xbb/0x120
[19274.691370]  [<ffffffff81082540>] ? process_cpu_clock_getres+0x10/0x10
[19274.691376]  [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
[19274.691383]  [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
[19274.691390]  [<ffffffff816fe0ca>] ? sysret_check+0x2e/0x69
[19274.691396]  [<ffffffff8113e647>] sys_poll+0x77/0x110
[19274.691402]  [<ffffffff816fe092>] system_call_fastpath+0x16/0x1b
[19274.691407] ---[ end trace 74fbaae9066aadcc ]---

Simon-


* Re: Linux 3.1-rc9
  2011-10-24 19:02                                     ` Simon Kirby
  2011-10-25  7:13                                       ` Linus Torvalds
@ 2011-10-25 20:20                                       ` Simon Kirby
  2011-10-31 17:32                                         ` Simon Kirby
  2011-11-18 23:11                                         ` [tip:perf/core] lockdep: Show subclass in pretty print of lockdep output tip-bot for Steven Rostedt
  1 sibling, 2 replies; 156+ messages in thread
From: Simon Kirby @ 2011-10-25 20:20 UTC (permalink / raw)
  To: Thomas Gleixner, David Miller
  Cc: Peter Zijlstra, Linus Torvalds, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar, Network Development

On Mon, Oct 24, 2011 at 12:02:03PM -0700, Simon Kirby wrote:

> Ok, hit the hang about 4 more times, but only this morning on a box with
> a serial cable attached. Yay!

Here's lockdep output from another box. This one looks a bit different.

Simon-

[583223.799383] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[583223.805083] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[583223.805093] 
[583223.805094] =================================
[583223.805096] [ INFO: inconsistent lock state ]
[583223.805098] 3.1.0-rc10-hw-lockdep+ #51
[583223.805100] ---------------------------------
[583223.805102] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[583223.805105] swapper/0 [HC0[0]:SC0[0]:HE1:SE1] takes:
[583223.805107]  (slock-AF_INET){+.?.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
[583223.805116] {IN-SOFTIRQ-W} state was registered at:
[583223.805117]   [<ffffffff81098c7c>] __lock_acquire+0xcbc/0x2180
[583223.805123]   [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[583223.805126]   [<ffffffff816f55fc>] _raw_spin_lock+0x3c/0x50
[583223.805131]   [<ffffffff8166bd3d>] udp_queue_rcv_skb+0x26d/0x4b0
[583223.805135]   [<ffffffff8166c6a3>] __udp4_lib_rcv+0x2f3/0x910
[583223.805139]   [<ffffffff8166ccd5>] udp_rcv+0x15/0x20
[583223.805142]   [<ffffffff81641960>] ip_local_deliver_finish+0x100/0x2f0
[583223.805146]   [<ffffffff81641bdd>] ip_local_deliver+0x8d/0xa0
[583223.805149]   [<ffffffff81641203>] ip_rcv_finish+0x1a3/0x510
[583223.805152]   [<ffffffff816417e2>] ip_rcv+0x272/0x2f0
[583223.805154]   [<ffffffff81610d67>] __netif_receive_skb+0x4d7/0x560
[583223.805158]   [<ffffffff81610ec0>] process_backlog+0xd0/0x1e0
[583223.805161]   [<ffffffff81613880>] net_rx_action+0x140/0x2c0
[583223.805164]   [<ffffffff810640b8>] __do_softirq+0x138/0x250
[583223.805168]   [<ffffffff817002bc>] call_softirq+0x1c/0x30
[583223.805172]   [<ffffffff810153c5>] do_softirq+0x95/0xd0
[583223.805176]   [<ffffffff81063ecd>] local_bh_enable+0xed/0x110
[583223.805179]   [<ffffffff81614c48>] dev_queue_xmit+0x1a8/0x8a0
[583223.805181]   [<ffffffff8161f1aa>] neigh_resolve_output+0x17a/0x220
[583223.805185]   [<ffffffff81647d4c>] ip_finish_output+0x2ec/0x590
[583223.805188]   [<ffffffff81648078>] ip_output+0x88/0xe0
[583223.805191]   [<ffffffff81646cd8>] ip_local_out+0x28/0x80
[583223.805194]   [<ffffffff81646d39>] ip_send_skb+0x9/0x40
[583223.805197]   [<ffffffff8166aeb2>] udp_send_skb+0x122/0x390
[583223.805200]   [<ffffffff8166db0c>] udp_sendmsg+0x7dc/0x920
[583223.805203]   [<ffffffff81675e1f>] inet_sendmsg+0xbf/0x120
[583223.805207]   [<ffffffff815ff333>] sock_sendmsg+0xe3/0x110
[583223.805209]   [<ffffffff815ffc55>] sys_sendto+0x105/0x140
[583223.805212]   [<ffffffff816fe052>] system_call_fastpath+0x16/0x1b
[583223.805217] irq event stamp: 4284605374
[583223.805219] hardirqs last  enabled at (4284605372): [<ffffffff816101ad>] net_rps_action_and_irq_enable+0x8d/0xa0
[583223.805222] hardirqs last disabled at (4284605373): [<ffffffff8106412d>] __do_softirq+0x1ad/0x250
[583223.805226] softirqs last  enabled at (4284605374): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
[583223.805230] softirqs last disabled at (4284605313): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[583223.805233] 
[583223.805233] other info that might help us debug this:
[583223.805235]  Possible unsafe locking scenario:
[583223.805236] 
[583223.805237]        CPU0
[583223.805238]        ----
[583223.805239]   lock(slock-AF_INET);
[583223.805241]   <Interrupt>
[583223.805242]     lock(slock-AF_INET);
[583223.805244] 
[583223.805245]  *** DEADLOCK ***
[583223.805246] 
[583223.805248] 1 lock held by swapper/0:
[583223.805249]  #0:  (slock-AF_INET){+.?.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
[583223.805254] 
[583223.805254] stack backtrace:
[583223.805257] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51
[583223.805259] Call Trace:
[583223.805264]  [<ffffffff81096033>] print_usage_bug+0x243/0x310
[583223.805267]  [<ffffffff810965b4>] mark_lock+0x4b4/0x6c0
[583223.805271]  [<ffffffff81097400>] ? check_usage_forwards+0x110/0x110
[583223.805275]  [<ffffffff81096862>] mark_held_locks+0xa2/0x130
[583223.805278]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
[583223.805281]  [<ffffffff81096c0d>] trace_hardirqs_on_caller+0x13d/0x1c0
[583223.805286]  [<ffffffff813a60ae>] trace_hardirqs_on_thunk+0x3a/0x3f
[583223.805290]  [<ffffffff81092b8e>] ? tick_nohz_stop_sched_tick+0x2fe/0x430
[583223.805293]  [<ffffffff816f6334>] ? retint_restore_args+0x13/0x13
[583223.805297]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[583223.805301]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[583223.805304]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[583223.805307]  [<ffffffff816ca491>] rest_init+0xd1/0xe0
[583223.805310]  [<ffffffff816ca3c0>] ? csum_partial_copy_generic+0x170/0x170
[583223.805315]  [<ffffffff81adcc55>] start_kernel+0x360/0x3ac
[583223.805318]  [<ffffffff81adc2a2>] x86_64_start_reservations+0x82/0x89
[583223.805321]  [<ffffffff81adc3b8>] x86_64_start_kernel+0x10f/0x12a
[583223.805325]  [<ffffffff81adc140>] ? early_idt_handlers+0x140/0x140
[583226.813848] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[583232.802948] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[583244.833571] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[583253.849631] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[583268.837126] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[587843.931805] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[587846.165584] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[587850.602316] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[587859.482841] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[587873.940136] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[587877.240624] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[590476.272022] BUG: soft lockup - CPU#0 stuck for 22s! [swapper:0]
[590476.276002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[590476.276002] irq event stamp: 4284605374
[590476.276002] hardirqs last  enabled at (4284605372): [<ffffffff816101ad>] net_rps_action_and_irq_enable+0x8d/0xa0
[590476.276002] hardirqs last disabled at (4284605373): [<ffffffff8106412d>] __do_softirq+0x1ad/0x250
[590476.276002] softirqs last  enabled at (4284605374): [<ffffffff81064176>] __do_softirq+0x1f6/0x250
[590476.276002] softirqs last disabled at (4284605313): [<ffffffff817002bc>] call_softirq+0x1c/0x30
[590476.276002] CPU 0 
[590476.276002] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[590476.276002] 
[590476.276002] Pid: 0, comm: swapper Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0UR033
[590476.276002] RIP: 0010:[<ffffffff813a4ee3>]  [<ffffffff813a4ee3>] delay_tsc+0x73/0xd0
[590476.276002] RSP: 0018:ffff88022fc03cf0  EFLAGS: 00000206
[590476.276002] RAX: 00042f884dcdaa24 RBX: ffff88022fc0d3c0 RCX: 000000004dcdaa24
[590476.380029] BUG: soft lockup - CPU#1 stuck for 22s! [php:10828]
[590476.380033] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[590476.380044] irq event stamp: 0
[590476.380045] hardirqs last  enabled at (0): [<          (null)>]           (null)
[590476.380048] hardirqs last disabled at (0): [<ffffffff8105aa8b>] copy_process+0x65b/0x1450
[590476.380056] softirqs last  enabled at (0): [<ffffffff8105aa8b>] copy_process+0x65b/0x1450
[590476.380060] softirqs last disabled at (0): [<          (null)>]           (null)
[590476.380063] CPU 1 
[590476.380064] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2
[590476.380072] 
[590476.380075] Pid: 10828, comm: php Not tainted 3.1.0-rc10-hw-lockdep+ #51 Dell Inc. PowerEdge 1950/0UR033
[590476.380079] RIP: 0010:[<ffffffff8101afa6>]  [<ffffffff8101afa6>] native_read_tsc+0x6/0x20
[590476.380086] RSP: 0000:ffff88022fc43ce0  EFLAGS: 00000206
[590476.380088] RAX: 000000005aa56d04 RBX: ffffffff816f6334 RCX: 000000005aa56c92
[590476.380091] RDX: 0000000000042f88 RSI: ffffffff818f9896 RDI: 0000000000000001
[590476.380093] RBP: ffff88022fc43ce0 R08: 000000005aa56c92 R09: 0000000000000000
[590476.380096] R10: ffff88014b9a9f20 R11: 0000000000000000 R12: ffff88022fc43c58
[590476.380098] R13: ffffffff816feb33 R14: ffff88022fc43ce0 R15: 000000000e27878c
[590476.380101] FS:  00007fb61c8fa720(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
[590476.380103] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[590476.380106] CR2: 00000000027914a0 CR3: 000000013a070000 CR4: 00000000000006e0
[590476.380108] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[590476.380110] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[590476.380113] Process php (pid: 10828, threadinfo ffff88014a1f2000, task ffff88014b9a9f20)
[590476.380115] Stack:
[590476.380117]  ffff88022fc43d30 ffffffff813a4eaf ffff88014a1f2000 000000005aa56c38
[590476.380121]  00000001818f9896 ffff88001db58048 000000000e27878c 0000000076e96800
[590476.380125]  0000000000000001 ffff88014b9a9f20 ffff88022fc43d40 ffffffff813a4f6a
[590476.380129] Call Trace:
[590476.380132]  <IRQ> 
[590476.380137]  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
[590476.380141]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[590476.380145]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[590476.380151]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[590476.380157]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[590476.380161]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[590476.380166]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[590476.380169]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[590476.380174]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[590476.380179]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[590476.380184]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[590476.380188]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[590476.380191]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[590476.380196]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[590476.380200]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[590476.380203]  <EOI> 
[590476.380206]  [<ffffffff816f6319>] ? retint_swapgs+0x13/0x1b
[590476.380208] Code: 08 ff 15 46 5c a1 00 48 c7 43 08 00 00 00 00 48 89 03 48 83 c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 31 
[590476.380227]  c1 48 89 d0 48 c1 e0 20 89 ca 48 09 d0 c9 c3 66 2e 0f 1f 84 
[590476.380236] Call Trace:
[590476.380237]  <IRQ>  [<ffffffff813a4eaf>] delay_tsc+0x3f/0xd0
[590476.380242]  [<ffffffff813a4f6a>] __delay+0xa/0x10
[590476.380246]  [<ffffffff813ac2ab>] do_raw_spin_lock+0x13b/0x180
[590476.380249]  [<ffffffff816f5604>] _raw_spin_lock+0x44/0x50
[590476.380252]  [<ffffffff81661823>] ? tcp_keepalive_timer+0x23/0x270
[590476.380256]  [<ffffffff81661823>] tcp_keepalive_timer+0x23/0x270
[590476.380259]  [<ffffffff8106cd5d>] run_timer_softirq+0x26d/0x410
[590476.380262]  [<ffffffff8106cc88>] ? run_timer_softirq+0x198/0x410
[590476.380265]  [<ffffffff81661800>] ? tcp_init_xmit_timers+0x20/0x20
[590476.380268]  [<ffffffff810640b8>] __do_softirq+0x138/0x250
[590476.380271]  [<ffffffff817002bc>] call_softirq+0x1c/0x30
[590476.380274]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[590476.380277]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110
[590476.380280]  [<ffffffff810310c9>] smp_apic_timer_interrupt+0x69/0xa0
[590476.380283]  [<ffffffff816feb33>] apic_timer_interrupt+0x73/0x80
[590476.380285]  <EOI>  [<ffffffff816f6319>] ? retint_swapgs+0x13/0x1b
[590476.484032] BUG: soft lockup - CPU#2 stuck for 23s! [suexec:10831]
...

* Re: Linux 3.1-rc9
  2011-10-25 12:30                                           ` Thomas Gleixner
@ 2011-10-25 23:18                                             ` David Miller
  0 siblings, 0 replies; 156+ messages in thread
From: David Miller @ 2011-10-25 23:18 UTC (permalink / raw)
  To: tglx
  Cc: torvalds, sim, netdev, a.p.zijlstra, linux-kernel, davej,
	schwidefsky, mingo

From: Thomas Gleixner <tglx@linutronix.de>
Date: Tue, 25 Oct 2011 14:30:50 +0200 (CEST)

> Subject: net: Unlock sock before calling sk_free()
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Good spotting, applied, thanks Thomas!

* Re: Linux 3.1-rc9
  2011-10-25 15:26                                       ` Simon Kirby
@ 2011-10-26  1:47                                         ` Yong Zhang
  0 siblings, 0 replies; 156+ messages in thread
From: Yong Zhang @ 2011-10-26  1:47 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	David Miller

On Tue, Oct 25, 2011 at 08:26:31AM -0700, Simon Kirby wrote:
> On Tue, Oct 18, 2011 at 01:12:41PM -0700, Linus Torvalds wrote:
> 
> > On Tue, Oct 18, 2011 at 12:48 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > It does not look related.
> > 
> > Yeah, the only lock held there seems to be the socket lock, and it
> > looks like all CPU's are spinning on it.
> > 
> > > Could you try to reproduce that problem with
> > > lockdep enabled? lockdep might make it go away, but it's definitely
> > > worth a try.
> > 
> > And DEBUG_SPINLOCK / DEBUG_SPINLOCK_SLEEP too. Maybe you're triggering
> > some odd networking thing.  It sounds unlikely, but maybe some error
> > case you get into doesn't release the socket lock.
> > 
> > I think PROVE_LOCKING already enables DEBUG_SPINLOCK, but the sleeping
> > lock thing is separate, iirc.
> 
> I think the config option you were trying to think of is
> CONFIG_DEBUG_ATOMIC_SLEEP, which enables CONFIG_PREEMPT_COUNT.
> 
> By the way, we got this WARN_ON_ONCE while running lockdep elsewhere:
> 
>        /*
>         * We can walk the hash lockfree, because the hash only
>         * grows, and we are careful when adding entries to the end:
>         */
>        list_for_each_entry(class, hash_head, hash_entry) {
>                if (class->key == key) {
>                        WARN_ON_ONCE(class->name != lock->name);

Someone has hit this before; maybe you can try the patch in:
http://marc.info/?l=linux-kernel&m=131919035525533

Thanks,
Yong

>                        return class;
>                }
>        }
> 
> [19274.691090] ------------[ cut here ]------------
> [19274.691107] WARNING: at kernel/lockdep.c:690 __lock_acquire+0xfd6/0x2180()
> [19274.691112] Hardware name: PowerEdge 2950
> [19274.691115] Modules linked in: drbd lru_cache cn ipmi_devintf ipmi_si ipmi_msghandler sata_sil24 bnx2
> [19274.691137] Pid: 4416, comm: heartbeat Not tainted 3.1.0-hw-lockdep+ #52
> [19274.691141] Call Trace:
> [19274.691149]  [<ffffffff81098f96>] ? __lock_acquire+0xfd6/0x2180
> [19274.691156]  [<ffffffff8105c4f0>] warn_slowpath_common+0x80/0xc0
> [19274.691163]  [<ffffffff8105c545>] warn_slowpath_null+0x15/0x20
> [19274.691169]  [<ffffffff81098f96>] __lock_acquire+0xfd6/0x2180
> [19274.691175]  [<ffffffff8109a2e9>] ? lock_release_non_nested+0x1a9/0x340
> [19274.691181]  [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
> [19274.691185]  [<ffffffff8104a302>] ? double_rq_lock+0x52/0x80
> [19274.691191]  [<ffffffff813a4f8a>] ? __delay+0xa/0x10
> [19274.691197]  [<ffffffff816f55fa>] _raw_spin_lock_nested+0x3a/0x50
> [19274.691201]  [<ffffffff8104a302>] ? double_rq_lock+0x52/0x80
> [19274.691205]  [<ffffffff8104a302>] double_rq_lock+0x52/0x80
> [19274.691210]  [<ffffffff81058167>] load_balance+0x897/0x16e0
> [19274.691215]  [<ffffffff81058199>] ? load_balance+0x8c9/0x16e0
> [19274.691219]  [<ffffffff8104d172>] ? update_shares+0xd2/0x150
> [19274.691226]  [<ffffffff816f2572>] ? __schedule+0x842/0xa20
> [19274.691232]  [<ffffffff816f2608>] __schedule+0x8d8/0xa20
> [19274.691238]  [<ffffffff816f2572>] ? __schedule+0x842/0xa20
> [19274.691243]  [<ffffffff81063e87>] ? local_bh_enable+0xa7/0x110
> [19274.691249]  [<ffffffff8169c008>] ? unix_stream_recvmsg+0x1d8/0x7f0
> [19274.691254]  [<ffffffff81614c88>] ? dev_queue_xmit+0x1a8/0x8a0
> [19274.691258]  [<ffffffff816f282a>] schedule+0x3a/0x60
> [19274.691265]  [<ffffffff816f4515>] schedule_hrtimeout_range_clock+0x105/0x120
> [19274.691270]  [<ffffffff81096c9d>] ? trace_hardirqs_on+0xd/0x10
> [19274.691276]  [<ffffffff81080d89>] ? add_wait_queue+0x49/0x60
> [19274.691282]  [<ffffffff816f453e>] schedule_hrtimeout_range+0xe/0x10
> [19274.691291]  [<ffffffff8113dc04>] poll_schedule_timeout+0x44/0x70
> [19274.691297]  [<ffffffff8113e29c>] do_sys_poll+0x33c/0x4f0
> [19274.691303]  [<ffffffff8113dcf0>] ? poll_freewait+0xc0/0xc0
> [19274.691309]  [<ffffffff8113ddf0>] ? __pollwait+0x100/0x100
> [19274.691317]  [<ffffffff81602c3d>] ? sock_update_classid+0xfd/0x140
> [19274.691323]  [<ffffffff81602bb0>] ? sock_update_classid+0x70/0x140
> [19274.691330]  [<ffffffff815ff1f7>] ? sock_recvmsg+0xf7/0x130
> [19274.691336]  [<ffffffff81098450>] ? __lock_acquire+0x490/0x2180
> [19274.691343]  [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
> [19274.691351]  [<ffffffff8101a129>] ? sched_clock+0x9/0x10
> [19274.691356]  [<ffffffff810944cd>] ? trace_hardirqs_off+0xd/0x10
> [19274.691363]  [<ffffffff815ffb0b>] ? sys_recvfrom+0xbb/0x120
> [19274.691370]  [<ffffffff81082540>] ? process_cpu_clock_getres+0x10/0x10
> [19274.691376]  [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
> [19274.691383]  [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
> [19274.691390]  [<ffffffff816fe0ca>] ? sysret_check+0x2e/0x69
> [19274.691396]  [<ffffffff8113e647>] sys_poll+0x77/0x110
> [19274.691402]  [<ffffffff816fe092>] system_call_fastpath+0x16/0x1b
> [19274.691407] ---[ end trace 74fbaae9066aadcc ]---
> 
> Simon-

-- 
Only stand for myself

* Re: Linux 3.1-rc9
  2011-10-25 20:20                                       ` Simon Kirby
@ 2011-10-31 17:32                                         ` Simon Kirby
  2011-11-02 16:40                                           ` Thomas Gleixner
  2011-11-02 22:10                                           ` Steven Rostedt
  2011-11-18 23:11                                         ` [tip:perf/core] lockdep: Show subclass in pretty print of lockdep output tip-bot for Steven Rostedt
  1 sibling, 2 replies; 156+ messages in thread
From: Simon Kirby @ 2011-10-31 17:32 UTC (permalink / raw)
  To: Thomas Gleixner, David Miller
  Cc: Peter Zijlstra, Linus Torvalds, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar, Network Development

On Tue, Oct 25, 2011 at 01:20:49PM -0700, Simon Kirby wrote:

> On Mon, Oct 24, 2011 at 12:02:03PM -0700, Simon Kirby wrote:
> 
> > Ok, hit the hang about 4 more times, but only this morning on a box with
> > a serial cable attached. Yay!
> 
> Here's lockdep output from another box. This one looks a bit different.

One more, again a bit different. The last few lockups have looked like
this. Not sure why, but we're hitting this at a few a day now. Thomas,
this is without your patch, but as you said, that's right before a free
and should print a separate lockdep warning.

No "huh" lines until after the trace on this one. I'll move to 3.1 with
cherry-picked b0691c8e now.

Simon-

[104661.173798] 
[104661.173801] =======================================================
[104661.179922] [ INFO: possible circular locking dependency detected ]
[104661.179922] 3.1.0-rc10-hw-lockdep+ #51
[104661.179922] -------------------------------------------------------
[104661.179922] watchdog.pl/29331 is trying to acquire lock:
[104661.179922]  (slock-AF_INET/1){+.-.-.}, at: [<ffffffff81664887>] tcp_v4_rcv+0x867/0xc10
[104661.179922] 
[104661.179922] but task is already holding lock:
[104661.179922]  (slock-AF_INET){+.-.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
[104661.179922] 
[104661.179922] which lock already depends on the new lock.
[104661.179922] 
[104661.179922] 
[104661.179922] the existing dependency chain (in reverse order) is:
[104661.239412] 
[104661.239412] -> #1 (slock-AF_INET){+.-.-.}:
[104661.244767]        [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[104661.244767]        [<ffffffff816f55fc>] _raw_spin_lock+0x3c/0x50
[104661.244767]        [<ffffffff81604540>] sk_clone+0x120/0x420
[104661.244767]        [<ffffffff8164cb33>] inet_csk_clone+0x13/0x90
[104661.244767]        [<ffffffff816669a5>] tcp_create_openreq_child+0x25/0x4d0
[104661.244767]        [<ffffffff81664c78>] tcp_v4_syn_recv_sock+0x48/0x2c0
[104661.244767]        [<ffffffff816667f5>] tcp_check_req+0x335/0x4c0
[104661.244767]        [<ffffffff81663e5e>] tcp_v4_do_rcv+0x29e/0x460
[104661.244767]        [<ffffffff816648ac>] tcp_v4_rcv+0x88c/0xc10   
[104661.244767]        [<ffffffff81641960>] ip_local_deliver_finish+0x100/0x2f0
[104661.244767]        [<ffffffff81641bdd>] ip_local_deliver+0x8d/0xa0
[104661.244767]        [<ffffffff81641203>] ip_rcv_finish+0x1a3/0x510 
[104661.244767]        [<ffffffff816417e2>] ip_rcv+0x272/0x2f0
[104661.244767]        [<ffffffff81610d67>] __netif_receive_skb+0x4d7/0x560
[104661.244767]        [<ffffffff81610ec0>] process_backlog+0xd0/0x1e0
[104661.244767]        [<ffffffff81613880>] net_rx_action+0x140/0x2c0 
[104661.244767]        [<ffffffff810640b8>] __do_softirq+0x138/0x250  
[104661.244767]        [<ffffffff817002bc>] call_softirq+0x1c/0x30    
[104661.244767]        [<ffffffff810153c5>] do_softirq+0x95/0xd0      
[104661.244767]        [<ffffffff81063dbd>] local_bh_enable_ip+0xed/0x110
[104661.244767]        [<ffffffff816f5e9f>] _raw_spin_unlock_bh+0x3f/0x50
[104661.244767]        [<ffffffff81602e41>] release_sock+0x161/0x1d0
[104661.244767]        [<ffffffff816762ed>] inet_stream_connect+0x6d/0x2f0
[104661.244767]        [<ffffffff815fcfeb>] kernel_connect+0xb/0x10
[104661.244767]        [<ffffffff816aaf86>] xs_tcp_setup_socket+0x2a6/0x4c0
[104661.244767]        [<ffffffff81078cf9>] process_one_work+0x1e9/0x560   
[104661.244767]        [<ffffffff81079403>] worker_thread+0x193/0x420      
[104661.244767]        [<ffffffff81080466>] kthread+0x96/0xb0
[104661.244767]        [<ffffffff817001c4>] kernel_thread_helper+0x4/0x10
[104661.244767] 
[104661.244767] -> #0 (slock-AF_INET/1){+.-.-.}:
[104661.244767]        [<ffffffff8109a000>] __lock_acquire+0x2040/0x2180
[104661.244767]        [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[104661.244767]        [<ffffffff816f55aa>] _raw_spin_lock_nested+0x3a/0x50
[104661.244767]        [<ffffffff81664887>] tcp_v4_rcv+0x867/0xc10
[104661.244767]        [<ffffffff81641960>] ip_local_deliver_finish+0x100/0x2f0
[104661.244767]        [<ffffffff81641bdd>] ip_local_deliver+0x8d/0xa0
[104661.244767]        [<ffffffff81641203>] ip_rcv_finish+0x1a3/0x510 
[104661.244767]        [<ffffffff816417e2>] ip_rcv+0x272/0x2f0
[104661.244767]        [<ffffffff81610d67>] __netif_receive_skb+0x4d7/0x560
[104661.244767]        [<ffffffff81612e24>] netif_receive_skb+0x104/0x120  
[104661.244767]        [<ffffffff81612f70>] napi_skb_finish+0x50/0x70
[104661.244767]        [<ffffffff81613635>] napi_gro_receive+0xc5/0xd0
[104661.244767]        [<ffffffffa000ad50>] bnx2_poll_work+0x610/0x1560 [bnx2]
[104661.244767]        [<ffffffffa000bde6>] bnx2_poll+0x66/0x250 [bnx2]
[104661.244767]        [<ffffffff81613880>] net_rx_action+0x140/0x2c0  
[104661.244767]        [<ffffffff810640b8>] __do_softirq+0x138/0x250   
[104661.244767]        [<ffffffff817002bc>] call_softirq+0x1c/0x30     
[104661.244767]        [<ffffffff810153c5>] do_softirq+0x95/0xd0       
[104661.244767]        [<ffffffff81063c8d>] irq_exit+0xdd/0x110        
[104661.244767]        [<ffffffff81014b74>] do_IRQ+0x64/0xe0           
[104661.244767]        [<ffffffff816f6273>] ret_from_intr+0x0/0x1a     
[104661.244767]        [<ffffffff816f65b5>] page_fault+0x25/0x30     
[104661.244767] 
[104661.244767] other info that might help us debug this:
[104661.244767] 
[104661.244767]  Possible unsafe locking scenario:
[104661.244767]        
[104661.244767]        CPU0                    CPU1
[104661.244767]        ----                    ----
[104661.244767]   lock(slock-AF_INET);
[104661.244767]                                lock(slock-AF_INET);
[104661.244767]                                lock(slock-AF_INET);
[104661.244767]   lock(slock-AF_INET);
[104661.244767] 
[104661.244767]  *** DEADLOCK ***
[104661.244767] 
[104661.244767] 3 locks held by watchdog.pl/29331:
[104661.244767]  #0:  (slock-AF_INET){+.-.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
[104661.244767]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff816109f5>] __netif_receive_skb+0x165/0x560
[104661.244767]  #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff816418a0>] ip_local_deliver_finish+0x40/0x2f0
[104661.244767] 
[104661.244767] stack backtrace:
[104661.244767] Pid: 29331, comm: watchdog.pl Not tainted 3.1.0-rc10-hw-lockdep+ #51
[104661.244767] Call Trace:
[104661.244767]  <IRQ>  [<ffffffff81097eab>] print_circular_bug+0x21b/0x330
[104661.244767]  [<ffffffff8109a000>] __lock_acquire+0x2040/0x2180
[104661.244767]  [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[104661.244767]  [<ffffffff81664887>] ? tcp_v4_rcv+0x867/0xc10
[104661.244767]  [<ffffffff816f55aa>] _raw_spin_lock_nested+0x3a/0x50
[104661.244767]  [<ffffffff81664887>] ? tcp_v4_rcv+0x867/0xc10
[104661.244767]  [<ffffffff81664887>] tcp_v4_rcv+0x867/0xc10  
[104661.244767]  [<ffffffff816418a0>] ? ip_local_deliver_finish+0x40/0x2f0
[104661.244767]  [<ffffffff81636978>] ? nf_hook_slow+0x148/0x1a0
[104661.244767]  [<ffffffff81641960>] ip_local_deliver_finish+0x100/0x2f0
[104661.244767]  [<ffffffff816418a0>] ? ip_local_deliver_finish+0x40/0x2f0
[104661.244767]  [<ffffffff81641bdd>] ip_local_deliver+0x8d/0xa0
[104661.244767]  [<ffffffff81641203>] ip_rcv_finish+0x1a3/0x510 
[104661.244767]  [<ffffffff816417e2>] ip_rcv+0x272/0x2f0
[104661.244767]  [<ffffffff81610d67>] __netif_receive_skb+0x4d7/0x560
[104661.244767]  [<ffffffff816109f5>] ? __netif_receive_skb+0x165/0x560
[104661.244767]  [<ffffffff81612e24>] netif_receive_skb+0x104/0x120
[104661.244767]  [<ffffffff81612d43>] ? netif_receive_skb+0x23/0x120
[104661.244767]  [<ffffffff816133ab>] ? dev_gro_receive+0x29b/0x380 
[104661.244767]  [<ffffffff816132a2>] ? dev_gro_receive+0x192/0x380 
[104661.244767]  [<ffffffff81612f70>] napi_skb_finish+0x50/0x70
[104661.244767]  [<ffffffff81613635>] napi_gro_receive+0xc5/0xd0
[104661.244767]  [<ffffffffa000ad50>] bnx2_poll_work+0x610/0x1560 [bnx2]
[104661.244767]  [<ffffffffa000bde6>] bnx2_poll+0x66/0x250 [bnx2]
[104661.244767]  [<ffffffff81613880>] net_rx_action+0x140/0x2c0  
[104661.244767]  [<ffffffff810640b8>] __do_softirq+0x138/0x250   
[104661.244767]  [<ffffffff817002bc>] call_softirq+0x1c/0x30     
[104661.244767]  [<ffffffff810153c5>] do_softirq+0x95/0xd0       
[104661.244767]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110        
[104661.244767]  [<ffffffff81014b74>] do_IRQ+0x64/0xe0           
[104661.244767]  [<ffffffff816f6273>] common_interrupt+0x73/0x73
[104661.244767]  <EOI>  [<ffffffff816f99b3>] ? do_page_fault+0x93/0x520
[104661.244767]  [<ffffffff816f99af>] ? do_page_fault+0x8f/0x520
[104661.244767]  [<ffffffff81149afc>] ? vfsmount_lock_local_unlock+0x1c/0x40
[104661.244767]  [<ffffffff8114a79b>] ? mntput_no_expire+0x3b/0x150
[104661.244767]  [<ffffffff8114a8ca>] ? mntput+0x1a/0x30
[104661.244767]  [<ffffffff8112c540>] ? fput+0x190/0x230
[104661.244767]  [<ffffffff813a60ed>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[104661.244767]  [<ffffffff816f65b5>] page_fault+0x25/0x30
[104661.897577] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104661.923653] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104663.418206] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104666.420003] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104672.425159] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104684.423542] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[104691.206752] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?

* Re: Linux 3.1-rc9
  2011-10-31 17:32                                         ` Simon Kirby
@ 2011-11-02 16:40                                           ` Thomas Gleixner
  2011-11-02 17:27                                             ` Eric Dumazet
  2011-11-02 18:28                                             ` Simon Kirby
  2011-11-02 22:10                                           ` Steven Rostedt
  1 sibling, 2 replies; 156+ messages in thread
From: Thomas Gleixner @ 2011-11-02 16:40 UTC (permalink / raw)
  To: Simon Kirby
  Cc: David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Mon, 31 Oct 2011, Simon Kirby wrote:
> On Tue, Oct 25, 2011 at 01:20:49PM -0700, Simon Kirby wrote:
> 
> > On Mon, Oct 24, 2011 at 12:02:03PM -0700, Simon Kirby wrote:
> > 
> > > Ok, hit the hang about 4 more times, but only this morning on a box with
> > > a serial cable attached. Yay!
> > 
> > Here's lockdep output from another box. This one looks a bit different.
> 
> One more, again a bit different. The last few lockups have looked like
> this. Not sure why, but we're hitting this at a few a day now. Thomas,
> this is without your patch, but as you said, that's right before a free
> and should print a separate lockdep warning.
> 
> No "huh" lines until after the trace on this one. I'll move to 3.1 with

That means that the lockdep warning hit in the same net_rx cycle
before the leak was detected by the softirq code.

> cherry-picked b0691c8e now.

Can you please add the debug patch below and try the following:

Enable CONFIG_FUNCTION_TRACER & CONFIG_FUNCTION_GRAPH_TRACER

# cd $DEBUGFSMOUNTPOINT/tracing
# echo sk_clone >set_ftrace_filter
# echo function >current_tracer
# echo 1 >options/func_stack_trace

Now wait until it reproduces (which stops the trace) and read out

# cat trace >/tmp/trace.txt

Please provide the trace file along with the lockdep splat. That
should tell us which callchain is responsible for the spinlock
leakage.

Thanks,

	tglx

--------------->
 kernel/softirq.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6/kernel/softirq.c
===================================================================
--- linux-2.6.orig/kernel/softirq.c
+++ linux-2.6/kernel/softirq.c
@@ -238,6 +238,7 @@ restart:
 			h->action(h);
 			trace_softirq_exit(vec_nr);
 			if (unlikely(prev_count != preempt_count())) {
+				tracing_off();
 				printk(KERN_ERR "huh, entered softirq %u %s %p"
 				       "with preempt_count %08x,"
 				       " exited with %08x?\n", vec_nr,

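The "huh, entered softirq" lines in the logs come from a check of the kind this patch hooks into: the preempt count is sampled before the handler runs and compared afterwards. What follows is only a minimal userspace sketch of that idea, not the kernel code. The names `fake_preempt_count`, `run_softirq`, `balanced_handler` and `leaky_handler` are hypothetical, and a held spinlock is modelled as a plain increment of the counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-CPU preempt count (not the kernel's). */
unsigned int fake_preempt_count = 0x101;

typedef void (*softirq_fn)(void);

/* Takes and releases one "lock": the count is balanced on return. */
void balanced_handler(void)
{
	fake_preempt_count++;
	fake_preempt_count--;
}

/* Takes a "lock" and leaves through a path that forgets to drop it. */
void leaky_handler(void)
{
	fake_preempt_count++;
}

/* Runs one handler; returns 1 if the count leaked, 0 otherwise. */
int run_softirq(softirq_fn action, const char *name)
{
	unsigned int prev_count;

	assert(action != NULL);
	prev_count = fake_preempt_count;
	action();
	if (prev_count != fake_preempt_count) {
		printf("huh, entered softirq %s with preempt_count %08x, exited with %08x?\n",
		       name, prev_count, fake_preempt_count);
		/* Reset so the sketch stays usable after a leak. */
		fake_preempt_count = prev_count;
		return 1;
	}
	return 0;
}
```

Calling `run_softirq(leaky_handler, "NET_RX")` from a small `main()` prints a line shaped like the ones in the logs, with the count going from 00000101 to 00000102. An off-by-one of that kind is what one would expect from a single spinlock that was taken and never released.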
* Re: Linux 3.1-rc9
  2011-11-02 16:40                                           ` Thomas Gleixner
@ 2011-11-02 17:27                                             ` Eric Dumazet
  2011-11-02 17:46                                               ` Linus Torvalds
                                                                 ` (2 more replies)
  2011-11-02 18:28                                             ` Simon Kirby
  1 sibling, 3 replies; 156+ messages in thread
From: Eric Dumazet @ 2011-11-02 17:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Simon Kirby, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wednesday, 02 November 2011 at 17:40 +0100, Thomas Gleixner wrote:
> On Mon, 31 Oct 2011, Simon Kirby wrote:
> > On Tue, Oct 25, 2011 at 01:20:49PM -0700, Simon Kirby wrote:
> > 
> > > On Mon, Oct 24, 2011 at 12:02:03PM -0700, Simon Kirby wrote:
> > > 
> > > > Ok, hit the hang about 4 more times, but only this morning on a box with
> > > > a serial cable attached. Yay!
> > > 
> > > Here's lockdep output from another box. This one looks a bit different.
> > 
> > One more, again a bit different. The last few lockups have looked like
> > this. Not sure why, but we're hitting this at a few a day now. Thomas,
> > this is without your patch, but as you said, that's right before a free
> > and should print a separate lockdep warning.
> > 
> > No "huh" lines until after the trace on this one. I'll move to 3.1 with
> 
> That means that the lockdep warning hit in the same net_rx cycle
> before the leak was detected by the softirq code.
> 
> > cherry-picked b0691c8e now.
> 
> Can you please add the debug patch below and try the following:
> 
> Enable CONFIG_FUNCTION_TRACER & CONFIG_FUNCTION_GRAPH_TRACER
> 
> # cd $DEBUGFSMOUNTPOINT/tracing
> # echo sk_clone >set_ftrace_filter
> # echo function >current_tracer
> # echo 1 >options/func_stack_trace
> 
> Now wait until it reproduces (which stops the trace) and read out
> 
> # cat trace >/tmp/trace.txt
> 
> Please provide the trace file along with the lockdep splat. That
> should tell us which callchain is responsible for the spinlock
> leakage.
> 
> Thanks,
> 
> 	tglx
> 
> --------------->
>  kernel/softirq.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> Index: linux-2.6/kernel/softirq.c
> ===================================================================
> --- linux-2.6.orig/kernel/softirq.c
> +++ linux-2.6/kernel/softirq.c
> @@ -238,6 +238,7 @@ restart:
>  			h->action(h);
>  			trace_softirq_exit(vec_nr);
>  			if (unlikely(prev_count != preempt_count())) {
> +				tracing_off();
>  				printk(KERN_ERR "huh, entered softirq %u %s %p"
>  				       "with preempt_count %08x,"
>  				       " exited with %08x?\n", vec_nr,


I believe it might come from commit 0e734419
(ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)

If inet_csk_route_child_sock() returns NULL, we don't release the socket
lock.




* Re: Linux 3.1-rc9
  2011-11-02 17:27                                             ` Eric Dumazet
@ 2011-11-02 17:46                                               ` Linus Torvalds
  2011-11-02 17:53                                                 ` Eric Dumazet
  2011-11-02 17:49                                               ` Eric Dumazet
  2011-11-02 17:54                                               ` Thomas Gleixner
  2 siblings, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2011-11-02 17:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, Simon Kirby, David Miller, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 2, 2011 at 10:27 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> I believe it might come from commit 0e734419
> (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
>
> If inet_csk_route_child_sock() returns NULL, we don't release the socket
> lock.

Hmm. I'm not seeing it. We're not even taking the socket lock there.
Or is it hidden somehow?

                    Linus

* Re: Linux 3.1-rc9
  2011-11-02 17:27                                             ` Eric Dumazet
  2011-11-02 17:46                                               ` Linus Torvalds
@ 2011-11-02 17:49                                               ` Eric Dumazet
  2011-11-02 17:58                                                 ` Eric Dumazet
  2011-11-02 17:54                                               ` Thomas Gleixner
  2 siblings, 1 reply; 156+ messages in thread
From: Eric Dumazet @ 2011-11-02 17:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Simon Kirby, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wednesday, 02 November 2011 at 18:27 +0100, Eric Dumazet wrote:

> I believe it might come from commit 0e734419
> (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> 
> If inet_csk_route_child_sock() returns NULL, we don't release the socket
> lock.
> 
> 

Yes, that's the problem. I am testing the following patch:

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 0ea10ee..683d97a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1510,6 +1510,7 @@ exit:
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
 	return NULL;
 put_and_exit:
+	bh_unlock_sock(newsk);
 	sock_put(newsk);
 	goto exit;
 }
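
The shape of the bug and of the one-line fix can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel code: `struct fake_sock`, `fake_create_child` and `fake_syn_recv` are hypothetical names, the lock is a plain flag, and the `route_ok` and `apply_fix` parameters exist only to drive the two paths. The one thing taken from the thread is that the child socket comes back locked, so every exit that drops it has to unlock it first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket: a lock flag and a reference count. */
struct fake_sock {
	int locked;
	int refcnt;
};

/* Mimics a clone helper that hands the child back already locked. */
struct fake_sock *fake_create_child(struct fake_sock *storage)
{
	assert(storage != NULL);
	storage->locked = 1;
	storage->refcnt = 1;
	return storage;
}

/*
 * route_ok == 0 simulates the route lookup returning NULL.
 * apply_fix selects whether the error path unlocks before the put.
 */
struct fake_sock *fake_syn_recv(struct fake_sock *storage, int route_ok,
				int apply_fix)
{
	struct fake_sock *newsk = fake_create_child(storage);

	if (!route_ok)
		goto put_and_exit;
	return newsk;	/* success: in this sketch the caller unlocks later */

put_and_exit:
	if (apply_fix)
		newsk->locked = 0;	/* plays the role of bh_unlock_sock(newsk) */
	newsk->refcnt--;		/* plays the role of sock_put(newsk) */
	return NULL;
}
```

With `apply_fix` set to 0, a failed lookup returns NULL while the lock flag is still set, which is the leak described above; with it set to 1, the flag is cleared before the reference is dropped.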



* Re: Linux 3.1-rc9
  2011-11-02 17:46                                               ` Linus Torvalds
@ 2011-11-02 17:53                                                 ` Eric Dumazet
  2011-11-02 18:00                                                   ` Linus Torvalds
  0 siblings, 1 reply; 156+ messages in thread
From: Eric Dumazet @ 2011-11-02 17:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Simon Kirby, David Miller, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wednesday, 2 November 2011 at 10:46 -0700, Linus Torvalds wrote:
> On Wed, Nov 2, 2011 at 10:27 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > I believe it might come from commit 0e734419
> > (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> >
> > In case inet_csk_route_child_sock() returns NULL, we dont release socket
> > lock.
> 
> Hmm. I'm not seeing it. We're not even taking the socket lock there.
> Or is it hidden somehow?
> 
>                     Linus

tcp_v4_syn_recv_sock()
{
	newsk = tcp_create_openreq_child(sk, req, skb);
	...
}

newsk is locked at this point.
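
A rough user-space model of that error path, for illustration only. Everything here (struct fake_sock, held_locks, the fake_*() helpers, the with_fix switch) is made up for the sketch and is not kernel API; only the shape of the put_and_exit path mirrors tcp_v4_syn_recv_sock():

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sock: just a lock flag and a refcount. */
struct fake_sock {
	int locked;
	int refcnt;
};

static int held_locks;	/* models the lock count that preempt_count tracks */

static void fake_lock(struct fake_sock *sk)
{
	sk->locked = 1;
	held_locks++;
}

static void fake_unlock(struct fake_sock *sk)
{
	sk->locked = 0;
	held_locks--;
}

static void fake_sock_put(struct fake_sock *sk)
{
	if (--sk->refcnt == 0)
		free(sk);
}

/* Models tcp_create_openreq_child(): the child is returned locked. */
static struct fake_sock *create_child_locked(void)
{
	struct fake_sock *newsk = calloc(1, sizeof(*newsk));

	if (!newsk)
		return NULL;
	newsk->refcnt = 1;
	fake_lock(newsk);
	return newsk;
}

/*
 * route_ok == 0 models inet_csk_route_child_sock() returning NULL.
 * with_fix selects whether the error path unlocks before the put.
 */
static struct fake_sock *syn_recv_sock(int route_ok, int with_fix)
{
	struct fake_sock *newsk = create_child_locked();

	if (!newsk)
		goto exit;
	if (!route_ok)
		goto put_and_exit;
	return newsk;		/* success: the caller unlocks later */

put_and_exit:
	if (with_fix)
		fake_unlock(newsk);
	fake_sock_put(newsk);
exit:
	return NULL;
}
```

With with_fix == 0 and a failed route lookup, held_locks stays at 1 after the function returns, which is the same kind of off-by-one that the softirq "huh" message reports in preempt_count. With with_fix == 1 the count drops back to 0.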




^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:27                                             ` Eric Dumazet
  2011-11-02 17:46                                               ` Linus Torvalds
  2011-11-02 17:49                                               ` Eric Dumazet
@ 2011-11-02 17:54                                               ` Thomas Gleixner
  2011-11-02 18:04                                                 ` Eric Dumazet
  2 siblings, 1 reply; 156+ messages in thread
From: Thomas Gleixner @ 2011-11-02 17:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Simon Kirby, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2793 bytes --]

On Wed, 2 Nov 2011, Eric Dumazet wrote:

> On Wednesday, 2 November 2011 at 17:40 +0100, Thomas Gleixner wrote:
> > On Mon, 31 Oct 2011, Simon Kirby wrote:
> > > On Tue, Oct 25, 2011 at 01:20:49PM -0700, Simon Kirby wrote:
> > > 
> > > > On Mon, Oct 24, 2011 at 12:02:03PM -0700, Simon Kirby wrote:
> > > > 
> > > > > Ok, hit the hang about 4 more times, but only this morning on a box with
> > > > > a serial cable attached. Yay!
> > > > 
> > > > Here's lockdep output from another box. This one looks a bit different.
> > > 
> > > One more, again a bit different. The last few lockups have looked like
> > > this. Not sure why, but we're hitting this at a few a day now. Thomas,
> > > this is without your patch, but as you said, that's right before a free
> > > and should print a separate lockdep warning.
> > > 
> > > No "huh" lines until after the trace on this one. I'll move to 3.1 with
> > 
> > That means that the lockdep warning hit in the same net_rx cycle
> > before the leak was detected by the softirq code.
> > 
> > > cherry-picked b0691c8e now.
> > 
> > Can you please add the debug patch below and try the following:
> > 
> > Enable CONFIG_FUNCTION_TRACER & CONFIG_FUNCTION_GRAPH_TRACER
> > 
> > # cd $DEBUGFSMOUNTPOINT/tracing
> > # echo sk_clone >set_ftrace_filter
> > # echo function >current_tracer
> > # echo 1 >options/func_stack_trace
> > 
> > Now wait until it reproduces (which stops the trace) and read out
> > 
> > # cat trace >/tmp/trace.txt
> > 
> > Please provide the trace file along with the lockdep splat. That
> > should tell us which callchain is responsible for the spinlock
> > leakage.
> > 
> > Thanks,
> > 
> > 	tglx
> > 
> > --------------->
> >  kernel/softirq.c |    1 +
> >  1 file changed, 1 insertion(+)
> > 
> > Index: linux-2.6/kernel/softirq.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/softirq.c
> > +++ linux-2.6/kernel/softirq.c
> > @@ -238,6 +238,7 @@ restart:
> >  			h->action(h);
> >  			trace_softirq_exit(vec_nr);
> >  			if (unlikely(prev_count != preempt_count())) {
> > +				tracing_off();
> >  				printk(KERN_ERR "huh, entered softirq %u %s %p"
> >  				       "with preempt_count %08x,"
> >  				       " exited with %08x?\n", vec_nr,
> 
> 
> I believe it might come from commit 0e734419
> (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> 
> In case inet_csk_route_child_sock() returns NULL, we dont release socket
> lock.

The same applies to the if (__inet_inherit_port(sk, newsk) < 0) check a
few lines further down, but that path was already leaking the lock
before that commit.

Just for the record, the locking in that code is mind boggling. It
took me some detective work to find even the place where the success
code path unlocks the lock :(

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:49                                               ` Eric Dumazet
@ 2011-11-02 17:58                                                 ` Eric Dumazet
  2011-11-02 19:16                                                   ` Simon Kirby
  0 siblings, 1 reply; 156+ messages in thread
From: Eric Dumazet @ 2011-11-02 17:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Simon Kirby, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wednesday, 2 November 2011 at 18:49 +0100, Eric Dumazet wrote:
> On Wednesday, 2 November 2011 at 18:27 +0100, Eric Dumazet wrote:
> 
> > I believe it might come from commit 0e734419
> > (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> > 
> > In case inet_csk_route_child_sock() returns NULL, we dont release socket
> > lock.
> > 
> > 
> 
> Yes, thats the problem. I am testing following patch :
> 
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 0ea10ee..683d97a 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1510,6 +1510,7 @@ exit:
>  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
>  	return NULL;
>  put_and_exit:
> +	bh_unlock_sock(newsk);
>  	sock_put(newsk);
>  	goto exit;
>  }
> 


This does solve the problem, but closer inspection is needed to close
all such bugs, not only this one.

# netstat -s
Ip:
    6961157 total packets received
    0 forwarded
    0 incoming packets discarded
    6961157 incoming packets delivered
    6961049 requests sent out
    2 dropped because of missing route    //// HERE, this is the origin




^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:53                                                 ` Eric Dumazet
@ 2011-11-02 18:00                                                   ` Linus Torvalds
  2011-11-02 18:05                                                     ` Eric Dumazet
  0 siblings, 1 reply; 156+ messages in thread
From: Linus Torvalds @ 2011-11-02 18:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, Simon Kirby, David Miller, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 2, 2011 at 10:53 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> tcp_v4_syn_recv_sock()
> {
>        newsk = tcp_create_openreq_child(sk, req, skb);
>
> newsk is locked at this point.

Umm, if that is the case, then the bug predates the commit you point
to. There were exit paths before that too.

                   Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:54                                               ` Thomas Gleixner
@ 2011-11-02 18:04                                                 ` Eric Dumazet
  0 siblings, 0 replies; 156+ messages in thread
From: Eric Dumazet @ 2011-11-02 18:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Simon Kirby, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wednesday, 2 November 2011 at 18:54 +0100, Thomas Gleixner wrote:

> The same applies for if (__inet_inherit_port(sk, newsk) < 0) a few
> lines further down, but that part was leaking the lock before that
> commit already.
> 

Yes, but under normal conditions this never happened, which is why the
problem was never noticed.

tproxy is probably very seldom used, and when it is used the error path
is probably never reached...



^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 18:00                                                   ` Linus Torvalds
@ 2011-11-02 18:05                                                     ` Eric Dumazet
  2011-11-02 18:10                                                       ` Linus Torvalds
  0 siblings, 1 reply; 156+ messages in thread
From: Eric Dumazet @ 2011-11-02 18:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Simon Kirby, David Miller, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wednesday, 2 November 2011 at 11:00 -0700, Linus Torvalds wrote:
> On Wed, Nov 2, 2011 at 10:53 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > tcp_v4_syn_recv_sock()
> > {
> >        newsk = tcp_create_openreq_child(sk, req, skb);
> >
> > newsk is locked at this point.
> 
> Umm, if that is the case, then the bug predates the commit you point
> to. There were exit paths before that too.
> 

Yes, but only when tproxy is used, and in some obscure error
conditions... Probably nobody ever hit them or complained.




^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 18:05                                                     ` Eric Dumazet
@ 2011-11-02 18:10                                                       ` Linus Torvalds
  0 siblings, 0 replies; 156+ messages in thread
From: Linus Torvalds @ 2011-11-02 18:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, Simon Kirby, David Miller, Peter Zijlstra,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 2, 2011 at 11:05 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Yes, but only when tproxy is used, and in some obscure error
> conditions... Probably nobody ever hit them or complained.

Yes, I'm not disputing that. However, it does show how incredibly
fragile that code is.

May I suggest renaming those "clone_sk()" kinds of functions
"clone_sk_lock()" or something? So that you *see* that it's locked as
it is cloned. That might have made the bug not happen in the first
place..

Of course, maybe it's obvious to most net people - just not me looking
at the code - that the new socket ended up being locked at allocation.
But considering the bug happened twice, that "obvious" part is clearly
debatable..
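
To make the naming idea concrete, here is a small user-space sketch. All names in it (struct obj, obj_clone_locked(), obj_unlock_free(), lock_depth, run_checked()) are invented for illustration; run_checked() only loosely imitates the before/after preempt_count comparison in the softirq code:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static int lock_depth;	/* hypothetical stand-in for preempt_count() */

struct obj {
	int locked;
};

/* The "_locked" suffix advertises that the caller now owns the lock. */
static struct obj *obj_clone_locked(void)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (!o)
		return NULL;
	o->locked = 1;
	lock_depth++;
	return o;
}

/* Matching teardown: drop the lock, then free. */
static void obj_unlock_free(struct obj *o)
{
	o->locked = 0;
	lock_depth--;
	free(o);
}

/* Run a handler and complain if it leaked a lock. Returns 1 if balanced. */
static int run_checked(void (*handler)(void))
{
	int prev = lock_depth;

	handler();
	if (prev != lock_depth) {
		fprintf(stderr, "huh, entered with depth %d, exited with %d?\n",
			prev, lock_depth);
		lock_depth = prev;	/* reset so later runs are judged alone */
		return 0;
	}
	return 1;
}

static void leaky_handler(void)
{
	struct obj *o = obj_clone_locked();

	free(o);	/* forgets that the clone came back locked */
}

static void balanced_handler(void)
{
	struct obj *o = obj_clone_locked();

	if (o)
		obj_unlock_free(o);
}
```

Calling run_checked(leaky_handler) reports the imbalance, while run_checked(balanced_handler) passes. A reader of leaky_handler() at least sees "_locked" in the call and has a chance to notice the missing unlock.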

                          Linus

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 16:40                                           ` Thomas Gleixner
  2011-11-02 17:27                                             ` Eric Dumazet
@ 2011-11-02 18:28                                             ` Simon Kirby
  2011-11-02 18:30                                               ` Thomas Gleixner
  1 sibling, 1 reply; 156+ messages in thread
From: Simon Kirby @ 2011-11-02 18:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 02, 2011 at 05:40:53PM +0100, Thomas Gleixner wrote:

> On Mon, 31 Oct 2011, Simon Kirby wrote:
> 
> > One more, again a bit different. The last few lockups have looked like
> > this. Not sure why, but we're hitting this at a few a day now. Thomas,
> > this is without your patch, but as you said, that's right before a free
> > and should print a separate lockdep warning.
> > 
> > No "huh" lines until after the trace on this one. I'll move to 3.1 with
> 
> That means that the lockdep warning hit in the same net_rx cycle
> before the leak was detected by the softirq code.
> 
> > cherry-picked b0691c8e now.
> 
> Can you please add the debug patch below and try the following:
> 
> Enable CONFIG_FUNCTION_TRACER & CONFIG_FUNCTION_GRAPH_TRACER
> 
> # cd $DEBUGFSMOUNTPOINT/tracing
> # echo sk_clone >set_ftrace_filter
> # echo function >current_tracer
> # echo 1 >options/func_stack_trace
> 
> Now wait until it reproduces (which stops the trace) and read out
> 
> # cat trace >/tmp/trace.txt
> 
> Please provide the trace file along with the lockdep splat. That
> should tell us which callchain is responsible for the spinlock
> leakage.
> Thanks,
> 
> 	tglx
> 
> --------------->
>  kernel/softirq.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> Index: linux-2.6/kernel/softirq.c
> ===================================================================
> --- linux-2.6.orig/kernel/softirq.c
> +++ linux-2.6/kernel/softirq.c
> @@ -238,6 +238,7 @@ restart:
>  			h->action(h);
>  			trace_softirq_exit(vec_nr);
>  			if (unlikely(prev_count != preempt_count())) {
> +				tracing_off();
>  				printk(KERN_ERR "huh, entered softirq %u %s %p"
>  				       "with preempt_count %08x,"
>  				       " exited with %08x?\n", vec_nr,

Ok, I'll try this. Hmm, all CPUs typically try to grab the lock fairly
quickly after it happens, which could make it difficult to cat the file.
I'll try ftrace_dump(DUMP_ALL); in there instead.

Simon-

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 18:28                                             ` Simon Kirby
@ 2011-11-02 18:30                                               ` Thomas Gleixner
  0 siblings, 0 replies; 156+ messages in thread
From: Thomas Gleixner @ 2011-11-02 18:30 UTC (permalink / raw)
  To: Simon Kirby
  Cc: David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, 2 Nov 2011, Simon Kirby wrote:
> On Wed, Nov 02, 2011 at 05:40:53PM +0100, Thomas Gleixner wrote:
> Ok, I'll try this. Hmm, all CPUs typically try to grab the lock fairly
> quickly after it happens, which could make it difficult to cat the file.
> I'll try ftrace_dump(DUMP_ALL); in there instead.

Eric has already spotted the source of the trouble. Can you try his patch
first? If the problem persists, we can still resort to hardcore tracing :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 17:58                                                 ` Eric Dumazet
@ 2011-11-02 19:16                                                   ` Simon Kirby
  2011-11-02 22:42                                                     ` Eric Dumazet
  0 siblings, 1 reply; 156+ messages in thread
From: Simon Kirby @ 2011-11-02 19:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 02, 2011 at 06:58:21PM +0100, Eric Dumazet wrote:

> On Wednesday, 2 November 2011 at 18:49 +0100, Eric Dumazet wrote:
> > On Wednesday, 2 November 2011 at 18:27 +0100, Eric Dumazet wrote:
> > 
> > > I believe it might come from commit 0e734419
> > > (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> > > 
> > > In case inet_csk_route_child_sock() returns NULL, we dont release socket
> > > lock.
> > > 
> > > 
> > 
> > Yes, thats the problem. I am testing following patch :
> > 
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index 0ea10ee..683d97a 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -1510,6 +1510,7 @@ exit:
> >  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
> >  	return NULL;
> >  put_and_exit:
> > +	bh_unlock_sock(newsk);
> >  	sock_put(newsk);
> >  	goto exit;
> >  }
> > 
> 
> 
> This indeed solves the problem, but more closer inspection is needed to
> close all bugs, not this only one.
> 
> # netstat -s
> Ip:
>     6961157 total packets received
>     0 forwarded
>     0 incoming packets discarded
>     6961157 incoming packets delivered
>     6961049 requests sent out
>     2 dropped because of missing route    //// HERE, this is the origin

Actually, we have an anti-abuse daemon that injects blackhole routes, so
this makes sense. (The daemon was written before ipsets were merged and
normal netfilter rules make it fall over under attack.)

I'll try with this patch. Thanks!

Simon-

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-10-31 17:32                                         ` Simon Kirby
  2011-11-02 16:40                                           ` Thomas Gleixner
@ 2011-11-02 22:10                                           ` Steven Rostedt
  2011-11-02 23:00                                             ` Steven Rostedt
  1 sibling, 1 reply; 156+ messages in thread
From: Steven Rostedt @ 2011-11-02 22:10 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

Thomas pointed me here.

On Mon, Oct 31, 2011 at 10:32:46AM -0700, Simon Kirby wrote:
> [104661.244767] 
> [104661.244767]  Possible unsafe locking scenario:
> [104661.244767]        
> [104661.244767]        CPU0                    CPU1
> [104661.244767]        ----                    ----
> [104661.244767]   lock(slock-AF_INET);
> [104661.244767]                                lock(slock-AF_INET);
> [104661.244767]                                lock(slock-AF_INET);
> [104661.244767]   lock(slock-AF_INET);
> [104661.244767] 
> [104661.244767]  *** DEADLOCK ***
> [104661.244767] 

Bah, I used the __print_lock_name() function to show the lock names in
the above, which leaves off the subclass number. I'll go write up a
patch that fixes that.
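
For reference, a user-space sketch of the fuller naming scheme, assuming the "name#version/subclass" layout that print_lock_name() uses. struct fake_lock_class and format_lock_name() are hypothetical names for this illustration and do not exist in kernel/lockdep.c:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct lock_class. */
struct fake_lock_class {
	const char *name;
	int name_version;
	int subclass;
};

/*
 * Write "name", then "#version" if version > 1, then "/subclass" if
 * subclass != 0. Returns what snprintf() returns for the final string.
 */
static int format_lock_name(const struct fake_lock_class *class,
			    char *buf, size_t len)
{
	char ver[16] = "";
	char sub[16] = "";

	if (class->name_version > 1)
		snprintf(ver, sizeof(ver), "#%d", class->name_version);
	if (class->subclass)
		snprintf(sub, sizeof(sub), "/%d", class->subclass);

	return snprintf(buf, len, "%s%s%s", class->name, ver, sub);
}
```

With this layout, subclass 0 and subclass 1 of the same class no longer print identically: the former stays "slock-AF_INET" and the latter becomes "slock-AF_INET/1", which is what makes a nested-lock scenario readable.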

Thanks,

-- Steve


^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 19:16                                                   ` Simon Kirby
@ 2011-11-02 22:42                                                     ` Eric Dumazet
  2011-11-03  0:24                                                       ` Thomas Gleixner
                                                                         ` (2 more replies)
  0 siblings, 3 replies; 156+ messages in thread
From: Eric Dumazet @ 2011-11-02 22:42 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development, Balazs Scheidler,
	KOVACS Krisztian

On 02/11/2011 20:16, Simon Kirby wrote:

 
> Actually, we have an anti-abuse daemon that injects blackhole routes, so
> this makes sense. (The daemon was written before ipsets were merged and
> normal netfilter rules make it fall over under attack.)
> 
> I'll try with this patch. Thanks!
> 


Thanks !

Here is the official submission. Please add your 'Tested-by' tag once
you can confirm the problem goes away.

(It did here when I injected random NULL returns from
inet_csk_route_child_sock(), so I am confident this is the problem you hit.)

[PATCH] net: add missing bh_unlock_sock() calls

Simon Kirby reported lockdep warnings and the following messages:

[104661.897577] huh, entered softirq 3 NET_RX ffffffff81613740
preempt_count 00000101, exited with 00000102?

[104661.923653] huh, entered softirq 3 NET_RX ffffffff81613740
preempt_count 00000101, exited with 00000102?

Problem comes from commit 0e734419
(ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)

If inet_csk_route_child_sock() returns NULL, we should release the socket
lock before freeing the socket.

Another lock imbalance exists if __inet_inherit_port() returns an error,
since commit 093d282321da (tproxy: fix hash locking issue when using
port redirection in __inet_inherit_port()). A backport is also needed
for kernels >= 2.6.37.

Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Balazs Scheidler <bazsi@balabit.hu>
CC: KOVACS Krisztian <hidden@balabit.hu>
---
 net/dccp/ipv4.c     |    1 +
 net/ipv4/tcp_ipv4.c |    1 +
 2 files changed, 2 insertions(+)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 332639b..90a919a 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -433,6 +433,7 @@ exit:
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
 	return NULL;
 put_and_exit:
+	bh_unlock_sock(newsk);
 	sock_put(newsk);
 	goto exit;
 }
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 0ea10ee..683d97a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1510,6 +1510,7 @@ exit:
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
 	return NULL;
 put_and_exit:
+	bh_unlock_sock(newsk);
 	sock_put(newsk);
 	goto exit;
 }

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 22:10                                           ` Steven Rostedt
@ 2011-11-02 23:00                                             ` Steven Rostedt
  2011-11-03  0:09                                               ` Simon Kirby
  0 siblings, 1 reply; 156+ messages in thread
From: Steven Rostedt @ 2011-11-02 23:00 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 02, 2011 at 06:10:23PM -0400, Steven Rostedt wrote:
> Thomas pointed me here.
> 
> On Mon, Oct 31, 2011 at 10:32:46AM -0700, Simon Kirby wrote:
> > [104661.244767] 
> > [104661.244767]  Possible unsafe locking scenario:
> > [104661.244767]        
> > [104661.244767]        CPU0                    CPU1
> > [104661.244767]        ----                    ----
> > [104661.244767]   lock(slock-AF_INET);
> > [104661.244767]                                lock(slock-AF_INET);
> > [104661.244767]                                lock(slock-AF_INET);
> > [104661.244767]   lock(slock-AF_INET);
> > [104661.244767] 
> > [104661.244767]  *** DEADLOCK ***
> > [104661.244767] 
> 
> Bah, I used the __print_lock_name() function to show the lock names in
> the above, which leaves off the subclass number. I'll go write up a
> patch that fixes that.
> 

Simon,

If you are still triggering the bug, could you do me a favor and apply
the following patch? I just want to make sure it fixes the confusing
output from above.

Thanks,

-- Steve


diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 91d67ce..d821ac9 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -490,16 +490,22 @@ void get_usage_chars(struct lock_class *class, char usage[LOCK_USAGE_CHARS])
 	usage[i] = '\0';
 }
 
-static int __print_lock_name(struct lock_class *class)
+static void __print_lock_name(struct lock_class *class)
 {
 	char str[KSYM_NAME_LEN];
 	const char *name;
 
 	name = class->name;
-	if (!name)
+	if (!name) {
 		name = __get_key_name(class->key, str);
-
-	return printk("%s", name);
+		printk("%s", name);
+	} else {
+		printk("%s", name);
+		if (class->name_version > 1)
+			printk("#%d", class->name_version);
+		if (class->subclass)
+			printk("/%d", class->subclass);
+	}
 }
 
 static void print_lock_name(struct lock_class *class)
@@ -509,17 +515,8 @@ static void print_lock_name(struct lock_class *class)
 
 	get_usage_chars(class, usage);
 
-	name = class->name;
-	if (!name) {
-		name = __get_key_name(class->key, str);
-		printk(" (%s", name);
-	} else {
-		printk(" (%s", name);
-		if (class->name_version > 1)
-			printk("#%d", class->name_version);
-		if (class->subclass)
-			printk("/%d", class->subclass);
-	}
+	printk(" (");
+	__print_lock_name(class);
 	printk("){%s}", usage);
 }
 

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* Re: Linux 3.1-rc9
  2011-11-02 23:00                                             ` Steven Rostedt
@ 2011-11-03  0:09                                               ` Simon Kirby
  2011-11-03  0:15                                                 ` Steven Rostedt
  0 siblings, 1 reply; 156+ messages in thread
From: Simon Kirby @ 2011-11-03  0:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 02, 2011 at 07:00:10PM -0400, Steven Rostedt wrote:

> On Wed, Nov 02, 2011 at 06:10:23PM -0400, Steven Rostedt wrote:
> > Thomas pointed me here.
> > 
> > On Mon, Oct 31, 2011 at 10:32:46AM -0700, Simon Kirby wrote:
> > > [104661.244767] 
> > > [104661.244767]  Possible unsafe locking scenario:
> > > [104661.244767]        
> > > [104661.244767]        CPU0                    CPU1
> > > [104661.244767]        ----                    ----
> > > [104661.244767]   lock(slock-AF_INET);
> > > [104661.244767]                                lock(slock-AF_INET);
> > > [104661.244767]                                lock(slock-AF_INET);
> > > [104661.244767]   lock(slock-AF_INET);
> > > [104661.244767] 
> > > [104661.244767]  *** DEADLOCK ***
> > > [104661.244767] 
> > 
> > Bah, I used the __print_lock_name() function to show the lock names in
> > the above, which leaves off the subclass number. I'll go write up a
> > patch that fixes that.
> > 
> 
> Simon,
> 
> If you are still triggering the bug. Could you do me a favor and apply
> the following patch. Just to make sure it fixes the confusing output
> from above.
> 
> Thanks,
> 
> -- Steve
> 
> 
> diff --git a/kernel/lockdep.c b/kernel/lockdep.c
> index 91d67ce..d821ac9 100644
> --- a/kernel/lockdep.c
> +++ b/kernel/lockdep.c
> @@ -490,16 +490,22 @@ void get_usage_chars(struct lock_class *class, char usage[LOCK_USAGE_CHARS])
>  	usage[i] = '\0';
>  }
>  
> -static int __print_lock_name(struct lock_class *class)
> +static void __print_lock_name(struct lock_class *class)
>  {
>  	char str[KSYM_NAME_LEN];
>  	const char *name;
>  
>  	name = class->name;
> -	if (!name)
> +	if (!name) {
>  		name = __get_key_name(class->key, str);
> -
> -	return printk("%s", name);
> +		printk("%s", name);
> +	} else {
> +		printk("%s", name);
> +		if (class->name_version > 1)
> +			printk("#%d", class->name_version);
> +		if (class->subclass)
> +			printk("/%d", class->subclass);
> +	}
>  }
>  
>  static void print_lock_name(struct lock_class *class)
> @@ -509,17 +515,8 @@ static void print_lock_name(struct lock_class *class)
>  
>  	get_usage_chars(class, usage);
>  
> -	name = class->name;
> -	if (!name) {
> -		name = __get_key_name(class->key, str);
> -		printk(" (%s", name);
> -	} else {
> -		printk(" (%s", name);
> -		if (class->name_version > 1)
> -			printk("#%d", class->name_version);
> -		if (class->subclass)
> -			printk("/%d", class->subclass);
> -	}
> +	printk(" (");
> +	__print_lock_name(class);
>  	printk("){%s}", usage);
>  }

Hello!

I am now able to reproduce on demand by just starting an "ab" from
another box and "ip route add blackhole <other machine>" on the target
box while the ab is running. The first time I tried this without your
patch, and got the trace I had before. With your patch, I got this:

[  366.198866] huh, entered softirq 3 NET_RX ffffffff81616560 preempt_count 00000102, exited with 00000103?
[  366.198981] 
[  366.198982] =================================
[  366.199118] [ INFO: inconsistent lock state ]
[  366.199189] 3.1.0-hw-lockdep+ #58
[  366.199259] ---------------------------------
[  366.199331] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[  366.199407] kworker/0:1/0 [HC0[0]:SC0[0]:HE1:SE1] takes:
[  366.199480]  (slock-AF_INET){+.?...}, at: [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[  366.199773] {IN-SOFTIRQ-W} state was registered at:
[  366.199846]   [<ffffffff81098b7c>] __lock_acquire+0xcbc/0x2180
[  366.199973]   [<ffffffff8109a149>] lock_acquire+0x109/0x140
[  366.200096]   [<ffffffff816f842c>] _raw_spin_lock+0x3c/0x50
[  366.200220]   [<ffffffff8166eb6d>] udp_queue_rcv_skb+0x26d/0x4b0
[  366.200346]   [<ffffffff8166f4d3>] __udp4_lib_rcv+0x2f3/0x910
[  366.200470]   [<ffffffff8166fb05>] udp_rcv+0x15/0x20
[  366.200592]   [<ffffffff81644790>] ip_local_deliver_finish+0x100/0x2f0
[  366.200718]   [<ffffffff81644a0d>] ip_local_deliver+0x8d/0xa0
[  366.200841]   [<ffffffff81644033>] ip_rcv_finish+0x1a3/0x510
[  366.200965]   [<ffffffff81644612>] ip_rcv+0x272/0x2f0
[  366.201086]   [<ffffffff81613b87>] __netif_receive_skb+0x4d7/0x560
[  366.201211]   [<ffffffff81613ce0>] process_backlog+0xd0/0x1e0
[  366.201335]   [<ffffffff816166a0>] net_rx_action+0x140/0x2c0
[  366.201458]   [<ffffffff810640e8>] __do_softirq+0x138/0x250
[  366.201582]   [<ffffffff817030fc>] call_softirq+0x1c/0x30
[  366.201706]   [<ffffffff810153c5>] do_softirq+0x95/0xd0
[  366.202822]   [<ffffffff81063efd>] local_bh_enable+0xed/0x110
[  366.202822]   [<ffffffff81617a68>] dev_queue_xmit+0x1a8/0x8a0
[  366.202822]   [<ffffffff81621fca>] neigh_resolve_output+0x17a/0x220
[  366.202822]   [<ffffffff8164ab7c>] ip_finish_output+0x2ec/0x590
[  366.202822]   [<ffffffff8164aea8>] ip_output+0x88/0xe0
[  366.202822]   [<ffffffff81649b08>] ip_local_out+0x28/0x80
[  366.202822]   [<ffffffff81649b69>] ip_send_skb+0x9/0x40
[  366.202822]   [<ffffffff8166dce8>] udp_send_skb+0x108/0x370
[  366.202822]   [<ffffffff8167093c>] udp_sendmsg+0x7dc/0x920
[  366.202822]   [<ffffffff81678c4f>] inet_sendmsg+0xbf/0x120
[  366.202822]   [<ffffffff81602193>] sock_sendmsg+0xe3/0x110
[  366.202822]   [<ffffffff81602ab5>] sys_sendto+0x105/0x140
[  366.202822]   [<ffffffff81700e92>] system_call_fastpath+0x16/0x1b
[  366.202822] irq event stamp: 1175966
[  366.202822] hardirqs last  enabled at (1175964): [<ffffffff816f9174>] restore_args+0x0/0x30
[  366.202822] hardirqs last disabled at (1175965): [<ffffffff8106415d>] __do_softirq+0x1ad/0x250
[  366.202822] softirqs last  enabled at (1175966): [<ffffffff810641a6>] __do_softirq+0x1f6/0x250
[  366.202822] softirqs last disabled at (1175907): [<ffffffff817030fc>] call_softirq+0x1c/0x30
[  366.202822] 
[  366.202822] other info that might help us debug this:
[  366.202822]  Possible unsafe locking scenario:
[  366.202822] 
[  366.202822]        CPU0
[  366.202822]        ----
[  366.202822]   lock(slock-AF_INET);
[  366.202822]   <Interrupt>
[  366.202822]     lock(slock-AF_INET);
[  366.202822] 
[  366.202822]  *** DEADLOCK ***
[  366.202822] 
[  366.202822] 1 lock held by kworker/0:1/0:
[  366.202822]  #0:  (slock-AF_INET){+.?...}, at: [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[  366.202822] 
[  366.202822] stack backtrace:
[  366.202822] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-hw-lockdep+ #58
[  366.202822] Call Trace:
[  366.202822]  [<ffffffff81095f31>] print_usage_bug+0x241/0x310
[  366.202822]  [<ffffffff810964b4>] mark_lock+0x4b4/0x6c0
[  366.202822]  [<ffffffff81097300>] ? check_usage_forwards+0x110/0x110
[  366.202822]  [<ffffffff81096762>] mark_held_locks+0xa2/0x130
[  366.202822]  [<ffffffff816f9174>] ? retint_restore_args+0x13/0x13
[  366.202822]  [<ffffffff81096b0d>] trace_hardirqs_on_caller+0x13d/0x1c0
[  366.202822]  [<ffffffff813a60ee>] trace_hardirqs_on_thunk+0x3a/0x3f
[  366.202822]  [<ffffffff816f9174>] ? retint_restore_args+0x13/0x13
[  366.202822]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[  366.202822]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[  366.202822]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[  366.202822]  [<ffffffff816ef2eb>] start_secondary+0x1ca/0x1ff

...which of course is a different splat, so I ran it again:

[   49.028097] =======================================================
[   49.028244] [ INFO: possible circular locking dependency detected ]
[   49.028321] 3.1.0-hw-lockdep+ #58
[   49.028391] -------------------------------------------------------
[   49.028466] tcsh/2490 is trying to acquire lock:
[   49.028539]  (slock-AF_INET/1){+.-...}, at: [<ffffffff816676b7>] tcp_v4_rcv+0x867/0xc10
[   49.028882] 
[   49.028883] but task is already holding lock:
[   49.029018]  (slock-AF_INET){+.-...}, at: [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[   49.029310] 
[   49.029310] which lock already depends on the new lock.
[   49.029312] 
[   49.029513] 
[   49.029514] the existing dependency chain (in reverse order) is:
[   49.029653] 
[   49.029654] -> #1 (slock-AF_INET){+.-...}:
[   49.029986]        [<ffffffff8109a149>] lock_acquire+0x109/0x140
[   49.030115]        [<ffffffff816f842c>] _raw_spin_lock+0x3c/0x50
[   49.030242]        [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[   49.031959]        [<ffffffff8164f963>] inet_csk_clone+0x13/0x90
[   49.032008]        [<ffffffff816697d5>] tcp_create_openreq_child+0x25/0x4d0
[   49.032008]        [<ffffffff81667aa8>] tcp_v4_syn_recv_sock+0x48/0x2c0
[   49.032008]        [<ffffffff81669625>] tcp_check_req+0x335/0x4c0
[   49.032008]        [<ffffffff81666c8e>] tcp_v4_do_rcv+0x29e/0x460
[   49.032008]        [<ffffffff816676dc>] tcp_v4_rcv+0x88c/0xc10
[   49.032008]        [<ffffffff81644790>] ip_local_deliver_finish+0x100/0x2f0
[   49.032008]        [<ffffffff81644a0d>] ip_local_deliver+0x8d/0xa0
[   49.032008]        [<ffffffff81644033>] ip_rcv_finish+0x1a3/0x510
[   49.032008]        [<ffffffff81644612>] ip_rcv+0x272/0x2f0
[   49.032008]        [<ffffffff81613b87>] __netif_receive_skb+0x4d7/0x560
[   49.032008]        [<ffffffff81613ce0>] process_backlog+0xd0/0x1e0
[   49.032008]        [<ffffffff816166a0>] net_rx_action+0x140/0x2c0
[   49.032008]        [<ffffffff810640e8>] __do_softirq+0x138/0x250
[   49.032008]        [<ffffffff817030fc>] call_softirq+0x1c/0x30
[   49.032008]        [<ffffffff810153c5>] do_softirq+0x95/0xd0
[   49.032008]        [<ffffffff81063ded>] local_bh_enable_ip+0xed/0x110
[   49.032008]        [<ffffffff816f8ccf>] _raw_spin_unlock_bh+0x3f/0x50
[   49.032008]        [<ffffffff81605ca1>] release_sock+0x161/0x1d0
[   49.032008]        [<ffffffff8167911d>] inet_stream_connect+0x6d/0x2f0
[   49.032008]        [<ffffffff815ffe4b>] kernel_connect+0xb/0x10
[   49.032008]        [<ffffffff816addb6>] xs_tcp_setup_socket+0x2a6/0x4c0
[   49.032008]        [<ffffffff81078d29>] process_one_work+0x1e9/0x560
[   49.032008]        [<ffffffff81079433>] worker_thread+0x193/0x420
[   49.032008]        [<ffffffff81080496>] kthread+0x96/0xb0
[   49.032008]        [<ffffffff81703004>] kernel_thread_helper+0x4/0x10
[   49.032008] 
[   49.032008] -> #0 (slock-AF_INET/1){+.-...}:
[   49.032008]        [<ffffffff81099f00>] __lock_acquire+0x2040/0x2180
[   49.032008]        [<ffffffff8109a149>] lock_acquire+0x109/0x140
[   49.032008]        [<ffffffff816f83da>] _raw_spin_lock_nested+0x3a/0x50
[   49.032008]        [<ffffffff816676b7>] tcp_v4_rcv+0x867/0xc10
[   49.032008]        [<ffffffff81644790>] ip_local_deliver_finish+0x100/0x2f0
[   49.032008]        [<ffffffff81644a0d>] ip_local_deliver+0x8d/0xa0
[   49.032008]        [<ffffffff81644033>] ip_rcv_finish+0x1a3/0x510
[   49.032008]        [<ffffffff81644612>] ip_rcv+0x272/0x2f0
[   49.032008]        [<ffffffff81613b87>] __netif_receive_skb+0x4d7/0x560
[   49.032008]        [<ffffffff81615c44>] netif_receive_skb+0x104/0x120
[   49.032008]        [<ffffffff81615d90>] napi_skb_finish+0x50/0x70
[   49.032008]        [<ffffffff81616455>] napi_gro_receive+0xc5/0xd0
[   49.032008]        [<ffffffffa000ad50>] bnx2_poll_work+0x610/0x1560 [bnx2]
[   49.032008]        [<ffffffffa000bde6>] bnx2_poll+0x66/0x250 [bnx2]
[   49.032008]        [<ffffffff816166a0>] net_rx_action+0x140/0x2c0
[   49.032008]        [<ffffffff810640e8>] __do_softirq+0x138/0x250
[   49.032008]        [<ffffffff817030fc>] call_softirq+0x1c/0x30
[   49.032008]        [<ffffffff810153c5>] do_softirq+0x95/0xd0
[   49.032008]        [<ffffffff81063cbd>] irq_exit+0xdd/0x110
[   49.032008]        [<ffffffff81014b74>] do_IRQ+0x64/0xe0
[   49.032008]        [<ffffffff816f90b3>] ret_from_intr+0x0/0x1a
[   49.032008]        [<ffffffff8105f63f>] release_task+0x24f/0x4c0
[   49.032008]        [<ffffffff810601de>] wait_consider_task+0x92e/0xb90
[   49.032008]        [<ffffffff81060590>] do_wait+0x150/0x270
[   49.032008]        [<ffffffff81060751>] sys_wait4+0xa1/0xf0
[   49.032008]        [<ffffffff81700e92>] system_call_fastpath+0x16/0x1b
[   49.032008] 
[   49.032008] other info that might help us debug this:
[   49.032008] 
[   49.032008]  Possible unsafe locking scenario:
[   49.032008] 
[   49.032008]        CPU0                    CPU1
[   49.032008]        ----                    ----
[   49.032008]   lock(slock-AF_INET);
[   49.039565]                                lock(slock-AF_INET/1);
[   49.039565]                                lock(slock-AF_INET);
[   49.039565]   lock(slock-AF_INET/1);
[   49.039565] 
[   49.039565]  *** DEADLOCK ***
[   49.039565] 
[   49.039565] 3 locks held by tcsh/2490:
[   49.039565]  #0:  (slock-AF_INET){+.-...}, at: [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[   49.039565]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81613815>] __netif_receive_skb+0x165/0x560
[   49.039565]  #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff816446d0>] ip_local_deliver_finish+0x40/0x2f0
[   49.039565] 
[   49.039565] stack backtrace:
[   49.039565] Pid: 2490, comm: tcsh Not tainted 3.1.0-hw-lockdep+ #58
[   49.039565] Call Trace:
[   49.039565]  <IRQ>  [<ffffffff81097dab>] print_circular_bug+0x21b/0x330
[   49.039565]  [<ffffffff81099f00>] __lock_acquire+0x2040/0x2180
[   49.039565]  [<ffffffff8109a149>] lock_acquire+0x109/0x140
[   49.039565]  [<ffffffff816676b7>] ? tcp_v4_rcv+0x867/0xc10
[   49.039565]  [<ffffffff816f83da>] _raw_spin_lock_nested+0x3a/0x50
[   49.039565]  [<ffffffff816676b7>] ? tcp_v4_rcv+0x867/0xc10
[   49.039565]  [<ffffffff816676b7>] tcp_v4_rcv+0x867/0xc10
[   49.039565]  [<ffffffff816446d0>] ? ip_local_deliver_finish+0x40/0x2f0
[   49.039565]  [<ffffffff81644790>] ip_local_deliver_finish+0x100/0x2f0
[   49.039565]  [<ffffffff816446d0>] ? ip_local_deliver_finish+0x40/0x2f0
[   49.039565]  [<ffffffff81644a0d>] ip_local_deliver+0x8d/0xa0
[   49.039565]  [<ffffffff81644033>] ip_rcv_finish+0x1a3/0x510
[   49.039565]  [<ffffffff81644612>] ip_rcv+0x272/0x2f0
[   49.039565]  [<ffffffff81613b87>] __netif_receive_skb+0x4d7/0x560
[   49.039565]  [<ffffffff81613815>] ? __netif_receive_skb+0x165/0x560
[   49.039565]  [<ffffffff81615c44>] netif_receive_skb+0x104/0x120
[   49.039565]  [<ffffffff81615b63>] ? netif_receive_skb+0x23/0x120
[   49.039565]  [<ffffffff816161cb>] ? dev_gro_receive+0x29b/0x380
[   49.039565]  [<ffffffff816160c2>] ? dev_gro_receive+0x192/0x380
[   49.039565]  [<ffffffff81615d90>] napi_skb_finish+0x50/0x70
[   49.039565]  [<ffffffff81616455>] napi_gro_receive+0xc5/0xd0
[   49.039565]  [<ffffffffa000ad50>] bnx2_poll_work+0x610/0x1560 [bnx2]
[   49.039565]  [<ffffffffa000bde6>] bnx2_poll+0x66/0x250 [bnx2]
[   49.039565]  [<ffffffff816166a0>] net_rx_action+0x140/0x2c0
[   49.039565]  [<ffffffff810640e8>] __do_softirq+0x138/0x250
[   49.039565]  [<ffffffff817030fc>] call_softirq+0x1c/0x30
[   49.039565]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[   49.039565]  [<ffffffff81063cbd>] irq_exit+0xdd/0x110
[   49.039565]  [<ffffffff81014b74>] do_IRQ+0x64/0xe0
[   49.039565]  [<ffffffff816f90b3>] common_interrupt+0x73/0x73
[   49.039565]  <EOI>  [<ffffffff810944fd>] ? trace_hardirqs_off+0xd/0x10
[   49.039565]  [<ffffffff816f864f>] ? _raw_write_unlock_irq+0x2f/0x50
[   49.039565]  [<ffffffff816f864b>] ? _raw_write_unlock_irq+0x2b/0x50
[   49.039565]  [<ffffffff8105f63f>] release_task+0x24f/0x4c0
[   49.039565]  [<ffffffff8105f414>] ? release_task+0x24/0x4c0
[   49.039565]  [<ffffffff810601de>] wait_consider_task+0x92e/0xb90
[   49.039565]  [<ffffffff81096b0d>] ? trace_hardirqs_on_caller+0x13d/0x1c0
[   49.039565]  [<ffffffff81060590>] do_wait+0x150/0x270
[   49.039565]  [<ffffffff81096b9d>] ? trace_hardirqs_on+0xd/0x10
[   49.039565]  [<ffffffff81060751>] sys_wait4+0xa1/0xf0
[   49.039565]  [<ffffffff8105e9b0>] ? wait_noreap_copyout+0x150/0x150
[   49.039565]  [<ffffffff81700e92>] system_call_fastpath+0x16/0x1b
[   49.045277] huh, entered softirq 3 NET_RX ffffffff81616560 preempt_count 00000102, exited with 00000103?

Did that help? I'm not sure if that's what you wanted to see...

Simon-


* Re: Linux 3.1-rc9
  2011-11-03  0:09                                               ` Simon Kirby
@ 2011-11-03  0:15                                                 ` Steven Rostedt
  2011-11-03  0:17                                                   ` Simon Kirby
  0 siblings, 1 reply; 156+ messages in thread
From: Steven Rostedt @ 2011-11-03  0:15 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, 2011-11-02 at 17:09 -0700, Simon Kirby wrote:
>  
> [   49.032008] other info that might help us debug this:
> [   49.032008] 
> [   49.032008]  Possible unsafe locking scenario:
> [   49.032008] 
> [   49.032008]        CPU0                    CPU1
> [   49.032008]        ----                    ----
> [   49.032008]   lock(slock-AF_INET);
> [   49.039565]                                lock(slock-AF_INET/1);
> [   49.039565]                                lock(slock-AF_INET);
> [   49.039565]   lock(slock-AF_INET/1);
> [   49.039565] 
> [   49.039565]  *** DEADLOCK ***
> [   49.039565] 

> Did that help? I'm not sure if that's what you wanted to see...


Yes, this looks much better than what you previously showed. The added
"/1" makes a world of difference.

Thanks!

I'll add your "Tested-by". Seems rather strange as we didn't fix the bug
you are chasing, but instead fixed the output of what the bug
produced ;)

-- Steve




* Re: Linux 3.1-rc9
  2011-11-03  0:15                                                 ` Steven Rostedt
@ 2011-11-03  0:17                                                   ` Simon Kirby
  0 siblings, 0 replies; 156+ messages in thread
From: Simon Kirby @ 2011-11-03  0:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development

On Wed, Nov 02, 2011 at 08:15:51PM -0400, Steven Rostedt wrote:

> On Wed, 2011-11-02 at 17:09 -0700, Simon Kirby wrote:
> >  
> > [   49.032008] other info that might help us debug this:
> > [   49.032008] 
> > [   49.032008]  Possible unsafe locking scenario:
> > [   49.032008] 
> > [   49.032008]        CPU0                    CPU1
> > [   49.032008]        ----                    ----
> > [   49.032008]   lock(slock-AF_INET);
> > [   49.039565]                                lock(slock-AF_INET/1);
> > [   49.039565]                                lock(slock-AF_INET);
> > [   49.039565]   lock(slock-AF_INET/1);
> > [   49.039565] 
> > [   49.039565]  *** DEADLOCK ***
> > [   49.039565] 
> 
> > Did that help? I'm not sure if that's what you wanted to see...
> 
> 
> Yes, this looks much better than what you previously showed. The added
> "/1" makes a world of difference.
> 
> Thanks!
> 
> I'll add your "Tested-by". Seems rather strange as we didn't fix the bug
> you are chasing, but instead fixed the output of what the bug
> produced ;)

Well, I was testing this without Eric's patch as I figured you wanted to
see the splat. :) Testing again with Eric's patch now.

Simon-


* Re: Linux 3.1-rc9
  2011-11-02 22:42                                                     ` Eric Dumazet
@ 2011-11-03  0:24                                                       ` Thomas Gleixner
  2011-11-03  0:52                                                       ` Simon Kirby
  2011-11-03  6:06                                                       ` Jörg-Volker Peetz
  2 siblings, 0 replies; 156+ messages in thread
From: Thomas Gleixner @ 2011-11-03  0:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Simon Kirby, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development, Balazs Scheidler,
	KOVACS Krisztian

On Wed, 2 Nov 2011, Eric Dumazet wrote:
> On 02/11/2011 20:16, Simon Kirby wrote:
> 
>  
> > Actually, we have an anti-abuse daemon that injects blackhole routes, so
> > this makes sense. (The daemon was written before ipsets were merged and
> > normal netfilter rules make it fall over under attack.)
> > 
> > I'll try with this patch. Thanks!
> > 
> 
> 
> Thanks !
> 
> Here is the official submission. Please add your 'Tested-by' signature
> once you can confirm that the problem goes away.
> 
> (It did here when I injected random NULL returns from
> inet_csk_route_child_sock(), so I am confident this is the problem you hit.)
> 
> [PATCH] net: add missing bh_unlock_sock() calls
> 
> Simon Kirby reported lockdep warnings and the following messages:
> 
> [104661.897577] huh, entered softirq 3 NET_RX ffffffff81613740
> preempt_count 00000101, exited with 00000102?
> 
> [104661.923653] huh, entered softirq 3 NET_RX ffffffff81613740
> preempt_count 00000101, exited with 00000102?
> 
> The problem comes from commit 0e734419
> (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> 
> If inet_csk_route_child_sock() returns NULL, we should release the
> socket lock before freeing the socket.
> 
> Another lock imbalance has existed since commit 093d282321da (tproxy:
> fix hash locking issue when using port redirection in
> __inet_inherit_port()) if __inet_inherit_port() returns an error, so a
> backport is also needed for >= 2.6.37 kernels.
> 
> Reported-by: Dimon Kirby <sim@hostway.ca>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Balazs Scheidler <bazsi@balabit.hu>
> CC: KOVACS Krisztian <hidden@balabit.hu>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

You probably also want: CC: stable@vger.kernel.org

Thanks,

	tglx

> ---
>  net/dccp/ipv4.c     |    1 +
>  net/ipv4/tcp_ipv4.c |    1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
> index 332639b..90a919a 100644
> --- a/net/dccp/ipv4.c
> +++ b/net/dccp/ipv4.c
> @@ -433,6 +433,7 @@ exit:
>  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
>  	return NULL;
>  put_and_exit:
> +	bh_unlock_sock(newsk);
>  	sock_put(newsk);
>  	goto exit;
>  }
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 0ea10ee..683d97a 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1510,6 +1510,7 @@ exit:
>  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
>  	return NULL;
>  put_and_exit:
> +	bh_unlock_sock(newsk);
>  	sock_put(newsk);
>  	goto exit;
>  }
> 
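
For anyone following along, here is a minimal userspace sketch of the
imbalance this patch closes. It assumes a plain counter in place of the
kernel's preempt count, and every toy_* name is hypothetical rather than
a kernel API. The child is handed back locked, and the error path must
unlock it before dropping the reference:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a socket; not the kernel's struct sock. */
struct toy_sock {
	int refcnt;
};

/* Models the count that the softirq code compares on entry and exit. */
static int toy_preempt_count;

static void toy_bh_lock_sock(struct toy_sock *sk)
{
	(void)sk;
	toy_preempt_count++;	/* in this model, locking raises the count */
}

static void toy_bh_unlock_sock(struct toy_sock *sk)
{
	(void)sk;
	toy_preempt_count--;
}

static void toy_sock_put(struct toy_sock *sk)
{
	assert(sk->refcnt > 0);
	if (--sk->refcnt == 0)
		free(sk);
}

/* Like the clone step in the traces, hands back the child already locked. */
static struct toy_sock *toy_clone_locked(void)
{
	struct toy_sock *newsk = calloc(1, sizeof(*newsk));

	if (!newsk)
		return NULL;
	newsk->refcnt = 1;
	toy_bh_lock_sock(newsk);
	return newsk;
}

/* route_ok == 0 models the routing lookup returning NULL. */
static struct toy_sock *toy_syn_recv_sock(int route_ok, int with_fix)
{
	struct toy_sock *newsk = toy_clone_locked();

	if (!newsk)
		return NULL;
	if (!route_ok)
		goto put_and_exit;
	return newsk;		/* success: still locked, caller unlocks later */

put_and_exit:
	if (with_fix)
		toy_bh_unlock_sock(newsk);	/* the one-line fix */
	toy_sock_put(newsk);
	return NULL;
}

/* Returns how far the count drifts across one failed child creation. */
int toy_count_drift(int with_fix)
{
	int before = toy_preempt_count;
	struct toy_sock *sk = toy_syn_recv_sock(0, with_fix);

	assert(sk == NULL);
	return toy_preempt_count - before;
}
```

In this model, toy_count_drift(0) leaves the count one higher than it
started and toy_count_drift(1) leaves it unchanged, which is the same
off-by-one drift as the "entered softirq ... exited with" messages.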


* Re: Linux 3.1-rc9
  2011-11-02 22:42                                                     ` Eric Dumazet
  2011-11-03  0:24                                                       ` Thomas Gleixner
@ 2011-11-03  0:52                                                       ` Simon Kirby
  2011-11-03 22:07                                                         ` David Miller
  2011-11-03  6:06                                                       ` Jörg-Volker Peetz
  2 siblings, 1 reply; 156+ messages in thread
From: Simon Kirby @ 2011-11-03  0:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development, Balazs Scheidler,
	KOVACS Krisztian

On Wed, Nov 02, 2011 at 11:42:56PM +0100, Eric Dumazet wrote:

> On 02/11/2011 20:16, Simon Kirby wrote:
> 
>  
> > Actually, we have an anti-abuse daemon that injects blackhole routes, so
> > this makes sense. (The daemon was written before ipsets were merged and
> > normal netfilter rules make it fall over under attack.)
> > 
> > I'll try with this patch. Thanks!
> > 
> 
> 
> Thanks !
> 
> Here is the official submission. Please add your 'Tested-by' signature
> once you can confirm that the problem goes away.
> 
> (It did here when I injected random NULL returns from
> inet_csk_route_child_sock(), so I am confident this is the problem you hit.)
> 
> [PATCH] net: add missing bh_unlock_sock() calls
> 
> Simon Kirby reported lockdep warnings and the following messages:
> 
> [104661.897577] huh, entered softirq 3 NET_RX ffffffff81613740
> preempt_count 00000101, exited with 00000102?
> 
> [104661.923653] huh, entered softirq 3 NET_RX ffffffff81613740
> preempt_count 00000101, exited with 00000102?
> 
> The problem comes from commit 0e734419
> (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)
> 
> If inet_csk_route_child_sock() returns NULL, we should release the
> socket lock before freeing the socket.
> 
> Another lock imbalance has existed since commit 093d282321da (tproxy:
> fix hash locking issue when using port redirection in
> __inet_inherit_port()) if __inet_inherit_port() returns an error, so a
> backport is also needed for >= 2.6.37 kernels.
> 
> Reported-by: Dimon Kirby <sim@hostway.ca>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Balazs Scheidler <bazsi@balabit.hu>
> CC: KOVACS Krisztian <hidden@balabit.hu>
> ---
>  net/dccp/ipv4.c     |    1 +
>  net/ipv4/tcp_ipv4.c |    1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
> index 332639b..90a919a 100644
> --- a/net/dccp/ipv4.c
> +++ b/net/dccp/ipv4.c
> @@ -433,6 +433,7 @@ exit:
>  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
>  	return NULL;
>  put_and_exit:
> +	bh_unlock_sock(newsk);
>  	sock_put(newsk);
>  	goto exit;
>  }
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 0ea10ee..683d97a 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1510,6 +1510,7 @@ exit:
>  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
>  	return NULL;
>  put_and_exit:
> +	bh_unlock_sock(newsk);
>  	sock_put(newsk);
>  	goto exit;
>  }

Tested-by: Simon Kirby <sim@hostway.ca>

I tried many times, with route unreach/blackhole, and could not reproduce
the issue with this patch applied.

Thanks!

Simon-


* Re: Linux 3.1-rc9
  2011-11-02 22:42                                                     ` Eric Dumazet
  2011-11-03  0:24                                                       ` Thomas Gleixner
  2011-11-03  0:52                                                       ` Simon Kirby
@ 2011-11-03  6:06                                                       ` Jörg-Volker Peetz
  2011-11-03  6:26                                                         ` Eric Dumazet
  2 siblings, 1 reply; 156+ messages in thread
From: Jörg-Volker Peetz @ 2011-11-03  6:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev

Eric Dumazet wrote, on 11/02/11 23:42:
<snip>
> Reported-by: Dimon Kirby <sim@hostway.ca>
??             Simon                     ??
<snip>
-- 
Best regards,
Jörg-Volker.



* Re: Linux 3.1-rc9
  2011-11-03  6:06                                                       ` Jörg-Volker Peetz
@ 2011-11-03  6:26                                                         ` Eric Dumazet
  2011-11-03  6:43                                                           ` David Miller
  0 siblings, 1 reply; 156+ messages in thread
From: Eric Dumazet @ 2011-11-03  6:26 UTC (permalink / raw)
  To: Jörg-Volker Peetz
  Cc: Simon Kirby, Thomas Gleixner, Linux Kernel Mailing List

On 03/11/2011 07:06, Jörg-Volker Peetz wrote:

> Eric Dumazet wrote, on 11/02/11 23:42:
> <snip>
>> Reported-by: Dimon Kirby <sim@hostway.ca>
> ??             Simon                     ??
> <snip>


Oops, sorry. David, could you please fix Simon's name?

Reported-by: Simon Kirby <sim@hostway.ca>

Thanks



* Re: Linux 3.1-rc9
  2011-11-03  6:26                                                         ` Eric Dumazet
@ 2011-11-03  6:43                                                           ` David Miller
  0 siblings, 0 replies; 156+ messages in thread
From: David Miller @ 2011-11-03  6:43 UTC (permalink / raw)
  To: eric.dumazet; +Cc: jvpeetz, sim, tglx, linux-kernel

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 03 Nov 2011 07:26:37 +0100

> On 03/11/2011 07:06, Jörg-Volker Peetz wrote:
> 
>> Eric Dumazet wrote, on 11/02/11 23:42:
>> <snip>
>>> Reported-by: Dimon Kirby <sim@hostway.ca>
>> ??             Simon                     ??
>> <snip>
> 
> 
> Oops, sorry. David, could you please fix Simon's name?

Will do.


* Re: Linux 3.1-rc9
  2011-11-03  0:52                                                       ` Simon Kirby
@ 2011-11-03 22:07                                                         ` David Miller
  0 siblings, 0 replies; 156+ messages in thread
From: David Miller @ 2011-11-03 22:07 UTC (permalink / raw)
  To: sim
  Cc: eric.dumazet, tglx, a.p.zijlstra, torvalds, linux-kernel, davej,
	schwidefsky, mingo, netdev, bazsi, hidden

From: Simon Kirby <sim@hostway.ca>
Date: Wed, 2 Nov 2011 17:52:55 -0700

>> [PATCH] net: add missing bh_unlock_sock() calls
 ...
> Tested-by: Simon Kirby <sim@hostway.ca>
> 
> I tried many times, with route unreach/blackhole, and could not reproduce
> the issue with this patch applied.

Applied and queued up for -stable, thanks everyone!


* [tip:perf/core] lockdep: Show subclass in pretty print of lockdep output
  2011-10-25 20:20                                       ` Simon Kirby
  2011-10-31 17:32                                         ` Simon Kirby
@ 2011-11-18 23:11                                         ` tip-bot for Steven Rostedt
  1 sibling, 0 replies; 156+ messages in thread
From: tip-bot for Steven Rostedt @ 2011-11-18 23:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, rostedt, srostedt, tglx, sim

Commit-ID:  e5e78d08f3ab3094783b8df08a5b6d1d1a56a58f
Gitweb:     http://git.kernel.org/tip/e5e78d08f3ab3094783b8df08a5b6d1d1a56a58f
Author:     Steven Rostedt <srostedt@redhat.com>
AuthorDate: Wed, 2 Nov 2011 20:24:16 -0400
Committer:  Steven Rostedt <rostedt@goodmis.org>
CommitDate: Mon, 7 Nov 2011 11:01:46 -0500

lockdep: Show subclass in pretty print of lockdep output

The pretty print of the lockdep debug splat uses just the lock name
to show how the locking scenario happens. But when it comes to
nested locks, the output becomes confusing, which defeats the point
of pretty printing the lock scenario.

Without displaying the subclass info, we get the following output:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(slock-AF_INET);
                                lock(slock-AF_INET);
                                lock(slock-AF_INET);
   lock(slock-AF_INET);

  *** DEADLOCK ***

The above looks more like an A->A locking bug than an A->B B->A one.
By adding the subclass to the output, we can see what really happened:

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(slock-AF_INET);
                                lock(slock-AF_INET/1);
                                lock(slock-AF_INET);
   lock(slock-AF_INET/1);

  *** DEADLOCK ***

This bug was discovered while tracking down a real bug caught by lockdep.

Link: http://lkml.kernel.org/r/20111025202049.GB25043@hostway.ca

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/lockdep.c |   30 +++++++++++++-----------------
 1 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 91d67ce..6bd915d 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -490,36 +490,32 @@ void get_usage_chars(struct lock_class *class, char usage[LOCK_USAGE_CHARS])
 	usage[i] = '\0';
 }
 
-static int __print_lock_name(struct lock_class *class)
+static void __print_lock_name(struct lock_class *class)
 {
 	char str[KSYM_NAME_LEN];
 	const char *name;
 
 	name = class->name;
-	if (!name)
-		name = __get_key_name(class->key, str);
-
-	return printk("%s", name);
-}
-
-static void print_lock_name(struct lock_class *class)
-{
-	char str[KSYM_NAME_LEN], usage[LOCK_USAGE_CHARS];
-	const char *name;
-
-	get_usage_chars(class, usage);
-
-	name = class->name;
 	if (!name) {
 		name = __get_key_name(class->key, str);
-		printk(" (%s", name);
+		printk("%s", name);
 	} else {
-		printk(" (%s", name);
+		printk("%s", name);
 		if (class->name_version > 1)
 			printk("#%d", class->name_version);
 		if (class->subclass)
 			printk("/%d", class->subclass);
 	}
+}
+
+static void print_lock_name(struct lock_class *class)
+{
+	char usage[LOCK_USAGE_CHARS];
+
+	get_usage_chars(class, usage);
+
+	printk(" (");
+	__print_lock_name(class);
 	printk("){%s}", usage);
 }
 

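A minimal userspace sketch of the naming rule that the reworked
__print_lock_name() applies, so the scenario table and the held-lock
lines agree. It assumes a non-NULL class name and a trimmed-down,
hypothetical struct toy_lock_class, and it formats into a buffer
instead of calling printk():

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for the kernel's struct lock_class. */
struct toy_lock_class {
	const char *name;
	int name_version;
	int subclass;
};

/*
 * Writes "name", then "#version" if the version is above 1, then
 * "/subclass" if the subclass is non-zero. Returns snprintf()'s result.
 */
int toy_format_lock_name(const struct toy_lock_class *lc, char *buf, size_t len)
{
	char version[16] = "";
	char subclass[16] = "";

	assert(lc && lc->name && buf && len > 0);
	if (lc->name_version > 1)
		snprintf(version, sizeof(version), "#%d", lc->name_version);
	if (lc->subclass)
		snprintf(subclass, sizeof(subclass), "/%d", lc->subclass);
	return snprintf(buf, len, "%s%s%s", lc->name, version, subclass);
}
```

With a class named "slock-AF_INET", subclass 0 formats as the bare name
and subclass 1 gains the "/1" suffix, which is what lets the two locks
in the scenario table be told apart.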

* Re: [GIT PULL] timer fix
  2024-05-10 11:12 [GIT PULL] timer fix Ingo Molnar
@ 2024-05-10 17:29 ` pr-tracker-bot
  0 siblings, 0 replies; 156+ messages in thread
From: pr-tracker-bot @ 2024-05-10 17:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Andrew Morton, Anna-Maria Behnsen, Frederic Weisbecker

The pull request you sent on Fri, 10 May 2024 13:12:49 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-2024-05-10

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/92d503011f2fa2c85624dde43429cd0c6a25ef6a

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


* [GIT PULL] timer fix
@ 2024-05-10 11:12 Ingo Molnar
  2024-05-10 17:29 ` pr-tracker-bot
  0 siblings, 1 reply; 156+ messages in thread
From: Ingo Molnar @ 2024-05-10 11:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton,
	Anna-Maria Behnsen, Frederic Weisbecker

Linus,

Please pull the latest timers/urgent Git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-2024-05-10

   # HEAD: d7ad05c86e2191bd66e5b62fca8da53c4a53484f timers/migration: Prevent out of bounds access on failure

Fix possible (but unlikely) out-of-bounds access in
the timer migration per-CPU-init code.
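
A minimal sketch of the kind of out-of-bounds access being fixed,
assuming an unwind loop that pre-decrements its index and can be reached
with i == 0 when setup fails before anything was pushed. The toy_* names
are hypothetical, and the sketch records the indices instead of reading
a real stack:

```c
#include <assert.h>

/* What an unwind loop would touch: how many slots, and the lowest index. */
struct toy_walk {
	int visits;
	int lowest_index;
};

/* do/while form: the body always runs once, even when i starts at 0. */
struct toy_walk toy_unwind_do_while(int i)
{
	struct toy_walk w = { 0, i };

	assert(i >= 0);
	do {
		--i;			/* stands in for stack[--i] */
		w.visits++;
		w.lowest_index = i;
	} while (i > 0);
	return w;
}

/* while form: the condition is checked first, so i == 0 touches nothing. */
struct toy_walk toy_unwind_while(int i)
{
	struct toy_walk w = { 0, i };

	assert(i >= 0);
	while (i > 0) {
		--i;
		w.visits++;
		w.lowest_index = i;
	}
	return w;
}
```

Starting from i == 0, the do/while form would index slot -1, while the
while form visits nothing. For any i > 0 the two forms visit the same
slots, so the change only affects the early-failure case.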

 Thanks,

	Ingo

------------------>
Levi Yun (1):
      timers/migration: Prevent out of bounds access on failure


 kernel/time/timer_migration.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index ccba875d2234..84413114db5c 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -1596,7 +1596,7 @@ static int tmigr_setup_groups(unsigned int cpu, unsigned int node)
 
 	} while (i < tmigr_hierarchy_levels);
 
-	do {
+	while (i > 0) {
 		group = stack[--i];
 
 		if (err < 0) {
@@ -1645,7 +1645,7 @@ static int tmigr_setup_groups(unsigned int cpu, unsigned int node)
 				tmigr_connect_child_parent(child, group);
 			}
 		}
-	} while (i > 0);
+	}
 
 	kfree(stack);
 


* Re: [GIT PULL] timer fix
  2020-06-28 18:39 Ingo Molnar
@ 2020-06-28 22:05 ` pr-tracker-bot
  0 siblings, 0 replies; 156+ messages in thread
From: pr-tracker-bot @ 2020-06-28 22:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Borislav Petkov, Andrew Morton

The pull request you sent on Sun, 28 Jun 2020 20:39:25 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-2020-06-28

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/668f532da4808688f5162cec6a38875390e1a91d

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


* [GIT PULL] timer fix
@ 2020-06-28 18:39 Ingo Molnar
  2020-06-28 22:05 ` pr-tracker-bot
  0 siblings, 1 reply; 156+ messages in thread
From: Ingo Molnar @ 2020-06-28 18:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
	Andrew Morton

Linus,

Please pull the latest timers/urgent git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-2020-06-28

   # HEAD: f097eb38f71391ff2cf078788bad5a00eb3bd96a timekeeping: Fix kerneldoc system_device_crosststamp & al

A single DocBook fix.

 Thanks,

	Ingo

------------------>
Kurt Kanzenbach (1):
      timekeeping: Fix kerneldoc system_device_crosststamp & al


 include/linux/timekeeping.h | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index b27e2ffa96c1..d5471d6fa778 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -222,9 +222,9 @@ extern bool timekeeping_rtc_skipresume(void);
 
 extern void timekeeping_inject_sleeptime64(const struct timespec64 *delta);
 
-/*
+/**
  * struct system_time_snapshot - simultaneous raw/real time capture with
- *	counter value
+ *				 counter value
  * @cycles:	Clocksource counter value to produce the system times
  * @real:	Realtime system time
  * @raw:	Monotonic raw system time
@@ -239,9 +239,9 @@ struct system_time_snapshot {
 	u8		cs_was_changed_seq;
 };
 
-/*
+/**
  * struct system_device_crosststamp - system/device cross-timestamp
- *	(syncronized capture)
+ *				      (synchronized capture)
  * @device:		Device time
  * @sys_realtime:	Realtime simultaneous with device time
  * @sys_monoraw:	Monotonic raw simultaneous with device time
@@ -252,12 +252,12 @@ struct system_device_crosststamp {
 	ktime_t sys_monoraw;
 };
 
-/*
+/**
  * struct system_counterval_t - system counter value with the pointer to the
- *	corresponding clocksource
+ *				corresponding clocksource
  * @cycles:	System counter value
  * @cs:		Clocksource corresponding to system counter value. Used by
- *	timekeeping code to verify comparibility of two cycle values
+ *		timekeeping code to verify comparibility of two cycle values
  */
 struct system_counterval_t {
 	u64			cycles;


* Re: [GIT PULL] timer fix
  2020-04-25 10:16 Ingo Molnar
@ 2020-04-25 19:30 ` pr-tracker-bot
  0 siblings, 0 replies; 156+ messages in thread
From: pr-tracker-bot @ 2020-04-25 19:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Andrew Morton

The pull request you sent on Sat, 25 Apr 2020 12:16:11 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-2020-04-25

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/acd629446804617a8fe4700fc4ca16eb44aa4efd

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


* [GIT PULL] timer fix
@ 2020-04-25 10:16 Ingo Molnar
  2020-04-25 19:30 ` pr-tracker-bot
  0 siblings, 1 reply; 156+ messages in thread
From: Ingo Molnar @ 2020-04-25 10:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers/urgent git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-2020-04-25

   # HEAD: ac84bac4062e7fc24f5e2c61c6a414b2a00a29ad vdso/datapage: Use correct clock mode name in comment

A single fix for a comment that may show up in DocBook output.

 Thanks,

	Ingo

------------------>
Christian Brauner (1):
      vdso/datapage: Use correct clock mode name in comment


 include/vdso/datapage.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/vdso/datapage.h b/include/vdso/datapage.h
index 5cbc9fcbfd45..7955c56d6b3c 100644
--- a/include/vdso/datapage.h
+++ b/include/vdso/datapage.h
@@ -73,8 +73,8 @@ struct vdso_timestamp {
  *
  * @offset is used by the special time namespace VVAR pages which are
  * installed instead of the real VVAR page. These namespace pages must set
- * @seq to 1 and @clock_mode to VLOCK_TIMENS to force the code into the
- * time namespace slow path. The namespace aware functions retrieve the
+ * @seq to 1 and @clock_mode to VDSO_CLOCKMODE_TIMENS to force the code into
+ * the time namespace slow path. The namespace aware functions retrieve the
  * real system wide VVAR page, read host time and add the per clock offset.
  * For clocks which are not affected by time namespace adjustment the
  * offset must be zero.


* Re: [GIT PULL] timer fix
  2019-11-16 21:38 Ingo Molnar
@ 2019-11-17  0:35 ` pr-tracker-bot
  0 siblings, 0 replies; 156+ messages in thread
From: pr-tracker-bot @ 2019-11-17  0:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Andrew Morton

The pull request you sent on Sat, 16 Nov 2019 22:38:54 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/3278b3b6782c562079a3e0af0979968fd94d141c

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


* [GIT PULL] timer fix
@ 2019-11-16 21:38 Ingo Molnar
  2019-11-17  0:35 ` pr-tracker-bot
  0 siblings, 1 reply; 156+ messages in thread
From: Ingo Molnar @ 2019-11-16 21:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: 2f5841349df281ecf8f81cc82d869b8476f0db0b ntp/y2038: Remove incorrect time_t truncation

Fix integer truncation bug in __do_adjtimex().

 Thanks,

	Ingo

------------------>
Arnd Bergmann (1):
      ntp/y2038: Remove incorrect time_t truncation


 kernel/time/ntp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 65eb796610dc..069ca78fb0bf 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -771,7 +771,7 @@ int __do_adjtimex(struct __kernel_timex *txc, const struct timespec64 *ts,
 	/* fill PPS status fields */
 	pps_fill_timex(txc);
 
-	txc->time.tv_sec = (time_t)ts->tv_sec;
+	txc->time.tv_sec = ts->tv_sec;
 	txc->time.tv_usec = ts->tv_nsec;
 	if (!(time_status & STA_NANO))
 		txc->time.tv_usec = ts->tv_nsec / NSEC_PER_USEC;
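
The effect of the removed cast can be sketched in userspace. This is a minimal model, assuming an ABI where time_t is 32 bits wide; the sk_* helper names are hypothetical and only imitate what the cast did to a 64-bit seconds value.

```c
#include <stdint.h>

/*
 * Hypothetical model of "(time_t)ts->tv_sec" on an ABI with a 32-bit
 * time_t: only the low 32 bits survive and are then read as signed.
 */
static int64_t sk_store_with_cast(int64_t tv_sec)
{
	uint32_t low = (uint32_t)tv_sec;	/* drop the upper 32 bits */
	int64_t v = (int64_t)low;

	if (v >= INT64_C(0x80000000))		/* reinterpret as signed 32-bit */
		v -= INT64_C(0x100000000);
	return v;
}

/* Model of the fixed assignment: the 64-bit value is stored unchanged. */
static int64_t sk_store_without_cast(int64_t tv_sec)
{
	return tv_sec;
}
```

In this model the first second past the 32-bit limit, 2147483648, comes back as -2147483648 with the cast and unchanged without it.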


* Re: [GIT PULL] timer fix
  2019-10-02 22:06 Ingo Molnar
@ 2019-10-02 23:00 ` pr-tracker-bot
  0 siblings, 0 replies; 156+ messages in thread
From: pr-tracker-bot @ 2019-10-02 23:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Andrew Morton

The pull request you sent on Thu, 3 Oct 2019 00:06:07 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/5021b9182ee805603e3b180220a929af7bd4b960

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


* [GIT PULL] timer fix
@ 2019-10-02 22:06 Ingo Molnar
  2019-10-02 23:00 ` pr-tracker-bot
  0 siblings, 1 reply; 156+ messages in thread
From: Ingo Molnar @ 2019-10-02 22:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: b9023b91dd020ad7e093baa5122b6968c48cc9e0 tick: broadcast-hrtimer: Fix a race in bc_set_next

Fix a broadcast-timer handling race that can result in spuriously and 
indefinitely delayed hrtimers and even RCU stalls if the system is 
otherwise quiet.

 Thanks,

	Ingo

------------------>
Balasubramani Vivekanandan (1):
      tick: broadcast-hrtimer: Fix a race in bc_set_next


 kernel/time/tick-broadcast-hrtimer.c | 62 +++++++++++++++++-------------------
 1 file changed, 29 insertions(+), 33 deletions(-)

diff --git a/kernel/time/tick-broadcast-hrtimer.c b/kernel/time/tick-broadcast-hrtimer.c
index c1f5bb590b5e..b5a65e212df2 100644
--- a/kernel/time/tick-broadcast-hrtimer.c
+++ b/kernel/time/tick-broadcast-hrtimer.c
@@ -42,39 +42,39 @@ static int bc_shutdown(struct clock_event_device *evt)
  */
 static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
 {
-	int bc_moved;
 	/*
-	 * We try to cancel the timer first. If the callback is on
-	 * flight on some other cpu then we let it handle it. If we
-	 * were able to cancel the timer nothing can rearm it as we
-	 * own broadcast_lock.
+	 * This is called either from enter/exit idle code or from the
+	 * broadcast handler. In all cases tick_broadcast_lock is held.
 	 *
-	 * However we can also be called from the event handler of
-	 * ce_broadcast_hrtimer itself when it expires. We cannot
-	 * restart the timer because we are in the callback, but we
-	 * can set the expiry time and let the callback return
-	 * HRTIMER_RESTART.
+	 * hrtimer_cancel() cannot be called here neither from the
+	 * broadcast handler nor from the enter/exit idle code. The idle
+	 * code can run into the problem described in bc_shutdown() and the
+	 * broadcast handler cannot wait for itself to complete for obvious
+	 * reasons.
 	 *
-	 * Since we are in the idle loop at this point and because
-	 * hrtimer_{start/cancel} functions call into tracing,
-	 * calls to these functions must be bound within RCU_NONIDLE.
+	 * Each caller tries to arm the hrtimer on its own CPU, but if the
+	 * hrtimer callbback function is currently running, then
+	 * hrtimer_start() cannot move it and the timer stays on the CPU on
+	 * which it is assigned at the moment.
+	 *
+	 * As this can be called from idle code, the hrtimer_start()
+	 * invocation has to be wrapped with RCU_NONIDLE() as
+	 * hrtimer_start() can call into tracing.
 	 */
-	RCU_NONIDLE(
-		{
-			bc_moved = hrtimer_try_to_cancel(&bctimer) >= 0;
-			if (bc_moved) {
-				hrtimer_start(&bctimer, expires,
-					      HRTIMER_MODE_ABS_PINNED_HARD);
-			}
-		}
-	);
-
-	if (bc_moved) {
-		/* Bind the "device" to the cpu */
-		bc->bound_on = smp_processor_id();
-	} else if (bc->bound_on == smp_processor_id()) {
-		hrtimer_set_expires(&bctimer, expires);
-	}
+	RCU_NONIDLE( {
+		hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED_HARD);
+		/*
+		 * The core tick broadcast mode expects bc->bound_on to be set
+		 * correctly to prevent a CPU which has the broadcast hrtimer
+		 * armed from going deep idle.
+		 *
+		 * As tick_broadcast_lock is held, nothing can change the cpu
+		 * base which was just established in hrtimer_start() above. So
+		 * the below access is safe even without holding the hrtimer
+		 * base lock.
+		 */
+		bc->bound_on = bctimer.base->cpu_base->cpu;
+	} );
 	return 0;
 }
 
@@ -100,10 +100,6 @@ static enum hrtimer_restart bc_handler(struct hrtimer *t)
 {
 	ce_broadcast_hrtimer.event_handler(&ce_broadcast_hrtimer);
 
-	if (clockevent_state_oneshot(&ce_broadcast_hrtimer))
-		if (ce_broadcast_hrtimer.next_event != KTIME_MAX)
-			return HRTIMER_RESTART;
-
 	return HRTIMER_NORESTART;
 }
 


* Re: [GIT PULL] timer fix
  2019-09-26 20:18 Ingo Molnar
@ 2019-09-26 23:00 ` pr-tracker-bot
  0 siblings, 0 replies; 156+ messages in thread
From: pr-tracker-bot @ 2019-09-26 23:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Andrew Morton

The pull request you sent on Thu, 26 Sep 2019 22:18:25 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/da05b5ea12c1e50b2988a63470d6b69434796f8b

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


* [GIT PULL] timer fix
@ 2019-09-26 20:18 Ingo Molnar
  2019-09-26 23:00 ` pr-tracker-bot
  0 siblings, 1 reply; 156+ messages in thread
From: Ingo Molnar @ 2019-09-26 20:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: e430d802d6a3aaf61bd3ed03d9404888a29b9bf9 timer: Read jiffies once when forwarding base clk

Fixes a timer expiry bug that would cause spurious delay of timers.

 Thanks,

	Ingo

------------------>
Li RongQing (1):
      timer: Read jiffies once when forwarding base clk


 kernel/time/timer.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 0e315a2e77ae..4820823515e9 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1678,24 +1678,26 @@ void timer_clear_idle(void)
 static int collect_expired_timers(struct timer_base *base,
 				  struct hlist_head *heads)
 {
+	unsigned long now = READ_ONCE(jiffies);
+
 	/*
 	 * NOHZ optimization. After a long idle sleep we need to forward the
 	 * base to current jiffies. Avoid a loop by searching the bitfield for
 	 * the next expiring timer.
 	 */
-	if ((long)(jiffies - base->clk) > 2) {
+	if ((long)(now - base->clk) > 2) {
 		unsigned long next = __next_timer_interrupt(base);
 
 		/*
 		 * If the next timer is ahead of time forward to current
 		 * jiffies, otherwise forward to the next expiry time:
 		 */
-		if (time_after(next, jiffies)) {
+		if (time_after(next, now)) {
 			/*
 			 * The call site will increment base->clk and then
 			 * terminate the expiry loop immediately.
 			 */
-			base->clk = jiffies;
+			base->clk = now;
 			return 0;
 		}
 		base->clk = next;
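
The pattern in this patch, taking one snapshot of a concurrently changing counter and using it for every comparison and assignment, can be sketched in userspace as follows. This is only an illustration: the counter is passed in as a pointer, and sk_time_after() and sk_forward_clk() are hypothetical names that mirror the shape of the patched logic, not the kernel functions.

```c
/* Wraparound-safe "a is after b", in the style of the kernel's time_after(). */
static int sk_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Return the value the base clock should be forwarded to.
 * The tick counter is read exactly once, so the threshold check, the
 * comparison against 'next' and the returned value all agree with each
 * other even if the counter advances while this function runs.
 */
static unsigned long sk_forward_clk(unsigned long clk, unsigned long next,
				    const volatile unsigned long *ticks)
{
	unsigned long now = *ticks;	/* single snapshot */

	if ((long)(now - clk) > 2) {
		if (sk_time_after(next, now))
			return now;	/* next expiry is still in the future */
		return next;		/* forward to the next expiry time */
	}
	return clk;			/* not far enough behind to forward */
}
```

With separate reads, the counter could tick between the comparison and the assignment, so the value stored in the clock could differ from the value that was compared.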


* Re: [GIT PULL] timer fix
  2019-04-12 13:09 Ingo Molnar
@ 2019-04-13  4:05 ` pr-tracker-bot
  0 siblings, 0 replies; 156+ messages in thread
From: pr-tracker-bot @ 2019-04-13  4:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Andrew Morton

The pull request you sent on Fri, 12 Apr 2019 15:09:19 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/122c215bfae884f10a189e6754d9603a06b981c3

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


* [GIT PULL] timer fix
@ 2019-04-12 13:09 Ingo Molnar
  2019-04-13  4:05 ` pr-tracker-bot
  0 siblings, 1 reply; 156+ messages in thread
From: Ingo Molnar @ 2019-04-12 13:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: 07d7e12091f4ab869cc6a4bb276399057e73b0b3 alarmtimer: Return correct remaining time

Fix the alarm_timer_remaining() return value.

 Thanks,

	Ingo

------------------>
Andrei Vagin (1):
      alarmtimer: Return correct remaining time


 kernel/time/alarmtimer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 2c97e8c2d29f..0519a8805aab 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -594,7 +594,7 @@ static ktime_t alarm_timer_remaining(struct k_itimer *timr, ktime_t now)
 {
 	struct alarm *alarm = &timr->it.alarm.alarmtimer;
 
-	return ktime_sub(now, alarm->node.expires);
+	return ktime_sub(alarm->node.expires, now);
 }
 
 /**
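
The operand order matters because the remaining time is the expiry minus the current time. A tiny sketch, assuming a plain signed 64-bit nanosecond count in place of ktime_t (sk_ktime and sk_remaining() are hypothetical names):

```c
#include <stdint.h>

typedef int64_t sk_ktime;	/* stand-in for ktime_t: signed nanoseconds */

/* Remaining time until expiry: positive while the alarm is still pending. */
static sk_ktime sk_remaining(sk_ktime expires, sk_ktime now)
{
	return expires - now;
}
```

With the operands swapped, a pending alarm would report a negative remaining time, which is the symptom the one-line fix addresses.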


* Re: [GIT PULL] timer fix
  2019-01-17 15:58     ` Heiko Carstens
@ 2019-01-17 16:57       ` Thomas Gleixner
  0 siblings, 0 replies; 156+ messages in thread
From: Thomas Gleixner @ 2019-01-17 16:57 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Ingo Molnar, Linus Torvalds, linux-kernel, Peter Zijlstra,
	Andrew Morton, Stefan Liebler

On Thu, 17 Jan 2019, Heiko Carstens wrote:
> On Thu, Jan 17, 2019 at 10:51:02AM +0100, Ingo Molnar wrote:
> > 
> > * Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > 
> > > > -	if (timr->it_requeue_pending == info->si_sys_private) {
> > > > +	if (timr->it_interval && timr->it_requeue_pending == info->si_sys_private) {
> > > >  		timr->kclock->timer_rearm(timr);
> > > 
> > > FWIW, with this patch the vanilla glibc 2.28 self tests
> > > rt/tst-cputimer1, rt/tst-cputimer2, and rt/tst-cputimer3
> > > start to fail on s390:
> ...
> > > I haven't looked any further into this, just reporting.. otherwise the
> > > test systems seem to be healthy.
> > 
> > Could you please check whether the top commit in tip:timers/urgent fixes 
> > it:
> >   93ad0fc088c5: posix-cpu-timers: Unbreak timer rearming
> 
> Yes, the test cases don't fail anymore. Thanks!
> 
> A general question: since I already reported this last year, was the
> bug report not usable? I understand that the Christmas holidays were in
> between; I am just wondering whether new "glibc test case" failures are
> worth reporting the way I did.

I was on a three-week vacation, and I tend to clean out my inbox when I
return, as it has turned out in the past that playing catch-up is hopeless.
The important stuff comes back by itself :)

Thanks,

	tglx


* Re: [GIT PULL] timer fix
  2019-01-17  9:51   ` Ingo Molnar
@ 2019-01-17 15:58     ` Heiko Carstens
  2019-01-17 16:57       ` Thomas Gleixner
  0 siblings, 1 reply; 156+ messages in thread
From: Heiko Carstens @ 2019-01-17 15:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Andrew Morton, Stefan Liebler

On Thu, Jan 17, 2019 at 10:51:02AM +0100, Ingo Molnar wrote:
> 
> * Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> 
> > > -	if (timr->it_requeue_pending == info->si_sys_private) {
> > > +	if (timr->it_interval && timr->it_requeue_pending == info->si_sys_private) {
> > >  		timr->kclock->timer_rearm(timr);
> > 
> > FWIW, with this patch the vanilla glibc 2.28 self tests
> > rt/tst-cputimer1, rt/tst-cputimer2, and rt/tst-cputimer3
> > start to fail on s390:
...
> > I haven't looked any further into this, just reporting.. otherwise the
> > test systems seem to be healthy.
> 
> Could you please check whether the top commit in tip:timers/urgent fixes 
> it:
>   93ad0fc088c5: posix-cpu-timers: Unbreak timer rearming

Yes, the test cases don't fail anymore. Thanks!

A general question: since I already reported this last year, was the
bug report not usable? I understand that the Christmas holidays were in
between; I am just wondering whether new "glibc test case" failures are
worth reporting the way I did.



* Re: [GIT PULL] timer fix
  2018-12-23 19:29 ` Heiko Carstens
@ 2019-01-17  9:51   ` Ingo Molnar
  2019-01-17 15:58     ` Heiko Carstens
  0 siblings, 1 reply; 156+ messages in thread
From: Ingo Molnar @ 2019-01-17  9:51 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Andrew Morton


* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> > -	if (timr->it_requeue_pending == info->si_sys_private) {
> > +	if (timr->it_interval && timr->it_requeue_pending == info->si_sys_private) {
> >  		timr->kclock->timer_rearm(timr);
> 
> FWIW, with this patch the vanilla glibc 2.28 self tests
> rt/tst-cputimer1, rt/tst-cputimer2, and rt/tst-cputimer3
> start to fail on s390:
> 
> rt/tst-cputimer1.out:
> clock_gettime returned timespec = { 0, 117181 }
> clock_getres returned timespec = { 0, 1 }
> Timed out: killed the child process
> rt/tst-cputimer1.test-result:
> FAIL: rt/tst-cputimer1
> original exit status 1
> 
> rt/tst-cputimer2.out:
> clock_gettime returned timespec = { 0, 9686 }
> clock_getres returned timespec = { 0, 1 }
> Timed out: killed the child process
> rt/tst-cputimer2.test-result:
> FAIL: rt/tst-cputimer2
> original exit status 1
> 
> rt/tst-cputimer3.out:
> clock_gettime returned timespec = { 0, 0 }
> clock_getres returned timespec = { 0, 1 }
> Timed out: killed the child process
> rt/tst-cputimer3.test-result:
> FAIL: rt/tst-cputimer3
> original exit status 1
> 
> I haven't looked any further into this, just reporting.. otherwise the
> test systems seem to be healthy.

Could you please check whether the top commit in tip:timers/urgent fixes 
it:

  93ad0fc088c5: posix-cpu-timers: Unbreak timer rearming

?

It's in tip:master as well and should show up in linux-next tomorrow.

Thanks,

	Ingo


* Re: [GIT PULL] timer fix
  2018-12-21 12:34 Ingo Molnar
  2018-12-21 19:30 ` pr-tracker-bot
@ 2018-12-23 19:29 ` Heiko Carstens
  2019-01-17  9:51   ` Ingo Molnar
  1 sibling, 1 reply; 156+ messages in thread
From: Heiko Carstens @ 2018-12-23 19:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Andrew Morton

On Fri, Dec 21, 2018 at 01:34:53PM +0100, Ingo Molnar wrote:
> Linus,
> 
> Please pull the latest timers-urgent-for-linus git tree from:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus
> 
>    # HEAD: 0e334db6bb4b1fd1e2d72c1f3d8f004313cd9f94 posix-timers: Fix division by zero bug
> 
> Fix a division by zero crash in the posix-timers code.
> 
>  Thanks,
> 
> 	Ingo
> 
> ------------------>
> Thomas Gleixner (1):
>       posix-timers: Fix division by zero bug
> 
> 
>  kernel/time/posix-timers.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
> index bd62b5eeb5a0..31f49ae80f43 100644
> --- a/kernel/time/posix-timers.c
> +++ b/kernel/time/posix-timers.c
> @@ -289,9 +289,6 @@ static void common_hrtimer_rearm(struct k_itimer *timr)
>  {
>  	struct hrtimer *timer = &timr->it.real.timer;
>  
> -	if (!timr->it_interval)
> -		return;
> -
>  	timr->it_overrun += hrtimer_forward(timer, timer->base->get_time(),
>  					    timr->it_interval);
>  	hrtimer_restart(timer);
> @@ -317,7 +314,7 @@ void posixtimer_rearm(struct kernel_siginfo *info)
>  	if (!timr)
>  		return;
>  
> -	if (timr->it_requeue_pending == info->si_sys_private) {
> +	if (timr->it_interval && timr->it_requeue_pending == info->si_sys_private) {
>  		timr->kclock->timer_rearm(timr);

FWIW, with this patch the vanilla glibc 2.28 self tests
rt/tst-cputimer1, rt/tst-cputimer2, and rt/tst-cputimer3
start to fail on s390:

rt/tst-cputimer1.out:
clock_gettime returned timespec = { 0, 117181 }
clock_getres returned timespec = { 0, 1 }
Timed out: killed the child process
rt/tst-cputimer1.test-result:
FAIL: rt/tst-cputimer1
original exit status 1

rt/tst-cputimer2.out:
clock_gettime returned timespec = { 0, 9686 }
clock_getres returned timespec = { 0, 1 }
Timed out: killed the child process
rt/tst-cputimer2.test-result:
FAIL: rt/tst-cputimer2
original exit status 1

rt/tst-cputimer3.out:
clock_gettime returned timespec = { 0, 0 }
clock_getres returned timespec = { 0, 1 }
Timed out: killed the child process
rt/tst-cputimer3.test-result:
FAIL: rt/tst-cputimer3
original exit status 1

I haven't looked any further into this, just reporting.. otherwise the
test systems seem to be healthy.



* Re: [GIT PULL] timer fix
  2018-12-21 12:34 Ingo Molnar
@ 2018-12-21 19:30 ` pr-tracker-bot
  2018-12-23 19:29 ` Heiko Carstens
  1 sibling, 0 replies; 156+ messages in thread
From: pr-tracker-bot @ 2018-12-21 19:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Andrew Morton

The pull request you sent on Fri, 21 Dec 2018 13:34:53 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/e572fa0e840154d33a69622af030dda551eee606

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


* [GIT PULL] timer fix
@ 2018-12-21 12:34 Ingo Molnar
  2018-12-21 19:30 ` pr-tracker-bot
  2018-12-23 19:29 ` Heiko Carstens
  0 siblings, 2 replies; 156+ messages in thread
From: Ingo Molnar @ 2018-12-21 12:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: 0e334db6bb4b1fd1e2d72c1f3d8f004313cd9f94 posix-timers: Fix division by zero bug

Fix a division by zero crash in the posix-timers code.

 Thanks,

	Ingo

------------------>
Thomas Gleixner (1):
      posix-timers: Fix division by zero bug


 kernel/time/posix-timers.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index bd62b5eeb5a0..31f49ae80f43 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -289,9 +289,6 @@ static void common_hrtimer_rearm(struct k_itimer *timr)
 {
 	struct hrtimer *timer = &timr->it.real.timer;
 
-	if (!timr->it_interval)
-		return;
-
 	timr->it_overrun += hrtimer_forward(timer, timer->base->get_time(),
 					    timr->it_interval);
 	hrtimer_restart(timer);
@@ -317,7 +314,7 @@ void posixtimer_rearm(struct kernel_siginfo *info)
 	if (!timr)
 		return;
 
-	if (timr->it_requeue_pending == info->si_sys_private) {
+	if (timr->it_interval && timr->it_requeue_pending == info->si_sys_private) {
 		timr->kclock->timer_rearm(timr);
 
 		timr->it_active = 1;
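
Why a zero interval has to be filtered out before a timer is forwarded can be shown with a simplified userspace model. The struct and function below are hypothetical, and the arithmetic merely stands in for interval forwarding; it does not reproduce hrtimer_forward().

```c
#include <stdint.h>

/* Hypothetical model of an interval timer; times are in nanoseconds. */
struct sk_timer {
	int64_t expires;	/* next expiry time */
	int64_t interval;	/* 0 means one-shot */
	uint64_t overrun;	/* number of periods skipped so far */
};

/*
 * Forward a periodic timer past 'now' and count the skipped periods.
 * Returns 1 if the timer was rearmed and 0 for a one-shot timer.
 * Without the interval check, the division below would divide by zero.
 */
static int sk_rearm(struct sk_timer *t, int64_t now)
{
	if (t->interval == 0)
		return 0;

	if (now >= t->expires) {
		int64_t missed = (now - t->expires) / t->interval + 1;

		t->expires += missed * t->interval;
		t->overrun += (uint64_t)missed;
	}
	return 1;
}
```

For a timer with expires = 100 and interval = 10, a call at now = 125 skips three periods and leaves the expiry at 130; a one-shot timer is left untouched.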


* [GIT PULL] timer fix
@ 2018-03-25  9:00 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2018-03-25  9:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, John Stultz, Peter Zijlstra,
	Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: 19b558db12f9f4e45a22012bae7b4783e62224da posix-timers: Protect posix clock array access against speculation

Make posix clock ID usage Spectre-safe.

 Thanks,

	Ingo

------------------>
Thomas Gleixner (1):
      posix-timers: Protect posix clock array access against speculation


 kernel/time/posix-timers.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 75043046914e..10b7186d0638 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -50,6 +50,7 @@
 #include <linux/export.h>
 #include <linux/hashtable.h>
 #include <linux/compat.h>
+#include <linux/nospec.h>
 
 #include "timekeeping.h"
 #include "posix-timers.h"
@@ -1346,11 +1347,15 @@ static const struct k_clock * const posix_clocks[] = {
 
 static const struct k_clock *clockid_to_kclock(const clockid_t id)
 {
-	if (id < 0)
+	clockid_t idx = id;
+
+	if (id < 0) {
 		return (id & CLOCKFD_MASK) == CLOCKFD ?
 			&clock_posix_dynamic : &clock_posix_cpu;
+	}
 
-	if (id >= ARRAY_SIZE(posix_clocks) || !posix_clocks[id])
+	if (id >= ARRAY_SIZE(posix_clocks))
 		return NULL;
-	return posix_clocks[id];
+
+	return posix_clocks[array_index_nospec(idx, ARRAY_SIZE(posix_clocks))];
 }
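
The clamping idea behind array_index_nospec() can be sketched with branch-free mask arithmetic. This is a userspace illustration only: the sk_* names are hypothetical, the code assumes an arithmetic right shift of negative values (as common compilers provide), and ordinary compiled C like this gives no actual guarantee against speculative execution.

```c
#include <limits.h>
#include <stddef.h>

/* All ones when index < size, zero otherwise, computed without a branch. */
static unsigned long sk_index_mask(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* Clamp an out-of-range index to 0; in-range indices pass through. */
static unsigned long sk_index_nospec(unsigned long index, unsigned long size)
{
	return index & sk_index_mask(index, size);
}

/* Bounds check first, then index through the clamped value. */
static const char *sk_lookup(const char *const *table, unsigned long size,
			     long id)
{
	if (id < 0 || (unsigned long)id >= size)
		return NULL;
	return table[sk_index_nospec((unsigned long)id, size)];
}
```

The bounds check still decides the architectural result; the mask only ensures that the index used for the load stays inside the table even if the check is bypassed speculatively.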


* [GIT PULL] timer fix
@ 2017-09-24 11:25 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2017-09-24 11:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: 8fce3dc5c5d6f6301f67311fa79f333902b58cea clocksource/integrator: Fix section mismatch warning

A clocksource driver section mismatch fix.

 Thanks,

	Ingo

------------------>
Arnd Bergmann (1):
      clocksource/integrator: Fix section mismatch warning


 drivers/clocksource/timer-integrator-ap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/clocksource/timer-integrator-ap.c b/drivers/clocksource/timer-integrator-ap.c
index 2ff64d9d4fb3..62d24690ba02 100644
--- a/drivers/clocksource/timer-integrator-ap.c
+++ b/drivers/clocksource/timer-integrator-ap.c
@@ -36,8 +36,8 @@ static u64 notrace integrator_read_sched_clock(void)
 	return -readl(sched_clk_base + TIMER_VALUE);
 }
 
-static int integrator_clocksource_init(unsigned long inrate,
-				       void __iomem *base)
+static int __init integrator_clocksource_init(unsigned long inrate,
+					      void __iomem *base)
 {
 	u32 ctrl = TIMER_CTRL_ENABLE | TIMER_CTRL_PERIODIC;
 	unsigned long rate = inrate;


* [GIT PULL] timer fix
@ 2017-08-26  7:17 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2017-08-26  7:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: 2fe59f507a65dbd734b990a11ebc7488f6f87a24 timers: Fix excessive granularity of new timers after a nohz idle

Fix a timer granularity handling race+bug, which would manifest itself by 
spuriously increasing timeouts of some timers (from 1 jiffy to ~500 jiffies
in the worst case measured) in certain nohz states.

 Thanks,

	Ingo

------------------>
Nicholas Piggin (1):
      timers: Fix excessive granularity of new timers after a nohz idle


 kernel/time/timer.c | 50 +++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 41 insertions(+), 9 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 8f5d1bf18854..f2674a056c26 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -203,6 +203,7 @@ struct timer_base {
 	bool			migration_enabled;
 	bool			nohz_active;
 	bool			is_idle;
+	bool			must_forward_clk;
 	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
 	struct hlist_head	vectors[WHEEL_SIZE];
 } ____cacheline_aligned;
@@ -856,13 +857,19 @@ get_target_base(struct timer_base *base, unsigned tflags)
 
 static inline void forward_timer_base(struct timer_base *base)
 {
-	unsigned long jnow = READ_ONCE(jiffies);
+	unsigned long jnow;
 
 	/*
-	 * We only forward the base when it's idle and we have a delta between
-	 * base clock and jiffies.
+	 * We only forward the base when we are idle or have just come out of
+	 * idle (must_forward_clk logic), and have a delta between base clock
+	 * and jiffies. In the common case, run_timers will take care of it.
 	 */
-	if (!base->is_idle || (long) (jnow - base->clk) < 2)
+	if (likely(!base->must_forward_clk))
+		return;
+
+	jnow = READ_ONCE(jiffies);
+	base->must_forward_clk = base->is_idle;
+	if ((long)(jnow - base->clk) < 2)
 		return;
 
 	/*
@@ -938,6 +945,11 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
 	 * same array bucket then just return:
 	 */
 	if (timer_pending(timer)) {
+		/*
+		 * The downside of this optimization is that it can result in
+		 * larger granularity than you would get from adding a new
+		 * timer with this expiry.
+		 */
 		if (timer->expires == expires)
 			return 1;
 
@@ -948,6 +960,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
 		 * dequeue/enqueue dance.
 		 */
 		base = lock_timer_base(timer, &flags);
+		forward_timer_base(base);
 
 		clk = base->clk;
 		idx = calc_wheel_index(expires, clk);
@@ -964,6 +977,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
 		}
 	} else {
 		base = lock_timer_base(timer, &flags);
+		forward_timer_base(base);
 	}
 
 	ret = detach_if_pending(timer, base, false);
@@ -991,12 +1005,10 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
 			raw_spin_lock(&base->lock);
 			WRITE_ONCE(timer->flags,
 				   (timer->flags & ~TIMER_BASEMASK) | base->cpu);
+			forward_timer_base(base);
 		}
 	}
 
-	/* Try to forward a stale timer base clock */
-	forward_timer_base(base);
-
 	timer->expires = expires;
 	/*
 	 * If 'idx' was calculated above and the base time did not advance
@@ -1112,6 +1124,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
 		WRITE_ONCE(timer->flags,
 			   (timer->flags & ~TIMER_BASEMASK) | cpu);
 	}
+	forward_timer_base(base);
 
 	debug_activate(timer, timer->expires);
 	internal_add_timer(base, timer);
@@ -1497,10 +1510,16 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
 		if (!is_max_delta)
 			expires = basem + (u64)(nextevt - basej) * TICK_NSEC;
 		/*
-		 * If we expect to sleep more than a tick, mark the base idle:
+		 * If we expect to sleep more than a tick, mark the base idle.
+		 * Also the tick is stopped so any added timer must forward
+		 * the base clk itself to keep granularity small. This idle
+		 * logic is only maintained for the BASE_STD base, deferrable
+		 * timers may still see large granularity skew (by design).
 		 */
-		if ((expires - basem) > TICK_NSEC)
+		if ((expires - basem) > TICK_NSEC) {
+			base->must_forward_clk = true;
 			base->is_idle = true;
+		}
 	}
 	raw_spin_unlock(&base->lock);
 
@@ -1611,6 +1630,19 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h)
 {
 	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
 
+	/*
+	 * must_forward_clk must be cleared before running timers so that any
+	 * timer functions that call mod_timer will not try to forward the
+	 * base. idle trcking / clock forwarding logic is only used with
+	 * BASE_STD timers.
+	 *
+	 * The deferrable base does not do idle tracking at all, so we do
+	 * not forward it. This can result in very large variations in
+	 * granularity for deferrable timers, but they can be deferred for
+	 * long periods due to idle.
+	 */
+	base->must_forward_clk = false;
+
 	__run_timers(base);
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active)
 		__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
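
How a stale base clock inflates the granularity of a new timer can be sketched with a toy timer wheel. The constants below are assumptions chosen to resemble a hierarchical wheel with 64 buckets per level and an 8x coarser step per level; the sk_* names are hypothetical and this is not the kernel's calc_wheel_index().

```c
/* Assumed wheel geometry for this sketch. */
#define SK_LVL_CLK_SHIFT	3	/* each level is 8x coarser */
#define SK_LVL_BITS		6	/* 64 buckets per level */
#define SK_LVL_SIZE		(1UL << SK_LVL_BITS)
#define SK_LVL_DEPTH		8

/* First delta (in jiffies) that no longer fits into level n - 1. */
static unsigned long sk_lvl_start(unsigned int n)
{
	return (SK_LVL_SIZE - 1) << ((n - 1) * SK_LVL_CLK_SHIFT);
}

/*
 * Granularity, in jiffies, of the bucket a timer lands in. It depends on
 * the distance between the expiry and the base clock, not on the distance
 * between the expiry and the real current time.
 */
static unsigned long sk_granularity(unsigned long expires, unsigned long clk)
{
	unsigned long delta = expires - clk;
	unsigned int level = 0;

	while (level + 1 < SK_LVL_DEPTH && delta >= sk_lvl_start(level + 1))
		level++;
	return 1UL << (level * SK_LVL_CLK_SHIFT);
}
```

In this model a timer set one jiffy ahead gets a granularity of 1 when the base clock is current, 8 when the clock is 500 jiffies stale, and 64 when it is 1000 jiffies stale, which is why the base is forwarded before the timer is enqueued.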


* [GIT PULL] timer fix
@ 2017-07-21 10:21 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2017-07-21 10:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: 32f2fea6e77e64cd4045ec2d5deb879aada3b476 clocksource/drivers/timer-of: Handle of_irq_get_byname() result correctly

A timer_irq_init() clocksource API robustness fix.

 Thanks,

	Ingo

------------------>
Sergei Shtylyov (1):
      clocksource/drivers/timer-of: Handle of_irq_get_byname() result correctly


 drivers/clocksource/timer-of.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/clocksource/timer-of.c b/drivers/clocksource/timer-of.c
index f6e7491c873c..d509b500a7b5 100644
--- a/drivers/clocksource/timer-of.c
+++ b/drivers/clocksource/timer-of.c
@@ -41,8 +41,16 @@ static __init int timer_irq_init(struct device_node *np,
 	struct timer_of *to = container_of(of_irq, struct timer_of, of_irq);
 	struct clock_event_device *clkevt = &to->clkevt;
 
-	of_irq->irq = of_irq->name ? of_irq_get_byname(np, of_irq->name):
-		irq_of_parse_and_map(np, of_irq->index);
+	if (of_irq->name) {
+		of_irq->irq = ret = of_irq_get_byname(np, of_irq->name);
+		if (ret < 0) {
+			pr_err("Failed to get interrupt %s for %s\n",
+			       of_irq->name, np->full_name);
+			return ret;
+		}
+	} else {
+		of_irq->irq = irq_of_parse_and_map(np, of_irq->index);
+	}
 	if (!of_irq->irq) {
 		pr_err("Failed to map interrupt for %s\n", np->full_name);
 		return -EINVAL;
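
The patch above turns on the two lookup helpers reporting failure differently: of_irq_get_byname() returns a negative errno, while irq_of_parse_and_map() returns 0. Before the fix, a negative value stored in of_irq->irq passed the `!of_irq->irq` test. A minimal userspace sketch of the patched control flow follows; the demo_* helpers and their return values are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in: named lookup, returns an IRQ number or a negative errno. */
static int demo_irq_get_byname(const char *name)
{
	return (name[0] == 't') ? 42 : -ENODEV;
}

/* Hypothetical stand-in: indexed lookup, returns an IRQ number or 0 on failure. */
static unsigned int demo_irq_parse_and_map(int index)
{
	return (index == 0) ? 17 : 0;
}

/* Mirrors the patched flow: each helper is checked by its own convention. */
static int demo_timer_irq_init(const char *name, int index, int *irq_out)
{
	int irq;

	assert(irq_out != NULL);

	if (name) {
		irq = demo_irq_get_byname(name);
		if (irq < 0)
			return irq;	/* propagate the errno unchanged */
	} else {
		irq = (int)demo_irq_parse_and_map(index);
	}
	if (!irq)
		return -EINVAL;		/* 0 means "no mapping" on either path */

	*irq_out = irq;
	return 0;
}
```

With the old single `!irq` test, the "bogus name" case would have continued with a negative IRQ number instead of returning an error.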


* [GIT PULL] timer fix
@ 2017-05-12  7:35 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2017-05-12  7:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: f63d947c1673930bfc5f2f9bd1073a02c179a890 clocksource/arm_arch_timer: Fix arch_timer_mem_find_best_frame()

A single ARM Juno clocksource driver fix.

 Thanks,

	Ingo

------------------>
Sudeep Holla (1):
      clocksource/arm_arch_timer: Fix arch_timer_mem_find_best_frame()


 drivers/clocksource/arm_arch_timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index a1fb918b8021..4bed671e490e 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -1268,7 +1268,7 @@ arch_timer_mem_find_best_frame(struct arch_timer_mem *timer_mem)
 		pr_err("Unable to find a suitable frame in timer @ %pa\n",
 			&timer_mem->cntctlbase);
 
-	return frame;
+	return best_frame;
 }
 
 static int __init


* [GIT PULL] timer fix
@ 2017-01-18  9:37 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2017-01-18  9:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: bc7c36eedb0c7004aa06c2afc3c5385adada8fa3 clocksource/exynos_mct: Clear interrupt when cpu is shut down

Fix a crash in the ARM-Exynos clocksource driver, triggered by CPU hotplug 
operations.

 Thanks,

	Ingo

------------------>
Joonyoung Shim (1):
      clocksource/exynos_mct: Clear interrupt when cpu is shut down


 drivers/clocksource/exynos_mct.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
index 4da1dc2278bd..670ff0f25b67 100644
--- a/drivers/clocksource/exynos_mct.c
+++ b/drivers/clocksource/exynos_mct.c
@@ -495,6 +495,7 @@ static int exynos4_mct_dying_cpu(unsigned int cpu)
 	if (mct_int_type == MCT_INT_SPI) {
 		if (evt->irq != -1)
 			disable_irq_nosync(evt->irq);
+		exynos4_mct_write(0x1, mevt->base + MCT_L_INT_CSTAT_OFFSET);
 	} else {
 		disable_percpu_irq(mct_irqs[MCT_L0_IRQ]);
 	}


* [GIT PULL] timer fix
@ 2016-12-23 22:53 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2016-12-23 22:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: c9435f35ae64ee162555a82b6a3586b160093957 clocksource/drivers/moxart: Plug memory and mapping leaks

ARM/MOXA SoC clocksource driver fixes.

 Thanks,

	Ingo

------------------>
Sudip Mukherjee (1):
      clocksource/drivers/moxart: Plug memory and mapping leaks


 drivers/clocksource/moxart_timer.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/clocksource/moxart_timer.c b/drivers/clocksource/moxart_timer.c
index 2a8f4705c734..7f3430654fbd 100644
--- a/drivers/clocksource/moxart_timer.c
+++ b/drivers/clocksource/moxart_timer.c
@@ -161,19 +161,22 @@ static int __init moxart_timer_init(struct device_node *node)
 	timer->base = of_iomap(node, 0);
 	if (!timer->base) {
 		pr_err("%s: of_iomap failed\n", node->full_name);
-		return -ENXIO;
+		ret = -ENXIO;
+		goto out_free;
 	}
 
 	irq = irq_of_parse_and_map(node, 0);
 	if (irq <= 0) {
 		pr_err("%s: irq_of_parse_and_map failed\n", node->full_name);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto out_unmap;
 	}
 
 	clk = of_clk_get(node, 0);
 	if (IS_ERR(clk))  {
 		pr_err("%s: of_clk_get failed\n", node->full_name);
-		return PTR_ERR(clk);
+		ret = PTR_ERR(clk);
+		goto out_unmap;
 	}
 
 	pclk = clk_get_rate(clk);
@@ -186,7 +189,8 @@ static int __init moxart_timer_init(struct device_node *node)
 		timer->t1_disable_val = ASPEED_TIMER1_DISABLE;
 	} else {
 		pr_err("%s: unknown platform\n", node->full_name);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto out_unmap;
 	}
 
 	timer->count_per_tick = DIV_ROUND_CLOSEST(pclk, HZ);
@@ -208,14 +212,14 @@ static int __init moxart_timer_init(struct device_node *node)
 				    clocksource_mmio_readl_down);
 	if (ret) {
 		pr_err("%s: clocksource_mmio_init failed\n", node->full_name);
-		return ret;
+		goto out_unmap;
 	}
 
 	ret = request_irq(irq, moxart_timer_interrupt, IRQF_TIMER,
 			  node->name, &timer->clkevt);
 	if (ret) {
 		pr_err("%s: setup_irq failed\n", node->full_name);
-		return ret;
+		goto out_unmap;
 	}
 
 	/* Clear match registers */
@@ -241,6 +245,12 @@ static int __init moxart_timer_init(struct device_node *node)
 	clockevents_config_and_register(&timer->clkevt, pclk, 0x4, 0xfffffffe);
 
 	return 0;
+
+out_unmap:
+	iounmap(timer->base);
+out_free:
+	kfree(timer);
+	return ret;
 }
 CLOCKSOURCE_OF_DECLARE(moxart, "moxa,moxart-timer", moxart_timer_init);
 CLOCKSOURCE_OF_DECLARE(aspeed, "aspeed,ast2400-timer", moxart_timer_init);
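
The moxart patch replaces early returns with goto labels that unwind in reverse order of acquisition. The same pattern can be sketched in userspace C, assuming malloc() as a stand-in for both kzalloc() and of_iomap(); every demo_* name and the fail_step knob are hypothetical and exist only to make the leak observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's per-device state. */
struct demo_timer {
	void *base;	/* stands in for the of_iomap() mapping */
};

static int live_allocs;	/* outstanding allocations; nonzero after failure = leak */

static void *demo_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* fail_step picks which setup step fails: 0 = none, 1 = mapping, 2 = a later step. */
static int demo_timer_init(int fail_step, struct demo_timer **out)
{
	struct demo_timer *timer;
	int ret;

	assert(out != NULL);

	timer = demo_alloc(sizeof(*timer));
	if (!timer)
		return -ENOMEM;

	timer->base = (fail_step == 1) ? NULL : demo_alloc(64);
	if (!timer->base) {
		ret = -ENXIO;
		goto out_free;		/* nothing mapped yet, only free the struct */
	}

	if (fail_step == 2) {
		ret = -EINVAL;
		goto out_unmap;		/* undo the mapping, then fall through */
	}

	*out = timer;
	return 0;

out_unmap:
	demo_free(timer->base);
out_free:
	demo_free(timer);
	return ret;
}
```

The labels are ordered so that a later failure falls through every earlier cleanup step, which is why out_unmap sits above out_free.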


* [GIT PULL] timer fix
@ 2016-10-18 11:18 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2016-10-18 11:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: 54e23845e965898f65f76aba79fa9db76d830fa9 alarmtimer: Remove unused but set variable

Remove an unused variable.

 Thanks,

	Ingo

------------------>
Tobias Klauser (1):
      alarmtimer: Remove unused but set variable


 kernel/time/alarmtimer.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index c3aad685bbc0..12dd190634ab 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -542,7 +542,6 @@ static int alarm_clock_get(clockid_t which_clock, struct timespec *tp)
 static int alarm_timer_create(struct k_itimer *new_timer)
 {
 	enum  alarmtimer_type type;
-	struct alarm_base *base;
 
 	if (!alarmtimer_get_rtcdev())
 		return -ENOTSUPP;
@@ -551,7 +550,6 @@ static int alarm_timer_create(struct k_itimer *new_timer)
 		return -EPERM;
 
 	type = clock2alarm(new_timer->it_clock);
-	base = &alarm_bases[type];
 	alarm_init(&new_timer->it.alarm.alarmtimer, type, alarm_handle_timer);
 	return 0;
 }


* [GIT PULL] timer fix
@ 2016-07-13 12:58 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2016-07-13 12:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: 2c13ce8f6b2f6fd9ba2f9261b1939fc0f62d1307 posix_cpu_timer: Exit early when process has been reaped

A single fix for a posix CPU timers bug.

 Thanks,

	Ingo

------------------>
Alexey Dobriyan (1):
      posix_cpu_timer: Exit early when process has been reaped


 kernel/time/posix-cpu-timers.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 1cafba860b08..39008d78927a 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -777,6 +777,7 @@ static void posix_cpu_timer_get(struct k_itimer *timer, struct itimerspec *itp)
 			timer->it.cpu.expires = 0;
 			sample_to_timespec(timer->it_clock, timer->it.cpu.expires,
 					   &itp->it_value);
+			return;
 		} else {
 			cpu_timer_sample_group(timer->it_clock, p, &now);
 			unlock_task_sighand(p, &flags);


* [GIT PULL] timer fix
@ 2016-04-23 11:34 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2016-04-23 11:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: 16eeed7e5558a3dcf30f75526a896b2632f299f9 clocksource/drivers/tango-xtal: Fix boot hang due to incorrect test

Fix a boot hang in the ARM based Tango SoC clocksource driver.

 Thanks,

	Ingo

------------------>
Daniel Lezcano (1):
      clocksource/drivers/tango-xtal: Fix boot hang due to incorrect test


 drivers/clocksource/tango_xtal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/tango_xtal.c b/drivers/clocksource/tango_xtal.c
index 2bcecafdeaea..c407c47a3232 100644
--- a/drivers/clocksource/tango_xtal.c
+++ b/drivers/clocksource/tango_xtal.c
@@ -42,7 +42,7 @@ static void __init tango_clocksource_init(struct device_node *np)
 
 	ret = clocksource_mmio_init(xtal_in_cnt, "tango-xtal", xtal_freq, 350,
 				    32, clocksource_mmio_readl_up);
-	if (!ret) {
+	if (ret) {
 		pr_err("%s: registration failed\n", np->full_name);
 		return;
 	}


* [GIT PULL] timer fix
@ 2015-08-14  7:13 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2015-08-14  7:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: 54d46b7fbcbd00fe4b20a27208e5909facc714e3 clockevents/drivers/sh_cmt: Only perform clocksource suspend/resume if enabled

A single clocksource driver suspend/resume fix.

 Thanks,

	Ingo

------------------>
Geert Uytterhoeven (1):
      clockevents/drivers/sh_cmt: Only perform clocksource suspend/resume if enabled


 drivers/clocksource/sh_cmt.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/clocksource/sh_cmt.c b/drivers/clocksource/sh_cmt.c
index b8ff3c64cc45..c96de14036a0 100644
--- a/drivers/clocksource/sh_cmt.c
+++ b/drivers/clocksource/sh_cmt.c
@@ -661,6 +661,9 @@ static void sh_cmt_clocksource_suspend(struct clocksource *cs)
 {
 	struct sh_cmt_channel *ch = cs_to_sh_cmt(cs);
 
+	if (!ch->cs_enabled)
+		return;
+
 	sh_cmt_stop(ch, FLAG_CLOCKSOURCE);
 	pm_genpd_syscore_poweroff(&ch->cmt->pdev->dev);
 }
@@ -669,6 +672,9 @@ static void sh_cmt_clocksource_resume(struct clocksource *cs)
 {
 	struct sh_cmt_channel *ch = cs_to_sh_cmt(cs);
 
+	if (!ch->cs_enabled)
+		return;
+
 	pm_genpd_syscore_poweron(&ch->cmt->pdev->dev);
 	sh_cmt_start(ch, FLAG_CLOCKSOURCE);
 }


* [GIT PULL] timer fix
@ 2015-07-18  3:06 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2015-07-18  3:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: 0f44705175347ec96935d60b765b5d14ecc763bb tick: Move the export of tick_broadcast_oneshot_control to the proper place

Fix for a misplaced export that can cause build failures in certain (rare) Kconfig 
situations.

 Thanks,

	Ingo

------------------>
Thomas Gleixner (1):
      tick: Move the export of tick_broadcast_oneshot_control to the proper place


 kernel/time/tick-broadcast.c | 1 -
 kernel/time/tick-common.c    | 1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 52b9e199b5ac..f6aae7977824 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -839,7 +839,6 @@ int __tick_broadcast_oneshot_control(enum tick_broadcast_state state)
 	raw_spin_unlock(&tick_broadcast_lock);
 	return ret;
 }
-EXPORT_SYMBOL_GPL(tick_broadcast_oneshot_control);
 
 /*
  * Reset the one shot broadcast for a cpu
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 55e13efff1ab..f8bf47571dda 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -363,6 +363,7 @@ int tick_broadcast_oneshot_control(enum tick_broadcast_state state)
 
 	return __tick_broadcast_oneshot_control(state);
 }
+EXPORT_SYMBOL_GPL(tick_broadcast_oneshot_control);
 
 #ifdef CONFIG_HOTPLUG_CPU
 /*


* [GIT PULL] timer fix
@ 2015-02-06 18:38 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2015-02-06 18:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: 2d926c15d629a13914ce3e5f26354f6a0ac99e70 hrtimer: Fix incorrect tai offset calculation for non high-res timer systems

A CLOCK_TAI early expiry fix.

 Thanks,

	Ingo

------------------>
John Stultz (1):
      hrtimer: Fix incorrect tai offset calculation for non high-res timer systems


 kernel/time/hrtimer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 37e50aadd471..d8c724cda37b 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -122,7 +122,7 @@ static void hrtimer_get_softirq_time(struct hrtimer_cpu_base *base)
 	mono = ktime_get_update_offsets_tick(&off_real, &off_boot, &off_tai);
 	boot = ktime_add(mono, off_boot);
 	xtim = ktime_add(mono, off_real);
-	tai = ktime_add(xtim, off_tai);
+	tai = ktime_add(mono, off_tai);
 
 	base->clock_base[HRTIMER_BASE_REALTIME].softirq_time = xtim;
 	base->clock_base[HRTIMER_BASE_MONOTONIC].softirq_time = mono;
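
The one-line change above is easiest to see as arithmetic. Assuming, as the fix implies, that off_tai is already an offset from the monotonic clock (just like off_real), adding it to xtim counts off_real twice. A small sketch with plain 64-bit nanosecond values; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical offsets in nanoseconds, both relative to CLOCK_MONOTONIC. */
struct demo_offsets {
	int64_t off_real;	/* REALTIME minus MONOTONIC */
	int64_t off_tai;	/* TAI minus MONOTONIC */
};

/* Patched computation: TAI is derived directly from the monotonic time. */
static int64_t demo_tai_fixed(int64_t mono, const struct demo_offsets *o)
{
	return mono + o->off_tai;
}

/* Pre-patch computation: starts from realtime, so off_real is added twice. */
static int64_t demo_tai_buggy(int64_t mono, const struct demo_offsets *o)
{
	int64_t xtim = mono + o->off_real;

	return xtim + o->off_tai;
}
```

Under that assumption the pre-patch value is too large by exactly off_real, so the softirq time for the TAI base runs far ahead and timers on it look expired early, which matches the "early expiry" description in the pull request.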


* [GIT PULL] timer fix
@ 2014-03-29 18:44 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2014-03-29 18:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Thomas Gleixner, John Stultz, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: cab5e127eef040399902caa8e1510795583fa03a time: Revert to calling clock_was_set_delayed() while in irq context

A late-breaking fix from John. (The bug being fixed could potentially cause
a hard lockup; no lockup was observed, only warnings.)

 Thanks,

	Ingo

------------------>
John Stultz (1):
      time: Revert to calling clock_was_set_delayed() while in irq context


 kernel/time/timekeeping.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 0aa4ce8..5b40279 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1435,7 +1435,8 @@ void update_wall_time(void)
 out:
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 	if (clock_set)
-		clock_was_set();
+		/* Have to call _delayed version, since in irq context */
+		clock_was_set_delayed();
 }
 
 /**


* [GIT PULL] timer fix
@ 2014-01-15 18:27 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2014-01-15 18:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, H. Peter Anvin,
	Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: e59da0aedb573a347fa501fa63d3ff5055aa1bc7 Merge branch 'clockevents/3.13-fixes' of git://git.linaro.org/people/daniel.lezcano/linux into timers/urgent

It contains a crash fix for the ARM Cadence TTC clock driver.

 Thanks,

	Ingo

------------------>
Soren Brinkmann (1):
      clocksource: cadence_ttc: Fix mutex taken inside interrupt context


 drivers/clocksource/cadence_ttc_timer.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/clocksource/cadence_ttc_timer.c b/drivers/clocksource/cadence_ttc_timer.c
index b2bb3a4b..a92350b 100644
--- a/drivers/clocksource/cadence_ttc_timer.c
+++ b/drivers/clocksource/cadence_ttc_timer.c
@@ -67,11 +67,13 @@
  * struct ttc_timer - This definition defines local timer structure
  *
  * @base_addr:	Base address of timer
+ * @freq:	Timer input clock frequency
  * @clk:	Associated clock source
  * @clk_rate_change_nb	Notifier block for clock rate changes
  */
 struct ttc_timer {
 	void __iomem *base_addr;
+	unsigned long freq;
 	struct clk *clk;
 	struct notifier_block clk_rate_change_nb;
 };
@@ -196,9 +198,8 @@ static void ttc_set_mode(enum clock_event_mode mode,
 
 	switch (mode) {
 	case CLOCK_EVT_MODE_PERIODIC:
-		ttc_set_interval(timer,
-				DIV_ROUND_CLOSEST(clk_get_rate(ttce->ttc.clk),
-					PRESCALE * HZ));
+		ttc_set_interval(timer, DIV_ROUND_CLOSEST(ttce->ttc.freq,
+						PRESCALE * HZ));
 		break;
 	case CLOCK_EVT_MODE_ONESHOT:
 	case CLOCK_EVT_MODE_UNUSED:
@@ -273,6 +274,8 @@ static void __init ttc_setup_clocksource(struct clk *clk, void __iomem *base)
 		return;
 	}
 
+	ttccs->ttc.freq = clk_get_rate(ttccs->ttc.clk);
+
 	ttccs->ttc.clk_rate_change_nb.notifier_call =
 		ttc_rate_change_clocksource_cb;
 	ttccs->ttc.clk_rate_change_nb.next = NULL;
@@ -298,16 +301,14 @@ static void __init ttc_setup_clocksource(struct clk *clk, void __iomem *base)
 	__raw_writel(CNT_CNTRL_RESET,
 		     ttccs->ttc.base_addr + TTC_CNT_CNTRL_OFFSET);
 
-	err = clocksource_register_hz(&ttccs->cs,
-			clk_get_rate(ttccs->ttc.clk) / PRESCALE);
+	err = clocksource_register_hz(&ttccs->cs, ttccs->ttc.freq / PRESCALE);
 	if (WARN_ON(err)) {
 		kfree(ttccs);
 		return;
 	}
 
 	ttc_sched_clock_val_reg = base + TTC_COUNT_VAL_OFFSET;
-	setup_sched_clock(ttc_sched_clock_read, 16,
-			clk_get_rate(ttccs->ttc.clk) / PRESCALE);
+	setup_sched_clock(ttc_sched_clock_read, 16, ttccs->ttc.freq / PRESCALE);
 }
 
 static int ttc_rate_change_clockevent_cb(struct notifier_block *nb,
@@ -334,6 +335,9 @@ static int ttc_rate_change_clockevent_cb(struct notifier_block *nb,
 				ndata->new_rate / PRESCALE);
 		local_irq_restore(flags);
 
+		/* update cached frequency */
+		ttc->freq = ndata->new_rate;
+
 		/* fall through */
 	}
 	case PRE_RATE_CHANGE:
@@ -367,6 +371,7 @@ static void __init ttc_setup_clockevent(struct clk *clk,
 	if (clk_notifier_register(ttcce->ttc.clk,
 				&ttcce->ttc.clk_rate_change_nb))
 		pr_warn("Unable to register clock notifier.\n");
+	ttcce->ttc.freq = clk_get_rate(ttcce->ttc.clk);
 
 	ttcce->ttc.base_addr = base;
 	ttcce->ce.name = "ttc_clockevent";
@@ -396,7 +401,7 @@ static void __init ttc_setup_clockevent(struct clk *clk,
 	}
 
 	clockevents_config_and_register(&ttcce->ce,
-			clk_get_rate(ttcce->ttc.clk) / PRESCALE, 1, 0xfffe);
+			ttcce->ttc.freq / PRESCALE, 1, 0xfffe);
 }
 
 /**


* [GIT PULL] timer fix
@ 2013-10-26 12:27 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2013-10-26 12:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   # HEAD: 97b9410643475d6557d2517c2aff9fd2221141a9 clockevents: Sanitize ticks to nsec conversion

This tree contains a clockevents regression fix for certain ARM 
subarchitectures.

 Thanks,

	Ingo

------------------>
Thomas Gleixner (1):
      clockevents: Sanitize ticks to nsec conversion


 kernel/time/clockevents.c | 65 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 50 insertions(+), 15 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 38959c8..662c579 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -33,29 +33,64 @@ struct ce_unbind {
 	int res;
 };
 
-/**
- * clockevents_delta2ns - Convert a latch value (device ticks) to nanoseconds
- * @latch:	value to convert
- * @evt:	pointer to clock event device descriptor
- *
- * Math helper, returns latch value converted to nanoseconds (bound checked)
- */
-u64 clockevent_delta2ns(unsigned long latch, struct clock_event_device *evt)
+static u64 cev_delta2ns(unsigned long latch, struct clock_event_device *evt,
+			bool ismax)
 {
 	u64 clc = (u64) latch << evt->shift;
+	u64 rnd;
 
 	if (unlikely(!evt->mult)) {
 		evt->mult = 1;
 		WARN_ON(1);
 	}
+	rnd = (u64) evt->mult - 1;
+
+	/*
+	 * Upper bound sanity check. If the backwards conversion is
+	 * not equal latch, we know that the above shift overflowed.
+	 */
+	if ((clc >> evt->shift) != (u64)latch)
+		clc = ~0ULL;
+
+	/*
+	 * Scaled math oddities:
+	 *
+	 * For mult <= (1 << shift) we can safely add mult - 1 to
+	 * prevent integer rounding loss. So the backwards conversion
+	 * from nsec to device ticks will be correct.
+	 *
+	 * For mult > (1 << shift), i.e. device frequency is > 1GHz we
+	 * need to be careful. Adding mult - 1 will result in a value
+	 * which when converted back to device ticks can be larger
+	 * than latch by up to (mult - 1) >> shift. For the min_delta
+	 * calculation we still want to apply this in order to stay
+	 * above the minimum device ticks limit. For the upper limit
+	 * we would end up with a latch value larger than the upper
+	 * limit of the device, so we omit the add to stay below the
+	 * device upper boundary.
+	 *
+	 * Also omit the add if it would overflow the u64 boundary.
+	 */
+	if ((~0ULL - clc > rnd) &&
+	    (!ismax || evt->mult <= (1U << evt->shift)))
+		clc += rnd;
 
 	do_div(clc, evt->mult);
-	if (clc < 1000)
-		clc = 1000;
-	if (clc > KTIME_MAX)
-		clc = KTIME_MAX;
 
-	return clc;
+	/* Deltas less than 1usec are pointless noise */
+	return clc > 1000 ? clc : 1000;
+}
+
+/**
+ * clockevent_delta2ns - Convert a latch value (device ticks) to nanoseconds
+ * @latch:	value to convert
+ * @evt:	pointer to clock event device descriptor
+ *
+ * Math helper, returns latch value converted to nanoseconds (bound checked)
+ */
+u64 clockevent_delta2ns(unsigned long latch, struct clock_event_device *evt)
+{
+	return cev_delta2ns(latch, evt, false);
 }
 EXPORT_SYMBOL_GPL(clockevent_delta2ns);
 
@@ -380,8 +415,8 @@ void clockevents_config(struct clock_event_device *dev, u32 freq)
 		sec = 600;
 
 	clockevents_calc_mult_shift(dev, freq, sec);
-	dev->min_delta_ns = clockevent_delta2ns(dev->min_delta_ticks, dev);
-	dev->max_delta_ns = clockevent_delta2ns(dev->max_delta_ticks, dev);
+	dev->min_delta_ns = cev_delta2ns(dev->min_delta_ticks, dev, false);
+	dev->max_delta_ns = cev_delta2ns(dev->max_delta_ticks, dev, true);
 }
 
 /**
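
The "scaled math oddities" comment in the patch above can be checked numerically. What follows is a userspace transcription of cev_delta2ns(), with do_div() replaced by plain 64-bit division and the zero-mult WARN_ON replaced by an assert. The demo_ns2ticks() helper is an assumption added here to model the backwards conversion (ns * mult) >> shift; it is not part of the patch. Like the original, the sketch assumes shift < 32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cut-down clock event device: only the scaling factors. */
struct demo_evt {
	uint32_t mult;
	uint32_t shift;
};

/* Transcription of the patched helper: device ticks -> nanoseconds. */
static uint64_t demo_delta2ns(unsigned long latch, const struct demo_evt *evt,
			      bool ismax)
{
	uint64_t clc = (uint64_t)latch << evt->shift;
	uint64_t rnd;

	assert(evt->mult != 0);
	rnd = (uint64_t)evt->mult - 1;

	/* If shifting back does not give latch, the shift overflowed. */
	if ((clc >> evt->shift) != (uint64_t)latch)
		clc = ~0ULL;

	/* Round up unless it would overflow or push the max past the limit. */
	if ((~0ULL - clc > rnd) &&
	    (!ismax || evt->mult <= (1U << evt->shift)))
		clc += rnd;

	clc /= evt->mult;

	/* Deltas below 1 usec are clamped, as in the patch. */
	return clc > 1000 ? clc : 1000;
}

/* Assumed backwards conversion: nanoseconds -> device ticks. */
static uint64_t demo_ns2ticks(uint64_t ns, const struct demo_evt *evt)
{
	return (ns * evt->mult) >> evt->shift;
}
```

With mult = 3 and shift = 10 (mult below 1 << shift), a latch of 100 ticks converts to 34134 ns and back to exactly 100 ticks; without the rounding add it would come back as 99. With mult = 5000 and shift = 10 (mult above 1 << shift), converting 65535 ticks with ismax set round-trips to 65532, which stays below the device limit, while the rounded variant round-trips to 65537 and would exceed it.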


* [GIT PULL] timer fix
@ 2013-09-18 16:22 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2013-09-18 16:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

   HEAD: 7bd36014460f793c19e7d6c94dab67b0afcfcb7f timekeeping: Fix HRTICK related deadlock from ntp lock changes

An NTP related lockup fix.

 Thanks,

	Ingo

------------------>
John Stultz (1):
      timekeeping: Fix HRTICK related deadlock from ntp lock changes


 include/linux/timex.h     | 1 +
 kernel/time/ntp.c         | 6 ++----
 kernel/time/timekeeping.c | 2 ++
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/timex.h b/include/linux/timex.h
index b3726e6..dd3edd7 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -141,6 +141,7 @@ extern int do_adjtimex(struct timex *);
 extern void hardpps(const struct timespec *, const struct timespec *);
 
 int read_current_timer(unsigned long *timer_val);
+void ntp_notify_cmos_timer(void);
 
 /* The clock frequency of the i8253/i8254 PIT */
 #define PIT_TICK_RATE 1193182ul
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 8f5b3b9..bb22151 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -516,13 +516,13 @@ static void sync_cmos_clock(struct work_struct *work)
 	schedule_delayed_work(&sync_cmos_work, timespec_to_jiffies(&next));
 }
 
-static void notify_cmos_timer(void)
+void ntp_notify_cmos_timer(void)
 {
 	schedule_delayed_work(&sync_cmos_work, 0);
 }
 
 #else
-static inline void notify_cmos_timer(void) { }
+void ntp_notify_cmos_timer(void) { }
 #endif
 
 
@@ -687,8 +687,6 @@ int __do_adjtimex(struct timex *txc, struct timespec *ts, s32 *time_tai)
 	if (!(time_status & STA_NANO))
 		txc->time.tv_usec /= NSEC_PER_USEC;
 
-	notify_cmos_timer();
-
 	return result;
 }
 
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 48b9fff..947ba25 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1703,6 +1703,8 @@ int do_adjtimex(struct timex *txc)
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
+	ntp_notify_cmos_timer();
+
 	return ret;
 }
 


* [GIT PULL] timer fix
@ 2011-04-29 18:11 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2011-04-29 18:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timer-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git timer-fixes-for-linus

This fixes the scheduler bug discussed in the:

 "2.6.39-rc4+: Kernel leaking memory during FS scanning, regression?"

lkml thread.

 Thanks,

	Ingo

------------------>
Thomas Gleixner (1):
      hrtimer: Initialize CLOCK_ID to HRTIMER_BASE table statically

Zhangfei Gao (1):
      rtc: max8925: Call dev_set_drvdata before rtc_device_register


 drivers/rtc/rtc-max8925.c |    3 ++-
 kernel/hrtimer.c          |   10 +++++-----
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/rtc/rtc-max8925.c b/drivers/rtc/rtc-max8925.c
index 174036d..20494b5 100644
--- a/drivers/rtc/rtc-max8925.c
+++ b/drivers/rtc/rtc-max8925.c
@@ -257,6 +257,8 @@ static int __devinit max8925_rtc_probe(struct platform_device *pdev)
 		goto out_irq;
 	}
 
+	dev_set_drvdata(&pdev->dev, info);
+
 	info->rtc_dev = rtc_device_register("max8925-rtc", &pdev->dev,
 					&max8925_rtc_ops, THIS_MODULE);
 	ret = PTR_ERR(info->rtc_dev);
@@ -265,7 +267,6 @@ static int __devinit max8925_rtc_probe(struct platform_device *pdev)
 		goto out_rtc;
 	}
 
-	dev_set_drvdata(&pdev->dev, info);
 	platform_set_drvdata(pdev, info);
 
 	return 0;
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 9017478..87fdb3f 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -81,7 +81,11 @@ DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) =
 	}
 };
 
-static int hrtimer_clock_to_base_table[MAX_CLOCKS];
+static int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
+	[CLOCK_REALTIME]	= HRTIMER_BASE_REALTIME,
+	[CLOCK_MONOTONIC]	= HRTIMER_BASE_MONOTONIC,
+	[CLOCK_BOOTTIME]	= HRTIMER_BASE_BOOTTIME,
+};
 
 static inline int hrtimer_clockid_to_base(clockid_t clock_id)
 {
@@ -1722,10 +1726,6 @@ static struct notifier_block __cpuinitdata hrtimers_nb = {
 
 void __init hrtimers_init(void)
 {
-	hrtimer_clock_to_base_table[CLOCK_REALTIME] = HRTIMER_BASE_REALTIME;
-	hrtimer_clock_to_base_table[CLOCK_MONOTONIC] = HRTIMER_BASE_MONOTONIC;
-	hrtimer_clock_to_base_table[CLOCK_BOOTTIME] = HRTIMER_BASE_BOOTTIME;
-
 	hrtimer_cpu_notify(&hrtimers_nb, (unsigned long)CPU_UP_PREPARE,
 			  (void *)(long)smp_processor_id());
 	register_cpu_notifier(&hrtimers_nb);
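
The hrtimer change above moves the table setup from hrtimers_init() to a static initializer. The message does not spell out the motivation; a plausible reading is that a lookup made before hrtimers_init() runs would otherwise see an all-zero table. A minimal sketch of a statically initialized lookup table using C99 designated initializers; all DEMO_* names and numeric values are hypothetical.

```c
#include <assert.h>

/* Hypothetical clock ids and base indices, chosen only for illustration. */
enum demo_clock {
	DEMO_CLOCK_REALTIME	= 0,
	DEMO_CLOCK_MONOTONIC	= 1,
	DEMO_CLOCK_BOOTTIME	= 7,
	DEMO_MAX_CLOCKS		= 16
};

enum demo_base {
	DEMO_BASE_REALTIME,
	DEMO_BASE_MONOTONIC,
	DEMO_BASE_BOOTTIME
};

/* Filled in at build time, so it is valid before any init code runs. */
static const int demo_clock_to_base[DEMO_MAX_CLOCKS] = {
	[DEMO_CLOCK_REALTIME]	= DEMO_BASE_REALTIME,
	[DEMO_CLOCK_MONOTONIC]	= DEMO_BASE_MONOTONIC,
	[DEMO_CLOCK_BOOTTIME]	= DEMO_BASE_BOOTTIME,
};

static int demo_clockid_to_base(int clock_id)
{
	assert(clock_id >= 0 && clock_id < DEMO_MAX_CLOCKS);
	return demo_clock_to_base[clock_id];
}
```

Entries that are not named in the initializer are zero, so in this sketch any unlisted clock id maps to base 0.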


* [GIT PULL] timer fix
@ 2011-02-28 17:39 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2011-02-28 17:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git timers-fixes-for-linus

 Thanks,

	Ingo

------------------>
Thomas Gleixner (1):
      clockevents: Prevent oneshot mode when broadcast device is periodic


 kernel/time/tick-broadcast.c |   10 ++++++++++
 kernel/time/tick-common.c    |    6 +++++-
 kernel/time/tick-internal.h  |    3 +++
 3 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 48b2761..a3b5aff 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -600,4 +600,14 @@ int tick_broadcast_oneshot_active(void)
 	return tick_broadcast_device.mode == TICKDEV_MODE_ONESHOT;
 }
 
+/*
+ * Check whether the broadcast device supports oneshot.
+ */
+bool tick_broadcast_oneshot_available(void)
+{
+	struct clock_event_device *bc = tick_broadcast_device.evtdev;
+
+	return bc ? bc->features & CLOCK_EVT_FEAT_ONESHOT : false;
+}
+
 #endif
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 051bc80..ed228ef 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -51,7 +51,11 @@ int tick_is_oneshot_available(void)
 {
 	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
 
-	return dev && (dev->features & CLOCK_EVT_FEAT_ONESHOT);
+	if (!dev || !(dev->features & CLOCK_EVT_FEAT_ONESHOT))
+		return 0;
+	if (!(dev->features & CLOCK_EVT_FEAT_C3STOP))
+		return 1;
+	return tick_broadcast_oneshot_available();
 }
 
 /*
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index 290eefb..f65d3a7 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -36,6 +36,7 @@ extern void tick_shutdown_broadcast_oneshot(unsigned int *cpup);
 extern int tick_resume_broadcast_oneshot(struct clock_event_device *bc);
 extern int tick_broadcast_oneshot_active(void);
 extern void tick_check_oneshot_broadcast(int cpu);
+bool tick_broadcast_oneshot_available(void);
 # else /* BROADCAST */
 static inline void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 {
@@ -46,6 +47,7 @@ static inline void tick_broadcast_switch_to_oneshot(void) { }
 static inline void tick_shutdown_broadcast_oneshot(unsigned int *cpup) { }
 static inline int tick_broadcast_oneshot_active(void) { return 0; }
 static inline void tick_check_oneshot_broadcast(int cpu) { }
+static inline bool tick_broadcast_oneshot_available(void) { return true; }
 # endif /* !BROADCAST */
 
 #else /* !ONESHOT */
@@ -76,6 +78,7 @@ static inline int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
 	return 0;
 }
 static inline int tick_broadcast_oneshot_active(void) { return 0; }
+static inline bool tick_broadcast_oneshot_available(void) { return false; }
 #endif /* !TICK_ONESHOT */
 
 /*
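
For readers following the logic of the tick-common.c hunk, the decision
order can be sketched in plain userspace C. This is only an
illustration: the struct, the flag values and the function names below
are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for illustration; the kernel defines its own. */
#define FEAT_ONESHOT 0x2u
#define FEAT_C3STOP  0x8u

struct clock_dev {
	unsigned int features;
};

/* Stand-in for tick_broadcast_oneshot_available(): no device means no. */
static bool broadcast_oneshot_available(const struct clock_dev *bc)
{
	return bc ? (bc->features & FEAT_ONESHOT) != 0 : false;
}

/* Mirrors the decision order of the patched tick_is_oneshot_available(). */
static bool oneshot_available(const struct clock_dev *dev,
			      const struct clock_dev *bc)
{
	/* The per-CPU device itself must support oneshot mode. */
	if (!dev || !(dev->features & FEAT_ONESHOT))
		return false;
	/* A device that keeps running in deep idle needs no broadcast. */
	if (!(dev->features & FEAT_C3STOP))
		return true;
	/* Otherwise the answer depends on the broadcast device. */
	return broadcast_oneshot_available(bc);
}
```

The point of the third step is that a per-CPU device which stops in
deep idle states is only as capable as the broadcast device backing it.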

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [GIT PULL] timer fix
@ 2011-02-15 17:06 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2011-02-15 17:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git timers-fixes-for-linus

 Thanks,

	Ingo

------------------>
Kees Cook (1):
      timer debug: Hide kernel addresses via %pK in /proc/timer_list


 kernel/time/timer_list.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index 32a19f9..3258455 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -41,7 +41,7 @@ static void print_name_offset(struct seq_file *m, void *sym)
 	char symname[KSYM_NAME_LEN];
 
 	if (lookup_symbol_name((unsigned long)sym, symname) < 0)
-		SEQ_printf(m, "<%p>", sym);
+		SEQ_printf(m, "<%pK>", sym);
 	else
 		SEQ_printf(m, "%s", symname);
 }
@@ -112,7 +112,7 @@ next_one:
 static void
 print_base(struct seq_file *m, struct hrtimer_clock_base *base, u64 now)
 {
-	SEQ_printf(m, "  .base:       %p\n", base);
+	SEQ_printf(m, "  .base:       %pK\n", base);
 	SEQ_printf(m, "  .index:      %d\n",
 			base->index);
 	SEQ_printf(m, "  .resolution: %Lu nsecs\n",

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [GIT PULL] timer fix
@ 2010-01-31 17:26 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2010-01-31 17:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Thomas Gleixner, Andrew Morton

Linus,

Please pull the latest timers-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git timers-fixes-for-linus

 Thanks,

	Ingo

------------------>
Thomas Gleixner (1):
      clocksource: Prevent potential kgdb dead lock


 kernel/time/clocksource.c |   18 +++++++++++++++---
 1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index e85c234..1370083 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -343,7 +343,19 @@ static void clocksource_resume_watchdog(void)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&watchdog_lock, flags);
+	/*
+	 * We use trylock here to avoid a potential dead lock when
+	 * kgdb calls this code after the kernel has been stopped with
+	 * watchdog_lock held. When watchdog_lock is held we just
+	 * return and accept, that the watchdog might trigger and mark
+	 * the monitored clock source (usually TSC) unstable.
+	 *
+	 * This does not affect the other caller clocksource_resume()
+	 * because at this point the kernel is UP, interrupts are
+	 * disabled and nothing can hold watchdog_lock.
+	 */
+	if (!spin_trylock_irqsave(&watchdog_lock, flags))
+		return;
 	clocksource_reset_watchdog();
 	spin_unlock_irqrestore(&watchdog_lock, flags);
 }
@@ -458,8 +470,8 @@ void clocksource_resume(void)
  * clocksource_touch_watchdog - Update watchdog
  *
  * Update the watchdog after exception contexts such as kgdb so as not
- * to incorrectly trip the watchdog.
- *
+ * to incorrectly trip the watchdog. This might fail when the kernel
+ * was stopped in code which holds watchdog_lock.
  */
 void clocksource_touch_watchdog(void)
 {
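
The trylock pattern described in the new comment can be sketched in
userspace with C11 atomics. Everything here is an assumption made for
illustration: try_lock(), unlock() and resume_watchdog() are
hypothetical names, the lock is a plain atomic_flag rather than a
kernel spinlock, and the bool return value exists only so the outcome
is observable (the kernel function returns void).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; not the kernel's spinlock API. */
static atomic_flag watchdog_lock = ATOMIC_FLAG_INIT;
static int watchdog_resets;

static bool try_lock(atomic_flag *lock)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

static void unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Returns true if the watchdog was reset, false if the lock was busy. */
static bool resume_watchdog(void)
{
	/* The holder may be stopped, so give up instead of spinning. */
	if (!try_lock(&watchdog_lock))
		return false;
	watchdog_resets++;
	unlock(&watchdog_lock);
	return true;
}
```

The trade-off is the one the comment names: a skipped reset is
accepted in exchange for never waiting on a lock whose holder cannot
run.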

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [GIT PULL] timer fix
@ 2009-10-02 12:38 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2009-10-02 12:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Thomas Gleixner

Linus,

Please pull the latest timers-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git timers-fixes-for-linus

 Thanks,

	Ingo

------------------>
Roland Dreier (1):
      hrtimer: Remove overly verbose "switch to high res mode" message


 kernel/hrtimer.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index e5d98ce..4267279 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -720,8 +720,6 @@ static int hrtimer_switch_to_hres(void)
 	/* "Retrigger" the interrupt to get things going */
 	retrigger_next_event(NULL);
 	local_irq_restore(flags);
-	printk(KERN_DEBUG "Switched to high resolution mode on CPU %d\n",
-	       smp_processor_id());
 	return 1;
 }
 

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [GIT PULL] timer fix
@ 2009-09-26 12:27 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2009-09-26 12:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Thomas Gleixner, Martin Schwidefsky

Linus,

Please pull the latest timers-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git timers-fixes-for-linus

 Thanks,

	Ingo

------------------>
Martin Schwidefsky (1):
      clocksource: Resume clocksource without taking the clocksource mutex


 kernel/time/clocksource.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 0911334..5e18c6a 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -394,15 +394,11 @@ void clocksource_resume(void)
 {
 	struct clocksource *cs;
 
-	mutex_lock(&clocksource_mutex);
-
 	list_for_each_entry(cs, &clocksource_list, list)
 		if (cs->resume)
 			cs->resume();
 
 	clocksource_resume_watchdog();
-
-	mutex_unlock(&clocksource_mutex);
 }
 
 /**

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [GIT PULL] timer fix
@ 2009-08-09 16:09 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2009-08-09 16:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Thomas Gleixner

Linus,

Please pull the latest timers-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git timers-fixes-for-linus

 Thanks,

	Ingo

------------------>
Stanislaw Gruszka (1):
      posix_cpu_timers_exit_group(): Do not use thread_group_cputimer()


 kernel/posix-cpu-timers.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index bece7c0..e33a21c 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -521,11 +521,12 @@ void posix_cpu_timers_exit(struct task_struct *tsk)
 }
 void posix_cpu_timers_exit_group(struct task_struct *tsk)
 {
-	struct task_cputime cputime;
+	struct signal_struct *const sig = tsk->signal;
 
-	thread_group_cputimer(tsk, &cputime);
 	cleanup_timers(tsk->signal->cpu_timers,
-		       cputime.utime, cputime.stime, cputime.sum_exec_runtime);
+		       cputime_add(tsk->utime, sig->utime),
+		       cputime_add(tsk->stime, sig->stime),
+		       tsk->se.sum_exec_runtime + sig->sum_sched_runtime);
 }
 
 static void clear_dead_task(struct k_itimer *timer, union cpu_time_count now)

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [GIT PULL] timer fix
@ 2009-08-04 19:04 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2009-08-04 19:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra

Linus,

Please pull the latest timers-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git timers-fixes-for-linus

 Thanks,

	Ingo

------------------>
Hiroshi Shimamoto (1):
      posix-timers: Fix oops in clock_nanosleep() with CLOCK_MONOTONIC_RAW


 kernel/posix-timers.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index 052ec4d..d089d05 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -202,6 +202,12 @@ static int no_timer_create(struct k_itimer *new_timer)
 	return -EOPNOTSUPP;
 }
 
+static int no_nsleep(const clockid_t which_clock, int flags,
+		     struct timespec *tsave, struct timespec __user *rmtp)
+{
+	return -EOPNOTSUPP;
+}
+
 /*
  * Return nonzero if we know a priori this clockid_t value is bogus.
  */
@@ -254,6 +260,7 @@ static __init int init_posix_timers(void)
 		.clock_get = posix_get_monotonic_raw,
 		.clock_set = do_posix_clock_nosettime,
 		.timer_create = no_timer_create,
+		.nsleep = no_nsleep,
 	};
 
 	register_posix_clock(CLOCK_REALTIME, &clock_realtime);
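
As a rough illustration of the approach in this patch, a clock that
cannot support an operation can register an explicit stub that refuses
it, so generic code never falls through to a default that does not fit
the clock. The simplified ops table and all names below are
hypothetical, not the kernel's struct k_clock.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified clock operations table. */
struct clock_ops {
	int (*get)(long *out);
	int (*nsleep)(long nsec);
};

static int raw_get(long *out)
{
	*out = 42;	/* placeholder reading */
	return 0;
}

/* Explicit stub: the clock exists but cannot be slept on. */
static int no_nsleep(long nsec)
{
	(void)nsec;
	return -EOPNOTSUPP;
}

static const struct clock_ops monotonic_raw = {
	.get = raw_get,
	.nsleep = no_nsleep,
};

/* Generic caller that dispatches through the table. */
static int clock_sleep(const struct clock_ops *ops, long nsec)
{
	return ops->nsleep(nsec);
}
```

With the stub in place, a caller asking to sleep on this clock gets a
clean -EOPNOTSUPP error back.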

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [GIT PULL] timer fix
@ 2009-06-20 16:55 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2009-06-20 16:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Thomas Gleixner, Andrew Morton

Linus,

Please pull the latest timers-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git timers-fixes-for-linus

 Thanks,

	Ingo

------------------>
Eero Nurkkala (1):
      NOHZ: Properly feed cpufreq ondemand governor


 kernel/time/tick-sched.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d3f1ef4..a3562ce 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -222,6 +222,15 @@ void tick_nohz_stop_sched_tick(int inidle)
 
 	cpu = smp_processor_id();
 	ts = &per_cpu(tick_cpu_sched, cpu);
+
+	/*
+	 * Call to tick_nohz_start_idle stops the last_update_time from being
+	 * updated. Thus, it must not be called in the event we are called from
+	 * irq_exit() with the prior state different than idle.
+	 */
+	if (!inidle && !ts->inidle)
+		goto end;
+
 	now = tick_nohz_start_idle(ts);
 
 	/*
@@ -239,9 +248,6 @@ void tick_nohz_stop_sched_tick(int inidle)
 	if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))
 		goto end;
 
-	if (!inidle && !ts->inidle)
-		goto end;
-
 	ts->inidle = 1;
 
 	if (need_resched())

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [git pull] timer fix
@ 2009-02-17 16:38 Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2009-02-17 16:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Andrew Morton, Peter Zijlstra, Thomas Gleixner

Linus,

Please pull the latest timers-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git timers-fixes-for-linus

 Thanks,

	Ingo

------------------>
Peter Zijlstra (1):
      timers: more consistently use clock vs timer


 kernel/posix-cpu-timers.c |   60 ++++++++++++++++++++++----------------------
 1 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 2313a4c..e976e50 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -681,6 +681,33 @@ static void cpu_timer_fire(struct k_itimer *timer)
 }
 
 /*
+ * Sample a process (thread group) timer for the given group_leader task.
+ * Must be called with tasklist_lock held for reading.
+ */
+static int cpu_timer_sample_group(const clockid_t which_clock,
+				  struct task_struct *p,
+				  union cpu_time_count *cpu)
+{
+	struct task_cputime cputime;
+
+	thread_group_cputimer(p, &cputime);
+	switch (CPUCLOCK_WHICH(which_clock)) {
+	default:
+		return -EINVAL;
+	case CPUCLOCK_PROF:
+		cpu->cpu = cputime_add(cputime.utime, cputime.stime);
+		break;
+	case CPUCLOCK_VIRT:
+		cpu->cpu = cputime.utime;
+		break;
+	case CPUCLOCK_SCHED:
+		cpu->sched = cputime.sum_exec_runtime + task_delta_exec(p);
+		break;
+	}
+	return 0;
+}
+
+/*
  * Guts of sys_timer_settime for CPU timers.
  * This is called with the timer locked and interrupts disabled.
  * If we return TIMER_RETRY, it's necessary to release the timer's lock
@@ -741,7 +768,7 @@ int posix_cpu_timer_set(struct k_itimer *timer, int flags,
 	if (CPUCLOCK_PERTHREAD(timer->it_clock)) {
 		cpu_clock_sample(timer->it_clock, p, &val);
 	} else {
-		cpu_clock_sample_group(timer->it_clock, p, &val);
+		cpu_timer_sample_group(timer->it_clock, p, &val);
 	}
 
 	if (old) {
@@ -889,7 +916,7 @@ void posix_cpu_timer_get(struct k_itimer *timer, struct itimerspec *itp)
 			read_unlock(&tasklist_lock);
 			goto dead;
 		} else {
-			cpu_clock_sample_group(timer->it_clock, p, &now);
+			cpu_timer_sample_group(timer->it_clock, p, &now);
 			clear_dead = (unlikely(p->exit_state) &&
 				      thread_group_empty(p));
 		}
@@ -1244,7 +1271,7 @@ void posix_cpu_timer_schedule(struct k_itimer *timer)
 			clear_dead_task(timer, now);
 			goto out_unlock;
 		}
-		cpu_clock_sample_group(timer->it_clock, p, &now);
+		cpu_timer_sample_group(timer->it_clock, p, &now);
 		bump_cpu_timer(timer, now);
 		/* Leave the tasklist_lock locked for the call below.  */
 	}
@@ -1409,33 +1436,6 @@ void run_posix_cpu_timers(struct task_struct *tsk)
 }
 
 /*
- * Sample a process (thread group) timer for the given group_leader task.
- * Must be called with tasklist_lock held for reading.
- */
-static int cpu_timer_sample_group(const clockid_t which_clock,
-				  struct task_struct *p,
-				  union cpu_time_count *cpu)
-{
-	struct task_cputime cputime;
-
-	thread_group_cputimer(p, &cputime);
-	switch (CPUCLOCK_WHICH(which_clock)) {
-	default:
-		return -EINVAL;
-	case CPUCLOCK_PROF:
-		cpu->cpu = cputime_add(cputime.utime, cputime.stime);
-		break;
-	case CPUCLOCK_VIRT:
-		cpu->cpu = cputime.utime;
-		break;
-	case CPUCLOCK_SCHED:
-		cpu->sched = cputime.sum_exec_runtime + task_delta_exec(p);
-		break;
-	}
-	return 0;
-}
-
-/*
  * Set one of the process-wide special case CPU timers.
  * The tsk->sighand->siglock must be held by the caller.
  * The *newval argument is relative and we update it to be absolute, *oldval

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* Re: [git pull] timer fix
  2009-02-05  9:58       ` Pavel Emelyanov
  2009-02-05 14:30         ` Ingo Molnar
@ 2009-02-05 16:04         ` Ray Lee
  1 sibling, 0 replies; 156+ messages in thread
From: Ray Lee @ 2009-02-05 16:04 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Kirill Korotaev, Ingo Molnar, Linus Torvalds, Kirill Korotaev,
	linux-kernel, Andrew Morton, Thomas Gleixner

On Thu, Feb 5, 2009 at 1:58 AM, Pavel Emelyanov <xemul@openvz.org> wrote:
> Kirill Korotaev wrote:
>> ACK, it works. Why not save another 4 bytes of .bss then by changing
>> hpet_t1_cmp to u32? ;-)
>
> Hm... .bss you say? :) The bloat-o-meter results would be:
>
> For the original fix with casting hpet_t1_cmp to u32 inside the loop
> add/remove: 0/0 grow/shrink: 1/0 up/down: 2/0 (2)
> function                                     old     new   delta
> hpet_rtc_interrupt                           741     743      +2
>
> For the fix with casting the subtraction result to s32
> add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-2 (-2)
> function                                     old     new   delta
> hpet_rtc_interrupt                           741     739      -2
>
> For the type change of hpet_t1_cmp proposed by Kirill
> add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-6 (-6)
> function                                     old     new   delta
> hpet_rtc_timer_init                          186     185      -1
> hpet_rtc_interrupt                           741     740      -1
> hpet_t1_cmp                                    8       4      -4
>
> That's the fix:
>
> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> index 64d5ad0..dfbbb94 100644
> --- a/arch/x86/kernel/hpet.c
> +++ b/arch/x86/kernel/hpet.c
> @@ -897,7 +897,7 @@ static unsigned long hpet_rtc_flags;
>  static int hpet_prev_update_sec;
>  static struct rtc_time hpet_alarm_time;
>  static unsigned long hpet_pie_count;
> -static unsigned long hpet_t1_cmp;
> +static u32 hpet_t1_cmp;
>  static unsigned long hpet_default_delta;
>  static unsigned long hpet_pie_delta;
>  static unsigned long hpet_pie_limit;
>
> Reported-by: Kirill Korotaev <dev@openvz.org>
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
>
> This one also works. Should I re-send this patch in a proper way?

Ugh. This version seems too subtle for a new reader coming into this
code. I'd prefer the fix be kept in the comparison site, where people
already expect wraparound issues.

If you don't have time to add the helper, I'd really rather see the
version with the explicit cast acting as documentation, rather than a
test that magically works due to type issues.

Dunno, maybe it's just me.
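
The helper mentioned above is not shown in this thread. One possible
shape, purely as an assumption about what such a helper could look
like (counter_after() is a hypothetical name), keeps the wraparound
handling visible at the comparison site:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when 32-bit counter value a is ahead of b,
 * treating both as points on a wrapping 32-bit timeline. The unsigned
 * subtraction wraps modulo 2^32; the signed cast then reads the result
 * as a distance (two's-complement narrowing, as GCC and Clang do).
 */
static inline bool counter_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}
```

A loop condition would then read as
counter_after(current_counter, compare_value), whatever the storage
type of the compare variable.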

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [git pull] timer fix
  2009-02-05  9:58       ` Pavel Emelyanov
@ 2009-02-05 14:30         ` Ingo Molnar
  2009-02-05 16:04         ` Ray Lee
  1 sibling, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2009-02-05 14:30 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Kirill Korotaev, Linus Torvalds, Kirill Korotaev, linux-kernel,
	Andrew Morton, Thomas Gleixner


* Pavel Emelyanov <xemul@openvz.org> wrote:

> Reported-by: Kirill Korotaev <dev@openvz.org>
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
> 
> This one also works. Should I re-send this patch in a proper way?

Yes, please do. Also, please have a look at the helper function cleanup 
suggestion from Linus - such a patch would be welcome too.

	Ingo


^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [git pull] timer fix
  2009-02-05  7:51     ` Kirill Korotaev
@ 2009-02-05  9:58       ` Pavel Emelyanov
  2009-02-05 14:30         ` Ingo Molnar
  2009-02-05 16:04         ` Ray Lee
  0 siblings, 2 replies; 156+ messages in thread
From: Pavel Emelyanov @ 2009-02-05  9:58 UTC (permalink / raw)
  To: Kirill Korotaev, Ingo Molnar
  Cc: Linus Torvalds, Kirill Korotaev, linux-kernel, Andrew Morton,
	Thomas Gleixner

Kirill Korotaev wrote:
> ACK, it works. Why not save another 4 bytes of .bss then by changing
> hpet_t1_cmp to u32? ;-)

Hm... .bss you say? :) The bloat-o-meter results would be:

For the original fix with casting hpet_t1_cmp to u32 inside the loop
add/remove: 0/0 grow/shrink: 1/0 up/down: 2/0 (2)
function                                     old     new   delta
hpet_rtc_interrupt                           741     743      +2

For the fix with casting the subtraction result to s32
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-2 (-2)
function                                     old     new   delta
hpet_rtc_interrupt                           741     739      -2

For the type change of hpet_t1_cmp proposed by Kirill
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-6 (-6)
function                                     old     new   delta
hpet_rtc_timer_init                          186     185      -1
hpet_rtc_interrupt                           741     740      -1
hpet_t1_cmp                                    8       4      -4

That's the fix:

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 64d5ad0..dfbbb94 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -897,7 +897,7 @@ static unsigned long hpet_rtc_flags;
 static int hpet_prev_update_sec;
 static struct rtc_time hpet_alarm_time;
 static unsigned long hpet_pie_count;
-static unsigned long hpet_t1_cmp;
+static u32 hpet_t1_cmp;
 static unsigned long hpet_default_delta;
 static unsigned long hpet_pie_delta;
 static unsigned long hpet_pie_limit;

Reported-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

This one also works. Should I re-send this patch in a proper way?

> Kirill
> 
> 
> On 2/5/09 1:58 AM, "Ingo Molnar" <mingo@elte.hu> wrote:
> 
>>
>> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>>>       } while ((s32)(hpet_readl(HPET_COUNTER) - hpet_t1_cmp) > 0);
>> that is also more efficient code by 6 bytes:
>>
>> before:
>>
>> ffffffff81020856:       ff c3                   inc    %ebx
>> ffffffff81020858:       48 05 f0 00 00 00       add    $0xf0,%rax
>> ffffffff8102085e:       8b 00                   mov    (%rax),%eax
>> ffffffff81020860:       8b 15 2a 79 85 00       mov    0x85792a(%rip),%edx        # ffffffff81878190 <hpet_t1_cmp>
>> ffffffff81020866:       89 c0                   mov    %eax,%eax
>> ffffffff81020868:       48 29 d0                sub    %rdx,%rax
>> ffffffff8102086b:       48 85 c0                test   %rax,%rax
>> ffffffff8102086e:       7f bf                   jg     ffffffff8102082f <hpet_rtc_interrupt+0x68>
>> ffffffff81020870:       85 db                   test   %ebx,%ebx
>>
>> after:
>>
>> ffffffff81020856:       ff c3                   inc    %ebx
>> ffffffff81020858:       48 05 f0 00 00 00       add    $0xf0,%rax
>> ffffffff8102085e:       8b 00                   mov    (%rax),%eax
>> ffffffff81020860:       2b 05 2a 79 85 00       sub    0x85792a(%rip),%eax        # ffffffff81878190 <hpet_t1_cmp>
>> ffffffff81020866:       85 c0                   test   %eax,%eax
>> ffffffff81020868:       7f c5                   jg     ffffffff8102082f <hpet_rtc_interrupt+0x68>
>> ffffffff8102086a:       85 db                   test   %ebx,%ebx
>>
>> Kirill, Pavel, could you please re-test the updated commit attached below?
>>
>>         Ingo
>>
>> ---------------->
>> From 66a36a1e95fe9de9c6a56f0bcd01f4ba21929f86 Mon Sep 17 00:00:00 2001
>> From: Pavel Emelyanov <xemul@openvz.org>
>> Date: Wed, 4 Feb 2009 13:40:31 +0300
>> Subject: [PATCH] x86: fix hpet timer reinit for x86_64
>>
>> There's a small problem with hpet_rtc_reinit function - it checks
>> for the:
>>
>>         hpet_readl(HPET_COUNTER) - hpet_t1_cmp > 0
>>
>> to continue increasing both the HPET_T1_CMP (register) and the
>> hpet_t1_cmp (variable).
>>
>> But since the HPET_COUNTER is always 32-bit, if the hpet_t1_cmp
>> is 64-bit this condition will always be FALSE once the latter hits
>> the 32-bit boundary, and we can have a situation, when we don't
>> increase the HPET_T1_CMP register high enough.
>>
>> The result - timer stops ticking, since HPET_T1_CMP becomes less,
>> than the COUNTER and never increased again.
>>
>> The solution is (based on Linus's suggestion) to compare 64-bits
>> (on 64-bit x86), but to do the comparison on 32-bit signed
>> integers.
>>
>> Reported-by: Kirill Korotaev <dev@openvz.org>
>> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
>> Signed-off-by: Ingo Molnar <mingo@elte.hu>
>> ---
>>  arch/x86/kernel/hpet.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
>> index 64d5ad0..c761f91 100644
>> --- a/arch/x86/kernel/hpet.c
>> +++ b/arch/x86/kernel/hpet.c
>> @@ -1075,7 +1075,7 @@ static void hpet_rtc_timer_reinit(void)
>>                 hpet_t1_cmp += delta;
>>                 hpet_writel(hpet_t1_cmp, HPET_T1_CMP);
>>                 lost_ints++;
>> -       } while ((long)(hpet_readl(HPET_COUNTER) - hpet_t1_cmp) > 0);
>> +       } while ((s32)(hpet_readl(HPET_COUNTER) - hpet_t1_cmp) > 0);
>>
>>         if (lost_ints) {
>>                 if (hpet_rtc_flags & RTC_PIE)
> 
> 
> 


^ permalink raw reply related	[flat|nested] 156+ messages in thread

* Re: [git pull] timer fix
  2009-02-04 22:58   ` Ingo Molnar
  2009-02-04 23:13     ` H. Peter Anvin
@ 2009-02-05  7:51     ` Kirill Korotaev
  2009-02-05  9:58       ` Pavel Emelyanov
  1 sibling, 1 reply; 156+ messages in thread
From: Kirill Korotaev @ 2009-02-05  7:51 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds, Pavel Emelyanov, Kirill Korotaev
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner

ACK, it works. Why not save another 4 bytes of .bss then by changing
hpet_t1_cmp to u32? ;-)

Kirill


On 2/5/09 1:58 AM, "Ingo Molnar" <mingo@elte.hu> wrote:

> 
> 
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
>>       } while ((s32)(hpet_readl(HPET_COUNTER) - hpet_t1_cmp) > 0);
> 
> that is also more efficient code by 6 bytes:
> 
> before:
> 
> ffffffff81020856:       ff c3                   inc    %ebx
> ffffffff81020858:       48 05 f0 00 00 00       add    $0xf0,%rax
> ffffffff8102085e:       8b 00                   mov    (%rax),%eax
> ffffffff81020860:       8b 15 2a 79 85 00       mov    0x85792a(%rip),%edx        # ffffffff81878190 <hpet_t1_cmp>
> ffffffff81020866:       89 c0                   mov    %eax,%eax
> ffffffff81020868:       48 29 d0                sub    %rdx,%rax
> ffffffff8102086b:       48 85 c0                test   %rax,%rax
> ffffffff8102086e:       7f bf                   jg     ffffffff8102082f <hpet_rtc_interrupt+0x68>
> ffffffff81020870:       85 db                   test   %ebx,%ebx
> 
> after:
> 
> ffffffff81020856:       ff c3                   inc    %ebx
> ffffffff81020858:       48 05 f0 00 00 00       add    $0xf0,%rax
> ffffffff8102085e:       8b 00                   mov    (%rax),%eax
> ffffffff81020860:       2b 05 2a 79 85 00       sub    0x85792a(%rip),%eax        # ffffffff81878190 <hpet_t1_cmp>
> ffffffff81020866:       85 c0                   test   %eax,%eax
> ffffffff81020868:       7f c5                   jg     ffffffff8102082f <hpet_rtc_interrupt+0x68>
> ffffffff8102086a:       85 db                   test   %ebx,%ebx
> 
> Kirill, Pavel, could you please re-test the updated commit attached below?
> 
>         Ingo
> 
> ---------------->
> From 66a36a1e95fe9de9c6a56f0bcd01f4ba21929f86 Mon Sep 17 00:00:00 2001
> From: Pavel Emelyanov <xemul@openvz.org>
> Date: Wed, 4 Feb 2009 13:40:31 +0300
> Subject: [PATCH] x86: fix hpet timer reinit for x86_64
> 
> There's a small problem with hpet_rtc_reinit function - it checks
> for the:
> 
>         hpet_readl(HPET_COUNTER) - hpet_t1_cmp > 0
> 
> to continue increasing both the HPET_T1_CMP (register) and the
> hpet_t1_cmp (variable).
> 
> But since the HPET_COUNTER is always 32-bit, if the hpet_t1_cmp
> is 64-bit this condition will always be FALSE once the latter hits
> the 32-bit boundary, and we can have a situation, when we don't
> increase the HPET_T1_CMP register high enough.
> 
> The result - timer stops ticking, since HPET_T1_CMP becomes less,
> than the COUNTER and never increased again.
> 
> The solution is (based on Linus's suggestion) to compare 64-bits
> (on 64-bit x86), but to do the comparison on 32-bit signed
> integers.
> 
> Reported-by: Kirill Korotaev <dev@openvz.org>
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  arch/x86/kernel/hpet.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> index 64d5ad0..c761f91 100644
> --- a/arch/x86/kernel/hpet.c
> +++ b/arch/x86/kernel/hpet.c
> @@ -1075,7 +1075,7 @@ static void hpet_rtc_timer_reinit(void)
>                 hpet_t1_cmp += delta;
>                 hpet_writel(hpet_t1_cmp, HPET_T1_CMP);
>                 lost_ints++;
> -       } while ((long)(hpet_readl(HPET_COUNTER) - hpet_t1_cmp) > 0);
> +       } while ((s32)(hpet_readl(HPET_COUNTER) - hpet_t1_cmp) > 0);
> 
>         if (lost_ints) {
>                 if (hpet_rtc_flags & RTC_PIE)


^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [git pull] timer fix
  2009-02-04 23:13     ` H. Peter Anvin
@ 2009-02-05  0:04       ` Ingo Molnar
  0 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2009-02-05  0:04 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Pavel Emelyanov, Kirill Korotaev, linux-kernel,
	Andrew Morton, Thomas Gleixner


* H. Peter Anvin <hpa@zytor.com> wrote:

> Ingo Molnar wrote:
>>
>> The solution is (based on Linus's suggestion) to compare 64-bits
>                                                   ^not, presumably
>> (on 64-bit x86), but to do the comparison on 32-bit signed
>> integers.

fixed, thanks!

	Ingo

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [git pull] timer fix
  2009-02-04 22:58   ` Ingo Molnar
@ 2009-02-04 23:13     ` H. Peter Anvin
  2009-02-05  0:04       ` Ingo Molnar
  2009-02-05  7:51     ` Kirill Korotaev
  1 sibling, 1 reply; 156+ messages in thread
From: H. Peter Anvin @ 2009-02-04 23:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Pavel Emelyanov, Kirill Korotaev, linux-kernel,
	Andrew Morton, Thomas Gleixner

Ingo Molnar wrote:
> 
> The solution is (based on Linus's suggestion) to compare 64-bits
                                                   ^not, presumably
> (on 64-bit x86), but to do the comparison on 32-bit signed
> integers.

	-hpa

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [git pull] timer fix
  2009-02-04 22:11 ` Linus Torvalds
  2009-02-04 22:16   ` Linus Torvalds
  2009-02-04 22:25   ` Ingo Molnar
@ 2009-02-04 22:58   ` Ingo Molnar
  2009-02-04 23:13     ` H. Peter Anvin
  2009-02-05  7:51     ` Kirill Korotaev
  2 siblings, 2 replies; 156+ messages in thread
From: Ingo Molnar @ 2009-02-04 22:58 UTC (permalink / raw)
  To: Linus Torvalds, Pavel Emelyanov, Kirill Korotaev
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 	} while ((s32)(hpet_readl(HPET_COUNTER) - hpet_t1_cmp) > 0);

that is also more efficient code by 6 bytes:

before:

ffffffff81020856:	ff c3                	inc    %ebx
ffffffff81020858:	48 05 f0 00 00 00    	add    $0xf0,%rax
ffffffff8102085e:	8b 00                	mov    (%rax),%eax
ffffffff81020860:	8b 15 2a 79 85 00    	mov    0x85792a(%rip),%edx        # ffffffff81878190 <hpet_t1_cmp>
ffffffff81020866:	89 c0                	mov    %eax,%eax
ffffffff81020868:	48 29 d0             	sub    %rdx,%rax
ffffffff8102086b:	48 85 c0             	test   %rax,%rax
ffffffff8102086e:	7f bf                	jg     ffffffff8102082f <hpet_rtc_interrupt+0x68>
ffffffff81020870:	85 db                	test   %ebx,%ebx

after:

ffffffff81020856:	ff c3                	inc    %ebx
ffffffff81020858:	48 05 f0 00 00 00    	add    $0xf0,%rax
ffffffff8102085e:	8b 00                	mov    (%rax),%eax
ffffffff81020860:	2b 05 2a 79 85 00    	sub    0x85792a(%rip),%eax        # ffffffff81878190 <hpet_t1_cmp>
ffffffff81020866:	85 c0                	test   %eax,%eax
ffffffff81020868:	7f c5                	jg     ffffffff8102082f <hpet_rtc_interrupt+0x68>
ffffffff8102086a:	85 db                	test   %ebx,%ebx

Kirill, Pavel, could you please re-test the updated commit attached below?

	Ingo

---------------->
From 66a36a1e95fe9de9c6a56f0bcd01f4ba21929f86 Mon Sep 17 00:00:00 2001
From: Pavel Emelyanov <xemul@openvz.org>
Date: Wed, 4 Feb 2009 13:40:31 +0300
Subject: [PATCH] x86: fix hpet timer reinit for x86_64

There's a small problem with the hpet_rtc_timer_reinit() function - it
checks for:

	hpet_readl(HPET_COUNTER) - hpet_t1_cmp > 0

to continue increasing both the HPET_T1_CMP (register) and the
hpet_t1_cmp (variable).

But since the HPET_COUNTER is always 32-bit, if the hpet_t1_cmp
is 64-bit this condition will always be FALSE once the latter crosses
the 32-bit boundary, and we can end up in a situation where we don't
increase the HPET_T1_CMP register high enough.

The result - the timer stops ticking, since HPET_T1_CMP ends up lower
than the COUNTER and is never increased again.

The solution (based on Linus's suggestion) is not to compare in 64 bits
(on 64-bit x86), but to do the comparison on 32-bit signed integers.

Reported-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/hpet.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 64d5ad0..c761f91 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -1075,7 +1075,7 @@ static void hpet_rtc_timer_reinit(void)
 		hpet_t1_cmp += delta;
 		hpet_writel(hpet_t1_cmp, HPET_T1_CMP);
 		lost_ints++;
-	} while ((long)(hpet_readl(HPET_COUNTER) - hpet_t1_cmp) > 0);
+	} while ((s32)(hpet_readl(HPET_COUNTER) - hpet_t1_cmp) > 0);
 
 	if (lost_ints) {
 		if (hpet_rtc_flags & RTC_PIE)
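
The failure mode described in the commit message above can be reproduced
in user space. The following is only a sketch under stated assumptions:
keep_looping_long() and keep_looping_s32() are hypothetical stand-ins for
the old and new loop conditions, with a 64-bit hpet_t1_cmp as on x86_64,
and the sample values are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Old loop condition: the subtract and the signed test happen in 64 bits. */
static int keep_looping_long(uint32_t counter, uint64_t t1_cmp)
{
	return (int64_t)(counter - t1_cmp) > 0;
}

/* New loop condition: only the low 32 bits of the difference are tested. */
static int keep_looping_s32(uint32_t counter, uint64_t t1_cmp)
{
	return (int32_t)(uint32_t)(counter - t1_cmp) > 0;
}

/* Illustrative values: the variable has crossed the 32-bit boundary, so
 * the 32-bit register only holds 0x10, and the counter already reads 0x20. */
static void demo_reinit_bug(void)
{
	uint64_t t1_cmp = 0x100000010ULL;
	uint32_t counter = 0x20;

	/* 64-bit compare: looks "not yet reached", loop exits too early. */
	assert(keep_looping_long(counter, t1_cmp) == 0);
	/* 32-bit compare: counter is 16 ticks past the comparator. */
	assert(keep_looping_s32(counter, t1_cmp) == 1);
}
```

With the 64-bit test the loop exits while the comparator register is
still behind the counter, which matches the "timer stops ticking"
symptom in the report.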


* Re: [git pull] timer fix
  2009-02-04 22:11 ` Linus Torvalds
  2009-02-04 22:16   ` Linus Torvalds
@ 2009-02-04 22:25   ` Ingo Molnar
  2009-02-04 22:58   ` Ingo Molnar
  2 siblings, 0 replies; 156+ messages in thread
From: Ingo Molnar @ 2009-02-04 22:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton, Thomas Gleixner


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, 4 Feb 2009, Ingo Molnar wrote:
> >
> > Pavel Emelyanov (1):
> >       x86: fix hpet timer reinit for x86_64
> > 
> > 
> >  arch/x86/kernel/hpet.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> > index 64d5ad0..ec319d1 100644
> > --- a/arch/x86/kernel/hpet.c
> > +++ b/arch/x86/kernel/hpet.c
> > @@ -1075,7 +1075,7 @@ static void hpet_rtc_timer_reinit(void)
> >  		hpet_t1_cmp += delta;
> >  		hpet_writel(hpet_t1_cmp, HPET_T1_CMP);
> >  		lost_ints++;
> > -	} while ((long)(hpet_readl(HPET_COUNTER) - hpet_t1_cmp) > 0);
> > +	} while ((long)(hpet_readl(HPET_COUNTER) - (u32)hpet_t1_cmp) > 0);
> 
> This is bordering on not being correct.

Yeah, I had to look twice. The only reason I left it that way was because I 
couldn't reproduce the problem and hpet is hellishly fragile and this patch 
was tested so I chickened out.

OTOH that fragility is partly because such constructs have piled up so you 
very much have a valid point ...

We'll clean this up. I've already added the clean 32-bit casts - which also 
have another advantage: they do not actually trust the hw to always return 
32-bit values - they explicitly cut to 32 bits and do signed arithmetic on 
that. Will also do the helper function cleanup to abstract the counter 
arithmetic away.

> In particular, think about when HPET_COUNTER or hpet_t1_cmp overflows in 
> 32 bits, and what you want to happen. If you do the subtract and test in 
> 64 bits, it will simply do the wrong thing. Think what happens if 
> hpet_t1_cmp is actually _larger_ than HPET_COUNTER, but overflowed in 32 
> bits, and you're now looking at:
> 
> 	(long) (0xffffffff - 0x00000001)
> 
> which is actually > 0, so the thing will continue to loop INCORRECTLY. It 
> should have stopped (and _would_ have stopped on 32-bit x86).

yeah, allowing that to happen is just wrong.

	Ingo


* Re: [git pull] timer fix
  2009-02-04 22:11 ` Linus Torvalds
@ 2009-02-04 22:16   ` Linus Torvalds
  2009-02-04 22:25   ` Ingo Molnar
  2009-02-04 22:58   ` Ingo Molnar
  2 siblings, 0 replies; 156+ messages in thread
From: Linus Torvalds @ 2009-02-04 22:16 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton, Thomas Gleixner



On Wed, 4 Feb 2009, Linus Torvalds wrote:
> 
> Either cast the result of the subtract to "s32" (or "int", whatever), or 
> cast _both_ of them to (s32) so that the subtract is done in a signed 
> type, and then the expansion to (long) will still be right - but 
> unnecessary - in the sign.

Btw, doing it with a nice helper macro or function is also perhaps a good 
idea, at least if these "compare hpet values" things happen more than 
once. 

Look at "time_after()" in <linux/jiffies.h> to see how to do these kinds 
of "comparisons of things that may overflow" really carefully. You 
absolutely need to do the compare in a size that is no larger than the 
size of the actual values (and in the case of HPET, it's 32-bit, at least 
the way we do things now - I guess HPETs _could_ be 64-bit, but we don't 
read more than 32 bits or whatever).

So <linux/jiffies.h> does the cast to "(long)", but it does so because the 
incoming values really have type "unsigned long" and are valid in all 
bits.
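
A helper in the spirit of time_after() might look like the sketch below.
The name hpet_after32() is hypothetical and not taken from the kernel
source; the only assumption is that both values are 32-bit counter
readings, so the compare is done in exactly 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper modelled on time_after(a, b): returns true if the
 * 32-bit counter value a is after b, allowing for wraparound. The subtract
 * is unsigned (well defined on overflow); only the result is read as signed.
 */
static inline int hpet_after32(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Usage sketch with made-up values around the 32-bit wrap point. */
static void demo_hpet_after32(void)
{
	assert(hpet_after32(0x00000001u, 0xffffffffu));  /* 1 is just past the wrap */
	assert(!hpet_after32(0xffffffffu, 0x00000001u)); /* and not the other way round */
	assert(!hpet_after32(5u, 5u));                   /* equal is not "after" */
}
```

As with time_after(), this only gives a meaningful answer while the two
values are less than half the counter range apart.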

			Linus


* Re: [git pull] timer fix
  2009-02-04 19:25 Ingo Molnar
@ 2009-02-04 22:11 ` Linus Torvalds
  2009-02-04 22:16   ` Linus Torvalds
                     ` (2 more replies)
  0 siblings, 3 replies; 156+ messages in thread
From: Linus Torvalds @ 2009-02-04 22:11 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton, Thomas Gleixner



On Wed, 4 Feb 2009, Ingo Molnar wrote:
>
> Pavel Emelyanov (1):
>       x86: fix hpet timer reinit for x86_64
> 
> 
>  arch/x86/kernel/hpet.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> index 64d5ad0..ec319d1 100644
> --- a/arch/x86/kernel/hpet.c
> +++ b/arch/x86/kernel/hpet.c
> @@ -1075,7 +1075,7 @@ static void hpet_rtc_timer_reinit(void)
>  		hpet_t1_cmp += delta;
>  		hpet_writel(hpet_t1_cmp, HPET_T1_CMP);
>  		lost_ints++;
> -	} while ((long)(hpet_readl(HPET_COUNTER) - hpet_t1_cmp) > 0);
> +	} while ((long)(hpet_readl(HPET_COUNTER) - (u32)hpet_t1_cmp) > 0);

This is bordering on not being correct.

It may happen to _work_, but the fact is, you want a 32-bit signed 
compare, not a 64-bit subtract that just happens to work. So the proper 
fix is to just make it do

	} while ((s32)(hpet_readl(HPET_COUNTER) - hpet_t1_cmp) > 0);

Otherwise you always end up depending on very subtle internal logic, and 
the exact types of the things involved.

In particular, think about when HPET_COUNTER or hpet_t1_cmp overflows in 
32 bits, and what you want to happen. If you do the subtract and test in 
64 bits, it will simply do the wrong thing. Think what happens if 
hpet_t1_cmp is actually _larger_ than HPET_COUNTER, but overflowed in 32 
bits, and you're now looking at:

	(long) (0xffffffff - 0x00000001)

which is actually > 0, so the thing will continue to loop INCORRECTLY. It 
should have stopped (and _would_ have stopped on 32-bit x86).

In contrast, look at what happens if you do the subtracting (or at least 
test the _result_ of the subtract) in the right size:

	(s32) (0xffffffff - 0x00000001) 

which becomes -2, which is not larger than 0, which means that we exit 
(which is correct, because the comparator value is actually ahead of the 
current count: 0x00000001 is _ahead_ of 0xffffffff, even if it's smaller 
in an "unsigned long").
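
The two cases above can be checked with a small user-space sketch. The
function names are hypothetical; the 64-bit variant assumes both 32-bit
values are widened to 64 bits before the subtract, as in the
0xffffffff - 0x00000001 example.

```c
#include <assert.h>
#include <stdint.h>

/* Subtract and test in 64 bits, like the (long) cast on x86_64. */
static int counter_past_cmp_64(uint32_t counter, uint32_t cmp)
{
	return (int64_t)((uint64_t)counter - (uint64_t)cmp) > 0;
}

/*
 * Test the result in 32 bits, like the (s32) cast. Converting a value
 * above INT32_MAX to int32_t is implementation-defined in ISO C; gcc and
 * clang wrap modulo 2^32, which is the behaviour relied on here.
 */
static int counter_past_cmp_32(uint32_t counter, uint32_t cmp)
{
	return (int32_t)(counter - cmp) > 0;
}

static void demo_wrapped_comparator(void)
{
	/* Comparator has wrapped to 0x00000001, counter is still 0xffffffff. */
	assert(counter_past_cmp_64(0xffffffffu, 0x00000001u) == 1); /* keeps looping, wrong */
	assert(counter_past_cmp_32(0xffffffffu, 0x00000001u) == 0); /* difference is -2, exits */
}
```

When neither value has wrapped, the two tests agree; they only diverge
around the 32-bit overflow.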

So I'm not going to pull it. This cast is simply wrong.

Either cast the result of the subtract to "s32" (or "int", whatever), or 
cast _both_ of them to (s32) so that the subtract is done in a signed 
type, and then the expansion to (long) will still be right - but 
unnecessary - in the sign.

			Linus


* [git pull] timer fix
@ 2009-02-04 19:25 Ingo Molnar
  2009-02-04 22:11 ` Linus Torvalds
  0 siblings, 1 reply; 156+ messages in thread
From: Ingo Molnar @ 2009-02-04 19:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton, Thomas Gleixner

Linus,

Please pull the latest timers-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git timers-fixes-for-linus

 Thanks,

	Ingo

------------------>
Pavel Emelyanov (1):
      x86: fix hpet timer reinit for x86_64


 arch/x86/kernel/hpet.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 64d5ad0..ec319d1 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -1075,7 +1075,7 @@ static void hpet_rtc_timer_reinit(void)
 		hpet_t1_cmp += delta;
 		hpet_writel(hpet_t1_cmp, HPET_T1_CMP);
 		lost_ints++;
-	} while ((long)(hpet_readl(HPET_COUNTER) - hpet_t1_cmp) > 0);
+	} while ((long)(hpet_readl(HPET_COUNTER) - (u32)hpet_t1_cmp) > 0);
 
 	if (lost_ints) {
 		if (hpet_rtc_flags & RTC_PIE)


