* [PATCH v6 00/46] CPU hotplug: stop_machine()-free CPU hotplug
@ 2013-02-18 12:38 ` Srivatsa S. Bhat
From: Srivatsa S. Bhat @ 2013-02-18 12:38 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Hi,

This patchset removes CPU hotplug's dependence on stop_machine() from the CPU
offline path and provides an alternative set of APIs to preempt_disable(),
which can be invoked from atomic context, to prevent CPUs from going offline.
The motivation behind removing stop_machine() is to avoid its ill-effects and
thus improve the design of CPU hotplug. (The patches describe this in more
detail.)

All the users of preempt_disable()/local_irq_disable() that relied on them to
prevent CPU offline have been converted to the new primitives introduced in
the patchset. The CPU_DYING notifiers have also been audited to check whether
they can cope with the removal of stop_machine() or whether they need new
locks for synchronization (all CPU_DYING notifiers looked OK, with no need
for any new locks).
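
For illustration, a typical call-site conversion looks roughly like this (a
sketch only; 'cpu', my_func and my_info are hypothetical names standing in
for whatever the caller already has, not code taken from the patches):

        /* Before: preempt_disable() was relied upon to keep CPUs online */
        preempt_disable();
        smp_call_function_single(cpu, my_func, my_info, 1);
        preempt_enable();

        /* After: use the new atomic hotplug-reader APIs instead */
        get_online_cpus_atomic();
        smp_call_function_single(cpu, my_func, my_info, 1);
        put_online_cpus_atomic();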

Applies on current mainline (v3.8-rc7+).

This patchset is available in the following git branch:

git://github.com/srivatsabhat/linux.git  stop-machine-free-cpu-hotplug-v6


Overview of the patches:
-----------------------

Patches 1 to 7 introduce a generic, flexible Per-CPU Reader-Writer Locking
scheme.

Patch 8 uses this synchronization mechanism to build the
get/put_online_cpus_atomic() APIs which can be used from atomic context, to
prevent CPUs from going offline.

Patch 9 is a cleanup; it converts preprocessor macros to static inline
functions.

Patches 10 to 43 convert various call-sites to use the new APIs.

Patch 44 is the one which actually removes stop_machine() from the CPU
offline path.

Patch 45 decouples stop_machine() and CPU hotplug from Kconfig.

Patch 46 updates the documentation to reflect the new APIs.


Changes in v6:
--------------

* Fixed issues related to memory barriers, as pointed out by Paul and Oleg.
* Fixed the locking issue related to clockevents_lock, which was being
  triggered when CPU idle was enabled.
* Some code restructuring to improve readability and to enhance some fastpath
  optimizations.
* Randconfig build-fixes, reported by Fengguang Wu.


Changes in v5:
--------------
  Exposed a new generic locking scheme: Flexible Per-CPU Reader-Writer locks,
  based on the synchronization schemes already discussed in the previous
  versions, and used it in CPU hotplug, to implement the new APIs.

  Audited the CPU_DYING notifiers in the kernel source tree and replaced
  usages of preempt_disable() with the new get/put_online_cpus_atomic() APIs
  where necessary.


Changes in v4:
--------------
  The synchronization scheme has been simplified quite a bit, which makes it
  look a lot less complex than before. Some highlights:

* Implicit ACKs:

  The earlier design required the readers to explicitly ACK the writer's
  signal. The new design uses implicit ACKs instead: a reader switching
  over to the global rwlock implicitly tells the writer to stop waiting for
  that reader.

* No atomic operations:

  Since we got rid of explicit ACKs, a reader and a writer no longer need to
  update the same counter, so we can get rid of atomic ops too.
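
  As an illustration of these two points, the writer side might look roughly
  like the following (only a sketch of the idea: the struct layout and field
  names are assumptions, and the memory barriers, nesting and IRQ handling
  present in the real patches are omitted):

        struct percpu_rwlock {
                unsigned long __percpu  *reader_refcnt; /* assumed field */
                bool __percpu           *writer_signal; /* assumed field */
                rwlock_t                global_rwlock;
        };

        static void percpu_write_lock_sketch(struct percpu_rwlock *lock)
        {
                unsigned int cpu;

                /* Ask all readers to switch over to the global rwlock */
                for_each_possible_cpu(cpu)
                        *per_cpu_ptr(lock->writer_signal, cpu) = true;

                smp_mb(); /* publish the signal before sampling refcounts */

                /*
                 * Wait for every CPU's reader refcount to drain. A reader
                 * that switches over to the rwlock (or simply finishes)
                 * ends up with a zero refcount -- that is the implicit ACK,
                 * so no shared counter or atomic ops are needed.
                 */
                for_each_possible_cpu(cpu)
                        while (*per_cpu_ptr(lock->reader_refcnt, cpu))
                                cpu_relax();

                write_lock(&lock->global_rwlock);
        }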

Changes in v3:
--------------
* Dropped the _light() and _full() variants of the APIs. Provided a single
  interface: get/put_online_cpus_atomic().

* Completely redesigned the synchronization mechanism again, to make it
  fast and scalable at the reader-side in the fast-path (when no hotplug
  writers are active). This new scheme also ensures that there is no
  possibility of deadlocks due to circular locking dependency.
  In summary, this provides the scalability and speed of per-cpu rwlocks
  (without actually using them), while avoiding the downside (deadlock
  possibilities) which is inherent in any per-cpu locking scheme that is
  meant to compete with preempt_disable()/enable() in terms of flexibility.

  The problem with using per-cpu locking to replace preempt_disable()/enable()
  was explained here:
  https://lkml.org/lkml/2012/12/6/290

  Basically we use per-cpu counters (for scalability) when no writers are
  active, and then switch to global rwlocks (for lock-safety) when a writer
  becomes active. It is a slightly complex scheme, but it is based on
  standard principles of distributed algorithms.
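
  The reader-side fastpath could then look roughly like this, using the same
  assumed struct layout as in the sketch under "Changes in v4" above (again,
  a sketch only -- the real patches additionally handle nesting, interrupts
  and memory ordering):

        static void percpu_read_lock_sketch(struct percpu_rwlock *lock)
        {
                preempt_disable();

                if (likely(!*this_cpu_ptr(lock->writer_signal))) {
                        /* Fastpath: no writer; mark ourselves as a reader */
                        (*this_cpu_ptr(lock->reader_refcnt))++;
                } else {
                        /* Slowpath: writer active; use the global rwlock */
                        read_lock(&lock->global_rwlock);
                }
        }

        static void percpu_read_unlock_sketch(struct percpu_rwlock *lock)
        {
                if (*this_cpu_ptr(lock->reader_refcnt))
                        (*this_cpu_ptr(lock->reader_refcnt))--;
                else
                        read_unlock(&lock->global_rwlock);

                preempt_enable();
        }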

Changes in v2:
-------------
* Completely redesigned the synchronization scheme to avoid using any extra
  cpumasks.

* Provided APIs for 2 types of atomic hotplug readers: "light" (for
  light-weight) and "full". We wish to have more "light" readers than
  the "full" ones, to avoid indirectly inducing the "stop_machine effect"
  without even actually using stop_machine().

  And the patches show that it _is_ generally true: 5 patches deal with
  "light" readers, whereas only 1 patch deals with a "full" reader.

  Also, the "light" readers happen to be in very hot paths. So it makes a
  lot of sense to have such a distinction and a corresponding light-weight
  API.

Links to previous versions:
v5: http://lwn.net/Articles/533553/
v4: https://lkml.org/lkml/2012/12/11/209
v3: https://lkml.org/lkml/2012/12/7/287
v2: https://lkml.org/lkml/2012/12/5/322
v1: https://lkml.org/lkml/2012/12/4/88

--
 Paul E. McKenney (1):
      cpu: No more __stop_machine() in _cpu_down()

Srivatsa S. Bhat (45):
      percpu_rwlock: Introduce the global reader-writer lock backend
      percpu_rwlock: Introduce per-CPU variables for the reader and the writer
      percpu_rwlock: Provide a way to define and init percpu-rwlocks at compile time
      percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
      percpu_rwlock: Make percpu-rwlocks IRQ-safe, optimally
      percpu_rwlock: Rearrange the read-lock code to fastpath nested percpu readers
      percpu_rwlock: Allow writers to be readers, and add lockdep annotations
      CPU hotplug: Provide APIs to prevent CPU offline from atomic context
      CPU hotplug: Convert preprocessor macros to static inline functions
      smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly
      smp, cpu hotplug: Fix on_each_cpu_*() to prevent CPU offline properly
      sched/timer: Use get/put_online_cpus_atomic() to prevent CPU offline
      sched/migration: Use raw_spin_lock/unlock since interrupts are already disabled
      sched/rt: Use get/put_online_cpus_atomic() to prevent CPU offline
      tick: Use get/put_online_cpus_atomic() to prevent CPU offline
      time/clocksource: Use get/put_online_cpus_atomic() to prevent CPU offline
      clockevents: Use get/put_online_cpus_atomic() in clockevents_notify()
      softirq: Use get/put_online_cpus_atomic() to prevent CPU offline
      irq: Use get/put_online_cpus_atomic() to prevent CPU offline
      net: Use get/put_online_cpus_atomic() to prevent CPU offline
      block: Use get/put_online_cpus_atomic() to prevent CPU offline
      crypto: pcrypt - Protect access to cpu_online_mask with get/put_online_cpus()
      infiniband: ehca: Use get/put_online_cpus_atomic() to prevent CPU offline
      [SCSI] fcoe: Use get/put_online_cpus_atomic() to prevent CPU offline
      staging: octeon: Use get/put_online_cpus_atomic() to prevent CPU offline
      x86: Use get/put_online_cpus_atomic() to prevent CPU offline
      perf/x86: Use get/put_online_cpus_atomic() to prevent CPU offline
      KVM: Use get/put_online_cpus_atomic() to prevent CPU offline from atomic context
      kvm/vmx: Use get/put_online_cpus_atomic() to prevent CPU offline
      x86/xen: Use get/put_online_cpus_atomic() to prevent CPU offline
      alpha/smp: Use get/put_online_cpus_atomic() to prevent CPU offline
      blackfin/smp: Use get/put_online_cpus_atomic() to prevent CPU offline
      cris/smp: Use get/put_online_cpus_atomic() to prevent CPU offline
      hexagon/smp: Use get/put_online_cpus_atomic() to prevent CPU offline
      ia64: Use get/put_online_cpus_atomic() to prevent CPU offline
      m32r: Use get/put_online_cpus_atomic() to prevent CPU offline
      MIPS: Use get/put_online_cpus_atomic() to prevent CPU offline
      mn10300: Use get/put_online_cpus_atomic() to prevent CPU offline
      parisc: Use get/put_online_cpus_atomic() to prevent CPU offline
      powerpc: Use get/put_online_cpus_atomic() to prevent CPU offline
      sh: Use get/put_online_cpus_atomic() to prevent CPU offline
      sparc: Use get/put_online_cpus_atomic() to prevent CPU offline
      tile: Use get/put_online_cpus_atomic() to prevent CPU offline
      CPU hotplug, stop_machine: Decouple CPU hotplug from stop_machine() in Kconfig
      Documentation/cpu-hotplug: Remove references to stop_machine()

  Documentation/cpu-hotplug.txt                 |   17 +-
 arch/alpha/kernel/smp.c                       |   19 +-
 arch/arm/Kconfig                              |    1 
 arch/blackfin/Kconfig                         |    1 
 arch/blackfin/mach-common/smp.c               |    6 -
 arch/cris/arch-v32/kernel/smp.c               |    8 +
 arch/hexagon/kernel/smp.c                     |    5 
 arch/ia64/Kconfig                             |    1 
 arch/ia64/kernel/irq_ia64.c                   |   13 +
 arch/ia64/kernel/perfmon.c                    |    6 +
 arch/ia64/kernel/smp.c                        |   23 ++
 arch/ia64/mm/tlb.c                            |    6 -
 arch/m32r/kernel/smp.c                        |   12 +
 arch/mips/Kconfig                             |    1 
 arch/mips/kernel/cevt-smtc.c                  |    8 +
 arch/mips/kernel/smp.c                        |   16 +-
 arch/mips/kernel/smtc.c                       |    3 
 arch/mips/mm/c-octeon.c                       |    4 
 arch/mn10300/Kconfig                          |    1 
 arch/mn10300/kernel/smp.c                     |    2 
 arch/mn10300/mm/cache-smp.c                   |    5 
 arch/mn10300/mm/tlb-smp.c                     |   15 +
 arch/parisc/Kconfig                           |    1 
 arch/parisc/kernel/smp.c                      |    4 
 arch/powerpc/Kconfig                          |    1 
 arch/powerpc/mm/mmu_context_nohash.c          |    2 
 arch/s390/Kconfig                             |    1 
 arch/sh/Kconfig                               |    1 
 arch/sh/kernel/smp.c                          |   12 +
 arch/sparc/Kconfig                            |    1 
 arch/sparc/kernel/leon_smp.c                  |    2 
 arch/sparc/kernel/smp_64.c                    |    9 -
 arch/sparc/kernel/sun4d_smp.c                 |    2 
 arch/sparc/kernel/sun4m_smp.c                 |    3 
 arch/tile/kernel/smp.c                        |    4 
 arch/x86/Kconfig                              |    1 
 arch/x86/include/asm/ipi.h                    |    5 
 arch/x86/kernel/apic/apic_flat_64.c           |   10 +
 arch/x86/kernel/apic/apic_numachip.c          |    5 
 arch/x86/kernel/apic/es7000_32.c              |    5 
 arch/x86/kernel/apic/io_apic.c                |    7 -
 arch/x86/kernel/apic/ipi.c                    |   10 +
 arch/x86/kernel/apic/x2apic_cluster.c         |    4 
 arch/x86/kernel/apic/x2apic_uv_x.c            |    4 
 arch/x86/kernel/cpu/mcheck/therm_throt.c      |    4 
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |    5 
 arch/x86/kvm/vmx.c                            |    8 +
 arch/x86/mm/tlb.c                             |   14 +
 arch/x86/xen/mmu.c                            |   11 +
 arch/x86/xen/smp.c                            |    9 +
 block/blk-softirq.c                           |    4 
 crypto/pcrypt.c                               |    4 
 drivers/infiniband/hw/ehca/ehca_irq.c         |    8 +
 drivers/scsi/fcoe/fcoe.c                      |    7 +
 drivers/staging/octeon/ethernet-rx.c          |    3 
 include/linux/cpu.h                           |    8 +
 include/linux/percpu-rwlock.h                 |   74 +++++++
 include/linux/stop_machine.h                  |    2 
 init/Kconfig                                  |    2 
 kernel/cpu.c                                  |   59 +++++-
 kernel/irq/manage.c                           |    7 +
 kernel/sched/core.c                           |   36 +++-
 kernel/sched/fair.c                           |    5 
 kernel/sched/rt.c                             |    3 
 kernel/smp.c                                  |   65 ++++--
 kernel/softirq.c                              |    3 
 kernel/time/clockevents.c                     |    3 
 kernel/time/clocksource.c                     |    5 
 kernel/time/tick-broadcast.c                  |    2 
 kernel/timer.c                                |    2 
 lib/Kconfig                                   |    3 
 lib/Makefile                                  |    1 
 lib/percpu-rwlock.c                           |  256 +++++++++++++++++++++++++
 net/core/dev.c                                |    9 +
 virt/kvm/kvm_main.c                           |   10 +
 75 files changed, 776 insertions(+), 123 deletions(-)
 create mode 100644 include/linux/percpu-rwlock.h
 create mode 100644 lib/percpu-rwlock.c



Regards,
Srivatsa S. Bhat
IBM Linux Technology Center



* [PATCH v6 01/46] percpu_rwlock: Introduce the global reader-writer lock backend
@ 2013-02-18 12:38   ` Srivatsa S. Bhat
From: Srivatsa S. Bhat @ 2013-02-18 12:38 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

A straight-forward (and obvious) algorithm to implement Per-CPU Reader-Writer
locks can lead to so many deadlock possibilities that it becomes very hard, if
not impossible, to use. This is explained in the example below, which helps
justify the need for a different algorithm to implement flexible Per-CPU
Reader-Writer locks.

We can use global rwlocks as shown below safely, without fear of deadlocks:

Readers:

         CPU 0                                CPU 1
         ------                               ------

1.    spin_lock(&random_lock);             read_lock(&my_rwlock);


2.    read_lock(&my_rwlock);               spin_lock(&random_lock);


Writer:

         CPU 2:
         ------

       write_lock(&my_rwlock);


We can observe that there is no possibility of deadlocks or circular locking
dependencies here. It's perfectly safe.

Now consider a blind/straight-forward conversion of global rwlocks to per-CPU
rwlocks like this:

The reader locks its own per-CPU rwlock for read, and proceeds.

Something like: read_lock(per-cpu rwlock of this cpu);

The writer acquires all per-CPU rwlocks for write and only then proceeds.

Something like:

  for_each_online_cpu(cpu)
	write_lock(per-cpu rwlock of 'cpu');


Now let's say that for performance reasons, the above scenario (which was
perfectly safe when using global rwlocks) was converted to use per-CPU rwlocks.


         CPU 0                                CPU 1
         ------                               ------

1.    spin_lock(&random_lock);             read_lock(my_rwlock of CPU 1);


2.    read_lock(my_rwlock of CPU 0);       spin_lock(&random_lock);


Writer:

         CPU 2:
         ------

      for_each_online_cpu(cpu)
        write_lock(my_rwlock of 'cpu');


Consider what happens if the writer begins its operation between steps 1
and 2 on the reader side. It becomes evident that we end up in a (previously
non-existent) deadlock due to a circular locking dependency between the three
entities, like this:


(holds              Waiting for
 random_lock) CPU 0 -------------> CPU 2  (holds my_rwlock of CPU 0
                                               for write)
               ^                   |
               |                   |
        Waiting|                   | Waiting
          for  |                   |  for
               |                   V
                ------ CPU 1 <------

                (holds my_rwlock of
                 CPU 1 for read)



So this "straight-forward" way of implementing percpu rwlocks is clearly
deadlock-prone. One simple measure of (or characteristic of) a safe percpu
rwlock is that if a user replaces global rwlocks with per-CPU rwlocks (for
performance reasons), they shouldn't suddenly end up with numerous deadlock
possibilities that never existed before. The replacement should remain safe,
and perhaps improve performance.

Observing the robustness of global rwlocks in providing a fair amount of
deadlock safety, we implement per-CPU rwlocks as nothing but global rwlocks,
as a first step.
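
For reference, intended usage of the interface introduced here looks roughly
as follows (an illustrative sketch; my_pcpu_rwlock, my_subsys_init() and the
reader/writer functions are made-up names, not code from this series):

        #include <linux/init.h>
        #include <linux/percpu-rwlock.h>

        static struct percpu_rwlock my_pcpu_rwlock;

        static int __init my_subsys_init(void)
        {
                percpu_init_rwlock(&my_pcpu_rwlock);
                return 0;
        }

        static void my_reader(void)
        {
                percpu_read_lock(&my_pcpu_rwlock);
                /* ... read-side critical section ... */
                percpu_read_unlock(&my_pcpu_rwlock);
        }

        static void my_writer(void)
        {
                percpu_write_lock(&my_pcpu_rwlock);
                /* ... write-side critical section ... */
                percpu_write_unlock(&my_pcpu_rwlock);
        }

A user would also need to select PERCPU_RWLOCK in Kconfig, since the
lib/Kconfig hunk below only defines the (non-user-visible) option.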


Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 include/linux/percpu-rwlock.h |   49 ++++++++++++++++++++++++++++++++
 lib/Kconfig                   |    3 ++
 lib/Makefile                  |    1 +
 lib/percpu-rwlock.c           |   63 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 116 insertions(+)
 create mode 100644 include/linux/percpu-rwlock.h
 create mode 100644 lib/percpu-rwlock.c

diff --git a/include/linux/percpu-rwlock.h b/include/linux/percpu-rwlock.h
new file mode 100644
index 0000000..0caf81f
--- /dev/null
+++ b/include/linux/percpu-rwlock.h
@@ -0,0 +1,49 @@
+/*
+ * Flexible Per-CPU Reader-Writer Locks
+ * (with relaxed locking rules and reduced deadlock-possibilities)
+ *
+ * Copyright (C) IBM Corporation, 2012-2013
+ * Author: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
+ *
+ * With lots of invaluable suggestions from:
+ *	   Oleg Nesterov <oleg@redhat.com>
+ *	   Tejun Heo <tj@kernel.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_PERCPU_RWLOCK_H
+#define _LINUX_PERCPU_RWLOCK_H
+
+#include <linux/percpu.h>
+#include <linux/lockdep.h>
+#include <linux/spinlock.h>
+
+struct percpu_rwlock {
+	rwlock_t			global_rwlock;
+};
+
+extern void percpu_read_lock(struct percpu_rwlock *);
+extern void percpu_read_unlock(struct percpu_rwlock *);
+
+extern void percpu_write_lock(struct percpu_rwlock *);
+extern void percpu_write_unlock(struct percpu_rwlock *);
+
+extern int __percpu_init_rwlock(struct percpu_rwlock *,
+				const char *, struct lock_class_key *);
+
+#define percpu_init_rwlock(pcpu_rwlock)					\
+({	static struct lock_class_key rwlock_key;			\
+	__percpu_init_rwlock(pcpu_rwlock, #pcpu_rwlock, &rwlock_key);	\
+})
+
+#endif
diff --git a/lib/Kconfig b/lib/Kconfig
index 75cdb77..32fb0b9 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -45,6 +45,9 @@ config STMP_DEVICE
 config PERCPU_RWSEM
 	boolean
 
+config PERCPU_RWLOCK
+	boolean
+
 config CRC_CCITT
 	tristate "CRC-CCITT functions"
 	help
diff --git a/lib/Makefile b/lib/Makefile
index 02ed6c0..1854b5e 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -41,6 +41,7 @@ obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock_debug.o
 lib-$(CONFIG_RWSEM_GENERIC_SPINLOCK) += rwsem-spinlock.o
 lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
 lib-$(CONFIG_PERCPU_RWSEM) += percpu-rwsem.o
+lib-$(CONFIG_PERCPU_RWLOCK) += percpu-rwlock.o
 
 CFLAGS_hweight.o = $(subst $(quote),,$(CONFIG_ARCH_HWEIGHT_CFLAGS))
 obj-$(CONFIG_GENERIC_HWEIGHT) += hweight.o
diff --git a/lib/percpu-rwlock.c b/lib/percpu-rwlock.c
new file mode 100644
index 0000000..111a238
--- /dev/null
+++ b/lib/percpu-rwlock.c
@@ -0,0 +1,63 @@
+/*
+ * Flexible Per-CPU Reader-Writer Locks
+ * (with relaxed locking rules and reduced deadlock-possibilities)
+ *
+ * Copyright (C) IBM Corporation, 2012-2013
+ * Author: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
+ *
+ * With lots of invaluable suggestions from:
+ *	   Oleg Nesterov <oleg@redhat.com>
+ *	   Tejun Heo <tj@kernel.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/spinlock.h>
+#include <linux/percpu.h>
+#include <linux/lockdep.h>
+#include <linux/percpu-rwlock.h>
+#include <linux/errno.h>
+
+
+int __percpu_init_rwlock(struct percpu_rwlock *pcpu_rwlock,
+			 const char *name, struct lock_class_key *rwlock_key)
+{
+	/* ->global_rwlock represents the whole percpu_rwlock for lockdep */
+#ifdef CONFIG_DEBUG_SPINLOCK
+	__rwlock_init(&pcpu_rwlock->global_rwlock, name, rwlock_key);
+#else
+	pcpu_rwlock->global_rwlock =
+			__RW_LOCK_UNLOCKED(&pcpu_rwlock->global_rwlock);
+#endif
+	return 0;
+}
+
+void percpu_read_lock(struct percpu_rwlock *pcpu_rwlock)
+{
+	read_lock(&pcpu_rwlock->global_rwlock);
+}
+
+void percpu_read_unlock(struct percpu_rwlock *pcpu_rwlock)
+{
+	read_unlock(&pcpu_rwlock->global_rwlock);
+}
+
+void percpu_write_lock(struct percpu_rwlock *pcpu_rwlock)
+{
+	write_lock(&pcpu_rwlock->global_rwlock);
+}
+
+void percpu_write_unlock(struct percpu_rwlock *pcpu_rwlock)
+{
+	write_unlock(&pcpu_rwlock->global_rwlock);
+}
+



* [PATCH v6 01/46] percpu_rwlock: Introduce the global reader-writer lock backend
@ 2013-02-18 12:38   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:38 UTC (permalink / raw)
  To: linux-arm-kernel

A straight-forward (and obvious) algorithm to implement Per-CPU Reader-Writer
locks can lead to so many deadlock possibilities that it becomes very hard,
if not impossible, to use. This is explained in the example below, which helps
justify the need for a different algorithm to implement flexible Per-CPU
Reader-Writer locks.

We can use global rwlocks as shown below safely, without fear of deadlocks:

Readers:

         CPU 0                                CPU 1
         ------                               ------

1.    spin_lock(&random_lock);             read_lock(&my_rwlock);


2.    read_lock(&my_rwlock);               spin_lock(&random_lock);


Writer:

         CPU 2:
         ------

       write_lock(&my_rwlock);


We can observe that there is no possibility of deadlocks or circular locking
dependencies here. It's perfectly safe.

Now consider a blind/straight-forward conversion of global rwlocks to per-CPU
rwlocks like this:

The reader locks its own per-CPU rwlock for read, and proceeds.

Something like: read_lock(per-cpu rwlock of this cpu);

The writer acquires all per-CPU rwlocks for write and only then proceeds.

Something like:

  for_each_online_cpu(cpu)
	write_lock(per-cpu rwlock of 'cpu');


Now let's say that for performance reasons, the above scenario (which was
perfectly safe when using global rwlocks) was converted to use per-CPU rwlocks.


         CPU 0                                CPU 1
         ------                               ------

1.    spin_lock(&random_lock);             read_lock(my_rwlock of CPU 1);


2.    read_lock(my_rwlock of CPU 0);       spin_lock(&random_lock);


Writer:

         CPU 2:
         ------

      for_each_online_cpu(cpu)
        write_lock(my_rwlock of 'cpu');


Consider what happens if the writer begins his operation in between steps 1
and 2 at the reader side. It becomes evident that we end up in a (previously
non-existent) deadlock due to a circular locking dependency between the 3
entities, like this:


(holds              Waiting for
 random_lock) CPU 0 -------------> CPU 2  (holds my_rwlock of CPU 0
                                               for write)
               ^                   |
               |                   |
        Waiting|                   | Waiting
          for  |                   |  for
               |                   V
                ------ CPU 1 <------

                (holds my_rwlock of
                 CPU 1 for read)



So obviously this "straight-forward" way of implementing percpu rwlocks is
deadlock-prone. One simple measure of (or characteristic of) a safe percpu
rwlock should be that if a user replaces global rwlocks with per-CPU rwlocks
(for performance reasons), he shouldn't suddenly end up with numerous deadlock
possibilities that never existed before. The replacement should remain safe,
and perhaps improve the performance.

Observing the robustness of global rwlocks in providing a fair amount of
deadlock safety, we implement per-CPU rwlocks as nothing but global rwlocks,
as a first step.
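
For illustration, here is a minimal sketch of how a subsystem could use
these APIs (this snippet is not part of the patch; 'my_rwlock' and the
functions around it are made-up names):

static struct percpu_rwlock my_rwlock;

void my_subsystem_init(void)
{
	/* Sets up the lock and assigns it a lockdep class */
	percpu_init_rwlock(&my_rwlock);
}

void my_reader(void)
{
	percpu_read_lock(&my_rwlock);
	/* Read-side critical section */
	percpu_read_unlock(&my_rwlock);
}

void my_writer(void)
{
	percpu_write_lock(&my_rwlock);
	/* Write-side critical section */
	percpu_write_unlock(&my_rwlock);
}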


Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 include/linux/percpu-rwlock.h |   49 ++++++++++++++++++++++++++++++++
 lib/Kconfig                   |    3 ++
 lib/Makefile                  |    1 +
 lib/percpu-rwlock.c           |   63 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 116 insertions(+)
 create mode 100644 include/linux/percpu-rwlock.h
 create mode 100644 lib/percpu-rwlock.c

diff --git a/include/linux/percpu-rwlock.h b/include/linux/percpu-rwlock.h
new file mode 100644
index 0000000..0caf81f
--- /dev/null
+++ b/include/linux/percpu-rwlock.h
@@ -0,0 +1,49 @@
+/*
+ * Flexible Per-CPU Reader-Writer Locks
+ * (with relaxed locking rules and reduced deadlock-possibilities)
+ *
+ * Copyright (C) IBM Corporation, 2012-2013
+ * Author: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
+ *
+ * With lots of invaluable suggestions from:
+ *	   Oleg Nesterov <oleg@redhat.com>
+ *	   Tejun Heo <tj@kernel.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_PERCPU_RWLOCK_H
+#define _LINUX_PERCPU_RWLOCK_H
+
+#include <linux/percpu.h>
+#include <linux/lockdep.h>
+#include <linux/spinlock.h>
+
+struct percpu_rwlock {
+	rwlock_t			global_rwlock;
+};
+
+extern void percpu_read_lock(struct percpu_rwlock *);
+extern void percpu_read_unlock(struct percpu_rwlock *);
+
+extern void percpu_write_lock(struct percpu_rwlock *);
+extern void percpu_write_unlock(struct percpu_rwlock *);
+
+extern int __percpu_init_rwlock(struct percpu_rwlock *,
+				const char *, struct lock_class_key *);
+
+#define percpu_init_rwlock(pcpu_rwlock)					\
+({	static struct lock_class_key rwlock_key;			\
+	__percpu_init_rwlock(pcpu_rwlock, #pcpu_rwlock, &rwlock_key);	\
+})
+
+#endif
diff --git a/lib/Kconfig b/lib/Kconfig
index 75cdb77..32fb0b9 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -45,6 +45,9 @@ config STMP_DEVICE
 config PERCPU_RWSEM
 	boolean
 
+config PERCPU_RWLOCK
+	boolean
+
 config CRC_CCITT
 	tristate "CRC-CCITT functions"
 	help
diff --git a/lib/Makefile b/lib/Makefile
index 02ed6c0..1854b5e 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -41,6 +41,7 @@ obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock_debug.o
 lib-$(CONFIG_RWSEM_GENERIC_SPINLOCK) += rwsem-spinlock.o
 lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
 lib-$(CONFIG_PERCPU_RWSEM) += percpu-rwsem.o
+lib-$(CONFIG_PERCPU_RWLOCK) += percpu-rwlock.o
 
 CFLAGS_hweight.o = $(subst $(quote),,$(CONFIG_ARCH_HWEIGHT_CFLAGS))
 obj-$(CONFIG_GENERIC_HWEIGHT) += hweight.o
diff --git a/lib/percpu-rwlock.c b/lib/percpu-rwlock.c
new file mode 100644
index 0000000..111a238
--- /dev/null
+++ b/lib/percpu-rwlock.c
@@ -0,0 +1,63 @@
+/*
+ * Flexible Per-CPU Reader-Writer Locks
+ * (with relaxed locking rules and reduced deadlock-possibilities)
+ *
+ * Copyright (C) IBM Corporation, 2012-2013
+ * Author: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
+ *
+ * With lots of invaluable suggestions from:
+ *	   Oleg Nesterov <oleg@redhat.com>
+ *	   Tejun Heo <tj@kernel.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/spinlock.h>
+#include <linux/percpu.h>
+#include <linux/lockdep.h>
+#include <linux/percpu-rwlock.h>
+#include <linux/errno.h>
+
+
+int __percpu_init_rwlock(struct percpu_rwlock *pcpu_rwlock,
+			 const char *name, struct lock_class_key *rwlock_key)
+{
+	/* ->global_rwlock represents the whole percpu_rwlock for lockdep */
+#ifdef CONFIG_DEBUG_SPINLOCK
+	__rwlock_init(&pcpu_rwlock->global_rwlock, name, rwlock_key);
+#else
+	pcpu_rwlock->global_rwlock =
+			__RW_LOCK_UNLOCKED(&pcpu_rwlock->global_rwlock);
+#endif
+	return 0;
+}
+
+void percpu_read_lock(struct percpu_rwlock *pcpu_rwlock)
+{
+	read_lock(&pcpu_rwlock->global_rwlock);
+}
+
+void percpu_read_unlock(struct percpu_rwlock *pcpu_rwlock)
+{
+	read_unlock(&pcpu_rwlock->global_rwlock);
+}
+
+void percpu_write_lock(struct percpu_rwlock *pcpu_rwlock)
+{
+	write_lock(&pcpu_rwlock->global_rwlock);
+}
+
+void percpu_write_unlock(struct percpu_rwlock *pcpu_rwlock)
+{
+	write_unlock(&pcpu_rwlock->global_rwlock);
+}
+

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 02/46] percpu_rwlock: Introduce per-CPU variables for the reader and the writer
@ 2013-02-18 12:38   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:38 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Per-CPU rwlocks ought to give better performance than global rwlocks.
That is where the "per-CPU" component comes in. So introduce the per-CPU
variables needed at the reader and the writer sides, and add support for
dynamically initializing (and freeing) per-CPU rwlocks.
These per-CPU variables will be used subsequently to implement the core
algorithm behind per-CPU rwlocks.
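
As an illustration (not part of this patch), dynamic initialization now
involves a per-CPU allocation, so callers have to handle failure; a rough
sketch, with 'my_rwlock' and the functions being made-up names:

static struct percpu_rwlock my_rwlock;

int my_subsystem_init(void)
{
	int ret;

	ret = percpu_init_rwlock(&my_rwlock);
	if (ret)
		return ret;	/* -ENOMEM if alloc_percpu() failed */

	return 0;
}

void my_subsystem_exit(void)
{
	percpu_free_rwlock(&my_rwlock);
}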

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 include/linux/percpu-rwlock.h |    8 ++++++++
 lib/percpu-rwlock.c           |   12 ++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/include/linux/percpu-rwlock.h b/include/linux/percpu-rwlock.h
index 0caf81f..74eaf4d 100644
--- a/include/linux/percpu-rwlock.h
+++ b/include/linux/percpu-rwlock.h
@@ -28,7 +28,13 @@
 #include <linux/lockdep.h>
 #include <linux/spinlock.h>
 
+struct rw_state {
+	unsigned long	reader_refcnt;
+	bool		writer_signal;
+};
+
 struct percpu_rwlock {
+	struct rw_state __percpu	*rw_state;
 	rwlock_t			global_rwlock;
 };
 
@@ -41,6 +47,8 @@ extern void percpu_write_unlock(struct percpu_rwlock *);
 extern int __percpu_init_rwlock(struct percpu_rwlock *,
 				const char *, struct lock_class_key *);
 
+extern void percpu_free_rwlock(struct percpu_rwlock *);
+
 #define percpu_init_rwlock(pcpu_rwlock)					\
 ({	static struct lock_class_key rwlock_key;			\
 	__percpu_init_rwlock(pcpu_rwlock, #pcpu_rwlock, &rwlock_key);	\
diff --git a/lib/percpu-rwlock.c b/lib/percpu-rwlock.c
index 111a238..f938096 100644
--- a/lib/percpu-rwlock.c
+++ b/lib/percpu-rwlock.c
@@ -31,6 +31,10 @@
 int __percpu_init_rwlock(struct percpu_rwlock *pcpu_rwlock,
 			 const char *name, struct lock_class_key *rwlock_key)
 {
+	pcpu_rwlock->rw_state = alloc_percpu(struct rw_state);
+	if (unlikely(!pcpu_rwlock->rw_state))
+		return -ENOMEM;
+
 	/* ->global_rwlock represents the whole percpu_rwlock for lockdep */
 #ifdef CONFIG_DEBUG_SPINLOCK
 	__rwlock_init(&pcpu_rwlock->global_rwlock, name, rwlock_key);
@@ -41,6 +45,14 @@ int __percpu_init_rwlock(struct percpu_rwlock *pcpu_rwlock,
 	return 0;
 }
 
+void percpu_free_rwlock(struct percpu_rwlock *pcpu_rwlock)
+{
+	free_percpu(pcpu_rwlock->rw_state);
+
+	/* Catch use-after-free bugs */
+	pcpu_rwlock->rw_state = NULL;
+}
+
 void percpu_read_lock(struct percpu_rwlock *pcpu_rwlock)
 {
 	read_lock(&pcpu_rwlock->global_rwlock);


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 03/46] percpu_rwlock: Provide a way to define and init percpu-rwlocks at compile time
@ 2013-02-18 12:38   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:38 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Add support for defining and initializing percpu-rwlocks at compile time,
for those users who would like to use percpu-rwlocks really early in the boot
process (even before dynamic per-CPU allocations can begin).
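
As an illustration (not part of this patch), an early-boot user could
define and use such a lock as sketched below; 'early_lock' and the
function are made-up names:

DEFINE_STATIC_PERCPU_RWLOCK(early_lock);

void early_boot_reader(void)
{
	/* Works even before the dynamic per-CPU allocator is up */
	percpu_read_lock(&early_lock);
	/* Read-side critical section */
	percpu_read_unlock(&early_lock);
}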

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 include/linux/percpu-rwlock.h |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/percpu-rwlock.h b/include/linux/percpu-rwlock.h
index 74eaf4d..5590b1e 100644
--- a/include/linux/percpu-rwlock.h
+++ b/include/linux/percpu-rwlock.h
@@ -49,6 +49,21 @@ extern int __percpu_init_rwlock(struct percpu_rwlock *,
 
 extern void percpu_free_rwlock(struct percpu_rwlock *);
 
+
+#define __PERCPU_RWLOCK_INIT(name)					\
+	{								\
+		.rw_state = &name##_rw_state,				\
+		.global_rwlock = __RW_LOCK_UNLOCKED(name.global_rwlock) \
+	}
+
+#define DEFINE_PERCPU_RWLOCK(name)					\
+	static DEFINE_PER_CPU(struct rw_state, name##_rw_state);	\
+	struct percpu_rwlock (name) = __PERCPU_RWLOCK_INIT(name);
+
+#define DEFINE_STATIC_PERCPU_RWLOCK(name)				\
+	static DEFINE_PER_CPU(struct rw_state, name##_rw_state);	\
+	static struct percpu_rwlock(name) = __PERCPU_RWLOCK_INIT(name);
+
 #define percpu_init_rwlock(pcpu_rwlock)					\
 ({	static struct lock_class_key rwlock_key;			\
 	__percpu_init_rwlock(pcpu_rwlock, #pcpu_rwlock, &rwlock_key);	\


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
@ 2013-02-18 12:38   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:38 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Using global rwlocks as the backend for per-CPU rwlocks helps us avoid many
lock-ordering related problems (unlike per-cpu locks). However, global
rwlocks lead to unnecessary cache-line bouncing even when there are no
writers present, which can slow down the system needlessly.

Per-cpu counters can help solve the cache-line bouncing problem. So we
actually use the best of both: per-cpu counters (no waiting) at the reader
side in the fastpath, and the global rwlock in the slowpath.

[ Fastpath = no writer is active; Slowpath = a writer is active ]

IOW, the readers just increment/decrement their per-cpu refcounts (disabling
interrupts during the updates, if necessary) when no writer is active.
When a writer becomes active, he signals all readers to switch to global
rwlocks for the duration of his activity. The readers switch over when it
is safe for them (i.e., when they are about to start a fresh, non-nested
read-side critical section) and start using (holding) the global rwlock for
read in their subsequent critical sections.

The writer waits for every existing reader to switch, and then acquires the
global rwlock for write and enters his critical section. Later, the writer
signals all readers that he is done, and that they can go back to using their
per-cpu refcounts again.

Note that the lock-safety (despite the per-cpu scheme) comes from the fact
that the readers can *choose* _when_ to switch to rwlocks upon the writer's
signal. And the readers don't wait on anybody based on the per-cpu counters.
The only true synchronization that involves waiting at the reader-side in this
scheme, is the one arising from the global rwlock, which is safe from circular
locking dependency issues.

Reader-writer locks and per-cpu counters are recursive, so they can be
used in a nested fashion in the reader-path, which makes per-CPU rwlocks also
recursive. Also, this design of switching the synchronization scheme ensures
that you can safely nest and use these locks in a very flexible manner.
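
As an illustration of the nesting property (this sketch is not part of
the patch; 'my_rwlock' and the functions are made-up names):

DEFINE_STATIC_PERCPU_RWLOCK(my_rwlock);

static void inner_reader(void)
{
	percpu_read_lock(&my_rwlock);	/* nested: only bumps the refcnt */
	/* Read-side critical section */
	percpu_read_unlock(&my_rwlock);
}

void outer_reader(void)
{
	percpu_read_lock(&my_rwlock);	/* fastpath if no writer is active */
	inner_reader();			/* safe: readers are recursive */
	percpu_read_unlock(&my_rwlock);
}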

I'm indebted to Michael Wang and Xiao Guangrong for their numerous thoughtful
suggestions and ideas, which inspired and influenced many of the decisions in
this as well as previous designs. Thanks a lot Michael and Xiao!

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 lib/percpu-rwlock.c |  139 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 137 insertions(+), 2 deletions(-)

diff --git a/lib/percpu-rwlock.c b/lib/percpu-rwlock.c
index f938096..edefdea 100644
--- a/lib/percpu-rwlock.c
+++ b/lib/percpu-rwlock.c
@@ -27,6 +27,24 @@
 #include <linux/percpu-rwlock.h>
 #include <linux/errno.h>
 
+#include <asm/processor.h>
+
+
+#define reader_yet_to_switch(pcpu_rwlock, cpu)				    \
+	(ACCESS_ONCE(per_cpu_ptr((pcpu_rwlock)->rw_state, cpu)->reader_refcnt))
+
+#define reader_percpu_nesting_depth(pcpu_rwlock)		  \
+	(__this_cpu_read((pcpu_rwlock)->rw_state->reader_refcnt))
+
+#define reader_uses_percpu_refcnt(pcpu_rwlock)				\
+				reader_percpu_nesting_depth(pcpu_rwlock)
+
+#define reader_nested_percpu(pcpu_rwlock)				\
+			(reader_percpu_nesting_depth(pcpu_rwlock) > 1)
+
+#define writer_active(pcpu_rwlock)					\
+	(__this_cpu_read((pcpu_rwlock)->rw_state->writer_signal))
+
 
 int __percpu_init_rwlock(struct percpu_rwlock *pcpu_rwlock,
 			 const char *name, struct lock_class_key *rwlock_key)
@@ -55,21 +73,138 @@ void percpu_free_rwlock(struct percpu_rwlock *pcpu_rwlock)
 
 void percpu_read_lock(struct percpu_rwlock *pcpu_rwlock)
 {
-	read_lock(&pcpu_rwlock->global_rwlock);
+	preempt_disable();
+
+	/*
+	 * Let the writer know that a reader is active, even before we choose
+	 * our reader-side synchronization scheme.
+	 */
+	this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
+
+	/*
+	 * If we are already using per-cpu refcounts, it is not safe to switch
+	 * the synchronization scheme. So continue using the refcounts.
+	 */
+	if (reader_nested_percpu(pcpu_rwlock))
+		return;
+
+	/*
+	 * The write to 'reader_refcnt' must be visible before we read
+	 * 'writer_signal'.
+	 */
+	smp_mb();
+
+	if (likely(!writer_active(pcpu_rwlock))) {
+		goto out;
+	} else {
+		/* Writer is active, so switch to global rwlock. */
+		read_lock(&pcpu_rwlock->global_rwlock);
+
+		/*
+		 * We might have raced with a writer going inactive before we
+		 * took the read-lock. So re-evaluate whether we still need to
+		 * hold the rwlock or if we can switch back to per-cpu
+		 * refcounts. (This also helps avoid heterogeneous nesting of
+		 * readers).
+		 */
+		if (writer_active(pcpu_rwlock)) {
+			/*
+			 * The above writer_active() check can get reordered
+			 * with this_cpu_dec() below, but this is OK, because
+			 * holding the rwlock is conservative.
+			 */
+			this_cpu_dec(pcpu_rwlock->rw_state->reader_refcnt);
+		} else {
+			read_unlock(&pcpu_rwlock->global_rwlock);
+		}
+	}
+
+out:
+	/* Prevent reordering of any subsequent reads/writes */
+	smp_mb();
 }
 
 void percpu_read_unlock(struct percpu_rwlock *pcpu_rwlock)
 {
-	read_unlock(&pcpu_rwlock->global_rwlock);
+	/*
+	 * We never allow heterogeneous nesting of readers. So it is trivial
+	 * to find out the kind of reader we are, and undo the operation
+	 * done by our corresponding percpu_read_lock().
+	 */
+
+	/* Try to fast-path: a nested percpu reader is the simplest case */
+	if (reader_nested_percpu(pcpu_rwlock)) {
+		this_cpu_dec(pcpu_rwlock->rw_state->reader_refcnt);
+		preempt_enable();
+		return;
+	}
+
+	/*
+	 * Now we are left with only 2 options: a non-nested percpu reader,
+	 * or a reader holding rwlock
+	 */
+	if (reader_uses_percpu_refcnt(pcpu_rwlock)) {
+		/*
+		 * Complete the critical section before decrementing the
+		 * refcnt. We can optimize this away if we are a nested
+		 * reader (the case above).
+		 */
+		smp_mb();
+		this_cpu_dec(pcpu_rwlock->rw_state->reader_refcnt);
+	} else {
+		read_unlock(&pcpu_rwlock->global_rwlock);
+	}
+
+	preempt_enable();
 }
 
 void percpu_write_lock(struct percpu_rwlock *pcpu_rwlock)
 {
+	unsigned int cpu;
+
+	/*
+	 * Tell all readers that a writer is becoming active, so that they
+	 * start switching over to the global rwlock.
+	 */
+	for_each_possible_cpu(cpu)
+		per_cpu_ptr(pcpu_rwlock->rw_state, cpu)->writer_signal = true;
+
+	smp_mb();
+
+	/*
+	 * Wait for every reader to see the writer's signal and switch from
+	 * percpu refcounts to global rwlock.
+	 *
+	 * If a reader is still using percpu refcounts, wait for him to switch.
+	 * Else, we can safely go ahead, because either the reader has already
+	 * switched over, or the next reader that comes along on that CPU will
+	 * notice the writer's signal and will switch over to the rwlock.
+	 */
+
+	for_each_possible_cpu(cpu) {
+		while (reader_yet_to_switch(pcpu_rwlock, cpu))
+			cpu_relax();
+	}
+
+	smp_mb(); /* Complete the wait-for-readers, before taking the lock */
 	write_lock(&pcpu_rwlock->global_rwlock);
 }
 
 void percpu_write_unlock(struct percpu_rwlock *pcpu_rwlock)
 {
+	unsigned int cpu;
+
+	/* Complete the critical section before clearing ->writer_signal */
+	smp_mb();
+
+	/*
+	 * Inform all readers that we are done, so that they can switch back
+	 * to their per-cpu refcounts. (We don't need to wait for them to
+	 * see it).
+	 */
+	for_each_possible_cpu(cpu)
+		per_cpu_ptr(pcpu_rwlock->rw_state, cpu)->writer_signal = false;
+
 	write_unlock(&pcpu_rwlock->global_rwlock);
 }
 


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 05/46] percpu_rwlock: Make percpu-rwlocks IRQ-safe, optimally
@ 2013-02-18 12:39   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:39 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

If interrupt handlers can also be readers, then one of the ways to make
per-CPU rwlocks safe is to disable interrupts at the reader side before
trying to acquire the per-CPU rwlock, and to keep them disabled throughout
the duration of the read-side critical section.

The goal is to avoid cases such as:

  1. writer is active and it holds the global rwlock for write

  2. a regular reader comes in and marks itself as present (by incrementing
     its per-CPU refcount) before checking whether the writer is active.

  3. an interrupt hits the reader;
     [If it had not hit, the reader would have noticed that the writer is
      active and would have decremented its refcount and would have tried
      to acquire the global rwlock for read].
     Since the interrupt handler also happens to be a reader, it notices
     the non-zero refcount (which was due to the reader who got interrupted)
     and thinks that this is a nested read-side critical section and
     proceeds to take the fastpath, which is wrong. The interrupt handler
     should have noticed that the writer is active and taken the rwlock
     for read.

So, disabling interrupts can help avoid this problem (at the cost of keeping
interrupts disabled for quite a long time).

But Oleg had a brilliant idea by which we can do much better than that:
we can manage by disabling interrupts _just_ during the updates (writes to
per-CPU refcounts) to safeguard against races with interrupt handlers.
Beyond that, we can keep the interrupts enabled and still be safe w.r.t.
interrupt handlers that can act as readers.

Basically, the idea is that we differentiate between the *part* of the
per-CPU refcount that we use for reference counting and the part that we
use merely to make the writer wait for us to switch over to the right
synchronization scheme.

The scheme involves splitting the per-CPU refcounts into 2 parts:
e.g., the lower 16 bits are used to track the nesting depth of the reader
(a "nested-counter"), and the remaining (upper) bits are used merely to
mark the presence of the reader.

As long as the overall reader_refcnt is non-zero, the writer waits for the
reader (assuming that the reader is still actively using per-CPU refcounts for
synchronization).

The reader first sets one of the higher bits to mark its presence, and then
uses the lower 16 bits to manage the nesting depth. So, an interrupt handler
coming in as illustrated above will be able to distinguish between "this is
a nested read-side critical section" and "we have merely marked our presence
to make the writer wait for us to switch" by looking at the same refcount.
This makes it unnecessary to keep interrupts disabled throughout the
read-side critical section, despite the possibility of interrupt handlers
being readers themselves.
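
As an illustration (not part of this patch), the distinction that an
interrupt handler relies on can be expressed roughly as below; the helper
function is a made-up name, while READER_PRESENT and READER_REFCNT_MASK
are the constants introduced by this patch:

#define READER_PRESENT		(1UL << 16)
#define READER_REFCNT_MASK	(READER_PRESENT - 1)

static inline bool reader_in_nested_critical_section(unsigned long refcnt)
{
	/*
	 * A refcnt of exactly READER_PRESENT means the interrupted
	 * reader had only marked its presence (nesting depth 0) and
	 * had not yet committed to the percpu fastpath, so an
	 * interrupt handler must not take the fastpath either.
	 */
	return (refcnt & READER_REFCNT_MASK) > 0;
}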


Implement this logic and rename the locking functions appropriately, to
reflect what they do.

Based-on-idea-by: Oleg Nesterov <oleg@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 include/linux/percpu-rwlock.h |   10 ++++---
 lib/percpu-rwlock.c           |   57 ++++++++++++++++++++++++++---------------
 2 files changed, 42 insertions(+), 25 deletions(-)

diff --git a/include/linux/percpu-rwlock.h b/include/linux/percpu-rwlock.h
index 5590b1e..8c9e145 100644
--- a/include/linux/percpu-rwlock.h
+++ b/include/linux/percpu-rwlock.h
@@ -38,11 +38,13 @@ struct percpu_rwlock {
 	rwlock_t			global_rwlock;
 };
 
-extern void percpu_read_lock(struct percpu_rwlock *);
-extern void percpu_read_unlock(struct percpu_rwlock *);
+extern void percpu_read_lock_irqsafe(struct percpu_rwlock *);
+extern void percpu_read_unlock_irqsafe(struct percpu_rwlock *);
 
-extern void percpu_write_lock(struct percpu_rwlock *);
-extern void percpu_write_unlock(struct percpu_rwlock *);
+extern void percpu_write_lock_irqsave(struct percpu_rwlock *,
+				      unsigned long *flags);
+extern void percpu_write_unlock_irqrestore(struct percpu_rwlock *,
+					   unsigned long *flags);
 
 extern int __percpu_init_rwlock(struct percpu_rwlock *,
 				const char *, struct lock_class_key *);
diff --git a/lib/percpu-rwlock.c b/lib/percpu-rwlock.c
index edefdea..ce7e440 100644
--- a/lib/percpu-rwlock.c
+++ b/lib/percpu-rwlock.c
@@ -30,11 +30,15 @@
 #include <asm/processor.h>
 
 
+#define READER_PRESENT		(1UL << 16)
+#define READER_REFCNT_MASK	(READER_PRESENT - 1)
+
 #define reader_yet_to_switch(pcpu_rwlock, cpu)				    \
 	(ACCESS_ONCE(per_cpu_ptr((pcpu_rwlock)->rw_state, cpu)->reader_refcnt))
 
-#define reader_percpu_nesting_depth(pcpu_rwlock)		  \
-	(__this_cpu_read((pcpu_rwlock)->rw_state->reader_refcnt))
+#define reader_percpu_nesting_depth(pcpu_rwlock)			\
+	(__this_cpu_read((pcpu_rwlock)->rw_state->reader_refcnt) &	\
+	 READER_REFCNT_MASK)
 
 #define reader_uses_percpu_refcnt(pcpu_rwlock)				\
 				reader_percpu_nesting_depth(pcpu_rwlock)
@@ -71,7 +75,7 @@ void percpu_free_rwlock(struct percpu_rwlock *pcpu_rwlock)
 	pcpu_rwlock->rw_state = NULL;
 }
 
-void percpu_read_lock(struct percpu_rwlock *pcpu_rwlock)
+void percpu_read_lock_irqsafe(struct percpu_rwlock *pcpu_rwlock)
 {
 	preempt_disable();
 
@@ -79,14 +83,18 @@ void percpu_read_lock(struct percpu_rwlock *pcpu_rwlock)
 	 * Let the writer know that a reader is active, even before we choose
 	 * our reader-side synchronization scheme.
 	 */
-	this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
+	this_cpu_add(pcpu_rwlock->rw_state->reader_refcnt, READER_PRESENT);
 
 	/*
 	 * If we are already using per-cpu refcounts, it is not safe to switch
 	 * the synchronization scheme. So continue using the refcounts.
 	 */
-	if (reader_nested_percpu(pcpu_rwlock))
+	if (reader_uses_percpu_refcnt(pcpu_rwlock)) {
+		this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
+		this_cpu_sub(pcpu_rwlock->rw_state->reader_refcnt,
+			     READER_PRESENT);
 		return;
+	}
 
 	/*
 	 * The write to 'reader_refcnt' must be visible before we read
@@ -95,9 +103,19 @@ void percpu_read_lock(struct percpu_rwlock *pcpu_rwlock)
 	smp_mb();
 
 	if (likely(!writer_active(pcpu_rwlock))) {
-		goto out;
+		this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
 	} else {
 		/* Writer is active, so switch to global rwlock. */
+
+		/*
+		 * While we are spinning on ->global_rwlock, an
+		 * interrupt can hit us, and the interrupt handler
+		 * might call this function. The distinction between
+		 * READER_PRESENT and the refcnt helps ensure that the
+		 * interrupt handler also takes this branch and spins
+		 * on the ->global_rwlock, as long as the writer is
+		 * active.
+		 */
 		read_lock(&pcpu_rwlock->global_rwlock);
 
 		/*
@@ -107,29 +125,24 @@ void percpu_read_lock(struct percpu_rwlock *pcpu_rwlock)
 		 * refcounts. (This also helps avoid heterogeneous nesting of
 		 * readers).
 		 */
-		if (writer_active(pcpu_rwlock)) {
-			/*
-			 * The above writer_active() check can get reordered
-			 * with this_cpu_dec() below, but this is OK, because
-			 * holding the rwlock is conservative.
-			 */
-			this_cpu_dec(pcpu_rwlock->rw_state->reader_refcnt);
-		} else {
+		if (!writer_active(pcpu_rwlock)) {
+			this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
 			read_unlock(&pcpu_rwlock->global_rwlock);
 		}
 	}
 
-out:
+	this_cpu_sub(pcpu_rwlock->rw_state->reader_refcnt, READER_PRESENT);
+
 	/* Prevent reordering of any subsequent reads/writes */
 	smp_mb();
 }
 
-void percpu_read_unlock(struct percpu_rwlock *pcpu_rwlock)
+void percpu_read_unlock_irqsafe(struct percpu_rwlock *pcpu_rwlock)
 {
 	/*
 	 * We never allow heterogeneous nesting of readers. So it is trivial
 	 * to find out the kind of reader we are, and undo the operation
-	 * done by our corresponding percpu_read_lock().
+	 * done by our corresponding percpu_read_lock_irqsafe().
 	 */
 
 	/* Try to fast-path: a nested percpu reader is the simplest case */
@@ -158,7 +171,8 @@ void percpu_read_unlock(struct percpu_rwlock *pcpu_rwlock)
 	preempt_enable();
 }
 
-void percpu_write_lock(struct percpu_rwlock *pcpu_rwlock)
+void percpu_write_lock_irqsave(struct percpu_rwlock *pcpu_rwlock,
+			       unsigned long *flags)
 {
 	unsigned int cpu;
 
@@ -187,10 +201,11 @@ void percpu_write_lock(struct percpu_rwlock *pcpu_rwlock)
 	}
 
 	smp_mb(); /* Complete the wait-for-readers, before taking the lock */
-	write_lock(&pcpu_rwlock->global_rwlock);
+	write_lock_irqsave(&pcpu_rwlock->global_rwlock, *flags);
 }
 
-void percpu_write_unlock(struct percpu_rwlock *pcpu_rwlock)
+void percpu_write_unlock_irqrestore(struct percpu_rwlock *pcpu_rwlock,
+				    unsigned long *flags)
 {
 	unsigned int cpu;
 
@@ -205,6 +220,6 @@ void percpu_write_unlock(struct percpu_rwlock *pcpu_rwlock)
 	for_each_possible_cpu(cpu)
 		per_cpu_ptr(pcpu_rwlock->rw_state, cpu)->writer_signal = false;
 
-	write_unlock(&pcpu_rwlock->global_rwlock);
+	write_unlock_irqrestore(&pcpu_rwlock->global_rwlock, *flags);
 }
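
For illustration, here is a minimal sketch (not part of the patch) of how the
two halves of reader_refcnt are interpreted. READER_PRESENT and
READER_REFCNT_MASK mirror the definitions above; the helper names are made up
for this example:

#include <linux/types.h>

#define READER_PRESENT		(1UL << 16)
#define READER_REFCNT_MASK	(READER_PRESENT - 1)

/* Example-only helper: are we already inside a read-side critical section? */
static inline bool reader_is_nested(unsigned long refcnt)
{
	/* Non-zero low 16 bits => nesting depth > 0, take the fastpath. */
	return (refcnt & READER_REFCNT_MASK) != 0;
}

/* Example-only helper: has this reader merely announced its presence? */
static inline bool reader_marked_presence_only(unsigned long refcnt)
{
	/*
	 * Only the high bits are set: the reader has told the writer to
	 * wait, but has not yet committed to the per-CPU fastpath. An
	 * interrupt handler seeing this must re-check writer_active()
	 * and, if needed, fall back to the global rwlock.
	 */
	return refcnt != 0 && (refcnt & READER_REFCNT_MASK) == 0;
}

So an interrupt handler that observes a value of exactly READER_PRESENT takes
the global-rwlock branch rather than the nested fastpath.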
 


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 06/46] percpu_rwlock: Rearrange the read-lock code to fastpath nested percpu readers
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:39   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:39 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

If we are dealing with a nested percpu reader, we can optimize away quite
a few costly operations. Improve that fastpath further by rearranging the
code a bit, to avoid the unnecessary addition and subtraction of
'READER_PRESENT' to/from reader_refcnt.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 lib/percpu-rwlock.c |   14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/lib/percpu-rwlock.c b/lib/percpu-rwlock.c
index ce7e440..ed36531 100644
--- a/lib/percpu-rwlock.c
+++ b/lib/percpu-rwlock.c
@@ -80,23 +80,21 @@ void percpu_read_lock_irqsafe(struct percpu_rwlock *pcpu_rwlock)
 	preempt_disable();
 
 	/*
-	 * Let the writer know that a reader is active, even before we choose
-	 * our reader-side synchronization scheme.
-	 */
-	this_cpu_add(pcpu_rwlock->rw_state->reader_refcnt, READER_PRESENT);
-
-	/*
 	 * If we are already using per-cpu refcounts, it is not safe to switch
 	 * the synchronization scheme. So continue using the refcounts.
 	 */
 	if (reader_uses_percpu_refcnt(pcpu_rwlock)) {
 		this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
-		this_cpu_sub(pcpu_rwlock->rw_state->reader_refcnt,
-			     READER_PRESENT);
 		return;
 	}
 
 	/*
+	 * Let the writer know that a reader is active, even before we choose
+	 * our reader-side synchronization scheme.
+	 */
+	this_cpu_add(pcpu_rwlock->rw_state->reader_refcnt, READER_PRESENT);
+
+	/*
 	 * The write to 'reader_refcnt' must be visible before we read
 	 * 'writer_signal'.
 	 */


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 07/46] percpu_rwlock: Allow writers to be readers, and add lockdep annotations
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:39   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:39 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

CPU hotplug (which will be the first user of per-CPU rwlocks) has a special
requirement with respect to locking: the writer, after acquiring the per-CPU
rwlock for write, must be allowed to take the same lock for read, without
deadlocking and without getting complaints from lockdep. This is similar to
what get_online_cpus()/put_online_cpus() do today: they allow a hotplug
writer (which holds the cpu_hotplug.lock mutex) to invoke them without
locking issues, because they silently return if the caller is the hotplug
writer itself.

This can be easily achieved with per-CPU rwlocks as well (even without a
"is this a writer?" check) by incrementing the per-CPU refcount of the writer
immediately after taking the global rwlock for write, and then decrementing
the per-CPU refcount before releasing the global rwlock.
This ensures that any reader that comes along on that CPU while the writer is
active on that same CPU notices the non-zero value of the nested counter,
assumes that it is a nested read-side critical section, and proceeds by
just incrementing the refcount. Thus we prevent the reader from taking the
global rwlock for read, which prevents the writer from deadlocking itself.

Add that support and teach lockdep about this special locking scheme so
that it knows that this sort of usage is valid. Also add the required lockdep
annotations to enable it to detect common locking problems with per-CPU
rwlocks.
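
To make this concrete, a rough sketch of the call pattern that this enables
(the function name below is made up; the lock/unlock APIs are the ones
introduced earlier in this series):

void example_writer_path(struct percpu_rwlock *pcpu_rwlock)
{
	unsigned long flags;

	percpu_write_lock_irqsave(pcpu_rwlock, &flags);

	/*
	 * Anything called from here may take the same lock for read.
	 * Because the write-lock path bumped this CPU's reader_refcnt,
	 * such a reader is treated as a nested percpu reader and never
	 * touches global_rwlock, so the writer cannot deadlock on itself.
	 */
	percpu_read_lock_irqsafe(pcpu_rwlock);
	/* ... read-side work ... */
	percpu_read_unlock_irqsafe(pcpu_rwlock);

	percpu_write_unlock_irqrestore(pcpu_rwlock, &flags);
}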

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 lib/percpu-rwlock.c |   33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/lib/percpu-rwlock.c b/lib/percpu-rwlock.c
index ed36531..bf95e40 100644
--- a/lib/percpu-rwlock.c
+++ b/lib/percpu-rwlock.c
@@ -102,6 +102,10 @@ void percpu_read_lock_irqsafe(struct percpu_rwlock *pcpu_rwlock)
 
 	if (likely(!writer_active(pcpu_rwlock))) {
 		this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
+
+		/* Pretend that we take global_rwlock for lockdep */
+		rwlock_acquire_read(&pcpu_rwlock->global_rwlock.dep_map,
+				    0, 0, _RET_IP_);
 	} else {
 		/* Writer is active, so switch to global rwlock. */
 
@@ -126,6 +130,12 @@ void percpu_read_lock_irqsafe(struct percpu_rwlock *pcpu_rwlock)
 		if (!writer_active(pcpu_rwlock)) {
 			this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
 			read_unlock(&pcpu_rwlock->global_rwlock);
+
+			/*
+			 * Pretend that we take global_rwlock for lockdep
+			 */
+			rwlock_acquire_read(&pcpu_rwlock->global_rwlock.dep_map,
+					    0, 0, _RET_IP_);
 		}
 	}
 
@@ -162,6 +172,13 @@ void percpu_read_unlock_irqsafe(struct percpu_rwlock *pcpu_rwlock)
 		 */
 		smp_mb();
 		this_cpu_dec(pcpu_rwlock->rw_state->reader_refcnt);
+
+		/*
+		 * Since this is the last decrement, it is time to pretend
+		 * to lockdep that we are releasing the read lock.
+		 */
+		rwlock_release(&pcpu_rwlock->global_rwlock.dep_map,
+			       1, _RET_IP_);
 	} else {
 		read_unlock(&pcpu_rwlock->global_rwlock);
 	}
@@ -200,6 +217,16 @@ void percpu_write_lock_irqsave(struct percpu_rwlock *pcpu_rwlock,
 
 	smp_mb(); /* Complete the wait-for-readers, before taking the lock */
 	write_lock_irqsave(&pcpu_rwlock->global_rwlock, *flags);
+
+	/*
+	 * It is desirable to allow the writer to acquire the percpu-rwlock
+	 * for read (if necessary), without deadlocking or getting complaints
+	 * from lockdep. To achieve that, just increment the reader_refcnt of
+	 * this CPU - that way, any attempt by the writer to acquire the
+	 * percpu-rwlock for read, will get treated as a case of nested percpu
+	 * reader, which is safe, from a locking perspective.
+	 */
+	this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
 }
 
 void percpu_write_unlock_irqrestore(struct percpu_rwlock *pcpu_rwlock,
@@ -207,6 +234,12 @@ void percpu_write_unlock_irqrestore(struct percpu_rwlock *pcpu_rwlock,
 {
 	unsigned int cpu;
 
+	/*
+	 * Undo the special increment that we had done in the write-lock path
+	 * in order to allow writers to be readers.
+	 */
+	this_cpu_dec(pcpu_rwlock->rw_state->reader_refcnt);
+
 	/* Complete the critical section before clearing ->writer_signal */
 	smp_mb();
 


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 08/46] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:39   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:39 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

There are places where preempt_disable() or local_irq_disable() are used
to prevent any CPU from going offline during the critical section. Let us
call them "atomic hotplug readers" ("atomic" because they run in atomic,
non-preemptible contexts).

Today, preempt_disable() or its equivalent works because the hotplug writer
uses stop_machine() to take CPUs offline. But once stop_machine() is gone
from the CPU hotplug offline path, the readers won't be able to prevent
CPUs from going offline using preempt_disable().

So the intent here is to provide synchronization APIs for such atomic hotplug
readers, to prevent (any) CPUs from going offline, without depending on
stop_machine() at the writer-side. The new APIs will look something like
this: get_online_cpus_atomic() and put_online_cpus_atomic().
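
As a rough usage sketch (the function below is hypothetical), an atomic reader
that currently relies on preempt_disable() would be converted like this:

static void example_atomic_reader(void)
{
	unsigned int cpu;

	get_online_cpus_atomic();	/* was: preempt_disable(); */

	for_each_online_cpu(cpu) {
		/*
		 * 'cpu' cannot go offline anywhere in this section, even
		 * though interrupts may remain enabled.
		 */
		/* ... e.g. send an IPI to 'cpu' or queue work on it ... */
	}

	put_online_cpus_atomic();	/* was: preempt_enable(); */
}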

Some important design requirements and considerations:
-----------------------------------------------------

1. Scalable synchronization at the reader-side, especially in the fast-path

   Any synchronization at the atomic hotplug readers side must be highly
   scalable - avoid global single-holder locks/counters etc. Because, these
   paths currently use the extremely fast preempt_disable(); our replacement
   to preempt_disable() should not become ridiculously costly and also should
   not serialize the readers among themselves needlessly.

   At a minimum, the new APIs must be extremely fast at the reader side
   at least in the fast-path, when no CPU offline writers are active.

2. preempt_disable() was recursive. The replacement should also be recursive.

3. No (new) lock-ordering restrictions

   preempt_disable() was super-flexible. It didn't impose any ordering
   restrictions or rules for nesting. Our replacement should also be equally
   flexible and usable.

4. No deadlock possibilities

   Regular per-cpu locking is not the way to go if we want to have relaxed
   rules for lock-ordering, because we can end up in circular-locking
   dependencies, as explained in https://lkml.org/lkml/2012/12/6/290

   So, avoid the usual per-cpu locking schemes (per-cpu locks/per-cpu atomic
   counters with spin-on-contention etc) as much as possible, to avoid
   numerous deadlock possibilities from creeping in.


Implementation of the design:
----------------------------

We use per-CPU reader-writer locks for synchronization because:

  a. They are quite fast and scalable in the fast-path (when no writers are
     active), since they use fast per-cpu counters in those paths.

  b. They are recursive at the reader side.

  c. They provide a good amount of safety against deadlocks; they don't
     spring new deadlock possibilities on us from out of nowhere. As a
     result, they have relaxed locking rules and are quite flexible, and
     thus are best suited for replacing usages of preempt_disable() or
     local_irq_disable() at the reader side.

Together, these satisfy all the requirements mentioned above.

I'm indebted to Michael Wang and Xiao Guangrong for their numerous thoughtful
suggestions and ideas, which inspired and influenced many of the decisions in
this as well as previous designs. Thanks a lot Michael and Xiao!

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: uclinux-dist-devel@blackfin.uclinux.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-am33-list@redhat.com
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 arch/arm/Kconfig      |    1 +
 arch/blackfin/Kconfig |    1 +
 arch/ia64/Kconfig     |    1 +
 arch/mips/Kconfig     |    1 +
 arch/mn10300/Kconfig  |    1 +
 arch/parisc/Kconfig   |    1 +
 arch/powerpc/Kconfig  |    1 +
 arch/s390/Kconfig     |    1 +
 arch/sh/Kconfig       |    1 +
 arch/sparc/Kconfig    |    1 +
 arch/x86/Kconfig      |    1 +
 include/linux/cpu.h   |    4 ++++
 kernel/cpu.c          |   55 +++++++++++++++++++++++++++++++++++++++++++++++--
 13 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 67874b8..cb6b94b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1616,6 +1616,7 @@ config NR_CPUS
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP && HOTPLUG
+	select PERCPU_RWLOCK
 	help
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
 	  can be controlled through /sys/devices/system/cpu.
diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig
index b6f3ad5..83d9882 100644
--- a/arch/blackfin/Kconfig
+++ b/arch/blackfin/Kconfig
@@ -261,6 +261,7 @@ config NR_CPUS
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP && HOTPLUG
+	select PERCPU_RWLOCK
 	default y
 
 config BF_REV_MIN
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 3279646..c246772 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -378,6 +378,7 @@ config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs (EXPERIMENTAL)"
 	depends on SMP && EXPERIMENTAL
 	select HOTPLUG
+	select PERCPU_RWLOCK
 	default n
 	---help---
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 2ac626a..f97c479 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -956,6 +956,7 @@ config SYS_HAS_EARLY_PRINTK
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP && HOTPLUG && SYS_SUPPORTS_HOTPLUG_CPU
+	select PERCPU_RWLOCK
 	help
 	  Say Y here to allow turning CPUs off and on. CPUs can be
 	  controlled through /sys/devices/system/cpu.
diff --git a/arch/mn10300/Kconfig b/arch/mn10300/Kconfig
index e70001c..a64e488 100644
--- a/arch/mn10300/Kconfig
+++ b/arch/mn10300/Kconfig
@@ -60,6 +60,7 @@ config ARCH_HAS_ILOG2_U32
 
 config HOTPLUG_CPU
 	def_bool n
+	select PERCPU_RWLOCK
 
 source "init/Kconfig"
 
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index b77feff..6f55cd4 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -226,6 +226,7 @@ config HOTPLUG_CPU
 	bool
 	default y if SMP
 	select HOTPLUG
+	select PERCPU_RWLOCK
 
 config ARCH_SELECT_MEMORY_MODEL
 	def_bool y
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 17903f1..56b1f15 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -336,6 +336,7 @@ config HOTPLUG_CPU
 	bool "Support for enabling/disabling CPUs"
 	depends on SMP && HOTPLUG && EXPERIMENTAL && (PPC_PSERIES || \
 	PPC_PMAC || PPC_POWERNV || (PPC_85xx && !PPC_E500MC))
+	select PERCPU_RWLOCK
 	---help---
 	  Say Y here to be able to disable and re-enable individual
 	  CPUs at runtime on SMP machines.
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index b5ea38c..a9aafb4 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -299,6 +299,7 @@ config HOTPLUG_CPU
 	prompt "Support for hot-pluggable CPUs"
 	depends on SMP
 	select HOTPLUG
+	select PERCPU_RWLOCK
 	help
 	  Say Y here to be able to turn CPUs off and on. CPUs
 	  can be controlled through /sys/devices/system/cpu/cpu#.
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index babc2b8..8c92eef 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -765,6 +765,7 @@ config NR_CPUS
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs (EXPERIMENTAL)"
 	depends on SMP && HOTPLUG && EXPERIMENTAL
+	select PERCPU_RWLOCK
 	help
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
 	  can be controlled through /sys/devices/system/cpu.
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index cb9c333..b22f29d 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -254,6 +254,7 @@ config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SPARC64 && SMP
 	select HOTPLUG
+	select PERCPU_RWLOCK
 	help
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
 	  can be controlled through /sys/devices/system/cpu/cpu#.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 225543b..1a6d50d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1689,6 +1689,7 @@ config PHYSICAL_ALIGN
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP && HOTPLUG
+	select PERCPU_RWLOCK
 	---help---
 	  Say Y here to allow turning CPUs off and on. CPUs can be
 	  controlled through /sys/devices/system/cpu.
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index ce7a074..cf24da1 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -175,6 +175,8 @@ extern struct bus_type cpu_subsys;
 
 extern void get_online_cpus(void);
 extern void put_online_cpus(void);
+extern void get_online_cpus_atomic(void);
+extern void put_online_cpus_atomic(void);
 #define hotcpu_notifier(fn, pri)	cpu_notifier(fn, pri)
 #define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
 #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
@@ -198,6 +200,8 @@ static inline void cpu_hotplug_driver_unlock(void)
 
 #define get_online_cpus()	do { } while (0)
 #define put_online_cpus()	do { } while (0)
+#define get_online_cpus_atomic()	do { } while (0)
+#define put_online_cpus_atomic()	do { } while (0)
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 3046a50..58dd1df 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1,6 +1,18 @@
 /* CPU control.
  * (C) 2001, 2002, 2003, 2004 Rusty Russell
  *
+ * Rework of the CPU hotplug offline mechanism to remove its dependence on
+ * the heavy-weight stop_machine() primitive, by Srivatsa S. Bhat and
+ * Paul E. McKenney.
+ *
+ * Copyright (C) IBM Corporation, 2012-2013
+ * Authors: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
+ *          Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+ *
+ * With lots of invaluable suggestions from:
+ *	    Oleg Nesterov <oleg@redhat.com>
+ *	    Tejun Heo <tj@kernel.org>
+ *
  * This code is licenced under the GPL.
  */
 #include <linux/proc_fs.h>
@@ -19,6 +31,7 @@
 #include <linux/mutex.h>
 #include <linux/gfp.h>
 #include <linux/suspend.h>
+#include <linux/percpu-rwlock.h>
 
 #include "smpboot.h"
 
@@ -133,6 +146,38 @@ static void cpu_hotplug_done(void)
 	mutex_unlock(&cpu_hotplug.lock);
 }
 
+/*
+ * Per-CPU Reader-Writer lock to synchronize between atomic hotplug
+ * readers and the CPU offline hotplug writer.
+ */
+DEFINE_STATIC_PERCPU_RWLOCK(hotplug_pcpu_rwlock);
+
+/*
+ * Invoked by atomic hotplug reader (a task which wants to prevent
+ * CPU offline, but which can't afford to sleep), to prevent CPUs from
+ * going offline. So, you can call this function from atomic contexts
+ * (including interrupt handlers).
+ *
+ * Note: This does NOT prevent CPUs from coming online! It only prevents
+ * CPUs from going offline.
+ *
+ * You can call this function recursively.
+ *
+ * Returns with preemption disabled (but interrupts remain as they are;
+ * they are not disabled).
+ */
+void get_online_cpus_atomic(void)
+{
+	percpu_read_lock_irqsafe(&hotplug_pcpu_rwlock);
+}
+EXPORT_SYMBOL_GPL(get_online_cpus_atomic);
+
+void put_online_cpus_atomic(void)
+{
+	percpu_read_unlock_irqsafe(&hotplug_pcpu_rwlock);
+}
+EXPORT_SYMBOL_GPL(put_online_cpus_atomic);
+
 #else /* #if CONFIG_HOTPLUG_CPU */
 static void cpu_hotplug_begin(void) {}
 static void cpu_hotplug_done(void) {}
@@ -246,15 +291,21 @@ struct take_cpu_down_param {
 static int __ref take_cpu_down(void *_param)
 {
 	struct take_cpu_down_param *param = _param;
+	unsigned long flags;
 	int err;
 
+	percpu_write_lock_irqsave(&hotplug_pcpu_rwlock, &flags);
+
 	/* Ensure this CPU doesn't handle any more interrupts. */
 	err = __cpu_disable();
 	if (err < 0)
-		return err;
+		goto out;
 
 	cpu_notify(CPU_DYING | param->mod, param->hcpu);
-	return 0;
+
+out:
+	percpu_write_unlock_irqrestore(&hotplug_pcpu_rwlock, &flags);
+	return err;
 }
 
 /* Requires cpu_add_remove_lock to be held */


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 08/46] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
@ 2013-02-18 12:39   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:39 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

There are places where preempt_disable() or local_irq_disable() are used
to prevent any CPU from going offline during the critical section. Let us
call them "atomic hotplug readers" ("atomic" because they run in atomic,
non-preemptible contexts).

Today, preempt_disable() or its equivalent works because the hotplug writer
uses stop_machine() to take CPUs offline. But once stop_machine() is gone
from the CPU hotplug offline path, the readers won't be able to prevent
CPUs from going offline using preempt_disable().

So the intent here is to provide synchronization APIs for such atomic hotplug
readers, to prevent (any) CPUs from going offline, without depending on
stop_machine() at the writer-side. The new APIs will look something like
this:  get_online_cpus_atomic() and put_online_cpus_atomic()

Some important design requirements and considerations:
-----------------------------------------------------

1. Scalable synchronization at the reader-side, especially in the fast-path

   Any synchronization at the atomic hotplug readers side must be highly
   scalable - avoid global single-holder locks/counters etc. Because, these
   paths currently use the extremely fast preempt_disable(); our replacement
   to preempt_disable() should not become ridiculously costly and also should
   not serialize the readers among themselves needlessly.

   At a minimum, the new APIs must be extremely fast at the reader side
   at least in the fast-path, when no CPU offline writers are active.

2. preempt_disable() was recursive. The replacement should also be recursive.

3. No (new) lock-ordering restrictions

   preempt_disable() was super-flexible. It didn't impose any ordering
   restrictions or rules for nesting. Our replacement should also be equally
   flexible and usable.

4. No deadlock possibilities

   Regular per-cpu locking is not the way to go if we want to have relaxed
   rules for lock-ordering, because we can end up in circular-locking
   dependencies, as explained in https://lkml.org/lkml/2012/12/6/290

   So, avoid the usual per-cpu locking schemes (per-cpu locks/per-cpu atomic
   counters with spin-on-contention etc) as much as possible, to avoid
   numerous deadlock possibilities from creeping in.


Implementation of the design:
----------------------------

We use per-CPU reader-writer locks for synchronization because:

  a. They are quite fast and scalable in the fast-path (when no writers are
     active), since they use fast per-cpu counters in those paths.

  b. They are recursive at the reader side.

  c. They provide a good amount of safety against deadlocks; they don't
     spring new deadlock possibilities on us from out of nowhere. As a
     result, they have relaxed locking rules and are quite flexible, and
     thus are best suited for replacing usages of preempt_disable() or
     local_irq_disable() at the reader side.

Together, these satisfy all the requirements mentioned above.

I'm indebted to Michael Wang and Xiao Guangrong for their numerous thoughtful
suggestions and ideas, which inspired and influenced many of the decisions in
this as well as previous designs. Thanks a lot Michael and Xiao!

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: uclinux-dist-devel@blackfin.uclinux.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-am33-list@redhat.com
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 arch/arm/Kconfig      |    1 +
 arch/blackfin/Kconfig |    1 +
 arch/ia64/Kconfig     |    1 +
 arch/mips/Kconfig     |    1 +
 arch/mn10300/Kconfig  |    1 +
 arch/parisc/Kconfig   |    1 +
 arch/powerpc/Kconfig  |    1 +
 arch/s390/Kconfig     |    1 +
 arch/sh/Kconfig       |    1 +
 arch/sparc/Kconfig    |    1 +
 arch/x86/Kconfig      |    1 +
 include/linux/cpu.h   |    4 ++++
 kernel/cpu.c          |   55 +++++++++++++++++++++++++++++++++++++++++++++++--
 13 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 67874b8..cb6b94b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1616,6 +1616,7 @@ config NR_CPUS
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP && HOTPLUG
+	select PERCPU_RWLOCK
 	help
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
 	  can be controlled through /sys/devices/system/cpu.
diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig
index b6f3ad5..83d9882 100644
--- a/arch/blackfin/Kconfig
+++ b/arch/blackfin/Kconfig
@@ -261,6 +261,7 @@ config NR_CPUS
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP && HOTPLUG
+	select PERCPU_RWLOCK
 	default y
 
 config BF_REV_MIN
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 3279646..c246772 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -378,6 +378,7 @@ config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs (EXPERIMENTAL)"
 	depends on SMP && EXPERIMENTAL
 	select HOTPLUG
+	select PERCPU_RWLOCK
 	default n
 	---help---
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 2ac626a..f97c479 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -956,6 +956,7 @@ config SYS_HAS_EARLY_PRINTK
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP && HOTPLUG && SYS_SUPPORTS_HOTPLUG_CPU
+	select PERCPU_RWLOCK
 	help
 	  Say Y here to allow turning CPUs off and on. CPUs can be
 	  controlled through /sys/devices/system/cpu.
diff --git a/arch/mn10300/Kconfig b/arch/mn10300/Kconfig
index e70001c..a64e488 100644
--- a/arch/mn10300/Kconfig
+++ b/arch/mn10300/Kconfig
@@ -60,6 +60,7 @@ config ARCH_HAS_ILOG2_U32
 
 config HOTPLUG_CPU
 	def_bool n
+	select PERCPU_RWLOCK
 
 source "init/Kconfig"
 
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index b77feff..6f55cd4 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -226,6 +226,7 @@ config HOTPLUG_CPU
 	bool
 	default y if SMP
 	select HOTPLUG
+	select PERCPU_RWLOCK
 
 config ARCH_SELECT_MEMORY_MODEL
 	def_bool y
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 17903f1..56b1f15 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -336,6 +336,7 @@ config HOTPLUG_CPU
 	bool "Support for enabling/disabling CPUs"
 	depends on SMP && HOTPLUG && EXPERIMENTAL && (PPC_PSERIES || \
 	PPC_PMAC || PPC_POWERNV || (PPC_85xx && !PPC_E500MC))
+	select PERCPU_RWLOCK
 	---help---
 	  Say Y here to be able to disable and re-enable individual
 	  CPUs at runtime on SMP machines.
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index b5ea38c..a9aafb4 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -299,6 +299,7 @@ config HOTPLUG_CPU
 	prompt "Support for hot-pluggable CPUs"
 	depends on SMP
 	select HOTPLUG
+	select PERCPU_RWLOCK
 	help
 	  Say Y here to be able to turn CPUs off and on. CPUs
 	  can be controlled through /sys/devices/system/cpu/cpu#.
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index babc2b8..8c92eef 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -765,6 +765,7 @@ config NR_CPUS
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs (EXPERIMENTAL)"
 	depends on SMP && HOTPLUG && EXPERIMENTAL
+	select PERCPU_RWLOCK
 	help
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
 	  can be controlled through /sys/devices/system/cpu.
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index cb9c333..b22f29d 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -254,6 +254,7 @@ config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SPARC64 && SMP
 	select HOTPLUG
+	select PERCPU_RWLOCK
 	help
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
 	  can be controlled through /sys/devices/system/cpu/cpu#.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 225543b..1a6d50d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1689,6 +1689,7 @@ config PHYSICAL_ALIGN
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP && HOTPLUG
+	select PERCPU_RWLOCK
 	---help---
 	  Say Y here to allow turning CPUs off and on. CPUs can be
 	  controlled through /sys/devices/system/cpu.
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index ce7a074..cf24da1 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -175,6 +175,8 @@ extern struct bus_type cpu_subsys;
 
 extern void get_online_cpus(void);
 extern void put_online_cpus(void);
+extern void get_online_cpus_atomic(void);
+extern void put_online_cpus_atomic(void);
 #define hotcpu_notifier(fn, pri)	cpu_notifier(fn, pri)
 #define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
 #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
@@ -198,6 +200,8 @@ static inline void cpu_hotplug_driver_unlock(void)
 
 #define get_online_cpus()	do { } while (0)
 #define put_online_cpus()	do { } while (0)
+#define get_online_cpus_atomic()	do { } while (0)
+#define put_online_cpus_atomic()	do { } while (0)
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 3046a50..58dd1df 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1,6 +1,18 @@
 /* CPU control.
  * (C) 2001, 2002, 2003, 2004 Rusty Russell
  *
+ * Rework of the CPU hotplug offline mechanism to remove its dependence on
+ * the heavy-weight stop_machine() primitive, by Srivatsa S. Bhat and
+ * Paul E. McKenney.
+ *
+ * Copyright (C) IBM Corporation, 2012-2013
+ * Authors: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
+ *          Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+ *
+ * With lots of invaluable suggestions from:
+ *	    Oleg Nesterov <oleg@redhat.com>
+ *	    Tejun Heo <tj@kernel.org>
+ *
  * This code is licenced under the GPL.
  */
 #include <linux/proc_fs.h>
@@ -19,6 +31,7 @@
 #include <linux/mutex.h>
 #include <linux/gfp.h>
 #include <linux/suspend.h>
+#include <linux/percpu-rwlock.h>
 
 #include "smpboot.h"
 
@@ -133,6 +146,38 @@ static void cpu_hotplug_done(void)
 	mutex_unlock(&cpu_hotplug.lock);
 }
 
+/*
+ * Per-CPU Reader-Writer lock to synchronize between atomic hotplug
+ * readers and the CPU offline hotplug writer.
+ */
+DEFINE_STATIC_PERCPU_RWLOCK(hotplug_pcpu_rwlock);
+
+/*
+ * Invoked by atomic hotplug reader (a task which wants to prevent
+ * CPU offline, but which can't afford to sleep), to prevent CPUs from
+ * going offline. So, you can call this function from atomic contexts
+ * (including interrupt handlers).
+ *
+ * Note: This does NOT prevent CPUs from coming online! It only prevents
+ * CPUs from going offline.
+ *
+ * You can call this function recursively.
+ *
+ * Returns with preemption disabled (but interrupts remain as they are;
+ * they are not disabled).
+ */
+void get_online_cpus_atomic(void)
+{
+	percpu_read_lock_irqsafe(&hotplug_pcpu_rwlock);
+}
+EXPORT_SYMBOL_GPL(get_online_cpus_atomic);
+
+void put_online_cpus_atomic(void)
+{
+	percpu_read_unlock_irqsafe(&hotplug_pcpu_rwlock);
+}
+EXPORT_SYMBOL_GPL(put_online_cpus_atomic);
+
 #else /* #if CONFIG_HOTPLUG_CPU */
 static void cpu_hotplug_begin(void) {}
 static void cpu_hotplug_done(void) {}
@@ -246,15 +291,21 @@ struct take_cpu_down_param {
 static int __ref take_cpu_down(void *_param)
 {
 	struct take_cpu_down_param *param = _param;
+	unsigned long flags;
 	int err;
 
+	percpu_write_lock_irqsave(&hotplug_pcpu_rwlock, &flags);
+
 	/* Ensure this CPU doesn't handle any more interrupts. */
 	err = __cpu_disable();
 	if (err < 0)
-		return err;
+		goto out;
 
 	cpu_notify(CPU_DYING | param->mod, param->hcpu);
-	return 0;
+
+out:
+	percpu_write_unlock_irqrestore(&hotplug_pcpu_rwlock, &flags);
+	return err;
 }
 
 /* Requires cpu_add_remove_lock to be held */
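
To summarize the interplay introduced by this hunk (an informal sketch, not
additional code in the patch):

  /*
   *   Atomic reader (any CPU)            Writer (take_cpu_down())
   *   -----------------------            ------------------------
   *   get_online_cpus_atomic();          percpu_write_lock_irqsave();
   *     ... uses the target CPU ...        (waits until all readers
   *   put_online_cpus_atomic();             have dropped the lock)
   *                                       __cpu_disable();
   *                                       cpu_notify(CPU_DYING, ...);
   *                                       percpu_write_unlock_irqrestore();
   */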

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 08/46] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
@ 2013-02-18 12:39   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:39 UTC (permalink / raw)
  To: linux-arm-kernel

There are places where preempt_disable() or local_irq_disable() are used
to prevent any CPU from going offline during the critical section. Let us
call them as "atomic hotplug readers" ("atomic" because they run in atomic,
non-preemptible contexts).

Today, preempt_disable() or its equivalent works because the hotplug writer
uses stop_machine() to take CPUs offline. But once stop_machine() is gone
from the CPU hotplug offline path, the readers won't be able to prevent
CPUs from going offline using preempt_disable().

So the intent here is to provide synchronization APIs for such atomic hotplug
readers, to prevent (any) CPUs from going offline, without depending on
stop_machine() at the writer-side. The new APIs will look something like
this:  get_online_cpus_atomic() and put_online_cpus_atomic()

Some important design requirements and considerations:
-----------------------------------------------------

1. Scalable synchronization at the reader-side, especially in the fast-path

   Any synchronization at the atomic hotplug reader side must be highly
   scalable - avoid global single-holder locks/counters etc., because these
   paths currently use the extremely fast preempt_disable(). Our replacement
   for preempt_disable() should not become ridiculously costly, and should
   not needlessly serialize the readers among themselves.

   At a minimum, the new APIs must be extremely fast at the reader side in
   the fast-path, i.e., when no CPU offline writers are active.

2. preempt_disable() was recursive. The replacement should also be recursive.

3. No (new) lock-ordering restrictions

   preempt_disable() was super-flexible. It didn't impose any ordering
   restrictions or rules for nesting. Our replacement should be equally
   flexible and easy to use.

4. No deadlock possibilities

   Regular per-cpu locking is not the way to go if we want relaxed rules
   for lock-ordering, because we can end up in circular-locking dependencies
   as explained in https://lkml.org/lkml/2012/12/6/290

   So, the usual per-cpu locking schemes (per-cpu locks/per-cpu atomic
   counters with spin-on-contention etc.) should be avoided as much as
   possible, to keep numerous deadlock possibilities from creeping in.


Implementation of the design:
----------------------------

We use per-CPU reader-writer locks for synchronization because:

  a. They are quite fast and scalable in the fast-path (when no writers are
     active), since they use fast per-cpu counters in those paths.

  b. They are recursive at the reader side.

  c. They provide a good amount of safety against deadlocks; they don't
     spring new deadlock possibilities on us from out of nowhere. As a
     result, they have relaxed locking rules and are quite flexible, and
     thus are best suited for replacing usages of preempt_disable() or
     local_irq_disable() at the reader side.

Together, these satisfy all the requirements mentioned above.
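
As an aside, a small illustrative fragment (hypothetical functions, not part
of the patch) showing requirement 2 in action - the calls nest freely, just
like preempt_disable()/preempt_enable():

  static void inner(void)                        /* hypothetical */
  {
          get_online_cpus_atomic();
          /* ... access per-cpu data of some online CPU ... */
          put_online_cpus_atomic();
  }

  static void outer(void)                        /* hypothetical */
  {
          get_online_cpus_atomic();
          inner();          /* nested read-side acquisition is fine */
          put_online_cpus_atomic();
  }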

I'm indebted to Michael Wang and Xiao Guangrong for their numerous thoughtful
suggestions and ideas, which inspired and influenced many of the decisions in
this as well as previous designs. Thanks a lot Michael and Xiao!

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86 at kernel.org
Cc: linux-arm-kernel at lists.infradead.org
Cc: uclinux-dist-devel at blackfin.uclinux.org
Cc: linux-ia64 at vger.kernel.org
Cc: linux-mips at linux-mips.org
Cc: linux-am33-list at redhat.com
Cc: linux-parisc at vger.kernel.org
Cc: linuxppc-dev at lists.ozlabs.org
Cc: linux-s390 at vger.kernel.org
Cc: linux-sh at vger.kernel.org
Cc: sparclinux at vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 arch/arm/Kconfig      |    1 +
 arch/blackfin/Kconfig |    1 +
 arch/ia64/Kconfig     |    1 +
 arch/mips/Kconfig     |    1 +
 arch/mn10300/Kconfig  |    1 +
 arch/parisc/Kconfig   |    1 +
 arch/powerpc/Kconfig  |    1 +
 arch/s390/Kconfig     |    1 +
 arch/sh/Kconfig       |    1 +
 arch/sparc/Kconfig    |    1 +
 arch/x86/Kconfig      |    1 +
 include/linux/cpu.h   |    4 ++++
 kernel/cpu.c          |   55 +++++++++++++++++++++++++++++++++++++++++++++++--
 13 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 67874b8..cb6b94b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1616,6 +1616,7 @@ config NR_CPUS
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP && HOTPLUG
+	select PERCPU_RWLOCK
 	help
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
 	  can be controlled through /sys/devices/system/cpu.
diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig
index b6f3ad5..83d9882 100644
--- a/arch/blackfin/Kconfig
+++ b/arch/blackfin/Kconfig
@@ -261,6 +261,7 @@ config NR_CPUS
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP && HOTPLUG
+	select PERCPU_RWLOCK
 	default y
 
 config BF_REV_MIN
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 3279646..c246772 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -378,6 +378,7 @@ config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs (EXPERIMENTAL)"
 	depends on SMP && EXPERIMENTAL
 	select HOTPLUG
+	select PERCPU_RWLOCK
 	default n
 	---help---
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 2ac626a..f97c479 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -956,6 +956,7 @@ config SYS_HAS_EARLY_PRINTK
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP && HOTPLUG && SYS_SUPPORTS_HOTPLUG_CPU
+	select PERCPU_RWLOCK
 	help
 	  Say Y here to allow turning CPUs off and on. CPUs can be
 	  controlled through /sys/devices/system/cpu.
diff --git a/arch/mn10300/Kconfig b/arch/mn10300/Kconfig
index e70001c..a64e488 100644
--- a/arch/mn10300/Kconfig
+++ b/arch/mn10300/Kconfig
@@ -60,6 +60,7 @@ config ARCH_HAS_ILOG2_U32
 
 config HOTPLUG_CPU
 	def_bool n
+	select PERCPU_RWLOCK
 
 source "init/Kconfig"
 
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index b77feff..6f55cd4 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -226,6 +226,7 @@ config HOTPLUG_CPU
 	bool
 	default y if SMP
 	select HOTPLUG
+	select PERCPU_RWLOCK
 
 config ARCH_SELECT_MEMORY_MODEL
 	def_bool y
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 17903f1..56b1f15 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -336,6 +336,7 @@ config HOTPLUG_CPU
 	bool "Support for enabling/disabling CPUs"
 	depends on SMP && HOTPLUG && EXPERIMENTAL && (PPC_PSERIES || \
 	PPC_PMAC || PPC_POWERNV || (PPC_85xx && !PPC_E500MC))
+	select PERCPU_RWLOCK
 	---help---
 	  Say Y here to be able to disable and re-enable individual
 	  CPUs at runtime on SMP machines.
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index b5ea38c..a9aafb4 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -299,6 +299,7 @@ config HOTPLUG_CPU
 	prompt "Support for hot-pluggable CPUs"
 	depends on SMP
 	select HOTPLUG
+	select PERCPU_RWLOCK
 	help
 	  Say Y here to be able to turn CPUs off and on. CPUs
 	  can be controlled through /sys/devices/system/cpu/cpu#.
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index babc2b8..8c92eef 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -765,6 +765,7 @@ config NR_CPUS
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs (EXPERIMENTAL)"
 	depends on SMP && HOTPLUG && EXPERIMENTAL
+	select PERCPU_RWLOCK
 	help
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
 	  can be controlled through /sys/devices/system/cpu.
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index cb9c333..b22f29d 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -254,6 +254,7 @@ config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SPARC64 && SMP
 	select HOTPLUG
+	select PERCPU_RWLOCK
 	help
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
 	  can be controlled through /sys/devices/system/cpu/cpu#.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 225543b..1a6d50d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1689,6 +1689,7 @@ config PHYSICAL_ALIGN
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP && HOTPLUG
+	select PERCPU_RWLOCK
 	---help---
 	  Say Y here to allow turning CPUs off and on. CPUs can be
 	  controlled through /sys/devices/system/cpu.
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index ce7a074..cf24da1 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -175,6 +175,8 @@ extern struct bus_type cpu_subsys;
 
 extern void get_online_cpus(void);
 extern void put_online_cpus(void);
+extern void get_online_cpus_atomic(void);
+extern void put_online_cpus_atomic(void);
 #define hotcpu_notifier(fn, pri)	cpu_notifier(fn, pri)
 #define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
 #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
@@ -198,6 +200,8 @@ static inline void cpu_hotplug_driver_unlock(void)
 
 #define get_online_cpus()	do { } while (0)
 #define put_online_cpus()	do { } while (0)
+#define get_online_cpus_atomic()	do { } while (0)
+#define put_online_cpus_atomic()	do { } while (0)
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 3046a50..58dd1df 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1,6 +1,18 @@
 /* CPU control.
  * (C) 2001, 2002, 2003, 2004 Rusty Russell
  *
+ * Rework of the CPU hotplug offline mechanism to remove its dependence on
+ * the heavy-weight stop_machine() primitive, by Srivatsa S. Bhat and
+ * Paul E. McKenney.
+ *
+ * Copyright (C) IBM Corporation, 2012-2013
+ * Authors: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
+ *          Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+ *
+ * With lots of invaluable suggestions from:
+ *	    Oleg Nesterov <oleg@redhat.com>
+ *	    Tejun Heo <tj@kernel.org>
+ *
  * This code is licenced under the GPL.
  */
 #include <linux/proc_fs.h>
@@ -19,6 +31,7 @@
 #include <linux/mutex.h>
 #include <linux/gfp.h>
 #include <linux/suspend.h>
+#include <linux/percpu-rwlock.h>
 
 #include "smpboot.h"
 
@@ -133,6 +146,38 @@ static void cpu_hotplug_done(void)
 	mutex_unlock(&cpu_hotplug.lock);
 }
 
+/*
+ * Per-CPU Reader-Writer lock to synchronize between atomic hotplug
+ * readers and the CPU offline hotplug writer.
+ */
+DEFINE_STATIC_PERCPU_RWLOCK(hotplug_pcpu_rwlock);
+
+/*
+ * Invoked by atomic hotplug reader (a task which wants to prevent
+ * CPU offline, but which can't afford to sleep), to prevent CPUs from
+ * going offline. So, you can call this function from atomic contexts
+ * (including interrupt handlers).
+ *
+ * Note: This does NOT prevent CPUs from coming online! It only prevents
+ * CPUs from going offline.
+ *
+ * You can call this function recursively.
+ *
+ * Returns with preemption disabled (but interrupts remain as they are;
+ * they are not disabled).
+ */
+void get_online_cpus_atomic(void)
+{
+	percpu_read_lock_irqsafe(&hotplug_pcpu_rwlock);
+}
+EXPORT_SYMBOL_GPL(get_online_cpus_atomic);
+
+void put_online_cpus_atomic(void)
+{
+	percpu_read_unlock_irqsafe(&hotplug_pcpu_rwlock);
+}
+EXPORT_SYMBOL_GPL(put_online_cpus_atomic);
+
 #else /* #if CONFIG_HOTPLUG_CPU */
 static void cpu_hotplug_begin(void) {}
 static void cpu_hotplug_done(void) {}
@@ -246,15 +291,21 @@ struct take_cpu_down_param {
 static int __ref take_cpu_down(void *_param)
 {
 	struct take_cpu_down_param *param = _param;
+	unsigned long flags;
 	int err;
 
+	percpu_write_lock_irqsave(&hotplug_pcpu_rwlock, &flags);
+
 	/* Ensure this CPU doesn't handle any more interrupts. */
 	err = __cpu_disable();
 	if (err < 0)
-		return err;
+		goto out;
 
 	cpu_notify(CPU_DYING | param->mod, param->hcpu);
-	return 0;
+
+out:
+	percpu_write_unlock_irqrestore(&hotplug_pcpu_rwlock, &flags);
+	return err;
 }
 
 /* Requires cpu_add_remove_lock to be held */

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 09/46] CPU hotplug: Convert preprocessor macros to static inline functions
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:39   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:39 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

On 12/05/2012 06:10 AM, Andrew Morton wrote:
"static inline C functions would be preferred if possible.  Feel free to
fix up the wrong crufty surrounding code as well ;-)"

Convert the macros in the CPU hotplug code to static inline C functions.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 include/linux/cpu.h |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index cf24da1..eb79f47 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -198,10 +198,10 @@ static inline void cpu_hotplug_driver_unlock(void)
 
 #else		/* CONFIG_HOTPLUG_CPU */
 
-#define get_online_cpus()	do { } while (0)
-#define put_online_cpus()	do { } while (0)
-#define get_online_cpus_atomic()	do { } while (0)
-#define put_online_cpus_atomic()	do { } while (0)
+static inline void get_online_cpus(void) {}
+static inline void put_online_cpus(void) {}
+static inline void get_online_cpus_atomic(void) {}
+static inline void put_online_cpus_atomic(void) {}
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 09/46] CPU hotplug: Convert preprocessor macros to static inline functions
@ 2013-02-18 12:39   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:39 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

On 12/05/2012 06:10 AM, Andrew Morton wrote:
"static inline C functions would be preferred if possible.  Feel free to
fix up the wrong crufty surrounding code as well ;-)"

Convert the macros in the CPU hotplug code to static inline C functions.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 include/linux/cpu.h |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index cf24da1..eb79f47 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -198,10 +198,10 @@ static inline void cpu_hotplug_driver_unlock(void)
 
 #else		/* CONFIG_HOTPLUG_CPU */
 
-#define get_online_cpus()	do { } while (0)
-#define put_online_cpus()	do { } while (0)
-#define get_online_cpus_atomic()	do { } while (0)
-#define put_online_cpus_atomic()	do { } while (0)
+static inline void get_online_cpus(void) {}
+static inline void put_online_cpus(void) {}
+static inline void get_online_cpus_atomic(void) {}
+static inline void put_online_cpus_atomic(void) {}
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 09/46] CPU hotplug: Convert preprocessor macros to static inline functions
@ 2013-02-18 12:39   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:39 UTC (permalink / raw)
  To: linux-arm-kernel

On 12/05/2012 06:10 AM, Andrew Morton wrote:
"static inline C functions would be preferred if possible.  Feel free to
fix up the wrong crufty surrounding code as well ;-)"

Convert the macros in the CPU hotplug code to static inline C functions.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 include/linux/cpu.h |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index cf24da1..eb79f47 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -198,10 +198,10 @@ static inline void cpu_hotplug_driver_unlock(void)
 
 #else		/* CONFIG_HOTPLUG_CPU */
 
-#define get_online_cpus()	do { } while (0)
-#define put_online_cpus()	do { } while (0)
-#define get_online_cpus_atomic()	do { } while (0)
-#define put_online_cpus_atomic()	do { } while (0)
+static inline void get_online_cpus(void) {}
+static inline void put_online_cpus(void) {}
+static inline void get_online_cpus_atomic(void) {}
+static inline void put_online_cpus_atomic(void) {}
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 10/46] smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:39   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:39 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() to prevent CPUs from going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline
while these functions execute in atomic context.
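
The conversion itself is mechanical; roughly (an illustrative sketch of the
before/after pattern, not a specific hunk from this patch):

  /* Before: relied on preempt_disable() keeping CPUs online */
  this_cpu = get_cpu();
  /* ... send IPIs, touch remote per-cpu data ... */
  put_cpu();

  /* After: pin down CPU offline explicitly. get_online_cpus_atomic()
   * also disables preemption, so smp_processor_id() stays valid. */
  get_online_cpus_atomic();
  this_cpu = smp_processor_id();
  /* ... send IPIs, touch remote per-cpu data ... */
  put_online_cpus_atomic();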

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/smp.c |   40 ++++++++++++++++++++++++++--------------
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 69f38bd..0f40d36 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -315,7 +315,8 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 	 * prevent preemption and reschedule on another processor,
 	 * as well as CPU removal
 	 */
-	this_cpu = get_cpu();
+	get_online_cpus_atomic();
+	this_cpu = smp_processor_id();
 
 	/*
 	 * Can deadlock when called with interrupts disabled.
@@ -347,7 +348,7 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 		}
 	}
 
-	put_cpu();
+	put_online_cpus_atomic();
 
 	return err;
 }
@@ -376,8 +377,10 @@ int smp_call_function_any(const struct cpumask *mask,
 	const struct cpumask *nodemask;
 	int ret;
 
+	get_online_cpus_atomic();
 	/* Try for same CPU (cheapest) */
-	cpu = get_cpu();
+	cpu = smp_processor_id();
+
 	if (cpumask_test_cpu(cpu, mask))
 		goto call;
 
@@ -393,7 +396,7 @@ int smp_call_function_any(const struct cpumask *mask,
 	cpu = cpumask_any_and(mask, cpu_online_mask);
 call:
 	ret = smp_call_function_single(cpu, func, info, wait);
-	put_cpu();
+	put_online_cpus_atomic();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(smp_call_function_any);
@@ -414,25 +417,28 @@ void __smp_call_function_single(int cpu, struct call_single_data *data,
 	unsigned int this_cpu;
 	unsigned long flags;
 
-	this_cpu = get_cpu();
+	get_online_cpus_atomic();
+
+	this_cpu = smp_processor_id();
+
 	/*
 	 * Can deadlock when called with interrupts disabled.
 	 * We allow cpu's that are not yet online though, as no one else can
 	 * send smp call function interrupt to this cpu and as such deadlocks
 	 * can't happen.
 	 */
-	WARN_ON_ONCE(cpu_online(smp_processor_id()) && wait && irqs_disabled()
+	WARN_ON_ONCE(cpu_online(this_cpu) && wait && irqs_disabled()
 		     && !oops_in_progress);
 
 	if (cpu == this_cpu) {
 		local_irq_save(flags);
 		data->func(data->info);
 		local_irq_restore(flags);
-	} else {
+	} else if ((unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) {
 		csd_lock(data);
 		generic_exec_single(cpu, data, wait);
 	}
-	put_cpu();
+	put_online_cpus_atomic();
 }
 
 /**
@@ -456,6 +462,8 @@ void smp_call_function_many(const struct cpumask *mask,
 	unsigned long flags;
 	int refs, cpu, next_cpu, this_cpu = smp_processor_id();
 
+	get_online_cpus_atomic();
+
 	/*
 	 * Can deadlock when called with interrupts disabled.
 	 * We allow cpu's that are not yet online though, as no one else can
@@ -472,17 +480,18 @@ void smp_call_function_many(const struct cpumask *mask,
 
 	/* No online cpus?  We're done. */
 	if (cpu >= nr_cpu_ids)
-		return;
+		goto out_unlock;
 
 	/* Do we have another CPU which isn't us? */
 	next_cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
 	if (next_cpu == this_cpu)
-		next_cpu = cpumask_next_and(next_cpu, mask, cpu_online_mask);
+		next_cpu = cpumask_next_and(next_cpu, mask,
+						cpu_online_mask);
 
 	/* Fastpath: do that cpu by itself. */
 	if (next_cpu >= nr_cpu_ids) {
 		smp_call_function_single(cpu, func, info, wait);
-		return;
+		goto out_unlock;
 	}
 
 	data = &__get_cpu_var(cfd_data);
@@ -528,7 +537,7 @@ void smp_call_function_many(const struct cpumask *mask,
 	/* Some callers race with other cpus changing the passed mask */
 	if (unlikely(!refs)) {
 		csd_unlock(&data->csd);
-		return;
+		goto out_unlock;
 	}
 
 	/*
@@ -565,6 +574,9 @@ void smp_call_function_many(const struct cpumask *mask,
 	/* Optionally wait for the CPUs to complete */
 	if (wait)
 		csd_lock_wait(&data->csd);
+
+out_unlock:
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL(smp_call_function_many);
 
@@ -585,9 +597,9 @@ EXPORT_SYMBOL(smp_call_function_many);
  */
 int smp_call_function(smp_call_func_t func, void *info, int wait)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 	smp_call_function_many(cpu_online_mask, func, info, wait);
-	preempt_enable();
+	put_online_cpus_atomic();
 
 	return 0;
 }


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 10/46] smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly
@ 2013-02-18 12:39   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:39 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() to prevent CPUs from going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline
while these functions execute in atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/smp.c |   40 ++++++++++++++++++++++++++--------------
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 69f38bd..0f40d36 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -315,7 +315,8 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 	 * prevent preemption and reschedule on another processor,
 	 * as well as CPU removal
 	 */
-	this_cpu = get_cpu();
+	get_online_cpus_atomic();
+	this_cpu = smp_processor_id();
 
 	/*
 	 * Can deadlock when called with interrupts disabled.
@@ -347,7 +348,7 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 		}
 	}
 
-	put_cpu();
+	put_online_cpus_atomic();
 
 	return err;
 }
@@ -376,8 +377,10 @@ int smp_call_function_any(const struct cpumask *mask,
 	const struct cpumask *nodemask;
 	int ret;
 
+	get_online_cpus_atomic();
 	/* Try for same CPU (cheapest) */
-	cpu = get_cpu();
+	cpu = smp_processor_id();
+
 	if (cpumask_test_cpu(cpu, mask))
 		goto call;
 
@@ -393,7 +396,7 @@ int smp_call_function_any(const struct cpumask *mask,
 	cpu = cpumask_any_and(mask, cpu_online_mask);
 call:
 	ret = smp_call_function_single(cpu, func, info, wait);
-	put_cpu();
+	put_online_cpus_atomic();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(smp_call_function_any);
@@ -414,25 +417,28 @@ void __smp_call_function_single(int cpu, struct call_single_data *data,
 	unsigned int this_cpu;
 	unsigned long flags;
 
-	this_cpu = get_cpu();
+	get_online_cpus_atomic();
+
+	this_cpu = smp_processor_id();
+
 	/*
 	 * Can deadlock when called with interrupts disabled.
 	 * We allow cpu's that are not yet online though, as no one else can
 	 * send smp call function interrupt to this cpu and as such deadlocks
 	 * can't happen.
 	 */
-	WARN_ON_ONCE(cpu_online(smp_processor_id()) && wait && irqs_disabled()
+	WARN_ON_ONCE(cpu_online(this_cpu) && wait && irqs_disabled()
 		     && !oops_in_progress);
 
 	if (cpu == this_cpu) {
 		local_irq_save(flags);
 		data->func(data->info);
 		local_irq_restore(flags);
-	} else {
+	} else if ((unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) {
 		csd_lock(data);
 		generic_exec_single(cpu, data, wait);
 	}
-	put_cpu();
+	put_online_cpus_atomic();
 }
 
 /**
@@ -456,6 +462,8 @@ void smp_call_function_many(const struct cpumask *mask,
 	unsigned long flags;
 	int refs, cpu, next_cpu, this_cpu = smp_processor_id();
 
+	get_online_cpus_atomic();
+
 	/*
 	 * Can deadlock when called with interrupts disabled.
 	 * We allow cpu's that are not yet online though, as no one else can
@@ -472,17 +480,18 @@ void smp_call_function_many(const struct cpumask *mask,
 
 	/* No online cpus?  We're done. */
 	if (cpu >= nr_cpu_ids)
-		return;
+		goto out_unlock;
 
 	/* Do we have another CPU which isn't us? */
 	next_cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
 	if (next_cpu == this_cpu)
-		next_cpu = cpumask_next_and(next_cpu, mask, cpu_online_mask);
+		next_cpu = cpumask_next_and(next_cpu, mask,
+						cpu_online_mask);
 
 	/* Fastpath: do that cpu by itself. */
 	if (next_cpu >= nr_cpu_ids) {
 		smp_call_function_single(cpu, func, info, wait);
-		return;
+		goto out_unlock;
 	}
 
 	data = &__get_cpu_var(cfd_data);
@@ -528,7 +537,7 @@ void smp_call_function_many(const struct cpumask *mask,
 	/* Some callers race with other cpus changing the passed mask */
 	if (unlikely(!refs)) {
 		csd_unlock(&data->csd);
-		return;
+		goto out_unlock;
 	}
 
 	/*
@@ -565,6 +574,9 @@ void smp_call_function_many(const struct cpumask *mask,
 	/* Optionally wait for the CPUs to complete */
 	if (wait)
 		csd_lock_wait(&data->csd);
+
+out_unlock:
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL(smp_call_function_many);
 
@@ -585,9 +597,9 @@ EXPORT_SYMBOL(smp_call_function_many);
  */
 int smp_call_function(smp_call_func_t func, void *info, int wait)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 	smp_call_function_many(cpu_online_mask, func, info, wait);
-	preempt_enable();
+	put_online_cpus_atomic();
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 10/46] smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly
@ 2013-02-18 12:39   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:39 UTC (permalink / raw)
  To: linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() to prevent CPUs from going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline
while these functions execute in atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/smp.c |   40 ++++++++++++++++++++++++++--------------
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 69f38bd..0f40d36 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -315,7 +315,8 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 	 * prevent preemption and reschedule on another processor,
 	 * as well as CPU removal
 	 */
-	this_cpu = get_cpu();
+	get_online_cpus_atomic();
+	this_cpu = smp_processor_id();
 
 	/*
 	 * Can deadlock when called with interrupts disabled.
@@ -347,7 +348,7 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 		}
 	}
 
-	put_cpu();
+	put_online_cpus_atomic();
 
 	return err;
 }
@@ -376,8 +377,10 @@ int smp_call_function_any(const struct cpumask *mask,
 	const struct cpumask *nodemask;
 	int ret;
 
+	get_online_cpus_atomic();
 	/* Try for same CPU (cheapest) */
-	cpu = get_cpu();
+	cpu = smp_processor_id();
+
 	if (cpumask_test_cpu(cpu, mask))
 		goto call;
 
@@ -393,7 +396,7 @@ int smp_call_function_any(const struct cpumask *mask,
 	cpu = cpumask_any_and(mask, cpu_online_mask);
 call:
 	ret = smp_call_function_single(cpu, func, info, wait);
-	put_cpu();
+	put_online_cpus_atomic();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(smp_call_function_any);
@@ -414,25 +417,28 @@ void __smp_call_function_single(int cpu, struct call_single_data *data,
 	unsigned int this_cpu;
 	unsigned long flags;
 
-	this_cpu = get_cpu();
+	get_online_cpus_atomic();
+
+	this_cpu = smp_processor_id();
+
 	/*
 	 * Can deadlock when called with interrupts disabled.
 	 * We allow cpu's that are not yet online though, as no one else can
 	 * send smp call function interrupt to this cpu and as such deadlocks
 	 * can't happen.
 	 */
-	WARN_ON_ONCE(cpu_online(smp_processor_id()) && wait && irqs_disabled()
+	WARN_ON_ONCE(cpu_online(this_cpu) && wait && irqs_disabled()
 		     && !oops_in_progress);
 
 	if (cpu == this_cpu) {
 		local_irq_save(flags);
 		data->func(data->info);
 		local_irq_restore(flags);
-	} else {
+	} else if ((unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) {
 		csd_lock(data);
 		generic_exec_single(cpu, data, wait);
 	}
-	put_cpu();
+	put_online_cpus_atomic();
 }
 
 /**
@@ -456,6 +462,8 @@ void smp_call_function_many(const struct cpumask *mask,
 	unsigned long flags;
 	int refs, cpu, next_cpu, this_cpu = smp_processor_id();
 
+	get_online_cpus_atomic();
+
 	/*
 	 * Can deadlock when called with interrupts disabled.
 	 * We allow cpu's that are not yet online though, as no one else can
@@ -472,17 +480,18 @@ void smp_call_function_many(const struct cpumask *mask,
 
 	/* No online cpus?  We're done. */
 	if (cpu >= nr_cpu_ids)
-		return;
+		goto out_unlock;
 
 	/* Do we have another CPU which isn't us? */
 	next_cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
 	if (next_cpu == this_cpu)
-		next_cpu = cpumask_next_and(next_cpu, mask, cpu_online_mask);
+		next_cpu = cpumask_next_and(next_cpu, mask,
+						cpu_online_mask);
 
 	/* Fastpath: do that cpu by itself. */
 	if (next_cpu >= nr_cpu_ids) {
 		smp_call_function_single(cpu, func, info, wait);
-		return;
+		goto out_unlock;
 	}
 
 	data = &__get_cpu_var(cfd_data);
@@ -528,7 +537,7 @@ void smp_call_function_many(const struct cpumask *mask,
 	/* Some callers race with other cpus changing the passed mask */
 	if (unlikely(!refs)) {
 		csd_unlock(&data->csd);
-		return;
+		goto out_unlock;
 	}
 
 	/*
@@ -565,6 +574,9 @@ void smp_call_function_many(const struct cpumask *mask,
 	/* Optionally wait for the CPUs to complete */
 	if (wait)
 		csd_lock_wait(&data->csd);
+
+out_unlock:
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL(smp_call_function_many);
 
@@ -585,9 +597,9 @@ EXPORT_SYMBOL(smp_call_function_many);
  */
 int smp_call_function(smp_call_func_t func, void *info, int wait)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 	smp_call_function_many(cpu_online_mask, func, info, wait);
-	preempt_enable();
+	put_online_cpus_atomic();
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 11/46] smp, cpu hotplug: Fix on_each_cpu_*() to prevent CPU offline properly
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:39   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:39 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() to prevent CPUs from going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline
while these functions execute in atomic context.
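
With the hotplug protection now inside the on_each_cpu_*() helpers themselves,
their callers need nothing extra. A hypothetical caller, for illustration
(the predicate, the per-cpu flag and the IPI function are made up):

  static bool cache_needs_flush(int cpu, void *info)   /* hypothetical */
  {
          return per_cpu(cache_dirty_flag, cpu) != 0;  /* hypothetical flag */
  }

  /* flush_local_cache is a hypothetical smp_call_func_t */
  on_each_cpu_cond(cache_needs_flush, flush_local_cache, NULL,
                   true, GFP_KERNEL);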

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/smp.c |   25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 0f40d36..a8fd381 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -699,12 +699,12 @@ int on_each_cpu(void (*func) (void *info), void *info, int wait)
 	unsigned long flags;
 	int ret = 0;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	ret = smp_call_function(func, info, wait);
 	local_irq_save(flags);
 	func(info);
 	local_irq_restore(flags);
-	preempt_enable();
+	put_online_cpus_atomic();
 	return ret;
 }
 EXPORT_SYMBOL(on_each_cpu);
@@ -726,7 +726,11 @@ EXPORT_SYMBOL(on_each_cpu);
 void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
 			void *info, bool wait)
 {
-	int cpu = get_cpu();
+	int cpu;
+
+	get_online_cpus_atomic();
+
+	cpu = smp_processor_id();
 
 	smp_call_function_many(mask, func, info, wait);
 	if (cpumask_test_cpu(cpu, mask)) {
@@ -734,7 +738,7 @@ void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
 		func(info);
 		local_irq_enable();
 	}
-	put_cpu();
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL(on_each_cpu_mask);
 
@@ -759,8 +763,9 @@ EXPORT_SYMBOL(on_each_cpu_mask);
  * The function might sleep if the GFP flags indicates a non
  * atomic allocation is allowed.
  *
- * Preemption is disabled to protect against CPUs going offline but not online.
- * CPUs going online during the call will not be seen or sent an IPI.
+ * We use get/put_online_cpus_atomic() to prevent CPUs from going
+ * offline in the middle of our operation. CPUs coming online during the
+ * call will not be seen or sent an IPI.
  *
  * You must not call this function with disabled interrupts or
  * from a hardware interrupt handler or from a bottom half handler.
@@ -775,26 +780,26 @@ void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
 	might_sleep_if(gfp_flags & __GFP_WAIT);
 
 	if (likely(zalloc_cpumask_var(&cpus, (gfp_flags|__GFP_NOWARN)))) {
-		preempt_disable();
+		get_online_cpus_atomic();
 		for_each_online_cpu(cpu)
 			if (cond_func(cpu, info))
 				cpumask_set_cpu(cpu, cpus);
 		on_each_cpu_mask(cpus, func, info, wait);
-		preempt_enable();
+		put_online_cpus_atomic();
 		free_cpumask_var(cpus);
 	} else {
 		/*
 		 * No free cpumask, bother. No matter, we'll
 		 * just have to IPI them one by one.
 		 */
-		preempt_disable();
+		get_online_cpus_atomic();
 		for_each_online_cpu(cpu)
 			if (cond_func(cpu, info)) {
 				ret = smp_call_function_single(cpu, func,
 								info, wait);
 				WARN_ON_ONCE(!ret);
 			}
-		preempt_enable();
+		put_online_cpus_atomic();
 	}
 }
 EXPORT_SYMBOL(on_each_cpu_cond);


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 11/46] smp, cpu hotplug: Fix on_each_cpu_*() to prevent CPU offline properly
@ 2013-02-18 12:39   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:39 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() to prevent CPUs from going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline
while these functions execute in atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/smp.c |   25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 0f40d36..a8fd381 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -699,12 +699,12 @@ int on_each_cpu(void (*func) (void *info), void *info, int wait)
 	unsigned long flags;
 	int ret = 0;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	ret = smp_call_function(func, info, wait);
 	local_irq_save(flags);
 	func(info);
 	local_irq_restore(flags);
-	preempt_enable();
+	put_online_cpus_atomic();
 	return ret;
 }
 EXPORT_SYMBOL(on_each_cpu);
@@ -726,7 +726,11 @@ EXPORT_SYMBOL(on_each_cpu);
 void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
 			void *info, bool wait)
 {
-	int cpu = get_cpu();
+	int cpu;
+
+	get_online_cpus_atomic();
+
+	cpu = smp_processor_id();
 
 	smp_call_function_many(mask, func, info, wait);
 	if (cpumask_test_cpu(cpu, mask)) {
@@ -734,7 +738,7 @@ void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
 		func(info);
 		local_irq_enable();
 	}
-	put_cpu();
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL(on_each_cpu_mask);
 
@@ -759,8 +763,9 @@ EXPORT_SYMBOL(on_each_cpu_mask);
  * The function might sleep if the GFP flags indicates a non
  * atomic allocation is allowed.
  *
- * Preemption is disabled to protect against CPUs going offline but not online.
- * CPUs going online during the call will not be seen or sent an IPI.
+ * We use get/put_online_cpus_atomic() to prevent CPUs from going
+ * offline in the middle of our operation. CPUs coming online during the
+ * call will not be seen or sent an IPI.
  *
  * You must not call this function with disabled interrupts or
  * from a hardware interrupt handler or from a bottom half handler.
@@ -775,26 +780,26 @@ void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
 	might_sleep_if(gfp_flags & __GFP_WAIT);
 
 	if (likely(zalloc_cpumask_var(&cpus, (gfp_flags|__GFP_NOWARN)))) {
-		preempt_disable();
+		get_online_cpus_atomic();
 		for_each_online_cpu(cpu)
 			if (cond_func(cpu, info))
 				cpumask_set_cpu(cpu, cpus);
 		on_each_cpu_mask(cpus, func, info, wait);
-		preempt_enable();
+		put_online_cpus_atomic();
 		free_cpumask_var(cpus);
 	} else {
 		/*
 		 * No free cpumask, bother. No matter, we'll
 		 * just have to IPI them one by one.
 		 */
-		preempt_disable();
+		get_online_cpus_atomic();
 		for_each_online_cpu(cpu)
 			if (cond_func(cpu, info)) {
 				ret = smp_call_function_single(cpu, func,
 								info, wait);
 				WARN_ON_ONCE(!ret);
 			}
-		preempt_enable();
+		put_online_cpus_atomic();
 	}
 }
 EXPORT_SYMBOL(on_each_cpu_cond);

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 11/46] smp, cpu hotplug: Fix on_each_cpu_*() to prevent CPU offline properly
@ 2013-02-18 12:39   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:39 UTC (permalink / raw)
  To: linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() to prevent CPUs from going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline
while these functions execute in atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/smp.c |   25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 0f40d36..a8fd381 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -699,12 +699,12 @@ int on_each_cpu(void (*func) (void *info), void *info, int wait)
 	unsigned long flags;
 	int ret = 0;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	ret = smp_call_function(func, info, wait);
 	local_irq_save(flags);
 	func(info);
 	local_irq_restore(flags);
-	preempt_enable();
+	put_online_cpus_atomic();
 	return ret;
 }
 EXPORT_SYMBOL(on_each_cpu);
@@ -726,7 +726,11 @@ EXPORT_SYMBOL(on_each_cpu);
 void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
 			void *info, bool wait)
 {
-	int cpu = get_cpu();
+	int cpu;
+
+	get_online_cpus_atomic();
+
+	cpu = smp_processor_id();
 
 	smp_call_function_many(mask, func, info, wait);
 	if (cpumask_test_cpu(cpu, mask)) {
@@ -734,7 +738,7 @@ void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
 		func(info);
 		local_irq_enable();
 	}
-	put_cpu();
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL(on_each_cpu_mask);
 
@@ -759,8 +763,9 @@ EXPORT_SYMBOL(on_each_cpu_mask);
  * The function might sleep if the GFP flags indicates a non
  * atomic allocation is allowed.
  *
- * Preemption is disabled to protect against CPUs going offline but not online.
- * CPUs going online during the call will not be seen or sent an IPI.
+ * We use get/put_online_cpus_atomic() to prevent CPUs from going
+ * offline in the middle of our operation. CPUs coming online during the
+ * call will not be seen or sent an IPI.
  *
  * You must not call this function with disabled interrupts or
  * from a hardware interrupt handler or from a bottom half handler.
@@ -775,26 +780,26 @@ void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
 	might_sleep_if(gfp_flags & __GFP_WAIT);
 
 	if (likely(zalloc_cpumask_var(&cpus, (gfp_flags|__GFP_NOWARN)))) {
-		preempt_disable();
+		get_online_cpus_atomic();
 		for_each_online_cpu(cpu)
 			if (cond_func(cpu, info))
 				cpumask_set_cpu(cpu, cpus);
 		on_each_cpu_mask(cpus, func, info, wait);
-		preempt_enable();
+		put_online_cpus_atomic();
 		free_cpumask_var(cpus);
 	} else {
 		/*
 		 * No free cpumask, bother. No matter, we'll
 		 * just have to IPI them one by one.
 		 */
-		preempt_disable();
+		get_online_cpus_atomic();
 		for_each_online_cpu(cpu)
 			if (cond_func(cpu, info)) {
 				ret = smp_call_function_single(cpu, func,
 								info, wait);
 				WARN_ON_ONCE(!ret);
 			}
-		preempt_enable();
+		put_online_cpus_atomic();
 	}
 }
 EXPORT_SYMBOL(on_each_cpu_cond);

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 12/46] sched/timer: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from going
offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline
while these functions execute in atomic context.
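
For instance, the shape of the change in the wakeup path below (an
illustrative fragment mirroring the try_to_wake_up()/sched_exec() hunks; it
is not new code, just the pattern annotated):

  get_online_cpus_atomic();
  raw_spin_lock_irqsave(&p->pi_lock, flags);

  cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
  /* 'cpu' cannot go offline until put_online_cpus_atomic() below */

  raw_spin_unlock_irqrestore(&p->pi_lock, flags);
  put_online_cpus_atomic();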

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/sched/core.c |   24 +++++++++++++++++++++---
 kernel/sched/fair.c |    5 ++++-
 kernel/timer.c      |    2 ++
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 26058d0..ead8af6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1117,11 +1117,11 @@ void kick_process(struct task_struct *p)
 {
 	int cpu;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpu = task_cpu(p);
 	if ((cpu != smp_processor_id()) && task_curr(p))
 		smp_send_reschedule(cpu);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL_GPL(kick_process);
 #endif /* CONFIG_SMP */
@@ -1129,6 +1129,10 @@ EXPORT_SYMBOL_GPL(kick_process);
 #ifdef CONFIG_SMP
 /*
  * ->cpus_allowed is protected by both rq->lock and p->pi_lock
+ *
+ *  Must be called under get/put_online_cpus_atomic() or
+ *  equivalent, to prevent CPUs from going offline from underneath
+ *  us.
  */
 static int select_fallback_rq(int cpu, struct task_struct *p)
 {
@@ -1192,6 +1196,9 @@ out:
 
 /*
  * The caller (fork, wakeup) owns p->pi_lock, ->cpus_allowed is stable.
+ *
+ * Must be called under get/put_online_cpus_atomic(), to prevent
+ * CPUs from going offline from underneath us.
  */
 static inline
 int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
@@ -1432,6 +1439,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	int cpu, success = 0;
 
 	smp_wmb();
+	get_online_cpus_atomic();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	if (!(p->state & state))
 		goto out;
@@ -1472,6 +1480,7 @@ stat:
 	ttwu_stat(p, cpu, wake_flags);
 out:
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	put_online_cpus_atomic();
 
 	return success;
 }
@@ -1693,6 +1702,7 @@ void wake_up_new_task(struct task_struct *p)
 	unsigned long flags;
 	struct rq *rq;
 
+	get_online_cpus_atomic();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 #ifdef CONFIG_SMP
 	/*
@@ -1713,6 +1723,7 @@ void wake_up_new_task(struct task_struct *p)
 		p->sched_class->task_woken(rq, p);
 #endif
 	task_rq_unlock(rq, p, &flags);
+	put_online_cpus_atomic();
 }
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS
@@ -2610,6 +2621,7 @@ void sched_exec(void)
 	unsigned long flags;
 	int dest_cpu;
 
+	get_online_cpus_atomic();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	dest_cpu = p->sched_class->select_task_rq(p, SD_BALANCE_EXEC, 0);
 	if (dest_cpu == smp_processor_id())
@@ -2619,11 +2631,13 @@ void sched_exec(void)
 		struct migration_arg arg = { p, dest_cpu };
 
 		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+		put_online_cpus_atomic();
 		stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
 		return;
 	}
 unlock:
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	put_online_cpus_atomic();
 }
 
 #endif
@@ -4373,6 +4387,7 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
 	unsigned long flags;
 	bool yielded = 0;
 
+	get_online_cpus_atomic();
 	local_irq_save(flags);
 	rq = this_rq();
 
@@ -4400,13 +4415,14 @@ again:
 		 * Make p's CPU reschedule; pick_next_entity takes care of
 		 * fairness.
 		 */
-		if (preempt && rq != p_rq)
+		if (preempt && rq != p_rq && cpu_online(task_cpu(p)))
 			resched_task(p_rq->curr);
 	}
 
 out:
 	double_rq_unlock(rq, p_rq);
 	local_irq_restore(flags);
+	put_online_cpus_atomic();
 
 	if (yielded)
 		schedule();
@@ -4811,9 +4827,11 @@ static int migration_cpu_stop(void *data)
 	 * The original target cpu might have gone down and we might
 	 * be on another cpu but it doesn't matter.
 	 */
+	get_online_cpus_atomic();
 	local_irq_disable();
 	__migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
 	local_irq_enable();
+	put_online_cpus_atomic();
 	return 0;
 }
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 81fa536..c602d5c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5695,8 +5695,11 @@ void trigger_load_balance(struct rq *rq, int cpu)
 	    likely(!on_null_domain(cpu)))
 		raise_softirq(SCHED_SOFTIRQ);
 #ifdef CONFIG_NO_HZ
-	if (nohz_kick_needed(rq, cpu) && likely(!on_null_domain(cpu)))
+	if (nohz_kick_needed(rq, cpu) && likely(!on_null_domain(cpu))) {
+		get_online_cpus_atomic();
 		nohz_balancer_kick(cpu);
+		put_online_cpus_atomic();
+	}
 #endif
 }
 
diff --git a/kernel/timer.c b/kernel/timer.c
index 367d008..b1820e3 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -924,6 +924,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
 
 	timer_stats_timer_set_start_info(timer);
 	BUG_ON(timer_pending(timer) || !timer->function);
+	get_online_cpus_atomic();
 	spin_lock_irqsave(&base->lock, flags);
 	timer_set_base(timer, base);
 	debug_activate(timer, timer->expires);
@@ -938,6 +939,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
 	 */
 	wake_up_idle_cpu(cpu);
 	spin_unlock_irqrestore(&base->lock, flags);
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL_GPL(add_timer_on);
 


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 12/46] sched/timer: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from going
offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while running in atomic context.
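
To illustrate the conversion in isolation, here is a rough sketch (the
function below is made up; only the get/put_online_cpus_atomic() pair is
the real API added by this series):

static void poke_task_cpu(struct task_struct *p)	/* hypothetical example */
{
	int cpu;

	/*
	 * preempt_disable() used to be enough to keep task_cpu(p) online
	 * across the IPI; with stop_machine() gone from the offline path,
	 * take the atomic hotplug read-side lock instead.
	 */
	get_online_cpus_atomic();
	cpu = task_cpu(p);
	if (cpu != smp_processor_id() && task_curr(p))
		smp_send_reschedule(cpu);
	put_online_cpus_atomic();
}

The hunks below apply this same pattern to kick_process(), try_to_wake_up(),
wake_up_new_task(), sched_exec(), yield_to(), migration_cpu_stop(),
trigger_load_balance() and add_timer_on().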

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/sched/core.c |   24 +++++++++++++++++++++---
 kernel/sched/fair.c |    5 ++++-
 kernel/timer.c      |    2 ++
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 26058d0..ead8af6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1117,11 +1117,11 @@ void kick_process(struct task_struct *p)
 {
 	int cpu;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpu = task_cpu(p);
 	if ((cpu != smp_processor_id()) && task_curr(p))
 		smp_send_reschedule(cpu);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL_GPL(kick_process);
 #endif /* CONFIG_SMP */
@@ -1129,6 +1129,10 @@ EXPORT_SYMBOL_GPL(kick_process);
 #ifdef CONFIG_SMP
 /*
  * ->cpus_allowed is protected by both rq->lock and p->pi_lock
+ *
+ *  Must be called under get/put_online_cpus_atomic() or
+ *  equivalent, to prevent CPUs from going offline from underneath
+ *  us.
  */
 static int select_fallback_rq(int cpu, struct task_struct *p)
 {
@@ -1192,6 +1196,9 @@ out:
 
 /*
  * The caller (fork, wakeup) owns p->pi_lock, ->cpus_allowed is stable.
+ *
+ * Must be called under get/put_online_cpus_atomic(), to prevent
+ * CPUs from going offline from underneath us.
  */
 static inline
 int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
@@ -1432,6 +1439,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	int cpu, success = 0;
 
 	smp_wmb();
+	get_online_cpus_atomic();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	if (!(p->state & state))
 		goto out;
@@ -1472,6 +1480,7 @@ stat:
 	ttwu_stat(p, cpu, wake_flags);
 out:
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	put_online_cpus_atomic();
 
 	return success;
 }
@@ -1693,6 +1702,7 @@ void wake_up_new_task(struct task_struct *p)
 	unsigned long flags;
 	struct rq *rq;
 
+	get_online_cpus_atomic();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 #ifdef CONFIG_SMP
 	/*
@@ -1713,6 +1723,7 @@ void wake_up_new_task(struct task_struct *p)
 		p->sched_class->task_woken(rq, p);
 #endif
 	task_rq_unlock(rq, p, &flags);
+	put_online_cpus_atomic();
 }
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS
@@ -2610,6 +2621,7 @@ void sched_exec(void)
 	unsigned long flags;
 	int dest_cpu;
 
+	get_online_cpus_atomic();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	dest_cpu = p->sched_class->select_task_rq(p, SD_BALANCE_EXEC, 0);
 	if (dest_cpu == smp_processor_id())
@@ -2619,11 +2631,13 @@ void sched_exec(void)
 		struct migration_arg arg = { p, dest_cpu };
 
 		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+		put_online_cpus_atomic();
 		stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
 		return;
 	}
 unlock:
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	put_online_cpus_atomic();
 }
 
 #endif
@@ -4373,6 +4387,7 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
 	unsigned long flags;
 	bool yielded = 0;
 
+	get_online_cpus_atomic();
 	local_irq_save(flags);
 	rq = this_rq();
 
@@ -4400,13 +4415,14 @@ again:
 		 * Make p's CPU reschedule; pick_next_entity takes care of
 		 * fairness.
 		 */
-		if (preempt && rq != p_rq)
+		if (preempt && rq != p_rq && cpu_online(task_cpu(p)))
 			resched_task(p_rq->curr);
 	}
 
 out:
 	double_rq_unlock(rq, p_rq);
 	local_irq_restore(flags);
+	put_online_cpus_atomic();
 
 	if (yielded)
 		schedule();
@@ -4811,9 +4827,11 @@ static int migration_cpu_stop(void *data)
 	 * The original target cpu might have gone down and we might
 	 * be on another cpu but it doesn't matter.
 	 */
+	get_online_cpus_atomic();
 	local_irq_disable();
 	__migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
 	local_irq_enable();
+	put_online_cpus_atomic();
 	return 0;
 }
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 81fa536..c602d5c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5695,8 +5695,11 @@ void trigger_load_balance(struct rq *rq, int cpu)
 	    likely(!on_null_domain(cpu)))
 		raise_softirq(SCHED_SOFTIRQ);
 #ifdef CONFIG_NO_HZ
-	if (nohz_kick_needed(rq, cpu) && likely(!on_null_domain(cpu)))
+	if (nohz_kick_needed(rq, cpu) && likely(!on_null_domain(cpu))) {
+		get_online_cpus_atomic();
 		nohz_balancer_kick(cpu);
+		put_online_cpus_atomic();
+	}
 #endif
 }
 
diff --git a/kernel/timer.c b/kernel/timer.c
index 367d008..b1820e3 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -924,6 +924,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
 
 	timer_stats_timer_set_start_info(timer);
 	BUG_ON(timer_pending(timer) || !timer->function);
+	get_online_cpus_atomic();
 	spin_lock_irqsave(&base->lock, flags);
 	timer_set_base(timer, base);
 	debug_activate(timer, timer->expires);
@@ -938,6 +939,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
 	 */
 	wake_up_idle_cpu(cpu);
 	spin_unlock_irqrestore(&base->lock, flags);
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL_GPL(add_timer_on);
 

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 12/46] sched/timer: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from going
offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while running in atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/sched/core.c |   24 +++++++++++++++++++++---
 kernel/sched/fair.c |    5 ++++-
 kernel/timer.c      |    2 ++
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 26058d0..ead8af6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1117,11 +1117,11 @@ void kick_process(struct task_struct *p)
 {
 	int cpu;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpu = task_cpu(p);
 	if ((cpu != smp_processor_id()) && task_curr(p))
 		smp_send_reschedule(cpu);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL_GPL(kick_process);
 #endif /* CONFIG_SMP */
@@ -1129,6 +1129,10 @@ EXPORT_SYMBOL_GPL(kick_process);
 #ifdef CONFIG_SMP
 /*
  * ->cpus_allowed is protected by both rq->lock and p->pi_lock
+ *
+ *  Must be called under get/put_online_cpus_atomic() or
+ *  equivalent, to prevent CPUs from going offline from underneath
+ *  us.
  */
 static int select_fallback_rq(int cpu, struct task_struct *p)
 {
@@ -1192,6 +1196,9 @@ out:
 
 /*
  * The caller (fork, wakeup) owns p->pi_lock, ->cpus_allowed is stable.
+ *
+ * Must be called under get/put_online_cpus_atomic(), to prevent
+ * CPUs from going offline from underneath us.
  */
 static inline
 int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
@@ -1432,6 +1439,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	int cpu, success = 0;
 
 	smp_wmb();
+	get_online_cpus_atomic();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	if (!(p->state & state))
 		goto out;
@@ -1472,6 +1480,7 @@ stat:
 	ttwu_stat(p, cpu, wake_flags);
 out:
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	put_online_cpus_atomic();
 
 	return success;
 }
@@ -1693,6 +1702,7 @@ void wake_up_new_task(struct task_struct *p)
 	unsigned long flags;
 	struct rq *rq;
 
+	get_online_cpus_atomic();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 #ifdef CONFIG_SMP
 	/*
@@ -1713,6 +1723,7 @@ void wake_up_new_task(struct task_struct *p)
 		p->sched_class->task_woken(rq, p);
 #endif
 	task_rq_unlock(rq, p, &flags);
+	put_online_cpus_atomic();
 }
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS
@@ -2610,6 +2621,7 @@ void sched_exec(void)
 	unsigned long flags;
 	int dest_cpu;
 
+	get_online_cpus_atomic();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	dest_cpu = p->sched_class->select_task_rq(p, SD_BALANCE_EXEC, 0);
 	if (dest_cpu == smp_processor_id())
@@ -2619,11 +2631,13 @@ void sched_exec(void)
 		struct migration_arg arg = { p, dest_cpu };
 
 		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+		put_online_cpus_atomic();
 		stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
 		return;
 	}
 unlock:
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	put_online_cpus_atomic();
 }
 
 #endif
@@ -4373,6 +4387,7 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
 	unsigned long flags;
 	bool yielded = 0;
 
+	get_online_cpus_atomic();
 	local_irq_save(flags);
 	rq = this_rq();
 
@@ -4400,13 +4415,14 @@ again:
 		 * Make p's CPU reschedule; pick_next_entity takes care of
 		 * fairness.
 		 */
-		if (preempt && rq != p_rq)
+		if (preempt && rq != p_rq && cpu_online(task_cpu(p)))
 			resched_task(p_rq->curr);
 	}
 
 out:
 	double_rq_unlock(rq, p_rq);
 	local_irq_restore(flags);
+	put_online_cpus_atomic();
 
 	if (yielded)
 		schedule();
@@ -4811,9 +4827,11 @@ static int migration_cpu_stop(void *data)
 	 * The original target cpu might have gone down and we might
 	 * be on another cpu but it doesn't matter.
 	 */
+	get_online_cpus_atomic();
 	local_irq_disable();
 	__migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
 	local_irq_enable();
+	put_online_cpus_atomic();
 	return 0;
 }
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 81fa536..c602d5c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5695,8 +5695,11 @@ void trigger_load_balance(struct rq *rq, int cpu)
 	    likely(!on_null_domain(cpu)))
 		raise_softirq(SCHED_SOFTIRQ);
 #ifdef CONFIG_NO_HZ
-	if (nohz_kick_needed(rq, cpu) && likely(!on_null_domain(cpu)))
+	if (nohz_kick_needed(rq, cpu) && likely(!on_null_domain(cpu))) {
+		get_online_cpus_atomic();
 		nohz_balancer_kick(cpu);
+		put_online_cpus_atomic();
+	}
 #endif
 }
 
diff --git a/kernel/timer.c b/kernel/timer.c
index 367d008..b1820e3 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -924,6 +924,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
 
 	timer_stats_timer_set_start_info(timer);
 	BUG_ON(timer_pending(timer) || !timer->function);
+	get_online_cpus_atomic();
 	spin_lock_irqsave(&base->lock, flags);
 	timer_set_base(timer, base);
 	debug_activate(timer, timer->expires);
@@ -938,6 +939,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
 	 */
 	wake_up_idle_cpu(cpu);
 	spin_unlock_irqrestore(&base->lock, flags);
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL_GPL(add_timer_on);
 

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 13/46] sched/migration: Use raw_spin_lock/unlock since interrupts are already disabled
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

We need not use the raw_spin_lock_irqsave/restore primitives because
all CPU_DYING notifiers run with interrupts disabled. So just use
raw_spin_lock/unlock.
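
As a minimal sketch of what this permits in a CPU_DYING callback (the
notifier and the lock below are made up for the example):

static DEFINE_RAW_SPINLOCK(my_lock);	/* hypothetical */

static int my_cpu_callback(struct notifier_block *nb,
			   unsigned long action, void *hcpu)
{
	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_DYING:
		/*
		 * CPU_DYING notifiers run on the outgoing CPU with
		 * interrupts disabled, so saving and restoring the
		 * flags would be redundant work.
		 */
		raw_spin_lock(&my_lock);
		/* ... update state for the outgoing CPU ... */
		raw_spin_unlock(&my_lock);
		break;
	}
	return NOTIFY_OK;
}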

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/sched/core.c |   12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ead8af6..71741d0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4870,9 +4870,7 @@ static void calc_load_migrate(struct rq *rq)
  * Migrate all tasks from the rq, sleeping tasks will be migrated by
  * try_to_wake_up()->select_task_rq().
  *
- * Called with rq->lock held even though we'er in stop_machine() and
- * there's no concurrency possible, we hold the required locks anyway
- * because of lock validation efforts.
+ * Called with rq->lock held.
  */
 static void migrate_tasks(unsigned int dead_cpu)
 {
@@ -4884,8 +4882,8 @@ static void migrate_tasks(unsigned int dead_cpu)
 	 * Fudge the rq selection such that the below task selection loop
 	 * doesn't get stuck on the currently eligible stop task.
 	 *
-	 * We're currently inside stop_machine() and the rq is either stuck
-	 * in the stop_machine_cpu_stop() loop, or we're executing this code,
+	 * We're currently inside stop_one_cpu() and the rq is either stuck
+	 * in the cpu_stopper_thread(), or we're executing this code,
 	 * either way we should never end up calling schedule() until we're
 	 * done here.
 	 */
@@ -5154,14 +5152,14 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 	case CPU_DYING:
 		sched_ttwu_pending();
 		/* Update our root-domain */
-		raw_spin_lock_irqsave(&rq->lock, flags);
+		raw_spin_lock(&rq->lock); /* Interrupts already disabled */
 		if (rq->rd) {
 			BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
 			set_rq_offline(rq);
 		}
 		migrate_tasks(cpu);
 		BUG_ON(rq->nr_running != 1); /* the migration thread */
-		raw_spin_unlock_irqrestore(&rq->lock, flags);
+		raw_spin_unlock(&rq->lock);
 		break;
 
 	case CPU_DEAD:


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 13/46] sched/migration: Use raw_spin_lock/unlock since interrupts are already disabled
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

We need not use the raw_spin_lock_irqsave/restore primitives because
all CPU_DYING notifiers run with interrupts disabled. So just use
raw_spin_lock/unlock.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/sched/core.c |   12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ead8af6..71741d0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4870,9 +4870,7 @@ static void calc_load_migrate(struct rq *rq)
  * Migrate all tasks from the rq, sleeping tasks will be migrated by
  * try_to_wake_up()->select_task_rq().
  *
- * Called with rq->lock held even though we'er in stop_machine() and
- * there's no concurrency possible, we hold the required locks anyway
- * because of lock validation efforts.
+ * Called with rq->lock held.
  */
 static void migrate_tasks(unsigned int dead_cpu)
 {
@@ -4884,8 +4882,8 @@ static void migrate_tasks(unsigned int dead_cpu)
 	 * Fudge the rq selection such that the below task selection loop
 	 * doesn't get stuck on the currently eligible stop task.
 	 *
-	 * We're currently inside stop_machine() and the rq is either stuck
-	 * in the stop_machine_cpu_stop() loop, or we're executing this code,
+	 * We're currently inside stop_one_cpu() and the rq is either stuck
+	 * in the cpu_stopper_thread(), or we're executing this code,
 	 * either way we should never end up calling schedule() until we're
 	 * done here.
 	 */
@@ -5154,14 +5152,14 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 	case CPU_DYING:
 		sched_ttwu_pending();
 		/* Update our root-domain */
-		raw_spin_lock_irqsave(&rq->lock, flags);
+		raw_spin_lock(&rq->lock); /* Interrupts already disabled */
 		if (rq->rd) {
 			BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
 			set_rq_offline(rq);
 		}
 		migrate_tasks(cpu);
 		BUG_ON(rq->nr_running != 1); /* the migration thread */
-		raw_spin_unlock_irqrestore(&rq->lock, flags);
+		raw_spin_unlock(&rq->lock);
 		break;
 
 	case CPU_DEAD:

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 13/46] sched/migration: Use raw_spin_lock/unlock since interrupts are already disabled
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: linux-arm-kernel

We need not use the raw_spin_lock_irqsave/restore primitives because
all CPU_DYING notifiers run with interrupts disabled. So just use
raw_spin_lock/unlock.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/sched/core.c |   12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ead8af6..71741d0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4870,9 +4870,7 @@ static void calc_load_migrate(struct rq *rq)
  * Migrate all tasks from the rq, sleeping tasks will be migrated by
  * try_to_wake_up()->select_task_rq().
  *
- * Called with rq->lock held even though we'er in stop_machine() and
- * there's no concurrency possible, we hold the required locks anyway
- * because of lock validation efforts.
+ * Called with rq->lock held.
  */
 static void migrate_tasks(unsigned int dead_cpu)
 {
@@ -4884,8 +4882,8 @@ static void migrate_tasks(unsigned int dead_cpu)
 	 * Fudge the rq selection such that the below task selection loop
 	 * doesn't get stuck on the currently eligible stop task.
 	 *
-	 * We're currently inside stop_machine() and the rq is either stuck
-	 * in the stop_machine_cpu_stop() loop, or we're executing this code,
+	 * We're currently inside stop_one_cpu() and the rq is either stuck
+	 * in the cpu_stopper_thread(), or we're executing this code,
 	 * either way we should never end up calling schedule() until we're
 	 * done here.
 	 */
@@ -5154,14 +5152,14 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 	case CPU_DYING:
 		sched_ttwu_pending();
 		/* Update our root-domain */
-		raw_spin_lock_irqsave(&rq->lock, flags);
+		raw_spin_lock(&rq->lock); /* Interrupts already disabled */
 		if (rq->rd) {
 			BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
 			set_rq_offline(rq);
 		}
 		migrate_tasks(cpu);
 		BUG_ON(rq->nr_running != 1); /* the migration thread */
-		raw_spin_unlock_irqrestore(&rq->lock, flags);
+		raw_spin_unlock(&rq->lock);
 		break;
 
 	case CPU_DEAD:

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 14/46] sched/rt: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while running in atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/sched/rt.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4f02b28..e546c98 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -6,6 +6,7 @@
 #include "sched.h"
 
 #include <linux/slab.h>
+#include <linux/cpu.h>
 
 static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun);
 
@@ -26,7 +27,9 @@ static enum hrtimer_restart sched_rt_period_timer(struct hrtimer *timer)
 		if (!overrun)
 			break;
 
+		get_online_cpus_atomic();
 		idle = do_sched_rt_period_timer(rt_b, overrun);
+		put_online_cpus_atomic();
 	}
 
 	return idle ? HRTIMER_NORESTART : HRTIMER_RESTART;


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 14/46] sched/rt: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while running in atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/sched/rt.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4f02b28..e546c98 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -6,6 +6,7 @@
 #include "sched.h"
 
 #include <linux/slab.h>
+#include <linux/cpu.h>
 
 static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun);
 
@@ -26,7 +27,9 @@ static enum hrtimer_restart sched_rt_period_timer(struct hrtimer *timer)
 		if (!overrun)
 			break;
 
+		get_online_cpus_atomic();
 		idle = do_sched_rt_period_timer(rt_b, overrun);
+		put_online_cpus_atomic();
 	}
 
 	return idle ? HRTIMER_NORESTART : HRTIMER_RESTART;

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 14/46] sched/rt: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while running in atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/sched/rt.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4f02b28..e546c98 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -6,6 +6,7 @@
 #include "sched.h"
 
 #include <linux/slab.h>
+#include <linux/cpu.h>
 
 static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun);
 
@@ -26,7 +27,9 @@ static enum hrtimer_restart sched_rt_period_timer(struct hrtimer *timer)
 		if (!overrun)
 			break;
 
+		get_online_cpus_atomic();
 		idle = do_sched_rt_period_timer(rt_b, overrun);
+		put_online_cpus_atomic();
 	}
 
 	return idle ? HRTIMER_NORESTART : HRTIMER_RESTART;

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 15/46] tick: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while running in atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/time/tick-broadcast.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index f113755..d123a2c 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -160,6 +160,7 @@ static void tick_do_broadcast(struct cpumask *mask)
  */
 static void tick_do_periodic_broadcast(void)
 {
+	get_online_cpus_atomic();
 	raw_spin_lock(&tick_broadcast_lock);
 
 	cpumask_and(to_cpumask(tmpmask),
@@ -167,6 +168,7 @@ static void tick_do_periodic_broadcast(void)
 	tick_do_broadcast(to_cpumask(tmpmask));
 
 	raw_spin_unlock(&tick_broadcast_lock);
+	put_online_cpus_atomic();
 }
 
 /*


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 15/46] tick: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while running in atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/time/tick-broadcast.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index f113755..d123a2c 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -160,6 +160,7 @@ static void tick_do_broadcast(struct cpumask *mask)
  */
 static void tick_do_periodic_broadcast(void)
 {
+	get_online_cpus_atomic();
 	raw_spin_lock(&tick_broadcast_lock);
 
 	cpumask_and(to_cpumask(tmpmask),
@@ -167,6 +168,7 @@ static void tick_do_periodic_broadcast(void)
 	tick_do_broadcast(to_cpumask(tmpmask));
 
 	raw_spin_unlock(&tick_broadcast_lock);
+	put_online_cpus_atomic();
 }
 
 /*

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 15/46] tick: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while running in atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/time/tick-broadcast.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index f113755..d123a2c 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -160,6 +160,7 @@ static void tick_do_broadcast(struct cpumask *mask)
  */
 static void tick_do_periodic_broadcast(void)
 {
+	get_online_cpus_atomic();
 	raw_spin_lock(&tick_broadcast_lock);
 
 	cpumask_and(to_cpumask(tmpmask),
@@ -167,6 +168,7 @@ static void tick_do_periodic_broadcast(void)
 	tick_do_broadcast(to_cpumask(tmpmask));
 
 	raw_spin_unlock(&tick_broadcast_lock);
+	put_online_cpus_atomic();
 }
 
 /*

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 16/46] time/clocksource: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from going
offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while running in atomic context.
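
The property needed here is that the CPU picked from cpu_online_mask stays
online until the timer has actually been queued on it. Roughly (sketch
only; my_timer is a stand-in for the watchdog timer):

static struct timer_list my_timer;	/* hypothetical */

static void arm_timer_on_next_cpu(void)
{
	int cpu;

	get_online_cpus_atomic();
	cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
	if (cpu >= nr_cpu_ids)
		cpu = cpumask_first(cpu_online_mask);
	/* 'cpu' cannot go offline before the timer lands on it */
	add_timer_on(&my_timer, cpu);
	put_online_cpus_atomic();
}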

Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/time/clocksource.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index c958338..1c8d735 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -30,6 +30,7 @@
 #include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */
 #include <linux/tick.h>
 #include <linux/kthread.h>
+#include <linux/cpu.h>
 
 void timecounter_init(struct timecounter *tc,
 		      const struct cyclecounter *cc,
@@ -320,11 +321,13 @@ static void clocksource_watchdog(unsigned long data)
 	 * Cycle through CPUs to check if the CPUs stay synchronized
 	 * to each other.
 	 */
+	get_online_cpus_atomic();
 	next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
 	if (next_cpu >= nr_cpu_ids)
 		next_cpu = cpumask_first(cpu_online_mask);
 	watchdog_timer.expires += WATCHDOG_INTERVAL;
 	add_timer_on(&watchdog_timer, next_cpu);
+	put_online_cpus_atomic();
 out:
 	spin_unlock(&watchdog_lock);
 }
@@ -336,7 +339,9 @@ static inline void clocksource_start_watchdog(void)
 	init_timer(&watchdog_timer);
 	watchdog_timer.function = clocksource_watchdog;
 	watchdog_timer.expires = jiffies + WATCHDOG_INTERVAL;
+	get_online_cpus_atomic();
 	add_timer_on(&watchdog_timer, cpumask_first(cpu_online_mask));
+	put_online_cpus_atomic();
 	watchdog_running = 1;
 }
 


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 16/46] time/clocksource: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from going
offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while running in atomic context.

Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/time/clocksource.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index c958338..1c8d735 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -30,6 +30,7 @@
 #include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */
 #include <linux/tick.h>
 #include <linux/kthread.h>
+#include <linux/cpu.h>
 
 void timecounter_init(struct timecounter *tc,
 		      const struct cyclecounter *cc,
@@ -320,11 +321,13 @@ static void clocksource_watchdog(unsigned long data)
 	 * Cycle through CPUs to check if the CPUs stay synchronized
 	 * to each other.
 	 */
+	get_online_cpus_atomic();
 	next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
 	if (next_cpu >= nr_cpu_ids)
 		next_cpu = cpumask_first(cpu_online_mask);
 	watchdog_timer.expires += WATCHDOG_INTERVAL;
 	add_timer_on(&watchdog_timer, next_cpu);
+	put_online_cpus_atomic();
 out:
 	spin_unlock(&watchdog_lock);
 }
@@ -336,7 +339,9 @@ static inline void clocksource_start_watchdog(void)
 	init_timer(&watchdog_timer);
 	watchdog_timer.function = clocksource_watchdog;
 	watchdog_timer.expires = jiffies + WATCHDOG_INTERVAL;
+	get_online_cpus_atomic();
 	add_timer_on(&watchdog_timer, cpumask_first(cpu_online_mask));
+	put_online_cpus_atomic();
 	watchdog_running = 1;
 }
 

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 16/46] time/clocksource: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from going
offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while running in atomic context.

Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/time/clocksource.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index c958338..1c8d735 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -30,6 +30,7 @@
 #include <linux/sched.h> /* for spin_unlock_irq() using preempt_count() m68k */
 #include <linux/tick.h>
 #include <linux/kthread.h>
+#include <linux/cpu.h>
 
 void timecounter_init(struct timecounter *tc,
 		      const struct cyclecounter *cc,
@@ -320,11 +321,13 @@ static void clocksource_watchdog(unsigned long data)
 	 * Cycle through CPUs to check if the CPUs stay synchronized
 	 * to each other.
 	 */
+	get_online_cpus_atomic();
 	next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
 	if (next_cpu >= nr_cpu_ids)
 		next_cpu = cpumask_first(cpu_online_mask);
 	watchdog_timer.expires += WATCHDOG_INTERVAL;
 	add_timer_on(&watchdog_timer, next_cpu);
+	put_online_cpus_atomic();
 out:
 	spin_unlock(&watchdog_lock);
 }
@@ -336,7 +339,9 @@ static inline void clocksource_start_watchdog(void)
 	init_timer(&watchdog_timer);
 	watchdog_timer.function = clocksource_watchdog;
 	watchdog_timer.expires = jiffies + WATCHDOG_INTERVAL;
+	get_online_cpus_atomic();
 	add_timer_on(&watchdog_timer, cpumask_first(cpu_online_mask));
+	put_online_cpus_atomic();
 	watchdog_running = 1;
 }
 

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 17/46] clockevents: Use get/put_online_cpus_atomic() in clockevents_notify()
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

The CPU idle code invokes clockevents_notify() during idle state transitions,
and the CPU offline code invokes it during the CPU_DYING phase. There seems
to be a race condition between the two, in which clockevents_lock never gets
released, resulting in a lockup. This can be fixed by wrapping the contents
of clockevents_notify() within get/put_online_cpus_atomic(), thereby
synchronizing it with CPU offline.
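
The nesting this results in is sketched below (my_lock stands in for
clockevents_lock, which is private to kernel/time/clockevents.c). Taking
the hotplug read-side lock outside the spinlock means we never end up
waiting for a CPU offline operation while holding the spinlock:

static DEFINE_RAW_SPINLOCK(my_lock);	/* stand-in for clockevents_lock */

static void my_notify(unsigned long reason, void *arg)	/* hypothetical */
{
	unsigned long flags;

	get_online_cpus_atomic();	/* synchronize with CPU offline first */
	raw_spin_lock_irqsave(&my_lock, flags);
	/* ... handle 'reason', possibly walking cpu_online_mask ... */
	raw_spin_unlock_irqrestore(&my_lock, flags);
	put_online_cpus_atomic();	/* spinlock already dropped here */
}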

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/time/clockevents.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 30b6de0..ca340fd 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -17,6 +17,7 @@
 #include <linux/module.h>
 #include <linux/notifier.h>
 #include <linux/smp.h>
+#include <linux/cpu.h>
 
 #include "tick-internal.h"
 
@@ -431,6 +432,7 @@ void clockevents_notify(unsigned long reason, void *arg)
 	unsigned long flags;
 	int cpu;
 
+	get_online_cpus_atomic();
 	raw_spin_lock_irqsave(&clockevents_lock, flags);
 	clockevents_do_notify(reason, arg);
 
@@ -459,6 +461,7 @@ void clockevents_notify(unsigned long reason, void *arg)
 		break;
 	}
 	raw_spin_unlock_irqrestore(&clockevents_lock, flags);
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL_GPL(clockevents_notify);
 #endif


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 17/46] clockevents: Use get/put_online_cpus_atomic() in clockevents_notify()
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

The CPU idle code invokes clockevents_notify() during idle state transitions,
and the CPU offline code invokes it during the CPU_DYING phase. There seems
to be a race condition between the two, in which clockevents_lock never gets
released, resulting in a lockup. This can be fixed by wrapping the contents
of clockevents_notify() within get/put_online_cpus_atomic(), thereby
synchronizing it with CPU offline.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/time/clockevents.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 30b6de0..ca340fd 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -17,6 +17,7 @@
 #include <linux/module.h>
 #include <linux/notifier.h>
 #include <linux/smp.h>
+#include <linux/cpu.h>
 
 #include "tick-internal.h"
 
@@ -431,6 +432,7 @@ void clockevents_notify(unsigned long reason, void *arg)
 	unsigned long flags;
 	int cpu;
 
+	get_online_cpus_atomic();
 	raw_spin_lock_irqsave(&clockevents_lock, flags);
 	clockevents_do_notify(reason, arg);
 
@@ -459,6 +461,7 @@ void clockevents_notify(unsigned long reason, void *arg)
 		break;
 	}
 	raw_spin_unlock_irqrestore(&clockevents_lock, flags);
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL_GPL(clockevents_notify);
 #endif

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 17/46] clockevents: Use get/put_online_cpus_atomic() in clockevents_notify()
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: linux-arm-kernel

The CPU idle code invokes clockevents_notify() during idle state transitions,
and the CPU offline code invokes it during the CPU_DYING phase. There seems
to be a race condition between the two, in which clockevents_lock never gets
released, resulting in a lockup. This can be fixed by wrapping the contents
of clockevents_notify() within get/put_online_cpus_atomic(), thereby
synchronizing it with CPU offline.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/time/clockevents.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 30b6de0..ca340fd 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -17,6 +17,7 @@
 #include <linux/module.h>
 #include <linux/notifier.h>
 #include <linux/smp.h>
+#include <linux/cpu.h>
 
 #include "tick-internal.h"
 
@@ -431,6 +432,7 @@ void clockevents_notify(unsigned long reason, void *arg)
 	unsigned long flags;
 	int cpu;
 
+	get_online_cpus_atomic();
 	raw_spin_lock_irqsave(&clockevents_lock, flags);
 	clockevents_do_notify(reason, arg);
 
@@ -459,6 +461,7 @@ void clockevents_notify(unsigned long reason, void *arg)
 		break;
 	}
 	raw_spin_unlock_irqrestore(&clockevents_lock, flags);
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL_GPL(clockevents_notify);
 #endif

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 18/46] softirq: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while running in atomic context.
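
The point of the read-side lock here is that the cpu_online() check and the
IPI it guards are made against the same, stable online mask. Roughly (the
helper below is made up):

static int my_send_to_cpu(struct call_single_data *cp, int cpu)	/* hypothetical */
{
	int ret = 1;

	get_online_cpus_atomic();
	if (cpu_online(cpu)) {
		/* 'cpu' stays online until the matching put below */
		__smp_call_function_single(cpu, cp, 0);
		ret = 0;
	}
	put_online_cpus_atomic();
	return ret;
}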

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/softirq.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index ed567ba..98c3e27 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -631,6 +631,7 @@ static void remote_softirq_receive(void *data)
 
 static int __try_remote_softirq(struct call_single_data *cp, int cpu, int softirq)
 {
+	get_online_cpus_atomic();
 	if (cpu_online(cpu)) {
 		cp->func = remote_softirq_receive;
 		cp->info = cp;
@@ -638,8 +639,10 @@ static int __try_remote_softirq(struct call_single_data *cp, int cpu, int softir
 		cp->priv = softirq;
 
 		__smp_call_function_single(cpu, cp, 0);
+		put_online_cpus_atomic();
 		return 0;
 	}
+	put_online_cpus_atomic();
 	return 1;
 }
 #else /* CONFIG_USE_GENERIC_SMP_HELPERS */


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 18/46] softirq: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while running in atomic context.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/softirq.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index ed567ba..98c3e27 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -631,6 +631,7 @@ static void remote_softirq_receive(void *data)
 
 static int __try_remote_softirq(struct call_single_data *cp, int cpu, int softirq)
 {
+	get_online_cpus_atomic();
 	if (cpu_online(cpu)) {
 		cp->func = remote_softirq_receive;
 		cp->info = cp;
@@ -638,8 +639,10 @@ static int __try_remote_softirq(struct call_single_data *cp, int cpu, int softir
 		cp->priv = softirq;
 
 		__smp_call_function_single(cpu, cp, 0);
+		put_online_cpus_atomic();
 		return 0;
 	}
+	put_online_cpus_atomic();
 	return 1;
 }
 #else /* CONFIG_USE_GENERIC_SMP_HELPERS */

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 18/46] softirq: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while running in atomic context.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/softirq.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index ed567ba..98c3e27 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -631,6 +631,7 @@ static void remote_softirq_receive(void *data)
 
 static int __try_remote_softirq(struct call_single_data *cp, int cpu, int softirq)
 {
+	get_online_cpus_atomic();
 	if (cpu_online(cpu)) {
 		cp->func = remote_softirq_receive;
 		cp->info = cp;
@@ -638,8 +639,10 @@ static int __try_remote_softirq(struct call_single_data *cp, int cpu, int softir
 		cp->priv = softirq;
 
 		__smp_call_function_single(cpu, cp, 0);
+		put_online_cpus_atomic();
 		return 0;
 	}
+	put_online_cpus_atomic();
 	return 1;
 }
 #else /* CONFIG_USE_GENERIC_SMP_HELPERS */

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 19/46] irq: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while running in atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/irq/manage.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index e49a288..b4240b9 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -16,6 +16,7 @@
 #include <linux/interrupt.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
+#include <linux/cpu.h>
 #include <linux/task_work.h>
 
 #include "internals.h"
@@ -202,7 +203,9 @@ int irq_set_affinity(unsigned int irq, const struct cpumask *mask)
 		return -EINVAL;
 
 	raw_spin_lock_irqsave(&desc->lock, flags);
+	get_online_cpus_atomic();
 	ret =  __irq_set_affinity_locked(irq_desc_get_irq_data(desc), mask);
+	put_online_cpus_atomic();
 	raw_spin_unlock_irqrestore(&desc->lock, flags);
 	return ret;
 }
@@ -343,7 +346,9 @@ int irq_select_affinity_usr(unsigned int irq, struct cpumask *mask)
 	int ret;
 
 	raw_spin_lock_irqsave(&desc->lock, flags);
+	get_online_cpus_atomic();
 	ret = setup_affinity(irq, desc, mask);
+	put_online_cpus_atomic();
 	raw_spin_unlock_irqrestore(&desc->lock, flags);
 	return ret;
 }
@@ -1126,7 +1131,9 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 		}
 
 		/* Set default affinity mask once everything is setup */
+		get_online_cpus_atomic();
 		setup_affinity(irq, desc, mask);
+		put_online_cpus_atomic();
 
 	} else if (new->flags & IRQF_TRIGGER_MASK) {
 		unsigned int nmsk = new->flags & IRQF_TRIGGER_MASK;


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 19/46] irq: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while running in atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/irq/manage.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index e49a288..b4240b9 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -16,6 +16,7 @@
 #include <linux/interrupt.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
+#include <linux/cpu.h>
 #include <linux/task_work.h>
 
 #include "internals.h"
@@ -202,7 +203,9 @@ int irq_set_affinity(unsigned int irq, const struct cpumask *mask)
 		return -EINVAL;
 
 	raw_spin_lock_irqsave(&desc->lock, flags);
+	get_online_cpus_atomic();
 	ret =  __irq_set_affinity_locked(irq_desc_get_irq_data(desc), mask);
+	put_online_cpus_atomic();
 	raw_spin_unlock_irqrestore(&desc->lock, flags);
 	return ret;
 }
@@ -343,7 +346,9 @@ int irq_select_affinity_usr(unsigned int irq, struct cpumask *mask)
 	int ret;
 
 	raw_spin_lock_irqsave(&desc->lock, flags);
+	get_online_cpus_atomic();
 	ret = setup_affinity(irq, desc, mask);
+	put_online_cpus_atomic();
 	raw_spin_unlock_irqrestore(&desc->lock, flags);
 	return ret;
 }
@@ -1126,7 +1131,9 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 		}
 
 		/* Set default affinity mask once everything is setup */
+		get_online_cpus_atomic();
 		setup_affinity(irq, desc, mask);
+		put_online_cpus_atomic();
 
 	} else if (new->flags & IRQF_TRIGGER_MASK) {
 		unsigned int nmsk = new->flags & IRQF_TRIGGER_MASK;

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 19/46] irq: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:40   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:40 UTC (permalink / raw)
  To: linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while running in atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/irq/manage.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index e49a288..b4240b9 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -16,6 +16,7 @@
 #include <linux/interrupt.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
+#include <linux/cpu.h>
 #include <linux/task_work.h>
 
 #include "internals.h"
@@ -202,7 +203,9 @@ int irq_set_affinity(unsigned int irq, const struct cpumask *mask)
 		return -EINVAL;
 
 	raw_spin_lock_irqsave(&desc->lock, flags);
+	get_online_cpus_atomic();
 	ret =  __irq_set_affinity_locked(irq_desc_get_irq_data(desc), mask);
+	put_online_cpus_atomic();
 	raw_spin_unlock_irqrestore(&desc->lock, flags);
 	return ret;
 }
@@ -343,7 +346,9 @@ int irq_select_affinity_usr(unsigned int irq, struct cpumask *mask)
 	int ret;
 
 	raw_spin_lock_irqsave(&desc->lock, flags);
+	get_online_cpus_atomic();
 	ret = setup_affinity(irq, desc, mask);
+	put_online_cpus_atomic();
 	raw_spin_unlock_irqrestore(&desc->lock, flags);
 	return ret;
 }
@@ -1126,7 +1131,9 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 		}
 
 		/* Set default affinity mask once everything is setup */
+		get_online_cpus_atomic();
 		setup_affinity(irq, desc, mask);
+		put_online_cpus_atomic();
 
 	} else if (new->flags & IRQF_TRIGGER_MASK) {
 		unsigned int nmsk = new->flags & IRQF_TRIGGER_MASK;

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 20/46] net: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:41   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:41 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while running in atomic context.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 net/core/dev.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index f64e439..5421f96 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3089,7 +3089,7 @@ int netif_rx(struct sk_buff *skb)
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
 		int cpu;
 
-		preempt_disable();
+		get_online_cpus_atomic();
 		rcu_read_lock();
 
 		cpu = get_rps_cpu(skb->dev, skb, &rflow);
@@ -3099,7 +3099,7 @@ int netif_rx(struct sk_buff *skb)
 		ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
 
 		rcu_read_unlock();
-		preempt_enable();
+		put_online_cpus_atomic();
 	} else
 #endif
 	{
@@ -3498,6 +3498,7 @@ int netif_receive_skb(struct sk_buff *skb)
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
 		int cpu, ret;
 
+		get_online_cpus_atomic();
 		rcu_read_lock();
 
 		cpu = get_rps_cpu(skb->dev, skb, &rflow);
@@ -3505,9 +3506,11 @@ int netif_receive_skb(struct sk_buff *skb)
 		if (cpu >= 0) {
 			ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
 			rcu_read_unlock();
+			put_online_cpus_atomic();
 			return ret;
 		}
 		rcu_read_unlock();
+		put_online_cpus_atomic();
 	}
 #endif
 	return __netif_receive_skb(skb);
@@ -3887,6 +3890,7 @@ static void net_rps_action_and_irq_enable(struct softnet_data *sd)
 		local_irq_enable();
 
 		/* Send pending IPI's to kick RPS processing on remote cpus. */
+		get_online_cpus_atomic();
 		while (remsd) {
 			struct softnet_data *next = remsd->rps_ipi_next;
 
@@ -3895,6 +3899,7 @@ static void net_rps_action_and_irq_enable(struct softnet_data *sd)
 							   &remsd->csd, 0);
 			remsd = next;
 		}
+		put_online_cpus_atomic();
 	} else
 #endif
 		local_irq_enable();


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 20/46] net: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:41   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:41 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline while running in atomic context.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 net/core/dev.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index f64e439..5421f96 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3089,7 +3089,7 @@ int netif_rx(struct sk_buff *skb)
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
 		int cpu;
 
-		preempt_disable();
+		get_online_cpus_atomic();
 		rcu_read_lock();
 
 		cpu = get_rps_cpu(skb->dev, skb, &rflow);
@@ -3099,7 +3099,7 @@ int netif_rx(struct sk_buff *skb)
 		ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
 
 		rcu_read_unlock();
-		preempt_enable();
+		put_online_cpus_atomic();
 	} else
 #endif
 	{
@@ -3498,6 +3498,7 @@ int netif_receive_skb(struct sk_buff *skb)
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
 		int cpu, ret;
 
+		get_online_cpus_atomic();
 		rcu_read_lock();
 
 		cpu = get_rps_cpu(skb->dev, skb, &rflow);
@@ -3505,9 +3506,11 @@ int netif_receive_skb(struct sk_buff *skb)
 		if (cpu >= 0) {
 			ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
 			rcu_read_unlock();
+			put_online_cpus_atomic();
 			return ret;
 		}
 		rcu_read_unlock();
+		put_online_cpus_atomic();
 	}
 #endif
 	return __netif_receive_skb(skb);
@@ -3887,6 +3890,7 @@ static void net_rps_action_and_irq_enable(struct softnet_data *sd)
 		local_irq_enable();
 
 		/* Send pending IPI's to kick RPS processing on remote cpus. */
+		get_online_cpus_atomic();
 		while (remsd) {
 			struct softnet_data *next = remsd->rps_ipi_next;
 
@@ -3895,6 +3899,7 @@ static void net_rps_action_and_irq_enable(struct softnet_data *sd)
 							   &remsd->csd, 0);
 			remsd = next;
 		}
+		put_online_cpus_atomic();
 	} else
 #endif
 		local_irq_enable();

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 20/46] net: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:41   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:41 UTC (permalink / raw)
  To: linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline,
while invoking from atomic context.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: netdev at vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 net/core/dev.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index f64e439..5421f96 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3089,7 +3089,7 @@ int netif_rx(struct sk_buff *skb)
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
 		int cpu;
 
-		preempt_disable();
+		get_online_cpus_atomic();
 		rcu_read_lock();
 
 		cpu = get_rps_cpu(skb->dev, skb, &rflow);
@@ -3099,7 +3099,7 @@ int netif_rx(struct sk_buff *skb)
 		ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
 
 		rcu_read_unlock();
-		preempt_enable();
+		put_online_cpus_atomic();
 	} else
 #endif
 	{
@@ -3498,6 +3498,7 @@ int netif_receive_skb(struct sk_buff *skb)
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
 		int cpu, ret;
 
+		get_online_cpus_atomic();
 		rcu_read_lock();
 
 		cpu = get_rps_cpu(skb->dev, skb, &rflow);
@@ -3505,9 +3506,11 @@ int netif_receive_skb(struct sk_buff *skb)
 		if (cpu >= 0) {
 			ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
 			rcu_read_unlock();
+			put_online_cpus_atomic();
 			return ret;
 		}
 		rcu_read_unlock();
+		put_online_cpus_atomic();
 	}
 #endif
 	return __netif_receive_skb(skb);
@@ -3887,6 +3890,7 @@ static void net_rps_action_and_irq_enable(struct softnet_data *sd)
 		local_irq_enable();
 
 		/* Send pending IPI's to kick RPS processing on remote cpus. */
+		get_online_cpus_atomic();
 		while (remsd) {
 			struct softnet_data *next = remsd->rps_ipi_next;
 
@@ -3895,6 +3899,7 @@ static void net_rps_action_and_irq_enable(struct softnet_data *sd)
 							   &remsd->csd, 0);
 			remsd = next;
 		}
+		put_online_cpus_atomic();
 	} else
 #endif
 		local_irq_enable();

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 21/46] block: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:41   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:41 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline
when invoked from atomic context.
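
Condensed from the patched raise_blk_irq() below (a sketch only, restructured
to use a single unlock): the cpu_online() check is meaningful only while the
hotplug read-side lock is held, so the check and the IPI must live in the same
critical section.

        #include <linux/cpu.h>
        #include <linux/smp.h>

        static int kick_remote_cpu(int cpu, struct call_single_data *data)
        {
                int ret = 1;

                get_online_cpus_atomic();
                if (cpu_online(cpu)) {
                        /* 'cpu' cannot go offline before the IPI is raised */
                        __smp_call_function_single(cpu, data, 0);
                        ret = 0;
                }
                put_online_cpus_atomic();

                return ret;
        }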

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 block/blk-softirq.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/blk-softirq.c b/block/blk-softirq.c
index 467c8de..448f9a9 100644
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -58,6 +58,8 @@ static void trigger_softirq(void *data)
  */
 static int raise_blk_irq(int cpu, struct request *rq)
 {
+	get_online_cpus_atomic();
+
 	if (cpu_online(cpu)) {
 		struct call_single_data *data = &rq->csd;
 
@@ -66,9 +68,11 @@ static int raise_blk_irq(int cpu, struct request *rq)
 		data->flags = 0;
 
 		__smp_call_function_single(cpu, data, 0);
+		put_online_cpus_atomic();
 		return 0;
 	}
 
+	put_online_cpus_atomic();
 	return 1;
 }
 #else /* CONFIG_SMP && CONFIG_USE_GENERIC_SMP_HELPERS */


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 22/46] crypto: pcrypt - Protect access to cpu_online_mask with get/put_online_cpus()
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:41   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:41 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

The pcrypt_aead_init_tfm() function accesses the cpu_online_mask without
disabling CPU hotplug. Since it appears safe for this function to sleep, use
the get/put_online_cpus() APIs to protect against CPU hotplug.
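
For reference, the sleepable pattern used in the hunk below boils down to
something like this (a minimal sketch, with the pcrypt context fields replaced
by plain locals):

        #include <linux/cpu.h>
        #include <linux/cpumask.h>

        static unsigned int pick_callback_cpu(unsigned int count)
        {
                unsigned int cpu_index, cpu;

                get_online_cpus();      /* may sleep; fine in process context */

                cpu_index = count % cpumask_weight(cpu_online_mask);
                cpu = cpumask_first(cpu_online_mask);
                while (cpu_index--)
                        cpu = cpumask_next(cpu, cpu_online_mask);

                put_online_cpus();

                return cpu;
        }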

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 crypto/pcrypt.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c
index b2c99dc..10f64e2 100644
--- a/crypto/pcrypt.c
+++ b/crypto/pcrypt.c
@@ -280,12 +280,16 @@ static int pcrypt_aead_init_tfm(struct crypto_tfm *tfm)
 
 	ictx->tfm_count++;
 
+	get_online_cpus();
+
 	cpu_index = ictx->tfm_count % cpumask_weight(cpu_online_mask);
 
 	ctx->cb_cpu = cpumask_first(cpu_online_mask);
 	for (cpu = 0; cpu < cpu_index; cpu++)
 		ctx->cb_cpu = cpumask_next(ctx->cb_cpu, cpu_online_mask);
 
+	put_online_cpus();
+
 	cipher = crypto_spawn_aead(crypto_instance_ctx(inst));
 
 	if (IS_ERR(cipher))


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 23/46] infiniband: ehca: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:41   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:41 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline
when invoked from atomic context.
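
The convention established below is that find_next_online_cpu() is called only
with the atomic hotplug read-lock already held, and that the lock is kept
across the use of the returned CPU. Roughly (a sketch with hypothetical
stand-ins for the ehca helpers):

        #include <linux/cpu.h>

        /* Hypothetical stand-ins for find_next_online_cpu()/__queue_comp_task(): */
        extern int pool_pick_online_cpu(void *pool);
        extern void pool_queue_on_cpu(void *pool, int cpu);

        static void queue_task(void *pool)
        {
                int cpu;

                get_online_cpus_atomic();
                cpu = pool_pick_online_cpu(pool);       /* 'cpu' stays online... */
                pool_queue_on_cpu(pool, cpu);           /* ...until the unlock below */
                put_online_cpus_atomic();
        }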

Cc: Roland Dreier <roland@kernel.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 drivers/infiniband/hw/ehca/ehca_irq.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c
index 8615d7c..d61936c 100644
--- a/drivers/infiniband/hw/ehca/ehca_irq.c
+++ b/drivers/infiniband/hw/ehca/ehca_irq.c
@@ -43,6 +43,7 @@
 
 #include <linux/slab.h>
 #include <linux/smpboot.h>
+#include <linux/cpu.h>
 
 #include "ehca_classes.h"
 #include "ehca_irq.h"
@@ -653,6 +654,9 @@ void ehca_tasklet_eq(unsigned long data)
 	ehca_process_eq((struct ehca_shca*)data, 1);
 }
 
+/*
+ * Must be called under get_online_cpus_atomic() and put_online_cpus_atomic().
+ */
 static int find_next_online_cpu(struct ehca_comp_pool *pool)
 {
 	int cpu;
@@ -703,6 +707,7 @@ static void queue_comp_task(struct ehca_cq *__cq)
 	int cq_jobs;
 	unsigned long flags;
 
+	get_online_cpus_atomic();
 	cpu_id = find_next_online_cpu(pool);
 	BUG_ON(!cpu_online(cpu_id));
 
@@ -720,6 +725,7 @@ static void queue_comp_task(struct ehca_cq *__cq)
 		BUG_ON(!cct || !thread);
 	}
 	__queue_comp_task(__cq, cct, thread);
+	put_online_cpus_atomic();
 }
 
 static void run_comp_task(struct ehca_cpu_comp_task *cct)
@@ -759,6 +765,7 @@ static void comp_task_park(unsigned int cpu)
 	list_splice_init(&cct->cq_list, &list);
 	spin_unlock_irq(&cct->task_lock);
 
+	get_online_cpus_atomic();
 	cpu = find_next_online_cpu(pool);
 	target = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
 	thread = *per_cpu_ptr(pool->cpu_comp_threads, cpu);
@@ -768,6 +775,7 @@ static void comp_task_park(unsigned int cpu)
 		__queue_comp_task(cq, target, thread);
 	}
 	spin_unlock_irq(&target->task_lock);
+	put_online_cpus_atomic();
 }
 
 static void comp_task_stop(unsigned int cpu, bool online)


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 24/46] [SCSI] fcoe: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:41   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:41 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline
when invoked from atomic context.
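
One detail worth noting in the hunk below: every exit path out of the critical
section needs its own put_online_cpus_atomic(). In outline (a sketch with
hypothetical helpers, error handling reduced to a single bail-out):

        #include <linux/cpu.h>
        #include <linux/cpumask.h>
        #include <linux/errno.h>

        /* Hypothetical stand-ins for the fcoe CPU selection and enqueue: */
        extern unsigned int select_target_cpu(void *frame);
        extern void enqueue_on_cpu(unsigned int cpu, void *frame);

        static int deliver_frame(void *frame)
        {
                unsigned int cpu;

                get_online_cpus_atomic();
                cpu = select_target_cpu(frame);
                if (cpu >= nr_cpu_ids) {
                        put_online_cpus_atomic();       /* unlock before bailing out */
                        return -ENODEV;
                }
                enqueue_on_cpu(cpu, frame);
                put_online_cpus_atomic();

                return 0;
        }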

Cc: Robert Love <robert.w.love@intel.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: devel@open-fcoe.org
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 drivers/scsi/fcoe/fcoe.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
index 666b7ac..c971a17 100644
--- a/drivers/scsi/fcoe/fcoe.c
+++ b/drivers/scsi/fcoe/fcoe.c
@@ -1475,6 +1475,7 @@ static int fcoe_rcv(struct sk_buff *skb, struct net_device *netdev,
 	 * was originated, otherwise select cpu using rx exchange id
 	 * or fcoe_select_cpu().
 	 */
+	get_online_cpus_atomic();
 	if (ntoh24(fh->fh_f_ctl) & FC_FC_EX_CTX)
 		cpu = ntohs(fh->fh_ox_id) & fc_cpu_mask;
 	else {
@@ -1484,8 +1485,10 @@ static int fcoe_rcv(struct sk_buff *skb, struct net_device *netdev,
 			cpu = ntohs(fh->fh_rx_id) & fc_cpu_mask;
 	}
 
-	if (cpu >= nr_cpu_ids)
+	if (cpu >= nr_cpu_ids) {
+		put_online_cpus_atomic();
 		goto err;
+	}
 
 	fps = &per_cpu(fcoe_percpu, cpu);
 	spin_lock(&fps->fcoe_rx_list.lock);
@@ -1505,6 +1508,7 @@ static int fcoe_rcv(struct sk_buff *skb, struct net_device *netdev,
 		spin_lock(&fps->fcoe_rx_list.lock);
 		if (!fps->thread) {
 			spin_unlock(&fps->fcoe_rx_list.lock);
+			put_online_cpus_atomic();
 			goto err;
 		}
 	}
@@ -1526,6 +1530,7 @@ static int fcoe_rcv(struct sk_buff *skb, struct net_device *netdev,
 	if (fps->thread->state == TASK_INTERRUPTIBLE)
 		wake_up_process(fps->thread);
 	spin_unlock(&fps->fcoe_rx_list.lock);
+	put_online_cpus_atomic();
 
 	return 0;
 err:


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 25/46] staging: octeon: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:41   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:41 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline
when invoked from atomic context.
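
Reduced to a sketch (the per-CPU bookkeeping is omitted and enable_napi_on()
is a hypothetical stand-in for cvm_oct_enable_napi()), the hunk below walks
the online CPUs and kicks one of them, all inside one hotplug read-side
critical section so the chosen CPU cannot vanish before the cross-CPU call
is issued:

        #include <linux/cpu.h>
        #include <linux/smp.h>

        /* Hypothetical callback, standing in for cvm_oct_enable_napi(): */
        extern void enable_napi_on(void *info);

        static void enable_one_online_cpu(void)
        {
                int cpu;

                get_online_cpus_atomic();
                for_each_online_cpu(cpu) {
                        if (!smp_call_function_single(cpu, enable_napi_on,
                                                      NULL, 0))
                                break;  /* kicked one CPU; done */
                }
                put_online_cpus_atomic();
        }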

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: David Daney <david.daney@cavium.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 drivers/staging/octeon/ethernet-rx.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/staging/octeon/ethernet-rx.c b/drivers/staging/octeon/ethernet-rx.c
index 34afc16..8588b4d 100644
--- a/drivers/staging/octeon/ethernet-rx.c
+++ b/drivers/staging/octeon/ethernet-rx.c
@@ -36,6 +36,7 @@
 #include <linux/prefetch.h>
 #include <linux/ratelimit.h>
 #include <linux/smp.h>
+#include <linux/cpu.h>
 #include <linux/interrupt.h>
 #include <net/dst.h>
 #ifdef CONFIG_XFRM
@@ -97,6 +98,7 @@ static void cvm_oct_enable_one_cpu(void)
 		return;
 
 	/* ... if a CPU is available, Turn on NAPI polling for that CPU.  */
+	get_online_cpus_atomic();
 	for_each_online_cpu(cpu) {
 		if (!cpu_test_and_set(cpu, core_state.cpu_state)) {
 			v = smp_call_function_single(cpu, cvm_oct_enable_napi,
@@ -106,6 +108,7 @@ static void cvm_oct_enable_one_cpu(void)
 			break;
 		}
 	}
+	put_online_cpus_atomic();
 }
 
 static void cvm_oct_no_more_work(void)


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 26/46] x86: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:41   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:41 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline
when invoked from atomic context.
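
All of these IPI paths share one shape: the walk over (or broadcast to)
cpu_online_mask and the actual sends sit inside a single hotplug read-side
critical section, so the set of online CPUs cannot change mid-send. A sketch
(send_ipi_one() is a hypothetical stand-in for the per-apic-driver primitive):

        #include <linux/cpu.h>
        #include <linux/smp.h>

        /* Hypothetical per-CPU IPI primitive: */
        extern void send_ipi_one(unsigned int cpu, int vector);

        static void send_ipi_allbutself(int vector)
        {
                unsigned int this_cpu = smp_processor_id();
                unsigned int cpu;

                get_online_cpus_atomic();
                for_each_online_cpu(cpu) {
                        if (cpu != this_cpu)
                                send_ipi_one(cpu, vector);
                }
                put_online_cpus_atomic();
        }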

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Daniel J Blueman <daniel@numascale-asia.com>
Cc: Steffen Persvold <sp@numascale.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: linux-edac@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/x86/include/asm/ipi.h               |    5 +++++
 arch/x86/kernel/apic/apic_flat_64.c      |   10 ++++++++++
 arch/x86/kernel/apic/apic_numachip.c     |    5 +++++
 arch/x86/kernel/apic/es7000_32.c         |    5 +++++
 arch/x86/kernel/apic/io_apic.c           |    7 +++++--
 arch/x86/kernel/apic/ipi.c               |   10 ++++++++++
 arch/x86/kernel/apic/x2apic_cluster.c    |    4 ++++
 arch/x86/kernel/apic/x2apic_uv_x.c       |    4 ++++
 arch/x86/kernel/cpu/mcheck/therm_throt.c |    4 ++--
 arch/x86/mm/tlb.c                        |   14 +++++++-------
 10 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/ipi.h b/arch/x86/include/asm/ipi.h
index 615fa90..112249c 100644
--- a/arch/x86/include/asm/ipi.h
+++ b/arch/x86/include/asm/ipi.h
@@ -20,6 +20,7 @@
  * Subject to the GNU Public License, v.2
  */
 
+#include <linux/cpu.h>
 #include <asm/hw_irq.h>
 #include <asm/apic.h>
 #include <asm/smp.h>
@@ -131,18 +132,22 @@ extern int no_broadcast;
 
 static inline void __default_local_send_IPI_allbutself(int vector)
 {
+	get_online_cpus_atomic();
 	if (no_broadcast || vector == NMI_VECTOR)
 		apic->send_IPI_mask_allbutself(cpu_online_mask, vector);
 	else
 		__default_send_IPI_shortcut(APIC_DEST_ALLBUT, vector, apic->dest_logical);
+	put_online_cpus_atomic();
 }
 
 static inline void __default_local_send_IPI_all(int vector)
 {
+	get_online_cpus_atomic();
 	if (no_broadcast || vector == NMI_VECTOR)
 		apic->send_IPI_mask(cpu_online_mask, vector);
 	else
 		__default_send_IPI_shortcut(APIC_DEST_ALLINC, vector, apic->dest_logical);
+	put_online_cpus_atomic();
 }
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index 00c77cf..8207ade 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -11,6 +11,7 @@
 #include <linux/errno.h>
 #include <linux/threads.h>
 #include <linux/cpumask.h>
+#include <linux/cpu.h>
 #include <linux/string.h>
 #include <linux/kernel.h>
 #include <linux/ctype.h>
@@ -92,6 +93,8 @@ static void flat_send_IPI_allbutself(int vector)
 #else
 	int hotplug = 0;
 #endif
+
+	get_online_cpus_atomic();
 	if (hotplug || vector == NMI_VECTOR) {
 		if (!cpumask_equal(cpu_online_mask, cpumask_of(cpu))) {
 			unsigned long mask = cpumask_bits(cpu_online_mask)[0];
@@ -105,16 +108,19 @@ static void flat_send_IPI_allbutself(int vector)
 		__default_send_IPI_shortcut(APIC_DEST_ALLBUT,
 					    vector, apic->dest_logical);
 	}
+	put_online_cpus_atomic();
 }
 
 static void flat_send_IPI_all(int vector)
 {
+	get_online_cpus_atomic();
 	if (vector == NMI_VECTOR) {
 		flat_send_IPI_mask(cpu_online_mask, vector);
 	} else {
 		__default_send_IPI_shortcut(APIC_DEST_ALLINC,
 					    vector, apic->dest_logical);
 	}
+	put_online_cpus_atomic();
 }
 
 static unsigned int flat_get_apic_id(unsigned long x)
@@ -255,12 +261,16 @@ static void physflat_send_IPI_mask_allbutself(const struct cpumask *cpumask,
 
 static void physflat_send_IPI_allbutself(int vector)
 {
+	get_online_cpus_atomic();
 	default_send_IPI_mask_allbutself_phys(cpu_online_mask, vector);
+	put_online_cpus_atomic();
 }
 
 static void physflat_send_IPI_all(int vector)
 {
+	get_online_cpus_atomic();
 	physflat_send_IPI_mask(cpu_online_mask, vector);
+	put_online_cpus_atomic();
 }
 
 static int physflat_probe(void)
diff --git a/arch/x86/kernel/apic/apic_numachip.c b/arch/x86/kernel/apic/apic_numachip.c
index 9c2aa89..7d19c1d 100644
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -14,6 +14,7 @@
 #include <linux/errno.h>
 #include <linux/threads.h>
 #include <linux/cpumask.h>
+#include <linux/cpu.h>
 #include <linux/string.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
@@ -131,15 +132,19 @@ static void numachip_send_IPI_allbutself(int vector)
 	unsigned int this_cpu = smp_processor_id();
 	unsigned int cpu;
 
+	get_online_cpus_atomic();
 	for_each_online_cpu(cpu) {
 		if (cpu != this_cpu)
 			numachip_send_IPI_one(cpu, vector);
 	}
+	put_online_cpus_atomic();
 }
 
 static void numachip_send_IPI_all(int vector)
 {
+	get_online_cpus_atomic();
 	numachip_send_IPI_mask(cpu_online_mask, vector);
+	put_online_cpus_atomic();
 }
 
 static void numachip_send_IPI_self(int vector)
diff --git a/arch/x86/kernel/apic/es7000_32.c b/arch/x86/kernel/apic/es7000_32.c
index 0874799..ddf2995 100644
--- a/arch/x86/kernel/apic/es7000_32.c
+++ b/arch/x86/kernel/apic/es7000_32.c
@@ -45,6 +45,7 @@
 #include <linux/gfp.h>
 #include <linux/nmi.h>
 #include <linux/smp.h>
+#include <linux/cpu.h>
 #include <linux/io.h>
 
 #include <asm/apicdef.h>
@@ -412,12 +413,16 @@ static void es7000_send_IPI_mask(const struct cpumask *mask, int vector)
 
 static void es7000_send_IPI_allbutself(int vector)
 {
+	get_online_cpus_atomic();
 	default_send_IPI_mask_allbutself_phys(cpu_online_mask, vector);
+	put_online_cpus_atomic();
 }
 
 static void es7000_send_IPI_all(int vector)
 {
+	get_online_cpus_atomic();
 	es7000_send_IPI_mask(cpu_online_mask, vector);
+	put_online_cpus_atomic();
 }
 
 static int es7000_apic_id_registered(void)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index b739d39..ca1c2a5 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -25,6 +25,7 @@
 #include <linux/init.h>
 #include <linux/delay.h>
 #include <linux/sched.h>
+#include <linux/cpu.h>
 #include <linux/pci.h>
 #include <linux/mc146818rtc.h>
 #include <linux/compiler.h>
@@ -1788,13 +1789,13 @@ __apicdebuginit(void) print_local_APICs(int maxcpu)
 	if (!maxcpu)
 		return;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	for_each_online_cpu(cpu) {
 		if (cpu >= maxcpu)
 			break;
 		smp_call_function_single(cpu, print_local_APIC, NULL, 1);
 	}
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 __apicdebuginit(void) print_PIC(void)
@@ -2209,6 +2210,7 @@ void send_cleanup_vector(struct irq_cfg *cfg)
 {
 	cpumask_var_t cleanup_mask;
 
+	get_online_cpus_atomic();
 	if (unlikely(!alloc_cpumask_var(&cleanup_mask, GFP_ATOMIC))) {
 		unsigned int i;
 		for_each_cpu_and(i, cfg->old_domain, cpu_online_mask)
@@ -2219,6 +2221,7 @@ void send_cleanup_vector(struct irq_cfg *cfg)
 		free_cpumask_var(cleanup_mask);
 	}
 	cfg->move_in_progress = 0;
+	put_online_cpus_atomic();
 }
 
 asmlinkage void smp_irq_move_cleanup_interrupt(void)
diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c
index cce91bf..c65aa77 100644
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -29,12 +29,14 @@ void default_send_IPI_mask_sequence_phys(const struct cpumask *mask, int vector)
 	 * to an arbitrary mask, so I do a unicast to each CPU instead.
 	 * - mbligh
 	 */
+	get_online_cpus_atomic();
 	local_irq_save(flags);
 	for_each_cpu(query_cpu, mask) {
 		__default_send_IPI_dest_field(per_cpu(x86_cpu_to_apicid,
 				query_cpu), vector, APIC_DEST_PHYSICAL);
 	}
 	local_irq_restore(flags);
+	put_online_cpus_atomic();
 }
 
 void default_send_IPI_mask_allbutself_phys(const struct cpumask *mask,
@@ -46,6 +48,7 @@ void default_send_IPI_mask_allbutself_phys(const struct cpumask *mask,
 
 	/* See Hack comment above */
 
+	get_online_cpus_atomic();
 	local_irq_save(flags);
 	for_each_cpu(query_cpu, mask) {
 		if (query_cpu == this_cpu)
@@ -54,6 +57,7 @@ void default_send_IPI_mask_allbutself_phys(const struct cpumask *mask,
 				 query_cpu), vector, APIC_DEST_PHYSICAL);
 	}
 	local_irq_restore(flags);
+	put_online_cpus_atomic();
 }
 
 #ifdef CONFIG_X86_32
@@ -70,12 +74,14 @@ void default_send_IPI_mask_sequence_logical(const struct cpumask *mask,
 	 * should be modified to do 1 message per cluster ID - mbligh
 	 */
 
+	get_online_cpus_atomic();
 	local_irq_save(flags);
 	for_each_cpu(query_cpu, mask)
 		__default_send_IPI_dest_field(
 			early_per_cpu(x86_cpu_to_logical_apicid, query_cpu),
 			vector, apic->dest_logical);
 	local_irq_restore(flags);
+	put_online_cpus_atomic();
 }
 
 void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask,
@@ -87,6 +93,7 @@ void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask,
 
 	/* See Hack comment above */
 
+	get_online_cpus_atomic();
 	local_irq_save(flags);
 	for_each_cpu(query_cpu, mask) {
 		if (query_cpu == this_cpu)
@@ -96,6 +103,7 @@ void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask,
 			vector, apic->dest_logical);
 		}
 	local_irq_restore(flags);
+	put_online_cpus_atomic();
 }
 
 /*
@@ -109,10 +117,12 @@ void default_send_IPI_mask_logical(const struct cpumask *cpumask, int vector)
 	if (WARN_ONCE(!mask, "empty IPI mask"))
 		return;
 
+	get_online_cpus_atomic();
 	local_irq_save(flags);
 	WARN_ON(mask & ~cpumask_bits(cpu_online_mask)[0]);
 	__default_send_IPI_dest_field(mask, vector, apic->dest_logical);
 	local_irq_restore(flags);
+	put_online_cpus_atomic();
 }
 
 void default_send_IPI_allbutself(int vector)
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index c88baa4..cb08e6b 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -88,12 +88,16 @@ x2apic_send_IPI_mask_allbutself(const struct cpumask *mask, int vector)
 
 static void x2apic_send_IPI_allbutself(int vector)
 {
+	get_online_cpus_atomic();
 	__x2apic_send_IPI_mask(cpu_online_mask, vector, APIC_DEST_ALLBUT);
+	put_online_cpus_atomic();
 }
 
 static void x2apic_send_IPI_all(int vector)
 {
+	get_online_cpus_atomic();
 	__x2apic_send_IPI_mask(cpu_online_mask, vector, APIC_DEST_ALLINC);
+	put_online_cpus_atomic();
 }
 
 static int
diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index 8cfade9..cc469a3 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -244,15 +244,19 @@ static void uv_send_IPI_allbutself(int vector)
 	unsigned int this_cpu = smp_processor_id();
 	unsigned int cpu;
 
+	get_online_cpus_atomic();
 	for_each_online_cpu(cpu) {
 		if (cpu != this_cpu)
 			uv_send_IPI_one(cpu, vector);
 	}
+	put_online_cpus_atomic();
 }
 
 static void uv_send_IPI_all(int vector)
 {
+	get_online_cpus_atomic();
 	uv_send_IPI_mask(cpu_online_mask, vector);
+	put_online_cpus_atomic();
 }
 
 static int uv_apic_id_valid(int apicid)
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 47a1870..d128ba4 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -82,13 +82,13 @@ static ssize_t therm_throt_device_show_##event##_##name(		\
 	unsigned int cpu = dev->id;					\
 	ssize_t ret;							\
 									\
-	preempt_disable();	/* CPU hotplug */			\
+	get_online_cpus_atomic();	/* CPU hotplug */		\
 	if (cpu_online(cpu)) {						\
 		ret = sprintf(buf, "%lu\n",				\
 			      per_cpu(thermal_state, cpu).event.name);	\
 	} else								\
 		ret = 0;						\
-	preempt_enable();						\
+	put_online_cpus_atomic();					\
 									\
 	return ret;							\
 }
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 13a6b29..2c3ec76 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -147,12 +147,12 @@ void flush_tlb_current_task(void)
 {
 	struct mm_struct *mm = current->mm;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 
 	local_flush_tlb();
 	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
 		flush_tlb_others(mm_cpumask(mm), mm, 0UL, TLB_FLUSH_ALL);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 /*
@@ -187,7 +187,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	unsigned long addr;
 	unsigned act_entries, tlb_entries = 0;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	if (current->active_mm != mm)
 		goto flush_all;
 
@@ -225,21 +225,21 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 		if (cpumask_any_but(mm_cpumask(mm),
 				smp_processor_id()) < nr_cpu_ids)
 			flush_tlb_others(mm_cpumask(mm), mm, start, end);
-		preempt_enable();
+		put_online_cpus_atomic();
 		return;
 	}
 
 flush_all:
 	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
 		flush_tlb_others(mm_cpumask(mm), mm, 0UL, TLB_FLUSH_ALL);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long start)
 {
 	struct mm_struct *mm = vma->vm_mm;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 
 	if (current->active_mm == mm) {
 		if (current->mm)
@@ -251,7 +251,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long start)
 	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
 		flush_tlb_others(mm_cpumask(mm), mm, start, 0UL);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 static void do_flush_tlb_all(void *info)


^ permalink raw reply related	[flat|nested] 360+ messages in thread

 									\
-	preempt_disable();	/* CPU hotplug */			\
+	get_online_cpus_atomic();	/* CPU hotplug */		\
 	if (cpu_online(cpu)) {						\
 		ret = sprintf(buf, "%lu\n",				\
 			      per_cpu(thermal_state, cpu).event.name);	\
 	} else								\
 		ret = 0;						\
-	preempt_enable();						\
+	put_online_cpus_atomic();					\
 									\
 	return ret;							\
 }
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 13a6b29..2c3ec76 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -147,12 +147,12 @@ void flush_tlb_current_task(void)
 {
 	struct mm_struct *mm = current->mm;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 
 	local_flush_tlb();
 	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
 		flush_tlb_others(mm_cpumask(mm), mm, 0UL, TLB_FLUSH_ALL);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 /*
@@ -187,7 +187,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	unsigned long addr;
 	unsigned act_entries, tlb_entries = 0;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	if (current->active_mm != mm)
 		goto flush_all;
 
@@ -225,21 +225,21 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 		if (cpumask_any_but(mm_cpumask(mm),
 				smp_processor_id()) < nr_cpu_ids)
 			flush_tlb_others(mm_cpumask(mm), mm, start, end);
-		preempt_enable();
+		put_online_cpus_atomic();
 		return;
 	}
 
 flush_all:
 	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
 		flush_tlb_others(mm_cpumask(mm), mm, 0UL, TLB_FLUSH_ALL);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long start)
 {
 	struct mm_struct *mm = vma->vm_mm;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 
 	if (current->active_mm == mm) {
 		if (current->mm)
@@ -251,7 +251,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long start)
 	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
 		flush_tlb_others(mm_cpumask(mm), mm, start, 0UL);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 static void do_flush_tlb_all(void *info)

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 27/46] perf/x86: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
@ 2013-02-18 12:42   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:42 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

The CPU_DYING notifier modifies the per-cpu pointer pmu->box, and this can
race with functions such as uncore_pmu_to_box() and uncore_pci_remove() when
we remove stop_machine() from the CPU offline path. So protect them using
get/put_online_cpus_atomic().
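
As an illustrative sketch (not code from this patch), the read side of the
fix brackets the box lookup with the new APIs so that the CPU_DYING callback
cannot clear pmu->box concurrently; uncore_find_box() below is a hypothetical
stand-in for the list walk done under uncore_box_lock:

/* Sketch only; uncore_find_box() is a made-up helper for the list walk. */
static struct intel_uncore_box *box_lookup(struct intel_uncore_pmu *pmu, int cpu)
{
	struct intel_uncore_box *box;

	get_online_cpus_atomic();	/* hold off CPU offline; safe in atomic context */
	raw_spin_lock(&uncore_box_lock);
	box = uncore_find_box(pmu, cpu);
	raw_spin_unlock(&uncore_box_lock);
	put_online_cpus_atomic();

	return box;
}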

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/x86/kernel/cpu/perf_event_intel_uncore.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index b43200d..6faae53 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -1,3 +1,4 @@
+#include <linux/cpu.h>
 #include "perf_event_intel_uncore.h"
 
 static struct intel_uncore_type *empty_uncore[] = { NULL, };
@@ -1965,6 +1966,7 @@ uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
 	if (box)
 		return box;
 
+	get_online_cpus_atomic();
 	raw_spin_lock(&uncore_box_lock);
 	list_for_each_entry(box, &pmu->box_list, list) {
 		if (box->phys_id == topology_physical_package_id(cpu)) {
@@ -1974,6 +1976,7 @@ uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
 		}
 	}
 	raw_spin_unlock(&uncore_box_lock);
+	put_online_cpus_atomic();
 
 	return *per_cpu_ptr(pmu->box, cpu);
 }
@@ -2556,6 +2559,7 @@ static void uncore_pci_remove(struct pci_dev *pdev)
 	if (WARN_ON_ONCE(phys_id != box->phys_id))
 		return;
 
+	get_online_cpus_atomic();
 	raw_spin_lock(&uncore_box_lock);
 	list_del(&box->list);
 	raw_spin_unlock(&uncore_box_lock);
@@ -2569,6 +2573,7 @@ static void uncore_pci_remove(struct pci_dev *pdev)
 
 	WARN_ON_ONCE(atomic_read(&box->refcnt) != 1);
 	kfree(box);
+	put_online_cpus_atomic();
 }
 
 static int uncore_pci_probe(struct pci_dev *pdev,


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 28/46] KVM: Use get/put_online_cpus_atomic() to prevent CPU offline from atomic context
  2013-02-18 12:38 ` Srivatsa S. Bhat
@ 2013-02-18 12:42   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:42 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs, which can be invoked from atomic
context, to prevent CPUs from going offline.
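
A minimal sketch of the conversion pattern (the function below is
hypothetical, not the actual kvm_main.c code): get_cpu()/put_cpu() only
disable preemption, while the new APIs also hold off CPU offline, so the
target CPU cannot vanish before the IPI is sent:

#include <linux/cpu.h>
#include <linux/smp.h>

static void send_request_to_cpu(int cpu, smp_call_func_t fn)
{
	int me;

	get_online_cpus_atomic();		/* was: me = get_cpu(); */
	me = smp_processor_id();
	if (cpu != me && cpu_online(cpu))
		smp_call_function_single(cpu, fn, NULL, 1);
	put_online_cpus_atomic();		/* was: put_cpu(); */
}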

Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 virt/kvm/kvm_main.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1cd693a..47f9c30 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -174,7 +174,8 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
 
 	zalloc_cpumask_var(&cpus, GFP_ATOMIC);
 
-	me = get_cpu();
+	get_online_cpus_atomic();
+	me = smp_processor_id();
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		kvm_make_request(req, vcpu);
 		cpu = vcpu->cpu;
@@ -192,7 +193,7 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
 		smp_call_function_many(cpus, ack_flush, NULL, 1);
 	else
 		called = false;
-	put_cpu();
+	put_online_cpus_atomic();
 	free_cpumask_var(cpus);
 	return called;
 }
@@ -1621,11 +1622,12 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 		++vcpu->stat.halt_wakeup;
 	}
 
-	me = get_cpu();
+	get_online_cpus_atomic();
+	me = smp_processor_id();
 	if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
 		if (kvm_arch_vcpu_should_kick(vcpu))
 			smp_send_reschedule(cpu);
-	put_cpu();
+	put_online_cpus_atomic();
 }
 #endif /* !CONFIG_S390 */
 


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 29/46] kvm/vmx: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
@ 2013-02-18 12:42   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:42 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs, which can be invoked from atomic
context, in vmx_vcpu_load(), so that the CPU on which the vmcs was last
loaded cannot go offline while the vmcs is being cleared.
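
As a sketch of the underlying pattern (the type and callback names are
hypothetical, not KVM's): the vmcs is owned by the CPU it was last loaded on
and must be cleared there via an IPI, so that CPU has to stay online for the
duration of the call:

struct remote_state {			/* hypothetical */
	int cpu;			/* owning CPU, or -1 if none */
};

static void do_clear(void *info);	/* runs on the owning CPU */

static void clear_remote_state(struct remote_state *s)
{
	get_online_cpus_atomic();	/* keep s->cpu from going offline */
	if (s->cpu != -1)
		smp_call_function_single(s->cpu, do_clear, s, 1);
	put_online_cpus_atomic();
}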

Reported-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Debugged-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/x86/kvm/vmx.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9120ae1..2886ff0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1557,10 +1557,14 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
 
-	if (!vmm_exclusive)
+	if (!vmm_exclusive) {
 		kvm_cpu_vmxon(phys_addr);
-	else if (vmx->loaded_vmcs->cpu != cpu)
+	} else if (vmx->loaded_vmcs->cpu != cpu) {
+		/* Prevent any CPU from going offline */
+		get_online_cpus_atomic();
 		loaded_vmcs_clear(vmx->loaded_vmcs);
+		put_online_cpus_atomic();
+	}
 
 	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
 		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 30/46] x86/xen: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
@ 2013-02-18 12:42   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:42 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs, which can be invoked from atomic
context, to prevent CPUs from going offline.
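
The IPI loops are the common case in this patch. A minimal sketch (the
wrapper name is illustrative; xen_send_IPI_one() is the real helper): the
online mask must not change between computing the target set and actually
sending the IPIs:

static void send_ipi_to_mask(const struct cpumask *mask, int vector)
{
	unsigned int cpu;

	get_online_cpus_atomic();	/* cpu_online_mask is stable here */
	for_each_cpu_and(cpu, mask, cpu_online_mask)
		xen_send_IPI_one(cpu, vector);
	put_online_cpus_atomic();
}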

Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: xen-devel@lists.xensource.com
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/x86/xen/mmu.c |   11 +++++++++--
 arch/x86/xen/smp.c |    9 +++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 01de35c..6a95a15 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -39,6 +39,7 @@
  * Jeremy Fitzhardinge <jeremy@xensource.com>, XenSource Inc, 2007
  */
 #include <linux/sched.h>
+#include <linux/cpu.h>
 #include <linux/highmem.h>
 #include <linux/debugfs.h>
 #include <linux/bug.h>
@@ -1163,9 +1164,13 @@ static void xen_drop_mm_ref(struct mm_struct *mm)
  */
 static void xen_exit_mmap(struct mm_struct *mm)
 {
-	get_cpu();		/* make sure we don't move around */
+	/*
+	 * Make sure we don't move around, and prevent CPUs from going
+	 * offline.
+	 */
+	get_online_cpus_atomic();
 	xen_drop_mm_ref(mm);
-	put_cpu();
+	put_online_cpus_atomic();
 
 	spin_lock(&mm->page_table_lock);
 
@@ -1371,6 +1376,7 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
 	args->op.arg2.vcpumask = to_cpumask(args->mask);
 
 	/* Remove us, and any offline CPUS. */
+	get_online_cpus_atomic();
 	cpumask_and(to_cpumask(args->mask), cpus, cpu_online_mask);
 	cpumask_clear_cpu(smp_processor_id(), to_cpumask(args->mask));
 
@@ -1383,6 +1389,7 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
 	MULTI_mmuext_op(mcs.mc, &args->op, 1, NULL, DOMID_SELF);
 
 	xen_mc_issue(PARAVIRT_LAZY_MMU);
+	put_online_cpus_atomic();
 }
 
 static unsigned long xen_read_cr3(void)
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 34bc4ce..cb82e71 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -16,6 +16,7 @@
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/smp.h>
+#include <linux/cpu.h>
 #include <linux/irq_work.h>
 
 #include <asm/paravirt.h>
@@ -480,8 +481,10 @@ static void __xen_send_IPI_mask(const struct cpumask *mask,
 {
 	unsigned cpu;
 
+	get_online_cpus_atomic();
 	for_each_cpu_and(cpu, mask, cpu_online_mask)
 		xen_send_IPI_one(cpu, vector);
+	put_online_cpus_atomic();
 }
 
 static void xen_smp_send_call_function_ipi(const struct cpumask *mask)
@@ -544,8 +547,10 @@ void xen_send_IPI_all(int vector)
 {
 	int xen_vector = xen_map_vector(vector);
 
+	get_online_cpus_atomic();
 	if (xen_vector >= 0)
 		__xen_send_IPI_mask(cpu_online_mask, xen_vector);
+	put_online_cpus_atomic();
 }
 
 void xen_send_IPI_self(int vector)
@@ -565,20 +570,24 @@ void xen_send_IPI_mask_allbutself(const struct cpumask *mask,
 	if (!(num_online_cpus() > 1))
 		return;
 
+	get_online_cpus_atomic();
 	for_each_cpu_and(cpu, mask, cpu_online_mask) {
 		if (this_cpu == cpu)
 			continue;
 
 		xen_smp_send_call_function_single_ipi(cpu);
 	}
+	put_online_cpus_atomic();
 }
 
 void xen_send_IPI_allbutself(int vector)
 {
 	int xen_vector = xen_map_vector(vector);
 
+	get_online_cpus_atomic();
 	if (xen_vector >= 0)
 		xen_send_IPI_mask_allbutself(cpu_online_mask, xen_vector);
+	put_online_cpus_atomic();
 }
 
 static irqreturn_t xen_call_function_interrupt(int irq, void *dev_id)


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 31/46] alpha/smp: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
@ 2013-02-18 12:42   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:42 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs, which can be invoked from atomic
context, to prevent CPUs from going offline.

Also, remove the stray form-feed character present in this file.
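
A simplified sketch of the conversion in this file (the early-return and
mm->context bookkeeping of the real flush_tlb_mm() are omitted): the
smp_call_function() broadcast must see a stable set of online CPUs:

void flush_tlb_mm(struct mm_struct *mm)		/* simplified sketch */
{
	get_online_cpus_atomic();		/* was: preempt_disable(); */

	if (mm == current->active_mm)
		flush_tlb_current(mm);

	if (smp_call_function(ipi_flush_tlb_mm, mm, 1))
		printk(KERN_CRIT "flush_tlb_mm: timed out\n");

	put_online_cpus_atomic();		/* was: preempt_enable(); */
}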

Cc: linux-alpha@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/alpha/kernel/smp.c |   19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c
index 9603bc2..9213d5d 100644
--- a/arch/alpha/kernel/smp.c
+++ b/arch/alpha/kernel/smp.c
@@ -498,7 +498,6 @@ smp_cpus_done(unsigned int max_cpus)
 	       ((bogosum + 2500) / (5000/HZ)) % 100);
 }
 
-\f
 void
 smp_percpu_timer_interrupt(struct pt_regs *regs)
 {
@@ -682,7 +681,7 @@ ipi_flush_tlb_mm(void *x)
 void
 flush_tlb_mm(struct mm_struct *mm)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 
 	if (mm == current->active_mm) {
 		flush_tlb_current(mm);
@@ -694,7 +693,7 @@ flush_tlb_mm(struct mm_struct *mm)
 				if (mm->context[cpu])
 					mm->context[cpu] = 0;
 			}
-			preempt_enable();
+			put_online_cpus_atomic();
 			return;
 		}
 	}
@@ -703,7 +702,7 @@ flush_tlb_mm(struct mm_struct *mm)
 		printk(KERN_CRIT "flush_tlb_mm: timed out\n");
 	}
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL(flush_tlb_mm);
 
@@ -731,7 +730,7 @@ flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
 	struct flush_tlb_page_struct data;
 	struct mm_struct *mm = vma->vm_mm;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 
 	if (mm == current->active_mm) {
 		flush_tlb_current_page(mm, vma, addr);
@@ -743,7 +742,7 @@ flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
 				if (mm->context[cpu])
 					mm->context[cpu] = 0;
 			}
-			preempt_enable();
+			put_online_cpus_atomic();
 			return;
 		}
 	}
@@ -756,7 +755,7 @@ flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
 		printk(KERN_CRIT "flush_tlb_page: timed out\n");
 	}
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL(flush_tlb_page);
 
@@ -787,7 +786,7 @@ flush_icache_user_range(struct vm_area_struct *vma, struct page *page,
 	if ((vma->vm_flags & VM_EXEC) == 0)
 		return;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 
 	if (mm == current->active_mm) {
 		__load_new_mm_context(mm);
@@ -799,7 +798,7 @@ flush_icache_user_range(struct vm_area_struct *vma, struct page *page,
 				if (mm->context[cpu])
 					mm->context[cpu] = 0;
 			}
-			preempt_enable();
+			put_online_cpus_atomic();
 			return;
 		}
 	}
@@ -808,5 +807,5 @@ flush_icache_user_range(struct vm_area_struct *vma, struct page *page,
 		printk(KERN_CRIT "flush_icache_page: timed out\n");
 	}
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 32/46] blackfin/smp: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:42   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:42 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline
while running in atomic context.
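
As a rough illustration of the conversion pattern (a minimal sketch only; the
function and IPI helper below are hypothetical stand-ins for the arch code
touched by this patch):

  /* Sketch: keep the set of online CPUs stable while snapshotting it */
  static void example_stop_other_cpus(void)
  {
          cpumask_t callmap;

          get_online_cpus_atomic();       /* was: preempt_disable() */

          cpumask_copy(&callmap, cpu_online_mask);
          cpumask_clear_cpu(smp_processor_id(), &callmap);
          if (!cpumask_empty(&callmap))
                  example_send_ipi(&callmap);     /* hypothetical IPI helper */

          put_online_cpus_atomic();       /* was: preempt_enable() */
  }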

Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Bob Liu <lliubbo@gmail.com>
Cc: Steven Miao <realmz6@gmail.com>
Cc: uclinux-dist-devel@blackfin.uclinux.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/blackfin/mach-common/smp.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/blackfin/mach-common/smp.c b/arch/blackfin/mach-common/smp.c
index bb61ae4..6cc6d7a 100644
--- a/arch/blackfin/mach-common/smp.c
+++ b/arch/blackfin/mach-common/smp.c
@@ -194,6 +194,7 @@ void send_ipi(const struct cpumask *cpumask, enum ipi_message_type msg)
 	struct ipi_data *bfin_ipi_data;
 	unsigned long flags;
 
+	get_online_cpus_atomic();
 	local_irq_save(flags);
 	smp_mb();
 	for_each_cpu(cpu, cpumask) {
@@ -205,6 +206,7 @@ void send_ipi(const struct cpumask *cpumask, enum ipi_message_type msg)
 	}
 
 	local_irq_restore(flags);
+	put_online_cpus_atomic();
 }
 
 void arch_send_call_function_single_ipi(int cpu)
@@ -238,13 +240,13 @@ void smp_send_stop(void)
 {
 	cpumask_t callmap;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpumask_copy(&callmap, cpu_online_mask);
 	cpumask_clear_cpu(smp_processor_id(), &callmap);
 	if (!cpumask_empty(&callmap))
 		send_ipi(&callmap, BFIN_IPI_CPU_STOP);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 
 	return;
 }


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 33/46] cris/smp: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:42   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:42 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline
while running in atomic context.
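
As in the hunks below, the read-side lock is taken around the entire section
that samples cpu_online_mask, including the irq-disabled region. A minimal
sketch of that shape (the lock and IPI helper names are hypothetical):

  static DEFINE_SPINLOCK(example_ipi_lock);       /* hypothetical */

  static void example_ipi_online_cpus(const struct cpumask *requested)
  {
          cpumask_t cpu_mask;
          unsigned long flags;

          get_online_cpus_atomic();
          spin_lock_irqsave(&example_ipi_lock, flags);

          /* cpu_online_mask cannot lose CPUs until the matching put */
          cpumask_and(&cpu_mask, requested, cpu_online_mask);
          example_send_ipi(&cpu_mask);            /* hypothetical helper */

          spin_unlock_irqrestore(&example_ipi_lock, flags);
          put_online_cpus_atomic();
  }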

Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: linux-cris-kernel@axis.com
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/cris/arch-v32/kernel/smp.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/cris/arch-v32/kernel/smp.c b/arch/cris/arch-v32/kernel/smp.c
index 04a16ed..644b358 100644
--- a/arch/cris/arch-v32/kernel/smp.c
+++ b/arch/cris/arch-v32/kernel/smp.c
@@ -15,6 +15,7 @@
 #include <linux/sched.h>
 #include <linux/kernel.h>
 #include <linux/cpumask.h>
+#include <linux/cpu.h>
 #include <linux/interrupt.h>
 #include <linux/module.h>
 
@@ -208,9 +209,12 @@ int __cpuinit __cpu_up(unsigned int cpu, struct task_struct *tidle)
 void smp_send_reschedule(int cpu)
 {
 	cpumask_t cpu_mask;
+
+	get_online_cpus_atomic();
 	cpumask_clear(&cpu_mask);
 	cpumask_set_cpu(cpu, &cpu_mask);
 	send_ipi(IPI_SCHEDULE, 0, cpu_mask);
+	put_online_cpus_atomic();
 }
 
 /* TLB flushing
@@ -224,6 +228,7 @@ void flush_tlb_common(struct mm_struct* mm, struct vm_area_struct* vma, unsigned
 	unsigned long flags;
 	cpumask_t cpu_mask;
 
+	get_online_cpus_atomic();
 	spin_lock_irqsave(&tlbstate_lock, flags);
 	cpu_mask = (mm == FLUSH_ALL ? cpu_all_mask : *mm_cpumask(mm));
 	cpumask_clear_cpu(smp_processor_id(), &cpu_mask);
@@ -232,6 +237,7 @@ void flush_tlb_common(struct mm_struct* mm, struct vm_area_struct* vma, unsigned
 	flush_addr = addr;
 	send_ipi(IPI_FLUSH_TLB, 1, cpu_mask);
 	spin_unlock_irqrestore(&tlbstate_lock, flags);
+	put_online_cpus_atomic();
 }
 
 void flush_tlb_all(void)
@@ -312,6 +318,7 @@ int smp_call_function(void (*func)(void *info), void *info, int wait)
 	struct call_data_struct data;
 	int ret;
 
+	get_online_cpus_atomic();
 	cpumask_setall(&cpu_mask);
 	cpumask_clear_cpu(smp_processor_id(), &cpu_mask);
 
@@ -325,6 +332,7 @@ int smp_call_function(void (*func)(void *info), void *info, int wait)
 	call_data = &data;
 	ret = send_ipi(IPI_CALL, wait, cpu_mask);
 	spin_unlock(&call_lock);
+	put_online_cpus_atomic();
 
 	return ret;
 }


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 34/46] hexagon/smp: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:43   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:43 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline
while running in atomic context.

Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: linux-hexagon@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/hexagon/kernel/smp.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/hexagon/kernel/smp.c b/arch/hexagon/kernel/smp.c
index 8e095df..ec87de9 100644
--- a/arch/hexagon/kernel/smp.c
+++ b/arch/hexagon/kernel/smp.c
@@ -112,6 +112,7 @@ void send_ipi(const struct cpumask *cpumask, enum ipi_message_type msg)
 	unsigned long cpu;
 	unsigned long retval;
 
+	get_online_cpus_atomic();
 	local_irq_save(flags);
 
 	for_each_cpu(cpu, cpumask) {
@@ -128,6 +129,7 @@ void send_ipi(const struct cpumask *cpumask, enum ipi_message_type msg)
 	}
 
 	local_irq_restore(flags);
+	put_online_cpus_atomic();
 }
 
 static struct irqaction ipi_intdesc = {
@@ -241,9 +243,12 @@ void smp_send_reschedule(int cpu)
 void smp_send_stop(void)
 {
 	struct cpumask targets;
+
+	get_online_cpus_atomic();
 	cpumask_copy(&targets, cpu_online_mask);
 	cpumask_clear_cpu(smp_processor_id(), &targets);
 	send_ipi(&targets, IPI_CPU_STOP);
+	put_online_cpus_atomic();
 }
 
 void arch_send_call_function_single_ipi(int cpu)


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 35/46] ia64: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:43   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:43 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline
while running in atomic context.
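
One of the hunks below also replaces a get_cpu()/put_cpu() pair; the shape of
that conversion is roughly the following (a sketch only, using a hypothetical
per-CPU flag):

  static DEFINE_PER_CPU(int, example_need_flush); /* hypothetical */

  static void example_mark_remote_flushes(void)
  {
          int cpu, i;

          get_online_cpus_atomic();       /* was: cpu = get_cpu(); */
          cpu = smp_processor_id();       /* safe: preemption/migration prevented */

          for_each_online_cpu(i)          /* the online set is stable here */
                  if (i != cpu)
                          per_cpu(example_need_flush, i) = 1;

          put_online_cpus_atomic();       /* was: put_cpu(); */
  }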

Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: linux-ia64@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/ia64/kernel/irq_ia64.c |   13 +++++++++++++
 arch/ia64/kernel/perfmon.c  |    6 ++++++
 arch/ia64/kernel/smp.c      |   23 ++++++++++++++++-------
 arch/ia64/mm/tlb.c          |    6 ++++--
 4 files changed, 39 insertions(+), 9 deletions(-)

diff --git a/arch/ia64/kernel/irq_ia64.c b/arch/ia64/kernel/irq_ia64.c
index 1034884..d0b4478 100644
--- a/arch/ia64/kernel/irq_ia64.c
+++ b/arch/ia64/kernel/irq_ia64.c
@@ -31,6 +31,7 @@
 #include <linux/ratelimit.h>
 #include <linux/acpi.h>
 #include <linux/sched.h>
+#include <linux/cpu.h>
 
 #include <asm/delay.h>
 #include <asm/intrinsics.h>
@@ -190,9 +191,11 @@ static void clear_irq_vector(int irq)
 {
 	unsigned long flags;
 
+	get_online_cpus_atomic();
 	spin_lock_irqsave(&vector_lock, flags);
 	__clear_irq_vector(irq);
 	spin_unlock_irqrestore(&vector_lock, flags);
+	put_online_cpus_atomic();
 }
 
 int
@@ -204,6 +207,7 @@ ia64_native_assign_irq_vector (int irq)
 
 	vector = -ENOSPC;
 
+	get_online_cpus_atomic();
 	spin_lock_irqsave(&vector_lock, flags);
 	for_each_online_cpu(cpu) {
 		domain = vector_allocation_domain(cpu);
@@ -218,6 +222,7 @@ ia64_native_assign_irq_vector (int irq)
 	BUG_ON(__bind_irq_vector(irq, vector, domain));
  out:
 	spin_unlock_irqrestore(&vector_lock, flags);
+	put_online_cpus_atomic();
 	return vector;
 }
 
@@ -302,9 +307,11 @@ int irq_prepare_move(int irq, int cpu)
 	unsigned long flags;
 	int ret;
 
+	get_online_cpus_atomic();
 	spin_lock_irqsave(&vector_lock, flags);
 	ret = __irq_prepare_move(irq, cpu);
 	spin_unlock_irqrestore(&vector_lock, flags);
+	put_online_cpus_atomic();
 	return ret;
 }
 
@@ -320,11 +327,13 @@ void irq_complete_move(unsigned irq)
 	if (unlikely(cpu_isset(smp_processor_id(), cfg->old_domain)))
 		return;
 
+	get_online_cpus_atomic();
 	cpumask_and(&cleanup_mask, &cfg->old_domain, cpu_online_mask);
 	cfg->move_cleanup_count = cpus_weight(cleanup_mask);
 	for_each_cpu_mask(i, cleanup_mask)
 		platform_send_ipi(i, IA64_IRQ_MOVE_VECTOR, IA64_IPI_DM_INT, 0);
 	cfg->move_in_progress = 0;
+	put_online_cpus_atomic();
 }
 
 static irqreturn_t smp_irq_move_cleanup_interrupt(int irq, void *dev_id)
@@ -409,6 +418,8 @@ int create_irq(void)
 	cpumask_t domain = CPU_MASK_NONE;
 
 	irq = vector = -ENOSPC;
+
+	get_online_cpus_atomic();
 	spin_lock_irqsave(&vector_lock, flags);
 	for_each_online_cpu(cpu) {
 		domain = vector_allocation_domain(cpu);
@@ -424,6 +435,8 @@ int create_irq(void)
 	BUG_ON(__bind_irq_vector(irq, vector, domain));
  out:
 	spin_unlock_irqrestore(&vector_lock, flags);
+	put_online_cpus_atomic();
+
 	if (irq >= 0)
 		dynamic_irq_init(irq);
 	return irq;
diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c
index ea39eba..6c6a029 100644
--- a/arch/ia64/kernel/perfmon.c
+++ b/arch/ia64/kernel/perfmon.c
@@ -34,6 +34,7 @@
 #include <linux/poll.h>
 #include <linux/vfs.h>
 #include <linux/smp.h>
+#include <linux/cpu.h>
 #include <linux/pagemap.h>
 #include <linux/mount.h>
 #include <linux/bitops.h>
@@ -6485,6 +6486,7 @@ pfm_install_alt_pmu_interrupt(pfm_intr_handler_desc_t *hdl)
 	}
 
 	/* reserve our session */
+	get_online_cpus_atomic();
 	for_each_online_cpu(reserve_cpu) {
 		ret = pfm_reserve_session(NULL, 1, reserve_cpu);
 		if (ret) goto cleanup_reserve;
@@ -6500,6 +6502,7 @@ pfm_install_alt_pmu_interrupt(pfm_intr_handler_desc_t *hdl)
 	/* officially change to the alternate interrupt handler */
 	pfm_alt_intr_handler = hdl;
 
+	put_online_cpus_atomic();
 	spin_unlock(&pfm_alt_install_check);
 
 	return 0;
@@ -6512,6 +6515,7 @@ cleanup_reserve:
 		pfm_unreserve_session(NULL, 1, i);
 	}
 
+	put_online_cpus_atomic();
 	spin_unlock(&pfm_alt_install_check);
 
 	return ret;
@@ -6536,6 +6540,7 @@ pfm_remove_alt_pmu_interrupt(pfm_intr_handler_desc_t *hdl)
 
 	pfm_alt_intr_handler = NULL;
 
+	get_online_cpus_atomic();
 	ret = on_each_cpu(pfm_alt_restore_pmu_state, NULL, 1);
 	if (ret) {
 		DPRINT(("on_each_cpu() failed: %d\n", ret));
@@ -6545,6 +6550,7 @@ pfm_remove_alt_pmu_interrupt(pfm_intr_handler_desc_t *hdl)
 		pfm_unreserve_session(NULL, 1, i);
 	}
 
+	put_online_cpus_atomic();
 	spin_unlock(&pfm_alt_install_check);
 
 	return 0;
diff --git a/arch/ia64/kernel/smp.c b/arch/ia64/kernel/smp.c
index 9fcd4e6..d9a4636 100644
--- a/arch/ia64/kernel/smp.c
+++ b/arch/ia64/kernel/smp.c
@@ -24,6 +24,7 @@
 #include <linux/init.h>
 #include <linux/interrupt.h>
 #include <linux/smp.h>
+#include <linux/cpu.h>
 #include <linux/kernel_stat.h>
 #include <linux/mm.h>
 #include <linux/cache.h>
@@ -154,12 +155,15 @@ send_IPI_single (int dest_cpu, int op)
 static inline void
 send_IPI_allbutself (int op)
 {
-	unsigned int i;
+	unsigned int i, cpu;
 
+	get_online_cpus_atomic();
+	cpu = smp_processor_id();
 	for_each_online_cpu(i) {
-		if (i != smp_processor_id())
+		if (i != cpu)
 			send_IPI_single(i, op);
 	}
+	put_online_cpus_atomic();
 }
 
 /*
@@ -170,9 +174,11 @@ send_IPI_mask(const struct cpumask *mask, int op)
 {
 	unsigned int cpu;
 
+	get_online_cpus_atomic();
 	for_each_cpu(cpu, mask) {
 			send_IPI_single(cpu, op);
 	}
+	put_online_cpus_atomic();
 }
 
 /*
@@ -183,9 +189,11 @@ send_IPI_all (int op)
 {
 	int i;
 
+	get_online_cpus_atomic();
 	for_each_online_cpu(i) {
 		send_IPI_single(i, op);
 	}
+	put_online_cpus_atomic();
 }
 
 /*
@@ -259,7 +267,7 @@ smp_flush_tlb_cpumask(cpumask_t xcpumask)
 	cpumask_t cpumask = xcpumask;
 	int mycpu, cpu, flush_mycpu = 0;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	mycpu = smp_processor_id();
 
 	for_each_cpu_mask(cpu, cpumask)
@@ -280,7 +288,7 @@ smp_flush_tlb_cpumask(cpumask_t xcpumask)
 		while(counts[cpu] == (local_tlb_flush_counts[cpu].count & 0xffff))
 			udelay(FLUSH_DELAY);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 void
@@ -293,12 +301,13 @@ void
 smp_flush_tlb_mm (struct mm_struct *mm)
 {
 	cpumask_var_t cpus;
-	preempt_disable();
+
+	get_online_cpus_atomic();
 	/* this happens for the common case of a single-threaded fork():  */
 	if (likely(mm == current->active_mm && atomic_read(&mm->mm_users) == 1))
 	{
 		local_finish_flush_tlb_mm(mm);
-		preempt_enable();
+		put_online_cpus_atomic();
 		return;
 	}
 	if (!alloc_cpumask_var(&cpus, GFP_ATOMIC)) {
@@ -313,7 +322,7 @@ smp_flush_tlb_mm (struct mm_struct *mm)
 	local_irq_disable();
 	local_finish_flush_tlb_mm(mm);
 	local_irq_enable();
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 void arch_send_call_function_single_ipi(int cpu)
diff --git a/arch/ia64/mm/tlb.c b/arch/ia64/mm/tlb.c
index ed61297..8f03b58 100644
--- a/arch/ia64/mm/tlb.c
+++ b/arch/ia64/mm/tlb.c
@@ -20,6 +20,7 @@
 #include <linux/kernel.h>
 #include <linux/sched.h>
 #include <linux/smp.h>
+#include <linux/cpu.h>
 #include <linux/mm.h>
 #include <linux/bootmem.h>
 #include <linux/slab.h>
@@ -87,11 +88,12 @@ wrap_mmu_context (struct mm_struct *mm)
 	 * can't call flush_tlb_all() here because of race condition
 	 * with O(1) scheduler [EF]
 	 */
-	cpu = get_cpu(); /* prevent preemption/migration */
+	get_online_cpus_atomic(); /* prevent preemption/migration */
+	cpu = smp_processor_id();
 	for_each_online_cpu(i)
 		if (i != cpu)
 			per_cpu(ia64_need_tlb_flush, i) = 1;
-	put_cpu();
+	put_online_cpus_atomic();
 	local_flush_tlb_all();
 }
 


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 36/46] m32r: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:43   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:43 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline
while running in atomic context.

Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: linux-m32r@ml.linux-m32r.org
Cc: linux-m32r-ja@ml.linux-m32r.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/m32r/kernel/smp.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/m32r/kernel/smp.c b/arch/m32r/kernel/smp.c
index ce7aea3..0dad4d7 100644
--- a/arch/m32r/kernel/smp.c
+++ b/arch/m32r/kernel/smp.c
@@ -151,7 +151,7 @@ void smp_flush_cache_all(void)
 	cpumask_t cpumask;
 	unsigned long *mask;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpumask_copy(&cpumask, cpu_online_mask);
 	cpumask_clear_cpu(smp_processor_id(), &cpumask);
 	spin_lock(&flushcache_lock);
@@ -162,7 +162,7 @@ void smp_flush_cache_all(void)
 	while (flushcache_cpumask)
 		mb();
 	spin_unlock(&flushcache_lock);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 void smp_flush_cache_all_interrupt(void)
@@ -250,7 +250,7 @@ void smp_flush_tlb_mm(struct mm_struct *mm)
 	unsigned long *mmc;
 	unsigned long flags;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpu_id = smp_processor_id();
 	mmc = &mm->context[cpu_id];
 	cpumask_copy(&cpu_mask, mm_cpumask(mm));
@@ -268,7 +268,7 @@ void smp_flush_tlb_mm(struct mm_struct *mm)
 	if (!cpumask_empty(&cpu_mask))
 		flush_tlb_others(cpu_mask, mm, NULL, FLUSH_ALL);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 /*==========================================================================*
@@ -715,10 +715,12 @@ static void send_IPI_allbutself(int ipi_num, int try)
 {
 	cpumask_t cpumask;
 
+	get_online_cpus_atomic();
 	cpumask_copy(&cpumask, cpu_online_mask);
 	cpumask_clear_cpu(smp_processor_id(), &cpumask);
 
 	send_IPI_mask(&cpumask, ipi_num, try);
+	put_online_cpus_atomic();
 }
 
 /*==========================================================================*
@@ -750,6 +752,7 @@ static void send_IPI_mask(const struct cpumask *cpumask, int ipi_num, int try)
 	if (num_cpus <= 1)	/* NO MP */
 		return;
 
+	get_online_cpus_atomic();
 	cpumask_and(&tmp, cpumask, cpu_online_mask);
 	BUG_ON(!cpumask_equal(cpumask, &tmp));
 
@@ -760,6 +763,7 @@ static void send_IPI_mask(const struct cpumask *cpumask, int ipi_num, int try)
 	}
 
 	send_IPI_mask_phys(&physid_mask, ipi_num, try);
+	put_online_cpus_atomic();
 }
 
 /*==========================================================================*


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 36/46] m32r: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:43   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:43 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline;
these APIs are safe to call from atomic context.

Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: linux-m32r@ml.linux-m32r.org
Cc: linux-m32r-ja@ml.linux-m32r.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/m32r/kernel/smp.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/m32r/kernel/smp.c b/arch/m32r/kernel/smp.c
index ce7aea3..0dad4d7 100644
--- a/arch/m32r/kernel/smp.c
+++ b/arch/m32r/kernel/smp.c
@@ -151,7 +151,7 @@ void smp_flush_cache_all(void)
 	cpumask_t cpumask;
 	unsigned long *mask;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpumask_copy(&cpumask, cpu_online_mask);
 	cpumask_clear_cpu(smp_processor_id(), &cpumask);
 	spin_lock(&flushcache_lock);
@@ -162,7 +162,7 @@ void smp_flush_cache_all(void)
 	while (flushcache_cpumask)
 		mb();
 	spin_unlock(&flushcache_lock);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 void smp_flush_cache_all_interrupt(void)
@@ -250,7 +250,7 @@ void smp_flush_tlb_mm(struct mm_struct *mm)
 	unsigned long *mmc;
 	unsigned long flags;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpu_id = smp_processor_id();
 	mmc = &mm->context[cpu_id];
 	cpumask_copy(&cpu_mask, mm_cpumask(mm));
@@ -268,7 +268,7 @@ void smp_flush_tlb_mm(struct mm_struct *mm)
 	if (!cpumask_empty(&cpu_mask))
 		flush_tlb_others(cpu_mask, mm, NULL, FLUSH_ALL);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 /*==========================================================================*
@@ -715,10 +715,12 @@ static void send_IPI_allbutself(int ipi_num, int try)
 {
 	cpumask_t cpumask;
 
+	get_online_cpus_atomic();
 	cpumask_copy(&cpumask, cpu_online_mask);
 	cpumask_clear_cpu(smp_processor_id(), &cpumask);
 
 	send_IPI_mask(&cpumask, ipi_num, try);
+	put_online_cpus_atomic();
 }
 
 /*==========================================================================*
@@ -750,6 +752,7 @@ static void send_IPI_mask(const struct cpumask *cpumask, int ipi_num, int try)
 	if (num_cpus <= 1)	/* NO MP */
 		return;
 
+	get_online_cpus_atomic();
 	cpumask_and(&tmp, cpumask, cpu_online_mask);
 	BUG_ON(!cpumask_equal(cpumask, &tmp));
 
@@ -760,6 +763,7 @@ static void send_IPI_mask(const struct cpumask *cpumask, int ipi_num, int try)
 	}
 
 	send_IPI_mask_phys(&physid_mask, ipi_num, try);
+	put_online_cpus_atomic();
 }
 
 /*==========================================================================*

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 36/46] m32r: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:43   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:43 UTC (permalink / raw)
  To: linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline;
these APIs are safe to call from atomic context.

Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: linux-m32r at ml.linux-m32r.org
Cc: linux-m32r-ja at ml.linux-m32r.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/m32r/kernel/smp.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/m32r/kernel/smp.c b/arch/m32r/kernel/smp.c
index ce7aea3..0dad4d7 100644
--- a/arch/m32r/kernel/smp.c
+++ b/arch/m32r/kernel/smp.c
@@ -151,7 +151,7 @@ void smp_flush_cache_all(void)
 	cpumask_t cpumask;
 	unsigned long *mask;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpumask_copy(&cpumask, cpu_online_mask);
 	cpumask_clear_cpu(smp_processor_id(), &cpumask);
 	spin_lock(&flushcache_lock);
@@ -162,7 +162,7 @@ void smp_flush_cache_all(void)
 	while (flushcache_cpumask)
 		mb();
 	spin_unlock(&flushcache_lock);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 void smp_flush_cache_all_interrupt(void)
@@ -250,7 +250,7 @@ void smp_flush_tlb_mm(struct mm_struct *mm)
 	unsigned long *mmc;
 	unsigned long flags;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpu_id = smp_processor_id();
 	mmc = &mm->context[cpu_id];
 	cpumask_copy(&cpu_mask, mm_cpumask(mm));
@@ -268,7 +268,7 @@ void smp_flush_tlb_mm(struct mm_struct *mm)
 	if (!cpumask_empty(&cpu_mask))
 		flush_tlb_others(cpu_mask, mm, NULL, FLUSH_ALL);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 /*==========================================================================*
@@ -715,10 +715,12 @@ static void send_IPI_allbutself(int ipi_num, int try)
 {
 	cpumask_t cpumask;
 
+	get_online_cpus_atomic();
 	cpumask_copy(&cpumask, cpu_online_mask);
 	cpumask_clear_cpu(smp_processor_id(), &cpumask);
 
 	send_IPI_mask(&cpumask, ipi_num, try);
+	put_online_cpus_atomic();
 }
 
 /*==========================================================================*
@@ -750,6 +752,7 @@ static void send_IPI_mask(const struct cpumask *cpumask, int ipi_num, int try)
 	if (num_cpus <= 1)	/* NO MP */
 		return;
 
+	get_online_cpus_atomic();
 	cpumask_and(&tmp, cpumask, cpu_online_mask);
 	BUG_ON(!cpumask_equal(cpumask, &tmp));
 
@@ -760,6 +763,7 @@ static void send_IPI_mask(const struct cpumask *cpumask, int ipi_num, int try)
 	}
 
 	send_IPI_mask_phys(&physid_mask, ipi_num, try);
+	put_online_cpus_atomic();
 }
 
 /*==========================================================================*

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 37/46] MIPS: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:43   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:43 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline;
these APIs are safe to call from atomic context.
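
A simplified sketch of the conversion applied to the TLB-flush paths below
(assuming the APIs from this series and the arch helpers named in the diff;
the mm_users fast-path handling of the real code is omitted):

#include <linux/cpu.h>
#include <linux/mm_types.h>

void example_flush_tlb_mm(struct mm_struct *mm)
{
	get_online_cpus_atomic();		/* was: preempt_disable() */

	/* IPI the other online CPUs; none of them can go offline meanwhile */
	smp_on_other_tlbs(flush_tlb_mm_ipi, mm);

	/* then do the local flush */
	local_flush_tlb_mm(mm);

	put_online_cpus_atomic();		/* was: preempt_enable() */
}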

Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/mips/kernel/cevt-smtc.c |    8 ++++++++
 arch/mips/kernel/smp.c       |   16 ++++++++--------
 arch/mips/kernel/smtc.c      |    3 +++
 arch/mips/mm/c-octeon.c      |    4 ++--
 4 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/mips/kernel/cevt-smtc.c b/arch/mips/kernel/cevt-smtc.c
index 2e72d30..6fb311b 100644
--- a/arch/mips/kernel/cevt-smtc.c
+++ b/arch/mips/kernel/cevt-smtc.c
@@ -11,6 +11,7 @@
 #include <linux/interrupt.h>
 #include <linux/percpu.h>
 #include <linux/smp.h>
+#include <linux/cpu.h>
 #include <linux/irq.h>
 
 #include <asm/smtc_ipi.h>
@@ -84,6 +85,8 @@ static int mips_next_event(unsigned long delta,
 	unsigned long nextcomp = 0L;
 	int vpe = current_cpu_data.vpe_id;
 	int cpu = smp_processor_id();
+
+	get_online_cpus_atomic();
 	local_irq_save(flags);
 	mtflags = dmt();
 
@@ -164,6 +167,7 @@ static int mips_next_event(unsigned long delta,
 	}
 	emt(mtflags);
 	local_irq_restore(flags);
+	put_online_cpus_atomic();
 	return 0;
 }
 
@@ -180,6 +184,7 @@ void smtc_distribute_timer(int vpe)
 
 repeat:
 	nextstamp = 0L;
+	get_online_cpus_atomic();
 	for_each_online_cpu(cpu) {
 	    /*
 	     * Find virtual CPUs within the current VPE who have
@@ -221,6 +226,9 @@ repeat:
 
 	    }
 	}
+
+	put_online_cpus_atomic();
+
 	/* Reprogram for interrupt at next soonest timestamp for VPE */
 	if (ISVALID(nextstamp)) {
 		write_c0_compare(nextstamp);
diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index 66bf4e2..3828afa 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -248,12 +248,12 @@ static inline void smp_on_other_tlbs(void (*func) (void *info), void *info)
 
 static inline void smp_on_each_tlb(void (*func) (void *info), void *info)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 
 	smp_on_other_tlbs(func, info);
 	func(info);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 /*
@@ -271,7 +271,7 @@ static inline void smp_on_each_tlb(void (*func) (void *info), void *info)
 
 void flush_tlb_mm(struct mm_struct *mm)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 
 	if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
 		smp_on_other_tlbs(flush_tlb_mm_ipi, mm);
@@ -285,7 +285,7 @@ void flush_tlb_mm(struct mm_struct *mm)
 	}
 	local_flush_tlb_mm(mm);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 struct flush_tlb_data {
@@ -305,7 +305,7 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned l
 {
 	struct mm_struct *mm = vma->vm_mm;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
 		struct flush_tlb_data fd = {
 			.vma = vma,
@@ -323,7 +323,7 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned l
 		}
 	}
 	local_flush_tlb_range(vma, start, end);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 static void flush_tlb_kernel_range_ipi(void *info)
@@ -352,7 +352,7 @@ static void flush_tlb_page_ipi(void *info)
 
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 	if ((atomic_read(&vma->vm_mm->mm_users) != 1) || (current->mm != vma->vm_mm)) {
 		struct flush_tlb_data fd = {
 			.vma = vma,
@@ -369,7 +369,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 		}
 	}
 	local_flush_tlb_page(vma, page);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 static void flush_tlb_one_ipi(void *info)
diff --git a/arch/mips/kernel/smtc.c b/arch/mips/kernel/smtc.c
index 1d47843..caf081e 100644
--- a/arch/mips/kernel/smtc.c
+++ b/arch/mips/kernel/smtc.c
@@ -22,6 +22,7 @@
 #include <linux/sched.h>
 #include <linux/smp.h>
 #include <linux/cpumask.h>
+#include <linux/cpu.h>
 #include <linux/interrupt.h>
 #include <linux/kernel_stat.h>
 #include <linux/module.h>
@@ -1143,6 +1144,7 @@ static irqreturn_t ipi_interrupt(int irq, void *dev_idm)
 	 * for the current TC, so we ought not to have to do it explicitly here.
 	 */
 
+	get_online_cpus_atomic();
 	for_each_online_cpu(cpu) {
 		if (cpu_data[cpu].vpe_id != my_vpe)
 			continue;
@@ -1179,6 +1181,7 @@ static irqreturn_t ipi_interrupt(int irq, void *dev_idm)
 			}
 		}
 	}
+	put_online_cpus_atomic();
 
 	return IRQ_HANDLED;
 }
diff --git a/arch/mips/mm/c-octeon.c b/arch/mips/mm/c-octeon.c
index 6ec04da..cd2c1ce 100644
--- a/arch/mips/mm/c-octeon.c
+++ b/arch/mips/mm/c-octeon.c
@@ -73,7 +73,7 @@ static void octeon_flush_icache_all_cores(struct vm_area_struct *vma)
 	mb();
 	octeon_local_flush_icache();
 #ifdef CONFIG_SMP
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpu = smp_processor_id();
 
 	/*
@@ -88,7 +88,7 @@ static void octeon_flush_icache_all_cores(struct vm_area_struct *vma)
 	for_each_cpu(cpu, &mask)
 		octeon_send_ipi_single(cpu, SMP_ICACHE_FLUSH);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 #endif
 }
 


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 37/46] MIPS: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:43   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:43 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline;
these APIs are safe to call from atomic context.

Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/mips/kernel/cevt-smtc.c |    8 ++++++++
 arch/mips/kernel/smp.c       |   16 ++++++++--------
 arch/mips/kernel/smtc.c      |    3 +++
 arch/mips/mm/c-octeon.c      |    4 ++--
 4 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/mips/kernel/cevt-smtc.c b/arch/mips/kernel/cevt-smtc.c
index 2e72d30..6fb311b 100644
--- a/arch/mips/kernel/cevt-smtc.c
+++ b/arch/mips/kernel/cevt-smtc.c
@@ -11,6 +11,7 @@
 #include <linux/interrupt.h>
 #include <linux/percpu.h>
 #include <linux/smp.h>
+#include <linux/cpu.h>
 #include <linux/irq.h>
 
 #include <asm/smtc_ipi.h>
@@ -84,6 +85,8 @@ static int mips_next_event(unsigned long delta,
 	unsigned long nextcomp = 0L;
 	int vpe = current_cpu_data.vpe_id;
 	int cpu = smp_processor_id();
+
+	get_online_cpus_atomic();
 	local_irq_save(flags);
 	mtflags = dmt();
 
@@ -164,6 +167,7 @@ static int mips_next_event(unsigned long delta,
 	}
 	emt(mtflags);
 	local_irq_restore(flags);
+	put_online_cpus_atomic();
 	return 0;
 }
 
@@ -180,6 +184,7 @@ void smtc_distribute_timer(int vpe)
 
 repeat:
 	nextstamp = 0L;
+	get_online_cpus_atomic();
 	for_each_online_cpu(cpu) {
 	    /*
 	     * Find virtual CPUs within the current VPE who have
@@ -221,6 +226,9 @@ repeat:
 
 	    }
 	}
+
+	put_online_cpus_atomic();
+
 	/* Reprogram for interrupt at next soonest timestamp for VPE */
 	if (ISVALID(nextstamp)) {
 		write_c0_compare(nextstamp);
diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index 66bf4e2..3828afa 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -248,12 +248,12 @@ static inline void smp_on_other_tlbs(void (*func) (void *info), void *info)
 
 static inline void smp_on_each_tlb(void (*func) (void *info), void *info)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 
 	smp_on_other_tlbs(func, info);
 	func(info);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 /*
@@ -271,7 +271,7 @@ static inline void smp_on_each_tlb(void (*func) (void *info), void *info)
 
 void flush_tlb_mm(struct mm_struct *mm)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 
 	if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
 		smp_on_other_tlbs(flush_tlb_mm_ipi, mm);
@@ -285,7 +285,7 @@ void flush_tlb_mm(struct mm_struct *mm)
 	}
 	local_flush_tlb_mm(mm);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 struct flush_tlb_data {
@@ -305,7 +305,7 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned l
 {
 	struct mm_struct *mm = vma->vm_mm;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
 		struct flush_tlb_data fd = {
 			.vma = vma,
@@ -323,7 +323,7 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned l
 		}
 	}
 	local_flush_tlb_range(vma, start, end);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 static void flush_tlb_kernel_range_ipi(void *info)
@@ -352,7 +352,7 @@ static void flush_tlb_page_ipi(void *info)
 
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 	if ((atomic_read(&vma->vm_mm->mm_users) != 1) || (current->mm != vma->vm_mm)) {
 		struct flush_tlb_data fd = {
 			.vma = vma,
@@ -369,7 +369,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 		}
 	}
 	local_flush_tlb_page(vma, page);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 static void flush_tlb_one_ipi(void *info)
diff --git a/arch/mips/kernel/smtc.c b/arch/mips/kernel/smtc.c
index 1d47843..caf081e 100644
--- a/arch/mips/kernel/smtc.c
+++ b/arch/mips/kernel/smtc.c
@@ -22,6 +22,7 @@
 #include <linux/sched.h>
 #include <linux/smp.h>
 #include <linux/cpumask.h>
+#include <linux/cpu.h>
 #include <linux/interrupt.h>
 #include <linux/kernel_stat.h>
 #include <linux/module.h>
@@ -1143,6 +1144,7 @@ static irqreturn_t ipi_interrupt(int irq, void *dev_idm)
 	 * for the current TC, so we ought not to have to do it explicitly here.
 	 */
 
+	get_online_cpus_atomic();
 	for_each_online_cpu(cpu) {
 		if (cpu_data[cpu].vpe_id != my_vpe)
 			continue;
@@ -1179,6 +1181,7 @@ static irqreturn_t ipi_interrupt(int irq, void *dev_idm)
 			}
 		}
 	}
+	put_online_cpus_atomic();
 
 	return IRQ_HANDLED;
 }
diff --git a/arch/mips/mm/c-octeon.c b/arch/mips/mm/c-octeon.c
index 6ec04da..cd2c1ce 100644
--- a/arch/mips/mm/c-octeon.c
+++ b/arch/mips/mm/c-octeon.c
@@ -73,7 +73,7 @@ static void octeon_flush_icache_all_cores(struct vm_area_struct *vma)
 	mb();
 	octeon_local_flush_icache();
 #ifdef CONFIG_SMP
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpu = smp_processor_id();
 
 	/*
@@ -88,7 +88,7 @@ static void octeon_flush_icache_all_cores(struct vm_area_struct *vma)
 	for_each_cpu(cpu, &mask)
 		octeon_send_ipi_single(cpu, SMP_ICACHE_FLUSH);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 #endif
 }
 

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 37/46] MIPS: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:43   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:43 UTC (permalink / raw)
  To: linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline;
these APIs are safe to call from atomic context.

Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <david.daney@cavium.com>
Cc: linux-mips at linux-mips.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/mips/kernel/cevt-smtc.c |    8 ++++++++
 arch/mips/kernel/smp.c       |   16 ++++++++--------
 arch/mips/kernel/smtc.c      |    3 +++
 arch/mips/mm/c-octeon.c      |    4 ++--
 4 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/mips/kernel/cevt-smtc.c b/arch/mips/kernel/cevt-smtc.c
index 2e72d30..6fb311b 100644
--- a/arch/mips/kernel/cevt-smtc.c
+++ b/arch/mips/kernel/cevt-smtc.c
@@ -11,6 +11,7 @@
 #include <linux/interrupt.h>
 #include <linux/percpu.h>
 #include <linux/smp.h>
+#include <linux/cpu.h>
 #include <linux/irq.h>
 
 #include <asm/smtc_ipi.h>
@@ -84,6 +85,8 @@ static int mips_next_event(unsigned long delta,
 	unsigned long nextcomp = 0L;
 	int vpe = current_cpu_data.vpe_id;
 	int cpu = smp_processor_id();
+
+	get_online_cpus_atomic();
 	local_irq_save(flags);
 	mtflags = dmt();
 
@@ -164,6 +167,7 @@ static int mips_next_event(unsigned long delta,
 	}
 	emt(mtflags);
 	local_irq_restore(flags);
+	put_online_cpus_atomic();
 	return 0;
 }
 
@@ -180,6 +184,7 @@ void smtc_distribute_timer(int vpe)
 
 repeat:
 	nextstamp = 0L;
+	get_online_cpus_atomic();
 	for_each_online_cpu(cpu) {
 	    /*
 	     * Find virtual CPUs within the current VPE who have
@@ -221,6 +226,9 @@ repeat:
 
 	    }
 	}
+
+	put_online_cpus_atomic();
+
 	/* Reprogram for interrupt at next soonest timestamp for VPE */
 	if (ISVALID(nextstamp)) {
 		write_c0_compare(nextstamp);
diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index 66bf4e2..3828afa 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -248,12 +248,12 @@ static inline void smp_on_other_tlbs(void (*func) (void *info), void *info)
 
 static inline void smp_on_each_tlb(void (*func) (void *info), void *info)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 
 	smp_on_other_tlbs(func, info);
 	func(info);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 /*
@@ -271,7 +271,7 @@ static inline void smp_on_each_tlb(void (*func) (void *info), void *info)
 
 void flush_tlb_mm(struct mm_struct *mm)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 
 	if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
 		smp_on_other_tlbs(flush_tlb_mm_ipi, mm);
@@ -285,7 +285,7 @@ void flush_tlb_mm(struct mm_struct *mm)
 	}
 	local_flush_tlb_mm(mm);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 struct flush_tlb_data {
@@ -305,7 +305,7 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned l
 {
 	struct mm_struct *mm = vma->vm_mm;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
 		struct flush_tlb_data fd = {
 			.vma = vma,
@@ -323,7 +323,7 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned l
 		}
 	}
 	local_flush_tlb_range(vma, start, end);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 static void flush_tlb_kernel_range_ipi(void *info)
@@ -352,7 +352,7 @@ static void flush_tlb_page_ipi(void *info)
 
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 	if ((atomic_read(&vma->vm_mm->mm_users) != 1) || (current->mm != vma->vm_mm)) {
 		struct flush_tlb_data fd = {
 			.vma = vma,
@@ -369,7 +369,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 		}
 	}
 	local_flush_tlb_page(vma, page);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 static void flush_tlb_one_ipi(void *info)
diff --git a/arch/mips/kernel/smtc.c b/arch/mips/kernel/smtc.c
index 1d47843..caf081e 100644
--- a/arch/mips/kernel/smtc.c
+++ b/arch/mips/kernel/smtc.c
@@ -22,6 +22,7 @@
 #include <linux/sched.h>
 #include <linux/smp.h>
 #include <linux/cpumask.h>
+#include <linux/cpu.h>
 #include <linux/interrupt.h>
 #include <linux/kernel_stat.h>
 #include <linux/module.h>
@@ -1143,6 +1144,7 @@ static irqreturn_t ipi_interrupt(int irq, void *dev_idm)
 	 * for the current TC, so we ought not to have to do it explicitly here.
 	 */
 
+	get_online_cpus_atomic();
 	for_each_online_cpu(cpu) {
 		if (cpu_data[cpu].vpe_id != my_vpe)
 			continue;
@@ -1179,6 +1181,7 @@ static irqreturn_t ipi_interrupt(int irq, void *dev_idm)
 			}
 		}
 	}
+	put_online_cpus_atomic();
 
 	return IRQ_HANDLED;
 }
diff --git a/arch/mips/mm/c-octeon.c b/arch/mips/mm/c-octeon.c
index 6ec04da..cd2c1ce 100644
--- a/arch/mips/mm/c-octeon.c
+++ b/arch/mips/mm/c-octeon.c
@@ -73,7 +73,7 @@ static void octeon_flush_icache_all_cores(struct vm_area_struct *vma)
 	mb();
 	octeon_local_flush_icache();
 #ifdef CONFIG_SMP
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpu = smp_processor_id();
 
 	/*
@@ -88,7 +88,7 @@ static void octeon_flush_icache_all_cores(struct vm_area_struct *vma)
 	for_each_cpu(cpu, &mask)
 		octeon_send_ipi_single(cpu, SMP_ICACHE_FLUSH);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 #endif
 }
 

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 38/46] mn10300: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:43   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:43 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline;
these APIs are safe to call from atomic context.
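
A minimal sketch of the converted pattern (assuming the APIs from this series;
flush_tlb_others() and FLUSH_ALL are the arch helpers shown in the diff, and
the local flush is elided):

#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/mm_types.h>

void example_flush_tlb_mm(struct mm_struct *mm)
{
	cpumask_t cpu_mask;

	get_online_cpus_atomic();		/* was: preempt_disable() */
	cpumask_copy(&cpu_mask, mm_cpumask(mm));
	cpumask_clear_cpu(smp_processor_id(), &cpu_mask);

	/* ... local flush elided ... */
	if (!cpumask_empty(&cpu_mask))
		flush_tlb_others(cpu_mask, mm, FLUSH_ALL);

	put_online_cpus_atomic();		/* was: preempt_enable() */
}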

Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: linux-am33-list@redhat.com
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/mn10300/kernel/smp.c   |    2 ++
 arch/mn10300/mm/cache-smp.c |    5 +++++
 arch/mn10300/mm/tlb-smp.c   |   15 +++++++++------
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/mn10300/kernel/smp.c b/arch/mn10300/kernel/smp.c
index 5d7e152..9dfa172 100644
--- a/arch/mn10300/kernel/smp.c
+++ b/arch/mn10300/kernel/smp.c
@@ -349,9 +349,11 @@ void send_IPI_allbutself(int irq)
 {
 	cpumask_t cpumask;
 
+	get_online_cpus_atomic();
 	cpumask_copy(&cpumask, cpu_online_mask);
 	cpumask_clear_cpu(smp_processor_id(), &cpumask);
 	send_IPI_mask(&cpumask, irq);
+	put_online_cpus_atomic();
 }
 
 void arch_send_call_function_ipi_mask(const struct cpumask *mask)
diff --git a/arch/mn10300/mm/cache-smp.c b/arch/mn10300/mm/cache-smp.c
index 2d23b9e..47ca1c9 100644
--- a/arch/mn10300/mm/cache-smp.c
+++ b/arch/mn10300/mm/cache-smp.c
@@ -13,6 +13,7 @@
 #include <linux/mman.h>
 #include <linux/threads.h>
 #include <linux/interrupt.h>
+#include <linux/cpu.h>
 #include <asm/page.h>
 #include <asm/pgtable.h>
 #include <asm/processor.h>
@@ -94,6 +95,8 @@ void smp_cache_call(unsigned long opr_mask,
 	smp_cache_mask = opr_mask;
 	smp_cache_start = start;
 	smp_cache_end = end;
+
+	get_online_cpus_atomic();
 	cpumask_copy(&smp_cache_ipi_map, cpu_online_mask);
 	cpumask_clear_cpu(smp_processor_id(), &smp_cache_ipi_map);
 
@@ -102,4 +105,6 @@ void smp_cache_call(unsigned long opr_mask,
 	while (!cpumask_empty(&smp_cache_ipi_map))
 		/* nothing. lockup detection does not belong here */
 		mb();
+
+	put_online_cpus_atomic();
 }
diff --git a/arch/mn10300/mm/tlb-smp.c b/arch/mn10300/mm/tlb-smp.c
index 3e57faf..d47304d 100644
--- a/arch/mn10300/mm/tlb-smp.c
+++ b/arch/mn10300/mm/tlb-smp.c
@@ -23,6 +23,7 @@
 #include <linux/sched.h>
 #include <linux/profile.h>
 #include <linux/smp.h>
+#include <linux/cpu.h>
 #include <asm/tlbflush.h>
 #include <asm/bitops.h>
 #include <asm/processor.h>
@@ -105,6 +106,7 @@ static void flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
 	BUG_ON(cpumask_empty(&cpumask));
 	BUG_ON(cpumask_test_cpu(smp_processor_id(), &cpumask));
 
+	get_online_cpus_atomic();
 	cpumask_and(&tmp, &cpumask, cpu_online_mask);
 	BUG_ON(!cpumask_equal(&cpumask, &tmp));
 
@@ -134,6 +136,7 @@ static void flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
 	flush_mm = NULL;
 	flush_va = 0;
 	spin_unlock(&tlbstate_lock);
+	put_online_cpus_atomic();
 }
 
 /**
@@ -144,7 +147,7 @@ void flush_tlb_mm(struct mm_struct *mm)
 {
 	cpumask_t cpu_mask;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpumask_copy(&cpu_mask, mm_cpumask(mm));
 	cpumask_clear_cpu(smp_processor_id(), &cpu_mask);
 
@@ -152,7 +155,7 @@ void flush_tlb_mm(struct mm_struct *mm)
 	if (!cpumask_empty(&cpu_mask))
 		flush_tlb_others(cpu_mask, mm, FLUSH_ALL);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 /**
@@ -163,7 +166,7 @@ void flush_tlb_current_task(void)
 	struct mm_struct *mm = current->mm;
 	cpumask_t cpu_mask;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpumask_copy(&cpu_mask, mm_cpumask(mm));
 	cpumask_clear_cpu(smp_processor_id(), &cpu_mask);
 
@@ -171,7 +174,7 @@ void flush_tlb_current_task(void)
 	if (!cpumask_empty(&cpu_mask))
 		flush_tlb_others(cpu_mask, mm, FLUSH_ALL);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 /**
@@ -184,7 +187,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
 	struct mm_struct *mm = vma->vm_mm;
 	cpumask_t cpu_mask;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpumask_copy(&cpu_mask, mm_cpumask(mm));
 	cpumask_clear_cpu(smp_processor_id(), &cpu_mask);
 
@@ -192,7 +195,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
 	if (!cpumask_empty(&cpu_mask))
 		flush_tlb_others(cpu_mask, mm, va);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 /**


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 38/46] mn10300: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:43   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:43 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline;
these APIs are safe to call from atomic context.

Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: linux-am33-list@redhat.com
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/mn10300/kernel/smp.c   |    2 ++
 arch/mn10300/mm/cache-smp.c |    5 +++++
 arch/mn10300/mm/tlb-smp.c   |   15 +++++++++------
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/mn10300/kernel/smp.c b/arch/mn10300/kernel/smp.c
index 5d7e152..9dfa172 100644
--- a/arch/mn10300/kernel/smp.c
+++ b/arch/mn10300/kernel/smp.c
@@ -349,9 +349,11 @@ void send_IPI_allbutself(int irq)
 {
 	cpumask_t cpumask;
 
+	get_online_cpus_atomic();
 	cpumask_copy(&cpumask, cpu_online_mask);
 	cpumask_clear_cpu(smp_processor_id(), &cpumask);
 	send_IPI_mask(&cpumask, irq);
+	put_online_cpus_atomic();
 }
 
 void arch_send_call_function_ipi_mask(const struct cpumask *mask)
diff --git a/arch/mn10300/mm/cache-smp.c b/arch/mn10300/mm/cache-smp.c
index 2d23b9e..47ca1c9 100644
--- a/arch/mn10300/mm/cache-smp.c
+++ b/arch/mn10300/mm/cache-smp.c
@@ -13,6 +13,7 @@
 #include <linux/mman.h>
 #include <linux/threads.h>
 #include <linux/interrupt.h>
+#include <linux/cpu.h>
 #include <asm/page.h>
 #include <asm/pgtable.h>
 #include <asm/processor.h>
@@ -94,6 +95,8 @@ void smp_cache_call(unsigned long opr_mask,
 	smp_cache_mask = opr_mask;
 	smp_cache_start = start;
 	smp_cache_end = end;
+
+	get_online_cpus_atomic();
 	cpumask_copy(&smp_cache_ipi_map, cpu_online_mask);
 	cpumask_clear_cpu(smp_processor_id(), &smp_cache_ipi_map);
 
@@ -102,4 +105,6 @@ void smp_cache_call(unsigned long opr_mask,
 	while (!cpumask_empty(&smp_cache_ipi_map))
 		/* nothing. lockup detection does not belong here */
 		mb();
+
+	put_online_cpus_atomic();
 }
diff --git a/arch/mn10300/mm/tlb-smp.c b/arch/mn10300/mm/tlb-smp.c
index 3e57faf..d47304d 100644
--- a/arch/mn10300/mm/tlb-smp.c
+++ b/arch/mn10300/mm/tlb-smp.c
@@ -23,6 +23,7 @@
 #include <linux/sched.h>
 #include <linux/profile.h>
 #include <linux/smp.h>
+#include <linux/cpu.h>
 #include <asm/tlbflush.h>
 #include <asm/bitops.h>
 #include <asm/processor.h>
@@ -105,6 +106,7 @@ static void flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
 	BUG_ON(cpumask_empty(&cpumask));
 	BUG_ON(cpumask_test_cpu(smp_processor_id(), &cpumask));
 
+	get_online_cpus_atomic();
 	cpumask_and(&tmp, &cpumask, cpu_online_mask);
 	BUG_ON(!cpumask_equal(&cpumask, &tmp));
 
@@ -134,6 +136,7 @@ static void flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
 	flush_mm = NULL;
 	flush_va = 0;
 	spin_unlock(&tlbstate_lock);
+	put_online_cpus_atomic();
 }
 
 /**
@@ -144,7 +147,7 @@ void flush_tlb_mm(struct mm_struct *mm)
 {
 	cpumask_t cpu_mask;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpumask_copy(&cpu_mask, mm_cpumask(mm));
 	cpumask_clear_cpu(smp_processor_id(), &cpu_mask);
 
@@ -152,7 +155,7 @@ void flush_tlb_mm(struct mm_struct *mm)
 	if (!cpumask_empty(&cpu_mask))
 		flush_tlb_others(cpu_mask, mm, FLUSH_ALL);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 /**
@@ -163,7 +166,7 @@ void flush_tlb_current_task(void)
 	struct mm_struct *mm = current->mm;
 	cpumask_t cpu_mask;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpumask_copy(&cpu_mask, mm_cpumask(mm));
 	cpumask_clear_cpu(smp_processor_id(), &cpu_mask);
 
@@ -171,7 +174,7 @@ void flush_tlb_current_task(void)
 	if (!cpumask_empty(&cpu_mask))
 		flush_tlb_others(cpu_mask, mm, FLUSH_ALL);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 /**
@@ -184,7 +187,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
 	struct mm_struct *mm = vma->vm_mm;
 	cpumask_t cpu_mask;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpumask_copy(&cpu_mask, mm_cpumask(mm));
 	cpumask_clear_cpu(smp_processor_id(), &cpu_mask);
 
@@ -192,7 +195,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
 	if (!cpumask_empty(&cpu_mask))
 		flush_tlb_others(cpu_mask, mm, va);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 /**

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 38/46] mn10300: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:43   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:43 UTC (permalink / raw)
  To: linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline;
these APIs are safe to call from atomic context.

Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: linux-am33-list at redhat.com
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/mn10300/kernel/smp.c   |    2 ++
 arch/mn10300/mm/cache-smp.c |    5 +++++
 arch/mn10300/mm/tlb-smp.c   |   15 +++++++++------
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/mn10300/kernel/smp.c b/arch/mn10300/kernel/smp.c
index 5d7e152..9dfa172 100644
--- a/arch/mn10300/kernel/smp.c
+++ b/arch/mn10300/kernel/smp.c
@@ -349,9 +349,11 @@ void send_IPI_allbutself(int irq)
 {
 	cpumask_t cpumask;
 
+	get_online_cpus_atomic();
 	cpumask_copy(&cpumask, cpu_online_mask);
 	cpumask_clear_cpu(smp_processor_id(), &cpumask);
 	send_IPI_mask(&cpumask, irq);
+	put_online_cpus_atomic();
 }
 
 void arch_send_call_function_ipi_mask(const struct cpumask *mask)
diff --git a/arch/mn10300/mm/cache-smp.c b/arch/mn10300/mm/cache-smp.c
index 2d23b9e..47ca1c9 100644
--- a/arch/mn10300/mm/cache-smp.c
+++ b/arch/mn10300/mm/cache-smp.c
@@ -13,6 +13,7 @@
 #include <linux/mman.h>
 #include <linux/threads.h>
 #include <linux/interrupt.h>
+#include <linux/cpu.h>
 #include <asm/page.h>
 #include <asm/pgtable.h>
 #include <asm/processor.h>
@@ -94,6 +95,8 @@ void smp_cache_call(unsigned long opr_mask,
 	smp_cache_mask = opr_mask;
 	smp_cache_start = start;
 	smp_cache_end = end;
+
+	get_online_cpus_atomic();
 	cpumask_copy(&smp_cache_ipi_map, cpu_online_mask);
 	cpumask_clear_cpu(smp_processor_id(), &smp_cache_ipi_map);
 
@@ -102,4 +105,6 @@ void smp_cache_call(unsigned long opr_mask,
 	while (!cpumask_empty(&smp_cache_ipi_map))
 		/* nothing. lockup detection does not belong here */
 		mb();
+
+	put_online_cpus_atomic();
 }
diff --git a/arch/mn10300/mm/tlb-smp.c b/arch/mn10300/mm/tlb-smp.c
index 3e57faf..d47304d 100644
--- a/arch/mn10300/mm/tlb-smp.c
+++ b/arch/mn10300/mm/tlb-smp.c
@@ -23,6 +23,7 @@
 #include <linux/sched.h>
 #include <linux/profile.h>
 #include <linux/smp.h>
+#include <linux/cpu.h>
 #include <asm/tlbflush.h>
 #include <asm/bitops.h>
 #include <asm/processor.h>
@@ -105,6 +106,7 @@ static void flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
 	BUG_ON(cpumask_empty(&cpumask));
 	BUG_ON(cpumask_test_cpu(smp_processor_id(), &cpumask));
 
+	get_online_cpus_atomic();
 	cpumask_and(&tmp, &cpumask, cpu_online_mask);
 	BUG_ON(!cpumask_equal(&cpumask, &tmp));
 
@@ -134,6 +136,7 @@ static void flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
 	flush_mm = NULL;
 	flush_va = 0;
 	spin_unlock(&tlbstate_lock);
+	put_online_cpus_atomic();
 }
 
 /**
@@ -144,7 +147,7 @@ void flush_tlb_mm(struct mm_struct *mm)
 {
 	cpumask_t cpu_mask;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpumask_copy(&cpu_mask, mm_cpumask(mm));
 	cpumask_clear_cpu(smp_processor_id(), &cpu_mask);
 
@@ -152,7 +155,7 @@ void flush_tlb_mm(struct mm_struct *mm)
 	if (!cpumask_empty(&cpu_mask))
 		flush_tlb_others(cpu_mask, mm, FLUSH_ALL);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 /**
@@ -163,7 +166,7 @@ void flush_tlb_current_task(void)
 	struct mm_struct *mm = current->mm;
 	cpumask_t cpu_mask;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpumask_copy(&cpu_mask, mm_cpumask(mm));
 	cpumask_clear_cpu(smp_processor_id(), &cpu_mask);
 
@@ -171,7 +174,7 @@ void flush_tlb_current_task(void)
 	if (!cpumask_empty(&cpu_mask))
 		flush_tlb_others(cpu_mask, mm, FLUSH_ALL);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 /**
@@ -184,7 +187,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
 	struct mm_struct *mm = vma->vm_mm;
 	cpumask_t cpu_mask;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpumask_copy(&cpu_mask, mm_cpumask(mm));
 	cpumask_clear_cpu(smp_processor_id(), &cpu_mask);
 
@@ -192,7 +195,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
 	if (!cpumask_empty(&cpu_mask))
 		flush_tlb_others(cpu_mask, mm, va);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 /**

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 39/46] parisc: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:43   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:43 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline;
these APIs are safe to call from atomic context.
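
The conversion simply keeps the for_each_online_cpu() walk stable; roughly
(a sketch assuming the APIs from this series and the send_IPI_single() helper
used below):

#include <linux/cpu.h>
#include <linux/smp.h>

static void example_ipi_all_but_self(enum ipi_message_type op)
{
	int cpu;

	get_online_cpus_atomic();	/* cpu_online_mask cannot change under us */
	for_each_online_cpu(cpu) {
		if (cpu != smp_processor_id())
			send_IPI_single(cpu, op);
	}
	put_online_cpus_atomic();
}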

Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: linux-parisc@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/parisc/kernel/smp.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/parisc/kernel/smp.c b/arch/parisc/kernel/smp.c
index 6266730..f7851b7 100644
--- a/arch/parisc/kernel/smp.c
+++ b/arch/parisc/kernel/smp.c
@@ -229,11 +229,13 @@ static inline void
 send_IPI_allbutself(enum ipi_message_type op)
 {
 	int i;
-	
+
+	get_online_cpus_atomic();
 	for_each_online_cpu(i) {
 		if (i != smp_processor_id())
 			send_IPI_single(i, op);
 	}
+	put_online_cpus_atomic();
 }
 
 


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 39/46] parisc: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:43   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:43 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline;
these APIs are safe to call from atomic context.

Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: linux-parisc@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/parisc/kernel/smp.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/parisc/kernel/smp.c b/arch/parisc/kernel/smp.c
index 6266730..f7851b7 100644
--- a/arch/parisc/kernel/smp.c
+++ b/arch/parisc/kernel/smp.c
@@ -229,11 +229,13 @@ static inline void
 send_IPI_allbutself(enum ipi_message_type op)
 {
 	int i;
-	
+
+	get_online_cpus_atomic();
 	for_each_online_cpu(i) {
 		if (i != smp_processor_id())
 			send_IPI_single(i, op);
 	}
+	put_online_cpus_atomic();
 }
 
 

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 39/46] parisc: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:43   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:43 UTC (permalink / raw)
  To: linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline;
these APIs are safe to call from atomic context.

Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: linux-parisc at vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/parisc/kernel/smp.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/parisc/kernel/smp.c b/arch/parisc/kernel/smp.c
index 6266730..f7851b7 100644
--- a/arch/parisc/kernel/smp.c
+++ b/arch/parisc/kernel/smp.c
@@ -229,11 +229,13 @@ static inline void
 send_IPI_allbutself(enum ipi_message_type op)
 {
 	int i;
-	
+
+	get_online_cpus_atomic();
 	for_each_online_cpu(i) {
 		if (i != smp_processor_id())
 			send_IPI_single(i, op);
 	}
+	put_online_cpus_atomic();
 }
 
 

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 40/46] powerpc: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:43   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:43 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline;
these APIs are safe to call from atomic context.
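
The patch nests the new read-side inside the existing context lock; a minimal
sketch of that ordering (assuming the APIs from this series; the context
allocation logic is elided and the lock name here is illustrative):

#include <linux/cpu.h>
#include <linux/spinlock.h>

static DEFINE_RAW_SPINLOCK(example_context_lock);

static void example_switch_context(void)
{
	raw_spin_lock(&example_context_lock);
	get_online_cpus_atomic();	/* nests inside the context lock */

	/* ... pick or steal a context id, possibly scanning other CPUs ... */

	put_online_cpus_atomic();
	raw_spin_unlock(&example_context_lock);
}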

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/powerpc/mm/mmu_context_nohash.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/mm/mmu_context_nohash.c b/arch/powerpc/mm/mmu_context_nohash.c
index e779642..29f58cf 100644
--- a/arch/powerpc/mm/mmu_context_nohash.c
+++ b/arch/powerpc/mm/mmu_context_nohash.c
@@ -196,6 +196,7 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
 
 	/* No lockless fast path .. yet */
 	raw_spin_lock(&context_lock);
+	get_online_cpus_atomic();
 
 	pr_hard("[%d] activating context for mm @%p, active=%d, id=%d",
 		cpu, next, next->context.active, next->context.id);
@@ -279,6 +280,7 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
 	/* Flick the MMU and release lock */
 	pr_hardcont(" -> %d\n", id);
 	set_context(id, next->pgd);
+	put_online_cpus_atomic();
 	raw_spin_unlock(&context_lock);
 }
 


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 40/46] powerpc: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:43   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:43 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline;
these APIs are safe to call from atomic context.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/powerpc/mm/mmu_context_nohash.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/mm/mmu_context_nohash.c b/arch/powerpc/mm/mmu_context_nohash.c
index e779642..29f58cf 100644
--- a/arch/powerpc/mm/mmu_context_nohash.c
+++ b/arch/powerpc/mm/mmu_context_nohash.c
@@ -196,6 +196,7 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
 
 	/* No lockless fast path .. yet */
 	raw_spin_lock(&context_lock);
+	get_online_cpus_atomic();
 
 	pr_hard("[%d] activating context for mm @%p, active=%d, id=%d",
 		cpu, next, next->context.active, next->context.id);
@@ -279,6 +280,7 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
 	/* Flick the MMU and release lock */
 	pr_hardcont(" -> %d\n", id);
 	set_context(id, next->pgd);
+	put_online_cpus_atomic();
 	raw_spin_unlock(&context_lock);
 }
 

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 40/46] powerpc: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:43   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:43 UTC (permalink / raw)
  To: linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline;
these APIs are safe to call from atomic context.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev at lists.ozlabs.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/powerpc/mm/mmu_context_nohash.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/mm/mmu_context_nohash.c b/arch/powerpc/mm/mmu_context_nohash.c
index e779642..29f58cf 100644
--- a/arch/powerpc/mm/mmu_context_nohash.c
+++ b/arch/powerpc/mm/mmu_context_nohash.c
@@ -196,6 +196,7 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
 
 	/* No lockless fast path .. yet */
 	raw_spin_lock(&context_lock);
+	get_online_cpus_atomic();
 
 	pr_hard("[%d] activating context for mm @%p, active=%d, id=%d",
 		cpu, next, next->context.active, next->context.id);
@@ -279,6 +280,7 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
 	/* Flick the MMU and release lock */
 	pr_hardcont(" -> %d\n", id);
 	set_context(id, next->pgd);
+	put_online_cpus_atomic();
 	raw_spin_unlock(&context_lock);
 }
 

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 41/46] sh: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:44   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:44 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline;
these APIs are safe to call from atomic context.
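
A simplified sketch of the converted flush path (assuming the APIs from this
series and the helpers named in the diff; the mm_users and ASID handling of
the real code is omitted):

#include <linux/cpu.h>
#include <linux/mm_types.h>
#include <linux/smp.h>

void example_flush_tlb_mm(struct mm_struct *mm)
{
	get_online_cpus_atomic();	/* was: preempt_disable() */

	/* IPI every other online CPU; offline is held off until we're done */
	smp_call_function(flush_tlb_mm_ipi, (void *)mm, 1);

	local_flush_tlb_mm(mm);

	put_online_cpus_atomic();	/* was: preempt_enable() */
}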

Cc: Paul Mundt <lethal@linux-sh.org>
Cc: linux-sh@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/sh/kernel/smp.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
index 2062aa8..232fabe 100644
--- a/arch/sh/kernel/smp.c
+++ b/arch/sh/kernel/smp.c
@@ -357,7 +357,7 @@ static void flush_tlb_mm_ipi(void *mm)
  */
 void flush_tlb_mm(struct mm_struct *mm)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 
 	if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
 		smp_call_function(flush_tlb_mm_ipi, (void *)mm, 1);
@@ -369,7 +369,7 @@ void flush_tlb_mm(struct mm_struct *mm)
 	}
 	local_flush_tlb_mm(mm);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 struct flush_tlb_data {
@@ -390,7 +390,7 @@ void flush_tlb_range(struct vm_area_struct *vma,
 {
 	struct mm_struct *mm = vma->vm_mm;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
 		struct flush_tlb_data fd;
 
@@ -405,7 +405,7 @@ void flush_tlb_range(struct vm_area_struct *vma,
 				cpu_context(i, mm) = 0;
 	}
 	local_flush_tlb_range(vma, start, end);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 static void flush_tlb_kernel_range_ipi(void *info)
@@ -433,7 +433,7 @@ static void flush_tlb_page_ipi(void *info)
 
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 	if ((atomic_read(&vma->vm_mm->mm_users) != 1) ||
 	    (current->mm != vma->vm_mm)) {
 		struct flush_tlb_data fd;
@@ -448,7 +448,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 				cpu_context(i, vma->vm_mm) = 0;
 	}
 	local_flush_tlb_page(vma, page);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 static void flush_tlb_one_ipi(void *info)


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 41/46] sh: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:44   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:44 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline;
these APIs are safe to call from atomic context.

Cc: Paul Mundt <lethal@linux-sh.org>
Cc: linux-sh@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/sh/kernel/smp.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
index 2062aa8..232fabe 100644
--- a/arch/sh/kernel/smp.c
+++ b/arch/sh/kernel/smp.c
@@ -357,7 +357,7 @@ static void flush_tlb_mm_ipi(void *mm)
  */
 void flush_tlb_mm(struct mm_struct *mm)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 
 	if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
 		smp_call_function(flush_tlb_mm_ipi, (void *)mm, 1);
@@ -369,7 +369,7 @@ void flush_tlb_mm(struct mm_struct *mm)
 	}
 	local_flush_tlb_mm(mm);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 struct flush_tlb_data {
@@ -390,7 +390,7 @@ void flush_tlb_range(struct vm_area_struct *vma,
 {
 	struct mm_struct *mm = vma->vm_mm;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
 		struct flush_tlb_data fd;
 
@@ -405,7 +405,7 @@ void flush_tlb_range(struct vm_area_struct *vma,
 				cpu_context(i, mm) = 0;
 	}
 	local_flush_tlb_range(vma, start, end);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 static void flush_tlb_kernel_range_ipi(void *info)
@@ -433,7 +433,7 @@ static void flush_tlb_page_ipi(void *info)
 
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 	if ((atomic_read(&vma->vm_mm->mm_users) != 1) ||
 	    (current->mm != vma->vm_mm)) {
 		struct flush_tlb_data fd;
@@ -448,7 +448,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 				cpu_context(i, vma->vm_mm) = 0;
 	}
 	local_flush_tlb_page(vma, page);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 static void flush_tlb_one_ipi(void *info)

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 41/46] sh: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:44   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline;
these APIs are safe to call from atomic context.

Cc: Paul Mundt <lethal@linux-sh.org>
Cc: linux-sh at vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/sh/kernel/smp.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
index 2062aa8..232fabe 100644
--- a/arch/sh/kernel/smp.c
+++ b/arch/sh/kernel/smp.c
@@ -357,7 +357,7 @@ static void flush_tlb_mm_ipi(void *mm)
  */
 void flush_tlb_mm(struct mm_struct *mm)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 
 	if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
 		smp_call_function(flush_tlb_mm_ipi, (void *)mm, 1);
@@ -369,7 +369,7 @@ void flush_tlb_mm(struct mm_struct *mm)
 	}
 	local_flush_tlb_mm(mm);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 struct flush_tlb_data {
@@ -390,7 +390,7 @@ void flush_tlb_range(struct vm_area_struct *vma,
 {
 	struct mm_struct *mm = vma->vm_mm;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
 		struct flush_tlb_data fd;
 
@@ -405,7 +405,7 @@ void flush_tlb_range(struct vm_area_struct *vma,
 				cpu_context(i, mm) = 0;
 	}
 	local_flush_tlb_range(vma, start, end);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 static void flush_tlb_kernel_range_ipi(void *info)
@@ -433,7 +433,7 @@ static void flush_tlb_page_ipi(void *info)
 
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 	if ((atomic_read(&vma->vm_mm->mm_users) != 1) ||
 	    (current->mm != vma->vm_mm)) {
 		struct flush_tlb_data fd;
@@ -448,7 +448,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 				cpu_context(i, vma->vm_mm) = 0;
 	}
 	local_flush_tlb_page(vma, page);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 static void flush_tlb_one_ipi(void *info)

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 42/46] sparc: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:44   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:44 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline;
these APIs are safe to call from atomic context.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: sparclinux@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/sparc/kernel/leon_smp.c  |    2 ++
 arch/sparc/kernel/smp_64.c    |    9 +++++----
 arch/sparc/kernel/sun4d_smp.c |    2 ++
 arch/sparc/kernel/sun4m_smp.c |    3 +++
 4 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/kernel/leon_smp.c b/arch/sparc/kernel/leon_smp.c
index 0f3fb6d..441d3ac 100644
--- a/arch/sparc/kernel/leon_smp.c
+++ b/arch/sparc/kernel/leon_smp.c
@@ -420,6 +420,7 @@ static void leon_cross_call(smpfunc_t func, cpumask_t mask, unsigned long arg1,
 		unsigned long flags;
 
 		spin_lock_irqsave(&cross_call_lock, flags);
+		get_online_cpus_atomic();
 
 		{
 			/* If you make changes here, make sure gcc generates proper code... */
@@ -476,6 +477,7 @@ static void leon_cross_call(smpfunc_t func, cpumask_t mask, unsigned long arg1,
 			} while (++i <= high);
 		}
 
+		put_online_cpus_atomic();
 		spin_unlock_irqrestore(&cross_call_lock, flags);
 	}
 }
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index 537eb66..e1d7300 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -894,7 +894,8 @@ void smp_flush_dcache_page_impl(struct page *page, int cpu)
 	atomic_inc(&dcpage_flushes);
 #endif
 
-	this_cpu = get_cpu();
+	get_online_cpus_atomic();
+	this_cpu = smp_processor_id();
 
 	if (cpu == this_cpu) {
 		__local_flush_dcache_page(page);
@@ -920,7 +921,7 @@ void smp_flush_dcache_page_impl(struct page *page, int cpu)
 		}
 	}
 
-	put_cpu();
+	put_online_cpus_atomic();
 }
 
 void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
@@ -931,7 +932,7 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
 	if (tlb_type == hypervisor)
 		return;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 
 #ifdef CONFIG_DEBUG_DCFLUSH
 	atomic_inc(&dcpage_flushes);
@@ -956,7 +957,7 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
 	}
 	__local_flush_dcache_page(page);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 void __irq_entry smp_new_mmu_context_version_client(int irq, struct pt_regs *regs)
diff --git a/arch/sparc/kernel/sun4d_smp.c b/arch/sparc/kernel/sun4d_smp.c
index ddaea31..1fa7ff2 100644
--- a/arch/sparc/kernel/sun4d_smp.c
+++ b/arch/sparc/kernel/sun4d_smp.c
@@ -300,6 +300,7 @@ static void sun4d_cross_call(smpfunc_t func, cpumask_t mask, unsigned long arg1,
 		unsigned long flags;
 
 		spin_lock_irqsave(&cross_call_lock, flags);
+		get_online_cpus_atomic();
 
 		{
 			/*
@@ -356,6 +357,7 @@ static void sun4d_cross_call(smpfunc_t func, cpumask_t mask, unsigned long arg1,
 			} while (++i <= high);
 		}
 
+		put_online_cpus_atomic();
 		spin_unlock_irqrestore(&cross_call_lock, flags);
 	}
 }
diff --git a/arch/sparc/kernel/sun4m_smp.c b/arch/sparc/kernel/sun4m_smp.c
index 128af73..5599548 100644
--- a/arch/sparc/kernel/sun4m_smp.c
+++ b/arch/sparc/kernel/sun4m_smp.c
@@ -192,6 +192,7 @@ static void sun4m_cross_call(smpfunc_t func, cpumask_t mask, unsigned long arg1,
 		unsigned long flags;
 
 		spin_lock_irqsave(&cross_call_lock, flags);
+		get_online_cpus_atomic();
 
 		/* Init function glue. */
 		ccall_info.func = func;
@@ -238,6 +239,8 @@ static void sun4m_cross_call(smpfunc_t func, cpumask_t mask, unsigned long arg1,
 					barrier();
 			} while (++i < ncpus);
 		}
+
+		put_online_cpus_atomic();
 		spin_unlock_irqrestore(&cross_call_lock, flags);
 }
 


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 42/46] sparc: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:44   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:44 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs, which can be called from atomic
context, to prevent CPUs from going offline.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: sparclinux@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/sparc/kernel/leon_smp.c  |    2 ++
 arch/sparc/kernel/smp_64.c    |    9 +++++----
 arch/sparc/kernel/sun4d_smp.c |    2 ++
 arch/sparc/kernel/sun4m_smp.c |    3 +++
 4 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/kernel/leon_smp.c b/arch/sparc/kernel/leon_smp.c
index 0f3fb6d..441d3ac 100644
--- a/arch/sparc/kernel/leon_smp.c
+++ b/arch/sparc/kernel/leon_smp.c
@@ -420,6 +420,7 @@ static void leon_cross_call(smpfunc_t func, cpumask_t mask, unsigned long arg1,
 		unsigned long flags;
 
 		spin_lock_irqsave(&cross_call_lock, flags);
+		get_online_cpus_atomic();
 
 		{
 			/* If you make changes here, make sure gcc generates proper code... */
@@ -476,6 +477,7 @@ static void leon_cross_call(smpfunc_t func, cpumask_t mask, unsigned long arg1,
 			} while (++i <= high);
 		}
 
+		put_online_cpus_atomic();
 		spin_unlock_irqrestore(&cross_call_lock, flags);
 	}
 }
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index 537eb66..e1d7300 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -894,7 +894,8 @@ void smp_flush_dcache_page_impl(struct page *page, int cpu)
 	atomic_inc(&dcpage_flushes);
 #endif
 
-	this_cpu = get_cpu();
+	get_online_cpus_atomic();
+	this_cpu = smp_processor_id();
 
 	if (cpu == this_cpu) {
 		__local_flush_dcache_page(page);
@@ -920,7 +921,7 @@ void smp_flush_dcache_page_impl(struct page *page, int cpu)
 		}
 	}
 
-	put_cpu();
+	put_online_cpus_atomic();
 }
 
 void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
@@ -931,7 +932,7 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
 	if (tlb_type == hypervisor)
 		return;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 
 #ifdef CONFIG_DEBUG_DCFLUSH
 	atomic_inc(&dcpage_flushes);
@@ -956,7 +957,7 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
 	}
 	__local_flush_dcache_page(page);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 void __irq_entry smp_new_mmu_context_version_client(int irq, struct pt_regs *regs)
diff --git a/arch/sparc/kernel/sun4d_smp.c b/arch/sparc/kernel/sun4d_smp.c
index ddaea31..1fa7ff2 100644
--- a/arch/sparc/kernel/sun4d_smp.c
+++ b/arch/sparc/kernel/sun4d_smp.c
@@ -300,6 +300,7 @@ static void sun4d_cross_call(smpfunc_t func, cpumask_t mask, unsigned long arg1,
 		unsigned long flags;
 
 		spin_lock_irqsave(&cross_call_lock, flags);
+		get_online_cpus_atomic();
 
 		{
 			/*
@@ -356,6 +357,7 @@ static void sun4d_cross_call(smpfunc_t func, cpumask_t mask, unsigned long arg1,
 			} while (++i <= high);
 		}
 
+		put_online_cpus_atomic();
 		spin_unlock_irqrestore(&cross_call_lock, flags);
 	}
 }
diff --git a/arch/sparc/kernel/sun4m_smp.c b/arch/sparc/kernel/sun4m_smp.c
index 128af73..5599548 100644
--- a/arch/sparc/kernel/sun4m_smp.c
+++ b/arch/sparc/kernel/sun4m_smp.c
@@ -192,6 +192,7 @@ static void sun4m_cross_call(smpfunc_t func, cpumask_t mask, unsigned long arg1,
 		unsigned long flags;
 
 		spin_lock_irqsave(&cross_call_lock, flags);
+		get_online_cpus_atomic();
 
 		/* Init function glue. */
 		ccall_info.func = func;
@@ -238,6 +239,8 @@ static void sun4m_cross_call(smpfunc_t func, cpumask_t mask, unsigned long arg1,
 					barrier();
 			} while (++i < ncpus);
 		}
+
+		put_online_cpus_atomic();
 		spin_unlock_irqrestore(&cross_call_lock, flags);
 }
 

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 42/46] sparc: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:44   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs, which can be called from atomic
context, to prevent CPUs from going offline.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: sparclinux@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/sparc/kernel/leon_smp.c  |    2 ++
 arch/sparc/kernel/smp_64.c    |    9 +++++----
 arch/sparc/kernel/sun4d_smp.c |    2 ++
 arch/sparc/kernel/sun4m_smp.c |    3 +++
 4 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/kernel/leon_smp.c b/arch/sparc/kernel/leon_smp.c
index 0f3fb6d..441d3ac 100644
--- a/arch/sparc/kernel/leon_smp.c
+++ b/arch/sparc/kernel/leon_smp.c
@@ -420,6 +420,7 @@ static void leon_cross_call(smpfunc_t func, cpumask_t mask, unsigned long arg1,
 		unsigned long flags;
 
 		spin_lock_irqsave(&cross_call_lock, flags);
+		get_online_cpus_atomic();
 
 		{
 			/* If you make changes here, make sure gcc generates proper code... */
@@ -476,6 +477,7 @@ static void leon_cross_call(smpfunc_t func, cpumask_t mask, unsigned long arg1,
 			} while (++i <= high);
 		}
 
+		put_online_cpus_atomic();
 		spin_unlock_irqrestore(&cross_call_lock, flags);
 	}
 }
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index 537eb66..e1d7300 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -894,7 +894,8 @@ void smp_flush_dcache_page_impl(struct page *page, int cpu)
 	atomic_inc(&dcpage_flushes);
 #endif
 
-	this_cpu = get_cpu();
+	get_online_cpus_atomic();
+	this_cpu = smp_processor_id();
 
 	if (cpu == this_cpu) {
 		__local_flush_dcache_page(page);
@@ -920,7 +921,7 @@ void smp_flush_dcache_page_impl(struct page *page, int cpu)
 		}
 	}
 
-	put_cpu();
+	put_online_cpus_atomic();
 }
 
 void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
@@ -931,7 +932,7 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
 	if (tlb_type == hypervisor)
 		return;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 
 #ifdef CONFIG_DEBUG_DCFLUSH
 	atomic_inc(&dcpage_flushes);
@@ -956,7 +957,7 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
 	}
 	__local_flush_dcache_page(page);
 
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 
 void __irq_entry smp_new_mmu_context_version_client(int irq, struct pt_regs *regs)
diff --git a/arch/sparc/kernel/sun4d_smp.c b/arch/sparc/kernel/sun4d_smp.c
index ddaea31..1fa7ff2 100644
--- a/arch/sparc/kernel/sun4d_smp.c
+++ b/arch/sparc/kernel/sun4d_smp.c
@@ -300,6 +300,7 @@ static void sun4d_cross_call(smpfunc_t func, cpumask_t mask, unsigned long arg1,
 		unsigned long flags;
 
 		spin_lock_irqsave(&cross_call_lock, flags);
+		get_online_cpus_atomic();
 
 		{
 			/*
@@ -356,6 +357,7 @@ static void sun4d_cross_call(smpfunc_t func, cpumask_t mask, unsigned long arg1,
 			} while (++i <= high);
 		}
 
+		put_online_cpus_atomic();
 		spin_unlock_irqrestore(&cross_call_lock, flags);
 	}
 }
diff --git a/arch/sparc/kernel/sun4m_smp.c b/arch/sparc/kernel/sun4m_smp.c
index 128af73..5599548 100644
--- a/arch/sparc/kernel/sun4m_smp.c
+++ b/arch/sparc/kernel/sun4m_smp.c
@@ -192,6 +192,7 @@ static void sun4m_cross_call(smpfunc_t func, cpumask_t mask, unsigned long arg1,
 		unsigned long flags;
 
 		spin_lock_irqsave(&cross_call_lock, flags);
+		get_online_cpus_atomic();
 
 		/* Init function glue. */
 		ccall_info.func = func;
@@ -238,6 +239,8 @@ static void sun4m_cross_call(smpfunc_t func, cpumask_t mask, unsigned long arg1,
 					barrier();
 			} while (++i < ncpus);
 		}
+
+		put_online_cpus_atomic();
 		spin_unlock_irqrestore(&cross_call_lock, flags);
 }
 

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 43/46] tile: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:44   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:44 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs, which can be called from atomic
context, to prevent CPUs from going offline.

Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/tile/kernel/smp.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/tile/kernel/smp.c b/arch/tile/kernel/smp.c
index cbc73a8..fb30624 100644
--- a/arch/tile/kernel/smp.c
+++ b/arch/tile/kernel/smp.c
@@ -15,6 +15,7 @@
  */
 
 #include <linux/smp.h>
+#include <linux/cpu.h>
 #include <linux/interrupt.h>
 #include <linux/io.h>
 #include <linux/irq.h>
@@ -82,9 +83,12 @@ void send_IPI_many(const struct cpumask *mask, int tag)
 void send_IPI_allbutself(int tag)
 {
 	struct cpumask mask;
+
+	get_online_cpus_atomic();
 	cpumask_copy(&mask, cpu_online_mask);
 	cpumask_clear_cpu(smp_processor_id(), &mask);
 	send_IPI_many(&mask, tag);
+	put_online_cpus_atomic();
 }
 
 /*


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 43/46] tile: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:44   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:44 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs, which can be called from atomic
context, to prevent CPUs from going offline.

Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/tile/kernel/smp.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/tile/kernel/smp.c b/arch/tile/kernel/smp.c
index cbc73a8..fb30624 100644
--- a/arch/tile/kernel/smp.c
+++ b/arch/tile/kernel/smp.c
@@ -15,6 +15,7 @@
  */
 
 #include <linux/smp.h>
+#include <linux/cpu.h>
 #include <linux/interrupt.h>
 #include <linux/io.h>
 #include <linux/irq.h>
@@ -82,9 +83,12 @@ void send_IPI_many(const struct cpumask *mask, int tag)
 void send_IPI_allbutself(int tag)
 {
 	struct cpumask mask;
+
+	get_online_cpus_atomic();
 	cpumask_copy(&mask, cpu_online_mask);
 	cpumask_clear_cpu(smp_processor_id(), &mask);
 	send_IPI_many(&mask, tag);
+	put_online_cpus_atomic();
 }
 
 /*

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 43/46] tile: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 12:44   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() or local_irq_disable() to prevent CPUs from
going offline from under us.

Use the get/put_online_cpus_atomic() APIs, which can be called from atomic
context, to prevent CPUs from going offline.

Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/tile/kernel/smp.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/tile/kernel/smp.c b/arch/tile/kernel/smp.c
index cbc73a8..fb30624 100644
--- a/arch/tile/kernel/smp.c
+++ b/arch/tile/kernel/smp.c
@@ -15,6 +15,7 @@
  */
 
 #include <linux/smp.h>
+#include <linux/cpu.h>
 #include <linux/interrupt.h>
 #include <linux/io.h>
 #include <linux/irq.h>
@@ -82,9 +83,12 @@ void send_IPI_many(const struct cpumask *mask, int tag)
 void send_IPI_allbutself(int tag)
 {
 	struct cpumask mask;
+
+	get_online_cpus_atomic();
 	cpumask_copy(&mask, cpu_online_mask);
 	cpumask_clear_cpu(smp_processor_id(), &mask);
 	send_IPI_many(&mask, tag);
+	put_online_cpus_atomic();
 }
 
 /*

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 44/46] cpu: No more __stop_machine() in _cpu_down()
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:44   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:44 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

From: Paul E. McKenney <paul.mckenney@linaro.org>

The _cpu_down() function invoked as part of the CPU-hotplug offlining
process currently invokes __stop_machine(), which is slow and inflicts
substantial real-time latencies on the entire system.  This patch
substitutes stop_one_cpu() for __stop_machine() in order to improve
both performance and real-time latency.

There were a number of uses of preempt_disable() or local_irq_disable()
that were intended to block CPU-hotplug offlining. These were fixed by
using get/put_online_cpus_atomic(), which is the new synchronization
primitive for preventing CPU offline from atomic context.
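
For reference, those reader-side fixes all follow the shape sketched below
(an illustration only; the handler and its caller are made-up placeholders,
not functions touched by this patch):

	#include <linux/cpu.h>
	#include <linux/smp.h>

	static void remote_handler(void *info)
	{
		/* Runs on every other online CPU, in interrupt context. */
	}

	static void notify_other_cpus(void *info)
	{
		/* No CPU can go offline between these two calls. */
		get_online_cpus_atomic();
		smp_call_function(remote_handler, info, 1);
		put_online_cpus_atomic();
	}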

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ srivatsa.bhat@linux.vnet.ibm.com: Refer to the new sync primitives for
  readers (in the changelog); s/stop_cpus/stop_one_cpu and fix comment
  referring to stop_machine in the code]
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/cpu.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 58dd1df..652fb53 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -337,7 +337,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	}
 	smpboot_park_threads(cpu);
 
-	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
+	err = stop_one_cpu(cpu, take_cpu_down, &tcd_param);
 	if (err) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
 		smpboot_unpark_threads(cpu);
@@ -349,7 +349,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	/*
 	 * The migration_call() CPU_DYING callback will have removed all
 	 * runnable tasks from the cpu, there's only the idle task left now
-	 * that the migration thread is done doing the stop_machine thing.
+	 * that the migration thread is done doing the stop_one_cpu() thing.
 	 *
 	 * Wait for the stop thread to go away.
 	 */


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 44/46] cpu: No more __stop_machine() in _cpu_down()
@ 2013-02-18 12:44   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:44 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

From: Paul E. McKenney <paul.mckenney@linaro.org>

The _cpu_down() function invoked as part of the CPU-hotplug offlining
process currently invokes __stop_machine(), which is slow and inflicts
substantial real-time latencies on the entire system.  This patch
substitutes stop_one_cpu() for __stop_machine() in order to improve
both performance and real-time latency.

There were a number of uses of preempt_disable() or local_irq_disable()
that were intended to block CPU-hotplug offlining. These were fixed by
using get/put_online_cpus_atomic(), which is the new synchronization
primitive for preventing CPU offline from atomic context.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ srivatsa.bhat@linux.vnet.ibm.com: Refer to the new sync primitives for
  readers (in the changelog); s/stop_cpus/stop_one_cpu and fix comment
  referring to stop_machine in the code]
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/cpu.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 58dd1df..652fb53 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -337,7 +337,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	}
 	smpboot_park_threads(cpu);
 
-	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
+	err = stop_one_cpu(cpu, take_cpu_down, &tcd_param);
 	if (err) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
 		smpboot_unpark_threads(cpu);
@@ -349,7 +349,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	/*
 	 * The migration_call() CPU_DYING callback will have removed all
 	 * runnable tasks from the cpu, there's only the idle task left now
-	 * that the migration thread is done doing the stop_machine thing.
+	 * that the migration thread is done doing the stop_one_cpu() thing.
 	 *
 	 * Wait for the stop thread to go away.
 	 */

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 44/46] cpu: No more __stop_machine() in _cpu_down()
@ 2013-02-18 12:44   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

From: Paul E. McKenney <paul.mckenney@linaro.org>

The _cpu_down() function invoked as part of the CPU-hotplug offlining
process currently invokes __stop_machine(), which is slow and inflicts
substantial real-time latencies on the entire system.  This patch
substitutes stop_one_cpu() for __stop_machine() in order to improve
both performance and real-time latency.

There were a number of uses of preempt_disable() or local_irq_disable()
that were intended to block CPU-hotplug offlining. These were fixed by
using get/put_online_cpus_atomic(), which is the new synchronization
primitive for preventing CPU offline from atomic context.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ srivatsa.bhat@linux.vnet.ibm.com: Refer to the new sync primitives for
  readers (in the changelog); s/stop_cpus/stop_one_cpu and fix comment
  referring to stop_machine in the code]
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/cpu.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 58dd1df..652fb53 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -337,7 +337,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	}
 	smpboot_park_threads(cpu);
 
-	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
+	err = stop_one_cpu(cpu, take_cpu_down, &tcd_param);
 	if (err) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
 		smpboot_unpark_threads(cpu);
@@ -349,7 +349,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	/*
 	 * The migration_call() CPU_DYING callback will have removed all
 	 * runnable tasks from the cpu, there's only the idle task left now
-	 * that the migration thread is done doing the stop_machine thing.
+	 * that the migration thread is done doing the stop_one_cpu() thing.
 	 *
 	 * Wait for the stop thread to go away.
 	 */

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 45/46] CPU hotplug, stop_machine: Decouple CPU hotplug from stop_machine() in Kconfig
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:44   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:44 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Simply dropping HOTPLUG_CPU from the dependency list of STOP_MACHINE will
lead to the following kconfig issue, reported by Fengguang Wu:

"warning: (HAVE_TEXT_POKE_SMP) selects STOP_MACHINE which has unmet direct
dependencies (SMP && MODULE_UNLOAD)"

So drop HOTPLUG_CPU and add HAVE_TEXT_POKE_SMP to the dependency list of
STOP_MACHINE.

And while at it, also clean up a comment that refers to CPU hotplug being
dependent on stop_machine().

Cc: David Howells <dhowells@redhat.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Suggested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 include/linux/stop_machine.h |    2 +-
 init/Kconfig                 |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 3b5e910..ce2d3c4 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -120,7 +120,7 @@ int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
  * @cpus: the cpus to run the @fn() on (NULL = any online cpu)
  *
  * Description: This is a special version of the above, which assumes cpus
- * won't come or go while it's being called.  Used by hotplug cpu.
+ * won't come or go while it's being called.
  */
 int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
 
diff --git a/init/Kconfig b/init/Kconfig
index be8b7f5..f4f79af 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1711,7 +1711,7 @@ config INIT_ALL_POSSIBLE
 config STOP_MACHINE
 	bool
 	default y
-	depends on (SMP && MODULE_UNLOAD) || HOTPLUG_CPU
+	depends on (SMP && (MODULE_UNLOAD || HAVE_TEXT_POKE_SMP))
 	help
 	  Need stop_machine() primitive.
 


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 45/46] CPU hotplug, stop_machine: Decouple CPU hotplug from stop_machine() in Kconfig
@ 2013-02-18 12:44   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:44 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

Simply dropping HOTPLUG_CPU from the dependency list of STOP_MACHINE will
lead to the following kconfig issue, reported by Fengguang Wu:

"warning: (HAVE_TEXT_POKE_SMP) selects STOP_MACHINE which has unmet direct
dependencies (SMP && MODULE_UNLOAD)"

So drop HOTPLUG_CPU and add HAVE_TEXT_POKE_SMP to the dependency list of
STOP_MACHINE.

And while at it, also clean up a comment that refers to CPU hotplug being
dependent on stop_machine().

Cc: David Howells <dhowells@redhat.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Suggested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 include/linux/stop_machine.h |    2 +-
 init/Kconfig                 |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 3b5e910..ce2d3c4 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -120,7 +120,7 @@ int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
  * @cpus: the cpus to run the @fn() on (NULL = any online cpu)
  *
  * Description: This is a special version of the above, which assumes cpus
- * won't come or go while it's being called.  Used by hotplug cpu.
+ * won't come or go while it's being called.
  */
 int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
 
diff --git a/init/Kconfig b/init/Kconfig
index be8b7f5..f4f79af 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1711,7 +1711,7 @@ config INIT_ALL_POSSIBLE
 config STOP_MACHINE
 	bool
 	default y
-	depends on (SMP && MODULE_UNLOAD) || HOTPLUG_CPU
+	depends on (SMP && (MODULE_UNLOAD || HAVE_TEXT_POKE_SMP))
 	help
 	  Need stop_machine() primitive.
 

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 45/46] CPU hotplug, stop_machine: Decouple CPU hotplug from stop_machine() in Kconfig
@ 2013-02-18 12:44   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

Simply dropping HOTPLUG_CPU from the dependency list of STOP_MACHINE will
lead to the following kconfig issue, reported by Fengguang Wu:

"warning: (HAVE_TEXT_POKE_SMP) selects STOP_MACHINE which has unmet direct
dependencies (SMP && MODULE_UNLOAD)"

So drop HOTPLUG_CPU and add HAVE_TEXT_POKE_SMP to the dependency list of
STOP_MACHINE.

And while at it, also clean up a comment that refers to CPU hotplug being
dependent on stop_machine().

Cc: David Howells <dhowells@redhat.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Suggested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 include/linux/stop_machine.h |    2 +-
 init/Kconfig                 |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 3b5e910..ce2d3c4 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -120,7 +120,7 @@ int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
  * @cpus: the cpus to run the @fn() on (NULL = any online cpu)
  *
  * Description: This is a special version of the above, which assumes cpus
- * won't come or go while it's being called.  Used by hotplug cpu.
+ * won't come or go while it's being called.
  */
 int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
 
diff --git a/init/Kconfig b/init/Kconfig
index be8b7f5..f4f79af 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1711,7 +1711,7 @@ config INIT_ALL_POSSIBLE
 config STOP_MACHINE
 	bool
 	default y
-	depends on (SMP && MODULE_UNLOAD) || HOTPLUG_CPU
+	depends on (SMP && (MODULE_UNLOAD || HAVE_TEXT_POKE_SMP))
 	help
 	  Need stop_machine() primitive.
 

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 46/46] Documentation/cpu-hotplug: Remove references to stop_machine()
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 12:44   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:44 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

Since stop_machine() is no longer used in the CPU offline path, we cannot
disable CPU hotplug using preempt_disable()/local_irq_disable() etc. We
need to use the newly introduced get/put_online_cpus_atomic() APIs.
Reflect this in the documentation.

Cc: Rob Landley <rob@landley.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 Documentation/cpu-hotplug.txt |   17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
index 9f40135..7f907ec 100644
--- a/Documentation/cpu-hotplug.txt
+++ b/Documentation/cpu-hotplug.txt
@@ -113,13 +113,15 @@ Never use anything other than cpumask_t to represent bitmap of CPUs.
 	#include <linux/cpu.h>
 	get_online_cpus() and put_online_cpus():
 
-The above calls are used to inhibit cpu hotplug operations. While the
+The above calls are used to inhibit cpu hotplug operations, when invoked from
+non-atomic context (because the above functions can sleep). While the
 cpu_hotplug.refcount is non zero, the cpu_online_mask will not change.
-If you merely need to avoid cpus going away, you could also use
-preempt_disable() and preempt_enable() for those sections.
-Just remember the critical section cannot call any
-function that can sleep or schedule this process away. The preempt_disable()
-will work as long as stop_machine_run() is used to take a cpu down.
+
+However, if you are executing in atomic context (ie., you can't afford to
+sleep), and you merely need to avoid cpus going offline, you can use
+get_online_cpus_atomic() and put_online_cpus_atomic() for those sections.
+Just remember the critical section cannot call any function that can sleep or
+schedule this process away.
 
 CPU Hotplug - Frequently Asked Questions.
 
@@ -360,6 +362,9 @@ A: There are two ways.  If your code can be run in interrupt context, use
 		return err;
 	}
 
+   If my_func_on_cpu() itself cannot block, use get/put_online_cpus_atomic()
+   instead of get/put_online_cpus() to prevent CPUs from going offline.
+
 Q: How do we determine how many CPUs are available for hotplug.
 A: There is no clear spec defined way from ACPI that can give us that
    information today. Based on some input from Natalie of Unisys,
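
For completeness, the atomic-context variant that the updated text points to
looks roughly like the sketch below (my_func_on_cpu() merely stands in for
the example already used in cpu-hotplug.txt, and my_ipi_handler() is an
invented placeholder):

	#include <linux/cpu.h>
	#include <linux/smp.h>

	static void my_ipi_handler(void *info)
	{
		/* Executes on the target CPU; must not sleep. */
	}

	static int my_func_on_cpu(int cpu)
	{
		int err = 0;

		get_online_cpus_atomic();
		if (!cpu_online(cpu))
			err = -EINVAL;
		else
			err = smp_call_function_single(cpu, my_ipi_handler,
						       NULL, 1);
		put_online_cpus_atomic();

		return err;
	}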


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 46/46] Documentation/cpu-hotplug: Remove references to stop_machine()
@ 2013-02-18 12:44   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:44 UTC (permalink / raw)
  To: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung
  Cc: linux-arch, linux, nikunj, linux-pm, fweisbec, linux-doc,
	linux-kernel, rostedt, xiaoguangrong, rjw, sbw, wangyun,
	srivatsa.bhat, netdev, vincent.guittot, walken, linuxppc-dev,
	linux-arm-kernel

Since stop_machine() is no longer used in the CPU offline path, we cannot
disable CPU hotplug using preempt_disable()/local_irq_disable() etc. We
need to use the newly introduced get/put_online_cpus_atomic() APIs.
Reflect this in the documentation.

Cc: Rob Landley <rob@landley.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 Documentation/cpu-hotplug.txt |   17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
index 9f40135..7f907ec 100644
--- a/Documentation/cpu-hotplug.txt
+++ b/Documentation/cpu-hotplug.txt
@@ -113,13 +113,15 @@ Never use anything other than cpumask_t to represent bitmap of CPUs.
 	#include <linux/cpu.h>
 	get_online_cpus() and put_online_cpus():
 
-The above calls are used to inhibit cpu hotplug operations. While the
+The above calls are used to inhibit cpu hotplug operations, when invoked from
+non-atomic context (because the above functions can sleep). While the
 cpu_hotplug.refcount is non zero, the cpu_online_mask will not change.
-If you merely need to avoid cpus going away, you could also use
-preempt_disable() and preempt_enable() for those sections.
-Just remember the critical section cannot call any
-function that can sleep or schedule this process away. The preempt_disable()
-will work as long as stop_machine_run() is used to take a cpu down.
+
+However, if you are executing in atomic context (ie., you can't afford to
+sleep), and you merely need to avoid cpus going offline, you can use
+get_online_cpus_atomic() and put_online_cpus_atomic() for those sections.
+Just remember the critical section cannot call any function that can sleep or
+schedule this process away.
 
 CPU Hotplug - Frequently Asked Questions.
 
@@ -360,6 +362,9 @@ A: There are two ways.  If your code can be run in interrupt context, use
 		return err;
 	}
 
+   If my_func_on_cpu() itself cannot block, use get/put_online_cpus_atomic()
+   instead of get/put_online_cpus() to prevent CPUs from going offline.
+
 Q: How do we determine how many CPUs are available for hotplug.
 A: There is no clear spec defined way from ACPI that can give us that
    information today. Based on some input from Natalie of Unisys,

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* [PATCH v6 46/46] Documentation/cpu-hotplug: Remove references to stop_machine()
@ 2013-02-18 12:44   ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

Since stop_machine() is no longer used in the CPU offline path, we cannot
disable CPU hotplug using preempt_disable()/local_irq_disable() etc. We
need to use the newly introduced get/put_online_cpus_atomic() APIs.
Reflect this in the documentation.

Cc: Rob Landley <rob@landley.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 Documentation/cpu-hotplug.txt |   17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
index 9f40135..7f907ec 100644
--- a/Documentation/cpu-hotplug.txt
+++ b/Documentation/cpu-hotplug.txt
@@ -113,13 +113,15 @@ Never use anything other than cpumask_t to represent bitmap of CPUs.
 	#include <linux/cpu.h>
 	get_online_cpus() and put_online_cpus():
 
-The above calls are used to inhibit cpu hotplug operations. While the
+The above calls are used to inhibit cpu hotplug operations, when invoked from
+non-atomic context (because the above functions can sleep). While the
 cpu_hotplug.refcount is non zero, the cpu_online_mask will not change.
-If you merely need to avoid cpus going away, you could also use
-preempt_disable() and preempt_enable() for those sections.
-Just remember the critical section cannot call any
-function that can sleep or schedule this process away. The preempt_disable()
-will work as long as stop_machine_run() is used to take a cpu down.
+
+However, if you are executing in atomic context (ie., you can't afford to
+sleep), and you merely need to avoid cpus going offline, you can use
+get_online_cpus_atomic() and put_online_cpus_atomic() for those sections.
+Just remember the critical section cannot call any function that can sleep or
+schedule this process away.
 
 CPU Hotplug - Frequently Asked Questions.
 
@@ -360,6 +362,9 @@ A: There are two ways.  If your code can be run in interrupt context, use
 		return err;
 	}
 
+   If my_func_on_cpu() itself cannot block, use get/put_online_cpus_atomic()
+   instead of get/put_online_cpus() to prevent CPUs from going offline.
+
 Q: How do we determine how many CPUs are available for hotplug.
 A: There is no clear spec defined way from ACPI that can give us that
    information today. Based on some input from Natalie of Unisys,

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 33/46] cris/smp: Use get/put_online_cpus_atomic() to prevent CPU offline
  2013-02-18 12:42   ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 13:07     ` Jesper Nilsson
  -1 siblings, 0 replies; 360+ messages in thread
From: Jesper Nilsson @ 2013-02-18 13:07 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung,
	rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, linux-pm, linux-arch, linux-arm-kernel, linuxppc-dev,
	netdev, linux-doc, linux-kernel, walken, vincent.guittot

On Mon, Feb 18, 2013 at 06:12:54PM +0530, Srivatsa S. Bhat wrote:
> Once stop_machine() is gone from the CPU offline path, we won't be able to
> depend on preempt_disable() or local_irq_disable() to prevent CPUs from
> going offline from under us.
> 
> Use the get/put_online_cpus_atomic() APIs, which can be called from atomic
> context, to prevent CPUs from going offline.
> 
> Cc: Mikael Starvik <starvik@axis.com>

> Cc: linux-cris-kernel@axis.com
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>

/^JN - Jesper Nilsson
-- 
               Jesper Nilsson -- jesper.nilsson@axis.com

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 33/46] cris/smp: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 13:07     ` Jesper Nilsson
  0 siblings, 0 replies; 360+ messages in thread
From: Jesper Nilsson @ 2013-02-18 13:07 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-doc, peterz, fweisbec, linux-kernel, walken, mingo,
	linux-arch, linux, xiaoguangrong, wangyun, paulmck, nikunj,
	linux-pm, rusty, rostedt, rjw, namhyung, tglx, linux-arm-kernel,
	netdev, oleg, vincent.guittot, sbw, tj, akpm, linuxppc-dev

On Mon, Feb 18, 2013 at 06:12:54PM +0530, Srivatsa S. Bhat wrote:
> Once stop_machine() is gone from the CPU offline path, we won't be able to
> depend on preempt_disable() or local_irq_disable() to prevent CPUs from
> going offline from under us.
> 
> Use the get/put_online_cpus_atomic() APIs, which can be called from atomic
> context, to prevent CPUs from going offline.
> 
> Cc: Mikael Starvik <starvik@axis.com>

> Cc: linux-cris-kernel@axis.com
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>

/^JN - Jesper Nilsson
-- 
               Jesper Nilsson -- jesper.nilsson@axis.com

^ permalink raw reply	[flat|nested] 360+ messages in thread

* [PATCH v6 33/46] cris/smp: Use get/put_online_cpus_atomic() to prevent CPU offline
@ 2013-02-18 13:07     ` Jesper Nilsson
  0 siblings, 0 replies; 360+ messages in thread
From: Jesper Nilsson @ 2013-02-18 13:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 18, 2013 at 06:12:54PM +0530, Srivatsa S. Bhat wrote:
> Once stop_machine() is gone from the CPU offline path, we won't be able to
> depend on preempt_disable() or local_irq_disable() to prevent CPUs from
> going offline from under us.
> 
> Use the get/put_online_cpus_atomic() APIs, which can be called from atomic
> context, to prevent CPUs from going offline.
> 
> Cc: Mikael Starvik <starvik@axis.com>

> Cc: linux-cris-kernel@axis.com
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>

/^JN - Jesper Nilsson
-- 
               Jesper Nilsson -- jesper.nilsson@axis.com

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-18 12:38   ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 15:45     ` Michel Lespinasse
  -1 siblings, 0 replies; 360+ messages in thread
From: Michel Lespinasse @ 2013-02-18 15:45 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung,
	rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, linux-pm, linux-arch, linux-arm-kernel, linuxppc-dev,
	netdev, linux-doc, linux-kernel, vincent.guittot

Hi Srivatsa,

I admit not having followed in detail the threads about the previous
iteration, so some of my comments may have been discussed already
before - apologies if that is the case.

On Mon, Feb 18, 2013 at 8:38 PM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> Reader-writer locks and per-cpu counters are recursive, so they can be
> used in a nested fashion in the reader-path, which makes per-CPU rwlocks also
> recursive. Also, this design of switching the synchronization scheme ensures
> that you can safely nest and use these locks in a very flexible manner.

I like the general idea of switching between per-cpu and global
rwlocks as needed; however I dislike unfair locks, and especially
unfair recursive rwlocks.

If you look at rwlock_t, the main reason we haven't been able to
implement reader/writer fairness there is because tasklist_lock makes
use of the recursive nature of the rwlock_t read side. I'm worried
about introducing more lock usages that would make use of the same
property for your proposed lock.

I am fine with your proposal not implementing reader/writer fairness
from day 1, but I am worried about your proposal having a recursive
reader side. Or, to put it another way: if your proposal didn't have a
recursive reader side, and rwlock_t could somehow be changed to
implement reader/writer fairness, then this property could
automatically propagate into your proposed rwlock; but if anyone makes
use of the recursive nature of your proposal then implementing
reader/writer fairness later won't be as easy.

I see that the very next change in this series is talking about
acquiring the read side from interrupts, so it does look like you're
planning to make use of the recursive nature of the read side. I kinda
wish you didn't, as this is exactly replicating the design of
tasklist_lock which is IMO problematic. Your prior proposal of
disabling interrupts during the read side had other disadvantages, but
I think it was nice that it didn't rely on having a recursive read
side.

> +#define reader_yet_to_switch(pcpu_rwlock, cpu)                             \
> +       (ACCESS_ONCE(per_cpu_ptr((pcpu_rwlock)->rw_state, cpu)->reader_refcnt))
> +
> +#define reader_percpu_nesting_depth(pcpu_rwlock)                 \
> +       (__this_cpu_read((pcpu_rwlock)->rw_state->reader_refcnt))
> +
> +#define reader_uses_percpu_refcnt(pcpu_rwlock)                         \
> +                               reader_percpu_nesting_depth(pcpu_rwlock)
> +
> +#define reader_nested_percpu(pcpu_rwlock)                              \
> +                       (reader_percpu_nesting_depth(pcpu_rwlock) > 1)
> +
> +#define writer_active(pcpu_rwlock)                                     \
> +       (__this_cpu_read((pcpu_rwlock)->rw_state->writer_signal))

I'm personally not a fan of such one-line shorthand functions - I
think they tend to make the code harder to read instead of easier, as
one constantly has to refer to them to understand what's actually
going on.

>  void percpu_write_lock(struct percpu_rwlock *pcpu_rwlock)
>  {
> +       unsigned int cpu;
> +
> +       /*
> +        * Tell all readers that a writer is becoming active, so that they
> +        * start switching over to the global rwlock.
> +        */
> +       for_each_possible_cpu(cpu)
> +               per_cpu_ptr(pcpu_rwlock->rw_state, cpu)->writer_signal = true;

I don't see anything preventing a race with the corresponding code in
percpu_write_unlock() that sets writer_signal back to false. Did I
miss something here? It seems to me we don't have any guarantee that
all writer signals will be set to true at the end of the loop...

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
@ 2013-02-18 15:45     ` Michel Lespinasse
  0 siblings, 0 replies; 360+ messages in thread
From: Michel Lespinasse @ 2013-02-18 15:45 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-doc, peterz, fweisbec, linux-kernel, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, namhyung, tglx, linux-arm-kernel, netdev, oleg,
	vincent.guittot, sbw, tj, akpm, linuxppc-dev

Hi Srivatsa,

I admit not having followed in detail the threads about the previous
iteration, so some of my comments may have been discussed already
before - apologies if that is the case.

On Mon, Feb 18, 2013 at 8:38 PM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> Reader-writer locks and per-cpu counters are recursive, so they can be
> used in a nested fashion in the reader-path, which makes per-CPU rwlocks also
> recursive. Also, this design of switching the synchronization scheme ensures
> that you can safely nest and use these locks in a very flexible manner.

I like the general idea of switching between per-cpu and global
rwlocks as needed; however I dislike unfair locks, and especially
unfair recursive rwlocks.

If you look at rwlock_t, the main reason we haven't been able to
implement reader/writer fairness there is because tasklist_lock makes
use of the recursive nature of the rwlock_t read side. I'm worried
about introducing more lock usages that would make use of the same
property for your proposed lock.

I am fine with your proposal not implementing reader/writer fairness
from day 1, but I am worried about your proposal having a recursive
reader side. Or, to put it another way: if your proposal didn't have a
recursive reader side, and rwlock_t could somehow be changed to
implement reader/writer fairness, then this property could
automatically propagate into your proposed rwlock; but if anyone makes
use of the recursive nature of your proposal then implementing
reader/writer fairness later won't be as easy.

I see that the very next change in this series is talking about
acquiring the read side from interrupts, so it does look like you're
planning to make use of the recursive nature of the read side. I kinda
wish you didn't, as this is exactly replicating the design of
tasklist_lock which is IMO problematic. Your prior proposal of
disabling interrupts during the read side had other disadvantages, but
I think it was nice that it didn't rely on having a recursive read
side.

> +#define reader_yet_to_switch(pcpu_rwlock, cpu)                             \
> +       (ACCESS_ONCE(per_cpu_ptr((pcpu_rwlock)->rw_state, cpu)->reader_refcnt))
> +
> +#define reader_percpu_nesting_depth(pcpu_rwlock)                 \
> +       (__this_cpu_read((pcpu_rwlock)->rw_state->reader_refcnt))
> +
> +#define reader_uses_percpu_refcnt(pcpu_rwlock)                         \
> +                               reader_percpu_nesting_depth(pcpu_rwlock)
> +
> +#define reader_nested_percpu(pcpu_rwlock)                              \
> +                       (reader_percpu_nesting_depth(pcpu_rwlock) > 1)
> +
> +#define writer_active(pcpu_rwlock)                                     \
> +       (__this_cpu_read((pcpu_rwlock)->rw_state->writer_signal))

I'm personally not a fan of such one-line shorthand functions - I
think they tend to make the code harder to read instead of easier, as
one constantly has to refer to them to understand what's actually
going on.

>  void percpu_write_lock(struct percpu_rwlock *pcpu_rwlock)
>  {
> +       unsigned int cpu;
> +
> +       /*
> +        * Tell all readers that a writer is becoming active, so that they
> +        * start switching over to the global rwlock.
> +        */
> +       for_each_possible_cpu(cpu)
> +               per_cpu_ptr(pcpu_rwlock->rw_state, cpu)->writer_signal = true;

I don't see anything preventing a race with the corresponding code in
percpu_write_unlock() that sets writer_signal back to false. Did I
miss something here? It seems to me we don't have any guarantee that
all writer signals will be set to true at the end of the loop...

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 360+ messages in thread

* [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
@ 2013-02-18 15:45     ` Michel Lespinasse
  0 siblings, 0 replies; 360+ messages in thread
From: Michel Lespinasse @ 2013-02-18 15:45 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Srivatsa,

I admit not having followed in detail the threads about the previous
iteration, so some of my comments may have been discussed already
before - apologies if that is the case.

On Mon, Feb 18, 2013 at 8:38 PM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> Reader-writer locks and per-cpu counters are recursive, so they can be
> used in a nested fashion in the reader-path, which makes per-CPU rwlocks also
> recursive. Also, this design of switching the synchronization scheme ensures
> that you can safely nest and use these locks in a very flexible manner.

I like the general idea of switching between per-cpu and global
rwlocks as needed; however I dislike unfair locks, and especially
unfair recursive rwlocks.

If you look at rwlock_t, the main reason we haven't been able to
implement reader/writer fairness there is because tasklist_lock makes
use of the recursive nature of the rwlock_t read side. I'm worried
about introducing more lock usages that would make use of the same
property for your proposed lock.

I am fine with your proposal not implementing reader/writer fairness
from day 1, but I am worried about your proposal having a recursive
reader side. Or, to put it another way: if your proposal didn't have a
recursive reader side, and rwlock_t could somehow be changed to
implement reader/writer fairness, then this property could
automatically propagate into your proposed rwlock; but if anyone makes
use of the recursive nature of your proposal then implementing
reader/writer fairness later won't be as easy.

I see that the very next change in this series is talking about
acquiring the read side from interrupts, so it does look like you're
planning to make use of the recursive nature of the read side. I kinda
wish you didn't, as this is exactly replicating the design of
tasklist_lock which is IMO problematic. Your prior proposal of
disabling interrupts during the read side had other disadvantages, but
I think it was nice that it didn't rely on having a recursive read
side.

> +#define reader_yet_to_switch(pcpu_rwlock, cpu)                             \
> +       (ACCESS_ONCE(per_cpu_ptr((pcpu_rwlock)->rw_state, cpu)->reader_refcnt))
> +
> +#define reader_percpu_nesting_depth(pcpu_rwlock)                 \
> +       (__this_cpu_read((pcpu_rwlock)->rw_state->reader_refcnt))
> +
> +#define reader_uses_percpu_refcnt(pcpu_rwlock)                         \
> +                               reader_percpu_nesting_depth(pcpu_rwlock)
> +
> +#define reader_nested_percpu(pcpu_rwlock)                              \
> +                       (reader_percpu_nesting_depth(pcpu_rwlock) > 1)
> +
> +#define writer_active(pcpu_rwlock)                                     \
> +       (__this_cpu_read((pcpu_rwlock)->rw_state->writer_signal))

I'm personally not a fan of such one-line shorthand functions - I
think they tend to make the code harder to read instead of easier, as
one constantly has to refer to them to understand what's actually
going on.

>  void percpu_write_lock(struct percpu_rwlock *pcpu_rwlock)
>  {
> +       unsigned int cpu;
> +
> +       /*
> +        * Tell all readers that a writer is becoming active, so that they
> +        * start switching over to the global rwlock.
> +        */
> +       for_each_possible_cpu(cpu)
> +               per_cpu_ptr(pcpu_rwlock->rw_state, cpu)->writer_signal = true;

I don't see anything preventing a race with the corresponding code in
percpu_write_unlock() that sets writer_signal back to false. Did I
miss something here? It seems to me we don't have any guarantee that
all writer signals will be set to true at the end of the loop...

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 07/46] percpu_rwlock: Allow writers to be readers, and add lockdep annotations
  2013-02-18 12:39   ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 15:51     ` Michel Lespinasse
  -1 siblings, 0 replies; 360+ messages in thread
From: Michel Lespinasse @ 2013-02-18 15:51 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung,
	rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, linux-pm, linux-arch, linux-arm-kernel, linuxppc-dev,
	netdev, linux-doc, linux-kernel, vincent.guittot

On Mon, Feb 18, 2013 at 8:39 PM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> @@ -200,6 +217,16 @@ void percpu_write_lock_irqsave(struct percpu_rwlock *pcpu_rwlock,
>
>         smp_mb(); /* Complete the wait-for-readers, before taking the lock */
>         write_lock_irqsave(&pcpu_rwlock->global_rwlock, *flags);
> +
> +       /*
> +        * It is desirable to allow the writer to acquire the percpu-rwlock
> +        * for read (if necessary), without deadlocking or getting complaints
> +        * from lockdep. To achieve that, just increment the reader_refcnt of
> +        * this CPU - that way, any attempt by the writer to acquire the
> +        * percpu-rwlock for read, will get treated as a case of nested percpu
> +        * reader, which is safe, from a locking perspective.
> +        */
> +       this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);

I find this quite disgusting, but once again this may be because I
don't like unfair recursive rwlocks.

In my opinion, the alternative of explicitly not taking the read lock
when one already has the write lock sounds *much* nicer.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-18 15:45     ` Michel Lespinasse
@ 2013-02-18 16:21       ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 16:21 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung,
	rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, linux-pm, linux-arch, linux-arm-kernel, linuxppc-dev,
	netdev, linux-doc, linux-kernel, vincent.guittot

Hi Michel,

On 02/18/2013 09:15 PM, Michel Lespinasse wrote:
> Hi Srivatsa,
> 
> I admit not having followed in detail the threads about the previous
> iteration, so some of my comments may have been discussed already
> before - apologies if that is the case.
> 
> On Mon, Feb 18, 2013 at 8:38 PM, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> Reader-writer locks and per-cpu counters are recursive, so they can be
>> used in a nested fashion in the reader-path, which makes per-CPU rwlocks also
>> recursive. Also, this design of switching the synchronization scheme ensures
>> that you can safely nest and use these locks in a very flexible manner.
> 
> I like the general idea of switching between per-cpu and global
> rwlocks as needed; however I dislike unfair locks, and especially
> unfair recursive rwlocks.
> 
> If you look at rwlock_t, the main reason we haven't been able to
> implement reader/writer fairness there is because tasklist_lock makes
> use of the recursive nature of the rwlock_t read side. I'm worried
> about introducing more lock usages that would make use of the same
> property for your proposed lock.
> 
> I am fine with your proposal not implementing reader/writer fairness
> from day 1, but I am worried about your proposal having a recursive
> reader side. Or, to put it another way: if your proposal didn't have a
> recursive reader side, and rwlock_t could somehow be changed to
> implement reader/writer fairness, then this property could
> automatically propagate into your proposed rwlock; but if anyone makes
> use of the recursive nature of your proposal then implementing
> reader/writer fairness later won't be as easy.
>

Actually, we don't want reader/writer fairness in this particular case.
We want deadlock safety, and that is exactly what the unfair nature of
rwlocks guarantees us today.

I understand that you want to make rwlocks fair. So, I am thinking of
going ahead with Tejun's proposal - implementing our own unfair locking
scheme inside percpu-rwlocks using atomic ops or something like that, and
being completely independent of rwlock_t. That way, you can easily go
ahead with making rwlocks fair without fear of breaking CPU hotplug.
However, I would much prefer making that change to percpu-rwlocks as a
separate patchset, after this patchset goes in, so that we can also see
how well this unfair logic performs in practice.

And regarding the recursive reader side: the way I see it, a recursive
reader side is a primary requirement in this case. The reason is that the
existing reader side (with stop_machine) uses preempt_disable(), which is
recursive. So our replacement also has to be recursive.
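
For example (an illustrative sketch only; the demo_* names are made up,
while get/put_online_cpus_atomic() are the new APIs from this series):

static irqreturn_t demo_irq_handler(int irq, void *dev_id)
{
        get_online_cpus_atomic();       /* nested read side, in irq context */
        /* ... access data that must not race with a CPU going offline ... */
        put_online_cpus_atomic();

        return IRQ_HANDLED;
}

static void demo_task_context_path(void)
{
        get_online_cpus_atomic();       /* outer read side, in task context */
        /* <-- demo_irq_handler() may run here on the same CPU */
        put_online_cpus_atomic();
}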
 
> I see that the very next change in this series is talking about
> acquiring the read side from interrupts, so it does look like you're
> planning to make use of the recursive nature of the read side.

Yes.. I don't think we can avoid that. Moreover, since we _want_ unfair
reader/writer semantics to allow flexible locking rules and guarantee
deadlock-safety, having a recursive reader side is not even an issue, IMHO.

> I kinda
> wish you didn't, as this is exactly replicating the design of
> tasklist_lock which is IMO problematic. Your prior proposal of
> disabling interrupts during the read side had other disadvantages, but
> I think it was nice that it didn't rely on having a recursive read
> side.
> 

We can have readers from non-interrupt contexts too, which depend on the
recursive property...

>> +#define reader_yet_to_switch(pcpu_rwlock, cpu)                             \
>> +       (ACCESS_ONCE(per_cpu_ptr((pcpu_rwlock)->rw_state, cpu)->reader_refcnt))
>> +
>> +#define reader_percpu_nesting_depth(pcpu_rwlock)                 \
>> +       (__this_cpu_read((pcpu_rwlock)->rw_state->reader_refcnt))
>> +
>> +#define reader_uses_percpu_refcnt(pcpu_rwlock)                         \
>> +                               reader_percpu_nesting_depth(pcpu_rwlock)
>> +
>> +#define reader_nested_percpu(pcpu_rwlock)                              \
>> +                       (reader_percpu_nesting_depth(pcpu_rwlock) > 1)
>> +
>> +#define writer_active(pcpu_rwlock)                                     \
>> +       (__this_cpu_read((pcpu_rwlock)->rw_state->writer_signal))
> 
> I'm personally not a fan of such one-line shorthand functions - I
> think they tend to make the code harder to read instead of easier, as
> one constantly has to refer to them to understand what's actually
> going on.
> 

I got rid of most of the helper functions in this version. But I would prefer
to retain the ones above, because the expressions they wrap are unwieldy and
long. And IMHO the short-hand names are pretty descriptive, so you might not
actually need to keep referring to their implementations all the time.

>>  void percpu_write_lock(struct percpu_rwlock *pcpu_rwlock)
>>  {
>> +       unsigned int cpu;
>> +
>> +       /*
>> +        * Tell all readers that a writer is becoming active, so that they
>> +        * start switching over to the global rwlock.
>> +        */
>> +       for_each_possible_cpu(cpu)
>> +               per_cpu_ptr(pcpu_rwlock->rw_state, cpu)->writer_signal = true;
> 
> I don't see anything preventing a race with the corresponding code in
> percpu_write_unlock() that sets writer_signal back to false. Did I
> miss something here ? It seems to me we don't have any guarantee that
> all writer signals will be set to true at the end of the loop...
> 

Ah, thanks for pointing that out! IIRC Oleg had pointed out this issue in the
last version, but back then I hadn't fully understood what he meant. Your
explanation made it clear. I'll work on fixing this.

Thanks a lot for your review Michel!

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 08/46] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
  2013-02-18 12:39   ` Srivatsa S. Bhat
@ 2013-02-18 16:23     ` Michel Lespinasse
  -1 siblings, 0 replies; 360+ messages in thread
From: Michel Lespinasse @ 2013-02-18 16:23 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung,
	rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, linux-pm, linux-arch, linux-arm-kernel, linuxppc-dev,
	netdev, linux-doc, linux-kernel, vincent.guittot

On Mon, Feb 18, 2013 at 8:39 PM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> Some important design requirements and considerations:
> -----------------------------------------------------
>
> 1. Scalable synchronization at the reader-side, especially in the fast-path
>
>    Any synchronization at the atomic hotplug readers side must be highly
>    scalable - avoid global single-holder locks/counters etc. Because, these
>    paths currently use the extremely fast preempt_disable(); our replacement
>    to preempt_disable() should not become ridiculously costly and also should
>    not serialize the readers among themselves needlessly.
>
>    At a minimum, the new APIs must be extremely fast at the reader side
>    at least in the fast-path, when no CPU offline writers are active.
>
> 2. preempt_disable() was recursive. The replacement should also be recursive.
>
> 3. No (new) lock-ordering restrictions
>
>    preempt_disable() was super-flexible. It didn't impose any ordering
>    restrictions or rules for nesting. Our replacement should also be equally
>    flexible and usable.
>
> 4. No deadlock possibilities
>
>    Regular per-cpu locking is not the way to go if we want to have relaxed
>    rules for lock-ordering. Because, we can end up in circular-locking
>    dependencies as explained in https://lkml.org/lkml/2012/12/6/290
>
>    So, avoid the usual per-cpu locking schemes (per-cpu locks/per-cpu atomic
>    counters with spin-on-contention etc) as much as possible, to avoid
>    numerous deadlock possibilities from creeping in.
>
>
> Implementation of the design:
> ----------------------------
>
> We use per-CPU reader-writer locks for synchronization because:
>
>   a. They are quite fast and scalable in the fast-path (when no writers are
>      active), since they use fast per-cpu counters in those paths.
>
>   b. They are recursive at the reader side.
>
>   c. They provide a good amount of safety against deadlocks; they don't
>      spring new deadlock possibilities on us from out of nowhere. As a
>      result, they have relaxed locking rules and are quite flexible, and
>      thus are best suited for replacing usages of preempt_disable() or
>      local_irq_disable() at the reader side.
>
> Together, these satisfy all the requirements mentioned above.

Thanks for this detailed design explanation.

> +/*
> + * Invoked by atomic hotplug reader (a task which wants to prevent
> + * CPU offline, but which can't afford to sleep), to prevent CPUs from
> + * going offline. So, you can call this function from atomic contexts
> + * (including interrupt handlers).
> + *
> + * Note: This does NOT prevent CPUs from coming online! It only prevents
> + * CPUs from going offline.
> + *
> + * You can call this function recursively.
> + *
> + * Returns with preemption disabled (but interrupts remain as they are;
> + * they are not disabled).
> + */
> +void get_online_cpus_atomic(void)
> +{
> +       percpu_read_lock_irqsafe(&hotplug_pcpu_rwlock);
> +}
> +EXPORT_SYMBOL_GPL(get_online_cpus_atomic);
> +
> +void put_online_cpus_atomic(void)
> +{
> +       percpu_read_unlock_irqsafe(&hotplug_pcpu_rwlock);
> +}
> +EXPORT_SYMBOL_GPL(put_online_cpus_atomic);

So, you made it clear why you want a recursive read side here.

I am wondering though, if you could take care of recursive uses in
get/put_online_cpus_atomic() instead of doing it as a property of your
rwlock:

get_online_cpus_atomic()
{
    unsigned long flags;
    local_irq_save(flags);
    if (this_cpu_inc_return(hotplug_recursion_count) == 1)
        percpu_read_lock_irqsafe(&hotplug_pcpu_rwlock);
    local_irq_restore(flags);
}

Once again, the idea there is to avoid baking the reader side
recursive properties into your rwlock itself, so that it won't be
impossible to implement reader/writer fairness into your rwlock in the
future (which may not be very important for the hotplug use, but
could be when other uses get introduced).
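
The matching put side under the same scheme could then look like this (again
just a sketch; hotplug_recursion_count is the per-cpu counter invented above):

void put_online_cpus_atomic(void)
{
        unsigned long flags;

        local_irq_save(flags);
        if (this_cpu_dec_return(hotplug_recursion_count) == 0)
                percpu_read_unlock_irqsafe(&hotplug_pcpu_rwlock);
        local_irq_restore(flags);
}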

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 07/46] percpu_rwlock: Allow writers to be readers, and add lockdep annotations
  2013-02-18 15:51     ` Michel Lespinasse
@ 2013-02-18 16:31       ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 16:31 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: linux-doc, peterz, fweisbec, linux-kernel, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, namhyung, tglx, linux-arm-kernel, netdev, oleg,
	vincent.guittot, sbw, tj, akpm, linuxppc-dev

On 02/18/2013 09:21 PM, Michel Lespinasse wrote:
> On Mon, Feb 18, 2013 at 8:39 PM, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> @@ -200,6 +217,16 @@ void percpu_write_lock_irqsave(struct percpu_rwlock *pcpu_rwlock,
>>
>>         smp_mb(); /* Complete the wait-for-readers, before taking the lock */
>>         write_lock_irqsave(&pcpu_rwlock->global_rwlock, *flags);
>> +
>> +       /*
>> +        * It is desirable to allow the writer to acquire the percpu-rwlock
>> +        * for read (if necessary), without deadlocking or getting complaints
>> +        * from lockdep. To achieve that, just increment the reader_refcnt of
>> +        * this CPU - that way, any attempt by the writer to acquire the
>> +        * percpu-rwlock for read, will get treated as a case of nested percpu
>> +        * reader, which is safe, from a locking perspective.
>> +        */
>> +       this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
> 
> I find this quite disgusting, but once again this may be because I
> don't like unfair recursive rwlocks.
> 

:-)

> In my opinion, the alternative of explicitly not taking the read lock
> when one already has the write lock sounds *much* nicer.

I don't recall any strong reasons for doing it this way, so I don't have
strong opinions about it either. But one thing to note is that, in the CPU
hotplug case, the readers are *way* hotter than the writer. So avoiding
extra checks/'if' conditions/memory barriers on the reader side is very
welcome. (If we slow down the read side, we take a performance hit even
when *not* doing hotplug!) Considering this, the logic used in this
patchset seems better, IMHO.
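
To make that cost concrete, the "don't take the read lock when we already
hold the write lock" alternative would put something like the test below into
every reader's fast path, whether or not a hotplug operation is in progress.
This is purely illustrative: writer_cpu is a made-up field recording which
CPU currently holds the write lock (-1 if none), not part of the patchset.

static inline bool this_cpu_holds_write_lock(struct percpu_rwlock *pcpu_rwlock)
{
        return ACCESS_ONCE(pcpu_rwlock->writer_cpu) == raw_smp_processor_id();
}

void get_online_cpus_atomic(void)
{
        /* Extra branch paid by every reader, hotplug in progress or not */
        if (unlikely(this_cpu_holds_write_lock(&hotplug_pcpu_rwlock)))
                return; /* this CPU is the hotplug writer; nothing to do */

        percpu_read_lock_irqsafe(&hotplug_pcpu_rwlock);
}

void put_online_cpus_atomic(void)
{
        if (unlikely(this_cpu_holds_write_lock(&hotplug_pcpu_rwlock)))
                return;

        percpu_read_unlock_irqsafe(&hotplug_pcpu_rwlock);
}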

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-18 16:21       ` Srivatsa S. Bhat
@ 2013-02-18 16:31         ` Steven Rostedt
  -1 siblings, 0 replies; 360+ messages in thread
From: Steven Rostedt @ 2013-02-18 16:31 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Michel Lespinasse, tglx, peterz, tj, oleg, paulmck, rusty, mingo,
	akpm, namhyung, wangyun, xiaoguangrong, rjw, sbw, fweisbec,
	linux, nikunj, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, vincent.guittot

On Mon, 2013-02-18 at 21:51 +0530, Srivatsa S. Bhat wrote:
> Hi Michel,

> Yes.. I don't think we can avoid that. Moreover, since we _want_ unfair
> reader/writer semantics to allow flexible locking rules and guarantee
> deadlock-safety, having a recursive reader side is not even an issue, IMHO.

A recursive unfair reader lock may guarantee deadlock-safety, but remember,
it adds a higher probability of live-locking the write_lock. That is another
argument for keeping this separate, for CPU hotplug only.

-- Steve



^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 08/46] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
  2013-02-18 16:23     ` Michel Lespinasse
@ 2013-02-18 16:43       ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 16:43 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung,
	rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, linux-pm, linux-arch, linux-arm-kernel, linuxppc-dev,
	netdev, linux-doc, linux-kernel, vincent.guittot

On 02/18/2013 09:53 PM, Michel Lespinasse wrote:
> On Mon, Feb 18, 2013 at 8:39 PM, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> Some important design requirements and considerations:
>> -----------------------------------------------------
[...]
>> +/*
>> + * Invoked by atomic hotplug reader (a task which wants to prevent
>> + * CPU offline, but which can't afford to sleep), to prevent CPUs from
>> + * going offline. So, you can call this function from atomic contexts
>> + * (including interrupt handlers).
>> + *
>> + * Note: This does NOT prevent CPUs from coming online! It only prevents
>> + * CPUs from going offline.
>> + *
>> + * You can call this function recursively.
>> + *
>> + * Returns with preemption disabled (but interrupts remain as they are;
>> + * they are not disabled).
>> + */
>> +void get_online_cpus_atomic(void)
>> +{
>> +       percpu_read_lock_irqsafe(&hotplug_pcpu_rwlock);
>> +}
>> +EXPORT_SYMBOL_GPL(get_online_cpus_atomic);
>> +
>> +void put_online_cpus_atomic(void)
>> +{
>> +       percpu_read_unlock_irqsafe(&hotplug_pcpu_rwlock);
>> +}
>> +EXPORT_SYMBOL_GPL(put_online_cpus_atomic);
> 
> So, you made it clear why you want a recursive read side here.
> 
> I am wondering though, if you could take care of recursive uses in
> get/put_online_cpus_atomic() instead of doing it as a property of your
> rwlock:
> 
> get_online_cpus_atomic()
> {
>     unsigned long flags;
>     local_irq_save(flags);
>     if (this_cpu_inc_return(hotplug_recursion_count) == 1)
>         percpu_read_lock_irqsafe(&hotplug_pcpu_rwlock);
>     local_irq_restore(flags);
> }
> 
> Once again, the idea there is to avoid baking the reader side
> recursive properties into your rwlock itself, so that it won't be
> impossible to implement reader/writer fairness into your rwlock in the
> future (which may not be very important for the hotplug use, but
> could be when other uses get introduced).
> 

Hmm, your proposal above looks good to me, at first glance.
(Sorry, I had mistaken your earlier mails to mean that you were against a
recursive reader side, while you actually meant that you didn't like
implementing the recursive reader-side logic using the recursive property
of rwlocks.)

While your idea above looks good, it might introduce more complexity in the
unlock path, since it would allow nesting of heterogeneous readers (i.e., when
hotplug_recursion_count == 1, you don't know whether you need to simply
decrement the counter or also unlock the rwlock).

But I'll give this some more thought to see if we can implement this
without making it too complex. Thank you!

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-18 16:31         ` Steven Rostedt
  (?)
@ 2013-02-18 16:46           ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 16:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Michel Lespinasse, tglx, peterz, tj, oleg, paulmck, rusty, mingo,
	akpm, namhyung, wangyun, xiaoguangrong, rjw, sbw, fweisbec,
	linux, nikunj, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, vincent.guittot

On 02/18/2013 10:01 PM, Steven Rostedt wrote:
> On Mon, 2013-02-18 at 21:51 +0530, Srivatsa S. Bhat wrote:
>> Hi Michel,
> 
>> Yes.. I don't think we can avoid that. Moreover, since we _want_ unfair
>> reader/writer semantics to allow flexible locking rules and guarantee
>> deadlock-safety, having a recursive reader side is not even an issue, IMHO.
> 
> Recursive unfair reader lock may guarantee deadlock-safety, but
> remember, it adds a higher probability of live-locking the write_lock.
> Which is another argument to keep this restricted to cpu hotplug only.
> 

True.
 
Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 08/46] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
  2013-02-18 16:43       ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 17:21         ` Michel Lespinasse
  -1 siblings, 0 replies; 360+ messages in thread
From: Michel Lespinasse @ 2013-02-18 17:21 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung,
	rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, linux-pm, linux-arch, linux-arm-kernel, linuxppc-dev,
	netdev, linux-doc, linux-kernel, vincent.guittot

On Tue, Feb 19, 2013 at 12:43 AM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 02/18/2013 09:53 PM, Michel Lespinasse wrote:
>> I am wondering though, if you could take care of recursive uses in
>> get/put_online_cpus_atomic() instead of doing it as a property of your
>> rwlock:
>>
>> get_online_cpus_atomic()
>> {
>>     unsigned long flags;
>>     local_irq_save(flags);
>>     if (this_cpu_inc_return(hotplug_recursion_count) == 1)
>>         percpu_read_lock_irqsafe(&hotplug_pcpu_rwlock);
>>     local_irq_restore(flags);
>> }
>>
>> Once again, the idea there is to avoid baking the reader side
>> recursive properties into your rwlock itself, so that it won't be
>> impossible to implement reader/writer fairness into your rwlock in the
>> future (which may be not be very important for the hotplug use, but
>> could be when other uses get introduced).
>
> Hmm, your proposal above looks good to me, at first glance.
> (Sorry, I had mistaken your earlier mails to mean that you were against
> recursive reader-side, while you actually meant that you didn't like
> implementing the recursive reader-side logic using the recursive property
> of rwlocks).

To be honest, I was replying as I went through the series, so I hadn't
digested your hotplug use case yet :)

But yes - I don't like having the rwlock itself be recursive, but I do
understand that you have a legitimate requirement for
get_online_cpus_atomic() to be recursive. This IMO points to the
direction I suggested, of explicitly handling the recursion in
get_online_cpus_atomic() so that the underlying rwlock doesn't have to
support recursive reader side itself.

(And this would work for the idea of making writers own the reader
side as well - you can do it with the hotplug_recursion_count instead
of with the underlying rwlock).
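
Concretely, "owning the reader side" could be as simple as the writer doing
something like this before it starts signalling readers (just a sketch,
reusing the hotplug_recursion_count from my earlier mail):

static DEFINE_PER_CPU(unsigned int, hotplug_recursion_count);

static void hotplug_writer_own_reader_side(void)
{
	/*
	 * Any nested get_online_cpus_atomic() on this CPU now sees a
	 * non-zero count and returns without touching the underlying rwlock.
	 */
	this_cpu_inc(hotplug_recursion_count);
}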

> While your idea above looks good, it might introduce more complexity
> in the unlock path, since this would allow nesting of heterogeneous readers
> (i.e., if hotplug_recursion_count == 1, you don't know whether you need to
> simply decrement the counter or unlock the rwlock as well).

Well, I think the idea doesn't make the underlying rwlock more
complex, since you could in principle keep your existing
percpu_read_lock_irqsafe implementation as is and just remove the
recursive behavior from its documentation.

Now ideally if we're adding a bit of complexity in
get_online_cpus_atomic() it'd be nice if we could remove some from
percpu_read_lock_irqsafe, but I haven't thought about that deeply
either. I think you'd still want to have the equivalent of a percpu
reader_refcnt, except it could now be a bool instead of an int, and
percpu_read_lock_irqsafe would still set it back to 0/false after
acquiring the global read side if a writer is signaled. Basically your
existing percpu_read_lock_irqsafe code should still work, and we could
remove just the parts that were only there to deal with the recursive
property.
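
In code, what I'm picturing is roughly the following (only a sketch of the
idea, not your actual implementation - I'm ignoring the irq-disabling that
"irqsafe" implies and all the memory-ordering details):

void percpu_read_lock_irqsafe(struct percpu_rwlock *pcpu_rwlock)
{
	/* Optimistically mark this CPU as a fast-path reader. */
	__this_cpu_write(pcpu_rwlock->rw_state->reader_refcnt, true);

	smp_mb();	/* order the marking against the writer_signal check */

	if (unlikely(__this_cpu_read(pcpu_rwlock->rw_state->writer_signal))) {
		/* A writer is active: switch over to the global rwlock... */
		read_lock(&pcpu_rwlock->global_rwlock);
		/* ...and stop advertising ourselves as a fast-path reader. */
		__this_cpu_write(pcpu_rwlock->rw_state->reader_refcnt, false);
	}
}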

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-18 16:21       ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 17:56         ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 17:56 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: linux-doc, peterz, fweisbec, linux-kernel, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, namhyung, tglx, linux-arm-kernel, netdev, oleg,
	vincent.guittot, sbw, tj, akpm, linuxppc-dev

On 02/18/2013 09:51 PM, Srivatsa S. Bhat wrote:
> Hi Michel,
> 
> On 02/18/2013 09:15 PM, Michel Lespinasse wrote:
>> Hi Srivatsa,
>>
>> I admit not having followed in detail the threads about the previous
>> iteration, so some of my comments may have been discussed already
>> before - apologies if that is the case.
>>
>> On Mon, Feb 18, 2013 at 8:38 PM, Srivatsa S. Bhat
>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>> Reader-writer locks and per-cpu counters are recursive, so they can be
>>> used in a nested fashion in the reader-path, which makes per-CPU rwlocks also
>>> recursive. Also, this design of switching the synchronization scheme ensures
>>> that you can safely nest and use these locks in a very flexible manner.
[...]
>>>  void percpu_write_lock(struct percpu_rwlock *pcpu_rwlock)
>>>  {
>>> +       unsigned int cpu;
>>> +
>>> +       /*
>>> +        * Tell all readers that a writer is becoming active, so that they
>>> +        * start switching over to the global rwlock.
>>> +        */
>>> +       for_each_possible_cpu(cpu)
>>> +               per_cpu_ptr(pcpu_rwlock->rw_state, cpu)->writer_signal = true;
>>
>> I don't see anything preventing a race with the corresponding code in
>> percpu_write_unlock() that sets writer_signal back to false. Did I
>> miss something here ? It seems to me we don't have any guarantee that
>> all writer signals will be set to true at the end of the loop...
>>
> 
> Ah, thanks for pointing that out! IIRC Oleg had pointed out this issue in the last
> version, but back then, I hadn't fully understood what he meant. Your
> explanation made it clear. I'll work on fixing this.
> 

We can fix this by using the simple patch (untested) shown below.
The alternative would be to acquire the rwlock for write, update the
->writer_signal values, release the lock, wait for readers to switch,
again acquire the rwlock for write with interrupts disabled etc... which
makes it kinda messy, IMHO. So I prefer the simple version shown below.


diff --git a/lib/percpu-rwlock.c b/lib/percpu-rwlock.c
index bf95e40..64ccd3f 100644
--- a/lib/percpu-rwlock.c
+++ b/lib/percpu-rwlock.c
@@ -50,6 +50,12 @@
 	(__this_cpu_read((pcpu_rwlock)->rw_state->writer_signal))
 
 
+/*
+ * Spinlock to synchronize access to the writer's data-structures
+ * (->writer_signal) from multiple writers.
+ */
+static DEFINE_SPINLOCK(writer_side_lock);
+
 int __percpu_init_rwlock(struct percpu_rwlock *pcpu_rwlock,
 			 const char *name, struct lock_class_key *rwlock_key)
 {
@@ -191,6 +197,8 @@ void percpu_write_lock_irqsave(struct percpu_rwlock *pcpu_rwlock,
 {
 	unsigned int cpu;
 
+	spin_lock(&writer_side_lock);
+
 	/*
 	 * Tell all readers that a writer is becoming active, so that they
 	 * start switching over to the global rwlock.
@@ -252,5 +260,6 @@ void percpu_write_unlock_irqrestore(struct percpu_rwlock *pcpu_rwlock,
 		per_cpu_ptr(pcpu_rwlock->rw_state, cpu)->writer_signal = false;
 
 	write_unlock_irqrestore(&pcpu_rwlock->global_rwlock, *flags);
+	spin_unlock(&writer_side_lock);
 }
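
Note that the writer-side usage from the CPU hotplug path stays unchanged
with this fix; roughly something like the below (I'm assuming the lock side
takes an unsigned long *flags just like the unlock side above):

static void example_cpu_down_path(void)
{
	unsigned long flags;

	percpu_write_lock_irqsave(&hotplug_pcpu_rwlock, &flags);
	/* ... do the actual CPU-offline work; atomic readers are excluded ... */
	percpu_write_unlock_irqrestore(&hotplug_pcpu_rwlock, &flags);
}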


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-18 17:56         ` Srivatsa S. Bhat
  (?)
@ 2013-02-18 18:07           ` Michel Lespinasse
  -1 siblings, 0 replies; 360+ messages in thread
From: Michel Lespinasse @ 2013-02-18 18:07 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-doc, peterz, fweisbec, linux-kernel, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, namhyung, tglx, linux-arm-kernel, netdev, oleg,
	vincent.guittot, sbw, tj, akpm, linuxppc-dev

On Tue, Feb 19, 2013 at 1:56 AM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 02/18/2013 09:51 PM, Srivatsa S. Bhat wrote:
>> On 02/18/2013 09:15 PM, Michel Lespinasse wrote:
>>> I don't see anything preventing a race with the corresponding code in
>>> percpu_write_unlock() that sets writer_signal back to false. Did I
>>> miss something here ? It seems to me we don't have any guarantee that
>>> all writer signals will be set to true at the end of the loop...
>>
>> Ah, thanks for pointing that out! IIRC Oleg had pointed out this issue in the last
>> version, but back then, I hadn't fully understood what he meant. Your
>> explanation made it clear. I'll work on fixing this.
>
> We can fix this by using the simple patch (untested) shown below.
> The alternative would be to acquire the rwlock for write, update the
> ->writer_signal values, release the lock, wait for readers to switch,
> again acquire the rwlock for write with interrupts disabled etc... which
> makes it kinda messy, IMHO. So I prefer the simple version shown below.

Looks good.

Another alternative would be to make writer_signal an atomic integer
instead of a bool. That way writers can increment it before locking
and decrement it while unlocking.

To reduce the number of atomic ops during writer lock/unlock, the
writer_signal could also be a global read_mostly variable (I don't see
any downsides to that compared to having it percpu - or is it because
you wanted all the fastpath state to be in one single cacheline ?)
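
Roughly what I have in mind (names made up, purely illustrative):

/* Global instead of percpu; counts writers that have signalled readers. */
static atomic_t writers_active = ATOMIC_INIT(0);

static void writer_signal_readers(void)
{
	atomic_inc(&writers_active);	/* before taking the rwlock for write */
}

static void writer_unsignal_readers(void)
{
	atomic_dec(&writers_active);	/* while unlocking */
}

static bool reader_must_switch_to_global_rwlock(void)
{
	return atomic_read(&writers_active) != 0;
}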

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-18 18:07           ` Michel Lespinasse
  (?)
@ 2013-02-18 18:14             ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 18:14 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: linux-doc, peterz, fweisbec, linux-kernel, namhyung, mingo,
	linux-arch, linux, xiaoguangrong, wangyun, paulmck, nikunj,
	linux-pm, rusty, rostedt, rjw, vincent.guittot, tglx,
	linux-arm-kernel, netdev, oleg, sbw, tj, akpm, linuxppc-dev

On 02/18/2013 11:37 PM, Michel Lespinasse wrote:
> On Tue, Feb 19, 2013 at 1:56 AM, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> On 02/18/2013 09:51 PM, Srivatsa S. Bhat wrote:
>>> On 02/18/2013 09:15 PM, Michel Lespinasse wrote:
>>>> I don't see anything preventing a race with the corresponding code in
>>>> percpu_write_unlock() that sets writer_signal back to false. Did I
>>>> miss something here ? It seems to me we don't have any guarantee that
>>>> all writer signals will be set to true at the end of the loop...
>>>
>>> Ah, thanks for pointing that out! IIRC Oleg had pointed out this issue in the last
>>> version, but back then, I hadn't fully understood what he meant. Your
>>> explanation made it clear. I'll work on fixing this.
>>
>> We can fix this by using the simple patch (untested) shown below.
>> The alternative would be to acquire the rwlock for write, update the
>> ->writer_signal values, release the lock, wait for readers to switch,
>> again acquire the rwlock for write with interrupts disabled etc... which
>> makes it kinda messy, IMHO. So I prefer the simple version shown below.
> 
> Looks good.
> 
> Another alternative would be to make writer_signal an atomic integer
> instead of a bool. That way writers can increment it before locking
> and decrement it while unlocking.
> 

Yep, that would also do. But the spinlock version looks simpler - no need
to check if the atomic counter is non-zero, no need to explicitly spin in
a tight-loop etc.

> To reduce the number of atomic ops during writer lock/unlock, the
> writer_signal could also be a global read_mostly variable (I don't see
> any downsides to that compared to having it percpu - or is it because
> you wanted all the fastpath state to be in one single cacheline ?)
> 

Yes, we (Oleg and I) debated for a while about global vs percpu, and then
finally decided to go with percpu to have cache benefits.
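
The fastpath state I'm referring to is the per-CPU structure that holds
roughly the following (field names other than writer_signal are taken from
this discussion, not necessarily the exact ones in the series):

struct percpu_rwlock_state {
	unsigned long	reader_refcnt;	/* readers on this CPU (fast path) */
	bool		writer_signal;	/* a writer wants readers to switch */
} ____cacheline_aligned_in_smp;

That way the common reader fast path touches only a single local cacheline.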

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 08/46] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
  2013-02-18 17:21         ` Michel Lespinasse
  (?)
@ 2013-02-18 18:50           ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-18 18:50 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung,
	rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, linux-pm, linux-arch, linux-arm-kernel, linuxppc-dev,
	netdev, linux-doc, linux-kernel, vincent.guittot

On 02/18/2013 10:51 PM, Michel Lespinasse wrote:
> On Tue, Feb 19, 2013 at 12:43 AM, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> On 02/18/2013 09:53 PM, Michel Lespinasse wrote:
>>> I am wondering though, if you could take care of recursive uses in
>>> get/put_online_cpus_atomic() instead of doing it as a property of your
>>> rwlock:
>>>
>>> get_online_cpus_atomic()
>>> {
>>>     unsigned long flags;
>>>     local_irq_save(flags);
>>>     if (this_cpu_inc_return(hotplug_recursion_count) == 1)
>>>         percpu_read_lock_irqsafe(&hotplug_pcpu_rwlock);
>>>     local_irq_restore(flags);
>>> }
>>>
>>> Once again, the idea there is to avoid baking the reader side
>>> recursive properties into your rwlock itself, so that it won't be
>>> impossible to implement reader/writer fairness into your rwlock in the
>>> future (which may be not be very important for the hotplug use, but
>>> could be when other uses get introduced).
>>
>> Hmm, your proposal above looks good to me, at first glance.
>> (Sorry, I had mistaken your earlier mails to mean that you were against
>> recursive reader-side, while you actually meant that you didn't like
>> implementing the recursive reader-side logic using the recursive property
>> of rwlocks).
> 
> To be honest, I was replying as I went through the series, so I hadn't
> digested your hotplug use case yet :)
> 
> But yes - I don't like having the rwlock itself be recursive, but I do
> understand that you have a legitimate requirement for
> get_online_cpus_atomic() to be recursive. This IMO points to the
> direction I suggested, of explicitly handling the recursion in
> get_online_cpus_atomic() so that the underlying rwlock doesn't have to
> support recursive reader side itself.
> 
> (And this would work for the idea of making writers own the reader
> side as well - you can do it with the hotplug_recursion_count instead
> of with the underlying rwlock).
> 
>> While your idea above looks good, it might introduce more complexity
>> in the unlock path, since this would allow nesting of heterogeneous readers
>> (i.e., if hotplug_recursion_count == 1, you don't know whether you need to
>> simply decrement the counter or unlock the rwlock as well).
> 
> Well, I think the idea doesn't make the underlying rwlock more
> complex, since you could in principle keep your existing
> percpu_read_lock_irqsafe implementation as is and just remove the
> recursive behavior from its documentation.
> 
> Now ideally if we're adding a bit of complexity in
> get_online_cpus_atomic() it'd be nice if we could remove some from
> percpu_read_lock_irqsafe, but I haven't thought about that deeply
> either. I think you'd still want to have the equivalent of a percpu
> reader_refcnt, except it could now be a bool instead of an int, and
> percpu_read_lock_irqsafe would still set it back to 0/false after
> acquiring the global read side if a writer is signaled. Basically your
> existing percpu_read_lock_irqsafe code should still work, and we could
> remove just the parts that were only there to deal with the recursive
> property.
> 

But, the whole intention behind removing the parts depending on the
recursive property of rwlocks would be to make it easier to make rwlocks
fair (going forward) right? Then, that won't work for CPU hotplug, because,
just like we have a legitimate reason to have recursive
get_online_cpus_atomic(), we also have a legitimate reason to have
unfairness in locking (i.e., for deadlock-safety). So we simply can't
afford to make the locking fair - we'll end up in too many deadlock
possibilities, as hinted in the changelog of patch 1.

(Remember, we are replacing preempt_disable(), which had absolutely no
special nesting rules or locking implications. That is why we are forced
to provide maximum locking flexibility and safety against new/previously
non-existent deadlocks, in the new synchronization scheme).
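
To make that concrete, a typical call-site conversion in this series looks
roughly like the (made-up) example below - the whole point is that the
reader side must be usable exactly as freely as preempt_disable() was:

/* Before: preempt_disable() implicitly kept the target CPU online. */
static void kick_cpu_old(int cpu)
{
	preempt_disable();
	smp_send_reschedule(cpu);	/* cpu must not go offline here */
	preempt_enable();
}

/* After: an explicit atomic hotplug reader, with no new nesting rules. */
static void kick_cpu_new(int cpu)
{
	get_online_cpus_atomic();
	smp_send_reschedule(cpu);
	put_online_cpus_atomic();
}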

So the only long-term solution I can think of is to decouple
percpu-rwlocks and rwlock_t (like what Tejun suggested) by implementing
our own unfair locking scheme inside. What do you think?

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 08/46] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
  2013-02-18 18:50           ` Srivatsa S. Bhat
  (?)
@ 2013-02-19  9:40             ` Michel Lespinasse
  -1 siblings, 0 replies; 360+ messages in thread
From: Michel Lespinasse @ 2013-02-19  9:40 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung,
	rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, linux-pm, linux-arch, linux-arm-kernel, linuxppc-dev,
	netdev, linux-doc, linux-kernel, vincent.guittot

On Tue, Feb 19, 2013 at 2:50 AM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> But, the whole intention behind removing the parts depending on the
> recursive property of rwlocks would be to make it easier to make rwlocks
> fair (going forward) right? Then, that won't work for CPU hotplug, because,
> just like we have a legitimate reason to have recursive
> get_online_cpus_atomic(), we also have a legitimate reason to have
> unfairness in locking (i.e., for deadlock-safety). So we simply can't
> afford to make the locking fair - we'll end up in too many deadlock
> possibilities, as hinted in the changelog of patch 1.

Grumpf - I hadn't realized that making the underlying rwlock fair
would break your hotplug use case. But you are right, it would. Oh
well :/

> So the only long-term solution I can think of is to decouple
> percpu-rwlocks and rwlock_t (like what Tejun suggested) by implementing
> our own unfair locking scheme inside. What do you think?

I have no idea how hard it would be to change get_online_cpus_atomic()
call sites so that the hotplug rwlock read side has a defined order vs
other locks (thus making sure the situation you describe in patch 1
doesn't happen). I agree we shouldn't base our short term plans around
that, but maybe that's doable in the long term ???

Otherwise, I think we should add some big-fat-warning that percpu
rwlocks don't have reader/writer fairness, that the hotplug use case
actually depends on the unfairness / would break if the rwlock was
made fair, and that any new uses of percpu rwlocks should be very
carefully considered because of the reader/writer fairness issues.
Maybe even give percpu rwlocks a less generic sounding name, given how
constrained they are by the hotplug use case.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 08/46] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
  2013-02-19  9:40             ` Michel Lespinasse
  (?)
@ 2013-02-19  9:55               ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-19  9:55 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung,
	rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, linux-pm, linux-arch, linux-arm-kernel, linuxppc-dev,
	netdev, linux-doc, linux-kernel, vincent.guittot

On 02/19/2013 03:10 PM, Michel Lespinasse wrote:
> On Tue, Feb 19, 2013 at 2:50 AM, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> But, the whole intention behind removing the parts depending on the
>> recursive property of rwlocks would be to make it easier to make rwlocks
>> fair (going forward) right? Then, that won't work for CPU hotplug, because,
>> just like we have a legitimate reason to have recursive
>> get_online_cpus_atomic(), we also have a legitimate reason to have
>> unfairness in locking (i.e., for deadlock-safety). So we simply can't
>> afford to make the locking fair - we'll end up in too many deadlock
>> possibilities, as hinted in the changelog of patch 1.
> 
> Grumpf - I hadn't realized that making the underlying rwlock fair
> would break your hotplug use case. But you are right, it would. Oh
> well :/
> 

Yeah :-/

>> So the only long-term solution I can think of is to decouple
>> percpu-rwlocks and rwlock_t (like what Tejun suggested) by implementing
>> our own unfair locking scheme inside. What do you think?
> 
> I have no idea how hard it would be to change get_online_cpus_atomic()
> call sites so that the hotplug rwlock read side has a defined order vs
> other locks (thus making sure the situation you describe in patch 1
> doesn't happen). I agree we shouldn't base our short term plans around
> that, but maybe that's doable in the long term ???
>

I think it should be possible in the longer term. I'm expecting it to be
*much much* harder to audit and convert (requiring a lot of knowledge of
each subsystem that we are touching) than the simpler tree-wide
conversion that I did in this patchset... but I don't think it is
impossible.
 
> Otherwise, I think we should add some big-fat-warning that percpu
> rwlocks don't have reader/writer fairness, that the hotplug use case
> actually depends on the unfairness / would break if the rwlock was
> made fair, and that any new uses of percpu rwlocks should be very
> carefully considered because of the reader/writer fairness issues.

In fact, when I started out, I actually contained all the new locking code
inside CPU hotplug itself, and didn't even expose it as a generic percpu
rwlock in some of the previous versions of this patchset... :-)

But now that we already have a generic locking scheme exposed, we could
add a warning against using it without due consideration.

> Maybe even give percpu rwlocks a less generic sounding name, given how
> constrained they are by the hotplug use case.
 
I wouldn't go that far... ;-) Unfairness is not a show-stopper right?
IMHO, the warning/documentation should suffice for anybody wanting to
try out this locking scheme for other use-cases.

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* RE: [PATCH v6 08/46] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
  2013-02-19  9:55               ` Srivatsa S. Bhat
  (?)
  (?)
@ 2013-02-19 10:42                 ` David Laight
  -1 siblings, 0 replies; 360+ messages in thread
From: David Laight @ 2013-02-19 10:42 UTC (permalink / raw)
  To: Srivatsa S. Bhat, Michel Lespinasse
  Cc: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung,
	rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, linux-pm, linux-arch, linux-arm-kernel, linuxppc-dev,
	netdev, linux-doc, linux-kernel, vincent.guittot

> I wouldn't go that far... ;-) Unfairness is not a show-stopper right?
> IMHO, the warning/documentation should suffice for anybody wanting to
> try out this locking scheme for other use-cases.

I presume that by 'fairness' you mean 'write preference'?

I'm not sure how difficult it would be, but maybe have two functions
for acquiring the lock for read: one that blocks if there is a writer
waiting, and one that doesn't.

That way you can change the individual call sites separately.
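
As a rough illustration, the two read-side variants could look something
like the sketch below (the 'hprw' type and function names are made up
for illustration and are not an existing kernel API; the fairness is
only best-effort):

#include <linux/spinlock.h>
#include <linux/atomic.h>

struct hprw_lock {
	rwlock_t	lock;
	atomic_t	writers_waiting;
};

/* Writer announces itself before taking the lock exclusively. */
static inline void hprw_write_lock(struct hprw_lock *l)
{
	atomic_inc(&l->writers_waiting);
	write_lock(&l->lock);
}

static inline void hprw_write_unlock(struct hprw_lock *l)
{
	write_unlock(&l->lock);
	atomic_dec(&l->writers_waiting);
}

/*
 * Fair(ish) read variant: back off while a writer is waiting, so the
 * writer cannot be starved.  Only usable at call sites with a
 * well-defined lock order (no nesting under other readers).
 */
static inline void hprw_read_lock_fair(struct hprw_lock *l)
{
	while (atomic_read(&l->writers_waiting))
		cpu_relax();
	read_lock(&l->lock);
}

/*
 * Unfair read variant: ignore waiting writers, like today's rwlock_t
 * read side.  Readers always make progress, which is what the
 * deadlock-safety of the hotplug case relies on.
 */
static inline void hprw_read_lock_unfair(struct hprw_lock *l)
{
	read_lock(&l->lock);
}

static inline void hprw_read_unlock(struct hprw_lock *l)
{
	read_unlock(&l->lock);
}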

The other place I can imagine a per-cpu rwlock being used
is to allow a driver to disable 'sleep' or software controlled
hardware removal while it performs a sequence of operations.

	David




^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 08/46] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
  2013-02-19 10:42                 ` David Laight
  (?)
@ 2013-02-19 10:58                   ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-19 10:58 UTC (permalink / raw)
  To: David Laight
  Cc: Michel Lespinasse, linux-doc, peterz, fweisbec, linux-kernel,
	mingo, linux-arch, linux, xiaoguangrong, wangyun, paulmck,
	nikunj, linux-pm, rusty, rostedt, rjw, namhyung, tglx,
	linux-arm-kernel, netdev, oleg, vincent.guittot, sbw, tj, akpm,
	linuxppc-dev

On 02/19/2013 04:12 PM, David Laight wrote:
>> I wouldn't go that far... ;-) Unfairness is not a show-stopper right?
>> IMHO, the warning/documentation should suffice for anybody wanting to
>> try out this locking scheme for other use-cases.
> 
> I presume that by 'fairness' you mean 'write preference'?
> 

Yep.

> I'm not sure how difficult it would be, but maybe have two functions
> for acquiring the lock for read: one that blocks if there is a writer
> waiting, and one that doesn't.
> 
> That way you can change the individual call sites separately.
>

Right, we could probably use that method to change the call sites in
multiple stages, in the future.
 
> The other place I can imagine a per-cpu rwlock being used
> is to allow a driver to disable 'sleep' or software controlled
> hardware removal while it performs a sequence of operations.
> 

BTW, per-cpu rwlocks use spinlocks underneath, so they can be used only
in atomic contexts (you can't sleep holding this lock). That would
probably make them less attractive or even useless for "heavy-weight"
use-cases like the latter one you mentioned. They would probably need a
per-cpu rw-semaphore or some such, which allows sleeping. I'm not very
certain of the exact use-cases you are talking about, but I just wanted
to point out that percpu-rwlocks might not be applicable to many
scenarios... (which might be a good thing, considering their unfair
property today).
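
For completeness, a sleepable use-case like the driver example above
would map more naturally onto the existing per-cpu rw-semaphore; a
minimal sketch (the 'mydev' structure and callbacks are made up for
illustration):

#include <linux/percpu-rwsem.h>

struct mydev {
	struct percpu_rw_semaphore	removal_sem;
	/* ... other driver state ... */
};

static int mydev_init(struct mydev *dev)
{
	return percpu_init_rwsem(&dev->removal_sem);
}

/*
 * "Reader": a sequence of operations that must not race with removal.
 * This may sleep, so unlike get_online_cpus_atomic() it cannot be
 * called from atomic context.
 */
static void mydev_do_sequence(struct mydev *dev)
{
	percpu_down_read(&dev->removal_sem);
	/* ... perform the operations, possibly sleeping ... */
	percpu_up_read(&dev->removal_sem);
}

/* "Writer": software-controlled removal or sleep transition. */
static void mydev_remove(struct mydev *dev)
{
	percpu_down_write(&dev->removal_sem);
	/* ... tear the device down ... */
	percpu_up_write(&dev->removal_sem);
}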

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 00/46] CPU hotplug: stop_machine()-free CPU hotplug
  2013-02-18 12:38 ` Srivatsa S. Bhat
                     ` (2 preceding siblings ...)
  (?)
@ 2013-02-22  0:31   ` Rusty Russell
  -1 siblings, 0 replies; 360+ messages in thread
From: Rusty Russell @ 2013-02-22  0:31 UTC (permalink / raw)
  To: Srivatsa S. Bhat, tglx, peterz, tj, oleg, paulmck, mingo, akpm, namhyung
  Cc: rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, srivatsa.bhat, linux-pm, linux-arch, linux-arm-kernel,
	linuxppc-dev, netdev, linux-doc, linux-kernel, walken,
	vincent.guittot

"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> writes:
> Hi,
>
> This patchset removes CPU hotplug's dependence on stop_machine() from the CPU
> offline path and provides an alternative (set of APIs) to preempt_disable() to
> prevent CPUs from going offline, which can be invoked from atomic context.
> The motivation behind the removal of stop_machine() is to avoid its ill-effects
> and thus improve the design of CPU hotplug. (More description regarding this
> is available in the patches).

If you're doing a v7, please put your benchmark results somewhere!

The obvious place is in the 44/46.

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-18 18:14             ` Srivatsa S. Bhat
  (?)
  (?)
@ 2013-02-25 15:53               ` Lai Jiangshan
  -1 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-02-25 15:53 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Michel Lespinasse, linux-doc, peterz, fweisbec, linux-kernel,
	namhyung, mingo, linux-arch, linux, xiaoguangrong, wangyun,
	paulmck, nikunj, linux-pm, rusty, rostedt, rjw, vincent.guittot,
	tglx, linux-arm-kernel, netdev, oleg, sbw, tj, akpm,
	linuxppc-dev

Hi, Srivatsa,

The target of the whole patchset is nice for me.
A question: how did you find out such usages of "preempt_disable()"
and convert them? Were all of them converted?

And I think the lock is too complex and reinvents the wheel; why don't
you reuse the lglock?
I wrote an untested draft here.

Thanks,
Lai

PS: Some HA tools (I'm writing one) take checkpoints of virtual
machines frequently; I guess this patchset can speed up such tools.

From 01db542693a1b7fc6f9ece45d57cb529d9be5b66 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs@cn.fujitsu.com>
Date: Mon, 25 Feb 2013 23:14:27 +0800
Subject: [PATCH] lglock: add read-preference local-global rwlock

locality via lglock(trylock)
read-preference read-write-lock via fallback rwlock_t

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/linux/lglock.h |   31 +++++++++++++++++++++++++++++++
 kernel/lglock.c        |   45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+), 0 deletions(-)

diff --git a/include/linux/lglock.h b/include/linux/lglock.h
index 0d24e93..30fe887 100644
--- a/include/linux/lglock.h
+++ b/include/linux/lglock.h
@@ -67,4 +67,35 @@ void lg_local_unlock_cpu(struct lglock *lg, int cpu);
 void lg_global_lock(struct lglock *lg);
 void lg_global_unlock(struct lglock *lg);

+struct lgrwlock {
+	unsigned long __percpu *fallback_reader_refcnt;
+	struct lglock lglock;
+	rwlock_t fallback_rwlock;
+};
+
+#define DEFINE_LGRWLOCK(name)						\
+	static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)		\
+	= __ARCH_SPIN_LOCK_UNLOCKED;					\
+	static DEFINE_PER_CPU(unsigned long, name ## _refcnt);		\
+	struct lgrwlock name = {					\
+		.fallback_reader_refcnt = &name ## _refcnt,		\
+		.lglock = { .lock = &name ## _lock } }
+
+#define DEFINE_STATIC_LGRWLOCK(name)					\
+	static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)		\
+	= __ARCH_SPIN_LOCK_UNLOCKED;					\
+	static DEFINE_PER_CPU(unsigned long, name ## _refcnt);		\
+	static struct lgrwlock name = {					\
+		.fallback_reader_refcnt = &name ## _refcnt,		\
+		.lglock = { .lock = &name ## _lock } }
+
+static inline void lg_rwlock_init(struct lgrwlock *lgrw, char *name)
+{
+	lg_lock_init(&lgrw->lglock, name);
+}
+
+void lg_rwlock_local_read_lock(struct lgrwlock *lgrw);
+void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw);
+void lg_rwlock_global_write_lock(struct lgrwlock *lgrw);
+void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw);
 #endif
diff --git a/kernel/lglock.c b/kernel/lglock.c
index 6535a66..463543a 100644
--- a/kernel/lglock.c
+++ b/kernel/lglock.c
@@ -87,3 +87,48 @@ void lg_global_unlock(struct lglock *lg)
 	preempt_enable();
 }
 EXPORT_SYMBOL(lg_global_unlock);
+
+void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
+{
+	struct lglock *lg = &lgrw->lglock;
+
+	preempt_disable();
+	if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
+		if (likely(arch_spin_trylock(this_cpu_ptr(lg->lock)))) {
+			rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
+			return;
+		}
+		read_lock(&lgrw->fallback_rwlock);
+	}
+
+	__this_cpu_inc(*lgrw->fallback_reader_refcnt);
+}
+EXPORT_SYMBOL(lg_rwlock_local_read_lock);
+
+void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
+{
+	if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
+		lg_local_unlock(&lgrw->lglock);
+		return;
+	}
+
+	if (!__this_cpu_dec_return(*lgrw->fallback_reader_refcnt))
+		read_unlock(&lgrw->fallback_rwlock);
+
+	preempt_enable();
+}
+EXPORT_SYMBOL(lg_rwlock_local_read_unlock);
+
+void lg_rwlock_global_write_lock(struct lgrwlock *lgrw)
+{
+	lg_global_lock(&lgrw->lglock);
+	write_lock(&lgrw->fallback_rwlock);
+}
+EXPORT_SYMBOL(lg_rwlock_global_write_lock);
+
+void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw)
+{
+	write_unlock(&lgrw->fallback_rwlock);
+	lg_global_unlock(&lgrw->lglock);
+}
+EXPORT_SYMBOL(lg_rwlock_global_write_unlock);
-- 
1.7.7.6
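
For reference, a minimal usage sketch of the lgrwlock API proposed in
the draft above (the lock name and critical sections are made up;
depending on the final form, the fallback rwlock may also need explicit
initialization):

DEFINE_STATIC_LGRWLOCK(my_lgrw);

/* Reader side: stays on the per-CPU lglock in the common case. */
static void my_reader(void)
{
	lg_rwlock_local_read_lock(&my_lgrw);
	/* ... read-side critical section (atomic context) ... */
	lg_rwlock_local_read_unlock(&my_lgrw);
}

/* Writer side: excludes readers on all CPUs. */
static void my_writer(void)
{
	lg_rwlock_global_write_lock(&my_lgrw);
	/* ... write-side critical section ... */
	lg_rwlock_global_write_unlock(&my_lgrw);
}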

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-25 15:53               ` Lai Jiangshan
  (?)
@ 2013-02-25 19:26                 ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-25 19:26 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Michel Lespinasse, linux-doc, peterz, fweisbec, linux-kernel,
	namhyung, mingo, linux-arch, linux, xiaoguangrong, wangyun,
	paulmck, nikunj, linux-pm, rusty, rostedt, rjw, vincent.guittot,
	tglx, linux-arm-kernel, netdev, oleg, sbw, tj, akpm,
	linuxppc-dev

Hi Lai,

On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
> Hi, Srivatsa,
> 
> The target of the whole patchset is nice for me.

Cool! Thanks :-)

> A question: how did you find out such usages of "preempt_disable()"
> and convert them? Were all of them converted?
> 

Well, I scanned through the source tree for usages which implicitly
disabled CPU offline and converted them over. It's not limited to uses
of preempt_disable() alone - even spin_locks, rwlocks, local_irq_disable()
etc. also help disable CPU offline. So I tried to dig out all such uses
and converted them. However, since the merge window is open, a lot of
new code is flowing into the tree. So I'll have to rescan the tree to
see if there are any more places to convert.

> And I think the lock is too complex and reinvents the wheel; why don't
> you reuse the lglock?

lglocks? No way! ;-) See below...

> I wrote an untested draft here.
> 
> Thanks,
> Lai
> 
> PS: Some HA tools (I'm writing one) take checkpoints of virtual
> machines frequently; I guess this patchset can speed up such tools.
> 
> From 01db542693a1b7fc6f9ece45d57cb529d9be5b66 Mon Sep 17 00:00:00 2001
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> Date: Mon, 25 Feb 2013 23:14:27 +0800
> Subject: [PATCH] lglock: add read-preference local-global rwlock
> 
> locality via lglock(trylock)
> read-preference read-write-lock via fallback rwlock_t
> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
>  include/linux/lglock.h |   31 +++++++++++++++++++++++++++++++
>  kernel/lglock.c        |   45 +++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 76 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/lglock.h b/include/linux/lglock.h
> index 0d24e93..30fe887 100644
> --- a/include/linux/lglock.h
> +++ b/include/linux/lglock.h
> @@ -67,4 +67,35 @@ void lg_local_unlock_cpu(struct lglock *lg, int cpu);
>  void lg_global_lock(struct lglock *lg);
>  void lg_global_unlock(struct lglock *lg);
> 
> +struct lgrwlock {
> +	unsigned long __percpu *fallback_reader_refcnt;
> +	struct lglock lglock;
> +	rwlock_t fallback_rwlock;
> +};
> +
> +#define DEFINE_LGRWLOCK(name)						\
> +	static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)		\
> +	= __ARCH_SPIN_LOCK_UNLOCKED;					\
> +	static DEFINE_PER_CPU(unsigned long, name ## _refcnt);		\
> +	struct lgrwlock name = {					\
> +		.fallback_reader_refcnt = &name ## _refcnt,		\
> +		.lglock = { .lock = &name ## _lock } }
> +
> +#define DEFINE_STATIC_LGRWLOCK(name)					\
> +	static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)		\
> +	= __ARCH_SPIN_LOCK_UNLOCKED;					\
> +	static DEFINE_PER_CPU(unsigned long, name ## _refcnt);		\
> +	static struct lgrwlock name = {					\
> +		.fallback_reader_refcnt = &name ## _refcnt,		\
> +		.lglock = { .lock = &name ## _lock } }
> +
> +static inline void lg_rwlock_init(struct lgrwlock *lgrw, char *name)
> +{
> +	lg_lock_init(&lgrw->lglock, name);
> +}
> +
> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw);
> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw);
> +void lg_rwlock_global_write_lock(struct lgrwlock *lgrw);
> +void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw);
>  #endif
> diff --git a/kernel/lglock.c b/kernel/lglock.c
> index 6535a66..463543a 100644
> --- a/kernel/lglock.c
> +++ b/kernel/lglock.c
> @@ -87,3 +87,48 @@ void lg_global_unlock(struct lglock *lg)
>  	preempt_enable();
>  }
>  EXPORT_SYMBOL(lg_global_unlock);
> +
> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
> +{
> +	struct lglock *lg = &lgrw->lglock;
> +
> +	preempt_disable();
> +	if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
> +		if (likely(arch_spin_trylock(this_cpu_ptr(lg->lock)))) {
> +			rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
> +			return;
> +		}
> +		read_lock(&lgrw->fallback_rwlock);
> +	}
> +
> +	__this_cpu_inc(*lgrw->fallback_reader_refcnt);
> +}
> +EXPORT_SYMBOL(lg_rwlock_local_read_lock);
> +
> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
> +{
> +	if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
> +		lg_local_unlock(&lgrw->lglock);
> +		return;
> +	}
> +
> +	if (!__this_cpu_dec_return(*lgrw->fallback_reader_refcnt))
> +		read_unlock(&lgrw->fallback_rwlock);
> +
> +	preempt_enable();
> +}
> +EXPORT_SYMBOL(lg_rwlock_local_read_unlock);
> +

If I read the code above correctly, all you are doing is implementing a
recursive reader-side primitive (i.e., allowing the reader to call these
functions recursively, without resulting in a self-deadlock).

But the thing is, making the reader-side recursive is the least of our
problems! Our main challenge is to make the locking extremely flexible
and also safeguard it against circular locking dependencies and deadlocks.
Please take a look at the changelog of patch 1 - it explains the situation
with an example.

> +void lg_rwlock_global_write_lock(struct lgrwlock *lgrw)
> +{
> +	lg_global_lock(&lgrw->lglock);

This does a for-loop on all CPUs and takes their locks one-by-one. That's
exactly what we want to prevent, because that is the _source_ of all our
deadlock woes in this case. In the presence of perfect lock ordering
guarantees, this wouldn't have been a problem (that's why lglocks are
being used successfully elsewhere in the kernel). In the stop-machine()
removal case, the over-flexibility of preempt_disable() forces us to provide
an equally flexible locking alternative. Hence we can't use such per-cpu
locking schemes.

You might note that, for exactly this reason, I haven't actually used any
per-cpu _locks_ in this synchronization scheme, though it is named as
"per-cpu rwlocks". The only per-cpu component here are the refcounts, and
we consciously avoid waiting/spinning on them (because then that would be
equivalent to having per-cpu locks, which are deadlock-prone). We use
global rwlocks to get the deadlock-safety that we need.
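
To make that concrete, here is a much-simplified sketch of the reader
side of the idea described above (names, barriers and the writer's
draining logic are omitted or made up - this is not the patchset's
actual code):

struct percpu_rwlock_sketch {
	unsigned int __percpu	*reader_refcnt;
	bool			writer_active;
	rwlock_t		global_rwlock;
};

static void sketch_read_lock(struct percpu_rwlock_sketch *pcpu_rwlock)
{
	preempt_disable();

	/*
	 * Nested readers and the no-writer fastpath only update the
	 * per-CPU refcount - there is nothing to spin on, hence no
	 * deadlock potential.  (Memory barriers omitted for brevity.)
	 */
	if (__this_cpu_read(*pcpu_rwlock->reader_refcnt) ||
	    likely(!ACCESS_ONCE(pcpu_rwlock->writer_active))) {
		__this_cpu_inc(*pcpu_rwlock->reader_refcnt);
		return;
	}

	/*
	 * Slowpath: a writer is around, so fall back to the global
	 * rwlock.  rwlock_t readers are unfair and always make
	 * progress, which is what keeps this deadlock-safe.
	 */
	read_lock(&pcpu_rwlock->global_rwlock);
}

static void sketch_read_unlock(struct percpu_rwlock_sketch *pcpu_rwlock)
{
	if (__this_cpu_read(*pcpu_rwlock->reader_refcnt))
		__this_cpu_dec(*pcpu_rwlock->reader_refcnt);
	else
		read_unlock(&pcpu_rwlock->global_rwlock);

	preempt_enable();
}

/*
 * The writer would set writer_active, wait (with the appropriate
 * barriers) for every CPU's reader_refcnt to drain, and then take
 * global_rwlock for writing.
 */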

> +	write_lock(&lgrw->fallback_rwlock);
> +}
> +EXPORT_SYMBOL(lg_rwlock_global_write_lock);
> +
> +void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw)
> +{
> +	write_unlock(&lgrw->fallback_rwlock);
> +	lg_global_unlock(&lgrw->lglock);
> +}
> +EXPORT_SYMBOL(lg_rwlock_global_write_unlock);
> 

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
@ 2013-02-25 19:26                 ` Srivatsa S. Bhat
  0 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-25 19:26 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-doc, peterz, fweisbec, linux-kernel, Michel Lespinasse,
	mingo, linux-arch, linux, xiaoguangrong, wangyun, paulmck,
	nikunj, linux-pm, rusty, rostedt, rjw, namhyung, tglx,
	linux-arm-kernel, netdev, oleg, vincent.guittot, sbw, tj, akpm,
	linuxppc-dev

Hi Lai,

On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
> Hi, Srivatsa,
> 
> The target of the whole patchset is nice for me.

Cool! Thanks :-)

> A question: How did you find out the such usages of
> "preempt_disable()" and convert them? did all are converted?
> 

Well, I scanned through the source tree for usages which implicitly
disabled CPU offline and converted them over. Its not limited to uses
of preempt_disable() alone - even spin_locks, rwlocks, local_irq_disable()
etc also help disable CPU offline. So I tried to dig out all such uses
and converted them. However, since the merge window is open, a lot of
new code is flowing into the tree. So I'll have to rescan the tree to
see if there are any more places to convert.

> And I think the lock is too complex and reinvent the wheel, why don't
> you reuse the lglock?

lglocks? No way! ;-) See below...

> I wrote an untested draft here.
> 
> Thanks,
> Lai
> 
> PS: Some HA tools(I'm writing one) which takes checkpoints of
> virtual-machines frequently, I guess this patchset can speedup the
> tools.
> 
> From 01db542693a1b7fc6f9ece45d57cb529d9be5b66 Mon Sep 17 00:00:00 2001
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> Date: Mon, 25 Feb 2013 23:14:27 +0800
> Subject: [PATCH] lglock: add read-preference local-global rwlock
> 
> locality via lglock(trylock)
> read-preference read-write-lock via fallback rwlock_t
> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
>  include/linux/lglock.h |   31 +++++++++++++++++++++++++++++++
>  kernel/lglock.c        |   45 +++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 76 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/lglock.h b/include/linux/lglock.h
> index 0d24e93..30fe887 100644
> --- a/include/linux/lglock.h
> +++ b/include/linux/lglock.h
> @@ -67,4 +67,35 @@ void lg_local_unlock_cpu(struct lglock *lg, int cpu);
>  void lg_global_lock(struct lglock *lg);
>  void lg_global_unlock(struct lglock *lg);
> 
> +struct lgrwlock {
> +	unsigned long __percpu *fallback_reader_refcnt;
> +	struct lglock lglock;
> +	rwlock_t fallback_rwlock;
> +};
> +
> +#define DEFINE_LGRWLOCK(name)						\
> +	static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)		\
> +	= __ARCH_SPIN_LOCK_UNLOCKED;					\
> +	static DEFINE_PER_CPU(unsigned long, name ## _refcnt);		\
> +	struct lgrwlock name = {					\
> +		.fallback_reader_refcnt = &name ## _refcnt,		\
> +		.lglock = { .lock = &name ## _lock } }
> +
> +#define DEFINE_STATIC_LGRWLOCK(name)					\
> +	static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)		\
> +	= __ARCH_SPIN_LOCK_UNLOCKED;					\
> +	static DEFINE_PER_CPU(unsigned long, name ## _refcnt);		\
> +	static struct lgrwlock name = {					\
> +		.fallback_reader_refcnt = &name ## _refcnt,		\
> +		.lglock = { .lock = &name ## _lock } }
> +
> +static inline void lg_rwlock_init(struct lgrwlock *lgrw, char *name)
> +{
> +	lg_lock_init(&lgrw->lglock, name);
> +}
> +
> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw);
> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw);
> +void lg_rwlock_global_write_lock(struct lgrwlock *lgrw);
> +void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw);
>  #endif
> diff --git a/kernel/lglock.c b/kernel/lglock.c
> index 6535a66..463543a 100644
> --- a/kernel/lglock.c
> +++ b/kernel/lglock.c
> @@ -87,3 +87,48 @@ void lg_global_unlock(struct lglock *lg)
>  	preempt_enable();
>  }
>  EXPORT_SYMBOL(lg_global_unlock);
> +
> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
> +{
> +	struct lglock *lg = &lgrw->lglock;
> +
> +	preempt_disable();
> +	if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
> +		if (likely(arch_spin_trylock(this_cpu_ptr(lg->lock)))) {
> +			rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
> +			return;
> +		}
> +		read_lock(&lgrw->fallback_rwlock);
> +	}
> +
> +	__this_cpu_inc(*lgrw->fallback_reader_refcnt);
> +}
> +EXPORT_SYMBOL(lg_rwlock_local_read_lock);
> +
> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
> +{
> +	if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
> +		lg_local_unlock(&lgrw->lglock);
> +		return;
> +	}
> +
> +	if (!__this_cpu_dec_return(*lgrw->fallback_reader_refcnt))
> +		read_unlock(&lgrw->fallback_rwlock);
> +
> +	preempt_enable();
> +}
> +EXPORT_SYMBOL(lg_rwlock_local_read_unlock);
> +

If I read the code above correctly, all you are doing is implementing a
recursive reader-side primitive (i.e., allowing the reader to call these
functions recursively, without resulting in a self-deadlock).

But the thing is, making the reader-side recursive is the least of our
problems! Our main challenge is to make the locking extremely flexible
and also safeguard it against circular locking dependencies and deadlocks.
Please take a look at the changelog of patch 1 - it explains the situation
with an example.

> +void lg_rwlock_global_write_lock(struct lgrwlock *lgrw)
> +{
> +	lg_global_lock(&lgrw->lglock);

This does a for-loop on all CPUs and takes their locks one-by-one. That's
exactly what we want to prevent, because that is the _source_ of all our
deadlock woes in this case. In the presence of perfect lock ordering
guarantees, this wouldn't have been a problem (that's why lglocks are
being used successfully elsewhere in the kernel). In the stop-machine()
removal case, the over-flexibility of preempt_disable() forces us to provide
an equally flexible locking alternative. Hence we can't use such per-cpu
locking schemes.

You might note that, for exactly this reason, I haven't actually used any
per-cpu _locks_ in this synchronization scheme, even though it is named
"per-cpu rwlocks". The only per-CPU components here are the refcounts, and
we consciously avoid waiting/spinning on them (because that would be
equivalent to having per-cpu locks, which are deadlock-prone). We use
global rwlocks to get the deadlock-safety that we need.
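
To make that concrete, the reader side of the idea looks roughly like
this (a simplified sketch with made-up names such as 'writer_active',
and with the memory-barrier details that were sorted out for v6 left
out - the real implementation is in patches 1-7):

#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/spinlock.h>
#include <linux/compiler.h>
#include <linux/types.h>

struct percpu_rwlock_sketch {
        unsigned int __percpu   *reader_refcnt; /* the only per-CPU state */
        bool                    writer_active;  /* illustrative flag */
        rwlock_t                global_rwlock;  /* deadlock-safety lives here */
};

static void sketch_read_lock(struct percpu_rwlock_sketch *prw)
{
        preempt_disable();

        /*
         * Nested readers, and readers running while no writer is
         * around, only touch their own CPU's refcount - they never
         * spin on per-CPU state.
         */
        if (__this_cpu_read(*prw->reader_refcnt) ||
            likely(!ACCESS_ONCE(prw->writer_active))) {
                this_cpu_inc(*prw->reader_refcnt);
        } else {
                /* Writer active: synchronize via the *global* rwlock. */
                read_lock(&prw->global_rwlock);
        }
}

static void sketch_read_unlock(struct percpu_rwlock_sketch *prw)
{
        if (__this_cpu_read(*prw->reader_refcnt))
                this_cpu_dec(*prw->reader_refcnt);
        else
                read_unlock(&prw->global_rwlock);

        preempt_enable();
}

The writer side (not shown) sets writer_active, waits for the per-CPU
refcounts to drain, and then takes global_rwlock for writing - so the
only thing anyone ever spins on is the global rwlock, never any
per-CPU lock.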

> +	write_lock(&lgrw->fallback_rwlock);
> +}
> +EXPORT_SYMBOL(lg_rwlock_global_write_lock);
> +
> +void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw)
> +{
> +	write_unlock(&lgrw->fallback_rwlock);
> +	lg_global_unlock(&lgrw->lglock);
> +}
> +EXPORT_SYMBOL(lg_rwlock_global_write_unlock);
> 

Regards,
Srivatsa S. Bhat

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 00/46] CPU hotplug: stop_machine()-free CPU hotplug
  2013-02-22  0:31   ` Rusty Russell
  (?)
@ 2013-02-25 21:45     ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-25 21:45 UTC (permalink / raw)
  To: Rusty Russell
  Cc: tglx, peterz, tj, oleg, paulmck, mingo, akpm, namhyung, rostedt,
	wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux, nikunj,
	linux-pm, linux-arch, linux-arm-kernel, linuxppc-dev, netdev,
	linux-doc, linux-kernel, walken, vincent.guittot

On 02/22/2013 06:01 AM, Rusty Russell wrote:
> "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> writes:
>> Hi,
>>
>> This patchset removes CPU hotplug's dependence on stop_machine() from the CPU
>> offline path and provides an alternative (set of APIs) to preempt_disable() to
>> prevent CPUs from going offline, which can be invoked from atomic context.
>> The motivation behind the removal of stop_machine() is to avoid its ill-effects
>> and thus improve the design of CPU hotplug. (More description regarding this
>> is available in the patches).
> 
> If you're doing a v7, please put your benchmark results somewhere!
> 

Oh, I forgot to put them in v6! Thanks for reminding me :-)
And yes, I'll have to do a v7 to incorporate changes (if any) to the new code
that went in during this merge window.

> The obvious place is in the 44/46.
>

Ok, will add it there. Thank you!
 
Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-25 19:26                 ` Srivatsa S. Bhat
  (?)
@ 2013-02-26  0:17                   ` Lai Jiangshan
  -1 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-02-26  0:17 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Michel Lespinasse, linux-doc, peterz, fweisbec, linux-kernel,
	namhyung, mingo, linux-arch, linux, xiaoguangrong, wangyun,
	paulmck, nikunj, linux-pm, rusty, rostedt, rjw, vincent.guittot,
	tglx, linux-arm-kernel, netdev, oleg, sbw, tj, akpm,
	linuxppc-dev

On Tue, Feb 26, 2013 at 3:26 AM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> Hi Lai,
>
> On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
>> Hi, Srivatsa,
>>
>> The target of the whole patchset is nice for me.
>
> Cool! Thanks :-)
>
>> A question: How did you find out the such usages of
>> "preempt_disable()" and convert them? did all are converted?
>>
>
> Well, I scanned through the source tree for usages which implicitly
> disabled CPU offline and converted them over. Its not limited to uses
> of preempt_disable() alone - even spin_locks, rwlocks, local_irq_disable()
> etc also help disable CPU offline. So I tried to dig out all such uses
> and converted them. However, since the merge window is open, a lot of
> new code is flowing into the tree. So I'll have to rescan the tree to
> see if there are any more places to convert.
>
>> And I think the lock is too complex and reinvent the wheel, why don't
>> you reuse the lglock?
>
> lglocks? No way! ;-) See below...
>
>> I wrote an untested draft here.
>>
>> Thanks,
>> Lai
>>
>> PS: Some HA tools(I'm writing one) which takes checkpoints of
>> virtual-machines frequently, I guess this patchset can speedup the
>> tools.
>>
>> From 01db542693a1b7fc6f9ece45d57cb529d9be5b66 Mon Sep 17 00:00:00 2001
>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Date: Mon, 25 Feb 2013 23:14:27 +0800
>> Subject: [PATCH] lglock: add read-preference local-global rwlock
>>
>> locality via lglock(trylock)
>> read-preference read-write-lock via fallback rwlock_t
>>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> ---
>>  include/linux/lglock.h |   31 +++++++++++++++++++++++++++++++
>>  kernel/lglock.c        |   45 +++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 76 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/lglock.h b/include/linux/lglock.h
>> index 0d24e93..30fe887 100644
>> --- a/include/linux/lglock.h
>> +++ b/include/linux/lglock.h
>> @@ -67,4 +67,35 @@ void lg_local_unlock_cpu(struct lglock *lg, int cpu);
>>  void lg_global_lock(struct lglock *lg);
>>  void lg_global_unlock(struct lglock *lg);
>>
>> +struct lgrwlock {
>> +     unsigned long __percpu *fallback_reader_refcnt;
>> +     struct lglock lglock;
>> +     rwlock_t fallback_rwlock;
>> +};
>> +
>> +#define DEFINE_LGRWLOCK(name)                                                \
>> +     static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)           \
>> +     = __ARCH_SPIN_LOCK_UNLOCKED;                                    \
>> +     static DEFINE_PER_CPU(unsigned long, name ## _refcnt);          \
>> +     struct lgrwlock name = {                                        \
>> +             .fallback_reader_refcnt = &name ## _refcnt,             \
>> +             .lglock = { .lock = &name ## _lock } }
>> +
>> +#define DEFINE_STATIC_LGRWLOCK(name)                                 \
>> +     static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)           \
>> +     = __ARCH_SPIN_LOCK_UNLOCKED;                                    \
>> +     static DEFINE_PER_CPU(unsigned long, name ## _refcnt);          \
>> +     static struct lgrwlock name = {                                 \
>> +             .fallback_reader_refcnt = &name ## _refcnt,             \
>> +             .lglock = { .lock = &name ## _lock } }
>> +
>> +static inline void lg_rwlock_init(struct lgrwlock *lgrw, char *name)
>> +{
>> +     lg_lock_init(&lgrw->lglock, name);
>> +}
>> +
>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw);
>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw);
>> +void lg_rwlock_global_write_lock(struct lgrwlock *lgrw);
>> +void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw);
>>  #endif
>> diff --git a/kernel/lglock.c b/kernel/lglock.c
>> index 6535a66..463543a 100644
>> --- a/kernel/lglock.c
>> +++ b/kernel/lglock.c
>> @@ -87,3 +87,48 @@ void lg_global_unlock(struct lglock *lg)
>>       preempt_enable();
>>  }
>>  EXPORT_SYMBOL(lg_global_unlock);
>> +
>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
>> +{
>> +     struct lglock *lg = &lgrw->lglock;
>> +
>> +     preempt_disable();
>> +     if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
>> +             if (likely(arch_spin_trylock(this_cpu_ptr(lg->lock)))) {
>> +                     rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
>> +                     return;
>> +             }
>> +             read_lock(&lgrw->fallback_rwlock);
>> +     }
>> +
>> +     __this_cpu_inc(*lgrw->fallback_reader_refcnt);
>> +}
>> +EXPORT_SYMBOL(lg_rwlock_local_read_lock);
>> +
>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
>> +{
>> +     if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
>> +             lg_local_unlock(&lgrw->lglock);
>> +             return;
>> +     }
>> +
>> +     if (!__this_cpu_dec_return(*lgrw->fallback_reader_refcnt))
>> +             read_unlock(&lgrw->fallback_rwlock);
>> +
>> +     preempt_enable();
>> +}
>> +EXPORT_SYMBOL(lg_rwlock_local_read_unlock);
>> +
>
> If I read the code above correctly, all you are doing is implementing a
> recursive reader-side primitive (ie., allowing the reader to call these
> functions recursively, without resulting in a self-deadlock).
>
> But the thing is, making the reader-side recursive is the least of our
> problems! Our main challenge is to make the locking extremely flexible
> and also safe-guard it against circular-locking-dependencies and deadlocks.
> Please take a look at the changelog of patch 1 - it explains the situation
> with an example.


My lock fixes your requirements (I read patches 1-6 before I sent this).
On the read side, the lglock's per-CPU lock is only taken via trylock,
so the lglock doesn't contribute to deadlocks - we can treat it as if it
didn't exist when looking for deadlocks. And the global fallback rwlock
doesn't lead to deadlocks because it is read-preference (you need to inc
the fallback_reader_refcnt inside the cpu-hotplug write-side; I don't do
that in the generic lgrwlock).


If lg_rwlock_local_read_lock() spins, it means it is spinning on
fallback_rwlock, which means lg_rwlock_global_write_lock() already took
the lgrwlock successfully and returned, which in turn means
lg_rwlock_local_read_lock() will stop spinning as soon as the write
side finishes.
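
To be concrete, the cpu-hotplug-specific write side I have in mind
would look something like this (an untested sketch, like the rest of
the draft; the function names are made up):

/*
 * The hotplug writer marks itself as a fallback reader on its own
 * CPU, so that any reader it recursively enters on this CPU takes
 * the read-preference rwlock_t path instead of spinning on a per-CPU
 * lock that the writer already holds.
 */
void cpu_hotplug_write_lock(struct lgrwlock *lgrw)
{
        lg_rwlock_global_write_lock(lgrw);
        __this_cpu_inc(*lgrw->fallback_reader_refcnt);
}

void cpu_hotplug_write_unlock(struct lgrwlock *lgrw)
{
        __this_cpu_dec(*lgrw->fallback_reader_refcnt);
        lg_rwlock_global_write_unlock(lgrw);
}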


>
>> +void lg_rwlock_global_write_lock(struct lgrwlock *lgrw)
>> +{
>> +     lg_global_lock(&lgrw->lglock);
>
> This does a for-loop on all CPUs and takes their locks one-by-one. That's
> exactly what we want to prevent, because that is the _source_ of all our
> deadlock woes in this case. In the presence of perfect lock ordering
> guarantees, this wouldn't have been a problem (that's why lglocks are
> being used successfully elsewhere in the kernel). In the stop-machine()
> removal case, the over-flexibility of preempt_disable() forces us to provide
> an equally flexible locking alternative. Hence we can't use such per-cpu
> locking schemes.
>
> You might note that, for exactly this reason, I haven't actually used any
> per-cpu _locks_ in this synchronization scheme, though it is named as
> "per-cpu rwlocks". The only per-cpu component here are the refcounts, and
> we consciously avoid waiting/spinning on them (because then that would be
> equivalent to having per-cpu locks, which are deadlock-prone). We use
> global rwlocks to get the deadlock-safety that we need.
>
>> +     write_lock(&lgrw->fallback_rwlock);
>> +}
>> +EXPORT_SYMBOL(lg_rwlock_global_write_lock);
>> +
>> +void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw)
>> +{
>> +     write_unlock(&lgrw->fallback_rwlock);
>> +     lg_global_unlock(&lgrw->lglock);
>> +}
>> +EXPORT_SYMBOL(lg_rwlock_global_write_unlock);
>>
>
> Regards,
> Srivatsa S. Bhat
>

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-26  0:17                   ` Lai Jiangshan
  (?)
@ 2013-02-26  0:19                     ` Lai Jiangshan
  -1 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-02-26  0:19 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Michel Lespinasse, linux-doc, peterz, fweisbec, linux-kernel,
	namhyung, mingo, linux-arch, linux, xiaoguangrong, wangyun,
	paulmck, nikunj, linux-pm, rusty, rostedt, rjw, vincent.guittot,
	tglx, linux-arm-kernel, netdev, oleg, sbw, tj, akpm,
	linuxppc-dev

On Tue, Feb 26, 2013 at 8:17 AM, Lai Jiangshan <eag0628@gmail.com> wrote:
> On Tue, Feb 26, 2013 at 3:26 AM, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> Hi Lai,
>>
>> On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
>>> Hi, Srivatsa,
>>>
>>> The target of the whole patchset is nice for me.
>>
>> Cool! Thanks :-)
>>
>>> A question: How did you find out the such usages of
>>> "preempt_disable()" and convert them? did all are converted?
>>>
>>
>> Well, I scanned through the source tree for usages which implicitly
>> disabled CPU offline and converted them over. Its not limited to uses
>> of preempt_disable() alone - even spin_locks, rwlocks, local_irq_disable()
>> etc also help disable CPU offline. So I tried to dig out all such uses
>> and converted them. However, since the merge window is open, a lot of
>> new code is flowing into the tree. So I'll have to rescan the tree to
>> see if there are any more places to convert.
>>
>>> And I think the lock is too complex and reinvent the wheel, why don't
>>> you reuse the lglock?
>>
>> lglocks? No way! ;-) See below...
>>
>>> I wrote an untested draft here.
>>>
>>> Thanks,
>>> Lai
>>>
>>> PS: Some HA tools(I'm writing one) which takes checkpoints of
>>> virtual-machines frequently, I guess this patchset can speedup the
>>> tools.
>>>
>>> From 01db542693a1b7fc6f9ece45d57cb529d9be5b66 Mon Sep 17 00:00:00 2001
>>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>>> Date: Mon, 25 Feb 2013 23:14:27 +0800
>>> Subject: [PATCH] lglock: add read-preference local-global rwlock
>>>
>>> locality via lglock(trylock)
>>> read-preference read-write-lock via fallback rwlock_t
>>>
>>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>> ---
>>>  include/linux/lglock.h |   31 +++++++++++++++++++++++++++++++
>>>  kernel/lglock.c        |   45 +++++++++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 76 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/include/linux/lglock.h b/include/linux/lglock.h
>>> index 0d24e93..30fe887 100644
>>> --- a/include/linux/lglock.h
>>> +++ b/include/linux/lglock.h
>>> @@ -67,4 +67,35 @@ void lg_local_unlock_cpu(struct lglock *lg, int cpu);
>>>  void lg_global_lock(struct lglock *lg);
>>>  void lg_global_unlock(struct lglock *lg);
>>>
>>> +struct lgrwlock {
>>> +     unsigned long __percpu *fallback_reader_refcnt;
>>> +     struct lglock lglock;
>>> +     rwlock_t fallback_rwlock;
>>> +};
>>> +
>>> +#define DEFINE_LGRWLOCK(name)                                                \
>>> +     static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)           \
>>> +     = __ARCH_SPIN_LOCK_UNLOCKED;                                    \
>>> +     static DEFINE_PER_CPU(unsigned long, name ## _refcnt);          \
>>> +     struct lgrwlock name = {                                        \
>>> +             .fallback_reader_refcnt = &name ## _refcnt,             \
>>> +             .lglock = { .lock = &name ## _lock } }
>>> +
>>> +#define DEFINE_STATIC_LGRWLOCK(name)                                 \
>>> +     static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)           \
>>> +     = __ARCH_SPIN_LOCK_UNLOCKED;                                    \
>>> +     static DEFINE_PER_CPU(unsigned long, name ## _refcnt);          \
>>> +     static struct lgrwlock name = {                                 \
>>> +             .fallback_reader_refcnt = &name ## _refcnt,             \
>>> +             .lglock = { .lock = &name ## _lock } }
>>> +
>>> +static inline void lg_rwlock_init(struct lgrwlock *lgrw, char *name)
>>> +{
>>> +     lg_lock_init(&lgrw->lglock, name);
>>> +}
>>> +
>>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw);
>>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw);
>>> +void lg_rwlock_global_write_lock(struct lgrwlock *lgrw);
>>> +void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw);
>>>  #endif
>>> diff --git a/kernel/lglock.c b/kernel/lglock.c
>>> index 6535a66..463543a 100644
>>> --- a/kernel/lglock.c
>>> +++ b/kernel/lglock.c
>>> @@ -87,3 +87,48 @@ void lg_global_unlock(struct lglock *lg)
>>>       preempt_enable();
>>>  }
>>>  EXPORT_SYMBOL(lg_global_unlock);
>>> +
>>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
>>> +{
>>> +     struct lglock *lg = &lgrw->lglock;
>>> +
>>> +     preempt_disable();
>>> +     if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
>>> +             if (likely(arch_spin_trylock(this_cpu_ptr(lg->lock)))) {
>>> +                     rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
>>> +                     return;
>>> +             }
>>> +             read_lock(&lgrw->fallback_rwlock);
>>> +     }
>>> +
>>> +     __this_cpu_inc(*lgrw->fallback_reader_refcnt);
>>> +}
>>> +EXPORT_SYMBOL(lg_rwlock_local_read_lock);
>>> +
>>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
>>> +{
>>> +     if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
>>> +             lg_local_unlock(&lgrw->lglock);
>>> +             return;
>>> +     }
>>> +
>>> +     if (!__this_cpu_dec_return(*lgrw->fallback_reader_refcnt))
>>> +             read_unlock(&lgrw->fallback_rwlock);
>>> +
>>> +     preempt_enable();
>>> +}
>>> +EXPORT_SYMBOL(lg_rwlock_local_read_unlock);
>>> +
>>
>> If I read the code above correctly, all you are doing is implementing a
>> recursive reader-side primitive (ie., allowing the reader to call these
>> functions recursively, without resulting in a self-deadlock).
>>
>> But the thing is, making the reader-side recursive is the least of our
>> problems! Our main challenge is to make the locking extremely flexible
>> and also safe-guard it against circular-locking-dependencies and deadlocks.
>> Please take a look at the changelog of patch 1 - it explains the situation
>> with an example.
>
>
> My lock fixes your requirements(I read patch 1-6 before I sent). In

s/fixes/fits/

> readsite, lglock 's lock is token via trylock, the lglock doesn't
> contribute to deadlocks, we can consider it doesn't exist when we find
> deadlock from it. And global fallback rwlock doesn't result to
> deadlocks because it is read-preference(you need to inc the
> fallback_reader_refcnt inside the cpu-hotplug write-side, I don't do
> it in generic lgrwlock)
>
>
> If lg_rwlock_local_read_lock() spins, which means
> lg_rwlock_local_read_lock() spins on fallback_rwlock, and which means
> lg_rwlock_global_write_lock() took the lgrwlock successfully and
> return, and which means lg_rwlock_local_read_lock() will stop spinning
> when the write side finished.
>
>
>>
>>> +void lg_rwlock_global_write_lock(struct lgrwlock *lgrw)
>>> +{
>>> +     lg_global_lock(&lgrw->lglock);
>>
>> This does a for-loop on all CPUs and takes their locks one-by-one. That's
>> exactly what we want to prevent, because that is the _source_ of all our
>> deadlock woes in this case. In the presence of perfect lock ordering
>> guarantees, this wouldn't have been a problem (that's why lglocks are
>> being used successfully elsewhere in the kernel). In the stop-machine()
>> removal case, the over-flexibility of preempt_disable() forces us to provide
>> an equally flexible locking alternative. Hence we can't use such per-cpu
>> locking schemes.
>>
>> You might note that, for exactly this reason, I haven't actually used any
>> per-cpu _locks_ in this synchronization scheme, though it is named as
>> "per-cpu rwlocks". The only per-cpu component here are the refcounts, and
>> we consciously avoid waiting/spinning on them (because then that would be
>> equivalent to having per-cpu locks, which are deadlock-prone). We use
>> global rwlocks to get the deadlock-safety that we need.
>>
>>> +     write_lock(&lgrw->fallback_rwlock);
>>> +}
>>> +EXPORT_SYMBOL(lg_rwlock_global_write_lock);
>>> +
>>> +void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw)
>>> +{
>>> +     write_unlock(&lgrw->fallback_rwlock);
>>> +     lg_global_unlock(&lgrw->lglock);
>>> +}
>>> +EXPORT_SYMBOL(lg_rwlock_global_write_unlock);
>>>
>>
>> Regards,
>> Srivatsa S. Bhat
>>

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-26  0:17                   ` Lai Jiangshan
  (?)
@ 2013-02-26  9:02                     ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-26  9:02 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Michel Lespinasse, linux-doc, peterz, fweisbec, linux-kernel,
	namhyung, mingo, linux-arch, linux, xiaoguangrong, wangyun,
	paulmck, nikunj, linux-pm, rusty, rostedt, rjw, vincent.guittot,
	tglx, linux-arm-kernel, netdev, oleg, sbw, tj, akpm,
	linuxppc-dev

On 02/26/2013 05:47 AM, Lai Jiangshan wrote:
> On Tue, Feb 26, 2013 at 3:26 AM, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> Hi Lai,
>>
>> On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
>>> Hi, Srivatsa,
>>>
>>> The target of the whole patchset is nice for me.
>>
>> Cool! Thanks :-)
>>
[...]
>>> I wrote an untested draft here.
>>>
>>> Thanks,
>>> Lai
>>>
>>> PS: Some HA tools(I'm writing one) which takes checkpoints of
>>> virtual-machines frequently, I guess this patchset can speedup the
>>> tools.
>>>
>>> From 01db542693a1b7fc6f9ece45d57cb529d9be5b66 Mon Sep 17 00:00:00 2001
>>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>>> Date: Mon, 25 Feb 2013 23:14:27 +0800
>>> Subject: [PATCH] lglock: add read-preference local-global rwlock
>>>
>>> locality via lglock(trylock)
>>> read-preference read-write-lock via fallback rwlock_t
>>>
>>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>> ---
>>>  include/linux/lglock.h |   31 +++++++++++++++++++++++++++++++
>>>  kernel/lglock.c        |   45 +++++++++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 76 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/include/linux/lglock.h b/include/linux/lglock.h
>>> index 0d24e93..30fe887 100644
>>> --- a/include/linux/lglock.h
>>> +++ b/include/linux/lglock.h
>>> @@ -67,4 +67,35 @@ void lg_local_unlock_cpu(struct lglock *lg, int cpu);
>>>  void lg_global_lock(struct lglock *lg);
>>>  void lg_global_unlock(struct lglock *lg);
>>>
>>> +struct lgrwlock {
>>> +     unsigned long __percpu *fallback_reader_refcnt;
>>> +     struct lglock lglock;
>>> +     rwlock_t fallback_rwlock;
>>> +};
>>> +
>>> +#define DEFINE_LGRWLOCK(name)                                                \
>>> +     static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)           \
>>> +     = __ARCH_SPIN_LOCK_UNLOCKED;                                    \
>>> +     static DEFINE_PER_CPU(unsigned long, name ## _refcnt);          \
>>> +     struct lgrwlock name = {                                        \
>>> +             .fallback_reader_refcnt = &name ## _refcnt,             \
>>> +             .lglock = { .lock = &name ## _lock } }
>>> +
>>> +#define DEFINE_STATIC_LGRWLOCK(name)                                 \
>>> +     static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)           \
>>> +     = __ARCH_SPIN_LOCK_UNLOCKED;                                    \
>>> +     static DEFINE_PER_CPU(unsigned long, name ## _refcnt);          \
>>> +     static struct lgrwlock name = {                                 \
>>> +             .fallback_reader_refcnt = &name ## _refcnt,             \
>>> +             .lglock = { .lock = &name ## _lock } }
>>> +
>>> +static inline void lg_rwlock_init(struct lgrwlock *lgrw, char *name)
>>> +{
>>> +     lg_lock_init(&lgrw->lglock, name);
>>> +}
>>> +
>>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw);
>>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw);
>>> +void lg_rwlock_global_write_lock(struct lgrwlock *lgrw);
>>> +void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw);
>>>  #endif
>>> diff --git a/kernel/lglock.c b/kernel/lglock.c
>>> index 6535a66..463543a 100644
>>> --- a/kernel/lglock.c
>>> +++ b/kernel/lglock.c
>>> @@ -87,3 +87,48 @@ void lg_global_unlock(struct lglock *lg)
>>>       preempt_enable();
>>>  }
>>>  EXPORT_SYMBOL(lg_global_unlock);
>>> +
>>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
>>> +{
>>> +     struct lglock *lg = &lgrw->lglock;
>>> +
>>> +     preempt_disable();
>>> +     if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
>>> +             if (likely(arch_spin_trylock(this_cpu_ptr(lg->lock)))) {
>>> +                     rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
>>> +                     return;
>>> +             }
>>> +             read_lock(&lgrw->fallback_rwlock);
>>> +     }
>>> +
>>> +     __this_cpu_inc(*lgrw->fallback_reader_refcnt);
>>> +}
>>> +EXPORT_SYMBOL(lg_rwlock_local_read_lock);
>>> +
>>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
>>> +{
>>> +     if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
>>> +             lg_local_unlock(&lgrw->lglock);
>>> +             return;
>>> +     }
>>> +
>>> +     if (!__this_cpu_dec_return(*lgrw->fallback_reader_refcnt))
>>> +             read_unlock(&lgrw->fallback_rwlock);
>>> +
>>> +     preempt_enable();
>>> +}
>>> +EXPORT_SYMBOL(lg_rwlock_local_read_unlock);
>>> +
>>
>> If I read the code above correctly, all you are doing is implementing a
>> recursive reader-side primitive (ie., allowing the reader to call these
>> functions recursively, without resulting in a self-deadlock).
>>
>> But the thing is, making the reader-side recursive is the least of our
>> problems! Our main challenge is to make the locking extremely flexible
>> and also safe-guard it against circular-locking-dependencies and deadlocks.
>> Please take a look at the changelog of patch 1 - it explains the situation
>> with an example.
> 
> 
> My lock fits your requirements (I read patches 1-6 before I sent this).
> On the read side, the lglock's per-CPU lock is taken via trylock, so the
> lglock doesn't contribute to deadlocks; we can treat it as if it didn't
> exist when searching for deadlocks involving it. And the global fallback
> rwlock doesn't lead to deadlocks because it is read-preferring (you need
> to increment fallback_reader_refcnt inside the cpu-hotplug write side; I
> don't do that in the generic lgrwlock).
>

Ah, since you hadn't mentioned the increment at the writer-side in your
previous email, I had missed the bigger picture of what you were trying
to achieve.
 
> 
> If lg_rwlock_local_read_lock() spins, it means it is spinning on the
> fallback_rwlock, which means lg_rwlock_global_write_lock() took the
> lgrwlock successfully and returned, which in turn means
> lg_rwlock_local_read_lock() will stop spinning as soon as the write side
> has finished.
> 

Unfortunately, I see quite a few issues with the code above. IIUC, the
writer and the reader both increment the same counters. So how will the
unlock() code in the reader path know when to unlock which of the locks?
(The counter-dropping-to-zero logic is not safe, since it can be updated
due to different reasons). And now that I look at it again, in the absence
of the writer, the reader is allowed to be recursive at the heavy cost of
taking the global rwlock for read, every 2nd time you nest (because the
spinlock is non-recursive). Also, this lg_rwlock implementation uses 3
different data-structures - a per-cpu spinlock, a global rwlock and
a per-cpu refcnt, and it's not immediately apparent why you need those many
or even those many varieties. Also I see that this doesn't handle the
case of interrupt-handlers also being readers.

IMHO, the per-cpu rwlock scheme that I have implemented in this patchset
has a clean, understandable design and just enough data-structures/locks
to achieve its goal and has several optimizations (like reducing the
interrupts-disabled time etc) included - all in a very straight-forward
manner. Since this is non-trivial, IMHO, starting from a clean slate is
actually better than trying to retrofit the logic into some locking scheme
which we actively want to avoid (and hence effectively we aren't even
borrowing anything from!).

To summarize, if you are just pointing out that we can implement the same
logic by altering lglocks, then sure, I acknowledge the possibility.
However, I don't think doing that actually makes it better; it either
convolutes the logic unnecessarily, or ends up looking _very_ similar to
the implementation in this patchset, from what I can see.

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-26  9:02                     ` Srivatsa S. Bhat
  (?)
@ 2013-02-26 12:59                       ` Lai Jiangshan
  -1 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-02-26 12:59 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Michel Lespinasse, linux-doc, peterz, fweisbec, linux-kernel,
	namhyung, mingo, linux-arch, linux, xiaoguangrong, wangyun,
	paulmck, nikunj, linux-pm, rusty, rostedt, rjw, vincent.guittot,
	tglx, linux-arm-kernel, netdev, oleg, sbw, tj, akpm,
	linuxppc-dev

On Tue, Feb 26, 2013 at 5:02 PM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 02/26/2013 05:47 AM, Lai Jiangshan wrote:
>> On Tue, Feb 26, 2013 at 3:26 AM, Srivatsa S. Bhat
>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>> Hi Lai,
>>>
>>> On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
>>>> Hi, Srivatsa,
>>>>
>>>> The target of the whole patchset is nice for me.
>>>
>>> Cool! Thanks :-)
>>>
> [...]
>>>> I wrote an untested draft here.
>>>>
>>>> Thanks,
>>>> Lai
>>>>
>>>> PS: Some HA tools(I'm writing one) which takes checkpoints of
>>>> virtual-machines frequently, I guess this patchset can speedup the
>>>> tools.
>>>>
>>>> From 01db542693a1b7fc6f9ece45d57cb529d9be5b66 Mon Sep 17 00:00:00 2001
>>>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>>>> Date: Mon, 25 Feb 2013 23:14:27 +0800
>>>> Subject: [PATCH] lglock: add read-preference local-global rwlock
>>>>
>>>> locality via lglock(trylock)
>>>> read-preference read-write-lock via fallback rwlock_t
>>>>
>>>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>>> ---
>>>>  include/linux/lglock.h |   31 +++++++++++++++++++++++++++++++
>>>>  kernel/lglock.c        |   45 +++++++++++++++++++++++++++++++++++++++++++++
>>>>  2 files changed, 76 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/include/linux/lglock.h b/include/linux/lglock.h
>>>> index 0d24e93..30fe887 100644
>>>> --- a/include/linux/lglock.h
>>>> +++ b/include/linux/lglock.h
>>>> @@ -67,4 +67,35 @@ void lg_local_unlock_cpu(struct lglock *lg, int cpu);
>>>>  void lg_global_lock(struct lglock *lg);
>>>>  void lg_global_unlock(struct lglock *lg);
>>>>
>>>> +struct lgrwlock {
>>>> +     unsigned long __percpu *fallback_reader_refcnt;
>>>> +     struct lglock lglock;
>>>> +     rwlock_t fallback_rwlock;
>>>> +};
>>>> +
>>>> +#define DEFINE_LGRWLOCK(name)                                                \
>>>> +     static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)           \
>>>> +     = __ARCH_SPIN_LOCK_UNLOCKED;                                    \
>>>> +     static DEFINE_PER_CPU(unsigned long, name ## _refcnt);          \
>>>> +     struct lgrwlock name = {                                        \
>>>> +             .fallback_reader_refcnt = &name ## _refcnt,             \
>>>> +             .lglock = { .lock = &name ## _lock } }
>>>> +
>>>> +#define DEFINE_STATIC_LGRWLOCK(name)                                 \
>>>> +     static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)           \
>>>> +     = __ARCH_SPIN_LOCK_UNLOCKED;                                    \
>>>> +     static DEFINE_PER_CPU(unsigned long, name ## _refcnt);          \
>>>> +     static struct lgrwlock name = {                                 \
>>>> +             .fallback_reader_refcnt = &name ## _refcnt,             \
>>>> +             .lglock = { .lock = &name ## _lock } }
>>>> +
>>>> +static inline void lg_rwlock_init(struct lgrwlock *lgrw, char *name)
>>>> +{
>>>> +     lg_lock_init(&lgrw->lglock, name);
>>>> +}
>>>> +
>>>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw);
>>>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw);
>>>> +void lg_rwlock_global_write_lock(struct lgrwlock *lgrw);
>>>> +void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw);
>>>>  #endif
>>>> diff --git a/kernel/lglock.c b/kernel/lglock.c
>>>> index 6535a66..463543a 100644
>>>> --- a/kernel/lglock.c
>>>> +++ b/kernel/lglock.c
>>>> @@ -87,3 +87,48 @@ void lg_global_unlock(struct lglock *lg)
>>>>       preempt_enable();
>>>>  }
>>>>  EXPORT_SYMBOL(lg_global_unlock);
>>>> +
>>>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
>>>> +{
>>>> +     struct lglock *lg = &lgrw->lglock;
>>>> +
>>>> +     preempt_disable();
>>>> +     if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
>>>> +             if (likely(arch_spin_trylock(this_cpu_ptr(lg->lock)))) {
>>>> +                     rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
>>>> +                     return;
>>>> +             }
>>>> +             read_lock(&lgrw->fallback_rwlock);
>>>> +     }
>>>> +
>>>> +     __this_cpu_inc(*lgrw->fallback_reader_refcnt);
>>>> +}
>>>> +EXPORT_SYMBOL(lg_rwlock_local_read_lock);
>>>> +
>>>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
>>>> +{
>>>> +     if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
>>>> +             lg_local_unlock(&lgrw->lglock);
>>>> +             return;
>>>> +     }
>>>> +
>>>> +     if (!__this_cpu_dec_return(*lgrw->fallback_reader_refcnt))
>>>> +             read_unlock(&lgrw->fallback_rwlock);
>>>> +
>>>> +     preempt_enable();
>>>> +}
>>>> +EXPORT_SYMBOL(lg_rwlock_local_read_unlock);
>>>> +
>>>
>>> If I read the code above correctly, all you are doing is implementing a
>>> recursive reader-side primitive (ie., allowing the reader to call these
>>> functions recursively, without resulting in a self-deadlock).
>>>
>>> But the thing is, making the reader-side recursive is the least of our
>>> problems! Our main challenge is to make the locking extremely flexible
>>> and also safe-guard it against circular-locking-dependencies and deadlocks.
>>> Please take a look at the changelog of patch 1 - it explains the situation
>>> with an example.
>>
>>
>> My lock fits your requirements (I read patches 1-6 before I sent this).
>> On the read side, the lglock's per-CPU lock is taken via trylock, so the
>> lglock doesn't contribute to deadlocks; we can treat it as if it didn't
>> exist when searching for deadlocks involving it. And the global fallback
>> rwlock doesn't lead to deadlocks because it is read-preferring (you need
>> to increment fallback_reader_refcnt inside the cpu-hotplug write side; I
>> don't do that in the generic lgrwlock).
>>
>
> Ah, since you hadn't mentioned the increment at the writer-side in your
> previous email, I had missed the bigger picture of what you were trying
> to achieve.
>
>>
>> If lg_rwlock_local_read_lock() spins, it means it is spinning on the
>> fallback_rwlock, which means lg_rwlock_global_write_lock() took the
>> lgrwlock successfully and returned, which in turn means
>> lg_rwlock_local_read_lock() will stop spinning as soon as the write side
>> has finished.
>>
>
> Unfortunately, I see quite a few issues with the code above. IIUC, the
> writer and the reader both increment the same counters. So how will the
> unlock() code in the reader path know when to unlock which of the locks?

The same as in your code: a reader nested inside the write-side critical
section just decrements the counter.
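
(A hedged sketch of that arrangement, i.e. the cpu-hotplug write side
bumping the per-cpu refcount as described earlier in this thread; this is
not part of the draft patch, and the function below is illustrative only.)

/*
 * cpu-hotplug write side: bump the local fallback_reader_refcnt so that
 * any reader nested inside this critical section sees a non-zero count
 * and only increments/decrements it, instead of touching the locks.
 */
static void cpu_hotplug_write_side_sketch(struct lgrwlock *lgrw)
{
        lg_rwlock_global_write_lock(lgrw);
        __this_cpu_inc(*lgrw->fallback_reader_refcnt);

        /* ... take the CPU offline here ... */

        __this_cpu_dec(*lgrw->fallback_reader_refcnt);
        lg_rwlock_global_write_unlock(lgrw);
}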

> (The counter-dropping-to-zero logic is not safe, since it can be updated
> due to different reasons). And now that I look at it again, in the absence
> of the writer, the reader is allowed to be recursive at the heavy cost of
> taking the global rwlock for read, every 2nd time you nest (because the
> spinlock is non-recursive).

(I did not understand your comments on this part.)
Nested readers are expected to be rare. But if N (>= 2) nested readers do
occur, the overhead is:
    1 arch_spin_trylock() + 1 read_lock() + (N-1) __this_cpu_inc()
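
(To make that overhead concrete, here is a hedged walk-through against the
draft lgrwlock above, assuming no writer is active; the wrapper function is
purely illustrative.)

static void nested_reader_sketch(struct lgrwlock *lgrw)
{
        /* depth 1: refcnt == 0, the per-cpu arch_spin_trylock() succeeds */
        lg_rwlock_local_read_lock(lgrw);

        /*
         * depth 2: refcnt is still 0, the trylock fails (the spinlock is
         * not recursive), so read_lock(&lgrw->fallback_rwlock) is taken
         * and refcnt becomes 1.
         */
        lg_rwlock_local_read_lock(lgrw);

        /* depth 3 and deeper: refcnt != 0, only __this_cpu_inc() */
        lg_rwlock_local_read_lock(lgrw);

        /* unwinding: refcnt 2 -> 1 */
        lg_rwlock_local_read_unlock(lgrw);

        /* refcnt 1 -> 0: this drops the fallback_rwlock */
        lg_rwlock_local_read_unlock(lgrw);

        /* refcnt == 0: this releases the per-cpu spinlock */
        lg_rwlock_local_read_unlock(lgrw);
}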

> Also, this lg_rwlock implementation uses 3
> different data-structures - a per-cpu spinlock, a global rwlock and
> a per-cpu refcnt, and it's not immediately apparent why you need those many
> or even those many varieties.

The data structures are the same as yours:
fallback_reader_refcnt <--> reader_refcnt
per-cpu spinlock       <--> write_signal
fallback_rwlock        <--> global_rwlock

> Also I see that this doesn't handle the
> case of interrupt-handlers also being readers.

That is handled: a nested reader will either see the non-zero refcount or
take the fallback_rwlock.

>
> IMHO, the per-cpu rwlock scheme that I have implemented in this patchset
> has a clean, understandable design and just enough data-structures/locks
> to achieve its goal and has several optimizations (like reducing the
> interrupts-disabled time etc) included - all in a very straight-forward
> manner. Since this is non-trivial, IMHO, starting from a clean slate is
> actually better than trying to retrofit the logic into some locking scheme
> which we actively want to avoid (and hence effectively we aren't even
> borrowing anything from!).
>
> To summarize, if you are just pointing out that we can implement the same
> logic by altering lglocks, then sure, I acknowledge the possibility.
> However, I don't think doing that actually makes it better; it either
> convolutes the logic unnecessarily, or ends up looking _very_ similar to
> the implementation in this patchset, from what I can see.
>
> Regards,
> Srivatsa S. Bhat
>

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
@ 2013-02-26 12:59                       ` Lai Jiangshan
  0 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-02-26 12:59 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-doc, peterz, fweisbec, linux-kernel, Michel Lespinasse,
	mingo, linux-arch, linux, xiaoguangrong, wangyun, paulmck,
	nikunj, linux-pm, rusty, rostedt, rjw, namhyung, tglx,
	linux-arm-kernel, netdev, oleg, vincent.guittot, sbw, tj, akpm,
	linuxppc-dev

On Tue, Feb 26, 2013 at 5:02 PM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 02/26/2013 05:47 AM, Lai Jiangshan wrote:
>> On Tue, Feb 26, 2013 at 3:26 AM, Srivatsa S. Bhat
>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>> Hi Lai,
>>>
>>> On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
>>>> Hi, Srivatsa,
>>>>
>>>> The target of the whole patchset is nice for me.
>>>
>>> Cool! Thanks :-)
>>>
> [...]
>>>> I wrote an untested draft here.
>>>>
>>>> Thanks,
>>>> Lai
>>>>
>>>> PS: Some HA tools(I'm writing one) which takes checkpoints of
>>>> virtual-machines frequently, I guess this patchset can speedup the
>>>> tools.
>>>>
>>>> From 01db542693a1b7fc6f9ece45d57cb529d9be5b66 Mon Sep 17 00:00:00 2001
>>>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>>>> Date: Mon, 25 Feb 2013 23:14:27 +0800
>>>> Subject: [PATCH] lglock: add read-preference local-global rwlock
>>>>
>>>> locality via lglock(trylock)
>>>> read-preference read-write-lock via fallback rwlock_t
>>>>
>>>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>>> ---
>>>>  include/linux/lglock.h |   31 +++++++++++++++++++++++++++++++
>>>>  kernel/lglock.c        |   45 +++++++++++++++++++++++++++++++++++++++++++++
>>>>  2 files changed, 76 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/include/linux/lglock.h b/include/linux/lglock.h
>>>> index 0d24e93..30fe887 100644
>>>> --- a/include/linux/lglock.h
>>>> +++ b/include/linux/lglock.h
>>>> @@ -67,4 +67,35 @@ void lg_local_unlock_cpu(struct lglock *lg, int cpu);
>>>>  void lg_global_lock(struct lglock *lg);
>>>>  void lg_global_unlock(struct lglock *lg);
>>>>
>>>> +struct lgrwlock {
>>>> +     unsigned long __percpu *fallback_reader_refcnt;
>>>> +     struct lglock lglock;
>>>> +     rwlock_t fallback_rwlock;
>>>> +};
>>>> +
>>>> +#define DEFINE_LGRWLOCK(name)                                                \
>>>> +     static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)           \
>>>> +     = __ARCH_SPIN_LOCK_UNLOCKED;                                    \
>>>> +     static DEFINE_PER_CPU(unsigned long, name ## _refcnt);          \
>>>> +     struct lgrwlock name = {                                        \
>>>> +             .fallback_reader_refcnt = &name ## _refcnt,             \
>>>> +             .lglock = { .lock = &name ## _lock } }
>>>> +
>>>> +#define DEFINE_STATIC_LGRWLOCK(name)                                 \
>>>> +     static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)           \
>>>> +     = __ARCH_SPIN_LOCK_UNLOCKED;                                    \
>>>> +     static DEFINE_PER_CPU(unsigned long, name ## _refcnt);          \
>>>> +     static struct lgrwlock name = {                                 \
>>>> +             .fallback_reader_refcnt = &name ## _refcnt,             \
>>>> +             .lglock = { .lock = &name ## _lock } }
>>>> +
>>>> +static inline void lg_rwlock_init(struct lgrwlock *lgrw, char *name)
>>>> +{
>>>> +     lg_lock_init(&lgrw->lglock, name);
>>>> +}
>>>> +
>>>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw);
>>>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw);
>>>> +void lg_rwlock_global_write_lock(struct lgrwlock *lgrw);
>>>> +void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw);
>>>>  #endif
>>>> diff --git a/kernel/lglock.c b/kernel/lglock.c
>>>> index 6535a66..463543a 100644
>>>> --- a/kernel/lglock.c
>>>> +++ b/kernel/lglock.c
>>>> @@ -87,3 +87,48 @@ void lg_global_unlock(struct lglock *lg)
>>>>       preempt_enable();
>>>>  }
>>>>  EXPORT_SYMBOL(lg_global_unlock);
>>>> +
>>>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
>>>> +{
>>>> +     struct lglock *lg = &lgrw->lglock;
>>>> +
>>>> +     preempt_disable();
>>>> +     if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
>>>> +             if (likely(arch_spin_trylock(this_cpu_ptr(lg->lock)))) {
>>>> +                     rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
>>>> +                     return;
>>>> +             }
>>>> +             read_lock(&lgrw->fallback_rwlock);
>>>> +     }
>>>> +
>>>> +     __this_cpu_inc(*lgrw->fallback_reader_refcnt);
>>>> +}
>>>> +EXPORT_SYMBOL(lg_rwlock_local_read_lock);
>>>> +
>>>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
>>>> +{
>>>> +     if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
>>>> +             lg_local_unlock(&lgrw->lglock);
>>>> +             return;
>>>> +     }
>>>> +
>>>> +     if (!__this_cpu_dec_return(*lgrw->fallback_reader_refcnt))
>>>> +             read_unlock(&lgrw->fallback_rwlock);
>>>> +
>>>> +     preempt_enable();
>>>> +}
>>>> +EXPORT_SYMBOL(lg_rwlock_local_read_unlock);
>>>> +
>>>
>>> If I read the code above correctly, all you are doing is implementing a
>>> recursive reader-side primitive (ie., allowing the reader to call these
>>> functions recursively, without resulting in a self-deadlock).
>>>
>>> But the thing is, making the reader-side recursive is the least of our
>>> problems! Our main challenge is to make the locking extremely flexible
>>> and also safe-guard it against circular-locking-dependencies and deadlocks.
>>> Please take a look at the changelog of patch 1 - it explains the situation
>>> with an example.
>>
>>
>> My lock fixes your requirements(I read patch 1-6 before I sent). In
>> readsite, lglock 's lock is token via trylock, the lglock doesn't
>> contribute to deadlocks, we can consider it doesn't exist when we find
>> deadlock from it. And global fallback rwlock doesn't result to
>> deadlocks because it is read-preference(you need to inc the
>> fallback_reader_refcnt inside the cpu-hotplug write-side, I don't do
>> it in generic lgrwlock)
>>
>
> Ah, since you hadn't mentioned the increment at the writer-side in your
> previous email, I had missed the bigger picture of what you were trying
> to achieve.
>
>>
>> If lg_rwlock_local_read_lock() spins, which means
>> lg_rwlock_local_read_lock() spins on fallback_rwlock, and which means
>> lg_rwlock_global_write_lock() took the lgrwlock successfully and
>> return, and which means lg_rwlock_local_read_lock() will stop spinning
>> when the write side finished.
>>
>
> Unfortunately, I see quite a few issues with the code above. IIUC, the
> writer and the reader both increment the same counters. So how will the
> unlock() code in the reader path know when to unlock which of the locks?

The same as your code, the reader(which nested in write C.S.) just dec
the counters.

> (The counter-dropping-to-zero logic is not safe, since it can be updated
> due to different reasons). And now that I look at it again, in the absence
> of the writer, the reader is allowed to be recursive at the heavy cost of
> taking the global rwlock for read, every 2nd time you nest (because the
> spinlock is non-recursive).

(I did not understand your comments of this part)
nested reader is considered seldom. But if N(>=2) nested readers happen,
the overhead is:
    1 spin_try_lock() + 1 read_lock() + (N-1) __this_cpu_inc()

> Also, this lg_rwlock implementation uses 3
> different data-structures - a per-cpu spinlock, a global rwlock and
> a per-cpu refcnt, and its not immediately apparent why you need those many
> or even those many varieties.

data-structures is the same as yours.
fallback_reader_refcnt <--> reader_refcnt
per-cpu spinlock <--> write_signal
fallback_rwlock  <---> global_rwlock

> Also I see that this doesn't handle the
> case of interrupt-handlers also being readers.

handled. nested reader will see the ref or take the fallback_rwlock

>
> IMHO, the per-cpu rwlock scheme that I have implemented in this patchset
> has a clean, understandable design and just enough data-structures/locks
> to achieve its goal and has several optimizations (like reducing the
> interrupts-disabled time etc) included - all in a very straight-forward
> manner. Since this is non-trivial, IMHO, starting from a clean slate is
> actually better than trying to retrofit the logic into some locking scheme
> which we actively want to avoid (and hence effectively we aren't even
> borrowing anything from!).
>
> To summarize, if you are just pointing out that we can implement the same
> logic by altering lglocks, then sure, I acknowledge the possibility.
> However, I don't think doing that actually makes it better; it either
> convolutes the logic unnecessarily, or ends up looking _very_ similar to
> the implementation in this patchset, from what I can see.
>
> Regards,
> Srivatsa S. Bhat
>

^ permalink raw reply	[flat|nested] 360+ messages in thread

* [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
@ 2013-02-26 12:59                       ` Lai Jiangshan
  0 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-02-26 12:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 26, 2013 at 5:02 PM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 02/26/2013 05:47 AM, Lai Jiangshan wrote:
>> On Tue, Feb 26, 2013 at 3:26 AM, Srivatsa S. Bhat
>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>> Hi Lai,
>>>
>>> On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
>>>> Hi, Srivatsa,
>>>>
>>>> The target of the whole patchset is nice for me.
>>>
>>> Cool! Thanks :-)
>>>
> [...]
>>>> I wrote an untested draft here.
>>>>
>>>> Thanks,
>>>> Lai
>>>>
>>>> PS: Some HA tools(I'm writing one) which takes checkpoints of
>>>> virtual-machines frequently, I guess this patchset can speedup the
>>>> tools.
>>>>
>>>> From 01db542693a1b7fc6f9ece45d57cb529d9be5b66 Mon Sep 17 00:00:00 2001
>>>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>>>> Date: Mon, 25 Feb 2013 23:14:27 +0800
>>>> Subject: [PATCH] lglock: add read-preference local-global rwlock
>>>>
>>>> locality via lglock(trylock)
>>>> read-preference read-write-lock via fallback rwlock_t
>>>>
>>>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>>> ---
>>>>  include/linux/lglock.h |   31 +++++++++++++++++++++++++++++++
>>>>  kernel/lglock.c        |   45 +++++++++++++++++++++++++++++++++++++++++++++
>>>>  2 files changed, 76 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/include/linux/lglock.h b/include/linux/lglock.h
>>>> index 0d24e93..30fe887 100644
>>>> --- a/include/linux/lglock.h
>>>> +++ b/include/linux/lglock.h
>>>> @@ -67,4 +67,35 @@ void lg_local_unlock_cpu(struct lglock *lg, int cpu);
>>>>  void lg_global_lock(struct lglock *lg);
>>>>  void lg_global_unlock(struct lglock *lg);
>>>>
>>>> +struct lgrwlock {
>>>> +     unsigned long __percpu *fallback_reader_refcnt;
>>>> +     struct lglock lglock;
>>>> +     rwlock_t fallback_rwlock;
>>>> +};
>>>> +
>>>> +#define DEFINE_LGRWLOCK(name)                                                \
>>>> +     static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)           \
>>>> +     = __ARCH_SPIN_LOCK_UNLOCKED;                                    \
>>>> +     static DEFINE_PER_CPU(unsigned long, name ## _refcnt);          \
>>>> +     struct lgrwlock name = {                                        \
>>>> +             .fallback_reader_refcnt = &name ## _refcnt,             \
>>>> +             .lglock = { .lock = &name ## _lock } }
>>>> +
>>>> +#define DEFINE_STATIC_LGRWLOCK(name)                                 \
>>>> +     static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)           \
>>>> +     = __ARCH_SPIN_LOCK_UNLOCKED;                                    \
>>>> +     static DEFINE_PER_CPU(unsigned long, name ## _refcnt);          \
>>>> +     static struct lgrwlock name = {                                 \
>>>> +             .fallback_reader_refcnt = &name ## _refcnt,             \
>>>> +             .lglock = { .lock = &name ## _lock } }
>>>> +
>>>> +static inline void lg_rwlock_init(struct lgrwlock *lgrw, char *name)
>>>> +{
>>>> +     lg_lock_init(&lgrw->lglock, name);
>>>> +}
>>>> +
>>>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw);
>>>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw);
>>>> +void lg_rwlock_global_write_lock(struct lgrwlock *lgrw);
>>>> +void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw);
>>>>  #endif
>>>> diff --git a/kernel/lglock.c b/kernel/lglock.c
>>>> index 6535a66..463543a 100644
>>>> --- a/kernel/lglock.c
>>>> +++ b/kernel/lglock.c
>>>> @@ -87,3 +87,48 @@ void lg_global_unlock(struct lglock *lg)
>>>>       preempt_enable();
>>>>  }
>>>>  EXPORT_SYMBOL(lg_global_unlock);
>>>> +
>>>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
>>>> +{
>>>> +     struct lglock *lg = &lgrw->lglock;
>>>> +
>>>> +     preempt_disable();
>>>> +     if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
>>>> +             if (likely(arch_spin_trylock(this_cpu_ptr(lg->lock)))) {
>>>> +                     rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
>>>> +                     return;
>>>> +             }
>>>> +             read_lock(&lgrw->fallback_rwlock);
>>>> +     }
>>>> +
>>>> +     __this_cpu_inc(*lgrw->fallback_reader_refcnt);
>>>> +}
>>>> +EXPORT_SYMBOL(lg_rwlock_local_read_lock);
>>>> +
>>>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
>>>> +{
>>>> +     if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
>>>> +             lg_local_unlock(&lgrw->lglock);
>>>> +             return;
>>>> +     }
>>>> +
>>>> +     if (!__this_cpu_dec_return(*lgrw->fallback_reader_refcnt))
>>>> +             read_unlock(&lgrw->fallback_rwlock);
>>>> +
>>>> +     preempt_enable();
>>>> +}
>>>> +EXPORT_SYMBOL(lg_rwlock_local_read_unlock);
>>>> +
>>>
>>> If I read the code above correctly, all you are doing is implementing a
>>> recursive reader-side primitive (ie., allowing the reader to call these
>>> functions recursively, without resulting in a self-deadlock).
>>>
>>> But the thing is, making the reader-side recursive is the least of our
>>> problems! Our main challenge is to make the locking extremely flexible
>>> and also safe-guard it against circular-locking-dependencies and deadlocks.
>>> Please take a look at the changelog of patch 1 - it explains the situation
>>> with an example.
>>
>>
>> My lock fixes your requirements(I read patch 1-6 before I sent). In
>> readsite, lglock 's lock is token via trylock, the lglock doesn't
>> contribute to deadlocks, we can consider it doesn't exist when we find
>> deadlock from it. And global fallback rwlock doesn't result to
>> deadlocks because it is read-preference(you need to inc the
>> fallback_reader_refcnt inside the cpu-hotplug write-side, I don't do
>> it in generic lgrwlock)
>>
>
> Ah, since you hadn't mentioned the increment at the writer-side in your
> previous email, I had missed the bigger picture of what you were trying
> to achieve.
>
>>
>> If lg_rwlock_local_read_lock() spins, which means
>> lg_rwlock_local_read_lock() spins on fallback_rwlock, and which means
>> lg_rwlock_global_write_lock() took the lgrwlock successfully and
>> return, and which means lg_rwlock_local_read_lock() will stop spinning
>> when the write side finished.
>>
>
> Unfortunately, I see quite a few issues with the code above. IIUC, the
> writer and the reader both increment the same counters. So how will the
> unlock() code in the reader path know when to unlock which of the locks?

The same as your code, the reader(which nested in write C.S.) just dec
the counters.

> (The counter-dropping-to-zero logic is not safe, since it can be updated
> due to different reasons). And now that I look at it again, in the absence
> of the writer, the reader is allowed to be recursive at the heavy cost of
> taking the global rwlock for read, every 2nd time you nest (because the
> spinlock is non-recursive).

(I did not understand your comments of this part)
nested reader is considered seldom. But if N(>=2) nested readers happen,
the overhead is:
    1 spin_try_lock() + 1 read_lock() + (N-1) __this_cpu_inc()

> Also, this lg_rwlock implementation uses 3
> different data-structures - a per-cpu spinlock, a global rwlock and
> a per-cpu refcnt, and it's not immediately apparent why you need that many
> or even that many varieties.

The data structures are the same as yours:
    fallback_reader_refcnt <--> reader_refcnt
    per-cpu spinlock       <--> writer_signal
    fallback_rwlock        <--> global_rwlock

> Also I see that this doesn't handle the
> case of interrupt-handlers also being readers.

Handled: a nested reader will either see the refcount or take the fallback_rwlock.

>
> IMHO, the per-cpu rwlock scheme that I have implemented in this patchset
> has a clean, understandable design and just enough data-structures/locks
> to achieve its goal and has several optimizations (like reducing the
> interrupts-disabled time etc) included - all in a very straight-forward
> manner. Since this is non-trivial, IMHO, starting from a clean slate is
> actually better than trying to retrofit the logic into some locking scheme
> which we actively want to avoid (and hence effectively we aren't even
> borrowing anything from!).
>
> To summarize, if you are just pointing out that we can implement the same
> logic by altering lglocks, then sure, I acknowledge the possibility.
> However, I don't think doing that actually makes it better; it either
> convolutes the logic unnecessarily, or ends up looking _very_ similar to
> the implementation in this patchset, from what I can see.
>
> Regards,
> Srivatsa S. Bhat
>

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-25 19:26                 ` Srivatsa S. Bhat
  (?)
@ 2013-02-26 13:34                   ` Lai Jiangshan
  -1 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-02-26 13:34 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Michel Lespinasse, linux-doc, peterz, fweisbec, linux-kernel,
	namhyung, mingo, linux-arch, linux, xiaoguangrong, wangyun,
	paulmck, nikunj, linux-pm, rusty, rostedt, rjw, vincent.guittot,
	tglx, linux-arm-kernel, netdev, oleg, sbw, tj, akpm,
	linuxppc-dev, Lai Jiangshan

On Tue, Feb 26, 2013 at 3:26 AM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> Hi Lai,
>
> On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
>> Hi, Srivatsa,
>>
>> The target of the whole patchset is nice for me.
>
> Cool! Thanks :-)
>
>> A question: how did you find such usages of
>> "preempt_disable()" and convert them? Were all of them converted?
>>
>
> Well, I scanned through the source tree for usages which implicitly
> disabled CPU offline and converted them over.

How do you scan? Could you show the way you scan the source tree?
I can follow your instructions for double-checking.

> It's not limited to uses
> of preempt_disable() alone - even spin_locks, rwlocks, local_irq_disable()
> etc also help disable CPU offline. So I tried to dig out all such uses
> and converted them. However, since the merge window is open, a lot of
> new code is flowing into the tree. So I'll have to rescan the tree to
> see if there are any more places to convert.

I remember some code has assumptions like this:
    preempt_disable() (or something else)
    // the code assumes that the cpu_online_map can't be changed here
    preempt_enable()

It is very hard to find all such kinds of assumptions and fix them.
(I notice your code mainly fixes code around send_xxxx().)


>
>> And I think the lock is too complex and reinvents the wheel; why don't
>> you reuse the lglock?
>
> lglocks? No way! ;-) See below...
>
>> I wrote an untested draft here.
>>
>> Thanks,
>> Lai
>>
>> PS: Some HA tools (I'm writing one) take checkpoints of virtual
>> machines frequently; I guess this patchset can speed up such
>> tools.
>>
>> From 01db542693a1b7fc6f9ece45d57cb529d9be5b66 Mon Sep 17 00:00:00 2001
>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Date: Mon, 25 Feb 2013 23:14:27 +0800
>> Subject: [PATCH] lglock: add read-preference local-global rwlock
>>
>> locality via lglock(trylock)
>> read-preference read-write-lock via fallback rwlock_t
>>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> ---
>>  include/linux/lglock.h |   31 +++++++++++++++++++++++++++++++
>>  kernel/lglock.c        |   45 +++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 76 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/lglock.h b/include/linux/lglock.h
>> index 0d24e93..30fe887 100644
>> --- a/include/linux/lglock.h
>> +++ b/include/linux/lglock.h
>> @@ -67,4 +67,35 @@ void lg_local_unlock_cpu(struct lglock *lg, int cpu);
>>  void lg_global_lock(struct lglock *lg);
>>  void lg_global_unlock(struct lglock *lg);
>>
>> +struct lgrwlock {
>> +     unsigned long __percpu *fallback_reader_refcnt;
>> +     struct lglock lglock;
>> +     rwlock_t fallback_rwlock;
>> +};
>> +
>> +#define DEFINE_LGRWLOCK(name)                                                \
>> +     static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)           \
>> +     = __ARCH_SPIN_LOCK_UNLOCKED;                                    \
>> +     static DEFINE_PER_CPU(unsigned long, name ## _refcnt);          \
>> +     struct lgrwlock name = {                                        \
>> +             .fallback_reader_refcnt = &name ## _refcnt,             \
>> +             .lglock = { .lock = &name ## _lock } }
>> +
>> +#define DEFINE_STATIC_LGRWLOCK(name)                                 \
>> +     static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)           \
>> +     = __ARCH_SPIN_LOCK_UNLOCKED;                                    \
>> +     static DEFINE_PER_CPU(unsigned long, name ## _refcnt);          \
>> +     static struct lgrwlock name = {                                 \
>> +             .fallback_reader_refcnt = &name ## _refcnt,             \
>> +             .lglock = { .lock = &name ## _lock } }
>> +
>> +static inline void lg_rwlock_init(struct lgrwlock *lgrw, char *name)
>> +{
>> +     lg_lock_init(&lgrw->lglock, name);
>> +}
>> +
>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw);
>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw);
>> +void lg_rwlock_global_write_lock(struct lgrwlock *lgrw);
>> +void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw);
>>  #endif
>> diff --git a/kernel/lglock.c b/kernel/lglock.c
>> index 6535a66..463543a 100644
>> --- a/kernel/lglock.c
>> +++ b/kernel/lglock.c
>> @@ -87,3 +87,48 @@ void lg_global_unlock(struct lglock *lg)
>>       preempt_enable();
>>  }
>>  EXPORT_SYMBOL(lg_global_unlock);
>> +
>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
>> +{
>> +     struct lglock *lg = &lgrw->lglock;
>> +
>> +     preempt_disable();
>> +     if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
>> +             if (likely(arch_spin_trylock(this_cpu_ptr(lg->lock)))) {
>> +                     rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
>> +                     return;
>> +             }
>> +             read_lock(&lgrw->fallback_rwlock);
>> +     }
>> +
>> +     __this_cpu_inc(*lgrw->fallback_reader_refcnt);
>> +}
>> +EXPORT_SYMBOL(lg_rwlock_local_read_lock);
>> +
>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
>> +{
>> +     if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
>> +             lg_local_unlock(&lgrw->lglock);
>> +             return;
>> +     }
>> +
>> +     if (!__this_cpu_dec_return(*lgrw->fallback_reader_refcnt))
>> +             read_unlock(&lgrw->fallback_rwlock);
>> +
>> +     preempt_enable();
>> +}
>> +EXPORT_SYMBOL(lg_rwlock_local_read_unlock);
>> +
>
> If I read the code above correctly, all you are doing is implementing a
> recursive reader-side primitive (ie., allowing the reader to call these
> functions recursively, without resulting in a self-deadlock).
>
> But the thing is, making the reader-side recursive is the least of our
> problems! Our main challenge is to make the locking extremely flexible
> and also safe-guard it against circular-locking-dependencies and deadlocks.
> Please take a look at the changelog of patch 1 - it explains the situation
> with an example.
>
>> +void lg_rwlock_global_write_lock(struct lgrwlock *lgrw)
>> +{
>> +     lg_global_lock(&lgrw->lglock);
>
> This does a for-loop on all CPUs and takes their locks one-by-one. That's
> exactly what we want to prevent, because that is the _source_ of all our
> deadlock woes in this case. In the presence of perfect lock ordering
> guarantees, this wouldn't have been a problem (that's why lglocks are
> being used successfully elsewhere in the kernel). In the stop-machine()
> removal case, the over-flexibility of preempt_disable() forces us to provide
> an equally flexible locking alternative. Hence we can't use such per-cpu
> locking schemes.
>
> You might note that, for exactly this reason, I haven't actually used any
> per-cpu _locks_ in this synchronization scheme, though it is named as
> "per-cpu rwlocks". The only per-cpu component here are the refcounts, and
> we consciously avoid waiting/spinning on them (because then that would be
> equivalent to having per-cpu locks, which are deadlock-prone). We use
> global rwlocks to get the deadlock-safety that we need.
>
>> +     write_lock(&lgrw->fallback_rwlock);
>> +}
>> +EXPORT_SYMBOL(lg_rwlock_global_write_lock);
>> +
>> +void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw)
>> +{
>> +     write_unlock(&lgrw->fallback_rwlock);
>> +     lg_global_unlock(&lgrw->lglock);
>> +}
>> +EXPORT_SYMBOL(lg_rwlock_global_write_unlock);
>>
>
> Regards,
> Srivatsa S. Bhat
>

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-18 12:38   ` Srivatsa S. Bhat
  (?)
@ 2013-02-26 14:17     ` Lai Jiangshan
  -1 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-02-26 14:17 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung,
	rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, linux-pm, linux-arch, linux-arm-kernel, linuxppc-dev,
	netdev, linux-doc, linux-kernel, walken, vincent.guittot

On Mon, Feb 18, 2013 at 8:38 PM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> Using global rwlocks as the backend for per-CPU rwlocks helps us avoid many
> lock-ordering related problems (unlike per-cpu locks). However, global
> rwlocks lead to unnecessary cache-line bouncing even when there are no
> writers present, which can slow down the system needlessly.

Per-CPU rwlocks (yours and mine) are exactly the same as rwlock_t from
the point of view of lock dependencies (except that a reader's critical
section can be nested inside a writer's critical section),
so they can deadlock in this order:

spin_lock(some_lock);				percpu_write_lock_irqsave()
						case CPU_DYING
percpu_read_lock_irqsafe();	<---deadlock--->	spin_lock(some_lock);

Lockdep can find such dependencies, but we must try our best to find
them before merging the patchset to mainline. We can review all the
code in cpu_disable() and the CPU_DYING notifiers and fix these kinds
of lock dependencies, but it is not an easy thing; it may be a
long-term project.

======
And if there is any CPU_DYING code that takes no locks and does some
work (because it knows it is called via stop_machine()), we need to
add the necessary locking back for such code. (I don't know whether
such code exists or not.)
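
For example, a made-up CPU_DYING callback of this shape is the kind of
thing that would need auditing (the lock and function names below are
purely illustrative, not code from the tree):

#include <linux/cpu.h>
#include <linux/notifier.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(some_lock);      /* also taken by atomic readers */

static int foo_cpu_callback(struct notifier_block *nb,
                            unsigned long action, void *hcpu)
{
        switch (action & ~CPU_TASKS_FROZEN) {
        case CPU_DYING:
                /*
                 * This runs inside the hotplug writer's critical section.
                 * If atomic readers elsewhere take some_lock and then the
                 * read-side hotplug lock, we get exactly the A-B / B-A
                 * dependency shown above.
                 */
                spin_lock(&some_lock);
                /* ... migrate per-cpu state off the dying CPU ... */
                spin_unlock(&some_lock);
                break;
        }
        return NOTIFY_OK;
}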

>
> Per-cpu counters can help solve the cache-line bouncing problem. So we
> actually use the best of both: per-cpu counters (no-waiting) at the reader
> side in the fast-path, and global rwlocks in the slowpath.
>
> [ Fastpath = no writer is active; Slowpath = a writer is active ]
>
> IOW, the readers just increment/decrement their per-cpu refcounts (disabling
> interrupts during the updates, if necessary) when no writer is active.
> When a writer becomes active, he signals all readers to switch to global
> rwlocks for the duration of his activity. The readers switch over when it
> is safe for them (ie., when they are about to start a fresh, non-nested
> read-side critical section) and start using (holding) the global rwlock for
> read in their subsequent critical sections.
>
> The writer waits for every existing reader to switch, and then acquires the
> global rwlock for write and enters his critical section. Later, the writer
> signals all readers that he is done, and that they can go back to using their
> per-cpu refcounts again.
>
> Note that the lock-safety (despite the per-cpu scheme) comes from the fact
> that the readers can *choose* _when_ to switch to rwlocks upon the writer's
> signal. And the readers don't wait on anybody based on the per-cpu counters.
> The only true synchronization that involves waiting at the reader-side in this
> scheme, is the one arising from the global rwlock, which is safe from circular
> locking dependency issues.
>
> Reader-writer locks and per-cpu counters are recursive, so they can be
> used in a nested fashion in the reader-path, which makes per-CPU rwlocks also
> recursive. Also, this design of switching the synchronization scheme ensures
> that you can safely nest and use these locks in a very flexible manner.
>
> I'm indebted to Michael Wang and Xiao Guangrong for their numerous thoughtful
> suggestions and ideas, which inspired and influenced many of the decisions in
> this as well as previous designs. Thanks a lot Michael and Xiao!
>
> Cc: David Howells <dhowells@redhat.com>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
>
>  lib/percpu-rwlock.c |  139 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 137 insertions(+), 2 deletions(-)
>
> diff --git a/lib/percpu-rwlock.c b/lib/percpu-rwlock.c
> index f938096..edefdea 100644
> --- a/lib/percpu-rwlock.c
> +++ b/lib/percpu-rwlock.c
> @@ -27,6 +27,24 @@
>  #include <linux/percpu-rwlock.h>
>  #include <linux/errno.h>
>
> +#include <asm/processor.h>
> +
> +
> +#define reader_yet_to_switch(pcpu_rwlock, cpu)                             \
> +       (ACCESS_ONCE(per_cpu_ptr((pcpu_rwlock)->rw_state, cpu)->reader_refcnt))
> +
> +#define reader_percpu_nesting_depth(pcpu_rwlock)                 \
> +       (__this_cpu_read((pcpu_rwlock)->rw_state->reader_refcnt))
> +
> +#define reader_uses_percpu_refcnt(pcpu_rwlock)                         \
> +                               reader_percpu_nesting_depth(pcpu_rwlock)
> +
> +#define reader_nested_percpu(pcpu_rwlock)                              \
> +                       (reader_percpu_nesting_depth(pcpu_rwlock) > 1)
> +
> +#define writer_active(pcpu_rwlock)                                     \
> +       (__this_cpu_read((pcpu_rwlock)->rw_state->writer_signal))
> +
>
>  int __percpu_init_rwlock(struct percpu_rwlock *pcpu_rwlock,
>                          const char *name, struct lock_class_key *rwlock_key)
> @@ -55,21 +73,138 @@ void percpu_free_rwlock(struct percpu_rwlock *pcpu_rwlock)
>
>  void percpu_read_lock(struct percpu_rwlock *pcpu_rwlock)
>  {
> -       read_lock(&pcpu_rwlock->global_rwlock);
> +       preempt_disable();
> +
> +       /*
> +        * Let the writer know that a reader is active, even before we choose
> +        * our reader-side synchronization scheme.
> +        */
> +       this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
> +
> +       /*
> +        * If we are already using per-cpu refcounts, it is not safe to switch
> +        * the synchronization scheme. So continue using the refcounts.
> +        */
> +       if (reader_nested_percpu(pcpu_rwlock))
> +               return;
> +
> +       /*
> +        * The write to 'reader_refcnt' must be visible before we read
> +        * 'writer_signal'.
> +        */
> +       smp_mb();
> +
> +       if (likely(!writer_active(pcpu_rwlock))) {
> +               goto out;
> +       } else {
> +               /* Writer is active, so switch to global rwlock. */
> +               read_lock(&pcpu_rwlock->global_rwlock);
> +
> +               /*
> +                * We might have raced with a writer going inactive before we
> +                * took the read-lock. So re-evaluate whether we still need to
> +                * hold the rwlock or if we can switch back to per-cpu
> +                * refcounts. (This also helps avoid heterogeneous nesting of
> +                * readers).
> +                */
> +               if (writer_active(pcpu_rwlock)) {
> +                       /*
> +                        * The above writer_active() check can get reordered
> +                        * with this_cpu_dec() below, but this is OK, because
> +                        * holding the rwlock is conservative.
> +                        */
> +                       this_cpu_dec(pcpu_rwlock->rw_state->reader_refcnt);
> +               } else {
> +                       read_unlock(&pcpu_rwlock->global_rwlock);
> +               }
> +       }
> +
> +out:
> +       /* Prevent reordering of any subsequent reads/writes */
> +       smp_mb();
>  }
>
>  void percpu_read_unlock(struct percpu_rwlock *pcpu_rwlock)
>  {
> -       read_unlock(&pcpu_rwlock->global_rwlock);
> +       /*
> +        * We never allow heterogeneous nesting of readers. So it is trivial
> +        * to find out the kind of reader we are, and undo the operation
> +        * done by our corresponding percpu_read_lock().
> +        */
> +
> +       /* Try to fast-path: a nested percpu reader is the simplest case */
> +       if (reader_nested_percpu(pcpu_rwlock)) {
> +               this_cpu_dec(pcpu_rwlock->rw_state->reader_refcnt);
> +               preempt_enable();
> +               return;
> +       }
> +
> +       /*
> +        * Now we are left with only 2 options: a non-nested percpu reader,
> +        * or a reader holding rwlock
> +        */
> +       if (reader_uses_percpu_refcnt(pcpu_rwlock)) {
> +               /*
> +                * Complete the critical section before decrementing the
> +                * refcnt. We can optimize this away if we are a nested
> +                * reader (the case above).
> +                */
> +               smp_mb();
> +               this_cpu_dec(pcpu_rwlock->rw_state->reader_refcnt);
> +       } else {
> +               read_unlock(&pcpu_rwlock->global_rwlock);
> +       }
> +
> +       preempt_enable();
>  }
>
>  void percpu_write_lock(struct percpu_rwlock *pcpu_rwlock)
>  {
> +       unsigned int cpu;
> +
> +       /*
> +        * Tell all readers that a writer is becoming active, so that they
> +        * start switching over to the global rwlock.
> +        */
> +       for_each_possible_cpu(cpu)
> +               per_cpu_ptr(pcpu_rwlock->rw_state, cpu)->writer_signal = true;
> +
> +       smp_mb();
> +
> +       /*
> +        * Wait for every reader to see the writer's signal and switch from
> +        * percpu refcounts to global rwlock.
> +        *
> +        * If a reader is still using percpu refcounts, wait for him to switch.
> +        * Else, we can safely go ahead, because either the reader has already
> +        * switched over, or the next reader that comes along on that CPU will
> +        * notice the writer's signal and will switch over to the rwlock.
> +        */
> +
> +       for_each_possible_cpu(cpu) {
> +               while (reader_yet_to_switch(pcpu_rwlock, cpu))
> +                       cpu_relax();
> +       }
> +
> +       smp_mb(); /* Complete the wait-for-readers, before taking the lock */
>         write_lock(&pcpu_rwlock->global_rwlock);
>  }
>
>  void percpu_write_unlock(struct percpu_rwlock *pcpu_rwlock)
>  {
> +       unsigned int cpu;
> +
> +       /* Complete the critical section before clearing ->writer_signal */
> +       smp_mb();
> +
> +       /*
> +        * Inform all readers that we are done, so that they can switch back
> +        * to their per-cpu refcounts. (We don't need to wait for them to
> +        * see it).
> +        */
> +       for_each_possible_cpu(cpu)
> +               per_cpu_ptr(pcpu_rwlock->rw_state, cpu)->writer_signal = false;
> +
>         write_unlock(&pcpu_rwlock->global_rwlock);
>  }
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
@ 2013-02-26 14:17     ` Lai Jiangshan
  0 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-02-26 14:17 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-doc, peterz, fweisbec, linux-kernel, walken, mingo,
	linux-arch, linux, xiaoguangrong, wangyun, paulmck, nikunj,
	linux-pm, rusty, rostedt, rjw, namhyung, tglx, linux-arm-kernel,
	netdev, oleg, vincent.guittot, sbw, tj, akpm, linuxppc-dev

On Mon, Feb 18, 2013 at 8:38 PM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> Using global rwlocks as the backend for per-CPU rwlocks helps us avoid many
> lock-ordering related problems (unlike per-cpu locks). However, global
> rwlocks lead to unnecessary cache-line bouncing even when there are no
> writers present, which can slow down the system needlessly.

per-CPU rwlocks(yours and mine) are the exactly same as rwlock_t in
the view of lock dependency(except reader-C.S. can be nested in
writer-C.S.)
so they can deadlock in this order:

spin_lock(some_lock);				percpu_write_lock_irqsave()
						case CPU_DYING
percpu_read_lock_irqsafe();	<---deadlock--->	spin_lock(some_lock);

The lockdep can find out such dependency, but we must try our best to
find out them before merge the patchset to mainline. We can  review
all the code of cpu_disable() and CPU_DYING and fix this kinds of lock
dependency, but it is not easy thing, it may be a long term project.

======
And if there is any CPU_DYING code takes no locks and do some
works(because they know they are called via stop_machine()) we need to
add that locking code back if there is such code.(I don't know whether
such code exist or not)

>
> Per-cpu counters can help solve the cache-line bouncing problem. So we
> actually use the best of both: per-cpu counters (no-waiting) at the reader
> side in the fast-path, and global rwlocks in the slowpath.
>
> [ Fastpath = no writer is active; Slowpath = a writer is active ]
>
> IOW, the readers just increment/decrement their per-cpu refcounts (disabling
> interrupts during the updates, if necessary) when no writer is active.
> When a writer becomes active, he signals all readers to switch to global
> rwlocks for the duration of his activity. The readers switch over when it
> is safe for them (ie., when they are about to start a fresh, non-nested
> read-side critical section) and start using (holding) the global rwlock for
> read in their subsequent critical sections.
>
> The writer waits for every existing reader to switch, and then acquires the
> global rwlock for write and enters his critical section. Later, the writer
> signals all readers that he is done, and that they can go back to using their
> per-cpu refcounts again.
>
> Note that the lock-safety (despite the per-cpu scheme) comes from the fact
> that the readers can *choose* _when_ to switch to rwlocks upon the writer's
> signal. And the readers don't wait on anybody based on the per-cpu counters.
> The only true synchronization that involves waiting at the reader-side in this
> scheme, is the one arising from the global rwlock, which is safe from circular
> locking dependency issues.
>
> Reader-writer locks and per-cpu counters are recursive, so they can be
> used in a nested fashion in the reader-path, which makes per-CPU rwlocks also
> recursive. Also, this design of switching the synchronization scheme ensures
> that you can safely nest and use these locks in a very flexible manner.
>
> I'm indebted to Michael Wang and Xiao Guangrong for their numerous thoughtful
> suggestions and ideas, which inspired and influenced many of the decisions in
> this as well as previous designs. Thanks a lot Michael and Xiao!
>
> Cc: David Howells <dhowells@redhat.com>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
>
>  lib/percpu-rwlock.c |  139 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 137 insertions(+), 2 deletions(-)
>
> diff --git a/lib/percpu-rwlock.c b/lib/percpu-rwlock.c
> index f938096..edefdea 100644
> --- a/lib/percpu-rwlock.c
> +++ b/lib/percpu-rwlock.c
> @@ -27,6 +27,24 @@
>  #include <linux/percpu-rwlock.h>
>  #include <linux/errno.h>
>
> +#include <asm/processor.h>
> +
> +
> +#define reader_yet_to_switch(pcpu_rwlock, cpu)                             \
> +       (ACCESS_ONCE(per_cpu_ptr((pcpu_rwlock)->rw_state, cpu)->reader_refcnt))
> +
> +#define reader_percpu_nesting_depth(pcpu_rwlock)                 \
> +       (__this_cpu_read((pcpu_rwlock)->rw_state->reader_refcnt))
> +
> +#define reader_uses_percpu_refcnt(pcpu_rwlock)                         \
> +                               reader_percpu_nesting_depth(pcpu_rwlock)
> +
> +#define reader_nested_percpu(pcpu_rwlock)                              \
> +                       (reader_percpu_nesting_depth(pcpu_rwlock) > 1)
> +
> +#define writer_active(pcpu_rwlock)                                     \
> +       (__this_cpu_read((pcpu_rwlock)->rw_state->writer_signal))
> +
>
>  int __percpu_init_rwlock(struct percpu_rwlock *pcpu_rwlock,
>                          const char *name, struct lock_class_key *rwlock_key)
> @@ -55,21 +73,138 @@ void percpu_free_rwlock(struct percpu_rwlock *pcpu_rwlock)
>
>  void percpu_read_lock(struct percpu_rwlock *pcpu_rwlock)
>  {
> -       read_lock(&pcpu_rwlock->global_rwlock);
> +       preempt_disable();
> +
> +       /*
> +        * Let the writer know that a reader is active, even before we choose
> +        * our reader-side synchronization scheme.
> +        */
> +       this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
> +
> +       /*
> +        * If we are already using per-cpu refcounts, it is not safe to switch
> +        * the synchronization scheme. So continue using the refcounts.
> +        */
> +       if (reader_nested_percpu(pcpu_rwlock))
> +               return;
> +
> +       /*
> +        * The write to 'reader_refcnt' must be visible before we read
> +        * 'writer_signal'.
> +        */
> +       smp_mb();
> +
> +       if (likely(!writer_active(pcpu_rwlock))) {
> +               goto out;
> +       } else {
> +               /* Writer is active, so switch to global rwlock. */
> +               read_lock(&pcpu_rwlock->global_rwlock);
> +
> +               /*
> +                * We might have raced with a writer going inactive before we
> +                * took the read-lock. So re-evaluate whether we still need to
> +                * hold the rwlock or if we can switch back to per-cpu
> +                * refcounts. (This also helps avoid heterogeneous nesting of
> +                * readers).
> +                */
> +               if (writer_active(pcpu_rwlock)) {
> +                       /*
> +                        * The above writer_active() check can get reordered
> +                        * with this_cpu_dec() below, but this is OK, because
> +                        * holding the rwlock is conservative.
> +                        */
> +                       this_cpu_dec(pcpu_rwlock->rw_state->reader_refcnt);
> +               } else {
> +                       read_unlock(&pcpu_rwlock->global_rwlock);
> +               }
> +       }
> +
> +out:
> +       /* Prevent reordering of any subsequent reads/writes */
> +       smp_mb();
>  }
>
>  void percpu_read_unlock(struct percpu_rwlock *pcpu_rwlock)
>  {
> -       read_unlock(&pcpu_rwlock->global_rwlock);
> +       /*
> +        * We never allow heterogeneous nesting of readers. So it is trivial
> +        * to find out the kind of reader we are, and undo the operation
> +        * done by our corresponding percpu_read_lock().
> +        */
> +
> +       /* Try to fast-path: a nested percpu reader is the simplest case */
> +       if (reader_nested_percpu(pcpu_rwlock)) {
> +               this_cpu_dec(pcpu_rwlock->rw_state->reader_refcnt);
> +               preempt_enable();
> +               return;
> +       }
> +
> +       /*
> +        * Now we are left with only 2 options: a non-nested percpu reader,
> +        * or a reader holding rwlock
> +        */
> +       if (reader_uses_percpu_refcnt(pcpu_rwlock)) {
> +               /*
> +                * Complete the critical section before decrementing the
> +                * refcnt. We can optimize this away if we are a nested
> +                * reader (the case above).
> +                */
> +               smp_mb();
> +               this_cpu_dec(pcpu_rwlock->rw_state->reader_refcnt);
> +       } else {
> +               read_unlock(&pcpu_rwlock->global_rwlock);
> +       }
> +
> +       preempt_enable();
>  }
>
>  void percpu_write_lock(struct percpu_rwlock *pcpu_rwlock)
>  {
> +       unsigned int cpu;
> +
> +       /*
> +        * Tell all readers that a writer is becoming active, so that they
> +        * start switching over to the global rwlock.
> +        */
> +       for_each_possible_cpu(cpu)
> +               per_cpu_ptr(pcpu_rwlock->rw_state, cpu)->writer_signal = true;
> +
> +       smp_mb();
> +
> +       /*
> +        * Wait for every reader to see the writer's signal and switch from
> +        * percpu refcounts to global rwlock.
> +        *
> +        * If a reader is still using percpu refcounts, wait for him to switch.
> +        * Else, we can safely go ahead, because either the reader has already
> +        * switched over, or the next reader that comes along on that CPU will
> +        * notice the writer's signal and will switch over to the rwlock.
> +        */
> +
> +       for_each_possible_cpu(cpu) {
> +               while (reader_yet_to_switch(pcpu_rwlock, cpu))
> +                       cpu_relax();
> +       }
> +
> +       smp_mb(); /* Complete the wait-for-readers, before taking the lock */
>         write_lock(&pcpu_rwlock->global_rwlock);
>  }
>
>  void percpu_write_unlock(struct percpu_rwlock *pcpu_rwlock)
>  {
> +       unsigned int cpu;
> +
> +       /* Complete the critical section before clearing ->writer_signal */
> +       smp_mb();
> +
> +       /*
> +        * Inform all readers that we are done, so that they can switch back
> +        * to their per-cpu refcounts. (We don't need to wait for them to
> +        * see it).
> +        */
> +       for_each_possible_cpu(cpu)
> +               per_cpu_ptr(pcpu_rwlock->rw_state, cpu)->writer_signal = false;
> +
>         write_unlock(&pcpu_rwlock->global_rwlock);
>  }
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 360+ messages in thread

* [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
@ 2013-02-26 14:17     ` Lai Jiangshan
  0 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-02-26 14:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 18, 2013 at 8:38 PM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> Using global rwlocks as the backend for per-CPU rwlocks helps us avoid many
> lock-ordering related problems (unlike per-cpu locks). However, global
> rwlocks lead to unnecessary cache-line bouncing even when there are no
> writers present, which can slow down the system needlessly.

per-CPU rwlocks(yours and mine) are the exactly same as rwlock_t in
the view of lock dependency(except reader-C.S. can be nested in
writer-C.S.)
so they can deadlock in this order:

spin_lock(some_lock);				percpu_write_lock_irqsave()
						case CPU_DYING
percpu_read_lock_irqsafe();	<---deadlock--->	spin_lock(some_lock);

The lockdep can find out such dependency, but we must try our best to
find out them before merge the patchset to mainline. We can  review
all the code of cpu_disable() and CPU_DYING and fix this kinds of lock
dependency, but it is not easy thing, it may be a long term project.

======
And if there is any CPU_DYING code takes no locks and do some
works(because they know they are called via stop_machine()) we need to
add that locking code back if there is such code.(I don't know whether
such code exist or not)

>
> Per-cpu counters can help solve the cache-line bouncing problem. So we
> actually use the best of both: per-cpu counters (no-waiting) at the reader
> side in the fast-path, and global rwlocks in the slowpath.
>
> [ Fastpath = no writer is active; Slowpath = a writer is active ]
>
> IOW, the readers just increment/decrement their per-cpu refcounts (disabling
> interrupts during the updates, if necessary) when no writer is active.
> When a writer becomes active, he signals all readers to switch to global
> rwlocks for the duration of his activity. The readers switch over when it
> is safe for them (ie., when they are about to start a fresh, non-nested
> read-side critical section) and start using (holding) the global rwlock for
> read in their subsequent critical sections.
>
> The writer waits for every existing reader to switch, and then acquires the
> global rwlock for write and enters his critical section. Later, the writer
> signals all readers that he is done, and that they can go back to using their
> per-cpu refcounts again.
>
> Note that the lock-safety (despite the per-cpu scheme) comes from the fact
> that the readers can *choose* _when_ to switch to rwlocks upon the writer's
> signal. And the readers don't wait on anybody based on the per-cpu counters.
> The only true synchronization that involves waiting at the reader-side in this
> scheme, is the one arising from the global rwlock, which is safe from circular
> locking dependency issues.
>
> Reader-writer locks and per-cpu counters are recursive, so they can be
> used in a nested fashion in the reader-path, which makes per-CPU rwlocks also
> recursive. Also, this design of switching the synchronization scheme ensures
> that you can safely nest and use these locks in a very flexible manner.
>
> I'm indebted to Michael Wang and Xiao Guangrong for their numerous thoughtful
> suggestions and ideas, which inspired and influenced many of the decisions in
> this as well as previous designs. Thanks a lot Michael and Xiao!
>
> Cc: David Howells <dhowells@redhat.com>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
>
>  lib/percpu-rwlock.c |  139 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 137 insertions(+), 2 deletions(-)
>
> diff --git a/lib/percpu-rwlock.c b/lib/percpu-rwlock.c
> index f938096..edefdea 100644
> --- a/lib/percpu-rwlock.c
> +++ b/lib/percpu-rwlock.c
> @@ -27,6 +27,24 @@
>  #include <linux/percpu-rwlock.h>
>  #include <linux/errno.h>
>
> +#include <asm/processor.h>
> +
> +
> +#define reader_yet_to_switch(pcpu_rwlock, cpu)                             \
> +       (ACCESS_ONCE(per_cpu_ptr((pcpu_rwlock)->rw_state, cpu)->reader_refcnt))
> +
> +#define reader_percpu_nesting_depth(pcpu_rwlock)                 \
> +       (__this_cpu_read((pcpu_rwlock)->rw_state->reader_refcnt))
> +
> +#define reader_uses_percpu_refcnt(pcpu_rwlock)                         \
> +                               reader_percpu_nesting_depth(pcpu_rwlock)
> +
> +#define reader_nested_percpu(pcpu_rwlock)                              \
> +                       (reader_percpu_nesting_depth(pcpu_rwlock) > 1)
> +
> +#define writer_active(pcpu_rwlock)                                     \
> +       (__this_cpu_read((pcpu_rwlock)->rw_state->writer_signal))
> +
>
>  int __percpu_init_rwlock(struct percpu_rwlock *pcpu_rwlock,
>                          const char *name, struct lock_class_key *rwlock_key)
> @@ -55,21 +73,138 @@ void percpu_free_rwlock(struct percpu_rwlock *pcpu_rwlock)
>
>  void percpu_read_lock(struct percpu_rwlock *pcpu_rwlock)
>  {
> -       read_lock(&pcpu_rwlock->global_rwlock);
> +       preempt_disable();
> +
> +       /*
> +        * Let the writer know that a reader is active, even before we choose
> +        * our reader-side synchronization scheme.
> +        */
> +       this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
> +
> +       /*
> +        * If we are already using per-cpu refcounts, it is not safe to switch
> +        * the synchronization scheme. So continue using the refcounts.
> +        */
> +       if (reader_nested_percpu(pcpu_rwlock))
> +               return;
> +
> +       /*
> +        * The write to 'reader_refcnt' must be visible before we read
> +        * 'writer_signal'.
> +        */
> +       smp_mb();
> +
> +       if (likely(!writer_active(pcpu_rwlock))) {
> +               goto out;
> +       } else {
> +               /* Writer is active, so switch to global rwlock. */
> +               read_lock(&pcpu_rwlock->global_rwlock);
> +
> +               /*
> +                * We might have raced with a writer going inactive before we
> +                * took the read-lock. So re-evaluate whether we still need to
> +                * hold the rwlock or if we can switch back to per-cpu
> +                * refcounts. (This also helps avoid heterogeneous nesting of
> +                * readers).
> +                */
> +               if (writer_active(pcpu_rwlock)) {
> +                       /*
> +                        * The above writer_active() check can get reordered
> +                        * with this_cpu_dec() below, but this is OK, because
> +                        * holding the rwlock is conservative.
> +                        */
> +                       this_cpu_dec(pcpu_rwlock->rw_state->reader_refcnt);
> +               } else {
> +                       read_unlock(&pcpu_rwlock->global_rwlock);
> +               }
> +       }
> +
> +out:
> +       /* Prevent reordering of any subsequent reads/writes */
> +       smp_mb();
>  }
>
>  void percpu_read_unlock(struct percpu_rwlock *pcpu_rwlock)
>  {
> -       read_unlock(&pcpu_rwlock->global_rwlock);
> +       /*
> +        * We never allow heterogeneous nesting of readers. So it is trivial
> +        * to find out the kind of reader we are, and undo the operation
> +        * done by our corresponding percpu_read_lock().
> +        */
> +
> +       /* Try to fast-path: a nested percpu reader is the simplest case */
> +       if (reader_nested_percpu(pcpu_rwlock)) {
> +               this_cpu_dec(pcpu_rwlock->rw_state->reader_refcnt);
> +               preempt_enable();
> +               return;
> +       }
> +
> +       /*
> +        * Now we are left with only 2 options: a non-nested percpu reader,
> +        * or a reader holding rwlock
> +        */
> +       if (reader_uses_percpu_refcnt(pcpu_rwlock)) {
> +               /*
> +                * Complete the critical section before decrementing the
> +                * refcnt. We can optimize this away if we are a nested
> +                * reader (the case above).
> +                */
> +               smp_mb();
> +               this_cpu_dec(pcpu_rwlock->rw_state->reader_refcnt);
> +       } else {
> +               read_unlock(&pcpu_rwlock->global_rwlock);
> +       }
> +
> +       preempt_enable();
>  }
>
>  void percpu_write_lock(struct percpu_rwlock *pcpu_rwlock)
>  {
> +       unsigned int cpu;
> +
> +       /*
> +        * Tell all readers that a writer is becoming active, so that they
> +        * start switching over to the global rwlock.
> +        */
> +       for_each_possible_cpu(cpu)
> +               per_cpu_ptr(pcpu_rwlock->rw_state, cpu)->writer_signal = true;
> +
> +       smp_mb();
> +
> +       /*
> +        * Wait for every reader to see the writer's signal and switch from
> +        * percpu refcounts to global rwlock.
> +        *
> +        * If a reader is still using percpu refcounts, wait for him to switch.
> +        * Else, we can safely go ahead, because either the reader has already
> +        * switched over, or the next reader that comes along on that CPU will
> +        * notice the writer's signal and will switch over to the rwlock.
> +        */
> +
> +       for_each_possible_cpu(cpu) {
> +               while (reader_yet_to_switch(pcpu_rwlock, cpu))
> +                       cpu_relax();
> +       }
> +
> +       smp_mb(); /* Complete the wait-for-readers, before taking the lock */
>         write_lock(&pcpu_rwlock->global_rwlock);
>  }
>
>  void percpu_write_unlock(struct percpu_rwlock *pcpu_rwlock)
>  {
> +       unsigned int cpu;
> +
> +       /* Complete the critical section before clearing ->writer_signal */
> +       smp_mb();
> +
> +       /*
> +        * Inform all readers that we are done, so that they can switch back
> +        * to their per-cpu refcounts. (We don't need to wait for them to
> +        * see it).
> +        */
> +       for_each_possible_cpu(cpu)
> +               per_cpu_ptr(pcpu_rwlock->rw_state, cpu)->writer_signal = false;
> +
>         write_unlock(&pcpu_rwlock->global_rwlock);
>  }
>
>
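For reference, a minimal sketch of how the writer side above would be used
around the actual CPU-offline work (the caller and the lock instance below
are hypothetical, not taken from the patch):

static struct percpu_rwlock hotplug_pcpu_rwlock;  /* hypothetical instance; init omitted */

static void cpu_teardown_example(void)
{
	/*
	 * Exclude all atomic readers before touching CPU-offline state:
	 * percpu_write_lock() waits for every reader to switch over to the
	 * global rwlock and then takes that rwlock for write.
	 */
	percpu_write_lock(&hotplug_pcpu_rwlock);

	/* ... perform the actual CPU teardown here ... */

	percpu_write_unlock(&hotplug_pcpu_rwlock);
}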

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-26 12:59                       ` Lai Jiangshan
  (?)
@ 2013-02-26 14:22                         ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-26 14:22 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Michel Lespinasse, linux-doc, peterz, fweisbec, linux-kernel,
	namhyung, mingo, linux-arch, linux, xiaoguangrong, wangyun,
	paulmck, nikunj, linux-pm, rusty, rostedt, rjw, vincent.guittot,
	tglx, linux-arm-kernel, netdev, oleg, sbw, tj, akpm,
	linuxppc-dev


Hi Lai,

I'm really not convinced that piggy-backing on lglocks would help
us in any way. But still, let me try to address some of the points
you raised...

On 02/26/2013 06:29 PM, Lai Jiangshan wrote:
> On Tue, Feb 26, 2013 at 5:02 PM, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> On 02/26/2013 05:47 AM, Lai Jiangshan wrote:
>>> On Tue, Feb 26, 2013 at 3:26 AM, Srivatsa S. Bhat
>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>> Hi Lai,
>>>>
>>>> On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
>>>>> Hi, Srivatsa,
>>>>>
>>>>> The target of the whole patchset is nice for me.
>>>>
>>>> Cool! Thanks :-)
>>>>
>> [...]
>>
>> Unfortunately, I see quite a few issues with the code above. IIUC, the
>> writer and the reader both increment the same counters. So how will the
>> unlock() code in the reader path know when to unlock which of the locks?
> 
> The same as your code, the reader(which nested in write C.S.) just dec
> the counters.

And that works fine in my case because the writer and the reader update
_two_ _different_ counters. If both of them update the same counter, there
will be a semantic clash - an increment of the counter can mean either that
a new writer became active or that a nested reader arrived, and a decrement
is similarly ambiguous. That makes it difficult to decide the right action
to take based on the value of the counter alone.
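To make that concrete, the per-CPU state in my scheme keeps the two roles in
two separate fields. A sketch is below - the field names are the ones used in
the patch, but the struct tag and the exact types here are assumed:

	struct rw_state {			/* one instance per CPU */
		unsigned long	reader_refcnt;	/* updated only by readers on this CPU */
		bool		writer_signal;	/* set/cleared only by the writer      */
	};

	/*
	 * reader_refcnt answers "is this CPU inside a (possibly nested)
	 * read-side critical section?", while writer_signal answers "is a
	 * writer active?". Each field has exactly one meaning, so there is
	 * no ambiguity.
	 */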

> 
>> (The counter-dropping-to-zero logic is not safe, since it can be updated
>> due to different reasons). And now that I look at it again, in the absence
>> of the writer, the reader is allowed to be recursive at the heavy cost of
>> taking the global rwlock for read, every 2nd time you nest (because the
>> spinlock is non-recursive).
> 
> (I did not understand your comments of this part)
> nested reader is considered seldom.

No, nested readers can be _quite_ frequent, because potentially all users
of preempt_disable() are readers - and it's well-known how frequently we
nest preempt_disable(). As a simple example, any atomic reader that calls
smp_call_function() becomes a nested reader, because smp_call_function()
is itself a reader. So reader nesting is expected to be quite frequent.
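To illustrate with a sketch (the call-site, the IPI handler and the lock
instance below are placeholders; this assumes smp_call_function() has already
been converted to take the reader-side lock internally, as this series does):

extern struct percpu_rwlock hotplug_pcpu_rwlock;	/* hypothetical instance */

static void some_ipi_func(void *info) { }		/* placeholder IPI handler */

static void my_atomic_reader(void)			/* hypothetical call-site */
{
	percpu_read_lock(&hotplug_pcpu_rwlock);		/* outer, non-nested reader */

	/*
	 * smp_call_function() is itself a reader, so on this CPU it becomes
	 * a nested reader - which is why the nested case must stay as cheap
	 * as a simple this_cpu_inc().
	 */
	smp_call_function(some_ipi_func, NULL, 1);

	percpu_read_unlock(&hotplug_pcpu_rwlock);
}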

> But if N(>=2) nested readers happen,
> the overhead is:
>     1 spin_try_lock() + 1 read_lock() + (N-1) __this_cpu_inc()
> 

In my patch, it's just a this_cpu_inc(). Note that these are _very_ hot paths,
so every bit of optimization you can add is worthwhile.

And your read_lock() is a _global_ lock - thus, it can lead to a lot of
cache-line bouncing. That's *exactly* why I have used per-cpu refcounts in
my synchronization scheme: to avoid taking the global rwlock as much as possible.

Another important point to note is that the overhead we are talking about
here exists even when _not_ performing hotplug. And it is the replacement for
the super-fast preempt_disable(), so it is extremely important to consciously
minimize this overhead - else we'll end up slowing down the system significantly.

>> Also, this lg_rwlock implementation uses 3
>> different data-structures - a per-cpu spinlock, a global rwlock and
>> a per-cpu refcnt, and its not immediately apparent why you need those many
>> or even those many varieties.
> 
> data-structures is the same as yours.
> fallback_reader_refcnt <--> reader_refcnt

This has semantic problems, as noted above.

> per-cpu spinlock <--> write_signal

Acquiring/releasing a (spin) lock is costlier than incrementing/decrementing a counter, IIUC.

> fallback_rwlock  <---> global_rwlock
> 
>> Also I see that this doesn't handle the
>> case of interrupt-handlers also being readers.
> 
> handled. nested reader will see the ref or take the fallback_rwlock
>

I'm not referring to simple nested readers here, but to interrupt handlers that
can act as readers. For starters, the arch_spin_trylock() is not safe when
interrupt handlers can also run the same code, right? You'll need to save
and restore interrupts at critical points in the code. Also, the __foo()
variants used to read/update the counters are not interrupt-safe. And
the unlock() code in the reader path is again going to be confused about
what to do when interrupt handlers interrupt regular readers, due to the
messed-up refcount.
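To spell out the __foo() concern with a sketch (the counter and function below
are illustrative, not fields from either patch; this assumes an architecture
where __this_cpu_inc() expands to a plain load + add + store):

#include <linux/percpu.h>

static DEFINE_PER_CPU(unsigned long, reader_refcnt);	/* illustrative counter */

static void reader_enter(void)
{
	/*
	 * If an interrupt arrives between the load and the store of this
	 * read-modify-write, and the handler runs the same reader path and
	 * inc/decs reader_refcnt, the handler's update is overwritten when
	 * we resume - a lost update, i.e. the messed-up refcount above.
	 */
	__this_cpu_inc(reader_refcnt);		/* not interrupt-safe here */

	/*
	 * this_cpu_inc() (no underscores) avoids this: it is guaranteed to
	 * be safe against interrupts on the local CPU.
	 */
}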
 
>>
>> IMHO, the per-cpu rwlock scheme that I have implemented in this patchset
>> has a clean, understandable design and just enough data-structures/locks
>> to achieve its goal and has several optimizations (like reducing the
>> interrupts-disabled time etc) included - all in a very straight-forward
>> manner. Since this is non-trivial, IMHO, starting from a clean slate is
>> actually better than trying to retrofit the logic into some locking scheme
>> which we actively want to avoid (and hence effectively we aren't even
>> borrowing anything from!).
>>
>> To summarize, if you are just pointing out that we can implement the same
>> logic by altering lglocks, then sure, I acknowledge the possibility.
>> However, I don't think doing that actually makes it better; it either
>> convolutes the logic unnecessarily, or ends up looking _very_ similar to
>> the implementation in this patchset, from what I can see.
>>

 
Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-26 14:17     ` Lai Jiangshan
  (?)
@ 2013-02-26 14:37       ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-26 14:37 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-doc, peterz, fweisbec, linux-kernel, walken, mingo,
	linux-arch, linux, xiaoguangrong, wangyun, paulmck, nikunj,
	linux-pm, rusty, rostedt, rjw, namhyung, tglx, linux-arm-kernel,
	netdev, oleg, vincent.guittot, sbw, tj, akpm, linuxppc-dev

On 02/26/2013 07:47 PM, Lai Jiangshan wrote:
> On Mon, Feb 18, 2013 at 8:38 PM, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> Using global rwlocks as the backend for per-CPU rwlocks helps us avoid many
>> lock-ordering related problems (unlike per-cpu locks). However, global
>> rwlocks lead to unnecessary cache-line bouncing even when there are no
>> writers present, which can slow down the system needlessly.
> 
> per-CPU rwlocks(yours and mine) are the exactly same as rwlock_t in
> the view of lock dependency(except reader-C.S. can be nested in
> writer-C.S.)
> so they can deadlock in this order:
> 
> spin_lock(some_lock);				percpu_write_lock_irqsave()
> 						case CPU_DYING
> percpu_read_lock_irqsafe();	<---deadlock--->	spin_lock(some_lock);
> 

Yes, of course! But this is the most straightforward of cases to fix,
because there is a well-defined number of CPU_DYING notifiers, and the fix
is to make the lock ordering the same on both sides.

The real challenge is with cases like below:

spin_lock(some_lock)			percpu_read_lock_irqsafe()

percpu_read_lock_irqsafe()		spin_lock(some_lock)

The locking primitives (percpu_read_lock/unlock_irqsafe) have been explicitly
designed to keep cases like the above deadlock-free, because that eases
conversions from preempt_disable() to the new locking primitives without
lock-ordering headaches.
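As a sketch of what such a conversion looks like (the functions and the lock
instance below are hypothetical; the series' actual conversions use the
_irqsafe variants, whose exact signature isn't shown here, so the plain
percpu_read_lock()/unlock() from this patch are used for illustration):

extern struct percpu_rwlock hotplug_pcpu_rwlock;	/* hypothetical instance */

/* Before: relies on preempt_disable() to keep the target CPU from going away */
static void kick_cpu_old(int cpu)
{
	preempt_disable();
	if (cpu_online(cpu))
		smp_send_reschedule(cpu);
	preempt_enable();
}

/*
 * After: same structure, only the wrappers change - so the conversion itself
 * introduces no new lock ordering w.r.t. locks taken inside the critical
 * section.
 */
static void kick_cpu_new(int cpu)
{
	percpu_read_lock(&hotplug_pcpu_rwlock);
	if (cpu_online(cpu))
		smp_send_reschedule(cpu);
	percpu_read_unlock(&hotplug_pcpu_rwlock);
}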

> The lockdep can find out such dependency, but we must try our best to
> find out them before merge the patchset to mainline. We can  review
> all the code of cpu_disable() and CPU_DYING and fix this kinds of lock
> dependency, but it is not easy thing, it may be a long term project.
> 

:-)

That's exactly what I have done in this patchset, and that's why there
are 46 patches in this series! :-) I have run this patchset with lockdep
turned on, and verified that there are no locking problems due to the
conversion. I even posted performance numbers from this patchset
(i.e., in comparison to stop_machine()), if you are curious...
http://article.gmane.org/gmane.linux.kernel/1435249

But yes, I'll have to re-verify this because of the new code that went in
during this merge window.

> ======
> And if there is any CPU_DYING code takes no locks and do some
> works(because they know they are called via stop_machine()) we need to
> add that locking code back if there is such code.(I don't know whether
> such code exist or not)
>

Yes, I explicitly verified this too. (I had mentioned this in the cover letter).

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-26 13:34                   ` Lai Jiangshan
@ 2013-02-26 15:17                     ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-26 15:17 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-doc, peterz, fweisbec, linux-kernel, Michel Lespinasse,
	mingo, linux-arch, linux, xiaoguangrong, wangyun, paulmck,
	nikunj, linux-pm, rusty, rostedt, rjw, namhyung, tglx,
	linux-arm-kernel, Lai Jiangshan, netdev, oleg, vincent.guittot,
	sbw, tj, akpm, linuxppc-dev

On 02/26/2013 07:04 PM, Lai Jiangshan wrote:
> On Tue, Feb 26, 2013 at 3:26 AM, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> Hi Lai,
>>
>> On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
>>> Hi, Srivatsa,
>>>
>>> The target of the whole patchset is nice for me.
>>
>> Cool! Thanks :-)
>>
>>> A question: How did you find out the such usages of
>>> "preempt_disable()" and convert them? did all are converted?
>>>
>>
>> Well, I scanned through the source tree for usages which implicitly
>> disabled CPU offline and converted them over.
> 
> How do you scan? could you show the way you scan the source tree.
> I can follow your instructions for double checking.
> 

It's nothing special. I grepped the source tree for anything dealing with
cpu_online_mask or its derivatives, and also for functions/constructs that
rely on the cpumasks internally (e.g., smp_call_function). Then I audited all
such call-sites and converted them (where needed) accordingly.

>> Its not limited to uses
>> of preempt_disable() alone - even spin_locks, rwlocks, local_irq_disable()
>> etc also help disable CPU offline. So I tried to dig out all such uses
>> and converted them. However, since the merge window is open, a lot of
>> new code is flowing into the tree. So I'll have to rescan the tree to
>> see if there are any more places to convert.
> 
> I remember some code has such assumption:
>     preempt_disable() (or something else)
>     //the code assume that the cpu_online_map can't be changed.
>     preempt_enable()
> 
> It is very hard to find out all such kinds of assumptions and fixes them.
> (I notice your code mainly fixes code around send_xxxx())
> 

The conversion can be carried out using the method I mentioned above.

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-26 14:22                         ` Srivatsa S. Bhat
  (?)
@ 2013-02-26 16:25                           ` Lai Jiangshan
  -1 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-02-26 16:25 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Michel Lespinasse, linux-doc, peterz, fweisbec, linux-kernel,
	namhyung, mingo, linux-arch, linux, xiaoguangrong, wangyun,
	paulmck, nikunj, linux-pm, rusty, rostedt, rjw, vincent.guittot,
	tglx, linux-arm-kernel, netdev, oleg, sbw, tj, akpm,
	linuxppc-dev

On Tue, Feb 26, 2013 at 10:22 PM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
>
> Hi Lai,
>
> I'm really not convinced that piggy-backing on lglocks would help
> us in any way. But still, let me try to address some of the points
> you raised...
>
> On 02/26/2013 06:29 PM, Lai Jiangshan wrote:
>> On Tue, Feb 26, 2013 at 5:02 PM, Srivatsa S. Bhat
>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>> On 02/26/2013 05:47 AM, Lai Jiangshan wrote:
>>>> On Tue, Feb 26, 2013 at 3:26 AM, Srivatsa S. Bhat
>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>> Hi Lai,
>>>>>
>>>>> On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
>>>>>> Hi, Srivatsa,
>>>>>>
>>>>>> The target of the whole patchset is nice for me.
>>>>>
>>>>> Cool! Thanks :-)
>>>>>
>>> [...]
>>>
>>> Unfortunately, I see quite a few issues with the code above. IIUC, the
>>> writer and the reader both increment the same counters. So how will the
>>> unlock() code in the reader path know when to unlock which of the locks?
>>
>> The same as your code, the reader(which nested in write C.S.) just dec
>> the counters.
>
> And that works fine in my case because the writer and the reader update
> _two_ _different_ counters.

I can't find any magic in your code; they are the same counter.

        /*
         * It is desirable to allow the writer to acquire the percpu-rwlock
         * for read (if necessary), without deadlocking or getting complaints
         * from lockdep. To achieve that, just increment the reader_refcnt of
         * this CPU - that way, any attempt by the writer to acquire the
         * percpu-rwlock for read, will get treated as a case of nested percpu
         * reader, which is safe, from a locking perspective.
         */
        this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);


> If both of them update the same counter, there
> will be a semantic clash - an increment of the counter can either mean that
> a new writer became active, or it can also indicate a nested reader. A decrement
> can also similarly have 2 meanings. And thus it will be difficult to decide
> the right action to take, based on the value of the counter.
>
>>
>>> (The counter-dropping-to-zero logic is not safe, since it can be updated
>>> due to different reasons). And now that I look at it again, in the absence
>>> of the writer, the reader is allowed to be recursive at the heavy cost of
>>> taking the global rwlock for read, every 2nd time you nest (because the
>>> spinlock is non-recursive).
>>
>> (I did not understand your comments of this part)
>> nested reader is considered seldom.
>
> No, nested readers can be _quite_ frequent. Because, potentially all users
> of preempt_disable() are readers - and its well-known how frequently we
> nest preempt_disable(). As a simple example, any atomic reader who calls
> smp_call_function() will become a nested reader, because smp_call_function()
> itself is a reader. So reader nesting is expected to be quite frequent.
>
>> But if N(>=2) nested readers happen,
>> the overhead is:
>>     1 spin_try_lock() + 1 read_lock() + (N-1) __this_cpu_inc()
>>
>
> In my patch, its just this_cpu_inc(). Note that these are _very_ hot paths.
> So every bit of optimization that you can add is worthwhile.
>
> And your read_lock() is a _global_ lock - thus, it can lead to a lot of
> cache-line bouncing. That's *exactly* why I have used per-cpu refcounts in
> my synchronization scheme, to avoid taking the global rwlock as much as possible.
>
> Another important point to note is that, the overhead we are talking about
> here, exists even when _not_ performing hotplug. And its the replacement to
> the super-fast preempt_disable(). So its extremely important to consciously
> minimize this overhead - else we'll end up slowing down the system significantly.
>

All I assumed is that "nested readers are seldom", so I always
fall back to the rwlock when nested.
If you like, I can add 6 lines of code, and the overhead becomes
1 spin_try_lock() (fast path) + N __this_cpu_inc().

The overhead of your code is
2 smp_mb() + N __this_cpu_inc().

I don't see much difference.

>>> Also, this lg_rwlock implementation uses 3
>>> different data-structures - a per-cpu spinlock, a global rwlock and
>>> a per-cpu refcnt, and its not immediately apparent why you need those many
>>> or even those many varieties.
>>
>> data-structures is the same as yours.
>> fallback_reader_refcnt <--> reader_refcnt
>
> This has semantic problems, as noted above.
>
>> per-cpu spinlock <--> write_signal
>
> Acquire/release of (spin) lock is costlier than inc/dec of a counter, IIUC.
>
>> fallback_rwlock  <---> global_rwlock
>>
>>> Also I see that this doesn't handle the
>>> case of interrupt-handlers also being readers.
>>
>> handled. nested reader will see the ref or take the fallback_rwlock
>>

Sorry, I meant: a _reentrant_ read_lock() will see the ref or take the fallback_rwlock.

>
> I'm not referring to simple nested readers here, but interrupt handlers who
> can act as readers. For starters, the arch_spin_trylock() is not safe when
> interrupt handlers can also run the same code, right? You'll need to save
> and restore interrupts at critical points in the code. Also, the __foo()
> variants used to read/update the counters are not interrupt-safe.

I must have missed something.

Could you elaborate on why arch_spin_trylock() is not safe when
interrupt handlers can also run the same code?

And could you elaborate on why the __this_cpu_op variants are not
interrupt-safe, given that they are always called in pairs?


> And,
> the unlock() code in the reader path is again going to be confused about
> what to do when interrupt handlers interrupt regular readers, due to the
> messed up refcount.

I still can't understand.

>
>>>
>>> IMHO, the per-cpu rwlock scheme that I have implemented in this patchset
>>> has a clean, understandable design and just enough data-structures/locks
>>> to achieve its goal and has several optimizations (like reducing the
>>> interrupts-disabled time etc) included - all in a very straight-forward
>>> manner. Since this is non-trivial, IMHO, starting from a clean slate is
>>> actually better than trying to retrofit the logic into some locking scheme
>>> which we actively want to avoid (and hence effectively we aren't even
>>> borrowing anything from!).
>>>
>>> To summarize, if you are just pointing out that we can implement the same
>>> logic by altering lglocks, then sure, I acknowledge the possibility.
>>> However, I don't think doing that actually makes it better; it either
>>> convolutes the logic unnecessarily, or ends up looking _very_ similar to
>>> the implementation in this patchset, from what I can see.
>>>
>

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-26 16:25                           ` Lai Jiangshan
  (?)
@ 2013-02-26 19:30                             ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-26 19:30 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Michel Lespinasse, linux-doc, peterz, fweisbec, linux-kernel,
	namhyung, mingo, linux-arch, linux, xiaoguangrong, wangyun,
	paulmck, nikunj, linux-pm, rusty, rostedt, rjw, vincent.guittot,
	tglx, linux-arm-kernel, netdev, oleg, sbw, tj, akpm,
	linuxppc-dev

On 02/26/2013 09:55 PM, Lai Jiangshan wrote:
> On Tue, Feb 26, 2013 at 10:22 PM, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>
>> Hi Lai,
>>
>> I'm really not convinced that piggy-backing on lglocks would help
>> us in any way. But still, let me try to address some of the points
>> you raised...
>>
>> On 02/26/2013 06:29 PM, Lai Jiangshan wrote:
>>> On Tue, Feb 26, 2013 at 5:02 PM, Srivatsa S. Bhat
>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>> On 02/26/2013 05:47 AM, Lai Jiangshan wrote:
>>>>> On Tue, Feb 26, 2013 at 3:26 AM, Srivatsa S. Bhat
>>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>>> Hi Lai,
>>>>>>
>>>>>> On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
>>>>>>> Hi, Srivatsa,
>>>>>>>
>>>>>>> The target of the whole patchset is nice for me.
>>>>>>
>>>>>> Cool! Thanks :-)
>>>>>>
>>>> [...]
>>>>
>>>> Unfortunately, I see quite a few issues with the code above. IIUC, the
>>>> writer and the reader both increment the same counters. So how will the
>>>> unlock() code in the reader path know when to unlock which of the locks?
>>>
>>> The same as your code, the reader(which nested in write C.S.) just dec
>>> the counters.
>>
>> And that works fine in my case because the writer and the reader update
>> _two_ _different_ counters.
> 
> I can't find any magic in your code, they are the same counter.
> 
>         /*
>          * It is desirable to allow the writer to acquire the percpu-rwlock
>          * for read (if necessary), without deadlocking or getting complaints
>          * from lockdep. To achieve that, just increment the reader_refcnt of
>          * this CPU - that way, any attempt by the writer to acquire the
>          * percpu-rwlock for read, will get treated as a case of nested percpu
>          * reader, which is safe, from a locking perspective.
>          */
>         this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
>

Whoa! Hold on, were you really referring to _this_ increment when you said
that your patch would increment the refcnt at the writer? Then I guess
there is a major disconnect in our conversations. (I had assumed that you were
referring to the update of writer_signal, and were just trying to have a single
refcnt instead of reader_refcnt and writer_signal.)

So, please let me clarify things a bit here. Forget about the above increment
of reader_refcnt at the writer side. It is almost utterly insignificant for our
current discussion. We can simply replace it with a check at the reader side,
as shown below:

void percpu_read_lock_irqsafe()
{
	if (current == active_writer)
		return;

	/* Rest of the code */
}

Now, assuming that, in your patch, you were trying to use the per-cpu refcnt
to allow the writer to safely take the reader path, you can simply get rid of
that percpu-refcnt, as demonstrated above.

So that would reduce your code to the following (after simplification):

lg_rwlock_local_read_lock()
{
	if (current == active_writer)
		return;
	if (arch_spin_trylock(per-cpu-spinlock))
		return;
	read_lock(global-rwlock);
}

Now, let us assume that hotplug is not happening, meaning nobody is running
the writer-side code, and let's see what happens at the reader side in your
patch. As I mentioned earlier, the readers are _very_ frequent and can be in
very hot paths. And they also happen to be nested quite often.

So, a non-nested reader acquires the per-cpu spinlock. Every subsequent nested
reader on that CPU has to acquire the global rwlock for read. Right there you
have 2 significant performance issues -
1. Acquiring the (spin) lock is costly
2. Acquiring the global rwlock causes cacheline bouncing, which hurts
   performance.

And why do we care so much about performance here? Because the existing
kernel does an efficient preempt_disable() here - which is an optimized
per-cpu counter increment. Replacing that with such heavy primitives on the
reader side can be very bad.
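
(For reference, with CONFIG_PREEMPT, preempt_disable() boils down to roughly
the following - a from-memory sketch of the cost, not the literal definition:

	inc_preempt_count();	/* bump the preempt counter */
	barrier();		/* compiler barrier only, no smp_mb() */

so anything noticeably heavier than a plain counter increment is visible on
this path.)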

Now, how does my patchset tackle this? My scheme just requires an increment
of a per-cpu refcnt (reader_refcnt) and a memory barrier, which is acceptable
from a performance perspective because, IMHO, it's not horrendously worse than
a preempt_disable().
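
Roughly speaking, the reader fast path I have in mind looks like the sketch
below (the field names and the slow-path handling are simplified stand-ins
here, not the literal code from patch 04/46):

void percpu_read_lock_irqsafe(struct percpu_rwlock *pcpu_rwlock)
{
	/* Fast path: one per-cpu increment plus a barrier. */
	this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
	smp_mb();	/* pairs with the writer's barriers */

	if (likely(!this_cpu_read(pcpu_rwlock->rw_state->writer_signal)))
		return;

	/*
	 * Slow path (a writer is signalling): back out of the fast path
	 * and use the global rwlock instead. The matching unlock undoes
	 * whichever of the two was actually taken.
	 */
	this_cpu_dec(pcpu_rwlock->rw_state->reader_refcnt);
	read_lock(&pcpu_rwlock->global_rwlock);
}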

> 
>> If both of them update the same counter, there
>> will be a semantic clash - an increment of the counter can either mean that
>> a new writer became active, or it can also indicate a nested reader. A decrement
>> can also similarly have 2 meanings. And thus it will be difficult to decide
>> the right action to take, based on the value of the counter.
>>
>>>
>>>> (The counter-dropping-to-zero logic is not safe, since it can be updated
>>>> due to different reasons). And now that I look at it again, in the absence
>>>> of the writer, the reader is allowed to be recursive at the heavy cost of
>>>> taking the global rwlock for read, every 2nd time you nest (because the
>>>> spinlock is non-recursive).
>>>
>>> (I did not understand your comments of this part)
>>> nested reader is considered seldom.
>>
>> No, nested readers can be _quite_ frequent. Because, potentially all users
>> of preempt_disable() are readers - and its well-known how frequently we
>> nest preempt_disable(). As a simple example, any atomic reader who calls
>> smp_call_function() will become a nested reader, because smp_call_function()
>> itself is a reader. So reader nesting is expected to be quite frequent.
>>
>>> But if N(>=2) nested readers happen,
>>> the overhead is:
>>>     1 spin_try_lock() + 1 read_lock() + (N-1) __this_cpu_inc()
>>>
>>
>> In my patch, its just this_cpu_inc(). Note that these are _very_ hot paths.
>> So every bit of optimization that you can add is worthwhile.
>>
>> And your read_lock() is a _global_ lock - thus, it can lead to a lot of
>> cache-line bouncing. That's *exactly* why I have used per-cpu refcounts in
>> my synchronization scheme, to avoid taking the global rwlock as much as possible.
>>
>> Another important point to note is that, the overhead we are talking about
>> here, exists even when _not_ performing hotplug. And its the replacement to
>> the super-fast preempt_disable(). So its extremely important to consciously
>> minimize this overhead - else we'll end up slowing down the system significantly.
>>
> 
> All I was considered is "nested reader is seldom", so I always
> fallback to rwlock when nested.
> If you like, I can add 6 lines of code, the overhead is
> 1 spin_try_lock()(fast path)  + N  __this_cpu_inc()
> 

I'm assuming that calculation is no longer valid, considering that
we just discussed how the per-cpu refcnt that you were using is quite
unnecessary and can be removed.

IIUC, the overhead with your code, as per above discussion would be:
1 spin_try_lock() [non-nested] + N read_lock(global_rwlock).

Note that I'm referring to the scenario when hotplug is _not_ happening
(ie., nobody is running writer side code).

> The overhead of your code is
> 2 smp_mb() + N __this_cpu_inc()
> 

Right. And as you can see, this is much much better than the overhead
shown above.

> I don't see how much different.
> 
>>>> Also, this lg_rwlock implementation uses 3
>>>> different data-structures - a per-cpu spinlock, a global rwlock and
>>>> a per-cpu refcnt, and its not immediately apparent why you need those many
>>>> or even those many varieties.
>>>
>>> data-structures is the same as yours.
>>> fallback_reader_refcnt <--> reader_refcnt
>>
>> This has semantic problems, as noted above.
>>
>>> per-cpu spinlock <--> write_signal
>>
>> Acquire/release of (spin) lock is costlier than inc/dec of a counter, IIUC.
>>
>>> fallback_rwlock  <---> global_rwlock
>>>
>>>> Also I see that this doesn't handle the
>>>> case of interrupt-handlers also being readers.
>>>
>>> handled. nested reader will see the ref or take the fallback_rwlock
>>>
> 
> Sorry, _reentrance_ read_lock() will see the ref or take the fallback_rwlock
> 
>>
>> I'm not referring to simple nested readers here, but interrupt handlers who
>> can act as readers. For starters, the arch_spin_trylock() is not safe when
>> interrupt handlers can also run the same code, right? You'll need to save
>> and restore interrupts at critical points in the code. Also, the __foo()
>> variants used to read/update the counters are not interrupt-safe.
> 
> I must missed something.
> 
> Could you elaborate more why arch_spin_trylock() is not safe when
> interrupt handlers can also run the same code?
> 
> Could you elaborate more why __this_cpu_op variants is not
> interrupt-safe since they are always called paired.
>

Take a look at include/linux/percpu.h. You'll note that __this_cpu_*
operations map to __this_cpu_generic_to_op(), which doesn't disable interrupts
while doing the update. Hence you can get inconsistent results if an interrupt
hits the CPU at that time and the interrupt handler tries to do the same thing.
In contrast, if you use this_cpu_inc() for example, interrupts are explicitly
disabled during the update and hence you won't get inconsistent results.
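
To make it concrete, here is a hypothetical interleaving (with a placeholder
per-cpu counter 'cnt', assuming the generic, non-atomic implementation of
__this_cpu_inc()):

	tmp = __this_cpu_read(cnt);	/* reads, say, 5                   */
					/* <-- irq hits; the handler also  */
					/*     does __this_cpu_inc(cnt),   */
					/*     making cnt 6                */
	__this_cpu_write(cnt, tmp + 1);	/* writes 6; the handler's         */
					/* increment is silently lost      */

	/*
	 * this_cpu_inc(cnt), in contrast, performs the same read-modify-write
	 * with interrupts disabled (or as a single irq-safe instruction), so
	 * the update cannot be torn like this.
	 */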

> 
>> And,
>> the unlock() code in the reader path is again going to be confused about
>> what to do when interrupt handlers interrupt regular readers, due to the
>> messed up refcount.
> 
> I still can't understand.
>

The primary reason _I_ was using the refcnt and the reason _you_ were using it
appear to be very different. Maybe that's why the above statement
didn't make sense. In your case, IIUC, you can simply get rid of the refcnt
and replace it with the simple check I mentioned above, whereas I use
refcnts to keep the reader-side synchronization fast (and for reader-writer
communication).

 
Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-26 19:30                             ` Srivatsa S. Bhat
  (?)
@ 2013-02-27  0:33                               ` Lai Jiangshan
  -1 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-02-27  0:33 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Michel Lespinasse, linux-doc, peterz, fweisbec, linux-kernel,
	namhyung, mingo, linux-arch, linux, xiaoguangrong, wangyun,
	paulmck, nikunj, linux-pm, rusty, rostedt, rjw, vincent.guittot,
	tglx, linux-arm-kernel, netdev, oleg, sbw, tj, akpm,
	linuxppc-dev

On Wed, Feb 27, 2013 at 3:30 AM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 02/26/2013 09:55 PM, Lai Jiangshan wrote:
>> On Tue, Feb 26, 2013 at 10:22 PM, Srivatsa S. Bhat
>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>
>>> Hi Lai,
>>>
>>> I'm really not convinced that piggy-backing on lglocks would help
>>> us in any way. But still, let me try to address some of the points
>>> you raised...
>>>
>>> On 02/26/2013 06:29 PM, Lai Jiangshan wrote:
>>>> On Tue, Feb 26, 2013 at 5:02 PM, Srivatsa S. Bhat
>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>> On 02/26/2013 05:47 AM, Lai Jiangshan wrote:
>>>>>> On Tue, Feb 26, 2013 at 3:26 AM, Srivatsa S. Bhat
>>>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>>>> Hi Lai,
>>>>>>>
>>>>>>> On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
>>>>>>>> Hi, Srivatsa,
>>>>>>>>
>>>>>>>> The target of the whole patchset is nice for me.
>>>>>>>
>>>>>>> Cool! Thanks :-)
>>>>>>>
>>>>> [...]
>>>>>
>>>>> Unfortunately, I see quite a few issues with the code above. IIUC, the
>>>>> writer and the reader both increment the same counters. So how will the
>>>>> unlock() code in the reader path know when to unlock which of the locks?
>>>>
>>>> The same as your code, the reader(which nested in write C.S.) just dec
>>>> the counters.
>>>
>>> And that works fine in my case because the writer and the reader update
>>> _two_ _different_ counters.
>>
>> I can't find any magic in your code, they are the same counter.
>>
>>         /*
>>          * It is desirable to allow the writer to acquire the percpu-rwlock
>>          * for read (if necessary), without deadlocking or getting complaints
>>          * from lockdep. To achieve that, just increment the reader_refcnt of
>>          * this CPU - that way, any attempt by the writer to acquire the
>>          * percpu-rwlock for read, will get treated as a case of nested percpu
>>          * reader, which is safe, from a locking perspective.
>>          */
>>         this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
>>
>
> Whoa! Hold on, were you really referring to _this_ increment when you said
> that, in your patch you would increment the refcnt at the writer? Then I guess
> there is a major disconnect in our conversations. (I had assumed that you were
> referring to the update of writer_signal, and were just trying to have a single
> refcnt instead of reader_refcnt and writer_signal).

https://github.com/laijs/linux/commit/53e5053d5b724bea7c538b11743d0f420d98f38d

Sorry, the name "fallback_reader_refcnt" misled you.

>
> So, please let me clarify things a bit here. Forget about the above increment
> of reader_refcnt at the writer side. Its almost utterly insignificant for our
> current discussion. We can simply replace it with a check as shown below, at
> the reader side:
>
> void percpu_read_lock_irqsafe()
> {
>         if (current == active_writer)
>                 return;
>
>         /* Rest of the code */
> }
>
> Now, assuming that, in your patch, you were trying to use the per-cpu refcnt
> to allow the writer to safely take the reader path, you can simply get rid of
> that percpu-refcnt, as demonstrated above.
>
> So that would reduce your code to the following (after simplification):
>
> lg_rwlock_local_read_lock()
> {
>         if (current == active_writer)
>                 return;
>         if (arch_spin_trylock(per-cpu-spinlock))
>                 return;
>         read_lock(global-rwlock);
> }
>
> Now, let us assume that hotplug is not happening, meaning, nobody is running
> the writer side code. Now let's see what happens at the reader side in your
> patch. As I mentioned earlier, the readers are _very_ frequent and can be in
> very hot paths. And they also happen to be nested quite often.
>
> So, a non-nested reader acquires the per-cpu spinlock. Every subsequent nested
> reader on that CPU has to acquire the global rwlock for read. Right there you
> have 2 significant performance issues -
> 1. Acquiring the (spin) lock is costly
> 2. Acquiring the global rwlock causes cacheline bouncing, which hurts
>    performance.
>
> And why do we care so much about performance here? Because, the existing
> kernel does an efficient preempt_disable() here - which is an optimized
> per-cpu counter increment. Replacing that with such heavy primitives on the
> reader side can be very bad.
>
> Now, how does my patchset tackle this? My scheme just requires an increment
> of a per-cpu refcnt (reader_refcnt) and memory barrier. Which is acceptable
> from a performance-perspective, because IMHO its not horrendously worse than
> a preempt_disable().
>
>>
>>> If both of them update the same counter, there
>>> will be a semantic clash - an increment of the counter can either mean that
>>> a new writer became active, or it can also indicate a nested reader. A decrement
>>> can also similarly have 2 meanings. And thus it will be difficult to decide
>>> the right action to take, based on the value of the counter.
>>>
>>>>
>>>>> (The counter-dropping-to-zero logic is not safe, since it can be updated
>>>>> due to different reasons). And now that I look at it again, in the absence
>>>>> of the writer, the reader is allowed to be recursive at the heavy cost of
>>>>> taking the global rwlock for read, every 2nd time you nest (because the
>>>>> spinlock is non-recursive).
>>>>
>>>> (I did not understand your comments of this part)
>>>> nested reader is considered seldom.
>>>
>>> No, nested readers can be _quite_ frequent. Because, potentially all users
>>> of preempt_disable() are readers - and its well-known how frequently we
>>> nest preempt_disable(). As a simple example, any atomic reader who calls
>>> smp_call_function() will become a nested reader, because smp_call_function()
>>> itself is a reader. So reader nesting is expected to be quite frequent.
>>>
>>>> But if N(>=2) nested readers happen,
>>>> the overhead is:
>>>>     1 spin_try_lock() + 1 read_lock() + (N-1) __this_cpu_inc()
>>>>
>>>
>>> In my patch, its just this_cpu_inc(). Note that these are _very_ hot paths.
>>> So every bit of optimization that you can add is worthwhile.
>>>
>>> And your read_lock() is a _global_ lock - thus, it can lead to a lot of
>>> cache-line bouncing. That's *exactly* why I have used per-cpu refcounts in
>>> my synchronization scheme, to avoid taking the global rwlock as much as possible.
>>>
>>> Another important point to note is that, the overhead we are talking about
>>> here, exists even when _not_ performing hotplug. And its the replacement to
>>> the super-fast preempt_disable(). So its extremely important to consciously
>>> minimize this overhead - else we'll end up slowing down the system significantly.
>>>
>>
>> All I was considered is "nested reader is seldom", so I always
>> fallback to rwlock when nested.
>> If you like, I can add 6 lines of code, the overhead is
>> 1 spin_try_lock()(fast path)  + N  __this_cpu_inc()
>>
>
> I'm assuming that calculation is no longer valid, considering that
> we just discussed how the per-cpu refcnt that you were using is quite
> unnecessary and can be removed.
>
> IIUC, the overhead with your code, as per above discussion would be:
> 1 spin_try_lock() [non-nested] + N read_lock(global_rwlock).

https://github.com/laijs/linux/commit/46334544bb7961550b7065e015da76f6dab21f16

Again, I'm so sorry that the name "fallback_reader_refcnt" misled you.

>
> Note that I'm referring to the scenario when hotplug is _not_ happening
> (ie., nobody is running writer side code).
>
>> The overhead of your code is
>> 2 smp_mb() + N __this_cpu_inc()
>>
>
> Right. And as you can see, this is much much better than the overhead
> shown above.

I will write a test to compare it to "1 spin_try_lock() (fast path) +
N __this_cpu_inc()".
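
Probably something simple along these lines, just to get rough per-iteration
numbers (the lock/unlock names, loop count and clock source below are
placeholders, not settled choices):

	u64 t0, t1, delta;
	int i;

	t0 = sched_clock();
	for (i = 0; i < 1000000; i++) {
		lg_rwlock_local_read_lock(&hotplug_lgrwlock);
		lg_rwlock_local_read_unlock(&hotplug_lgrwlock);
	}
	t1 = sched_clock();

	delta = t1 - t0;
	do_div(delta, 1000000);
	pr_info("lg_rwlock read lock+unlock: %llu ns avg\n",
		(unsigned long long)delta);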

>
>> I don't see how much different.
>>
>>>>> Also, this lg_rwlock implementation uses 3
>>>>> different data-structures - a per-cpu spinlock, a global rwlock and
>>>>> a per-cpu refcnt, and its not immediately apparent why you need those many
>>>>> or even those many varieties.
>>>>
>>>> data-structures is the same as yours.
>>>> fallback_reader_refcnt <--> reader_refcnt
>>>
>>> This has semantic problems, as noted above.
>>>
>>>> per-cpu spinlock <--> write_signal
>>>
>>> Acquire/release of (spin) lock is costlier than inc/dec of a counter, IIUC.
>>>
>>>> fallback_rwlock  <---> global_rwlock
>>>>
>>>>> Also I see that this doesn't handle the
>>>>> case of interrupt-handlers also being readers.
>>>>
>>>> handled. nested reader will see the ref or take the fallback_rwlock
>>>>
>>
>> Sorry, _reentrance_ read_lock() will see the ref or take the fallback_rwlock
>>
>>>
>>> I'm not referring to simple nested readers here, but interrupt handlers who
>>> can act as readers. For starters, the arch_spin_trylock() is not safe when
>>> interrupt handlers can also run the same code, right? You'll need to save
>>> and restore interrupts at critical points in the code. Also, the __foo()
>>> variants used to read/update the counters are not interrupt-safe.
>>
>> I must missed something.
>>
>> Could you elaborate more why arch_spin_trylock() is not safe when
>> interrupt handlers can also run the same code?
>>
>> Could you elaborate more why __this_cpu_op variants is not
>> interrupt-safe since they are always called paired.
>>
>
> Take a look at include/linux/percpu.h. You'll note that __this_cpu_*
> operations map to __this_cpu_generic_to_op(), which doesn't disable interrupts
> while doing the update. Hence you can get inconsistent results if an interrupt
> hits the CPU at that time and the interrupt handler tries to do the same thing.
> In contrast, if you use this_cpu_inc() for example, interrupts are explicitly
> disabled during the update and hence you won't get inconsistent results.


xx_lock()/xx_unlock() must always be called in pairs, so if an interrupt
happens in between, the value of the data is restored by the time the
interrupt returns.

For the same reason, preempt_disable() by itself is not irq-safe, but since
preempt_disable()/preempt_enable() are always called in pairs, the pair as a
whole is irq-safe.
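
For example (again with a placeholder per-cpu counter 'cnt'), even if an
interrupt lands in the middle of __this_cpu_inc():

	tmp = __this_cpu_read(cnt);	/* reads, say, 5                    */
					/* <-- irq: the handler's read_lock */
					/*     does __this_cpu_inc(cnt) and */
					/*     its read_unlock does         */
					/*     __this_cpu_dec(cnt), so cnt  */
					/*     is 5 again when we resume    */
	__this_cpu_write(cnt, tmp + 1);	/* writes 6, which is still correct */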

>
>>
>>> And,
>>> the unlock() code in the reader path is again going to be confused about
>>> what to do when interrupt handlers interrupt regular readers, due to the
>>> messed up refcount.
>>
>> I still can't understand.
>>
>
> The primary reason _I_ was using the refcnt vs the reason _you_ were using the
> refcnt, appears to be very different.  Maybe that's why the above statement
> didn't make sense. In your case, IIUC, you can simply get rid of the refcnt
> and replace it with the simple check I mentioned above. Whereas, I use
> refcnts to keep the reader-side synchronization fast (and for reader-writer
> communication).
>
>
> Regards,
> Srivatsa S. Bhat
>

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
@ 2013-02-27  0:33                               ` Lai Jiangshan
  0 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-02-27  0:33 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-doc, peterz, fweisbec, linux-kernel, Michel Lespinasse,
	mingo, linux-arch, linux, xiaoguangrong, wangyun, paulmck,
	nikunj, linux-pm, rusty, rostedt, rjw, namhyung, tglx,
	linux-arm-kernel, netdev, oleg, vincent.guittot, sbw, tj, akpm,
	linuxppc-dev

On Wed, Feb 27, 2013 at 3:30 AM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 02/26/2013 09:55 PM, Lai Jiangshan wrote:
>> On Tue, Feb 26, 2013 at 10:22 PM, Srivatsa S. Bhat
>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>
>>> Hi Lai,
>>>
>>> I'm really not convinced that piggy-backing on lglocks would help
>>> us in any way. But still, let me try to address some of the points
>>> you raised...
>>>
>>> On 02/26/2013 06:29 PM, Lai Jiangshan wrote:
>>>> On Tue, Feb 26, 2013 at 5:02 PM, Srivatsa S. Bhat
>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>> On 02/26/2013 05:47 AM, Lai Jiangshan wrote:
>>>>>> On Tue, Feb 26, 2013 at 3:26 AM, Srivatsa S. Bhat
>>>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>>>> Hi Lai,
>>>>>>>
>>>>>>> On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
>>>>>>>> Hi, Srivatsa,
>>>>>>>>
>>>>>>>> The target of the whole patchset is nice for me.
>>>>>>>
>>>>>>> Cool! Thanks :-)
>>>>>>>
>>>>> [...]
>>>>>
>>>>> Unfortunately, I see quite a few issues with the code above. IIUC, the
>>>>> writer and the reader both increment the same counters. So how will the
>>>>> unlock() code in the reader path know when to unlock which of the locks?
>>>>
>>>> The same as your code, the reader(which nested in write C.S.) just dec
>>>> the counters.
>>>
>>> And that works fine in my case because the writer and the reader update
>>> _two_ _different_ counters.
>>
>> I can't find any magic in your code, they are the same counter.
>>
>>         /*
>>          * It is desirable to allow the writer to acquire the percpu-rwlock
>>          * for read (if necessary), without deadlocking or getting complaints
>>          * from lockdep. To achieve that, just increment the reader_refcnt of
>>          * this CPU - that way, any attempt by the writer to acquire the
>>          * percpu-rwlock for read, will get treated as a case of nested percpu
>>          * reader, which is safe, from a locking perspective.
>>          */
>>         this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
>>
>
> Whoa! Hold on, were you really referring to _this_ increment when you said
> that, in your patch you would increment the refcnt at the writer? Then I guess
> there is a major disconnect in our conversations. (I had assumed that you were
> referring to the update of writer_signal, and were just trying to have a single
> refcnt instead of reader_refcnt and writer_signal).

https://github.com/laijs/linux/commit/53e5053d5b724bea7c538b11743d0f420d98f38d

Sorry the name "fallback_reader_refcnt" misled you.

>
> So, please let me clarify things a bit here. Forget about the above increment
> of reader_refcnt at the writer side. Its almost utterly insignificant for our
> current discussion. We can simply replace it with a check as shown below, at
> the reader side:
>
> void percpu_read_lock_irqsafe()
> {
>         if (current == active_writer)
>                 return;
>
>         /* Rest of the code */
> }
>
> Now, assuming that, in your patch, you were trying to use the per-cpu refcnt
> to allow the writer to safely take the reader path, you can simply get rid of
> that percpu-refcnt, as demonstrated above.
>
> So that would reduce your code to the following (after simplification):
>
> lg_rwlock_local_read_lock()
> {
>         if (current == active_writer)
>                 return;
>         if (arch_spin_trylock(per-cpu-spinlock))
>                 return;
>         read_lock(global-rwlock);
> }
>
> Now, let us assume that hotplug is not happening, meaning, nobody is running
> the writer side code. Now let's see what happens at the reader side in your
> patch. As I mentioned earlier, the readers are _very_ frequent and can be in
> very hot paths. And they also happen to be nested quite often.
>
> So, a non-nested reader acquires the per-cpu spinlock. Every subsequent nested
> reader on that CPU has to acquire the global rwlock for read. Right there you
> have 2 significant performance issues -
> 1. Acquiring the (spin) lock is costly
> 2. Acquiring the global rwlock causes cacheline bouncing, which hurts
>    performance.
>
> And why do we care so much about performance here? Because, the existing
> kernel does an efficient preempt_disable() here - which is an optimized
> per-cpu counter increment. Replacing that with such heavy primitives on the
> reader side can be very bad.
>
> Now, how does my patchset tackle this? My scheme just requires an increment
> of a per-cpu refcnt (reader_refcnt) and memory barrier. Which is acceptable
> from a performance-perspective, because IMHO its not horrendously worse than
> a preempt_disable().
>
>>
>>> If both of them update the same counter, there
>>> will be a semantic clash - an increment of the counter can either mean that
>>> a new writer became active, or it can also indicate a nested reader. A decrement
>>> can also similarly have 2 meanings. And thus it will be difficult to decide
>>> the right action to take, based on the value of the counter.
>>>
>>>>
>>>>> (The counter-dropping-to-zero logic is not safe, since it can be updated
>>>>> due to different reasons). And now that I look at it again, in the absence
>>>>> of the writer, the reader is allowed to be recursive at the heavy cost of
>>>>> taking the global rwlock for read, every 2nd time you nest (because the
>>>>> spinlock is non-recursive).
>>>>
>>>> (I did not understand your comments of this part)
>>>> nested reader is considered seldom.
>>>
>>> No, nested readers can be _quite_ frequent. Because, potentially all users
>>> of preempt_disable() are readers - and its well-known how frequently we
>>> nest preempt_disable(). As a simple example, any atomic reader who calls
>>> smp_call_function() will become a nested reader, because smp_call_function()
>>> itself is a reader. So reader nesting is expected to be quite frequent.
>>>
>>>> But if N(>=2) nested readers happen,
>>>> the overhead is:
>>>>     1 spin_try_lock() + 1 read_lock() + (N-1) __this_cpu_inc()
>>>>
>>>
>>> In my patch, its just this_cpu_inc(). Note that these are _very_ hot paths.
>>> So every bit of optimization that you can add is worthwhile.
>>>
>>> And your read_lock() is a _global_ lock - thus, it can lead to a lot of
>>> cache-line bouncing. That's *exactly* why I have used per-cpu refcounts in
>>> my synchronization scheme, to avoid taking the global rwlock as much as possible.
>>>
>>> Another important point to note is that, the overhead we are talking about
>>> here, exists even when _not_ performing hotplug. And its the replacement to
>>> the super-fast preempt_disable(). So its extremely important to consciously
>>> minimize this overhead - else we'll end up slowing down the system significantly.
>>>
>>
>> All I was considered is "nested reader is seldom", so I always
>> fallback to rwlock when nested.
>> If you like, I can add 6 lines of code, the overhead is
>> 1 spin_try_lock()(fast path)  + N  __this_cpu_inc()
>>
>
> I'm assuming that calculation is no longer valid, considering that
> we just discussed how the per-cpu refcnt that you were using is quite
> unnecessary and can be removed.
>
> IIUC, the overhead with your code, as per above discussion would be:
> 1 spin_try_lock() [non-nested] + N read_lock(global_rwlock).

https://github.com/laijs/linux/commit/46334544bb7961550b7065e015da76f6dab21f16

Again, I'm so sorry the name "fallback_reader_refcnt" misled you.

>
> Note that I'm referring to the scenario when hotplug is _not_ happening
> (ie., nobody is running writer side code).
>
>> The overhead of your code is
>> 2 smp_mb() + N __this_cpu_inc()
>>
>
> Right. And as you can see, this is much much better than the overhead
> shown above.

I will write a test to compare it to "1 spin_try_lock()(fast path)  +
N  __this_cpu_inc()"

>
>> I don't see how much different.
>>
>>>>> Also, this lg_rwlock implementation uses 3
>>>>> different data-structures - a per-cpu spinlock, a global rwlock and
>>>>> a per-cpu refcnt, and its not immediately apparent why you need those many
>>>>> or even those many varieties.
>>>>
>>>> data-structures is the same as yours.
>>>> fallback_reader_refcnt <--> reader_refcnt
>>>
>>> This has semantic problems, as noted above.
>>>
>>>> per-cpu spinlock <--> write_signal
>>>
>>> Acquire/release of (spin) lock is costlier than inc/dec of a counter, IIUC.
>>>
>>>> fallback_rwlock  <---> global_rwlock
>>>>
>>>>> Also I see that this doesn't handle the
>>>>> case of interrupt-handlers also being readers.
>>>>
>>>> handled. nested reader will see the ref or take the fallback_rwlock
>>>>
>>
>> Sorry, _reentrance_ read_lock() will see the ref or take the fallback_rwlock
>>
>>>
>>> I'm not referring to simple nested readers here, but interrupt handlers who
>>> can act as readers. For starters, the arch_spin_trylock() is not safe when
>>> interrupt handlers can also run the same code, right? You'll need to save
>>> and restore interrupts at critical points in the code. Also, the __foo()
>>> variants used to read/update the counters are not interrupt-safe.
>>
>> I must missed something.
>>
>> Could you elaborate more why arch_spin_trylock() is not safe when
>> interrupt handlers can also run the same code?
>>
>> Could you elaborate more why __this_cpu_op variants is not
>> interrupt-safe since they are always called paired.
>>
>
> Take a look at include/linux/percpu.h. You'll note that __this_cpu_*
> operations map to __this_cpu_generic_to_op(), which doesn't disable interrupts
> while doing the update. Hence you can get inconsistent results if an interrupt
> hits the CPU at that time and the interrupt handler tries to do the same thing.
> In contrast, if you use this_cpu_inc() for example, interrupts are explicitly
> disabled during the update and hence you won't get inconsistent results.


xx_lock()/xx_unlock() are must called paired, if interrupts happens
the value of the data is recovered after the interrupts return.

the same reason, preempt_disable() itself it is not irqsafe,
but preempt_disable()/preempt_enable() are called paired, so they are
all irqsafe.

>
>>
>>> And,
>>> the unlock() code in the reader path is again going to be confused about
>>> what to do when interrupt handlers interrupt regular readers, due to the
>>> messed up refcount.
>>
>> I still can't understand.
>>
>
> The primary reason _I_ was using the refcnt vs the reason _you_ were using the
> refcnt, appears to be very different.  Maybe that's why the above statement
> didn't make sense. In your case, IIUC, you can simply get rid of the refcnt
> and replace it with the simple check I mentioned above. Whereas, I use
> refcnts to keep the reader-side synchronization fast (and for reader-writer
> communication).
>
>
> Regards,
> Srivatsa S. Bhat
>

^ permalink raw reply	[flat|nested] 360+ messages in thread

* [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
@ 2013-02-27  0:33                               ` Lai Jiangshan
  0 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-02-27  0:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 27, 2013 at 3:30 AM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 02/26/2013 09:55 PM, Lai Jiangshan wrote:
>> On Tue, Feb 26, 2013 at 10:22 PM, Srivatsa S. Bhat
>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>
>>> Hi Lai,
>>>
>>> I'm really not convinced that piggy-backing on lglocks would help
>>> us in any way. But still, let me try to address some of the points
>>> you raised...
>>>
>>> On 02/26/2013 06:29 PM, Lai Jiangshan wrote:
>>>> On Tue, Feb 26, 2013 at 5:02 PM, Srivatsa S. Bhat
>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>> On 02/26/2013 05:47 AM, Lai Jiangshan wrote:
>>>>>> On Tue, Feb 26, 2013 at 3:26 AM, Srivatsa S. Bhat
>>>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>>>> Hi Lai,
>>>>>>>
>>>>>>> On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
>>>>>>>> Hi, Srivatsa,
>>>>>>>>
>>>>>>>> The target of the whole patchset is nice for me.
>>>>>>>
>>>>>>> Cool! Thanks :-)
>>>>>>>
>>>>> [...]
>>>>>
>>>>> Unfortunately, I see quite a few issues with the code above. IIUC, the
>>>>> writer and the reader both increment the same counters. So how will the
>>>>> unlock() code in the reader path know when to unlock which of the locks?
>>>>
>>>> The same as your code, the reader(which nested in write C.S.) just dec
>>>> the counters.
>>>
>>> And that works fine in my case because the writer and the reader update
>>> _two_ _different_ counters.
>>
>> I can't find any magic in your code, they are the same counter.
>>
>>         /*
>>          * It is desirable to allow the writer to acquire the percpu-rwlock
>>          * for read (if necessary), without deadlocking or getting complaints
>>          * from lockdep. To achieve that, just increment the reader_refcnt of
>>          * this CPU - that way, any attempt by the writer to acquire the
>>          * percpu-rwlock for read, will get treated as a case of nested percpu
>>          * reader, which is safe, from a locking perspective.
>>          */
>>         this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
>>
>
> Whoa! Hold on, were you really referring to _this_ increment when you said
> that, in your patch you would increment the refcnt at the writer? Then I guess
> there is a major disconnect in our conversations. (I had assumed that you were
> referring to the update of writer_signal, and were just trying to have a single
> refcnt instead of reader_refcnt and writer_signal).

https://github.com/laijs/linux/commit/53e5053d5b724bea7c538b11743d0f420d98f38d

Sorry the name "fallback_reader_refcnt" misled you.

>
> So, please let me clarify things a bit here. Forget about the above increment
> of reader_refcnt at the writer side. Its almost utterly insignificant for our
> current discussion. We can simply replace it with a check as shown below, at
> the reader side:
>
> void percpu_read_lock_irqsafe()
> {
>         if (current == active_writer)
>                 return;
>
>         /* Rest of the code */
> }
>
> Now, assuming that, in your patch, you were trying to use the per-cpu refcnt
> to allow the writer to safely take the reader path, you can simply get rid of
> that percpu-refcnt, as demonstrated above.
>
> So that would reduce your code to the following (after simplification):
>
> lg_rwlock_local_read_lock()
> {
>         if (current == active_writer)
>                 return;
>         if (arch_spin_trylock(per-cpu-spinlock))
>                 return;
>         read_lock(global-rwlock);
> }
>
> Now, let us assume that hotplug is not happening, meaning, nobody is running
> the writer side code. Now let's see what happens at the reader side in your
> patch. As I mentioned earlier, the readers are _very_ frequent and can be in
> very hot paths. And they also happen to be nested quite often.
>
> So, a non-nested reader acquires the per-cpu spinlock. Every subsequent nested
> reader on that CPU has to acquire the global rwlock for read. Right there you
> have 2 significant performance issues -
> 1. Acquiring the (spin) lock is costly
> 2. Acquiring the global rwlock causes cacheline bouncing, which hurts
>    performance.
>
> And why do we care so much about performance here? Because, the existing
> kernel does an efficient preempt_disable() here - which is an optimized
> per-cpu counter increment. Replacing that with such heavy primitives on the
> reader side can be very bad.
>
> Now, how does my patchset tackle this? My scheme just requires an increment
> of a per-cpu refcnt (reader_refcnt) and memory barrier. Which is acceptable
> from a performance-perspective, because IMHO its not horrendously worse than
> a preempt_disable().
>
>>
>>> If both of them update the same counter, there
>>> will be a semantic clash - an increment of the counter can either mean that
>>> a new writer became active, or it can also indicate a nested reader. A decrement
>>> can also similarly have 2 meanings. And thus it will be difficult to decide
>>> the right action to take, based on the value of the counter.
>>>
>>>>
>>>>> (The counter-dropping-to-zero logic is not safe, since it can be updated
>>>>> due to different reasons). And now that I look at it again, in the absence
>>>>> of the writer, the reader is allowed to be recursive at the heavy cost of
>>>>> taking the global rwlock for read, every 2nd time you nest (because the
>>>>> spinlock is non-recursive).
>>>>
>>>> (I did not understand your comments of this part)
>>>> nested reader is considered seldom.
>>>
>>> No, nested readers can be _quite_ frequent. Because, potentially all users
>>> of preempt_disable() are readers - and its well-known how frequently we
>>> nest preempt_disable(). As a simple example, any atomic reader who calls
>>> smp_call_function() will become a nested reader, because smp_call_function()
>>> itself is a reader. So reader nesting is expected to be quite frequent.
>>>
>>>> But if N(>=2) nested readers happen,
>>>> the overhead is:
>>>>     1 spin_try_lock() + 1 read_lock() + (N-1) __this_cpu_inc()
>>>>
>>>
>>> In my patch, its just this_cpu_inc(). Note that these are _very_ hot paths.
>>> So every bit of optimization that you can add is worthwhile.
>>>
>>> And your read_lock() is a _global_ lock - thus, it can lead to a lot of
>>> cache-line bouncing. That's *exactly* why I have used per-cpu refcounts in
>>> my synchronization scheme, to avoid taking the global rwlock as much as possible.
>>>
>>> Another important point to note is that, the overhead we are talking about
>>> here, exists even when _not_ performing hotplug. And its the replacement to
>>> the super-fast preempt_disable(). So its extremely important to consciously
>>> minimize this overhead - else we'll end up slowing down the system significantly.
>>>
>>
>> All I was considered is "nested reader is seldom", so I always
>> fallback to rwlock when nested.
>> If you like, I can add 6 lines of code, the overhead is
>> 1 spin_try_lock()(fast path)  + N  __this_cpu_inc()
>>
>
> I'm assuming that calculation is no longer valid, considering that
> we just discussed how the per-cpu refcnt that you were using is quite
> unnecessary and can be removed.
>
> IIUC, the overhead with your code, as per above discussion would be:
> 1 spin_try_lock() [non-nested] + N read_lock(global_rwlock).

https://github.com/laijs/linux/commit/46334544bb7961550b7065e015da76f6dab21f16

Again, I'm so sorry the name "fallback_reader_refcnt" misled you.

>
> Note that I'm referring to the scenario when hotplug is _not_ happening
> (ie., nobody is running writer side code).
>
>> The overhead of your code is
>> 2 smp_mb() + N __this_cpu_inc()
>>
>
> Right. And as you can see, this is much much better than the overhead
> shown above.

I will write a test to compare it to "1 spin_try_lock()(fast path)  +
N  __this_cpu_inc()"
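
For reference, the two read-side fast paths being compared in this
sub-thread look roughly as follows. This is a sketch reconstructed from
the costs quoted above, not the actual patches; the per-cpu variables
and function names are illustrative, and the unlock paths and the
write-side coordination are omitted.

#include <linux/percpu.h>
#include <linux/spinlock.h>

/* Illustrative per-cpu state, not the structures used in either patch. */
static DEFINE_PER_CPU(unsigned long, reader_refcnt);
static DEFINE_PER_CPU(arch_spinlock_t, percpu_lock) = __ARCH_SPIN_LOCK_UNLOCKED;
static DEFINE_RWLOCK(fallback_rwlock);

/* Srivatsa's scheme: one per-cpu increment plus a full memory barrier. */
static void percpu_rwlock_read_lock_sketch(void)
{
        preempt_disable();
        this_cpu_inc(reader_refcnt);
        smp_mb();                       /* pairs with the writer-side barriers */
        /* slow path for "writer active" omitted */
}

/* Lai's scheme: a CPU-local trylock, with nested or contended readers
 * falling back to the global rwlock. */
static void lglock_read_lock_sketch(void)
{
        preempt_disable();
        if (arch_spin_trylock(this_cpu_ptr(&percpu_lock)))
                return;                 /* uncontended: no shared cacheline touched */
        read_lock(&fallback_rwlock);    /* nested or contended: global lock */
}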

>
>> I don't see how much different.
>>
>>>>> Also, this lg_rwlock implementation uses 3
>>>>> different data-structures - a per-cpu spinlock, a global rwlock and
>>>>> a per-cpu refcnt, and its not immediately apparent why you need those many
>>>>> or even those many varieties.
>>>>
>>>> data-structures is the same as yours.
>>>> fallback_reader_refcnt <--> reader_refcnt
>>>
>>> This has semantic problems, as noted above.
>>>
>>>> per-cpu spinlock <--> write_signal
>>>
>>> Acquire/release of (spin) lock is costlier than inc/dec of a counter, IIUC.
>>>
>>>> fallback_rwlock  <---> global_rwlock
>>>>
>>>>> Also I see that this doesn't handle the
>>>>> case of interrupt-handlers also being readers.
>>>>
>>>> handled. nested reader will see the ref or take the fallback_rwlock
>>>>
>>
>> Sorry, _reentrance_ read_lock() will see the ref or take the fallback_rwlock
>>
>>>
>>> I'm not referring to simple nested readers here, but interrupt handlers who
>>> can act as readers. For starters, the arch_spin_trylock() is not safe when
>>> interrupt handlers can also run the same code, right? You'll need to save
>>> and restore interrupts at critical points in the code. Also, the __foo()
>>> variants used to read/update the counters are not interrupt-safe.
>>
>> I must missed something.
>>
>> Could you elaborate more why arch_spin_trylock() is not safe when
>> interrupt handlers can also run the same code?
>>
>> Could you elaborate more why __this_cpu_op variants is not
>> interrupt-safe since they are always called paired.
>>
>
> Take a look at include/linux/percpu.h. You'll note that __this_cpu_*
> operations map to __this_cpu_generic_to_op(), which doesn't disable interrupts
> while doing the update. Hence you can get inconsistent results if an interrupt
> hits the CPU at that time and the interrupt handler tries to do the same thing.
> In contrast, if you use this_cpu_inc() for example, interrupts are explicitly
> disabled during the update and hence you won't get inconsistent results.


xx_lock()/xx_unlock() must always be called in pairs; if an interrupt
happens, the value of the data is restored by the time the interrupt
returns.

For the same reason, preempt_disable() by itself is not irq-safe, but
preempt_disable()/preempt_enable() are called in pairs, so together
they are irq-safe.
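
To spell out that pairing argument for a generic, non-atomic
__this_cpu_inc() (a plain load/add/store), the interleaving looks
roughly like this; this is only an illustration of the reasoning above,
not code from either patch:

/*
 * Assume refcnt == N and an interrupt hits in the middle of the
 * task-level __this_cpu_inc():
 *
 *   task:  tmp = refcnt;          // tmp == N
 *   irq:     xx_lock();           //   refcnt: N -> N + 1
 *            ... irq-side critical section ...
 *            xx_unlock();         //   refcnt: N + 1 -> N
 *   task:  refcnt = tmp + 1;      // refcnt == N + 1, still correct
 *
 * Because the interrupt releases every lock it acquires before
 * returning, the per-cpu value is back to N when the interrupted
 * read-modify-write resumes, so the final result is correct even
 * though the individual operations are not irq-atomic.
 */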

>
>>
>>> And,
>>> the unlock() code in the reader path is again going to be confused about
>>> what to do when interrupt handlers interrupt regular readers, due to the
>>> messed up refcount.
>>
>> I still can't understand.
>>
>
> The primary reason _I_ was using the refcnt vs the reason _you_ were using the
> refcnt, appears to be very different.  Maybe that's why the above statement
> didn't make sense. In your case, IIUC, you can simply get rid of the refcnt
> and replace it with the simple check I mentioned above. Whereas, I use
> refcnts to keep the reader-side synchronization fast (and for reader-writer
> communication).
>
>
> Regards,
> Srivatsa S. Bhat
>

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-26 19:30                             ` Srivatsa S. Bhat
  (?)
@ 2013-02-27 11:11                               ` Michel Lespinasse
  -1 siblings, 0 replies; 360+ messages in thread
From: Michel Lespinasse @ 2013-02-27 11:11 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Lai Jiangshan, linux-doc, peterz, fweisbec, linux-kernel,
	namhyung, mingo, linux-arch, linux, xiaoguangrong, wangyun,
	paulmck, nikunj, linux-pm, rusty, rostedt, rjw, vincent.guittot,
	tglx, linux-arm-kernel, netdev, oleg, sbw, tj, akpm,
	linuxppc-dev

Hi Srivatsa,

I think there is some elegance in Lai's proposal of using a local
trylock for the reader uncontended case and global rwlock to deal with
the contended case without deadlocks. He apparently didn't realize
initially that nested read locks are common, and he seems to have
confused you because of that, but I think his proposal could be
changed easily to account for that and result in short, easily
understood code. What about the following:

- local_refcnt is a local lock count; it indicates how many recursive
locks are taken using the local lglock
- lglock is used by readers for local locking; it must be acquired
before local_refcnt becomes nonzero and released after local_refcnt
goes back to zero.
- fallback_rwlock is used by readers for global locking; it is
acquired when local_refcnt is zero and the trylock fails on
lglock

+void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
+{
+       preempt_disable();
+
+       if (__this_cpu_read(*lgrw->local_refcnt) ||
+           arch_spin_trylock(this_cpu_ptr(lgrw->lglock->lock))) {
+               __this_cpu_inc(*lgrw->local_refcnt);
+
+               rwlock_acquire_read(&lgrw->fallback_rwlock->lock_dep_map, 0, 0, _RET_IP_);
+       } else {
+               read_lock(&lgrw->fallback_rwlock);
+       }
+}
+EXPORT_SYMBOL(lg_rwlock_local_read_lock);
+
+void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
+{
+       if (likely(__this_cpu_read(*lgrw->local_refcnt))) {
+               rwlock_release(&lgrw->fallback_rwlock->lock_dep_map, 1, _RET_IP_);
+               if (!__this_cpu_dec_return(*lgrw->local_refcnt))
+                       arch_spin_unlock(this_cpu_ptr(lgrw->lglock->lock));
+       } else {
+               read_unlock(&lgrw->fallback_rwlock);
+       }
+
+       preempt_enable();
+}
+EXPORT_SYMBOL(lg_rwlock_local_read_unlock);
+
+void lg_rwlock_global_write_lock(struct lgrwlock *lgrw)
+{
+       int i;
+
+       preempt_disable();
+
+       for_each_possible_cpu(i)
+               arch_spin_lock(per_cpu_ptr(lgrw->lglock->lock, i));
+       write_lock(&lgrw->fallback_rwlock);
+}
+EXPORT_SYMBOL(lg_rwlock_global_write_lock);
+
+void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw)
+{
+       int i;
+
+       write_unlock(&lgrw->fallback_rwlock);
+       for_each_possible_cpu(i)
+               arch_spin_unlock(per_cpu_ptr(lgrw->lglock->lock, i));
+
+       preempt_enable();
+}
+EXPORT_SYMBOL(lg_rwlock_global_write_unlock);

This is, to me, relatively easier to understand than Srivatsa's
proposal. Now I'm not sure who wins efficiency-wise, but I think it
should be relatively close, as readers at least don't touch shared
state in the uncontended case (even with some recursion going on).

There is an interesting case where lg_rwlock_local_read_lock could be
interrupted after getting the local lglock but before incrementing
local_refcnt to 1; if that happens any nested readers within that
interrupt will have to take the global rwlock read side. I think this
is perfectly acceptable as this should not be a common case though
(and thus the global rwlock cache line probably wouldn't even bounce
between cpus then).
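
For completeness, a caller of the proposed API above would look roughly
like the following usage sketch; the lgrwlock instance "cpuhp_lgrw" and
its definition/initialization are assumptions, taken to come from the
lglock-based patch rather than shown here:

/* Hypothetical usage of the proposed lg_rwlock API. */
extern struct lgrwlock cpuhp_lgrw;      /* assumed defined and initialized elsewhere */

static void reader_side(void)
{
        lg_rwlock_local_read_lock(&cpuhp_lgrw);
        /* read-side critical section: may nest, may run in irq context */
        lg_rwlock_local_read_unlock(&cpuhp_lgrw);
}

static void writer_side(void)
{
        lg_rwlock_global_write_lock(&cpuhp_lgrw);
        /* exclusive section, e.g. while taking a CPU offline */
        lg_rwlock_global_write_unlock(&cpuhp_lgrw);
}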

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-27 11:11                               ` Michel Lespinasse
  (?)
@ 2013-02-27 19:25                                 ` Oleg Nesterov
  -1 siblings, 0 replies; 360+ messages in thread
From: Oleg Nesterov @ 2013-02-27 19:25 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Srivatsa S. Bhat, Lai Jiangshan, linux-doc, peterz, fweisbec,
	linux-kernel, namhyung, mingo, linux-arch, linux, xiaoguangrong,
	wangyun, paulmck, nikunj, linux-pm, rusty, rostedt, rjw,
	vincent.guittot, tglx, linux-arm-kernel, netdev, sbw, tj, akpm,
	linuxppc-dev

On 02/27, Michel Lespinasse wrote:
>
> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
> +{
> +       preempt_disable();
> +
> +       if (__this_cpu_read(*lgrw->local_refcnt) ||
> +           arch_spin_trylock(this_cpu_ptr(lgrw->lglock->lock))) {
> +               __this_cpu_inc(*lgrw->local_refcnt);

Please look at __this_cpu_generic_to_op(). You need this_cpu_inc()
to avoid the race with irqs. The same for _read_unlock.

But otherwise I agree, looks like a clever and working idea to me.
And simple!

> There is an interesting case where lg_rwlock_local_read_lock could be
> interrupted after getting the local lglock but before incrementing
> local_refcnt to 1; if that happens any nested readers within that
> interrupt will have to take the global rwlock read side. I think this
> is perfectly acceptable

Agreed.

Or an interrupt can do spin_trylock(percpu-lock) after we take the global
->fallback_rwlock (if we race with write_lock + write_unlock), but I do
not see any possible deadlock in this case.

Oleg.


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-27  0:33                               ` Lai Jiangshan
  (?)
  (?)
@ 2013-02-27 21:19                                 ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-02-27 21:19 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Michel Lespinasse, linux-doc, peterz, fweisbec, linux-kernel,
	namhyung, mingo, linux-arch, linux, xiaoguangrong, wangyun,
	paulmck, nikunj, linux-pm, rusty, rostedt, rjw, vincent.guittot,
	tglx, linux-arm-kernel, netdev, oleg, sbw, tj, akpm,
	linuxppc-dev

On 02/27/2013 06:03 AM, Lai Jiangshan wrote:
> On Wed, Feb 27, 2013 at 3:30 AM, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> On 02/26/2013 09:55 PM, Lai Jiangshan wrote:
>>> On Tue, Feb 26, 2013 at 10:22 PM, Srivatsa S. Bhat
>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>
>>>> Hi Lai,
>>>>
>>>> I'm really not convinced that piggy-backing on lglocks would help
>>>> us in any way. But still, let me try to address some of the points
>>>> you raised...
>>>>
>>>> On 02/26/2013 06:29 PM, Lai Jiangshan wrote:
>>>>> On Tue, Feb 26, 2013 at 5:02 PM, Srivatsa S. Bhat
>>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>>> On 02/26/2013 05:47 AM, Lai Jiangshan wrote:
>>>>>>> On Tue, Feb 26, 2013 at 3:26 AM, Srivatsa S. Bhat
>>>>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>>>>> Hi Lai,
>>>>>>>>
>>>>>>>> On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
>>>>>>>>> Hi, Srivatsa,
>>>>>>>>>
>>>>>>>>> The target of the whole patchset is nice for me.
>>>>>>>>
>>>>>>>> Cool! Thanks :-)
>>>>>>>>
>>>>>> [...]
>>>>>>
>>>>>> Unfortunately, I see quite a few issues with the code above. IIUC, the
>>>>>> writer and the reader both increment the same counters. So how will the
>>>>>> unlock() code in the reader path know when to unlock which of the locks?
>>>>>
>>>>> The same as your code, the reader(which nested in write C.S.) just dec
>>>>> the counters.
>>>>
>>>> And that works fine in my case because the writer and the reader update
>>>> _two_ _different_ counters.
>>>
>>> I can't find any magic in your code, they are the same counter.
>>>
>>>         /*
>>>          * It is desirable to allow the writer to acquire the percpu-rwlock
>>>          * for read (if necessary), without deadlocking or getting complaints
>>>          * from lockdep. To achieve that, just increment the reader_refcnt of
>>>          * this CPU - that way, any attempt by the writer to acquire the
>>>          * percpu-rwlock for read, will get treated as a case of nested percpu
>>>          * reader, which is safe, from a locking perspective.
>>>          */
>>>         this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
>>>
>>
>> Whoa! Hold on, were you really referring to _this_ increment when you said
>> that, in your patch you would increment the refcnt at the writer? Then I guess
>> there is a major disconnect in our conversations. (I had assumed that you were
>> referring to the update of writer_signal, and were just trying to have a single
>> refcnt instead of reader_refcnt and writer_signal).
> 
> https://github.com/laijs/linux/commit/53e5053d5b724bea7c538b11743d0f420d98f38d
> 
> Sorry the name "fallback_reader_refcnt" misled you.
> 
[...]

>>> All I was considered is "nested reader is seldom", so I always
>>> fallback to rwlock when nested.
>>> If you like, I can add 6 lines of code, the overhead is
>>> 1 spin_try_lock()(fast path)  + N  __this_cpu_inc()
>>>
>>
>> I'm assuming that calculation is no longer valid, considering that
>> we just discussed how the per-cpu refcnt that you were using is quite
>> unnecessary and can be removed.
>>
>> IIUC, the overhead with your code, as per above discussion would be:
>> 1 spin_try_lock() [non-nested] + N read_lock(global_rwlock).
> 
> https://github.com/laijs/linux/commit/46334544bb7961550b7065e015da76f6dab21f16
> 
> Again, I'm so sorry the name "fallback_reader_refcnt" misled you.
> 

At this juncture I really have to admit that I don't understand your
intentions at all. What are you really trying to prove? Without giving
a single good reason why my code is inferior, why are you even bringing
up the discussion about a complete rewrite of the synchronization code?
http://article.gmane.org/gmane.linux.kernel.cross-arch/17103
http://article.gmane.org/gmane.linux.power-management.general/31345

I'm beginning to add 2 + 2 together based on the kinds of questions you
have been asking...

You posted a patch in this thread and started a discussion around it without
even establishing a strong reason to do so. Now you point me to your git
tree where your patches have even more traces of ideas being borrowed from
my patchset (apart from my own ideas/code, there are traces of others' ideas
being borrowed too - for example, it was Oleg who originally proposed the
idea of splitting up the counter into 2 parts and I'm seeing that it is
slowly crawling into your code with no sign of appropriate credits).
http://article.gmane.org/gmane.linux.network/260288

And in reply to my mail pointing out the performance implications of the
global read_lock at the reader side in your code, you said you'll come up
with a comparison between that and my patchset.
http://article.gmane.org/gmane.linux.network/260288
The issue has been well-documented in my patch description of patch 4.
http://article.gmane.org/gmane.linux.kernel/1443258

Are you really trying to pit bits and pieces of my own ideas/versions
against one another and claim them as your own?

You projected the work involved in handling the locking issues pertaining
to CPU_DYING notifiers etc as a TODO, despite the fact that I had explicitly
noted in my cover letter that I had audited and taken care of all of them.
http://article.gmane.org/gmane.linux.documentation/9727
http://article.gmane.org/gmane.linux.documentation/9520

You failed to acknowledge (on purpose?) that I had done a tree-wide
conversion despite the fact that you were replying to the very thread which
had the 46 patches which did exactly that (and I had also mentioned it
explicitly in my cover letter).
http://article.gmane.org/gmane.linux.documentation/9727
http://article.gmane.org/gmane.linux.documentation/9520

You then started probing more and more about the technique I used to do
the tree-wide conversion.
http://article.gmane.org/gmane.linux.kernel.cross-arch/17111

You also retorted saying you did go through my patch descriptions, so
it's not like you have missed reading them.
http://article.gmane.org/gmane.linux.power-management.general/31345

Each of these, when considered individually, might appear like innocuous and
honest attempts at evaluating my code. But when put together, I'm beginning
to sense a whole different angle to it altogether, as if you are trying
to spin your own patch series, complete with the locking framework _and_
the tree-wide conversion, heavily borrowed from mine. At the beginning of
this discussion, I predicted that the lglock version that you are proposing
would end up being either less efficient than my version or look very similar
to my version. http://article.gmane.org/gmane.linux.kernel/1447139

I thought it was just the former till now, but it's not hard to see how it
is getting closer to becoming the latter too. So yeah, I'm not amused.

Maybe (and hopefully) you are just trying out different ideas on your own,
and I'm just being paranoid. I really hope that is the case. If you are just
trying to review my code, then please stop sending patches with borrowed ideas
with your sole Signed-off-by, and purposefully ignoring the work already done
in my patchset, because it is really starting to look suspicious, at least
to me.

Don't get me wrong - I'll whole-heartedly acknowledge and appreciate if
_your_ code is better than mine. I just don't like the idea of somebody
plagiarizing my ideas/code (or even others' ideas for that matter).
However, I sincerely apologize in advance if I misunderstood/misjudged your
intentions; I just wanted to voice my concerns out loud at this point,
considering the bad feeling I got by looking at your responses collectively.

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-27 19:25                                 ` Oleg Nesterov
  (?)
@ 2013-02-28 11:34                                   ` Michel Lespinasse
  -1 siblings, 0 replies; 360+ messages in thread
From: Michel Lespinasse @ 2013-02-28 11:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Srivatsa S. Bhat, Lai Jiangshan, linux-doc, peterz, fweisbec,
	linux-kernel, namhyung, mingo, linux-arch, linux, xiaoguangrong,
	wangyun, paulmck, nikunj, linux-pm, rusty, rostedt, rjw,
	vincent.guittot, tglx, linux-arm-kernel, netdev, sbw, tj, akpm,
	linuxppc-dev

On Thu, Feb 28, 2013 at 3:25 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 02/27, Michel Lespinasse wrote:
>>
>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
>> +{
>> +       preempt_disable();
>> +
>> +       if (__this_cpu_read(*lgrw->local_refcnt) ||
>> +           arch_spin_trylock(this_cpu_ptr(lgrw->lglock->lock))) {
>> +               __this_cpu_inc(*lgrw->local_refcnt);
>
> Please look at __this_cpu_generic_to_op(). You need this_cpu_inc()
> to avoid the race with irqs. The same for _read_unlock.

Hmmm, I was thinking that this was safe because while interrupts might
modify local_refcnt to acquire a nested read lock, they are expected
to release that lock as well which would set local_refcnt back to its
original value ???

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-28 11:34                                   ` Michel Lespinasse
  (?)
@ 2013-02-28 18:00                                     ` Oleg Nesterov
  -1 siblings, 0 replies; 360+ messages in thread
From: Oleg Nesterov @ 2013-02-28 18:00 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Srivatsa S. Bhat, Lai Jiangshan, linux-doc, peterz, fweisbec,
	linux-kernel, namhyung, mingo, linux-arch, linux, xiaoguangrong,
	wangyun, paulmck, nikunj, linux-pm, rusty, rostedt, rjw,
	vincent.guittot, tglx, linux-arm-kernel, netdev, sbw, tj, akpm,
	linuxppc-dev

On 02/28, Michel Lespinasse wrote:
>
> On Thu, Feb 28, 2013 at 3:25 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > On 02/27, Michel Lespinasse wrote:
> >>
> >> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
> >> +{
> >> +       preempt_disable();
> >> +
> >> +       if (__this_cpu_read(*lgrw->local_refcnt) ||
> >> +           arch_spin_trylock(this_cpu_ptr(lgrw->lglock->lock))) {
> >> +               __this_cpu_inc(*lgrw->local_refcnt);
> >
> > Please look at __this_cpu_generic_to_op(). You need this_cpu_inc()
> > to avoid the race with irqs. The same for _read_unlock.
>
> Hmmm, I was thinking that this was safe because while interrupts might
> modify local_refcnt to acquire a nested read lock, they are expected
> to release that lock as well which would set local_refcnt back to its
> original value ???

Yes, yes, this is correct.

I meant that (in general, x86 is fine) __this_cpu_inc() itself is not
irq-safe. It simply does "pcp += 1".

this_cpu_inc() is fine, _this_cpu_generic_to_op() does cli/sti around.

I know this only because I made the same mistake recently, and Srivatsa
explained the problem to me ;)

Oleg.


^ permalink raw reply	[flat|nested] 360+ messages in thread
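
For reference, the two generic fallbacks Oleg contrasts above look roughly
like the sketch below. This is a simplified paraphrase of the v3.8-era
include/linux/percpu.h (the real macros dispatch on the size of the
variable), not the exact code:

/*
 * __this_cpu_inc(pcp) ends up (roughly) in the first form: a plain
 * read-modify-write with no protection against interrupts.
 * this_cpu_inc(pcp) ends up in the second form, which disables irqs
 * around the update (the "cli/sti" mentioned above); x86 instead uses
 * a single irq-safe instruction, which is why x86 is fine either way.
 */
#define __this_cpu_generic_to_op(pcp, val, op)				\
do {									\
	*__this_cpu_ptr(&(pcp)) op val;					\
} while (0)

#define _this_cpu_generic_to_op(pcp, val, op)				\
do {									\
	unsigned long flags;						\
	raw_local_irq_save(flags);					\
	*__this_cpu_ptr(&(pcp)) op val;					\
	raw_local_irq_restore(flags);					\
} while (0)

So on architectures that fall back to these generic definitions, an interrupt
can slip in between the load and the store of __this_cpu_inc(), while
this_cpu_inc() is immune to that.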

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-28 18:00                                     ` Oleg Nesterov
  (?)
@ 2013-02-28 18:20                                       ` Oleg Nesterov
  -1 siblings, 0 replies; 360+ messages in thread
From: Oleg Nesterov @ 2013-02-28 18:20 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Srivatsa S. Bhat, Lai Jiangshan, linux-doc, peterz, fweisbec,
	linux-kernel, namhyung, mingo, linux-arch, linux, xiaoguangrong,
	wangyun, paulmck, nikunj, linux-pm, rusty, rostedt, rjw,
	vincent.guittot, tglx, linux-arm-kernel, netdev, sbw, tj, akpm,
	linuxppc-dev

On 02/28, Oleg Nesterov wrote:
> On 02/28, Michel Lespinasse wrote:
> >
> > On Thu, Feb 28, 2013 at 3:25 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > > On 02/27, Michel Lespinasse wrote:
> > >>
> > >> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
> > >> +{
> > >> +       preempt_disable();
> > >> +
> > >> +       if (__this_cpu_read(*lgrw->local_refcnt) ||
> > >> +           arch_spin_trylock(this_cpu_ptr(lgrw->lglock->lock))) {
> > >> +               __this_cpu_inc(*lgrw->local_refcnt);
> > >
> > > Please look at __this_cpu_generic_to_op(). You need this_cpu_inc()
> > > to avoid the race with irqs. The same for _read_unlock.
> >
> > Hmmm, I was thinking that this was safe because while interrupts might
> > modify local_refcnt to acquire a nested read lock, they are expected
> > to release that lock as well which would set local_refcnt back to its
> > original value ???
>
> Yes, yes, this is correct.
>
> I meant that (in general, x86 is fine) __this_cpu_inc() itself is not
> irq-safe. It simply does "pcp += 1".
>
> this_cpu_inc() is fine, _this_cpu_generic_to_op() does cli/sti around.

Just in case: it is not that I really understand why __this_cpu_inc() can
race with an irq in this particular case (given that the irq handler should
restore the counter).

So perhaps I am wrong again. The comments in include/linux/percpu.h look
confusing to me, and I simply know nothing about !x86 architectures. But
since, say, preempt_disable() doesn't do anything special, probably
__this_cpu_inc() is fine too.

In short: please ignore me ;)

Oleg.


^ permalink raw reply	[flat|nested] 360+ messages in thread
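
To make the point concrete, the interleaving being discussed can be written
out as a hypothetical trace, assuming the generic "load; add; store"
expansion of __this_cpu_inc() and the lg_rwlock code quoted earlier in this
thread:

/*
 * Hypothetical interleaving for __this_cpu_inc(*lgrw->local_refcnt) on an
 * architecture that uses the generic (non-irq-safe) fallback:
 *
 *   task:  tmp = local_refcnt;              // reads N
 *   irq:   lg_rwlock_local_read_lock()      // local_refcnt: N -> N + 1
 *            ... nested read-side critical section ...
 *          lg_rwlock_local_read_unlock()    // local_refcnt: N + 1 -> N
 *   task:  local_refcnt = tmp + 1;          // stores N + 1, as intended
 *
 * The interrupt handler releases every nested read lock it takes, so the
 * counter is back to N before the delayed store happens, and no update is
 * lost in this particular usage. That is why the plain __this_cpu_inc()
 * happens to be safe here even though it is not irq-safe in general.
 */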

* Re: [PATCH v6 00/46] CPU hotplug: stop_machine()-free CPU hotplug
  2013-02-18 12:38 ` Srivatsa S. Bhat
  (?)
@ 2013-03-01 12:05   ` Vincent Guittot
  -1 siblings, 0 replies; 360+ messages in thread
From: Vincent Guittot @ 2013-03-01 12:05 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, tj, oleg, paulmck, rusty, mingo, akpm, namhyung,
	rostedt, wangyun, xiaoguangrong, rjw, sbw, fweisbec, linux,
	nikunj, linux-pm, linux-arch, linux-arm-kernel, linuxppc-dev,
	netdev, linux-doc, linux-kernel, walken

Hi Srivatsa,

I have run some tests with genload on my ARM platform, but even with
mainline the cpu_down duration is quite short and stable (around 4 ms)
with 5 or 2 online cores. The duration is similar with your patches.

Maybe I have not used the right options for genload? I used
genload -m 10, which seems to generate the most system time. Which
command did you use for your tests?

Vincent

On 18 February 2013 13:38, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> Hi,
>
> This patchset removes CPU hotplug's dependence on stop_machine() from the CPU
> offline path and provides an alternative (set of APIs) to preempt_disable() to
> prevent CPUs from going offline, which can be invoked from atomic context.
> The motivation behind the removal of stop_machine() is to avoid its ill-effects
> and thus improve the design of CPU hotplug. (More description regarding this
> is available in the patches).
>
> All the users of preempt_disable()/local_irq_disable() who used to use it to
> prevent CPU offline, have been converted to the new primitives introduced in the
> patchset. Also, the CPU_DYING notifiers have been audited to check whether
> they can cope up with the removal of stop_machine() or whether they need to
> use new locks for synchronization (all CPU_DYING notifiers looked OK, without
> the need for any new locks).
>
> Applies on current mainline (v3.8-rc7+).
>
> This patchset is available in the following git branch:
>
> git://github.com/srivatsabhat/linux.git  stop-machine-free-cpu-hotplug-v6
>
>
> Overview of the patches:
> -----------------------
>
> Patches 1 to 7 introduce a generic, flexible Per-CPU Reader-Writer Locking
> scheme.
>
> Patch 8 uses this synchronization mechanism to build the
> get/put_online_cpus_atomic() APIs which can be used from atomic context, to
> prevent CPUs from going offline.
>
> Patch 9 is a cleanup; it converts preprocessor macros to static inline
> functions.
>
> Patches 10 to 43 convert various call-sites to use the new APIs.
>
> Patch 44 is the one which actually removes stop_machine() from the CPU
> offline path.
>
> Patch 45 decouples stop_machine() and CPU hotplug from Kconfig.
>
> Patch 46 updates the documentation to reflect the new APIs.
>
>
> Changes in v6:
> --------------
>
> * Fixed issues related to memory barriers, as pointed out by Paul and Oleg.
> * Fixed the locking issue related to clockevents_lock, which was being
>   triggered when cpu idle was enabled.
> * Some code restructuring to improve readability and to enhance some fastpath
>   optimizations.
> * Randconfig build-fixes, reported by Fengguang Wu.
>
>
> Changes in v5:
> --------------
>   Exposed a new generic locking scheme: Flexible Per-CPU Reader-Writer locks,
>   based on the synchronization schemes already discussed in the previous
>   versions, and used it in CPU hotplug, to implement the new APIs.
>
>   Audited the CPU_DYING notifiers in the kernel source tree and replaced
>   usages of preempt_disable() with the new get/put_online_cpus_atomic() APIs
>   where necessary.
>
>
> Changes in v4:
> --------------
>   The synchronization scheme has been simplified quite a bit, which makes it
>   look a lot less complex than before. Some highlights:
>
> * Implicit ACKs:
>
>   The earlier design required the readers to explicitly ACK the writer's
>   signal. The new design uses implicit ACKs instead. The reader switching
>   over to rwlock implicitly tells the writer to stop waiting for that reader.
>
> * No atomic operations:
>
>   Since we got rid of explicit ACKs, we no longer have the need for a reader
>   and a writer to update the same counter. So we can get rid of atomic ops
>   too.
>
> Changes in v3:
> --------------
> * Dropped the _light() and _full() variants of the APIs. Provided a single
>   interface: get/put_online_cpus_atomic().
>
> * Completely redesigned the synchronization mechanism again, to make it
>   fast and scalable at the reader-side in the fast-path (when no hotplug
>   writers are active). This new scheme also ensures that there is no
>   possibility of deadlocks due to circular locking dependency.
>   In summary, this provides the scalability and speed of per-cpu rwlocks
>   (without actually using them), while avoiding the downside (deadlock
>   possibilities) which is inherent in any per-cpu locking scheme that is
>   meant to compete with preempt_disable()/enable() in terms of flexibility.
>
>   The problem with using per-cpu locking to replace preempt_disable()/enable
>   was explained here:
>   https://lkml.org/lkml/2012/12/6/290
>
>   Basically we use per-cpu counters (for scalability) when no writers are
>   active, and then switch to global rwlocks (for lock-safety) when a writer
>   becomes active. It is a slightly complex scheme, but it is based on
>   standard principles of distributed algorithms.
>
> Changes in v2:
> -------------
> * Completely redesigned the synchronization scheme to avoid using any extra
>   cpumasks.
>
> * Provided APIs for 2 types of atomic hotplug readers: "light" (for
>   light-weight) and "full". We wish to have more "light" readers than
>   the "full" ones, to avoid indirectly inducing the "stop_machine effect"
>   without even actually using stop_machine().
>
>   And the patches show that it _is_ generally true: 5 patches deal with
>   "light" readers, whereas only 1 patch deals with a "full" reader.
>
>   Also, the "light" readers happen to be in very hot paths. So it makes a
>   lot of sense to have such a distinction and a corresponding light-weight
>   API.
>
> Links to previous versions:
> v5: http://lwn.net/Articles/533553/
> v4: https://lkml.org/lkml/2012/12/11/209
> v3: https://lkml.org/lkml/2012/12/7/287
> v2: https://lkml.org/lkml/2012/12/5/322
> v1: https://lkml.org/lkml/2012/12/4/88
>
> --
>  Paul E. McKenney (1):
>       cpu: No more __stop_machine() in _cpu_down()
>
> Srivatsa S. Bhat (45):
>       percpu_rwlock: Introduce the global reader-writer lock backend
>       percpu_rwlock: Introduce per-CPU variables for the reader and the writer
>       percpu_rwlock: Provide a way to define and init percpu-rwlocks at compile time
>       percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
>       percpu_rwlock: Make percpu-rwlocks IRQ-safe, optimally
>       percpu_rwlock: Rearrange the read-lock code to fastpath nested percpu readers
>       percpu_rwlock: Allow writers to be readers, and add lockdep annotations
>       CPU hotplug: Provide APIs to prevent CPU offline from atomic context
>       CPU hotplug: Convert preprocessor macros to static inline functions
>       smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly
>       smp, cpu hotplug: Fix on_each_cpu_*() to prevent CPU offline properly
>       sched/timer: Use get/put_online_cpus_atomic() to prevent CPU offline
>       sched/migration: Use raw_spin_lock/unlock since interrupts are already disabled
>       sched/rt: Use get/put_online_cpus_atomic() to prevent CPU offline
>       tick: Use get/put_online_cpus_atomic() to prevent CPU offline
>       time/clocksource: Use get/put_online_cpus_atomic() to prevent CPU offline
>       clockevents: Use get/put_online_cpus_atomic() in clockevents_notify()
>       softirq: Use get/put_online_cpus_atomic() to prevent CPU offline
>       irq: Use get/put_online_cpus_atomic() to prevent CPU offline
>       net: Use get/put_online_cpus_atomic() to prevent CPU offline
>       block: Use get/put_online_cpus_atomic() to prevent CPU offline
>       crypto: pcrypt - Protect access to cpu_online_mask with get/put_online_cpus()
>       infiniband: ehca: Use get/put_online_cpus_atomic() to prevent CPU offline
>       [SCSI] fcoe: Use get/put_online_cpus_atomic() to prevent CPU offline
>       staging: octeon: Use get/put_online_cpus_atomic() to prevent CPU offline
>       x86: Use get/put_online_cpus_atomic() to prevent CPU offline
>       perf/x86: Use get/put_online_cpus_atomic() to prevent CPU offline
>       KVM: Use get/put_online_cpus_atomic() to prevent CPU offline from atomic context
>       kvm/vmx: Use get/put_online_cpus_atomic() to prevent CPU offline
>       x86/xen: Use get/put_online_cpus_atomic() to prevent CPU offline
>       alpha/smp: Use get/put_online_cpus_atomic() to prevent CPU offline
>       blackfin/smp: Use get/put_online_cpus_atomic() to prevent CPU offline
>       cris/smp: Use get/put_online_cpus_atomic() to prevent CPU offline
>       hexagon/smp: Use get/put_online_cpus_atomic() to prevent CPU offline
>       ia64: Use get/put_online_cpus_atomic() to prevent CPU offline
>       m32r: Use get/put_online_cpus_atomic() to prevent CPU offline
>       MIPS: Use get/put_online_cpus_atomic() to prevent CPU offline
>       mn10300: Use get/put_online_cpus_atomic() to prevent CPU offline
>       parisc: Use get/put_online_cpus_atomic() to prevent CPU offline
>       powerpc: Use get/put_online_cpus_atomic() to prevent CPU offline
>       sh: Use get/put_online_cpus_atomic() to prevent CPU offline
>       sparc: Use get/put_online_cpus_atomic() to prevent CPU offline
>       tile: Use get/put_online_cpus_atomic() to prevent CPU offline
>       CPU hotplug, stop_machine: Decouple CPU hotplug from stop_machine() in Kconfig
>       Documentation/cpu-hotplug: Remove references to stop_machine()
>
>   Documentation/cpu-hotplug.txt                 |   17 +-
>  arch/alpha/kernel/smp.c                       |   19 +-
>  arch/arm/Kconfig                              |    1
>  arch/blackfin/Kconfig                         |    1
>  arch/blackfin/mach-common/smp.c               |    6 -
>  arch/cris/arch-v32/kernel/smp.c               |    8 +
>  arch/hexagon/kernel/smp.c                     |    5
>  arch/ia64/Kconfig                             |    1
>  arch/ia64/kernel/irq_ia64.c                   |   13 +
>  arch/ia64/kernel/perfmon.c                    |    6 +
>  arch/ia64/kernel/smp.c                        |   23 ++
>  arch/ia64/mm/tlb.c                            |    6 -
>  arch/m32r/kernel/smp.c                        |   12 +
>  arch/mips/Kconfig                             |    1
>  arch/mips/kernel/cevt-smtc.c                  |    8 +
>  arch/mips/kernel/smp.c                        |   16 +-
>  arch/mips/kernel/smtc.c                       |    3
>  arch/mips/mm/c-octeon.c                       |    4
>  arch/mn10300/Kconfig                          |    1
>  arch/mn10300/kernel/smp.c                     |    2
>  arch/mn10300/mm/cache-smp.c                   |    5
>  arch/mn10300/mm/tlb-smp.c                     |   15 +
>  arch/parisc/Kconfig                           |    1
>  arch/parisc/kernel/smp.c                      |    4
>  arch/powerpc/Kconfig                          |    1
>  arch/powerpc/mm/mmu_context_nohash.c          |    2
>  arch/s390/Kconfig                             |    1
>  arch/sh/Kconfig                               |    1
>  arch/sh/kernel/smp.c                          |   12 +
>  arch/sparc/Kconfig                            |    1
>  arch/sparc/kernel/leon_smp.c                  |    2
>  arch/sparc/kernel/smp_64.c                    |    9 -
>  arch/sparc/kernel/sun4d_smp.c                 |    2
>  arch/sparc/kernel/sun4m_smp.c                 |    3
>  arch/tile/kernel/smp.c                        |    4
>  arch/x86/Kconfig                              |    1
>  arch/x86/include/asm/ipi.h                    |    5
>  arch/x86/kernel/apic/apic_flat_64.c           |   10 +
>  arch/x86/kernel/apic/apic_numachip.c          |    5
>  arch/x86/kernel/apic/es7000_32.c              |    5
>  arch/x86/kernel/apic/io_apic.c                |    7 -
>  arch/x86/kernel/apic/ipi.c                    |   10 +
>  arch/x86/kernel/apic/x2apic_cluster.c         |    4
>  arch/x86/kernel/apic/x2apic_uv_x.c            |    4
>  arch/x86/kernel/cpu/mcheck/therm_throt.c      |    4
>  arch/x86/kernel/cpu/perf_event_intel_uncore.c |    5
>  arch/x86/kvm/vmx.c                            |    8 +
>  arch/x86/mm/tlb.c                             |   14 +
>  arch/x86/xen/mmu.c                            |   11 +
>  arch/x86/xen/smp.c                            |    9 +
>  block/blk-softirq.c                           |    4
>  crypto/pcrypt.c                               |    4
>  drivers/infiniband/hw/ehca/ehca_irq.c         |    8 +
>  drivers/scsi/fcoe/fcoe.c                      |    7 +
>  drivers/staging/octeon/ethernet-rx.c          |    3
>  include/linux/cpu.h                           |    8 +
>  include/linux/percpu-rwlock.h                 |   74 +++++++
>  include/linux/stop_machine.h                  |    2
>  init/Kconfig                                  |    2
>  kernel/cpu.c                                  |   59 +++++-
>  kernel/irq/manage.c                           |    7 +
>  kernel/sched/core.c                           |   36 +++-
>  kernel/sched/fair.c                           |    5
>  kernel/sched/rt.c                             |    3
>  kernel/smp.c                                  |   65 ++++--
>  kernel/softirq.c                              |    3
>  kernel/time/clockevents.c                     |    3
>  kernel/time/clocksource.c                     |    5
>  kernel/time/tick-broadcast.c                  |    2
>  kernel/timer.c                                |    2
>  lib/Kconfig                                   |    3
>  lib/Makefile                                  |    1
>  lib/percpu-rwlock.c                           |  256 +++++++++++++++++++++++++
>  net/core/dev.c                                |    9 +
>  virt/kvm/kvm_main.c                           |   10 +
>  75 files changed, 776 insertions(+), 123 deletions(-)
>  create mode 100644 include/linux/percpu-rwlock.h
>  create mode 100644 lib/percpu-rwlock.c
>
>
>
> Regards,
> Srivatsa S. Bhat
> IBM Linux Technology Center
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 360+ messages in thread

* [PATCH] lglock: add read-preference local-global rwlock
  2013-02-27 21:19                                 ` Srivatsa S. Bhat
                                                     ` (3 preceding siblings ...)
  (?)
@ 2013-03-01 17:44                                   ` Lai Jiangshan
  -1 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-03-01 17:44 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Lai Jiangshan, Michel Lespinasse, linux-doc, peterz, fweisbec,
	linux-kernel, namhyung, mingo, linux-arch, linux, xiaoguangrong,
	wangyun, paulmck, nikunj, linux-pm, rusty, rostedt, rjw,
	vincent.guittot, tglx, linux-arm-kernel, netdev, oleg, sbw, tj,
	akpm, linuxppc-dev

From c63f2be9a4cf7106a521dda169a0e14f8e4f7e3b Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs@cn.fujitsu.com>
Date: Mon, 25 Feb 2013 23:14:27 +0800
Subject: [PATCH] lglock: add read-preference local-global rwlock

The current lglock is not read-preference, so it can't be used in some cases
where a read-preference rwlock is needed, for example get_online_cpus_atomic().

Although we could use an ordinary rwlock for the cases which need
read-preference, that leads to unnecessary cache-line bouncing even when
there are no writers present, which can slow down the system needlessly.
It gets worse as the number of CPUs grows; it does not scale.

So we turn to lglock. lglock is a reader-writer lock based on percpu locks,
but it is not read-preference because of those underlying percpu locks.

But what if we convert the percpu locks of lglock into percpu rwlocks:

         CPU 0                                CPU 1
         ------                               ------
1.    spin_lock(&random_lock);             read_lock(my_rwlock of CPU 1);
2.    read_lock(my_rwlock of CPU 0);       spin_lock(&random_lock);

Writer:
         CPU 2:
         ------
      for_each_online_cpu(cpu)
        write_lock(my_rwlock of 'cpu');


Consider what happens if the writer begins his operation in between steps 1
and 2 at the reader side. It becomes evident that we end up in a (previously
non-existent) deadlock due to a circular locking dependency between the 3
entities, like this:


(holds              Waiting for
 random_lock) CPU 0 -------------> CPU 2  (holds my_rwlock of CPU 0
                                               for write)
               ^                   |
               |                   |
        Waiting|                   | Waiting
          for  |                   |  for
               |                   V
                ------ CPU 1 <------

                (holds my_rwlock of
                 CPU 1 for read)


So obviously this "straight-forward" way of implementing percpu rwlocks is
deadlock-prone. So we can't implement read-preference local-global rwlock
like this.


The implementation in this patch reuses the current lglock as a frontend to
achieve local read-locking, reuses a global fallback rwlock as a backend to
achieve read-preference, and uses a per-CPU reader counter to indicate 1) the
depth of nested reader locks and 2) whether the outermost lock taken was the
per-CPU lock or the fallback rwlock.

The algorithm is simple. At the read site:
If it is a nested reader, just increment the counter.
If it is the outermost reader:
	1) try to take this CPU's lock of the frontend lglock
	   (reader count += 1 on success);
	2) if that fails, take read_lock(&fallback_rwlock)
	   (reader count += FALLBACK_BASE + 1).

At the write site:
Take lg_global_lock() on the frontend lglock,
and then write_lock(&fallback_rwlock).
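
As a usage illustration (not part of the patch), here is a minimal sketch of
how a caller might use the proposed API; the lock and function names
("my_lgrw", "my_read_path", "my_write_path") are made up:

	#include <linux/lglock.h>

	DEFINE_STATIC_LGRWLOCK(my_lgrw);

	/* Read side: nestable and safe to call from atomic context */
	static void my_read_path(void)
	{
		lg_rwlock_local_read_lock(&my_lgrw);
		/* ... read-side critical section ... */
		lg_rwlock_local_read_unlock(&my_lgrw);
	}

	/* Write side: excludes readers on all CPUs */
	static void my_write_path(void)
	{
		lg_rwlock_global_write_lock(&my_lgrw);
		/* ... write-side critical section ... */
		lg_rwlock_global_write_unlock(&my_lgrw);
	}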


Proof sketch:
1) reader-writer exclusion:
Once both steps of the write site have completed, no reader can be inside its
read-side critical section, and vice versa.

2) read-preference:
Until the write site has finished acquiring both locks, a reader can at least
win at read_lock(&fallback_rwlock), since rwlock_t is read-preference.
Read-preference also implies that readers are nestable.

3) the read-site functions are irq-safe (reentrance-safe):
	If a read-site function is interrupted at any point and the read site is
re-entered, the re-entrant read site will not be misled by the interrupted one.
If the reader counter is > 0, then the frontend (this CPU's lock of the lglock)
or the backend (the fallback rwlock) is already held, so it is safe to act as a
nested reader.
	If the reader counter is 0, the re-entrant reader considers itself the
outermost read site, and it always succeeds once the write side has released the
locks (even if the interrupted read site has already taken this CPU's lglock lock
or the fallback_rwlock).
	Also, the re-entrant read site only calls arch_spin_trylock(), read_lock()
and __this_cpu_*() operations. arch_spin_trylock() and read_lock() are already
reentrance-safe. Although the __this_cpu_*() operations are not reentrance-safe
by themselves, the counter value is restored once the interrupting read site
finishes, so the read-site functions remain reentrance-safe overall.
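
To make that argument concrete, a hedged sketch of the reentrancy case: a
task-context reader and an interrupt handler both take the same read lock.
The handler and reader functions are illustrative only, and "my_lgrw" is the
made-up lock from the usage sketch above, not part of the patch:

	#include <linux/interrupt.h>
	#include <linux/lglock.h>

	/*
	 * If this handler interrupts my_reader() inside its critical section,
	 * it sees reader_refcnt > 0 and acts as a nested reader; otherwise it
	 * acts as an outermost reader. Both cases are safe, as argued above.
	 */
	static irqreturn_t my_irq_handler(int irq, void *dev_id)
	{
		lg_rwlock_local_read_lock(&my_lgrw);
		/* ... read-side work ... */
		lg_rwlock_local_read_unlock(&my_lgrw);
		return IRQ_HANDLED;
	}

	static void my_reader(void)
	{
		lg_rwlock_local_read_lock(&my_lgrw);
		/* ... read-side work; my_irq_handler() may fire here safely ... */
		lg_rwlock_local_read_unlock(&my_lgrw);
	}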


Performance:
We focus only on the performance of the read site. The read site's fast path is
just preempt_disable() + __this_cpu_read()/__this_cpu_inc() + arch_spin_trylock();
it has only one heavy memory operation, so it is expected to be fast.

We tested three locks:
1) a traditional rwlock WITHOUT remote contention or cache-line bouncing (opt-rwlock)
2) this lock (lgrwlock)
3) the v6 percpu-rwlock by Srivatsa S. Bhat (percpu-rwlock)
   (https://lkml.org/lkml/2013/2/18/186)

		nested=1(no nested)	nested=2	nested=4
opt-rwlock	 517181			1009200		2010027
lgrwlock	 452897			 700026		1201415
percpu-rwlock	1192955			1451343		1951757

Each value is the time, in nanoseconds, for 10000 iterations of the following
operation sequence:
{
	read-lock
	[nested read-lock]...
	[nested read-unlock]...
	read-unlock
}
If this test methodology is wrong or inadequate, please correct me;
the test code is here: https://gist.github.com/laijs/5066159
(measured on an i5 760, 64-bit)
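
For reference, a hedged sketch of the measured loop for the nested=2 case; the
lock name, timing helper and exact harness are assumptions here, and the real
test code is in the gist linked above:

	#include <linux/sched.h>	/* sched_clock() */
	#include <linux/lglock.h>

	DEFINE_STATIC_LGRWLOCK(test_lgrw);

	static u64 time_nested2_read_ops(void)
	{
		u64 t0, t1;
		int i;

		t0 = sched_clock();
		for (i = 0; i < 10000; i++) {
			lg_rwlock_local_read_lock(&test_lgrw);
			lg_rwlock_local_read_lock(&test_lgrw);	/* nested */
			lg_rwlock_local_read_unlock(&test_lgrw);
			lg_rwlock_local_read_unlock(&test_lgrw);
		}
		t1 = sched_clock();

		return t1 - t0;	/* the reported nanosecond figure */
	}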

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/linux/lglock.h |   35 ++++++++++++++++++++++++++++++++
 kernel/lglock.c        |   52 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+), 0 deletions(-)

diff --git a/include/linux/lglock.h b/include/linux/lglock.h
index 0d24e93..afb22bd 100644
--- a/include/linux/lglock.h
+++ b/include/linux/lglock.h
@@ -67,4 +67,39 @@ void lg_local_unlock_cpu(struct lglock *lg, int cpu);
 void lg_global_lock(struct lglock *lg);
 void lg_global_unlock(struct lglock *lg);
 
+struct lgrwlock {
+	unsigned long __percpu *reader_refcnt;
+	struct lglock lglock;
+	rwlock_t fallback_rwlock;
+};
+
+#define __DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)		\
+	= __ARCH_SPIN_LOCK_UNLOCKED;					\
+	static DEFINE_PER_CPU(unsigned long, name ## _refcnt);
+
+#define __LGRWLOCK_INIT(name)						\
+	{								\
+		.reader_refcnt = &name ## _refcnt,			\
+		.lglock = { .lock = &name ## _lock },			\
+		.fallback_rwlock = __RW_LOCK_UNLOCKED(name.fallback_rwlock)\
+	}
+
+#define DEFINE_LGRWLOCK(name)						\
+	__DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	struct lgrwlock name = __LGRWLOCK_INIT(name)
+
+#define DEFINE_STATIC_LGRWLOCK(name)					\
+	__DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	static struct lgrwlock name = __LGRWLOCK_INIT(name)
+
+static inline void lg_rwlock_init(struct lgrwlock *lgrw, char *name)
+{
+	lg_lock_init(&lgrw->lglock, name);
+}
+
+void lg_rwlock_local_read_lock(struct lgrwlock *lgrw);
+void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw);
+void lg_rwlock_global_write_lock(struct lgrwlock *lgrw);
+void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw);
 #endif
diff --git a/kernel/lglock.c b/kernel/lglock.c
index 6535a66..6c60cf7 100644
--- a/kernel/lglock.c
+++ b/kernel/lglock.c
@@ -87,3 +87,55 @@ void lg_global_unlock(struct lglock *lg)
 	preempt_enable();
 }
 EXPORT_SYMBOL(lg_global_unlock);
+
+#define FALLBACK_BASE	(1UL << 30)
+
+void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
+{
+	struct lglock *lg = &lgrw->lglock;
+
+	preempt_disable();
+	rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
+	if (likely(!__this_cpu_read(*lgrw->reader_refcnt))) {
+		if (!arch_spin_trylock(this_cpu_ptr(lg->lock))) {
+			read_lock(&lgrw->fallback_rwlock);
+			__this_cpu_add(*lgrw->reader_refcnt, FALLBACK_BASE);
+		}
+	}
+
+	__this_cpu_inc(*lgrw->reader_refcnt);
+}
+EXPORT_SYMBOL(lg_rwlock_local_read_lock);
+
+void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
+{
+	switch (__this_cpu_dec_return(*lgrw->reader_refcnt)) {
+	case 0:
+		lg_local_unlock(&lgrw->lglock);
+		return;
+	case FALLBACK_BASE:
+		__this_cpu_sub(*lgrw->reader_refcnt, FALLBACK_BASE);
+		read_unlock(&lgrw->fallback_rwlock);
+		break;
+	default:
+		break;
+	}
+
+	rwlock_release(&lg->lock_dep_map, 1, _RET_IP_);
+	preempt_enable();
+}
+EXPORT_SYMBOL(lg_rwlock_local_read_unlock);
+
+void lg_rwlock_global_write_lock(struct lgrwlock *lgrw)
+{
+	lg_global_lock(&lgrw->lglock);
+	write_lock(&lgrw->fallback_rwlock);
+}
+EXPORT_SYMBOL(lg_rwlock_global_write_lock);
+
+void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw)
+{
+	write_unlock(&lgrw->fallback_rwlock);
+	lg_global_unlock(&lgrw->lglock);
+}
+EXPORT_SYMBOL(lg_rwlock_global_write_unlock);
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-27 21:19                                 ` Srivatsa S. Bhat
  (?)
  (?)
@ 2013-03-01 17:50                                   ` Lai Jiangshan
  -1 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-03-01 17:50 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Lai Jiangshan, Michel Lespinasse, linux-doc, peterz, fweisbec,
	linux-kernel, namhyung, mingo, linux-arch, linux, xiaoguangrong,
	wangyun, paulmck, nikunj, linux-pm, rusty, rostedt, rjw,
	vincent.guittot, tglx, linux-arm-kernel, netdev, oleg, sbw, tj,
	akpm, linuxppc-dev

On 28/02/13 05:19, Srivatsa S. Bhat wrote:
> On 02/27/2013 06:03 AM, Lai Jiangshan wrote:
>> On Wed, Feb 27, 2013 at 3:30 AM, Srivatsa S. Bhat
>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>> On 02/26/2013 09:55 PM, Lai Jiangshan wrote:
>>>> On Tue, Feb 26, 2013 at 10:22 PM, Srivatsa S. Bhat
>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>>
>>>>> Hi Lai,
>>>>>
>>>>> I'm really not convinced that piggy-backing on lglocks would help
>>>>> us in any way. But still, let me try to address some of the points
>>>>> you raised...
>>>>>
>>>>> On 02/26/2013 06:29 PM, Lai Jiangshan wrote:
>>>>>> On Tue, Feb 26, 2013 at 5:02 PM, Srivatsa S. Bhat
>>>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>>>> On 02/26/2013 05:47 AM, Lai Jiangshan wrote:
>>>>>>>> On Tue, Feb 26, 2013 at 3:26 AM, Srivatsa S. Bhat
>>>>>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>>>>>> Hi Lai,
>>>>>>>>>
>>>>>>>>> On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
>>>>>>>>>> Hi, Srivatsa,
>>>>>>>>>>
>>>>>>>>>> The target of the whole patchset is nice for me.
>>>>>>>>>
>>>>>>>>> Cool! Thanks :-)
>>>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>> Unfortunately, I see quite a few issues with the code above. IIUC, the
>>>>>>> writer and the reader both increment the same counters. So how will the
>>>>>>> unlock() code in the reader path know when to unlock which of the locks?
>>>>>>
>>>>>> The same as your code, the reader(which nested in write C.S.) just dec
>>>>>> the counters.
>>>>>
>>>>> And that works fine in my case because the writer and the reader update
>>>>> _two_ _different_ counters.
>>>>
>>>> I can't find any magic in your code, they are the same counter.
>>>>
>>>>         /*
>>>>          * It is desirable to allow the writer to acquire the percpu-rwlock
>>>>          * for read (if necessary), without deadlocking or getting complaints
>>>>          * from lockdep. To achieve that, just increment the reader_refcnt of
>>>>          * this CPU - that way, any attempt by the writer to acquire the
>>>>          * percpu-rwlock for read, will get treated as a case of nested percpu
>>>>          * reader, which is safe, from a locking perspective.
>>>>          */
>>>>         this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
>>>>
>>>
>>> Whoa! Hold on, were you really referring to _this_ increment when you said
>>> that, in your patch you would increment the refcnt at the writer? Then I guess
>>> there is a major disconnect in our conversations. (I had assumed that you were
>>> referring to the update of writer_signal, and were just trying to have a single
>>> refcnt instead of reader_refcnt and writer_signal).
>>
>> https://github.com/laijs/linux/commit/53e5053d5b724bea7c538b11743d0f420d98f38d
>>
>> Sorry the name "fallback_reader_refcnt" misled you.
>>
> [...]
> 
>>>> All I was considered is "nested reader is seldom", so I always
>>>> fallback to rwlock when nested.
>>>> If you like, I can add 6 lines of code, the overhead is
>>>> 1 spin_try_lock()(fast path)  + N  __this_cpu_inc()
>>>>
>>>
>>> I'm assuming that calculation is no longer valid, considering that
>>> we just discussed how the per-cpu refcnt that you were using is quite
>>> unnecessary and can be removed.
>>>
>>> IIUC, the overhead with your code, as per above discussion would be:
>>> 1 spin_try_lock() [non-nested] + N read_lock(global_rwlock).
>>
>> https://github.com/laijs/linux/commit/46334544bb7961550b7065e015da76f6dab21f16
>>
>> Again, I'm so sorry the name "fallback_reader_refcnt" misled you.
>>
> 
> At this juncture I really have to admit that I don't understand your
> intentions at all. What are you really trying to prove? Without giving
> a single good reason why my code is inferior, why are you even bringing
> up the discussion about a complete rewrite of the synchronization code?
> http://article.gmane.org/gmane.linux.kernel.cross-arch/17103
> http://article.gmane.org/gmane.linux.power-management.general/31345
> 
> I'm beginning to add 2 + 2 together based on the kinds of questions you
> have been asking...
> 
> You posted a patch in this thread and started a discussion around it without
> even establishing a strong reason to do so. Now you point me to your git
> tree where your patches have even more traces of ideas being borrowed from
> my patchset (apart from my own ideas/code, there are traces of others' ideas
> being borrowed too - for example, it was Oleg who originally proposed the
> idea of splitting up the counter into 2 parts and I'm seeing that it is
> slowly crawling into your code with no sign of appropriate credits).
> http://article.gmane.org/gmane.linux.network/260288
> 
> And in reply to my mail pointing out the performance implications of the
> global read_lock at the reader side in your code, you said you'll come up
> with a comparison between that and my patchset.
> http://article.gmane.org/gmane.linux.network/260288
> The issue has been well-documented in my patch description of patch 4.
> http://article.gmane.org/gmane.linux.kernel/1443258
> 
> Are you really trying to pit bits and pieces of my own ideas/versions
> against one another and claiming them as your own?
> 
> You projected the work involved in handling the locking issues pertaining
> to CPU_DYING notifiers etc as a TODO, despite the fact that I had explicitly
> noted in my cover letter that I had audited and taken care of all of them.
> http://article.gmane.org/gmane.linux.documentation/9727
> http://article.gmane.org/gmane.linux.documentation/9520
> 
> You failed to acknowledge (on purpose?) that I had done a tree-wide
> conversion despite the fact that you were replying to the very thread which
> had the 46 patches which did exactly that (and I had also mentioned it
> explicitly in my cover letter).
> http://article.gmane.org/gmane.linux.documentation/9727
> http://article.gmane.org/gmane.linux.documentation/9520
> 
> You then started probing more and more about the technique I used to do
> the tree-wide conversion.
> http://article.gmane.org/gmane.linux.kernel.cross-arch/17111
> 
> You also retorted saying you did go through my patch descriptions, so
> its not like you have missed reading them.
> http://article.gmane.org/gmane.linux.power-management.general/31345
> 
> Each of these when considered individually, might appear like innocuous and
> honest attempts at evaluating my code. But when put together, I'm beginning
> to sense a whole different angle to it altogether, as if you are trying
> to spin your own patch series, complete with the locking framework _and_
> the tree-wide conversion, heavily borrowed from mine. At the beginning of
> this discussion, I predicted that the lglock version that you are proposing
> would end up being either less efficient than my version or look very similar
> to my version. http://article.gmane.org/gmane.linux.kernel/1447139
> 
> I thought it was just the former till now, but its not hard to see how it
> is getting closer to becoming the latter too. So yeah, I'm not amused.
> 
> Maybe (and hopefully) you are just trying out different ideas on your own,
> and I'm just being paranoid. I really hope that is the case. If you are just
> trying to review my code, then please stop sending patches with borrowed ideas
> with your sole Signed-off-by, and purposefully ignoring the work already done
> in my patchset, because it is really starting to look suspicious, at least
> to me.
> 
> Don't get me wrong - I'll whole-heartedly acknowledge and appreciate if
> _your_ code is better than mine. I just don't like the idea of somebody
> plagiarizing my ideas/code (or even others' ideas for that matter).
> However, I sincerely apologize in advance if I misunderstood/misjudged your
> intentions; I just wanted to voice my concerns out loud at this point,
> considering the bad feeling I got by looking at your responses collectively.
> 

Hi, Srivatsa

I'm sorry, big apology to you.
I'm bad in communication and I did be wrong.
I tended to improve the codes but in false direction.

Thanks,
Lai

^ permalink raw reply	[flat|nested] 360+ messages in thread
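
A minimal sketch of the two-counter scheme being argued over above, with
simplified names (this is only an illustration of the idea under discussion,
not the code from patch 4): readers touch only a per-CPU reader_refcnt, the
writer touches only writer_signal, and the writer additionally bumps its own
CPU's reader_refcnt so that a recursive read-side acquisition from the write
path is treated as an already-nested reader.

#include <linux/percpu.h>
#include <linux/spinlock.h>

struct rw_state {
	unsigned long reader_refcnt;	/* written only by readers */
	bool writer_signal;		/* written only by the writer */
};

struct percpu_rwlock_sketch {
	struct rw_state __percpu *rw_state;
	rwlock_t global_rwlock;		/* reader slow path / writer lock */
};

static inline void sketch_read_enter(struct percpu_rwlock_sketch *prwl)
{
	preempt_disable();
	/*
	 * Nested readers and the common no-writer case stay purely local.
	 * The real design also consults writer_signal here and switches
	 * waiting readers over to global_rwlock; that part is elided.
	 */
	this_cpu_inc(prwl->rw_state->reader_refcnt);
}

static inline void sketch_read_exit(struct percpu_rwlock_sketch *prwl)
{
	this_cpu_dec(prwl->rw_state->reader_refcnt);
	preempt_enable();
}

static inline void sketch_write_enter(struct percpu_rwlock_sketch *prwl)
{
	/* Signalling and draining of fast-path readers elided. */
	write_lock(&prwl->global_rwlock);
	/*
	 * Mark this CPU as a nested reader, so a read-side call made while
	 * holding the write lock is treated as reader nesting instead of
	 * contending on global_rwlock.
	 */
	this_cpu_inc(prwl->rw_state->reader_refcnt);
}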

* Re: [PATCH] lglock: add read-preference local-global rwlock
  2013-03-01 17:44                                   ` Lai Jiangshan
  (?)
@ 2013-03-01 17:53                                     ` Tejun Heo
  -1 siblings, 0 replies; 360+ messages in thread
From: Tejun Heo @ 2013-03-01 17:53 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Srivatsa S. Bhat, Lai Jiangshan, Michel Lespinasse, linux-doc,
	peterz, fweisbec, linux-kernel, namhyung, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, vincent.guittot, tglx, linux-arm-kernel, netdev,
	oleg, sbw, akpm, linuxppc-dev

Hey, guys and Oleg (yes, I'm singling you out ;p because you're that
awesome.)

On Sat, Mar 02, 2013 at 01:44:02AM +0800, Lai Jiangshan wrote:
> Performance:
> We only focus on the performance of the read site. this read site's fast path
> is just preempt_disable() + __this_cpu_read/inc() + arch_spin_trylock(),
> It has only one heavy memory operation. it will be expected fast.
> 
> We test three locks.
> 1) traditional rwlock WITHOUT remote competition nor cache-bouncing.(opt-rwlock)
> 2) this lock(lgrwlock)
> 3) V6 percpu-rwlock by "Srivatsa S. Bhat". (percpu-rwlock)
>    (https://lkml.org/lkml/2013/2/18/186)
> 
> 		nested=1(no nested)	nested=2	nested=4
> opt-rwlock	 517181			1009200		2010027
> lgrwlock	 452897			 700026		1201415
> percpu-rwlock	1192955			1451343		1951757

At first glance, the numbers look pretty good and I kinda really
like the fact that if this works out we don't have to introduce yet
another percpu synchronization construct and get to reuse lglock.

So, Oleg, can you please see whether you can find holes in this one?

Srivatsa, I know you spent a lot of time on percpu_rwlock but as you
wrote before, Lai's work can be seen as a continuation of yours, and if
we get to extend what's already there instead of introducing something
completely new, there's no reason not to (and my apologies for not
noticing the possibility of extending lglock before).  So, if this can
work, it would be awesome if you guys can work together.  Lai might
not be very good at communicating in English yet but he's really good
at spotting patterns in complex code and playing with them.

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 360+ messages in thread
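
For reference while reading the numbers above, a rough sketch of the lgrwlock
layout being benchmarked, pieced together from the fragments quoted in this
thread (names follow the discussion; this is a sketch of the proposal, not an
upstream kernel API):

#include <linux/lglock.h>
#include <linux/percpu.h>
#include <linux/spinlock.h>

struct lgrwlock {
	unsigned long __percpu *reader_refcnt;	/* per-CPU reader nesting count */
	struct lglock lglock;			/* per-CPU arch spinlocks */
	rwlock_t fallback_rwlock;		/* shared read-preference slow
						 * path, taken only when the
						 * per-CPU trylock fails */
};

/*
 * Read-side fast path cost, as described above:
 *	preempt_disable()
 *	+ __this_cpu_read()/__this_cpu_inc() of *reader_refcnt
 *	+ arch_spin_trylock() on this CPU's lglock spinlock
 * i.e. a single heavyweight memory operation per acquisition, with the
 * shared fallback_rwlock left untouched in the common (uncontended) case.
 */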

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-02-27 21:19                                 ` Srivatsa S. Bhat
  (?)
@ 2013-03-01 18:10                                   ` Tejun Heo
  -1 siblings, 0 replies; 360+ messages in thread
From: Tejun Heo @ 2013-03-01 18:10 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Lai Jiangshan, Michel Lespinasse, linux-doc, peterz, fweisbec,
	linux-kernel, namhyung, mingo, linux-arch, linux, xiaoguangrong,
	wangyun, paulmck, nikunj, linux-pm, rusty, rostedt, rjw,
	vincent.guittot, tglx, linux-arm-kernel, netdev, oleg, sbw, akpm,
	linuxppc-dev

Hello, Srivatsa.

On Thu, Feb 28, 2013 at 02:49:53AM +0530, Srivatsa S. Bhat wrote:
> Don't get me wrong - I'll whole-heartedly acknowledge and appreciate if
> _your_ code is better than mine. I just don't like the idea of somebody
> plagiarizing my ideas/code (or even others' ideas for that matter).
> However, I sincerely apologize in advance if I misunderstood/misjudged your
> intentions; I just wanted to voice my concerns out loud at this point,
> considering the bad feeling I got by looking at your responses collectively.

Although I don't know Lai personally, from my experience working with
him over the past several months on workqueue, I strongly doubt he has dark
ulterior intentions.  The biggest problem probably is that his
communication in English isn't very fluent yet and thus doesn't carry
the underlying intentions or tones very well and he tends to bombard
patches in quick succession without much explanation in between (during
the 3.8 cycle, I was receiving several workqueue patchsets days apart
trying to solve the same problem with completely different approaches
way before the discussion on the previous postings reached any kind of
consensus).  A lot of that also probably comes from communication
problems.

Anyways, I really don't think he's trying to take credit for your work.
At least for now, you kinda need to speculate what he's trying to do
and then confirm that back to him, but once you get used to that part,
he's pretty nice to work with and I'm sure his communication will
improve in time (I think it has gotten a lot better than only some
months ago).

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH] lglock: add read-preference local-global rwlock
  2013-03-01 17:44                                   ` Lai Jiangshan
  (?)
@ 2013-03-01 18:28                                     ` Oleg Nesterov
  -1 siblings, 0 replies; 360+ messages in thread
From: Oleg Nesterov @ 2013-03-01 18:28 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Srivatsa S. Bhat, Lai Jiangshan, Michel Lespinasse, linux-doc,
	peterz, fweisbec, linux-kernel, namhyung, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, vincent.guittot, tglx, linux-arm-kernel, netdev,
	sbw, tj, akpm, linuxppc-dev

Lai, I didn't read this discussion except the code posted by Michel.
I'll try to read this patch carefully later, but I'd like to ask
a couple of questions.

This version looks more complex than Michel's, why? Just curious, I
am trying to understand what I missed. See
http://marc.info/?l=linux-kernel&m=136196350213593

And I can't understand FALLBACK_BASE...

OK, suppose that CPU_0 does _write_unlock() and releases ->fallback_rwlock.

CPU_1 does _read_lock(), and ...

> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
> +{
> +	struct lglock *lg = &lgrw->lglock;
> +
> +	preempt_disable();
> +	rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
> +	if (likely(!__this_cpu_read(*lgrw->reader_refcnt))) {
> +		if (!arch_spin_trylock(this_cpu_ptr(lg->lock))) {

_trylock() fails,

> +			read_lock(&lgrw->fallback_rwlock);
> +			__this_cpu_add(*lgrw->reader_refcnt, FALLBACK_BASE);

so we take ->fallback_rwlock and ->reader_refcnt == FALLBACK_BASE.

CPU_0 does lg_global_unlock(lgrw->lglock) and finishes _write_unlock().

Interrupt handler on CPU_1 does _read_lock(), notices ->reader_refcnt != 0,
and simply does this_cpu_inc(), so reader_refcnt == FALLBACK_BASE + 1.

Then irq does _read_unlock(), and

> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
> +{
> +	switch (__this_cpu_dec_return(*lgrw->reader_refcnt)) {
> +	case 0:
> +		lg_local_unlock(&lgrw->lglock);
> +		return;
> +	case FALLBACK_BASE:
> +		__this_cpu_sub(*lgrw->reader_refcnt, FALLBACK_BASE);
> +		read_unlock(&lgrw->fallback_rwlock);

hits this case?

Doesn't look right, but most probably I missed something.

Oleg.


^ permalink raw reply	[flat|nested] 360+ messages in thread
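
Laying the interleaving above out step by step may help; reader_refcnt values
are for CPU_1 and simply follow Oleg's reading of the quoted lock/unlock code
(so this is a restatement of his scenario, not a verdict on the patch):

/*
 * 1. CPU_0, in _write_unlock(): releases fallback_rwlock first, while
 *    still holding the per-CPU spinlocks.
 *
 * 2. CPU_1, task context: _read_lock(); arch_spin_trylock() fails
 *    (CPU_0 still holds this CPU's spinlock), so it takes
 *    read_lock(&fallback_rwlock) and adds FALLBACK_BASE.
 *		reader_refcnt == FALLBACK_BASE
 *
 * 3. CPU_0: lg_global_unlock() drops the per-CPU spinlocks and
 *    finishes _write_unlock().
 *
 * 4. CPU_1, interrupt context: _read_lock() sees reader_refcnt != 0,
 *    so it only increments.
 *		reader_refcnt == FALLBACK_BASE + 1
 *
 * 5. CPU_1, interrupt context: _read_unlock(); __this_cpu_dec_return()
 *    yields FALLBACK_BASE and hits "case FALLBACK_BASE:", so the nested
 *    reader subtracts FALLBACK_BASE and drops fallback_rwlock, although
 *    it was the interrupted task-level reader that acquired it.
 *		reader_refcnt == 0
 *
 * After step 5 the outer reader's own unlock runs against counters and
 * locks that no longer reflect what it acquired, which is the
 * inconsistency being questioned above.
 */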

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-03-01 17:50                                   ` Lai Jiangshan
  (?)
  (?)
@ 2013-03-01 19:47                                     ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-03-01 19:47 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Lai Jiangshan, Michel Lespinasse, linux-doc, peterz, fweisbec,
	linux-kernel, namhyung, mingo, linux-arch, linux, xiaoguangrong,
	wangyun, paulmck, nikunj, linux-pm, rusty, rostedt, rjw,
	vincent.guittot, tglx, linux-arm-kernel, netdev, oleg, sbw, tj,
	akpm, linuxppc-dev

On 03/01/2013 11:20 PM, Lai Jiangshan wrote:
> On 28/02/13 05:19, Srivatsa S. Bhat wrote:
>> On 02/27/2013 06:03 AM, Lai Jiangshan wrote:
>>> On Wed, Feb 27, 2013 at 3:30 AM, Srivatsa S. Bhat
>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>> On 02/26/2013 09:55 PM, Lai Jiangshan wrote:
>>>>> On Tue, Feb 26, 2013 at 10:22 PM, Srivatsa S. Bhat
>>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>>>
>>>>>> Hi Lai,
>>>>>>
>>>>>> I'm really not convinced that piggy-backing on lglocks would help
>>>>>> us in any way. But still, let me try to address some of the points
>>>>>> you raised...
>>>>>>
>>>>>> On 02/26/2013 06:29 PM, Lai Jiangshan wrote:
>>>>>>> On Tue, Feb 26, 2013 at 5:02 PM, Srivatsa S. Bhat
>>>>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>>>>> On 02/26/2013 05:47 AM, Lai Jiangshan wrote:
>>>>>>>>> On Tue, Feb 26, 2013 at 3:26 AM, Srivatsa S. Bhat
>>>>>>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>>>>>>> Hi Lai,
>>>>>>>>>>
>>>>>>>>>> On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
>>>>>>>>>>> Hi, Srivatsa,
>>>>>>>>>>>
>>>>>>>>>>> The target of the whole patchset is nice for me.
>>>>>>>>>>
>>>>>>>>>> Cool! Thanks :-)
>>>>>>>>>>
>>>>>>>> [...]
>>>>>>>>
>>>>>>>> Unfortunately, I see quite a few issues with the code above. IIUC, the
>>>>>>>> writer and the reader both increment the same counters. So how will the
>>>>>>>> unlock() code in the reader path know when to unlock which of the locks?
>>>>>>>
>>>>>>> The same as your code, the reader(which nested in write C.S.) just dec
>>>>>>> the counters.
>>>>>>
>>>>>> And that works fine in my case because the writer and the reader update
>>>>>> _two_ _different_ counters.
>>>>>
>>>>> I can't find any magic in your code, they are the same counter.
>>>>>
>>>>>         /*
>>>>>          * It is desirable to allow the writer to acquire the percpu-rwlock
>>>>>          * for read (if necessary), without deadlocking or getting complaints
>>>>>          * from lockdep. To achieve that, just increment the reader_refcnt of
>>>>>          * this CPU - that way, any attempt by the writer to acquire the
>>>>>          * percpu-rwlock for read, will get treated as a case of nested percpu
>>>>>          * reader, which is safe, from a locking perspective.
>>>>>          */
>>>>>         this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
>>>>>
>>>>
>>>> Whoa! Hold on, were you really referring to _this_ increment when you said
>>>> that, in your patch you would increment the refcnt at the writer? Then I guess
>>>> there is a major disconnect in our conversations. (I had assumed that you were
>>>> referring to the update of writer_signal, and were just trying to have a single
>>>> refcnt instead of reader_refcnt and writer_signal).
>>>
>>> https://github.com/laijs/linux/commit/53e5053d5b724bea7c538b11743d0f420d98f38d
>>>
>>> Sorry the name "fallback_reader_refcnt" misled you.
>>>
>> [...]
>>
>>>>> All I was considered is "nested reader is seldom", so I always
>>>>> fallback to rwlock when nested.
>>>>> If you like, I can add 6 lines of code, the overhead is
>>>>> 1 spin_try_lock()(fast path)  + N  __this_cpu_inc()
>>>>>
>>>>
>>>> I'm assuming that calculation is no longer valid, considering that
>>>> we just discussed how the per-cpu refcnt that you were using is quite
>>>> unnecessary and can be removed.
>>>>
>>>> IIUC, the overhead with your code, as per above discussion would be:
>>>> 1 spin_try_lock() [non-nested] + N read_lock(global_rwlock).
>>>
>>> https://github.com/laijs/linux/commit/46334544bb7961550b7065e015da76f6dab21f16
>>>
>>> Again, I'm so sorry the name "fallback_reader_refcnt" misled you.
>>>
>>
>> At this juncture I really have to admit that I don't understand your
>> intentions at all. What are you really trying to prove? Without giving
>> a single good reason why my code is inferior, why are you even bringing
>> up the discussion about a complete rewrite of the synchronization code?
>> http://article.gmane.org/gmane.linux.kernel.cross-arch/17103
>> http://article.gmane.org/gmane.linux.power-management.general/31345
>>
>> I'm beginning to add 2 + 2 together based on the kinds of questions you
>> have been asking...
>>
>> You posted a patch in this thread and started a discussion around it without
>> even establishing a strong reason to do so. Now you point me to your git
>> tree where your patches have even more traces of ideas being borrowed from
>> my patchset (apart from my own ideas/code, there are traces of others' ideas
>> being borrowed too - for example, it was Oleg who originally proposed the
>> idea of splitting up the counter into 2 parts and I'm seeing that it is
>> slowly crawling into your code with no sign of appropriate credits).
>> http://article.gmane.org/gmane.linux.network/260288
>>
>> And in reply to my mail pointing out the performance implications of the
>> global read_lock at the reader side in your code, you said you'll come up
>> with a comparison between that and my patchset.
>> http://article.gmane.org/gmane.linux.network/260288
>> The issue has been well-documented in my patch description of patch 4.
>> http://article.gmane.org/gmane.linux.kernel/1443258
>>
>> Are you really trying to pit bits and pieces of my own ideas/versions
>> against one another and claiming them as your own?
>>
>> You projected the work involved in handling the locking issues pertaining
>> to CPU_DYING notifiers etc as a TODO, despite the fact that I had explicitly
>> noted in my cover letter that I had audited and taken care of all of them.
>> http://article.gmane.org/gmane.linux.documentation/9727
>> http://article.gmane.org/gmane.linux.documentation/9520
>>
>> You failed to acknowledge (on purpose?) that I had done a tree-wide
>> conversion despite the fact that you were replying to the very thread which
>> had the 46 patches which did exactly that (and I had also mentioned it
>> explicitly in my cover letter).
>> http://article.gmane.org/gmane.linux.documentation/9727
>> http://article.gmane.org/gmane.linux.documentation/9520
>>
>> You then started probing more and more about the technique I used to do
>> the tree-wide conversion.
>> http://article.gmane.org/gmane.linux.kernel.cross-arch/17111
>>
>> You also retorted saying you did go through my patch descriptions, so
>> its not like you have missed reading them.
>> http://article.gmane.org/gmane.linux.power-management.general/31345
>>
>> Each of these when considered individually, might appear like innocuous and
>> honest attempts at evaluating my code. But when put together, I'm beginning
>> to sense a whole different angle to it altogether, as if you are trying
>> to spin your own patch series, complete with the locking framework _and_
>> the tree-wide conversion, heavily borrowed from mine. At the beginning of
>> this discussion, I predicted that the lglock version that you are proposing
>> would end up being either less efficient than my version or look very similar
>> to my version. http://article.gmane.org/gmane.linux.kernel/1447139
>>
>> I thought it was just the former till now, but its not hard to see how it
>> is getting closer to becoming the latter too. So yeah, I'm not amused.
>>
>> Maybe (and hopefully) you are just trying out different ideas on your own,
>> and I'm just being paranoid. I really hope that is the case. If you are just
>> trying to review my code, then please stop sending patches with borrowed ideas
>> with your sole Signed-off-by, and purposefully ignoring the work already done
>> in my patchset, because it is really starting to look suspicious, at least
>> to me.
>>
>> Don't get me wrong - I'll whole-heartedly acknowledge and appreciate if
>> _your_ code is better than mine. I just don't like the idea of somebody
>> plagiarizing my ideas/code (or even others' ideas for that matter).
>> However, I sincerely apologize in advance if I misunderstood/misjudged your
>> intentions; I just wanted to voice my concerns out loud at this point,
>> considering the bad feeling I got by looking at your responses collectively.
>>
> 
> Hi, Srivatsa
> 
> I'm sorry, big apology to you.
> I'm bad in communication and I did be wrong.
> I tended to improve the codes but in false direction.
> 

OK, in that case, I'm extremely sorry too, for jumping on you like that.
I hope you'll forgive me for the uneasiness it caused.

Now that I understand that you were simply trying to help, I would like to
express my gratitude for your time, effort and inputs in improving the design
of the stop-machine replacement.

I'm looking forward to working with you on this as well as future endeavours,
so I sincerely hope that we can put this unfortunate incident behind us and
collaborate effectively with renewed mutual trust and good-will.

Thank you very much!

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-03-01 18:10                                   ` Tejun Heo
  (?)
@ 2013-03-01 19:59                                     ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-03-01 19:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Lai Jiangshan, Michel Lespinasse, linux-doc, peterz, fweisbec,
	linux-kernel, namhyung, mingo, linux-arch, linux, xiaoguangrong,
	wangyun, paulmck, nikunj, linux-pm, rusty, rostedt, rjw,
	vincent.guittot, tglx, linux-arm-kernel, netdev, oleg, sbw, akpm,
	linuxppc-dev

Hi Tejun,

On 03/01/2013 11:40 PM, Tejun Heo wrote:
> Hello, Srivatsa.
> 
> On Thu, Feb 28, 2013 at 02:49:53AM +0530, Srivatsa S. Bhat wrote:
>> Don't get me wrong - I'll whole-heartedly acknowledge and appreciate if
>> _your_ code is better than mine. I just don't like the idea of somebody
>> plagiarizing my ideas/code (or even others' ideas for that matter).
>> However, I sincerely apologize in advance if I misunderstood/misjudged your
>> intentions; I just wanted to voice my concerns out loud at this point,
>> considering the bad feeling I got by looking at your responses collectively.
> 
> Although I don't know Lai personally, from my experience working with
> him over the past several months on workqueue, I strongly doubt he has dark
> ulterior intentions.  The biggest problem probably is that his
> communication in English isn't very fluent yet and thus doesn't carry
> the underlying intentions or tones very well, and he tends to bombard
> patches in quick succession without much explanation in between (during
> the 3.8 cycle, I was receiving several workqueue patchsets days apart
> trying to solve the same problem with completely different approaches,
> way before the discussion on the previous postings reached any kind of
> consensus).  A lot of that also probably comes from communication
> problems.
> 
> Anyways, I really don't think he's trying to take claim of your work.
> At least for now, you kinda need to speculate what he's trying to do
> and then confirm that back to him, but once you get used to that part,
> he's pretty nice to work with and I'm sure his communication will
> improve in time (I think it has gotten a lot better than only some
> months ago).
>

Thank you so much for helping clear the air, Tejun! I'm truly sorry again
for having drawn the wrong conclusions based on gaps in our communication.
It was a mistake on my part and I would like to request Lai to kindly
forgive me. I'm looking forward to working with Lai effectively, going
forward.

Thank you!

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH] lglock: add read-preference local-global rwlock
  2013-03-01 17:53                                     ` Tejun Heo
  (?)
@ 2013-03-01 20:06                                       ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-03-01 20:06 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Lai Jiangshan, Lai Jiangshan, Michel Lespinasse, linux-doc,
	peterz, fweisbec, linux-kernel, namhyung, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, vincent.guittot, tglx, linux-arm-kernel, netdev,
	oleg, sbw, akpm, linuxppc-dev

On 03/01/2013 11:23 PM, Tejun Heo wrote:
> Hey, guys and Oleg (yes, I'm singling you out ;p because you're that
> awesome.)
> 
> On Sat, Mar 02, 2013 at 01:44:02AM +0800, Lai Jiangshan wrote:
>> Performance:
>> We only focus on the performance of the read site. this read site's fast path
>> is just preempt_disable() + __this_cpu_read/inc() + arch_spin_trylock(),
>> It has only one heavy memory operation. it will be expected fast.
>>
>> We test three locks.
>> 1) traditional rwlock WITHOUT remote competition nor cache-bouncing.(opt-rwlock)
>> 2) this lock(lgrwlock)
>> 3) V6 percpu-rwlock by "Srivatsa S. Bhat". (percpu-rwlock)
>>    (https://lkml.org/lkml/2013/2/18/186)
>>
>> 		nested=1(no nested)	nested=2	nested=4
>> opt-rwlock	 517181			1009200		2010027
>> lgrwlock	 452897			 700026		1201415
>> percpu-rwlock	1192955			1451343		1951757
> 
> On the first glance, the numbers look pretty good and I kinda really
> like the fact that if this works out we don't have to introduce yet
> another percpu synchronization construct and get to reuse lglock.
> 
> So, Oleg, can you please see whether you can find holes in this one?
> 
> Srivatsa, I know you spent a lot of time on percpu_rwlock but as you
> wrote before Lai's work can be seen as continuation of yours, and if
> we get to extend what's already there instead of introducing something
> completely new, there's no reason not to

Yep, I agree!

> (and my apologies for not
> noticing the possibility of extending lglock before).

No problem at all! You gave so many invaluable suggestions to make this
whole thing work in the first place! Now, if we can reuse the existing
stuff and extend it to what we want, then it's just even better! :-)

>  So, if this can
> work, it would be awesome if you guys can work together.

Absolutely!

>  Lai might
> not be very good at communicating in English yet but he's really good
> at spotting patterns in complex code and playing with them.
>

That sounds great! :-)

I'll soon take a closer look at his code and the comparisons he posted,
and work towards taking this effort forward.

Thank you very much!

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH] lglock: add read-preference local-global rwlock
  2013-03-01 18:28                                     ` Oleg Nesterov
  (?)
  (?)
@ 2013-03-02 12:13                                       ` Michel Lespinasse
  -1 siblings, 0 replies; 360+ messages in thread
From: Michel Lespinasse @ 2013-03-02 12:13 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Lai Jiangshan, Srivatsa S. Bhat, Lai Jiangshan, linux-doc,
	peterz, fweisbec, linux-kernel, namhyung, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, vincent.guittot, tglx, linux-arm-kernel, netdev,
	sbw, tj, akpm, linuxppc-dev

On Sat, Mar 2, 2013 at 2:28 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> Lai, I didn't read this discussion except the code posted by Michel.
> I'll try to read this patch carefully later, but I'd like to ask
> a couple of questions.
>
> This version looks more complex than Michel's, why? Just curious, I
> am trying to understand what I missed. See
> http://marc.info/?l=linux-kernel&m=136196350213593

From what I can see, my version used local_refcnt to count how many
reentrant locks are represented by the fastpath lglock spinlock; Lai's
version uses it to count how many reentrant locks are represented by
either the fastpath lglock spinlock or the global rwlock, with
FALLBACK_BASE being a bit thrown in so we can remember which of these
locks was acquired. My version would be slower if it needs to take the
slow path in a reentrant way, but I'm not sure it matters either :)

> Interrupt handler on CPU_1 does _read_lock() notices ->reader_refcnt != 0
> and simply does this_cpu_inc(), so reader_refcnt == FALLBACK_BASE + 1.
>
> Then irq does _read_unlock(), and
>
>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
>> +{
>> +     switch (__this_cpu_dec_return(*lgrw->reader_refcnt)) {
>> +     case 0:
>> +             lg_local_unlock(&lgrw->lglock);
>> +             return;
>> +     case FALLBACK_BASE:
>> +             __this_cpu_sub(*lgrw->reader_refcnt, FALLBACK_BASE);
>> +             read_unlock(&lgrw->fallback_rwlock);
>
> hits this case?
>
> Doesn't look right, but most probably I missed something.

Good catch. I think this is easily fixed by setting reader_refcnt
directly to FALLBACK_BASE+1, instead of setting it to FALLBACK_BASE
and then incrementing it to FALLBACK_BASE+1.
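
For illustration, here is a minimal sketch of what the read-lock slow path
could look like with that change folded in. The lgrwlock fields
(reader_refcnt, lglock, fallback_rwlock) and FALLBACK_BASE come from the
code quoted above; the function name and the exact fast-path sequence are
assumptions based on the descriptions in this thread rather than the actual
patch, and lockdep annotations are left out:

void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
{
	preempt_disable();

	/* Nested reader, or fast path: this CPU's spinlock acquired. */
	if (__this_cpu_read(*lgrw->reader_refcnt) ||
	    arch_spin_trylock(this_cpu_ptr(lgrw->lglock.lock))) {
		__this_cpu_inc(*lgrw->reader_refcnt);
		return;
	}

	/* Slow path: fall back to the global rwlock. */
	read_lock(&lgrw->fallback_rwlock);

	/*
	 * Account for the fallback lock with a single update, so that an
	 * interrupt can never observe the intermediate FALLBACK_BASE value
	 * and wrongly release fallback_rwlock on its unlock path.
	 */
	__this_cpu_add(*lgrw->reader_refcnt, FALLBACK_BASE + 1);
}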

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 360+ messages in thread

* [PATCH V2] lglock: add read-preference local-global rwlock
  2013-03-02 12:13                                       ` Michel Lespinasse
                                                           ` (2 preceding siblings ...)
  (?)
@ 2013-03-02 13:14                                         ` Lai Jiangshan
  -1 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-03-02 13:14 UTC (permalink / raw)
  To: Michel Lespinasse, Srivatsa S. Bhat
  Cc: Oleg Nesterov, Lai Jiangshan, linux-doc, peterz, fweisbec,
	linux-kernel, namhyung, mingo, linux-arch, linux, xiaoguangrong,
	wangyun, paulmck, nikunj, linux-pm, rusty, rostedt, rjw,
	vincent.guittot, tglx, linux-arm-kernel, netdev, sbw, tj, akpm,
	linuxppc-dev

From 345a7a75c314ff567be48983e0892bc69c4452e7 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs@cn.fujitsu.com>
Date: Sat, 2 Mar 2013 20:33:14 +0800
Subject: [PATCH] lglock: add read-preference local-global rwlock

The current lglock is not read-preference, so it can't be used in some cases
where a read-preference rwlock is needed, for example get_cpu_online_atomic().

Although we could use an rwlock for the cases which need read-preference,
it leads to unnecessary cache-line bouncing even when there are no writers
present, which can slow down the system needlessly. It gets worse as the
number of CPUs grows; it does not scale.

So we turn to lglock. lglock is a reader-writer lock based on percpu locks,
but it is not read-preference because of its underlying percpu locks.

But what if we convert the percpu locks of lglock to use percpu rwlocks:

         CPU 0                                CPU 1
         ------                               ------
1.    spin_lock(&random_lock);             read_lock(my_rwlock of CPU 1);
2.    read_lock(my_rwlock of CPU 0);       spin_lock(&random_lock);

Writer:
         CPU 2:
         ------
      for_each_online_cpu(cpu)
        write_lock(my_rwlock of 'cpu');


Consider what happens if the writer begins his operation in between steps 1
and 2 at the reader side. It becomes evident that we end up in a (previously
non-existent) deadlock due to a circular locking dependency between the 3
entities, like this:


(holds              Waiting for
 random_lock) CPU 0 -------------> CPU 2  (holds my_rwlock of CPU 0
                                               for write)
               ^                   |
               |                   |
        Waiting|                   | Waiting
          for  |                   |  for
               |                   V
                ------ CPU 1 <------

                (holds my_rwlock of
                 CPU 1 for read)


So obviously this "straightforward" way of implementing percpu rwlocks is
deadlock-prone, and we can't implement a read-preference local-global rwlock
like this.


The implementation in this patch reuses the current lglock as the frontend to
achieve local read-locking, reuses a global fallback rwlock as the backend to
achieve read-preference, and uses a percpu reader counter to indicate 1) the
depth of nested reader locks and 2) whether the outermost lock is the percpu
lock or the fallback rwlock.

The algorithm is simple. At the read site:
If it is a nested reader, just increment the counter.
If it is the outermost reader,
	1) try to take this CPU's lock of the frontend lglock
	   (reader count += 1 on success);
	2) if the above step fails, read_lock(&fallback_rwlock)
	   (reader count += FALLBACK_BASE + 1).

At the write site:
do lg_global_lock() on the frontend lglock,
and then write_lock(&fallback_rwlock).
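
As a condensed sketch of the write site described above (the function names
here are my assumption, modelled on lg_rwlock_local_read_unlock() quoted
earlier in this thread; the authoritative code is in kernel/lglock.c in the
diff below):

void lg_rwlock_global_write_lock(struct lgrwlock *lgrw)
{
	/* Shut out new fast-path readers: take every CPU's percpu lock. */
	lg_global_lock(&lgrw->lglock);
	/* Then wait for the readers that fell back to the global rwlock. */
	write_lock(&lgrw->fallback_rwlock);
}

void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw)
{
	write_unlock(&lgrw->fallback_rwlock);
	lg_global_unlock(&lgrw->lglock);
}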


Proof sketch:
1) Reader-writer exclusion:
The write site must acquire all percpu locks plus the fallback_rwlock;
an outermost read site must acquire one of these locks.

2) Read-preference:
Until the write site has finished acquiring its locks, a read site can always
win at least the read_lock(&fallback_rwlock), because rwlock itself is
read-preference.

3) The read-site functions are irq-safe (reentrance-safe).
   (They are not protected by disabled irqs, yet they are safe against irqs.)
	If a read-site function is interrupted at any point and the interrupt
re-enters the read site, the re-entrant read site will not be misled by the
first one: if the reader counter is > 0, the frontend (this CPU's lock of the
lglock) or the backend (fallback rwlock) is currently held, so it is safe to
act as a nested reader.
	If the reader counter is 0, the re-entrant reader considers itself the
outermost read site, and it always succeeds once the write side releases its
locks (even if the interrupted read site has already taken the CPU lock of the
lglock or the fallback_rwlock).
	A re-entrant read site only calls arch_spin_trylock(), read_lock() and
__this_cpu_*() operations; arch_spin_trylock() and read_lock() are already
reentrance-safe. Although __this_cpu_*() is not reentrance-safe by itself, the
counter's value is restored after the interrupt finishes, so the read-site
functions remain reentrance-safe.
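
To make the last point concrete, here is an assumed interrupt trace on one CPU
(the counter values refer to this CPU's *reader_refcnt; the scenario is
illustrative, not taken from the patch):

	/* task context: outermost reader, arch_spin_trylock() succeeds */
	lg_rwlock_local_read_lock(&lgrw);	/* refcnt: 0 -> 1, holds percpu lock */

		/* an interrupt arrives and re-enters the read site */
		lg_rwlock_local_read_lock(&lgrw);	/* refcnt != 0 -> nested: 1 -> 2 */
		lg_rwlock_local_read_unlock(&lgrw);	/* default case: 2 -> 1 */
		/* interrupt returns; the counter is restored to what the task saw */

	lg_rwlock_local_read_unlock(&lgrw);	/* case 1: 1 -> 0, releases percpu lock */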


Performance:
We focus only on the performance of the read site. The read site's fast path
is just preempt_disable() + __this_cpu_read/inc() + arch_spin_trylock();
it has only one heavy memory operation, so it is expected to be fast.

We tested three locks:
1) a traditional rwlock WITHOUT remote contention or cache-line bouncing (opt-rwlock)
2) this lock (lgrwlock)
3) the V6 percpu-rwlock by "Srivatsa S. Bhat" (percpu-rwlock)
   (https://lkml.org/lkml/2013/2/18/186)

		nested=1(no nested)	nested=2	nested=4
opt-rwlock	 517181			1009200		2010027
lgrwlock	 452897			 700026		1201415
percpu-rwlock	1192955			1451343		1951757

Each value is the time (in nanoseconds) for 10000 iterations of the following operations:
{
	read-lock
	[nested read-lock]...
	[nested read-unlock]...
	read-unlock
}
If this way of testing is wrong or misleading, please correct me;
the test code is here: https://gist.github.com/laijs/5066159
(i5 760, 64-bit)
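
As a rough sketch of what such a measurement loop can look like (the real test
code is in the gist above; the timing helper, the nesting handling and the
fixed loop count below are assumptions of this sketch, not the actual test):

	static unsigned long long bench_lgrwlock(struct lgrwlock *lgrw, int nested)
	{
		unsigned long long t0 = sched_clock();	/* ns-resolution clock */
		int i, n;

		for (i = 0; i < 10000; i++) {
			/* take the read lock 'nested' times, then release it */
			for (n = 0; n < nested; n++)
				lg_rwlock_local_read_lock(lgrw);
			for (n = 0; n < nested; n++)
				lg_rwlock_local_read_unlock(lgrw);
		}
		return sched_clock() - t0;	/* elapsed time in nanoseconds */
	}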

Changes from V1:
 fix a reentrance bug caught by Oleg
 don't touch lockdep for nested readers (needed by the change below)
 add two special APIs for CPU hotplug

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/linux/lglock.h |   38 ++++++++++++++++++++++++++
 kernel/lglock.c        |   68 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 106 insertions(+), 0 deletions(-)

diff --git a/include/linux/lglock.h b/include/linux/lglock.h
index 0d24e93..90bfe79 100644
--- a/include/linux/lglock.h
+++ b/include/linux/lglock.h
@@ -67,4 +67,42 @@ void lg_local_unlock_cpu(struct lglock *lg, int cpu);
 void lg_global_lock(struct lglock *lg);
 void lg_global_unlock(struct lglock *lg);
 
+/* read-preference read-write-lock like rwlock but provides local-read-lock */
+struct lgrwlock {
+	unsigned long __percpu *reader_refcnt;
+	struct lglock lglock;
+	rwlock_t fallback_rwlock;
+};
+
+#define __DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)		\
+	= __ARCH_SPIN_LOCK_UNLOCKED;					\
+	static DEFINE_PER_CPU(unsigned long, name ## _refcnt);
+
+#define __LGRWLOCK_INIT(name)						\
+	{								\
+		.reader_refcnt = &name ## _refcnt,			\
+		.lglock = { .lock = &name ## _lock },			\
+		.fallback_rwlock = __RW_LOCK_UNLOCKED(name.fallback_rwlock)\
+	}
+
+#define DEFINE_LGRWLOCK(name)						\
+	__DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	struct lgrwlock name = __LGRWLOCK_INIT(name)
+
+#define DEFINE_STATIC_LGRWLOCK(name)					\
+	__DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	static struct lgrwlock name = __LGRWLOCK_INIT(name)
+
+static inline void lg_rwlock_init(struct lgrwlock *lgrw, char *name)
+{
+	lg_lock_init(&lgrw->lglock, name);
+}
+
+void lg_rwlock_local_read_lock(struct lgrwlock *lgrw);
+void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw);
+void lg_rwlock_global_write_lock(struct lgrwlock *lgrw);
+void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw);
+void __lg_rwlock_global_read_write_lock(struct lgrwlock *lgrw);
+void __lg_rwlock_global_read_write_unlock(struct lgrwlock *lgrw);
 #endif
diff --git a/kernel/lglock.c b/kernel/lglock.c
index 6535a66..52e9b2c 100644
--- a/kernel/lglock.c
+++ b/kernel/lglock.c
@@ -87,3 +87,71 @@ void lg_global_unlock(struct lglock *lg)
 	preempt_enable();
 }
 EXPORT_SYMBOL(lg_global_unlock);
+
+#define FALLBACK_BASE	(1UL << 30)
+
+void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
+{
+	struct lglock *lg = &lgrw->lglock;
+
+	preempt_disable();
+	if (likely(!__this_cpu_read(*lgrw->reader_refcnt))) {
+		rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
+		if (unlikely(!arch_spin_trylock(this_cpu_ptr(lg->lock)))) {
+			read_lock(&lgrw->fallback_rwlock);
+			__this_cpu_write(*lgrw->reader_refcnt, FALLBACK_BASE);
+			return;
+		}
+	}
+
+	__this_cpu_inc(*lgrw->reader_refcnt);
+}
+EXPORT_SYMBOL(lg_rwlock_local_read_lock);
+
+void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
+{
+	switch (__this_cpu_read(*lgrw->reader_refcnt)) {
+	case 1:
+		__this_cpu_write(*lgrw->reader_refcnt, 0);
+		lg_local_unlock(&lgrw->lglock);
+		return;
+	case FALLBACK_BASE:
+		__this_cpu_write(*lgrw->reader_refcnt, 0);
+		read_unlock(&lgrw->fallback_rwlock);
+		rwlock_release(&lg->lock_dep_map, 1, _RET_IP_);
+		break;
+	default:
+		__this_cpu_dec(*lgrw->reader_refcnt);
+		break;
+	}
+
+	preempt_enable();
+}
+EXPORT_SYMBOL(lg_rwlock_local_read_unlock);
+
+void lg_rwlock_global_write_lock(struct lgrwlock *lgrw)
+{
+	lg_global_lock(&lgrw->lglock);
+	write_lock(&lgrw->fallback_rwlock);
+}
+EXPORT_SYMBOL(lg_rwlock_global_write_lock);
+
+void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw)
+{
+	write_unlock(&lgrw->fallback_rwlock);
+	lg_global_unlock(&lgrw->lglock);
+}
+EXPORT_SYMBOL(lg_rwlock_global_write_unlock);
+
+/* special write-site APIs that allow nested readers inside the write site */
+void __lg_rwlock_global_read_write_lock(struct lgrwlock *lgrw)
+{
+	lg_rwlock_global_write_lock(lgrw);
+	__this_cpu_write(*lgrw->reader_refcnt, 1);
+}
+
+void __lg_rwlock_global_read_write_unlock(struct lgrwlock *lgrw)
+{
+	__this_cpu_write(*lgrw->reader_refcnt, 0);
+	lg_rwlock_global_write_unlock(lgrw);
+}
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 360+ messages in thread

* Re: [PATCH] lglock: add read-preference local-global rwlock
  2013-03-01 18:28                                     ` Oleg Nesterov
  (?)
@ 2013-03-02 13:42                                       ` Lai Jiangshan
  -1 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-03-02 13:42 UTC (permalink / raw)
  To: Oleg Nesterov, paulmck
  Cc: Srivatsa S. Bhat, Lai Jiangshan, Michel Lespinasse, linux-doc,
	peterz, fweisbec, linux-kernel, namhyung, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, nikunj, linux-pm, rusty, rostedt,
	rjw, vincent.guittot, tglx, linux-arm-kernel, netdev, sbw, tj,
	akpm, linuxppc-dev

On 02/03/13 02:28, Oleg Nesterov wrote:
> Lai, I didn't read this discussion except the code posted by Michel.
> I'll try to read this patch carefully later, but I'd like to ask
> a couple of questions.
> 
> This version looks more complex than Michel's, why? Just curious, I
> am trying to understand what I missed. See
> http://marc.info/?l=linux-kernel&m=136196350213593

Michel changed my old draft version a little; his version is good enough for me.
My new version tries to provide slightly better nesting support while adding
only a single __this_cpu_op() in _read_[un]lock().

> 
> And I can't understand FALLBACK_BASE...
> 
> OK, suppose that CPU_0 does _write_unlock() and releases ->fallback_rwlock.
> 
> CPU_1 does _read_lock(), and ...
> 
>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
>> +{
>> +	struct lglock *lg = &lgrw->lglock;
>> +
>> +	preempt_disable();
>> +	rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
>> +	if (likely(!__this_cpu_read(*lgrw->reader_refcnt))) {
>> +		if (!arch_spin_trylock(this_cpu_ptr(lg->lock))) {
> 
> _trylock() fails,
> 
>> +			read_lock(&lgrw->fallback_rwlock);
>> +			__this_cpu_add(*lgrw->reader_refcnt, FALLBACK_BASE);
> 
> so we take ->fallback_rwlock and ->reader_refcnt == FALLBACK_BASE.
> 
> CPU_0 does lg_global_unlock(lgrw->lglock) and finishes _write_unlock().
> 
> Interrupt handler on CPU_1 does _read_lock() notices ->reader_refcnt != 0
> and simply does this_cpu_inc(), so reader_refcnt == FALLBACK_BASE + 1.
> 
> Then irq does _read_unlock(), and
> 
>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
>> +{
>> +	switch (__this_cpu_dec_return(*lgrw->reader_refcnt)) {
>> +	case 0:
>> +		lg_local_unlock(&lgrw->lglock);
>> +		return;
>> +	case FALLBACK_BASE:
>> +		__this_cpu_sub(*lgrw->reader_refcnt, FALLBACK_BASE);
>> +		read_unlock(&lgrw->fallback_rwlock);
> 
> hits this case?
> 
> Doesn't look right, but most probably I missed something.

You are right, I just realized that I had split code which should have been atomic.

I hope this patch(V2) can get more reviews.

Much of my locking knowledge, from the very beginning, was learned from Paul.
Paul, would you also review it?

Thanks,
Lai

> 
> Oleg.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH] lglock: add read-preference local-global rwlock
  2013-03-02 13:42                                       ` Lai Jiangshan
  (?)
@ 2013-03-02 17:01                                         ` Oleg Nesterov
  -1 siblings, 0 replies; 360+ messages in thread
From: Oleg Nesterov @ 2013-03-02 17:01 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: paulmck, Srivatsa S. Bhat, Lai Jiangshan, Michel Lespinasse,
	linux-doc, peterz, fweisbec, linux-kernel, namhyung, mingo,
	linux-arch, linux, xiaoguangrong, wangyun, nikunj, linux-pm,
	rusty, rostedt, rjw, vincent.guittot, tglx, linux-arm-kernel,
	netdev, sbw, tj, akpm, linuxppc-dev

On 03/02, Lai Jiangshan wrote:
>
> On 02/03/13 02:28, Oleg Nesterov wrote:
> > Lai, I didn't read this discussion except the code posted by Michel.
> > I'll try to read this patch carefully later, but I'd like to ask
> > a couple of questions.
> >
> > This version looks more complex than Michel's, why? Just curious, I
> > am trying to understand what I missed. See
> > http://marc.info/?l=linux-kernel&m=136196350213593
>
> Michel changed my old draft version a little, his version is good enough for me.

Yes, I see. But imho Michel suggested a valuable cleanup; the code becomes
even simpler with the same performance.

Your v2 looks almost correct to me, but I still think it makes sense
to incorporate the simplification from Michel.

> My new version tries to add a little better nestable support with only
> adding single __this_cpu_op() in _read_[un]lock().

How? Afaics, with or without FALLBACK_BASE you need _read + _inc/dec in
_read_lock/unlock.

Oleg.


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH] lglock: add read-preference local-global rwlock
  2013-03-02 12:13                                       ` Michel Lespinasse
  (?)
@ 2013-03-02 17:06                                         ` Oleg Nesterov
  -1 siblings, 0 replies; 360+ messages in thread
From: Oleg Nesterov @ 2013-03-02 17:06 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Lai Jiangshan, Srivatsa S. Bhat, Lai Jiangshan, linux-doc,
	peterz, fweisbec, linux-kernel, namhyung, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, vincent.guittot, tglx, linux-arm-kernel, netdev,
	sbw, tj, akpm, linuxppc-dev

On 03/02, Michel Lespinasse wrote:
>
> My version would be slower if it needs to take the
> slow path in a reentrant way, but I'm not sure it matters either :)

I'd say, this doesn't matter at all, simply because this can only happen
if we race with the active writer.

Oleg.


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH V2] lglock: add read-preference local-global rwlock
  2013-03-02 13:14                                         ` Lai Jiangshan
  (?)
@ 2013-03-02 17:11                                           ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-03-02 17:11 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Michel Lespinasse, Oleg Nesterov, Lai Jiangshan, linux-doc,
	peterz, fweisbec, linux-kernel, namhyung, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, vincent.guittot, tglx, linux-arm-kernel, netdev,
	sbw, tj, akpm, linuxppc-dev

On 03/02/2013 06:44 PM, Lai Jiangshan wrote:
> From 345a7a75c314ff567be48983e0892bc69c4452e7 Mon Sep 17 00:00:00 2001
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> Date: Sat, 2 Mar 2013 20:33:14 +0800
> Subject: [PATCH] lglock: add read-preference local-global rwlock
> 
> Current lglock is not read-preference, so it can't be used on some cases
> which read-preference rwlock can do. Example, get_cpu_online_atomic().
> 
[...]
> diff --git a/kernel/lglock.c b/kernel/lglock.c
> index 6535a66..52e9b2c 100644
> --- a/kernel/lglock.c
> +++ b/kernel/lglock.c
> @@ -87,3 +87,71 @@ void lg_global_unlock(struct lglock *lg)
>  	preempt_enable();
>  }
>  EXPORT_SYMBOL(lg_global_unlock);
> +
> +#define FALLBACK_BASE	(1UL << 30)
> +
> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
> +{
> +	struct lglock *lg = &lgrw->lglock;
> +
> +	preempt_disable();
> +	if (likely(!__this_cpu_read(*lgrw->reader_refcnt))) {
> +		rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
> +		if (unlikely(!arch_spin_trylock(this_cpu_ptr(lg->lock)))) {
> +			read_lock(&lgrw->fallback_rwlock);
> +			__this_cpu_write(*lgrw->reader_refcnt, FALLBACK_BASE);
> +			return;
> +		}
> +	}
> +
> +	__this_cpu_inc(*lgrw->reader_refcnt);
> +}
> +EXPORT_SYMBOL(lg_rwlock_local_read_lock);
> +
> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
> +{
> +	switch (__this_cpu_read(*lgrw->reader_refcnt)) {
> +	case 1:
> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
> +		lg_local_unlock(&lgrw->lglock);
> +		return;

This should be a break, instead of a return, right?
Otherwise, there will be a preempt imbalance...

> +	case FALLBACK_BASE:
> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
> +		read_unlock(&lgrw->fallback_rwlock);
> +		rwlock_release(&lg->lock_dep_map, 1, _RET_IP_);
> +		break;
> +	default:
> +		__this_cpu_dec(*lgrw->reader_refcnt);
> +		break;
> +	}
> +
> +	preempt_enable();
> +}


Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH V2] lglock: add read-preference local-global rwlock
  2013-03-02 13:14                                         ` Lai Jiangshan
  (?)
@ 2013-03-02 17:20                                           ` Oleg Nesterov
  -1 siblings, 0 replies; 360+ messages in thread
From: Oleg Nesterov @ 2013-03-02 17:20 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Michel Lespinasse, Srivatsa S. Bhat, Lai Jiangshan, linux-doc,
	peterz, fweisbec, linux-kernel, namhyung, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, vincent.guittot, tglx, linux-arm-kernel, netdev,
	sbw, tj, akpm, linuxppc-dev

On 03/02, Lai Jiangshan wrote:
>
> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
> +{
> +	switch (__this_cpu_read(*lgrw->reader_refcnt)) {
> +	case 1:
> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
> +		lg_local_unlock(&lgrw->lglock);
> +		return;
> +	case FALLBACK_BASE:
> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
> +		read_unlock(&lgrw->fallback_rwlock);
> +		rwlock_release(&lg->lock_dep_map, 1, _RET_IP_);

I guess "case 1:" should do rwlock_release() too.

Otherwise, at first glance looks correct...

However, I still think that FALLBACK_BASE only adds unnecessary
complications. But even if I am right, this is subjective of course, so
please feel free to ignore it.

And btw, I am not sure about lg->lock_dep_map, perhaps we should use
fallback_rwlock->dep_map ?

We need rwlock_acquire_read() even in the fast-path, and this acquire_read
should be paired with rwlock_acquire() in _write_lock(), but it does
spin_acquire(lg->lock_dep_map). Yes, currently this is the same (afaics)
but perhaps fallback_rwlock->dep_map would be more clean.

Oleg.


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH V2] lglock: add read-preference local-global rwlock
  2013-03-02 17:20                                           ` Oleg Nesterov
  (?)
@ 2013-03-03 17:40                                             ` Oleg Nesterov
  -1 siblings, 0 replies; 360+ messages in thread
From: Oleg Nesterov @ 2013-03-03 17:40 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Michel Lespinasse, Srivatsa S. Bhat, Lai Jiangshan, linux-doc,
	peterz, fweisbec, linux-kernel, namhyung, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, vincent.guittot, tglx, linux-arm-kernel, netdev,
	sbw, tj, akpm, linuxppc-dev

On 03/02, Oleg Nesterov wrote:
>
> On 03/02, Lai Jiangshan wrote:
> >
> > +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
> > +{
> > +	switch (__this_cpu_read(*lgrw->reader_refcnt)) {
> > +	case 1:
> > +		__this_cpu_write(*lgrw->reader_refcnt, 0);
> > +		lg_local_unlock(&lgrw->lglock);
> > +		return;
> > +	case FALLBACK_BASE:
> > +		__this_cpu_write(*lgrw->reader_refcnt, 0);
> > +		read_unlock(&lgrw->fallback_rwlock);
> > +		rwlock_release(&lg->lock_dep_map, 1, _RET_IP_);
>
> I guess "case 1:" should do rwlock_release() too.
>
> Otherwise, at first glance looks correct...
>
> However, I still think that FALLBACK_BASE only adds the unnecessary
> complications. But even if I am right this is subjective of course, please
> feel free to ignore.

Yes, but...

> And btw, I am not sure about lg->lock_dep_map, perhaps we should use
> fallback_rwlock->dep_map ?
>
> We need rwlock_acquire_read() even in the fast-path, and this acquire_read
> should be paired with rwlock_acquire() in _write_lock(), but it does
> spin_acquire(lg->lock_dep_map). Yes, currently this is the same (afaics)
> but perhaps fallback_rwlock->dep_map would be more clean.

Please ignore this part.

I missed that lg_rwlock_global_write_lock() relies on lg_global_lock(),
and I just noticed that it does rwlock_acquire(lg->lock_dep_map).

Hmm. But then I do not understand the lglock annotations. Obviously,
rwlock_acquire_read() in lg_local_lock() can't even detect the simplest
deadlock, say, lg_local_lock(LOCK) + lg_local_lock(LOCK). Not to mention
spin_lock(X) + lg_local_lock(Y) vs lg_local_lock(Y) + spin_lock(X).

OK, I understand that it is not easy to make these annotations correct...
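
To illustrate, a rough sketch of those two patterns (the example_* helpers
are invented here for illustration only; nothing like them exists in the
tree):

#include <linux/lglock.h>
#include <linux/spinlock.h>

/* 1) Self-deadlock: the per-CPU side of an lglock is not recursive. */
static void example_nested_local_lock(struct lglock *lg)
{
	lg_local_lock(lg);
	lg_local_lock(lg);	/* spins forever on this CPU's arch_spinlock */
	lg_local_unlock(lg);
	lg_local_unlock(lg);
}

/*
 * 2) Inverted ordering against a plain spinlock.  CPU A takes X and then
 * the lglock, CPU B takes the lglock and then X.  Combined with a
 * concurrent lg_global_lock() on a third CPU this can form a cycle, but
 * lockdep only sees a recursive read acquisition and records no
 * dependency for it.
 */
static void example_cpu_a(spinlock_t *x, struct lglock *lg)
{
	spin_lock(x);
	lg_local_lock(lg);
	lg_local_unlock(lg);
	spin_unlock(x);
}

static void example_cpu_b(spinlock_t *x, struct lglock *lg)
{
	lg_local_lock(lg);
	spin_lock(x);
	spin_unlock(x);
	lg_local_unlock(lg);
}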

Oleg.


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH V2] lglock: add read-preference local-global rwlock
  2013-03-03 17:40                                             ` Oleg Nesterov
  (?)
@ 2013-03-05  1:37                                               ` Michel Lespinasse
  -1 siblings, 0 replies; 360+ messages in thread
From: Michel Lespinasse @ 2013-03-05  1:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Lai Jiangshan, Srivatsa S. Bhat, Lai Jiangshan, linux-doc,
	peterz, fweisbec, linux-kernel, namhyung, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, vincent.guittot, tglx, linux-arm-kernel, netdev,
	sbw, tj, akpm, linuxppc-dev

On Sun, Mar 3, 2013 at 9:40 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> However, I still think that FALLBACK_BASE only adds the unnecessary
>> complications. But even if I am right this is subjective of course, please
>> feel free to ignore.

Would it help if I sent out that version (without FALLBACK_BASE) as a
formal proposal?

> Hmm. But then I do not understand the lglock annotations. Obviously,
> rwlock_acquire_read() in lg_local_lock() can't even detect the simplest
> deadlock, say, lg_local_lock(LOCK) + lg_local_lock(LOCK). Not to mention
> spin_lock(X) + lg_local_lock(Y) vs lg_local_lock(Y) + spin_lock(X).
>
> OK, I understand that it is not easy to make these annotations correct...

I am going to send out a proposal to fix the existing lglock
annotations and detect the two cases you noticed. It's actually not
that hard :)

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH V2] lglock: add read-preference local-global rwlock
  2013-03-02 17:20                                           ` Oleg Nesterov
  (?)
@ 2013-03-05 15:27                                             ` Lai Jiangshan
  -1 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-03-05 15:27 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Michel Lespinasse, Srivatsa S. Bhat, Lai Jiangshan, linux-doc,
	peterz, fweisbec, linux-kernel, namhyung, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, vincent.guittot, tglx, linux-arm-kernel, netdev,
	sbw, tj, akpm, linuxppc-dev

On 03/03/13 01:20, Oleg Nesterov wrote:
> On 03/02, Lai Jiangshan wrote:
>>
>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
>> +{
>> +	switch (__this_cpu_read(*lgrw->reader_refcnt)) {
>> +	case 1:
>> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
>> +		lg_local_unlock(&lgrw->lglock);
>> +		return;
>> +	case FALLBACK_BASE:
>> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
>> +		read_unlock(&lgrw->fallback_rwlock);
>> +		rwlock_release(&lg->lock_dep_map, 1, _RET_IP_);
> 
> I guess "case 1:" should do rwlock_release() too.

That is already done in "lg_local_unlock(&lgrw->lglock);" before it returns.
(I like reusing old code.)
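
For reference, lg_local_unlock() looks roughly like this (a sketch from
memory of kernel/lglock.c, not copied from the tree), so the "case 1:"
path already gets both the lockdep release and the preempt_enable():

void lg_local_unlock(struct lglock *lg)
{
	arch_spinlock_t *lock;

	rwlock_release(&lg->lock_dep_map, 1, _RET_IP_);	/* lockdep */
	lock = this_cpu_ptr(lg->lock);
	arch_spin_unlock(lock);
	preempt_enable();	/* pairs with preempt_disable() at lock time */
}
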

> 
> Otherwise, at first glance looks correct...
> 
> However, I still think that FALLBACK_BASE only adds the unnecessary
> complications. But even if I am right this is subjective of course, please
> feel free to ignore.

OK, I will kill FALLBACK_BASE in a later patch.

> 
> And btw, I am not sure about lg->lock_dep_map, perhaps we should use
> fallback_rwlock->dep_map ?

Using either one is OK.

> 
> We need rwlock_acquire_read() even in the fast-path, and this acquire_read
> should be paired with rwlock_acquire() in _write_lock(), but it does
> spin_acquire(lg->lock_dep_map). Yes, currently this is the same (afaics)
> but perhaps fallback_rwlock->dep_map would be more clean.
> 

I can't tell which one is better. I will try using fallback_rwlock->dep_map later.

> Oleg.
> 


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH V2] lglock: add read-preference local-global rwlock
  2013-03-02 17:11                                           ` Srivatsa S. Bhat
  (?)
@ 2013-03-05 15:41                                             ` Lai Jiangshan
  -1 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-03-05 15:41 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Michel Lespinasse, Oleg Nesterov, Lai Jiangshan, linux-doc,
	peterz, fweisbec, linux-kernel, namhyung, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, vincent.guittot, tglx, linux-arm-kernel, netdev,
	sbw, tj, akpm, linuxppc-dev

On 03/03/13 01:11, Srivatsa S. Bhat wrote:
> On 03/02/2013 06:44 PM, Lai Jiangshan wrote:
>> From 345a7a75c314ff567be48983e0892bc69c4452e7 Mon Sep 17 00:00:00 2001
>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Date: Sat, 2 Mar 2013 20:33:14 +0800
>> Subject: [PATCH] lglock: add read-preference local-global rwlock
>>
>> Current lglock is not read-preference, so it can't be used on some cases
>> which read-preference rwlock can do. Example, get_cpu_online_atomic().
>>
> [...]
>> diff --git a/kernel/lglock.c b/kernel/lglock.c
>> index 6535a66..52e9b2c 100644
>> --- a/kernel/lglock.c
>> +++ b/kernel/lglock.c
>> @@ -87,3 +87,71 @@ void lg_global_unlock(struct lglock *lg)
>>  	preempt_enable();
>>  }
>>  EXPORT_SYMBOL(lg_global_unlock);
>> +
>> +#define FALLBACK_BASE	(1UL << 30)
>> +
>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
>> +{
>> +	struct lglock *lg = &lgrw->lglock;
>> +
>> +	preempt_disable();
>> +	if (likely(!__this_cpu_read(*lgrw->reader_refcnt))) {
>> +		rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
>> +		if (unlikely(!arch_spin_trylock(this_cpu_ptr(lg->lock)))) {
>> +			read_lock(&lgrw->fallback_rwlock);
>> +			__this_cpu_write(*lgrw->reader_refcnt, FALLBACK_BASE);
>> +			return;
>> +		}
>> +	}
>> +
>> +	__this_cpu_inc(*lgrw->reader_refcnt);
>> +}
>> +EXPORT_SYMBOL(lg_rwlock_local_read_lock);
>> +
>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
>> +{
>> +	switch (__this_cpu_read(*lgrw->reader_refcnt)) {
>> +	case 1:
>> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
>> +		lg_local_unlock(&lgrw->lglock);
>> +		return;
> 
> This should be a break, instead of a return, right?
> Otherwise, there will be a preempt imbalance...


"lockdep" and "preempt" are handled in lg_local_unlock(&lgrw->lglock);

Thanks,
Lai

> 
>> +	case FALLBACK_BASE:
>> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
>> +		read_unlock(&lgrw->fallback_rwlock);
>> +		rwlock_release(&lg->lock_dep_map, 1, _RET_IP_);
>> +		break;
>> +	default:
>> +		__this_cpu_dec(*lgrw->reader_refcnt);
>> +		break;
>> +	}
>> +
>> +	preempt_enable();
>> +}
> 
> 
> Regards,
> Srivatsa S. Bhat
> 


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH] lglock: add read-preference local-global rwlock
  2013-03-02 17:06                                         ` Oleg Nesterov
  (?)
@ 2013-03-05 15:54                                           ` Lai Jiangshan
  -1 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-03-05 15:54 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Michel Lespinasse, Srivatsa S. Bhat, Lai Jiangshan, linux-doc,
	peterz, fweisbec, linux-kernel, namhyung, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, vincent.guittot, tglx, linux-arm-kernel, netdev,
	sbw, tj, akpm, linuxppc-dev

On 03/03/13 01:06, Oleg Nesterov wrote:
> On 03/02, Michel Lespinasse wrote:
>>
>> My version would be slower if it needs to take the
>> slow path in a reentrant way, but I'm not sure it matters either :)
> 
> I'd say, this doesn't matter at all, simply because this can only happen
> if we race with the active writer.
> 

It can also happen when we get interrupted (still very rarely):

arch_spin_trylock()
	------->interrupted,
		__this_cpu_read() returns 0.
		arch_spin_trylock() fails
		slowpath, any nested will be slowpath too.
		...
		..._read_unlock()
	<-------interrupt
__this_cpu_inc()
....


I saw that get_online_cpu_atomic() is called very frequently.
The above scenario happens rarely on any one CPU, but how often does it
happen across the whole system if we have 4096 CPUs?
(Maybe I worry too much. I tend to remove FALLBACK_BASE now; we should
add it only after we have proven that we need it, and that has not been
proven yet.)

Thanks,
Lai



^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH V2] lglock: add read-preference local-global rwlock
  2013-03-05 15:27                                             ` Lai Jiangshan
  (?)
@ 2013-03-05 16:19                                               ` Michel Lespinasse
  -1 siblings, 0 replies; 360+ messages in thread
From: Michel Lespinasse @ 2013-03-05 16:19 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Oleg Nesterov, Srivatsa S. Bhat, Lai Jiangshan, linux-doc,
	peterz, fweisbec, linux-kernel, namhyung, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, vincent.guittot, tglx, linux-arm-kernel, netdev,
	sbw, tj, akpm, linuxppc-dev

Hi Lai,

Just a few comments about your v2 proposal. Hopefully you'll catch
these before you send out v3 :)

- I would prefer reader_refcnt to be unsigned int instead of unsigned long
- I would like some comment to indicate that lgrwlocks don't have
  reader-writer fairness and are thus somewhat discouraged
  (people could use plain lglock if they don't need reader preference,
  though even that use (as brlock) is discouraged already :)
- I don't think FALLBACK_BASE is necessary (you already mentioned you'd
  drop it)
- I prefer using the fallback_rwlock's dep_map for lockdep tracking.
  I feel this is more natural since we want the lgrwlock to behave as
  the rwlock, not as the lglock.
- I prefer to avoid return statements in the middle of functions when
  it's easy to do so.

Attached is my current version (based on an earlier version of your code).
You don't have to take it as is but I feel it makes for a more concrete
suggestion :)

Thanks,

----------------------------8<-------------------------------------------
lglock: add read-preference lgrwlock

Current lglock may be used as a fair rwlock; however sometimes a
read-preference rwlock is preferred. One such use case recently came
up for get_cpu_online_atomic().

This change adds a new lgrwlock with the following properties:
- high performance read side, using only cpu-local structures when there
  is no write side to contend with;
- correctness guarantees similar to rwlock_t: recursive readers are allowed
  and the lock's read side is not ordered vs other locks;
- low performance write side (comparable to lglocks' global side).

The implementation relies on the following principles:
- reader_refcnt is a local lock count; it indicates how many recursive
  read locks are taken using the local lglock;
- lglock is used by readers for local locking; it must be acquired
  before reader_refcnt becomes nonzero and released after reader_refcnt
  goes back to zero;
- fallback_rwlock is used by readers for global locking; it is acquired
  when reader_refcnt is zero and the trylock fails on lglock.
- writers take both the lglock write side and the fallback_rwlock, thus
  making sure to exclude both local and global readers.

Thanks to Srivatsa S. Bhat for proposing a lock with these requirements
and Lai Jiangshan for proposing this algorithm as an lglock extension.

Signed-off-by: Michel Lespinasse <walken@google.com>

---
 include/linux/lglock.h | 46 +++++++++++++++++++++++++++++++++++++++
 kernel/lglock.c        | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 104 insertions(+)

diff --git a/include/linux/lglock.h b/include/linux/lglock.h
index 0d24e932db0b..8b59084935d5 100644
--- a/include/linux/lglock.h
+++ b/include/linux/lglock.h
@@ -67,4 +67,50 @@ void lg_local_unlock_cpu(struct lglock *lg, int cpu);
 void lg_global_lock(struct lglock *lg);
 void lg_global_unlock(struct lglock *lg);
 
+/*
+ * lglock may be used as a read write spinlock if desired (though this is
+ * not encouraged as the write side scales badly on high CPU count machines).
+ * It has reader/writer fairness when used that way.
+ *
+ * However, sometimes it is desired to have an unfair rwlock instead, with
+ * reentrant readers that don't need to be ordered vs other locks, comparable
+ * to rwlock_t. lgrwlock implements such semantics.
+ */
+struct lgrwlock {
+	unsigned int __percpu *reader_refcnt;
+	struct lglock lglock;
+	rwlock_t fallback_rwlock;
+};
+
+#define __DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	static DEFINE_PER_CPU(unsigned int, name ## _refcnt);		\
+	static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)		\
+	= __ARCH_SPIN_LOCK_UNLOCKED;
+
+#define __LGRWLOCK_INIT(name) {						\
+	.reader_refcnt = &name ## _refcnt,				\
+	.lglock = { .lock = &name ## _lock },				\
+	.fallback_rwlock = __RW_LOCK_UNLOCKED(name.fallback_rwlock)	\
+}
+
+#define DEFINE_LGRWLOCK(name)						\
+	__DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	struct lgrwlock name = __LGRWLOCK_INIT(name)
+
+#define DEFINE_STATIC_LGRWLOCK(name)					\
+	__DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	static struct lgrwlock name = __LGRWLOCK_INIT(name)
+
+static inline void lg_rwlock_init(struct lgrwlock *lgrw, char *name)
+{
+	lg_lock_init(&lgrw->lglock, name);
+}
+
+void lg_read_lock(struct lgrwlock *lgrw);
+void lg_read_unlock(struct lgrwlock *lgrw);
+void lg_write_lock(struct lgrwlock *lgrw);
+void lg_write_unlock(struct lgrwlock *lgrw);
+void __lg_read_write_lock(struct lgrwlock *lgrw);
+void __lg_read_write_unlock(struct lgrwlock *lgrw);
+
 #endif
diff --git a/kernel/lglock.c b/kernel/lglock.c
index 86ae2aebf004..e78a7c95dbfd 100644
--- a/kernel/lglock.c
+++ b/kernel/lglock.c
@@ -87,3 +87,61 @@ void lg_global_unlock(struct lglock *lg)
 	preempt_enable();
 }
 EXPORT_SYMBOL(lg_global_unlock);
+
+void lg_read_lock(struct lgrwlock *lgrw)
+{
+	preempt_disable();
+
+	if (__this_cpu_read(*lgrw->reader_refcnt) ||
+	    arch_spin_trylock(this_cpu_ptr(lgrw->lglock.lock))) {
+		__this_cpu_inc(*lgrw->reader_refcnt);
+		rwlock_acquire_read(&lgrw->fallback_rwlock.dep_map,
+				    0, 0, _RET_IP_);
+	} else {
+		read_lock(&lgrw->fallback_rwlock);
+	}
+}
+EXPORT_SYMBOL(lg_read_lock);
+
+void lg_read_unlock(struct lgrwlock *lgrw)
+{
+	if (likely(__this_cpu_read(*lgrw->reader_refcnt))) {
+		rwlock_release(&lgrw->fallback_rwlock.dep_map,
+			       1, _RET_IP_);
+		if (!__this_cpu_dec_return(*lgrw->reader_refcnt))
+			arch_spin_unlock(this_cpu_ptr(lgrw->lglock.lock));
+	} else {
+		read_unlock(&lgrw->fallback_rwlock);
+	}
+
+	preempt_enable();
+}
+EXPORT_SYMBOL(lg_read_unlock);
+
+void lg_write_lock(struct lgrwlock *lgrw)
+{
+	lg_global_lock(&lgrw->lglock);
+	write_lock(&lgrw->fallback_rwlock);
+}
+EXPORT_SYMBOL(lg_write_lock);
+
+void lg_write_unlock(struct lgrwlock *lgrw)
+{
+	write_unlock(&lgrw->fallback_rwlock);
+	lg_global_unlock(&lgrw->lglock);
+}
+EXPORT_SYMBOL(lg_write_unlock);
+
+void __lg_read_write_lock(struct lgrwlock *lgrw)
+{
+	lg_write_lock(lgrw);
+	__this_cpu_write(*lgrw->reader_refcnt, 1);
+}
+EXPORT_SYMBOL(__lg_read_write_lock);
+
+void __lg_read_write_unlock(struct lgrwlock *lgrw)
+{
+	__this_cpu_write(*lgrw->reader_refcnt, 0);
+	lg_write_unlock(lgrw);
+}
+EXPORT_SYMBOL(__lg_read_write_unlock);
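
For illustration, a caller of the above would look roughly like this (a
sketch only; hotplug_lgrw and the hotplug wiring are made up here and are
not part of this patch):

#include <linux/lglock.h>

/* Read-preference lock protecting the CPU-offline fast path. */
DEFINE_STATIC_LGRWLOCK(hotplug_lgrw);
/* lg_rwlock_init(&hotplug_lgrw, "hotplug_lgrw") would be called once at boot. */

void get_online_cpus_atomic(void)	/* callable from atomic context, reentrant */
{
	lg_read_lock(&hotplug_lgrw);
}

void put_online_cpus_atomic(void)
{
	lg_read_unlock(&hotplug_lgrw);
}

/* The hotplug writer excludes all atomic readers while a CPU goes down. */
static void example_cpu_down_critical(void)
{
	lg_write_lock(&hotplug_lgrw);
	/* ... clear the CPU from the online mask, run CPU_DYING callbacks ... */
	lg_write_unlock(&hotplug_lgrw);
}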

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* Re: [PATCH V2] lglock: add read-preference local-global rwlock
@ 2013-03-05 16:19                                               ` Michel Lespinasse
  0 siblings, 0 replies; 360+ messages in thread
From: Michel Lespinasse @ 2013-03-05 16:19 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Lai Jiangshan, linux-doc, peterz, fweisbec, linux-kernel, mingo,
	linux-arch, linux, xiaoguangrong, wangyun, paulmck, nikunj,
	linux-pm, rusty, rostedt, rjw, namhyung, tglx, linux-arm-kernel,
	netdev, Oleg Nesterov, vincent.guittot, sbw, Srivatsa S. Bhat,
	tj, akpm, linuxppc-dev

Hi Lai,

Just a few comments about your v2 proposal. Hopefully you'll catch
these before you send out v3 :)

- I would prefer reader_refcnt to be unsigned int instead of unsigned long
- I would like some comment to indicate that lgrwlocks don't have
  reader-writer fairness and are thus somewhat discouraged
  (people could use plain lglock if they don't need reader preference,
  though even that use (as brlock) is discouraged already :)
- I don't think FALLBACK_BASE is necessary (you already mentioned you'd
  drop it)
- I prefer using the fallback_rwlock's dep_map for lockdep tracking.
  I feel this is more natural since we want the lgrwlock to behave as
  the rwlock, not as the lglock.
- I prefer to avoid return statements in the middle of functions when
  it's easy to do so.

Attached is my current version (based on an earlier version of your code).
You don't have to take it as is but I feel it makes for a more concrete
suggestion :)

Thanks,

----------------------------8<-------------------------------------------
lglock: add read-preference lgrwlock

Current lglock may be used as a fair rwlock; however sometimes a
read-preference rwlock is preferred. One such use case recently came
up for get_cpu_online_atomic().

This change adds a new lgrwlock with the following properties:
- high performance read side, using only cpu-local structures when there
  is no write side to contend with;
- correctness guarantees similar to rwlock_t: recursive readers are allowed
  and the lock's read side is not ordered vs other locks;
- low performance write side (comparable to lglocks' global side).

The implementation relies on the following principles:
- reader_refcnt is a local lock count; it indicates how many recursive
  read locks are taken using the local lglock;
- lglock is used by readers for local locking; it must be acquired
  before reader_refcnt becomes nonzero and released after reader_refcnt
  goes back to zero;
- fallback_rwlock is used by readers for global locking; it is acquired
  when reader_refcnt is zero and the trylock fails on lglock;
- writers take both the lglock write side and the fallback_rwlock, thus
  making sure to exclude both local and global readers.

Thanks to Srivatsa S. Bhat for proposing a lock with these requirements
and Lai Jiangshan for proposing this algorithm as an lglock extension.

Signed-off-by: Michel Lespinasse <walken@google.com>

---
 include/linux/lglock.h | 46 +++++++++++++++++++++++++++++++++++++++
 kernel/lglock.c        | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 104 insertions(+)

diff --git a/include/linux/lglock.h b/include/linux/lglock.h
index 0d24e932db0b..8b59084935d5 100644
--- a/include/linux/lglock.h
+++ b/include/linux/lglock.h
@@ -67,4 +67,50 @@ void lg_local_unlock_cpu(struct lglock *lg, int cpu);
 void lg_global_lock(struct lglock *lg);
 void lg_global_unlock(struct lglock *lg);
 
+/*
+ * lglock may be used as a read write spinlock if desired (though this is
+ * not encouraged as the write side scales badly on high CPU count machines).
+ * It has reader/writer fairness when used that way.
+ *
+ * However, sometimes it is desired to have an unfair rwlock instead, with
+ * reentrant readers that don't need to be ordered vs other locks, comparable
+ * to rwlock_t. lgrwlock implements such semantics.
+ */
+struct lgrwlock {
+	unsigned int __percpu *reader_refcnt;
+	struct lglock lglock;
+	rwlock_t fallback_rwlock;
+};
+
+#define __DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	static DEFINE_PER_CPU(unsigned int, name ## _refcnt);		\
+	static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)		\
+	= __ARCH_SPIN_LOCK_UNLOCKED;
+
+#define __LGRWLOCK_INIT(name) {						\
+	.reader_refcnt = &name ## _refcnt,				\
+	.lglock = { .lock = &name ## _lock },				\
+	.fallback_rwlock = __RW_LOCK_UNLOCKED(name.fallback_rwlock)	\
+}
+
+#define DEFINE_LGRWLOCK(name)						\
+	__DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	struct lgrwlock name = __LGRWLOCK_INIT(name)
+
+#define DEFINE_STATIC_LGRWLOCK(name)					\
+	__DEFINE_LGRWLOCK_PERCPU_DATA(name)				\
+	static struct lgrwlock name = __LGRWLOCK_INIT(name)
+
+static inline void lg_rwlock_init(struct lgrwlock *lgrw, char *name)
+{
+	lg_lock_init(&lgrw->lglock, name);
+}
+
+void lg_read_lock(struct lgrwlock *lgrw);
+void lg_read_unlock(struct lgrwlock *lgrw);
+void lg_write_lock(struct lgrwlock *lgrw);
+void lg_write_unlock(struct lgrwlock *lgrw);
+void __lg_read_write_lock(struct lgrwlock *lgrw);
+void __lg_read_write_unlock(struct lgrwlock *lgrw);
+
 #endif
diff --git a/kernel/lglock.c b/kernel/lglock.c
index 86ae2aebf004..e78a7c95dbfd 100644
--- a/kernel/lglock.c
+++ b/kernel/lglock.c
@@ -87,3 +87,61 @@ void lg_global_unlock(struct lglock *lg)
 	preempt_enable();
 }
 EXPORT_SYMBOL(lg_global_unlock);
+
+void lg_read_lock(struct lgrwlock *lgrw)
+{
+	preempt_disable();
+
+	if (__this_cpu_read(*lgrw->reader_refcnt) ||
+	    arch_spin_trylock(this_cpu_ptr(lgrw->lglock.lock))) {
+		__this_cpu_inc(*lgrw->reader_refcnt);
+		rwlock_acquire_read(&lgrw->fallback_rwlock.dep_map,
+				    0, 0, _RET_IP_);
+	} else {
+		read_lock(&lgrw->fallback_rwlock);
+	}
+}
+EXPORT_SYMBOL(lg_read_lock);
+
+void lg_read_unlock(struct lgrwlock *lgrw)
+{
+	if (likely(__this_cpu_read(*lgrw->reader_refcnt))) {
+		rwlock_release(&lgrw->fallback_rwlock.dep_map,
+			       1, _RET_IP_);
+		if (!__this_cpu_dec_return(*lgrw->reader_refcnt))
+			arch_spin_unlock(this_cpu_ptr(lgrw->lglock.lock));
+	} else {
+		read_unlock(&lgrw->fallback_rwlock);
+	}
+
+	preempt_enable();
+}
+EXPORT_SYMBOL(lg_read_unlock);
+
+void lg_write_lock(struct lgrwlock *lgrw)
+{
+	lg_global_lock(&lgrw->lglock);
+	write_lock(&lgrw->fallback_rwlock);
+}
+EXPORT_SYMBOL(lg_write_lock);
+
+void lg_write_unlock(struct lgrwlock *lgrw)
+{
+	write_unlock(&lgrw->fallback_rwlock);
+	lg_global_unlock(&lgrw->lglock);
+}
+EXPORT_SYMBOL(lg_write_unlock);
+
+void __lg_read_write_lock(struct lgrwlock *lgrw)
+{
+	lg_write_lock(lgrw);
+	__this_cpu_write(*lgrw->reader_refcnt, 1);
+}
+EXPORT_SYMBOL(__lg_read_write_lock);
+
+void __lg_read_write_unlock(struct lgrwlock *lgrw)
+{
+	__this_cpu_write(*lgrw->reader_refcnt, 0);
+	lg_write_unlock(lgrw);
+}
+EXPORT_SYMBOL(__lg_read_write_unlock);
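
To illustrate the intended usage, here is a small sketch (this is not part
of the patch; the lgrw_example lock, the counter it protects and the two
functions below are made-up names, for illustration only):

#include <linux/lglock.h>
#include <linux/printk.h>

DEFINE_STATIC_LGRWLOCK(lgrw_example);
static unsigned long protected_count;

/* Read side: cheap, may nest, and may be called from atomic context. */
void example_reader(void)
{
	lg_read_lock(&lgrw_example);
	/* protected_count cannot change while we hold the read side */
	pr_debug("count: %lu\n", protected_count);
	lg_read_unlock(&lgrw_example);
}

/* Write side: slow path, excludes both local and global readers. */
void example_writer(void)
{
	lg_write_lock(&lgrw_example);
	protected_count++;
	lg_write_unlock(&lgrw_example);
}

The write side pays for lg_global_lock() plus the fallback rwlock; that is
the intended tradeoff, since readers stay cpu-local and cheap while writers
(e.g. CPU hotplug) are expected to be rare.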

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply related	[flat|nested] 360+ messages in thread

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-03-01 19:47                                     ` Srivatsa S. Bhat
@ 2013-03-05 16:25                                       ` Lai Jiangshan
  -1 siblings, 0 replies; 360+ messages in thread
From: Lai Jiangshan @ 2013-03-05 16:25 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Lai Jiangshan, Michel Lespinasse, linux-doc, peterz, fweisbec,
	linux-kernel, namhyung, mingo, linux-arch, linux, xiaoguangrong,
	wangyun, paulmck, nikunj, linux-pm, rusty, rostedt, rjw,
	vincent.guittot, tglx, linux-arm-kernel, netdev, oleg, sbw, tj,
	akpm, linuxppc-dev

On 02/03/13 03:47, Srivatsa S. Bhat wrote:
> On 03/01/2013 11:20 PM, Lai Jiangshan wrote:
>> On 28/02/13 05:19, Srivatsa S. Bhat wrote:
>>> On 02/27/2013 06:03 AM, Lai Jiangshan wrote:
>>>> On Wed, Feb 27, 2013 at 3:30 AM, Srivatsa S. Bhat
>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>> On 02/26/2013 09:55 PM, Lai Jiangshan wrote:
>>>>>> On Tue, Feb 26, 2013 at 10:22 PM, Srivatsa S. Bhat
>>>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>>>>
>>>>>>> Hi Lai,
>>>>>>>
>>>>>>> I'm really not convinced that piggy-backing on lglocks would help
>>>>>>> us in any way. But still, let me try to address some of the points
>>>>>>> you raised...
>>>>>>>
>>>>>>> On 02/26/2013 06:29 PM, Lai Jiangshan wrote:
>>>>>>>> On Tue, Feb 26, 2013 at 5:02 PM, Srivatsa S. Bhat
>>>>>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>>>>>> On 02/26/2013 05:47 AM, Lai Jiangshan wrote:
>>>>>>>>>> On Tue, Feb 26, 2013 at 3:26 AM, Srivatsa S. Bhat
>>>>>>>>>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>>>>>>>>> Hi Lai,
>>>>>>>>>>>
>>>>>>>>>>> On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
>>>>>>>>>>>> Hi, Srivatsa,
>>>>>>>>>>>>
>>>>>>>>>>>> The target of the whole patchset is nice for me.
>>>>>>>>>>>
>>>>>>>>>>> Cool! Thanks :-)
>>>>>>>>>>>
>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>> Unfortunately, I see quite a few issues with the code above. IIUC, the
>>>>>>>>> writer and the reader both increment the same counters. So how will the
>>>>>>>>> unlock() code in the reader path know when to unlock which of the locks?
>>>>>>>>
>>>>>>>> The same as your code, the reader(which nested in write C.S.) just dec
>>>>>>>> the counters.
>>>>>>>
>>>>>>> And that works fine in my case because the writer and the reader update
>>>>>>> _two_ _different_ counters.
>>>>>>
>>>>>> I can't find any magic in your code, they are the same counter.
>>>>>>
>>>>>>         /*
>>>>>>          * It is desirable to allow the writer to acquire the percpu-rwlock
>>>>>>          * for read (if necessary), without deadlocking or getting complaints
>>>>>>          * from lockdep. To achieve that, just increment the reader_refcnt of
>>>>>>          * this CPU - that way, any attempt by the writer to acquire the
>>>>>>          * percpu-rwlock for read, will get treated as a case of nested percpu
>>>>>>          * reader, which is safe, from a locking perspective.
>>>>>>          */
>>>>>>         this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
>>>>>>
>>>>>
>>>>> Whoa! Hold on, were you really referring to _this_ increment when you said
>>>>> that, in your patch you would increment the refcnt at the writer? Then I guess
>>>>> there is a major disconnect in our conversations. (I had assumed that you were
>>>>> referring to the update of writer_signal, and were just trying to have a single
>>>>> refcnt instead of reader_refcnt and writer_signal).
>>>>
>>>> https://github.com/laijs/linux/commit/53e5053d5b724bea7c538b11743d0f420d98f38d
>>>>
>>>> Sorry the name "fallback_reader_refcnt" misled you.
>>>>
>>> [...]
>>>
>>>>>> All I was considered is "nested reader is seldom", so I always
>>>>>> fallback to rwlock when nested.
>>>>>> If you like, I can add 6 lines of code, the overhead is
>>>>>> 1 spin_try_lock()(fast path)  + N  __this_cpu_inc()
>>>>>>
>>>>>
>>>>> I'm assuming that calculation is no longer valid, considering that
>>>>> we just discussed how the per-cpu refcnt that you were using is quite
>>>>> unnecessary and can be removed.
>>>>>
>>>>> IIUC, the overhead with your code, as per above discussion would be:
>>>>> 1 spin_try_lock() [non-nested] + N read_lock(global_rwlock).
>>>>
>>>> https://github.com/laijs/linux/commit/46334544bb7961550b7065e015da76f6dab21f16
>>>>
>>>> Again, I'm so sorry the name "fallback_reader_refcnt" misled you.
>>>>
>>>
>>> At this juncture I really have to admit that I don't understand your
>>> intentions at all. What are you really trying to prove? Without giving
>>> a single good reason why my code is inferior, why are you even bringing
>>> up the discussion about a complete rewrite of the synchronization code?
>>> http://article.gmane.org/gmane.linux.kernel.cross-arch/17103
>>> http://article.gmane.org/gmane.linux.power-management.general/31345
>>>
>>> I'm beginning to add 2 + 2 together based on the kinds of questions you
>>> have been asking...
>>>
>>> You posted a patch in this thread and started a discussion around it without
>>> even establishing a strong reason to do so. Now you point me to your git
>>> tree where your patches have even more traces of ideas being borrowed from
>>> my patchset (apart from my own ideas/code, there are traces of others' ideas
>>> being borrowed too - for example, it was Oleg who originally proposed the
>>> idea of splitting up the counter into 2 parts and I'm seeing that it is
>>> slowly crawling into your code with no sign of appropriate credits).
>>> http://article.gmane.org/gmane.linux.network/260288
>>>
>>> And in reply to my mail pointing out the performance implications of the
>>> global read_lock at the reader side in your code, you said you'll come up
>>> with a comparison between that and my patchset.
>>> http://article.gmane.org/gmane.linux.network/260288
>>> The issue has been well-documented in my patch description of patch 4.
>>> http://article.gmane.org/gmane.linux.kernel/1443258
>>>
>>> Are you really trying to pit bits and pieces of my own ideas/versions
>>> against one another and claiming them as your own?
>>>
>>> You projected the work involved in handling the locking issues pertaining
>>> to CPU_DYING notifiers etc as a TODO, despite the fact that I had explicitly
>>> noted in my cover letter that I had audited and taken care of all of them.
>>> http://article.gmane.org/gmane.linux.documentation/9727
>>> http://article.gmane.org/gmane.linux.documentation/9520
>>>
>>> You failed to acknowledge (on purpose?) that I had done a tree-wide
>>> conversion despite the fact that you were replying to the very thread which
>>> had the 46 patches which did exactly that (and I had also mentioned it
>>> explicitly in my cover letter).
>>> http://article.gmane.org/gmane.linux.documentation/9727
>>> http://article.gmane.org/gmane.linux.documentation/9520
>>>
>>> You then started probing more and more about the technique I used to do
>>> the tree-wide conversion.
>>> http://article.gmane.org/gmane.linux.kernel.cross-arch/17111
>>>
>>> You also retorted saying you did go through my patch descriptions, so
>>> its not like you have missed reading them.
>>> http://article.gmane.org/gmane.linux.power-management.general/31345
>>>
>>> Each of these when considered individually, might appear like innocuous and
>>> honest attempts at evaluating my code. But when put together, I'm beginning
>>> to sense a whole different angle to it altogether, as if you are trying
>>> to spin your own patch series, complete with the locking framework _and_
>>> the tree-wide conversion, heavily borrowed from mine. At the beginning of
>>> this discussion, I predicted that the lglock version that you are proposing
>>> would end up being either less efficient than my version or look very similar
>>> to my version. http://article.gmane.org/gmane.linux.kernel/1447139
>>>
>>> I thought it was just the former till now, but its not hard to see how it
>>> is getting closer to becoming the latter too. So yeah, I'm not amused.
>>>
>>> Maybe (and hopefully) you are just trying out different ideas on your own,
>>> and I'm just being paranoid. I really hope that is the case. If you are just
>>> trying to review my code, then please stop sending patches with borrowed ideas
>>> with your sole Signed-off-by, and purposefully ignoring the work already done
>>> in my patchset, because it is really starting to look suspicious, at least
>>> to me.
>>>
>>> Don't get me wrong - I'll whole-heartedly acknowledge and appreciate if
>>> _your_ code is better than mine. I just don't like the idea of somebody
>>> plagiarizing my ideas/code (or even others' ideas for that matter).
>>> However, I sincerely apologize in advance if I misunderstood/misjudged your
>>> intentions; I just wanted to voice my concerns out loud at this point,
>>> considering the bad feeling I got by looking at your responses collectively.
>>>
>>
>> Hi, Srivatsa
>>
>> I'm sorry, big apology to you.
>> I'm bad in communication and I did be wrong.
>> I tended to improve the codes but in false direction.
>>
> 
> OK, in that case, I'm extremely sorry too, for jumping on you like that.
> I hope you'll forgive me for the uneasiness it caused.
> 
> Now that I understand that you were simply trying to help, I would like to
> express my gratitude for your time, effort and inputs in improving the design
> of the stop-machine replacement.
> 
> I'm looking forward to working with you on this as well as future endeavours,
> so I sincerely hope that we can put this unfortunate incident behind us and
> collaborate effectively with renewed mutual trust and good-will.
> 
> Thank you very much!
> 

Hi, Srivatsa,

I'm sorry again that I delayed your work.

I have some thoughts about how to get this work done.

First step (2-3 patches):
Use preempt_disable() to implement get_online_cpus_atomic(), and add lockdep
annotations for it (a rough sketch is included after the third step below).

Second step:
Conversion patches.

We can send the patchset for the above two steps first. It does not change
any behavior of the kernel, and it is just annotation (instead of a bare
preempt_disable() without any comment, as is sometimes the case today), so
I expect these patches can be merged very early.

Third step:
Once everyone is confident that the conversion patches cover all cases and
the cpu-hotplug site is ready for it, we will implement
get_online_cpus_atomic() via locks and remove stop_machine() from cpu hotplug.
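
For the first step, here is a rough sketch of what I mean (the names
get/put_online_cpus_atomic() are taken from Srivatsa's patchset; the
lockdep map below is only an illustration, not final code):

#include <linux/preempt.h>
#include <linux/lockdep.h>
#include <linux/kernel.h>

#ifdef CONFIG_DEBUG_LOCK_ALLOC
static struct lock_class_key cpu_hotplug_atomic_key;
static struct lockdep_map cpu_hotplug_atomic_map =
	STATIC_LOCKDEP_MAP_INIT("get_online_cpus_atomic",
				&cpu_hotplug_atomic_key);
#endif

/*
 * First-step stub: behavior is unchanged (it is just preempt_disable()),
 * but the call sites become self-documenting and lockdep-tracked, so the
 * real locking can be dropped in later without touching the callers.
 */
void get_online_cpus_atomic(void)
{
	preempt_disable();
	rwlock_acquire_read(&cpu_hotplug_atomic_map, 0, 0, _RET_IP_);
}

void put_online_cpus_atomic(void)
{
	rwlock_release(&cpu_hotplug_atomic_map, 1, _RET_IP_);
	preempt_enable();
}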

Any thoughts?

Thanks,
Lai

If I have time, I will help you with the patches for the first step.
(I was assigned a bad job during office hours, so I can only do kernel-dev work at night.)

And for step 2, I will write a checklist or an spatch script.



^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH] lglock: add read-preference local-global rwlock
  2013-03-05 15:54                                           ` Lai Jiangshan
@ 2013-03-05 16:32                                             ` Michel Lespinasse
  -1 siblings, 0 replies; 360+ messages in thread
From: Michel Lespinasse @ 2013-03-05 16:32 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Oleg Nesterov, Srivatsa S. Bhat, Lai Jiangshan, linux-doc,
	peterz, fweisbec, linux-kernel, namhyung, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, vincent.guittot, tglx, linux-arm-kernel, netdev,
	sbw, tj, akpm, linuxppc-dev

On Tue, Mar 5, 2013 at 7:54 AM, Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> On 03/03/13 01:06, Oleg Nesterov wrote:
>> On 03/02, Michel Lespinasse wrote:
>>>
>>> My version would be slower if it needs to take the
>>> slow path in a reentrant way, but I'm not sure it matters either :)
>>
>> I'd say, this doesn't matter at all, simply because this can only happen
>> if we race with the active writer.
>>
>
> It can also happen when interrupted. (still very rarely)
>
> arch_spin_trylock()
>         ------->interrupted,
>                 __this_cpu_read() returns 0.
>                 arch_spin_trylock() fails
>                 slowpath, any nested will be slowpath too.
>                 ...
>                 ..._read_unlock()
>         <-------interrupt
> __this_cpu_inc()
> ....

Yes (and I think this is actually the most likely way for it to happen).

We do need this to work correctly, but I don't expect we need it to be fast.
(could be wrong, this is only my intuition)

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH] lglock: add read-preference local-global rwlock
  2013-03-05 15:54                                           ` Lai Jiangshan
  (?)
@ 2013-03-05 16:35                                             ` Oleg Nesterov
  -1 siblings, 0 replies; 360+ messages in thread
From: Oleg Nesterov @ 2013-03-05 16:35 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Michel Lespinasse, Srivatsa S. Bhat, Lai Jiangshan, linux-doc,
	peterz, fweisbec, linux-kernel, namhyung, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, vincent.guittot, tglx, linux-arm-kernel, netdev,
	sbw, tj, akpm, linuxppc-dev

On 03/05, Lai Jiangshan wrote:
>
> On 03/03/13 01:06, Oleg Nesterov wrote:
> > On 03/02, Michel Lespinasse wrote:
> >>
> >> My version would be slower if it needs to take the
> >> slow path in a reentrant way, but I'm not sure it matters either :)
> >
> > I'd say, this doesn't matter at all, simply because this can only happen
> > if we race with the active writer.
>
> It can also happen when interrupted. (still very rarely)
>
> arch_spin_trylock()
> 	------->interrupted,
> 		__this_cpu_read() returns 0.
> 		arch_spin_trylock() fails
> 		slowpath, any nested will be slowpath too.
> 		...
> 		..._read_unlock()
> 	<-------interrupt
> __this_cpu_inc()
> ....

Yes, sure. Or it can take the local lock after we have already taken the global
fallback_lock.

But the same can happen with FALLBACK_BASE, just because we need to take
a lock (local or global) first, then increment the counter.

> (I worry too much. I am inclined to remove FALLBACK_BASE now; we should
> add it only after we have proved we need it, and that part is not proved.)

Agreed, great ;)

Oleg.


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH V2] lglock: add read-preference local-global rwlock
  2013-03-05 15:27                                             ` Lai Jiangshan
  (?)
@ 2013-03-05 16:41                                               ` Oleg Nesterov
  -1 siblings, 0 replies; 360+ messages in thread
From: Oleg Nesterov @ 2013-03-05 16:41 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Michel Lespinasse, Srivatsa S. Bhat, Lai Jiangshan, linux-doc,
	peterz, fweisbec, linux-kernel, namhyung, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, vincent.guittot, tglx, linux-arm-kernel, netdev,
	sbw, tj, akpm, linuxppc-dev

On 03/05, Lai Jiangshan wrote:
>
> On 03/03/13 01:20, Oleg Nesterov wrote:
> > On 03/02, Lai Jiangshan wrote:
> >>
> >> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
> >> +{
> >> +	switch (__this_cpu_read(*lgrw->reader_refcnt)) {
> >> +	case 1:
> >> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
> >> +		lg_local_unlock(&lgrw->lglock);
> >> +		return;
> >> +	case FALLBACK_BASE:
> >> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
> >> +		read_unlock(&lgrw->fallback_rwlock);
> >> +		rwlock_release(&lg->lock_dep_map, 1, _RET_IP_);
> >
> > I guess "case 1:" should do rwlock_release() too.
>
> Already do it in "lg_local_unlock(&lgrw->lglock);" before it returns.
> (I like to reuse old code.)

Yes, I was wrong, thanks. This is another case where I didn't notice that you
re-use the regular lg_ code...

> > We need rwlock_acquire_read() even in the fast-path, and this acquire_read
> > should be paired with rwlock_acquire() in _write_lock(), but it does
> > spin_acquire(lg->lock_dep_map). Yes, currently this is the same (afaics)
> > but perhaps fallback_rwlock->dep_map would be more clean.
>
> I can't tell which one is better. I'll try to use fallback_rwlock->dep_map later.

I am not sure which one is better either; please check.

Again, I forgot that _write_lock/unlock use lg_global_*() code.

Oleg.


^ permalink raw reply	[flat|nested] 360+ messages in thread

* Re: [PATCH V2] lglock: add read-preference local-global rwlock
  2013-03-05 15:41                                             ` Lai Jiangshan
  (?)
@ 2013-03-05 17:55                                               ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-03-05 17:55 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Michel Lespinasse, Oleg Nesterov, Lai Jiangshan, linux-doc,
	peterz, fweisbec, linux-kernel, namhyung, mingo, linux-arch,
	linux, xiaoguangrong, wangyun, paulmck, nikunj, linux-pm, rusty,
	rostedt, rjw, vincent.guittot, tglx, linux-arm-kernel, netdev,
	sbw, tj, akpm, linuxppc-dev

On 03/05/2013 09:11 PM, Lai Jiangshan wrote:
> On 03/03/13 01:11, Srivatsa S. Bhat wrote:
>> On 03/02/2013 06:44 PM, Lai Jiangshan wrote:
>>> From 345a7a75c314ff567be48983e0892bc69c4452e7 Mon Sep 17 00:00:00 2001
>>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>>> Date: Sat, 2 Mar 2013 20:33:14 +0800
>>> Subject: [PATCH] lglock: add read-preference local-global rwlock
>>>
>>> Current lglock is not read-preference, so it can't be used in some cases
>>> where a read-preference rwlock can be. Example: get_cpu_online_atomic().
>>>
>> [...]
>>> diff --git a/kernel/lglock.c b/kernel/lglock.c
>>> index 6535a66..52e9b2c 100644
>>> --- a/kernel/lglock.c
>>> +++ b/kernel/lglock.c
>>> @@ -87,3 +87,71 @@ void lg_global_unlock(struct lglock *lg)
>>>  	preempt_enable();
>>>  }
>>>  EXPORT_SYMBOL(lg_global_unlock);
>>> +
>>> +#define FALLBACK_BASE	(1UL << 30)
>>> +
>>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
>>> +{
>>> +	struct lglock *lg = &lgrw->lglock;
>>> +
>>> +	preempt_disable();
>>> +	if (likely(!__this_cpu_read(*lgrw->reader_refcnt))) {
>>> +		rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
>>> +		if (unlikely(!arch_spin_trylock(this_cpu_ptr(lg->lock)))) {
>>> +			read_lock(&lgrw->fallback_rwlock);
>>> +			__this_cpu_write(*lgrw->reader_refcnt, FALLBACK_BASE);
>>> +			return;
>>> +		}
>>> +	}
>>> +
>>> +	__this_cpu_inc(*lgrw->reader_refcnt);
>>> +}
>>> +EXPORT_SYMBOL(lg_rwlock_local_read_lock);
>>> +
>>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
>>> +{
>>> +	switch (__this_cpu_read(*lgrw->reader_refcnt)) {
>>> +	case 1:
>>> +		__this_cpu_write(*lgrw->reader_refcnt, 0);
>>> +		lg_local_unlock(&lgrw->lglock);
>>> +		return;
>>
>> This should be a break, instead of a return, right?
>> Otherwise, there will be a preempt imbalance...
> 
> 
> "lockdep" and "preempt" are handled in lg_local_unlock(&lgrw->lglock);
> 

Ah, ok.. I had missed that.

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread
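
For context on the exchange above: in the kernel/lglock.c of that time, lg_local_unlock() looks roughly like the following (shown here only to illustrate why the "case 1:" return needs no separate rwlock_release() or preempt_enable() of its own):

void lg_local_unlock(struct lglock *lg)
{
        arch_spinlock_t *lock;

        /* lockdep release for the per-CPU lock's dep_map */
        lock_release(&lg->lock_dep_map, 1, _RET_IP_);
        lock = this_cpu_ptr(lg->lock);
        arch_spin_unlock(lock);
        preempt_enable();
}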

* Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
  2013-03-05 16:25                                       ` Lai Jiangshan
  (?)
  (?)
@ 2013-03-05 18:27                                         ` Srivatsa S. Bhat
  -1 siblings, 0 replies; 360+ messages in thread
From: Srivatsa S. Bhat @ 2013-03-05 18:27 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Lai Jiangshan, Michel Lespinasse, linux-doc, peterz, fweisbec,
	linux-kernel, namhyung, mingo, linux-arch, linux, xiaoguangrong,
	wangyun, paulmck, nikunj, linux-pm, rusty, rostedt, rjw,
	vincent.guittot, tglx, linux-arm-kernel, netdev, oleg, sbw, tj,
	akpm, linuxppc-dev

Hi Lai,

On 03/05/2013 09:55 PM, Lai Jiangshan wrote:
> Hi, Srivatsa,
> 
> I'm sorry again, I have delayed your work.
>

No, you didn't :-) I have been busy with some internal work lately,
so I haven't been able to go through the recent discussions and
review the new code carefully. I'll get to it as soon as I can.
 
> I have some thoughts about how to get this work done.
> 
> First step: (2~3 patches)
> Use preempt_disable() to implement get_online_cpu_atomic(), and add lockdep for it.
> 
> Second step:
> Conversion patches.
> 
> We can send the patchset for the above steps first.
> {
> It does not change any behavior of the kernel,
> and it is annotation (rather than a bare preempt_disable() that is sometimes left without a comment),
> so I expect it can be merged very early.
> }
> 
> Third step:
> After everyone is confident that the conversion patches cover all cases and the CPU-hotplug side is ready for it,
> we will implement get_online_cpu_atomic() via locks and remove stop_machine() from CPU hotplug.
> 
> Any thought?
> 

That sounds like a good plan. It might involve slightly more churn
than just directly changing the locking scheme, but it is safer.
And the extra churn is anyway limited only to the implementation of
get/put_online_cpus_atomic().. so that should be fine IMHO.

> 
> If I have time, I will help you with the patches for the first step.
> (I have been assigned other work during office hours, so I can only do kernel development at night.)
> 
> And for step 2, I will write a checklist or a Coccinelle (spatch) script.
> 

Do look at the conversion already done in this v6 as well. In
addition to that, we will have to account for the new kernel
code that went in recently.

I'll get back to working on the above mentioned aspects soon.

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 360+ messages in thread
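
As a reference point for the plan quoted above, the "first step" could be as small as the sketch below: the new primitives start out as annotated wrappers around preempt_disable(), so the call-site conversions carry no behavioural change and the real locking can be swapped in underneath later. The names follow this series; the lockdep plumbing is only indicated, not implemented.

#include <linux/preempt.h>

static inline void get_online_cpus_atomic(void)
{
        /*
         * Step 1: no behavioural change.  A lockdep acquire on a
         * dedicated dep_map would go here, so that nesting and
         * ordering against the eventual hotplug writer can be checked
         * from the very first conversion patch.
         */
        preempt_disable();
}

static inline void put_online_cpus_atomic(void)
{
        /* The matching lockdep release would go here. */
        preempt_enable();
}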

end of thread, other threads:[~2013-03-05 18:30 UTC | newest]

Thread overview: 360+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-18 12:38 [PATCH v6 00/46] CPU hotplug: stop_machine()-free CPU hotplug Srivatsa S. Bhat
2013-02-18 12:38 ` Srivatsa S. Bhat
2013-02-18 12:38 ` Srivatsa S. Bhat
2013-02-18 12:38 ` [PATCH v6 01/46] percpu_rwlock: Introduce the global reader-writer lock backend Srivatsa S. Bhat
2013-02-18 12:38   ` Srivatsa S. Bhat
2013-02-18 12:38   ` Srivatsa S. Bhat
2013-02-18 12:38 ` [PATCH v6 02/46] percpu_rwlock: Introduce per-CPU variables for the reader and the writer Srivatsa S. Bhat
2013-02-18 12:38   ` Srivatsa S. Bhat
2013-02-18 12:38   ` Srivatsa S. Bhat
2013-02-18 12:38 ` [PATCH v6 03/46] percpu_rwlock: Provide a way to define and init percpu-rwlocks at compile time Srivatsa S. Bhat
2013-02-18 12:38   ` Srivatsa S. Bhat
2013-02-18 12:38   ` Srivatsa S. Bhat
2013-02-18 12:38 ` [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks Srivatsa S. Bhat
2013-02-18 12:38   ` Srivatsa S. Bhat
2013-02-18 12:38   ` Srivatsa S. Bhat
2013-02-18 15:45   ` Michel Lespinasse
2013-02-18 15:45     ` Michel Lespinasse
2013-02-18 15:45     ` Michel Lespinasse
2013-02-18 16:21     ` Srivatsa S. Bhat
2013-02-18 16:21       ` Srivatsa S. Bhat
2013-02-18 16:21       ` Srivatsa S. Bhat
2013-02-18 16:31       ` Steven Rostedt
2013-02-18 16:31         ` Steven Rostedt
2013-02-18 16:31         ` Steven Rostedt
2013-02-18 16:46         ` Srivatsa S. Bhat
2013-02-18 16:46           ` Srivatsa S. Bhat
2013-02-18 16:46           ` Srivatsa S. Bhat
2013-02-18 17:56       ` Srivatsa S. Bhat
2013-02-18 17:56         ` Srivatsa S. Bhat
2013-02-18 17:56         ` Srivatsa S. Bhat
2013-02-18 18:07         ` Michel Lespinasse
2013-02-18 18:07           ` Michel Lespinasse
2013-02-18 18:07           ` Michel Lespinasse
2013-02-18 18:14           ` Srivatsa S. Bhat
2013-02-18 18:14             ` Srivatsa S. Bhat
2013-02-18 18:14             ` Srivatsa S. Bhat
2013-02-25 15:53             ` Lai Jiangshan
2013-02-25 15:53               ` Lai Jiangshan
2013-02-25 15:53               ` Lai Jiangshan
2013-02-25 15:53               ` Lai Jiangshan
2013-02-25 19:26               ` Srivatsa S. Bhat
2013-02-25 19:26                 ` Srivatsa S. Bhat
2013-02-25 19:26                 ` Srivatsa S. Bhat
2013-02-26  0:17                 ` Lai Jiangshan
2013-02-26  0:17                   ` Lai Jiangshan
2013-02-26  0:17                   ` Lai Jiangshan
2013-02-26  0:19                   ` Lai Jiangshan
2013-02-26  0:19                     ` Lai Jiangshan
2013-02-26  0:19                     ` Lai Jiangshan
2013-02-26  9:02                   ` Srivatsa S. Bhat
2013-02-26  9:02                     ` Srivatsa S. Bhat
2013-02-26  9:02                     ` Srivatsa S. Bhat
2013-02-26 12:59                     ` Lai Jiangshan
2013-02-26 12:59                       ` Lai Jiangshan
2013-02-26 12:59                       ` Lai Jiangshan
2013-02-26 14:22                       ` Srivatsa S. Bhat
2013-02-26 14:22                         ` Srivatsa S. Bhat
2013-02-26 14:22                         ` Srivatsa S. Bhat
2013-02-26 16:25                         ` Lai Jiangshan
2013-02-26 16:25                           ` Lai Jiangshan
2013-02-26 16:25                           ` Lai Jiangshan
2013-02-26 19:30                           ` Srivatsa S. Bhat
2013-02-26 19:30                             ` Srivatsa S. Bhat
2013-02-26 19:30                             ` Srivatsa S. Bhat
2013-02-27  0:33                             ` Lai Jiangshan
2013-02-27  0:33                               ` Lai Jiangshan
2013-02-27  0:33                               ` Lai Jiangshan
2013-02-27 21:19                               ` Srivatsa S. Bhat
2013-02-27 21:19                                 ` Srivatsa S. Bhat
2013-02-27 21:19                                 ` Srivatsa S. Bhat
2013-02-27 21:19                                 ` Srivatsa S. Bhat
2013-03-01 17:44                                 ` [PATCH] lglock: add read-preference local-global rwlock Lai Jiangshan
2013-03-01 17:44                                   ` Lai Jiangshan
2013-03-01 17:44                                   ` Lai Jiangshan
2013-03-01 17:44                                   ` Lai Jiangshan
2013-03-01 17:44                                   ` Lai Jiangshan
2013-03-01 17:44                                   ` Lai Jiangshan
2013-03-01 17:53                                   ` Tejun Heo
2013-03-01 17:53                                     ` Tejun Heo
2013-03-01 17:53                                     ` Tejun Heo
2013-03-01 20:06                                     ` Srivatsa S. Bhat
2013-03-01 20:06                                       ` Srivatsa S. Bhat
2013-03-01 20:06                                       ` Srivatsa S. Bhat
2013-03-01 18:28                                   ` Oleg Nesterov
2013-03-01 18:28                                     ` Oleg Nesterov
2013-03-01 18:28                                     ` Oleg Nesterov
2013-03-02 12:13                                     ` Michel Lespinasse
2013-03-02 12:13                                       ` Michel Lespinasse
2013-03-02 12:13                                       ` Michel Lespinasse
2013-03-02 12:13                                       ` Michel Lespinasse
2013-03-02 13:14                                       ` [PATCH V2] " Lai Jiangshan
2013-03-02 13:14                                         ` Lai Jiangshan
2013-03-02 13:14                                         ` Lai Jiangshan
2013-03-02 13:14                                         ` Lai Jiangshan
2013-03-02 13:14                                         ` Lai Jiangshan
2013-03-02 17:11                                         ` Srivatsa S. Bhat
2013-03-02 17:11                                           ` Srivatsa S. Bhat
2013-03-02 17:11                                           ` Srivatsa S. Bhat
2013-03-05 15:41                                           ` Lai Jiangshan
2013-03-05 15:41                                             ` Lai Jiangshan
2013-03-05 15:41                                             ` Lai Jiangshan
2013-03-05 17:55                                             ` Srivatsa S. Bhat
2013-03-05 17:55                                               ` Srivatsa S. Bhat
2013-03-05 17:55                                               ` Srivatsa S. Bhat
2013-03-02 17:20                                         ` Oleg Nesterov
2013-03-02 17:20                                           ` Oleg Nesterov
2013-03-02 17:20                                           ` Oleg Nesterov
2013-03-03 17:40                                           ` Oleg Nesterov
2013-03-03 17:40                                             ` Oleg Nesterov
2013-03-03 17:40                                             ` Oleg Nesterov
2013-03-05  1:37                                             ` Michel Lespinasse
2013-03-05  1:37                                               ` Michel Lespinasse
2013-03-05  1:37                                               ` Michel Lespinasse
2013-03-05 15:27                                           ` Lai Jiangshan
2013-03-05 15:27                                             ` Lai Jiangshan
2013-03-05 15:27                                             ` Lai Jiangshan
2013-03-05 16:19                                             ` Michel Lespinasse
2013-03-05 16:19                                               ` Michel Lespinasse
2013-03-05 16:19                                               ` Michel Lespinasse
2013-03-05 16:41                                             ` Oleg Nesterov
2013-03-05 16:41                                               ` Oleg Nesterov
2013-03-05 16:41                                               ` Oleg Nesterov
2013-03-02 17:06                                       ` [PATCH] " Oleg Nesterov
2013-03-02 17:06                                         ` Oleg Nesterov
2013-03-02 17:06                                         ` Oleg Nesterov
2013-03-05 15:54                                         ` Lai Jiangshan
2013-03-05 15:54                                           ` Lai Jiangshan
2013-03-05 15:54                                           ` Lai Jiangshan
2013-03-05 16:32                                           ` Michel Lespinasse
2013-03-05 16:32                                             ` Michel Lespinasse
2013-03-05 16:32                                             ` Michel Lespinasse
2013-03-05 16:35                                           ` Oleg Nesterov
2013-03-05 16:35                                             ` Oleg Nesterov
2013-03-05 16:35                                             ` Oleg Nesterov
2013-03-02 13:42                                     ` Lai Jiangshan
2013-03-02 13:42                                       ` Lai Jiangshan
2013-03-02 13:42                                       ` Lai Jiangshan
2013-03-02 17:01                                       ` Oleg Nesterov
2013-03-02 17:01                                         ` Oleg Nesterov
2013-03-02 17:01                                         ` Oleg Nesterov
2013-03-01 17:50                                 ` [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks Lai Jiangshan
2013-03-01 17:50                                   ` Lai Jiangshan
2013-03-01 17:50                                   ` Lai Jiangshan
2013-03-01 17:50                                   ` Lai Jiangshan
2013-03-01 19:47                                   ` Srivatsa S. Bhat
2013-03-01 19:47                                     ` Srivatsa S. Bhat
2013-03-01 19:47                                     ` Srivatsa S. Bhat
2013-03-01 19:47                                     ` Srivatsa S. Bhat
2013-03-05 16:25                                     ` Lai Jiangshan
2013-03-05 16:25                                       ` Lai Jiangshan
2013-03-05 16:25                                       ` Lai Jiangshan
2013-03-05 16:25                                       ` Lai Jiangshan
2013-03-05 18:27                                       ` Srivatsa S. Bhat
2013-03-05 18:27                                         ` Srivatsa S. Bhat
2013-03-05 18:27                                         ` Srivatsa S. Bhat
2013-03-05 18:27                                         ` Srivatsa S. Bhat
2013-03-01 18:10                                 ` Tejun Heo
2013-03-01 18:10                                   ` Tejun Heo
2013-03-01 18:10                                   ` Tejun Heo
2013-03-01 19:59                                   ` Srivatsa S. Bhat
2013-03-01 19:59                                     ` Srivatsa S. Bhat
2013-03-01 19:59                                     ` Srivatsa S. Bhat
2013-02-27 11:11                             ` Michel Lespinasse
2013-02-27 11:11                               ` Michel Lespinasse
2013-02-27 11:11                               ` Michel Lespinasse
2013-02-27 19:25                               ` Oleg Nesterov
2013-02-27 19:25                                 ` Oleg Nesterov
2013-02-27 19:25                                 ` Oleg Nesterov
2013-02-28 11:34                                 ` Michel Lespinasse
2013-02-28 11:34                                   ` Michel Lespinasse
2013-02-28 11:34                                   ` Michel Lespinasse
2013-02-28 18:00                                   ` Oleg Nesterov
2013-02-28 18:00                                     ` Oleg Nesterov
2013-02-28 18:00                                     ` Oleg Nesterov
2013-02-28 18:20                                     ` Oleg Nesterov
2013-02-28 18:20                                       ` Oleg Nesterov
2013-02-28 18:20                                       ` Oleg Nesterov
2013-02-26 13:34                 ` Lai Jiangshan
2013-02-26 13:34                   ` Lai Jiangshan
2013-02-26 13:34                   ` Lai Jiangshan
2013-02-26 15:17                   ` Srivatsa S. Bhat
2013-02-26 15:17                     ` Srivatsa S. Bhat
2013-02-26 14:17   ` Lai Jiangshan
2013-02-26 14:17     ` Lai Jiangshan
2013-02-26 14:17     ` Lai Jiangshan
2013-02-26 14:37     ` Srivatsa S. Bhat
2013-02-26 14:37       ` Srivatsa S. Bhat
2013-02-26 14:37       ` Srivatsa S. Bhat
2013-02-18 12:39 ` [PATCH v6 05/46] percpu_rwlock: Make percpu-rwlocks IRQ-safe, optimally Srivatsa S. Bhat
2013-02-18 12:39   ` Srivatsa S. Bhat
2013-02-18 12:39   ` Srivatsa S. Bhat
2013-02-18 12:39 ` [PATCH v6 06/46] percpu_rwlock: Rearrange the read-lock code to fastpath nested percpu readers Srivatsa S. Bhat
2013-02-18 12:39   ` Srivatsa S. Bhat
2013-02-18 12:39   ` Srivatsa S. Bhat
2013-02-18 12:39 ` [PATCH v6 07/46] percpu_rwlock: Allow writers to be readers, and add lockdep annotations Srivatsa S. Bhat
2013-02-18 12:39   ` Srivatsa S. Bhat
2013-02-18 12:39   ` Srivatsa S. Bhat
2013-02-18 15:51   ` Michel Lespinasse
2013-02-18 15:51     ` Michel Lespinasse
2013-02-18 15:51     ` Michel Lespinasse
2013-02-18 16:31     ` Srivatsa S. Bhat
2013-02-18 16:31       ` Srivatsa S. Bhat
2013-02-18 16:31       ` Srivatsa S. Bhat
2013-02-18 12:39 ` [PATCH v6 08/46] CPU hotplug: Provide APIs to prevent CPU offline from atomic context Srivatsa S. Bhat
2013-02-18 12:39   ` Srivatsa S. Bhat
2013-02-18 12:39   ` Srivatsa S. Bhat
2013-02-18 16:23   ` Michel Lespinasse
2013-02-18 16:23     ` Michel Lespinasse
2013-02-18 16:23     ` Michel Lespinasse
2013-02-18 16:43     ` Srivatsa S. Bhat
2013-02-18 16:43       ` Srivatsa S. Bhat
2013-02-18 16:43       ` Srivatsa S. Bhat
2013-02-18 17:21       ` Michel Lespinasse
2013-02-18 17:21         ` Michel Lespinasse
2013-02-18 17:21         ` Michel Lespinasse
2013-02-18 18:50         ` Srivatsa S. Bhat
2013-02-18 18:50           ` Srivatsa S. Bhat
2013-02-18 18:50           ` Srivatsa S. Bhat
2013-02-19  9:40           ` Michel Lespinasse
2013-02-19  9:40             ` Michel Lespinasse
2013-02-19  9:40             ` Michel Lespinasse
2013-02-19  9:55             ` Srivatsa S. Bhat
2013-02-19  9:55               ` Srivatsa S. Bhat
2013-02-19  9:55               ` Srivatsa S. Bhat
2013-02-19 10:42               ` David Laight
2013-02-19 10:42                 ` David Laight
2013-02-19 10:42                 ` David Laight
2013-02-19 10:42                 ` David Laight
2013-02-19 10:58                 ` Srivatsa S. Bhat
2013-02-19 10:58                   ` Srivatsa S. Bhat
2013-02-19 10:58                   ` Srivatsa S. Bhat
2013-02-18 12:39 ` [PATCH v6 09/46] CPU hotplug: Convert preprocessor macros to static inline functions Srivatsa S. Bhat
2013-02-18 12:39   ` Srivatsa S. Bhat
2013-02-18 12:39   ` Srivatsa S. Bhat
2013-02-18 12:39 ` [PATCH v6 10/46] smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly Srivatsa S. Bhat
2013-02-18 12:39   ` Srivatsa S. Bhat
2013-02-18 12:39   ` Srivatsa S. Bhat
2013-02-18 12:39 ` [PATCH v6 11/46] smp, cpu hotplug: Fix on_each_cpu_*() " Srivatsa S. Bhat
2013-02-18 12:39   ` Srivatsa S. Bhat
2013-02-18 12:39   ` Srivatsa S. Bhat
2013-02-18 12:40 ` [PATCH v6 12/46] sched/timer: Use get/put_online_cpus_atomic() to prevent CPU offline Srivatsa S. Bhat
2013-02-18 12:40   ` Srivatsa S. Bhat
2013-02-18 12:40   ` Srivatsa S. Bhat
2013-02-18 12:40 ` [PATCH v6 13/46] sched/migration: Use raw_spin_lock/unlock since interrupts are already disabled Srivatsa S. Bhat
2013-02-18 12:40   ` Srivatsa S. Bhat
2013-02-18 12:40   ` Srivatsa S. Bhat
2013-02-18 12:40 ` [PATCH v6 14/46] sched/rt: Use get/put_online_cpus_atomic() to prevent CPU offline Srivatsa S. Bhat
2013-02-18 12:40   ` Srivatsa S. Bhat
2013-02-18 12:40   ` Srivatsa S. Bhat
2013-02-18 12:40 ` [PATCH v6 15/46] tick: " Srivatsa S. Bhat
2013-02-18 12:40   ` Srivatsa S. Bhat
2013-02-18 12:40   ` Srivatsa S. Bhat
2013-02-18 12:40 ` [PATCH v6 16/46] time/clocksource: " Srivatsa S. Bhat
2013-02-18 12:40   ` Srivatsa S. Bhat
2013-02-18 12:40   ` Srivatsa S. Bhat
2013-02-18 12:40 ` [PATCH v6 17/46] clockevents: Use get/put_online_cpus_atomic() in clockevents_notify() Srivatsa S. Bhat
2013-02-18 12:40   ` Srivatsa S. Bhat
2013-02-18 12:40   ` Srivatsa S. Bhat
2013-02-18 12:40 ` [PATCH v6 18/46] softirq: Use get/put_online_cpus_atomic() to prevent CPU offline Srivatsa S. Bhat
2013-02-18 12:40   ` Srivatsa S. Bhat
2013-02-18 12:40   ` Srivatsa S. Bhat
2013-02-18 12:40 ` [PATCH v6 19/46] irq: " Srivatsa S. Bhat
2013-02-18 12:40   ` Srivatsa S. Bhat
2013-02-18 12:40   ` Srivatsa S. Bhat
2013-02-18 12:41 ` [PATCH v6 20/46] net: " Srivatsa S. Bhat
2013-02-18 12:41   ` Srivatsa S. Bhat
2013-02-18 12:41   ` Srivatsa S. Bhat
2013-02-18 12:41 ` [PATCH v6 21/46] block: " Srivatsa S. Bhat
2013-02-18 12:41   ` Srivatsa S. Bhat
2013-02-18 12:41   ` Srivatsa S. Bhat
2013-02-18 12:41 ` [PATCH v6 22/46] crypto: pcrypt - Protect access to cpu_online_mask with get/put_online_cpus() Srivatsa S. Bhat
2013-02-18 12:41   ` Srivatsa S. Bhat
2013-02-18 12:41   ` Srivatsa S. Bhat
2013-02-18 12:41 ` [PATCH v6 23/46] infiniband: ehca: Use get/put_online_cpus_atomic() to prevent CPU offline Srivatsa S. Bhat
2013-02-18 12:41   ` Srivatsa S. Bhat
2013-02-18 12:41   ` Srivatsa S. Bhat
2013-02-18 12:41 ` [PATCH v6 24/46] [SCSI] fcoe: " Srivatsa S. Bhat
2013-02-18 12:41   ` Srivatsa S. Bhat
2013-02-18 12:41   ` Srivatsa S. Bhat
2013-02-18 12:41 ` [PATCH v6 25/46] staging: octeon: " Srivatsa S. Bhat
2013-02-18 12:41   ` Srivatsa S. Bhat
2013-02-18 12:41   ` Srivatsa S. Bhat
2013-02-18 12:41 ` [PATCH v6 26/46] x86: " Srivatsa S. Bhat
2013-02-18 12:41   ` Srivatsa S. Bhat
2013-02-18 12:41   ` Srivatsa S. Bhat
2013-02-18 12:42 ` [PATCH v6 27/46] perf/x86: " Srivatsa S. Bhat
2013-02-18 12:42   ` Srivatsa S. Bhat
2013-02-18 12:42   ` Srivatsa S. Bhat
2013-02-18 12:42 ` [PATCH v6 28/46] KVM: Use get/put_online_cpus_atomic() to prevent CPU offline from atomic context Srivatsa S. Bhat
2013-02-18 12:42   ` Srivatsa S. Bhat
2013-02-18 12:42   ` Srivatsa S. Bhat
2013-02-18 12:42 ` [PATCH v6 29/46] kvm/vmx: Use get/put_online_cpus_atomic() to prevent CPU offline Srivatsa S. Bhat
2013-02-18 12:42   ` Srivatsa S. Bhat
2013-02-18 12:42   ` Srivatsa S. Bhat
2013-02-18 12:42 ` [PATCH v6 30/46] x86/xen: " Srivatsa S. Bhat
2013-02-18 12:42   ` Srivatsa S. Bhat
2013-02-18 12:42   ` Srivatsa S. Bhat
2013-02-18 12:42 ` [PATCH v6 31/46] alpha/smp: " Srivatsa S. Bhat
2013-02-18 12:42   ` Srivatsa S. Bhat
2013-02-18 12:42   ` Srivatsa S. Bhat
2013-02-18 12:42   ` Srivatsa S. Bhat
2013-02-18 12:42 ` [PATCH v6 32/46] blackfin/smp: " Srivatsa S. Bhat
2013-02-18 12:42   ` Srivatsa S. Bhat
2013-02-18 12:42   ` Srivatsa S. Bhat
2013-02-18 12:42 ` [PATCH v6 33/46] cris/smp: " Srivatsa S. Bhat
2013-02-18 12:42   ` Srivatsa S. Bhat
2013-02-18 12:42   ` Srivatsa S. Bhat
2013-02-18 13:07   ` Jesper Nilsson
2013-02-18 13:07     ` Jesper Nilsson
2013-02-18 13:07     ` Jesper Nilsson
2013-02-18 12:43 ` [PATCH v6 34/46] hexagon/smp: " Srivatsa S. Bhat
2013-02-18 12:43   ` Srivatsa S. Bhat
2013-02-18 12:43   ` Srivatsa S. Bhat
2013-02-18 12:43 ` [PATCH v6 35/46] ia64: " Srivatsa S. Bhat
2013-02-18 12:43   ` Srivatsa S. Bhat
2013-02-18 12:43   ` Srivatsa S. Bhat
2013-02-18 12:43 ` [PATCH v6 36/46] m32r: " Srivatsa S. Bhat
2013-02-18 12:43   ` Srivatsa S. Bhat
2013-02-18 12:43   ` Srivatsa S. Bhat
2013-02-18 12:43 ` [PATCH v6 37/46] MIPS: " Srivatsa S. Bhat
2013-02-18 12:43   ` Srivatsa S. Bhat
2013-02-18 12:43   ` Srivatsa S. Bhat
2013-02-18 12:43 ` [PATCH v6 38/46] mn10300: " Srivatsa S. Bhat
2013-02-18 12:43   ` Srivatsa S. Bhat
2013-02-18 12:43   ` Srivatsa S. Bhat
2013-02-18 12:43 ` [PATCH v6 39/46] parisc: " Srivatsa S. Bhat
2013-02-18 12:43   ` Srivatsa S. Bhat
2013-02-18 12:43   ` Srivatsa S. Bhat
2013-02-18 12:43 ` [PATCH v6 40/46] powerpc: " Srivatsa S. Bhat
2013-02-18 12:43   ` Srivatsa S. Bhat
2013-02-18 12:43   ` Srivatsa S. Bhat
2013-02-18 12:44 ` [PATCH v6 41/46] sh: " Srivatsa S. Bhat
2013-02-18 12:44   ` Srivatsa S. Bhat
2013-02-18 12:44   ` Srivatsa S. Bhat
2013-02-18 12:44 ` [PATCH v6 42/46] sparc: " Srivatsa S. Bhat
2013-02-18 12:44   ` Srivatsa S. Bhat
2013-02-18 12:44   ` Srivatsa S. Bhat
2013-02-18 12:44 ` [PATCH v6 43/46] tile: " Srivatsa S. Bhat
2013-02-18 12:44   ` Srivatsa S. Bhat
2013-02-18 12:44   ` Srivatsa S. Bhat
2013-02-18 12:44 ` [PATCH v6 44/46] cpu: No more __stop_machine() in _cpu_down() Srivatsa S. Bhat
2013-02-18 12:44   ` Srivatsa S. Bhat
2013-02-18 12:44   ` Srivatsa S. Bhat
2013-02-18 12:44 ` [PATCH v6 45/46] CPU hotplug, stop_machine: Decouple CPU hotplug from stop_machine() in Kconfig Srivatsa S. Bhat
2013-02-18 12:44   ` Srivatsa S. Bhat
2013-02-18 12:44   ` Srivatsa S. Bhat
2013-02-18 12:44 ` [PATCH v6 46/46] Documentation/cpu-hotplug: Remove references to stop_machine() Srivatsa S. Bhat
2013-02-18 12:44   ` Srivatsa S. Bhat
2013-02-18 12:44   ` Srivatsa S. Bhat
2013-02-22  0:31 ` [PATCH v6 00/46] CPU hotplug: stop_machine()-free CPU hotplug Rusty Russell
2013-02-22  0:31   ` Rusty Russell
2013-02-22  0:31   ` Rusty Russell
2013-02-22  0:31   ` Rusty Russell
2013-02-22  0:31   ` Rusty Russell
2013-02-25 21:45   ` Srivatsa S. Bhat
2013-02-25 21:45     ` Srivatsa S. Bhat
2013-02-25 21:45     ` Srivatsa S. Bhat
2013-03-01 12:05 ` Vincent Guittot
2013-03-01 12:05   ` Vincent Guittot
2013-03-01 12:05   ` Vincent Guittot
