* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-21 10:17 ` rt14: strace -> migrate_disable_atomic imbalance Mike Galbraith
@ 2011-09-21 17:01 ` Peter Zijlstra
2011-09-21 18:50 ` Peter Zijlstra
` (4 subsequent siblings)
5 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2011-09-21 17:01 UTC (permalink / raw)
To: Mike Galbraith
Cc: linux-rt-users, Thomas Gleixner, LKML, Oleg Nesterov, Miklos Szeredi
On Wed, 2011-09-21 at 12:17 +0200, Mike Galbraith wrote:
> [ 144.212272] ------------[ cut here ]------------
> [ 144.212280] WARNING: at kernel/sched.c:6152 migrate_disable+0x1b6/0x200()
> [ 144.212282] Hardware name: MS-7502
> [ 144.212283] Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device edd nfsd lockd parport_pc parport nfs_acl auth_rpcgss sunrpc bridge ipv6 stp cpufreq_conservative microcode cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf nls_iso8859_1 nls_cp437 vfat fat fuse ext3 jbd dm_mod usbmouse usb_storage usbhid snd_hda_codec_realtek usb_libusual uas sr_mod cdrom hid snd_hda_intel e1000e snd_hda_codec kvm_intel snd_hwdep sg snd_pcm kvm i2c_i801 snd_timer snd firewire_ohci firewire_core soundcore snd_page_alloc crc_itu_t button ext4 mbcache jbd2 crc16 uhci_hcd sd_mod ehci_hcd usbcore rtc_cmos ahci libahci libata scsi_mod fan processor thermal
> [ 144.212317] Pid: 6215, comm: strace Not tainted 3.0.4-rt14 #2052
> [ 144.212319] Call Trace:
> [ 144.212323] [<ffffffff8104662f>] warn_slowpath_common+0x7f/0xc0
> [ 144.212326] [<ffffffff8104668a>] warn_slowpath_null+0x1a/0x20
> [ 144.212328] [<ffffffff8103f606>] migrate_disable+0x1b6/0x200
> [ 144.212331] [<ffffffff8105a2a8>] ptrace_stop+0x128/0x240
> [ 144.212334] [<ffffffff81057b9b>] ? recalc_sigpending+0x1b/0x50
> [ 144.212337] [<ffffffff8105b6f1>] get_signal_to_deliver+0x211/0x530
> [ 144.212340] [<ffffffff81001835>] do_signal+0x75/0x7a0
> [ 144.212342] [<ffffffff8105ae68>] ? kill_pid_info+0x58/0x80
> [ 144.212344] [<ffffffff8105c34c>] ? sys_kill+0xac/0x1e0
> [ 144.212347] [<ffffffff81001fe5>] do_notify_resume+0x65/0x80
> [ 144.212350] [<ffffffff8135978b>] int_signal+0x12/0x17
> [ 144.212352] ---[ end trace 0000000000000002 ]---
Right, that's because of
53da1d9456fe7f87a920a78fdbdcf1225d197cb7; I think we simply want a full
revert of that for -rt.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-21 10:17 ` rt14: strace -> migrate_disable_atomic imbalance Mike Galbraith
@ 2011-09-21 18:50 ` Peter Zijlstra
2011-09-21 18:50 ` Peter Zijlstra
` (4 subsequent siblings)
5 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2011-09-21 18:50 UTC (permalink / raw)
To: Mike Galbraith
Cc: linux-rt-users, Thomas Gleixner, LKML, Oleg Nesterov,
Miklos Szeredi, mingo
On Wed, 2011-09-21 at 19:01 +0200, Peter Zijlstra wrote:
> On Wed, 2011-09-21 at 12:17 +0200, Mike Galbraith wrote:
> > [ 144.212272] ------------[ cut here ]------------
> > [ 144.212280] WARNING: at kernel/sched.c:6152 migrate_disable+0x1b6/0x200()
> > [ 144.212282] Hardware name: MS-7502
> > [ 144.212283] Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device edd nfsd lockd parport_pc parport nfs_acl auth_rpcgss sunrpc bridge ipv6 stp cpufreq_conservative microcode cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf nls_iso8859_1 nls_cp437 vfat fat fuse ext3 jbd dm_mod usbmouse usb_storage usbhid snd_hda_codec_realtek usb_libusual uas sr_mod cdrom hid snd_hda_intel e1000e snd_hda_codec kvm_intel snd_hwdep sg snd_pcm kvm i2c_i801 snd_timer snd firewire_ohci firewire_core soundcore snd_page_alloc crc_itu_t button ext4 mbcache jbd2 crc16 uhci_hcd sd_mod ehci_hcd usbcore rtc_cmos ahci libahci libata scsi_mod fan processor thermal
> > [ 144.212317] Pid: 6215, comm: strace Not tainted 3.0.4-rt14 #2052
> > [ 144.212319] Call Trace:
> > [ 144.212323] [<ffffffff8104662f>] warn_slowpath_common+0x7f/0xc0
> > [ 144.212326] [<ffffffff8104668a>] warn_slowpath_null+0x1a/0x20
> > [ 144.212328] [<ffffffff8103f606>] migrate_disable+0x1b6/0x200
> > [ 144.212331] [<ffffffff8105a2a8>] ptrace_stop+0x128/0x240
> > [ 144.212334] [<ffffffff81057b9b>] ? recalc_sigpending+0x1b/0x50
> > [ 144.212337] [<ffffffff8105b6f1>] get_signal_to_deliver+0x211/0x530
> > [ 144.212340] [<ffffffff81001835>] do_signal+0x75/0x7a0
> > [ 144.212342] [<ffffffff8105ae68>] ? kill_pid_info+0x58/0x80
> > [ 144.212344] [<ffffffff8105c34c>] ? sys_kill+0xac/0x1e0
> > [ 144.212347] [<ffffffff81001fe5>] do_notify_resume+0x65/0x80
> > [ 144.212350] [<ffffffff8135978b>] int_signal+0x12/0x17
> > [ 144.212352] ---[ end trace 0000000000000002 ]---
>
>
> Right, that's because of
> 53da1d9456fe7f87a920a78fdbdcf1225d197cb7, I think we simply want a full
> revert of that for -rt.
This also made me stare at the trainwreck called wait_task_inactive();
how about something like the below? It survives a boot and a simple
strace.
I'm not particularly keen on always enabling preempt notifiers, but
seeing that pretty much world+dog already has them enabled...
Also, less LOC is always better, right? ;-)
---
arch/ia64/kvm/Kconfig | 1 -
arch/powerpc/kvm/Kconfig | 1 -
arch/s390/kvm/Kconfig | 1 -
arch/tile/kvm/Kconfig | 1 -
arch/x86/kvm/Kconfig | 1 -
include/linux/kvm_host.h | 2 -
include/linux/preempt.h | 4 -
include/linux/sched.h | 2 -
init/Kconfig | 3 -
kernel/sched.c | 163 ++++++++++++++++++----------------------------
10 files changed, 64 insertions(+), 115 deletions(-)
diff --git a/arch/ia64/kvm/Kconfig b/arch/ia64/kvm/Kconfig
index 9806e55..02b36ca 100644
--- a/arch/ia64/kvm/Kconfig
+++ b/arch/ia64/kvm/Kconfig
@@ -22,7 +22,6 @@ config KVM
depends on HAVE_KVM && MODULES && EXPERIMENTAL
# for device assignment:
depends on PCI
- select PREEMPT_NOTIFIERS
select ANON_INODES
select HAVE_KVM_IRQCHIP
select KVM_APIC_ARCHITECTURE
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 78133de..0bcd5a8 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -18,7 +18,6 @@ if VIRTUALIZATION
config KVM
bool
- select PREEMPT_NOTIFIERS
select ANON_INODES
config KVM_BOOK3S_HANDLER
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index a216341..7ff8d54 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -19,7 +19,6 @@ config KVM
def_tristate y
prompt "Kernel-based Virtual Machine (KVM) support"
depends on HAVE_KVM && EXPERIMENTAL
- select PREEMPT_NOTIFIERS
select ANON_INODES
---help---
Support hosting paravirtualized guest machines using the SIE
diff --git a/arch/tile/kvm/Kconfig b/arch/tile/kvm/Kconfig
index 669fcdb..6a936d1 100644
--- a/arch/tile/kvm/Kconfig
+++ b/arch/tile/kvm/Kconfig
@@ -19,7 +19,6 @@ if VIRTUALIZATION
config KVM
tristate "Kernel-based Virtual Machine (KVM) support"
depends on HAVE_KVM && MODULES && EXPERIMENTAL
- select PREEMPT_NOTIFIERS
select ANON_INODES
---help---
Support hosting paravirtualized guest machines.
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index ff5790d..d82150a 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -24,7 +24,6 @@ config KVM
depends on PCI
# for TASKSTATS/TASK_DELAY_ACCT:
depends on NET
- select PREEMPT_NOTIFIERS
select MMU_NOTIFIER
select ANON_INODES
select HAVE_KVM_IRQCHIP
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index eabb21a..a9343b8 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -111,9 +111,7 @@ enum {
struct kvm_vcpu {
struct kvm *kvm;
-#ifdef CONFIG_PREEMPT_NOTIFIERS
struct preempt_notifier preempt_notifier;
-#endif
int cpu;
int vcpu_id;
int srcu_idx;
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 58969b2..7ca8968 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -101,8 +101,6 @@ do { \
#endif /* CONFIG_PREEMPT_COUNT */
-#ifdef CONFIG_PREEMPT_NOTIFIERS
-
struct preempt_notifier;
/**
@@ -147,6 +145,4 @@ static inline void preempt_notifier_init(struct preempt_notifier *notifier,
notifier->ops = ops;
}
-#endif
-
#endif /* __LINUX_PREEMPT_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e54c890..64fc7c7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1236,10 +1236,8 @@ struct task_struct {
struct sched_entity se;
struct sched_rt_entity rt;
-#ifdef CONFIG_PREEMPT_NOTIFIERS
/* list of struct preempt_notifier: */
struct hlist_head preempt_notifiers;
-#endif
/*
* fpu_counter contains the number of consecutive context switches
diff --git a/init/Kconfig b/init/Kconfig
index d19b3a7..c1c411c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1403,9 +1403,6 @@ config STOP_MACHINE
source "block/Kconfig"
-config PREEMPT_NOTIFIERS
- bool
-
config PADATA
depends on SMP
bool
diff --git a/kernel/sched.c b/kernel/sched.c
index db143fd..b38ab2e 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2387,6 +2387,38 @@ struct migration_arg {
static int migration_cpu_stop(void *data);
+struct wait_task_inactive_blocked {
+ struct preempt_notifier notifier;
+ struct task_struct *waiter;
+};
+
+static void wait_task_inactive_sched_in(struct preempt_notifier *n, int cpu)
+{
+ /* Dummy, could be called when preempted before sleeping */
+}
+
+static void wait_task_inactive_sched_out(struct preempt_notifier *n,
+ struct task_struct *next)
+{
+ struct task_struct *p;
+ struct wait_task_inactive_blocked *blocked =
+ container_of(n, struct wait_task_inactive_blocked, notifier);
+
+ if (current->on_rq) /* we're not inactive yet */
+ return;
+
+ hlist_del(&n->link);
+
+ p = ACCESS_ONCE(blocked->waiter);
+ blocked->waiter = NULL;
+ wake_up_process(p);
+}
+
+static struct preempt_ops wait_task_inactive_ops = {
+ .sched_in = wait_task_inactive_sched_in,
+ .sched_out = wait_task_inactive_sched_out,
+};
+
/*
* wait_task_inactive - wait for a thread to unschedule.
*
@@ -2405,93 +2437,45 @@ static int migration_cpu_stop(void *data);
*/
unsigned long wait_task_inactive(struct task_struct *p, long match_state)
{
+ unsigned long ncsw = 0;
unsigned long flags;
- int running, on_rq;
- unsigned long ncsw;
struct rq *rq;
- for (;;) {
- /*
- * We do the initial early heuristics without holding
- * any task-queue locks at all. We'll only try to get
- * the runqueue lock when things look like they will
- * work out!
- */
- rq = task_rq(p);
-
- /*
- * If the task is actively running on another CPU
- * still, just relax and busy-wait without holding
- * any locks.
- *
- * NOTE! Since we don't hold any locks, it's not
- * even sure that "rq" stays as the right runqueue!
- * But we don't care, since "task_running()" will
- * return false if the runqueue has changed and p
- * is actually now running somewhere else!
- */
- while (task_running(rq, p)) {
- if (match_state && unlikely(p->state != match_state))
- return 0;
- cpu_relax();
- }
-
- /*
- * Ok, time to look more closely! We need the rq
- * lock now, to be *sure*. If we're wrong, we'll
- * just go back and repeat.
- */
- rq = task_rq_lock(p, &flags);
- trace_sched_wait_task(p);
- running = task_running(rq, p);
- on_rq = p->on_rq;
- ncsw = 0;
- if (!match_state || p->state == match_state)
- ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
- task_rq_unlock(rq, p, &flags);
-
- /*
- * If it changed from the expected state, bail out now.
- */
- if (unlikely(!ncsw))
- break;
+ struct wait_task_inactive_blocked blocked = {
+ .notifier = {
+ .ops = &wait_task_inactive_ops,
+ },
+ .waiter = current,
+ };
- /*
- * Was it really running after all now that we
- * checked with the proper locks actually held?
- *
- * Oops. Go back and try again..
- */
- if (unlikely(running)) {
- cpu_relax();
- continue;
- }
+ rq = task_rq_lock(p, &flags);
+ if (!task_running(rq, p))
+ goto done;
- /*
- * It's not enough that it's not actively running,
- * it must be off the runqueue _entirely_, and not
- * preempted!
- *
- * So if it was still runnable (but just not actively
- * running right now), it's preempted, and we should
- * yield - it could be a while.
- */
- if (unlikely(on_rq)) {
- ktime_t to = ktime_set(0, NSEC_PER_SEC/HZ);
+ if (match_state && unlikely(p->state != match_state))
+ goto unlock;
- set_current_state(TASK_UNINTERRUPTIBLE);
- schedule_hrtimeout(&to, HRTIMER_MODE_REL);
- continue;
- }
+ hlist_add_head(&blocked.notifier.link, &p->preempt_notifiers);
+ task_rq_unlock(rq, p, &flags);
- /*
- * Ahh, all good. It wasn't running, and it wasn't
- * runnable, which means that it will never become
- * running in the future either. We're all done!
- */
- break;
+ for (;;) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ if (!blocked.waiter)
+ break;
+ schedule();
}
+ __set_current_state(TASK_RUNNING);
+ /*
+ * Serializes against the completion of the previously observed context
+ * switch.
+ */
+ rq = task_rq_lock(p, &flags);
+done:
+ if (!match_state || p->state == match_state)
+ ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
+unlock:
+ task_rq_unlock(rq, p, &flags);
return ncsw;
}
@@ -2967,10 +2951,7 @@ static void __sched_fork(struct task_struct *p)
#endif
INIT_LIST_HEAD(&p->rt.run_list);
-
-#ifdef CONFIG_PREEMPT_NOTIFIERS
INIT_HLIST_HEAD(&p->preempt_notifiers);
-#endif
}
/*
@@ -3084,8 +3065,6 @@ void wake_up_new_task(struct task_struct *p)
task_rq_unlock(rq, p, &flags);
}
-#ifdef CONFIG_PREEMPT_NOTIFIERS
-
/**
* preempt_notifier_register - tell me when current is being preempted & rescheduled
* @notifier: notifier struct to register
@@ -3122,26 +3101,12 @@ fire_sched_out_preempt_notifiers(struct task_struct *curr,
struct task_struct *next)
{
struct preempt_notifier *notifier;
- struct hlist_node *node;
+ struct hlist_node *node, *n;
- hlist_for_each_entry(notifier, node, &curr->preempt_notifiers, link)
+ hlist_for_each_entry_safe(notifier, node, n, &curr->preempt_notifiers, link)
notifier->ops->sched_out(notifier, next);
}
-#else /* !CONFIG_PREEMPT_NOTIFIERS */
-
-static void fire_sched_in_preempt_notifiers(struct task_struct *curr)
-{
-}
-
-static void
-fire_sched_out_preempt_notifiers(struct task_struct *curr,
- struct task_struct *next)
-{
-}
-
-#endif /* CONFIG_PREEMPT_NOTIFIERS */
-
/**
* prepare_task_switch - prepare to switch tasks
* @rq: the runqueue preparing to switch
* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-21 18:50 ` Peter Zijlstra
(?)
@ 2011-09-22 4:46 ` Mike Galbraith
2011-09-22 6:31 ` Peter Zijlstra
-1 siblings, 1 reply; 52+ messages in thread
From: Mike Galbraith @ 2011-09-22 4:46 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-rt-users, Thomas Gleixner, LKML, Oleg Nesterov,
Miklos Szeredi, mingo
On Wed, 2011-09-21 at 20:50 +0200, Peter Zijlstra wrote:
> On Wed, 2011-09-21 at 19:01 +0200, Peter Zijlstra wrote:
> > On Wed, 2011-09-21 at 12:17 +0200, Mike Galbraith wrote:
> > > [ 144.212272] ------------[ cut here ]------------
> > > [ 144.212280] WARNING: at kernel/sched.c:6152 migrate_disable+0x1b6/0x200()
> > > [ 144.212282] Hardware name: MS-7502
> > > [ 144.212283] Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device edd nfsd lockd parport_pc parport nfs_acl auth_rpcgss sunrpc bridge ipv6 stp cpufreq_conservative microcode cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf nls_iso8859_1 nls_cp437 vfat fat fuse ext3 jbd dm_mod usbmouse usb_storage usbhid snd_hda_codec_realtek usb_libusual uas sr_mod cdrom hid snd_hda_intel e1000e snd_hda_codec kvm_intel snd_hwdep sg snd_pcm kvm i2c_i801 snd_timer snd firewire_ohci firewire_core soundcore snd_page_alloc crc_itu_t button ext4 mbcache jbd2 crc16 uhci_hcd sd_mod ehci_hcd usbcore rtc_cmos ahci libahci libata scsi_mod fan processor thermal
> > > [ 144.212317] Pid: 6215, comm: strace Not tainted 3.0.4-rt14 #2052
> > > [ 144.212319] Call Trace:
> > > [ 144.212323] [<ffffffff8104662f>] warn_slowpath_common+0x7f/0xc0
> > > [ 144.212326] [<ffffffff8104668a>] warn_slowpath_null+0x1a/0x20
> > > [ 144.212328] [<ffffffff8103f606>] migrate_disable+0x1b6/0x200
> > > [ 144.212331] [<ffffffff8105a2a8>] ptrace_stop+0x128/0x240
> > > [ 144.212334] [<ffffffff81057b9b>] ? recalc_sigpending+0x1b/0x50
> > > [ 144.212337] [<ffffffff8105b6f1>] get_signal_to_deliver+0x211/0x530
> > > [ 144.212340] [<ffffffff81001835>] do_signal+0x75/0x7a0
> > > [ 144.212342] [<ffffffff8105ae68>] ? kill_pid_info+0x58/0x80
> > > [ 144.212344] [<ffffffff8105c34c>] ? sys_kill+0xac/0x1e0
> > > [ 144.212347] [<ffffffff81001fe5>] do_notify_resume+0x65/0x80
> > > [ 144.212350] [<ffffffff8135978b>] int_signal+0x12/0x17
> > > [ 144.212352] ---[ end trace 0000000000000002 ]---
> >
> >
> > Right, that's because of
> > 53da1d9456fe7f87a920a78fdbdcf1225d197cb7, I think we simply want a full
> > revert of that for -rt.
>
> This also made me stare at the trainwreck called wait_task_inactive(),
> how about something like the below, it survives a boot and simple
> strace.
There's a missing hunklet, but...
@@ -8325,9 +8290,7 @@ void __init sched_init(void)
set_load_weight(&init_task);
-#ifdef CONFIG_PREEMPT_NOTIFIERS
INIT_HLIST_HEAD(&init_task.preempt_notifiers);
-#endif
#ifdef CONFIG_SMP
open_softirq(SCHED_SOFTIRQ, run_rebalance_domains);
..a perturbation-measurement proggy (100% userspace hog) and a jitter-measurement
proggy pinned to the same CPU make for a 100% repeatable boom.
Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
Pid: 6226, comm: pert Not tainted 3.0.4-rt14 #2053
Call Trace:
<NMI> [<ffffffff81355f00>] panic+0xa0/0x1a8
[<ffffffff8108fe47>] watchdog_overflow_callback+0xe7/0xf0
[<ffffffff810c1c7c>] __perf_event_overflow+0x9c/0x250
[<ffffffff810c2734>] perf_event_overflow+0x14/0x20
[<ffffffff81014c7c>] intel_pmu_handle_irq+0x21c/0x440
[<ffffffff81010fb9>] perf_event_nmi_handler+0x39/0xc0
[<ffffffff8106f42c>] notifier_call_chain+0x4c/0x70
[<ffffffff8106fa6a>] __atomic_notifier_call_chain+0x4a/0x70
[<ffffffff8106faa6>] atomic_notifier_call_chain+0x16/0x20
[<ffffffff8106fc2e>] notify_die+0x2e/0x30
[<ffffffff81002c8a>] do_nmi+0xaa/0x240
[<ffffffff813592ea>] nmi+0x1a/0x20
<<EOE>> <0>Rebooting in 60 seconds..[ 0.000000]
* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-22 4:46 ` Mike Galbraith
@ 2011-09-22 6:31 ` Peter Zijlstra
0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2011-09-22 6:31 UTC (permalink / raw)
To: Mike Galbraith
Cc: linux-rt-users, Thomas Gleixner, LKML, Oleg Nesterov,
Miklos Szeredi, mingo
On Thu, 2011-09-22 at 06:46 +0200, Mike Galbraith wrote:
>
> ..perturbation (100% userspace hog) measurement proggy and jitter
> measurement proggy pinned to the same cpu makes 100% repeatable boom.
>
>
Uh ow... :-) I'd better go have a look then..
* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-21 10:17 ` rt14: strace -> migrate_disable_atomic imbalance Mike Galbraith
2011-09-21 17:01 ` Peter Zijlstra
2011-09-21 18:50 ` Peter Zijlstra
@ 2011-09-22 8:38 ` Peter Zijlstra
2011-09-22 10:00 ` Peter Zijlstra
` (2 subsequent siblings)
5 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2011-09-22 8:38 UTC (permalink / raw)
To: Mike Galbraith
Cc: linux-rt-users, Thomas Gleixner, LKML, Oleg Nesterov,
Miklos Szeredi, mingo
On Wed, 2011-09-21 at 20:50 +0200, Peter Zijlstra wrote:
> +static void wait_task_inactive_sched_out(struct preempt_notifier *n,
> + struct task_struct *next)
> +{
> + struct task_struct *p;
> + struct wait_task_inactive_blocked *blocked =
> + container_of(n, struct wait_task_inactive_blocked, notifier);
> +
> + if (current->on_rq) /* we're not inactive yet */
> + return;
> +
> + hlist_del(&n->link);
> +
> + p = ACCESS_ONCE(blocked->waiter);
> + blocked->waiter = NULL;
> + wake_up_process(p);
> +}
Trying a wakeup from there isn't actually ever going to work, of course...
Duh!
* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-21 10:17 ` rt14: strace -> migrate_disable_atomic imbalance Mike Galbraith
@ 2011-09-22 10:00 ` Peter Zijlstra
2011-09-21 18:50 ` Peter Zijlstra
` (4 subsequent siblings)
5 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2011-09-22 10:00 UTC (permalink / raw)
To: Mike Galbraith
Cc: linux-rt-users, Thomas Gleixner, LKML, Oleg Nesterov,
Miklos Szeredi, mingo
On Thu, 2011-09-22 at 10:38 +0200, Peter Zijlstra wrote:
> On Wed, 2011-09-21 at 20:50 +0200, Peter Zijlstra wrote:
> > +static void wait_task_inactive_sched_out(struct preempt_notifier *n,
> > + struct task_struct *next)
> > +{
> > + struct task_struct *p;
> > + struct wait_task_inactive_blocked *blocked =
> > + container_of(n, struct wait_task_inactive_blocked, notifier);
> > +
> > + if (current->on_rq) /* we're not inactive yet */
> > + return;
> > +
> > + hlist_del(&n->link);
> > +
> > + p = ACCESS_ONCE(blocked->waiter);
> > + blocked->waiter = NULL;
> > + wake_up_process(p);
> > +}
>
> Trying a wakeup from there isn't going to actually ever work of-course..
> Duh!
OK, this one seems to be better.. But it's quite vile, not sure I
actually like it anymore.
---
arch/ia64/kvm/Kconfig | 1
arch/powerpc/kvm/Kconfig | 1
arch/s390/kvm/Kconfig | 1
arch/tile/kvm/Kconfig | 1
arch/x86/kvm/Kconfig | 1
include/linux/kvm_host.h | 2
include/linux/preempt.h | 4 -
include/linux/sched.h | 2
init/Kconfig | 3
kernel/sched.c | 188 +++++++++++++++++++++--------------------------
10 files changed, 85 insertions(+), 119 deletions(-)
Index: linux-2.6/arch/ia64/kvm/Kconfig
===================================================================
--- linux-2.6.orig/arch/ia64/kvm/Kconfig
+++ linux-2.6/arch/ia64/kvm/Kconfig
@@ -22,7 +22,6 @@ config KVM
depends on HAVE_KVM && MODULES && EXPERIMENTAL
# for device assignment:
depends on PCI
- select PREEMPT_NOTIFIERS
select ANON_INODES
select HAVE_KVM_IRQCHIP
select KVM_APIC_ARCHITECTURE
Index: linux-2.6/arch/powerpc/kvm/Kconfig
===================================================================
--- linux-2.6.orig/arch/powerpc/kvm/Kconfig
+++ linux-2.6/arch/powerpc/kvm/Kconfig
@@ -18,7 +18,6 @@ if VIRTUALIZATION
config KVM
bool
- select PREEMPT_NOTIFIERS
select ANON_INODES
config KVM_BOOK3S_HANDLER
Index: linux-2.6/arch/s390/kvm/Kconfig
===================================================================
--- linux-2.6.orig/arch/s390/kvm/Kconfig
+++ linux-2.6/arch/s390/kvm/Kconfig
@@ -19,7 +19,6 @@ config KVM
def_tristate y
prompt "Kernel-based Virtual Machine (KVM) support"
depends on HAVE_KVM && EXPERIMENTAL
- select PREEMPT_NOTIFIERS
select ANON_INODES
---help---
Support hosting paravirtualized guest machines using the SIE
Index: linux-2.6/arch/tile/kvm/Kconfig
===================================================================
--- linux-2.6.orig/arch/tile/kvm/Kconfig
+++ linux-2.6/arch/tile/kvm/Kconfig
@@ -19,7 +19,6 @@ if VIRTUALIZATION
config KVM
tristate "Kernel-based Virtual Machine (KVM) support"
depends on HAVE_KVM && MODULES && EXPERIMENTAL
- select PREEMPT_NOTIFIERS
select ANON_INODES
---help---
Support hosting paravirtualized guest machines.
Index: linux-2.6/arch/x86/kvm/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86/kvm/Kconfig
+++ linux-2.6/arch/x86/kvm/Kconfig
@@ -24,7 +24,6 @@ config KVM
depends on PCI
# for TASKSTATS/TASK_DELAY_ACCT:
depends on NET
- select PREEMPT_NOTIFIERS
select MMU_NOTIFIER
select ANON_INODES
select HAVE_KVM_IRQCHIP
Index: linux-2.6/include/linux/kvm_host.h
===================================================================
--- linux-2.6.orig/include/linux/kvm_host.h
+++ linux-2.6/include/linux/kvm_host.h
@@ -111,9 +111,7 @@ enum {
struct kvm_vcpu {
struct kvm *kvm;
-#ifdef CONFIG_PREEMPT_NOTIFIERS
struct preempt_notifier preempt_notifier;
-#endif
int cpu;
int vcpu_id;
int srcu_idx;
Index: linux-2.6/include/linux/preempt.h
===================================================================
--- linux-2.6.orig/include/linux/preempt.h
+++ linux-2.6/include/linux/preempt.h
@@ -101,8 +101,6 @@ do { \
#endif /* CONFIG_PREEMPT_COUNT */
-#ifdef CONFIG_PREEMPT_NOTIFIERS
-
struct preempt_notifier;
/**
@@ -147,6 +145,4 @@ static inline void preempt_notifier_init
notifier->ops = ops;
}
-#endif
-
#endif /* __LINUX_PREEMPT_H */
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1236,10 +1236,8 @@ struct task_struct {
struct sched_entity se;
struct sched_rt_entity rt;
-#ifdef CONFIG_PREEMPT_NOTIFIERS
/* list of struct preempt_notifier: */
struct hlist_head preempt_notifiers;
-#endif
/*
* fpu_counter contains the number of consecutive context switches
Index: linux-2.6/init/Kconfig
===================================================================
--- linux-2.6.orig/init/Kconfig
+++ linux-2.6/init/Kconfig
@@ -1403,9 +1403,6 @@ config STOP_MACHINE
source "block/Kconfig"
-config PREEMPT_NOTIFIERS
- bool
-
config PADATA
depends on SMP
bool
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2387,6 +2387,57 @@ struct migration_arg {
static int migration_cpu_stop(void *data);
+struct wait_task_inactive_blocked {
+ struct preempt_notifier notifier;
+ struct task_struct *waiter;
+};
+
+static void
+preempt_ops_sched_out_nop(struct preempt_notifier *n, struct task_struct *next)
+{
+}
+
+static void wait_task_inactive_sched_in(struct preempt_notifier *n, int cpu)
+{
+ struct task_struct *p;
+ struct wait_task_inactive_blocked *blocked =
+ container_of(n, struct wait_task_inactive_blocked, notifier);
+
+ hlist_del(&n->link);
+
+ p = ACCESS_ONCE(blocked->waiter);
+ blocked->waiter = NULL;
+ wake_up_process(p);
+}
+
+static struct preempt_ops wait_task_inactive_ops_post = {
+ .sched_in = wait_task_inactive_sched_in,
+ .sched_out = preempt_ops_sched_out_nop,
+};
+
+static void preempt_ops_sched_in_nop(struct preempt_notifier *n, int cpu)
+{
+}
+
+static void
+wait_task_inactive_sched_out(struct preempt_notifier *n, struct task_struct *next)
+{
+ struct wait_task_inactive_blocked *blocked =
+ container_of(n, struct wait_task_inactive_blocked, notifier);
+
+ if (current->on_rq) /* we're not inactive yet */
+ return;
+
+ hlist_del(&n->link);
+ blocked->notifier.ops = &wait_task_inactive_ops_post;
+ hlist_add_head(&n->link, &next->preempt_notifiers);
+}
+
+static struct preempt_ops wait_task_inactive_ops_pre = {
+ .sched_in = preempt_ops_sched_in_nop,
+ .sched_out = wait_task_inactive_sched_out,
+};
+
/*
* wait_task_inactive - wait for a thread to unschedule.
*
@@ -2405,93 +2456,45 @@ static int migration_cpu_stop(void *data
*/
unsigned long wait_task_inactive(struct task_struct *p, long match_state)
{
+ unsigned long ncsw = 0;
unsigned long flags;
- int running, on_rq;
- unsigned long ncsw;
struct rq *rq;
- for (;;) {
- /*
- * We do the initial early heuristics without holding
- * any task-queue locks at all. We'll only try to get
- * the runqueue lock when things look like they will
- * work out!
- */
- rq = task_rq(p);
-
- /*
- * If the task is actively running on another CPU
- * still, just relax and busy-wait without holding
- * any locks.
- *
- * NOTE! Since we don't hold any locks, it's not
- * even sure that "rq" stays as the right runqueue!
- * But we don't care, since "task_running()" will
- * return false if the runqueue has changed and p
- * is actually now running somewhere else!
- */
- while (task_running(rq, p)) {
- if (match_state && unlikely(p->state != match_state))
- return 0;
- cpu_relax();
- }
-
- /*
- * Ok, time to look more closely! We need the rq
- * lock now, to be *sure*. If we're wrong, we'll
- * just go back and repeat.
- */
- rq = task_rq_lock(p, &flags);
- trace_sched_wait_task(p);
- running = task_running(rq, p);
- on_rq = p->on_rq;
- ncsw = 0;
- if (!match_state || p->state == match_state)
- ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
- task_rq_unlock(rq, p, &flags);
-
- /*
- * If it changed from the expected state, bail out now.
- */
- if (unlikely(!ncsw))
- break;
+ struct wait_task_inactive_blocked blocked = {
+ .notifier = {
+ .ops = &wait_task_inactive_ops_pre,
+ },
+ .waiter = current,
+ };
- /*
- * Was it really running after all now that we
- * checked with the proper locks actually held?
- *
- * Oops. Go back and try again..
- */
- if (unlikely(running)) {
- cpu_relax();
- continue;
- }
+ /* if we don't match the expected state, bail */
+ if (match_state && unlikely(p->state != match_state))
+ return 0;
- /*
- * It's not enough that it's not actively running,
- * it must be off the runqueue _entirely_, and not
- * preempted!
- *
- * So if it was still runnable (but just not actively
- * running right now), it's preempted, and we should
- * yield - it could be a while.
- */
- if (unlikely(on_rq)) {
- ktime_t to = ktime_set(0, NSEC_PER_SEC/HZ);
+ rq = task_rq_lock(p, &flags);
+ if (!p->on_rq) /* we're already blocked */
+ goto done;
- set_current_state(TASK_UNINTERRUPTIBLE);
- schedule_hrtimeout(&to, HRTIMER_MODE_REL);
- continue;
- }
+ hlist_add_head(&blocked.notifier.link, &p->preempt_notifiers);
+ task_rq_unlock(rq, p, &flags);
- /*
- * Ahh, all good. It wasn't running, and it wasn't
- * runnable, which means that it will never become
- * running in the future either. We're all done!
- */
- break;
+ for (;;) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ if (!blocked.waiter)
+ break;
+ schedule();
}
+ __set_current_state(TASK_RUNNING);
+ /*
+ * Serializes against the completion of the previously observed context
+ * switch.
+ */
+ rq = task_rq_lock(p, &flags);
+done:
+ if (!match_state || p->state == match_state)
+ ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
+ task_rq_unlock(rq, p, &flags);
return ncsw;
}
@@ -2967,10 +2970,7 @@ static void __sched_fork(struct task_str
#endif
INIT_LIST_HEAD(&p->rt.run_list);
-
-#ifdef CONFIG_PREEMPT_NOTIFIERS
INIT_HLIST_HEAD(&p->preempt_notifiers);
-#endif
}
/*
@@ -3084,8 +3084,6 @@ void wake_up_new_task(struct task_struct
task_rq_unlock(rq, p, &flags);
}
-#ifdef CONFIG_PREEMPT_NOTIFIERS
-
/**
* preempt_notifier_register - tell me when current is being preempted & rescheduled
* @notifier: notifier struct to register
@@ -3111,9 +3109,9 @@ EXPORT_SYMBOL_GPL(preempt_notifier_unreg
static void fire_sched_in_preempt_notifiers(struct task_struct *curr)
{
struct preempt_notifier *notifier;
- struct hlist_node *node;
+ struct hlist_node *node, *n;
- hlist_for_each_entry(notifier, node, &curr->preempt_notifiers, link)
+ hlist_for_each_entry_safe(notifier, node, n, &curr->preempt_notifiers, link)
notifier->ops->sched_in(notifier, raw_smp_processor_id());
}
@@ -3122,26 +3120,12 @@ fire_sched_out_preempt_notifiers(struct
struct task_struct *next)
{
struct preempt_notifier *notifier;
- struct hlist_node *node;
+ struct hlist_node *node, *n;
- hlist_for_each_entry(notifier, node, &curr->preempt_notifiers, link)
+ hlist_for_each_entry_safe(notifier, node, n, &curr->preempt_notifiers, link)
notifier->ops->sched_out(notifier, next);
}
-#else /* !CONFIG_PREEMPT_NOTIFIERS */
-
-static void fire_sched_in_preempt_notifiers(struct task_struct *curr)
-{
-}
-
-static void
-fire_sched_out_preempt_notifiers(struct task_struct *curr,
- struct task_struct *next)
-{
-}
-
-#endif /* CONFIG_PREEMPT_NOTIFIERS */
-
/**
* prepare_task_switch - prepare to switch tasks
* @rq: the runqueue preparing to switch
@@ -8312,9 +8296,7 @@ void __init sched_init(void)
set_load_weight(&init_task);
-#ifdef CONFIG_PREEMPT_NOTIFIERS
INIT_HLIST_HEAD(&init_task.preempt_notifiers);
-#endif
#ifdef CONFIG_SMP
open_softirq(SCHED_SOFTIRQ, run_rebalance_domains);
* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-22 10:00 ` Peter Zijlstra
@ 2011-09-22 11:55 ` Mike Galbraith
2011-09-22 12:09 ` Peter Zijlstra
-1 siblings, 1 reply; 52+ messages in thread
From: Mike Galbraith @ 2011-09-22 11:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-rt-users, Thomas Gleixner, LKML, Oleg Nesterov,
Miklos Szeredi, mingo
On Thu, 2011-09-22 at 12:00 +0200, Peter Zijlstra wrote:
> OK, this one seems to be better.. But its quite vile, not sure I
> actually like it anymore.
Well, seemed to work, but I see there's a v3 now.
-Mike
* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-22 11:55 ` Mike Galbraith
@ 2011-09-22 12:09 ` Peter Zijlstra
2011-09-22 13:42 ` Mike Galbraith
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2011-09-22 12:09 UTC (permalink / raw)
To: Mike Galbraith
Cc: linux-rt-users, Thomas Gleixner, LKML, Oleg Nesterov,
Miklos Szeredi, mingo
On Thu, 2011-09-22 at 13:55 +0200, Mike Galbraith wrote:
> On Thu, 2011-09-22 at 12:00 +0200, Peter Zijlstra wrote:
>
> > OK, this one seems to be better.. But its quite vile, not sure I
> > actually like it anymore.
>
> Well, seemed to work, but I see there's a v3 now.
Yeah, just posted it for completeness; not sure it's actually going
anywhere since it's slower than the current code (although it's hard to
say with the results changing from reboot to reboot), and it's still
quite ugly..
* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-22 12:09 ` Peter Zijlstra
@ 2011-09-22 13:42 ` Mike Galbraith
2011-09-22 14:05 ` Mike Galbraith
2011-09-22 14:34 ` Peter Zijlstra
0 siblings, 2 replies; 52+ messages in thread
From: Mike Galbraith @ 2011-09-22 13:42 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-rt-users, Thomas Gleixner, LKML, Oleg Nesterov,
Miklos Szeredi, mingo
On Thu, 2011-09-22 at 14:09 +0200, Peter Zijlstra wrote:
> On Thu, 2011-09-22 at 13:55 +0200, Mike Galbraith wrote:
> > On Thu, 2011-09-22 at 12:00 +0200, Peter Zijlstra wrote:
> >
> > > OK, this one seems to be better.. But its quite vile, not sure I
> > > actually like it anymore.
> >
> > Well, seemed to work, but I see there's a v3 now.
>
> Yeah, just posted it for completeness, not sure its actually going
> anywhere since its slower than the current code (although its hard to
> say with the results changing from reboot to reboot), and its still
> quite ugly..
Hm. Stracing this proglet will soon leave it stuck forever unless the
timer is left running. Virgin rt14 does the same though...
strace ./jitter -c 3 -p 99 -f 1000 -t 10 -r
rt_sigtimedwait([], NULL, NULL, 8) = 64
timer_settime(0x1, TIMER_ABSTIME, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
timer_settime(0x1, TIMER_ABSTIME, {it_interval={0, 0}, it_value={1316698141, 166759038}}, NULL) = 0
rt_sigtimedwait([], NULL, NULL, 8) = 64
timer_settime(0x1, TIMER_ABSTIME, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
timer_settime(0x1, TIMER_ABSTIME, {it_interval={0, 0}, it_value={1316698141, 167822701}}, NULL) = 0
rt_sigtimedwait([], NULL, NULL, 8) = 64
timer_settime(0x1, TIMER_ABSTIME, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
timer_settime(0x1, TIMER_ABSTIME, {it_interval={0, 0}, it_value={1316698141, 168887375}}, NULL) = 0
--- SIGRT_32 (Real-time signal 30) @ 0 (0) ---
rt_sigreturn(0x40) = 0
rt_sigtimedwait([], NULL, NULL, 8^C <unfinished ...>
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <math.h>
#include <values.h>
#include <sched.h>
#include <signal.h>
#include <time.h>
#include <cpuset.h>
#include <getopt.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/mman.h>
/* compile with gcc -O jitter.c -o jitter -lrt -lm */
#define NSEC_PER_SEC 1000000000ULL
#define USEC_PER_SEC 1000000ULL
#define NSEC_PER_USEC 1000ULL
int frequency = 1000;
int period;
int tolerance = 5;
int delay = 1;
int samples = 1000;
int cpu;
int priority = 1;
int reset_timer;
double *deltas;
double *deviants;
sigset_t mysigset;
char *usage = "Usage: -c <cpu> -d <delay> -f <freq(Hz)> -p <prio> -t <tolerance(us)>\n";
void parse_options(int argc, char **argv)
{
int ch;
extern char *optarg;
extern int optind;
while ((ch = getopt(argc, argv, "c:d:f:p:rt:")) != EOF) {
switch (ch) {
case 'c':
if (sscanf(optarg, "%d", &cpu) != 1 ||
cpu < 0) {
fprintf(stderr,"Invalid cpu.\n");
exit(EXIT_FAILURE);
}
break;
case 'd':
if (sscanf(optarg, "%d", &delay) != 1 ||
delay <= 0) {
fprintf(stderr,"Invalid delay.\n");
exit(EXIT_FAILURE);
}
break;
case 'f':
if (sscanf(optarg, "%d", &frequency) != 1 ||
frequency <= 0) {
fprintf(stderr,"Invalid frequency.\n");
exit(EXIT_FAILURE);
}
break;
case 'r':
reset_timer = 1;
break;
case 'p':
if (sscanf(optarg, "%d", &priority) != 1 ||
priority < 1 || priority > 99) {
fprintf(stderr,"Invalid priority.\n");
exit(EXIT_FAILURE);
}
break;
case 't':
if (sscanf(optarg, "%d", &tolerance) != 1 ||
tolerance < 0) {
fprintf(stderr,"Invalid tolerance.\n");
exit(EXIT_FAILURE);
}
break;
default:
fprintf(stderr, "%s", usage);
exit(EXIT_FAILURE);
break;
}
}
samples = frequency * delay;
period = NSEC_PER_SEC / frequency;
}
long delta(struct timespec *now, struct timespec *then)
{
long delta = now->tv_sec * NSEC_PER_SEC + now->tv_nsec;
delta -= then->tv_sec * NSEC_PER_SEC + then->tv_nsec;
return delta;
}
void signal_handler(int signo)
{
}
void init_timer(timer_t *timer_id)
{
struct sigaction sa;
struct sigevent se;
memset(&sa, 0, sizeof(sa));
sa.sa_flags = SA_RESTART|SA_SIGINFO;
sa.sa_handler = signal_handler;
sigemptyset(&sa.sa_mask);
sigaddset(&sa.sa_mask, SIGCHLD);
memset(&se, 0, sizeof(se));
se.sigev_notify = SIGEV_SIGNAL;
se.sigev_signo = SIGRTMAX;
se.sigev_value.sival_int = 0;
if (sigaction(SIGRTMAX, &sa, 0) < 0) {
perror("sigaction");
exit(EXIT_FAILURE);
}
if (timer_create(CLOCK_REALTIME, &se, timer_id) < 0) {
perror("timer_create");
exit(EXIT_FAILURE);
}
sigemptyset(&mysigset);
sigaddset(&mysigset,SIGRTMAX);
}
void set_timer(timer_t timer_id, int period, struct timespec *target)
{
struct itimerspec ts;
struct timespec now;
clock_gettime(CLOCK_REALTIME, target);
if (period) {
target->tv_nsec += period;
if (target->tv_nsec >= NSEC_PER_SEC) {
target->tv_sec++;
target->tv_nsec -= NSEC_PER_SEC;
}
ts.it_value = *target;
ts.it_interval.tv_sec = 0;
ts.it_interval.tv_nsec = reset_timer ? 0 : period;
} else
memset(&ts, 0, sizeof(struct itimerspec));
if (timer_settime(timer_id, TIMER_ABSTIME, &ts, NULL) < 0) {
perror("timer_settime");
exit (EXIT_FAILURE);
}
}
void print_stats(void)
{
struct sched_param sp;
double min = MAXDOUBLE, max = MINDOUBLE;
double sum = 0.0, delta, mean, sd;
double tol = (double)tolerance;
int i, deviant = 0;
sp.sched_priority = 0;
if (sched_setscheduler(0, SCHED_OTHER, &sp) == -1) {
perror("sched_setscheduler");
exit(EXIT_FAILURE);
}
for (i = 0; i < samples; i++) {
deltas[i] /= (double)NSEC_PER_USEC;
if (deltas[i] < min)
min = deltas[i];
if (deltas[i] > max)
max = deltas[i];
if (abs((int)deltas[i]) > tol) {
deviants[deviant] = deltas[i];
deviant++;
}
sum += deltas[i];
}
mean = sum / (double)samples;
/* calculate standard deviation */
sum = 0.0;
for (i = 0; i < samples; i++) {
delta = deltas[i] - mean;
sum += delta*delta;
}
sum /= (double)samples;
sd = sqrt(sum);
printf("jitter:%7.2f\tmin: %9.2f max: %9.2f mean: %9.2f stddev: %7.2f\n",
max - min, min, max, mean, sd);
if (!deviant)
goto out;
min = MAXDOUBLE;
max = MINDOUBLE;
sum = 0.0;
for (i = 0; i < deviant; i++) {
if (deviants[i] < min)
min = deviants[i];
if (deviants[i] > max)
max = deviants[i];
sum += deviants[i];
}
mean = sum / (double)deviant;
/* calculate standard deviation */
sum = 0.0;
for (i = 0; i < deviant; i++) {
delta = deviants[i] - mean;
sum += delta*delta;
}
sum /= (double)deviant;
sd = sqrt(sum);
printf("%d > %d us hits\tmin: %9.2f max: %9.2f mean: %9.2f stddev: %7.2f\n\n",
deviant, tolerance, min, max, mean, sd);
out:
fflush(stdout);
sp.sched_priority = priority;
if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
perror("sched_setscheduler");
exit(EXIT_FAILURE);
}
}
int quit;
void exit_handler(int signo)
{
quit = 1;
}
int main(int argc, char **argv)
{
timer_t timer_id;
struct sched_param sp;
cpu_set_t cpuset;
struct timespec now, then;
int i = 0, sig = 0;
parse_options(argc, argv);
signal(SIGINT, exit_handler);
signal(SIGTERM, exit_handler);
if (mlockall(MCL_CURRENT|MCL_FUTURE) < 0) {
perror("mlockall");
exit(EXIT_FAILURE);
}
if (!(deltas = malloc(samples * sizeof(double)))) {
perror("malloc deltas");
exit(EXIT_FAILURE);
} else if (!(deviants = malloc(samples * sizeof(double)))) {
perror("malloc deviants");
exit(EXIT_FAILURE);
}
CPU_ZERO(&cpuset);
CPU_SET(cpu, &cpuset);
if (sched_setaffinity(0, sizeof(cpuset), &cpuset) == -1) {
perror("setaffinity");
exit(EXIT_FAILURE);
}
sp.sched_priority = priority;
if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
perror("sched_setscheduler");
exit(EXIT_FAILURE);
}
printf("CPU%d priority: %d timer freq: %d Hz tolerance: %d usecs, stats interval: %d %s\n\n",
cpu, sp.sched_priority, frequency, tolerance, delay, "secs");
init_timer(&timer_id);
set_timer(timer_id, period, &then);
while (!quit && reset_timer) {
sigwait(&mysigset,&sig);
set_timer(timer_id, 0, &now);
deltas[i] = (double)delta(&now, &then);
if (i++ >= samples) {
print_stats();
i = 0;
}
set_timer(timer_id, period, &then);
}
clock_gettime(CLOCK_REALTIME, &then);
while (!quit && !reset_timer) {
sigwait(&mysigset,&sig);
clock_gettime(CLOCK_REALTIME, &now);
deltas[i] = (double)delta(&now, &then) - period;
if (++i >= samples) {
set_timer(timer_id, 0, &then);
print_stats();
i = 0;
set_timer(timer_id, period, &then);
}
clock_gettime(CLOCK_REALTIME, &then);
}
set_timer(timer_id, 0, &now);
exit(EXIT_SUCCESS);
}
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-22 13:42 ` Mike Galbraith
@ 2011-09-22 14:05 ` Mike Galbraith
2011-09-22 15:20 ` Peter Zijlstra
2011-09-22 14:34 ` Peter Zijlstra
1 sibling, 1 reply; 52+ messages in thread
From: Mike Galbraith @ 2011-09-22 14:05 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-rt-users, Thomas Gleixner, LKML, Oleg Nesterov,
Miklos Szeredi, mingo
On Thu, 2011-09-22 at 15:42 +0200, Mike Galbraith wrote:
> On Thu, 2011-09-22 at 14:09 +0200, Peter Zijlstra wrote:
> > On Thu, 2011-09-22 at 13:55 +0200, Mike Galbraith wrote:
> > > On Thu, 2011-09-22 at 12:00 +0200, Peter Zijlstra wrote:
> > >
> > > > OK, this one seems to be better.. But its quite vile, not sure I
> > > > actually like it anymore.
> > >
> > > Well, seemed to work, but I see there's a v3 now.
> >
> > Yeah, just posted it for completeness, not sure its actually going
> > anywhere since its slower than the current code (although its hard to
> > say with the results changing from reboot to reboot), and its still
> > quite ugly..
>
> Hm. Stracing this proglet will soon leave it stuck forever unless the
> timer is left running. Virgin rt14 does the same though...
>
> strace ./jitter -c 3 -p 99 -f 1000 -t 10 -r
>
> rt_sigtimedwait([], NULL, NULL, 8) = 64
> timer_settime(0x1, TIMER_ABSTIME, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
> timer_settime(0x1, TIMER_ABSTIME, {it_interval={0, 0}, it_value={1316698141, 166759038}}, NULL) = 0
> rt_sigtimedwait([], NULL, NULL, 8) = 64
> timer_settime(0x1, TIMER_ABSTIME, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
> timer_settime(0x1, TIMER_ABSTIME, {it_interval={0, 0}, it_value={1316698141, 167822701}}, NULL) = 0
> rt_sigtimedwait([], NULL, NULL, 8) = 64
> timer_settime(0x1, TIMER_ABSTIME, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
> timer_settime(0x1, TIMER_ABSTIME, {it_interval={0, 0}, it_value={1316698141, 168887375}}, NULL) = 0
> --- SIGRT_32 (Real-time signal 30) @ 0 (0) ---
> rt_sigreturn(0x40) = 0
> rt_sigtimedwait([], NULL, NULL, 8^C <unfinished ...>
I thought it was RT-specific, but it's not after all; a 3.0.4 distro
desktop (preempt) kernel did the same after a bit.
-Mike
* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-22 14:05 ` Mike Galbraith
@ 2011-09-22 15:20 ` Peter Zijlstra
0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2011-09-22 15:20 UTC (permalink / raw)
To: Mike Galbraith
Cc: linux-rt-users, Thomas Gleixner, LKML, Oleg Nesterov,
Miklos Szeredi, mingo
On Thu, 2011-09-22 at 16:05 +0200, Mike Galbraith wrote:
> > strace ./jitter -c 3 -p 99 -f 1000 -t 10 -r
> >
> I thought it was RT specific, but it's not after all, a 3.0.4 distro
> desktop (preempt) kernel did the same after a bit.
I can't reproduce on a recent -tip, could be my machine, could be -tip,
let me try more.
* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-22 13:42 ` Mike Galbraith
2011-09-22 14:05 ` Mike Galbraith
@ 2011-09-22 14:34 ` Peter Zijlstra
2011-09-22 14:38 ` Mike Galbraith
1 sibling, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2011-09-22 14:34 UTC (permalink / raw)
To: Mike Galbraith
Cc: linux-rt-users, Thomas Gleixner, LKML, Oleg Nesterov,
Miklos Szeredi, mingo
On Thu, 2011-09-22 at 15:42 +0200, Mike Galbraith wrote:
> #include <cpuset.h>
I don't seem to have that, where does that come from?
* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-22 14:34 ` Peter Zijlstra
@ 2011-09-22 14:38 ` Mike Galbraith
2011-09-22 14:41 ` Mike Galbraith
2011-09-22 14:41 ` Peter Zijlstra
0 siblings, 2 replies; 52+ messages in thread
From: Mike Galbraith @ 2011-09-22 14:38 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-rt-users, Thomas Gleixner, LKML, Oleg Nesterov,
Miklos Szeredi, mingo
On Thu, 2011-09-22 at 16:34 +0200, Peter Zijlstra wrote:
> On Thu, 2011-09-22 at 15:42 +0200, Mike Galbraith wrote:
> > #include <cpuset.h>
>
> I don't seem to have that, where does that come from?
libcpuset-devel.
-Mike
* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-22 14:38 ` Mike Galbraith
@ 2011-09-22 14:41 ` Mike Galbraith
2011-09-22 14:41 ` Peter Zijlstra
1 sibling, 0 replies; 52+ messages in thread
From: Mike Galbraith @ 2011-09-22 14:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-rt-users, Thomas Gleixner, LKML, Oleg Nesterov,
Miklos Szeredi, mingo
On Thu, 2011-09-22 at 16:38 +0200, Mike Galbraith wrote:
> On Thu, 2011-09-22 at 16:34 +0200, Peter Zijlstra wrote:
> > On Thu, 2011-09-22 at 15:42 +0200, Mike Galbraith wrote:
> > > #include <cpuset.h>
> >
> > I don't seem to have that, where does that come from?
>
> libcpuset-devel.
But you don't need it. That's a leftover from a version that could move
itself into a shielded cpuset.
-Mike
* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-22 14:38 ` Mike Galbraith
2011-09-22 14:41 ` Mike Galbraith
@ 2011-09-22 14:41 ` Peter Zijlstra
2011-09-22 14:46 ` Mike Galbraith
1 sibling, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2011-09-22 14:41 UTC (permalink / raw)
To: Mike Galbraith
Cc: linux-rt-users, Thomas Gleixner, LKML, Oleg Nesterov,
Miklos Szeredi, mingo
On Thu, 2011-09-22 at 16:38 +0200, Mike Galbraith wrote:
> On Thu, 2011-09-22 at 16:34 +0200, Peter Zijlstra wrote:
> > On Thu, 2011-09-22 at 15:42 +0200, Mike Galbraith wrote:
> > > #include <cpuset.h>
> >
> > I don't seem to have that, where does that come from?
>
> libcpuset-devel.
Not in any distro near me. I'm assuming it's this stuff:
ftp://oss.sgi.com/projects/cpusets/download/libcpuset.html
* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-22 14:41 ` Peter Zijlstra
@ 2011-09-22 14:46 ` Mike Galbraith
0 siblings, 0 replies; 52+ messages in thread
From: Mike Galbraith @ 2011-09-22 14:46 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-rt-users, Thomas Gleixner, LKML, Oleg Nesterov,
Miklos Szeredi, mingo
On Thu, 2011-09-22 at 16:41 +0200, Peter Zijlstra wrote:
> On Thu, 2011-09-22 at 16:38 +0200, Mike Galbraith wrote:
> > On Thu, 2011-09-22 at 16:34 +0200, Peter Zijlstra wrote:
> > > On Thu, 2011-09-22 at 15:42 +0200, Mike Galbraith wrote:
> > > > #include <cpuset.h>
> > >
> > > I don't seem to have that, where does that come from?
> >
> > libcpuset-devel.
>
> Not in any distro near me. I'm assuming its this stuff:
(move closer to Nürnberg;)
> ftp://oss.sgi.com/projects/cpusets/download/libcpuset.html
Yeah, that should be it.
-Mike
* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-21 10:17 ` rt14: strace -> migrate_disable_atomic imbalance Mike Galbraith
` (3 preceding siblings ...)
2011-09-22 10:00 ` Peter Zijlstra
@ 2011-09-22 11:31 ` Peter Zijlstra
2011-09-22 11:46 ` Peter Zijlstra
5 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2011-09-22 11:31 UTC (permalink / raw)
To: Mike Galbraith
Cc: linux-rt-users, Thomas Gleixner, LKML, Oleg Nesterov,
Miklos Szeredi, mingo
On Thu, 2011-09-22 at 12:00 +0200, Peter Zijlstra wrote:
> OK, this one seems to be better.. But its quite vile, not sure I
> actually like it anymore.
>
There's also a small matter of it actually being slower than what we had
when tested with: strace hackbench > /dev/null 2>&1.
Curses.. even adding a busy wait on p->on_cpu in front.
* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-21 10:17 ` rt14: strace -> migrate_disable_atomic imbalance Mike Galbraith
@ 2011-09-22 11:46 ` Peter Zijlstra
2011-09-21 18:50 ` Peter Zijlstra
` (4 subsequent siblings)
5 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2011-09-22 11:46 UTC (permalink / raw)
To: Mike Galbraith
Cc: linux-rt-users, Thomas Gleixner, LKML, Oleg Nesterov,
Miklos Szeredi, mingo
On Thu, 2011-09-22 at 13:31 +0200, Peter Zijlstra wrote:
> On Thu, 2011-09-22 at 12:00 +0200, Peter Zijlstra wrote:
> > OK, this one seems to be better.. But its quite vile, not sure I
> > actually like it anymore.
> >
> There's also a small matter of it actually being slower than what we had
> when tested with: strace hackbench > /dev/null 2>&1.
>
> curses.. even adding a busy wait on p->on_cpu in front.
Gah, it's one of those benchmarks where if you rebuild+reboot the numbers
change. Latest patch here..
---
Subject: sched: Rewrite wait_task_inactive()
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Wed Sep 21 21:34:05 CEST 2011
Instead of using a combination of busy-waits and sleeps to poll a
task's state, ensure we get a notification and properly block.
This makes preempt notifiers unconditional, but since pretty much
everybody already has those enabled in their config it shouldn't be a
big deal (I hope).
The notification is somewhat horrid since the sched_out notifier is
called with rq->lock held and interrupts disabled, so we can't
actually do the wakeup from there. However, the sched_in notifier is
called without holding rq->lock and with interrupts enabled.
So what we do is register a preempt notifier on the task we want; that
notifier will move itself to the next task passed into the sched_out
function, and the sched_in notification of the next task will do the
actual work... most horrid.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/ia64/kvm/Kconfig | 1
arch/powerpc/kvm/Kconfig | 1
arch/s390/kvm/Kconfig | 1
arch/tile/kvm/Kconfig | 1
arch/x86/kvm/Kconfig | 1
include/linux/kvm_host.h | 2
include/linux/preempt.h | 4
include/linux/sched.h | 2
init/Kconfig | 3
kernel/sched.c | 191 +++++++++++++++++++++--------------------------
10 files changed, 86 insertions(+), 121 deletions(-)
Index: linux-2.6/arch/ia64/kvm/Kconfig
===================================================================
--- linux-2.6.orig/arch/ia64/kvm/Kconfig
+++ linux-2.6/arch/ia64/kvm/Kconfig
@@ -22,7 +22,6 @@ config KVM
depends on HAVE_KVM && MODULES && EXPERIMENTAL
# for device assignment:
depends on PCI
- select PREEMPT_NOTIFIERS
select ANON_INODES
select HAVE_KVM_IRQCHIP
select KVM_APIC_ARCHITECTURE
Index: linux-2.6/arch/powerpc/kvm/Kconfig
===================================================================
--- linux-2.6.orig/arch/powerpc/kvm/Kconfig
+++ linux-2.6/arch/powerpc/kvm/Kconfig
@@ -18,7 +18,6 @@ if VIRTUALIZATION
config KVM
bool
- select PREEMPT_NOTIFIERS
select ANON_INODES
config KVM_BOOK3S_HANDLER
Index: linux-2.6/arch/s390/kvm/Kconfig
===================================================================
--- linux-2.6.orig/arch/s390/kvm/Kconfig
+++ linux-2.6/arch/s390/kvm/Kconfig
@@ -19,7 +19,6 @@ config KVM
def_tristate y
prompt "Kernel-based Virtual Machine (KVM) support"
depends on HAVE_KVM && EXPERIMENTAL
- select PREEMPT_NOTIFIERS
select ANON_INODES
---help---
Support hosting paravirtualized guest machines using the SIE
Index: linux-2.6/arch/tile/kvm/Kconfig
===================================================================
--- linux-2.6.orig/arch/tile/kvm/Kconfig
+++ linux-2.6/arch/tile/kvm/Kconfig
@@ -19,7 +19,6 @@ if VIRTUALIZATION
config KVM
tristate "Kernel-based Virtual Machine (KVM) support"
depends on HAVE_KVM && MODULES && EXPERIMENTAL
- select PREEMPT_NOTIFIERS
select ANON_INODES
---help---
Support hosting paravirtualized guest machines.
Index: linux-2.6/arch/x86/kvm/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86/kvm/Kconfig
+++ linux-2.6/arch/x86/kvm/Kconfig
@@ -24,7 +24,6 @@ config KVM
depends on PCI
# for TASKSTATS/TASK_DELAY_ACCT:
depends on NET
- select PREEMPT_NOTIFIERS
select MMU_NOTIFIER
select ANON_INODES
select HAVE_KVM_IRQCHIP
Index: linux-2.6/include/linux/kvm_host.h
===================================================================
--- linux-2.6.orig/include/linux/kvm_host.h
+++ linux-2.6/include/linux/kvm_host.h
@@ -111,9 +111,7 @@ enum {
struct kvm_vcpu {
struct kvm *kvm;
-#ifdef CONFIG_PREEMPT_NOTIFIERS
struct preempt_notifier preempt_notifier;
-#endif
int cpu;
int vcpu_id;
int srcu_idx;
Index: linux-2.6/include/linux/preempt.h
===================================================================
--- linux-2.6.orig/include/linux/preempt.h
+++ linux-2.6/include/linux/preempt.h
@@ -101,8 +101,6 @@ do { \
#endif /* CONFIG_PREEMPT_COUNT */
-#ifdef CONFIG_PREEMPT_NOTIFIERS
-
struct preempt_notifier;
/**
@@ -147,6 +145,4 @@ static inline void preempt_notifier_init
notifier->ops = ops;
}
-#endif
-
#endif /* __LINUX_PREEMPT_H */
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1236,10 +1236,8 @@ struct task_struct {
struct sched_entity se;
struct sched_rt_entity rt;
-#ifdef CONFIG_PREEMPT_NOTIFIERS
/* list of struct preempt_notifier: */
struct hlist_head preempt_notifiers;
-#endif
/*
* fpu_counter contains the number of consecutive context switches
Index: linux-2.6/init/Kconfig
===================================================================
--- linux-2.6.orig/init/Kconfig
+++ linux-2.6/init/Kconfig
@@ -1403,9 +1403,6 @@ config STOP_MACHINE
source "block/Kconfig"
-config PREEMPT_NOTIFIERS
- bool
-
config PADATA
depends on SMP
bool
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2387,6 +2387,54 @@ struct migration_arg {
static int migration_cpu_stop(void *data);
+struct wait_task_inactive_blocked {
+ struct preempt_notifier notifier;
+ struct task_struct *waiter;
+};
+
+static void wait_task_inactive_sched_in(struct preempt_notifier *n, int cpu)
+{
+ struct task_struct *p;
+ struct wait_task_inactive_blocked *blocked =
+ container_of(n, struct wait_task_inactive_blocked, notifier);
+
+ hlist_del(&n->link);
+
+ p = ACCESS_ONCE(blocked->waiter);
+ blocked->waiter = NULL;
+ wake_up_process(p);
+}
+
+static void
+preempt_ops_sched_out_nop(struct preempt_notifier *n, struct task_struct *next)
+{
+}
+
+static struct preempt_ops wait_task_inactive_ops_post = {
+ .sched_in = wait_task_inactive_sched_in,
+ .sched_out = preempt_ops_sched_out_nop,
+};
+
+static void preempt_ops_sched_in_nop(struct preempt_notifier *n, int cpu)
+{
+}
+
+static void
+wait_task_inactive_sched_out(struct preempt_notifier *n, struct task_struct *next)
+{
+ if (current->on_rq) /* we're not inactive yet */
+ return;
+
+ hlist_del(&n->link);
+ n->ops = &wait_task_inactive_ops_post;
+ hlist_add_head(&n->link, &next->preempt_notifiers);
+}
+
+static struct preempt_ops wait_task_inactive_ops_pre = {
+ .sched_in = preempt_ops_sched_in_nop,
+ .sched_out = wait_task_inactive_sched_out,
+};
+
/*
* wait_task_inactive - wait for a thread to unschedule.
*
@@ -2405,93 +2453,49 @@ static int migration_cpu_stop(void *data
*/
unsigned long wait_task_inactive(struct task_struct *p, long match_state)
{
+ unsigned long ncsw = 0;
unsigned long flags;
- int running, on_rq;
- unsigned long ncsw;
struct rq *rq;
- for (;;) {
- /*
- * We do the initial early heuristics without holding
- * any task-queue locks at all. We'll only try to get
- * the runqueue lock when things look like they will
- * work out!
- */
- rq = task_rq(p);
-
- /*
- * If the task is actively running on another CPU
- * still, just relax and busy-wait without holding
- * any locks.
- *
- * NOTE! Since we don't hold any locks, it's not
- * even sure that "rq" stays as the right runqueue!
- * But we don't care, since "task_running()" will
- * return false if the runqueue has changed and p
- * is actually now running somewhere else!
- */
- while (task_running(rq, p)) {
- if (match_state && unlikely(p->state != match_state))
- return 0;
- cpu_relax();
- }
-
- /*
- * Ok, time to look more closely! We need the rq
- * lock now, to be *sure*. If we're wrong, we'll
- * just go back and repeat.
- */
- rq = task_rq_lock(p, &flags);
- trace_sched_wait_task(p);
- running = task_running(rq, p);
- on_rq = p->on_rq;
- ncsw = 0;
- if (!match_state || p->state == match_state)
- ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
- task_rq_unlock(rq, p, &flags);
-
- /*
- * If it changed from the expected state, bail out now.
- */
- if (unlikely(!ncsw))
- break;
+ struct wait_task_inactive_blocked blocked = {
+ .notifier = {
+ .ops = &wait_task_inactive_ops_pre,
+ },
+ .waiter = current,
+ };
- /*
- * Was it really running after all now that we
- * checked with the proper locks actually held?
- *
- * Oops. Go back and try again..
- */
- if (unlikely(running)) {
- cpu_relax();
- continue;
- }
+ /*
+ * Busy wait for the task to stop running, the caller promised it
+ * wouldn't be long.
+ */
+ while (p->on_cpu) {
+ /* if we don't match the expected state, bail */
+ if (match_state && unlikely(p->state != match_state))
+ return 0;
+ cpu_relax();
+ }
- /*
- * It's not enough that it's not actively running,
- * it must be off the runqueue _entirely_, and not
- * preempted!
- *
- * So if it was still runnable (but just not actively
- * running right now), it's preempted, and we should
- * yield - it could be a while.
- */
- if (unlikely(on_rq)) {
- ktime_t to = ktime_set(0, NSEC_PER_SEC/HZ);
+ rq = task_rq_lock(p, &flags);
+ trace_sched_wait_task(p);
+ if (!p->on_rq) /* we're already blocked */
+ goto done;
- set_current_state(TASK_UNINTERRUPTIBLE);
- schedule_hrtimeout(&to, HRTIMER_MODE_REL);
- continue;
- }
+ hlist_add_head(&blocked.notifier.link, &p->preempt_notifiers);
+ task_rq_unlock(rq, p, &flags);
- /*
- * Ahh, all good. It wasn't running, and it wasn't
- * runnable, which means that it will never become
- * running in the future either. We're all done!
- */
- break;
+ for (;;) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ if (!blocked.waiter)
+ break;
+ schedule();
}
+ __set_current_state(TASK_RUNNING);
+ rq = task_rq_lock(p, &flags);
+done:
+ if (!match_state || p->state == match_state)
+ ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
+ task_rq_unlock(rq, p, &flags);
return ncsw;
}
@@ -2519,9 +2523,7 @@ void kick_process(struct task_struct *p)
preempt_enable();
}
EXPORT_SYMBOL_GPL(kick_process);
-#endif /* CONFIG_SMP */
-#ifdef CONFIG_SMP
/*
* ->cpus_allowed is protected by both rq->lock and p->pi_lock
*/
@@ -2967,10 +2969,7 @@ static void __sched_fork(struct task_str
#endif
INIT_LIST_HEAD(&p->rt.run_list);
-
-#ifdef CONFIG_PREEMPT_NOTIFIERS
INIT_HLIST_HEAD(&p->preempt_notifiers);
-#endif
}
/*
@@ -3084,8 +3083,6 @@ void wake_up_new_task(struct task_struct
task_rq_unlock(rq, p, &flags);
}
-#ifdef CONFIG_PREEMPT_NOTIFIERS
-
/**
* preempt_notifier_register - tell me when current is being preempted & rescheduled
* @notifier: notifier struct to register
@@ -3111,9 +3108,9 @@ EXPORT_SYMBOL_GPL(preempt_notifier_unreg
static void fire_sched_in_preempt_notifiers(struct task_struct *curr)
{
struct preempt_notifier *notifier;
- struct hlist_node *node;
+ struct hlist_node *node, *n;
- hlist_for_each_entry(notifier, node, &curr->preempt_notifiers, link)
+ hlist_for_each_entry_safe(notifier, node, n, &curr->preempt_notifiers, link)
notifier->ops->sched_in(notifier, raw_smp_processor_id());
}
@@ -3122,26 +3119,12 @@ fire_sched_out_preempt_notifiers(struct
struct task_struct *next)
{
struct preempt_notifier *notifier;
- struct hlist_node *node;
+ struct hlist_node *node, *n;
- hlist_for_each_entry(notifier, node, &curr->preempt_notifiers, link)
+ hlist_for_each_entry_safe(notifier, node, n, &curr->preempt_notifiers, link)
notifier->ops->sched_out(notifier, next);
}
-#else /* !CONFIG_PREEMPT_NOTIFIERS */
-
-static void fire_sched_in_preempt_notifiers(struct task_struct *curr)
-{
-}
-
-static void
-fire_sched_out_preempt_notifiers(struct task_struct *curr,
- struct task_struct *next)
-{
-}
-
-#endif /* CONFIG_PREEMPT_NOTIFIERS */
-
/**
* prepare_task_switch - prepare to switch tasks
* @rq: the runqueue preparing to switch
@@ -8312,9 +8295,7 @@ void __init sched_init(void)
set_load_weight(&init_task);
-#ifdef CONFIG_PREEMPT_NOTIFIERS
INIT_HLIST_HEAD(&init_task.preempt_notifiers);
-#endif
#ifdef CONFIG_SMP
open_softirq(SCHED_SOFTIRQ, run_rebalance_domains);
* Re: rt14: strace -> migrate_disable_atomic imbalance
@ 2011-09-22 11:46 ` Peter Zijlstra
0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2011-09-22 11:46 UTC (permalink / raw)
To: Mike Galbraith
Cc: linux-rt-users, Thomas Gleixner, LKML, Oleg Nesterov,
Miklos Szeredi, mingo
On Thu, 2011-09-22 at 13:31 +0200, Peter Zijlstra wrote:
> On Thu, 2011-09-22 at 12:00 +0200, Peter Zijlstra wrote:
> > OK, this one seems to be better.. But its quite vile, not sure I
> > actually like it anymore.
> >
> There's also a small matter of it actually being slower than what we had
> when tested with: strace hackbench > /dev/null 2>&1.
>
> curses.. even adding a busy wait on p->on_cpu in front.
Gah, its one of those benchmarks where if you rebuild+reboot the numbers
change. Latest patch here..
---
Subject: sched: Rewrite wait_task_inactive()
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Wed Sep 21 21:34:05 CEST 2011
Instead of using a combination of busy-waits and sleeps to poll a
task's state, ensure we get a notification and properly block.
This makes preempt notifiers unconditional, but since pretty much
everybody already has those enabled in their config it shouldn't be a
big deal (I hope).
The notification is somewhat horrid since the sched_out notifier is
called with rq->lock held and interrupts disabled, so we can't
actually do the wakeup from there. However the sched_in notifier is
called without holding rq->lock and with interrupts enable.
So what we do is we register a preempt notifier on the task we want,
that notifier will move itself to the next task passed into the
sched_out function, the sched_in notification of the next task, will do
the actual work... most horrid.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/ia64/kvm/Kconfig | 1
arch/powerpc/kvm/Kconfig | 1
arch/s390/kvm/Kconfig | 1
arch/tile/kvm/Kconfig | 1
arch/x86/kvm/Kconfig | 1
include/linux/kvm_host.h | 2
include/linux/preempt.h | 4
include/linux/sched.h | 2
init/Kconfig | 3
kernel/sched.c | 191 +++++++++++++++++++++--------------------------
10 files changed, 86 insertions(+), 121 deletions(-)
Index: linux-2.6/arch/ia64/kvm/Kconfig
===================================================================
--- linux-2.6.orig/arch/ia64/kvm/Kconfig
+++ linux-2.6/arch/ia64/kvm/Kconfig
@@ -22,7 +22,6 @@ config KVM
depends on HAVE_KVM && MODULES && EXPERIMENTAL
# for device assignment:
depends on PCI
- select PREEMPT_NOTIFIERS
select ANON_INODES
select HAVE_KVM_IRQCHIP
select KVM_APIC_ARCHITECTURE
Index: linux-2.6/arch/powerpc/kvm/Kconfig
===================================================================
--- linux-2.6.orig/arch/powerpc/kvm/Kconfig
+++ linux-2.6/arch/powerpc/kvm/Kconfig
@@ -18,7 +18,6 @@ if VIRTUALIZATION
config KVM
bool
- select PREEMPT_NOTIFIERS
select ANON_INODES
config KVM_BOOK3S_HANDLER
Index: linux-2.6/arch/s390/kvm/Kconfig
===================================================================
--- linux-2.6.orig/arch/s390/kvm/Kconfig
+++ linux-2.6/arch/s390/kvm/Kconfig
@@ -19,7 +19,6 @@ config KVM
def_tristate y
prompt "Kernel-based Virtual Machine (KVM) support"
depends on HAVE_KVM && EXPERIMENTAL
- select PREEMPT_NOTIFIERS
select ANON_INODES
---help---
Support hosting paravirtualized guest machines using the SIE
Index: linux-2.6/arch/tile/kvm/Kconfig
===================================================================
--- linux-2.6.orig/arch/tile/kvm/Kconfig
+++ linux-2.6/arch/tile/kvm/Kconfig
@@ -19,7 +19,6 @@ if VIRTUALIZATION
config KVM
tristate "Kernel-based Virtual Machine (KVM) support"
depends on HAVE_KVM && MODULES && EXPERIMENTAL
- select PREEMPT_NOTIFIERS
select ANON_INODES
---help---
Support hosting paravirtualized guest machines.
Index: linux-2.6/arch/x86/kvm/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86/kvm/Kconfig
+++ linux-2.6/arch/x86/kvm/Kconfig
@@ -24,7 +24,6 @@ config KVM
depends on PCI
# for TASKSTATS/TASK_DELAY_ACCT:
depends on NET
- select PREEMPT_NOTIFIERS
select MMU_NOTIFIER
select ANON_INODES
select HAVE_KVM_IRQCHIP
Index: linux-2.6/include/linux/kvm_host.h
===================================================================
--- linux-2.6.orig/include/linux/kvm_host.h
+++ linux-2.6/include/linux/kvm_host.h
@@ -111,9 +111,7 @@ enum {
struct kvm_vcpu {
struct kvm *kvm;
-#ifdef CONFIG_PREEMPT_NOTIFIERS
struct preempt_notifier preempt_notifier;
-#endif
int cpu;
int vcpu_id;
int srcu_idx;
Index: linux-2.6/include/linux/preempt.h
===================================================================
--- linux-2.6.orig/include/linux/preempt.h
+++ linux-2.6/include/linux/preempt.h
@@ -101,8 +101,6 @@ do { \
#endif /* CONFIG_PREEMPT_COUNT */
-#ifdef CONFIG_PREEMPT_NOTIFIERS
-
struct preempt_notifier;
/**
@@ -147,6 +145,4 @@ static inline void preempt_notifier_init
notifier->ops = ops;
}
-#endif
-
#endif /* __LINUX_PREEMPT_H */
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1236,10 +1236,8 @@ struct task_struct {
struct sched_entity se;
struct sched_rt_entity rt;
-#ifdef CONFIG_PREEMPT_NOTIFIERS
/* list of struct preempt_notifier: */
struct hlist_head preempt_notifiers;
-#endif
/*
* fpu_counter contains the number of consecutive context switches
Index: linux-2.6/init/Kconfig
===================================================================
--- linux-2.6.orig/init/Kconfig
+++ linux-2.6/init/Kconfig
@@ -1403,9 +1403,6 @@ config STOP_MACHINE
source "block/Kconfig"
-config PREEMPT_NOTIFIERS
- bool
-
config PADATA
depends on SMP
bool
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2387,6 +2387,54 @@ struct migration_arg {
static int migration_cpu_stop(void *data);
+struct wait_task_inactive_blocked {
+ struct preempt_notifier notifier;
+ struct task_struct *waiter;
+};
+
+static void wait_task_inactive_sched_in(struct preempt_notifier *n, int cpu)
+{
+ struct task_struct *p;
+ struct wait_task_inactive_blocked *blocked =
+ container_of(n, struct wait_task_inactive_blocked, notifier);
+
+ hlist_del(&n->link);
+
+ p = ACCESS_ONCE(blocked->waiter);
+ blocked->waiter = NULL;
+ wake_up_process(p);
+}
+
+static void
+preempt_ops_sched_out_nop(struct preempt_notifier *n, struct task_struct *next)
+{
+}
+
+static struct preempt_ops wait_task_inactive_ops_post = {
+ .sched_in = wait_task_inactive_sched_in,
+ .sched_out = preempt_ops_sched_out_nop,
+};
+
+static void preempt_ops_sched_in_nop(struct preempt_notifier *n, int cpu)
+{
+}
+
+static void
+wait_task_inactive_sched_out(struct preempt_notifier *n, struct task_struct *next)
+{
+ if (current->on_rq) /* we're not inactive yet */
+ return;
+
+ hlist_del(&n->link);
+ n->ops = &wait_task_inactive_ops_post;
+ hlist_add_head(&n->link, &next->preempt_notifiers);
+}
+
+static struct preempt_ops wait_task_inactive_ops_pre = {
+ .sched_in = preempt_ops_sched_in_nop,
+ .sched_out = wait_task_inactive_sched_out,
+};
+
/*
* wait_task_inactive - wait for a thread to unschedule.
*
@@ -2405,93 +2453,49 @@ static int migration_cpu_stop(void *data
*/
unsigned long wait_task_inactive(struct task_struct *p, long match_state)
{
+ unsigned long ncsw = 0;
unsigned long flags;
- int running, on_rq;
- unsigned long ncsw;
struct rq *rq;
- for (;;) {
- /*
- * We do the initial early heuristics without holding
- * any task-queue locks at all. We'll only try to get
- * the runqueue lock when things look like they will
- * work out!
- */
- rq = task_rq(p);
-
- /*
- * If the task is actively running on another CPU
- * still, just relax and busy-wait without holding
- * any locks.
- *
- * NOTE! Since we don't hold any locks, it's not
- * even sure that "rq" stays as the right runqueue!
- * But we don't care, since "task_running()" will
- * return false if the runqueue has changed and p
- * is actually now running somewhere else!
- */
- while (task_running(rq, p)) {
- if (match_state && unlikely(p->state != match_state))
- return 0;
- cpu_relax();
- }
-
- /*
- * Ok, time to look more closely! We need the rq
- * lock now, to be *sure*. If we're wrong, we'll
- * just go back and repeat.
- */
- rq = task_rq_lock(p, &flags);
- trace_sched_wait_task(p);
- running = task_running(rq, p);
- on_rq = p->on_rq;
- ncsw = 0;
- if (!match_state || p->state == match_state)
- ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
- task_rq_unlock(rq, p, &flags);
-
- /*
- * If it changed from the expected state, bail out now.
- */
- if (unlikely(!ncsw))
- break;
+ struct wait_task_inactive_blocked blocked = {
+ .notifier = {
+ .ops = &wait_task_inactive_ops_pre,
+ },
+ .waiter = current,
+ };
- /*
- * Was it really running after all now that we
- * checked with the proper locks actually held?
- *
- * Oops. Go back and try again..
- */
- if (unlikely(running)) {
- cpu_relax();
- continue;
- }
+ /*
+ * Busy wait for the task to stop running, the caller promised it
+ * wouldn't be long.
+ */
+ while (p->on_cpu) {
+ /* if we don't match the expected state, bail */
+ if (match_state && unlikely(p->state != match_state))
+ return 0;
+ cpu_relax();
+ }
- /*
- * It's not enough that it's not actively running,
- * it must be off the runqueue _entirely_, and not
- * preempted!
- *
- * So if it was still runnable (but just not actively
- * running right now), it's preempted, and we should
- * yield - it could be a while.
- */
- if (unlikely(on_rq)) {
- ktime_t to = ktime_set(0, NSEC_PER_SEC/HZ);
+ rq = task_rq_lock(p, &flags);
+ trace_sched_wait_task(p);
+ if (!p->on_rq) /* we're already blocked */
+ goto done;
- set_current_state(TASK_UNINTERRUPTIBLE);
- schedule_hrtimeout(&to, HRTIMER_MODE_REL);
- continue;
- }
+ hlist_add_head(&blocked.notifier.link, &p->preempt_notifiers);
+ task_rq_unlock(rq, p, &flags);
- /*
- * Ahh, all good. It wasn't running, and it wasn't
- * runnable, which means that it will never become
- * running in the future either. We're all done!
- */
- break;
+ for (;;) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ if (!blocked.waiter)
+ break;
+ schedule();
}
+ __set_current_state(TASK_RUNNING);
+ rq = task_rq_lock(p, &flags);
+done:
+ if (!match_state || p->state == match_state)
+ ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
+ task_rq_unlock(rq, p, &flags);
return ncsw;
}
@@ -2519,9 +2523,7 @@ void kick_process(struct task_struct *p)
preempt_enable();
}
EXPORT_SYMBOL_GPL(kick_process);
-#endif /* CONFIG_SMP */
-#ifdef CONFIG_SMP
/*
* ->cpus_allowed is protected by both rq->lock and p->pi_lock
*/
@@ -2967,10 +2969,7 @@ static void __sched_fork(struct task_str
#endif
INIT_LIST_HEAD(&p->rt.run_list);
-
-#ifdef CONFIG_PREEMPT_NOTIFIERS
INIT_HLIST_HEAD(&p->preempt_notifiers);
-#endif
}
/*
@@ -3084,8 +3083,6 @@ void wake_up_new_task(struct task_struct
task_rq_unlock(rq, p, &flags);
}
-#ifdef CONFIG_PREEMPT_NOTIFIERS
-
/**
* preempt_notifier_register - tell me when current is being preempted & rescheduled
* @notifier: notifier struct to register
@@ -3111,9 +3108,9 @@ EXPORT_SYMBOL_GPL(preempt_notifier_unreg
static void fire_sched_in_preempt_notifiers(struct task_struct *curr)
{
struct preempt_notifier *notifier;
- struct hlist_node *node;
+ struct hlist_node *node, *n;
- hlist_for_each_entry(notifier, node, &curr->preempt_notifiers, link)
+ hlist_for_each_entry_safe(notifier, node, n, &curr->preempt_notifiers, link)
notifier->ops->sched_in(notifier, raw_smp_processor_id());
}
@@ -3122,26 +3119,12 @@ fire_sched_out_preempt_notifiers(struct
struct task_struct *next)
{
struct preempt_notifier *notifier;
- struct hlist_node *node;
+ struct hlist_node *node, *n;
- hlist_for_each_entry(notifier, node, &curr->preempt_notifiers, link)
+ hlist_for_each_entry_safe(notifier, node, n, &curr->preempt_notifiers, link)
notifier->ops->sched_out(notifier, next);
}
-#else /* !CONFIG_PREEMPT_NOTIFIERS */
-
-static void fire_sched_in_preempt_notifiers(struct task_struct *curr)
-{
-}
-
-static void
-fire_sched_out_preempt_notifiers(struct task_struct *curr,
- struct task_struct *next)
-{
-}
-
-#endif /* CONFIG_PREEMPT_NOTIFIERS */
* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-22 11:46 ` Peter Zijlstra
@ 2011-09-22 14:52 ` Oleg Nesterov
2011-09-22 15:13 ` Peter Zijlstra
-1 siblings, 1 reply; 52+ messages in thread
From: Oleg Nesterov @ 2011-09-22 14:52 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Mike Galbraith, linux-rt-users, Thomas Gleixner, LKML,
Miklos Szeredi, mingo
On 09/22, Peter Zijlstra wrote:
>
> +static void wait_task_inactive_sched_in(struct preempt_notifier *n, int cpu)
> +{
> + struct task_struct *p;
> + struct wait_task_inactive_blocked *blocked =
> + container_of(n, struct wait_task_inactive_blocked, notifier);
> +
> + hlist_del(&n->link);
> +
> + p = ACCESS_ONCE(blocked->waiter);
> + blocked->waiter = NULL;
> + wake_up_process(p);
> +}
> ...
> +static void
> +wait_task_inactive_sched_out(struct preempt_notifier *n, struct task_struct *next)
> +{
> + if (current->on_rq) /* we're not inactive yet */
> + return;
> +
> + hlist_del(&n->link);
> + n->ops = &wait_task_inactive_ops_post;
> + hlist_add_head(&n->link, &next->preempt_notifiers);
> +}
Tricky ;) Yes, the first ->sched_out() is not enough.
> unsigned long wait_task_inactive(struct task_struct *p, long match_state)
> {
> ...
> + rq = task_rq_lock(p, &flags);
> + trace_sched_wait_task(p);
> + if (!p->on_rq) /* we're already blocked */
> + goto done;
This doesn't look right. schedule() clears ->on_rq long before
__switch_to/etc.
And it seems that we check ->on_cpu above; this is not UP friendly.
>
> - set_current_state(TASK_UNINTERRUPTIBLE);
> - schedule_hrtimeout(&to, HRTIMER_MODE_REL);
> - continue;
> - }
> + hlist_add_head(&blocked.notifier.link, &p->preempt_notifiers);
> + task_rq_unlock(rq, p, &flags);
I thought about reimplementing wait_task_inactive() too, but afaics there
is a problem: why can't we race with p doing register_preempt_notifier()?
I guess register_ needs rq->lock too.
Oleg.
* Re: rt14: strace -> migrate_disable_atomic imbalance
2011-09-22 14:52 ` Oleg Nesterov
@ 2011-09-22 15:13 ` Peter Zijlstra
0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2011-09-22 15:13 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Mike Galbraith, linux-rt-users, Thomas Gleixner, LKML,
Miklos Szeredi, mingo
On Thu, 2011-09-22 at 16:52 +0200, Oleg Nesterov wrote:
> On 09/22, Peter Zijlstra wrote:
> >
> > +static void wait_task_inactive_sched_in(struct preempt_notifier *n, int cpu)
> > +{
> > + struct task_struct *p;
> > + struct wait_task_inactive_blocked *blocked =
> > + container_of(n, struct wait_task_inactive_blocked, notifier);
> > +
> > + hlist_del(&n->link);
> > +
> > + p = ACCESS_ONCE(blocked->waiter);
> > + blocked->waiter = NULL;
> > + wake_up_process(p);
> > +}
> > ...
> > +static void
> > +wait_task_inactive_sched_out(struct preempt_notifier *n, struct task_struct *next)
> > +{
> > + if (current->on_rq) /* we're not inactive yet */
> > + return;
> > +
> > + hlist_del(&n->link);
> > + n->ops = &wait_task_inactive_ops_post;
> > + hlist_add_head(&n->link, &next->preempt_notifiers);
> > +}
>
> Tricky ;) Yes, the first ->sched_out() is not enough.
Not enough isn't the problem; it's run with rq->lock held and irqs
disabled, so you simply cannot do ttwu() from there.
If we could, the subsequent task_rq_lock() in wait_task_inactive() would
be enough to serialize against the still in-flight context switch.
One of the problems with doing it from the next task's sched_in notifier is
that next can be idle, and then we do an A -> idle -> B switch, which is
of course sub-optimal.
> > unsigned long wait_task_inactive(struct task_struct *p, long match_state)
> > {
> > ...
> > + rq = task_rq_lock(p, &flags);
> > + trace_sched_wait_task(p);
> > + if (!p->on_rq) /* we're already blocked */
> > + goto done;
>
> This doesn't look right. schedule() clears ->on_rq a long before
> __switch_to/etc.
Oh, bugger, yes, it's before we can drop the rq for idle balance and
nonsense like that. (!p->on_rq && !p->on_cpu) should suffice, I think.
> And it seems that we check ->on_cpu above, this is not UP friendly.
True, but it's what the old code did.. and I was seeing performance
suckage compared to the unpatched kernel (not that the p->on_cpu busy
wait fixed it)...
> >
> > - set_current_state(TASK_UNINTERRUPTIBLE);
> > - schedule_hrtimeout(&to, HRTIMER_MODE_REL);
> > - continue;
> > - }
> > + hlist_add_head(&blocked.notifier.link, &p->preempt_notifiers);
> > + task_rq_unlock(rq, p, &flags);
>
> I thought about reimplementing wait_task_inactive() too, but afaics there
> is a problem: why we can't race with p doing register_preempt_notifier() ?
> I guess register_ needs rq->lock too.
We can actually, now you mention it.. ->pi_lock would be sufficient and
less expensive to acquire.