* [RFC] sched: make callers check lock contention for cond_resched_lock()
@ 2012-05-03  8:12 Takuya Yoshikawa
  2012-05-03  8:35 ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Takuya Yoshikawa @ 2012-05-03  8:12 UTC (permalink / raw)
  To: peterz, mingo
  Cc: linux-kernel, linux-fsdevel, kvm, avi, mtosatti, yoshikawa.takuya

This patch is meant to show what I am thinking and has only been
compile-tested on linux-next, hence the RFC.

Although I may be misreading something, I am not sure that every user of
this API wants to skip the contention check when CONFIG_PREEMPT is not set.
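
To make the intent concrete: with this patch, a caller that wants the
current behavior writes

	cond_resched_lock(&lock, spin_needbreak(&lock));

while a caller that wants to react to contention even without
CONFIG_PREEMPT can pass spin_is_contended() instead, as the kvm/mmu.c
hunk below does:

	cond_resched_lock(&vcpu->kvm->mmu_lock,
			  spin_is_contended(&vcpu->kvm->mmu_lock));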

Any comments will be appreciated.

Thanks,
	Takuya

===
From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>

While doing kvm development, we found a case in which we wanted to break
out of a critical section on lock contention even without CONFIG_PREEMPT.

Although we can do that using spin_is_contended() and cond_resched(),
changing cond_resched_lock() to satisfy such a need is another option.
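
For comparison, the open-coded variant inside a lock-holding loop would
look roughly like this (just a sketch, not part of this patch):

	if (spin_is_contended(&lock)) {
		spin_unlock(&lock);
		cond_resched();
		spin_lock(&lock);
	}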

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 arch/x86/kvm/mmu.c       |    3 ++-
 fs/btrfs/extent_io.c     |    2 +-
 fs/btrfs/inode.c         |    3 ++-
 fs/btrfs/ordered-data.c  |    3 ++-
 fs/btrfs/relocation.c    |    3 ++-
 fs/dcache.c              |    3 ++-
 fs/fscache/object.c      |    3 ++-
 fs/jbd/commit.c          |    6 ++++--
 fs/jbd2/commit.c         |    3 ++-
 fs/nfs/nfs4filelayout.c  |    2 +-
 fs/nfs/write.c           |    2 +-
 fs/ocfs2/dlm/dlmdomain.c |    5 +++--
 fs/ocfs2/dlm/dlmthread.c |    3 ++-
 fs/reiserfs/journal.c    |    4 ++--
 include/linux/sched.h    |    6 +++---
 kernel/sched/core.c      |    4 ++--
 16 files changed, 33 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 07424cf..3361ee3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1704,7 +1704,8 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
 			mmu_pages_clear_parents(&parents);
 		}
 		kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
-		cond_resched_lock(&vcpu->kvm->mmu_lock);
+		cond_resched_lock(&vcpu->kvm->mmu_lock,
+				  spin_is_contended(&vcpu->kvm->mmu_lock));
 		kvm_mmu_pages_init(parent, &parents, &pages);
 	}
 }
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 198c2ba..cfcc233 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -675,7 +675,7 @@ again:
 		if (start > end)
 			break;
 
-		cond_resched_lock(&tree->lock);
+		cond_resched_lock(&tree->lock, spin_needbreak(&tree->lock));
 	}
 out:
 	spin_unlock(&tree->lock);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 61b16c6..16a6173 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3985,7 +3985,8 @@ again:
 			goto again;
 		}
 
-		if (cond_resched_lock(&root->inode_lock))
+		if (cond_resched_lock(&root->inode_lock,
+				      spin_needbreak(&root->inode_lock)))
 			goto again;
 
 		node = rb_next(node);
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index bbf6d0d..1dfcd6d 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -485,7 +485,8 @@ void btrfs_wait_ordered_extents(struct btrfs_root *root,
 		    !test_bit(BTRFS_ORDERED_PREALLOC, &ordered->flags)) {
 			list_move(&ordered->root_extent_list,
 				  &root->fs_info->ordered_extents);
-			cond_resched_lock(&root->fs_info->ordered_extent_lock);
+			cond_resched_lock(&root->fs_info->ordered_extent_lock,
+				spin_needbreak(&root->fs_info->ordered_extent_lock));
 			continue;
 		}
 
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 646ee21..6102a62 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1471,7 +1471,8 @@ again:
 		}
 
 		objectid = btrfs_ino(&entry->vfs_inode) + 1;
-		if (cond_resched_lock(&root->inode_lock))
+		if (cond_resched_lock(&root->inode_lock,
+				      spin_needbreak(&root->inode_lock)))
 			goto again;
 
 		node = rb_next(node);
diff --git a/fs/dcache.c b/fs/dcache.c
index 58a6ecf..dccfa62 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -855,7 +855,8 @@ relock:
 			if (!--count)
 				break;
 		}
-		cond_resched_lock(&dcache_lru_lock);
+		cond_resched_lock(&dcache_lru_lock,
+				  spin_needbreak(&dcache_lru_lock));
 	}
 	if (!list_empty(&referenced))
 		list_splice(&referenced, &sb->s_dentry_lru);
diff --git a/fs/fscache/object.c b/fs/fscache/object.c
index b6b897c..9db99c6 100644
--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -824,7 +824,8 @@ static void fscache_enqueue_dependents(struct fscache_object *object)
 		fscache_put_object(dep);
 
 		if (!list_empty(&object->dependents))
-			cond_resched_lock(&object->lock);
+			cond_resched_lock(&object->lock,
+					  spin_needbreak(&object->lock));
 	}
 
 	spin_unlock(&object->lock);
diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
index 52c15c7..59c60bf 100644
--- a/fs/jbd/commit.c
+++ b/fs/jbd/commit.c
@@ -474,7 +474,8 @@ void journal_commit_transaction(journal_t *journal)
 			__journal_unfile_buffer(jh);
 		jbd_unlock_bh_state(bh);
 		release_data_buffer(bh);
-		cond_resched_lock(&journal->j_list_lock);
+		cond_resched_lock(&journal->j_list_lock,
+				  spin_needbreak(&journal->j_list_lock));
 	}
 	spin_unlock(&journal->j_list_lock);
 
@@ -905,7 +906,8 @@ restart_loop:
 			release_buffer_page(bh);
 		else
 			__brelse(bh);
-		cond_resched_lock(&journal->j_list_lock);
+		cond_resched_lock(&journal->j_list_lock,
+				  spin_needbreak(&journal->j_list_lock));
 	}
 	spin_unlock(&journal->j_list_lock);
 	/*
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 840f70f..5e71afa 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -989,7 +989,8 @@ restart_loop:
 			release_buffer_page(bh);	/* Drops bh reference */
 		else
 			__brelse(bh);
-		cond_resched_lock(&journal->j_list_lock);
+		cond_resched_lock(&journal->j_list_lock,
+				  spin_needbreak(&journal->j_list_lock));
 	}
 	spin_unlock(&journal->j_list_lock);
 	/*
diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 5acfd9e..0536aab 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -946,7 +946,7 @@ filelayout_scan_ds_commit_list(struct nfs4_fl_commit_bucket *bucket, int max,
 	list_for_each_entry_safe(req, tmp, src, wb_list) {
 		if (!nfs_lock_request(req))
 			continue;
-		if (cond_resched_lock(lock))
+		if (cond_resched_lock(lock, spin_needbreak(lock)))
 			list_safe_reset_next(req, tmp, wb_list);
 		nfs_request_remove_commit_list(req);
 		clear_bit(PG_COMMIT_TO_DS, &req->wb_flags);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index c074623..0d83257 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -570,7 +570,7 @@ nfs_scan_commit_list(struct list_head *src, struct list_head *dst, int max,
 	list_for_each_entry_safe(req, tmp, src, wb_list) {
 		if (!nfs_lock_request(req))
 			continue;
-		if (cond_resched_lock(lock))
+		if (cond_resched_lock(lock, spin_needbreak(lock)))
 			list_safe_reset_next(req, tmp, wb_list);
 		nfs_request_remove_commit_list(req);
 		nfs_list_add_request(req, dst);
diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index 9e89d70..5602e4c 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -465,11 +465,12 @@ redo_bucket:
 			dlm_lockres_put(res);
 
 			if (dropped) {
-				cond_resched_lock(&dlm->spinlock);
+				cond_resched_lock(&dlm->spinlock,
+						  spin_needbreak(&dlm->spinlock));
 				goto redo_bucket;
 			}
 		}
-		cond_resched_lock(&dlm->spinlock);
+		cond_resched_lock(&dlm->spinlock, spin_needbreak(&dlm->spinlock));
 		num += n;
 	}
 	spin_unlock(&dlm->spinlock);
diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
index e73c833..ee86242 100644
--- a/fs/ocfs2/dlm/dlmthread.c
+++ b/fs/ocfs2/dlm/dlmthread.c
@@ -276,7 +276,8 @@ static void dlm_run_purge_list(struct dlm_ctxt *dlm,
 		dlm_lockres_put(lockres);
 
 		/* Avoid adding any scheduling latencies */
-		cond_resched_lock(&dlm->spinlock);
+		cond_resched_lock(&dlm->spinlock,
+				  spin_needbreak(&dlm->spinlock));
 	}
 
 	spin_unlock(&dlm->spinlock);
diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
index b1a0857..d58f596 100644
--- a/fs/reiserfs/journal.c
+++ b/fs/reiserfs/journal.c
@@ -835,7 +835,7 @@ static int write_ordered_buffers(spinlock_t * lock,
 		}
 	      loop_next:
 		put_bh(bh);
-		cond_resched_lock(lock);
+		cond_resched_lock(lock, spin_needbreak(lock));
 	}
 	if (chunk.nr) {
 		spin_unlock(lock);
@@ -870,7 +870,7 @@ static int write_ordered_buffers(spinlock_t * lock,
 			spin_lock(lock);
 		}
 		put_bh(bh);
-		cond_resched_lock(lock);
+		cond_resched_lock(lock, spin_needbreak(lock));
 	}
 	spin_unlock(lock);
 	return ret;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7d2acbd..61f4396 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2723,7 +2723,7 @@ extern int _cond_resched(void);
 	_cond_resched();			\
 })
 
-extern int __cond_resched_lock(spinlock_t *lock);
+extern int __cond_resched_lock(spinlock_t *lock, int need_break);
 
 #ifdef CONFIG_PREEMPT_COUNT
 #define PREEMPT_LOCK_OFFSET	PREEMPT_OFFSET
@@ -2731,9 +2731,9 @@ extern int __cond_resched_lock(spinlock_t *lock);
 #define PREEMPT_LOCK_OFFSET	0
 #endif
 
-#define cond_resched_lock(lock) ({				\
+#define cond_resched_lock(lock, need_break) ({			\
 	__might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);	\
-	__cond_resched_lock(lock);				\
+	__cond_resched_lock(lock, need_break);			\
 })
 
 extern int __cond_resched_softirq(void);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 477b998..470113f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4570,14 +4570,14 @@ EXPORT_SYMBOL(_cond_resched);
  * operations here to prevent schedule() from being called twice (once via
  * spin_unlock(), once by hand).
  */
-int __cond_resched_lock(spinlock_t *lock)
+int __cond_resched_lock(spinlock_t *lock, int need_break)
 {
 	int resched = should_resched();
 	int ret = 0;
 
 	lockdep_assert_held(lock);
 
-	if (spin_needbreak(lock) || resched) {
+	if (need_break || resched) {
 		spin_unlock(lock);
 		if (resched)
 			__cond_resched();
-- 
1.7.5.4



* Re: [RFC] sched: make callers check lock contention for cond_resched_lock()
  2012-05-03  8:12 [RFC] sched: make callers check lock contention for cond_resched_lock() Takuya Yoshikawa
@ 2012-05-03  8:35 ` Peter Zijlstra
  2012-05-03 12:22   ` Takuya Yoshikawa
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2012-05-03  8:35 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: mingo, linux-kernel, linux-fsdevel, kvm, avi, mtosatti, yoshikawa.takuya

On Thu, 2012-05-03 at 17:12 +0900, Takuya Yoshikawa wrote:
> 
> Although we can do that using spin_is_contended() and cond_resched(),
> changing cond_resched_lock() to satisfy such a need is another option.
> 
Yeah, not a pretty patch. Changing all cond_resched_lock() sites just to
change one and in such an ugly way too.

So what's the impact of making spin_needbreak() work for !PREEMPT?


* Re: [RFC] sched: make callers check lock contention for cond_resched_lock()
  2012-05-03  8:35 ` Peter Zijlstra
@ 2012-05-03 12:22   ` Takuya Yoshikawa
  2012-05-03 12:29     ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Takuya Yoshikawa @ 2012-05-03 12:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, linux-fsdevel, kvm, avi, mtosatti, yoshikawa.takuya

On Thu, 03 May 2012 10:35:27 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, 2012-05-03 at 17:12 +0900, Takuya Yoshikawa wrote:
> > 
> > Although we can do that using spin_is_contended() and cond_resched(),
> > changing cond_resched_lock() to satisfy such a need is another option.
> > 
> Yeah, not a pretty patch. Changing all cond_resched_lock() sites just to
> change one and in such an ugly way too.
> 
> So what's the impact of making spin_needbreak() work for !PREEMPT?

Although the real use case is out of this RFC patch, we are now discussing
a case in which we may hold a spin_lock for long time, ms order, depending
on workload;  and in that case, other threads -- VCPU threads -- should be
given higher priority for that problematic lock.

I wanted to hear whether other users also have similar needs.  If so, it
may be worth making the API a bit more generic.

But I could not find a clean solution for that.  Do you think that using
spin_is_contended() directly is the way to go?

Thanks,
	Takuya


* Re: [RFC] sched: make callers check lock contention for cond_resched_lock()
  2012-05-03 12:22   ` Takuya Yoshikawa
@ 2012-05-03 12:29     ` Peter Zijlstra
  2012-05-03 12:47       ` Avi Kivity
  2012-05-03 13:00       ` Takuya Yoshikawa
  0 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2012-05-03 12:29 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: mingo, linux-kernel, linux-fsdevel, kvm, avi, mtosatti, yoshikawa.takuya

On Thu, 2012-05-03 at 21:22 +0900, Takuya Yoshikawa wrote:
> Although the real use case is out of this RFC patch, we are now discussing
> a case in which we may hold a spin_lock for long time, ms order, depending
> on workload;  and in that case, other threads -- VCPU threads -- should be
> given higher priority for that problematic lock. 

Firstly, if you can hold a lock that long, it shouldn't be a spinlock,
secondly why isn't TIF_RESCHED being set if it's running that long? That
should still make cond_resched_lock() break.
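
For reference, the unpatched helper is roughly this (kernel/sched/core.c):

	int __cond_resched_lock(spinlock_t *lock)
	{
		int resched = should_resched();	/* checks TIF_NEED_RESCHED */
		int ret = 0;

		lockdep_assert_held(lock);

		if (spin_needbreak(lock) || resched) {
			spin_unlock(lock);
			if (resched)
				__cond_resched();
			else
				cpu_relax();
			ret = 1;
			spin_lock(lock);
		}
		return ret;
	}

so a pending reschedule breaks the critical section no matter what
spin_needbreak() returns.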


* Re: [RFC] sched: make callers check lock contention for cond_resched_lock()
  2012-05-03 12:29     ` Peter Zijlstra
@ 2012-05-03 12:47       ` Avi Kivity
  2012-05-03 14:11         ` Takuya Yoshikawa
  2012-05-03 13:00       ` Takuya Yoshikawa
  1 sibling, 1 reply; 14+ messages in thread
From: Avi Kivity @ 2012-05-03 12:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Takuya Yoshikawa, mingo, linux-kernel, linux-fsdevel, kvm,
	mtosatti, yoshikawa.takuya

On 05/03/2012 03:29 PM, Peter Zijlstra wrote:
> On Thu, 2012-05-03 at 21:22 +0900, Takuya Yoshikawa wrote:
> > Although the real use case is out of this RFC patch, we are now discussing
> > a case in which we may hold a spin_lock for long time, ms order, depending
> > on workload;  and in that case, other threads -- VCPU threads -- should be
> > given higher priority for that problematic lock. 
>
> Firstly, if you can hold a lock that long, it shouldn't be a spinlock,

In fact with your mm preemptibility work it can be made into a mutex, if
the entire mmu notifier path can be done in task context.  However it
ends up a strange mutex - you can sleep while holding it but you may not
allocate, because you might recurse into an mmu notifier again.

Most uses of the lock only involve tweaking some bits though.

> secondly why isn't TIF_RESCHED being set if it's running that long? That
> should still make cond_resched_lock() break.


-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] sched: make callers check lock contention for cond_resched_lock()
  2012-05-03 12:29     ` Peter Zijlstra
  2012-05-03 12:47       ` Avi Kivity
@ 2012-05-03 13:00       ` Takuya Yoshikawa
  2012-05-03 15:47         ` Peter Zijlstra
  2012-05-04  2:43         ` Michael Wang
  1 sibling, 2 replies; 14+ messages in thread
From: Takuya Yoshikawa @ 2012-05-03 13:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, linux-fsdevel, kvm, avi, mtosatti, yoshikawa.takuya

On Thu, 03 May 2012 14:29:10 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, 2012-05-03 at 21:22 +0900, Takuya Yoshikawa wrote:
> > Although the real use case is out of this RFC patch, we are now discussing
> > a case in which we may hold a spin_lock for long time, ms order, depending
> > on workload;  and in that case, other threads -- VCPU threads -- should be
> > given higher priority for that problematic lock. 
> 
> Firstly, if you can hold a lock that long, it shouldn't be a spinlock,

I agree with you in principle, but isn't cond_resched_lock() there for that?

> secondly why isn't TIF_RESCHED being set if it's running that long? That
> should still make cond_resched_lock() break.

I see.

I did some tests using spin_is_contended() and need_resched() and saw
that need_resched() returned true about as often as spin_is_contended(), so
experimentally I understand your point.

But as I could not see why spin_needbreak() was differently implemented
depending on CONFIG_PREEMPT, I wanted to understand the meaning.
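
For reference, the current definition is roughly this
(include/linux/spinlock.h):

	#ifdef CONFIG_PREEMPT
	static inline int spin_needbreak(spinlock_t *lock)
	{
	#ifdef CONFIG_GENERIC_LOCKBREAK
		return lock->break_lock;
	#else
		return spin_is_contended(lock);
	#endif
	}
	#else
	static inline int spin_needbreak(spinlock_t *lock)
	{
		return 0;
	}
	#endif

i.e. without CONFIG_PREEMPT the contention check disappears entirely.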

Thanks,
	Takuya


* Re: [RFC] sched: make callers check lock contention for cond_resched_lock()
  2012-05-03 12:47       ` Avi Kivity
@ 2012-05-03 14:11         ` Takuya Yoshikawa
  2012-05-03 14:27           ` Avi Kivity
  0 siblings, 1 reply; 14+ messages in thread
From: Takuya Yoshikawa @ 2012-05-03 14:11 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Peter Zijlstra, mingo, linux-kernel, linux-fsdevel, kvm,
	mtosatti, yoshikawa.takuya

On Thu, 03 May 2012 15:47:26 +0300
Avi Kivity <avi@redhat.com> wrote:

> On 05/03/2012 03:29 PM, Peter Zijlstra wrote:
> > On Thu, 2012-05-03 at 21:22 +0900, Takuya Yoshikawa wrote:
> > > Although the real use case is out of this RFC patch, we are now discussing
> > > a case in which we may hold a spin_lock for long time, ms order, depending
> > > on workload;  and in that case, other threads -- VCPU threads -- should be
> > > given higher priority for that problematic lock. 
> >
> > Firstly, if you can hold a lock that long, it shouldn't be a spinlock,
> 
> In fact with your mm preemptibility work it can be made into a mutex, if
> the entire mmu notifier path can be done in task context.  However it
> ends up a strange mutex - you can sleep while holding it but you may not
> allocate, because you might recurse into an mmu notifier again.
> 
> Most uses of the lock only involve tweaking some bits though.

I might find a real way to go.

After your "mmu_lock -- TLB-flush" decoupling, we can change the current
get_dirty work flow like this:

	for ... {
		take mmu_lock
		for 4K*8 gfns {		// with 4KB dirty_bitmap_buffer
			xchg dirty bits	// 64/32 gfns at once
			write protect them
		}
		release mmu_lock
		copy_to_user
	}
	TLB flush

This reduces the size of dirty_bitmap_buffer and does not hold mmu_lock
so long.
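
In (untested) C the loop could look something like the sketch below;
write_protect_masked() and the buffer handling are only placeholders for
whatever the real implementation ends up doing:

	for (offset = 0; offset < bitmap_size; offset += BUF_SIZE) {
		int n = min(bitmap_size - offset, BUF_SIZE);	/* BUF_SIZE = 4KB */

		spin_lock(&kvm->mmu_lock);
		for (i = 0; i < n / sizeof(long); i++) {
			unsigned long mask;

			mask = xchg(&dirty_bitmap[offset / sizeof(long) + i], 0);
			buffer[i] = mask;
			if (mask) {
				write_protect_masked(kvm, memslot, offset, i, mask);
				flush = true;
			}
		}
		spin_unlock(&kvm->mmu_lock);

		if (copy_to_user(user_bitmap + offset, buffer, n))
			return -EFAULT;
	}
	if (flush)
		kvm_flush_remote_tlbs(kvm);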

I should have thought of a way not to hold the spin_lock so long, as
Peter said.  My lack of thinking might be the real problem.

Thanks,
	Takuya


* Re: [RFC] sched: make callers check lock contention for cond_resched_lock()
  2012-05-03 14:11         ` Takuya Yoshikawa
@ 2012-05-03 14:27           ` Avi Kivity
  2012-05-03 14:38             ` Avi Kivity
  0 siblings, 1 reply; 14+ messages in thread
From: Avi Kivity @ 2012-05-03 14:27 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: Peter Zijlstra, mingo, linux-kernel, linux-fsdevel, kvm,
	mtosatti, yoshikawa.takuya

On 05/03/2012 05:11 PM, Takuya Yoshikawa wrote:
> On Thu, 03 May 2012 15:47:26 +0300
> Avi Kivity <avi@redhat.com> wrote:
>
> > On 05/03/2012 03:29 PM, Peter Zijlstra wrote:
> > > On Thu, 2012-05-03 at 21:22 +0900, Takuya Yoshikawa wrote:
> > > > Although the real use case is out of this RFC patch, we are now discussing
> > > > a case in which we may hold a spin_lock for long time, ms order, depending
> > > > on workload;  and in that case, other threads -- VCPU threads -- should be
> > > > given higher priority for that problematic lock. 
> > >
> > > Firstly, if you can hold a lock that long, it shouldn't be a spinlock,
> > 
> > In fact with your mm preemptibility work it can be made into a mutex, if
> > the entire mmu notifier path can be done in task context.  However it
> > ends up a strange mutex - you can sleep while holding it but you may not
> > allocate, because you might recurse into an mmu notifier again.
> > 
> > Most uses of the lock only involve tweaking some bits though.
>
> I might find a real way to go.
>
> After your "mmu_lock -- TLB-flush" decoupling, we can change the current
> get_dirty work flow like this:
>
> 	for ... {
> 		take mmu_lock
> 		for 4K*8 gfns {		// with 4KB dirty_bitmap_buffer
> 			xchg dirty bits	// 64/32 gfns at once
> 			write protect them
> 		}
> 		release mmu_lock
> 		copy_to_user
> 	}
> 	TLB flush
>
> This reduces the size of dirty_bitmap_buffer and does not hold mmu_lock
> so long.

Good idea.  Hopefully the lock acquisition costs are low enough - we're
adding two atomic operations per iteration here.


-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] sched: make callers check lock contention for cond_resched_lock()
  2012-05-03 14:27           ` Avi Kivity
@ 2012-05-03 14:38             ` Avi Kivity
  0 siblings, 0 replies; 14+ messages in thread
From: Avi Kivity @ 2012-05-03 14:38 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: Peter Zijlstra, mingo, linux-kernel, linux-fsdevel, kvm,
	mtosatti, yoshikawa.takuya

On 05/03/2012 05:27 PM, Avi Kivity wrote:
> On 05/03/2012 05:11 PM, Takuya Yoshikawa wrote:
> > On Thu, 03 May 2012 15:47:26 +0300
> > Avi Kivity <avi@redhat.com> wrote:
> >
> > > On 05/03/2012 03:29 PM, Peter Zijlstra wrote:
> > > > On Thu, 2012-05-03 at 21:22 +0900, Takuya Yoshikawa wrote:
> > > > > Although the real use case is out of this RFC patch, we are now discussing
> > > > > a case in which we may hold a spin_lock for long time, ms order, depending
> > > > > on workload;  and in that case, other threads -- VCPU threads -- should be
> > > > > given higher priority for that problematic lock. 
> > > >
> > > > Firstly, if you can hold a lock that long, it shouldn't be a spinlock,
> > > 
> > > In fact with your mm preemptibility work it can be made into a mutex, if
> > > the entire mmu notifier path can be done in task context.  However it
> > > ends up a strange mutex - you can sleep while holding it but you may not
> > > allocate, because you might recurse into an mmu notifier again.
> > > 
> > > Most uses of the lock only involve tweaking some bits though.
> >
> > I might find a real way to go.
> >
> > After your "mmu_lock -- TLB-flush" decoupling, we can change the current
> > get_dirty work flow like this:
> >
> > 	for ... {
> > 		take mmu_lock
> > 		for 4K*8 gfns {		// with 4KB dirty_bitmap_buffer
> > 			xchg dirty bits	// 64/32 gfns at once
> > 			write protect them
> > 		}
> > 		release mmu_lock
> > 		copy_to_user
> > 	}
> > 	TLB flush
> >
> > This reduces the size of dirty_bitmap_buffer and does not hold mmu_lock
> > so long.
>
> Good idea.  Hopefully the lock acquisition costs are low enough - we're
> adding two atomic operations per iteration here.
>

btw, this requires my kvm_cond_flush_remote_tlbs().  Otherwise another
thread can acquire the lock, see a pagetable marked read-only by this
code, and proceed to shadow it, while the guest still has a writeable
tlb entry pointing at it.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] sched: make callers check lock contention for cond_resched_lock()
  2012-05-03 13:00       ` Takuya Yoshikawa
@ 2012-05-03 15:47         ` Peter Zijlstra
  2012-05-10 22:03           ` Takuya Yoshikawa
  2012-05-04  2:43         ` Michael Wang
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2012-05-03 15:47 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: mingo, linux-kernel, linux-fsdevel, kvm, avi, mtosatti, yoshikawa.takuya

On Thu, 2012-05-03 at 22:00 +0900, Takuya Yoshikawa wrote:
> But as I could not see why spin_needbreak() was differently
> implemented
> depending on CONFIG_PREEMPT, I wanted to understand the meaning. 

It's been that way since before voluntary preemption was introduced, so
it's possible Ingo simply missed that spot and nobody noticed until now.

Ingo, do you have any recollections from back when?


* Re: [RFC] sched: make callers check lock contention for cond_resched_lock()
  2012-05-03 13:00       ` Takuya Yoshikawa
  2012-05-03 15:47         ` Peter Zijlstra
@ 2012-05-04  2:43         ` Michael Wang
  1 sibling, 0 replies; 14+ messages in thread
From: Michael Wang @ 2012-05-04  2:43 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: Peter Zijlstra, mingo, linux-kernel, linux-fsdevel, kvm, avi,
	mtosatti, yoshikawa.takuya

On 05/03/2012 09:00 PM, Takuya Yoshikawa wrote:

> On Thu, 03 May 2012 14:29:10 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> On Thu, 2012-05-03 at 21:22 +0900, Takuya Yoshikawa wrote:
>>> Although the real use case is out of this RFC patch, we are now discussing
>>> a case in which we may hold a spin_lock for long time, ms order, depending
>>> on workload;  and in that case, other threads -- VCPU threads -- should be
>>> given higher priority for that problematic lock. 
>>
>> Firstly, if you can hold a lock that long, it shouldn't be a spinlock,
> 
> I agree with you in principle, but isn't cond_resched_lock() there for that?
> 
>> secondly why isn't TIF_RESCHED being set if it's running that long? That
>> should still make cond_resched_lock() break.
> 
> I see.
> 
> I did some tests using spin_is_contended() and need_resched() and saw
> that need_resched() was called as often as spin_is_contended(), so
> experimentally I understand your point.
> 
> But as I could not see why spin_needbreak() was differently implemented
> depending on CONFIG_PREEMPT, I wanted to understand the meaning.


I think enabling CONFIG_PREEMPT means allowing preemption inside the
kernel, so if it is disabled, we can't reschedule a task while it is
running in the kernel rather than in user space.

Since cond_resched_lock() is invoked in the kernel, and cpu_relax()
looks like it gives up the cpu (I'm not sure whether it invokes
schedule() on some arch, just because of the name...), we can't do the
break if CONFIG_PREEMPT is disabled, because that would amount to
kernel preemption where it is not allowed.

Maybe that's the reason why we need to consider CONFIG_PREEMPT in
spin_needbreak().

Regards,
Michael Wang

> 
> Thanks,
> 	Takuya

* Re: [RFC] sched: make callers check lock contention for cond_resched_lock()
  2012-05-03 15:47         ` Peter Zijlstra
@ 2012-05-10 22:03           ` Takuya Yoshikawa
  2012-05-18  7:26             ` Ingo Molnar
  0 siblings, 1 reply; 14+ messages in thread
From: Takuya Yoshikawa @ 2012-05-10 22:03 UTC (permalink / raw)
  To: Peter Zijlstra, mingo
  Cc: linux-kernel, linux-fsdevel, kvm, avi, mtosatti, yoshikawa.takuya

Replaced Ingo's address with kernel.org one,

On Thu, 03 May 2012 17:47:30 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, 2012-05-03 at 22:00 +0900, Takuya Yoshikawa wrote:
> > But as I could not see why spin_needbreak() was differently
> > implemented
> > depending on CONFIG_PREEMPT, I wanted to understand the meaning. 
> 
> It's been that way since before voluntary preemption was introduced, so
> it's possible Ingo simply missed that spot and nobody noticed until now.
> 
> Ingo, do you have any recollections from back when?

ping

	Takuya


* Re: [RFC] sched: make callers check lock contention for cond_resched_lock()
  2012-05-10 22:03           ` Takuya Yoshikawa
@ 2012-05-18  7:26             ` Ingo Molnar
  2012-05-18 16:10               ` Takuya Yoshikawa
  0 siblings, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2012-05-18  7:26 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: Peter Zijlstra, linux-kernel, linux-fsdevel, kvm, avi, mtosatti,
	yoshikawa.takuya


* Takuya Yoshikawa <takuya.yoshikawa@gmail.com> wrote:

> Replaced Ingo's address with kernel.org one,
> 
> On Thu, 03 May 2012 17:47:30 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Thu, 2012-05-03 at 22:00 +0900, Takuya Yoshikawa wrote:
> > > But as I could not see why spin_needbreak() was differently
> > > implemented
> > > depending on CONFIG_PREEMPT, I wanted to understand the meaning. 
> > 
> > It's been that way since before voluntary preemption was introduced, so
> > it's possible Ingo simply missed that spot and nobody noticed until now.
> > 
> > Ingo, do you have any recollections from back when?
> 
> ping

I'm not sure we had a usable spin_is_contended() back then, nor 
was the !PREEMPT case in my mind really.

( The patch looks ugly though, in 99% of the lines it just does
  something that cond_resched_lock() itself could do. )

Thanks,

	Ingo


* Re: [RFC] sched: make callers check lock contention for cond_resched_lock()
  2012-05-18  7:26             ` Ingo Molnar
@ 2012-05-18 16:10               ` Takuya Yoshikawa
  0 siblings, 0 replies; 14+ messages in thread
From: Takuya Yoshikawa @ 2012-05-18 16:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, linux-kernel, linux-fsdevel, kvm, avi, mtosatti,
	yoshikawa.takuya

On Fri, 18 May 2012 09:26:05 +0200
Ingo Molnar <mingo@kernel.org> wrote:

> I'm not sure we had a usable spin_is_contended() back then, nor 
> was the !PREEMPT case in my mind really.

The fact that both spin_needbreak() and spin_is_contended() can be
used outside of sched is a bit confusing.

For example, in mm/compaction.c we are using spin_is_contended(), but
in mm/memory.c spin_needbreak().
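
Simplified (not the exact code in either file), the two patterns are
roughly:

	/* mm/compaction.c style: checks the lock even without CONFIG_PREEMPT */
	if (need_resched() || spin_is_contended(&zone->lru_lock)) {
		spin_unlock_irq(&zone->lru_lock);
		cond_resched();
		spin_lock_irq(&zone->lru_lock);
	}

	/* mm/memory.c style: spin_needbreak() is always 0 without CONFIG_PREEMPT */
	if (need_resched() || spin_needbreak(src_ptl) || spin_needbreak(dst_ptl))
		break;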

BTW, the actual users of spin_is_contended() look to be only:
	mm/compaction.c
	security/keys/gc.c

> ( The patch looks ugly though, in 99% of the lines it just does
>   something that cond_resched_lock() itself could do. )

Please ignore the patch.  I have already found a way to solve my
problem without cond_resched().

Thanks,
	Takuya
