linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	David Miller <davem@davemloft.net>,
	mingo@elte.hu, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org
Subject: Re: [PATCH] lockdep: lock_set_subclass - reset a held lock's subclass
Date: Fri, 01 Aug 2008 13:08:37 -0700	[thread overview]
Message-ID: <48936D45.2030500@goop.org> (raw)
In-Reply-To: <alpine.LFD.1.10.0808011216360.3277@nehalem.linux-foundation.org>

Linus Torvalds wrote:
> Hmm. 
>
> With hashed locks, that is _not_ safe in general. Now, it may well be safe 
> in your case, so I'm not going to say you have a bug, but even if you do 
> them in vaddr order, the thing is, we don't guarantee that the _locks_ are 
> ordered in virtual address order.
>   

Yes, it's current simplicity definitely relies on a simple relationship 
between pte pages and locks.

> Right now, I think the pte locks are either a single one per mm (in which 
> case I assume you don't take any lock at all), or it's a lock that is 
> embedded in the pmd page iirc.
>   

Actually the locks in the pte page, so it's already fairly 
fine-grained.  If it were made coarser - at the pmd level - then I'd 
simply move my lock-taking out a level in the pagetable traversal.

But making it finer - by hashing individual pte entries to their own 
locks - would work badly for me.  I'm taking the lock specifically to 
protect against updates to any pte within the pte page, so I'd have to 
manually take all the locks that the ptes could possibly hash to.  
Presumably in that eventuality we could define correct order for taking 
the hashed pte locks, independent of how the ptes actually map to them 
(for example, always take locks in low->high order of *lock* address).

> What if you have pmd sharing through some shared area being mapped at two 
> different processes at different addresses? Yeah, I don't think we share 
> pmd's at all (except if you use hugetables and for the kernel), but it's 
> one of those things where subtle changes in how the pte lock allocation 
> could cause problems.
>   

In this particular case it isn't an issue.  The traversal is performing 
first/last use init/deinit, so any shared parts of the pagetable can be 
skipped with no action because they're already done.  Presumably there'd 
be separate lifetime management for whatever parts of the pagetable can 
be shared, so I'd need to hook in there, rather than the wholesale 
pagetable create/destroy as I do now.

> Eg, I could easily see somebody doing the pte lock as a hash over not just 
> the address, but the "struct mm" pointer too. At which point different 
> parts of the address space might even share the PTE lock, and you'd get 
> deadlocks even without any ABBA behavior, just because the pte lock might 
> be A B C A or something inside the same process.
>   

Yep.  That would be awkward.  A function which says "here's a pte page, 
return the set of all pte locks these ptes map to in correct locking 
order" would be useful in that case.  Ugh, but still unpleasant.

    J

  reply	other threads:[~2008-08-01 20:08 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-31 21:43 [git pull] scheduler fixes Ingo Molnar
2008-07-31 22:04 ` David Miller
2008-07-31 22:26   ` Ingo Molnar
2008-07-31 22:55     ` David Miller
2008-08-01  8:11       ` David Miller
2008-08-01  9:01         ` Ingo Molnar
2008-08-01  9:13           ` David Miller
2008-08-01 11:08             ` [PATCH] lockdep: lock_set_subclass - reset a held lock's subclass Peter Zijlstra
2008-08-01 18:06               ` Jeremy Fitzhardinge
2008-08-01 18:14                 ` Linus Torvalds
2008-08-01 19:10                   ` Jeremy Fitzhardinge
2008-08-01 19:24                     ` Linus Torvalds
2008-08-01 20:08                       ` Jeremy Fitzhardinge [this message]
2008-08-01 19:59                     ` Hugh Dickins
2008-08-01 20:22                       ` Jeremy Fitzhardinge
2008-08-01 20:33                         ` Hugh Dickins
2008-08-01 23:20                       ` David Miller
2008-08-01 23:26                         ` Linus Torvalds
2008-08-01 20:49                 ` Peter Zijlstra
2008-08-01 11:08             ` [PATCH] lockdep: re-annotate scheduler runqueues Peter Zijlstra
2008-08-01 17:04               ` Linus Torvalds
2008-08-02  8:34               ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48936D45.2030500@goop.org \
    --to=jeremy@goop.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).