All of lore.kernel.org
 help / color / mirror / Atom feed
* Another proposal for DAX fault locking
@ 2016-02-09 17:24 ` Jan Kara
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Kara @ 2016-02-09 17:24 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: linux-kernel, Dave Chinner, linux-fsdevel, linux-mm,
	Dan Williams, linux-nvdimm, mgorman, Matthew Wilcox

Hello,

I was thinking about current issues with DAX fault locking [1] (data
corruption due to racing faults allocating blocks) and also races which
currently don't allow us to clear dirty tags in the radix tree due to races
between faults and cache flushing [2]. Both of these exist because we don't
have an equivalent of page lock available for DAX. While we have a
reasonable solution available for problem [1], so far I'm not aware of a
decent solution for [2]. After briefly discussing the issue with Mel he had
a bright idea that we could used hashed locks to deal with [2] (and I think
we can solve [1] with them as well). So my proposal looks as follows:

DAX will have an array of mutexes (the array can be made per device but
initially a global one should be OK). We will use mutexes in the array as a
replacement for page lock - we will use hashfn(mapping, index) to get
particular mutex protecting our offset in the mapping. On fault / page
mkwrite, we'll grab the mutex similarly to page lock and release it once we
are done updating page tables. This deals with races in [1]. When flushing
caches we grab the mutex before clearing writeable bit in page tables
and clearing dirty bit in the radix tree and drop it after we have flushed
caches for the pfn. This deals with races in [2].

Thoughts?

								Honza

[1] http://oss.sgi.com/archives/xfs/2016-01/msg00575.html
[2] https://lists.01.org/pipermail/linux-nvdimm/2016-January/004057.html

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Another proposal for DAX fault locking
@ 2016-02-09 17:24 ` Jan Kara
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Kara @ 2016-02-09 17:24 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: linux-kernel, Dave Chinner, linux-fsdevel, linux-mm,
	Dan Williams, linux-nvdimm, mgorman, Matthew Wilcox

Hello,

I was thinking about current issues with DAX fault locking [1] (data
corruption due to racing faults allocating blocks) and also races which
currently don't allow us to clear dirty tags in the radix tree due to races
between faults and cache flushing [2]. Both of these exist because we don't
have an equivalent of page lock available for DAX. While we have a
reasonable solution available for problem [1], so far I'm not aware of a
decent solution for [2]. After briefly discussing the issue with Mel he had
a bright idea that we could used hashed locks to deal with [2] (and I think
we can solve [1] with them as well). So my proposal looks as follows:

DAX will have an array of mutexes (the array can be made per device but
initially a global one should be OK). We will use mutexes in the array as a
replacement for page lock - we will use hashfn(mapping, index) to get
particular mutex protecting our offset in the mapping. On fault / page
mkwrite, we'll grab the mutex similarly to page lock and release it once we
are done updating page tables. This deals with races in [1]. When flushing
caches we grab the mutex before clearing writeable bit in page tables
and clearing dirty bit in the radix tree and drop it after we have flushed
caches for the pfn. This deals with races in [2].

Thoughts?

								Honza

[1] http://oss.sgi.com/archives/xfs/2016-01/msg00575.html
[2] https://lists.01.org/pipermail/linux-nvdimm/2016-January/004057.html

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-09 17:24 ` Jan Kara
@ 2016-02-09 18:18   ` Dan Williams
  -1 siblings, 0 replies; 46+ messages in thread
From: Dan Williams @ 2016-02-09 18:18 UTC (permalink / raw)
  To: Jan Kara
  Cc: Ross Zwisler, linux-kernel, Dave Chinner, linux-fsdevel,
	Linux MM, linux-nvdimm, Mel Gorman, Matthew Wilcox

I l

On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
> Hello,
>
> I was thinking about current issues with DAX fault locking [1] (data
> corruption due to racing faults allocating blocks) and also races which
> currently don't allow us to clear dirty tags in the radix tree due to races
> between faults and cache flushing [2]. Both of these exist because we don't
> have an equivalent of page lock available for DAX. While we have a
> reasonable solution available for problem [1], so far I'm not aware of a
> decent solution for [2]. After briefly discussing the issue with Mel he had
> a bright idea that we could used hashed locks to deal with [2] (and I think
> we can solve [1] with them as well). So my proposal looks as follows:
>
> DAX will have an array of mutexes (the array can be made per device but
> initially a global one should be OK). We will use mutexes in the array as a
> replacement for page lock - we will use hashfn(mapping, index) to get
> particular mutex protecting our offset in the mapping. On fault / page
> mkwrite, we'll grab the mutex similarly to page lock and release it once we
> are done updating page tables. This deals with races in [1]. When flushing
> caches we grab the mutex before clearing writeable bit in page tables
> and clearing dirty bit in the radix tree and drop it after we have flushed
> caches for the pfn. This deals with races in [2].
>
> Thoughts?
>

I like the fact that this makes the locking explicit and
straightforward rather than something more tricky.  Can we make the
hashfn pfn based?  I'm thinking we could later reuse this as part of
the solution for eliminating the need to allocate struct page, and we
don't have the 'mapping' available in all paths...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-09 18:18   ` Dan Williams
  0 siblings, 0 replies; 46+ messages in thread
From: Dan Williams @ 2016-02-09 18:18 UTC (permalink / raw)
  To: Jan Kara
  Cc: Ross Zwisler, linux-kernel, Dave Chinner, linux-fsdevel,
	Linux MM, linux-nvdimm@lists.01.org, Mel Gorman, Matthew Wilcox

I l

On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
> Hello,
>
> I was thinking about current issues with DAX fault locking [1] (data
> corruption due to racing faults allocating blocks) and also races which
> currently don't allow us to clear dirty tags in the radix tree due to races
> between faults and cache flushing [2]. Both of these exist because we don't
> have an equivalent of page lock available for DAX. While we have a
> reasonable solution available for problem [1], so far I'm not aware of a
> decent solution for [2]. After briefly discussing the issue with Mel he had
> a bright idea that we could used hashed locks to deal with [2] (and I think
> we can solve [1] with them as well). So my proposal looks as follows:
>
> DAX will have an array of mutexes (the array can be made per device but
> initially a global one should be OK). We will use mutexes in the array as a
> replacement for page lock - we will use hashfn(mapping, index) to get
> particular mutex protecting our offset in the mapping. On fault / page
> mkwrite, we'll grab the mutex similarly to page lock and release it once we
> are done updating page tables. This deals with races in [1]. When flushing
> caches we grab the mutex before clearing writeable bit in page tables
> and clearing dirty bit in the radix tree and drop it after we have flushed
> caches for the pfn. This deals with races in [2].
>
> Thoughts?
>

I like the fact that this makes the locking explicit and
straightforward rather than something more tricky.  Can we make the
hashfn pfn based?  I'm thinking we could later reuse this as part of
the solution for eliminating the need to allocate struct page, and we
don't have the 'mapping' available in all paths...

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-09 17:24 ` Jan Kara
@ 2016-02-09 18:46   ` Cedric Blancher
  -1 siblings, 0 replies; 46+ messages in thread
From: Cedric Blancher @ 2016-02-09 18:46 UTC (permalink / raw)
  To: Jan Kara
  Cc: Ross Zwisler, Linux Kernel Mailing List, Dave Chinner,
	linux-fsdevel, linux-mm, Dan Williams, linux-nvdimm, mgorman,
	Matthew Wilcox

On 9 February 2016 at 18:24, Jan Kara <jack@suse.cz> wrote:
> Hello,
>
> I was thinking about current issues with DAX fault locking [1] (data
> corruption due to racing faults allocating blocks) and also races which
> currently don't allow us to clear dirty tags in the radix tree due to races
> between faults and cache flushing [2]. Both of these exist because we don't
> have an equivalent of page lock available for DAX. While we have a
> reasonable solution available for problem [1], so far I'm not aware of a
> decent solution for [2]. After briefly discussing the issue with Mel he had
> a bright idea that we could used hashed locks to deal with [2] (and I think
> we can solve [1] with them as well). So my proposal looks as follows:
>
> DAX will have an array of mutexes

One folly here: Arrays of mutexes NEVER work unless you manage to
align them to occupy one complete L2/L3 cache line each. Otherwise the
CPUS will fight over cache lines each time they touch (read or write)
a mutex, and it then becomes a O^n-like scalability problem if
multiple mutexes occupy one cache line. It becomes WORSE as more
mutexes fit into a single cache line and even more worse with the
number of CPUS accessing such contested lines.

Ced
-- 
Cedric Blancher <cedric.blancher@gmail.com>
Institute Pasteur

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-09 18:46   ` Cedric Blancher
  0 siblings, 0 replies; 46+ messages in thread
From: Cedric Blancher @ 2016-02-09 18:46 UTC (permalink / raw)
  To: Jan Kara
  Cc: Ross Zwisler, Linux Kernel Mailing List, Dave Chinner,
	linux-fsdevel, linux-mm, Dan Williams, linux-nvdimm, mgorman,
	Matthew Wilcox

On 9 February 2016 at 18:24, Jan Kara <jack@suse.cz> wrote:
> Hello,
>
> I was thinking about current issues with DAX fault locking [1] (data
> corruption due to racing faults allocating blocks) and also races which
> currently don't allow us to clear dirty tags in the radix tree due to races
> between faults and cache flushing [2]. Both of these exist because we don't
> have an equivalent of page lock available for DAX. While we have a
> reasonable solution available for problem [1], so far I'm not aware of a
> decent solution for [2]. After briefly discussing the issue with Mel he had
> a bright idea that we could used hashed locks to deal with [2] (and I think
> we can solve [1] with them as well). So my proposal looks as follows:
>
> DAX will have an array of mutexes

One folly here: Arrays of mutexes NEVER work unless you manage to
align them to occupy one complete L2/L3 cache line each. Otherwise the
CPUS will fight over cache lines each time they touch (read or write)
a mutex, and it then becomes a O^n-like scalability problem if
multiple mutexes occupy one cache line. It becomes WORSE as more
mutexes fit into a single cache line and even more worse with the
number of CPUS accessing such contested lines.

Ced
-- 
Cedric Blancher <cedric.blancher@gmail.com>
Institute Pasteur

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-09 18:46   ` Cedric Blancher
@ 2016-02-10  8:19     ` Mel Gorman
  -1 siblings, 0 replies; 46+ messages in thread
From: Mel Gorman @ 2016-02-10  8:19 UTC (permalink / raw)
  To: Cedric Blancher
  Cc: Jan Kara, Ross Zwisler, Linux Kernel Mailing List, Dave Chinner,
	linux-fsdevel, linux-mm, Dan Williams, linux-nvdimm,
	Matthew Wilcox

On Tue, Feb 09, 2016 at 07:46:05PM +0100, Cedric Blancher wrote:
> On 9 February 2016 at 18:24, Jan Kara <jack@suse.cz> wrote:
> > Hello,
> >
> > I was thinking about current issues with DAX fault locking [1] (data
> > corruption due to racing faults allocating blocks) and also races which
> > currently don't allow us to clear dirty tags in the radix tree due to races
> > between faults and cache flushing [2]. Both of these exist because we don't
> > have an equivalent of page lock available for DAX. While we have a
> > reasonable solution available for problem [1], so far I'm not aware of a
> > decent solution for [2]. After briefly discussing the issue with Mel he had
> > a bright idea that we could used hashed locks to deal with [2] (and I think
> > we can solve [1] with them as well). So my proposal looks as follows:
> >
> > DAX will have an array of mutexes
> 
> One folly here: Arrays of mutexes NEVER work unless you manage to
> align them to occupy one complete L2/L3 cache line each. Otherwise the
> CPUS will fight over cache lines each time they touch (read or write)
> a mutex, and it then becomes a O^n-like scalability problem if
> multiple mutexes occupy one cache line. It becomes WORSE as more
> mutexes fit into a single cache line and even more worse with the
> number of CPUS accessing such contested lines.
> 

That is a *potential* performance concern although I agree with you in that
mutex's false sharing a cache line would be a problem. However, it is a
performance concern that potentially is alleviated by alternative hashing
where as AFAIK the issues being faced currently are data corruption and
functional issues. I'd take a performance issue over a data corruption
issue any day of the week.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-10  8:19     ` Mel Gorman
  0 siblings, 0 replies; 46+ messages in thread
From: Mel Gorman @ 2016-02-10  8:19 UTC (permalink / raw)
  To: Cedric Blancher
  Cc: Jan Kara, Ross Zwisler, Linux Kernel Mailing List, Dave Chinner,
	linux-fsdevel, linux-mm, Dan Williams, linux-nvdimm,
	Matthew Wilcox

On Tue, Feb 09, 2016 at 07:46:05PM +0100, Cedric Blancher wrote:
> On 9 February 2016 at 18:24, Jan Kara <jack@suse.cz> wrote:
> > Hello,
> >
> > I was thinking about current issues with DAX fault locking [1] (data
> > corruption due to racing faults allocating blocks) and also races which
> > currently don't allow us to clear dirty tags in the radix tree due to races
> > between faults and cache flushing [2]. Both of these exist because we don't
> > have an equivalent of page lock available for DAX. While we have a
> > reasonable solution available for problem [1], so far I'm not aware of a
> > decent solution for [2]. After briefly discussing the issue with Mel he had
> > a bright idea that we could used hashed locks to deal with [2] (and I think
> > we can solve [1] with them as well). So my proposal looks as follows:
> >
> > DAX will have an array of mutexes
> 
> One folly here: Arrays of mutexes NEVER work unless you manage to
> align them to occupy one complete L2/L3 cache line each. Otherwise the
> CPUS will fight over cache lines each time they touch (read or write)
> a mutex, and it then becomes a O^n-like scalability problem if
> multiple mutexes occupy one cache line. It becomes WORSE as more
> mutexes fit into a single cache line and even more worse with the
> number of CPUS accessing such contested lines.
> 

That is a *potential* performance concern although I agree with you in that
mutex's false sharing a cache line would be a problem. However, it is a
performance concern that potentially is alleviated by alternative hashing
where as AFAIK the issues being faced currently are data corruption and
functional issues. I'd take a performance issue over a data corruption
issue any day of the week.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-10  8:19     ` Mel Gorman
@ 2016-02-10 10:18       ` Jan Kara
  -1 siblings, 0 replies; 46+ messages in thread
From: Jan Kara @ 2016-02-10 10:18 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Cedric Blancher, Jan Kara, Ross Zwisler,
	Linux Kernel Mailing List, Dave Chinner, linux-fsdevel, linux-mm,
	Dan Williams, linux-nvdimm, Matthew Wilcox

On Wed 10-02-16 08:19:22, Mel Gorman wrote:
> On Tue, Feb 09, 2016 at 07:46:05PM +0100, Cedric Blancher wrote:
> > On 9 February 2016 at 18:24, Jan Kara <jack@suse.cz> wrote:
> > > Hello,
> > >
> > > I was thinking about current issues with DAX fault locking [1] (data
> > > corruption due to racing faults allocating blocks) and also races which
> > > currently don't allow us to clear dirty tags in the radix tree due to races
> > > between faults and cache flushing [2]. Both of these exist because we don't
> > > have an equivalent of page lock available for DAX. While we have a
> > > reasonable solution available for problem [1], so far I'm not aware of a
> > > decent solution for [2]. After briefly discussing the issue with Mel he had
> > > a bright idea that we could used hashed locks to deal with [2] (and I think
> > > we can solve [1] with them as well). So my proposal looks as follows:
> > >
> > > DAX will have an array of mutexes
> > 
> > One folly here: Arrays of mutexes NEVER work unless you manage to
> > align them to occupy one complete L2/L3 cache line each. Otherwise the
> > CPUS will fight over cache lines each time they touch (read or write)
> > a mutex, and it then becomes a O^n-like scalability problem if
> > multiple mutexes occupy one cache line. It becomes WORSE as more
> > mutexes fit into a single cache line and even more worse with the
> > number of CPUS accessing such contested lines.
> > 
> 
> That is a *potential* performance concern although I agree with you in that
> mutex's false sharing a cache line would be a problem. However, it is a
> performance concern that potentially is alleviated by alternative hashing
> where as AFAIK the issues being faced currently are data corruption and
> functional issues. I'd take a performance issue over a data corruption
> issue any day of the week.

Exactly. We have to add *some* locking to fix the data corruption. Cache
aliasing of hashed mutexes may be an issue but I believe the result will be
still better than a single mutex.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-10 10:18       ` Jan Kara
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Kara @ 2016-02-10 10:18 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Cedric Blancher, Jan Kara, Ross Zwisler,
	Linux Kernel Mailing List, Dave Chinner, linux-fsdevel, linux-mm,
	Dan Williams, linux-nvdimm, Matthew Wilcox

On Wed 10-02-16 08:19:22, Mel Gorman wrote:
> On Tue, Feb 09, 2016 at 07:46:05PM +0100, Cedric Blancher wrote:
> > On 9 February 2016 at 18:24, Jan Kara <jack@suse.cz> wrote:
> > > Hello,
> > >
> > > I was thinking about current issues with DAX fault locking [1] (data
> > > corruption due to racing faults allocating blocks) and also races which
> > > currently don't allow us to clear dirty tags in the radix tree due to races
> > > between faults and cache flushing [2]. Both of these exist because we don't
> > > have an equivalent of page lock available for DAX. While we have a
> > > reasonable solution available for problem [1], so far I'm not aware of a
> > > decent solution for [2]. After briefly discussing the issue with Mel he had
> > > a bright idea that we could used hashed locks to deal with [2] (and I think
> > > we can solve [1] with them as well). So my proposal looks as follows:
> > >
> > > DAX will have an array of mutexes
> > 
> > One folly here: Arrays of mutexes NEVER work unless you manage to
> > align them to occupy one complete L2/L3 cache line each. Otherwise the
> > CPUS will fight over cache lines each time they touch (read or write)
> > a mutex, and it then becomes a O^n-like scalability problem if
> > multiple mutexes occupy one cache line. It becomes WORSE as more
> > mutexes fit into a single cache line and even more worse with the
> > number of CPUS accessing such contested lines.
> > 
> 
> That is a *potential* performance concern although I agree with you in that
> mutex's false sharing a cache line would be a problem. However, it is a
> performance concern that potentially is alleviated by alternative hashing
> where as AFAIK the issues being faced currently are data corruption and
> functional issues. I'd take a performance issue over a data corruption
> issue any day of the week.

Exactly. We have to add *some* locking to fix the data corruption. Cache
aliasing of hashed mutexes may be an issue but I believe the result will be
still better than a single mutex.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-09 18:18   ` Dan Williams
@ 2016-02-10 10:32     ` Jan Kara
  -1 siblings, 0 replies; 46+ messages in thread
From: Jan Kara @ 2016-02-10 10:32 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, Ross Zwisler, linux-kernel, Dave Chinner,
	linux-fsdevel, Linux MM, linux-nvdimm, Mel Gorman,
	Matthew Wilcox

On Tue 09-02-16 10:18:53, Dan Williams wrote:
> On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
> > Hello,
> >
> > I was thinking about current issues with DAX fault locking [1] (data
> > corruption due to racing faults allocating blocks) and also races which
> > currently don't allow us to clear dirty tags in the radix tree due to races
> > between faults and cache flushing [2]. Both of these exist because we don't
> > have an equivalent of page lock available for DAX. While we have a
> > reasonable solution available for problem [1], so far I'm not aware of a
> > decent solution for [2]. After briefly discussing the issue with Mel he had
> > a bright idea that we could used hashed locks to deal with [2] (and I think
> > we can solve [1] with them as well). So my proposal looks as follows:
> >
> > DAX will have an array of mutexes (the array can be made per device but
> > initially a global one should be OK). We will use mutexes in the array as a
> > replacement for page lock - we will use hashfn(mapping, index) to get
> > particular mutex protecting our offset in the mapping. On fault / page
> > mkwrite, we'll grab the mutex similarly to page lock and release it once we
> > are done updating page tables. This deals with races in [1]. When flushing
> > caches we grab the mutex before clearing writeable bit in page tables
> > and clearing dirty bit in the radix tree and drop it after we have flushed
> > caches for the pfn. This deals with races in [2].
> >
> > Thoughts?
> >
> 
> I like the fact that this makes the locking explicit and
> straightforward rather than something more tricky.  Can we make the
> hashfn pfn based?  I'm thinking we could later reuse this as part of
> the solution for eliminating the need to allocate struct page, and we
> don't have the 'mapping' available in all paths...

So Mel originally suggested to use pfn for hashing as well. My concern with
using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
lock. What you really need to protect is a logical offset in the file to
serialize allocation of underlying blocks, its mapping into page tables,
and flushing the blocks out of caches. So using inode/mapping and offset
for the hashing is easier (it isn't obvious to me we can fix hole filling
races with pfn-based locking).

I'm not sure for which other purposes you'd like to use this lock and
whether propagating file+offset to those call sites would make sense or
not. struct page has the advantage that block mapping information is only
attached to it, so when filling a hole, we can just allocate some page,
attach it to the radix tree, use page lock for synchronization, and allocate
blocks only after that. With pfns we cannot do this...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-10 10:32     ` Jan Kara
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Kara @ 2016-02-10 10:32 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, Ross Zwisler, linux-kernel, Dave Chinner,
	linux-fsdevel, Linux MM, linux-nvdimm@lists.01.org, Mel Gorman,
	Matthew Wilcox

On Tue 09-02-16 10:18:53, Dan Williams wrote:
> On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
> > Hello,
> >
> > I was thinking about current issues with DAX fault locking [1] (data
> > corruption due to racing faults allocating blocks) and also races which
> > currently don't allow us to clear dirty tags in the radix tree due to races
> > between faults and cache flushing [2]. Both of these exist because we don't
> > have an equivalent of page lock available for DAX. While we have a
> > reasonable solution available for problem [1], so far I'm not aware of a
> > decent solution for [2]. After briefly discussing the issue with Mel he had
> > a bright idea that we could used hashed locks to deal with [2] (and I think
> > we can solve [1] with them as well). So my proposal looks as follows:
> >
> > DAX will have an array of mutexes (the array can be made per device but
> > initially a global one should be OK). We will use mutexes in the array as a
> > replacement for page lock - we will use hashfn(mapping, index) to get
> > particular mutex protecting our offset in the mapping. On fault / page
> > mkwrite, we'll grab the mutex similarly to page lock and release it once we
> > are done updating page tables. This deals with races in [1]. When flushing
> > caches we grab the mutex before clearing writeable bit in page tables
> > and clearing dirty bit in the radix tree and drop it after we have flushed
> > caches for the pfn. This deals with races in [2].
> >
> > Thoughts?
> >
> 
> I like the fact that this makes the locking explicit and
> straightforward rather than something more tricky.  Can we make the
> hashfn pfn based?  I'm thinking we could later reuse this as part of
> the solution for eliminating the need to allocate struct page, and we
> don't have the 'mapping' available in all paths...

So Mel originally suggested to use pfn for hashing as well. My concern with
using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
lock. What you really need to protect is a logical offset in the file to
serialize allocation of underlying blocks, its mapping into page tables,
and flushing the blocks out of caches. So using inode/mapping and offset
for the hashing is easier (it isn't obvious to me we can fix hole filling
races with pfn-based locking).

I'm not sure for which other purposes you'd like to use this lock and
whether propagating file+offset to those call sites would make sense or
not. struct page has the advantage that block mapping information is only
attached to it, so when filling a hole, we can just allocate some page,
attach it to the radix tree, use page lock for synchronization, and allocate
blocks only after that. With pfns we cannot do this...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-09 17:24 ` Jan Kara
@ 2016-02-10 12:29   ` Dmitry Monakhov
  -1 siblings, 0 replies; 46+ messages in thread
From: Dmitry Monakhov @ 2016-02-10 12:29 UTC (permalink / raw)
  To: Jan Kara, Ross Zwisler
  Cc: linux-nvdimm, Dave Chinner, linux-kernel, linux-mm, mgorman,
	linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 2011 bytes --]

Jan Kara <jack@suse.cz> writes:

> Hello,
>
> I was thinking about current issues with DAX fault locking [1] (data
> corruption due to racing faults allocating blocks) and also races which
> currently don't allow us to clear dirty tags in the radix tree due to races
> between faults and cache flushing [2]. Both of these exist because we don't
> have an equivalent of page lock available for DAX. While we have a
> reasonable solution available for problem [1], so far I'm not aware of a
> decent solution for [2]. After briefly discussing the issue with Mel he had
> a bright idea that we could used hashed locks to deal with [2] (and I think
> we can solve [1] with them as well). So my proposal looks as follows:
>
> DAX will have an array of mutexes (the array can be made per device but
> initially a global one should be OK). We will use mutexes in the array as a
> replacement for page lock - we will use hashfn(mapping, index) to get
> particular mutex protecting our offset in the mapping. On fault / page
> mkwrite, we'll grab the mutex similarly to page lock and release it once we
> are done updating page tables. This deals with races in [1]. When flushing
> caches we grab the mutex before clearing writeable bit in page tables
> and clearing dirty bit in the radix tree and drop it after we have flushed
> caches for the pfn. This deals with races in [2].
>
> Thoughts?
Agree, only small note:
Hash locks has side effect for batch locking due to collision.
Some times we want to lock several pages/entries (migration/defragmentation)
So we will endup with deadlock due to hash collision.
>
> 								Honza
>
> [1] http://oss.sgi.com/archives/xfs/2016-01/msg00575.html
> [2] https://lists.01.org/pipermail/linux-nvdimm/2016-January/004057.html
>
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
> _______________________________________________
> Linux-nvdimm mailing list
> Linux-nvdimm@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-10 12:29   ` Dmitry Monakhov
  0 siblings, 0 replies; 46+ messages in thread
From: Dmitry Monakhov @ 2016-02-10 12:29 UTC (permalink / raw)
  To: Jan Kara, Ross Zwisler
  Cc: linux-nvdimm, Dave Chinner, linux-kernel, linux-mm, mgorman,
	linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 2011 bytes --]

Jan Kara <jack@suse.cz> writes:

> Hello,
>
> I was thinking about current issues with DAX fault locking [1] (data
> corruption due to racing faults allocating blocks) and also races which
> currently don't allow us to clear dirty tags in the radix tree due to races
> between faults and cache flushing [2]. Both of these exist because we don't
> have an equivalent of page lock available for DAX. While we have a
> reasonable solution available for problem [1], so far I'm not aware of a
> decent solution for [2]. After briefly discussing the issue with Mel he had
> a bright idea that we could used hashed locks to deal with [2] (and I think
> we can solve [1] with them as well). So my proposal looks as follows:
>
> DAX will have an array of mutexes (the array can be made per device but
> initially a global one should be OK). We will use mutexes in the array as a
> replacement for page lock - we will use hashfn(mapping, index) to get
> particular mutex protecting our offset in the mapping. On fault / page
> mkwrite, we'll grab the mutex similarly to page lock and release it once we
> are done updating page tables. This deals with races in [1]. When flushing
> caches we grab the mutex before clearing writeable bit in page tables
> and clearing dirty bit in the radix tree and drop it after we have flushed
> caches for the pfn. This deals with races in [2].
>
> Thoughts?
Agree, only small note:
Hash locks has side effect for batch locking due to collision.
Some times we want to lock several pages/entries (migration/defragmentation)
So we will endup with deadlock due to hash collision.
>
> 								Honza
>
> [1] http://oss.sgi.com/archives/xfs/2016-01/msg00575.html
> [2] https://lists.01.org/pipermail/linux-nvdimm/2016-January/004057.html
>
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
> _______________________________________________
> Linux-nvdimm mailing list
> Linux-nvdimm@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-10 12:29   ` Dmitry Monakhov
@ 2016-02-10 12:35     ` Jan Kara
  -1 siblings, 0 replies; 46+ messages in thread
From: Jan Kara @ 2016-02-10 12:35 UTC (permalink / raw)
  To: Dmitry Monakhov
  Cc: Jan Kara, Ross Zwisler, linux-nvdimm, Dave Chinner, linux-kernel,
	linux-mm, mgorman, linux-fsdevel

On Wed 10-02-16 15:29:34, Dmitry Monakhov wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> > Hello,
> >
> > I was thinking about current issues with DAX fault locking [1] (data
> > corruption due to racing faults allocating blocks) and also races which
> > currently don't allow us to clear dirty tags in the radix tree due to races
> > between faults and cache flushing [2]. Both of these exist because we don't
> > have an equivalent of page lock available for DAX. While we have a
> > reasonable solution available for problem [1], so far I'm not aware of a
> > decent solution for [2]. After briefly discussing the issue with Mel he had
> > a bright idea that we could used hashed locks to deal with [2] (and I think
> > we can solve [1] with them as well). So my proposal looks as follows:
> >
> > DAX will have an array of mutexes (the array can be made per device but
> > initially a global one should be OK). We will use mutexes in the array as a
> > replacement for page lock - we will use hashfn(mapping, index) to get
> > particular mutex protecting our offset in the mapping. On fault / page
> > mkwrite, we'll grab the mutex similarly to page lock and release it once we
> > are done updating page tables. This deals with races in [1]. When flushing
> > caches we grab the mutex before clearing writeable bit in page tables
> > and clearing dirty bit in the radix tree and drop it after we have flushed
> > caches for the pfn. This deals with races in [2].
> >
> > Thoughts?
> Agree, only small note:
> Hash locks has side effect for batch locking due to collision.
> Some times we want to lock several pages/entries (migration/defragmentation)
> So we will endup with deadlock due to hash collision.

Yeah, but at least for the purposes we want the locks for locking just one
'page' is enough. If we ever needed locking more 'pages', we would have to
choose a different locking scheme.

									Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-10 12:35     ` Jan Kara
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Kara @ 2016-02-10 12:35 UTC (permalink / raw)
  To: Dmitry Monakhov
  Cc: Jan Kara, Ross Zwisler, linux-nvdimm, Dave Chinner, linux-kernel,
	linux-mm, mgorman, linux-fsdevel

On Wed 10-02-16 15:29:34, Dmitry Monakhov wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> > Hello,
> >
> > I was thinking about current issues with DAX fault locking [1] (data
> > corruption due to racing faults allocating blocks) and also races which
> > currently don't allow us to clear dirty tags in the radix tree due to races
> > between faults and cache flushing [2]. Both of these exist because we don't
> > have an equivalent of page lock available for DAX. While we have a
> > reasonable solution available for problem [1], so far I'm not aware of a
> > decent solution for [2]. After briefly discussing the issue with Mel he had
> > a bright idea that we could used hashed locks to deal with [2] (and I think
> > we can solve [1] with them as well). So my proposal looks as follows:
> >
> > DAX will have an array of mutexes (the array can be made per device but
> > initially a global one should be OK). We will use mutexes in the array as a
> > replacement for page lock - we will use hashfn(mapping, index) to get
> > particular mutex protecting our offset in the mapping. On fault / page
> > mkwrite, we'll grab the mutex similarly to page lock and release it once we
> > are done updating page tables. This deals with races in [1]. When flushing
> > caches we grab the mutex before clearing writeable bit in page tables
> > and clearing dirty bit in the radix tree and drop it after we have flushed
> > caches for the pfn. This deals with races in [2].
> >
> > Thoughts?
> Agree, only small note:
> Hash locks has side effect for batch locking due to collision.
> Some times we want to lock several pages/entries (migration/defragmentation)
> So we will endup with deadlock due to hash collision.

Yeah, but at least for the purposes we want the locks for locking just one
'page' is enough. If we ever needed locking more 'pages', we would have to
choose a different locking scheme.

									Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-09 17:24 ` Jan Kara
@ 2016-02-10 17:38   ` Boaz Harrosh
  -1 siblings, 0 replies; 46+ messages in thread
From: Boaz Harrosh @ 2016-02-10 17:38 UTC (permalink / raw)
  To: Jan Kara, Ross Zwisler
  Cc: linux-nvdimm, Dave Chinner, linux-kernel, linux-mm, mgorman,
	linux-fsdevel

On 02/09/2016 07:24 PM, Jan Kara wrote:
> Hello,
> 
> I was thinking about current issues with DAX fault locking [1] (data
> corruption due to racing faults allocating blocks) and also races which
> currently don't allow us to clear dirty tags in the radix tree due to races
> between faults and cache flushing [2]. Both of these exist because we don't
> have an equivalent of page lock available for DAX. While we have a
> reasonable solution available for problem [1], so far I'm not aware of a
> decent solution for [2]. After briefly discussing the issue with Mel he had
> a bright idea that we could used hashed locks to deal with [2] (and I think
> we can solve [1] with them as well). So my proposal looks as follows:
> 
> DAX will have an array of mutexes (the array can be made per device but
> initially a global one should be OK). We will use mutexes in the array as a
> replacement for page lock - we will use hashfn(mapping, index) to get
> particular mutex protecting our offset in the mapping. On fault / page
> mkwrite, we'll grab the mutex similarly to page lock and release it once we
> are done updating page tables. This deals with races in [1]. When flushing
> caches we grab the mutex before clearing writeable bit in page tables
> and clearing dirty bit in the radix tree and drop it after we have flushed
> caches for the pfn. This deals with races in [2].
> 
> Thoughts?
> 

You could also use one of the radix-tree's special-bits as a bit lock.
So no need for any extra allocations.

[latest page-lock is a bit-lock so performance is the same]

Thanks
Boaz


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-10 17:38   ` Boaz Harrosh
  0 siblings, 0 replies; 46+ messages in thread
From: Boaz Harrosh @ 2016-02-10 17:38 UTC (permalink / raw)
  To: Jan Kara, Ross Zwisler
  Cc: linux-nvdimm, Dave Chinner, linux-kernel, linux-mm, mgorman,
	linux-fsdevel

On 02/09/2016 07:24 PM, Jan Kara wrote:
> Hello,
> 
> I was thinking about current issues with DAX fault locking [1] (data
> corruption due to racing faults allocating blocks) and also races which
> currently don't allow us to clear dirty tags in the radix tree due to races
> between faults and cache flushing [2]. Both of these exist because we don't
> have an equivalent of page lock available for DAX. While we have a
> reasonable solution available for problem [1], so far I'm not aware of a
> decent solution for [2]. After briefly discussing the issue with Mel he had
> a bright idea that we could used hashed locks to deal with [2] (and I think
> we can solve [1] with them as well). So my proposal looks as follows:
> 
> DAX will have an array of mutexes (the array can be made per device but
> initially a global one should be OK). We will use mutexes in the array as a
> replacement for page lock - we will use hashfn(mapping, index) to get
> particular mutex protecting our offset in the mapping. On fault / page
> mkwrite, we'll grab the mutex similarly to page lock and release it once we
> are done updating page tables. This deals with races in [1]. When flushing
> caches we grab the mutex before clearing writeable bit in page tables
> and clearing dirty bit in the radix tree and drop it after we have flushed
> caches for the pfn. This deals with races in [2].
> 
> Thoughts?
> 

You could also use one of the radix-tree's special-bits as a bit lock.
So no need for any extra allocations.

[latest page-lock is a bit-lock so performance is the same]

Thanks
Boaz

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-10 10:32     ` Jan Kara
@ 2016-02-10 20:08       ` Dan Williams
  -1 siblings, 0 replies; 46+ messages in thread
From: Dan Williams @ 2016-02-10 20:08 UTC (permalink / raw)
  To: Jan Kara
  Cc: Ross Zwisler, linux-kernel, Dave Chinner, linux-fsdevel,
	Linux MM, linux-nvdimm, Mel Gorman, Matthew Wilcox

On Wed, Feb 10, 2016 at 2:32 AM, Jan Kara <jack@suse.cz> wrote:
> On Tue 09-02-16 10:18:53, Dan Williams wrote:
>> On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
>> > Hello,
>> >
>> > I was thinking about current issues with DAX fault locking [1] (data
>> > corruption due to racing faults allocating blocks) and also races which
>> > currently don't allow us to clear dirty tags in the radix tree due to races
>> > between faults and cache flushing [2]. Both of these exist because we don't
>> > have an equivalent of page lock available for DAX. While we have a
>> > reasonable solution available for problem [1], so far I'm not aware of a
>> > decent solution for [2]. After briefly discussing the issue with Mel he had
>> > a bright idea that we could used hashed locks to deal with [2] (and I think
>> > we can solve [1] with them as well). So my proposal looks as follows:
>> >
>> > DAX will have an array of mutexes (the array can be made per device but
>> > initially a global one should be OK). We will use mutexes in the array as a
>> > replacement for page lock - we will use hashfn(mapping, index) to get
>> > particular mutex protecting our offset in the mapping. On fault / page
>> > mkwrite, we'll grab the mutex similarly to page lock and release it once we
>> > are done updating page tables. This deals with races in [1]. When flushing
>> > caches we grab the mutex before clearing writeable bit in page tables
>> > and clearing dirty bit in the radix tree and drop it after we have flushed
>> > caches for the pfn. This deals with races in [2].
>> >
>> > Thoughts?
>> >
>>
>> I like the fact that this makes the locking explicit and
>> straightforward rather than something more tricky.  Can we make the
>> hashfn pfn based?  I'm thinking we could later reuse this as part of
>> the solution for eliminating the need to allocate struct page, and we
>> don't have the 'mapping' available in all paths...
>
> So Mel originally suggested to use pfn for hashing as well. My concern with
> using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
> lock. What you really need to protect is a logical offset in the file to
> serialize allocation of underlying blocks, its mapping into page tables,
> and flushing the blocks out of caches. So using inode/mapping and offset
> for the hashing is easier (it isn't obvious to me we can fix hole filling
> races with pfn-based locking).
>
> I'm not sure for which other purposes you'd like to use this lock and
> whether propagating file+offset to those call sites would make sense or
> not. struct page has the advantage that block mapping information is only
> attached to it, so when filling a hole, we can just allocate some page,
> attach it to the radix tree, use page lock for synchronization, and allocate
> blocks only after that. With pfns we cannot do this...

Right, I am thinking of the direct-I/O path's use of the page lock and
the occasions where it relies on page->mapping lookups.

Given we already have support for dynamically allocating struct page I
don't think we need to have a "pfn to lock" lookup in the initial
implementation of this locking scheme.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-10 20:08       ` Dan Williams
  0 siblings, 0 replies; 46+ messages in thread
From: Dan Williams @ 2016-02-10 20:08 UTC (permalink / raw)
  To: Jan Kara
  Cc: Ross Zwisler, linux-kernel, Dave Chinner, linux-fsdevel,
	Linux MM, linux-nvdimm@lists.01.org, Mel Gorman, Matthew Wilcox

On Wed, Feb 10, 2016 at 2:32 AM, Jan Kara <jack@suse.cz> wrote:
> On Tue 09-02-16 10:18:53, Dan Williams wrote:
>> On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
>> > Hello,
>> >
>> > I was thinking about current issues with DAX fault locking [1] (data
>> > corruption due to racing faults allocating blocks) and also races which
>> > currently don't allow us to clear dirty tags in the radix tree due to races
>> > between faults and cache flushing [2]. Both of these exist because we don't
>> > have an equivalent of page lock available for DAX. While we have a
>> > reasonable solution available for problem [1], so far I'm not aware of a
>> > decent solution for [2]. After briefly discussing the issue with Mel he had
>> > a bright idea that we could used hashed locks to deal with [2] (and I think
>> > we can solve [1] with them as well). So my proposal looks as follows:
>> >
>> > DAX will have an array of mutexes (the array can be made per device but
>> > initially a global one should be OK). We will use mutexes in the array as a
>> > replacement for page lock - we will use hashfn(mapping, index) to get
>> > particular mutex protecting our offset in the mapping. On fault / page
>> > mkwrite, we'll grab the mutex similarly to page lock and release it once we
>> > are done updating page tables. This deals with races in [1]. When flushing
>> > caches we grab the mutex before clearing writeable bit in page tables
>> > and clearing dirty bit in the radix tree and drop it after we have flushed
>> > caches for the pfn. This deals with races in [2].
>> >
>> > Thoughts?
>> >
>>
>> I like the fact that this makes the locking explicit and
>> straightforward rather than something more tricky.  Can we make the
>> hashfn pfn based?  I'm thinking we could later reuse this as part of
>> the solution for eliminating the need to allocate struct page, and we
>> don't have the 'mapping' available in all paths...
>
> So Mel originally suggested to use pfn for hashing as well. My concern with
> using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
> lock. What you really need to protect is a logical offset in the file to
> serialize allocation of underlying blocks, its mapping into page tables,
> and flushing the blocks out of caches. So using inode/mapping and offset
> for the hashing is easier (it isn't obvious to me we can fix hole filling
> races with pfn-based locking).
>
> I'm not sure for which other purposes you'd like to use this lock and
> whether propagating file+offset to those call sites would make sense or
> not. struct page has the advantage that block mapping information is only
> attached to it, so when filling a hole, we can just allocate some page,
> attach it to the radix tree, use page lock for synchronization, and allocate
> blocks only after that. With pfns we cannot do this...

Right, I am thinking of the direct-I/O path's use of the page lock and
the occasions where it relies on page->mapping lookups.

Given we already have support for dynamically allocating struct page I
don't think we need to have a "pfn to lock" lookup in the initial
implementation of this locking scheme.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-10 10:32     ` Jan Kara
@ 2016-02-10 22:09       ` Dave Chinner
  -1 siblings, 0 replies; 46+ messages in thread
From: Dave Chinner @ 2016-02-10 22:09 UTC (permalink / raw)
  To: Jan Kara
  Cc: Dan Williams, Ross Zwisler, linux-kernel, linux-fsdevel,
	Linux MM, linux-nvdimm, Mel Gorman, Matthew Wilcox

On Wed, Feb 10, 2016 at 11:32:49AM +0100, Jan Kara wrote:
> On Tue 09-02-16 10:18:53, Dan Williams wrote:
> > On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
> > > Hello,
> > >
> > > I was thinking about current issues with DAX fault locking [1] (data
> > > corruption due to racing faults allocating blocks) and also races which
> > > currently don't allow us to clear dirty tags in the radix tree due to races
> > > between faults and cache flushing [2]. Both of these exist because we don't
> > > have an equivalent of page lock available for DAX. While we have a
> > > reasonable solution available for problem [1], so far I'm not aware of a
> > > decent solution for [2]. After briefly discussing the issue with Mel he had
> > > a bright idea that we could used hashed locks to deal with [2] (and I think
> > > we can solve [1] with them as well). So my proposal looks as follows:
> > >
> > > DAX will have an array of mutexes (the array can be made per device but
> > > initially a global one should be OK). We will use mutexes in the array as a
> > > replacement for page lock - we will use hashfn(mapping, index) to get
> > > particular mutex protecting our offset in the mapping. On fault / page
> > > mkwrite, we'll grab the mutex similarly to page lock and release it once we
> > > are done updating page tables. This deals with races in [1]. When flushing
> > > caches we grab the mutex before clearing writeable bit in page tables
> > > and clearing dirty bit in the radix tree and drop it after we have flushed
> > > caches for the pfn. This deals with races in [2].
> > >
> > > Thoughts?
> > >
> > 
> > I like the fact that this makes the locking explicit and
> > straightforward rather than something more tricky.  Can we make the
> > hashfn pfn based?  I'm thinking we could later reuse this as part of
> > the solution for eliminating the need to allocate struct page, and we
> > don't have the 'mapping' available in all paths...
> 
> So Mel originally suggested to use pfn for hashing as well. My concern with
> using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
> lock. What you really need to protect is a logical offset in the file to
> serialize allocation of underlying blocks, its mapping into page tables,
> and flushing the blocks out of caches. So using inode/mapping and offset
> for the hashing is easier (it isn't obvious to me we can fix hole filling
> races with pfn-based locking).

So how does that file+offset hash work when trying to lock different
ranges?  file+offset hashing to determine the lock to use only works
if we are dealing with fixed size ranges that the locks affect.
e.g. offset has 4k granularity for a single page faults, but we also
need to handle 2MB granularity for huge page faults, and IIRC 1GB
granularity for giant page faults...

What's the plan here?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-10 22:09       ` Dave Chinner
  0 siblings, 0 replies; 46+ messages in thread
From: Dave Chinner @ 2016-02-10 22:09 UTC (permalink / raw)
  To: Jan Kara
  Cc: Dan Williams, Ross Zwisler, linux-kernel, linux-fsdevel,
	Linux MM, linux-nvdimm@lists.01.org, Mel Gorman, Matthew Wilcox

On Wed, Feb 10, 2016 at 11:32:49AM +0100, Jan Kara wrote:
> On Tue 09-02-16 10:18:53, Dan Williams wrote:
> > On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
> > > Hello,
> > >
> > > I was thinking about current issues with DAX fault locking [1] (data
> > > corruption due to racing faults allocating blocks) and also races which
> > > currently don't allow us to clear dirty tags in the radix tree due to races
> > > between faults and cache flushing [2]. Both of these exist because we don't
> > > have an equivalent of page lock available for DAX. While we have a
> > > reasonable solution available for problem [1], so far I'm not aware of a
> > > decent solution for [2]. After briefly discussing the issue with Mel he had
> > > a bright idea that we could used hashed locks to deal with [2] (and I think
> > > we can solve [1] with them as well). So my proposal looks as follows:
> > >
> > > DAX will have an array of mutexes (the array can be made per device but
> > > initially a global one should be OK). We will use mutexes in the array as a
> > > replacement for page lock - we will use hashfn(mapping, index) to get
> > > particular mutex protecting our offset in the mapping. On fault / page
> > > mkwrite, we'll grab the mutex similarly to page lock and release it once we
> > > are done updating page tables. This deals with races in [1]. When flushing
> > > caches we grab the mutex before clearing writeable bit in page tables
> > > and clearing dirty bit in the radix tree and drop it after we have flushed
> > > caches for the pfn. This deals with races in [2].
> > >
> > > Thoughts?
> > >
> > 
> > I like the fact that this makes the locking explicit and
> > straightforward rather than something more tricky.  Can we make the
> > hashfn pfn based?  I'm thinking we could later reuse this as part of
> > the solution for eliminating the need to allocate struct page, and we
> > don't have the 'mapping' available in all paths...
> 
> So Mel originally suggested to use pfn for hashing as well. My concern with
> using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
> lock. What you really need to protect is a logical offset in the file to
> serialize allocation of underlying blocks, its mapping into page tables,
> and flushing the blocks out of caches. So using inode/mapping and offset
> for the hashing is easier (it isn't obvious to me we can fix hole filling
> races with pfn-based locking).

So how does that file+offset hash work when trying to lock different
ranges?  file+offset hashing to determine the lock to use only works
if we are dealing with fixed size ranges that the locks affect.
e.g. offset has 4k granularity for a single page faults, but we also
need to handle 2MB granularity for huge page faults, and IIRC 1GB
granularity for giant page faults...

What's the plan here?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-10 22:09       ` Dave Chinner
@ 2016-02-10 22:39         ` Cedric Blancher
  -1 siblings, 0 replies; 46+ messages in thread
From: Cedric Blancher @ 2016-02-10 22:39 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, Dan Williams, Ross Zwisler, linux-kernel,
	linux-fsdevel, Linux MM, linux-nvdimm, Mel Gorman,
	Matthew Wilcox

AFAIK Solaris 11 uses a sparse tree instead of a array. Solves the
scalability problem AND deals with variable page size.

Ced

On 10 February 2016 at 23:09, Dave Chinner <david@fromorbit.com> wrote:
> On Wed, Feb 10, 2016 at 11:32:49AM +0100, Jan Kara wrote:
>> On Tue 09-02-16 10:18:53, Dan Williams wrote:
>> > On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
>> > > Hello,
>> > >
>> > > I was thinking about current issues with DAX fault locking [1] (data
>> > > corruption due to racing faults allocating blocks) and also races which
>> > > currently don't allow us to clear dirty tags in the radix tree due to races
>> > > between faults and cache flushing [2]. Both of these exist because we don't
>> > > have an equivalent of page lock available for DAX. While we have a
>> > > reasonable solution available for problem [1], so far I'm not aware of a
>> > > decent solution for [2]. After briefly discussing the issue with Mel he had
>> > > a bright idea that we could used hashed locks to deal with [2] (and I think
>> > > we can solve [1] with them as well). So my proposal looks as follows:
>> > >
>> > > DAX will have an array of mutexes (the array can be made per device but
>> > > initially a global one should be OK). We will use mutexes in the array as a
>> > > replacement for page lock - we will use hashfn(mapping, index) to get
>> > > particular mutex protecting our offset in the mapping. On fault / page
>> > > mkwrite, we'll grab the mutex similarly to page lock and release it once we
>> > > are done updating page tables. This deals with races in [1]. When flushing
>> > > caches we grab the mutex before clearing writeable bit in page tables
>> > > and clearing dirty bit in the radix tree and drop it after we have flushed
>> > > caches for the pfn. This deals with races in [2].
>> > >
>> > > Thoughts?
>> > >
>> >
>> > I like the fact that this makes the locking explicit and
>> > straightforward rather than something more tricky.  Can we make the
>> > hashfn pfn based?  I'm thinking we could later reuse this as part of
>> > the solution for eliminating the need to allocate struct page, and we
>> > don't have the 'mapping' available in all paths...
>>
>> So Mel originally suggested to use pfn for hashing as well. My concern with
>> using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
>> lock. What you really need to protect is a logical offset in the file to
>> serialize allocation of underlying blocks, its mapping into page tables,
>> and flushing the blocks out of caches. So using inode/mapping and offset
>> for the hashing is easier (it isn't obvious to me we can fix hole filling
>> races with pfn-based locking).
>
> So how does that file+offset hash work when trying to lock different
> ranges?  file+offset hashing to determine the lock to use only works
> if we are dealing with fixed size ranges that the locks affect.
> e.g. offset has 4k granularity for a single page faults, but we also
> need to handle 2MB granularity for huge page faults, and IIRC 1GB
> granularity for giant page faults...
>
> What's the plan here?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Cedric Blancher <cedric.blancher@gmail.com>
Institute Pasteur

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-10 22:39         ` Cedric Blancher
  0 siblings, 0 replies; 46+ messages in thread
From: Cedric Blancher @ 2016-02-10 22:39 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, Dan Williams, Ross Zwisler, linux-kernel,
	linux-fsdevel, Linux MM, linux-nvdimm@lists.01.org, Mel Gorman,
	Matthew Wilcox

AFAIK Solaris 11 uses a sparse tree instead of a array. Solves the
scalability problem AND deals with variable page size.

Ced

On 10 February 2016 at 23:09, Dave Chinner <david@fromorbit.com> wrote:
> On Wed, Feb 10, 2016 at 11:32:49AM +0100, Jan Kara wrote:
>> On Tue 09-02-16 10:18:53, Dan Williams wrote:
>> > On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
>> > > Hello,
>> > >
>> > > I was thinking about current issues with DAX fault locking [1] (data
>> > > corruption due to racing faults allocating blocks) and also races which
>> > > currently don't allow us to clear dirty tags in the radix tree due to races
>> > > between faults and cache flushing [2]. Both of these exist because we don't
>> > > have an equivalent of page lock available for DAX. While we have a
>> > > reasonable solution available for problem [1], so far I'm not aware of a
>> > > decent solution for [2]. After briefly discussing the issue with Mel he had
>> > > a bright idea that we could used hashed locks to deal with [2] (and I think
>> > > we can solve [1] with them as well). So my proposal looks as follows:
>> > >
>> > > DAX will have an array of mutexes (the array can be made per device but
>> > > initially a global one should be OK). We will use mutexes in the array as a
>> > > replacement for page lock - we will use hashfn(mapping, index) to get
>> > > particular mutex protecting our offset in the mapping. On fault / page
>> > > mkwrite, we'll grab the mutex similarly to page lock and release it once we
>> > > are done updating page tables. This deals with races in [1]. When flushing
>> > > caches we grab the mutex before clearing writeable bit in page tables
>> > > and clearing dirty bit in the radix tree and drop it after we have flushed
>> > > caches for the pfn. This deals with races in [2].
>> > >
>> > > Thoughts?
>> > >
>> >
>> > I like the fact that this makes the locking explicit and
>> > straightforward rather than something more tricky.  Can we make the
>> > hashfn pfn based?  I'm thinking we could later reuse this as part of
>> > the solution for eliminating the need to allocate struct page, and we
>> > don't have the 'mapping' available in all paths...
>>
>> So Mel originally suggested to use pfn for hashing as well. My concern with
>> using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
>> lock. What you really need to protect is a logical offset in the file to
>> serialize allocation of underlying blocks, its mapping into page tables,
>> and flushing the blocks out of caches. So using inode/mapping and offset
>> for the hashing is easier (it isn't obvious to me we can fix hole filling
>> races with pfn-based locking).
>
> So how does that file+offset hash work when trying to lock different
> ranges?  file+offset hashing to determine the lock to use only works
> if we are dealing with fixed size ranges that the locks affect.
> e.g. offset has 4k granularity for a single page faults, but we also
> need to handle 2MB granularity for huge page faults, and IIRC 1GB
> granularity for giant page faults...
>
> What's the plan here?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Cedric Blancher <cedric.blancher@gmail.com>
Institute Pasteur

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-10 22:09       ` Dave Chinner
@ 2016-02-10 23:32         ` Ross Zwisler
  -1 siblings, 0 replies; 46+ messages in thread
From: Ross Zwisler @ 2016-02-10 23:32 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, Dan Williams, Ross Zwisler, linux-kernel,
	linux-fsdevel, Linux MM, linux-nvdimm, Mel Gorman,
	Matthew Wilcox

On Thu, Feb 11, 2016 at 09:09:53AM +1100, Dave Chinner wrote:
> On Wed, Feb 10, 2016 at 11:32:49AM +0100, Jan Kara wrote:
> > On Tue 09-02-16 10:18:53, Dan Williams wrote:
> > > On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
> > > > Hello,
> > > >
> > > > I was thinking about current issues with DAX fault locking [1] (data
> > > > corruption due to racing faults allocating blocks) and also races which
> > > > currently don't allow us to clear dirty tags in the radix tree due to races
> > > > between faults and cache flushing [2]. Both of these exist because we don't
> > > > have an equivalent of page lock available for DAX. While we have a
> > > > reasonable solution available for problem [1], so far I'm not aware of a
> > > > decent solution for [2]. After briefly discussing the issue with Mel he had
> > > > a bright idea that we could used hashed locks to deal with [2] (and I think
> > > > we can solve [1] with them as well). So my proposal looks as follows:
> > > >
> > > > DAX will have an array of mutexes (the array can be made per device but
> > > > initially a global one should be OK). We will use mutexes in the array as a
> > > > replacement for page lock - we will use hashfn(mapping, index) to get
> > > > particular mutex protecting our offset in the mapping. On fault / page
> > > > mkwrite, we'll grab the mutex similarly to page lock and release it once we
> > > > are done updating page tables. This deals with races in [1]. When flushing
> > > > caches we grab the mutex before clearing writeable bit in page tables
> > > > and clearing dirty bit in the radix tree and drop it after we have flushed
> > > > caches for the pfn. This deals with races in [2].
> > > >
> > > > Thoughts?
> > > >
> > > 
> > > I like the fact that this makes the locking explicit and
> > > straightforward rather than something more tricky.  Can we make the
> > > hashfn pfn based?  I'm thinking we could later reuse this as part of
> > > the solution for eliminating the need to allocate struct page, and we
> > > don't have the 'mapping' available in all paths...
> > 
> > So Mel originally suggested to use pfn for hashing as well. My concern with
> > using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
> > lock. What you really need to protect is a logical offset in the file to
> > serialize allocation of underlying blocks, its mapping into page tables,
> > and flushing the blocks out of caches. So using inode/mapping and offset
> > for the hashing is easier (it isn't obvious to me we can fix hole filling
> > races with pfn-based locking).
> 
> So how does that file+offset hash work when trying to lock different
> ranges?  file+offset hashing to determine the lock to use only works
> if we are dealing with fixed size ranges that the locks affect.
> e.g. offset has 4k granularity for a single page faults, but we also
> need to handle 2MB granularity for huge page faults, and IIRC 1GB
> granularity for giant page faults...
> 
> What's the plan here?

I wonder if it makes sense to tie the locking in with the radix tree?
Meaning, instead of having an array of mutexes, we lock based on the radix
tree entry.

Right now we already have to check for PTE and PMD entries in the radix tree,
and with Matthew's suggested radix tree changes a lookup of a random address
would give you the appropriate PMD or PUD entry, if one was present.

This sort of solves the need for having a hash function that works on
file+offset - that's all already there when using the radix tree...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-10 23:32         ` Ross Zwisler
  0 siblings, 0 replies; 46+ messages in thread
From: Ross Zwisler @ 2016-02-10 23:32 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, Dan Williams, Ross Zwisler, linux-kernel,
	linux-fsdevel, Linux MM, linux-nvdimm@lists.01.org, Mel Gorman,
	Matthew Wilcox

On Thu, Feb 11, 2016 at 09:09:53AM +1100, Dave Chinner wrote:
> On Wed, Feb 10, 2016 at 11:32:49AM +0100, Jan Kara wrote:
> > On Tue 09-02-16 10:18:53, Dan Williams wrote:
> > > On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
> > > > Hello,
> > > >
> > > > I was thinking about current issues with DAX fault locking [1] (data
> > > > corruption due to racing faults allocating blocks) and also races which
> > > > currently don't allow us to clear dirty tags in the radix tree due to races
> > > > between faults and cache flushing [2]. Both of these exist because we don't
> > > > have an equivalent of page lock available for DAX. While we have a
> > > > reasonable solution available for problem [1], so far I'm not aware of a
> > > > decent solution for [2]. After briefly discussing the issue with Mel he had
> > > > a bright idea that we could used hashed locks to deal with [2] (and I think
> > > > we can solve [1] with them as well). So my proposal looks as follows:
> > > >
> > > > DAX will have an array of mutexes (the array can be made per device but
> > > > initially a global one should be OK). We will use mutexes in the array as a
> > > > replacement for page lock - we will use hashfn(mapping, index) to get
> > > > particular mutex protecting our offset in the mapping. On fault / page
> > > > mkwrite, we'll grab the mutex similarly to page lock and release it once we
> > > > are done updating page tables. This deals with races in [1]. When flushing
> > > > caches we grab the mutex before clearing writeable bit in page tables
> > > > and clearing dirty bit in the radix tree and drop it after we have flushed
> > > > caches for the pfn. This deals with races in [2].
> > > >
> > > > Thoughts?
> > > >
> > > 
> > > I like the fact that this makes the locking explicit and
> > > straightforward rather than something more tricky.  Can we make the
> > > hashfn pfn based?  I'm thinking we could later reuse this as part of
> > > the solution for eliminating the need to allocate struct page, and we
> > > don't have the 'mapping' available in all paths...
> > 
> > So Mel originally suggested to use pfn for hashing as well. My concern with
> > using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
> > lock. What you really need to protect is a logical offset in the file to
> > serialize allocation of underlying blocks, its mapping into page tables,
> > and flushing the blocks out of caches. So using inode/mapping and offset
> > for the hashing is easier (it isn't obvious to me we can fix hole filling
> > races with pfn-based locking).
> 
> So how does that file+offset hash work when trying to lock different
> ranges?  file+offset hashing to determine the lock to use only works
> if we are dealing with fixed size ranges that the locks affect.
> e.g. offset has 4k granularity for a single page faults, but we also
> need to handle 2MB granularity for huge page faults, and IIRC 1GB
> granularity for giant page faults...
> 
> What's the plan here?

I wonder if it makes sense to tie the locking in with the radix tree?
Meaning, instead of having an array of mutexes, we lock based on the radix
tree entry.

Right now we already have to check for PTE and PMD entries in the radix tree,
and with Matthew's suggested radix tree changes a lookup of a random address
would give you the appropriate PMD or PUD entry, if one was present.

This sort of solves the need for having a hash function that works on
file+offset - that's all already there when using the radix tree...

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-10 22:39         ` Cedric Blancher
@ 2016-02-10 23:34           ` Ross Zwisler
  -1 siblings, 0 replies; 46+ messages in thread
From: Ross Zwisler @ 2016-02-10 23:34 UTC (permalink / raw)
  To: Cedric Blancher
  Cc: Dave Chinner, Jan Kara, Dan Williams, Ross Zwisler, linux-kernel,
	linux-fsdevel, Linux MM, linux-nvdimm, Mel Gorman,
	Matthew Wilcox

On Wed, Feb 10, 2016 at 11:39:43PM +0100, Cedric Blancher wrote:
> AFAIK Solaris 11 uses a sparse tree instead of a array. Solves the
> scalability problem AND deals with variable page size.

Right - seems like tying the radix tree into the locking instead of using an
array would have these same benefits.

> On 10 February 2016 at 23:09, Dave Chinner <david@fromorbit.com> wrote:
> > On Wed, Feb 10, 2016 at 11:32:49AM +0100, Jan Kara wrote:
> >> On Tue 09-02-16 10:18:53, Dan Williams wrote:
> >> > On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
> >> > > Hello,
> >> > >
> >> > > I was thinking about current issues with DAX fault locking [1] (data
> >> > > corruption due to racing faults allocating blocks) and also races which
> >> > > currently don't allow us to clear dirty tags in the radix tree due to races
> >> > > between faults and cache flushing [2]. Both of these exist because we don't
> >> > > have an equivalent of page lock available for DAX. While we have a
> >> > > reasonable solution available for problem [1], so far I'm not aware of a
> >> > > decent solution for [2]. After briefly discussing the issue with Mel he had
> >> > > a bright idea that we could used hashed locks to deal with [2] (and I think
> >> > > we can solve [1] with them as well). So my proposal looks as follows:
> >> > >
> >> > > DAX will have an array of mutexes (the array can be made per device but
> >> > > initially a global one should be OK). We will use mutexes in the array as a
> >> > > replacement for page lock - we will use hashfn(mapping, index) to get
> >> > > particular mutex protecting our offset in the mapping. On fault / page
> >> > > mkwrite, we'll grab the mutex similarly to page lock and release it once we
> >> > > are done updating page tables. This deals with races in [1]. When flushing
> >> > > caches we grab the mutex before clearing writeable bit in page tables
> >> > > and clearing dirty bit in the radix tree and drop it after we have flushed
> >> > > caches for the pfn. This deals with races in [2].
> >> > >
> >> > > Thoughts?
> >> > >
> >> >
> >> > I like the fact that this makes the locking explicit and
> >> > straightforward rather than something more tricky.  Can we make the
> >> > hashfn pfn based?  I'm thinking we could later reuse this as part of
> >> > the solution for eliminating the need to allocate struct page, and we
> >> > don't have the 'mapping' available in all paths...
> >>
> >> So Mel originally suggested to use pfn for hashing as well. My concern with
> >> using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
> >> lock. What you really need to protect is a logical offset in the file to
> >> serialize allocation of underlying blocks, its mapping into page tables,
> >> and flushing the blocks out of caches. So using inode/mapping and offset
> >> for the hashing is easier (it isn't obvious to me we can fix hole filling
> >> races with pfn-based locking).
> >
> > So how does that file+offset hash work when trying to lock different
> > ranges?  file+offset hashing to determine the lock to use only works
> > if we are dealing with fixed size ranges that the locks affect.
> > e.g. offset has 4k granularity for a single page faults, but we also
> > need to handle 2MB granularity for huge page faults, and IIRC 1GB
> > granularity for giant page faults...
> >
> > What's the plan here?
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> -- 
> Cedric Blancher <cedric.blancher@gmail.com>
> Institute Pasteur

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-10 23:34           ` Ross Zwisler
  0 siblings, 0 replies; 46+ messages in thread
From: Ross Zwisler @ 2016-02-10 23:34 UTC (permalink / raw)
  To: Cedric Blancher
  Cc: Dave Chinner, Jan Kara, Dan Williams, Ross Zwisler, linux-kernel,
	linux-fsdevel, Linux MM, linux-nvdimm@lists.01.org, Mel Gorman,
	Matthew Wilcox

On Wed, Feb 10, 2016 at 11:39:43PM +0100, Cedric Blancher wrote:
> AFAIK Solaris 11 uses a sparse tree instead of a array. Solves the
> scalability problem AND deals with variable page size.

Right - seems like tying the radix tree into the locking instead of using an
array would have these same benefits.

> On 10 February 2016 at 23:09, Dave Chinner <david@fromorbit.com> wrote:
> > On Wed, Feb 10, 2016 at 11:32:49AM +0100, Jan Kara wrote:
> >> On Tue 09-02-16 10:18:53, Dan Williams wrote:
> >> > On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
> >> > > Hello,
> >> > >
> >> > > I was thinking about current issues with DAX fault locking [1] (data
> >> > > corruption due to racing faults allocating blocks) and also races which
> >> > > currently don't allow us to clear dirty tags in the radix tree due to races
> >> > > between faults and cache flushing [2]. Both of these exist because we don't
> >> > > have an equivalent of page lock available for DAX. While we have a
> >> > > reasonable solution available for problem [1], so far I'm not aware of a
> >> > > decent solution for [2]. After briefly discussing the issue with Mel he had
> >> > > a bright idea that we could used hashed locks to deal with [2] (and I think
> >> > > we can solve [1] with them as well). So my proposal looks as follows:
> >> > >
> >> > > DAX will have an array of mutexes (the array can be made per device but
> >> > > initially a global one should be OK). We will use mutexes in the array as a
> >> > > replacement for page lock - we will use hashfn(mapping, index) to get
> >> > > particular mutex protecting our offset in the mapping. On fault / page
> >> > > mkwrite, we'll grab the mutex similarly to page lock and release it once we
> >> > > are done updating page tables. This deals with races in [1]. When flushing
> >> > > caches we grab the mutex before clearing writeable bit in page tables
> >> > > and clearing dirty bit in the radix tree and drop it after we have flushed
> >> > > caches for the pfn. This deals with races in [2].
> >> > >
> >> > > Thoughts?
> >> > >
> >> >
> >> > I like the fact that this makes the locking explicit and
> >> > straightforward rather than something more tricky.  Can we make the
> >> > hashfn pfn based?  I'm thinking we could later reuse this as part of
> >> > the solution for eliminating the need to allocate struct page, and we
> >> > don't have the 'mapping' available in all paths...
> >>
> >> So Mel originally suggested to use pfn for hashing as well. My concern with
> >> using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
> >> lock. What you really need to protect is a logical offset in the file to
> >> serialize allocation of underlying blocks, its mapping into page tables,
> >> and flushing the blocks out of caches. So using inode/mapping and offset
> >> for the hashing is easier (it isn't obvious to me we can fix hole filling
> >> races with pfn-based locking).
> >
> > So how does that file+offset hash work when trying to lock different
> > ranges?  file+offset hashing to determine the lock to use only works
> > if we are dealing with fixed size ranges that the locks affect.
> > e.g. offset has 4k granularity for a single page faults, but we also
> > need to handle 2MB granularity for huge page faults, and IIRC 1GB
> > granularity for giant page faults...
> >
> > What's the plan here?
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> -- 
> Cedric Blancher <cedric.blancher@gmail.com>
> Institute Pasteur

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-09 17:24 ` Jan Kara
@ 2016-02-10 23:44   ` Ross Zwisler
  -1 siblings, 0 replies; 46+ messages in thread
From: Ross Zwisler @ 2016-02-10 23:44 UTC (permalink / raw)
  To: Jan Kara
  Cc: Ross Zwisler, linux-kernel, Dave Chinner, linux-fsdevel,
	linux-mm, Dan Williams, linux-nvdimm, mgorman, Matthew Wilcox

On Tue, Feb 09, 2016 at 06:24:16PM +0100, Jan Kara wrote:
> Hello,
> 
> I was thinking about current issues with DAX fault locking [1] (data
> corruption due to racing faults allocating blocks) and also races which
> currently don't allow us to clear dirty tags in the radix tree due to races
> between faults and cache flushing [2]. Both of these exist because we don't
> have an equivalent of page lock available for DAX. While we have a
> reasonable solution available for problem [1], so far I'm not aware of a
> decent solution for [2]. After briefly discussing the issue with Mel he had
> a bright idea that we could used hashed locks to deal with [2] (and I think
> we can solve [1] with them as well). So my proposal looks as follows:
> 
> DAX will have an array of mutexes (the array can be made per device but
> initially a global one should be OK). We will use mutexes in the array as a
> replacement for page lock - we will use hashfn(mapping, index) to get
> particular mutex protecting our offset in the mapping. On fault / page
> mkwrite, we'll grab the mutex similarly to page lock and release it once we
> are done updating page tables. This deals with races in [1]. When flushing
> caches we grab the mutex before clearing writeable bit in page tables
> and clearing dirty bit in the radix tree and drop it after we have flushed
> caches for the pfn. This deals with races in [2].
> 
> Thoughts?
> 
> 								Honza
> 
> [1] http://oss.sgi.com/archives/xfs/2016-01/msg00575.html
> [2] https://lists.01.org/pipermail/linux-nvdimm/2016-January/004057.html

Overall I think this sounds promising.  I think a potential tie-in with the
radix tree would maybe take us in a good direction.

I had another idea of how to solve race #2 that involved sticking a seqlock
around the DAX radix tree + pte_mkwrite() sequence, and on the flushing side
if you noticed that you've raced against a page fault, just leaving the dirty
page tree entry intact.

I *think* this could work - I'd want to bang on it more - but if we have a
general way of handling DAX locking that we can use instead of solving these
issues one-by-one as they come up, that seems like a much better route.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-10 23:44   ` Ross Zwisler
  0 siblings, 0 replies; 46+ messages in thread
From: Ross Zwisler @ 2016-02-10 23:44 UTC (permalink / raw)
  To: Jan Kara
  Cc: Ross Zwisler, linux-kernel, Dave Chinner, linux-fsdevel,
	linux-mm, Dan Williams, linux-nvdimm, mgorman, Matthew Wilcox

On Tue, Feb 09, 2016 at 06:24:16PM +0100, Jan Kara wrote:
> Hello,
> 
> I was thinking about current issues with DAX fault locking [1] (data
> corruption due to racing faults allocating blocks) and also races which
> currently don't allow us to clear dirty tags in the radix tree due to races
> between faults and cache flushing [2]. Both of these exist because we don't
> have an equivalent of page lock available for DAX. While we have a
> reasonable solution available for problem [1], so far I'm not aware of a
> decent solution for [2]. After briefly discussing the issue with Mel he had
> a bright idea that we could used hashed locks to deal with [2] (and I think
> we can solve [1] with them as well). So my proposal looks as follows:
> 
> DAX will have an array of mutexes (the array can be made per device but
> initially a global one should be OK). We will use mutexes in the array as a
> replacement for page lock - we will use hashfn(mapping, index) to get
> particular mutex protecting our offset in the mapping. On fault / page
> mkwrite, we'll grab the mutex similarly to page lock and release it once we
> are done updating page tables. This deals with races in [1]. When flushing
> caches we grab the mutex before clearing writeable bit in page tables
> and clearing dirty bit in the radix tree and drop it after we have flushed
> caches for the pfn. This deals with races in [2].
> 
> Thoughts?
> 
> 								Honza
> 
> [1] http://oss.sgi.com/archives/xfs/2016-01/msg00575.html
> [2] https://lists.01.org/pipermail/linux-nvdimm/2016-January/004057.html

Overall I think this sounds promising.  I think a potential tie-in with the
radix tree would maybe take us in a good direction.

I had another idea of how to solve race #2 that involved sticking a seqlock
around the DAX radix tree + pte_mkwrite() sequence, and on the flushing side
if you noticed that you've raced against a page fault, just leaving the dirty
page tree entry intact.

I *think* this could work - I'd want to bang on it more - but if we have a
general way of handling DAX locking that we can use instead of solving these
issues one-by-one as they come up, that seems like a much better route.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-10 23:44   ` Ross Zwisler
@ 2016-02-10 23:51     ` Cedric Blancher
  -1 siblings, 0 replies; 46+ messages in thread
From: Cedric Blancher @ 2016-02-10 23:51 UTC (permalink / raw)
  To: Ross Zwisler, Jan Kara, Linux Kernel Mailing List, Dave Chinner,
	linux-fsdevel, Linux MM, Dan Williams, linux-nvdimm, Mel Gorman,
	Matthew Wilcox

There is another "twist" in this game: If there is a huge page with
1GB with a small 4k page as "overlay" (e.g. mmap() MAP_FIXED somewhere
in the middle of a 1GB huge page), hows that handled?

Ced

On 11 February 2016 at 00:44, Ross Zwisler <ross.zwisler@linux.intel.com> wrote:
> On Tue, Feb 09, 2016 at 06:24:16PM +0100, Jan Kara wrote:
>> Hello,
>>
>> I was thinking about current issues with DAX fault locking [1] (data
>> corruption due to racing faults allocating blocks) and also races which
>> currently don't allow us to clear dirty tags in the radix tree due to races
>> between faults and cache flushing [2]. Both of these exist because we don't
>> have an equivalent of page lock available for DAX. While we have a
>> reasonable solution available for problem [1], so far I'm not aware of a
>> decent solution for [2]. After briefly discussing the issue with Mel he had
>> a bright idea that we could used hashed locks to deal with [2] (and I think
>> we can solve [1] with them as well). So my proposal looks as follows:
>>
>> DAX will have an array of mutexes (the array can be made per device but
>> initially a global one should be OK). We will use mutexes in the array as a
>> replacement for page lock - we will use hashfn(mapping, index) to get
>> particular mutex protecting our offset in the mapping. On fault / page
>> mkwrite, we'll grab the mutex similarly to page lock and release it once we
>> are done updating page tables. This deals with races in [1]. When flushing
>> caches we grab the mutex before clearing writeable bit in page tables
>> and clearing dirty bit in the radix tree and drop it after we have flushed
>> caches for the pfn. This deals with races in [2].
>>
>> Thoughts?
>>
>>                                                               Honza
>>
>> [1] http://oss.sgi.com/archives/xfs/2016-01/msg00575.html
>> [2] https://lists.01.org/pipermail/linux-nvdimm/2016-January/004057.html
>
> Overall I think this sounds promising.  I think a potential tie-in with the
> radix tree would maybe take us in a good direction.
>
> I had another idea of how to solve race #2 that involved sticking a seqlock
> around the DAX radix tree + pte_mkwrite() sequence, and on the flushing side
> if you noticed that you've raced against a page fault, just leaving the dirty
> page tree entry intact.
>
> I *think* this could work - I'd want to bang on it more - but if we have a
> general way of handling DAX locking that we can use instead of solving these
> issues one-by-one as they come up, that seems like a much better route.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Cedric Blancher <cedric.blancher@gmail.com>
Institute Pasteur

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-10 23:51     ` Cedric Blancher
  0 siblings, 0 replies; 46+ messages in thread
From: Cedric Blancher @ 2016-02-10 23:51 UTC (permalink / raw)
  To: Ross Zwisler, Jan Kara, Linux Kernel Mailing List, Dave Chinner,
	linux-fsdevel, Linux MM, Dan Williams, linux-nvdimm, Mel Gorman,
	Matthew Wilcox

There is another "twist" in this game: If there is a huge page with
1GB with a small 4k page as "overlay" (e.g. mmap() MAP_FIXED somewhere
in the middle of a 1GB huge page), hows that handled?

Ced

On 11 February 2016 at 00:44, Ross Zwisler <ross.zwisler@linux.intel.com> wrote:
> On Tue, Feb 09, 2016 at 06:24:16PM +0100, Jan Kara wrote:
>> Hello,
>>
>> I was thinking about current issues with DAX fault locking [1] (data
>> corruption due to racing faults allocating blocks) and also races which
>> currently don't allow us to clear dirty tags in the radix tree due to races
>> between faults and cache flushing [2]. Both of these exist because we don't
>> have an equivalent of page lock available for DAX. While we have a
>> reasonable solution available for problem [1], so far I'm not aware of a
>> decent solution for [2]. After briefly discussing the issue with Mel he had
>> a bright idea that we could used hashed locks to deal with [2] (and I think
>> we can solve [1] with them as well). So my proposal looks as follows:
>>
>> DAX will have an array of mutexes (the array can be made per device but
>> initially a global one should be OK). We will use mutexes in the array as a
>> replacement for page lock - we will use hashfn(mapping, index) to get
>> particular mutex protecting our offset in the mapping. On fault / page
>> mkwrite, we'll grab the mutex similarly to page lock and release it once we
>> are done updating page tables. This deals with races in [1]. When flushing
>> caches we grab the mutex before clearing writeable bit in page tables
>> and clearing dirty bit in the radix tree and drop it after we have flushed
>> caches for the pfn. This deals with races in [2].
>>
>> Thoughts?
>>
>>                                                               Honza
>>
>> [1] http://oss.sgi.com/archives/xfs/2016-01/msg00575.html
>> [2] https://lists.01.org/pipermail/linux-nvdimm/2016-January/004057.html
>
> Overall I think this sounds promising.  I think a potential tie-in with the
> radix tree would maybe take us in a good direction.
>
> I had another idea of how to solve race #2 that involved sticking a seqlock
> around the DAX radix tree + pte_mkwrite() sequence, and on the flushing side
> if you noticed that you've raced against a page fault, just leaving the dirty
> page tree entry intact.
>
> I *think* this could work - I'd want to bang on it more - but if we have a
> general way of handling DAX locking that we can use instead of solving these
> issues one-by-one as they come up, that seems like a much better route.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Cedric Blancher <cedric.blancher@gmail.com>
Institute Pasteur

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-10 23:51     ` Cedric Blancher
@ 2016-02-11  0:13       ` Ross Zwisler
  -1 siblings, 0 replies; 46+ messages in thread
From: Ross Zwisler @ 2016-02-11  0:13 UTC (permalink / raw)
  To: Cedric Blancher
  Cc: Ross Zwisler, Jan Kara, Linux Kernel Mailing List, Dave Chinner,
	linux-fsdevel, Linux MM, Dan Williams, linux-nvdimm, Mel Gorman,
	Matthew Wilcox

On Thu, Feb 11, 2016 at 12:51:05AM +0100, Cedric Blancher wrote:
> There is another "twist" in this game: If there is a huge page with
> 1GB with a small 4k page as "overlay" (e.g. mmap() MAP_FIXED somewhere
> in the middle of a 1GB huge page), hows that handled?

Ugh - I'm pretty sure we haven't touched overlays with DAX at all.

The man page says this:

  If the memory region specified by addr and len overlaps pages of any
  existing mapping(s), then the overlapped part of the existing mapping(s)
  will be discarded.

I wonder if this would translate into a hole punch for our DAX mapping,
whatever size it may be, plus an insert?

If so, it seems like we just need to handle each of those operations correctly
on their own (hole punch, insert), and things will take care of themselves?

That being said, I know for a fact that PMD hole punch is currently broken.

> On 11 February 2016 at 00:44, Ross Zwisler <ross.zwisler@linux.intel.com> wrote:
> > On Tue, Feb 09, 2016 at 06:24:16PM +0100, Jan Kara wrote:
> >> Hello,
> >>
> >> I was thinking about current issues with DAX fault locking [1] (data
> >> corruption due to racing faults allocating blocks) and also races which
> >> currently don't allow us to clear dirty tags in the radix tree due to races
> >> between faults and cache flushing [2]. Both of these exist because we don't
> >> have an equivalent of page lock available for DAX. While we have a
> >> reasonable solution available for problem [1], so far I'm not aware of a
> >> decent solution for [2]. After briefly discussing the issue with Mel he had
> >> a bright idea that we could used hashed locks to deal with [2] (and I think
> >> we can solve [1] with them as well). So my proposal looks as follows:
> >>
> >> DAX will have an array of mutexes (the array can be made per device but
> >> initially a global one should be OK). We will use mutexes in the array as a
> >> replacement for page lock - we will use hashfn(mapping, index) to get
> >> particular mutex protecting our offset in the mapping. On fault / page
> >> mkwrite, we'll grab the mutex similarly to page lock and release it once we
> >> are done updating page tables. This deals with races in [1]. When flushing
> >> caches we grab the mutex before clearing writeable bit in page tables
> >> and clearing dirty bit in the radix tree and drop it after we have flushed
> >> caches for the pfn. This deals with races in [2].
> >>
> >> Thoughts?
> >>
> >>                                                               Honza
> >>
> >> [1] http://oss.sgi.com/archives/xfs/2016-01/msg00575.html
> >> [2] https://lists.01.org/pipermail/linux-nvdimm/2016-January/004057.html
> >
> > Overall I think this sounds promising.  I think a potential tie-in with the
> > radix tree would maybe take us in a good direction.
> >
> > I had another idea of how to solve race #2 that involved sticking a seqlock
> > around the DAX radix tree + pte_mkwrite() sequence, and on the flushing side
> > if you noticed that you've raced against a page fault, just leaving the dirty
> > page tree entry intact.
> >
> > I *think* this could work - I'd want to bang on it more - but if we have a
> > general way of handling DAX locking that we can use instead of solving these
> > issues one-by-one as they come up, that seems like a much better route.
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> -- 
> Cedric Blancher <cedric.blancher@gmail.com>
> Institute Pasteur

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-11  0:13       ` Ross Zwisler
  0 siblings, 0 replies; 46+ messages in thread
From: Ross Zwisler @ 2016-02-11  0:13 UTC (permalink / raw)
  To: Cedric Blancher
  Cc: Ross Zwisler, Jan Kara, Linux Kernel Mailing List, Dave Chinner,
	linux-fsdevel, Linux MM, Dan Williams, linux-nvdimm, Mel Gorman,
	Matthew Wilcox

On Thu, Feb 11, 2016 at 12:51:05AM +0100, Cedric Blancher wrote:
> There is another "twist" in this game: If there is a huge page with
> 1GB with a small 4k page as "overlay" (e.g. mmap() MAP_FIXED somewhere
> in the middle of a 1GB huge page), hows that handled?

Ugh - I'm pretty sure we haven't touched overlays with DAX at all.

The man page says this:

  If the memory region specified by addr and len overlaps pages of any
  existing mapping(s), then the overlapped part of the existing mapping(s)
  will be discarded.

I wonder if this would translate into a hole punch for our DAX mapping,
whatever size it may be, plus an insert?

If so, it seems like we just need to handle each of those operations correctly
on their own (hole punch, insert), and things will take care of themselves?

That being said, I know for a fact that PMD hole punch is currently broken.

> On 11 February 2016 at 00:44, Ross Zwisler <ross.zwisler@linux.intel.com> wrote:
> > On Tue, Feb 09, 2016 at 06:24:16PM +0100, Jan Kara wrote:
> >> Hello,
> >>
> >> I was thinking about current issues with DAX fault locking [1] (data
> >> corruption due to racing faults allocating blocks) and also races which
> >> currently don't allow us to clear dirty tags in the radix tree due to races
> >> between faults and cache flushing [2]. Both of these exist because we don't
> >> have an equivalent of page lock available for DAX. While we have a
> >> reasonable solution available for problem [1], so far I'm not aware of a
> >> decent solution for [2]. After briefly discussing the issue with Mel he had
> >> a bright idea that we could used hashed locks to deal with [2] (and I think
> >> we can solve [1] with them as well). So my proposal looks as follows:
> >>
> >> DAX will have an array of mutexes (the array can be made per device but
> >> initially a global one should be OK). We will use mutexes in the array as a
> >> replacement for page lock - we will use hashfn(mapping, index) to get
> >> particular mutex protecting our offset in the mapping. On fault / page
> >> mkwrite, we'll grab the mutex similarly to page lock and release it once we
> >> are done updating page tables. This deals with races in [1]. When flushing
> >> caches we grab the mutex before clearing writeable bit in page tables
> >> and clearing dirty bit in the radix tree and drop it after we have flushed
> >> caches for the pfn. This deals with races in [2].
> >>
> >> Thoughts?
> >>
> >>                                                               Honza
> >>
> >> [1] http://oss.sgi.com/archives/xfs/2016-01/msg00575.html
> >> [2] https://lists.01.org/pipermail/linux-nvdimm/2016-January/004057.html
> >
> > Overall I think this sounds promising.  I think a potential tie-in with the
> > radix tree would maybe take us in a good direction.
> >
> > I had another idea of how to solve race #2 that involved sticking a seqlock
> > around the DAX radix tree + pte_mkwrite() sequence, and on the flushing side
> > if you noticed that you've raced against a page fault, just leaving the dirty
> > page tree entry intact.
> >
> > I *think* this could work - I'd want to bang on it more - but if we have a
> > general way of handling DAX locking that we can use instead of solving these
> > issues one-by-one as they come up, that seems like a much better route.
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> -- 
> Cedric Blancher <cedric.blancher@gmail.com>
> Institute Pasteur

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-10 17:38   ` Boaz Harrosh
@ 2016-02-11 10:38     ` Jan Kara
  -1 siblings, 0 replies; 46+ messages in thread
From: Jan Kara @ 2016-02-11 10:38 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Jan Kara, Ross Zwisler, linux-nvdimm, Dave Chinner, linux-kernel,
	linux-mm, mgorman, linux-fsdevel

On Wed 10-02-16 19:38:21, Boaz Harrosh wrote:
> On 02/09/2016 07:24 PM, Jan Kara wrote:
> > Hello,
> > 
> > I was thinking about current issues with DAX fault locking [1] (data
> > corruption due to racing faults allocating blocks) and also races which
> > currently don't allow us to clear dirty tags in the radix tree due to races
> > between faults and cache flushing [2]. Both of these exist because we don't
> > have an equivalent of page lock available for DAX. While we have a
> > reasonable solution available for problem [1], so far I'm not aware of a
> > decent solution for [2]. After briefly discussing the issue with Mel he had
> > a bright idea that we could used hashed locks to deal with [2] (and I think
> > we can solve [1] with them as well). So my proposal looks as follows:
> > 
> > DAX will have an array of mutexes (the array can be made per device but
> > initially a global one should be OK). We will use mutexes in the array as a
> > replacement for page lock - we will use hashfn(mapping, index) to get
> > particular mutex protecting our offset in the mapping. On fault / page
> > mkwrite, we'll grab the mutex similarly to page lock and release it once we
> > are done updating page tables. This deals with races in [1]. When flushing
> > caches we grab the mutex before clearing writeable bit in page tables
> > and clearing dirty bit in the radix tree and drop it after we have flushed
> > caches for the pfn. This deals with races in [2].
> > 
> > Thoughts?
> > 
> 
> You could also use one of the radix-tree's special-bits as a bit lock.
> So no need for any extra allocations.

Yes and I've suggested that once as well. But since we need sleeping
locks, you need some wait queues somewhere as well. So some allocations are
going to be needed anyway. And mutexes have much better properties than
bit-locks so I prefer mutexes over cramming bit locks into radix tree. Plus
you'd have to be careful so that someone doesn't remove the bit from the
radix tree while you are working with it.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-11 10:38     ` Jan Kara
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Kara @ 2016-02-11 10:38 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Jan Kara, Ross Zwisler, linux-nvdimm, Dave Chinner, linux-kernel,
	linux-mm, mgorman, linux-fsdevel

On Wed 10-02-16 19:38:21, Boaz Harrosh wrote:
> On 02/09/2016 07:24 PM, Jan Kara wrote:
> > Hello,
> > 
> > I was thinking about current issues with DAX fault locking [1] (data
> > corruption due to racing faults allocating blocks) and also races which
> > currently don't allow us to clear dirty tags in the radix tree due to races
> > between faults and cache flushing [2]. Both of these exist because we don't
> > have an equivalent of page lock available for DAX. While we have a
> > reasonable solution available for problem [1], so far I'm not aware of a
> > decent solution for [2]. After briefly discussing the issue with Mel he had
> > a bright idea that we could used hashed locks to deal with [2] (and I think
> > we can solve [1] with them as well). So my proposal looks as follows:
> > 
> > DAX will have an array of mutexes (the array can be made per device but
> > initially a global one should be OK). We will use mutexes in the array as a
> > replacement for page lock - we will use hashfn(mapping, index) to get
> > particular mutex protecting our offset in the mapping. On fault / page
> > mkwrite, we'll grab the mutex similarly to page lock and release it once we
> > are done updating page tables. This deals with races in [1]. When flushing
> > caches we grab the mutex before clearing writeable bit in page tables
> > and clearing dirty bit in the radix tree and drop it after we have flushed
> > caches for the pfn. This deals with races in [2].
> > 
> > Thoughts?
> > 
> 
> You could also use one of the radix-tree's special-bits as a bit lock.
> So no need for any extra allocations.

Yes and I've suggested that once as well. But since we need sleeping
locks, you need some wait queues somewhere as well. So some allocations are
going to be needed anyway. And mutexes have much better properties than
bit-locks so I prefer mutexes over cramming bit locks into radix tree. Plus
you'd have to be careful so that someone doesn't remove the bit from the
radix tree while you are working with it.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-10 20:08       ` Dan Williams
@ 2016-02-11 10:43         ` Jan Kara
  -1 siblings, 0 replies; 46+ messages in thread
From: Jan Kara @ 2016-02-11 10:43 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, Ross Zwisler, linux-kernel, Dave Chinner,
	linux-fsdevel, Linux MM, linux-nvdimm, Mel Gorman,
	Matthew Wilcox

On Wed 10-02-16 12:08:12, Dan Williams wrote:
> On Wed, Feb 10, 2016 at 2:32 AM, Jan Kara <jack@suse.cz> wrote:
> > On Tue 09-02-16 10:18:53, Dan Williams wrote:
> >> On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
> >> > Hello,
> >> >
> >> > I was thinking about current issues with DAX fault locking [1] (data
> >> > corruption due to racing faults allocating blocks) and also races which
> >> > currently don't allow us to clear dirty tags in the radix tree due to races
> >> > between faults and cache flushing [2]. Both of these exist because we don't
> >> > have an equivalent of page lock available for DAX. While we have a
> >> > reasonable solution available for problem [1], so far I'm not aware of a
> >> > decent solution for [2]. After briefly discussing the issue with Mel he had
> >> > a bright idea that we could used hashed locks to deal with [2] (and I think
> >> > we can solve [1] with them as well). So my proposal looks as follows:
> >> >
> >> > DAX will have an array of mutexes (the array can be made per device but
> >> > initially a global one should be OK). We will use mutexes in the array as a
> >> > replacement for page lock - we will use hashfn(mapping, index) to get
> >> > particular mutex protecting our offset in the mapping. On fault / page
> >> > mkwrite, we'll grab the mutex similarly to page lock and release it once we
> >> > are done updating page tables. This deals with races in [1]. When flushing
> >> > caches we grab the mutex before clearing writeable bit in page tables
> >> > and clearing dirty bit in the radix tree and drop it after we have flushed
> >> > caches for the pfn. This deals with races in [2].
> >> >
> >> > Thoughts?
> >> >
> >>
> >> I like the fact that this makes the locking explicit and
> >> straightforward rather than something more tricky.  Can we make the
> >> hashfn pfn based?  I'm thinking we could later reuse this as part of
> >> the solution for eliminating the need to allocate struct page, and we
> >> don't have the 'mapping' available in all paths...
> >
> > So Mel originally suggested to use pfn for hashing as well. My concern with
> > using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
> > lock. What you really need to protect is a logical offset in the file to
> > serialize allocation of underlying blocks, its mapping into page tables,
> > and flushing the blocks out of caches. So using inode/mapping and offset
> > for the hashing is easier (it isn't obvious to me we can fix hole filling
> > races with pfn-based locking).
> >
> > I'm not sure for which other purposes you'd like to use this lock and
> > whether propagating file+offset to those call sites would make sense or
> > not. struct page has the advantage that block mapping information is only
> > attached to it, so when filling a hole, we can just allocate some page,
> > attach it to the radix tree, use page lock for synchronization, and allocate
> > blocks only after that. With pfns we cannot do this...
> 
> Right, I am thinking of the direct-I/O path's use of the page lock and
> the occasions where it relies on page->mapping lookups.

Well, but the main problem with direct IO is that it takes page *reference*
via get_user_pages(). So that's something different from page lock. Maybe
the new lock could be abused to provide necessary exclusion for direct IO
use as well but that would need deep thinking... So far it seems
problematic to me.

> Given we already have support for dynamically allocating struct page I
> don't think we need to have a "pfn to lock" lookup in the initial
> implementation of this locking scheme.

Agreed.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-11 10:43         ` Jan Kara
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Kara @ 2016-02-11 10:43 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, Ross Zwisler, linux-kernel, Dave Chinner,
	linux-fsdevel, Linux MM, linux-nvdimm@lists.01.org, Mel Gorman,
	Matthew Wilcox

On Wed 10-02-16 12:08:12, Dan Williams wrote:
> On Wed, Feb 10, 2016 at 2:32 AM, Jan Kara <jack@suse.cz> wrote:
> > On Tue 09-02-16 10:18:53, Dan Williams wrote:
> >> On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
> >> > Hello,
> >> >
> >> > I was thinking about current issues with DAX fault locking [1] (data
> >> > corruption due to racing faults allocating blocks) and also races which
> >> > currently don't allow us to clear dirty tags in the radix tree due to races
> >> > between faults and cache flushing [2]. Both of these exist because we don't
> >> > have an equivalent of page lock available for DAX. While we have a
> >> > reasonable solution available for problem [1], so far I'm not aware of a
> >> > decent solution for [2]. After briefly discussing the issue with Mel he had
> >> > a bright idea that we could used hashed locks to deal with [2] (and I think
> >> > we can solve [1] with them as well). So my proposal looks as follows:
> >> >
> >> > DAX will have an array of mutexes (the array can be made per device but
> >> > initially a global one should be OK). We will use mutexes in the array as a
> >> > replacement for page lock - we will use hashfn(mapping, index) to get
> >> > particular mutex protecting our offset in the mapping. On fault / page
> >> > mkwrite, we'll grab the mutex similarly to page lock and release it once we
> >> > are done updating page tables. This deals with races in [1]. When flushing
> >> > caches we grab the mutex before clearing writeable bit in page tables
> >> > and clearing dirty bit in the radix tree and drop it after we have flushed
> >> > caches for the pfn. This deals with races in [2].
> >> >
> >> > Thoughts?
> >> >
> >>
> >> I like the fact that this makes the locking explicit and
> >> straightforward rather than something more tricky.  Can we make the
> >> hashfn pfn based?  I'm thinking we could later reuse this as part of
> >> the solution for eliminating the need to allocate struct page, and we
> >> don't have the 'mapping' available in all paths...
> >
> > So Mel originally suggested to use pfn for hashing as well. My concern with
> > using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
> > lock. What you really need to protect is a logical offset in the file to
> > serialize allocation of underlying blocks, its mapping into page tables,
> > and flushing the blocks out of caches. So using inode/mapping and offset
> > for the hashing is easier (it isn't obvious to me we can fix hole filling
> > races with pfn-based locking).
> >
> > I'm not sure for which other purposes you'd like to use this lock and
> > whether propagating file+offset to those call sites would make sense or
> > not. struct page has the advantage that block mapping information is only
> > attached to it, so when filling a hole, we can just allocate some page,
> > attach it to the radix tree, use page lock for synchronization, and allocate
> > blocks only after that. With pfns we cannot do this...
> 
> Right, I am thinking of the direct-I/O path's use of the page lock and
> the occasions where it relies on page->mapping lookups.

Well, but the main problem with direct IO is that it takes page *reference*
via get_user_pages(). So that's something different from page lock. Maybe
the new lock could be abused to provide necessary exclusion for direct IO
use as well but that would need deep thinking... So far it seems
problematic to me.

> Given we already have support for dynamically allocating struct page I
> don't think we need to have a "pfn to lock" lookup in the initial
> implementation of this locking scheme.

Agreed.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-10 22:39         ` Cedric Blancher
@ 2016-02-11 10:55           ` Jan Kara
  -1 siblings, 0 replies; 46+ messages in thread
From: Jan Kara @ 2016-02-11 10:55 UTC (permalink / raw)
  To: Cedric Blancher
  Cc: Dave Chinner, Jan Kara, Dan Williams, Ross Zwisler, linux-kernel,
	linux-fsdevel, Linux MM, linux-nvdimm, Mel Gorman,
	Matthew Wilcox

On Wed 10-02-16 23:39:43, Cedric Blancher wrote:
> AFAIK Solaris 11 uses a sparse tree instead of a array. Solves the
> scalability problem AND deals with variable page size.

Well, but then you have to have this locking tree for every inode so the
memory overhead is relatively large, no? I've played with range locking of
mapping in the past but its performance was not stellar. Do you have any
reference for what Solaris does?

								Honza

> On 10 February 2016 at 23:09, Dave Chinner <david@fromorbit.com> wrote:
> > On Wed, Feb 10, 2016 at 11:32:49AM +0100, Jan Kara wrote:
> >> On Tue 09-02-16 10:18:53, Dan Williams wrote:
> >> > On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
> >> > > Hello,
> >> > >
> >> > > I was thinking about current issues with DAX fault locking [1] (data
> >> > > corruption due to racing faults allocating blocks) and also races which
> >> > > currently don't allow us to clear dirty tags in the radix tree due to races
> >> > > between faults and cache flushing [2]. Both of these exist because we don't
> >> > > have an equivalent of page lock available for DAX. While we have a
> >> > > reasonable solution available for problem [1], so far I'm not aware of a
> >> > > decent solution for [2]. After briefly discussing the issue with Mel he had
> >> > > a bright idea that we could used hashed locks to deal with [2] (and I think
> >> > > we can solve [1] with them as well). So my proposal looks as follows:
> >> > >
> >> > > DAX will have an array of mutexes (the array can be made per device but
> >> > > initially a global one should be OK). We will use mutexes in the array as a
> >> > > replacement for page lock - we will use hashfn(mapping, index) to get
> >> > > particular mutex protecting our offset in the mapping. On fault / page
> >> > > mkwrite, we'll grab the mutex similarly to page lock and release it once we
> >> > > are done updating page tables. This deals with races in [1]. When flushing
> >> > > caches we grab the mutex before clearing writeable bit in page tables
> >> > > and clearing dirty bit in the radix tree and drop it after we have flushed
> >> > > caches for the pfn. This deals with races in [2].
> >> > >
> >> > > Thoughts?
> >> > >
> >> >
> >> > I like the fact that this makes the locking explicit and
> >> > straightforward rather than something more tricky.  Can we make the
> >> > hashfn pfn based?  I'm thinking we could later reuse this as part of
> >> > the solution for eliminating the need to allocate struct page, and we
> >> > don't have the 'mapping' available in all paths...
> >>
> >> So Mel originally suggested to use pfn for hashing as well. My concern with
> >> using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
> >> lock. What you really need to protect is a logical offset in the file to
> >> serialize allocation of underlying blocks, its mapping into page tables,
> >> and flushing the blocks out of caches. So using inode/mapping and offset
> >> for the hashing is easier (it isn't obvious to me we can fix hole filling
> >> races with pfn-based locking).
> >
> > So how does that file+offset hash work when trying to lock different
> > ranges?  file+offset hashing to determine the lock to use only works
> > if we are dealing with fixed size ranges that the locks affect.
> > e.g. offset has 4k granularity for a single page faults, but we also
> > need to handle 2MB granularity for huge page faults, and IIRC 1GB
> > granularity for giant page faults...
> >
> > What's the plan here?
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> -- 
> Cedric Blancher <cedric.blancher@gmail.com>
> Institute Pasteur
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-11 10:55           ` Jan Kara
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Kara @ 2016-02-11 10:55 UTC (permalink / raw)
  To: Cedric Blancher
  Cc: Dave Chinner, Jan Kara, Dan Williams, Ross Zwisler, linux-kernel,
	linux-fsdevel, Linux MM, linux-nvdimm@lists.01.org, Mel Gorman,
	Matthew Wilcox

On Wed 10-02-16 23:39:43, Cedric Blancher wrote:
> AFAIK Solaris 11 uses a sparse tree instead of a array. Solves the
> scalability problem AND deals with variable page size.

Well, but then you have to have this locking tree for every inode so the
memory overhead is relatively large, no? I've played with range locking of
mapping in the past but its performance was not stellar. Do you have any
reference for what Solaris does?

								Honza

> On 10 February 2016 at 23:09, Dave Chinner <david@fromorbit.com> wrote:
> > On Wed, Feb 10, 2016 at 11:32:49AM +0100, Jan Kara wrote:
> >> On Tue 09-02-16 10:18:53, Dan Williams wrote:
> >> > On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
> >> > > Hello,
> >> > >
> >> > > I was thinking about current issues with DAX fault locking [1] (data
> >> > > corruption due to racing faults allocating blocks) and also races which
> >> > > currently don't allow us to clear dirty tags in the radix tree due to races
> >> > > between faults and cache flushing [2]. Both of these exist because we don't
> >> > > have an equivalent of page lock available for DAX. While we have a
> >> > > reasonable solution available for problem [1], so far I'm not aware of a
> >> > > decent solution for [2]. After briefly discussing the issue with Mel he had
> >> > > a bright idea that we could used hashed locks to deal with [2] (and I think
> >> > > we can solve [1] with them as well). So my proposal looks as follows:
> >> > >
> >> > > DAX will have an array of mutexes (the array can be made per device but
> >> > > initially a global one should be OK). We will use mutexes in the array as a
> >> > > replacement for page lock - we will use hashfn(mapping, index) to get
> >> > > particular mutex protecting our offset in the mapping. On fault / page
> >> > > mkwrite, we'll grab the mutex similarly to page lock and release it once we
> >> > > are done updating page tables. This deals with races in [1]. When flushing
> >> > > caches we grab the mutex before clearing writeable bit in page tables
> >> > > and clearing dirty bit in the radix tree and drop it after we have flushed
> >> > > caches for the pfn. This deals with races in [2].
> >> > >
> >> > > Thoughts?
> >> > >
> >> >
> >> > I like the fact that this makes the locking explicit and
> >> > straightforward rather than something more tricky.  Can we make the
> >> > hashfn pfn based?  I'm thinking we could later reuse this as part of
> >> > the solution for eliminating the need to allocate struct page, and we
> >> > don't have the 'mapping' available in all paths...
> >>
> >> So Mel originally suggested to use pfn for hashing as well. My concern with
> >> using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
> >> lock. What you really need to protect is a logical offset in the file to
> >> serialize allocation of underlying blocks, its mapping into page tables,
> >> and flushing the blocks out of caches. So using inode/mapping and offset
> >> for the hashing is easier (it isn't obvious to me we can fix hole filling
> >> races with pfn-based locking).
> >
> > So how does that file+offset hash work when trying to lock different
> > ranges?  file+offset hashing to determine the lock to use only works
> > if we are dealing with fixed size ranges that the locks affect.
> > e.g. offset has 4k granularity for a single page faults, but we also
> > need to handle 2MB granularity for huge page faults, and IIRC 1GB
> > granularity for giant page faults...
> >
> > What's the plan here?
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> -- 
> Cedric Blancher <cedric.blancher@gmail.com>
> Institute Pasteur
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-10 23:32         ` Ross Zwisler
@ 2016-02-11 11:15           ` Jan Kara
  -1 siblings, 0 replies; 46+ messages in thread
From: Jan Kara @ 2016-02-11 11:15 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Dave Chinner, Jan Kara, Dan Williams, linux-kernel,
	linux-fsdevel, Linux MM, linux-nvdimm, Mel Gorman,
	Matthew Wilcox

On Wed 10-02-16 16:32:53, Ross Zwisler wrote:
> On Thu, Feb 11, 2016 at 09:09:53AM +1100, Dave Chinner wrote:
> > On Wed, Feb 10, 2016 at 11:32:49AM +0100, Jan Kara wrote:
> > > On Tue 09-02-16 10:18:53, Dan Williams wrote:
> > > > On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
> > > > > Hello,
> > > > >
> > > > > I was thinking about current issues with DAX fault locking [1] (data
> > > > > corruption due to racing faults allocating blocks) and also races which
> > > > > currently don't allow us to clear dirty tags in the radix tree due to races
> > > > > between faults and cache flushing [2]. Both of these exist because we don't
> > > > > have an equivalent of page lock available for DAX. While we have a
> > > > > reasonable solution available for problem [1], so far I'm not aware of a
> > > > > decent solution for [2]. After briefly discussing the issue with Mel he had
> > > > > a bright idea that we could used hashed locks to deal with [2] (and I think
> > > > > we can solve [1] with them as well). So my proposal looks as follows:
> > > > >
> > > > > DAX will have an array of mutexes (the array can be made per device but
> > > > > initially a global one should be OK). We will use mutexes in the array as a
> > > > > replacement for page lock - we will use hashfn(mapping, index) to get
> > > > > particular mutex protecting our offset in the mapping. On fault / page
> > > > > mkwrite, we'll grab the mutex similarly to page lock and release it once we
> > > > > are done updating page tables. This deals with races in [1]. When flushing
> > > > > caches we grab the mutex before clearing writeable bit in page tables
> > > > > and clearing dirty bit in the radix tree and drop it after we have flushed
> > > > > caches for the pfn. This deals with races in [2].
> > > > >
> > > > > Thoughts?
> > > > >
> > > > 
> > > > I like the fact that this makes the locking explicit and
> > > > straightforward rather than something more tricky.  Can we make the
> > > > hashfn pfn based?  I'm thinking we could later reuse this as part of
> > > > the solution for eliminating the need to allocate struct page, and we
> > > > don't have the 'mapping' available in all paths...
> > > 
> > > So Mel originally suggested to use pfn for hashing as well. My concern with
> > > using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
> > > lock. What you really need to protect is a logical offset in the file to
> > > serialize allocation of underlying blocks, its mapping into page tables,
> > > and flushing the blocks out of caches. So using inode/mapping and offset
> > > for the hashing is easier (it isn't obvious to me we can fix hole filling
> > > races with pfn-based locking).
> > 
> > So how does that file+offset hash work when trying to lock different
> > ranges?  file+offset hashing to determine the lock to use only works
> > if we are dealing with fixed size ranges that the locks affect.
> > e.g. offset has 4k granularity for a single page faults, but we also
> > need to handle 2MB granularity for huge page faults, and IIRC 1GB
> > granularity for giant page faults...
> > 
> > What's the plan here?
> 
> I wonder if it makes sense to tie the locking in with the radix tree?
> Meaning, instead of having an array of mutexes, we lock based on the radix
> tree entry.
> 
> Right now we already have to check for PTE and PMD entries in the radix tree,
> and with Matthew's suggested radix tree changes a lookup of a random address
> would give you the appropriate PMD or PUD entry, if one was present.
> 
> This sort of solves the need for having a hash function that works on
> file+offset - that's all already there when using the radix tree...

Yeah, so we need to be careful there are no aliasing issues (i.e., you do not
have PTE and PMD entries covering the same offset). Other than that using the
radix tree entry (or it's offset - you need to somehow map the entry to the
mutex anyway) as a base for mapping should deal with issues with different
page sizes.

We will have to be careful, e.g. when allocating blocks for a PMD fault. We
would have to insert PMD entry, lock it (so all newcomers will see the
entry and block on it), walk the whole range the fault covers and clear out
entries we find waiting if they are locked - lock aliasing may be an issue
here - and only after that we can proceed with the fault. It is more complex
than I'd wish but doable and I don't have anything better.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-11 11:15           ` Jan Kara
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Kara @ 2016-02-11 11:15 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Dave Chinner, Jan Kara, Dan Williams, linux-kernel,
	linux-fsdevel, Linux MM, linux-nvdimm@lists.01.org, Mel Gorman,
	Matthew Wilcox

On Wed 10-02-16 16:32:53, Ross Zwisler wrote:
> On Thu, Feb 11, 2016 at 09:09:53AM +1100, Dave Chinner wrote:
> > On Wed, Feb 10, 2016 at 11:32:49AM +0100, Jan Kara wrote:
> > > On Tue 09-02-16 10:18:53, Dan Williams wrote:
> > > > On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
> > > > > Hello,
> > > > >
> > > > > I was thinking about current issues with DAX fault locking [1] (data
> > > > > corruption due to racing faults allocating blocks) and also races which
> > > > > currently don't allow us to clear dirty tags in the radix tree due to races
> > > > > between faults and cache flushing [2]. Both of these exist because we don't
> > > > > have an equivalent of page lock available for DAX. While we have a
> > > > > reasonable solution available for problem [1], so far I'm not aware of a
> > > > > decent solution for [2]. After briefly discussing the issue with Mel he had
> > > > > a bright idea that we could used hashed locks to deal with [2] (and I think
> > > > > we can solve [1] with them as well). So my proposal looks as follows:
> > > > >
> > > > > DAX will have an array of mutexes (the array can be made per device but
> > > > > initially a global one should be OK). We will use mutexes in the array as a
> > > > > replacement for page lock - we will use hashfn(mapping, index) to get
> > > > > particular mutex protecting our offset in the mapping. On fault / page
> > > > > mkwrite, we'll grab the mutex similarly to page lock and release it once we
> > > > > are done updating page tables. This deals with races in [1]. When flushing
> > > > > caches we grab the mutex before clearing writeable bit in page tables
> > > > > and clearing dirty bit in the radix tree and drop it after we have flushed
> > > > > caches for the pfn. This deals with races in [2].
> > > > >
> > > > > Thoughts?
> > > > >
> > > > 
> > > > I like the fact that this makes the locking explicit and
> > > > straightforward rather than something more tricky.  Can we make the
> > > > hashfn pfn based?  I'm thinking we could later reuse this as part of
> > > > the solution for eliminating the need to allocate struct page, and we
> > > > don't have the 'mapping' available in all paths...
> > > 
> > > So Mel originally suggested to use pfn for hashing as well. My concern with
> > > using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
> > > lock. What you really need to protect is a logical offset in the file to
> > > serialize allocation of underlying blocks, its mapping into page tables,
> > > and flushing the blocks out of caches. So using inode/mapping and offset
> > > for the hashing is easier (it isn't obvious to me we can fix hole filling
> > > races with pfn-based locking).
> > 
> > So how does that file+offset hash work when trying to lock different
> > ranges?  file+offset hashing to determine the lock to use only works
> > if we are dealing with fixed size ranges that the locks affect.
> > e.g. offset has 4k granularity for a single page faults, but we also
> > need to handle 2MB granularity for huge page faults, and IIRC 1GB
> > granularity for giant page faults...
> > 
> > What's the plan here?
> 
> I wonder if it makes sense to tie the locking in with the radix tree?
> Meaning, instead of having an array of mutexes, we lock based on the radix
> tree entry.
> 
> Right now we already have to check for PTE and PMD entries in the radix tree,
> and with Matthew's suggested radix tree changes a lookup of a random address
> would give you the appropriate PMD or PUD entry, if one was present.
> 
> This sort of solves the need for having a hash function that works on
> file+offset - that's all already there when using the radix tree...

Yeah, so we need to be careful there are no aliasing issues (i.e., you do not
have PTE and PMD entries covering the same offset). Other than that using the
radix tree entry (or it's offset - you need to somehow map the entry to the
mutex anyway) as a base for mapping should deal with issues with different
page sizes.

We will have to be careful, e.g. when allocating blocks for a PMD fault. We
would have to insert PMD entry, lock it (so all newcomers will see the
entry and block on it), walk the whole range the fault covers and clear out
entries we find waiting if they are locked - lock aliasing may be an issue
here - and only after that we can proceed with the fault. It is more complex
than I'd wish but doable and I don't have anything better.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-11 10:55           ` Jan Kara
@ 2016-02-11 21:05             ` Cedric Blancher
  -1 siblings, 0 replies; 46+ messages in thread
From: Cedric Blancher @ 2016-02-11 21:05 UTC (permalink / raw)
  To: Jan Kara
  Cc: Dave Chinner, Dan Williams, Ross Zwisler, linux-kernel,
	linux-fsdevel, Linux MM, linux-nvdimm, Mel Gorman,
	Matthew Wilcox

The Solaris 11 sources are still available at Illumos.org (Illumos is
what Opensolaris was once, minus Suns bug database).

Also, if you keep the performance in mind, remember that the world is
moving towards many cores with many hardware threads per core, so
optimising for the "two core with low latency" use case is wrong like
a sin. More likely is the "8 core with 4 threads per core" use case
for benchmarking because that's what we will end up in even low end
hardware soon, maybe with variable bandwidth between cores if
something like ScaleMP is used.

Ced

On 11 February 2016 at 11:55, Jan Kara <jack@suse.cz> wrote:
> On Wed 10-02-16 23:39:43, Cedric Blancher wrote:
>> AFAIK Solaris 11 uses a sparse tree instead of a array. Solves the
>> scalability problem AND deals with variable page size.
>
> Well, but then you have to have this locking tree for every inode so the
> memory overhead is relatively large, no? I've played with range locking of
> mapping in the past but its performance was not stellar. Do you have any
> reference for what Solaris does?
>
>                                                                 Honza
>
>> On 10 February 2016 at 23:09, Dave Chinner <david@fromorbit.com> wrote:
>> > On Wed, Feb 10, 2016 at 11:32:49AM +0100, Jan Kara wrote:
>> >> On Tue 09-02-16 10:18:53, Dan Williams wrote:
>> >> > On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
>> >> > > Hello,
>> >> > >
>> >> > > I was thinking about current issues with DAX fault locking [1] (data
>> >> > > corruption due to racing faults allocating blocks) and also races which
>> >> > > currently don't allow us to clear dirty tags in the radix tree due to races
>> >> > > between faults and cache flushing [2]. Both of these exist because we don't
>> >> > > have an equivalent of page lock available for DAX. While we have a
>> >> > > reasonable solution available for problem [1], so far I'm not aware of a
>> >> > > decent solution for [2]. After briefly discussing the issue with Mel he had
>> >> > > a bright idea that we could used hashed locks to deal with [2] (and I think
>> >> > > we can solve [1] with them as well). So my proposal looks as follows:
>> >> > >
>> >> > > DAX will have an array of mutexes (the array can be made per device but
>> >> > > initially a global one should be OK). We will use mutexes in the array as a
>> >> > > replacement for page lock - we will use hashfn(mapping, index) to get
>> >> > > particular mutex protecting our offset in the mapping. On fault / page
>> >> > > mkwrite, we'll grab the mutex similarly to page lock and release it once we
>> >> > > are done updating page tables. This deals with races in [1]. When flushing
>> >> > > caches we grab the mutex before clearing writeable bit in page tables
>> >> > > and clearing dirty bit in the radix tree and drop it after we have flushed
>> >> > > caches for the pfn. This deals with races in [2].
>> >> > >
>> >> > > Thoughts?
>> >> > >
>> >> >
>> >> > I like the fact that this makes the locking explicit and
>> >> > straightforward rather than something more tricky.  Can we make the
>> >> > hashfn pfn based?  I'm thinking we could later reuse this as part of
>> >> > the solution for eliminating the need to allocate struct page, and we
>> >> > don't have the 'mapping' available in all paths...
>> >>
>> >> So Mel originally suggested to use pfn for hashing as well. My concern with
>> >> using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
>> >> lock. What you really need to protect is a logical offset in the file to
>> >> serialize allocation of underlying blocks, its mapping into page tables,
>> >> and flushing the blocks out of caches. So using inode/mapping and offset
>> >> for the hashing is easier (it isn't obvious to me we can fix hole filling
>> >> races with pfn-based locking).
>> >
>> > So how does that file+offset hash work when trying to lock different
>> > ranges?  file+offset hashing to determine the lock to use only works
>> > if we are dealing with fixed size ranges that the locks affect.
>> > e.g. offset has 4k granularity for a single page faults, but we also
>> > need to handle 2MB granularity for huge page faults, and IIRC 1GB
>> > granularity for giant page faults...
>> >
>> > What's the plan here?
>> >
>> > Cheers,
>> >
>> > Dave.
>> > --
>> > Dave Chinner
>> > david@fromorbit.com
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>> --
>> Cedric Blancher <cedric.blancher@gmail.com>
>> Institute Pasteur
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR



-- 
Cedric Blancher <cedric.blancher@gmail.com>
Institute Pasteur

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-11 21:05             ` Cedric Blancher
  0 siblings, 0 replies; 46+ messages in thread
From: Cedric Blancher @ 2016-02-11 21:05 UTC (permalink / raw)
  To: Jan Kara
  Cc: Dave Chinner, Dan Williams, Ross Zwisler, linux-kernel,
	linux-fsdevel, Linux MM, linux-nvdimm@lists.01.org, Mel Gorman,
	Matthew Wilcox

The Solaris 11 sources are still available at Illumos.org (Illumos is
what Opensolaris was once, minus Suns bug database).

Also, if you keep the performance in mind, remember that the world is
moving towards many cores with many hardware threads per core, so
optimising for the "two core with low latency" use case is wrong like
a sin. More likely is the "8 core with 4 threads per core" use case
for benchmarking because that's what we will end up in even low end
hardware soon, maybe with variable bandwidth between cores if
something like ScaleMP is used.

Ced

On 11 February 2016 at 11:55, Jan Kara <jack@suse.cz> wrote:
> On Wed 10-02-16 23:39:43, Cedric Blancher wrote:
>> AFAIK Solaris 11 uses a sparse tree instead of a array. Solves the
>> scalability problem AND deals with variable page size.
>
> Well, but then you have to have this locking tree for every inode so the
> memory overhead is relatively large, no? I've played with range locking of
> mapping in the past but its performance was not stellar. Do you have any
> reference for what Solaris does?
>
>                                                                 Honza
>
>> On 10 February 2016 at 23:09, Dave Chinner <david@fromorbit.com> wrote:
>> > On Wed, Feb 10, 2016 at 11:32:49AM +0100, Jan Kara wrote:
>> >> On Tue 09-02-16 10:18:53, Dan Williams wrote:
>> >> > On Tue, Feb 9, 2016 at 9:24 AM, Jan Kara <jack@suse.cz> wrote:
>> >> > > Hello,
>> >> > >
>> >> > > I was thinking about current issues with DAX fault locking [1] (data
>> >> > > corruption due to racing faults allocating blocks) and also races which
>> >> > > currently don't allow us to clear dirty tags in the radix tree due to races
>> >> > > between faults and cache flushing [2]. Both of these exist because we don't
>> >> > > have an equivalent of page lock available for DAX. While we have a
>> >> > > reasonable solution available for problem [1], so far I'm not aware of a
>> >> > > decent solution for [2]. After briefly discussing the issue with Mel he had
>> >> > > a bright idea that we could used hashed locks to deal with [2] (and I think
>> >> > > we can solve [1] with them as well). So my proposal looks as follows:
>> >> > >
>> >> > > DAX will have an array of mutexes (the array can be made per device but
>> >> > > initially a global one should be OK). We will use mutexes in the array as a
>> >> > > replacement for page lock - we will use hashfn(mapping, index) to get
>> >> > > particular mutex protecting our offset in the mapping. On fault / page
>> >> > > mkwrite, we'll grab the mutex similarly to page lock and release it once we
>> >> > > are done updating page tables. This deals with races in [1]. When flushing
>> >> > > caches we grab the mutex before clearing writeable bit in page tables
>> >> > > and clearing dirty bit in the radix tree and drop it after we have flushed
>> >> > > caches for the pfn. This deals with races in [2].
>> >> > >
>> >> > > Thoughts?
>> >> > >
>> >> >
>> >> > I like the fact that this makes the locking explicit and
>> >> > straightforward rather than something more tricky.  Can we make the
>> >> > hashfn pfn based?  I'm thinking we could later reuse this as part of
>> >> > the solution for eliminating the need to allocate struct page, and we
>> >> > don't have the 'mapping' available in all paths...
>> >>
>> >> So Mel originally suggested to use pfn for hashing as well. My concern with
>> >> using pfn is that e.g. if you want to fill a hole, you don't have a pfn to
>> >> lock. What you really need to protect is a logical offset in the file to
>> >> serialize allocation of underlying blocks, its mapping into page tables,
>> >> and flushing the blocks out of caches. So using inode/mapping and offset
>> >> for the hashing is easier (it isn't obvious to me we can fix hole filling
>> >> races with pfn-based locking).
>> >
>> > So how does that file+offset hash work when trying to lock different
>> > ranges?  file+offset hashing to determine the lock to use only works
>> > if we are dealing with fixed size ranges that the locks affect.
>> > e.g. offset has 4k granularity for a single page faults, but we also
>> > need to handle 2MB granularity for huge page faults, and IIRC 1GB
>> > granularity for giant page faults...
>> >
>> > What's the plan here?
>> >
>> > Cheers,
>> >
>> > Dave.
>> > --
>> > Dave Chinner
>> > david@fromorbit.com
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>> --
>> Cedric Blancher <cedric.blancher@gmail.com>
>> Institute Pasteur
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR



-- 
Cedric Blancher <cedric.blancher@gmail.com>
Institute Pasteur

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
  2016-02-11 10:38     ` Jan Kara
@ 2016-02-14  8:51       ` Boaz Harrosh
  -1 siblings, 0 replies; 46+ messages in thread
From: Boaz Harrosh @ 2016-02-14  8:51 UTC (permalink / raw)
  To: Jan Kara, Boaz Harrosh
  Cc: Ross Zwisler, linux-nvdimm, Dave Chinner, linux-kernel, linux-mm,
	mgorman, linux-fsdevel

On 02/11/2016 12:38 PM, Jan Kara wrote:
> On Wed 10-02-16 19:38:21, Boaz Harrosh wrote:
>> On 02/09/2016 07:24 PM, Jan Kara wrote:
>>> Hello,
>>>
<>
>>>
>>> DAX will have an array of mutexes (the array can be made per device but
>>> initially a global one should be OK). We will use mutexes in the array as a
>>> replacement for page lock - we will use hashfn(mapping, index) to get
>>> particular mutex protecting our offset in the mapping. On fault / page
>>> mkwrite, we'll grab the mutex similarly to page lock and release it once we
>>> are done updating page tables. This deals with races in [1]. When flushing
>>> caches we grab the mutex before clearing writeable bit in page tables
>>> and clearing dirty bit in the radix tree and drop it after we have flushed
>>> caches for the pfn. This deals with races in [2].
>>>
>>> Thoughts?
>>>
>>
>> You could also use one of the radix-tree's special-bits as a bit lock.
>> So no need for any extra allocations.
> 
> Yes and I've suggested that once as well. But since we need sleeping
> locks, you need some wait queues somewhere as well. So some allocations are
> going to be needed anyway. 

They are already sleeping locks and there are all the proper "wait queues"
in place. I'm talking about
   lock:
	err = wait_on_bit_lock(&some_long, SOME_BIT_LOCK, ...);
and
   unlock:
	WARN_ON(!test_and_clear_bit(SOME_BIT_LOCK, &some_long));
	wake_up_bit(&some_long, SOME_BIT_LOCK);

> And mutexes have much better properties than

Just saying that page-locks are implemented just this way these days
so it is the performance and characteristics we already know.
(You are replacing page locks, no?)

> bit-locks so I prefer mutexes over cramming bit locks into radix tree. Plus
> you'd have to be careful so that someone doesn't remove the bit from the
> radix tree while you are working with it.
> 

Sure! need to be careful, is our middle name.

That said. Is your call. Thank you for working on this. Your plan sounds
very good as well, and is very much needed, because DAX's mmap performance
success right now.
[Maybe one small enhancement perhaps allocate an array of mutexes per NUMA
 node and access the proper array through numa_node_id()]

> 								Honza
> 

Thanks
Boaz

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Another proposal for DAX fault locking
@ 2016-02-14  8:51       ` Boaz Harrosh
  0 siblings, 0 replies; 46+ messages in thread
From: Boaz Harrosh @ 2016-02-14  8:51 UTC (permalink / raw)
  To: Jan Kara, Boaz Harrosh
  Cc: Ross Zwisler, linux-nvdimm, Dave Chinner, linux-kernel, linux-mm,
	mgorman, linux-fsdevel

On 02/11/2016 12:38 PM, Jan Kara wrote:
> On Wed 10-02-16 19:38:21, Boaz Harrosh wrote:
>> On 02/09/2016 07:24 PM, Jan Kara wrote:
>>> Hello,
>>>
<>
>>>
>>> DAX will have an array of mutexes (the array can be made per device but
>>> initially a global one should be OK). We will use mutexes in the array as a
>>> replacement for page lock - we will use hashfn(mapping, index) to get
>>> particular mutex protecting our offset in the mapping. On fault / page
>>> mkwrite, we'll grab the mutex similarly to page lock and release it once we
>>> are done updating page tables. This deals with races in [1]. When flushing
>>> caches we grab the mutex before clearing writeable bit in page tables
>>> and clearing dirty bit in the radix tree and drop it after we have flushed
>>> caches for the pfn. This deals with races in [2].
>>>
>>> Thoughts?
>>>
>>
>> You could also use one of the radix-tree's special-bits as a bit lock.
>> So no need for any extra allocations.
> 
> Yes and I've suggested that once as well. But since we need sleeping
> locks, you need some wait queues somewhere as well. So some allocations are
> going to be needed anyway. 

They are already sleeping locks and there are all the proper "wait queues"
in place. I'm talking about
   lock:
	err = wait_on_bit_lock(&some_long, SOME_BIT_LOCK, ...);
and
   unlock:
	WARN_ON(!test_and_clear_bit(SOME_BIT_LOCK, &some_long));
	wake_up_bit(&some_long, SOME_BIT_LOCK);

> And mutexes have much better properties than

Just saying that page-locks are implemented just this way these days
so it is the performance and characteristics we already know.
(You are replacing page locks, no?)

> bit-locks so I prefer mutexes over cramming bit locks into radix tree. Plus
> you'd have to be careful so that someone doesn't remove the bit from the
> radix tree while you are working with it.
> 

Sure! need to be careful, is our middle name.

That said. Is your call. Thank you for working on this. Your plan sounds
very good as well, and is very much needed, because DAX's mmap performance
success right now.
[Maybe one small enhancement perhaps allocate an array of mutexes per NUMA
 node and access the proper array through numa_node_id()]

> 								Honza
> 

Thanks
Boaz

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2016-02-14  8:51 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-09 17:24 Another proposal for DAX fault locking Jan Kara
2016-02-09 17:24 ` Jan Kara
2016-02-09 18:18 ` Dan Williams
2016-02-09 18:18   ` Dan Williams
2016-02-10 10:32   ` Jan Kara
2016-02-10 10:32     ` Jan Kara
2016-02-10 20:08     ` Dan Williams
2016-02-10 20:08       ` Dan Williams
2016-02-11 10:43       ` Jan Kara
2016-02-11 10:43         ` Jan Kara
2016-02-10 22:09     ` Dave Chinner
2016-02-10 22:09       ` Dave Chinner
2016-02-10 22:39       ` Cedric Blancher
2016-02-10 22:39         ` Cedric Blancher
2016-02-10 23:34         ` Ross Zwisler
2016-02-10 23:34           ` Ross Zwisler
2016-02-11 10:55         ` Jan Kara
2016-02-11 10:55           ` Jan Kara
2016-02-11 21:05           ` Cedric Blancher
2016-02-11 21:05             ` Cedric Blancher
2016-02-10 23:32       ` Ross Zwisler
2016-02-10 23:32         ` Ross Zwisler
2016-02-11 11:15         ` Jan Kara
2016-02-11 11:15           ` Jan Kara
2016-02-09 18:46 ` Cedric Blancher
2016-02-09 18:46   ` Cedric Blancher
2016-02-10  8:19   ` Mel Gorman
2016-02-10  8:19     ` Mel Gorman
2016-02-10 10:18     ` Jan Kara
2016-02-10 10:18       ` Jan Kara
2016-02-10 12:29 ` Dmitry Monakhov
2016-02-10 12:29   ` Dmitry Monakhov
2016-02-10 12:35   ` Jan Kara
2016-02-10 12:35     ` Jan Kara
2016-02-10 17:38 ` Boaz Harrosh
2016-02-10 17:38   ` Boaz Harrosh
2016-02-11 10:38   ` Jan Kara
2016-02-11 10:38     ` Jan Kara
2016-02-14  8:51     ` Boaz Harrosh
2016-02-14  8:51       ` Boaz Harrosh
2016-02-10 23:44 ` Ross Zwisler
2016-02-10 23:44   ` Ross Zwisler
2016-02-10 23:51   ` Cedric Blancher
2016-02-10 23:51     ` Cedric Blancher
2016-02-11  0:13     ` Ross Zwisler
2016-02-11  0:13       ` Ross Zwisler

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.