* quota: dqio_mutex design
@ 2017-02-02 12:23 Andrew Perepechko
  2017-03-03 10:08 ` Jan Kara
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Perepechko @ 2017-02-02 12:23 UTC (permalink / raw)
  To: linux-fsdevel

Hello!

We have a heavy metadata related workload (ext4, quota journalling)
and profiling shows that there's significant dqio_mutex contention.

From the quota code, it looks like every time dqio_mutex is taken
it protects access to only one quota file.

Is it possible to split dqio_mutex for each of MAXQUOTAS so that
e.g. 2 parallel dquot_commit()'s can be running for user and group
quota update? Am I missing any dqio_mutex function that requires
dqio_mutex to be monolithic?
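
To make the question concrete, here is the kind of split I have in mind
(just a sketch, the array field is hypothetical):

struct quota_info {
	...
	struct mutex dqio_mutex[MAXQUOTAS];	/* one lock per quota type */
	...
};

and dquot_commit() and friends would then take only the lock for the
quota type they are writing:

	struct quota_info *dqopt = sb_dqopt(dquot->dq_sb);

	mutex_lock(&dqopt->dqio_mutex[dquot->dq_id.type]);
	/* ... write this dquot to its quota file ... */
	mutex_unlock(&dqopt->dqio_mutex[dquot->dq_id.type]);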

Thank you,
Andrew


* Re: quota: dqio_mutex design
  2017-02-02 12:23 quota: dqio_mutex design Andrew Perepechko
@ 2017-03-03 10:08 ` Jan Kara
  2017-03-09 22:29   ` Andrew Perepechko
  2017-06-21 10:52   ` Jan Kara
  0 siblings, 2 replies; 22+ messages in thread
From: Jan Kara @ 2017-03-03 10:08 UTC (permalink / raw)
  To: Andrew Perepechko; +Cc: linux-fsdevel

Hello!

On Thu 02-02-17 15:23:44, Andrew Perepechko wrote:
> We have a heavy metadata related workload (ext4, quota journalling)
> and profiling shows that there's significant dqio_mutex contention.
> 
> From the quota code, it looks like every time dqio_mutex is taken
> it protects access to only one quota file.
> 
> Is it possible to split dqio_mutex for each of MAXQUOTAS so that
> e.g. 2 parallel dquot_commit()'s can be running for user and group
> quota update? Am I missing any dqio_mutex function that requires
> dqio_mutex to be monolithic?

So we can certainly make dqio_mutex less heavy. Making it per-quota-type
would be OK but I suspect it will not bring a big benefit. What would likely
be more noticeable is if we avoided dqio_mutex for updates of quota
information - that should not be that hard to do since we update that
in-place and so don't really need the serialization for anything
substantial. However we will need some restructuring of the code to make
such a locking scheme possible in a clean way...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: quota: dqio_mutex design
  2017-03-03 10:08 ` Jan Kara
@ 2017-03-09 22:29   ` Andrew Perepechko
  2017-03-13  8:44     ` Jan Kara
  2017-06-21 10:52   ` Jan Kara
  1 sibling, 1 reply; 22+ messages in thread
From: Andrew Perepechko @ 2017-03-09 22:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

Hello!

Jan, do you think it makes sense, as an improvement
until the code restructuring, to exit immediately from
ext4_mark_dquot_dirty() if dquot_mark_dquot_dirty()
returns 1?

It seems that in this case we are guaranteed that some
thread is somewhere in the middle of mark_dquot_dirty()
and clear_dquot_dirty(), so it will update the quota file
buffer with the latest dquot data.

That would improve a single user/group scenario like:
thread 1) processing dquot_commit()
thread 2) dirtied dquot and is waiting for dqio_mutex
thread 3, 4, 5 ...) dirtied dquot and are waiting for dqio_mutex

If we exit immediately when the dquot is already dirty, threads 3, 4, 5, ...
can let thread 2 update the buffer data and avoid blocking on the mutex
themselves.
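
Roughly, the change I have in mind looks like this (an untested sketch of
ext4_mark_dquot_dirty(), the early return being the only new part):

static int ext4_mark_dquot_dirty(struct dquot *dquot)
{
	/* Are we journaling quotas? */
	if (EXT4_SB(dquot->dq_sb)->s_qf_names[USRQUOTA] ||
	    EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
		/*
		 * Already dirty: some thread between mark_dquot_dirty()
		 * and clear_dquot_dirty() will write the latest in-core
		 * data for us, so don't block on dqio_mutex here.
		 */
		if (dquot_mark_dquot_dirty(dquot))
			return 0;
		return ext4_write_dquot(dquot);
	} else {
		return dquot_mark_dquot_dirty(dquot);
	}
}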

Thank you,
Andrew


> Hello!
> 
> On Thu 02-02-17 15:23:44, Andrew Perepechko wrote:
> > We have a heavy metadata related workload (ext4, quota journalling)
> > and profiling shows that there's significant dqio_mutex contention.
> > 
> > From the quota code, it looks like every time dqio_mutex is taken
> > it protects access to only one quota file.
> > 
> > Is it possible to split dqio_mutex for each of MAXQUOTAS so that
> > e.g. 2 parallel dquot_commit()'s can be running for user and group
> > quota update? Am I missing any dqio_mutex function that requires
> > dqio_mutex to be monolithic?
> 
> So we can certainly make dqio_mutex less heavy. Making it per-quota-type
> would OK but I suspect it will not bring a big benefit. What would likely
> be more noticeable is if we avoided dqio_mutex for updates of quota
> information - that should not be that hard to do since we update that
> in-place and so don't really need the serialization for anything
> substantial. However we will need some restructuring of the code to make
> such locking scheme possible in a clean way...
> 
> 								Honza


* Re: quota: dqio_mutex design
  2017-03-09 22:29   ` Andrew Perepechko
@ 2017-03-13  8:44     ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2017-03-13  8:44 UTC (permalink / raw)
  To: Andrew Perepechko; +Cc: Jan Kara, linux-fsdevel

Hi,

On Fri 10-03-17 01:29:22, Andrew Perepechko wrote:
> Jan, do you think it makes sense, as an improvement
> until the code restructuring, to exit immediately from
> ext4_mark_dquot_dirty() if dquot_mark_dquot_dirty()
> returns 1?
> 
> It seems that in this case we are guaranteed that some
> thread is somewhere in the middle of mark_dquot_dirty()
> and clear_dquot_dirty(), so it will update the quota file
> buffer with the latest dquot data.

Well, it would mostly work, except that if process A dirties a dquot outside
of a transaction (e.g. via dquot_set_dqblk()), other updates of the dquot
made inside a running transaction could end up relying on process A's update
of the dquot buffer, and that update may land only in the next transaction,
thus breaking the journalling guarantees.

								Honza

> > Hello!
> > 
> > On Thu 02-02-17 15:23:44, Andrew Perepechko wrote:
> > > We have a heavy metadata related workload (ext4, quota journalling)
> > > and profiling shows that there's significant dqio_mutex contention.
> > > 
> > > From the quota code, it looks like every time dqio_mutex is taken
> > > it protects access to only one quota file.
> > > 
> > > Is it possible to split dqio_mutex for each of MAXQUOTAS so that
> > > e.g. 2 parallel dquot_commit()'s can be running for user and group
> > > quota update? Am I missing any dqio_mutex function that requires
> > > dqio_mutex to be monolithic?
> > 
> > So we can certainly make dqio_mutex less heavy. Making it per-quota-type
> > would OK but I suspect it will not bring a big benefit. What would likely
> > be more noticeable is if we avoided dqio_mutex for updates of quota
> > information - that should not be that hard to do since we update that
> > in-place and so don't really need the serialization for anything
> > substantial. However we will need some restructuring of the code to make
> > such locking scheme possible in a clean way...
> > 
> > 								Honza
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: quota: dqio_mutex design
  2017-03-03 10:08 ` Jan Kara
  2017-03-09 22:29   ` Andrew Perepechko
@ 2017-06-21 10:52   ` Jan Kara
       [not found]     ` <4181747.CBilgxvOab@panda>
  1 sibling, 1 reply; 22+ messages in thread
From: Jan Kara @ 2017-06-21 10:52 UTC (permalink / raw)
  To: Andrew Perepechko; +Cc: linux-fsdevel

On Fri 03-03-17 11:08:42, Jan Kara wrote:
> Hello!
> 
> On Thu 02-02-17 15:23:44, Andrew Perepechko wrote:
> > We have a heavy metadata related workload (ext4, quota journalling)
> > and profiling shows that there's significant dqio_mutex contention.
> > 
> > From the quota code, it looks like every time dqio_mutex is taken
> > it protects access to only one quota file.
> > 
> > Is it possible to split dqio_mutex for each of MAXQUOTAS so that
> > e.g. 2 parallel dquot_commit()'s can be running for user and group
> > quota update? Am I missing any dqio_mutex function that requires
> > dqio_mutex to be monolithic?
> 
> So we can certainly make dqio_mutex less heavy. Making it per-quota-type
> would OK but I suspect it will not bring a big benefit. What would likely
> be more noticeable is if we avoided dqio_mutex for updates of quota
> information - that should not be that hard to do since we update that
> in-place and so don't really need the serialization for anything
> substantial. However we will need some restructuring of the code to make
> such locking scheme possible in a clean way...

So I'm experimenting with some patches. However I have trouble creating
a workload where quota updates would show significant overhead. Can you
share which workload is problematic for you? Thanks!

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: quota: dqio_mutex design
       [not found]     ` <4181747.CBilgxvOab@panda>
@ 2017-08-01 13:02       ` Jan Kara
  2017-08-02 16:25         ` Jan Kara
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Kara @ 2017-08-01 13:02 UTC (permalink / raw)
  To: Andrew Perepechko; +Cc: Jan Kara, linux-fsdevel

Hi Andrew,

On Fri 23-06-17 02:43:44, Andrew Perepechko wrote:
> The original workload was 50 threads sequentially creating files, each
> thread in its own directory, over a fast RAID array.

OK, I can reproduce this. Actually I can reproduce it on a normal SATA drive.
Originally I tried a ramdisk to simulate a really fast drive, but there
dq_list_lock and dq_data_lock contention is much more visible and the
contention on dqio_mutex is minimal (two orders of magnitude smaller). On a
SATA drive we spend ~45% of runtime contending on dqio_mutex when creating
empty files.

The problem is that if it is a single user that is creating all these files,
it is not clear how we could do much better - all processes contend to
update the same location on disk with quota information for that user and
they have to be synchronized somehow. If there are more users, we could do
better by splitting dqio_mutex on a per-dquot basis (I have some preliminary
patches for that).

One idea I have for how we could make things faster is that instead of having
a dquot dirty flag, we would have a sequence counter. Currently, dquot
modification looks like:

update counters in dquot
dquot_mark_dquot_dirty(dquot);
dquot_commit(dquot)
  mutex_lock(dqio_mutex);
  if (!clear_dquot_dirty(dquot))
    nothing to do -> bail
  ->commit_dqblk(dquot)
  mutex_unlock(dqio_mutex);

When several processes race updating the same dquot, they very often all
end up updating the dquot on disk even though another process has already
written the dquot for them while they were waiting for dqio_sem - in my test
above the ratio of commit_dqblk / dquot_commit calls was 59%. What we could
do is have dquot_mark_dquot_dirty() return the "current sequence of the
dquot"; dquot_commit() would then get the sequence that needs to be written
and, if that sequence has already been written (we would also store the
latest written sequence in the dquot), it would bail out doing nothing.
This should cut down dqio_mutex hold times and thus wait times, but I need
to experiment and measure that...
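
Roughly (a sketch only, the field names are made up), the path would become:

update counters in dquot
seq = dquot_mark_dquot_dirty(dquot)	/* returns ++dquot->dq_seq */
dquot_commit(dquot, seq)
  mutex_lock(dqio_mutex);
  if (dquot->dq_written_seq >= seq)
    already on disk -> bail
  dquot->dq_written_seq = dquot->dq_seq;	/* together with copying the data */
  ->commit_dqblk(dquot)
  mutex_unlock(dqio_mutex);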

								Honza

> > On Fri 03-03-17 11:08:42, Jan Kara wrote:
> > > Hello!
> > >
> > > On Thu 02-02-17 15:23:44, Andrew Perepechko wrote:
> > > > We have a heavy metadata related workload (ext4, quota journalling)
> > > > and profiling shows that there's significant dqio_mutex contention.
> > > >
> > > > From the quota code, it looks like every time dqio_mutex is taken
> > > > it protects access to only one quota file.
> > > >
> > > > Is it possible to split dqio_mutex for each of MAXQUOTAS so that
> > > > e.g. 2 parallel dquot_commit()'s can be running for user and group
> > > > quota update? Am I missing any dqio_mutex function that requires
> > > > dqio_mutex to be monolithic?
> > >
> > > So we can certainly make dqio_mutex less heavy. Making it per-quota-type
> > > would OK but I suspect it will not bring a big benefit. What would likely
> > > be more noticeable is if we avoided dqio_mutex for updates of quota
> > > information - that should not be that hard to do since we update that
> > > in-place and so don't really need the serialization for anything
> > > substantial. However we will need some restructuring of the code to make
> > > such locking scheme possible in a clean way...
> >
> > So I'm experimenting with some patches. However I have trouble creating
> > a workload where quota updates would show significant overhead. Can you
> > share which workload is problematic for you? Thanks!
> >
> > 								Honza
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: quota: dqio_mutex design
  2017-08-01 13:02       ` Jan Kara
@ 2017-08-02 16:25         ` Jan Kara
  2017-08-02 17:52           ` Andrew Perepechko
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Kara @ 2017-08-02 16:25 UTC (permalink / raw)
  To: Andrew Perepechko; +Cc: Jan Kara, linux-fsdevel

On Tue 01-08-17 15:02:42, Jan Kara wrote:
> Hi Andrew,
> 
> On Fri 23-06-17 02:43:44, Andrew Perepechko wrote:
> > The original workload was 50 threads sequentially creating files, each
> > thread in its own directory, over a fast RAID array.
> 
> OK, I can reproduce this. Actually I can reproduce on normal SATA drive.
> Originally I've tried on ramdisk to simulate really fast drive but there
> dq_list_lock and dq_data_lock contention is much more visible and the
> contention on dqio_mutex is minimal (two orders of magnitude smaller). On
> SATA drive we spend ~45% of runtime contending on dqio_mutex when creating
> empty files.

So this was just me misinterpreting lockstat data (I forgot to divide the
wait time by the number of processes) - then the result would be that each
process waits only ~1% of its runtime for dqio_mutex.

Anyway, my patches show a ~10% improvement in runtime when 50 different
processes create empty files for 50 different users. As expected, there's
no measurable benefit when all processes create files for the same user.

> The problem is that if it is single user that is creating all these files,
> it is not clear how we could do much better - all processes contend to
> update the same location on disk with quota information for that user and
> they have to be synchronized somehow. If there are more users, we could do
> better by splitting dqio_mutex on per-dquot basis (I have some preliminary
> patches for that).
> 
> One idea I have how we could make things faster is that instead of having
> dquot dirty flag, we would have a sequence counter. So currently dquot
> modification looks like:
> 
> update counters in dquot
> dquot_mark_dquot_dirty(dquot);
> dquot_commit(dquot)
>   mutex_lock(dqio_mutex);
>   if (!clear_dquot_dirty(dquot))
>     nothing to do -> bail
>   ->commit_dqblk(dquot)
>   mutex_unlock(dqio_mutex);
> 
> When several processes race updating the same dquot, they very often all
> end up updating dquot on disk even though another process has already
> written dquot for them while they were waiting for dqio_sem - in my test
> above the ratio of commit_dqblk / dquot_commit calls was 59%. What we could
> do is that dquot_mark_dquot_dirty() would return "current sequence of
> dquot", dquot_commit() would then get sequence that is required to be
> written and if that is already written (we would also store in dquot latest
> written sequence), it would bail out doing nothing. This should cut down
> dqio_mutex hold times and thus wait times but I need to experiment and
> measure that...

I've been experimenting with this today but this idea didn't bring any
benefit in my testing. Was your setup with multiple users or a single user?
Could you give some testing to my patches to see whether they bring some
benefit to you?

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: quota: dqio_mutex design
  2017-08-02 16:25         ` Jan Kara
@ 2017-08-02 17:52           ` Andrew Perepechko
  2017-08-03 11:09             ` Jan Kara
  2017-08-03 11:31             ` Wang Shilong
  0 siblings, 2 replies; 22+ messages in thread
From: Andrew Perepechko @ 2017-08-02 17:52 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel

> On Tue 01-08-17 15:02:42, Jan Kara wrote:
> > Hi Andrew,
> > 
> > On Fri 23-06-17 02:43:44, Andrew Perepechko wrote:
> > > The original workload was 50 threads sequentially creating files, each
> > > thread in its own directory, over a fast RAID array.
> > 
> > OK, I can reproduce this. Actually I can reproduce on normal SATA drive.
> > Originally I've tried on ramdisk to simulate really fast drive but there
> > dq_list_lock and dq_data_lock contention is much more visible and the
> > contention on dqio_mutex is minimal (two orders of magnitude smaller). On
> > SATA drive we spend ~45% of runtime contending on dqio_mutex when creating
> > empty files.
> 
> So this was just me misinterpretting lockstat data (forgot to divide the
> wait time by number of processes) - then the result would be that each
> process waits only ~1% of its runtime for dqio_mutex.
> 
> Anyway, my patches show ~10% improvement in runtime when 50 different
> processes create empty files for 50 different users. As expected there's
> not measurable benefit when all processes create files for the same user.
> 
> > The problem is that if it is single user that is creating all these files,
> > it is not clear how we could do much better - all processes contend to
> > update the same location on disk with quota information for that user and
> > they have to be synchronized somehow. If there are more users, we could do
> > better by splitting dqio_mutex on per-dquot basis (I have some preliminary
> > patches for that).
> > 
> > One idea I have how we could make things faster is that instead of having
> > dquot dirty flag, we would have a sequence counter. So currently dquot
> > modification looks like:
> > 
> > update counters in dquot
> > dquot_mark_dquot_dirty(dquot);
> > dquot_commit(dquot)
> > 
> >   mutex_lock(dqio_mutex);
> >   if (!clear_dquot_dirty(dquot))
> >   
> >     nothing to do -> bail
> >   
> >   ->commit_dqblk(dquot)
> >   mutex_unlock(dqio_mutex);
> > 
> > When several processes race updating the same dquot, they very often all
> > end up updating dquot on disk even though another process has already
> > written dquot for them while they were waiting for dqio_sem - in my test
> > above the ratio of commit_dqblk / dquot_commit calls was 59%. What we
> > could
> > do is that dquot_mark_dquot_dirty() would return "current sequence of
> > dquot", dquot_commit() would then get sequence that is required to be
> > written and if that is already written (we would also store in dquot
> > latest
> > written sequence), it would bail out doing nothing. This should cut down
> > dqio_mutex hold times and thus wait times but I need to experiment and
> > measure that...
> 
> I've been experimenting with this today but this idea didn't bring any
> benefit in my testing. Was your setup with multiple users or a single user?
> Could you give some testing to my patches to see whether they bring some
> benefit to you?
> 
> 								Honza

Hi Jan!

My setup was with a single user. Unfortunately, it may take some time before
I can try a patched kernel other than RHEL6 or RHEL7 with the same test,
since we have a lot of dependencies on these kernels.

The actual test we ran was mdtest.

By the way, we had 15+% performance improvement in creates from the
change that was discussed earlier in this thread:

           EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
+              if (test_bit(DQ_MOD_B, &dquot->dq_flags))
+                       return 0;
               dquot_mark_dquot_dirty(dquot);
               return ext4_write_dquot(dquot);

The idea was that if we know that some thread is somewhere between
mark_dirty and clear_dirty, then we can avoid blocking on dqio_mutex,
since that thread will update the ondisk dquot for us.

I think you also mentioned that some mark_dquot_dirty callers, such
as do_set_dqblk, may not be running with an open transaction handle,
so we cannot assume this optimization is atomic. However, we don't
use do_set_dqblk and so seem safe wrt journalling.

Thank you,
Andrew


* Re: quota: dqio_mutex design
  2017-08-02 17:52           ` Andrew Perepechko
@ 2017-08-03 11:09             ` Jan Kara
  2017-08-03 11:31             ` Wang Shilong
  1 sibling, 0 replies; 22+ messages in thread
From: Jan Kara @ 2017-08-03 11:09 UTC (permalink / raw)
  To: Andrew Perepechko; +Cc: Jan Kara, linux-fsdevel

On Wed 02-08-17 20:52:51, Andrew Perepechko wrote:
> > On Tue 01-08-17 15:02:42, Jan Kara wrote:
> > > When several processes race updating the same dquot, they very often all
> > > end up updating dquot on disk even though another process has already
> > > written dquot for them while they were waiting for dqio_sem - in my test
> > > above the ratio of commit_dqblk / dquot_commit calls was 59%. What we
> > > could
> > > do is that dquot_mark_dquot_dirty() would return "current sequence of
> > > dquot", dquot_commit() would then get sequence that is required to be
> > > written and if that is already written (we would also store in dquot
> > > latest
> > > written sequence), it would bail out doing nothing. This should cut down
> > > dqio_mutex hold times and thus wait times but I need to experiment and
> > > measure that...
> > 
> > I've been experimenting with this today but this idea didn't bring any
> > benefit in my testing. Was your setup with multiple users or a single user?
> > Could you give some testing to my patches to see whether they bring some
> > benefit to you?
> > 
> > 								Honza
> 
> Hi Jan!
> 
> My setup was with a single user. Unfortunately, it may take some time before
> I can try a patched kernel other than RHEL6 or RHEL7 with the same test,
> we have a lot of dependencies on these kernels.
> 
> The actual test we ran was mdtest.
> 
> By the way, we had 15+% performance improvement in creates from the
> change that was discussed earlier in this thread:
> 
>            EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
> +              if (test_bit(DQ_MOD_B, &dquot->dq_flags))
> +                       return 0;
>                dquot_mark_dquot_dirty(dquot);
>                return ext4_write_dquot(dquot);
> 
> The idea was that if we know that some thread is somewhere between
> mark_dirty and clear_dirty, then we can avoid blocking on dqio_mutex,
> since that thread will update the ondisk dquot for us.
> 
> I think, you also mentioned that some mark_dquot_dirty callers, such
> as do_set_dqblk, may not be running with an open transaction handle,
> so we cannot assume this optimization is atomic. However, we don't
> use do_set_dqblk and seem safe wrt journalling.

OK, thanks for the info. I'm reluctant to make ext4_mark_dquot_dirty() return
before the quota data is actually copied to the transaction, which is what
probably brings you the benefit. I'll think about it some more.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: quota: dqio_mutex design
  2017-08-02 17:52           ` Andrew Perepechko
  2017-08-03 11:09             ` Jan Kara
@ 2017-08-03 11:31             ` Wang Shilong
  2017-08-03 12:24               ` Andrew Perepechko
  2017-08-03 14:36               ` Jan Kara
  1 sibling, 2 replies; 22+ messages in thread
From: Wang Shilong @ 2017-08-03 11:31 UTC (permalink / raw)
  To: Andrew Perepechko, Shuichi Ihara, Wang Shilong, Li Xi,
	Ext4 Developers List
  Cc: Jan Kara, linux-fsdevel

Hello guys,

We at DDN are investigating the same issue!

Some comments:

On Thu, Aug 3, 2017 at 1:52 AM, Andrew Perepechko <anserper@yandex.ru> wrote:
>> On Tue 01-08-17 15:02:42, Jan Kara wrote:
>> > Hi Andrew,
>> >
>> I've been experimenting with this today but this idea didn't bring any
>> benefit in my testing. Was your setup with multiple users or a single user?
>> Could you give some testing to my patches to see whether they bring some
>> benefit to you?
>>
>>                                                               Honza
>
> Hi Jan!
>
> My setup was with a single user. Unfortunately, it may take some time before
> I can try a patched kernel other than RHEL6 or RHEL7 with the same test,
> we have a lot of dependencies on these kernels.
>
> The actual test we ran was mdtest.
>
> By the way, we had 15+% performance improvement in creates from the
> change that was discussed earlier in this thread:
>
>            EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
> +              if (test_bit(DQ_MOD_B, &dquot->dq_flags))
> +                       return 0;

I don't think this is right. As far as I understand, the journalled quota
update needs to go together with the quota space change inside the same
transaction; otherwise consistency will break if a power-off or a read-only
remount happens.

Here are some ideas I have thought about:

1) Switch dqio_mutex to a read/write lock. Most of the time a journalled
quota update is an in-place update, which means we don't need to change the
quota tree in memory: first try the read lock, and retry with the write lock
if there is a real tree change (see the sketch below).

2) Another idea is similar to Andrew's workaround, but made into a correct
fix: maintain a per-transaction dirty list and guarantee that quota updates
are flushed when the transaction commits. This might be complex; I am not
very familiar with the JBD2 code.

It would be really nice if we could fix this regression, as we see a 20%
performance regression.
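
A rough sketch of 1), assuming a hypothetical dqio_rwsem and helper names,
just to show the retry pattern:

	down_read(&dqopt->dqio_rwsem);
	ret = write_dquot_in_place(dquot);	/* -EAGAIN if the tree must change */
	up_read(&dqopt->dqio_rwsem);
	if (ret == -EAGAIN) {
		/* A new block/entry must be allocated: retry exclusively. */
		down_write(&dqopt->dqio_rwsem);
		ret = write_dquot_alloc(dquot);
		up_write(&dqopt->dqio_rwsem);
	}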

Thanks,
Shilong

>                dquot_mark_dquot_dirty(dquot);
>                return ext4_write_dquot(dquot);
>
> The idea was that if we know that some thread is somewhere between
> mark_dirty and clear_dirty, then we can avoid blocking on dqio_mutex,
> since that thread will update the ondisk dquot for us.
>


* Re: quota: dqio_mutex design
  2017-08-03 11:31             ` Wang Shilong
@ 2017-08-03 12:24               ` Andrew Perepechko
  2017-08-03 13:19                 ` Wang Shilong
  2017-08-03 14:36               ` Jan Kara
  1 sibling, 1 reply; 22+ messages in thread
From: Andrew Perepechko @ 2017-08-03 12:24 UTC (permalink / raw)
  To: Wang Shilong
  Cc: Shuichi Ihara, Wang Shilong, Li Xi, Ext4 Developers List,
	Jan Kara, linux-fsdevel

> 
> I don't think this is right, as far as i understand, journal quota need go
> together with quota space change update inside same transaction, this will
> break consistency if power off or RO happen.
> 

Hello Wang!

There is no transaction change in this case because all callers of this
function have open handles for the same transaction.

If you enter that DQ_MOD_B check, you are guaranteed to reference
the SAME transaction as the thread that's in between mark_dirty
and clear_dirty.

Thank you,
Andrew


* Re: quota: dqio_mutex design
  2017-08-03 12:24               ` Andrew Perepechko
@ 2017-08-03 13:19                 ` Wang Shilong
  2017-08-03 13:41                   ` Andrew Perepechko
  0 siblings, 1 reply; 22+ messages in thread
From: Wang Shilong @ 2017-08-03 13:19 UTC (permalink / raw)
  To: Andrew Perepechko
  Cc: Shuichi Ihara, Wang Shilong, Li Xi, Ext4 Developers List,
	Jan Kara, linux-fsdevel

Hi,

On Thu, Aug 3, 2017 at 8:24 PM, Andrew Perepechko <anserper@yandex.ru> wrote:
>>
>> I don't think this is right, as far as i understand, journal quota need go
>> together with quota space change update inside same transaction, this will
>> break consistency if power off or RO happen.
>>
>
> Hello Wang!
>
> There is no transaction change in this case because all callers of this
> function have open handles for the same transaction.
>
> If you enter that DQ_MOD_B check, you are guaranteed to reference
> the SAME transaction as the thread that's in between of mark_dirty
> and clear_dirty.
>

This change means that if the dquot is dirty we skip the write. That won't
work, because then the quota update is only kept in the VFS dquot in memory;
the newer update is not written to the journalled quota file and not wrapped
into a transaction either.

This is not what journalled quota is meant to do.


Thanks,
Shilong


> Thank you,
> Andrew


* Re: quota: dqio_mutex design
  2017-08-03 13:19                 ` Wang Shilong
@ 2017-08-03 13:41                   ` Andrew Perepechko
  2017-08-03 13:55                     ` Andrew Perepechko
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Perepechko @ 2017-08-03 13:41 UTC (permalink / raw)
  To: Wang Shilong
  Cc: Shuichi Ihara, Wang Shilong, Li Xi, Ext4 Developers List,
	Jan Kara, linux-fsdevel

> 
> This change mean if this dquot is dirty we skip, this
> won't work because in this way, quota update is only kept in vfs dquota
> memory and newer update is not wrote to journal file and not wrapped into
> transaction too.

That's not true.

As I explained earlier, having DQ_MOD_B set at this point means another
thread is going to write dquot but hasn't yet started doing so. This thread
does not care whether it updates the ondisk dquot with its own data or with
fresher data which came from another thread. In-core dquot has no indication
of whose data it contains.

As I also explained earlier, the update cannot happen in the context of
another transaction because thread A which sees DQ_MOD_B set and thread
B which is running dquot_commit() both have journal handles to the same
transaction. There's only one running transaction at a time and thread B does
not switch to another transaction.

Please read the code carefully.


> 
> This is not what journal quota means to do.
> 
> 
> Thanks,
> Shilong
> 
> > Thank you,
> > Andrew


* Re: quota: dqio_mutex design
  2017-08-03 13:41                   ` Andrew Perepechko
@ 2017-08-03 13:55                     ` Andrew Perepechko
  2017-08-03 14:23                       ` Jan Kara
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Perepechko @ 2017-08-03 13:55 UTC (permalink / raw)
  To: Wang Shilong
  Cc: Shuichi Ihara, Wang Shilong, Li Xi, Ext4 Developers List,
	Jan Kara, linux-fsdevel

Let me put it this way:

Under file creation from different threads, ext4 will generate a series of
dquot updates (incore and then ondisk, through journal):

dquot update1
dquot update2
dquot update3
...
dquot updateN

Either with my patch or without it, the on-disk dquot update through the
journal may miss dquot update1, dquot update2, ... dquot update{N-1}.

You can easily see that from the code of dquot_commit():

int dquot_commit(struct dquot *dquot)
{
        int ret = 0;
        struct quota_info *dqopt = sb_dqopt(dquot->dq_sb);

        mutex_lock(&dqopt->dqio_mutex);
        spin_lock(&dq_list_lock);
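        /*
         * If another thread already wrote this dquot while we waited for
         * dqio_mutex, the dirty bit is clear and there is nothing to do.
         */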
        if (!clear_dquot_dirty(dquot)) {
                spin_unlock(&dq_list_lock);
                goto out_sem;
        }
...
}


If the actual dquot_commit() wrote dquot update N, the threads committing
updates 1 through N-1 will exit immediately once they get dqio_mutex,
since the dquot will NOT be dirty.

My patch only avoids blocking on dqio_mutex when we know for sure
that another thread will NECESSARILY write the needed or a FRESHER dquot
on disk.

> > This change mean if this dquot is dirty we skip, this
> > won't work because in this way, quota update is only kept in vfs dquota
> > memory and newer update is not wrote to journal file and not wrapped into
> > transaction too.
> 
> That's not true.
> 
> As I explained earlier, having DQ_MOD_B set at this point means another
> thread is going to write dquot but hasn't yet started doing so. This thread
> does not care whether it updates the ondisk dquot with its own data or with
> fresher data which came from another thread. In-core dquot has no indication
> of whose data in contains.
> 
> As I also explained earlier, the update cannot happen in the context of
> another transaction because thread A which sees DQ_MOD_B set and thread
> B which is running dquot_commit() both have journal handles to the same
> transaction. There's only one running transaction at a time and thread B
> does not switch to another transaction.
> 
> Please read the code carefully.
> 
> > This is not what journal quota means to do.
> > 
> > 
> > Thanks,
> > Shilong
> > 
> > > Thank you,
> > > Andrew


* Re: quota: dqio_mutex design
  2017-08-03 13:55                     ` Andrew Perepechko
@ 2017-08-03 14:23                       ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2017-08-03 14:23 UTC (permalink / raw)
  To: Andrew Perepechko
  Cc: Wang Shilong, Shuichi Ihara, Wang Shilong, Li Xi,
	Ext4 Developers List, Jan Kara, linux-fsdevel

On Thu 03-08-17 16:55:40, Andrew Perepechko wrote:
> Let me put it this way:
> 
> Under file creation from different threads, ext4 will generate a series of
> dquot updates (incore and then ondisk, through journal):
> 
> dquot update1
> dquot update2
> dquot update3
> ...
> dquot updateN
> 
> Either with my patch or without it, ondisk dquot update through journal
> may miss dquot update1, dquot update2, ... dquot update{N-1}.
> 
> You can easily see that from the code of dquot_commit():
> 
> int dquot_commit(struct dquot *dquot)
> {
>         int ret = 0;
>         struct quota_info *dqopt = sb_dqopt(dquot->dq_sb);
> 
>         mutex_lock(&dqopt->dqio_mutex);
>         spin_lock(&dq_list_lock);
>         if (!clear_dquot_dirty(dquot)) {
>                 spin_unlock(&dq_list_lock);
>                 goto out_sem;
>         }
> ...
> }
> 
> 
> If actual dquot_commit() wrote dquot update N, the threads commiting
> updates 1 through N-1 will exit immediately once they get dqio_mutex
> since the dquot will NOT be dirty.
> 
> My patch only avoids blocking on dqio_mutex when we know for sure
> that another will NECESSARILY write the needed or a FRESHER dquot ondisk.

Yeah, I agree with Andrew. What they did is *almost* safe for ext4. The
only moment when it is not safe is when someone calls mark_dquot_dirty()
outside of the scope of a transaction, which happens when doing a Q_SETQUOTA
quotactl.

Another thing which is subtle about Andrew's approach is that a process
modifying quota information can return and stop its handle before the quota
data gets copied to the transaction buffer. This does not currently create
any real problem since nobody relies on that, but it depends on intimate
details of the JBD2 transaction machinery and could bite us in the future.

								Honza

> > > This change mean if this dquot is dirty we skip, this
> > > won't work because in this way, quota update is only kept in vfs dquota
> > > memory and newer update is not wrote to journal file and not wrapped into
> > > transaction too.
> > 
> > That's not true.
> > 
> > As I explained earlier, having DQ_MOD_B set at this point means another
> > thread is going to write dquot but hasn't yet started doing so. This thread
> > does not care whether it updates the ondisk dquot with its own data or with
> > fresher data which came from another thread. In-core dquot has no indication
> > of whose data in contains.
> > 
> > As I also explained earlier, the update cannot happen in the context of
> > another transaction because thread A which sees DQ_MOD_B set and thread
> > B which is running dquot_commit() both have journal handles to the same
> > transaction. There's only one running transaction at a time and thread B
> > does not switch to another transaction.
> > 
> > Please read the code carefully.
> > 
> > > This is not what journal quota means to do.
> > > 
> > > 
> > > Thanks,
> > > Shilong
> > > 
> > > > Thank you,
> > > > Andrew
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: quota: dqio_mutex design
  2017-08-03 11:31             ` Wang Shilong
  2017-08-03 12:24               ` Andrew Perepechko
@ 2017-08-03 14:36               ` Jan Kara
  2017-08-03 14:39                 ` Wang Shilong
  1 sibling, 1 reply; 22+ messages in thread
From: Jan Kara @ 2017-08-03 14:36 UTC (permalink / raw)
  To: Wang Shilong
  Cc: Andrew Perepechko, Shuichi Ihara, Wang Shilong, Li Xi,
	Ext4 Developers List, Jan Kara, linux-fsdevel

Hello!

On Thu 03-08-17 19:31:04, Wang Shilong wrote:
> We DDN is investigating the same issue!
> 
> Some comments comes:
> 
> On Thu, Aug 3, 2017 at 1:52 AM, Andrew Perepechko <anserper@yandex.ru> wrote:
> >> On Tue 01-08-17 15:02:42, Jan Kara wrote:
> >> > Hi Andrew,
> >> >
> >> I've been experimenting with this today but this idea didn't bring any
> >> benefit in my testing. Was your setup with multiple users or a single user?
> >> Could you give some testing to my patches to see whether they bring some
> >> benefit to you?
> >>
> >>                                                               Honza
> >
> > Hi Jan!
> >
> > My setup was with a single user. Unfortunately, it may take some time before
> > I can try a patched kernel other than RHEL6 or RHEL7 with the same test,
> > we have a lot of dependencies on these kernels.
> >
> > The actual test we ran was mdtest.
> >
> > By the way, we had 15+% performance improvement in creates from the
> > change that was discussed earlier in this thread:
> >
> >            EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
> > +              if (test_bit(DQ_MOD_B, &dquot->dq_flags))
> > +                       return 0;
> 
> I don't think this is right, as far as i understand, journal quota need go
> together with quota space change update inside same transaction, this will
> break consistency if power off or RO happen.
> 
> Here is some ideas that i have thought:
> 
> 1) switch dqio_mutex to a read/write lock, especially, i think most of
> time journal quota updates is in-place update, that means we don't need
> change quota tree in memory, firstly try read lock, retry with write lock if
> there is real tree change.
> 
> 2)another is similar idea of Andrew's walkaround, but we need make correct
> fix, maintain dirty list for per transaction, and gurantee quota updates are
> flushed when commit transaction, this might be complex, i am not very
> familiar with JBD2 codes.
> 
> It will be really nice if we could fix this regression, as we see 20% performace
> regression.

So I have a couple of patches:

1) I convert dqio_mutex to a rw semaphore and use it in exclusive mode only
when the quota tree is going to change. We also use dq_lock to serialize
writes of a dquot - you cannot have two writes happening in parallel as that
could result in stale data being on disk (a rough sketch is below). This
patch brings a benefit when there are multiple users - they no longer
contend on a common lock. It shows an advantage in my testing so I plan to
merge these patches. When the contention is on the structure for a single
user, however, this change doesn't bring much (the performance change is in
statistical noise in my testing).

2) I have patches to remove some contention on dq_list_lock by not using the
dirty list for tracking dquots in ext4 (and thus avoiding dq_list_lock
completely in the quota modification path). This does not bring a measurable
benefit in my testing even on ramdisk, but lockstat data for dq_list_lock
looks much better after this - it seems the lock contention just shifted to
dq_data_lock. I'll try to address that as well and see whether I can measure
some advantage.

3) I have patches to convert the dquot dirty bit to a sequence counter so
that in commit_dqblk() we can check whether the dquot state we wanted to
write is already on disk. Note that this is different from Andrew's approach
in that we do wait for the dquot to actually be written before returning; we
just don't repeat the write unnecessarily. However, this didn't bring any
measurable benefit in my testing, so unless I can confirm it benefits some
workloads I won't merge this change.
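
For illustration, the locking in 1) roughly becomes (a sketch; the actual
patches differ in detail):

dquot_commit(dquot)
  mutex_lock(&dquot->dq_lock);	/* serialize writers of this dquot */
  if (!clear_dquot_dirty(dquot))
    nothing to do -> bail
  ->commit_dqblk(dquot)
    down_read(&dqio_sem);	/* shared: in-place update of an existing entry */
    ... write the dquot ...
    up_read(&dqio_sem);
    /* down_write() instead when the quota tree needs to grow */
  mutex_unlock(&dquot->dq_lock);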

If you can experiment with your workloads, I can send you patches. I'd be
keen on having some performance data from real setups...

								Honza

> 
> Thanks,
> Shilong
> 
> >                dquot_mark_dquot_dirty(dquot);
> >                return ext4_write_dquot(dquot);
> >
> > The idea was that if we know that some thread is somewhere between
> > mark_dirty and clear_dirty, then we can avoid blocking on dqio_mutex,
> > since that thread will update the ondisk dquot for us.
> >
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: quota: dqio_mutex design
  2017-08-03 14:36               ` Jan Kara
@ 2017-08-03 14:39                 ` Wang Shilong
  2017-08-08 16:06                   ` Jan Kara
  0 siblings, 1 reply; 22+ messages in thread
From: Wang Shilong @ 2017-08-03 14:39 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Perepechko, Shuichi Ihara, Wang Shilong, Li Xi,
	Ext4 Developers List, linux-fsdevel

Hello Jan,


Please send me patches, we could test and response you!

Thanks,
Shilong

On Thu, Aug 3, 2017 at 10:36 PM, Jan Kara <jack@suse.cz> wrote:
> Hello!
>
> On Thu 03-08-17 19:31:04, Wang Shilong wrote:
>> We DDN is investigating the same issue!
>>
>> Some comments comes:
>>
>> On Thu, Aug 3, 2017 at 1:52 AM, Andrew Perepechko <anserper@yandex.ru> wrote:
>> >> On Tue 01-08-17 15:02:42, Jan Kara wrote:
>> >> > Hi Andrew,
>> >> >
>> >> I've been experimenting with this today but this idea didn't bring any
>> >> benefit in my testing. Was your setup with multiple users or a single user?
>> >> Could you give some testing to my patches to see whether they bring some
>> >> benefit to you?
>> >>
>> >>                                                               Honza
>> >
>> > Hi Jan!
>> >
>> > My setup was with a single user. Unfortunately, it may take some time before
>> > I can try a patched kernel other than RHEL6 or RHEL7 with the same test,
>> > we have a lot of dependencies on these kernels.
>> >
>> > The actual test we ran was mdtest.
>> >
>> > By the way, we had 15+% performance improvement in creates from the
>> > change that was discussed earlier in this thread:
>> >
>> >            EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
>> > +              if (test_bit(DQ_MOD_B, &dquot->dq_flags))
>> > +                       return 0;
>>
>> I don't think this is right, as far as i understand, journal quota need go
>> together with quota space change update inside same transaction, this will
>> break consistency if power off or RO happen.
>>
>> Here is some ideas that i have thought:
>>
>> 1) switch dqio_mutex to a read/write lock, especially, i think most of
>> time journal quota updates is in-place update, that means we don't need
>> change quota tree in memory, firstly try read lock, retry with write lock if
>> there is real tree change.
>>
>> 2)another is similar idea of Andrew's walkaround, but we need make correct
>> fix, maintain dirty list for per transaction, and gurantee quota updates are
>> flushed when commit transaction, this might be complex, i am not very
>> familiar with JBD2 codes.
>>
>> It will be really nice if we could fix this regression, as we see 20% performace
>> regression.
>
> So I have couple of patches:
>
> 1) I convert dqio_mutex do rw semaphore and use it in exclusive mode only
> when quota tree is going to change. We also use dq_lock to serialize writes
> of dquot - you cannot have two writes happening in parallel as that could
> result in stale data being on disk. This patch brings benefit when there
> are multiple users - now they don't contend on common lock. It shows
> advantage in my testing so I plan to merge these patches. When the
> contention is on a structure for single user this change however doesn't
> bring much (the performance change is in statistical noise in my testing).
>
> 2) I have patches to remove some contention on dq_list_lock by not using
> dirty list for tracking dquots in ext4 (and thus avoid dq_list_lock
> completely in quota modification path). This does not bring measurable
> benefit in my testing even on ramdisk but lockstat data for dq_list_lock
> looks much better after this - it seems lock contention just shifted to
> dq_data_lock - I'll try to address that as well and see whether I'll be
> able to measure some advantage.
>
> 3) I have patches to convert dquot dirty bit to sequence counter so that
> in commit_dqblk() we can check whether dquot state we wanted to write is
> already on disk. Note that this is different from Andrew's approach in that
> we do wait for dquot to be actually written before returning. We just don't
> repeat the write unnecessarily. However this didn't bring any measurable
> benefit in my testing so unless I'll be able to confirm it benefits some
> workloads I won't merge this change.
>
> If you can experiment with your workloads, I can send you patches. I'd be
> keen on having some performance data from real setups...
>
>                                                                 Honza
>
>>
>> Thanks,
>> Shilong
>>
>> >                dquot_mark_dquot_dirty(dquot);
>> >                return ext4_write_dquot(dquot);
>> >
>> > The idea was that if we know that some thread is somewhere between
>> > mark_dirty and clear_dirty, then we can avoid blocking on dqio_mutex,
>> > since that thread will update the ondisk dquot for us.
>> >
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR


* Re: quota: dqio_mutex design
  2017-08-03 14:39                 ` Wang Shilong
@ 2017-08-08 16:06                   ` Jan Kara
  2017-08-14  3:24                     ` Wang Shilong
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Kara @ 2017-08-08 16:06 UTC (permalink / raw)
  To: Wang Shilong
  Cc: Jan Kara, Andrew Perepechko, Shuichi Ihara, Wang Shilong, Li Xi,
	Ext4 Developers List, linux-fsdevel

Hi,

On Thu 03-08-17 22:39:51, Wang Shilong wrote:
> Please send me patches, we could test and response you!

So I finally have something which isn't obviously wrong (it survives basic
testing and gives me improvements for some workloads). I have pushed out
the patches to:

git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git quota_scaling

I'd be happy if you can share your results with my patches. I have not yet
figured out a safe way to reduce the contention on dq_lock during the update
of the on-disk structure when a lot of processes bang on a single dquot. I
have an experimental patch but it didn't bring any benefit in my testing -
I'll rebase it on top of the other patches I have and send it to you for
some testing.

								Honza

> On Thu, Aug 3, 2017 at 10:36 PM, Jan Kara <jack@suse.cz> wrote:
> > Hello!
> >
> > On Thu 03-08-17 19:31:04, Wang Shilong wrote:
> >> We DDN is investigating the same issue!
> >>
> >> Some comments comes:
> >>
> >> On Thu, Aug 3, 2017 at 1:52 AM, Andrew Perepechko <anserper@yandex.ru> wrote:
> >> >> On Tue 01-08-17 15:02:42, Jan Kara wrote:
> >> >> > Hi Andrew,
> >> >> >
> >> >> I've been experimenting with this today but this idea didn't bring any
> >> >> benefit in my testing. Was your setup with multiple users or a single user?
> >> >> Could you give some testing to my patches to see whether they bring some
> >> >> benefit to you?
> >> >>
> >> >>                                                               Honza
> >> >
> >> > Hi Jan!
> >> >
> >> > My setup was with a single user. Unfortunately, it may take some time before
> >> > I can try a patched kernel other than RHEL6 or RHEL7 with the same test,
> >> > we have a lot of dependencies on these kernels.
> >> >
> >> > The actual test we ran was mdtest.
> >> >
> >> > By the way, we had 15+% performance improvement in creates from the
> >> > change that was discussed earlier in this thread:
> >> >
> >> >            EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
> >> > +              if (test_bit(DQ_MOD_B, &dquot->dq_flags))
> >> > +                       return 0;
> >>
> >> I don't think this is right, as far as i understand, journal quota need go
> >> together with quota space change update inside same transaction, this will
> >> break consistency if power off or RO happen.
> >>
> >> Here is some ideas that i have thought:
> >>
> >> 1) switch dqio_mutex to a read/write lock, especially, i think most of
> >> time journal quota updates is in-place update, that means we don't need
> >> change quota tree in memory, firstly try read lock, retry with write lock if
> >> there is real tree change.
> >>
> >> 2)another is similar idea of Andrew's walkaround, but we need make correct
> >> fix, maintain dirty list for per transaction, and gurantee quota updates are
> >> flushed when commit transaction, this might be complex, i am not very
> >> familiar with JBD2 codes.
> >>
> >> It will be really nice if we could fix this regression, as we see 20% performace
> >> regression.
> >
> > So I have couple of patches:
> >
> > 1) I convert dqio_mutex do rw semaphore and use it in exclusive mode only
> > when quota tree is going to change. We also use dq_lock to serialize writes
> > of dquot - you cannot have two writes happening in parallel as that could
> > result in stale data being on disk. This patch brings benefit when there
> > are multiple users - now they don't contend on common lock. It shows
> > advantage in my testing so I plan to merge these patches. When the
> > contention is on a structure for single user this change however doesn't
> > bring much (the performance change is in statistical noise in my testing).
> >
> > 2) I have patches to remove some contention on dq_list_lock by not using
> > dirty list for tracking dquots in ext4 (and thus avoid dq_list_lock
> > completely in quota modification path). This does not bring measurable
> > benefit in my testing even on ramdisk but lockstat data for dq_list_lock
> > looks much better after this - it seems lock contention just shifted to
> > dq_data_lock - I'll try to address that as well and see whether I'll be
> > able to measure some advantage.
> >
> > 3) I have patches to convert dquot dirty bit to sequence counter so that
> > in commit_dqblk() we can check whether dquot state we wanted to write is
> > already on disk. Note that this is different from Andrew's approach in that
> > we do wait for dquot to be actually written before returning. We just don't
> > repeat the write unnecessarily. However this didn't bring any measurable
> > benefit in my testing so unless I'll be able to confirm it benefits some
> > workloads I won't merge this change.
> >
> > If you can experiment with your workloads, I can send you patches. I'd be
> > keen on having some performance data from real setups...
> >
> >                                                                 Honza
> >
> >>
> >> Thanks,
> >> Shilong
> >>
> >> >                dquot_mark_dquot_dirty(dquot);
> >> >                return ext4_write_dquot(dquot);
> >> >
> >> > The idea was that if we know that some thread is somewhere between
> >> > mark_dirty and clear_dirty, then we can avoid blocking on dqio_mutex,
> >> > since that thread will update the ondisk dquot for us.
> >> >
> > --
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* RE: quota: dqio_mutex design
  2017-08-08 16:06                   ` Jan Kara
@ 2017-08-14  3:24                     ` Wang Shilong
  2017-08-14  3:28                       ` Wang Shilong
  2017-08-14  3:53                       ` Wang Shilong
  0 siblings, 2 replies; 22+ messages in thread
From: Wang Shilong @ 2017-08-14  3:24 UTC (permalink / raw)
  To: Jan Kara, Wang Shilong
  Cc: Andrew Perepechko, Shuichi Ihara, Li Xi, Ext4 Developers List,
	linux-fsdevel

Hello Jan,

   We have tested your patches; in general, they helped in our case. Note
that our test case is only one user with many processes creating/removing
files.


	4.13.0-rc3 without any patches (File Creation / File Unlink, ops per second)
	         no quota               -O quota              -O quota,project
	     Creation    Unlink     Creation    Unlink      Creation    Unlink
	0      93,068   296,028       86,860   285,131        85,199   189,653
	1      79,501   280,921       91,079   277,349       186,279   170,982
	2      79,932   299,750       90,246   274,457       133,922   191,677
	3      80,146   297,525       86,416   272,160       192,354   198,869

	4.13.0-rc3 with Jan Kara's patches
	     Creation    Unlink     Creation    Unlink      Creation    Unlink
	0      73,057   311,217       74,898   286,120        81,217   288,138
	1      78,872   312,471       76,470   277,033        77,014   288,057
	2      79,170   291,440       76,174   283,525        73,686   283,526
	3      79,941   309,168       78,493   277,331        78,751   281,377

	4.13.0-rc3 with https://patchwork.ozlabs.org/patch/799014/
	     Creation    Unlink     Creation    Unlink      Creation    Unlink
	0     100,319   322,746       87,480   302,579        84,569   218,969
	1     728,424   299,808      312,766   293,471       219,198   199,389
	2     729,410   300,930      315,590   289,664       218,283   197,871
	3     727,555   298,797      316,837   289,108       213,095   213,458

	4.13.0-rc3 with https://patchwork.ozlabs.org/patch/799014/ + Jan Kara's patches
	     Creation    Unlink     Creation    Unlink      Creation    Unlink
	0     100,312   324,871       87,076   267,303        86,258   288,137
	1     707,524   298,892      361,963   252,493       421,919   282,492
	2     707,792   298,162      363,450   264,923       397,723   283,675
	3     707,420   302,552      354,013   266,638       421,537   281,763


In conclusion, your patches helped a lot in our testing. Note that run 0
should be ignored for creation, since the first run loads the inode cache
into memory; we used runs 1-3 for comparison.

With the extra patch applied, your patches improved file creation
(quota+project) 2x and file unlink 1.5x.

Thanks,
Shilong

________________________________________
From: Jan Kara [jack@suse.cz]
Sent: Wednesday, August 09, 2017 0:06
To: Wang Shilong
Cc: Jan Kara; Andrew Perepechko; Shuichi Ihara; Wang Shilong; Li Xi; Ext4 Developers List; linux-fsdevel@vger.kernel.org
Subject: Re: quota: dqio_mutex design

Hi,

On Thu 03-08-17 22:39:51, Wang Shilong wrote:
> Please send me patches, we could test and response you!

So I finally have something which isn't obviously wrong (it survives basic
testing and gives me improvements for some workloads). I have pushed out
the patches to:

git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git quota_scaling

I'd be happy if you can share your results with my patches. I have not yet
figured out a safe way to reduce the contention on dq_lock during update of
on-disk structure when lot of processes bang single dquot. I have
experimental patch but it didn't bring any benefit in my testing - I'll
rebase it on top of other patches I have send it to you for some testing.

                                                                Honza

> On Thu, Aug 3, 2017 at 10:36 PM, Jan Kara <jack@suse.cz> wrote:
> > Hello!
> >
> > On Thu 03-08-17 19:31:04, Wang Shilong wrote:
> >> We DDN is investigating the same issue!
> >>
> >> Some comments comes:
> >>
> >> On Thu, Aug 3, 2017 at 1:52 AM, Andrew Perepechko <anserper@yandex.ru> wrote:
> >> >> On Tue 01-08-17 15:02:42, Jan Kara wrote:
> >> >> > Hi Andrew,
> >> >> >
> >> >> I've been experimenting with this today but this idea didn't bring any
> >> >> benefit in my testing. Was your setup with multiple users or a single user?
> >> >> Could you give some testing to my patches to see whether they bring some
> >> >> benefit to you?
> >> >>
> >> >>                                                               Honza
> >> >
> >> > Hi Jan!
> >> >
> >> > My setup was with a single user. Unfortunately, it may take some time before
> >> > I can try a patched kernel other than RHEL6 or RHEL7 with the same test,
> >> > we have a lot of dependencies on these kernels.
> >> >
> >> > The actual test we ran was mdtest.
> >> >
> >> > By the way, we had 15+% performance improvement in creates from the
> >> > change that was discussed earlier in this thread:
> >> >
> >> >            EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
> >> > +              if (test_bit(DQ_MOD_B, &dquot->dq_flags))
> >> > +                       return 0;
> >>
> >> I don't think this is right, as far as i understand, journal quota need go
> >> together with quota space change update inside same transaction, this will
> >> break consistency if power off or RO happen.
> >>
> >> Here is some ideas that i have thought:
> >>
> >> 1) switch dqio_mutex to a read/write lock, especially, i think most of
> >> time journal quota updates is in-place update, that means we don't need
> >> change quota tree in memory, firstly try read lock, retry with write lock if
> >> there is real tree change.
> >>
> >> 2)another is similar idea of Andrew's walkaround, but we need make correct
> >> fix, maintain dirty list for per transaction, and gurantee quota updates are
> >> flushed when commit transaction, this might be complex, i am not very
> >> familiar with JBD2 codes.
> >>
> >> It will be really nice if we could fix this regression, as we see 20% performace
> >> regression.
> >
> > So I have couple of patches:
> >
> > 1) I convert dqio_mutex do rw semaphore and use it in exclusive mode only
> > when quota tree is going to change. We also use dq_lock to serialize writes
> > of dquot - you cannot have two writes happening in parallel as that could
> > result in stale data being on disk. This patch brings benefit when there
> > are multiple users - now they don't contend on common lock. It shows
> > advantage in my testing so I plan to merge these patches. When the
> > contention is on a structure for single user this change however doesn't
> > bring much (the performance change is in statistical noise in my testing).
> >
> > 2) I have patches to remove some contention on dq_list_lock by not using
> > dirty list for tracking dquots in ext4 (and thus avoid dq_list_lock
> > completely in quota modification path). This does not bring measurable
> > benefit in my testing even on ramdisk but lockstat data for dq_list_lock
> > looks much better after this - it seems lock contention just shifted to
> > dq_data_lock - I'll try to address that as well and see whether I'll be
> > able to measure some advantage.
> >
> > 3) I have patches to convert dquot dirty bit to sequence counter so that
> > in commit_dqblk() we can check whether dquot state we wanted to write is
> > already on disk. Note that this is different from Andrew's approach in that
> > we do wait for dquot to be actually written before returning. We just don't
> > repeat the write unnecessarily. However this didn't bring any measurable
> > benefit in my testing so unless I'll be able to confirm it benefits some
> > workloads I won't merge this change.
> >
> > If you can experiment with your workloads, I can send you patches. I'd be
> > keen on having some performance data from real setups...
> >
> >                                                                 Honza
> >
> >>
> >> Thanks,
> >> Shilong
> >>
> >> >                dquot_mark_dquot_dirty(dquot);
> >> >                return ext4_write_dquot(dquot);
> >> >
> >> > The idea was that if we know that some thread is somewhere between
> >> > mark_dirty and clear_dirty, then we can avoid blocking on dqio_mutex,
> >> > since that thread will update the ondisk dquot for us.
> >> >
> > --
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR
--
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: quota: dqio_mutex design
  2017-08-14  3:24                     ` Wang Shilong
@ 2017-08-14  3:28                       ` Wang Shilong
  2017-08-14  3:53                       ` Wang Shilong
  1 sibling, 0 replies; 22+ messages in thread
From: Wang Shilong @ 2017-08-14  3:28 UTC (permalink / raw)
  To: Wang Shilong
  Cc: Jan Kara, Andrew Perepechko, Shuichi Ihara, Li Xi,
	Ext4 Developers List, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 9308 bytes --]

Sorry, the formatting did not come out right; please use the attachment instead.

On Mon, Aug 14, 2017 at 11:24 AM, Wang Shilong <wshilong@ddn.com> wrote:
> Hello Jan,
>
>    We have tested your patches, in generally, it helped in our case. Noticed,
> our test case is only one user with many process create/remove file.
>
>
>         4.13.0-rc3 without any patches
>         no Quota                -O quota'               -O quota, project'
>         File Creation   File Unlink     File Creation   File Unlink     File Creation   File Unlink
> 0       93,068           296,028                    86,860      285,131      85,199               189,653
> 1       79,501           280,921                    91,079      277,349                   186,279       170,982
> 2       79,932           299,750                    90,246      274,457                    133,922      191,677
> 3       80,146           297,525                    86,416      272,160                    192,354      198,869
>
>         4.13.0-rc3/w Jan Kara patch
>         no Quota                -O quota'               -O quota, project'
>         File Creation   File Unlink     File Creation   File Unlink     File Creation   File Unlink
> 0       73,057          311,217                  74,898                   286,120        81,217                 288,138  ops/per second
> 1       78,872             312,471               76,470                   277,033        77,014                 288,057
> 2       79,170             291,440               76,174              283,525     73,686            283,526
> 3       79,941             309,168            78,493              277,331          78,751            281,377
>
>         4.13.0-rc3/with https://patchwork.ozlabs.org/patch/799014/
>         no Quota                -O quota'               -O quota, project'
>         File Creation   File Unlink     File Creation   File Unlink     File Creation   File Unlink
> 0       100,319                 322,746                  87,480                  302,579                 84,569                 218,969
> 1       728,424                 299,808            312,766           293,471           219,198          199,389
> 2       729,410           300,930            315,590           289,664           218,283          197,871
> 3       727,555           298,797                316,837                 289,108           213,095          213,458
>
>         4.13.0-rc3/w https://patchwork.ozlabs.org/patch/799014/ + Jan Kara patch
>         no Quota                -O quota'               -O quota, project'
>         File Creation   File Unlink     File Creation   File Unlink     File Creation   File Unlink
> 0       100,312                  324,871                 87,076                  267,303                  86,258                288,137
> 1       707,524                  298,892           361,963           252,493            421,919         282,492
> 2       707,792            298,162           363,450           264,923            397,723       283,675
> 3       707,420            302,552               354,013                 266,638                  421,537       281,763
>
>
> In conclusion, your patches helped a lot for our testing, noticed, please ignored test0 running
> for creation, the first time testing will loaded inode cache in memory, we used test1-3 to compare.
>
> With extra patch applied, your patches improved File creation(quota+project) 2X, File unlink
> 1.5X.
>
> Thanks,
> Shilong
>
> ________________________________________
> From: Jan Kara [jack@suse.cz]
> Sent: Wednesday, August 09, 2017 0:06
> To: Wang Shilong
> Cc: Jan Kara; Andrew Perepechko; Shuichi Ihara; Wang Shilong; Li Xi; Ext4 Developers List; linux-fsdevel@vger.kernel.org
> Subject: Re: quota: dqio_mutex design
>
> Hi,
>
> On Thu 03-08-17 22:39:51, Wang Shilong wrote:
>> Please send me patches, we could test and response you!
>
> So I finally have something which isn't obviously wrong (it survives basic
> testing and gives me improvements for some workloads). I have pushed out
> the patches to:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git quota_scaling
>
> I'd be happy if you can share your results with my patches. I have not yet
> figured out a safe way to reduce the contention on dq_lock during update of
> on-disk structure when lot of processes bang single dquot. I have
> experimental patch but it didn't bring any benefit in my testing - I'll
> rebase it on top of other patches I have send it to you for some testing.
>
>                                                                 Honza
>
>> On Thu, Aug 3, 2017 at 10:36 PM, Jan Kara <jack@suse.cz> wrote:
>> > Hello!
>> >
>> > On Thu 03-08-17 19:31:04, Wang Shilong wrote:
>> >> We DDN is investigating the same issue!
>> >>
>> >> Some comments comes:
>> >>
>> >> On Thu, Aug 3, 2017 at 1:52 AM, Andrew Perepechko <anserper@yandex.ru> wrote:
>> >> >> On Tue 01-08-17 15:02:42, Jan Kara wrote:
>> >> >> > Hi Andrew,
>> >> >> >
>> >> >> I've been experimenting with this today but this idea didn't bring any
>> >> >> benefit in my testing. Was your setup with multiple users or a single user?
>> >> >> Could you give some testing to my patches to see whether they bring some
>> >> >> benefit to you?
>> >> >>
>> >> >>                                                               Honza
>> >> >
>> >> > Hi Jan!
>> >> >
>> >> > My setup was with a single user. Unfortunately, it may take some time before
>> >> > I can try a patched kernel other than RHEL6 or RHEL7 with the same test,
>> >> > we have a lot of dependencies on these kernels.
>> >> >
>> >> > The actual test we ran was mdtest.
>> >> >
>> >> > By the way, we had 15+% performance improvement in creates from the
>> >> > change that was discussed earlier in this thread:
>> >> >
>> >> >            EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
>> >> > +              if (test_bit(DQ_MOD_B, &dquot->dq_flags))
>> >> > +                       return 0;
>> >>
>> >> I don't think this is right, as far as i understand, journal quota need go
>> >> together with quota space change update inside same transaction, this will
>> >> break consistency if power off or RO happen.
>> >>
>> >> Here is some ideas that i have thought:
>> >>
>> >> 1) switch dqio_mutex to a read/write lock, especially, i think most of
>> >> time journal quota updates is in-place update, that means we don't need
>> >> change quota tree in memory, firstly try read lock, retry with write lock if
>> >> there is real tree change.
>> >>
>> >> 2)another is similar idea of Andrew's walkaround, but we need make correct
>> >> fix, maintain dirty list for per transaction, and gurantee quota updates are
>> >> flushed when commit transaction, this might be complex, i am not very
>> >> familiar with JBD2 codes.
>> >>
>> >> It will be really nice if we could fix this regression, as we see 20% performace
>> >> regression.
>> >
>> > So I have couple of patches:
>> >
>> > 1) I convert dqio_mutex do rw semaphore and use it in exclusive mode only
>> > when quota tree is going to change. We also use dq_lock to serialize writes
>> > of dquot - you cannot have two writes happening in parallel as that could
>> > result in stale data being on disk. This patch brings benefit when there
>> > are multiple users - now they don't contend on common lock. It shows
>> > advantage in my testing so I plan to merge these patches. When the
>> > contention is on a structure for single user this change however doesn't
>> > bring much (the performance change is in statistical noise in my testing).
>> >
>> > 2) I have patches to remove some contention on dq_list_lock by not using
>> > dirty list for tracking dquots in ext4 (and thus avoid dq_list_lock
>> > completely in quota modification path). This does not bring measurable
>> > benefit in my testing even on ramdisk but lockstat data for dq_list_lock
>> > looks much better after this - it seems lock contention just shifted to
>> > dq_data_lock - I'll try to address that as well and see whether I'll be
>> > able to measure some advantage.
>> >
>> > 3) I have patches to convert dquot dirty bit to sequence counter so that
>> > in commit_dqblk() we can check whether dquot state we wanted to write is
>> > already on disk. Note that this is different from Andrew's approach in that
>> > we do wait for dquot to be actually written before returning. We just don't
>> > repeat the write unnecessarily. However this didn't bring any measurable
>> > benefit in my testing so unless I'll be able to confirm it benefits some
>> > workloads I won't merge this change.
>> >
>> > If you can experiment with your workloads, I can send you patches. I'd be
>> > keen on having some performance data from real setups...
>> >
>> >                                                                 Honza
>> >
>> >>
>> >> Thanks,
>> >> Shilong
>> >>
>> >> >                dquot_mark_dquot_dirty(dquot);
>> >> >                return ext4_write_dquot(dquot);
>> >> >
>> >> > The idea was that if we know that some thread is somewhere between
>> >> > mark_dirty and clear_dirty, then we can avoid blocking on dqio_mutex,
>> >> > since that thread will update the ondisk dquot for us.
>> >> >
>> > --
>> > Jan Kara <jack@suse.com>
>> > SUSE Labs, CR
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

[-- Attachment #2: mdtest-JK-patch.xlsx --]
[-- Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, Size: 27959 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: quota: dqio_mutex design
  2017-08-14  3:24                     ` Wang Shilong
  2017-08-14  3:28                       ` Wang Shilong
@ 2017-08-14  3:53                       ` Wang Shilong
  2017-08-14  8:22                         ` Jan Kara
  1 sibling, 1 reply; 22+ messages in thread
From: Wang Shilong @ 2017-08-14  3:53 UTC (permalink / raw)
  To: Wang Shilong
  Cc: Jan Kara, Andrew Perepechko, Shuichi Ihara, Li Xi,
	Ext4 Developers List, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 9473 bytes --]

Txt format attached.

BTW, Jan, it would be great if you could point out which patches help the most for
our test case; since there are a lot of patches in the series, we want to port
some of them to RHEL7.

Thanks,
Shilong

On Mon, Aug 14, 2017 at 11:24 AM, Wang Shilong <wshilong@ddn.com> wrote:
> Hello Jan,
>
>    We have tested your patches, in generally, it helped in our case. Noticed,
> our test case is only one user with many process create/remove file.
>
>
>         4.13.0-rc3 without any patches
>         no Quota                -O quota'               -O quota, project'
>         File Creation   File Unlink     File Creation   File Unlink     File Creation   File Unlink
> 0       93,068           296,028                    86,860      285,131                    85,199               189,653
> 1       79,501           280,921                    91,079      277,349                   186,279       170,982
> 2       79,932           299,750                    90,246      274,457                    133,922      191,677
> 3       80,146           297,525                    86,416      272,160                    192,354      198,869
>
>         4.13.0-rc3/w Jan Kara patch
>         no Quota                -O quota'               -O quota, project'
>         File Creation   File Unlink     File Creation   File Unlink     File Creation   File Unlink
> 0       73,057          311,217                  74,898                   286,120        81,217                 288,138  ops/per second
> 1       78,872             312,471               76,470                   277,033        77,014                 288,057
> 2       79,170             291,440               76,174              283,525     73,686            283,526
> 3       79,941             309,168            78,493              277,331          78,751            281,377
>
>         4.13.0-rc3/with https://patchwork.ozlabs.org/patch/799014/
>         no Quota                -O quota'               -O quota, project'
>         File Creation   File Unlink     File Creation   File Unlink     File Creation   File Unlink
> 0       100,319                 322,746                  87,480                  302,579                 84,569                 218,969
> 1       728,424                 299,808            312,766           293,471           219,198          199,389
> 2       729,410           300,930            315,590           289,664           218,283          197,871
> 3       727,555           298,797                316,837                 289,108           213,095          213,458
>
>         4.13.0-rc3/w https://patchwork.ozlabs.org/patch/799014/ + Jan Kara patch
>         no Quota                -O quota'               -O quota, project'
>         File Creation   File Unlink     File Creation   File Unlink     File Creation   File Unlink
> 0       100,312                  324,871                 87,076                  267,303                  86,258                288,137
> 1       707,524                  298,892           361,963           252,493            421,919         282,492
> 2       707,792            298,162           363,450           264,923            397,723       283,675
> 3       707,420            302,552               354,013                 266,638                  421,537       281,763
>
>
> In conclusion, your patches helped a lot for our testing, noticed, please ignored test0 running
> for creation, the first time testing will loaded inode cache in memory, we used test1-3 to compare.
>
> With extra patch applied, your patches improved File creation(quota+project) 2X, File unlink
> 1.5X.
>
> Thanks,
> Shilong
>
> ________________________________________
> From: Jan Kara [jack@suse.cz]
> Sent: Wednesday, August 09, 2017 0:06
> To: Wang Shilong
> Cc: Jan Kara; Andrew Perepechko; Shuichi Ihara; Wang Shilong; Li Xi; Ext4 Developers List; linux-fsdevel@vger.kernel.org
> Subject: Re: quota: dqio_mutex design
>
> Hi,
>
> On Thu 03-08-17 22:39:51, Wang Shilong wrote:
>> Please send me patches, we could test and response you!
>
> So I finally have something which isn't obviously wrong (it survives basic
> testing and gives me improvements for some workloads). I have pushed out
> the patches to:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git quota_scaling
>
> I'd be happy if you can share your results with my patches. I have not yet
> figured out a safe way to reduce the contention on dq_lock during update of
> on-disk structure when lot of processes bang single dquot. I have
> experimental patch but it didn't bring any benefit in my testing - I'll
> rebase it on top of other patches I have send it to you for some testing.
>
>                                                                 Honza
>
>> On Thu, Aug 3, 2017 at 10:36 PM, Jan Kara <jack@suse.cz> wrote:
>> > Hello!
>> >
>> > On Thu 03-08-17 19:31:04, Wang Shilong wrote:
>> >> We DDN is investigating the same issue!
>> >>
>> >> Some comments comes:
>> >>
>> >> On Thu, Aug 3, 2017 at 1:52 AM, Andrew Perepechko <anserper@yandex.ru> wrote:
>> >> >> On Tue 01-08-17 15:02:42, Jan Kara wrote:
>> >> >> > Hi Andrew,
>> >> >> >
>> >> >> I've been experimenting with this today but this idea didn't bring any
>> >> >> benefit in my testing. Was your setup with multiple users or a single user?
>> >> >> Could you give some testing to my patches to see whether they bring some
>> >> >> benefit to you?
>> >> >>
>> >> >>                                                               Honza
>> >> >
>> >> > Hi Jan!
>> >> >
>> >> > My setup was with a single user. Unfortunately, it may take some time before
>> >> > I can try a patched kernel other than RHEL6 or RHEL7 with the same test,
>> >> > we have a lot of dependencies on these kernels.
>> >> >
>> >> > The actual test we ran was mdtest.
>> >> >
>> >> > By the way, we had 15+% performance improvement in creates from the
>> >> > change that was discussed earlier in this thread:
>> >> >
>> >> >            EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
>> >> > +              if (test_bit(DQ_MOD_B, &dquot->dq_flags))
>> >> > +                       return 0;
>> >>
>> >> I don't think this is right, as far as i understand, journal quota need go
>> >> together with quota space change update inside same transaction, this will
>> >> break consistency if power off or RO happen.
>> >>
>> >> Here is some ideas that i have thought:
>> >>
>> >> 1) switch dqio_mutex to a read/write lock, especially, i think most of
>> >> time journal quota updates is in-place update, that means we don't need
>> >> change quota tree in memory, firstly try read lock, retry with write lock if
>> >> there is real tree change.
>> >>
>> >> 2)another is similar idea of Andrew's walkaround, but we need make correct
>> >> fix, maintain dirty list for per transaction, and gurantee quota updates are
>> >> flushed when commit transaction, this might be complex, i am not very
>> >> familiar with JBD2 codes.
>> >>
>> >> It will be really nice if we could fix this regression, as we see 20% performace
>> >> regression.
>> >
>> > So I have couple of patches:
>> >
>> > 1) I convert dqio_mutex do rw semaphore and use it in exclusive mode only
>> > when quota tree is going to change. We also use dq_lock to serialize writes
>> > of dquot - you cannot have two writes happening in parallel as that could
>> > result in stale data being on disk. This patch brings benefit when there
>> > are multiple users - now they don't contend on common lock. It shows
>> > advantage in my testing so I plan to merge these patches. When the
>> > contention is on a structure for single user this change however doesn't
>> > bring much (the performance change is in statistical noise in my testing).
>> >
>> > 2) I have patches to remove some contention on dq_list_lock by not using
>> > dirty list for tracking dquots in ext4 (and thus avoid dq_list_lock
>> > completely in quota modification path). This does not bring measurable
>> > benefit in my testing even on ramdisk but lockstat data for dq_list_lock
>> > looks much better after this - it seems lock contention just shifted to
>> > dq_data_lock - I'll try to address that as well and see whether I'll be
>> > able to measure some advantage.
>> >
>> > 3) I have patches to convert dquot dirty bit to sequence counter so that
>> > in commit_dqblk() we can check whether dquot state we wanted to write is
>> > already on disk. Note that this is different from Andrew's approach in that
>> > we do wait for dquot to be actually written before returning. We just don't
>> > repeat the write unnecessarily. However this didn't bring any measurable
>> > benefit in my testing so unless I'll be able to confirm it benefits some
>> > workloads I won't merge this change.
>> >
>> > If you can experiment with your workloads, I can send you patches. I'd be
>> > keen on having some performance data from real setups...
>> >
>> >                                                                 Honza
>> >
>> >>
>> >> Thanks,
>> >> Shilong
>> >>
>> >> >                dquot_mark_dquot_dirty(dquot);
>> >> >                return ext4_write_dquot(dquot);
>> >> >
>> >> > The idea was that if we know that some thread is somewhere between
>> >> > mark_dirty and clear_dirty, then we can avoid blocking on dqio_mutex,
>> >> > since that thread will update the ondisk dquot for us.
>> >> >
>> > --
>> > Jan Kara <jack@suse.com>
>> > SUSE Labs, CR
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

[-- Attachment #2: quota-scaling-results.txt --]
[-- Type: text/plain, Size: 1771 bytes --]

4.13.0-rc3 without any patches
        no Quota            -O quota         -O quota,project
    creation   unlink    creation   unlink    creation  unlink
0   93,068  296,028      86,860  285,131      85,199    189,653     ops/per second
1   79,501  280,921      91,079  277,349     186,279    170,982
2   79,932  299,750      90,246  274,457     133,922    191,677
3   80,146  297,525      86,416  272,160     192,354    198,869

Jan Kara branch (quota_scaling)
        no Quota            -O quota         -O quota,project
  creation   unlink      creation   unlink    creation  unlink
0   73,057  311,217      74,898  286,120      81,217    288,138
1   78,872  312,471      76,470  277,033      77,014    288,057
2   79,170  291,440      76,174  283,525      73,686    283,526
3   79,941  309,168      78,493  277,331      78,751    281,377

4.13.0-rc3 with v5 patch https://patchwork.ozlabs.org/patch/799014/					
        no Quota            -O quota         -O quota,project
  creation   unlink     creation   unlink    creation   unlink
0  100,319  322,746     87,480   302,579      84,569    218,969
1  728,424  299,808     312,766  293,471     219,198    199,389
2  729,410  300,930     315,590  289,664     218,283    197,871
3  727,555  298,797     316,837  289,108     213,095    213,458

Jan Kara branch (quota_scaling) with v5 patch https://patchwork.ozlabs.org/patch/799014/
        no Quota            -O quota         -O quota,project
  creation   unlink    creation   unlink    creation     unlink
0  100,312  324,871      87,076  267,303      86,258    288,137
1  707,524  298,892     361,963  252,493     421,919    282,492
2  707,792  298,162     363,450  264,923     397,723    283,675
3  707,420  302,552     354,013  266,638     421,537    281,763

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: quota: dqio_mutex design
  2017-08-14  3:53                       ` Wang Shilong
@ 2017-08-14  8:22                         ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2017-08-14  8:22 UTC (permalink / raw)
  To: Wang Shilong
  Cc: Wang Shilong, Jan Kara, Andrew Perepechko, Shuichi Ihara, Li Xi,
	Ext4 Developers List, linux-fsdevel

Hello,

On Mon 14-08-17 11:53:37, Wang Shilong wrote:
> Txt format attched.
> 
> BTW, Jan, it will be cool if you could point which patch help a lot for
> our test case, since there are a lot of patches there, we want to port
> some of patches to RHEL7.

Thanks for the test results! They are really interesting. Do you have any
explanation for why, without any patches, the '-O quota,project' runs for 'File
Creation' are faster than the runs without quota, or than any other runs in the
test?

WRT which patches helped, I don't have a good subset for you. In my testing
each patch helped a bit. I expect that in your setup the conversion of
dqio_mutex to an rwsem (dqio_sem), and the subsequent use of dq_lock, might
not have that big an impact. So you might try backporting the patches from
"quota: Fix possible corruption of dqi_flags" onward.

								Honza

> On Mon, Aug 14, 2017 at 11:24 AM, Wang Shilong <wshilong@ddn.com> wrote:
> > Hello Jan,
> >
> >    We have tested your patches, in generally, it helped in our case. Noticed,
> > our test case is only one user with many process create/remove file.
> >
> >
> >         4.13.0-rc3 without any patches
> >         no Quota                -O quota'               -O quota, project'
> >         File Creation   File Unlink     File Creation   File Unlink     File Creation   File Unlink
> > 0       93,068           296,028                    86,860      285,131                    85,199               189,653
> > 1       79,501           280,921                    91,079      277,349                   186,279       170,982
> > 2       79,932           299,750                    90,246      274,457                    133,922      191,677
> > 3       80,146           297,525                    86,416      272,160                    192,354      198,869
> >
> >         4.13.0-rc3/w Jan Kara patch
> >         no Quota                -O quota'               -O quota, project'
> >         File Creation   File Unlink     File Creation   File Unlink     File Creation   File Unlink
> > 0       73,057          311,217                  74,898                   286,120        81,217                 288,138  ops/per second
> > 1       78,872             312,471               76,470                   277,033        77,014                 288,057
> > 2       79,170             291,440               76,174              283,525     73,686            283,526
> > 3       79,941             309,168            78,493              277,331          78,751            281,377
> >
> >         4.13.0-rc3/with https://patchwork.ozlabs.org/patch/799014/
> >         no Quota                -O quota'               -O quota, project'
> >         File Creation   File Unlink     File Creation   File Unlink     File Creation   File Unlink
> > 0       100,319                 322,746                  87,480                  302,579                 84,569                 218,969
> > 1       728,424                 299,808            312,766           293,471           219,198          199,389
> > 2       729,410           300,930            315,590           289,664           218,283          197,871
> > 3       727,555           298,797                316,837                 289,108           213,095          213,458
> >
> >         4.13.0-rc3/w https://patchwork.ozlabs.org/patch/799014/ + Jan Kara patch
> >         no Quota                -O quota'               -O quota, project'
> >         File Creation   File Unlink     File Creation   File Unlink     File Creation   File Unlink
> > 0       100,312                  324,871                 87,076                  267,303                  86,258                288,137
> > 1       707,524                  298,892           361,963           252,493            421,919         282,492
> > 2       707,792            298,162           363,450           264,923            397,723       283,675
> > 3       707,420            302,552               354,013                 266,638                  421,537       281,763
> >
> >
> > In conclusion, your patches helped a lot for our testing, noticed, please ignored test0 running
> > for creation, the first time testing will loaded inode cache in memory, we used test1-3 to compare.
> >
> > With extra patch applied, your patches improved File creation(quota+project) 2X, File unlink
> > 1.5X.
> >
> > Thanks,
> > Shilong
> >
> > ________________________________________
> > From: Jan Kara [jack@suse.cz]
> > Sent: Wednesday, August 09, 2017 0:06
> > To: Wang Shilong
> > Cc: Jan Kara; Andrew Perepechko; Shuichi Ihara; Wang Shilong; Li Xi; Ext4 Developers List; linux-fsdevel@vger.kernel.org
> > Subject: Re: quota: dqio_mutex design
> >
> > Hi,
> >
> > On Thu 03-08-17 22:39:51, Wang Shilong wrote:
> >> Please send me patches, we could test and response you!
> >
> > So I finally have something which isn't obviously wrong (it survives basic
> > testing and gives me improvements for some workloads). I have pushed out
> > the patches to:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git quota_scaling
> >
> > I'd be happy if you can share your results with my patches. I have not yet
> > figured out a safe way to reduce the contention on dq_lock during update of
> > on-disk structure when lot of processes bang single dquot. I have
> > experimental patch but it didn't bring any benefit in my testing - I'll
> > rebase it on top of other patches I have send it to you for some testing.
> >
> >                                                                 Honza
> >
> >> On Thu, Aug 3, 2017 at 10:36 PM, Jan Kara <jack@suse.cz> wrote:
> >> > Hello!
> >> >
> >> > On Thu 03-08-17 19:31:04, Wang Shilong wrote:
> >> >> We DDN is investigating the same issue!
> >> >>
> >> >> Some comments comes:
> >> >>
> >> >> On Thu, Aug 3, 2017 at 1:52 AM, Andrew Perepechko <anserper@yandex.ru> wrote:
> >> >> >> On Tue 01-08-17 15:02:42, Jan Kara wrote:
> >> >> >> > Hi Andrew,
> >> >> >> >
> >> >> >> I've been experimenting with this today but this idea didn't bring any
> >> >> >> benefit in my testing. Was your setup with multiple users or a single user?
> >> >> >> Could you give some testing to my patches to see whether they bring some
> >> >> >> benefit to you?
> >> >> >>
> >> >> >>                                                               Honza
> >> >> >
> >> >> > Hi Jan!
> >> >> >
> >> >> > My setup was with a single user. Unfortunately, it may take some time before
> >> >> > I can try a patched kernel other than RHEL6 or RHEL7 with the same test,
> >> >> > we have a lot of dependencies on these kernels.
> >> >> >
> >> >> > The actual test we ran was mdtest.
> >> >> >
> >> >> > By the way, we had 15+% performance improvement in creates from the
> >> >> > change that was discussed earlier in this thread:
> >> >> >
> >> >> >            EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
> >> >> > +              if (test_bit(DQ_MOD_B, &dquot->dq_flags))
> >> >> > +                       return 0;
> >> >>
> >> >> I don't think this is right, as far as i understand, journal quota need go
> >> >> together with quota space change update inside same transaction, this will
> >> >> break consistency if power off or RO happen.
> >> >>
> >> >> Here is some ideas that i have thought:
> >> >>
> >> >> 1) switch dqio_mutex to a read/write lock, especially, i think most of
> >> >> time journal quota updates is in-place update, that means we don't need
> >> >> change quota tree in memory, firstly try read lock, retry with write lock if
> >> >> there is real tree change.
> >> >>
> >> >> 2)another is similar idea of Andrew's walkaround, but we need make correct
> >> >> fix, maintain dirty list for per transaction, and gurantee quota updates are
> >> >> flushed when commit transaction, this might be complex, i am not very
> >> >> familiar with JBD2 codes.
> >> >>
> >> >> It will be really nice if we could fix this regression, as we see 20% performace
> >> >> regression.
> >> >
> >> > So I have couple of patches:
> >> >
> >> > 1) I convert dqio_mutex do rw semaphore and use it in exclusive mode only
> >> > when quota tree is going to change. We also use dq_lock to serialize writes
> >> > of dquot - you cannot have two writes happening in parallel as that could
> >> > result in stale data being on disk. This patch brings benefit when there
> >> > are multiple users - now they don't contend on common lock. It shows
> >> > advantage in my testing so I plan to merge these patches. When the
> >> > contention is on a structure for single user this change however doesn't
> >> > bring much (the performance change is in statistical noise in my testing).
> >> >
> >> > 2) I have patches to remove some contention on dq_list_lock by not using
> >> > dirty list for tracking dquots in ext4 (and thus avoid dq_list_lock
> >> > completely in quota modification path). This does not bring measurable
> >> > benefit in my testing even on ramdisk but lockstat data for dq_list_lock
> >> > looks much better after this - it seems lock contention just shifted to
> >> > dq_data_lock - I'll try to address that as well and see whether I'll be
> >> > able to measure some advantage.
> >> >
> >> > 3) I have patches to convert dquot dirty bit to sequence counter so that
> >> > in commit_dqblk() we can check whether dquot state we wanted to write is
> >> > already on disk. Note that this is different from Andrew's approach in that
> >> > we do wait for dquot to be actually written before returning. We just don't
> >> > repeat the write unnecessarily. However this didn't bring any measurable
> >> > benefit in my testing so unless I'll be able to confirm it benefits some
> >> > workloads I won't merge this change.
> >> >
> >> > If you can experiment with your workloads, I can send you patches. I'd be
> >> > keen on having some performance data from real setups...
> >> >
> >> >                                                                 Honza
> >> >
> >> >>
> >> >> Thanks,
> >> >> Shilong
> >> >>
> >> >> >                dquot_mark_dquot_dirty(dquot);
> >> >> >                return ext4_write_dquot(dquot);
> >> >> >
> >> >> > The idea was that if we know that some thread is somewhere between
> >> >> > mark_dirty and clear_dirty, then we can avoid blocking on dqio_mutex,
> >> >> > since that thread will update the ondisk dquot for us.
> >> >> >
> >> > --
> >> > Jan Kara <jack@suse.com>
> >> > SUSE Labs, CR
> > --
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR

> 4.13.0-rc3 without any patches
>         no Quota            -O quota         -O quota,project
>     creation   unlink    creation   unlink    creation  unlink
> 0   93,068  296,028      86,860  285,131      85,199    189,653     ops/per second
> 1   79,501  280,921      91,079  277,349     186,279    170,982
> 2   79,932  299,750      90,246  274,457     133,922    191,677
> 3   80,146  297,525      86,416  272,160     192,354    198,869
> 
> Jan Kara branch (quota_scaling)
>         no Quota            -O quota         -O quota,project
>   creation   unlink      creation   unlink    creation  unlink
> 0   73,057  311,217      74,898  286,120      81,217    288,138
> 1   78,872  312,471      76,470  277,033      77,014    288,057
> 2   79,170  291,440      76,174  283,525      73,686    283,526
> 3   79,941  309,168      78,493  277,331      78,751    281,377
> 
> 4.13.0-rc3 with v5 patch https://patchwork.ozlabs.org/patch/799014/					
>         no Quota            -O quota         -O quota,project
>   creation   unlink     creation   unlink    creation   unlink
> 0  100,319  322,746     87,480   302,579      84,569    218,969
> 1  728,424  299,808     312,766  293,471     219,198    199,389
> 2  729,410  300,930     315,590  289,664     218,283    197,871
> 3  727,555  298,797     316,837  289,108     213,095    213,458
> 
> Jan Kara branch (quota_scaling) with v5 patch https://patchwork.ozlabs.org/patch/799014/
>         no Quota            -O quota         -O quota,project
>   creation   unlink    creation   unlink    creation     unlink
> 0  100,312  324,871      87,076  267,303      86,258    288,137
> 1  707,524  298,892     361,963  252,493     421,919    282,492
> 2  707,792  298,162     363,450  264,923     397,723    283,675
> 3  707,420  302,552     354,013  266,638     421,537    281,763

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-08-14  8:22 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-02 12:23 quota: dqio_mutex design Andrew Perepechko
2017-03-03 10:08 ` Jan Kara
2017-03-09 22:29   ` Andrew Perepechko
2017-03-13  8:44     ` Jan Kara
2017-06-21 10:52   ` Jan Kara
     [not found]     ` <4181747.CBilgxvOab@panda>
2017-08-01 13:02       ` Jan Kara
2017-08-02 16:25         ` Jan Kara
2017-08-02 17:52           ` Andrew Perepechko
2017-08-03 11:09             ` Jan Kara
2017-08-03 11:31             ` Wang Shilong
2017-08-03 12:24               ` Andrew Perepechko
2017-08-03 13:19                 ` Wang Shilong
2017-08-03 13:41                   ` Andrew Perepechko
2017-08-03 13:55                     ` Andrew Perepechko
2017-08-03 14:23                       ` Jan Kara
2017-08-03 14:36               ` Jan Kara
2017-08-03 14:39                 ` Wang Shilong
2017-08-08 16:06                   ` Jan Kara
2017-08-14  3:24                     ` Wang Shilong
2017-08-14  3:28                       ` Wang Shilong
2017-08-14  3:53                       ` Wang Shilong
2017-08-14  8:22                         ` Jan Kara
