* Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
@ 2015-11-01 20:49 Stefan Priebe
  2015-11-01 22:57 ` Duncan
  2015-11-02  1:34 ` Qu Wenruo
  0 siblings, 2 replies; 14+ messages in thread
From: Stefan Priebe @ 2015-11-01 20:49 UTC (permalink / raw)
  To: linux-btrfs, mfasheh; +Cc: jbacik, Chris Mason

Hi,

this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html

adds a regression on my test systems with very large disks (30 TB and 50 TB).

btrfs balance is extremely slow afterwards on systems that make heavy use 
of cp --reflink=always on big files (200 GB - 500 GB).
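
For illustration, the workload looks roughly like this (a sketch only -- 
the device, paths and sizes here are made up, not my exact setup):

    # illustrative repro sketch -- adjust device, mount point and sizes
    mkfs.btrfs -f /dev/sdX                # ~30 TB device in the real setup
    mount /dev/sdX /mnt/big
    # one big file (~200 GB) plus many reflinked copies of it
    dd if=/dev/zero of=/mnt/big/image.raw bs=1M count=204800
    for i in $(seq 1 20); do
        cp --reflink=always /mnt/big/image.raw /mnt/big/image.raw.$i
    done
    # this step became much slower with the patch applied
    time btrfs balance start /mnt/big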

Sorry, I didn't know how to reply correctly to that "old" message.

Greets,
Stefan


* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
  2015-11-01 20:49 Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete Stefan Priebe
@ 2015-11-01 22:57 ` Duncan
  2015-11-02  1:34 ` Qu Wenruo
  1 sibling, 0 replies; 14+ messages in thread
From: Duncan @ 2015-11-01 22:57 UTC (permalink / raw)
  To: linux-btrfs

Stefan Priebe posted on Sun, 01 Nov 2015 21:49:44 +0100 as excerpted:

> this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html
> 
> adds a regression on my test systems with very large disks (30 TB and
> 50 TB).
> 
> btrfs balance is extremely slow afterwards on systems that make heavy
> use of cp --reflink=always on big files (200 GB - 500 GB).
> 
> Sorry, I didn't know how to reply correctly to that "old" message.

Just on the message-reply bit...

Gmane.org carries this list (among many), archiving the posts with both 
nntp/news and http/web interfaces.  Both interfaces normally allow replies 
to old as well as current messages via the gmane gateway, which forwards 
them to the list.  The first time you reply to a list via gmane, though, 
it responds with a confirmation to the email address you used, and you 
must reply to that before your mail is forwarded on to the list.  If you 
don't reply within a week, the message is dropped.  However, at least for 
the news interface (I'm not sure about the web interface), you only have 
to confirm once per list/newsgroup; after that, it forwards to the list 
without further confirmations.

That's how I follow all my lists, reading and replying to them as 
newsgroups via the gmane list2news interface.

See http://gmane.org for more info.

The one caveat: while on a lot of lists replying only to the list is the 
norm, on the Linux kernel and vger.kernel.org hosted lists (including this 
one) the norm is to reply to all, both the list and previous posters, and 
I'm not sure whether the web interface allows that.  On the news interface 
it of course depends on your news client -- mine is better adapted to news 
than mail, and while it allows handing off to your normal mail client for 
the mail side, normal followups are news-only and replying to all isn't 
easy, so I generally reply to the list (as newsgroup) only, unless a 
poster specifically requests to be CCed on replies.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
  2015-11-01 20:49 Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete Stefan Priebe
  2015-11-01 22:57 ` Duncan
@ 2015-11-02  1:34 ` Qu Wenruo
  2015-11-02  5:46   ` Stefan Priebe
  2015-11-03 19:26   ` Mark Fasheh
  1 sibling, 2 replies; 14+ messages in thread
From: Qu Wenruo @ 2015-11-02  1:34 UTC (permalink / raw)
  To: Stefan Priebe, linux-btrfs, mfasheh; +Cc: jbacik, Chris Mason



Stefan Priebe wrote on 2015/11/01 21:49 +0100:
> Hi,
>
> this one: http://www.spinics.net/lists/linux-btrfs/msg47377.html
>
> adds a regression on my test systems with very large disks (30 TB and
> 50 TB).
>
> btrfs balance is extremely slow afterwards on systems that make heavy
> use of cp --reflink=always on big files (200 GB - 500 GB).
>
> Sorry, I didn't know how to reply correctly to that "old" message.
>
> Greets,
> Stefan

Thanks for the testing.

Are you using qgroup or just doing normal balance with qgroup disabled?

For the latter case, that should be optimized to skip the dirty extent 
insertion when qgroups are disabled.

For the qgroup enabled case, I'm afraid that's by design.
Relocation drops a subtree in order to relocate it, and to keep qgroup 
numbers consistent we must walk down all the tree blocks and mark them 
dirty for later qgroup accounting.

But there is still some room for optimization.
For example, if all the subtree blocks have already been relocated, we 
can skip the walk-down routine entirely.
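
For the qgroup-disabled skip mentioned above, roughly what I have in mind 
(an illustrative sketch with made-up names, not the actual btrfs code):

    /*
     * Sketch only: bail out before the subtree walk (and its allocations)
     * when qgroups are disabled.  walk_subtree_and_mark_dirty() is a
     * made-up placeholder for the real walk.
     */
    static int account_subtree_for_qgroup(struct btrfs_trans_handle *trans,
                                          struct btrfs_fs_info *fs_info,
                                          struct extent_buffer *subtree_root)
    {
            if (!fs_info->quota_enabled)
                    return 0;       /* qgroups off: nothing to account */

            /*
             * Otherwise walk every block below subtree_root and queue a
             * dirty extent record, so qgroup numbers stay consistent
             * after the subtree is dropped.
             */
            return walk_subtree_and_mark_dirty(trans, fs_info, subtree_root);
    }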

Anyway, in your case of huge files the tree level grows rapidly, so any 
workload involving tree iteration, like snapshot deletion or relocation, 
will be very time consuming.

BTW, thanks for your regression report; I also found another problem with 
the patch.
I'll reply to the author to improve the patchset.

Thanks,
Qu


* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
  2015-11-02  1:34 ` Qu Wenruo
@ 2015-11-02  5:46   ` Stefan Priebe
  2015-11-03 19:15     ` Mark Fasheh
  2015-11-03 19:26   ` Mark Fasheh
  1 sibling, 1 reply; 14+ messages in thread
From: Stefan Priebe @ 2015-11-02  5:46 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs, mfasheh; +Cc: jbacik, Chris Mason

Am 02.11.2015 um 02:34 schrieb Qu Wenruo:
> [...]
>
> Thanks for the testing.
>
> Are you using qgroup or just doing normal balance with qgroup disabled?

just doing normal balance with qgroup disabled.

> [...]

Thanks,
Stefan




* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
  2015-11-02  5:46   ` Stefan Priebe
@ 2015-11-03 19:15     ` Mark Fasheh
  0 siblings, 0 replies; 14+ messages in thread
From: Mark Fasheh @ 2015-11-03 19:15 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: Qu Wenruo, linux-btrfs, jbacik, Chris Mason

On Mon, Nov 02, 2015 at 06:46:06AM +0100, Stefan Priebe wrote:
> Am 02.11.2015 um 02:34 schrieb Qu Wenruo:
> >[...]
> >
> >Thanks for the testing.
> >
> >Are you using qgroup or just doing normal balance with qgroup disabled?
> 
> just doing normal balance with qgroup disabled.

Then that patch is very unlikely to be your actual problem, as it won't be
doing anything (ok some kmalloc/free of a very tiny object) since qgroups
are disabled.

Also, btrfs had working subtree accounting in that code for the last N
releases (doing the exact same thing), and it only changed for the one
release that Qu's rework went into (which lazily tore it out).
	--Mark

--
Mark Fasheh


* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
  2015-11-02  1:34 ` Qu Wenruo
  2015-11-02  5:46   ` Stefan Priebe
@ 2015-11-03 19:26   ` Mark Fasheh
  2015-11-03 19:42     ` Stefan Priebe
  2015-11-04  1:01     ` Qu Wenruo
  1 sibling, 2 replies; 14+ messages in thread
From: Mark Fasheh @ 2015-11-03 19:26 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Stefan Priebe, linux-btrfs, jbacik, Chris Mason

On Mon, Nov 02, 2015 at 09:34:24AM +0800, Qu Wenruo wrote:
> [...]
> 
> Thanks for the testing.
> 
> Are you using qgroup or just doing normal balance with qgroup disabled?
> 
> For the latter case, that should be optimized to skip the dirty
> extent insertion when qgroups are disabled.
> 
> For the qgroup enabled case, I'm afraid that's by design.
> Relocation drops a subtree in order to relocate it, and to keep
> qgroup numbers consistent we must walk down all the tree blocks and
> mark them dirty for later qgroup accounting.

Qu, we're always going to have to walk the tree when deleting it; that is
part of removing a subvolume. We had walked shared subtrees in this code
for numerous kernel releases without incident before the walk was removed
in 4.2.

Do you have any actual evidence that this is a major performance regression?
From our previous conversations you seemed convinced of this, without even
having a working subtree walk to test. I remember the hand-wringing about
an individual commit being too heavy with the qgroup code (even though I
pointed out that the tree walk is a restartable transaction).

It seems that you are still confused about how we handle removing a volume
wrt qgroups.

If you have questions or concerns I would be happy to explain them, but
IMHO your statements there are opinion and not based in fact.

Yes, btw, we might have to do more work for the uncommon case of a qgroup
being referenced by higher-level groups, but that is clearly not happening
here (and honestly it's not a common case at all).
	--Mark


--
Mark Fasheh


* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
  2015-11-03 19:26   ` Mark Fasheh
@ 2015-11-03 19:42     ` Stefan Priebe
  2015-11-03 23:31       ` Mark Fasheh
  2015-11-04  1:01     ` Qu Wenruo
  1 sibling, 1 reply; 14+ messages in thread
From: Stefan Priebe @ 2015-11-03 19:42 UTC (permalink / raw)
  To: Mark Fasheh, Qu Wenruo; +Cc: linux-btrfs, jbacik, Chris Mason

Am 03.11.2015 um 20:26 schrieb Mark Fasheh:
> [...]
>
> Do you have any actual evidence that this is a major performance regression?
> [...]

Sorry, I don't know much about the btrfs internals.

I can just reproduce this by switching between a kernel with this patch 
and one without. With it, balance takes ages; without it, it's super 
fast. I proved this several times by simply rebooting into the other 
kernel.

Stefan


* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
  2015-11-03 19:42     ` Stefan Priebe
@ 2015-11-03 23:31       ` Mark Fasheh
  2015-11-04  2:22         ` Chris Mason
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Fasheh @ 2015-11-03 23:31 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: Qu Wenruo, linux-btrfs, jbacik, Chris Mason

On Tue, Nov 03, 2015 at 08:42:33PM +0100, Stefan Priebe wrote:
> Sorry, I don't know much about the btrfs internals.
> 
> I can just reproduce this by switching between a kernel with this patch
> and one without. With it, balance takes ages; without it, it's super
> fast. I proved this several times by simply rebooting into the other
> kernel.

That's fine, disregard my previous e-mail - I just saw the mail Qu sent me.
There's a problem in the code that the patch calls which is causing your
performance issues. I'll CC you when I put out a fix.

Thanks,
	--Mark


--
Mark Fasheh


* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
  2015-11-03 19:26   ` Mark Fasheh
  2015-11-03 19:42     ` Stefan Priebe
@ 2015-11-04  1:01     ` Qu Wenruo
  2015-11-05 19:23       ` Mark Fasheh
  1 sibling, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2015-11-04  1:01 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: Stefan Priebe, linux-btrfs, jbacik, Chris Mason



Mark Fasheh wrote on 2015/11/03 11:26 -0800:
> [...]
>
> Qu, we're always going to have to walk the tree when deleting it; that is
> part of removing a subvolume. We had walked shared subtrees in this code
> for numerous kernel releases without incident before the walk was removed
> in 4.2.
> 
> Do you have any actual evidence that this is a major performance regression?
> From our previous conversations you seemed convinced of this, without even
> having a working subtree walk to test. I remember the hand-wringing about
> an individual commit being too heavy with the qgroup code (even though I
> pointed out that the tree walk is a restartable transaction).
> 
> It seems that you are still confused about how we handle removing a volume
> wrt qgroups.
> 
> If you have questions or concerns I would be happy to explain them, but
> IMHO your statements there are opinion and not based in fact.

Yes, I don't deny it.
But it's quite hard to prove, as we would need huge storage like Stefan's.
What I have is only several hundred GB of test storage.
Even counting my whole home NAS, I only have 2 TB, far from the storage 
Stefan has.

And what Stefan reports should already give some hints about the 
performance issue.

In your words, "it won't be doing anything (ok some kmalloc/free of a very 
tiny object)", but it's already slowing down balance, since balance also 
uses btrfs_drop_subtree().

You're right that the tree walk can happen across several transactions, 
and normally the user won't notice anything, as subvolume deletion runs 
in the background.

But in the relocation case, it makes relocation slower than it was, due to 
that "nothing" (kmalloc/free of tiny objects). Yes, you can fix it by 
avoiding the memory allocation in the qgroup disabled case, but what will 
happen if the user enables qgroups?

I'm not saying there is anything wrong with your patch; in fact I'm quite 
happy you solved such a problem with such small changes.

But we can't just ignore such a "possible" performance issue just because 
the old code did the same thing. (Although it's not quite the same now; 
we're marking all subtree blocks dirty, not only shared ones.)
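
To make the allocation concern concrete, the hot path with the patch 
looks roughly like this (an illustrative sketch; the record struct and 
helper names are made up, not the actual btrfs code):

    /*
     * Sketch only: every tree block visited while dropping a subtree
     * during balance allocates and queues a small record, whether or
     * not qgroups are enabled.  next_subtree_block() and
     * dirty_extent_record are made-up names.
     */
    while ((eb = next_subtree_block(&walk)) != NULL) {
            struct dirty_extent_record *rec;

            rec = kmalloc(sizeof(*rec), GFP_NOFS);  /* per-block alloc */
            if (!rec)
                    return -ENOMEM;
            rec->bytenr = eb->start;
            rec->num_bytes = eb->len;
            list_add_tail(&rec->list, &trans->dirty_extents);
    }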

Thanks,
Qu



* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
  2015-11-03 23:31       ` Mark Fasheh
@ 2015-11-04  2:22         ` Chris Mason
  0 siblings, 0 replies; 14+ messages in thread
From: Chris Mason @ 2015-11-04  2:22 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: Stefan Priebe, Qu Wenruo, linux-btrfs, jbacik

On Tue, Nov 03, 2015 at 03:31:15PM -0800, Mark Fasheh wrote:
> On Tue, Nov 03, 2015 at 08:42:33PM +0100, Stefan Priebe wrote:
> > Sorry, I don't know much about the btrfs internals.
> > 
> > I can just reproduce this by switching between a kernel with this
> > patch and one without. With it, balance takes ages; without it, it's
> > super fast. I proved this several times by simply rebooting into the
> > other kernel.
> 
> That's fine, disregard my previous e-mail - I just saw the mail Qu sent me.
> There's a problem in the code that the patch calls which is causing your
> performance issues. I'll CC you when I put out a fix.

Thanks Mark (and Qu), I'll get the fixed version into integration once
it is out.

-chris



* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
  2015-11-04  1:01     ` Qu Wenruo
@ 2015-11-05 19:23       ` Mark Fasheh
  2015-11-06  1:02         ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Fasheh @ 2015-11-05 19:23 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Stefan Priebe, linux-btrfs, jbacik, Chris Mason

On Wed, Nov 04, 2015 at 09:01:36AM +0800, Qu Wenruo wrote:
> >[...]
> >
> >It seems that you are still confused about how we handle removing a volume
> >wrt qgroups.
> >
> >If you have questions or concerns I would be happy to explain them, but
> >IMHO your statements there are opinion and not based in fact.
> 
> Yes, I don't deny it.
> But it's quite hard to prove, as we would need huge storage like
> Stefan's. What I have is only several hundred GB of test storage.
> Even counting my whole home NAS, I only have 2 TB, far from the
> storage Stefan has.
> 
> And what Stefan reports should already give some hints about the
> performance issue.
> 
> In your words, "it won't be doing anything (ok some kmalloc/free of a
> very tiny object)", but it's already slowing down balance, since
> balance also uses btrfs_drop_subtree().

When I wrote that I was under the impression that the qgroup code was doing
its own sanity checking (it used to), and since Stefan had qgroups disabled
they couldn't be causing the problem. I read your e-mail explaining that
the qgroup API is now intertwined with delayed ref locking only after
sending that one.

The exact same code ran in either case before and after your patches, so my
guess is that the issue is actually inside the qgroup code that shouldn't
have been run. I wonder if we even just filled up his memory with objects
that were never freed. The only other thing I can think of is
account_leaf_items() getting run in a really tight loop for some reason.

Kmalloc, the way we are using it, is not usually a performance issue,
especially if we've been reading off disk in the same process. Ask yourself
this - your own patch series does the same kmalloc for every qgroup
operation. Did you notice a complete and massive performance slowdown like
the one Stefan reported?

I will say that we never had this problem reported before, and
account_leaf_items() is always run in all kernels, even without qgroups
enabled. That will change with my new patch though.

What we can say for sure is that drop_snapshot in the qgroup case will read
more from disk, and obviously that will have a negative impact depending on
what the tree looks like. So IMHO we ought to focus on reducing the amount
of I/O involved.


> But we can't just ignore such a "possible" performance issue just
> because the old code did the same thing. (Although it's not quite the
> same now; we're marking all subtree blocks dirty, not only shared ones.)

Well, I can't disagree with that - the only reason we are talking right now
is because you intentionally ignored the qgroup code in drop_snapshot(). So
let's start with this - no more 'fixing' code by tearing it out and replacing
it with /* TODO: somebody else re-implement this */   ;)
	--Mark

--
Mark Fasheh


* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
  2015-11-05 19:23       ` Mark Fasheh
@ 2015-11-06  1:02         ` Qu Wenruo
  2015-11-06  3:15           ` Mark Fasheh
  0 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2015-11-06  1:02 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: Stefan Priebe, linux-btrfs, jbacik, Chris Mason



Mark Fasheh wrote on 2015/11/05 11:23 -0800:
> [...]
>
> When I wrote that I was under the impression that the qgroup code was
> doing its own sanity checking (it used to), and since Stefan had qgroups
> disabled they couldn't be causing the problem. I read your e-mail
> explaining that the qgroup API is now intertwined with delayed ref
> locking only after sending that one.

My fault; btrfs_qgroup_mark_extent_dirty() is an exception which doesn't 
have the qgroup status check and depends on existing locks.

>
> The exact same code ran in either case before and after your patches, so my
> guess is that the issue is actually inside the qgroup code that shouldn't
> have been run. I wonder if we even just filled up his memory with objects
> that were never freed. The only other thing I can think of is
> account_leaf_items() getting run in a really tight loop for some reason.
>
> Kmalloc, the way we are using it, is not usually a performance issue,
> especially if we've been reading off disk in the same process. Ask yourself
> this - your own patch series does the same kmalloc for every qgroup
> operation. Did you notice a complete and massive performance slowdown like
> the one Stefan reported?

You're right, such memory allocation may impact performance, but not so 
noticeably compared to other operations which may kick off disk IO, like 
btrfs_find_all_roots().

But at the least, enabling qgroups will impact performance.

Yeah, this time I have test data.
In an environment with 100 different snapshots, sysbench shows an overall 
performance drop of about 5%, and in some cases up to 7%, with qgroups 
enabled.

Not sure about the kmalloc impact, maybe less than 1% or maybe 2~3%, but 
it's at least worth trying to use a kmem cache.
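
The kind of change I mean, roughly (an illustrative sketch; the struct 
and names are made up, not the actual btrfs code):

    #include <linux/slab.h>

    /* the same made-up record as in the sketch earlier in this thread */
    struct dirty_extent_record {
            u64 bytenr;
            u64 num_bytes;
            struct list_head list;
    };

    static struct kmem_cache *dirty_extent_cache;

    /* create the slab cache once, e.g. at module init */
    static int __init dirty_extent_cache_init(void)
    {
            dirty_extent_cache = kmem_cache_create("dirty_extent_record",
                            sizeof(struct dirty_extent_record), 0, 0, NULL);
            return dirty_extent_cache ? 0 : -ENOMEM;
    }

    /* hot path: slab alloc/free instead of kmalloc()/kfree() */
    static struct dirty_extent_record *record_alloc(void)
    {
            return kmem_cache_alloc(dirty_extent_cache, GFP_NOFS);
    }

    static void record_free(struct dirty_extent_record *rec)
    {
            kmem_cache_free(dirty_extent_cache, rec);
    }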

>
> I will say that we never had this problem reported before, and
> account_leaf_items() is always run in all kernels, even without qgroups
> enabled. That will change with my new patch though.
>
> What we can say for sure is that drop_snapshot in the qgroup case will
> read more from disk, and obviously that will have a negative impact
> depending on what the tree looks like. So IMHO we ought to focus on
> reducing the amount of I/O involved.

Totally agree.

Thanks,
Qu



* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
  2015-11-06  1:02         ` Qu Wenruo
@ 2015-11-06  3:15           ` Mark Fasheh
  2015-11-06  3:25             ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Fasheh @ 2015-11-06  3:15 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Stefan Priebe, linux-btrfs, jbacik, Chris Mason

On Fri, Nov 06, 2015 at 09:02:13AM +0800, Qu Wenruo wrote:
> >[...]
> >
> >Kmalloc, the way we are using it, is not usually a performance issue,
> >especially if we've been reading off disk in the same process. Ask yourself
> >this - your own patch series does the same kmalloc for every qgroup
> >operation. Did you notice a complete and massive performance slowdown like
> >the one Stefan reported?
> 
> You're right, such memory allocation may impact performance, but not
> so noticeably compared to other operations which may kick off disk IO,
> like btrfs_find_all_roots().
> 
> But at the least, enabling qgroups will impact performance.
> 
> Yeah, this time I have test data.
> In an environment with 100 different snapshots, sysbench shows an
> overall performance drop of about 5%, and in some cases up to 7%,
> with qgroups enabled.
> 
> Not sure about the kmalloc impact, maybe less than 1% or maybe 2~3%,
> but it's at least worth trying to use a kmem cache.

Ok cool, what'd you do to generate the snapshots? I can try a similar test
on one of my machines and see what I get. I'm not surprised that the
overhead is noticeable, and I agree it's easy enough to try things like
replacing the allocation once we have a test going.

Thanks,
	--Mark

--
Mark Fasheh


* Re: Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete
  2015-11-06  3:15           ` Mark Fasheh
@ 2015-11-06  3:25             ` Qu Wenruo
  0 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2015-11-06  3:25 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: Stefan Priebe, linux-btrfs, jbacik, Chris Mason



Mark Fasheh wrote on 2015/11/05 19:15 -0800:
> [...]
>
> Ok cool, what'd you do to generate the snapshots? I can try a similar test
> on one of my machines and see what I get. I'm not surprised that the
> overhead is noticeable, and I agree it's easy enough to try things like
> replacing the allocation once we have a test going.

I run fsstress in a subvolume with 4 threads, creating a snapshot of that 
subvolume about every 5 seconds.

Then I run sysbench inside the 50th snapshot.

Such a test incurs the overhead of both btrfs_find_all_roots() and 
kmalloc(), so I'm not sure which overhead is bigger.
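
In script form, roughly (a sketch; the mount point and the fsstress and 
sysbench option values are illustrative, not my exact invocation):

    # illustrative benchmark sketch -- adjust paths and option values
    mount /dev/sdX /mnt/scratch
    btrfs subvolume create /mnt/scratch/subvol
    btrfs quota enable /mnt/scratch
    fsstress -d /mnt/scratch/subvol -p 4 -n 100000 &   # 4 worker processes
    for i in $(seq 1 100); do
        btrfs subvolume snapshot /mnt/scratch/subvol /mnt/scratch/snap.$i
        sleep 5
    done
    wait
    cd /mnt/scratch/snap.50
    sysbench --test=fileio --file-test-mode=rndrw prepare
    sysbench --test=fileio --file-test-mode=rndrw run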

Thanks,
Qu


end of thread

Thread overview: 14+ messages
2015-11-01 20:49 Regression in: [PATCH 4/4] btrfs: qgroup: account shared subtree during snapshot delete Stefan Priebe
2015-11-01 22:57 ` Duncan
2015-11-02  1:34 ` Qu Wenruo
2015-11-02  5:46   ` Stefan Priebe
2015-11-03 19:15     ` Mark Fasheh
2015-11-03 19:26   ` Mark Fasheh
2015-11-03 19:42     ` Stefan Priebe
2015-11-03 23:31       ` Mark Fasheh
2015-11-04  2:22         ` Chris Mason
2015-11-04  1:01     ` Qu Wenruo
2015-11-05 19:23       ` Mark Fasheh
2015-11-06  1:02         ` Qu Wenruo
2015-11-06  3:15           ` Mark Fasheh
2015-11-06  3:25             ` Qu Wenruo
