linux-mm.kvack.org archive mirror
From: 乱石 <zhangliguang@linux.alibaba.com>
To: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>,
	akpm@linux-foundation.org, cgroups@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH] fs/writeback: Attach inode's wb to root if needed
Date: Thu, 16 May 2019 13:54:24 +0800	[thread overview]
Message-ID: <4ebf1f8e-0f77-37df-da32-037384643527@linux.alibaba.com> (raw)
In-Reply-To: <20190513183053.GA73423@dennisz-mbp>

Hi Dennis,

Sorry for the late reply. Because I can't reproduce this problem in a
local test, and operating on the online environment is not allowed, I
have been constructing a scenario to reproduce it in recent days.


On 2019/5/14 2:30, Dennis Zhou wrote:
> Hi Liguang,
>
> On Fri, May 10, 2019 at 09:54:27AM +0800, 乱石 wrote:
>> Hi Tejun,
>>
>> On 2019/5/10 0:48, Tejun Heo wrote:
>>> Hello,
>>>
>>> On Thu, May 09, 2019 at 04:03:53PM +0800, zhangliguang wrote:
>>>> There might be tons of files queued in the writeback, awaiting
>>>> writeback. Unfortunately, the writeback's cgroup has died. In this
>>>> case, we try to reassociate the inode with another writeback cgroup,
>>>> but we possibly can't, because the writeback associated with the dead
>>>> cgroup is the only valid one. In that case, a new writeback is
>>>> allocated, initialized and associated with the inode, causing
>>>> unnecessarily high system load and latency.
>>>>
>>>> This fixes the issue by forcibly moving the inode to the root cgroup
>>>> when the previously bound cgroup becomes dead. With it, no more
>>>> unnecessary writebacks are created or populated, and the system load
>>>> decreased by about 6x in the online service where we encountered it:
>>>>       Without the patch: about 30% system load
>>>>       With the patch:    about  5% system load
>>> Can you please describe the scenario in more detail?  I'm having a
>>> bit of a hard time understanding the amount of CPU cycles being
>>> consumed.
>>>
>>> Thanks.
>> Our search line reported a problem: when container A was removed,
>> container B's and container C's system load rose to 30%.
>>
>> We recorded a trace with 'perf record -e cycles:k -g -a' and found that
>> wb_init was the hotspot function.
>>
>> Function call:
>>
>> generic_file_direct_write
>>     filemap_write_and_wait_range
>>        __filemap_fdatawrite_range
>>           wbc_attach_fdatawrite_inode
>>              inode_attach_wb
>>                 __inode_attach_wb
>>                    wb_get_create
>>              wbc_attach_and_unlock_inode
>>                 if (unlikely(wb_dying(wbc->wb)))
>>                    inode_switch_wbs
>>                       wb_get_create
>>                          ; Search bdi->cgwb_tree from memcg_css->id
>>                          ; OR cgwb_create
>>                             kmalloc
>>                             wb_init       // hot spot
>>                             ; Insert into bdi->cgwb_tree, memcg_css->id as key
>>
>> We discussed it thoroughly, based on this analysis: when we run into the
>> issue, cgroups are being deleted. The inodes (files) that were associated
>> with those cgroups have to switch to another, newly created writeback. We
>> think there is a huge number of inodes on the writeback list at that
>> time, so we don't consider anything there abnormal. However, one thing we
>> can possibly do: force these inodes onto the BDI-embedded writeback, so
>> that we needn't create a huge number of writebacks and can avoid the high
>> system load. We expect the correct wb (best candidate) to be picked up in
>> the next round.
>>
>> Thanks,
>> Liguang
>>
> If I understand correctly, this is most likely caused by a file shared by
> cgroup A and cgroup B. This means cgroup B is doing direct IO against
> the file owned by the dying cgroup A. In this case, the code tries to do
> a wb switch. However, the wb it looks up for the original cgroup A's
> memcg id has been deleted, so a new one is allocated every time.
>
> I think the below may be a better solution. Could you please test it? If
> it works, I'll spin a patch with a more involved description.
>
> Thanks,
> Dennis
>
> ---
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 36855c1f8daf..fb331ea2a626 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -577,7 +577,7 @@ void wbc_attach_and_unlock_inode(struct writeback_control *wbc,
>   	 * A dying wb indicates that the memcg-blkcg mapping has changed
>   	 * and a new wb is already serving the memcg.  Switch immediately.
>   	 */
> -	if (unlikely(wb_dying(wbc->wb)))
> +	if (unlikely(wb_dying(wbc->wb)) && !css_is_dying(wbc->wb->memcg_css))
>   		inode_switch_wbs(inode, wbc->wb_id);
>   }
>   
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 72e6d0c55cfa..685563ed9788 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -659,7 +659,7 @@ struct bdi_writeback *wb_get_create(struct backing_dev_info *bdi,
>   
>   	might_sleep_if(gfpflags_allow_blocking(gfp));
>   
> -	if (!memcg_css->parent)
> +	if (!memcg_css->parent || css_is_dying(memcg_css))
>   		return &bdi->wb;
>   
>   	do {


Thread overview: 5+ messages
2019-05-09  8:03 [PATCH] fs/writeback: Attach inode's wb to root if needed zhangliguang
2019-05-09 16:48 ` Tejun Heo
2019-05-10  1:54   ` 乱石
2019-05-13 18:30     ` Dennis Zhou
2019-05-16  5:54       ` 乱石 [this message]
