ocfs2-devel.lists.linux.dev archive mirror
From: Joseph Qi via Ocfs2-devel <ocfs2-devel@oss.oracle.com>
To: Heming Zhao <heming.zhao@suse.com>, ocfs2-devel@oss.oracle.com
Subject: Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix for local alloc window restore unconditionally
Date: Thu, 2 Jun 2022 18:02:19 +0800
Message-ID: <8c001d9a-ffa0-7173-1429-de19bd4779a5@linux.alibaba.com>
In-Reply-To: <20220521101416.29793-2-heming.zhao@suse.com>



On 5/21/22 6:14 PM, Heming Zhao wrote:
> When the la state is ENABLED, ocfs2_recalc_la_window restores the la
> window unconditionally. This logic is wrong.
> 
> Let's imagine the path below.
> 
> 1. la state (->local_alloc_state) is set to THROTTLED or DISABLED.
> 
> 2. After about 30s (OCFS2_LA_ENABLE_INTERVAL), the delayed work is
>    triggered and ocfs2_la_enable_worker sets the la state to ENABLED
>    directly.
> 
> 3. A write IO thread runs:
> 
>    ```
>    ocfs2_write_begin
>     ...
>      ocfs2_lock_allocators
>       ocfs2_reserve_clusters
>        ocfs2_reserve_clusters_with_limit
>         ocfs2_reserve_local_alloc_bits
>          ocfs2_local_alloc_slide_window // [1]
>           + ocfs2_recalc_la_window(osb, OCFS2_LA_EVENT_SLIDE) // [2]
>           + ...
>           + ocfs2_local_alloc_new_window
>              ocfs2_claim_clusters // [3]
>    ```
> 
> [1]: called when the la window bits are used up.
> [2]: when the la state is ENABLED (e.g. after the
>      OCFS2_LA_ENABLE_INTERVAL delayed work has run), it unconditionally
>      restores the la window to its default value.
> [3]: uses the default la window size to search for clusters. IMO the
>      time complexity is O(n^4), so scanning the global bitmap takes a
>      huge amount of time. It makes write IOs (e.g. user space 'dd')
>      dramatically slow.
> 
> For example:
> ocfs2 partition size: 1.45TB, cluster size: 4KB,
> la window default size: 106MB.
> The partition is fragmented by creating & deleting a huge number of
> small files.
> 
> The timing should be (the numbers were taken from a real-world case):
> - la window size change order (size: MB):
>   106, 53, 26.5, 13, 6.5, 3.25, 1.6, 0.8
>   Only 0.8MB succeeds; 0.8MB also triggers the la window to disable.
>   ocfs2_local_alloc_new_window retries 8 times, and the first 7
>   attempts all run in the worst case.
> - group chain number: 242
>   ocfs2_claim_suballoc_bits runs its for-loop 242 times
> - each chain has 49 block groups
>   ocfs2_search_chain runs its while-loop 49 times
> - each bg has 32256 blocks
>   ocfs2_block_group_find_clear_bits runs its while-loop over 32256 bits.
>   Since ocfs2_find_next_zero_bit uses ffz() to find a zero bit, let's
>   use (32256/64) for the timing calculation.
> 
> So the loop count is: 7*242*49*(32256/64) = 41835024 (~42 million)
> 
> In the worst case, a user space write of 100MB of data will trigger 42M
> scans, and if the write can't finish within 30s
> (OCFS2_LA_ENABLE_INTERVAL), the write IO will suffer another 42M scans.
> This keeps the performance of the ocfs2 partition poor all the time.
> 

The scenario makes sense.
I need to spend more time digging into the code, and then I will get back to you.

Thanks,
Joseph

