From: Andreas Gruenbacher <agruenba@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [gfs2 PATCH] gfs2: allocate pages for clone bitmaps
Date: Mon, 12 Apr 2021 13:32:03 +0200
Message-ID: <CAHc6FU5=0p6=V3va3UNPB0ci2At3TuZ+TxgD2yQPBNjGzb4WqQ@mail.gmail.com>
In-Reply-To: <344305871.6577253.1618062541261.JavaMail.zimbra@redhat.com>

On Sat, Apr 10, 2021 at 3:49 PM Bob Peterson <rpeterso@redhat.com> wrote:
> Resource group (rgrp) bitmaps have in-core-only "clone" bitmaps that
> ensure that filesystem space freed by deletes is not reused until the
> transaction is complete. Before this patch, these clone bitmaps were
> allocated with kmalloc, but with the default 4K block size, kmalloc is
> wasteful because of how slab tracks objects of that size. In fact, the
> kernel docs recommend slab only for allocations "less than page
> size." See:
> https://www.kernel.org/doc/html/v5.0/core-api/mm-api.html#mm-api-gfp-flags
> With kernel slab debugging options enabled, slab warns that gfs2
> should not do this.
>
> This patch switches the clone bitmap allocations to alloc_page, which
> has much less overhead and uses less memory. The downside: for block
> sizes smaller than page size, allocating a whole page wastes memory.
> But in general, we've always recommended using block size = page size
> for efficiency and performance.
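
For reference, the change described above boils down to something like
this (an illustrative sketch only; bi_clone and bi_bh match the field
names in fs/gfs2/rgrp.c, but the exact patch may differ):

  /* Before: one kmalloc per bitmap, sized to the fs block.  With a 4K
   * block size this lands in a slab cache with tracking overhead. */
  bi->bi_clone = kmalloc(bi->bi_bh->b_size, GFP_NOFS | __GFP_NOFAIL);

  /* After (as described): one full page per bitmap. */
  struct page *page = alloc_page(GFP_NOFS | __GFP_NOFAIL);
  bi->bi_clone = page_address(page);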

If we really want to switch to page-granularity allocations, vmalloc
would be more appropriate. Note that vmalloc doesn't support
__GFP_NOFAIL, so we would first need to get rid of that flag by moving
the allocation into a context where we can sleep.

Looking at rgblk_free and gfs2_free_clones, another cheap improvement
would be to make a single allocation for all the clone bitmaps of a
resource group instead of one allocation per bitmap, roughly as
sketched below.
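
A sketch combining both ideas (rd_bits, rd_length, and sd_sb.sb_bsize
are the actual gfs2 structure fields; the helper itself is hypothetical
and would have to run in a context that may sleep, since vmalloc cannot
be combined with __GFP_NOFAIL):

  /* One vmalloc'd buffer covering every clone bitmap of the resource
   * group, instead of one allocation per bitmap. */
  static int gfs2_alloc_clone_bitmaps(struct gfs2_rgrpd *rgd)
  {
          u32 bsize = rgd->rd_sbd->sd_sb.sb_bsize;
          u8 *clones = vmalloc(array_size(rgd->rd_length, bsize));
          unsigned int x;

          if (!clones)
                  return -ENOMEM;
          for (x = 0; x < rgd->rd_length; x++)
                  rgd->rd_bits[x].bi_clone = clones + x * bsize;
          return 0;
  }

  /* Freeing then becomes a single vfree() of rd_bits[0].bi_clone in
   * gfs2_free_clones() instead of a kfree() per bitmap. */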

But first, I'd like to understand what's actually going on here.

> In a recent test I did with 24 simultaneous recursive file deletes on
> a large dataset (each working to delete a separate directory), this
> patch reduced total elapsed (wall clock) time by 16 percent, from
> 41310 seconds (11.5 hours) down to 34742 seconds (9.65 hours). (This
> was lock_nolock on a single node.)

I find that really hard to believe. Did you look at the frequency of
clone bitmap allocations? If that is the problem, are we simply freeing
the clone bitmaps too aggressively?

Thanks,
Andreas


