Linux-ext4 Archive on
From: "Theodore Ts'o" <>
To: Jan Kara <>
Cc: Christoph Hellwig <>,
	Zhang Yi <>,,,,
Subject: Re: [RFC PATCH v2 7/7] ext4: fix race between blkdev_releasepage() and ext4_put_super()
Date: Wed, 21 Apr 2021 12:57:39 -0400
Message-ID: <> (raw)
In-Reply-To: <>

On Wed, Apr 21, 2021 at 03:46:34PM +0200, Jan Kara wrote:
> Indeed, after 12 years in kernel .bdev_try_to_free_page is implemented only
> by ext4. So maybe it is not that important? I agree with Zhang and
> Christoph that getting the lifetime rules sorted out will be hairy and it
> is questionable, whether it is worth the additional pages we can reclaim.
> Ted, do you remember what was the original motivation for this?

The comment in fs/ext4/super.c is, I thought, a pretty good explanation:

 * Try to release metadata pages (indirect blocks, directories) which are
 * mapped via the block device.  Since these pages could have journal heads
 * which would prevent try_to_free_buffers() from freeing them, we must use
 * jbd2 layer's try_to_free_buffers() function to release them.

When we modify a metadata block, we attach a journal_head (jh)
structure to the buffer_head, and bump the ref count to prevent the
buffer from being freed.  Before the transaction is committed, the
buffer is marked jbddirty, but the dirty bit is not set until the
transaction commit.

At that point, writeback happens entirely at the discretion of the
buffer cache.  The jbd layer doesn't get notification when the I/O is
completed, nor when there is an I/O error.  (There was an attempt to
add a callback but that was NACK'ed because of a complaint that it was
jbd specific.)

So we don't actually know when it's safe to detach the jh from the
buffer_head and drop the refcount so that the buffer_head can be
freed.  When the space in the journal starts getting low, we'll look
at the jh's attached to completed transactions, and see how many of
them have clean bh's, and at that point, we can release the buffers.
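
As a rough illustration of the pinning described above, here is a
userspace sketch.  It is not jbd2 code: the toy_* structures and
functions are hypothetical stand-ins, and the only thing it models is
that an attached jh holds a reference which blocks reclaim until a
checkpoint-style scan finds the buffer clean and detaches it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; not the real definitions. */
struct toy_bh;

struct toy_jh {
	struct toy_bh *bh;
};

struct toy_bh {
	int refcount;       /* models the buffer's reference count */
	bool dirty;         /* models the buffer dirty bit */
	struct toy_jh *jh;  /* attached journal head, or NULL */
};

/* Attaching a jh pins the buffer by taking a reference. */
static void toy_attach_jh(struct toy_bh *bh, struct toy_jh *jh)
{
	jh->bh = bh;
	bh->jh = jh;
	bh->refcount++;
}

/* Detaching drops the reference taken at attach time. */
static void toy_detach_jh(struct toy_bh *bh)
{
	assert(bh->jh != NULL && bh->refcount > 0);
	bh->jh->bh = NULL;
	bh->jh = NULL;
	bh->refcount--;
}

/* Generic reclaim: only unreferenced, clean buffers can be freed. */
static bool toy_can_free(const struct toy_bh *bh)
{
	return bh->refcount == 0 && !bh->dirty;
}

/*
 * Checkpoint-style scan: detach jh's from buffers that writeback has
 * already cleaned.  Returns how many buffers were released.
 */
static size_t toy_checkpoint_scan(struct toy_bh *bhs, size_t n)
{
	size_t released = 0;

	for (size_t i = 0; i < n; i++) {
		if (bhs[i].jh != NULL && !bhs[i].dirty) {
			toy_detach_jh(&bhs[i]);
			released++;
		}
	}
	return released;
}
```

Until toy_checkpoint_scan() runs, toy_can_free() keeps returning false
for a buffer that writeback cleaned long ago, which is the window the
memory-pressure hook is meant to cover.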

The other time when we'll attempt to detach jh's from clean buffers is
via bdev_try_to_free_page().  So if we drop the
bdev_try_to_free_page hook, then when we are under memory pressure,
there could be potentially a large percentage of the buffer cache
which can't be freed, and so the OOM-killer might trigger more often.

Now, if we could get a callback on I/O completion on a per-bh basis,
then we could detach the jh when the buffer is clean --- and as a
bonus, we'd get a notification when there was an I/O error writing
back a metadata block, which would be even better.
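
To make the idea concrete, here is a userspace sketch of what such a
per-buffer completion hook could look like.  Everything in it is
hypothetical: the cb_* names, the end_io/ctx fields, and the choice to
keep the jh attached on error are assumptions for illustration, not an
existing or proposed kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy buffer with a hypothetical per-buffer completion hook. */
struct cb_bh {
	int refcount;   /* one reference is held on behalf of the jh */
	bool dirty;
	bool has_jh;
	void (*end_io)(struct cb_bh *bh, int error, void *ctx);
	void *ctx;
};

/* Toy journal state: just counts write-back failures it was told about. */
struct cb_journal {
	int io_errors;
};

/* Journal-side handler: detach on success, record the error otherwise. */
static void cb_journal_end_io(struct cb_bh *bh, int error, void *ctx)
{
	struct cb_journal *journal = ctx;

	if (error) {
		journal->io_errors++;  /* assumption: jh stays attached */
		return;
	}
	if (bh->has_jh) {
		assert(bh->refcount > 0);
		bh->has_jh = false;
		bh->refcount--;        /* buffer is now reclaimable */
	}
}

/* Models write-back finishing for one buffer and firing the hook. */
static void cb_complete_write(struct cb_bh *bh, int error)
{
	if (!error)
		bh->dirty = false;
	if (bh->end_io != NULL)
		bh->end_io(bh, error, bh->ctx);
}
```

With a hook like this the reference is dropped as soon as the write
succeeds, so nothing has to come back later and scan for clean buffers.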

So how about an even swap?  If we can get a buffer I/O completion
callback, we can drop the bdev_try_to_free_page hook.....

	     	      			- Ted


Thread overview: 24+ messages
2021-04-14 13:47 [RFC PATCH v2 0/7] ext4, jbd2: fix 3 issues about bdev_try_to_free_page() Zhang Yi
2021-04-14 13:47 ` [RFC PATCH v2 1/7] jbd2: remove the out label in __jbd2_journal_remove_checkpoint() Zhang Yi
2021-04-21 10:01   ` Jan Kara
2021-04-14 13:47 ` [RFC PATCH v2 2/7] jbd2: ensure abort the journal if detect IO error when writing original buffer back Zhang Yi
2021-04-21 13:20   ` Jan Kara
2021-04-14 13:47 ` [RFC PATCH v2 3/7] jbd2: don't abort the journal when freeing buffers Zhang Yi
2021-04-21 13:23   ` Jan Kara
2021-04-14 13:47 ` [RFC PATCH v2 4/7] jbd2: do not free buffers in jbd2_journal_try_to_free_buffers() Zhang Yi
2021-04-15 14:46   ` Christoph Hellwig
2021-04-14 13:47 ` [RFC PATCH v2 5/7] ext4: use RCU to protect accessing superblock in blkdev_releasepage() Zhang Yi
2021-04-15 14:48   ` Christoph Hellwig
2021-04-14 13:47 ` [RFC PATCH v2 6/7] fs: introduce a usage count into the superblock Zhang Yi
2021-04-15 14:40   ` Christoph Hellwig
2021-04-16  8:00     ` Zhang Yi
2021-04-14 13:47 ` [RFC PATCH v2 7/7] ext4: fix race between blkdev_releasepage() and ext4_put_super() Zhang Yi
2021-04-15 14:52   ` Christoph Hellwig
2021-04-16  8:00     ` Zhang Yi
2021-04-20 13:08       ` Christoph Hellwig
2021-04-21 13:46         ` Jan Kara
2021-04-21 16:57           ` Theodore Ts'o [this message]
2021-04-22  9:04             ` Jan Kara
2021-04-23 11:39               ` Zhang Yi
2021-04-23 16:06                 ` Jan Kara
2021-04-23 14:40               ` Theodore Ts'o
