linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Mauricio Faria de Oliveira <mfo@canonical.com>
Cc: Jan Kara <jack@suse.cz>,
	linux-ext4@vger.kernel.org,
	dann frazier <dann.frazier@canonical.com>,
	Mauricio Faria de Oliveira <mauricio.foliveira@gmail.com>,
	Jan Kara <jack@suse.com>
Subject: Re: [RFC PATCH v2 3/5] ext4: data=journal: write-protect pages on submit inode data buffers callback
Date: Wed, 19 Aug 2020 10:44:21 +0200	[thread overview]
Message-ID: <20200819084421.GD1902@quack2.suse.cz> (raw)
In-Reply-To: <20200810010210.3305322-4-mfo@canonical.com>

On Sun 09-08-20 22:02:06, Mauricio Faria de Oliveira wrote:
> This implements the journal's j_submit_inode_data_buffers() callback
> to write-protect the inode's pages with write_cache_pages(), and use
> a writepage callback to redirty pages with buffers that are not part
> of the committing transaction or the next transaction.
> 
> And set a no-op function as j_finish_inode_data_buffers() callback
> (nothing needed other than the write-protect above.)
> 
> Currently, the inode is added to the transaction's inode list in the
> __ext4_journalled_writepage() function.
> ---
>  fs/ext4/inode.c |  4 +++
>  fs/ext4/super.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 69 insertions(+)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 10dd470876b3..978ccde8454f 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1911,6 +1911,10 @@ static int __ext4_journalled_writepage(struct page *page,
>  		err = ext4_walk_page_buffers(handle, page_bufs, 0, len, NULL,
>  					     write_end_fn);
>  	}
> +	if (ret == 0)
> +		ret = err;
> +	// XXX: is this correct for inline data inodes?

Inodes with inline data should never get here. The thing is that when a
file with inline data is faulted for writing, ext4_page_mkwrite() converts
the file to normal data format. And ordinary write(2) will directly update
the inline data and clear the page dirty bit so writepage isn't called for
it.

> +	err = ext4_jbd2_inode_add_write(handle, inode, 0, len);
>  	if (ret == 0)
>  		ret = err;
>  	EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 330957ed1f05..38aaac6572ea 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -472,6 +472,66 @@ static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn)
>  	spin_unlock(&sbi->s_md_lock);
>  }
>  
> +/*
> + * This writepage callback for write_cache_pages()
> + * takes care of a few cases after page cleaning.
> + *
> + * write_cache_pages() already checks for dirty pages
> + * and calls clear_page_dirty_for_io(), which we want,
> + * to write protect the pages.
> + *
> + * However, we have to redirty a page in two cases:
> + * 1) some buffer is not part of the committing transaction
> + * 2) some buffer already has b_next_transaction set
> + */
> +
> +static int ext4_journalled_writepage_callback(struct page *page,
> +					      struct writeback_control *wbc,
> +					      void *data)
> +{
> +	transaction_t *transaction = (transaction_t *) data;
> +	struct buffer_head *bh, *head;
> +	struct journal_head *jh;
> +
> +	// XXX: any chance of !bh here?

No. In ext4, whenever a page is dirty, it should have buffers attached.

> +	bh = head = page_buffers(page);
> +	do {
> +		jh = bh2jh(bh);
> +		if (!jh || jh->b_transaction != transaction ||
> +		    jh->b_next_transaction) {
> +			redirty_page_for_writepage(wbc, page);
> +			goto out;
> +		}
> +	} while ((bh = bh->b_this_page) != head);
> +
> +out:
> +	return AOP_WRITEPAGE_ACTIVATE;
> +}
> +
> +static int ext4_journalled_submit_inode_data_buffers(struct jbd2_inode *jinode)
> +{
> +	struct address_space *mapping = jinode->i_vfs_inode->i_mapping;
> +	transaction_t *transaction = jinode->i_transaction;
> +	loff_t dirty_start = jinode->i_dirty_start;
> +	loff_t dirty_end = jinode->i_dirty_end;
> +
> +	struct writeback_control wbc = {
> +		.sync_mode =  WB_SYNC_ALL,
> +		.nr_to_write = mapping->nrpages * 2,

For WB_SYNC_ALL writeback, .nr_to_write is mostly ignored so just set it to
~0ULL.

> +		.range_start = dirty_start,
> +		.range_end = dirty_end,
> +        };
> +
> +	return write_cache_pages(mapping, &wbc,
> +				 ext4_journalled_writepage_callback,
> +				 transaction);

I was thinking about this and we may need to do this somewhat differently.
I've realized that there's the slight trouble that we now use page dirty
bit for two purposes in data=journal mode - to track pages that need write
protection during commit and also to track pages which have buffers that
need checkpointing. And this mixing is making things complex. So I was
thinking that we could simply leave PageDirty bit for checkpointing
purposes and always make sure buffers are appropriately attached to a
transaction as dirty in ext4_page_mkwrite(). This will make mmap writes in
data=journal mode somewhat less efficient (all the pages written through
mmap while transaction T was running will be written to the journal during
transaction T commit while currently, we write only pages that also went
through __ext4_journalled_writepage() while T was running which usually
happens less frequently). But the code should be simpler and we don't care
about mmap write performance for data=journal mode much. Furthermore I
don't think that the tricks with PageChecked logic we play in data=journal
mode are really needed as well which should bring further simplifications.
I'll try to code this cleanup.

Then in ext4_journalled_submit_inode_data_buffers() we would need to walk
all the pages in the range describe by jinode and call page_mkclean() for
each page which has buffer attached to the committing transaction.

> +}
> +
> +static int ext4_journalled_finish_inode_data_buffers(struct jbd2_inode *jinode)
> +{
> +	return 0;
> +}
> +
>  static bool system_going_down(void)
>  {
>  	return system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF
> @@ -4599,6 +4659,11 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>  		ext4_msg(sb, KERN_ERR, "can't mount with "
>  			"journal_async_commit in data=ordered mode");
>  		goto failed_mount_wq;
> +	} else if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA) {
> +		sbi->s_journal->j_submit_inode_data_buffers =
> +			ext4_journalled_submit_inode_data_buffers;
> +		sbi->s_journal->j_finish_inode_data_buffers =
> +			ext4_journalled_finish_inode_data_buffers;

We can journal data only for individual inodes (when inode has journal_data
flag set). So you have to set the callback unconditionally here and only in
the callback decide what to do with the particular inode based on what
ext4_should_journal_data() tells about the inode.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2020-08-19  8:44 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-10  1:02 [RFC PATCH v2 0/5] ext4/jbd2: data=journal: write-protect pages on transaction commit Mauricio Faria de Oliveira
2020-08-10  1:02 ` [RFC PATCH v2 1/5] jbd2: test case for ext4 data=journal/mmap() journal corruption Mauricio Faria de Oliveira
2020-08-18 14:38   ` Jan Kara
2020-08-19  1:15     ` Mauricio Faria de Oliveira
2020-08-10  1:02 ` [RFC PATCH v2 2/5] jbd2: introduce journal callbacks j_submit|finish_inode_data_buffers Mauricio Faria de Oliveira
2020-08-18 14:52   ` Jan Kara
2020-08-19  1:20     ` Mauricio Faria de Oliveira
2020-08-19  9:16       ` Jan Kara
2020-08-10  1:02 ` [RFC PATCH v2 3/5] ext4: data=journal: write-protect pages on submit inode data buffers callback Mauricio Faria de Oliveira
2020-08-19  8:44   ` Jan Kara [this message]
2020-08-19 10:41     ` Jan Kara
2020-08-20 22:55       ` Mauricio Faria de Oliveira
2020-08-21 10:26         ` Jan Kara
2020-08-10  1:02 ` [RFC PATCH v2 4/5] ext4: data=journal: add inode to transaction inode list in ext4_page_mkwrite() Mauricio Faria de Oliveira
2020-08-10  1:02 ` [RFC PATCH v2 5/5] ext4/jbd2: debugging messages Mauricio Faria de Oliveira
2020-08-19  8:46   ` Jan Kara
2020-08-10  1:02 ` [RFC PATCH v2/SETUP SCRIPT] Mauricio Faria de Oliveira
2020-08-10  1:02 ` [RFC PATCH v2/TEST CASE] Mauricio Faria de Oliveira
2020-08-19  9:27 ` [RFC PATCH v2 0/5] ext4/jbd2: data=journal: write-protect pages on transaction commit Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200819084421.GD1902@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=dann.frazier@canonical.com \
    --cc=jack@suse.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=mauricio.foliveira@gmail.com \
    --cc=mfo@canonical.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).