From: Hannes Reinecke <hare@suse.de>
To: Damien Le Moal <Damien.LeMoal@wdc.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: Johannes Thumshirn <jth@kernel.org>,
	Naohiro Aota <Naohiro.Aota@wdc.com>,
	"Darrick J . Wong" <darrick.wong@oracle.com>
Subject: Re: [PATCH 1/2] fs: New zonefs file system
Date: Tue, 17 Dec 2019 08:28:33 +0100
Message-ID: <32e3418b-727e-3018-1b8a-0530608fb34d@suse.de>
In-Reply-To: <BYAPR04MB5816D17D0A14D5651E37F700E7500@BYAPR04MB5816.namprd04.prod.outlook.com>

On 12/17/19 1:20 AM, Damien Le Moal wrote:
> On 2019/12/16 17:36, Hannes Reinecke wrote:
> [...]
>>> +static int zonefs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>>> +			      unsigned int flags, struct iomap *iomap,
>>> +			      struct iomap *srcmap)
>>> +{
>>> +	struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb);
>>> +	struct zonefs_inode_info *zi = ZONEFS_I(inode);
>>> +	loff_t max_isize = zi->i_max_size;
>>> +	loff_t isize;
>>> +
>>> +	/*
>>> +	 * For sequential zones, enforce direct IO writes. This is already
>>> +	 * checked when writes are issued, so warn about this here if we
>>> +	 * get buffered write to a sequential file inode.
>>> +	 */
>>> +	if (WARN_ON_ONCE(zi->i_ztype == ZONEFS_ZTYPE_SEQ &&
>>> +			 (flags & IOMAP_WRITE) && !(flags & IOMAP_DIRECT)))
>>> +		return -EIO;
>>> +
>>> +	/*
>>> +	 * For all zones, all blocks are always mapped. For sequential zones,
>>> +	 * all blocks after the write pointer (inode size) are always unwritten.
>>> +	 */
>>> +	mutex_lock(&zi->i_truncate_mutex);
>>> +	isize = i_size_read(inode);
>>> +	if (offset >= isize) {
>>> +		length = min(length, max_isize - offset);
>>> +		if (zi->i_ztype == ZONEFS_ZTYPE_CNV)
>>> +			iomap->type = IOMAP_MAPPED;
>>> +		else
>>> +			iomap->type = IOMAP_UNWRITTEN;
>>> +	} else {
>>> +		length = min(length, isize - offset);
>>> +		iomap->type = IOMAP_MAPPED;
>>> +	}
>>> +	mutex_unlock(&zi->i_truncate_mutex);
>>> +
>>> +	iomap->offset = offset & (~sbi->s_blocksize_mask);
>>> +	iomap->length = ((offset + length + sbi->s_blocksize_mask) &
>>> +			 (~sbi->s_blocksize_mask)) - iomap->offset;
>>> +	iomap->bdev = inode->i_sb->s_bdev;
>>> +	iomap->addr = (zi->i_zsector << SECTOR_SHIFT) + iomap->offset;
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static const struct iomap_ops zonefs_iomap_ops = {
>>> +	.iomap_begin	= zonefs_iomap_begin,
>>> +};
>>> +
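For readers following the arithmetic in the hunk above, here is a minimal user-space sketch of the same range computation. It is not the kernel code: struct zone_file, struct mapping, map_range() and min64() are hypothetical names made up for illustration, and it assumes the block size mask equals blocksize - 1 with a power-of-two block size.

```c
#include <stdbool.h>
#include <stdint.h>

enum map_type { MAP_MAPPED, MAP_UNWRITTEN };

/* Hypothetical stand-in for the inode fields used by the patch. */
struct zone_file {
	bool    sequential;  /* models i_ztype == ZONEFS_ZTYPE_SEQ */
	int64_t isize;       /* models i_size, i.e. the write pointer offset */
	int64_t max_isize;   /* models i_max_size */
	int64_t zone_start;  /* models i_zsector << SECTOR_SHIFT, in bytes */
};

/* Hypothetical stand-in for the struct iomap fields that get filled in. */
struct mapping {
	enum map_type type;
	int64_t offset;
	int64_t length;
	int64_t addr;
};

static int64_t min64(int64_t a, int64_t b)
{
	return a < b ? a : b;
}

/* Assumes blocksize is a power of two, so blocksize - 1 is a valid mask. */
static struct mapping map_range(const struct zone_file *zf, int64_t offset,
				int64_t length, int64_t blocksize)
{
	int64_t mask = blocksize - 1;
	struct mapping m;

	if (offset >= zf->isize) {
		/* At or past the write pointer: clamp to the maximum size. */
		length = min64(length, zf->max_isize - offset);
		m.type = zf->sequential ? MAP_UNWRITTEN : MAP_MAPPED;
	} else {
		/* Below the write pointer: clamp to the written data. */
		length = min64(length, zf->isize - offset);
		m.type = MAP_MAPPED;
	}

	/* Round the start down and the end up to block boundaries. */
	m.offset = offset & ~mask;
	m.length = ((offset + length + mask) & ~mask) - m.offset;
	m.addr = zf->zone_start + m.offset;
	return m;
}
```

With a 4096-byte block size and a write pointer at 8192, a request starting at offset 5000 is clamped at the write pointer and rounded to the single mapped block starting at 4096, while a request starting at 8192 on a sequential file is reported as unwritten.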
>> This probably shows my complete ignorance, but what is the effect of
>> enforcing direct I/O writes on the page cache?
>> I.e., what happens for buffered reads? Will the pages be invalidated when a
>> write has been issued?
> 
> Yes, a direct write issued to a file range that has cached pages results
> in these pages being invalidated. But note that in the case of zonefs,
> this can happen only for conventional zones. For sequential zones, it
> does not happen: reads can be buffered and can populate the page cache,
> but only for pages below the write pointer, and writes can only be
> issued at the write pointer. So buffered reads and direct writes can
> never overlap.
> 
Oh, indeed, you are correct. That's indeed easy then.
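
To spell out the non-overlap argument for sequential zone files, here is a small user-space model. It is only a sketch under the rules stated above (cached data lies strictly below the write pointer, writes are accepted only at the write pointer); struct byte_range, cacheable_read(), seq_write_range() and ranges_overlap() are hypothetical helpers, not zonefs functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-open byte range [start, end). */
struct byte_range {
	int64_t start;
	int64_t end;
};

static bool ranges_overlap(struct byte_range a, struct byte_range b)
{
	return a.start < b.end && b.start < a.end;
}

/* Part of a buffered read that can end up cached: clamped to [0, wp). */
static struct byte_range cacheable_read(int64_t wp, int64_t pos, int64_t len)
{
	struct byte_range r = { pos, pos + len };

	if (r.start > wp)
		r.start = wp;
	if (r.end > wp)
		r.end = wp;
	return r;
}

/* A write to a sequential file is only accepted at the write pointer. */
static bool seq_write_range(int64_t wp, int64_t pos, int64_t len,
			    struct byte_range *out)
{
	if (pos != wp)
		return false;
	out->start = pos;
	out->end = pos + len;
	return true;
}
```

Any range returned by cacheable_read() ends at or before wp, and any range accepted by seq_write_range() starts exactly at wp, so ranges_overlap() is false for every such pair in this model.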

>> Or do we simply rely on upper layers to ensure no concurrent buffered
>> and direct I/O is being made?
> 
> Nope. VFS, or the file system specific implementation, takes care of
> that. See generic_file_direct_write() and its call to
> invalidate_inode_pages2_range().
> 
Of course.
One could even say: not applicable, as it won't happen.
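
For the conventional zone case, where invalidation can actually happen, the cached pages affected by a direct write are the ones covering the written byte range. The sketch below shows that page index arithmetic in user space. It assumes 4 KiB pages and a nonzero length; SKETCH_PAGE_SHIFT, struct page_span and pages_covering() are hypothetical names, and the kernel's own code should be consulted for the exact logic.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Inclusive range of page cache indexes. */
struct page_span {
	uint64_t first;
	uint64_t last;
};

/* Pages touched by a write of len bytes at byte offset pos (len > 0). */
static struct page_span pages_covering(uint64_t pos, uint64_t len)
{
	struct page_span s;

	s.first = pos >> SKETCH_PAGE_SHIFT;
	s.last = (pos + len - 1) >> SKETCH_PAGE_SHIFT;
	return s;
}
```

A 200-byte write at offset 4000 straddles a page boundary in this model, so it covers page indexes 0 and 1, whereas a page-aligned 4096-byte write at offset 4096 covers only index 1.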

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
