All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] OrangeFS blocksize in superblock and inode
@ 2018-04-25 17:27 martin
  2018-04-26 16:16 ` Walt Ligon
  0 siblings, 1 reply; 3+ messages in thread
From: martin @ 2018-04-25 17:27 UTC (permalink / raw)
  To: linux-fsdevel, devel, hubcap, walt, ligon

OrangeFS sends and receives I/O to a userspace client using shared
memory buffers.  OrangeFS sets blocksize in the superblock based on the
size of these buffers.  This is typically four megabytes.

sb->s_blocksize = orangefs_bufmap_size_query();
sb->s_blocksize_bits = orangefs_bufmap_shift_query();

Then OrangeFS sets blkbits on a regular file inode to PAGE_SHIFT, but
does not.  This dates back to 2003, long before OrangeFS was upstream.

http://dev.orangefs.org/trac/orangefs/changeset/2083

It appears neill was attempting to implement mmap.

/* 
  FIXME: 
  We're faking our inode block size to be PAGE_CACHE_SIZE 
  to play nicely with the page cache. 

  In some reality, inode->i_blksize != PAGE_CACHE_SIZE and 
  inode->i_blkbits != PAGE_CACHE_SHIFT 
*/ 

The comment is wrong anyway, as he sets block size to PAGE_CACHE_SIZE.
This leads me to suspect the whole idea is wrong.

I don't see any reason blkbits must equal PAGE_SHIFT.  Paging-related
code uses PAGE_SIZE and PAGE_SHIFT.  There are some references in fs/*
and mm/*,  but none that appear to affect OrangeFS.

The goal of setting blksize is to cause applications to perform bigger
reads and writes by reporting an increased block size through stat.
Then despite all the hoopla around blksize, it is not used for that
purpose.

stat->blksize = orangefs_inode->blksize;

which is the block size reported by the OrangeFS server (64 kilobytes in
my single-server environment, but it varies based on how parallel the
installation is).  This value may differ per file.

Then stat->blocks is set to inode->i_blocks in generic_fillattr, which
is set to (i_size + (4096 - i_size % 4096))/512.  The 4096 is hardcoded.

One approach is to set i_blksize to the size reported by the server and
set i_blocks to (i_size + (512 - i_size % 512))/512.  I intend to remove
orangefs_inode->blksize.

Then i_blksize may not equal PAGE_SIZE.  I am unsure of the implications
of this.

That leaves the superblock block size.  This ends up being reported as
i_blksize for directories and symlinks.  OrangeFS only reports a block
size on regular files.  Four megabytes seems rather large, but I'm not
sure what else to use.  I could use PAGE_SIZE.

Another approach is to set blksize to the page size.

However, I am informed by the OrangeFS server developers that they
really want this to value to change based on the number of servers the
file will be striped across.

With that in mind, I could set stat->blksize based on the server-reported
value but use PAGE_SIZE for the superblock and non-regular-file objects.

Do I need to keep i_blksize equal to PAGE_SIZE or may it also be set to
the server-reported value?

Do either of these sound more reasonable than the other?  Should I do
something else entirely?

Thanks in advance for any advice anyone may be able to give.

Martin

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC] OrangeFS blocksize in superblock and inode
  2018-04-25 17:27 [RFC] OrangeFS blocksize in superblock and inode martin
@ 2018-04-26 16:16 ` Walt Ligon
  2018-05-02 18:01   ` [PATCH] revamp OrangeFS block sizes Martin Brandenburg
  0 siblings, 1 reply; 3+ messages in thread
From: Walt Ligon @ 2018-04-26 16:16 UTC (permalink / raw)
  To: martin, linux-fsdevel, devel, hubcap, ligon

Walt (comments below)

On 4/25/18 1:27 PM, martin@omnibond.com wrote:
> OrangeFS sends and receives I/O to a userspace client using shared
> memory buffers.  OrangeFS sets blocksize in the superblock based on the
> size of these buffers.  This is typically four megabytes.
> 
> sb->s_blocksize = orangefs_bufmap_size_query();
> sb->s_blocksize_bits = orangefs_bufmap_shift_query();
> 
> Then OrangeFS sets blkbits on a regular file inode to PAGE_SHIFT, but
> does not.  This dates back to 2003, long before OrangeFS was upstream.
> 
> http://dev.orangefs.org/trac/orangefs/changeset/2083
> 
> It appears neill was attempting to implement mmap.
> 
> /*
>    FIXME:
>    We're faking our inode block size to be PAGE_CACHE_SIZE
>    to play nicely with the page cache.
> 
>    In some reality, inode->i_blksize != PAGE_CACHE_SIZE and
>    inode->i_blkbits != PAGE_CACHE_SHIFT
> */
> 
> The comment is wrong anyway, as he sets block size to PAGE_CACHE_SIZE.
> This leads me to suspect the whole idea is wrong.
> 
> I don't see any reason blkbits must equal PAGE_SHIFT.  Paging-related
> code uses PAGE_SIZE and PAGE_SHIFT.  There are some references in fs/*
> and mm/*,  but none that appear to affect OrangeFS.
> 
> The goal of setting blksize is to cause applications to perform bigger
> reads and writes by reporting an increased block size through stat.
> Then despite all the hoopla around blksize, it is not used for that
> purpose.

I'm not sure blksize is the right field (although that seems right).  At 
one point we looked at cp to see how it selects its buffer size based on 
the source and destination.  The applications only do this if they are 
smart enough, but I would guess many of the standard utilities are.

The field I'm interested in is st_blksize, not i_blksize.  There is also 
f_bsize from statfs, which seems appropriate (though really not since 
the optimal size varies per file).  Randy was the one who looked at cp, 
maybe we should do that again and document it in the wiki for future 
reference, since this seems to come up every few years.

> 
> stat->blksize = orangefs_inode->blksize;
> 
> which is the block size reported by the OrangeFS server (64 kilobytes in
> my single-server environment, but it varies based on how parallel the
> installation is).  This value may differ per file.

That would be correct.  The default strip size is 64K, and if there is 
only one server, then the stripe size (intended buffer size) would be 
64K X 1 or 64K.  With more servers it would be more.

> 
> Then stat->blocks is set to inode->i_blocks in generic_fillattr, which
> is set to (i_size + (4096 - i_size % 4096))/512.  The 4096 is hardcoded.
> 
> One approach is to set i_blksize to the size reported by the server and
> set i_blocks to (i_size + (512 - i_size % 512))/512.  I intend to remove
> orangefs_inode->blksize.
> 
> Then i_blksize may not equal PAGE_SIZE.  I am unsure of the implications
> of this.
> 
> That leaves the superblock block size.  This ends up being reported as
> i_blksize for directories and symlinks.  OrangeFS only reports a block
> size on regular files.  Four megabytes seems rather large, but I'm not
> sure what else to use.  I could use PAGE_SIZE.
> 
> Another approach is to set blksize to the page size.
> 
> However, I am informed by the OrangeFS server developers that they
> really want this to value to change based on the number of servers the
> file will be striped across.
> 
> With that in mind, I could set stat->blksize based on the server-reported
> value but use PAGE_SIZE for the superblock and non-regular-file objects.

Regular files are the only ones that would need to see the stripe size.
> 
> Do I need to keep i_blksize equal to PAGE_SIZE or may it also be set to
> the server-reported value?
> 
> Do either of these sound more reasonable than the other?  Should I do
> something else entirely?
> 
> Thanks in advance for any advice anyone may be able to give.
> 
> Martin
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH] revamp OrangeFS block sizes
  2018-04-26 16:16 ` Walt Ligon
@ 2018-05-02 18:01   ` Martin Brandenburg
  0 siblings, 0 replies; 3+ messages in thread
From: Martin Brandenburg @ 2018-05-02 18:01 UTC (permalink / raw)
  To: walt, linux-fsdevel, devel, hubcap, ligon; +Cc: Martin Brandenburg

Now the superblock block size is PAGE_SIZE.  The inode block size is
PAGE_SIZE for directories and symlinks, but is the server-reported
block size for regular files.

The block size in the OrangeFS private inode is now deleted.  Stat
now reports PAGE_SIZE for directories and symlinks and the
server-reported block size for regular files.

The user-space visible change is that the block size for directores
and symlinks and the superblock is now PAGE_SIZE rather than the size of
the client-core shared memory buffers, which was typically four
megabytes.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/inode.c           |  6 ++----
 fs/orangefs/orangefs-kernel.h |  1 -
 fs/orangefs/orangefs-utils.c  | 12 ++++--------
 fs/orangefs/super.c           |  4 ++--
 4 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 79c61da8b1bc..b583fbf90665 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -20,8 +20,8 @@ static int read_one_page(struct page *page)
 	int max_block;
 	ssize_t bytes_read = 0;
 	struct inode *inode = page->mapping->host;
-	const __u32 blocksize = PAGE_SIZE;	/* inode->i_blksize */
-	const __u32 blockbits = PAGE_SHIFT;	/* inode->i_blkbits */
+	const __u32 blocksize = PAGE_SIZE;
+	const __u32 blockbits = PAGE_SHIFT;
 	struct iov_iter to;
 	struct bio_vec bv = {.bv_page = page, .bv_len = PAGE_SIZE};
 
@@ -262,7 +262,6 @@ int orangefs_getattr(const struct path *path, struct kstat *stat,
 
 		/* override block size reported to stat */
 		orangefs_inode = ORANGEFS_I(inode);
-		stat->blksize = orangefs_inode->blksize;
 
 		if (request_mask & STATX_SIZE)
 			stat->result_mask = STATX_BASIC_STATS;
@@ -325,7 +324,6 @@ static int orangefs_init_iops(struct inode *inode)
 	case S_IFREG:
 		inode->i_op = &orangefs_file_inode_operations;
 		inode->i_fop = &orangefs_file_operations;
-		inode->i_blkbits = PAGE_SHIFT;
 		break;
 	case S_IFLNK:
 		inode->i_op = &orangefs_symlink_inode_operations;
diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index c29bb0ebc6bb..004511617b6d 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -182,7 +182,6 @@ static inline void set_op_state_purged(struct orangefs_kernel_op_s *op)
 struct orangefs_inode_s {
 	struct orangefs_object_kref refn;
 	char link_target[ORANGEFS_NAME_MAX];
-	__s64 blksize;
 	/*
 	 * Reading/Writing Extended attributes need to acquire the appropriate
 	 * reader/writer semaphore on the orangefs_inode_s structure.
diff --git a/fs/orangefs/orangefs-utils.c b/fs/orangefs/orangefs-utils.c
index 00fadaf0da8f..89729040c5b4 100644
--- a/fs/orangefs/orangefs-utils.c
+++ b/fs/orangefs/orangefs-utils.c
@@ -275,7 +275,7 @@ int orangefs_inode_getattr(struct inode *inode, int new, int bypass,
 {
 	struct orangefs_inode_s *orangefs_inode = ORANGEFS_I(inode);
 	struct orangefs_kernel_op_s *new_op;
-	loff_t inode_size, rounded_up_size;
+	loff_t inode_size;
 	int ret, type;
 
 	gossip_debug(GOSSIP_UTILS_DEBUG, "%s: called on inode %pU\n", __func__,
@@ -330,22 +330,19 @@ int orangefs_inode_getattr(struct inode *inode, int new, int bypass,
 		if (request_mask & STATX_SIZE || new) {
 			inode_size = (loff_t)new_op->
 			    downcall.resp.getattr.attributes.size;
-			rounded_up_size =
-			    (inode_size + (4096 - (inode_size % 4096)));
 			inode->i_size = inode_size;
-			orangefs_inode->blksize =
-			    new_op->downcall.resp.getattr.attributes.blksize;
+			inode->i_blkbits = ffs(new_op->downcall.resp.getattr.
+			    attributes.blksize);
 			spin_lock(&inode->i_lock);
 			inode->i_bytes = inode_size;
 			inode->i_blocks =
-			    (unsigned long)(rounded_up_size / 512);
+			    (inode_size + 512 - inode_size % 512)/512;
 			spin_unlock(&inode->i_lock);
 		}
 		break;
 	case S_IFDIR:
 		if (request_mask & STATX_SIZE || new) {
 			inode->i_size = PAGE_SIZE;
-			orangefs_inode->blksize = i_blocksize(inode);
 			spin_lock(&inode->i_lock);
 			inode_set_bytes(inode, inode->i_size);
 			spin_unlock(&inode->i_lock);
@@ -356,7 +353,6 @@ int orangefs_inode_getattr(struct inode *inode, int new, int bypass,
 		if (new) {
 			inode->i_size = (loff_t)strlen(new_op->
 			    downcall.resp.getattr.link_target);
-			orangefs_inode->blksize = i_blocksize(inode);
 			ret = strscpy(orangefs_inode->link_target,
 			    new_op->downcall.resp.getattr.link_target,
 			    ORANGEFS_NAME_MAX);
diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
index 3ae5fdba0225..b802486aab0a 100644
--- a/fs/orangefs/super.c
+++ b/fs/orangefs/super.c
@@ -423,8 +423,8 @@ static int orangefs_fill_sb(struct super_block *sb,
 	sb->s_op = &orangefs_s_ops;
 	sb->s_d_op = &orangefs_dentry_operations;
 
-	sb->s_blocksize = orangefs_bufmap_size_query();
-	sb->s_blocksize_bits = orangefs_bufmap_shift_query();
+	sb->s_blocksize = PAGE_SIZE;
+	sb->s_blocksize_bits = PAGE_SHIFT;
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
 
 	root_object.khandle = ORANGEFS_SB(sb)->root_khandle;
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-05-02 18:01 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-25 17:27 [RFC] OrangeFS blocksize in superblock and inode martin
2018-04-26 16:16 ` Walt Ligon
2018-05-02 18:01   ` [PATCH] revamp OrangeFS block sizes Martin Brandenburg

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.