All of lore.kernel.org
 help / color / mirror / Atom feed
* Disentangling address_space and inode
@ 2020-06-09 12:41 Matthew Wilcox
  2020-06-10  9:27   ` [Cluster-devel] " Steven Whitehouse
  0 siblings, 1 reply; 3+ messages in thread
From: Matthew Wilcox @ 2020-06-09 12:41 UTC (permalink / raw)
  To: linux-fsdevel

I have a modest proposal ...

 struct inode {
-	struct address_space i_data;
 }

+struct minode {
+	struct inode i;
+	struct address_space m;
+};

 struct address_space {
-	struct inode *host;
 }

This saves one pointer per inode, and cuts all the pagecache support
from inodes which don't need to have a page cache (symlinks, directories,
pipes, sockets, char devices).

This was born from the annoyance of going from a struct page to a filesystem:
page->mapping->host->i_sb->s_type

That's four pointer dereferences.  This would bring it down to three:
i_host(page->mapping)->i_sb->s_type

I could see (eventually) interfaces changing to pass around a
struct minode *mapping instead of a struct address_space *mapping.  But
I know mapping->host and inode->i_mapping sometimes have some pretty
weird relationships and maybe there's a legitimate usage that can't be
handled by this change.

Every filesystem which does use the page cache would have to be changed
to use a minode instead of an inode, which is why this proposal is so
very modest.  But before I start looking into it properly, I thought
somebody might know why this isn't going to work.

I know about raw devices:
                file_inode(filp)->i_mapping =
                        bdev->bd_inode->i_mapping;

and this seems like it should work for that.  I know about coda:
                coda_inode->i_mapping = host_inode->i_mapping;

and this seems like it should work there too.

DAX just seems confused:
        inode->i_mapping = __dax_inode->i_mapping;
        inode->i_mapping->host = __dax_inode;
        inode->i_mapping->a_ops = &dev_dax_aops;

GFS2 might need to embed an entire minode instead of just a mapping in its
glocks and its superblock:
fs/gfs2/glock.c:                mapping->host = s->s_bdev->bd_inode;
fs/gfs2/ops_fstype.c:   mapping->host = sb->s_bdev->bd_inode;

NILFS ... I don't understand at all.  It seems to allocate its own
private address space in nilfs_inode_info instead of using i_data (why?)
and also allocate more address spaces for metadata inodes.
fs/nilfs2/page.c:       mapping->host = inode;

So that will need to be understood, but is there a fundamental reason
this won't work?

Advantages:
 - Eliminates a pointer dereference when moving from mapping to host
 - Shrinks all inodes by one pointer
 - Shrinks inodes used for symlinks, directories, sockets, pipes & char
   devices by an entire struct address_space.

Disadvantages:
 - Churn
 - Seems like it'll grow a few data structures in less common filesystems
   (but may be important for some users)

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Disentangling address_space and inode
  2020-06-09 12:41 Disentangling address_space and inode Matthew Wilcox
@ 2020-06-10  9:27   ` Steven Whitehouse
  0 siblings, 0 replies; 3+ messages in thread
From: Steven Whitehouse @ 2020-06-10  9:27 UTC (permalink / raw)
  To: Matthew Wilcox, linux-fsdevel
  Cc: cluster-devel, Miklos Szeredi, Andreas Gruenbacher, Bob Peterson

Hi,

On 09/06/2020 13:41, Matthew Wilcox wrote:
> I have a modest proposal ...
>
>   struct inode {
> -	struct address_space i_data;
>   }
>
> +struct minode {
> +	struct inode i;
> +	struct address_space m;
> +};
>
>   struct address_space {
> -	struct inode *host;
>   }
>
> This saves one pointer per inode, and cuts all the pagecache support
> from inodes which don't need to have a page cache (symlinks, directories,
> pipes, sockets, char devices).
>
> This was born from the annoyance of going from a struct page to a filesystem:
> page->mapping->host->i_sb->s_type
>
> That's four pointer dereferences.  This would bring it down to three:
> i_host(page->mapping)->i_sb->s_type
>
> I could see (eventually) interfaces changing to pass around a
> struct minode *mapping instead of a struct address_space *mapping.  But
> I know mapping->host and inode->i_mapping sometimes have some pretty
> weird relationships and maybe there's a legitimate usage that can't be
> handled by this change.
>
> Every filesystem which does use the page cache would have to be changed
> to use a minode instead of an inode, which is why this proposal is so
> very modest.  But before I start looking into it properly, I thought
> somebody might know why this isn't going to work.
>
> I know about raw devices:
>                  file_inode(filp)->i_mapping =
>                          bdev->bd_inode->i_mapping;
>
> and this seems like it should work for that.  I know about coda:
>                  coda_inode->i_mapping = host_inode->i_mapping;
>
> and this seems like it should work there too.
>
> DAX just seems confused:
>          inode->i_mapping = __dax_inode->i_mapping;
>          inode->i_mapping->host = __dax_inode;
>          inode->i_mapping->a_ops = &dev_dax_aops;
>
> GFS2 might need to embed an entire minode instead of just a mapping in its
> glocks and its superblock:
> fs/gfs2/glock.c:                mapping->host = s->s_bdev->bd_inode;
> fs/gfs2/ops_fstype.c:   mapping->host = sb->s_bdev->bd_inode;

I don't think that will scale. We did gain a big reduction in overhead 
for each cached inode when we stopped using two struct inodes and just 
embedded an address_space in the glock. However, I'm fairly sure that 
for the glock address_space case, we already have our own way to find 
the associated inode. So it might well be ok to do this anyway, and not 
need to embed a full minode.

Also, if there was a better way to track metadata on a per inode basis, 
then that would be an even better solution, but a much bigger project too.

The issue that you might run across is for stacked filesystems... will 
you land up finding the correct layer in the stack?

Steve.


>
> NILFS ... I don't understand at all.  It seems to allocate its own
> private address space in nilfs_inode_info instead of using i_data (why?)
> and also allocate more address spaces for metadata inodes.
> fs/nilfs2/page.c:       mapping->host = inode;
>
> So that will need to be understood, but is there a fundamental reason
> this won't work?
>
> Advantages:
>   - Eliminates a pointer dereference when moving from mapping to host
>   - Shrinks all inodes by one pointer
>   - Shrinks inodes used for symlinks, directories, sockets, pipes & char
>     devices by an entire struct address_space.
>
> Disadvantages:
>   - Churn
>   - Seems like it'll grow a few data structures in less common filesystems
>     (but may be important for some users)
>


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Cluster-devel] Disentangling address_space and inode
@ 2020-06-10  9:27   ` Steven Whitehouse
  0 siblings, 0 replies; 3+ messages in thread
From: Steven Whitehouse @ 2020-06-10  9:27 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

On 09/06/2020 13:41, Matthew Wilcox wrote:
> I have a modest proposal ...
>
>   struct inode {
> -	struct address_space i_data;
>   }
>
> +struct minode {
> +	struct inode i;
> +	struct address_space m;
> +};
>
>   struct address_space {
> -	struct inode *host;
>   }
>
> This saves one pointer per inode, and cuts all the pagecache support
> from inodes which don't need to have a page cache (symlinks, directories,
> pipes, sockets, char devices).
>
> This was born from the annoyance of going from a struct page to a filesystem:
> page->mapping->host->i_sb->s_type
>
> That's four pointer dereferences.  This would bring it down to three:
> i_host(page->mapping)->i_sb->s_type
>
> I could see (eventually) interfaces changing to pass around a
> struct minode *mapping instead of a struct address_space *mapping.  But
> I know mapping->host and inode->i_mapping sometimes have some pretty
> weird relationships and maybe there's a legitimate usage that can't be
> handled by this change.
>
> Every filesystem which does use the page cache would have to be changed
> to use a minode instead of an inode, which is why this proposal is so
> very modest.  But before I start looking into it properly, I thought
> somebody might know why this isn't going to work.
>
> I know about raw devices:
>                  file_inode(filp)->i_mapping =
>                          bdev->bd_inode->i_mapping;
>
> and this seems like it should work for that.  I know about coda:
>                  coda_inode->i_mapping = host_inode->i_mapping;
>
> and this seems like it should work there too.
>
> DAX just seems confused:
>          inode->i_mapping = __dax_inode->i_mapping;
>          inode->i_mapping->host = __dax_inode;
>          inode->i_mapping->a_ops = &dev_dax_aops;
>
> GFS2 might need to embed an entire minode instead of just a mapping in its
> glocks and its superblock:
> fs/gfs2/glock.c:                mapping->host = s->s_bdev->bd_inode;
> fs/gfs2/ops_fstype.c:   mapping->host = sb->s_bdev->bd_inode;

I don't think that will scale. We did gain a big reduction in overhead 
for each cached inode when we stopped using two struct inodes and just 
embedded an address_space in the glock. However, I'm fairly sure that 
for the glock address_space case, we already have our own way to find 
the associated inode. So it might well be ok to do this anyway, and not 
need to embed a full minode.

Also, if there was a better way to track metadata on a per inode basis, 
then that would be an even better solution, but a much bigger project too.

The issue that you might run across is for stacked filesystems... will 
you land up finding the correct layer in the stack?

Steve.


>
> NILFS ... I don't understand at all.  It seems to allocate its own
> private address space in nilfs_inode_info instead of using i_data (why?)
> and also allocate more address spaces for metadata inodes.
> fs/nilfs2/page.c:       mapping->host = inode;
>
> So that will need to be understood, but is there a fundamental reason
> this won't work?
>
> Advantages:
>   - Eliminates a pointer dereference when moving from mapping to host
>   - Shrinks all inodes by one pointer
>   - Shrinks inodes used for symlinks, directories, sockets, pipes & char
>     devices by an entire struct address_space.
>
> Disadvantages:
>   - Churn
>   - Seems like it'll grow a few data structures in less common filesystems
>     (but may be important for some users)
>



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-06-10  9:27 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-09 12:41 Disentangling address_space and inode Matthew Wilcox
2020-06-10  9:27 ` Steven Whitehouse
2020-06-10  9:27   ` [Cluster-devel] " Steven Whitehouse

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.