All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy
@ 2013-04-10 18:24 Mark Tinguely
  2013-04-10 18:56 ` Eric Sandeen
  2013-04-11  3:00 ` Dave Chinner
  0 siblings, 2 replies; 9+ messages in thread
From: Mark Tinguely @ 2013-04-10 18:24 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: xfs-reserve-parent-and-allocation-fields-in-inode.patch --]
[-- Type: text/plain, Size: 3118 bytes --]

Reserve fields in new inode layout for parent pointer and
allocation policy.
		----
The inode will hold the parent information for the first
link to a file. Information for the other links will be
held in extended attribute entries.

The "di_parino" is the inode of the parent directory. The
directory information for this entry is located the parent
directory with "di_paroff" offset. 

The di_parino/di_paroff concept code is running.
		----
The "di_allocpolicy" will be used to remember the allocation
policy associated with this inode.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
---
 fs/xfs/xfs_dinode.h |    3 +++
 fs/xfs/xfs_inode.c  |    6 ++++++
 fs/xfs/xfs_inode.h  |    3 +++
 3 files changed, 12 insertions(+)

Index: b/fs/xfs/xfs_dinode.h
===================================================================
--- a/fs/xfs/xfs_dinode.h
+++ b/fs/xfs/xfs_dinode.h
@@ -76,6 +76,9 @@ typedef struct xfs_dinode {
 	__be64		di_changecount;	/* number of attribute changes */
 	__be64		di_lsn;		/* flush sequence */
 	__be64		di_flags2;	/* more random flags */
+	__be64		di_parino;	/* parent inode */
+	__be32		di_paroff;	/* offset into parent directory */
+	__be32		di_allocpolicy;	/* allocation policy number */
 	__u8		di_pad2[16];	/* more padding for future expansion */
 
 	/* fields only written to during inode creation */
Index: b/fs/xfs/xfs_inode.c
===================================================================
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -875,6 +875,9 @@ xfs_dinode_from_disk(
 		to->di_flags2 = be64_to_cpu(from->di_flags2);
 		to->di_ino = be64_to_cpu(from->di_ino);
 		to->di_lsn = be64_to_cpu(from->di_lsn);
+		to->di_parino = be64_to_cpu(from->di_parino);
+		to->di_paroff = be32_to_cpu(from->di_paroff);
+		to->di_allocpolicy = be32_to_cpu(from->di_allocpolicy);
 		memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
 		uuid_copy(&to->di_uuid, &from->di_uuid);
 	}
@@ -922,6 +925,9 @@ xfs_dinode_to_disk(
 		to->di_flags2 = cpu_to_be64(from->di_flags2);
 		to->di_ino = cpu_to_be64(from->di_ino);
 		to->di_lsn = cpu_to_be64(from->di_lsn);
+		to->di_parino = cpu_to_be64(from->di_parino);
+		to->di_paroff = cpu_to_be32(from->di_paroff);
+		to->di_allocpolicy = cpu_to_be32(from->di_allocpolicy);
 		memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
 		uuid_copy(&to->di_uuid, &from->di_uuid);
 	}
Index: b/fs/xfs/xfs_inode.h
===================================================================
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -159,6 +159,9 @@ typedef struct xfs_icdinode {
 	__uint64_t	di_changecount;	/* number of attribute changes */
 	xfs_lsn_t	di_lsn;		/* flush sequence */
 	__uint64_t	di_flags2;	/* more random flags */
+	xfs_ino_t	di_parino;	/* parent inode */
+	__uint32_t	di_paroff;	/* offset into parent directory */
+	__uint32_t	di_allocpolicy;	/* allocation policy number */
 	__uint8_t	di_pad2[16];	/* more padding for future expansion */
 
 	/* fields only written to during inode creation */


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy
  2013-04-10 18:24 [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy Mark Tinguely
@ 2013-04-10 18:56 ` Eric Sandeen
  2013-04-10 19:39   ` Rich Johnston
  2013-04-11  3:00 ` Dave Chinner
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2013-04-10 18:56 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On 4/10/13 1:24 PM, Mark Tinguely wrote:
> Reserve fields in new inode layout for parent pointer and
> allocation policy.
> 		----
> The inode will hold the parent information for the first
> link to a file. Information for the other links will be
> held in extended attribute entries.
> 
> The "di_parino" is the inode of the parent directory. The
> directory information for this entry is located the parent
> directory with "di_paroff" offset. 
> 
> The di_parino/di_paroff concept code is running.
> 		----
> The "di_allocpolicy" will be used to remember the allocation
> policy associated with this inode.

can you say more about this allocation policy?

-Eric

> Signed-off-by: Mark Tinguely <tinguely@sgi.com>
> ---
>  fs/xfs/xfs_dinode.h |    3 +++
>  fs/xfs/xfs_inode.c  |    6 ++++++
>  fs/xfs/xfs_inode.h  |    3 +++
>  3 files changed, 12 insertions(+)
> 
> Index: b/fs/xfs/xfs_dinode.h
> ===================================================================
> --- a/fs/xfs/xfs_dinode.h
> +++ b/fs/xfs/xfs_dinode.h
> @@ -76,6 +76,9 @@ typedef struct xfs_dinode {
>  	__be64		di_changecount;	/* number of attribute changes */
>  	__be64		di_lsn;		/* flush sequence */
>  	__be64		di_flags2;	/* more random flags */
> +	__be64		di_parino;	/* parent inode */
> +	__be32		di_paroff;	/* offset into parent directory */
> +	__be32		di_allocpolicy;	/* allocation policy number */
>  	__u8		di_pad2[16];	/* more padding for future expansion */
>  
>  	/* fields only written to during inode creation */
> Index: b/fs/xfs/xfs_inode.c
> ===================================================================
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -875,6 +875,9 @@ xfs_dinode_from_disk(
>  		to->di_flags2 = be64_to_cpu(from->di_flags2);
>  		to->di_ino = be64_to_cpu(from->di_ino);
>  		to->di_lsn = be64_to_cpu(from->di_lsn);
> +		to->di_parino = be64_to_cpu(from->di_parino);
> +		to->di_paroff = be32_to_cpu(from->di_paroff);
> +		to->di_allocpolicy = be32_to_cpu(from->di_allocpolicy);
>  		memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
>  		uuid_copy(&to->di_uuid, &from->di_uuid);
>  	}
> @@ -922,6 +925,9 @@ xfs_dinode_to_disk(
>  		to->di_flags2 = cpu_to_be64(from->di_flags2);
>  		to->di_ino = cpu_to_be64(from->di_ino);
>  		to->di_lsn = cpu_to_be64(from->di_lsn);
> +		to->di_parino = cpu_to_be64(from->di_parino);
> +		to->di_paroff = cpu_to_be32(from->di_paroff);
> +		to->di_allocpolicy = cpu_to_be32(from->di_allocpolicy);
>  		memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
>  		uuid_copy(&to->di_uuid, &from->di_uuid);
>  	}
> Index: b/fs/xfs/xfs_inode.h
> ===================================================================
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -159,6 +159,9 @@ typedef struct xfs_icdinode {
>  	__uint64_t	di_changecount;	/* number of attribute changes */
>  	xfs_lsn_t	di_lsn;		/* flush sequence */
>  	__uint64_t	di_flags2;	/* more random flags */
> +	xfs_ino_t	di_parino;	/* parent inode */
> +	__uint32_t	di_paroff;	/* offset into parent directory */
> +	__uint32_t	di_allocpolicy;	/* allocation policy number */
>  	__uint8_t	di_pad2[16];	/* more padding for future expansion */
>  
>  	/* fields only written to during inode creation */
> 
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy
  2013-04-10 18:56 ` Eric Sandeen
@ 2013-04-10 19:39   ` Rich Johnston
  2013-04-10 20:31     ` Mark Tinguely
  0 siblings, 1 reply; 9+ messages in thread
From: Rich Johnston @ 2013-04-10 19:39 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Mark Tinguely, xfs

On 04/10/2013 01:56 PM, Eric Sandeen wrote:
> On 4/10/13 1:24 PM, Mark Tinguely wrote:
>> Reserve fields in new inode layout for parent pointer and
>> allocation policy.
>> 		----
>> The inode will hold the parent information for the first
>> link to a file. Information for the other links will be
>> held in extended attribute entries.
>>
>> The "di_parino" is the inode of the parent directory. The
>> directory information for this entry is located the parent
>> directory with "di_paroff" offset.
>>
>> The di_parino/di_paroff concept code is running.
>> 		----
>> The "di_allocpolicy" will be used to remember the allocation
>> policy associated with this inode.
>
> can you say more about this allocation policy?
>
> -Eric

No its super secret information. ;)

Its on my plate Eric, because Mark was making a change for parent ptrs, 
I asked him to request space for allocation policies also.

I don't have all the details yet but here is a very high level concept.

Identify allocation groups by names (or numbers -- preferably using names
in user-visible areas), allowing many different areas. Placing the 
allocation
policy outside of user programs is necessary for this to be successful.

  Current thoughts on proposed a layered allocation policies:

     Policy for the entire filesystem
     Policy attached to a directory (whose policy would be inherited by 
subdirectories when subdirectories are created)
     Policy for a single file

The policy would define:

    where to place file data
    where to place metadata for the files.
    a prefered allocation group for placing file data (for directories).

--Rich

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy
  2013-04-10 19:39   ` Rich Johnston
@ 2013-04-10 20:31     ` Mark Tinguely
  2013-04-10 20:40       ` Eric Sandeen
  2013-04-11  3:28       ` Dave Chinner
  0 siblings, 2 replies; 9+ messages in thread
From: Mark Tinguely @ 2013-04-10 20:31 UTC (permalink / raw)
  To: Rich Johnston; +Cc: Eric Sandeen, xfs

On 04/10/13 14:39, Rich Johnston wrote:
> On 04/10/2013 01:56 PM, Eric Sandeen wrote:
>> On 4/10/13 1:24 PM, Mark Tinguely wrote:
>>> Reserve fields in new inode layout for parent pointer and
>>> allocation policy.
>>> ----
>>> The inode will hold the parent information for the first
>>> link to a file. Information for the other links will be
>>> held in extended attribute entries.
>>>
>>> The "di_parino" is the inode of the parent directory. The
>>> directory information for this entry is located the parent
>>> directory with "di_paroff" offset.
>>>
>>> The di_parino/di_paroff concept code is running.
>>> ----
>>> The "di_allocpolicy" will be used to remember the allocation
>>> policy associated with this inode.
>>
>> can you say more about this allocation policy?
>>
>> -Eric
>
> No its super secret information. ;)
>
> Its on my plate Eric, because Mark was making a change for parent ptrs,
> I asked him to request space for allocation policies also.
>
> I don't have all the details yet but here is a very high level concept.
>
> Identify allocation groups by names (or numbers -- preferably using names
> in user-visible areas), allowing many different areas. Placing the
> allocation
> policy outside of user programs is necessary for this to be successful.
>
> Current thoughts on proposed a layered allocation policies:
>
> Policy for the entire filesystem
> Policy attached to a directory (whose policy would be inherited by
> subdirectories when subdirectories are created)
> Policy for a single file
>
> The policy would define:
>
> where to place file data
> where to place metadata for the files.
> a prefered allocation group for placing file data (for directories).
>
> --Rich
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

The allocation policies is based on work by Dave:

	http://oss.sgi.com/archives/xfs/2009-02/msg00250.html

--Mark.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy
  2013-04-10 20:31     ` Mark Tinguely
@ 2013-04-10 20:40       ` Eric Sandeen
  2013-04-11  3:28       ` Dave Chinner
  1 sibling, 0 replies; 9+ messages in thread
From: Eric Sandeen @ 2013-04-10 20:40 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: Rich Johnston, xfs

On 4/10/13 3:31 PM, Mark Tinguely wrote:
> On 04/10/13 14:39, Rich Johnston wrote:
>> On 04/10/2013 01:56 PM, Eric Sandeen wrote:
>>> On 4/10/13 1:24 PM, Mark Tinguely wrote:
>>>> Reserve fields in new inode layout for parent pointer and
>>>> allocation policy.
>>>> ----
>>>> The inode will hold the parent information for the first
>>>> link to a file. Information for the other links will be
>>>> held in extended attribute entries.
>>>>
>>>> The "di_parino" is the inode of the parent directory. The
>>>> directory information for this entry is located the parent
>>>> directory with "di_paroff" offset.
>>>>
>>>> The di_parino/di_paroff concept code is running.
>>>> ----
>>>> The "di_allocpolicy" will be used to remember the allocation
>>>> policy associated with this inode.
>>>
>>> can you say more about this allocation policy?
>>>
>>> -Eric
>>
>> No its super secret information. ;)
>>
>> Its on my plate Eric, because Mark was making a change for parent ptrs,
>> I asked him to request space for allocation policies also.
>>
>> I don't have all the details yet but here is a very high level concept.
>>
>> Identify allocation groups by names (or numbers -- preferably using names
>> in user-visible areas), allowing many different areas. Placing the
>> allocation
>> policy outside of user programs is necessary for this to be successful.
>>
>> Current thoughts on proposed a layered allocation policies:
>>
>> Policy for the entire filesystem
>> Policy attached to a directory (whose policy would be inherited by
>> subdirectories when subdirectories are created)
>> Policy for a single file
>>
>> The policy would define:
>>
>> where to place file data
>> where to place metadata for the files.
>> a prefered allocation group for placing file data (for directories).
>>
>> --Rich
>>

> 
> The allocation policies is based on work by Dave:
> 
>     http://oss.sgi.com/archives/xfs/2009-02/msg00250.html
> 
> --Mark.
> 

Great, thanks guys.  I had forgotten about that TBH.  Just wondered
if you had it in the works, or reserving just in case, or . . . 

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy
  2013-04-10 18:24 [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy Mark Tinguely
  2013-04-10 18:56 ` Eric Sandeen
@ 2013-04-11  3:00 ` Dave Chinner
  2013-04-11 13:54   ` Mark Tinguely
  1 sibling, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2013-04-11  3:00 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On Wed, Apr 10, 2013 at 01:24:24PM -0500, Mark Tinguely wrote:
> Reserve fields in new inode layout for parent pointer and
> allocation policy.

Not without a design review - there is no way we are going to
blindly reserve space in any on-disk structure without first having
a solid design, published proof-of-concept code and agreement that
the format changes being proposed are the right way to proceed.

Where are the design docs/code for these features?

> 		----
> The inode will hold the parent information for the first
> link to a file. Information for the other links will be
> held in extended attribute entries.
> 
> The "di_parino" is the inode of the parent directory. The
> directory information for this entry is located the parent
> directory with "di_paroff" offset. 
> 
> The di_parino/di_paroff concept code is running.

How does it handle hard links? (i.e. multiple parents)

Also, inode number is not enough for unique identification of the
parent - generation number is also required.

FWIW, lets go back to the (was almost finished) parent pointer
code from 2009:

http://oss.sgi.com/archives/xfs/2009-01/msg01068.html

That uses xattrs to store parent information - inode #, generation
#, and a counter - for each parent. It requires a counter because
you can have the same inode hard linked into the same parent
directory multiple times:

typedef struct xfs_parent_eaname_rec {
       __be64  p_ino;
       __be32  p_gen;
       __be32  p_cnt;
} xfs_parent_eaname_rec_t;

And a transaction appended to the the inode create, link, rename and
remove operations to manage the xattrs so all cases involving hard
links work just fine.

Indeed, the single di_parino/di_paroff concept was one of the
original designs considered because of it's simplicity, but it was
rejected because of that very simplicity - it cannot handle common
use cases involving hardlinks.

Release early, release often?

> 		----
> The "di_allocpolicy" will be used to remember the allocation
> policy associated with this inode.

This is exactly what we have padding in the inode for - so that
future additions to the on-disk inode can be added via feature bits
to indicate the fields are present. 

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy
  2013-04-10 20:31     ` Mark Tinguely
  2013-04-10 20:40       ` Eric Sandeen
@ 2013-04-11  3:28       ` Dave Chinner
  1 sibling, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2013-04-11  3:28 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: Eric Sandeen, Rich Johnston, xfs

On Wed, Apr 10, 2013 at 03:31:53PM -0500, Mark Tinguely wrote:
> On 04/10/13 14:39, Rich Johnston wrote:
> >On 04/10/2013 01:56 PM, Eric Sandeen wrote:
> >>On 4/10/13 1:24 PM, Mark Tinguely wrote:
> >>>The "di_allocpolicy" will be used to remember the allocation
> >>>policy associated with this inode.
> >>
> >>can you say more about this allocation policy?
> >>
> >>-Eric
> >
> >No its super secret information. ;)
> >
> >Its on my plate Eric, because Mark was making a change for parent ptrs,
> >I asked him to request space for allocation policies also.
> >
> >I don't have all the details yet but here is a very high level concept.

The on-disk format is a low level detail that falls out at the
bottom of the design/implementation/review cycle. It doesn't get
defined by high level concepts...

> >Identify allocation groups by names (or numbers -- preferably using names
> >in user-visible areas), allowing many different areas. Placing the
> >allocation
> >policy outside of user programs is necessary for this to be successful.
> >
> >Current thoughts on proposed a layered allocation policies:
> >
> >Policy for the entire filesystem
> >Policy attached to a directory (whose policy would be inherited by
> >subdirectories when subdirectories are created)
> >Policy for a single file
> >
> >The policy would define:
> >
> >where to place file data
> >where to place metadata for the files.
> >a prefered allocation group for placing file data (for directories).

Which is a summary of what this code:

> The allocation policies is based on work by Dave:
> 
> 	http://oss.sgi.com/archives/xfs/2009-02/msg00250.html

started with and was building on.

What I was trying to get to in that patch series was an arbitrarily
extensible allocation policy infrastructure, and that patch set was
proof-of-concept code I used to flesh out ideas.  Yes, it used 32
bits in the inode, but keep in mind that changed several times in
the patch set as I implemented new stuff and changed heirarchies,
definitions and concepts mid-patchset. But it isn't a reference
design - it was a research vehicle.....

Indeed, the patch set is an exact demonstration of the functionality
required by the on-disk format not being properly understood until
the functionality has at been fully implemented in a POC. And I
hadn't got anywhere near that with the above patch set.

As it is, I think that an extended attribute is be a better place
for allocation policy information. An xattr is far easier to modify
from userspace, and allows arbitrary allocator primitives being
exposed to policy control. That was the problem with the above patch
set - it didn't expose policy controls individually - it exposed
them as a defined set, and only the sets defined by the kernel were
possible.

As such, direct control of locality is simply not possible with the
above patch set. Having an attribute format that defines all the
different control parameters allows userspace to define complex
policies that match the specific storage topology of the system the
policy is designed for. This is impossible to express in a standard
kernel with the approach I was taking in the above patch set, and i
was already thinking about xattr based primitives to allow
fine-grained exposure of the allocation primitives I'd abstracted
out of the kernel code...

IOWs, We need to have agreement on design and implementation
direction of a feature before we consider what the on-disk
format is going to be, so reserving space on disk is extremely
premature at this point....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy
  2013-04-11  3:00 ` Dave Chinner
@ 2013-04-11 13:54   ` Mark Tinguely
  2013-04-12  0:24     ` Dave Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Tinguely @ 2013-04-11 13:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 04/10/13 22:00, Dave Chinner wrote:
> On Wed, Apr 10, 2013 at 01:24:24PM -0500, Mark Tinguely wrote:
>> Reserve fields in new inode layout for parent pointer and
>> allocation policy.
>
> Not without a design review - there is no way we are going to
> blindly reserve space in any on-disk structure without first having
> a solid design, published proof-of-concept code and agreement that
> the format changes being proposed are the right way to proceed.
>
> Where are the design docs/code for these features?

Then you might want to bump the pad size.

>
>> 		----
>> The inode will hold the parent information for the first
>> link to a file. Information for the other links will be
>> held in extended attribute entries.
>>
>> The "di_parino" is the inode of the parent directory. The
>> directory information for this entry is located the parent
>> directory with "di_paroff" offset.
>>
>> The di_parino/di_paroff concept code is running.
>
> How does it handle hard links? (i.e. multiple parents)

As I stated, the hard links will use extended attribute entries.

>
> Also, inode number is not enough for unique identification of the
> parent - generation number is also required.

Per the 2005 XFS features meeting, the inode and directory offset will 
uniquely describe a link - (thank-you to Glen Overby for that 
observation). The concept code just using one link verifies this fact. 
Using the inode/offset as the identifier is compact and also gives 
information to the file name.

>
> FWIW, lets go back to the (was almost finished) parent pointer
> code from 2009:
>
> http://oss.sgi.com/archives/xfs/2009-01/msg01068.html

yep, read it, studied it, plan to use parts of it but only where needed.

>
> That uses xattrs to store parent information - inode #, generation
> #, and a counter - for each parent. It requires a counter because
> you can have the same inode hard linked into the same parent
> directory multiple times:
>
> typedef struct xfs_parent_eaname_rec {
>         __be64  p_ino;
>         __be32  p_gen;
>         __be32  p_cnt;
> } xfs_parent_eaname_rec_t;
>
> And a transaction appended to the the inode create, link, rename and
> remove operations to manage the xattrs so all cases involving hard
> links work just fine.
>
> Indeed, the single di_parino/di_paroff concept was one of the
> original designs considered because of it's simplicity, but it was
> rejected because of that very simplicity - it cannot handle common
> use cases involving hardlinks.

According to Geoffrey's notes, the hybrid approach was discussed too. 
The single link case will save all the EA operations for the majority of 
the inodes.
>
> Release early, release often?

No, trust but verify.


>
>> 		----
>> The "di_allocpolicy" will be used to remember the allocation
>> policy associated with this inode.
>
> This is exactly what we have padding in the inode for - so that
> future additions to the on-disk inode can be added via feature bits
> to indicate the fields are present.
>
> Cheers,
>
> Dave.


--Mark

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy
  2013-04-11 13:54   ` Mark Tinguely
@ 2013-04-12  0:24     ` Dave Chinner
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2013-04-12  0:24 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On Thu, Apr 11, 2013 at 08:54:20AM -0500, Mark Tinguely wrote:
> On 04/10/13 22:00, Dave Chinner wrote:
> >On Wed, Apr 10, 2013 at 01:24:24PM -0500, Mark Tinguely wrote:
> >>Reserve fields in new inode layout for parent pointer and
> >>allocation policy.
> >
> >Not without a design review - there is no way we are going to
> >blindly reserve space in any on-disk structure without first having
> >a solid design, published proof-of-concept code and agreement that
> >the format changes being proposed are the right way to proceed.
> >
> >Where are the design docs/code for these features?
> 
> Then you might want to bump the pad size.

Not without good reason. Padding is not designed to accomodate new
features that require 16 bytes of core inode space. If you need that
amount of space in the inode, we have EAs for that....

The plan of record for the past 5-6 years has been that parent
pointers require an arbitrary amount of space in the inode so they
need to be entirely EA based. Hence I'm not prepared to agree to an
inode core change for a shiny new parent pointer implementation I
first heard about 2 days ago and know absolutely nothing about.

You need to show everyone the code so we can get some idea of what
benefit putting this information in the inode core will bring us
over a purely EA based implementation.

FWIW, inodes are versioned and it's trivial to bump the inode
version and add feature bit if an inode core format change is
absolutely necessary. From that perspective, I don't see any need to
rush a change at this point in time.

> >>		----
> >>The inode will hold the parent information for the first
> >>link to a file. Information for the other links will be
> >>held in extended attribute entries.
> >>
> >>The "di_parino" is the inode of the parent directory. The
> >>directory information for this entry is located the parent
> >>directory with "di_paroff" offset.
> >>
> >>The di_parino/di_paroff concept code is running.
> >
> >How does it handle hard links? (i.e. multiple parents)
> 
> As I stated, the hard links will use extended attribute entries.

That will add significant complexity of the implementation.

> >Also, inode number is not enough for unique identification of the
> >parent - generation number is also required.
> 
> Per the 2005 XFS features meeting,

I was at those meetings and wasn't allowed to keep a copy of those
notes when I left SGI. If you want to refer to them, you'll need to
post the notes so everyone knows what was discussed. ;)

> the inode and directory offset
> will uniquely describe a link

It doesn't, though. An inode can only be uniquely identified by
the combination of the inode+generation number. The inode number
identifies an inode, but it doesn't identify the lifecycle instance
of the inode that the reference is valid for.

If this parent pointer link is going to be in any way useful for
metadata scrubbing and repair, then it has to identify the instance
of the parent inode that it points to.

Note that I'm not saying that the EA format in the 2009 patch set is
perfect - I think it needs a couple of significant tweaks. What I am
saying is that there is good reasons for the format of information
in that EA.

> - (thank-you to Glen Overby for that
> observation). The concept code just using one link verifies this
> fact. Using the inode/offset as the identifier is compact and also
> gives information to the file name.

Yup, using d_off was also considered, but has problems, too.  The
problem with pointing to the directory offset is that if the
directory is rebuilt for any reason (e.g. by xfs_repair), then the
directory offset in the child inodes is now invalid and needs to
also be fixed.

And then you have the problem of needing multiple parent links for a
single parent directory as you have to instantiate every single hard
link to the same directory. In the current EA patchset, this does
not occur as the parent structure simply counts the number of
references to a given parent directory.

The EA approach just points to the parent directory and keeps a
count of the number of links back to the directory. Hence as long as
repair doesn't reallocate the directory inode, the parent pointer
information remains intact. i.e. There is less dependency between
the objects and so the format is much more robust against unexpected
changes to directory structures.

This concedes the fact that inode number to name resolution needs to
be a userspace problem. This is something we have to conceed because
the presence of namespaces, security models, bind mounts and chroot
environments use top-down path traversal to limit what part of the
filesystem a userspace process can actually see. In-kernel
resolution of reverse path names is a potential vector for
information leakage between namespaces and at it's worst, a provide
a method for escape. Hence there are security concerns about parent
pointers that we have to take into account as well.

The EA approach is also allows more flexibility for potential future
directory features. For example: online defragmentation. If we want defrag
to be able to compact directories into a contiguous address space,
we need to be able to change the offsets of the data entries. We can
do this if there are no open references to the directory at the time
of the defrag. However, if we fix the offsets of directory entries
in the child inode parent pointers, then this form of defrag becomes
impossible to implement as we cannot update the directory and all
the child inodes atomically from a userspace or transactional
POV. IOWs, the parent pointer format containing an exact directory
offset also limits what we can do with directories in future....

> >That uses xattrs to store parent information - inode #, generation
> >#, and a counter - for each parent. It requires a counter because
> >you can have the same inode hard linked into the same parent
> >directory multiple times:
> >
> >typedef struct xfs_parent_eaname_rec {
> >        __be64  p_ino;
> >        __be32  p_gen;
> >        __be32  p_cnt;
> >} xfs_parent_eaname_rec_t;
> >
> >And a transaction appended to the the inode create, link, rename and
> >remove operations to manage the xattrs so all cases involving hard
> >links work just fine.
> >
> >Indeed, the single di_parino/di_paroff concept was one of the
> >original designs considered because of it's simplicity, but it was
> >rejected because of that very simplicity - it cannot handle common
> >use cases involving hardlinks.
> 
> According to Geoffrey's notes, the hybrid approach was discussed
> too. The single link case will save all the EA operations for the
> majority of the inodes.

Sure, those meetings were brain storming about how we *might* solve
a problem, but keep in mind the context of those meetings was that
all the first and second generation XFS gurus had left SGI and in
that vaccuum nobody left at SGI fully understood the problem domains
being discussed.

As such, there were some really crazy ideas discussed and
documented, but that doesn't mean they were the best solution or
even a good idea.  We came up with several possible ways to
implement parent pointers, but the real world sorted them out pretty
quickly after the brain storming sessions.

Indeed, the real world broke the initial implementation of parent
pointers pretty quickly. You may not know the entire history, but
XFS on Irix effectively ended up with a useless, broken parent
pointer implementation because it was implemented quickly based on
assumptions made in those brainstorming seesions meeting.  i.e.
still without fully understanding the problem space. The limitations
the Irix implementation exposed resulted in it being thrown away
completely and redesigned for Linux. IOWs, the EA based patchset is
what evolved from the hard lessons that Irix taught us....

> >Release early, release often?
> 
> No, trust but verify.

Verification can't be done if you don't tell us anything about what
you are doing.  Verification needs to be done at the design/POC
phase just as much as at the final patch review stage, because there
are many parties that have different requirements for the same
feature.

Open source development is supposed to be, well, open. Public. Not
behind closed doors. Design is open, implementation is open,
flamewars aout stuff are out there in public. Nothing is done behind
closed doors.

Saying "we're doing X" and then not telling anyone about it is the
antithesis of OSS. It -excludes- the community from the process of
developing the new features, and prevents them from verifying what
you are doing is suitable for their needs.

IOWs, there is no possibility of "trust, but verify" in OSS without
"release early, release often". You develop trust by being open and
allowing people to review and verify what you are doing at every
stage of development - from POC through to production code.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-04-12  0:25 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-10 18:24 [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy Mark Tinguely
2013-04-10 18:56 ` Eric Sandeen
2013-04-10 19:39   ` Rich Johnston
2013-04-10 20:31     ` Mark Tinguely
2013-04-10 20:40       ` Eric Sandeen
2013-04-11  3:28       ` Dave Chinner
2013-04-11  3:00 ` Dave Chinner
2013-04-11 13:54   ` Mark Tinguely
2013-04-12  0:24     ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.