* Question on migrating data between PVs in xfs
@ 2016-08-09 14:50 Wei Lin
  2016-08-09 22:35 ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: Wei Lin @ 2016-08-09 14:50 UTC (permalink / raw)
  To: xfs

Hi there,

I am working on an XFS-based project and want to modify the allocation
algorithm, which is quite involved. I am wondering if anyone could help
with this.

The high-level goal is to create an XFS filesystem spanning multiple
physical volumes, allow the user to specify the target PV for files, and
migrate files automatically.

I plan to implement the user interface with extended attributes, but am
now stuck on the allocation/migration part. Is there a way to make XFS
respect the attribute, i.e. only allocate blocks/extents from the target
PV specified by the user?
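
For illustration, I imagine the user-facing side looking roughly like
this (the attribute name user.target_pv is just a placeholder I made up;
nothing in XFS interprets it today):

  # tag a file (or directory tree) with the PV its blocks should come from
  setfattr -n user.target_pv -v ssd /mnt/hsd/projects/hot
  getfattr -n user.target_pv /mnt/hsd/projects/hot
  # file: mnt/hsd/projects/hot
  user.target_pv="ssd"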

Any suggestion would be highly appreciated.

Cheers,
-- 
Wei Lin


* Re: Question on migrating data between PVs in xfs
  2016-08-09 14:50 Question on migrating data between PVs in xfs Wei Lin
@ 2016-08-09 22:35 ` Dave Chinner
       [not found]   ` <20160810092313.GA16193@ic>
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2016-08-09 22:35 UTC (permalink / raw)
  To: Wei Lin; +Cc: xfs

On Tue, Aug 09, 2016 at 03:50:47PM +0100, Wei Lin wrote:
> Hi there,
> 
> I am working on an XFS-based project and want to modify the allocation
> algorithm, which is quite involved. I am wondering if anyone could help
> with this.
>
> The high-level goal is to create an XFS filesystem spanning multiple
> physical volumes, allow the user to specify the target PV for files, and
> migrate files automatically.

So, essentially tiered storage with automatic migration. Can you
describe the storage layout and setup you are thinking of using and
how that will map to a single XFS filesystem so we have a better
idea of what you are thinking of?

> I plan to implement the user interface with extended attributes, but am
> now stuck on the allocation/migration part. Is there a way to make XFS
> respect the attribute, i.e. only allocate blocks/extents from the target
> PV specified by the user?

Define "PV".

XFS separates allocation by allocation group - it has no concept of
underlying physical device layout. If I understand what you are saying,
you have multiple "physical volumes" combined into a single block device
(somehow - please describe!) and now you want to control how data is
allocated to those underlying volumes, right?

So what you're asking about is how to define and implement user
controlled allocation policies, right? Sorta like this old
prototype I was working on years ago?

http://oss.sgi.com/archives/xfs/2009-02/msg00250.html

And some more info from a later discussion:

http://oss.sgi.com/archives/xfs/2013-01/msg00611.html

And maybe in conjunction with this, which added groupings of AGs
together to form independent regions of "physical separation" that
the allocator could then be made aware of:

http://oss.sgi.com/archives/xfs/2009-02/msg00253.html

These were more aimed at defining failure domains for error and
corruption isolation:

http://xfs.org/index.php/Reliable_Detection_and_Repair_of_Metadata_Corruption#Failure_Domains

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Question on migrating data between PVs in xfs
       [not found]   ` <20160810092313.GA16193@ic>
@ 2016-08-10 10:56     ` Dave Chinner
  2016-08-10 16:31       ` Emmanuel Florac
  2016-08-11  9:04       ` Wei Lin
  0 siblings, 2 replies; 8+ messages in thread
From: Dave Chinner @ 2016-08-10 10:56 UTC (permalink / raw)
  To: Wei Lin; +Cc: xfs

Hi Wei,

Please keep the discussion on the list unless there's good reason
not to. I've re-added the list cc...

On Wed, Aug 10, 2016 at 10:23:14AM +0100, Wei Lin wrote:
> Hi Dave,
> 
> Thank you very much for the reply. Comment inline.
> 
> On 16-08-10 08:35:03, Dave Chinner wrote:
> > On Tue, Aug 09, 2016 at 03:50:47PM +0100, Wei Lin wrote:
> > > Hi there,
> > > 
> > > I am working on an XFS-based project and want to modify the allocation
> > > algorithm, which is quite involved. I am wondering if anyone could help
> > > with this.
> > >
> > > The high-level goal is to create an XFS filesystem spanning multiple
> > > physical volumes, allow the user to specify the target PV for files, and
> > > migrate files automatically.
> > 
> > So, essentially tiered storage with automatic migration. Can you
> > describe the storage layout and setup you are thinking of using and
> > how that will map to a single XFS filesystem so we have a better
> > idea of what you are thinking of?
> > 
> Yes, but the migration is triggered by the user specifying a device,
> instead of the kernel monitoring the usage pattern.

That's not migration - that's an allocation policy. Migration means
moving data at rest to a different physical location, such as via an
HSM, automatic tiering or defragmentation. Deciding where to write
when the first data is written is the job of the filesystem
allocator, so what you are describing here is a user-controlled
allocation policy.

> By "PV" I meant physical volumes of LVM. Currently I have two physical
> volumes, one based on two SSDs and the other six HDDs.

That's what I thought, but you still need to describe everything in
full rather than assume the reader understands your abbreviations.

> The XFS was
> created as follows:
> 
> mdadm --create /dev/md1  --raid-devices=2 --level=10 -p f2 --bitmap=internal --assume-clean /dev/nvme?n1
> mdadm --create /dev/md2  --raid-devices=6 --level=5 --bitmap=internal --assume-clean /dev/sd[c-h]
> pvcreate /dev/md1
> pvcreate /dev/md2
> vgcreate researchvg /dev/md1 /dev/md2
> lvcreate -n hsd -l 100%FREE researchvg
> mkfs.xfs -L HSD -l internal,lazy-count=1,size=128m /dev/mapper/researchvg-hsd

It's a linear concatenation of multiple separate block devices,
so the physical boundaries are hidden from the filesystem by the
LVM layer.
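
(The concatenation is visible in the device-mapper table, which is the
only place those boundaries exist; illustrative output with made-up
sector counts:

  # dmsetup table researchvg-hsd
  0 1562425344 linear 9:1 2048
  1562425344 23436722176 linear 9:2 2048

i.e. two linear segments, one per underlying md device.)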

Have you looked at using dm-cache instead of modifying the
filesystem?
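
For example, LVM's cache support (dm-cache underneath) could layer your
SSD array in front of the HDD array. A rough sketch only, not a tested
recipe, reusing your /dev/md1 (SSD) and /dev/md2 (HDD) arrays:

  pvcreate /dev/md1 /dev/md2
  vgcreate researchvg /dev/md1 /dev/md2
  # origin LV on the HDDs, cache pool on the SSDs
  lvcreate -n hsd -l 100%PVS researchvg /dev/md2
  lvcreate --type cache-pool -n hsd_cache -l 90%PVS researchvg /dev/md1
  lvconvert --type cache --cachepool researchvg/hsd_cache researchvg/hsd
  mkfs.xfs -L HSD -l internal,size=128m /dev/mapper/researchvg-hsd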

> > > I plan to implement the user interface with extended attributes, but am
> > > now stuck on the allocation/migration part. Is there a way to make XFS
> > > respect the attribute, i.e. only allocate blocks/extents from the target
> > > PV specified by the user?
> > 
> > Define "PV".
> > 
> > XFS separates allocation by allocation group - it has no concept of
> > underlying physical device layout. If I understand what you are saying,
> > you have multiple "physical volumes" combined into a single block device
> > (somehow - please describe!) and now you want to control how data is
> > allocated to those underlying volumes, right?
> 
> I thought about storing the mapping between the physical volumes and the
> logical volume in a special file, probably including meta-information like
> IOPS and access time as well, and consulting this file on the fly to
> determine whether an allocated extent is within the target device.

How does the filesystem determine whether an allocated extent is on
a specific device when it has no knowledge of the underlying
physical device boundaries?

> > So what you're asking about is how to define and implement user
> > controlled allocation policies, right? Sorta like this old
> > prototype I was working on years ago?
> > 
> > http://oss.sgi.com/archives/xfs/2009-02/msg00250.html
> > 
> > And some more info from a later discussion:
> > 
> > http://oss.sgi.com/archives/xfs/2013-01/msg00611.html
> > 
> > And maybe in conjunction with this, which added groupings of AGs
> > together to form independent regions of "physical separation" that
> > the allocator could then be made aware of:
> > 
> > http://oss.sgi.com/archives/xfs/2009-02/msg00253.html
> 
> I am not sure if allocation groups would be a good unit of "physical
> separation".

There is no other construct in XFS designed for that purpose.

> Since the underlying physical devices (and thus the physical
> volumes) have quite different characteristics, physical volumes seem
> like a natural choice.

XFS knows nothing about those boundaries - you have to tell it where
the boundaries are. e.g. size your allocation groups to fit the
smallest physical boundary you have, then assign a different policy
to the user of that allocation group. That's the point of the patch
set that allowed mkfs to define sets of AGs that lay in specific
domains so that the allocator could target them based on the
requirements supplied from the user in the allocation policy (which was
the first patch set I pointed to).
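
For example, if the SSD volume were exactly 1TiB and the HDD volume
10TiB (made-up sizes), and the concatenation puts the SSD PV first, you
could make every AG 256GiB so that AGs 0-3 sit entirely on the SSDs and
AGs 4-43 on the HDDs - the SSD volume size has to be an exact multiple
of the AG size for the boundaries to line up:

  mkfs.xfs -L HSD -d agsize=256g -l internal,size=128m /dev/mapper/researchvg-hsd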

> On the other hand, an allocation group may span
> multiple physical volumes, providing quite different QoS. This is why I
> planned to let users specify target "PV" instead of target allocation
> group. Any ideas?

Go read the code in the patches I pointed to first - they answer
both the questions you are asking right now as these were the
problems that I was looking to solve all that time ago. They will
also answer many questions you haven't yet realised you need to
ask, too.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Question on migrating data between PVs in xfs
  2016-08-10 10:56     ` Dave Chinner
@ 2016-08-10 16:31       ` Emmanuel Florac
  2016-08-10 21:51         ` Dave Chinner
  2016-08-11  9:04       ` Wei Lin
  1 sibling, 1 reply; 8+ messages in thread
From: Emmanuel Florac @ 2016-08-10 16:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs, Wei Lin

On Wed, 10 Aug 2016 20:56:39 +1000,
Dave Chinner <david@fromorbit.com> wrote:

> Have you looked at using dm-cache instead of modifying the
> filesystem?
> 

Or bcache, fcache, or EnhanceIO. So far from my own testing bcache is
significantly faster and dm-cache by far the slowest of the bunch, but
bcache needs some more love (its main developer is busy writing a
new tiered, caching filesystem instead).
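
For reference, a minimal bcache setup over two arrays like Wei's would
be roughly the following (device names are just an example, and udev
normally handles the registration step):

  make-bcache -B /dev/md2 -C /dev/md1
  echo /dev/md2 > /sys/fs/bcache/register
  echo /dev/md1 > /sys/fs/bcache/register
  mkfs.xfs /dev/bcache0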

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: Question on migrating data between PVs in xfs
  2016-08-10 16:31       ` Emmanuel Florac
@ 2016-08-10 21:51         ` Dave Chinner
  2016-08-11  9:26           ` Wei Lin
  2016-08-11 10:44           ` Emmanuel Florac
  0 siblings, 2 replies; 8+ messages in thread
From: Dave Chinner @ 2016-08-10 21:51 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs, Wei Lin

On Wed, Aug 10, 2016 at 06:31:32PM +0200, Emmanuel Florac wrote:
> On Wed, 10 Aug 2016 20:56:39 +1000,
> Dave Chinner <david@fromorbit.com> wrote:
> 
> > Have you looked at using dm-cache instead of modifying the
> > filesystem?
> > 
> 
> Or bcache, fcache, or EnhanceIO. So far from my own testing bcache is
> significantly faster and dm-cache by far the slowest of the bunch, but
> bcache needs some more love (its main developer is busy writing a
> new tiered, caching filesystem instead).

Yeah, the problem with bcache is that it is effectively an orphaned
driver. If there are obvious and reproducible performance
differentials between bcache and dm-cache, you should bring them to
the attention of the dm developers to see if they can fix them...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Question on migrating data between PVs in xfs
  2016-08-10 10:56     ` Dave Chinner
  2016-08-10 16:31       ` Emmanuel Florac
@ 2016-08-11  9:04       ` Wei Lin
  1 sibling, 0 replies; 8+ messages in thread
From: Wei Lin @ 2016-08-11  9:04 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs, Wei Lin

Hi Dave,

Now I see your point. Initially I wanted to store the linear
concatenation layout in a special file and read this file during
allocation to compute on the fly whether an extent falls into a given
physical volume. But now I agree that, from the perspective of OS
engineering, filesystems should not know the underlying layout, at
least not in such an ad-hoc way. Aligning AGs to physical volumes and
applying allocation policies might be the best approach.

Thank you very much for the help.

Cheers,
-- 
Wei Lin


* Re: Question on migrating data between PVs in xfs
  2016-08-10 21:51         ` Dave Chinner
@ 2016-08-11  9:26           ` Wei Lin
  2016-08-11 10:44           ` Emmanuel Florac
  1 sibling, 0 replies; 8+ messages in thread
From: Wei Lin @ 2016-08-11  9:26 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs, Wei Lin

On 16-08-11 07:51:49, Dave Chinner wrote:
> On Wed, Aug 10, 2016 at 06:31:32PM +0200, Emmanuel Florac wrote:
> > On Wed, 10 Aug 2016 20:56:39 +1000,
> > Dave Chinner <david@fromorbit.com> wrote:
> > 
> > > Have you looked at using dm-cache instead of modifying the
> > > filesystem?
> > > 
> > 
> > Or bcache, fcache, or EnhanceIO. So far from my own testing bcache is
> > significantly faster and dm-cache by far the slowest of the bunch, but
> > bcache needs some more love (its main developer is busy writing a
> > new tiered, caching filesystem instead).
> 
> Yeah, the problem with bcache is that it is effectively an orphaned
> driver. If there are obvious and reproducible performance
> differentials between bcache and dm-cache, you should bring them to
> the attention of the dm developers to see if they can fix them...
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Software like dm-cache and bcache seems to use SSDs merely as caches
instead of aggregating the capacity of all the devices. However, I just
found aufs and overlayfs, which conceptually suit the purpose better.
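
Something like the following is roughly what I have in mind, with the
SSD volume holding the writable upper layer (paths and LV names are
only illustrative, and it would mean splitting the two PVs into two
separate filesystems rather than one):

  mount -t xfs /dev/mapper/researchvg-hdd /mnt/hdd
  mount -t xfs /dev/mapper/researchvg-ssd /mnt/ssd
  mkdir -p /mnt/ssd/upper /mnt/ssd/work
  mount -t overlay overlay \
        -o lowerdir=/mnt/hdd,upperdir=/mnt/ssd/upper,workdir=/mnt/ssd/work \
        /mnt/hsd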

Cheers,
-- 
Wei Lin


* Re: Question on migrating data between PVs in xfs
  2016-08-10 21:51         ` Dave Chinner
  2016-08-11  9:26           ` Wei Lin
@ 2016-08-11 10:44           ` Emmanuel Florac
  1 sibling, 0 replies; 8+ messages in thread
From: Emmanuel Florac @ 2016-08-11 10:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs, Wei Lin

On Thu, 11 Aug 2016 07:51:49 +1000,
Dave Chinner <david@fromorbit.com> wrote:

> On Wed, Aug 10, 2016 at 06:31:32PM +0200, Emmanuel Florac wrote:
> > On Wed, 10 Aug 2016 20:56:39 +1000,
> > Dave Chinner <david@fromorbit.com> wrote:
> >   
> > > Have you looked at using dm-cache instead of modifying the
> > > filesystem?
> > >   
> > 
> > Or bcache, fcache, or EnhanceIO. So far from my own testing bcache
> > is significantly faster and dm-cache by far the slowest of the
> > bunch, but bcache needs some more love (its main developer is
> > busy writing a new tiered, caching filesystem instead).
> 
> Yeah, the problem with bcache is that it is effectively an orphaned
> driver. If there are obvious and reproducible performance
> differentials between bcache and dm-cache, you should bring them to
> the attention of the dm developers to see if they can fix them...

Good idea. Well, bcache may be orphaned by its main developer, but
others still submit quite a lot of stability patches (among them
Christoph Hellwig, who is also active here IIRC).

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------

