* Re: HSM
@ 2013-11-11 16:05 bernhard glomm
  0 siblings, 0 replies; 12+ messages in thread
From: bernhard glomm @ 2013-11-11 16:05 UTC (permalink / raw)
  To: ceph-devel



Sorry if this is a dumb idea,
but since this seems to be the beginning of the discussion
about integrating HSM systems, it might still be the right place for it.

Integrating an HSM is certainly interesting, but the price of a commercial
system (like SAMFS) exceeds the budget of a wide range of customers.
The GRAU HSM may meanwhile be available as a potentially affordable
solution, but it still needs its own maintenance.

If you are working on integrating policy management and a copytool
(not ME, I'm ops, not dev ;-), why not take a look at LTFS, which has
come for free since LTO-5 (at least the single-drive version is free,
AFAIK)? It offers a filesystem structure that can be written/read
directly to/from tape with no software beyond the kernel module.

I tend to believe it can't be that much extra effort to not just throw
the files into an existing HSM for archiving,
but instead feed them to LTFS-formatted tapes
that reside in an autochanger.
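
(For illustration: a minimal Python sketch of the archive step such a copytool could perform once a tape is mounted via LTFS. The mount point, tape label and xattr name below are made up; the point is only that a plain file copy is all LTFS needs.)

import os
import shutil

LTFS_MOUNT = "/mnt/ltfs"   # hypothetical mount point of the LTFS-formatted tape
TAPE_LABEL = "TAPE0001"    # hypothetical barcode of that tape

def archive_to_ltfs(path):
    """Copy a file onto the LTFS tape and remember where it went."""
    dest = os.path.join(LTFS_MOUNT, TAPE_LABEL, os.path.basename(path))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.copy2(path, dest)   # a plain copy is enough; LTFS handles the tape format
    # record the tape location on the source file so a later recall can find it
    os.setxattr(path, "user.hsm.location", dest.encode())
    return dest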

That would decrease archiving costs for small businesses by ... A LOT?

Might that be worth a thought?

best regards

Bernhard



* Re: HSM
  2013-11-18 19:22   ` HSM Dmitry Borodaenko
@ 2013-11-20 12:09     ` Malcolm Haak
  0 siblings, 0 replies; 12+ messages in thread
From: Malcolm Haak @ 2013-11-20 12:09 UTC (permalink / raw)
  To: Dmitry Borodaenko, Andreas Joachim Peters; +Cc: Sage Weil, ceph-devel

It is, except it might not be.

DMAPI only works if you are the one in charge of both the HSM and the filesystem.

So for example in a DMF solution the filesystem mounted with DMAPI 
options is on your NFS head node. Your HSM solution is also installed 
there.

Things get a bit more odd when you look at DMAPI plus clustered systems.
You would need HSM agents on every client node, if we are talking about
CephFS, that is.

This is also true with the Lustre solution. The Lustre clients have no 
idea this stuff is happening. This is how it should work. It means the 
current requirement for installed software on the bulk of your clients 
is a working kernel or fuse module.

On 19/11/13 05:22, Dmitry Borodaenko wrote:
> On Tue, Nov 12, 2013 at 1:47 AM, Andreas Joachim Peters
> <Andreas.Joachim.Peters@cern.ch> wrote:
>> I think you need to support the following functionality to support HSM (file not block based):
>>
>> 1 implement a trigger on file creation/modification/deletion
>>
>> 2 store the additional HSM identifier for recall as a file attribute
>>
>> 3 policy based purging of file related blocks (LRU cache etc.)
>>
>> 4 implement an optional trigger to recall a purged file and block the IO (our experience is that automatic recalls are problematic for huge installations if the aggregation window for desired recalls is short since they create inefficient and chaotic access on tapes)
>>
>> 5 either snapshot a file before migration, do an exclusive lock or freeze it to avoid modifications during migration (you need to have a unique enough identifier for a file, either inode/path + checksum or also inode/path + modification time works)
>
> DMAPI seems to be the natural choice for items 1 & 4 above.
>
>> FYI: there was a paper about migration policy scanning performance by IBM two years ago:
>> http://domino.watson.ibm.com/library/CyberDig.nsf/papers/4A50C2D66A1F90F7852578E3005A2034/$File/rj10484.pdf
>
> An important omission in that paper is the exact ILM policy that was
> used to scan the file system. I strongly suspect that it was a
> catch-all policy that matches every file without examining any
> metadata. When you add conditions that check file metadata, scan time
> would increase, probably by a few orders of magnitude.
>


* Re: HSM
  2013-11-12  9:47 ` HSM Andreas Joachim Peters
@ 2013-11-18 19:22   ` Dmitry Borodaenko
  2013-11-20 12:09     ` HSM Malcolm Haak
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Borodaenko @ 2013-11-18 19:22 UTC (permalink / raw)
  To: Andreas Joachim Peters; +Cc: Sage Weil, ceph-devel

On Tue, Nov 12, 2013 at 1:47 AM, Andreas Joachim Peters
<Andreas.Joachim.Peters@cern.ch> wrote:
> I think you need to support the following functionality to support HSM (file not block based):
>
> 1 implement a trigger on file creation/modification/deletion
>
> 2 store the additional HSM identifier for recall as a file attribute
>
> 3 policy based purging of file related blocks (LRU cache etc.)
>
> 4 implement an optional trigger to recall a purged file and block the IO (our experience is that automatic recalls are problematic for huge installations if the aggregation window for desired recalls is short since they create inefficient and chaotic access on tapes)
>
> 5 either snapshot a file before migration, do an exclusive lock or freeze it to avoid modifications during migration (you need to have a unique enough identifier for a file, either inode/path + checksum or also inode/path + modification time works)

DMAPI seems to be the natural choice for items 1 & 4 above.

> FYI: there was a paper about migration policy scanning performance by IBM two years ago:
> http://domino.watson.ibm.com/library/CyberDig.nsf/papers/4A50C2D66A1F90F7852578E3005A2034/$File/rj10484.pdf

An important omission in that paper is the exact ILM policy that was
used to scan the file system. I strongly suspect that it was a
catch-all policy that matches every file without examining any
metadata. When you add conditions that check file metadata, scan time
would increase, probably by a few orders of magnitude.
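
(For illustration, a rough Python sketch of the difference, unrelated to the paper's actual policy engine: a catch-all scan only enumerates names, while a metadata-matching policy needs at least a stat per file, and that per-file metadata access is where the extra time goes.)

import os
import time

def scan(root, predicate=None):
    """Walk the tree; optionally evaluate a per-file metadata predicate."""
    matched = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if predicate is None:
                matched += 1                                # catch-all: no per-file metadata lookup
            else:
                st = os.stat(os.path.join(dirpath, name))   # extra metadata access per file
                if predicate(st):
                    matched += 1
    return matched

# catch-all policy versus "larger than 1 MiB and not accessed for 30 days"
cold = lambda st: st.st_size > (1 << 20) and (time.time() - st.st_atime) > 30 * 86400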

-- 
Dmitry Borodaenko


* RE: HSM
  2013-11-09  8:33 HSM Sage Weil
                   ` (2 preceding siblings ...)
  2013-11-11  9:50 ` HSM Sebastien Ponce
@ 2013-11-12  9:47 ` Andreas Joachim Peters
  2013-11-18 19:22   ` HSM Dmitry Borodaenko
  3 siblings, 1 reply; 12+ messages in thread
From: Andreas Joachim Peters @ 2013-11-12  9:47 UTC (permalink / raw)
  To: Sage Weil, ceph-devel

Hi, 
I think you need to support the following functionality to support HSM (file not block based):

1 implement a trigger on file creation/modification/deletion

2 store the additional HSM identifier for recall as a file attribute

3 policy based purging of file related blocks (LRU cache etc.)

4 implement an optional trigger to recall a purged file and block the IO (our experience is that automatic recalls are problematic for huge installations if the aggregation window for desired recalls is short since they create inefficient and chaotic access on tapes)

5 either snapshot a file before migration, do an exclusive lock or freeze it to avoid modifications during migration (you need to have a unique enough identifier for a file, either inode/path + checksum or also inode/path + modification time works)
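
(As an illustration of items 2 and 5, a minimal Python sketch of keeping the recall identifier and a "unique enough" file identity in extended attributes. The user.hsm.* attribute names are made up, and os.setxattr merely stands in for whatever attribute interface CephFS would expose.)

import os

def record_migration(path, hsm_id):
    """Item 2/5: store the HSM recall id plus the identity the archived copy was taken from."""
    st = os.stat(path)
    os.setxattr(path, "user.hsm.id", hsm_id.encode())
    # inode + mtime acts as the "unique enough" identifier of the migrated version
    os.setxattr(path, "user.hsm.ident", f"{st.st_ino}:{st.st_mtime_ns}".encode())

def is_stale(path):
    """True if the file changed after migration, i.e. the archived copy is out of date."""
    st = os.stat(path)
    ident = os.getxattr(path, "user.hsm.ident").decode()
    return ident != f"{st.st_ino}:{st.st_mtime_ns}"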

The interesting part is the policy engine:

Ideally one supports time-based and volume-triggered policies with metadata matching, e.g.

a) time-based: one needs to create an LRU list, i.e. a view of all files matching a policy, ordered by creation and/or access time.
Example: "evict files from the filesystem when they have not been accessed for one month"

b) volume-triggered: one needs to create an LRU list by creation and/or access time; files are evicted from disk when a certain high watermark is reached, until the volume drops below a low watermark.
Example: "evict files matching size/name/... criteria if the pool volume or subtree exceeds 95%, until usage is back down to 90%"

Backup and archiving is simple compared to the above LRU policies.

You need the ability to create this LRU view from scratch (e.g. a full-table scan), and afterwards you can use incremental updates via triggers.

Ideally one would have a central view (of a subtree) and apply the policy there, but that does not scale the way the rest of Ceph does. It has the same problem as quota accounting by uid/gid on a subtree, with the added complication that you have to maintain a possibly huge file list sorted by ctime/mtime and/or atime. CephFS stores directories as objects, but you cannot apply policies at the individual directory level, so it has to be at least at pool or subtree level. If one trades away some flexibility in the policies, one can keep the LRU view small. There is also no need to track every change of atime; one could track atime for the LRU view with a granularity of days to avoid too many updates.
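
(A small Python sketch of that coarse-grained tracking: the view is only touched when a file moves into a new day bucket, so repeated reads within the same day cause no updates. The dict is just a stand-in for however the view would really be stored.)

import time

DAY = 86400

def note_access(lru_view, path, atime=None):
    """Incremental LRU-view update at day granularity."""
    day = int((time.time() if atime is None else atime) // DAY)
    if lru_view.get(path) != day:   # only write when the file moves into a new day bucket
        lru_view[path] = day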

Now, if you don't want to implement this LRU view, you can outsource it to an external DB, shipping the scalability and update-frequency issues to the DB :-) and just provide the migration/recall hooks and attribute support. Maybe your idea was to integrate with Robinhood ... currently it seems tightly integrated with Lustre internals.

The HSM logic looks similar to the peering logic you need for erasure coding to trigger eviction and recall. If you keep the ctime/mtime/atime information on entries in directory objects, rather than on the data objects, it corresponds quite closely. With ctime/mtime only it is much more lightweight.

I actually wanted to make a blueprint proposal for metadata searches in subtrees, running as a method on the MDS objects, which would provide the functionality needed for the HSM views. Although this is a full subtree scan, it would actually be nicely distributed over the MDS backend pool rather than running on the MDS itself. The output of the search could go into temporary objects, which are then converted into HSM actions such as migration or deletion triggers.

I would favour this approach over relying on more and more external components, since it is easy to do in Ceph.

FYI: there was a paper about migration policy scanning performance by IBM two years ago: 
http://domino.watson.ibm.com/library/CyberDig.nsf/papers/4A50C2D66A1F90F7852578E3005A2034/$File/rj10484.pdf

Cheers Andreas.

________________________________________
From: ceph-devel-owner@vger.kernel.org [ceph-devel-owner@vger.kernel.org] on behalf of Sage Weil [sage@inktank.com]
Sent: 09 November 2013 09:33
To: ceph-devel@vger.kernel.org
Subject: HSM

The latest Lustre just added HSM support:

        http://archive.hpcwire.com/hpcwire/2013-11-06/lustre_scores_business_class_upgrade_with_hsm.html

Here is a slide deck with some high-level detail:

        https://jira.hpdd.intel.com/secure/attachment/13185/Lustre_HSM_Design.pdf

Is anyone familiar with the interfaces and requirements of the file system
itself?  I don't know much about how these systems are implemented, but I
would guess there are relatively lightweight requirements on the fs (ceph
mds in our case) to keep track of file state (online or archived
elsewhere).  And some hooks to trigger migrations?

If anyone is interested in this area, I would be happy to help figure out
how to integrate things cleanly!

sage


* Re: HSM
  2013-11-12  0:13     ` HSM Gregory Farnum
@ 2013-11-12  0:57       ` Malcolm Haak
  0 siblings, 0 replies; 12+ messages in thread
From: Malcolm Haak @ 2013-11-12  0:57 UTC (permalink / raw)
  To: Gregory Farnum, John Spray; +Cc: Sage Weil, ceph-devel

Hi Gregory,


On 12/11/13 10:13, Gregory Farnum wrote:
> On Mon, Nov 11, 2013 at 3:04 AM, John Spray <john.spray@inktank.com> wrote:
>> This is a really useful summary from Malcolm.
>>
>> In addition to the coordinator/copytool interface, there is the question of
>> where the policy engine gets its data from.  Lustre has the MDS changelog,
>> which Robinhood uses to replicate metadata into its MySQL database with all
>> the indices that it wants.
>
>> On Sun, Nov 10, 2013 at 11:17 PM, Malcolm Haak <malcolm@sgi.com> wrote:
>>> So there aren't really any hooks, in that exports are triggered by the policy engine after a scan of the metadata, and recalls are triggered when caps are requested on offline files
>
> Wait, is the HSM using a changelog or is it just scanning the full
> filesystem tree? Scanning the whole tree seems awfully expensive.

While I can't speak at length about Lustre HSM (it may well use
incremental updates to its SQL database via metadata logs), I do know
that filesystem scans are done regularly in other HSM solutions. I also
know that such a scan is multi-threaded and, when backed by decent
disks, does not take an excessive amount of time.

>
>> I don't know if CephFS MDS currently has a similar interface.
> Well, the MDSes each have their journal of course, but more than that
> we can stick whatever we want into the metadata and expose it via
> virtual xattrs or whatever else.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>>
>> John
>>
>>
>> On Sun, Nov 10, 2013 at 11:17 PM, Malcolm Haak <malcolm@sgi.com> wrote:
>>>
>>> Hi All,
>>>
>>> If you are talking specifically about Lustre HSM, it's really an interface for adding HSM functionality by leveraging existing HSMs (DMF, for example).
>>>
>>> So with Lustre HSM you have a policy engine that triggers the migrations out of the filesystem. Rules are based around size, last access time and target state (online, dual and offline).
>>>
>>> There is a 'coordinator' process involved here as well; it (from what I understand) runs on the MDS nodes. It handles the interaction with the copytool. The copytool is provided by the HSM solution you are actually using.
>>>
>>> For recalls, when caps are acquired on the MDS for an exported file, the responsible MDS contacts the coordinator, which in turn uses the copytool to pull the required file out of the HSM.
>>>
>>> In Lustre HSM, the objects that make up a file are all recalled, and the file, not the objects, is handed to the HSM.
>>>
>>> For Lustre, all it needs to keep track of is the current state of the file and the correct ID to request from the HSM. This is done inside the normal metadata storage.
>>>
>>> So there aren't really any hooks, in that exports are triggered by the policy engine after a scan of the metadata, and recalls are triggered when caps are requested on offline files. Then it's just standard POSIX blocking until the file is available.
>>>
>>> Most of the state and ID stuff could be stored as xattrs in CephFS. I'm not as sure how to do it for other things, but as long as you can store some kind of extended metadata on whole objects, it could use the same interfaces as well.
>>>
>>> Hope that was actually helpful and not just an obvious rehash...
>>>
>>> Regards
>>>
>>> Malcolm Haak


* Re: HSM
  2013-11-11 11:04   ` HSM John Spray
@ 2013-11-12  0:13     ` Gregory Farnum
  2013-11-12  0:57       ` HSM Malcolm Haak
  0 siblings, 1 reply; 12+ messages in thread
From: Gregory Farnum @ 2013-11-12  0:13 UTC (permalink / raw)
  To: John Spray; +Cc: Malcolm Haak, Sage Weil, ceph-devel

On Mon, Nov 11, 2013 at 3:04 AM, John Spray <john.spray@inktank.com> wrote:
> This is a really useful summary from Malcolm.
>
> In addition to the coordinator/copytool interface, there is the question of
> where the policy engine gets its data from.  Lustre has the MDS changelog,
> which Robinhood uses to replicate metadata into its MySQL database with all
> the indices that it wants.

> On Sun, Nov 10, 2013 at 11:17 PM, Malcolm Haak <malcolm@sgi.com> wrote:
>> So there aren't really any hooks, in that exports are triggered by the policy engine after a scan of the metadata, and recalls are triggered when caps are requested on offline files

Wait, is the HSM using a changelog or is it just scanning the full
filesystem tree? Scanning the whole tree seems awfully expensive.

> I don't know if CephFS MDS currently has a similar interface.
Well, the MDSes each have their journal of course, but more than that
we can stick whatever we want into the metadata and expose it via
virtual xattrs or whatever else.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

>
> John
>
>
> On Sun, Nov 10, 2013 at 11:17 PM, Malcolm Haak <malcolm@sgi.com> wrote:
>>
>> Hi All,
>>
>> If you are talking specifically about Lustre HSM, it's really an interface for adding HSM functionality by leveraging existing HSMs (DMF, for example).
>>
>> So with Lustre HSM you have a policy engine that triggers the migrations out of the filesystem. Rules are based around size, last access time and target state (online, dual and offline).
>>
>> There is a 'coordinator' process involved here as well; it (from what I understand) runs on the MDS nodes. It handles the interaction with the copytool. The copytool is provided by the HSM solution you are actually using.
>>
>> For recalls, when caps are acquired on the MDS for an exported file, the responsible MDS contacts the coordinator, which in turn uses the copytool to pull the required file out of the HSM.
>>
>> In Lustre HSM, the objects that make up a file are all recalled, and the file, not the objects, is handed to the HSM.
>>
>> For Lustre, all it needs to keep track of is the current state of the file and the correct ID to request from the HSM. This is done inside the normal metadata storage.
>>
>> So there aren't really any hooks, in that exports are triggered by the policy engine after a scan of the metadata, and recalls are triggered when caps are requested on offline files. Then it's just standard POSIX blocking until the file is available.
>>
>> Most of the state and ID stuff could be stored as xattrs in CephFS. I'm not as sure how to do it for other things, but as long as you can store some kind of extended metadata on whole objects, it could use the same interfaces as well.
>>
>> Hope that was actually helpful and not just an obvious rehash...
>>
>> Regards
>>
>> Malcolm Haak


* Re: HSM
  2013-11-10 23:17 ` HSM Malcolm Haak
@ 2013-11-11 11:04   ` John Spray
  2013-11-12  0:13     ` HSM Gregory Farnum
  0 siblings, 1 reply; 12+ messages in thread
From: John Spray @ 2013-11-11 11:04 UTC (permalink / raw)
  To: Malcolm Haak; +Cc: Sage Weil, ceph-devel

This is a really useful summary from Malcolm.

In addition to the coordinator/copytool interface, there is the question of
where the policy engine gets its data from.  Lustre has the MDS changelog,
which Robinhood uses to replicate metadata into its MySQL database with all
the indices that it wants.  I don't know if CephFS MDS currently has a
similar interface.

John


On Sun, Nov 10, 2013 at 11:17 PM, Malcolm Haak <malcolm@sgi.com> wrote:
>
> Hi All,
>
> If you are talking specifically about Lustre HSM, it's really an interface for adding HSM functionality by leveraging existing HSMs (DMF, for example).
>
> So with Lustre HSM you have a policy engine that triggers the migrations out of the filesystem. Rules are based around size, last access time and target state (online, dual and offline).
>
> There is a 'coordinator' process involved here as well; it (from what I understand) runs on the MDS nodes. It handles the interaction with the copytool. The copytool is provided by the HSM solution you are actually using.
>
> For recalls, when caps are acquired on the MDS for an exported file, the responsible MDS contacts the coordinator, which in turn uses the copytool to pull the required file out of the HSM.
>
> In Lustre HSM, the objects that make up a file are all recalled, and the file, not the objects, is handed to the HSM.
>
> For Lustre, all it needs to keep track of is the current state of the file and the correct ID to request from the HSM. This is done inside the normal metadata storage.
>
> So there aren't really any hooks, in that exports are triggered by the policy engine after a scan of the metadata, and recalls are triggered when caps are requested on offline files. Then it's just standard POSIX blocking until the file is available.
>
> Most of the state and ID stuff could be stored as xattrs in CephFS. I'm not as sure how to do it for other things, but as long as you can store some kind of extended metadata on whole objects, it could use the same interfaces as well.
>
> Hope that was actually helpful and not just an obvious rehash...
>
> Regards
>
> Malcolm Haak
>
>
> On 09/11/13 18:33, Sage Weil wrote:
>>
>> The latest Lustre just added HSM support:
>>
>>         http://archive.hpcwire.com/hpcwire/2013-11-06/lustre_scores_business_class_upgrade_with_hsm.html
>>
>> Here is a slide deck with some high-level detail:
>>
>>         https://jira.hpdd.intel.com/secure/attachment/13185/Lustre_HSM_Design.pdf
>>
>> Is anyone familiar with the interfaces and requirements of the file system
>> itself?  I don't know much about how these systems are implemented, but I
>> would guess there are relatively lightweight requirements on the fs (ceph
>> mds in our case) to keep track of file state (online or archived
>> elsewhere).  And some hooks to trigger migrations?
>>
>> If anyone is interested in this area, I would be happy to help figure out
>> how to integrate things cleanly!
>>
>> sage
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: HSM
  2013-11-09 14:20 ` HSM Tim Bell
@ 2013-11-11  9:58   ` Sebastien Ponce
  0 siblings, 0 replies; 12+ messages in thread
From: Sebastien Ponce @ 2013-11-11  9:58 UTC (permalink / raw)
  To: Tim Bell; +Cc: Sage Weil, <ceph-devel@vger.kernel.org>

> - Keeping the tape drives busy is always difficult… tape drives are
> now regularly exceeding 250MB/s on a single stream so the storage
> system needs to be able to maintain a high data rate. Tape drive
> performance drops rapidly when the drives have to stop and then
> restart as the buffers fill up again.

This is actually where the striping on top of librados is helping us
(see my other mail). The striping allows us to read from several Ceph
nodes concurrently (with some read-ahead and asynchronous reads) and to
achieve much more than 250MB/s (already anticipating the next drive
generation...).
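
(For illustration, a Python sketch of that read pattern using a plain thread pool; read_stripe and write_to_tape are stand-ins for the real librados striper read and the tape writer, and the read-ahead depth is arbitrary.)

from concurrent.futures import ThreadPoolExecutor

READ_AHEAD = 8   # stripe reads kept in flight (arbitrary)

def stream_to_tape(read_stripe, nstripes, write_to_tape):
    """Keep READ_AHEAD stripe reads in flight so the tape drive never has to stop."""
    with ThreadPoolExecutor(max_workers=READ_AHEAD) as pool:
        futures = [pool.submit(read_stripe, i) for i in range(min(READ_AHEAD, nstripes))]
        for i in range(nstripes):
            data = futures[i % READ_AHEAD].result()   # wait for the oldest outstanding stripe
            nxt = i + READ_AHEAD
            if nxt < nstripes:
                futures[i % READ_AHEAD] = pool.submit(read_stripe, nxt)
            write_to_tape(data)                       # writes stay sequential and in order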

Sebastien




* Re: HSM
  2013-11-09  8:33 HSM Sage Weil
  2013-11-09 14:20 ` HSM Tim Bell
  2013-11-10 23:17 ` HSM Malcolm Haak
@ 2013-11-11  9:50 ` Sebastien Ponce
  2013-11-12  9:47 ` HSM Andreas Joachim Peters
  3 siblings, 0 replies; 12+ messages in thread
From: Sebastien Ponce @ 2013-11-11  9:50 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

It may be even more lightweight than that, depending on the mass storage
system's intrinsic capabilities.

Here at CERN, we are testing the use of Ceph as the disk cache in our
mass storage system, and the only requirement we had was to store and
retrieve a file (put/get functionality). We've thus used the rados
library with a small extension implementing striping of big files, the
way you've done it in CephFS. I will actually submit a blueprint for
this extension in case you want to integrate it.
This assumes that we have our own tools and protocols to migrate/recall
files and to query their status.

In the general case, you're quite right that you need to store the
state of the file (some systems use a stub file when the file is not on
disk) and hooks after the writing of a new file and in case of a read
failure (cache miss).
One difficult point though, as Tim mentioned already, is reaching
optimal usage of your tape backend. This may end up in long queuing
times (hours) for recalls, and one has to handle waiting clients during
that time.

Sebastien


On Sat, 2013-11-09 at 00:33 -0800, Sage Weil wrote:
> The latest Lustre just added HSM support:
> 
> 	http://archive.hpcwire.com/hpcwire/2013-11-06/lustre_scores_business_class_upgrade_with_hsm.html
> 
> Here is a slide deck with some high-level detail:
> 	
> 	https://jira.hpdd.intel.com/secure/attachment/13185/Lustre_HSM_Design.pdf
> 
> Is anyone familiar with the interfaces and requirements of the file system 
> itself?  I don't know much about how these systems are implemented, but I 
> would guess there are relatively lightweight requirements on the fs (ceph 
> mds in our case) to keep track of file state (online or archived 
> elsewhere).  And some hooks to trigger migrations?
> 
> If anyone is interested in this area, I would be happy to help figure out 
> how to integrate things cleanly!
> 
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html




* Re: HSM
  2013-11-09  8:33 HSM Sage Weil
  2013-11-09 14:20 ` HSM Tim Bell
@ 2013-11-10 23:17 ` Malcolm Haak
  2013-11-11 11:04   ` HSM John Spray
  2013-11-11  9:50 ` HSM Sebastien Ponce
  2013-11-12  9:47 ` HSM Andreas Joachim Peters
  3 siblings, 1 reply; 12+ messages in thread
From: Malcolm Haak @ 2013-11-10 23:17 UTC (permalink / raw)
  To: Sage Weil, ceph-devel

Hi All,

If you are talking specifically about Lustre HSM, it's really an
interface for adding HSM functionality by leveraging existing HSMs (DMF,
for example).

So with Lustre HSM you have a policy engine that triggers the migrations
out of the filesystem. Rules are based around size, last access time and
target state (online, dual and offline).

There is a 'coordinator' process involved here as well; it (from what I
understand) runs on the MDS nodes. It handles the interaction with the
copytool. The copytool is provided by the HSM solution you are actually
using.

For recalls, when caps are acquired on the MDS for an exported file, the
responsible MDS contacts the coordinator, which in turn uses the
copytool to pull the required file out of the HSM.

In Lustre HSM, the objects that make up a file are all recalled, and
the file, not the objects, is handed to the HSM.

For Lustre, all it needs to keep track of is the current state of the
file and the correct ID to request from the HSM. This is done inside the
normal metadata storage.

So there aren't really any hooks, in that exports are triggered by the
policy engine after a scan of the metadata, and recalls are triggered
when caps are requested on offline files. Then it's just standard POSIX
blocking until the file is available.

Most of the state and ID stuff could be stored as xattrs in CephFS. I'm
not as sure how to do it for other things, but as long as you can store
some kind of extended metadata on whole objects, it could use the same
interfaces as well.
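
(A Python sketch of how that flow could look if the state and ID lived in xattrs as suggested; the user.hsm.* names and recall_from_hsm are made up, and the polling loop only stands in for the real blocking in the client/MDS path.)

import os
import time

def open_with_recall(path, recall_from_hsm, poll=5):
    """Block a reader until an offline file has been staged back in, then open it."""
    try:
        state = os.getxattr(path, "user.hsm.state").decode()
    except OSError:
        state = "online"                     # no xattr: the file was never migrated
    if state == "offline":
        hsm_id = os.getxattr(path, "user.hsm.id").decode()
        recall_from_hsm(hsm_id, path)        # ask the copytool to stage the file back
        while os.getxattr(path, "user.hsm.state").decode() != "online":
            time.sleep(poll)                 # plain blocking, as with POSIX reads in Lustre HSM
    return open(path, "rb")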

Hope that was actually helpful and not just an obvious rehash...

Regards

Malcolm Haak

On 09/11/13 18:33, Sage Weil wrote:
> The latest Lustre just added HSM support:
>
> 	http://archive.hpcwire.com/hpcwire/2013-11-06/lustre_scores_business_class_upgrade_with_hsm.html
>
> Here is a slide deck with some high-level detail:
> 	
> 	https://jira.hpdd.intel.com/secure/attachment/13185/Lustre_HSM_Design.pdf
>
> Is anyone familiar with the interfaces and requirements of the file system
> itself?  I don't know much about how these systems are implemented, but I
> would guess there are relatively lightweight requirements on the fs (ceph
> mds in our case) to keep track of file state (online or archived
> elsewhere).  And some hooks to trigger migrations?
>
> If anyone is interested in this area, I would be happy to help figure out
> how to integrate things cleanly!
>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


* Re: HSM
  2013-11-09  8:33 HSM Sage Weil
@ 2013-11-09 14:20 ` Tim Bell
  2013-11-11  9:58   ` HSM Sebastien Ponce
  2013-11-10 23:17 ` HSM Malcolm Haak
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Tim Bell @ 2013-11-09 14:20 UTC (permalink / raw)
  To: Sage Weil; +Cc: <ceph-devel@vger.kernel.org>


Looking at the potential applications of HSM, an implementation based around the object store/radosgw may also be highly desirable.

Spectra Logic recently announced their DS3 API, which adds bring online/offline operations to S3.

In particular:

- People using S3/Swift think in terms of bulk per-file operations rather than POSIX semantics. Updates are a problem for tape-based systems. It can be done, but recovering from a bad migration of active data is not easy.

- Containers group related files together. The expensive operations on tape are mount and seek. Thus, the aim should be to maximise the number of related files on a tape for ease of recall. When moving files from one medium to another (repack operations), container-level grouping ensures that the number of recall mounts is minimised (sketched below this list).

- Keeping the tape drives busy is always difficult… tape drives are now regularly exceeding 250MB/s on a single stream, so the storage system needs to be able to maintain a high data rate. Tape drive performance drops rapidly when the drives have to stop and then restart as the buffers fill up again.
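
(To make the grouping concrete, a small Python sketch of batching a recall queue per tape so each cartridge is mounted once and read in on-tape order; tape_of and offset_of are hypothetical lookups into whatever catalogue maps files to media.)

from collections import defaultdict

def plan_recalls(requests, tape_of, offset_of):
    """Group queued recalls per tape and sort by on-tape position: one mount, forward seeks."""
    by_tape = defaultdict(list)
    for path in requests:
        by_tape[tape_of(path)].append(path)
    plan = []
    for tape, paths in by_tape.items():
        paths.sort(key=offset_of)
        plan.append((tape, paths))
    return plan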

Tim

On 9 Nov 2013, at 09:33, Sage Weil <sage@inktank.com> wrote:

> The latest Lustre just added HSM support:
> 
> 	http://archive.hpcwire.com/hpcwire/2013-11-06/lustre_scores_business_class_upgrade_with_hsm.html
> 
> Here is a slide deck with some high-level detail:
> 	
> 	https://jira.hpdd.intel.com/secure/attachment/13185/Lustre_HSM_Design.pdf
> 
> Is anyone familiar with the interfaces and requirements of the file system 
> itself?  I don't know much about how these systems are implemented, but I 
> would guess there are relatively lightweight requirements on the fs (ceph 
> mds in our case) to keep track of file state (online or archived 
> elsewhere).  And some hooks to trigger migrations?
> 
> If anyone is interested in this area, I would be happy to help figure out 
> how to integrate things cleanly!
> 
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



* HSM
@ 2013-11-09  8:33 Sage Weil
  2013-11-09 14:20 ` HSM Tim Bell
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Sage Weil @ 2013-11-09  8:33 UTC (permalink / raw)
  To: ceph-devel

The latest Lustre just added HSM support:

	http://archive.hpcwire.com/hpcwire/2013-11-06/lustre_scores_business_class_upgrade_with_hsm.html

Here is a slide deck with some high-level detail:
	
	https://jira.hpdd.intel.com/secure/attachment/13185/Lustre_HSM_Design.pdf

Is anyone familiar with the interfaces and requirements of the file system 
itself?  I don't know much about how these systems are implemented, but I 
would guess there are relatively lightweight requirements on the fs (ceph 
mds in our case) to keep track of file state (online or archived 
elsewhere).  And some hooks to trigger migrations?

If anyone is interested in this area, I would be happy to help figure out 
how to integrate things cleanly!

sage

