* Ceph backfilling explained ( maybe )
From: Loic Dachary @ 2013-05-25 11:55 UTC
  To: Ceph Development

Hi,

Here is a draft of my current understanding of backfilling. Disclaimer : it is possible that I completely misunderstood ;-)

Cheers

Ceph stores objects in pools, which are divided into placement groups.

   +---------------------------- pool a ----+
   |+----- placement group 1 -------------+ |
   ||+-------+  +-------+                 | |
   |||object |  |object |                 | |
   ||+-------+  +-------+                 | |
   |+-------------------------------------+ |
   |+----- placement group 2 -------------+ |
   ||+-------+  +-------+                 | |
   |||object |  |object |   ...           | |
   ||+-------+  +-------+                 | |
   |+-------------------------------------+ |
   |               ....                     |
   |                                        |
   +----------------------------------------+

   +---------------------------- pool b ----+
   |+----- placement group 1 -------------+ |
   ||+-------+  +-------+                 | |
   |||object |  |object |                 | |
   ||+-------+  +-------+                 | |
   |+-------------------------------------+ |
   |+----- placement group 2 -------------+ |
   ||+-------+  +-------+                 | |
   |||object |  |object |   ...           | |
   ||+-------+  +-------+                 | |
   |+-------------------------------------+ |
   |               ....                     |
   |                                        |
   +----------------------------------------+

   ...
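
To make the object-to-placement-group step concrete, here is a toy sketch in C++. It assumes a plain std::hash and a simple modulo rather than Ceph's real hashing; the Pool type and pg_of function are made up for illustration.

    #include <cstdint>
    #include <functional>
    #include <iostream>
    #include <string>

    // Hypothetical pool descriptor: a name and a placement group count.
    struct Pool {
        std::string name;
        uint32_t pg_num; // number of placement groups, e.g. 128
    };

    // An object lands in a placement group by hashing its name and
    // folding the hash into the pool's pg_num.
    uint32_t pg_of(const Pool& pool, const std::string& object_name) {
        uint32_t h = static_cast<uint32_t>(std::hash<std::string>{}(object_name));
        return h % pool.pg_num;
    }

    int main() {
        Pool a{"pool a", 128};
        for (const std::string name : {"object A", "object B"})
            std::cout << name << " -> " << a.name << ", pg "
                      << pg_of(a, name) << "\n";
    }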

A placement group relies on OSDs to store its objects. OSDs are daemons running on the machines where the storage resides. For instance, a placement group holding three replicas will have three OSDs at its disposal: one OSD is the primary and the two others store copies of each object.

       +-------- placement group -------------+
       |+----------------+ +----------------+ |
       || object A       | | object B       | |
       |+----------------+ +----------------+ |
       +---+-------------+-----------+--------+
           |             |           |
           |             |           |
         OSD 0         OSD 1       OSD 2
        +------+      +------+    +------+
        |+---+ |      |+---+ |    |+---+ |
        || A | |      || A | |    || A | |
        |+---+ |      |+---+ |    |+---+ |
        |+---+ |      |+---+ |    |+---+ |
        || B | |      || B | |    || B | |
        |+---+ |      |+---+ |    |+---+ |
        +------+      +------+    +------+

The OSDs are not for the exclusive use of the placement group: multiple placement groups can use the same OSDs to store their objects. However, the colocation of objects from various placement groups in the same OSD is transparent and is not discussed here.

The placement group does not run as a single daemon as suggested above. Instead it is distributed and resides within each OSD. Whenever an OSD dies, the copy of the placement group on that OSD is gone and needs to be reconstructed on another OSD.

               OSD 0                                           OSD 1 ...
        +----------------+---- placement group --------+  +------
        |+--- object --+ |+--------------------------+ |  |
        || name : B    | ||  pg_log_entry_t MODIFY   | |  |
        || key : 2     | ||  pg_log_entry_t DELETE   | |  |
        |+-------------+ |+--------------------------+ |  |
        |+--- object --+ >------ last_backfill         |  | ....
        || name : A    | |                             |  |
        || key : 5     | |                             |  |
        |+-------------+ |                             |  |
        |                |                             |  |
        |    ....        |                             |  |
        +----------------+-----------------------------+  +-----


When an object is deleted or modified in the placement group, it is recorded in a log to be replayed if needed. In the simplest case, if an OSD gets disconnected, reconnects and needs to catch up with the other OSDs, copies of the log entries will be sent to it. However, the logs have a limited size and it may be more efficient, in some cases, to just copy the objects over instead of replaying the logs.

Each object name is hashed into an integer that can be used to order the objects. For instance, object B above has been hashed to key 2 and object A above has been hashed to key 5. The last_backfill pointer of the placement group draws the limit separating the objects that have already been copied from other OSDs and those in the process of being copied. The objects that are lower than last_backfill have been copied (that would be object B above) and the objects that are greater than last_backfill are going to be copied.
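
As a minimal sketch of that ordering test (assuming a made-up HashedObject type in place of Ceph's real hobject_t):

    #include <iostream>
    #include <string>

    // Stand-in for Ceph's hobject_t: a name plus the integer it hashed to.
    struct HashedObject {
        std::string name;
        unsigned key; // e.g. B -> 2, A -> 5 in the figure above
    };

    // Objects below last_backfill have already been copied to this OSD;
    // objects above it are still waiting to be backfilled.
    bool already_copied(const HashedObject& obj, unsigned last_backfill) {
        return obj.key < last_backfill;
    }

    int main() {
        unsigned last_backfill = 3; // sits between B (key 2) and A (key 5)
        for (HashedObject obj : {HashedObject{"B", 2}, HashedObject{"A", 5}})
            std::cout << obj.name
                      << (already_copied(obj, last_backfill)
                              ? ": already copied\n"
                              : ": still to be copied\n");
    }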

It may take time for an OSD to catch up, and it is useful to allow replaying the logs while backfilling. Log entries related to objects lower than last_backfill are applied. However, log entries related to objects greater than last_backfill are discarded, because those objects are scheduled to be copied at a later time anyway.
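
A sketch of that filtering rule, under the same toy model (the LogEntry type and operation names are simplifications of pg_log_entry_t):

    #include <iostream>
    #include <string>

    enum class Op { Modify, Delete }; // simplified pg_log_entry_t ops

    struct LogEntry {
        Op op;
        std::string name;
        unsigned key; // hash of the object name, as above
    };

    // Replay a log entry on an OSD that is still backfilling: entries for
    // objects below last_backfill are applied; the rest are discarded,
    // because backfill will copy those objects wholesale later anyway.
    void maybe_replay(const LogEntry& e, unsigned last_backfill) {
        if (e.key < last_backfill)
            std::cout << "apply " << e.name << "\n";
        else
            std::cout << "discard " << e.name << "\n";
    }

    int main() {
        unsigned last_backfill = 3;
        maybe_replay({Op::Modify, "B", 2}, last_backfill); // applied
        maybe_replay({Op::Delete, "A", 5}, last_backfill); // discarded
    }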


-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.



* Re: Ceph backfilling explained ( maybe )
From: Leen Besselink @ 2013-05-25 12:33 UTC
  To: Ceph Development

On Sat, May 25, 2013 at 01:55:30PM +0200, Loic Dachary wrote:
> Hi,
> 

Hi Loic,

> Here is a draft of my current understanding of backfilling. Disclaimer : it is possible that I completely misunderstood ;-)
> 

Maybe I'm wrong, but I think there are some flaws in your explanation.

Disclaimer: I'm not a Ceph developer, but a fellow Ceph tester/user.

I would think almost any explanation of how Ceph works would use words like hash or algorithm, because the RADOS algorithm determines where data ends up.

It is used to calculate how to balance the data over the different placement groups, OSDs and machines.

I believe/assumed it works like this:
- an OSD belongs to one pool
- an OSD can serve many placement groups
- a placement group belongs to one pool
- the monitors know which OSDs exist in each pool and which are up
- this is sometimes called the topology, usually the osdmap and pgmap
- the monitors need to have quorum to be authoritative for their data before making changes.
- the list of monitors is called the monmap.
- when storing/retrieving a client will have to do a RADOS-calculation
- to do this calculation it will first need to talk to one of the monitors which has quorum
  to get the osdmap and pgmap. They serve as input for the RADOS-algorithm
- one OSD is the master for one piece of data and serves as the contact-point for clients
- that master OSD for a piece of data also runs the RADOS-calculation to talk to the other OSDs
  when replicating data changes
- a piece of data is called an object. Ceph is an Object Storage system/Object Store.
- the Ceph Object Store is not the same as a Swift or RADOS-GW object store.
- a Ceph object can store keys/values, not just data
- when using RBD the RBD client will create a 'directory' object which contains general information
  like the version/type of RBD-image and a list of names of the image parts. Each part is the same
  size, I think it was 4MB ?
- when an OSD or client connects to an OSD they also communicate information about at least the osdmap and monmap.
- when one OSD or monitor can't reach another mon or OSD, they will use a gossip protocol to communicate that to connected clients, OSDs or mons.
- when a new OSD comes online the other OSDs talk to it to know what data they might need to exchange;
  this is called peering.
- the RADOS-algorithm works similarly to consistent hashing, so a client can talk directly to the OSD where the data is or should be stored (a toy sketch of this idea follows below).
- backfilling is what a master OSD does when it is checking if the other OSDs that should have a copy actually have a copy. It will send a copy of missing objects.

How the RADOS-algorithm calculates, based on the osdmap and pgmap, which pg and master OSD an object belongs to, I'm not 100% sure.
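
For intuition only, here is a toy consistent-hashing placement in C++. The real CRUSH algorithm is hierarchical and quite different; everything here (the place function, the osd naming) is invented for illustration.

    #include <cstdint>
    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>

    // Pick the OSD whose hashed id is the closest successor (modulo 2^32)
    // of the object's hash; adding or removing one OSD then only moves a
    // fraction of the objects, which is the point of consistent hashing.
    int place(const std::string& object, const std::vector<int>& osds) {
        auto h = [](const std::string& s) {
            return static_cast<uint32_t>(std::hash<std::string>{}(s));
        };
        uint32_t oh = h(object);
        int best = osds.front();
        uint32_t best_d = UINT32_MAX;
        for (int osd : osds) {
            uint32_t d = h("osd." + std::to_string(osd)) - oh; // wraps mod 2^32
            if (d < best_d) { best_d = d; best = osd; }
        }
        return best;
    }

    int main() {
        std::vector<int> osds{0, 1, 2, 3};
        std::cout << "object A -> osd." << place("object A", osds) << "\n";
        std::cout << "object B -> osd." << place("object B", osds) << "\n";
    }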

Does that help ?


* Re: Ceph backfilling explained ( maybe )
From: Loic Dachary @ 2013-05-25 14:27 UTC
  To: leen; +Cc: Ceph Development


On 05/25/2013 02:33 PM, Leen Besselink wrote:
Hi Leen,

> - a Ceph object can store keys/values, not just data

I did not know that. Could you explain or give me the URL?

> - when using RBD the RBD client will create a 'directory' object which contains general information
>   like the version/type of RBD-image and a list of names of the image parts. Each part is the same
>   size, I think it was 4MB ?

That's also my understanding : 4MB is the default.

> - when an OSD or client connects to an OSD they also communicate information about at least the osdmap and monmap.
> - when one OSD or monitor can't reach another mon or OSD, they will use a gossip protocol to communicate that to connected clients, OSDs or mons.
> - when a new OSD comes online the other OSDs talk to it to know what data they might need to exchange;
>   this is called peering.
> - the RADOS-algorithm works similarly to consistent hashing, so a client can talk directly to the OSD where the data is or should be stored.
> - backfilling is what a master OSD does when it is checking if the other OSDs that should have a copy actually have a copy. It will send a copy of missing objects.

I guess that's the area where I'm still unsure how it goes. I should look into the state machine of PG.{h,cc} to figure out how backfill related messages are exchanged.

Thanks for taking the time to explain :-)

Cheers


-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.



* Re: Ceph backfilling explained ( maybe )
From: Leen Besselink @ 2013-05-25 14:48 UTC
  To: Ceph Development

On Sat, May 25, 2013 at 04:27:16PM +0200, Loic Dachary wrote:
> 
> 
> On 05/25/2013 02:33 PM, Leen Besselink wrote:
> Hi Leen,
> 
> > - a Ceph object can store keys/values, not just data
> 
> I did not know that. Could you explain or give me the URL?
> 

Well, I got that impression from some of the earlier talks and from this blog post:

http://ceph.com/community/my-first-impressions-of-ceph-as-a-summer-intern/

But I haven't read it in a while.

But at this time I only see something like:

http://ceph.com/docs/master/rados/api/librados/?highlight=rados_getxattr#rados_getxattr

Which looks like it is storing it in filesystem attributes.

So maybe an object can be a piece of data or a key/value store.

> > - when using RBD the RBD client will create a 'directory' object which contains general information
> >   like the version/type of RBD-image and a list of names of the image parts. Each part is the same
> >   size, I think it was 4MB ?
> 
> That's also my understanding : 4MB is the default.
> 
> > - when an OSD or client connects to an OSD they also communicate information about at least the osdmap and monmap.
> > - when one OSD or monitor can't reach another mon or OSD, they will use a gossip protocol to communicate that to connected clients, OSDs or mons.
> > - when a new OSD comes online the other OSDs talk to it to know what data they might need to exchange;
> >   this is called peering.
> > - the RADOS-algorithm works similarly to consistent hashing, so a client can talk directly to the OSD where the data is or should be stored.
> > - backfilling is what a master OSD does when it is checking if the other OSDs that should have a copy actually have a copy. It will send a copy of missing objects.
> 
> I guess that's the area where I'm still unsure how it goes. I should look into the state machine of PG.{h,cc} to figure out how backfill related messages are exchanged.
> 

Well, I assume the master of an object knows it is the master.

When it knows an OSD has left the cluster it knows it has to store a new copy.

When changes happen in the cluster, all placement groups/object locations in the pool will have to be re-calculated.

I assumed this might mean an OSD which was master of an object before isn't master anymore; the new master is now responsible for the object and the number of copies.

> Thanks for taking the time to explain :-)
> 

This was just from memory, that is why I left some things unanswered.

My idea was that when I transfer my knowledge to you, you can use it to search the source and documentation. :-)


* Re: Ceph backfilling explained ( maybe )
From: Loic Dachary @ 2013-05-25 17:37 UTC
  To: leen; +Cc: Ceph Development


On 05/25/2013 04:48 PM, Leen Besselink wrote:
> On Sat, May 25, 2013 at 04:27:16PM +0200, Loic Dachary wrote:
>>
>>
>> On 05/25/2013 02:33 PM, Leen Besselink wrote:
>> Hi Leen,
>>
>>> - a Ceph object can store keys/values, not just data
>>
>> I did not know that. Could you explain or give me the URL?
>>
> 
> Well, I got that impression from some of the earlier talks and from this blog post:
> 
> http://ceph.com/community/my-first-impressions-of-ceph-as-a-summer-intern/
> 
> But I haven't read it in a while.
> 
> But at this time I only see something like:
> 
> http://ceph.com/docs/master/rados/api/librados/?highlight=rados_getxattr#rados_getxattr
> 
> Which looks like it is storing it in filesystem attributes.
> 
> So maybe an object can be a piece of data or a key/value store.

Thanks for explaining: I did not know about the work of Eleanor Cawthon. I knew about object xattrs, but I thought you meant that the data inside the object could be structured as key/value pairs. My bad :-)

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.



* Re: Ceph backfilling explained ( maybe )
From: Samuel Just @ 2013-05-25 18:06 UTC
  To: Loic Dachary; +Cc: leen, Ceph Development

Hi, thanks for taking the time to try to get all this documented!

Placement groups are assigned to a set of OSDs by CRUSH.

(4.1, osdmap(e 1)) --CRUSH--> [3,1,2]

where the primary is 3.  When 3 dies, the osdmap is updated to reflect this
and we get a new mapping for pg 4.1:

(4.1, osdmap(e 2)) --CRUSH--> [1,2,4]

Here, 1 and 2 already have up-to-date copies of 4.1.  osd 4, however, needs
to be brought up to date.  During peering, osd 1 will learn that osd 4
falls into
1 of 2 cases.

Case 1 is that osd 4 already had an old copy of pg 4.1 AND its pg log for pg
4.1 happens to overlap osd 1's pg log for pg 4.1.  In that case, by running
through the log of operations, we can determine exactly which objects need
to be copied over.  We usually refer to this as just "recovery" (or log based
recovery).

In case 2, osd 4's pg log does not overlap that of osd 1.  In this case,
we cannot determine from the log which objects need to be copied over.
To bring osd 4 up to date, we therefore need to backfill.
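
A hedged sketch of that peering decision, with deliberately simplified types (the PgLogRange type is invented; real pg logs are more complex than plain integer ranges):

    #include <iostream>

    // Invented type: a pg log retains a contiguous range of versions.
    struct PgLogRange {
        unsigned tail; // oldest retained entry
        unsigned head; // newest entry
    };

    enum class Plan { LogRecovery, Backfill };

    // Case 1: the stray OSD's log overlaps the primary's retained log, so
    // the exact set of missing objects can be computed from the entries.
    // Case 2: no overlap (too much history was trimmed), so backfill.
    Plan choose_recovery(const PgLogRange& primary, const PgLogRange& stray) {
        bool overlaps = stray.head >= primary.tail && stray.tail <= primary.head;
        return overlaps ? Plan::LogRecovery : Plan::Backfill;
    }

    int main() {
        PgLogRange primary{100, 180};
        std::cout << (choose_recovery(primary, {90, 120}) == Plan::LogRecovery
                          ? "log recovery\n" : "backfill\n"); // overlap
        std::cout << (choose_recovery(primary, {40, 60}) == Plan::Backfill
                          ? "backfill\n" : "log recovery\n"); // gap
    }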

Backfill involves the primary and the backfill peer (there is only ever one in
the acting set at a time, see PG::choose_acting) scanning over their pg stores
and copying the objects which are different or missing from the primary to the
backfill peer.  Because this may take a long time, we track a last_backfill
attribute for each local pg copy indicating how far the local copy has been
backfilled.  In the case that the copy is complete, last_backfill is
hobject_t::max().

More exactly, a local pg copy is described by a few pieces of information:
1) the local pg log
2) the local last_backfill
3) the local last_complete
4) the local missing set
The local pg store reflects all updates up to version last_complete on all
hobject_ts hoid such that hoid < last_backfill AND hoid is not in the missing
set.  Comparing the pg logs is used to fill in the missing set for OSDs which
were only down for a brief period thus avoiding a costly backfill in many cases.
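
That invariant translates almost directly into code; the types below are stand-ins for hobject_t and the missing set, not the real ones:

    #include <iostream>
    #include <set>

    using Hoid = unsigned; // stand-in for hobject_t's sortable identity

    struct LocalPgCopy {
        Hoid last_backfill;     // how far backfill has progressed
        std::set<Hoid> missing; // objects known to be stale or absent
    };

    // True if this local copy is up to date (as of last_complete) for hoid.
    bool reflects_updates(const LocalPgCopy& pg, Hoid hoid) {
        return hoid < pg.last_backfill && pg.missing.count(hoid) == 0;
    }

    int main() {
        LocalPgCopy pg{100, {42}};
        std::cout << reflects_updates(pg, 7)    // 1: below last_backfill
                  << reflects_updates(pg, 42)   // 0: in the missing set
                  << reflects_updates(pg, 150)  // 0: not yet backfilled
                  << "\n";
    }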

This is a bit of a rough brain dump and may be somewhat misleading/wrong.
I'll get it cleaned up and put it into
doc/dev/osd_internals/pg_recovery.rst next
week.

Also, rados objects currently have three pieces:
1) data - read, write, writefull, etc.
2) xattrs
3) omap
The omap is much like the xattrs except that it can generally store a much
larger number of keys and support efficient scans.  It's used at the moment
for a few things including rgw bucket indices.  The omap entries are copied
over along with the rest of the object in recovery.  Behind the scenes, all
omap entries for all objects stored on an OSD are stored prefixed in a single
big leveldb instance.
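
A sketch of that prefixed-key layout, using a sorted std::map in place of leveldb; the exact key format Ceph uses internally is assumed away here:

    #include <iostream>
    #include <map>
    #include <string>

    int main() {
        // One flat, sorted key/value store shared by every object's omap
        // on this OSD, standing in for the single big leveldb instance.
        std::map<std::string, std::string> store;
        auto omap_set = [&](const std::string& obj, const std::string& key,
                            const std::string& val) {
            store[obj + "/" + key] = val; // object name as the key prefix
        };

        omap_set(".dir.bucket1", "file-a", "entry a"); // e.g. rgw bucket index
        omap_set(".dir.bucket1", "file-b", "entry b");
        omap_set(".dir.bucket2", "file-c", "entry c");

        // Because keys sort by prefix, scanning one object's omap is a
        // cheap contiguous range scan over the shared store.
        const std::string prefix = ".dir.bucket1/";
        for (auto it = store.lower_bound(prefix);
             it != store.end() && it->first.compare(0, prefix.size(), prefix) == 0;
             ++it)
            std::cout << it->first << " = " << it->second << "\n";
    }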

omap operations probably shouldn't be supported on objects in an
ErasureCodedPG :)
-Sam


* Re: Ceph backfilling explained ( maybe )
From: Loic Dachary @ 2013-05-25 19:15 UTC
  To: Samuel Just; +Cc: Ceph Development

Hi !

On 05/25/2013 08:06 PM, Samuel Just wrote:
> Hi, thanks for taking the time to try to get all this documented!
> 
> Placement groups are assigned to a set of OSDs by CRUSH.
> 
> (4.1, osdmap(e 1)) --CRUSH--> [3,1,2]
> 
> where the primary is 3.  When 3 dies, the osdmap is updated to reflect this
> and we get a new mapping for pg 4.1:
> 
> (4.1, osdmap(e 2)) --CRUSH--> [1,2,4]
> 
> Here, 1 and 2 already have up-to-date copies of 4.1.  osd 4, however, needs
> to be brought up to date.  During peering, osd 1 will learn that osd 4
> falls into
> 1 of 2 cases.
> 
> Case 1 is that osd 4 already had an old copy of pg 4.1 AND its pg log for pg
> 4.1 happens to overlap osd 1's pg log for pg 4.1.  In that case, by running
> through the log of operations, we can determine exactly which objects need
> to be copied over.  We usually refer to this as just "recovery" (or log based
> recovery).
> 
> In case 2, osd 4's pg log does not overlap that of osd 1.  In this case,
> we cannot determine from the log which objects need to be copied over.
> To bring osd 4 up to date, we therefore need to backfill.
> 
> Backfill involves the primary and the backfill peer (there is only ever one in
> the acting set at a time, see PG::choose_acting) scanning over their pg stores
> and copying the objects which are different or missing from the primary to the
> backfill peer.  Because this may take a long time, we track a last_backfill
> attribute for each local pg copy indicating how far the local copy has been
> backfilled.  In the case that the copy is complete, last_backfill is
> hobject_t::max().

Is it true that if two OSDs briefly disconnect while backfilling, they may fall into case 1 above (i.e. log based recovery) and then resume backfilling when done, starting from last_backfill and up?

> More exactly, a local pg copy is described by a few pieces of information:
> 1) the local pg log

pg_log_t https://github.com/ceph/ceph/blob/master/src/osd/osd_types.h#L1371
pg_log_entry_t https://github.com/ceph/ceph/blob/master/src/osd/osd_types.h#L1277

> 2) the local last_backfill

pg_info_t::last_backfill https://github.com/ceph/ceph/blob/master/src/osd/osd_types.h#L1102

> 3) the local last_complete

pg_info_t::last_complete https://github.com/ceph/ceph/blob/master/src/osd/osd_types.h#L1089

> 4) the local missing set

pg_missing_t https://github.com/ceph/ceph/blob/master/src/osd/osd_types.h#L1468

> The local pg store reflects all updates up to version last_complete on all

I assume you mean 'local pg log' instead of 'local pg store'.

> hobject_ts hoid such that hoid < last_backfill AND hoid is not in the missing
> set.  Comparing the pg logs is used to fill in the missing set for OSDs which
> were only down for a brief period thus avoiding a costly backfill in many cases.

The pg logs are trimmed ( https://github.com/ceph/ceph/blob/master/src/osd/PG.cc#L216 ): is this why the pg logs of two OSDs that have been disconnected for too long are unlikely to overlap, and therefore require a backfill because the two pg logs cannot be compared?

> This is a bit of a rough brain dump and may be somewhat misleading/wrong.

It is very helpful as it is, thanks :-)

> I'll get it cleaned up and put it into
> doc/dev/osd_internals/pg_recovery.rst next
> week.
> 

That would be great. 

> Also, rados objects currently have three pieces:
> 1) data - read, write, writefull, etc.
> 2) xattrs
> 3) omap
> The omap is much like the xattrs except that it can generally store a much
> larger number of keys and support efficient scans.  It's used at the moment
> for a few things including rgw bucket indices.  The omap entries are copied
> over along with the rest of the object in recovery.  Behind the scenes, all
> omap entries for all objects stored on an OSD are stored prefixed in a single
> big leveldb instance.
> 
> omap operations probably shouldn't be supported on objects in an
> ErasureCodedPG :)

I thought omap / xattrs were mutually exclusive. I did not realize both could be used at the same time.

Cheers


-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.



* Re: Ceph backfilling explained ( maybe )
From: Leen Besselink @ 2013-05-26  5:22 UTC
  To: Ceph Development

On Sat, May 25, 2013 at 11:06:08AM -0700, Samuel Just wrote:
> Hi, thanks for taking the time to try to get all this documented!
> 
> Placement groups are assigned to a set of OSDs by CRUSH.
> 

Darn, silly me.

I made a stupid mistake: I meant it is the CRUSH algorithm, and RADOS is the protocol.


* Re: Ceph backfilling explained ( maybe )
From: Loic Dachary @ 2013-05-26 11:45 UTC
  To: Ceph Development

Hi,

Although I have yet to fully understand the logic of placement group recovery (I'm eager to read Sam's doc/dev/osd_internals/pg_recovery.rst :-), I wrote down my understanding of backfilling: http://dachary.org/?p=2009 .

Cheers


-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.


