* Forcing Ceph into mapping all objects to a single PG
@ 2014-07-21 22:27 Daniel Hofmann
  2014-07-21 22:50 ` Gregory Farnum
  2014-07-31  0:29 ` Daniel Hofmann
  0 siblings, 2 replies; 8+ messages in thread
From: Daniel Hofmann @ 2014-07-21 22:27 UTC (permalink / raw)
  To: ceph-devel

Preamble: you might want to read the decently formatted version of this
mail at:
> https://gist.github.com/daniel-j-h/2daae2237bb21596c97d


Motivation
----------

Recently I was wondering how Ceph (i.e. Rados in particular) maps objects
to Placement Groups (PGs).

It basically works like the following:

    pg = hash(object)
    osd = crush(pg)

By hashing the object's name (not the content!) Ceph determines the PG.
For assigning PGs to OSDs, the CRUSH algorithm is used.
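
To make the two steps concrete, here is a rough sketch. Both steps are
stand-ins (a toy FNV-1a hash and a plain modulo rather than the real
rjenkins/CRUSH code); only the structure -- name to PG to OSD -- mirrors
the actual pipeline:

    #include <cstdint>
    #include <iostream>
    #include <string>

    // Toy stand-in for Ceph's rjenkins name hash (FNV-1a, illustration only).
    uint32_t toy_hash(const std::string& name) {
      uint32_t h = 2166136261u;
      for (unsigned char c : name) { h ^= c; h *= 16777619u; }
      return h;
    }

    int main() {
      const uint32_t pg_num   = 128;  // hypothetical pool size
      const uint32_t num_osds = 6;    // hypothetical cluster size

      std::string object_name("my-object");
      uint32_t pg  = toy_hash(object_name) % pg_num;  // pg  = hash(object)
      uint32_t osd = pg % num_osds;                   // osd = crush(pg), crudely
      std::cout << object_name << " -> pg " << pg << " -> osd " << osd << "\n";
    }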

But what happens if the hash function always returns the same value,
i.e. what happens on hash collisions?
Does Ceph really map objects to the same PG then?

I asked about this on the IRC channel and was a bit surprised to get the
response that if Ceph really maps all objects to a single PG, I'm simply
out of luck:

> with enough PG and enough objects of reasonable sizes, it should yield
a decent data distribution

This confirms that the hash function is indeed responsible for the
distribution of objects to PGs.

So, what could possibly go wrong? Let's try this!


Setting up the environment
--------------------------

I modified the librados usage example
(src/examples/librados/hello_world.cc) to write objects with specific
names:

    std::string object_name("\x41\x5b\xd2\xc2\xc3");  // bear with me here
    //std::string object_name("\x41\x1e\xf8\x5e\x22");
    //std::string object_name("\x01\x01\x01\x17\x2e");

After setting up a local cluster via the vstart.sh tool and writing
those three objects, executing

    ./ceph pg dump | less -S

indeed shows our three **objects getting mapped to the same PG**!


Collisions
----------

How is it possible to come up with such colliding names? It's actually
quite simple:

Ceph defines two hash functions (src/common/ceph_hash.cc), of which the
so-called rjenkins hash function is used by default.

(Note: src/crush/hash.c actually implements the rjenkins function again
-- why?)

The hash function's implementation is deterministic in the sense that
the output depends only on its input.
No random or setup-specific seed is used.

What implications does this have?

Once we compute collisions, we're able to force the object mapping on
**every rados object store out there**.
And because the hash function's output is only 32 bits wide, collisions
are not that hard to find at all!

Here are a few if you're following along and want to test your setup:

| hash (u32) |byte sequence (hex) |
| -----------|--------------------|
| 381933444  | 41 5b d2 c2 c3     |
| 381933444  | 41 1e f8 5e 22     |
| 381933444  | 1 1 1 17 2e        |
| 381933444  | 1 a4 34 f6 c0      |
| 381933444  | 42 ca 78 4f 47     |
| 381933444  | 82 e8 ea fe 16     |


Implications
-------------

I did this as a proof of concept in a few hours, but I'm still not sure
about the full implications.
I'm able to force object mappings. Can this be used as a DoS attack
vector? Maybe.

I looked through the RadosGW source to find out how object names are
generated there.
I only found the second hash function implementation being used -- but
I'm not familiar enough with the source to say more about it at this time.


Open questions
--------------

I'd like to know more about what impact this could have. In particular:

* How do higher-level APIs (e.g. RadosGW) determine the object's name?
Can this be influenced or set by the user?
* Is it possible to introduce some randomness in order to mitigate the
scenario described above? Maybe on a per-cluster basis, or derived from
some internal state. (Note: SipHash does something similar.)


Summary
-------

Ceph's object mapping depends on the rjenkins hash function.
It's possible to force Ceph into mapping all objects to a single PG.

Please discuss!

Thanks,
Daniel J. Hofmann


* Re: Forcing Ceph into mapping all objects to a single PG
  2014-07-21 22:27 Forcing Ceph into mapping all objects to a single PG Daniel Hofmann
@ 2014-07-21 22:50 ` Gregory Farnum
  2014-07-22 22:44   ` Alex Elsayed
  2014-07-31  0:29 ` Daniel Hofmann
  1 sibling, 1 reply; 8+ messages in thread
From: Gregory Farnum @ 2014-07-21 22:50 UTC (permalink / raw)
  To: Daniel Hofmann; +Cc: ceph-devel

On Mon, Jul 21, 2014 at 3:27 PM, Daniel Hofmann <daniel@trvx.org> wrote:
> Preamble: you might want to read the decently formatted version of this
> mail at:
>> https://gist.github.com/daniel-j-h/2daae2237bb21596c97d
>
>
> Motivation
> ----------
>
> Recently I was wondering how Ceph (i.e. Rados in particular) maps objects
> to Placement Groups (PGs).
>
> It basically works like the following:
>
>     pg = hash(object)
>     osd = crush(pg)
>
> By hashing the object's name (not the content!) Ceph determines the PG.
> For assigning PGs to OSDs, the CRUSH algorithm is used.
>
> But what happens if the hash function always returns the same value,
> i.e. what happens on hash collisions?
> Does Ceph really map objects to the same PG then?
>
> I asked about this on the IRC channel and was a bit surprised to get the
> response that if Ceph really maps all objects to a single PG, I'm simply
> out of luck:
>
>> with enough PG and enough objects of reasonable sizes, it should yield
> a decent data distribution
>
> This confirms that the hash function is indeed responsible for the
> distribution of objects to PGs.
>
> So, what could possibly go wrong? Let's try this!
>
>
> Setting up the environment
> --------------------------
>
> I modified the librados usage example
> (src/examples/librados/hello_world.cc) to write objects with specific
> names:
>
>     std::string object_name("\x41\x5b\xd2\xc2\xc3");  // bear with me here
>     //std::string object_name("\x41\x1e\xf8\x5e\x22");
>     //std::string object_name("\x01\x01\x01\x17\x2e");
>
> After setting up a local cluster via the vstart.sh tool and writing
> those three objects, executing
>
>     ./ceph pg dump | less -S
>
> indeed shows our three **objects getting mapped to the same PG**!
>
>
> Collisions
> ----------
>
> How is it possible to come up with such colliding names? It's actually
> quite simple:
>
> Ceph defines two hash functions (src/common/ceph_hash.cc), of which the
> so-called rjenkins hash function is used by default.
>
> (Note: src/crush/hash.c actually implements the rjenkins function again
> -- why?)
>
> The hash function's implementation is deterministic in the sense that
> the output depends only on its input.
> No random or setup-specific seed is used.
>
> What implications does this have?
>
> Once we compute collisions, we're able to force the object mapping on
> **every rados object store out there**.
> And because the hash function's output is only 32 bits wide, collisions
> are not that hard to find at all!
>
> Here are a few if you're following along and want to test your setup:
>
> | hash (u32) |byte sequence (hex) |
> | -----------|--------------------|
> | 381933444  | 41 5b d2 c2 c3     |
> | 381933444  | 41 1e f8 5e 22     |
> | 381933444  | 1 1 1 17 2e        |
> | 381933444  | 1 a4 34 f6 c0      |
> | 381933444  | 42 ca 78 4f 47     |
> | 381933444  | 82 e8 ea fe 16     |
>
>
> Implications
> -------------
>
> I did this as a proof of concept in a few hours, but I'm still not sure
> about the full implications.
> I'm able to force object mappings. Can this be used as a DoS attack
> vector? Maybe.
>
> I looked through the RadosGW source to find out how object names are
> generated there.
> I only found the second hash function implementation being used -- but
> I'm not familiar enough with the source to say more about it at this time.
>
>
> Open questions
> --------------
>
> I'd like to know more about what impact this could have. In particular:
>
> * How do higher-level APIs (e.g. RadosGW) determine the object's name?
> Can this be influenced or set by the user?
> * Is it possible to introduce some randomness in order to mitigate the
> scenario described above? Maybe on a per-cluster basis, or derived from
> some internal state. (Note: SipHash does something similar.)
>
>
> Summary
> -------
>
> Ceph's object mapping depends on the rjenkins hash function.
> It's possible to force Ceph into mapping all objects to a single PG.
>
> Please discuss!

Yes, this is an attack vector. It functions against...well, any system
using hash-based placement.
RGW mangles names on its own, although the mangling is deterministic
enough that an attacker could perhaps manipulate it into mangling them
onto the same PG. (Within the constraints, though, it'd be pretty
difficult.)
RBD names objects in a way that users can't really control, so I guess
it's safe, sort of? (But users of rbd will still have write permission
to some class of objects in which they may be able to find an attack.)

The real issue, though, is that any user with permission to write to
*any* set of objects directly in the cluster will be able to exploit
this regardless of what barriers we erect. Deterministic placement, in
that anybody directly accessing the cluster can compute data
locations, is central to Ceph's design. We could add "salts" or
something to try and prevent attackers from *outside* the direct set
(eg, users of RGW) exploiting it directly, but anybody who can read or
write from the cluster would need to be able to read the salt in order
to compute locations themselves. So I'm pretty sure this attack vector
is:
1) Inherent to all hash-placement systems,
2) not something we can reasonably defend against *anyway*.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


* Re: Forcing Ceph into mapping all objects to a single PG
  2014-07-21 22:50 ` Gregory Farnum
@ 2014-07-22 22:44   ` Alex Elsayed
  2014-07-22 22:46     ` Alex Elsayed
  2014-07-25 12:26     ` Daniel Hofmann
  0 siblings, 2 replies; 8+ messages in thread
From: Alex Elsayed @ 2014-07-22 22:44 UTC (permalink / raw)
  To: ceph-devel

Gregory Farnum wrote:

> On Mon, Jul 21, 2014 at 3:27 PM, Daniel Hofmann <daniel@trvx.org> wrote:
>> Preamble: you might want to read the decent formatted version of this
>> mail at:
>>> https://gist.github.com/daniel-j-h/2daae2237bb21596c97d
<snip aggressively>
>> -------
>>
>> Ceph's object mapping depends on the rjenkins hash function.
>> It's possible to force Ceph into mapping all objects to a single PG.
>>
>> Please discuss!
> 
> Yes, this is an attack vector. It functions against...well, any system
> using hash-based placement.

Sort of. How well it functions is a function (heh) of how easy it is to find 
a preimage against the hash (a collision only gives you a pair; you need 
preimages to get beyond that).

With fletcher4, preimages aren't particularly difficult to find. With a more 
robust hash[1], preimages become more computationally expensive, since you 
need to brute-force each value rather than exploit a weakness in the 
algorithm.

This doesn't buy a huge amount, since the brute-force effort per iteration is 
still bounded by the number of PGs, but it does help - and it means that as 
PGs are split, resistance to the attack increases as well.

> RGW mangles names on its own, although the mangling is deterministic
> enough that an attacker could perhaps manipulate it into mangling them
> onto the same PG. (Within the constraints, though, it'd be pretty
> difficult.)
> RBD names objects in a way that users can't really control, so I guess
> it's safe, sort of? (But users of rbd will still have write permission
> to some class of objects in which they may be able to find an attack.)
> 
> The real issue, though, is that any user with permission to write to
> *any* set of objects directly in the cluster will be able to exploit
> this regardless of what barriers we erect. Deterministic placement, in
> that anybody directly accessing the cluster can compute data
> locations, is central to Ceph's design. We could add "salts" or
> something to try and prevent attackers from *outside* the direct set
> (eg, users of RGW) exploiting it directly, but anybody who can read or
> write from the cluster would need to be able to read the salt in order
> to compute locations themselves.

Actually, doing (say) per-pool salts does help in a notable way: even 
someone who can write to two pools can't reuse the computation of colliding 
values across pools. It forces them to expend the work factor for each pool 
they attack, rather than being able to amortize.

> So I'm pretty sure this attack vector
> is:
> 1) Inherent to all hash-placement systems,
> 2) not something we can reasonably defend against *anyway*.

I'd agree that in the absolute sense it's inherent and insoluble, but that 
doesn't imply that _mitigations_ are worthless.

A more drastic option would be to look at how the sfq network scheduler 
handles it - it hashes flows onto a fixed number of queues, and gets around 
collisions by periodically perturbing the salt (resulting in a _stochastic_ 
avoidance of clumping). It'd definitely require some research to find a way 
to do this such that it doesn't cause huge data movement, but it might be 
worth thinking about for the longer term.
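
To make the trade-off concrete, here's a toy sketch (a stand-in FNV hash
rather than sfq's actual hash; the point is only that rotating the salt
re-shuffles nearly every name, which for PGs translates into data movement):

    #include <cstdint>
    #include <cstdio>
    #include <string>
    #include <vector>

    // Stand-in 32-bit hash (FNV-1a); only meant to illustrate the effect of
    // perturbing the salt, not how sfq or Ceph actually hash.
    uint32_t toy_hash(const std::string& s) {
      uint32_t h = 2166136261u;
      for (unsigned char c : s) { h ^= c; h *= 16777619u; }
      return h;
    }

    uint32_t bucket(const std::string& name, uint32_t salt, uint32_t num_buckets) {
      return toy_hash(std::to_string(salt) + '\0' + name) % num_buckets;
    }

    int main() {
      const uint32_t num_buckets = 128;
      std::vector<std::string> names;
      for (int i = 0; i < 1000; ++i) names.push_back("object-" + std::to_string(i));

      // Changing the salt breaks up any clumping under the old salt, but at the
      // price that most names land in a different bucket afterwards.
      int moved = 0;
      for (const std::string& n : names)
        if (bucket(n, /*salt=*/1, num_buckets) != bucket(n, /*salt=*/2, num_buckets))
          ++moved;
      std::printf("%d of %zu names changed buckets after perturbation\n",
                  moved, names.size());
    }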

[1] I'm thinking along the lines of SipHash, not any heavy-weight 
cryptographic hash; however with network latencies on the table those might 
not be too bad regardless



* Re: Forcing Ceph into mapping all objects to a single PG
  2014-07-22 22:44   ` Alex Elsayed
@ 2014-07-22 22:46     ` Alex Elsayed
  2014-07-25 12:26     ` Daniel Hofmann
  1 sibling, 0 replies; 8+ messages in thread
From: Alex Elsayed @ 2014-07-22 22:46 UTC (permalink / raw)
  To: ceph-devel

Gah, typed "fletcher4" when I meant "rjenkins" - still, the same applies.



* Re: Forcing Ceph into mapping all objects to a single PG
  2014-07-22 22:44   ` Alex Elsayed
  2014-07-22 22:46     ` Alex Elsayed
@ 2014-07-25 12:26     ` Daniel Hofmann
  2014-07-25 14:12       ` Sage Weil
  1 sibling, 1 reply; 8+ messages in thread
From: Daniel Hofmann @ 2014-07-25 12:26 UTC (permalink / raw)
  To: Alex Elsayed, ceph-devel

The main issue, however, is not the hash's strength but the fact that,
once precomputed, I'm able to use preimages on **every Ceph cluster out
there**. (The hash function's output is a deterministic function of
the object's name only.)

I agree in that the general issue is inherent in hash-placement systems.

But what I don't agree with is the following:

Why do I have to be able to calculate my object's placement for **every
Ceph cluster** out there?

Why does it not suffice for me to be able to calculate the placement
only for the cluster I'm currently accessing?

From a logical standpoint it seems reasonable. Why then, are we not able
to constrain the placement calculation in that regard?


If the placement is bound to a specific cluster it should suffice to
derive e.g. a key for SipHash based on cluster specifics.

Is this doable from an implementation point of view?
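
To illustrate what I have in mind, a minimal sketch with a stand-in hash
(the real construction would be a properly keyed function such as SipHash,
and, as Gregory pointed out, any client that computes placements would
still need to be able to read the key):

    #include <cstdint>
    #include <cstdio>
    #include <string>

    // Stand-in 32-bit hash; the real thing would be a keyed function such as
    // SipHash, not FNV-1a with a prefixed key.
    uint32_t toy_hash(const std::string& s) {
      uint32_t h = 2166136261u;
      for (unsigned char c : s) { h ^= c; h *= 16777619u; }
      return h;
    }

    // Mix a cluster-specific key (e.g. derived from the fsid) into the name hash.
    uint32_t keyed_name_hash(const std::string& cluster_key, const std::string& name) {
      return toy_hash(cluster_key + '\0' + name);   // crude domain separation
    }

    int main() {
      const std::string name("\x41\x5b\xd2\xc2\xc3");  // collides under plain rjenkins
      const uint32_t pg_num = 128;

      // The same name maps to different PGs on clusters with different keys,
      // so a precomputed collision set no longer transfers between clusters.
      std::printf("cluster A: pg %u\n", keyed_name_hash("fsid-aaaa", name) % pg_num);
      std::printf("cluster B: pg %u\n", keyed_name_hash("fsid-bbbb", name) % pg_num);
    }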


Note: I only did this as a proof of concept for the object store.
Think about the implications if you're able to do this e.g. for every
RadosGW out there and for services using RadosGW.


* Re: Forcing Ceph into mapping all objects to a single PG
  2014-07-25 12:26     ` Daniel Hofmann
@ 2014-07-25 14:12       ` Sage Weil
  2014-07-25 18:14         ` Alex Elsayed
  0 siblings, 1 reply; 8+ messages in thread
From: Sage Weil @ 2014-07-25 14:12 UTC (permalink / raw)
  To: Daniel Hofmann; +Cc: Alex Elsayed, ceph-devel

On Fri, 25 Jul 2014, Daniel Hofmann wrote:
> The main issue, however, is not the hash's strength but the fact that,
> once precomputed, I'm able to use preimages on **every Ceph cluster out
> there**. (The hash function's output is a deterministic function of
> the object's name only.)
> 
> I agree in that the general issue is inherent in hash-placement systems.
> 
> But what I don't agree with is the following:
> 
> Why do I have to be able to calculate my object's placement for **every
> Ceph cluster** out there?
> 
> Why does it not suffice for me to be able to calculate the placement
> only for the cluster I'm currently accessing?
> 
> From a logical standpoint it seems reasonable. Why then, are we not able
> to constrain the placement calculation in that regard?
> 
> 
> If the placement is bound to a specific cluster it should suffice to
> derive e.g. a key for SipHash based on cluster specifics.
> 
> Is this doable from an implementation point of view?
> 
> 
> Note: I only did this as a proof of concept for the object store.
> Think about the implications if you're able to do this e.g. for every
> RadosGW out there and for services using RadosGW.

It would be really easy to add a random salt to the pg_pool_t and feed 
that into the object -> pg hash mapping.  Note, by the way, that for new 
clusters the pool id is already fed in here, so there is a *tiny* bit of 
variation depending on what order the pools were created in (although 
probably not enough to meaningfully improve security).

We could also add a new hash type that is more secure.  Rjenkins is used 
by default but the choice of hash is already parameterized.

sage


* Re: Forcing Ceph into mapping all objects to a single PG
  2014-07-25 14:12       ` Sage Weil
@ 2014-07-25 18:14         ` Alex Elsayed
  0 siblings, 0 replies; 8+ messages in thread
From: Alex Elsayed @ 2014-07-25 18:14 UTC (permalink / raw)
  To: ceph-devel

Sage Weil wrote:

> On Fri, 25 Jul 2014, Daniel Hofmann wrote:
>> The main issue, however, is not the hash's strength but the fact that,
>> once precomputed, I'm able to use preimages on **every Ceph cluster out
>> there**. (The hash function's output is a deterministic function of
>> the object's name only.)
>> 
>> I agree in that the general issue is inherent in hash-placement systems.
>> 
>> But what I don't agree with is the following:
>> 
>> Why do I have to be able to calculate my object's placement for **every
>> Ceph cluster** out there?
>> 
>> Why does it not suffice for me to be able to calculate the placement
>> only for the cluster I'm currently accessing?
>> 
>> From a logical standpoint it seems reasonable. Why then, are we not able
>> to constrain the placement calculation in that regard?
>> 
>> 
>> If the placement is bound to a specific cluster it should suffice to
>> derive e.g. a key for SipHash based on cluster specifics.
>> 
>> Is this doable from an implementation point of view?
>> 
>> 
>> Note: I only did this as a proof of concept for the object store.
>> Think about the implications if you're able to do this e.g. for every
>> RadosGW out there and for services using RadosGW.
> 
> It would be really easy to add a random salt to the pg_pool_t and feed
> that into the object -> pg hash mapping.  Note, by the way, that for new
> clusters the pool id is already fed in here, so there is a *tiny* bit of
> variation depending on what order the pools were created in (although
> probably not enough to meaningfully improve security).

I just had a thought - is the FSID a parameter? If it is, then there's 
actually no need for a further salt: FSID is per cluster, preventing cross-
cluster precomputation, and if the pool ID is fed in then each pool needs 
collisions calculated separately as well. In other words, the (fsid, poolid) 
tuple is almost certainly sufficient salt in and of itself.

> We could also add a new hash type that is more secure.  Rjenkins is used
> by default but the choice of hash is already parameterized.
> 
> sage





* Re: Forcing Ceph into mapping all objects to a single PG
  2014-07-21 22:27 Forcing Ceph into mapping all objects to a single PG Daniel Hofmann
  2014-07-21 22:50 ` Gregory Farnum
@ 2014-07-31  0:29 ` Daniel Hofmann
  1 sibling, 0 replies; 8+ messages in thread
From: Daniel Hofmann @ 2014-07-31  0:29 UTC (permalink / raw)
  To: ceph-devel

Follow-up: I wrote a small Python script for evaluating the quality of the
object-to-placement-group mapping. It uses a chi-squared statistical
hypothesis test.

Here is the script, along with example output for a few rados benchmark
writes:

> https://gist.github.com/daniel-j-h/d7d87dfe5de3c5bbfd0f

Maybe you're interested in checking your cluster's mapping quality.

More on the method and the output's meaning:
> https://en.wikipedia.org/wiki/Chi-squared_test
> http://en.wikipedia.org/wiki/P-value
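
The statistic itself is simple; here is a minimal sketch of the idea (not
the linked script -- the per-PG counts would come from ceph pg dump, and
the resulting value still has to be checked against the chi-squared
distribution to get a p-value):

    #include <cstdio>
    #include <vector>

    int main() {
      // Hypothetical per-PG object counts, e.g. taken from "ceph pg dump".
      std::vector<long> counts = {103, 97, 111, 89, 95, 102, 108, 99};

      long total = 0;
      for (long c : counts) total += c;
      const double expected = double(total) / counts.size();   // uniform mapping

      double chi2 = 0.0;
      for (long c : counts) {
        const double d = c - expected;
        chi2 += d * d / expected;
      }

      // Compare chi2 against the chi-squared distribution with
      // counts.size() - 1 degrees of freedom to judge uniformity.
      std::printf("chi^2 = %.2f, df = %zu\n", chi2, counts.size() - 1);
    }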

I'm still interested in a cluster-specific mapping as discussed in the
other mailing list posts.

Cheers,
Daniel

