From mboxrd@z Thu Jan 1 00:00:00 1970
From: Gregory Farnum
Subject: Re: Auto-striping feature
Date: Tue, 25 Jan 2011 16:15:09 -0800
Message-ID:
References: <1408EC05-75B0-4DD8-B6D6-7F5D10AAC096@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Return-path:
Received: from mail-ey0-f174.google.com ([209.85.215.174]:58786 "EHLO mail-ey0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752465Ab1AZAPL (ORCPT ); Tue, 25 Jan 2011 19:15:11 -0500
Received: by eye27 with SMTP id 27so248759eye.19 for ; Tue, 25 Jan 2011 16:15:09 -0800 (PST)
In-Reply-To: <1408EC05-75B0-4DD8-B6D6-7F5D10AAC096@gmail.com>
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: Tatsuya Kawano
Cc: "ceph-devel@vger.kernel.org"

On Tue, Jan 25, 2011 at 3:53 AM, Tatsuya Kawano wrote:
>
> Hi,
>
> I have some questions about auto-striping feature in Ceph.
>
> - What is the default striping size?
The default is to stripe the file across 4MB objects, 4MB at a time. You can also define your own striping strategy using cephfs. Make sure that "stripe_unit" * "stripe_count" equals "object_size".

> - How can I specify the striping size for a specific file (via libceph and kernel driver)?
In the kernel, use the cephfs tool. It lets you use ioctls to specify a single file layout or to define the default layout for newly created files in a subtree of the fs. You can't do it in cfuse, unfortunately. (Although you can set the default using the kernel client and cfuse will follow that setting correctly.) If you're writing your own application using libceph, you can also set it; use the cephfs source as a model.

> - How many PGs will be involved on striping one file.
That depends on how large the file is, and is pseudorandom.

>
>
> I'm writing several files to Ceph and the size of each file will be about 64MB. There will be 10 to 20 OSDs in the cluster. I wonder how each file will be divided into objects and how these objects will be distributed in the cluster.
Well, with the default layout each file will be divided into objects at 4MB boundaries, so a 64MB file becomes 16 objects. (The last object may be short if the file size isn't a multiple of 4MB.) The objects will be distributed pseudorandomly into "placement groups" and those placement groups will be pseudorandomly distributed across the OSDs in the cluster. If you're interested in the specifics of how this works, I'd recommend reading Sage's thesis, available on the Ceph website.
-Greg
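
As a rough illustration of the object split described above, here is a minimal Python sketch. It assumes the default layout only (4MB objects, written 4MB at a time, so no interleaving across objects). The function name split_into_objects is hypothetical and is not part of cephfs or libceph. It models only how a file's bytes are cut into objects, not how objects are placed into placement groups or OSDs.

```python
MB = 1024 * 1024


def split_into_objects(file_size, object_size=4 * MB):
    """Return (object_index, length_in_bytes) pairs for a file.

    Assumption: default layout, i.e. the file is cut into consecutive
    object_size chunks with no striping across multiple objects.
    """
    objects = []
    offset = 0
    index = 0
    while offset < file_size:
        # The final object holds whatever is left, so it may be short.
        length = min(object_size, file_size - offset)
        objects.append((index, length))
        offset += length
        index += 1
    return objects


if __name__ == "__main__":
    for size_mb in (64, 66):
        layout = split_into_objects(size_mb * MB)
        last_index, last_len = layout[-1]
        print(f"{size_mb}MB file: {len(layout)} objects, "
              f"last object {last_len // MB}MB")
```

Under these assumptions a 64MB file yields 16 full objects, while a 66MB file yields 17 objects with a 2MB final object.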