From: "Yan, Zheng" <ukernel@gmail.com>
To: Barclay Jameson <almightybeeij@gmail.com>
Cc: Gregory Farnum <greg@gregs42.com>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
	ceph-users <ceph-users@ceph.com>
Subject: Re: [ceph-users] CephFS Slow writes with 1MB files
Date: Tue, 31 Mar 2015 11:59:43 +0800
Message-ID: <CAAM7YAkwtFLnFea3fNT_CaeA2R+_xdZONVN_s21k7_Rbd27H+Q@mail.gmail.com>
In-Reply-To: <CAMzumda7VserizM5PEoT8mwYnTHeE4CFiUiLjnjqN2xXaeNVQQ@mail.gmail.com>

On Sun, Mar 29, 2015 at 1:12 AM, Barclay Jameson
<almightybeeij@gmail.com> wrote:
> I redid my entire Ceph build, going back to CentOS 7, hoping to get
> the same performance I did last time.
> The rados bench results were the best I have ever had: 740 MB/s write
> and 1300 MB/s read. This was even better than the first rados bench
> run, which had performance equal to PanFS. I find that this does not
> translate to my CephFS. Even with the following tweaks (see the sketch
> just after the list) it is still at least twice as slow as PanFS and
> my first *Magical* build (which had absolutely no tweaking):
>
> OSD
>  osd_op_threads 8
>  /sys/block/sd*/queue/nr_requests 4096
>  /sys/block/sd*/queue/read_ahead_kb 4096
>
> Client
>  rsize=16777216
>  readdir_max_bytes=16777216
>  readdir_max_entries=16777216
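>
> Concretely, these are applied along the following lines (the device
> name, monitor address, and mount point here are just placeholders,
> and auth options are omitted):
>
>  # ceph.conf on the OSD nodes
>  [osd]
>      osd op threads = 8
>
>  # block queue tuning on the OSD nodes, per data disk
>  echo 4096 > /sys/block/sdb/queue/nr_requests
>  echo 4096 > /sys/block/sdb/queue/read_ahead_kb
>
>  # kernel client mount options
>  mount -t ceph mon1:6789:/ /mnt/cephfs \
>      -o rsize=16777216,readdir_max_bytes=16777216,readdir_max_entries=16777216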
>
> It takes ~160 mins to copy 100,000 1 MB files on CephFS vs ~50 mins on PanFS.
> Throughput on CephFS is about 10 MB/s vs 30 MB/s on PanFS.
>
> The strange thing is that none of the resources are taxed.
> CPU, RAM, network, and disks are not even close to being taxed on the
> client, the mon/mds node, or the OSD nodes.
> The PanFS client node was on a 10Gb network, the same as the CephFS
> client, yet you can see the huge difference in speed.
>
> As for Greg's questions before:
> There is only one client reading and writing (time cp Small1/*
> Small2/.), but three clients have CephFS mounted, although they aren't
> doing anything on the filesystem.
>
> I have done another test where I stream data into a file as fast as
> the processor can put it there
> (for (i = 0; i < 1000000001; i++) { fprintf(out_file, "I is : %d\n", i); }),
> and it is faster than on PanFS: CephFS wrote 16 GB in 105 seconds with
> the above tuning vs 130 seconds for PanFS. Without the tuning it takes
> 230 seconds for CephFS, although the first build did it in 130 seconds
> without any tuning.
>
> This leads me to believe the bottleneck is the MDS. Does anybody have
> any thoughts on this?
> Are there any tuning parameters that I would need to change to speed up the MDS?

Could you enable MDS debugging for a few seconds (ceph daemon mds.x
config set debug_mds 10; sleep 10; ceph daemon mds.x config set
debug_mds 0) and upload /var/log/ceph/mds.x.log somewhere?
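
Spelled out as a quick script, assuming the MDS daemon id really is "x"
and the log path matches your setup (adjust both if not):

  ceph daemon mds.x config set debug_mds 10
  sleep 10
  ceph daemon mds.x config set debug_mds 0
  # then share the resulting log, e.g. via ceph-post-file or any paste/upload service
  ceph-post-file /var/log/ceph/mds.x.log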

Regards
Yan, Zheng

>
> On Fri, Mar 27, 2015 at 4:50 PM, Gregory Farnum <greg@gregs42.com> wrote:
>> On Fri, Mar 27, 2015 at 2:46 PM, Barclay Jameson
>> <almightybeeij@gmail.com> wrote:
>>> Yes it's the exact same hardware except for the MDS server (although I
>>> tried using the MDS on the old node).
>>> I have not tried moving the MON back to the old node.
>>>
>>> My MDS cache size is set with "mds cache size = 10000000".
>>> The OSD nodes (3 of them) have 16 disks with 4 SSD journal disks.
>>> I created the data and metadata pools with 2048 PGs each:
>>> ceph osd pool create cephfs_data 2048 2048
>>> ceph osd pool create cephfs_metadata 2048 2048
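>>>
>>> (If it helps, the PG count and current pool usage can be sanity-checked
>>> with something like:
>>>   ceph osd pool get cephfs_metadata pg_num
>>>   ceph df
>>> )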
>>>
>>>
>>> To your point on clients competing against each other... how would I check that?
>>
>> Do you have multiple clients mounted? Are they both accessing files in
>> the directory(ies) you're testing? Were they accessing the same
>> pattern of files for the old cluster?
>>
>> If you happen to be running a Hammer RC or something pretty new, you
>> can use the MDS admin socket to explore a bit which client sessions
>> there are and what they have permissions on; otherwise you'll have to
>> figure it out from the client side.
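>>
>> (Concretely, something along the lines of
>>   ceph daemon mds.<id> session ls
>> run on the MDS host dumps the connected client sessions; <id> is
>> whatever your MDS daemon is named.)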
>> -Greg
>>
>>>
>>> Thanks for the input!
>>>
>>>
>>> On Fri, Mar 27, 2015 at 3:04 PM, Gregory Farnum <greg@gregs42.com> wrote:
>>>> So this is exactly the same test you ran previously, but now it's on
>>>> faster hardware and the test is slower?
>>>>
>>>> Do you have more data in the test cluster? One obvious possibility is
>>>> that previously you were working entirely in the MDS' cache, but now
>>>> you've got more dentries and so it's kicking data out to RADOS and
>>>> then reading it back in.
>>>>
>>>> If you've got the memory (you appear to), you can pump up the "mds
>>>> cache size" config option quite dramatically from its default of 100000.
>>>>
>>>> Other things to check are that you've got an appropriately-sized
>>>> metadata pool, that you've not got clients competing against each
>>>> other inappropriately, etc.
>>>> -Greg
>>>>
>>>> On Fri, Mar 27, 2015 at 9:47 AM, Barclay Jameson
>>>> <almightybeeij@gmail.com> wrote:
>>>>> Oops, I should have said that I am not just writing the data but copying it:
>>>>>
>>>>> time cp Small1/* Small2/*
>>>>>
>>>>> Thanks,
>>>>>
>>>>> BJ
>>>>>
>>>>> On Fri, Mar 27, 2015 at 11:40 AM, Barclay Jameson
>>>>> <almightybeeij@gmail.com> wrote:
>>>>>> I did a Ceph cluster install 2 weeks ago where I was getting great
>>>>>> performance (~= PanFS): I could write 100,000 1 MB files in 61
>>>>>> mins (it took PanFS 59 mins). I thought I could increase the
>>>>>> performance by adding a better MDS server, so I redid the entire build.
>>>>>>
>>>>>> Now it takes 4 times as long to write the same data as it did before.
>>>>>> The only thing that changed was the MDS server. (I even tried moving
>>>>>> the MDS back onto the old, slower node and the performance was the same.)
>>>>>>
>>>>>> The first install was on CentOS 7. I tried going down to CentOS 6.6
>>>>>> and got the same results.
>>>>>> I use the same scripts to install the OSDs (which I created because I
>>>>>> can never get ceph-deploy to behave correctly; I did use ceph-deploy
>>>>>> for the MDS, the MON, and the initial cluster creation, though).
>>>>>>
>>>>>> I use btrfs on the OSDs, as I can get 734 MB/s write and 1100 MB/s read
>>>>>> with "rados bench -p cephfs_data 500 write --no-cleanup && rados bench
>>>>>> -p cephfs_data 500 seq" (xfs was 734 MB/s write but only 200 MB/s read).
>>>>>>
>>>>>> Can anybody think of a reason why I am now seeing such a huge regression?
>>>>>>
>>>>>> Hardware Setup:
>>>>>> [OSDs]
>>>>>> 64 GB RAM @ 2133 MHz
>>>>>> Dual Proc E5-2630 v3 @ 2.40GHz (16 Cores)
>>>>>> 40Gb Mellanox NIC
>>>>>>
>>>>>> [MDS/MON new]
>>>>>> 128 GB RAM @ 2133 MHz
>>>>>> Dual Proc E5-2650 v3 @ 2.30GHz (20 Cores)
>>>>>> 40Gb Mellanox NIC
>>>>>>
>>>>>> [MDS/MON old]
>>>>>> 32 GB RAM @ 800 MHz
>>>>>> Dual Proc E5472 @ 3.00GHz (8 Cores)
>>>>>> 10Gb Intel NIC
