* MDS has inconsistent performance
@ 2015-01-13  6:17 Michael Sevilla
  2015-01-13 19:13 ` Gregory Farnum
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Sevilla @ 2015-01-13  6:17 UTC (permalink / raw)
  To: ceph-devel

I can't get consistent performance with 1 MDS. I have 2 clients create
100,000 files (separate directories) in a CephFS mount. I ran the
experiment 5 times (deleting the pools/fs and restarting the MDS in
between each run). I graphed the metadata throughput (requests per
second): https://github.com/michaelsevilla/mds/blob/master/graphs/thruput.png

Sometimes (run0, run3), both clients issue 2 lookups per create to the
MDS - this makes throughput high but the runtime long since the MDS
processes many more requests.
Sometimes (run2, run4), 1 client does 2 lookups per create and the
other doesn't do any lookups.
Sometimes (run1), neither client does any lookups - this has the
fastest runtime.

Does anyone know why the client behaves differently for the same exact
experiment? Reading the client logs, it looks like sometimes the
client enters add_update_cap() and clears the inode->flags in
check_cap_issue(), then when a lookup occurs (in _lookup()), the
client can't return ENOENT locally -- forcing it to ask the MDS to do the
lookup. But this only happens sometimes (e.g., run0 and run3).
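
For reference, here's my reading of that interaction, as a simplified
sketch (function names match src/client/Client.cc, but the bodies are
illustrative, not the exact code):

    #include <set>
    #include <string>
    #include <cerrno>

    static const unsigned I_COMPLETE           = 1;  // dcache covers the whole dir
    static const unsigned I_DIR_ORDERED        = 2;
    static const unsigned CEPH_CAP_FILE_SHARED = 4;  // the "Fs" cap

    struct Inode {
      unsigned flags = I_COMPLETE | I_DIR_ORDERED;  // fresh, empty directory
      unsigned issued = 0;                          // caps currently held
      std::set<std::string> dentries;               // cached entries of the dir
    };

    // Called from add_update_cap() when the MDS grants caps. If a dir
    // newly gains Fs, another client may be touching it, so the cached
    // view can no longer be trusted to be complete.
    void check_cap_issue(Inode *dir, unsigned newly_issued) {
      if ((newly_issued & CEPH_CAP_FILE_SHARED) &&
          !(dir->issued & CEPH_CAP_FILE_SHARED))
        dir->flags &= ~(I_COMPLETE | I_DIR_ORDERED);  // the clear I see
      dir->issued |= newly_issued;
    }

    // Only a dir still flagged I_COMPLETE can answer a negative lookup
    // locally; otherwise every create pays the lookup round trips.
    int lookup(Inode *dir, const std::string &name) {
      if ((dir->flags & I_COMPLETE) && dir->dentries.count(name) == 0)
        return -ENOENT;  // answered from the client cache, no MDS trip
      return 0;          // here the real client sends a lookup to the MDS
    }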

Details of the experiment:
Workload: 2 clients, 100,000 creates in separate directories, using
the FUSE client
MDS config: client_cache_size = 100000000, mds_cache_size = 16384000
Cluster: 18 OSDs, 1 MDS, 1 MON, data/metadata pools have 4096 PGs
Ceph version 0.90-877-gc219c43 (c219c43cc2943c794378214d77566e3f0d3f394a)

Thanks!

Michael


* Re: MDS has inconsistent performance
  2015-01-13  6:17 MDS has inconsistent performance Michael Sevilla
@ 2015-01-13 19:13 ` Gregory Farnum
  2015-01-13 23:45   ` Michael Sevilla
  0 siblings, 1 reply; 12+ messages in thread
From: Gregory Farnum @ 2015-01-13 19:13 UTC (permalink / raw)
  To: Michael Sevilla; +Cc: ceph-devel

On Mon, Jan 12, 2015 at 10:17 PM, Michael Sevilla
<mikesevilla3@gmail.com> wrote:
> I can't get consistent performance with 1 MDS. I have 2 clients create
> 100,000 files (separate directories) in a CephFS mount. I ran the
> experiment 5 times (deleting the pools/fs and restarting the MDS in
> between each run). I graphed the metadata throughput (requests per
> second): https://github.com/michaelsevilla/mds/blob/master/graphs/thruput.png

So that top line is ~20,000 processed requests/second, as measured at
the MDS? (Looking at perfcounters?) And the fast run is doing 10k
create requests/second? (This number is much higher than I expected!)
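
(By perfcounters I mean the admin socket dump, e.g. something like
"ceph --admin-daemon /var/run/ceph/ceph-mds.a.asok perf dump" --
socket path and daemon name depend on your setup.)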

> Sometimes (run0, run3), both clients issue 2 lookups per create to the
> MDS - this makes throughput high but the runtime long since the MDS
> processes many more requests.
> Sometimes (run2, run4), 1 client does 2 lookups per create and the
> other doesn't do any lookups.
> Sometimes (run1), neither client does any lookups - this has the
> fastest runtime.
>
> Does anyone know why the client behaves differently for the same exact
> experiment? Reading the client logs, it looks like sometimes the
> client enters add_update_cap() and clears the inode->flags in
> check_cap_issue(), then when a lookup occurs (in _lookup()), the
> client can't return ENOENT locally -- forcing it to ask the MDS to do the
> lookup. But this only happens sometimes (e.g., run0 and run3).

If you provide the logs I can check more carefully, but my guess is
that you've got another client mounting it, or are looking at both
directories from one of the clients, and this is inadvertently causing
them to go into shared rather than exclusive mode.

How are you trying to keep the directories private during the
workload? Some of the more naive solutions won't stand up to
repetitive testing given how various components of the system
currently behave.

>
> Details of the experiment:
> Workload: 2 clients, 100,000 creates in separate directories, using
> the FUSE client
> MDS config: client_cache_size = 100000000, mds_cache_size = 16384000

That client_cache_size only has any effect if it's applied to the
client-side config. ;)
-Greg


* Re: MDS has inconsistent performance
  2015-01-13 19:13 ` Gregory Farnum
@ 2015-01-13 23:45   ` Michael Sevilla
  2015-01-15 19:28     ` Gregory Farnum
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Sevilla @ 2015-01-13 23:45 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel

On Tue, Jan 13, 2015 at 11:13 AM, Gregory Farnum <greg@gregs42.com> wrote:
> On Mon, Jan 12, 2015 at 10:17 PM, Michael Sevilla
> <mikesevilla3@gmail.com> wrote:
>> I can't get consistent performance with 1 MDS. I have 2 clients create
>> 100,000 files (separate directories) in a CephFS mount. I ran the
>> experiment 5 times (deleting the pools/fs and restarting the MDS in
>> between each run). I graphed the metadata throughput (requests per
>> second): https://github.com/michaelsevilla/mds/blob/master/graphs/thruput.png
>
> So that top line is ~20,000 processed requests/second, as measured at
> the MDS? (Looking at perfcounters?) And the fast run is doing 10k
> create requests/second? (This number is much higher than I expected!)

Yes - top line was 20K req/s from perf counter dump and the fast run
does about 13K creates/s. We were surprised, too... In fact, 1 client
per MDS gives us performance similar to IndexFS - a system that came
out in a paper at Supercomputing this year. Here is a throughput
graph, normalized to the # of clients, that
shows how powerful one MDS can actually be:
https://github.com/michaelsevilla/mds/blob/master/graphs/thruput-norm.png

Keep in mind that runs with more than 1 client aren't creates/s, but ops/sec. ;)

>
>> Sometimes (run0, run3), both clients issue 2 lookups per create to the
>> MDS - this makes throughput high but the runtime long since the MDS
>> processes many more requests.
>> Sometimes (run2, run4), 1 client does 2 lookups per create and the
>> other doesn't do any lookups.
>> Sometimes (run1), neither client does any lookups - this has the
>> fastest runtime.
>>
>> Does anyone know why the client behaves differently for the same exact
>> experiment? Reading the client logs, it looks like sometimes the
>> client enters add_update_cap() and clears the inode->flags in
>> check_cap_issue(), then when a lookup occurs (in _lookup()), the
>> client can't return ENOENT locally -- forcing it to ask the MDS to do the
>> lookup. But this only happens sometimes (e.g., run0 and run3).
>
> If you provide the logs I can check more carefully, but my guess is
> that you've got another client mounting it, or are looking at both
> directories from one of the clients, and this is inadvertently causing
> them to go into shared rather than exclusive mode.

I think you are right! Here is a subset of the client log:
https://github.com/michaelsevilla/mds/blob/master/scratch/client0.log

These snippets zoom in on the point where the client stops sending
"create, create, create, create..." and starts sending "lookup,
lookup, create, lookup, lookup, create..."

$ cat client0.log | grep "send_request client"
create ...file.2098
create ...file.2099
create ...file.2100
create ...file.2101
lookup ...file.2102
lookup ...file.2102
create ...file.2102
lookup ...file.2103
lookup ...file.2103
create ...file.2103
lookup ...file.2104
lookup ...file.2104
create ...file.2104

I think what you are looking for is on line 687:
... clearing (I_COMPLETE|I_DIR_ORDERED)
... add_update_cap issued pAsLsXs -> pAsLsXsFsx

It looks like we lose the exclusive mode on the file... but I don't
understand why the MDS revokes it for 1 client but not the other. The
MDS log is here:
https://raw.githubusercontent.com/michaelsevilla/mds/master/scratch/mds.log


>
> How are you trying to keep the directories private during the
> workload? Some of the more naive solutions won't stand up to
> repetitive testing given how various components of the system
> currently behave.
Is there a way to keep the directories private (i.e. keep them always
in exclusive mode)? That'd be perfect... In my runs, one client does
mkdir /mnt/cephfs/dir0 and the other does mkdir /mnt/cephfs/dir1...
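
In case it helps, each client runs the equivalent of this (a minimal
sketch of my workload -- the real runs go through the FUSE mount just
like this, error handling omitted):

    #include <cstdio>
    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main() {
      const char *dir = "/mnt/cephfs/dir0";  // client1 uses .../dir1
      mkdir(dir, 0755);                      // each client makes its own dir
      char path[128];
      for (int i = 0; i < 100000; i++) {     // 100,000 creates per client
        snprintf(path, sizeof(path), "%s/file.%d", dir, i);
        int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);
        if (fd >= 0)
          close(fd);
      }
      return 0;
    }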

>
>>
>> Details of the experiment:
>> Workload: 2 clients, 100,000 creates in separate directories, using
>> the FUSE client
>> MDS config: client_cache_size = 100000000, mds_cache_size = 16384000
>
> That client_cache_size only has any effect if it's applied to the
> client-side config. ;)
Yes - I copy the ceph.conf to the client, too. I think it works
because the 1 client, 1 MDS test caches all the inodes, according to
the perf counters.

Thanks so much, Greg!

Mike

> -Greg


* Re: MDS has inconsistent performance
  2015-01-13 23:45   ` Michael Sevilla
@ 2015-01-15 19:28     ` Gregory Farnum
  2015-01-15 22:44       ` Michael Sevilla
  0 siblings, 1 reply; 12+ messages in thread
From: Gregory Farnum @ 2015-01-15 19:28 UTC (permalink / raw)
  To: Michael Sevilla; +Cc: ceph-devel

Can you post the full logs somewhere to look at? These bits aren't
very helpful on their own (except to say, yes, the client cleared its
I_COMPLETE for some reason).

On Tue, Jan 13, 2015 at 3:45 PM, Michael Sevilla <mikesevilla3@gmail.com> wrote:
> On Tue, Jan 13, 2015 at 11:13 AM, Gregory Farnum <greg@gregs42.com> wrote:
>> On Mon, Jan 12, 2015 at 10:17 PM, Michael Sevilla
>> <mikesevilla3@gmail.com> wrote:
>>> I can't get consistent performance with 1 MDS. I have 2 clients create
>>> 100,000 files (separate directories) in a CephFS mount. I ran the
>>> experiment 5 times (deleting the pools/fs and restarting the MDS in
>>> between each run). I graphed the metadata throughput (requests per
>>> second): https://github.com/michaelsevilla/mds/blob/master/graphs/thruput.png
>>
>> So that top line is ~20,000 processed requests/second, as measured at
>> the MDS? (Looking at perfcounters?) And the fast run is doing 10k
>> create requests/second? (This number is much higher than I expected!)
>
> Yes - top line was 20K req/s from perf counter dump and the fast run
> does about 13K creates/s. We were surprised, too... In fact, 1 client
> per MDS gives us performance similar to IndexFS - a system that came
> out in a paper at Supercomputing this year. Here is a throughput
> graph, normalized to the # of clients, that
> shows how powerful one MDS can actually be:
> https://github.com/michaelsevilla/mds/blob/master/graphs/thruput-norm.png
>
> Keep in mind that runs with more than 1 client aren't creates/s, but ops/sec. ;)
>
>>
>>> Sometimes (run0, run3), both clients issue 2 lookups per create to the
>>> MDS - this makes throughput high but the runtime long since the MDS
>>> processes many more requests.
>>> Sometimes (run2, run4), 1 client does 2 lookups per create and the
>>> other doesn't do any lookups.
>>> Sometimes (run1), neither client does any lookups - this has the
>>> fastest runtime.
>>>
>>> Does anyone know why the client behaves differently for the same exact
>>> experiment? Reading the client logs, it looks like sometimes the
>>> client enters add_update_cap() and clears the inode->flags in
>>> check_cap_issue(), then when a lookup occurs (in _lookup()), the
>>> client can't return ENOENT locally -- forcing it to ask the MDS to do the
>>> lookup. But this only happens sometimes (e.g., run0 and run3).
>>
>> If you provide the logs I can check more carefully, but my guess is
>> that you've got another client mounting it, or are looking at both
>> directories from one of the clients, and this is inadvertently causing
>> them to go into shared rather than exclusive mode.
>
> I think you are right! Here is a subset of the client log:
> https://github.com/michaelsevilla/mds/blob/master/scratch/client0.log
>
> These snippets zoom in on the point where the client stops sending
> "create, create, create, create..." and starts sending "lookup,
> lookup, create, lookup, lookup, create..."
>
> $ cat client0.log | grep "send_request client"
> create ...file.2098
> create ...file.2099
> create ...file.2100
> create ...file.2101
> lookup ...file.2102
> lookup ...file.2102
> create ...file.2102
> lookup ...file.2103
> lookup ...file.2103
> create ...file.2103
> lookup ...file.2104
> lookup ...file.2104
> create ...file.2104
>
> I think what you are looking for is on line 687:
> ... clearing (I_COMPLETE|I_DIR_ORDERED)
> ... add_update_cap issued pAsLsXs -> pAsLsXsFsx
>
> It looks like we lose the exclusive mode on the file... but I don't
> understand why the MDS revokes it for 1 client but not the other. The
> MDS log is here:
> https://raw.githubusercontent.com/michaelsevilla/mds/master/scratch/mds.log
>
>
>>
>> How are you trying to keep the directories private during the
>> workload? Some of the more naive solutions won't stand up to
>> repetitive testing given how various components of the system
>> currently behave.
> Is there a way to keep the directories private (i.e. keep them always
> in exclusive mode)? That'd be perfect... In my runs, one client does
> mkdir /mnt/cephfs/dir0 and the other does mkdir /mnt/cephfs/dir1...
>
>>
>>>
>>> Details of the experiment:
>>> Workload: 2 clients, 100,000 creates in separate directories, using
>>> the FUSE client
>>> MDS config: client_cache_size = 100000000, mds_cache_size = 16384000
>>
>> That client_cache_size only has any effect if it's applied to the
>> client-side config. ;)
> Yes - I copy the ceph.conf to the client, too. I think it works
> because the 1 client, 1 MDS test caches all the inodes, according to
> the perf counters.
>
> Thanks so much, Greg!
>
> Mike
>
>> -Greg


* Re: MDS has inconsistent performance
  2015-01-15 19:28     ` Gregory Farnum
@ 2015-01-15 22:44       ` Michael Sevilla
  2015-01-16  6:37         ` Gregory Farnum
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Sevilla @ 2015-01-15 22:44 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel

Let me know if this works and/or you need anything else:

https://www.dropbox.com/s/fq47w6jebnyluu0/lookup-logs.tar.gz?dl=0

Beware - the clients were on debug=10. Also, I tried this with the
kernel client and it is more consistent; it does the 2 lookups per
create on 1 client every single time.

On Thu, Jan 15, 2015 at 11:28 AM, Gregory Farnum <greg@gregs42.com> wrote:
> Can you post the full logs somewhere to look at? These bits aren't
> very helpful on their own (except to say, yes, the client cleared its
> I_COMPLETE for some reason).
>
> On Tue, Jan 13, 2015 at 3:45 PM, Michael Sevilla <mikesevilla3@gmail.com> wrote:
>> On Tue, Jan 13, 2015 at 11:13 AM, Gregory Farnum <greg@gregs42.com> wrote:
>>> On Mon, Jan 12, 2015 at 10:17 PM, Michael Sevilla
>>> <mikesevilla3@gmail.com> wrote:
>>>> I can't get consistent performance with 1 MDS. I have 2 clients create
>>>> 100,000 files (separate directories) in a CephFS mount. I ran the
>>>> experiment 5 times (deleting the pools/fs and restarting the MDS in
>>>> between each run). I graphed the metadata throughput (requests per
>>>> second): https://github.com/michaelsevilla/mds/blob/master/graphs/thruput.png
>>>
>>> So that top line is ~20,000 processed requests/second, as measured at
>>> the MDS? (Looking at perfcounters?) And the fast run is doing 10k
>>> create requests/second? (This number is much higher than I expected!)
>>
>> Yes - top line was 20K req/s from perf counter dump and the fast run
>> does about 13K creates/s. We were surprised, too... In fact, 1 client
>> per MDS gives us performance similar to IndexFS - a system that came
>> out in a paper at Supercomputing this year. Here is a throughput
>> graph, normalized to the # of clients, that
>> shows how powerful one MDS can actually be:
>> https://github.com/michaelsevilla/mds/blob/master/graphs/thruput-norm.png
>>
>> Keep in mind that runs with more than 1 client aren't creates/s, but ops/sec. ;)
>>
>>>
>>>> Sometimes (run0, run3), both clients issue 2 lookups per create to the
>>>> MDS - this makes throughput high but the runtime long since the MDS
>>>> processes many more requests.
>>>> Sometimes (run2, run4), 1 client does 2 lookups per create and the
>>>> other doesn't do any lookups.
>>>> Sometimes (run1), neither client does any lookups - this has the
>>>> fastest runtime.
>>>>
>>>> Does anyone know why the client behaves differently for the same exact
>>>> experiment? Reading the client logs, it looks like sometimes the
>>>> client enters add_update_cap() and clears the inode->flags in
>>>> check_cap_issue(), then when a lookup occurs (in _lookup()), the
>>>> client can't return ENOENT locally -- forcing it to ask the MDS to do the
>>>> lookup. But this only happens sometimes (e.g., run0 and run3).
>>>
>>> If you provide the logs I can check more carefully, but my guess is
>>> that you've got another client mounting it, or are looking at both
>>> directories from one of the clients, and this is inadvertently causing
>>> them to go into shared rather than exclusive mode.
>>
>> I think you are right! Here is a subset of the client log:
>> https://github.com/michaelsevilla/mds/blob/master/scratch/client0.log
>>
>> These snippets zoom in on the point where the client stops sending
>> "create, create, create, create..." and starts sending "lookup,
>> lookup, create, lookup, lookup, create..."
>>
>> $ cat client0.log | grep "send_request client"
>> create ...file.2098
>> create ...file.2099
>> create ...file.2100
>> create ...file.2101
>> lookup ...file.2102
>> lookup ...file.2102
>> create ...file.2102
>> lookup ...file.2103
>> lookup ...file.2103
>> create ...file.2103
>> lookup ...file.2104
>> lookup ...file.2104
>> create ...file.2104
>>
>> I think what you are looking for is on line 687:
>> ... clearing (I_COMPLETE|I_DIR_ORDERED)
>> ... add_update_cap issued pAsLsXs -> pAsLsXsFsx
>>
>> It looks like we lose the exclusive mode on the file... but I don't
>> understand why the MDS revokes it for 1 client but not the other. The
>> MDS log is here:
>> https://raw.githubusercontent.com/michaelsevilla/mds/master/scratch/mds.log
>>
>>
>>>
>>> How are you trying to keep the directories private during the
>>> workload? Some of the more naive solutions won't stand up to
>>> repetitive testing given how various components of the system
>>> currently behave.
>> Is there a way to keep the directories private (i.e. keep them always
>> in exclusive mode)? That'd be perfect... In my runs, one client does
>> mkdir /mnt/cephfs/dir0 and the other does mkdir /mnt/cephfs/dir1...
>>
>>>
>>>>
>>>> Details of the experiment:
>>>> Workload: 2 clients, 100,000 creates in separate directories, using
>>>> the FUSE client
>>>> MDS config: client_cache_size = 100000000, mds_cache_size = 16384000
>>>
>>> That client_cache_size only has any effect if it's applied to the
>>> client-side config. ;)
>> Yes - I copy the ceph.conf to the client, too. I think it works
>> because the 1 client, 1 MDS test caches all the inodes, according to
>> the perf counters.
>>
>> Thanks so much, Greg!
>>
>> Mike
>>
>>> -Greg


* Re: MDS has inconsistent performance
  2015-01-15 22:44       ` Michael Sevilla
@ 2015-01-16  6:37         ` Gregory Farnum
  2015-01-16 16:27           ` Yan, Zheng
  2015-01-16 18:34           ` Michael Sevilla
  0 siblings, 2 replies; 12+ messages in thread
From: Gregory Farnum @ 2015-01-16  6:37 UTC (permalink / raw)
  To: Michael Sevilla; +Cc: ceph-devel, 严正

On Thu, Jan 15, 2015 at 2:44 PM, Michael Sevilla <mikesevilla3@gmail.com> wrote:
> Let me know if this works and/or you need anything else:
>
> https://www.dropbox.com/s/fq47w6jebnyluu0/lookup-logs.tar.gz?dl=0
>
> Beware - the clients were on debug=10. Also, I tried this with the
> kernel client and it is more consistent; it does the 2 lookups per
> create on 1 client every single time.

Mmmm, there are no mds logs of note here. :(

I did look enough to see that:
1) The MDS is for some reason revoking caps on the file create
prompting the switch to double-lookups, which it was not before. The
client doesn't really have any visibility into why that would be the
case; the best guess I can come up with is that maybe the MDS split up
the directory into multiple frags at this point — do you have that
enabled?
2) The only way we set the I_COMPLETE flag is when we create an empty
directory, or when we do a complete listdir on one. That makes it
pretty difficult to get the flag back (and so do the optimal create
path) once you lose it. :( I'd love a better way to do so, but we'll
have to look at what's involved in a bit of depth.
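
To make (2) concrete, the lifecycle is roughly this (an illustrative
sketch of the two set paths, not the exact client code):

    struct Inode { unsigned flags = 0; };
    static const unsigned I_COMPLETE = 1, I_DIR_ORDERED = 2;

    // Path 1: we created the directory, so we know it's empty and our
    // cache trivially covers it.
    void on_mkdir_reply(Inode *dir) {
      dir->flags |= I_COMPLETE | I_DIR_ORDERED;
    }

    // Path 2: we just listed every entry, so the cache covers the
    // whole directory again.
    void on_full_readdir(Inode *dir) {
      dir->flags |= I_COMPLETE | I_DIR_ORDERED;
    }

    // Everything else only clears the flags (newly shared caps, cap
    // revokes, ...), so once a dir falls out of the fast create path,
    // only a full listdir gets it back in.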

I'm not sure why the kernel client is so much more cautious, but I
think there were a number of troubles with the directory listing
orders and things which were harder to solve there – I don't remember
if we introduced the I_DIR_ORDERED flag in it or not. Zheng can talk
more about that. What kernel client version are you using?

And for a vanity data point, what kind of hardware is your MDS running on? :)

-Greg


* Re: MDS has inconsistent performance
  2015-01-16  6:37         ` Gregory Farnum
@ 2015-01-16 16:27           ` Yan, Zheng
  2015-01-16 18:35             ` Michael Sevilla
  2015-01-16 18:34           ` Michael Sevilla
  1 sibling, 1 reply; 12+ messages in thread
From: Yan, Zheng @ 2015-01-16 16:27 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Michael Sevilla, ceph-devel, 严正

On Fri, Jan 16, 2015 at 2:37 PM, Gregory Farnum <greg@gregs42.com> wrote:
> On Thu, Jan 15, 2015 at 2:44 PM, Michael Sevilla <mikesevilla3@gmail.com> wrote:
>> Let me know if this works and/or you need anything else:
>>
>> https://www.dropbox.com/s/fq47w6jebnyluu0/lookup-logs.tar.gz?dl=0
>>
>> Beware - the clients were on debug=10. Also, I tried this with the
>> kernel client and it is more consistent; it does the 2 lookups per
>> create on 1 client every single time.
>
> Mmmm, there are no mds logs of note here. :(
>
> I did look enough to see that:
> 1) The MDS is for some reason revoking caps on the file create
> prompting the switch to double-lookups, which it was not before. The
> client doesn't really have any visibility into why that would be the
> case; the best guess I can come up with is that maybe the MDS split up
> the directory into multiple frags at this point -- do you have that
> enabled?
> 2) The only way we set the I_COMPLETE flag is when we create an empty
> directory, or when we do a complete listdir on one. That makes it
> pretty difficult to get the flag back (and so do the optimal create
> path) once you lose it. :( I'd love a better way to do so, but we'll
> have to look at what's involved in a bit of depth.
>
> I'm not sure why the kernel client is so much more cautious, but I
> think there were a number of troubles with the directory listing
> orders and things which were harder to solve there - I don't remember
> if we introduced the I_DIR_ORDERED flag in it or not. Zheng can talk
> more about that. What kernel client version are you using?
>
> And for a vanity data point, what kind of hardware is your MDS running on? :)

For kernels before 3.18, I_COMPLETE gets cleared once a directory is
modified. I_DIR_ORDERED was introduced in the 3.18 kernel. I just
tried the 3.18 kernel; unfortunately, there is still a bug that
prevents a new directory from having the I_COMPLETE flag.

Regards
Yan, Zheng

>
> -Greg


* Re: MDS has inconsistent performance
  2015-01-16  6:37         ` Gregory Farnum
  2015-01-16 16:27           ` Yan, Zheng
@ 2015-01-16 18:34           ` Michael Sevilla
  2015-01-16 18:43             ` Gregory Farnum
  1 sibling, 1 reply; 12+ messages in thread
From: Michael Sevilla @ 2015-01-16 18:34 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel, 严正

On Thu, Jan 15, 2015 at 10:37 PM, Gregory Farnum <greg@gregs42.com> wrote:
> On Thu, Jan 15, 2015 at 2:44 PM, Michael Sevilla <mikesevilla3@gmail.com> wrote:
>> Let me know if this works and/or you need anything else:
>>
>> https://www.dropbox.com/s/fq47w6jebnyluu0/lookup-logs.tar.gz?dl=0
>>
>> Beware - the clients were on debug=10. Also, I tried this with the
>> kernel client and it is more consistent; it does the 2 lookups per
>> create on 1 client every single time.
>
> Mmmm, there are no mds logs of note here. :(
>

Meaning you couldn't find mds.issdm-15.log? Or that that log didn't
show anything interesting...

> I did look enough to see that:
> 1) The MDS is for some reason revoking caps on the file create
> prompting the switch to double-lookups, which it was not before. The
> client doesn't really have any visibility into why that would be the
> case; the best guess I can come up with is that maybe the MDS split up
> the directory into multiple frags at this point — do you have that
> enabled?

Nope, unless any of these make a difference:
$ ceph --admin-daemon... config show | grep frag
  "mds_bal_frag": "false",
  "mds_bal_fragment_interval": "5",
  "mds_thrash_fragments": "0",
  "mds_debug_frag": "false",

> 2) The only way we set the I_COMPLETE flag is when we create an empty
> directory, or when we do a complete listdir on one. That makes it
> pretty difficult to get the flag back (and so do the optimal create
> path) once you lose it. :( I'd love a better way to do so, but we'll
> have to look at what's involved in a bit of depth.

No need - with that reasoning it looks more like this is part of the
design rather than a bug. I'll just have to accept the fact that the
system is very complicated and clients touching stuff at certain times
can make things less predictable... I just wanted to make sure I
wasn't doing anything wrong. :)  I'll stick with the kernel client
(it's almost twice as fast, anyways!)

> I'm not sure why the kernel client is so much more cautious, but I
> think there were a number of troubles with the directory listing
> orders and things which were harder to solve there – I don't remember
> if we introduced the I_DIR_ORDERED flag in it or not. Zheng can talk
> more about that. What kernel client version are you using?
>
> And for a vanity data point, what kind of hardware is your MDS running on? :)

Really, really old hardware from 2006: 2 dual-core CPUs, 8GB RAM,
connected with 1Gbit. Kernel 3.4. We actually just installed beefier
nodes so I'll keep you posted if we get other cool results.

Thanks for all your help, Greg!


> -Greg


* Re: MDS has inconsistent performance
  2015-01-16 16:27           ` Yan, Zheng
@ 2015-01-16 18:35             ` Michael Sevilla
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Sevilla @ 2015-01-16 18:35 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: Gregory Farnum, ceph-devel, 严正

On Fri, Jan 16, 2015 at 8:27 AM, Yan, Zheng <ukernel@gmail.com> wrote:
> On Fri, Jan 16, 2015 at 2:37 PM, Gregory Farnum <greg@gregs42.com> wrote:
>> On Thu, Jan 15, 2015 at 2:44 PM, Michael Sevilla <mikesevilla3@gmail.com> wrote:
>>> Let me know if this works and/or you need anything else:
>>>
>>> https://www.dropbox.com/s/fq47w6jebnyluu0/lookup-logs.tar.gz?dl=0
>>>
>>> Beware - the clients were on debug=10. Also, I tried this with the
>>> kernel client and it is more consistent; it does the 2 lookups per
>>> create on 1 client every single time.
>>
>> Mmmm, there are no mds logs of note here. :(
>>
>> I did look enough to see that:
>> 1) The MDS is for some reason revoking caps on the file create
>> prompting the switch to double-lookups, which it was not before. The
>> client doesn't really have any visibility into why that would be the
>> case; the best guess I can come up with is that maybe the MDS split up
>> the directory into multiple frags at this point -- do you have that
>> enabled?
>> 2) The only way we set the I_COMPLETE flag is when we create an empty
>> directory, or when we do a complete listdir on one. That makes it
>> pretty difficult to get the flag back (and so do the optimal create
>> path) once you lose it. :( I'd love a better way to do so, but we'll
>> have to look at what's involved in a bit of depth.
>>
>> I'm not sure why the kernel client is so much more cautious, but I
>> think there were a number of troubles with the directory listing
>> orders and things which were harder to solve there - I don't remember
>> if we introduced the I_DIR_ORDERED flag in it or not. Zheng can talk
>> more about that. What kernel client version are you using?
>>
>> And for a vanity data point, what kind of hardware is your MDS running on? :)
>
> For kernels before 3.18, I_COMPLETE gets cleared once a directory is
> modified. I_DIR_ORDERED was introduced in the 3.18 kernel. I just
> tried the 3.18 kernel; unfortunately, there is still a bug that
> prevents a new directory from having the I_COMPLETE flag.
>
Ok, I'll stay on 3.4 until that I_COMPLETE flag bug is fixed. Thanks!
> Regards
> Yan, Zheng
>
>>
>> -Greg


* Re: MDS has inconsistent performance
  2015-01-16 18:34           ` Michael Sevilla
@ 2015-01-16 18:43             ` Gregory Farnum
  2015-01-16 21:13               ` Michael Sevilla
  0 siblings, 1 reply; 12+ messages in thread
From: Gregory Farnum @ 2015-01-16 18:43 UTC (permalink / raw)
  To: Michael Sevilla; +Cc: ceph-devel, 严正

On Fri, Jan 16, 2015 at 10:34 AM, Michael Sevilla
<mikesevilla3@gmail.com> wrote:
> On Thu, Jan 15, 2015 at 10:37 PM, Gregory Farnum <greg@gregs42.com> wrote:
>> On Thu, Jan 15, 2015 at 2:44 PM, Michael Sevilla <mikesevilla3@gmail.com> wrote:
>>> Let me know if this works and/or you need anything else:
>>>
>>> https://www.dropbox.com/s/fq47w6jebnyluu0/lookup-logs.tar.gz?dl=0
>>>
>>> Beware - the clients were on debug=10. Also, I tried this with the
>>> kernel client and it is more consistent; it does the 2 lookups per
>>> create on 1 client every single time.
>>
>> Mmmm, there are no mds logs of note here. :(
>>
>
> Meaning you couldn't find mds.issdm-15.log? Or that that log didn't
> show anything interesting...

It's not interesting. Caps are not logged at a very high level so I
think we'd actually want debug 20 on the mds, the messenger, and the
client subsystems.
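
Something like this in ceph.conf would do it (a sketch using the
standard debug options for those subsystems):

    [mds]
        debug mds = 20
        debug ms = 20

    [client]
        debug client = 20
        debug ms = 20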

>
>> I did look enough to see that:
>> 1) The MDS is for some reason revoking caps on the file create
>> prompting the switch to double-lookups, which it was not before. The
>> client doesn't really have any visibility into why that would be the
>> case; the best guess I can come up with is that maybe the MDS split up
>> the directory into multiple frags at this point — do you have that
>> enabled?
>
> Nope, unless any of these make a difference:
> $ ceph --admin-daemon... config show | grep frag
>   "mds_bal_frag": "false",
>   "mds_bal_fragment_interval": "5",
>   "mds_thrash_fragments": "0",
>   "mds_debug_frag": "false",
>
>> 2) The only way we set the I_COMPLETE flag is when we create an empty
>> directory, or when we do a complete listdir on one. That makes it
>> pretty difficult to get the flag back (and so do the optimal create
>> path) once you lose it. :( I'd love a better way to do so, but we'll
>> have to look at what's involved in a bit of depth.
>
> No need - with that reasoning it looks more like this is part of the
> design rather than a bug. I'll just have to accept the fact that the
> system is very complicated and clients touching stuff at certain times
> can make things less predictable... I just wanted to make sure I
> wasn't doing anything wrong. :)  I'll stick with the kernel client
> (it's almost twice as fast, anyways!)

Well, sort of — an isolated client with their own directory is
something we definitely want to have exclusive caps for, but our
heuristics aren't sophisticated enough yet.

>
>> I'm not sure why the kernel client is so much more cautious, but I
>> think there were a number of troubles with the directory listing
>> orders and things which were harder to solve there – I don't remember
>> if we introduced the I_DIR_ORDERED flag in it or not. Zheng can talk
>> more about that. What kernel client version are you using?
>>
>> And for a vanity data point, what kind of hardware is your MDS running on? :)
>
> Really, really old hardware from 2006: 2 dual-core CPUs, 8GB RAM,
> connected with 1Gbit. Kernel 3.4. We actually just installed beefier
> nodes so I'll keep you posted if we get other cool results.

Awesome! That's much faster than previously, although Zheng did some
work recently to split the journaling code into a separate thread
which I guess must have made a big difference.
-Greg


* Re: MDS has inconsistent performance
  2015-01-16 18:43             ` Gregory Farnum
@ 2015-01-16 21:13               ` Michael Sevilla
  2015-03-24 22:21                 ` Gregory Farnum
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Sevilla @ 2015-01-16 21:13 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel, 严正

If you feel like perusing... log=20 on client, mds messenger, and mds:

https://www.dropbox.com/s/uvmexh9impd3f3c/forgreg.tar.gz?dl=0

In this run, only client 1 starts doing the extra lookups.

On Fri, Jan 16, 2015 at 10:43 AM, Gregory Farnum <greg@gregs42.com> wrote:
> On Fri, Jan 16, 2015 at 10:34 AM, Michael Sevilla
> <mikesevilla3@gmail.com> wrote:
>> On Thu, Jan 15, 2015 at 10:37 PM, Gregory Farnum <greg@gregs42.com> wrote:
>>> On Thu, Jan 15, 2015 at 2:44 PM, Michael Sevilla <mikesevilla3@gmail.com> wrote:
>>>> Let me know if this works and/or you need anything else:
>>>>
>>>> https://www.dropbox.com/s/fq47w6jebnyluu0/lookup-logs.tar.gz?dl=0
>>>>
>>>> Beware - the clients were on debug=10. Also, I tried this with the
>>>> kernel client and it is more consistent; it does the 2 lookups per
>>>> create on 1 client every single time.
>>>
>>> Mmmm, there are no mds logs of note here. :(
>>>
>>
>> Meaning you couldn't find mds.issdm-15.log? Or that that log didn't
>> show anything interesting...
>
> It's not interesting. Caps are not logged at a very high level so I
> think we'd actually want debug 20 on the mds, the messenger, and the
> client subsystems.
>
>>
>>> I did look enough to see that:
>>> 1) The MDS is for some reason revoking caps on the file create
>>> prompting the switch to double-lookups, which it was not before. The
>>> client doesn't really have any visibility into why that would be the
>>> case; the best guess I can come up with is that maybe the MDS split up
>>> the directory into multiple frags at this point — do you have that
>>> enabled?
>>
>> Nope, unless any of these make a difference:
>> $ ceph --admin-daemon... config show | grep frag
>>   "mds_bal_frag": "false",
>>   "mds_bal_fragment_interval": "5",
>>   "mds_thrash_fragments": "0",
>>   "mds_debug_frag": "false",
>>
>>> 2) The only way we set the I_COMPLETE flag is when we create an empty
>>> directory, or when we do a complete listdir on one. That makes it
>>> pretty difficult to get the flag back (and so do the optimal create
>>> path) once you lose it. :( I'd love a better way to do so, but we'll
>>> have to look at what's involved in a bit of depth.
>>
>> No need - with that reasoning it looks more like this is part of the
>> design rather than a bug. I'll just have to accept the fact that the
>> system is very complicated and clients touching stuff at certain times
>> can make things less predictable... I just wanted to make sure I
>> wasn't doing anything wrong. :)  I'll stick with the kernel client
>> (it's almost twice as fast, anyways!)
>
> Well, sort of — an isolated client with their own directory is
> something we definitely want to have exclusive caps for, but our
> heuristics aren't sophisticated enough yet.
>
>>
>>> I'm not sure why the kernel client is so much more cautious, but I
>>> think there were a number of troubles with the directory listing
>>> orders and things which were harder to solve there – I don't remember
>>> if we introduced the I_DIR_ORDERED flag in it or not. Zheng can talk
>>> more about that. What kernel client version are you using?
>>>
>>> And for a vanity data point, what kind of hardware is your MDS running on? :)
>>
>> Really, really old hardware from 2006: 2 dual-core CPUs, 8GB RAM,
>> connected with 1Gbit. Kernel 3.4. We actually just installed beefier
>> nodes so I'll keep you posted if we get other cool results.
>
> Awesome! That's much faster than previously, although Zheng did some
> work recently to split the journaling code into a separate thread
> which I guess must have made a big difference.
> -Greg


* Re: MDS has inconsistent performance
  2015-01-16 21:13               ` Michael Sevilla
@ 2015-03-24 22:21                 ` Gregory Farnum
  0 siblings, 0 replies; 12+ messages in thread
From: Gregory Farnum @ 2015-03-24 22:21 UTC (permalink / raw)
  To: Michael Sevilla; +Cc: ceph-devel

It's been a while and I don't imagine you care much right now, but I
finally made the time to look at these log details. The reason we got
slow turned out to be much stupider than I anticipated (we were losing
I_COMPLETE for bad reasons); I wrote up what I found at
http://tracker.ceph.com/issues/11226 and have an RFC change in
wip-11226-dir-fx (PR at https://github.com/ceph/ceph/pull/4168).

Thanks for the complaint and the logs! :)
-Greg

On Fri, Jan 16, 2015 at 1:13 PM, Michael Sevilla <mikesevilla3@gmail.com> wrote:
> If you feel like perusing... log=20 on client, mds messenger, and mds:
>
> https://www.dropbox.com/s/uvmexh9impd3f3c/forgreg.tar.gz?dl=0
>
> In this run, only client 1 starts doing the extra lookups.
>
> On Fri, Jan 16, 2015 at 10:43 AM, Gregory Farnum <greg@gregs42.com> wrote:
>> On Fri, Jan 16, 2015 at 10:34 AM, Michael Sevilla
>> <mikesevilla3@gmail.com> wrote:
>>> On Thu, Jan 15, 2015 at 10:37 PM, Gregory Farnum <greg@gregs42.com> wrote:
>>>> On Thu, Jan 15, 2015 at 2:44 PM, Michael Sevilla <mikesevilla3@gmail.com> wrote:
>>>>> Let me know if this works and/or you need anything else:
>>>>>
>>>>> https://www.dropbox.com/s/fq47w6jebnyluu0/lookup-logs.tar.gz?dl=0
>>>>>
>>>>> Beware - the clients were on debug=10. Also, I tried this with the
>>>>> kernel client and it is more consistent; it does the 2 lookups per
>>>>> create on 1 client every single time.
>>>>
>>>> Mmmm, there are no mds logs of note here. :(
>>>>
>>>
>>> Meaning you couldn't find mds.issdm-15.log? Or that that log didn't
>>> show anything interesting...
>>
>> It's not interesting. Caps are not logged at a very high level so I
>> think we'd actually want debug 20 on the mds, the messenger, and the
>> client subsystems.
>>
>>>
>>>> I did look enough to see that:
>>>> 1) The MDS is for some reason revoking caps on the file create
>>>> prompting the switch to double-lookups, which it was not before. The
>>>> client doesn't really have any visibility into why that would be the
>>>> case; the best guess I can come up with is that maybe the MDS split up
>>>> the directory into multiple frags at this point — do you have that
>>>> enabled?
>>>
>>> Nope, unless any of these make a difference:
>>> $ ceph --admin-daemon... config show | grep frag
>>>   "mds_bal_frag": "false",
>>>   "mds_bal_fragment_interval": "5",
>>>   "mds_thrash_fragments": "0",
>>>   "mds_debug_frag": "false",
>>>
>>>> 2) The only way we set the I_COMPLETE flag is when we create an empty
>>>> directory, or when we do a complete listdir on one. That makes it
>>>> pretty difficult to get the flag back (and so do the optimal create
>>>> path) once you lose it. :( I'd love a better way to do so, but we'll
>>>> have to look at what's involved in a bit of depth.
>>>
>>> No need - with that reasoning it looks more like this is part of the
>>> design rather than a bug. I'll just have to accept the fact that the
>>> system is very complicated and clients touching stuff at certain times
>>> can make things less predictable... I just wanted to make sure I
>>> wasn't doing anything wrong. :)  I'll stick with the kernel client
>>> (it's almost twice as fast, anyways!)
>>
>> Well, sort of — an isolated client with their own directory is
>> something we definitely want to have exclusive caps for, but our
>> heuristics aren't sophisticated enough yet.
>>
>>>
>>>> I'm not sure why the kernel client is so much more cautious, but I
>>>> think there were a number of troubles with the directory listing
>>>> orders and things which were harder to solve there – I don't remember
>>>> if we introduced the I_DIR_ORDERED flag in it or not. Zheng can talk
>>>> more about that. What kernel client version are you using?
>>>>
>>>> And for a vanity data point, what kind of hardware is your MDS running on? :)
>>>
>>> Really, really old hardware from 2006: 2 dual-core CPUs, 8GB RAM,
>>> connected with 1Gbit. Kernel 3.4. We actually just installed beefier
>>> nodes so I'll keep you posted if we get other cool results.
>>
>> Awesome! That's much faster than previously, although Zheng did some
>> work recently to split the journaling code into a separate thread
>> which I guess must have made a big difference.
>> -Greg

