* inconsistent file issue
@ 2017-09-20  3:42 陶冬冬
  2017-09-21 12:00 ` Yan, Zheng
  0 siblings, 1 reply; 5+ messages in thread
From: 陶冬冬 @ 2017-09-20  3:42 UTC (permalink / raw)
  To: Yan, Zheng, pdonnell, ceph-devel

Hi Zheng, 

We have been hitting a data inconsistency issue in CephFS:

kernel version: 3.1.0
ceph version: 10.2.5

We use the kernel client to mount CephFS.
The filesystem is mounted on two machines, both running kernel 3.1.0,
but the strange thing is that the content of the same file differs between the two machines.
We are certain that one of the machines has the correct content.

We don't know how to reproduce it, and we don't have any logs.
I wonder whether there is a bug in the kernel client that makes the client think
it holds enough capabilities to read the file straight from its own buffer, without getting the cap from the MDS, so it never sees the latest content.

Since this kernel is quite old, maybe you have already fixed this kind of inconsistency issue?
 
Regards,
Dongdong
 


* Re: inconsistent file issue
  2017-09-20  3:42 inconsistent file issue 陶冬冬
@ 2017-09-21 12:00 ` Yan, Zheng
       [not found]   ` <2169D938-C686-4B14-B612-0FEBFECBC777@gmail.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Yan, Zheng @ 2017-09-21 12:00 UTC (permalink / raw)
  To: 陶冬冬; +Cc: Patrick Donnelly, ceph-devel

On Wed, Sep 20, 2017 at 11:42 AM, 陶冬冬 <tdd21151186@gmail.com> wrote:
> Hi Zheng,
>
> We have been hitting a data inconsistency issue in CephFS:
>
> kernel version: 3.1.0
> ceph version: 10.2.5
>
> We use the kernel client to mount CephFS.
> The filesystem is mounted on two machines, both running kernel 3.1.0,
> but the strange thing is that the content of the same file differs between the two machines.
> We are certain that one of the machines has the correct content.
>
> We don't know how to reproduce it, and we don't have any logs.
> I wonder whether there is a bug in the kernel client that makes the client think
> it holds enough capabilities to read the file straight from its own buffer, without getting the cap from the MDS, so it never sees the latest content.
>
> Since this kernel is quite old, maybe you have already fixed this kind of inconsistency issue?

We fixed a splice issue about a year ago; it could cause inconsistent
data when multiple clients read and write a file at the same time. That
issue is the only bug I remember that can cause inconsistent
data. Please try a recent kernel and check whether the issue still happens.

Regards
Yan, Zheng



* Re: inconsistent file issue
       [not found]   ` <2169D938-C686-4B14-B612-0FEBFECBC777@gmail.com>
@ 2017-09-21 14:38     ` Yan, Zheng
  2017-09-21 15:00       ` 陶冬冬
  0 siblings, 1 reply; 5+ messages in thread
From: Yan, Zheng @ 2017-09-21 14:38 UTC (permalink / raw)
  To: 陶冬冬; +Cc: Patrick Donnelly, ceph-devel

On Thu, Sep 21, 2017 at 8:25 PM, 陶冬冬 <tdd21151186@gmail.com> wrote:
>
> Thanks, Zheng.
>
> The issue happened again today.
> From the log, client A wrote to a file, but the MDS didn't revoke
> CEPH_CAP_FILE_CACHE for that file from client B.
> So when client B tried to read the file, it just read it from
> its page cache, which still held the old data.

Are you sure that the MDS didn't revoke CEPH_CAP_FILE_CACHE? Could you
provide the MDS log? In old kernel versions, there are code paths that
operate directly on the page cache without checking whether
CEPH_CAP_FILE_CACHE is issued.

Regards
Yan, Zheng




* Re: inconsistent file issue
  2017-09-21 14:38     ` Yan, Zheng
@ 2017-09-21 15:00       ` 陶冬冬
  2017-09-22 11:48         ` Yan, Zheng
  0 siblings, 1 reply; 5+ messages in thread
From: 陶冬冬 @ 2017-09-21 15:00 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: Patrick Donnelly, ceph-devel

Sorry, I don't have a detailed log that would show whether the MDS revoked the cap successfully.

I ran “echo 1 > /proc/sys/vm/drop_caches”, and after that the client could read the latest data.
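
The check above can be wrapped in a small script. This is only a sketch: check_stale and the DROP_CACHES variable are names made up for illustration, it needs root on a real client, and a changed checksum may also just mean that another client wrote to the file in between.

```shell
#!/bin/sh
# Sketch: compare a file's checksum before and after dropping the page cache.
# DROP_CACHES is a hypothetical override so the logic can be tried without root.
DROP_CACHES="${DROP_CACHES:-/proc/sys/vm/drop_caches}"

check_stale() {
    file="$1"
    before=$(cksum < "$file")      # read, possibly served from page cache
    sync                           # flush dirty pages so they can be dropped
    echo 1 > "$DROP_CACHES"        # 1 = drop clean page cache only
    after=$(cksum < "$file")       # read again, now fetched from the cluster
    if [ "$before" = "$after" ]; then
        echo "unchanged"
    else
        echo "stale"               # cached copy differed from the fresh read
    fi
}

# Example (as root on the client that shows old data):
#   check_stale /mnt/cephfs/path/to/file
```

If this prints "stale" on one client while no writer is active, that would point at the page cache rather than at the data stored in the cluster.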

I tried “ceph +p > /sys/kernel/debug/dynamic_debug/control” to enable kernel logging,
but where can I find the ceph log?

Could you please point me to the code paths in kernel 3.10 that operate directly on the page cache without checking CEPH_CAP_FILE_CACHE?

Thanks,
Dongdong




* Re: inconsistent file issue
  2017-09-21 15:00       ` 陶冬冬
@ 2017-09-22 11:48         ` Yan, Zheng
  0 siblings, 0 replies; 5+ messages in thread
From: Yan, Zheng @ 2017-09-22 11:48 UTC (permalink / raw)
  To: 陶冬冬; +Cc: Patrick Donnelly, ceph-devel

On Thu, Sep 21, 2017 at 11:00 PM, 陶冬冬 <tdd21151186@gmail.com> wrote:
> Sorry, I don't have a detailed log that would show whether the MDS revoked the cap successfully.
>
> I ran “echo 1 > /proc/sys/vm/drop_caches”, and after that the client could read the latest data.
>
> I tried “ceph +p > /sys/kernel/debug/dynamic_debug/control” to enable kernel logging,
> but where can I find the ceph log?
>
> Could you please point me to the code paths in kernel 3.10 that operate directly on the page cache without checking CEPH_CAP_FILE_CACHE?
>

splice_read, splice_write and readahead
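
On the logging question quoted above, a minimal sketch, assuming the kernel is built with CONFIG_DYNAMIC_DEBUG and debugfs is mounted at /sys/kernel/debug. The dynamic debug control file expects a query such as "module ceph +p" to be written into it, and the resulting messages go to the kernel log, so dmesg should show them. ceph_debug and DDEBUG_CONTROL are hypothetical names used only for this example.

```shell
#!/bin/sh
# Sketch: switch pr_debug output of the CephFS kernel client on or off.
# DDEBUG_CONTROL is a hypothetical override for trying this without root.
DDEBUG_CONTROL="${DDEBUG_CONTROL:-/sys/kernel/debug/dynamic_debug/control}"

ceph_debug() {
    flag="$1"                      # "+p" enables printing, "-p" disables it
    for mod in ceph libceph; do
        # each write to the control file is one dynamic debug query
        echo "module $mod $flag" > "$DDEBUG_CONTROL"
    done
}

# Example (as root):
#   ceph_debug +p
#   dmesg | tail        # debug messages land in the kernel log
#   ceph_debug -p
```

The output is very verbose, so it is probably best enabled only while reproducing the problem.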

