All of lore.kernel.org
 help / color / mirror / Atom feed
* question about  striped_read
@ 2013-07-25  0:52 majianpeng
  2013-07-25  5:54 ` Sage Weil
  0 siblings, 1 reply; 33+ messages in thread
From: majianpeng @ 2013-07-25  0:52 UTC (permalink / raw)
  To: ceph-devel

Hi all,
	I met a problem and ask somebody could help me.
In func striped_read()
> if (ret > 0) {
>                int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;

>                if (read < pos - off) {
>                       dout(" zero gap %llu to %llu\n", off + read, pos);
>                        ceph_zero_page_vector_range(page_align + read,
>                                                    pos - off - read, pages);
>                }   
>                pos += ret;
>               read = pos - off;

At first , pos = off and off don't modify.
Why does it judge 'read < pos -off ' ?
Because the read = pos -off, so the read must equal pos -off.

Thansk!
Jianpeng Ma

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: question about  striped_read
  2013-07-25  0:52 question about striped_read majianpeng
@ 2013-07-25  5:54 ` Sage Weil
  2013-07-25  6:55   ` majianpeng
  0 siblings, 1 reply; 33+ messages in thread
From: Sage Weil @ 2013-07-25  5:54 UTC (permalink / raw)
  To: majianpeng; +Cc: ceph-devel

On Thu, 25 Jul 2013, majianpeng wrote:
> Hi all,
> 	I met a problem and ask somebody could help me.
> In func striped_read()
> > if (ret > 0) {
> >                int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
> 
> >                if (read < pos - off) {
> >                       dout(" zero gap %llu to %llu\n", off + read, pos);
> >                        ceph_zero_page_vector_range(page_align + read,
> >                                                    pos - off - read, pages);
> >                }   
> >                pos += ret;
> >               read = pos - off;
> 
> At first , pos = off and off don't modify.
> Why does it judge 'read < pos -off ' ?
> Because the read = pos -off, so the read must equal pos -off.

This block triggers if we hit a stripe.  off is always the original 
starting offset.  pos is the current position that we just tried to read 
from.  We might:

 - try to read a big extent from off == pos
 - readpages truncates this to the end of the object
 - we get a short read (not a complete object)
 - read is some partial amount
 - pos is off + read + a gap
 - we try to read from the next object and there is data
   => there was a hole and we need to zero the gap

If, on the other hand, there isn't data in the next object, then we don't 
want to zero the hole.. and least not until we decide where EOF is.  
That's why this block is where it is, and never triggers on the first 
iteration.

Does that make sense?
sage

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about  striped_read
  2013-07-25  5:54 ` Sage Weil
@ 2013-07-25  6:55   ` majianpeng
  2013-07-25 12:27     ` Yan, Zheng
  0 siblings, 1 reply; 33+ messages in thread
From: majianpeng @ 2013-07-25  6:55 UTC (permalink / raw)
  To: sage; +Cc: ceph-devel

>On Thu, 25 Jul 2013, majianpeng wrote:
>> Hi all,
>> 	I met a problem and ask somebody could help me.
>> In func striped_read()
>> > if (ret > 0) {
>> >                int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>> 
>> >                if (read < pos - off) {
>> >                       dout(" zero gap %llu to %llu\n", off + read, pos);
>> >                        ceph_zero_page_vector_range(page_align + read,
>> >                                                    pos - off - read, pages);
>> >                }   
>> >                pos += ret;
>> >               read = pos - off;
>> 
>> At first , pos = off and off don't modify.
>> Why does it judge 'read < pos -off ' ?
>> Because the read = pos -off, so the read must equal pos -off.
>
>This block triggers if we hit a stripe.  off is always the original 
>starting offset.  pos is the current position that we just tried to read 
>from.  We might:
>
> - try to read a big extent from off == pos
> - readpages truncates this to the end of the object
> - we get a short read (not a complete object)
What's the mean of not a complete object?
FYI,a short read maybe  met the boundry of stripe(object) or the EOF of file.
For the hole of file, i don't know how to handle.
> - read is some partial amount
> - pos is off + read + a gap
From the code :	read = pos - off; after that, the pos don't change.
So the a gap where is from?

Thanks!
Jianpeng Ma
> - we try to read from the next object and there is data
>   => there was a hole and we need to zero the gap
>
>If, on the other hand, there isn't data in the next object, then we don't 
>want to zero the hole.. and least not until we decide where EOF is.  
>That's why this block is where it is, and never triggers on the first 
>iteration.
>
>Does that make sense?
>sage

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-25  6:55   ` majianpeng
@ 2013-07-25 12:27     ` Yan, Zheng
  2013-07-25 15:50       ` Sage Weil
  0 siblings, 1 reply; 33+ messages in thread
From: Yan, Zheng @ 2013-07-25 12:27 UTC (permalink / raw)
  To: majianpeng; +Cc: sage, ceph-devel

On Thu, Jul 25, 2013 at 2:55 PM, majianpeng <majianpeng@gmail.com> wrote:
>>On Thu, 25 Jul 2013, majianpeng wrote:
>>> Hi all,
>>>      I met a problem and ask somebody could help me.
>>> In func striped_read()
>>> > if (ret > 0) {
>>> >                int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>>
>>> >                if (read < pos - off) {
>>> >                       dout(" zero gap %llu to %llu\n", off + read, pos);
>>> >                        ceph_zero_page_vector_range(page_align + read,
>>> >                                                    pos - off - read, pages);
>>> >                }
>>> >                pos += ret;

I think you are right. probably above line should be 'pos += this_len'

regards
yan, zheng


>>> >               read = pos - off;
>>>
>>> At first , pos = off and off don't modify.
>>> Why does it judge 'read < pos -off ' ?
>>> Because the read = pos -off, so the read must equal pos -off.
>>
>>This block triggers if we hit a stripe.  off is always the original
>>starting offset.  pos is the current position that we just tried to read
>>from.  We might:
>>
>> - try to read a big extent from off == pos
>> - readpages truncates this to the end of the object
>> - we get a short read (not a complete object)
> What's the mean of not a complete object?
> FYI,a short read maybe  met the boundry of stripe(object) or the EOF of file.
> For the hole of file, i don't know how to handle.
>> - read is some partial amount
>> - pos is off + read + a gap
> From the code : read = pos - off; after that, the pos don't change.
> So the a gap where is from?
>
> Thanks!
> Jianpeng Ma
>> - we try to read from the next object and there is data
>>   => there was a hole and we need to zero the gap
>>
>>If, on the other hand, there isn't data in the next object, then we don't
>>want to zero the hole.. and least not until we decide where EOF is.
>>That's why this block is where it is, and never triggers on the first
>>iteration.
>>
>>Does that make sense?
>>sage

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-25 12:27     ` Yan, Zheng
@ 2013-07-25 15:50       ` Sage Weil
  2013-07-26  0:48         ` majianpeng
  0 siblings, 1 reply; 33+ messages in thread
From: Sage Weil @ 2013-07-25 15:50 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: majianpeng, ceph-devel

On Thu, 25 Jul 2013, Yan, Zheng wrote:
> On Thu, Jul 25, 2013 at 2:55 PM, majianpeng <majianpeng@gmail.com> wrote:
> >>On Thu, 25 Jul 2013, majianpeng wrote:
> >>> Hi all,
> >>>      I met a problem and ask somebody could help me.
> >>> In func striped_read()
> >>> > if (ret > 0) {
> >>> >                int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
> >>>
> >>> >                if (read < pos - off) {
> >>> >                       dout(" zero gap %llu to %llu\n", off + read, pos);
> >>> >                        ceph_zero_page_vector_range(page_align + read,
> >>> >                                                    pos - off - read, pages);
> >>> >                }
> >>> >                pos += ret;
> 
> I think you are right. probably above line should be 'pos += this_len'

It should be easy to construct a simple test for this.  E.g., something 
like

 pwrite(fd, buf, 0, 3000000);
 pwrite(fd, buf, 4194304, 1000);
 pread(fd, buf, 0, 6000000);
 ...

and whatever else to verify that pos was in fact advanced properly?

sage

> 
> regards
> yan, zheng
> 
> 
> >>> >               read = pos - off;
> >>>
> >>> At first , pos = off and off don't modify.
> >>> Why does it judge 'read < pos -off ' ?
> >>> Because the read = pos -off, so the read must equal pos -off.
> >>
> >>This block triggers if we hit a stripe.  off is always the original
> >>starting offset.  pos is the current position that we just tried to read
> >>from.  We might:
> >>
> >> - try to read a big extent from off == pos
> >> - readpages truncates this to the end of the object
> >> - we get a short read (not a complete object)
> > What's the mean of not a complete object?
> > FYI,a short read maybe  met the boundry of stripe(object) or the EOF of file.
> > For the hole of file, i don't know how to handle.
> >> - read is some partial amount
> >> - pos is off + read + a gap
> > From the code : read = pos - off; after that, the pos don't change.
> > So the a gap where is from?
> >
> > Thanks!
> > Jianpeng Ma
> >> - we try to read from the next object and there is data
> >>   => there was a hole and we need to zero the gap
> >>
> >>If, on the other hand, there isn't data in the next object, then we don't
> >>want to zero the hole.. and least not until we decide where EOF is.
> >>That's why this block is where it is, and never triggers on the first
> >>iteration.
> >>
> >>Does that make sense?
> >>sage
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-25 15:50       ` Sage Weil
@ 2013-07-26  0:48         ` majianpeng
  2013-07-26  1:14           ` Yan, Zheng
  0 siblings, 1 reply; 33+ messages in thread
From: majianpeng @ 2013-07-26  0:48 UTC (permalink / raw)
  To: sage, Yan, Zheng; +Cc: ceph-devel

>On Thu, 25 Jul 2013, Yan, Zheng wrote:
>> On Thu, Jul 25, 2013 at 2:55 PM, majianpeng <majianpeng@gmail.com> wrote:
>> >>On Thu, 25 Jul 2013, majianpeng wrote:
>> >>> Hi all,
>> >>>      I met a problem and ask somebody could help me.
>> >>> In func striped_read()
>> >>> > if (ret > 0) {
>> >>> >                int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>> >>>
>> >>> >                if (read < pos - off) {
>> >>> >                       dout(" zero gap %llu to %llu\n", off + read, pos);
>> >>> >                        ceph_zero_page_vector_range(page_align + read,
>> >>> >                                                    pos - off - read, pages);
>> >>> >                }
>> >>> >                pos += ret;
>> 
>> I think you are right. probably above line should be 'pos += this_len'
>
>It should be easy to construct a simple test for this.  E.g., something 
>like
>
> pwrite(fd, buf, 0, 3000000);
> pwrite(fd, buf, 4194304, 1000);
> pread(fd, buf, 0, 6000000);
> ...
>
>and whatever else to verify that pos was in fact advanced properly?
>
The following is my test code:
void hole_test()
{
	char buf[4194304];
	ssize_t ret;
	int fd = open("/media/ceph/test", O_RDWR|O_CREAT|O_DIRECT|O_TRUNC);
	if (fd < 0) {
		printf("open error %s\n", strerror(errno));
		return;
	}

	ret = pwrite(fd, buf, 0, 3000000);
	ret = pwrite(fd, buf, 4194304, 1000);
	ret = pread(fd, buf , 0, 6000000);
	close(fd);
}

The debug message from striped_read are:
[  267.530266] ceph:           file.c:356  : striped_read 6000000~0 (read 0) got 0
[  267.530270] ceph:           file.c:396  : striped_read returns 0

The result isn't what's your said.
Am i missing something?

BTW, i think Yan is ok,
The code 
pos += ret; should pos += this_len.
Because this_len can larger than ret.
But the question is what's condition can cause this?
And can the short_read operation handle this situation?

>sage
>
>> 
>> regards
>> yan, zheng
>> 
>> 
>> >>> >               read = pos - off;
>> >>>
>> >>> At first , pos = off and off don't modify.
>> >>> Why does it judge 'read < pos -off ' ?
>> >>> Because the read = pos -off, so the read must equal pos -off.
>> >>
>> >>This block triggers if we hit a stripe.  off is always the original
>> >>starting offset.  pos is the current position that we just tried to read
>> >>from.  We might:
>> >>
>> >> - try to read a big extent from off == pos
>> >> - readpages truncates this to the end of the object
>> >> - we get a short read (not a complete object)
>> > What's the mean of not a complete object?
>> > FYI,a short read maybe  met the boundry of stripe(object) or the EOF of file.
>> > For the hole of file, i don't know how to handle.
>> >> - read is some partial amount
>> >> - pos is off + read + a gap
>> > From the code : read = pos - off; after that, the pos don't change.
>> > So the a gap where is from?
>> >
>> > Thanks!
>> > Jianpeng Ma
>> >> - we try to read from the next object and there is data
>> >>   => there was a hole and we need to zero the gap
>> >>
>> >>If, on the other hand, there isn't data in the next object, then we don't
>> >>want to zero the hole.. and least not until we decide where EOF is.
>> >>That's why this block is where it is, and never triggers on the first
>> >>iteration.
>> >>
>> >>Does that make sense?
>> >>sage
>> 
>> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-26  0:48         ` majianpeng
@ 2013-07-26  1:14           ` Yan, Zheng
  2013-07-26  1:22             ` majianpeng
  0 siblings, 1 reply; 33+ messages in thread
From: Yan, Zheng @ 2013-07-26  1:14 UTC (permalink / raw)
  To: majianpeng; +Cc: sage, ceph-devel

On Fri, Jul 26, 2013 at 8:48 AM, majianpeng <majianpeng@gmail.com> wrote:
>>On Thu, 25 Jul 2013, Yan, Zheng wrote:
>>> On Thu, Jul 25, 2013 at 2:55 PM, majianpeng <majianpeng@gmail.com> wrote:
>>> >>On Thu, 25 Jul 2013, majianpeng wrote:
>>> >>> Hi all,
>>> >>>      I met a problem and ask somebody could help me.
>>> >>> In func striped_read()
>>> >>> > if (ret > 0) {
>>> >>> >                int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>> >>>
>>> >>> >                if (read < pos - off) {
>>> >>> >                       dout(" zero gap %llu to %llu\n", off + read, pos);
>>> >>> >                        ceph_zero_page_vector_range(page_align + read,
>>> >>> >                                                    pos - off - read, pages);
>>> >>> >                }
>>> >>> >                pos += ret;
>>>
>>> I think you are right. probably above line should be 'pos += this_len'
>>
>>It should be easy to construct a simple test for this.  E.g., something
>>like
>>
>> pwrite(fd, buf, 0, 3000000);
>> pwrite(fd, buf, 4194304, 1000);
>> pread(fd, buf, 0, 6000000);
>> ...
>>
>>and whatever else to verify that pos was in fact advanced properly?
>>
> The following is my test code:
> void hole_test()
> {
>         char buf[4194304];
>         ssize_t ret;
>         int fd = open("/media/ceph/test", O_RDWR|O_CREAT|O_DIRECT|O_TRUNC);
>         if (fd < 0) {
>                 printf("open error %s\n", strerror(errno));
>                 return;
>         }
>
>         ret = pwrite(fd, buf, 0, 3000000);
>         ret = pwrite(fd, buf, 4194304, 1000);
>         ret = pread(fd, buf , 0, 6000000);
>         close(fd);
> }
>
> The debug message from striped_read are:
> [  267.530266] ceph:           file.c:356  : striped_read 6000000~0 (read 0) got 0
> [  267.530270] ceph:           file.c:396  : striped_read returns 0
>
> The result isn't what's your said.
> Am i missing something?
>
> BTW, i think Yan is ok,
> The code
> pos += ret; should pos += this_len.
> Because this_len can larger than ret.
> But the question is what's condition can cause this?

by default, ceph strips file to 4M objects. In above example, the
first object only has
3M data, so 'ret = 3M'  and 'this_len = 4M'

> And can the short_read operation handle this situation?

I don't think it's good idea to short read unless we really reach EOF.

Regards
Yan, Zheng

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-26  1:14           ` Yan, Zheng
@ 2013-07-26  1:22             ` majianpeng
  2013-07-26  1:36               ` Yan, Zheng
  0 siblings, 1 reply; 33+ messages in thread
From: majianpeng @ 2013-07-26  1:22 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: sage, ceph-devel

>On Fri, Jul 26, 2013 at 8:48 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>On Thu, 25 Jul 2013, Yan, Zheng wrote:
>>>> On Thu, Jul 25, 2013 at 2:55 PM, majianpeng <majianpeng@gmail.com> wrote:
>>>> >>On Thu, 25 Jul 2013, majianpeng wrote:
>>>> >>> Hi all,
>>>> >>>      I met a problem and ask somebody could help me.
>>>> >>> In func striped_read()
>>>> >>> > if (ret > 0) {
>>>> >>> >                int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>>> >>>
>>>> >>> >                if (read < pos - off) {
>>>> >>> >                       dout(" zero gap %llu to %llu\n", off + read, pos);
>>>> >>> >                        ceph_zero_page_vector_range(page_align + read,
>>>> >>> >                                                    pos - off - read, pages);
>>>> >>> >                }
>>>> >>> >                pos += ret;
>>>>
>>>> I think you are right. probably above line should be 'pos += this_len'
>>>
>>>It should be easy to construct a simple test for this.  E.g., something
>>>like
>>>
>>> pwrite(fd, buf, 0, 3000000);
>>> pwrite(fd, buf, 4194304, 1000);
>>> pread(fd, buf, 0, 6000000);
>>> ...
>>>
>>>and whatever else to verify that pos was in fact advanced properly?
>>>
>> The following is my test code:
>> void hole_test()
>> {
>>         char buf[4194304];
>>         ssize_t ret;
>>         int fd = open("/media/ceph/test", O_RDWR|O_CREAT|O_DIRECT|O_TRUNC);
>>         if (fd < 0) {
>>                 printf("open error %s\n", strerror(errno));
>>                 return;
>>         }
>>
>>         ret = pwrite(fd, buf, 0, 3000000);
>>         ret = pwrite(fd, buf, 4194304, 1000);
>>         ret = pread(fd, buf , 0, 6000000);
>>         close(fd);
>> }
>>
>> The debug message from striped_read are:
>> [  267.530266] ceph:           file.c:356  : striped_read 6000000~0 (read 0) got 0
>> [  267.530270] ceph:           file.c:396  : striped_read returns 0
>>
>> The result isn't what's your said.
>> Am i missing something?
>>
>> BTW, i think Yan is ok,
>> The code
>> pos += ret; should pos += this_len.
>> Because this_len can larger than ret.
>> But the question is what's condition can cause this?
>
>by default, ceph strips file to 4M objects. In above example, the
>first object only has
>3M data, so 'ret = 3M'  and 'this_len = 4M'
actually, in func calc_layout: the read length is the smaller between object and left.
>
>> And can the short_read operation handle this situation?
>
>I don't think it's good idea to short read unless we really reach EOF.
Can you explain in detail?

Thanks
Jianpeng Ma
>
>Regards
>Yan, Zheng

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-26  1:22             ` majianpeng
@ 2013-07-26  1:36               ` Yan, Zheng
  2013-07-26  1:38                 ` majianpeng
  0 siblings, 1 reply; 33+ messages in thread
From: Yan, Zheng @ 2013-07-26  1:36 UTC (permalink / raw)
  To: majianpeng; +Cc: sage, ceph-devel

On Fri, Jul 26, 2013 at 9:22 AM, majianpeng <majianpeng@gmail.com> wrote:
>>On Fri, Jul 26, 2013 at 8:48 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>>On Thu, 25 Jul 2013, Yan, Zheng wrote:
>>>>> On Thu, Jul 25, 2013 at 2:55 PM, majianpeng <majianpeng@gmail.com> wrote:
>>>>> >>On Thu, 25 Jul 2013, majianpeng wrote:
>>>>> >>> Hi all,
>>>>> >>>      I met a problem and ask somebody could help me.
>>>>> >>> In func striped_read()
>>>>> >>> > if (ret > 0) {
>>>>> >>> >                int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>>>> >>>
>>>>> >>> >                if (read < pos - off) {
>>>>> >>> >                       dout(" zero gap %llu to %llu\n", off + read, pos);
>>>>> >>> >                        ceph_zero_page_vector_range(page_align + read,
>>>>> >>> >                                                    pos - off - read, pages);
>>>>> >>> >                }
>>>>> >>> >                pos += ret;
>>>>>
>>>>> I think you are right. probably above line should be 'pos += this_len'
>>>>
>>>>It should be easy to construct a simple test for this.  E.g., something
>>>>like
>>>>
>>>> pwrite(fd, buf, 0, 3000000);
>>>> pwrite(fd, buf, 4194304, 1000);
>>>> pread(fd, buf, 0, 6000000);
>>>> ...
>>>>
>>>>and whatever else to verify that pos was in fact advanced properly?
>>>>
>>> The following is my test code:
>>> void hole_test()
>>> {
>>>         char buf[4194304];
>>>         ssize_t ret;
>>>         int fd = open("/media/ceph/test", O_RDWR|O_CREAT|O_DIRECT|O_TRUNC);
>>>         if (fd < 0) {
>>>                 printf("open error %s\n", strerror(errno));
>>>                 return;
>>>         }
>>>
>>>         ret = pwrite(fd, buf, 0, 3000000);
>>>         ret = pwrite(fd, buf, 4194304, 1000);
>>>         ret = pread(fd, buf , 0, 6000000);
>>>         close(fd);
>>> }
>>>
>>> The debug message from striped_read are:
>>> [  267.530266] ceph:           file.c:356  : striped_read 6000000~0 (read 0) got 0
>>> [  267.530270] ceph:           file.c:396  : striped_read returns 0
>>>
>>> The result isn't what's your said.
>>> Am i missing something?
>>>
>>> BTW, i think Yan is ok,
>>> The code
>>> pos += ret; should pos += this_len.
>>> Because this_len can larger than ret.
>>> But the question is what's condition can cause this?
>>
>>by default, ceph strips file to 4M objects. In above example, the
>>first object only has
>>3M data, so 'ret = 3M'  and 'this_len = 4M'
> actually, in func calc_layout: the read length is the smaller between object and left.
>>
>>> And can the short_read operation handle this situation?
>>
>>I don't think it's good idea to short read unless we really reach EOF.
> Can you explain in detail?
>

because some user program interpret short read as EOF reached. If we
return short
read they get confused.

Regards
Yan, Zheng

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-26  1:36               ` Yan, Zheng
@ 2013-07-26  1:38                 ` majianpeng
  2013-07-26  1:59                   ` Yan, Zheng
  0 siblings, 1 reply; 33+ messages in thread
From: majianpeng @ 2013-07-26  1:38 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: sage, ceph-devel

>On Fri, Jul 26, 2013 at 9:22 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>On Fri, Jul 26, 2013 at 8:48 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>>>On Thu, 25 Jul 2013, Yan, Zheng wrote:
>>>>>> On Thu, Jul 25, 2013 at 2:55 PM, majianpeng <majianpeng@gmail.com> wrote:
>>>>>> >>On Thu, 25 Jul 2013, majianpeng wrote:
>>>>>> >>> Hi all,
>>>>>> >>>      I met a problem and ask somebody could help me.
>>>>>> >>> In func striped_read()
>>>>>> >>> > if (ret > 0) {
>>>>>> >>> >                int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>>>>> >>>
>>>>>> >>> >                if (read < pos - off) {
>>>>>> >>> >                       dout(" zero gap %llu to %llu\n", off + read, pos);
>>>>>> >>> >                        ceph_zero_page_vector_range(page_align + read,
>>>>>> >>> >                                                    pos - off - read, pages);
>>>>>> >>> >                }
>>>>>> >>> >                pos += ret;
>>>>>>
>>>>>> I think you are right. probably above line should be 'pos += this_len'
>>>>>
>>>>>It should be easy to construct a simple test for this.  E.g., something
>>>>>like
>>>>>
>>>>> pwrite(fd, buf, 0, 3000000);
>>>>> pwrite(fd, buf, 4194304, 1000);
>>>>> pread(fd, buf, 0, 6000000);
>>>>> ...
>>>>>
>>>>>and whatever else to verify that pos was in fact advanced properly?
>>>>>
>>>> The following is my test code:
>>>> void hole_test()
>>>> {
>>>>         char buf[4194304];
>>>>         ssize_t ret;
>>>>         int fd = open("/media/ceph/test", O_RDWR|O_CREAT|O_DIRECT|O_TRUNC);
>>>>         if (fd < 0) {
>>>>                 printf("open error %s\n", strerror(errno));
>>>>                 return;
>>>>         }
>>>>
>>>>         ret = pwrite(fd, buf, 0, 3000000);
>>>>         ret = pwrite(fd, buf, 4194304, 1000);
>>>>         ret = pread(fd, buf , 0, 6000000);
>>>>         close(fd);
>>>> }
>>>>
>>>> The debug message from striped_read are:
>>>> [  267.530266] ceph:           file.c:356  : striped_read 6000000~0 (read 0) got 0
>>>> [  267.530270] ceph:           file.c:396  : striped_read returns 0
>>>>
>>>> The result isn't what's your said.
>>>> Am i missing something?
>>>>
>>>> BTW, i think Yan is ok,
>>>> The code
>>>> pos += ret; should pos += this_len.
>>>> Because this_len can larger than ret.
>>>> But the question is what's condition can cause this?
>>>
>>>by default, ceph strips file to 4M objects. In above example, the
>>>first object only has
>>>3M data, so 'ret = 3M'  and 'this_len = 4M'
>> actually, in func calc_layout: the read length is the smaller between object and left.
>>>
>>>> And can the short_read operation handle this situation?
>>>
>>>I don't think it's good idea to short read unless we really reach EOF.
>> Can you explain in detail?
>>
>
>because some user program interpret short read as EOF reached. If we
>return short
>read they get confused.
The reason cause short-read is only EOF? or other reason?

>
>Regards
>Yan, Zheng

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-26  1:38                 ` majianpeng
@ 2013-07-26  1:59                   ` Yan, Zheng
  2013-07-26  2:07                     ` majianpeng
  0 siblings, 1 reply; 33+ messages in thread
From: Yan, Zheng @ 2013-07-26  1:59 UTC (permalink / raw)
  To: majianpeng; +Cc: sage, ceph-devel

On Fri, Jul 26, 2013 at 9:38 AM, majianpeng <majianpeng@gmail.com> wrote:
>>On Fri, Jul 26, 2013 at 9:22 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>>On Fri, Jul 26, 2013 at 8:48 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>>>>On Thu, 25 Jul 2013, Yan, Zheng wrote:
>>>>>>> On Thu, Jul 25, 2013 at 2:55 PM, majianpeng <majianpeng@gmail.com> wrote:
>>>>>>> >>On Thu, 25 Jul 2013, majianpeng wrote:
>>>>>>> >>> Hi all,
>>>>>>> >>>      I met a problem and ask somebody could help me.
>>>>>>> >>> In func striped_read()
>>>>>>> >>> > if (ret > 0) {
>>>>>>> >>> >                int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>>>>>> >>>
>>>>>>> >>> >                if (read < pos - off) {
>>>>>>> >>> >                       dout(" zero gap %llu to %llu\n", off + read, pos);
>>>>>>> >>> >                        ceph_zero_page_vector_range(page_align + read,
>>>>>>> >>> >                                                    pos - off - read, pages);
>>>>>>> >>> >                }
>>>>>>> >>> >                pos += ret;
>>>>>>>
>>>>>>> I think you are right. probably above line should be 'pos += this_len'
>>>>>>
>>>>>>It should be easy to construct a simple test for this.  E.g., something
>>>>>>like
>>>>>>
>>>>>> pwrite(fd, buf, 0, 3000000);
>>>>>> pwrite(fd, buf, 4194304, 1000);
>>>>>> pread(fd, buf, 0, 6000000);
>>>>>> ...
>>>>>>
>>>>>>and whatever else to verify that pos was in fact advanced properly?
>>>>>>
>>>>> The following is my test code:
>>>>> void hole_test()
>>>>> {
>>>>>         char buf[4194304];
>>>>>         ssize_t ret;
>>>>>         int fd = open("/media/ceph/test", O_RDWR|O_CREAT|O_DIRECT|O_TRUNC);
>>>>>         if (fd < 0) {
>>>>>                 printf("open error %s\n", strerror(errno));
>>>>>                 return;
>>>>>         }
>>>>>
>>>>>         ret = pwrite(fd, buf, 0, 3000000);
>>>>>         ret = pwrite(fd, buf, 4194304, 1000);
>>>>>         ret = pread(fd, buf , 0, 6000000);
>>>>>         close(fd);
>>>>> }
>>>>>
>>>>> The debug message from striped_read are:
>>>>> [  267.530266] ceph:           file.c:356  : striped_read 6000000~0 (read 0) got 0
>>>>> [  267.530270] ceph:           file.c:396  : striped_read returns 0
>>>>>
>>>>> The result isn't what's your said.
>>>>> Am i missing something?
>>>>>
>>>>> BTW, i think Yan is ok,
>>>>> The code
>>>>> pos += ret; should pos += this_len.
>>>>> Because this_len can larger than ret.
>>>>> But the question is what's condition can cause this?
>>>>
>>>>by default, ceph strips file to 4M objects. In above example, the
>>>>first object only has
>>>>3M data, so 'ret = 3M'  and 'this_len = 4M'
>>> actually, in func calc_layout: the read length is the smaller between object and left.
>>>>
>>>>> And can the short_read operation handle this situation?
>>>>
>>>>I don't think it's good idea to short read unless we really reach EOF.
>>> Can you explain in detail?
>>>
>>
>>because some user program interpret short read as EOF reached. If we
>>return short
>>read they get confused.
> The reason cause short-read is only EOF? or other reason?
>

I think yes, (at least for reading from FS and read size is not very large)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-26  1:59                   ` Yan, Zheng
@ 2013-07-26  2:07                     ` majianpeng
       [not found]                       ` <CAAM7YAkNQA5PqVr15CXRQ5xPLk42VCCb3kf3U8ic9f6n3d9SGg@mail.gmail.com>
  0 siblings, 1 reply; 33+ messages in thread
From: majianpeng @ 2013-07-26  2:07 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: sage, ceph-devel

>On Fri, Jul 26, 2013 at 9:38 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>On Fri, Jul 26, 2013 at 9:22 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>>>On Fri, Jul 26, 2013 at 8:48 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>>>>>On Thu, 25 Jul 2013, Yan, Zheng wrote:
>>>>>>>> On Thu, Jul 25, 2013 at 2:55 PM, majianpeng <majianpeng@gmail.com> wrote:
>>>>>>>> >>On Thu, 25 Jul 2013, majianpeng wrote:
>>>>>>>> >>> Hi all,
>>>>>>>> >>>      I met a problem and ask somebody could help me.
>>>>>>>> >>> In func striped_read()
>>>>>>>> >>> > if (ret > 0) {
>>>>>>>> >>> >                int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>>>>>>> >>>
>>>>>>>> >>> >                if (read < pos - off) {
>>>>>>>> >>> >                       dout(" zero gap %llu to %llu\n", off + read, pos);
>>>>>>>> >>> >                        ceph_zero_page_vector_range(page_align + read,
>>>>>>>> >>> >                                                    pos - off - read, pages);
>>>>>>>> >>> >                }
>>>>>>>> >>> >                pos += ret;
>>>>>>>>
>>>>>>>> I think you are right. probably above line should be 'pos += this_len'
>>>>>>>
>>>>>>>It should be easy to construct a simple test for this.  E.g., something
>>>>>>>like
>>>>>>>
>>>>>>> pwrite(fd, buf, 0, 3000000);
>>>>>>> pwrite(fd, buf, 4194304, 1000);
>>>>>>> pread(fd, buf, 0, 6000000);
>>>>>>> ...
>>>>>>>
>>>>>>>and whatever else to verify that pos was in fact advanced properly?
>>>>>>>
>>>>>> The following is my test code:
>>>>>> void hole_test()
>>>>>> {
>>>>>>         char buf[4194304];
>>>>>>         ssize_t ret;
>>>>>>         int fd = open("/media/ceph/test", O_RDWR|O_CREAT|O_DIRECT|O_TRUNC);
>>>>>>         if (fd < 0) {
>>>>>>                 printf("open error %s\n", strerror(errno));
>>>>>>                 return;
>>>>>>         }
>>>>>>
>>>>>>         ret = pwrite(fd, buf, 0, 3000000);
>>>>>>         ret = pwrite(fd, buf, 4194304, 1000);
>>>>>>         ret = pread(fd, buf , 0, 6000000);
>>>>>>         close(fd);
>>>>>> }
>>>>>>
>>>>>> The debug message from striped_read are:
>>>>>> [  267.530266] ceph:           file.c:356  : striped_read 6000000~0 (read 0) got 0
>>>>>> [  267.530270] ceph:           file.c:396  : striped_read returns 0
>>>>>>
>>>>>> The result isn't what's your said.
>>>>>> Am i missing something?
>>>>>>
>>>>>> BTW, i think Yan is ok,
>>>>>> The code
>>>>>> pos += ret; should pos += this_len.
>>>>>> Because this_len can larger than ret.
>>>>>> But the question is what's condition can cause this?
>>>>>
>>>>>by default, ceph strips file to 4M objects. In above example, the
>>>>>first object only has
>>>>>3M data, so 'ret = 3M'  and 'this_len = 4M'
>>>> actually, in func calc_layout: the read length is the smaller between object and left.
>>>>>
>>>>>> And can the short_read operation handle this situation?
>>>>>
>>>>>I don't think it's good idea to short read unless we really reach EOF.
>>>> Can you explain in detail?
>>>>
>>>
>>>because some user program interpret short read as EOF reached. If we
>>>return short
>>>read they get confused.
>> The reason cause short-read is only EOF? or other reason?
>>
>
>I think yes, (at least for reading from FS and read size is not very large)
we can remove those code:
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 2ddf061..94fa378 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -352,11 +352,6 @@ more:
        if (ret > 0) {
                int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
 
-               if (read < pos - off) {
-                       dout(" zero gap %llu to %llu\n", off + read, pos);
-                       ceph_zero_page_vector_range(page_align + read,
-                                                   pos - off - read, pages);
-               }
                pos += ret;
                read = pos - off;
                left -= ret;

Becase in fun striped_read, the short-read have two reasons:eof or met a hole.
In either case ,the later was_short handler can handle.
Is it ok?

Thanks
Jianpeng Ma



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
       [not found]                       ` <CAAM7YAkNQA5PqVr15CXRQ5xPLk42VCCb3kf3U8ic9f6n3d9SGg@mail.gmail.com>
@ 2013-07-29  3:00                         ` majianpeng
  2013-07-29  5:02                           ` Yan, Zheng
  0 siblings, 1 reply; 33+ messages in thread
From: majianpeng @ 2013-07-29  3:00 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: sage, ceph-devel

[snip]
>I don't think the later was_short can handle the hole case. For the hole case,
>we should try reading next strip object instead of return. how about
>below patch.
>
Hi Yan,
	i uesed this demo to test hole case.
dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes

dd if=file_with_holes of=/dev/null bs=16k count=1 iflag=direct
Using the dynamic_debug in striped_read,  the message are:
>[ 8743.663499] ceph:           file.c:350  : striped_read 0~16384 (read 0) got 16384
>[ 8743.663502] ceph:           file.c:390  : striped_read returns 16384
From the messages, we can see it can't hit the short-read.
For the ceph-file-hole, how does the ceph handle?
Or am i missing something?

Thanks!
Jianpeng Ma

>Regards
>Yan, Zheng
>---
>diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>index 271a346..6ca2921 100644
>--- a/fs/ceph/file.c
>+++ b/fs/ceph/file.c
>@@ -350,16 +350,17 @@ more:
> 	     ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>
> 	if (ret > 0) {
>-		int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>+		int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>
>-		if (read < pos - off) {
>-			dout(" zero gap %llu to %llu\n", off + read, pos);
>-			ceph_zero_page_vector_range(page_align + read,
>-						    pos - off - read, pages);
>+		if (was_short) {
>+			dout(" zero gap %llu to %llu\n",
>+			     pos + ret, pos + this_len);
>+			ceph_zero_page_vector_range(page_align + ret,
>+						    this_len - ret, page_pos);
> 		}
>-		pos += ret;
>+		pos += this_len;
> 		read = pos - off;
>-		left -= ret;
>+		left -= this_len;
> 		page_pos += didpages;
> 		pages_left -= didpages;
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-29  3:00                         ` majianpeng
@ 2013-07-29  5:02                           ` Yan, Zheng
  2013-07-30  2:08                             ` majianpeng
  2013-07-30 11:01                             ` majianpeng
  0 siblings, 2 replies; 33+ messages in thread
From: Yan, Zheng @ 2013-07-29  5:02 UTC (permalink / raw)
  To: majianpeng; +Cc: sage, ceph-devel

On Mon, Jul 29, 2013 at 11:00 AM, majianpeng <majianpeng@gmail.com> wrote:
>
> [snip]
> >I don't think the later was_short can handle the hole case. For the hole case,
> >we should try reading next strip object instead of return. how about
> >below patch.
> >
> Hi Yan,
>         i uesed this demo to test hole case.
> dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
> dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
>
> dd if=file_with_holes of=/dev/null bs=16k count=1 iflag=direct
> Using the dynamic_debug in striped_read,  the message are:
> >[ 8743.663499] ceph:           file.c:350  : striped_read 0~16384 (read 0) got 16384
> >[ 8743.663502] ceph:           file.c:390  : striped_read returns 16384
> From the messages, we can see it can't hit the short-read.
> For the ceph-file-hole, how does the ceph handle?
> Or am i missing something?

the default strip size is 4M, all data are written to the first object
in your test case.
could you try something like below.

dd if=/dev/urandom bs=1M count=2 of=file_with_holes
dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
dd if=file_with_holes bs=8M >/dev/null

Regards
Yan, Zheng


>
>
> Thanks!
> Jianpeng Ma
>
> >Regards
> >Yan, Zheng
> >---
> >diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> >index 271a346..6ca2921 100644
> >--- a/fs/ceph/file.c
> >+++ b/fs/ceph/file.c
> >@@ -350,16 +350,17 @@ more:
> >            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
> >
> >       if (ret > 0) {
> >-              int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
> >+              int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
> >
> >-              if (read < pos - off) {
> >-                      dout(" zero gap %llu to %llu\n", off + read, pos);
> >-                      ceph_zero_page_vector_range(page_align + read,
> >-                                                  pos - off - read, pages);
> >+              if (was_short) {
> >+                      dout(" zero gap %llu to %llu\n",
> >+                           pos + ret, pos + this_len);
> >+                      ceph_zero_page_vector_range(page_align + ret,
> >+                                                  this_len - ret, page_pos);
> >               }
> >-              pos += ret;
> >+              pos += this_len;
> >               read = pos - off;
> >-              left -= ret;
> >+              left -= this_len;
> >               page_pos += didpages;
> >               pages_left -= didpages;
> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-29  5:02                           ` Yan, Zheng
@ 2013-07-30  2:08                             ` majianpeng
  2013-07-30  2:56                               ` Yan, Zheng
  2013-07-30 11:01                             ` majianpeng
  1 sibling, 1 reply; 33+ messages in thread
From: majianpeng @ 2013-07-30  2:08 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: sage, ceph-devel

>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng <majianpeng@gmail.com> wrote:
>>
>> [snip]
>> >I don't think the later was_short can handle the hole case. For the hole case,
>> >we should try reading next strip object instead of return. how about
>> >below patch.
>> >
>> Hi Yan,
>>         i uesed this demo to test hole case.
>> dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
>> dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
>>
>> dd if=file_with_holes of=/dev/null bs=16k count=1 iflag=direct
>> Using the dynamic_debug in striped_read,  the message are:
>> >[ 8743.663499] ceph:           file.c:350  : striped_read 0~16384 (read 0) got 16384
>> >[ 8743.663502] ceph:           file.c:390  : striped_read returns 16384
>> From the messages, we can see it can't hit the short-read.
>> For the ceph-file-hole, how does the ceph handle?
>> Or am i missing something?
>
>the default strip size is 4M, all data are written to the first object
>in your test case.
>could you try something like below.
>
>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>dd if=file_with_holes bs=8M >/dev/null
>

From above test, i think your patch is right.
Although, the original code can work but it  call multi striped_read.
As your said for stripe short-read,it doesn't make sense to return rather than reading next stripe.
But can you add some comments for this?
The short-read reasongs are two:EOF or hit-hole.
But for hit-hole there are some differents case. For that i don't know.

Thanks!
Jianpeng Ma
>Regards
>Yan, Zheng
>
>
>>
>>
>> Thanks!
>> Jianpeng Ma
>>
>> >Regards
>> >Yan, Zheng
>> >---
>> >diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> >index 271a346..6ca2921 100644
>> >--- a/fs/ceph/file.c
>> >+++ b/fs/ceph/file.c
>> >@@ -350,16 +350,17 @@ more:
>> >            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>> >
>> >       if (ret > 0) {
>> >-              int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>> >+              int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>> >
>> >-              if (read < pos - off) {
>> >-                      dout(" zero gap %llu to %llu\n", off + read, pos);
>> >-                      ceph_zero_page_vector_range(page_align + read,
>> >-                                                  pos - off - read, pages);
>> >+              if (was_short) {
>> >+                      dout(" zero gap %llu to %llu\n",
>> >+                           pos + ret, pos + this_len);
>> >+                      ceph_zero_page_vector_range(page_align + ret,
>> >+                                                  this_len - ret, page_pos);
>> >               }
>> >-              pos += ret;
>> >+              pos += this_len;
>> >               read = pos - off;
>> >-              left -= ret;
>> >+              left -= this_len;
>> >               page_pos += didpages;
>> >               pages_left -= didpages;
>> >
Thanks!
Jianpeng Ma
>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng <majianpeng@gmail.com> wrote:
>>
>> [snip]
>> >I don't think the later was_short can handle the hole case. For the hole case,
>> >we should try reading next strip object instead of return. how about
>> >below patch.
>> >
>> Hi Yan,
>>         i uesed this demo to test hole case.
>> dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
>> dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
>>
>> dd if=file_with_holes of=/dev/null bs=16k count=1 iflag=direct
>> Using the dynamic_debug in striped_read,  the message are:
>> >[ 8743.663499] ceph:           file.c:350  : striped_read 0~16384 (read 0) got 16384
>> >[ 8743.663502] ceph:           file.c:390  : striped_read returns 16384
>> From the messages, we can see it can't hit the short-read.
>> For the ceph-file-hole, how does the ceph handle?
>> Or am i missing something?
>
>the default strip size is 4M, all data are written to the first object
>in your test case.
>could you try something like below.
>
>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>dd if=file_with_holes bs=8M >/dev/null
>
>Regards
>Yan, Zheng
>
>
>>
>>
>> Thanks!
>> Jianpeng Ma
>>
>> >Regards
>> >Yan, Zheng
>> >---
>> >diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> >index 271a346..6ca2921 100644
>> >--- a/fs/ceph/file.c
>> >+++ b/fs/ceph/file.c
>> >@@ -350,16 +350,17 @@ more:
>> >            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>> >
>> >       if (ret > 0) {
>> >-              int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>> >+              int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>> >
>> >-              if (read < pos - off) {
>> >-                      dout(" zero gap %llu to %llu\n", off + read, pos);
>> >-                      ceph_zero_page_vector_range(page_align + read,
>> >-                                                  pos - off - read, pages);
>> >+              if (was_short) {
>> >+                      dout(" zero gap %llu to %llu\n",
>> >+                           pos + ret, pos + this_len);
>> >+                      ceph_zero_page_vector_range(page_align + ret,
>> >+                                                  this_len - ret, page_pos);
>> >               }
>> >-              pos += ret;
>> >+              pos += this_len;
>> >               read = pos - off;
>> >-              left -= ret;
>> >+              left -= this_len;
>> >               page_pos += didpages;
>> >               pages_left -= didpages;
>> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-30  2:08                             ` majianpeng
@ 2013-07-30  2:56                               ` Yan, Zheng
  0 siblings, 0 replies; 33+ messages in thread
From: Yan, Zheng @ 2013-07-30  2:56 UTC (permalink / raw)
  To: majianpeng; +Cc: sage, ceph-devel

On Tue, Jul 30, 2013 at 10:08 AM, majianpeng <majianpeng@gmail.com> wrote:
>>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>
>>> [snip]
>>> >I don't think the later was_short can handle the hole case. For the hole case,
>>> >we should try reading next strip object instead of return. how about
>>> >below patch.
>>> >
>>> Hi Yan,
>>>         i uesed this demo to test hole case.
>>> dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
>>> dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
>>>
>>> dd if=file_with_holes of=/dev/null bs=16k count=1 iflag=direct
>>> Using the dynamic_debug in striped_read,  the message are:
>>> >[ 8743.663499] ceph:           file.c:350  : striped_read 0~16384 (read 0) got 16384
>>> >[ 8743.663502] ceph:           file.c:390  : striped_read returns 16384
>>> From the messages, we can see it can't hit the short-read.
>>> For the ceph-file-hole, how does the ceph handle?
>>> Or am i missing something?
>>
>>the default strip size is 4M, all data are written to the first object
>>in your test case.
>>could you try something like below.
>>
>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>>dd if=file_with_holes bs=8M >/dev/null
>>
>
> From above test, i think your patch is right.
> Although, the original code can work but it  call multi striped_read.

For test case
---
dd if=/dev/urandom bs=1M count=2 of=file_with_holes
dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
dd if=file_with_holes bs=8M iflag=direct >/dev/null

I got
---
ceph:  striped_read 0~8388608 (read 0) got 2097152 HITSTRIPE SHORT
ceph:  striped_read 2097152~6291456 (read 2097152) got 0 HITSTRIPE SHORT
ceph:  zero tail 4194304
ceph:  striped_read returns 6291456
ceph:  sync_read result 6291456
ceph:  aio_read ffff88000fb22f98 10000193e8c.fffffffffffffffe dropping
cap refs on Fcr = 6291456

the original code zeros data in range 2M~6M, it's obvious incorrect.

> As your said for stripe short-read,it doesn't make sense to return rather than reading next stripe.
> But can you add some comments for this?
> The short-read reasongs are two:EOF or hit-hole.
> But for hit-hole there are some differents case. For that i don't know.
>

For hit-hole, there is only one case: the strip object's size is
smaller then 4M. When reading
a strip object, if the returned data is less than we expected, we need
to check if following strip
objects have data.

I think the original code and my patch doesn't handle the below case properly.

| object 0 |  hole  |  hole |  object 3 |
dd if=testfile iflag=direct bs=16M >/dev/null

Could you write a patch, do some tests and submit it.

Regards
Yan, Zheng

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-29  5:02                           ` Yan, Zheng
  2013-07-30  2:08                             ` majianpeng
@ 2013-07-30 11:01                             ` majianpeng
  2013-07-30 11:14                               ` Yan, Zheng
  1 sibling, 1 reply; 33+ messages in thread
From: majianpeng @ 2013-07-30 11:01 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: sage, ceph-devel

>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng <majianpeng@gmail.com> wrote:
>>
>> [snip]
>> >I don't think the later was_short can handle the hole case. For the hole case,
>> >we should try reading next strip object instead of return. how about
>> >below patch.
>> >
>> Hi Yan,
>>         i uesed this demo to test hole case.
>> dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
>> dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
>>
>> dd if=file_with_holes of=/dev/null bs=16k count=1 iflag=direct
>> Using the dynamic_debug in striped_read,  the message are:
>> >[ 8743.663499] ceph:           file.c:350  : striped_read 0~16384 (read 0) got 16384
>> >[ 8743.663502] ceph:           file.c:390  : striped_read returns 16384
>> From the messages, we can see it can't hit the short-read.
>> For the ceph-file-hole, how does the ceph handle?
>> Or am i missing something?
>
>the default strip size is 4M, all data are written to the first object
>in your test case.
>could you try something like below.
>

>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>dd if=file_with_holes bs=8M >/dev/null
>
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 2ddf061..22a98e5 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -349,17 +349,17 @@ more:
        dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
             ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
 
-       if (ret > 0) {
-               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
+       if (ret >= 0) {
+               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
 
-               if (read < pos - off) {
-                       dout(" zero gap %llu to %llu\n", off + read, pos);
-                       ceph_zero_page_vector_range(page_align + read,
-                                                   pos - off - read, pages);
+               if (was_short) {
+                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
+                       ceph_zero_page_vector_range(page_align + ret,
+                                                   this_len - ret, pages);
                }
-               pos += ret;
+               pos += this_len;
                read = pos - off;
-               left -= ret;
+               left -= this_len;
                page_pos += didpages;
                pages_left -= didpages;

This patch can do those case. It only add ret== 0 in judgement 'ret > 0".
But i think i will add a parameter about hit_hole. It will make the code easy to understand.



>Regards
>Yan, Zheng
>
>
>>
>>
>> Thanks!
>> Jianpeng Ma
>>
>> >Regards
>> >Yan, Zheng
>> >---
>> >diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> >index 271a346..6ca2921 100644
>> >--- a/fs/ceph/file.c
>> >+++ b/fs/ceph/file.c
>> >@@ -350,16 +350,17 @@ more:
>> >            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>> >
>> >       if (ret > 0) {
>> >-              int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>> >+              int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>> >
>> >-              if (read < pos - off) {
>> >-                      dout(" zero gap %llu to %llu\n", off + read, pos);
>> >-                      ceph_zero_page_vector_range(page_align + read,
>> >-                                                  pos - off - read, pages);
>> >+              if (was_short) {
>> >+                      dout(" zero gap %llu to %llu\n",
>> >+                           pos + ret, pos + this_len);
>> >+                      ceph_zero_page_vector_range(page_align + ret,
>> >+                                                  this_len - ret, page_pos);
>> >               }
>> >-              pos += ret;
>> >+              pos += this_len;
>> >               read = pos - off;
>> >-              left -= ret;
>> >+              left -= this_len;
>> >               page_pos += didpages;
>> >               pages_left -= didpages;
>> >
Thanks!
Jianpeng Ma
>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng <majianpeng@gmail.com> wrote:
>>
>> [snip]
>> >I don't think the later was_short can handle the hole case. For the hole case,
>> >we should try reading next strip object instead of return. how about
>> >below patch.
>> >
>> Hi Yan,
>>         i uesed this demo to test hole case.
>> dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
>> dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
>>
>> dd if=file_with_holes of=/dev/null bs=16k count=1 iflag=direct
>> Using the dynamic_debug in striped_read,  the message are:
>> >[ 8743.663499] ceph:           file.c:350  : striped_read 0~16384 (read 0) got 16384
>> >[ 8743.663502] ceph:           file.c:390  : striped_read returns 16384
>> From the messages, we can see it can't hit the short-read.
>> For the ceph-file-hole, how does the ceph handle?
>> Or am i missing something?
>
>the default strip size is 4M, all data are written to the first object
>in your test case.
>could you try something like below.
>
>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>dd if=file_with_holes bs=8M >/dev/null
>
>Regards
>Yan, Zheng
>
>
>>
>>
>> Thanks!
>> Jianpeng Ma
>>
>> >Regards
>> >Yan, Zheng
>> >---
>> >diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> >index 271a346..6ca2921 100644
>> >--- a/fs/ceph/file.c
>> >+++ b/fs/ceph/file.c
>> >@@ -350,16 +350,17 @@ more:
>> >            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>> >
>> >       if (ret > 0) {
>> >-              int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>> >+              int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>> >
>> >-              if (read < pos - off) {
>> >-                      dout(" zero gap %llu to %llu\n", off + read, pos);
>> >-                      ceph_zero_page_vector_range(page_align + read,
>> >-                                                  pos - off - read, pages);
>> >+              if (was_short) {
>> >+                      dout(" zero gap %llu to %llu\n",
>> >+                           pos + ret, pos + this_len);
>> >+                      ceph_zero_page_vector_range(page_align + ret,
>> >+                                                  this_len - ret, page_pos);
>> >               }
>> >-              pos += ret;
>> >+              pos += this_len;
>> >               read = pos - off;
>> >-              left -= ret;
>> >+              left -= this_len;
>> >               page_pos += didpages;
>> >               pages_left -= didpages;
>> >

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-30 11:01                             ` majianpeng
@ 2013-07-30 11:14                               ` Yan, Zheng
  2013-07-30 11:20                                 ` majianpeng
  2013-07-30 11:41                                 ` majianpeng
  0 siblings, 2 replies; 33+ messages in thread
From: Yan, Zheng @ 2013-07-30 11:14 UTC (permalink / raw)
  To: majianpeng; +Cc: sage, ceph-devel

On Tue, Jul 30, 2013 at 7:01 PM, majianpeng <majianpeng@gmail.com> wrote:
>>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>
>>> [snip]
>>> >I don't think the later was_short can handle the hole case. For the hole case,
>>> >we should try reading next strip object instead of return. how about
>>> >below patch.
>>> >
>>> Hi Yan,
>>>         i uesed this demo to test hole case.
>>> dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
>>> dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
>>>
>>> dd if=file_with_holes of=/dev/null bs=16k count=1 iflag=direct
>>> Using the dynamic_debug in striped_read,  the message are:
>>> >[ 8743.663499] ceph:           file.c:350  : striped_read 0~16384 (read 0) got 16384
>>> >[ 8743.663502] ceph:           file.c:390  : striped_read returns 16384
>>> From the messages, we can see it can't hit the short-read.
>>> For the ceph-file-hole, how does the ceph handle?
>>> Or am i missing something?
>>
>>the default strip size is 4M, all data are written to the first object
>>in your test case.
>>could you try something like below.
>>
>
>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>>dd if=file_with_holes bs=8M >/dev/null
>>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 2ddf061..22a98e5 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -349,17 +349,17 @@ more:
>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>
> -       if (ret > 0) {
> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
> +       if (ret >= 0) {
> +               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>
> -               if (read < pos - off) {
> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
> -                       ceph_zero_page_vector_range(page_align + read,
> -                                                   pos - off - read, pages);
> +               if (was_short) {
> +                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
> +                       ceph_zero_page_vector_range(page_align + ret,
> +                                                   this_len - ret, pages);
>                 }
> -               pos += ret;
> +               pos += this_len;
>                 read = pos - off;
> -               left -= ret;
> +               left -= this_len;
>                 page_pos += didpages;
>                 pages_left -= didpages;
>
> This patch can do those case. It only add ret== 0 in judgement 'ret > 0".

maybe we should add a  i_size check. stop reading next strip object
when 'pos > i_size'

> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
>

 i think 'was_short' is equal to 'hit_hole'

Regards
Yan, Zheng

>
>
>>Regards
>>Yan, Zheng
>>
>>
>>>
>>>
>>> Thanks!
>>> Jianpeng Ma
>>>
>>> >Regards
>>> >Yan, Zheng
>>> >---
>>> >diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>> >index 271a346..6ca2921 100644
>>> >--- a/fs/ceph/file.c
>>> >+++ b/fs/ceph/file.c
>>> >@@ -350,16 +350,17 @@ more:
>>> >            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>> >
>>> >       if (ret > 0) {
>>> >-              int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>> >+              int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>>> >
>>> >-              if (read < pos - off) {
>>> >-                      dout(" zero gap %llu to %llu\n", off + read, pos);
>>> >-                      ceph_zero_page_vector_range(page_align + read,
>>> >-                                                  pos - off - read, pages);
>>> >+              if (was_short) {
>>> >+                      dout(" zero gap %llu to %llu\n",
>>> >+                           pos + ret, pos + this_len);
>>> >+                      ceph_zero_page_vector_range(page_align + ret,
>>> >+                                                  this_len - ret, page_pos);
>>> >               }
>>> >-              pos += ret;
>>> >+              pos += this_len;
>>> >               read = pos - off;
>>> >-              left -= ret;
>>> >+              left -= this_len;
>>> >               page_pos += didpages;
>>> >               pages_left -= didpages;
>>> >
> Thanks!
> Jianpeng Ma
>>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>
>>> [snip]
>>> >I don't think the later was_short can handle the hole case. For the hole case,
>>> >we should try reading next strip object instead of return. how about
>>> >below patch.
>>> >
>>> Hi Yan,
>>>         i uesed this demo to test hole case.
>>> dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
>>> dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
>>>
>>> dd if=file_with_holes of=/dev/null bs=16k count=1 iflag=direct
>>> Using the dynamic_debug in striped_read,  the message are:
>>> >[ 8743.663499] ceph:           file.c:350  : striped_read 0~16384 (read 0) got 16384
>>> >[ 8743.663502] ceph:           file.c:390  : striped_read returns 16384
>>> From the messages, we can see it can't hit the short-read.
>>> For the ceph-file-hole, how does the ceph handle?
>>> Or am i missing something?
>>
>>the default strip size is 4M, all data are written to the first object
>>in your test case.
>>could you try something like below.
>>
>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>>dd if=file_with_holes bs=8M >/dev/null
>>
>>Regards
>>Yan, Zheng
>>
>>
>>>
>>>
>>> Thanks!
>>> Jianpeng Ma
>>>
>>> >Regards
>>> >Yan, Zheng
>>> >---
>>> >diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>> >index 271a346..6ca2921 100644
>>> >--- a/fs/ceph/file.c
>>> >+++ b/fs/ceph/file.c
>>> >@@ -350,16 +350,17 @@ more:
>>> >            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>> >
>>> >       if (ret > 0) {
>>> >-              int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>> >+              int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>>> >
>>> >-              if (read < pos - off) {
>>> >-                      dout(" zero gap %llu to %llu\n", off + read, pos);
>>> >-                      ceph_zero_page_vector_range(page_align + read,
>>> >-                                                  pos - off - read, pages);
>>> >+              if (was_short) {
>>> >+                      dout(" zero gap %llu to %llu\n",
>>> >+                           pos + ret, pos + this_len);
>>> >+                      ceph_zero_page_vector_range(page_align + ret,
>>> >+                                                  this_len - ret, page_pos);
>>> >               }
>>> >-              pos += ret;
>>> >+              pos += this_len;
>>> >               read = pos - off;
>>> >-              left -= ret;
>>> >+              left -= this_len;
>>> >               page_pos += didpages;
>>> >               pages_left -= didpages;
>>> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-30 11:14                               ` Yan, Zheng
@ 2013-07-30 11:20                                 ` majianpeng
  2013-07-30 11:41                                 ` majianpeng
  1 sibling, 0 replies; 33+ messages in thread
From: majianpeng @ 2013-07-30 11:20 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: sage, ceph-devel

>On Tue, Jul 30, 2013 at 7:01 PM, majianpeng <majianpeng@gmail.com> wrote:
>>>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>>
>>>> [snip]
>>>> >I don't think the later was_short can handle the hole case. For the hole case,
>>>> >we should try reading next strip object instead of return. how about
>>>> >below patch.
>>>> >
>>>> Hi Yan,
>>>>         i uesed this demo to test hole case.
>>>> dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
>>>> dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
>>>>
>>>> dd if=file_with_holes of=/dev/null bs=16k count=1 iflag=direct
>>>> Using the dynamic_debug in striped_read,  the message are:
>>>> >[ 8743.663499] ceph:           file.c:350  : striped_read 0~16384 (read 0) got 16384
>>>> >[ 8743.663502] ceph:           file.c:390  : striped_read returns 16384
>>>> From the messages, we can see it can't hit the short-read.
>>>> For the ceph-file-hole, how does the ceph handle?
>>>> Or am i missing something?
>>>
>>>the default strip size is 4M, all data are written to the first object
>>>in your test case.
>>>could you try something like below.
>>>
>>
>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>>>dd if=file_with_holes bs=8M >/dev/null
>>>
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 2ddf061..22a98e5 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -349,17 +349,17 @@ more:
>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>
>> -       if (ret > 0) {
>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>> +       if (ret >= 0) {
>> +               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>>
>> -               if (read < pos - off) {
>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
>> -                       ceph_zero_page_vector_range(page_align + read,
>> -                                                   pos - off - read, pages);
>> +               if (was_short) {
>> +                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
>> +                       ceph_zero_page_vector_range(page_align + ret,
>> +                                                   this_len - ret, pages);
>>                 }
>> -               pos += ret;
>> +               pos += this_len;
>>                 read = pos - off;
>> -               left -= ret;
>> +               left -= this_len;
>>                 page_pos += didpages;
>>                 pages_left -= didpages;
>>
>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0".
>
>maybe we should add a  i_size check. stop reading next strip object
>when 'pos > i_size'
>
I think those code can do.
>		/* hit stripe? */
>        if (left && hit_stripe)
 >               goto more;
>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
>>
>
> i think 'was_short' is equal to 'hit_hole'
FYI, for the EOF, they are the same meaing.

Thanks!
Jianpeng Ma
>
>Regards
>Yan, Zheng
>
>>
>>
>>>Regards
>>>Yan, Zheng
>>>
>>>
>>>>
>>>>
>>>> Thanks!
>>>> Jianpeng Ma
>>>>
>>>> >Regards
>>>> >Yan, Zheng
>>>> >---
>>>> >diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>>> >index 271a346..6ca2921 100644
>>>> >--- a/fs/ceph/file.c
>>>> >+++ b/fs/ceph/file.c
>>>> >@@ -350,16 +350,17 @@ more:
>>>> >            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>>> >
>>>> >       if (ret > 0) {
>>>> >-              int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>>> >+              int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>>>> >
>>>> >-              if (read < pos - off) {
>>>> >-                      dout(" zero gap %llu to %llu\n", off + read, pos);
>>>> >-                      ceph_zero_page_vector_range(page_align + read,
>>>> >-                                                  pos - off - read, pages);
>>>> >+              if (was_short) {
>>>> >+                      dout(" zero gap %llu to %llu\n",
>>>> >+                           pos + ret, pos + this_len);
>>>> >+                      ceph_zero_page_vector_range(page_align + ret,
>>>> >+                                                  this_len - ret, page_pos);
>>>> >               }
>>>> >-              pos += ret;
>>>> >+              pos += this_len;
>>>> >               read = pos - off;
>>>> >-              left -= ret;
>>>> >+              left -= this_len;
>>>> >               page_pos += didpages;
>>>> >               pages_left -= didpages;
>>>> >
>> Thanks!
>> Jianpeng Ma
>>>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>>
>>>> [snip]
>>>> >I don't think the later was_short can handle the hole case. For the hole case,
>>>> >we should try reading next strip object instead of return. how about
>>>> >below patch.
>>>> >
>>>> Hi Yan,
>>>>         i uesed this demo to test hole case.
>>>> dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
>>>> dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
>>>>
>>>> dd if=file_with_holes of=/dev/null bs=16k count=1 iflag=direct
>>>> Using the dynamic_debug in striped_read,  the message are:
>>>> >[ 8743.663499] ceph:           file.c:350  : striped_read 0~16384 (read 0) got 16384
>>>> >[ 8743.663502] ceph:           file.c:390  : striped_read returns 16384
>>>> From the messages, we can see it can't hit the short-read.
>>>> For the ceph-file-hole, how does the ceph handle?
>>>> Or am i missing something?
>>>
>>>the default strip size is 4M, all data are written to the first object
>>>in your test case.
>>>could you try something like below.
>>>
>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>>>dd if=file_with_holes bs=8M >/dev/null
>>>
>>>Regards
>>>Yan, Zheng
>>>
>>>
>>>>
>>>>
>>>> Thanks!
>>>> Jianpeng Ma
>>>>
>>>> >Regards
>>>> >Yan, Zheng
>>>> >---
>>>> >diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>>> >index 271a346..6ca2921 100644
>>>> >--- a/fs/ceph/file.c
>>>> >+++ b/fs/ceph/file.c
>>>> >@@ -350,16 +350,17 @@ more:
>>>> >            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>>> >
>>>> >       if (ret > 0) {
>>>> >-              int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>>> >+              int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>>>> >
>>>> >-              if (read < pos - off) {
>>>> >-                      dout(" zero gap %llu to %llu\n", off + read, pos);
>>>> >-                      ceph_zero_page_vector_range(page_align + read,
>>>> >-                                                  pos - off - read, pages);
>>>> >+              if (was_short) {
>>>> >+                      dout(" zero gap %llu to %llu\n",
>>>> >+                           pos + ret, pos + this_len);
>>>> >+                      ceph_zero_page_vector_range(page_align + ret,
>>>> >+                                                  this_len - ret, page_pos);
>>>> >               }
>>>> >-              pos += ret;
>>>> >+              pos += this_len;
>>>> >               read = pos - off;
>>>> >-              left -= ret;
>>>> >+              left -= this_len;
>>>> >               page_pos += didpages;
>>>> >               pages_left -= didpages;
>>>> >
Thanks!
Jianpeng Ma
>On Tue, Jul 30, 2013 at 7:01 PM, majianpeng <majianpeng@gmail.com> wrote:
>>>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>>
>>>> [snip]
>>>> >I don't think the later was_short can handle the hole case. For the hole case,
>>>> >we should try reading next strip object instead of return. how about
>>>> >below patch.
>>>> >
>>>> Hi Yan,
>>>>         i uesed this demo to test hole case.
>>>> dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
>>>> dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
>>>>
>>>> dd if=file_with_holes of=/dev/null bs=16k count=1 iflag=direct
>>>> Using the dynamic_debug in striped_read,  the message are:
>>>> >[ 8743.663499] ceph:           file.c:350  : striped_read 0~16384 (read 0) got 16384
>>>> >[ 8743.663502] ceph:           file.c:390  : striped_read returns 16384
>>>> From the messages, we can see it can't hit the short-read.
>>>> For the ceph-file-hole, how does the ceph handle?
>>>> Or am i missing something?
>>>
>>>the default strip size is 4M, all data are written to the first object
>>>in your test case.
>>>could you try something like below.
>>>
>>
>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>>>dd if=file_with_holes bs=8M >/dev/null
>>>
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 2ddf061..22a98e5 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -349,17 +349,17 @@ more:
>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>
>> -       if (ret > 0) {
>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>> +       if (ret >= 0) {
>> +               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>>
>> -               if (read < pos - off) {
>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
>> -                       ceph_zero_page_vector_range(page_align + read,
>> -                                                   pos - off - read, pages);
>> +               if (was_short) {
>> +                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
>> +                       ceph_zero_page_vector_range(page_align + ret,
>> +                                                   this_len - ret, pages);
>>                 }
>> -               pos += ret;
>> +               pos += this_len;
>>                 read = pos - off;
>> -               left -= ret;
>> +               left -= this_len;
>>                 page_pos += didpages;
>>                 pages_left -= didpages;
>>
>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0".
>
>maybe we should add a  i_size check. stop reading next strip object
>when 'pos > i_size'
>
>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
>>
>
> i think 'was_short' is equal to 'hit_hole'
>
>Regards
>Yan, Zheng
>
>>
>>
>>>Regards
>>>Yan, Zheng
>>>
>>>
>>>>
>>>>
>>>> Thanks!
>>>> Jianpeng Ma
>>>>
>>>> >Regards
>>>> >Yan, Zheng
>>>> >---
>>>> >diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>>> >index 271a346..6ca2921 100644
>>>> >--- a/fs/ceph/file.c
>>>> >+++ b/fs/ceph/file.c
>>>> >@@ -350,16 +350,17 @@ more:
>>>> >            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>>> >
>>>> >       if (ret > 0) {
>>>> >-              int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>>> >+              int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>>>> >
>>>> >-              if (read < pos - off) {
>>>> >-                      dout(" zero gap %llu to %llu\n", off + read, pos);
>>>> >-                      ceph_zero_page_vector_range(page_align + read,
>>>> >-                                                  pos - off - read, pages);
>>>> >+              if (was_short) {
>>>> >+                      dout(" zero gap %llu to %llu\n",
>>>> >+                           pos + ret, pos + this_len);
>>>> >+                      ceph_zero_page_vector_range(page_align + ret,
>>>> >+                                                  this_len - ret, page_pos);
>>>> >               }
>>>> >-              pos += ret;
>>>> >+              pos += this_len;
>>>> >               read = pos - off;
>>>> >-              left -= ret;
>>>> >+              left -= this_len;
>>>> >               page_pos += didpages;
>>>> >               pages_left -= didpages;
>>>> >
>> Thanks!
>> Jianpeng Ma
>>>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>>
>>>> [snip]
>>>> >I don't think the later was_short can handle the hole case. For the hole case,
>>>> >we should try reading next strip object instead of return. how about
>>>> >below patch.
>>>> >
>>>> Hi Yan,
>>>>         i uesed this demo to test hole case.
>>>> dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
>>>> dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
>>>>
>>>> dd if=file_with_holes of=/dev/null bs=16k count=1 iflag=direct
>>>> Using the dynamic_debug in striped_read,  the message are:
>>>> >[ 8743.663499] ceph:           file.c:350  : striped_read 0~16384 (read 0) got 16384
>>>> >[ 8743.663502] ceph:           file.c:390  : striped_read returns 16384
>>>> From the messages, we can see it can't hit the short-read.
>>>> For the ceph-file-hole, how does the ceph handle?
>>>> Or am i missing something?
>>>
>>>the default strip size is 4M, all data are written to the first object
>>>in your test case.
>>>could you try something like below.
>>>
>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>>>dd if=file_with_holes bs=8M >/dev/null
>>>
>>>Regards
>>>Yan, Zheng
>>>
>>>
>>>>
>>>>
>>>> Thanks!
>>>> Jianpeng Ma
>>>>
>>>> >Regards
>>>> >Yan, Zheng
>>>> >---
>>>> >diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>>> >index 271a346..6ca2921 100644
>>>> >--- a/fs/ceph/file.c
>>>> >+++ b/fs/ceph/file.c
>>>> >@@ -350,16 +350,17 @@ more:
>>>> >            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>>> >
>>>> >       if (ret > 0) {
>>>> >-              int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>>> >+              int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>>>> >
>>>> >-              if (read < pos - off) {
>>>> >-                      dout(" zero gap %llu to %llu\n", off + read, pos);
>>>> >-                      ceph_zero_page_vector_range(page_align + read,
>>>> >-                                                  pos - off - read, pages);
>>>> >+              if (was_short) {
>>>> >+                      dout(" zero gap %llu to %llu\n",
>>>> >+                           pos + ret, pos + this_len);
>>>> >+                      ceph_zero_page_vector_range(page_align + ret,
>>>> >+                                                  this_len - ret, page_pos);
>>>> >               }
>>>> >-              pos += ret;
>>>> >+              pos += this_len;
>>>> >               read = pos - off;
>>>> >-              left -= ret;
>>>> >+              left -= this_len;
>>>> >               page_pos += didpages;
>>>> >               pages_left -= didpages;
>>>> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-30 11:14                               ` Yan, Zheng
  2013-07-30 11:20                                 ` majianpeng
@ 2013-07-30 11:41                                 ` majianpeng
  2013-07-30 12:25                                   ` Yan, Zheng
  1 sibling, 1 reply; 33+ messages in thread
From: majianpeng @ 2013-07-30 11:41 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: sage, ceph-devel

>On Tue, Jul 30, 2013 at 7:01 PM, majianpeng <majianpeng@gmail.com> wrote:
>>>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>>
>>>> [snip]
>>>> >I don't think the later was_short can handle the hole case. For the hole case,
>>>> >we should try reading next strip object instead of return. how about
>>>> >below patch.
>>>> >
>>>> Hi Yan,
>>>>         i uesed this demo to test hole case.
>>>> dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
>>>> dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
>>>>
>>>> dd if=file_with_holes of=/dev/null bs=16k count=1 iflag=direct
>>>> Using the dynamic_debug in striped_read,  the message are:
>>>> >[ 8743.663499] ceph:           file.c:350  : striped_read 0~16384 (read 0) got 16384
>>>> >[ 8743.663502] ceph:           file.c:390  : striped_read returns 16384
>>>> From the messages, we can see it can't hit the short-read.
>>>> For the ceph-file-hole, how does the ceph handle?
>>>> Or am i missing something?
>>>
>>>the default strip size is 4M, all data are written to the first object
>>>in your test case.
>>>could you try something like below.
>>>
>>
>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>>>dd if=file_with_holes bs=8M >/dev/null
>>>
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 2ddf061..22a98e5 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -349,17 +349,17 @@ more:
>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>
>> -       if (ret > 0) {
>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>> +       if (ret >= 0) {
>> +               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>>
>> -               if (read < pos - off) {
>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
>> -                       ceph_zero_page_vector_range(page_align + read,
>> -                                                   pos - off - read, pages);
>> +               if (was_short) {
>> +                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
>> +                       ceph_zero_page_vector_range(page_align + ret,
>> +                                                   this_len - ret, pages);
>>                 }
>> -               pos += ret;
>> +               pos += this_len;
>>                 read = pos - off;
>> -               left -= ret;
>> +               left -= this_len;
>>                 page_pos += didpages;
>>                 pages_left -= didpages;
>>
>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0".
>
>maybe we should add a  i_size check. stop reading next strip object
>when 'pos > i_size'
>
I think we can't do this because i_size may smaller than real size in ceph.

>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
>>
>
> i think 'was_short' is equal to 'hit_hole'
>
>Regards
>Yan, Zheng
>
>>
>>
>>>Regards
>>>Yan, Zheng
>>>
>>>
>>>>
>>>>
>>>> Thanks!
>>>> Jianpeng Ma
>>>>
>>>> >Regards
>>>> >Yan, Zheng
>>>> >---
>>>> >diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>>> >index 271a346..6ca2921 100644
>>>> >--- a/fs/ceph/file.c
>>>> >+++ b/fs/ceph/file.c
>>>> >@@ -350,16 +350,17 @@ more:
>>>> >            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>>> >
>>>> >       if (ret > 0) {
>>>> >-              int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>>> >+              int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>>>> >
>>>> >-              if (read < pos - off) {
>>>> >-                      dout(" zero gap %llu to %llu\n", off + read, pos);
>>>> >-                      ceph_zero_page_vector_range(page_align + read,
>>>> >-                                                  pos - off - read, pages);
>>>> >+              if (was_short) {
>>>> >+                      dout(" zero gap %llu to %llu\n",
>>>> >+                           pos + ret, pos + this_len);
>>>> >+                      ceph_zero_page_vector_range(page_align + ret,
>>>> >+                                                  this_len - ret, page_pos);
>>>> >               }
>>>> >-              pos += ret;
>>>> >+              pos += this_len;
>>>> >               read = pos - off;
>>>> >-              left -= ret;
>>>> >+              left -= this_len;
>>>> >               page_pos += didpages;
>>>> >               pages_left -= didpages;
>>>> >
>> Thanks!
>> Jianpeng Ma
>>>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>>
>>>> [snip]
>>>> >I don't think the later was_short can handle the hole case. For the hole case,
>>>> >we should try reading next strip object instead of return. how about
>>>> >below patch.
>>>> >
>>>> Hi Yan,
>>>>         i uesed this demo to test hole case.
>>>> dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
>>>> dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
>>>>
>>>> dd if=file_with_holes of=/dev/null bs=16k count=1 iflag=direct
>>>> Using the dynamic_debug in striped_read,  the message are:
>>>> >[ 8743.663499] ceph:           file.c:350  : striped_read 0~16384 (read 0) got 16384
>>>> >[ 8743.663502] ceph:           file.c:390  : striped_read returns 16384
>>>> From the messages, we can see it can't hit the short-read.
>>>> For the ceph-file-hole, how does the ceph handle?
>>>> Or am i missing something?
>>>
>>>the default strip size is 4M, all data are written to the first object
>>>in your test case.
>>>could you try something like below.
>>>
>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>>>dd if=file_with_holes bs=8M >/dev/null
>>>
>>>Regards
>>>Yan, Zheng
>>>
>>>
>>>>
>>>>
>>>> Thanks!
>>>> Jianpeng Ma
>>>>
>>>> >Regards
>>>> >Yan, Zheng
>>>> >---
>>>> >diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>>> >index 271a346..6ca2921 100644
>>>> >--- a/fs/ceph/file.c
>>>> >+++ b/fs/ceph/file.c
>>>> >@@ -350,16 +350,17 @@ more:
>>>> >            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>>> >
>>>> >       if (ret > 0) {
>>>> >-              int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>>> >+              int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>>>> >
>>>> >-              if (read < pos - off) {
>>>> >-                      dout(" zero gap %llu to %llu\n", off + read, pos);
>>>> >-                      ceph_zero_page_vector_range(page_align + read,
>>>> >-                                                  pos - off - read, pages);
>>>> >+              if (was_short) {
>>>> >+                      dout(" zero gap %llu to %llu\n",
>>>> >+                           pos + ret, pos + this_len);
>>>> >+                      ceph_zero_page_vector_range(page_align + ret,
>>>> >+                                                  this_len - ret, page_pos);
>>>> >               }
>>>> >-              pos += ret;
>>>> >+              pos += this_len;
>>>> >               read = pos - off;
>>>> >-              left -= ret;
>>>> >+              left -= this_len;
>>>> >               page_pos += didpages;
>>>> >               pages_left -= didpages;
>>>> >
Thanks!
Jianpeng Ma
>On Tue, Jul 30, 2013 at 7:01 PM, majianpeng <majianpeng@gmail.com> wrote:
>>>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>>
>>>> [snip]
>>>> >I don't think the later was_short can handle the hole case. For the hole case,
>>>> >we should try reading next strip object instead of return. how about
>>>> >below patch.
>>>> >
>>>> Hi Yan,
>>>>         i uesed this demo to test hole case.
>>>> dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
>>>> dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
>>>>
>>>> dd if=file_with_holes of=/dev/null bs=16k count=1 iflag=direct
>>>> Using the dynamic_debug in striped_read,  the message are:
>>>> >[ 8743.663499] ceph:           file.c:350  : striped_read 0~16384 (read 0) got 16384
>>>> >[ 8743.663502] ceph:           file.c:390  : striped_read returns 16384
>>>> From the messages, we can see it can't hit the short-read.
>>>> For the ceph-file-hole, how does the ceph handle?
>>>> Or am i missing something?
>>>
>>>the default strip size is 4M, all data are written to the first object
>>>in your test case.
>>>could you try something like below.
>>>
>>
>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>>>dd if=file_with_holes bs=8M >/dev/null
>>>
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 2ddf061..22a98e5 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -349,17 +349,17 @@ more:
>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>
>> -       if (ret > 0) {
>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>> +       if (ret >= 0) {
>> +               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>>
>> -               if (read < pos - off) {
>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
>> -                       ceph_zero_page_vector_range(page_align + read,
>> -                                                   pos - off - read, pages);
>> +               if (was_short) {
>> +                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
>> +                       ceph_zero_page_vector_range(page_align + ret,
>> +                                                   this_len - ret, pages);
>>                 }
>> -               pos += ret;
>> +               pos += this_len;
>>                 read = pos - off;
>> -               left -= ret;
>> +               left -= this_len;
>>                 page_pos += didpages;
>>                 pages_left -= didpages;
>>
>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0".
>
>maybe we should add a  i_size check. stop reading next strip object
>when 'pos > i_size'
>
>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
>>
>
> i think 'was_short' is equal to 'hit_hole'
>
>Regards
>Yan, Zheng
>
>>
>>
>>>Regards
>>>Yan, Zheng
>>>
>>>
>>>>
>>>>
>>>> Thanks!
>>>> Jianpeng Ma
>>>>
>>>> >Regards
>>>> >Yan, Zheng
>>>> >---
>>>> >diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>>> >index 271a346..6ca2921 100644
>>>> >--- a/fs/ceph/file.c
>>>> >+++ b/fs/ceph/file.c
>>>> >@@ -350,16 +350,17 @@ more:
>>>> >            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>>> >
>>>> >       if (ret > 0) {
>>>> >-              int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>>> >+              int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>>>> >
>>>> >-              if (read < pos - off) {
>>>> >-                      dout(" zero gap %llu to %llu\n", off + read, pos);
>>>> >-                      ceph_zero_page_vector_range(page_align + read,
>>>> >-                                                  pos - off - read, pages);
>>>> >+              if (was_short) {
>>>> >+                      dout(" zero gap %llu to %llu\n",
>>>> >+                           pos + ret, pos + this_len);
>>>> >+                      ceph_zero_page_vector_range(page_align + ret,
>>>> >+                                                  this_len - ret, page_pos);
>>>> >               }
>>>> >-              pos += ret;
>>>> >+              pos += this_len;
>>>> >               read = pos - off;
>>>> >-              left -= ret;
>>>> >+              left -= this_len;
>>>> >               page_pos += didpages;
>>>> >               pages_left -= didpages;
>>>> >
>> Thanks!
>> Jianpeng Ma
>>>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>>
>>>> [snip]
>>>> >I don't think the later was_short can handle the hole case. For the hole case,
>>>> >we should try reading next strip object instead of return. how about
>>>> >below patch.
>>>> >
>>>> Hi Yan,
>>>>         i uesed this demo to test hole case.
>>>> dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
>>>> dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
>>>>
>>>> dd if=file_with_holes of=/dev/null bs=16k count=1 iflag=direct
>>>> Using the dynamic_debug in striped_read,  the message are:
>>>> >[ 8743.663499] ceph:           file.c:350  : striped_read 0~16384 (read 0) got 16384
>>>> >[ 8743.663502] ceph:           file.c:390  : striped_read returns 16384
>>>> From the messages, we can see it can't hit the short-read.
>>>> For the ceph-file-hole, how does the ceph handle?
>>>> Or am i missing something?
>>>
>>>the default strip size is 4M, all data are written to the first object
>>>in your test case.
>>>could you try something like below.
>>>
>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>>>dd if=file_with_holes bs=8M >/dev/null
>>>
>>>Regards
>>>Yan, Zheng
>>>
>>>
>>>>
>>>>
>>>> Thanks!
>>>> Jianpeng Ma
>>>>
>>>> >Regards
>>>> >Yan, Zheng
>>>> >---
>>>> >diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>>> >index 271a346..6ca2921 100644
>>>> >--- a/fs/ceph/file.c
>>>> >+++ b/fs/ceph/file.c
>>>> >@@ -350,16 +350,17 @@ more:
>>>> >            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>>> >
>>>> >       if (ret > 0) {
>>>> >-              int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>>> >+              int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>>>> >
>>>> >-              if (read < pos - off) {
>>>> >-                      dout(" zero gap %llu to %llu\n", off + read, pos);
>>>> >-                      ceph_zero_page_vector_range(page_align + read,
>>>> >-                                                  pos - off - read, pages);
>>>> >+              if (was_short) {
>>>> >+                      dout(" zero gap %llu to %llu\n",
>>>> >+                           pos + ret, pos + this_len);
>>>> >+                      ceph_zero_page_vector_range(page_align + ret,
>>>> >+                                                  this_len - ret, page_pos);
>>>> >               }
>>>> >-              pos += ret;
>>>> >+              pos += this_len;
>>>> >               read = pos - off;
>>>> >-              left -= ret;
>>>> >+              left -= this_len;
>>>> >               page_pos += didpages;
>>>> >               pages_left -= didpages;
>>>> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-30 11:41                                 ` majianpeng
@ 2013-07-30 12:25                                   ` Yan, Zheng
  2013-07-31  0:27                                     ` majianpeng
  0 siblings, 1 reply; 33+ messages in thread
From: Yan, Zheng @ 2013-07-30 12:25 UTC (permalink / raw)
  To: majianpeng; +Cc: sage, ceph-devel

On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@gmail.com> wrote:
>>>
>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>>>>dd if=file_with_holes bs=8M >/dev/null
>>>>
>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>> index 2ddf061..22a98e5 100644
>>> --- a/fs/ceph/file.c
>>> +++ b/fs/ceph/file.c
>>> @@ -349,17 +349,17 @@ more:
>>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
>>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>>
>>> -       if (ret > 0) {
>>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>> +       if (ret >= 0) {
>>> +               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>>>
>>> -               if (read < pos - off) {
>>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
>>> -                       ceph_zero_page_vector_range(page_align + read,
>>> -                                                   pos - off - read, pages);
>>> +               if (was_short) {
>>> +                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
>>> +                       ceph_zero_page_vector_range(page_align + ret,
>>> +                                                   this_len - ret, pages);
>>>                 }
>>> -               pos += ret;
>>> +               pos += this_len;
>>>                 read = pos - off;
>>> -               left -= ret;
>>> +               left -= this_len;
>>>                 page_pos += didpages;
>>>                 pages_left -= didpages;
>>>
>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0".
>>
>>maybe we should add a  i_size check. stop reading next strip object
>>when 'pos > i_size'
>>
> I think we can't do this because i_size may smaller than real size in ceph.
>
ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it
handles the case.

We must do i_size check in striped_read(), otherwise user program always gets as
much data as it requests. For example

dd if=/dev/urandom bs=1M count=1 of=file_with_holes
dd if=file_with_holes bs=64M iflag=direct of=/dev/null

Regards
Yan, Zheng


>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
>>>
>>
>> i think 'was_short' is equal to 'hit_hole'
>>
[snip]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-30 12:25                                   ` Yan, Zheng
@ 2013-07-31  0:27                                     ` majianpeng
  2013-07-31  0:40                                       ` Sage Weil
  0 siblings, 1 reply; 33+ messages in thread
From: majianpeng @ 2013-07-31  0:27 UTC (permalink / raw)
  To: Yan, Zheng, sage; +Cc: ceph-devel

>On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@gmail.com> wrote:
>>>>
>>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>>>>>dd if=file_with_holes bs=8M >/dev/null
>>>>>
>>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>>> index 2ddf061..22a98e5 100644
>>>> --- a/fs/ceph/file.c
>>>> +++ b/fs/ceph/file.c
>>>> @@ -349,17 +349,17 @@ more:
>>>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
>>>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>>>
>>>> -       if (ret > 0) {
>>>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>>> +       if (ret >= 0) {
>>>> +               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>>>>
>>>> -               if (read < pos - off) {
>>>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
>>>> -                       ceph_zero_page_vector_range(page_align + read,
>>>> -                                                   pos - off - read, pages);
>>>> +               if (was_short) {
>>>> +                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
>>>> +                       ceph_zero_page_vector_range(page_align + ret,
>>>> +                                                   this_len - ret, pages);
>>>>                 }
>>>> -               pos += ret;
>>>> +               pos += this_len;
>>>>                 read = pos - off;
>>>> -               left -= ret;
>>>> +               left -= this_len;
>>>>                 page_pos += didpages;
>>>>                 pages_left -= didpages;
>>>>
>>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0".
>>>
>>>maybe we should add a  i_size check. stop reading next strip object
>>>when 'pos > i_size'
>>>
>> I think we can't do this because i_size may smaller than real size in ceph.
>>
>ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it
>handles the case.
>
>We must do i_size check in striped_read(), otherwise user program always gets as
>much data as it requests. For example
>
>dd if=/dev/urandom bs=1M count=1 of=file_with_holes
>dd if=file_with_holes bs=64M iflag=direct of=/dev/null
>
Before doing that, we must know the meaning of return value.
A: ret = ENOENT
B: ret = 0

Only we knowed this, we can handle exactly.
Sage, can you explain those meaning in detail?

Thanks!
Jianpeng Ma

>Regards
>Yan, Zheng
>
>
>>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
>>>>
>>>
>>> i think 'was_short' is equal to 'hit_hole'
>>>
>[snip]
Thanks!
Jianpeng Ma
>On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@gmail.com> wrote:
>>>>
>>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>>>>>dd if=file_with_holes bs=8M >/dev/null
>>>>>
>>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>>> index 2ddf061..22a98e5 100644
>>>> --- a/fs/ceph/file.c
>>>> +++ b/fs/ceph/file.c
>>>> @@ -349,17 +349,17 @@ more:
>>>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
>>>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>>>
>>>> -       if (ret > 0) {
>>>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>>> +       if (ret >= 0) {
>>>> +               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>>>>
>>>> -               if (read < pos - off) {
>>>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
>>>> -                       ceph_zero_page_vector_range(page_align + read,
>>>> -                                                   pos - off - read, pages);
>>>> +               if (was_short) {
>>>> +                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
>>>> +                       ceph_zero_page_vector_range(page_align + ret,
>>>> +                                                   this_len - ret, pages);
>>>>                 }
>>>> -               pos += ret;
>>>> +               pos += this_len;
>>>>                 read = pos - off;
>>>> -               left -= ret;
>>>> +               left -= this_len;
>>>>                 page_pos += didpages;
>>>>                 pages_left -= didpages;
>>>>
>>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0".
>>>
>>>maybe we should add a  i_size check. stop reading next strip object
>>>when 'pos > i_size'
>>>
>> I think we can't do this because i_size may smaller than real size in ceph.
>>
>ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it
>handles the case.
>
>We must do i_size check in striped_read(), otherwise user program always gets as
>much data as it requests. For example
>
>dd if=/dev/urandom bs=1M count=1 of=file_with_holes
>dd if=file_with_holes bs=64M iflag=direct of=/dev/null
>
>Regards
>Yan, Zheng
>
>
>>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
>>>>
>>>
>>> i think 'was_short' is equal to 'hit_hole'
>>>
>[snip]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-31  0:27                                     ` majianpeng
@ 2013-07-31  0:40                                       ` Sage Weil
  2013-07-31  0:44                                         ` majianpeng
  0 siblings, 1 reply; 33+ messages in thread
From: Sage Weil @ 2013-07-31  0:40 UTC (permalink / raw)
  To: majianpeng; +Cc: Yan, Zheng, ceph-devel

On Wed, 31 Jul 2013, majianpeng wrote:
> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@gmail.com> wrote:
> >>>>
> >>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
> >>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
> >>>>>dd if=file_with_holes bs=8M >/dev/null
> >>>>>
> >>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> >>>> index 2ddf061..22a98e5 100644
> >>>> --- a/fs/ceph/file.c
> >>>> +++ b/fs/ceph/file.c
> >>>> @@ -349,17 +349,17 @@ more:
> >>>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
> >>>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
> >>>>
> >>>> -       if (ret > 0) {
> >>>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
> >>>> +       if (ret >= 0) {
> >>>> +               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
> >>>>
> >>>> -               if (read < pos - off) {
> >>>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
> >>>> -                       ceph_zero_page_vector_range(page_align + read,
> >>>> -                                                   pos - off - read, pages);
> >>>> +               if (was_short) {
> >>>> +                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
> >>>> +                       ceph_zero_page_vector_range(page_align + ret,
> >>>> +                                                   this_len - ret, pages);
> >>>>                 }
> >>>> -               pos += ret;
> >>>> +               pos += this_len;
> >>>>                 read = pos - off;
> >>>> -               left -= ret;
> >>>> +               left -= this_len;
> >>>>                 page_pos += didpages;
> >>>>                 pages_left -= didpages;
> >>>>
> >>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0".
> >>>
> >>>maybe we should add a  i_size check. stop reading next strip object
> >>>when 'pos > i_size'
> >>>
> >> I think we can't do this because i_size may smaller than real size in ceph.
> >>
> >ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it
> >handles the case.
> >
> >We must do i_size check in striped_read(), otherwise user program always gets as
> >much data as it requests. For example
> >
> >dd if=/dev/urandom bs=1M count=1 of=file_with_holes
> >dd if=file_with_holes bs=64M iflag=direct of=/dev/null
> >
> Before doing that, we must know the meaning of return value.

For ceph_osdc_readpages(),

> A: ret = ENOENT

The object does not exist.

> B: ret = 0

The object exists but we read 0 bytes, which means we are past EOF or the 
object has size 0 bytes.  Either way, we are either in a hole or past EOF.

sage

> 
> Only we knowed this, we can handle exactly.
> Sage, can you explain those meaning in detail?
> 
> Thanks!
> Jianpeng Ma
> 
> >Regards
> >Yan, Zheng
> >
> >
> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
> >>>>
> >>>
> >>> i think 'was_short' is equal to 'hit_hole'
> >>>
> >[snip]
> Thanks!
> Jianpeng Ma
> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@gmail.com> wrote:
> >>>>
> >>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
> >>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
> >>>>>dd if=file_with_holes bs=8M >/dev/null
> >>>>>
> >>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> >>>> index 2ddf061..22a98e5 100644
> >>>> --- a/fs/ceph/file.c
> >>>> +++ b/fs/ceph/file.c
> >>>> @@ -349,17 +349,17 @@ more:
> >>>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
> >>>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
> >>>>
> >>>> -       if (ret > 0) {
> >>>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
> >>>> +       if (ret >= 0) {
> >>>> +               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
> >>>>
> >>>> -               if (read < pos - off) {
> >>>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
> >>>> -                       ceph_zero_page_vector_range(page_align + read,
> >>>> -                                                   pos - off - read, pages);
> >>>> +               if (was_short) {
> >>>> +                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
> >>>> +                       ceph_zero_page_vector_range(page_align + ret,
> >>>> +                                                   this_len - ret, pages);
> >>>>                 }
> >>>> -               pos += ret;
> >>>> +               pos += this_len;
> >>>>                 read = pos - off;
> >>>> -               left -= ret;
> >>>> +               left -= this_len;
> >>>>                 page_pos += didpages;
> >>>>                 pages_left -= didpages;
> >>>>
> >>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0".
> >>>
> >>>maybe we should add a  i_size check. stop reading next strip object
> >>>when 'pos > i_size'
> >>>
> >> I think we can't do this because i_size may smaller than real size in ceph.
> >>
> >ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it
> >handles the case.
> >
> >We must do i_size check in striped_read(), otherwise user program always gets as
> >much data as it requests. For example
> >
> >dd if=/dev/urandom bs=1M count=1 of=file_with_holes
> >dd if=file_with_holes bs=64M iflag=direct of=/dev/null
> >
> >Regards
> >Yan, Zheng
> >
> >
> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
> >>>>
> >>>
> >>> i think 'was_short' is equal to 'hit_hole'
> >>>
> >[snip]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-31  0:40                                       ` Sage Weil
@ 2013-07-31  0:44                                         ` majianpeng
  2013-07-31  0:47                                           ` Sage Weil
  0 siblings, 1 reply; 33+ messages in thread
From: majianpeng @ 2013-07-31  0:44 UTC (permalink / raw)
  To: sage; +Cc: Yan, Zheng, ceph-devel

>On Wed, 31 Jul 2013, majianpeng wrote:
>> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@gmail.com> wrote:
[snip]
>
>For ceph_osdc_readpages(),
>
>> A: ret = ENOENT
>
From the original code, for this case we should zero the area.
Why?

Thanks!
Jianpeng Ma
>The object does not exist.
>
>> B: ret = 0
>
>The object exists but we read 0 bytes, which means we are past EOF or the 
>object has size 0 bytes.  Either way, we are either in a hole or past EOF.
>
>sage
>
>> 
>> Only we knowed this, we can handle exactly.
>> Sage, can you explain those meaning in detail?
>> 
>> Thanks!
>> Jianpeng Ma
>> 
>> >Regards
>> >Yan, Zheng
>> >
>> >
>> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
>> >>>>
>> >>>
>> >>> i think 'was_short' is equal to 'hit_hole'
>> >>>
>> >[snip]
>> Thanks!
>> Jianpeng Ma
>> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@gmail.com> wrote:
>> >>>>
>> >>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>> >>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>> >>>>>dd if=file_with_holes bs=8M >/dev/null
>> >>>>>
>> >>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> >>>> index 2ddf061..22a98e5 100644
>> >>>> --- a/fs/ceph/file.c
>> >>>> +++ b/fs/ceph/file.c
>> >>>> @@ -349,17 +349,17 @@ more:
>> >>>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
>> >>>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>> >>>>
>> >>>> -       if (ret > 0) {
>> >>>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>> >>>> +       if (ret >= 0) {
>> >>>> +               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>> >>>>
>> >>>> -               if (read < pos - off) {
>> >>>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
>> >>>> -                       ceph_zero_page_vector_range(page_align + read,
>> >>>> -                                                   pos - off - read, pages);
>> >>>> +               if (was_short) {
>> >>>> +                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
>> >>>> +                       ceph_zero_page_vector_range(page_align + ret,
>> >>>> +                                                   this_len - ret, pages);
>> >>>>                 }
>> >>>> -               pos += ret;
>> >>>> +               pos += this_len;
>> >>>>                 read = pos - off;
>> >>>> -               left -= ret;
>> >>>> +               left -= this_len;
>> >>>>                 page_pos += didpages;
>> >>>>                 pages_left -= didpages;
>> >>>>
>> >>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0".
>> >>>
>> >>>maybe we should add a  i_size check. stop reading next strip object
>> >>>when 'pos > i_size'
>> >>>
>> >> I think we can't do this because i_size may smaller than real size in ceph.
>> >>
>> >ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it
>> >handles the case.
>> >
>> >We must do i_size check in striped_read(), otherwise user program always gets as
>> >much data as it requests. For example
>> >
>> >dd if=/dev/urandom bs=1M count=1 of=file_with_holes
>> >dd if=file_with_holes bs=64M iflag=direct of=/dev/null
>> >
>> >Regards
>> >Yan, Zheng
>> >
>> >
>> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
>> >>>>
>> >>>
>> >>> i think 'was_short' is equal to 'hit_hole'
>> >>>
>> >[snip]
Thanks!
Jianpeng Ma
>On Wed, 31 Jul 2013, majianpeng wrote:
>> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@gmail.com> wrote:
>> >>>>
>> >>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>> >>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>> >>>>>dd if=file_with_holes bs=8M >/dev/null
>> >>>>>
>> >>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> >>>> index 2ddf061..22a98e5 100644
>> >>>> --- a/fs/ceph/file.c
>> >>>> +++ b/fs/ceph/file.c
>> >>>> @@ -349,17 +349,17 @@ more:
>> >>>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
>> >>>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>> >>>>
>> >>>> -       if (ret > 0) {
>> >>>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>> >>>> +       if (ret >= 0) {
>> >>>> +               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>> >>>>
>> >>>> -               if (read < pos - off) {
>> >>>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
>> >>>> -                       ceph_zero_page_vector_range(page_align + read,
>> >>>> -                                                   pos - off - read, pages);
>> >>>> +               if (was_short) {
>> >>>> +                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
>> >>>> +                       ceph_zero_page_vector_range(page_align + ret,
>> >>>> +                                                   this_len - ret, pages);
>> >>>>                 }
>> >>>> -               pos += ret;
>> >>>> +               pos += this_len;
>> >>>>                 read = pos - off;
>> >>>> -               left -= ret;
>> >>>> +               left -= this_len;
>> >>>>                 page_pos += didpages;
>> >>>>                 pages_left -= didpages;
>> >>>>
>> >>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0".
>> >>>
>> >>>maybe we should add a  i_size check. stop reading next strip object
>> >>>when 'pos > i_size'
>> >>>
>> >> I think we can't do this because i_size may smaller than real size in ceph.
>> >>
>> >ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it
>> >handles the case.
>> >
>> >We must do i_size check in striped_read(), otherwise user program always gets as
>> >much data as it requests. For example
>> >
>> >dd if=/dev/urandom bs=1M count=1 of=file_with_holes
>> >dd if=file_with_holes bs=64M iflag=direct of=/dev/null
>> >
>> Before doing that, we must know the meaning of return value.
>
>For ceph_osdc_readpages(),
>
>> A: ret = ENOENT
>
>The object does not exist.
>
>> B: ret = 0
>
>The object exists but we read 0 bytes, which means we are past EOF or the 
>object has size 0 bytes.  Either way, we are either in a hole or past EOF.
>
>sage
>
>> 
>> Only we knowed this, we can handle exactly.
>> Sage, can you explain those meaning in detail?
>> 
>> Thanks!
>> Jianpeng Ma
>> 
>> >Regards
>> >Yan, Zheng
>> >
>> >
>> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
>> >>>>
>> >>>
>> >>> i think 'was_short' is equal to 'hit_hole'
>> >>>
>> >[snip]
>> Thanks!
>> Jianpeng Ma
>> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@gmail.com> wrote:
>> >>>>
>> >>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>> >>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>> >>>>>dd if=file_with_holes bs=8M >/dev/null
>> >>>>>
>> >>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> >>>> index 2ddf061..22a98e5 100644
>> >>>> --- a/fs/ceph/file.c
>> >>>> +++ b/fs/ceph/file.c
>> >>>> @@ -349,17 +349,17 @@ more:
>> >>>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
>> >>>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>> >>>>
>> >>>> -       if (ret > 0) {
>> >>>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>> >>>> +       if (ret >= 0) {
>> >>>> +               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
>> >>>>
>> >>>> -               if (read < pos - off) {
>> >>>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
>> >>>> -                       ceph_zero_page_vector_range(page_align + read,
>> >>>> -                                                   pos - off - read, pages);
>> >>>> +               if (was_short) {
>> >>>> +                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
>> >>>> +                       ceph_zero_page_vector_range(page_align + ret,
>> >>>> +                                                   this_len - ret, pages);
>> >>>>                 }
>> >>>> -               pos += ret;
>> >>>> +               pos += this_len;
>> >>>>                 read = pos - off;
>> >>>> -               left -= ret;
>> >>>> +               left -= this_len;
>> >>>>                 page_pos += didpages;
>> >>>>                 pages_left -= didpages;
>> >>>>
>> >>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0".
>> >>>
>> >>>maybe we should add a  i_size check. stop reading next strip object
>> >>>when 'pos > i_size'
>> >>>
>> >> I think we can't do this because i_size may smaller than real size in ceph.
>> >>
>> >ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it
>> >handles the case.
>> >
>> >We must do i_size check in striped_read(), otherwise user program always gets as
>> >much data as it requests. For example
>> >
>> >dd if=/dev/urandom bs=1M count=1 of=file_with_holes
>> >dd if=file_with_holes bs=64M iflag=direct of=/dev/null
>> >
>> >Regards
>> >Yan, Zheng
>> >
>> >
>> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
>> >>>>
>> >>>
>> >>> i think 'was_short' is equal to 'hit_hole'
>> >>>
>> >[snip]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-31  0:44                                         ` majianpeng
@ 2013-07-31  0:47                                           ` Sage Weil
  2013-07-31  1:36                                             ` majianpeng
  0 siblings, 1 reply; 33+ messages in thread
From: Sage Weil @ 2013-07-31  0:47 UTC (permalink / raw)
  To: majianpeng; +Cc: Yan, Zheng, ceph-devel

On Wed, 31 Jul 2013, majianpeng wrote:
> >On Wed, 31 Jul 2013, majianpeng wrote:
> >> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@gmail.com> wrote:
> [snip]
> >
> >For ceph_osdc_readpages(),
> >
> >> A: ret = ENOENT
> >
> From the original code, for this case we should zero the area.
> Why?

If an object is missing it means that range of the file was never written 
to and it is a hole (if we are before EOF).

For example, open, seek(128MB), write 1 byte, close will write 1 byte to 
one object and everything before that, where the objects do not exist, is 
defined to be zeros..

sage



> 
> Thanks!
> Jianpeng Ma
> >The object does not exist.
> >
> >> B: ret = 0
> >
> >The object exists but we read 0 bytes, which means we are past EOF or the 
> >object has size 0 bytes.  Either way, we are either in a hole or past EOF.
> >
> >sage
> >
> >> 
> >> Only we knowed this, we can handle exactly.
> >> Sage, can you explain those meaning in detail?
> >> 
> >> Thanks!
> >> Jianpeng Ma
> >> 
> >> >Regards
> >> >Yan, Zheng
> >> >
> >> >
> >> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
> >> >>>>
> >> >>>
> >> >>> i think 'was_short' is equal to 'hit_hole'
> >> >>>
> >> >[snip]
> >> Thanks!
> >> Jianpeng Ma
> >> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@gmail.com> wrote:
> >> >>>>
> >> >>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
> >> >>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
> >> >>>>>dd if=file_with_holes bs=8M >/dev/null
> >> >>>>>
> >> >>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> >> >>>> index 2ddf061..22a98e5 100644
> >> >>>> --- a/fs/ceph/file.c
> >> >>>> +++ b/fs/ceph/file.c
> >> >>>> @@ -349,17 +349,17 @@ more:
> >> >>>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
> >> >>>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
> >> >>>>
> >> >>>> -       if (ret > 0) {
> >> >>>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
> >> >>>> +       if (ret >= 0) {
> >> >>>> +               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
> >> >>>>
> >> >>>> -               if (read < pos - off) {
> >> >>>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
> >> >>>> -                       ceph_zero_page_vector_range(page_align + read,
> >> >>>> -                                                   pos - off - read, pages);
> >> >>>> +               if (was_short) {
> >> >>>> +                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
> >> >>>> +                       ceph_zero_page_vector_range(page_align + ret,
> >> >>>> +                                                   this_len - ret, pages);
> >> >>>>                 }
> >> >>>> -               pos += ret;
> >> >>>> +               pos += this_len;
> >> >>>>                 read = pos - off;
> >> >>>> -               left -= ret;
> >> >>>> +               left -= this_len;
> >> >>>>                 page_pos += didpages;
> >> >>>>                 pages_left -= didpages;
> >> >>>>
> >> >>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0".
> >> >>>
> >> >>>maybe we should add a  i_size check. stop reading next strip object
> >> >>>when 'pos > i_size'
> >> >>>
> >> >> I think we can't do this because i_size may smaller than real size in ceph.
> >> >>
> >> >ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it
> >> >handles the case.
> >> >
> >> >We must do i_size check in striped_read(), otherwise user program always gets as
> >> >much data as it requests. For example
> >> >
> >> >dd if=/dev/urandom bs=1M count=1 of=file_with_holes
> >> >dd if=file_with_holes bs=64M iflag=direct of=/dev/null
> >> >
> >> >Regards
> >> >Yan, Zheng
> >> >
> >> >
> >> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
> >> >>>>
> >> >>>
> >> >>> i think 'was_short' is equal to 'hit_hole'
> >> >>>
> >> >[snip]
> Thanks!
> Jianpeng Ma
> >On Wed, 31 Jul 2013, majianpeng wrote:
> >> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@gmail.com> wrote:
> >> >>>>
> >> >>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
> >> >>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
> >> >>>>>dd if=file_with_holes bs=8M >/dev/null
> >> >>>>>
> >> >>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> >> >>>> index 2ddf061..22a98e5 100644
> >> >>>> --- a/fs/ceph/file.c
> >> >>>> +++ b/fs/ceph/file.c
> >> >>>> @@ -349,17 +349,17 @@ more:
> >> >>>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
> >> >>>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
> >> >>>>
> >> >>>> -       if (ret > 0) {
> >> >>>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
> >> >>>> +       if (ret >= 0) {
> >> >>>> +               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
> >> >>>>
> >> >>>> -               if (read < pos - off) {
> >> >>>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
> >> >>>> -                       ceph_zero_page_vector_range(page_align + read,
> >> >>>> -                                                   pos - off - read, pages);
> >> >>>> +               if (was_short) {
> >> >>>> +                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
> >> >>>> +                       ceph_zero_page_vector_range(page_align + ret,
> >> >>>> +                                                   this_len - ret, pages);
> >> >>>>                 }
> >> >>>> -               pos += ret;
> >> >>>> +               pos += this_len;
> >> >>>>                 read = pos - off;
> >> >>>> -               left -= ret;
> >> >>>> +               left -= this_len;
> >> >>>>                 page_pos += didpages;
> >> >>>>                 pages_left -= didpages;
> >> >>>>
> >> >>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0".
> >> >>>
> >> >>>maybe we should add a  i_size check. stop reading next strip object
> >> >>>when 'pos > i_size'
> >> >>>
> >> >> I think we can't do this because i_size may smaller than real size in ceph.
> >> >>
> >> >ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it
> >> >handles the case.
> >> >
> >> >We must do i_size check in striped_read(), otherwise user program always gets as
> >> >much data as it requests. For example
> >> >
> >> >dd if=/dev/urandom bs=1M count=1 of=file_with_holes
> >> >dd if=file_with_holes bs=64M iflag=direct of=/dev/null
> >> >
> >> Before doing that, we must know the meaning of return value.
> >
> >For ceph_osdc_readpages(),
> >
> >> A: ret = ENOENT
> >
> >The object does not exist.
> >
> >> B: ret = 0
> >
> >The object exists but we read 0 bytes, which means we are past EOF or the 
> >object has size 0 bytes.  Either way, we are either in a hole or past EOF.
> >
> >sage
> >
> >> 
> >> Only we knowed this, we can handle exactly.
> >> Sage, can you explain those meaning in detail?
> >> 
> >> Thanks!
> >> Jianpeng Ma
> >> 
> >> >Regards
> >> >Yan, Zheng
> >> >
> >> >
> >> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
> >> >>>>
> >> >>>
> >> >>> i think 'was_short' is equal to 'hit_hole'
> >> >>>
> >> >[snip]
> >> Thanks!
> >> Jianpeng Ma
> >> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@gmail.com> wrote:
> >> >>>>
> >> >>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
> >> >>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
> >> >>>>>dd if=file_with_holes bs=8M >/dev/null
> >> >>>>>
> >> >>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> >> >>>> index 2ddf061..22a98e5 100644
> >> >>>> --- a/fs/ceph/file.c
> >> >>>> +++ b/fs/ceph/file.c
> >> >>>> @@ -349,17 +349,17 @@ more:
> >> >>>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
> >> >>>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
> >> >>>>
> >> >>>> -       if (ret > 0) {
> >> >>>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
> >> >>>> +       if (ret >= 0) {
> >> >>>> +               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
> >> >>>>
> >> >>>> -               if (read < pos - off) {
> >> >>>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
> >> >>>> -                       ceph_zero_page_vector_range(page_align + read,
> >> >>>> -                                                   pos - off - read, pages);
> >> >>>> +               if (was_short) {
> >> >>>> +                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
> >> >>>> +                       ceph_zero_page_vector_range(page_align + ret,
> >> >>>> +                                                   this_len - ret, pages);
> >> >>>>                 }
> >> >>>> -               pos += ret;
> >> >>>> +               pos += this_len;
> >> >>>>                 read = pos - off;
> >> >>>> -               left -= ret;
> >> >>>> +               left -= this_len;
> >> >>>>                 page_pos += didpages;
> >> >>>>                 pages_left -= didpages;
> >> >>>>
> >> >>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0".
> >> >>>
> >> >>>maybe we should add a  i_size check. stop reading next strip object
> >> >>>when 'pos > i_size'
> >> >>>
> >> >> I think we can't do this because i_size may smaller than real size in ceph.
> >> >>
> >> >ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it
> >> >handles the case.
> >> >
> >> >We must do i_size check in striped_read(), otherwise user program always gets as
> >> >much data as it requests. For example
> >> >
> >> >dd if=/dev/urandom bs=1M count=1 of=file_with_holes
> >> >dd if=file_with_holes bs=64M iflag=direct of=/dev/null
> >> >
> >> >Regards
> >> >Yan, Zheng
> >> >
> >> >
> >> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
> >> >>>>
> >> >>>
> >> >>> i think 'was_short' is equal to 'hit_hole'
> >> >>>
> >> >[snip]N????y????b?????v?????{.n??????z??ay????????j\a???f????????????????:+v????????\a??zZ+??????"?!?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-31  0:47                                           ` Sage Weil
@ 2013-07-31  1:36                                             ` majianpeng
       [not found]                                               ` <CAAM7YAnGaXcQm1LcaCUGL71FGRV5zfNx1iRObFkvXsyVpu91Ag@mail.gmail.com>
  0 siblings, 1 reply; 33+ messages in thread
From: majianpeng @ 2013-07-31  1:36 UTC (permalink / raw)
  To: sage; +Cc: Yan, Zheng, ceph-devel

[snip]
I think this patch can do work:
Those case which i tested
A: filesize=0,  buffer=1M
B:  data[2M] | hole| data[2M], bs= 6M/7M
C: data[4m] | hole | hole |data[2M]  bs=16M/18M

Are there some case ignore?
Thanks!
Jianpeng Ma


diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 2ddf061..96ce893 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -319,7 +319,7 @@ static int striped_read(struct inode *inode,
        int read;
        struct page **page_pos;
        int ret;
-       bool hit_stripe, was_short;
+       bool hit_stripe, was_short, hit_hole = false;
 
        /*
         * we may need to do multiple reads.  not atomic, unfortunately.
@@ -342,21 +342,30 @@ more:
                                  ci->i_truncate_seq,
                                  ci->i_truncate_size,
                                  page_pos, pages_left, page_align);
-       if (ret == -ENOENT)
+
+       if ((ret == 0  || ret == -ENOENT) && pos < inode->i_size)
+               hit_hole = true;
+       else if (ret == -ENOENT)
                ret = 0;
+       
        hit_stripe = this_len < left;
-       was_short = ret >= 0 && ret < this_len;
-       dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
-            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
-
-       if (ret > 0) {
-               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
-
-               if (read < pos - off) {
-                       dout(" zero gap %llu to %llu\n", off + read, pos);
-                       ceph_zero_page_vector_range(page_align + read,
-                                                   pos - off - read, pages);
+       was_short = ret > 0 && ret < this_len;
+       dout("striped_read %llu~%u (read %u) got %d%s%s%s\n", pos, left, read,
+            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "",
+            hit_hole ? " HITHOLE" : "");
+
+       if (ret > 0 || hit_hole) {
+               int didpages;
+               
+               if (hit_hole) {
+                       ret = this_len; 
+                       dout(" zero hole %llu to %llu\n", pos , pos + ret);
+                       ceph_zero_page_vector_range(page_align + read, 
+                                                       ret, pages);
+                       hit_hole = false;
                }
+               didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
+
                pos += ret;
                read = pos - off;
                left -= ret;

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
       [not found]                                               ` <CAAM7YAnGaXcQm1LcaCUGL71FGRV5zfNx1iRObFkvXsyVpu91Ag@mail.gmail.com>
@ 2013-07-31  5:46                                                 ` majianpeng
       [not found]                                                   ` <CAAM7YAmv6Ar_oTdYG31YSHnQwyUUYSNq3Zj_4fHcwMoOvno7Sw@mail.gmail.com>
  0 siblings, 1 reply; 33+ messages in thread
From: majianpeng @ 2013-07-31  5:46 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: sage, ceph-devel

>On Wed, Jul 31, 2013 at 9:36 AM, majianpeng <majianpeng@gmail.com> wrote:
>> [snip]
>> I think this patch can do work:
>> Those case which i tested
>> A: filesize=0,  buffer=1M
>> B:  data[2M] | hole| data[2M], bs= 6M/7M
>
>I don't think your zero buffer change is correct for this test case.
>
dd if=/dev/urandom of=file bs=1M count=2  oflag=direct
dd if=/dev/urandom of=file bs=1M count=2 seek=4  oflag=direct
ls file
-rw-r--r-- 1 root root  6291456 Jul 31 12:46 file

debug message:
[78370.408220] ceph:           file.c:355  : striped_read 0~7340032 (read 0) got 2097152 HITSTRIPE SHORT
[78370.409179] ceph:           file.c:355  : striped_read 2097152~5242880 (read 2097152) got 0 HITSTRIPE HITHOLE
[78370.409182] ceph:           file.c:362  :  zero hole 2097152 to 4194304
[78370.431289] ceph:           file.c:355  : striped_read 4194304~3145728 (read 4194304) got 2097152 SHORT
[78370.431294] ceph:           file.c:399  : striped_read returns 6291456

Am i missing something?

>> C: data[4m] | hole | hole |data[2M]  bs=16M/18M
>>
>> Are there some case ignore?
>> Thanks!
>> Jianpeng Ma
>>
>>
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 2ddf061..96ce893 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -319,7 +319,7 @@ static int striped_read(struct inode *inode,
>>         int read;
>>         struct page **page_pos;
>>         int ret;
>> -       bool hit_stripe, was_short;
>> +       bool hit_stripe, was_short, hit_hole = false;
>>
>>         /*
>>          * we may need to do multiple reads.  not atomic, unfortunately.
>> @@ -342,21 +342,30 @@ more:
>>                                   ci->i_truncate_seq,
>>                                   ci->i_truncate_size,
>>                                   page_pos, pages_left, page_align);
>> -       if (ret == -ENOENT)
>> +
>> +       if ((ret == 0  || ret == -ENOENT) && pos < inode->i_size)
>> +               hit_hole = true;
>
>why do we need 'hit_hole', it's essential the same as was_short. The code
>is already so messy.
>I don't think adding 'hit_hole' make it more readable.
>
hit_hole means return value are ENOENT or zero.
But the short-read contain EOF beside above two case.
>> +       else if (ret == -ENOENT)
>>                 ret = 0;
>> +
>>         hit_stripe = this_len < left;
>> -       was_short = ret >= 0 && ret < this_len;
>> -       dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left,
>read,
>> -            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" :
>"");
>> -
>> -       if (ret > 0) {
>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>> -
>> -               if (read < pos - off) {
>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
>> -                       ceph_zero_page_vector_range(page_align + read,
>> -                                                   pos - off - read,
>pages);
>> +       was_short = ret > 0 && ret < this_len;
>> +       dout("striped_read %llu~%u (read %u) got %d%s%s%s\n", pos, left,
>read,
>> +            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" :
>"",
>> +            hit_hole ? " HITHOLE" : "");
>> +
>> +       if (ret > 0 || hit_hole) {
>> +               int didpages;
>> +
>> +               if (hit_hole) {
>> +                       ret = this_len;
>> +                       dout(" zero hole %llu to %llu\n", pos , pos +
>ret);
>> +                       ceph_zero_page_vector_range(page_align + read,
>> +                                                       ret, pages);
>> +                       hit_hole = false;
>this is incorrect for the 'ret > 0' and 'ret = -ENOENT' cases.
>
For the ret > 0, it can't do those.
For the ret = -ENOENT and  pos + this_len > i_size mean , can you give me a example ?

Jianpeng Ma
>Besides, if 'pos + this_len >= i_size'. we should do nothing here. let the
>'zero trailing bytes'
>code do the job.
>
>Yan, Zheng
>
>>                 }
>> +               didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>> +
>>                 pos += ret;
>>                 read = pos - off;
>>                 left -= ret;
>
Thanks!
Jianpeng Ma
>On Wed, Jul 31, 2013 at 9:36 AM, majianpeng <majianpeng@gmail.com> wrote:
>> [snip]
>> I think this patch can do work:
>> Those case which i tested
>> A: filesize=0,  buffer=1M
>> B:  data[2M] | hole| data[2M], bs= 6M/7M
>
>I don't think your zero buffer change is correct for this test case.
>
>> C: data[4m] | hole | hole |data[2M]  bs=16M/18M
>>
>> Are there some case ignore?
>> Thanks!
>> Jianpeng Ma
>>
>>
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 2ddf061..96ce893 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -319,7 +319,7 @@ static int striped_read(struct inode *inode,
>>         int read;
>>         struct page **page_pos;
>>         int ret;
>> -       bool hit_stripe, was_short;
>> +       bool hit_stripe, was_short, hit_hole = false;
>>
>>         /*
>>          * we may need to do multiple reads.  not atomic, unfortunately.
>> @@ -342,21 +342,30 @@ more:
>>                                   ci->i_truncate_seq,
>>                                   ci->i_truncate_size,
>>                                   page_pos, pages_left, page_align);
>> -       if (ret == -ENOENT)
>> +
>> +       if ((ret == 0  || ret == -ENOENT) && pos < inode->i_size)
>> +               hit_hole = true;
>
>why do we need 'hit_hole', it's essential the same as was_short. The code
>is already so messy.
>I don't think adding 'hit_hole' make it more readable.
>
>> +       else if (ret == -ENOENT)
>>                 ret = 0;
>> +
>>         hit_stripe = this_len < left;
>> -       was_short = ret >= 0 && ret < this_len;
>> -       dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left,
>read,
>> -            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" :
>"");
>> -
>> -       if (ret > 0) {
>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>> -
>> -               if (read < pos - off) {
>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
>> -                       ceph_zero_page_vector_range(page_align + read,
>> -                                                   pos - off - read,
>pages);
>> +       was_short = ret > 0 && ret < this_len;
>> +       dout("striped_read %llu~%u (read %u) got %d%s%s%s\n", pos, left,
>read,
>> +            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" :
>"",
>> +            hit_hole ? " HITHOLE" : "");
>> +
>> +       if (ret > 0 || hit_hole) {
>> +               int didpages;
>> +
>> +               if (hit_hole) {
>> +                       ret = this_len;
>> +                       dout(" zero hole %llu to %llu\n", pos , pos +
>ret);
>> +                       ceph_zero_page_vector_range(page_align + read,
>> +                                                       ret, pages);
>> +                       hit_hole = false;
>this is incorrect for the 'ret > 0' and 'ret = -ENOENT' cases.
>
>Besides, if 'pos + this_len >= i_size'. we should do nothing here. let the
>'zero trailing bytes'
>code do the job.
>
>Yan, Zheng
>
>>                 }
>> +               didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>> +
>>                 pos += ret;
>>                 read = pos - off;
>>                 left -= ret;
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
       [not found]                                                   ` <CAAM7YAmv6Ar_oTdYG31YSHnQwyUUYSNq3Zj_4fHcwMoOvno7Sw@mail.gmail.com>
@ 2013-07-31  7:32                                                     ` majianpeng
  2013-07-31  8:26                                                       ` Yan, Zheng
  0 siblings, 1 reply; 33+ messages in thread
From: majianpeng @ 2013-07-31  7:32 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: sage, ceph-devel

 [snip]
Test case
A: touch file
    dd if=file of=/dev/null bs=5M count=1 iflag=direct
B: [data(2M)|hole(2m)][data(2M)]
   dd if=file of=/dev/null bs=8M count=1 iflag=direct
C: [data(4M)[hole(4M)][hole(4M)][data(2M)]
  dd if=file of=/dev/null bs=16M count=1 iflag=direct
D: touch file;truncate -s 5M file
  dd if=file of=/dev/null bs=8M count=1 iflag=direct

Those cases can work.
Now i make different processing  for short-read between 'ret > 0' and "ret =0".
For the short-read which ret > 0, it don't do read-page rather than zero the left area.
This means reduce one meaningless read operation.

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 2ddf061..f643ec8 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -350,13 +350,17 @@ more:
             ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
 
        if (ret > 0) {
-               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
-
-               if (read < pos - off) {
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 2ddf061..f643ec8 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -350,13 +350,17 @@ more:
             ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
 
        if (ret > 0) {
-               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
-
-               if (read < pos - off) {
-                       dout(" zero gap %llu to %llu\n", off + read, pos);
-                       ceph_zero_page_vector_range(page_align + read,
-                                                   pos - off - read, pages);
+               int  didpages;
+
+               if (was_short && pos + ret < inode->i_size) {
+                        u64 tmp = min(this_len - ret, 
+                                        	inode->i_size - pos - ret);
+                        dout(" zero gap %llu to %llu\n", pos + ret, pos + ret + tmp);
+                        ceph_zero_page_vector_range(page_align + read + ret,
+                                                       tmp, pages);
+                        ret += tmp;
                }
+               didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
                pos += ret;
                read = pos - off;
                left -= ret;
@@ -375,13 +379,25 @@ more:
 
                /* zero trailing bytes (inside i_size) */
                if (left > 0 && pos < inode->i_size) {
+                       int didpages;
                        if (pos + left > inode->i_size)
                                left = inode->i_size - pos;
 
-                       dout("zero tail %d\n", left);
-                       ceph_zero_page_vector_range(page_align + read, left,
+                       ret = min(left, this_len);
+                       dout("zero tail %d\n", ret);
+                       ceph_zero_page_vector_range(page_align + read, ret,
                                                    pages);
-                       read += left;
+                       didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
+
+                       pos += ret;
+                       read = pos - off;
+                       left -= ret;
+                       page_pos += didpages;
+                       pages_left -= didpages;
+
+                       /* hit stripe? */
+                       if (left && hit_stripe)
+                               goto more;
                }
        }
 

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-31  7:32                                                     ` majianpeng
@ 2013-07-31  8:26                                                       ` Yan, Zheng
  2013-08-01  1:45                                                         ` majianpeng
  0 siblings, 1 reply; 33+ messages in thread
From: Yan, Zheng @ 2013-07-31  8:26 UTC (permalink / raw)
  To: majianpeng; +Cc: sage, ceph-devel

On Wed, Jul 31, 2013 at 3:32 PM, majianpeng <majianpeng@gmail.com> wrote:
>
>  [snip]
> Test case
> A: touch file
>     dd if=file of=/dev/null bs=5M count=1 iflag=direct
> B: [data(2M)|hole(2m)][data(2M)]
>    dd if=file of=/dev/null bs=8M count=1 iflag=direct
> C: [data(4M)[hole(4M)][hole(4M)][data(2M)]
>   dd if=file of=/dev/null bs=16M count=1 iflag=direct
> D: touch file;truncate -s 5M file
>   dd if=file of=/dev/null bs=8M count=1 iflag=direct
>
> Those cases can work.
> Now i make different processing  for short-read between 'ret > 0' and "ret =0".
> For the short-read which ret > 0, it don't do read-page rather than zero the left area.
> This means reduce one meaningless read operation.
>

This patch looks good. But I still hope not to duplicate code.

how about change
 "hit_stripe = this_len < left;"
to
 "hit_stripe = this_len < left && (ret == this_len || pos + this_len <
inode->i_size);"

> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 2ddf061..f643ec8 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -350,13 +350,17 @@ more:
>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>
>         if (ret > 0) {

changes this to "(ret >= 0)"

> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
> -
> -               if (read < pos - off) {
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 2ddf061..f643ec8 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -350,13 +350,17 @@ more:
>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>
>         if (ret > 0) {
> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
> -
> -               if (read < pos - off) {
> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
> -                       ceph_zero_page_vector_range(page_align + read,
> -                                                   pos - off - read, pages);
> +               int  didpages;
> +
> +               if (was_short && pos + ret < inode->i_size) {

change this to "(was_short && hit_stripe)"

by this way, we can avoid modifying the "zero trailing bytes" code.

Regards,
Yan, Zheng

> +                        u64 tmp = min(this_len - ret,
> +                                               inode->i_size - pos - ret);
> +                        dout(" zero gap %llu to %llu\n", pos + ret, pos + ret + tmp);
> +                        ceph_zero_page_vector_range(page_align + read + ret,
> +                                                       tmp, pages);
> +                        ret += tmp;
>                 }
> +               didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>                 pos += ret;
>                 read = pos - off;
>                 left -= ret;
> @@ -375,13 +379,25 @@ more:
>
>                 /* zero trailing bytes (inside i_size) */
>                 if (left > 0 && pos < inode->i_size) {
> +                       int didpages;
>                         if (pos + left > inode->i_size)
>                                 left = inode->i_size - pos;
>
> -                       dout("zero tail %d\n", left);
> -                       ceph_zero_page_vector_range(page_align + read, left,
> +                       ret = min(left, this_len);
> +                       dout("zero tail %d\n", ret);
> +                       ceph_zero_page_vector_range(page_align + read, ret,
>                                                     pages);
> -                       read += left;
> +                       didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
> +
> +                       pos += ret;
> +                       read = pos - off;
> +                       left -= ret;
> +                       page_pos += didpages;
> +                       pages_left -= didpages;
> +
> +                       /* hit stripe? */
> +                       if (left && hit_stripe)
> +                               goto more;
>                 }
>         }
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-07-31  8:26                                                       ` Yan, Zheng
@ 2013-08-01  1:45                                                         ` majianpeng
  2013-08-01  3:29                                                           ` Yan, Zheng
  0 siblings, 1 reply; 33+ messages in thread
From: majianpeng @ 2013-08-01  1:45 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: sage, ceph-devel

>On Wed, Jul 31, 2013 at 3:32 PM, majianpeng <majianpeng@gmail.com> wrote:
>>
>>  [snip]
>> Test case
>> A: touch file
>>     dd if=file of=/dev/null bs=5M count=1 iflag=direct
>> B: [data(2M)|hole(2m)][data(2M)]
>>    dd if=file of=/dev/null bs=8M count=1 iflag=direct
>> C: [data(4M)[hole(4M)][hole(4M)][data(2M)]
>>   dd if=file of=/dev/null bs=16M count=1 iflag=direct
>> D: touch file;truncate -s 5M file
>>   dd if=file of=/dev/null bs=8M count=1 iflag=direct
>>
>> Those cases can work.
>> Now i make different processing  for short-read between 'ret > 0' and "ret =0".
>> For the short-read which ret > 0, it don't do read-page rather than zero the left area.
>> This means reduce one meaningless read operation.
>>
>
>This patch looks good. But I still hope not to duplicate code.
>
>how about change
> "hit_stripe = this_len < left;"
>to
> "hit_stripe = this_len < left && (ret == this_len || pos + this_len <
>inode->i_size);"
>
To make the code easy to understand, i don't apply your suggestion.But i add this check on the judgement of
whether read more contents.
The follow is the latest patch.Can you check?

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 2ddf061..3d8d14d 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -349,44 +349,38 @@ more:
        dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
             ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
 
-       if (ret > 0) {
-               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
-
-               if (read < pos - off) {
-                       dout(" zero gap %llu to %llu\n", off + read, pos);
-                       ceph_zero_page_vector_range(page_align + read,
-                                                   pos - off - read, pages);
+       if (ret >= 0) {
+               int  didpages;
+               if (was_short && (pos + ret < inode->i_size)) {
+                       u64 tmp = min(this_len - ret, 
+                                inode->i_size - pos - ret);
+                       dout(" zero gap %llu to %llu\n", 
+                               pos + ret, pos + ret + tmp);
+                       ceph_zero_page_vector_range(page_align + read + ret,
+                                                       tmp, pages);
+                       ret += tmp;
                }
+
+               didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
                pos += ret;
                read = pos - off;
                left -= ret;
                page_pos += didpages;
                pages_left -= didpages;
 
-               /* hit stripe? */
-               if (left && hit_stripe)
+               /* hit stripe and need continue*/
+               if (left && hit_stripe && pos < inode->i_size) 
                        goto more;
+
        }
 
-       if (was_short) {
+       if (ret >= 0) {
+               ret = read;
                /* did we bounce off eof? */
                if (pos + left > inode->i_size)
                        *checkeof = 1;
-
-               /* zero trailing bytes (inside i_size) */
-               if (left > 0 && pos < inode->i_size) {
-                       if (pos + left > inode->i_size)
-                               left = inode->i_size - pos;
-
-                       dout("zero tail %d\n", left);
-                       ceph_zero_page_vector_range(page_align + read, left,
-                                                   pages);
-                       read += left;
-               }
        }
 
-       if (ret >= 0)
-               ret = read;
        dout("striped_read returns %d\n", ret);
        return ret;
 }

Thanks!
Jianpeng Ma

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-08-01  1:45                                                         ` majianpeng
@ 2013-08-01  3:29                                                           ` Yan, Zheng
  2013-08-01  6:30                                                             ` majianpeng
  0 siblings, 1 reply; 33+ messages in thread
From: Yan, Zheng @ 2013-08-01  3:29 UTC (permalink / raw)
  To: majianpeng; +Cc: sage, ceph-devel

On Thu, Aug 1, 2013 at 9:45 AM, majianpeng <majianpeng@gmail.com> wrote:
>>On Wed, Jul 31, 2013 at 3:32 PM, majianpeng <majianpeng@gmail.com> wrote:
>>>
>>>  [snip]
>>> Test case
>>> A: touch file
>>>     dd if=file of=/dev/null bs=5M count=1 iflag=direct
>>> B: [data(2M)|hole(2m)][data(2M)]
>>>    dd if=file of=/dev/null bs=8M count=1 iflag=direct
>>> C: [data(4M)[hole(4M)][hole(4M)][data(2M)]
>>>   dd if=file of=/dev/null bs=16M count=1 iflag=direct
>>> D: touch file;truncate -s 5M file
>>>   dd if=file of=/dev/null bs=8M count=1 iflag=direct
>>>
>>> Those cases can work.
>>> Now i make different processing  for short-read between 'ret > 0' and "ret =0".
>>> For the short-read which ret > 0, it don't do read-page rather than zero the left area.
>>> This means reduce one meaningless read operation.
>>>
>>
>>This patch looks good. But I still hope not to duplicate code.
>>
>>how about change
>> "hit_stripe = this_len < left;"
>>to
>> "hit_stripe = this_len < left && (ret == this_len || pos + this_len <
>>inode->i_size);"
>>
> To make the code easy to understand, i don't apply your suggestion.But i add this check on the judgement of
> whether read more contents.
> The follow is the latest patch.Can you check?
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 2ddf061..3d8d14d 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -349,44 +349,38 @@ more:
>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>
> -       if (ret > 0) {
> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
> -
> -               if (read < pos - off) {
> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
> -                       ceph_zero_page_vector_range(page_align + read,
> -                                                   pos - off - read, pages);
> +       if (ret >= 0) {
> +               int  didpages;
> +               if (was_short && (pos + ret < inode->i_size)) {
> +                       u64 tmp = min(this_len - ret,
> +                                inode->i_size - pos - ret);
> +                       dout(" zero gap %llu to %llu\n",
> +                               pos + ret, pos + ret + tmp);
> +                       ceph_zero_page_vector_range(page_align + read + ret,
> +                                                       tmp, pages);
> +                       ret += tmp;
>                 }
> +
> +               didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>                 pos += ret;
>                 read = pos - off;
>                 left -= ret;
>                 page_pos += didpages;
>                 pages_left -= didpages;
>
> -               /* hit stripe? */
> -               if (left && hit_stripe)
> +               /* hit stripe and need continue*/
> +               if (left && hit_stripe && pos < inode->i_size)
>                         goto more;
> +
>         }
>
> -       if (was_short) {
> +       if (ret >= 0) {
> +               ret = read;
>                 /* did we bounce off eof? */
>                 if (pos + left > inode->i_size)
>                         *checkeof = 1;
> -
> -               /* zero trailing bytes (inside i_size) */
> -               if (left > 0 && pos < inode->i_size) {
> -                       if (pos + left > inode->i_size)
> -                               left = inode->i_size - pos;
> -
> -                       dout("zero tail %d\n", left);
> -                       ceph_zero_page_vector_range(page_align + read, left,
> -                                                   pages);
> -                       read += left;
> -               }
>         }
>
> -       if (ret >= 0)
> -               ret = read;

I think this line should be "if (read > 0) ret = read;". Other than
this, your patch looks good.
You can add "Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>" to your
formal patch.

Regards
Yan, Zheng


>         dout("striped_read returns %d\n", ret);
>         return ret;
>  }
>
> Thanks!
> Jianpeng Ma

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-08-01  3:29                                                           ` Yan, Zheng
@ 2013-08-01  6:30                                                             ` majianpeng
  2013-08-01  7:19                                                               ` Yan, Zheng
  0 siblings, 1 reply; 33+ messages in thread
From: majianpeng @ 2013-08-01  6:30 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: sage, ceph-devel

>On Thu, Aug 1, 2013 at 9:45 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>On Wed, Jul 31, 2013 at 3:32 PM, majianpeng <majianpeng@gmail.com> wrote:
>>>>
>>>>  [snip]
>>>> Test case
>>>> A: touch file
>>>>     dd if=file of=/dev/null bs=5M count=1 iflag=direct
>>>> B: [data(2M)|hole(2m)][data(2M)]
>>>>    dd if=file of=/dev/null bs=8M count=1 iflag=direct
>>>> C: [data(4M)[hole(4M)][hole(4M)][data(2M)]
>>>>   dd if=file of=/dev/null bs=16M count=1 iflag=direct
>>>> D: touch file;truncate -s 5M file
>>>>   dd if=file of=/dev/null bs=8M count=1 iflag=direct
>>>>
>>>> Those cases can work.
>>>> Now i make different processing  for short-read between 'ret > 0' and "ret =0".
>>>> For the short-read which ret > 0, it don't do read-page rather than zero the left area.
>>>> This means reduce one meaningless read operation.
>>>>
>>>
>>>This patch looks good. But I still hope not to duplicate code.
>>>
>>>how about change
>>> "hit_stripe = this_len < left;"
>>>to
>>> "hit_stripe = this_len < left && (ret == this_len || pos + this_len <
>>>inode->i_size);"
>>>
>> To make the code easy to understand, i don't apply your suggestion.But i add this check on the judgement of
>> whether read more contents.
>> The follow is the latest patch.Can you check?
>>
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 2ddf061..3d8d14d 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -349,44 +349,38 @@ more:
>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>
>> -       if (ret > 0) {
>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>> -
>> -               if (read < pos - off) {
>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
>> -                       ceph_zero_page_vector_range(page_align + read,
>> -                                                   pos - off - read, pages);
>> +       if (ret >= 0) {
>> +               int  didpages;
>> +               if (was_short && (pos + ret < inode->i_size)) {
>> +                       u64 tmp = min(this_len - ret,
>> +                                inode->i_size - pos - ret);
>> +                       dout(" zero gap %llu to %llu\n",
>> +                               pos + ret, pos + ret + tmp);
>> +                       ceph_zero_page_vector_range(page_align + read + ret,
>> +                                                       tmp, pages);
>> +                       ret += tmp;
>>                 }
>> +
>> +               didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>                 pos += ret;
>>                 read = pos - off;
>>                 left -= ret;
>>                 page_pos += didpages;
>>                 pages_left -= didpages;
>>
>> -               /* hit stripe? */
>> -               if (left && hit_stripe)
>> +               /* hit stripe and need continue*/
>> +               if (left && hit_stripe && pos < inode->i_size)
>>                         goto more;
>> +
>>         }
>>
>> -       if (was_short) {
>> +       if (ret >= 0) {
>> +               ret = read;
>>                 /* did we bounce off eof? */
>>                 if (pos + left > inode->i_size)
>>                         *checkeof = 1;
>> -
>> -               /* zero trailing bytes (inside i_size) */
>> -               if (left > 0 && pos < inode->i_size) {
>> -                       if (pos + left > inode->i_size)
>> -                               left = inode->i_size - pos;
>> -
>> -                       dout("zero tail %d\n", left);
>> -                       ceph_zero_page_vector_range(page_align + read, left,
>> -                                                   pages);
>> -                       read += left;
>> -               }
>>         }
>>
>> -       if (ret >= 0)
>> -               ret = read;
>
>I think this line should be "if (read > 0) ret = read;". Other than
>this, your patch looks good.
Because you metioned this, I noticed for ceph_sync_read/write the result are && the every striped read/write.
That is if we met one error, the total result is error.It can't return partial result.
I think i should write anthor patch for that.
>You can add "Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>" to your
>formal patch.
Ok ,thanks your times.

Thanks!
Jianpeng Ma
>
>Regards
>Yan, Zheng
>
>
>>         dout("striped_read returns %d\n", ret);
>>         return ret;
>>  }
>>
>> Thanks!
>> Jianpeng Ma
Thanks!
Jianpeng Ma
>On Thu, Aug 1, 2013 at 9:45 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>On Wed, Jul 31, 2013 at 3:32 PM, majianpeng <majianpeng@gmail.com> wrote:
>>>>
>>>>  [snip]
>>>> Test case
>>>> A: touch file
>>>>     dd if=file of=/dev/null bs=5M count=1 iflag=direct
>>>> B: [data(2M)|hole(2m)][data(2M)]
>>>>    dd if=file of=/dev/null bs=8M count=1 iflag=direct
>>>> C: [data(4M)[hole(4M)][hole(4M)][data(2M)]
>>>>   dd if=file of=/dev/null bs=16M count=1 iflag=direct
>>>> D: touch file;truncate -s 5M file
>>>>   dd if=file of=/dev/null bs=8M count=1 iflag=direct
>>>>
>>>> Those cases can work.
>>>> Now i make different processing  for short-read between 'ret > 0' and "ret =0".
>>>> For the short-read which ret > 0, it don't do read-page rather than zero the left area.
>>>> This means reduce one meaningless read operation.
>>>>
>>>
>>>This patch looks good. But I still hope not to duplicate code.
>>>
>>>how about change
>>> "hit_stripe = this_len < left;"
>>>to
>>> "hit_stripe = this_len < left && (ret == this_len || pos + this_len <
>>>inode->i_size);"
>>>
>> To make the code easy to understand, i don't apply your suggestion.But i add this check on the judgement of
>> whether read more contents.
>> The follow is the latest patch.Can you check?
>>
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 2ddf061..3d8d14d 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -349,44 +349,38 @@ more:
>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>
>> -       if (ret > 0) {
>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>> -
>> -               if (read < pos - off) {
>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
>> -                       ceph_zero_page_vector_range(page_align + read,
>> -                                                   pos - off - read, pages);
>> +       if (ret >= 0) {
>> +               int  didpages;
>> +               if (was_short && (pos + ret < inode->i_size)) {
>> +                       u64 tmp = min(this_len - ret,
>> +                                inode->i_size - pos - ret);
>> +                       dout(" zero gap %llu to %llu\n",
>> +                               pos + ret, pos + ret + tmp);
>> +                       ceph_zero_page_vector_range(page_align + read + ret,
>> +                                                       tmp, pages);
>> +                       ret += tmp;
>>                 }
>> +
>> +               didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>                 pos += ret;
>>                 read = pos - off;
>>                 left -= ret;
>>                 page_pos += didpages;
>>                 pages_left -= didpages;
>>
>> -               /* hit stripe? */
>> -               if (left && hit_stripe)
>> +               /* hit stripe and need continue*/
>> +               if (left && hit_stripe && pos < inode->i_size)
>>                         goto more;
>> +
>>         }
>>
>> -       if (was_short) {
>> +       if (ret >= 0) {
>> +               ret = read;
>>                 /* did we bounce off eof? */
>>                 if (pos + left > inode->i_size)
>>                         *checkeof = 1;
>> -
>> -               /* zero trailing bytes (inside i_size) */
>> -               if (left > 0 && pos < inode->i_size) {
>> -                       if (pos + left > inode->i_size)
>> -                               left = inode->i_size - pos;
>> -
>> -                       dout("zero tail %d\n", left);
>> -                       ceph_zero_page_vector_range(page_align + read, left,
>> -                                                   pages);
>> -                       read += left;
>> -               }
>>         }
>>
>> -       if (ret >= 0)
>> -               ret = read;
>
>I think this line should be "if (read > 0) ret = read;". Other than
>this, your patch looks good.
>You can add "Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>" to your
>formal patch.
>
>Regards
>Yan, Zheng
>
>
>>         dout("striped_read returns %d\n", ret);
>>         return ret;
>>  }
>>
>> Thanks!
>> Jianpeng Ma

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Re: question about striped_read
  2013-08-01  6:30                                                             ` majianpeng
@ 2013-08-01  7:19                                                               ` Yan, Zheng
  0 siblings, 0 replies; 33+ messages in thread
From: Yan, Zheng @ 2013-08-01  7:19 UTC (permalink / raw)
  To: majianpeng; +Cc: sage, ceph-devel

On Thu, Aug 1, 2013 at 2:30 PM, majianpeng <majianpeng@gmail.com> wrote:
>>On Thu, Aug 1, 2013 at 9:45 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>>On Wed, Jul 31, 2013 at 3:32 PM, majianpeng <majianpeng@gmail.com> wrote:
>>>>>
>>>>>  [snip]
>>>>> Test case
>>>>> A: touch file
>>>>>     dd if=file of=/dev/null bs=5M count=1 iflag=direct
>>>>> B: [data(2M)|hole(2m)][data(2M)]
>>>>>    dd if=file of=/dev/null bs=8M count=1 iflag=direct
>>>>> C: [data(4M)[hole(4M)][hole(4M)][data(2M)]
>>>>>   dd if=file of=/dev/null bs=16M count=1 iflag=direct
>>>>> D: touch file;truncate -s 5M file
>>>>>   dd if=file of=/dev/null bs=8M count=1 iflag=direct
>>>>>
>>>>> Those cases can work.
>>>>> Now i make different processing  for short-read between 'ret > 0' and "ret =0".
>>>>> For the short-read which ret > 0, it don't do read-page rather than zero the left area.
>>>>> This means reduce one meaningless read operation.
>>>>>
>>>>
>>>>This patch looks good. But I still hope not to duplicate code.
>>>>
>>>>how about change
>>>> "hit_stripe = this_len < left;"
>>>>to
>>>> "hit_stripe = this_len < left && (ret == this_len || pos + this_len <
>>>>inode->i_size);"
>>>>
>>> To make the code easy to understand, i don't apply your suggestion.But i add this check on the judgement of
>>> whether read more contents.
>>> The follow is the latest patch.Can you check?
>>>
>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>> index 2ddf061..3d8d14d 100644
>>> --- a/fs/ceph/file.c
>>> +++ b/fs/ceph/file.c
>>> @@ -349,44 +349,38 @@ more:
>>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
>>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>>
>>> -       if (ret > 0) {
>>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>> -
>>> -               if (read < pos - off) {
>>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
>>> -                       ceph_zero_page_vector_range(page_align + read,
>>> -                                                   pos - off - read, pages);
>>> +       if (ret >= 0) {
>>> +               int  didpages;
>>> +               if (was_short && (pos + ret < inode->i_size)) {
>>> +                       u64 tmp = min(this_len - ret,
>>> +                                inode->i_size - pos - ret);
>>> +                       dout(" zero gap %llu to %llu\n",
>>> +                               pos + ret, pos + ret + tmp);
>>> +                       ceph_zero_page_vector_range(page_align + read + ret,
>>> +                                                       tmp, pages);
>>> +                       ret += tmp;
>>>                 }
>>> +
>>> +               didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>>                 pos += ret;
>>>                 read = pos - off;
>>>                 left -= ret;
>>>                 page_pos += didpages;
>>>                 pages_left -= didpages;
>>>
>>> -               /* hit stripe? */
>>> -               if (left && hit_stripe)
>>> +               /* hit stripe and need continue*/
>>> +               if (left && hit_stripe && pos < inode->i_size)
>>>                         goto more;
>>> +
>>>         }
>>>
>>> -       if (was_short) {
>>> +       if (ret >= 0) {
>>> +               ret = read;
>>>                 /* did we bounce off eof? */
>>>                 if (pos + left > inode->i_size)
>>>                         *checkeof = 1;
>>> -
>>> -               /* zero trailing bytes (inside i_size) */
>>> -               if (left > 0 && pos < inode->i_size) {
>>> -                       if (pos + left > inode->i_size)
>>> -                               left = inode->i_size - pos;
>>> -
>>> -                       dout("zero tail %d\n", left);
>>> -                       ceph_zero_page_vector_range(page_align + read, left,
>>> -                                                   pages);
>>> -                       read += left;
>>> -               }
>>>         }
>>>
>>> -       if (ret >= 0)
>>> -               ret = read;
>>
>>I think this line should be "if (read > 0) ret = read;". Other than
>>this, your patch looks good.
> Because you metioned this, I noticed for ceph_sync_read/write the result are && the every striped read/write.
> That is if we met one error, the total result is error.It can't return partial result.

This behavior is not correct. If we read/write some data, then meet an
error, we should return the size we have
read/written. I think all other FS behave like this. See
generic_file_aio_read() and do_generic_file_read().

Regards
Yan, Zheng



> I think i should write anthor patch for that.
>>You can add "Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>" to your
>>formal patch.
> Ok ,thanks your times.
>
> Thanks!
> Jianpeng Ma
>>
>>Regards
>>Yan, Zheng
>>
>>
>>>         dout("striped_read returns %d\n", ret);
>>>         return ret;
>>>  }
>>>
>>> Thanks!
>>> Jianpeng Ma
> Thanks!
> Jianpeng Ma
>>On Thu, Aug 1, 2013 at 9:45 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>>On Wed, Jul 31, 2013 at 3:32 PM, majianpeng <majianpeng@gmail.com> wrote:
>>>>>
>>>>>  [snip]
>>>>> Test case
>>>>> A: touch file
>>>>>     dd if=file of=/dev/null bs=5M count=1 iflag=direct
>>>>> B: [data(2M)|hole(2m)][data(2M)]
>>>>>    dd if=file of=/dev/null bs=8M count=1 iflag=direct
>>>>> C: [data(4M)[hole(4M)][hole(4M)][data(2M)]
>>>>>   dd if=file of=/dev/null bs=16M count=1 iflag=direct
>>>>> D: touch file;truncate -s 5M file
>>>>>   dd if=file of=/dev/null bs=8M count=1 iflag=direct
>>>>>
>>>>> Those cases can work.
>>>>> Now i make different processing  for short-read between 'ret > 0' and "ret =0".
>>>>> For the short-read which ret > 0, it don't do read-page rather than zero the left area.
>>>>> This means reduce one meaningless read operation.
>>>>>
>>>>
>>>>This patch looks good. But I still hope not to duplicate code.
>>>>
>>>>how about change
>>>> "hit_stripe = this_len < left;"
>>>>to
>>>> "hit_stripe = this_len < left && (ret == this_len || pos + this_len <
>>>>inode->i_size);"
>>>>
>>> To make the code easy to understand, i don't apply your suggestion.But i add this check on the judgement of
>>> whether read more contents.
>>> The follow is the latest patch.Can you check?
>>>
>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>> index 2ddf061..3d8d14d 100644
>>> --- a/fs/ceph/file.c
>>> +++ b/fs/ceph/file.c
>>> @@ -349,44 +349,38 @@ more:
>>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
>>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
>>>
>>> -       if (ret > 0) {
>>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>> -
>>> -               if (read < pos - off) {
>>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
>>> -                       ceph_zero_page_vector_range(page_align + read,
>>> -                                                   pos - off - read, pages);
>>> +       if (ret >= 0) {
>>> +               int  didpages;
>>> +               if (was_short && (pos + ret < inode->i_size)) {
>>> +                       u64 tmp = min(this_len - ret,
>>> +                                inode->i_size - pos - ret);
>>> +                       dout(" zero gap %llu to %llu\n",
>>> +                               pos + ret, pos + ret + tmp);
>>> +                       ceph_zero_page_vector_range(page_align + read + ret,
>>> +                                                       tmp, pages);
>>> +                       ret += tmp;
>>>                 }
>>> +
>>> +               didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
>>>                 pos += ret;
>>>                 read = pos - off;
>>>                 left -= ret;
>>>                 page_pos += didpages;
>>>                 pages_left -= didpages;
>>>
>>> -               /* hit stripe? */
>>> -               if (left && hit_stripe)
>>> +               /* hit stripe and need continue*/
>>> +               if (left && hit_stripe && pos < inode->i_size)
>>>                         goto more;
>>> +
>>>         }
>>>
>>> -       if (was_short) {
>>> +       if (ret >= 0) {
>>> +               ret = read;
>>>                 /* did we bounce off eof? */
>>>                 if (pos + left > inode->i_size)
>>>                         *checkeof = 1;
>>> -
>>> -               /* zero trailing bytes (inside i_size) */
>>> -               if (left > 0 && pos < inode->i_size) {
>>> -                       if (pos + left > inode->i_size)
>>> -                               left = inode->i_size - pos;
>>> -
>>> -                       dout("zero tail %d\n", left);
>>> -                       ceph_zero_page_vector_range(page_align + read, left,
>>> -                                                   pages);
>>> -                       read += left;
>>> -               }
>>>         }
>>>
>>> -       if (ret >= 0)
>>> -               ret = read;
>>
>>I think this line should be "if (read > 0) ret = read;". Other than
>>this, your patch looks good.
>>You can add "Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>" to your
>>formal patch.
>>
>>Regards
>>Yan, Zheng
>>
>>
>>>         dout("striped_read returns %d\n", ret);
>>>         return ret;
>>>  }
>>>
>>> Thanks!
>>> Jianpeng Ma

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2013-08-01  7:19 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-25  0:52 question about striped_read majianpeng
2013-07-25  5:54 ` Sage Weil
2013-07-25  6:55   ` majianpeng
2013-07-25 12:27     ` Yan, Zheng
2013-07-25 15:50       ` Sage Weil
2013-07-26  0:48         ` majianpeng
2013-07-26  1:14           ` Yan, Zheng
2013-07-26  1:22             ` majianpeng
2013-07-26  1:36               ` Yan, Zheng
2013-07-26  1:38                 ` majianpeng
2013-07-26  1:59                   ` Yan, Zheng
2013-07-26  2:07                     ` majianpeng
     [not found]                       ` <CAAM7YAkNQA5PqVr15CXRQ5xPLk42VCCb3kf3U8ic9f6n3d9SGg@mail.gmail.com>
2013-07-29  3:00                         ` majianpeng
2013-07-29  5:02                           ` Yan, Zheng
2013-07-30  2:08                             ` majianpeng
2013-07-30  2:56                               ` Yan, Zheng
2013-07-30 11:01                             ` majianpeng
2013-07-30 11:14                               ` Yan, Zheng
2013-07-30 11:20                                 ` majianpeng
2013-07-30 11:41                                 ` majianpeng
2013-07-30 12:25                                   ` Yan, Zheng
2013-07-31  0:27                                     ` majianpeng
2013-07-31  0:40                                       ` Sage Weil
2013-07-31  0:44                                         ` majianpeng
2013-07-31  0:47                                           ` Sage Weil
2013-07-31  1:36                                             ` majianpeng
     [not found]                                               ` <CAAM7YAnGaXcQm1LcaCUGL71FGRV5zfNx1iRObFkvXsyVpu91Ag@mail.gmail.com>
2013-07-31  5:46                                                 ` majianpeng
     [not found]                                                   ` <CAAM7YAmv6Ar_oTdYG31YSHnQwyUUYSNq3Zj_4fHcwMoOvno7Sw@mail.gmail.com>
2013-07-31  7:32                                                     ` majianpeng
2013-07-31  8:26                                                       ` Yan, Zheng
2013-08-01  1:45                                                         ` majianpeng
2013-08-01  3:29                                                           ` Yan, Zheng
2013-08-01  6:30                                                             ` majianpeng
2013-08-01  7:19                                                               ` Yan, Zheng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.