* [Ocfs2-devel] [PATCH 0/9 v6] ocfs2: support append O_DIRECT write
@ 2015-08-04  6:16 Ryan Ding
  2015-08-04  9:03 ` Joseph Qi
  0 siblings, 1 reply; 14+ messages in thread
From: Ryan Ding @ 2015-08-04  6:16 UTC (permalink / raw)
  To: ocfs2-devel

Hi Joseph,

Sorry for bothering you with these old patches, but I really need to
know what this patch set is for.

https://oss.oracle.com/pipermail/ocfs2-devel/2015-January/010496.html

From the email archive above, you mentioned that those patches aim to
reduce host page cache consumption. But in my opinion, after an append
direct I/O completes, the pages used for buffering are clean, so the
system can reclaim those cached pages. We can even call
invalidate_mapping_pages() to speed that up. More pages may be needed
during a direct I/O, but the direct I/O size cannot be too large, right?
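
Roughly what I have in mind is the following (a rough sketch with a
made-up helper name, untested; invalidate_mapping_pages() only drops
clean, unmapped pages, which is exactly the state these pages are in
once the synced append write has finished):

#include <linux/fs.h>
#include <linux/pagemap.h>

/* Hypothetical helper: drop the now-clean pages that a buffered-fallback
 * append write left in the page cache, once the data is on disk. */
static void drop_clean_write_pages(struct inode *inode, loff_t pos, size_t count)
{
        struct address_space *mapping = inode->i_mapping;
        pgoff_t first = pos >> PAGE_CACHE_SHIFT;
        pgoff_t last  = (pos + count - 1) >> PAGE_CACHE_SHIFT;

        invalidate_mapping_pages(mapping, first, last);
}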

Thanks,
Ryan


* [Ocfs2-devel] [PATCH 0/9 v6] ocfs2: support append O_DIRECT write
  2015-08-04  6:16 [Ocfs2-devel] [PATCH 0/9 v6] ocfs2: support append O_DIRECT write Ryan Ding
@ 2015-08-04  9:03 ` Joseph Qi
  2015-08-05  4:40   ` Ryan Ding
  0 siblings, 1 reply; 14+ messages in thread
From: Joseph Qi @ 2015-08-04  9:03 UTC (permalink / raw)
  To: ocfs2-devel

Hi Ryan,

On 2015/8/4 14:16, Ryan Ding wrote:
> Hi Joseph,
> 
> Sorry for bothering you with the old patches. But I really need to know what this patch is for.
> 
> https://oss.oracle.com/pipermail/ocfs2-devel/2015-January/010496.html
> 
> From above email archive, you mentioned those patches aim to reduce the host page cache consumption. But in my opinion, after append direct io, the page used for buffer is clean. System can realloc those cached pages. We can even call invalidate_mapping_pages to fast that process. Maybe more pages will be needed during direct io. But direct io size can not be too large, right?
> 
We introduced append direct I/O because ocfs2 originally fell back to
buffered I/O in the thin-provisioning case, which is not the behavior
users expect.
I didn't follow your point that more pages would be needed during
direct I/O. Could you please explain it more clearly?

Thanks,
Joseph

> Thanks,
> Ryan
> 
> 


* [Ocfs2-devel] [PATCH 0/9 v6] ocfs2: support append O_DIRECT write
  2015-08-04  9:03 ` Joseph Qi
@ 2015-08-05  4:40   ` Ryan Ding
  2015-08-05  6:40     ` Joseph Qi
  2015-08-05  7:08     ` Joseph Qi
  0 siblings, 2 replies; 14+ messages in thread
From: Ryan Ding @ 2015-08-05  4:40 UTC (permalink / raw)
  To: ocfs2-devel

Hi Joseph,


On 08/04/2015 05:03 PM, Joseph Qi wrote:
> Hi Ryan,
>
> On 2015/8/4 14:16, Ryan Ding wrote:
>> Hi Joseph,
>>
>> Sorry for bothering you with the old patches. But I really need to know what this patch is for.
>>
>> https://oss.oracle.com/pipermail/ocfs2-devel/2015-January/010496.html
>>
>>  From above email archive, you mentioned those patches aim to reduce the host page cache consumption. But in my opinion, after append direct io, the page used for buffer is clean. System can realloc those cached pages. We can even call invalidate_mapping_pages to fast that process. Maybe more pages will be needed during direct io. But direct io size can not be too large, right?
>>
> We introduced the append direct io because originally ocfs2 would fall
> back to buffer io in case of thin provision, which was not the actual
> behavior that user expect.
Direct I/O has 2 semantics:
1. I/O is performed synchronously; data is guaranteed to have been
transferred when the write syscall returns.
2. File I/O is done directly to/from user-space buffers, with no page
cache involved.
But I think #2 is invisible to user space; #1 is the only thing user
space really cares about.
We should weigh the benefits against the disadvantages to decide
whether #2 should be supported.
The disadvantages are: it adds a lot of complexity to the code, bugs
will come along with it, and it introduces an incompatible feature.
For example, I ran a single-node sparse file test, and it failed.
The original way ocfs2 handled direct I/O (falling back to buffered I/O
for an append write or a write into a file hole) had 2 motivations:
1. It is easier to support cluster-wide coherence.
2. It is easier to support sparse files.
But it seems that your patches do not handle #2 very well.
There may be more issues that I have not found yet.
> I didn't get you that more pages would be needed during direct io. Could
> you please explain it more clearly?
I mean that the original way of handling append direct I/O consumes
some page cache. How much page cache it consumes depends on the direct
I/O size; for example, a 1MB direct I/O consumes 1MB of page cache. But
since the direct I/O size cannot be too large, the page cache it
consumes cannot be too large either. And those pages can be freed once
the direct I/O finishes by calling invalidate_mapping_pages().
>
> Thanks,
> Joseph
>
>> Thanks,
>> Ryan
>>
>>
>


* [Ocfs2-devel] [PATCH 0/9 v6] ocfs2: support append O_DIRECT write
  2015-08-05  4:40   ` Ryan Ding
@ 2015-08-05  6:40     ` Joseph Qi
  2015-08-05  8:07       ` Ryan Ding
  2015-08-05  7:08     ` Joseph Qi
  1 sibling, 1 reply; 14+ messages in thread
From: Joseph Qi @ 2015-08-05  6:40 UTC (permalink / raw)
  To: ocfs2-devel

On 2015/8/5 12:40, Ryan Ding wrote:
> Hi Joseph,
> 
> 
> On 08/04/2015 05:03 PM, Joseph Qi wrote:
>> Hi Ryan,
>>
>> On 2015/8/4 14:16, Ryan Ding wrote:
>>> Hi Joseph,
>>>
>>> Sorry for bothering you with the old patches. But I really need to know what this patch is for.
>>>
>>> https://oss.oracle.com/pipermail/ocfs2-devel/2015-January/010496.html
>>>
>>>  From above email archive, you mentioned those patches aim to reduce the host page cache consumption. But in my opinion, after append direct io, the page used for buffer is clean. System can realloc those cached pages. We can even call invalidate_mapping_pages to fast that process. Maybe more pages will be needed during direct io. But direct io size can not be too large, right?
>>>
>> We introduced the append direct io because originally ocfs2 would fall
>> back to buffer io in case of thin provision, which was not the actual
>> behavior that user expect.
> direct io has 2 semantics:
> 1. io is performed synchronously, data is guaranteed to be transferred after write syscall return.
> 2. File I/O is done directly to/from user space buffers. No page buffer involved.
> But I think #2 is invisible to user space, #1 is the only thing that user space is really interested in.
> We should balance the benefit and disadvantage to determine whether #2 should be supported.
> The disadvantage is: bring too much complexity to the code, bugs will come along. And involved a incompatible feature.
> For example, I did a single node sparse file test, and it failed.
What do you mean by "failed"? Could you please send out the test case
and the actual output?
And which version did you test? Some bug fixes were submitted later.
Currently, direct I/O into a hole is not supported.

> The original way of ocfs2 handling direct io(turn to buffer io when it's append write or write to a file hole) has 2 consideration:
> 1. easier to support cluster wide coherence.
> 2. easier to support sparse file.
> But it seems that your patch handle #2 not very well.
> There may be more issues that I have not found.
>> I didn't get you that more pages would be needed during direct io. Could
>> you please explain it more clearly?
> I mean the original way of handle append-dio will consume some page cache. The page cache size it consume depend on the direct io size. For example, 1MB direct io will consume 1MB page cache.But since direct io size can not be too large, the page cache it consume can not be too large also. And those pages can be freed after direct io finished by calling invalidate_mapping_pages().
>>
I've got your point. Please consider the following user scenario:
1. A node mounts several ocfs2 volumes, for example, 10.
2. On each ocfs2 volume, there are several thin-provisioned VMs.

>> Thanks,
>> Joseph
>>
>>> Thanks,
>>> Ryan
>>>
>>>
>>
> 
> 
> .
> 


* [Ocfs2-devel] [PATCH 0/9 v6] ocfs2: support append O_DIRECT write
  2015-08-05  4:40   ` Ryan Ding
  2015-08-05  6:40     ` Joseph Qi
@ 2015-08-05  7:08     ` Joseph Qi
  1 sibling, 0 replies; 14+ messages in thread
From: Joseph Qi @ 2015-08-05  7:08 UTC (permalink / raw)
  To: ocfs2-devel

On 2015/8/5 12:40, Ryan Ding wrote:
> Hi Joseph,
> 
> 
> On 08/04/2015 05:03 PM, Joseph Qi wrote:
>> Hi Ryan,
>>
>> On 2015/8/4 14:16, Ryan Ding wrote:
>>> Hi Joseph,
>>>
>>> Sorry for bothering you with the old patches. But I really need to know what this patch is for.
>>>
>>> https://oss.oracle.com/pipermail/ocfs2-devel/2015-January/010496.html
>>>
>>>  From above email archive, you mentioned those patches aim to reduce the host page cache consumption. But in my opinion, after append direct io, the page used for buffer is clean. System can realloc those cached pages. We can even call invalidate_mapping_pages to fast that process. Maybe more pages will be needed during direct io. But direct io size can not be too large, right?
>>>
>> We introduced the append direct io because originally ocfs2 would fall
>> back to buffer io in case of thin provision, which was not the actual
>> behavior that user expect.
> direct io has 2 semantics:
> 1. io is performed synchronously, data is guaranteed to be transferred after write syscall return.
> 2. File I/O is done directly to/from user space buffers. No page buffer involved.
> But I think #2 is invisible to user space, #1 is the only thing that user space is really interested in.
> We should balance the benefit and disadvantage to determine whether #2 should be supported.
> The disadvantage is: bring too much complexity to the code, bugs will come along. And involved a incompatible feature.
> For example, I did a single node sparse file test, and it failed.
> The original way of ocfs2 handling direct io(turn to buffer io when it's append write or write to a file hole) has 2 consideration:
> 1. easier to support cluster wide coherence.
> 2. easier to support sparse file.
> But it seems that your patch handle #2 not very well.
> There may be more issues that I have not found.
>> I didn't get you that more pages would be needed during direct io. Could
>> you please explain it more clearly?
> I mean the original way of handle append-dio will consume some page cache. The page cache size it consume depend on the direct io size. For example, 1MB direct io will consume 1MB page cache.But since direct io size can not be too large, the page cache it consume can not be too large also. And those pages can be freed after direct io finished by calling invalidate_mapping_pages().
>>
One more thing: in ocfs2, o2net_wq is a single-threaded workqueue with
many responsibilities, including unlocking when reclaiming cache. That
means the reclaim path becomes much longer and involves the network.
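
To illustrate the serialization I am worried about (a generic sketch,
not the actual o2net code; all names are made up): with a
single-threaded workqueue every item runs one after another on the same
worker, so an unlock request queued during cache reclaim has to wait
behind every other queued message, each of which may need a network
round trip.

#include <linux/errno.h>
#include <linux/workqueue.h>

static struct workqueue_struct *msg_wq;   /* like o2net_wq */
static struct work_struct unlock_work;

static void send_unlock(struct work_struct *work)
{
        /* would send the unlock message over the wire */
}

static int msg_init(void)
{
        msg_wq = create_singlethread_workqueue("example_msg");
        if (!msg_wq)
                return -ENOMEM;
        INIT_WORK(&unlock_work, send_unlock);
        return 0;
}

static void reclaim_path_unlock(void)
{
        /* runs only after all previously queued work has finished */
        queue_work(msg_wq, &unlock_work);
}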

>> Thanks,
>> Joseph
>>
>>> Thanks,
>>> Ryan
>>>
>>>
>>
> 
> 
> .
> 


* [Ocfs2-devel] [PATCH 0/9 v6] ocfs2: support append O_DIRECT write
  2015-08-05  6:40     ` Joseph Qi
@ 2015-08-05  8:07       ` Ryan Ding
  2015-08-05 11:18         ` Joseph Qi
  0 siblings, 1 reply; 14+ messages in thread
From: Ryan Ding @ 2015-08-05  8:07 UTC (permalink / raw)
  To: ocfs2-devel


On 08/05/2015 02:40 PM, Joseph Qi wrote:
> On 2015/8/5 12:40, Ryan Ding wrote:
>> Hi Joseph,
>>
>>
>> On 08/04/2015 05:03 PM, Joseph Qi wrote:
>>> Hi Ryan,
>>>
>>> On 2015/8/4 14:16, Ryan Ding wrote:
>>>> Hi Joseph,
>>>>
>>>> Sorry for bothering you with the old patches. But I really need to know what this patch is for.
>>>>
>>>> https://oss.oracle.com/pipermail/ocfs2-devel/2015-January/010496.html
>>>>
>>>>   From above email archive, you mentioned those patches aim to reduce the host page cache consumption. But in my opinion, after append direct io, the page used for buffer is clean. System can realloc those cached pages. We can even call invalidate_mapping_pages to fast that process. Maybe more pages will be needed during direct io. But direct io size can not be too large, right?
>>>>
>>> We introduced the append direct io because originally ocfs2 would fall
>>> back to buffer io in case of thin provision, which was not the actual
>>> behavior that user expect.
>> direct io has 2 semantics:
>> 1. io is performed synchronously, data is guaranteed to be transferred after write syscall return.
>> 2. File I/O is done directly to/from user space buffers. No page buffer involved.
>> But I think #2 is invisible to user space, #1 is the only thing that user space is really interested in.
>> We should balance the benefit and disadvantage to determine whether #2 should be supported.
>> The disadvantage is: bring too much complexity to the code, bugs will come along. And involved a incompatible feature.
>> For example, I did a single node sparse file test, and it failed.
> What do you mean by "failed"? Could you please send out the test case
> and the actual output?
> And which version did you test? Because some bug fixes were submitted later.
> Currently doing direct io with hole is not support.
I used Linux 4.0, latest commit 39a8804455fb23f09157341d3ba7db6d7ae6ee76.
A simplified test case is:
dd if=/dev/zero of=/mnt/hello bs=512 count=1 oflag=direct && truncate 
/mnt/hello -s 2097152
The file 'hello' does not exist before the test. After this command,
'hello' should be all zero, but bytes 512~4096 contain random data.
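
For reference, the check I run afterwards is equivalent to this quick
user-space sketch (the path is just the one from the test above):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        const char *path = argc > 1 ? argv[1] : "/mnt/hello";
        unsigned char buf[4096];
        ssize_t n;
        off_t off = 0;
        int fd = open(path, O_RDONLY);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        while ((n = read(fd, buf, sizeof(buf))) > 0) {
                for (ssize_t i = 0; i < n; i++) {
                        if (buf[i] != 0) {
                                printf("non-zero byte at offset %lld\n",
                                       (long long)(off + i));
                                close(fd);
                                return 1;
                        }
                }
                off += n;
        }
        close(fd);
        printf("file is all zero\n");
        return 0;
}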
>
>> The original way of ocfs2 handling direct io(turn to buffer io when it's append write or write to a file hole) has 2 consideration:
>> 1. easier to support cluster wide coherence.
>> 2. easier to support sparse file.
>> But it seems that your patch handle #2 not very well.
>> There may be more issues that I have not found.
>>> I didn't get you that more pages would be needed during direct io. Could
>>> you please explain it more clearly?
>> I mean the original way of handle append-dio will consume some page cache. The page cache size it consume depend on the direct io size. For example, 1MB direct io will consume 1MB page cache.But since direct io size can not be too large, the page cache it consume can not be too large also. And those pages can be freed after direct io finished by calling invalidate_mapping_pages().
> I've got your point. Please consider the following user scenario.
> 1. A node mounted several ocfs2 volumes, for example, 10.
> 2. For each ocfs2 volume, there are several thin provision VMs.
Have many parallel direct I/Os been tested in that scenario?
About the issue you mentioned in the other mail, that o2net_wq will
block cache reclaim: invalidate_mapping_pages() only frees the page
cache pages that store data. It does not affect the metadata cache, so
it will not wait on an unlock. Is that right?
>
>>> Thanks,
>>> Joseph
>>>
>>>> Thanks,
>>>> Ryan
>>>>
>>>>
>>
>> .
>>
>


* [Ocfs2-devel] [PATCH 0/9 v6] ocfs2: support append O_DIRECT write
  2015-08-05  8:07       ` Ryan Ding
@ 2015-08-05 11:18         ` Joseph Qi
  2015-08-06  2:35           ` Ryan Ding
  0 siblings, 1 reply; 14+ messages in thread
From: Joseph Qi @ 2015-08-05 11:18 UTC (permalink / raw)
  To: ocfs2-devel

On 2015/8/5 16:07, Ryan Ding wrote:
> 
> On 08/05/2015 02:40 PM, Joseph Qi wrote:
>> On 2015/8/5 12:40, Ryan Ding wrote:
>>> Hi Joseph,
>>>
>>>
>>> On 08/04/2015 05:03 PM, Joseph Qi wrote:
>>>> Hi Ryan,
>>>>
>>>> On 2015/8/4 14:16, Ryan Ding wrote:
>>>>> Hi Joseph,
>>>>>
>>>>> Sorry for bothering you with the old patches. But I really need to know what this patch is for.
>>>>>
>>>>> https://oss.oracle.com/pipermail/ocfs2-devel/2015-January/010496.html
>>>>>
>>>>>   From above email archive, you mentioned those patches aim to reduce the host page cache consumption. But in my opinion, after append direct io, the page used for buffer is clean. System can realloc those cached pages. We can even call invalidate_mapping_pages to fast that process. Maybe more pages will be needed during direct io. But direct io size can not be too large, right?
>>>>>
>>>> We introduced the append direct io because originally ocfs2 would fall
>>>> back to buffer io in case of thin provision, which was not the actual
>>>> behavior that user expect.
>>> direct io has 2 semantics:
>>> 1. io is performed synchronously, data is guaranteed to be transferred after write syscall return.
>>> 2. File I/O is done directly to/from user space buffers. No page buffer involved.
>>> But I think #2 is invisible to user space, #1 is the only thing that user space is really interested in.
>>> We should balance the benefit and disadvantage to determine whether #2 should be supported.
>>> The disadvantage is: bring too much complexity to the code, bugs will come along. And involved a incompatible feature.
>>> For example, I did a single node sparse file test, and it failed.
>> What do you mean by "failed"? Could you please send out the test case
>> and the actual output?
>> And which version did you test? Because some bug fixes were submitted later.
>> Currently doing direct io with hole is not support.
> I use linux 4.0 latest commit 39a8804455fb23f09157341d3ba7db6d7ae6ee76
> A simplified test case is:
> dd if=/dev/zero of=/mnt/hello bs=512 count=1 oflag=direct && truncate /mnt/hello -s 2097152
> file 'hello' is not exist before test. After this command, file 'hello' should be all zero. But 512~4096 is some random data.
I've got the issue.
dd if=/dev/zero of=/mnt/hello bs=512 count=1 oflag=direct
The above dd command allocates a cluster but writes only 512B, leaving
the range from 512B up to the cluster size uninitialized.
truncate /mnt/hello -s 2097152
The above truncate is block aligned, so it only zeroes out 4K to 2M.
In my design I only considered zeroing out the head of the newly
allocated cluster, and left zeroing the tail to the next direct I/O for
performance reasons (there is no need to zero first and then write the
same range).
So to fix this issue we should at least zero out the block-aligned pad.
But that may be unnecessary in the case of continuous direct I/O. Do
you have any suggestions?
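
To make the ranges concrete, the minimal fix would zero something like
this (illustrative arithmetic only; the helper name is made up, not
patch code). In the dd case above, write_end = 512 and blocksize =
4096, so the block-aligned pad still holding stale data is [512, 4096);
everything past 4096 is what the later block-granular truncate already
handles.

#include <linux/kernel.h>
#include <linux/types.h>

static void dio_stale_pad(loff_t write_end, u32 blocksize,
                          loff_t *pad_start, loff_t *pad_len)
{
        loff_t block_end = round_up(write_end, blocksize);

        *pad_start = write_end;
        *pad_len   = block_end - write_end;   /* 3584 bytes in the example */
}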

>>
>>> The original way of ocfs2 handling direct io(turn to buffer io when it's append write or write to a file hole) has 2 consideration:
>>> 1. easier to support cluster wide coherence.
>>> 2. easier to support sparse file.
>>> But it seems that your patch handle #2 not very well.
>>> There may be more issues that I have not found.
>>>> I didn't get you that more pages would be needed during direct io. Could
>>>> you please explain it more clearly?
>>> I mean the original way of handle append-dio will consume some page cache. The page cache size it consume depend on the direct io size. For example, 1MB direct io will consume 1MB page cache.But since direct io size can not be too large, the page cache it consume can not be too large also. And those pages can be freed after direct io finished by calling invalidate_mapping_pages().
>> I've got your point. Please consider the following user scenario.
>> 1. A node mounted several ocfs2 volumes, for example, 10.
>> 2. For each ocfs2 volume, there are several thin provision VMs.
> Is there many direct io in parallelthat had been tested out?
> About o2net_wq will block reclaim cache issue you mentioned in another mail. invalidate_mapping_pages() only free the page cache pages that stored data. It will not affect meta data cache. So that will not wait unlock. Is that right?
>>
>>>> Thanks,
>>>> Joseph
>>>>
>>>>> Thanks,
>>>>> Ryan
>>>>>
>>>>>
>>>
>>> .
>>>
>>
> 
> 
> .
> 


* [Ocfs2-devel] [PATCH 0/9 v6] ocfs2: support append O_DIRECT write
  2015-08-05 11:18         ` Joseph Qi
@ 2015-08-06  2:35           ` Ryan Ding
  0 siblings, 0 replies; 14+ messages in thread
From: Ryan Ding @ 2015-08-06  2:35 UTC (permalink / raw)
  To: ocfs2-devel

Hi Joseph,


On 08/05/2015 07:18 PM, Joseph Qi wrote:
> On 2015/8/5 16:07, Ryan Ding wrote:
>> On 08/05/2015 02:40 PM, Joseph Qi wrote:
>>> On 2015/8/5 12:40, Ryan Ding wrote:
>>>> Hi Joseph,
>>>>
>>>>
>>>> On 08/04/2015 05:03 PM, Joseph Qi wrote:
>>>>> Hi Ryan,
>>>>>
>>>>> On 2015/8/4 14:16, Ryan Ding wrote:
>>>>>> Hi Joseph,
>>>>>>
>>>>>> Sorry for bothering you with the old patches. But I really need to know what this patch is for.
>>>>>>
>>>>>> https://oss.oracle.com/pipermail/ocfs2-devel/2015-January/010496.html
>>>>>>
>>>>>>    From above email archive, you mentioned those patches aim to reduce the host page cache consumption. But in my opinion, after append direct io, the page used for buffer is clean. System can realloc those cached pages. We can even call invalidate_mapping_pages to fast that process. Maybe more pages will be needed during direct io. But direct io size can not be too large, right?
>>>>>>
>>>>> We introduced the append direct io because originally ocfs2 would fall
>>>>> back to buffer io in case of thin provision, which was not the actual
>>>>> behavior that user expect.
>>>> direct io has 2 semantics:
>>>> 1. io is performed synchronously, data is guaranteed to be transferred after write syscall return.
>>>> 2. File I/O is done directly to/from user space buffers. No page buffer involved.
>>>> But I think #2 is invisible to user space, #1 is the only thing that user space is really interested in.
>>>> We should balance the benefit and disadvantage to determine whether #2 should be supported.
>>>> The disadvantage is: bring too much complexity to the code, bugs will come along. And involved a incompatible feature.
>>>> For example, I did a single node sparse file test, and it failed.
>>> What do you mean by "failed"? Could you please send out the test case
>>> and the actual output?
>>> And which version did you test? Because some bug fixes were submitted later.
>>> Currently doing direct io with hole is not support.
>> I use linux 4.0 latest commit 39a8804455fb23f09157341d3ba7db6d7ae6ee76
>> A simplified test case is:
>> dd if=/dev/zero of=/mnt/hello bs=512 count=1 oflag=direct && truncate /mnt/hello -s 2097152
>> file 'hello' is not exist before test. After this command, file 'hello' should be all zero. But 512~4096 is some random data.
> I've got the issue.
> dd if=/dev/zero of=/mnt/hello bs=512 count=1 oflag=direct
> The above dd command will allocate a cluster, but only write 512B, left
> 512B to cluster size uninitialized.
> truncate /mnt/hello -s 2097152
> The above truncate is just block aligned, so it will only zero out 4k to
> 2M.
> In my design, I have only considered zeroing out the current allocated
> cluster head, and left the tail zeroing out to the next direct io for
> performance consideration (we have no need to zero out first and then
> write).
> So to fix this issue, we should at least zero out the block aligned pad.
> But this may be unnecessary in case of continuous direct io. Do you have
> any suggestions?
I have an idea to resolve those problems. I will start a new mail
thread so it can be discussed.
>>>> The original way of ocfs2 handling direct io(turn to buffer io when it's append write or write to a file hole) has 2 consideration:
>>>> 1. easier to support cluster wide coherence.
>>>> 2. easier to support sparse file.
>>>> But it seems that your patch handle #2 not very well.
>>>> There may be more issues that I have not found.
>>>>> I didn't get you that more pages would be needed during direct io. Could
>>>>> you please explain it more clearly?
>>>> I mean the original way of handle append-dio will consume some page cache. The page cache size it consume depend on the direct io size. For example, 1MB direct io will consume 1MB page cache.But since direct io size can not be too large, the page cache it consume can not be too large also. And those pages can be freed after direct io finished by calling invalidate_mapping_pages().
>>> I've got your point. Please consider the following user scenario.
>>> 1. A node mounted several ocfs2 volumes, for example, 10.
>>> 2. For each ocfs2 volume, there are several thin provision VMs.
>> Is there many direct io in parallelthat had been tested out?
>> About o2net_wq will block reclaim cache issue you mentioned in another mail. invalidate_mapping_pages() only free the page cache pages that stored data. It will not affect meta data cache. So that will not wait unlock. Is that right?
>>>>> Thanks,
>>>>> Joseph
>>>>>
>>>>>> Thanks,
>>>>>> Ryan
>>>>>>
>>>>>>
>>>> .
>>>>
>>
>> .
>>
>


* [Ocfs2-devel] [PATCH 0/9 v6] ocfs2: support append O_DIRECT write
  2015-01-22  3:54       ` Joseph Qi
@ 2015-01-22  5:06         ` Junxiao Bi
  0 siblings, 0 replies; 14+ messages in thread
From: Junxiao Bi @ 2015-01-22  5:06 UTC (permalink / raw)
  To: ocfs2-devel

On 01/22/2015 11:54 AM, Joseph Qi wrote:
> On 2015/1/22 10:10, Junxiao Bi wrote:
>> On 01/20/2015 05:00 PM, Joseph Qi wrote:
>>> Hi Junxiao,
>>>
>>> On 2015/1/20 16:26, Junxiao Bi wrote:
>>>> Hi Joseph,
>>>>
>>>> Did this version make any performance improvement with v5? I tested v5,
>>>> and it didn't improve performance with original buffer write + sync.
>>> No performance difference between these two versions.
>>> But we have tested with fio before, it shows about 5% performance
>>> improvement with normal buffer write (without sync).
>>> As I described, this feature is not truly for performance improvement.
>>> We aim to reduce the host page cache consumption. For example, dom0
>>> in virtualization case which won't be configured too much memory.
>> I cared the direct-io performance because recently i got a bug that
>> ocfs2 appending write direct-io performance was two times less than
>> ext4. Since you followed some idea from ext4, do you know why ext4 is
>> faster and any advise to improve ocfs2 performance?
> I have sent an related email "[RFD] ocfs2: poor performance on
> append write/punch hole" before to discuss this topic but unfortunately
> no reply.
> https://oss.oracle.com/pipermail/ocfs2-devel/2014-January/009585.html
> 
> Current ocfs2 append O_DIRECT write will fall back to buffer io and then
> do journal force commit. And my test shows jbd2_journal_force_commit
> consumes about 90% out of the total time.
> 
> After this patch set, io will go directly to disk and no longer need
> page cache and force commit. But block allocation, meta data update,
> as well as other necessary steps are still there.
Yes, I think these journal operations may be the key to the performance.
I traced ext4 direct I/O while running a dd test and found there was no
journal operation during each write; about 5s after dd finished, some
journal flushes went to disk, so it looks like ext4 defers the journal
commits. If ocfs2 could do this, performance might be good as well.

Thanks,
Junxiao.
> 
> Yes, I have followed ext4 when implementing this feature. And in my
> opinion, the flow is almost the same with ext4. So I am not sure if
> the difference is caused by disk layout.
> 
> --
> Joseph
>>
>> Thanks,
>> Junxiao.
>>>
>>> --
>>> Joseph
>>>>
>>>> Thanks,
>>>> Junxiao.
>>>>
>>>> On 01/20/2015 04:01 PM, Joseph Qi wrote:
>>>>> Currently in case of append O_DIRECT write (block not allocated yet),
>>>>> ocfs2 will fall back to buffered I/O. This has some disadvantages.
>>>>> Firstly, it is not the behavior as expected.
>>>>> Secondly, it will consume huge page cache, e.g. in mass backup scenario.
>>>>> Thirdly, modern filesystems such as ext4 support this feature.
>>>>>
>>>>> In this patch set, the direct I/O write doesn't fallback to buffer I/O
>>>>> write any more because the allocate blocks are enabled in direct I/O
>>>>> now.
>>>>>
>>>>> changelog:
>>>>> v6 <- v5:
>>>>> -- Take Mark's advice to use prefix "dio-" to distinguish dio orphan
>>>>>    entry from unlink/rename.
>>>>> -- Take Mark's advice to treat this feature as a ro compat feature.
>>>>> -- Fix a bug in case of not cluster aligned io, cluster_align should
>>>>>    be !zero_len, not !!zero_len.
>>>>> -- Fix a bug in case of fallocate with FALLOC_FL_KEEP_SIZE.
>>>>> -- Fix the wrong *ppos and written when completing the rest request
>>>>>    using buffer io.
>>>>>
>>>>> Corresponding ocfs2 tools (mkfs.ocfs2, tunefs.ocfs2, fsck.ocfs2, etc.)
>>>>> will be updated later.
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Ocfs2-devel mailing list
>>>>> Ocfs2-devel at oss.oracle.com
>>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>> .
>>
> 
> 


* [Ocfs2-devel] [PATCH 0/9 v6] ocfs2: support append O_DIRECT write
  2015-01-22  2:10     ` Junxiao Bi
@ 2015-01-22  3:54       ` Joseph Qi
  2015-01-22  5:06         ` Junxiao Bi
  0 siblings, 1 reply; 14+ messages in thread
From: Joseph Qi @ 2015-01-22  3:54 UTC (permalink / raw)
  To: ocfs2-devel

On 2015/1/22 10:10, Junxiao Bi wrote:
> On 01/20/2015 05:00 PM, Joseph Qi wrote:
>> Hi Junxiao,
>>
>> On 2015/1/20 16:26, Junxiao Bi wrote:
>>> Hi Joseph,
>>>
>>> Did this version make any performance improvement with v5? I tested v5,
>>> and it didn't improve performance with original buffer write + sync.
>> No performance difference between these two versions.
>> But we have tested with fio before, it shows about 5% performance
>> improvement with normal buffer write (without sync).
>> As I described, this feature is not truly for performance improvement.
>> We aim to reduce the host page cache consumption. For example, dom0
>> in virtualization case which won't be configured too much memory.
> I cared the direct-io performance because recently i got a bug that
> ocfs2 appending write direct-io performance was two times less than
> ext4. Since you followed some idea from ext4, do you know why ext4 is
> faster and any advise to improve ocfs2 performance?
I sent a related email, "[RFD] ocfs2: poor performance on append
write/punch hole", to discuss this topic before, but unfortunately got
no reply.
https://oss.oracle.com/pipermail/ocfs2-devel/2014-January/009585.html

Currently, an ocfs2 append O_DIRECT write falls back to buffered I/O
and then does a journal force commit. My test shows that
jbd2_journal_force_commit consumes about 90% of the total time.

After this patch set, the I/O goes directly to disk and no longer needs
the page cache or the force commit. But block allocation, metadata
updates, and other necessary steps are still there.
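
In other words, the cost difference is roughly the one below (a sketch,
not the actual ocfs2 code path): forcing a commit makes the caller wait
for the transaction to reach disk, while merely stopping the handle
lets jbd2 batch transactions and flush them on its periodic commit
timer, which appears to be what ext4 relies on.

#include <linux/err.h>
#include <linux/jbd2.h>
#include <linux/types.h>

static int update_meta(journal_t *journal, bool force)
{
        handle_t *handle = jbd2_journal_start(journal, 1);

        if (IS_ERR(handle))
                return PTR_ERR(handle);
        /* ... jbd2_journal_get_write_access() / dirty metadata ... */
        jbd2_journal_stop(handle);

        /* This synchronous wait is where my test attributes ~90% of
         * the total time. */
        return force ? jbd2_journal_force_commit(journal) : 0;
}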

Yes, I followed ext4 when implementing this feature, and in my opinion
the flow is almost the same as ext4's. So I am not sure whether the
difference is caused by the disk layout.

--
Joseph
> 
> Thanks,
> Junxiao.
>>
>> --
>> Joseph
>>>
>>> Thanks,
>>> Junxiao.
>>>
>>> On 01/20/2015 04:01 PM, Joseph Qi wrote:
>>>> Currently in case of append O_DIRECT write (block not allocated yet),
>>>> ocfs2 will fall back to buffered I/O. This has some disadvantages.
>>>> Firstly, it is not the behavior as expected.
>>>> Secondly, it will consume huge page cache, e.g. in mass backup scenario.
>>>> Thirdly, modern filesystems such as ext4 support this feature.
>>>>
>>>> In this patch set, the direct I/O write doesn't fallback to buffer I/O
>>>> write any more because the allocate blocks are enabled in direct I/O
>>>> now.
>>>>
>>>> changelog:
>>>> v6 <- v5:
>>>> -- Take Mark's advice to use prefix "dio-" to distinguish dio orphan
>>>>    entry from unlink/rename.
>>>> -- Take Mark's advice to treat this feature as a ro compat feature.
>>>> -- Fix a bug in case of not cluster aligned io, cluster_align should
>>>>    be !zero_len, not !!zero_len.
>>>> -- Fix a bug in case of fallocate with FALLOC_FL_KEEP_SIZE.
>>>> -- Fix the wrong *ppos and written when completing the rest request
>>>>    using buffer io.
>>>>
>>>> Corresponding ocfs2 tools (mkfs.ocfs2, tunefs.ocfs2, fsck.ocfs2, etc.)
>>>> will be updated later.
>>>>
>>>>
>>>> _______________________________________________
>>>> Ocfs2-devel mailing list
>>>> Ocfs2-devel at oss.oracle.com
>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>>>
>>>
>>>
>>>
>>
>>
> 
> 
> .
> 


* [Ocfs2-devel] [PATCH 0/9 v6] ocfs2: support append O_DIRECT write
  2015-01-20  9:00   ` Joseph Qi
@ 2015-01-22  2:10     ` Junxiao Bi
  2015-01-22  3:54       ` Joseph Qi
  0 siblings, 1 reply; 14+ messages in thread
From: Junxiao Bi @ 2015-01-22  2:10 UTC (permalink / raw)
  To: ocfs2-devel

On 01/20/2015 05:00 PM, Joseph Qi wrote:
> Hi Junxiao,
> 
> On 2015/1/20 16:26, Junxiao Bi wrote:
>> Hi Joseph,
>>
>> Did this version make any performance improvement with v5? I tested v5,
>> and it didn't improve performance with original buffer write + sync.
> No performance difference between these two versions.
> But we have tested with fio before, it shows about 5% performance
> improvement with normal buffer write (without sync).
> As I described, this feature is not truly for performance improvement.
> We aim to reduce the host page cache consumption. For example, dom0
> in virtualization case which won't be configured too much memory.
I care about direct I/O performance because I recently got a bug report
that ocfs2 appending direct-I/O write performance was two times lower
than ext4's. Since you followed some ideas from ext4, do you know why
ext4 is faster, and do you have any advice for improving ocfs2
performance?

Thanks,
Junxiao.
> 
> --
> Joseph
>>
>> Thanks,
>> Junxiao.
>>
>> On 01/20/2015 04:01 PM, Joseph Qi wrote:
>>> Currently in case of append O_DIRECT write (block not allocated yet),
>>> ocfs2 will fall back to buffered I/O. This has some disadvantages.
>>> Firstly, it is not the behavior as expected.
>>> Secondly, it will consume huge page cache, e.g. in mass backup scenario.
>>> Thirdly, modern filesystems such as ext4 support this feature.
>>>
>>> In this patch set, the direct I/O write doesn't fallback to buffer I/O
>>> write any more because the allocate blocks are enabled in direct I/O
>>> now.
>>>
>>> changelog:
>>> v6 <- v5:
>>> -- Take Mark's advice to use prefix "dio-" to distinguish dio orphan
>>>    entry from unlink/rename.
>>> -- Take Mark's advice to treat this feature as a ro compat feature.
>>> -- Fix a bug in case of not cluster aligned io, cluster_align should
>>>    be !zero_len, not !!zero_len.
>>> -- Fix a bug in case of fallocate with FALLOC_FL_KEEP_SIZE.
>>> -- Fix the wrong *ppos and written when completing the rest request
>>>    using buffer io.
>>>
>>> Corresponding ocfs2 tools (mkfs.ocfs2, tunefs.ocfs2, fsck.ocfs2, etc.)
>>> will be updated later.
>>>
>>>
>>> _______________________________________________
>>> Ocfs2-devel mailing list
>>> Ocfs2-devel at oss.oracle.com
>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>>
>>
>>
>>
> 
> 


* [Ocfs2-devel] [PATCH 0/9 v6] ocfs2: support append O_DIRECT write
  2015-01-20  8:26 ` Junxiao Bi
@ 2015-01-20  9:00   ` Joseph Qi
  2015-01-22  2:10     ` Junxiao Bi
  0 siblings, 1 reply; 14+ messages in thread
From: Joseph Qi @ 2015-01-20  9:00 UTC (permalink / raw)
  To: ocfs2-devel

Hi Junxiao,

On 2015/1/20 16:26, Junxiao Bi wrote:
> Hi Joseph,
> 
> Did this version make any performance improvement with v5? I tested v5,
> and it didn't improve performance with original buffer write + sync.
There is no performance difference between these two versions.
But we have tested with fio before, and it shows about a 5% performance
improvement over a normal buffered write (without sync).
As I described, this feature is not really about performance
improvement. We aim to reduce host page cache consumption, for example
for dom0 in the virtualization case, which won't be configured with
much memory.

--
Joseph
> 
> Thanks,
> Junxiao.
> 
> On 01/20/2015 04:01 PM, Joseph Qi wrote:
>> Currently in case of append O_DIRECT write (block not allocated yet),
>> ocfs2 will fall back to buffered I/O. This has some disadvantages.
>> Firstly, it is not the behavior as expected.
>> Secondly, it will consume huge page cache, e.g. in mass backup scenario.
>> Thirdly, modern filesystems such as ext4 support this feature.
>>
>> In this patch set, the direct I/O write doesn't fallback to buffer I/O
>> write any more because the allocate blocks are enabled in direct I/O
>> now.
>>
>> changelog:
>> v6 <- v5:
>> -- Take Mark's advice to use prefix "dio-" to distinguish dio orphan
>>    entry from unlink/rename.
>> -- Take Mark's advice to treat this feature as a ro compat feature.
>> -- Fix a bug in case of not cluster aligned io, cluster_align should
>>    be !zero_len, not !!zero_len.
>> -- Fix a bug in case of fallocate with FALLOC_FL_KEEP_SIZE.
>> -- Fix the wrong *ppos and written when completing the rest request
>>    using buffer io.
>>
>> Corresponding ocfs2 tools (mkfs.ocfs2, tunefs.ocfs2, fsck.ocfs2, etc.)
>> will be updated later.
>>
>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>
> 
> 
> 


* [Ocfs2-devel] [PATCH 0/9 v6] ocfs2: support append O_DIRECT write
  2015-01-20  8:01 Joseph Qi
@ 2015-01-20  8:26 ` Junxiao Bi
  2015-01-20  9:00   ` Joseph Qi
  0 siblings, 1 reply; 14+ messages in thread
From: Junxiao Bi @ 2015-01-20  8:26 UTC (permalink / raw)
  To: ocfs2-devel

Hi Joseph,

Does this version make any performance improvement over v5? I tested
v5, and it didn't improve performance compared with the original
buffered write + sync.

Thanks,
Junxiao.

On 01/20/2015 04:01 PM, Joseph Qi wrote:
> Currently in case of append O_DIRECT write (block not allocated yet),
> ocfs2 will fall back to buffered I/O. This has some disadvantages.
> Firstly, it is not the behavior as expected.
> Secondly, it will consume huge page cache, e.g. in mass backup scenario.
> Thirdly, modern filesystems such as ext4 support this feature.
> 
> In this patch set, the direct I/O write doesn't fallback to buffer I/O
> write any more because the allocate blocks are enabled in direct I/O
> now.
> 
> changelog:
> v6 <- v5:
> -- Take Mark's advice to use prefix "dio-" to distinguish dio orphan
>    entry from unlink/rename.
> -- Take Mark's advice to treat this feature as a ro compat feature.
> -- Fix a bug in case of not cluster aligned io, cluster_align should
>    be !zero_len, not !!zero_len.
> -- Fix a bug in case of fallocate with FALLOC_FL_KEEP_SIZE.
> -- Fix the wrong *ppos and written when completing the rest request
>    using buffer io.
> 
> Corresponding ocfs2 tools (mkfs.ocfs2, tunefs.ocfs2, fsck.ocfs2, etc.)
> will be updated later.
> 
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 


* [Ocfs2-devel] [PATCH 0/9 v6] ocfs2: support append O_DIRECT write
@ 2015-01-20  8:01 Joseph Qi
  2015-01-20  8:26 ` Junxiao Bi
  0 siblings, 1 reply; 14+ messages in thread
From: Joseph Qi @ 2015-01-20  8:01 UTC (permalink / raw)
  To: ocfs2-devel

Currently, in the case of an append O_DIRECT write (blocks not yet
allocated), ocfs2 falls back to buffered I/O. This has some
disadvantages.
Firstly, it is not the expected behavior.
Secondly, it consumes a huge amount of page cache, e.g. in a mass
backup scenario.
Thirdly, modern filesystems such as ext4 already support this feature.

In this patch set, the direct I/O write no longer falls back to
buffered I/O, because block allocation is now enabled in the direct I/O
path.
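
For reference, the kind of workload this series targets looks like the
following user-space sketch (illustrative only; the mount point and
file name are placeholders, and error handling is trimmed):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        const size_t len = 4096;            /* one block, aligned size */
        void *buf;
        int fd;

        if (posix_memalign(&buf, 4096, len))
                return 1;
        memset(buf, 'a', len);

        /* Append O_DIRECT write to a fresh file: before this series,
         * ocfs2 silently fell back to buffered I/O for the newly
         * allocated tail. */
        fd = open("/mnt/ocfs2/testfile",
                  O_WRONLY | O_CREAT | O_APPEND | O_DIRECT, 0644);
        if (fd < 0)
                return 1;
        if (write(fd, buf, len) != (ssize_t)len)
                return 1;
        close(fd);
        free(buf);
        return 0;
}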

changelog:
v6 <- v5:
-- Take Mark's advice to use the prefix "dio-" to distinguish the dio
   orphan entry from unlink/rename.
-- Take Mark's advice to treat this feature as a ro-compat feature.
-- Fix a bug in the non-cluster-aligned I/O case: cluster_align should
   be !zero_len, not !!zero_len.
-- Fix a bug in the case of fallocate with FALLOC_FL_KEEP_SIZE.
-- Fix the wrong *ppos and written count when completing the rest of
   the request with buffered I/O.

Corresponding ocfs2 tools (mkfs.ocfs2, tunefs.ocfs2, fsck.ocfs2, etc.)
will be updated later.


Thread overview: 14+ messages
2015-08-04  6:16 [Ocfs2-devel] [PATCH 0/9 v6] ocfs2: support append O_DIRECT write Ryan Ding
2015-08-04  9:03 ` Joseph Qi
2015-08-05  4:40   ` Ryan Ding
2015-08-05  6:40     ` Joseph Qi
2015-08-05  8:07       ` Ryan Ding
2015-08-05 11:18         ` Joseph Qi
2015-08-06  2:35           ` Ryan Ding
2015-08-05  7:08     ` Joseph Qi
  -- strict thread matches above, loose matches on Subject: below --
2015-01-20  8:01 Joseph Qi
2015-01-20  8:26 ` Junxiao Bi
2015-01-20  9:00   ` Joseph Qi
2015-01-22  2:10     ` Junxiao Bi
2015-01-22  3:54       ` Joseph Qi
2015-01-22  5:06         ` Junxiao Bi
