All of lore.kernel.org
 help / color / mirror / Atom feed
* [Lustre-devel] Flush on file close
@ 2010-04-19 18:30 Andrew Perepechko
  2010-04-19 22:31 ` Oleg Drokin
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Andrew Perepechko @ 2010-04-19 18:30 UTC (permalink / raw)
  To: lustre-devel

Some applications expect non-zero errno on close() for any errors that may 
happen during flushing dirty cached data/metadata even though linux manual 
page for close(2) suggests that fsync(2) should be used prior to close(2) in 
order to detect problems like those.

Since syncing may degrade performance to a large extent, what do you think is 
the best/most convenient/least intrusive way to switch to that behaviour? 
Should it be a mount option for the client or anything else?

Andrew.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Lustre-devel] Flush on file close
  2010-04-19 18:30 [Lustre-devel] Flush on file close Andrew Perepechko
@ 2010-04-19 22:31 ` Oleg Drokin
  2010-04-20  2:34 ` Andreas Dilger
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Oleg Drokin @ 2010-04-19 22:31 UTC (permalink / raw)
  To: lustre-devel

Hello!

On Apr 19, 2010, at 2:30 PM, Andrew Perepechko wrote:
> Since syncing may degrade performance to a large extent, what do you think is 
> the best/most convenient/least intrusive way to switch to that behaviour? 
> Should it be a mount option for the client or anything else?

Personally I think this is a non-issue for us.
There is no mad dash in other filesystems to implement this, aside from NFS where this
is what is guaranteed.
As long as we don't lose data after close (which we usually don't anyway) we are in the clear.
If we do (eviction, double failure) then there was some system exception that is kind of
similar to disk failure for regular fs.
Apps doing this "close after each write" should be fixed anyway because now they incur
additional overhead of two extra unneeded sync RPCs (open + close) and adding more overhead
to this is not going to help even if made optional.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Lustre-devel] Flush on file close
  2010-04-19 18:30 [Lustre-devel] Flush on file close Andrew Perepechko
  2010-04-19 22:31 ` Oleg Drokin
@ 2010-04-20  2:34 ` Andreas Dilger
  2010-04-20  5:16   ` Oleg Drokin
  2010-04-20  4:04 ` Andreas Dilger
  2010-04-20 20:27 ` Nicolas Williams
  3 siblings, 1 reply; 12+ messages in thread
From: Andreas Dilger @ 2010-04-20  2:34 UTC (permalink / raw)
  To: lustre-devel

One thing we can do to improve this situation a bit is to return any  
previous write error codes at close time.

Cheers, Andreas

On 2010-04-19, at 12:30, Andrew Perepechko <Andrew.Perepechko@Sun.COM>  
wrote:

> Some applications expect non-zero errno on close() for any errors  
> that may
> happen during flushing dirty cached data/metadata even though linux  
> manual
> page for close(2) suggests that fsync(2) should be used prior to  
> close(2) in
> order to detect problems like those.
>
> Since syncing may degrade performance to a large extent, what do you  
> think is
> the best/most convenient/least intrusive way to switch to that  
> behaviour?
> Should it be a mount option for the client or anything else?
>
> Andrew.
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Lustre-devel] Flush on file close
  2010-04-19 18:30 [Lustre-devel] Flush on file close Andrew Perepechko
  2010-04-19 22:31 ` Oleg Drokin
  2010-04-20  2:34 ` Andreas Dilger
@ 2010-04-20  4:04 ` Andreas Dilger
  2010-04-20 20:27 ` Nicolas Williams
  3 siblings, 0 replies; 12+ messages in thread
From: Andreas Dilger @ 2010-04-20  4:04 UTC (permalink / raw)
  To: lustre-devel

On 2010-04-19, at 12:30, Andrew Perepechko wrote:
> Some applications expect non-zero errno on close() for any errors  
> that may
> happen during flushing dirty cached data/metadata even though linux  
> manual
> page for close(2) suggests that fsync(2) should be used prior to  
> close(2) in
> order to detect problems like those.
>
> Since syncing may degrade performance to a large extent, what do you  
> think is
> the best/most convenient/least intrusive way to switch to that  
> behaviour?
> Should it be a mount option for the client or anything else?


If the application is expecting the "flush all outstanding data"  
semantics of fsync() on close, it should just call fsync() and use the  
return code of that instead of overloading close() to do it.  There  
will be no worse performance by waiting for fsync() to complete, than  
waiting for a close that is flushing all of the data.

In contrast, if an application is NOT expecting synchronous flushing  
on close, then making every close flush all data WILL be a major  
performance impact.  Better to have applications do what they mean,  
and system calls do what they are supposed to, instead of adding in  
extra semantics that not every application wants.

Also, as Oleg wrote, if an application is doing close/open when it is  
really just trying to flush the data to disk (which I saw some traces  
of recently) then this is even WORSE than just doing an fsync().

Cheers, Andreas
--
Andreas Dilger
Principal Engineer, Lustre Group
Oracle Corporation Canada Inc.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Lustre-devel] Flush on file close
  2010-04-20  2:34 ` Andreas Dilger
@ 2010-04-20  5:16   ` Oleg Drokin
  2010-04-20  7:56     ` Mitchell Erblich
  2010-04-21 21:04     ` Andreas Dilger
  0 siblings, 2 replies; 12+ messages in thread
From: Oleg Drokin @ 2010-04-20  5:16 UTC (permalink / raw)
  To: lustre-devel

Actually this is already being done. We do set AS_error (or something like that).
There is only one exception where on eviction we forgot to implement this, I think.

On Apr 19, 2010, at 10:34 PM, Andreas Dilger wrote:

> One thing we can do to improve this situation a bit is to return any  
> previous write error codes at close time.
> 
> Cheers, Andreas
> 
> On 2010-04-19, at 12:30, Andrew Perepechko <Andrew.Perepechko@Sun.COM>  
> wrote:
> 
>> Some applications expect non-zero errno on close() for any errors  
>> that may
>> happen during flushing dirty cached data/metadata even though linux  
>> manual
>> page for close(2) suggests that fsync(2) should be used prior to  
>> close(2) in
>> order to detect problems like those.
>> 
>> Since syncing may degrade performance to a large extent, what do you  
>> think is
>> the best/most convenient/least intrusive way to switch to that  
>> behaviour?
>> Should it be a mount option for the client or anything else?
>> 
>> Andrew.
>> _______________________________________________
>> Lustre-devel mailing list
>> Lustre-devel at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Lustre-devel] Flush on file close
  2010-04-20  5:16   ` Oleg Drokin
@ 2010-04-20  7:56     ` Mitchell Erblich
  2010-04-20 19:46       ` Oleg Drokin
  2010-04-21 21:04     ` Andreas Dilger
  1 sibling, 1 reply; 12+ messages in thread
From: Mitchell Erblich @ 2010-04-20  7:56 UTC (permalink / raw)
  To: lustre-devel

Possible Suggestion,

			NFS allows async writes and then commits.

			I would almost suggest this..

			As yes, close's are not expected to fail.
			What would you do / should you do if a close failed?

			Mitchell Erblich

			

			
On Apr 19, 2010, at 10:16 PM, Oleg Drokin wrote:

> Actually this is already being done. We do set AS_error (or something like that).
> There is only one exception where on eviction we forgot to implement this, I think.
> 
> On Apr 19, 2010, at 10:34 PM, Andreas Dilger wrote:
> 
>> One thing we can do to improve this situation a bit is to return any  
>> previous write error codes at close time.
>> 
>> Cheers, Andreas
>> 
>> On 2010-04-19, at 12:30, Andrew Perepechko <Andrew.Perepechko@Sun.COM>  
>> wrote:
>> 
>>> Some applications expect non-zero errno on close() for any errors  
>>> that may
>>> happen during flushing dirty cached data/metadata even though linux  
>>> manual
>>> page for close(2) suggests that fsync(2) should be used prior to  
>>> close(2) in
>>> order to detect problems like those.
>>> 
>>> Since syncing may degrade performance to a large extent, what do you  
>>> think is
>>> the best/most convenient/least intrusive way to switch to that  
>>> behaviour?
>>> Should it be a mount option for the client or anything else?
>>> 
>>> Andrew.
>>> _______________________________________________
>>> Lustre-devel mailing list
>>> Lustre-devel at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>> _______________________________________________
>> Lustre-devel mailing list
>> Lustre-devel at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
> 
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Lustre-devel] Flush on file close
  2010-04-20  7:56     ` Mitchell Erblich
@ 2010-04-20 19:46       ` Oleg Drokin
  2010-04-20 19:57         ` Oleg Drokin
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Drokin @ 2010-04-20 19:46 UTC (permalink / raw)
  To: lustre-devel

Hello!

On Apr 20, 2010, at 3:56 AM, Mitchell Erblich wrote:

> 			NFS allows async writes and then commits.

It's all internal within NFS. There is no way for the client to say
"these writes are async", "now do commit" aside from
doing manual syncs for commits and using some sort of sync io open flags
to force "sync" writes.
Of course additional level of confusion is the "async writes" you reference
have nothing to do with real write system calls. In fact write system calls
put data in memory and data might stay there for who knows how long
(or until close) and only when NFS client decides (or VFS forces it to)
to write the data to the server is when these "async writes" are used,
Lustre has this too.

> 			I would almost suggest this..

It's already implemented sort of NFS-style except the close-to-open.

> 			As yes, close's are not expected to fail.

Huh?!

> 			What would you do / should you do if a close failed?

Well, you assume the file content is no longer correct and write it again, I guess.
Or if you are like 99% of people writing programs out there, you just bail out with an error ;)

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Lustre-devel] Flush on file close
  2010-04-20 19:46       ` Oleg Drokin
@ 2010-04-20 19:57         ` Oleg Drokin
  0 siblings, 0 replies; 12+ messages in thread
From: Oleg Drokin @ 2010-04-20 19:57 UTC (permalink / raw)
  To: lustre-devel

> It's all internal within NFS. There is no way for the client to say
> "these writes are async", "now do commit" aside from

I meant to say "no way for the application" here.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Lustre-devel] Flush on file close
  2010-04-19 18:30 [Lustre-devel] Flush on file close Andrew Perepechko
                   ` (2 preceding siblings ...)
  2010-04-20  4:04 ` Andreas Dilger
@ 2010-04-20 20:27 ` Nicolas Williams
  2010-04-20 20:43   ` Nicolas Williams
  3 siblings, 1 reply; 12+ messages in thread
From: Nicolas Williams @ 2010-04-20 20:27 UTC (permalink / raw)
  To: lustre-devel

POSIX behavior is that close(2) can fail, so if you care, you should
check its error status.  POSIX behavior is also that close(2) doesn't
imply an fsync(2) (much less a sync(2)).

NFS clients implement fsync(2)-on-close, but that's another story.

However, when writes deferred at close(2) time fail on a local
filesystem...  chances are that subsequent I/O will just fail.  Or at
least that's probably what many users will expect.  But does POSIX
require that?  I don't have it handy, but I'm pretty sure the answer is
"no".  With Lustre we could also have a close(2) whose deferred writes
fail long after the process that could handle the failure is gone.

Databases and so on should always fsync(2) when they need to ensure
consistent on-disk state.  Many apps that write(2) don't need to
fsync(2) (think of writes to log files).

Nico
-- 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Lustre-devel] Flush on file close
  2010-04-20 20:27 ` Nicolas Williams
@ 2010-04-20 20:43   ` Nicolas Williams
  0 siblings, 0 replies; 12+ messages in thread
From: Nicolas Williams @ 2010-04-20 20:43 UTC (permalink / raw)
  To: lustre-devel

On Tue, Apr 20, 2010 at 03:27:40PM -0500, Nicolas Williams wrote:
> However, when writes deferred at close(2) time fail on a local
> filesystem...  chances are that subsequent I/O will just fail.  Or at
> least that's probably what many users will expect.  But does POSIX
> require that?  I don't have it handy, but I'm pretty sure the answer is
> "no".  With Lustre we could also have a close(2) whose deferred writes
> fail long after the process that could handle the failure is gone.

To go out on a complete limb :) what we really need is a variant of
close(2) where eventual failure can be caught, even when the process
that called it has exit()ed.  Something like:

int close_or_spawn(int fd, <posix_spawn()-like arguments>);

or

int close_xid(int fd, uint64_t xid);   /*
                                        * Where some daemon(s) reads a
					* log of sucessful/failed close
					* XIDs and takes action as
					* necessary.
					*/

I prefer something along the lines of close_xid().

Adoption of new APIs in the context of Lustre is probably more
realistic than in the general case, but it'd still be slow.

So we're still left with: you'd better fsync(2) explicitly before
close()ing if you want to make sure that you don't lose data.

Nico
-- 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Lustre-devel] Flush on file close
  2010-04-20  5:16   ` Oleg Drokin
  2010-04-20  7:56     ` Mitchell Erblich
@ 2010-04-21 21:04     ` Andreas Dilger
  2010-04-22  5:15       ` Alexey Lyashkov
  1 sibling, 1 reply; 12+ messages in thread
From: Andreas Dilger @ 2010-04-21 21:04 UTC (permalink / raw)
  To: lustre-devel

On 2010-04-19, at 23:16, Oleg Drokin wrote:
> Actually this is already being done. We do set AS_error (or something like that).  There is only one exception where on eviction we forgot to implement this, I think.

One thing I see in both the 1.x and 2.x client code is that even though we track the errors from async file IO on the file descriptor, we don't actually return them to the application at close time:

ll_file_release()
{
        if (lsm)
                lov_test_and_clear_async_rc(lsm);
        lli->lli_async_rc = 0;     

It looks like this should be something like:

        if (lsm)
                rc2 = lov_test_and_clear_async_rc(lsm);
        if (rc2 == 0)
                rc2 = lli->lli_async_rc;     
        lli->lli_async_rc = 0;
   
> On Apr 19, 2010, at 10:34 PM, Andreas Dilger wrote:
> 
>> One thing we can do to improve this situation a bit is to return any  
>> previous write error codes at close time.
>> 
>> Cheers, Andreas
>> 
>> On 2010-04-19, at 12:30, Andrew Perepechko <Andrew.Perepechko@Sun.COM>  
>> wrote:
>> 
>>> Some applications expect non-zero errno on close() for any errors  
>>> that may
>>> happen during flushing dirty cached data/metadata even though linux  
>>> manual
>>> page for close(2) suggests that fsync(2) should be used prior to  
>>> close(2) in
>>> order to detect problems like those.
>>> 
>>> Since syncing may degrade performance to a large extent, what do you  
>>> think is
>>> the best/most convenient/least intrusive way to switch to that  
>>> behaviour?
>>> Should it be a mount option for the client or anything else?
>>> 
>>> Andrew.
>>> _______________________________________________
>>> Lustre-devel mailing list
>>> Lustre-devel at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>> _______________________________________________
>> Lustre-devel mailing list
>> Lustre-devel at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
> 


Cheers, Andreas
--
Andreas Dilger
Principal Engineer, Lustre Group
Oracle Corporation Canada Inc.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Lustre-devel] Flush on file close
  2010-04-21 21:04     ` Andreas Dilger
@ 2010-04-22  5:15       ` Alexey Lyashkov
  0 siblings, 0 replies; 12+ messages in thread
From: Alexey Lyashkov @ 2010-04-22  5:15 UTC (permalink / raw)
  To: lustre-devel

Oleg, say about code:

                lock_page(page);
                wait_on_page_writeback(page);
                if (rc < 0) {
                        CERROR("writepage inode %lu(%p) of page %p "
                               "failed: %d\n", mapping->host->i_ino,
                               mapping->host, page, rc);
                        if (rc == -ENOSPC)
                                set_bit(AS_ENOSPC, &mapping->flags);
                        else
                                set_bit(AS_EIO, &mapping->flags);
                }

in lock cancel handler and IO completion.
This better way to tell linux kernel about hit a error, because lov async error isn't help in case O_SYNC write (and some other - as i remember).
int wait_on_page_writeback_range(struct address_space *mapping,
                                pgoff_t start, pgoff_t end)
...
        /* Check for outstanding write errors */
        if (test_and_clear_bit(AS_ENOSPC, &mapping->flags))
                ret = -ENOSPC;
        if (test_and_clear_bit(AS_EIO, &mapping->flags))
                ret = -EIO;

        return ret;


 
On Apr 22, 2010, at 00:04, Andreas Dilger wrote:

> On 2010-04-19, at 23:16, Oleg Drokin wrote:
>> Actually this is already being done. We do set AS_error (or something like that).  There is only one exception where on eviction we forgot to implement this, I think.
> 
> One thing I see in both the 1.x and 2.x client code is that even though we track the errors from async file IO on the file descriptor, we don't actually return them to the application at close time:
> 
> ll_file_release()
> {
>        if (lsm)
>                lov_test_and_clear_async_rc(lsm);
>        lli->lli_async_rc = 0;     
> 
> It looks like this should be something like:
> 
>        if (lsm)
>                rc2 = lov_test_and_clear_async_rc(lsm);
>        if (rc2 == 0)
>                rc2 = lli->lli_async_rc;     
>        lli->lli_async_rc = 0;
> 
>> On Apr 19, 2010, at 10:34 PM, Andreas Dilger wrote:
>> 
>>> One thing we can do to improve this situation a bit is to return any  
>>> previous write error codes at close time.
>>> 
>>> Cheers, Andreas
>>> 
>>> On 2010-04-19, at 12:30, Andrew Perepechko <Andrew.Perepechko@Sun.COM>  
>>> wrote:
>>> 
>>>> Some applications expect non-zero errno on close() for any errors  
>>>> that may
>>>> happen during flushing dirty cached data/metadata even though linux  
>>>> manual
>>>> page for close(2) suggests that fsync(2) should be used prior to  
>>>> close(2) in
>>>> order to detect problems like those.
>>>> 
>>>> Since syncing may degrade performance to a large extent, what do you  
>>>> think is
>>>> the best/most convenient/least intrusive way to switch to that  
>>>> behaviour?
>>>> Should it be a mount option for the client or anything else?
>>>> 
>>>> Andrew.
>>>> _______________________________________________
>>>> Lustre-devel mailing list
>>>> Lustre-devel at lists.lustre.org
>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>> _______________________________________________
>>> Lustre-devel mailing list
>>> Lustre-devel at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>> 
> 
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Engineer, Lustre Group
> Oracle Corporation Canada Inc.
> 
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2010-04-22  5:15 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-19 18:30 [Lustre-devel] Flush on file close Andrew Perepechko
2010-04-19 22:31 ` Oleg Drokin
2010-04-20  2:34 ` Andreas Dilger
2010-04-20  5:16   ` Oleg Drokin
2010-04-20  7:56     ` Mitchell Erblich
2010-04-20 19:46       ` Oleg Drokin
2010-04-20 19:57         ` Oleg Drokin
2010-04-21 21:04     ` Andreas Dilger
2010-04-22  5:15       ` Alexey Lyashkov
2010-04-20  4:04 ` Andreas Dilger
2010-04-20 20:27 ` Nicolas Williams
2010-04-20 20:43   ` Nicolas Williams

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.