* 4.1 client - LAYOUTCOMMIT
From: Sandeep Joshi @ 2010-07-01 23:47 UTC
To: linux-nfs; +Cc: bhalevy

As per the specification, the value of newoffset4_u.no_offset should be
less than or equal to NFS4_MAXFILEOFF. But I observe it to be
NFS4_MAXFILELEN.

regards,

Sandeep
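For reference, the bound in question can be sketched as below. The two constants carry the values given in the NFSv4.1 XDR description; the helper name valid_last_write_offset is hypothetical and exists only to illustrate the check, it is not part of any client or server.

```python
# Values as defined in the NFSv4.1 XDR description.
NFS4_MAXFILELEN = 0xFFFFFFFFFFFFFFFF  # largest file length
NFS4_MAXFILEOFF = 0xFFFFFFFFFFFFFFFE  # largest byte offset


def valid_last_write_offset(no_offset: int) -> bool:
    """Hypothetical helper: True if no_offset is a legal last-write offset.

    The offset names the last byte written, so it must stay one below
    the maximum file length.
    """
    return 0 <= no_offset <= NFS4_MAXFILEOFF


if __name__ == "__main__":
    print(valid_last_write_offset(NFS4_MAXFILEOFF))
    print(valid_last_write_offset(NFS4_MAXFILELEN))
```

Under this reading, a client that sends NFS4_MAXFILELEN as no_offset is exactly one past the permitted maximum.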
* 4.1 client - LAYOUTCOMMIT & close
From: Sandeep Joshi @ 2010-07-02 0:07 UTC
To: Sandeep Joshi, linux-nfs; +Cc: bhalevy

In certain cases, I don't see a layoutcommit on a file at all, even
after doing many writes.

Client side operations:

  open
  write(s)
  close

On server side (observed operations):

  open
  layoutget(s)
  close

But I do not see a layoutcommit at all. The client wrote about 4-5MB
of data.

When does the client issue a layoutcommit?

regards,

Sandeep
* Re: 4.1 client - LAYOUTCOMMIT & close
From: Andy Adamson @ 2010-07-02 15:41 UTC
To: Sandeep Joshi; +Cc: linux-nfs, bhalevy

On Jul 1, 2010, at 8:07 PM, Sandeep Joshi wrote:

Hi Sandeep

> In certain cases, I don't see layoutcommit on a file at all even
> after doing many writes.

FYI:

You should not be paying attention to layoutcommits - they have no
value for the file layout type.

From RFC 5661:

"The LAYOUTCOMMIT operation commits changes in the layout represented
by the current filehandle, client ID (derived from the session ID in
the preceding SEQUENCE operation), byte-range, and stateid."

For the block layout type, this sentence has meaning in that there is
a layoutupdate4 payload that enumerates the blocks that have changed
state from being 'handed out' to being 'written'.

The file layout type has no layoutupdate4 payload, and the layout does
not change due to writes, and thus the LAYOUTCOMMIT call is useless.

The only field in the LAYOUTCOMMIT4args that might possibly be useful
is the loca_last_write_offset, which tells the server what the client
thinks is the EOF of the file after WRITE. It is an extremely lame
server (file layout type server) that depends upon clients for this
info.

> Client side operations:
>
> open
> write(s)
> close
>
> On server side (observed operations):
>
> open
> layoutget's
> close
>
> But, I do not see laycommit at all. In terms data written by client
> it is about 4-5MB.
>
> When does client issue laycommit?

The latest Linux client sends a layoutcommit when the VFS does a
super_operations.write_inode call, which happens when the metadata of
an inode needs updating. We are seriously considering removing the
layoutcommit call from the file layout client.

-->Andy
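As an illustration of the loca_last_write_offset hint discussed above, a metadata server that chooses to consume it might do something like the following. This is a sketch under stated assumptions: the function name and the grow-only policy are hypothetical and are not taken from any real server implementation.

```python
from typing import Optional


def size_after_layoutcommit(current_size: int,
                            last_write_offset: Optional[int]) -> int:
    """Hypothetical MDS-side helper treating the client's offset as a hint.

    last_write_offset is the offset of the last byte the client wrote,
    or None when the client supplied no offset in the LAYOUTCOMMIT.
    """
    if last_write_offset is None:
        return current_size
    # The offset names a byte, so the implied size is one larger.
    return max(current_size, last_write_offset + 1)


if __name__ == "__main__":
    print(size_after_layoutcommit(0, 4095))
    print(size_after_layoutcommit(10000, 4095))
```

Because the implied size is the offset plus one, the offset itself can never reach the maximum file length, which is why its upper bound is NFS4_MAXFILEOFF.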
* Re: 4.1 client - LAYOUTCOMMIT & close
From: Suchit Kaura @ 2010-07-02 17:08 UTC
To: linux-nfs

> We are seriously considering removing the
> layoutcommit call from the file layout client.
* Re: 4.1 client - LAYOUTCOMMIT & close
From: Andy Adamson @ 2010-07-06 13:12 UTC
To: Suchit Kaura; +Cc: linux-nfs

On Jul 2, 2010, at 1:08 PM, Suchit Kaura wrote:

>> We are seriously considering removing the
>> layoutcommit call from the file layout client.
>
> From the RFC5661:
>
>    For block/volume-based layouts, LAYOUTCOMMIT may require updating the
>    block list that comprises the file and committing this layout to
>    stable storage. For file-based layouts, synchronization of
>    attributes between the metadata and storage devices, primarily the
>    size attribute, is required.
>
>    The control protocol is free to synchronize the attributes before it
>    receives a LAYOUTCOMMIT; however, upon successful completion of a
>    LAYOUTCOMMIT, state that exists on the metadata server that describes
>    the file MUST be synchronized with the state that exists on the
>    storage devices that comprise that file as of the client's last sent
>    operation. Thus, a client that queries the size of a file between a
>    WRITE to a storage device and the LAYOUTCOMMIT might observe a size
>    that does not reflect the actual data written.
>
> I understand and agree with the option that control protocol will
> update the information on the MDFS for file layout type but does the
> text above not mark layout commit as a consistency boundary even with
> servers supporting filelayouts?

For the file layout type, the COMMIT operation does this already, and
the LAYOUTCOMMIT is not needed. My reading of the above text is that
if a LAYOUTCOMMIT is sent and successfully completed then the 'MUST be
synchronized with the state ..... " text applies. But why would the
file layout type want two synchronization points (LAYOUTCOMMIT and
COMMIT)? So, why send a LAYOUTCOMMIT for the file layout type?

> or are we saying that every write or DFS must be synchronized with
> MDFS thru control protocol for file layout servers?

Nope, only on COMMIT.

-->Andy

> Regards,
> Suchit
* Re: 4.1 client - LAYOUTCOMMIT & close
From: Benny Halevy @ 2010-07-06 13:23 UTC
To: Andy Adamson; +Cc: Suchit Kaura, linux-nfs

On Jul. 06, 2010, 16:12 +0300, Andy Adamson <andros@netapp.com> wrote:

> For the file layout type, the COMMIT operation does this already, and
> the LAYOUTCOMMIT is not needed. My reading of the above text is that

This behavior is server implementation specific, isn't it?
What about a loosely clustered backend, is it required by the spec. to
communicate the file metadata in a cluster coherent way?

Benny
* RE: 4.1 client - LAYOUTCOMMIT & close
From: Daniel.Muntz @ 2010-07-02 21:46 UTC
To: andros, sjoshi; +Cc: linux-nfs, bhalevy

By "extremely lame server" I assume you mean any pNFS server that
doesn't have a cluster FS on the back end. So while this might work
well for NetApp (as long as NetApp never ships a non-clustered pNFS), it
might break others, or at least severely impact their performance. For
example, will the Solaris pNFS server work correctly without
LAYOUTCOMMIT? IMHO, the client MUST issue the appropriate LAYOUTCOMMIT,
but the server is free to handle it as a no-op if the server
implementation does not need to utilize the payload.

-Dan
* Re: 4.1 client - LAYOUTCOMMIT & close
From: Benny Halevy @ 2010-07-06 13:35 UTC
To: Daniel.Muntz; +Cc: andros, sjoshi, linux-nfs

On Jul. 03, 2010, 0:46 +0300, <Daniel.Muntz@emc.com> wrote:

> By "extremely lame server" I assume you mean any pNFS server that
> doesn't have a cluster FS on the back end. So while this might work
> well for NetApp (as long as NetApp never ships a non-clustered pNFS), it
> might break others, or at least severely impact their performance. For
> example, will the Solaris pNFS server work correctly without
> LAYOUTCOMMIT? IMHO, the client MUST issue the appropriate LAYOUTCOMMIT,
> but the server is free to handle it as a no-op if the server
> implementation does not need to utilize the payload.

I completely agree.

Only with Dave Noveck's suggestion of adding "LAYOUT_{DATA,FILE}_SYNC4"
stable_how4 values (or maybe a LAYOUT_SYNC4=4 or higher power-of-2 flag)
to be returned by a DS on WRITE can the DS say that it ensures metadata
synchronization with the MDS in a cluster coherent way, so that the
client can relax and avoid sending LAYOUTCOMMIT to the MDS.

Otherwise, the Linux implementation can potentially support a mount
option telling the client not to send a LAYOUTCOMMIT to the MDS, as an
optimization, if the admin is sure that the server doesn't require it.

Benny
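The client-side decision described in the message above might look roughly like this. Everything beyond the three existing stable_how4 values is an assumption: LAYOUT_SYNC4 = 4 is the flag floated in the proposal and is not defined in RFC 5661, and the no_layoutcommit_opt parameter stands in for the hypothetical mount option.

```python
# Existing stable_how4 values from NFSv4.
UNSTABLE4, DATA_SYNC4, FILE_SYNC4 = 0, 1, 2

# Hypothetical flag from the proposal; NOT part of RFC 5661.
LAYOUT_SYNC4 = 4


def needs_layoutcommit(ds_committed: int,
                       no_layoutcommit_opt: bool = False) -> bool:
    """Hypothetical check: must the client send LAYOUTCOMMIT to the MDS?

    ds_committed is the value a DS returned on WRITE, possibly with the
    proposed LAYOUT_SYNC4 bit set. no_layoutcommit_opt models an admin
    asserting, via a mount option, that the server does not need it.
    """
    if no_layoutcommit_opt:
        return False
    # Without the flag, the client cannot assume the DS informs the MDS.
    return not (ds_committed & LAYOUT_SYNC4)


if __name__ == "__main__":
    print(needs_layoutcommit(FILE_SYNC4))
    print(needs_layoutcommit(FILE_SYNC4 | LAYOUT_SYNC4))
```

The default stays on the safe side: unless the DS or the administrator says otherwise, the LAYOUTCOMMIT is sent.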
* Re: 4.1 client - LAYOUTCOMMIT & close
From: Andy Adamson @ 2010-07-06 13:37 UTC
To: Daniel.Muntz; +Cc: sjoshi, linux-nfs, bhalevy

On Jul 2, 2010, at 5:46 PM, <Daniel.Muntz@emc.com> wrote:

> By "extremely lame server" I assume you mean any pNFS server that
> doesn't have a cluster FS on the back end.

No, I mean a pNFS file layout type server that depends upon the 'hint'
of file size given by LAYOUTCOMMIT. This does not mean that the file
system has to be a cluster FS.

If COMMIT through MDS is set, the MDS to DS protocol (be it a cluster
FS or not) ensures the data is "committed" on the DSs. LAYOUTCOMMIT is
not needed.

If COMMITs are sent to the DSs (or FILE_SYNC writes), then the MDS to
DS protocol (be it a cluster FS or not) should kick off a back-end DS
to MDS communication to update the file size on the MDS.

What I consider an 'extremely lame pNFS file layout server' is one
that requires COMMITs to the DS and then depends upon the LAYOUTCOMMIT
to communicate the committed data size to the MDS.

-->Andy
* Re: 4.1 client - LAYOUTCOMMIT & close
From: Boaz Harrosh @ 2010-07-06 14:04 UTC
To: Andy Adamson; +Cc: Daniel.Muntz, sjoshi, linux-nfs, bhalevy

On 07/06/2010 04:37 PM, Andy Adamson wrote:

> What I consider an 'extremely lame pNFS file layout server' is one
> that requires COMMITs to the DS and then depends upon the LAYOUTCOMMIT
> to communicate the committed data size to the MDS.

(And mtime)

This is not "lame", this is "smart". There are tens of DSs but
thousands of clients with thousands of open files each; better to keep
the clients busy than the servers. You are not looking at scale.

Boaz
* RE: 4.1 client - LAYOUTCOMMIT & close
From: Daniel.Muntz @ 2010-07-06 19:20 UTC
To: andros; +Cc: sjoshi, linux-nfs, bhalevy

The COMMIT to the DS, to the best of my knowledge, commits data on the
DS. I see it as orthogonal to updating the metadata on the MDS (but
perhaps I'm wrong). As sjoshi@bluearc mentioned, the LAYOUTCOMMIT
provides a synchronization point, so even if the non-clustered server
does not want to update metadata on every DS I/O, the LAYOUTCOMMIT could
also be a trigger to execute whatever synchronization mechanism the
implementer wishes to put in the control protocol.

-Dan
* RE: 4.1 client - LAYOUTCOMMIT & close 2010-07-06 19:20 ` Daniel.Muntz @ 2010-07-06 20:40 ` Trond Myklebust 2010-07-06 22:50 ` Daniel.Muntz 2010-07-07 12:05 ` Benny Halevy 0 siblings, 2 replies; 38+ messages in thread From: Trond Myklebust @ 2010-07-06 20:40 UTC (permalink / raw) To: Daniel.Muntz; +Cc: andros, sjoshi, linux-nfs, bhalevy

On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@emc.com wrote:
> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> point, so even if the non-clustered server does not want to update
> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> execute whatever synchronization mechanism the implementer wishes to put
> in the control protocol.

As far as I'm aware, there are no exceptions in RFC5661 that would allow
pNFS servers to break the rule that any visible change to the data must
be atomically accompanied with a change attribute update.

As I see it, if your server allows one client to read data that may have
been modified by another client that holds a WRITE layout for that range
then (since that is a visible data change) it should provide a change
attribute update irrespective of whether or not a LAYOUTCOMMIT has been
sent.

If your MDS is incapable of determining whether or not data has changed
on the DSes, then it should probably recall the WRITE layout if someone
tries to read data that may have been modified. Said server also needs a
strategy for determining if a data change occurred if the client that
held the WRITE layout died before it could send the LAYOUTCOMMIT.
Cheers
  Trond

^ permalink raw reply [flat|nested] 38+ messages in thread
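The behaviour Trond describes in the message above (recall the WRITE layout when another client reads a range that may have been modified, and update the change attribute without waiting for a LAYOUTCOMMIT) can be sketched as a toy model. This is only an illustration under stated assumptions: `ToyMDS`, `WriteLayout`, and every method name are hypothetical, the recall list stands in for a real CB_LAYOUTRECALL, and the conservative bump of the change attribute is one possible policy, not something the thread or RFC 5661 prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class WriteLayout:
    """A WRITE layout held by one client over a byte range (hypothetical type)."""
    client: str
    offset: int
    length: int

    def overlaps(self, offset: int, length: int) -> bool:
        # Two half-open ranges overlap when each starts before the other ends.
        return self.offset < offset + length and offset < self.offset + self.length


@dataclass
class ToyMDS:
    """Toy metadata server that cannot see DS writes directly (assumption)."""
    change: int = 0
    layouts: list = field(default_factory=list)
    recalled: list = field(default_factory=list)

    def layoutget_write(self, client: str, offset: int, length: int) -> WriteLayout:
        layout = WriteLayout(client, offset, length)
        self.layouts.append(layout)
        return layout

    def read(self, client: str, offset: int, length: int) -> int:
        """Handle a read; return the change attribute the reader will observe."""
        conflicting = [lo for lo in self.layouts
                       if lo.client != client and lo.overlaps(offset, length)]
        for lo in conflicting:
            self.layouts.remove(lo)
            self.recalled.append(lo)  # stands in for CB_LAYOUTRECALL
        if conflicting:
            # The data may have changed, so bump the change attribute
            # conservatively instead of waiting for a LAYOUTCOMMIT.
            self.change += 1
        return self.change
```

A reader that does not touch another client's WRITE layout range sees no change; the first conflicting read triggers the recall and the attribute update. A real server would also need the crash-recovery strategy Trond mentions, which this sketch does not model.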
* RE: 4.1 client - LAYOUTCOMMIT & close 2010-07-06 20:40 ` Trond Myklebust @ 2010-07-06 22:50 ` Daniel.Muntz 2010-07-06 23:23 ` Trond Myklebust 2010-07-07 12:05 ` Benny Halevy 1 sibling, 1 reply; 38+ messages in thread From: Daniel.Muntz @ 2010-07-06 22:50 UTC (permalink / raw) To: trond.myklebust; +Cc: andros, sjoshi, linux-nfs, bhalevy

> -----Original Message-----
> From: Trond Myklebust [mailto:trond.myklebust@fys.uio.no]
> Sent: Tuesday, July 06, 2010 1:41 PM
> To: Muntz, Daniel
> Cc: andros@netapp.com; sjoshi@bluearc.com; linux-nfs@vger.kernel.org; bhalevy@panasas.com
> Subject: RE: 4.1 client - LAYOUTCOMMIT & close
>
> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@emc.com wrote:
> > The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> > orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> > As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> > point, so even if the non-clustered server does not want to update
> > metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> > execute whatever synchronization mechanism the implementer wishes to put
> > in the control protocol.
>
> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> pNFS servers to break the rule that any visible change to the data must
> be atomically accompanied with a change attribute update.

As we've discussed before, until a LAYOUTCOMMIT occurs, new data may or
may not be visible to clients.

Suppose my server takes the approach that a COMMIT guarantees that data
is written to a persistent intent log in NVRAM. On LAYOUTCOMMIT, file
data is updated from NVRAM and there is a change attribute update
(atomic). A client that does not issue LAYOUTCOMMITs will not be able
to write data.
If every WRITE to a DS has to atomically update metadata on the MDS,
perhaps we could improve performance by co-locating data and metadata on
a single server [1/2 :-)]

> As I see it, if your server allows one client to read data that may have
> been modified by another client that holds a WRITE layout for that range
> then (since that is a visible data change) it should provide a change
> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> sent.
> If your MDS is incapable of determining whether or not data has changed
> on the DSes, then it should probably recall the WRITE layout if someone
> tries to read data that may have been modified. Said server also needs a
> strategy for determining if a data change occurred if the client that
> held the WRITE layout died before it could send the LAYOUTCOMMIT.

Sounds like you're suggesting treating layouts as capabilities in the
files case, which is one way to solve the problem. Is anyone doing
this, or are the files implementations still all treating layouts as
simply data locators?

^ permalink raw reply [flat|nested] 38+ messages in thread
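The server design Dan proposes in the message above (COMMIT makes data durable in an NVRAM intent log; LAYOUTCOMMIT applies the log to the file and updates the change attribute in one step) can be illustrated with a small model. This is a sketch under assumptions, not any shipping server: `IntentLogServer` and its methods are hypothetical names, a Python list stands in for NVRAM, and "atomic" here only means both updates happen inside one method call.

```python
class IntentLogServer:
    """Toy model of a server that defers visibility until LAYOUTCOMMIT."""

    def __init__(self):
        self.data = bytearray()  # file contents visible to readers
        self.change = 0          # change attribute
        self.intent_log = []     # stands in for the persistent NVRAM log

    def commit(self, offset: int, payload: bytes) -> None:
        """COMMIT: record the write durably, but do not expose it yet."""
        self.intent_log.append((offset, bytes(payload)))

    def layoutcommit(self) -> int:
        """Apply all logged writes, then bump the change attribute once."""
        if not self.intent_log:
            return self.change  # nothing new, so the attribute stays put
        for offset, payload in self.intent_log:
            end = offset + len(payload)
            if end > len(self.data):
                # Grow the file with zero bytes so the write fits.
                self.data.extend(b"\0" * (end - len(self.data)))
            self.data[offset:end] = payload
        self.intent_log.clear()
        self.change += 1
        return self.change

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.data[offset:offset + length])
```

With this model, a client that never sends LAYOUTCOMMIT leaves its data in the log forever, which is the consequence Dan points out: on such a server the writes never become visible.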
* RE: 4.1 client - LAYOUTCOMMIT & close 2010-07-06 22:50 ` Daniel.Muntz @ 2010-07-06 23:23 ` Trond Myklebust 0 siblings, 0 replies; 38+ messages in thread From: Trond Myklebust @ 2010-07-06 23:23 UTC (permalink / raw) To: Daniel.Muntz; +Cc: andros, sjoshi, linux-nfs, bhalevy

On Tue, 2010-07-06 at 18:50 -0400, Daniel.Muntz@emc.com wrote:
> As we've discussed before, until a LAYOUTCOMMIT occurs, new data may or
> may not be visible to clients.
>
> Suppose my server takes the approach that a COMMIT guarantees that data
> is written to a persistent intent log in NVRAM. On LAYOUTCOMMIT, file
> data is updated from NVRAM and there is a change attribute update
> (atomic). A client that does not issue LAYOUTCOMMITs will not be able
> to write data.

That's fine unless you make those updates visible to other clients. It's
a rather expensive way of solving the problem, though.

> If every WRITE to a DS has to atomically update metadata on the MDS,
> perhaps we could improve performance by co-locating data and metadata on
> a single server [1/2 :-)]

You only need to update the metadata when someone requests a change
attribute or mtime through a GETATTR request to the MDS, so it shouldn't
be that difficult to implement.

> > As I see it, if your server allows one client to read data that may have
> > been modified by another client that holds a WRITE layout for that range
> > then (since that is a visible data change) it should provide a change
> > attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> > sent.
> > If your MDS is incapable of determining whether or not data has changed
> > on the DSes, then it should probably recall the WRITE layout if someone
> > tries to read data that may have been modified. Said server also needs a
> > strategy for determining if a data change occurred if the client that
> > held the WRITE layout died before it could send the LAYOUTCOMMIT.
>
> Sounds like you're suggesting treating layouts as capabilities in the
> files case, which is one way to solve the problem. Is anyone doing
> this, or are the files implementations still all treating layouts as
> simply data locators?

You shouldn't need it if you have a control protocol that conforms to
the definition in section 12.2.6.

Cheers
  Trond

^ permalink raw reply [flat|nested] 38+ messages in thread
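Trond's point in the message above, that the MDS only has to refresh metadata when a client actually asks for the change attribute or mtime via GETATTR, can be sketched as a lazy update. The classes below are hypothetical: `ToyDS` keeps a simple write counter, and `LazyMDS.getattr_change` polls the data servers, which stands in for whatever back-end control protocol a real server would use.

```python
class ToyDS:
    """Toy data server that counts the writes it has accepted."""

    def __init__(self):
        self.write_count = 0

    def write(self, payload: bytes) -> None:
        self.write_count += 1


class LazyMDS:
    """Toy MDS that refreshes the change attribute only on GETATTR."""

    def __init__(self, data_servers):
        self.data_servers = data_servers
        self.change = 0
        self._seen_writes = 0  # total DS writes observed at the last GETATTR

    def getattr_change(self) -> int:
        # Ask the data servers how much they have written (assumed to be
        # possible through the control protocol) and bump the change
        # attribute only if something happened since the last query.
        total = sum(ds.write_count for ds in self.data_servers)
        if total != self._seen_writes:
            self._seen_writes = total
            self.change += 1
        return self.change
```

In this model no WRITE to a DS touches the MDS at all; the cost is paid once per GETATTR, and several writes between two GETATTR calls collapse into a single change attribute update.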
* Re: 4.1 client - LAYOUTCOMMIT & close 2010-07-06 20:40 ` Trond Myklebust 2010-07-06 22:50 ` Daniel.Muntz @ 2010-07-07 12:05 ` Benny Halevy 2010-07-07 13:06 ` Trond Myklebust 1 sibling, 1 reply; 38+ messages in thread From: Benny Halevy @ 2010-07-07 12:05 UTC (permalink / raw) To: Trond Myklebust; +Cc: Daniel.Muntz, andros, sjoshi, linux-nfs, NFSv4

On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@emc.com wrote:
>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
>> point, so even if the non-clustered server does not want to update
>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
>> execute whatever synchronization mechanism the implementer wishes to put
>> in the control protocol.
>
> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> pNFS servers to break the rule that any visible change to the data must
> be atomically accompanied with a change attribute update.
>

Trond, I'm not sure how this rule you mentioned is specified.

See more in sections 12.5.4 and 12.5.4.1 ("LAYOUTCOMMIT and
change/time_modify") in particular:

   For some layout protocols, the storage device is able to notify the
   metadata server of the occurrence of an I/O; as a result, the change
   and time_modify attributes may be updated at the metadata server.
   For a metadata server that is capable of monitoring updates to the
   change and time_modify attributes, LAYOUTCOMMIT processing is not
   required to update the change attribute. In this case, the metadata
   server must ensure that no further update to the data has occurred
   since the last update of the attributes; file-based protocols may
   have enough information to make this determination or may update the
   change attribute upon each file modification. This also applies for
   the time_modify attribute. If the server implementation is able to
   determine that the file has not been modified since the last
   time_modify update, the server need not update time_modify at
   LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
   should be visible if that file was modified since the latest previous
   LAYOUTCOMMIT or LAYOUTGET

> As I see it, if your server allows one client to read data that may have
> been modified by another client that holds a WRITE layout for that range
> then (since that is a visible data change) it should provide a change
> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> sent.

The requirement for the server in WRITE's implementation section
is quite weak: "It is assumed that the act of writing data to a file will
cause the time_modified and change attributes of the file to be updated."

The difference here is that for pNFS the written data is not guaranteed
to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
are caching dirty data and use a write-behind cache, application-written data
may be visible to other processes on the same host but not to others until
fsync() or close() - open-to-close semantics are the only thing the client
guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensures the
data is committed to stable storage and is visible to all other clients in
the cluster.

Benny

^ permalink raw reply [flat|nested] 38+ messages in thread
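The client policy Benny suggests in the message above, issuing LAYOUTCOMMIT on fsync() and close(), can be sketched as follows. Everything here is a hypothetical model rather than the Linux client: `ToyPnfsFile` only records which operations it would send to the MDS, the writes and COMMITs that would go to the data servers are left out, and the value passed with LAYOUTCOMMIT follows my reading of loca_last_write_offset as the offset of the last byte written.

```python
class ToyPnfsFile:
    """Toy client-side file that sends LAYOUTCOMMIT on fsync() and close()."""

    def __init__(self):
        self.sent = []                  # operations "sent" to the MDS, in order
        self._last_write_offset = None  # None means nothing to commit

    def write(self, offset: int, payload: bytes) -> None:
        if not payload:
            return
        last = offset + len(payload) - 1  # offset of the last byte written
        if self._last_write_offset is None or last > self._last_write_offset:
            self._last_write_offset = last

    def _layoutcommit_if_dirty(self) -> None:
        if self._last_write_offset is not None:
            self.sent.append(("LAYOUTCOMMIT", self._last_write_offset))
            self._last_write_offset = None

    def fsync(self) -> None:
        self._layoutcommit_if_dirty()

    def close(self) -> None:
        # Commit anything still outstanding before the CLOSE goes out.
        self._layoutcommit_if_dirty()
        self.sent.append(("CLOSE",))
```

Under this policy the open, write(s), close sequence from the start of the thread would always produce a LAYOUTCOMMIT before the CLOSE, whereas a client that has nothing dirty sends only the CLOSE.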
* Re: 4.1 client - LAYOUTCOMMIT & close 2010-07-07 12:05 ` Benny Halevy @ 2010-07-07 13:06 ` Trond Myklebust 2010-07-07 13:18 ` [nfsv4] " Trond Myklebust 0 siblings, 1 reply; 38+ messages in thread From: Trond Myklebust @ 2010-07-07 13:06 UTC (permalink / raw) To: Benny Halevy; +Cc: Daniel.Muntz, andros, sjoshi, linux-nfs, NFSv4 On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote: > On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <trond.myklebust@fys.uio.no> wrote: > > On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@emc.com wrote: > >> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as > >> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong). > >> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization > >> point, so even if the non-clustered server does not want to update > >> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to > >> execute whatever synchronization mechanism the implementer wishes to put > >> in the control protocol. > > > > As far as I'm aware, there are no exceptions in RFC5661 that would allow > > pNFS servers to break the rule that any visible change to the data must > > be atomically accompanied with a change attribute update. > > > > Trond, I'm not sure how this rule you mentioned is specified. > > See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify > in particular: > > For some layout protocols, the storage device is able to notify the > metadata server of the occurrence of an I/O; as a result, the change > and time_modify attributes may be updated at the metadata server. > For a metadata server that is capable of monitoring updates to the > change and time_modify attributes, LAYOUTCOMMIT processing is not > required to update the change attribute. 
In this case, the metadata > server must ensure that no further update to the data has occurred > since the last update of the attributes; file-based protocols may > have enough information to make this determination or may update the > change attribute upon each file modification. This also applies for > the time_modify attribute. If the server implementation is able to > determine that the file has not been modified since the last > time_modify update, the server need not update time_modify at > LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes > should be visible if that file was modified since the latest previous > LAYOUTCOMMIT or LAYOUTGET I know. However the above paragraph does not state that the server should make those changes visible to clients other than the one that is writing. Section 18.32.4 states that writes will cause the time_modified and change attributes to be updated (if and only if the file data is modified). Several other sections rely on this behaviour, including section 10.3.1, section 11.7.2.2, and section 11.7.7. The only 'special behaviour' that I see allowed for pNFS is in section 13.10, which states that clients can't expect to see changes immediately, but that they must be able to expect close-to-open semantics to work. Again, if this is to be the case, then the server _must_ be able to deal with the case where client 1 dies before it can issue the LAYOUTCOMMIT. > > As I see it, if your server allows one client to read data that may have > > been modified by another client that holds a WRITE layout for that range > > then (since that is a visible data change) it should provide a change > > attribute update irrespective of whether or not a LAYOUTCOMMIT has been > > sent. > > the requirement for the server in WRITE's implementation section > is quite weak: "It is assumed that the act of writing data to a file will > cause the time_modified and change attributes of the file to be updated." 
> > The difference here is that for pNFS the written data is not guaranteed > to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients > are caching dirty data and use a write-behind cache, application-written data > may be visible to other processes on the same host but not to others until > fsync() or close() - open-to-close semantics are the only thing the client > guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensure the > data is committed to stable storage and is visible to all other clients in > the cluster. See above. I'm not disputing your statement that 'the written data is not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an assumption that 'the written data may be visible without an accompanying change attribute update'. Trond > Benny > > > If your MDS is incapable of determining whether or not data has changed > > on the DSes, then it should probably recall the WRITE layout if someone > > tries to read data that may have been modified. Said server also needs a > > strategy for determining if a data change occurred if the client that > > held the WRITE layout died before it could send the LAYOUTCOMMIT. > > > > Cheers > > Trond > > > >> -Dan > >> > >>> -----Original Message----- > >>> From: Andy Adamson [mailto:andros@netapp.com] > >>> Sent: Tuesday, July 06, 2010 6:38 AM > >>> To: Muntz, Daniel > >>> Cc: sjoshi@bluearc.com; linux-nfs@vger.kernel.org; bhalevy@panasas.com > >>> Subject: Re: 4.1 client - LAYOUTCOMMIT & close > >>> > >>> > >>> On Jul 2, 2010, at 5:46 PM, <Daniel.Muntz@emc.com> wrote: > >>> > >>>> By "extremely lame server" I assume you mean any pNFS server that > >>>> doesn't have a cluster FS on the back end. > >>> > >>> No, I mean a pNFS file layout type server that depends upon > >>> the 'hint' > >>> of file size given by LAYOUTCOMMIT. This does not mean that the file > >>> system has to be a cluster FS. 
> >>> > >>> If COMMIT through MDS is set, the MDS to DS protocol (be it a > >>> cluster > >>> FS or not) ensures the data is "commited" on the DSs. > >>> LAYOUTCOMMIT is > >>> not needed. > >>> > >>> If COMMITs are sent to the DSs (or FILE_SYNC writes), then > >>> the MDS to > >>> DS protocol (be it a cluster FS or not) should kick off a > >>> back-end DS > >>> to MDS communication to update the file size on the MDS. > >>> > >>> What I consider an 'extremely lame pNFS file layout server' is one > >>> that requires COMMITs to the DS and then depends upon the > >>> LAYOUTCOMMIT > >>> to communicate the commited data size to the MDS. > >>> > >>> -->Andy > >>> > >>> > >>>> So while this might work > >>>> well for NetApp (as long as NetApp never ships a non-clustered > >>>> pNFS), it > >>>> might break others, or at least severely impact their > >>> performance. > >>>> For > >>>> example, will the Solaris pNFS server work correctly without > >>>> LAYOUTCOMMIT? IMHO, the client MUST issue the appropriate > >>>> LAYOUTCOMMIT, > >>>> but the server is free to handle it as a no-op if the server > >>>> implementation does not need to utilize the payload. > >>>> > >>>> -Dan > >>>> > >>>>> -----Original Message----- > >>>>> From: linux-nfs-owner@vger.kernel.org > >>>>> [mailto:linux-nfs-owner@vger.kernel.org] On Behalf Of Andy Adamson > >>>>> Sent: Friday, July 02, 2010 8:41 AM > >>>>> To: Sandeep Joshi > >>>>> Cc: linux-nfs@vger.kernel.org; bhalevy@panasas.com > >>>>> Subject: Re: 4.1 client - LAYOUTCOMMIT & close > >>>>> > >>>>> > >>>>> On Jul 1, 2010, at 8:07 PM, Sandeep Joshi wrote: > >>>>> > >>>>> Hi Sandeep > >>>>> > >>>>>> > >>>>>> In certain cases, I don't see layoutcommit on a file at all even > >>>>>> after doing many writes. > >>>>> > >>>>> FYI: > >>>>> > >>>>> You should not be paying attention to layoutcommits - they have no > >>>>> value for the file layout type. 
> >>>>> > >>>>> From RFC 5661: > >>>>> > >>>>> "The LAYOUTCOMMIT operation commits chages in the layout > >>> represented > >>>>> by the current filehandle, client ID (derived from the > >>> session ID in > >>>>> the preceding SEQUENCE operation), byte-range, and stateid." > >>>>> > >>>>> For the block layout type, this sentence has meaning in that > >>>>> there is > >>>>> a layoutupdate4 payload that enumerates the blocks that > >>> have changed > >>>>> state from being 'handed out' to being 'written'. > >>>>> > >>>>> The file layout type has no layoutupdate4 payload, and the > >>>>> layout does > >>>>> not change due to writes, and thus the LAYOUTCOMMIT call > >>> is useless. > >>>>> > >>>>> The only field in the LAYOUTCOMMIT4args that might possibly > >>>>> be useful > >>>>> is the loca_last_write_offset which tells the server what > >>> the client > >>>>> thinks is the EOF of the file after WRITE. It is an extremely lame > >>>>> server (file layout type server) that depends upon clients for this > >>>>> info. > >>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> Client side operations: > >>>>>> > >>>>>> open > >>>>>> write(s) > >>>>>> close > >>>>>> > >>>>>> > >>>>>> On server side (observed operations): > >>>>>> > >>>>>> open > >>>>>> layoutget's > >>>>>> close > >>>>>> > >>>>>> > >>>>>> But, I do not see laycommit at all. In terms data written > >>>>> by client > >>>>>> it is about 4-5MB. > >>>>>> > >>>>>> When does client issue laycommit? > >>>>> > >>>>> The latest linux client sends a layout commit when the VFS does a > >>>>> super_operations.write_inode call which happens when the > >>> metadata of > >>>>> an inode needs updating. We are seriously considering removing the > >>>>> layoutcommit call from the file layout client. 
> >>>>> > >>>>> -->Andy > >>>>> > >>>>>> > >>>>>> > >>>>>> regards, > >>>>>> > >>>>>> Sandeep > >>>>>> > >>>>>> -- > >>>>>> To unsubscribe from this list: send the line "unsubscribe > >>>>> linux-nfs" > >>>>>> in > >>>>>> the body of a message to majordomo@vger.kernel.org > >>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html > >>>>> > >>>>> -- > >>>>> To unsubscribe from this list: send the line "unsubscribe > >>>>> linux-nfs" in > >>>>> the body of a message to majordomo@vger.kernel.org > >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html > >>>>> > >>>>> > >>> > >>> > >>> > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > >> the body of a message to majordomo@vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > > > > > > ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close 2010-07-07 13:06 ` Trond Myklebust @ 2010-07-07 13:18 ` Trond Myklebust 2010-07-07 13:51 ` Benny Halevy 0 siblings, 1 reply; 38+ messages in thread From: Trond Myklebust @ 2010-07-07 13:18 UTC (permalink / raw) To: Benny Halevy; +Cc: linux-nfs, NFSv4, andros On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote: > On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote: > > On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <trond.myklebust@fys.uio.no> wrote: > > > On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@emc.com wrote: > > >> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as > > >> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong). > > >> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization > > >> point, so even if the non-clustered server does not want to update > > >> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to > > >> execute whatever synchronization mechanism the implementer wishes to put > > >> in the control protocol. > > > > > > As far as I'm aware, there are no exceptions in RFC5661 that would allow > > > pNFS servers to break the rule that any visible change to the data must > > > be atomically accompanied with a change attribute update. > > > > > > > Trond, I'm not sure how this rule you mentioned is specified. > > > > See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify > > in particular: > > > > For some layout protocols, the storage device is able to notify the > > metadata server of the occurrence of an I/O; as a result, the change > > and time_modify attributes may be updated at the metadata server. > > For a metadata server that is capable of monitoring updates to the > > change and time_modify attributes, LAYOUTCOMMIT processing is not > > required to update the change attribute. 
In this case, the metadata > > server must ensure that no further update to the data has occurred > > since the last update of the attributes; file-based protocols may > > have enough information to make this determination or may update the > > change attribute upon each file modification. This also applies for > > the time_modify attribute. If the server implementation is able to > > determine that the file has not been modified since the last > > time_modify update, the server need not update time_modify at > > LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes > > should be visible if that file was modified since the latest previous > > LAYOUTCOMMIT or LAYOUTGET > > I know. However the above paragraph does not state that the server > should make those changes visible to clients other than the one that is > writing. > > Section 18.32.4 states that writes will cause the time_modified and > change attributes to be updated (if and only if the file data is > modified). Several other sections rely on this behaviour, including > section 10.3.1, section 11.7.2.2, and section 11.7.7. > > The only 'special behaviour' that I see allowed for pNFS is in section > 13.10, which states that clients can't expect to see changes > immediately, but that they must be able to expect close-to-open > semantics to work. Again, if this is to be the case, then the server > _must_ be able to deal with the case where client 1 dies before it can > issue the LAYOUTCOMMIT. > > > > > As I see it, if your server allows one client to read data that may have > > > been modified by another client that holds a WRITE layout for that range > > > then (since that is a visible data change) it should provide a change > > > attribute update irrespective of whether or not a LAYOUTCOMMIT has been > > > sent. 
> > > > the requirement for the server in WRITE's implementation section > > is quite weak: "It is assumed that the act of writing data to a file will > > cause the time_modified and change attributes of the file to be updated." > > > > The difference here is that for pNFS the written data is not guaranteed > > to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients > > are caching dirty data and use a write-behind cache, application-written data > > may be visible to other processes on the same host but not to others until > > fsync() or close() - open-to-close semantics are the only thing the client > > guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensure the > > data is committed to stable storage and is visible to all other clients in > > the cluster. > > See above. I'm not disputing your statement that 'the written data is > not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an > assumption that 'the written data may be visible without an accompanying > change attribute update'.

In other words, I'd expect the following scenario to give the same results in NFSv4.1 w/pNFS as it does in NFSv4:

Client 1                        Client 2
========                        ========

OPEN foo
READ
CLOSE
                                OPEN
                                LAYOUTGET ...
                                WRITE via DS
                                <dies>...
OPEN foo
verify change_attr
READ if above WRITE is visible
CLOSE

Trond

_______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www.ietf.org/mailman/listinfo/nfsv4 ^ permalink raw reply [flat|nested] 38+ messages in thread
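The scenario above can be sketched as a toy model, to make the close-to-open argument concrete. This is only an illustration under stated assumptions: ToyServer, ToyClient, write_via_ds, bump_on_ds_write and run_scenario are hypothetical names, not part of any NFS implementation, and the "server" is a single in-memory object standing in for the MDS plus data servers. Client 1 revalidates its cache on OPEN by comparing the change attribute; client 2 writes through a data server and dies before sending LAYOUTCOMMIT.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyServer:
    """Hypothetical stand-in for MDS + DSes; not a real NFS API."""
    data: bytes = b"old"
    change_attr: int = 1
    # Assumption: True models a server that updates the change attribute
    # whenever a DS write becomes visible, without waiting for LAYOUTCOMMIT.
    bump_on_ds_write: bool = True

    def write_via_ds(self, new_data: bytes) -> None:
        # The write is visible to readers immediately in this toy model.
        self.data = new_data
        if self.bump_on_ds_write:
            self.change_attr += 1


@dataclass
class ToyClient:
    cached_data: Optional[bytes] = None
    cached_change: Optional[int] = None

    def open_and_read(self, server: ToyServer) -> bytes:
        # Close-to-open: on OPEN, compare the change attribute and refetch
        # only if it differs from the cached value.
        if self.cached_change != server.change_attr:
            self.cached_data = server.data
            self.cached_change = server.change_attr
        return self.cached_data


def run_scenario(bump_on_ds_write: bool) -> bytes:
    server = ToyServer(bump_on_ds_write=bump_on_ds_write)
    client1 = ToyClient()
    client1.open_and_read(server)          # OPEN foo, READ, CLOSE
    server.write_via_ds(b"new")            # client 2: WRITE via DS, then dies
    return client1.open_and_read(server)   # OPEN foo, verify change_attr, READ
```

With bump_on_ds_write=True client 1 sees the new data on its second OPEN. With bump_on_ds_write=False the data on the server has changed but client 1 keeps serving its stale cache, which is the failure the scenario is meant to expose.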
* Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close 2010-07-07 13:18 ` [nfsv4] " Trond Myklebust @ 2010-07-07 13:51 ` Benny Halevy 2010-07-07 14:03 ` Trond Myklebust 0 siblings, 1 reply; 38+ messages in thread From: Benny Halevy @ 2010-07-07 13:51 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs, NFSv4, andros, Garth Gibson, Brent Welch On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote: >> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote: >>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <trond.myklebust@fys.uio.no> wrote: >>>> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@emc.com wrote: >>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as >>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong). >>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization >>>>> point, so even if the non-clustered server does not want to update >>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to >>>>> execute whatever synchronization mechanism the implementer wishes to put >>>>> in the control protocol. >>>> >>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow >>>> pNFS servers to break the rule that any visible change to the data must >>>> be atomically accompanied with a change attribute update. >>>> >>> >>> Trond, I'm not sure how this rule you mentioned is specified. >>> >>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify >>> in particular: >>> >>> For some layout protocols, the storage device is able to notify the >>> metadata server of the occurrence of an I/O; as a result, the change >>> and time_modify attributes may be updated at the metadata server. >>> For a metadata server that is capable of monitoring updates to the >>> change and time_modify attributes, LAYOUTCOMMIT processing is not >>> required to update the change attribute. 
In this case, the metadata >>> server must ensure that no further update to the data has occurred >>> since the last update of the attributes; file-based protocols may >>> have enough information to make this determination or may update the >>> change attribute upon each file modification. This also applies for >>> the time_modify attribute. If the server implementation is able to >>> determine that the file has not been modified since the last >>> time_modify update, the server need not update time_modify at >>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes >>> should be visible if that file was modified since the latest previous >>> LAYOUTCOMMIT or LAYOUTGET >> >> I know. However the above paragraph does not state that the server >> should make those changes visible to clients other than the one that is >> writing. >> >> Section 18.32.4 states that writes will cause the time_modified and >> change attributes to be updated (if and only if the file data is >> modified). Several other sections rely on this behaviour, including >> section 10.3.1, section 11.7.2.2, and section 11.7.7. >> >> The only 'special behaviour' that I see allowed for pNFS is in section >> 13.10, which states that clients can't expect to see changes >> immediately, but that they must be able to expect close-to-open >> semantics to work. Again, if this is to be the case, then the server >> _must_ be able to deal with the case where client 1 dies before it can >> issue the LAYOUTCOMMIT. Agreed. >> >> >>>> As I see it, if your server allows one client to read data that may have >>>> been modified by another client that holds a WRITE layout for that range >>>> then (since that is a visible data change) it should provide a change >>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been >>>> sent. 
>>> >>> the requirement for the server in WRITE's implementation section >>> is quite weak: "It is assumed that the act of writing data to a file will >>> cause the time_modified and change attributes of the file to be updated." >>> >>> The difference here is that for pNFS the written data is not guaranteed >>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients >>> are caching dirty data and use a write-behind cache, application-written data >>> may be visible to other processes on the same host but not to others until >>> fsync() or close() - open-to-close semantics are the only thing the client >>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensure the >>> data is committed to stable storage and is visible to all other clients in >>> the cluster. >> >> See above. I'm not disputing your statement that 'the written data is >> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an >> assumption that 'the written data may be visible without an accompanying >> change attribute update'. > > > In other words, I'd expect the following scenario to give the same > results in NFSv4.1 w/pNFS as it does in NFSv4: That's a strong requirement that may limit the scalability of the server. The spirit of the pNFS operations, at least from Panasas perspective was that the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible to clients other than the one who wrote it, and its associated metadata MUST be updated and describe the new data only on LAYOUTCOMMIT and until then it's undefined, i.e. it's up to the server implementation whether to update it or not. Without locking, what do the stronger semantics buy you? Even if a client verified the change_attribute new data may become visible at any time after the GETATTR if the file/byte range aren't locked. Benny > > Client 1 Client 2 > ======== ======== > > OPEN foo > READ > CLOSE > OPEN > LAYOUTGET ... > WRITE via DS > <dies>... 
> OPEN foo > verify change_attr > READ if above WRITE is visible > CLOSE > > Trond ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close 2010-07-07 13:51 ` Benny Halevy @ 2010-07-07 14:03 ` Trond Myklebust 2010-07-07 17:45 ` Dean Hildebrand 2010-07-07 20:39 ` Daniel.Muntz 0 siblings, 2 replies; 38+ messages in thread From: Trond Myklebust @ 2010-07-07 14:03 UTC (permalink / raw) To: Benny Halevy; +Cc: andros, linux-nfs, Garth Gibson, Brent Welch, NFSv4 On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote: > On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > > On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote: > >> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote: > >>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <trond.myklebust@fys.uio.no> wrote: > >>>> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@emc.com wrote: > >>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as > >>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong). > >>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization > >>>>> point, so even if the non-clustered server does not want to update > >>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to > >>>>> execute whatever synchronization mechanism the implementer wishes to put > >>>>> in the control protocol. > >>>> > >>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow > >>>> pNFS servers to break the rule that any visible change to the data must > >>>> be atomically accompanied with a change attribute update. > >>>> > >>> > >>> Trond, I'm not sure how this rule you mentioned is specified. > >>> > >>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify > >>> in particular: > >>> > >>> For some layout protocols, the storage device is able to notify the > >>> metadata server of the occurrence of an I/O; as a result, the change > >>> and time_modify attributes may be updated at the metadata server. 
> >>> For a metadata server that is capable of monitoring updates to the > >>> change and time_modify attributes, LAYOUTCOMMIT processing is not > >>> required to update the change attribute. In this case, the metadata > >>> server must ensure that no further update to the data has occurred > >>> since the last update of the attributes; file-based protocols may > >>> have enough information to make this determination or may update the > >>> change attribute upon each file modification. This also applies for > >>> the time_modify attribute. If the server implementation is able to > >>> determine that the file has not been modified since the last > >>> time_modify update, the server need not update time_modify at > >>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes > >>> should be visible if that file was modified since the latest previous > >>> LAYOUTCOMMIT or LAYOUTGET > >> > >> I know. However the above paragraph does not state that the server > >> should make those changes visible to clients other than the one that is > >> writing. > >> > >> Section 18.32.4 states that writes will cause the time_modified and > >> change attributes to be updated (if and only if the file data is > >> modified). Several other sections rely on this behaviour, including > >> section 10.3.1, section 11.7.2.2, and section 11.7.7. > >> > >> The only 'special behaviour' that I see allowed for pNFS is in section > >> 13.10, which states that clients can't expect to see changes > >> immediately, but that they must be able to expect close-to-open > >> semantics to work. Again, if this is to be the case, then the server > >> _must_ be able to deal with the case where client 1 dies before it can > >> issue the LAYOUTCOMMIT. > > Agreed. 
> > >> > >> > >>>> As I see it, if your server allows one client to read data that may have > >>>> been modified by another client that holds a WRITE layout for that range > >>>> then (since that is a visible data change) it should provide a change > >>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been > >>>> sent. > >>> > >>> the requirement for the server in WRITE's implementation section > >>> is quite weak: "It is assumed that the act of writing data to a file will > >>> cause the time_modified and change attributes of the file to be updated." > >>> > >>> The difference here is that for pNFS the written data is not guaranteed > >>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients > >>> are caching dirty data and use a write-behind cache, application-written data > >>> may be visible to other processes on the same host but not to others until > >>> fsync() or close() - open-to-close semantics are the only thing the client > >>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensure the > >>> data is committed to stable storage and is visible to all other clients in > >>> the cluster. > >> > >> See above. I'm not disputing your statement that 'the written data is > >> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an > >> assumption that 'the written data may be visible without an accompanying > >> change attribute update'. > > > > > > In other words, I'd expect the following scenario to give the same > > results in NFSv4.1 w/pNFS as it does in NFSv4: > > That's a strong requirement that may limit the scalability of the server. > > The spirit of the pNFS operations, at least from Panasas perspective was that > the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible > to clients other than the one who wrote it, and its associated metadata MUST > be updated and describe the new data only on LAYOUTCOMMIT and until then it's > undefined, i.e. 
it's up to the server implementation whether to update it or not. > > Without locking, what do the stronger semantics buy you? > Even if a client verified the change_attribute new data may become visible > at any time after the GETATTR if the file/byte range aren't locked.

There is no locking needed in the scenario below: it is ordinary close-to-open semantics.

The point is that if you remove the one and only way that clients have to determine whether or not their data caches are valid, then they can no longer cache data at all, and server scalability will be shot to smithereens anyway.

Trond

> Benny
>
> > Client 1                        Client 2
> > ========                        ========
> >
> > OPEN foo
> > READ
> > CLOSE
> >                                 OPEN
> >                                 LAYOUTGET ...
> >                                 WRITE via DS
> >                                 <dies>...
> > OPEN foo
> > verify change_attr
> > READ if above WRITE is visible
> > CLOSE
> >
> > Trond

^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close 2010-07-07 14:03 ` Trond Myklebust @ 2010-07-07 17:45 ` Dean Hildebrand 2010-07-07 20:39 ` Daniel.Muntz 1 sibling, 0 replies; 38+ messages in thread From: Dean Hildebrand @ 2010-07-07 17:45 UTC (permalink / raw) To: Trond Myklebust Cc: Benny Halevy, andros, linux-nfs, Garth Gibson, Brent Welch, NFSv4 On 7/7/2010 7:03 AM, Trond Myklebust wrote: > On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote: > >> On Jul. 07, 2010, 16:18 +0300, Trond Myklebust<Trond.Myklebust@netapp.com> wrote: >> >>> On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote: >>> >>>> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote: >>>> >>>>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust<trond.myklebust@fys.uio.no> wrote: >>>>> >>>>>> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@emc.com wrote: >>>>>> >>>>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as >>>>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong). >>>>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization >>>>>>> point, so even if the non-clustered server does not want to update >>>>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to >>>>>>> execute whatever synchronization mechanism the implementer wishes to put >>>>>>> in the control protocol. >>>>>>> >>>>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow >>>>>> pNFS servers to break the rule that any visible change to the data must >>>>>> be atomically accompanied with a change attribute update. >>>>>> >>>>>> >>>>> Trond, I'm not sure how this rule you mentioned is specified. >>>>> >>>>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify >>>>> in particular: >>>>> >>>>> For some layout protocols, the storage device is able to notify the >>>>> metadata server of the occurrence of an I/O; as a result, the change >>>>> and time_modify attributes may be updated at the metadata server. 
>>>>> For a metadata server that is capable of monitoring updates to the >>>>> change and time_modify attributes, LAYOUTCOMMIT processing is not >>>>> required to update the change attribute. In this case, the metadata >>>>> server must ensure that no further update to the data has occurred >>>>> since the last update of the attributes; file-based protocols may >>>>> have enough information to make this determination or may update the >>>>> change attribute upon each file modification. This also applies for >>>>> the time_modify attribute. If the server implementation is able to >>>>> determine that the file has not been modified since the last >>>>> time_modify update, the server need not update time_modify at >>>>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes >>>>> should be visible if that file was modified since the latest previous >>>>> LAYOUTCOMMIT or LAYOUTGET >>>>> >>>> I know. However the above paragraph does not state that the server >>>> should make those changes visible to clients other than the one that is >>>> writing. >>>> >>>> Section 18.32.4 states that writes will cause the time_modified and >>>> change attributes to be updated (if and only if the file data is >>>> modified). Several other sections rely on this behaviour, including >>>> section 10.3.1, section 11.7.2.2, and section 11.7.7. >>>> >>>> The only 'special behaviour' that I see allowed for pNFS is in section >>>> 13.10, which states that clients can't expect to see changes >>>> immediately, but that they must be able to expect close-to-open >>>> semantics to work. Again, if this is to be the case, then the server >>>> _must_ be able to deal with the case where client 1 dies before it can >>>> issue the LAYOUTCOMMIT. >>>> >> Agreed. 
>>
>>
>>>>
>>>>
>>>>>> As I see it, if your server allows one client to read data that may have
>>>>>> been modified by another client that holds a WRITE layout for that range
>>>>>> then (since that is a visible data change) it should provide a change
>>>>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
>>>>>> sent.
>>>>>>
>>>>> the requirement for the server in WRITE's implementation section
>>>>> is quite weak: "It is assumed that the act of writing data to a file will
>>>>> cause the time_modified and change attributes of the file to be updated."
>>>>>
>>>>> The difference here is that for pNFS the written data is not guaranteed
>>>>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
>>>>> are caching dirty data and use a write-behind cache, application-written data
>>>>> may be visible to other processes on the same host but not to others until
>>>>> fsync() or close() - open-to-close semantics are the only thing the client
>>>>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensure the
>>>>> data is committed to stable storage and is visible to all other clients in
>>>>> the cluster.
>>>>>
>>>> See above. I'm not disputing your statement that 'the written data is
>>>> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
>>>> assumption that 'the written data may be visible without an accompanying
>>>> change attribute update'.
>>>>
>>>
>>> In other words, I'd expect the following scenario to give the same
>>> results in NFSv4.1 w/pNFS as it does in NFSv4:
>>>
>> That's a strong requirement that may limit the scalability of the server.
>>
>> The spirit of the pNFS operations, at least from Panasas perspective was that
>> the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
>> to clients other than the one who wrote it, and its associated metadata MUST
>> be updated and describe the new data only on LAYOUTCOMMIT and until then it's
>> undefined, i.e.
it's up to the server implementation whether to update it or not.
>>
>> Without locking, what do the stronger semantics buy you?
>> Even if a client verified the change_attribute new data may become visible
>> at any time after the GETATTR if the file/byte range aren't locked.
>>
> There is no locking needed in the scenario below: it is ordinary
> close-to-open semantics.
>
> The point is that if you remove the one and only way that clients have
> to determine whether or not their data caches are valid, then they can
> no longer cache data at all, and server scalability will be shot to
> smithereens anyway.
>
It would seem that when the change_attr is changed depends on the server
implementation. If the server implementation promises NOT to modify the
file in place on a write, then it can postpone updating the change_attr
until LAYOUTCOMMIT (at which time the actual file data is updated). If
not, meaning that if client 1 can see the write by client 2 in the
example below, then the change_attr should be updated on every write (I
would guess it would only be updated when some server actually requested
it)

Dean

> Trond
>
>
>> Benny
>>
>>
>>> Client 1                          Client 2
>>> ========                          ========
>>>
>>> OPEN foo
>>> READ
>>> CLOSE
>>>                                   OPEN
>>>                                   LAYOUTGET ...
>>>                                   WRITE via DS
>>>                                   <dies>...
>>> OPEN foo
>>> verify change_attr
>>> READ if above WRITE is visible
>>> CLOSE
>>>
>>> Trond
>>> _______________________________________________
>>> nfsv4 mailing list
>>> nfsv4@ietf.org
>>> https://www.ietf.org/mailman/listinfo/nfsv4
>>>
>
> _______________________________________________
> nfsv4 mailing list
> nfsv4@ietf.org
> https://www.ietf.org/mailman/listinfo/nfsv4
>

^ permalink raw reply [flat|nested] 38+ messages in thread
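The scenario quoted in the message above can be played out in a small model. The following is only a sketch under stated assumptions: the `Server` and `Client` classes, the `bump_change_on_write` flag, and the `run_scenario` helper are hypothetical names invented for illustration, not code from any NFS implementation. It assumes a server whose data-server writes land in place (so they are visible at once) and a client that revalidates its data cache at OPEN by comparing the change attribute.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Server:
    """Toy MDS plus DS for a single file (hypothetical model)."""
    bump_change_on_write: bool   # the server policy under discussion
    data: bytes = b"old"
    change_attr: int = 1

    def write_via_ds(self, new_data: bytes) -> None:
        # Assumption: the write modifies the file in place, so any
        # later READ sees it even though no LAYOUTCOMMIT was sent.
        self.data = new_data
        if self.bump_change_on_write:
            self.change_attr += 1

    def getattr_change(self) -> int:
        return self.change_attr

    def read(self) -> bytes:
        return self.data


@dataclass
class Client:
    """Client whose data cache is validated by the change attribute."""
    cached_data: Optional[bytes] = None
    cached_change: Optional[int] = None

    def open_read_close(self, server: Server) -> bytes:
        change = server.getattr_change()          # "verify change_attr"
        if self.cached_data is None or change != self.cached_change:
            self.cached_data = server.read()      # cache empty or invalidated
            self.cached_change = change
        return self.cached_data


def run_scenario(bump_change_on_write: bool) -> Tuple[bytes, bytes]:
    """Return (what client 1 sees on its second OPEN, what the server holds)."""
    server = Server(bump_change_on_write)
    client1 = Client()
    client1.open_read_close(server)     # Client 1: OPEN foo, READ, CLOSE
    server.write_via_ds(b"new")         # Client 2: WRITE via DS, then dies
    seen = client1.open_read_close(server)
    return seen, server.read()
```

With `bump_change_on_write=True` client 1 refetches and sees the new data. With `False` it keeps serving its stale cached copy while the server already holds different data, which is the outcome the change attribute is meant to rule out.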
* Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
  2010-07-07 14:03 ` Trond Myklebust
  2010-07-07 17:45 ` Dean Hildebrand
@ 2010-07-07 20:39 ` Daniel.Muntz
  2010-07-07 21:01 ` Trond Myklebust
  1 sibling, 1 reply; 38+ messages in thread
From: Daniel.Muntz @ 2010-07-07 20:39 UTC (permalink / raw)
  To: Trond.Myklebust, bhalevy; +Cc: andros, linux-nfs, garth, welch, nfsv4

To bring this discussion full circle, since we agree that a compliant
server can implement a scheme where written data does not become visible
until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
"MUST" from a compliant client (independent of layout type)?

-Dan

^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
  2010-07-07 20:39 ` Daniel.Muntz
@ 2010-07-07 21:01 ` Trond Myklebust
  2010-07-07 22:04 ` Noveck_David
  0 siblings, 1 reply; 38+ messages in thread
From: Trond Myklebust @ 2010-07-07 21:01 UTC (permalink / raw)
  To: Daniel.Muntz; +Cc: linux-nfs, garth, welch, nfsv4, andros, bhalevy

On Wed, 2010-07-07 at 16:39 -0400, Daniel.Muntz@emc.com wrote:
> To bring this discussion full circle, since we agree that a compliant
> server can implement a scheme where written data does not become visible
> until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
> "MUST" from a compliant client (independent of layout type)?

Yes. I would agree that the client cannot rely on the updates being made
visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
compliant server MUST also have a valid strategy for dealing with the
case where the client doesn't send it.

Cheers
  Trond

^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
  2010-07-07 21:01 ` Trond Myklebust
@ 2010-07-07 22:04 ` Noveck_David
  2010-07-07 22:27 ` Trond Myklebust
  2010-07-07 22:44 ` david.black
  0 siblings, 2 replies; 38+ messages in thread
From: Noveck_David @ 2010-07-07 22:04 UTC (permalink / raw)
  To: Trond.Myklebust, Daniel.Muntz
  Cc: linux-nfs, garth, welch, nfsv4, andros, bhalevy

> Yes. I would agree that the client cannot rely on the updates being made
> visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> compliant server MUST also have a valid strategy for dealing with the
> case where the client doesn't send it.

So you are saying the updates "MUST be made visible" through the
server's valid strategy. Is that right.

And that the client cannot rely on that. Why not, if the server must
have a valid strategy.

Is this just prudent "belt and suspenders" design or what?

It seems to me that if one side here is MUST (and the spec needs to be
clearer about what might or might not constitute a valid strategy), then
the other side should be SHOULD.

If both sides are "MUST", then if things don't work out then the client
and server can equally point to one another and say "It's his fault".

Am I missing something here?

^ permalink raw reply [flat|nested] 38+ messages in thread
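What a "valid strategy" might look like on the server side can be sketched as follows. This is an illustrative model only, and everything in it is an assumption rather than something taken from RFC 5661 or an existing server: the `FileState` class, the `Recovery` enum, and the idea that the server notices the missing LAYOUTCOMMIT when the writer's lease expires are all hypothetical. The sketch assumes a server that stages data-server writes until they are published, and it keeps one invariant: the visible file data changes if and only if the change attribute changes.

```python
from enum import Enum
from typing import Optional


class Recovery(Enum):
    PUBLISH = "publish"   # keep the written data and bump the change attribute
    DISCARD = "discard"   # drop the written data so it never becomes visible


class FileState:
    """Toy per-file state on the metadata server (hypothetical model)."""

    def __init__(self, data: bytes) -> None:
        self.data = data                           # what readers can see
        self.change_attr = 1
        self.uncommitted: Optional[bytes] = None   # DS writes awaiting LAYOUTCOMMIT

    def write_via_ds(self, new_data: bytes) -> None:
        # Assumption: DS writes are staged and not yet visible to readers.
        self.uncommitted = new_data

    def layoutcommit(self) -> None:
        # Normal path: the client sends LAYOUTCOMMIT.
        self._publish()

    def writer_lease_expired(self, policy: Recovery) -> None:
        # Recovery path: the writer vanished without sending LAYOUTCOMMIT.
        if self.uncommitted is None:
            return
        if policy is Recovery.PUBLISH:
            self._publish()
        else:
            self.uncommitted = None

    def _publish(self) -> None:
        # Data and change attribute move together, never one without the other.
        if self.uncommitted is not None:
            self.data = self.uncommitted
            self.change_attr += 1
            self.uncommitted = None
```

Either policy leaves other clients able to trust the change attribute: after `PUBLISH` both the data and the attribute have moved, and after `DISCARD` neither has.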
* Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
  2010-07-07 22:04 ` Noveck_David
@ 2010-07-07 22:27 ` Trond Myklebust
  2010-07-07 22:44 ` david.black
  1 sibling, 0 replies; 38+ messages in thread
From: Trond Myklebust @ 2010-07-07 22:27 UTC (permalink / raw)
  To: Noveck_David; +Cc: linux-nfs, garth, welch, nfsv4, andros, bhalevy

On Wed, 2010-07-07 at 18:04 -0400, Noveck_David@emc.com wrote:
> > Yes. I would agree that the client cannot rely on the updates being made
> > visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> > compliant server MUST also have a valid strategy for dealing with the
> > case where the client doesn't send it.
>
> So you are saying the updates "MUST be made visible" through the
> server's valid strategy. Is that right.
>
> And that the client cannot rely on that. Why not, if the server must
> have a valid strategy.
>
> Is this just prudent "belt and suspenders" design or what?
>
> It seems to me that if one side here is MUST (and the spec needs to be
> clearer about what might or might not constitute a valid strategy), then
> the other side should be SHOULD.
>
> If both sides are "MUST", then if things don't work out then the client
> and server can equally point to one another and say "It's his fault".
>
> Am I missing something here?

See the example at the very bottom of this email.

If the client dies after it has written data to the data servers, but
before it can issue LAYOUTCOMMIT, then the server needs to have a
strategy for dealing with that. Either it has to figure out that changes
have been made, and to update the change attribute so that close-to-open
cache consistency works, or it needs to ensure that those changes are
not made visible.

A "solution" where the file data changes, but the client can't detect it
is not acceptable.
Trond > -----Original Message----- > From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf > Of Trond Myklebust > Sent: Wednesday, July 07, 2010 5:01 PM > To: Muntz, Daniel > Cc: linux-nfs@vger.kernel.org; garth@panasas.com; welch@panasas.com; > nfsv4@ietf.org; andros@netapp.com; bhalevy@panasas.com > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close > > On Wed, 2010-07-07 at 16:39 -0400, Daniel.Muntz@emc.com wrote: > > To bring this discussion full circle, since we agree that a compliant > > server can implement a scheme where written data does not become > visible > > until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a > > "MUST" from a compliant client (independent of layout type)? > > Yes. I would agree that the client cannot rely on the updates being made > visible if it fails to send the LAYOUTCOMMIT. My point was simply that a > compliant server MUST also have a valid strategy for dealing with the > case where the client doesn't send it. > > Cheers > Trond > > > -Dan > > > > > -----Original Message----- > > > From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] > > > On Behalf Of Trond Myklebust > > > Sent: Wednesday, July 07, 2010 7:04 AM > > > To: Benny Halevy > > > Cc: andros@netapp.com; linux-nfs@vger.kernel.org; Garth > > > Gibson; Brent Welch; NFSv4 > > > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close > > > > > > On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote: > > > > On Jul. 07, 2010, 16:18 +0300, Trond Myklebust > > > <Trond.Myklebust@netapp.com> wrote: > > > > > On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote: > > > > >> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote: > > > > >>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust > > > <trond.myklebust@fys.uio.no> wrote: > > > > >>>> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@emc.com > wrote: > > > > >>>>> The COMMIT to the DS, ttbomk, commits data on the DS. 
> > > I see it as > > > > >>>>> orthogonal to updating the metadata on the MDS (but > > > perhaps I'm wrong). > > > > >>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT > > > provides a synchronization > > > > >>>>> point, so even if the non-clustered server does not > > > want to update > > > > >>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also > > > be a trigger to > > > > >>>>> execute whatever synchronization mechanism the > > > implementer wishes to put > > > > >>>>> in the control protocol. > > > > >>>> > > > > >>>> As far as I'm aware, there are no exceptions in > > > RFC5661 that would allow > > > > >>>> pNFS servers to break the rule that any visible change > > > to the data must > > > > >>>> be atomically accompanied with a change attribute update. > > > > >>>> > > > > >>> > > > > >>> Trond, I'm not sure how this rule you mentioned is specified. > > > > >>> > > > > >>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT > > > and change/time_modify > > > > >>> in particular: > > > > >>> > > > > >>> For some layout protocols, the storage device is > > > able to notify the > > > > >>> metadata server of the occurrence of an I/O; as a > > > result, the change > > > > >>> and time_modify attributes may be updated at the > > > metadata server. > > > > >>> For a metadata server that is capable of monitoring > > > updates to the > > > > >>> change and time_modify attributes, LAYOUTCOMMIT > > > processing is not > > > > >>> required to update the change attribute. In this > > > case, the metadata > > > > >>> server must ensure that no further update to the > > > data has occurred > > > > >>> since the last update of the attributes; file-based > > > protocols may > > > > >>> have enough information to make this determination > > > or may update the > > > > >>> change attribute upon each file modification. This > > > also applies for > > > > >>> the time_modify attribute. 
If the server > > > implementation is able to > > > > >>> determine that the file has not been modified since the > last > > > > >>> time_modify update, the server need not update time_modify > at > > > > >>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the > > > updated attributes > > > > >>> should be visible if that file was modified since > > > the latest previous > > > > >>> LAYOUTCOMMIT or LAYOUTGET > > > > >> > > > > >> I know. However the above paragraph does not state that > > > the server > > > > >> should make those changes visible to clients other than > > > the one that is > > > > >> writing. > > > > >> > > > > >> Section 18.32.4 states that writes will cause the > > > time_modified and > > > > >> change attributes to be updated (if and only if the file data > is > > > > >> modified). Several other sections rely on this > > > behaviour, including > > > > >> section 10.3.1, section 11.7.2.2, and section 11.7.7. > > > > >> > > > > >> The only 'special behaviour' that I see allowed for pNFS > > > is in section > > > > >> 13.10, which states that clients can't expect to see changes > > > > >> immediately, but that they must be able to expect close-to-open > > > > >> semantics to work. Again, if this is to be the case, > > > then the server > > > > >> _must_ be able to deal with the case where client 1 dies > > > before it can > > > > >> issue the LAYOUTCOMMIT. > > > > > > > > Agreed. > > > > > > > > >> > > > > >> > > > > >>>> As I see it, if your server allows one client to read > > > data that may have > > > > >>>> been modified by another client that holds a WRITE > > > layout for that range > > > > >>>> then (since that is a visible data change) it should > > > provide a change > > > > >>>> attribute update irrespective of whether or not a > > > LAYOUTCOMMIT has been > > > > >>>> sent. 
> > > > >>> > > > > >>> the requirement for the server in WRITE's > > > implementation section > > > > >>> is quite weak: "It is assumed that the act of writing > > > data to a file will > > > > >>> cause the time_modified and change attributes of the > > > file to be updated." > > > > >>> > > > > >>> The difference here is that for pNFS the written data > > > is not guaranteed > > > > >>> to be visible until LAYOUTCOMMIT. In a broader sense, > > > assuming the clients > > > > >>> are caching dirty data and use a write-behind cache, > > > application-written data > > > > >>> may be visible to other processes on the same host but > > > not to others until > > > > >>> fsync() or close() - open-to-close semantics are the > > > only thing the client > > > > >>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and > > > close() ensure the > > > > >>> data is committed to stable storage and is visible to > > > all other clients in > > > > >>> the cluster. > > > > >> > > > > >> See above. I'm not disputing your statement that 'the > > > written data is > > > > >> not guaranteed to be visible until LAYOUTCOMMIT'. I am > > > disputing an > > > > >> assumption that 'the written data may be visible without > > > an accompanying > > > > >> change attribute update'. > > > > > > > > > > > > > > > In other words, I'd expect the following scenario to give the > same > > > > > results in NFSv4.1 w/pNFS as it does in NFSv4: > > > > > > > > That's a strong requirement that may limit the scalability > > > of the server. > > > > > > > > The spirit of the pNFS operations, at least from Panasas > > > perspective was that > > > > the data is transient until LAYOUTCOMMIT, meaning it may or > > > may not be visible > > > > to clients other than the one who wrote it, and its > > > associated metadata MUST > > > > be updated and describe the new data only on LAYOUTCOMMIT > > > and until then it's > > > > undefined, i.e. it's up to the server implementation > > > whether to update it or not. 
> > > > > > > > Without locking, what do the stronger semantics buy you? > > > > Even if a client verified the change_attribute new data may become visible > > > > at any time after the GETATTR if the file/byte range aren't locked. > > > > > > There is no locking needed in the scenario below: it is ordinary > > > close-to-open semantics. > > > > > > The point is that if you remove the one and only way that clients have > > > to determine whether or not their data caches are valid, then they can > > > no longer cache data at all, and server scalability will be shot to > > > smithereens anyway. > > > > > > Trond > > > > > > > Benny
> > > > >
> > > > > Client 1                Client 2
> > > > > ========                ========
> > > > >
> > > > >                         OPEN foo
> > > > >                         READ
> > > > >                         CLOSE
> > > > > OPEN
> > > > > LAYOUTGET ...
> > > > > WRITE via DS
> > > > > <dies>...
> > > > >                         OPEN foo
> > > > >                         verify change_attr
> > > > >                         READ if above WRITE is visible
> > > > >                         CLOSE
> > > > >
> > > > > Trond
> > > > > _______________________________________________ > > > > > nfsv4 mailing list > > > > > nfsv4@ietf.org > > > > > https://www.ietf.org/mailman/listinfo/nfsv4 ^ permalink raw reply [flat|nested] 38+ messages in thread
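The two-client scenario quoted at the end of the message above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not NFS client or server code: `MetadataServer`, `CachingClient`, `bump_on_visible_write` and `run_scenario` are hypothetical names, and the model covers just one of the two acceptable server strategies Trond lists (bumping the change attribute when the data becomes visible); the other strategy, keeping the uncommitted data invisible, is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataServer:
    """Toy MDS holding a single file (hypothetical model, not real NFS code)."""
    data: bytes = b"old"
    change_attr: int = 1
    bump_on_visible_write: bool = True  # server strategy under test (assumption)

    def ds_write_becomes_visible(self, new_data: bytes) -> None:
        # Data written through a data server becomes readable by other clients.
        self.data = new_data
        if self.bump_on_visible_write:
            self.change_attr += 1


@dataclass
class CachingClient:
    """Client that trusts its cache as long as the change attribute is unchanged."""
    cached_data: Optional[bytes] = None
    cached_change: Optional[int] = None

    def open_read_close(self, mds: MetadataServer) -> Optional[bytes]:
        # Close-to-open: revalidate at OPEN by comparing the change attribute.
        if self.cached_change != mds.change_attr:
            self.cached_data = mds.data
            self.cached_change = mds.change_attr
        return self.cached_data


def run_scenario(bump: bool) -> Optional[bytes]:
    mds = MetadataServer(bump_on_visible_write=bump)
    client2 = CachingClient()
    client2.open_read_close(mds)          # Client 2: OPEN foo, READ, CLOSE
    mds.ds_write_becomes_visible(b"new")  # Client 1: WRITE via DS, then dies
    return client2.open_read_close(mds)   # Client 2: OPEN foo, verify, READ


if __name__ == "__main__":
    print(run_scenario(bump=True))   # client 2 refetches and sees the new data
    print(run_scenario(bump=False))  # client 2 keeps serving its stale cache
```

With `bump=False` the second open returns the old cached bytes even though the server-side data has changed, which is the "file data changes, but the client can't detect it" outcome the thread calls unacceptable.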
* Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close 2010-07-07 22:04 ` Noveck_David 2010-07-07 22:27 ` Trond Myklebust @ 2010-07-07 22:44 ` david.black 2010-07-07 22:52 ` Trond Myklebust 1 sibling, 1 reply; 38+ messages in thread From: david.black @ 2010-07-07 22:44 UTC (permalink / raw) To: Noveck_David, Trond.Myklebust, Daniel.Muntz Cc: linux-nfs, garth, welch, nfsv4, andros, bhalevy Let me try this ... A correct client will always send LAYOUTCOMMIT. Assume that the client is correct. Hence if the LAYOUTCOMMIT doesn't arrive, something's failed. Important implication: No LAYOUTCOMMIT is an error/failure case. It just has to work; it doesn't have to be fast. Suggestion: If a client dies while holding writeable layouts that permit write-in-place, and the client doesn't reappear or doesn't reclaim those layouts, then the server should assume that the files involved were written before the client died, and set the file attributes accordingly as part of internally reclaiming the layout that the client has abandoned. Caveat: It may take a while for the server to determine that the client has abandoned a layout. This can result in false positives (file appears to be modified when it wasn't) but won't yield false negatives (file does not appear to be modified even though it was modified). Thanks, --David > -----Original Message----- > From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf Of Noveck_David@emc.com > Sent: Wednesday, July 07, 2010 6:04 PM > To: Trond.Myklebust@netapp.com; Muntz, Daniel > Cc: linux-nfs@vger.kernel.org; garth@panasas.com; welch@panasas.com; nfsv4@ietf.org; > andros@netapp.com; bhalevy@panasas.com > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close > > > Yes. I would agree that the client cannot rely on the updates being made > > visible if it fails to send the LAYOUTCOMMIT. My point was simply that a > > compliant server MUST also have a valid strategy for dealing with the > > case where the client doesn't send it. 
> > So you are saying the updates "MUST be made visible" through the > server's valid strategy. Is that right. > > And that the client cannot rely on that. Why not, if the server must > have a valid strategy. > > Is this just prudent "belt and suspenders" design or what? > > It seems to me that if one side here is MUST (and the spec needs to be > clearer about what might or might not constitute a valid strategy), then > the other side should be SHOULD. > > If both sides are "MUST", then if things don't work out then the client > and server can equally point to one another and say "It's his fault". > > Am I missing something here? > > > > -----Original Message----- > From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf > Of Trond Myklebust > Sent: Wednesday, July 07, 2010 5:01 PM > To: Muntz, Daniel > Cc: linux-nfs@vger.kernel.org; garth@panasas.com; welch@panasas.com; > nfsv4@ietf.org; andros@netapp.com; bhalevy@panasas.com > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close > > On Wed, 2010-07-07 at 16:39 -0400, Daniel.Muntz@emc.com wrote: > > To bring this discussion full circle, since we agree that a compliant > > server can implement a scheme where written data does not become > visible > > until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a > > "MUST" from a compliant client (independent of layout type)? > > Yes. I would agree that the client cannot rely on the updates being made > visible if it fails to send the LAYOUTCOMMIT. My point was simply that a > compliant server MUST also have a valid strategy for dealing with the > case where the client doesn't send it. 
> > Cheers > Trond > > > -Dan > > > > > -----Original Message----- > > > From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] > > > On Behalf Of Trond Myklebust > > > Sent: Wednesday, July 07, 2010 7:04 AM > > > To: Benny Halevy > > > Cc: andros@netapp.com; linux-nfs@vger.kernel.org; Garth > > > Gibson; Brent Welch; NFSv4 > > > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close > > > > > > On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote: > > > > On Jul. 07, 2010, 16:18 +0300, Trond Myklebust > > > <Trond.Myklebust@netapp.com> wrote: > > > > > On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote: > > > > >> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote: > > > > >>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust > > > <trond.myklebust@fys.uio.no> wrote: > > > > >>>> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@emc.com > wrote: > > > > >>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as > > > > >>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong). > > > > >>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization > > > > >>>>> point, so even if the non-clustered server does not want to update > > > > >>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to > > > > >>>>> execute whatever synchronization mechanism the implementer wishes to put > > > > >>>>> in the control protocol. > > > > >>>> > > > > >>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow > > > > >>>> pNFS servers to break the rule that any visible change to the data must > > > > >>>> be atomically accompanied with a change attribute update. > > > > >>>> > > > > >>> > > > > >>> Trond, I'm not sure how this rule you mentioned is specified. > > > > >>> > > > > >>> See more in section 12.5.4 and 12.5.4.1. 
LAYOUTCOMMIT and change/time_modify > > > > >>> in particular: > > > > >>> > > > > >>> For some layout protocols, the storage device is able to notify the > > > > >>> metadata server of the occurrence of an I/O; as a result, the change > > > > >>> and time_modify attributes may be updated at the metadata server. > > > > >>> For a metadata server that is capable of monitoring updates to the > > > > >>> change and time_modify attributes, LAYOUTCOMMIT processing is not > > > > >>> required to update the change attribute. In this case, the metadata > > > > >>> server must ensure that no further update to the data has occurred > > > > >>> since the last update of the attributes; file-based protocols may > > > > >>> have enough information to make this determination or may update the > > > > >>> change attribute upon each file modification. This also applies for > > > > >>> the time_modify attribute. If the server implementation is able to > > > > >>> determine that the file has not been modified since the last > > > > >>> time_modify update, the server need not update time_modify at > > > > >>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes > > > > >>> should be visible if that file was modified since the latest previous > > > > >>> LAYOUTCOMMIT or LAYOUTGET > > > > >> > > > > >> I know. However the above paragraph does not state that the server > > > > >> should make those changes visible to clients other than the one that is > > > > >> writing. > > > > >> > > > > >> Section 18.32.4 states that writes will cause the time_modified and > > > > >> change attributes to be updated (if and only if the file data is > > > > >> modified). Several other sections rely on this behaviour, including > > > > >> section 10.3.1, section 11.7.2.2, and section 11.7.7. 
> > > > >> > > > > >> The only 'special behaviour' that I see allowed for pNFS is in section > > > > >> 13.10, which states that clients can't expect to see changes > > > > >> immediately, but that they must be able to expect close-to-open > > > > >> semantics to work. Again, if this is to be the case, then the server > > > > >> _must_ be able to deal with the case where client 1 dies before it can > > > > >> issue the LAYOUTCOMMIT. > > > > > > > > Agreed. > > > > > > > > >> > > > > >> > > > > >>>> As I see it, if your server allows one client to read data that may have > > > > >>>> been modified by another client that holds a WRITE layout for that range > > > > >>>> then (since that is a visible data change) it should provide a change > > > > >>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been > > > > >>>> sent. > > > > >>> > > > > >>> the requirement for the server in WRITE's implementation section > > > > >>> is quite weak: "It is assumed that the act of writing data to a file will > > > > >>> cause the time_modified and change attributes of the file to be updated." > > > > >>> > > > > >>> The difference here is that for pNFS the written data is not guaranteed > > > > >>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients > > > > >>> are caching dirty data and use a write-behind cache, application-written data > > > > >>> may be visible to other processes on the same host but not to others until > > > > >>> fsync() or close() - open-to-close semantics are the only thing the client > > > > >>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensure the > > > > >>> data is committed to stable storage and is visible to all other clients in > > > > >>> the cluster. > > > > >> > > > > >> See above. I'm not disputing your statement that 'the written data is > > > > >> not guaranteed to be visible until LAYOUTCOMMIT'. 
I am disputing an > > > > >> assumption that 'the written data may be visible without an accompanying > > > > >> change attribute update'. > > > > > > > > > > > > > > > In other words, I'd expect the following scenario to give the same > > > > > results in NFSv4.1 w/pNFS as it does in NFSv4: > > > > > > > > That's a strong requirement that may limit the scalability of the server. > > > > > > > > The spirit of the pNFS operations, at least from Panasas perspective was that > > > > the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible > > > > to clients other than the one who wrote it, and its associated metadata MUST > > > > be updated and describe the new data only on LAYOUTCOMMIT and until then it's > > > > undefined, i.e. it's up to the server implementation whether to update it or not. > > > > > > > > Without locking, what do the stronger semantics buy you? > > > > Even if a client verified the change_attribute new data may become visible > > > > at any time after the GETATTR if the file/byte range aren't locked. > > > > > > There is no locking needed in the scenario below: it is ordinary > > > close-to-open semantics. > > > > > > The point is that if you remove the one and only way that clients have > > > to determine whether or not their data caches are valid, then they can > > > no longer cache data at all, and server scalability will be shot to > > > smithereens anyway. > > > > > > Trond > > > > > > > Benny > > > > > > > > > > > > > > Client 1 Client 2 > > > > > ======== ======== > > > > > > > > > > OPEN foo > > > > > READ > > > > > CLOSE > > > > > OPEN > > > > > LAYOUTGET ... > > > > > WRITE via DS > > > > > <dies>... 
> > > > > OPEN foo > > > > > verify change_attr > > > > > READ if above WRITE is visible > > > > > CLOSE > > > > > > > > > > Trond > > > > > _______________________________________________ > > > > > nfsv4 mailing list > > > > > nfsv4@ietf.org > > > > > https://www.ietf.org/mailman/listinfo/nfsv4 ^ permalink raw reply [flat|nested] 38+ messages in thread
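David Black's suggestion in the message above (treat every file under an abandoned writable layout as modified) can be sketched roughly as follows. This is a minimal sketch under assumptions, not any server's actual bookkeeping: `LayoutTracker`, `layoutget_rw` and `reclaim_abandoned` are hypothetical names, and how the server decides that a client has abandoned its layouts (the caveat in the message) is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FileState:
    change_attr: int = 1


@dataclass
class LayoutTracker:
    """Toy MDS bookkeeping of writable layouts (hypothetical model)."""
    files: Dict[str, FileState] = field(default_factory=dict)
    write_layouts: Dict[str, Set[str]] = field(default_factory=dict)

    def layoutget_rw(self, client: str, name: str) -> None:
        # Record that `client` holds a read/write layout on file `name`.
        self.files.setdefault(name, FileState())
        self.write_layouts.setdefault(client, set()).add(name)

    def reclaim_abandoned(self, client: str) -> List[str]:
        # The server cannot tell whether the dead client wrote in place,
        # so it assumes it did and bumps the change attribute on every
        # file the client held a writable layout for.
        names = sorted(self.write_layouts.pop(client, set()))
        for name in names:
            self.files[name].change_attr += 1
        return names


if __name__ == "__main__":
    mds = LayoutTracker()
    mds.layoutget_rw("client1", "foo")  # suppose client1 wrote to foo ...
    mds.layoutget_rw("client1", "bar")  # ... but never touched bar
    print(mds.reclaim_abandoned("client1"))
    print(mds.files["foo"].change_attr, mds.files["bar"].change_attr)
```

Both files end up with a bumped change attribute: `bar` is a false positive (it looks modified although it was not), while `foo` can never be a false negative, which is the trade-off described in the message.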
* RE: [nfsv4] 4.1 client - LAYOUTCOMMIT & close 2010-07-07 22:44 ` david.black @ 2010-07-07 22:52 ` Trond Myklebust 2010-07-07 23:09 ` Trond Myklebust 0 siblings, 1 reply; 38+ messages in thread From: Trond Myklebust @ 2010-07-07 22:52 UTC (permalink / raw) To: david.black Cc: Noveck_David, Daniel.Muntz, linux-nfs, garth, welch, nfsv4, andros, bhalevy On Wed, 2010-07-07 at 18:44 -0400, david.black@emc.com wrote: > Let me try this ... > > A correct client will always send LAYOUTCOMMIT. > Assume that the client is correct. > Hence if the LAYOUTCOMMIT doesn't arrive, something's failed. > > Important implication: No LAYOUTCOMMIT is an error/failure case. It > just has to work; it doesn't have to be fast. > > Suggestion: If a client dies while holding writeable layouts that permit > write-in-place, and the client doesn't reappear or doesn't reclaim those > layouts, then the server should assume that the files involved were > written before the client died, and set the file attributes accordingly > as part of internally reclaiming the layout that the client has > abandoned. > > Caveat: It may take a while for the server to determine that the client > has abandoned a layout. > > This can result in false positives (file appears to be modified when it > wasn't) but won't yield false negatives (file does not appear to be > modified even though it was modified). OK... So we're going to have to turn off client side file caching entirely for pNFS? I can do that... The above won't work. Think readahead... Trond > Thanks, > --David > > > -----Original Message----- > > From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf > Of Noveck_David@emc.com > > Sent: Wednesday, July 07, 2010 6:04 PM > > To: Trond.Myklebust@netapp.com; Muntz, Daniel > > Cc: linux-nfs@vger.kernel.org; garth@panasas.com; welch@panasas.com; > nfsv4@ietf.org; > > andros@netapp.com; bhalevy@panasas.com > > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close > > > > > Yes. 
I would agree that the client cannot rely on the updates being > made > > > visible if it fails to send the LAYOUTCOMMIT. My point was simply > that a > > > compliant server MUST also have a valid strategy for dealing with > the > > > case where the client doesn't send it. > > > > So you are saying the updates "MUST be made visible" through the > > server's valid strategy. Is that right. > > > > And that the client cannot rely on that. Why not, if the server must > > have a valid strategy. > > > > Is this just prudent "belt and suspenders" design or what? > > > > It seems to me that if one side here is MUST (and the spec needs to be > > clearer about what might or might not constitute a valid strategy), > then > > the other side should be SHOULD. > > > > If both sides are "MUST", then if things don't work out then the > client > > and server can equally point to one another and say "It's his fault". > > > > Am I missing something here? > > > > > > > > -----Original Message----- > > From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf > > Of Trond Myklebust > > Sent: Wednesday, July 07, 2010 5:01 PM > > To: Muntz, Daniel > > Cc: linux-nfs@vger.kernel.org; garth@panasas.com; welch@panasas.com; > > nfsv4@ietf.org; andros@netapp.com; bhalevy@panasas.com > > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close > > > > On Wed, 2010-07-07 at 16:39 -0400, Daniel.Muntz@emc.com wrote: > > > To bring this discussion full circle, since we agree that a > compliant > > > server can implement a scheme where written data does not become > > visible > > > until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a > > > "MUST" from a compliant client (independent of layout type)? > > > > Yes. I would agree that the client cannot rely on the updates being > made > > visible if it fails to send the LAYOUTCOMMIT. My point was simply that > a > > compliant server MUST also have a valid strategy for dealing with the > > case where the client doesn't send it. 
> > > > Cheers > > Trond > > > > > -Dan > > > > > > > -----Original Message----- > > > > From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] > > > > On Behalf Of Trond Myklebust > > > > Sent: Wednesday, July 07, 2010 7:04 AM > > > > To: Benny Halevy > > > > Cc: andros@netapp.com; linux-nfs@vger.kernel.org; Garth > > > > Gibson; Brent Welch; NFSv4 > > > > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close > > > > > > > > On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote: > > > > > On Jul. 07, 2010, 16:18 +0300, Trond Myklebust > > > > <Trond.Myklebust@netapp.com> wrote: > > > > > > On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote: > > > > > >> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote: > > > > > >>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust > > > > <trond.myklebust@fys.uio.no> wrote: > > > > > >>>> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@emc.com > > wrote: > > > > > >>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I > see it as > > > > > >>>>> orthogonal to updating the metadata on the MDS (but > perhaps I'm wrong). > > > > > >>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a > synchronization > > > > > >>>>> point, so even if the non-clustered server does not want > to update > > > > > >>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a > trigger to > > > > > >>>>> execute whatever synchronization mechanism the implementer > wishes to put > > > > > >>>>> in the control protocol. > > > > > >>>> > > > > > >>>> As far as I'm aware, there are no exceptions in RFC5661 > that would allow > > > > > >>>> pNFS servers to break the rule that any visible change to > the data must > > > > > >>>> be atomically accompanied with a change attribute update. > > > > > >>>> > > > > > >>> > > > > > >>> Trond, I'm not sure how this rule you mentioned is > specified. > > > > > >>> > > > > > >>> See more in section 12.5.4 and 12.5.4.1. 
LAYOUTCOMMIT and > change/time_modify > > > > > >>> in particular: > > > > > >>> > > > > > >>> For some layout protocols, the storage device is able to > notify the > > > > > >>> metadata server of the occurrence of an I/O; as a result, > the change > > > > > >>> and time_modify attributes may be updated at the metadata > server. > > > > > >>> For a metadata server that is capable of monitoring > updates to the > > > > > >>> change and time_modify attributes, LAYOUTCOMMIT > processing is not > > > > > >>> required to update the change attribute. In this case, > the metadata > > > > > >>> server must ensure that no further update to the data has > occurred > > > > > >>> since the last update of the attributes; file-based > protocols may > > > > > >>> have enough information to make this determination or may > update the > > > > > >>> change attribute upon each file modification. This also > applies for > > > > > >>> the time_modify attribute. If the server implementation > is able to > > > > > >>> determine that the file has not been modified since the > last > > > > > >>> time_modify update, the server need not update > time_modify at > > > > > >>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated > attributes > > > > > >>> should be visible if that file was modified since the > latest previous > > > > > >>> LAYOUTCOMMIT or LAYOUTGET > > > > > >> > > > > > >> I know. However the above paragraph does not state that the > server > > > > > >> should make those changes visible to clients other than the > one that is > > > > > >> writing. > > > > > >> > > > > > >> Section 18.32.4 states that writes will cause the > time_modified and > > > > > >> change attributes to be updated (if and only if the file data > is > > > > > >> modified). Several other sections rely on this behaviour, > including > > > > > >> section 10.3.1, section 11.7.2.2, and section 11.7.7. 
> > > > > >> > > > > > >> The only 'special behaviour' that I see allowed for pNFS is > in section > > > > > >> 13.10, which states that clients can't expect to see changes > > > > > >> immediately, but that they must be able to expect > close-to-open > > > > > >> semantics to work. Again, if this is to be the case, then the > server > > > > > >> _must_ be able to deal with the case where client 1 dies > before it can > > > > > >> issue the LAYOUTCOMMIT. > > > > > > > > > > Agreed. > > > > > > > > > > >> > > > > > >> > > > > > >>>> As I see it, if your server allows one client to read data > that may have > > > > > >>>> been modified by another client that holds a WRITE layout > for that range > > > > > >>>> then (since that is a visible data change) it should > provide a change > > > > > >>>> attribute update irrespective of whether or not a > LAYOUTCOMMIT has been > > > > > >>>> sent. > > > > > >>> > > > > > >>> the requirement for the server in WRITE's implementation > section > > > > > >>> is quite weak: "It is assumed that the act of writing data > to a file will > > > > > >>> cause the time_modified and change attributes of the file to > be updated." > > > > > >>> > > > > > >>> The difference here is that for pNFS the written data is not > guaranteed > > > > > >>> to be visible until LAYOUTCOMMIT. In a broader sense, > assuming the clients > > > > > >>> are caching dirty data and use a write-behind cache, > application-written data > > > > > >>> may be visible to other processes on the same host but not > to others until > > > > > >>> fsync() or close() - open-to-close semantics are the only > thing the client > > > > > >>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and > close() ensure the > > > > > >>> data is committed to stable storage and is visible to all > other clients in > > > > > >>> the cluster. > > > > > >> > > > > > >> See above. 
I'm not disputing your statement that 'the written > data is > > > > > >> not guaranteed to be visible until LAYOUTCOMMIT'. I am > disputing an > > > > > >> assumption that 'the written data may be visible without an > accompanying > > > > > >> change attribute update'. > > > > > > > > > > > > > > > > > > In other words, I'd expect the following scenario to give the > same > > > > > > results in NFSv4.1 w/pNFS as it does in NFSv4: > > > > > > > > > > That's a strong requirement that may limit the scalability of > the server. > > > > > > > > > > The spirit of the pNFS operations, at least from Panasas > perspective was that > > > > > the data is transient until LAYOUTCOMMIT, meaning it may or may > not be visible > > > > > to clients other than the one who wrote it, and its associated > metadata MUST > > > > > be updated and describe the new data only on LAYOUTCOMMIT and > until then it's > > > > > undefined, i.e. it's up to the server implementation whether to > update it or not. > > > > > > > > > > Without locking, what do the stronger semantics buy you? > > > > > Even if a client verified the change_attribute new data may > become visible > > > > > at any time after the GETATTR if the file/byte range aren't > locked. > > > > > > > > There is no locking needed in the scenario below: it is ordinary > > > > close-to-open semantics. > > > > > > > > The point is that if you remove the one and only way that clients > have > > > > to determine whether or not their data caches are valid, then they > can > > > > no longer cache data at all, and server scalability will be shot > to > > > > smithereens anyway. > > > > > > > > Trond > > > > > > > > > Benny > > > > > > > > > > > > > > > > > Client 1 Client 2 > > > > > > ======== ======== > > > > > > > > > > > > OPEN foo > > > > > > READ > > > > > > CLOSE > > > > > > OPEN > > > > > > LAYOUTGET ... > > > > > > WRITE via DS > > > > > > <dies>... 
> > > > > > OPEN foo > > > > > > verify change_attr > > > > > > READ if above WRITE is visible > > > > > > CLOSE > > > > > > > > > > > > Trond > > > > > > _______________________________________________ > > > > > > nfsv4 mailing list > > > > > > nfsv4@ietf.org > > > > > > https://www.ietf.org/mailman/listinfo/nfsv4 > > > > > > > > > > > > _______________________________________________ > > > > nfsv4 mailing list > > > > nfsv4@ietf.org > > > > https://www.ietf.org/mailman/listinfo/nfsv4 > > > > > > > > > > > > > > _______________________________________________ > > nfsv4 mailing list > > nfsv4@ietf.org > > https://www.ietf.org/mailman/listinfo/nfsv4 > > > > _______________________________________________ > > nfsv4 mailing list > > nfsv4@ietf.org > > https://www.ietf.org/mailman/listinfo/nfsv4 > ^ permalink raw reply [flat|nested] 38+ messages in thread
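The two-client scenario quoted above can be sketched as a small simulation. This is only an illustration under stated assumptions, not code from any NFS client or server: the Server and Client classes and the bump_change flag are hypothetical names, and the model assumes a client that revalidates its data cache solely by comparing the change attribute at OPEN time (close-to-open).

```python
class Server:
    """Toy metadata server: one file with data and a change attribute."""

    def __init__(self, data):
        self.data = data
        self.change = 1

    def write_via_ds(self, data, bump_change):
        # The data lands on the data server. Whether the change attribute
        # moves without a LAYOUTCOMMIT is the point under discussion.
        self.data = data
        if bump_change:
            self.change += 1


class Client:
    """Toy client with close-to-open cache revalidation (hypothetical)."""

    def __init__(self, server):
        self.server = server
        self.cached_data = None
        self.cached_change = None

    def open_read_close(self):
        # OPEN: compare the server's change attribute with the cached one.
        if self.cached_change != self.server.change:
            self.cached_data = self.server.data       # READ from the server
            self.cached_change = self.server.change
        return self.cached_data                        # otherwise use the cache


def run(bump_change):
    server = Server(b"old")
    client2 = Client(server)
    client2.open_read_close()                  # Client 2: OPEN, READ, CLOSE
    server.write_via_ds(b"new", bump_change)   # Client 1: WRITE via DS, then dies
    return client2.open_read_close()           # Client 2: OPEN foo again
```

With bump_change=True the second open sees the new data; with bump_change=False the client keeps serving its stale cache even though the write is visible on the server, which is the failure mode the message objects to.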
* Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
  2010-07-07 22:52 ` Trond Myklebust
@ 2010-07-07 23:09 ` Trond Myklebust
  [not found] ` <1278544497.15524.17.camel@heimdal.trondhjem.org>
  2010-07-07 23:14 ` Trond Myklebust
  0 siblings, 2 replies; 38+ messages in thread
From: Trond Myklebust @ 2010-07-07 23:09 UTC (permalink / raw)
To: david.black; +Cc: linux-nfs, garth, welch, nfsv4, andros, bhalevy

On Wed, 2010-07-07 at 18:52 -0400, Trond Myklebust wrote:
> On Wed, 2010-07-07 at 18:44 -0400, david.black@emc.com wrote:
> > Let me try this ...
> >
> > A correct client will always send LAYOUTCOMMIT.
> > Assume that the client is correct.
> > Hence if the LAYOUTCOMMIT doesn't arrive, something's failed.
> >
> > Important implication: No LAYOUTCOMMIT is an error/failure case. It
> > just has to work; it doesn't have to be fast.
> >
> > Suggestion: If a client dies while holding writeable layouts that permit
> > write-in-place, and the client doesn't reappear or doesn't reclaim those
> > layouts, then the server should assume that the files involved were
> > written before the client died, and set the file attributes accordingly
> > as part of internally reclaiming the layout that the client has
> > abandoned.
> >
> > Caveat: It may take a while for the server to determine that the client
> > has abandoned a layout.
> >
> > This can result in false positives (file appears to be modified when it
> > wasn't) but won't yield false negatives (file does not appear to be
> > modified even though it was modified).
>
> OK... So we're going to have to turn off client side file caching
> entirely for pNFS? I can do that...
>
> The above won't work. Think readahead...

So... What can work, is if you modify it to work explicitly for
close-to-open

"Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
check that it has received LAYOUTCOMMITs from any other clients that may
have the file open for writing.
If it hasn't, then it MUST take some
action to ensure that any file data changes are accompanied by a change
attribute update."

Then you can add the above suggestion without the offending caveat. Note
however that it does break the "SHOULD NOT" admonition in section
18.32.4.

Trond

^ permalink raw reply [flat|nested] 38+ messages in thread
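The rule proposed in the message above can be sketched as a toy model of per-file server state. This is a hedged illustration, not any server's actual implementation: FileState, pending_writers and on_open are hypothetical names, the "action" taken is simply a conservative bump of the change attribute, and the model assumes a writer sends at most one LAYOUTCOMMIT per read-write layout.

```python
class FileState:
    """Toy per-file state on the metadata server (hypothetical names)."""

    def __init__(self):
        self.change = 1
        # Clients holding a read-write layout with no LAYOUTCOMMIT seen yet.
        self.pending_writers = set()

    def layoutget_rw(self, client_id):
        self.pending_writers.add(client_id)

    def layoutcommit(self, client_id):
        # The normal path: the writer reports its changes.
        if client_id in self.pending_writers:
            self.pending_writers.remove(client_id)
            self.change += 1

    def on_open(self, client_id):
        """Run for OPEN; the same check would apply to LOCK and WANT_DELEGATION."""
        # If any *other* client may have written without a LAYOUTCOMMIT,
        # assume the data changed and update the change attribute.
        if self.pending_writers - {client_id}:
            self.change += 1
        return self.change
```

The bump happens on every OPEN while an uncommitted writer exists, so readers may see false positives (cache invalidated although nothing changed) but never a visible data change without a change attribute update.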
[parent not found: <1278544497.15524.17.camel@heimdal.trondhjem.org>]
[parent not found: <4C35F5E3.3000604@panasas.com>]
* RE: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
  2010-07-07 23:09 ` Trond Myklebust
  [not found] ` <1278544497.15524.17.camel@heimdal.trondhjem.org>
@ 2010-07-07 23:14 ` Trond Myklebust
  2010-07-08 15:59 ` Benny Halevy
  1 sibling, 1 reply; 38+ messages in thread
From: Trond Myklebust @ 2010-07-07 23:14 UTC (permalink / raw)
To: david.black
Cc: Noveck_David, Daniel.Muntz, linux-nfs, garth, welch, nfsv4, andros, bhalevy

On Wed, 2010-07-07 at 19:09 -0400, Trond Myklebust wrote:
> On Wed, 2010-07-07 at 18:52 -0400, Trond Myklebust wrote:
> > On Wed, 2010-07-07 at 18:44 -0400, david.black@emc.com wrote:
> > > Let me try this ...
> > >
> > > A correct client will always send LAYOUTCOMMIT.
> > > Assume that the client is correct.
> > > Hence if the LAYOUTCOMMIT doesn't arrive, something's failed.
> > >
> > > Important implication: No LAYOUTCOMMIT is an error/failure case. It
> > > just has to work; it doesn't have to be fast.
> > >
> > > Suggestion: If a client dies while holding writeable layouts that permit
> > > write-in-place, and the client doesn't reappear or doesn't reclaim those
> > > layouts, then the server should assume that the files involved were
> > > written before the client died, and set the file attributes accordingly
> > > as part of internally reclaiming the layout that the client has
> > > abandoned.
> > >
> > > Caveat: It may take a while for the server to determine that the client
> > > has abandoned a layout.
> > >
> > > This can result in false positives (file appears to be modified when it
> > > wasn't) but won't yield false negatives (file does not appear to be
> > > modified even though it was modified).
> >
> > OK... So we're going to have to turn off client side file caching
> > entirely for pNFS? I can do that...
> >
> > The above won't work. Think readahead...
>
> So...
> What can work, is if you modify it to work explicitly for
> close-to-open
>
> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
> check that it has received LAYOUTCOMMITs from any other clients that may
> have the file open for writing. If it hasn't, then it MUST take some
> action to ensure that any file data changes are accompanied by a change
                            ^ potentially visible
> attribute update."
>
> Then you can add the above suggestion without the offending caveat. Note
> however that it does break the "SHOULD NOT" admonition in section
> 18.32.4.
>
> Trond

^ permalink raw reply [flat|nested] 38+ messages in thread
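The quoted suggestion about abandoned layouts (assume the file was written and update its attributes while reclaiming the layout) can be sketched as follows. This is a minimal illustration under assumptions: File, write_layout_holders and reclaim_abandoned_layouts are hypothetical names, and how the server decides that a client is gone (lease expiry or otherwise) is deliberately left outside the sketch and passed in as a set of live client ids.

```python
from dataclasses import dataclass, field


@dataclass
class File:
    """Toy file record on the metadata server (hypothetical)."""
    change: int = 1
    write_layout_holders: set = field(default_factory=set)


def reclaim_abandoned_layouts(files, live_clients):
    """Revoke write layouts held by clients that are no longer alive.

    Every affected file is treated as modified, so its change attribute
    is bumped. Returns the names of the files that were touched.
    """
    touched = []
    for name, f in files.items():
        abandoned = f.write_layout_holders - live_clients
        if abandoned:
            f.write_layout_holders -= abandoned
            f.change += 1          # may be a false positive, never a false negative
            touched.append(name)
    return touched
```

For example, with files foo (layout held by a dead client c1) and bar (layout held by a live client c2), only foo has its layout reclaimed and its change attribute bumped.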
* Re: 4.1 client - LAYOUTCOMMIT & close
  2010-07-07 23:14 ` Trond Myklebust
@ 2010-07-08 15:59 ` Benny Halevy
  2010-07-08 20:30 ` [nfsv4] " david.black
  0 siblings, 1 reply; 38+ messages in thread
From: Benny Halevy @ 2010-07-08 15:59 UTC (permalink / raw)
To: Trond Myklebust; +Cc: linux-nfs, garth, welch, nfsv4, andros

On Jul. 08, 2010, 2:14 +0300, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> On Wed, 2010-07-07 at 19:09 -0400, Trond Myklebust wrote:
>> On Wed, 2010-07-07 at 18:52 -0400, Trond Myklebust wrote:
>>> On Wed, 2010-07-07 at 18:44 -0400, david.black@emc.com wrote:
>>>> Let me try this ...
>>>>
>>>> A correct client will always send LAYOUTCOMMIT.
>>>> Assume that the client is correct.
>>>> Hence if the LAYOUTCOMMIT doesn't arrive, something's failed.
>>>>
>>>> Important implication: No LAYOUTCOMMIT is an error/failure case. It
>>>> just has to work; it doesn't have to be fast.
>>>>

Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the client
hasn't written to the file. I'm not sure about the blocks case, though:
do you implicitly free up any provisionally allocated blocks that the
client had not explicitly committed using LAYOUTCOMMIT?

>>>> Suggestion: If a client dies while holding writeable layouts that permit
>>>> write-in-place, and the client doesn't reappear or doesn't reclaim those
>>>> layouts, then the server should assume that the files involved were
>>>> written before the client died, and set the file attributes accordingly
>>>> as part of internally reclaiming the layout that the client has
>>>> abandoned.

Of course. That's part of the server recovery.

>>>>
>>>> Caveat: It may take a while for the server to determine that the client
>>>> has abandoned a layout.

That's two lease times after a respective CB_LAYOUTRECALL.

>>>>
>>>> This can result in false positives (file appears to be modified when it
>>>> wasn't) but won't yield false negatives (file does not appear to be
>>>> modified even though it was modified).
>>>
>>> OK... So we're going to have to turn off client side file caching
>>> entirely for pNFS? I can do that...
>>>
>>> The above won't work. Think readahead...
>>
>> So... What can work, is if you modify it to work explicitly for
>> close-to-open
>>
>> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
>> check that it has received LAYOUTCOMMITs from any other clients that may
>> have the file open for writing. If it hasn't, then it MUST take some
>> action to ensure that any file data changes are accompanied by a change
>                            ^ potentially visible
>> attribute update."

That should be OK as long as it's not for every GETATTR for the change,
mtime, or size attributes.

>>
>> Then you can add the above suggestion without the offending caveat. Note
>> however that it does break the "SHOULD NOT" admonition in section
>> 18.32.4.

Better safe than sorry in this rare error case.

Benny

>>
>> Trond
>>
>>
>>> Trond
>>>
>>>> Thanks,
>>>> --David
>>>>
>>>>> -----Original Message-----
>>>>> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf Of Noveck_David@emc.com
>>>>> Sent: Wednesday, July 07, 2010 6:04 PM
>>>>> To: Trond.Myklebust@netapp.com; Muntz, Daniel
>>>>> Cc: linux-nfs@vger.kernel.org; garth@panasas.com; welch@panasas.com; nfsv4@ietf.org;
>>>>> andros@netapp.com; bhalevy@panasas.com
>>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
>>>>>
>>>>>> Yes. I would agree that the client cannot rely on the updates being made
>>>>>> visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
>>>>>> compliant server MUST also have a valid strategy for dealing with the
>>>>>> case where the client doesn't send it.
>>>>>
>>>>> So you are saying the updates "MUST be made visible" through the
>>>>> server's valid strategy. Is that right.
>>>>>
>>>>> And that the client cannot rely on that. Why not, if the server must
>>>>> have a valid strategy.
>>>>>
>>>>> Is this just prudent "belt and suspenders" design or what?
>>>>>
>>>>> It seems to me that if one side here is MUST (and the spec needs to be
>>>>> clearer about what might or might not constitute a valid strategy),
>>>> then
>>>>> the other side should be SHOULD.
>>>>>
>>>>> If both sides are "MUST", then if things don't work out then the
>>>> client
>>>>> and server can equally point to one another and say "It's his fault".
>>>>>
>>>>> Am I missing something here?
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf
>>>>> Of Trond Myklebust
>>>>> Sent: Wednesday, July 07, 2010 5:01 PM
>>>>> To: Muntz, Daniel
>>>>> Cc: linux-nfs@vger.kernel.org; garth@panasas.com; welch@panasas.com;
>>>>> nfsv4@ietf.org; andros@netapp.com; bhalevy@panasas.com
>>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
>>>>>
>>>>> On Wed, 2010-07-07 at 16:39 -0400, Daniel.Muntz@emc.com wrote:
>>>>>> To bring this discussion full circle, since we agree that a
>>>> compliant
>>>>>> server can implement a scheme where written data does not become
>>>>> visible
>>>>>> until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
>>>>>> "MUST" from a compliant client (independent of layout type)?
>>>>>
>>>>> Yes. I would agree that the client cannot rely on the updates being
>>>> made
>>>>> visible if it fails to send the LAYOUTCOMMIT. My point was simply that
>>>> a
>>>>> compliant server MUST also have a valid strategy for dealing with the
>>>>> case where the client doesn't send it.
>>>>>
>>>>> Cheers
>>>>> Trond
>>>>>
>>>>>> -Dan
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org]
>>>>>>> On Behalf Of Trond Myklebust
>>>>>>> Sent: Wednesday, July 07, 2010 7:04 AM
>>>>>>> To: Benny Halevy
>>>>>>> Cc: andros@netapp.com; linux-nfs@vger.kernel.org; Garth
>>>>>>> Gibson; Brent Welch; NFSv4
>>>>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
>>>>>>>
>>>>>>> On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
>>>>>>>> On Jul. 07, 2010, 16:18 +0300, Trond Myklebust
>>>>>>> <Trond.Myklebust@netapp.com> wrote:
>>>>>>>>> On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
>>>>>>>>>> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
>>>>>>>>>>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust
>>>>>>> <trond.myklebust@fys.uio.no> wrote:
>>>>>>>>>>>> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@emc.com
>>>>> wrote:
>>>>>>>>>>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I
>>>> see it as
>>>>>>>>>>>>> orthogonal to updating the metadata on the MDS (but
>>>> perhaps I'm wrong).
>>>>>>>>>>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a
>>>> synchronization
>>>>>>>>>>>>> point, so even if the non-clustered server does not want
>>>> to update
>>>>>>>>>>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a
>>>> trigger to
>>>>>>>>>>>>> execute whatever synchronization mechanism the implementer
>>>> wishes to put
>>>>>>>>>>>>> in the control protocol.
>>>>>>>>>>>>
>>>>>>>>>>>> As far as I'm aware, there are no exceptions in RFC5661
>>>> that would allow
>>>>>>>>>>>> pNFS servers to break the rule that any visible change to
>>>> the data must
>>>>>>>>>>>> be atomically accompanied with a change attribute update.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Trond, I'm not sure how this rule you mentioned is
>>>> specified.
>>>>>>>>>>>
>>>>>>>>>>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and
>>>> change/time_modify
>>>>>>>>>>> in particular:
>>>>>>>>>>>
>>>>>>>>>>>    For some layout protocols, the storage device is able to
>>>> notify the
>>>>>>>>>>>    metadata server of the occurrence of an I/O; as a result,
>>>> the change
>>>>>>>>>>>    and time_modify attributes may be updated at the metadata
>>>> server.
>>>>>>>>>>>    For a metadata server that is capable of monitoring
>>>> updates to the
>>>>>>>>>>>    change and time_modify attributes, LAYOUTCOMMIT
>>>> processing is not
>>>>>>>>>>>    required to update the change attribute. In this case,
>>>> the metadata
>>>>>>>>>>>    server must ensure that no further update to the data has
>>>> occurred
>>>>>>>>>>>    since the last update of the attributes; file-based
>>>> protocols may
>>>>>>>>>>>    have enough information to make this determination or may
>>>> update the
>>>>>>>>>>>    change attribute upon each file modification. This also
>>>> applies for
>>>>>>>>>>>    the time_modify attribute. If the server implementation
>>>> is able to
>>>>>>>>>>>    determine that the file has not been modified since the
>>>> last
>>>>>>>>>>>    time_modify update, the server need not update
>>>> time_modify at
>>>>>>>>>>>    LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated
>>>> attributes
>>>>>>>>>>>    should be visible if that file was modified since the
>>>> latest previous
>>>>>>>>>>>    LAYOUTCOMMIT or LAYOUTGET
>>>>>>>>>>
>>>>>>>>>> I know. However the above paragraph does not state that the
>>>> server
>>>>>>>>>> should make those changes visible to clients other than the
>>>> one that is
>>>>>>>>>> writing.
>>>>>>>>>>
>>>>>>>>>> Section 18.32.4 states that writes will cause the
>>>> time_modified and
>>>>>>>>>> change attributes to be updated (if and only if the file data
>>>> is
>>>>>>>>>> modified). Several other sections rely on this behaviour,
>>>> including
>>>>>>>>>> section 10.3.1, section 11.7.2.2, and section 11.7.7.
>>>>>>>>>>
>>>>>>>>>> The only 'special behaviour' that I see allowed for pNFS is
>>>> in section
>>>>>>>>>> 13.10, which states that clients can't expect to see changes
>>>>>>>>>> immediately, but that they must be able to expect
>>>> close-to-open
>>>>>>>>>> semantics to work. Again, if this is to be the case, then the
>>>> server
>>>>>>>>>> _must_ be able to deal with the case where client 1 dies
>>>> before it can
>>>>>>>>>> issue the LAYOUTCOMMIT.
>>>>>>>>
>>>>>>>> Agreed.
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> As I see it, if your server allows one client to read data
>>>> that may have
>>>>>>>>>>>> been modified by another client that holds a WRITE layout
>>>> for that range
>>>>>>>>>>>> then (since that is a visible data change) it should
>>>> provide a change
>>>>>>>>>>>> attribute update irrespective of whether or not a
>>>> LAYOUTCOMMIT has been
>>>>>>>>>>>> sent.
>>>>>>>>>>>
>>>>>>>>>>> the requirement for the server in WRITE's implementation
>>>> section
>>>>>>>>>>> is quite weak: "It is assumed that the act of writing data
>>>> to a file will
>>>>>>>>>>> cause the time_modified and change attributes of the file to
>>>> be updated."
>>>>>>>>>>>
>>>>>>>>>>> The difference here is that for pNFS the written data is not
>>>> guaranteed
>>>>>>>>>>> to be visible until LAYOUTCOMMIT. In a broader sense,
>>>> assuming the clients
>>>>>>>>>>> are caching dirty data and use a write-behind cache,
>>>> application-written data
>>>>>>>>>>> may be visible to other processes on the same host but not
>>>> to others until
>>>>>>>>>>> fsync() or close() - open-to-close semantics are the only
>>>> thing the client
>>>>>>>>>>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and
>>>> close() ensure the
>>>>>>>>>>> data is committed to stable storage and is visible to all
>>>> other clients in
>>>>>>>>>>> the cluster.
>>>>>>>>>>
>>>>>>>>>> See above. I'm not disputing your statement that 'the written
>>>> data is
>>>>>>>>>> not guaranteed to be visible until LAYOUTCOMMIT'. I am
>>>> disputing an
>>>>>>>>>> assumption that 'the written data may be visible without an
>>>> accompanying
>>>>>>>>>> change attribute update'.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In other words, I'd expect the following scenario to give the
>>>> same
>>>>>>>>> results in NFSv4.1 w/pNFS as it does in NFSv4:
>>>>>>>>
>>>>>>>> That's a strong requirement that may limit the scalability of
>>>> the server.
>>>>>>>>
>>>>>>>> The spirit of the pNFS operations, at least from Panasas
>>>> perspective was that
>>>>>>>> the data is transient until LAYOUTCOMMIT, meaning it may or may
>>>> not be visible
>>>>>>>> to clients other than the one who wrote it, and its associated
>>>> metadata MUST
>>>>>>>> be updated and describe the new data only on LAYOUTCOMMIT and
>>>> until then it's
>>>>>>>> undefined, i.e. it's up to the server implementation whether to
>>>> update it or not.
>>>>>>>>
>>>>>>>> Without locking, what do the stronger semantics buy you?
>>>>>>>> Even if a client verified the change_attribute new data may
>>>> become visible
>>>>>>>> at any time after the GETATTR if the file/byte range aren't
>>>> locked.
>>>>>>>
>>>>>>> There is no locking needed in the scenario below: it is ordinary
>>>>>>> close-to-open semantics.
>>>>>>>
>>>>>>> The point is that if you remove the one and only way that clients
>>>> have
>>>>>>> to determine whether or not their data caches are valid, then they
>>>> can
>>>>>>> no longer cache data at all, and server scalability will be shot
>>>> to
>>>>>>> smithereens anyway.
>>>>>>>
>>>>>>> Trond
>>>>>>>
>>>>>>>> Benny
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Client 1                        Client 2
>>>>>>>>> ========                        ========
>>>>>>>>>
>>>>>>>>> OPEN foo
>>>>>>>>> READ
>>>>>>>>> CLOSE
>>>>>>>>>                                 OPEN
>>>>>>>>>                                 LAYOUTGET ...
>>>>>>>>>                                 WRITE via DS
>>>>>>>>>                                 <dies>...
>>>>>>>>> OPEN foo
>>>>>>>>> verify change_attr
>>>>>>>>> READ if above WRITE is visible
>>>>>>>>> CLOSE
>>>>>>>>>
>>>>>>>>> Trond
>>>>>>>>> _______________________________________________
>>>>>>>>> nfsv4 mailing list
>>>>>>>>> nfsv4@ietf.org
>>>>>>>>> https://www.ietf.org/mailman/listinfo/nfsv4
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 38+ messages in thread
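[Illustrative sketch, not part of any message in the thread.] Trond's two-client scenario above can be replayed with a toy model. The Python below is a minimal sketch under stated assumptions: ToyServer, ToyClient and run_scenario are hypothetical names, the "server" is a single in-memory file, and the flag bump_change_on_ds_write stands in for whatever mechanism a real metadata server would use to update the change attribute when a data-server write becomes visible. It only shows why a client that revalidates its cache with the change attribute at OPEN reads stale data if the write becomes visible without a change attribute update.

```python
class ToyServer:
    """Hypothetical single-file server; not modelled on any real implementation."""

    def __init__(self, bump_change_on_ds_write):
        self.data = b"old"
        self.change = 0  # stands in for the NFSv4 change attribute
        self.bump_change_on_ds_write = bump_change_on_ds_write

    def getattr_change(self):
        return self.change

    def read(self):
        return self.data

    def write_via_ds(self, data):
        # The write becomes visible to readers immediately (assumption).
        self.data = data
        if self.bump_change_on_ds_write:
            self.change += 1


class ToyClient:
    """Client doing close-to-open revalidation with the change attribute."""

    def __init__(self, server):
        self.server = server
        self.cached_change = None
        self.cached_data = None

    def open_read_close(self):
        change = self.server.getattr_change()  # "verify change_attr" at OPEN
        if change != self.cached_change:
            # Cache is considered invalid: fetch the data again.
            self.cached_data = self.server.read()
            self.cached_change = change
        return self.cached_data


def run_scenario(bump_change_on_ds_write):
    server = ToyServer(bump_change_on_ds_write)
    client1 = ToyClient(server)
    first = client1.open_read_close()   # client 1: OPEN, READ, CLOSE
    server.write_via_ds(b"new")         # client 2: WRITE via DS, then dies
    second = client1.open_read_close()  # client 1: OPEN, verify, READ, CLOSE
    return first, second
```

With the change attribute updated, the second read sees the new data; without it, client 1 keeps trusting its cache even though the server would now return different bytes.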
* RE: [nfsv4] 4.1 client - LAYOUTCOMMIT & close 2010-07-08 15:59 ` Benny Halevy @ 2010-07-08 20:30 ` david.black 2010-07-08 21:16 ` Trond Myklebust 2010-07-08 22:12 ` sfaibish 0 siblings, 2 replies; 38+ messages in thread From: david.black @ 2010-07-08 20:30 UTC (permalink / raw) To: bhalevy, trond.myklebust Cc: Noveck_David, Daniel.Muntz, linux-nfs, garth, welch, nfsv4, andros

> Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the client hasn't
> written to the file.  I'm not sure what about the blocks case though, do you
> implicitly free up any provisionally allocated blocks that the client had not
> explicitly committed using LAYOUTCOMMIT?

In principle, yes as the blocks are no longer promised to the client, although
lazy evaluation of this is an obvious optimization.

> >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
> >> check that it has received LAYOUTCOMMITs from any other clients that may
> >> have the file open for writing. If it hasn't, then it MUST take some
> >> action to ensure that any file data changes are accompanied by a change
> >                            ^ potentially visible
> >> attribute update."
>
> That should be OK as long as it's not for every GETATTR for the change, mtime,
> or size attributes.
>
> >>
> >> Then you can add the above suggestion without the offending caveat. Note
> >> however that it does break the "SHOULD NOT" admonition in section
> >> 18.32.4.
>
> Better be safe than sorry in this rare error case.

I concur with Benny on both of the above - in essence, the unrecovered client failure is a reason to potentially ignore the "SHOULD" (server can't know whether it actually ignored the "SHOULD", hence better safe than sorry).  We probably ought to find a someplace appropriate to add a paragraph or two explaining this in one of the 4.2 documents.

Thanks,
--David

^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: 4.1 client - LAYOUTCOMMIT & close
  2010-07-08 20:30 ` [nfsv4] " david.black
@ 2010-07-08 21:16   ` Trond Myklebust
  2010-07-08 23:51     ` Daniel.Muntz
       [not found]     ` <1278623771.13551.54.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  2010-07-08 22:12   ` sfaibish
  1 sibling, 2 replies; 38+ messages in thread
From: Trond Myklebust @ 2010-07-08 21:16 UTC (permalink / raw)
  To: david.black; +Cc: linux-nfs, garth, welch, nfsv4, andros, bhalevy

On Thu, 2010-07-08 at 16:30 -0400, david.black@emc.com wrote:
> > Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the client hasn't
> > written to the file. I'm not sure what about the blocks case though, do you
> > implicitly free up any provisionally allocated blocks that the client had not
> > explicitly committed using LAYOUTCOMMIT?
>
> In principle, yes as the blocks are no longer promised to the client, although
> lazy evaluation of this is an obvious optimization.
>
> > >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
> > >> check that it has received LAYOUTCOMMITs from any other clients that may
> > >> have the file open for writing. If it hasn't, then it MUST take some
> > >> action to ensure that any file data changes are accompanied by a change
> > > ^ potentially visible
> > >> attribute update."
> >
> > That should be OK as long as it's not for every GETATTR for the change, mtime,
> > or size attributes.
> >
> > >>
> > >> Then you can add the above suggestion without the offending caveat. Note
> > >> however that it does break the "SHOULD NOT" admonition in section
> > >> 18.32.4.
> >
> > Better be safe than sorry in this rare error case.
>
> I concur with Benny on both of the above - in essence, the unrecovered client failure is a reason to potentially ignore the "SHOULD" (server can't know whether it actually ignored the "SHOULD", hence better safe than sorry). We probably ought to find a someplace appropriate to add a paragraph or two explaining this in one of the 4.2 documents.

Right. I'm only interested in fixing the close-to-open case. The case of
general GETATTR calls might be nice to fix too, but it should not be
essential in order to ensure that well-behaved applications continue to
work as expected.

Note, however, that legacy support for stateless protocols like NFSv2
and NFSv3 may be problematic: there is no equivalent of OPEN, and so the
server may have to do the above check on all NFSPROC2_GETATTR,
NFSPROC3_GETATTR, NFSPROC2_LOOKUP and NFSPROC3_LOOKUP requests.

Trond

> Thanks,
> --David
>
>
> > -----Original Message-----
> > From: Benny Halevy [mailto:bhalevy.lists@gmail.com] On Behalf Of Benny Halevy
> > Sent: Thursday, July 08, 2010 12:00 PM
> > To: Trond Myklebust
> > Cc: Black, David; Noveck, David; Muntz, Daniel; linux-nfs@vger.kernel.org; garth@panasas.com;
> > welch@panasas.com; nfsv4@ietf.org; andros@netapp.com
> > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> >
> > On Jul. 08, 2010, 2:14 +0300, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> > > On Wed, 2010-07-07 at 19:09 -0400, Trond Myklebust wrote:
> > >> On Wed, 2010-07-07 at 18:52 -0400, Trond Myklebust wrote:
> > >>> On Wed, 2010-07-07 at 18:44 -0400, david.black@emc.com wrote:
> > >>>> Let me try this ...
> > >>>>
> > >>>> A correct client will always send LAYOUTCOMMIT.
> > >>>> Assume that the client is correct.
> > >>>> Hence if the LAYOUTCOMMIT doesn't arrive, something's failed.
> > >>>>
> > >>>> Important implication: No LAYOUTCOMMIT is an error/failure case. It
> > >>>> just has to work; it doesn't have to be fast.
> > >>>>
> >
> > Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the client hasn't
> > written to the file. I'm not sure what about the blocks case though, do you
> > implicitly free up any provisionally allocated blocks that the client had not
> > explicitly committed using LAYOUTCOMMIT?
> > > > >>>> Suggestion: If a client dies while holding writeable layouts that permit > > >>>> write-in-place, and the client doesn't reappear or doesn't reclaim those > > >>>> layouts, then the server should assume that the files involved were > > >>>> written before the client died, and set the file attributes accordingly > > >>>> as part of internally reclaiming the layout that the client has > > >>>> abandoned. > > > > Of course. That's part of the server recovery. > > > > >>>> > > >>>> Caveat: It may take a while for the server to determine that the client > > >>>> has abandoned a layout. > > > > That's two lease times after a respective CB_LAYOUTRECALL. > > > > >>>> > > >>>> This can result in false positives (file appears to be modified when it > > >>>> wasn't) but won't yield false negatives (file does not appear to be > > >>>> modified even though it was modified). > > >>> > > >>> OK... So we're going to have to turn off client side file caching > > >>> entirely for pNFS? I can do that... > > >>> > > >>> The above won't work. Think readahead... > > >> > > >> So... What can work, is if you modify it to work explicitly for > > >> close-to-open > > >> > > >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must > > >> check that it has received LAYOUTCOMMITs from any other clients that may > > >> have the file open for writing. If it hasn't, then it MUST take some > > >> action to ensure that any file data changes are accompanied by a change > > > ^ potentially visible > > >> attribute update." > > > > That should be OK as long as it's not for every GETATTR for the change, mtime, > > or size attributes. > > > > >> > > >> Then you can add the above suggestion without the offending caveat. Note > > >> however that it does break the "SHOULD NOT" admonition in section > > >> 18.32.4. > > > > Better be safe than sorry in this rare error case. 
> > > > Benny > > > > >> > > >> Trond > > >> > > >> > > >>> Trond > > >>> > > >>>> Thanks, > > >>>> --David > > >>>> > > >>>>> -----Original Message----- > > >>>>> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf > > >>>> Of Noveck_David@emc.com > > >>>>> Sent: Wednesday, July 07, 2010 6:04 PM > > >>>>> To: Trond.Myklebust@netapp.com; Muntz, Daniel > > >>>>> Cc: linux-nfs@vger.kernel.org; garth@panasas.com; welch@panasas.com; > > >>>> nfsv4@ietf.org; > > >>>>> andros@netapp.com; bhalevy@panasas.com > > >>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close > > >>>>> > > >>>>>> Yes. I would agree that the client cannot rely on the updates being > > >>>> made > > >>>>>> visible if it fails to send the LAYOUTCOMMIT. My point was simply > > >>>> that a > > >>>>>> compliant server MUST also have a valid strategy for dealing with > > >>>> the > > >>>>>> case where the client doesn't send it. > > >>>>> > > >>>>> So you are saying the updates "MUST be made visible" through the > > >>>>> server's valid strategy. Is that right. > > >>>>> > > >>>>> And that the client cannot rely on that. Why not, if the server must > > >>>>> have a valid strategy. > > >>>>> > > >>>>> Is this just prudent "belt and suspenders" design or what? > > >>>>> > > >>>>> It seems to me that if one side here is MUST (and the spec needs to be > > >>>>> clearer about what might or might not constitute a valid strategy), > > >>>> then > > >>>>> the other side should be SHOULD. > > >>>>> > > >>>>> If both sides are "MUST", then if things don't work out then the > > >>>> client > > >>>>> and server can equally point to one another and say "It's his fault". > > >>>>> > > >>>>> Am I missing something here? 
> > >>>>> > > >>>>> > > >>>>> > > >>>>> -----Original Message----- > > >>>>> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf > > >>>>> Of Trond Myklebust > > >>>>> Sent: Wednesday, July 07, 2010 5:01 PM > > >>>>> To: Muntz, Daniel > > >>>>> Cc: linux-nfs@vger.kernel.org; garth@panasas.com; welch@panasas.com; > > >>>>> nfsv4@ietf.org; andros@netapp.com; bhalevy@panasas.com > > >>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close > > >>>>> > > >>>>> On Wed, 2010-07-07 at 16:39 -0400, Daniel.Muntz@emc.com wrote: > > >>>>>> To bring this discussion full circle, since we agree that a > > >>>> compliant > > >>>>>> server can implement a scheme where written data does not become > > >>>>> visible > > >>>>>> until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a > > >>>>>> "MUST" from a compliant client (independent of layout type)? > > >>>>> > > >>>>> Yes. I would agree that the client cannot rely on the updates being > > >>>> made > > >>>>> visible if it fails to send the LAYOUTCOMMIT. My point was simply that > > >>>> a > > >>>>> compliant server MUST also have a valid strategy for dealing with the > > >>>>> case where the client doesn't send it. > > >>>>> > > >>>>> Cheers > > >>>>> Trond > > >>>>> > > >>>>>> -Dan > > >>>>>> > > >>>>>>> -----Original Message----- > > >>>>>>> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] > > >>>>>>> On Behalf Of Trond Myklebust > > >>>>>>> Sent: Wednesday, July 07, 2010 7:04 AM > > >>>>>>> To: Benny Halevy > > >>>>>>> Cc: andros@netapp.com; linux-nfs@vger.kernel.org; Garth > > >>>>>>> Gibson; Brent Welch; NFSv4 > > >>>>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close > > >>>>>>> > > >>>>>>> On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote: > > >>>>>>>> On Jul. 
07, 2010, 16:18 +0300, Trond Myklebust > > >>>>>>> <Trond.Myklebust@netapp.com> wrote: > > >>>>>>>>> On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote: > > >>>>>>>>>> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote: > > >>>>>>>>>>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust > > >>>>>>> <trond.myklebust@fys.uio.no> wrote: > > >>>>>>>>>>>> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@emc.com > > >>>>> wrote: > > >>>>>>>>>>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I > > >>>> see it as > > >>>>>>>>>>>>> orthogonal to updating the metadata on the MDS (but > > >>>> perhaps I'm wrong). > > >>>>>>>>>>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a > > >>>> synchronization > > >>>>>>>>>>>>> point, so even if the non-clustered server does not want > > >>>> to update > > >>>>>>>>>>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a > > >>>> trigger to > > >>>>>>>>>>>>> execute whatever synchronization mechanism the implementer > > >>>> wishes to put > > >>>>>>>>>>>>> in the control protocol. > > >>>>>>>>>>>> > > >>>>>>>>>>>> As far as I'm aware, there are no exceptions in RFC5661 > > >>>> that would allow > > >>>>>>>>>>>> pNFS servers to break the rule that any visible change to > > >>>> the data must > > >>>>>>>>>>>> be atomically accompanied with a change attribute update. > > >>>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> Trond, I'm not sure how this rule you mentioned is > > >>>> specified. > > >>>>>>>>>>> > > >>>>>>>>>>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and > > >>>> change/time_modify > > >>>>>>>>>>> in particular: > > >>>>>>>>>>> > > >>>>>>>>>>> For some layout protocols, the storage device is able to > > >>>> notify the > > >>>>>>>>>>> metadata server of the occurrence of an I/O; as a result, > > >>>> the change > > >>>>>>>>>>> and time_modify attributes may be updated at the metadata > > >>>> server. 
> > >>>>>>>>>>> For a metadata server that is capable of monitoring > > >>>> updates to the > > >>>>>>>>>>> change and time_modify attributes, LAYOUTCOMMIT > > >>>> processing is not > > >>>>>>>>>>> required to update the change attribute. In this case, > > >>>> the metadata > > >>>>>>>>>>> server must ensure that no further update to the data has > > >>>> occurred > > >>>>>>>>>>> since the last update of the attributes; file-based > > >>>> protocols may > > >>>>>>>>>>> have enough information to make this determination or may > > >>>> update the > > >>>>>>>>>>> change attribute upon each file modification. This also > > >>>> applies for > > >>>>>>>>>>> the time_modify attribute. If the server implementation > > >>>> is able to > > >>>>>>>>>>> determine that the file has not been modified since the > > >>>> last > > >>>>>>>>>>> time_modify update, the server need not update > > >>>> time_modify at > > >>>>>>>>>>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated > > >>>> attributes > > >>>>>>>>>>> should be visible if that file was modified since the > > >>>> latest previous > > >>>>>>>>>>> LAYOUTCOMMIT or LAYOUTGET > > >>>>>>>>>> > > >>>>>>>>>> I know. However the above paragraph does not state that the > > >>>> server > > >>>>>>>>>> should make those changes visible to clients other than the > > >>>> one that is > > >>>>>>>>>> writing. > > >>>>>>>>>> > > >>>>>>>>>> Section 18.32.4 states that writes will cause the > > >>>> time_modified and > > >>>>>>>>>> change attributes to be updated (if and only if the file data > > >>>> is > > >>>>>>>>>> modified). Several other sections rely on this behaviour, > > >>>> including > > >>>>>>>>>> section 10.3.1, section 11.7.2.2, and section 11.7.7. 
> > >>>>>>>>>> > > >>>>>>>>>> The only 'special behaviour' that I see allowed for pNFS is > > >>>> in section > > >>>>>>>>>> 13.10, which states that clients can't expect to see changes > > >>>>>>>>>> immediately, but that they must be able to expect > > >>>> close-to-open > > >>>>>>>>>> semantics to work. Again, if this is to be the case, then the > > >>>> server > > >>>>>>>>>> _must_ be able to deal with the case where client 1 dies > > >>>> before it can > > >>>>>>>>>> issue the LAYOUTCOMMIT. > > >>>>>>>> > > >>>>>>>> Agreed. > > >>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>>>> As I see it, if your server allows one client to read data > > >>>> that may have > > >>>>>>>>>>>> been modified by another client that holds a WRITE layout > > >>>> for that range > > >>>>>>>>>>>> then (since that is a visible data change) it should > > >>>> provide a change > > >>>>>>>>>>>> attribute update irrespective of whether or not a > > >>>> LAYOUTCOMMIT has been > > >>>>>>>>>>>> sent. > > >>>>>>>>>>> > > >>>>>>>>>>> the requirement for the server in WRITE's implementation > > >>>> section > > >>>>>>>>>>> is quite weak: "It is assumed that the act of writing data > > >>>> to a file will > > >>>>>>>>>>> cause the time_modified and change attributes of the file to > > >>>> be updated." > > >>>>>>>>>>> > > >>>>>>>>>>> The difference here is that for pNFS the written data is not > > >>>> guaranteed > > >>>>>>>>>>> to be visible until LAYOUTCOMMIT. In a broader sense, > > >>>> assuming the clients > > >>>>>>>>>>> are caching dirty data and use a write-behind cache, > > >>>> application-written data > > >>>>>>>>>>> may be visible to other processes on the same host but not > > >>>> to others until > > >>>>>>>>>>> fsync() or close() - open-to-close semantics are the only > > >>>> thing the client > > >>>>>>>>>>> guarantees, right? 
Issuing LAYOUTCOMMIT on fsync() and > > >>>> close() ensure the > > >>>>>>>>>>> data is committed to stable storage and is visible to all > > >>>> other clients in > > >>>>>>>>>>> the cluster. > > >>>>>>>>>> > > >>>>>>>>>> See above. I'm not disputing your statement that 'the written > > >>>> data is > > >>>>>>>>>> not guaranteed to be visible until LAYOUTCOMMIT'. I am > > >>>> disputing an > > >>>>>>>>>> assumption that 'the written data may be visible without an > > >>>> accompanying > > >>>>>>>>>> change attribute update'. > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> In other words, I'd expect the following scenario to give the > > >>>> same > > >>>>>>>>> results in NFSv4.1 w/pNFS as it does in NFSv4: > > >>>>>>>> > > >>>>>>>> That's a strong requirement that may limit the scalability of > > >>>> the server. > > >>>>>>>> > > >>>>>>>> The spirit of the pNFS operations, at least from Panasas > > >>>> perspective was that > > >>>>>>>> the data is transient until LAYOUTCOMMIT, meaning it may or may > > >>>> not be visible > > >>>>>>>> to clients other than the one who wrote it, and its associated > > >>>> metadata MUST > > >>>>>>>> be updated and describe the new data only on LAYOUTCOMMIT and > > >>>> until then it's > > >>>>>>>> undefined, i.e. it's up to the server implementation whether to > > >>>> update it or not. > > >>>>>>>> > > >>>>>>>> Without locking, what do the stronger semantics buy you? > > >>>>>>>> Even if a client verified the change_attribute new data may > > >>>> become visible > > >>>>>>>> at any time after the GETATTR if the file/byte range aren't > > >>>> locked. > > >>>>>>> > > >>>>>>> There is no locking needed in the scenario below: it is ordinary > > >>>>>>> close-to-open semantics. 
> > >>>>>>> > > >>>>>>> The point is that if you remove the one and only way that clients > > >>>> have > > >>>>>>> to determine whether or not their data caches are valid, then they > > >>>> can > > >>>>>>> no longer cache data at all, and server scalability will be shot > > >>>> to > > >>>>>>> smithereens anyway. > > >>>>>>> > > >>>>>>> Trond > > >>>>>>> > > >>>>>>>> Benny > > >>>>>>>> > > >>>>>>>>> > > >>>>>>>>> Client 1 Client 2 > > >>>>>>>>> ======== ======== > > >>>>>>>>> > > >>>>>>>>> OPEN foo > > >>>>>>>>> READ > > >>>>>>>>> CLOSE > > >>>>>>>>> OPEN > > >>>>>>>>> LAYOUTGET ... > > >>>>>>>>> WRITE via DS > > >>>>>>>>> <dies>... > > >>>>>>>>> OPEN foo > > >>>>>>>>> verify change_attr > > >>>>>>>>> READ if above WRITE is visible > > >>>>>>>>> CLOSE > > >>>>>>>>> > > >>>>>>>>> Trond > > >>>>>>>>> _______________________________________________ > > >>>>>>>>> nfsv4 mailing list > > >>>>>>>>> nfsv4@ietf.org > > >>>>>>>>> https://www.ietf.org/mailman/listinfo/nfsv4 > > >>>>>>> > > >>>>>>> > > >>>>>>> _______________________________________________ > > >>>>>>> nfsv4 mailing list > > >>>>>>> nfsv4@ietf.org > > >>>>>>> https://www.ietf.org/mailman/listinfo/nfsv4 > > >>>>>>> > > >>>>>>> > > >>>>> > > >>>>> > > >>>>> _______________________________________________ > > >>>>> nfsv4 mailing list > > >>>>> nfsv4@ietf.org > > >>>>> https://www.ietf.org/mailman/listinfo/nfsv4 > > >>>>> > > >>>>> _______________________________________________ > > >>>>> nfsv4 mailing list > > >>>>> nfsv4@ietf.org > > >>>>> https://www.ietf.org/mailman/listinfo/nfsv4 > > >>>> > > >>> > > >>> > > >> > > >> > > >> -- > > >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > > >> the body of a message to majordomo@vger.kernel.org > > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > > > > > > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > > > the body of a message to majordomo@vger.kernel.org > > > More 
majordomo info at http://vger.kernel.org/majordomo-info.html > > > > _______________________________________________ > nfsv4 mailing list > nfsv4@ietf.org > https://www.ietf.org/mailman/listinfo/nfsv4 _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www.ietf.org/mailman/listinfo/nfsv4 ^ permalink raw reply [flat|nested] 38+ messages in thread
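The rule proposed in the message above (on OPEN, check for other clients that may have written through a layout without sending LAYOUTCOMMIT, and make sure the change attribute moves) can be illustrated with a toy model. This is only a sketch under stated assumptions: `MetadataServer`, `FileState`, and every method name are hypothetical, the "server" is an in-memory dictionary, and nothing here is taken from RFC 5661 text or from the Linux client or server code. It replays the Client 1 / Client 2 scenario quoted in the thread.

```python
from dataclasses import dataclass, field


@dataclass
class FileState:
    change_attr: int = 0
    # Clients holding a read/write layout that have not sent LAYOUTCOMMIT yet.
    uncommitted_writers: set = field(default_factory=set)


class MetadataServer:
    """Toy metadata server; all names are hypothetical illustrations."""

    def __init__(self):
        self.files = {}

    def _state(self, name):
        return self.files.setdefault(name, FileState())

    def layoutget_rw(self, client, name):
        # The client may now write to the data servers behind the MDS's back.
        self._state(name).uncommitted_writers.add(client)

    def layoutcommit(self, client, name):
        state = self._state(name)
        if client in state.uncommitted_writers:
            state.uncommitted_writers.remove(client)
            state.change_attr += 1  # simplification: every commit bumps change

    def open(self, client, name):
        state = self._state(name)
        # On OPEN, look for *other* clients with an outstanding write layout.
        # The server cannot tell whether they wrote, so it assumes they did;
        # this allows false positives but never a false negative.
        if state.uncommitted_writers - {client}:
            state.change_attr += 1
        return state.change_attr


mds = MetadataServer()
before = mds.open("client1", "foo")   # client 1: OPEN foo, READ, CLOSE
mds.open("client2", "foo")            # client 2: OPEN
mds.layoutget_rw("client2", "foo")    # client 2: LAYOUTGET, WRITE via DS, <dies>
after = mds.open("client1", "foo")    # client 1: OPEN foo, verify change_attr
print(before, after)
```

In this model client 1 sees a different change attribute on its second OPEN, so it would invalidate its cached data even though client 2 never sent LAYOUTCOMMIT. A real server would also have to apply the same check on LOCK and WANT_DELEGATION, which the sketch omits.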
* Re: 4.1 client - LAYOUTCOMMIT & close
  2010-07-08 21:16   ` Trond Myklebust
@ 2010-07-08 23:51     ` Daniel.Muntz
       [not found]     ` <1278623771.13551.54.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  1 sibling, 0 replies; 38+ messages in thread
From: Daniel.Muntz @ 2010-07-08 23:51 UTC (permalink / raw)
  To: trond.myklebust, david.black
  Cc: linux-nfs, garth, welch, nfsv4, andros, bhalevy

> -----Original Message-----
> From: linux-nfs-owner@vger.kernel.org
> [mailto:linux-nfs-owner@vger.kernel.org] On Behalf Of Trond Myklebust
> Sent: Thursday, July 08, 2010 2:16 PM
> To: Black, David
> Cc: bhalevy@panasas.com; linux-nfs@vger.kernel.org;
> garth@panasas.com; welch@panasas.com; nfsv4@ietf.org;
> andros@netapp.com
> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
>
> On Thu, 2010-07-08 at 16:30 -0400, david.black@emc.com wrote:
> > > Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the client hasn't
> > > written to the file. I'm not sure what about the blocks case though, do you
> > > implicitly free up any provisionally allocated blocks that the client had not
> > > explicitly committed using LAYOUTCOMMIT?
> >
> > In principle, yes as the blocks are no longer promised to the client, although
> > lazy evaluation of this is an obvious optimization.
> >
> > > >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
> > > >> check that it has received LAYOUTCOMMITs from any other clients that may
> > > >> have the file open for writing. If it hasn't, then it MUST take some
> > > >> action to ensure that any file data changes are accompanied by a change
> > > > ^ potentially visible
> > > >> attribute update."
> > >
> > > That should be OK as long as it's not for every GETATTR for the change, mtime,
> > > or size attributes.
> > >
> > > >>
> > > >> Then you can add the above suggestion without the offending caveat. Note
> > > >> however that it does break the "SHOULD NOT" admonition in section
> > > >> 18.32.4.
> > >
> > > Better be safe than sorry in this rare error case.
> >
> > I concur with Benny on both of the above - in essence, the unrecovered client failure is a reason to potentially ignore the "SHOULD" (server can't know whether it actually ignored the "SHOULD", hence better safe than sorry). We probably ought to find a someplace appropriate to add a paragraph or two explaining this in one of the 4.2 documents.
>
> Right. I'm only interested in fixing the close-to-open case. The case of
> general GETATTR calls might be nice to fix too, but it should not be
> essential in order to ensure that well-behaved applications continue to
> work as expected.

I think we have close-to-open covered. A client will do a LAYOUTCOMMIT
after the last WRITE before a CLOSE (otherwise it has no guarantee that
the data becomes "visible"). So, written data may be "visible" to other
clients without the change attribute being updated, *but* at CLOSE time
we are guaranteed the change attribute is updated.

In the failure case (client dies before sending LAYOUTCOMMIT and/or
CLOSE), the server will eventually have to close the file. At this
point, the server can, e.g., use its knowledge of the layout(s) that may
have been used by the client to check DSs (via control protocol) to
synthesize the appropriate attributes including change attribute and set
them before completing the server close operation. This is hand-wavy,
but I think there's a way to solve close-to-open without updating the
change attribute with every DS write.

However, I think we may still have a problem when locking/delegations
are combined with client caching and attempting to decouple the DS write
from the change attribute update. I'm still looking into this.
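The failure-case idea above (the server closes the file on behalf of a dead client and synthesizes attributes from what the data servers report) could look roughly like the following. This is a sketch only, and it rests on loud assumptions: the control protocol is not specified by RFC 5661, so the per-DS report with a write counter and a highest written offset (`DsReport`), the `seen_writes` bookkeeping, and the function `close_abandoned_file` are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DsReport:
    write_count: int       # hypothetical per-file write counter kept by a DS
    last_byte_offset: int  # highest file offset written via this DS, -1 if none


@dataclass
class MdsAttrs:
    change: int
    size: int
    seen_writes: int  # DS writes already reflected in `change`


def close_abandoned_file(attrs, reports):
    """Server-side close for a client that died before LAYOUTCOMMIT/CLOSE."""
    total = sum(r.write_count for r in reports)
    if total != attrs.seen_writes:
        # Some DS write happened that the MDS never heard about:
        # bump the change attribute once and extend the size if needed.
        attrs.change += 1
        attrs.seen_writes = total
        highest = max((r.last_byte_offset for r in reports), default=-1)
        attrs.size = max(attrs.size, highest + 1)
    return attrs


attrs = MdsAttrs(change=7, size=4096, seen_writes=3)
reports = [
    DsReport(write_count=2, last_byte_offset=4095),
    DsReport(write_count=3, last_byte_offset=8191),
]
close_abandoned_file(attrs, reports)
print(attrs)
```

The change attribute is updated once, at server close time, rather than on every DS write, which is the decoupling the message argues for. Calling the function again with the same reports leaves the attributes untouched, so the recovery path is idempotent in this model.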
> > Note, however, that legacy support for stateless protocols like NFSv2 > and NFSv3 may be problematic: there is no equivalent of OPEN, > and so the > server may have to do the above check on all NFSPROC2_GETATTR, > NFSPROC3_GETATTR, NFSPROC2_LOOKUP and NFSPROC3_LOOKUP requests. > > Trond > > > Thanks, > > --David > > > > > > > -----Original Message----- > > > From: Benny Halevy [mailto:bhalevy.lists@gmail.com] On > Behalf Of Benny Halevy > > > Sent: Thursday, July 08, 2010 12:00 PM > > > To: Trond Myklebust > > > Cc: Black, David; Noveck, David; Muntz, Daniel; > linux-nfs@vger.kernel.org; garth@panasas.com; > > > welch@panasas.com; nfsv4@ietf.org; andros@netapp.com > > > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close > > > > > > On Jul. 08, 2010, 2:14 +0300, Trond Myklebust > <trond.myklebust@fys.uio.no> wrote: > > > > On Wed, 2010-07-07 at 19:09 -0400, Trond Myklebust wrote: > > > >> On Wed, 2010-07-07 at 18:52 -0400, Trond Myklebust wrote: > > > >>> On Wed, 2010-07-07 at 18:44 -0400, david.black@emc.com wrote: > > > >>>> Let me try this ... > > > >>>> > > > >>>> A correct client will always send LAYOUTCOMMIT. > > > >>>> Assume that the client is correct. > > > >>>> Hence if the LAYOUTCOMMIT doesn't arrive, something's failed. > > > >>>> > > > >>>> Important implication: No LAYOUTCOMMIT is an > error/failure case. It > > > >>>> just has to work; it doesn't have to be fast. > > > >>>> > > > > > > Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT > if the client hasn't > > > written to the file. I'm not sure what about the blocks > case though, do you > > > implicitly free up any provisionally allocated blocks > that the client had not > > > explicitly committed using LAYOUTCOMMIT? 
> > > > > > >>>> Suggestion: If a client dies while holding writeable > layouts that permit > > > >>>> write-in-place, and the client doesn't reappear or > doesn't reclaim those > > > >>>> layouts, then the server should assume that the > files involved were > > > >>>> written before the client died, and set the file > attributes accordingly > > > >>>> as part of internally reclaiming the layout that the > client has > > > >>>> abandoned. > > > > > > Of course. That's part of the server recovery. > > > > > > >>>> > > > >>>> Caveat: It may take a while for the server to > determine that the client > > > >>>> has abandoned a layout. > > > > > > That's two lease times after a respective CB_LAYOUTRECALL. > > > > > > >>>> > > > >>>> This can result in false positives (file appears to > be modified when it > > > >>>> wasn't) but won't yield false negatives (file does > not appear to be > > > >>>> modified even though it was modified). > > > >>> > > > >>> OK... So we're going to have to turn off client side > file caching > > > >>> entirely for pNFS? I can do that... > > > >>> > > > >>> The above won't work. Think readahead... > > > >> > > > >> So... What can work, is if you modify it to work explicitly for > > > >> close-to-open > > > >> > > > >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, > the server must > > > >> check that it has received LAYOUTCOMMITs from any > other clients that may > > > >> have the file open for writing. If it hasn't, then it > MUST take some > > > >> action to ensure that any file data changes are > accompanied by a change > > > > ^ potentially visible > > > >> attribute update." > > > > > > That should be OK as long as it's not for every GETATTR > for the change, mtime, > > > or size attributes. > > > > > > >> > > > >> Then you can add the above suggestion without the > offending caveat. Note > > > >> however that it does break the "SHOULD NOT" admonition > in section > > > >> 18.32.4. 
> > > > > > Better be safe than sorry in this rare error case. > > > > > > Benny > > > > > > >> > > > >> Trond > > > >> > > > >> > > > >>> Trond > > > >>> > > > >>>> Thanks, > > > >>>> --David > > > >>>> > > > >>>>> -----Original Message----- > > > >>>>> From: nfsv4-bounces@ietf.org > [mailto:nfsv4-bounces@ietf.org] On Behalf > > > >>>> Of Noveck_David@emc.com > > > >>>>> Sent: Wednesday, July 07, 2010 6:04 PM > > > >>>>> To: Trond.Myklebust@netapp.com; Muntz, Daniel > > > >>>>> Cc: linux-nfs@vger.kernel.org; garth@panasas.com; > welch@panasas.com; > > > >>>> nfsv4@ietf.org; > > > >>>>> andros@netapp.com; bhalevy@panasas.com > > > >>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close > > > >>>>> > > > >>>>>> Yes. I would agree that the client cannot rely on > the updates being > > > >>>> made > > > >>>>>> visible if it fails to send the LAYOUTCOMMIT. My > point was simply > > > >>>> that a > > > >>>>>> compliant server MUST also have a valid strategy > for dealing with > > > >>>> the > > > >>>>>> case where the client doesn't send it. > > > >>>>> > > > >>>>> So you are saying the updates "MUST be made > visible" through the > > > >>>>> server's valid strategy. Is that right. > > > >>>>> > > > >>>>> And that the client cannot rely on that. Why not, > if the server must > > > >>>>> have a valid strategy. > > > >>>>> > > > >>>>> Is this just prudent "belt and suspenders" design or what? > > > >>>>> > > > >>>>> It seems to me that if one side here is MUST (and > the spec needs to be > > > >>>>> clearer about what might or might not constitute a > valid strategy), > > > >>>> then > > > >>>>> the other side should be SHOULD. > > > >>>>> > > > >>>>> If both sides are "MUST", then if things don't work > out then the > > > >>>> client > > > >>>>> and server can equally point to one another and say > "It's his fault". > > > >>>>> > > > >>>>> Am I missing something here? 
> > > >>>>> > > > >>>>> > > > >>>>> > > > >>>>> -----Original Message----- > > > >>>>> From: nfsv4-bounces@ietf.org > [mailto:nfsv4-bounces@ietf.org] On Behalf > > > >>>>> Of Trond Myklebust > > > >>>>> Sent: Wednesday, July 07, 2010 5:01 PM > > > >>>>> To: Muntz, Daniel > > > >>>>> Cc: linux-nfs@vger.kernel.org; garth@panasas.com; > welch@panasas.com; > > > >>>>> nfsv4@ietf.org; andros@netapp.com; bhalevy@panasas.com > > > >>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close > > > >>>>> > > > >>>>> On Wed, 2010-07-07 at 16:39 -0400, > Daniel.Muntz@emc.com wrote: > > > >>>>>> To bring this discussion full circle, since we agree that a > > > >>>> compliant > > > >>>>>> server can implement a scheme where written data > does not become > > > >>>>> visible > > > >>>>>> until after a LAYOUTCOMMIT, do we also agree that > LAYOUTCOMMIT is a > > > >>>>>> "MUST" from a compliant client (independent of > layout type)? > > > >>>>> > > > >>>>> Yes. I would agree that the client cannot rely on > the updates being > > > >>>> made > > > >>>>> visible if it fails to send the LAYOUTCOMMIT. My > point was simply that > > > >>>> a > > > >>>>> compliant server MUST also have a valid strategy > for dealing with the > > > >>>>> case where the client doesn't send it. > > > >>>>> > > > >>>>> Cheers > > > >>>>> Trond > > > >>>>> > > > >>>>>> -Dan > > > >>>>>> > > > >>>>>>> -----Original Message----- > > > >>>>>>> From: nfsv4-bounces@ietf.org > [mailto:nfsv4-bounces@ietf.org] > > > >>>>>>> On Behalf Of Trond Myklebust > > > >>>>>>> Sent: Wednesday, July 07, 2010 7:04 AM > > > >>>>>>> To: Benny Halevy > > > >>>>>>> Cc: andros@netapp.com; linux-nfs@vger.kernel.org; Garth > > > >>>>>>> Gibson; Brent Welch; NFSv4 > > > >>>>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close > > > >>>>>>> > > > >>>>>>> On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote: > > > >>>>>>>> On Jul. 
07, 2010, 16:18 +0300, Trond Myklebust > > > >>>>>>> <Trond.Myklebust@netapp.com> wrote: > > > >>>>>>>>> On Wed, 2010-07-07 at 09:06 -0400, Trond > Myklebust wrote: > > > >>>>>>>>>> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote: > > > >>>>>>>>>>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust > > > >>>>>>> <trond.myklebust@fys.uio.no> wrote: > > > >>>>>>>>>>>> On Tue, 2010-07-06 at 15:20 -0400, > Daniel.Muntz@emc.com > > > >>>>> wrote: > > > >>>>>>>>>>>>> The COMMIT to the DS, ttbomk, commits data > on the DS. I > > > >>>> see it as > > > >>>>>>>>>>>>> orthogonal to updating the metadata on the MDS (but > > > >>>> perhaps I'm wrong). > > > >>>>>>>>>>>>> As sjoshi@bluearc mentioned, the > LAYOUTCOMMIT provides a > > > >>>> synchronization > > > >>>>>>>>>>>>> point, so even if the non-clustered server > does not want > > > >>>> to update > > > >>>>>>>>>>>>> metadata on every DS I/O, the LAYOUTCOMMIT > could also be a > > > >>>> trigger to > > > >>>>>>>>>>>>> execute whatever synchronization mechanism > the implementer > > > >>>> wishes to put > > > >>>>>>>>>>>>> in the control protocol. > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> As far as I'm aware, there are no exceptions > in RFC5661 > > > >>>> that would allow > > > >>>>>>>>>>>> pNFS servers to break the rule that any > visible change to > > > >>>> the data must > > > >>>>>>>>>>>> be atomically accompanied with a change > attribute update. > > > >>>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> Trond, I'm not sure how this rule you mentioned is > > > >>>> specified. > > > >>>>>>>>>>> > > > >>>>>>>>>>> See more in section 12.5.4 and 12.5.4.1. 
> LAYOUTCOMMIT and > > > >>>> change/time_modify > > > >>>>>>>>>>> in particular: > > > >>>>>>>>>>> > > > >>>>>>>>>>> For some layout protocols, the storage > device is able to > > > >>>> notify the > > > >>>>>>>>>>> metadata server of the occurrence of an > I/O; as a result, > > > >>>> the change > > > >>>>>>>>>>> and time_modify attributes may be updated > at the metadata > > > >>>> server. > > > >>>>>>>>>>> For a metadata server that is capable of monitoring > > > >>>> updates to the > > > >>>>>>>>>>> change and time_modify attributes, LAYOUTCOMMIT > > > >>>> processing is not > > > >>>>>>>>>>> required to update the change attribute. > In this case, > > > >>>> the metadata > > > >>>>>>>>>>> server must ensure that no further update > to the data has > > > >>>> occurred > > > >>>>>>>>>>> since the last update of the attributes; file-based > > > >>>> protocols may > > > >>>>>>>>>>> have enough information to make this > determination or may > > > >>>> update the > > > >>>>>>>>>>> change attribute upon each file > modification. This also > > > >>>> applies for > > > >>>>>>>>>>> the time_modify attribute. If the server > implementation > > > >>>> is able to > > > >>>>>>>>>>> determine that the file has not been > modified since the > > > >>>> last > > > >>>>>>>>>>> time_modify update, the server need not update > > > >>>> time_modify at > > > >>>>>>>>>>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, > the updated > > > >>>> attributes > > > >>>>>>>>>>> should be visible if that file was > modified since the > > > >>>> latest previous > > > >>>>>>>>>>> LAYOUTCOMMIT or LAYOUTGET > > > >>>>>>>>>> > > > >>>>>>>>>> I know. However the above paragraph does not > state that the > > > >>>> server > > > >>>>>>>>>> should make those changes visible to clients > other than the > > > >>>> one that is > > > >>>>>>>>>> writing. 
> > > >>>>>>>>>> > > > >>>>>>>>>> Section 18.32.4 states that writes will cause the > > > >>>> time_modified and > > > >>>>>>>>>> change attributes to be updated (if and only > if the file data > > > >>>> is > > > >>>>>>>>>> modified). Several other sections rely on this > behaviour, > > > >>>> including > > > >>>>>>>>>> section 10.3.1, section 11.7.2.2, and section 11.7.7. > > > >>>>>>>>>> > > > >>>>>>>>>> The only 'special behaviour' that I see > allowed for pNFS is > > > >>>> in section > > > >>>>>>>>>> 13.10, which states that clients can't expect > to see changes > > > >>>>>>>>>> immediately, but that they must be able to expect > > > >>>> close-to-open > > > >>>>>>>>>> semantics to work. Again, if this is to be the > case, then the > > > >>>> server > > > >>>>>>>>>> _must_ be able to deal with the case where > client 1 dies > > > >>>> before it can > > > >>>>>>>>>> issue the LAYOUTCOMMIT. > > > >>>>>>>> > > > >>>>>>>> Agreed. > > > >>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>>>>> As I see it, if your server allows one > client to read data > > > >>>> that may have > > > >>>>>>>>>>>> been modified by another client that holds a > WRITE layout > > > >>>> for that range > > > >>>>>>>>>>>> then (since that is a visible data change) it should > > > >>>> provide a change > > > >>>>>>>>>>>> attribute update irrespective of whether or not a > > > >>>> LAYOUTCOMMIT has been > > > >>>>>>>>>>>> sent. > > > >>>>>>>>>>> > > > >>>>>>>>>>> the requirement for the server in WRITE's > implementation > > > >>>> section > > > >>>>>>>>>>> is quite weak: "It is assumed that the act of > writing data > > > >>>> to a file will > > > >>>>>>>>>>> cause the time_modified and change attributes > of the file to > > > >>>> be updated." > > > >>>>>>>>>>> > > > >>>>>>>>>>> The difference here is that for pNFS the > written data is not > > > >>>> guaranteed > > > >>>>>>>>>>> to be visible until LAYOUTCOMMIT. 
In a broader sense, > > > >>>> assuming the clients > > > >>>>>>>>>>> are caching dirty data and use a write-behind cache, > > > >>>> application-written data > > > >>>>>>>>>>> may be visible to other processes on the same > host but not > > > >>>> to others until > > > >>>>>>>>>>> fsync() or close() - open-to-close semantics > are the only > > > >>>> thing the client > > > >>>>>>>>>>> guarantees, right? Issuing LAYOUTCOMMIT on > fsync() and > > > >>>> close() ensure the > > > >>>>>>>>>>> data is committed to stable storage and is > visible to all > > > >>>> other clients in > > > >>>>>>>>>>> the cluster. > > > >>>>>>>>>> > > > >>>>>>>>>> See above. I'm not disputing your statement > that 'the written > > > >>>> data is > > > >>>>>>>>>> not guaranteed to be visible until LAYOUTCOMMIT'. I am > > > >>>> disputing an > > > >>>>>>>>>> assumption that 'the written data may be > visible without an > > > >>>> accompanying > > > >>>>>>>>>> change attribute update'. > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> In other words, I'd expect the following > scenario to give the > > > >>>> same > > > >>>>>>>>> results in NFSv4.1 w/pNFS as it does in NFSv4: > > > >>>>>>>> > > > >>>>>>>> That's a strong requirement that may limit the > scalability of > > > >>>> the server. > > > >>>>>>>> > > > >>>>>>>> The spirit of the pNFS operations, at least from Panasas > > > >>>> perspective was that > > > >>>>>>>> the data is transient until LAYOUTCOMMIT, > meaning it may or may > > > >>>> not be visible > > > >>>>>>>> to clients other than the one who wrote it, and > its associated > > > >>>> metadata MUST > > > >>>>>>>> be updated and describe the new data only on > LAYOUTCOMMIT and > > > >>>> until then it's > > > >>>>>>>> undefined, i.e. it's up to the server > implementation whether to > > > >>>> update it or not. > > > >>>>>>>> > > > >>>>>>>> Without locking, what do the stronger semantics buy you? 
> > > >>>>>>>> Even if a client verified the change_attribute > new data may > > > >>>> become visible > > > >>>>>>>> at any time after the GETATTR if the file/byte > range aren't > > > >>>> locked. > > > >>>>>>> > > > >>>>>>> There is no locking needed in the scenario below: > it is ordinary > > > >>>>>>> close-to-open semantics. > > > >>>>>>> > > > >>>>>>> The point is that if you remove the one and only > way that clients > > > >>>> have > > > >>>>>>> to determine whether or not their data caches are > valid, then they > > > >>>> can > > > >>>>>>> no longer cache data at all, and server > scalability will be shot > > > >>>> to > > > >>>>>>> smithereens anyway. > > > >>>>>>> > > > >>>>>>> Trond > > > >>>>>>> > > > >>>>>>>> Benny > > > >>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> Client 1 Client 2 > > > >>>>>>>>> ======== ======== > > > >>>>>>>>> > > > >>>>>>>>> OPEN foo > > > >>>>>>>>> READ > > > >>>>>>>>> CLOSE > > > >>>>>>>>> OPEN > > > >>>>>>>>> LAYOUTGET ... > > > >>>>>>>>> WRITE via DS > > > >>>>>>>>> <dies>... 
> > > >>>>>>>>> OPEN foo > > > >>>>>>>>> verify change_attr > > > >>>>>>>>> READ if above WRITE is visible > > > >>>>>>>>> CLOSE > > > >>>>>>>>> > > > >>>>>>>>> Trond > > > >>>>>>>>> _______________________________________________ > > > >>>>>>>>> nfsv4 mailing list > > > >>>>>>>>> nfsv4@ietf.org > > > >>>>>>>>> https://www.ietf.org/mailman/listinfo/nfsv4 > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> _______________________________________________ > > > >>>>>>> nfsv4 mailing list > > > >>>>>>> nfsv4@ietf.org > > > >>>>>>> https://www.ietf.org/mailman/listinfo/nfsv4 > > > >>>>>>> > > > >>>>>>> > > > >>>>> > > > >>>>> > > > >>>>> _______________________________________________ > > > >>>>> nfsv4 mailing list > > > >>>>> nfsv4@ietf.org > > > >>>>> https://www.ietf.org/mailman/listinfo/nfsv4 > > > >>>>> > > > >>>>> _______________________________________________ > > > >>>>> nfsv4 mailing list > > > >>>>> nfsv4@ietf.org > > > >>>>> https://www.ietf.org/mailman/listinfo/nfsv4 > > > >>>> > > > >>> > > > >>> > > > >> > > > >> > > > >> -- > > > >> To unsubscribe from this list: send the line > "unsubscribe linux-nfs" in > > > >> the body of a message to majordomo@vger.kernel.org > > > >> More majordomo info at > http://vger.kernel.org/majordomo-info.html > > > > > > > > > > > > > > > > -- > > > > To unsubscribe from this list: send the line > "unsubscribe linux-nfs" in > > > > the body of a message to majordomo@vger.kernel.org > > > > More majordomo info at > http://vger.kernel.org/majordomo-info.html > > > > > > > _______________________________________________ > > nfsv4 mailing list > > nfsv4@ietf.org > > https://www.ietf.org/mailman/listinfo/nfsv4 > > > > -- > To unsubscribe from this list: send the line "unsubscribe > linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > _______________________________________________ nfsv4 mailing list nfsv4@ietf.org 
https://www.ietf.org/mailman/listinfo/nfsv4 ^ permalink raw reply [flat|nested] 38+ messages in thread
[parent not found: <1278623771.13551.54.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>]
* RE: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
  [not found] ` <1278623771.13551.54.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2010-07-09 0:03 ` Sandeep Joshi
  0 siblings, 0 replies; 38+ messages in thread
From: Sandeep Joshi @ 2010-07-09 0:03 UTC (permalink / raw)
To: Trond Myklebust, david.black
Cc: linux-nfs, garth, welch, nfsv4, andros, bhalevy

It seems like we agree that LAYOUTCOMMIT will be sent for the file layout, correct?
Or should I file a defect on this?

For reference, my original email is below.

// START
In certain cases, I don't see layoutcommit on a file at all even after doing many writes.

Client side operations:

open
write(s)
close

On server side (observed operations):

open
layoutget's
close

But, I do not see layoutcommit at all. In terms of data written by the client, it is about 4-5 MB.

When does the client issue layoutcommit?
// END

Regards,
Sandeep

-----Original Message-----
From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf Of Trond Myklebust
Sent: Thursday, July 08, 2010 2:16 PM
To: david.black@emc.com
Cc: linux-nfs@vger.kernel.org; garth@panasas.com; welch@panasas.com; nfsv4@ietf.org; andros@netapp.com; bhalevy@panasas.com
Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

On Thu, 2010-07-08 at 16:30 -0400, david.black@emc.com wrote:
> > Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the
> > client hasn't written to the file. I'm not sure what about the
> > blocks case though, do you implicitly free up any provisionally
> > allocated blocks that the client had not explicitly committed using LAYOUTCOMMIT?
>
> In principle, yes as the blocks are no longer promised to the client,
> although lazy evaluation of this is an obvious optimization.
>
> > >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server
> > >> must check that it has received LAYOUTCOMMITs from any other
> > >> clients that may have the file open for writing. If it hasn't,
> > >> then it MUST take some action to ensure that any file data
> > >> changes are accompanied by a change
> > > ^ potentially visible
> > >> attribute update."
> >
> > That should be OK as long as it's not for every GETATTR for the
> > change, mtime, or size attributes.
> >
> > >>
> > >> Then you can add the above suggestion without the offending
> > >> caveat. Note however that it does break the "SHOULD NOT"
> > >> admonition in section 18.32.4.
> >
> > Better be safe than sorry in this rare error case.
>
> I concur with Benny on both of the above - in essence, the unrecovered client failure is a reason to potentially ignore the "SHOULD" (server can't know whether it actually ignored the "SHOULD", hence better safe than sorry). We probably ought to find a someplace appropriate to add a paragraph or two explaining this in one of the 4.2 documents.

Right. I'm only interested in fixing the close-to-open case. The case of
general GETATTR calls might be nice to fix too, but it should not be
essential in order to ensure that well-behaved applications continue to
work as expected.

Note, however, that legacy support for stateless protocols like NFSv2
and NFSv3 may be problematic: there is no equivalent of OPEN, and so the
server may have to do the above check on all NFSPROC2_GETATTR,
NFSPROC3_GETATTR, NFSPROC2_LOOKUP and NFSPROC3_LOOKUP requests.

Trond

> Thanks,
> --David
>
>
> > -----Original Message-----
> > From: Benny Halevy [mailto:bhalevy.lists@gmail.com] On Behalf Of
> > Benny Halevy
> > Sent: Thursday, July 08, 2010 12:00 PM
> > To: Trond Myklebust
> > Cc: Black, David; Noveck, David; Muntz, Daniel;
> > linux-nfs@vger.kernel.org; garth@panasas.com; welch@panasas.com;
> > nfsv4@ietf.org; andros@netapp.com
> > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> >
> > On Jul.
08, 2010, 2:14 +0300, Trond Myklebust <trond.myklebust@fys.uio.no> wrote: > > > On Wed, 2010-07-07 at 19:09 -0400, Trond Myklebust wrote: > > >> On Wed, 2010-07-07 at 18:52 -0400, Trond Myklebust wrote: > > >>> On Wed, 2010-07-07 at 18:44 -0400, david.black@emc.com wrote: > > >>>> Let me try this ... > > >>>> > > >>>> A correct client will always send LAYOUTCOMMIT. > > >>>> Assume that the client is correct. > > >>>> Hence if the LAYOUTCOMMIT doesn't arrive, something's failed. > > >>>> > > >>>> Important implication: No LAYOUTCOMMIT is an error/failure > > >>>> case. It just has to work; it doesn't have to be fast. > > >>>> > > > > Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the > > client hasn't written to the file. I'm not sure what about the > > blocks case though, do you implicitly free up any provisionally > > allocated blocks that the client had not explicitly committed using LAYOUTCOMMIT? > > > > >>>> Suggestion: If a client dies while holding writeable layouts > > >>>> that permit write-in-place, and the client doesn't reappear or > > >>>> doesn't reclaim those layouts, then the server should assume > > >>>> that the files involved were written before the client died, > > >>>> and set the file attributes accordingly as part of internally > > >>>> reclaiming the layout that the client has abandoned. > > > > Of course. That's part of the server recovery. > > > > >>>> > > >>>> Caveat: It may take a while for the server to determine that > > >>>> the client has abandoned a layout. > > > > That's two lease times after a respective CB_LAYOUTRECALL. > > > > >>>> > > >>>> This can result in false positives (file appears to be modified > > >>>> when it > > >>>> wasn't) but won't yield false negatives (file does not appear > > >>>> to be modified even though it was modified). > > >>> > > >>> OK... So we're going to have to turn off client side file > > >>> caching entirely for pNFS? I can do that... > > >>> > > >>> The above won't work. 
Think readahead... > > >> > > >> So... What can work, is if you modify it to work explicitly for > > >> close-to-open > > >> > > >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server > > >> must check that it has received LAYOUTCOMMITs from any other > > >> clients that may have the file open for writing. If it hasn't, > > >> then it MUST take some action to ensure that any file data > > >> changes are accompanied by a change > > > ^ potentially visible > > >> attribute update." > > > > That should be OK as long as it's not for every GETATTR for the > > change, mtime, or size attributes. > > > > >> > > >> Then you can add the above suggestion without the offending > > >> caveat. Note however that it does break the "SHOULD NOT" > > >> admonition in section 18.32.4. > > > > Better be safe than sorry in this rare error case. > > > > Benny > > > > >> > > >> Trond > > >> > > >> > > >>> Trond > > >>> > > >>>> Thanks, > > >>>> --David > > >>>> > > >>>>> -----Original Message----- > > >>>>> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] > > >>>>> On Behalf > > >>>> Of Noveck_David@emc.com > > >>>>> Sent: Wednesday, July 07, 2010 6:04 PM > > >>>>> To: Trond.Myklebust@netapp.com; Muntz, Daniel > > >>>>> Cc: linux-nfs@vger.kernel.org; garth@panasas.com; > > >>>>> welch@panasas.com; > > >>>> nfsv4@ietf.org; > > >>>>> andros@netapp.com; bhalevy@panasas.com > > >>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close > > >>>>> > > >>>>>> Yes. I would agree that the client cannot rely on the updates > > >>>>>> being > > >>>> made > > >>>>>> visible if it fails to send the LAYOUTCOMMIT. My point was > > >>>>>> simply > > >>>> that a > > >>>>>> compliant server MUST also have a valid strategy for dealing > > >>>>>> with > > >>>> the > > >>>>>> case where the client doesn't send it. > > >>>>> > > >>>>> So you are saying the updates "MUST be made visible" through > > >>>>> the server's valid strategy. Is that right. 
> > >>>>> > > >>>>> And that the client cannot rely on that. Why not, if the > > >>>>> server must have a valid strategy. > > >>>>> > > >>>>> Is this just prudent "belt and suspenders" design or what? > > >>>>> > > >>>>> It seems to me that if one side here is MUST (and the spec > > >>>>> needs to be clearer about what might or might not constitute a > > >>>>> valid strategy), > > >>>> then > > >>>>> the other side should be SHOULD. > > >>>>> > > >>>>> If both sides are "MUST", then if things don't work out then > > >>>>> the > > >>>> client > > >>>>> and server can equally point to one another and say "It's his fault". > > >>>>> > > >>>>> Am I missing something here? > > >>>>> > > >>>>> > > >>>>> > > >>>>> -----Original Message----- > > >>>>> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] > > >>>>> On Behalf Of Trond Myklebust > > >>>>> Sent: Wednesday, July 07, 2010 5:01 PM > > >>>>> To: Muntz, Daniel > > >>>>> Cc: linux-nfs@vger.kernel.org; garth@panasas.com; > > >>>>> welch@panasas.com; nfsv4@ietf.org; andros@netapp.com; > > >>>>> bhalevy@panasas.com > > >>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close > > >>>>> > > >>>>> On Wed, 2010-07-07 at 16:39 -0400, Daniel.Muntz@emc.com wrote: > > >>>>>> To bring this discussion full circle, since we agree that a > > >>>> compliant > > >>>>>> server can implement a scheme where written data does not > > >>>>>> become > > >>>>> visible > > >>>>>> until after a LAYOUTCOMMIT, do we also agree that > > >>>>>> LAYOUTCOMMIT is a "MUST" from a compliant client (independent of layout type)? > > >>>>> > > >>>>> Yes. I would agree that the client cannot rely on the updates > > >>>>> being > > >>>> made > > >>>>> visible if it fails to send the LAYOUTCOMMIT. My point was > > >>>>> simply that > > >>>> a > > >>>>> compliant server MUST also have a valid strategy for dealing > > >>>>> with the case where the client doesn't send it. 
> > >>>>> > > >>>>> Cheers > > >>>>> Trond > > >>>>> > > >>>>>> -Dan > > >>>>>> > > >>>>>>> -----Original Message----- > > >>>>>>> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] > > >>>>>>> On Behalf Of Trond Myklebust > > >>>>>>> Sent: Wednesday, July 07, 2010 7:04 AM > > >>>>>>> To: Benny Halevy > > >>>>>>> Cc: andros@netapp.com; linux-nfs@vger.kernel.org; Garth > > >>>>>>> Gibson; Brent Welch; NFSv4 > > >>>>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close > > >>>>>>> > > >>>>>>> On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote: > > >>>>>>>> On Jul. 07, 2010, 16:18 +0300, Trond Myklebust > > >>>>>>> <Trond.Myklebust@netapp.com> wrote: > > >>>>>>>>> On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote: > > >>>>>>>>>> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote: > > >>>>>>>>>>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust > > >>>>>>> <trond.myklebust@fys.uio.no> wrote: > > >>>>>>>>>>>> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@emc.com > > >>>>> wrote: > > >>>>>>>>>>>>> The COMMIT to the DS, ttbomk, commits data on the DS. > > >>>>>>>>>>>>> I > > >>>> see it as > > >>>>>>>>>>>>> orthogonal to updating the metadata on the MDS (but > > >>>> perhaps I'm wrong). > > >>>>>>>>>>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides > > >>>>>>>>>>>>> a > > >>>> synchronization > > >>>>>>>>>>>>> point, so even if the non-clustered server does not > > >>>>>>>>>>>>> want > > >>>> to update > > >>>>>>>>>>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also > > >>>>>>>>>>>>> be a > > >>>> trigger to > > >>>>>>>>>>>>> execute whatever synchronization mechanism the > > >>>>>>>>>>>>> implementer > > >>>> wishes to put > > >>>>>>>>>>>>> in the control protocol. 
> > >>>>>>>>>>>> > > >>>>>>>>>>>> As far as I'm aware, there are no exceptions in RFC5661 > > >>>> that would allow > > >>>>>>>>>>>> pNFS servers to break the rule that any visible change > > >>>>>>>>>>>> to > > >>>> the data must > > >>>>>>>>>>>> be atomically accompanied with a change attribute update. > > >>>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> Trond, I'm not sure how this rule you mentioned is > > >>>> specified. > > >>>>>>>>>>> > > >>>>>>>>>>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT > > >>>>>>>>>>> and > > >>>> change/time_modify > > >>>>>>>>>>> in particular: > > >>>>>>>>>>> > > >>>>>>>>>>> For some layout protocols, the storage device is able > > >>>>>>>>>>> to > > >>>> notify the > > >>>>>>>>>>> metadata server of the occurrence of an I/O; as a > > >>>>>>>>>>> result, > > >>>> the change > > >>>>>>>>>>> and time_modify attributes may be updated at the > > >>>>>>>>>>> metadata > > >>>> server. > > >>>>>>>>>>> For a metadata server that is capable of monitoring > > >>>> updates to the > > >>>>>>>>>>> change and time_modify attributes, LAYOUTCOMMIT > > >>>> processing is not > > >>>>>>>>>>> required to update the change attribute. In this > > >>>>>>>>>>> case, > > >>>> the metadata > > >>>>>>>>>>> server must ensure that no further update to the data > > >>>>>>>>>>> has > > >>>> occurred > > >>>>>>>>>>> since the last update of the attributes; file-based > > >>>> protocols may > > >>>>>>>>>>> have enough information to make this determination or > > >>>>>>>>>>> may > > >>>> update the > > >>>>>>>>>>> change attribute upon each file modification. This > > >>>>>>>>>>> also > > >>>> applies for > > >>>>>>>>>>> the time_modify attribute. If the server > > >>>>>>>>>>> implementation > > >>>> is able to > > >>>>>>>>>>> determine that the file has not been modified since > > >>>>>>>>>>> the > > >>>> last > > >>>>>>>>>>> time_modify update, the server need not update > > >>>> time_modify at > > >>>>>>>>>>> LAYOUTCOMMIT. 
At LAYOUTCOMMIT completion, the > > >>>>>>>>>>> updated > > >>>> attributes > > >>>>>>>>>>> should be visible if that file was modified since the > > >>>> latest previous > > >>>>>>>>>>> LAYOUTCOMMIT or LAYOUTGET > > >>>>>>>>>> > > >>>>>>>>>> I know. However the above paragraph does not state that > > >>>>>>>>>> the > > >>>> server > > >>>>>>>>>> should make those changes visible to clients other than > > >>>>>>>>>> the > > >>>> one that is > > >>>>>>>>>> writing. > > >>>>>>>>>> > > >>>>>>>>>> Section 18.32.4 states that writes will cause the > > >>>> time_modified and > > >>>>>>>>>> change attributes to be updated (if and only if the file > > >>>>>>>>>> data > > >>>> is > > >>>>>>>>>> modified). Several other sections rely on this behaviour, > > >>>> including > > >>>>>>>>>> section 10.3.1, section 11.7.2.2, and section 11.7.7. > > >>>>>>>>>> > > >>>>>>>>>> The only 'special behaviour' that I see allowed for pNFS > > >>>>>>>>>> is > > >>>> in section > > >>>>>>>>>> 13.10, which states that clients can't expect to see > > >>>>>>>>>> changes immediately, but that they must be able to expect > > >>>> close-to-open > > >>>>>>>>>> semantics to work. Again, if this is to be the case, then > > >>>>>>>>>> the > > >>>> server > > >>>>>>>>>> _must_ be able to deal with the case where client 1 dies > > >>>> before it can > > >>>>>>>>>> issue the LAYOUTCOMMIT. > > >>>>>>>> > > >>>>>>>> Agreed. > > >>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>>>> As I see it, if your server allows one client to read > > >>>>>>>>>>>> data > > >>>> that may have > > >>>>>>>>>>>> been modified by another client that holds a WRITE > > >>>>>>>>>>>> layout > > >>>> for that range > > >>>>>>>>>>>> then (since that is a visible data change) it should > > >>>> provide a change > > >>>>>>>>>>>> attribute update irrespective of whether or not a > > >>>> LAYOUTCOMMIT has been > > >>>>>>>>>>>> sent. 
> > >>>>>>>>>>> > > >>>>>>>>>>> the requirement for the server in WRITE's implementation > > >>>> section > > >>>>>>>>>>> is quite weak: "It is assumed that the act of writing > > >>>>>>>>>>> data > > >>>> to a file will > > >>>>>>>>>>> cause the time_modified and change attributes of the > > >>>>>>>>>>> file to > > >>>> be updated." > > >>>>>>>>>>> > > >>>>>>>>>>> The difference here is that for pNFS the written data is > > >>>>>>>>>>> not > > >>>> guaranteed > > >>>>>>>>>>> to be visible until LAYOUTCOMMIT. In a broader sense, > > >>>> assuming the clients > > >>>>>>>>>>> are caching dirty data and use a write-behind cache, > > >>>> application-written data > > >>>>>>>>>>> may be visible to other processes on the same host but > > >>>>>>>>>>> not > > >>>> to others until > > >>>>>>>>>>> fsync() or close() - open-to-close semantics are the > > >>>>>>>>>>> only > > >>>> thing the client > > >>>>>>>>>>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and > > >>>> close() ensure the > > >>>>>>>>>>> data is committed to stable storage and is visible to > > >>>>>>>>>>> all > > >>>> other clients in > > >>>>>>>>>>> the cluster. > > >>>>>>>>>> > > >>>>>>>>>> See above. I'm not disputing your statement that 'the > > >>>>>>>>>> written > > >>>> data is > > >>>>>>>>>> not guaranteed to be visible until LAYOUTCOMMIT'. I am > > >>>> disputing an > > >>>>>>>>>> assumption that 'the written data may be visible without > > >>>>>>>>>> an > > >>>> accompanying > > >>>>>>>>>> change attribute update'. > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> In other words, I'd expect the following scenario to give > > >>>>>>>>> the > > >>>> same > > >>>>>>>>> results in NFSv4.1 w/pNFS as it does in NFSv4: > > >>>>>>>> > > >>>>>>>> That's a strong requirement that may limit the scalability > > >>>>>>>> of > > >>>> the server. 
> > >>>>>>>> > > >>>>>>>> The spirit of the pNFS operations, at least from Panasas > > >>>> perspective was that > > >>>>>>>> the data is transient until LAYOUTCOMMIT, meaning it may or > > >>>>>>>> may > > >>>> not be visible > > >>>>>>>> to clients other than the one who wrote it, and its > > >>>>>>>> associated > > >>>> metadata MUST > > >>>>>>>> be updated and describe the new data only on LAYOUTCOMMIT > > >>>>>>>> and > > >>>> until then it's > > >>>>>>>> undefined, i.e. it's up to the server implementation > > >>>>>>>> whether to > > >>>> update it or not. > > >>>>>>>> > > >>>>>>>> Without locking, what do the stronger semantics buy you? > > >>>>>>>> Even if a client verified the change_attribute new data may > > >>>> become visible > > >>>>>>>> at any time after the GETATTR if the file/byte range aren't > > >>>> locked. > > >>>>>>> > > >>>>>>> There is no locking needed in the scenario below: it is > > >>>>>>> ordinary close-to-open semantics. > > >>>>>>> > > >>>>>>> The point is that if you remove the one and only way that > > >>>>>>> clients > > >>>> have > > >>>>>>> to determine whether or not their data caches are valid, > > >>>>>>> then they > > >>>> can > > >>>>>>> no longer cache data at all, and server scalability will be > > >>>>>>> shot > > >>>> to > > >>>>>>> smithereens anyway. > > >>>>>>> > > >>>>>>> Trond > > >>>>>>> > > >>>>>>>> Benny > > >>>>>>>> > > >>>>>>>>> > > >>>>>>>>> Client 1 Client 2 > > >>>>>>>>> ======== ======== > > >>>>>>>>> > > >>>>>>>>> OPEN foo > > >>>>>>>>> READ > > >>>>>>>>> CLOSE > > >>>>>>>>> OPEN > > >>>>>>>>> LAYOUTGET ... > > >>>>>>>>> WRITE via DS > > >>>>>>>>> <dies>... 
> > >>>>>>>>> OPEN foo > > >>>>>>>>> verify change_attr > > >>>>>>>>> READ if above WRITE is visible CLOSE > > >>>>>>>>> > > >>>>>>>>> Trond > > >>>>>>>>> _______________________________________________ > > >>>>>>>>> nfsv4 mailing list > > >>>>>>>>> nfsv4@ietf.org > > >>>>>>>>> https://www.ietf.org/mailman/listinfo/nfsv4 > > >>>>>>> > > >>>>>>> > > >>>>>>> _______________________________________________ > > >>>>>>> nfsv4 mailing list > > >>>>>>> nfsv4@ietf.org > > >>>>>>> https://www.ietf.org/mailman/listinfo/nfsv4 > > >>>>>>> > > >>>>>>> > > >>>>> > > >>>>> > > >>>>> _______________________________________________ > > >>>>> nfsv4 mailing list > > >>>>> nfsv4@ietf.org > > >>>>> https://www.ietf.org/mailman/listinfo/nfsv4 > > >>>>> > > >>>>> _______________________________________________ > > >>>>> nfsv4 mailing list > > >>>>> nfsv4@ietf.org > > >>>>> https://www.ietf.org/mailman/listinfo/nfsv4 > > >>>> > > >>> > > >>> > > >> > > >> > > >> -- > > >> To unsubscribe from this list: send the line "unsubscribe > > >> linux-nfs" in the body of a message to majordomo@vger.kernel.org > > >> More majordomo info at > > >> http://vger.kernel.org/majordomo-info.html > > > > > > > > > > > > -- > > > To unsubscribe from this list: send the line "unsubscribe > > > linux-nfs" in the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > _______________________________________________ > nfsv4 mailing list > nfsv4@ietf.org > https://www.ietf.org/mailman/listinfo/nfsv4 _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www.ietf.org/mailman/listinfo/nfsv4 ^ permalink raw reply [flat|nested] 38+ messages in thread
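The close-to-open rule quoted in the message above (on OPEN, the server checks whether some other client holds a write layout with no LAYOUTCOMMIT yet and, if so, forces a change attribute update) can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyMetadataServer, FileState, layoutget_rw and the other names are hypothetical, it models OPEN only (not LOCK or WANT_DELEGATION), and it does not reflect the Linux client, any real pNFS server, or wording from RFC 5661.

```python
from dataclasses import dataclass, field


@dataclass
class FileState:
    # Hypothetical per-file state kept by the toy metadata server.
    change_attr: int = 0
    # Clients holding a write layout for which no LAYOUTCOMMIT has arrived.
    uncommitted_writers: set = field(default_factory=set)


class ToyMetadataServer:
    """Toy model of the proposed rule; not a real NFS server."""

    def __init__(self):
        self.files = {}

    def _state(self, name):
        return self.files.setdefault(name, FileState())

    def layoutget_rw(self, client_id, name):
        # Handing out a write layout means data may change behind our back.
        self._state(name).uncommitted_writers.add(client_id)

    def layoutcommit(self, client_id, name):
        state = self._state(name)
        if client_id in state.uncommitted_writers:
            state.uncommitted_writers.discard(client_id)
            state.change_attr += 1  # committed data is now described

    def open(self, client_id, name):
        state = self._state(name)
        # Assumption: the server cannot tell whether the other writers
        # actually wrote, so it conservatively bumps the change attribute.
        # This can give false positives but never false negatives.
        if state.uncommitted_writers - {client_id}:
            state.change_attr += 1
        return state.change_attr


def close_to_open_demo():
    # Mirrors the Client 1 / Client 2 diagram: client1 takes a write
    # layout, writes via the DS and dies without sending LAYOUTCOMMIT.
    server = ToyMetadataServer()
    cached = server.open("client2", "foo")
    server.layoutget_rw("client1", "foo")
    fresh = server.open("client2", "foo")
    return cached, fresh
```

In this model the two values returned by close_to_open_demo() differ, so client 2 knows its cached data can no longer be trusted even though client 1 never sent a LAYOUTCOMMIT. The price is that every OPEN made while another client holds an uncommitted write layout invalidates caches, which is the false-positive behaviour discussed in the thread.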
* Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
  2010-07-08 20:30 ` [nfsv4] " david.black
  2010-07-08 21:16 ` Trond Myklebust
@ 2010-07-08 22:12 ` sfaibish
  2010-07-08 23:01 ` Tom Haynes
  1 sibling, 1 reply; 38+ messages in thread
From: sfaibish @ 2010-07-08 22:12 UTC (permalink / raw)
To: david.black, bhalevy, trond.myklebust
Cc: andros, linux-nfs, garth, welch, nfsv4

All,

After discussing this issue with Dave Noveck, and as I mentioned in the
call today, I think that this is a serious issue and a disconnect between
the behavior of different layout types. My proposal is to have this
discussion F2F in Maastricht on the whiteboard, so I will add an agenda
item to the WG on this topic.

I could address the behavior of the block layout, but it is not something
we want to mimic, as we all agreed at cthon to avoid the LAYOUTCOMMIT as
much as possible for the file layout. If we solve the issue using the
mechanism proposed by Trond, we will create a conflict with the use of
LAYOUTCOMMIT. Just as a hint, the difference from block is that block uses
layouts for write and read as different leases, and when a client has a
layout for read the server will always send him a LAYOUTRETURN when either
upgrading his lease to write or sending a layout for write to another
client. I don't think we want to do the same for file.

My 2c.

/Sorin

On Thu, 08 Jul 2010 16:30:48 -0400, <david.black@emc.com> wrote:

>> Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the client hasn't
>> written to the file. I'm not sure what about the blocks case though, do you
>> implicitly free up any provisionally allocated blocks that the client had not
>> explicitly committed using LAYOUTCOMMIT?
>
> In principle, yes as the blocks are no longer promised to the client, although
> lazy evaluation of this is an obvious optimization.
>
>> >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
>> >> check that it has received LAYOUTCOMMITs from any other clients that may
>> >> have the file open for writing. If it hasn't, then it MUST take some
>> >> action to ensure that any file data changes are accompanied by a change
>> > ^ potentially visible
>> >> attribute update."
>>
>> That should be OK as long as it's not for every GETATTR for the change, mtime,
>> or size attributes.
>>
>> >>
>> >> Then you can add the above suggestion without the offending caveat. Note
>> >> however that it does break the "SHOULD NOT" admonition in section
>> >> 18.32.4.
>>
>> Better be safe than sorry in this rare error case.
>
> I concur with Benny on both of the above - in essence, the unrecovered
> client failure is a reason to potentially ignore the "SHOULD" (server
> can't know whether it actually ignored the "SHOULD", hence better safe
> than sorry). We probably ought to find a someplace appropriate to add a
> paragraph or two explaining this in one of the 4.2 documents.
>
> Thanks,
> --David
>
>
>> -----Original Message-----
>> From: Benny Halevy [mailto:bhalevy.lists@gmail.com] On Behalf Of Benny Halevy
>> Sent: Thursday, July 08, 2010 12:00 PM
>> To: Trond Myklebust
>> Cc: Black, David; Noveck, David; Muntz, Daniel; linux-nfs@vger.kernel.org; garth@panasas.com;
>> welch@panasas.com; nfsv4@ietf.org; andros@netapp.com
>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
>>
>> On Jul. 08, 2010, 2:14 +0300, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
>> > On Wed, 2010-07-07 at 19:09 -0400, Trond Myklebust wrote:
>> >> On Wed, 2010-07-07 at 18:52 -0400, Trond Myklebust wrote:
>> >>> On Wed, 2010-07-07 at 18:44 -0400, david.black@emc.com wrote:
>> >>>> Let me try this ...
>> >>>>
>> >>>> A correct client will always send LAYOUTCOMMIT.
>> >>>> Assume that the client is correct.
>> >>>> Hence if the LAYOUTCOMMIT doesn't arrive, something's failed. >> >>>> >> >>>> Important implication: No LAYOUTCOMMIT is an error/failure case. = =20 >> It >> >>>> just has to work; it doesn't have to be fast. >> >>>> >> >> Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the client = =20 >> hasn't >> written to the file. I'm not sure what about the blocks case though, =20 >> do you >> implicitly free up any provisionally allocated blocks that the client =20 >> had not >> explicitly committed using LAYOUTCOMMIT? >> >> >>>> Suggestion: If a client dies while holding writeable layouts that = =20 >> permit >> >>>> write-in-place, and the client doesn't reappear or doesn't reclaim = =20 >> those >> >>>> layouts, then the server should assume that the files involved were >> >>>> written before the client died, and set the file attributes =20 >> accordingly >> >>>> as part of internally reclaiming the layout that the client has >> >>>> abandoned. >> >> Of course. That's part of the server recovery. >> >> >>>> >> >>>> Caveat: It may take a while for the server to determine that the =20 >> client >> >>>> has abandoned a layout. >> >> That's two lease times after a respective CB_LAYOUTRECALL. >> >> >>>> >> >>>> This can result in false positives (file appears to be modified =20 >> when it >> >>>> wasn't) but won't yield false negatives (file does not appear to be >> >>>> modified even though it was modified). >> >>> >> >>> OK... So we're going to have to turn off client side file caching >> >>> entirely for pNFS? I can do that... >> >>> >> >>> The above won't work. Think readahead... >> >> >> >> So... What can work, is if you modify it to work explicitly for >> >> close-to-open >> >> >> >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must >> >> check that it has received LAYOUTCOMMITs from any other clients that = =20 >> may >> >> have the file open for writing. 
If it hasn't, then it MUST take some >> >> action to ensure that any file data changes are accompanied by a =20 >> change >> > ^ potentially visible >> >> attribute update." >> >> That should be OK as long as it's not for every GETATTR for the change, = =20 >> mtime, >> or size attributes. >> >> >> >> >> Then you can add the above suggestion without the offending caveat. = =20 >> Note >> >> however that it does break the "SHOULD NOT" admonition in section >> >> 18.32.4. >> >> Better be safe than sorry in this rare error case. >> >> Benny >> >> >> >> >> Trond >> >> >> >> >> >>> Trond >> >>> >> >>>> Thanks, >> >>>> --David >> >>>> >> >>>>> -----Original Message----- >> >>>>> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On =20 >> Behalf >> >>>> Of Noveck_David@emc.com >> >>>>> Sent: Wednesday, July 07, 2010 6:04 PM >> >>>>> To: Trond.Myklebust@netapp.com; Muntz, Daniel >> >>>>> Cc: linux-nfs@vger.kernel.org; garth@panasas.com; =20 >> welch@panasas.com; >> >>>> nfsv4@ietf.org; >> >>>>> andros@netapp.com; bhalevy@panasas.com >> >>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close >> >>>>> >> >>>>>> Yes. I would agree that the client cannot rely on the updates =20 >> being >> >>>> made >> >>>>>> visible if it fails to send the LAYOUTCOMMIT. My point was simply >> >>>> that a >> >>>>>> compliant server MUST also have a valid strategy for dealing with >> >>>> the >> >>>>>> case where the client doesn't send it. >> >>>>> >> >>>>> So you are saying the updates "MUST be made visible" through the >> >>>>> server's valid strategy. Is that right. >> >>>>> >> >>>>> And that the client cannot rely on that. Why not, if the server = =20 >> must >> >>>>> have a valid strategy. >> >>>>> >> >>>>> Is this just prudent "belt and suspenders" design or what? 
>> >>>>> >> >>>>> It seems to me that if one side here is MUST (and the spec needs = =20 >> to be >> >>>>> clearer about what might or might not constitute a valid =20 >> strategy), >> >>>> then >> >>>>> the other side should be SHOULD. >> >>>>> >> >>>>> If both sides are "MUST", then if things don't work out then the >> >>>> client >> >>>>> and server can equally point to one another and say "It's his =20 >> fault". >> >>>>> >> >>>>> Am I missing something here? >> >>>>> >> >>>>> >> >>>>> >> >>>>> -----Original Message----- >> >>>>> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On =20 >> Behalf >> >>>>> Of Trond Myklebust >> >>>>> Sent: Wednesday, July 07, 2010 5:01 PM >> >>>>> To: Muntz, Daniel >> >>>>> Cc: linux-nfs@vger.kernel.org; garth@panasas.com; =20 >> welch@panasas.com; >> >>>>> nfsv4@ietf.org; andros@netapp.com; bhalevy@panasas.com >> >>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close >> >>>>> >> >>>>> On Wed, 2010-07-07 at 16:39 -0400, Daniel.Muntz@emc.com wrote: >> >>>>>> To bring this discussion full circle, since we agree that a >> >>>> compliant >> >>>>>> server can implement a scheme where written data does not become >> >>>>> visible >> >>>>>> until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT =20 >> is a >> >>>>>> "MUST" from a compliant client (independent of layout type)? >> >>>>> >> >>>>> Yes. I would agree that the client cannot rely on the updates =20 >> being >> >>>> made >> >>>>> visible if it fails to send the LAYOUTCOMMIT. My point was simply = =20 >> that >> >>>> a >> >>>>> compliant server MUST also have a valid strategy for dealing with = =20 >> the >> >>>>> case where the client doesn't send it. 
>> >>>>> >> >>>>> Cheers >> >>>>> Trond >> >>>>> >> >>>>>> -Dan >> >>>>>> >> >>>>>>> -----Original Message----- >> >>>>>>> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] >> >>>>>>> On Behalf Of Trond Myklebust >> >>>>>>> Sent: Wednesday, July 07, 2010 7:04 AM >> >>>>>>> To: Benny Halevy >> >>>>>>> Cc: andros@netapp.com; linux-nfs@vger.kernel.org; Garth >> >>>>>>> Gibson; Brent Welch; NFSv4 >> >>>>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close >> >>>>>>> >> >>>>>>> On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote: >> >>>>>>>> On Jul. 07, 2010, 16:18 +0300, Trond Myklebust >> >>>>>>> <Trond.Myklebust@netapp.com> wrote: >> >>>>>>>>> On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote: >> >>>>>>>>>> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote: >> >>>>>>>>>>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust >> >>>>>>> <trond.myklebust@fys.uio.no> wrote: >> >>>>>>>>>>>> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@emc.com >> >>>>> wrote: >> >>>>>>>>>>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I >> >>>> see it as >> >>>>>>>>>>>>> orthogonal to updating the metadata on the MDS (but >> >>>> perhaps I'm wrong). >> >>>>>>>>>>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a >> >>>> synchronization >> >>>>>>>>>>>>> point, so even if the non-clustered server does not want >> >>>> to update >> >>>>>>>>>>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a >> >>>> trigger to >> >>>>>>>>>>>>> execute whatever synchronization mechanism the implementer >> >>>> wishes to put >> >>>>>>>>>>>>> in the control protocol. >> >>>>>>>>>>>> >> >>>>>>>>>>>> As far as I'm aware, there are no exceptions in RFC5661 >> >>>> that would allow >> >>>>>>>>>>>> pNFS servers to break the rule that any visible change to >> >>>> the data must >> >>>>>>>>>>>> be atomically accompanied with a change attribute update. 
>> >>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> Trond, I'm not sure how this rule you mentioned is >> >>>> specified. >> >>>>>>>>>>> >> >>>>>>>>>>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and >> >>>> change/time_modify >> >>>>>>>>>>> in particular: >> >>>>>>>>>>> >> >>>>>>>>>>> For some layout protocols, the storage device is able to >> >>>> notify the >> >>>>>>>>>>> metadata server of the occurrence of an I/O; as a result, >> >>>> the change >> >>>>>>>>>>> and time_modify attributes may be updated at the metadata >> >>>> server. >> >>>>>>>>>>> For a metadata server that is capable of monitoring >> >>>> updates to the >> >>>>>>>>>>> change and time_modify attributes, LAYOUTCOMMIT >> >>>> processing is not >> >>>>>>>>>>> required to update the change attribute. In this case, >> >>>> the metadata >> >>>>>>>>>>> server must ensure that no further update to the data has >> >>>> occurred >> >>>>>>>>>>> since the last update of the attributes; file-based >> >>>> protocols may >> >>>>>>>>>>> have enough information to make this determination or may >> >>>> update the >> >>>>>>>>>>> change attribute upon each file modification. This also >> >>>> applies for >> >>>>>>>>>>> the time_modify attribute. If the server implementation >> >>>> is able to >> >>>>>>>>>>> determine that the file has not been modified since the >> >>>> last >> >>>>>>>>>>> time_modify update, the server need not update >> >>>> time_modify at >> >>>>>>>>>>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated >> >>>> attributes >> >>>>>>>>>>> should be visible if that file was modified since the >> >>>> latest previous >> >>>>>>>>>>> LAYOUTCOMMIT or LAYOUTGET >> >>>>>>>>>> >> >>>>>>>>>> I know. However the above paragraph does not state that the >> >>>> server >> >>>>>>>>>> should make those changes visible to clients other than the >> >>>> one that is >> >>>>>>>>>> writing. 
>> >>>>>>>>>> >> >>>>>>>>>> Section 18.32.4 states that writes will cause the >> >>>> time_modified and >> >>>>>>>>>> change attributes to be updated (if and only if the file data >> >>>> is >> >>>>>>>>>> modified). Several other sections rely on this behaviour, >> >>>> including >> >>>>>>>>>> section 10.3.1, section 11.7.2.2, and section 11.7.7. >> >>>>>>>>>> >> >>>>>>>>>> The only 'special behaviour' that I see allowed for pNFS is >> >>>> in section >> >>>>>>>>>> 13.10, which states that clients can't expect to see changes >> >>>>>>>>>> immediately, but that they must be able to expect >> >>>> close-to-open >> >>>>>>>>>> semantics to work. Again, if this is to be the case, then the >> >>>> server >> >>>>>>>>>> _must_ be able to deal with the case where client 1 dies >> >>>> before it can >> >>>>>>>>>> issue the LAYOUTCOMMIT. >> >>>>>>>> >> >>>>>>>> Agreed. >> >>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>>>> As I see it, if your server allows one client to read data >> >>>> that may have >> >>>>>>>>>>>> been modified by another client that holds a WRITE layout >> >>>> for that range >> >>>>>>>>>>>> then (since that is a visible data change) it should >> >>>> provide a change >> >>>>>>>>>>>> attribute update irrespective of whether or not a >> >>>> LAYOUTCOMMIT has been >> >>>>>>>>>>>> sent. >> >>>>>>>>>>> >> >>>>>>>>>>> the requirement for the server in WRITE's implementation >> >>>> section >> >>>>>>>>>>> is quite weak: "It is assumed that the act of writing data >> >>>> to a file will >> >>>>>>>>>>> cause the time_modified and change attributes of the file to >> >>>> be updated." >> >>>>>>>>>>> >> >>>>>>>>>>> The difference here is that for pNFS the written data is not >> >>>> guaranteed >> >>>>>>>>>>> to be visible until LAYOUTCOMMIT. 
In a broader sense, >> >>>> assuming the clients >> >>>>>>>>>>> are caching dirty data and use a write-behind cache, >> >>>> application-written data >> >>>>>>>>>>> may be visible to other processes on the same host but not >> >>>> to others until >> >>>>>>>>>>> fsync() or close() - open-to-close semantics are the only >> >>>> thing the client >> >>>>>>>>>>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and >> >>>> close() ensure the >> >>>>>>>>>>> data is committed to stable storage and is visible to all >> >>>> other clients in >> >>>>>>>>>>> the cluster. >> >>>>>>>>>> >> >>>>>>>>>> See above. I'm not disputing your statement that 'the written >> >>>> data is >> >>>>>>>>>> not guaranteed to be visible until LAYOUTCOMMIT'. I am >> >>>> disputing an >> >>>>>>>>>> assumption that 'the written data may be visible without an >> >>>> accompanying >> >>>>>>>>>> change attribute update'. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> In other words, I'd expect the following scenario to give the >> >>>> same >> >>>>>>>>> results in NFSv4.1 w/pNFS as it does in NFSv4: >> >>>>>>>> >> >>>>>>>> That's a strong requirement that may limit the scalability of >> >>>> the server. >> >>>>>>>> >> >>>>>>>> The spirit of the pNFS operations, at least from Panasas >> >>>> perspective was that >> >>>>>>>> the data is transient until LAYOUTCOMMIT, meaning it may or may >> >>>> not be visible >> >>>>>>>> to clients other than the one who wrote it, and its associated >> >>>> metadata MUST >> >>>>>>>> be updated and describe the new data only on LAYOUTCOMMIT and >> >>>> until then it's >> >>>>>>>> undefined, i.e. it's up to the server implementation whether to >> >>>> update it or not. >> >>>>>>>> >> >>>>>>>> Without locking, what do the stronger semantics buy you? >> >>>>>>>> Even if a client verified the change_attribute new data may >> >>>> become visible >> >>>>>>>> at any time after the GETATTR if the file/byte range aren't >> >>>> locked. 
>> >>>>>>> >> >>>>>>> There is no locking needed in the scenario below: it is ordinary >> >>>>>>> close-to-open semantics. >> >>>>>>> >> >>>>>>> The point is that if you remove the one and only way that =20 >> clients >> >>>> have >> >>>>>>> to determine whether or not their data caches are valid, then =20 >> they >> >>>> can >> >>>>>>> no longer cache data at all, and server scalability will be shot >> >>>> to >> >>>>>>> smithereens anyway. >> >>>>>>> >> >>>>>>> Trond >> >>>>>>> >> >>>>>>>> Benny >> >>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Client 1 Client 2 >> >>>>>>>>> =3D=3D=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D=3D >> >>>>>>>>> >> >>>>>>>>> OPEN foo >> >>>>>>>>> READ >> >>>>>>>>> CLOSE >> >>>>>>>>> OPEN >> >>>>>>>>> LAYOUTGET ... >> >>>>>>>>> WRITE via DS >> >>>>>>>>> <dies>... >> >>>>>>>>> OPEN foo >> >>>>>>>>> verify change_attr >> >>>>>>>>> READ if above WRITE is visible >> >>>>>>>>> CLOSE >> >>>>>>>>> >> >>>>>>>>> Trond >> >>>>>>>>> _______________________________________________ >> >>>>>>>>> nfsv4 mailing list >> >>>>>>>>> nfsv4@ietf.org >> >>>>>>>>> https://www.ietf.org/mailman/listinfo/nfsv4 >> >>>>>>> >> >>>>>>> >> >>>>>>> _______________________________________________ >> >>>>>>> nfsv4 mailing list >> >>>>>>> nfsv4@ietf.org >> >>>>>>> https://www.ietf.org/mailman/listinfo/nfsv4 >> >>>>>>> >> >>>>>>> >> >>>>> >> >>>>> >> >>>>> _______________________________________________ >> >>>>> nfsv4 mailing list >> >>>>> nfsv4@ietf.org >> >>>>> https://www.ietf.org/mailman/listinfo/nfsv4 >> >>>>> >> >>>>> _______________________________________________ >> >>>>> nfsv4 mailing list >> >>>>> nfsv4@ietf.org >> >>>>> https://www.ietf.org/mailman/listinfo/nfsv4 >> >>>> >> >>> >> >>> >> >> >> >> >> >> -- >> >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" = =20 >> in >> >> the body of a message to majordomo@vger.kernel.org >> >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > >> > >> > >> > -- >> > To unsubscribe from this list: 
send the line "unsubscribe linux-nfs" = =20 >> in >> > the body of a message to majordomo@vger.kernel.org >> > More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > _______________________________________________ > nfsv4 mailing list > nfsv4@ietf.org > https://www.ietf.org/mailman/listinfo/nfsv4 > > --=20 Best Regards Sorin Faibish Corporate Distinguished Engineer Network Storage Group EMC=B2 where information lives Phone: 508-435-1000 x 48545 Cellphone: 617-510-0422 Email : sfaibish@emc.com _______________________________________________ nfsv4 mailing list nfsv4@ietf.org https://www.ietf.org/mailman/listinfo/nfsv4 ^ permalink raw reply [flat|nested] 38+ messages in thread
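For readers following the proposal quoted in the message above (on OPEN, LOCK or WANT_DELEGATION the server checks for outstanding LAYOUTCOMMITs from other writers and otherwise forces a change attribute update), the rule can be sketched roughly as below. This is only an illustration under stated assumptions: `FileState`, `layoutget_rw`, `layoutcommit` and `on_open` are hypothetical names for a toy in-memory metadata server, not code from any NFS implementation, and the sketch assumes every LAYOUTCOMMIT reflects modified data.

```python
from dataclasses import dataclass, field


@dataclass
class FileState:
    """Toy per-file state on a hypothetical metadata server."""
    change_attr: int = 0
    # Clients holding a write layout that have not yet sent LAYOUTCOMMIT.
    uncommitted_writers: set = field(default_factory=set)


def layoutget_rw(f: FileState, client_id: str) -> None:
    """Record that client_id was handed a read/write layout."""
    f.uncommitted_writers.add(client_id)


def layoutcommit(f: FileState, client_id: str) -> None:
    """Assumption: a LAYOUTCOMMIT from a writer means data was modified."""
    if client_id in f.uncommitted_writers:
        f.uncommitted_writers.discard(client_id)
        f.change_attr += 1


def on_open(f: FileState, client_id: str) -> int:
    """Handle OPEN; the same check would apply to LOCK and WANT_DELEGATION.

    If another client may have written without committing, conservatively
    bump the change attribute. This can produce false positives (file looks
    modified when it was not) but never false negatives.
    """
    others = f.uncommitted_writers - {client_id}
    if others:
        f.change_attr += 1
    return f.change_attr
```

Replaying the scenario from earlier in the thread: client 1 opens the file and caches the change attribute, client 2 takes a write layout, writes via the DS and dies without LAYOUTCOMMIT. When client 1 calls `on_open` again it sees a different change attribute and knows its cache may be stale. Note the bump happens on OPEN only, not on every GETATTR, which matches Benny's caveat.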
* Re: 4.1 client - LAYOUTCOMMIT & close
  2010-07-08 22:12 ` sfaibish
@ 2010-07-08 23:01   ` Tom Haynes
  2010-07-08 23:57     ` sfaibish
  2010-07-09  0:41     ` [nfsv4] " Trond Myklebust
From: Tom Haynes @ 2010-07-08 23:01 UTC
To: sfaibish
Cc: linux-nfs, garth, welch, nfsv4, trond.myklebust, andros, bhalevy

On 07/ 8/10 05:12 PM, sfaibish wrote:
> All,
>
> After discussing this issue with Dave Noveck, and as I mentioned in the
> call today, I think that this is a serious issue and a disconnect between
> the behavior of the different layout types. My proposal is to have this
> discussion F2F in Maastricht on the white board, so I will add an agenda
> item to the WG on this topic. I could address the behavior of the block
> layout, but it is not something we want to mimic, as we all agreed at
> cthon to avoid LAYOUTCOMMIT as much as possible for the file layout. If we
> solve the issue using the mechanism Trond proposed, we will create a
> conflict with the use of LAYOUTCOMMIT. Just as a hint, the difference from
> block is that block uses layouts for write and read as different leases,
> and when a client has a layout for read, the server will always send him a
> LAYOUTRETURN when either upgrading his lease to write or sending a layout
> for write to another client. We don't want to do the same for file, I
> don't think. My 2c.
>
> /Sorin

When I hear the words "white board", I immediately think unorganized and
likely to get out of hand. I don't know how much time we are up to now,
but we must be close to running out of it.

I have a counter-proposal, why doesn't someone, say Trond, put together
some slides on this and we discuss them.

Or, if there is a strong consensus that we do need to do this on a white
board, why don't we ask ietf for an additional slot in the morning?
* Re: 4.1 client - LAYOUTCOMMIT & close
  2010-07-08 23:01 ` Tom Haynes
@ 2010-07-08 23:57   ` sfaibish
  2010-07-09  0:41   ` [nfsv4] " Trond Myklebust
From: sfaibish @ 2010-07-08 23:57 UTC
To: Tom Haynes
Cc: linux-nfs, garth, welch, nfsv4, trond.myklebust, andros, bhalevy

On Thu, 08 Jul 2010 19:01:16 -0400, Tom Haynes <tom.haynes@oracle.com> wrote:

> On 07/ 8/10 05:12 PM, sfaibish wrote:
>> All,
>>
>> After discussing this issue with Dave Noveck, and as I mentioned in the
>> call today, I think that this is a serious issue and a disconnect between
>> the behavior of the different layout types. My proposal is to have this
>> discussion F2F in Maastricht on the white board, so I will add an agenda
>> item to the WG on this topic. I could address the behavior of the block
>> layout, but it is not something we want to mimic, as we all agreed at
>> cthon to avoid LAYOUTCOMMIT as much as possible for the file layout. If we
>> solve the issue using the mechanism Trond proposed, we will create a
>> conflict with the use of LAYOUTCOMMIT. Just as a hint, the difference from
>> block is that block uses layouts for write and read as different leases,
>> and when a client has a layout for read, the server will always send him a
>> LAYOUTRETURN when either upgrading his lease to write or sending a layout
>> for write to another client. We don't want to do the same for file, I
>> don't think. My 2c.
>>
>> /Sorin
>
> When I hear the words "white board", I immediately think unorganized and
> likely to get out of hand. I don't know how much time we are up to now,
> but we must be close to running out of it.
>
> I have a counter-proposal, why doesn't someone, say Trond, put together
> some slides on this and we discuss them.

Agreed. This is what I meant by "white board": a presentation followed by
a discussion on a plan of action, perhaps a new 4.2 draft if there is a
need. We can continue it over email after we decide what to do.

> Or, if there is a strong consensus that we do need to do this on a white
> board, why don't we ask ietf for an additional slot in the morning?

My bad for using the wrong term. I don't think we need a special time
slot, but we can decide on the spot in Maastricht. We should be able to
find an available room.

-- 
Best Regards

Sorin Faibish
Corporate Distinguished Engineer
Network Storage Group

EMC²
where information lives

Phone: 508-435-1000 x 48545
Cellphone: 617-510-0422
Email : sfaibish@emc.com
* Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
  2010-07-08 23:01 ` Tom Haynes
  2010-07-08 23:57   ` sfaibish
@ 2010-07-09  0:41   ` Trond Myklebust
From: Trond Myklebust @ 2010-07-09 0:41 UTC
To: Tom Haynes
Cc: sfaibish, david.black, bhalevy, andros, linux-nfs, garth, welch, nfsv4

On Thu, 2010-07-08 at 18:01 -0500, Tom Haynes wrote:
> I have a counter-proposal, why doesn't someone, say Trond, put together
> some slides on this and we discuss them.

Say who, what???? :-)

OK. I can put something together, but it will take 5-10 minutes of
meeting time (10 being the more realistic estimate).

Trond
* Re: 4.1 client - LAYOUTCOMMIT
  2010-07-01 23:47 4.1 client - LAYOUTCOMMIT Sandeep Joshi
  2010-07-02  0:07 ` 4.1 client - LAYOUTCOMMIT & close Sandeep Joshi
@ 2010-07-06 13:20 ` Benny Halevy
From: Benny Halevy @ 2010-07-06 13:20 UTC
To: Sandeep Joshi
Cc: linux-nfs, NFSv4

On Jul. 02, 2010, 2:47 +0300, "Sandeep Joshi" <sjoshi@bluearc.com> wrote:
>
> As per specification value of newoffset4_u.no_offset should be less than or equal to NFS4_MAXFILEOFF.
> But, I observe it to be NFS4_MAXFILELEN.

No. The last offset the client can access is NFS4_MAXFILEOFF.
Writing a single byte at offset NFS4_MAXFILEOFF results in a file of
length NFS4_MAXFILELEN, but the loca_last_write_offset in this case is
still NFS4_MAXFILEOFF.

Benny

>
> regards,
>
> Sandeep
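The offset-versus-length distinction in Benny's reply is a plain off-by-one, and a small worked example may make it concrete. The sketch below is in Python; the constant values are the ones I recall from the basic constants in the RFC 5661 XDR and should be checked against the spec, and the two helper functions are hypothetical names introduced only for this illustration.

```python
# Values as recalled from the RFC 5661 XDR constants (verify before relying on them).
NFS4_MAXFILELEN = 0xFFFFFFFFFFFFFFFF  # largest representable file length
NFS4_MAXFILEOFF = 0xFFFFFFFFFFFFFFFE  # largest byte offset a client can access


def last_write_offset(offset: int, count: int) -> int:
    """Offset of the last byte touched by a write of `count` bytes at `offset`."""
    if count < 1:
        raise ValueError("count must be at least 1")
    last = offset + count - 1
    if last > NFS4_MAXFILEOFF:
        raise ValueError("write extends past NFS4_MAXFILEOFF")
    return last


def file_length_after_write(offset: int, count: int) -> int:
    """Smallest file length that contains the write: last byte offset plus one."""
    return last_write_offset(offset, count) + 1


# One byte written at the very last accessible offset:
print(hex(last_write_offset(NFS4_MAXFILEOFF, 1)))
print(hex(file_length_after_write(NFS4_MAXFILEOFF, 1)))
```

Under these assumptions the value a client would report as loca_last_write_offset is the last byte offset, so it can reach NFS4_MAXFILEOFF at most, while the resulting file length is one larger and can reach NFS4_MAXFILELEN.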