RFC: [PATCH] staging/lustre/llite: fix O_TMPFILE/O_LOV_DELAY_CREATE conflict

* RFC: [PATCH] staging/lustre/llite: fix O_TMPFILE/O_LOV_DELAY_CREATE conflict
@ 2014-02-10 20:16 Dilger, Andreas
  2014-02-10 21:29 ` Al Viro
  0 siblings, 1 reply; 15+ messages in thread
From: Dilger, Andreas @ 2014-02-10 20:16 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig; +Cc: linux-fsdevel, Drokin, Oleg, Peng Tao, greg

On 2014/02/03, 5:09 PM, "Dilger, Andreas" <andreas.dilger@intel.com> wrote:
>On 2014/02/03, 4:07 PM, "Andreas Dilger" <andreas.dilger@intel.com> wrote:
>>In kernel 3.11 O_TMPFILE was introduced, but the open flag value
>>conflicts with the O_LOV_DELAY_CREATE flag 020000000 previously used
>>by Lustre-aware applications.  O_LOV_DELAY_CREATE allows applications
>>to defer file layout and object creation from open time (the default)
>>until it can instead be specified by the application using an ioctl.
>>
>>Instead of trying to find a non-conflicting O_LOV_DELAY_CREATE flag
>>or define a Lustre-specific flag that isn't of use to most/any other
>>filesystems, use (O_NOCTTY|FASYNC) as the new value.  These flag
>>are not meaningful for newly-created regular files and should be
>>OK since O_LOV_DELAY_CREATE is only meaningful for new files.
>>
>>I looked into using O_ACCMODE/FMODE_WRITE_IOCTL, which allows calling
>>ioctl() on the minimally-opened fd and is close to what is needed,
>>but that doesn't allow specifying the actual read or write mode for
>>the file, and fcntl(F_SETFL) doesn't allow O_RDONLY/O_WRONLY/O_RDWR
>>to be set after the file is opened.

Al, Christoph,
any comments on this approach?

>A few extra comments here that I wasn't sure should be in the commit
>comment.
>
>The main goal of the O_LOV_DELAY_CREATE flag is to allow the file to be
>opened in a "preliminary" manner to allow the application to specify the
>layout of the file across the Lustre storage targets (e.g. whether the
>app has millions of separate files, each one written to a single server,
>or there is a single huge file spread across all of the servers, or some
>combination of the two, if it is RAID-0 or RAID-1, or whatever).
>
>I'm open to a separate flag value for this, but I don't know if that is
>of much interest to other filesystems since they don't have similar needs.
>We want to avoid the need to have lots of syscalls to do this, since they
>translate into multiple RPCs that we want to avoid when creating
>potentially
>millions of files over the network.
>
>Cheers, Andreas
>
>>Lustre-change: http://review.whamcloud.com/8312
>>Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4209
>>Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
>>Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
>>Signed-off-by: Peng Tao <bergwolf@gmail.com>
>>---
>> .../lustre/lustre/include/lustre/lustre_user.h     |   12 ++++------
>> drivers/staging/lustre/lustre/include/lustre_mdc.h |   11 ++++++++++
>> drivers/staging/lustre/lustre/llite/file.c         |   21
>>++++++++++---------
>> drivers/staging/lustre/lustre/mdc/mdc_lib.c        |    2 +-
>> 4 files changed, 28 insertions(+), 18 deletions(-)
>>
>>diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
>>b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
>>index 6b6c0240..91f22a8 100644
>>--- a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
>>+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h
>>@@ -265,13 +265,11 @@ struct ost_id {
>> 
>> #define MAX_OBD_NAME 128 /* If this changes, a NEW ioctl must be added
>>*/
>> 
>>-/* Hopefully O_LOV_DELAY_CREATE does not conflict with standard O_xxx
>>flags.
>>- * Previously it was defined as 0100000000 and conflicts with
>>FMODE_NONOTIFY
>>- * which was added since kernel 2.6.36, so we redefine it as 020000000.
>>- * To be compatible with old version's statically linked binary, finally
>>we
>>- * define it as (020000000 | 0100000000).
>>- * */
>>-#define O_LOV_DELAY_CREATE      0120000000
>>+/* Define O_LOV_DELAY_CREATE to be a mask that is not useful for regular
>>+ * files, but are unlikely to be used in practice and are not harmful if
>>+ * used incorrectly.  O_NOCTTY and FASYNC are only meaningful for
>>character
>>+ * devices and are safe for use on new files (See LU-812, LU-4209). */
>>+#define O_LOV_DELAY_CREATE	(O_NOCTTY | FASYNC)
>> 
>> #define LL_FILE_IGNORE_LOCK     0x00000001
>> #define LL_FILE_GROUP_LOCKED    0x00000002
>>diff --git a/drivers/staging/lustre/lustre/include/lustre_mdc.h
>>b/drivers/staging/lustre/lustre/include/lustre_mdc.h
>>index c1e0270..468f363 100644
>>--- a/drivers/staging/lustre/lustre/include/lustre_mdc.h
>>+++ b/drivers/staging/lustre/lustre/include/lustre_mdc.h
>>@@ -166,6 +166,17 @@ void it_clear_disposition(struct lookup_intent *it,
>>int flag);
>> void it_set_disposition(struct lookup_intent *it, int flag);
>> int it_open_error(int phase, struct lookup_intent *it);
>> 
>>+static inline bool cl_is_lov_delay_create(unsigned int flags)
>>+{
>>+	return (flags & O_LOV_DELAY_CREATE) == O_LOV_DELAY_CREATE;
>>+}
>>+
>>+static inline void cl_lov_delay_create_clear(unsigned int *flags)
>>+{
>>+	if ((*flags & O_LOV_DELAY_CREATE) == O_LOV_DELAY_CREATE)
>>+		*flags &= ~O_LOV_DELAY_CREATE;
>>+}
>>+
>> /** @} mdc */
>> 
>> #endif
>>diff --git a/drivers/staging/lustre/lustre/llite/file.c
>>b/drivers/staging/lustre/lustre/llite/file.c
>>index c12821a..dc9da77 100644
>>--- a/drivers/staging/lustre/lustre/llite/file.c
>>+++ b/drivers/staging/lustre/lustre/llite/file.c
>>@@ -671,14 +671,13 @@ restart:
>> 
>> 	ll_capa_open(inode);
>> 
>>-	if (!lli->lli_has_smd) {
>>-		if (file->f_flags & O_LOV_DELAY_CREATE ||
>>-		    !(file->f_mode & FMODE_WRITE)) {
>>-			CDEBUG(D_INODE, "object creation was delayed\n");
>>-			GOTO(out_och_free, rc);
>>-		}
>>+	if (!lli->lli_has_smd &&
>>+	    (cl_is_lov_delay_create(file->f_flags) ||
>>+	     (file->f_mode & FMODE_WRITE) == 0)) {
>>+		CDEBUG(D_INODE, "object creation was delayed\n");
>>+		GOTO(out_och_free, rc);
>> 	}
>>-	file->f_flags &= ~O_LOV_DELAY_CREATE;
>>+	cl_lov_delay_create_clear(&file->f_flags);
>> 	GOTO(out_och_free, rc);
>> 
>> out_och_free:
>>@@ -1381,23 +1380,25 @@ int ll_lov_setstripe_ea_info(struct inode *inode,
>>struct file *file,
>> 		ccc_inode_lsm_put(inode, lsm);
>> 		CDEBUG(D_IOCTL, "stripe already exists for ino %lu\n",
>> 		       inode->i_ino);
>>-		return -EEXIST;
>>+		GOTO(out, rc = -EEXIST);
>> 	}
>> 
>> 	ll_inode_size_lock(inode);
>> 	rc = ll_intent_file_open(file, lum, lum_size, &oit);
>> 	if (rc)
>>-		GOTO(out, rc);
>>+		GOTO(out_unlock, rc);
>> 	rc = oit.d.lustre.it_status;
>> 	if (rc < 0)
>> 		GOTO(out_req_free, rc);
>> 
>> 	ll_release_openhandle(file->f_dentry, &oit);
>> 
>>- out:
>>+out_unlock:
>> 	ll_inode_size_unlock(inode);
>> 	ll_intent_release(&oit);
>> 	ccc_inode_lsm_put(inode, lsm);
>>+out:
>>+	cl_lov_delay_create_clear(&file->f_flags);
>> 	return rc;
>> out_req_free:
>> 	ptlrpc_req_finished((struct ptlrpc_request *) oit.d.lustre.it_data);
>>diff --git a/drivers/staging/lustre/lustre/mdc/mdc_lib.c
>>b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
>>index 91f6876..5b9f371 100644
>>--- a/drivers/staging/lustre/lustre/mdc/mdc_lib.c
>>+++ b/drivers/staging/lustre/lustre/mdc/mdc_lib.c
>>@@ -197,7 +197,7 @@ static __u64 mds_pack_open_flags(__u64 flags, __u32
>>mode)
>> 	if (flags & FMODE_EXEC)
>> 		cr_flags |= MDS_FMODE_EXEC;
>> #endif
>>-	if (flags & O_LOV_DELAY_CREATE)
>>+	if (cl_is_lov_delay_create(flags))
>> 		cr_flags |= MDS_OPEN_DELAY_CREATE;
>> 
>> 	if (flags & O_NONBLOCK)
>>-- 
>>1.7.3.4
>>
>>
>
>
>Cheers, Andreas
>-- 
>Andreas Dilger
>
>Lustre Software Architect
>Intel High Performance Data Division
>
>
>

Cheers, Andreas
-- 
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division

^ permalink raw reply	[flat|nested] 15+ messages in thread