linux-fsdevel.vger.kernel.org archive mirror
* [Patch 1/4] Support for checking and reading block grade information in kernel
@ 2018-04-06 11:41 Sayan Ghosh
  2018-04-06 17:09 ` Randy Dunlap
  2018-04-19 15:40 ` Jan Kara
  0 siblings, 2 replies; 7+ messages in thread
From: Sayan Ghosh @ 2018-04-06 11:41 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, niloy ganguly, Bhattacharya, Suparna,
	Madhumita Mallick, Bharde, Madhumita

This introduces the different functions in order to get the grades as
the extended attributes while pre-allocating a new file. The grades
are stored as extended attributes while the file gets created. The
grades can be used by different user space applications as necessary.
The functions introduced are read_grade_xattr(), is_file_graded(),
read_count_xattr() which aim to read the extended attribute for grade
array and also to know whether the file is graded. The detailed
descriptions of the functions are provided as comments in the patch.
The patch is on top of Linux Kernel 4.7.2.

Signed-off-by: Sayan Ghosh <sgdgp.2014@gmail.com>
---
 fs/ext4/ext4.h    | 15 +++++++++++++++
 fs/ext4/extents.c | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b84aa1c..b9ec0ca 100755
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -136,6 +136,18 @@ enum SHIFT_DIRECTION {
 /* Use blocks from reserved pool */
 #define EXT4_MB_USE_RESERVED        0x2000

+/* Structure of a grade - starting block number
+ * and length of contiguous blocks with same higher
+ * grade (inclusive of starting block)
+ * example : if blocks 2,3,4 are higher graded,
+ * then block_num = 2 and len = 3
+ * Only high grade information is stored by this struct.
+ */
+struct grade_struct {
+    ext4_lblk_t block_num;
+    unsigned long long len;
+};
+
 struct ext4_allocation_request {
     /* target inode for block we're allocating */
     struct inode *inode;
@@ -3186,6 +3198,9 @@ extern int ext4_check_blockref(const char *, unsigned int,
 /* extents.c */
 struct ext4_ext_path;
 struct ext4_extent;
+extern unsigned long long read_count_xattr(struct inode *inode);
+extern void read_grade_xattr(struct inode *inode,struct grade_struct *grade_array);
+extern int is_file_graded(struct inode *inode);

 /*
  * Maximum number of logical blocks in a file; ext4_extent's ee_block is
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index d7ccb7f..de9194f 100755
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -57,6 +57,41 @@
 #define EXT4_EXT_DATA_VALID1    0x8  /* first half contains valid data */
 #define EXT4_EXT_DATA_VALID2    0x10 /* second half contains valid data */

+/*
+ * read_grade_xattr() is used to read the grade array from the extended attribute.
+ */
+void read_grade_xattr(struct inode *inode,struct grade_struct *grade_array)
+{
+    const char *xattr_name = "grade_array";
+    int xattr_size = ext4_xattr_get(inode, EXT4_XATTR_INDEX_USER,xattr_name, NULL,0);
+    xattr_size = ext4_xattr_get(inode, EXT4_XATTR_INDEX_USER,xattr_name, (void *)grade_array,xattr_size);
+    return;
+}
+
+/*
+ * is_file_graded() returns whether the file has a grade information or not.
+ * It takes the inode number as a parameter.
+ */
+int is_file_graded(struct inode *inode)
+{
+    const char *xattr_name = "is_graded";
+    int is_graded = 0;
+    int xattr_size = sizeof(int);
+    xattr_size = ext4_xattr_get(inode, EXT4_XATTR_INDEX_USER,xattr_name, (void *)&is_graded,xattr_size);
+    return is_graded;
+}
+
+/*
+ * read_count_xattr() used to get the number of the elements in the grade array.
+ */
+unsigned long long read_count_xattr(struct inode *inode)
+{
+    const char *xattr_name = "grade_array";
+    unsigned long long xattr_size = ext4_xattr_get(inode, EXT4_XATTR_INDEX_USER,xattr_name, NULL,0);
+    unsigned long long total = xattr_size/sizeof(struct grade_struct);
+    return total;
+}
+
 static __le32 ext4_extent_block_csum(struct inode *inode,
                      struct ext4_extent_header *eh)
 {
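
A note for readers of the patch above: ext4_xattr_get() returns either the size of the attribute value or a negative errno (for example -ENODATA when the attribute does not exist), and read_count_xattr() stores that result in an unsigned long long before dividing. The fragment below is a minimal userspace sketch of the check that appears to be missing. It is not part of the patch: grade_range and grade_count_from_size() are hypothetical names, and the fixed-width struct is only an assumed stand-in for struct grade_struct.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-width stand-in for the patch's struct grade_struct. */
struct grade_range {
    uint32_t block_num; /* first logical block of the high-grade range */
    uint64_t len;       /* number of blocks in the range */
};

/*
 * Convert the result of an xattr size query into an element count.
 * ret is either a negative errno or the value size in bytes.
 * Returns the number of elements, or a negative errno when the query
 * failed or the size is not a whole number of elements.
 */
long grade_count_from_size(long ret)
{
    if (ret < 0)
        return ret; /* propagate the error instead of dividing it */
    if ((size_t)ret % sizeof(struct grade_range) != 0)
        return -EINVAL; /* truncated or foreign value */
    return ret / (long)sizeof(struct grade_range);
}
```

A caller would size its buffer from this count only after checking that the result is not negative, and would check the return value of the second, data-fetching call in the same way.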
‌


* Re: [Patch 1/4] Support for checking and reading block grade information in kernel
  2018-04-06 11:41 [Patch 1/4] Support for checking and reading block grade information in kernel Sayan Ghosh
@ 2018-04-06 17:09 ` Randy Dunlap
  2018-04-09 17:27   ` Sayan Ghosh
  2018-04-19 15:40 ` Jan Kara
  1 sibling, 1 reply; 7+ messages in thread
From: Randy Dunlap @ 2018-04-06 17:09 UTC (permalink / raw)
  To: Sayan Ghosh, linux-ext4
  Cc: linux-fsdevel, niloy ganguly, Bhattacharya, Suparna,
	Madhumita Mallick, Bharde, Madhumita

On 04/06/2018 04:41 AM, Sayan Ghosh wrote:
> This introduces the different functions in order to get the grades as
> the extended attributes while pre-allocating a new file. The grades
> are stored as extended attributes while the file gets created. The
> grades can be used by different user space applications as necessary.
> The functions introduced are read_grade_xattr(), is_file_graded(),
> read_count_xattr() which aim to read the extended attribute for grade
> array and also to know whether the file is graded. The detailed
> descriptions of the functions are provided as comments in the patch.
> The patch is on top of Linux Kernel 4.7.2.

Well, it's up to Ted if he wants to merge a patch that is based on 4.7.2.
In general, patches are made to the current mainline or -next tree,
while 4.7.2 was released on 2016-Aug-20.

> 
> Signed-off-by: Sayan Ghosh <sgdgp.2014@gmail.com>
> ---
>  fs/ext4/ext4.h    | 15 +++++++++++++++
>  fs/ext4/extents.c | 35 +++++++++++++++++++++++++++++++++++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index b84aa1c..b9ec0ca 100755
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -136,6 +136,18 @@ enum SHIFT_DIRECTION {
>  /* Use blocks from reserved pool */
>  #define EXT4_MB_USE_RESERVED        0x2000
> 
> +/* Structure of a grade - starting block number
> + * and length of contiguous blocks with same higher
> + * grade (inclusive of starting block)
> + * example : if blocks 2,3,4 are higher graded,
> + * then block_num = 2 and len = 3
> + * Only high grade information is stored by this struct.
> + */

Wrong multi-line format style (unless ext4 is like netdev :).
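
For reference, Documentation/process/coding-style.rst prefers multi-line comments that open with a bare /* line; files under net/ and drivers/net/ are the documented exception that starts the text on the opening line. A sketch of the same comment in the general layout follows. The typedef is a userspace stand-in (an assumption) so that the fragment compiles outside the kernel tree.

```c
#include <stdint.h>

/* Stand-in for the kernel's ext4_lblk_t; an assumption for this sketch. */
typedef uint32_t ext4_lblk_t;

/*
 * Structure of a grade: starting block number and length of a run of
 * contiguous blocks with the same higher grade (inclusive of the
 * starting block). Example: if blocks 2, 3 and 4 are higher graded,
 * then block_num = 2 and len = 3.
 * Only high grade information is stored by this struct.
 */
struct grade_struct {
    ext4_lblk_t block_num;
    unsigned long long len;
};
```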

-- 
~Randy


* Re: [Patch 1/4] Support for checking and reading block grade information in kernel
  2018-04-06 17:09 ` Randy Dunlap
@ 2018-04-09 17:27   ` Sayan Ghosh
  0 siblings, 0 replies; 7+ messages in thread
From: Sayan Ghosh @ 2018-04-09 17:27 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-ext4, linux-fsdevel, niloy ganguly, Bhattacharya, Suparna,
	Madhumita Mallick, Bharde, Madhumita

Hi Randy,

Thank you for looking into the patchset and providing your suggestions
and comments.
We are working on the implementation for the current version of the kernel
and will post the code for that soon.

We will definitely keep in mind the style and indentation issues that
you pointed out.

Regards,
Sayan Ghosh

* Re: [Patch 1/4] Support for checking and reading block grade information in kernel
  2018-04-06 11:41 [Patch 1/4] Support for checking and reading block grade information in kernel Sayan Ghosh
  2018-04-06 17:09 ` Randy Dunlap
@ 2018-04-19 15:40 ` Jan Kara
  2018-04-29  9:52   ` Sayan Ghosh
  1 sibling, 1 reply; 7+ messages in thread
From: Jan Kara @ 2018-04-19 15:40 UTC (permalink / raw)
  To: Sayan Ghosh
  Cc: linux-ext4, linux-fsdevel, niloy ganguly, Bhattacharya, Suparna,
	Madhumita Mallick, Bharde, Madhumita

On Fri 06-04-18 17:11:40, Sayan Ghosh wrote:
> This introduces the different functions in order to get the grades as
> the extended attributes while pre-allocating a new file. The grades
> are stored as extended attributes while the file gets created. The
> grades can be used by different user space applications as necessary.
> The functions introduced are read_grade_xattr(), is_file_graded(),
> read_count_xattr() which aim to read the extended attribute for grade
> array and also to know whether the file is graded. The detailed
> descriptions of the functions are provided as comments in the patch.
> The patch is on top of Linux Kernel 4.7.2.
> 
> Signed-off-by: Sayan Ghosh <sgdgp.2014@gmail.com>

Thanks for the patch! The fact that this is based on a rather old kernel has
already been mentioned - you really need to base it on a much newer kernel to
get this merged. Another problem I see is that there's no description of
the design of this feature. I.e., what is this feature good for? And how is
it supposed to work? Probably before investing too much time into rebasing
you can start by sending the high level design of the feature for
discussion. From quickly glancing through the patches I gather it is some
kind of HSM but I'm not completely sure...

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [Patch 1/4] Support for checking and reading block grade information in kernel
  2018-04-19 15:40 ` Jan Kara
@ 2018-04-29  9:52   ` Sayan Ghosh
  2018-04-29 11:02     ` Matthew Wilcox
  2018-05-03 12:27     ` Jan Kara
  0 siblings, 2 replies; 7+ messages in thread
From: Sayan Ghosh @ 2018-04-29  9:52 UTC (permalink / raw)
  To: Jan Kara
  Cc: Ext4 Developers List, linux-fsdevel, niloy ganguly, Bhattacharya,
	Suparna, Madhumita Mallick, Bharde, Madhumita

Hello Jan,

Thank you for looking into our patchset and providing feedback.
We are currently modifying these patches for the latest kernel version.

The overall objective is as follows:
The goal of our project is broadly to support data gradation within a
single file. If the contents of the file are graded in terms of their
importance, then an application might need to view/analyse
only the important portions. It also helps if the important portions
can be accessed quickly without having to go through the entire file.
For example, consider a learning video with
indexing/annotations, in which the annotations mark the important
parts of the video. A learner may be interested in only those parts,
and it will help him if he can be provided with a reduced view with
just the parts he’s interested in. An example of such videos is ACM
Webinar videos, where a user can navigate using a table of contents or
a phrase cloud.

The link below is one such video:
https://videoken.com/video-detail?videoID=IpGxLWOIZy4&videoDuration=1853&videoName=A%20Friendly%20Introduction%20to%20Machine%20Learning&keyword=A%20Friendly%20Introduction%20to%20Machine%20Learning

This kind of video file can serve as input to our system, where we
know which parts of the file have been marked. Our goal then is to
place the important blocks appropriately and provide a reduced view
of just the important parts of the file. Placing the important blocks
in a faster tier (SSD, PM, etc.) greatly enhances the performance of
reading and writing the file.
To achieve this we have a data structure for the grades,
somewhat like the extent structure. It describes the segments of
high-graded parts of the file: each entry holds the
starting block number and the length of the segment.
So the patches basically focus on adding functions to set and get the
grade information from the extended attributes and on allocating
blocks using this grade information (by modifying the fallocate calls
in the kernel). The reduced view of the file is
handled by modifying the code for the dax calls in the kernel.
Also, taking a cue from Andreas' feedback, we are looking into the
streamID interface to see if we can use it for our work.
We are also looking into whether there are any other built-in methods which can
help in representing the grade structure without introducing new data
structures. We would be grateful if you could also suggest
other ways of implementing grades.
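
To make the grade structure described above concrete, here is a small userspace sketch of how a sorted, non-overlapping array of (starting block, length) segments could be queried for a given logical block. It is an illustration only and not code from the patches: grade_range and block_is_high_grade() are hypothetical names, and the sortedness of the array is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment: first logical block and number of blocks. */
struct grade_range {
    uint32_t block_num;
    uint64_t len;
};

/*
 * Binary search over ranges sorted by block_num with no overlaps.
 * Returns true when blk falls inside one of the high-grade segments.
 */
bool block_is_high_grade(const struct grade_range *r, size_t n, uint32_t blk)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (blk < r[mid].block_num)
            hi = mid;           /* blk lies before this segment */
        else if ((uint64_t)blk - r[mid].block_num >= r[mid].len)
            lo = mid + 1;       /* blk lies after this segment */
        else
            return true;        /* block_num <= blk < block_num + len */
    }
    return false;
}
```

With the segment {2, 3} from the patch comment (blocks 2, 3 and 4 are high grade), a lookup of block 4 succeeds and a lookup of block 5 does not.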

Regards,
Sayan


* Re: [Patch 1/4] Support for checking and reading block grade information in kernel
  2018-04-29  9:52   ` Sayan Ghosh
@ 2018-04-29 11:02     ` Matthew Wilcox
  2018-05-03 12:27     ` Jan Kara
  1 sibling, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2018-04-29 11:02 UTC (permalink / raw)
  To: Sayan Ghosh
  Cc: Jan Kara, Ext4 Developers List, linux-fsdevel, niloy ganguly,
	Bhattacharya, Suparna, Madhumita Mallick, Bharde, Madhumita

On Sun, Apr 29, 2018 at 03:22:34PM +0530, Sayan Ghosh wrote:
> Thank you for looking into our patchset and providing feedback.
> We are currently modifying these patches for the latest kernel version.

It's probably a better use of your time to convince us that this is a
useful feature.

> The overall objective is as follows:
> The goal of our project is broadly to support data gradation within a
> single file. If the contents of the file are graded in terms of their
> importance, then an application might need to view/analyse
> only the important portions. It also helps if the important portions
> can be accessed quickly without having to go through the entire file.
> For example, consider a learning video with
> indexing/annotations, in which the annotations mark the important
> parts of the video. A learner may be interested in only those parts,
> and it will help him if he can be provided with a reduced view with
> just the parts he’s interested in. An example of such videos is ACM
> Webinar videos, where a user can navigate using a table of contents or
> a phrase cloud.

The problem I have with this approach is that it assumes the author of
the video knows in advance which bits will be popular.  If there's a
part which becomes unexpectedly popular then you didn't win anything.

> We are also looking into whether there are any other built-in methods which can
> help in representing the grade structure without introducing new data
> structures. We would be grateful if you could also suggest
> other ways of implementing grades.

You're getting ahead of yourself; the implementation of them inside an
individual filesystem is almost unimportant (it's important insofar as
it's good to demonstrate that it can be done).  The important part is the
API between the kernel and userspace, and convincing people to use it.
We don't want to merge an API that nobody ends up using.


* Re: [Patch 1/4] Support for checking and reading block grade information in kernel
  2018-04-29  9:52   ` Sayan Ghosh
  2018-04-29 11:02     ` Matthew Wilcox
@ 2018-05-03 12:27     ` Jan Kara
  1 sibling, 0 replies; 7+ messages in thread
From: Jan Kara @ 2018-05-03 12:27 UTC (permalink / raw)
  To: Sayan Ghosh
  Cc: Jan Kara, Ext4 Developers List, linux-fsdevel, niloy ganguly,
	Bhattacharya, Suparna, Madhumita Mallick, Bharde, Madhumita,
	linux-xfs

Hello Sayan,

On Sun 29-04-18 15:22:34, Sayan Ghosh wrote:
> Thank you for looking into our patchset and providing feedback.
> We are currently modifying these patches for the latest kernel version.
> 
> The overall objective is as follows:
> The goal of our project is broadly to support data gradation within a
> single file. If the contents of the file are graded in terms of their
> importance, then an application might need to view/analyse
> only the important portions. It also helps if the important portions
> can be accessed quickly without having to go through the entire file.
> For example, consider a learning video with
> indexing/annotations, in which the annotations mark the important
> parts of the video. A learner may be interested in only those parts,
> and it will help him if he can be provided with a reduced view with
> just the parts he’s interested in. An example of such videos is ACM
> Webinar videos, where a user can navigate using a table of contents or
> a phrase cloud.
> 
> The link below is one such video:
> https://videoken.com/video-detail?videoID=IpGxLWOIZy4&videoDuration=1853&videoName=A%20Friendly%20Introduction%20to%20Machine%20Learning&keyword=A%20Friendly%20Introduction%20to%20Machine%20Learning
> 
> This kind of video file can serve as input to our system, where we
> know which parts of the file have been marked. Our goal then is to
> place the important blocks appropriately and provide a reduced view
> of just the important parts of the file. Placing the important blocks
> in a faster tier (SSD, PM, etc.) greatly enhances the performance of
> reading and writing the file.
> To achieve this we have a data structure for the grades,
> somewhat like the extent structure. It describes the segments of
> high-graded parts of the file: each entry holds the
> starting block number and the length of the segment.
> So the patches basically focus on adding functions to set and get the
> grade information from the extended attributes and on allocating
> blocks using this grade information (by modifying the fallocate calls
> in the kernel). The reduced view of the file is
> handled by modifying the code for the dax calls in the kernel.
> Also, taking a cue from Andreas' feedback, we are looking into the
> streamID interface to see if we can use it for our work.
> We are also looking into whether there are any other built-in methods which can
> help in representing the grade structure without introducing new data
> structures. We would be grateful if you could also suggest
> other ways of implementing grades.

What you describe here really sounds pretty much like "Hierarchical Storage
Management" (HSM). It was invented a long time ago to support storage of
less used data on slow storage (tapes or so at that time). There's even a
standard for filesystems to support this and XFS used to support it (the
support in Linux was later removed as it was broken) - I think "Data
Storage Management (XDSM) API" [1] is the standard describing the API. I've
CCed XFS mailing list as people more knowledgeable of HSM than me are
lingering there :).

The difference between your proposal and classical HSM is that in your
proposal all the storage devices are directly accessible by the filesystem
and are just mapped to different block offsets of the device underlying the
filesystem. That frankly sounds quite messy, as is also shown by you having
to hardcode where the fast / slow device starts in the block number space.
Also your support for reading only highly graded info (patch 4) IMO does
not belong in the kernel. A userspace application can just read from some
index which parts of the file are interesting and use lseek(2) + read(2) to
read only those. No need for special kernel magic. Finally, mixing DAX &
non-DAX access to a single file as you do in patch 3 is technically very
difficult (there are lots of assumptions in the current DAX code that a file is
either wholly accessed through DAX or not accessed through DAX at all). So
to sum it up, won't you get better overall results if you just use
something like dm-cache / bcache and cache the slow device with the fast
one?
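
The lseek(2) + read(2) approach mentioned above could look roughly like the following userspace sketch. It is only an illustration under stated assumptions: byte_range and read_ranges() are hypothetical names, the ranges are byte offsets (a block-based index would first be multiplied by the filesystem block size), and EINTR handling is left out for brevity.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical index entry: one interesting byte range of the file. */
struct byte_range {
    off_t offset;
    size_t len;
};

/*
 * Copy the listed ranges of fd, in order, into out (capacity cap bytes).
 * Returns the total number of bytes copied, or -1 on I/O error with
 * errno set. A range that runs past end of file is cut short.
 */
ssize_t read_ranges(int fd, const struct byte_range *ranges, size_t n,
                    char *out, size_t cap)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        size_t want = ranges[i].len;

        if (want > cap - total)
            want = cap - total; /* never overrun the output buffer */
        if (lseek(fd, ranges[i].offset, SEEK_SET) == (off_t)-1)
            return -1;
        while (want > 0) {
            ssize_t got = read(fd, out + total, want);

            if (got < 0)
                return -1;
            if (got == 0)
                break; /* end of file inside this range */
            total += (size_t)got;
            want -= (size_t)got;
        }
    }
    return (ssize_t)total;
}
```

pread(2) would do the same without moving the file offset, which is usually preferable when several threads share one descriptor.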

								Honza

[1] http://pubs.opengroup.org/onlinepubs/9657099/

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
