lustre-devel-lustre.org archive mirror
 help / color / mirror / Atom feed
* [lustre-devel] Error checking for llapi_hsm_action_progress().
@ 2020-08-31  4:03 NeilBrown
  2020-08-31 12:53 ` quentin.bouget at cea.fr
  2020-08-31 15:36 ` Joseph Benjamin Evans
  0 siblings, 2 replies; 11+ messages in thread
From: NeilBrown @ 2020-08-31  4:03 UTC (permalink / raw)
  To: lustre-devel



I have a question about llapi_hsm_action_progress().  The documentation
says that every interval sent "must" be unique, and must not overlap
(which not exactly the same as 'unique').  The code (on server side)
only partially enforces this.  It causes any request for an empty
interval (start>end) to fail, but otherwise accepts any interval.  If it
gets two identical intervals (not just overlapping, but identical), it
ignores the second.  This seems weird.

It would make some sense to just accept any interval - all it does is
sum the lengths, and use this to report status, so no corruption would
result.  It would also make sense to return an error if an interval
overlaps any previous interval, as this violates the spec.  It might
make sense to accept any interval, but only count the overlapped length
once.  But the current behaviour of only ignoring exact duplicates is
weird.  I tried removing that check, but there is a test (hsm_test 108)
which checks for repeating identical intervals.

I want to clean up this code as I'm converting all users of the lustre
interval-tree to use the upstream-linux interval tree code.  What should
I do?

Should I remove test 108 because it is only testing one particular
corner case, or should I improve the code to handle all overlaps
consistently?  Would it be OK to fail an overlap (I'd need to change
test 108), it must they be quietly accepted?

Thanks,
NeilBrown
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20200831/d8c3d476/attachment.sig>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [lustre-devel] Error checking for llapi_hsm_action_progress().
  2020-08-31  4:03 [lustre-devel] Error checking for llapi_hsm_action_progress() NeilBrown
@ 2020-08-31 12:53 ` quentin.bouget at cea.fr
  2020-09-01  0:58   ` NeilBrown
  2020-08-31 15:36 ` Joseph Benjamin Evans
  1 sibling, 1 reply; 11+ messages in thread
From: quentin.bouget at cea.fr @ 2020-08-31 12:53 UTC (permalink / raw)
  To: lustre-devel

Hi Neil,


On 31/08/2020 06:03, NeilBrown wrote:
> I have a question about llapi_hsm_action_progress().  The documentation
> says that every interval sent "must" be unique, and must not overlap
> (which not exactly the same as 'unique').  The code (on server side)
> only partially enforces this.  It causes any request for an empty
> interval (start>end) to fail, but otherwise accepts any interval.  If it
> gets two identical intervals (not just overlapping, but identical), it
> ignores the second.  This seems weird.
>
> It would make some sense to just accept any interval - all it does is
> sum the lengths, and use this to report status, so no corruption would
> result.  It would also make sense to return an error if an interval
> overlaps any previous interval, as this violates the spec.  It might
> make sense to accept any interval, but only count the overlapped length
> once.  But the current behaviour of only ignoring exact duplicates is
> weird.  I tried removing that check, but there is a test (hsm_test 108)
> which checks for repeating identical intervals.

test108 handles a "growing" extent (offset=0, length=1, 2, 3, ...).
test112 handles acknowledging non-overlapping extents twice each.

I wrote a test to check what happens if you acknowledge overlapping extents:

  * (offset=0, length=256)
  * (offset=128, length=256)

And surely enough mdt_cdt_get_work_done() returns "512" rather than the 
expected "384" (ie. 128 + 256).

Even worse, when acknowledging a "shrinking" extent (offset=0, length=N, 
N - 1, N - 2, ...), only the last value is kept in store.

 From this, I think that exact duplicates are not really ignored, 
rather, intervals that share the same starting point overwrite one 
another, until only the last one remains.
Bug or feature? I don't really know.


> I want to clean up this code as I'm converting all users of the lustre
> interval-tree to use the upstream-linux interval tree code.  What should
> I do?
>
> Should I remove test 108 because it is only testing one particular
> corner case, or should I improve the code to handle all overlaps
> consistently?  Would it be OK to fail an overlap (I'd need to change
> test 108), it must they be quietly accepted?

How does the upstream-linux interval tree compares to Lustre's?

If their behaviours match, there should not be any issue (so far as the 
current behaviour can be considered issue-free).

Otherwise, I think it would be OK to just assume sending overlapping 
extents is a programming error and the server does not need to protect 
itself against it.
In terms of security, this isn't very good, but the problem already 
exists, and copytools are supposedly trusted binaries run as root.
You could then remove test108 as it is itself a programming error, 
test112 as well, and maybe others.

Just to be sure, you could open a new issue on Jira, and let others rule 
how much of a bug/feature the whole thing is.

Cheers,
Quentin

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20200831/dc117d63/attachment.html>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [lustre-devel] Error checking for llapi_hsm_action_progress().
  2020-08-31  4:03 [lustre-devel] Error checking for llapi_hsm_action_progress() NeilBrown
  2020-08-31 12:53 ` quentin.bouget at cea.fr
@ 2020-08-31 15:36 ` Joseph Benjamin Evans
  2020-09-01  1:27   ` NeilBrown
  1 sibling, 1 reply; 11+ messages in thread
From: Joseph Benjamin Evans @ 2020-08-31 15:36 UTC (permalink / raw)
  To: lustre-devel

I don't think anything is actually monitoring or using the results of those extents, specifically.  "bytes copied" would be equally useful to the end user, I'd think.  Others may have better data on real-world usage, though.  So this might be a "code deleted is code debugged" situation.

-Ben

?On 8/31/20, 12:03 AM, "lustre-devel on behalf of NeilBrown" <lustre-devel-bounces at lists.lustre.org on behalf of neilb@suse.de> wrote:



    I have a question about llapi_hsm_action_progress().  The documentation
    says that every interval sent "must" be unique, and must not overlap
    (which not exactly the same as 'unique').  The code (on server side)
    only partially enforces this.  It causes any request for an empty
    interval (start>end) to fail, but otherwise accepts any interval.  If it
    gets two identical intervals (not just overlapping, but identical), it
    ignores the second.  This seems weird.

    It would make some sense to just accept any interval - all it does is
    sum the lengths, and use this to report status, so no corruption would
    result.  It would also make sense to return an error if an interval
    overlaps any previous interval, as this violates the spec.  It might
    make sense to accept any interval, but only count the overlapped length
    once.  But the current behaviour of only ignoring exact duplicates is
    weird.  I tried removing that check, but there is a test (hsm_test 108)
    which checks for repeating identical intervals.

    I want to clean up this code as I'm converting all users of the lustre
    interval-tree to use the upstream-linux interval tree code.  What should
    I do?

    Should I remove test 108 because it is only testing one particular
    corner case, or should I improve the code to handle all overlaps
    consistently?  Would it be OK to fail an overlap (I'd need to change
    test 108), it must they be quietly accepted?

    Thanks,
    NeilBrown

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [lustre-devel] Error checking for llapi_hsm_action_progress().
  2020-08-31 12:53 ` quentin.bouget at cea.fr
@ 2020-09-01  0:58   ` NeilBrown
  2020-09-01  9:33     ` quentin.bouget at cea.fr
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2020-09-01  0:58 UTC (permalink / raw)
  To: lustre-devel

On Mon, Aug 31 2020, quentin.bouget at cea.fr wrote:

> Hi Neil,
>
>
> On 31/08/2020 06:03, NeilBrown wrote:
>> I have a question about llapi_hsm_action_progress().  The documentation
>> says that every interval sent "must" be unique, and must not overlap
>> (which not exactly the same as 'unique').  The code (on server side)
>> only partially enforces this.  It causes any request for an empty
>> interval (start>end) to fail, but otherwise accepts any interval.  If it
>> gets two identical intervals (not just overlapping, but identical), it
>> ignores the second.  This seems weird.
>>
>> It would make some sense to just accept any interval - all it does is
>> sum the lengths, and use this to report status, so no corruption would
>> result.  It would also make sense to return an error if an interval
>> overlaps any previous interval, as this violates the spec.  It might
>> make sense to accept any interval, but only count the overlapped length
>> once.  But the current behaviour of only ignoring exact duplicates is
>> weird.  I tried removing that check, but there is a test (hsm_test 108)
>> which checks for repeating identical intervals.
>
> test108 handles a "growing" extent (offset=0, length=1, 2, 3, ...).

test108_progress() in llapi_hsm_test.c sets:
		he.offset = 0;
		he.length = length;
where 'length' is 1000.  It reports progress for this extent 1000 times.
Then complains if the total progress isn't 1000.

> test112 handles acknowledging non-overlapping extents twice each.
Yep.  10 non-overlapping extents, then sends the same 10 again.

>
> I wrote a test to check what happens if you acknowledge overlapping extents:
>
>   * (offset=0, length=256)
>   * (offset=128, length=256)
>
> And surely enough mdt_cdt_get_work_done() returns "512" rather than the 
> expected "384" (ie. 128 + 256).
>
> Even worse, when acknowledging a "shrinking" extent (offset=0, length=N, 
> N - 1, N - 2, ...), only the last value is kept in store.

Does that even mean anything?  It seems to be saying "I've done N amount
of work" and then "Sorry, I lied, I've only done N-1"...

But I'm surprised at your result that only the last is kept.  Reading
the code strongly suggests that it would report the sum of all these
regions.  N^2/2 if you continued to  N-N.

How do you perform these tests?  Is there a command-line tool, or would
I need to write some code?

>
>  From this, I think that exact duplicates are not really ignored, 
> rather, intervals that share the same starting point overwrite one 
> another, until only the last one remains.
> Bug or feature? I don't really know.
>
>
>> I want to clean up this code as I'm converting all users of the lustre
>> interval-tree to use the upstream-linux interval tree code.  What should
>> I do?
>>
>> Should I remove test 108 because it is only testing one particular
>> corner case, or should I improve the code to handle all overlaps
>> consistently?  Would it be OK to fail an overlap (I'd need to change
>> test 108), it must they be quietly accepted?
>
> How does the upstream-linux interval tree compares to Lustre's?

The interesting difference is that the lustre interval-tree refuses to
store exact duplicates (same start, same end), while the Linux
interval-tree will accept any new interval.
I think this made some sense for the first user for interval-trees in
lustre, which was LDLM extent locking, but some other users need to
jump through hoops to handle duplicates correctly.

For hsm_update_work(), it just tries to store the interval it was given
and if that fails, it say "oh well, too bad".

>
> If their behaviours match, there should not be any issue (so far as the 
> current behaviour can be considered issue-free).

Certainly I could preserve exactly the same behaviour, but I find it
hard to write code that doesn't make any sense.
If I *were* to keep exactly the same behaviour as current, I wouldn't use
an interval tree, as (if I understand it correctly), the intervals
aren't really relevant.  It just stores discrete "start+length"
pairs and rejects duplicates.  I'd probably use an rhashtable for that.

>
> Otherwise, I think it would be OK to just assume sending overlapping 
> extents is a programming error and the server does not need to protect 
> itself against it.
> In terms of security, this isn't very good, but the problem already 
> exists, and copytools are supposedly trusted binaries run as root.
> You could then remove test108 as it is itself a programming error, 
> test112 as well, and maybe others.
>
> Just to be sure, you could open a new issue on Jira, and let others rule 
> how much of a bug/feature the whole thing is.

I guess the key question here is: who are those "others" ??
Jira has a "Component/s" field, but it seems to be fixed at "None".
How would I ensure that the Jira issue got to the right people?

Thanks for your response,

NeilBrown


>
> Cheers,
> Quentin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20200901/bfbd5ddf/attachment.sig>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [lustre-devel] Error checking for llapi_hsm_action_progress().
  2020-08-31 15:36 ` Joseph Benjamin Evans
@ 2020-09-01  1:27   ` NeilBrown
  2020-09-01  7:41     ` Degremont, Aurelien
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2020-09-01  1:27 UTC (permalink / raw)
  To: lustre-devel


"code deleted in code debugged" is my preferred outcome.  I haven't
heard anyone clamouring to keep the current behaviour, so I'm leaning
more in that direction.

Thanks,
NeilBrown

On Mon, Aug 31 2020, Joseph Benjamin Evans wrote:

> I don't think anything is actually monitoring or using the results of those extents, specifically.  "bytes copied" would be equally useful to the end user, I'd think.  Others may have better data on real-world usage, though.  So this might be a "code deleted is code debugged" situation.
>
> -Ben
>
> ?On 8/31/20, 12:03 AM, "lustre-devel on behalf of NeilBrown" <lustre-devel-bounces at lists.lustre.org on behalf of neilb@suse.de> wrote:
>
>
>
>     I have a question about llapi_hsm_action_progress().  The documentation
>     says that every interval sent "must" be unique, and must not overlap
>     (which not exactly the same as 'unique').  The code (on server side)
>     only partially enforces this.  It causes any request for an empty
>     interval (start>end) to fail, but otherwise accepts any interval.  If it
>     gets two identical intervals (not just overlapping, but identical), it
>     ignores the second.  This seems weird.
>
>     It would make some sense to just accept any interval - all it does is
>     sum the lengths, and use this to report status, so no corruption would
>     result.  It would also make sense to return an error if an interval
>     overlaps any previous interval, as this violates the spec.  It might
>     make sense to accept any interval, but only count the overlapped length
>     once.  But the current behaviour of only ignoring exact duplicates is
>     weird.  I tried removing that check, but there is a test (hsm_test 108)
>     which checks for repeating identical intervals.
>
>     I want to clean up this code as I'm converting all users of the lustre
>     interval-tree to use the upstream-linux interval tree code.  What should
>     I do?
>
>     Should I remove test 108 because it is only testing one particular
>     corner case, or should I improve the code to handle all overlaps
>     consistently?  Would it be OK to fail an overlap (I'd need to change
>     test 108), it must they be quietly accepted?
>
>     Thanks,
>     NeilBrown
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20200901/a2331e59/attachment.sig>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [lustre-devel] Error checking for llapi_hsm_action_progress().
  2020-09-01  1:27   ` NeilBrown
@ 2020-09-01  7:41     ` Degremont, Aurelien
  2020-09-01 13:10       ` Joseph Benjamin Evans
  0 siblings, 1 reply; 11+ messages in thread
From: Degremont, Aurelien @ 2020-09-01  7:41 UTC (permalink / raw)
  To: lustre-devel

My understanding of the different use cases was:
- Copytool I/O could be done in parallel and acknowledge write range in any order.
- Having a map of what have been copied and what haven't been was done thinking of implementing partial restore in the future. I'm not sure when this feature will be implemented it will really need this from the coordinator.

You can verify some existing copytools in case some of them acknowledge I/O with a specific pattern:
- posix copytool in lustre source
- S3 copytool Lemur (https://github.com/whamcloud/lemur)
- TSM copytools (https://github.com/tstibor/ltsm, and Simon linked this one recently: https://github.com/guilbaults/ct_tsm/)

I would be in favor of not raising an error if acknowledging overlaps an existing extent. 

Aur?lien

?Le 01/09/2020 03:28, ? lustre-devel au nom de NeilBrown ? <lustre-devel-bounces at lists.lustre.org au nom de neilb@suse.de> a ?crit :

    CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.



    "code deleted in code debugged" is my preferred outcome.  I haven't
    heard anyone clamouring to keep the current behaviour, so I'm leaning
    more in that direction.

    Thanks,
    NeilBrown

    On Mon, Aug 31 2020, Joseph Benjamin Evans wrote:

    > I don't think anything is actually monitoring or using the results of those extents, specifically.  "bytes copied" would be equally useful to the end user, I'd think.  Others may have better data on real-world usage, though.  So this might be a "code deleted is code debugged" situation.
    >
    > -Ben
    >
    > On 8/31/20, 12:03 AM, "lustre-devel on behalf of NeilBrown" <lustre-devel-bounces at lists.lustre.org on behalf of neilb@suse.de> wrote:
    >
    >
    >
    >     I have a question about llapi_hsm_action_progress().  The documentation
    >     says that every interval sent "must" be unique, and must not overlap
    >     (which not exactly the same as 'unique').  The code (on server side)
    >     only partially enforces this.  It causes any request for an empty
    >     interval (start>end) to fail, but otherwise accepts any interval.  If it
    >     gets two identical intervals (not just overlapping, but identical), it
    >     ignores the second.  This seems weird.
    >
    >     It would make some sense to just accept any interval - all it does is
    >     sum the lengths, and use this to report status, so no corruption would
    >     result.  It would also make sense to return an error if an interval
    >     overlaps any previous interval, as this violates the spec.  It might
    >     make sense to accept any interval, but only count the overlapped length
    >     once.  But the current behaviour of only ignoring exact duplicates is
    >     weird.  I tried removing that check, but there is a test (hsm_test 108)
    >     which checks for repeating identical intervals.
    >
    >     I want to clean up this code as I'm converting all users of the lustre
    >     interval-tree to use the upstream-linux interval tree code.  What should
    >     I do?
    >
    >     Should I remove test 108 because it is only testing one particular
    >     corner case, or should I improve the code to handle all overlaps
    >     consistently?  Would it be OK to fail an overlap (I'd need to change
    >     test 108), it must they be quietly accepted?
    >
    >     Thanks,
    >     NeilBrown

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [lustre-devel] Error checking for llapi_hsm_action_progress().
  2020-09-01  0:58   ` NeilBrown
@ 2020-09-01  9:33     ` quentin.bouget at cea.fr
  2020-09-01 12:07       ` Degremont, Aurelien
  2020-09-02  0:36       ` NeilBrown
  0 siblings, 2 replies; 11+ messages in thread
From: quentin.bouget at cea.fr @ 2020-09-01  9:33 UTC (permalink / raw)
  To: lustre-devel

On 01/09/2020 02:58, NeilBrown wrote:
> On Mon, Aug 31 2020, quentin.bouget at cea.fr wrote:
>> On 31/08/2020 06:03, NeilBrown wrote:
>>> I tried removing that check, but there is a test (hsm_test 108)
>>> which checks for repeating identical intervals.
>> test108 handles a "growing" extent (offset=0, length=1, 2, 3, ...).
> test108_progress() in llapi_hsm_test.c sets:
> 		he.offset = 0;
> 		he.length = length;
> where 'length' is 1000.  It reports progress for this extent 1000 times.
> Then complains if the total progress isn't 1000.

My bad, I read that wrong. You are right.

>> test112 handles acknowledging non-overlapping extents twice each.
> Yep.  10 non-overlapping extents, then sends the same 10 again.
>
>> I wrote a test to check what happens if you acknowledge overlapping extents:
>>
>>    * (offset=0, length=256)
>>    * (offset=128, length=256)
>>
>> And surely enough mdt_cdt_get_work_done() returns "512" rather than the
>> expected "384" (ie. 128 + 256).
>>
>> Even worse, when acknowledging a "shrinking" extent (offset=0, length=N,
>> N - 1, N - 2, ...), only the last value is kept in store.
> Does that even mean anything?  It seems to be saying "I've done N amount
> of work" and then "Sorry, I lied, I've only done N-1"...

I think it could happen if a copytool spreads HSM actions to multiple 
workers, one of the worker stalls, its work is resubmitted to another 
worker, and both of them acknowledge extents with different lengths.

But yes, this is at best a weird/bad corner case.

> But I'm surprised at your result that only the last is kept.  Reading
> the code strongly suggests that it would report the sum of all these
> regions.  N^2/2 if you continued to  N-N.

That was me being wrong again. It behaves just like you say.

> How do you perform these tests?  Is there a command-line tool, or would
> I need to write some code?

I copied test112 in llapi_hsm_test.c and edited the code (and read the 
error message wrong =/).

>> From this, I think that exact duplicates are not really ignored, 
>> rather, intervals that share the same starting point overwrite one 
>> another, until only the last one remains. Bug or feature? I don't 
>> really know.
>>
>>
>>> I want to clean up this code as I'm converting all users of the lustre
>>> interval-tree to use the upstream-linux interval tree code.  What should
>>> I do?
>> How does the upstream-linux interval tree compares to Lustre's?
> The interesting difference is that the lustre interval-tree refuses to
> store exact duplicates (same start, same end), while the Linux
> interval-tree will accept any new interval.
> I think this made some sense for the first user for interval-trees in
> lustre, which was LDLM extent locking, but some other users need to
> jump through hoops to handle duplicates correctly.
>
> For hsm_update_work(), it just tries to store the interval it was given
> and if that fails, it say "oh well, too bad".

Yes, there is a comment saying : "it can happen if ct sends 2 times the 
same progress".

 From "git blame", this seems to be coming from one of the earliest HSM 
commits, so I don't know if this was ever useful in practice or just an 
instance of defensive programming.

>> If their behaviours match, there should not be any issue (so far as the
>> current behaviour can be considered issue-free).
> Certainly I could preserve exactly the same behaviour, but I find it
> hard to write code that doesn't make any sense.
> If I *were* to keep exactly the same behaviour as current, I wouldn't use
> an interval tree, as (if I understand it correctly), the intervals
> aren't really relevant.  It just stores discrete "start+length"
> pairs and rejects duplicates.  I'd probably use an rhashtable for that.

Assuming duplicated extents, but otherwise non-overlapping, I agree.

Ideally, there would be a standard data structure that automatically 
merges overlapping/contiguous extents.
But I don't know any, and I don't think it is worth coding one from scratch.

>> Otherwise, I think it would be OK to just assume sending overlapping
>> extents is a programming error and the server does not need to protect
>> itself against it.
>> In terms of security, this isn't very good, but the problem already
>> exists, and copytools are supposedly trusted binaries run as root.
>> You could then remove test108 as it is itself a programming error,
>> test112 as well, and maybe others.
>>
>> Just to be sure, you could open a new issue on Jira, and let others rule
>> how much of a bug/feature the whole thing is.
> I guess the key question here is: who are those "others" ??
> Jira has a "Component/s" field, but it seems to be fixed at "None".
> How would I ensure that the Jira issue got to the right people?

There is a "label" field that can take "HSM" as a value.

Other than that, I agree with Ben that removing the feature would be the 
easiest way to fix all this.
I am just worried that someone somewhere uses it. For example, I can 
imagine a monitoring script that polls the coordinator's active request 
procfile to detect stalled HSM actions.

I'll check with our admins, but I don't think we use it at CEA. Maybe if 
Cray & DDN confirm that they don't know anyone using it, it will be 
enough to remove the code, like Ben suggests. Or maybe we axe the 
feature and wait to see if anyone complains.

 From the repos linked by Aur?lien, every copytool seems to mind the 
constraint of not sending overlapping extents (not even duplicates). I 
also have access to the sources of a different copytool that uses 
llapi_hsm_action_progress() as a heartbeat (offset=0, length=0).

Sorry about the earlier mistakes.

Have a good day,
Quentin

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20200901/2ddc421a/attachment-0001.html>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [lustre-devel] Error checking for llapi_hsm_action_progress().
  2020-09-01  9:33     ` quentin.bouget at cea.fr
@ 2020-09-01 12:07       ` Degremont, Aurelien
  2020-09-02  0:36       ` NeilBrown
  1 sibling, 0 replies; 11+ messages in thread
From: Degremont, Aurelien @ 2020-09-01 12:07 UTC (permalink / raw)
  To: lustre-devel




I also have access to the sources of a different copytool that uses llapi_hsm_action_progress() as a heartbeat (offset=0, length=0).



This should still be supported, this is a feature. Copytool are considered stale if no progress is received for a while. Copytools which are not making any progress (waiting for a tape to be mounted by example) are supposed to send progress for length=0 to avoid timeout.



Aur?lien
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20200901/c1df6e4e/attachment.html>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [lustre-devel] Error checking for llapi_hsm_action_progress().
  2020-09-01  7:41     ` Degremont, Aurelien
@ 2020-09-01 13:10       ` Joseph Benjamin Evans
  0 siblings, 0 replies; 11+ messages in thread
From: Joseph Benjamin Evans @ 2020-09-01 13:10 UTC (permalink / raw)
  To: lustre-devel

If we implement partial restore, there should definitely be code in the coordinator to handle it.  But right now, I don't see anyone working on it.

As to parallel I/O, I know of a few copytools that do that already, and they coordinate among themselves (using MPI or other frameworks) to specify what data gets copied by which clients rather than use Lustre to do it.  To Lustre that should be opaque.

?On 9/1/20, 3:41 AM, "Degremont, Aurelien" <degremoa@amazon.com> wrote:

    My understanding of the different use cases was:
    - Copytool I/O could be done in parallel and acknowledge write range in any order.
    - Having a map of what have been copied and what haven't been was done thinking of implementing partial restore in the future. I'm not sure when this feature will be implemented it will really need this from the coordinator.

    You can verify some existing copytools in case some of them acknowledge I/O with a specific pattern:
    - posix copytool in lustre source
    - S3 copytool Lemur (https://github.com/whamcloud/lemur)
    - TSM copytools (https://github.com/tstibor/ltsm, and Simon linked this one recently: https://github.com/guilbaults/ct_tsm/)

    I would be in favor of not raising an error if acknowledging overlaps an existing extent. 

    Aur?lien

    Le 01/09/2020 03:28, ? lustre-devel au nom de NeilBrown ? <lustre-devel-bounces at lists.lustre.org au nom de neilb@suse.de> a ?crit :

        CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.



        "code deleted in code debugged" is my preferred outcome.  I haven't
        heard anyone clamouring to keep the current behaviour, so I'm leaning
        more in that direction.

        Thanks,
        NeilBrown

        On Mon, Aug 31 2020, Joseph Benjamin Evans wrote:

        > I don't think anything is actually monitoring or using the results of those extents, specifically.  "bytes copied" would be equally useful to the end user, I'd think.  Others may have better data on real-world usage, though.  So this might be a "code deleted is code debugged" situation.
        >
        > -Ben
        >
        > On 8/31/20, 12:03 AM, "lustre-devel on behalf of NeilBrown" <lustre-devel-bounces at lists.lustre.org on behalf of neilb@suse.de> wrote:
        >
        >
        >
        >     I have a question about llapi_hsm_action_progress().  The documentation
        >     says that every interval sent "must" be unique, and must not overlap
        >     (which not exactly the same as 'unique').  The code (on server side)
        >     only partially enforces this.  It causes any request for an empty
        >     interval (start>end) to fail, but otherwise accepts any interval.  If it
        >     gets two identical intervals (not just overlapping, but identical), it
        >     ignores the second.  This seems weird.
        >
        >     It would make some sense to just accept any interval - all it does is
        >     sum the lengths, and use this to report status, so no corruption would
        >     result.  It would also make sense to return an error if an interval
        >     overlaps any previous interval, as this violates the spec.  It might
        >     make sense to accept any interval, but only count the overlapped length
        >     once.  But the current behaviour of only ignoring exact duplicates is
        >     weird.  I tried removing that check, but there is a test (hsm_test 108)
        >     which checks for repeating identical intervals.
        >
        >     I want to clean up this code as I'm converting all users of the lustre
        >     interval-tree to use the upstream-linux interval tree code.  What should
        >     I do?
        >
        >     Should I remove test 108 because it is only testing one particular
        >     corner case, or should I improve the code to handle all overlaps
        >     consistently?  Would it be OK to fail an overlap (I'd need to change
        >     test 108), it must they be quietly accepted?
        >
        >     Thanks,
        >     NeilBrown

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [lustre-devel] Error checking for llapi_hsm_action_progress().
  2020-09-01  9:33     ` quentin.bouget at cea.fr
  2020-09-01 12:07       ` Degremont, Aurelien
@ 2020-09-02  0:36       ` NeilBrown
  1 sibling, 0 replies; 11+ messages in thread
From: NeilBrown @ 2020-09-02  0:36 UTC (permalink / raw)
  To: lustre-devel

On Tue, Sep 01 2020, quentin.bouget at cea.fr wrote:
>
> Ideally, there would be a standard data structure that automatically 
> merges overlapping/contiguous extents.
> But I don't know any, and I don't think it is worth coding one from scratch.
>

An interval tree is close enough that it would take .. uhmm....

 while ((overlap = interval_iter_first(
                        &crp->crp_root,
 			node->start == 0 ? 0 : node->start - 1,
                        node->end == LUSTRE_EOF ? LUSTRE_EOF : node->end + 1))
        != NULL) {
	node->start = min(node->start, overlap->start);
        node->end = max(node->end, overlap->end);
        interval_remove(node, &crp->crp_root);
 }
 interval_insert(node, &crp->crp_root);

... 10 lines of code to merge overlapping regions.

Maybe I'll do that.

I get the general impression that while people don't see it as very
important to keep track of the reported ranges, they would probably feel
a little happier if we kept track than if we didn't.
Reporting errors on overlaps is definitely out, but merging overlaps
is widely seen as sensible.

Thanks everyone for the feedback.

NeilBrown
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 853 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20200902/5a48ce37/attachment.sig>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [lustre-devel] Error checking for llapi_hsm_action_progress().
@ 2020-09-18 17:33 Nathan Rutman
  0 siblings, 0 replies; 11+ messages in thread
From: Nathan Rutman @ 2020-09-18 17:33 UTC (permalink / raw)
  To: lustre-devel

As I've presented before, I really think Lustre should get out of the
business of tracking HSM requests and progress completely for scalability
reasons. External tools should do their thing, and simply let Lustre know
when to atomically change the file's layout (released, restored, migrated,
mirrored, etc). (All this work is in the "externalized coordinator" and
"hsm as layout" patches up at WC.)

Anyhow, to that end, I vote in favor of resolving by removal this
apparently unused feature. Liblustre could silently drop the extent info
for a trivial "fix".


My understanding of the different use cases was:

- Copytool I/O could be done in parallel and acknowledge write range
in any order.

- Having a map of what have been copied and what haven't been was done
thinking of implementing partial restore in the future. I'm not sure
when this feature will be implemented it will really need this from
the coordinator.

You can verify some existing copytools in case some of them
acknowledge I/O with a specific pattern:

- posix copytool in lustre source

- S3 copytool Lemur (https://github.com/whamcloud/lemur)

- TSM copytools (https://github.com/tstibor/ltsm, and Simon linked
this one recently: https://github.com/guilbaults/ct_tsm/)

I would be in favor of not raising an error if acknowledging overlaps
an existing extent.

Aur?lien

?Le 01/09/2020 03:28, ? lustre-devel au nom de NeilBrown ?
<lustre-devel-bounces@lists.lustre.org
<http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org> au nom
de neilb at suse.de
<http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org>> a
?crit :

    CAUTION: This email originated from outside of the organization.
Do not click links or open attachments unless you can confirm the
sender and know the content is safe.

    "code deleted in code debugged" is my preferred outcome.  I haven't

    heard anyone clamouring to keep the current behaviour, so I'm leaning

    more in that direction.

    Thanks,

    NeilBrown

    On Mon, Aug 31 2020, Joseph Benjamin Evans wrote:

    > I don't think anything is actually monitoring or using the
results of those extents, specifically.  "bytes copied" would be
equally useful to the end user, I'd think.  Others may have better
data on real-world usage, though.  So this might be a "code deleted is
code debugged" situation.

    >

    > -Ben

    >

    > On 8/31/20, 12:03 AM, "lustre-devel on behalf of NeilBrown"
<lustre-devel-bounces@lists.lustre.org
<http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org> on
behalf of neilb at suse.de
<http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org>> wrote:

    >

    >

    >

    >     I have a question about llapi_hsm_action_progress().  The
documentation

    >     says that every interval sent "must" be unique, and must not overlap

    >     (which not exactly the same as 'unique').  The code (on server side)

    >     only partially enforces this.  It causes any request for an empty

    >     interval (start>end) to fail, but otherwise accepts any
interval.  If it

    >     gets two identical intervals (not just overlapping, but identical), it

    >     ignores the second.  This seems weird.

    >

    >     It would make some sense to just accept any interval - all it does is

    >     sum the lengths, and use this to report status, so no corruption would

    >     result.  It would also make sense to return an error if an interval

    >     overlaps any previous interval, as this violates the spec.  It might

    >     make sense to accept any interval, but only count the
overlapped length

    >     once.  But the current behaviour of only ignoring exact duplicates is

    >     weird.  I tried removing that check, but there is a test
(hsm_test 108)

    >     which checks for repeating identical intervals.

    >

    >     I want to clean up this code as I'm converting all users of the lustre

    >     interval-tree to use the upstream-linux interval tree code.
What should

    >     I do?

    >

    >     Should I remove test 108 because it is only testing one particular

    >     corner case, or should I improve the code to handle all overlaps

    >     consistently?  Would it be OK to fail an overlap (I'd need to change

    >     test 108), it must they be quietly accepted?

    >

    >     Thanks,

    >     NeilBrown
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20200918/36b186f4/attachment-0001.html>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-09-18 17:33 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-31  4:03 [lustre-devel] Error checking for llapi_hsm_action_progress() NeilBrown
2020-08-31 12:53 ` quentin.bouget at cea.fr
2020-09-01  0:58   ` NeilBrown
2020-09-01  9:33     ` quentin.bouget at cea.fr
2020-09-01 12:07       ` Degremont, Aurelien
2020-09-02  0:36       ` NeilBrown
2020-08-31 15:36 ` Joseph Benjamin Evans
2020-09-01  1:27   ` NeilBrown
2020-09-01  7:41     ` Degremont, Aurelien
2020-09-01 13:10       ` Joseph Benjamin Evans
2020-09-18 17:33 Nathan Rutman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).