* [lustre-devel] Regression with delayed_work in sec_gc
@ 2018-05-15 15:17 James Simmons
  2018-05-15 22:22 ` NeilBrown
  0 siblings, 1 reply; 11+ messages in thread
From: James Simmons @ 2018-05-15 15:17 UTC (permalink / raw)
  To: lustre-devel


Dmitry has backported your upstream commit 

be3c64da248b1ada2bbcfc3855f07536d8087acb ( ptlrpc: use workqueue for 
pinger)

and Sebastien tried out the change which can be seen at 
https://review.whamcloud.com/#/c/31724. Sebastien is having issues with 
unloading osc, mdc, ost or mdt kernel modules. He writes that he activated 
Kerberos for his file system, did some IOs, then tried to unmount the
client and the targets. Unfortunately, on some occasions, while
unmounting, the mdc module for instance has extra references (as shown by
'lsmod'), and cannot be unloaded. After a while, the extra references are
dropped, and the module can be unloaded. It also happens on occasion on
the server side, for the mdt or ost modules. I tried destroy_workqueue but
that made things worse :-( Do you know what is going on?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [lustre-devel] Regression with delayed_work in sec_gc
  2018-05-15 15:17 [lustre-devel] Regression with delayed_work in sec_gc James Simmons
@ 2018-05-15 22:22 ` NeilBrown
  2018-05-17  9:10   ` Sebastien Buisson
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2018-05-15 22:22 UTC (permalink / raw)
  To: lustre-devel

On Tue, May 15 2018, James Simmons wrote:

> Dmitry has backported your upstream commit 
>
> be3c64da248b1ada2bbcfc3855f07536d8087acb ( ptlrpc: use workqueue for 
> pinger)
>
> and Sebastien tried out the change which can be seen at 
> https://review.whamcloud.com/#/c/31724. Sebastien is having issues with 
> unloading osc, mdc, ost or mdt kernel modules. He writes that he activated 
> Kerberos for his file system, did some IOs, then tried to unmount the
> client and the targets. Unfortunately, on some occasions, while
> unmounting, the mdc module for instance has extra references (as shown by
> 'lsmod'), and cannot be unloaded. After a while, the extra references are
> dropped, and the module can be unloaded. It also happens on occasion on
> the server side, for the mdt or ost modules. I tried destroy_workqueue but
> that made things worse :-( Do you know what is going on?

My guess is that ptlrpc_pinger_main() is looping, so
cancel_delayed_work_sync() is waiting an unusually long time for it.
This patch might fix the problem - it certainly makes the usage
of "this_ping" closer to the original code.

Thanks,
NeilBrown

diff --git a/drivers/staging/lustre/lustre/ptlrpc/pinger.c b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
index 89eef8ec7df4..131b201541e8 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/pinger.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
@@ -222,12 +222,13 @@ static DECLARE_DELAYED_WORK(ping_work, ptlrpc_pinger_main);
 
 static void ptlrpc_pinger_main(struct work_struct *ws)
 {
-	unsigned long this_ping = jiffies;
 	long time_to_next_wake;
 	struct timeout_item *item;
 	struct obd_import *imp;
 
 	do {
+		unsigned long this_ping = jiffies;
+
 		mutex_lock(&pinger_mutex);
 		list_for_each_entry(item, &timeout_list, ti_chain) {
 			item->ti_cb(item, item->ti_cb_data);


* [lustre-devel] Regression with delayed_work in sec_gc
  2018-05-15 22:22 ` NeilBrown
@ 2018-05-17  9:10   ` Sebastien Buisson
  2018-05-21  1:22     ` NeilBrown
  0 siblings, 1 reply; 11+ messages in thread
From: Sebastien Buisson @ 2018-05-17  9:10 UTC (permalink / raw)
  To: lustre-devel


> On 16 May 2018 at 00:22, NeilBrown <neilb@suse.com> wrote:
> 
> On Tue, May 15 2018, James Simmons wrote:
> 
>> Dmitry has backported your upstream commit
>> 
>> be3c64da248b1ada2bbcfc3855f07536d8087acb ( ptlrpc: use workqueue for
>> pinger)
>> 
>> and Sebastien tried out the change which can be seen at
>> https://review.whamcloud.com/#/c/31724. Sebastien is having issues with
>> unloading osc, mdc, ost or mdt kernel modules. He writes that he activated
>> Kerberos for his file system, did some IOs, then tried to unmount the
>> client and the targets. Unfortunately, on some occasions, while
>> unmounting, the mdc module for instance has extra references (as shown by
>> 'lsmod'), and cannot be unloaded. After a while, the extra references are
>> dropped, and the module can be unloaded. It also happens on occasion on
>> the server side, for the mdt or ost modules. I tried destroy_workqueue but
>> that made things worse :-( Do you know what is going on?
> 
> My guess is that ptlrpc_pinger_main() is looping, so
> cancel_delayed_work_sync() is waiting an unusually long time for it.
> This patch might fix the problem - it certainly makes the usage
> of "this_ping" closer to the original code.
> 
> Thanks,
> NeilBrown
> 
> diff --git a/drivers/staging/lustre/lustre/ptlrpc/pinger.c b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
> index 89eef8ec7df4..131b201541e8 100644
> --- a/drivers/staging/lustre/lustre/ptlrpc/pinger.c
> +++ b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
> @@ -222,12 +222,13 @@ static DECLARE_DELAYED_WORK(ping_work, ptlrpc_pinger_main);
> 
> static void ptlrpc_pinger_main(struct work_struct *ws)
> {
> -	unsigned long this_ping = jiffies;
> 	long time_to_next_wake;
> 	struct timeout_item *item;
> 	struct obd_import *imp;
> 
> 	do {
> +		unsigned long this_ping = jiffies;
> +
> 		mutex_lock(&pinger_mutex);
> 		list_for_each_entry(item, &timeout_list, ti_chain) {
> 			item->ti_cb(item, item->ti_cb_data);

Still same problem when testing "ptlrpc: use delayed_work in sec_gc" with patch "ptlrpc: use workqueue for pinger" and modification as suggested above. Kernel module ptlrpc_gss holds extra references, and cannot be unloaded.

However, patch "ptlrpc: use workqueue for pinger" alone does not have the issue.


* [lustre-devel] Regression with delayed_work in sec_gc
  2018-05-17  9:10   ` Sebastien Buisson
@ 2018-05-21  1:22     ` NeilBrown
  2018-05-22 16:05       ` Sebastien Buisson
  2018-05-22 20:50       ` Dilger, Andreas
  0 siblings, 2 replies; 11+ messages in thread
From: NeilBrown @ 2018-05-21  1:22 UTC (permalink / raw)
  To: lustre-devel

On Thu, May 17 2018, Sebastien Buisson wrote:

>> On 16 May 2018 at 00:22, NeilBrown <neilb@suse.com> wrote:
>> 
>> On Tue, May 15 2018, James Simmons wrote:
>> 
>>> Dmitry has backported your upstream commit
>>> 
>>> be3c64da248b1ada2bbcfc3855f07536d8087acb ( ptlrpc: use workqueue for
>>> pinger)
>>> 
>>> and Sebastien tried out the change which can be seen at
>>> https://review.whamcloud.com/#/c/31724. Sebastien is having issues with
>>> unloading osc, mdc, ost or mdt kernel modules. He writes that he activated
>>> Kerberos for his file system, did some IOs, then tried to unmount the
>>> client and the targets. Unfortunately, on some occasions, while
>>> unmounting, the mdc module for instance has extra references (as shown by
>>> 'lsmod'), and cannot be unloaded. After a while, the extra references are
>>> dropped, and the module can be unloaded. It also happens on occasion on
>>> the server side, for the mdt or ost modules. I tried destroy_workqueue but
>>> that made things worse :-( Do you know what is going on?
>> 
>> My guess is that ptlrpc_pinger_main() is looping, so
>> cancel_delayed_work_sync() is waiting an unusually long time for it.
>> This patch might fix the problem - it certainly makes the usage
>> of "this_ping" closer to the original code.
>> 
>> Thanks,
>> NeilBrown
>> 
>> diff --git a/drivers/staging/lustre/lustre/ptlrpc/pinger.c b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>> index 89eef8ec7df4..131b201541e8 100644
>> --- a/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>> +++ b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>> @@ -222,12 +222,13 @@ static DECLARE_DELAYED_WORK(ping_work, ptlrpc_pinger_main);
>> 
>> static void ptlrpc_pinger_main(struct work_struct *ws)
>> {
>> -	unsigned long this_ping = jiffies;
>> 	long time_to_next_wake;
>> 	struct timeout_item *item;
>> 	struct obd_import *imp;
>> 
>> 	do {
>> +		unsigned long this_ping = jiffies;
>> +
>> 		mutex_lock(&pinger_mutex);
>> 		list_for_each_entry(item, &timeout_list, ti_chain) {
>> 			item->ti_cb(item, item->ti_cb_data);
>
> Still same problem when testing "ptlrpc: use delayed_work in sec_gc" with patch "ptlrpc: use workqueue for pinger" and modification as suggested above. Kernel module ptlrpc_gss holds extra references, and cannot be unloaded.
>
> However, patch "ptlrpc: use workqueue for pinger" alone does not have the issue.

Thanks for the clear explanation - helps a lot.  I see the problem - I think.

When (e.g.) ctx_release_kr() is called as part of shutting down a security
context, it calls sptlrpc_gc_add_ctx() which signals the gc-thread to
re-run.
None of this code is in mainline (why isn't the GSS code in
mainline????)
So I removed the SVC_SIGNAL flag as it was never being set.
So I didn't add matching functionality when porting to delayed-work.

To port  "ptlrpc: use delayed_work in sec_gc" to upstream lustre you
need to get sptlrpc_gc_add_ctx() to call
  	mod_delayed_work(system_wq, &sec_gc_work, 0);

NeilBrown


* [lustre-devel] Regression with delayed_work in sec_gc
  2018-05-21  1:22     ` NeilBrown
@ 2018-05-22 16:05       ` Sebastien Buisson
  2018-05-22 20:50       ` Dilger, Andreas
  1 sibling, 0 replies; 11+ messages in thread
From: Sebastien Buisson @ 2018-05-22 16:05 UTC (permalink / raw)
  To: lustre-devel


> On 21 May 2018 at 03:22, NeilBrown <neilb@suse.com> wrote:
> 
> On Thu, May 17 2018, Sebastien Buisson wrote:
> 
>>> On 16 May 2018 at 00:22, NeilBrown <neilb@suse.com> wrote:
>>> 
>>> On Tue, May 15 2018, James Simmons wrote:
>>> 
>>>> Dmitry has backported your upstream commit
>>>> 
>>>> be3c64da248b1ada2bbcfc3855f07536d8087acb ( ptlrpc: use workqueue for
>>>> pinger)
>>>> 
>>>> and Sebastien tried out the change which can be seen at
>>>> https://review.whamcloud.com/#/c/31724. Sebastien is having issues with
>>>> unloading osc, mdc, ost or mdt kernel modules. He writes that he activated
>>>> Kerberos for his file system, did some IOs, then tried to unmount the
>>>> client and the targets. Unfortunately, on some occasions, while
>>>> unmounting, the mdc module for instance has extra references (as shown by
>>>> 'lsmod'), and cannot be unloaded. After a while, the extra references are
>>>> dropped, and the module can be unloaded. It also happens on occasion on
>>>> the server side, for the mdt or ost modules. I tried destroy_workqueue but
>>>> that made things worse :-( Do you know what is going on?
>>> 
>>> My guess is that ptlrpc_pinger_main() is looping, so
>>> cancel_delayed_work_sync() is waiting an unusually long time for it.
>>> This patch might fix the problem - it certainly makes the usage
>>> of "this_ping" closer to the original code.
>>> 
>>> Thanks,
>>> NeilBrown
>>> 
>>> diff --git a/drivers/staging/lustre/lustre/ptlrpc/pinger.c b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>>> index 89eef8ec7df4..131b201541e8 100644
>>> --- a/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>>> +++ b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>>> @@ -222,12 +222,13 @@ static DECLARE_DELAYED_WORK(ping_work, ptlrpc_pinger_main);
>>> 
>>> static void ptlrpc_pinger_main(struct work_struct *ws)
>>> {
>>> -	unsigned long this_ping = jiffies;
>>> 	long time_to_next_wake;
>>> 	struct timeout_item *item;
>>> 	struct obd_import *imp;
>>> 
>>> 	do {
>>> +		unsigned long this_ping = jiffies;
>>> +
>>> 		mutex_lock(&pinger_mutex);
>>> 		list_for_each_entry(item, &timeout_list, ti_chain) {
>>> 			item->ti_cb(item, item->ti_cb_data);
>> 
>> Still same problem when testing "ptlrpc: use delayed_work in sec_gc" with patch "ptlrpc: use workqueue for pinger" and modification as suggested above. Kernel module ptlrpc_gss holds extra references, and cannot be unloaded.
>> 
>> However, patch "ptlrpc: use workqueue for pinger" alone does not have the issue.
> 
> Thanks for the clear explanation - helps a lot.  I see the problem - I think.
> 
> When (e.g.) ctx_release_kr() is called as part of shutting down a security
> context, it calls sptlrpc_gc_add_ctx() which signals the gc-thread to
> re-run.
> None of this code is in mainline (why isn't the GSS code in
> mainline????)
> So I removed the SVC_SIGNAL flag as it was never being set.
> So I didn't add matching functionality when porting to delayed-work.
> 
> To port  "ptlrpc: use delayed_work in sec_gc" to upstream lustre you
> need to get sptlrpc_gc_add_ctx() to call
>  	mod_delayed_work(system_wq, &sec_gc_work, 0);
> 
> NeilBrown

Excellent, no more pending extra references on kernel modules when calling mod_delayed_work() from sptlrpc_gc_add_ctx(), thanks!

I have updated patch at https://review.whamcloud.com/31724 to include this modification.

Cheers,
Sebastien.


* [lustre-devel] Regression with delayed_work in sec_gc
  2018-05-21  1:22     ` NeilBrown
  2018-05-22 16:05       ` Sebastien Buisson
@ 2018-05-22 20:50       ` Dilger, Andreas
  2018-05-22 22:08         ` NeilBrown
  1 sibling, 1 reply; 11+ messages in thread
From: Dilger, Andreas @ 2018-05-22 20:50 UTC (permalink / raw)
  To: lustre-devel

On May 20, 2018, at 19:22, NeilBrown <neilb@suse.com> wrote:
> 
> On Thu, May 17 2018, Sebastien Buisson wrote:
> 
>>> On 16 May 2018 at 00:22, NeilBrown <neilb@suse.com> wrote:
>>> 
>>> On Tue, May 15 2018, James Simmons wrote:
>>> 
>>>> Dmitry has backported your upstream commit
>>>> 
>>>> be3c64da248b1ada2bbcfc3855f07536d8087acb ( ptlrpc: use workqueue for
>>>> pinger)
>>>> 
>>>> and Sebastien tried out the change which can be seen at
>>>> https://review.whamcloud.com/#/c/31724. Sebastien is having issues with
>>>> unloading osc, mdc, ost or mdt kernel modules. He writes that he activated
>>>> Kerberos for his file system, did some IOs, then tried to unmount the
>>>> client and the targets. Unfortunately, on some occasions, while
>>>> unmounting, the mdc module for instance has extra references (as shown by
>>>> 'lsmod'), and cannot be unloaded. After a while, the extra references are
>>>> dropped, and the module can be unloaded. It also happens on occasion on
>>>> the server side, for the mdt or ost modules. I tried destroy_workqueue but
>>>> that made things worse :-( Do you know what is going on?
>>> 
>>> My guess is that ptlrpc_pinger_main() is looping, so
>>> cancel_delayed_work_sync() is waiting an unusually long time for it.
>>> This patch might fix the problem - it certainly makes the usage
>>> of "this_ping" closer to the original code.
>>> 
>>> Thanks,
>>> NeilBrown
>>> 
>>> diff --git a/drivers/staging/lustre/lustre/ptlrpc/pinger.c b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>>> index 89eef8ec7df4..131b201541e8 100644
>>> --- a/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>>> +++ b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>>> @@ -222,12 +222,13 @@ static DECLARE_DELAYED_WORK(ping_work, ptlrpc_pinger_main);
>>> 
>>> static void ptlrpc_pinger_main(struct work_struct *ws)
>>> {
>>> -	unsigned long this_ping = jiffies;
>>> 	long time_to_next_wake;
>>> 	struct timeout_item *item;
>>> 	struct obd_import *imp;
>>> 
>>> 	do {
>>> +		unsigned long this_ping = jiffies;
>>> +
>>> 		mutex_lock(&pinger_mutex);
>>> 		list_for_each_entry(item, &timeout_list, ti_chain) {
>>> 			item->ti_cb(item, item->ti_cb_data);
>> 
>> Still same problem when testing "ptlrpc: use delayed_work in sec_gc" with patch "ptlrpc: use workqueue for pinger" and modification as suggested above. Kernel module ptlrpc_gss holds extra references, and cannot be unloaded.
>> 
>> However, patch "ptlrpc: use workqueue for pinger" alone does not have the issue.
> 
> Thanks for the clear explanation - helps a lot.  I see the problem - I think.
> 
> When (e.g.) ctx_release_kr() is called as part of shutting down a security
> context, it calls sptlrpc_gc_add_ctx() which signals the gc-thread to
> re-run.
> None of this code is in mainline (why isn't the GSS code in
> mainline????)

We are in the unenviable position that we can't really land new Lustre
features into the staging tree, but we can't get out of the staging tree
without a lot of code changes.

It isn't clear to me why Lustre has such a high bar to cross to get out
of staging, compared to other filesystems that were added in the past.
It's not like Lustre is going away any time soon, unlike some no-name
ethernet card driver, and we could still continue to improve the Lustre
code if we could move out of staging, but it could be done in a more
sustainable manner where we also land feature patches and try to bring
the kernel and master code closer together over time.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation


* [lustre-devel] Regression with delayed_work in sec_gc
  2018-05-22 20:50       ` Dilger, Andreas
@ 2018-05-22 22:08         ` NeilBrown
  2018-05-22 23:01           ` Patrick Farrell
  2018-05-23 22:36           ` James Simmons
  0 siblings, 2 replies; 11+ messages in thread
From: NeilBrown @ 2018-05-22 22:08 UTC (permalink / raw)
  To: lustre-devel

On Tue, May 22 2018, Dilger, Andreas wrote:

> On May 20, 2018, at 19:22, NeilBrown <neilb@suse.com> wrote:
>> 
>> On Thu, May 17 2018, Sebastien Buisson wrote:
>> 
>>>> On 16 May 2018 at 00:22, NeilBrown <neilb@suse.com> wrote:
>>>> 
>>>> On Tue, May 15 2018, James Simmons wrote:
>>>> 
>>>>> Dmitry has backported your upstream commit
>>>>> 
>>>>> be3c64da248b1ada2bbcfc3855f07536d8087acb ( ptlrpc: use workqueue for
>>>>> pinger)
>>>>> 
>>>>> and Sebastien tried out the change which can be seen at
>>>>> https://review.whamcloud.com/#/c/31724. Sebastien is having issues with
>>>>> unloading osc, mdc, ost or mdt kernel modules. He writes that he activated
>>>>> Kerberos for his file system, did some IOs, then tried to unmount the
>>>>> client and the targets. Unfortunately, on some occasions, while
>>>>> unmounting, the mdc module for instance has extra references (as shown by
>>>>> 'lsmod'), and cannot be unloaded. After a while, the extra references are
>>>>> dropped, and the module can be unloaded. It also happens on occasion on
>>>>> the server side, for the mdt or ost modules. I tried destroy_workqueue but
>>>>> that made things worse :-( Do you know what is going on?
>>>> 
>>>> My guess is that ptlrpc_pinger_main() is looping, so
>>>> cancel_delayed_work_sync() is waiting an unusually long time for it.
>>>> This patch might fix the problem - it certainly makes the usage
>>>> of "this_ping" closer to the original code.
>>>> 
>>>> Thanks,
>>>> NeilBrown
>>>> 
>>>> diff --git a/drivers/staging/lustre/lustre/ptlrpc/pinger.c b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>>>> index 89eef8ec7df4..131b201541e8 100644
>>>> --- a/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>>>> +++ b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>>>> @@ -222,12 +222,13 @@ static DECLARE_DELAYED_WORK(ping_work, ptlrpc_pinger_main);
>>>> 
>>>> static void ptlrpc_pinger_main(struct work_struct *ws)
>>>> {
>>>> -	unsigned long this_ping = jiffies;
>>>> 	long time_to_next_wake;
>>>> 	struct timeout_item *item;
>>>> 	struct obd_import *imp;
>>>> 
>>>> 	do {
>>>> +		unsigned long this_ping = jiffies;
>>>> +
>>>> 		mutex_lock(&pinger_mutex);
>>>> 		list_for_each_entry(item, &timeout_list, ti_chain) {
>>>> 			item->ti_cb(item, item->ti_cb_data);
>>> 
>>> Still same problem when testing "ptlrpc: use delayed_work in sec_gc" with patch "ptlrpc: use workqueue for pinger" and modification as suggested above. Kernel module ptlrpc_gss holds extra references, and cannot be unloaded.
>>> 
>>> However, patch "ptlrpc: use workqueue for pinger" alone does not have the issue.
>> 
>> Thanks for the clear explanation - helps a lot.  I see the problem - I think.
>> 
>> When (e.g.) ctx_release_kr() is called as part of shutting down a security
>> context, it calls sptlrpc_gc_add_ctx() which signals the gc-thread to
>> re-run.
>> None of this code is in mainline (why isn't the GSS code in
>> mainline????)
>
> We are in the unenviable position that we can't really land new Lustre
> features into the staging tree, but we can't get out of the staging tree
> without a lot of code changes.

I don't see this as a problem.  "code changes" and "new features" are
largely independent.  There certainly are grey areas, but then the rules
aren't inflexible.  If something feature-like is needed to achieve clean
code, I doubt anyone would object.

I also don't see how your observation explains the lack of gss code in
the staging tree.
I've found
Commit: b78560200d41 ("staging: lustre: ptlrpc: gss: delete unused code")
and am wondering why all the sec*.c code didn't disappear at the same
time.
I guess I'll put 'gss' on my list of things to "fix".

>
> It isn't clear to me why Lustre has such a high bar to cross to get out
> of staging, compared to other filesystems that were added in the past.

I don't see the bar as being all that high, but then I haven't seen any
battles that you might have already fought that seem to show a high bar.

In my mind there are two issues: code quality and maintainership
quality.
Lustre was (is) out-of-tree for a long time with little or no apparent
interest in going mainline, so that results in a fairly low score for
maintainership.  Maintainership means working in and with the community.
While lustre was out-of-tree it developed a lot of useful functionality
which was also independently (and differently) developed in mainline
linux.  This makes the code look worse than it is - because it seems to
be unnecessarily different.
Lustre also previously supported multiple kernels, so it had (has)
abstraction layers which are now just technical debt.
So it is coming from a fairly low starting point.  I think it is
reasonable for the Linux community to want to be able to see lustre clearly
rising above all that past, clearing the debt with interest (you might
say).
This is all quite achievable - it just takes time and effort (and skill
- I think we have that among us).
I've been given time by SUSE to work on this and I can clear a lot of
the technical debt.  The more we work together, the easier it will be.
James has been a great help and I think momentum is building.
The community side needs key people to stop thinking of "us vs them" and
start thinking of lustre as part of the Linux community.  Don't talk
about "upstream has accepted this" or "upstream have rejected that", but
get on board with "we've landed this upstream" or "we are working to
find the best solution for that".
It might help to start using the mailing list more, rather than
depending on whamcloud.  The work-flow is different, but is quite
workable, and shows a willingness to be part of the broader community.


> It's not like Lustre is going away any time soon, unlike some no-name
> ethernet card driver, and we could still continue to improve the Lustre
> code if we could move out of staging, but it could be done in a more
> sustainable manner where we also land feature patches and try to bring
> the kernel and master code closer together over time.

I think we should do one thing at a time.
Firstly, focus on getting out of staging.  Freeze the current feature
set and get all existing features into an acceptable form.
Secondly, add all the fixes and features from master-code into Linux.
If you also want to move fixes from Linux into master, that's up to you.

Thanks,
NeilBrown


* [lustre-devel] Regression with delayed_work in sec_gc
  2018-05-22 22:08         ` NeilBrown
@ 2018-05-22 23:01           ` Patrick Farrell
  2018-05-22 23:19             ` Jones, Peter A
  2018-05-23  0:57             ` NeilBrown
  2018-05-23 22:36           ` James Simmons
  1 sibling, 2 replies; 11+ messages in thread
From: Patrick Farrell @ 2018-05-22 23:01 UTC (permalink / raw)
  To: lustre-devel

Neil, Andreas,

A few thoughts I'd be curious to see one or both of you speak to.

1. Supporting multiple kernels has historically been important because Lustre is integrated into various different HPC setups and they value being able to move distribution versions and Lustre versions independently.  Cray, for example, is less interested in upstream than we might be in part BECAUSE we use SLES as the base for our HPC distro and have a need for very recent versions of Lustre.  Long term, how would we square the desire to use enterprise distros and new versions of Lustre?

2. The Lustre server code remains out of tree.  Until it's in, it's difficult for me to see development getting fully invested in upstream, since a key component - which is normally distributed as a unified whole, at least for source - is out of tree.  Thoughts on that?

3. This is perhaps not so much a question for either of you, but it's one I'm curious about anyway.  What's the route out of staging and what happens after?  Provided we keep cleanliness high and duplication of existing functionality minimal, will the community accept a ton of feature drops?  If we have to dribble in existing features slowly, people will remain reluctant to use the upstream version for as long as that goes on.

Perhaps Greg's suggestion of going temporarily out of tree, cleaning up, and coming back with everything would be better?  That's radical, I know, but...

- Patrick
________________________________
From: lustre-devel <lustre-devel-bounces@lists.lustre.org> on behalf of NeilBrown <neilb@suse.com>
Sent: Tuesday, May 22, 2018 5:08 PM
To: Dilger, Andreas
Cc: Lustre Developement
Subject: Re: [lustre-devel] Regression with delayed_work in sec_gc

On Tue, May 22 2018, Dilger, Andreas wrote:

> On May 20, 2018, at 19:22, NeilBrown <neilb@suse.com> wrote:
>>
>> On Thu, May 17 2018, Sebastien Buisson wrote:
>>
>>>> On 16 May 2018 at 00:22, NeilBrown <neilb@suse.com> wrote:
>>>>
>>>> On Tue, May 15 2018, James Simmons wrote:
>>>>
>>>>> Dmitry has backported your upstream commit
>>>>>
>>>>> be3c64da248b1ada2bbcfc3855f07536d8087acb ( ptlrpc: use workqueue for
>>>>> pinger)
>>>>>
>>>>> and Sebastien tried out the change which can be seen at
>>>>> https://review.whamcloud.com/#/c/31724. Sebastien is having issues with
>>>>> unloading osc, mdc, ost or mdt kernel modules. He writes that he activated
>>>>> Kerberos for his file system, did some IOs, then tried to unmount the
>>>>> client and the targets. Unfortunately, on some occasions, while
>>>>> unmounting, the mdc module for instance has extra references (as shown by
>>>>> 'lsmod'), and cannot be unloaded. After a while, the extra references are
>>>>> dropped, and the module can be unloaded. It also happens on occasion on
>>>>> the server side, for the mdt or ost modules. I tried destroy_workqueue but
>>>>> that made things worse :-( Do you know what is going on?
>>>>
>>>> My guess is that ptlrpc_pinger_main() is looping, so
>>>> cancel_delayed_work_sync() is waiting an unusually long time for it.
>>>> This patch might fix the problem - it certainly makes the usage
>>>> of "this_ping" closer to the original code.
>>>>
>>>> Thanks,
>>>> NeilBrown
>>>>
>>>> diff --git a/drivers/staging/lustre/lustre/ptlrpc/pinger.c b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>>>> index 89eef8ec7df4..131b201541e8 100644
>>>> --- a/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>>>> +++ b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>>>> @@ -222,12 +222,13 @@ static DECLARE_DELAYED_WORK(ping_work, ptlrpc_pinger_main);
>>>>
>>>> static void ptlrpc_pinger_main(struct work_struct *ws)
>>>> {
>>>> -	unsigned long this_ping = jiffies;
>>>> 	long time_to_next_wake;
>>>> 	struct timeout_item *item;
>>>> 	struct obd_import *imp;
>>>>
>>>> 	do {
>>>> +		unsigned long this_ping = jiffies;
>>>> +
>>>> 		mutex_lock(&pinger_mutex);
>>>> 		list_for_each_entry(item, &timeout_list, ti_chain) {
>>>> 			item->ti_cb(item, item->ti_cb_data);
>>>
>>> Still same problem when testing "ptlrpc: use delayed_work in sec_gc" with patch "ptlrpc: use workqueue for pinger" and modification as suggested above. Kernel module ptlrpc_gss holds extra references, and cannot be unloaded.
>>>
>>> However, patch "ptlrpc: use workqueue for pinger" alone does not have the issue.
>>
>> Thanks for the clear explanation - helps a lot.  I see the problem - I think.
>>
>> When (e.g.) ctx_release_kr() is called as part of shutting down a security
>> context, it calls sptlrpc_gc_add_ctx() which signals the gc-thread to
>> re-run.
>> None of this code is in mainline (why isn't the GSS code in
>> mainline????)
>
> We are in the unenviable position that we can't really land new Lustre
> features into the staging tree, but we can't get out of the staging tree
> without a lot of code changes.

I don't see this as a problem.  "code changes" and "new features" are
largely independent.  There certainly are grey areas, but then the rules
aren't inflexible.  If something feature-like is needed to achieve clean
code, I doubt anyone would object.

I also don't see how your observation explains the lack of gss code in
the staging tree.
I've found
Commit: b78560200d41 ("staging: lustre: ptlrpc: gss: delete unused code")
and am wondering why all the sec*.c code didn't disappear at the same
time.
I guess I'll put 'gss' on my list of things to "fix".

>
> It isn't clear to me why Lustre has such a high bar to cross to get out
> of staging, compared to other filesystems that were added in the past.

I don't see the bar as being all that high, but then I haven't seen any
battles that you might have already fought that seem to show a high bar.

In my mind there are two issues: code quality and maintainership
quality.
Lustre was (is) out-of-tree for a long time with little or no apparent
interest in going mainline, so that results in a fairly low score for
maintainership.  Maintainership means working in and with the community.
While lustre was out-of-tree it developed a lot of useful functionality
which was also independently (and differently) developed in mainline
linux.  This makes the code look worse than it is - because it seems to
be unnecessarily different.
Lustre also previously supported multiple kernels, so it had (has)
abstraction layers which are now just technical debt.
So it is coming from a fairly low starting point.  I think it is
reasonable for the Linux community to want to be able to see lustre clearly
rising above all that past, clearing the debt with interest (you might
say).
This is all quite achievable - it just takes time and effort (and skill
- I think we have that among us).
I've been given time by SUSE to work on this and I can clear a lot of
the technical debt.  The more we work together, the easier it will be.
James has been a great help and I think momentum is building.
The community side needs key people to stop thinking of "us vs them" and
start thinking of lustre as part of the Linux community.  Don't talk
about "upstream has accepted this" or "upstream have rejected that", but
get on board with "we've landed this upstream" or "we are working to
find the best solution for that".
It might help to start using the mailing list more, rather than
depending on whamcloud.  The work-flow is different, but is quite
workable, and shows a willingness to be part of the broader community.


> It's not like Lustre is going away any time soon, unlike some no-name
> ethernet card driver, and we could still continue to improve the Lustre
> code if we could move out of staging, but it could be done in a more
> sustainable manner where we also land feature patches and try to bring
> the kernel and master code closer together over time.

I think we should do one thing at a time.
Firstly, focus on getting out of staging.  Freeze the current feature
set and get all existing features into an acceptable form.
Secondly, add all the fixes and features from master-code into Linux.
If you also want to move fixes from Linux into master, that's up to you.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [lustre-devel] Regression with delayed_work in sec_gc
  2018-05-22 23:01           ` Patrick Farrell
@ 2018-05-22 23:19             ` Jones, Peter A
  2018-05-23  0:57             ` NeilBrown
  1 sibling, 0 replies; 11+ messages in thread
From: Jones, Peter A @ 2018-05-22 23:19 UTC (permalink / raw)
  To: lustre-devel

I'm not Andreas or Neil and I cannot talk on behalf of any given Linux distribution but I personally think that it would be better to "jump forward" than "jump back". I think that if we are out of staging that Linux distributions would be more likely to consider keeping the version of the Lustre client in their tree in line with the "latest and greatest" which will in turn lead to the resources applied on the kernel version increasing because there would be more immediate "pay off" from these efforts.




On 2018-05-22, 4:01 PM, "lustre-devel on behalf of Patrick Farrell" <lustre-devel-bounces at lists.lustre.org on behalf of paf@cray.com> wrote:

>Neil, Andreas,
>
>A few thoughts I'd be curious to see one or both of you speak to.
>
>1. Supporting multiple kernels has historically been important because Lustre is integrated into various different HPC setups and they value being able to move distribution versions and Lustre versions independently.  Cray, for example, is less interested in upstream than we might be in part BECAUSE we use SLES as the base for our HPC distro and have a need for very recent versions of Lustre.  Long term, how would we square the desire to use enterprise distros and new versions of Lustre?
>
>2. The Lustre server code remains out of tree.  Until it's in, it's difficult for me to see development getting fully invested in upstream, since a key component - which is normally distributed as a unified whole, at least for source - is out of tree.  Thoughts on that?
>
>3. This is perhaps not so much a question for either of you, but it's one I'm curious about anyway.  What's the route out of staging and what happens after?  Provided we keep cleanliness high and duplication of existing functionality minimal, will the community accept a ton of feature drops?  If we have to dribble in existing features slowly, people will remain reluctant to use the upstream version for as long as that goes on.
>
>Perhaps Greg's suggestion of going temporarily out of tree, cleaning up, and coming back with everything would be better?  That's radical, I know, but...
>
>- Patrick
>________________________________
>From: lustre-devel <lustre-devel-bounces@lists.lustre.org> on behalf of NeilBrown <neilb@suse.com>
>Sent: Tuesday, May 22, 2018 5:08 PM
>To: Dilger, Andreas
>Cc: Lustre Developement
>Subject: Re: [lustre-devel] Regression with delayed_work in sec_gc
>
>On Tue, May 22 2018, Dilger, Andreas wrote:
>
>> On May 20, 2018, at 19:22, NeilBrown <neilb@suse.com> wrote:
>>>
>>> On Thu, May 17 2018, Sebastien Buisson wrote:
>>>
>>>>> On 16 May 2018 at 00:22, NeilBrown <neilb@suse.com> wrote:
>>>>>
>>>>> On Tue, May 15 2018, James Simmons wrote:
>>>>>
>>>>>> Dmitry has backported your upstream commit
>>>>>>
>>>>>> be3c64da248b1ada2bbcfc3855f07536d8087acb ( ptlrpc: use workqueue for
>>>>>> pinger)
>>>>>>
>>>>>> and Sebastien tried out the change which can be seen at
>>>>>> https://review.whamcloud.com/#/c/31724. Sebastien is having issues with
>>>>>> unloading osc, mdc, ost or mdt kernel modules. He writes that he activated
>>>>>> Kerberos for his file system, did some IOs, then tried to unmount the
>>>>>> client and the  targets. Unfortunately, on some occasions, while
>>>>>> unmounting, the mdc module for instance has extra references (as shown by
>>>>>> 'lsmod'), and cannot be unloaded. After a while, the extra references are
>>>>>> dropped, and the  module can be unloaded. It also happens on occasions on
>>>>>> server side, for the mdt or ost modules. I tried destroy_workqueue but
>>>>>> that made things worse :-( Do you know what is going on?
>>>>>
>>>>> My guess is that ptlrpc_pinger_main() is looping, so
>>>>> cancel_delayed_work_sync() is waiting an unusually long time for it.
>>>>> This patch might fix the problem - it certainly makes the usage
>>>>> of "this_ping" closer to the original code.
>>>>>
>>>>> Thanks,
>>>>> NeilBrown
>>>>>
>>>>> diff --git a/drivers/staging/lustre/lustre/ptlrpc/pinger.c b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>>>>> index 89eef8ec7df4..131b201541e8 100644
>>>>> --- a/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>>>>> +++ b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>>>>> @@ -222,12 +222,13 @@ static DECLARE_DELAYED_WORK(ping_work, ptlrpc_pinger_main);
>>>>>
>>>>> static void ptlrpc_pinger_main(struct work_struct *ws)
>>>>> {
>>>>> -  unsigned long this_ping = jiffies;
>>>>>     long time_to_next_wake;
>>>>>     struct timeout_item *item;
>>>>>     struct obd_import *imp;
>>>>>
>>>>>     do {
>>>>> +          unsigned long this_ping = jiffies;
>>>>> +
>>>>>             mutex_lock(&pinger_mutex);
>>>>>             list_for_each_entry(item, &timeout_list, ti_chain) {
>>>>>                     item->ti_cb(item, item->ti_cb_data);
>>>>
>>>> Still same problem when testing "ptlrpc: use delayed_work in sec_gc" with patch "ptlrpc: use workqueue for pinger" and modification as suggested above. Kernel module ptlrpc_gss holds extra references, and cannot be unloaded.
>>>>
>>>> However, patch "ptlrpc: use workqueue for pinger" alone does not have the issue.
>>>
>>> Thanks for the clear explanation - helps a lot.  I see the problem - I think.
>>>
>>> When (e.g.) ctx_release_kr() is called as part of shutting down a security
>>> context, it calls sptlrpc_gc_add_ctx() which signals the gc-thread to
>>> re-run.
>>> None of this code is in mainline (why isn't the GSS code in
>>> mainline????)
>>
>> We are in the un-enviable position that we can't really land new Lustre
>> features into the staging tree, but we can't get out of the staging tree
>> without a lot of code changes.
>
>I don't see this as a problem.  "code changes" and "new features" are
>largely independent.  There certainly are grey areas, but then the rules
>aren't inflexible.  If something feature-like is needed to achieve clean
>code, I doubt anyone would object.
>
>I also don't see how your observation explains the lack of gss code in
>the staging tree.
>I've found
>Commit: b78560200d41 ("staging: lustre: ptlrpc: gss: delete unused code")
>and am wondering why all the sec*.c code didn't disappear at the same
>time.
>I guess I'll put 'gss' on my list of things to "fix".
>
>>
>> It isn't clear to me why Lustre has such a high bar to cross to get out
>> of staging, compared to other filesystems that were added in the past.
>
>I don't see the bar as being all that high, but then I haven't seen any
>battles that you might have already fought that seem to show a high bar.
>
>In my mind there are two issues: code quality and maintainership
>quality.
>Lustre was (is) out-of-tree for a long time with little or no apparent
>interest in going mainline, so that results in a fairly low score for
>maintainership.  Maintainership means working in and with the community.
>While lustre was out-of-tree it developed a lot of useful functionality
>which was also independently (and differently) developed in mainline
>linux.  This makes the code look worse than it is - because it seems to
>be unnecessarily different.
>Lustre also previously supported multiple kernels, so it had (has)
>abstraction layers which are now just technical debt.
>So it is coming from a fairly low starting point.  I think it is
>reasonable for the Linux community to want to be able to see lustre clearly
>rising above all that past, clearing the debt with interest (you might
>say).
>This is all quite achievable - it just takes time and effort (and skill
>- I think we have that among us).
>I've been given time by SUSE to work on this and I can clear a lot of
>the technical debt.  The more we work together, the easier it will be.
>James has been a great help and I think momentum is building.
>The community side needs key people to stop thinking of "us vs them" and
>start thinking of lustre as part of the Linux community.  Don't talk
>about "upstream has accepted this" or "upstream have rejected that", but
>get on board with "we've landed this upstream" or "we are working to
>find the best solution for that".
>It might help to start using the mailing list more, rather than
>depending on whamcloud.  The work-flow is different, but is quite
>workable, and shows a willingness to be part of the broader community.
>
>
>> It's not like Lustre is going away any time soon, unlike some no-name
>> ethernet card driver, and we could still continue to improve the Lustre
>> code if we could move out of staging, but it could be done in a more
>> sustainable manner where we also land feature patches and try to bring
>> the kernel and master code closer together over time.
>
>I think we should do one thing at a time.
>Firstly, focus on getting out of staging.  Freeze the current feature
>set and get all existing features into an acceptable form.
>Secondly, add all the fixes and features from master-code into Linux.
>If you also want to move fixes from Linux into master, that's up to you.
>
>Thanks,
>NeilBrown
>_______________________________________________
>lustre-devel mailing list
>lustre-devel at lists.lustre.org
>http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [lustre-devel] Regression with delayed_work in sec_gc
  2018-05-22 23:01           ` Patrick Farrell
  2018-05-22 23:19             ` Jones, Peter A
@ 2018-05-23  0:57             ` NeilBrown
  1 sibling, 0 replies; 11+ messages in thread
From: NeilBrown @ 2018-05-23  0:57 UTC (permalink / raw)
  To: lustre-devel

On Tue, May 22 2018, Patrick Farrell wrote:

> Neil, Andreas,
>
> A few thoughts I'd be curious to see one or both of you speak to.
>
> 1. Supporting multiple kernels has historically been important because
> Lustre is integrated into various different HPC setups and they value
> being able to move distribution versions and Lustre versions
> independently.  Cray, for example, is less interested in upstream than
> we might be in part BECAUSE we use SLES as the base for our HPC distro
> and have a need for very recent versions of Lustre.  Long term, how
> would we square the desire to use enterprise distros and new versions
> of Lustre? 
>

We resolve this the same way that every enterprise distro handles every
other important filesystem and every major chunk of the kernel.
We develop upstream, and back-port to our enterprise kernels.  Sometimes
our partners help with the backport, sometimes we just do it ourselves.
For example, SLE12-SP3 currently has about 500 patches to btrfs, many
of which are backports, and a fair few of those were probably developed by
SUSE for SLE12 - submitted upstream and backported to SLE.
NFS and XFS have about 100 patches each.
Backporting well-written patches is not that hard.

This is pretty much what Peter Jones said in his reply.

> 2. The Lustre server code remains out of tree.  Until it's in, it's
> difficult for me to see development getting fully invested in
> upstream, since a key component - which is normally distributed as a
> unified whole, at least for source - is out of tree.  Thoughts on
> that? 

My main thought is that it is crazy that we have the client in Linux but
not the server.  However that ship obviously sailed long ago.  One of my
first priorities after getting lustre-client out of staging will be to
get lustre-server into staging. It makes zero sense to develop them
separately.

>
> 3. This is perhaps not so much a question for either of you, but it's
> one I'm curious about anyway.  What's the route out of staging and
> what happens after?  Provided we keep cleanliness high and duplication
> of existing functionality minimal, will the community accept a ton of
> feature drops?  If we have to dribble in existing features slowly,
> people will remain reluctant to use the upstream version for as long
> as that goes on.

The path out of staging is to remove all the warts we can find, address
all the issues that have been raised in the past, then say "please".
If people don't see things they hate, they'll probably just shrug.
If they do, we will respond to their concerns and have a productive
discussion.  "Established user base" carries a lot of weight with
Linus - not enough to suppress the gag-reflex, but enough to ignore the
whiners.  So if we had well-documented cases of people using the
code that is in Linux, and indications of interest from other
stake holders, that would be an important part of the sell.

Once it is out of staging, we need a small groups of maintainers who
will review patches and forward pull requests directly to Linus.  Linus
tends to be willing to trust people who have demonstrated competence -
until they betray his trust.  I don't know how much he will look at the
patches at first, but over time he is likely to do little more than
glance at them.  He is more likely to respond to valid complaints from
other maintainers, than to poor code from us.  We should minimize the
latter to avoid the former.

I think the first priority would be to go through all the patches in
out-of-tree lustre that landed since it was forked into drivers/staging
and port all those that are relevant.  Nobody outside of the lustre
community will care much what features actually land as long as they
don't introduce horrible APIs or duplicate functionality - and they
might not even notice if they do.  If we simply maintain the same level
of quality as we achieved to get out of staging, there should be no
problem.


>
> Perhaps Greg's suggestion of going temporarily out of tree, cleaning
> up, and coming back with everything would be better?  That's radical,
> I know, but... 

I know I proposed this at one stage too, but I'm currently firmly
against the idea.  The benefits from staying in-tree include:

- protection against bit-rot.  If someone changes an interface that
  lustre uses, they will change lustre along with everything else that
  is in-tree.  If lustre is out-of-tree we have to do that ourselves
  and will have less expertise in the reason for the change, and less
  opportunity to disagree with it if it hurts lustre

- community credibility.  If we have an established history of working
  upstream, people both within and without the community will see that
  and take the in-tree version of lustre more seriously.

- extensive testing by static checkers and other automated tests.
  I know lustre has a jenkins which does good stuff, but the more
  automated testing that happens, the better.

Conversely, I don't see any value in moving out-of-tree.
All the barriers that you might see to in-tree development are there to
make the code better - we benefit from them.  I think it would send
entirely the wrong message to leave.
My concern, when I mentioned it, was that it seemed that no-one was
testing in-tree code: I was hitting serious bugs very easily.
My concerns in that direction have been mostly allayed.  It seems
people really do use and test linux/lustre - I just got lucky with a
particular combination of test options :-)

Thanks,
NeilBrown
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20180523/b9e39d15/attachment.sig>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [lustre-devel] Regression with delayed_work in sec_gc
  2018-05-22 22:08         ` NeilBrown
  2018-05-22 23:01           ` Patrick Farrell
@ 2018-05-23 22:36           ` James Simmons
  1 sibling, 0 replies; 11+ messages in thread
From: James Simmons @ 2018-05-23 22:36 UTC (permalink / raw)
  To: lustre-devel


> >> When (e.g.) ctx_release_kr() is called as part of shutting down a security
> >> context, it calls sptlrpc_gc_add_ctx() which signals the gc-thread to
> >> re-run.
> >> None of this code is in mainline (why isn't the GSS code in
> >> mainline????)
> >
> > We are in the un-enviable position that we can't really land new Lustre
> > features into the staging tree, but we can't get out of the staging tree
> > without a lot of code changes.
> 
> I don't see this as a problem.  "code changes" and "new features" are
> largely independent.  There certainly are grey areas, but then the rules
> aren't inflexible.  If something feature-like is needed to achieve clean
> code, I doubt anyone would object.

To explain why this is brought up, let me clear something up. While Lustre 
is thought of as strictly an Intel product - and yes, the OpenSFS out-of-tree 
branch is managed by them - that is not the case. The best way to think 
of the Intel Lustre developers is as the "features team", so Andreas, as an 
Intel developer, bringing this up makes sense. If you look at things like 
performance improvements, that work generally falls to companies like Cray and 
DDN, who have developers working on that sort of thing. I'm not saying
they don't do features, but when they do it is generally motivated by 
improving performance. Look at slide 12 of the following slides:

http://cdn.opensfs.org/wp-content/uploads/2018/04/Jones-Community_Release_Update_LUG_2018.pdf

You will notice that after Intel the next largest contributor is ORNL,
which is my employer. The 25083 lines of code changed are me updating the
code for the Linux upstream work :-) So in the Intel source tree, after
the features developed by Intel, the next largest body of work is making 
Lustre more Linux-compliant. Well, I do lots of other work to make Lustre 
deployable at my employer and to keep my admins happy :-) Also I'm the
Lustre ARM and PowerPC guy. But that is off topic. Okay, I guess I do lots
of things.

So the slides show Lustre has grown into more of a community effort. The
aim is to make the Intel slice shrink to a reasonable size. The current 
balance is not sustainable in the long run, especially once Lustre enters 
the cloud market, which is many orders of magnitude larger than the HPC market.

> I also don't see how your observation explains the lack of gss code in
> the staging tree.
> I've found
> Commit: b78560200d41 ("staging: lustre: ptlrpc: gss: delete unused code")
> and am wondering why all the sec*.c code didn't disappear at the same
> time.
> I guess I'll put 'gss' on my list of things to "fix".

I can explain why this is the case. For the Lustre version that first 
merged into the staging tree, the GSS code just plain didn't work. At that
time anything that didn't seem to work wasn't fixed but just deleted :-(
This is how, for example, we lost the libcfs watchdog timer used by ptlrpc
threads to get back traces for threads that hung. Sadly, once removed
it's not easy to bring back.
 
BTW the GSS code for Lustre is still a bit wonky, so I would say it's not
yet ready for prime time. I felt it was best to wait until after
leaving staging, since that is a big code drop. Also, as I pointed out at
developers day, there is a lot of code duplication with the sunrpc gss 
code. We need to get together with the sunrpc gss maintainer and work out 
a common API so Lustre and sunrpc could register with this infrastructure.
I just haven't had the cycles to get to this yet. Currently I would say
this is down the list of TODOs.

> > It isn't clear to me why Lustre has such a high bar to cross to get out
> > of staging, compared to other filesystems that were added in the past.
> 
> I don't see the bar as being all that high, but then I haven't seen any
> battles that you might have already fought that seem to show a high bar.
> 
> In my mind there are two issues: code quality and maintainership
> quality.
> Lustre was (is) out-of-tree for a long time with little or no apparent
> interest in going mainline, so that results in a fairly low score for
> maintainership.  Maintainership means working in and with the community.
> While lustre was out-of-tree it developed a lot of useful functionality
> which was also independently (and differently) developed in mainline
> linux.  This makes the code look worse than it is - because it seems to
> be unnecessarily different.
> Lustre also previously supported multiple kernels, so it had (has)
> abstraction layers which are now just technical debt.
> So it is coming from a fairly low starting point.  I think it is
> reasonable for the Linux community to want to be able to see lustre clearly
> rising above all that past, clearing the debt with interest (you might
> say).

That comes from the long history of supporting a userland version, which
was also used for the Cray Catamount OS and potential Windows and Mac clients.
Also, when Sun acquired Lustre it wanted it on its Solaris systems. Often
the solutions Lustre implemented came long before Linux had one. In
any case Lustre is now a Linux product. It's better than what it was. I
have been doing a lot of work to clean up the duplicated efforts and such. 

> This is all quite achievable - it just takes time and effort (and skill
> - I think we have that among us).
> I've been given time by SUSE to work on this and I can clear a lot of
> the technical debt.  The more we work together, the easier it will be.
> James has been a great help and I think momentum is building.
> The community side needs key people to stop thinking of "us vs them" and
> start thinking of lustre as part of the Linux community.  Don't talk
> about "upstream has accepted this" or "upstream have rejected that", but
> get on board with "we've landed this upstream" or "we are working to
> find the best solution for that".
>
> It might help to start using the mailing list more, rather than
> depending on whamcloud.  The work-flow is different, but is quite
> workable, and shows a willingness to be part of the broader community.
> 
> > It's not like Lustre is going away any time soon, unlike some no-name
> > ethernet card driver, and we could still continue to improve the Lustre
> > code if we could move out of staging, but it could be done in a more
> > sustainable manner where we also land feature patches and try to bring
> > the kernel and master code closer together over time.
> 
> I think we should do one thing at a time.
> Firstly, focus on getting out of staging.  Freeze the current feature
> set and get all existing features into an acceptable form.
> Secondly, add all the fixes and features from master-code into Linux.
> If you also want to move fixes from Linux into master, that's up to you.

I have a plan which I will go over in a later email.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-05-23 22:36 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-15 15:17 [lustre-devel] Regression with delayed_work in sec_gc James Simmons
2018-05-15 22:22 ` NeilBrown
2018-05-17  9:10   ` Sebastien Buisson
2018-05-21  1:22     ` NeilBrown
2018-05-22 16:05       ` Sebastien Buisson
2018-05-22 20:50       ` Dilger, Andreas
2018-05-22 22:08         ` NeilBrown
2018-05-22 23:01           ` Patrick Farrell
2018-05-22 23:19             ` Jones, Peter A
2018-05-23  0:57             ` NeilBrown
2018-05-23 22:36           ` James Simmons
