cti-tac.lists.linuxfoundation.org archive mirror
* Hashing out the scope of work
@ 2024-04-24 16:45 Konstantin Ryabitsev
  2024-04-24 21:15 ` Ian Kelling
  2024-04-29 14:09 ` Joseph Myers
  0 siblings, 2 replies; 17+ messages in thread
From: Konstantin Ryabitsev @ 2024-04-24 16:45 UTC (permalink / raw)
  To: cti-tac

Hello, all:

This is the Scope of Work section of the larger Statement of Work document.
Please send any comments, adjustments, additions, or clarifications you would
like to see.

This SOW focuses on the migration primarily, and the tentative time frame is
through mid-2025.

Regards,
Konstantin

---
When choosing solutions and deploying services, the guiding principle is to
remain compliant with the [GNU Ethical Repository Criteria][1], Grade B. This
mandates exclusive use of Free Software and promotes open platforms over
proprietary hosting environments.

1. Mailing lists for decentralized patch-based development

  a. LF will set up mailing list hosting under the lists.coretoolchain.dev
     domain, using the following technology stack:

    - postfix
    - mlmmj
    - public-inbox

  b. For scalability and IP reputation reasons, LF will reuse its current
     mailing list platform already deployed for Linux kernel development
     (subspace.kernel.org).
  c. Individual glibc mailing lists will be created on request and do not
     need to be enumerated in this SOW.
  d. Mailing list archives will be available either via lore.kernel.org, or a
     dedicated coretoolchain.dev domain name, depending on project preferences.

2. Bug tracking software (Bugzilla)

  a. LF will set up a bug tracking system using [Bugzilla Harmony][2]. This
     will be a new, dedicated installation in the bugzilla.coretoolchain.dev
     domain.
  b. LF will work with the CTI leadership to properly configure the software
     for use with the glibc project.
  c. LF will work with the glibc community to offer the tooling to integrate
     git repositories, mailing lists, and the bugtracker (e.g. using the
     [bugspray][3] project).

3. Git repository hosting (Gitolite)

  a. LF will set up the necessary glibc repositories reusing the existing
     gitolite.coretoolchain.dev service.
  b. LF will work with the CTI leadership to migrate existing repositories and
     grant the necessary permissions.
  c. LF will work with the glibc project community to analyze, port, and adapt
     the existing set of [AdaCore][4] post-commit hooks to ensure that they
     are functioning after the migration, or are replaced with suitable
     alternatives if existing implementations cannot be used directly due to
     security considerations or duplicating functionality provided in
     Gitolite.
  d. LF will set up mirroring and replication using [grokmirror][5] to offer
     multiple redundant sites for git repository access.

4. Documentation repository and website (Sphinx)

  a. LF will work with the glibc community to help migrate the wiki site from
     [MoinMoin Wiki][6] to a set of restructured-text documents (e.g. using
     [pandoc][7]).
  b. LF will provide a publishing framework to automatically build and deploy
     documentation to the dedicated docs website (exact domain name to be
     established).

5. Static website (Sphinx)

  a. LF will provide hosting for a separate static website for glibc (exact
     domain name to be established).
  b. LF will provide a mechanism to publish release tarballs and offer
     downloads via the mirrors.kernel.org system of worldwide mirrors (already
     provided via mirrors.kernel.org/gnu/glibc/).

6. Patch tracking services (Patchwork)

  a. LF will deploy the latest version of Patchwork patch tracking software
     under patchwork.coretoolchain.dev.
  b. LF will set up automation and integration services between mailing lists,
     patchwork, and git repositories (using [git-patchwork-bot][8]).
  c. LF will work with the glibc community to set up projects and access as
     appropriate.

7. Video conferencing (BigBlueButton)

  a. LF will provide video conferencing to the project using [BigBlueButton][9]
     (BBB), either with a 3rd-party provider or as a fully self-hosted service.
  b. LF will work with the glibc project to set up conference rooms and access
     as required.

[1]: https://www.gnu.org/software/repo-criteria.en.html
[2]: https://github.com/bugzilla/harmony
[3]: https://git.kernel.org/pub/scm/utils/bugspray/bugspray.git
[4]: https://github.com/AdaCore/git-hooks
[5]: https://git.kernel.org/pub/scm/utils/grokmirror/grokmirror.git
[6]: https://sourceware.org/glibc/wiki/HomePage
[7]: https://pandoc.org/
[8]: https://git.kernel.org/pub/scm/linux/kernel/git/mricon/korg-helpers.git/tree/git-patchwork-bot.py


* Re: Hashing out the scope of work
  2024-04-24 16:45 Hashing out the scope of work Konstantin Ryabitsev
@ 2024-04-24 21:15 ` Ian Kelling
  2024-04-25  1:31   ` Konstantin Ryabitsev
  2024-04-29 14:09 ` Joseph Myers
  1 sibling, 1 reply; 17+ messages in thread
From: Ian Kelling @ 2024-04-24 21:15 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: cti-tac


Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:

> 7. Video conferencing (BigBlueButton)
>
>   a. LF will provide video conferencing to the project using [BigBlueButton][9]
>      (BBB), either with a 3rd-party provider or as a fully self-hosted service.
>   b. LF will work with the glibc project to set up conference rooms and access
>      as required.

Regarding BBB:

Looks like you forgot to include the [9] link.

BBB versions of the last few years have a dependency on nonfree MongoDB
versions. I think the LF BBB that CTI has been using is running nonfree
MongoDB. The upstream BBB developers announced they are removing
MongoDB, but that work does not have a set timeline; I hope it will be
done within a year. At FSF, we are migrating to Galène,
https://galene.org/ , which has fewer features but also takes fewer
resources and is much less sysadmin work.

-- 
Ian Kelling | Senior Systems Administrator, Free Software Foundation
GPG Key: B125 F60B 7B28 7FF6 A2B7  DF8F 170A F0E2 9542 95DF
https://fsf.org | https://gnu.org


* Re: Hashing out the scope of work
  2024-04-24 21:15 ` Ian Kelling
@ 2024-04-25  1:31   ` Konstantin Ryabitsev
  2024-04-25 20:06     ` Ian Kelling
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantin Ryabitsev @ 2024-04-25  1:31 UTC (permalink / raw)
  To: Ian Kelling; +Cc: cti-tac

On Wed, Apr 24, 2024 at 05:15:33PM -0400, Ian Kelling wrote:
> > 7. Video conferencing (BigBlueButton)
> >
> >   a. LF will provide video conferencing to the project using [BigBlueButton][9]
> >      (BBB), either with a 3rd-party provider or as a fully self-hosted service.
> >   b. LF will work with the glibc project to set up conference rooms and access
> >      as required.
> 
> Regarding BBB:
> 
> Looks like you forgot to include the [9] link.
> 
> BBB versions of the last few years have a dependency on nonfree MongoDB
> versions. I think the LF BBB that CTI has been using is running nonfree
> MongoDB. The upstream BBB developers announced they are removing
> MongoDB, but that work does not have a set timeline; I hope it will be
> done within a year. At FSF, we are migrating to Galène,
> https://galene.org/ , which has fewer features but also takes fewer
> resources and is much less sysadmin work.

Okay, thank you for bringing that up. Another video conferencing system 
which we've previously deployed is Jitsi. Are there any similar 
objections to that platform?

-K


* Re: Hashing out the scope of work
  2024-04-25  1:31   ` Konstantin Ryabitsev
@ 2024-04-25 20:06     ` Ian Kelling
  2024-04-26 16:06       ` Siddhesh Poyarekar
  0 siblings, 1 reply; 17+ messages in thread
From: Ian Kelling @ 2024-04-25 20:06 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: cti-tac


Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:

> On Wed, Apr 24, 2024 at 05:15:33PM -0400, Ian Kelling wrote:
>> > 7. Video conferencing (BigBlueButton)
>> >
>> >   a. LF will provide video conferencing to the project using [BigBlueButton][9]
>> >      (BBB), either with a 3rd-party provider or as a fully self-hosted service.
>> >   b. LF will work with the glibc project to set up conference rooms and access
>> >      as required.
>> 
>> Regarding BBB:
>> 
>> Looks like you forgot to include the [9] link.
>> 
>> BBB versions of the last few years have a dependency on nonfree MongoDB
>> versions. I think the LF BBB that CTI has been using is running nonfree
>> MongoDB. The upstream BBB developers announced they are removing
>> MongoDB, but that work does not have a set timeline; I hope it will be
>> done within a year. At FSF, we are migrating to Galène,
>> https://galene.org/ , which has fewer features but also takes fewer
>> resources and is much less sysadmin work.
>
> Okay, thank you for bringing that up. Another video conferencing system 
> which we've previously deployed is Jitsi. Are there any similar 
> objections to that platform?
>
> -K

Jitsi should be ok. At FSF, we've had some occasional usability problems
with Jitsi, but no software freedom problems.

But note, this was not an objection, it was pointing out a fundamental
defect that got overlooked.

Consider whose responsibility it is to check the freedom status of
software deployed by LF IT for CTI.

A promise to use free software is one thing; actually living up to it
needs procedures, and those procedures either failed or are not in
place.

-- 
Ian Kelling | Senior Systems Administrator, Free Software Foundation
GPG Key: B125 F60B 7B28 7FF6 A2B7  DF8F 170A F0E2 9542 95DF
https://fsf.org | https://gnu.org


* Re: Hashing out the scope of work
  2024-04-25 20:06     ` Ian Kelling
@ 2024-04-26 16:06       ` Siddhesh Poyarekar
  0 siblings, 0 replies; 17+ messages in thread
From: Siddhesh Poyarekar @ 2024-04-26 16:06 UTC (permalink / raw)
  To: Ian Kelling, Konstantin Ryabitsev; +Cc: cti-tac

On 2024-04-25 16:06, Ian Kelling wrote:
>> Okay, thank you for bringing that up. Another video conferencing system
>> which we've previously deployed is Jitsi. Are there any similar
>> objections to that platform?
>>
>> -K
> 
> Jitsi should be ok. At FSF, we've had some occasional usability problems
> with Jitsi, but no software freedom problems.

Ack, to respond to the usability aspect, we struggled with the FSF Jitsi 
instance in the earlier days when we started the glibc patch review 
meetings.  However I remember someone pointing out back then that it's 
quite easy to misconfigure Jitsi, so maybe it's possible to get it 
performing well enough as a workaround (typical GNU toolchain calls 
don't exceed 10-15 participants at the moment, maybe slightly more for 
the office hour) until the BBB non-free dependency issues are resolved.

Thanks,
Sid


* Re: Hashing out the scope of work
  2024-04-24 16:45 Hashing out the scope of work Konstantin Ryabitsev
  2024-04-24 21:15 ` Ian Kelling
@ 2024-04-29 14:09 ` Joseph Myers
  2024-04-29 15:23   ` Konstantin Ryabitsev
  1 sibling, 1 reply; 17+ messages in thread
From: Joseph Myers @ 2024-04-29 14:09 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: cti-tac

On Wed, 24 Apr 2024, Konstantin Ryabitsev wrote:

> 2. Bug tracking software (Bugzilla)
> 
>   a. LF will set up a bug tracking system using [Bugzilla Harmony][2]. This
>      will be a new, dedicated installation in the bugzilla.coretoolchain.dev
>      domain.
>   b. LF will work with the CTI leadership to properly configure the software
>      for use with the glibc project.

There will be lots of review needed of the existing Bugzilla configuration 
and local changes.  Some of those changes may simply be to have more 
appropriate contents on the bug submission form / other HTML templates, 
some may involve actual logic changes.  Hopefully the new upstream version 
is suitably configurable without needing too many local changes.

We definitely need the various local fields (and, as applicable, sets of 
field values) from the existing Bugzilla.  (If anything, glibc is simpler 
than some cases.  GCC, binutils, GDB need all of the host/target/build 
fields.  glibc isn't itself a compiler so normally only one system, 
strictly "host" for glibc, is relevant, though there *can* be build issues 
where the build system is relevant; "target" doesn't properly apply at 
all.)

We need some form of (fully free software) spam protection, whether the 
existing scheme where new accounts need to be manually requested and 
approved, or any other sufficiently reliable spam protection scheme 
available upstream.

The REST API needs to be enabled (I'm assuming this Bugzilla version has 
it); we rely on it to generate lists of fixed bugs, at least.
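
As an illustration of the kind of REST query this implies, here is a minimal
Python sketch that builds a search URL for fixed bugs and formats the reply.
It is not the glibc project's actual tooling. The base URL is the domain
proposed in the SOW, and the choice of `target_milestone` as the release
selector is an assumption about how the fixed-bug list would be scoped.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the new instance lives at the domain proposed in the SOW.
BASE_URL = "https://bugzilla.coretoolchain.dev"


def fixed_bugs_url(base_url, product, milestone):
    """Build a Bugzilla REST search URL for bugs fixed in one milestone."""
    params = {
        "product": product,
        "resolution": "FIXED",
        "target_milestone": milestone,  # assumed release selector
        "include_fields": "id,summary",
    }
    return f"{base_url}/rest/bug?{urlencode(params)}"


def format_fixed_bugs(payload):
    """Turn the decoded JSON reply into sorted '[id] summary' lines."""
    bugs = sorted(payload.get("bugs", []), key=lambda bug: bug["id"])
    return [f"[{bug['id']}] {bug['summary']}" for bug in bugs]


def fetch_fixed_bugs(base_url, product, milestone):
    """Run the query over the network and return the formatted lines."""
    with urlopen(fixed_bugs_url(base_url, product, milestone)) as reply:
        return format_fixed_bugs(json.load(reply))
```

Calling `fetch_fixed_bugs(BASE_URL, "glibc", "2.40")` would only work once the
REST API is enabled on the new instance; the two helper functions can be
exercised offline.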

New bugs should have numbers in ranges well above those used by the 
current Sourceware Bugzilla (e.g. 100000-plus) to avoid near-term 
confusion with the same bug number having different meanings in the two 
databases.

There may be an unresolved question about migrating existing glibc bugs 
(all of them, or at least the open ones) from the Sourceware Bugzilla 
(keeping their numbers); I think such migration is valuable, including for 
closed bugs (then adding a comment to all the old bugs that they are 
frozen and future discussion should take place at the new location), but I 
think there have been suggestions that such migration is hard.  (Bugs for 
other products in Sourceware Bugzilla would not of course be included in 
such a migration, since the new database would be glibc-only.)

> 6. Patch tracking services (Patchwork)
> 
>   a. LF will deploy the latest version of Patchwork patch tracking software
>      under patchwork.coretoolchain.dev.
>   b. LF will set up automation and integration services between mailing lists,
>      patchwork, and git repositories (using [git-patchwork-bot][8]).
>   c. LF will work with the glibc community to set up projects and access as
>      appropriate.

The existing glibc Patchwork database should be transitioned across.

-- 
Joseph S. Myers
josmyers@redhat.com



* Re: Hashing out the scope of work
  2024-04-29 14:09 ` Joseph Myers
@ 2024-04-29 15:23   ` Konstantin Ryabitsev
  2024-04-29 16:35     ` Joseph Myers
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantin Ryabitsev @ 2024-04-29 15:23 UTC (permalink / raw)
  To: Joseph Myers; +Cc: cti-tac

On Mon, Apr 29, 2024 at 02:09:33PM GMT, Joseph Myers wrote:
> On Wed, 24 Apr 2024, Konstantin Ryabitsev wrote:
> 
> > 2. Bug tracking software (Bugzilla)
> > 
> >   a. LF will set up a bug tracking system using [Bugzilla Harmony][2]. This
> >      will be a new, dedicated installation in the bugzilla.coretoolchain.dev
> >      domain.
> >   b. LF will work with the CTI leadership to properly configure the software
> >      for use with the glibc project.
> 
> There will be lots of review needed of the existing Bugzilla configuration 
> and local changes.  Some of those changes may simply be to have more 
> appropriate contents on the bug submission form / other HTML templates, 
> some may involve actual logic changes.  Hopefully the new upstream version 
> is suitably configurable without needing too many local changes.

Definitely, and I would even go further and say that any changes that 
require actual code modifications beyond writing extensions would 
require multiple levels of review before we accept them. In our 
experience, a modified version of bugzilla is a version of bugzilla that 
will never get updated because nobody wants to touch it out of fear of 
breaking something -- so it just ends up unmaintained.

> We need some form of (fully free software) spam protection, whether the 
> existing scheme where new accounts need to be manually requested and 
> approved, or any other sufficiently reliable spam protection scheme 
> available upstream.

There isn't any. \o/
We run bugzilla-junker that goes through all recent comments and lets 
the admin review all URLs for spam content. It's pretty effective at 
keeping our bugzilla instance spam-free.
https://git.kernel.org/pub/scm/linux/kernel/git/mricon/korg-helpers.git/tree/bugzilla-junker.py
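
To make the review step concrete, here is a small Python sketch of the general
idea: collect URLs from recent comments and list the ones an admin should
look at. It is not taken from bugzilla-junker. The comment dictionaries, the
`trusted_hosts` parameter, and the function name are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Loose URL pattern: good enough for a review queue, not a validator.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def urls_for_review(comments, trusted_hosts=()):
    """Return (comment_id, url) pairs whose host is not explicitly trusted.

    `comments` is an iterable of dicts with "id" and "text" keys, a
    hypothetical shape standing in for whatever the bug tracker returns.
    """
    flagged = []
    for comment in comments:
        for url in URL_RE.findall(comment["text"]):
            url = url.rstrip(".,;:)")  # drop trailing sentence punctuation
            host = urlsplit(url).hostname or ""
            if host not in trusted_hosts:
                flagged.append((comment["id"], url))
    return flagged
```

An admin would then decide per URL whether the comment is spam; the sketch
deliberately makes no automatic judgment.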
 
> There may be an unresolved question about migrating existing glibc 
> bugs (all of them, or at least the open ones) from the Sourceware 
> Bugzilla (keeping their numbers); I think such migration is valuable, 
> including for closed bugs (then adding a comment to all the old bugs 
> that they are frozen and future discussion should take place at the 
> new location), but I think there have been suggestions that such 
> migration is hard.  (Bugs for other products in Sourceware Bugzilla 
> would not of course be included in such a migration, since the new 
> database would be glibc-only.)

This would be super hard and may introduce unwanted side-effects. To my 
knowledge, it's impossible to lock a bug -- so you may have a 
split-brain situation where a bug is still updated in the old location, 
but not in the new location.

> > 6. Patch tracking services (Patchwork)
> > 
> >   a. LF will deploy the latest version of Patchwork patch tracking software
> >      under patchwork.coretoolchain.dev.
> >   b. LF will set up automation and integration services between mailing lists,
> >      patchwork, and git repositories (using [git-patchwork-bot][8]).
> >   c. LF will work with the glibc community to set up projects and access as
> >      appropriate.
> 
> The existing glibc Patchwork database should be transitioned across.

Can you give more information why this is desired? This is uncharted 
territory and I'm not sure how much effort this would require.

-K


* Re: Hashing out the scope of work
  2024-04-29 15:23   ` Konstantin Ryabitsev
@ 2024-04-29 16:35     ` Joseph Myers
  2024-04-29 18:52       ` Siddhesh Poyarekar
  0 siblings, 1 reply; 17+ messages in thread
From: Joseph Myers @ 2024-04-29 16:35 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: cti-tac

On Mon, 29 Apr 2024, Konstantin Ryabitsev wrote:

> > We need some form of (fully free software) spam protection, whether the 
> > existing scheme where new accounts need to be manually requested and 
> > approved, or any other sufficiently reliable spam protection scheme 
> > available upstream.
> 
> There isn't any. \o/
> We run bugzilla-junker that goes through all recent comments and lets 
> the admin review all URLs for spam content. It's pretty effective at 
> keeping our bugzilla instance spam-free.
> https://git.kernel.org/pub/scm/linux/kernel/git/mricon/korg-helpers.git/tree/bugzilla-junker.py

Before we restricted account creation, at one point GCC Bugzilla had 
spammers opening new bugs faster than the REST API could mark them as 
spam.  While I think Sourceware passes comments through SpamAssassin, my 
expectation is that we need manually reviewed account creation (i.e. 
people sending emails to create accounts, with enough information to 
convince people they're not a spammer) to avoid very high spam volumes.

> > There may be an unresolved question about migrating existing glibc 
> > bugs (all of them, or at least the open ones) from the Sourceware 
> > Bugzilla (keeping their numbers); I think such migration is valuable, 
> > including for closed bugs (then adding a comment to all the old bugs 
> > that they are frozen and future discussion should take place at the 
> > new location), but I think there have been suggestions that such 
> > migration is hard.  (Bugs for other products in Sourceware Bugzilla 
> > would not of course be included in such a migration, since the new 
> > database would be glibc-only.)
> 
> This would be super hard and may introduce unwanted side-effects. To my 
> knowledge, it's impossible to lock a bug -- so you may have a 
> split-brain situation where a bug is still updated in the old location, 
> but not in the new location.

A local patch on the Sourceware side (to disallow new changes to bugs in 
the glibc product) could surely have the effect of such locking.

By way of examples of other projects that did issue tracker migrations (to 
GitHub Issues), both LLVM and Python migrated existing issues as part of 
those migrations.  Both those were also changing from one issue tracker 
implementation to another; a migration from Bugzilla to Bugzilla surely 
ought to be simpler because of a closer match between the underlying data 
models (even if the Bugzilla versions are different).  While both were 
able to make the old tracker entirely readonly after the move - they 
didn't have the complication of the old tracker mixing bugs for multiple 
projects - I think that's more of a small detail, not something to make a 
conversion excessively hard.

> > > 6. Patch tracking services (Patchwork)
> > > 
> > >   a. LF will deploy the latest version of Patchwork patch tracking software
> > >      under patchwork.coretoolchain.dev.
> > >   b. LF will set up automation and integration services between mailing lists,
> > >      patchwork, and git repositories (using [git-patchwork-bot][8]).
> > >   c. LF will work with the glibc community to set up projects and access as
> > >      appropriate.
> > 
> > The existing glibc Patchwork database should be transitioned across.
> 
> Can you give more information why this is desired? This is uncharted 
> territory and I'm not sure how much effort this would require.

The same reason as anything else.  Patches from before the transition are 
just as much in need of review as those from after the transition, so the 
state of review of those patches should be maintained across the 
transition.

I don't really think of moving patch review state or bug database state 
across as substantially different from moving the existing git repository 
- these are all key parts of the development process and shouldn't have an 
artificial divide based on the details of when hosting changed.  Once the
transition happens, developers should only need to interact with the new
system, both for tracking the state of old or new patches and for updating
the state of old or new bugs.

-- 
Joseph S. Myers
josmyers@redhat.com



* Re: Hashing out the scope of work
  2024-04-29 16:35     ` Joseph Myers
@ 2024-04-29 18:52       ` Siddhesh Poyarekar
  2024-04-29 20:47         ` Joseph Myers
  0 siblings, 1 reply; 17+ messages in thread
From: Siddhesh Poyarekar @ 2024-04-29 18:52 UTC (permalink / raw)
  To: Joseph Myers, Konstantin Ryabitsev; +Cc: cti-tac

On 2024-04-29 12:35, Joseph Myers wrote:
>>>> 6. Patch tracking services (Patchwork)
>>>>
>>>>    a. LF will deploy the latest version of Patchwork patch tracking software
>>>>       under patchwork.coretoolchain.dev.
>>>>    b. LF will set up automation and integration services between mailing lists,
>>>>       patchwork, and git repositories (using [git-patchwork-bot][8]).
>>>>    c. LF will work with the glibc community to set up projects and access as
>>>>       appropriate.
>>>
>>> The existing glibc Patchwork database should be transitioned across.
>>
>> Can you give more information why this is desired? This is uncharted
>> territory and I'm not sure how much effort this would require.
> 
> The same reason as anything else.  Patches from before the transition are
> just as much in need of review as those from after the transition, so the
> state of review of those patches should be maintained across the
> transition.
> 
> I don't really think of moving patch review state or bug database state
> across as substantially different from moving the existing git repository
> - these are all key parts of the development process and shouldn't have an
> artificial divide based on the details of when hosting changed.  Once the
> transition happens, developers should only need to interact with the new
> system, both for tracking the state of old or new patches and for updating
> the state of old or new bugs.

I believe we had explicitly agreed in a previous CTI TAC meeting to 
start over with a clean slate for patchwork and not bother with 
migrating the previous database.

If the active patch backlog is a concern then maybe we could refer to 
both instances for a while in the patch review call and move over 
completely at the end of that period.  I don't remember ever managing to 
actually run through the full backlog (that currently stands at 469 
patches) during any review session.

Sid


* Re: Hashing out the scope of work
  2024-04-29 18:52       ` Siddhesh Poyarekar
@ 2024-04-29 20:47         ` Joseph Myers
  2024-04-29 21:01           ` Konstantin Ryabitsev
  0 siblings, 1 reply; 17+ messages in thread
From: Joseph Myers @ 2024-04-29 20:47 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: Konstantin Ryabitsev, cti-tac

On Mon, 29 Apr 2024, Siddhesh Poyarekar wrote:

> I believe we had explicitly agreed in a previous CTI TAC meeting to start over
> with a clean slate for patchwork and not bother with migrating the previous
> database.

Bug database migration is definitely a lot more important than patchwork 
migration, but I think the starting point is that patchwork state should 
be migrated in the absence of a clear reason that's problematic.  And if 
it's problematic, we should try to understand if there's a better way to 
configure the future patchwork installation to make it more amenable to
any future migrations.

-- 
Joseph S. Myers
josmyers@redhat.com



* Re: Hashing out the scope of work
  2024-04-29 20:47         ` Joseph Myers
@ 2024-04-29 21:01           ` Konstantin Ryabitsev
  2024-04-29 21:58             ` Joseph Myers
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantin Ryabitsev @ 2024-04-29 21:01 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Siddhesh Poyarekar, cti-tac

On Mon, Apr 29, 2024 at 08:47:27PM GMT, Joseph Myers wrote:
> On Mon, 29 Apr 2024, Siddhesh Poyarekar wrote:
> 
> > I believe we had explicitly agreed in a previous CTI TAC meeting to start over
> > with a clean slate for patchwork and not bother with migrating the previous
> > database.
> 
> Bug database migration is definitely a lot more important than patchwork 
> migration, but I think the starting point is that patchwork state should 
> be migrated in the absence of a clear reason that's problematic.  And if 
> it's problematic, we should try to understand if there's a better way to 
> configure the future patchwork installation to make it more 
> amenable to any future migrations.

The difficulty isn't really in the migration by itself, but in the fact 
that we're taking a single project from a larger installation of 
bugzilla or patchwork and attempting to migrate just that project and 
omit everything else. 

For example, for bugzilla we'd need to prepare a query to filter bugs 
and comments by product and component -- to only include those belonging 
to glibc. However, how do we go about filtering users? User records are 
not tied to a specific product or component. We can try to have a large 
query limiting the users to just those accounts who have commented on 
glibc bugs, but this is going to be hairy -- someone could have 
commented on a bug that started out filed under a different 
product/component.
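
The user-filtering query described above could look roughly like the
following Python/SQLite sketch. The table and column names follow the general
shape of the Bugzilla schema (`profiles`, `bugs`, `longdescs`, `products`),
but the schema here is heavily simplified and the demo data is invented. A
real migration would also have to consider CC lists, attachments, and the bug
activity history, which this sketch ignores.

```python
import sqlite3

# Users who reported, were assigned, or commented on bugs of one product.
USERS_FOR_PRODUCT = """
WITH product_bugs AS (
    SELECT b.bug_id, b.reporter, b.assigned_to
    FROM bugs b JOIN products pr ON pr.id = b.product_id
    WHERE pr.name = ?
)
SELECT p.userid, p.login_name
FROM profiles p
WHERE p.userid IN (SELECT reporter FROM product_bugs)
   OR p.userid IN (SELECT assigned_to FROM product_bugs)
   OR p.userid IN (SELECT l.who FROM longdescs l
                   JOIN product_bugs pb ON pb.bug_id = l.bug_id)
ORDER BY p.userid
"""


def build_demo_db():
    """Create an in-memory database with a simplified, assumed schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE profiles (userid INTEGER PRIMARY KEY, login_name TEXT);
        CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, product_id INTEGER,
                           reporter INTEGER, assigned_to INTEGER);
        CREATE TABLE longdescs (comment_id INTEGER PRIMARY KEY,
                                bug_id INTEGER, who INTEGER);
        INSERT INTO products VALUES (1, 'glibc'), (2, 'gdb');
        INSERT INTO profiles VALUES (1, 'alice@example.org'),
                                    (2, 'bob@example.org'),
                                    (3, 'carol@example.org');
        INSERT INTO bugs VALUES (10, 1, 1, 1), (20, 2, 3, 3);
        INSERT INTO longdescs VALUES (100, 10, 2), (101, 20, 3);
    """)
    return db


def users_for_product(db, product):
    """Return (userid, login_name) rows tied to bugs of `product`."""
    return db.execute(USERS_FOR_PRODUCT, (product,)).fetchall()
```

Because the query keys on a bug's current product, a comment made while the
bug was still filed elsewhere is picked up through the bug itself. The hairy
part is the history of such bugs, not the comment authors.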

Yanking out just those db entries that belong to glibc is going to be 
super hard -- I'm not sure surgery like this has ever been done 
before. This is really why I'm worried that we will spend a lot of 
effort trying to get it to work only to realize something didn't get 
moved over properly at some later date.

The same problem is with patchwork -- migrating just the subset of the 
database that belongs to glibc lists is going to be very difficult, 
especially with the kind of data model that patchwork has. Is it the CI 
data that you want to preserve?

-K


* Re: Hashing out the scope of work
  2024-04-29 21:01           ` Konstantin Ryabitsev
@ 2024-04-29 21:58             ` Joseph Myers
  2024-04-30 12:30               ` Konstantin Ryabitsev
  0 siblings, 1 reply; 17+ messages in thread
From: Joseph Myers @ 2024-04-29 21:58 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Siddhesh Poyarekar, cti-tac

On Mon, 29 Apr 2024, Konstantin Ryabitsev wrote:

> For example, for bugzilla we'd need to prepare a query to filter bugs 
> and comments by product and component -- to only include those belonging 
> to glibc. However, how do we go about filtering users? User records are 
> not tied to a specific product or component. We can try to have a large 
> query limiting the users to just those accounts who have commented on 
> glibc bugs, but this is going to be hairy -- someone could have 
> commented on a bug that started out filed under a different 
> product/component.

The simple answer there is not to filter users.  It's OK to have more 
users than necessary in the new database.

We do still need to make sure things don't break if migrating a bug that 
was originally reported under a different product/component, whose history 
might thus show changes of product/component/... from values that 
shouldn't need to exist in the new Bugzilla.  But as long as things don't 
break, it's OK for the history display not to show those details of the 
history that aren't meaningful in a glibc-only context - to show them as a 
change from <missing-product> to glibc, for example.
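
A tiny Python sketch of that display rule, assuming the new instance knows
only the glibc product. The function and constant names are hypothetical and
not part of Bugzilla.

```python
# Assumption: the new, glibc-only database defines just this product.
KNOWN_PRODUCTS = frozenset({"glibc"})
PLACEHOLDER = "<missing-product>"


def render_product_change(old, new, known=KNOWN_PRODUCTS):
    """Format a product change, masking products absent from the new database."""
    def label(name):
        return name if name in known else PLACEHOLDER

    return f"product: {label(old)} -> {label(new)}"
```

For a bug that was moved from gdb to glibc, the history line would read
`product: <missing-product> -> glibc`.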

> The same problem is with patchwork -- migrating just the subset of the 
> database that belongs to glibc lists is going to be very difficult, 
> especially with the kind of data model that patchwork has. Is it the CI 
> data that you want to preserve?

I think it's the details of what patches / patch series still need review 
that's the most valuable part to preserve and migrate.

-- 
Joseph S. Myers
josmyers@redhat.com



* Re: Hashing out the scope of work
  2024-04-29 21:58             ` Joseph Myers
@ 2024-04-30 12:30               ` Konstantin Ryabitsev
  2024-04-30 12:41                 ` Siddhesh Poyarekar
  2024-05-02 19:27                 ` Joseph Myers
  0 siblings, 2 replies; 17+ messages in thread
From: Konstantin Ryabitsev @ 2024-04-30 12:30 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Siddhesh Poyarekar, cti-tac

On Mon, Apr 29, 2024 at 09:58:18PM GMT, Joseph Myers wrote:
> > For example, for bugzilla we'd need to prepare a query to filter 
> > bugs and comments by product and component -- to only include those 
> > belonging to glibc. However, how do we go about filtering users? 
> > User records are not tied to a specific product or component. We can 
> > try to have a large query limiting the users to just those accounts 
> > who have commented on glibc bugs, but this is going to be hairy -- 
> > someone could have commented on a bug that started out filed under a 
> > different product/component.
> 
> The simple answer there is not to filter users.  It's OK to have more 
> users than necessary in the new database.

Is it okay with the users, though, that they suddenly have accounts on a 
totally different system belonging to a totally different organization?  

> We do still need to make sure things don't break if migrating a bug 
> that was originally reported under a different product/component, 
> whose history might thus show changes of product/component/... from 
> values that shouldn't need to exist in the new Bugzilla.  But as long 
> as things don't break, it's OK for the history display not to show 
> those details of the history that aren't meaningful in a glibc-only 
> context - to show them as a change from <missing-product> to glibc, 
> for example.

I don't mind doing this work as long as everyone understands that it's 
probably a couple of weeks' worth of effort and we have to budget it as 
such.

> > The same problem is with patchwork -- migrating just the subset of 
> > the database that belongs to glibc lists is going to be very 
> > difficult, especially with the kind of data model that patchwork 
> > has. Is it the CI data that you want to preserve?
> 
> I think it's the details of what patches / patch series still need 
> review that's the most valuable part to preserve and migrate.

I think we can accomplish this without having to touch the DB. We can 
identify the patches that are still open and replay them from the 
mailing list into the new system. This would be a lot simpler than doing 
database surgery.

-K


* Re: Hashing out the scope of work
  2024-04-30 12:30               ` Konstantin Ryabitsev
@ 2024-04-30 12:41                 ` Siddhesh Poyarekar
  2024-05-02 19:27                 ` Joseph Myers
  1 sibling, 0 replies; 17+ messages in thread
From: Siddhesh Poyarekar @ 2024-04-30 12:41 UTC (permalink / raw)
  To: Konstantin Ryabitsev, Joseph Myers; +Cc: cti-tac

On 2024-04-30 08:30, Konstantin Ryabitsev wrote:
>> I think it's the details of what patches / patch series still need
>> review that's the most valuable part to preserve and migrate.
> 
> I think we can accomplish this without having to touch the DB. We can
> identify the patches that are still open and replay them from the
> mailing list into the new system. This would be a lot simpler than doing
> database surgery.

That would also allow the trybots to run a second time, flagging patches 
that no longer apply.  It'll be a decent way to weed out outdated 
patches too.

thanks,
Sid


* Re: Hashing out the scope of work
  2024-04-30 12:30               ` Konstantin Ryabitsev
  2024-04-30 12:41                 ` Siddhesh Poyarekar
@ 2024-05-02 19:27                 ` Joseph Myers
  2024-05-07 20:36                   ` Konstantin Ryabitsev
  1 sibling, 1 reply; 17+ messages in thread
From: Joseph Myers @ 2024-05-02 19:27 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Siddhesh Poyarekar, cti-tac

On Tue, 30 Apr 2024, Konstantin Ryabitsev wrote:

> On Mon, Apr 29, 2024 at 09:58:18PM GMT, Joseph Myers wrote:
> > > For example, for bugzilla we'd need to prepare a query to filter 
> > > bugs and comments by product and component -- to only include those 
> > > belonging to glibc. However, how do we go about filtering users? 
> > > User records are not tied to a specific product or component. We can 
> > > try to have a large query limiting the users to just those accounts 
> > > who have commented on glibc bugs, but this is going to be hairy -- 
> > > someone could have commented on a bug that started out filed under a 
> > > different product/component.
> > 
> > The simple answer there is not to filter users.  It's OK to have more 
> > users than necessary in the new database.
> 
> Is it okay with the users, though, that they suddenly have accounts on a 
> totally different system belonging to a totally different organization?  

If that's a concern, I wouldn't expect it to be particularly hard to 
identify a more precise set of users: there's a precise set of bugs to 
copy (i.e. those, whether open or closed, that are against the glibc 
product at the time of the move - not anything that was once opened 
against the glibc product but is currently associated with a different 
product) and each bug has a limited set of relevant people (reporter, 
assignee, CC, anyone who did any action recorded in the bug's history).
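
As a rough illustration of that selection rule, here is a minimal sketch. 
It assumes the bugs have already been exported as plain dictionaries; the 
field names (`product`, `reporter`, `assignee`, `cc`, `history`, `who`) 
are hypothetical stand-ins and not Bugzilla's actual database schema.

```python
def relevant_users(bugs, product="glibc"):
    """Collect the accounts tied to bugs currently filed under `product`."""
    users = set()
    for bug in bugs:
        # Only bugs against the product at the time of the move count;
        # bugs that have since moved to another product are skipped.
        if bug["product"] != product:
            continue
        users.add(bug["reporter"])
        if bug.get("assignee"):
            users.add(bug["assignee"])
        users.update(bug.get("cc", []))
        # Anyone who performed an action recorded in the bug's history.
        users.update(entry["who"] for entry in bug.get("history", []))
    return users
```

Account state (such as a block for spamming) is not modelled here and 
would need to be carried over separately.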

If an account was blocked for spamming and gets copied, it should still be 
blocked after the move.

> > > The same problem is with patchwork -- migrating just the subset of 
> > > the database that belongs to glibc lists is going to be very 
> > > difficult, especially with the kind of data model that patchwork 
> > > has. Is it the CI data that you want to preserve?
> > 
> > I think it's the details of what patches / patch series still need 
> > review that's the most valuable part to preserve and migrate.
> 
> I think we can accomplish this without having to touch the DB. We can 
> identify the patches that are still open and replay them from the 
> mailing list into the new system. This would be a lot simpler than doing 
> database surgery.

Where patchwork lists patch series as such, is this automatic based on the 
mailing list messages?  (I think the grouping of patches awaiting review 
into series is worth preserving.)

-- 
Joseph S. Myers
josmyers@redhat.com



* Re: Hashing out the scope of work
  2024-05-02 19:27                 ` Joseph Myers
@ 2024-05-07 20:36                   ` Konstantin Ryabitsev
  2024-05-07 21:45                     ` Joseph Myers
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantin Ryabitsev @ 2024-05-07 20:36 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Siddhesh Poyarekar, cti-tac

On Thu, May 02, 2024 at 07:27:32PM GMT, Joseph Myers wrote:
> > > The simple answer there is not to filter users.  It's OK to have 
> > > more users than necessary in the new database.
> > 
> > Is it okay with the users, though, that they suddenly have accounts on a 
> > totally different system belonging to a totally different organization?  
> 
> If that's a concern, I wouldn't expect it to be particularly hard to 
> identify a more precise set of users: there's a precise set of bugs to 
> copy (i.e. those, whether open or closed, that are against the glibc 
> product at the time of the move - not anything that was once opened 
> against the glibc product but is currently associated with a different 
> product) and each bug has a limited set of relevant people (reporter, 
> assignee, CC, anyone who did any action recorded in the bug's history).

So, I did some more poking at the backend database for bugzilla and 
while I think it's possible, this kind of surgery of moving a single 
project out of a larger instance to a fresh new instance would be a lot 
of effort, and something could still go wrong in ways that aren't 
immediately obvious.

I recommend this course of action:

1. We convert all current bugs in the glibc component of 
sourceware.org/bugzilla to a public-inbox archive, searchable by the bug 
id. This functionality exists in our bugspray project and only requires 
non-privileged REST access to bugzilla.
2. On the migration date, all glibc components at sourceware are closed 
for creating new bugs. This still allows completing existing bug 
entries.
3. The new bugzilla at CTI starts from a fresh state, with bug numbers 
starting with 100000, to indicate a clear break.

This preserves a searchable archive of all old glibc bugs without the 
database surgery that would be needed to forklift them from the old 
bugzilla to the new one.
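
To make the "bugs as a public-inbox archive" idea concrete, here is a 
minimal sketch of turning one bug comment into an email message. It is 
not bugspray's implementation: the `bug_to_message` helper, the 
Message-ID scheme, the `bugzilla.example.org` domain, and the dictionary 
field names are all assumptions made for illustration.

```python
from email.message import EmailMessage


def bug_to_message(bug, comment, domain="bugzilla.example.org"):
    """Render one bug comment as an email, threaded under comment 0."""
    msg = EmailMessage()
    msg["From"] = comment["creator"]
    # Putting the bug id in the subject keeps the archive searchable by id.
    msg["Subject"] = f"[Bug {bug['id']}] {bug['summary']}"
    msg["Message-ID"] = f"<bug-{bug['id']}-c{comment['count']}@{domain}>"
    if comment["count"] > 0:
        # Follow-up comments reply to the initial report (hypothetical scheme).
        msg["In-Reply-To"] = f"<bug-{bug['id']}-c0@{domain}>"
    msg.set_content(comment["text"])
    return msg
```

With this layout each bug becomes one thread, rooted at its initial 
report.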

Is that a workable solution?

> > > > The same problem is with patchwork -- migrating just the subset 
> > > > of the database that belongs to glibc lists is going to be very 
> > > > difficult, especially with the kind of data model that patchwork 
> > > > has. Is it the CI data that you want to preserve?
> > > 
> > > I think it's the details of what patches / patch series still need 
> > > review that's the most valuable part to preserve and migrate.
> > 
> > I think we can accomplish this without having to touch the DB. We can 
> > identify the patches that are still open and replay them from the 
> > mailing list into the new system. This would be a lot simpler than doing 
> > database surgery.
> 
> Where patchwork lists patch series as such, is this automatic based on the 
> mailing list messages?  (I think the grouping of patches awaiting 
> review into series is worth preserving.)

Yes, series grouping is automatic based on threading, so can be fully 
recreated from mailing list archives.
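
A simplified sketch of what threading-based grouping means is below. It 
is not patchwork's actual parser, which is more involved; the 
`group_by_thread` helper is hypothetical and only follows In-Reply-To 
links up to each thread's root message.

```python
from collections import defaultdict
from email import message_from_string


def group_by_thread(raw_messages):
    """Group message subjects by the Message-ID of their thread root."""
    parent = {}
    subjects = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        msgid = msg["Message-ID"].strip()
        subjects[msgid] = msg["Subject"]
        reply_to = msg["In-Reply-To"]
        parent[msgid] = reply_to.strip() if reply_to else None

    def root(msgid):
        # Walk up In-Reply-To links; stop at a message with no known
        # parent, and guard against reference loops.
        seen = set()
        while parent.get(msgid) and msgid not in seen:
            seen.add(msgid)
            msgid = parent[msgid]
        return msgid

    series = defaultdict(list)
    for msgid in parent:
        series[root(msgid)].append(subjects[msgid])
    return dict(series)
```

Replaying the archived messages through a grouping step like this 
reproduces the series without consulting the old database.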

-K


* Re: Hashing out the scope of work
  2024-05-07 20:36                   ` Konstantin Ryabitsev
@ 2024-05-07 21:45                     ` Joseph Myers
  0 siblings, 0 replies; 17+ messages in thread
From: Joseph Myers @ 2024-05-07 21:45 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Siddhesh Poyarekar, cti-tac

On Tue, 7 May 2024, Konstantin Ryabitsev wrote:

> I recommend this course of action:
> 
> 1. We convert all current bugs in the glibc component of 
> sourceware.org/bugzilla to a public-inbox archive, searchable by the bug 
> id. This functionality exists in our bugspray project and only requires 
> non-privileged REST access to bugzilla.
> 2. On the migration date, all glibc components at sourceware are closed 
> for creating new bugs. This still allows completing existing bug 
> entries.

Previous bugs would continue being discussed and fixed for decades; this 
would require new contributors to create accounts in both databases 
depending on how old the bugs they are working with are, and would require 
new milestones to be created in the old database for every future release 
so old bugs can be marked as fixed in such a release, and would require 
the script listing fixed bugs for NEWS to look in both databases 
indefinitely.  I don't think requiring people to work with two databases 
indefinitely is a good idea.

> 3. The new bugzilla at CTI starts from a fresh state, with bug numbers 
> starting with 100000, to indicate a clear break.

So, while I think starting new bugs with number 100000 is a good idea, I 
also think all the old glibc bugs (both open and closed) should be copied 
to the new database so people don't need to keep working with both 
databases forever.

-- 
Joseph S. Myers
josmyers@redhat.com



end of thread, other threads:[~2024-05-07 21:45 UTC | newest]

Thread overview: 17+ messages
2024-04-24 16:45 Hashing out the scope of work Konstantin Ryabitsev
2024-04-24 21:15 ` Ian Kelling
2024-04-25  1:31   ` Konstantin Ryabitsev
2024-04-25 20:06     ` Ian Kelling
2024-04-26 16:06       ` Siddhesh Poyarekar
2024-04-29 14:09 ` Joseph Myers
2024-04-29 15:23   ` Konstantin Ryabitsev
2024-04-29 16:35     ` Joseph Myers
2024-04-29 18:52       ` Siddhesh Poyarekar
2024-04-29 20:47         ` Joseph Myers
2024-04-29 21:01           ` Konstantin Ryabitsev
2024-04-29 21:58             ` Joseph Myers
2024-04-30 12:30               ` Konstantin Ryabitsev
2024-04-30 12:41                 ` Siddhesh Poyarekar
2024-05-02 19:27                 ` Joseph Myers
2024-05-07 20:36                   ` Konstantin Ryabitsev
2024-05-07 21:45                     ` Joseph Myers
