Store root-namespace storage statistics on database

Problem to solve

Today we check storage statistics using a GROUP BY operator on ProjectStatistics, and it's one of the longest-running transactions in production (https://gitlab.com/gitlab-org/gitlab-ce/issues/62488)

We use this information in a public storage-counter API at the group level. Once we start enforcing storage limits, we will need to rely on this query more often.

Also, our billing schema is based on root-namespace aggregation, but this query does not aggregate to the root namespace.
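For illustration, the root-namespace aggregation we need can be sketched in plain Ruby over in-memory rows (the attribute names and the `root_namespace_id` field are assumptions, not the actual schema):

```ruby
# Illustrative only: plain-Ruby approximation of the per-root-namespace
# aggregation the slow GROUP BY query performs. Field names are assumptions.
project_statistics = [
  { root_namespace_id: 1, repository_size: 100, lfs_objects_size: 20 },
  { root_namespace_id: 1, repository_size: 50,  lfs_objects_size: 0 },
  { root_namespace_id: 2, repository_size: 10,  lfs_objects_size: 5 }
]

# Group the per-project rows by root namespace, then sum each size column.
totals = project_statistics
  .group_by { |row| row[:root_namespace_id] }
  .transform_values do |rows|
    {
      repository_size:  rows.sum { |r| r[:repository_size] },
      lfs_objects_size: rows.sum { |r| r[:lfs_objects_size] }
    }
  end
```

Doing this on every API call is expensive at scale, which is why the proposal below precomputes and stores the totals.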

Technical bits

Proposal

  1. Create a new model with the same attributes as ProjectStatistics.*_size. The purpose of this model will be to hold the information in an aggregated form.
  2. Update the statistics in this model asynchronously, to avoid large database transactions. (See the backend section for the technical details.)
  3. Rework !28277 (merged) to make use of this new query - https://gitlab.com/gitlab-org/gitlab-ce/issues/62796
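A minimal sketch of what the aggregated model could look like, using a plain Ruby object in place of the ActiveRecord model (the class name, attribute list, and method names here are hypothetical, not the final schema):

```ruby
# Hypothetical sketch of the new aggregated model. A PORO stands in for the
# ActiveRecord model; attribute names mirror ProjectStatistics.*_size but are
# illustrative assumptions.
class RootNamespaceStorageStatistics
  SIZE_ATTRIBUTES = %i[repository_size lfs_objects_size
                       build_artifacts_size packages_size wiki_size].freeze

  attr_reader :namespace_id, :sizes

  def initialize(namespace_id)
    @namespace_id = namespace_id
    @sizes = SIZE_ATTRIBUTES.to_h { |attr| [attr, 0] }
  end

  # Recalculate totals from the per-project statistics rows belonging to this
  # root namespace (plain hashes stand in for ProjectStatistics records).
  def recalculate!(project_statistics)
    SIZE_ATTRIBUTES.each do |attr|
      @sizes[attr] = project_statistics.sum { |row| row.fetch(attr, 0) }
    end
    self
  end

  # Total storage across all size attributes.
  def storage_size
    @sizes.values.sum
  end
end
```

Storing one such row per root namespace turns the group-level read into a single-row lookup instead of a GROUP BY across all projects.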

Development log

Decisions

Backend implications

Prework

Technical details (%12.1)

  1. Create root_namespace_storage_statistics with all the ProjectStatistics.*_size attributes
  2. Create a second table (namespace_aggregation_schedules) with two columns id and namespace_id.
  3. Whenever the statistics of a project change, we insert a row into namespace_aggregation_schedules
    • We don't insert a new row if one already exists for the namespace.
    • Insertion is done through a callback and a Sidekiq job. We can't do it in the same transaction, because ProjectStatistics is already involved in a large one (https://gitlab.com/gitlab-org/gitlab-ce/issues/62488)
  4. After inserting the row, we schedule a new worker X hours into the future.
  5. This job will:
    • Update the root namespace storage statistics by querying all the namespaces through a service.
    • Delete the related namespace_aggregation_schedules row after the update
  6. We also need to create another Sidekiq job that traverses any remaining rows in namespace_aggregation_schedules and schedules a job for every pending row.
  7. Hide all these changes behind a feature flag
  8. We will read the caching interval from Redis, defaulting to once every 3 hours
  9. We will experiment with tweaking the interval, aiming for a smaller value
  10. When we remove the feature flag, the interval must either be hardcoded or converted to an application setting (to be decided)
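The scheduling and deduplication logic in steps 3–5 can be sketched as an in-memory model, with a Set standing in for the namespace_aggregation_schedules table and an array standing in for Sidekiq's delayed-job queue (all class and method names here are hypothetical):

```ruby
require 'set'

# Illustrative in-memory model of the dedup-insert + delayed-worker flow.
# Not the real implementation: @pending stands in for the
# namespace_aggregation_schedules table, @scheduled for Sidekiq's queue.
class AggregationScheduler
  attr_reader :scheduled

  def initialize
    @pending = Set.new
    @scheduled = []
  end

  # Step 3: called whenever a project's statistics change. Skips insertion if
  # a row already exists for the namespace, so repeated changes within the
  # window schedule only one aggregation.
  def schedule(namespace_id, delay_hours: 3)
    return if @pending.include?(namespace_id)

    @pending.add(namespace_id)
    # Stand-in for Worker.perform_in(delay_hours.hours, namespace_id)
    @scheduled << [namespace_id, delay_hours]
  end

  # Step 5: the delayed worker recalculates the root-namespace statistics
  # (the caller supplies the aggregation as a block) and then deletes the
  # schedule row, allowing future changes to schedule a fresh run.
  def run_worker(namespace_id, &aggregate)
    aggregate.call(namespace_id)
    @pending.delete(namespace_id)
  end

  def pending?(namespace_id)
    @pending.include?(namespace_id)
  end
end
```

The dedup check plus the delayed run gives a simple debounce: at most one aggregation per root namespace per interval, no matter how many project-statistics updates occur in between.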

Merge Requests

Edited by Mayra Cabrera