= Gerrit Code Review - System Design

== Objective

Gerrit is a web-based code review system, facilitating online code
reviews for projects using the Git version control system.

Gerrit makes reviews easier by showing changes in a side-by-side
display, and allowing inline/file comments to be added by any reviewer.

Gerrit simplifies Git-based project maintainership by permitting
any authorized user to submit changes to the master Git repository,
rather than requiring all approved changes to be merged in by
hand by the project maintainer. This functionality enables a more
centralized usage of Git.


== Background

Google developed Mondrian, a Perforce-based code review tool, to
facilitate peer-review of changes prior to submission to the central
code repository. Mondrian is not open source, as it is tied to the
use of Perforce and to many Google-only services, such as Bigtable.
Google employees have often described how useful Mondrian and its
peer-review process are to their day-to-day work.
 
Guido van Rossum open sourced portions of Mondrian within Rietveld,
a similar code review tool running on Google App Engine, but for
use with Subversion rather than Perforce. Rietveld is in common
use by many open source projects, facilitating their peer reviews
much as Mondrian does for Google employees. Unlike Mondrian and
the Google Perforce triggers, Rietveld is strictly advisory and
does not enforce peer-review prior to submission.

Git is a distributed version control system, wherein each repository
is assumed to be owned/maintained by a single user. There are no
inherent security controls built into Git, so the ability to read
from or write to a repository is controlled entirely by the host's
filesystem access controls. When multiple maintainers collaborate
on a single shared repository, a high degree of trust is required,
as any collaborator with write access can alter the repository.

Gitosis provides tools to secure centralized Git repositories,
permitting multiple maintainers to manage the same project at once,
by restricting access to a secure network protocol, much like
Perforce secures a repository by only permitting access over its
network port.

The Android Open Source Project (AOSP) was founded by Google with the
open source release of the Android operating system. AOSP has
selected Git as its primary version control tool. As many of the
engineers have a background of working with Mondrian at Google,
there is a strong desire to have the same (or better) feature set
available for Git and AOSP.

Gerrit Code Review started as a simple set of patches to Rietveld,
and was originally built to service AOSP. This quickly turned
into a fork as we added access control features that Guido van
Rossum did not want to see complicating the Rietveld code base. As
the functionality and code were starting to become drastically
different, a different name was needed. Gerrit calls back to the
original namesake of Rietveld, Gerrit Rietveld, a Dutch architect.

Gerrit 2.x is a complete rewrite of the Gerrit fork, changing the
implementation from Python on Google App Engine to Java on a J2EE
servlet container and an SQL database.

* link:http://video.google.com/videoplay?docid=-8502904076440714866[Mondrian Code Review On The Web]
* link:https://github.com/rietveld-codereview/rietveld[Rietveld - Code Review for Subversion]
* link:http://eagain.net/gitweb/?p=gitosis.git;a=blob;f=README.rst;hb=HEAD[Gitosis README]
* link:http://source.android.com/[Android Open Source Project]


== Overview

Developers create one or more changes on their local desktop system,
then upload them for review to Gerrit using the standard `git push`
command line program, or any GUI which can invoke `git push` on
behalf of the user. Authentication and data transfer are handled
through SSH. Users are authenticated by username and public/private
key pair, and all data transfer is protected by the SSH connection
and Git's own data integrity checks.
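As a rough illustration of the upload step described above, the sketch below assembles the `git push` invocation a client or GUI wrapper might run. It is not taken from Gerrit itself: the helper name, user, host, and project are hypothetical, and the SSH port (29418) and the `refs/for/<branch>` target are assumptions about a conventional Gerrit setup that this document does not state.

```python
def build_push_command(user, host, project, branch="master", port=29418):
    """Return the argv list for uploading HEAD to Gerrit for review.

    Assumptions (not stated in this document): the Gerrit SSH daemon
    listens on port 29418, and uploads for review are pushed to the
    refs/for/<branch> namespace.
    """
    url = f"ssh://{user}@{host}:{port}/{project}"
    return ["git", "push", url, f"HEAD:refs/for/{branch}"]


if __name__ == "__main__":
    # Hypothetical user, host, and project names, for illustration only.
    argv = build_push_command("alice", "review.example.com", "tools/demo")
    print(" ".join(argv))
```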
 
Each Git commit created on the client desktop system is converted
into a unique change record which can be reviewed independently.
Change records are stored in a database: PostgreSQL, MySQL, or the
built-in H2, where they can be queried to present customized user
dashboards, enumerating any pending changes.

A summary of each newly uploaded change is automatically emailed
to reviewers, so they receive a direct hyperlink to review the
change on the web. Reviewer email addresses can be specified on the
`git push` command line, but typically reviewers are automatically
selected by Gerrit by identifying users who have change approval
permissions in the project.

Reviewers use the web interface to read the side-by-side or unified
diff of a change, and insert draft inline/file comments where
appropriate. A draft comment is visible only to the reviewer until
it is published. Published comments are automatically
emailed to the change author by Gerrit, and are CC'd to all other
reviewers who have already commented on the change.

When publishing comments, reviewers are also given the opportunity
to score the change, indicating whether they feel the change is
ready for inclusion in the project, needs more work, or should be
rejected outright. These scores provide direct feedback to Gerrit's
change submit function.

After a change has been scored positively by reviewers, Gerrit
enables a submit button on the web interface. Authorized users
can push the submit button to have the change enter the project
repository. The equivalent in Subversion or Perforce would be
that Gerrit is invoking `svn commit` or `p4 submit` on behalf of
the web user pressing the button. Due to the way Git audit trails
are maintained, the user pressing the submit button does not need
to be the author of the change.


== Infrastructure

End-user web browsers make HTTP requests directly to Gerrit's
HTTP server. As nearly all of the user interface is implemented
through Google Web Toolkit (GWT), the majority of these requests
are transmitting compressed JSON payloads, with all HTML being
generated within the browser. Most responses are under 1 KB.
 
Gerrit's HTTP server-side component is implemented as a standard
Java servlet, and thus runs within any J2EE servlet container.
Popular choices for deployments would be Tomcat or Jetty, as these
are high-quality open-source servlet containers that are readily
available for download.

End-user uploads are performed over SSH, so Gerrit's servlets also
start up a background thread to receive SSH connections through
an independent SSH port. SSH clients communicate directly with
this port, bypassing the HTTP server used by browsers.

Server-side data storage for Gerrit is broken down into two different
categories:

* Git repository data
* Gerrit metadata

The Git repository data is the Git object database used to store
already submitted revisions, as well as all uploaded (proposed)
changes. Gerrit uses the standard Git repository format, and
therefore requires direct filesystem access to the repositories.
All repository data is stored in the filesystem and accessed through
the JGit library. Repository data can be stored on remote servers
accessible through NFS or SMB, but the remote directory must
be mounted on the Gerrit server as part of the local filesystem
namespace. Remote filesystems are likely to perform worse than
local ones, due to Git disk IO behavior not being optimized for
remote access.

The Gerrit metadata contains a summary of the available changes,
all comments (published and drafts), and individual user account
information. The metadata is mostly housed in the database (*1),
which can be located either on the same server as Gerrit, or on
a different (but nearby) server. Most installations would opt to
install both Gerrit and the metadata database on the same server,
to reduce administration overheads.

User authentication is handled by OpenID, and therefore Gerrit
requires that the OpenID provider selected by a user be
online and operating in order to authenticate that user.

* link:http://www.gwtproject.org/[Google Web Toolkit (GWT)]
* link:http://www.kernel.org/pub/software/scm/git/docs/gitrepository-layout.html[Git Repository Format]
* link:http://www.postgresql.org/about/[About PostgreSQL]
* link:http://openid.net/developers/specs/[OpenID Specifications]

*1 An effort is underway to eliminate the use of the database
altogether, and to store all the metadata directly in the Git
repositories themselves. As of Gerrit 2.2.1, only the project
configuration metadata has been migrated out of the database and
into the Git repository of each project.


== Project Information

Gerrit is developed as a self-hosting open source project:

* link:https://www.gerritcodereview.com/[Project Homepage]
* link:https://www.gerritcodereview.com/download/index.html[Release Versions]
* link:https://gerrit.googlesource.com/gerrit[Source]
* link:https://bugs.chromium.org/p/gerrit/issues/list[Issue Tracking]
* link:https://review.source.android.com/[Change Review]


== Internationalization and Localization

As a source code review system for open source projects, where the
commonly preferred language for communication is typically English,
Gerrit does not make internationalization or localization a priority.
 
The majority of Gerrit's users will be writing change descriptions
and comments in English, and therefore an English user interface
is usable by the target user base.

Gerrit uses GWT's i18n support to externalize all constant strings
and messages shown to the user, so that in the future someone who
needs a translated version of the UI can contribute new
string files for their locale(s).

Right-to-left (RTL) support is only barely considered within the
Gerrit code base. Some portions of the code have tried to take
RTL into consideration, while others probably need to be modified
before translating the UI to an RTL language.

* link:i18n-readme.html[Gerrit's i18n Support]


== Accessibility Considerations

Whenever possible, Gerrit displays raw text rather than image icons,
so screen readers should still be able to provide useful information
to blind persons accessing Gerrit sites.
 
Standard HTML hyperlinks are used rather than HTML div or span tags
with click listeners. This provides two benefits to the end-user.
The first benefit is that screen readers are optimized for locating
standard hyperlink anchors and presenting them to the end-user as
a navigation action. The second benefit is that users can use
the 'open in new tab/window' feature of their browser whenever
they choose.

When possible, Gerrit uses the ARIA properties on DOM widgets to
provide hints to screen readers.


== Browser Compatibility

Supporting non-JavaScript enabled browsers is a non-goal for Gerrit.

As Gerrit is a pure-GWT application with no server-side rendering
fallbacks, the browser must support modern JavaScript semantics in
order to access the Gerrit web application. Dumb clients such as
`lynx`, `wget`, `curl`, or even many search engine spiders are not
able to access Gerrit content.
 
As Google Web Toolkit (GWT) is used to generate the browser-specific
versions of the client-side JavaScript code, Gerrit works
on any JavaScript enabled browser which GWT can produce code for.
This covers the majority of the popular browsers.

The Gerrit project does not have the development resources necessary
to support two parallel UI implementations (GWT based JavaScript
and server-side rendering). Consequently only one is implemented.

There are a number of web browsers available with full JavaScript
support, and nearly every operating system (including any PDA-like
mobile phone) comes with one standard. Users who are committed
to developing changes for a Gerrit managed project can be expected
to be able to run a JavaScript enabled browser, as they also would
need to be running Git in order to contribute.

There are a number of open source browsers available, including
Firefox and Chromium. Users have some degree of choice in their
browser selection, including being able to build and audit their
browser from source.

The majority of the content stored within Gerrit is also available
through other means, such as gitweb or the `git://` protocol.
Any existing search engine spider can crawl the server-side HTML
produced by gitweb, and thus can index the majority of the changes
which might appear in Gerrit. Some engines may even choose to
crawl the native version control database, as ohloh.net does.
Therefore the lack of support for most search engine spiders is a
non-issue for most Gerrit deployments.


== Product Integration

Gerrit integrates with an existing gitweb installation by optionally
creating hyperlinks to reference changes on the gitweb server.

Gerrit integrates with an existing git-daemon installation by
optionally displaying `git://` URLs for users to download a
change through the native Git protocol.
 
Gerrit integrates with any OpenID provider for user authentication,
making it easier for users to join a Gerrit site and manage their
authentication credentials to it. To make use of Google Accounts
as an OpenID provider easier, Gerrit has a shorthand "Sign in with
a Google Account" link on its sign-in screen. Gerrit also supports
a shorthand sign-in link for Yahoo!. Other providers may also be
supported more directly in the future.

Site administrators may limit the range of OpenID providers to
a subset of "reliable providers". Users may continue to use
any OpenID provider to publish comments, but granted privileges
are only available to a user if the only entry point to their
account is through the defined set of "reliable OpenID providers".
This permits site administrators to require HTTPS for OpenID,
and to use only large mainstream providers that are trustworthy,
or to require users to only use a custom OpenID provider installed
alongside Gerrit Code Review.

Gerrit integrates with some types of corporate single-sign-on (SSO)
solutions, typically by having the SSO authentication be performed
in a reverse proxy web server and then blindly trusting that all
incoming connections have been authenticated by that reverse proxy.
When configured to use this form of authentication, Gerrit does
not integrate with OpenID providers.

When installing Gerrit, administrators may optionally include an
HTML header or footer snippet which may include user tracking code,
such as that used by Google Analytics. This is a per-instance
configuration that must be done by hand, and is not supported
out of the box. Other site trackers instead of Google Analytics
can be used, as the administrator can supply any HTML/JavaScript
they choose.

Gerrit does not integrate with any Google service, or any other
services other than those listed above.


== Standards / Developer APIs

Gerrit uses an XSRF protected variant of JSON-RPC 1.1 to communicate
between the browser client and the server.

As the protocol is not the GWT-RPC protocol, but is instead a
self-describing standard JSON format, it is easily implemented by
any 3rd party client application, provided the client has a JSON
parser and HTTP client library available.
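To give a feel for how little a third-party client needs, here is a minimal sketch that builds a JSON-RPC 1.1 style request body using only a JSON library. The `version`, `method`, `params`, and `id` members follow the JSON-RPC 1.1 working draft. The method name used in the example is hypothetical, and the `xsrfKey` member is an assumed field name for the XSRF token; the XSRF JSON-RPC README should be treated as the authority on how the token is actually carried.

```python
import json


def build_request(method, params, request_id, xsrf_key=None):
    """Build a JSON-RPC 1.1 style request body as a JSON string."""
    body = {
        "version": "1.1",   # protocol version member from the 1.1 draft
        "method": method,   # name of the remote procedure (hypothetical here)
        "params": params,   # positional arguments for the call
        "id": request_id,   # lets the client match the response to the call
    }
    if xsrf_key is not None:
        # Assumed field name for the XSRF token; not confirmed by this document.
        body["xsrfKey"] = xsrf_key
    return json.dumps(body)


if __name__ == "__main__":
    # "exampleMethod" is a placeholder, not a real Gerrit service method.
    print(build_request("exampleMethod", [42], 1, xsrf_key="token-from-server"))
```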
 
As the entire command set necessary for the standard web browser
based UI is exposed through JSON-RPC over HTTP, there are no other
data feeds or command interfaces to the server.

Commands requiring user authentication may require the user agent to
complete a sign-in cycle through the user's OpenID provider in order
to establish the HTTP cookie Gerrit uses to track user identity.
Automating this sign-in process for non-web browser agents is
outside of the scope of Gerrit, as each OpenID provider uses its own
sign-in sequence. Use of OpenID providers whose interfaces are
difficult to automate may make it impossible for non-browser agents
to be used with the JSON-RPC interface.

* link:http://json-rpc.org/wd/JSON-RPC-1-1-WD-20060807.html[JSON-RPC 1.1]
* link:https://gerrit.googlesource.com/gwtjsonrpc/+/master/README[XSRF JSON-RPC]


== Privacy Considerations

Gerrit stores the following information per user account:

* Full Name
* Preferred Email Address
* Mailing Address '(Optional, Encrypted)'
* Country '(Optional, Encrypted)'
* Phone Number '(Optional, Encrypted)'
* Fax Number '(Optional, Encrypted)'

The full name and preferred email address fields are shown to any
site visitor viewing a page containing a change uploaded by the
account owner, or containing a published comment written by the
account owner.
 
Showing the full name and preferred email carries approximately the
same risk as the `From` header of an email posted to a public mailing
list that maintains archives, and Gerrit treats these fields in
much the same way that a mailing list archive might handle them.
Users who don't want to expose this information should either not
participate in a Gerrit based online community, or open a new email
address dedicated to this use.

As the Gerrit UI data is only available through XSRF protected
JSON-RPC calls, "screen-scraping" for email addresses is difficult,
but not impossible. It is unlikely a spammer will go through the
effort required to code a custom scraping application necessary
to cull email addresses from published Gerrit comments. In most
cases these same addresses would be more easily obtained from the
project's mailing list archives.

The user's name and email address are stored unencrypted in the
Gerrit metadata store, typically a PostgreSQL database.

The snail-mail mailing address, country, and phone and fax numbers
are gathered to help project leads contact the user should there
be a legal question regarding any change they have uploaded.

These sensitive fields are immediately encrypted upon receipt with
a GnuPG public key, and stored "off site" in another data store,
isolated from the main Gerrit change data. Gerrit does not have
access to the matching private key, and as such cannot decrypt the
information. Therefore these fields are write-once in Gerrit, as not
even the account owner can recover the values they previously stored.

It is expected that the address information would only need to be
decrypted and revealed with a valid court subpoena, but this is
really left to the discretion of the Gerrit site administrator as
to when it is reasonable to reveal this information to a 3rd party.


== Spam and Abuse Considerations

Gerrit makes no attempt to detect spam changes or comments. The
somewhat high barrier to entry makes it unlikely that a spammer
will target Gerrit.
 
To upload a change, the client must speak the native Git protocol
embedded in SSH, with some custom Gerrit semantics added on top.
The client must have their public key already stored in the Gerrit
database, which can only be done through the XSRF protected
JSON-RPC interface. The level of effort required to construct
the necessary tools to upload a well-formatted change that isn't
rejected outright by the Git and Gerrit checksum validations is
too high for a spammer to get any meaningful return.

To post and publish a comment, a client must sign in with an OpenID
provider and then use the XSRF protected JSON-RPC interface to
publish the draft on an existing change record. Again, the level of
effort required to implement the Gerrit specific XSRF protections
and the JSON-RPC payload format necessary to post a draft and then
publish that draft is simply too high for a spammer to bother with.

Both of these assumptions are also based upon the idea that Gerrit
will be a lot less popular than blog software, and thus will be
running on a lot fewer websites. Spammers therefore have very little
returned benefit for getting over the protocol hurdles.

These assumptions may need to be revisited in the future if any
public Gerrit site actually notices spam.


== Latency

Gerrit targets sub-250 ms latency per page request, mostly by using
very compact JSON payloads between client and server. However, as
most of the serving stack (network, hardware, metadata
database) is out of control of the Gerrit developers, no real
guarantees can be made about latency.


== Scalability

Gerrit is designed for a very large scale open source project, or
large commercial development project. Roughly this amounts to
parameters such as the following:

.Design Parameters
[options="header"]
|======================================================
|Parameter        | Default Maximum | Estimated Maximum
|Projects         |         1,000   |        10,000
|Contributors     |         1,000   |        50,000
|Changes/Day      |           100   |         2,000
|Revisions/Change |            20   |            20
|Files/Change     |            50   |        16,000
|Comments/File    |           100   |           100
|Reviewers/Change |             8   |             8
|======================================================

Out of the box, Gerrit will handle the "Default Maximum". Site
administrators may reconfigure their servers by editing `gerrit.config`
to run closer to the estimated maximum if sufficient memory is made
available to the JVM and the relevant `cache.*.memoryLimit` variables
are increased from their defaults.

=== Discussion

Very few, if any, open source projects have more than a handful of
Git repositories associated with them. Since Gerrit treats each
Git repository as a project, an upper limit of 10,000 projects
is reasonable. If a site has more than 1,000 projects, administrators
should increase
link:config-gerrit.html#cache.name.memoryLimit[`cache.projects.memoryLimit`]
to match.
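As a rough sketch of what such an adjustment might look like in `gerrit.config`, assuming a site with around 2,000 projects: the value below is purely illustrative and not a recommendation, and the exact meaning and default of `memoryLimit` should be checked against the configuration documentation linked above.

```
[cache "projects"]
  memoryLimit = 2048
```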
 
Almost no open source project has 1,000 contributors over all time,
let alone on a daily basis. This default figure of 1,000 was WAG'd by
looking at PR statements published by cell phone companies picking
up the Android operating system. If all of the stated employees in
those PR statements were working on *only* the open source Android
repositories, we might reach the 1,000 estimate listed here. Given
how closed-source minded these companies have been in the past, it
is very unlikely all of their Android engineers will be working on
the open source repository, and thus 1,000 is a very high estimate.

The upper maximum of 50,000 contributors is based on existing
installations that are already handling quite a bit more than the
default maximum of 1,000 contributors. Given how the user data is
stored and indexed, supporting 50,000 contributor accounts (or more)
is easily possible for a server. If a server has more than 1,000
*active* contributors,
link:config-gerrit.html#cache.name.memoryLimit[`cache.accounts.memoryLimit`]
should be increased by the site administrator, if sufficient RAM
is available to the host JVM.

The estimate of 100 changes per day was WAG'd off some estimates
originally obtained from Android's development history. Writing a
good change that will be accepted through a peer-review process
takes time. The average engineer may need 4-6 hours per change just
to write the code and unit tests. Proper design consideration and
additional but equally important tasks such as meetings, interviews,
training, and eating lunch will often pad the engineer's day out
such that suitable changes are only posted once a day, or once
every other day. For reference, the entire Linux kernel has an
average of only 79 changes/day. If more than 100 changes are active
per day, site administrators should consider increasing the
link:config-gerrit.html#cache.name.memoryLimit[`cache.diff.memoryLimit`]
and `cache.diff_intraline.memoryLimit`.
Shawn O. Pearcec4bcc092009-02-06 12:32:57 -0800508
Shawn O. Pearce08255812011-04-12 00:02:38 -0400509On average any given change will need to be modified once to address
510peer review comments before the final revision can be accepted by the
511project. Executing these revisions also eats into the contributor's
512time, and is another factor limiting the number of changes/day
513accepted by the Gerrit instance. However, even though this implies
514only 2 revisions/change, many existing Gerrit installations have seen
51520 or more revisions/change, when new contributors are learning the
516project's style and conventions.
Shawn O. Pearcec4bcc092009-02-06 12:32:57 -0800517
Shawn O. Pearce08255812011-04-12 00:02:38 -0400518On average, each change will have 2 reviewers, a human and an
519automated test bed system. Usually this would be the project lead, or
520someone who is familiar with the code being modified. The time
521required to comment further reduces the time available for writing
522one's own changes. However, existing Gerrit installations have seen 8
523or more reviewers frequently show up on changes that impact many
524functional areas, and therefore it is reasonable to expect 8 or more
525reviewers to be able to work together on a single change.
526
527Existing installations have successfully processed change reviews with
528more than 16,000 files per change. However, since 16,000 modified/new
529files is a massive amount of code to review, it is more typical to see
530less than 10 files modified in any single change. Changes larger than
53110 files are typically merges, for example integrating the latest
532version of an upstream library, where the reviewer has little to do
533beyond verifying the project compiles and passes a test suite.

=== CPU Usage - Web UI

Gerrit's web UI would require on average `4+F+F*C` HTTP requests to
review a change and post comments. Here `F` is the number of files
modified by the change, and `C` is the number of inline/file comments
left by the reviewer per file. The constant 4 accounts for the requests
to load the reviewer's dashboard, to load the change detail page,
to publish the review comments, and to reload the change detail
page after comments are published.

This rough estimate boils down to 216,000 HTTP requests per day
(QPD). Assuming these are evenly distributed over an 8 hour work day
in a single time zone, we are looking at approximately 7.5 queries
per second (QPS).

----
  QPD = Changes_Day * Revisions_Change * Reviewers_Change * (4 + F + F * C)
      = 2,000 * 2 * 1 * (4 + 10 + 10 * 4)
      = 216,000
  QPS = QPD / 8_Hours / 60_Minutes / 60_Seconds
      = 7.5
----

Gerrit serves most requests in under 60 ms when using the loopback
interface and a single processor. At 60 ms per request, a single CPU
system has sufficient capacity for about 16 QPS. A dual processor
system should be more than sufficient for a site with the estimated
load described above.
Shawn O. Pearcec4bcc092009-02-06 12:32:57 -0800562
563Given a more realistic estimate of 79 changes per day (from the
Shawn O. Pearce08255812011-04-12 00:02:38 -0400564Linux kernel) suggests only 8,532 queries per day, and a much lower
5650.29 QPS when spread out over an 8 hour work day.
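
The request-volume arithmetic above can be reproduced with a short
script. This is a minimal sketch and not part of Gerrit: the function
and parameter names are invented for illustration, and the inputs are
simply the values used in the estimates above (including the single
web UI reviewer per change assumed by the formula).

```python
def queries_per_day(changes_per_day, revisions_per_change,
                    reviewers_per_change, files_per_change,
                    comments_per_file):
    """Estimate HTTP requests per day with the 4 + F + F*C model."""
    # 4 fixed requests, one per file, plus one per comment on each file.
    requests_per_review = (4 + files_per_change
                           + files_per_change * comments_per_file)
    return (changes_per_day * revisions_per_change
            * reviewers_per_change * requests_per_review)


def queries_per_second(qpd, work_hours=8):
    """Spread a daily total evenly over a single work day."""
    return qpd / (work_hours * 60 * 60)


# Estimated maximum load: 2,000 changes/day.
max_qpd = queries_per_day(2000, 2, 1, 10, 4)
print(max_qpd, queries_per_second(max_qpd))

# Linux kernel sized load: 79 changes/day.
kernel_qpd = queries_per_day(79, 2, 1, 10, 4)
print(kernel_qpd, round(queries_per_second(kernel_qpd), 2))
```

Changing `work_hours` is a quick way to see how the QPS figure drops
when the same daily volume is spread across several time zones.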

=== CPU Usage - Git over SSH/HTTP

A 24 core server is able to handle ~25 concurrent `git fetch`
operations per second. The limiting factor is that each concurrent
operation demands one full core, as the computation is almost entirely
server side CPU bound. 25 concurrent operations is known to be
sufficient to support hundreds of active developers and 50 automated
build servers polling for updates and building every change. (This data
was derived from an actual installation's performance.)

Because of the distributed nature of Git, end-users don't need to
contact the central Gerrit Code Review server very often. For
`git fetch` traffic, link:pgm-daemon.html[slave mode] is known to be an
effective way to offload traffic from the main server, permitting it
to scale to a large user base without needing an excessive number of
cores in a single system.

Clients on very slow network connections (for example home office
users on VPN over home DSL) may be network bound rather than server
side CPU bound, in which case a core may be effectively shared with
another user. Possible core sharing due to network bottlenecks
generally holds true for network connections running below 10 MiB/sec.

If the server's own network interface is 1 Gib/sec (Gigabit Ethernet),
the system can really only serve about 10 concurrent clients at the
10 MiB/sec speed, no matter how many cores it has.
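
The interplay between the CPU limit and the network limit can be
written as a simple bound. The following is a rough sketch with
hypothetical names; it assumes one full core per CPU bound fetch, as
described above, and ignores protocol overhead, so the real ceiling is
somewhat lower than the computed one.

```python
GIB_BITS = 2 ** 30   # bits in one Gib
MIB_BYTES = 2 ** 20  # bytes in one MiB


def max_concurrent_fetches(cores, nic_gibit_per_sec, client_mib_per_sec):
    """Upper bound on concurrent fetches: the lower of the CPU and NIC limits."""
    cpu_limit = cores  # assumption: one full core per CPU bound fetch
    nic_bytes_per_sec = nic_gibit_per_sec * GIB_BITS // 8
    network_limit = nic_bytes_per_sec // (client_mib_per_sec * MIB_BYTES)
    return min(cpu_limit, network_limit)


# 24 cores behind a 1 Gib/sec interface, clients pulling 10 MiB/sec each.
print(max_concurrent_fetches(cores=24, nic_gibit_per_sec=1,
                             client_mib_per_sec=10))
```

With these inputs the bound is 12 clients, set by the network interface
rather than the 24 cores, which is in line with the "about 10" figure
once overhead is taken into account.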

=== Disk Usage

The average size of a revision in the Linux kernel once compressed by
Git is 2,327 bytes, or roughly 2 KiB. Over the course of a year a
Gerrit server running with the estimated maximum parameters above might
see 1.4 GiB of new data added across the total set of 10,000 projects
hosted on that server. This figure assumes the majority of the content
is human written source code, and not large binary blobs such as disk
images or media files.

Production Gerrit installations have been tested, and are known to
handle Git repositories in the multigigabyte range, storing binary
files ranging in size from a few kilobytes (for example compressed
icons) to 800+ megabytes (firmware images, large uncompressed original
artwork files). Best practices encourage breaking very large binary
files into their own Git repositories based on access, to prevent
desktop clients from needing to clone unnecessary materials (for
example a C developer does not need every 800+ megabyte firmware image
created by the product's quality assurance team).

== Redundancy & Reliability

Gerrit largely assumes that the local filesystem where Git repository
data is stored is always available. Important data written to disk
is also forced to the platter with an `fsync()` once it has been
fully written. If the local filesystem fails to respond to reads
or becomes corrupt, Gerrit has no provisions to fall back or retry
and errors will be returned to clients.

Gerrit largely assumes that the metadata database is online and
answering both read and write queries. Query failures immediately
result in the operation aborting and errors being returned to the
client, with no retry or fallback provisions.

Due to the relatively small scale described above, it is very likely
that the Git filesystem and metadata database are all housed on the
same server that is running Gerrit. If any failure arises in one of
these components, it is likely to manifest in the others too. It is
also likely that the administrator cannot be bothered to deploy a
cluster of load-balanced server hardware, as the scale and expected
load do not justify the hardware or management costs.

Most deployments caring about reliability will set up a warm-spare
standby system and use a manual fail-over process to switch from the
failed system to the warm-spare.

As Git is a distributed version control system, and open source
projects tend to have contributors from all over the world, most
contributors will be able to tolerate a Gerrit down time of several
hours while the administrator is notified, signs on, and brings the
warm-spare up. Pending changes are likely to need at least 24 hours
of time on the Gerrit site anyway in order to ensure any interested
parties around the world have had a chance to comment. This expected
lag largely allows for some downtime in a disaster scenario.

=== Backups

PostgreSQL and MySQL can be configured to replicate their data to
other systems, where it is applied to a warm-standby backup in
real time. Gerrit instances which care about redundancy will set up
this feature of PostgreSQL or MySQL to ensure the warm-standby is
reasonably current should the master go offline.

Using the standard replication plugin, Gerrit can be configured
to replicate changes made to the local Git repositories over any
standard Git transport. After the plugin is installed, remote
destinations can be configured in `'$site_path'/etc/replication.conf`
to send copies of all changes over SSH to other servers, or to the
Amazon S3 blob storage service.


== Logging Plan

Gerrit does not maintain logs on its own.

Published comments contain a publication date, so users can judge
when the comment was posted and decide if it was "recent" or not.
Only the timestamp is stored in the database; the IP address of
the comment author is not stored.

Changes uploaded over the SSH daemon from `git push` have the
standard Git reflog updated with the date and time that the upload
occurred, and the Gerrit account identity of who did the upload.
Changes submitted and merged into a branch also update the
Git reflog. These logs are available only to the Gerrit site
administrator, and they are not replicated through the automatic
replication noted earlier. These logs are primarily recorded for an
"oh s**t" moment where the administrator has to rewind data. In most
installations they are a waste of disk space. Future versions of
JGit may allow disabling these logs, and Gerrit may take advantage
of that feature to stop writing these logs.

A web server positioned in front of Gerrit (such as a reverse proxy)
or the hosting servlet container may record access logs, and these
logs may be mined for usage information. This is outside of the
scope of Gerrit.


== Testing Plan

Gerrit is currently manually tested through its web UI.

JGit has a fairly extensive automated unit test suite. Most new
changes to JGit are rejected unless corresponding automated unit
tests are included.


== Caveats

Rietveld can't be used as it does not provide the "submit over the
web" feature that Gerrit provides for Git.

Gitosis can't be used as it does not provide any code review
features, although it does provide basic access controls.

Email based code review does not scale to a project as large and
complex as Android. Most contributors need at least some sort of
dashboard to keep track of any pending reviews, and some way to
correlate updated revisions back to the comments written on prior
revisions of the same logical change.

GERRIT
------
Part of link:index.html[Gerrit Code Review]