Gerrit Code Review - System Design
==================================

Objective
---------

Gerrit is a web based code review system, facilitating online code
reviews for projects using the Git version control system.

Gerrit makes reviews easier by showing changes in a side-by-side
display, and allowing inline comments to be added by any reviewer.

Gerrit simplifies Git based project maintainership by permitting
any authorized user to submit changes to the master Git repository,
rather than requiring all approved changes to be merged in by
hand by the project maintainer. This functionality enables a more
centralized usage of Git.


Background
----------

Google developed Mondrian, a Perforce based code review tool to
facilitate peer-review of changes prior to submission to the central
code repository. Mondrian is not open source, as it is tied to the
use of Perforce and to many Google-only services, such as Bigtable.
Google employees have often described how useful Mondrian and its
peer-review process are to their day-to-day work.

Guido van Rossum open sourced portions of Mondrian within Rietveld,
a similar code review tool running on Google App Engine, but for
use with Subversion rather than Perforce. Rietveld is in common
use by many open source projects, facilitating their peer reviews
much as Mondrian does for Google employees. Unlike Mondrian and
the Google Perforce triggers, Rietveld is strictly advisory and
does not enforce peer-review prior to submission.

Git is a distributed version control system, wherein each repository
is assumed to be owned/maintained by a single user. There are no
inherent security controls built into Git, so the ability to read
from or write to a repository is controlled entirely by the host's
filesystem access controls. When multiple maintainers collaborate
on a single shared repository a high degree of trust is required,
as any collaborator with write access can alter the repository.

Gitosis provides tools to secure centralized Git repositories,
permitting multiple maintainers to manage the same project at once,
by restricting access to a secure network protocol,
much like Perforce secures a repository by only permitting access
over its network port.

The Android Open Source Project (AOSP) was founded by Google with the
open source release of the Android operating system. AOSP has
selected Git as its primary version control tool. As many of the
engineers have a background of working with Mondrian at Google,
there is a strong desire to have the same (or better) feature set
available for Git and AOSP.

Gerrit Code Review started as a simple set of patches to Rietveld,
and was originally built to service AOSP. This quickly turned
into a fork as we added access control features that Guido van
Rossum did not want to see complicating the Rietveld code base. As
the functionality and code were starting to become drastically
different, a different name was needed. Gerrit calls back to the
original namesake of Rietveld, Gerrit Rietveld, a Dutch architect.

Gerrit 2.x is a complete rewrite of the Gerrit fork, completely
changing the implementation from Python on Google App Engine, to Java
on a J2EE servlet container and a SQL database.

* link:http://video.google.com/videoplay?docid=-8502904076440714866[Mondrian Code Review On The Web]
* link:http://code.google.com/p/rietveld/[Rietveld - Code Review for Subversion]
* link:http://eagain.net/gitweb/?p=gitosis.git;a=blob;f=README.rst;hb=HEAD[Gitosis README]
* link:http://source.android.com/[Android Open Source Project]


Overview
--------

Developers create one or more changes on their local desktop system,
then upload them for review to Gerrit using the standard `git push`
command line program, or any GUI which can invoke `git push` on
behalf of the user. Authentication and data transfer are handled
through SSH. Users are authenticated by username and public/private
key pair, and all data transfer is protected by the SSH connection
and Git's own data integrity checks.

Each Git commit created on the client desktop system is converted
into a unique change record which can be reviewed independently.
Change records are stored in a database: PostgreSQL, MySQL, or the
built-in H2, where they can be queried to present customized user
dashboards, enumerating any pending changes.

A summary of each newly uploaded change is automatically emailed
to reviewers, so they receive a direct hyperlink to review the
change on the web. Reviewer email addresses can be specified on the
`git push` command line, but typically reviewers are automatically
selected by Gerrit by identifying users who have change approval
permissions in the project.

Reviewers use the web interface to read the side-by-side or unified
diff of a change, and insert draft inline comments where appropriate.
Draft comments are visible only to the reviewer until they are
published. Published comments are automatically emailed to
the change author by Gerrit, and are CC'd to all other reviewers
who have already commented on the change.

When publishing comments reviewers are also given the opportunity
to score the change, indicating whether they feel the change is
ready for inclusion in the project, needs more work, or should be
rejected outright. These scores provide direct feedback to Gerrit's
change submit function.

After a change has been scored positively by reviewers, Gerrit
enables a submit button on the web interface. Authorized users
can push the submit button to have the change enter the project
repository. The equivalent in Subversion or Perforce would be
that Gerrit is invoking `svn commit` or `p4 submit` on behalf of
the web user pressing the button. Due to the way Git audit trails
are maintained, the user pressing the submit button does not need
to be the author of the change.


Infrastructure
--------------

End-user web browsers make HTTP requests directly to Gerrit's
HTTP server. As nearly all of the user interface is implemented
through Google Web Toolkit (GWT), the majority of these requests
are transmitting compressed JSON payloads, with all HTML being
generated within the browser. Most responses are under 1 KB.

Gerrit's HTTP server side component is implemented as a standard
Java servlet, and thus runs within any J2EE servlet container.
Popular choices for deployments would be Tomcat or Jetty, as these
are high-quality open-source servlet containers that are readily
available for download.

End-user uploads are performed over SSH, so Gerrit's servlets also
start up a background thread to receive SSH connections through
an independent SSH port. SSH clients communicate directly with
this port, bypassing the HTTP server used by browsers.

Server side data storage for Gerrit is broken down into two different
categories:

* Git repository data
* Gerrit metadata

The Git repository data is the Git object database used to store
already submitted revisions, as well as all uploaded (proposed)
changes. Gerrit uses the standard Git repository format, and
therefore requires direct filesystem access to the repositories.
All repository data is stored in the filesystem and accessed through
the JGit library. Repository data can be stored on remote servers
accessible through NFS or SMB, but the remote directory must
be mounted on the Gerrit server as part of the local filesystem
namespace. Remote filesystems are likely to perform worse than
local ones, due to Git disk IO behavior not being optimized for
remote access.

The Gerrit metadata contains a summary of the available changes,
all comments (published and drafts), and individual user account
information. The metadata is mostly housed in the database (*1),
which can be located either on the same server as Gerrit, or on
a different (but nearby) server. Most installations would opt to
install both Gerrit and the metadata database on the same server,
to reduce administration overheads.

User authentication is handled by OpenID, and therefore Gerrit
requires that the OpenID provider selected by a user must be
online and operating in order to authenticate that user.

* link:http://code.google.com/webtoolkit/[Google Web Toolkit (GWT)]
* link:http://www.kernel.org/pub/software/scm/git/docs/gitrepository-layout.html[Git Repository Format]
* link:http://www.postgresql.org/about/[About PostgreSQL]
* link:http://openid.net/developers/specs/[OpenID Specifications]

*1 An effort is underway to eliminate the use of the
database altogether, and to store all the metadata directly in
the git repositories themselves. So far, as of Gerrit 2.2.1, of
all Gerrit's metadata, only the project configuration metadata
has been migrated out of the database and into the git
repositories for each project.


Project Information
-------------------

Gerrit is developed as a self-hosting open source project:

* link:http://code.google.com/p/gerrit/[Project Homepage]
* link:http://code.google.com/p/gerrit/downloads/list[Release Versions]
* link:http://code.google.com/p/gerrit/source/checkout[Source]
* link:http://code.google.com/p/gerrit/issues/list[Issue Tracking]
* link:https://review.source.android.com/[Change Review]


Internationalization and Localization
-------------------------------------

As a source code review system for open source projects, where the
commonly preferred language for communication is typically English,
Gerrit does not make internationalization or localization a priority.

The majority of Gerrit's users will be writing change descriptions
and comments in English, and therefore an English user interface
is usable by the target user base.

Gerrit uses GWT's i18n support to externalize all constant strings
and messages shown to the user, so that in the future someone who
really needed a translated version of the UI could contribute new
string files for their locale(s).

Right-to-left (RTL) support is only barely considered within the
Gerrit code base. Some portions of the code have tried to take
RTL into consideration, while others probably need to be modified
before translating the UI to an RTL language.

* link:i18n-readme.html[Gerrit's i18n Support]


Accessibility Considerations
----------------------------

Whenever possible Gerrit displays raw text rather than image icons,
so screen readers should still be able to provide useful information
to blind persons accessing Gerrit sites.

Standard HTML hyperlinks are used rather than HTML div or span tags
with click listeners. This provides two benefits to the end-user.
The first benefit is that screen readers are optimized for locating
standard hyperlink anchors and presenting them to the end-user as
a navigation action. The second benefit is that users can use
the 'open in new tab/window' feature of their browser whenever
they choose.

When possible, Gerrit uses the ARIA properties on DOM widgets to
provide hints to screen readers.


Browser Compatibility
---------------------

Supporting non-JavaScript enabled browsers is a non-goal for Gerrit.

As Gerrit is a pure-GWT application with no server side rendering
fallbacks, the browser must support modern JavaScript semantics in
order to access the Gerrit web application. Dumb clients such as
`lynx`, `wget`, `curl`, or even many search engine spiders are not
able to access Gerrit content.

As Google Web Toolkit (GWT) is used to generate the browser
specific versions of the client-side JavaScript code, Gerrit works
on any JavaScript enabled browser which GWT can produce code for.
This covers the majority of the popular browsers.

The Gerrit project does not have the development resources necessary
to support two parallel UI implementations (GWT based JavaScript
and server-side rendering). Consequently only one is implemented.

There are a number of web browsers available with full JavaScript
support, and nearly every operating system (including any PDA-like
mobile phone) comes with one standard. Users who are committed
to developing changes for a Gerrit managed project can be expected
to be able to run a JavaScript enabled browser, as they also would
need to be running Git in order to contribute.

There are a number of open source browsers available, including
Firefox and Chromium. Users have some degree of choice in their
browser selection, including being able to build and audit their
browser from source.

The majority of the content stored within Gerrit is also available
through other means, such as gitweb or the `git://` protocol.
Any existing search engine spider can crawl the server-side HTML
produced by gitweb, and thus can index the majority of the changes
which might appear in Gerrit. Some engines may even choose to
crawl the native version control database, as ohloh.net does.
Therefore the lack of support for most search engine spiders is a
non-issue for most Gerrit deployments.


Product Integration
-------------------

Gerrit integrates with an existing gitweb installation by optionally
creating hyperlinks to reference changes on the gitweb server.

Gerrit integrates with an existing git-daemon installation by
optionally displaying `git://` URLs for users to download a
change through the native Git protocol.

Gerrit integrates with any OpenID provider for user authentication,
making it easier for users to join a Gerrit site and manage their
authentication credentials to it. To make use of Google Accounts
as an OpenID provider easier, Gerrit has a shorthand "Sign in with
a Google Account" link on its sign-in screen. Gerrit also supports
a shorthand sign in link for Yahoo!. Other providers may also be
supported more directly in the future.

Site administrators may limit the range of OpenID providers to
a subset of "reliable providers". Users may continue to use
any OpenID provider to publish comments, but granted privileges
are only available to a user if the only entry point to their
account is through the defined set of "reliable OpenID providers".
This permits site administrators to require HTTPS for OpenID,
and to use only large main-stream providers that are trustworthy,
or to require users to only use a custom OpenID provider installed
alongside Gerrit Code Review.
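
The "reliable providers" rule can be illustrated with a minimal sketch.
The function name, the prefix-matching scheme, and the example URLs are
hypothetical and are not taken from Gerrit's implementation; the sketch
only shows the idea that every identity linked to an account must belong
to a reliable provider before granted privileges apply.

```python
def has_only_reliable_identities(identities, reliable_prefixes):
    """Return True if every identity URL starts with a reliable prefix.

    Hypothetical helper: an account with no identities is never trusted.
    """
    if not identities:
        return False
    return all(
        any(identity.startswith(prefix) for prefix in reliable_prefixes)
        for identity in identities
    )


# Hypothetical provider prefixes chosen by a site administrator.
RELIABLE = ["https://openid.example.com/", "https://login.example.org/"]

# One identity, from a reliable provider: privileges may apply.
trusted = has_only_reliable_identities(
    ["https://openid.example.com/alice"], RELIABLE)

# A second entry point through another provider removes that trust.
mixed = has_only_reliable_identities(
    ["https://openid.example.com/alice", "http://other.example.net/alice"],
    RELIABLE)
```

In this sketch a single identity outside the reliable set is enough to
withhold privileges, which mirrors the "only entry point" wording above.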

Gerrit integrates with some types of corporate single-sign-on (SSO)
solutions, typically by having the SSO authentication be performed
in a reverse proxy web server and then blindly trusting that all
incoming connections have been authenticated by that reverse proxy.
When configured to use this form of authentication, Gerrit does
not integrate with OpenID providers.

When installing Gerrit, administrators may optionally include an
HTML header or footer snippet which may include user tracking code,
such as that used by Google Analytics. This is a per-instance
configuration that must be done by hand, and is not supported
out of the box. Other site trackers instead of Google Analytics
can be used, as the administrator can supply any HTML/JavaScript
they choose.

Gerrit does not integrate with any Google service, or with any
services other than those listed above.


Standards / Developer APIs
--------------------------

Gerrit uses an XSRF protected variant of JSON-RPC 1.1 to communicate
between the browser client and the server.

As the protocol is not the GWT-RPC protocol, but is instead a
self-describing standard JSON format it is easily implemented by
any 3rd party client application, provided the client has a JSON
parser and HTTP client library available.
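
As a rough illustration of how little a third-party client needs, the
sketch below builds (but does not send) a JSON-RPC 1.1 style request
using only a JSON encoder and an HTTP library. The endpoint URL, the
method name, and the `xsrfKey` field name are assumptions made for the
example, not details confirmed by this document; consult the XSRF
JSON-RPC README linked below for the actual wire format.

```python
import json
import urllib.request


def build_json_rpc_request(url, method, params, request_id, xsrf_key=None):
    """Build (but do not send) an HTTP POST carrying a JSON-RPC 1.1 call."""
    payload = {
        "version": "1.1",  # version marker used by the JSON-RPC 1.1 draft
        "method": method,
        "params": params,
        "id": request_id,
    }
    if xsrf_key is not None:
        # Assumption: the XSRF token travels as an extra member of the
        # request object. The field name is illustrative only.
        payload["xsrfKey"] = xsrf_key
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json; charset=utf-8",
            "Accept": "application/json",
        },
        method="POST",
    )


# Hypothetical endpoint and method name, for illustration only.
request = build_json_rpc_request(
    "http://gerrit.example.com/rpc/ExampleService",
    "exampleMethod", ["argument"], 1, xsrf_key="token-from-server")
decoded = json.loads(request.data.decode("utf-8"))
```

Passing the resulting object to `urllib.request.urlopen` would perform
the call; that step is omitted here because it needs a live server and,
for authenticated commands, a valid session cookie.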

As the entire command set necessary for the standard web browser
based UI is exposed through JSON-RPC over HTTP, there are no other
data feeds or command interfaces to the server.

Commands requiring user authentication may require the user agent to
complete a sign-in cycle through the user's OpenID provider in order
to establish the HTTP cookie Gerrit uses to track user identity.
Automating this sign-in process for non-web browser agents is
outside of the scope of Gerrit, as each OpenID provider uses its own
sign-in sequence. Use of OpenID providers which have difficult to
automate interfaces may make it impossible for non-browser agents
to be used with the JSON-RPC interface.

* link:http://json-rpc.org/wd/JSON-RPC-1-1-WD-20060807.html[JSON-RPC 1.1]
* link:http://code.google.com/p/gerrit/source/browse/README?repo=gwtjsonrpc&name=master[XSRF JSON-RPC]


Privacy Considerations
----------------------

Gerrit stores the following information per user account:

* Full Name
* Preferred Email Address
* Mailing Address '(Optional, Encrypted)'
* Country '(Optional, Encrypted)'
* Phone Number '(Optional, Encrypted)'
* Fax Number '(Optional, Encrypted)'

The full name and preferred email address fields are shown to any
site visitor viewing a page containing a change uploaded by the
account owner, or containing a published comment written by the
account owner.

Showing the full name and preferred email is approximately the same
risk as the `From` header of an email posted to a public mailing
list that maintains archives, and Gerrit treats these fields in
much the same way that a mailing list archive might handle them.
Users who don't want to expose this information should either not
participate in a Gerrit based online community, or open a new email
address dedicated for this use.

As the Gerrit UI data is only available through XSRF protected
JSON-RPC calls, "screen-scraping" for email addresses is difficult,
but not impossible. It is unlikely a spammer will go through the
effort required to code a custom scraping application necessary
to cull email addresses from published Gerrit comments. In most
cases these same addresses would be more easily obtained from the
project's mailing list archives.

The user's name and email address are stored unencrypted in the
Gerrit metadata store, typically a PostgreSQL database.

The snail-mail mailing address, country, and phone and fax numbers
are gathered to help project leads contact the user should there
be a legal question regarding any change they have uploaded.

These sensitive fields are immediately encrypted upon receipt with
a GnuPG public key, and stored "off site" in another data store,
isolated from the main Gerrit change data. Gerrit does not have
access to the matching private key, and as such cannot decrypt the
information. Therefore these fields are write-once in Gerrit, as not
even the account owner can recover the values they previously stored.

It is expected that the address information would only need to be
decrypted and revealed with a valid court subpoena, but this is
really left to the discretion of the Gerrit site administrator as
to when it is reasonable to reveal this information to a 3rd party.


Spam and Abuse Considerations
-----------------------------

Gerrit makes no attempt to detect spam changes or comments. The
somewhat high barrier to entry makes it unlikely that a spammer
will target Gerrit.

To upload a change, the client must speak the native Git protocol
embedded in SSH, with some custom Gerrit semantics added on top.
The client must have their public key already stored in the Gerrit
database, which can only be done through the XSRF protected
JSON-RPC interface. The level of effort required to construct
the necessary tools to upload a well-formatted change that isn't
rejected outright by the Git and Gerrit checksum validations is
too high for a spammer to get any meaningful return.

To post and publish a comment a client must sign in with an OpenID
provider and then use the XSRF protected JSON-RPC interface to
publish the draft on an existing change record. Again, the level of
effort required to implement the Gerrit specific XSRF protections
and the JSON-RPC payload format necessary to post a draft and then
publish that draft is simply too high for a spammer to bother with.

Both of these assumptions are also based upon the idea that Gerrit
will be a lot less popular than blog software, and thus will be
running on a lot fewer websites. Spammers therefore have very little
returned benefit for getting over the protocol hurdles.

These assumptions may need to be revisited in the future if any
public Gerrit site actually notices spam.


Latency
-------

Gerrit targets sub-250 ms per page request, mostly by using
very compact JSON payloads between client and server. However, as
most of the serving stack (network, hardware, metadata
database) is out of control of the Gerrit developers, no real
guarantees can be made about latency.


Scalability
-----------

Gerrit is designed for a very large scale open source project, or
large commercial development project. Roughly this amounts to
parameters such as the following:

.Design Parameters
[options="header"]
|======================================================
|Parameter        | Default Maximum | Estimated Maximum
|Projects         |           1,000 |            10,000
|Contributors     |           1,000 |            50,000
|Changes/Day      |             100 |             2,000
|Revisions/Change |              20 |                20
|Files/Change     |              50 |            16,000
|Comments/File    |             100 |               100
|Reviewers/Change |               8 |                 8
|======================================================

Out of the box, Gerrit will handle the "Default Maximum". Site
administrators may reconfigure their servers by editing gerrit.config
to run closer to the estimated maximum if sufficient memory is made
available to the JVM and the relevant cache.*.memoryLimit variables
are increased from their defaults.

Discussion
~~~~~~~~~~

Very few, if any, open source projects have more than a handful of
Git repositories associated with them. Since Gerrit treats each
Git repository as a project, an upper limit of 10,000 projects
is reasonable. If a site has more than 1,000 projects, administrators
should increase
link:config-gerrit.html#cache.name.memoryLimit[`cache.projects.memoryLimit`]
to match.

Almost no open source project has 1,000 contributors over all time,
let alone on a daily basis. This default figure of 1,000 was WAG'd by
looking at PR statements published by cell phone companies picking
up the Android operating system. If all of the stated employees in
those PR statements were working on *only* the open source Android
repositories, we might reach the 1,000 estimate listed here. Knowing
these companies as being very closed-source minded in the past, it
is very unlikely all of their Android engineers will be working on
the open source repository, and thus 1,000 is a very high estimate.

The upper maximum of 50,000 contributors is based on existing
installations that are already handling quite a bit more than the
default maximum of 1,000 contributors. Given how the user data is
stored and indexed, supporting 50,000 contributor accounts (or more)
is easily possible for a server. If a server has more than 1,000
*active* contributors,
link:config-gerrit.html#cache.name.memoryLimit[`cache.accounts.memoryLimit`]
should be increased by the site administrator, if sufficient RAM
is available to the host JVM.

The estimate of 100 changes per day was WAG'd off some estimates
originally obtained from Android's development history. Writing a
good change that will be accepted through a peer-review process
takes time. The average engineer may need 4-6 hours per change just
to write the code and unit tests. Proper design consideration and
additional but equally important tasks such as meetings, interviews,
training, and eating lunch will often pad the engineer's day out
such that suitable changes are only posted once a day, or once
every other day. For reference, the entire Linux kernel has an
average of only 79 changes/day. If more than 100 changes are active
per day, site administrators should consider increasing the
link:config-gerrit.html#cache.name.memoryLimit[`cache.diff.memoryLimit`]
and `cache.diff_intraline.memoryLimit`.

On average any given change will need to be modified once to address
peer review comments before the final revision can be accepted by the
project. Executing these revisions also eats into the contributor's
time, and is another factor limiting the number of changes/day
accepted by the Gerrit instance. However, even though this implies
only 2 revisions/change, many existing Gerrit installations have seen
20 or more revisions/change, when new contributors are learning the
project's style and conventions.

On average, each change will have 2 reviewers, a human and an
automated test bed system. The human reviewer would usually be the
project lead, or someone who is familiar with the code being
modified. The time required to comment further reduces the time
available for writing one's own changes. However, existing Gerrit
installations have seen 8 or more reviewers frequently show up on
changes that impact many functional areas, and therefore it is
reasonable to expect 8 or more reviewers to be able to work together
on a single change.

Existing installations have successfully processed change reviews with
more than 16,000 files per change. However, since 16,000 modified/new
files is a massive amount of code to review, it is more typical to see
fewer than 10 files modified in any single change. Changes larger than
10 files are typically merges, for example integrating the latest
version of an upstream library, where the reviewer has little to do
beyond verifying the project compiles and passes a test suite.

CPU Usage - Web UI
~~~~~~~~~~~~~~~~~~

Gerrit's web UI would require on average `4+F+F*C` HTTP requests to
review a change and post comments. Here `F` is the number of files
modified by the change, and `C` is the number of inline comments left
by the reviewer per file. The constant 4 accounts for the request
to load the reviewer's dashboard, to load the change detail page,
to publish the review comments, and to reload the change detail
page after comments are published.

This WAG'd estimate boils down to 216,000 HTTP requests per day
(QPD). Assuming these are evenly distributed over an 8 hour work day
in a single time zone, we are looking at approximately 7.5 queries
per second (QPS).

----
  QPD = Changes_Day * Revisions_Change * Reviewers_Change * (4 + F + F * C)
      = 2,000 * 2 * 1 * (4 + 10 + 10 * 4)
      = 216,000
  QPS = QPD / 8_Hours / 60_Minutes / 60_Seconds
      = 7.5
----

Gerrit serves most requests in under 60 ms when using the loopback
interface and a single processor. On a single CPU system there is
sufficient capacity for 16 QPS. A dual processor system should be
more than sufficient for a site with the estimated load described above.

A more realistic estimate of 79 changes per day (from the
Linux kernel) suggests only 8,532 queries per day, and a much lower
0.29 QPS when spread out over an 8 hour work day.
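
The arithmetic above can be reproduced with a short sketch. The function
and variable names are illustrative only; the inputs (2,000 or 79
changes per day, 2 revisions, 1 reviewer using the web UI, 10 files,
4 comments per file, an 8 hour day) are the ones used in this section.

```python
def requests_per_change(files, comments_per_file):
    """HTTP requests needed to review one change: 4 + F + F * C."""
    return 4 + files + files * comments_per_file


def queries_per_day(changes_per_day, revisions, reviewers, files, comments):
    """Total HTTP requests per day (QPD) across all reviewed revisions."""
    return (changes_per_day * revisions * reviewers
            * requests_per_change(files, comments))


def queries_per_second(qpd, work_hours=8):
    """Spread the daily total evenly over one work day (QPS)."""
    return qpd / (work_hours * 60 * 60)


# Estimated maximum load from the design parameters.
peak_qpd = queries_per_day(2000, 2, 1, files=10, comments=4)
peak_qps = queries_per_second(peak_qpd)

# Linux kernel change rate, with the same per-change assumptions.
kernel_qpd = queries_per_day(79, 2, 1, files=10, comments=4)
kernel_qps = queries_per_second(kernel_qpd)

print(peak_qpd, peak_qps)  # 216000 7.5
print(kernel_qpd, kernel_qps)
```

The kernel-rate result is roughly 0.296 QPS, which this document quotes
as 0.29.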

CPU Usage - Git over SSH/HTTP
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A 24 core server is able to handle ~25 concurrent `git fetch`
operations per second. The limiting factor is that each concurrent
operation demands one full core, as the computation is almost entirely
server side CPU bound. 25 concurrent operations is known to be
sufficient to support hundreds of active developers and 50 automated
build servers polling for updates and building every change. (This
data was derived from an actual installation's performance.)

Because of the distributed nature of Git, end-users don't need to
contact the central Gerrit Code Review server very often. For `git
fetch` traffic, link:pgm-daemon.html[slave mode] is known to be an
effective way to offload traffic from the main server, permitting it
to scale to a large user base without needing an excessive number of
cores in a single system.

Clients on very slow network connections (for example, home office
users on VPN over home DSL) may be network bound rather than server
side CPU bound, in which case a core may be effectively shared with
another user. Possible core sharing due to network bottlenecks
generally holds true for network connections running below 10 MiB/sec.

If the server's own network interface is 1 Gbit/sec (Gigabit Ethernet),
the system can really only serve about 10 concurrent clients at the
10 MiB/sec speed, no matter how many cores it has.
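
The interaction between the CPU limit and the network limit can be
sketched in a few lines of Python. This is a rough illustration under
stated assumptions, not a measurement: the function names are invented
for this example, Gigabit Ethernet is taken at its raw line rate of
10^9 bits per second, and protocol overhead is ignored.

```python
MIB = 1024 * 1024               # bytes in one MiB
GIGABIT_ETHERNET = 10**9        # raw line rate in bits per second (assumed)


def network_bound_clients(link_bits_per_sec, client_bytes_per_sec):
    """Clients that can each be fed at client_bytes_per_sec at the same time."""
    return (link_bits_per_sec // 8) // client_bytes_per_sec


def concurrent_fetch_limit(cores, link_bits_per_sec, client_bytes_per_sec):
    """Each fetch needs a full core, so the smaller of the two limits wins."""
    return min(cores,
               network_bound_clients(link_bits_per_sec, client_bytes_per_sec))


print(network_bound_clients(GIGABIT_ETHERNET, 10 * MIB))
print(concurrent_fetch_limit(24, GIGABIT_ETHERNET, 10 * MIB))
```

At the raw line rate the link feeds 11 such clients, which is in line
with the figure of about 10 once protocol overhead is taken into
account. Adding cores beyond that point does not raise the limit.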

Disk Usage
~~~~~~~~~~

The average size of a revision in the Linux kernel once compressed by
Git is 2,327 bytes, or roughly 2 KiB. Over the course of a year a
Gerrit server running with the estimated maximum parameters above might
see an introduction of 1.4 GiB over the total set of 10,000 projects
hosted on that server. This figure assumes the majority of the content
is human written source code, and not large binary blobs such as disk
images or media files.

Production Gerrit installations have been tested, and are known to
handle Git repositories in the multigigabyte range, storing binary
files ranging in size from a few kilobytes (for example, compressed
icons) to 800+ megabytes (firmware images, large uncompressed original
artwork files). Best practices encourage breaking very large binary
files into their own Git repositories based on access, to prevent
desktop clients from needing to clone unnecessary materials (for
example, a C developer does not need every 800+ megabyte firmware image
created by the product's quality assurance team).

Redundancy & Reliability
------------------------

Gerrit largely assumes that the local filesystem where Git repository
data is stored is always available. Important data written to disk
is also forced to the platter with an `fsync()` once it has been
fully written. If the local filesystem fails to respond to reads
or becomes corrupt, Gerrit has no provisions to fall back or retry
and errors will be returned to clients.
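
The write-then-`fsync()` pattern mentioned above can be illustrated
with a small Python sketch. Gerrit itself is written in Java, so this
is not its implementation; `durable_write` is a hypothetical helper
used only to show the order of operations.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path and return only after fsync() has completed."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()                 # move Python's buffer into the OS cache
        os.fsync(f.fileno())      # ask the OS to push its cache to the device


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "object.bin")
    durable_write(target, b"important data")
    with open(target, "rb") as f:
        print(f.read())
```

Note that nothing in this pattern retries on failure: an I/O error
simply propagates to the caller, which matches the behavior described
above.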

Gerrit largely assumes that the metadata database is online and
answering both read and write queries. Query failures immediately
result in the operation aborting and errors being returned to the
client, with no retry or fallback provisions.

Due to the relatively small scale described above, it is very likely
that the Git filesystem and metadata database are both housed on the
same server that is running Gerrit. If any failure arises in one of
these components, it is likely to manifest in the others too. It is
also likely that the administrator cannot be bothered to deploy a
cluster of load-balanced server hardware, as the scale and expected
load do not justify the hardware or management costs.

Most deployments caring about reliability will set up a warm-spare
standby system and use a manual fail-over process to switch from the
failed system to the warm-spare.

As Git is a distributed version control system, and open source
projects tend to have contributors from all over the world, most
contributors will be able to tolerate a Gerrit downtime of several
hours while the administrator is notified, signs on, and brings the
warm-spare up. Pending changes are likely to need at least 24 hours
of time on the Gerrit site anyway in order to ensure any interested
parties around the world have had a chance to comment. This expected
lag largely allows for some downtime in a disaster scenario.

Backups
~~~~~~~

PostgreSQL and MySQL can be configured to replicate their data to
other systems, where it is applied to a warm-standby backup in
real time. Gerrit instances which care about redundancy will set up
this feature of PostgreSQL or MySQL to ensure the warm-standby is
reasonably current should the master go offline.

Using the standard replication plugin, Gerrit can be configured
to replicate changes made to the local Git repositories over any
standard Git transport. After the plugin is installed, remote
destinations can be configured in `'$site_path'/etc/replication.conf`
to send copies of all changes over SSH to other servers, or to the
Amazon S3 blob storage service.
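
A minimal sketch of one such destination is shown below. The remote
name, user, host and path are placeholders chosen for this example,
and `${name}` stands for the project name that the plugin substitutes
for each repository. The replication plugin's own documentation remains
the reference for the full set of options.

```
[remote "backup-host"]
  url = gerrit2@backup.example.com:/srv/git/${name}.git
  push = +refs/heads/*:refs/heads/*
  push = +refs/tags/*:refs/tags/*
```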


Logging Plan
------------

Gerrit does not maintain logs on its own.

Published comments contain a publication date, so users can judge
when the comment was posted and decide if it was "recent" or not.
Only the timestamp is stored in the database; the IP address of
the comment author is not stored.

Changes uploaded over the SSH daemon from `git push` have the
standard Git reflog updated with the date and time that the upload
occurred, and the Gerrit account identity of who did the upload.
Changes submitted and merged into a branch also update the
Git reflog. These logs are available only to the Gerrit site
administrator, and they are not replicated through the automatic
replication noted earlier. These logs are primarily recorded for an
"oh s**t" moment where the administrator has to rewind data. In most
installations they are a waste of disk space. Future versions of
JGit may allow disabling these logs, and Gerrit may take advantage
of that feature to stop writing these logs.

A web server positioned in front of Gerrit (such as a reverse proxy)
or the hosting servlet container may record access logs, and these
logs may be mined for usage information. This is outside of the
scope of Gerrit.


Testing Plan
------------

Gerrit is currently manually tested through its web UI.

JGit has a fairly extensive automated unit test suite. Most new
changes to JGit are rejected unless corresponding automated unit
tests are included.


Caveats
-------

Rietveld can't be used as it does not provide the "submit over the
web" feature that Gerrit provides for Git.

Gitosis can't be used as it does not provide any code review
features, but it does provide basic access controls.

Email based code review does not scale to a project as large and
complex as Android. Most contributors at least need some sort of
dashboard to keep track of any pending reviews, and some way to
correlate updated revisions back to the comments written on prior
revisions of the same logical change.

GERRIT
------
Part of link:index.html[Gerrit Code Review]