Gerrit Code Review - System Design
==================================

Objective
---------

Gerrit is a web based code review system, facilitating online code
reviews for projects using the Git version control system.

Gerrit makes reviews easier by showing changes in a side-by-side
display, and allowing inline comments to be added by any reviewer.

Gerrit simplifies Git based project maintainership by permitting
any authorized user to submit changes to the master Git repository,
rather than requiring all approved changes to be merged in by
hand by the project maintainer. This functionality enables a more
centralized usage of Git.


Background
----------

Google developed Mondrian, a Perforce based code review tool to
facilitate peer-review of changes prior to submission to the central
code repository. Mondrian is not open source, as it is tied to the
use of Perforce and to many Google-only services, such as Bigtable.
Google employees have often described how useful Mondrian and its
peer-review process are to their day-to-day work.

Guido van Rossum open sourced portions of Mondrian within Rietveld,
a similar code review tool running on Google App Engine, but for
use with Subversion rather than Perforce. Rietveld is in common
use by many open source projects, facilitating their peer reviews
much as Mondrian does for Google employees. Unlike Mondrian and
the Google Perforce triggers, Rietveld is strictly advisory and
does not enforce peer-review prior to submission.

Git is a distributed version control system, wherein each repository
is assumed to be owned/maintained by a single user. There are no
inherent security controls built into Git, so the ability to read
from or write to a repository is controlled entirely by the host's
filesystem access controls. When multiple maintainers collaborate
on a single shared repository a high degree of trust is required,
as any collaborator with write access can alter the repository.

Gitosis provides tools to secure centralized Git repositories,
permitting multiple maintainers to manage the same project at once,
by restricting access to a secure network protocol only,
much like Perforce secures a repository by only permitting access
over its network port.

The Android Open Source Project (AOSP) was founded by Google with the
open source release of the Android operating system. AOSP has
selected Git as its primary version control tool. As many of the
engineers have a background of working with Mondrian at Google,
there is a strong desire to have the same (or better) feature set
available for Git and AOSP.

Gerrit Code Review started as a simple set of patches to Rietveld,
and was originally built to service AOSP. This quickly turned
into a fork as we added access control features that Guido van
Rossum did not want to see complicating the Rietveld code base. As
the functionality and code were starting to become drastically
different, a different name was needed. Gerrit calls back to the
original namesake of Rietveld, Gerrit Rietveld, a Dutch architect.

Gerrit 2.x is a complete rewrite of the Gerrit fork, changing the
implementation from Python on Google App Engine to Java
on a J2EE servlet container and a SQL database.

* link:http://video.google.com/videoplay?docid=-8502904076440714866[Mondrian Code Review On The Web]
* link:http://code.google.com/p/rietveld/[Rietveld - Code Review for Subversion]
* link:http://eagain.net/gitweb/?p=gitosis.git;a=blob;f=README.rst;hb=HEAD[Gitosis README]
* link:http://source.android.com/[Android Open Source Project]


Overview
--------

Developers create one or more changes on their local desktop system,
then upload them for review to Gerrit using the standard `git push`
command line program, or any GUI which can invoke `git push` on
behalf of the user. Authentication and data transfer are handled
through SSH. Users are authenticated by username and public/private
key pair, and all data transfer is protected by the SSH connection
and Git's own data integrity checks.

Each Git commit created on the client desktop system is converted
into a unique change record which can be reviewed independently.
Change records are stored in PostgreSQL, where they can be queried to
present customized user dashboards, enumerating any pending changes.

A summary of each newly uploaded change is automatically emailed
to reviewers, so they receive a direct hyperlink to review the
change on the web. Reviewer email addresses can be specified on the
`git push` command line, but typically reviewers are automatically
selected by Gerrit by identifying users who have change approval
permissions in the project.

Reviewers use the web interface to read the side-by-side or unified
diff of a change, and insert draft inline comments where appropriate.
A draft comment is visible only to its author until it is
published. Published comments are automatically emailed to
the change author by Gerrit, and are CC'd to all other reviewers
who have already commented on the change.

When publishing comments, reviewers are also given the opportunity
to score the change, indicating whether they feel the change is
ready for inclusion in the project, needs more work, or should be
rejected outright. These scores provide direct feedback to Gerrit's
change submit function.

After a change has been scored positively by reviewers, Gerrit
enables a submit button on the web interface. Authorized users
can push the submit button to have the change enter the project
repository. The equivalent in Subversion or Perforce would be
Gerrit invoking `svn commit` or `p4 submit` on behalf of
the web user pressing the button. Due to the way Git audit trails
are maintained, the user pressing the submit button does not need
to be the author of the change.


Infrastructure
--------------

End-user web browsers make HTTP requests directly to Gerrit's
HTTP server. As nearly all of the user interface is implemented
through Google Web Toolkit (GWT), the majority of these requests
transmit compressed JSON payloads, with all HTML being
generated within the browser. Most responses are under 1 KB.

Gerrit's HTTP server side component is implemented as a standard
Java servlet, and thus runs within any J2EE servlet container.
Popular choices for deployments would be Tomcat or Jetty, as these
are high-quality open-source servlet containers that are readily
available for download.

End-user uploads are performed over SSH, so Gerrit's servlets also
start up a background thread to receive SSH connections through
an independent SSH port. SSH clients communicate directly with
this port, bypassing the HTTP server used by browsers.

Server side data storage for Gerrit is broken down into two different
categories:

* Git repository data
* Gerrit metadata

The Git repository data is the Git object database used to store
already submitted revisions, as well as all uploaded (proposed)
changes. Gerrit uses the standard Git repository format, and
therefore requires direct filesystem access to the repositories.
All repository data is stored in the filesystem and accessed through
the JGit library. Repository data can be stored on remote servers
accessible through NFS or SMB, but the remote directory must
be mounted on the Gerrit server as part of the local filesystem
namespace. Remote filesystems are likely to perform worse than
local ones, due to Git disk IO behavior not being optimized for
remote access.

The Gerrit metadata contains a summary of the available changes,
all comments (published and drafts), and individual user account
information. The metadata is housed in a PostgreSQL database,
which can be located either on the same server as Gerrit, or on
a different (but nearby) server. Most installations would opt to
install both Gerrit and PostgreSQL on the same server, to reduce
administration overheads.

User authentication is handled by OpenID, and therefore Gerrit
requires that the OpenID provider selected by a user be
online and operating in order to authenticate that user.

* link:http://code.google.com/webtoolkit/[Google Web Toolkit (GWT)]
* link:http://www.kernel.org/pub/software/scm/git/docs/gitrepository-layout.html[Git Repository Format]
* link:http://www.postgresql.org/about/[About PostgreSQL]
* link:http://openid.net/developers/specs/[OpenID Specifications]


Project Information
-------------------

Gerrit is developed as a self-hosting open source project:

* link:http://code.google.com/p/gerrit/[Project Homepage]
* link:http://code.google.com/p/gerrit/downloads/list[Release Versions]
* link:http://code.google.com/p/gerrit/wiki/Source?tm=4[Source]
* link:http://code.google.com/p/gerrit/issues/list[Issue Tracking]
* link:https://review.source.android.com/[Change Review]


Internationalization and Localization
-------------------------------------

As a source code review system for open source projects, where the
preferred language for communication is typically English,
Gerrit does not make internationalization or localization a priority.

The majority of Gerrit's users will be writing change descriptions
and comments in English, and therefore an English user interface
is usable by the target user base.

Gerrit uses GWT's i18n support to externalize all constant strings
and messages shown to the user, so that in the future someone who
really needed a translated version of the UI could contribute new
string files for their locale(s).

Right-to-left (RTL) support is only barely considered within the
Gerrit code base. Some portions of the code have tried to take
RTL into consideration, while others probably need to be modified
before translating the UI to an RTL language.

* link:i18n-readme.html[Gerrit's i18n Support]


Accessibility Considerations
----------------------------

Whenever possible Gerrit displays raw text rather than image icons,
so screen readers should still be able to provide useful information
to blind persons accessing Gerrit sites.

Standard HTML hyperlinks are used rather than HTML div or span tags
with click listeners. This provides two benefits to the end-user.
The first benefit is that screen readers are optimized for locating
standard hyperlink anchors and presenting them to the end-user as
a navigation action. The second benefit is that users can use
the 'open in new tab/window' feature of their browser whenever
they choose.

When possible, Gerrit uses the ARIA properties on DOM widgets to
provide hints to screen readers.


Browser Compatibility
---------------------

Supporting non-JavaScript enabled browsers is a non-goal for Gerrit.

As Gerrit is a pure-GWT application with no server side rendering
fallbacks, the browser must support modern JavaScript semantics in
order to access the Gerrit web application. Dumb clients such as
`lynx`, `wget`, `curl`, or even many search engine spiders are not
able to access Gerrit content.

As Google Web Toolkit (GWT) is used to generate the browser
specific versions of the client-side JavaScript code, Gerrit works
on any JavaScript enabled browser which GWT can produce code for.
This covers the majority of the popular browsers.

The Gerrit project does not have the development resources necessary
to support two parallel UI implementations (GWT based JavaScript
and server-side rendering). Consequently only one is implemented.

There are a number of web browsers available with full JavaScript
support, and nearly every operating system (including any PDA-like
mobile phone) comes with one standard. Users who are committed
to developing changes for a Gerrit managed project can be expected
to be able to run a JavaScript enabled browser, as they also would
need to be running Git in order to contribute.

There are a number of open source browsers available, including
Firefox and Chromium. Users have some degree of choice in their
browser selection, including being able to build and audit their
browser from source.

The majority of the content stored within Gerrit is also available
through other means, such as gitweb or the `git://` protocol.
Any existing search engine spider can crawl the server-side HTML
produced by gitweb, and thus can index the majority of the changes
which might appear in Gerrit. Some engines may even choose to
crawl the native version control database, such as ohloh.net does.
Therefore the lack of support for most search engine spiders is a
non-issue for most Gerrit deployments.


Product Integration
-------------------

Gerrit integrates with an existing gitweb installation by optionally
creating hyperlinks to reference changes on the gitweb server.

Gerrit integrates with an existing git-daemon installation by
optionally displaying `git://` URLs for users to download a
change through the native Git protocol.

Gerrit integrates with any OpenID provider for user authentication,
making it easier for users to join a Gerrit site and manage their
authentication credentials to it. To make it easier to use Google
Accounts as an OpenID provider, Gerrit has a shorthand "Sign in with
a Google Account" link on its sign-in screen. Gerrit also supports
a shorthand sign in link for Yahoo!. Other providers may also be
supported more directly in the future.

Site administrators may limit the range of OpenID providers to
a subset of "reliable providers". Users may continue to use
any OpenID provider to publish comments, but granted privileges
are only available to a user if the only entry point to their
account is through the defined set of "reliable OpenID providers".
This permits site administrators to require HTTPS for OpenID,
and to use only large mainstream providers that are trustworthy,
or to require users to only use a custom OpenID provider installed
alongside Gerrit Code Review.
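
The "reliable providers" rule can be sketched as two checks. Ordinary participation accepts any OpenID identity. Granted privileges require that every identity linked to the account comes from the trusted set, optionally over HTTPS as the text allows. The host names, the function names, and the representation of an account as a list of identity URLs are hypothetical illustrations. They are not Gerrit's configuration format or code.

```python
from urllib.parse import urlparse

# Hypothetical "reliable provider" hosts; a real site would take these
# from its own configuration.
TRUSTED_PROVIDER_HOSTS = {"openid.example.com", "id.example.org"}


def is_trusted(identity_url: str, trusted_hosts: set) -> bool:
    """True if an OpenID identity is served over HTTPS by a trusted host."""
    parsed = urlparse(identity_url)
    return parsed.scheme == "https" and parsed.hostname in trusted_hosts


def may_publish_comments(identities: list) -> bool:
    """Any OpenID provider is good enough for ordinary participation."""
    return len(identities) > 0


def may_use_granted_privileges(identities: list, trusted_hosts: set) -> bool:
    """Privileges apply only if *every* entry point to the account is trusted."""
    return len(identities) > 0 and all(
        is_trusted(identity, trusted_hosts) for identity in identities
    )


if __name__ == "__main__":
    alice = ["https://openid.example.com/id/alice"]
    bob = ["https://openid.example.com/id/bob", "http://blog.example.net/bob"]
    print(may_use_granted_privileges(alice, TRUSTED_PROVIDER_HOSTS))  # True
    print(may_publish_comments(bob))                                  # True
    print(may_use_granted_privileges(bob, TRUSTED_PROVIDER_HOSTS))    # False
```

In this model, linking a single untrusted identity to an account is enough to withhold privileges, because that identity is another way into the account.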

Gerrit integrates with some types of corporate single-sign-on (SSO)
solutions, typically by having the SSO authentication be performed
in a reverse proxy web server and then blindly trusting that all
incoming connections have been authenticated by that reverse proxy.
When configured to use this form of authentication, Gerrit does
not integrate with OpenID providers.

When installing Gerrit, administrators may optionally include an
HTML header or footer snippet which may include user tracking code,
such as that used by Google Analytics. This is a per-instance
configuration that must be done by hand, and is not supported
out of the box. Site trackers other than Google Analytics can
also be used, as the administrator can supply any HTML/JavaScript
they choose.

Gerrit does not integrate with any Google service, or with any
services other than those listed above.


Standards / Developer APIs
--------------------------

Gerrit uses an XSRF protected variant of JSON-RPC 1.1 to communicate
between the browser client and the server.

As the protocol is not the GWT-RPC protocol, but is instead a
self-describing standard JSON format, it is easily implemented by
any 3rd party client application, provided the client has a JSON
parser and HTTP client library available.
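
The following sketch shows how little a third-party client needs. It builds a JSON-RPC 1.1 style request body with Python's standard `json` module and wraps it in a `urllib.request.Request`. The endpoint URL, the method name, and the `xsrfKey` member used to carry the XSRF token are all assumptions made for illustration. The XSRF JSON-RPC README is the authority on the actual wire format.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical endpoint; real service paths are defined by the server.
ENDPOINT = "https://review.example.com/gerrit/rpc/ExampleService"


def build_jsonrpc_body(method: str, params: list, request_id: int,
                       xsrf_token: Optional[str] = None) -> bytes:
    """Encode a JSON-RPC 1.1 style request object as UTF-8 JSON.

    "xsrfKey" is an assumed member name for the XSRF token, not a
    confirmed part of Gerrit's protocol.
    """
    body = {"version": "1.1", "method": method, "params": params, "id": request_id}
    if xsrf_token is not None:
        body["xsrfKey"] = xsrf_token
    return json.dumps(body).encode("utf-8")


def build_http_request(method: str, params: list, request_id: int,
                       xsrf_token: Optional[str] = None,
                       cookie: Optional[str] = None) -> urllib.request.Request:
    """Wrap the body in a POST; the session cookie comes from signing in."""
    request = urllib.request.Request(
        ENDPOINT,
        data=build_jsonrpc_body(method, params, request_id, xsrf_token),
        method="POST",
    )
    request.add_header("Content-Type", "application/json; charset=utf-8")
    if cookie is not None:
        request.add_header("Cookie", cookie)
    # Sending is left to the caller, e.g. urllib.request.urlopen(request).
    return request


if __name__ == "__main__":
    req = build_http_request("exampleMethod", [42], 1, xsrf_token="token")
    print(req.get_method(), json.loads(req.data)["version"])  # POST 1.1
```

The sketch deliberately stops short of sending anything. Obtaining the session cookie still requires the OpenID sign-in cycle, which a non-browser agent may not be able to automate.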

As the entire command set necessary for the standard web browser
based UI is exposed through JSON-RPC over HTTP, there are no other
data feeds or command interfaces to the server.

Commands requiring user authentication may require the user agent to
complete a sign-in cycle through the user's OpenID provider in order
to establish the HTTP cookie Gerrit uses to track user identity.
Automating this sign-in process for non-web browser agents is
outside the scope of Gerrit, as each OpenID provider uses its own
sign-in sequence. OpenID providers whose sign-in interfaces are
difficult to automate may make it impossible to use non-browser
agents with the JSON-RPC interface.

* link:http://json-rpc.org/wd/JSON-RPC-1-1-WD-20060807.html[JSON-RPC 1.1]
* link:http://android.git.kernel.org/?p=tools/gwtjsonrpc.git;a=blob;f=README;hb=HEAD[XSRF JSON-RPC]


Privacy Considerations
----------------------

Gerrit stores the following information per user account:

* Full Name
* Preferred Email Address
* Mailing Address '(Optional, Encrypted)'
* Country '(Optional, Encrypted)'
* Phone Number '(Optional, Encrypted)'
* Fax Number '(Optional, Encrypted)'

The full name and preferred email address fields are shown to any
site visitor viewing a page containing a change uploaded by the
account owner, or containing a published comment written by the
account owner.

Showing the full name and preferred email is approximately the same
risk as the `From` header of an email posted to a public mailing
list that maintains archives, and Gerrit treats these fields in
much the same way that a mailing list archive might handle them.
Users who don't want to expose this information should either not
participate in a Gerrit based online community, or open a new email
address dedicated for this use.

As the Gerrit UI data is only available through XSRF protected
JSON-RPC calls, "screen-scraping" for email addresses is difficult,
but not impossible. It is unlikely a spammer will go through the
effort required to code a custom scraping application necessary
to cull email addresses from published Gerrit comments. In most
cases these same addresses would be more easily obtained from the
project's mailing list archives.

The user's name and email address are stored unencrypted in the
Gerrit metadata store, typically a PostgreSQL database.

The snail-mail mailing address, country, and phone and fax numbers
are gathered to help project leads contact the user should there
be a legal question regarding any change they have uploaded.

These sensitive fields are immediately encrypted upon receipt with
a GnuPG public key, and stored "off site" in another data store,
isolated from the main Gerrit change data. Gerrit does not have
access to the matching private key, and as such cannot decrypt the
information. Therefore these fields are write-once in Gerrit, as not
even the account owner can recover the values they previously stored.

It is expected that the address information would only need to be
decrypted and revealed with a valid court subpoena, but when it is
reasonable to reveal this information to a 3rd party is left to
the discretion of the Gerrit site administrator.


Spam and Abuse Considerations
-----------------------------

Gerrit makes no attempt to detect spam changes or comments. The
somewhat high barrier to entry makes it unlikely that a spammer
will target Gerrit.

To upload a change, the client must speak the native Git protocol
embedded in SSH, with some custom Gerrit semantics added on top.
The client must have its public key already stored in the Gerrit
database, which can only be done through the XSRF protected
JSON-RPC interface. The level of effort required to construct
the necessary tools to upload a well-formatted change that isn't
rejected outright by the Git and Gerrit checksum validations is
too high for a spammer to get any meaningful return.

To post and publish a comment, a client must sign in with an OpenID
provider and then use the XSRF protected JSON-RPC interface to
publish the draft on an existing change record. Again, the level of
effort required to implement the Gerrit specific XSRF protections
and the JSON-RPC payload format necessary to post a draft and then
publish that draft is simply too high for a spammer to bother with.

Both of these assumptions are also based upon the idea that Gerrit
will be a lot less popular than blog software, and thus will be
running on far fewer websites. Spammers therefore have very little
return on the effort of getting over the protocol hurdles.

These assumptions may need to be revisited in the future if any
public Gerrit site actually notices spam.


Latency
-------

Gerrit targets sub-250 ms per page request, mostly by using
very compact JSON payloads between client and server. However, as
most of the serving stack (network, hardware, PostgreSQL metadata
database) is outside the control of the Gerrit developers, no real
guarantees can be made about latency.


Scalability
-----------

Gerrit is designed for a very large scale open source project, or
large commercial development project. Roughly this amounts to
parameters such as the following:

.Design Parameters
[options="header"]
|======================================================
|Parameter        | Default Maximum | Estimated Maximum
|Projects         | 1,000           | 10,000
|Contributors     | 1,000           | 50,000
|Changes/Day      | 100             | 2,000
|Revisions/Change | 20              | 20
|Files/Change     | 50              | 16,000
|Comments/File    | 100             | 100
|Reviewers/Change | 8               | 8
|======================================================

Out of the box, Gerrit will handle the "Default Maximum". Site
administrators may reconfigure their servers by editing `gerrit.config`
to run closer to the estimated maximum if sufficient memory is made
available to the JVM and the relevant `cache.*.memoryLimit` variables
are increased from their defaults.

Discussion
~~~~~~~~~~

Very few, if any, open source projects have more than a handful of
Git repositories associated with them. Since Gerrit treats each
Git repository as a project, an upper limit of 10,000 projects
is reasonable. If a site has more than 1,000 projects, administrators
should increase
link:config-gerrit.html#cache.name.memoryLimit[`cache.projects.memoryLimit`]
to match.

Almost no open source project has 1,000 contributors over all time,
let alone on a daily basis. This default figure of 1,000 was WAG'd by
looking at PR statements published by cell phone companies picking
up the Android operating system. If all of the stated employees in
those PR statements were working on *only* the open source Android
repositories, we might reach the 1,000 estimate listed here. Given
these companies' history of being very closed-source minded, it
is very unlikely all of their Android engineers will be working on
the open source repository, and thus 1,000 is a very high estimate.

The upper maximum of 50,000 contributors is based on existing
installations that are already handling quite a bit more than the
default maximum of 1,000 contributors. Given how the user data is
stored and indexed, supporting 50,000 contributor accounts (or more)
is easily within the capacity of a single server. If a server has
more than 1,000 *active* contributors,
link:config-gerrit.html#cache.name.memoryLimit[`cache.accounts.memoryLimit`]
should be increased by the site administrator, if sufficient RAM
is available to the host JVM.

The estimate of 100 changes per day was WAG'd off some estimates
originally obtained from Android's development history. Writing a
good change that will be accepted through a peer-review process
takes time. The average engineer may need 4-6 hours per change just
to write the code and unit tests. Proper design consideration and
additional but equally important tasks such as meetings, interviews,
training, and eating lunch will often pad the engineer's day out
such that suitable changes are only posted once a day, or once
every other day. For reference, the entire Linux kernel has an
average of only 79 changes/day. If more than 100 changes are active
per day, site administrators should consider increasing the
link:config-gerrit.html#cache.name.memoryLimit[`cache.diff.memoryLimit`]
and `cache.diff_intraline.memoryLimit`.

On average any given change will need to be modified once to address
peer review comments before the final revision can be accepted by the
project. Executing these revisions also eats into the contributor's
time, and is another factor limiting the number of changes/day
accepted by the Gerrit instance. However, even though this implies
only 2 revisions/change, many existing Gerrit installations have seen
20 or more revisions/change while new contributors are learning the
project's style and conventions.

On average, each change will have 2 reviewers, a human and an
automated test bed system. The human reviewer is usually the project
lead, or someone who is familiar with the code being modified. The time
required to comment further reduces the time available for writing
one's own changes. However, existing Gerrit installations have seen 8
or more reviewers frequently show up on changes that impact many
functional areas, and therefore it is reasonable to expect 8 or more
reviewers to be able to work together on a single change.

Existing installations have successfully processed change reviews with
more than 16,000 files per change. However, since 16,000 modified/new
files is a massive amount of code to review, it is more typical to see
fewer than 10 files modified in any single change. Changes larger than
10 files are typically merges, for example integrating the latest
version of an upstream library, where the reviewer has little to do
beyond verifying the project compiles and passes a test suite.
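
The tuning advice in this discussion reduces to a small decision function. It compares a site's expected load against the default maximums from the Design Parameters table and reports which `cache.*.memoryLimit` settings the text suggests raising. This sketch covers only the guidance. It does not choose values. The `SiteLoad` type and the function name are hypothetical.

```python
from dataclasses import dataclass

# Default maximums taken from the Design Parameters table.
DEFAULT_MAX_PROJECTS = 1_000
DEFAULT_MAX_CONTRIBUTORS = 1_000
DEFAULT_MAX_CHANGES_PER_DAY = 100


@dataclass
class SiteLoad:
    """Hypothetical description of a site's expected load."""
    projects: int
    active_contributors: int
    changes_per_day: int


def caches_to_enlarge(load: SiteLoad) -> list:
    """List the gerrit.config keys the discussion suggests raising."""
    keys = []
    if load.projects > DEFAULT_MAX_PROJECTS:
        keys.append("cache.projects.memoryLimit")
    if load.active_contributors > DEFAULT_MAX_CONTRIBUTORS:
        keys.append("cache.accounts.memoryLimit")
    if load.changes_per_day > DEFAULT_MAX_CHANGES_PER_DAY:
        keys += ["cache.diff.memoryLimit", "cache.diff_intraline.memoryLimit"]
    return keys


if __name__ == "__main__":
    print(caches_to_enlarge(SiteLoad(projects=1_500, active_contributors=200,
                                     changes_per_day=79)))
    # ['cache.projects.memoryLimit']
```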

CPU Usage - Web UI
~~~~~~~~~~~~~~~~~~

Gerrit's web UI would require on average `4+F+F*C` HTTP requests to
review a change and post comments. Here `F` is the number of files
modified by the change, and `C` is the number of inline comments left
by the reviewer per file. The constant 4 accounts for the requests
to load the reviewer's dashboard, to load the change detail page,
to publish the review comments, and to reload the change detail
page after comments are published.

At the estimated maximum of 2,000 changes/day, with an average of
2 revisions and 1 reviewer per change, `F` = 10 and `C` = 4, this
WAG'd estimate boils down to 216,000 HTTP requests per day (QPD).
Assuming these are evenly distributed over an 8 hour work day in a
single time zone, we are looking at approximately 7.5 queries per
second (QPS).

----
 QPD = Changes_Day * Revisions_Change * Reviewers_Change * (4 + F + F * C)
     = 2,000 * 2 * 1 * (4 + 10 + 10 * 4)
     = 216,000
 QPS = QPD / 8_Hours / 60_Minutes / 60_Seconds
     = 7.5
----
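
The arithmetic in the listing above can be reproduced in a few lines. This sketch implements the QPD formula directly and spreads the result over an 8 hour day. It uses the listing's inputs: 2 revisions and 1 reviewer per change, `F` = 10 and `C` = 4. The function names are hypothetical.

```python
def review_requests_per_day(changes_per_day: int, revisions_per_change: int,
                            reviewers_per_change: int, files_per_change: int,
                            comments_per_file: int) -> int:
    """QPD = Changes_Day * Revisions_Change * Reviewers_Change * (4 + F + F * C)."""
    f, c = files_per_change, comments_per_file
    requests_per_review = 4 + f + f * c
    return (changes_per_day * revisions_per_change
            * reviewers_per_change * requests_per_review)


def queries_per_second(qpd: int, work_hours: int = 8) -> float:
    """Spread the daily total evenly over one work day in a single time zone."""
    return qpd / (work_hours * 60 * 60)


if __name__ == "__main__":
    qpd = review_requests_per_day(2_000, 2, 1, 10, 4)
    print(qpd, queries_per_second(qpd))      # 216000 7.5
    linux = review_requests_per_day(79, 2, 1, 10, 4)
    print(linux, queries_per_second(linux))  # 8532 0.29625
```

With the Linux kernel rate of 79 changes/day, the same inputs give 8,532 requests per day and 0.29625 QPS.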

Gerrit serves most requests in under 60 ms when using the loopback
interface and a single processor. On a single CPU system there is
therefore sufficient capacity for about 16 QPS (1,000 ms / 60 ms). A
dual processor system should be more than sufficient for a site with
the estimated load described above.
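
The capacity claim follows from dividing one second by the per-request service time. This minimal sketch assumes requests are CPU bound, take a fixed 60 ms each, and arrive evenly. Real traffic is burstier. The `headroom` multiplier and the function names are illustrative additions, not figures from the text.

```python
import math


def per_core_capacity(ms_per_request: float) -> float:
    """Requests per second one core can serve at a fixed service time."""
    return 1000 / ms_per_request


def cores_needed(qps: float, ms_per_request: float, headroom: float = 1.0) -> int:
    """Whole cores needed for qps, scaled by an illustrative headroom factor."""
    return math.ceil(qps * headroom / per_core_capacity(ms_per_request))


if __name__ == "__main__":
    print(per_core_capacity(60))              # about 16.7 requests/second
    print(cores_needed(7.5, 60))              # 1
    print(cores_needed(7.5, 60, headroom=3))  # 2
```

Under these assumptions, even three times the estimated average load fits on two cores. That is consistent with the dual processor recommendation.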

A more realistic estimate of 79 changes per day (from the
Linux kernel) suggests only 8,532 queries per day, or approximately
0.3 QPS when spread out over an 8 hour work day.

CPU Usage - Git over SSH/HTTP
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A 24 core server is able to handle ~25 concurrent `git fetch`
operations per second. The limiting factor is that each concurrent
operation demands one full core, as the computation is almost entirely
server side CPU bound. 25 concurrent operations is known to be
sufficient to support hundreds of active developers and 50 automated
build servers polling for updates and building every change. (This
data was derived from an actual installation's performance.)

Because of the distributed nature of Git, end-users don't need to
contact the central Gerrit Code Review server very often. For
`git fetch` traffic, link:pgm-daemon.html[slave mode] is known to be
an effective way to offload traffic from the main server, permitting
it to scale to a large user base without needing an excessive number
of cores in a single system.

Clients on very slow network connections (for example home office
users on VPN over home DSL) may be network bound rather than server
side CPU bound, in which case a core may be effectively shared with
another user. Possible core sharing due to network bottlenecks
generally holds true for network connections running below 10 MiB/sec.

If the server's own network interface is 1 Gbit/sec (Gigabit Ethernet),
the system can really only serve about 10 concurrent clients at the
10 MiB/sec speed, no matter how many cores it has.
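
The interaction between cores and network bandwidth can be modeled as the smaller of two limits. One limit is one core per CPU-bound fetch. The other is the number of clients the network interface can feed at a given per-client rate. This illustrative model ignores protocol overhead. It converts the decimal-bit NIC rate and the binary-byte client rate explicitly. The function names are hypothetical.

```python
import math

GIGABIT_ETHERNET = 1_000_000_000    # bits per second
TEN_MIB_PER_SEC = 10 * 1024 * 1024  # bytes per second


def clients_per_nic(nic_bits_per_sec: float, client_bytes_per_sec: float) -> float:
    """How many clients the NIC can feed at a given per-client rate."""
    return (nic_bits_per_sec / 8) / client_bytes_per_sec


def max_concurrent_fetches(cores: int, nic_bits_per_sec: float,
                           client_bytes_per_sec: float) -> int:
    """Each CPU-bound fetch needs a core, but the NIC may run out first."""
    network_limit = math.floor(clients_per_nic(nic_bits_per_sec, client_bytes_per_sec))
    return min(cores, network_limit)


if __name__ == "__main__":
    print(round(clients_per_nic(GIGABIT_ETHERNET, TEN_MIB_PER_SEC), 1))  # 11.9
    print(max_concurrent_fetches(24, GIGABIT_ETHERNET, TEN_MIB_PER_SEC))  # 11
```

Before any protocol overhead, the arithmetic gives roughly 12 clients per Gigabit interface, in line with the round figure of about 10 above. For the 24 core server, the network rather than the CPU count caps concurrency.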
Shawn O. Pearcec4bcc092009-02-06 12:32:57 -0800603
604Disk Usage
605~~~~~~~~~~
606
The average size of a revision in the Linux kernel once compressed by
Git is 2,327 bytes, or roughly 2 KiB. Over the course of a year a
Gerrit server running with the estimated maximum parameters above might
grow by about 1.4 GiB across the total set of 10,000 projects
hosted in that server. This figure assumes the majority of the content
is human-written source code, and not large binary blobs such as disk
images or media files.
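
A back-of-the-envelope version of this estimate multiplies the number of new revisions by the average compressed revision size. The yearly revision count is not given in this section, so the sketch below only works backwards from the two stated figures (2,327 bytes and 1.4 GiB) to show roughly how many revisions that growth represents. The helper names are hypothetical.

```python
# Back-of-the-envelope disk growth estimate for Git repository data.
# AVG_REVISION_BYTES is the Linux kernel figure quoted above; everything
# else is illustrative arithmetic, not a Gerrit API.
AVG_REVISION_BYTES = 2327
GIB = 1024 ** 3


def yearly_growth_gib(revisions_per_year: int,
                      avg_bytes: int = AVG_REVISION_BYTES) -> float:
    """Estimated repository growth in GiB for one year of revisions."""
    return revisions_per_year * avg_bytes / GIB


def revisions_for_growth(growth_gib: float,
                         avg_bytes: int = AVG_REVISION_BYTES) -> int:
    """Number of revisions that would produce the given growth in GiB."""
    return round(growth_gib * GIB / avg_bytes)


if __name__ == "__main__":
    implied = revisions_for_growth(1.4)
    print(implied)                               # roughly 646,000 revisions
    print(round(yearly_growth_gib(implied), 2))  # 1.4
```

Spread over 365 days, that is on the order of 1,770 revisions per day across all 10,000 projects.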

Production Gerrit installations have been tested and are known to
handle Git repositories in the multigigabyte range that store binary
files ranging in size from a few kilobytes (for example compressed
icons) to 800+ megabytes (firmware images, large uncompressed original
artwork files). Best practices encourage splitting very large binary
files into their own Git repositories based on access, to prevent
desktop clients from needing to clone unnecessary materials (for
example a C developer does not need every 800+ megabyte firmware image
created by the product's quality assurance team).

Redundancy & Reliability
------------------------

Gerrit largely assumes that the local filesystem where Git repository
data is stored is always available. Important data written to disk
is also forced to the platter with an `fsync()` once it has been
fully written. If the local filesystem fails to respond to reads
or becomes corrupt, Gerrit has no provisions to fall back or retry,
and errors will be returned to clients.
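
The write-then-`fsync()` ordering described above is a general durability pattern. Gerrit and JGit are written in Java and the following is not their code; this Python sketch, built around a hypothetical `write_durably` helper, only illustrates the idea. It also writes through a temporary file and a rename, a common companion technique, so that a crash never leaves a half-written file in place.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Write data to path, forcing it to stable storage before returning.

    The bytes go to a temporary file in the same directory, which is
    flushed and fsync()ed before being renamed over the target, so a
    reader never sees a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to write it to the disk
        os.replace(tmp_path, path)  # rename over the target (same filesystem)
    except BaseException:
        # Clean up the temporary file if anything went wrong, then re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # On POSIX systems, also fsync the directory so the rename is durable.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Note that `fsync()` only protects against crashes. As the paragraph above says, it cannot help if the filesystem itself stops responding or becomes corrupt.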

Gerrit largely assumes that the metadata PostgreSQL database is
online and answering both read and write queries. Query failures
immediately result in the operation aborting and errors being
returned to the client, with no retry or fallback provisions.

Due to the relatively small scale described above, it is very likely
that the Git filesystem and PostgreSQL based metadata database
are housed on the same server that is running Gerrit. If any
failure arises in one of these components, it is likely to manifest
in the others too. It is also likely that the administrator cannot
be bothered to deploy a cluster of load-balanced server hardware,
as the scale and expected load do not justify the hardware or
management costs.

Most deployments caring about reliability will set up a warm-spare
standby system and use a manual fail-over process to switch from the
failed system to the warm-spare.

As Git is a distributed version control system, and open source
projects tend to have contributors from all over the world, most
contributors will be able to tolerate a Gerrit downtime of several
hours while the administrator is notified, signs on, and brings the
warm-spare up. Pending changes are likely to need at least 24 hours
on the Gerrit site anyway in order to ensure any interested parties
around the world have had a chance to comment. This expected lag
largely allows for some downtime in a disaster scenario.

Backups
~~~~~~~

PostgreSQL can be configured to save its write-ahead log (WAL)
and ship these logs to other systems, where they are applied to
a warm-standby backup in real time. Gerrit instances which care
about redundancy will set up this feature of PostgreSQL to ensure
the warm-standby is reasonably current should the master go offline.
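
A minimal sketch of the PostgreSQL side of this setup, assuming PostgreSQL 9.6 or later and file-based log shipping through an archive directory that both machines can reach. The directory path is a placeholder, and where the standby settings live depends on the PostgreSQL release, so check the PostgreSQL documentation on continuous archiving and log-shipping standby servers for the version in use.

```
# postgresql.conf on the primary (assumes PostgreSQL 9.6 or later).
# Changing wal_level or archive_mode requires a server restart.
wal_level = replica
archive_mode = on
# Copy each completed WAL segment to the archive.
# /mnt/server/archivedir is a placeholder path.
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

# On the standby (postgresql.conf plus a standby.signal file on
# PostgreSQL 12 and later, recovery.conf on older releases):
restore_command = 'cp /mnt/server/archivedir/%f %p'
```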

Gerrit can be configured to replicate changes made to the local
Git repositories over any standard Git transports. This can be
configured in `'$site_path'/etc/replication.conf` to send copies
of all changes over SSH to other servers, or to the Amazon S3 blob
storage service.
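
A minimal `replication.conf` sketch that pushes every branch and tag of every project to one mirror over SSH. The remote name, user, host and path are placeholders, and `${name}` is replaced with each project's name. The full set of supported keys varies between Gerrit versions, so consult the replication documentation for the version in use.

```
# $site_path/etc/replication.conf
# "mirror", gerrit2@mirror.example.com and /srv/git are placeholders.
[remote "mirror"]
  url = gerrit2@mirror.example.com:/srv/git/${name}.git
  push = +refs/heads/*:refs/heads/*
  push = +refs/tags/*:refs/tags/*
```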


Logging Plan
------------

Gerrit does not maintain logs on its own.

Published comments contain a publication date, so users can judge
when the comment was posted and decide if it was "recent" or not.
Only the timestamp is stored in the database; the IP address of
the comment author is not stored.

Changes uploaded over the SSH daemon from `git push` have the
standard Git reflog updated with the date and time that the upload
occurred, and the Gerrit account identity of who did the upload.
Changes submitted and merged into a branch also update the
Git reflog. These logs are available only to the Gerrit site
administrator, and they are not replicated through the automatic
replication noted earlier. These logs are primarily recorded for an
"oh s**t" moment where the administrator has to rewind data. In most
installations they are a waste of disk space. Future versions of
JGit may allow disabling these logs, and Gerrit may take advantage
of that feature to stop writing these logs.
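
Each reflog entry is one line in a file under the repository's `logs/` directory (for example `logs/refs/heads/master`), and `git reflog` displays the same data. The sketch below parses one such line using the standard Git reflog layout: old and new object IDs, an identity, a Unix timestamp with time zone, and an optional tab-separated message. The sample entry used to exercise it is made up; the exact identity and message text Gerrit writes are not specified here.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import NamedTuple

# old-id SP new-id SP name SP <email> SP epoch-seconds SP tz [TAB message]
REFLOG_LINE = re.compile(
    r"^(?P<old>[0-9a-f]{40,64}) (?P<new>[0-9a-f]{40,64}) "
    r"(?P<name>.*?) <(?P<email>[^>]*)> (?P<epoch>\d+) (?P<tz>[+-]\d{4})"
    r"(?:\t(?P<message>.*))?$"
)


class ReflogEntry(NamedTuple):
    old_id: str
    new_id: str
    who: str
    email: str
    when: datetime
    message: str


def parse_reflog_line(line: str) -> ReflogEntry:
    """Parse one line of a Git reflog file into its fields."""
    match = REFLOG_LINE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a reflog line: {line!r}")
    tz = match["tz"]
    sign = 1 if tz[0] == "+" else -1
    offset = timedelta(hours=int(tz[1:3]), minutes=int(tz[3:5])) * sign
    # Keep the recorded time zone so the entry shows the uploader's local time.
    when = datetime.fromtimestamp(int(match["epoch"]), timezone(offset))
    return ReflogEntry(match["old"], match["new"], match["name"],
                       match["email"], when, match["message"] or "")
```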

A web server positioned in front of Gerrit (such as a reverse proxy)
or the hosting servlet container may record access logs, and these
logs may be mined for usage information. This is outside of the
scope of Gerrit.
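
As an example of such mining, the sketch below counts successful requests per path, assuming the front-end server writes the widely used Common Log Format. The log format, the sample lines used to exercise it and the `count_requests_by_path` helper are all assumptions; Gerrit does not produce these logs itself.

```python
import re
from collections import Counter
from typing import Iterable

# host ident authuser [date] "METHOD path PROTOCOL" status bytes
CLF_LINE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_requests_by_path(lines: Iterable[str]) -> Counter:
    """Count successful (2xx) requests per path, skipping malformed lines."""
    counts: Counter = Counter()
    for line in lines:
        match = CLF_LINE.match(line)
        if match and match["status"].startswith("2"):
            counts[match["path"]] += 1
    return counts
```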


Testing Plan
------------

Gerrit is currently manually tested through its web UI.

JGit has a fairly extensive automated unit test suite. Most new
changes to JGit are rejected unless corresponding automated unit
tests are included.


Caveats
-------

Rietveld can't be used as it does not provide the "submit over the
web" feature that Gerrit provides for Git.

Gitosis can't be used as it does not provide any code review
features, but it does provide basic access controls.

Email based code review does not scale to a project as large and
complex as Android. Most contributors at least need some sort of
dashboard to keep track of any pending reviews, and some way to
correlate updated revisions back to the comments written on prior
revisions of the same logical change.

GERRIT
------
Part of link:index.html[Gerrit Code Review]