author Shawn O. Pearce <sop@google.com> 2011-04-12 19:02:16 -0700
committer Android Code Review <code-review@android.com> 2011-04-12 19:02:16 -0700
commit ac66dfa4c64fc286eac15ff34da0ed93c73183ba
tree 27f71f01eee1379089f7d33d99c618e762a1d130
parent 7cc35242f92fc9304144bcff4f34960b355d6b31
parent 0825581d884fac1c1fa78814e195f361ffcbef71
Merge "documentation: Update system scaling data"
-rw-r--r-- Documentation/dev-design.txt | 189
1 files changed, 132 insertions, 57 deletions
diff --git a/Documentation/dev-design.txt b/Documentation/dev-design.txt
index 861fbe0d83..571ec6cdb2 100644
--- a/Documentation/dev-design.txt
+++ b/Documentation/dev-design.txt
@@ -446,43 +446,61 @@ guarantees can be made about latency.
Scalability
-----------
-Gerrit is designed for an open source project. Roughly this
-amounts to parameters such as the following:
+Gerrit is designed for a very large scale open source project, or
+large commercial development project. Roughly this amounts to
+parameters such as the following:
.Design Parameters
[options="header"]
-|====================================
-|Parameter | Estimated Maximum
-|Projects | 500
-|Contributors | 2,000
-|Changes/Day | 400
-|Revisions/Change | 2.0
-|Files/Change | 4
-|Comments/File | 2
-|Reviewers/Change | 1.0
-|====================================
-
-CPU Usage
-~~~~~~~~~
+|======================================================
+|Parameter | Default Maximum | Estimated Maximum
+|Projects | 1,000 | 10,000
+|Contributors | 1,000 | 50,000
+|Changes/Day | 100 | 2,000
+|Revisions/Change | 20 | 20
+|Files/Change | 50 | 16,000
+|Comments/File | 100 | 100
+|Reviewers/Change | 8 | 8
+|======================================================
+
+Out of the box, Gerrit will handle the "Default Maximum". Site
+administrators may reconfigure their servers by editing gerrit.config
+to run closer to the estimated maximum if sufficient memory is made
+available to the JVM and the relevant cache.*.memoryLimit variables
+are increased from their defaults.
+
+Discussion
+~~~~~~~~~~
Very few, if any open source projects have more than a handful of
-Git repositories associated with them. Since Gerrit treats one
-Git repository as a project, an assumed limit of 500 projects
-is reasonable. Only an operating system distribution project
-would really need to be tracking more than a handful of discrete
-Git repositories.
-
-Almost no open source project has 2,000 contributors over all time,
-let alone on a daily basis. This figure of 2,000 was WAG'd by
+Git repositories associated with them. Since Gerrit treats each
+Git repository as a project, an upper limit of 10,000 projects
+is reasonable. If a site has more than 1,000 projects, administrators
+should increase
+link:config-gerrit.html#cache.name.memoryLimit[`cache.projects.memoryLimit`]
+to match.
+
+Almost no open source project has 1,000 contributors over all time,
+let alone on a daily basis. This default figure of 1,000 was WAG'd by
looking at PR statements published by cell phone companies picking
up the Android operating system. If all of the stated employees in
those PR statements were working on *only* the open source Android
-repositories, we might reach the 2,000 estimate listed here. Knowing
+repositories, we might reach the 1,000 estimate listed here. Knowing
these companies as being very closed-source minded in the past, it
is very unlikely all of their Android engineers will be working on
-the open source repository, and thus 2,000 is a very high estimate.
-
-The estimate of 400 changes per day was WAG'd off some estimates
+the open source repository, and thus 1,000 is a very high estimate.
+
+The upper maximum of 50,000 contributors is based on existing
+installations that are already handling quite a bit more than the
+default maximum of 1,000 contributors. Given how the user data is
+stored and indexed, supporting 50,000 contributor accounts (or more)
+is easily possible for a server. If a server has more than 1,000
+*active* contributors,
+link:config-gerrit.html#cache.name.memoryLimit[`cache.accounts.memoryLimit`]
+should be increased by the site administrator, if sufficient RAM
+is available to the host JVM.
+
+The estimate of 100 changes per day was WAG'd off some estimates
originally obtained from Android's development history. Writing a
good change that will be accepted through a peer-review process
takes time. The average engineer may need 4-6 hours per change just
@@ -491,20 +509,39 @@ additional but equally important tasks such as meetings, interviews,
training, and eating lunch will often pad the engineer's day out
such that suitable changes are only posted once a day, or once
every other day. For reference, the entire Linux kernel has an
-average of only 79 changes/day.
-
-The estimate of 2 revisions/change means that on average any
-given change will need to be modified once to address peer review
-comments before the final revision can be accepted by the project.
-Executing these revisions also eats into the contributor's time,
-and is another factor limiting the number of changes/day accepted
-by the Gerrit instance.
-
-The estimate of 1 reviewer/change means that on average only one
-person will comment on a change. Usually this would be the project
-lead, or someone who is familiar with the code being modified.
-The time required to comment further reduces the time available
-for writing one's own changes.
+average of only 79 changes/day. If more than 100 changes are active
+per day, site administrators should consider increasing the
+link:config-gerrit.html#cache.name.memoryLimit[`cache.diff.memoryLimit`]
+and `cache.diff_intraline.memoryLimit`.
+
+On average any given change will need to be modified once to address
+peer review comments before the final revision can be accepted by the
+project. Executing these revisions also eats into the contributor's
+time, and is another factor limiting the number of changes/day
+accepted by the Gerrit instance. However, even though this implies
+only 2 revisions/change, many existing Gerrit installations have seen
+20 or more revisions/change, when new contributors are learning the
+project's style and conventions.
+
+On average, each change will have 2 reviewers, a human and an
+automated test bed system. Usually this would be the project lead, or
+someone who is familiar with the code being modified. The time
+required to comment further reduces the time available for writing
+one's own changes. However, existing Gerrit installations have seen 8
+or more reviewers frequently show up on changes that impact many
+functional areas, and therefore it is reasonable to expect 8 or more
+reviewers to be able to work together on a single change.
+
+Existing installations have successfully processed change reviews with
+more than 16,000 files per change. However, since 16,000 modified/new
+files is a massive amount of code to review, it is more typical to see
+less than 10 files modified in any single change. Changes larger than
+10 files are typically merges, for example integrating the latest
+version of an upstream library, where the reviewer has little to do
+beyond verifying the project compiles and passes a test suite.
+
+CPU Usage - Web UI
+~~~~~~~~~~~~~~~~~~
Gerrit's web UI would require on average `4+F+F*C` HTTP requests to
review a change and post comments. Here `F` is the number of files
@@ -514,38 +551,76 @@ to load the reviewer's dashboard, to load the change detail page,
to publish the review comments, and to reload the change detail
page after comments are published.
-This WAG'd estimate boils down to <12,800 HTTP requests per day
+This WAG'd estimate boils down to 216,000 HTTP requests per day
(QPD). Assuming these are evenly distributed over an 8 hour work day
-in a single time zone, we are looking at approximately 0.43 queries
+in a single time zone, we are looking at approximately 7.5 queries
per second (QPS).
----
- QPD = Changes_Day * Revisions_Change * Reviewers_Change * (4 + F + F * C)
- = 400 * 2.0 * 1.0 * (4 + 4 + 4 * 2)
- = 12,800
+ QPD = Changes_Day * Revisions_Change * Reviewers_Change * (4 + F + F * C)
+ = 2,000 * 2 * 1 * (4 + 10 + 10 * 4)
+ = 216,000
QPS = QPD / 8_Hours / 60_Minutes / 60_Seconds
- = 0.43
+ = 7.5
----
Gerrit serves most requests in under 60 ms when using the loopback
interface and a single processor. On a single CPU system there is
sufficient capacity for 16 QPS. A dual processor system should be
-sufficient for a site with the estimated load described above.
+more than sufficient for a site with the estimated load described above.
Given a more realistic estimate of 79 changes per day (from the
-Linux kernel) suggests only 2,528 queries per day, and a much lower
-0.08 QPS when spread out over an 8 hour work day.
+Linux kernel) suggests only 8,532 queries per day, and a much lower
+0.29 QPS when spread out over an 8 hour work day.
+
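The web UI load arithmetic in the new text can be checked with a short script. This is only a sketch of the formula as the patch states it: the function and constant names are made up for illustration, and the inputs (2,000 or 79 changes/day, 2 revisions, 1 reviewer, F = 10 files, C = 4 comments per file) are the values plugged into the worked figures above.

```python
# Sketch of the QPD/QPS estimate from the patched dev-design.txt.
# All names here are illustrative; only the formula and inputs come from the text.

SECONDS_PER_WORK_DAY = 8 * 60 * 60  # one 8 hour work day in a single time zone


def requests_per_review(files, comments_per_file):
    """HTTP requests to review one revision: 4 + F + F * C."""
    return 4 + files + files * comments_per_file


def queries_per_day(changes_per_day, revisions_per_change,
                    reviewers_per_change, files, comments_per_file):
    """QPD = Changes_Day * Revisions_Change * Reviewers_Change * (4 + F + F * C)."""
    return (changes_per_day * revisions_per_change * reviewers_per_change
            * requests_per_review(files, comments_per_file))


def queries_per_second(qpd):
    """Spread the daily total evenly over the work day."""
    return qpd / SECONDS_PER_WORK_DAY


estimated_max_qpd = queries_per_day(2000, 2, 1, 10, 4)
linux_kernel_qpd = queries_per_day(79, 2, 1, 10, 4)

print(estimated_max_qpd, queries_per_second(estimated_max_qpd))
print(linux_kernel_qpd, queries_per_second(linux_kernel_qpd))
```

With these inputs the script reproduces the 216,000 QPD and 7.5 QPS figures, and 8,532 QPD for the Linux kernel rate, which works out to roughly 0.3 QPS.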
+CPU Usage - Git over SSH/HTTP
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A 24 core server is able to handle ~25 concurrent `git fetch`
+operations per second. The issue here is each concurrent operation
+demands one full core, as the computation is almost entirely server
+side CPU bound. 25 concurrent operations is known to be sufficient to
+support hundreds of active developers and 50 automated build servers
+polling for updates and building every change. (This data was derived
+from an actual installation's performance.)
+
+Because of the distributed nature of Git, end-users don't need to
+contact the central Gerrit Code Review server very often. For `git
+fetch` traffic, link:pgm-daemon.html[slave mode] is known to be an
+effective way to offload traffic from the main server, permitting it
+to scale to a large user base without needing an excessive number of
+cores in a single system.
+
+Clients on very slow network connections (for example home office
+users on VPN over home DSL) may be network bound rather than server
+side CPU bound, in which case a core may be effectively shared with
+another user. Possible core sharing due to network bottlenecks
+generally holds true for network connections running below 10 MiB/sec.
+
+If the server's own network interface is 1 Gib/sec (Gigabit Ethernet),
+the system can really only serve about 10 concurrent clients at the
+10 MiB/sec speed, no matter how many cores it has.
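The network ceiling described above can be estimated the same way. The sketch below assumes a nominal Gigabit Ethernet rate of 10^9 bits per second, ignores protocol overhead, and treats "one core per concurrent fetch" as a hard limit. The function names are hypothetical, and the result is a rough bound rather than a measured figure.

```python
# Rough upper bound on concurrent full-speed git fetch clients.
# Assumptions (not from the source): 10**9 bits/sec link, no protocol overhead.

MIB = 2 ** 20  # bytes per MiB


def network_bound_clients(link_bits_per_sec, client_mib_per_sec):
    """How many clients can each pull client_mib_per_sec through the link."""
    link_bytes_per_sec = link_bits_per_sec // 8
    return link_bytes_per_sec // (client_mib_per_sec * MIB)


def concurrent_fetch_limit(cores, link_bits_per_sec, client_mib_per_sec):
    """Each fetch needs one core, so the limit is the smaller of the two bounds."""
    return min(cores, network_bound_clients(link_bits_per_sec, client_mib_per_sec))


print(network_bound_clients(10 ** 9, 10))
print(concurrent_fetch_limit(24, 10 ** 9, 10))
```

Under these assumptions the link, not the 24 cores, is the limiting factor: the bound comes out at 11 clients, in line with the "about 10" stated above.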
Disk Usage
~~~~~~~~~~
-The average size of a revision in the Linux kernel once compressed
-by Git is 2,327 bytes, or roughly 2 KB. Over the course of a year
-a Gerrit server running with the parameters above might see an
-introduction of 570 MB over the total set of 500 projects hosted in
-that server. This figure assumes the majorty of the content is human
-written source code, and not large binary blobs such as disk images.
-
+The average size of a revision in the Linux kernel once compressed by
+Git is 2,327 bytes, or roughly 2 KiB. Over the course of a year a
+Gerrit server running with the estimated maximum parameters above might
+see an introduction of 1.4 GiB over the total set of 10,000 projects
+hosted in that server. This figure assumes the majority of the content
+is human written source code, and not large binary blobs such as disk
+images or media files.
+
+Production Gerrit installations have been tested, and are known to
+handle Git repositories in the multigigabyte range, storing binary
+files, ranging in size from a few kilobytes (for example compressed
+icons) to 800+ megabytes (firmware images, large uncompressed original
+artwork files). Best practices encourage breaking very large binary
+files into their Git repositories based on access, to prevent desktop
+clients from needing to clone unnecessary materials (for example a C
+developer does not need every 800+ megabyte firmware image created by
+the product's quality assurance team).
Redundancy & Reliability
------------------------