Commit log (each entry: subject [tags] (author, date; files changed, lines -removed/+added))
* Merge branch 'stable-2.16' into stable-3.0 [upstream/stable-3.0] (Nasser Grainawi, 2021-02-25; 4 files, -14/+39)

  * stable-2.16:
    Call retryDone() when giving up after lock failures
    Fix issue with task cleanup after retry

  Change-Id: Id987043c8a26bd3f69fb4bd5b84591ae20cb83ba
* Call retryDone() when giving up after lock failures [v2.16.28, upstream/stable-2.16] (Martin Fick, 2021-02-24; 1 file, -0/+1)

  Previously, when giving up after retrying due to too many lock failures,
  a 'replication start --wait' command would wait indefinitely if it was
  waiting on the push that gave up. Fix this by calling retryDone() after
  giving up, which triggers the ReplicationStatus to reflect a failure,
  allowing the wait to complete.

  Change-Id: I0debade83612eb7ce51bab0191ab99464a6e7cd3
* Fix issue with task cleanup after retry (Marcin Czech, 2021-02-24; 4 files, -14/+38)

  The Destination.notifyFinished method calls finish on
  ReplicationTasksStorage.Task objects which are not scheduled for retry.
  The issue is that for rescheduled tasks PushOne.isRetrying always
  returns true, even if the task has already been replicated. This
  creates a situation where tasks scheduled for retry are never cleaned
  up.

  Bug: Issue 12754
  Change-Id: I4b10c2752da6aa7444f57c3ce4ab70eb00c3f14e
* Merge branch 'stable-2.16' into stable-3.0 (Kaushik Lingarkar, 2021-01-25; 1 file, -15/+16)

  * stable-2.16:
    Use volatile and AtomicIntegers to be thread safe

  Change-Id: I90a3e17e2f49d07707409ba390c0a6dd0501b512
* Use volatile and AtomicIntegers to be thread safe [v2.16.27] (Adithya Chakilam, 2021-01-15; 1 file, -15/+16)

  Modify the fields in the ReplicationState class to be volatile and
  AtomicIntegers so that changes to them are visible to other threads.
  Without this, modifications made by one thread to these fields may not
  be reflected immediately, depending on CPU caching, resulting in an
  incorrect state.

  Change-Id: I76512b17c19cc68e4f1e6a5223899f9a184bb549
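The pattern this commit describes can be sketched as follows. This is a minimal illustration of the visibility/atomicity fix, not the actual ReplicationState code; the class and field names here are hypothetical:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of the thread-safety pattern described above.
// Names are hypothetical, not the actual ReplicationState members.
class ReplicationStateSketch {
    // 'volatile' guarantees a write by one thread is visible to
    // subsequent reads from other threads (no stale CPU-cached value).
    private volatile boolean allScheduled = false;

    // AtomicInteger additionally makes read-modify-write operations
    // (increment, compare) atomic, which a plain volatile int is not.
    private final AtomicInteger totalPushTasks = new AtomicInteger();
    private final AtomicInteger finishedPushTasks = new AtomicInteger();

    void markAllScheduled() { allScheduled = true; }

    void increaseTotal() { totalPushTasks.incrementAndGet(); }

    // Safe even when many push threads finish concurrently: each
    // increment is atomic, so the completion check cannot be lost.
    boolean notifyOneFinished() {
        return finishedPushTasks.incrementAndGet() == totalPushTasks.get()
            && allScheduled;
    }
}
```

A plain `int` with unsynchronized `++` could lose increments under concurrent pushes; `AtomicInteger.incrementAndGet()` avoids that without locking.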
* Merge branch 'stable-2.16' into stable-3.0 (Nasser Grainawi, 2020-12-07; 4 files, -27/+60)

  * stable-2.16:
    Fix replication to retry on lock errors

  Change-Id: I6e262d2c22d2dcd49b341b3c752d6d8b6c93b32c
* Fix replication to retry on lock errors [v3.0.16, v2.16.26] (Kaushik Lingarkar, 2020-12-02; 4 files, -27/+60)

  Versions of Git released since 2014 use a new status, "failed to
  update ref", which replaces the two statuses "failed to lock" and
  "failed to write". So, we now see the newer status when the remote is
  unable to lock a ref. See Git commit:
  https://github.com/git/git/commit/6629ea2d4a5faa0a84367f6d4aedba53cb0f26b4

  The 'lockErrorMaxRetries' config option is not removed as part of this
  change, so that folks who currently have it configured don't run into
  unexpected retry behavior when they upgrade to a newer version of the
  plugin. The "failed to lock" check is also kept for folks still using
  a version of Git older than 2014.

  Change-Id: I9b3b15bebd55df30cbee50a0e0c2190d04f2f443
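The detection this commit describes boils down to matching both the old and the new remote status strings. A hedged sketch follows; the class and method names are hypothetical and the actual plugin code differs:

```java
// Hypothetical sketch: treat both the pre-2014 and post-2014 Git
// remote-ref status messages as retryable lock errors.
class LockErrorDetector {
    static boolean isRetryableLockError(String remoteStatus) {
        return remoteStatus.startsWith("failed to lock")        // Git before 2014
            || remoteStatus.startsWith("failed to update ref"); // Git 2014 and later
    }
}
```

Keeping both checks lets the same plugin build retry correctly against remotes running either generation of Git.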
* Merge branch 'stable-2.16' into stable-3.0 (Nasser Grainawi, 2020-10-30; 4 files, -6/+116)

  * stable-2.16:
    ReplicationStorageIT: Wait for all pushes without order
    ReplicationTasksStorage: Add multi-primary unit tests

  Change-Id: I1d749621c189ee2e49f092ddc7558f83e508411f
* ReplicationStorageIT: Wait for all pushes without order (Nasser Grainawi, 2020-10-30; 2 files, -4/+38)

  Some tests don't have a predefined order for which events will be
  replicated first. Using a timeout based on a single replication event
  is flawed when we don't know the expected order. Instead, use a
  timeout for the group of events and ignore the order. For two events
  replicating to a single remote with a single thread, we expect the
  complete replication to take twice as long. Two events replicating to
  two remotes will use one thread each and therefore not take any longer
  than the single-remote case.

  Change-Id: Ieb21b7eee32105eab5b5a15a35159bb4a837e363
* Merge "ReplicationTasksStorage: Add multi-primary unit tests" into stable-2.16 [v2.16.23] (Martin Fick, 2020-10-28; 2 files, -2/+78)
* ReplicationTasksStorage: Add multi-primary unit tests (Adithya Chakilam, 2020-10-26; 2 files, -2/+78)

  These tests exercise replication scenarios in a multi-primary setup,
  using the API calls in the ReplicationTasksStorage class in the same
  way as the single-primary tests do. They ensure that replication
  compatibility in a multi-primary setup is not broken.

  Change-Id: I375b731829f3c0640d3a7a98635e1e5c526908ca
* Move storage portion of replicateBranchDeletion ITs (Nasser Grainawi, 2020-10-27; 3 files, -74/+65)

  All other ITs split e2e and storage tests on stable-2.16, so this
  change only updates the new replicateBranchDeletion tests that were
  added in stable-3.0. The e2e check for whether the destination branch
  is removed stays in ReplicationIT, and the check that a task is
  created in storage when the branch delete API is invoked moves to
  ReplicationStorageIT. This split allows the best practices for
  verifying e2e and storage to be applied independently.

  Change-Id: Iec7ee090bd614e3442b1f9cb454437c9e05290be
* Merge branch 'stable-2.16' into stable-3.0 (Nasser Grainawi, 2020-10-27; 5 files, -190/+404)

  * stable-2.16:
    Refactor Replication*IT tests to share a base class
    ReplicationIT: Add shouldMatch* e2e tests
    ReplicationStorageIT: Move shouldMatch* tests from ReplicationIT
    ReplicationStorageIT: Add shouldFire*ChangeRefs tests
    Move storage-based ITs into ReplicationStorageIT
    ReplicationQueue: Remove unused method

  This change does not try to reimpose the breakdown of tests that was
  done in 2.16. That will be done in follow-up change(s) to improve the
  reviewability of this change.

  Change-Id: I83202997610c5ad0d8849cb477ca36db8df760f5
* Refactor Replication*IT tests to share a base class (Nasser Grainawi, 2020-10-26; 3 files, -177/+135)

  These classes have very similar setups and duplicate helper methods.
  Improve maintainability by reducing the duplication. ReplicationQueueIT
  is not modified because it is merged into ReplicationIT on stable-3.0.

  Change-Id: Ibc22ae4d0db2d09009f65c0e745f1095c67827ba
* ReplicationIT: Add shouldMatch* e2e tests (Nasser Grainawi, 2020-10-26; 1 file, -0/+72)

  These new tests create a branch in a way that does not trigger
  replication, so that scheduleFullSync() is responsible for replicating
  the update. In this way, the tests verify that the destination
  receives the update because scheduleFullSync() matched the given URI.

  Change-Id: I4ae15d0301a308a12cbca3684915e89ca421e02f
* ReplicationStorageIT: Move shouldMatch* tests from ReplicationIT (Nasser Grainawi, 2020-10-26; 3 files, -89/+45)

  These tests are focused on verifying storage, so they belong in
  ReplicationStorageIT. Improve them to better verify storage
  correctness by switching the 'now' parameter to false, so that
  replicationDelay is honored, and by following the ReplicationStorageIT
  pattern of using a very long delay. These improvements make the tests
  much more stable. The tests also improve the ref matching slightly by
  comparing against the PushOne.ALL_REFS constant.

  Also remove the disableDeleteForTesting flag, as it no longer has any
  users. A later change can add ReplicationIT e2e tests for these use
  cases.

  Change-Id: Iaa14a7429a40fb62325259efa1c7d7637deef95a
* ReplicationStorageIT: Add shouldFire*ChangeRefs tests (Nasser Grainawi, 2020-10-26; 1 file, -0/+45)

  Copy the shouldFire*IncompleteUri tests as shouldFire*ChangeRefs to
  fill a gap in test coverage.

  Change-Id: Ia8df64a8574b776e6a9f7201c0862f1e6794687e
* Move storage-based ITs into ReplicationStorageIT (Nasser Grainawi, 2020-10-26; 2 files, -86/+224)

  Tests in ReplicationStorageIT use very long replication delays, so
  that tasks are never expected to complete during the test. This allows
  test writers to assume the task files are still there. Refactor tests
  from ReplicationIT into ReplicationStorageIT and focus them on
  verifying storage correctness. This is mostly a direct copy, except
  that shouldFirePendingOnlyToStoredUri gets renamed and split into two
  tests: one that validates that tasks are fired, and another that
  validates that replication completes to the expected destinations.
  This split is necessary because of the very long delay methodology
  mentioned above.

  Code sharing between ReplicationIT and ReplicationStorageIT will be
  improved in a later commit.

  Change-Id: I41179c20a10354953cff3628368dfd5f910cc940
* ReplicationQueue: Remove unused method (Nasser Grainawi, 2020-10-12; 1 file, -6/+0)

  Also drop the misleading @VisibleForTesting annotation from the method
  that the removed method was wrapping. scheduleFullSync() is public so
  that PushAll can call it.

  Change-Id: I0139e653654fcaf20de68dddfb5ea85560a323d0
* Merge branch 'stable-2.16' into stable-3.0 (Nasser Grainawi, 2020-10-15; 2 files, -15/+23)

  * stable-2.16:
    ReplicationIT: Remove unnecessary storage inspection
    ReplicationIT: Fix invalid replicationDelay setting
    Split replication plugins tests in two groups

  Change-Id: I2d27b715a2bfc9832ee559556d1c8acfe671d893
* ReplicationIT: Remove unnecessary storage inspection (Nasser Grainawi, 2020-10-12; 1 file, -8/+0)

  Integration tests shouldn't need to rely on inspecting the underlying
  ReplicationTasksStorage layer(s). All of these tests already verify
  the expected end result. This leaves four tests that currently rely
  entirely on inspecting the task storage to verify the expected result.
  Those tests need further improvement to decouple them from the storage
  layer.

  Change-Id: I029d63ce7d07414d9bf5d9290d556378beedcabf
* ReplicationIT: Fix invalid replicationDelay setting (Nasser Grainawi, 2020-10-12; 1 file, -7/+10)

  Setting config values for a remote in replication.config instead of
  the remote's own config file results in the replication.config values
  being ignored. Fix this by setting the values in each remote's config
  file. This test had delays added to avoid flakiness, but the delays
  weren't taking effect because of this issue. While the test generally
  passes anyway, the delay makes it safer against races.

  Change-Id: Idcdf5f07b3fc91724068ec6216527665c4a48bb3
* Split replication plugins tests in two groups (Luca Milanesio, 2020-10-08; 1 file, -0/+12)

  Run unit tests and integration tests in parallel by splitting them
  into two separate tasks. This also makes it possible to identify which
  group of tests is flaky, because Bazel would flag one or the other in
  case of instability.

  Change-Id: I21f969a17e3653dfc5ab93d71cc6955024fc2d8f
* Merge branch 'stable-2.16' into stable-3.0 [v3.0.13] (Marco Miller, 2020-10-01; 1 file, -1/+8)

  * stable-2.16:
    Make the shouldReplicateNewProject test more reliable

  Change-Id: I447043d502987070bc395936484a1cb23a5ddabc
* Make the shouldReplicateNewProject test more reliable (Martin Fick, 2020-09-28; 1 file, -1/+8)

  The ReplicationIT shouldReplicateNewProject test was failing regularly
  on my machine. Improve the timeout for this test so that it explicitly
  includes the time needed to wait for the project to be created, not
  just the scheduling and retry times.

  Change-Id: Ibf3cc3506991b222ded3ee4ddfbd7e2d60341d60
* Merge branch 'stable-2.16' into stable-3.0 (Nasser Grainawi, 2020-09-01; 2 files, -3/+6)

  * stable-2.16:
    Fix synopsis in replication start cmd documentation
    Don't wait for pending events to process on startup

  Change-Id: If4bc69761a19a0137301535759dc8a317ea04186
* Fix synopsis in replication start cmd documentation (Kaushik Lingarkar, 2020-08-24; 1 file, -2/+1)

  --url can be used with --all, with projects, or on its own. Update the
  usage to reflect this.

  Change-Id: Id3637f7bf61b7f65348b19ec0616808ef3f44ccf
* Don't wait for pending events to process on startup (Saša Živkov, 2020-08-14; 1 file, -1/+5)

  Previously, on large Gerrit installations with many projects and/or
  many replication destinations, the replication plugin could take a
  very long time to start up. This was particularly a problem when the
  pending (persisted) event count was large, as all of those events were
  rescheduled before the plugin finished initializing. Change this
  behavior so that startup merely begins the process of scheduling the
  pending events, but does not wait for them to complete.

  Bug: Issue 12769
  Change-Id: I224c2ce2a35f987af2343089b9bb00a7fcb7e3be
* Merge branch 'stable-2.16' into stable-3.0 (Nasser Grainawi, 2020-07-30; 2 files, -1/+233)

  * stable-2.16:
    ReplicationTasksStorage: Add unit tests

  Change-Id: I8095d012b5cfa497267b6ef027f697c7e8369533
* ReplicationTasksStorage: Add unit tests (Nasser Grainawi, 2020-07-28; 2 files, -1/+233)

  Change-Id: I164426e70937bc3c4ac426be3056a01e9229746b
* Merge branch 'stable-2.16' into stable-3.0 [v3.0.12] (Luca Milanesio, 2020-07-15; 2 files, -3/+3)

  * stable-2.16:
    Fix naming for delay for draining the replication event queue

  Change-Id: I3cba1756a10a1c12db96d04ca55d3feb7bc8784e
* Fix naming for delay for draining the replication event queue [v2.16.22] (Nasser Grainawi, 2020-07-15; 2 files, -3/+3)

  Thread.sleep() takes milliseconds as an argument, not seconds;
  otherwise, multiplying by 1000 would be a bug. Also switch to
  returning a long, which fixes a potential overflow when multiplying
  by 1000.

  Change-Id: I3fc5c939e8c09c134e24fa9381e96e6529b5be4d
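The unit and overflow issue this commit fixes can be illustrated in isolation. The method names below are hypothetical, not the plugin's actual code:

```java
// Hypothetical illustration of the bug class fixed above.
class DrainDelay {
    // Buggy: int * int is computed in 32-bit arithmetic, so the
    // product overflows once seconds exceeds Integer.MAX_VALUE / 1000.
    static int delayMillisBuggy(int seconds) {
        return seconds * 1000; // overflows for seconds > 2_147_483
    }

    // Fixed: widen to long before multiplying (the 1000L literal
    // promotes the whole expression to 64-bit), and return long so
    // the value can be passed to Thread.sleep(long millis) directly.
    static long delayMillis(int seconds) {
        return seconds * 1000L;
    }
}
```

Naming the return value in milliseconds (and the parameter in seconds) also documents the unit conversion that Thread.sleep() expects.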
* Merge branch 'stable-2.16' into stable-3.0 (Luca Milanesio, 2020-07-01; 1 file, -28/+41)

  * stable-2.16:
    Improve readability of shouldFirePendingOnlyToStoredUri test
    Fix flakiness in ReplicationIT for pending events firing

  Change-Id: Id40baca92acc9fba8656630f725d55e5fbb6662b
* Improve readability of shouldFirePendingOnlyToStoredUri test (Luca Milanesio, 2020-07-01; 1 file, -18/+14)

  Make ReplicationIT.shouldFirePendingOnlyToStoredUri easier to read and
  simplify the extraction of the replication tasks associated with a
  change ref, as regex matching isn't required and could be misleading
  when reading the test.

  Change-Id: Ib493275872b56bc04cdcfb541b7cfa7ecfb1e058
* Fix flakiness in ReplicationIT for pending events firing (Luca Milanesio, 2020-07-01; 1 file, -24/+24)

  Fix the shouldFirePendingOnlyToStoredUri test by making sure that
  events are NOT executed by the replication engine until the test has
  completed its preparation phase.

  The Gerrit build on stable-2.16 became flaky right after the merge of
  the new shouldFirePendingOnlyToStoredUri test, which highlighted the
  flakiness. The test simulates a situation where a ref-update needs to
  be propagated to two remotes: remote1 and remote2. To do so, it
  configures the two remotes and creates a change, generating the two
  replication task files on the filesystem. It then looks for the events
  associated with remote1 and removes them, so that the next replication
  queue startup won't find them and won't replicate the change to
  remote1.

  In the interval between the creation of the change and the removal of
  the underlying replication task on the filesystem, the replication
  task could already have been executed, and the test failed. Make sure
  that replication does not kick in by setting the replication timeout
  to Integer.MAX_VALUE at the beginning. Then, once the replication task
  file has been removed from the filesystem, set the timeout back to the
  default and reload the configuration to trigger the firing of the
  events.

  Also remove the explicit start/stop of the replication queue, as the
  config reload is already a stop/start process and automatically
  triggers an event replay.

  Change-Id: Ifd591da37e94b6ce8f281cb0404f3f3c737489f3
* Merge branch 'stable-2.16' into stable-3.0 (Luca Milanesio, 2020-06-27; 6 files, -6/+99)

  * stable-2.16:
    Only fire the specified pending event URI

  Change-Id: Ib800603d830c9b4ba688b0222ac5642ad50f17a0
* Only fire the specified pending event URI (Martin Fick, 2020-06-26; 6 files, -6/+119)

  Previously, the startup firing of pending events would fire every URI
  for a project/ref combination, and to avoid duplicates it only ever
  fired one round of every URI per project/ref combination. This had the
  side effect that if only a single URI were stored, presumably because
  the other URIs had completed before shutdown, startup would create far
  more replication events than necessary, many of them likely duplicates
  of already-completed pushes.

  Fix this behavior by only firing to the specific stored URI, and
  remove the duplicate project/ref filtering, since it would now prevent
  firing to more than one URI for the same project/ref combination when
  there actually are stored events for multiple URIs. Add a test to
  confirm the new, more limited behavior.

  Bug: Issue 12779
  Change-Id: I56d314af2ecbf84362dda099fa28f1b8f82cefa7
* ReplicationIT: Test that branch deletion is replicated (David Pursehouse, 2020-06-23; 1 file, -1/+58)

  When a branch is deleted, the deletion should be replicated to the
  remote when "remote.name.mirror" is enabled.

  Change-Id: If59a9bb15958f4559d62452a309afcf1ca6c3789
* Merge branch 'stable-2.16' into stable-3.0 [v3.0.11, v3.0.10] (David Pursehouse, 2020-06-01; 1 file, -1/+1)

  * stable-2.16:
    Make SecureCredentialsFactory public

  Change-Id: I757ba1004ce2a851c7857762b178de9294deae21
* Make SecureCredentialsFactory public [v2.16.21] (Antonio Barone, 2020-05-29; 1 file, -2/+2)

  Access to secure.config is useful to more than just the replication
  plugin. Allow instantiating this class from packages other than the
  replication plugin. Specifically, this is useful because the class can
  be used from pull-replication too.

  Change-Id: Id268c869e993c6cabacfa0043ec269172e0efba1
  (cherry picked from commit c09a7c08fb44094c7475313ac52154adac39a54c)
* Make SecureCredentialsFactory public (David Pursehouse, 2020-05-29; 1 file, -1/+1)

  This allows it to be used in implementations that extend the
  replication plugin.

  Change-Id: Id81f5986f24720b9575c1987c21b2ae9672ddd37
* Merge branch 'stable-2.16' into stable-3.0 (Luca Milanesio, 2020-05-28; 2 files, -5/+46)

  * stable-2.16:
    Fix replication of project deletion
    Improve project creation replication integration test

  Change-Id: I1818511118ed1738cf76d48cb49b66a52f1d83c8
* Fix replication of project deletion [v2.16.20] (Luca Milanesio, 2020-05-28; 2 files, -1/+39)

  Fix a regression where a project deletion was not propagated to the
  remote nodes because its associated project state was missing from the
  project cache. When replicating deletions, the project state is not
  present, because the replication plugin receives the deletion after
  the fact and therefore no longer has access to the project. The
  associated checks for project visibility and read-only state are not
  valid for project deletions.

  Bug: Issue 12806
  Change-Id: I7a9ac01b01d5dd40b8bf0c4d3347256f430329ac
* Improve project creation replication integration test (Luca Milanesio, 2020-05-28; 1 file, -4/+7)

  Make sure project creation is correctly managed and tested by
  asserting that the repository contains at least one replicated ref,
  not just that the bare repository shows up.

  Change-Id: I0fdc1e73390c2abd3e40d2a02fd7e4ce7f20bb67
* Merge branch 'stable-2.16' into stable-3.0 [v3.0.9] (David Pursehouse, 2020-05-22; 1 file, -3/+4)

  * stable-2.16:
    Make persistent task keys stable

  Change-Id: Iefda465c739f4669b5394d3c57f7abe0d7513b5f
* Make persistent task keys stable [v2.16.19] (Martin Fick, 2020-05-20; 1 file, -3/+4)

  GSON was used to create JSON as the input for SHA-1 task keys, but
  GSON may order the JSON keys differently from one run to another. Use
  the values in a specific order to create stable keys.

  Bug: Issue 11760
  Change-Id: I6900b5ddb3ba8ab7b5cf7803ae9dd551b5980a59
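The stable-key idea can be sketched like this. It is a hypothetical simplification, not the plugin's actual key derivation: instead of hashing a GSON-serialized object, whose key order is not guaranteed, hash the field values concatenated in one fixed, documented order.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: derive a stable SHA-1 key from task fields by
// concatenating values in a fixed order, rather than hashing a JSON
// rendering whose key order may vary between runs.
class TaskKey {
    static String stableKey(String uri, String ref) throws NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        // Fixed order: uri first, then ref, joined by a newline so the
        // boundary between the two values is unambiguous.
        byte[] digest =
            sha1.digest((uri + "\n" + ref).getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b)); // 40-char lowercase hex
        }
        return hex.toString();
    }
}
```

Because the input bytes are now deterministic, the same task always hashes to the same key, which is what lets a task file written before a restart be recognized afterwards.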
* Merge branch 'stable-2.16' into stable-3.0 (David Pursehouse, 2020-05-18; 1 file, -6/+15)

  * stable-2.16:
    Prevent persistent task listing interruptions on IOExceptions

  Change-Id: Ib8bd758a3dd9c24968ec58be921c4475a7bde030
* Prevent persistent task listing interruptions on IOExceptions (Martin Fick, 2020-05-17; 1 file, -6/+15)

  When iterating over the list of persisted tasks, it is possible for an
  IOException to occur while reading a specific task. Prevent this
  exception from breaking out of the iteration by catching and logging
  it inside the loop instead of outside of it.

  Also improve the logging by differentiating between failures that are
  severe and those potentially caused by other nodes' actions: in a
  multi-master scenario with shared storage, it is common for operations
  on one node to "interfere" with task listing operations on another
  node without causing a malfunction. Specifically, improve the
  exception handling so that the logging in these latter cases includes
  a likely explanation of the listing error, and do not treat these
  specific filesystem errors as operational errors.

  Change-Id: Ia2ad431c20142ff0ce23dbace34aec837e3d8540
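The per-item exception handling described above, where one unreadable task file cannot abort the whole listing, looks roughly like this. The class and method names are hypothetical, not the actual ReplicationTasksStorage code:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: catch IOExceptions per task file, inside the
// loop, so a file deleted or locked by another node mid-iteration is
// logged and skipped rather than aborting the listing.
class TaskLister {
    static List<String> listTasks(Path dir) throws IOException {
        List<String> tasks = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                try {
                    tasks.add(new String(Files.readAllBytes(file)));
                } catch (IOException e) {
                    // Likely another node finished or removed this task;
                    // not an operational error, so just log and continue.
                    System.err.println("Skipping unreadable task " + file + ": " + e);
                }
            }
        }
        return tasks;
    }
}
```

Moving the catch outside the loop would instead discard every task after the first unreadable one, which is exactly the interruption the commit prevents.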
* Merge branch 'stable-2.16' into stable-3.0 (David Pursehouse, 2020-05-16; 1 file, -0/+3)

  * stable-2.16:
    Fix firing pending "..all.." events on startup

  Change-Id: I04f042199fd8935bee987b8363956115a40e0872
* Fix firing pending "..all.." events on startup (Martin Fick, 2020-05-15; 1 file, -0/+3)

  The Destination.wouldPushRef() method is called on startup to see
  whether a Destination is configured to push a specific ref; if it is
  not, firing the pending update is skipped. Since the magic "..all.."
  ref will never match the configuration in replication.config, always
  match it: if replication is configured at all, then it should be
  matched.

  Bug: Issue 11745
  Change-Id: I53bd527932e6aea9ddd465772925d601aa034bd3
  (cherry picked from commit 3ddf835c203565dbd415f468e0d40eac1b815c63)
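The rule this commit adds can be condensed to a short predicate. This is a hypothetical simplification with exact-match ref checking for brevity; the real plugin matches refs against configured patterns:

```java
import java.util.List;

// Hypothetical simplification of the startup matching rule described
// above: the magic "..all.." ref always matches, while ordinary refs
// are checked against the configured ref list (exact match here; the
// actual plugin supports wildcard and regex ref specs).
class RefPushMatcher {
    static final String ALL_REFS = "..all..";

    static boolean wouldPushRef(String ref, List<String> configuredRefs) {
        if (ALL_REFS.equals(ref)) {
            // "..all.." never appears in replication.config, so without
            // this short-circuit its pending events would never fire.
            return true;
        }
        return configuredRefs.contains(ref);
    }
}
```

Without the short-circuit, a pending "..all.." event persisted before shutdown would be silently dropped on startup, which is the bug (Issue 11745) being fixed.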