test(performance): make tests more deterministic by relying more on system counts #5786

Open · wants to merge 35 commits into master
Conversation


@Hweinstock Hweinstock commented Oct 15, 2024

Problem

Performance tests are currently flaky, making them a poor signal for performance regressions in the codebase. These false alarms also slow down development of unrelated features, since spurious test failures keep CI from being "green".

Therefore, rather than relying only on system usage thresholds to catch performance regressions, we can count the number of high-risk, potentially slow operations the code performs as a deterministic measure of its performance. However, counting each such operation exactly would tightly couple the tests to implementation details, making them less effective whenever the implementation changes.

Therefore, the goal of this PR is the following:

  1. decrease performance test flakiness by increasing thresholds.
  2. increase performance test effectiveness by relying on deterministic measures.
  3. avoid coupling the tests to the implementation details.

Solution

To meet goal (1), we increase the thresholds of the tests to decrease the chances of a false alarm.

To meet goal (2), we count expensive operations. To avoid tying these counts to implementation details, we assert somewhat-loose upper bounds rather than exact counts. Implementation changes that shift the number of expensive operations by a small constant do not trigger a false alarm, but changes that increase the count by a multiplicative factor exceed the bound and alert us.

As an example, we don't want a test to fail because the code makes 5-10 extra file system calls while working with a few hundred files, but we do want it to fail if the code makes 2x the number of file system calls. Therefore, the bounds in the code are often expressed as "operations per file", since it is the multiplicative increase we care about. This achieves goal (3).
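
To illustrate the style of assertion, here is a minimal sketch (not the PR's actual test code: `fsWrapper`, `collectFiles`, and `maxReadsPerFile` are hypothetical stand-ins for the toolkit's fs module and the code under test):

```ts
import * as sinon from 'sinon'
import * as assert from 'assert'

// Hypothetical stand-ins for the toolkit's fs wrapper and the code under test.
const fsWrapper = {
    async readFile(path: string): Promise<string> {
        return ''
    },
}

async function collectFiles(paths: string[]): Promise<void> {
    for (const p of paths) {
        await fsWrapper.readFile(p)
    }
}

describe('performance: deterministic operation counts', function () {
    it('stays within a loose per-file read bound', async function () {
        const files = Array.from({ length: 250 }, (_, i) => `file${i}.ts`)
        // A loose bound: tolerates a small constant number of extra reads,
        // but fails if the read count grows by a multiplicative factor.
        const maxReadsPerFile = 2
        const readSpy = sinon.spy(fsWrapper, 'readFile')

        await collectFiles(files)

        assert.ok(
            readSpy.callCount <= files.length * maxReadsPerFile,
            `expected <= ${files.length * maxReadsPerFile} reads, got ${readSpy.callCount}`
        )
        readSpy.restore()
    })
})
```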

Implementation Details

  • The most common "expensive operation" we count is file system calls (through our fs module). Some other examples include the use of zip libraries or loading large files into memory.
  • We separate the file system upper bounds into read and write bounds. This granularity lets us assert that specific code paths do not modify any files, as sketched below.
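
A sketch of that read/write granularity, again with a hypothetical fs wrapper standing in for the toolkit's fs module:

```ts
import * as sinon from 'sinon'
import * as assert from 'assert'

// Hypothetical fs wrapper with distinct read and write entry points.
const fsWrapper = {
    async readFile(path: string): Promise<string> {
        return ''
    },
    async writeFile(path: string, data: string): Promise<void> {},
}

it('a read-only code path performs zero writes', async function () {
    const readSpy = sinon.spy(fsWrapper, 'readFile')
    const writeSpy = sinon.spy(fsWrapper, 'writeFile')

    await fsWrapper.readFile('some/file.ts') // stand-in for the code path under test

    assert.ok(readSpy.callCount >= 1)
    // The separate write bound lets us assert this path never modifies files.
    assert.strictEqual(writeSpy.callCount, 0, 'read-only path must not write')

    readSpy.restore()
    writeSpy.restore()
})
```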

Open Question


License: I confirm that my contribution is made under the terms of the Apache 2.0 license.

@Hweinstock Hweinstock changed the title tests(performance): make tests more deterministic by relying more on system counts test(performance): make tests more deterministic by relying more on system counts Oct 15, 2024
@justinmk3
Contributor

  • The most common "expensive operation" we count is file system calls (through our fs module).

We also have a fetch (http/network calls) abstraction which could be useful in the future.

```diff
@@ -28,17 +29,17 @@ export async function prepareRepoData(
     repoRootPaths: string[],
     workspaceFolders: CurrentWsFolders,
     telemetry: TelemetryHelper,
-    span: Span<AmazonqCreateUpload>
+    span: Span<AmazonqCreateUpload>,
+    zip: AdmZip = new AdmZip()
```
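
This change makes the zip instance injectable, so a test can pass in a spied `AdmZip` and bound the number of archive operations. A minimal sketch (the loop stands in for the real `prepareRepoData` call, which would add one entry per workspace file):

```ts
import AdmZip from 'adm-zip'
import * as sinon from 'sinon'
import * as assert from 'assert'

it('bounds the number of zip entries added', function () {
    const zip = new AdmZip()
    const addFileSpy = sinon.spy(zip, 'addFile')

    // Stand-in for `prepareRepoData(repoRootPaths, workspaceFolders, telemetry, span, zip)`.
    const files = ['a.ts', 'b.ts', 'c.ts']
    for (const f of files) {
        zip.addFile(f, Buffer.from('contents'))
    }

    assert.ok(addFileSpy.callCount <= files.length, 'each file should be zipped at most once')
    addFileSpy.restore()
})
```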
Contributor:

FYI: AdmZip will be replaced by zip.js #4769

@Hweinstock Hweinstock (Author) commented Oct 15, 2024:

Ah, I see. Do you recommend I remove the AdmZip call counting in the meantime or leave it until that other PR is done?

@Hweinstock
Contributor Author

/runIntegrationTests

@Hweinstock Hweinstock marked this pull request as ready for review October 16, 2024 17:34
@Hweinstock Hweinstock requested review from a team as code owners October 16, 2024 17:34