test(performance): make tests more deterministic by relying more on system counts #5786

Open · wants to merge 35 commits into master
Conversation


@Hweinstock Hweinstock commented Oct 15, 2024

Problem

Performance tests are currently flaky, making them a poor signal for performance regressions in the codebase. These false alarms also slow down development of unrelated features, since spurious test failures keep CI from being "green".

Therefore, rather than relying only on system usage thresholds to catch performance regressions, we can count the number of high-risk, potentially slow operations the code performs as a deterministic measure of its performance. However, counting each such operation exactly would tightly couple the tests to implementation details, making them less effective whenever the implementation changes.

Therefore, the goal of this PR is the following:

  1. decrease performance test flakiness by increasing thresholds.
  2. increase performance test effectiveness by relying on deterministic measures.
  3. avoid coupling the tests to the implementation details.

Solution

To meet goal (1), we increase the thresholds of the tests to decrease the chances of a false alarm.

To meet goal (2), we count expensive operations. To avoid tying these counts to implementation details, we assert somewhat-loose upper bounds rather than exact counts. Implementation changes that shift the number of expensive operations by a small constant do not trigger a false alarm, but changes that increase the count by a multiplicative factor exceed the bound and alert us.

As an example, we don't want a test to fail because the code makes 5-10 extra file system calls while working with a few hundred files, but we do want it to fail if the code makes 2x the number of file system calls. Therefore, the bounds in the code are often expressed as "operations per file", since it is the multiplicative increase we care about. This achieves goal (3).
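
To illustrate the style of assertion, here is a minimal sketch (not the PR's actual test code: `fsWrapper`, `collectFiles`, and `maxReadsPerFile` are hypothetical stand-ins for the toolkit's fs module and the code under test):

```ts
import * as sinon from 'sinon'
import * as assert from 'assert'

// Hypothetical stand-ins for the toolkit's fs wrapper and the code under test.
const fsWrapper = {
    async readFile(path: string): Promise<string> {
        return ''
    },
}

async function collectFiles(paths: string[]): Promise<void> {
    for (const p of paths) {
        await fsWrapper.readFile(p)
    }
}

describe('performance: deterministic operation counts', function () {
    it('stays within a loose per-file read bound', async function () {
        const files = Array.from({ length: 250 }, (_, i) => `file${i}.ts`)
        // A loose bound: tolerates a small constant number of extra reads,
        // but fails if the read count grows by a multiplicative factor.
        const maxReadsPerFile = 2
        const readSpy = sinon.spy(fsWrapper, 'readFile')

        await collectFiles(files)

        assert.ok(
            readSpy.callCount <= files.length * maxReadsPerFile,
            `expected <= ${files.length * maxReadsPerFile} reads, got ${readSpy.callCount}`
        )
        readSpy.restore()
    })
})
```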

Implementation Details

  • The most common "expensive operation" we count is file system calls (through our fs module). Some other examples include the use of zip libraries or loading large files into memory.
  • We separate the file system upper bounds into read and write bounds. This granularity lets us assert that specific code paths do not modify any files, as sketched below.
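
A sketch of that read/write granularity, again with a hypothetical fs wrapper standing in for the toolkit's fs module:

```ts
import * as sinon from 'sinon'
import * as assert from 'assert'

// Hypothetical fs wrapper with distinct read and write entry points.
const fsWrapper = {
    async readFile(path: string): Promise<string> {
        return ''
    },
    async writeFile(path: string, data: string): Promise<void> {},
}

it('a read-only code path performs zero writes', async function () {
    const readSpy = sinon.spy(fsWrapper, 'readFile')
    const writeSpy = sinon.spy(fsWrapper, 'writeFile')

    await fsWrapper.readFile('some/file.ts') // stand-in for the code path under test

    assert.ok(readSpy.callCount >= 1)
    // The separate write bound lets us assert this path never modifies files.
    assert.strictEqual(writeSpy.callCount, 0, 'read-only path must not write')

    readSpy.restore()
    writeSpy.restore()
})
```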

Open Question


License: I confirm that my contribution is made under the terms of the Apache 2.0 license.

@Hweinstock Hweinstock changed the title tests(performance): make tests more deterministic by relying more on system counts test(performance): make tests more deterministic by relying more on system counts Oct 15, 2024
@justinmk3
Contributor

  • The most common "expensive operation" we count is file system calls (through our fs module).

We also have a fetch (http/network calls) abstraction which could be useful in the future.

```diff
@@ -28,17 +29,17 @@ export async function prepareRepoData(
     repoRootPaths: string[],
     workspaceFolders: CurrentWsFolders,
     telemetry: TelemetryHelper,
-    span: Span<AmazonqCreateUpload>
+    span: Span<AmazonqCreateUpload>,
+    zip: AdmZip = new AdmZip()
```
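
This change makes the zip instance injectable, so a test can pass in a spied `AdmZip` and bound the number of archive operations. A minimal sketch (the loop stands in for the real `prepareRepoData` call, which would add one entry per workspace file):

```ts
import AdmZip from 'adm-zip'
import * as sinon from 'sinon'
import * as assert from 'assert'

it('bounds the number of zip entries added', function () {
    const zip = new AdmZip()
    const addFileSpy = sinon.spy(zip, 'addFile')

    // Stand-in for `prepareRepoData(repoRootPaths, workspaceFolders, telemetry, span, zip)`.
    const files = ['a.ts', 'b.ts', 'c.ts']
    for (const f of files) {
        zip.addFile(f, Buffer.from('contents'))
    }

    assert.ok(addFileSpy.callCount <= files.length, 'each file should be zipped at most once')
    addFileSpy.restore()
})
```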
Contributor:

FYI: AdmZip will be replaced by zip.js #4769

@Hweinstock Hweinstock (Author) commented Oct 15, 2024:

Ah, I see. Do you recommend I remove the AdmZip call counting in the meantime or leave it until that other PR is done?

@Hweinstock
Contributor Author

/runIntegrationTests

@Hweinstock Hweinstock marked this pull request as ready for review October 16, 2024 17:34
@Hweinstock Hweinstock requested review from a team as code owners October 16, 2024 17:34