[red-knot] Add a benchmark involving realistic code that creates large unions #13549

Open
AlexWaygood opened this issue Sep 29, 2024 · 2 comments
Labels
performance Potential performance improvement red-knot Multi-file analysis & type inference

Comments

@AlexWaygood
Member

AlexWaygood commented Sep 29, 2024

Code that creates large unions (explicitly or implicitly) is known to be hard for type checkers to analyze in a performant way. Both mypy and pyright have had lots of issues regarding performance for these, and both have implemented several optimizations to deal with them. We should add at least one benchmark (probably several) that measures how we do on this.

Common themes that come up in this area are:

  • Unions involving Literal types (Literal[1, 2, 3] desugars to Literal[1] | Literal[2] | Literal[3] from the type checker's perspective, so "medium-sized Literal types" quickly end up creating huge unions)

  • Enums. A similar issue to Literal types. If you have an enum like this:

    class A(enum.Enum):
        X = 1
        Y = 2
        Z = 3

    Then in some cases, x: A can desugar to x: Literal[A.X] | Literal[A.Y] | Literal[A.Z]

  • Pydantic. Pydantic uses some big unions, and features in performance bug reports in both the mypy and pyright issue trackers.

  • Unions involving protocols

  • Unions involving recursive type aliases

  • Unions involving TypedDicts
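To make the Literal and enum points above concrete, here is a minimal sketch. It is only an illustration, not a proposed benchmark: the issue asks for realistic code, whereas `make_literal_benchmark` is a hypothetical helper (not part of red-knot, mypy, or pyright) that generates a small synthetic module whose `Literal` alias has `n` members. `Small`, `Big`, and `f` are likewise names invented for this example.

```python
import enum
from typing import Literal, get_args

# A "medium-sized" Literal: at runtime, get_args() exposes the individual
# members that a type checker treats as a union of single-value literals.
Small = Literal[1, 2, 3]


class A(enum.Enum):
    X = 1
    Y = 2
    Z = 3


def literal_members(tp) -> tuple:
    """Return the member values of a Literal[...] type."""
    return get_args(tp)


def make_literal_benchmark(n: int) -> str:
    """Return source for a synthetic module using an n-member Literal.

    Hypothetical helper: the generated code is a stand-in for the kind of
    input a large-union benchmark might feed to a type checker.
    """
    members = ", ".join(str(i) for i in range(n))
    lines = [
        "from typing import Literal",
        "",
        f"Big = Literal[{members}]",
        "",
        "def f(x: Big, y: Big) -> Big:",
        "    if x == 0:",
        "        return y",
        "    return x",
        "",
    ]
    return "\n".join(lines)
```

Calling `make_literal_benchmark(500)` and writing the result to a file would give a type checker a 500-member union to narrow in `f`. A benchmark derived from real user code would likely stress different paths than this synthetic one.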

References

Here are some references that are worth looking at (and from which we might be able to derive benchmarks). The great thing about all of these is that they are performance issues that we know real users encountered when their type checker was checking real code. I've tried to exclude anything specific to recursive type aliases, since that feels somewhat out of scope for this issue:

Mypy

Pyright

@AlexWaygood AlexWaygood added performance Potential performance improvement red-knot Multi-file analysis & type inference labels Sep 29, 2024
@dangotbanned

We recently added a TypedDict-heavy feature in altair.

Not sure if this helps as a real-world example, but sharing it since red-knot came up during review.

mypy performance here has a lot of room for improvement.

@hauntsaninja
Contributor

hauntsaninja commented Oct 1, 2024

Nice! I'll add altair to mypy_primer. You may also be interested in python/mypy#17231 (comment); I think it makes mypy significantly faster on your workload.

dangotbanned referenced this issue in hauntsaninja/mypy_primer Oct 1, 2024
There may be issues with old mypy, so wait for a new release