Optimize parsing #124

Open: 1 commit to be merged into `main`.

@adamchainz adamchainz commented Mar 11, 2022

A few changes:

1. Switch from `multiprocessing` to `concurrent.futures` with `ProcessPoolExecutor`. This is just to make the code easier to work with.
2. Perform `check_file()` as results return to the main process, in order to reduce peak memory usage. Previously all parsed files were kept in memory before being checked, leading to massive memory usage.
3. Construct parsers once per process, rather than once per file. Previously parser construction took ~5% of the runtime; this reduces it to a constant per-process cost.
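
For readers following along, the three changes above can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `build_parser`, `parse_one`, `check_parsed` and `lint_sources` are hypothetical stand-ins, the "parser" is just `str.split`, and the `executor_cls` parameter is an assumption added so the sketch can be exercised with a thread pool.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

# One parser per worker process, set by the pool initializer.
_parser = None


def build_parser():
    # Hypothetical stand-in for expensive parser construction.
    return str.split


def _init_worker():
    # Runs once in each worker when it starts, not once per file.
    global _parser
    _parser = build_parser()


def parse_one(source):
    # Runs in a worker and reuses that worker's parser.
    return _parser(source)


def check_parsed(path, tokens):
    # Hypothetical stand-in for the per-file check done in the main process.
    return [] if tokens else [f"{path}: empty template"]


def lint_sources(sources, max_workers=2, executor_cls=ProcessPoolExecutor):
    issues = []
    with executor_cls(max_workers=max_workers, initializer=_init_worker) as executor:
        pending = {
            executor.submit(parse_one, text): path
            for path, text in sources.items()
        }
        # Check each file as soon as its parse result arrives, then drop our
        # reference so the parsed result can be garbage-collected.
        for future in as_completed(pending):
            path = pending.pop(future)
            issues.extend(check_parsed(path, future.result()))
    return sorted(issues)
```

When calling `lint_sources` with the default `ProcessPoolExecutor` from a script, keep the call under an `if __name__ == "__main__":` guard so that worker processes started with the spawn method do not re-run it on import.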

Benchmarked on a project with 238 templates.

Before:

```
$ time curlylint templates/**/*.html
All done! ✨ 🍰 ✨

curlylint templates/**/*.html  352.25s user 3.37s system 999% cpu 35.575 total
```

After:

```
$ time curlylint templates/**/*.html
All done! ✨ 🍰 ✨

curlylint templates/**/*.html  324.22s user 2.79s system 995% cpu 32.858 total
```

~8% of the time saved.
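
As a quick check of that figure, the saving can be recomputed from the two `time` outputs above. The numbers are copied from the benchmark; the helper name `percent_saved` is just for illustration. It works out to about 8.0% of user CPU time and about 7.6% of wall-clock time.

```python
def percent_saved(before, after):
    # Relative reduction, as a percentage of the "before" value.
    return (before - after) / before * 100


# Values taken from the `time` output in the benchmark above.
user_saved = percent_saved(352.25, 324.22)   # user CPU seconds
wall_saved = percent_saved(35.575, 32.858)   # wall-clock seconds ("total")

print(f"user: {user_saved:.1f}%  wall: {wall_saved:.1f}%")
```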

The parser remains quite slow; I think it does an unfortunate amount of backtracking.

@adamchainz (Contributor, Author):

Okay, it turns out a lot of tests call `parse_source`. Perhaps the parser construction can be made lazier with `@lru_cache` or similar.
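
A rough sketch of what that could look like, assuming the parser can be built by a zero-argument function. `get_parser`, `_build_parser` and the one-argument `parse_source` signature here are simplified stand-ins, not curlylint's actual API, and the `build_calls` list exists only to show how often construction runs.

```python
from functools import lru_cache

build_calls = []  # illustration only: records each parser construction


def _build_parser():
    # Hypothetical stand-in for the expensive parser construction.
    build_calls.append(1)
    return str.split


@lru_cache(maxsize=None)
def get_parser():
    # Built on first use, then reused for every later call in this process.
    return _build_parser()


def parse_source(source):
    # Simplified signature (assumption): callers such as tests need no setup,
    # since the parser is created lazily on the first call.
    return get_parser()(source)
```

With this shape, tests can keep calling `parse_source` directly, and each worker process still pays the construction cost only once, because the cache is per process.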
