Optimize parsing #124

adamchainz · 2022-03-11T09:08:31Z

A few changes:

1. Switch from `multiprocessing` to `concurrent.futures` with `ProcessPoolExecutor`. This is just to make the code easier to work with.
2. Perform `check_file()` as results return to the main process, in order to reduce peak memory usage. Previously, all parsed files were kept in memory before being checked, leading to massive memory usage.
3. Construct parsers once per process, rather than once per file. Previously, parser construction took ~5% of the runtime; this reduces it to a constant amount.
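For illustration, here is a rough sketch of how these three changes can fit together. It is not the code from this PR: `ToyParser`, `init_worker`, `parse_one`, `lint`, and this toy `check_file` signature are hypothetical stand-ins, and the `executor_cls` parameter is only there to make the sketch easy to exercise.

```python
import concurrent.futures

_parser = None  # one parser instance per worker, set by init_worker()


class ToyParser:
    """Hypothetical stand-in for a parser that is expensive to construct."""

    def parse(self, source):
        return source.split()


def init_worker():
    # Runs once in each worker, so construction is paid per process, not per file.
    global _parser
    _parser = ToyParser()


def parse_one(item):
    name, source = item
    return name, _parser.parse(source)


def check_file(name, tokens):
    # Hypothetical check: report templates that contain no tokens at all.
    return [] if tokens else [f"{name}: empty template"]


def lint(sources, executor_cls=concurrent.futures.ProcessPoolExecutor, max_workers=None):
    issues = []
    with executor_cls(max_workers=max_workers, initializer=init_worker) as pool:
        # A generator is passed so that no list of futures is held here;
        # each parsed result can be released once it has been checked.
        pending = (pool.submit(parse_one, item) for item in sources.items())
        for future in concurrent.futures.as_completed(pending):
            name, tokens = future.result()
            issues.extend(check_file(name, tokens))
    return sorted(issues)
```

Calling `lint({"a.html": "<p>hi</p>", "b.html": ""})` should report only the empty template. On platforms that spawn worker processes, the call needs to sit under an `if __name__ == "__main__":` guard.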

Benchmarked on a project with 238 templates.

Before:

$ time curlylint templates/**/*.html
All done! ✨ 🍰 ✨

curlylint templates/**/*.html  352.25s user 3.37s system 999% cpu 35.575 total

After:

$ time curlylint templates/**/*.html
All done! ✨ 🍰 ✨

curlylint templates/**/*.html  324.22s user 2.79s system 995% cpu 32.858 total

~8% of the time saved (35.6s to 32.9s total).

The parser remains quite slow; I think it does an unfortunate amount of backtracking.


adamchainz · 2022-03-11T09:15:15Z

Okay, it turns out a lot of tests call `parse_source`. Perhaps the parser construction can be made a bit more lazy with `@lru_cache` or similar.
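A minimal sketch of that idea, assuming a zero-argument factory: `ToyParser` and `get_parser` are hypothetical names, and the `parse_source` signature shown here is an assumption rather than the project's actual one.

```python
from functools import lru_cache


class ToyParser:
    """Hypothetical stand-in for a parser that is expensive to construct."""

    instances = 0  # counts constructions, only to make the caching visible

    def __init__(self):
        ToyParser.instances += 1

    def parse(self, source):
        return source.split()


@lru_cache(maxsize=None)
def get_parser():
    # The first call builds the parser; later calls in the same process reuse it.
    return ToyParser()


def parse_source(source):
    # Callers such as tests need no setup step: the parser is built on demand.
    return get_parser().parse(source)
```

With this shape, each worker process still builds its parser only once, and tests that call `parse_source` directly keep working without an initializer.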

adamchainz force-pushed the optimized_concurrent branch from 130b9a6 to 670c03d Compare March 11, 2022 09:10

thibaudcolas mentioned this pull request Apr 4, 2022

Ability to do simple checks on template tag content #131

Open
