Dynamic import maps #10528
I ended up doing a relatively detailed review anyway, although for some repeated editorial issues I stopped commenting.
I have two major questions:
- The merge algorithm needs examples, and maybe explanatory text. I can follow most of the steps (modulo some bugs), but I can't figure out the intent. The examples can either use the JSON syntax, or the normalized syntax seen below https://html.spec.whatwg.org/#parse-an-import-map-string if that is helpful in giving extra clarity. The impact of the resolved set is particularly unclear.
- I don't understand why the import map is being passed around so much. There's still always one import map per global, and it's easy to get to that global from any algorithm or from the "script" struct. At least one instance of this seems completely redundant, which I commented on. But e.g. why are you storing the import map in [[HostDefined]]? I realize there's probably some complexity here at the particular point in time when you're merging import maps and thus the global's import map changes, but that should be able to happen completely discretely between script parsing and execution, so I don't see why scripts should need to track individual import maps separate from the global one.
Thanks! Fixed some comments and addressing the rest soon!
My thinking was that we need to do that in order to guarantee that once we're parsing a module tree, all modules in the tree would be resolved by the same import map. E.g. I thought it is possible that a setTimeout would inject a new import map while a module script is being downloaded and parsed, and that new import map would start taking effect after some modules were resolved but before others.
This is very helpful, thanks!
My thinking was that we need to do that in order to guarantee that once we're parsing a module tree, all modules in the tree would be resolved by the same import map. E.g. I thought it is possible that a setTimeout would inject a new import map while a module script is being downloaded and parsed, and that new import map would start taking effect after some modules were resolved but before others.
If that's not possible for some reason, I'm happy to revert these changes.
That makes perfect sense.
Given this, we should explain this in the spec, maybe around #concept-window-import-map, with a note that in general only the root of a loading operation will access concept-window-import-map, and otherwise it'll be threaded through.
With that frame, auditing all the call sites of concept-window-import-map...
- "resolve a module integrity metadata" seems suspicious. It should probably get an import map threaded to it?
- "fetch the descendants of and link" seems suspicious. Shouldn't it be getting threaded an import map from its various callers? (per the diagram above it.)
- "register an import map" has a broken assert
I'm also a bit unsure now about the cases where an import map is not passed in. When is that possible? (Except workers.) We have fallbacks to the Window's import map in those cases, but I'm now questioning whether they're sound.
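The snapshot-and-thread model discussed in this thread can be sketched as follows. This is only a minimal illustration, assuming a simplified import map with top-level `imports` and nothing else; `globalObject`, `startModuleLoad`, `resolveInLoad`, and the example URLs are hypothetical names, not spec algorithms or browser APIs.

```javascript
// Hypothetical model of a global object holding the current import map.
const globalObject = { importMap: { imports: { helper: "/helper.mjs" } } };

// Only the root of a loading operation reads the global's import map.
// The JSON round-trip takes a snapshot of this plain-data map, so later
// merges cannot affect a module graph that is already loading.
function startModuleLoad(global) {
  return { importMap: JSON.parse(JSON.stringify(global.importMap)) };
}

// Descendants resolve against the snapshot threaded down from the root.
function resolveInLoad(load, specifier) {
  return load.importMap.imports[specifier] ?? null;
}

const inFlight = startModuleLoad(globalObject);

// Simulates a setTimeout callback registering a new import map mid-load.
globalObject.importMap.imports.widget = "/widget.mjs";

// The in-flight graph does not see the new rule; a later load does.
const seenByInFlight = resolveInLoad(inFlight, "widget");
const seenByNextLoad = resolveInLoad(startModuleLoad(globalObject), "widget");
```

In this sketch every module in the in-flight graph resolves against the same map, which is the guarantee the threading is meant to provide.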
Thanks! editorial fixes first :)
Added
This was indeed lacking. Should be fixed now.
Here I think the current state is fine, as this is being called from all the root module entry points. Therefore we don't need to thread the import map into "fetch the descendants of and link", we need it to do the threading to its descendants, which it does by setting the map on the Record.
Indeed!!
Let me try to enumerate the cases:
I think that covers all of them but let me know if I missed something.
Here I think the current state is fine, as this is being called from all the root module entry points. Therefore we don't need to thread the import map into "fetch the descendants of and link", we need it to do the threading to its descendants, which it does by setting the map on the Record.
I think I see. Because the callers all operate on URLs or inline scripts, so they didn't need to do any resolution, and so didn't need an import map. It's only for the descendants that you start doing resolution and thus start needing an import map.
This does have the slightly-strange impact that given something like
<script type=module src=my-script.mjs></script>
<script type=importmap>
...
</script>
the modifications that appear after the <script type=module> will apply to the imports of my-script.mjs, because we delay snapshotting the import map until the response from the server comes back. That seems a bit unfortunate; WDYT?
Forgot to address this part. I agree that this would be weird, and hence it'd be better to pipe in the import map in those cases. I'll do that.
Done!
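The ordering concern raised above can be replayed with a small simulation. This is a sketch under stated assumptions: the `"dep"` rule stands in for the elided import map contents, and `importsSeenByMyScript` is a hypothetical helper, not a spec algorithm.

```javascript
// Replays the document above with a chosen snapshot point.
// snapshotAt is "request" (when the fetch starts) or "response".
function importsSeenByMyScript(snapshotAt) {
  const globalImports = {};
  let snapshot = null;

  // 1. <script type=module src=my-script.mjs> is parsed and its fetch starts.
  if (snapshotAt === "request") snapshot = { ...globalImports };

  // 2. The later <script type=importmap> is registered while the fetch is pending.
  globalImports["dep"] = "/dep.mjs"; // hypothetical rule

  // 3. The response for my-script.mjs arrives.
  if (snapshotAt === "response") snapshot = { ...globalImports };

  return Object.keys(snapshot);
}

const withLateSnapshot = importsSeenByMyScript("response");
const withEarlySnapshot = importsSeenByMyScript("request");
```

With the snapshot taken at response time, the later import map leaks into the imports of `my-script.mjs`; piping the map in when the fetch starts avoids that.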
I have an idea for something that might clean things up. Basically, I find the importMap being optional confusing. It's hard to know whether an algorithm is not getting an import map because we forgot, or because we're coming from a worker. And the fact that sometimes we fall back to the Window's import map, even though we're not in an obviously "top level" algorithm, is extra confusing. (For example, in "create a JavaScript module script".)
I think the following would clean that up:
- Move "Window's import map" (and Window's resolved module set?) to all global objects. Put them next to https://html.spec.whatwg.org/#in-error-reporting-mode . Add a note explaining that only Window objects have their import maps modified away from the initial empty import map, for now.
- Make all import map arguments mandatory.
- Always grab the global object's import map when appropriate. This should now be obviously only at top-level situations.
WDYT?
SG. done!
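The cleanup proposed above might look roughly like this. It is a sketch only: `createGlobal`, `loadModuleGraph`, and `resolveDescendant` are hypothetical names, and the import map is reduced to plain objects.

```javascript
// Every global object starts with an empty import map. For now, only
// Window-like globals ever have theirs modified.
function createGlobal(kind) {
  return { kind, importMap: { imports: {}, scopes: {} } };
}

// Non-top-level algorithm: the import map argument is mandatory, so a
// forgotten argument fails loudly instead of silently falling back.
function resolveDescendant(importMap, specifier) {
  if (importMap === undefined) {
    throw new TypeError("importMap is required");
  }
  return importMap.imports[specifier] ?? null;
}

// Top-level algorithm: the only place that grabs the global's import map.
function loadModuleGraph(global, specifier) {
  return resolveDescendant(global.importMap, specifier);
}

const windowGlobal = createGlobal("window");
windowGlobal.importMap.imports.dep = "/dep.mjs"; // hypothetical rule
const workerGlobal = createGlobal("worker");

const fromWindow = loadModuleGraph(windowGlobal, "dep");
const fromWorker = loadModuleGraph(workerGlobal, "dep");
```

Workers take the same code path as windows here and simply see an empty map.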
c934c48
to
17c1ab0
Compare
In case it's useful for the review process - I mapped the high-level relevant spec changes to Chromium's code.
Has an alternative design been considered that doesn't mutate a shared global map? For example perhaps associating maps to individual script tags?:
<!-- importmap="..." would only affect the loading of this graph -->
<script type="module" src="./entry.js" importmap="./entry.importmap.json"></script>
This alternative design could even allow for explaining importmaps in terms of import attributes, i.e. if you want to load a third-party module with a third-party importmap you could do:
import thirdParty from "./dist/third-party.js" with { importmap: "./dist/third-party.importmap.json" }
I almost gave a nearly-equivalent comment yesterday but was afraid I was misunderstanding something. So I'm glad you spoke up @Jamesernator. My draft example was literally:
await import("./dist/third-party.js", { withMap: "./dist/third-party.importmap.json"})
import thirdParty from "./dist/third-party.js" withMap "./dist/third-party.importmap.json"
Concerns with mutating a shared global map
Consider a dynamic
As I understand it, the proposal has side effects due to this being a global map. E.g. loading a browser extension can change the import behavior of non-browser-extension imports. Extensions often do not want side effects: case in point, the motivating use case at the top (this one) does not want side effects, it just wants to map imports for extension1, not for extension2. If extension1 globally mutates the import map for A, IMO extension-loading order creates an unnecessary race condition where extension1 overrides extension2's import map, along with extension2's developer having no good way to debug when and why their extension broke. (And even once they do figure out why, there is basically no solution: since telling the user to change the load order is impractical, they have to go back to square one of bundling everything to be reliable.)
Amazing to see this, I only had time to do a very very brief review but I really like the overall approach. Specifically, I have some concerns about algorithmic complexity, but haven't had a chance to look more carefully at the algorithm to determine if they are "resolvable" yet.
Then for the copying approach - if we have a well-defined deduping that won't change the nature of mappings, what is the reason for wanting to lock down resolutions during individual load operations? Will this locked map also affect dynamic import or is it only cloned through the static graph? And if so, it might seem strange having different resolution rules for dynamic import and static import, especially when ECMA-262 also maintains its own cache to ensure these are consistent for known imports.
I probably won't have time to do a thorough review of this, but @michaelficarra and I spent quite a while thinking about (some parts of) this problem back when import maps were first being discussed, so I want to call @yoavweiss's attention to this issue which @michaelficarra wrote at the time.
I don't see anything in the OP about the case where the second import map maps to something which is already mapped by the first. (Maybe I missed it?) For example:
First:
{
  "imports": {
    "/app/helper": "./helper/index.mjs"
  }
}
Second:
{
  "imports": {
    "helper": "/app/helper"
  }
}
What's the intended behavior in this case? My inclination is to say that this is equivalent to
{
  "imports": {
    "/app/helper": "./helper/index.mjs",
    "helper": "./helper/index.mjs"
  }
}
i.e., imports on the RHS of later maps are resolved in the context of earlier maps. This is to preserve the property that if you have a script with This is consistent with the way the web usually works: if one script does
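The behavior suggested in this comment (resolving the right-hand side of later maps through earlier maps) can be sketched as below. Note this illustrates the commenter's inclination, not necessarily what the PR specifies; `mergeThroughEarlierMap` is a hypothetical helper that works on flat `imports` objects with exact-match keys only.

```javascript
// Merges `second` into `first`, resolving each new target through `first`.
function mergeThroughEarlierMap(first, second) {
  const merged = { ...first };
  for (const [specifier, target] of Object.entries(second)) {
    // In this sketch an earlier rule for the same specifier wins.
    if (Object.hasOwn(merged, specifier)) continue;
    // If the earlier map already remaps the target, follow that mapping.
    merged[specifier] = Object.hasOwn(first, target) ? first[target] : target;
  }
  return merged;
}

const firstImports = { "/app/helper": "./helper/index.mjs" };
const secondImports = { helper: "/app/helper" };
const chained = mergeThroughEarlierMap(firstImports, secondImports);
// chained maps both "/app/helper" and "helper" to "./helper/index.mjs"
```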
This algorithm is looking really nice now, and the notes and examples are great to build understanding. Very happy to see this move forward.
Note that while it may be stating the obvious, it could be worth noting that warnings can be skipped when the values are equal for all merge operations.
One thing that's still worrying me a little is the size of the data being stored here. Consider an application with 1000 modules, where each of those modules has 5 imports; that's 5000 entries in this map. Since most imports are relative URLs and not bare specifiers, we are storing the specifier, baseURL and resolved URL across all 5000 entries. At 50 bytes a URL, two URLs, and 20 bytes per specifier, that might be around 600KiB for just a simple app. Of course there's probably lots of smart ways to optimize this, baseURLs being hierarchical, adding some extra string sharing etc.
I'm not saying it should be done necessarily, but there's one design tradeoff that could be made that might help reduce the data size by more than 50%, that could be worth discussing, and that's to not permit URL remapping for any full URL maps once that URL or URL prefix has been imported, regardless of scope shadowing.
This way, instead of storing every single specifier in the resolved module set, we can have two smaller sets: (1) the resolved URL set, and (2) the bare specifier resolved module set. This is based on the observation that relative specifiers are usually at least as common as bare specifiers, and often more so. We then don't need to store For that same 1000 module app, with 5 imports per module, maybe 2500 of the import specifiers were relative or URL imports and 2500 of the import specifiers were bare specifiers. We now store only 1000 entries in the imported URL set, and 2500 in the bare specifier module set as just parent + bare specifier, resulting in of the order of 50KiB + 175KiB = 225KiB at less than 40% of the size.
If the relative / full URL count is higher that also improves. The utility of URL remapping may not be lost if we don't support scoped remapping of URLs that have already been imported in dynamic import maps workflows, so that this might not be such a hard tradeoff to make. In many cases, the URLs being remapped in these workflows are local to the scope anyway and part of the private contracts of the package where the URL being mapped wouldn't have been otherwise imported, so wanting to support this may go against concepts of package and scope isolation anyway. Would be interested to hear thoughts further.
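The back-of-the-envelope estimate in this comment can be reproduced as follows. The byte sizes and counts are the comment's own rough assumptions (50 bytes per URL, 20 bytes per specifier, a 2500/2500 split), not measurements; results are in bytes, which the comment loosely rounds to "KiB".

```javascript
const URL_BYTES = 50;
const SPECIFIER_BYTES = 20;
const moduleCount = 1000;
const importsPerModule = 5;

// Current design: one record per import, each holding two URLs plus a specifier.
const recordCount = moduleCount * importsPerModule;
const currentBytes = recordCount * (2 * URL_BYTES + SPECIFIER_BYTES);

// Alternative: one URL per imported module, plus parent URL and specifier
// for each bare-specifier import only.
const bareSpecifierImports = 2500;
const alternativeBytes =
  moduleCount * URL_BYTES +
  bareSpecifierImports * (URL_BYTES + SPECIFIER_BYTES);

const ratio = alternativeBytes / currentBytes;
// currentBytes is 600000, alternativeBytes is 225000, ratio is 0.375
```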
Thanks!!
I wouldn't expect implementations to implement this algorithm as is. E.g. the information needed for
I wouldn't consider an app that loads 5000 modules to be "simple". But again, real-life usage can guide the different tradeoffs we can take here.
Can you expand on that? I don't understand how that would help. Also, given that "https:/" is a common URL prefix, that seems rather limiting. (unless I misunderstand you)
I mean still 1000 modules but with 5000 imports / resolved module set records. Specifier count which is equal to resolved module set size is much higher than module count.
Hmm, looking more closely at the merging algorithm I see now you don't check
Perhaps an example will help, say you start with:
{
  "imports": {
    "/app.js": "/app-dev.js"
  },
  "scopes": {
    "/dep/": {
      "deep-dep": "/deep-dep/index.js"
    }
  }
}
And then later on you want to add:
{
  "scopes": {
    "/another-dep/": {
      "deep-dep": "/deep-dep/index.js"
    }
  }
}
You would only need to track If only tracking bare specifier imports, the restriction here would then be that the following wouldn't be supported:
{
  "scopes": {
    "/another-dep/": {
      "/app.js": "/breaks.js"
    }
  }
}
The above would be supported just fine though if The tradeoff is that we aren't having to track every single URL that is imported under every single scope, but we are just tracking bare specifiers, and then ban URL remappings regardless of scopes for URLs already imported.
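The tradeoff proposed in this comment can be sketched roughly as below. This is an illustration under simplifying assumptions: specifiers are compared as written (no normalization or prefix matching), and `isUrlLike`, `canAddScopedRule`, and the `state` shape are hypothetical, not part of the PR.

```javascript
// Treats path-like specifiers and parseable absolute URLs as URL-like;
// everything else is considered a bare specifier.
function isUrlLike(specifier) {
  if (/^(\/|\.\/|\.\.\/)/.test(specifier)) return true;
  try {
    new URL(specifier);
    return true;
  } catch {
    return false;
  }
}

function canAddScopedRule(specifier, scopePrefix, state) {
  if (isUrlLike(specifier)) {
    // URLs have a global meaning: once imported, no scope may remap them.
    return !state.importedUrls.has(specifier);
  }
  // Bare specifiers are blocked only if a referrer inside this scope
  // has already resolved them.
  return !state.resolvedBare.some(
    (pair) => pair.specifier === specifier && pair.referrer.startsWith(scopePrefix)
  );
}

// Hypothetical state after loading the first map's modules.
const state = {
  importedUrls: new Set(["/app.js"]),
  resolvedBare: [{ referrer: "/dep/index.js", specifier: "deep-dep" }],
};

const bareInNewScope = canAddScopedRule("deep-dep", "/another-dep/", state); // true
const urlInNewScope = canAddScopedRule("/app.js", "/another-dep/", state); // false
const bareInUsedScope = canAddScopedRule("deep-dep", "/dep/", state); // false
```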
With regards to this comment, looking at your implementation it seems like it doesn't check equality of entries currently when generating warnings to avoid unnecessary warnings over duplicate entries.
I don't believe that's true, as the specifier that goes into the resolved module set is a normalized specifier.
Yeah, we're not currently checking equality for duplicate entries.
Can you expand on that? Do you expect every module to be called using multiple specifiers?
I see that, but the scope merging algorithm itself doesn't seem to check this
Yeah I mean this, where having a data structure that just tracks bare specifier usage would be much smaller.
FWIW, all this talk about memory savings made me realize that the current ImportMap implementation in Chromium doesn't use AtomicStrings, which can probably save a bunch of memory, especially once we start storing prefixes for the merge algorithm. https://chromium-review.googlesource.com/c/chromium/src/+/5905067 fixes that. Thanks!!
More broadly than AtomicStrings, I think that if storing this data would turn out to be a memory issue, there are many ways for implementations to deal with that. E.g. store hashes instead of full URLs. Limiting the merge algorithm to only support scopes for bare specifiers feels to me like an arbitrary premature optimization that doesn't necessarily take the priority of constituencies into account.
The string data still needs to be stored, since hashes wouldn't be foolproof here unless they are as long as the URLs anyway.
Scopes would still be supported for URLs fine, so long as those URLs have not been loaded - we just take advantage of the fact that URLs have a global meaning unlike bare specifiers. Do you have real use cases for wanting to change the interpretation of a URL between scopes even after that URL has been loaded?
Let's move this discussion to a more practical plane - do you have a test site that represents the case you're concerned about? We could use such a site to measure the memory impact of the current implementation to see if it is a cause for concern.
If there's still significant discussion it would be good if this PR was backed by an issue where that discussion could take place. PRs don't lend themselves to significant discussion and at least historically at some point GitHub will give up and make the PR unusable.
The relevant public issues are WICG/import-maps#248 and WICG/import-maps#92 (we should really archive that repo. /cc @domenic ) From my perspective, the discussion here revolves around this PR's algorithms, so it does feel appropriate to have it here. At the same time, if GH borks the PR, that's not great..
How do we imagine telling web developers the result of the merge with the current API if we wanted to do that? Additional information on the load event or some such?
<var>newMap</var> and a <span>module specifier map</span> <var>oldMap</var>:</p>

<ol>
<li><p>Let <var>mergedMap</var> be a deep copy of <var>oldMap</var>.</p></li>
Infra calls this clone, but a deep clone is not defined I think.
Infra's clone is explicitly shallow. Happy to define a "deep clone" in infra if that's better
I think it'll be tricky as we don't have a clone operation for arbitrary values. :/ It's essentially an issue similar to whatwg/infra#643
My position is that we should use an undefined "everyone hopefully knows what this means" term in the HTML spec, like "deep copy", until we can make the time to flesh it out in Infra. So I like the PR as-is.
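The difference between a shallow clone and the "deep copy" discussed here can be illustrated with plain JavaScript objects. This is only an analogy, assuming a nested map as a stand-in; Infra maps and spec structs are not JavaScript objects, and the JSON round-trip is just one way to deep-copy plain data.

```javascript
const oldMap = { scopes: { "/dep/": { "deep-dep": "/deep-dep/index.js" } } };

// Shallow clone: the top-level object is new, nested maps are shared.
const shallowCopy = { ...oldMap };

// Deep copy: nested maps are duplicated as well.
const deepCopy = JSON.parse(JSON.stringify(oldMap));

// Editing through the shallow clone leaks into the original.
shallowCopy.scopes["/dep/"]["deep-dep"] = "/changed.js";

const originalAfterEdit = oldMap.scopes["/dep/"]["deep-dep"]; // "/changed.js"
const deepCopyAfterEdit = deepCopy.scopes["/dep/"]["deep-dep"]; // "/deep-dep/index.js"
```

A merge that mutates a shallow clone could therefore alter the existing import map, which is why the step asks for a deep copy.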
We could. In discussions with @hiroshige-g, we weren't sure there's a use case for it, but e.g. a list of ignored rules hanging off of the load event could be a way for us to communicate that in the future, if we see there's demand.
In general the processed form of the import map has not been accessible up to this point. So far there hasn't been a use case for it, but WICG/import-maps#128 tracks that feature request.
Introduction
Import maps in their current form provide multiple benefits to web developers. They enable them to avoid cache invalidation cascades, and to be able to work with more ergonomic bare module identifiers, mapping them to URLs in a convenient way, without worrying about versions when importing.
At the same time, the current import map setup suffers from fragility. Only a single import map can be loaded per document, and it can only be loaded before any module script is loaded. Once a single module script is loaded, import maps are disallowed.
That creates a situation where developers have to think twice (or more) before using module scripts in situations that may introduce import maps further down in the document. It also means that using import maps can carry a risk unless you’re certain you can control all the module scripts loaded on the page.
Beyond that, the fact that import maps have to be loaded before any module means that the map itself acts as a blocking resource to any module functionality. Large SPAs that want to use modules have to download the map of all potential modules they may need during the app’s lifetime ahead of time.
So, it seems like there’s room for improvement. Enabling more dynamic import maps would allow developers to avoid these issues and fully benefit from import maps’ caching and ergonomic advantages without incurring a cost when it comes to stability or performance.
At the same time, the current static design gives us determinism and isn’t racy. A module identifier that resolves to a certain module will continue to do so throughout the lifetime of the document. It would be good to keep that characteristic.
Objectives
Goals
Non-Goals
Use Cases
Third party scripts
When third party scripts integrate themselves to web pages today, they cannot do that as ES modules without taking on some risk. That risk varies somewhat, depending on their form of integration.
Injected without developer supervision
That could include third party scripts injected by the CDN, by a CMS or some other automated system that isn’t content-aware.
For such scripts to be loaded as ES modules, they have to make sure that they are not loaded before any import maps in the content.
They can do that by:
Developer-injected snippets
For snippets-based 3Ps, they need to provide instructions so that the developer is aware of import maps in their page and only injects the snippet after it. That may or may not be a realistic thing to ask. It’d definitely increase the integration’s complexity, resulting in a higher percentage of failures or support calls.
Content Management Systems
Content management systems often have markup and code arriving from multiple different sources. Site owners, theme developers and application/extension/plug-in developers all take part in generating the final markup of the page delivered to the user, which often contains lots of scripts. Some of that code can be static, while other parts can vary per user.
If any of that code contains an import map, extreme caution needs to be taken when integrating all these different script entry points, if any of them is an ES module.
Browser Extensions
A similar problem exists with browser extensions, where if extension-injected code wants to use ES modules or import maps, it needs to verify ahead of time that it doesn’t collide with the content itself and where the code is added relative to the rest of the page.
Large Single-Page Apps
Serving hundreds to thousands of different modules is a reality for large SPAs. While bundling is used to reduce the loading-performance cost of modules, it doesn’t always make sense to bundle in later stages of the application lifetime. Bundling can reduce the weight of modules over the network (by improving compression ratios), but it can also cause over-fetching and less-granular caching, which can result in frequent invalidations.
So apps end up with several thousands of modules that may load during the lifetime of the app, using dynamic import.
Using import maps can significantly help such apps avoid cache invalidation cascades, but it also presents a challenge.
An import map for such a site needs to include all the thousands of different modules it may import, and it needs to do that before any module loads. As such, the quite-large import map would be blocking any module-based functionality. That’s a significant performance tradeoff.
Usage examples
There are two cases when rules of the new import map don't get merged into the existing one.
The new import map rule has the exact same scope and specifier as a rule in the existing import map. We'll call that "conflicting rule".
The new import map rule may impact the resolution of an already resolved module. We'll call that "impacted already resolved module".
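The two cases above can be sketched for top-level rules as follows. This is a simplified illustration, not the PR's algorithm: it uses exact-match specifiers only (no prefix rules, no scopes), and `mergeTopLevelImports` and the example URLs are hypothetical.

```javascript
// existing/incoming: flat "imports" objects.
// resolvedSpecifiers: Set of specifiers already resolved at the top level.
function mergeTopLevelImports(existing, incoming, resolvedSpecifiers) {
  const merged = { ...existing };
  const ignored = [];
  for (const [specifier, url] of Object.entries(incoming)) {
    if (Object.hasOwn(merged, specifier)) {
      // Same specifier as an existing rule: the existing rule persists.
      ignored.push({ specifier, reason: "conflicting rule" });
    } else if (resolvedSpecifiers.has(specifier)) {
      // The rule would change an already-resolved module: drop it.
      ignored.push({ specifier, reason: "impacted already resolved module" });
    } else {
      merged[specifier] = url;
    }
  }
  return { merged, ignored };
}

const result = mergeTopLevelImports(
  { "/app/helper": "/app/helper.v1.mjs" },
  {
    "/app/helper": "/app/helper.v2.mjs", // conflicts with the existing rule
    "/app/util": "/app/util.v2.mjs", // already resolved
    "/app/new": "/app/new.mjs", // merged
  },
  new Set(["/app/util"])
);
```

Here `result.merged` keeps the existing `/app/helper` rule and gains only `/app/new`; the other two incoming rules end up in `result.ignored`.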
Two import maps with no conflicts
When the new import map has no conflicting rules, and there are no impacted resolved modules, the resulting map would be a combination of the new and existing maps. Rules that would have individually impacted similar modules (e.g. "/app/" and "/app/helper") but are not an exact match are not conflicting, and all make it to the merged map.
So, the following existing and new import maps:
Would be equivalent to the following single import map:
New import map defining an already-resolved specifier
When the new import map impacts an already resolved module, that rule gets dropped from the import map.
So, if the top-level resolved module set already contains the pair (null, "/app/helper"), the following new import map:
Would be equivalent to the following one:
New import map defining an already-resolved specifier in a specific scope
The same is true for rules defined in specific scopes. If the resolved module set contains the pair ("/app/main.mjs", "/app/helper"), the following new import map:
Would similarly be equivalent to:
The script in the pair is the script object itself, rather than its URL, so these examples are somewhat simplistic in that regard.
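The scoped case can be sketched in the same simplified spirit. As the note above says, the real pair holds the script object; this sketch assumes URL strings instead, uses exact-match specifiers, and `filterScopedRules` and the example URLs are hypothetical.

```javascript
// newScopes: { scopePrefix: { specifier: url } }
// resolvedModuleSet: array of { referrer, specifier } pairs, where a null
// referrer stands for a top-level resolution.
function filterScopedRules(newScopes, resolvedModuleSet) {
  const kept = {};
  for (const [scopePrefix, rules] of Object.entries(newScopes)) {
    kept[scopePrefix] = {};
    for (const [specifier, url] of Object.entries(rules)) {
      // A rule is impacted if a referrer inside its scope already
      // resolved the same specifier.
      const impacted = resolvedModuleSet.some(
        (pair) =>
          pair.referrer !== null &&
          pair.referrer.startsWith(scopePrefix) &&
          pair.specifier === specifier
      );
      if (!impacted) kept[scopePrefix][specifier] = url;
    }
  }
  return kept;
}

const keptScopes = filterScopedRules(
  {
    "/app/": { "/app/helper": "/app/helper.v2.mjs" },
    "/vendor/": { "/app/helper": "/vendor/helper.mjs" },
  },
  [{ referrer: "/app/main.mjs", specifier: "/app/helper" }]
);
```

Only the rule under `/app/` is dropped, because only that scope covers the referrer that already resolved `/app/helper`; the `/vendor/` rule is kept.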
Already-resolved specifier and multiple rules redefining it
We could also have cases where a single already-resolved module specifier has multiple rules for its resolution, depending on the referring script. In such cases, only the relevant rules would not be added to the map.
For example, if the top-level resolved module set contains the pair ("/app/main.mjs", "/app/helper"), the following new import map:
Would be equivalent to:
This is achieved by having the merge algorithm work on a copy of the resolved module set and remove each (referring script, specifier) pair from that copy once it has resulted in a rule being ignored.
Two import maps with conflicting rules
When the new import map has conflicting rules to the existing import map, with no impacted already resolved modules, the existing import map rules persist.
For example, the following existing and new import maps:
Would be equivalent to the following single import map:
High-level design
At a high-level, we want a module resolution cache that will ensure that a resolved module identifier always resolves to the same module. That is implemented using the "resolved module set", which ensures that URLs for modules that were already resolved cannot be added to future import maps.
We also want to ensure that top-level imports that start loading a module tree won’t have that tree change “under their feet” due to an import map that was loaded in parallel. That is achieved by providing a copy of the import map to the module resolution algorithm of these top-level modules and propagating it recursively down their module trees.
And finally, we want a way to create a single, coherent import map from multiple import map scripts loaded on the document. That is done with the "merge new and existing import maps" algorithm.
(See WHATWG Working Mode: Changes for more details.)
/infrastructure.html ( diff )
/scripting.html ( diff )
/webappapis.html ( diff )