Dynamic import maps #10528

Open
wants to merge 55 commits into base: main


yoavweiss
Contributor

@yoavweiss commented Jul 30, 2024

Introduction

Import maps in their current form provide multiple benefits to web developers. They enable developers to avoid cache invalidation cascades and to work with more ergonomic bare module identifiers, mapping them to URLs in a convenient way, without worrying about versions when importing.

At the same time, the current import map setup suffers from fragility. Only a single import map can be loaded per document, and it can only be loaded before any module script is loaded. Once a single module script is loaded, import maps are disallowed.
That creates a situation where developers have to think twice (or more) before using module scripts in situations that may introduce import maps further down in the document. It also means that using import maps can carry a risk unless you’re certain you can control all the module scripts loaded on the page.

Beyond that, the fact that import maps have to be loaded before any module means that the map itself acts as a blocking resource for any module functionality. Large SPAs that want to use modules have to download, ahead of time, a map of all the modules they may need during the app’s lifetime.

So, it seems like there’s room for improvement. Enabling more dynamic import maps would allow developers to avoid these issues and fully benefit from import maps’ caching and ergonomic advantages without incurring a cost when it comes to stability or performance.

At the same time, the current static design gives us determinism and isn’t racy. A module identifier that resolves to a certain module will continue to do so throughout the lifetime of the document. It would be good to keep that characteristic.

Objectives

Goals

  • Increase robustness when using ES modules and import maps
  • Enable expanding the Window’s import map throughout its lifetime
  • Satisfy the ECMAScript HostLoadImportedModule requirement that multiple calls will always resolve to the same module
  • Minimize race conditions that can result in different module resolutions under different loading sequences of the same page.

Non-Goals

  • Provide a programmatic way to expand or modify the Window’s import map (out of scope for the current effort)
  • Completely avoid race conditions that can result in different module resolutions based on network conditions
    • Such races are already possible today (e.g. if an import map is dynamically injected by a classic script which may or may not run before a module is loaded)
    • Specifically for dynamic modules, requiring this would conflict with the “expanding the import map over time” goal.

Use Cases

Third party scripts

When third-party scripts integrate themselves into web pages today, they cannot do so as ES modules without taking on some risk. That risk varies somewhat, depending on their form of integration.

Injected without developer supervision

This includes third-party scripts injected by a CDN, by a CMS, or by some other automated system that isn’t content-aware.

For such scripts to be loaded as ES modules, they have to make sure that they are not loaded before any import maps in the content.

They can do that by:

  • Loading at the bottom of the page, which may or may not correspond to the point at which they need to load in order to function optimally.
  • Buffering the content and validating that no import maps are present, which can incur performance penalties.

Developer-injected snippets

Snippet-based third parties (3Ps) need to provide instructions so that developers are aware of any import maps in their page and only inject the snippet after them. That may or may not be a realistic thing to ask, and it would definitely increase the integration’s complexity, resulting in a higher percentage of failures or support calls.

Content Management Systems

Content management systems often have markup and code arriving from multiple different sources. Site owners, theme developers and application/extension/plug-in developers all take part in generating the final markup of the page delivered to the user, which often contains lots of scripts. Some of that code can be static, while other parts can vary per user.

If any of that code contains an import map, extreme caution needs to be taken when integrating all these different script entry points, if any of them is an ES module.

Browser Extensions

A similar problem exists with browser extensions: if extension-injected code wants to use ES modules or import maps, it needs to verify ahead of time that it doesn’t collide with the page’s own content, and to take into account where the code is added relative to the rest of the page.

Large Single-Page Apps

Serving hundreds to thousands of different modules is a reality for large SPAs. While bundling is used to reduce the loading cost of modules, in later stages of the application lifetime it doesn’t always make sense to bundle: while bundling can reduce the weight of modules over the network (by improving compression ratios), it can also cause over-fetching and less granular caching, which can result in frequent invalidations.

So apps end up with several thousand modules that may load during the lifetime of the app, using dynamic import.
Using import maps can significantly help such apps avoid cache invalidation cascades, but it also presents a challenge.
An import map for such a site needs to include all of the thousands of different modules it may import, and it needs to be loaded before any module loads. As a result, that large import map blocks all module-based functionality until it arrives, which is a significant performance tradeoff.

Usage examples

There are two cases in which rules of the new import map don't get merged into the existing one:

A rule in the new import map has exactly the same scope and specifier as a rule in the existing import map. We'll call that a "conflicting rule".

A rule in the new import map may impact the resolution of an already resolved module. We'll call that an "impacted already resolved module".

Two import maps with no conflicts

When the new import map has no conflicting rules, and there are no impacted resolved modules, the resulting map would be a combination of the new and existing maps. Rules that would have individually impacted similar modules (e.g. "/app/" and "/app/helper") but are not an exact match are not conflicting, and all make it to the merged map.

So, the following existing and new import maps:

{
  "imports": {
    "/app/": "./original-app/"
  }
}
{
  "imports": {
    "/app/helper": "./helper/index.mjs"
  },
  "scopes": {
    "/js": {
      "/app/": "./js-app/"
    }
  }
}

Would be equivalent to the following single import map:

{
  "imports": {
    "/app/": "./original-app/",
    "/app/helper": "./helper/index.mjs"
  },
  "scopes": {
    "/js": {
      "/app/": "./js-app/"
    }
  }
}

New import map defining an already-resolved specifier

When a rule in the new import map impacts an already resolved module, that rule gets dropped from the new import map.

So, if the top-level resolved module set already contains the pair (null, "/app/helper"), the following new import map:

{
  "imports": {
    "/app/helper": "./helper/index.mjs",
    "lodash": "/node_modules/lodash-es/lodash.js"
  }
}

Would be equivalent to the following one:

{
  "imports": {
    "lodash": "/node_modules/lodash-es/lodash.js"
  }
}

New import map defining an already-resolved specifier in a specific scope

The same is true for rules defined in specific scopes. If the resolved module set contains the pair ("/app/main.mjs", "/app/helper"), the following new import map:

{
  "scopes": {
    "/app/": {
      "/app/helper": "./helper/index.mjs"
    }
  },
  "imports": {
    "lodash": "/node_modules/lodash-es/lodash.js"
  }
}

Would similarly be equivalent to:

{
  "imports": {
    "lodash": "/node_modules/lodash-es/lodash.js"
  }
}

The referring script in each pair is the script object itself rather than its URL, so these examples are somewhat simplified in that regard.
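The filtering in the two examples above can be sketched as follows. This is a minimal, hypothetical JavaScript model, not the spec algorithm: import maps are plain JSON objects, the resolved module set is an array of { referrer, specifier } pairs with referrers as URL strings (or null for the top level), and keys are compared as strings rather than parsed URLs. The function names are illustrative.

```javascript
// Hypothetical sketch (not spec text): drop the rules of a new import map
// that would change how an already-resolved (referrer, specifier) pair
// resolves. Keys and URLs are compared as plain strings, and a null referrer
// (top level) is assumed never to fall inside a scope.

// A rule key impacts a specifier if it matches exactly, or if it is a
// trailing-slash prefix key such as "/app/" and the specifier starts with it.
function ruleImpacts(key, specifier) {
  return key === specifier || (key.endsWith("/") && specifier.startsWith(key));
}

// A scope applies to a referrer if the prefix equals it, or if the prefix
// ends with "/" and the referrer's URL starts with it.
function scopeApplies(scopePrefix, referrer) {
  if (referrer === null) return false;
  return scopePrefix === referrer ||
    (scopePrefix.endsWith("/") && referrer.startsWith(scopePrefix));
}

// resolvedSet: array of { referrer, specifier } pairs already resolved.
function dropImpactedRules(newMap, resolvedSet) {
  const result = { imports: {}, scopes: {} };

  // Top-level rules can affect any referrer, so any matching pair drops them.
  // (This is conservative: it ignores that a scope might shadow the rule.)
  for (const [key, target] of Object.entries(newMap.imports || {})) {
    if (!resolvedSet.some(({ specifier }) => ruleImpacts(key, specifier))) {
      result.imports[key] = target;
    }
  }

  // Scoped rules only affect referrers inside that scope.
  for (const [prefix, rules] of Object.entries(newMap.scopes || {})) {
    const kept = {};
    for (const [key, target] of Object.entries(rules)) {
      const impacted = resolvedSet.some(({ referrer, specifier }) =>
        scopeApplies(prefix, referrer) && ruleImpacts(key, specifier));
      if (!impacted) kept[key] = target;
    }
    // An empty scope is equivalent to no scope, so leave it out.
    if (Object.keys(kept).length > 0) result.scopes[prefix] = kept;
  }
  return result;
}
```

For the first example, calling dropImpactedRules(newMap, [{ referrer: null, specifier: "/app/helper" }]) keeps only the lodash rule.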

Already-resolved specifier and multiple rules redefining it

We could also have cases where a single already-resolved module specifier has multiple rules for its resolution, depending on the referring script. In such cases, only the relevant rules would not be added to the map.

For example, if the top-level resolved module set contains the pair ("/app/main.mjs", "/app/helper"), the following new import map:

{
  "scopes": {
    "/app/": {
      "/app/helper": "./helper/index.mjs"
    },
    "/app/vendor/": {
      "/app/": "./vendor_helper/"
    },
    "/vendor/": {
      "/app/helper": "./helper/vendor_index.mjs"
    }
  },
  "imports": {
    "lodash": "/node_modules/lodash-es/lodash.js",
    "/app/": "./general_app_path/",
    "/app/helper": "./other_path/helper/index.mjs"
  }
}

Would be equivalent to:

{
  "scopes": {
    "/vendor/": {
      "/app/helper": "./helper/vendor_index.mjs"
    }
  },
  "imports": {
    "lodash": "/node_modules/lodash-es/lodash.js"
  }
}

This is achieved by having the merge algorithm use a copy of the resolved module set, and remove (referring script, specifier) pairs from that copy once they have already resulted in a rule being ignored.

Two import maps with conflicting rules

When the new import map has conflicting rules to the existing import map, with no impacted already resolved modules, the existing import map rules persist.

For example, the following existing and new import maps:

{
  "imports": {
    "/app/helper": "./helper/index.mjs",
    "lodash": "/node_modules/lodash-es/lodash.js"
  }
}
{
  "imports": {
    "/app/helper": "./main/helper/index.mjs"
  }
}

Would be equivalent to the following single import map:

{
  "imports": {
    "/app/helper": "./helper/index.mjs",
    "lodash": "/node_modules/lodash-es/lodash.js"
  }
}
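Both merge cases above (no conflicts, and conflicting rules where the existing rule persists) can be modelled in a few lines. The sketch below is hypothetical: it assumes the new map's impacted rules were already dropped, uses plain JSON objects, and collects conflicting keys so that a caller could warn about them. The function names are illustrative.

```javascript
// Hypothetical sketch (not spec text): combine an existing import map with a
// new one whose impacted rules have already been removed. A conflicting rule
// (same scope, same specifier key) keeps the existing value; all other rules
// from the new map are added.

function mergeSpecifierMaps(existing, incoming, conflicts, scope) {
  const merged = { ...existing };
  for (const [key, target] of Object.entries(incoming)) {
    if (Object.prototype.hasOwnProperty.call(merged, key)) {
      // Existing rule persists; record the conflict (e.g. for a console warning).
      conflicts.push({ scope, key });
      continue;
    }
    merged[key] = target;
  }
  return merged;
}

function mergeImportMaps(existing, incoming) {
  const conflicts = [];
  const imports = mergeSpecifierMaps(
    existing.imports || {}, incoming.imports || {}, conflicts, null);
  const scopes = { ...(existing.scopes || {}) };
  for (const [prefix, rules] of Object.entries(incoming.scopes || {})) {
    scopes[prefix] = mergeSpecifierMaps(scopes[prefix] || {}, rules, conflicts, prefix);
  }
  return { map: { imports, scopes }, conflicts };
}
```

Note that "/app/" and "/app/helper" are different keys, so they do not conflict even though they would match overlapping specifiers.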

High-level design

At a high level, we want a module resolution cache that ensures that a resolved module identifier always resolves to the same module. That is implemented using the "resolved module set", which ensures that rules remapping already-resolved module identifiers cannot be added by future import maps.

We also want to ensure that top-level imports that start loading a module tree won’t have that tree change “under their feet” due to an import map that was loaded in parallel. That is achieved by providing a copy of the import map to the module resolution algorithm of these top-level modules and propagating it recursively down their module trees.

And finally, we want a way to create a single, coherent import map from multiple import map scripts loaded on the document. That is done with the "merge new and existing import maps" algorithm.
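The first two ideas above can be modelled as follows. This is a hypothetical sketch with illustrative names (GlobalState, loadModuleTree), a plain object graph standing in for network fetches, and only exact top-level "imports" keys: the global's map is read once, when the top-level load starts, that copy is used for every module in the tree, and each resolution is recorded in the resolved module set.

```javascript
// Hypothetical model of the high-level design (names are illustrative): one
// import map and one resolved module set per global, a deep copy of the map
// taken when a top-level module load starts, and that copy threaded down to
// every module in the tree. Only exact top-level "imports" keys are handled.

class GlobalState {
  constructor() {
    this.importMap = { imports: {} };
    this.resolvedModuleSet = []; // { referrer, specifier } pairs
  }
}

function resolveSpecifier(global, importMap, referrer, specifier) {
  // Record the pair so later import maps can't remap it.
  global.resolvedModuleSet.push({ referrer, specifier });
  return Object.prototype.hasOwnProperty.call(importMap.imports, specifier)
    ? importMap.imports[specifier]
    : null;
}

// graph maps a module URL to the specifiers it imports (standing in for
// fetching and parsing). onModule runs before each module's imports are
// resolved, so a caller can simulate a new import map arriving mid-load.
function loadModuleTree(global, rootURL, graph, onModule = () => {}) {
  // The only read of the global's map: the snapshot taken at the root.
  const snapshot = JSON.parse(JSON.stringify(global.importMap));
  const resolutions = [];
  const visited = new Set();
  const visit = (url) => {
    if (visited.has(url)) return;
    visited.add(url);
    onModule(url);
    for (const specifier of graph[url] || []) {
      const target = resolveSpecifier(global, snapshot, url, specifier);
      resolutions.push([url, specifier, target]);
      if (target !== null) visit(target);
    }
  };
  visit(rootURL);
  return resolutions;
}
```

If the global's map is replaced while "/a.mjs" is being processed, imports inside that tree still resolve through the snapshot taken at the root.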

(See WHATWG Working Mode: Changes for more details.)


/infrastructure.html ( diff )
/scripting.html ( diff )
/webappapis.html ( diff )

@yoavweiss marked this pull request as draft July 30, 2024 07:55
@yoavweiss marked this pull request as ready for review July 31, 2024 10:57
Member

@domenic left a comment

I ended up doing a relatively detailed review anyway, although for some repeated editorial issues I stopped commenting.

I have two major questions:

  • The merge algorithm needs examples, and maybe explanatory text. I can follow most of the steps (modulo some bugs), but I can't figure out the intent. The examples can either use the JSON syntax, or the normalized syntax seen below https://html.spec.whatwg.org/#parse-an-import-map-string if that is helpful in giving extra clarity. The impact of the resolved set is particularly unclear.

  • I don't understand why the import map is being passed around so much. There's still always one import map per global, and it's easy to get to that global from any algorithm or from the "script" struct. At least one instance of this seems completely redundant, which I commented on. But e.g. why are you storing the import map in [[HostDefined]]? I realize there's probably some complexity here at the particular point in time when you're merging import maps and thus the global's import map changes, but that should be able to happen completely discretely between script parsing and execution, so I don't see why scripts should need to track individual import maps separate from the global one.

@yoavweiss changed the title from "Dynamic module imports" to "Dynamic import maps" Aug 1, 2024
Contributor Author

@yoavweiss left a comment

Thanks! Fixed some comments and addressing the rest soon!

@yoavweiss
Contributor Author

  • I don't understand why the import map is being passed around so much. There's still always one import map per global, and it's easy to get to that global from any algorithm or from the "script" struct. At least one instance of this seems completely redundant, which I commented on. But e.g. why are you storing the import map in [[HostDefined]]? I realize there's probably some complexity here at the particular point in time when you're merging import maps and thus the global's import map changes, but that should be able to happen completely discretely between script parsing and execution, so I don't see why scripts should need to track individual import maps separate from the global one.

My thinking was that we need to do that in order to guarantee that once we're parsing a module tree, all modules in the tree would be resolved by the same import map. E.g. I thought it is possible that a setTimeout would inject a new import map while a module script is being downloaded and parsed, and that new import map would start taking effect after some modules were resolved but before others.
If that's not possible for some reason, I'm happy to revert these changes.

@yoavweiss requested a review from domenic August 1, 2024 18:59
Member

@domenic left a comment

This is very helpful, thanks!

My thinking was that we need to do that in order to guarantee that once we're parsing a module tree, all modules in the tree would be resolved by the same import map. E.g. I thought it is possible that a setTimeout would inject a new import map while a module script is being downloaded and parsed, and that new import map would start taking effect after some modules were resolved but before others.
If that's not possible for some reason, I'm happy to revert these changes.

That makes perfect sense.

Given this, we should explain this in the spec, maybe around #concept-window-import-map. With a note that in general only the root of a loading operation will access concept-window-import-map, and otherwise it'll be threaded through.

With that frame, auditing all the call sites of concept-window-import-map...

  • "resolve a module integrity metadata" seems suspicious. It should probably get an import map threaded to it?
  • "fetch the descendants of and link" seems suspicious. Shouldn't it be getting threaded an import map from its various callers? (per the diagram above it.)
  • "register an import map" has a broken assert

I'm also a bit unsure now about the cases where an import map is not passed in. When is that possible? (Except workers.) We have fallbacks to the Window's import map in those cases, but I'm now questioning whether they're sound.

Contributor Author

@yoavweiss left a comment

Thanks! editorial fixes first :)

@yoavweiss
Contributor Author

Given this, we should explain this in the spec, maybe around #concept-window-import-map. With a note that in general only the root of a loading operation will access concept-window-import-map, and otherwise it'll be threaded through.

Added

  • "resolve a module integrity metadata" seems suspicious. It should probably get an import map threaded to it?

This was indeed lacking. Should be fixed now.

  • "fetch the descendants of and link" seems suspicious. Shouldn't it be getting threaded an import map from its various callers? (per the diagram above it.)

Here I think the current state is fine, as this is being called from all the root module entry points. Therefore we don't need to thread the import map into "fetch the descendants of and link", we need it to do the threading to its descendants, which it does by setting the map on the Record.

  • "register an import map" has a broken assert

Indeed!!

@yoavweiss
Contributor Author

I'm also a bit unsure now about the cases where an import map is not passed in. When is that possible? (Except workers.) We have fallbacks to the Window's import map in those cases, but I'm now questioning whether they're sound.

Let me try to enumerate the cases:

I think that covers all of them but let me know if I missed something.

@yoavweiss requested a review from domenic August 2, 2024 14:50
Member

@domenic left a comment

Here I think the current state is fine, as this is being called from all the root module entry points. Therefore we don't need to thread the import map into "fetch the descendants of and link", we need it to do the threading to its descendants, which it does by setting the map on the Record.

I think I see. Because the callers all operate on URLs or inline scripts, so they didn't need to do any resolution, and so didn't need an import map. It's only for the descendants that you start doing resolution and thus start needing an import map.

This does have the slightly-strange impact that given something like

<script type=module src=my-script.mjs></script>
<script type=importmap>
...
</script>

the modifications that appear after the <script type=module> will apply to the imports of my-script.mjs, because we delay snapshotting the import map until the response from the server comes back. That seems a bit unfortunate; WDYT?
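To make the timing question concrete, here is a hypothetical sketch (illustrative names, exact top-level keys only) contrasting a snapshot taken when the module fetch starts with one taken when the response arrives. In the first case, an import map registered in between does not affect the module's imports.

```javascript
// Hypothetical sketch of the ordering question (names are illustrative): a
// module script starts fetching, then a new import map is registered before
// the response arrives. Only exact top-level "imports" keys are modelled.

function copyImportMap(map) {
  return JSON.parse(JSON.stringify(map));
}

// Snapshotting when the fetch starts pins the map to the script's position
// in the document; passing false models snapshotting on response instead.
function startModuleFetch(global, url, snapshotAtStart) {
  return { url, snapshot: snapshotAtStart ? copyImportMap(global.importMap) : null };
}

// Called when the response arrives; returns the resolution of each import.
function onModuleResponse(global, pending, specifiers) {
  const map = pending.snapshot !== null ? pending.snapshot : copyImportMap(global.importMap);
  return specifiers.map((s) =>
    Object.prototype.hasOwnProperty.call(map.imports, s) ? map.imports[s] : null);
}
```

The specifier "dep" and its target used when exercising this are made up for illustration.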

@yoavweiss requested a review from domenic August 5, 2024 14:59
@yoavweiss
Contributor Author

I think I see. Because the callers all operate on URLs or inline scripts, so they didn't need to do any resolution, and so didn't need an import map. It's only for the descendants that you start doing resolution and thus start needing an import map.

This does have the slightly-strange impact that given something like

<script type=module src=my-script.mjs></script>
<script type=importmap>
...
</script>

the modifications that appear after the <script type=module> will apply to the imports of my-script.mjs, because we delay snapshotting the import map until the response from the server comes back. That seems a bit unfortunate; WDYT?

Forgot to address this part. I agree that this would be weird, and hence it'd be better to pipe in the import map in those cases. I'll do that.

@yoavweiss
Contributor Author

Forgot to address this part. I agree that this would be weird, and hence it'd be better to pipe in the import map in those cases. I'll do that.

Done!

Member

@domenic left a comment

I have an idea for something that might clean things up. Basically, I find the importMap being optional confusing. It's hard to know whether an algorithm is not getting an import map because we forgot, or because we're coming from a worker. And the fact that sometimes we fall back to the Window's import map, even though we're not in an obviously "top level" algorithm, is extra confusing. (For example, in "create a JavaScript module script".)

I think the following would clean that up:

  • Move "Window's import map" (and Window's resolved module set?) to all global objects. Put them next to https://html.spec.whatwg.org/#in-error-reporting-mode . Add a note explaining that only Window objects have their import maps modified away from the initial empty import map, for now.
  • Make all import map arguments mandatory.
  • Always grab the global object's import map when appropriate. This should now be obviously only at top-level situations.

WDYT?
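A minimal sketch of what this restructuring could look like, assuming each global's state is a plain object; GlobalObjectState, resolveModuleSpecifier and resolveAtTopLevel are hypothetical names. Workers simply keep an empty map, so no algorithm needs an optional argument or a Window fallback.

```javascript
// Hypothetical sketch of the suggested restructuring (names are illustrative):
// every global object, Window or worker, owns an import map and a resolved
// module set, so algorithms can take the map as a required argument.

function createEmptyImportMap() {
  return { imports: {}, scopes: {} };
}

class GlobalObjectState {
  constructor(kind) {
    this.kind = kind;                         // "window" or "worker"
    this.importMap = createEmptyImportMap();  // only Window maps grow, for now
    this.resolvedModuleSet = [];
  }
}

// The map is mandatory: a missing argument is a bug, not a cue to fall back
// to the Window's map.
function resolveModuleSpecifier(importMap, specifier) {
  if (importMap === undefined) throw new TypeError("importMap is required");
  return Object.prototype.hasOwnProperty.call(importMap.imports, specifier)
    ? importMap.imports[specifier]
    : null;
}

// Top-level entry points are the only callers that read the global's map.
function resolveAtTopLevel(global, specifier) {
  return resolveModuleSpecifier(global.importMap, specifier);
}
```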

@yoavweiss
Contributor Author

I have an idea for something that might clean things up. Basically, I find the importMap being optional confusing. It's hard to know whether an algorithm is not getting an import map because we forgot, or because we're coming from a worker. And the fact that sometimes we fall back to the Window's import map, even though we're not in an obviously "top level" algorithm, is extra confusing. (For example, in "create a JavaScript module script".)

I think the following would clean that up:

  • Move "Window's import map" (and Window's resolved module set?) to all global objects. Put them next to https://html.spec.whatwg.org/#in-error-reporting-mode . Add a note explaining that only Window objects have their import maps modified away from the initial empty import map, for now.
  • Make all import map arguments mandatory.
  • Always grab the global object's import map when appropriate. This should now be obviously only at top-level situations.

WDYT?

SG. done!

@yoavweiss
Contributor Author

In case it's useful for the review process - I mapped the high-level relevant spec changes to Chromium's code.

@Jamesernator

Jamesernator commented Aug 10, 2024

At the same time, the current static design gives us determinism and isn’t racy. A module identifier that resolves to a certain module will continue to do so throughout the lifetime of the document. It would be good to keep that characteristic.

Has an alternative design been considered that doesn't mutate a shared global map?

For example, perhaps associating maps with individual script tags?

<!-- importmap="..." would only affect the loading of this graph -->
<script type="module" src="./entry.js" importmap="./entry.importmap.json"></script>

This alternative design could even allow for explaining importmaps in terms of import attributes, i.e. if you want to load a third-party module with a third-party importmap you could do:

import thirdParty from "./dist/third-party.js" with { importmap: "./dist/third-party.importmap.json" }

@jeff-hykin

jeff-hykin commented Aug 10, 2024

I almost gave a nearly-equivalent comment yesterday but was afraid I was misunderstanding something. So I'm glad you spoke up @Jamesernator.

My draft example was literally:

await import("./dist/third-party.js", { withMap: "./dist/third-party.importmap.json"})
import thirdParty from "./dist/third-party.js" withMap "./dist/third-party.importmap.json"

Concerns with mutating a shared global map

  1. I don't see any discussion about risks, possible adverse side effects, or security.

Consider a dynamic const bcrypt = await import("/path/to/bcrypt.js") written by a site author. Right now they have full confidence that the import will either load what they expect or throw an error. I'm not convinced that letting browser extensions break that assumption won't lead to new attack vectors. While I believe browser extensions can already affect the top-level import map, it's still not clear to me that dynamic changes are not more risky.

  2. I don't see discussion about adverse side effects.

As I understand it, the proposal has side effects due to this being a global map. E.g. loading a browser extension can change the import behavior of non-browser-extension imports. Extensions often do not want side effects: case in point, the motivating use case at the top (this one) does not want side effects; it just wants to map imports for extension1, not for extension2.

If extension1 global-mutates import-map for A
If extension2 global-mutates import-map for A

IMO extension-loading order creates an unnecessary race condition where extension1 overrides extension2's import map, with extension2's developer having no good way to debug when and why their extension broke. (And even once they do figure out why, there is basically no solution, since telling the user to change the load order is impractical; they have to go back to square one of bundling everything to be reliable.)

Contributor

@guybedford left a comment

Amazing to see this! I only had time to do a very very brief review, but I really like the overall approach. Specifically, I have some concerns about algorithmic complexity, but haven't had a chance to look more carefully at the algorithm to determine if they are "resolvable" yet.

Then for the copying approach - if we have a well-defined deduping that won't change the nature of mappings, what is the reason for wanting to lock down resolutions during individual load operations? Will this locked map also affect dynamic import or is it only cloned through the static graph? And if so, it might seem strange having different resolution rules for dynamic import and static import, especially when ECMA-262 also maintains its own cache to ensure these are consistent for known imports.

@bakkot
Contributor

bakkot commented Aug 11, 2024

I probably won't have time to do a thorough review of this, but @michaelficarra and I spent quite a while thinking about (some parts of) this problem back when import maps were first being discussed, so I want to call @yoavweiss's attention to this issue which @michaelficarra wrote at the time.

I don't see anything in the OP about the case where the second import map maps to something which is already mapped by the first. (Maybe I missed it?) For example:

First:

{
  "imports": {
    "/app/helper": "./helper/index.mjs"
  }
}

Second:

{
  "imports": {
    "helper": "/app/helper"
  }
}

What's the intended behavior in this case? My inclination is to say that this is equivalent to

{
  "imports": {
    "/app/helper": "./helper/index.mjs",
    "helper": "./helper/index.mjs"
  }
}

i.e., imports on the RHS of later maps are resolved in the context of earlier maps. This is to preserve the property that if you have a script with import "/foo", you can replace this with a script with import "bar" plus an import map of "bar": "/foo". Plus, one of the use cases for import maps is to rewrite the location of some dependency, and that only works if someone can't re-introduce the original location of the dependency by adding their own import map later.

This is consistent with the way the web usually works: if one script does globalThis.fetch = decorate(globalThis.fetch), and a later script does globalThis.fetch = decorate2(globalThis.fetch), the second script will wrap the fetch from the first script, not the original fetch.
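The suggested behaviour (resolving the right-hand side of a later map through the earlier one) could be sketched as below. This is a hypothetical illustration of the comment's proposal, not the PR's algorithm; it handles only top-level imports, with exact keys taking precedence over the longest matching trailing-slash prefix key, and compares targets as strings.

```javascript
// Hypothetical sketch of the proposal above (not the PR's algorithm): each
// right-hand side of the new map is resolved through the existing map before
// merging, so a later map can't route around an earlier remapping.

function resolveThrough(imports, target) {
  if (Object.prototype.hasOwnProperty.call(imports, target)) return imports[target];
  // Otherwise use the longest trailing-slash prefix key that matches.
  let best = null;
  for (const key of Object.keys(imports)) {
    if (key.endsWith("/") && target.startsWith(key) &&
        (best === null || key.length > best.length)) {
      best = key;
    }
  }
  return best === null ? target : imports[best] + target.slice(best.length);
}

function mergeResolvingRHS(existingImports, newImports) {
  const merged = { ...existingImports };
  for (const [key, target] of Object.entries(newImports)) {
    if (Object.prototype.hasOwnProperty.call(merged, key)) continue; // existing rule persists
    merged[key] = resolveThrough(existingImports, target);
  }
  return merged;
}
```

With the two maps from the example above, "helper" ends up pointing at "./helper/index.mjs".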

@guybedford
Contributor

This algorithm is looking really nice now, and the notes and examples are great to build understanding. Very happy to see this move forward.

Note that, while it may be stating the obvious, it could be worth noting that in all merge operations, warnings can be skipped when the values are equal.

One thing that's still worrying me a little is the size of the data being stored here. Consider an application with 1000 modules, where each of those modules have 5 imports, that's 5000 entries in this map. Since most imports are relative URLs and not bare specifiers, we are storing the specifier, baseURL and resolved URL across all 5000 entries. At 50 bytes a URL, two URLs, and 20 bytes per specifier, that might be around 600KiB for just a simple app. Of course there's probably lots of smart ways to optimize this, baseURLs being hierarchical, adding some extra string sharing etc.

I'm not saying it should be done necessarily, but there's one design tradeoff that could be made that might help reduce the data size by more than 50%, that could be worth discussing, and that's to not permit URL remapping for any full URL maps once that URL or URL prefix has been imported, regardless of scope shadowing.

This way, instead of storing every single specifier in the resolved module set, we can have two smaller sets: (1) the resolved URL set, and (2) the bare specifier resolved module set. This is based on the observation that relative specifiers are usually at least as common as bare specifiers, and often more so. We then don't need to store asURL on resolved module records, and can instead have a single optimized data structure tracking bare specifiers across their parent URL usage.
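A hypothetical sketch of the two-set structure described above. The class and method names are illustrative; keys are assumed to be already normalized to full URLs and compared as strings, and bare prefix keys (such as "lodash/") are not handled.

```javascript
// Hypothetical sketch of the two-set alternative (names are illustrative):
// every imported module URL goes into one set, and only bare specifiers are
// additionally tracked per parent URL.

function isBareSpecifier(specifier) {
  const urlLike = /^(\/|\.\/|\.\.\/)/.test(specifier) ||
    /^[a-z][a-z0-9+.-]*:/i.test(specifier);
  return !urlLike;
}

class CompactResolvedState {
  constructor() {
    this.resolvedURLs = new Set();   // every imported module URL
    this.bareSpecifiers = new Map(); // bare specifier -> Set of parent URLs
  }

  record(parentURL, specifier, resolvedURL) {
    this.resolvedURLs.add(resolvedURL);
    if (isBareSpecifier(specifier)) {
      if (!this.bareSpecifiers.has(specifier)) this.bareSpecifiers.set(specifier, new Set());
      this.bareSpecifiers.get(specifier).add(parentURL);
    }
  }

  // A URL or URL-prefix key is rejected once any matching URL was imported,
  // regardless of which scope the new rule lives in.
  blocksURLKey(key) {
    for (const url of this.resolvedURLs) {
      if (url === key || (key.endsWith("/") && url.startsWith(key))) return true;
    }
    return false;
  }

  // A bare key is rejected only if it was imported from a parent inside the
  // rule's scope (scopePrefix null means a top-level rule).
  blocksBareKey(key, scopePrefix) {
    const parents = this.bareSpecifiers.get(key);
    if (parents === undefined) return false;
    if (scopePrefix === null) return true;
    for (const parent of parents) {
      if (parent === scopePrefix ||
          (scopePrefix.endsWith("/") && parent.startsWith(scopePrefix))) return true;
    }
    return false;
  }
}
```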

For that same 1000 module app, with 5 imports per module, maybe 2500 of the import specifiers are relative or URL imports and 2500 are bare specifiers. We now store only 1000 entries in the imported URL set, and 2500 in the bare specifier module set as just parent + bare specifier, resulting in something on the order of 50KiB + 175KiB = 225KiB, less than 40% of the size. If the relative / full URL count is higher, the savings improve further.
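The estimates in the paragraphs above can be reproduced with a few lines of arithmetic. This sketch takes the comment's assumed sizes (50 bytes per URL, 20 bytes per specifier) and counts decimal bytes, so the "KiB" figures are rough kilobyte approximations:

```javascript
// Assumed sizes from the comment above (rough, illustrative figures).
const URL_BYTES = 50;
const SPECIFIER_BYTES = 20;

// Current design: every import records a specifier, a base URL and a resolved URL.
function fullSetBytes(imports) {
  return imports * (SPECIFIER_BYTES + 2 * URL_BYTES);
}

// Proposed design: one URL per imported module, plus parent URL + specifier per bare import.
function splitSetBytes(modules, bareImports) {
  return modules * URL_BYTES + bareImports * (URL_BYTES + SPECIFIER_BYTES);
}

const before = fullSetBytes(5000);       // 5000 * 120 bytes
const after = splitSetBytes(1000, 2500); // 1000 * 50 + 2500 * 70 bytes
console.log(before, after, (after / before).toFixed(3)); // prints 600000 225000 0.375
```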

The utility of URL remapping may not be lost if we don't support scoped remapping of already-imported URLs in dynamic import map workflows, so this might not be such a hard tradeoff to make. In many cases, the URLs being remapped in these workflows are local to the scope anyway and part of the package's private contract, where the mapped URL wouldn't otherwise have been imported, so wanting to support this may go against concepts of package and scope isolation anyway.

Would be interested to hear thoughts further.

@yoavweiss
Contributor Author

yoavweiss commented Oct 3, 2024

Thanks!!

One thing that's still worrying me a little is the size of the data being stored here. Consider an application with 1000 modules, where each of those modules have 5 imports, that's 5000 entries in this map. Since most imports are relative URLs and not bare specifiers, we are storing the specifier, baseURL and resolved URL across all 5000 entries.

I wouldn't expect implementations to implement this algorithm as is. E.g. the information needed for asURL can be represented by a boolean, URLs can be deduped, etc. Beyond that, the algorithm we're using in Chromium is significantly different, and optimizes merge times at the expense of memory. That may change over time.
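The sketch below illustrates the kind of optimization mentioned here. It stores each distinct string once and reduces asURL to a boolean. The `StringTable` class and the record shape are hypothetical and do not describe how Chromium or the spec stores the resolved module set:

```javascript
// Hypothetical interning table: each distinct string is stored once and
// records refer to it by a small integer id.
class StringTable {
  constructor() {
    this.ids = new Map(); // string -> id
    this.strings = [];    // id -> string
  }
  intern(s) {
    let id = this.ids.get(s);
    if (id === undefined) {
      id = this.strings.length;
      this.strings.push(s);
      this.ids.set(s, id);
    }
    return id;
  }
  lookup(id) {
    return this.strings[id];
  }
}

const table = new StringTable();
const resolvedModuleSet = []; // hypothetical compact records

// Each record holds string ids plus asURL reduced to a boolean.
function record(baseURL, specifier, asURL) {
  resolvedModuleSet.push({ base: table.intern(baseURL), specifier: table.intern(specifier), asURL });
}

record("https://example.com/app.js", "https://example.com/lib.js", true);
record("https://example.com/other.js", "https://example.com/lib.js", true);
record("https://example.com/app.js", "lodash", false);

console.log(table.strings.length, resolvedModuleSet.length); // prints 4 3
```

Three records share four stored strings. The savings grow as more modules import the same URLs from the same bases.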

At 50 bytes a URL, two URLs, and 20 bytes per specifier, that might be around 600KiB for just a simple app.

I wouldn't consider an app that loads 5000 modules to be "simple". But again, real-life usage can guide the different tradeoffs we can make here.

I'm not saying it should be done necessarily, but there's one design tradeoff that could be made that might help reduce the data size by more than 50%, that could be worth discussing, and that's to not permit URL remapping for any full URL maps once that URL or URL prefix has been imported, regardless of scope shadowing.

Can you expand on that? I don't understand how that would help.

Also, given that "https:/" is a common URL prefix, that seems rather limiting. (unless I misunderstand you)

@guybedford
Contributor

guybedford commented Oct 3, 2024

I wouldn't consider an app that loads 5000 modules to be "simple". But again, real-life usage can guide the different tradeoffs we can make here.

I mean still 1000 modules, but with 5000 imports / resolved module set records. The specifier count, which is equal to the resolved module set size, is much higher than the module count.

I wouldn't expect implementations to implement this algorithm as is. E.g. the information needed for asURL can be represented by a boolean, URLs can be deduped, etc. Beyond that, the algorithm we're using in Chromium is significantly different, and optimizes merge times at the expense of memory. That may change over time.

Hmm, looking more closely at the merging algorithm I see now you don't check asURL in merging, so that ./app.js and /app.js are treated as non-conflicting mappings regardless of whether it has been loaded in the given scope. I was under the impression this had already been addressed, but please let me know if I'm missing something.

Also, given that "https:/" is a common URL prefix, that seems rather limiting. (unless I misunderstand you)

Perhaps an example will help, say you start with:

{
  "imports": {
    "/app.js": "/app-dev.js"
  },
  "scopes": {
    "/dep/": {
      "deep-dep": "/deep-dep/index.js"
    }
  }
}

And then later on you want to add:

{
  "scopes": {
    "/another-dep/": {
      "deep-dep": "/deep-dep/index.js"
    }
  }
}

You would only need to track "deep-dep"'s importers, and nothing else, so you wouldn't need to store every single specifier resolved, just the bare specifier ones.

If only tracking bare specifier imports, the restriction here would then be that the following wouldn't be supported:

{
  "scopes": {
    "/another-dep/": {
      "/app.js": "/breaks.js"
    }
  }
}

The above would still be supported, though, if /app.js hadn't been imported at all yet. URL mapping cases then become a direct URL check against the global loaded URL set, removing the need for resolved URL checks.

The tradeoff is that we aren't having to track every single URL that is imported under every single scope, but we are just tracking bare specifiers, and then ban URL remappings regardless of scopes for URLs already imported.
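Below is a sketch of the check this proposal implies. It rests on several assumptions. `importedURLs`, `bareImports` and `allowScopedRule` are hypothetical names. Scope matching is simplified to a URL prefix test, and bare keys are compared exactly, with no trailing-slash prefix keys. The base URL is made up. URL-like keys consult only the global set of imported URLs, while bare keys consult their recorded importers:

```javascript
// Assumed document / import map base URL for this sketch.
const BASE = "https://example.com/";

// Hypothetical state under the proposed design.
const importedURLs = new Set(); // URL-like specifiers that have been imported (pre-mapping)
const bareImports = new Map();  // bare specifier -> Set of importer URLs that resolved it

// URL-like specifiers ("/", "./", "../" or absolute URLs) resolve to a URL; bare ones give null.
function asURL(specifier, base) {
  if (/^(\/|\.\/|\.\.\/)/.test(specifier)) return new URL(specifier, base).href;
  try {
    return new URL(specifier).href;
  } catch {
    return null;
  }
}

// Would a new rule for `specifierKey` under scope `scopePrefix` be accepted?
function allowScopedRule(scopePrefix, specifierKey) {
  const url = asURL(specifierKey, BASE);
  if (url !== null) {
    // URL keys have a global meaning: one check against imported URLs, whatever the scope.
    for (const imported of importedURLs) {
      if (imported === url || (url.endsWith("/") && imported.startsWith(url))) return false;
    }
    return true;
  }
  // Bare keys: blocked only if an importer inside this scope already resolved them.
  const scopeURL = new URL(scopePrefix, BASE).href;
  for (const importer of bareImports.get(specifierKey) ?? []) {
    if (importer.startsWith(scopeURL)) return false;
  }
  return true;
}

// State after the first import map in the example has been used.
importedURLs.add("https://example.com/app.js");
bareImports.set("deep-dep", new Set(["https://example.com/dep/index.js"]));

console.log(allowScopedRule("/another-dep/", "deep-dep")); // true: no importer under /another-dep/
console.log(allowScopedRule("/dep/", "deep-dep"));         // false: /dep/index.js already resolved it
console.log(allowScopedRule("/another-dep/", "/app.js"));  // false: /app.js was already imported
console.log(allowScopedRule("/another-dep/", "/fresh.js")); // true: never imported
```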

Note that, while it may be stating the obvious, it could be worth noting that warnings can be skipped when the values are equal for all merge operations.

With regards to this comment: looking at your implementation, it doesn't currently seem to check entries for equality before generating warnings, a check that would avoid unnecessary warnings over duplicate entries.

@yoavweiss
Contributor Author

Hmm, looking more closely at the merging algorithm I see now you don't check asURL in merging, so that ./app.js and /app.js are treated as non-conflicting mappings regardless of whether it has been loaded in the given scope. I was under the impression this had already been addressed, but please let me know if I'm missing something.

I don't believe that's true, as the specifier that goes into the resolved module set is a normalized specifier.

With regards to this comment, looking at your implementation it seems like it doesn't check equality of entries currently when generating warnings to avoid unnecessary warnings over duplicate entries.

Yeah, we're not currently checking equality for duplicate entries.
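Here is a minimal sketch of skipping the warning when a duplicate entry carries the same value. It assumes that specifier maps are plain objects and that, on a conflict, the entry already present in the old map is kept. `mergeSpecifierMaps` and the `warn` callback are hypothetical:

```javascript
// Hypothetical merge of two specifier maps (plain objects: specifier -> URL string or null).
// On a conflict the old entry is kept; a warning is only emitted if the values differ.
function mergeSpecifierMaps(oldMap, newMap, warn) {
  const merged = { ...oldMap };
  for (const [specifier, value] of Object.entries(newMap)) {
    if (Object.hasOwn(merged, specifier)) {
      if (merged[specifier] !== value) {
        warn(`Ignored rule for "${specifier}": already mapped to ${merged[specifier]}`);
      }
      continue; // identical duplicates are dropped silently
    }
    merged[specifier] = value;
  }
  return merged;
}

const warnings = [];
const merged = mergeSpecifierMaps(
  { a: "/a-1.js", b: "/b.js" },
  { a: "/a-1.js", b: "/b-2.js", c: "/c.js" },
  (message) => warnings.push(message),
);
// "a" is an identical duplicate (no warning), "b" conflicts (one warning), "c" is new.
console.log(Object.keys(merged).join(","), warnings.length); // prints a,b,c 1
```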

@yoavweiss
Contributor Author

I mean still 1000 modules, but with 5000 imports / resolved module set records. The specifier count, which is equal to the resolved module set size, is much higher than the module count.

Can you expand on that? Do you expect every module to be called using multiple specifiers?
Or are you referring to the fact that a single module can be imported by multiple other modules, resulting in multiple scopes?

@guybedford
Contributor

I don't believe that's true, as the specifier that goes into the resolved module set is a normalized specifier.

I see that, but as far as I can tell the scope merging algorithm itself doesn't check this asURL value when comparing specifiers; it always just checks specifier. So ./app and /app in a scope are treated as new mappings and are not included in the filtering, despite being able to remap the same specifier. The specifier being removed could also be any other variation, like ../app or https://site.com/app, and these cases aren't checked either, which might be an inconsistency in the design. The fix would be to check asURL explicitly in the scope merging, which is what I would have expected to see. As I say, please let me know if I'm missing something.
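The sketch below illustrates the variations mentioned above. If URL-like keys are compared by their resolved URL, which is roughly what comparing asURL would mean, rather than by raw string, they collapse to a single entry. This relies on the base being the site root. `normalizeKey` and the `https://site.com/` base are assumptions for this sketch:

```javascript
// Hypothetical normalization: URL-like keys compare by resolved URL, bare keys by string.
function normalizeKey(specifier, baseURL) {
  if (/^(\/|\.\/|\.\.\/)/.test(specifier)) return new URL(specifier, baseURL).href;
  try {
    return new URL(specifier).href; // absolute URL such as https://site.com/app
  } catch {
    return specifier; // bare specifier: keep as-is
  }
}

const base = "https://site.com/"; // assumed base URL of the import map
const variants = ["./app", "/app", "../app", "https://site.com/app"];
const keys = new Set(variants.map((s) => normalizeKey(s, base)));

console.log(keys.size);                 // prints 1
console.log(normalizeKey("app", base)); // prints app (bare specifier, unchanged)
```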

Or are you referring to the fact that a single module can be imported by multiple other modules, resulting in multiple scopes?

Yeah I mean this, where having a data structure that just tracks bare specifier usage would be much smaller.

@yoavweiss
Contributor Author

yoavweiss commented Oct 3, 2024

FWIW, all this talk about memory savings made me realize that the current ImportMap implementation in Chromium doesn't use AtomicStrings, which can probably save a bunch of memory, especially once we start storing prefixes for the merge algorithm. https://chromium-review.googlesource.com/c/chromium/src/+/5905067 fixes that. Thanks!!

@yoavweiss
Contributor Author

More broadly than AtomicStrings, I think that if storing this data would turn out to be a memory issue, there are many ways for implementations to deal with that. E.g. store hashes instead of full URLs.

Limiting the merge algorithm to only support scopes for bare specifiers feels to me like an arbitrary premature optimization that doesn't necessarily take the priority of constituencies into account.

@guybedford
Contributor

More broadly than AtomicStrings, I think that if storing this data would turn out to be a memory issue, there are many ways for implementations to deal with that. E.g. store hashes instead of full URLs.

The string data still needs to be stored, since hashes wouldn't be foolproof here unless they were as long as the URLs anyway.

Limiting the merge algorithm to only support scopes for bare specifiers feels to me like an arbitrary premature optimization that doesn't necessarily take the priority of constituencies into account.

Scopes would still be supported for URLs fine, so long as those URLs have not been loaded - we just take advantage of the fact that URLs have a global meaning unlike bare specifiers. Do you have real use cases for wanting to change the interpretation of a URL between scopes even after that URL has been loaded?

@yoavweiss
Contributor Author

Let's move this discussion to a more practical plane - do you have a test site that represents the case you're concerned about? We could use such a site to measure the memory impact of the current implementation to see if it is a cause for concern.

@annevk
Member

annevk commented Oct 4, 2024

If there's still significant discussion it would be good if this PR was backed by an issue where that discussion could take place. PRs don't lend themselves to significant discussion and at least historically at some point GitHub will give up and make the PR unusable.

@yoavweiss
Contributor Author

The relevant public issues are WICG/import-maps#248 and WICG/import-maps#92 (we should really archive that repo. /cc @domenic )

From my perspective, the discussion here revolves around this PR's algorithms, so it does feel appropriate to have it here. At the same time, if GH borks the PR, that's not great.

Member

@annevk annevk left a comment


How do we imagine telling web developers the result of the merge with the current API if we wanted to do that? Additional information on the load event or some such?

<var>newMap</var> and a <span>module specifier map</span> <var>oldMap</var>:</p>

<ol>
<li><p>Let <var>mergedMap</var> be a deep copy of <var>oldMap</var>.</p></li>
Member


Infra calls this clone, but a deep clone is not defined I think.

Contributor Author


Infra's clone is explicitly shallow. Happy to define a "deep clone" in Infra if that's better.

Member


I think it'll be tricky as we don't have a clone operation for arbitrary values. :/ It's essentially an issue similar to whatwg/infra#643

Member


My position is that we should use an undefined "everyone hopefully knows what this means" term in the HTML spec, like "deep copy", until we can make the time to flesh it out in Infra. So I like the PR as-is.
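This sketch shows what a "deep copy" adds over Infra's shallow clone. It assumes a module specifier map is modelled as a JS `Map` whose values are mutable `URL` objects or `null`. The shallow copy shares the `URL` objects, while the deep copy does not. `deepCopySpecifierMap` is a hypothetical helper:

```javascript
// Assumed representation: specifier -> URL object (mutable) or null.
function deepCopySpecifierMap(map) {
  const copy = new Map();
  for (const [specifier, url] of map) {
    copy.set(specifier, url === null ? null : new URL(url.href)); // fresh URL object
  }
  return copy;
}

const oldMap = new Map([
  ["app", new URL("https://example.com/app.js")],
  ["blocked", null],
]);
const shallow = new Map(oldMap); // entries copied, URL objects shared
const deep = deepCopySpecifierMap(oldMap);

oldMap.get("app").pathname = "/changed.js";
console.log(shallow.get("app").href); // prints https://example.com/changed.js
console.log(deep.get("app").href);    // prints https://example.com/app.js
```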

@yoavweiss
Contributor Author

How do we imagine telling web developers the result of the merge with the current API if we wanted to do that? Additional information on the load event or some such?

We could. In discussions with @hiroshige-g, we weren't sure there's a use case for it, but e.g. a list of ignored rules hanging off of the load event could be a way for us to communicate that in the future, if we see demand for it.

@domenic
Member

domenic commented Oct 5, 2024

In general the processed form of the import map has not been accessible up to this point. So far there hasn't been a use case for it, but WICG/import-maps#128 tracks that feature request.
