Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

!!!TASK: Use workspaceName instead of contentStreamId in AssetUsageProjection #5109

Conversation

crydotsnake
Copy link
Member

@crydotsnake crydotsnake commented Jun 1, 2024

Upgrade instructions

⚠️ This is a breaking change!

For existing projects its nessecary to delete the cr_*_p_neos_asset_usage tables first.

Then you need to setup the content repositorys again:

./flow cr:setup

And also run ./flow cr:projectionreplay AssetUsage afterwards.

Review instructions

resolves: #5084

Checklist

  • Code follows the PSR-2 coding style
  • Tests have been created, run and adjusted as needed
  • The PR is created against the lowest maintained branch
  • Reviewer - PR Title is brief but complete and starts with FEATURE|TASK|BUGFIX
  • Reviewer - The first section explains the change briefly for change-logs
  • Reviewer - Breaking Changes are marked with !!! and have upgrade-instructions

@crydotsnake crydotsnake marked this pull request as draft June 1, 2024 14:35
@crydotsnake crydotsnake added the 9.0 label Jun 1, 2024
@github-actions github-actions bot added the Task label Jun 1, 2024
@crydotsnake
Copy link
Member Author

crydotsnake commented Jun 2, 2024

Currently you have to delete the asset_usage table first before it works. But it would be better if it would work right away without having to delete the asset_usage table first.

See Upgrade instructions above.

@crydotsnake crydotsnake self-assigned this Jun 2, 2024
@crydotsnake crydotsnake force-pushed the task/use-workspacename-insteadof-contentstreamid branch from e6198c9 to 38a3694 Compare June 3, 2024 09:40
@crydotsnake crydotsnake changed the title TASK: Use workspaceName instead of contentStreamId in AssetUsageProjection !!!TASK: Use workspaceName instead of contentStreamId in AssetUsageProjection Jun 3, 2024
@dlubitz
Copy link
Contributor

dlubitz commented Jul 5, 2024

Current state is, that we might need to implement also the node hierarchy in the AssetUsageProjection, to be able to handle child nodes on deleted nodes.

See: https://neos-project.slack.com/archives/C04PYL8H3/p1718654280746299

@dlubitz
Copy link
Contributor

dlubitz commented Jul 5, 2024

As discussed with @bwaidelich :

  • We want to try to replace the projection with a CatchUpHook
  • We just want to track changes to asset-properties.
  • For the initial state, we need to build the AssetUsage based on the latest node state
    • iterating over all workspaces and all nodes (from root to leafs)
  • After initial node state has been build we handle changes to the AssetUsage by the events of the CatchUpHook
  • We want to try to extract the "AssetUsage-Comparison-Storage" logic, so we can mostly work with the node read model

PROS:

  • We can use Projections to gather workspace chains and node hierarchy

CONS:

  • Heavy job for inital state building (traversing all nodes, in all dimensions and all workspaces)

@bwaidelich
Copy link
Member

bwaidelich commented Jul 6, 2024

@dlubitz thanks for the summary.
I'll add some more topics we discussed from the top of my head:

In general, just looking at the state should make a lot of edge cases simpler because it is much easier to reason about.
For example: IMO it makes sense to only ever have a single entry in the asset usage table if it is used in the same node property (i.e. aggregate id, dimension space point, property name and asset id are equal) no matter whether the asset was changed an reverted in the meantime.

But there is one special case and that's "materialized nodes" :
Whenever a node (property) is changed that respective node is copied in the database. That is done to optimize disk space and performance, but it's not an implementation detail because it means that other changes to that node won't "shine through" from the base workspace any longer as soon as it's changed once – until that change is published or reverted that is.

This can lead to at least two weird situations in relation to asset usages:

Scenario: asset is only used in the child workspace

  1. Node a is created in workspace live with a property x of asset1
    • One entry in the asset usage table: workspace: live, node: A, property: x, asset: asset1
  2. A different property of that node is changed in workspace user
    • No change to the asset usage table
  3. The property x of node a is set to null (i.e. the asset is removed from the prop)
    • The entry from the asset usage table is removed as well

As a graph:

%%{init: { 'theme': 'base', 'gitGraph': {'showBranches': true, 'showCommitLabel':true,'mainBranchName': 'live'}} }%%
gitGraph
    commit id:"prop x: asset1"
    branch user
    commit id:"prop y: foo"
    checkout live
    commit id:"prop x: ~"
Loading

Issue

As a result of the above, the workspace user still contains that node a with the asset asset1 used. But it's not contained in the asset usage table any longer.

Note

When building up the asset usage table from scratch (i.e. from the content graph state) it would add an entry to the usage table in this case because we can't easily differentiate whether the property was set in the workspace explicitly

Consequence

It at least feels a bit weird, that the asset usage is not tracked in this case (and especially that it is when the usage tabel is built-up from the content graph state). But in reality this probably isn't a problem because:
As soon as the live workspace is changed, the user workspace is outdated. So it contains a state that does not reflect "the real world", i.e. it can't be published as is.
When the conflict is resolved (i.e. the user workspace is rebased) the asset removal event will be reflected and the asset is no longer in use for good.

Note

If, at some point, we allow to resolve conflicts in terms of "keep my change", there will have to be an event that reverts the asset removal – so the asset usage table will be updated correctly

There are some more, similar, scenarios, e.g. when the asset property is not removed but changed to a different asset in the base workspace. But the resulting consequences are mostly similar to the case above.

Consideration

While writing this down, I was wondering whether that initial claim

only ever have a single entry in the asset usage table if it is used in the same node property

really makes sense..
We could also sync the node in the asset usage table whenever it is changed (i.e. every event that would lead to the node being "materialized" if it wasn't yet). Sync as in: copy all usages for that node from the base workspace(s)
With that the 2nd step in the scenario above would have a different effect:

  1. Node a is created in workspace live with a property x of asset1
    • One entry in the asset usage table: workspace: live, node: A, property: x, asset: asset1
  2. A different property of that node is changed in workspace user
    • Add entry to the asset usage table: workspace: user, node: A, property: x, asset: asset1
  3. The property x of node a is set to null (i.e. the asset is removed from the prop)
    • The entry from the asset usage table is removed as well

This way the resulting state would be the same for the initial setup and when executed via hook.

Note

I still think that we should count only one usage if the asset is the same in the child workspace, but we could filter out the duplicates at read time

Copying all usages for each (initial) change might pose a performance issue though..

One comment regarding

Heavy job for inital state building (traversing all nodes, in all dimensions and all workspaces)

We would only have to look into child workspaces with pending changes. And for those we could even find only potentially affected nodes by iterating through the events of the respective content stream (but that is an optimization that could always add lateron)

@dlubitz
Copy link
Contributor

dlubitz commented Sep 23, 2024

Closed in favor of #5258

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

Adjust AssetUsageProjection to use workspace instead of content stream id
3 participants