Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add URL normalizePath() filter #3237

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

mvalkon
Copy link
Member

@mvalkon mvalkon commented Sep 19, 2024

Adds a filter called normalizePath which implements URL Path
normalization by removing empty path segments and trailing slashes from
the path. This follows the guidance from the Zalando Restful API
Guidelines rule 136.

Resolves #3236

Adds a filter called `normalizePath` which implements URL Path
normalization by removing empty path segments and trailing slashes from
the path. This follows the guidance from the Zalando Restful API
Guidelines rule [136][1].

[1]: https://opensource.zalando.com/restful-api-guidelines/#136

Signed-off-by: Mikko Valkonen <[email protected]>
path.Clean() may break certain edge-cases where the path-variables
would use `.` or `..` characters. This commit removes the usage of
path.Clean() and implements the normalization manually.

Signed-off-by: Mikko Valkonen <[email protected]>
Removes some dead code and adds test to cover for root path segment.

Signed-off-by: Mikko Valkonen <[email protected]>
Adds rudimentary documentation for the normalizePath()-filter with
examples.

Signed-off-by: Mikko Valkonen <[email protected]>
@mvalkon mvalkon changed the title Add url path normalizer Add normalizePath() filter Sep 19, 2024
@mvalkon mvalkon changed the title Add normalizePath() filter Add URL normalizePath() filter Sep 19, 2024
Example:

```
all: * -> normalizePath() -> "https://backend.example.org/api/v1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing closing " and network backend is not having a path segment.
all: * -> normalizePath() -> "https://backend.example.org";

@@ -370,4 +370,5 @@ const (
SetFastCgiFilenameName = "setFastCgiFilename"
DisableRatelimitName = "disableRatelimit"
UnknownRatelimitName = "unknownRatelimit"
NormalizePath = "normalizePath"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NormalizePath -> NormalizePathName

func (spec normalizePath) Name() string { return "normalizePath" }

func (spec normalizePath) CreateFilter(config []interface{}) (filters.Filter, error) {
return normalizePath{}, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can return spec, nil and if you have a ptr receiver you don't need to have allocations for the filter:

func NewNormalizePath() filters.Spec { return &normalizePath{} }

func (f *normalizePath) Name() string { return filters.NormalizePathName }

func (f *normalizePath) CreateFilter(config []interface{}) (filters.Filter, error) {
    return f, nil
}

func (f *normalizePath) Request(ctx filters.FilterContext) {
...

filteredSegments = append(filteredSegments, seg)
}
}
normalizedPath := "/" + strings.Join(filteredSegments, "/")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

semgrep asks:

did you want path.Join() or filepath.Join()?

@szuecs szuecs added the minor no risk changes, for example new filters label Sep 19, 2024
@AlexanderYastrebov
Copy link
Member

Couple of thoughts:

  • Filter name is generic and implies normalization but it only does duplicate slash elimination
  • Link to Zalando Guidelines mentions https://en.wikipedia.org/wiki/URI_normalization which mentions RFC 3986 which describes a lot of other normalization methods but does not describe duplicate slash elimination
  • There is rfcPath filter that performs escape normalization and also mentions RFC 3986
  • This filter will be invoked after routing which means Path-related predicates will not benefit from it.
  • There is -rfc-patch-path flag that reuses the same machinery as existing rfcPath at proxy level and thus fixes path before routing.

So I am a bit puzzled here. This filter does not fix routing so maybe we could extend existing rfcPath which also can be enabled globally before routing. On the other hand duplicate slash is not against RFC (see this) so eliminating duplicate slashes is not "RFC"-compliant. Lastly, what problem does it really solve - maybe answering client 404 (in case they sent wrong path that does not match any route) or passing url as is to the backend is a better approach then silently fixing the path.

@mvalkon
Copy link
Member Author

mvalkon commented Sep 25, 2024

Couple of thoughts:

  • Filter name is generic and implies normalization but it only does duplicate slash elimination
  • Link to Zalando Guidelines mentions https://en.wikipedia.org/wiki/URI_normalization which mentions RFC 3986 which describes a lot of other normalization methods but does not describe duplicate slash elimination

I agree. The normalization is only normalizing the path following the guidance (see below snippet) from the API guidelines, and I've raised the topic internally to see if we should extend this beyond the scope of this.

We recommend to implement services robust against clients not following this rule. All services should [normalize](https://en.wikipedia.org/wiki/URI_normalization) request paths before processing by removing duplicate and trailing slashes.
  • There is rfcPath filter that performs escape normalization and also mentions RFC 3986
  • This filter will be invoked after routing which means Path-related predicates will not benefit from it.
  • There is -rfc-patch-path flag that reuses the same machinery as existing rfcPath at proxy level and thus fixes path before routing.

I'm happy to extend an existing filter. I have to admit my knowledge of skipper is limited - I'm assuming that this filter would work for all paths if you have a catch-all predicate * defined, but this might be overly simplistic assumption.

So I am a bit puzzled here. This filter does not fix routing so maybe we could extend existing rfcPath which also can be enabled globally before routing. On the other hand duplicate slash is not against RFC (see this) so eliminating duplicate slashes is not "RFC"-compliant. Lastly, what problem does it really solve - maybe answering client 404 (in case they sent wrong path that does not match any route) or passing url as is to the backend is a better approach then silently fixing the path.

While it's not RFC compliant as such, the unfortunate reality is that backend frameworks and server implementations treat trailing slashes in different ways. We had an incident internally recently where an API that previously supported endpoints with trailing slashes stopped doing so after a version upgrade of Spring Boot - the new version no longer treated /foobar and /foobar/ as the same URL. There was of course a client which was incorrectly configured to call /foobar/ which triggered the incident. We clearly guide (see guidelines) that APIs must not specify paths with duplicate or trailing slashes, but we can't control all the client configurations and from there the same guideline recommends server implementations to be robust against poorly configured clients.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
minor no risk changes, for example new filters
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a filter that normalizes url paths by removing trailing slashes and empty path segments
3 participants