Add resource providers for common cloud providers #1074

Open
4 of 7 tasks
SylvainJuge opened this issue Oct 13, 2023 · 12 comments

@SylvainJuge
Contributor

SylvainJuge commented Oct 13, 2023

Most cloud providers expose a metadata endpoint that can be used to build resource information;
however, in the Java contrib repo we only have an implementation for AWS.

For example, in the js contrib repo, we can see there are other implementations in https://github.com/open-telemetry/opentelemetry-js-contrib/tree/main/detectors/node : alibaba, gcp and aws (I haven't looked at their respective implementations though).

The goal here is to add implementations for the most common cloud providers.

Initially the focus will be on the following cloud providers: AWS, GCP and Azure with the following task breakdown

Other cloud providers can of course be added later, but should be tracked independently.

Collector implementations that can be used for reference : https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor


for triage: This issue can be assigned to me.

@punya
Member

punya commented Jan 31, 2024

Hi @SylvainJuge, thanks for starting the initiative to gather implementations for all cloud providers.

Once all the resource detectors are added to the contrib repository, what will be the recommended way for users of the Java agent to incorporate these detectors? Naively, I would expect the workflow to look like:

  1. Find or build shaded jars for all the detectors needed.
  2. Add them to the command line using -Dotel.javaagent.extensions

This seems pretty inconvenient, especially because these steps require advanced features of Maven/Gradle to automate.

I know that there have been earlier discussions about incorporating detectors for common cloud platforms into the default agent distribution. In at least one case, we decided to exclude the detector because it added startup latency. I was wondering if we could get the best of both worlds by

  1. Including the detector code in the agent
  2. Keeping it disabled by default

This would parallel the approach used in the Collector, where the contrib distribution includes many detectors, but a given detector is only invoked if it's explicitly enabled in the Collector configuration file.

@SylvainJuge
Contributor Author

Hi @punya , sorry for the late reply on this.

So far I haven't really thought about "making it convenient to use them", but that's a very good point here.
I agree with you that shading or using the command line option is not really practical for most users and doing that for every agent distribution would be wasteful.

Having them included and disabled by default in the agent would definitely be a good option:

  • users of the otel distribution agent should be able to opt in through configuration by adding the provider FQNs to the otel.java.enabled.resource.providers config (doc), or by removing them from otel.java.disabled.resource.providers, depending on how we implement the "disabled by default" (see below).
  • other agent distributions can control which ones get enabled or not through the same configuration options.
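
As a rough illustration of how these two options interact, here is a minimal sketch. It assumes that an empty enabled list means "all discovered providers are enabled" and that the disabled list is applied afterwards; that reading comes from the SDK autoconfigure documentation, not from this thread. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class ProviderFilterSketch {
  // Assumed semantics: a provider stays active when the enabled set is empty
  // or names it, and the disabled set does not name it.
  static List<String> activeProviders(
      List<String> discovered, Set<String> enabled, Set<String> disabled) {
    return discovered.stream()
        .filter(fqn -> enabled.isEmpty() || enabled.contains(fqn))
        .filter(fqn -> !disabled.contains(fqn))
        .collect(Collectors.toList());
  }
}
```

Under these assumptions, listing a single FQN in the enabled option would switch off every provider not listed, which is why opting in through that option alone is awkward.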

In order to implement the "included but disabled by default", what we did on our side so far is the following:

  • re-package the existing resource providers from the contrib repo and remove their service provider files (plenty of gradle/build fun !!)
  • explicitly invoke those resource providers at runtime when the agent starts

This strategy is complex and can't be reused when using those resource providers directly as SDK extensions.

On the code side, I think that keeping them in the contrib repo rather than directly in the agent allows them to be reused as SDK extensions without an agent, but in practice I really don't know how popular or how relevant this option would be. Given that support for Java agents in native images like GraalVM is clearly not coming in the short term, that's still something to keep in mind.

So here I would be in favor of keeping the code in the contrib repo and including them (but disabled) in the agent. However, what I am not 100% clear about is the best way to implement the "included but disabled by default" behavior:

  • the otel.java.{enabled,disabled}.resource.providers options are currently defined in the SDK autoconfiguration (code), both are empty by default.
  • the agent can alter the values of those configuration options to disable the providers by default.
  • those configuration options might still be used by users of the agent, thus it's probably worth preserving their semantics (unlike what I suggested above initially).

I think that we might need an agent-only configuration option to implement the opt-in behavior, for example otel.instrumentation.optional.resource.providers, as we can't alter the semantics of the existing SDK autoconfig options. The agent would contain a hard-coded list of the FQNs of the included optional providers; at startup, the agent would add each of them to otel.java.disabled.resource.providers unless its FQN is listed in this new option.
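
The startup step described above could look roughly like the following sketch. It is not agent code: the class name, the helper methods and the two provider FQNs are hypothetical placeholders, and the option values are passed in as plain strings rather than read from the real configuration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class OptionalProviderDefaults {
  // Hypothetical FQNs standing in for the agent's hard-coded list of optional providers.
  static final List<String> OPTIONAL_PROVIDERS =
      List.of("com.example.gcp.GcpResourceProvider", "com.example.azure.AzureResourceProvider");

  // Parses a comma-separated option value into an ordered set, ignoring blanks and null.
  static Set<String> parse(String value) {
    Set<String> result = new LinkedHashSet<>();
    if (value == null) {
      return result;
    }
    for (String item : value.split(",")) {
      String trimmed = item.trim();
      if (!trimmed.isEmpty()) {
        result.add(trimmed);
      }
    }
    return result;
  }

  // Computes the value the agent would write to otel.java.disabled.resource.providers:
  // the user's own entries plus every optional provider that was not opted in.
  static String effectiveDisabled(String userDisabled, String optedIn) {
    Set<String> disabled = parse(userDisabled);
    Set<String> enabledOptional = parse(optedIn);
    for (String fqn : OPTIONAL_PROVIDERS) {
      if (!enabledOptional.contains(fqn)) {
        disabled.add(fqn);
      }
    }
    return String.join(",", disabled);
  }
}
```

Because the result only ever adds entries to the user's disabled list, the existing SDK options keep their meaning for anyone who sets them directly.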

@trask
Member

trask commented Feb 8, 2024

I was wondering if we could get the best of both worlds by

  1. Including the detector code in the agent
  2. Keeping it disabled by default

this makes sense to me

On the code side, I think that keeping them in the contrib repo rather than directly in the agent allows them to be reused as SDK extensions without an agent, but in practice I really don't know how popular or how relevant this option would be.

even if we moved them to the instrumentation repo, we would still publish them as standalone artifacts, e.g. https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/resources/library

but I agree with keeping them here in the contrib repo where the cloud vendors can have ownership of them, and we can still pull them into the Java agent.

@zeitlinger
Member

otel.instrumentation.optional.resource.providers

I think this is a good idea 😄

@SylvainJuge
Contributor Author

  • why use a FQN here instead of the short names as for other providers?

What I meant here is that we should use the same values as the ones we can use with the otel.java.{enabled,disabled}.resource.providers options, as the agent will probably copy/append/modify the provided values to those existing SDK options.

I wasn't aware of the "short names" that we can use with other providers; is there any documentation or a list of them somewhere? Currently the SDK documentation only refers to FQN.

@zeitlinger
Member

Sorry, confused that with exporter...

@zeitlinger
Member

I was wondering if we could get the best of both worlds by

  1. Including the detector code in the agent
  2. Keeping it disabled by default

this makes sense to me

@trask what about otel.java.additional.resource.providers=<FQN1,FQN2> to enable resource providers, without affecting resource providers that are not mentioned in this list?

@trask
Member

trask commented Feb 22, 2024

something similar to otel.instrumentation.<>.enabled=true? (could be done entirely in the agent, without impacting resource providers themselves)

@jack-berg
Member

something similar to otel.instrumentation.<>.enabled=true? (could be done entirely in the agent, without impacting resource providers themselves)

I was wondering if this makes sense given that resource providers can also be used without the otel java agent. Would a user using the resource providers as library instrumentation expect to have the notion of default enabled providers? I think the answer is no. Users have to manually add a dependency on the resource provider, and it makes sense to interpret this as wanting to enable that resource provider by default. In contrast, when the agent is installed, (most) users don't have a say on which resource providers are included, so it makes sense to have an additional configuration knob.

If something like otel.instrumentation.<>.enabled=true was introduced, we could:

  • Have a default enabled status for each resource provider instrumentation
  • When installing the SDK, evaluate otel.instrumentation.<>.enabled=true for all resource provider instrumentations and use it to customize the otel.java.disabled.resource.providers option.
    • If a resource provider instrumentation is disabled by default and a user doesn't enable it, OR if a resource provider is enabled by default and a user disables it: ensure the FQCN is included in otel.java.disabled.resource.providers
    • If a resource provider is disabled by default and user enables it, OR if a resource provider is enabled by default and a user doesn't disable it: ensure the FQCN does not appear in otel.java.disabled.resource.providers
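
The two rules above can be sketched as a single pass over the known providers. This is only an illustration under stated assumptions: the class and method names are hypothetical, the configuration is a plain map, and the short names and FQCNs used in the usage note are made up.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class ProviderToggleSketch {
  // fqcnByName: short name -> provider FQCN (hypothetical registry).
  // enabledByDefault: short names of providers that are on unless the user turns them off.
  // config: raw properties, looked up as otel.instrumentation.<name>.enabled.
  // userDisabled: entries the user already put in otel.java.disabled.resource.providers.
  static Set<String> disabledProviders(
      Map<String, String> fqcnByName,
      Set<String> enabledByDefault,
      Map<String, String> config,
      Set<String> userDisabled) {
    Set<String> disabled = new LinkedHashSet<>(userDisabled);
    for (Map.Entry<String, String> entry : fqcnByName.entrySet()) {
      String raw = config.get("otel.instrumentation." + entry.getKey() + ".enabled");
      // An explicit user setting wins; otherwise fall back to the provider's default.
      boolean enabled =
          raw == null ? enabledByDefault.contains(entry.getKey()) : Boolean.parseBoolean(raw);
      if (enabled) {
        disabled.remove(entry.getValue());
      } else {
        disabled.add(entry.getValue());
      }
    }
    return disabled;
  }
}
```

For example, with a provider named gcp-resources that is off by default, leaving the config empty puts its FQCN in the result, while setting otel.instrumentation.gcp-resources.enabled=true removes it.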

@zeitlinger
Member

Would a user using the resource providers as library instrumentation expect to have the notion of default enabled providers? I think the answer is no.

In the case of a spring boot starter it would also make sense - but I think it doesn't change the proposed solution.

If I understand the proposal correctly, it could be implemented with a new NamedResourceProvider in the SDK

NamedResourceProvider:

  • boolean defaultEnabled()
  • String name() -> so that it would be otel.instrumentation.gcp-resources.enabled=true

Suggestions

  1. use otel.java.resource.provider.<>.enabled to align with the existing providers
    • or otel.resource.provider.<>.enabled if we use gcp instead of FQN
  2. javaagent could create wrapper resource providers for contrib if there's some reason not to implement NamedResourceProvider in contrib
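
To make the shape of the proposed NamedResourceProvider concrete, here is a minimal, self-contained sketch. It is an assumption about how such an interface might look, not the SDK's API: in a real implementation it would presumably extend the SDK's ResourceProvider SPI, which is left out here so the example compiles on its own. The NamedProviderSettings helper and the otel.instrumentation.<name>.enabled property prefix are also illustrative choices.

```java
import java.util.Map;

// Hypothetical interface carrying the two methods listed above.
interface NamedResourceProvider {
  // Short name used in the property key, e.g. "gcp-resources".
  String name();

  // Whether the provider runs when the user has not configured it.
  boolean defaultEnabled();
}

final class NamedProviderSettings {
  // Builds the per-provider property key from the short name.
  static String enabledProperty(NamedResourceProvider provider) {
    return "otel.instrumentation." + provider.name() + ".enabled";
  }

  // Uses the explicit setting when present, else the provider's own default.
  static boolean isEnabled(NamedResourceProvider provider, Map<String, String> config) {
    String raw = config.get(enabledProperty(provider));
    return raw == null ? provider.defaultEnabled() : Boolean.parseBoolean(raw);
  }
}
```

Switching to one of the alternative prefixes suggested above would only change the string built in enabledProperty.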

@zeitlinger
Member

@trask @jack-berg I've created a PR that implements this proposal: open-telemetry/opentelemetry-java#6250

@zeitlinger
Member

@trask here's the ticket for the Azure resource provider: #1214
