Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(jmc-core): update RJMX sources #228

Merged
merged 4 commits into from
Jun 7, 2023

Conversation

Josh-Matsuoka
Copy link
Contributor

@Josh-Matsuoka Josh-Matsuoka commented May 30, 2023

Updating the missed RJMX classes to resolve the NullPointer when checking for JFR availability in native images.

Fixes #223

@andrewazores andrewazores changed the title Updating missed RJMX classes chore(jmc-core): update RJMX sources May 31, 2023
@andrewazores andrewazores added the chore Refactor, rename, cleanup, etc. label May 31, 2023
@Josh-Matsuoka Josh-Matsuoka marked this pull request as ready for review May 31, 2023 17:25
@andrewazores andrewazores self-requested a review May 31, 2023 17:29
@andrewazores
Copy link
Member

andrewazores commented May 31, 2023

I tried this out combined with #222 and got this in the Cryostat logs when attempting to open a JMX connection to a native image application:

io.vertx.ext.web.handler.HttpException: Internal Server Error
Caused by: org.openjdk.jmc.rjmx.ServiceNotAvailableException: 
	at org.openjdk.jmc.rjmx.services.jfr.internal.FlightRecorderServiceV2.<init>(FlightRecorderServiceV2.java:128)
	at org.openjdk.jmc.rjmx.services.jfr.internal.FlightRecorderServiceFactory.getServiceInstance(FlightRecorderServiceFactory.java:47)
	at io.cryostat.core.net.JFRJMXConnection.getService(JFRJMXConnection.java:140)
	at io.cryostat.net.web.http.api.v1.TargetRecordingsGetHandler.lambda$handleAuthenticated$0(TargetRecordingsGetHandler.java:128)
	at io.cryostat.net.TargetConnectionManager.executeConnectedTask(TargetConnectionManager.java:168)
	at io.cryostat.net.web.http.api.v1.TargetRecordingsGetHandler.handleAuthenticated(TargetRecordingsGetHandler.java:124)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:102)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:72)
	at io.vertx.ext.web.impl.BlockingHandlerDecorator.lambda$handle$0(BlockingHandlerDecorator.java:48)
	at io.vertx.core.impl.ContextBase.lambda$null$0(ContextBase.java:137)
	at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:264)
	at io.vertx.core.impl.ContextBase.lambda$executeBlocking$1(ContextBase.java:135)
	at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)

That first frame in the stack trace is this:

		if (!isDynamicFlightRecorderSupported(handle) && isFlightRecorderDisabled(handle)) {
			throw new ServiceNotAvailableException(""); //$NON-NLS-1$
		}

When I hacked at updating the sources myself (see #222 (comment)) I had the hasCommercialFeatures values inverted compared to this PR or the JMC upstream sources, so I think that explains how it got past this check in my attempt but not in this PR.

	private boolean isDynamicFlightRecorderSupported(IConnectionHandle handle) {
		return !cfs.hasCommercialFeatures() || (ConnectionToolkit.isHotSpot(handle)
				&& ConnectionToolkit.isJavaVersionAboveOrEqual(handle, JavaVersionSupport.DYNAMIC_JFR_SUPPORTED));
	}

I'm guessing that this means the further detection checks happening here also do not properly support whatever information the native image is providing about itself. I assume it wouldn't pass the isHotspot() check.

	@Override
	public ICommercialFeaturesService getServiceInstance(IConnectionHandle handle)
			throws ConnectionException, ServiceNotAvailableException {
		JVMDescriptor descriptor = handle.getServerDescriptor().getJvmInfo();
		if (descriptor != null) {
			JavaVersion version = new JavaVersion(descriptor.getJavaVersion());
			if (version.getMajorVersion() >= 11) {
				return new Jdk11CommercialFeaturesService();
			}
		}
		return new HotSpot23CommercialFeaturesService(handle);
	}
	public static boolean isHotSpot(IConnectionHandle connectionHandle) {
		String vmName = getVMName(connectionHandle);
		return vmName != null && JavaVMVersionToolkit.isHotspotJVMName(vmName);
	}

@andrewazores
Copy link
Member

I'll try to dig into this a bit deeper later. Maybe @roberttoyonaga will be back to take a look before I get the chance.

@andrewazores
Copy link
Member

Though, I think Robert has said before that he was able to connect to his native image application using JMC before, so it seems like that code path should work here too...

@roberttoyonaga
Copy link
Contributor

roberttoyonaga commented Jun 6, 2023

I think Robert has said before that he was able to connect to his native image application using JMC before

Yup, I don't experience issues connecting to a native image executable (quarkus or otherwise) with JMC. After connection, I can inspect all the MBeans on the MBean server fine (including FlightRecorderMXBean). However, JMC does throw an error when I try to use the flight recorder wizard.
image
vs
image

So it's possible that code path also fails in JMC, and there's a different path that allows me to connect to the MBean Server and interact with FlightRecorderMXBean that way.

@roberttoyonaga
Copy link
Contributor

roberttoyonaga commented Jun 6, 2023

Maybe ConnectionToolkit.isHotSpot(handle) || ConnectionToolkit.isSubstrateVM(handle) would fix the problem. The vmName attribute of RuntimeMXBean is "Substrate VM".
image.

There's actually another problem. If the goal is to determine whether flight recorder is supported, it's not enough for the VM to be Substrate VM. The binary could have been built with RJMX support, but not JFR support. If JFR is not supported, then FlightRecorderMXBean will have not gotten registered with the MBean server at startup. So maybe one way of checking to see whether flight recorder is supported, it to see if the FlightRecorderMXBean object name is registered with the MBean server. Alternatively, you maybe could just always attempt connection if the VM is Substrate and anticipate failure if JFR is not present.

Edit Implemented in review suggestions

@aptmac
Copy link
Member

aptmac commented Jun 6, 2023

Yup, I don't experience issues connecting to a native image executable (quarkus or otherwise) with JMC. After connection, I can inspect all the MBeans on the MBean server fine (including FlightRecorderMXBean). However, JMC does throw an error when I try to use the flight recorder wizard. image

That error message comes up in the recording wizard [0] looking for the type of the jvm type [1] [2]. I just noticed that the jvmtype for other goes unchecked, and I'm wondering if that's where this is falling into.

I'm not too familiar with graalvm, but tried launching a quarkus demo app with the graalvm-ce [3] and it was able to start jfr. @roberttoyonaga Is there a place to easily grab (or build?) a copy of the graalvm you're using here?

[0]https://github.com/openjdk/jmc/blob/master/application/org.openjdk.jmc.flightrecorder.controlpanel.ui/src/main/java/org/openjdk/jmc/flightrecorder/controlpanel/ui/actions/StartRecordingAction.java#L64
[1]https://github.com/openjdk/jmc/blob/master/application/org.openjdk.jmc.rjmx/src/main/java/org/openjdk/jmc/rjmx/JVMSupportToolkit.java#L183
[2]https://github.com/openjdk/jmc/blob/master/core/org.openjdk.jmc.common/src/main/java/org/openjdk/jmc/common/jvm/JVMType.java#L40
[3] https://github.com/graalvm/graalvm-ce-builds/releases

@roberttoyonaga
Copy link
Contributor

roberttoyonaga commented Jun 6, 2023

Is there a place to easily grab (or build?) a copy of the graalvm you're using here?

@aptmac hmm you could try using one of the dev builds here: https://github.com/graalvm/graalvm-ce-dev-builds/releases
Anything before 23.0 won't have RJMX support. You could also try docker run --rm -p 9091:9091 -it quay.io/roberttoyonaga/jmx:jmxquarkus I made this image with the quarkus getting-started quickstart. It has JFR and RJMX support built in.

@andrewazores
Copy link
Member

Thanks @Josh-Matsuoka @aptmac @roberttoyonaga all for looking into this, this is a lot of help. Since JMC is also affected in some manner it does look like we should propose a patch. We can experiment with our embedded sources and try to come up with something that works, which should then be pretty trivial to port over to JMC to verify a build, and then try to upstream the change. I'm fine with carrying a downstream patch to fix this for our particular case in the meantime.

@roberttoyonaga
Copy link
Contributor

We can experiment with our embedded sources and try to come up with something that works

I'm going to experiment a little. I think I know what the problem is (isFlightRecorderDisabled and isDynamicFlightRecorderSupported need to be tweaked). It seems like updating the sources fixed the original NPE #154 but introduced this new issue. How are you testing @andrewazores @Josh-Matsuoka locally using devserver.sh? I should be incorporating the changes from #222 as well I assume.

@andrewazores
Copy link
Member

andrewazores commented Jun 6, 2023

diff --git a/pom.xml b/pom.xml
index 989b393f..36cf14c5 100644
--- a/pom.xml
+++ b/pom.xml
@@ -42,7 +42,7 @@
   <cryostat.imageStream>quay.io/cryostat/cryostat</cryostat.imageStream>
   <cryostat.entrypoint>/app/entrypoint.bash</cryostat.entrypoint>
   <cryostat.mainClass>io.cryostat.Cryostat</cryostat.mainClass>
-  <cryostat.rjmxPort>9091</cryostat.rjmxPort>
+  <cryostat.rjmxPort>9081</cryostat.rjmxPort>
   <cryostat.webPort>8181</cryostat.webPort>
   <cryostat.itest.pullImages>always</cryostat.itest.pullImages>
   <cryostat.itest.webPort>8181</cryostat.itest.webPort>
diff --git a/smoketest.sh b/smoketest.sh
index f0ec590b..2947e45b 100755
--- a/smoketest.sh
+++ b/smoketest.sh
@@ -108,6 +108,12 @@ runDemoApps() {
         local protocol="http"
     fi
 
+    podman run \
+        --name robert-app \
+        --pod cryostat-pod \
+        --label io.cryostat.connectUrl="service:jmx:rmi:///jndi/rmi://localhost:9091/jmxrmi" \
+        --rm -d quay.io/roberttoyonaga/jmx:jmxquarkus
+
     podman run \
         --name vertx-fib-demo-1 \
         --env HTTP_PORT=8081 \
@@ -295,6 +301,7 @@ createPod() {
         --publish 8081:8081 \
         --publish 8082:8082 \
         --publish 8083:8083 \
+        --publish 9091:9091 \
         --publish 9093:9093 \
         --publish 9094:9094 \
         --publish 9095:9095 \
$ cd cryostat
$ patch -p1 < diff-above.patch
$ sh smoketest.sh h2mem
$ cd cryostat-core
$ # make changes, check out PR branch, whatever
$ vim pom.xml # update artifact version to some later number
$ mvn install
$ cd cryostat # wherever that is
$ vim pom.xml # update core.version property to correspond to version used above
$ mvn clean package ; podman image prune -f
$ sh smoketest.sh h2mem # cryostat server should now be running with new -core changes included

@andrewazores
Copy link
Member

Oh on this note since you're back @roberttoyonaga , can the JMX port for your native image application be controlled, or is it hardcoded to bind on 9091? The pom.xml part of the patch could be simplified if the JMX port can be made something other than 9091, which conflicts with the default JMX port exposed by the Cryostat server itself.

@roberttoyonaga
Copy link
Contributor

can the JMX port for your native image application be controlled, or is it hardcoded to bind on 9091

Oh I see. Nope, I hard-coded it to be 9091 in the Dockerfile

@andrewazores
Copy link
Member

Okay, no problem. The patch I shared already accounts for that and binds Cryostat's JMX port on a different number to avoid the conflict, so it should still work that way. It just happens like that because smoketest.sh puts all the containers into a shared Pod, so port numbers have to be unique across the containers. That's not really how a typical deployment would look, it was just convenient for the purposes of that script.

@aptmac
Copy link
Member

aptmac commented Jun 6, 2023

You could also try docker run --rm -p 9091:9091 -it quay.io/roberttoyonaga/jmx:jmxquarkus I made this image with the quarkus getting-started quickstart. It has JFR and RJMX support built in.

Thanks! That was a big help, I see the errors now. JMC basically checks to see if the jvm is HotSpot, and bails otherwise. For a bit more context, here's an example of what it's going through

FlightRecorderServiceV2 constructor [1], checking if flight recorder is disabled
JVMSupportToolkit.isFlightRecorderDisabled() [2], tries using the HotSpotManagementToolkit to check if there's flightrecorder, specifically invoking on com.sun.management:type=HotSpotDiagnostic
HotSpotManagementToolkit.getVmOption() [3], throws an exception
Then tries to figure out what error message to present, in this case we've hit an unknown/non-hotspot vm so it cannot be support.

[1]https://github.com/openjdk/jmc/blob/master/application/org.openjdk.jmc.rjmx.services.jfr/src/main/java/org/openjdk/jmc/rjmx/services/jfr/internal/FlightRecorderServiceV2.java#L131
[2]https://github.com/openjdk/jmc/blob/master/application/org.openjdk.jmc.rjmx/src/main/java/org/openjdk/jmc/rjmx/JVMSupportToolkit.java#L124
[3]https://github.com/openjdk/jmc/blob/master/application/org.openjdk.jmc.rjmx/src/main/java/org/openjdk/jmc/rjmx/services/internal/HotspotManagementToolkit.java#L65

@roberttoyonaga
Copy link
Contributor

I'm having a bit of trouble running smoketest.sh. In runReportGenerator:
Error: OCI runtime error: runc: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: cannot set memory limit: container could not join or create cgroup
Anyone run into this before?

@andrewazores
Copy link
Member

I've seen similar things before when people have tried to run the script on RHEL, which has an older kernel and an older version of cgroups.

You can just edit the run.sh and smoketest.sh to remove the podman --memory and --cpus flags.

Copy link
Contributor

@roberttoyonaga roberttoyonaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested some changes that add handling for substrate VM and should resolve the problem you were seeing. I tested this out with devserver.sh (I had issues getting smoketest.sh to run on my machine) and it seems to work now. I'm able to connect with Cryostat and generate/download recordings for a native image target. However, I noticed 2 problems:

  1. Cryostat no longer automatically detects native images. When I tested a few months ago, it was able to detect them without issue. Now I need to manually provide a custom service URL.
  2. I get this exception. I occasionally. Maybe this is because I'm testing with devserver.sh? It doesn't seem to affect functionality so I haven't looked into this further.

@andrewazores
Copy link
Member

@roberttoyonaga that exception gist in point 2 is pretty weird. It looks like a GraphQL query is either getting processed incorrectly, or being sent to the wrong endpoint somehow. I don't think should be specific to devserver.sh, but maybe it is if there is a version mismatch between the backend and frontend that the devserver is running together? I haven't seen an exception like this before in any of the more representative testing scenarios.

@andrewazores
Copy link
Member

For point 1 regarding automatic discovery, it will just depend on what kind of discovery mechanisms are available. In the devserver setup the only applicable builtin discovery mechanism will be JDP, otherwise targets will only be discovered as Custom Targets or if they have the Cryostat Agent attached and configured. I'm not sure what has changed in the scenario since the last time you tested it, but I don't think it has to do with this patch.

@roberttoyonaga
Copy link
Contributor

I'm not sure what has changed in the scenario since the last time you tested it, but I don't think it has to do with this patch.

Yes, I also don't think the changes in this PR have any affect on the discovery. I might dig deeper into it later, but for now, and for the purpose of the Quarkus insight demo, I think using a custom URL is good enough.

@andrewazores
Copy link
Member

If we can figure out how to run the demo using smoketest.sh then we can use Cryostat's podman label-based discovery instead, which should work equally well if custom targets work. Then at least we don't need to step through the UI of adding the target application.

@andrewazores andrewazores force-pushed the jmx-updates branch 2 times, most recently from e9bb092 to 35e963c Compare June 7, 2023 19:58
@andrewazores andrewazores merged commit 4bc004e into cryostatio:main Jun 7, 2023
mergify bot pushed a commit that referenced this pull request Jun 13, 2023
* Updating missed RJMX classes

* Fixing newline at end of file

* Cleanup of CommercialFeaturesServiceFactory, adding missing DefaultConnectionHandle changes

* feat(jmx): check for SubstrateVM and whether JFR is available

Co-authored-by: Robert Toyonaga <[email protected]>

---------

Co-authored-by: Andrew Azores <[email protected]>
Co-authored-by: Robert Toyonaga <[email protected]>
(cherry picked from commit 4bc004e)
andrewazores pushed a commit that referenced this pull request Jun 13, 2023
* Updating missed RJMX classes

* Fixing newline at end of file

* Cleanup of CommercialFeaturesServiceFactory, adding missing DefaultConnectionHandle changes

* feat(jmx): check for SubstrateVM and whether JFR is available

Co-authored-by: Robert Toyonaga <[email protected]>

---------

Co-authored-by: Andrew Azores <[email protected]>
Co-authored-by: Robert Toyonaga <[email protected]>
(cherry picked from commit 4bc004e)

Co-authored-by: Joshua Matsuoka <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport chore Refactor, rename, cleanup, etc.
Projects
No open projects
Status: Done
Status: Done
Development

Successfully merging this pull request may close these issues.

[Task] Upgrade JMX-related classes to JMC 8.2
4 participants