feat(triggers): include smart-triggered recordings in Harvester upload logic #217

Merged
merged 21 commits into from
Oct 4, 2023
Commits
fe61775
refactoring
andrewazores Oct 3, 2023
006a588
create package
andrewazores Oct 3, 2023
6cbae79
split out harvest module
andrewazores Oct 3, 2023
97ee6b7
preparing Harvester to handle multiple recordings
andrewazores Oct 3, 2023
dd4405c
create TriggerModule
andrewazores Oct 3, 2023
07a3561
track smart-triggered recordings in harvester logic
andrewazores Oct 3, 2023
a845525
harvester can start collecting smart triggers without its own sown re…
andrewazores Oct 3, 2023
4420aac
fixup! harvester can start collecting smart triggers without its own …
andrewazores Oct 3, 2023
a83027f
fixup! fixup! harvester can start collecting smart triggers without i…
andrewazores Oct 3, 2023
bba0db3
fixup! fixup! fixup! harvester can start collecting smart triggers wi…
andrewazores Oct 3, 2023
3e1fe90
fixup! fixup! fixup! fixup! harvester can start collecting smart trig…
andrewazores Oct 3, 2023
471cd25
apply periodic upload age/size settings to handled recordings
andrewazores Oct 3, 2023
799121a
only close and restart sown recording
andrewazores Oct 3, 2023
5785733
ensure recording dumps to exit path
andrewazores Oct 3, 2023
3ed5daf
fixup! ensure recording dumps to exit path
andrewazores Oct 3, 2023
2b4f77e
refactoring, ensure correct template type is reflected in upload labels
andrewazores Oct 3, 2023
21f05da
fixup! refactoring, ensure correct template type is reflected in uplo…
andrewazores Oct 3, 2023
e3c7d7a
fixup! fixup! ensure recording dumps to exit path
andrewazores Oct 3, 2023
391e3b8
cleanup
andrewazores Oct 3, 2023
1cd5b02
ensure correct upload file name
andrewazores Oct 3, 2023
ba5d8b3
fix up upload recording template metadata
andrewazores Oct 3, 2023
29 changes: 24 additions & 5 deletions README.md
@@ -31,9 +31,18 @@ JAVA_OPTIONS="-Dcom.sun.management.jmxremote.port=9091 -Dcom.sun.management.jmxr
```
This assumes that the agent JAR has been included in the application image within `/deployments/app/`.

## Harvester

The various `cryostat.agent.harvester.*` properties may be used to configure `cryostat-agent` to start a new Flight
Recording using a given event template on Agent initialization, and to periodically collect this recorded data and push
it to the Agent's associated Cryostat server. The Agent will also attempt to push the tail end of this recording on JVM
shutdown so that the cause of an unexpected JVM shutdown might be captured for later analysis.

## SMART TRIGGERS

`cryostat-agent` supports smart triggers that listen to the values of the MBean Counters and can start recordings based on a set of constraints specified by the user.
`cryostat-agent` supports smart triggers that listen to the values of the MBean Counters and can start recordings based
on a set of constraints specified by the user.

The general form of a smart trigger expression is as follows:

```
@@ -46,7 +55,8 @@ An example for listening to CPU Usage and starting a recording using the Profili
[ProcessCpuLoad>0.2]~profile
```

An example for watching for the Thread Count to exceed 20 for longer than 10 seconds and starting a recording using the Continuous template:
An example for watching for the Thread Count to exceed 20 for longer than 10 seconds and starting a recording using the
Continuous template:

```
[ThreadCount>20&&TargetDuration>duration("10s")]~Continuous
@@ -64,7 +74,16 @@ Multiple smart trigger definitions may be specified and separated by commas, for
[ProcessCpuLoad>0.2]~profile,[ThreadCount>30]~Continuous
```

**NOTE**: Smart Triggers are evaluated on a polling basis. The poll period is configurable (see list below). This means that your conditions are subject to sampling biases.
**NOTE**: Smart Triggers are evaluated on a polling basis. The poll period is configurable (see list below). This means
that your conditions are subject to sampling biases.
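The sampling bias mentioned in the note can be pictured with a small simulation. The sketch below is not Agent code; it assumes a metric sampled at a fixed fine-grained rate and a poller that only looks at every `pollPeriod`-th sample, with hypothetical names throughout.

```java
// Hypothetical simulation of polling-based evaluation; not part of the Agent.
final class PollingBiasSketch {

    // Returns true if any poll observes a value above the threshold. Polls are
    // taken at indices 0, pollPeriod, 2 * pollPeriod, ... of the sample array.
    static boolean triggered(double[] samples, int pollPeriod, double threshold) {
        if (pollPeriod <= 0) {
            throw new IllegalArgumentException("pollPeriod must be positive");
        }
        for (int i = 0; i < samples.length; i += pollPeriod) {
            if (samples[i] > threshold) {
                return true;
            }
        }
        return false;
    }
}
```

With samples `{0.1, 0.1, 0.9, 0.1, 0.1, 0.1}` and a threshold of `0.2`, a poll period of 2 lands on the spike at index 2 and fires, while a poll period of 3 only inspects indices 0 and 3 and misses it entirely. A shorter evaluation period narrows this blind spot at the cost of more frequent polling.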

### Harvester Integration

Any Flight Recordings created by Smart Trigger will also be tracked by the Harvester system. This data will be captured
in a JFR Snapshot and pushed to the server on the Harvester's usual schedule. By defining Smart Triggers and a
Harvester period without a Harvester template, you can achieve a setup where dynamically-started Flight Recordings
begin when trigger conditions are met, and their data is then periodically captured until the recording is manually
stopped or the host JVM shuts down.
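One way to picture the setup described above is a Harvester that keeps a set of tracked recordings, to which both its own recording (if a template is configured) and any trigger-started recordings are added. The following is a simplified model with hypothetical names, not the Harvester implementation from this PR; recordings are represented by their names only.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the tracking behaviour; not the real Harvester.
final class HarvestTrackingSketch {

    private final String template; // empty means "do not start an own recording"
    private final Set<String> tracked = new LinkedHashSet<>();

    HarvestTrackingSketch(String template) {
        this.template = template;
    }

    // On startup, the Harvester's own recording is only created when a template is set.
    void start() {
        if (!template.isBlank()) {
            tracked.add("harvester-own-recording");
        }
    }

    // Called when a Smart Trigger condition is met and a recording is started.
    void onTriggerRecordingStarted(String recordingName) {
        tracked.add(recordingName);
    }

    // The recordings whose data would be captured on the next scheduled harvest.
    List<String> recordingsToHarvest() {
        return new ArrayList<>(tracked);
    }
}
```

With an empty template, nothing is harvested until a trigger fires; afterwards only the trigger-started recording is captured on each period.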

## CONFIGURATION

@@ -90,8 +109,8 @@ and how it advertises itself to a Cryostat server instance. Required properties
- [ ] `cryostat.agent.registration.retry-ms` [`long`]: the duration in milliseconds between attempts to register with the Cryostat server. Default `5000`.
- [ ] `cryostat.agent.exit.signals` [`[String]`]: a comma-separated list of signals that the agent should handle. When any of these signals is caught the agent initiates an orderly shutdown, deregistering from the Cryostat server and potentially uploading the latest harvested JFR data. Default `INT,TERM`.
- [ ] `cryostat.agent.exit.deregistration.timeout-ms` [`long`]: the duration in milliseconds to wait for a response from the Cryostat server when attempting to deregister at shutdown time. Default `3000`.
- [ ] `cryostat.agent.harvester.period-ms` [`long`]: the length of time between JFR collections and pushes by the harvester. This also controls the maximum age of data stored in the buffer for the harvester's managed Flight Recording. Every `period-ms` the harvester will upload a JFR binary file to the `cryostat.agent.baseuri` archives. Default `-1`, which indicates no harvesting will be performed.
- [ ] `cryostat.agent.harvester.template` [`String`]: the name of the `.jfc` event template configuration to use for the harvester's managed Flight Recording. Default `default`, the continuous monitoring event template.
- [ ] `cryostat.agent.harvester.period-ms` [`long`]: the length of time between JFR collections and pushes by the harvester. This also controls the maximum age of data stored in the buffer for the harvester's managed Flight Recording. Every `period-ms` the harvester will upload a JFR binary file to the `cryostat.agent.baseuri` archives. Default `-1`, which indicates no scheduled harvest uploading will be performed.
- [ ] `cryostat.agent.harvester.template` [`String`]: the name of the `.jfc` event template configuration to use for the harvester's managed Flight Recording. Defaults to the empty string, so that no recording is started.
- [ ] `cryostat.agent.harvester.max-files` [`int`]: the maximum number of pushed files that Cryostat will keep over the network from the agent. This is supplied to the harvester's push requests which instructs Cryostat to prune, in a FIFO manner, the oldest JFR files within the attached JVM target's storage, while the number of stored recordings is greater than this configuration's maximum file limit. Default `2147483647` (`Integer.MAX_VALUE`).
- [ ] `cryostat.agent.harvester.upload.timeout-ms` [`long`]: the duration in milliseconds to wait for HTTP upload requests to the Cryostat server to complete and respond. Default `30000`.
- [ ] `cryostat.agent.harvester.exit.max-age-ms` [`long`]: the JFR `maxage` setting, specified in milliseconds, to apply to recording data uploaded to the Cryostat server when the JVM this Agent instance is attached to exits. This ensures that tail-end data is captured between the last periodic push and the application exit. Exit uploads only occur when the application receives `SIGINT`/`SIGTERM` from the operating system or container platform.
1 change: 1 addition & 0 deletions src/main/java/io/cryostat/agent/Agent.java
@@ -28,6 +28,7 @@
import javax.inject.Named;
import javax.inject.Singleton;

import io.cryostat.agent.harvest.Harvester;
import io.cryostat.agent.triggers.TriggerEvaluator;

import dagger.Component;
2 changes: 1 addition & 1 deletion src/main/java/io/cryostat/agent/ConfigModule.java
@@ -232,7 +232,7 @@ public static long provideCryostatAgentHarvesterPeriod(SmallRyeConfig config) {
@Singleton
@Named(CRYOSTAT_AGENT_HARVESTER_TEMPLATE)
public static String provideCryostatAgentHarvesterTemplate(SmallRyeConfig config) {
return config.getValue(CRYOSTAT_AGENT_HARVESTER_TEMPLATE, String.class);
return config.getOptionalValue(CRYOSTAT_AGENT_HARVESTER_TEMPLATE, String.class).orElse("");
}

@Provides
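The switch from `getValue` to `getOptionalValue(...).orElse("")` in the hunk above can be modelled without the config library. The sketch below uses a plain `Map` as a stand-in for the configuration source, and assumes that a missing or empty property is treated as absent (MicroProfile Config documents this behaviour, but it is worth verifying against the SmallRye version in use). All names other than the property key are hypothetical.

```java
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

// Hypothetical stand-in for a config source; not SmallRyeConfig itself.
final class OptionalTemplateSketch {

    private final Map<String, String> props;

    OptionalTemplateSketch(Map<String, String> props) {
        this.props = props;
    }

    // Strict lookup: fails when the property is missing or empty (assumed semantics).
    String getValue(String key) {
        return getOptionalValue(key).orElseThrow(() -> new NoSuchElementException(key));
    }

    // Lenient lookup: an absent or empty property yields Optional.empty().
    Optional<String> getOptionalValue(String key) {
        return Optional.ofNullable(props.get(key)).filter(v -> !v.isEmpty());
    }

    // Mirrors the new provider: fall back to "" so that no recording is started.
    String harvesterTemplate() {
        return getOptionalValue("cryostat.agent.harvester.template").orElse("");
    }
}
```

Under these assumptions, an unset template no longer fails at injection time; it simply resolves to the empty string, which the Harvester can interpret as "no own recording".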
33 changes: 28 additions & 5 deletions src/main/java/io/cryostat/agent/CryostatClient.java
@@ -30,13 +30,16 @@
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.function.Function;

import io.cryostat.agent.FlightRecorderHelper.TemplatedRecording;
import io.cryostat.agent.WebServer.Credentials;
import io.cryostat.agent.harvest.Harvester;
import io.cryostat.agent.model.DiscoveryNode;
import io.cryostat.agent.model.PluginInfo;
import io.cryostat.agent.model.RegistrationInfo;
@@ -46,6 +49,8 @@
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.input.CountingInputStream;
import org.apache.http.HttpHeaders;
@@ -360,21 +365,39 @@ public CompletableFuture<Void> update(
}

public CompletableFuture<Void> upload(
Harvester.PushType pushType, String template, int maxFiles, Path recording)
Harvester.PushType pushType,
Optional<TemplatedRecording> opt,
int maxFiles,
Path recording)
throws IOException {
Instant start = Instant.now();
String timestamp = start.truncatedTo(ChronoUnit.SECONDS).toString().replaceAll("[-:]", "");
String fileName = String.format("%s_%s_%s.jfr", appName, template, timestamp);
String template =
opt.map(TemplatedRecording::getConfiguration)
.map(Configuration::getName)
.map(String::toLowerCase)
.map(String::trim)
.orElse("unknown");
String fileName =
String.format(
"%s_%s_%s.jfr",
appName
+ opt.map(TemplatedRecording::getRecording)
.map(Recording::getName)
.map(n -> "-" + n)
.orElse(""),
template,
timestamp);
Map<String, String> labels =
Map.of(
"jvmId",
jvmId,
"pushType",
pushType.name(),
"template.name",
template,
"template.type",
"TARGET",
"pushType",
pushType.name());
"TARGET");

HttpPost req = new HttpPost(baseUri.resolve("/api/beta/recordings/" + jvmId));
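The file name derivation in the new `upload` signature can be followed more easily with the JFR types replaced by plain strings. The sketch below is an approximation under that assumption: `recordingName` and `templateName` stand in for `Recording::getName` and `Configuration::getName`, and the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Optional;

// Hypothetical reduction of the upload naming logic to plain strings.
final class UploadNameSketch {

    static String fileName(
            String appName,
            Optional<String> recordingName,
            Optional<String> templateName,
            Instant now) {
        // ISO-8601 instant truncated to seconds, with '-' and ':' stripped.
        String timestamp = now.truncatedTo(ChronoUnit.SECONDS).toString().replaceAll("[-:]", "");
        // Normalise the template name, or fall back to "unknown" when absent.
        String template =
                templateName.map(String::toLowerCase).map(String::trim).orElse("unknown");
        // Append "-<recording name>" to the application name when a recording is present.
        String prefix = appName + recordingName.map(n -> "-" + n).orElse("");
        return String.format("%s_%s_%s.jfr", prefix, template, timestamp);
    }
}
```

For example, with application name `myapp`, recording name `my-recording`, template name `Profile`, and the instant `2023-10-03T12:34:56.789Z`, this yields `myapp-my-recording_profile_20231003T123456Z.jfr`. With no recording or template it yields `myapp_unknown_20231003T123456Z.jfr`.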

69 changes: 41 additions & 28 deletions src/main/java/io/cryostat/agent/FlightRecorderHelper.java
@@ -15,65 +15,78 @@
*/
package io.cryostat.agent;

import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.stream.Collectors;

import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
import jdk.jfr.Configuration;
import jdk.jfr.FlightRecorder;
import jdk.jfr.Recording;
import jdk.management.jfr.ConfigurationInfo;
import jdk.management.jfr.FlightRecorderMXBean;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FlightRecorderHelper {

private final FlightRecorderMXBean bean =
ManagementFactory.getPlatformMXBean(FlightRecorderMXBean.class);
private final Logger log = LoggerFactory.getLogger(getClass());

// FIXME this is repeated logic shared with Harvester startRecording
public void startRecording(String templateNameOrLabel) {
getTemplate(templateNameOrLabel)
.ifPresentOrElse(
c -> {
long recordingId = bean.newRecording();
bean.setPredefinedConfiguration(recordingId, c.getName());
String recoringName =
String.format("cryostat-smart-trigger-%d", recordingId);
bean.setRecordingOptions(
recordingId, Map.of("name", recoringName, "disk", "true"));
bean.startRecording(recordingId);
log.info(
"Started recording \"{}\" using template \"{}\"",
recoringName,
templateNameOrLabel);
},
() ->
log.error(
"Cannot start recording with template named or labelled {}",
templateNameOrLabel));
public Optional<TemplatedRecording> createRecording(String templateNameOrLabel) {
Optional<Configuration> opt = getTemplate(templateNameOrLabel);
if (opt.isEmpty()) {
log.error(
"Cannot start recording with template named or labelled {}",
templateNameOrLabel);
return Optional.empty();
}
Configuration configuration = opt.get();
Recording recording = new Recording(configuration.getSettings());
recording.setToDisk(true);
return Optional.of(new TemplatedRecording(configuration, recording));
}

public Optional<ConfigurationInfo> getTemplate(String nameOrLabel) {
return bean.getConfigurations().stream()
public Optional<Configuration> getTemplate(String nameOrLabel) {
Objects.requireNonNull(nameOrLabel);
return Configuration.getConfigurations().stream()
.filter(c -> c.getName().equals(nameOrLabel) || c.getLabel().equals(nameOrLabel))
.findFirst();
}

public boolean isValidTemplate(String nameOrLabel) {
Objects.requireNonNull(nameOrLabel);
return getTemplate(nameOrLabel).isPresent();
}

public List<RecordingInfo> getRecordings() {
if (!FlightRecorder.isAvailable()) {
log.error("FlightRecorder is unavailable");
return List.of();
}
return FlightRecorder.getFlightRecorder().getRecordings().stream()
.map(RecordingInfo::new)
.collect(Collectors.toList());
}

@SuppressFBWarnings(value = {"EI_EXPOSE_REP", "EI_EXPOSE_REP2"})
public static class TemplatedRecording {
private final Configuration configuration;
private final Recording recording;

public TemplatedRecording(Configuration configuration, Recording recording) {
this.configuration = configuration;
this.recording = recording;
}

public Configuration getConfiguration() {
return configuration;
}

public Recording getRecording() {
return recording;
}
}
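A caller of the new `createRecording` method receives a recording that has been created but not yet started. The following sketch shows how such a lookup-then-start flow might look using the `jdk.jfr` API directly. It is not code from this PR; the class and method names are hypothetical, and it assumes a JDK that ships the `jdk.jfr` module.

```java
import java.util.Optional;

import jdk.jfr.Configuration;
import jdk.jfr.Recording;

// Hypothetical usage sketch built on the jdk.jfr API; not part of the Agent.
final class TemplatedRecordingUsageSketch {

    // Looks up a predefined event configuration by its name or its label.
    static Optional<Configuration> findTemplate(String nameOrLabel) {
        return Configuration.getConfigurations().stream()
                .filter(c -> nameOrLabel.equals(c.getName()) || nameOrLabel.equals(c.getLabel()))
                .findFirst();
    }

    // Creates, names, and starts a recording if the template exists.
    // The caller is responsible for stopping and closing the returned recording.
    static Optional<Recording> startFrom(String nameOrLabel, String recordingName) {
        return findTemplate(nameOrLabel)
                .map(
                        cfg -> {
                            Recording recording = new Recording(cfg.getSettings());
                            recording.setName(recordingName);
                            recording.setToDisk(true);
                            recording.start();
                            return recording;
                        });
    }
}
```

Returning the `Recording` object, rather than only an ID as the removed MXBean-based code did, is what lets another component hold on to it and track it later.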

@SuppressFBWarnings(value = "URF_UNREAD_FIELD")
public static class RecordingInfo {

69 changes: 4 additions & 65 deletions src/main/java/io/cryostat/agent/MainModule.java
@@ -34,11 +34,10 @@
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

import io.cryostat.agent.Harvester.RecordingSettings;
import io.cryostat.agent.harvest.HarvestModule;
import io.cryostat.agent.remote.RemoteContext;
import io.cryostat.agent.remote.RemoteModule;
import io.cryostat.agent.triggers.TriggerEvaluator;
import io.cryostat.agent.triggers.TriggerParser;
import io.cryostat.agent.triggers.TriggerModule;
import io.cryostat.core.net.JFRConnection;
import io.cryostat.core.net.JFRConnectionToolkit;
import io.cryostat.core.sys.Environment;
@@ -64,14 +63,15 @@
includes = {
ConfigModule.class,
RemoteModule.class,
HarvestModule.class,
TriggerModule.class,
})
public abstract class MainModule {

// one for outbound HTTP requests, one for incoming HTTP requests, and one as a general worker
private static final int NUM_WORKER_THREADS = 3;
private static final String JVM_ID = "JVM_ID";
private static final String TEMPLATES_PATH = "TEMPLATES_PATH";
private static final String TRIGGER_SCHEDULER = "TRIGGER_SCHEDULER";

@Provides
@Singleton
@@ -236,73 +236,12 @@ public static Registration provideRegistration(
registrationCheckMs);
}

@Provides
@Singleton
public static Harvester provideHarvester(
ScheduledExecutorService workerPool,
@Named(ConfigModule.CRYOSTAT_AGENT_HARVESTER_PERIOD_MS) long period,
@Named(ConfigModule.CRYOSTAT_AGENT_HARVESTER_TEMPLATE) String template,
@Named(ConfigModule.CRYOSTAT_AGENT_HARVESTER_MAX_FILES) int maxFiles,
@Named(ConfigModule.CRYOSTAT_AGENT_HARVESTER_EXIT_MAX_AGE_MS) long exitMaxAge,
@Named(ConfigModule.CRYOSTAT_AGENT_HARVESTER_EXIT_MAX_SIZE_B) long exitMaxSize,
@Named(ConfigModule.CRYOSTAT_AGENT_HARVESTER_MAX_AGE_MS) long maxAge,
@Named(ConfigModule.CRYOSTAT_AGENT_HARVESTER_MAX_SIZE_B) long maxSize,
CryostatClient client,
Registration registration) {
RecordingSettings exitSettings = new RecordingSettings();
exitSettings.maxAge = exitMaxAge;
exitSettings.maxSize = exitMaxSize;
RecordingSettings periodicSettings = new RecordingSettings();
periodicSettings.maxAge = maxAge > 0 ? maxAge : (long) (period * 1.5);
periodicSettings.maxSize = maxSize;
return new Harvester(
Executors.newSingleThreadScheduledExecutor(
r -> {
Thread t = new Thread(r);
t.setName("cryostat-agent-harvester");
t.setDaemon(true);
return t;
}),
workerPool,
period,
template,
maxFiles,
exitSettings,
periodicSettings,
client,
registration);
}
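The `provideHarvester` provider above derives the periodic `maxAge` from the harvest period when no explicit value is configured. Isolated as a sketch (the class and method names are hypothetical; the expression is the one from the provider):

```java
// Hypothetical extraction of the periodic max-age fallback shown in provideHarvester.
final class PeriodicMaxAgeSketch {

    // Uses the configured max age when positive; otherwise falls back to
    // 1.5 times the harvest period, truncated to a whole number of milliseconds.
    static long effectiveMaxAge(long configuredMaxAgeMs, long periodMs) {
        return configuredMaxAgeMs > 0 ? configuredMaxAgeMs : (long) (periodMs * 1.5);
    }
}
```

With a 60000 ms period and no explicit max age, the recording buffer keeps 90000 ms of data, presumably so that consecutive pushes overlap rather than leave gaps. With the default period of `-1` the fallback evaluates to `-1`, because the cast truncates `-1.5` toward zero.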

@Provides
@Singleton
@Named(TRIGGER_SCHEDULER)
public static ScheduledExecutorService provideTriggerScheduler() {
return Executors.newScheduledThreadPool(0);
}

@Provides
@Singleton
public static FlightRecorderHelper provideFlightRecorderHelper() {
return new FlightRecorderHelper();
}

@Provides
@Singleton
public static TriggerParser provideTriggerParser(FlightRecorderHelper helper) {
return new TriggerParser(helper);
}

@Provides
@Singleton
public static TriggerEvaluator provideTriggerEvaluatorFactory(
@Named(TRIGGER_SCHEDULER) ScheduledExecutorService scheduler,
TriggerParser parser,
FlightRecorderHelper helper,
@Named(ConfigModule.CRYOSTAT_AGENT_SMART_TRIGGER_EVALUATION_PERIOD_MS)
long evaluationPeriodMs) {
return new TriggerEvaluator(scheduler, parser, helper, evaluationPeriodMs);
}

@Provides
@Singleton
public static FileSystem provideFileSystem() {