feat(triggers): include smart-triggered recordings in Harvester upload logic (#217)
andrewazores authored Oct 4, 2023
1 parent ba975ca commit 2d0ed81
Showing 15 changed files with 416 additions and 212 deletions.
29 changes: 24 additions & 5 deletions README.md
@@ -31,9 +31,18 @@ JAVA_OPTIONS="-Dcom.sun.management.jmxremote.port=9091 -Dcom.sun.management.jmxr
```
This assumes that the agent JAR has been included in the application image within `/deployments/app/`.

## Harvester

The various `cryostat.agent.harvester.*` properties may be used to configure `cryostat-agent` to start a new Flight
Recording using a given event template on Agent initialization, and to periodically collect this recorded data and push
it to the Agent's associated Cryostat server. The Agent will also attempt to push the tail end of this recording on JVM
shutdown so that the cause of an unexpected JVM shutdown might be captured for later analysis.

## SMART TRIGGERS

`cryostat-agent` supports smart triggers that listen to the values of the MBean Counters and can start recordings based on a set of constraints specified by the user.
`cryostat-agent` supports smart triggers that listen to the values of the MBean Counters and can start recordings based
on a set of constraints specified by the user.

The general form of a smart trigger expression is as follows:

```
@@ -46,7 +55,8 @@ An example for listening to CPU Usage and starting a recording using the Profili
[ProcessCpuLoad>0.2]~profile
```

An example for watching for the Thread Count to exceed 20 for longer than 10 seconds and starting a recording using the Continuous template:
An example for watching for the Thread Count to exceed 20 for longer than 10 seconds and starting a recording using the
Continuous template:

```
[ThreadCount>20&&TargetDuration>duration("10s")]~Continuous
@@ -64,7 +74,16 @@ Multiple smart trigger definitions may be specified and separated by commas, for
[ProcessCpuLoad>0.2]~profile,[ThreadCount>30]~Continuous
```
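
The comma-separated `[expression]~template` form shown above can be split mechanically. The following is a minimal sketch and not the Agent's own `TriggerParser`: the class `TriggerDefinitionSketch` and its regular expression are hypothetical, and they assume that expressions never contain `]` and that template names never contain commas or whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is NOT the Agent's TriggerParser.
final class TriggerDefinitionSketch {

    // One definition looks like "[expression]~templateName".
    // Assumption: no ']' inside the expression, no ',' or whitespace in the template name.
    private static final Pattern DEFINITION =
            Pattern.compile("\\[([^\\]]+)\\]~([^,\\s]+)");

    // Simple holder for the two halves of one definition.
    static final class Definition {
        final String expression;
        final String template;

        Definition(String expression, String template) {
            this.expression = expression;
            this.template = template;
        }
    }

    // Scans the input left to right and collects every "[...]~name" pair it finds.
    static List<Definition> parse(String definitions) {
        List<Definition> result = new ArrayList<>();
        Matcher m = DEFINITION.matcher(definitions);
        while (m.find()) {
            result.add(new Definition(m.group(1).trim(), m.group(2).trim()));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "[ProcessCpuLoad>0.2]~profile,[ThreadCount>30]~Continuous";
        for (Definition d : parse(input)) {
            System.out.println(d.expression + " -> " + d.template);
        }
    }
}
```

The sketch only separates the condition from the template name. Evaluating the condition against MBean values is a separate step that it does not attempt.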

**NOTE**: Smart Triggers are evaluated on a polling basis. The poll period is configurable (see list below). This means that your conditions are subject to sampling biases.
**NOTE**: Smart Triggers are evaluated on a polling basis. The poll period is configurable (see list below). This means
that your conditions are subject to sampling biases.

### Harvester Integration

Any Flight Recordings created by Smart Trigger will also be tracked by the Harvester system. This data will be captured
in a JFR Snapshot and pushed to the server on the Harvester's usual schedule. By defining Smart Triggers and a
Harvester period without a Harvester template, you can achieve a setup where dynamically-started Flight Recordings
begin when trigger conditions are met, and their data is then periodically captured until the recording is manually
stopped or the host JVM shuts down.

## CONFIGURATION

@@ -90,8 +109,8 @@ and how it advertises itself to a Cryostat server instance. Required properties
- [ ] `cryostat.agent.registration.retry-ms` [`long`]: the duration in milliseconds between attempts to register with the Cryostat server. Default `5000`.
- [ ] `cryostat.agent.exit.signals` [`[String]`]: a comma-separated list of signals that the agent should handle. When any of these signals is caught the agent initiates an orderly shutdown, deregistering from the Cryostat server and potentially uploading the latest harvested JFR data. Default `INT,TERM`.
- [ ] `cryostat.agent.exit.deregistration.timeout-ms` [`long`]: the duration in milliseconds to wait for a response from the Cryostat server when attempting to deregister at shutdown time. Default `3000`.
- [ ] `cryostat.agent.harvester.period-ms` [`long`]: the length of time between JFR collections and pushes by the harvester. This also controls the maximum age of data stored in the buffer for the harvester's managed Flight Recording. Every `period-ms` the harvester will upload a JFR binary file to the `cryostat.agent.baseuri` archives. Default `-1`, which indicates no harvesting will be performed.
- [ ] `cryostat.agent.harvester.template` [`String`]: the name of the `.jfc` event template configuration to use for the harvester's managed Flight Recording. Default `default`, the continuous monitoring event template.
- [ ] `cryostat.agent.harvester.period-ms` [`long`]: the length of time between JFR collections and pushes by the harvester. This also controls the maximum age of data stored in the buffer for the harvester's managed Flight Recording. Every `period-ms` the harvester will upload a JFR binary file to the `cryostat.agent.baseuri` archives. Default `-1`, which indicates no scheduled harvest uploading will be performed.
- [ ] `cryostat.agent.harvester.template` [`String`]: the name of the `.jfc` event template configuration to use for the harvester's managed Flight Recording. Defaults to the empty string, so that no recording is started.
- [ ] `cryostat.agent.harvester.max-files` [`String`]: the maximum number of pushed files that Cryostat will keep over the network from the agent. This is supplied to the harvester's push requests which instructs Cryostat to prune, in a FIFO manner, the oldest JFR files within the attached JVM target's storage, while the number of stored recordings is greater than this configuration's maximum file limit. Default `2147483647` (`Integer.MAX_VALUE`).
- [ ] `cryostat.agent.harvester.upload.timeout-ms` [`long`]: the duration in milliseconds to wait for HTTP upload requests to the Cryostat server to complete and respond. Default `30000`.
- [ ] `cryostat.agent.harvester.exit.max-age-ms` [`long`]: the JFR `maxage` setting, specified in milliseconds, to apply to recording data uploaded to the Cryostat server when the JVM this Agent instance is attached to exits. This ensures that tail-end data is captured between the last periodic push and the application exit. Exit uploads only occur when the application receives `SIGINT`/`SIGTERM` from the operating system or container platform.
1 change: 1 addition & 0 deletions src/main/java/io/cryostat/agent/Agent.java
@@ -28,6 +28,7 @@
import javax.inject.Named;
import javax.inject.Singleton;

import io.cryostat.agent.harvest.Harvester;
import io.cryostat.agent.triggers.TriggerEvaluator;

import dagger.Component;
2 changes: 1 addition & 1 deletion src/main/java/io/cryostat/agent/ConfigModule.java
@@ -232,7 +232,7 @@ public static long provideCryostatAgentHarvesterPeriod(SmallRyeConfig config) {
@Singleton
@Named(CRYOSTAT_AGENT_HARVESTER_TEMPLATE)
public static String provideCryostatAgentHarvesterTemplate(SmallRyeConfig config) {
return config.getValue(CRYOSTAT_AGENT_HARVESTER_TEMPLATE, String.class);
return config.getOptionalValue(CRYOSTAT_AGENT_HARVESTER_TEMPLATE, String.class).orElse("");
}

@Provides
33 changes: 28 additions & 5 deletions src/main/java/io/cryostat/agent/CryostatClient.java
@@ -30,13 +30,16 @@
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.function.Function;

import io.cryostat.agent.FlightRecorderHelper.TemplatedRecording;
import io.cryostat.agent.WebServer.Credentials;
import io.cryostat.agent.harvest.Harvester;
import io.cryostat.agent.model.DiscoveryNode;
import io.cryostat.agent.model.PluginInfo;
import io.cryostat.agent.model.RegistrationInfo;
@@ -46,6 +49,8 @@
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.input.CountingInputStream;
import org.apache.http.HttpHeaders;
@@ -360,21 +365,39 @@ public CompletableFuture<Void> update(
}

public CompletableFuture<Void> upload(
Harvester.PushType pushType, String template, int maxFiles, Path recording)
Harvester.PushType pushType,
Optional<TemplatedRecording> opt,
int maxFiles,
Path recording)
throws IOException {
Instant start = Instant.now();
String timestamp = start.truncatedTo(ChronoUnit.SECONDS).toString().replaceAll("[-:]", "");
String fileName = String.format("%s_%s_%s.jfr", appName, template, timestamp);
String template =
opt.map(TemplatedRecording::getConfiguration)
.map(Configuration::getName)
.map(String::toLowerCase)
.map(String::trim)
.orElse("unknown");
String fileName =
String.format(
"%s_%s_%s.jfr",
appName
+ opt.map(TemplatedRecording::getRecording)
.map(Recording::getName)
.map(n -> "-" + n)
.orElse(""),
template,
timestamp);
Map<String, String> labels =
Map.of(
"jvmId",
jvmId,
"pushType",
pushType.name(),
"template.name",
template,
"template.type",
"TARGET",
"pushType",
pushType.name());
"TARGET");

HttpPost req = new HttpPost(baseUri.resolve("/api/beta/recordings/" + jvmId));
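
To make the naming change in `upload` easier to follow, here is a minimal standalone sketch of the same string derivation. It swaps `TemplatedRecording` for two plain `Optional<String>` values; the class `UploadFileNameSketch`, its parameters, and the sample names in `main` are hypothetical and are not part of the Agent.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Optional;

// Hypothetical stand-in for the file-name logic in CryostatClient.upload.
final class UploadFileNameSketch {

    static String fileName(
            String appName,
            Optional<String> recordingName,
            Optional<String> templateName,
            Instant start) {
        // ISO-8601 instant truncated to seconds, with '-' and ':' stripped.
        String timestamp =
                start.truncatedTo(ChronoUnit.SECONDS).toString().replaceAll("[-:]", "");
        // Fall back to "unknown" when no template is associated with the upload.
        String template =
                templateName.map(String::toLowerCase).map(String::trim).orElse("unknown");
        // Append "-<recording name>" to the app name only when a recording is present.
        String prefix = appName + recordingName.map(n -> "-" + n).orElse("");
        return String.format("%s_%s_%s.jfr", prefix, template, timestamp);
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2023-10-04T12:34:56.789Z");
        // prints "myapp-example-recording_profile_20231004T123456Z.jfr"
        System.out.println(
                fileName("myapp", Optional.of("example-recording"), Optional.of("Profile"), start));
        // prints "myapp_unknown_20231004T123456Z.jfr"
        System.out.println(fileName("myapp", Optional.empty(), Optional.empty(), start));
    }
}
```

In the commit itself both values come from a single `Optional<TemplatedRecording>`, so they are either both present or both absent. The sketch keeps them separate only to make each fallback visible.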

69 changes: 41 additions & 28 deletions src/main/java/io/cryostat/agent/FlightRecorderHelper.java
@@ -15,65 +15,78 @@
*/
package io.cryostat.agent;

import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.stream.Collectors;

import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
import jdk.jfr.Configuration;
import jdk.jfr.FlightRecorder;
import jdk.jfr.Recording;
import jdk.management.jfr.ConfigurationInfo;
import jdk.management.jfr.FlightRecorderMXBean;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FlightRecorderHelper {

private final FlightRecorderMXBean bean =
ManagementFactory.getPlatformMXBean(FlightRecorderMXBean.class);
private final Logger log = LoggerFactory.getLogger(getClass());

// FIXME this is repeated logic shared with Harvester startRecording
public void startRecording(String templateNameOrLabel) {
getTemplate(templateNameOrLabel)
.ifPresentOrElse(
c -> {
long recordingId = bean.newRecording();
bean.setPredefinedConfiguration(recordingId, c.getName());
String recoringName =
String.format("cryostat-smart-trigger-%d", recordingId);
bean.setRecordingOptions(
recordingId, Map.of("name", recoringName, "disk", "true"));
bean.startRecording(recordingId);
log.info(
"Started recording \"{}\" using template \"{}\"",
recoringName,
templateNameOrLabel);
},
() ->
log.error(
"Cannot start recording with template named or labelled {}",
templateNameOrLabel));
public Optional<TemplatedRecording> createRecording(String templateNameOrLabel) {
Optional<Configuration> opt = getTemplate(templateNameOrLabel);
if (opt.isEmpty()) {
log.error(
"Cannot start recording with template named or labelled {}",
templateNameOrLabel);
return Optional.empty();
}
Configuration configuration = opt.get();
Recording recording = new Recording(configuration.getSettings());
recording.setToDisk(true);
return Optional.of(new TemplatedRecording(configuration, recording));
}

public Optional<ConfigurationInfo> getTemplate(String nameOrLabel) {
return bean.getConfigurations().stream()
public Optional<Configuration> getTemplate(String nameOrLabel) {
Objects.requireNonNull(nameOrLabel);
return Configuration.getConfigurations().stream()
.filter(c -> c.getName().equals(nameOrLabel) || c.getLabel().equals(nameOrLabel))
.findFirst();
}

public boolean isValidTemplate(String nameOrLabel) {
Objects.requireNonNull(nameOrLabel);
return getTemplate(nameOrLabel).isPresent();
}

public List<RecordingInfo> getRecordings() {
if (!FlightRecorder.isAvailable()) {
log.error("FlightRecorder is unavailable");
return List.of();
}
return FlightRecorder.getFlightRecorder().getRecordings().stream()
.map(RecordingInfo::new)
.collect(Collectors.toList());
}

@SuppressFBWarnings(value = {"EI_EXPOSE_REP", "EI_EXPOSE_REP2"})
public static class TemplatedRecording {
private final Configuration configuration;
private final Recording recording;

public TemplatedRecording(Configuration configuration, Recording recording) {
this.configuration = configuration;
this.recording = recording;
}

public Configuration getConfiguration() {
return configuration;
}

public Recording getRecording() {
return recording;
}
}

@SuppressFBWarnings(value = "URF_UNREAD_FIELD")
public static class RecordingInfo {

69 changes: 4 additions & 65 deletions src/main/java/io/cryostat/agent/MainModule.java
@@ -34,11 +34,10 @@
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

import io.cryostat.agent.Harvester.RecordingSettings;
import io.cryostat.agent.harvest.HarvestModule;
import io.cryostat.agent.remote.RemoteContext;
import io.cryostat.agent.remote.RemoteModule;
import io.cryostat.agent.triggers.TriggerEvaluator;
import io.cryostat.agent.triggers.TriggerParser;
import io.cryostat.agent.triggers.TriggerModule;
import io.cryostat.core.net.JFRConnection;
import io.cryostat.core.net.JFRConnectionToolkit;
import io.cryostat.core.sys.Environment;
@@ -64,14 +63,15 @@
includes = {
ConfigModule.class,
RemoteModule.class,
HarvestModule.class,
TriggerModule.class,
})
public abstract class MainModule {

// one for outbound HTTP requests, one for incoming HTTP requests, and one as a general worker
private static final int NUM_WORKER_THREADS = 3;
private static final String JVM_ID = "JVM_ID";
private static final String TEMPLATES_PATH = "TEMPLATES_PATH";
private static final String TRIGGER_SCHEDULER = "TRIGGER_SCHEDULER";

@Provides
@Singleton
@@ -236,73 +236,12 @@ public static Registration provideRegistration(
registrationCheckMs);
}

@Provides
@Singleton
public static Harvester provideHarvester(
ScheduledExecutorService workerPool,
@Named(ConfigModule.CRYOSTAT_AGENT_HARVESTER_PERIOD_MS) long period,
@Named(ConfigModule.CRYOSTAT_AGENT_HARVESTER_TEMPLATE) String template,
@Named(ConfigModule.CRYOSTAT_AGENT_HARVESTER_MAX_FILES) int maxFiles,
@Named(ConfigModule.CRYOSTAT_AGENT_HARVESTER_EXIT_MAX_AGE_MS) long exitMaxAge,
@Named(ConfigModule.CRYOSTAT_AGENT_HARVESTER_EXIT_MAX_SIZE_B) long exitMaxSize,
@Named(ConfigModule.CRYOSTAT_AGENT_HARVESTER_MAX_AGE_MS) long maxAge,
@Named(ConfigModule.CRYOSTAT_AGENT_HARVESTER_MAX_SIZE_B) long maxSize,
CryostatClient client,
Registration registration) {
RecordingSettings exitSettings = new RecordingSettings();
exitSettings.maxAge = exitMaxAge;
exitSettings.maxSize = exitMaxSize;
RecordingSettings periodicSettings = new RecordingSettings();
periodicSettings.maxAge = maxAge > 0 ? maxAge : (long) (period * 1.5);
periodicSettings.maxSize = maxSize;
return new Harvester(
Executors.newSingleThreadScheduledExecutor(
r -> {
Thread t = new Thread(r);
t.setName("cryostat-agent-harvester");
t.setDaemon(true);
return t;
}),
workerPool,
period,
template,
maxFiles,
exitSettings,
periodicSettings,
client,
registration);
}

@Provides
@Singleton
@Named(TRIGGER_SCHEDULER)
public static ScheduledExecutorService provideTriggerScheduler() {
return Executors.newScheduledThreadPool(0);
}

@Provides
@Singleton
public static FlightRecorderHelper provideFlightRecorderHelper() {
return new FlightRecorderHelper();
}

@Provides
@Singleton
public static TriggerParser provideTriggerParser(FlightRecorderHelper helper) {
return new TriggerParser(helper);
}

@Provides
@Singleton
public static TriggerEvaluator provideTriggerEvaluatorFactory(
@Named(TRIGGER_SCHEDULER) ScheduledExecutorService scheduler,
TriggerParser parser,
FlightRecorderHelper helper,
@Named(ConfigModule.CRYOSTAT_AGENT_SMART_TRIGGER_EVALUATION_PERIOD_MS)
long evaluationPeriodMs) {
return new TriggerEvaluator(scheduler, parser, helper, evaluationPeriodMs);
}

@Provides
@Singleton
public static FileSystem provideFileSystem() {