Dockerqueue (WIP) #211

Open
wants to merge 36 commits into
base: master
fcc50f4
dockerqueue
mllg May 12, 2017
59fe2d7
Merge remote-tracking branch 'origin/master' into dockerqueue
mllg May 12, 2017
17d4fda
Merge remote-tracking branch 'origin/master' into dockerqueue
mllg May 15, 2017
2e37e99
Update Dockerqueue with improvements of master (#145)
jakob-r Sep 25, 2017
9289473
user hypen seems more compliant
jakob-r Sep 25, 2017
cf1e47c
update clusterFunctionsDockerQueue
jakob-r Sep 25, 2017
24d59c2
Merge branch 'master' into dockerqueue
jakob-r Oct 12, 2017
a2241d5
Merge branch 'master' into dockerqueue
jakob-r Nov 22, 2017
fea8dcd
update docker URL api handling
jakob-r Nov 22, 2017
0ea4da8
Merge branch 'master' into dockerqueue
jakob-r Nov 24, 2017
a2a5763
hack curl over system2 to use certificates
jakob-r Nov 27, 2017
ec10291
Merge branch 'master' into dockerqueue
jakob-r Nov 27, 2017
ef13cbf
why is this not working?
jakob-r Nov 27, 2017
92b586b
simplify
jakob-r Nov 28, 2017
bdd79a0
dockerQueue list running and queued
jakob-r Nov 28, 2017
3bc56d2
fix
jakob-r Nov 28, 2017
eb43bc1
Merge branch 'master' into dockerqueue
jakob-r Nov 28, 2017
436ce03
fix for empty result
jakob-r Nov 28, 2017
02c3581
fix kill untestet
jakob-r Nov 28, 2017
b28b64e
roxy fix
jakob-r Nov 28, 2017
e1ad83a
fix roxygen
jakob-r Nov 29, 2017
d9523b1
Merge branch 'master' into dockerqueue
jakob-r Dec 8, 2017
319aef6
Merge branch 'master' into dockerqueue
jakob-r Dec 8, 2017
a34033c
only update if present
jakob-r Dec 8, 2017
952f204
Merge branch 'master' into dockerqueue
jakob-r Dec 18, 2017
54d8102
Merge branch 'master' into dockerqueue
jakob-r Dec 22, 2017
c3e6bad
corner cases
jakob-r Dec 22, 2017
d127373
Merge branch 'master' into dockerqueue
jakob-r Jan 25, 2018
28e600a
fix kill jobs
jakob-r Nov 13, 2018
58fda48
Merge branch 'master' into dockerqueue
jakob-r Nov 13, 2018
f92e0a6
roxygen + (test)
jakob-r Nov 13, 2018
ec29ed6
Merge branch 'master' into dockerqueue
jakob-r Jan 24, 2019
3d82319
nodetype resources
jakob-r Jan 24, 2019
f00a4bc
Merge branch 'master' into dockerqueue
jakob-r Feb 26, 2019
9ccb423
Merge
jakob-r Apr 28, 2020
30dcddc
Merge branch 'master' into dockerqueue
jakob-r May 8, 2020
2 changes: 2 additions & 0 deletions DESCRIPTION
@@ -50,6 +50,8 @@ Suggests:
doParallel,
doMPI,
e1071,
jsonlite,
RCurl,
foreach,
future,
future.batchtools,
1 change: 1 addition & 0 deletions NAMESPACE
@@ -81,6 +81,7 @@ export(loadResult)
export(lpt)
export(makeClusterFunctions)
export(makeClusterFunctionsDocker)
export(makeClusterFunctionsDockerQueue)
export(makeClusterFunctionsInteractive)
export(makeClusterFunctionsLSF)
export(makeClusterFunctionsMulticore)
133 changes: 133 additions & 0 deletions R/clusterFunctionsDockerQueue.R
@@ -0,0 +1,133 @@
#' @title ClusterFunctions for DockerQueue
#'
#' @description
#' Customized cluster functions for the isolated application running on the SFB876 cluster.
#'
#' @param image [\code{character(1)}]\cr
#' Name of the docker image to run.
#' @param docker.args [\code{character}]\cr
#' Additional arguments passed to \dQuote{docker} *before* the command (\dQuote{run}, \dQuote{ps} or \dQuote{kill}) to execute (e.g., the docker host).
#' @param image.args [\code{character}]\cr
#' Additional arguments passed to \dQuote{docker run} (e.g., to define mounts or environment variables).
#' @param docker.scheduler.url [\code{character}]\cr
#' URL of the docker scheduler API.
#' @param curl.args [\code{character}]\cr
#' Arguments passed to curl when accessing the DockerQueue API.
#' @inheritParams makeClusterFunctions
#' @return [\code{\link{ClusterFunctions}}].
#' @family ClusterFunctions
#' @export
makeClusterFunctionsDockerQueue = function(image, docker.args = character(0L), image.args = character(0L), scheduler.latency = 1, fs.latency = 65, docker.scheduler.url = "https://s876cnsm:2350/v1.30", curl.args = character(0L)) { # nocov start
assertString(image)
assertCharacter(docker.args, any.missing = FALSE)
assertCharacter(image.args, any.missing = FALSE)
# validate the URL before normalizing it (strip a single trailing slash)
assertCharacter(docker.scheduler.url, any.missing = FALSE, len = 1L)
docker.scheduler.url = stri_replace_all_regex(docker.scheduler.url, "\\/$", replacement = "")
assertCharacter(curl.args, any.missing = FALSE)
user = Sys.info()["user"]

submitJob = function(reg, jc) {
assertRegistry(reg, writeable = TRUE)
assertClass(jc, "JobCollection")
assertIntegerish(jc$resources$ncpus, lower = 1L, any.missing = FALSE, .var.name = "resources$ncpus")
assertIntegerish(jc$resources$memory, lower = 1L, any.missing = FALSE, .var.name = "resources$memory")
timeout = if (is.null(jc$resources$walltime)) character(0L) else sprintf("timeout %i", asInt(jc$resources$walltime, lower = 0L))

# https://sfb876.tu-dortmund.de/sfbwiki/Wiki.jsp?page=Cluster
jc$resources$nodetype = jc$resources$nodetype %??% "==~cpu"
assertChoice(jc$resources$nodetype, c("==phi", "==~cpu", "!=phi"), null.ok = TRUE)

batch.id = sprintf("%s-bt_%s", user, jc$job.hash)
cmd = c("docker", docker.args, "create", "--label queue", "--label rm", image.args,
sprintf("-e constraint:nodetype%s", jc$resources$nodetype),
sprintf("-e DEBUGME='%s'", Sys.getenv("DEBUGME")),
sprintf("-e OMP_NUM_THREADS=%i", jc$resources$threads %??% 1L),
sprintf("-e OPENBLAS_NUM_THREADS=%i", jc$resources$threads %??% 1L),
sprintf("-c %i", jc$resources$ncpus),
sprintf("-m %im", jc$resources$memory),
sprintf("--memory-swap %im", jc$resources$memory),
sprintf("--label batchtools=%s", jc$job.hash),
sprintf("--label user=%s", user),
sprintf("--name=%s", batch.id),
image, timeout, "Rscript", stri_join("-e", shQuote(sprintf("batchtools::doJobCollection('%s', '%s')", jc$uri, jc$log.file)), sep = " "))

res = runOSCommand(cmd[1L], cmd[-1L])

if (res$exit.code > 0L) {
return(cfHandleUnknownSubmitError(stri_flatten(cmd, " "), res$exit.code, res$output))
} else {
return(makeSubmitJobResult(status = 0L, batch.id = batch.id))
}
}

dfJobsRunning = function(reg) {
# submitJob attaches the user via "--label user=...", so filter on that label
args = c(docker.args, "ps", "--format='{{.ID}};{{.Names}}'", "--filter 'label=batchtools'", sprintf("--filter 'label=user=%s'", user))
res = runOSCommand("docker", args)
if (res$exit.code > 0L)
OSError("Listing of jobs failed", res)
res.jobs = stri_split_fixed(res$output, ";")
if (length(res.jobs) == 0) {
res.jobs = data.table(character(0), character(0))
} else {
res.jobs = do.call(rbind, res.jobs)
}
colnames(res.jobs) = c("docker.id", "batch.id")
res.jobs = as.data.table(res.jobs)
res.jobs$batch.id = stri_extract_last_regex(res.jobs$batch.id, "[0-9a-z_-]+")
return(res.jobs)
}

dfJobsQueued = function(reg) {
if (!requireNamespace("jsonlite", quietly = TRUE))
stop("Package 'jsonlite' is required")

# list scheduled but not running
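curl.res = runOSCommand("curl", unique(c("-s", curl.args, sprintf("%s/jobs/%s/json", docker.scheduler.url, user))))
# fail early if curl itself failed instead of parsing an empty or partial response
if (curl.res$exit.code > 0L)
OSError("Listing of queued jobs failed", curl.res)
tab = jsonlite::fromJSON(curl.res$output)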
if (length(tab) == 0L) {
tab = data.table(numeric(0), character(0))
} else {
tab = as.data.table(tab[, c("id", "containerName")])[get("containerName") %chin% reg$status$batch.id]
}
colnames(tab) = c("schedule.id", "batch.id")
return(tab)
}

killJob = function(reg, batch.id) {
assertRegistry(reg, writeable = TRUE)
assertString(batch.id)
this.batch.id = batch.id
df.queued = dfJobsQueued(reg)
if (this.batch.id %in% df.queued$batch.id) {
id = df.queued[batch.id == this.batch.id]$schedule.id
curl.res = runOSCommand("curl", c("-XDELETE", "-k", "-s", curl.args, sprintf("%s/jobs/%i/delete", docker.scheduler.url, id)))
success = stri_startswith_fixed(curl.res$output, "Successfully deleted")
} else {
df.running = dfJobsRunning(reg)
if (this.batch.id %in% df.running$batch.id) {
docker.id = df.running[batch.id == this.batch.id]$docker.id
success = cfKillJob(reg, "docker", c(docker.args, "kill", docker.id))
} else {
success = FALSE
}
}
return(success)
}

listJobsRunning = function(reg) {
assertRegistry(reg, writeable = FALSE)
tab = dfJobsRunning(reg)
tab$batch.id
}

listJobsQueued = function(reg) {
assertRegistry(reg, writeable = FALSE)

tab = dfJobsQueued(reg)
tab$batch.id
}


makeClusterFunctions(name = "DockerQueue", submitJob = submitJob, killJob = killJob, listJobsRunning = listJobsRunning,
listJobsQueued = listJobsQueued, store.job.collection = TRUE, scheduler.latency = scheduler.latency, fs.latency = fs.latency)
} # nocov end
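
The parsing step in `dfJobsRunning` above can be illustrated in isolation. The following is a minimal base-R sketch, not code from this PR: `parseDockerPs` is a hypothetical helper, and the sample lines are made up to match the `'{{.ID}};{{.Names}}'` format string. It assumes each output line has the form `<id>;<name>` and shows one way to treat the empty-output corner case.

```r
# Hypothetical helper (not part of the PR): turn lines of the form
# "<docker id>;<container name>" into a two-column data frame.
parseDockerPs = function(lines) {
  # drop empty lines, e.g. when no container matches the filters
  lines = lines[nzchar(lines)]
  if (length(lines) == 0L) {
    return(data.frame(docker.id = character(0L), batch.id = character(0L),
      stringsAsFactors = FALSE))
  }
  parts = strsplit(lines, ";", fixed = TRUE)
  data.frame(
    docker.id = vapply(parts, function(p) p[1L], character(1L)),
    batch.id = vapply(parts, function(p) p[2L], character(1L)),
    stringsAsFactors = FALSE
  )
}

# Made-up sample output, for illustration only
out = c("3f2a9c1d0b7e;alice-bt_0123abcd", "9e8d7c6b5a4f;alice-bt_4567ef01")
tab = parseDockerPs(out)
print(tab)

# The empty case yields a zero-row table with the same column names
print(parseDockerPs(character(0L)))
```

Returning a zero-row table with fixed column names keeps callers such as `listJobsRunning` free of special cases, since `tab$batch.id` is then simply `character(0)`.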
14 changes: 14 additions & 0 deletions man/makeClusterFunctions.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

14 changes: 14 additions & 0 deletions man/makeClusterFunctionsDocker.Rd
57 changes: 57 additions & 0 deletions man/makeClusterFunctionsDockerQueue.Rd
14 changes: 14 additions & 0 deletions man/makeClusterFunctionsInteractive.Rd
14 changes: 14 additions & 0 deletions man/makeClusterFunctionsLSF.Rd
14 changes: 14 additions & 0 deletions man/makeClusterFunctionsMulticore.Rd
14 changes: 14 additions & 0 deletions man/makeClusterFunctionsOpenLava.Rd
14 changes: 14 additions & 0 deletions man/makeClusterFunctionsSGE.Rd