Trying to extract a hierarchical trace for smart data aggregation

Jean-Marc, Robin and Lucas have this nice technique to aggregate data and depict the resulting information using, for example, treemaps. Lucas was visiting us this week and some of the reviewers of one of their articles asked for large non-synthetic traces. G5K has a natural hierarchy, so we decided to try to extract interesting information from it. Fortunately, Elodie, Pierre, and Bruno, who are in the office next door and administer G5K and Ciment, could help and quickly point us to the right places. :)

Useful links

Obtain useful information via REST

sudo gem install restfully
restfully -u alegrand -p 'myg5kpassword' --uri https://api.grid5000.fr/stable/grid5000

Here is a first attempt to obtain the machine load. The nice thing with interactive Ruby and object inspection is that you can simply hit tab to obtain the list of methods and explore the objects:

# I could browse like this:
root.sites[:rennes].clusters[:parapide].nodes[:'parapide-1'] # this allows iterating per site/cluster/node but provides mostly static information
# To access ganglia information, you have to look at the metrics field:
pp root.sites[:rennes].metrics[:cpu_idle].timeseries[:'paradent-5']
# It is even possible to filter a bit
pp root.sites[:rennes].metrics[:cpu_idle].timeseries.load(:query => {:resolution => 15, :from => Time.now.to_i-3600*1})[:'paradent-5']
# So for a particular node, the last measured value is:
root.sites[:rennes].metrics[:cpu_idle].timeseries[:'paradent-5'].properties['values'][0]

# So let's iterate over all machines
file = File.open("/tmp/rest.txt", 'w')
root.sites.each do |site| 
  site.metrics[:cpu_idle].timeseries.each do |node|
    file.write node.properties['hostname'] + ", " + node.properties['values'][0].to_s + "\n"
  end
end
file.close

Unfortunately, this provides rather poor information, as only nodes that are up and not deployed return such values:

tail /tmp/rest.txt
pastel-62.toulouse.grid5000.fr, 
pastel-83.toulouse.grid5000.fr, 
pastel-63.toulouse.grid5000.fr, 
pastel-55.toulouse.grid5000.fr, 
pastel-56.toulouse.grid5000.fr, 99.9
pastel-140.toulouse.grid5000.fr, 
pastel-76.toulouse.grid5000.fr, 100.0
pastel-57.toulouse.grid5000.fr, 100.0
pastel-21.toulouse.grid5000.fr, 99.875
pastel-77.toulouse.grid5000.fr,
grep ', [0-9]' /tmp/rest.txt | wc -l
cat /tmp/rest.txt | wc -l
622
1426

So instead, let's try to capture the state of the machines:

file = File.open("/tmp/state_rest.txt", 'w')
root.sites.each do |site| 
  site.status.each do |node|
    file.write site.properties['uid'] + "/" + node.properties['node_uid'] + ", " + node.properties['system_state'] + "\n"
  end
end
file.close
tail /tmp/state_rest.txt
toulouse/pastel-6, unknown
toulouse/pastel-93, free
toulouse/pastel-37, unknown
toulouse/pastel-65, unknown
toulouse/pastel-7, unknown
toulouse/pastel-94, unknown
toulouse/pastel-38, free
toulouse/pastel-114, free
toulouse/pastel-66, free
toulouse/pastel-8, besteffort
sed 's/.*, //' /tmp/state_rest.txt | sort | uniq
for i in `sed 's/.*, //' /tmp/state_rest.txt | sort | uniq` ; do echo "$i :" `grep $i /tmp/state_rest.txt | wc -l` ; done
besteffort : 150
busy : 232
free : 582
unknown : 205

So we could set up such an observation over a month, but that would take long. Instead, we decided to try to get all this information from the OAR database.

Useful information from the OAR MySQL database

Dumping the database

I first wanted to get a local copy of the database to browse it more comfortably.

ssh access.grenoble.grid5000.fr "mysqldump --lock-tables=false --quick -uoarreader -pread -h mysql.grenoble.grid5000.fr oar2" > oar2.sql
cat oar2.sql | mysql -u root -p$PASSWORD -h localhost oar2-grenoble

Obviously, when we work on getting such information for all sites, we should dump to CSV remotely to save space.
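
Something along the following lines should do. This is only an untested sketch: the list of sites is illustrative, and it assumes that the oarreader/read credentials and the mysql.<site>.grid5000.fr host name used above for Grenoble are valid on every site.

# Untested sketch: dump only the columns we need, as CSV, directly from each frontend,
# instead of copying full mysqldumps around.
for site in grenoble lyon nancy rennes toulouse ; do
  ssh access.$site.grid5000.fr "mysql -u oarreader -pread -h mysql.$site.grid5000.fr oar2 --batch -e 'SELECT job_id, start_time, stop_time FROM jobs'" | sed 's/\t/,/g' > /tmp/jobs-$site.csv
done

mysql --batch prints tab-separated values with a header line, so the sed is enough to obtain the same kind of CSV files as the ones produced below.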

Looking at the tables, here is what I found that may be of interest to us (a quick sanity-check query is sketched right after this list):

  • resources
    • resource_id
    • network_address
    • cpu
    • cpuset
  • jobs
    • job_id
    • start_time
    • stop_time
  • assigned_resources
    • moldable_job_id
    • resource_id
  • resource_logs
    • resource_id
    • date_start
    • attribute
    • value
  • job_types
    • job_id
    • type
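
To check that I got the relationships right, here is a small sanity-check query on the local dump. It mirrors the merges done later in R; in particular, it matches moldable_job_id directly against job_id, exactly like the R code below does:

# Sanity-check sketch: which machines did each job run on?
# Assumes the oar2-grenoble database loaded above and $PASSWORD set as before.
echo "SELECT j.job_id, j.start_time, j.stop_time, r.network_address
      FROM jobs j
      JOIN assigned_resources a ON a.moldable_job_id = j.job_id
      JOIN resources r ON r.resource_id = a.resource_id
      LIMIT 10;" | mysql -u root -p$PASSWORD -h localhost oar2-grenoble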

OK, so let's write a tiny script to extract the right information:

# FIELDS, TABLE, DATABASE and PASSWORD are set by the caller (see the org source below)
echo $FIELDS > /tmp/$TABLE.csv
echo "SELECT $FIELDS FROM $TABLE INTO OUTFILE \"/tmp/foo.csv\" FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY \"\\n\"" | mysql -u root -p$PASSWORD -h localhost $DATABASE
cat /tmp/foo.csv >> /tmp/$TABLE.csv

And let's call it for the different tables I am interested in (see the org source at the bottom of the page for how I reuse the previous block to do the job)…
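
For the record, here is a sketch of what that reuse amounts to once the block above is wrapped in a shell function. The field lists are the columns listed earlier; unlike the original block, each table gets its own intermediate file because INTO OUTFILE refuses to overwrite an existing file.

# Sketch: reuse the extraction block for each table of interest.
# DATABASE and PASSWORD are assumed to be already set as above.
extract_table() {
  TABLE=$1 ; FIELDS=$2
  echo $FIELDS > /tmp/$TABLE.csv
  echo "SELECT $FIELDS FROM $TABLE INTO OUTFILE \"/tmp/$TABLE-raw.csv\" FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY \"\\n\"" | mysql -u root -p$PASSWORD -h localhost $DATABASE
  cat /tmp/$TABLE-raw.csv >> /tmp/$TABLE.csv
}
extract_table resources          "resource_id,network_address,cpu,cpuset"
extract_table jobs               "job_id,start_time,stop_time"
extract_table assigned_resources "moldable_job_id,resource_id"
extract_table job_types          "job_id,type"
extract_table resource_logs      "resource_id,date_start,attribute,value"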

Having fun with R

OK, now I can read this information and exploit it in R.

jobs <- read.csv("/tmp/jobs.csv")
jobs <- jobs[jobs$start_time>3000,]             #cleanups
jobs <- jobs[jobs$stop_time>3000,]              #cleanups
jobs <- jobs[jobs$stop_time>jobs$start_time,]   #cleanups
assigned_resources <- read.csv("/tmp/assigned_resources.csv")
job_types <- read.csv("/tmp/job_types.csv")
resources <- read.csv("/tmp/resources.csv")
resource_logs <- read.csv("/tmp/resource_logs.csv")
names(resource_logs)=c("resource_id", "start_time", "attribute", "state")
resource_logs <- resource_logs[resource_logs$attribute=="state",]
resource_logs <- resource_logs[!(names(resource_logs) %in% c("attribute"))]

Let's select at random a week time interval.

start <- sample(jobs$start_time, 1)
end <- start+7*24*3600
job_resources <- merge(jobs[jobs$stop_time<=end & jobs$start_time>=start,],assigned_resources,by.x="job_id",by.y="moldable_job_id")
job_resources <- merge(job_resources,job_types,by.x="job_id",by.y="job_id")
job_resources <- merge(job_resources,resources,by.x="resource_id",by.y="resource_id")

# Mmmh, I need to get the resource states into a similar format so that both data frames can be combined
resource_states <- resource_logs[resource_logs$start_time <=end & resource_logs$start_time >= start, ]
resource_states <- resource_states[with(resource_states, order(resource_id,start_time)),]
block <- function(proc) {
  end_v <- c(tail(proc$start_time,length(proc$start_time)-1),end)
  cbind(proc,stop_time=end_v)
}
compute_durations <- function(df) {
  d <- data.frame()
  for(rank in unique(df$resource_id)) {
    d=rbind(d,block(df[df$resource_id==rank,]))
  }
  d
}
resource_states <- compute_durations(resource_states)
resource_states <- resource_states[resource_states$state != "Alive",]
resource_states <- merge(resource_states,resources,by.x="resource_id",by.y="resource_id")
names(resource_states)[names(resource_states)%in% c("state")] <- "type"
library(plyr)                                   # for rbind.fill
df <- rbind.fill(resource_states,job_resources)

And voilà, I can plot the Gantt chart now.

library(ggplot2)
ggplot(df)+
    theme_bw()+geom_rect(aes(xmin=start_time,xmax=stop_time, ymin=resource_id, ymax=resource_id+1,fill=factor(type)))
# + scale_y_continuous(limits=c(min(as.numeric(df_native$ResourceId)),max(as.numeric(df_native$ResourceId))+1))

[Figure: test.png — Gantt chart of the resources (jobs and non-Alive states) over the sampled week]

So now Lucas can convert such a thing into a simple Paje trace and use triva to see whether he can turn it into interesting visualizations.

Entered on [2013-07-10 mer. 17:13]