Specifications for basic data structures

Wei Tang edited this page Feb 14, 2014 · 8 revisions
  1. Job
  2. Task
  3. Workunit
  4. Client

Job

| Attribute | Type | Json string | Values | Notes |
|---|---|---|---|---|
| Id | `string` | id | e.g. c779a7f7-953d-4079-8388-591ee2065bad | uuid |
| Jid | `string` | jid | e.g. 12367 | incremental job id per awe-server domain |
| Info | `*Info` | info | | job info |
| Tasks | `[]*Task` | tasks | | an array of task struct |
| State | `string` | state | init: initial state at job submission<br>queued: parsed and waiting in queue<br>in-progress: at least one workunit is out<br>completed: all tasks done<br>suspend: paused for failure or manually<br>deleted: manually removed | |
| Registered | `bool` | registered | true: in the queue (in memory)<br>false: in mongodb only | |
| RemainTasks | `int` | remaintasks | 0 -- len(Tasks) | number of tasks not done |
| UpdateTime | `time.Time` | updatetime | e.g. "2014-02-09T15:43:40.574Z" | timestamp for last state update |
| Notes | `string` | notes | e.g. "job suspended for xxx reason" | notes for last state update |
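
As a rough illustration of how the rows above map onto a Go struct, the sketch below declares `Job` with one field per row and the JSON key from the "Json string" column. `Info` and `Task` are empty placeholders here, and `NewJob` is a hypothetical helper that is not part of this specification; it only shows the `init` state and a `RemainTasks` value of `len(Tasks)` at submission.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Info and Task are placeholders; their fields are described elsewhere on this page.
type Info struct{}

type Task struct {
	Id string `json:"id"`
}

// Job has one field per row of the Job table, tagged with the listed JSON key.
type Job struct {
	Id          string    `json:"id"`
	Jid         string    `json:"jid"`
	Info        *Info     `json:"info"`
	Tasks       []*Task   `json:"tasks"`
	State       string    `json:"state"`
	Registered  bool      `json:"registered"`
	RemainTasks int       `json:"remaintasks"`
	UpdateTime  time.Time `json:"updatetime"`
	Notes       string    `json:"notes"`
}

// Job states as listed in the State row.
const (
	JobInit       = "init"
	JobQueued     = "queued"
	JobInProgress = "in-progress"
	JobCompleted  = "completed"
	JobSuspend    = "suspend"
	JobDeleted    = "deleted"
)

// NewJob is a hypothetical constructor (an assumption for this sketch).
// It starts the job in "init" with every task still remaining.
// Registered is left false; when the server sets it is not covered here.
func NewJob(id, jid string, tasks []*Task) *Job {
	return &Job{
		Id:          id,
		Jid:         jid,
		Info:        &Info{},
		Tasks:       tasks,
		State:       JobInit,
		RemainTasks: len(tasks),
		UpdateTime:  time.Now().UTC(),
	}
}

func main() {
	id := "c779a7f7-953d-4079-8388-591ee2065bad"
	job := NewJob(id, "12367", []*Task{{Id: id + "_0"}, {Id: id + "_1"}})
	out, err := json.MarshalIndent(job, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Running it prints the job as JSON using the keys from the table, which can be handy for checking a client-side decoder against the field names.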

Task

| Attribute | Type | Json string | Values | Notes |
|---|---|---|---|---|
| Id | `string` | id | e.g. c779a7f7-953d-4079-8388-591ee2065bad_1 | jobid_stage |
| Info | `*Info` | info | | job info (inherited) |
| Inputs | `IOmap` | inputs | | list of input files |
| Outputs | `IOmap` | outputs | | list of output files |
| Predata | `IOmap` | predata | | list of prerequisite data (e.g. reference dbs) |
| Cmd | `*Command` | cmd | | cmd definition (cmd name, args, etc.) |
| Partition | `*PartInfo` | partinfo | | how to split files (which files to split/merge, what index to use) |
| MaxWorkSize | `int` | maxworksize | e.g. 50 | max input per workunit (in MB) |
| TotalWork | `int` | totalwork | e.g. 16 | number of workunits to split into |
| RemainWork | `int` | remainwork | e.g. 7 | number of workunits not yet done |
| State | `string` | state | init: initial state<br>queued: ready and waiting in queue<br>in-progress: at least one workunit is out<br>pending: not parsed yet because a parent task is not done<br>completed: all workunits done<br>suspend: paused for failure or manually | |
| CreatedDate | `time.Time` | createdate | e.g. "2014-02-09T15:43:40.574Z" | timestamp for “queued” |
| StartedDate | `time.Time` | starteddate | e.g. "2014-02-09T15:45:40.574Z" | timestamp for “in-progress” |
| CompletedDate | `time.Time` | completeddate | e.g. "2014-02-09T15:46:40.574Z" | timestamp for “completed” |
| ComputeTime | `int` | computetime | e.g. 3600 | aggregated workunit compute walltime (in seconds) |
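
The relationship between `RemainWork`, `State`, the timestamps, and the aggregated `ComputeTime` can be sketched as below. This is only an illustration under assumptions: the `Task` struct is trimmed to the fields involved, and the methods `WorkunitCheckedOut` and `WorkunitDone` are hypothetical names, not taken from the server code.

```go
package main

import (
	"fmt"
	"time"
)

// Task is trimmed to the bookkeeping fields from the Task table.
type Task struct {
	Id            string
	TotalWork     int
	RemainWork    int
	State         string
	StartedDate   time.Time
	CompletedDate time.Time
	ComputeTime   int // aggregated workunit walltime, in seconds
}

const (
	TaskQueued     = "queued"
	TaskInProgress = "in-progress"
	TaskCompleted  = "completed"
)

// WorkunitCheckedOut (hypothetical) moves a queued task to "in-progress"
// the first time one of its workunits goes out, and stamps StartedDate.
func (t *Task) WorkunitCheckedOut() {
	if t.State == TaskQueued {
		t.State = TaskInProgress
		t.StartedDate = time.Now().UTC()
	}
}

// WorkunitDone (hypothetical) adds one workunit's walltime to ComputeTime,
// decrements RemainWork, and marks the task "completed" when none remain.
func (t *Task) WorkunitDone(walltimeSec int) {
	if t.RemainWork == 0 {
		return // nothing left to account for
	}
	t.ComputeTime += walltimeSec
	t.RemainWork--
	if t.RemainWork == 0 {
		t.State = TaskCompleted
		t.CompletedDate = time.Now().UTC()
	}
}

func main() {
	task := &Task{Id: "c779a7f7-953d-4079-8388-591ee2065bad_1",
		TotalWork: 2, RemainWork: 2, State: TaskQueued}
	task.WorkunitCheckedOut()
	task.WorkunitDone(600)
	task.WorkunitDone(400)
	fmt.Println(task.State, task.RemainWork, task.ComputeTime)
}
```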

Workunit

| Attribute | Type | Json string | Values | Notes |
|---|---|---|---|---|
| Id | `string` | id | e.g. c779a7f7-953d-4079-8388-591ee2065bad_1_5 | jobid_stage_rank |
| Info | `*Info` | info | | job info (inherited) |
| Inputs | `IOmap` | inputs | | inherited from task |
| Outputs | `IOmap` | outputs | | inherited from task |
| Predata | `IOmap` | predata | | inherited from task |
| Cmd | `*Command` | cmd | | inherited from task |
| Rank | `int` | rank | 0 or 5 | 0 means it is the only workunit; an integer >0 means it is one of multiple splits |
| Partition | `*PartInfo` | partinfo | | inherited from task |
| State | `string` | state | queued: waiting in queue<br>checkout: being checked out by some client<br>completed: successfully done<br>suspend: failed and thus suspended | |
| Failed | `int` | | e.g. 2 | number of times failed (can result in “suspend” if larger than a threshold, e.g. 5) |
| CheckoutTime | `time.Time` | checkout_time | e.g. "2014-02-09T15:43:40.574Z" | timestamp of checkout |
| Client | `string` | client | | id of the client which is processing this workunit |
| ComputeTime | `int` | computetime | e.g. 600 | compute walltime (in seconds) |
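
The `jobid_stage_rank` naming, the meaning of `Rank`, and the failure threshold can be illustrated with a short sketch. `SplitTask` and `RecordFailure` are hypothetical helpers written for this example, and putting a failed workunit back to `queued` while it is still under the threshold is an assumption; the table only says that exceeding the threshold can lead to `suspend`.

```go
package main

import "fmt"

// Workunit is trimmed to the fields used in this sketch.
type Workunit struct {
	Id     string
	Rank   int
	State  string
	Failed int
}

// SplitTask (hypothetical) builds the workunits for a task id of the form
// jobid_stage. A task with a single workunit gets rank 0; a task split into
// several workunits gets ranks 1..totalWork. Values below 1 are treated as 1.
func SplitTask(taskId string, totalWork int) []*Workunit {
	if totalWork <= 1 {
		return []*Workunit{{Id: fmt.Sprintf("%s_0", taskId), Rank: 0, State: "queued"}}
	}
	units := make([]*Workunit, 0, totalWork)
	for rank := 1; rank <= totalWork; rank++ {
		units = append(units, &Workunit{
			Id:    fmt.Sprintf("%s_%d", taskId, rank),
			Rank:  rank,
			State: "queued",
		})
	}
	return units
}

// RecordFailure (hypothetical) counts a failure and suspends the workunit
// once Failed is larger than threshold. Requeueing otherwise is an assumption.
func (w *Workunit) RecordFailure(threshold int) {
	w.Failed++
	if w.Failed > threshold {
		w.State = "suspend"
	} else {
		w.State = "queued"
	}
}

func main() {
	taskId := "c779a7f7-953d-4079-8388-591ee2065bad_1"
	for _, w := range SplitTask(taskId, 5) {
		fmt.Println(w.Id, w.Rank, w.State)
	}
}
```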

Client

| Attribute | Type | Json string | Values | Notes |
|---|---|---|---|---|
| Id | `string` | id | e.g. c779a7f7-953d-4079-8388-591ee2065bad | uuid |
| Name | `string` | name | | human readable name |
| Group | `string` | group | | logical “queue” name, matching the job group name |
| User | `string` | user | | owner |
| Domain | `string` | domain | e.g. “magellan” | physical region of computing resources |
| InstanceId | `string` | instance_id | e.g. 000dfa | openstack instance id (optional) |
| InstanceType | `string` | instance_type | idp100 | openstack instance type/flavor (optional) |
| RemainTasks | `int` | remaintasks | 0 -- len(Tasks) | number of tasks not done |
| Host | `string` | host | | host ip address |
| CPUs | `int` | cores | e.g. 8 | number of cores |
| Apps | `[]string` | apps | ["fgs","bowtie"] | array of supported commands, "*" for any app |
| RegTime | `time.Time` | regtime | | timestamp of when the client first registered |
| Serve_time | `Serve_time` | serve_time | e.g. 50h16m | time period since first registered |
| Idle_time | `int` | idle_time | 3600 | time (in seconds) since last time idling |
| Status | `string` | Status | active-idle: up and running but idle<br>active-busy: processing some workunit(s)<br>suspend: not able to get workunits (for failure or manually paused) | |
| Total_checkout | `int` | total_checkout | e.g. 50 | number of workunits checked out by this client |
| Total_completed | `int` | total_completed | e.g. 45 | number of workunits completed at this client |
| Total_failed | `int` | total_failed | e.g. 5 | number of workunits failed at this client |
| Current_work | `map[string]bool` | current_work | `{<workunit_id>}` | id list of workunits being processed by this client |
| Skip_work | `[]string` | skip_work | `[<workunit_id>]` | id list of workunits that failed at this client (skipped at the next checkout) |
| Proxy | `bool` | proxy | | true means this client is a proxy |
| SubClients | `int` | subclients | e.g. 5 | number of subclients registered (valid only when proxy=true) |
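
How `Status`, `Apps`, `Skip_work`, and `Current_work` might interact at checkout time is sketched below. The methods `SupportsApp`, `CanCheckout`, and `Checkout` are hypothetical and only restate the rules in the table: a suspended client gets no workunits, `"*"` in `Apps` matches any command, and workunits listed in `Skip_work` are not handed to the same client again.

```go
package main

import "fmt"

// Client is trimmed to the fields used in this sketch; field names follow the table.
type Client struct {
	Id             string
	Status         string
	Apps           []string
	Total_checkout int
	Current_work   map[string]bool
	Skip_work      []string
}

// SupportsApp reports whether the client lists app, or "*" for any app.
func (c *Client) SupportsApp(app string) bool {
	for _, a := range c.Apps {
		if a == "*" || a == app {
			return true
		}
	}
	return false
}

// CanCheckout (hypothetical) applies the three rules from the table.
func (c *Client) CanCheckout(workunitId, app string) bool {
	if c.Status == "suspend" {
		return false // suspended clients cannot get workunits
	}
	for _, skipped := range c.Skip_work {
		if skipped == workunitId {
			return false // this workunit already failed at this client
		}
	}
	return c.SupportsApp(app)
}

// Checkout (hypothetical) records the workunit as in progress at this client.
func (c *Client) Checkout(workunitId string) {
	if c.Current_work == nil {
		c.Current_work = make(map[string]bool)
	}
	c.Current_work[workunitId] = true
	c.Total_checkout++
	c.Status = "active-busy"
}

func main() {
	c := &Client{
		Id:        "c779a7f7-953d-4079-8388-591ee2065bad",
		Status:    "active-idle",
		Apps:      []string{"fgs", "bowtie"},
		Skip_work: []string{"job_1_2"},
	}
	for _, id := range []string{"job_1_1", "job_1_2"} {
		if c.CanCheckout(id, "bowtie") {
			c.Checkout(id)
		}
	}
	fmt.Println(c.Status, c.Total_checkout, c.Current_work)
}
```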