Philip (flip) Kromer edited this page May 16, 2012 · 9 revisions

Guide to the Wukong Toolkit

  • Wukong: data flows and job flows

    • concepts -- fundamental concepts and data models
    • wukong-workflow -- Workflow definition
    • wukong-dataflow -- Dataflow definition
    • wukong-mapred -- Write mapreduce jobs using wukong transforms; debug them on your laptop, run them in Hadoop
    • wukong-widgets -- Common data transforms
    • wukong-fs -- Read and manipulate files on local, HDFS (JRuby or Thrift), s3n, or s3hdfs filesystems using the standard Ruby File interface.
    • wukong-flume -- Flume flow configuration and wukong-streamer flume decorators
    • references -- Further reading
    • PDF with dataflow and workflow graphs; original .graffle file
  • Hanuman: Elegant small graph assembly

    • models -- graph, stage, event
    • binding -- bind graph to resources, execute it
    • graphdot -- express graph in .dot (graphviz) format, and from there as PNG or SVG.
    • canvas -- view and edit a hanuman graph from your browser
  • Swineherd: Common interface on ugly tools

    • commander -- Turn a readable hash into a safe command line (parameter conversion, escaping). Does this go in configliere?
    • launcher -- Execute command, capture stdout/stderr
    • reporter -- Summarize execution with a vayacondios-able hash
    • mixins -- Java, gnu-style, has_input/output
    • template -- template scripts with configliere variables
    • apps -- Hadoop, pig, flume; ?? cp, mv, rm, zip, tar, bz2, gz, ssh, scp ??
  • Gorillib: minimal-dependency core language modification and low-level toolkit. Fine-grained control over what is loaded.

    • core -- indispensable improvements to the core language, metaprogramming support, and broadly useful convenience methods.
    • path_helpers -- path_to, autoload_paths (like $PATH)
  • go-model structures

  • Configliere: Manage settings

    • commandline -- read settings from commandline params
    • command -- git-style executables; provide multiple commands (including scoped options)
    • layer -- Project settings through a late-resolved stack of config objects. Intended to solve the 'patch the config for test/dev/prod', the 'only some config variables apply for this command, some apply for this other one', and the 'organization / cloud / cluster / facet / server' problems
  • Vayacondios: data goes in, the right thing happens. Universal routing of facts, configuration and metrics throughout the organization.

    • server -- receive events via HTTP, websockets, flume, or UDP (statsd), and make simple constrained queries.
    • configliere -- transparently syndicate configuration through configliere
    • notifications -- an ActiveSupport::Notifications observer
    • triggers -- plug in any wukong transformer to manipulate events upon receipt
    • workers -- collection of small daemons that pull in truth, scheduled or on-demand.
    • backend -- mongo; cube; activity stream
  • Ironfan: System Diagram come to life

    • models -- cluster, facet, server, machine, component, aspect, announcement. Use configliere/layer and gorillib/dsl_model.
    • knife plugins --
    • silverware -- discovery and aspect slicing
    • pantry -- cookbooks
    • ci --
    • chimpstation -- set up a workstation
    • homebase -- organizes all of it
  • Goliath + SenorArmando: