Class TopologyBuilder


  • public class TopologyBuilder
    extends Object
    TopologyBuilder exposes the Java API for specifying a topology for Heron to execute. Topologies are ultimately Protocol Buffers structures, but building those structures directly is verbose, so TopologyBuilder greatly simplifies creating topologies. The template for creating and submitting a topology looks something like this:

     TopologyBuilder builder = new TopologyBuilder();
    
     builder.setSpout("1", new TestWordSpout(true), 5);
     builder.setSpout("2", new TestWordSpout(true), 3);
     builder.setBolt("3", new TestWordCounter(), 3)
              .fieldsGrouping("1", new Fields("word"))
              .fieldsGrouping("2", new Fields("word"));
     builder.setBolt("4", new TestGlobalCount())
              .globalGrouping("1");
    
     Config conf = new Config();
     conf.put(Config.TOPOLOGY_WORKERS, 4);
    
     HeronSubmitter.submitTopology("mytopology", conf, builder.createTopology());
     

    Running the same topology in the simulator (in process), configured to log every emitted tuple, looks like the following. Note that it lets the topology run for 10 seconds before shutting down the local cluster.

     TopologyBuilder builder = new TopologyBuilder();
    
     builder.setSpout("1", new TestWordSpout(true), 5);
     builder.setSpout("2", new TestWordSpout(true), 3);
     builder.setBolt("3", new TestWordCounter(), 3)
              .fieldsGrouping("1", new Fields("word"))
              .fieldsGrouping("2", new Fields("word"));
     builder.setBolt("4", new TestGlobalCount())
              .globalGrouping("1");
    
     Config conf = new Config();
     conf.put(Config.TOPOLOGY_WORKERS, 4);
     conf.put(Config.TOPOLOGY_DEBUG, true);
    
     LocalCluster cluster = new LocalCluster();
     cluster.submitTopology("mytopology", conf, builder.createTopology());
     Utils.sleep(10000);
     cluster.shutdown();
     

    The pattern for TopologyBuilder is to map component ids to components using the setSpout and setBolt methods. setBolt returns a BoltDeclarer, which is then used to declare that bolt's inputs; setSpout returns a SpoutDeclarer.
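The following is a minimal, self-contained Java sketch of that pattern. ToyTopologyBuilder and ToyDeclarer are hypothetical classes, not part of the Heron API, and rejecting duplicate ids is an assumption made for illustration. Component ids map to components (reduced here to a parallelism hint), and setBolt returns a declarer whose chained calls record the bolt's inputs.

```java
import java.util.*;

// Hypothetical toy model of the "map ids to components, then declare inputs"
// pattern. ToyDeclarer and ToyTopologyBuilder are NOT Heron classes.
class ToyDeclarer {
  // Ids of the components this bolt consumes from, in declaration order.
  final List<String> inputs = new ArrayList<>();

  // Records one input and returns this declarer so calls can be chained,
  // similar in spirit to the fieldsGrouping(...) chain in the template.
  ToyDeclarer from(String sourceId) {
    inputs.add(sourceId);
    return this;
  }
}

class ToyTopologyBuilder {
  // Component id -> parallelism hint, in registration order.
  final Map<String, Integer> components = new LinkedHashMap<>();
  // Bolt id -> declarer holding that bolt's declared inputs.
  final Map<String, ToyDeclarer> bolts = new LinkedHashMap<>();

  void setSpout(String id, int parallelismHint) {
    register(id, parallelismHint);
  }

  ToyDeclarer setBolt(String id, int parallelismHint) {
    register(id, parallelismHint);
    ToyDeclarer declarer = new ToyDeclarer();
    bolts.put(id, declarer);
    return declarer;
  }

  // Assumption: duplicate ids are rejected so that other components can
  // reference every id unambiguously.
  private void register(String id, int parallelismHint) {
    if (components.containsKey(id)) {
      throw new IllegalArgumentException("duplicate component id: " + id);
    }
    components.put(id, parallelismHint);
  }

  // Returns declared input ids that do not match any registered component.
  List<String> unknownInputs() {
    List<String> unknown = new ArrayList<>();
    for (ToyDeclarer declarer : bolts.values()) {
      for (String source : declarer.inputs) {
        if (!components.containsKey(source)) {
          unknown.add(source);
        }
      }
    }
    return unknown;
  }
}
```

For example, registering spouts "1" and "2" and then calling setBolt("3", 3).from("1").from("2") records both spouts as inputs of bolt "3", while an input naming an unregistered id is reported by unknownInputs().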

    • Constructor Detail

      • TopologyBuilder

        public TopologyBuilder()
    • Method Detail

      • setBolt

        public BoltDeclarer setBolt(String id,
                                    IRichBolt bolt)
        Define a new bolt in this topology with a parallelism of one thread.
        Parameters:
        id - the id of this component. This id is referenced by other components that want to consume this bolt's outputs.
        bolt - the bolt
        Returns:
        use the returned object to declare the inputs to this component
      • setBolt

        public BoltDeclarer setBolt(String id,
                                    IRichBolt bolt,
                                    Number parallelismHint)
        Define a new bolt in this topology with the specified amount of parallelism.
        Parameters:
        id - the id of this component. This id is referenced by other components that want to consume this bolt's outputs.
        bolt - the bolt
        parallelismHint - the number of tasks that should be assigned to execute this bolt. Each task will run on a thread in a process somewhere in the cluster.
        Returns:
        use the returned object to declare the inputs to this component
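To make the parallelismHint description concrete, here is a simplified sketch of how a hint of n could translate into n tasks and how a fields grouping (as used with fieldsGrouping in the class-level template) might pick one of those tasks. TaskRouting is hypothetical, and the hash-modulo routing is an assumed scheme for illustration, not necessarily what Heron does internally.

```java
import java.util.*;

// Hypothetical sketch: parallelism hint -> tasks, and fields-grouping routing.
// The hash-modulo scheme is an illustrative assumption, not Heron's code.
class TaskRouting {
  // A bolt with parallelismHint = n gets n tasks, numbered 0..n-1 here.
  static List<Integer> tasksFor(int parallelismHint) {
    List<Integer> tasks = new ArrayList<>();
    for (int i = 0; i < parallelismHint; i++) {
      tasks.add(i);
    }
    return tasks;
  }

  // Fields grouping: tuples with equal values in the grouping fields always
  // go to the same task. Math.floorMod keeps the index non-negative even
  // when hashCode() is negative.
  static int fieldsGroupingTarget(List<Object> groupingValues, int numTasks) {
    return Math.floorMod(groupingValues.hashCode(), numTasks);
  }
}
```

Because the target depends only on the grouping values, every tuple whose "word" field is the same lands on the same counter task, which is what lets a word-count bolt with several tasks keep correct per-word totals.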
      • setBolt

        public BoltDeclarer setBolt(String id,
                                    IBasicBolt bolt)
        Define a new bolt in this topology. This defines a basic bolt, which is a simpler-to-use but more restricted kind of bolt. Basic bolts are intended for non-aggregation processing and automate the anchoring/acking process to achieve proper reliability in the topology.
        Parameters:
        id - the id of this component. This id is referenced by other components that want to consume this bolt's outputs.
        bolt - the basic bolt
        Returns:
        use the returned object to declare the inputs to this component
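As a rough illustration of what automating the anchoring/acking process means, the sketch below wraps user logic so that every emitted tuple is anchored to its input, and the input is acked on success or failed on error. ToyTuple, BasicBoltSketch and runBasic are hypothetical stand-ins, not Heron's IBasicBolt or collector APIs; in particular, treating any RuntimeException as a failure is an assumption of this sketch.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical tuple: a value plus the input tuple it was emitted from.
class ToyTuple {
  final String value;
  final ToyTuple anchor; // null for tuples that have no parent

  ToyTuple(String value, ToyTuple anchor) {
    this.value = value;
    this.anchor = anchor;
  }
}

// Hypothetical wrapper that does the anchoring/acking bookkeeping so the
// user logic only has to map an input value to output values.
class BasicBoltSketch {
  final List<ToyTuple> emitted = new ArrayList<>();
  final List<ToyTuple> acked = new ArrayList<>();
  final List<ToyTuple> failed = new ArrayList<>();

  void runBasic(ToyTuple input, Function<String, List<String>> logic) {
    try {
      for (String out : logic.apply(input.value)) {
        emitted.add(new ToyTuple(out, input)); // automatic anchoring
      }
      acked.add(input); // automatic ack once the logic returns
    } catch (RuntimeException e) {
      failed.add(input); // assumption: any runtime error fails the input
    }
  }
}
```

The design point is that the bolt author never calls ack or fail directly, which removes a common source of reliability bugs but also rules out patterns such as aggregating many inputs before acking them.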
      • setBolt

        public BoltDeclarer setBolt(String id,
                                    IBasicBolt bolt,
                                    Number parallelismHint)
        Define a new bolt in this topology. This defines a basic bolt, which is a simpler-to-use but more restricted kind of bolt. Basic bolts are intended for non-aggregation processing and automate the anchoring/acking process to achieve proper reliability in the topology.
        Parameters:
        id - the id of this component. This id is referenced by other components that want to consume this bolt's outputs.
        bolt - the basic bolt
        parallelismHint - the number of tasks that should be assigned to execute this bolt. Each task will run on a thread in a process somewhere in the cluster.
        Returns:
        use the returned object to declare the inputs to this component
      • setBolt

        public BoltDeclarer setBolt(String id,
                                    IWindowedBolt bolt)
                             throws IllegalArgumentException
        Define a new bolt in this topology. This defines a windowed bolt, intended for windowing operations. The IWindowedBolt.execute(TupleWindow) method is triggered for each window interval with the list of current events in the window.
        Parameters:
        id - the id of this component. This id is referenced by other components that want to consume this bolt's outputs.
        bolt - the windowed bolt
        Returns:
        use the returned object to declare the inputs to this component
        Throws:
        IllegalArgumentException - if the parallelism hint is not positive
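A windowed bolt receives the events of each window as a list. The hypothetical TumblingCountWindow below sketches the simplest case, a count-based tumbling window, where every windowLength events trigger one callback that plays the role of execute(TupleWindow). It is an illustration only and does not model the window configurations Heron itself supports.

```java
import java.util.*;
import java.util.function.Consumer;

// Hypothetical count-based tumbling window: buffer events and, once
// windowLength events have arrived, hand the whole window to the callback.
class TumblingCountWindow<T> {
  private final int windowLength;
  private final Consumer<List<T>> onWindow;
  private final List<T> buffer = new ArrayList<>();

  TumblingCountWindow(int windowLength, Consumer<List<T>> onWindow) {
    if (windowLength <= 0) {
      throw new IllegalArgumentException("windowLength must be positive");
    }
    this.windowLength = windowLength;
    this.onWindow = onWindow;
  }

  void add(T event) {
    buffer.add(event);
    if (buffer.size() == windowLength) {
      // Pass a copy so the callback never sees later changes to the buffer.
      onWindow.accept(new ArrayList<>(buffer));
      buffer.clear(); // tumbling: windows do not overlap
    }
  }
}
```

Feeding the numbers 1 to 7 into a window of length 3 fires the callback twice, with [1, 2, 3] and [4, 5, 6]; the 7th event waits in the buffer for the next window.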
      • setBolt

        public BoltDeclarer setBolt(String id,
                                    IWindowedBolt bolt,
                                    Number parallelismHint)
                             throws IllegalArgumentException
        Define a new bolt in this topology. This defines a windowed bolt, intended for windowing operations. The IWindowedBolt.execute(TupleWindow) method is triggered for each window interval with the list of current events in the window.
        Parameters:
        id - the id of this component. This id is referenced by other components that want to consume this bolt's outputs.
        bolt - the windowed bolt
        parallelismHint - the number of tasks that should be assigned to execute this bolt. Each task will run on a thread in a process somewhere in the cluster.
        Returns:
        use the returned object to declare the inputs to this component
        Throws:
        IllegalArgumentException - if parallelismHint is not positive
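The documented contract for parallelismHint (it must be positive, otherwise IllegalArgumentException is thrown) can be sketched as a small validation helper. ParallelismCheck.resolve is hypothetical; falling back to one task when no hint is given mirrors the single-thread overloads, and truncating a non-integer Number with intValue() is an assumption of this sketch.

```java
// Hypothetical helper illustrating the parallelism-hint contract.
class ParallelismCheck {
  static int resolve(Number parallelismHint) {
    if (parallelismHint == null) {
      return 1; // no hint given: run a single task
    }
    // Assumption: non-integer hints are truncated toward zero.
    int value = parallelismHint.intValue();
    if (value <= 0) {
      throw new IllegalArgumentException(
          "parallelismHint must be positive, got " + parallelismHint);
    }
    return value;
  }
}
```

Under these assumptions resolve(4) yields 4 tasks, resolve(null) yields 1, and resolve(0) or any negative hint is rejected before the component is registered.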
      • setBolt

        public <K extends Serializable, V extends Serializable> BoltDeclarer setBolt(String id,
                                                                                    IStatefulWindowedBolt<K, V> bolt,
                                                                                    Number parallelismHint)
                                                                             throws IllegalArgumentException
        Define a new bolt in this topology. This defines a stateful windowed bolt, intended for stateful windowing operations. The IWindowedBolt.execute(TupleWindow) method is triggered for each window interval with the list of current events in the window. During initialization of this bolt (potentially after a failure), IStatefulComponent.initState(State) is invoked with its previously saved state.
        Type Parameters:
        K - Type of key for HashMapState
        V - Type of value for HashMapState
        Parameters:
        id - the id of this component. This id is referenced by other components that want to consume this bolt's outputs.
        bolt - the stateful windowed bolt
        parallelismHint - the number of tasks that should be assigned to execute this bolt. Each task will run on a thread in a process somewhere in the cluster.
        Returns:
        use the returned object to declare the inputs to this component
        Throws:
        IllegalArgumentException - if parallelismHint is not positive
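The sketch below walks through the stateful lifecycle described above: state is handed to initState (empty on a fresh start, the last saved copy after a restart), each window updates it, and a checkpoint captures it. StatefulWordCount is hypothetical, and a plain HashMap stands in for HashMapState; none of these method signatures are Heron's.

```java
import java.util.*;

// Hypothetical stateful windowed word counter. HashMap stands in for
// HashMapState; initState/executeWindow/checkpoint are illustrative names.
class StatefulWordCount {
  private Map<String, Integer> state;

  // Analogous to IStatefulComponent.initState(State): adopt the state the
  // framework hands over (empty on first start, restored after a failure).
  void initState(Map<String, Integer> restored) {
    this.state = restored;
  }

  // Analogous to execute(TupleWindow): fold one window of words into state.
  void executeWindow(List<String> window) {
    for (String word : window) {
      state.merge(word, 1, Integer::sum);
    }
  }

  // Stand-in for the framework saving a snapshot of the bolt's state.
  Map<String, Integer> checkpoint() {
    return new HashMap<>(state);
  }
}
```

Restoring a new instance from a checkpoint taken after the window ["a", "b", "a"] and then processing ["a"] leaves the count for "a" at 3, so progress made before the simulated failure is not lost.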
      • setSpout

        public SpoutDeclarer setSpout(String id,
                                      IRichSpout spout)
        Define a new spout in this topology.
        Parameters:
        id - the id of this component. This id is referenced by other components that want to consume this spout's outputs.
        spout - the spout
        Returns:
        the SpoutDeclarer for this spout
      • setSpout

        public SpoutDeclarer setSpout(String id,
                                      IRichSpout spout,
                                      Number parallelismHint)
        Define a new spout in this topology with the specified parallelism. If the spout declares itself as non-distributed, the parallelismHint will be ignored and only one task will be allocated to this component.
        Parameters:
        id - the id of this component. This id is referenced by other components that want to consume this spout's outputs.
        spout - the spout
        parallelismHint - the number of tasks that should be assigned to execute this spout. Each task will run on a thread in a process somewhere in the cluster.
        Returns:
        the SpoutDeclarer for this spout
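The non-distributed rule for spouts reduces to a simple decision, sketched below with a hypothetical SpoutParallelism helper. How a spout actually declares itself non-distributed is not described in this section, so it is modeled here as a plain boolean flag; treating a missing hint as one task is likewise an assumption.

```java
// Hypothetical helper: how many tasks a spout ends up with.
class SpoutParallelism {
  static int effectiveTasks(Number parallelismHint, boolean nonDistributed) {
    if (nonDistributed) {
      return 1; // the hint is ignored for non-distributed spouts
    }
    // Assumption: no hint means a single task.
    return parallelismHint == null ? 1 : parallelismHint.intValue();
  }
}
```

So a spout registered with a hint of 5 gets five tasks normally, but only one if it declares itself non-distributed, which is useful for sources that must be read by exactly one consumer.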