Propagating Changes in One Directory to Another

Ahmed Elbahtemy edited this page Apr 18, 2019 · 15 revisions

Use Case Overview

In this use case, we create a Brooklin datastream that propagates changes made in one file system directory to another.

Use Case Summary

Instructions

1. Set up ZooKeeper

  1. Download the latest stable release of ZooKeeper.

  2. Untar the ZooKeeper tarball (the commands below assume version 3.4.14; adjust the file name to match your download)

    tar xzvf zookeeper-3.4.14.tar.gz
    cd zookeeper-3.4.14

  3. Start a ZooKeeper server

    bin/zkServer.sh start conf/zoo_sample.cfg
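
Since ZooKeeper is set up before Brooklin here, it can help to wait until the server reports that it is running before moving on. The following is a minimal sketch rather than part of the official instructions: `retry` is a hypothetical helper introduced for illustration, and the example in the comments assumes that `zkServer.sh status` exits with a non-zero code while the server is not yet available.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or it has failed ATTEMPTS times.
# Note: POSIX sh has no local variables, so the names below are global.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while :; do
    # Stop as soon as the command succeeds.
    "$@" && return 0
    # Give up once the attempt budget is spent.
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example (run from the ZooKeeper directory):
#   retry 10 1 bin/zkServer.sh status conf/zoo_sample.cfg
```

If the call returns a non-zero status, the ZooKeeper log output is the place to look before starting Brooklin.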

2. Set up Brooklin

  1. Download the latest tarball (tgz) from Brooklin releases.

  2. Untar the Brooklin tarball (the commands below assume version 1.0.0; adjust the file name to match your download)

    tar -xzf brooklin-1.0.0.tgz
    cd brooklin-1.0.0

  3. Run Brooklin

    bin/brooklin-server-start.sh config/server.properties

3. Create a Datastream

  1. Create a datastream to sync changes made in a source directory to a destination directory.

    # Replace <src-dir> and <dest-dir> below with file paths of source and destination 
    # directories, respectively
    bin/brooklin-rest-client.sh -o CREATE -u http://localhost:32311/ -n first-dir-datastream -s <src-dir> -d <dest-dir> -dp 1 -c dirC -p 1 -t dirTP -m '{"owner":"test-user"}'

    Here are the options we used to create this datastream:

    -o CREATE                      The operation is datastream creation
    -u http://localhost:32311/     Datastream Management Service URI
    -n first-dir-datastream        Datastream name
    -s <src-dir>                   Datastream source (source directory path in this case)
    -d <dest-dir>                  Datastream destination (destination directory path in this case)
    -c dirC                        Connector name ("dirC" is the name we use to refer to DirectoryConnector in config)
    -t dirTP                       Transport provider name ("dirTP" is the name we use to refer to DirectoryTransportProvider in config)
    -p 1                           Number of source partitions
    -dp 1                          Number of destination partitions
    -m '{"owner":"test-user"}'     Datastream metadata (specifying datastream owner is mandatory)
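
    To make the CREATE command easier to repeat, it could be wrapped in a small shell function. This is only a sketch: `create_dir_datastream` and the `BROOKLIN_CLIENT` variable are hypothetical names introduced here, and the flags are copied unchanged from the command above. It also creates both directories first, on the assumption that they should exist before the datastream is created.

```shell
# Path to the REST client; override it when not running from the Brooklin directory.
# (BROOKLIN_CLIENT is a hypothetical variable, not something Brooklin reads.)
BROOKLIN_CLIENT="${BROOKLIN_CLIENT:-bin/brooklin-rest-client.sh}"

# create_dir_datastream NAME SRC_DIR DEST_DIR
# Creates both directories, then issues the same CREATE request as above.
create_dir_datastream() {
  name=$1
  src=$2
  dest=$3
  mkdir -p "$src" "$dest" || return 1
  "$BROOKLIN_CLIENT" -o CREATE -u http://localhost:32311/ \
    -n "$name" -s "$src" -d "$dest" \
    -dp 1 -c dirC -p 1 -t dirTP \
    -m '{"owner":"test-user"}'
}

# Example (paths are placeholders):
#   create_dir_datastream first-dir-datastream /tmp/brooklin-src /tmp/brooklin-dest
```

    Quoting the directory arguments keeps paths that contain spaces intact when they reach the client.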
    
  2. Verify that the datastream was created by requesting all datastream metadata from Brooklin with the command-line REST client.

    bin/brooklin-rest-client.sh -o READALL -u http://localhost:32311/

  3. You can also view more information about the Datastreams and DatastreamTasks by querying the health monitoring REST endpoint of the Datastream Management Service.

    curl -s "http://localhost:32311/health"

4. Try it out

  1. Add or delete files and directories in the source directory you specified when you created the datastream in step 3.

  2. Watch the destination directory you specified when you created the datastream in step 3.

  3. You can also observe the log statements produced to stdout by Brooklin whenever you make changes.
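
The manual check in the steps above can also be scripted. The sketch below polls for a path until it appears or a timeout expires; `wait_for_path` is a hypothetical helper, and the example in the comments assumes that a new file in the source directory shows up under the same name in the destination directory.

```shell
# wait_for_path PATH TIMEOUT_SECONDS
# Succeeds as soon as PATH exists; fails after roughly TIMEOUT_SECONDS.
wait_for_path() {
  path=$1
  remaining=$2
  while [ ! -e "$path" ]; do
    # Out of time: report failure to the caller.
    [ "$remaining" -le 0 ] && return 1
    sleep 1
    remaining=$((remaining - 1))
  done
  return 0
}

# Example, using the directories passed to -s and -d in step 3:
#   touch <src-dir>/hello.txt
#   wait_for_path <dest-dir>/hello.txt 30 && echo "change propagated"
```

A timeout here does not by itself mean the datastream is broken; Brooklin's log output is the better source for what happened.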

5. Stop Brooklin and ZooKeeper

When you are done, run the following commands to stop Brooklin and ZooKeeper.

    # Replace <brooklin-dir> and <zookeeper-dir> with the Brooklin and ZooKeeper directories, respectively
    <brooklin-dir>/bin/brooklin-server-stop.sh
    <zookeeper-dir>/bin/zkServer.sh stop <zookeeper-dir>/conf/zoo_sample.cfg