Apache Kafka: Installation Steps

Apache Kafka is an open-source distributed streaming platform that is used to build real-time streaming data pipelines and applications that adapt to data streams. Streaming data is data that is continuously generated by thousands of data sources, which typically send the data records simultaneously.

Kafka is a broker-based solution that operates by maintaining streams of data as records within a cluster of servers. Kafka servers can span multiple data centers and provide data persistence by storing streams of records (messages) across multiple server instances in topics.

A topic stores records (messages) as an ordered series of immutable entries, each of which consists of a key, a value, and a timestamp.
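To make the record structure concrete, here is a minimal in-memory sketch of a topic as an append-only list of immutable records. The names Record and TopicLog are hypothetical and exist only for illustration; this is not Kafka's storage format or client API.

```python
import time
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    """One immutable record: a key, a value, and a timestamp in milliseconds."""
    key: Optional[bytes]
    value: bytes
    timestamp_ms: int


class TopicLog:
    """Toy stand-in for a single-partition topic (illustration only)."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def append(self, key: Optional[bytes], value: bytes,
               timestamp_ms: Optional[int] = None) -> int:
        """Append a record and return its position in the log."""
        if timestamp_ms is None:
            timestamp_ms = int(time.time() * 1000)
        self._records.append(Record(key, value, timestamp_ms))
        return len(self._records) - 1

    def read_from(self, position: int) -> List[Record]:
        """Return all records at or after the given position, in append order."""
        return self._records[position:]


log = TopicLog()
first = log.append(b"user-1", b"login", timestamp_ms=1000)
second = log.append(b"user-2", b"click", timestamp_ms=2000)
print(first, second)              # 0 1
print(log.read_from(1)[0].value)  # b'click'
```

Records are only ever appended, never modified in place, which is why a NamedTuple (immutable) is a reasonable fit for this sketch.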

Apache Kafka Cluster

Apache Kafka requires a running Zookeeper instance, which it uses for reliable distributed coordination. Zookeeper can be downloaded from https://zookeeper.apache.org/.

Download the required files:

Download Server JRE according to your OS and CPU architecture from http://www.oracle.com/
Download and extract Zookeeper using 7-zip from http://zookeeper.apache.org/releases.html
Download and extract Kafka using 7-zip from http://kafka.apache.org/downloads.html

Here we use a standalone Zookeeper installation rather than the one packaged with Kafka, because the packaged one is only a single-node Zookeeper instance.
If you prefer, you can run Kafka with the packaged Zookeeper instead; its scripts are located in the \kafka\bin\windows directory of the Kafka package.

After installing Java, add a JAVA_HOME variable pointing to the Java installation directory to the system environment variables.

Zookeeper Installation on Windows:

Go to your Zookeeper config directory. For me it's C:\zookeeper-3.4.7\conf
Rename the file “zoo_sample.cfg” to “zoo.cfg”
Open zoo.cfg in any text editor, such as Notepad or Notepad++.
Find the line dataDir=/tmp/zookeeper and change it to dataDir=C:\\zookeeper-3.4.7\\data (the file is read as Java properties, where a single backslash is an escape character, so double the backslashes or use forward slashes).
Add entries to the System Environment Variables, as we did for Java:
a. Add a System Variable ZOOKEEPER_HOME = C:\zookeeper-3.4.7
b. Edit the System Variable named “Path” and append ;%ZOOKEEPER_HOME%\bin;
You can change the default Zookeeper port in the zoo.cfg file (default port 2181).
Run Zookeeper by opening a new cmd window and typing zkServer
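The dataDir edit above can also be scripted. The following is a rough sketch, assuming the config is a simple key=value file; set_property is a hypothetical helper name, and the sample text is an abbreviated stand-in for zoo.cfg. It uses forward slashes in the Windows path to sidestep backslash escaping in properties files.

```python
def set_property(config_text: str, key: str, value: str) -> str:
    """Return config_text with key=value, replacing an existing line or appending one."""
    lines = config_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Leave comments and lines without "=" untouched.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        if stripped.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Abbreviated stand-in for the contents of zoo.cfg (an assumption for the demo).
sample = "tickTime=2000\ndataDir=/tmp/zookeeper\nclientPort=2181\n"
print(set_property(sample, "dataDir", "C:/zookeeper-3.4.7/data"))
```

To apply it to a real file, read zoo.cfg into a string, pass it through set_property, and write the result back; keep a backup of the original first.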

Setting Up Kafka on Windows:

Go to your Kafka config directory. For me it's C:\kafka_2.11-0.9.0.0\config
Edit the file “server.properties”
Find the line “log.dirs=/tmp/kafka-logs” and change it to “log.dirs=C:\\kafka_2.11-0.9.0.0\\kafka-logs” (as in zoo.cfg, double the backslashes or use forward slashes, because a single backslash is an escape character in a properties file).
If your Zookeeper is running on some other machine or cluster, edit “zookeeper.connect=localhost:2181” to use your custom IP and port.
The Kafka port and broker.id are also configurable in this file. Leave the other settings as they are.
Your Kafka will run on the default port 9092 and connect to Zookeeper's default port, which is 2181.
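To double-check which Zookeeper address the broker will use, the settings can be read back from the file. This is a small sketch under the assumption that server.properties contains plain key=value lines; parse_properties and zookeeper_endpoint are hypothetical helper names, and the sample text is a shortened stand-in for the real file.

```python
from typing import Dict, Tuple


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def zookeeper_endpoint(props: Dict[str, str],
                       default: Tuple[str, int] = ("localhost", 2181)) -> Tuple[str, int]:
    """Return (host, port) of the first entry in zookeeper.connect."""
    connect = props.get("zookeeper.connect")
    if not connect:
        return default
    # zookeeper.connect may list several host:port pairs and end with a /chroot path.
    first = connect.split(",")[0].split("/")[0]
    host, _, port = first.partition(":")
    return host, int(port) if port else default[1]


# Shortened stand-in for server.properties (an assumption for the demo).
sample = (
    "broker.id=0\n"
    "log.dirs=C:/kafka_2.11-0.9.0.0/kafka-logs\n"
    "zookeeper.connect=localhost:2181\n"
)
props = parse_properties(sample)
print(props["log.dirs"])          # C:/kafka_2.11-0.9.0.0/kafka-logs
print(zookeeper_endpoint(props))  # ('localhost', 2181)
```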

Running Kafka Server on Windows:

Important: Please ensure that your Zookeeper instance is up and running before starting a Kafka server.
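One quick way to check this is to test whether something is listening on the Zookeeper port. The sketch below assumes Zookeeper runs on localhost with the default port 2181; is_port_open is a hypothetical helper name. An open port only shows that some process accepts connections there, not that Zookeeper itself is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Host and port are assumptions; adjust them to match zoo.cfg.
    if is_port_open("localhost", 2181):
        print("Something is listening on port 2181; Kafka can be started.")
    else:
        print("Nothing is listening on port 2181; start Zookeeper first.")
```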

Go to your Kafka installation directory C:\kafka_2.11-0.9.0.0\
Open a command prompt here by pressing Shift + right click and choose “Open command window here” option.
Now type the following command and press Enter:
.\bin\windows\kafka-server-start.bat .\config\server.properties
If everything went fine, the command prompt shows the Kafka server's startup log and the server keeps running in that window.
Now that Kafka is up and running, you can create topics to store messages.
Also we can produce or consume data from Java or Scala code or directly from the command prompt.

Creating topics:

Now create a topic with the name “test” and a replication factor of 1, as we have only one Kafka
server running. If you have a cluster with more than one Kafka server running, you can increase
the replication factor accordingly, which increases data availability and makes the system fault-tolerant.
Open a new command prompt in the location C:\kafka_2.11-0.9.0.0\bin\windows
Type the following command and hit Enter:
kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
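If you create topics from a script, the command can be assembled programmatically. The following is a sketch only: build_create_topic_command and its broker_count parameter are hypothetical, and the flags are the ones shown in the command above. It also guards against a replication factor larger than the number of brokers, which Kafka would reject.

```python
from typing import List


def build_create_topic_command(topic: str,
                               zookeeper: str = "localhost:2181",
                               partitions: int = 1,
                               replication_factor: int = 1,
                               broker_count: int = 1) -> List[str]:
    """Build the kafka-topics.bat argument list for creating a topic."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    if replication_factor < 1 or replication_factor > broker_count:
        # Each replica lives on a different broker, so the factor cannot exceed the broker count.
        raise ValueError("replication factor must be between 1 and the number of brokers")
    return [
        "kafka-topics.bat", "--create",
        "--zookeeper", zookeeper,
        "--replication-factor", str(replication_factor),
        "--partitions", str(partitions),
        "--topic", topic,
    ]


print(" ".join(build_create_topic_command("test")))
# kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
```

The resulting list can be passed to subprocess.run with cwd set to the bin\windows directory. Note that every flag starts with two plain hyphens, not a dash character.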