For this project, I implemented a distributed publish/subscribe system similar to the original Kafka design.
You can find part 2 here: https://github.com/CS682-S22/distributed-system-part2-Jennytang1224
Implemented a Producer API that may be used by an application running on any host to publish messages to a broker. The Producer allows the application to:
- Connect to a Broker
- Send data to the Broker by providing a `byte[]` containing the data and a `String` containing the topic
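A minimal sketch of what a publish request from such a Producer might look like on the wire. The class name, framing layout, and `encode` helper are assumptions for illustration, not the project's actual code:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical Producer sketch: frames a (topic, payload) pair into a
// length-prefixed record that could then be written to a Broker socket.
public class ProducerSketch {
    // Encode one publish request: [topicLen][topicBytes][dataLen][dataBytes]
    static byte[] encode(String topic, byte[] data) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            byte[] topicBytes = topic.getBytes(StandardCharsets.UTF_8);
            out.writeInt(topicBytes.length);
            out.write(topicBytes);
            out.writeInt(data.length);
            out.write(data);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // in-memory streams should not throw
        }
    }

    public static void main(String[] args) {
        byte[] record = encode("image", "hello".getBytes(StandardCharsets.UTF_8));
        // 4 (topic length) + 5 ("image") + 4 (data length) + 5 ("hello") = 18 bytes
        System.out.println(record.length);
    }
}
```

Length-prefixing each field lets the Broker read a complete record off the stream without any delimiter parsing.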
Implemented a Consumer API that may be used by an application running on any host to consume messages from a broker. The Consumer allows the application to:
- Connect to a Broker
- Retrieve data from the Broker using a pull-based approach by specifying a topic of interest and a starting position in the message stream
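One way the pull-based loop could look from the application's side. `BrokerStub` stands in for the real network call, and all names here are assumptions; the key point it illustrates is that the consumer, not the Broker, tracks its position in the stream:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical pull-based Consumer sketch: repeatedly fetches from a
// given offset and advances by the number of messages returned.
public class ConsumerSketch {
    interface BrokerStub {
        List<byte[]> fetch(String topic, int startOffset); // pull from offset
    }

    static int consume(BrokerStub broker, String topic, int startOffset) {
        int offset = startOffset;
        List<byte[]> batch = broker.fetch(topic, offset);
        while (!batch.isEmpty()) {
            for (byte[] msg : batch) {
                // application processes msg here
            }
            offset += batch.size();          // consumer tracks its own position
            batch = broker.fetch(topic, offset);
        }
        return offset; // next offset to request
    }

    public static void main(String[] args) {
        List<byte[]> log = Arrays.asList("a".getBytes(), "b".getBytes(), "c".getBytes());
        BrokerStub stub = (topic, off) ->
                off < log.size() ? log.subList(off, log.size()) : List.of();
        System.out.println(consume(stub, "demo", 1)); // starts pulling at offset 1
    }
}
```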
The Broker accepts an unlimited number of connection requests from producers and consumers. The basic Broker implementation maintains a thread-safe, in-memory data structure that stores all messages, and it is stateless with respect to the Consumer hosts.
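A sketch of one way such a thread-safe, in-memory store could be organized (an assumption, not the project's actual data structure): an append-only message list per topic, with reads returning everything from a requested offset onward so the Broker need not remember any per-consumer state:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical thread-safe, in-memory Broker store: one message log per topic.
public class MessageStore {
    private final ConcurrentHashMap<String, List<byte[]>> topics = new ConcurrentHashMap<>();

    public void append(String topic, byte[] data) {
        // computeIfAbsent is atomic; the synchronized list guards concurrent appends
        topics.computeIfAbsent(topic, t -> Collections.synchronizedList(new ArrayList<>()))
              .add(data);
    }

    // Pull-style read: copy of everything from startOffset onward.
    // The Broker keeps no consumer state; callers supply their own offset.
    public List<byte[]> readFrom(String topic, int startOffset) {
        List<byte[]> log = topics.getOrDefault(topic, List.of());
        synchronized (log) {
            if (startOffset >= log.size()) return List.of();
            return new ArrayList<>(log.subList(startOffset, log.size()));
        }
    }

    public static void main(String[] args) {
        MessageStore store = new MessageStore();
        store.append("t", "m1".getBytes());
        store.append("t", "m2".getBytes());
        System.out.println(store.readFrom("t", 1).size()); // one message past offset 1
    }
}
```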
In total, the system runs three Producers, three Consumers, and one Broker.
Built multiple instances of the Broker running on separate hosts. Each topic may have multiple partitions, and each partition may be handled by a different Broker. When a new message is posted, it specifies both a topic and a key. As in the real Kafka implementation, the key is hashed to determine which Broker manages the partition for that `<key, topic>` pair. To direct each request to the appropriate Broker, I implemented a custom load balancer: a separate service that accepts a request containing a key and returns the host information of the Broker that manages that partition.
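The hash-based routing described above could be sketched as follows. The broker addresses and method name are assumptions for illustration; the one real subtlety shown is masking the hash instead of using `Math.abs`, which overflows for `Integer.MIN_VALUE`:

```java
// Hypothetical sketch of routing a key to the Broker that owns its partition.
public class PartitionRouter {
    // Mask off the sign bit rather than Math.abs: Math.abs(Integer.MIN_VALUE)
    // is still negative, which would make the index negative.
    static int brokerFor(String key, int numBrokers) {
        return (key.hashCode() & 0x7fffffff) % numBrokers;
    }

    public static void main(String[] args) {
        String[] brokers = {"broker-0:9090", "broker-1:9090", "broker-2:9090"}; // assumed hosts
        // The same key always routes to the same Broker,
        // which is what makes a stateless load-balancer lookup possible.
        int first = brokerFor("user-42", brokers.length);
        int second = brokerFor("user-42", brokers.length);
        System.out.println(first == second);
    }
}
```

Under this scheme, the load balancer only needs the broker list and this hash function to answer "which Broker owns this key?" without storing any per-message state.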