The KLiK Engine is a C++ Powered File Search Engine for the Enron Email Sample Dataset
C++
PHP 5.6.40
SQL 14.0
HTML5
CSS3
- Enron Email Dataset
- Size: 1.54 GB
Visual Studio 2017
WampServer Stack 3.0.6
Windows 10
MySQL Database 8.0.13
MySQLx APIs
C++ Boost Library
Bootstrap v4.2.1
Details of important Features of the Application
- Forward Indexing:
  - 300000 files/s
  - Incremental Processing (10000 files): 10 min
  - Total Time: 3 hr
- Reverse Indexing:
  - 352000 files/s
  - Incremental Processing (10000 files): 15 min
  - Total Time: 2.4 hr
- Querying:
  - Single Word Querying: 0.1 - 0.7 sec
  - Multi Word Querying: 0.4 - 2.3 sec
Forward Indexing
- Implementation of the C++ Boost Library to facilitate I/O processes, since the dataset had many small files.
- Email files loaded into memory in increments of 10000, followed by mass processing of all loaded files. After that, the memory was freed and the process was started anew for the next 10000 files.
- Stopping words filtered out of the email files.
- Implementation of MySQLx APIs for SQL connections.
- Implementation of unordered maps for memory performance enhancement.
- Time calculation of the entire as well as the incremental processes.
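
The batching, stop-word filtering and unordered-map steps listed above could look roughly like the sketch below. It is not the project's actual code: the names `tokenize`, `build_forward_index` and `kStopWords` are hypothetical, the stop-word list is a tiny placeholder, and documents are passed in as strings instead of being read from disk with Boost.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Forward index: document id -> (word -> number of occurrences).
using WordCounts = std::unordered_map<std::string, int>;
using ForwardIndex = std::unordered_map<int, WordCounts>;

// Placeholder stop-word list (assumption; the project's real list is not shown).
const std::unordered_set<std::string> kStopWords = {
    "the", "a", "an", "of", "to", "and", "is"};

// Split text into lowercase alphanumeric tokens and drop stop words.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    auto flush = [&]() {
        if (!current.empty() && kStopWords.count(current) == 0) {
            tokens.push_back(current);
        }
        current.clear();
    };
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current.push_back(static_cast<char>(std::tolower(c)));
        } else {
            flush();
        }
    }
    flush();
    return tokens;
}

// Index the documents batch by batch; documents[i] receives id i.
ForwardIndex build_forward_index(const std::vector<std::string>& documents,
                                 std::size_t batch_size) {
    assert(batch_size > 0);
    ForwardIndex index;
    for (std::size_t start = 0; start < documents.size(); start += batch_size) {
        const std::size_t end = std::min(start + batch_size, documents.size());
        // "Load" one batch into memory (a copy here; files on disk in the project).
        std::vector<std::string> batch(documents.begin() + start,
                                       documents.begin() + end);
        for (std::size_t i = 0; i < batch.size(); ++i) {
            WordCounts& counts = index[static_cast<int>(start + i)];
            for (const std::string& word : tokenize(batch[i])) {
                ++counts[word];
            }
        }
        // The batch vector is destroyed here, releasing its memory
        // before the next batch is loaded.
    }
    return index;
}
```

Calling `build_forward_index(documents, 10000)` would mirror the increment of 10000 files described above; only one batch of raw text is held in memory at a time, while the word counts accumulate in the index.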
Reverse Indexing
- Implementation of the forward indexer for reverse index creation.
- Incremental file processing, like in forward indexing.
- Time calculation for the incremental and complete processes.
- Implementation of ranking to ease later searching.
- Implementation of relevance ranking.
- Implementation of search normalization to prevent misuse of the ranking system by too many occurrences of the same word in a single file.
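
As a rough illustration of how a reverse index can be derived from a forward index, the sketch below inverts a document-to-words map into a word-to-documents map. The normalization shown (occurrence count divided by the document's total word count) is an assumption about what "search normalization" means here, and `build_reverse_index` is a hypothetical name rather than the project's function.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Forward index: document id -> (word -> number of occurrences).
using WordCounts = std::unordered_map<std::string, int>;
using ForwardIndex = std::unordered_map<int, WordCounts>;

// Reverse index: word -> (document id -> normalized term frequency).
using Postings = std::unordered_map<int, double>;
using ReverseIndex = std::unordered_map<std::string, Postings>;

// Invert the forward index. Each count is divided by the document's total
// word count, so repeating a word many times in one file cannot push its
// stored weight above 1.0 (assumed normalization scheme).
ReverseIndex build_reverse_index(const ForwardIndex& forward) {
    ReverseIndex reverse;
    for (const auto& doc : forward) {
        const int doc_id = doc.first;
        const WordCounts& counts = doc.second;

        int total_words = 0;
        for (const auto& entry : counts) {
            total_words += entry.second;
        }
        assert(total_words >= 0);
        if (total_words == 0) {
            continue;  // nothing to index for an empty document
        }

        for (const auto& entry : counts) {
            reverse[entry.first][doc_id] =
                static_cast<double>(entry.second) / total_words;
        }
    }
    return reverse;
}
```

Storing the normalized weight at index time means the searcher only has to look up each query word and read its postings, without recounting words per document.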
Searching
- Implementation of the reverse index in searching.
- Calculation of document score and inverse document score for relevance ranking.
- Retrieval of the search query/string from the GUI.
- Top 15 results returned from the calculated search results.
- Stopping words safely removed from the search string.
- Score calculation of each result and ordering in descending order.
- Scores of results concerning key-words belonging to the same files are multiplied to get a common score.
- Implementation of ordered maps for automatic ordering of results with respect to their scores.
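
The scoring and ordering steps above can be sketched as follows. This is an illustrative reading of the list, not the project's searcher: the `search` function, the smoothed inverse-document formula `log(1 + N / df)` and the `ReverseIndex` layout are all assumptions. Scores for query words that hit the same file are multiplied, and an ordered `std::multimap` keeps results sorted by descending score.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Reverse index: word -> (document id -> normalized term frequency).
using Postings = std::unordered_map<int, double>;
using ReverseIndex = std::unordered_map<std::string, Postings>;

// Returns up to max_results (document id, score) pairs, best score first.
// query_words is expected to be lowercased with stop words already removed.
std::vector<std::pair<int, double>> search(
    const ReverseIndex& index,
    const std::vector<std::string>& query_words,
    std::size_t total_docs,
    std::size_t max_results = 15) {
    assert(total_docs > 0);

    std::unordered_map<int, double> doc_scores;
    for (const std::string& word : query_words) {
        const auto it = index.find(word);
        if (it == index.end()) {
            continue;  // word does not occur in any document
        }
        const Postings& postings = it->second;
        // Smoothed inverse document score (assumed formula), always positive.
        const double idf =
            std::log(1.0 + static_cast<double>(total_docs) / postings.size());
        for (const auto& entry : postings) {
            const double score = entry.second * idf;
            const auto found = doc_scores.find(entry.first);
            if (found == doc_scores.end()) {
                doc_scores[entry.first] = score;  // first matching key-word
            } else {
                found->second *= score;  // multiply scores for the same file
            }
        }
    }

    // Ordered map keyed by score, largest first; multimap keeps ties.
    std::multimap<double, int, std::greater<double>> ranked;
    for (const auto& entry : doc_scores) {
        ranked.emplace(entry.second, entry.first);
    }

    std::vector<std::pair<int, double>> results;
    for (const auto& entry : ranked) {
        if (results.size() == max_results) {
            break;
        }
        results.emplace_back(entry.second, entry.first);
    }
    return results;
}
```

One consequence of multiplying per-word scores is that a file matching several key-words can end up with a smaller product than a file matching only one when the individual scores are below 1, so the exact formulas matter for how multi-word queries rank.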
GUI
- Created in PHP / HTML5 & CSS3.
- Implementation of the Bootstrap 4 framework for a presentable interface.
- Passing of the input search query to the C++ Searcher script and receiving the list of results as output.
- Display of all results with the email subject as title, along with the file path.
- The result titles are file links that open a new browser window displaying all of the relevant file content.
- Implementation of time calculation on the GUI so the user can see the query time as well.
- Optimization (in components like indexing)
- Implementation of more advanced indexing and ranking algorithms
- Continuous bug fixes and improvements
A huge thanks to the wonderful team without which this entire project would not have been possible. Check out their profiles and star their repos! :)
| msaad1999 | mshaharyar17 | ahmed | aitasadduq |
|---|---|---|---|
Check out the complete project for this login system. KLiK is a complete Social Media website, along with a Complete Login/Registration system, Profile system, Chat room, Forum system and Blog/Polls/Event Management System.
Check out KLiK here
Do star my projects! :)
If you liked my work, please show support by starring the repository! It means a lot to me, and is all I'm asking for.






