Name: Kshitij Kansal
Roll No: 201101031
About:
The LibXML++ library is used to parse the provided XML file. After parsing, stop words are removed; the stop words are stored in the file
stopwords.txt in the src/ directory. Stemming is then done with porter2_stemmer, whose code is also in the src/ directory. Finally, a map is
built from the remaining strings, with each string as the key and, as the value, the IDs of the documents in which that string occurs.
When indexing finishes, the map is written to the file "parse.txt"; the path of this file must be specified in the index.sh script in the
bin/ directory. Queries are answered by loading "parse.txt" back into a map and looking up the occurrences of each query term.
For querying, the path to "parse.txt" must also be specified in the query.sh script in the bin/ directory.
The input and output formats are exactly as specified in the assignment. To compile the code and create the required directories, run the
install.sh script in the bin/ directory as the root user.
For further queries,
Contact: kshitij.kansal@students.iiit.ac.in