Commit 0e04d48

Quick start guide (#3)
1 parent 8fe72e8 commit 0e04d48

2 files changed: +165 -0 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -124,6 +124,7 @@ Here we're showing cached `Dataset` for debugging purposes then filtering it. Th
 ## Examples

 For more, check out [examples](https://github.com/JetBrains/kotlin-spark-api/tree/master/examples/src/main/kotlin/org/jetbrains/spark/api/examples) module.
+To get up and running quickly, check out this [tutorial](docs/quick-start-guide.md).

 ## Issues and feedback

docs/quick-start-guide.md

Lines changed: 164 additions & 0 deletions
@@ -0,0 +1,164 @@
# Quick Start Guide

This tutorial helps you get started with Kotlin Spark API. It uses an example similar to the official [Apache Spark
Quick Start Guide](https://spark.apache.org/docs/3.0.0-preview2/quick-start.html#self-contained-applications).
You'll learn how to set up your environment and how to write, package, and execute a simple self-contained application.

Prerequisites:
- Java must be installed, with the `JAVA_HOME` environment variable pointing to the Java installation.
- Apache Spark must be installed, with the `SPARK_HOME` environment variable pointing to the Spark installation.
  We recommend Apache Spark 3.0.0-preview2. You can download it from the [official Spark website](https://spark.apache.org/downloads.html).

Note: You can use Apache Spark 2.4.5, but we haven't tested Kotlin Spark API with it.
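
Before continuing, it can help to confirm that both variables are set and point to real directories. The following is a minimal sketch, assuming a POSIX `sh`; `check_home` is a hypothetical helper name introduced for illustration and is not part of Java or Spark:

```shell
# Report whether the environment variable named by $1 is set and
# points to an existing directory. check_home is a hypothetical helper.
check_home() {
    name="$1"
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    elif [ ! -d "$value" ]; then
        echo "$name points to a missing directory: $value"
        return 1
    fi
    echo "$name=$value"
}

# Print the status of both prerequisites without aborting on a failure.
check_home JAVA_HOME || true
check_home SPARK_HOME || true
```

If either line reports a problem, fix the variable in your shell profile before building the example.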

## Self-contained application

For the purposes of this tutorial, let's write a Kotlin program that counts the number of lines containing 'a'
and the number containing 'b' in the Spark README. Note that you'll need to replace `YOUR_SPARK_HOME` with the
location where Spark is installed:

```kotlin
/* SimpleApp.kt */
@file:JvmName("SimpleApp")
import org.jetbrains.spark.api.*

fun main() {
    val logFile = "YOUR_SPARK_HOME/README.md" // Change to your Spark Home path
    withSpark {
        // Cache the Dataset while the block runs, since it is scanned twice
        spark.read().textFile(logFile).withCached {
            val numAs = filter { it.contains("a") }.count()
            val numBs = filter { it.contains("b") }.count()
            println("Lines with a: $numAs, lines with b: $numBs")
        }
    }
}
```
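
As a rough cross-check of the numbers the application should print, the same counts can be obtained from the command line. This is a sketch that assumes `grep` is available and that `SPARK_HOME` is set as described in the prerequisites; `count_lines` is a hypothetical helper name. For an ordinary text file such as the Spark README, the results should agree with the Spark job:

```shell
# Count the lines of file $2 that contain the fixed, case-sensitive
# substring $1, mirroring filter { it.contains(...) }.count() in SimpleApp.kt.
# count_lines is a hypothetical helper used only for this sketch.
count_lines() {
    grep -cF -- "$1" "$2"
}

readme="${SPARK_HOME:-YOUR_SPARK_HOME}/README.md"
if [ -f "$readme" ]; then
    echo "Lines with a: $(count_lines a "$readme"), lines with b: $(count_lines b "$readme")"
fi
```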

## Building the application with Maven

Because Kotlin Spark API is not yet part of the official Apache Spark distribution, adding Spark
as a dependency in your `pom.xml` file is not enough.
You need to:
- Add Spark as a dependency
- Add Kotlin Spark API as a dependency
- Add Kotlin Standard Library as a dependency

When packaging your project into a jar file, you need to explicitly include the Kotlin Spark API and Kotlin Standard Library
dependencies, because a standard Spark installation does not provide them at runtime. One way to do this is with `maven-shade-plugin`.

Here's what the `pom.xml` looks like for this example:
```xml
<project>
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>kotlin-spark-example</artifactId>
    <version>1.0-SNAPSHOT</version>

    <name>Sample Project</name>
    <packaging>jar</packaging>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <kotlin.version>1.3.72</kotlin.version>
        <kotlin.code.style>official</kotlin.code.style>
    </properties>

    <repositories> <!-- Kotlin Spark API is currently published on jitpack.io -->
        <repository>
            <id>jitpack.io</id>
            <url>https://jitpack.io</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency> <!-- Kotlin Standard Library dependency -->
            <groupId>org.jetbrains.kotlin</groupId>
            <artifactId>kotlin-stdlib</artifactId>
            <version>${kotlin.version}</version>
        </dependency>
        <dependency> <!-- Kotlin Spark API dependency -->
            <groupId>com.github.JetBrains.kotlin-spark-api</groupId>
            <artifactId>kotlin-spark-api</artifactId>
            <version>0.1.0</version>
        </dependency>
        <dependency> <!-- Spark dependency -->
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>3.0.0-preview2</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin> <!-- Bundles the Kotlin dependencies into the application jar -->
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.4</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <artifactSet>
                                <includes>
                                    <include>com.github.JetBrains.kotlin-spark-api:*</include>
                                    <include>org.jetbrains.kotlin:*</include>
                                </includes>
                            </artifactSet>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

            <plugin> <!-- Compiles the Kotlin sources -->
                <groupId>org.jetbrains.kotlin</groupId>
                <artifactId>kotlin-maven-plugin</artifactId>
                <version>${kotlin.version}</version>
                <configuration>
                    <sourceDirs>src/main/kotlin</sourceDirs>
                    <jvmTarget>1.8</jvmTarget>
                    <myIncremental>true</myIncremental>
                </configuration>
                <executions>
                    <execution>
                        <id>compile</id>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
```

Here's what the project structure should look like:
```
./pom.xml
./src
./src/main
./src/main/kotlin
./src/main/kotlin/SimpleApp.kt
```

Now you can package the application using Maven:
`mvn package`
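
After packaging, you may want to confirm that the shaded jar really contains the Kotlin classes. The sketch below assumes the JDK's `jar` tool is on your `PATH`, that you run it from the project root, and that the bundled classes live under `kotlin/` (Kotlin Standard Library) and `org/jetbrains/spark/api/` (the package imported in `SimpleApp.kt`). `has_entry_prefix` is a hypothetical helper name:

```shell
# Succeed if any entry name read from stdin starts with prefix $1.
# The prefix is used as a grep pattern, so keep it free of regex
# metacharacters. has_entry_prefix is a hypothetical helper.
has_entry_prefix() {
    grep -q "^$1"
}

jar_file="target/kotlin-spark-example-1.0-SNAPSHOT.jar"
if [ -f "$jar_file" ]; then
    for prefix in kotlin/ org/jetbrains/spark/api/; do
        # "jar tf" lists the entries of the archive, one per line.
        if jar tf "$jar_file" | has_entry_prefix "$prefix"; then
            echo "found:   $prefix"
        else
            echo "missing: $prefix"
        fi
    done
fi
```

A `missing` line suggests that the corresponding `<include>` pattern in the shade plugin configuration did not match.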

When packaging is done, you can execute the application with `spark-submit`, located in the `bin` directory of your Spark installation:

`YOUR_SPARK_HOME/bin/spark-submit --class "SimpleApp" --master local YOUR_PROJECT/target/kotlin-spark-example-1.0-SNAPSHOT.jar`
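
If you run the application repeatedly, the command can be wrapped in a small function. This is a sketch, assuming `SPARK_HOME` is set as described in the prerequisites; `run_simple_app` is a hypothetical helper name, while the class name, master, and jar name are the ones used in this guide:

```shell
# Submit the packaged example found under project directory $1.
# run_simple_app is a hypothetical helper; it returns 1 with a message on
# stderr when SPARK_HOME is unset or the jar has not been built yet.
run_simple_app() {
    project_dir="$1"
    jar_file="$project_dir/target/kotlin-spark-example-1.0-SNAPSHOT.jar"
    if [ -z "${SPARK_HOME:-}" ]; then
        echo "SPARK_HOME is not set" >&2
        return 1
    fi
    if [ ! -f "$jar_file" ]; then
        echo "Jar not found: $jar_file (run 'mvn package' first)" >&2
        return 1
    fi
    "$SPARK_HOME/bin/spark-submit" --class SimpleApp --master local "$jar_file"
}

# Usage: run_simple_app YOUR_PROJECT
```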

This example is also available as a [GitHub repo](https://github.com/MKhalusova/kotlin-spark-example). Feel free to give it a try.