# Quick Start Guide

This tutorial provides instructions to help you get started with Kotlin Spark API. We use an example similar to the official [Apache Spark
Quick Start Guide](https://spark.apache.org/docs/3.0.0-preview2/quick-start.html#self-contained-applications).
You'll learn how to set up your environment and how to write, package, and execute a simple self-contained application.

Prerequisites:
- You need to have Java installed and the `JAVA_HOME` environment variable pointing to the Java installation.
- You need to have Apache Spark installed and the `SPARK_HOME` environment variable pointing to the Spark installation.
  We recommend Apache Spark 3.0.0-preview2, which you can download from the [Spark official website](https://spark.apache.org/downloads.html).

Note: You can use Apache Spark 2.4.5, but we haven't tested Kotlin Spark API with it.
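
If you want to confirm the prerequisites before going further, here is a minimal sketch of a check you could run from any Kotlin program, such as a scratch file in your IDE. The helper names `checkHome` and `checkEnvironment` are hypothetical. The check also assumes a Unix-like layout where the launchers live at `bin/java` and `bin/spark-submit`.

```kotlin
import java.io.File

/**
 * Hypothetical helper: returns a description of what is wrong with an environment
 * variable that should point to an installation directory, or null if it looks usable.
 * [value] is the variable's value (null when unset); [requiredFile] is a path relative
 * to that directory that must exist.
 */
fun checkHome(name: String, value: String?, requiredFile: String): String? {
    if (value.isNullOrBlank()) return "$name is not set"
    val home = File(value)
    if (!home.isDirectory) return "$name points to '$value', which is not a directory"
    // Assumes a Unix-like layout; on Windows the launchers are java.exe / spark-submit.cmd.
    if (!File(home, requiredFile).exists()) return "$name has no $requiredFile"
    return null
}

/** Hypothetical helper: checks JAVA_HOME and SPARK_HOME and returns all problems found. */
fun checkEnvironment(env: Map<String, String> = System.getenv()): List<String> =
    listOfNotNull(
        checkHome("JAVA_HOME", env["JAVA_HOME"], "bin/java"),
        checkHome("SPARK_HOME", env["SPARK_HOME"], "bin/spark-submit")
    )
```

For example, `checkEnvironment().forEach { println(it) }` prints nothing when both variables look usable.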

## Self-contained application

For the purposes of this tutorial, let's write a Kotlin program that counts the number of lines containing 'a'
and the number of lines containing 'b' in the Spark README. Note that you'll need to replace `YOUR_SPARK_HOME` with the
location where Spark is installed:

```kotlin
/* SimpleApp.kt */
@file:JvmName("SimpleApp")
import org.jetbrains.spark.api.*

fun main() {
    val logFile = "YOUR_SPARK_HOME/README.md" // Change to your Spark Home path
    // withSpark creates a SparkSession for the block and stops it when the block finishes
    withSpark {
        // withCached caches the Dataset so both counts can reuse it, then unpersists it
        spark.read().textFile(logFile).withCached {
            val numAs = filter { it.contains("a") }.count()
            val numBs = filter { it.contains("b") }.count()
            println("Lines with a: $numAs, lines with b: $numBs")
        }
    }
}
```
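
Before running the Spark job, it can help to know which numbers to expect. The sketch below computes the same two counts with plain Kotlin I/O on a single machine, so you can compare them with the job's output. It also shows one way to build the README path from `SPARK_HOME` instead of editing the `YOUR_SPARK_HOME` placeholder. The helpers `countLinesContaining`, `countAsAndBs`, and `sparkReadmePath` are hypothetical names used for illustration; they are not part of Kotlin Spark API. The counts should match the Spark job's counts for an ordinary text file such as the README.

```kotlin
import java.io.File

/** Hypothetical helper: counts lines containing [needle], like `filter { it.contains(needle) }.count()`. */
fun countLinesContaining(lines: Sequence<String>, needle: String): Long =
    lines.count { it.contains(needle) }.toLong()

/** Hypothetical helper: reads the file at [path] and returns the (a, b) line counts. */
fun countAsAndBs(path: String): Pair<Long, Long> {
    // useLines closes the file once the sequence has been consumed
    val numAs = File(path).useLines { countLinesContaining(it, "a") }
    val numBs = File(path).useLines { countLinesContaining(it, "b") }
    return numAs to numBs
}

/** Hypothetical helper: builds the README path from SPARK_HOME, failing if it is unset. */
fun sparkReadmePath(sparkHome: String? = System.getenv("SPARK_HOME")): String =
    File(requireNotNull(sparkHome) { "SPARK_HOME is not set" }, "README.md").path
```

If you copy `sparkReadmePath` into `SimpleApp.kt`, you could replace the hard-coded path with `val logFile = sparkReadmePath()`.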

## Building the application with Maven

Because Kotlin Spark API is not yet part of the official Apache Spark distribution, adding Spark
as a dependency in your `pom.xml` file is not enough.
You need to:
- Add Spark as a dependency
- Add Kotlin Spark API as a dependency
- Add Kotlin Standard Library as a dependency

When packaging your project into a jar file, you need to explicitly include the Kotlin Spark API and Kotlin Standard Library
dependencies, for example, using `maven-shade-plugin`. At runtime, `spark-submit` puts Spark's own classes on the classpath, but not these two libraries.

Here's what the `pom.xml` looks like for this example:
```xml
<project>
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>kotlin-spark-example</artifactId>
    <version>1.0-SNAPSHOT</version>

    <name>Sample Project</name>
    <packaging>jar</packaging>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <kotlin.version>1.3.72</kotlin.version>
        <kotlin.code.style>official</kotlin.code.style>
    </properties>

    <repositories> <!-- Kotlin Spark API is currently published on jitpack.io -->
        <repository>
            <id>jitpack.io</id>
            <url>https://jitpack.io</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.jetbrains.kotlin</groupId>
            <artifactId>kotlin-stdlib</artifactId>
            <version>${kotlin.version}</version>
        </dependency>
        <dependency> <!-- Kotlin Spark API dependency -->
            <groupId>com.github.JetBrains.kotlin-spark-api</groupId>
            <artifactId>kotlin-spark-api</artifactId>
            <version>0.1.0</version>
        </dependency>
        <dependency> <!-- Spark dependency -->
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>3.0.0-preview2</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin> <!-- Bundles only the Kotlin libraries; spark-submit supplies Spark itself -->
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.4</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <artifactSet>
                                <includes>
                                    <include>com.github.JetBrains.kotlin-spark-api:*</include>
                                    <include>org.jetbrains.kotlin:*</include>
                                </includes>
                            </artifactSet>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.jetbrains.kotlin</groupId>
                <artifactId>kotlin-maven-plugin</artifactId>
                <version>${kotlin.version}</version>
                <configuration>
                    <sourceDirs>
                        <sourceDir>src/main/kotlin</sourceDir>
                    </sourceDirs>
                    <jvmTarget>1.8</jvmTarget>
                    <myIncremental>true</myIncremental>
                </configuration>
                <executions>
                    <execution>
                        <id>compile</id>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
```

Here's what the project structure should look like:
```
./pom.xml
./src
./src/main
./src/main/kotlin
./src/main/kotlin/SimpleApp.kt
```

Now you can package the application using Maven:
`mvn package`
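
The shade configuration bundles only the artifacts listed in `<includes>`, so one quick way to confirm that packaging worked is to look inside the jar. The following sketch reports which expected components are missing from a jar. It assumes the entry prefixes `kotlin/` for the Kotlin Standard Library and `org/jetbrains/spark/api/` for Kotlin Spark API, the latter matching the import in `SimpleApp.kt`. `missingComponents` and `checkShadedJar` are hypothetical helpers.

```kotlin
import java.util.jar.JarFile

// Assumed entry-name prefixes for each component that should end up in the shaded jar.
val requiredEntryPrefixes = mapOf(
    "Kotlin Standard Library" to "kotlin/",
    "Kotlin Spark API" to "org/jetbrains/spark/api/",
    "application main class" to "SimpleApp.class"
)

/** Hypothetical helper: returns the components with no matching entry in [entryNames]. */
fun missingComponents(entryNames: Collection<String>): List<String> =
    requiredEntryPrefixes
        .filter { (_, prefix) -> entryNames.none { it.startsWith(prefix) } }
        .keys.toList()

/** Hypothetical helper: lists the entries of the jar at [jarPath] and checks them. */
fun checkShadedJar(jarPath: String): List<String> =
    JarFile(jarPath).use { jar -> missingComponents(jar.entries().toList().map { it.name }) }
```

After a successful `mvn package`, `checkShadedJar("target/kotlin-spark-example-1.0-SNAPSHOT.jar")` should return an empty list.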
| 151 | + |
| 152 | +When done, you can execute the packaged application with `./bin/spark-submit`: |
| 153 | + |
| 154 | +`YOUR_SPARK_HOME/bin/spark-submit --class "SimpleApp" --master local YOUR_PROJECT/target/kotlin-spark-example-1.0-SNAPSHOT.jar` |
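
The jar path in this command follows Maven's default naming of `<artifactId>-<version>.jar` under `target/`. With the `pom.xml` above, that gives `kotlin-spark-example-1.0-SNAPSHOT.jar`. The sketch below assembles the same command from those parts and can launch it with `ProcessBuilder`. `sparkSubmitCommand` and `runSparkSubmit` are hypothetical helpers, and their default argument values are assumptions copied from this guide.

```kotlin
import java.io.File

/**
 * Hypothetical helper: builds the spark-submit argument list used in this guide.
 * The jar name mirrors Maven's default `<artifactId>-<version>.jar` naming.
 */
fun sparkSubmitCommand(
    sparkHome: String,
    projectDir: String,
    mainClass: String = "SimpleApp",
    master: String = "local",
    artifactId: String = "kotlin-spark-example",
    version: String = "1.0-SNAPSHOT"
): List<String> {
    val submit = File(sparkHome, "bin/spark-submit").path
    val jar = File(projectDir, "target/$artifactId-$version.jar").path
    return listOf(submit, "--class", mainClass, "--master", master, jar)
}

/** Hypothetical helper: runs the command with inherited console I/O and returns its exit code. */
fun runSparkSubmit(command: List<String>): Int =
    ProcessBuilder(command).inheritIO().start().waitFor()
```

`--master local` runs Spark with a single worker thread. `local[*]` would use one worker thread per logical core.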
| 155 | + |
| 156 | +This example is also available as a [GitHub repo](https://github.com/MKhalusova/kotlin-spark-example), feel free to give it a try. |
| 157 | + |
| 158 | + |
| 159 | + |
| 160 | + |
| 161 | + |
| 162 | + |
| 163 | + |
| 164 | + |
0 commit comments