README.md: 18 additions & 19 deletions
@@ -1,30 +1,30 @@
 # Kotlin Spark API


-Your next API to work with [Spark](https://spark.apache.org/)
+Your next API to work with [Spark](https://spark.apache.org/).

-We are looking to have this as a part of https://github.com/apache/spark repository. Consider this as beta-quality software.
+We are looking to have this as part of the https://github.com/apache/spark repository. Consider this beta-quality software.

 ## Goal

 This project adds a missing layer of compatibility between [Kotlin](https://kotlinlang.org/) and [Spark](https://spark.apache.org/).

-Despite Kotlin having first-class compatibility API, Kotlin developers might want to use familiar features like data classes and lambda expressions as simple expressions in curly braces or method references.
+Despite Kotlin having a first-class compatibility API, Kotlin developers may want to use familiar features like data classes and lambda expressions written as simple expressions in curly braces or as method references.

 ## Non-goals

-There is no goal to replace any currently supported language or provide them with some functionality to support Kotlin language.
+There is no goal to replace any currently supported language or to provide other APIs with functionality to support the Kotlin language.

 ## Installation

-Currently, there are no kotlin-spark-api artifacts in maven central, but you can obtain copy using JitPack here: [](https://jitpack.io/#JetBrains/kotlin-spark-api)
+Currently, there are no kotlin-spark-api artifacts in Maven Central, but you can obtain a copy using [JitPack](https://jitpack.io/#JetBrains/kotlin-spark-api).

 There is support for `Maven`, `Gradle`, `SBT`, and `Leiningen` on JitPack.

-This project does not force you to use any concrete version of spark, but we've only tested it with spark `3.0.0-preview2`.
-We believe it should also work fine with version `2.4.5`
+This project does not force you to use any specific version of Spark, but it has only been tested with Spark `3.0.0-preview2`.
+We believe it can work with Spark `2.4.5`, but we cannot guarantee that.

-So if you're using Maven you'll hve to add following into your `pom.xml`:
+So if you're using Maven, you'll have to add the following to your `pom.xml`:

 ```xml
 <repositories>
@@ -55,7 +55,7 @@ First (and hopefully last) thing you need to do is to add following import to yo
 import org.jetbrains.spark.api.*
 ```

-Then you can create SparkSession we all remember and love
+Then you can create a SparkSession:

 ```kotlin
 val spark = SparkSession
@@ -65,25 +65,24 @@ val spark = SparkSession

 ```

-To create Dataset you may call `toDS` method like this
+To create a Dataset you can call the `toDS` method:

 ```kotlin
 spark.toDS("a" to 1, "b" to 2)
 ```

 Indeed, this produces `Dataset<Pair<String, Int>>`. There are a couple more `toDS` methods which accept different arguments.

-Also, there are several interesting aliases in API, like `leftJoin`, `rightJoin` etc.
-Interesting fact about them that they're null-safe by design. For example, `leftJoin` is aware of nullability and returns `Dataset<Pair<LEFT, RIGHT?>>`.
-Note that were forcing `RIGHT` to be nullable for you as a developer to be able to handle this situation.
+There are also several aliases in the API, such as `leftJoin` and `rightJoin`. These are null-safe by design. For example, `leftJoin` is aware of nullability and returns `Dataset<Pair<LEFT, RIGHT?>>`.
+Note that we force `RIGHT` to be nullable so that you, as a developer, have to handle the case where no matching right-hand row exists.

-We know that `NullPointerException`s are hard to debug in Spark And trying hard to make them happen as rare as possible.
+We know that `NullPointerException`s are hard to debug in Spark, and we are trying hard to make them as rare as possible.

 ## Useful helper methods

 ### `withSpark`

-We provide you with useful function `withSpark`, which accepts everything that may be needed to run spark - properties, name, master location and so on. It also accepts a block of code to execute inside spark context.
+We provide a useful function, `withSpark`, which accepts everything that may be needed to run Spark: properties, name, master location and so on. It also accepts a block of code to execute inside the Spark context.

 After the work block ends, `spark.stop()` is called automatically.

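A note on the null-safety claim in the hunk above: the idea behind a return type like `Dataset<Pair<LEFT, RIGHT?>>` can be illustrated without Spark. The sketch below is a plain-Kotlin analogy over `List`, not the library's implementation. `leftJoinSketch` and the sample data are hypothetical names introduced only for illustration. It shows why a nullable right side makes the compiler insist on handling unmatched rows.

```kotlin
// Plain-Kotlin analogy of a null-safe left join.
// `leftJoinSketch` is a hypothetical helper, NOT part of kotlin-spark-api.
fun <L, R> leftJoinSketch(
    left: List<L>,
    right: List<R>,
    matches: (L, R) -> Boolean
): List<Pair<L, R?>> =
    left.flatMap { l ->
        val hits: List<R?> = right.filter { r -> matches(l, r) }
        // A left row with no match is kept and paired with null.
        val padded: List<R?> = if (hits.isEmpty()) listOf(null) else hits
        padded.map { r -> l to r }
    }

fun main() {
    val users = listOf(1 to "ann", 2 to "bob")
    val orders = listOf(1 to "book")
    val joined = leftJoinSketch(users, orders) { u, o -> u.first == o.first }
    for ((user, order) in joined) {
        // `order` has type Pair<Int, String>?, so a safe call or null check is required.
        println("${user.second}: ${order?.second ?: "no orders"}")
    }
}
```

Because the right element is typed as nullable, accessing `order.second` directly would not compile. The missing-match case has to be handled at compile time instead of surfacing as a `NullPointerException` at run time.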
@@ -99,9 +98,9 @@ withSpark {

 ### `withCached`

-It may easily happen that we need to fork our computation to several paths. To compute things only once we should call `cache`
+It can easily happen that we need to fork our computation into several paths. To compute things only once we should call the `cache`
 method. But then it is hard to control when we're using the cached `Dataset` and when not.
-It is also easy to forget to unpersist cached data, which may make break things unexpectably or take more memory
+It is also easy to forget to unpersist cached data, which can break things unexpectedly or take more memory
 than intended.

 To solve these problems we introduce the `withCached` function.
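The scoping idea described for `withCached` (cache, run a block, then always unpersist) can be sketched in plain Kotlin. This is an illustration under stated assumptions, not the library's code: `FakeDataset` is a hypothetical stand-in that mimics only Spark's `cache()` and `unpersist()` calls, `withCachedSketch` is a made-up name, and using `try`/`finally` is one possible design choice.

```kotlin
// Hypothetical stand-in for a Spark Dataset; it only tracks whether it is cached.
class FakeDataset(val rows: List<Int>) {
    var cached = false
        private set

    fun cache(): FakeDataset { cached = true; return this }
    fun unpersist(): FakeDataset { cached = false; return this }
}

// Hypothetical helper: caches the receiver, runs the block on the cached
// value, and unpersists afterwards even if the block throws.
fun <R> FakeDataset.withCachedSketch(block: FakeDataset.() -> R): R {
    val cachedDs = cache()
    try {
        return cachedDs.block()
    } finally {
        cachedDs.unpersist()
    }
}

fun main() {
    val ds = FakeDataset(listOf(1, 2, 3, 4))
    val evens = ds.withCachedSketch {
        println("cached inside block: $cached")
        rows.filter { it % 2 == 0 }
    }
    println("evens = $evens, cached after block: ${ds.cached}")
}
```

With this shape, the block's boundaries show exactly where the cached data is in use, and there is no separate unpersist call to forget.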
@@ -124,10 +123,10 @@ Here we're showing cached `Dataset` for debugging purposes then filtering it. Th

 ## Examples

-You can find more examples in [examples](https://github.com/JetBrains/kotlin-spark-api/tree/master/examples/src/main/kotlin/org/jetbrains/spark/api/examples) module.
+For more, check out the [examples](https://github.com/JetBrains/kotlin-spark-api/tree/master/examples/src/main/kotlin/org/jetbrains/spark/api/examples) module.

 ## Issues and feedback

 Issues and any feedback are very welcome in `Issues` here.

-If you find that we missed some important features - please report it, and we'll consider adding them.
+If you find that we missed some important features, let us know!