I got it working, but it was brutal: about 300 lines of code. I suspect I did it the hard way, but after reading the CSV parser code I couldn't tell whether there was an easier one. Here is what I ended up doing:
- Parsing the Strings into Longs, Doubles, Strings
- Finding out the "worst" type for each column and normalizing across the column
- Making lookup tables for each column that needs one (columns with a small number of distinct ints, or Strings)
- Generating a dataset based on the output column name
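For anyone wondering what the first two steps look like, here is a minimal sketch in plain Java. It is not the library's API and not my exact code: `ColumnTypeSketch`, `ColType`, `inferCellType` and `inferColumnType` are hypothetical names made up for illustration, and it assumes every cell is a non-null String.

```java
import java.util.Collection;

class ColumnTypeSketch {
    // Ordered from narrowest to widest; a later constant is a "worse" type.
    enum ColType { LONG, DOUBLE, STRING }

    // Narrowest type that can represent a single raw cell (assumed non-null).
    static ColType inferCellType(String raw) {
        String s = raw.trim();
        try {
            Long.parseLong(s);
            return ColType.LONG;
        } catch (NumberFormatException notLong) {
            // not a long, try double next
        }
        try {
            // Note: parseDouble also accepts forms such as "NaN" and "1e3".
            Double.parseDouble(s);
            return ColType.DOUBLE;
        } catch (NumberFormatException notDouble) {
            return ColType.STRING;
        }
    }

    // The "worst" (widest) type seen in a column; every cell is then
    // normalized to this type.
    static ColType inferColumnType(Collection<String> cells) {
        ColType worst = ColType.LONG;
        for (String cell : cells) {
            ColType t = inferCellType(cell);
            if (t.compareTo(worst) > 0) {
                worst = t;
            }
        }
        return worst;
    }
}
```

With this, a column containing `"1"` and `"2.5"` would be treated as doubles, and a single non-numeric cell pushes the whole column to String.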
Is there an easier way to do this?
Can it be part of the library?
class TableDataLoader
- TableDataLoader(Table<Long, String, String>)
- getDataSet(String)
- tableToDataSet_Classification(ColumnInfo, List, SortedSet, int, int)
- tableToDataSet_Regression(ColumnInfo, List, SortedSet, int, int)
class ColumnInfo
- ColumnInfo(String, Map<Long, String>)
- collectionToSortedUniqueStringList(Collection)
- parseColumn(Map<Long, String>)
- parseToLowestObject(String, Class<?>)
- constructJSATCategoricalData()
- constructLabelLookups()
- getCategoricalData()
- getName()
- getType()
- isLookup()
- getRowValue(Number)
- getKeyFromLookupId(int)
- getAllRowKeys()
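The lookup part of ColumnInfo boils down to something like the sketch below: collect the sorted unique values of a column and assign each one an integer id. This is only an illustration in plain Java; `LookupTableSketch`, `getId`, `getKey` and `size` are hypothetical names, and the real class has more going on.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class LookupTableSketch {
    private final List<String> idToKey;
    private final Map<String, Integer> keyToId = new HashMap<>();

    // Builds ids 0..n-1 over the sorted, de-duplicated column values.
    LookupTableSketch(Collection<String> columnValues) {
        idToKey = new ArrayList<>(new TreeSet<>(columnValues));
        for (int id = 0; id < idToKey.size(); id++) {
            keyToId.put(idToKey.get(id), id);
        }
    }

    // Category id for a raw value; fails loudly on values never seen.
    int getId(String key) {
        Integer id = keyToId.get(key);
        if (id == null) {
            throw new IllegalArgumentException("Unknown key: " + key);
        }
        return id;
    }

    // Reverse lookup, from category id back to the original value.
    String getKey(int id) {
        return idToKey.get(id);
    }

    // Number of distinct categories in the column.
    int size() {
        return idToKey.size();
    }
}
```

Sorting the values first keeps the ids stable no matter what order the rows arrive in, which is why I went through a sorted unique list rather than assigning ids on first sight.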