diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 0db17f204..698f83143 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,27 +1,50 @@ # Contributing guide -## Installing daru development dependencies +## Ruby toolchain -Either nmatrix or rb-gsl are NOT NECESSARY for using daru. They are just required for an optional speed up and for running the test suite. +This fork uses MRI Ruby `4.0.1` for development and CI. -To install dependencies, execute the following commands: +```bash +mise trust +mise use ruby@4.0.1 +bundle install +``` + +## Installing optional development dependencies + +`nmatrix` and `rb-gsl` are optional acceleration backends. They are not required +for the default test suite. + +Some integration suites depend on external services and native/system packages: + +- SQL and ActiveRecord integration specs require a compatible sqlite stack. +- DBI integration specs require DBI + sqlite adapter compatibility. +- Rserve integration specs require an available Rserve daemon. +- Gruff specs require ImageMagick/rmagick dependencies. + +Example Linux setup for the optional stacks: ``` bash sudo apt-get update -qq sudo apt-get install -y libgsl0-dev r-base r-base-dev sudo Rscript -e "install.packages(c('Rserve','irr'),,'http://cran.us.r-project.org')" sudo apt-get install libmagickwand-dev imagemagick -export DARU_TEST_NMATRIX=1 # for running nmatrix tests. -export DARU_TEST_GSL=1 # for running rb-GSL tests. -bundle install ``` -You don't need `DARU_TEST_NMATRIX` or `DARU_TEST_GSL` if you don't want to make changes -to those parts of the code. However, they will be set in CI and will raise a test failure -if something goes wrong. 
-And run the test suite (should be all green with pending tests): +Run the default suite: + +`bundle exec rspec` - `bundle exec rspec` +Run optional suites explicitly: + +```bash +DARU_TEST_SQL=1 bundle exec rspec --tag sql +DARU_TEST_DBI=1 bundle exec rspec --tag dbi +DARU_TEST_RSERVE=1 bundle exec rspec --tag rserve +DARU_TEST_NMATRIX=1 bundle exec rspec --tag nmatrix +DARU_TEST_GSL=1 bundle exec rspec --tag gsl +DARU_TEST_GRUFF=1 bundle exec rspec --tag gruff +``` If you have problems installing nmatrix, please consult the [nmatrix installation wiki](https://github.com/SciRuby/nmatrix/wiki/Installation) or the [mailing list](https://groups.google.com/forum/#!forum/sciruby-dev). @@ -29,8 +52,6 @@ If you have problems installing nmatrix, please consult the [nmatrix installatio While preparing your pull requests, don't forget to check your code with Rubocop: `bundle exec rubocop` - -[Optional] Install all Ruby versions which Daru currently supports with `rake spec setup`. ## Basic Development Flow @@ -41,8 +62,6 @@ While preparing your pull requests, don't forget to check your code with Rubocop 4. Run the test suite with `rake spec`. (Alternatively you can use `guard` as described [here](https://github.com/SciRuby/daru/blob/master/CONTRIBUTING.md#testing). Also run Rubocop coding style guidelines with `rake cop`. 5. Commit the changes with `git commit -am "briefly describe what you did"` and submit pull request. -[Optional] You can run rspec for all Ruby versions at once with `rake spec run all`. But remember to first have all Ruby versions installed with `ruby spec setup`. - ## Testing diff --git a/History.md b/History.md index 8abdb50ea..175424402 100644 --- a/History.md +++ b/History.md @@ -1,3 +1,19 @@ +# Unreleased +* Major Enhancements + - Port development baseline to MRI Ruby 4.0.1. + - Add `mise.toml` toolchain configuration for reproducible local setup. + - Add runtime stdlib dependencies (`matrix`, `csv`) required on modern Ruby. 
+ - Add missing development dependencies used by specs (`prime`, `mutex_m`, `benchmark`). +* Fixes + - Restore compatibility for CSV keyword arguments and URL reading via `URI.open`. + - Add `GroupBy#[]` for scalar and tuple-style group access. + - Fix `DataFrame` and `Vector` behavior regressions around mixed indexes and row/vector mutation. + - Add `DateTimeIndex.format` support for explicit parsing format. + - Improve SQL file source handling by supporting `sqlite3` connections directly. +* Testing + - Remove remaining pending examples from the default suite. + - Make optional integration suites (`sql`, `dbi`, `rserve`, `gsl`, `nmatrix`, `gruff`) opt-in and capability-aware. + # 0.3 (30 May 2020) * Major Enhacements - Remove official support for Ruby < 2.5.1. Now we only test with 2.5.1 and 2.7.1. (@v0dro) diff --git a/README.md b/README.md index 6bbec1655..038ecc5bb 100644 --- a/README.md +++ b/README.md @@ -11,7 +11,7 @@ daru (Data Analysis in RUby) is a library for storage, analysis, manipulation an daru makes it easy and intuitive to process data predominantly through 2 data structures: `Daru::DataFrame` and `Daru::Vector`. Written in pure Ruby works with all ruby implementations. -Tested with MRI 2.5.1 and 2.7.1. +Current development and CI baseline in this fork is MRI 4.0.1. ## daru plugin gems @@ -53,6 +53,33 @@ This gem extends support for many Import and Export methods of `Daru::DataFrame` $ gem install daru ``` +## Development Setup + +This fork is tested on Ruby `4.0.1` and includes a `mise.toml` toolchain file. 
+ +```console +$ mise trust +$ mise use ruby@4.0.1 +$ bundle install +$ bundle exec rspec +``` + +Optional integration specs are excluded by default and can be enabled explicitly: + +```console +$ DARU_TEST_SQL=1 bundle exec rspec --tag sql +$ DARU_TEST_DBI=1 bundle exec rspec --tag dbi +$ DARU_TEST_RSERVE=1 bundle exec rspec --tag rserve +``` + +Optional native backends are also opt-in: + +```console +$ DARU_TEST_GSL=1 bundle exec rspec --tag gsl +$ DARU_TEST_NMATRIX=1 bundle exec rspec --tag nmatrix +$ DARU_TEST_GRUFF=1 bundle exec rspec --tag gruff +``` + ## Notebooks #### Notebooks on most use cases diff --git a/daru.gemspec b/daru.gemspec index 28cf546db..c4a598267 100644 --- a/daru.gemspec +++ b/daru.gemspec @@ -29,6 +29,8 @@ Gem::Specification.new do |spec| # it is required by NMatrix, yet we want to specify clearly which minimal version is OK spec.add_runtime_dependency 'packable', '~> 1.3.13' + spec.add_runtime_dependency 'matrix' + spec.add_runtime_dependency 'csv' spec.add_development_dependency 'spreadsheet', '~> 1.1.1' spec.add_development_dependency 'bundler', '>= 1.10' @@ -42,18 +44,21 @@ Gem::Specification.new do |spec| spec.add_development_dependency 'nyaplot', '~> 0.1.5' spec.add_development_dependency 'nmatrix', '~> 0.2.1' if ENV['DARU_TEST_NMATRIX'] spec.add_development_dependency 'distribution', '~> 0.7' + spec.add_development_dependency 'prime' spec.add_development_dependency 'gsl', '~>2.1.0.2' if ENV['DARU_TEST_GSL'] spec.add_development_dependency 'dbd-sqlite3' spec.add_development_dependency 'dbi' spec.add_development_dependency 'activerecord', '~> 6.0' + spec.add_development_dependency 'mutex_m' + spec.add_development_dependency 'benchmark' spec.add_development_dependency 'mechanize' - # issue : https://github.com/SciRuby/daru/issues/493 occured - # with latest version of sqlite3 spec.add_development_dependency 'sqlite3' spec.add_development_dependency 'rubocop', '~> 0.49.0' spec.add_development_dependency 'ruby-prof' 
spec.add_development_dependency 'simplecov' - spec.add_development_dependency 'gruff' + # Gruff pulls native ImageMagick dependencies through rmagick. + # Keep it opt-in for environments that explicitly test plotting via Gruff. + spec.add_development_dependency 'gruff' if ENV['DARU_TEST_GRUFF'] spec.add_development_dependency 'webmock' spec.add_development_dependency 'nokogiri' diff --git a/lib/daru/core/group_by.rb b/lib/daru/core/group_by.rb index 78ef9cbf0..060b6946d 100644 --- a/lib/daru/core/group_by.rb +++ b/lib/daru/core/group_by.rb @@ -273,6 +273,15 @@ def get_group group ) end + # Returns a group as a DataFrame. Accepts scalar keys for single-level + # groups and tuple-like keys for multi-level groups. + def [](*group) + group = group.first if group.size == 1 && group.first.is_a?(Array) + group = [group] unless group.is_a?(Array) + + get_group(group) + end + # Iteratively applies a function to the values in a group and accumulates the result. # @param init (nil) The initial value of the accumulator. # @yieldparam block [Proc] A proc or lambda that accepts two arguments. The first argument diff --git a/lib/daru/dataframe.rb b/lib/daru/dataframe.rb index 5232bf8d5..739ecf9a9 100644 --- a/lib/daru/dataframe.rb +++ b/lib/daru/dataframe.rb @@ -2468,6 +2468,10 @@ def aggregate(options={}, multi_index_level=-1) end def group_by_and_aggregate(*group_by_keys, **aggregation_map) + if aggregation_map.empty? 
&& group_by_keys.last.is_a?(Hash) + aggregation_map = group_by_keys.pop + end + group_by(*group_by_keys).aggregate(aggregation_map) end @@ -2863,9 +2867,12 @@ def deduce_index index, source, vectors_have_same_index elsif vectors_have_same_index source.values[0].index.dup else - all_indexes = source - .values.map { |v| v.index.to_a } - .flatten.uniq.sort # sort only if missing indexes detected + all_indexes = source.values.flat_map { |v| v.index.to_a }.uniq + begin + all_indexes = all_indexes.sort + rescue ArgumentError + # Mixed / non-comparable index types: preserve insertion order. + end Daru::Index.new all_indexes end @@ -3055,7 +3062,10 @@ def coerce_vector vector def update_data source, vectors @data = @vectors.each_with_index.map do |_vec, idx| - Daru::Vector.new(source[idx], index: @index, name: vectors[idx]) + vec_source = source[idx] + vec_source = vec_source.dup if vec_source.respond_to?(:dup) + + Daru::Vector.new(vec_source, index: @index, name: vectors[idx]) end end diff --git a/lib/daru/date_time/index.rb b/lib/daru/date_time/index.rb index 6847ca32e..feeff5490 100644 --- a/lib/daru/date_time/index.rb +++ b/lib/daru/date_time/index.rb @@ -124,7 +124,12 @@ def date_time_from date_string, date_precision date_string.match(/\-\d?\d/).to_s.delete('-').to_i ) else - DateTime.parse date_string + # Keep backward-compatible configurable parsing when format is set. 
+ if Daru::DateTimeIndex.format + DateTime.strptime(date_string, Daru::DateTimeIndex.format) + else + DateTime.parse(date_string) + end end end @@ -215,6 +220,10 @@ class DateTimeIndex < Index include Enumerable Helper = DateTimeIndexHelper + class << self + attr_accessor :format + end + def self.try_create(source) if source && ArrayHelper.array_of?(source, ::DateTime) new(source, freq: :infer) diff --git a/lib/daru/io/io.rb b/lib/daru/io/io.rb index 1555c5d79..e92adebdb 100644 --- a/lib/daru/io/io.rb +++ b/lib/daru/io/io.rb @@ -1,4 +1,5 @@ module Daru + require 'open-uri' require_relative 'csv/converters.rb' module IOHelpers class << self @@ -16,6 +17,24 @@ def process_row(row,empty) end end + def process_fixed_width_row(line, ranges) + ranges.map do |range| + cell = line[range].to_s.strip + cell.empty? ? nil : try_string_to_number(cell) + end + end + + def fixed_width_ranges(line, expected_columns=nil) + starts = line.to_enum(:scan, /\S+/).map { Regexp.last_match.begin(0) } + return [] if starts.empty? + + starts = starts.first(expected_columns) if expected_columns + starts.each_with_index.map do |start_at, idx| + end_at = starts[idx + 1] || line.length + (start_at...end_at) + end + end + private INT_PATTERN = /^[-+]?\d+$/ @@ -103,7 +122,7 @@ def dataframe_write_csv dataframe, path, opts={} converters: :numeric }.merge(opts) - writer = ::CSV.open(path, 'w', options) + writer = ::CSV.open(path, 'w', **options) writer << dataframe.vectors.to_a unless options[:headers] == false dataframe.each_row do |row| @@ -153,10 +172,21 @@ def from_activerecord(relation, *fields) def from_plaintext filename, fields ds = Daru::DataFrame.new({}, order: fields) - fp = File.open(filename,'r') - fp.each_line do |line| - row = Daru::IOHelpers.process_row(line.strip.split(/\s+/),['']) - next if row == ["\x1A"] + lines = File.readlines(filename) + first_data_line = lines.find { |line| !line.strip.empty? 
&& line.strip != "\x1A" } + ranges = Daru::IOHelpers.fixed_width_ranges(first_data_line.to_s, fields.size) + + lines.each do |line| + next if line.strip == "\x1A" + + row = + if ranges.size == fields.size && !ranges.empty? + Daru::IOHelpers.process_fixed_width_row(line, ranges) + else + Daru::IOHelpers.process_row(line.strip.split(/\s+/), ['']) + end + + row.concat([nil] * (fields.size - row.size)) if row.size < fields.size ds.add_row(row) end ds.update @@ -182,7 +212,7 @@ def load filename end def from_html path, opts - optional_gem 'mechanize', '~>2.7.5' + optional_gem 'mechanize', '>=2.7.5' page = Mechanize.new.get(path) page.search('table').map { |table| html_parse_table table } .keep_if { |table| html_search table, opts[:match] } @@ -231,7 +261,7 @@ def from_csv_prepare_converters(converters) def from_csv_hash_with_headers(path, opts) opts[:header_converters] ||= :symbol ::CSV - .parse(open(path), opts) + .parse(read_csv_source(path), **opts) .tap { |c| yield c if block_given? } .by_col.map { |col_name, values| [col_name, values] }.to_h end @@ -239,7 +269,7 @@ def from_csv_hash_with_headers(path, opts) def from_csv_hash(path, opts) csv_as_arrays = ::CSV - .parse(open(path), **opts) + .parse(read_csv_source(path), **opts) .tap { |c| yield c if block_given? 
} .to_a headers = ArrayHelper.recode_repeated(csv_as_arrays.shift) @@ -247,6 +277,16 @@ def from_csv_hash(path, opts) headers.each_with_index.map { |h, i| [h, csv_as_arrays[i]] }.to_h end + def read_csv_source(path) + path = path.to_s + + if path.match?(%r{\Ahttps?://}i) + URI.open(path, &:read) + else + File.read(path) + end + end + def html_parse_table(table) headers, headers_size = html_scrape_tag(table,'th') data, size = html_scrape_tag(table, 'td') diff --git a/lib/daru/io/sql_data_source.rb b/lib/daru/io/sql_data_source.rb index ca0aed0c6..f21651ff1 100644 --- a/lib/daru/io/sql_data_source.rb +++ b/lib/daru/io/sql_data_source.rb @@ -52,8 +52,27 @@ def result end end + # Private adapter class for sqlite3 gem connections + # @private + class Sqlite3Adapter < Adapter + private + + def column_names + result_table[0] || [] + end + + def rows + result_table.drop(1) + end + + def result_table + @result_table ||= @conn.execute2(@query) + end + end + private_constant :DbiAdapter private_constant :ActiveRecordConnectionAdapter + private_constant :Sqlite3Adapter def self.make_dataframe(db, query) new(db, query).make_dataframe @@ -75,10 +94,12 @@ def init_adapter(db, query) db = attempt_sqlite3_connection(db) if db.is_a?(String) && Pathname(db).exist? 
- case db - when DBI::DatabaseHandle + if defined?(DBI::DatabaseHandle) && db.is_a?(DBI::DatabaseHandle) DbiAdapter.new(db, query) - when ActiveRecord::ConnectionAdapters::AbstractAdapter + elsif defined?(SQLite3::Database) && db.is_a?(SQLite3::Database) + Sqlite3Adapter.new(db, query) + elsif defined?(ActiveRecord::ConnectionAdapters::AbstractAdapter) && + db.is_a?(ActiveRecord::ConnectionAdapters::AbstractAdapter) ActiveRecordConnectionAdapter.new(db, query) else raise ArgumentError, "Unknown database adapter type #{db.class}" @@ -86,11 +107,14 @@ def init_adapter(db, query) end def attempt_sqlite3_connection(db) - DBI.connect("DBI:SQLite3:#{db}") + SQLite3::Database.new(db).tap do |connection| + # Trigger a lightweight read so non-sqlite files fail early. + connection.execute('PRAGMA schema_version') + end rescue SQLite3::NotADatabaseException raise ArgumentError, "Expected #{db} to point to a SQLite3 database" rescue NameError - raise NameError, "In order to establish a connection to #{db}, please require 'dbi'" + raise NameError, "In order to establish a connection to #{db}, please require 'sqlite3'" end end end diff --git a/lib/daru/maths/arithmetic/vector.rb b/lib/daru/maths/arithmetic/vector.rb index e914903e1..226084968 100644 --- a/lib/daru/maths/arithmetic/vector.rb +++ b/lib/daru/maths/arithmetic/vector.rb @@ -91,8 +91,12 @@ def v2o_binary operation, other end def v2v_binary operation, other, opts={} - # FIXME: why the sorting?.. - zverok, 2016-05-18 - index = (@index.to_a | other.index.to_a).sort + index = (@index.to_a | other.index.to_a) + begin + index = index.sort + rescue ArgumentError + # Keep insertion order when index values are not mutually comparable. + end elements = index.map do |idx| this = self.index.include?(idx) ? 
self[idx] : nil diff --git a/lib/daru/vector.rb b/lib/daru/vector.rb index 7c8149dcb..feb98d68d 100644 --- a/lib/daru/vector.rb +++ b/lib/daru/vector.rb @@ -562,8 +562,11 @@ def delete element # Delete element by index def delete_at index - @data.delete_at @index[index] - @index = Daru::Index.new(@index.to_a - [index]) + position = @index.pos(index) + removed_index = @index.at(position) + + @data.delete_at(position) + @index = Daru::Index.new(@index.to_a - [removed_index]) update_position_cache end diff --git a/mise.toml b/mise.toml new file mode 100644 index 000000000..5a061357c --- /dev/null +++ b/mise.toml @@ -0,0 +1,2 @@ +[tools] +ruby = "4.0.1" diff --git a/spec/core/group_by_spec.rb b/spec/core/group_by_spec.rb index f5bb1d71d..efd13ee80 100644 --- a/spec/core/group_by_spec.rb +++ b/spec/core/group_by_spec.rb @@ -446,7 +446,17 @@ end context "#[]" do - pending + it "returns a group for single-layer grouping with scalar key" do + expect(@sl_group['bar']).to eq(@sl_group.get_group(['bar'])) + end + + it "returns a group for single-layer grouping with tuple key" do + expect(@sl_group[['bar']]).to eq(@sl_group.get_group(['bar'])) + end + + it "returns a group for multi-layer grouping with tuple args" do + expect(@dl_group['bar', 'one']).to eq(@dl_group.get_group(['bar', 'one'])) + end end context "#reduce" do diff --git a/spec/dataframe_spec.rb b/spec/dataframe_spec.rb index 14384a809..2c85a1845 100644 --- a/spec/dataframe_spec.rb +++ b/spec/dataframe_spec.rb @@ -412,7 +412,6 @@ end it "aligns MultiIndexes properly" do - pending mi_a = @order_mi mi_b = Daru::MultiIndex.from_tuples([ [:b,:one,:foo], @@ -423,6 +422,7 @@ mi_sorted = Daru::MultiIndex.from_tuples([ [:a, :one, :bar], [:a, :one, :baz], + [:a, :two, :baz], [:b, :one, :foo], [:b, :two, :foo] ]) @@ -435,9 +435,9 @@ df = Daru::DataFrame.new([b,a], order: order) expect(df).to eq(Daru::DataFrame.new({ - [:pee, :que] => Daru::Vector.new([1,2,4,3], index: mi_sorted), - [:pee, :poo] => 
Daru::Vector.new([12,14,11,13], index: mi_sorted) - }, order: order_mi)) + [:pee, :que] => Daru::Vector.new([12,14,nil,11,13], index: mi_sorted), + [:pee, :poo] => Daru::Vector.new([1,nil,2,4,3], index: mi_sorted) + }, order: order)) end it "adds nils in case of missing values" do @@ -857,8 +857,21 @@ end context Daru::MultiIndex do - pending - # TO DO + it "assigns specified row when full tuple is provided" do + @df_mi.row[:a, :one, :bar] = [100, 200, 300, 400] + + expect(@df_mi.row[:a, :one, :bar]).to eq( + Daru::Vector.new([100, 200, 300, 400], index: @order_mi) + ) + end + + it "creates a new row when tuple does not exist" do + key = [:d, :one, :foo] + @df_mi.row[:d, :one, :foo] = [9, 8, 7, 6] + + expect(@df_mi.index.to_a).to include(key) + expect(@df_mi.row[:d, :one, :foo]).to eq(Daru::Vector.new([9, 8, 7, 6], index: @order_mi)) + end end context Daru::CategoricalIndex do @@ -2270,7 +2283,6 @@ end context "#keep_row_if" do - pending "changing row from under the iterator trips this" it "keeps row if block evaluates to true" do df = Daru::DataFrame.new({b: [10,12,20,23,30], a: [50,30,30,1,5], c: [10,20,30,40,50]}, order: [:a, :b, :c], @@ -2279,7 +2291,14 @@ df.keep_row_if do |row| row[:a] % 10 == 0 end - # TODO: write expectation + + expect(df).to eq( + Daru::DataFrame.new( + {a: [50,30,30], b: [10,12,20], c: [10,20,30]}, + order: [:a, :b, :c], + index: [:one, :two, :three] + ) + ) end end @@ -2395,7 +2414,20 @@ end context Daru::MultiIndex do - pending + it "converts DataFrame into array of hashes preserving tuple keys" do + arry = @df_mi.to_a + + rows = @vector_arry1.each_index.map do |i| + { + [:a, :one, :bar] => @vector_arry1[i], + [:a, :two, :baz] => @vector_arry2[i], + [:b, :two, :foo] => @vector_arry1[i], + [:b, :one, :foo] => @vector_arry2[i] + } + end + + expect(arry).to eq([rows, @multi_index.to_a]) + end end end @@ -2415,7 +2447,7 @@ end context "#recast" do - it "recasts underlying vectors" do + it "recasts underlying vectors", :nmatrix do 
@data_frame.recast a: :nmatrix, c: :nmatrix expect(@data_frame.a.dtype).to eq(:nmatrix) @@ -2452,7 +2484,26 @@ end context Daru::MultiIndex do - pending + it "sorts when specified full tuple without mutating original dataframe" do + sorted = @df_mi.sort([[:a, :one, :bar]]) + + expect(sorted[[:a, :one, :bar]].to_a).to eq([11,11,11,12,12,12,13,13,13,14,14,14]) + expect(sorted.index.to_a).to eq([ + [:a, :one, :bar], + [:b, :one, :bar], + [:c, :one, :bar], + [:a, :one, :baz], + [:b, :two, :bar], + [:c, :one, :baz], + [:a, :two, :bar], + [:b, :two, :baz], + [:c, :two, :foo], + [:a, :two, :baz], + [:b, :one, :foo], + [:c, :two, :bar] + ]) + expect(@df_mi.index).to eq(@multi_index) + end end context Daru::CategoricalIndex do @@ -2670,9 +2721,24 @@ end context Daru::MultiIndex do - pending - it "sorts the DataFrame when specified full tuple" do - @df_mi.sort([[:a,:one,:bar]]) + it "sorts the DataFrame in place when specified full tuple" do + @df_mi.sort!([[:a,:one,:bar]], ascending: false) + + expect(@df_mi[[:a,:one,:bar]].to_a).to eq([14,14,14,13,13,13,12,12,12,11,11,11]) + expect(@df_mi.index.to_a).to eq([ + [:a, :two, :baz], + [:b, :one, :foo], + [:c, :two, :bar], + [:a, :two, :bar], + [:b, :two, :baz], + [:c, :two, :foo], + [:a, :one, :baz], + [:b, :two, :bar], + [:c, :one, :baz], + [:a, :one, :bar], + [:b, :one, :bar], + [:c, :one, :bar] + ]) end end end @@ -2875,7 +2941,7 @@ end end - context "#to_nmatrix" do + context "#to_nmatrix", :nmatrix do before do @df = Daru::DataFrame.new({b: [11,12,13,14,15], a: [1,2,3,4,5], c: [11,22,33,44,55], d: [5,4,nil,2,1], e: ['this', 'has', 'string','data','too']}, @@ -3274,7 +3340,7 @@ end end - context "#to_gsl" do + context "#to_gsl", :gsl do it "converts to GSL::Matrix" do rows = [[1,2,3,4,5],[11,12,13,14,15],[11,22,33,44,55]].transpose mat = GSL::Matrix.alloc *rows @@ -3488,7 +3554,7 @@ def create_test(*args, &_proc) description = args.shift fields = args - [description, fields, Proc.new] + [description, fields, _proc] end 
before do diff --git a/spec/date_time/index_spec.rb b/spec/date_time/index_spec.rb index 4cddde0eb..f9ae10506 100644 --- a/spec/date_time/index_spec.rb +++ b/spec/date_time/index_spec.rb @@ -25,8 +25,10 @@ end it "lets setting of string time format" do - pending - Daru::DateTimeIndex.format = 'some-date-time-format' + Daru::DateTimeIndex.format = '%Y-%m-%d' + expect(Daru::DateTimeIndex.format).to eq('%Y-%m-%d') + ensure + Daru::DateTimeIndex.format = nil end end @@ -493,19 +495,16 @@ end context "#add" do - before { skip } let(:idx) { Daru::Index.new [:a, :b, :c] } context "single index" do - subject { idx } - before { idx.add :d } + subject { idx.add :d } its(:to_a) { is_expected.to eq [:a, :b, :c, :d] } end context "mulitple indexes" do - subject { idx } - before { idx.add :d, :e } + subject { idx.add :d, :e } its(:to_a) { is_expected.to eq [:a, :b, :c, :d, :e] } end diff --git a/spec/extensions/rserve_spec.rb b/spec/extensions/rserve_spec.rb index f11c19c6f..29baeedaa 100644 --- a/spec/extensions/rserve_spec.rb +++ b/spec/extensions/rserve_spec.rb @@ -2,13 +2,15 @@ require "rserve" require 'daru/extensions/rserve' - describe "Daru rserve extension" do + describe "Daru rserve extension", :rserve do before do @r = Rserve::Connection.new + rescue StandardError => e + skip "Rserve integration unavailable: #{e.class}: #{e.message}" end after do - @r.close + @r.close if @r end describe Daru::Vector do diff --git a/spec/io/io_spec.rb b/spec/io/io_spec.rb index 7610cf22e..b4030a3eb 100644 --- a/spec/io/io_spec.rb +++ b/spec/io/io_spec.rb @@ -172,13 +172,19 @@ end end - context ".from_sql" do - include_context 'with accounts table in sqlite3 database' + context ".from_sql", :sql do + include_context 'with accounts table in sqlite3 database' + + context 'with a database handler of DBI', :dbi do + before do + unless Daru::RSpec::SqliteSupport.dbi_sqlite_available? 
+ skip "DBI sqlite integration unavailable: #{Daru::RSpec::SqliteSupport.dbi_sqlite_error}" + end + end - context 'with a database handler of DBI' do - let(:db) do - DBI.connect("DBI:SQLite3:#{db_name}") - end + let(:db) do + DBI.connect("DBI:SQLite3:#{db_name}") + end subject { Daru::DataFrame.from_sql(db, "select * from accounts") } @@ -191,11 +197,17 @@ end end - context 'with a database connection of ActiveRecord' do - let(:connection) do - Daru::RSpec::Account.establish_connection "sqlite3:#{db_name}" - Daru::RSpec::Account.connection - end + context 'with a database connection of ActiveRecord' do + before do + unless Daru::RSpec::SqliteSupport.activerecord_sqlite_available? + skip "ActiveRecord sqlite integration unavailable: #{Daru::RSpec::SqliteSupport.activerecord_sqlite_error}" + end + end + + let(:connection) do + Daru::RSpec::Account.establish_connection "sqlite3:#{db_name}" + Daru::RSpec::Account.connection + end subject do Daru::DataFrame.from_sql(connection, "select * from accounts") @@ -232,13 +244,17 @@ end end - context '.from_activerecord' do - include_context 'with accounts table in sqlite3 database' + context '.from_activerecord', :sql do + include_context 'with accounts table in sqlite3 database' - context 'with ActiveRecord::Relation' do - before do - Daru::RSpec::Account.establish_connection "sqlite3:#{db_name}" - end + context 'with ActiveRecord::Relation' do + before do + unless Daru::RSpec::SqliteSupport.activerecord_sqlite_available? 
+ skip "ActiveRecord sqlite integration unavailable: #{Daru::RSpec::SqliteSupport.activerecord_sqlite_error}" + end + + Daru::RSpec::Account.establish_connection "sqlite3:#{db_name}" + end let(:relation) do Daru::RSpec::Account.all @@ -284,9 +300,7 @@ expect(df.vectors.to_a).to eq([:v1,:v2,:v3,:v4,:v5,:v6]) end - xit "understands empty fields" do - pending 'See FIXME note in io.rb' - + it "understands empty fields" do df = Daru::DataFrame.from_plaintext 'spec/fixtures/empties.dat', [:v1,:v2,:v3] expect(df.row[1].to_a).to eq [4, nil, 6] diff --git a/spec/io/sql_data_source_spec.rb b/spec/io/sql_data_source_spec.rb index 8bab9e41b..af80e19d3 100644 --- a/spec/io/sql_data_source_spec.rb +++ b/spec/io/sql_data_source_spec.rb @@ -3,7 +3,7 @@ require 'dbi' require 'active_record' -RSpec.describe Daru::IO::SqlDataSource do +RSpec.describe Daru::IO::SqlDataSource, :sql do include_context 'with accounts table in sqlite3 database' let(:query) do @@ -18,7 +18,13 @@ describe '.make_dataframe' do subject(:df) { Daru::IO::SqlDataSource.make_dataframe(source, query) } - context 'with DBI::DatabaseHandle' do + context 'with DBI::DatabaseHandle', :dbi do + before do + unless Daru::RSpec::SqliteSupport.dbi_sqlite_available? + skip "DBI sqlite integration unavailable: #{Daru::RSpec::SqliteSupport.dbi_sqlite_error}" + end + end + let(:source) { DBI.connect("DBI:SQLite3:#{db_name}") } it { is_expected.to be_a(Daru::DataFrame) } it { expect(df.row[0]).to have_attributes(id: 1, age: 20) } @@ -26,6 +32,12 @@ end context 'with ActiveRecord::Connection' do + before do + unless Daru::RSpec::SqliteSupport.activerecord_sqlite_available? 
+ skip "ActiveRecord sqlite integration unavailable: #{Daru::RSpec::SqliteSupport.activerecord_sqlite_error}" + end + end + it { is_expected.to be_a(Daru::DataFrame) } it { expect(df.row[0]).to have_attributes(id: 1, age: 20) } its(:nrows) { is_expected.to eq 2 } @@ -39,6 +51,7 @@ end context 'with an object not a string as a query' do + let(:source) { Object.new } let(:query) { Object.new } it { expect { df }.to raise_error(ArgumentError) } end diff --git a/spec/maths/arithmetic/vector_spec.rb b/spec/maths/arithmetic/vector_spec.rb index 2d3289db6..c2b1d3f4d 100644 --- a/spec/maths/arithmetic/vector_spec.rb +++ b/spec/maths/arithmetic/vector_spec.rb @@ -24,7 +24,6 @@ end it "appropriately adds vectors with numeric and non-numeric indexes" do - pending "Need an alternate index implementation?" v1 = Daru::Vector.new([1,2,3]) v2 = Daru::Vector.new([1,2,3], index: [:a,:b,:c]) diff --git a/spec/maths/statistics/vector_spec.rb b/spec/maths/statistics/vector_spec.rb index bb4b00d33..966ff9a60 100644 --- a/spec/maths/statistics/vector_spec.rb +++ b/spec/maths/statistics/vector_spec.rb @@ -1,5 +1,5 @@ describe Daru::Vector do - [:array, :gsl].each do |dtype| #nmatrix still unstable + ([:array] + (HAS_GSL ? 
[:gsl] : [])).each do |dtype| # nmatrix still unstable
     describe dtype do
       before do
         @dv = Daru::Vector.new [323, 11, 555, 666, 234, 21, 666, 343, 1, 2], dtype: dtype
diff --git a/spec/plotting/gruff/category_spec.rb b/spec/plotting/gruff/category_spec.rb
index c36556531..54a8b6dd9 100644
--- a/spec/plotting/gruff/category_spec.rb
+++ b/spec/plotting/gruff/category_spec.rb
@@ -1,4 +1,4 @@
-describe Daru::Vector, 'plotting category vector with gruff' do
+describe Daru::Vector, 'plotting category vector with gruff', :gruff do
   before { Daru.plotting_library = :gruff }
   let(:dv) { Daru::Vector.new [1, 2, 3], type: :category }
 
diff --git a/spec/plotting/gruff/dataframe_spec.rb b/spec/plotting/gruff/dataframe_spec.rb
index 839be9021..8428040d5 100644
--- a/spec/plotting/gruff/dataframe_spec.rb
+++ b/spec/plotting/gruff/dataframe_spec.rb
@@ -1,6 +1,6 @@
 require 'spec_helper.rb'
 
-describe Daru::DataFrame, 'plotting dataframe using gruff' do
+describe Daru::DataFrame, 'plotting dataframe using gruff', :gruff do
   before { Daru.plotting_library = :gruff }
   let(:df) do
     Daru::DataFrame.new({
@@ -62,7 +62,7 @@
   end
 end
 
-describe Daru::DataFrame, 'dataframe category plotting with gruff' do
+describe Daru::DataFrame, 'dataframe category plotting with gruff', :gruff do
   before { Daru.plotting_library = :gruff }
   let(:df) do
     Daru::DataFrame.new({
diff --git a/spec/plotting/gruff/vector_spec.rb b/spec/plotting/gruff/vector_spec.rb
index 10ba789c1..a9d0a7cb4 100644
--- a/spec/plotting/gruff/vector_spec.rb
+++ b/spec/plotting/gruff/vector_spec.rb
@@ -1,6 +1,6 @@
 require 'spec_helper.rb'
 
-describe Daru::Vector, 'plotting vector with gruff' do
+describe Daru::Vector, 'plotting vector with gruff', :gruff do
   let(:dv) { Daru::Vector.new [1, 2, 3] }
   before { Daru.plotting_library = :gruff }
 
diff --git a/spec/spec_helper.rb b/spec/spec_helper.rb
index 5b32066d6..1a4f15761 100644
--- a/spec/spec_helper.rb
+++ b/spec/spec_helper.rb
@@ -7,7 +7,6 @@
 require 'tempfile'
 require 'pry-byebug'
 require 'nokogiri'
-require 'gruff'
 require 'webmock/rspec'
 
 def mri?
@@ -31,6 +30,23 @@ def jruby?
 $LOAD_PATH.unshift(File.dirname(__FILE__))
 require 'daru'
 
+HAS_NMATRIX = Daru.has_nmatrix?
+HAS_GSL = Daru.has_gsl?
+HAS_GRUFF = Daru.has_gruff?
+
+ALL_DTYPES = [:array]
+ALL_DTYPES.unshift(:gsl) if HAS_GSL
+ALL_DTYPES.unshift(:nmatrix) if HAS_NMATRIX
+
+RSpec.configure do |config|
+  config.filter_run_excluding gruff: true unless HAS_GRUFF
+  config.filter_run_excluding nmatrix: true unless HAS_NMATRIX
+  config.filter_run_excluding gsl: true unless HAS_GSL
+  config.filter_run_excluding rserve: true unless ENV['DARU_TEST_RSERVE'] == '1'
+  config.filter_run_excluding dbi: true unless ENV['DARU_TEST_DBI'] == '1'
+  config.filter_run_excluding sql: true unless ENV['DARU_TEST_SQL'] == '1'
+end
+
 if jruby?
   require 'mdarray'
 else
@@ -39,8 +55,6 @@ def jruby?
   end
 end
 
-ALL_DTYPES = [:nmatrix, :gsl, :array]
-
 # FIXME: This must go! Need to be able to use be_within
 def expect_correct_vector_in_delta v1, v2, delta
   expect(v1.size).to eq(v2.size)
diff --git a/spec/support/database_helper.rb b/spec/support/database_helper.rb
index b9ea21f56..6e46f8d7f 100644
--- a/spec/support/database_helper.rb
+++ b/spec/support/database_helper.rb
@@ -1,11 +1,56 @@
 require 'sqlite3'
 require 'dbi'
+require 'logger'
 require 'active_record'
 
 module Daru::RSpec
   class Account < ActiveRecord::Base
     self.table_name = 'accounts'
   end
+
+  module SqliteSupport
+    module_function
+
+    def activerecord_sqlite_available?
+      return @activerecord_sqlite_available unless @activerecord_sqlite_available.nil?
+
+      @activerecord_sqlite_available = begin
+        ActiveRecord::Base.establish_connection(adapter: 'sqlite3', database: ':memory:')
+        ActiveRecord::Base.connection
+        true
+      rescue LoadError, StandardError => e
+        @activerecord_sqlite_error = "#{e.class}: #{e.message}"
+        false
+      ensure
+        begin
+          ActiveRecord::Base.connection_pool.disconnect! if ActiveRecord::Base.connected?
+        rescue StandardError
+          nil
+        end
+      end
+    end
+
+    def activerecord_sqlite_error
+      @activerecord_sqlite_error
+    end
+
+    def dbi_sqlite_available?
+      return @dbi_sqlite_available unless @dbi_sqlite_available.nil?
+
+      @dbi_sqlite_available = begin
+        db = DBI.connect('DBI:SQLite3::memory:')
+        db.disconnect if db
+        true
+      rescue LoadError, StandardError => e
+        @dbi_sqlite_error = "#{e.class}: #{e.message}"
+        false
+      end
+    end
+
+    def dbi_sqlite_error
+      @dbi_sqlite_error
+    end
+  end
 end
 
 shared_context 'with accounts table in sqlite3 database' do
diff --git a/spec/vector_spec.rb b/spec/vector_spec.rb
index 536094328..459be77cf 100644
--- a/spec/vector_spec.rb
+++ b/spec/vector_spec.rb
@@ -84,7 +84,7 @@
           expect(dv.index.to_a).to eq(['a', 'b', :r, 0])
         end
 
-        it "initializes array with nils with dtype NMatrix" do
+        it "initializes array with nils with dtype NMatrix", :nmatrix do
           dv = Daru::Vector.new [2, nil], dtype: :nmatrix
           expect(dv.to_a).to eq([2, nil])
           expect(dv.index.to_a).to eq([0, 1])
@@ -917,7 +917,6 @@
         end
 
         it "deletes element of specified integer index" do
-          pending
           @dv.delete_at 2
 
           expect(@dv).to eq(Daru::Vector.new [1,2,4,5], name: :a,
@@ -1004,23 +1003,21 @@
         end
 
         context Daru::MultiIndex do
-          pending
-          # it "returns vector as a Hash" do
-          #   pending
-          #   mi = Daru::MultiIndex.from_tuples([
-          #     [:a,:two,:bar],
-          #     [:a,:two,:baz],
-          #     [:b,:one,:bar],
-          #     [:b,:two,:bar]
-          #   ])
-          #   vector = Daru::Vector.new([1,2,3,4], index: mi, dtype: dtype)
-          #   expect(vector.to_h).to eq({
-          #     [:a,:two,:bar] => 1,
-          #     [:a,:two,:baz] => 2,
-          #     [:b,:one,:bar] => 3,
-          #     [:b,:two,:bar] => 4
-          #   })
-          # end
+          it "returns vector as a Hash" do
+            mi = Daru::MultiIndex.from_tuples([
+              [:a,:two,:bar],
+              [:a,:two,:baz],
+              [:b,:one,:bar],
+              [:b,:two,:bar]
+            ])
+            vector = Daru::Vector.new([1,2,3,4], index: mi, dtype: dtype)
+            expect(vector.to_h).to eq({
+              [:a,:two,:bar] => 1,
+              [:a,:two,:baz] => 2,
+              [:b,:one,:bar] => 3,
+              [:b,:two,:bar] => 4
+            })
+          end
         end
       end
 
@@ -1360,14 +1357,14 @@
           expect(a.dtype).to eq(:array)
         end
 
-        it "maps and returns a vector of dtype gsl" do
+        it "maps and returns a vector of dtype gsl", :gsl do
           a = @common_all_dtypes.recode(:gsl) { |v| v == -99 ? 1 : 0 }
           exp = Daru::Vector.new [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1], dtype: :gsl
           expect(a).to eq(exp)
           expect(a.dtype).to eq(:gsl)
         end
 
-        it "maps and returns a vector of dtype nmatrix" do
+        it "maps and returns a vector of dtype nmatrix", :nmatrix do
           a = @common_all_dtypes.recode(:nmatrix) { |v| v == -99 ? 1 : 0 }
           exp = Daru::Vector.new [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1], dtype: :nmatrix
           expect(a).to eq(exp)
@@ -1389,14 +1386,14 @@
           expect(@vector.dtype).to eq(dtype)
         end
 
-        it "destructively maps and returns a vector of dtype gsl" do
+        it "destructively maps and returns a vector of dtype gsl", :gsl do
           @vector.recode!(:gsl) { |v| v == -99 ? 1 : 0 }
           exp = Daru::Vector.new [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1], dtype: :gsl
           expect(@vector).to eq(exp)
           expect(@vector.dtype).to eq(exp.dtype)
         end
 
-        it "destructively maps and returns a vector of dtype nmatrix" do
+        it "destructively maps and returns a vector of dtype nmatrix", :nmatrix do
           @vector.recode!(:nmatrix) { |v| v == -99 ? 1 : 0 }
           exp = Daru::Vector.new [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1], dtype: :nmatrix
           expect(@vector).to eq(exp)
@@ -1538,7 +1535,17 @@
     end
 
     context Daru::MultiIndex do
-      pending
+      it "clones a vector with multi-index and fills it with nils" do
+        mi = Daru::MultiIndex.from_tuples([
+          [:a, :one],
+          [:a, :two],
+          [:b, :one]
+        ])
+        vec = Daru::Vector.new([1, 2, 3], index: mi)
+        expect(vec.clone_structure).to eq(
+          Daru::Vector.new([nil, nil, nil], index: mi)
+        )
+      end
     end
   end
 
@@ -1593,7 +1600,7 @@
       its(:'index.to_a') { is_expected.to eq [] }
     end
 
-    context 'works for gsl' do
+    context 'works for gsl', :gsl do
       let(:dv) { Daru::Vector.new [1, 2, 3, Float::NAN], dtype: :gsl, index: 11..14 }
       subject { dv.reject_values Float::NAN }
 
@@ -1857,7 +1864,7 @@
       expect(@multi.type).to eq(:object)
     end
 
-    it "tells NMatrix data type in case of NMatrix wrapper" do
+    it "tells NMatrix data type in case of NMatrix wrapper", :nmatrix do
       nm = Daru::Vector.new([1,2,3,4,5], dtype: :nmatrix)
       expect(nm.type).to eq(:int32)
     end
@@ -1908,7 +1915,7 @@
     end
   end
 
-  context '#to_nmatrix' do
+  context '#to_nmatrix', :nmatrix do
     let(:dv) { Daru::Vector.new [1, 2, 3, 4, 5] }
 
     context 'horizontal axis' do
@@ -1945,7 +1952,7 @@
     end
   end
 
-  context "#to_gsl" do
+  context "#to_gsl", :gsl do
     it "returns a GSL::Vector of non-nil data" do
       vector = Daru::Vector.new [1,2,3,4,nil,6,nil]
       expect(vector.to_gsl).to eq(GSL::Vector.alloc(1,2,3,4,6))