[WIP] Switch to using pickle instead of npz to store intermediate results #54
Open
betatim wants to merge 1 commit into developmentseed:master from
Contributor
@betatim an update here: I timed the two on a separate dataset (50k tiles) and pickle was considerably faster to load (~4 seconds vs. 90). I don't totally understand why, but this line is the culprit; my guess is that iterating over the file list of an

Do you want to remove the older commented code? Then I'll merge.
Contributor
Author
I think the problem is in how the npz is read. If you dig a bit into the numpy docs, they suggest that (maybe) it is implemented as one file per key. So I would not be surprised if on the inside there is one

I'll remove the commented code and take a look at the failing tests.
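To make the "one file per key" point concrete, here is a minimal sketch (not code from this branch) that writes a few arrays with `np.savez` and lists the members of the resulting zip archive. The helper name `npz_member_names` and the `tile_0`, `tile_1` keys are made up for illustration.

```python
import io
import zipfile

import numpy as np


def npz_member_names(arrays):
    """Save a dict of arrays with np.savez and return the zip member names.

    Hypothetical helper for illustration only; not part of label-maker.
    """
    buf = io.BytesIO()
    # np.savez writes an uncompressed zip archive; each keyword becomes
    # its own "<key>.npy" member inside that archive.
    np.savez(buf, **arrays)
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        return sorted(zf.namelist())


# Made-up stand-ins for per-tile label arrays.
tiles = {"tile_0": np.zeros(6, dtype=int), "tile_1": np.ones(6, dtype=int)}
members = npz_member_names(tiles)
print(members)
```

If that holds for the real label file, reading every key means opening and parsing one small `.npy` member per tile, which would fit the slowdown seen with tens of thousands of tiles.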
Closes #51
Work-in-progress code to try out storing intermediate results as pickles instead of NPZ files.
Not quite sure how to benchmark this nicely. This speeds up, for example, the time between running label-maker images and it printing "Downloading 10874 tiles to ...". With this branch there is nearly no delay between starting label-maker and seeing that printout. With the npz setup it takes "a while" with the zurich.json below (a while == minutes or longer, I can measure it later).

(This branch needs cleaning up a bit before merging, but wanted to show the basic idea.)
{
  "country": "switzerland",
  "bounding_box": [8.488103, 47.359111, 8.582088, 47.407637],
  "zoom": 19,
  "classes": [
    { "name": "Pools", "filter": ["==", "leisure", "swimming_pool"] },
    { "name": "Bridge", "filter": ["has", "bridge"], "buffer": 5 },
    { "name": "Roads",
      "filter": ["all", ["has", "highway"], ["in", "highway", "motorway", "primary", "secondary", "residential"]],
      "buffer": 3 },
    { "name": "Buildings", "filter": ["has", "building"], "buffer": 3 },
    { "name": "Water", "filter": ["==", "natural", "water"] },
    { "name": "Forest", "filter": ["==", "landuse", "forest"] }
  ],
  "imagery": "http://a.tiles.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}.jpg?access_token=your_token_here",
  "background_ratio": 1,
  "ml_type": "classification"
}
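One possible way to benchmark this outside of label-maker is to time the load step in isolation. The sketch below is an assumption about what such a benchmark could look like, not the code in this branch: it builds a synthetic dict of per-tile label arrays, writes it once as npz and once as a pickle, and times reading every key back. The function names, the `tile_<i>` keys and the array length of 7 are all hypothetical.

```python
import os
import pickle
import tempfile
import time

import numpy as np


def make_labels(n_tiles, n_classes=7):
    """Build synthetic per-tile label vectors (hypothetical stand-in data)."""
    rng = np.random.default_rng(0)
    return {f"tile_{i}": rng.integers(0, 2, size=n_classes) for i in range(n_tiles)}


def time_load_npz(path):
    """Load every key from an npz file and return (dict, seconds)."""
    start = time.perf_counter()
    with np.load(path) as data:
        # Each lookup reads one "<key>.npy" member from the zip archive.
        loaded = {key: data[key] for key in data.files}
    return loaded, time.perf_counter() - start


def time_load_pickle(path):
    """Load a pickled dict and return (dict, seconds)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        loaded = pickle.load(f)
    return loaded, time.perf_counter() - start


def compare(n_tiles=2000):
    """Write the same labels in both formats and time loading each."""
    labels = make_labels(n_tiles)
    with tempfile.TemporaryDirectory() as tmp:
        npz_path = os.path.join(tmp, "labels.npz")
        pkl_path = os.path.join(tmp, "labels.pkl")
        np.savez(npz_path, **labels)
        with open(pkl_path, "wb") as f:
            pickle.dump(labels, f)
        from_npz, t_npz = time_load_npz(npz_path)
        from_pkl, t_pkl = time_load_pickle(pkl_path)
    return from_npz, from_pkl, t_npz, t_pkl


if __name__ == "__main__":
    _, _, t_npz, t_pkl = compare()
    print(f"npz load: {t_npz:.3f}s, pickle load: {t_pkl:.3f}s")
```

Raising `n_tiles` toward the tile counts mentioned in this thread should show how each format scales; the absolute numbers will depend on the machine and disk.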