Cluster the cluster data #18
```diff
 _CLUSTER_BASE = os.path.join(configuration['ROOT_DATA_PATH'], 'cluster_data')
 configuration['_CLUSTER_PATHS'] = {
-    'cluster_I2': os.path.join(
+    'markov_i2': os.path.join(
```
Rename these to something more informative.
```python
self.check_deltas(edge_data=edge_data, node_metadata=node_metadata, cluster_data=clusters)
```
```python
def check_deltas(self, edge_data={}, node_metadata={}, cluster_data={}):
```
Brief dataset summary for sanity checking.
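A summary like this could print a few counts before the detailed comparison runs. The sketch below is hypothetical: `summarise_dataset` is not part of the PR, and it assumes each argument maps a collection name to a list of documents, with every edge document carrying an `edge_type` field as the test comments describe.

```python
from collections import Counter


def summarise_dataset(edge_data, node_metadata, cluster_data):
    """Hypothetical helper: count documents per collection and edges per type.

    Assumes each argument is a dict mapping a collection name to a list of
    documents, and that every edge document has an 'edge_type' field.
    """
    summary = {
        label: {name: len(docs) for name, docs in data.items()}
        for label, data in (
            ("edges", edge_data),
            ("nodes", node_metadata),
            ("clusters", cluster_data),
        )
    }
    # Also break the edges down by type across all edge collections.
    summary["edge_types"] = dict(
        Counter(edge["edge_type"] for docs in edge_data.values() for edge in docs)
    )
    return summary
```

Printing the returned dict at the start of `check_deltas` would give a quick view of what the parser produced before any assertion fails.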
```python
for data_structure in [edge_data, expected]:
    for k in data_structure.keys():
        data_structure[k] = sorted(data_structure[k], key=lambda n: n['_key'])
```
Order the data, as it won't necessarily be sorted when it comes out of the parser.
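If sorting the inputs in place is a concern, the same normalisation can return new lists instead. A minimal sketch, assuming (as the sorting loop does) that every document has a `_key`; the name `sort_by_key` is hypothetical.

```python
def sort_by_key(data_structure):
    """Return a copy of a {collection: [doc, ...]} dict with each list sorted by '_key'.

    Unlike sorting in place, this leaves the caller's lists (and any shared
    fixtures) untouched.
    """
    return {
        name: sorted(docs, key=lambda doc: doc["_key"])
        for name, docs in data_structure.items()
    }
```

A test could then assert `sort_by_key(edge_data) == sort_by_key(expected)` without modifying either argument.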
```yaml
clusters:
  type: array
  title: Clusters
  description: Clusters to which the node has been assigned
  items:
    type: string
    format: regex
    pattern: ^\w+:\d+$
  examples: [["markov_i2:1", "markov_i4:5"], ["markov_i6:3"]]
```
This is the important bit.
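One detail worth checking in this schema: in JSON Schema, `format: regex` asserts that the string is itself a valid regular expression, while `pattern` is what constrains the value's shape. A rough Python equivalent of the `pattern` check is sketched below; `invalid_cluster_ids` is a hypothetical helper, and `re.ASCII` keeps `\w` and `\d` to ASCII, closer to the ECMA-262 dialect JSON Schema patterns use.

```python
import re

# The schema's pattern without ^ and $: fullmatch already anchors both ends
# (and, unlike `$`, does not accept a trailing newline).
CLUSTER_ID_RE = re.compile(r"\w+:\d+", re.ASCII)


def invalid_cluster_ids(clusters):
    """Return the entries that are not of the form clustering_system_name:cluster_id."""
    return [c for c in clusters if not CLUSTER_ID_RE.fullmatch(c)]
```

For example, `invalid_cluster_ids(["markov_i2:1", "markov_i6"])` returns `["markov_i6"]`.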
```python
# nodes are represented as a list of node[_key]
# edges are objects with keys _to, _from, edge_type and score
```
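Reducing both shapes to plain sorted values gives comparisons that ignore parser order and any extra fields. A sketch with hypothetical helper names, assuming node documents carry `_key` and edge documents carry the four fields listed in the comment.

```python
EDGE_FIELDS = ("_from", "_to", "edge_type", "score")


def node_keys(nodes):
    """Nodes as a sorted list of their _key values."""
    return sorted(node["_key"] for node in nodes)


def edge_tuples(edges):
    """Edges as sorted (_from, _to, edge_type, score) tuples; other fields are dropped."""
    return sorted(tuple(edge[field] for field in EDGE_FIELDS) for edge in edges)
```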
```python
def test_fetch_phenotypes_no_results(self):
```
The queries with no results have been merged into the other tests.
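Folding an empty-result case into a table of cases might look like the sketch below. `fetch_phenotypes` here is a hypothetical in-memory stand-in, not the real query, and the node keys and phenotype values are made up; `subTest` reports each case separately, so the no-results case still shows up on its own if it fails.

```python
import unittest


def fetch_phenotypes(keys):
    """Hypothetical stand-in for the real query: look keys up in a tiny dict."""
    data = {"AT1G01010": ["phen_1"]}
    return [p for k in keys for p in data.get(k, [])]


class FetchPhenotypesTest(unittest.TestCase):
    def test_fetch_phenotypes(self):
        cases = [
            (["AT1G01010"], ["phen_1"]),
            (["no_such_node"], []),  # what used to be a separate "no results" test
        ]
        for keys, expected in cases:
            with self.subTest(keys=keys):
                self.assertEqual(fetch_phenotypes(keys), expected)


if __name__ == "__main__":
    unittest.main()
```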
Update parser and tests accordingly
The branch was force-pushed from 65ffbff to b7780a0.
```yaml
title: Cluster IDs
description: Cluster IDs, in the form "clustering_system_name:cluster_id"
items: {type: string}
examples: [['markov_i2:5', 'markov_i6:2'], ['markov_i6:1']]
```
Should this be an object so we don't have to parse these entries?
I guess if the client is using string parameters like "markov_i2:5", then it doesn't matter.
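Keeping the field a plain string still lets code get structured access by splitting on the last colon. A sketch under that assumption; `ClusterRef` and `parse_cluster_id` are hypothetical names, not part of the PR.

```python
from typing import NamedTuple


class ClusterRef(NamedTuple):
    system: str      # clustering_system_name, e.g. "markov_i2"
    cluster_id: int  # numeric cluster_id within that system


def parse_cluster_id(value: str) -> ClusterRef:
    """Split 'clustering_system_name:cluster_id' on the last colon."""
    system, sep, cid = value.rpartition(":")
    # Require a non-empty name and an ASCII-digit id, mirroring the schema pattern.
    if not sep or not system or not (cid.isascii() and cid.isdigit()):
        raise ValueError(f"not a clustering_system_name:cluster_id string: {value!r}")
    return ClusterRef(system, int(cid))
```

For example, `parse_cluster_id("markov_i2:5")` returns `ClusterRef(system='markov_i2', cluster_id=5)`, so switching the schema to an object would mainly move this parsing from the client to the loader.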
Part 1 of the changes in this old PR in the relation_engine_spec repo.
Merge all `cluster` fields in the `djornl_node` collection into a single field.
Update parser and tests accordingly.
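To illustrate the change, here is a sketch that folds per-system fields into the single `clusters` list. It assumes, hypothetically, that the old documents stored one cluster number per clustering system under a field named after that system; the real field names and parser code in the PR may differ, and `merge_cluster_fields` is an illustrative name.

```python
def merge_cluster_fields(node, systems):
    """Return a copy of `node` with per-system cluster fields folded into `clusters`.

    Hypothetical layout: node[system] holds the cluster number for that system.
    """
    merged = {k: v for k, v in node.items() if k not in systems}
    clusters = list(node.get("clusters", []))
    for system in systems:
        if node.get(system) is not None:
            # Same "clustering_system_name:cluster_id" form as the schema examples.
            clusters.append(f"{system}:{node[system]}")
    merged["clusters"] = clusters
    return merged
```

With `systems = ["markov_i2", "markov_i4", "markov_i6"]`, a node with `markov_i2: 1` and `markov_i4: 5` ends up with `clusters: ["markov_i2:1", "markov_i4:5"]`, matching the first schema example.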