influxdb: Add ability to specify tags#467
Conversation
|
In order to make the comparison easier I generated some 2 minute Can someone take a look at these for differences? Maybe @kmharrington or @mhasself? EDIT: These just have a fake data agent (w/2 channels), the registry, influxdb v1 publisher agent, and the aggregator agent running. |
mhasself
left a comment
There was a problem hiding this comment.
Looks good -- some comments, but I think they are minor (despite how many paragraphs were spent).
I agree that this should have no impact on the G3 data.
| @staticmethod | ||
| def verify_influxdb_tags(message): | ||
| """Check the 'influxdb_tags' to make sure all the needed information is | ||
| provided. |
There was a problem hiding this comment.
Note it's acceptable for influxdb_tags to be entirely empty.
| error_msg = f"'_field' not supplied with 'influxdb_tags' for tag set {v}" | ||
| raise ValueError(error_msg) | ||
|
|
||
| # check that all fields have a corresponding tag |
There was a problem hiding this comment.
These checks will not fail if influxdb_tags contains a key that is not also in data. Should that be acceptable?
| raise Exception("Block structure does not match: {}".format(self.name)) | ||
|
|
||
| self.timestamps.extend(block['timestamps']) | ||
| self.tags = block.get('influxdb_tags') |
There was a problem hiding this comment.
Shouldn't it be required that the latest tags match the previous tags?
I think, as written, someone could do this:
- publish some data with tags={...} fully populated -- these get cached.
- publish another point, with tags=None -- this gets cached.
- If the buffer is now flushed, the result will have tags=None, because that was the most recent thing.
That would pass all checks, but the stored tags (when the block gets flushed) would be None. It's probably not what the user wanted. I think either:
- Require user to pass the same tags every time they push data to this block OR
- If tags=None, don't change the stored tags in the block.
At some level, data producers are doomed to always provide tags, so the the first thing makes more sense to me.
Granted, this would require modification to the aggregator agent (so far unchanged), because you'd need to initialize each new Block (in Provider.save_to_block) with influxdb_tags set from the first transmission. That is messier in the present instance but seems logically safer on the whole. I think the best way would be to have a constructor for Block that parses the block data dict (just as Block.append and Block.extend do), and sets all the things from that (including influxdb_tags). This would make Aggregator, in the long run, less fragile vis a vis stuff in the feed that it doesn't care about.
|
Thanks for the review, I'll address your comments soon. A quick note to myself, I shouldn't have squashed #466. Before merging this I should rebase on |
Description
This PR (built on top of the refactor in #466) adds the ability to specify tags for each field published to InfluxDB by including the new
'influxdb_tags'dict in the message published on an OCS Feed.The new tagging structure is only applied for agents that publish this new
'influxdb_tags'dict. Messages with this dict will now look like:Each 'tag' provided for the (ocs) fields will be applied, the data will be published to a measurement set by the 'agent_class' and the (influx) field will be the string provided in '_field'.
With the new tags, this changes the structure of a write from something like:
to:
Motivation and Context
Resolves #175.
Some agents in socs have a large number of fields and writing Grafana dashboards to query all of their fields results in a large number of queries to the InfluxDB. Tagging, as performed here, should enable writing more efficient queries.
The old structure in OCS was violating essentially every suggestion for schema and data layout made by InfluxDB. We were encoding data in both the measurements and the field keys. This new structure fixes both of these issues.
How Has This Been Tested?
I have run the agent locally in both 'json' and 'line' protocol modes and verified how the data is presented in InfluxDB through Grafana while running the modified FakeDataAgent.
This results in being able to make queries like this:

I also ran an unmodified FakeDataAgent alongside the modified one. While writing to InfluxDB I also wrote to
.g3file via the Aggregator agent.I'm not 100% sure, so checking me on this would be great in this review, but I don't see the
influxdb_tagsmaking it into the.g3files:Trying to look more closely:
So I think the old structure remains in place within
.g3.Types of changes
Checklist: