Commit 5c29d5b

influxdb: Avoid out of memory by influxDB (#4291)
After running for a few hours with InfluxDB configured, CloudStack hangs because an OutOfMemoryError is raised. The exception happens at com.cloud.server.StatsCollector.writeBatches(StatsCollector.java:1510):

    2020-08-12 21:19:00,972 ERROR [c.c.s.StatsCollector] (StatsCollector-6:ctx-0a4cfe6a) (logid:03a7ba48) Error trying to retrieve host stats
    java.lang.OutOfMemoryError: unable to create new native thread
    ...
    at org.influxdb.impl.BatchProcessor.<init>(BatchProcessor.java:294)
    at org.influxdb.impl.BatchProcessor$Builder.build(BatchProcessor.java:201)
    at org.influxdb.impl.InfluxDBImpl.enableBatch(InfluxDBImpl.java:311)
    at com.cloud.server.StatsCollector.writeBatches(StatsCollector.java:1510)
    at com.cloud.server.StatsCollector$AbstractStatsCollector.sendMetricsToInfluxdb(StatsCollector.java:1351)
    at com.cloud.server.StatsCollector$HostCollector.runInContext(StatsCollector.java:522)

Context on InfluxDB Batch: Enabling batching on InfluxDB speeds up writes, but it requires caution to avoid zombie threads.

Solution: The batching feature creates an internal thread pool that must be shut down explicitly; therefore, it is important to call influxDB.close() once the connection is no longer needed.
1 parent ba4b04f commit 5c29d5b

2 files changed: 19 additions, 13 deletions


pom.xml

Lines changed: 1 addition & 1 deletion
@@ -130,7 +130,7 @@
         <cs.guava.version>23.6-jre</cs.guava.version>
         <cs.httpclient.version>4.5.4</cs.httpclient.version>
         <cs.httpcore.version>4.4.8</cs.httpcore.version>
-        <cs.influxdb-java.version>2.15</cs.influxdb-java.version>
+        <cs.influxdb-java.version>2.20</cs.influxdb-java.version>
         <cs.jackson.version>2.9.2</cs.jackson.version>
         <cs.jasypt.version>1.9.2</cs.jasypt.version>
         <cs.java-ipv6.version>0.16</cs.java-ipv6.version>

server/src/main/java/com/cloud/server/StatsCollector.java

Lines changed: 18 additions & 12 deletions
@@ -1334,21 +1334,25 @@ abstract class AbstractStatsCollector extends ManagedContextRunnable {
         protected void sendMetricsToInfluxdb(Map<Object, Object> metrics) {
             InfluxDB influxDbConnection = createInfluxDbConnection();

-            Pong response = influxDbConnection.ping();
-            if (response.getVersion().equalsIgnoreCase("unknown")) {
-                throw new CloudRuntimeException(String.format("Cannot ping influxdb host %s:%s.", externalStatsHost, externalStatsPort));
-            }
+            try {
+                Pong response = influxDbConnection.ping();
+                if (response.getVersion().equalsIgnoreCase("unknown")) {
+                    throw new CloudRuntimeException(String.format("Cannot ping influxdb host %s:%s.", externalStatsHost, externalStatsPort));
+                }

-            Collection<Object> metricsObjects = metrics.values();
-            List<Point> points = new ArrayList<>();
+                Collection<Object> metricsObjects = metrics.values();
+                List<Point> points = new ArrayList<>();

-            s_logger.debug(String.format("Sending stats to %s host %s:%s", externalStatsType, externalStatsHost, externalStatsPort));
+                s_logger.debug(String.format("Sending stats to %s host %s:%s", externalStatsType, externalStatsHost, externalStatsPort));

-            for (Object metricsObject : metricsObjects) {
-                Point vmPoint = creteInfluxDbPoint(metricsObject);
-                points.add(vmPoint);
+                for (Object metricsObject : metricsObjects) {
+                    Point vmPoint = creteInfluxDbPoint(metricsObject);
+                    points.add(vmPoint);
+                }
+                writeBatches(influxDbConnection, databaseName, points);
+            } finally {
+                influxDbConnection.close();
             }
-            writeBatches(influxDbConnection, databaseName, points);
         }

         /**
@@ -1507,7 +1511,9 @@ protected InfluxDB createInfluxDbConnection() {
      */
     protected void writeBatches(InfluxDB influxDbConnection, String dbName, List<Point> points) {
         BatchPoints batchPoints = BatchPoints.database(dbName).build();
-        influxDbConnection.enableBatch(BatchOptions.DEFAULTS);
+        if(!influxDbConnection.isBatchEnabled()){
+            influxDbConnection.enableBatch(BatchOptions.DEFAULTS);
+        }

         for (Point point : points) {
             batchPoints.point(point);
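
The leak that this change fixes can be reproduced in isolation. The sketch below is a minimal illustration and not CloudStack or influxdb-java code: `FakeBatchClient`, `runCycles`, and the thread name are hypothetical stand-ins, and the only assumption carried over from the commit message is that enabling batching starts an internal scheduler which keeps running until `close()` is called. Each collection cycle creates a new client, as `sendMetricsToInfluxdb` does, and the `closeAfterUse` flag toggles the `finally { close(); }` behaviour added in the diff.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BatchClientLeakSketch {

    /** Hypothetical stand-in for a batching client; this is NOT the influxdb-java API. */
    static final class FakeBatchClient implements AutoCloseable {
        private final AtomicInteger liveSchedulers;
        private ScheduledExecutorService scheduler;

        FakeBatchClient(AtomicInteger liveSchedulers) {
            this.liveSchedulers = liveSchedulers;
        }

        boolean isBatchEnabled() {
            return scheduler != null;
        }

        void enableBatch() {
            // Daemon threads so that deliberately leaked schedulers do not block JVM exit.
            scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "fake-batch-flusher");
                t.setDaemon(true);
                return t;
            });
            // Periodic no-op standing in for a batch flush.
            scheduler.scheduleAtFixedRate(() -> { }, 1, 1, TimeUnit.SECONDS);
            liveSchedulers.incrementAndGet();
        }

        void write(List<String> points) {
            // Same guard as in the diff: enable batching only once per client.
            if (!isBatchEnabled()) {
                enableBatch();
            }
            // A real client would queue the points here.
        }

        @Override
        public void close() {
            if (scheduler != null) {
                scheduler.shutdownNow();
                scheduler = null;
                liveSchedulers.decrementAndGet();
            }
        }
    }

    /** Runs the given number of collection cycles and returns how many schedulers are still alive. */
    static int runCycles(int cycles, boolean closeAfterUse) {
        AtomicInteger liveSchedulers = new AtomicInteger();
        for (int i = 0; i < cycles; i++) {
            FakeBatchClient client = new FakeBatchClient(liveSchedulers);
            try {
                client.write(Collections.singletonList("cpu=1"));
            } finally {
                if (closeAfterUse) {
                    client.close();
                }
            }
        }
        return liveSchedulers.get();
    }

    public static void main(String[] args) {
        System.out.println("schedulers alive without close: " + runCycles(5, false)); // prints 5
        System.out.println("schedulers alive with close: " + runCycles(5, true));     // prints 0
    }
}
```

Without the `close()` call, every cycle leaves one scheduler thread behind, which over hours of stats collection would be consistent with the "unable to create new native thread" error in the commit message.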
