Skip to content

Commit 04cd2df

Browse files
committed
pool: enhance migration pool choice logic
Motivation: The pool destination choice/prioritization should take into account accessibility, existing load, hostname, and more when replicating to mitigate hot file issues. Modification: Create `PoolListByPoolMgrQuery` and utilize in `MigrationModule.reportFileRequest` to ensure that destination pools are accessible (at least) to the file request that triggered the migration. Result: Hot file migration destination pools are now prioritized, and match network and protocol requirements of the file request that triggered the migration. Acked-by: Dmitry Litvintsev Patch: https://rb.dcache.org/r/14629/diff/raw Commit: Target: master Request: 11.2 Require-book: no Require-notes: yes
1 parent 5a227a2 commit 04cd2df

7 files changed

Lines changed: 950 additions & 13 deletions

File tree

docs/dev/hotfile-replication.md

Lines changed: 173 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,173 @@
1+
# Hot File Replication
2+
3+
Hot file replication automatically creates additional replicas of files that are being read
4+
frequently (i.e. "hot" files), distributing load across the pool fleet.
5+
6+
## How It Works
7+
8+
### Protocol-Agnostic Design
9+
10+
All dCache read protocols (DCap, NFS/pNFS, WebDAV, xrootd, FTP, HTTP, pool-to-pool) send a
11+
`PoolIoFileMessage` to the pool when a client requests a file. The pool dispatches that message
12+
to `PoolV4.ioFile()`, which is therefore a single, protocol-independent entry point for every
13+
read request. Hot file monitoring is implemented at that point:
14+
15+
```
16+
Any Door (DCap / NFS / WebDAV / xrootd / FTP / HTTP / …)
17+
→ sends PoolIoFileMessage
18+
→ Pool.messageArrived(PoolIoFileMessage)
19+
→ PoolV4.ioFile() ← monitoring happens here
20+
→ queues mover (protocol-specific)
21+
→ FileRequestMonitor.reportFileRequest(pnfsId, currentCount, protocolInfo)
22+
```
23+
24+
Because the counting and triggering happen in `PoolV4.ioFile()`, no protocol-specific code
25+
changes are required to benefit from this feature.
26+
27+
### Request Counting
28+
29+
`PoolV4.ioFile()` reads the concurrent mover count for the file from `IoQueueManager`:
30+
31+
```java
32+
long requestCount = _ioQueue.numberOfRequestsFor(message.getPnfsId());
33+
_fileRequestMonitor.reportFileRequest(message.getPnfsId(), requestCount,
34+
message.getProtocolInfo());
35+
```
36+
37+
When `requestCount` reaches or exceeds the configured threshold,
38+
`MigrationModule.reportFileRequest()` creates a migration job named
39+
`hotfile-<pnfsId>` that replicates the file to additional pools.
40+
41+
### Pool Selection
42+
43+
The migration job selects target pools by querying PoolManager via `PoolMgrQueryPoolsMsg`,
44+
deriving `protocolUnit` from the triggering request's `ProtocolInfo` (e.g., `"DCap/3"`) and
45+
`netUnitName` from the client's IP address when available (e.g., `"192.168.1.10"`). When the
46+
client IP is not available (non-IP protocol or unknown), an empty string is used for `netUnitName`,
47+
which causes PoolManager to match any network unit. When `ProtocolInfo` is null (e.g., for
48+
internal pool-to-pool transfers), `protocolUnit` falls back to `"*/*"` and `netUnitName` to `""`
49+
so that selection is based solely on the file's storage group and pool-group read preferences.
50+
51+
`PoolMgrQueryPoolsMsg.getPools()` returns a `List<String>[]` where index 0 is the highest
52+
read-preference level. `PoolListByPoolMgrQuery` selects **only** the first non-empty
53+
preference level, so the file is always replicated to the best available pools:
54+
55+
```java
56+
// Only take the first non-empty preference level (highest read preference)
57+
for (int i = 0; i < poolLists.length; i++) {
58+
List<String> poolList = poolLists[i];
59+
if (poolList != null && !poolList.isEmpty()) {
60+
selectedPools = poolList;
61+
break;
62+
}
63+
}
64+
```
65+
66+
Prior to this, all preference levels were flattened into a union, causing files to be
67+
replicated to pools from lower-preference groups (e.g. flush pools) instead of the intended
68+
read-only pools.
69+
70+
### Job Housekeeping
71+
72+
To prevent unbounded memory growth, `MigrationModule` keeps at most 50 hotfile jobs. When a
73+
new job would exceed that limit, the oldest jobs that have reached a terminal state
74+
(`FINISHED`, `CANCELLED`, `FAILED`) are pruned first.
75+
76+
## Configuration
77+
78+
| Property | Default | Description |
79+
|---|---|---|
80+
| `pool.hotfile.replication.enable` | `false` | Enable/disable hot file monitoring. **Must be `true` to activate.** |
81+
| `pool.migration.hotfile.threshold` | `50` | Number of concurrent read movers required to trigger replication |
82+
| `pool.migration.hotfile.replicas` | `1` | Number of additional replicas to create |
83+
| `pool.migration.concurrency.default` | `1` | Number of files the migration job migrates concurrently |
84+
85+
Example (`dcache.conf` or pool layout file):
86+
87+
```ini
88+
pool.hotfile.replication.enable = true
89+
pool.migration.hotfile.threshold = 3
90+
pool.migration.hotfile.replicas = 3
91+
pool.migration.concurrency.default = 1
92+
```
93+
94+
> **Note:** The feature is disabled by default. A pool restart is required after any
95+
> configuration change.
96+
97+
## Key Source Files
98+
99+
| File | Role |
100+
|---|---|
101+
| `modules/dcache/src/main/java/org/dcache/pool/classic/PoolV4.java` | Entry point; checks enable flag, calls `FileRequestMonitor` |
102+
| `modules/dcache/src/main/java/org/dcache/pool/migration/MigrationModule.java` | Implements `FileRequestMonitor`; counts requests, creates and manages migration jobs |
103+
| `modules/dcache/src/main/java/org/dcache/pool/migration/PoolListByPoolMgrQuery.java` | Queries PoolManager for eligible target pools; selects highest-preference level only |
104+
| `modules/dcache/src/test/java/org/dcache/pool/classic/HotfileMonitoringTest.java` | Spring-context integration test for enable/disable behaviour |
105+
| `modules/dcache/src/test/java/org/dcache/pool/migration/MigrationModuleTest.java` | Unit tests for `reportFileRequest`, threshold, housekeeping |
106+
| `modules/dcache/src/test/java/org/dcache/pool/migration/PoolListByPoolMgrQueryTest.java` | Unit tests for pool selection, preference-level handling, unknown net unit, and wildcard protocol |
107+
| `skel/share/defaults/pool.properties` | Canonical defaults for all `pool.hotfile.*` and `pool.migration.hotfile.*` properties |
108+
109+
## Diagnostics
110+
111+
### Log Messages
112+
113+
With the default log level the following INFO messages are emitted by `MigrationModule`:
114+
115+
```
116+
Hot file monitoring: pnfsId=<id>, requests=<n>, threshold=<t>
117+
Hot file detected! Triggering replication for pnfsId=<id>
118+
Created migration job with id hotfile-<id> for pnfsId <id> with <n> replicas and concurrency <c>
119+
Starting migration job hotfile-<id> for pnfsId <id>
120+
Successfully started migration job hotfile-<id> for pnfsId <id>
121+
Job hotfile-<id> already exists with state <STATE>
122+
```
123+
124+
`PoolV4` emits at INFO:
125+
126+
```
127+
PoolV4.ioFile: Received IO request for pnfsId=<id>, hotFileEnabled=<bool>, monitorSet=<bool>
128+
PoolV4.ioFile: Calling reportFileRequest for pnfsId=<id>, count=<n>
129+
```
130+
131+
And at ERROR if the monitor is not wired:
132+
133+
```
134+
PoolV4.ioFile: Hot file replication enabled but FileRequestMonitor is NULL!
135+
```
136+
137+
`PoolListByPoolMgrQuery` emits at DEBUG when a preference level is selected:
138+
139+
```
140+
Selected preference level <i> with <n> pools for <pnfsId>: [pool1, pool2, …]
141+
```
142+
143+
### Runtime Log Level Adjustment
144+
145+
```
146+
# In the pool's admin shell
147+
log set org.dcache.pool.classic.PoolV4 DEBUG
148+
log set org.dcache.pool.migration.MigrationModule DEBUG
149+
```
150+
151+
### Checking Job and Replica Status
152+
153+
```bash
154+
# List active migration jobs on all pools
155+
ssh -p <admin-port> admin@<host> '\s <pool-pattern> migration ls'
156+
157+
# Inspect a specific job
158+
ssh -p <admin-port> admin@<host> '\s <pool-name> migration info hotfile-<pnfsId>'
159+
160+
# List replicas of a file
161+
ssh -p <admin-port> admin@<host> '\sl <pnfsId> rep ls <pnfsId>'
162+
```
163+
164+
### Interpreting Absence of Log Messages
165+
166+
| Symptom | Likely Cause |
167+
|---|---|
168+
| No `PoolV4.ioFile` messages | IO requests are not reaching the pool, or the feature is disabled (`hotFileEnabled=false`) |
169+
| `monitorSet=false` in PoolV4 log | `FileRequestMonitor` not wired — check Spring context startup errors |
170+
| `requests` stays at 1 | IoQueue not counting movers correctly |
171+
| "Hot file detected" but no job created | Exception during job creation — check ERROR lines for a stack trace |
172+
| Job created but not started | `MigrationModule` not started — run `migration start` in the admin interface |
173+
| "Job already exists" repeating | Previous job is stuck in a non-terminal state — inspect job state |

modules/dcache/src/main/java/org/dcache/pool/classic/FileRequestMonitor.java

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,8 @@
11
package org.dcache.pool.classic;
22

33
import diskCacheV111.util.PnfsId;
4+
import diskCacheV111.vehicles.ProtocolInfo;
5+
import javax.annotation.Nullable;
46

57
/**
68
* Abstract interface for monitoring file requests in the pool.
@@ -12,7 +14,9 @@ public interface FileRequestMonitor {
1214
*
1315
* @param pnfsId the file identifier
1416
* @param numberOfRequests the number of requests for this file
17+
* @param protocolInfo the protocol info of the request, may be {@code null} if unknown
1518
*/
16-
void reportFileRequest(PnfsId pnfsId, long numberOfRequests);
19+
void reportFileRequest(PnfsId pnfsId, long numberOfRequests,
20+
@Nullable ProtocolInfo protocolInfo);
1721
}
1822

modules/dcache/src/main/java/org/dcache/pool/classic/PoolV4.java

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -756,7 +756,8 @@ private void ioFile(CellMessage envelope, PoolIoFileMessage message) {
756756
message.getPnfsId());
757757
if (_hotFileReplicationEnabled) {
758758
_fileRequestMonitor.reportFileRequest(message.getPnfsId(),
759-
_ioQueue.numberOfRequestsFor(message.getPnfsId()));
759+
_ioQueue.numberOfRequestsFor(message.getPnfsId()),
760+
message.getProtocolInfo());
760761
}
761762
message.setSucceeded();
762763
} catch (OutOfDateCacheException e) {

modules/dcache/src/main/java/org/dcache/pool/migration/MigrationModule.java

Lines changed: 57 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,9 @@
1212
import diskCacheV111.util.PnfsId;
1313
import diskCacheV111.util.RetentionPolicy;
1414
import diskCacheV111.vehicles.PoolManagerGetPoolMonitor;
15+
import diskCacheV111.vehicles.IpProtocolInfo;
1516
import diskCacheV111.vehicles.PoolManagerPoolInformation;
17+
import diskCacheV111.vehicles.ProtocolInfo;
1618
import dmg.cells.nucleus.CellCommandListener;
1719
import dmg.cells.nucleus.CellInfoProvider;
1820
import dmg.cells.nucleus.CellLifeCycleAware;
@@ -25,6 +27,7 @@
2527
import dmg.util.command.Option;
2628
import java.io.PrintWriter;
2729
import java.io.StringWriter;
30+
import java.net.InetSocketAddress;
2831
import java.util.ArrayList;
2932
import java.util.Collection;
3033
import java.util.Collections;
@@ -40,6 +43,7 @@
4043
import java.util.function.Predicate;
4144
import java.util.regex.Matcher;
4245
import java.util.regex.Pattern;
46+
import javax.annotation.Nullable;
4347
import javax.annotation.concurrent.GuardedBy;
4448
import org.dcache.cells.CellStub;
4549
import org.dcache.pool.PoolDataBeanProvider;
@@ -57,6 +61,7 @@
5761
import org.dcache.util.expression.TypeMismatchException;
5862
import org.dcache.util.expression.UnknownIdentifierException;
5963
import org.dcache.util.pool.CostModuleTagProvider;
64+
import org.dcache.vehicles.FileAttributes;
6065
import org.parboiled.Parboiled;
6166
import org.parboiled.parserunners.ReportingParseRunner;
6267
import org.parboiled.support.ParsingResult;
@@ -1244,7 +1249,8 @@ public Object messageArrived(CellMessage envelope, PoolMigrationJobCancelMessage
12441249
* new job.
12451250
*/
12461251
@Override
1247-
public synchronized void reportFileRequest(PnfsId pnfsId, long numberOfRequests) {
1252+
public synchronized void reportFileRequest(PnfsId pnfsId, long numberOfRequests,
1253+
ProtocolInfo protocolInfo) {
12481254
if (numberOfRequests < hotFileThreshold) {
12491255
return;
12501256
}
@@ -1268,11 +1274,28 @@ public synchronized void reportFileRequest(PnfsId pnfsId, long numberOfRequests)
12681274
_context.getPoolManagerStub(),
12691275
Collections.singletonList(_context.getPoolName()));
12701276
sourceList.refresh();
1277+
1278+
// Get file attributes from repository for pool selection
1279+
CacheEntry cacheEntry;
1280+
try {
1281+
cacheEntry = _context.getRepository().getEntry(pnfsId);
1282+
} catch (Exception e) {
1283+
LOGGER.warn("Failed to get cache entry for {}: {}", pnfsId, e.getMessage(), e);
1284+
return;
1285+
}
1286+
FileAttributes fileAttributes = cacheEntry.getFileAttributes();
1287+
1288+
String protocolUnit = deriveProtocolUnit(protocolInfo);
1289+
String netUnitName = deriveNetUnitName(protocolInfo);
1290+
12711291
Collection<Pattern> excluded = new HashSet<>();
12721292
excluded.add(Pattern.compile(Pattern.quote(_context.getPoolName())));
12731293
RefreshablePoolList basePoolList = new PoolListFilter(
1274-
new PoolListByPoolGroupOfPool(_context.getPoolManagerStub(),
1275-
_context.getPoolName()),
1294+
new PoolListByPoolMgrQuery(_context.getPoolManagerStub(),
1295+
pnfsId,
1296+
fileAttributes,
1297+
protocolUnit,
1298+
netUnitName),
12761299
excluded,
12771300
FALSE_EXPRESSION,
12781301
Collections.emptySet(),
@@ -1454,6 +1477,37 @@ public boolean isActive(PnfsId id) {
14541477
return _context.isActive(id);
14551478
}
14561479

1480+
/**
1481+
* Returns the PSU protocol-unit string for the given request, e.g. {@code "DCap/3"} or
1482+
* {@code "xrootd/2.1"}. When {@code protocolInfo} is {@code null} (e.g. an internal
1483+
* pool-to-pool transfer), returns the PSU wildcard that matches any protocol unit.
1484+
*/
1485+
private static String deriveProtocolUnit(@Nullable ProtocolInfo protocolInfo) {
1486+
if (protocolInfo == null) {
1487+
return "*/*";
1488+
}
1489+
String unit = protocolInfo.getProtocol() + "/" + protocolInfo.getMajorVersion();
1490+
return protocolInfo.getMinorVersion() != 0
1491+
? unit + "." + protocolInfo.getMinorVersion()
1492+
: unit;
1493+
}
1494+
1495+
/**
1496+
* Returns the client IP address string for use as the PSU net-unit name, e.g.
1497+
* {@code "10.0.0.5"}. Returns an empty string whenever the address is unavailable
1498+
* (non-IP protocol, null socket address, or null {@code protocolInfo}), which causes
1499+
* the PSU to match any network unit.
1500+
*/
1501+
private static String deriveNetUnitName(@Nullable ProtocolInfo protocolInfo) {
1502+
if (!(protocolInfo instanceof IpProtocolInfo)) {
1503+
return "";
1504+
}
1505+
InetSocketAddress addr = ((IpProtocolInfo) protocolInfo).getSocketAddress();
1506+
return (addr != null && addr.getAddress() != null)
1507+
? addr.getAddress().getHostAddress()
1508+
: "";
1509+
}
1510+
14571511
// Hot file replication parameters
14581512
public int getNumReplicas() {
14591513
return hotFileReplicaCount;

0 commit comments

Comments
 (0)