3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -24,6 +24,9 @@ and this project adheres to
logging resumes after suppression, a warn-level summary reports the count of
suppressed messages. A new `rate_limited_log_count` metric tracks the total
number of suppressed messages.
- [#5789](https://github.com/firecracker-microvm/firecracker/pull/5789): Add
rate-limiter support to the virtio-pmem device to allow control over the I/O
bandwidth generated by FLUSH requests from the guest.

### Changed

3 changes: 2 additions & 1 deletion docs/device-api.md
@@ -107,6 +107,7 @@ specification:
| | path_on_host | O | O | O | O | O | O | O | **R** | O |
| | root_device | O | O | O | O | O | O | O | **R** | O |
| | read_only | O | O | O | O | O | O | O | **R** | O |
| | rate_limiter | O | O | O | O | O | O | O | **R** | O |
| `SerialConfig` | serial_out_path | O | O | O | O | O | O | O | O | O |
| | rate_limiter | O | O | O | O | O | O | O | O | O |
| `MemoryHotplugConfig` | total_size_mib | O | O | O | O | O | O | O | O | **R** |
@@ -118,7 +119,7 @@ specification:
either virtio-block or vhost-user-block devices.

\*\* The `TokenBucket` can be configured with any combination of virtio-net,
virtio-block, virtio-rng and serial devices.
virtio-block, virtio-pmem, virtio-rng and serial devices.

## Output Schema

103 changes: 99 additions & 4 deletions docs/pmem.md
@@ -157,6 +157,100 @@ VMs, which could be exploited as a side channel by an attacker inside the
microVM. Users that want to use `virtio-pmem` to share memory are encouraged to
carefully evaluate the security risk according to their threat model.

### Limiting `msync` write bandwidth

When a guest issues a flush request to the `virtio-pmem` device (a
`VIRTIO_PMEM_REQ_TYPE_FLUSH` request), Firecracker calls `msync(MS_SYNC)` on the
backing file to persist dirty pages to disk. A malicious guest can issue a high
volume of flush requests, causing excessive host I/O usage.

There are two ways to mitigate this:

#### Firecracker rate limiter

The `virtio-pmem` device supports a built-in rate limiter, identical to the one
available for block devices. It throttles flush requests using two token
buckets:

- `bandwidth` — limits the total number of bytes passed to `msync` per refill
interval. Each flush consumes tokens equal to the **full backing file size**,
because `msync` is called over the entire mapped region. For example, with a
256 MiB backing file and `size` set to `268435456` (256 MiB), at most one
flush is allowed per `refill_time` milliseconds.
- `ops` — limits the number of `msync` calls per refill interval (after
coalescing multiple flush requests within a single queue notification into one
call).

The rate limiter can be configured at device creation time. The following
example allows at most 1 flush per second for a 256 MiB backing file
(`bandwidth.size` = 256 MiB = 268435456 bytes), and at most 10 `msync`
operations per second:

```json
"pmem": [
{
"id": "pmem0",
"path_on_host": "./backing_file_256m",
"rate_limiter": {
"bandwidth": { "size": 268435456, "refill_time": 1000 },
"ops": { "size": 10, "refill_time": 1000 }
}
}
]
```

It can also be updated at runtime via the API:

```console
curl --unix-socket $socket_location -i \
-X PATCH 'http://localhost/pmem/pmem0' \
-H 'Content-Type: application/json' \
-d '{
"id": "pmem0",
"rate_limiter": {
"bandwidth": { "size": 268435456, "refill_time": 1000 },
"ops": { "size": 10, "refill_time": 1000 }
}
}'
```

> [!NOTE]
>
> Since each flush always costs exactly one op and exactly `file_size` bytes,
> the `bandwidth` and `ops` buckets are correlated: setting `bandwidth.size` to
> `file_size` with a given `refill_time` is equivalent to setting `ops.size` to
> `1` with the same `refill_time` — both allow one flush per interval. In
> practice, configuring only one of the two buckets is sufficient. Use `ops` for
> a simple "N flushes per interval" limit, or `bandwidth` if you want to express
> the limit in terms of I/O throughput.

#### Cgroup v2 IO controller

Alternatively, the **cgroup v2 IO controller** can throttle write bandwidth on
the block device that hosts the `virtio-pmem` backing file:

```bash
# Identify the block device MAJOR:MINOR for the backing file
# (the shift/mask below assumes the traditional 8:8 dev_t encoding;
# cross-check the result with `lsblk` on the host)
dev=$(stat -c '%d' /path/to/backing_file)
echo "$((dev >> 8)):$((dev & 0xff))"

# Enable the io controller
echo "+io" | sudo tee /sys/fs/cgroup/<vm_cgroup>/cgroup.subtree_control

# Limit write bandwidth (e.g. 10 MB/s) on device MAJOR:MINOR
echo "MAJOR:MINOR wbps=10485760" | sudo tee /sys/fs/cgroup/<vm_cgroup>/io.max
```

> [!NOTE]
>
> - This requires **cgroup v2** with a filesystem that supports cgroup-aware
> writeback (e.g. ext4, btrfs).
> - The limit applies to all I/O from the cgroup to that device, not only
> `msync` flushes.
> - When using the [Jailer](jailer.md), the Firecracker process is already
> placed in a cgroup. You can configure `io.max` on that cgroup before
> starting the microVM.

## Snapshot support

`virtio-pmem` works with the snapshot functionality of Firecracker. Snapshot will
@@ -184,10 +278,11 @@ if `virtio-pmem` is used for memory sharing.

## Memory usage

> [!NOTE] `virtio-pmem` memory can be paged out by the host, because it is
> backed by a file with `MAP_SHARED` mapping type. To prevent this from
> happening, you can use `vmtouch` or similar tool to lock file pages from being
> evicted.
> [!NOTE]
>
> `virtio-pmem` memory can be paged out by the host, because it is backed by a
> file with `MAP_SHARED` mapping type. To prevent this from happening, you can
> use `vmtouch` or similar tool to lock file pages from being evicted.

`virtio-pmem` resides in host memory and does increase the maximum possible
memory usage of a VM, since the VM can now use all of its RAM and access all of the
3 changes: 2 additions & 1 deletion src/firecracker/src/api_server/parsed_request.rs
@@ -24,7 +24,7 @@ use super::request::machine_configuration::{
use super::request::metrics::parse_put_metrics;
use super::request::mmds::{parse_get_mmds, parse_patch_mmds, parse_put_mmds};
use super::request::net::{parse_patch_net, parse_put_net};
use super::request::pmem::parse_put_pmem;
use super::request::pmem::{parse_patch_pmem, parse_put_pmem};
use super::request::snapshot::{parse_patch_vm_state, parse_put_snapshot};
use super::request::version::parse_get_version;
use super::request::vsock::parse_put_vsock;
@@ -120,6 +120,7 @@ impl TryFrom<&Request> for ParsedRequest {
(Method::Patch, "network-interfaces", Some(body)) => {
parse_patch_net(body, path_tokens.next())
}
(Method::Patch, "pmem", Some(body)) => parse_patch_pmem(body, path_tokens.next()),
(Method::Patch, "vm", Some(body)) => parse_patch_vm_state(body),
(Method::Patch, "hotplug", Some(body)) if path_tokens.next() == Some("memory") => {
parse_patch_memory_hotplug(body)
63 changes: 61 additions & 2 deletions src/firecracker/src/api_server/request/pmem.rs
@@ -3,7 +3,7 @@

use vmm::logger::{IncMetric, METRICS};
use vmm::rpc_interface::VmmAction;
use vmm::vmm_config::pmem::PmemConfig;
use vmm::vmm_config::pmem::{PmemConfig, PmemDeviceUpdateConfig};

use super::super::parsed_request::{ParsedRequest, RequestError, checked_id};
use super::{Body, StatusCode};
@@ -37,6 +37,36 @@ pub(crate) fn parse_put_pmem(
}
}

pub(crate) fn parse_patch_pmem(
> **@kalyazin** (Contributor, Apr 9, 2026): we seem to have unittests for
> `parse_put_pmem`, but not for `parse_patch_pmem`
>
> **Author**: fixed
body: &Body,
id_from_path: Option<&str>,
) -> Result<ParsedRequest, RequestError> {
METRICS.patch_api_requests.pmem_count.inc();
let id = if let Some(id) = id_from_path {
checked_id(id)?
} else {
METRICS.patch_api_requests.pmem_fails.inc();
return Err(RequestError::EmptyID);
};

let update_cfg =
serde_json::from_slice::<PmemDeviceUpdateConfig>(body.raw()).inspect_err(|_| {
METRICS.patch_api_requests.pmem_fails.inc();
})?;

if id == update_cfg.id {
Ok(ParsedRequest::new_sync(VmmAction::UpdatePmemDevice(
update_cfg,
)))
} else {
METRICS.patch_api_requests.pmem_fails.inc();
Err(RequestError::Generic(
StatusCode::BadRequest,
"The id from the path does not match the id from the body!".to_string(),
))
}
}

#[cfg(test)]
mod tests {
use super::*;
@@ -60,7 +90,8 @@ mod tests {
"id": "1000",
"path_on_host": "dummy",
"root_device": true,
"read_only": true
"read_only": true,
"rate_limiter": {}
}"#;
let r = vmm_action_from_request(parse_put_pmem(&Body::new(body), Some("1000")).unwrap());

@@ -69,7 +100,35 @@
path_on_host: "dummy".to_string(),
root_device: true,
read_only: true,
rate_limiter: Some(Default::default()),
};
assert_eq!(r, VmmAction::InsertPmemDevice(expected_config));
}

#[test]
fn test_parse_patch_pmem_request() {
parse_patch_pmem(&Body::new("invalid_payload"), None).unwrap_err();
parse_patch_pmem(&Body::new("invalid_payload"), Some("id")).unwrap_err();

let body = r#"{
"id": "bar",
}"#;
parse_patch_pmem(&Body::new(body), Some("1")).unwrap_err();
let body = r#"{
"foo": "1",
}"#;
parse_patch_pmem(&Body::new(body), Some("1")).unwrap_err();

let body = r#"{
"id": "1000",
"rate_limiter": {}
}"#;
let r = vmm_action_from_request(parse_patch_pmem(&Body::new(body), Some("1000")).unwrap());

let expected_config = PmemDeviceUpdateConfig {
id: "1000".to_string(),
rate_limiter: Some(Default::default()),
};
assert_eq!(r, VmmAction::UpdatePmemDevice(expected_config));
}
}
44 changes: 44 additions & 0 deletions src/firecracker/swagger/firecracker.yaml
@@ -370,6 +370,35 @@ paths:
description: Internal server error.
schema:
$ref: "#/definitions/Error"
patch:
summary: Updates the rate limiter of a pmem device. Post-boot only.
description:
Updates the rate limiter applied to the pmem device with the ID specified
by the id path parameter.
operationId: patchGuestPmemByID
parameters:
- name: id
in: path
description: The id of the guest pmem device
required: true
type: string
- name: body
in: body
description: Pmem rate limiter properties
required: true
> **@kalyazin** (Contributor): does it mean that users can't remove the rate
> limiter later on? Other devices seem to allow that.
>
> **Author**: What do you mean? The description here seems to be the same as for
> other devices. Pmem logic for updating the rate limiter is the same as for
> other devices.
>
> **@kalyazin**: What I mean is, for example, block doesn't make the rate
> limiter required:
>
>       PartialDrive:
>         type: object
>         required:
>           - drive_id
>         properties:
>           drive_id:
>             type: string
>           path_on_host:
>             type: string
>             description:
>               Host level path for the guest drive.
>               This field is optional for virtio-block config and should be omitted for vhost-user-block configuration.
>           rate_limiter:
>             $ref: "#/definitions/RateLimiter"
>
> while pmem does:
>
>       PartialPmem:
>         type: object
>         description:
>           Defines a partial pmem device structure, used to update the rate limiter
>           for that device, after microvm start.
>         required:
>           - id
>           - rate_limiter <==== here
>         properties:
>           id:
>             type: string
>           rate_limiter:
>             $ref: "#/definitions/RateLimiter"
>
> **Author**: oh, this is simply because the rate limiter is the only thing that
> can be changed. There is no reason to PATCH it if not to change the rate
> limiter. For other devices I was looking at the ...UpdateConfig types, and
> since there are always a couple of things that can change there it makes sense
> to make them optional. Do you think we should allow empty PATCH requests for
> pmem?
>
> **@kalyazin**: I'm just thinking it could be the only way to disable the rate
> limiter later on. Not sure if it's a common use case though. Would probably
> make sense to keep it on par with other devices in that sense.
>
> **Author**: ok, made it optional
schema:
$ref: "#/definitions/PartialPmem"
responses:
204:
description: Pmem device updated
400:
description: Pmem device cannot be updated due to bad input
schema:
$ref: "#/definitions/Error"
default:
description: Internal server error.
schema:
$ref: "#/definitions/Error"

/logger:
put:
@@ -1255,6 +1284,8 @@ definitions:
type: boolean
description:
Flag to map backing file in read-only mode.
rate_limiter:
$ref: "#/definitions/RateLimiter"

Error:
type: object
@@ -1538,6 +1569,19 @@ definitions:
tx_rate_limiter:
$ref: "#/definitions/RateLimiter"

PartialPmem:
type: object
description:
Defines a partial pmem device structure, used to update the rate limiter
for that device, after microvm start.
required:
- id
properties:
id:
type: string
rate_limiter:
$ref: "#/definitions/RateLimiter"

RateLimiter:
type: object
description:
1 change: 1 addition & 0 deletions src/vmm/src/builder.rs
@@ -1315,6 +1315,7 @@ pub(crate) mod tests {
path_on_host: "".into(),
root_device: true,
read_only: true,
..Default::default()
}];
let mut vmm = default_vmm();
let mut cmdline = default_kernel_cmdline();
4 changes: 3 additions & 1 deletion src/vmm/src/device_manager/pci_mngr.rs
@@ -705,6 +705,7 @@ mod tests {
path_on_host: "".into(),
root_device: true,
read_only: true,
..Default::default()
}];
_pmem_files =
insert_pmem_devices(&mut vmm, &mut cmdline, &mut event_manager, pmem_configs);
@@ -812,7 +813,8 @@
"id": "pmem",
"path_on_host": "{}",
"root_device": true,
"read_only": true
"read_only": true,
"rate_limiter": null
}}
],
"memory-hotplug": {{
4 changes: 3 additions & 1 deletion src/vmm/src/device_manager/persist.rs
@@ -739,6 +739,7 @@ mod tests {
path_on_host: "".into(),
root_device: true,
read_only: true,
..Default::default()
}];
_pmem_files =
insert_pmem_devices(&mut vmm, &mut cmdline, &mut event_manager, pmem_configs);
@@ -843,7 +844,8 @@
"id": "pmem",
"path_on_host": "{}",
"root_device": true,
"read_only": true
"read_only": true,
"rate_limiter": null
}}
],
"memory-hotplug": {{