Commit f2ccfbf

Merge: x86/vmscape: Add conditional IBPB mitigation
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/7401
JIRA: https://issues.redhat.com/browse/RHEL-114273
CVE: CVE-2025-40300

VMSCAPE is a vulnerability that exploits insufficient branch predictor isolation between a guest and a userspace hypervisor (like QEMU). Existing mitigations already protect kernel/KVM from a malicious guest. Userspace can additionally be protected by flushing the branch predictors after a VMexit.

Since it is the userspace that consumes the poisoned branch predictors, conditionally issue an IBPB after a VMexit and before returning to userspace. Workloads that frequently switch between hypervisor and userspace will incur the most overhead from the new IBPB.

More information about this new CPU vulnerability can be found in the "VMSCAPE: Exposing and Exploiting Incomplete Branch Predictor Isolation in Cloud Environments" paper [1].

[1] https://comsec-files.ethz.ch/papers/vmscape_sp26.pdf

Signed-off-by: Waiman Long <longman@redhat.com>
Approved-by: Mark Langsdorf <mlangsdo@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Patrick Talbert <ptalbert@redhat.com>
2 parents 4f51d7b + 2d722e5 commit f2ccfbf

16 files changed

+429
-132
lines changed

Documentation/ABI/testing/sysfs-devices-system-cpu

Lines changed: 1 addition & 0 deletions
@@ -533,6 +533,7 @@ What: /sys/devices/system/cpu/vulnerabilities
533533
/sys/devices/system/cpu/vulnerabilities/srbds
534534
/sys/devices/system/cpu/vulnerabilities/tsa
535535
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort
536+
/sys/devices/system/cpu/vulnerabilities/vmscape
536537
Date: January 2018
537538
Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
538539
Description: Information about CPU vulnerabilities

Documentation/admin-guide/hw-vuln/index.rst

Lines changed: 1 addition & 0 deletions
@@ -24,3 +24,4 @@ are configurable at compile, boot or run time.
2424
reg-file-data-sampling
2525
rsb
2626
indirect-target-selection
27+
vmscape

Documentation/admin-guide/hw-vuln/vmscape.rst

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
1+
.. SPDX-License-Identifier: GPL-2.0
2+
3+
VMSCAPE
4+
=======
5+
6+
VMSCAPE is a vulnerability that may allow a guest to influence the branch
7+
prediction in host userspace. It particularly affects hypervisors like QEMU.
8+
9+
Even if a hypervisor may not have any sensitive data like disk encryption keys,
10+
guest-userspace may be able to attack the guest-kernel using the hypervisor as
11+
a confused deputy.
12+
13+
Affected processors
14+
-------------------
15+
16+
The following CPU families are affected by VMSCAPE:
17+
18+
**Intel processors:**
19+
- Skylake generation (Parts without Enhanced-IBRS)
20+
- Cascade Lake generation (Parts affected by ITS guest/host separation)
21+
- Alder Lake and newer (Parts affected by BHI)
22+
23+
Note that BHI-affected parts that use the BHB clearing software mitigation, e.g.
24+
Icelake, are not vulnerable to VMSCAPE.
25+
26+
**AMD processors:**
27+
- Zen series (families 0x17, 0x19, 0x1a)
28+
29+
**Hygon processors:**
30+
- Family 0x18
31+
32+
Mitigation
33+
----------
34+
35+
Conditional IBPB
36+
----------------
37+
38+
The kernel tracks when a CPU has run a potentially malicious guest and issues an
39+
IBPB before the first exit to userspace after VM-exit. If userspace did not run
40+
between VM-exit and the next VM-entry, no IBPB is issued.
41+
42+
Note that the existing userspace mitigations against Spectre-v2 are effective in
43+
protecting the userspace. They are insufficient to protect the userspace VMMs
44+
from a malicious guest. This is because Spectre-v2 mitigations are applied at
45+
context switch time, while the userspace VMM can run after a VM-exit without a
46+
context switch.
47+
48+
Vulnerability enumeration and mitigation are not applied inside a guest. This is
49+
because nested hypervisors should already be deploying IBPB to isolate
50+
themselves from nested guests.
51+
52+
SMT considerations
53+
------------------
54+
55+
When Simultaneous Multi-Threading (SMT) is enabled, hypervisors can be
56+
vulnerable to cross-thread attacks. For complete protection against VMSCAPE
57+
attacks in SMT environments, STIBP should be enabled.
58+
59+
The kernel will issue a warning if SMT is enabled without adequate STIBP
60+
protection. The warning is not issued when:
61+
62+
- SMT is disabled
63+
- STIBP is enabled system-wide
64+
- Intel eIBRS is enabled (which implies STIBP protection)
65+
66+
System information and options
67+
------------------------------
68+
69+
The sysfs file showing VMSCAPE mitigation status is:
70+
71+
/sys/devices/system/cpu/vulnerabilities/vmscape
72+
73+
The possible values in this file are:
74+
75+
* 'Not affected':
76+
77+
The processor is not vulnerable to VMSCAPE attacks.
78+
79+
* 'Vulnerable':
80+
81+
The processor is vulnerable and no mitigation has been applied.
82+
83+
* 'Mitigation: IBPB before exit to userspace':
84+
85+
Conditional IBPB mitigation is enabled. The kernel tracks when a CPU has
86+
run a potentially malicious guest and issues an IBPB before the first
87+
exit to userspace after VM-exit.
88+
89+
* 'Mitigation: IBPB on VMEXIT':
90+
91+
IBPB is issued on every VM-exit. This occurs when other mitigations like
92+
RETBLEED or SRSO are already issuing IBPB on VM-exit.
93+
94+
Mitigation control on the kernel command line
95+
----------------------------------------------
96+
97+
The mitigation can be controlled via the ``vmscape=`` command line parameter:
98+
99+
* ``vmscape=off``:
100+
101+
Disable the VMSCAPE mitigation.
102+
103+
* ``vmscape=ibpb``:
104+
105+
Enable conditional IBPB mitigation (default when CONFIG_MITIGATION_VMSCAPE=y).
106+
107+
* ``vmscape=force``:
108+
109+
Force vulnerability detection and mitigation even on processors that are
110+
not known to be affected.
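
The status strings listed in the documentation above can be checked from userspace. The following is a minimal sketch, assuming the file contains exactly one of the four documented strings followed by a newline. The enum, the function names, and the `VMSCAPE_UNKNOWN` catch-all are hypothetical and are not part of the kernel.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical labels for the four status strings documented above. */
enum vmscape_status {
	VMSCAPE_NOT_AFFECTED,
	VMSCAPE_VULNERABLE,
	VMSCAPE_IBPB_EXIT_TO_USER,
	VMSCAPE_IBPB_ON_VMEXIT,
	VMSCAPE_UNKNOWN,	/* catch-all for this sketch, not a documented value */
};

/* Map one line of the sysfs file to a status; a trailing newline is ignored. */
static enum vmscape_status vmscape_classify(const char *line)
{
	static const struct {
		const char *text;
		enum vmscape_status status;
	} known[] = {
		{ "Not affected", VMSCAPE_NOT_AFFECTED },
		{ "Vulnerable", VMSCAPE_VULNERABLE },
		{ "Mitigation: IBPB before exit to userspace", VMSCAPE_IBPB_EXIT_TO_USER },
		{ "Mitigation: IBPB on VMEXIT", VMSCAPE_IBPB_ON_VMEXIT },
	};
	size_t len = strcspn(line, "\n");
	size_t i;

	/* Exact match only: any extra text is reported as unknown. */
	for (i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
		if (strlen(known[i].text) == len &&
		    strncmp(line, known[i].text, len) == 0)
			return known[i].status;
	}
	return VMSCAPE_UNKNOWN;
}

/* Read the first line of the given file and classify it. */
static enum vmscape_status vmscape_read_status(const char *path)
{
	char buf[128];
	FILE *f = fopen(path, "r");

	if (!f)
		return VMSCAPE_UNKNOWN;	/* e.g. a kernel without this sysfs entry */
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return VMSCAPE_UNKNOWN;
	}
	fclose(f);
	return vmscape_classify(buf);
}
```

A caller would pass `/sys/devices/system/cpu/vulnerabilities/vmscape` to `vmscape_read_status()`. The match is deliberately strict, so a kernel that appends further detail to a status string would be reported as unknown rather than misclassified.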

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 11 additions & 0 deletions
@@ -3435,6 +3435,7 @@
34353435
srbds=off [X86,INTEL]
34363436
ssbd=force-off [ARM64]
34373437
tsx_async_abort=off [X86]
3438+
vmscape=off [X86]
34383439

34393440
Exceptions:
34403441
This does not have any effect on
@@ -7152,6 +7153,16 @@
71527153
vmpoff= [KNL,S390] Perform z/VM CP command after power off.
71537154
Format: <command>
71547155

7156+
vmscape= [X86] Controls mitigation for VMscape attacks.
7157+
VMscape attacks can leak information from a userspace
7158+
hypervisor to a guest via speculative side-channels.
7159+
7160+
off - disable the mitigation
7161+
ibpb - use Indirect Branch Prediction Barrier
7162+
(IBPB) mitigation (default)
7163+
force - force vulnerability detection even on
7164+
unaffected processors
7165+
71557166
vsyscall= [X86-64]
71567167
Controls the behavior of vsyscalls (i.e. calls to
71577168
fixed addresses of 0xffffffffff600x00 from legacy
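
The three values accepted by ``vmscape=`` can be illustrated with a small userspace parser. This is only a sketch of the documented option set, not the kernel's own parsing code, which is not shown in this excerpt. The enum and function names are hypothetical.

```c
#include <string.h>

/* Hypothetical labels for the documented vmscape= values. */
enum vmscape_opt {
	VMSCAPE_OPT_OFF,	/* "off": disable the mitigation */
	VMSCAPE_OPT_IBPB,	/* "ibpb": IBPB mitigation (documented default) */
	VMSCAPE_OPT_FORCE,	/* "force": force detection on unaffected CPUs */
	VMSCAPE_OPT_INVALID,	/* anything else; handling is an assumption here */
};

/* Map the text after "vmscape=" to one of the documented options. */
static enum vmscape_opt vmscape_parse_opt(const char *value)
{
	if (strcmp(value, "off") == 0)
		return VMSCAPE_OPT_OFF;
	if (strcmp(value, "ibpb") == 0)
		return VMSCAPE_OPT_IBPB;
	if (strcmp(value, "force") == 0)
		return VMSCAPE_OPT_FORCE;
	return VMSCAPE_OPT_INVALID;
}
```

How the kernel reacts to an unrecognized value is not stated in the text above, so the sketch only reports it as invalid.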

arch/arm64/kernel/syscall.c

Lines changed: 7 additions & 9 deletions
@@ -53,17 +53,15 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
5353
syscall_set_return_value(current, regs, 0, ret);
5454

5555
/*
56-
* Ultimately, this value will get limited by KSTACK_OFFSET_MAX(),
57-
* but not enough for arm64 stack utilization comfort. To keep
58-
* reasonable stack head room, reduce the maximum offset to 9 bits.
56+
* This value will get limited by KSTACK_OFFSET_MAX(), which is 10
57+
* bits. The actual entropy will be further reduced by the compiler
58+
* when applying stack alignment constraints: the AAPCS mandates a
59+
* 16-byte aligned SP at function boundaries, which will remove the
60+
* 4 low bits from any entropy chosen here.
5961
*
60-
* The actual entropy will be further reduced by the compiler when
61-
* applying stack alignment constraints: the AAPCS mandates a
62-
* 16-byte (i.e. 4-bit) aligned SP at function boundaries.
63-
*
64-
* The resulting 5 bits of entropy is seen in SP[8:4].
62+
* The resulting 6 bits of entropy is seen in SP[9:4].
6563
*/
66-
choose_random_kstack_offset(get_random_int() & 0x1FF);
64+
choose_random_kstack_offset(get_random_u16());
6765
}
6866

6967
static inline bool has_syscall_work(unsigned long flags)
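
The entropy figures in the updated comment follow from simple bit arithmetic: a 10-bit cap minus the low bits removed by stack alignment. The sketch below reproduces that arithmetic in userspace C. It assumes, as the comment states, that KSTACK_OFFSET_MAX() keeps 10 bits, and it models alignment as clearing the low bits of the offset. The macro and function names are hypothetical.

```c
/* Assumption taken from the comment: KSTACK_OFFSET_MAX() keeps 10 bits. */
#define SKETCH_KSTACK_OFFSET_BITS 10u

/* Bits of entropy left after alignment removes the low align_bits bits. */
static unsigned int kstack_entropy_bits(unsigned int align_bits)
{
	if (align_bits >= SKETCH_KSTACK_OFFSET_BITS)
		return 0;
	return SKETCH_KSTACK_OFFSET_BITS - align_bits;
}

/*
 * Apply both limits to a raw random value: keep the low 10 bits, then
 * clear the bits that alignment would discard.
 */
static unsigned int kstack_effective_offset(unsigned int raw,
					    unsigned int align_bits)
{
	unsigned int masked = raw & ((1u << SKETCH_KSTACK_OFFSET_BITS) - 1);

	return masked & ~((1u << align_bits) - 1);
}
```

With 4 alignment bits (arm64, 16-byte SP alignment) this gives 6 bits, matching SP[9:4]. The x86 comment in this same commit uses 3 bits for x86_64 and 2 for ia32, giving 7 and 8 bits.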

arch/s390/include/asm/entry-common.h

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@ static __always_inline void arch_exit_to_user_mode(void)
5555
static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
5656
unsigned long ti_work)
5757
{
58-
choose_random_kstack_offset(get_tod_clock_fast() & 0xff);
58+
choose_random_kstack_offset(get_tod_clock_fast());
5959
}
6060

6161
#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare

arch/x86/Kconfig

Lines changed: 9 additions & 0 deletions
@@ -2756,6 +2756,15 @@ config MITIGATION_TSA
27562756
security vulnerability on AMD CPUs which can lead to forwarding of
27572757
invalid info to subsequent instructions and thus can affect their
27582758
timing and thereby cause a leakage.
2759+
2760+
config MITIGATION_VMSCAPE
2761+
bool "Mitigate VMSCAPE"
2762+
depends on KVM
2763+
default y
2764+
help
2765+
Enable mitigation for VMSCAPE attacks. VMSCAPE is a hardware security
2766+
vulnerability on Intel and AMD CPUs that may allow a guest to do
2767+
Spectre v2 style attacks on userspace hypervisor.
27592768
endif
27602769

27612770
config ARCH_HAS_ADD_PAGES

arch/x86/include/asm/cpufeatures.h

Lines changed: 2 additions & 0 deletions
@@ -491,6 +491,7 @@
491491
#define X86_FEATURE_TSA_SQ_NO (21*32+11) /* AMD CPU not vulnerable to TSA-SQ */
492492
#define X86_FEATURE_TSA_L1_NO (21*32+12) /* AMD CPU not vulnerable to TSA-L1 */
493493
#define X86_FEATURE_CLEAR_CPU_BUF_VM (21*32+13) /* Clear CPU buffers using VERW before VMRUN */
494+
#define X86_FEATURE_IBPB_EXIT_TO_USER (21*32+14) /* Use IBPB on exit-to-userspace, see VMSCAPE bug */
494495

495496
/*
496497
* BUG word(s)
@@ -546,4 +547,5 @@
546547
#define X86_BUG_ITS X86_BUG( 1*32+ 7) /* "its" CPU is affected by Indirect Target Selection */
547548
#define X86_BUG_ITS_NATIVE_ONLY X86_BUG( 1*32+ 8) /* "its_native_only" CPU is affected by ITS, VMX is not affected */
548549
#define X86_BUG_TSA X86_BUG( 1*32+ 9) /* "tsa" CPU is affected by Transient Scheduler Attacks */
550+
#define X86_BUG_VMSCAPE X86_BUG( 1*32+10) /* "vmscape" CPU is affected by VMSCAPE attacks from guests */
549551
#endif /* _ASM_X86_CPUFEATURES_H */

arch/x86/include/asm/entry-common.h

Lines changed: 13 additions & 9 deletions
@@ -73,19 +73,23 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
7373
#endif
7474

7575
/*
76-
* Ultimately, this value will get limited by KSTACK_OFFSET_MAX(),
77-
* but not enough for x86 stack utilization comfort. To keep
78-
* reasonable stack head room, reduce the maximum offset to 8 bits.
79-
*
80-
* The actual entropy will be further reduced by the compiler when
81-
* applying stack alignment constraints (see cc_stack_align4/8 in
76+
* This value will get limited by KSTACK_OFFSET_MAX(), which is 10
77+
* bits. The actual entropy will be further reduced by the compiler
78+
* when applying stack alignment constraints (see cc_stack_align4/8 in
8279
* arch/x86/Makefile), which will remove the 3 (x86_64) or 2 (ia32)
8380
* low bits from any entropy chosen here.
8481
*
85-
* Therefore, final stack offset entropy will be 5 (x86_64) or
86-
* 6 (ia32) bits.
82+
* Therefore, final stack offset entropy will be 7 (x86_64) or
83+
* 8 (ia32) bits.
8784
*/
88-
choose_random_kstack_offset(rdtsc() & 0xFF);
85+
choose_random_kstack_offset(rdtsc());
86+
87+
/* Avoid unnecessary reads of 'x86_ibpb_exit_to_user' */
88+
if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
89+
this_cpu_read(x86_ibpb_exit_to_user)) {
90+
indirect_branch_prediction_barrier();
91+
this_cpu_write(x86_ibpb_exit_to_user, false);
92+
}
8993
}
9094
#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
9195
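
The check added to arch_exit_to_user_mode_prepare() can be modeled in userspace to show why the barrier is conditional. The sketch below is a simulation, not kernel code. The struct and function names are hypothetical, and the point where the flag is set is an assumption, because the KVM side of the change is not part of this excerpt.

```c
#include <stdbool.h>

/* Userspace model of one CPU; all names are hypothetical stand-ins. */
struct cpu_model {
	bool feature_enabled;		/* models X86_FEATURE_IBPB_EXIT_TO_USER */
	bool ibpb_exit_to_user;		/* models per-CPU x86_ibpb_exit_to_user */
	unsigned int ibpb_count;	/* number of simulated barriers issued */
};

/*
 * Assumption: the flag is set once a guest has run on this CPU. The code
 * that sets it is not shown in this excerpt.
 */
static void model_vm_exit(struct cpu_model *cpu)
{
	cpu->ibpb_exit_to_user = true;
}

/* Mirrors the conditional added before returning to userspace. */
static void model_exit_to_user(struct cpu_model *cpu)
{
	if (cpu->feature_enabled && cpu->ibpb_exit_to_user) {
		cpu->ibpb_count++;	/* stands in for the IBPB itself */
		cpu->ibpb_exit_to_user = false;
	}
}
```

In this model, several VM-exits followed by one return to userspace cost a single barrier, and returns to userspace with no intervening guest run cost none. That matches the documented behavior that no IBPB is issued if userspace did not run between VM-exit and the next VM-entry.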

arch/x86/include/asm/nospec-branch.h

Lines changed: 2 additions & 0 deletions
@@ -545,6 +545,8 @@ void alternative_msr_write(unsigned int msr, u64 val, unsigned int feature)
545545
: "memory");
546546
}
547547

548+
DECLARE_PER_CPU(bool, x86_ibpb_exit_to_user);
549+
548550
static inline void indirect_branch_prediction_barrier(void)
549551
{
550552
asm_inline volatile(ALTERNATIVE("", "call write_ibpb", X86_FEATURE_IBPB)
