forked from torvalds/linux
Regular for-next build test #1157
Open
kdave wants to merge 10,000 commits into build from for-next
The for-next branch was force-pushed several times while this PR was open:
2d4aefb to c9e380a
c56343b to 1cab137
6613f3c to b30a0ce
d205ebd to c0bd9d9
15022b1 to c22750c
28d9855 to e18d8ce
There's no point in committing the transaction if we failed to delete the item, since we haven't done anything before. Also stop using two variables for tracking the return value and use only 'ret'.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We use both 'ret' and 'err' which is a pattern that generates confusion and resulted in subtle bugs in the past. Remove 'err' and use only 'ret'. Also simplify the error flow by directly returning from the function instead of breaking out of the loop, since there are no resources to clean up after the loop.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
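As an aside, here is a minimal, self-contained C sketch of the single-return-variable pattern these two cleanups apply. It is toy code, not the actual btrfs functions; process_item() and process_all() are made-up stand-ins.
#include <errno.h>
#include <stdio.h>

/* Toy stand-in for the per-item work done in the real code. */
static int process_item(int i)
{
        return (i == 3) ? -EIO : 0;
}

/* One 'ret' variable only, and return directly on error instead of
 * breaking out of the loop and checking a separate 'err' afterwards,
 * since there is nothing to clean up after the loop. */
static int process_all(int nr_items)
{
        int ret = 0;

        for (int i = 0; i < nr_items; i++) {
                ret = process_item(i);
                if (ret < 0)
                        return ret;
        }
        return ret;
}

int main(void)
{
        printf("process_all() returned %d\n", process_all(5));
        return 0;
}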
There's no need to release the path in the if branch used when the root does not exist, since we released the path before the call to btrfs_get_fs_root(). So remove that redundant btrfs_release_path() call.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
…action In btrfs_find_orphan_roots() we don't need to call btrfs_handle_fs_error() if we fail to join a transaction. This is because we haven't done anything yet regarding the current root and previous iterations of the loop dealt with other roots, so there's nothing we need to undo. Instead log an error message and return the error to the caller, which will result either in a mount failure or remount failure (the only contexts it's called from).
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
…han item In btrfs_find_orphan_roots() we don't need to call btrfs_handle_fs_error() if we fail to delete the orphan item for the current root. This is because we haven't done anything yet regarding the current root and previous iterations of the loop dealt with other roots, so there's nothing we need to undo. Instead log an error message and return the error to the caller, which will result either in a mount failure or remount failure (the only contexts it's called from).
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There's no need to call btrfs_handle_fs_error() as we are inside a transaction: we propagate the error returned from btrfs_write_and_wait_transaction() to the caller and it ends up going up the call chain to btrfs_commit_transaction() (returned by the call to create_pending_snapshots()), where we jump to the 'unlock_reloc' label and end up calling cleanup_transaction(), which aborts the transaction. This is odd given that we have a transaction handle and that in the transaction commit path any error makes us abort the transaction and, besides another place inside btrfs_commit_transaction(), it's the only place that calls btrfs_handle_fs_error(). Remove the btrfs_handle_fs_error() call and replace it with an error message so that if it happens we know what went wrong during the transaction commit. Also annotate the condition in the if statement with 'unlikely' since this is not expected to happen. We've been wanting to remove btrfs_handle_fs_error(), so this removes one user that does not even need it.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There's no need to call btrfs_handle_fs_error() as we are inside a transaction and if we get an error we jump to the 'scrub_continue' label and end up calling cleanup_transaction(), which aborts the transaction. This is odd given that we have a transaction handle and that in the transaction commit path any error makes us abort the transaction, and it's the only place that calls btrfs_handle_fs_error(). Remove the btrfs_handle_fs_error() call and replace it with an error message so that if it happens we know what went wrong during the transaction commit. Also annotate the condition in the if statement with 'unlikely' since this is not expected to happen. We've been wanting to remove btrfs_handle_fs_error(), so this removes one user that does not even need it.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
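For illustration only, a small self-contained C sketch of the error handling shape described in the two commits above: report the failure and jump to the common cleanup label that aborts the whole operation, rather than calling a filesystem-wide error handler. The function names below are toy stand-ins, not btrfs APIs.
#include <errno.h>
#include <stdio.h>

/* Toy stand-ins for the transaction commit steps. */
static int write_and_wait(void)       { return -EIO; }
static void cleanup_transaction(void) { puts("transaction aborted"); }

static int commit_sketch(void)
{
        int ret;

        ret = write_and_wait();
        if (ret) {
                /* Log what went wrong and fall through to the common
                 * cleanup path that aborts the whole operation. */
                fprintf(stderr, "write and wait failed: %d\n", ret);
                goto cleanup;
        }
        return 0;
cleanup:
        cleanup_transaction();
        return ret;
}

int main(void)
{
        return commit_sketch() ? 1 : 0;
}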
Remove the newly introduced zoned statistics from sysfs; as sysfs can only show a single page, this would truncate the output on a busy filesystem.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add statistics output to /proc/<pid>/mountstats for zoned BTRFS, similar
to the zoned statistics from XFS in mountstats.
The output for /proc/<pid>/mountstats on an example filesystem will be as
follows:
device /dev/vda mounted on /mnt with fstype btrfs
zoned statistics:
active block-groups: 7
reclaimable: 0
unused: 5
need reclaim: false
data relocation block-group: 1342177280
active zones:
start: 1073741824, wp: 268419072 used: 0, reserved: 268419072, unusable: 0
start: 1342177280, wp: 0 used: 0, reserved: 0, unusable: 0
start: 1610612736, wp: 49152 used: 16384, reserved: 16384, unusable: 16384
start: 1879048192, wp: 950272 used: 131072, reserved: 622592, unusable: 196608
start: 2147483648, wp: 212238336 used: 0, reserved: 212238336, unusable: 0
start: 2415919104, wp: 0 used: 0, reserved: 0, unusable: 0
start: 2684354560, wp: 0 used: 0, reserved: 0, unusable: 0
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Move space_info_flag_to_str() to space-info.h and, as it is no longer static to space-info.c, prefix it with 'btrfs_'. This way it can be re-used in other places.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When printing the zoned statistics, also include the block-group type in
the block-group listing output.
The updated output looks as follows:
device /dev/vda mounted on /mnt with fstype btrfs
zoned statistics:
active block-groups: 9
reclaimable: 0
unused: 2
need reclaim: false
data relocation block-group: 3221225472
active zones:
start: 1073741824, wp: 268419072 used: 268419072, reserved: 0, unusable: 0 (DATA)
start: 1342177280, wp: 0 used: 0, reserved: 0, unusable: 0 (DATA)
start: 1610612736, wp: 81920 used: 16384, reserved: 16384, unusable: 49152 (SYSTEM)
start: 1879048192, wp: 2031616 used: 1458176, reserved: 65536, unusable: 507904 (METADATA)
start: 2147483648, wp: 268419072 used: 268419072, reserved: 0, unusable: 0 (DATA)
start: 2415919104, wp: 268419072 used: 268419072, reserved: 0, unusable: 0 (DATA)
start: 2684354560, wp: 268419072 used: 268419072, reserved: 0, unusable: 0 (DATA)
start: 2952790016, wp: 65536 used: 65536, reserved: 0, unusable: 0 (DATA)
start: 3221225472, wp: 0 used: 0, reserved: 0, unusable: 0 (DATA)
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently inside the main loop of cow_file_range(), we do the following sequence:
- Reserve an extent
- Lock the IO tree range
- Create an IO extent map
- Create an ordered extent
Every step needs extra cleanup steps in the following order:
- Drop the newly created extent map
- Unlock the extent range and clean up the involved folios
- Free the reserved extent
However currently the error handling is done inconsistently:
- Extent map drop is handled in a dedicated tag out of the main loop, making it much harder to track.
- The extent unlock and folio cleanup are done separately: the extent is unlocked through btrfs_unlock_extent(), then extent_clear_unlock_delalloc() again in a dedicated tag. Meanwhile all other call sites (compression/encoded/nocow) just call extent_clear_unlock_delalloc() to handle unlock and folio cleanup in one go.
- Reserved extent freeing is handled in a dedicated tag out of the main loop, making it much harder to track.
- Error handling of btrfs_reloc_clone_csums() relies on out-of-loop tags. This is due to the special requirement to finish ordered extents to handle the metadata reserved space.
Enhance the error handling and align the behavior by:
- Introducing a dedicated cow_one_range() helper, which does the reserve/lock/allocation and also handles the errors inside the helper. No more dedicated tags out of the main loop.
- Using a single extent_clear_unlock_delalloc() to unlock and clean up folios.
- Moving the btrfs_reloc_clone_csums() error handling into the new helper. Thankfully it's not that complex compared to other cases.
And since we're here, also reduce the width of the following local variables to u32:
- cur_alloc_size and min_alloc_size: each allocation won't go beyond 128M, thus u32 is more than enough.
- blocksize: the maximum is 64K, no need for u64.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
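As a minimal, self-contained C sketch of the unified error handling shape described above: all four steps and their unwind order live in one helper instead of in labels far below the main loop. The names and signature are made up for the example and are not the actual cow_one_range() from the patch.
#include <errno.h>
#include <stdio.h>

/* Stand-ins for: reserve an extent, lock the range, create the extent
 * map and the ordered extent.  Each returns 0 on success. */
static int reserve_extent(void)         { return 0; }
static int lock_range(void)             { return 0; }
static int create_extent_map(void)      { return 0; }
static int create_ordered_extent(void)  { return -ENOSPC; }

static void drop_extent_map(void)          { puts("drop extent map"); }
static void unlock_and_clean_folios(void)  { puts("unlock range, clean folios"); }
static void free_reserved_extent(void)     { puts("free reserved extent"); }

/* The unwind order (drop map, unlock and clean folios, free the
 * reserved extent) sits right next to the code that needs it. */
static int cow_one_range_sketch(void)
{
        int ret;

        ret = reserve_extent();
        if (ret)
                return ret;
        ret = lock_range();
        if (ret)
                goto out_free_reserved;
        ret = create_extent_map();
        if (ret)
                goto out_unlock;
        ret = create_ordered_extent();
        if (ret)
                goto out_drop_map;
        return 0;

out_drop_map:
        drop_extent_map();
out_unlock:
        unlock_and_clean_folios();
out_free_reserved:
        free_reserved_extent();
        return ret;
}

int main(void)
{
        printf("cow_one_range_sketch() returned %d\n", cow_one_range_sketch());
        return 0;
}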
…_backref_finish_upper_links() The return statement after btrfs_backref_panic() is unreachable since btrfs_backref_panic() calls BUG() which never returns. Remove the return to unify it with the other calls to btrfs_backref_panic().
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Warning was found when compiling using loongarch64-gcc 12.3.1:
$ make CFLAGS_tree-log.o=-Wmaybe-uninitialized
In file included from fs/btrfs/ctree.h:21,
from fs/btrfs/tree-log.c:12:
fs/btrfs/accessors.h: In function 'replay_one_buffer':
fs/btrfs/accessors.h:66:16: warning: 'inode_item' may be used uninitialized [-Wmaybe-uninitialized]
66 | return btrfs_get_##bits(eb, s, offsetof(type, member)); \
| ^~~~~~~~~~
fs/btrfs/tree-log.c:2803:42: note: 'inode_item' declared here
2803 | struct btrfs_inode_item *inode_item;
| ^~~~~~~~~~
Initialize inode_item to NULL; the compiler does not seem to see the
relation between the first 'wc->log_key.type == BTRFS_INODE_ITEM_KEY'
check and the other one that also checks the replay phase.
Signed-off-by: Qiang Ma <maqianga@uniontech.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
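A small self-contained C illustration of this kind of false positive and the NULL-initialization fix. This is toy code, not the actual replay_one_buffer() logic.
#include <stdio.h>

struct item { int type; };

/* When a pointer is assigned under one condition and dereferenced under
 * a second condition that the compiler cannot prove equivalent, gcc may
 * warn with -Wmaybe-uninitialized.  Initializing to NULL silences the
 * false positive and makes a real mistake crash loudly instead. */
static void replay(const struct item *it, int replay_stage)
{
        const struct item *found = NULL;

        if (replay_stage == 1 && it->type == 1)
                found = it;

        if (it->type == 1 && replay_stage == 1)
                printf("found type %d\n", found->type);
}

int main(void)
{
        struct item it = { .type = 1 };

        replay(&it, 1);
        return 0;
}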
Errors are unexpected during the transaction commit path, and when they
happen we abort the transaction (by calling cleanup_transaction() under
the label 'cleanup_transaction' in btrfs_commit_transaction()). So mark
every error check in the transaction commit path as unlikely, to hint the
compiler so that it can possibly generate better code, and to make it clear
to a reader that these errors are unexpected.
On an x86_64 box using gcc 14.2.0-19 from Debian, this resulted in a slight
reduction of the module's text size.
Before:
$ size fs/btrfs/btrfs.ko
text data bss dec hex filename
1939476 172568 15592 2127636 207714 fs/btrfs/btrfs.ko
After:
$ size fs/btrfs/btrfs.ko
text data bss dec hex filename
1939044 172568 15592 2127204 207564 fs/btrfs/btrfs.ko
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
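For reference, a minimal compilable example of the unlikely() hint, defined here the same way the kernel defines it:
#include <stdio.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

static int do_step(void)
{
        return 0;
}

int main(void)
{
        /* The hint lets the compiler move the error branch out of line,
         * keeping the hot path compact, which is where the small text
         * size reduction comes from. */
        if (unlikely(do_step() != 0)) {
                fprintf(stderr, "step failed\n");
                return 1;
        }
        return 0;
}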
Instead of surrounding every caller of btrfs_is_shutdown() with unlikely,
move the unlikely into the helper itself, as we do in other places in
btrfs and as is common in the kernel outside btrfs too. Also make the fs_info
argument of btrfs_is_shutdown() const.
On an x86_64 box using gcc 14.2.0-19 from Debian, this resulted in a slight
reduction of the module's text size.
Before:
$ size fs/btrfs/btrfs.ko
text data bss dec hex filename
1939044 172568 15592 2127204 207564 fs/btrfs/btrfs.ko
After:
$ size fs/btrfs/btrfs.ko
text data bss dec hex filename
1938876 172568 15592 2127036 2074bc fs/btrfs/btrfs.ko
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
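A self-contained C sketch of the pattern: the unlikely() lives inside the inline predicate and the argument is const, so callers stay plain. The struct and helper names are made up for the example, not the btrfs ones.
#include <stdbool.h>
#include <stdio.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

struct fs_info_sketch { bool shutdown; };

/* Put the unlikely() inside the helper itself, so callers can write a
 * plain 'if (is_shutdown(fs))'.  The pointer is const since the helper
 * only reads state. */
static inline bool is_shutdown(const struct fs_info_sketch *fs)
{
        return unlikely(fs->shutdown);
}

int main(void)
{
        struct fs_info_sketch fs = { .shutdown = false };

        if (is_shutdown(&fs))
                fprintf(stderr, "filesystem is shut down\n");
        else
                puts("filesystem is running");
        return 0;
}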
There's no point in committing the transaction if we failed to insert the balance item, since we haven't done anything else after we started/joined the transaction. Also stop using two variables for tracking the return value and use only 'ret'.
Reviewed-by: Daniel Vacek <neelx@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function already dereferences 'inode' multiple times earlier, making the additional NULL check at line 840 redundant since the function would have crashed already if inode were NULL.
After commit 81cea6c ("btrfs: remove btrfs_bio::fs_info by extracting it from btrfs_bio::inode"), the btrfs_bio::inode field is mandatory for all btrfs_bio allocations and is guaranteed to be non-NULL.
Simplify the condition for allocating dummy checksums for zoned NODATASUM data by removing the unnecessary 'inode &&' check.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Before the btrfs-progs v6.16.1 release, mkfs.btrfs can leave free space tree entries for deleted chunks:
# mkfs.btrfs -f -O fst $dev
# btrfs ins dump-tree -t chunk $dev
btrfs-progs v6.16
chunk tree
leaf 22036480 items 4 free space 15781 generation 8 owner CHUNK_TREE
leaf 22036480 flags 0x1(WRITTEN) backref revision 1
item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
item 1 key (FIRST_CHUNK_TREE CHUNK_ITEM 13631488) itemoff 16105 itemsize 80
^^^ The first chunk is at 13631488
item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 22020096) itemoff 15993 itemsize 112
item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 30408704) itemoff 15881 itemsize 112
# btrfs ins dump-tree -t free-space-tree $dev
btrfs-progs v6.16
free space tree key (FREE_SPACE_TREE ROOT_ITEM 0)
leaf 30556160 items 13 free space 15918 generation 8 owner FREE_SPACE_TREE
leaf 30556160 flags 0x1(WRITTEN) backref revision 1
item 0 key (1048576 FREE_SPACE_INFO 4194304) itemoff 16275 itemsize 8
free space info extent count 1 flags 0
item 1 key (1048576 FREE_SPACE_EXTENT 4194304) itemoff 16275 itemsize 0
free space extent
item 2 key (5242880 FREE_SPACE_INFO 8388608) itemoff 16267 itemsize 8
free space info extent count 1 flags 0
item 3 key (5242880 FREE_SPACE_EXTENT 8388608) itemoff 16267 itemsize 0
free space extent
^^^ The above 4 items are all before the first chunk.
item 4 key (13631488 FREE_SPACE_INFO 8388608) itemoff 16259 itemsize 8
free space info extent count 1 flags 0
item 5 key (13631488 FREE_SPACE_EXTENT 8388608) itemoff 16259 itemsize 0
free space extent
...
This can trigger btrfs check errors.
[CAUSE]
It's a bug in the free space tree implementation of btrfs-progs, which doesn't delete the involved fst entries for the to-be-deleted chunk/block group.
[ENHANCEMENT]
The most common fix is to clear the space cache and rebuild it, but that requires a ro->rw remount, which may not be possible for rootfs, and it also relies on users using the "clear_cache" mount option manually.
Here introduce a kernel fix for it, which will delete any entries that are before the first block group automatically at the first RW mount.
For filesystems without such a problem, the overhead is just a single tree search and no modification to the free space tree, so the overhead should be minimal.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
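A toy, self-contained C sketch of the repair idea: walk the entries from the start and drop everything that sits before the start of the first block group, here over a plain sorted array rather than the free space tree. The numbers echo the dump above; the function name is made up.
#include <stdio.h>

static size_t drop_entries_before(unsigned long long *keys, size_t nr,
                                  unsigned long long first_bg_start)
{
        size_t i = 0;

        /* Entries [0, i) are stale: they start before the first block group. */
        while (i < nr && keys[i] < first_bg_start)
                i++;

        /* Shift the valid tail down, i.e. "delete" the stale entries. */
        for (size_t j = 0; j + i < nr; j++)
                keys[j] = keys[j + i];
        return nr - i;
}

int main(void)
{
        unsigned long long keys[] = { 1048576, 5242880, 13631488, 22020096 };
        size_t nr = drop_entries_before(keys, 4, 13631488);

        for (size_t i = 0; i < nr; i++)
                printf("%llu\n", keys[i]);
        return 0;
}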
The function add_block_group_free_space() was renamed btrfs_add_block_group_free_space() by commit 6fc5ef7 ("btrfs: add btrfs prefix to free space tree exported functions"). Update the comment accordingly. Do some reorganization of the next few lines to keep the comment within 80 characters.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In btrfs_read_locked_inode() if we fail to lookup the inode, we jump to
the 'out' label with a path that has a read locked leaf and then we call
iget_failed(). This can result in an ABBA deadlock, since iget_failed()
triggers inode eviction and that causes the release of the delayed inode,
which must lock the delayed inode's mutex, and a task updating a delayed
inode starts by taking the node's mutex and then modifying the inode's
subvolume btree.
Syzbot reported the following lockdep splat for this:
======================================================
WARNING: possible circular locking dependency detected
syzkaller #0 Not tainted
------------------------------------------------------
btrfs-cleaner/8725 is trying to acquire lock:
ffff0000d6826a48 (&delayed_node->mutex){+.+.}-{4:4}, at: __btrfs_release_delayed_node+0xa0/0x9b0 fs/btrfs/delayed-inode.c:290
but task is already holding lock:
ffff0000dbeba878 (btrfs-tree-00){++++}-{4:4}, at: btrfs_tree_read_lock_nested+0x44/0x2ec fs/btrfs/locking.c:145
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (btrfs-tree-00){++++}-{4:4}:
__lock_release kernel/locking/lockdep.c:5574 [inline]
lock_release+0x198/0x39c kernel/locking/lockdep.c:5889
up_read+0x24/0x3c kernel/locking/rwsem.c:1632
btrfs_tree_read_unlock+0xdc/0x298 fs/btrfs/locking.c:169
btrfs_tree_unlock_rw fs/btrfs/locking.h:218 [inline]
btrfs_search_slot+0xa6c/0x223c fs/btrfs/ctree.c:2133
btrfs_lookup_inode+0xd8/0x38c fs/btrfs/inode-item.c:395
__btrfs_update_delayed_inode+0x124/0xed0 fs/btrfs/delayed-inode.c:1032
btrfs_update_delayed_inode fs/btrfs/delayed-inode.c:1118 [inline]
__btrfs_commit_inode_delayed_items+0x15f8/0x1748 fs/btrfs/delayed-inode.c:1141
__btrfs_run_delayed_items+0x1ac/0x514 fs/btrfs/delayed-inode.c:1176
btrfs_run_delayed_items_nr+0x28/0x38 fs/btrfs/delayed-inode.c:1219
flush_space+0x26c/0xb68 fs/btrfs/space-info.c:828
do_async_reclaim_metadata_space+0x110/0x364 fs/btrfs/space-info.c:1158
btrfs_async_reclaim_metadata_space+0x90/0xd8 fs/btrfs/space-info.c:1226
process_one_work+0x7e8/0x155c kernel/workqueue.c:3263
process_scheduled_works kernel/workqueue.c:3346 [inline]
worker_thread+0x958/0xed8 kernel/workqueue.c:3427
kthread+0x5fc/0x75c kernel/kthread.c:463
ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:844
-> #0 (&delayed_node->mutex){+.+.}-{4:4}:
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain kernel/locking/lockdep.c:3908 [inline]
__lock_acquire+0x1774/0x30a4 kernel/locking/lockdep.c:5237
lock_acquire+0x14c/0x2e0 kernel/locking/lockdep.c:5868
__mutex_lock_common+0x1d0/0x2678 kernel/locking/mutex.c:598
__mutex_lock kernel/locking/mutex.c:760 [inline]
mutex_lock_nested+0x2c/0x38 kernel/locking/mutex.c:812
__btrfs_release_delayed_node+0xa0/0x9b0 fs/btrfs/delayed-inode.c:290
btrfs_release_delayed_node fs/btrfs/delayed-inode.c:315 [inline]
btrfs_remove_delayed_node+0x68/0x84 fs/btrfs/delayed-inode.c:1326
btrfs_evict_inode+0x578/0xe28 fs/btrfs/inode.c:5587
evict+0x414/0x928 fs/inode.c:810
iput_final fs/inode.c:1914 [inline]
iput+0x95c/0xad4 fs/inode.c:1966
iget_failed+0xec/0x134 fs/bad_inode.c:248
btrfs_read_locked_inode+0xe1c/0x1234 fs/btrfs/inode.c:4101
btrfs_iget+0x1b0/0x264 fs/btrfs/inode.c:5837
btrfs_run_defrag_inode fs/btrfs/defrag.c:237 [inline]
btrfs_run_defrag_inodes+0x520/0xdc4 fs/btrfs/defrag.c:309
cleaner_kthread+0x21c/0x418 fs/btrfs/disk-io.c:1516
kthread+0x5fc/0x75c kernel/kthread.c:463
ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:844
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
rlock(btrfs-tree-00);
lock(&delayed_node->mutex);
lock(btrfs-tree-00);
lock(&delayed_node->mutex);
*** DEADLOCK ***
1 lock held by btrfs-cleaner/8725:
#0: ffff0000dbeba878 (btrfs-tree-00){++++}-{4:4}, at: btrfs_tree_read_lock_nested+0x44/0x2ec fs/btrfs/locking.c:145
stack backtrace:
CPU: 0 UID: 0 PID: 8725 Comm: btrfs-cleaner Not tainted syzkaller #0 PREEMPT
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/03/2025
Call trace:
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:499 (C)
__dump_stack+0x30/0x40 lib/dump_stack.c:94
dump_stack_lvl+0xd8/0x12c lib/dump_stack.c:120
dump_stack+0x1c/0x28 lib/dump_stack.c:129
print_circular_bug+0x324/0x32c kernel/locking/lockdep.c:2043
check_noncircular+0x154/0x174 kernel/locking/lockdep.c:2175
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain kernel/locking/lockdep.c:3908 [inline]
__lock_acquire+0x1774/0x30a4 kernel/locking/lockdep.c:5237
lock_acquire+0x14c/0x2e0 kernel/locking/lockdep.c:5868
__mutex_lock_common+0x1d0/0x2678 kernel/locking/mutex.c:598
__mutex_lock kernel/locking/mutex.c:760 [inline]
mutex_lock_nested+0x2c/0x38 kernel/locking/mutex.c:812
__btrfs_release_delayed_node+0xa0/0x9b0 fs/btrfs/delayed-inode.c:290
btrfs_release_delayed_node fs/btrfs/delayed-inode.c:315 [inline]
btrfs_remove_delayed_node+0x68/0x84 fs/btrfs/delayed-inode.c:1326
btrfs_evict_inode+0x578/0xe28 fs/btrfs/inode.c:5587
evict+0x414/0x928 fs/inode.c:810
iput_final fs/inode.c:1914 [inline]
iput+0x95c/0xad4 fs/inode.c:1966
iget_failed+0xec/0x134 fs/bad_inode.c:248
btrfs_read_locked_inode+0xe1c/0x1234 fs/btrfs/inode.c:4101
btrfs_iget+0x1b0/0x264 fs/btrfs/inode.c:5837
btrfs_run_defrag_inode fs/btrfs/defrag.c:237 [inline]
btrfs_run_defrag_inodes+0x520/0xdc4 fs/btrfs/defrag.c:309
cleaner_kthread+0x21c/0x418 fs/btrfs/disk-io.c:1516
kthread+0x5fc/0x75c kernel/kthread.c:463
ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:844
Fix this by releasing the path before calling iget_failed().
Reported-by: syzbot+c1c6edb02bea1da754d8@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/694530c2.a70a0220.207337.010d.GAE@google.com/
Fixes: 6967399 ("btrfs: push cleanup into btrfs_read_locked_inode()")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
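A minimal self-contained illustration of that ordering fix, with plain pthread mutexes standing in for the tree lock and the delayed node mutex. This is toy code, not the btrfs locking primitives.
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t node_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the inode failure path: it ends up taking the per-node lock. */
static void fail_inode(void)
{
        pthread_mutex_lock(&node_lock);
        puts("evicting inode under node lock");
        pthread_mutex_unlock(&node_lock);
}

int main(void)
{
        pthread_mutex_lock(&tree_lock);
        puts("lookup failed while holding the tree lock");

        /* The fix: drop the tree lock (release the path) before calling
         * the failure path that needs the node lock, so we never hold
         * tree_lock -> node_lock while another task acquires them in
         * the opposite order. */
        pthread_mutex_unlock(&tree_lock);
        fail_inode();
        return 0;
}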
If btrfs_insert_fs_root() fails, the tmp_root allocated by btrfs_alloc_dummy_root() is leaked because its initial reference count is not decremented. Fix this by calling btrfs_put_root() unconditionally after btrfs_insert_fs_root(). This ensures the local reference is always dropped.
Also fix a copy-paste error in the error message where the subvolume root insertion failure was incorrectly logged as "fs root".
Co-developed-by: Jianhao Xu <jianhao.xu@seu.edu.cn>
Signed-off-by: Jianhao Xu <jianhao.xu@seu.edu.cn>
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In test_rmap_blocks(), we have ret = 0 before checking the results. We need to set it to -EINVAL, so that a mismatching result will return -EINVAL not 0.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
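A toy, self-contained C version of that pattern: start 'ret' at the failure value so a detected mismatch is actually propagated, and only set it to 0 once every check has passed. The function below is a made-up stand-in, not the btrfs self-test code.
#include <errno.h>
#include <stdio.h>

static int check_results(const int *got, const int *want, int nr)
{
        int ret = -EINVAL;      /* assume failure until everything matches */

        for (int i = 0; i < nr; i++) {
                if (got[i] != want[i]) {
                        fprintf(stderr, "mismatch at %d: %d != %d\n",
                                i, got[i], want[i]);
                        goto out;       /* ret is already -EINVAL, not 0 */
                }
        }
        ret = 0;
out:
        return ret;
}

int main(void)
{
        int got[]  = { 1, 2, 3 };
        int want[] = { 1, 2, 4 };

        printf("check_results() returned %d\n", check_results(got, want, 3));
        return 0;
}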
Currently for an inode that needs compression, even if there is a delalloc range that is single fs block sized and can not be inlined, we will still go through the compression path. Then inside compress_file_range(), we have one extra check to reject the single block sized range, and fall back to a regular uncompressed write.
This rejection is in fact a little too late, as we have already allocated memory for async_chunk and delayed the submission, just to fall back to the same uncompressed write.
Change the behavior to reject such cases earlier at inode_need_compress(), so for such a single block sized range we won't even bother trying to go through the compress path.
And since the inline small block check is inside inode_need_compress() and compress_file_range() also calls that function, we no longer need a dedicated check inside compress_file_range().
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
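A toy, self-contained C sketch of moving the rejection into the predicate so the caller never sets up the expensive path only to fall back. The names and thresholds are illustrative, not the btrfs ones.
#include <stdbool.h>
#include <stdio.h>

/* If the range is a single block and cannot be inlined, compression
 * cannot win, so reject it here instead of after the async machinery
 * has already been set up. */
static bool need_compress(unsigned long long len, unsigned int blocksize,
                          bool can_inline)
{
        if (!can_inline && len <= blocksize)
                return false;
        return true;
}

int main(void)
{
        printf("single block, no inline: %d\n",
               (int)need_compress(4096, 4096, false));
        printf("large range: %d\n",
               (int)need_compress(1 << 20, 4096, false));
        return 0;
}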
Before accessing the disk_bytenr field of a file extent item we need to check if we are dealing with an inline extent. This is because for inline extents their data starts at the offset of the disk_bytenr field. So accessing disk_bytenr means we are reading inline data, and in case the inline data is less than 8 bytes we can actually cause an invalid memory access if this inline extent item is the first item in the leaf, or read metadata from other items.
Fixes: 82bfb2e ("Btrfs: incremental send, fix unnecessary hole writes for sparse files")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
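A self-contained C illustration of the general rule behind the fix: check the item's type before interpreting bytes that double as data in the inline case. This is a toy structure, not the on-disk btrfs format.
#include <stdio.h>
#include <string.h>

/* For "inline" items the bytes where disk_bytenr would live are
 * actually file data, so the type must be checked before that field
 * is read. */
struct extent_sketch {
        int type;                               /* 0 = regular, 1 = inline */
        union {
                unsigned long long disk_bytenr; /* valid only for regular */
                char inline_data[8];            /* valid only for inline */
        } u;
};

static unsigned long long get_disk_bytenr(const struct extent_sketch *e)
{
        if (e->type == 1)
                return 0;       /* inline extent: no disk_bytenr to read */
        return e->u.disk_bytenr;
}

int main(void)
{
        struct extent_sketch e = { .type = 1 };

        memcpy(e.u.inline_data, "abc", 4);
        printf("disk_bytenr: %llu\n", get_disk_bytenr(&e));
        return 0;
}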
There are two tests in btrfs_fs_closing(), but checking the BTRFS_FS_CLOSING_DONE bit is done only in one place, load_extent_tree_free(). As this is an inline we can reduce the size of the generated code. The types can also be changed to bool as this becomes a simple condition.
text data bss dec hex filename
1674006 146704 15560 1836270 1c04ee pre/btrfs.ko
1673772 146704 15560 1836036 1c0404 post/btrfs.ko
DELTA: -234
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The offload csum mode was introduced to allow developers to compare the
performance of generating checksums for data writes at different timings:
- During btrfs_submit_chunk()
This is the most common one; if any of the following conditions is met
we go this path:
* The csum is fast
For now it's CRC32C and xxhash.
* It's a synchronous write
* Zoned
- Delay the checksum generation to a workqueue
However since commit dd57c78 ("btrfs: introduce
btrfs_bio::async_csum") we no longer need to bother with any of them.
If it's an experimental build, async checksum generation in the
background will be faster anyway.
And if not an experimental build, we won't even have the offload csum
mode support.
Considering the async csum will be the new default, let's remove the
offload csum mode code.
There will be no impact to end users, and offload csum mode is still
under experimental features.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When the user performs a btrfs mount, the block device is not set correctly. The user sets the block size of the block device to 0x4000 by executing the BLKBSZSET command. Since the block size change also changes the mapping->flags value, this further affects the result of the mapping_min_folio_order() calculation.
Let's analyze the following two scenarios:
Scenario 1: Without executing the BLKBSZSET command, the block size is 0x1000, and mapping_min_folio_order() returns 0;
Scenario 2: After executing the BLKBSZSET command, the block size is 0x4000, and mapping_min_folio_order() returns 2.
do_read_cache_folio() allocates a folio before the BLKBSZSET command is executed. This results in the allocated folio having an order value of 0. Later, after BLKBSZSET is executed, the block size increases to 0x4000, and the mapping_min_folio_order() calculation result becomes 2.
This leads to two undesirable consequences:
1. filemap_add_folio() triggers a VM_BUG_ON_FOLIO(folio_order(folio) < mapping_min_folio_order(mapping)) assertion.
2. The syzbot report [1] shows a null pointer dereference in create_empty_buffers() due to a buffer head allocation failure.
Synchronization should be established based on the inode between the BLKBSZSET command and read cache page to prevent inconsistencies in block size or mapping flags before and after folio allocation.
[1]
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:create_empty_buffers+0x4d/0x480 fs/buffer.c:1694
Call Trace:
folio_create_buffers+0x109/0x150 fs/buffer.c:1802
block_read_full_folio+0x14c/0x850 fs/buffer.c:2403
filemap_read_folio+0xc8/0x2a0 mm/filemap.c:2496
do_read_cache_folio+0x266/0x5c0 mm/filemap.c:4096
do_read_cache_page mm/filemap.c:4162 [inline]
read_cache_page_gfp+0x29/0x120 mm/filemap.c:4195
btrfs_read_disk_super+0x192/0x500 fs/btrfs/volumes.c:1367
Reported-by: syzbot+b4a2af3000eaa84d95d5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b4a2af3000eaa84d95d5
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The following new features are missing:
- Async checksum
- Shutdown ioctl and auto-degradation
- Larger block size support, which is dependent on larger folios
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Keep this open, the build tests are hosted on github CI.