progress stuck due to deadlock #363

@Besroy

Environment

  Cluster: 908
  Namespace: nuobject2sh-dev
  Pod: sm-long-running4-1010-6db4d944f-rzrbj
  HomeObject Version: homeobject/3.0.6@oss/main
  HomeStore Version: homestore/7.0.0@oss/master

Description

During an SH test, one SM got stuck and could not process any more requests. The cause appears to be a deadlock between _pg_lock (the HomeObject PG lock) and m_meta_mtx (the MetaBlkService mutex).

More details are in the SH issue records, Issue75.

Threads

Thread 73 (LWP 99) - GC Worker Thread

Thread 73 (Thread 0x7c6bddffb680 (LWP 99)):
 #0  0x00007c6d18c1ef70 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
 #1  0x00007c6d18c26101 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libc.so.6
 #2  0x00005ced90b217a9 in __gthread_mutex_lock (__mutex=0x5ceda0018ce0)
 #3  std::mutex::lock (this=0x5ceda0018ce0)
 #4  std::lock_guard<std::mutex>::lock_guard (__m=..., this=<synthetic pointer>)
 #5  homestore::MetaBlkService::update_sub_sb (this=0x5ceda0018cd0, context_data=0x5ceda05776f0 "", sz=1751, cookie=0x5ceda032ac00) 
     at /home/ubuntu/.conan2/p/b/homes10e04144d6517/b/src/lib/meta/meta_blk_service.cpp:806
 #6  0x00005ced9096cffe in homeobject::HSHomeObject::update_pg_meta_after_gc (this=0x5ced9ffd0690, pg_id=<optimized out>, move_from_chunk=<optimized out>, move_to_chunk=<optimized out>, task_id=<optimized out>) 
     at /home/ubuntu/HomeObject/src/lib/homestore_backend/hs_pg_manager.cpp:1010
 #7  0x00005ced90a6dcef in homeobject::GCManager::pdev_gc_actor::replace_blob_index (this=this@entry=0x5ceda0693100, move_from_chunk=<optimized out>, move_from_chunk@entry=224, move_to_chunk=<optimized out>, move_to_chunk@entry=187, 
 valid_blob_indexes=std::vector of length 154, capacity 256 = {...}, task_id=<optimized out>, task_id@entry=496)
     at /home/ubuntu/HomeObject/src/lib/homestore_backend/gc_manager.cpp:582
 #8  0x00005ced90a6e609 in homeobject::GCManager::pdev_gc_actor::process_after_gc_metablk_persisted (this=this@entry=0x5ceda0693100, gc_task_sb=..., valid_blob_indexes=std::vector of length 154, capacity 256 = {...}, 
 task_id=task_id@entry=496)
     at /home/ubuntu/HomeObject/src/lib/homestore_backend/gc_manager.cpp:1184
 #9  0x00005ced90a788b8 in homeobject::GCManager::pdev_gc_actor::process_gc_task (this=0x5ceda0693100, move_from_chunk=<optimized out>, priority=<optimized out>, task=..., task_id=<optimized out>) 
     at /home/ubuntu/HomeObject/src/lib/homestore_backend/gc_manager.cpp:1148
 #10 0x00005ced90a7981f in operator() (__closure=0x7c6bddfecf40) 
     at /home/ubuntu/HomeObject/src/lib/homestore_backend/gc_manager.cpp:346

Thread 25 (LWP 23) - GC Scanner Thread

Thread 25 (LWP 23) holds the mutex that Thread 73 is blocked on (0x5ceda0018ce0); gdb reports __owner = 23:

  (gdb) p *(pthread_mutex_t*)0x5ceda0018ce0
  $1 = {__data = {__lock = 2, __count = 0, __owner = 23, __nusers = 1, ...}

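Stack trace of Thread 25, which is itself blocked taking a write lock on the std::shared_mutex at 0x5ced9ffd0710 in can_chunks_in_pg_be_gc: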

Thread 25 (Thread 0x7c6d15825680 (LWP 23)):
 #0  0x00007c6d18c1ec37 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
 #1  0x00007c6d18c2933b in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libc.so.6
 #2  0x00005ced90962217 in std::__glibcxx_rwlock_wrlock (__rwlock=0x5ced9ffd0710)
 #3  std::__shared_mutex_pthread::lock (this=0x5ced9ffd0710)
 #4  std::shared_mutex::lock (this=0x5ced9ffd0710)
 #5  std::scoped_lock<std::shared_mutex>::scoped_lock (__m=..., this=<synthetic pointer>)
 #6  homeobject::HSHomeObject::can_chunks_in_pg_be_gc (this=0x5ced9ffd0690, pg_id=<optimized out>, pg_id@entry=0) 
     at /home/ubuntu/HomeObject/src/lib/homestore_backend/hs_pg_manager.cpp:517
 #7  0x00005ced90a6acf1 in homeobject::GCManager::pdev_gc_actor::add_gc_task (this=0x5ceda0693100, priority=priority@entry=1 '\001', move_from_chunk=227) 
     at /home/ubuntu/HomeObject/src/lib/homestore_backend/gc_manager.cpp:325
 #8  0x00005ced90a7a31c in homeobject::GCManager::scan_chunks_for_gc (this=0x5ceda0582840)
 #9  0x00005ced90dbfd17 in std::function<void (void*)>::operator()(void*) const
 #10 iomgr::timer_epoll::on_timer_armed (this=0x5ced9ffe0740, iodev=<optimized out>)
 #11 0x00005ced90dbff57 in iomgr::timer_epoll::on_timer_fd_notification (iodev=iodev@entry=0x5ced9ff30af0)
 #12 0x00005ced90dfc728 in iomgr::IOReactorEPoll::listen (this=0x7c6cf8000980)
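
The two traces fit a lock-order inversion: one thread waits on the mutex while the other, which owns that mutex, waits on the shared_mutex. The frames do not show where each thread acquired the lock it already holds, so the acquisition order in the sketch below is an assumption, and none of this is HomeObject or HomeStore code. The names Locks, worker_path, scanner_path and run_both_paths are hypothetical stand-ins. It shows one general way to avoid such an inversion when a code path needs both locks at the same point: take them together with std::scoped_lock, which acquires multiple mutexes with a deadlock-avoidance algorithm regardless of the order in which they are named.

```cpp
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical stand-ins for the two locks named in this issue.
struct Locks {
    std::shared_mutex pg_lock;  // plays the role of _pg_lock
    std::mutex meta_mtx;        // plays the role of m_meta_mtx
};

// GC-worker-like path: needs both locks to update shared state.
// With more than one mutex, std::scoped_lock locks them as std::lock
// does, so it cannot deadlock against another scoped_lock that names
// the same mutexes in a different order.
void worker_path(Locks& l, int& counter) {
    std::scoped_lock guard(l.pg_lock, l.meta_mtx);
    ++counter;
}

// Scanner-like path: names the locks in the opposite order on purpose.
void scanner_path(Locks& l, int& counter) {
    std::scoped_lock guard(l.meta_mtx, l.pg_lock);
    ++counter;
}

// Runs both paths concurrently, `iters` times each, and returns the
// final counter value. The counter is only touched with both locks held.
int run_both_paths(int iters) {
    Locks l;
    int counter = 0;
    std::thread a([&] { for (int i = 0; i < iters; ++i) worker_path(l, counter); });
    std::thread b([&] { for (int i = 0; i < iters; ++i) scanner_path(l, counter); });
    a.join();
    b.join();
    return counter;
}
```

Calling run_both_paths(10000) should finish and return 20000. If each path instead took the two locks one after the other with separate guards in opposite orders, the same loop could hang in the way the traces above show. Whether a joint acquisition is practical here depends on where the real code takes each lock, which may be in different functions or libraries. In that case, a fixed acquisition order, or releasing one lock before calling into code that takes the other, are the usual alternatives.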
