
Support for multiple OSDs on a single testnode #6

Description

@VallariAg

When I ran a test with a config that had multiple OSDs on a single testnode, I got the following error:

2024-05-15T12:29:41.414 INFO:teuthology.orchestra.run.7fa8c2843ce2.stdout:Created osd(s) 0 on host '7fa8c2843ce2'
2024-05-15T12:29:42.108 DEBUG:teuthology.orchestra.run.7fa8c2843ce2:osd.0> sudo journalctl -f -n 0 -u ceph-1b2586ac-12b6-11ef-945e-d6d5f423fdc9@osd.0.service
2024-05-15T12:29:42.110 INFO:tasks.cephadm:{Remote(name='ubuntu@7fa8c2843ce2'): [], Remote(name='ubuntu@c3bfc4209056'): ['/dev/loop3'], Remote(name='ubuntu@db02dd5eef59'): ['/dev/loop0'], Remote(name='ubuntu@de36ba4bccc7'): ['/dev/loop1']}
2024-05-15T12:29:42.110 INFO:tasks.cephadm:ubuntu@7fa8c2843ce2
2024-05-15T12:29:42.110 INFO:tasks.cephadm:[]
2024-05-15T12:29:42.110 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/teuthology/teuthology/contextutil.py", line 30, in nested
    vars.append(enter())
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/root/src/github.com_vallariag_ceph_0c8b425a40783ee42c035ea9fbe29647e90f007f/qa/tasks/cephadm.py", line 1072, in ceph_osds
    assert devs   ## FIXME ##
AssertionError

Each testnode had one loop device: https://pastebin.com/raw/8z5gj0CU (ls /dev output)

The above problem happens because my job config places multiple OSDs on one node (osd.0 and osd.1 deployed on the same host), while only one device is available on each testnode container that can be zapped for OSD deployment.
With the ceph-devstack setup, the teuthology function get_scratch_devices() returned one device for each testnode, so the mapping (devs_by_remote) looks like this:

{Remote(name='ubuntu@7fa8c2843ce2'): ['/dev/loop2'], 
Remote(name='ubuntu@c3bfc4209056'): ['/dev/loop3'], 
Remote(name='ubuntu@db02dd5eef59'): ['/dev/loop0'], 
Remote(name='ubuntu@de36ba4bccc7'): ['/dev/loop1']}
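
For reference, here is a minimal sketch of how I understand that mapping gets built (simplified, not the exact cephadm.py code; the helper name build_devs_by_remote is mine):

from teuthology import misc as teuthology

def build_devs_by_remote(ctx):
    devs_by_remote = {}
    for remote, roles in ctx.cluster.remotes.items():
        # get_scratch_devices() lists the unused block devices on a remote;
        # with ceph-devstack each testnode only has a single loop device.
        devs_by_remote[remote] = teuthology.get_scratch_devices(remote)
    return devs_by_remote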

Because we pop the loop device from devs_by_remote once the first OSD is deployed, the second OSD on the same testnode has no device left to deploy on (see the sketch below).
I reran my test with a 1 osd/node config and that worked (the test went through the ceph setup okay).
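
A tiny sketch of the failure mode, simplified from the ceph_osds logic in the traceback above (names and structure are approximate):

# One scratch device per node, but two OSDs mapped to the same node.
devs_by_remote = {'testnode1': ['/dev/loop2']}

for osd in ('osd.0', 'osd.1'):
    devs = devs_by_remote['testnode1']
    assert devs, f'no scratch device left for {osd}'  # raises for osd.1
    dev = devs.pop()  # osd.0 consumes the only loop device
    print(f'deploying {osd} on {dev}')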

As for a proper solution... does this mean we should create more loop devices per testnode in ceph-devstack?
Let me know; I'd love to pick up this issue. It would be a good gateway to understanding more of ceph-devstack.
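
If that's the right direction, I imagine something along these lines in ceph-devstack; a rough sketch with assumed paths, sizes, and device count:

import subprocess

def create_loop_devices(count=3, size='4G', image_dir='/tmp/devstack-disks'):
    # Back each testnode with several loop devices instead of one.
    subprocess.run(['mkdir', '-p', image_dir], check=True)
    devices = []
    for i in range(count):
        image = f'{image_dir}/disk{i}.img'
        # Allocate a sparse backing file, then attach it to a free loop device.
        subprocess.run(['truncate', '-s', size, image], check=True)
        result = subprocess.run(
            ['sudo', 'losetup', '--find', '--show', image],
            check=True, capture_output=True, text=True,
        )
        devices.append(result.stdout.strip())
    return devices  # e.g. ['/dev/loop4', '/dev/loop5', '/dev/loop6']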
