Issues and Fixes
This page contains a log of technical issues with our JupyterHub deployment that we have resolved.

OpenEBS has its own certificates. These are renewed automatically whenever helm upgrade is run, so this issue can occur if no upgrade is run before the certificates expire. The solution is to restart the OpenEBS pods, which forces them to regenerate their certificates:
kubectl -n openebs get pods -o name | grep admission-server | xargs kubectl -n openebs delete
This can also occur if the microk8s certificates expire. These can be renewed with sudo microk8s refresh-certs -e ca.crt. Renewal may require a reboot, plus approximately 20 minutes for all the services to update. The certificates must be renewed manually once a year.
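Because the renewal is manual and yearly, it may help to check how close a certificate is to expiry before it causes an outage. The following is a minimal sketch, not part of our deployment: cert_expires_within is a hypothetical helper name, and the certificate path in the usage comment assumes a default snap install of microk8s.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a certificate expires within N days.
# Returns 0 if the certificate expires within the window (or cannot be parsed),
# 1 if it remains valid beyond the window, and 2 if the file is not readable.
cert_expires_within() {
  local cert_file="$1" days="$2"
  [ -r "$cert_file" ] || return 2
  # "openssl x509 -checkend SECONDS" exits 0 when the certificate will
  # still be valid SECONDS from now, and non-zero otherwise.
  if openssl x509 -checkend "$(( days * 86400 ))" -noout -in "$cert_file" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}

# Example usage (path is an assumption for a default snap install):
#   if cert_expires_within /var/snap/microk8s/current/certs/ca.crt 30; then
#     echo "ca.crt expires within 30 days; run: sudo microk8s refresh-certs -e ca.crt"
#   fi
```

Running a check like this from a monthly cron job or a login script would give some warning ahead of the yearly renewal.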
Nvidia drivers are the most common cause of issues that require server maintenance. The first approach should be to try resolving the issue with apt commands such as sudo apt update, apt list --upgradable, sudo apt upgrade, and sudo apt --fix-broken install. A server restart is required after any Nvidia driver change.
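Before and after running the apt commands, the output of nvidia-smi usually indicates what state the driver is in. The sketch below sorts that output into a few coarse categories. classify_nvidia_smi is a hypothetical helper, and the matched strings are the messages nvidia-smi commonly prints; treat them as assumptions that may vary between driver versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the text printed by nvidia-smi.
classify_nvidia_smi() {
  local output="$1"
  case "$output" in
    # Kernel module and user-space libraries are at different versions;
    # this typically appears after a driver update without a reboot.
    *"Driver/library version mismatch"*) echo "mismatch" ;;
    # The kernel module is not loaded or not installed.
    *"couldn't communicate with the NVIDIA driver"*) echo "driver-not-loaded" ;;
    # The normal status table includes a "Driver Version" header.
    *"Driver Version"*) echo "ok" ;;
    *) echo "unknown" ;;
  esac
}

# Example usage on the server:
#   classify_nvidia_smi "$(nvidia-smi 2>&1)"
```

A result of mismatch is often cleared by the restart that is required anyway after driver changes; driver-not-loaded that persists after a reboot suggests the packages themselves need repair.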
If apt is unable to upgrade or repair the Nvidia drivers and libraries appropriately, the next step is to remove all Nvidia drivers and packages and reinstall them following Nvidia's documentation. That will look something like the following:
sudo apt-get --purge remove "*cublas*" "cuda*" "nsight*" "nvidia*"
sudo apt-get autoremove && sudo apt-get autoclean
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb -O ~/cuda-keyring.deb
sudo dpkg -i ~/cuda-keyring.deb
sudo apt-get update
sudo apt-get install cuda nvidia-container-toolkit
sudo reboot
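Between the purge and the reinstall, it can be worth confirming that no matching packages were left behind. The following is a rough sketch under stated assumptions: leftover_nvidia_packages is a hypothetical helper that reads dpkg -l output on stdin and mirrors the name patterns used in the purge command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list packages from "dpkg -l" output (on stdin) that
# are still installed (ii) or removed with config files remaining (rc),
# and whose names match the patterns used in the purge command.
leftover_nvidia_packages() {
  awk '$1 ~ /^(ii|rc)$/ && $2 ~ /(cublas|^cuda|^nsight|^nvidia)/ { print $2 }'
}

# Example usage on the server, after the purge and autoremove steps:
#   dpkg -l | leftover_nvidia_packages
# No output means nothing matching the patterns remains.
```

After the final reboot, running nvidia-smi is a quick way to confirm that the new driver loads.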