Best Practices to Avoid Performance Issues in Datahub
Certain large courses, or courses with complex software or compute requirements, can consume a lot of memory and CPU, which can result in a poor user experience for students and/or increased cloud costs. Commonly reported performance issues are due to one or more of the following reasons:
Students are printing large data frames directly in a notebook or trying to display a table that is too large for a notebook cell. To solve this, recommend that students not print large data directly to a cell in their notebook. In addition, you can slice the datasets into smaller samples so that students don't run into issues even if they print the entire dataset (not ideal). Please always review your datasets and remove parts of the data that are not core to achieving the required learning outcomes for students.
Students are running Python/R/Julia code containing an infinite loop. Ask students to review their code regularly to check whether it contains an infinite loop, and to reach out to you or your team if they believe their code has one. If no one reaches out, you can also check with the infra team to identify the problematic user notebooks and do the necessary code review to solve the issue (this requires admin access).
Students are joining large tables. Once again, try to break the dataset down to the subset that is needed to achieve the course objectives.
Students have multiple notebooks open at the same time across one or more browsers. If students report errors such as 503 or 401 error codes, ask them to check whether they have notebooks open in multiple tabs. As a best practice, ask students to keep a single active tab with a notebook and close the other tabs.
You upgraded to the latest version of a package (e.g., Otter Grader) without testing it extensively. As a rule of thumb, upgrade packages in the staging environment first and test the notebooks extensively. Only when you are comfortable with the updated environment should you ask the infra team to upgrade to the latest package version in the stable environment. If you are unsure of the staging environment URL for the hub you use for teaching, ask the infra team.
You are using databases such as SQLite as part of your workflow without consulting the infrastructure team about best practices.
You are trying to use GUI-based applications such as PyQt5 and QGIS without consulting the infrastructure team.
You are using large language models as part of your assignments. The instructional hubs do not offer GPUs, so a group of students working on large assignments with language models can quickly max out the CPU. This can result in cells that execute indefinitely with no output.
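For the issue of printing large data frames (the first item above), one way to produce a smaller classroom copy of a large CSV without loading the whole file into memory is reservoir sampling. The sketch below uses only the Python standard library; the function name `sample_csv`, the file names, and the sample size are hypothetical. For data already loaded in pandas, `DataFrame.head()` and `DataFrame.sample()` are the usual ways to look at a small portion instead of printing everything.

```python
import csv
import random


def sample_csv(src_path, dst_path, k, seed=0):
    """Write a uniform random sample of k data rows from src_path to dst_path.

    Uses reservoir sampling (Algorithm R), so at most k rows are held in
    memory no matter how large the source file is. The header row is kept.
    Returns the number of data rows written.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    reservoir = []
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        for i, row in enumerate(reader):
            if i < k:
                reservoir.append(row)  # fill the reservoir with the first k rows
            else:
                j = rng.randint(0, i)  # keep row i with probability k / (i + 1)
                if j < k:
                    reservoir[j] = row
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(reservoir)
    return len(reservoir)
```

For example, `sample_csv("full_dataset.csv", "course_sample.csv", k=10_000)` would keep the header plus 10,000 randomly chosen rows (both file names are placeholders). Fixing `seed` means every run of the preparation script produces the same sample.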
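For the infinite-loop issue, a defensive habit students can adopt for loops whose termination depends on convergence or on the data is an explicit iteration cap, so that a bug raises an error instead of leaving a cell running forever. This is an illustrative sketch; `newton_sqrt` and its `tol` and `max_iter` defaults are hypothetical choices, not course code.

```python
def newton_sqrt(x, tol=1e-12, max_iter=100):
    """Approximate sqrt(x) with Newton's method, giving up after max_iter steps.

    A plain `while abs(guess * guess - x) > tol:` loop can spin forever if the
    tolerance is unreachable (for example, because of floating-point rounding);
    the iteration cap turns that situation into an explicit error.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    guess = x if x > 1 else 1.0
    for _ in range(max_iter):
        # Relative tolerance for large x, absolute tolerance for small x.
        if abs(guess * guess - x) <= tol * max(x, 1.0):
            return guess
        guess = 0.5 * (guess + x / guess)  # Newton update for f(g) = g*g - x
    raise RuntimeError(f"did not converge within {max_iter} iterations")
```

`newton_sqrt(2.0)` returns an approximation of the square root of 2, while `newton_sqrt(2.0, max_iter=1)` raises `RuntimeError` because one step is not enough to converge. The same `for _ in range(max_iter)` pattern can replace any open-ended `while` loop.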
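For the large-join issue, the idea of shrinking inputs before joining can be sketched with Python's built-in `sqlite3` module: select only the needed columns and filter rows inside the query, so the database does the work and Python receives only the matching subset rather than two full tables. The `students` and `enrollments` tables, their columns, and the demo rows are hypothetical. With pandas, the analogous step is filtering each DataFrame and keeping only the needed columns before calling `merge`.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with hypothetical example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT);
        CREATE TABLE enrollments (student_id INTEGER, course TEXT, term TEXT);
        INSERT INTO students VALUES (1, 'Ada', 'CS'), (2, 'Grace', 'Math');
        INSERT INTO enrollments VALUES
            (1, 'Course A', 'Fall'),
            (2, 'Course A', 'Spring'),
            (2, 'Course B', 'Fall');
        """
    )
    return conn


def joined_subset(conn, term):
    """Join students to their enrollments for a single term.

    Only three columns are selected and rows are filtered by term inside the
    query, so Python receives just the matching subset instead of loading
    both full tables and joining them in memory.
    """
    query = """
        SELECT s.student_id, s.name, e.course
        FROM enrollments AS e
        JOIN students AS s ON s.student_id = e.student_id
        WHERE e.term = ?
        ORDER BY s.student_id, e.course
    """
    return conn.execute(query, (term,)).fetchall()
```

With the demo data, `joined_subset(build_demo_db(), "Fall")` returns `[(1, 'Ada', 'Course A'), (2, 'Grace', 'Course B')]`; the unused `major` column and the Spring enrollment never reach Python.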
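For the package-upgrade issue, a notebook can fail fast if it runs against a package version other than the one validated in staging. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the function name `check_pinned_versions` and the convention of listing tested versions at the top of a notebook are assumptions, not an existing Datahub or Otter Grader feature.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pinned_versions(expected):
    """Compare installed package versions with the versions tested in staging.

    `expected` maps pip distribution names to exact version strings. Returns a
    dict of mismatches, name -> (expected, installed), where installed is None
    if the package is missing. An empty dict means everything matches. The
    comparison is an exact string match, not a semantic-version comparison.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

For example, `check_pinned_versions({"otter-grader": "<version tested in staging>"})` returns an empty dict when the installed version matches and otherwise reports the expected and installed versions; the version string is a placeholder to fill in.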
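For SQLite, the practices to follow on the hub should still come from the infrastructure team. As a hedged starting point, the sketch below shows two general habits that keep memory use predictable when a notebook reads from a SQLite file: streaming results with `fetchmany()` instead of `fetchall()`, and always closing the connection. It also opens the file read-only through SQLite's `mode=ro` URI parameter, an assumption that fits a shared course database students only read. The function name `iter_rows` is hypothetical.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def iter_rows(db_path, query, params=(), batch_size=1000):
    """Yield rows of `query` from a SQLite file, batch_size rows at a time.

    - Opens the file read-only via the SQLite URI parameter mode=ro.
    - Uses fetchmany() so only one batch is held in memory at once,
      unlike fetchall(), which builds the full result list.
    - Wraps the connection in closing(), because sqlite3's own
      `with conn:` block only commits or rolls back; it does not close.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        cursor = conn.execute(query, params)
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break
            yield from batch
```

For example, `for row in iter_rows("course.db", "SELECT * FROM scores WHERE week = ?", (3,)): ...` processes matching rows one batch at a time; the database path, table, and column are placeholders. Any attempt to write through this connection fails with `sqlite3.OperationalError`.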
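For LLM-based assignments that go ahead after consulting the infrastructure team, one CPU-side setting worth knowing about is thread-pool size: many numeric libraries start roughly one thread per visible core, so several students' kernels can together oversubscribe a node. The sketch below caps common thread-count environment variables. Which variables a given library honors, and the cap of 2 in the usage note, are assumptions to verify for your stack; the helper names are hypothetical.

```python
import os

# Thread-count variables read by OpenMP, OpenBLAS, Intel MKL and NumExpr.
# Libraries usually read them once at import time, so set them first.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def available_cpus():
    """Return the CPUs this process may run on (Linux), else os.cpu_count()."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def limit_cpu_threads(n):
    """Set every variable in THREAD_ENV_VARS to n; return the previous values."""
    if n < 1:
        raise ValueError("n must be at least 1")
    previous = {var: os.environ.get(var) for var in THREAD_ENV_VARS}
    for var in THREAD_ENV_VARS:
        os.environ[var] = str(n)
    return previous
```

For example, calling `limit_cpu_threads(min(2, available_cpus()))` in the first cell, before importing libraries such as NumPy or PyTorch, limits each kernel to at most two threads in every library that honors these variables. Inside a container, `os.sched_getaffinity` may report all host cores even when a CPU quota applies, so treat `available_cpus()` as an upper bound.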