Introduction
In 2011, an E. coli outbreak in Germany caused widespread hysteria and ultimately led to 51 deaths, hundreds of hospitalizations, and billions of dollars in damages from recalled goods. Scientists using traditional genomic testing methods initially concluded that the outbreak was caused by Spanish cucumbers. In reality, however, the outbreak was caused by German sprouts. Despite the panic, the correct pathogen was eventually identified and the case was subsequently closed.
The method that these biologists used to identify the E. coli strain, both initially and subsequently, was Multiplexed Genomics Testing. This method of pathogen identification revolves around identifying a certain segment of suspected pathogen DNA, isolating it, and determining whether or not it really belongs to that pathogen. The most difficult part of this process is isolating the DNA of just the expected pathogen, so that the test is not hindered by other DNA in the sample. To do this effectively, biologists introduce compounds called primers into their tests. These primers bind on both sides of the section of DNA that is of interest and allow it to be easily multiplied and subsequently isolated. Multiplexed Genomics Testing is the method currently used by biologists worldwide to identify specific genetic segments.
Despite its popularity, this testing method still has several key drawbacks. As seen in the German outbreak, these tests are not always accurate. A major source of this inaccuracy is that it is often difficult and time-consuming to determine which primers to use in a particular test. Purchasing the wrong primers can either cause long delays in the testing process or produce inaccurate findings. To counteract this, some companies have developed software that helps biologists choose primers more efficiently. However, these solutions are often exclusive, expensive, or otherwise inaccessible to most labs that are equipped to perform Multiplexed Genomics Tests.
Our sponsor for this project is the Fofanov Bioinformatics Lab at Northern Arizona University. Under the direction of Dr. Fofanov, the lab’s principal investigator, Dr. Furstenau is in the process of creating a command-line program called Primacy, which robustly compares all possible primer choices in a given scenario, informs its users of the best choices, and lets them know if something might go wrong. While this tool will be a boon for the greater scientific community, it still has a few issues, which our team, Team PathLab, has been commissioned to address.
Specifically, Primacy currently exists only as a Command Line Interface (CLI). The main problem with this command-line tool is its accessibility to end users with differing levels of technical expertise. CLI tools work well for software engineers and tech-savvy users who are already familiar with the interface. However, most of the researchers who will use this tool are experts in their own fields and may not all be equally comfortable with different computer interfaces. This is one of the main challenges our team needs to solve. New users often find CLI tools more difficult to operate than traditional Graphical User Interfaces, because CLI tools require more memorization and familiarity to operate and navigate, and they are more prone to human error (e.g., a user misspelling a keyword).
As such, the solution we are working towards is a Graphical User Interface (GUI). This interface will solve our client’s problem by being easy to learn and easy to use, and by providing accurate feedback and statistical visualizations. More specifically, our GUI will be designed to meet a variety of requirements. Functionally, our GUI must be able to move between the different Primacy modules, interacting with the pipeline and guiding the user along the way. One primary way it will accomplish this is via input validation, where the GUI checks that the user is entering proper values.
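The input validation described above could be sketched roughly as follows. This is only a minimal illustration, not the actual design: the IntField class, the validate_form helper, and the field name and range used in the usage note are all hypothetical and are not taken from Primacy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class IntField:
    """One hypothetical whole-number input field and its allowed range."""
    name: str
    minimum: int
    maximum: int

    def validate(self, raw: str) -> Optional[str]:
        """Return an error message for the GUI to display, or None if valid."""
        text = raw.strip()
        if not text:
            return f"{self.name} is required."
        try:
            value = int(text)
        except ValueError:
            return f"{self.name} must be a whole number."
        if not self.minimum <= value <= self.maximum:
            return f"{self.name} must be between {self.minimum} and {self.maximum}."
        return None


def validate_form(fields: List[IntField], raw_values: Dict[str, str]) -> List[str]:
    """Check every field and collect all error messages (empty list if none)."""
    errors = []
    for form_field in fields:
        # A field the user never filled in is treated as an empty string.
        message = form_field.validate(raw_values.get(form_field.name, ""))
        if message is not None:
            errors.append(message)
    return errors
```

As an example of use, validate_form([IntField("Primer length", 18, 30)], {"Primer length": "abc"}) would return a one-item list containing a message that the value must be a whole number. Under this approach, the GUI would pass values on to the pipeline only when the returned list is empty, so mistakes are caught before any processing begins.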
On the less technical side of requirements, our GUI must be easy to learn. This means avoiding cluttered designs and including ample documentation, both within and outside of the program. Additionally, the interface must not add any significant processing time to the pipeline. Finally, our GUI must be easy to access, meaning it must be packaged and installed with Primacy and be installable on a variety of systems, including Mac, Windows, and Linux releases.
This document will focus on the specific design aspects and choices we have made in order to fulfill the above requirements. Not only will it provide a rough outline for our implementation timeline, but it will also provide a starting point for documentation for anyone who might maintain or update this software after us. In order to maximize the usefulness of this document, we will update it throughout our implementation process to reflect any changed decisions or implementation details.