For my master's thesis, I explored how different visualizations of semantically mapped biomedical data could impact the information-seeking experience for scientists and researchers. To accomplish this, I conducted a comparative usability test of two open-source visualization applications: WebVOWL and Protégé.
Methods and Tools Used
Methods
Moderated usability testing (12 participants)
System Usability Scale (SUS) questionnaire
Tools
SurveyMonkey (for administering study questionnaires)
Zoom (for test facilitation)
Protégé (one of the tested applications)
WebVOWL (one of the tested applications)
My Contributions
As the sole researcher, I conducted all phases of this project:
Literature review & secondary research
Study design & dataset selection
Participant recruitment
Usability test moderation & notetaking
Data analysis & reporting
Background
Context
Biomedical data usually come in the form of vast, complex datasets, and traditional relational databases tend to struggle to capture interconnected information effectively. Semantically mapped data structures, known as ontologies, can offer better ways to organize and retrieve this type of data, yet their usability when presented in user interfaces (UIs) remains insufficiently researched.
The Challenge
Semantic data has great potential, but it’s rarely integrated into commercial search tools or user-friendly interfaces. Furthermore, most people, even scientists, aren't familiar with complex query languages for semantic data, making it hard to access this information. A well-designed visualization can simplify data exploration, removing the need for technical skills. However, there’s little research on how to design these interfaces effectively. I tried to address this gap by working directly with potential users to improve visual access to semantic biomedical data.
Research Questions
The usability study aimed to address the following research questions:
How might networked graphical visualizations of semantically linked data enhance information-seeking behavior and promote discoverability in life sciences?
Does being able to easily see and navigate the connections between data points help individuals use a graph database?
Is it possible to recreate serendipitous discovery in the digital realm when visualizing a graph database?
What specific interface elements and qualities of a networked visualization help or hinder the information-seeking process?
Methodology: Moderated Usability Test
Test Stimuli and Data
For this study, I chose to assess two different applications: WebVOWL and Protégé. I encountered both applications during my literature research and found that each had desirable usability qualities for visualizing semantic data, while being different enough from each other to compare during usability testing.
Application A: Protégé. A tool that visualizes ontologies with force-directed graphs and indented lists.
Application B: WebVOWL. A tool that visualizes ontologies in a force-directed graph.
In both of these applications, users can upload an ontology file and then dynamically manipulate and navigate a visualized diagram of that data. Because different datasets can cover different topics and have different structures, I chose to evaluate two datasets, to ensure that users' experiences reflected the applications themselves rather than the quirks of any single dataset.
The ontologies I chose for this study were:
COVID-19 Ontology (COVID): An ontology that captures a variety of medical concepts associated with COVID-19.
Ontology of Cardiovascular Drug Adverse Events (OCVDAE): An ontology that captures cardiovascular drugs, the adverse events associated with them, and the conditions they treat or prevent.
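For illustration, the sketch below shows how an ontology file like these can be parsed into the classes and relationships that WebVOWL and Protégé render as nodes and edges. It uses the rdflib Python library; the filename is hypothetical, and this is not how either application loads data internally.

```python
from rdflib import Graph
from rdflib.namespace import OWL, RDF, RDFS

# Load an OWL ontology file (the filename here is hypothetical)
g = Graph()
g.parse("covid-19-ontology.owl", format="xml")

# Classes are what the visualizations render as nodes
for cls in g.subjects(RDF.type, OWL.Class):
    print("Node:", cls, g.value(cls, RDFS.label))

# Subclass relationships become the edges between nodes
for child, parent in g.subject_objects(RDFS.subClassOf):
    print("Edge:", child, "->", parent)
```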
I used social media to recruit participants virtually. Participant screening criteria were as follows:
Participants had to be at least 18 years of age.
To accurately represent the intended users of these visualizations in practice and research, participants were required to have completed an undergraduate program in Science, Technology, Engineering, or Mathematics (STEM), and/or a graduate degree in a STEM field.
Participants were required to have an up-to-date browser and an active internet connection to use WebVOWL.
Participants were required to have a Windows or Mac operating system to download and use Protégé prior to their test session.
Prior to each session, participants were asked to watch a brief video introducing them to the concepts of ontologies and semantic data. As all test sessions were conducted through Zoom, I asked participants to share their screen only while navigating the visualizations.
Study Design
I wanted to see how all users responded to all possible combinations of dataset and application, so I used a within-subjects approach. Pairing each application (A: Protégé, B: WebVOWL) with each of the two datasets (1 and 2) produced four test conditions: A1, A2, B1, and B2.
However, I had to make sure that no participant would test the same dataset or application twice. This meant creating four test groups and randomly assigning participants to them after scheduling sessions:
Participant Group 1: Tests A1, then B2
Participant Group 2: Tests B2, then A1
Participant Group 3: Tests B1, then A2
Participant Group 4: Tests A2, then B1
With this approach, I was able to expose all participants to all tested applications and datasets while minimizing carryover effects.
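For illustration, here is a minimal sketch of how this counterbalanced assignment could be automated. The group definitions mirror the list above (A = Protégé, B = WebVOWL; 1 and 2 are the two datasets); the participant IDs and the rotation logic are hypothetical, not the exact procedure used in the study.

```python
import random

# Each group is an ordered pair of test conditions (application + dataset)
GROUPS = {
    1: ["A1", "B2"],
    2: ["B2", "A1"],
    3: ["B1", "A2"],
    4: ["A2", "B1"],
}

def assign_groups(participant_ids, seed=None):
    """Shuffle participants, then cycle through the four groups for balance."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {
        pid: {"group": (i % len(GROUPS)) + 1, "order": GROUPS[(i % len(GROUPS)) + 1]}
        for i, pid in enumerate(ids)
    }

# Example: 12 participants, 3 per group
print(assign_groups([f"P{n:02d}" for n in range(1, 13)], seed=1))
```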
Usability Test Structure
For this study, I used a task-based, think-aloud approach to usability testing. Before each set of tasks, I gave users a quick demonstration of some of the core features of both WebVOWL and Protégé. Then, depending on the randomized testing order for that participant, they would start with either the COVID or OCVDAE dataset, loaded into either WebVOWL or Protégé. Tasks were organized by dataset, and abbreviated versions are listed below:
COVID Dataset Tasks
Explore the COVID visualization.
Find and record 3 to 5 terms that are related to the phrase “COVID-19”.
Pick any one of the terms you found in the previous step, and using the visualization, find 3 to 5 more terms that are connected to the original term.
Find 7-10 symptoms of COVID-19. You may use the keyword search function only once.
OCVDAE Dataset Tasks
Explore the OCVDAE visualization.
Find and record 3 to 5 drugs associated with the term “Heart Failure” that have at least one associated adverse event.
Pick any one of the adverse events you found in the previous step – are they associated with any other cardiovascular drugs? What conditions do those drugs treat/prevent?
Find 3-5 drugs that are associated with depression and the diseases they treat/prevent.
After completing a task set for an application, participants were required to fill out the System Usability Scale (SUS) questionnaire on SurveyMonkey for that application before proceeding to the next application and set of tasks. An industry-standard instrument, the SUS is a quick questionnaire of ten statements rated on a 5-point Likert scale that measures participants' holistic perception of a system. The SUS statements administered are listed below, followed by a sketch of how responses are scored.
System Usability Scale (SUS) Statements
I think that I would like to use this system frequently.
I found the system unnecessarily complex.
I thought the system was easy to use.
I think that I would need the support of a technical person to be able to use this system.
I found the various functions in this system were well integrated.
I thought there was too much inconsistency in this system.
I would imagine that most people would learn to use this system very quickly.
I found the system very cumbersome to use.
I felt very confident using the system.
I needed to learn a lot of things before I could get going with this system.
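For reference, the sketch below shows the standard SUS scoring procedure behind the mean scores reported later: odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5 to yield a 0–100 score. The sample responses are hypothetical.

```python
def sus_score(responses):
    """Compute a 0-100 SUS score from ten 1-5 Likert responses (in statement order)."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten responses, each between 1 and 5")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd items: response - 1; even items: 5 - response
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# Hypothetical example: one participant's ratings for a single application
print(sus_score([4, 2, 4, 2, 3, 2, 4, 2, 3, 3]))  # 67.5
```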
Results and Analysis
Qualitative Results: What did users have to say? How did they click through the visualizations? Why?
For each usability testing session, I compiled qualitative observations in my notes: notable verbal comments from participants, click paths and search behavior, and task pass/fail status.
All participants were able to understand the data presented to them in both visualizations. Participants were able to follow connections between data nodes, and generally understood the content of the datasets.
(As they clicked through WebVOWL) "If I'm trying to diagnose someone with abnormal cytokine levels. If the immune system is affected, are there any other risk factors or connections? Maybe they have COVID-19." -Usability Test Participant
"Looks like a cloud network of not just covid , but also respiratory diseases, symptoms and treatments." -Usability Test Participant
Some participants brought expectations from applications they had used in the past.
"This one (Protégé) reminds me of CAD/SolidWorks, it's a common way to visualize data, it makes sense with the nesting." -Usability Test Participant
Participants found WebVOWL’s visualizations and interface more intuitive and easier to learn than Protégé's. Some described WebVOWL as useful and intuitive and appreciated seeing all connections in a single view.
"I like the UI, colors are more appealing, a lot less cluttered by default." -Usability Test Participant
"I like this search function a lot better (than Protégé) -Usability Test Participant
Filtering was more effective in Protégé. Participants preferred the ability to collapse and hide unnecessary nodes, allowing for a more focused search experience.
Participants appreciated Protégé's list view of ontology data, but only when they knew exactly what keyword or term they were looking for.
"Quicker to hover if you know what you're looking for. Wanted to keep it quick instead of adding in and redoing the visualization." -Usability Test Participant
"Hover" text view for an ontology node in Protege:
Information overload was a challenge in both applications. When visualizing large datasets, participants struggled with system lag and cluttered visualizations. This was especially true in WebVOWL, where the application tended to load all data nodes at once, as opposed to Protégé, which allowed for a more progressive disclosure of information.
"Seems like a good program if you're running a machine powerful enough to run it" -Usability Test Participant
"It's great for visualization until there's too much data" -Usability Test Participant
Participants attempted to mitigate information overload by dragging points of interest away from a visualized mass of nodes. For example, when assessing drugs that were classified as either treatments or preventative measures for "Heart Failure", most participants dragged the "Heart Failure" node away from all other non-related nodes. This strategy allowed participants to focus solely on nodes directly related to "Heart Failure" in hopes of finding the desired nodes.
"Strategy is to draw out the visualization so I can see all the connections and lines stemming from it, the relations become more visible for me, now it's easier to see the relations" -Usability Test Participant
Dragging nodes away in WebVOWL:
Quantitative Results: How did users score?
Even though the focus of this study was more on the qualitative side, I wanted to gather some quantitative data to paint a more comprehensive picture of participants' behavior and reactions to the visualizations. For this study, I focused my analysis on task completion rates and System Usability Scale scores.
Confidence intervals were calculated for the completion rates of all tasks executed in WebVOWL and Protégé. Because results were partitioned by dataset and application, the sample size for most of my calculations was reduced from the overall study sample of 12 to a partitioned sample size of n = 6. Because of this extremely small sample size, I used an Adjusted-Wald binomial confidence interval to generate metrics.
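For reference, here is a minimal sketch of the Adjusted-Wald (Agresti-Coull) interval calculation: the observed counts are adjusted by adding z²/2 successes and z² trials before applying the usual Wald formula. The example numbers (5 of 6 participants passing a task) are hypothetical.

```python
import math

def adjusted_wald_ci(successes, n, z=1.96):
    """Adjusted-Wald binomial confidence interval for a task completion rate."""
    n_adj = n + z ** 2                        # adjusted number of trials
    p_adj = (successes + z ** 2 / 2) / n_adj  # adjusted completion rate
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# Hypothetical example: 5 of 6 participants completed a task
low, high = adjusted_wald_ci(5, 6)
print(f"95% CI: {low:.0%} to {high:.0%}")
```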
At a glance:
Across both datasets (COVID-19 and OCVDAE), participants achieved a 94% overall task completion rate.
Protégé Tasks: Completion rates varied; for example, COVID Task 3 had an 83% success rate, while other tasks were completed at 100%.
WebVOWL Tasks: Most tasks achieved a 100% success rate, except for COVID Task 2, which had an 83% success rate.
SUS: WebVOWL had a mean SUS score of 61.0, while Protégé had a mean SUS score of 54.6.
WebVOWL Task Analysis:
Protégé Task Analysis:
System Usability Scale Comparison
Takeaways & Lessons for the Future
Key Findings
Graphical representations aid targeted discovery – Users successfully traced relationships between nodes to complete usability test tasks.
Users defaulted to lists and keyword search – When participants knew exactly what they were looking for, they fell back on traditional search methods and list views.
Information overload hindered usability – Large dataset visualizations caused lag and confusion, negatively impacting the user experience.
Visualization tools supported information-seeking behaviors – WebVOWL and Protégé enabled participants to navigate and interpret graph databases effectively, even without prior expertise.
Serendipitous discovery remains unproven – Participants did not report unexpected yet useful insights, leaving the question of digital serendipity unanswered.
Limitations
Small sample size and participant background – With only 12 participants, findings aren't generalizable, especially the quantitative insights. I also didn't require participants to have a life sciences background or profession; different results may have occurred if credentials in life sciences or bioinformatics had been a prerequisite for this study.
Study constraints – The COVID and OCVDAE ontology files used were not explicitly designed for visualization, which may have contributed to poor system performance and data overload.
Opportunities for Further Research & Design
There is real promise in visualizing semantic data. A hybrid visualization tool that combines the strengths of Protégé and WebVOWL would be worth studying in depth.
Conduct longitudinal studies to measure long-term usability, learning curves, and knowledge retention.
It may be worth exploring the use of eye-tracking to obtain quantifiable gaze data on certain interface and visualization elements.