Usability Study of Biomedical Data Visualizations

UX Research

Project Overview

For my master's thesis, I explored how different visualizations of semantically mapped biomedical data could impact the information-seeking experience for scientists and researchers. To accomplish this, I conducted a comparative usability test of two open-source visualization applications: WebVOWL and Protégé.

Methods and Tools Used

Methods

  • Moderated usability testing (12 participants)
  • System Usability Scale (SUS) questionnaire

Tools

  • SurveyMonkey (for administering study questionnaires)
  • Zoom (for test facilitation)
  • Protégé (one of the tested applications)
  • WebVOWL (one of the tested applications)

My Contributions

As the sole researcher, I conducted all phases of this project, from literature research and study design through recruitment, test facilitation, and analysis.

Background

Context

Biomedical data usually come in the form of vast, complex datasets, and traditional relational databases tend to struggle to capture their interconnected information effectively. Semantically mapped data, known as ontologies, can offer better ways to organize and retrieve this type of data, yet their usability when presented in user interfaces (UIs) remains insufficiently researched.
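To make this concrete, here's a minimal Python sketch of semantic data as subject-predicate-object triples; the terms and relations are illustrative, not drawn from the ontologies used in this study.

  # A minimal sketch of semantic data as (subject, predicate, object) triples.
  # All terms and relations here are made up for illustration.
  triples = [
      ("Fever", "is_symptom_of", "COVID-19"),
      ("COVID-19", "is_a", "Respiratory Disease"),
      ("Remdesivir", "treats", "COVID-19"),
  ]

  def related(term):
      """Return every triple that mentions the term, as subject or object."""
      return [t for t in triples if term in (t[0], t[2])]

  for s, p, o in related("COVID-19"):
      print(f"{s} --{p}--> {o}")

Because every relationship is stored explicitly, traversing connections is a matter of following triples rather than composing joins across tables.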

The Challenge

Semantic data has great potential, but it's rarely integrated into commercial search tools or user-friendly interfaces. Furthermore, most people, even scientists, aren't familiar with the complex query languages used to retrieve semantic data, making this information hard to access.
A well-designed visualization can simplify data exploration, removing the need for those technical skills. However, there's little research on how to design such interfaces effectively. I aimed to address this gap by working directly with potential users to improve visual access to semantic biomedical data.
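To illustrate that barrier, the sketch below uses the rdflib Python library to answer a simple question over a toy graph; the namespace, terms, and relations are hypothetical, but the SPARQL syntax is exactly the kind of hurdle a visualization can remove.

  from rdflib import Graph, Namespace
  from rdflib.namespace import RDF

  # Hypothetical namespace and terms, for illustration only.
  EX = Namespace("http://example.org/")

  g = Graph()
  g.add((EX.Fever, RDF.type, EX.Symptom))
  g.add((EX.Cough, RDF.type, EX.Symptom))
  g.add((EX.Fever, EX.symptomOf, EX.COVID19))
  g.add((EX.Cough, EX.symptomOf, EX.COVID19))

  # Even the simple question "what are the symptoms of COVID-19?" requires
  # learning SPARQL's prefixes, triple patterns, and variable bindings.
  query = """
      PREFIX ex: <http://example.org/>
      SELECT ?symptom WHERE {
          ?symptom a ex:Symptom ;
                   ex:symptomOf ex:COVID19 .
      }
  """
  for row in g.query(query):
      print(row.symptom)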

Research Questions

The usability study aimed to address the following research questions:

  1. How might networked graphical visualizations of semantically linked data enhance information-seeking behavior and promote discoverability in life sciences?
  2. Does the ability to easily see and navigate the connections between data points help individuals use a graph database?
  3. Is it possible to recreate serendipitous discovery in the digital realm when visualizing a graph database?
  4. What specific interface elements and qualities of a networked visualization help or hinder the information-seeking process?

Methodology: Moderated Usability Test

Test Stimuli and Data

For this study, I chose to assess two different applications: WebVOWL and Protégé. I encountered both applications during my literature research and found that they had desirable usability qualities for visualizing semantic data, yet differed enough from each other to support a meaningful comparison during usability testing.

Application A: Protégé
A tool that visualizes ontologies with force-directed graphs and indented lists.

Application B: WebVOWL
A tool that visualizes ontologies in a force-directed graph.
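Both applications rely on force-directed layouts, in which nodes repel one another and links act like springs, so closely related terms settle near each other. Here's a minimal sketch of that layout technique using the networkx and matplotlib Python libraries; the graph terms are made up for illustration.

  import networkx as nx
  import matplotlib.pyplot as plt

  # Toy graph of made-up ontology terms.
  G = nx.Graph()
  G.add_edges_from([
      ("COVID-19", "Fever"),
      ("COVID-19", "Cough"),
      ("COVID-19", "Respiratory Disease"),
      ("Respiratory Disease", "Pneumonia"),
  ])

  # spring_layout computes a force-directed (Fruchterman-Reingold) layout.
  pos = nx.spring_layout(G, seed=42)
  nx.draw(G, pos, with_labels=True, node_color="lightsteelblue",
          node_size=1500, font_size=8)
  plt.show()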

In both of these applications, users can upload an ontology file and then dynamically manipulate and navigate a visualized diagram of that data. Because different datasets can cover different topics and have different structures, I evaluated two datasets to ensure that users' experiences would reflect the applications themselves rather than the quirks of any single dataset.

The ontologies I chose for this study were:

  1. COVID-19 Ontology (COVID): An ontology that captures a variety of medical concepts associated with COVID-19.
  2. Cardiovascular Drug Adverse Events (OCVDAE): An ontology that captures adverse events related to common cardiovascular disease medication.

Recruiting & Logistics

I used social media to recruit participants virtually. Participant screening criteria were as follows:

Prior to each session, participants were asked to watch a brief video introducing the concepts of ontologies and semantic data.
As all test sessions were conducted through Zoom, I asked participants to share their screens only when navigating the visualizations.

Study Design

I wanted to see how all users responded to all possible combinations of dataset and application, so I used a within-subjects approach to create testing conditions:

However, I had to make sure that no participant would test the same dataset or application twice. This involved creating four test groups and then randomly assigning participants to them after scheduling sessions:

With this approach, I was able to expose all participants to all tested applications and datasets while minimizing carryover effects, to get the best data I could.
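A minimal sketch of this counterbalancing logic in Python: the group construction reflects the constraint described above (no repeated application or dataset per participant), though the labels and randomization details are illustrative rather than my exact procedure.

  import itertools
  import random

  applications = ["WebVOWL", "Protégé"]
  datasets = ["COVID", "OCVDAE"]

  # Build the four test groups: each is an ordered pair of (application,
  # dataset) conditions in which neither the application nor the dataset
  # repeats for a given participant.
  groups = []
  for first_app, first_ds in itertools.product(applications, datasets):
      second_app = next(a for a in applications if a != first_app)
      second_ds = next(d for d in datasets if d != first_ds)
      groups.append([(first_app, first_ds), (second_app, second_ds)])

  # Randomly assign 12 participants, three per group.
  participants = [f"P{i:02d}" for i in range(1, 13)]
  random.shuffle(participants)
  for i, participant in enumerate(participants):
      print(participant, "->", groups[i % len(groups)])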

Usability Test Structure

For this study, I used a task-based, think-aloud approach to usability testing. Before each set of tasks, I gave participants a quick demonstration of some core features of both WebVOWL and Protégé. Then, depending on the randomized testing order for that participant, they would start with either the COVID or OCVDAE dataset loaded into either WebVOWL or Protégé. Tasks were organized by dataset, and abbreviated versions are listed below:

COVID Dataset Tasks

  1. Explore the COVID visualization.
  2. Find and record 3 to 5 terms that are related to the phrase “COVID-19”.
  3. Pick any one of the terms you found in the previous step, and use the visualization to find 3 to 5 more terms connected to the original term.
  4. Find 7 to 10 symptoms of COVID-19. You may use the keyword search function only once.

OCVDAE Dataset Tasks

  1. Explore the OCVDAE visualization.
  2. Find and record 3 to 5 drugs associated with the term “Heart Failure” that have at least one associated adverse event.
  3. Pick any one of the adverse events you found in the previous step – are they associated with any other cardiovascular drugs? What conditions do those drugs treat/prevent?
  4. Find 3 to 5 drugs that are associated with depression, and the diseases they treat/prevent.

After completing a task set for an application, participants were required to fill out the System Usability Scale (SUS) questionnaire for that application on SurveyMonkey before proceeding to the next application and set of tasks. An industry-standard instrument, the SUS is a quick, 10-question, 5-point Likert-scale questionnaire that measures participants' holistic perception of a system. The SUS statements administered are listed below:

System Usability Scale (SUS) Statements

  1. I think that I would like to use this system frequently.
  2. I found the system unnecessarily complex.
  3. I thought the system was easy to use.
  4. I think that I would need the support of a technical person to be able to use this system.
  5. I found the various functions in this system were well integrated.
  6. I thought there was too much inconsistency in this system.
  7. I would imagine that most people would learn to use this system very quickly.
  8. I found the system very cumbersome to use.
  9. I felt very confident using the system.
  10. I needed to learn a lot of things before I could get going with this system.
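Scoring the SUS follows a fixed formula: each odd-numbered (positively worded) statement contributes its response minus 1, each even-numbered (negatively worded) statement contributes 5 minus its response, and the sum is multiplied by 2.5 to yield a 0-100 score. A minimal Python sketch, using a hypothetical set of responses:

  def sus_score(responses):
      """Compute a SUS score from ten 1-5 Likert responses (standard scoring)."""
      if len(responses) != 10:
          raise ValueError("SUS requires exactly 10 responses")
      total = 0
      for item, response in enumerate(responses, start=1):
          # Odd items are positively worded; even items are negatively worded.
          total += (response - 1) if item % 2 == 1 else (5 - response)
      return total * 2.5  # scale the 0-40 raw sum to 0-100

  # Hypothetical participant responses to statements 1-10.
  print(sus_score([4, 2, 4, 1, 4, 2, 5, 2, 4, 2]))  # 80.0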

Results and Analysis

Qualitative Results: What did users have to say? How did they click through the visualizations? Why?

I compiled qualitative observations in notes for each usability testing session, logging notable verbal comments from participants, observations of click paths and search behavior, and task pass/fail status.

(As they clicked through WebVOWL) "If I'm trying to diagnose someone with abnormal cytokine levels. If the immune system is affected, are there any other risk factors or connections? Maybe they have COVID-19."
-Usability Test Participant

"Looks like  a cloud network of not just covid , but also respiratory diseases, symptoms and treatments."
-Usability Test Participant

"This one (Protégé) reminds me of CAD/SolidWorks, it's a common way to visualize data, it makes sense with the nesting."
-Usability Test Participant

"I like the UI, colors are more appealing, a lot less cluttered by default."
-Usability Test Participant

"I like this search function a lot better (than Protégé)
-Usability Test Participant

"Quicker to hover if  you know what you're looking for. Wanted to keep it quick instead of adding in and redoing the visualization."
-Usability Test Participant

"Hover" text view for an ontology node in Protege: 

"Seems like a good program if you're running a machine powerful enough to run it"
-Usability Test Participant

"It's great for visualization until there's too much data"
-Usability Test Participant

"Strategy is to draw out the visualization so I can see all the connections and lines stemming from it, the relations become more visible for me, now it's easier to see the relations"
-Usability Test Participant

Dragging nodes away in WebVOWL: 

Quantitative Results: How did users score?

Even though the focus of this study was on the qualitative side, I wanted to gather some quantitative data to paint a more comprehensive picture of participants' behavior and reactions to the visualizations. I limited quantitative analysis to task completion rates and System Usability Scale scores.

Confidence intervals were calculated for the completion rates of all tasks executed in WebVOWL and Protégé. Because tasks were partitioned by dataset and application, the sample size for most calculations was reduced from the overall study sample (n=12) to a partitioned sample (n=6). Given this extremely small sample size, I used the Adjusted-Wald binomial confidence interval to generate these metrics.
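For readers who want the math: the Adjusted-Wald (Agresti-Coull) interval adds z²/2 successes and z² trials to the observed counts before applying the usual Wald formula, which keeps the interval well behaved at small n. Below is a minimal Python sketch; the 5-of-6 completion count is purely illustrative, not a result from this study.

  import math

  def adjusted_wald_ci(successes, n, z=1.96):
      """Adjusted-Wald (Agresti-Coull) binomial confidence interval.

      z defaults to 1.96, i.e. a 95% confidence level.
      """
      n_adj = n + z ** 2
      p_adj = (successes + z ** 2 / 2) / n_adj
      margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
      return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

  # Illustrative numbers only: 5 of 6 participants completing a task.
  low, high = adjusted_wald_ci(5, 6)
  print(f"95% CI for completion rate: {low:.2f} to {high:.2f}")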

At a glance: 

WebVOWL Task Analysis:

Protégé Task Analysis:

System Usability Scale Comparison

Takeaways & Lessons for the Future

Key Findings

Limitations

Opportunities for Further Research & Design
