NEXT GENERATION PHONE SERVICE
UX Research Case Study
WEBSITE & APP
Sketch, UserTesting.com, On-site usability lab (two-way mirrored observation room), 3 cameras and STATA
My client is a U.S. phone service & telecom provider. The provider is unique in that it has a reputation for excellent coverage but unclear billing practices. The provider is currently trying to shut down their brick and mortar stores and develop a website and an app that will manage all aspects of the sales funnel, setup, future billing processes and customer service.
Unique Client Challenge 1 - Multiple Scenarios, Simulating Time-lapses
For this project there were 3 distinct scenarios and 2 interfaces that the client wanted us to improve. In natural use, the scenarios were expected to happen in close succession or to be experienced over a 1-month period. The first scenario was the online sales funnel (based on Google Analytics, we expected most customers to be on a desktop). The second and third scenarios shared an interface and were expected to be completed through the app.
User makes a purchase using the sales funnel experience.
User sets up their app for continual usage monitoring and service engagement.
User receives their bill and attempts to understand charges.
Unique Client Challenge 2 - Breaking Mental Models from Traditional Mobile Business Models
Additionally, the phone service plan offered for this journey was not a conventional data plan. The phone plan allowed users to use GB of data on an ad hoc basis.
The unique aspect was that no pre-planned data limit was requested or offered during the sales funnel process. Users would be billed as they used the data, without pre-approval from the user. Users could set a data usage limit when they set up their mobile app, but otherwise no min or max data usage was specified during checkout (the service was advertised as pay-as-you-go). The unique function of the app was that it served as both a historical tool and a management tool in which users could see and change data usage. Stakeholders assumed that users would download the mobile app immediately after purchasing the service, and that they would naturally navigate to the minimum settings in the app.
My objective: Stakeholder Identified Goals
Identify user pain points and expectations within the entire customer journey.
Enhance clarity and transparency in all billing processes.
Gauge user comprehension & sentiment regarding the online purchased service & phone.
Discover opportunities to enhance progressive disclosure, error prevention & recovery, and control & freedom.
My goals for stakeholder elicitation sessions (sometimes called Innovation Sessions) are to have clear outputs in the form of some of the artifacts below. Some are used as blinded tools so that our stakeholders can align on concrete deliverables, scope of research questions, the context of discovery/assessment use, and the appropriate level of reporting details/breadth for intended audiences.
All deliverables were worked on together with, and signed by, 5 stakeholders. This was a 4-hour workshop with three 15-minute breaks, totaling 5 hours and 45 minutes. The next morning, the outputs below were emailed to attending and non-attending stakeholders. To make this workshop run smoothly, the deliverables are pre-drafted as templates and filled in with any prior knowledge I have. This way stakeholders can see the researcher's point of view, and we can narrow down the discussion and leave more room for ideation.
Client Deliverable after the Innovation Session
Tentative Research Proposal (PPTx)
A presentation of general questions and meeting goals
A list of KPIs for all of them or individually
Researcher assumptions (from previous synthesized research)
Stakeholder assumptions (from previous findings/departments/experience)
Excel of existing/future "I want to - so I can statements" & priority rankings with PO/designer/dev assignment
Use Case Translations to Testable Hypotheses
General 1st Draft Mod Guide with Estimated Timing: Scenarios, Stimuli Description/Prototype specs, Tasks, Critical User Questions, Measurement Proposal.
List of Commitments: Stimuli & Prototyping Design Resources, Participant Compensation, Tool & Lab Resources, Observation Guests and Update preferences
Alignment on general participant demos vs. hypothesis dependent demo
1st Draft of Research Timeline (Up to Reporting & Follow-up Date.)
Researcher's in-house/UX team outputs:
Flexible blinded mod guide for stakeholders (hidden and unknown to stakeholders)
Cooperative (Owns funding/future funding, SMEs)
indicates needs for ROI projections and/or SME interview planning
Impacted (e.g. Owns Dev. Dependencies/Obsoletion Decisions, SOPs/Scrum Masters)
indicates greater detail of design resources/projections
Number People, Word People, Picture People
Update Cadences/Expectations (weekly, bi-weekly, ad hoc)
After the innovation session (stakeholder elicitation), I synthesized their goals and assumptions about the experience into an Excel sheet that showed how the fallout of each assumption (whether true or false) related to each business goal. I also ranked each association by importance based on the stakeholder map. Below are some of the findings from the session.
primary objectives of the client:
users can complete the sales funnel with zero assistance
users understand what they were purchasing and future responsibilities
users are able to explain their service to a friend
users like the tone and feel of the app
main demographic of client's customers:
phone owners and primary account holders
mixed formal education backgrounds
main complaints from current account holders:
bills seem higher than expected
purchases are not explained
finding information on phone service usage was difficult
The initial information architecture of the online sales funnel was vague and tedious. I took the client's top 2 competitors and analyzed user-friendly trends and similar pitfalls. I came up with 5 principles that the surveyed competitors satisfied:
1. Design induced Ethos with Visibility, Clarity & Transparency
2. Logos - designs are logical in that they use familiar symbols and information
3. Presenting Pathos with Testimonials
4. Low Cognitive Workloads with Progressive Disclosure
5. Rapport Building with Confirmation & Feedback
Evaluation of Current Website & App | Formulating Hypotheses
The initial information architecture and design elements of the online sales funnel & app fell short on almost all 5 principles. The biggest problem within the app revolved around an unclear way to explain and predict future charges. For this study there were 3 distinct scenarios being tested. Together they formed a type of customer journey. For brevity in this case study, I will discuss 3 main hypotheses from my heuristic evaluation of the client's prototypes.
Recap on customer journey:
Scenario 1: User makes a purchase using the sales funnel experience.
Scenario 2: User sets up their app for continual usage monitoring and service engagement.
Scenario 3: User receives their bill and attempts to understand charges.
A shopping cart that combined the purchase price of the phone and the service was not clear.
Regarding Scenario 1, the client's desktop prototype for the sales funnel did not explain the unique feature of the service and combined the phone price with the phone service plan. This was confusing since users' expectations were not properly set. Not only did the data plan remain unexplained, but the initial service plan fees spoke to users' previous mental models of data plans.
From the user's standpoint the shopping cart may incite a feeling of unethical sales practices since charges seemed to lack transparency. This was a deviation from the principle of Ethos UX declared in my competitive analysis. The sales funnel started without a clear and transparent explanation of the carrier's unique data plan. If my hypothesis is supported, this would create a deficit in relationship building and cause a PR nightmare.
The design element that indicated the amount of data used induced negative transfer from similar designs from the client's current app.
In Scenario 2 the user has downloaded the app and has already used their phone's data plan. The home screen of the app had a large number graphic which represented how much data was used, with a tag line that told the user "how many days to go." However, the client's current app had a similar display, in which the number represented how much data was remaining. My hypothesis concerns a psychological phenomenon called "negative transfer," wherein users will not be able to sufficiently comprehend the design element since a previous mental model will interfere with the current design experience.
This would violate the user's sense of logos (principle 2), since by their logic, the same carrier should have the same graphical representation for similar information. While this could be somewhat remedied by the user's knowledge of the unique data plan (i.e. the user's plan has no data limit, therefore a number would never represent remaining data usage), I contended that it would still be better for the design to speak to current mental models, especially those models which were sourced by the client.
Reconciling the receipt page, usage odometer and statements page required a high cognitive workload.
Within Scenario 3 users have had their app for a while and have presumably managed their data usage by receiving data usage notifications, setting data limits, looking at rewards and planning a number of other service-related functions. Now they have received their first phone bill. The prototype explained their bill with 3 screens: the main data usage odometer, statements, and receipts. While each of these screens may serve separate purposes with their own scenarios, I explored the scenario where one would want to see if all three screens "add up." In other words, do all three screens tell a narrative that explains the final bill?
It was essential that each screen told the "final bill" story in a different way, allowing the user to focus on different purchases based on the type of screen. However, what happened if one screen indicated a different total than another screen? Would users notice? If they did, was there a logical explanation for a different total? Would users depend on any one of the screens to tell the same story for the bottom line? If a user viewed one screen and concluded that it was the screen with all the information for the final bill, when in fact that screen did not have the final bill's total, then users would be surprised by the numbers on their final bill and trust would be violated.
Not only was it difficult to practically make sure all the numbers on the three different screens added up, but it was also impossible to get the same number. The client's prototype did not intend for the 3 screens to add up to the same number or be reconciled against each other. My hypothesis not only presumes that these pages should be reconcilable, but that they should also add up to the same number. Otherwise, screens containing dollar amounts, without all the dollar amounts, should be clearly marked: "This is not your total bill."
I conducted user tests based on the client's target market: millennials. The observation room and lab were set in an office building in downtown Chicago in early 2017. The setup consisted of 3 cameras, a two-way mirror, 3 desktop computers (two for the moderator and one for the participant) and a smartphone (OS varied based on the participant's comfort). Projections of the users' interactions were displayed in the observation room, as well as each participant's face and hand movements (for mobile). One of the computers in the lab was for the moderator and allowed communication from me, acting Assessor.
Mixed Methodology - A/B Testing
The sessions consisted of a series of semi-structured interviews and controlled usability tests on key tasks. I interviewed 65 users in 9 days (7-10 participants a day*), and a courtroom-trained transcriptionist took notes in an Excel sheet structured to parallel and capture the mod guide's intended data points. Sessions lasted approximately 1 hour each with 30-minute breaks (making for 10-12 hour days).
Fun Fact: Over the years, it's been my experience that "no show" participants are very common, so getting ahead of them has become a personal best practice: on the first day of sessions, accept overfill participants (test more participants than you need for that day's quota). In this case, we moderated 8 participants the first two days, creating 13.5-hour days for myself and the notetaker, which included the 1 hour pre & post lab setting [insert bicep contraction emoji].
Now, this is not best practice, which is why we hired a professional transcriptionist, who could type for this length of time. As for myself, my clients approached me knowing a unique professional skill about myself. You might think that what I'm about to say is the most self-aggrandizing bullshit, but it's true: I have an above average tolerance for in-person moderation. This is most likely due to a set of compounding variables: [8 younger siblings] x [Political Science major] x [2.5 years of intercollegiate speech & debate forensics competition] x [3 years of junior Marathon Conversation & Events]
Anyway, each participant was paid $60 for their time and was assured that their performance was not the basis of their compensation, that they were free to leave at any time, and that their names would be kept confidential from the client and reports.
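The 13.5-hour figure for the overfilled days can be checked with a few lines of arithmetic. This is a minimal sketch, assuming 8 one-hour sessions in a day, a 30-minute break between consecutive sessions (none after the last), and one hour each for pre- and post-session lab setup. The function and parameter names are hypothetical.

```python
def lab_day_hours(participants, session_h=1.0, break_h=0.5,
                  setup_h=1.0, teardown_h=1.0):
    """Length of one lab day in hours.

    Assumes one break between consecutive sessions (none after the
    last one) and separate setup and teardown blocks. Both are
    assumptions, not details confirmed by the study.
    """
    if participants < 1:
        raise ValueError("need at least one participant")
    sessions = participants * session_h
    breaks = (participants - 1) * break_h
    return setup_h + sessions + breaks + teardown_h


# 8 participants: 1 + 8 + 3.5 + 1 = 13.5 hours
print(lab_day_hours(8))
```

Under the same assumptions, dropping the setup and teardown hours for a 7-participant day gives 10 hours, the low end of the 10-12 hour range mentioned earlier.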
The point of the experimental group: testing the competitor UI against affective responses
As a corollary to this challenge, a second stimulus design conveyed a friendly and encouraging tone to mitigate negative feelings from any adverse interactions. More specifically, the prototype leveraged playful copy, bubbly animations for micro-interactions and a pink, black and white color palette.
Writing A Discussion Guide (a.k.a Mod Guide)
Writing a discussion guide means overcoming two challenges: time constraints and hypothesis testing. With only an hour for each participant, only a limited number of questions and impromptu investigations can be covered.
Each participant was told how the data plan works from a script, which they were asked to read aloud. The copy from the script was meant to serve as knowledge they would have if commercials or the website had explained the plan. The script was then given to the participant for their reference at any time. The main points of the copy included:
1. no brick and mortar stores
2. no planned data purchases
3. optional data limit setting from the app
4. unlimited data as you use it
Then the participants were asked to answer and perform a series of questions intertwined with tasks. They were also asked to think aloud whenever they were performing a task. The main prompts and questions included:
[Prompt user to pick out a phone, configure it and arrive at the shopping cart. Record click path, interactions and comments.]
Did you expect a credit check during this process?
What were your thoughts on the configuration options? I noticed that you did X. Did you see this choice?
Open the app and go through it as you would at home. [Record click path, interactions and comments.]
If you had to pay your entire bill today, what is the total amount?
Describe the main screen.
What do you like or dislike?
Is this your final bill?
Go to statements.
How do you feel about the statement display?
How would you explain the charges on your statement?
Open your receipt.
What is your final bill?
How would you explain your data plan to a friend or family member?
How much data have you used?
How many days are left in your billing cycle?
What do you expect to pay when your bill is due?
What features do you like or dislike most about the data plan?
From the words in this deck (see image), what words would describe this app?
What would you say about the tone and color of the app?
How do you think we can improve the app or any of the screens you saw today?
Whiteboarding: Getting Stakeholder Buy-In
After each user testing session, teams of designers, developers and stakeholders gathered in a room and whiteboarded what we saw. I led the discussion and offered our observations as well. This process helped us gain a type of buy-in from our clients and helped us further understand what mattered most to them. The most important benefit of this was for the final report: when results and conclusions were delivered, no one was surprised by our findings.
There were many issues with the app and website after the user testing sessions were complete. Some of them were unanticipated. This was good news since it proved the power of usability testing. After the sessions, the user data was analyzed with three key methods to model predicted pain points: OLS regression, averages, and qualitative statements. While qualitative insights could be reached immediately after the sessions, the mathematical tools could help us make generalizable statements about the population, i.e. estimate what could happen in the real world without any changes to the design.
Below are the qualitative insights that pertain to the hypotheses mentioned above. Other insights were recorded as either quotes or qualitative statements (see below). The qualitative statements helped us focus our efforts on where to perform deeper research dives. Not all the qualitative statements were analyzed with econometric models, since most statements delivered enough understanding by themselves.
To avoid illegitimate inferences from a small sample size (n = 35), the words "all," "some," "most" and "few" are used instead of fractions (e.g. 3 of 35, 27 of 35). This way recipients of the report can focus on the qualitative significance instead of the quantitative, which is reserved for the next phase.
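The reporting convention above can be sketched as a small helper that turns a raw count into one of the quantifier words. The cutoffs used here (below 20% for "few", above 50% for "most") are hypothetical; the report does not state which thresholds were applied, and the function name is made up for illustration.

```python
def quantifier(count, n, few_below=0.2, most_above=0.5):
    """Map 'count of n participants' to a qualitative word.

    The few_below and most_above cutoffs are illustrative
    assumptions, not thresholds taken from the study.
    """
    if n <= 0 or not 0 <= count <= n:
        raise ValueError("count must be between 0 and n, with n > 0")
    if count == 0:
        return "no"
    if count == n:
        return "all"
    share = count / n
    if share > most_above:
        return "most"
    if share < few_below:
        return "few"
    return "some"


# The two example fractions mentioned in the text
print(quantifier(3, 35))   # few
print(quantifier(27, 35))  # most
```

Fixing the cutoffs before synthesis keeps the wording consistent across statements and across researchers.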
Key Qualitative Statements
All users were able to complete the sales funnel with zero assistance.
Some users did not expect a credit check during the sales funnel.
Most users did not understand all of the configuration options. Some users did not see the options.
Most users were able to reach all pages of the app. Some users did not understand and verbalized confusion for some features.
Most users did not know the correct final bill total.
All users described the main screen with positive attributes.
Most users liked the color and tone of the app design.
Most users pointed to the main screen as the place where they would expect to have all the information about their final bill.
All users liked the statements screen and specifically verbalized appreciation for the expandable panels as a way to find more details about a specific charge.
How would you explain the charges on your statement.
Most users tried to add all the charges up from the receipt screen.
Most users expected receipt total to match the main screen.
Most users failed to communicate the auto-spend function of unlimited data.
Some users had difficulty reconciling how much data they spent with the incorrect belief that the main screen gave information about how much data they had left.
All users understood how many days they had left in their billing cycle.
Most users did not calculate their phone installments and extra charges into their final bill.
Most users did not like that there were three different screens in which to calculate monthly bills.
After I distilled the data into the key qualitative statements (above), I began to organize answers, paths, behaviors and interactions into variables. I then conducted online user testing sessions with 4,634 participants using UserTesting.com.
The sessions were modeled in a similar fashion to the live sessions. Below is the econometric model that was applied to the dataset to see whether similar pain points were observed, which helps me predict how severe the causal relationship is between design elements/interactions and any given pain point. The formula is called Ordinary Least Squares (OLS) Multivariate Regression. This is the most common mathematical model for economists and social science researchers.
[Figure: annotated OLS formula. The expanded equation is something SPSS & STATA do for you.]
Formula annotations: mean user behavior on interaction X; each user behavior on interaction X; observed pain point minus mean pain point; percentage of times pain point Y is associated with interaction/behavior X; squared error for each variable.
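The annotations above read like labels on the textbook single-predictor OLS slope estimator. As a hedged reconstruction (the original figure is not reproduced here, so the symbol assignments are assumptions), that estimator can be written as:

```latex
\hat{\beta}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})\,(y_i - \bar{y})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^{2}}
```

Here x_i would be each user's behavior on interaction X, \bar{x} the mean behavior on X, y_i - \bar{y} the observed pain point minus the mean pain point, the denominator the sum of squared deviations, and \hat{\beta} the estimated association between interaction X and pain point Y.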
This formula allows one to control for many other variables like age, race, education, gender and income. Since most of our demographic is homogeneous on many of these variables, and some were theoretically irrelevant to our goals, we controlled for a unique demographic score called the Millennial Questionnaire Score (MQS). Our third-party recruiters were asked to recruit an equal number of participants who scored high, medium and low on the questionnaire. The questionnaire was meant to gauge how sensitive users were to data constraints relative to pricing plans. The variable's score is simply added to the end of the model, which in effect standardizes the prediction against data-to-price sensitivity. This way we could be sure, for instance, that a pain point was not solely due to a user's sensitivity to how much a plan costs.
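To make the idea of "adding a control to the end of the model" concrete, here is a minimal sketch in plain Python. It is not the STATA model used in the study: the data is made up, the column names (group_b, mqs) and the fit_ols helper are hypothetical, and the pain point is coded 0/1 so the coefficient reads as a change in the share of users reporting it.

```python
def fit_ols(rows, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by solving the normal
    equations (X'X) b = X'y with Gauss-Jordan elimination."""
    X = [[1.0] + list(r) for r in rows]  # prepend intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up data: (group_b, mqs) with mqs coded 0 = low, 1 = medium, 2 = high.
# Each group has the same mix of MQS scores, mirroring balanced recruiting.
rows = [(0, 0), (0, 0), (0, 1), (0, 1), (0, 2), (0, 2),
        (1, 0), (1, 0), (1, 1), (1, 1), (1, 2), (1, 2)]
pain = [0, 0, 0, 1, 0, 1,   # users A: 2 of 6 report the pain point
        0, 1, 1, 1, 1, 1]   # users B: 5 of 6 report the pain point

intercept, b_group, b_mqs = fit_ols(rows, pain)
print(round(b_group, 3), round(b_mqs, 3))
```

For this made-up data, the coefficient on group_b is the difference in pain point rates between users B and users A while holding the MQS score constant, which is the kind of statement the reported coefficients make.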
Here I will discuss findings that can support my hypotheses. For tables with the full models and descriptive statistics please contact me.
A shopping cart that combined the purchase price of the phone and the service was not clear.
For the user testing session we created an A/B testing scenario where half the users were given 2 side-by-side shopping carts (users A), and the other half were given the current model (users B). Users B were associated with a 0.64 increase in reporting a pain point in understanding the cart, on average, controlling for all other variables in the model. This was significant at the 0.05 level. The F-value of 35.10 indicates that the model's variables are jointly statistically significant at the p < 0.001 level.
The design element that indicated the amount of data used induced negative transfer from similar designs with unrelated purposes.
One of the key findings from the model showed that participants who already had the carrier's plan and used the app were associated with a 0.416 increase in reporting a pain point with the "data used" design element on the home screen of the app, relative to respondents who were not customers with the app, on average, controlling for all other variables in the model. This variable (current versus new users) had an F-value of 16.93, which indicates that it was statistically significant at the p < 0.001 level.
Reconciling the receipt page, usage odometer and statements page required a high cognitive workload.
For the online user testing session we created an A/B testing scenario where half the users used an app with only one statements page that also acted as the main screen and their receipt (users A). The other half were given the current prototype (users B). All users were asked to report what their final bill would be and what their current bill was. Users B were associated with a 0.768 increase in reporting the wrong answer for both questions, relative to users A, on average, controlling for all other variables in the model, with a p-value < 0.01. This variable (users A & B) had an F-value of 30.48, which indicates that it was statistically significant at the p < 0.001 level.
After the online testing and live sessions, my partner and I wrote down the problem areas of the design and possible solutions. We used Post-it notes to organize areas. Initially we wrote our ideas down separately and then combined them in a final meeting. In the final meeting I suggested two rules: 1. Defer Judgement 2. Focus on Quantity. This way we could quickly understand each other's conceptual landscape. This phase of the ideation process is called "flaring out."
After we "flared out" we synthesized our ideas by ruling some as more or less important. Sometimes we married two ideas, and took out redundant ones. This phase is the "selective" phase. In order to help us organize the ideas we selected, we decided to create a red routes matrix for behaviors and interactions that were most and least observed.
expected home screen to show final bill
set data limit
export a receipt
tapped charge icons
checked family usage
calculated receipt total
tapped "data used" graphic
hard pulldown on homescreen
check extra charges
calculated statements total
confused data used with data remaining
calculated homescreen total
My research was placed into a PowerPoint presentation along with my prototypes. The prototypes were simply a means to demonstrate one possibility that the data supports. While many UX designers can create these as well, a UX researcher's prototypes are simply given as a visual aid to understand the researcher's analysis, ideation sessions and conclusions.
Re-designed Shopping Cart page for Desktop Sales Funnel
Re-designed "Data Used" Display
Re-designed Statements page