CREST COW 36 – APP STORE ANALYSIS, 27–28/10/2014
More details and the 36th CREST COW can be found here: App Store Analysis Program and attendees.
The whole 2 day workshop was live blogged by Dr Sue Black, you can read through the blog below.
Details of views for the workshop from all over the world are shown below by country.
App stores provide a rich source of information for software engineering research: it is, of course, possible to extract technical information as with other software systems, but we can also readily obtain information relating to customer reviews, pricing and popularity. Never before in the history of software engineering has so much information been available concerning so many, and so disparate, facets of software systems. Increasingly, the users of apps and app stores are relying on the software they provide for highly nontrivial activities, making app store analysis a pressing concern. This workshop will bring together software engineers to discuss and develop the emerging research agenda in App Store Analysis.
Programming languages, software engineering, computational musicology are all topics researched at CREST.
The CREST COW Twitter account is @CRESTCOW and the hashtag is #UCLCOW36
We are a new community; some may say that this is not software engineering. There is a lot of resistance to this topic in software engineering, but looking around the room there are a lot of smart people here 😉
In the late 90s, people thought we shouldn’t analyse web apps, in 1982 people said the same about Micros…
Everyone now introduces themselves to the group, there are about 35 people attending from around the UK and the world. More details at the bottom of the CREST COW 36 webpage
Our first talk this morning:
Studying and Enabling Reuse in Android Apps
Denys Poshyvanyk, Computer Science Department, The College of William and Mary, USA
Founded in 1693, the College of William & Mary is the second oldest institution of higher education in the US.
We have 1.3 million apps, real and fake markets, and thousands of open source apps; it's a fast-growing economy with lots of people and companies making lots of money.
This talk concentrates on one issue: apps are built using APIs and there are some specific issues related to that. There are issues related to the maintenance of APIs.
(Sorry our photos are so dark :()
Research Q: APIs evolve rapidly, does instability of APIs affect the success of Android apps?
5848 apps analysed, belonging to 30 domain categories and using 68 third-party libraries; only apps with repositories were included, because all the changes and all the bugs were needed.
85k dev commits, 39k bug fixes, 1232 contributing devs. Used average user ratings.
Analysed the distribution of ratings for free vs paid apps. Worked only with free apps.
Plotted bug fixes vs ratings
Looked at what problems developers experience related to app stores, apps, reviews…
1221 app developers: extracted all the developers' email addresses and emailed them an online survey; had 45 responses from professional Android developers.
Became clear that unstable APIs DO make a significant impact on apps.
Bugs in apps correlate strongly with poor ratings, bugs often related to the instability of the APIs used.
More details of this research can be found here:
Q: How is this useful for devs working in industry?
A: Research can warn devs of APIs which may cause problems helping them to know which ones to avoid.
Next talk is:
Mark Harman, CREST Centre, SSE Group, Department of Computer Science, UCL
A feature is a claimed functionality offered by an app
Features themselves can have price, rating and popularity, we have evidence that this data is meaningful.
If these features can be found, how do they migrate? Eg find location may migrate from one category into another.
Research Q: Are there particular migratory features that are more popular than others?
Research Q: Which categories are more likely to have migration?
Migratory behaviour has been formalised using set theory: features are split into migratory and non-migratory, with weak and strong migration, and the birth and death of a feature.
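The talk's exact set-theoretic definitions aren't reproduced in this blog, but a loose sketch of migration, a feature appearing in a category it was previously absent from, might look like this (all names and snapshot shapes are assumptions for illustration):

```python
def categories_with(feature, snapshot):
    """Categories whose claimed-feature set contains the given feature."""
    return {cat for cat, feats in snapshot.items() if feature in feats}

def migrated(feature, earlier, later):
    """True if the feature now appears in at least one category it was
    absent from in the earlier snapshot (a loose reading of 'migration')."""
    return bool(categories_with(feature, later) - categories_with(feature, earlier))

# Hypothetical snapshots: "find location" spreads from Travel into Social.
week3 = {"Travel": {"find location"}, "Social": {"chat"}}
week36 = {"Travel": {"find location"}, "Social": {"chat", "find location"}}
```

Here `migrated("find location", week3, week36)` is true, while `"chat"` stays put.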
Since 2009 UCL Appstore analysis group have been taking snapshots of app stores.
Dataset – Blackberry App World, weeks 3 and 36 in 2011, of 1324 features, only 32 migrated.
We expected the features that migrated to have value, actually they were cheaper and less popular.
The non-migratory (intransitive) features carried higher monetary value; they were more expensive.
Research Q: is there any correlation between price, rating and popularity?
There is a -ve correlation between price and rating i.e. as cost goes up rating goes down.
Note: the price of a feature uses the median price of the apps which use that feature.
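That definition is easy to compute; a minimal stdlib-only sketch (the data shapes and example apps are hypothetical):

```python
from statistics import median

def feature_price(feature, app_prices, app_features):
    """Price of a feature: the median price of the apps that claim it."""
    prices = [app_prices[app] for app, feats in app_features.items()
              if feature in feats]
    return median(prices)

# Hypothetical apps and prices for illustration.
app_prices = {"a": 0.00, "b": 2.99, "c": 0.99}
app_features = {"a": {"gps"}, "b": {"gps", "sync"}, "c": {"sync"}}
```

For example, "sync" is claimed by apps b and c, so its feature price is the median of 2.99 and 0.99.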
Unicorns, Software Design & Mobile Software Analytics
Michele Lanza, Faculty of Informatics, University of Lugano, Switzerland
Describing a visual software analytics platform for mobile development called SAMOA
Mined thousands of apps for analysis, aim to look deep into systems, cutting away uninteresting apps to put together a large dataset of apps to analyse. Popular open source apps.
Put together a catalogue of insights from an app design point of view.
- design principles are essentially absent, everything is hacked together.
- basic guidelines are ignored
- code quality is not a concern
- Time to market is paramount
- Small core domain so not much to “design”
- Extensive use of APIs
- Reuse par excellence, the software engineering dream?
- The core code doesn't change much; code written after the core changes more
Discussion: mobile apps are a young industry, and that is why apps are basically hacked together. Think back to web apps early on: market share is paramount at the beginning, and newbies are writing the code. Doesn't that explain why there is no "design" and apps are hacked? This will change later on.
Where Does My Sensitive Data Go? – Mining Apps for Abnormal Information Flow
Andreas Zeller, Computer Science, Saarland University, Germany
Looking at information flow within app code is not at all straightforward:
- APIs easy to grep and process.
- Code hard to analyse statically, multiple components, scale, many challenges.
- Code may also be adversarial: obfuscation, protection, etc.
- Need test generators to assess and to instrument the binary.
Static taint analysis was carried out, with data flow analysis on the Twitter app; it took months to weed out the bugs. We are the first ever to have done this.
Used outlier detection: train it and figure out how much of an outlier an app is; this classifies malware and gives an indicator of benign vs malign.
15338 malware samples classified using MUDFLOW: mining apps for sensitive data
Current malware detectors depend only on what has previously been identified as malware. Current research is 75% accurate.
The team have mined apps, analysed the code, detected outliers and correctly classified the outliers. More details on the University of Saarland app mining portal.
Mining User Reviews in the App Store
Walid Maalej, University of Hamburg, Hamburg, Germany
Conducted a user review study; questions were about usage, content and impact.
Research data from the iOS app store: 25 apps from each of 22 categories, 7/2008–9/2012, a total of 1100 apps, half of them free, 1 million+ entries from 1 million users.
Highest number of reviews in games and social networks; lowest in medical, navigation, travel and catalogs.
Some apps like Facebook get 4000 reviews per day; the average is 22 reviews per day.
New releases lead to a feedback storm.
Insights: Users submit reviews frequently, most (77%) are less than 140 chars.
Approx 78% are directly rating related, 33% user experience, 31% requirements, 13% community focused. One third include information useful for developers.
Used feature extraction and sentiment analysis on app store reviews
Q: Do you look at emoticons in sentiment analysis?
A: Yes, we use everything in the sentence, but not for the feature analysis as it is not relevant.
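As a toy illustration of lexicon-based sentiment scoring over whole sentences, emoticons included (the study's actual tooling is not specified here, and real lexicons are far larger):

```python
# Tiny illustrative lexicon; real sentiment tools use thousands of entries.
POSITIVE = {"great", "love", "awesome", "good", ":)", ":-)"}
NEGATIVE = {"crash", "crashes", "hate", "bad", ":(", ":-("}

def sentiment(review: str) -> int:
    """Positive minus negative token count: >0 positive, <0 negative."""
    tokens = review.lower().split()
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
```

So `sentiment("Love this app :)")` scores positive, while `sentiment("App crashes all the time :(")` scores negative, with the emoticon contributing to the score in both cases.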
Sentiment scores and extraction accuracy for several apps including Angry Birds, Dropbox and Pinterest
Discussion around how to deal with sarcasm, is it picked up by sentiment analysis?
More details and results of this research at Professor Maalej’s publications page.
Analysis of reviews from the Google Play Store
Rachel Harrison, Department of Computing and Communication Technologies, Oxford Brookes University, UK
Classification system developed for the analysis of reviews from the Google Play Store. 169 apps, 3279 reviews, 4.27 average rating, 327 average ratings per app, £1.92 average price.
Results 1: how do users rate apps?
Lower ratings usually associated with concerns around customer support
Mid range ratings usually linked to bug reports
Higher ratings usually linked to requirements requests
Results 2: How do reviews vary with price?
Reviews of the cheapest apps linked to requirements; higher prices linked to bug reporting
Price and money feedback positively correlated
Results 3: Distribution of reviews across classes of codes
Users tend to provide positive feedback
Reviews used for expressing reqs and reporting bugs
Users least concerned with issues related to versioning
Users write mostly about functionality + missing logic
Results 5: Commonly occurring pairs in reviews
A good GUI makes people happy
Good functionality equates to good value for money
Users are always looking for improvements in apps
Now working on strategies to provide useful information for mobile app developers and look at other app stores.
Extracting Signal from the Noise of User Reviews
Leonard Hoon, Swinburne University of Technology, Australia
Can we classify reviews?
Dataset: 22 categories, 8.7 million reviews, 5.25 million users, 17330 apps.
80% of reviews are roughly the size of a tweet
Different categories have different size of reviews
2* reviews have the longest reviews, 5* the shortest
To get more useful conclusions, they started timeboxing the reviews related to releases.
Short reviews are “useless”, in terms of finding any more information, as they relate to the rating.
Health and fitness skews the 2* category and health and fitness reviews tend to be longer
For more details check out Leonard’s Google Scholar page
Feature request analysis
William Martin, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
There are 1 million apps on the iOS app store; Facebook has 600k reviews
Blackberry: 130k apps; Blackberry Messenger has 1.2 million reviews
Google Play: 1.3 million apps; Facebook has 22 million reviews
Windows Store: 300k apps; YouTube has 44k reviews
Study on the Blackberry app store.
More information from William at his homepage
Predicting Price and Rating
Federica Sarro, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
Mining app stores to support developers in estimating price and rating
How many people here have released an app? About one third of the audience, 10-12 people.
Federica asks how did you decide on price?
They were all free apps 😉
In 2012 more than 60% of apps in the app store have never been downloaded.
Releasing an app is an investment of time, energy, etc.; choosing the right price is part of the success of an app, but it is not easy.
The research found that there is no correlation between price and popularity/rating in non-free apps.
There are lots of articles online about how to price your app.
Federica suggests that you look for apps closest to your own and then use this to determine a starting price point. But, there may be a massive difference between the highest and lowest.
The goal of this research is to mine app stores to find an approach which could be recommended to developers when aiming to price their app.
Using case-based reasoning (an AI technique), the most similar apps are used to find the most appropriate price.
Up to 15 analogies are used (worst, best, mean, etc.).
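A minimal sketch of the analogy-based idea, using Jaccard similarity over claimed feature sets and a mean over the analogies (the research's actual similarity measure and aggregation may differ; the catalogue data is hypothetical):

```python
def jaccard(a, b):
    """Similarity of two feature sets: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def estimate_price(new_features, catalogue, k=3):
    """Case-based reasoning: mean price of the k most similar existing apps."""
    ranked = sorted(catalogue,
                    key=lambda app: jaccard(new_features, app["features"]),
                    reverse=True)
    analogies = ranked[:k]
    return sum(app["price"] for app in analogies) / len(analogies)

# Hypothetical catalogue of existing apps.
catalogue = [
    {"features": {"gps", "maps"}, "price": 1.99},
    {"features": {"gps", "maps", "offline"}, "price": 2.99},
    {"features": {"chat"}, "price": 0.00},
]
```

For a new app claiming `{"gps", "maps"}` with k=2, the two map apps are the closest analogies, so the estimate is the mean of their prices.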
Evaluation was based on a framework recently proposed to compare prediction systems using 10 fold cross validation.
A baseline comparison was carried out to see if it could outperform random guessing.
A state of the art comparison was carried out with respect to price and rating
Federica’s research results show that you CAN predict the price and rating of the app based on the features that are included.
Follow on research will be looking at comparing across platforms, taking into account other characteristics and more.
Prof Mark Harman: “Thank you everyone for speaking and attending today, see you at 10am tomorrow morning”
WELCOME TO CREST COW DAY 2
Our first speaker today:
Mobile Apps: Research Challenges
Ahmed E. Hassan, School of Computing, Queen’s University, Canada
Research challenges in mobile apps
Mobile apps are much smaller than frequently studied applications
There is a heavy reliance on platform, high defect concentration, with very limited use of processes, repositories
Comment: many mobile apps are written by a single person, which limits the use of repositories
Early app days: no marketplace, interact with operators and manufacturers, had to be a big fish, no APIs
Midlife crisis: Java MIDP shows up, limited marketplace, closer interaction with users
Today's app market: direct contact with users, different revenue models, developer-friendly APIs, many small dev shops
Many of the top 200 apps come from companies with fewer than 10 devs
Now there are many more app stores/markets
Lots of the apps make revenue through ads, so it is good to put your app on every store
The speed of growth is enormous: Facebook took 9 months to reach 1 million users; now it can happen in 3 days
30-35% apps are games
60% of revenue is from games
To improve an app you need to:
- Identify patterns across crashes
- Deal with scale
- Understand what annoys users
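The first of those steps, identifying patterns across crashes, is commonly done by bucketing reports on a signature such as the top stack frame; a toy sketch (the report shape and frame names are assumptions):

```python
from collections import Counter

def crash_clusters(reports):
    """Bucket crash reports by their top stack frame; the biggest buckets
    are the clusters of common crashes worth fixing first."""
    return Counter(report["frames"][0] for report in reports)

# Hypothetical crash reports.
reports = [
    {"frames": ["NetworkClient.send", "App.sync"]},
    {"frames": ["NetworkClient.send", "App.upload"]},
    {"frames": ["UI.render", "App.start"]},
]
```

Here the `NetworkClient.send` bucket dominates, so that crash pattern would be triaged first.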
Ahmed has a tool, Mobapp, which visualises users and crashes, showing clusters of common crashes.
6300 reviews were read and sentiment analysis carried out on them
What do mobile users complain about?
11% of complaints mention an update
Main complaints about mobile apps from users are:
- feature removal
- hidden cost
- privacy and ethics
Research looked at which apps are available in each/every store
Comparison between Blackberry and Android apps:
- Android apps smaller
- Blackberry apps followed much more traditional software engineering methods
Looking at platform dependence: platform dependence can help explain software defects
Examining reuse in the market
How much do mobile apps use inheritance?
Ratings: 33.4% of apps have only one person rating them
The top 200 highest grossing apps generate 60-80% of total market revenue
Monetization: 75% of apps are free to download
The mobile monetization global landscape is massive and elaborate, but all depends on ad libraries. There can be as many as 28 ad libraries in any one app. 65% of apps have only 1 ad library, 17% have 2 ad libraries.
Why do some apps have so many libraries? Because the fill rate is less than 18%
The number of ad libraries doesn't impact the star rating, but using the wrong libraries can.
Ad maintenance: 14% of releases are just to update ad code. It is a serious software engineering challenge.
30% of apps are using dead libraries
Analytics (and research) are needed for store sales, ad revenue networks, downloads, reviews and field crashes.
Analytics is a massively growing industry
Prioritizing The Devices To Test Your App On: A Case Study Of Android Game Apps [PDF]
Hammad Khalid, Meiyappan Nagappan, Emad Shihab, Ahmed E. Hassan. In Proceedings of the 22nd ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE 2014), Hong Kong, China, Nov 2014.
What Do Mobile App Users Complain About? A Study on Free iOS Apps [PDF]
Hammad Khalid, Emad Shihab, Meiyappan Nagappan, Ahmed E. Hassan. Accepted in IEEE Software, 2014.
Some of the things that we looked at:
– Is it easier to develop for BB7 or Android?
– How hard is it to build apps vs regular server/desktop apps?
– What do users complain about for Android vs iOS apps? Are there major differences? Do these complaints in general vary compared to, say, traditional non-mobile apps?
– How do app markets rank apps, and does this ranking system encourage app developers to improve their apps?
– How do app developers make money by including ads, and what is the impact of ads on an app's ranking and release frequency?
– How much cloning is there in the Android market, i.e. people re-publishing the same or very similar apps?
BREAK FOR COFFEE
…and we are back 🙂
Mining patterns for release cycle time in app stores
Maleknaz Nayebi, Electrical and Computer Engineering Department, University of Calgary, Canada
Crawled 9703 apps: 2900 had only one release, 1206 had 2 releases.
6000 apps mined
There is a relation between the number of releases and the number of installs in terms of release cycle time
Pattern recognition used to determine release cycle time for 6000 apps
Using Rough set analysis looking at release cycle time
Results: More feedback on short releases, but not necessarily better reviews
CHABADA: Checking app behavior against app descriptions
Alessandra Gorla, Computer Science, Saarland University, Germany
This work was previously presented at ICSE
App stores are like candy stores: lots of data etc., great for researchers
Also single entry point, great for users
Users have to be careful which apps they download, as there are lots of clones of popular apps with possibly malicious effects, e.g. Angry Birds clones
There are also paid applications pretending to be real apps that do nothing
Mismatch between description and real application
So, how do we define malicious? What is or isn't a malicious app?
It depends on the context, easier to think about what we consider to be normal behaviour and then find anomalies.
Technique developed called CHABADA
- App collection: downloaded 32k apps from Google Play in 2013, got the descriptions and applied stemming (keywords)
- Topics: used LDA, which gives frequently occurring terms plus the probability of belonging to topic sets, then stemmed the words related to topics
- Clusters: used the k-means algorithm and the probability of features belonging to topics (apps grouped together according to their natural-language descriptions)
- APIs: analysis to extract all Android API calls; there are too many in the framework, so they focused on a subset of the APIs, those governed by particular permissions. This gives the main feature of the cluster, e.g. "travel", and lets them check whether there is any malicious intent
- Outliers: anomalies detected through a one-class support vector machine (OC-SVM), then applications ranked according to their anomalies
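As a stdlib-only stand-in for the OC-SVM step (this simple rarity score is a substitution for illustration, not CHABADA's actual method; cluster and API names are assumed), one can score how unusual each app's sensitive-API usage is within its description cluster:

```python
from collections import Counter

def anomaly_scores(cluster_apis):
    """Score how unusual each app's sensitive-API usage is within its
    cluster: the mean rarity of its API calls, where rarity is
    1 - (fraction of cluster apps using that API)."""
    n = len(cluster_apis)
    freq = Counter(api for apis in cluster_apis.values() for api in set(apis))
    return {app: sum(1 - freq[api] / n for api in set(apis)) / max(len(set(apis)), 1)
            for app, apis in cluster_apis.items()}

# Hypothetical "travel" cluster: one app also sends SMS.
travel = {
    "maps1": {"getLocation", "openConnection"},
    "maps2": {"getLocation", "openConnection"},
    "rogue": {"getLocation", "sendTextMessage"},
}
```

The rogue app's rarely-used `sendTextMessage` call pushes its score above the two ordinary map apps, flagging it for inspection.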
So does CHABADA effectively identify anomalous Android apps?
For 26% of the apps under study there were some anomalies; there was a large use of covert behaviour, mainly ad libraries, and some had dubious behaviour, e.g. the Yahoo mail app was sending text messages without having permission to do so.
E.g. SoundCloud had uncommon behaviour, and there were also some benign outliers; most poker game apps were also spyware. Only one poker game was not invasive.
Classified apps into
- Malicious vs benign
- Predicted as malicious vs predicted as benign
Results: malware or malicious apps can be detected in an efficient way; malware can't be detected without the clustering analysis.
Why not use the Android store categories? Doesn’t give as good results, categories not as exact.
More results and information on the CHABADA page
Investigating Country Differences in Mobile App User Behavior and Challenges for Software Engineering
Soo Ling Lim, Department of Computer Science, UCL, UK
App store analysis is useful to look at user behaviour, but want to know why users do what they do and what they do after leaving the appstore.
Hypothesis: differences exist in mobile app usage behaviour between countries
What are those differences?
Currently no studies in this area, but apps are sold worldwide, so this information is useful to discover
Research Qs: User adoption, their needs, rationale for choice of app, international differences in behaviour
Targeted the top 15 countries by GDP; conducted an online survey, translated into 9 languages: Spanish, German, French, Italian, Portuguese, Russian, Mandarin, Japanese, Korean.
31 questions, app usage, demographics, big 5 personality traits.
30k respondents, 30% response rate; screened out people who don't use apps and incomplete responses
4824 responses: 49% female, 51% male
15% didn't know which app store they were using, possibly because of app stores changing names
On average, users download 2–5 apps per month
Most popular ways to find an app: search by keyword 43%, browse randomly 37%
App download trigger: 1st – entertainment
Highest choice factors: price, features, description, reviews by others
52% of people don't rate apps
Payment: 56% don't pay, 20% only buy if there are no free apps, 17% pay for additional features
Why stop using an app? 44% don't need it anymore; better alternative, bored of the app, app crashes, doesn't have required features…
Differences between countries:
UK app users are 3x more likely to be influenced by price
Canada is similar to the UK, but 2x
Brazilians like social networking and talking to strangers
Italians don't like to pay for or rate apps
Australians don't like rating apps
Chinese users are 9x more likely to select the first app in a list
Challenges for market driven software engineering:
App store dependency
Packaging needs to be country appropriate, e.g. cuteness in Japan
Vast feature spaces: apps have fewer features and trends change fast, so finding the optimal feature set is hard
High quality expectation, different countries have different expectations
Price sensitivity: 57% don't pay for apps; e.g. WhatsApp was paid on iOS but free on Android
Ecosystem effect: a networked ecosystem within app stores; this didn't apply before app stores
The paper describing this research is
The relationship between Price, Popularity and Ratings for Apps and Claimed Features
Yuanyuan Zhang, CREST Centre, SSE Group, Department of Computer Science, UCL, UK
App stores provide a rich source of information on the technical aspects of apps
Data extracted from the app store was parsed to retrieve relevant information; feature patterns were identified using NLP and data mining, then the metrics were calculated for correlation analysis.
Features then extracted from description of apps
Metrics introduced which capture attributes of the features
Studied relationship between price rating and popularity.
Blackberry app world 1/9/11
19 categories of apps, 40k apps studied:
Majority of apps rated 3-5 stars
Very few apps have more than 40 features, most apps have less than 10 features
- No strong correlation between price and rating, and similarly none between price and popularity
- Strong linear correlation between rating and popularity
App feature questionnaire with 38 questions
- Inverse correlation between feature price and rating
- The higher the price the more features are claimed to be provided
Is there any difference between free apps and non-free apps?
- Free apps in general more popular with higher ratings
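Correlations like the rating and popularity link above are typically measured with Spearman rank correlation; a stdlib-only sketch (the exact statistic used in the study is not stated here, and tie handling is omitted for brevity):

```python
def ranks(xs):
    """Ordinal 1-based ranks; proper tie handling omitted for brevity."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = pos + 1.0
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

A perfectly monotone relationship (e.g. higher rating always going with higher popularity) gives +1, and a perfectly inverse one gives -1.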
Estimating demand from the sales and ranks data in Apple App Store
Nilam Kaushik, Management Science and Operations, UCL, UK
END DAY 2