COW 37


More details of the 36th CREST COW can be found here: App Store Analysis program and attendees.

The whole two-day workshop was live blogged by Dr Sue Black; you can read through the blog below.

Details of views for the workshop from all over the world are shown below by country.

Views of our CREST COW 36 workshop by country

App stores provide a rich source of information for software engineering research: it is, of course, possible to extract technical information, as with other software systems. However, we can also readily obtain information relating to customer reviews, pricing and popularity. Never before in the history of software engineering has so much information been available concerning so many, and so disparate, facets of software systems. Increasingly, the users of apps and app stores are relying on the software they provide for highly nontrivial activities, making app store analysis a pressing concern. This workshop will bring together software engineers to discuss and develop the emerging research agenda in App Store Analysis.


Professor Mark Harman welcomes everyone to the 36th CREST COW workshop


Programming languages, software engineering and computational musicology are all topics researched at CREST.

The CREST COW Twitter account is @CRESTCOW and the hashtag is #UCLCOW36

We are a new community; some may say that this is not software engineering. There is a lot of resistance to this topic in software engineering, but looking around the room there are a lot of smart people here 😉

In the late 90s, people thought we shouldn’t analyse web apps; in 1982 people said the same about Micros…

Everyone now introduces themselves to the group; there are about 35 people attending from around the UK and the world. More details at the bottom of the CREST COW 36 webpage.

Our first talk this morning:

Studying and Enabling Reuse in Android Apps

Denys Poshyvanyk, Computer Science Department, The College of William and Mary, USA

The College of William and Mary, founded in 1693, is the second oldest academic institution in the US.

We have 1.3 million apps, real and fake markets, and 1000s of open source apps; it’s a fast-growing economy with lots of people and companies making lots of money.

This talk concentrates on one issue: apps are built using APIs and there are some specific issues related to that. There are issues related to the maintenance of APIs.


(Sorry our photos are so dark :()

Research Q: APIs evolve rapidly, does instability of APIs affect the success of Android apps?

5848 apps analysed, belonging to 30 domain categories and using 68 third-party libraries; only apps with repositories were included, because the study needed all the changes and all the bugs.

85k dev commits, 39k bug fixes, 1232 devs contributing. Used average user ratings.

Analysed the distribution of ratings for free vs paid apps. Worked only with free apps.

Plotted bug fixes vs ratings


Looked at what problems developers experience related to app stores, apps, reviews…

1221 app developers: extracted all developers’ email addresses, emailed them an online survey, and had 45 responses from professional Android developers.

Became clear that unstable APIs DO make a significant impact on apps.


Bugs in apps correlate strongly with poor ratings, bugs often related to the instability of the APIs used.

More details of this research can be found here:

Studying and Enabling Reuse in Android Apps

Q: How is this useful for devs working in industry?

A: Research can warn devs of APIs which may cause problems helping them to know which ones to avoid.



 Next talk is:

Migration of claimed features through App stores

Mark Harman, CREST Centre, SSE Group, Department of Computer Science, UCL

A feature is a claimed functionality offered by an app

Features themselves can have price, rating and popularity, we have evidence that this data is meaningful.

If these features can be found, how do they migrate? Eg find location may migrate from one category into another.

Research Q: Are there particular migratory features that are more popular than others?

Research Q: Which categories are more likely to have migration?

Migratory behaviour has been formalised using set theory: features are split between migratory and non-migratory, weak and strong migration, and the birth and death of a feature.
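To make the distinction concrete, here is a minimal Python sketch of those set-theoretic definitions; the snapshot data, category names and function names are all invented for illustration and are not from the talk.

```python
# Hypothetical illustration of the migration definitions: a feature is
# (weakly) migratory if it appears in some new category at a later snapshot,
# and strongly migratory if it fully abandons its earlier categories.

def categories_of(feature, snapshot):
    """Return the set of categories claiming `feature` in a snapshot
    (a dict mapping category -> set of claimed features)."""
    return {cat for cat, feats in snapshot.items() if feature in feats}

def is_migratory(feature, snap_t1, snap_t2):
    """Weak migration: at t2 the feature occupies at least one category
    it did not occupy at t1."""
    return bool(categories_of(feature, snap_t2) - categories_of(feature, snap_t1))

def is_strongly_migratory(feature, snap_t1, snap_t2):
    """Strong migration: the feature's t2 categories are disjoint from its
    t1 categories (it has fully moved)."""
    before = categories_of(feature, snap_t1)
    after = categories_of(feature, snap_t2)
    return bool(after) and before.isdisjoint(after)

# Toy snapshots (think week 3 vs week 36 of a year)
week03 = {"Travel": {"find location"}, "Games": {"leaderboard"}}
week36 = {"Shopping": {"find location"}, "Games": {"leaderboard"}}
```

On this toy data, "find location" migrates (strongly) from Travel to Shopping, while "leaderboard" does not migrate at all.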


Since 2009 the UCL app store analysis group has been taking snapshots of app stores.

Dataset – Blackberry App World, weeks 3 and 36 in 2011: of 1324 features, only 32 migrated.


Main findings:

We expected the features that migrated to have value; actually they were cheaper and less popular.

The intransitive features carried higher monetary value and were more expensive.

Research Q: Is there any correlation between price, rating and popularity?

There is a negative correlation between price and rating, i.e. as price goes up, rating goes down.

Note: the price of a feature is the median price of the apps that claim that feature.
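The negative rank correlation reported above can be computed along these lines; this is a generic Spearman sketch with invented prices and ratings, not the study’s data or code.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def ranks(xs):
    """Rank positions of each value (ties ignored -- fine for a sketch)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Invented toy data where rating falls as price rises
prices = [0.99, 1.99, 2.99, 4.99, 9.99]
ratings = [4.6, 4.2, 3.9, 3.5, 2.8]
rho = spearman(prices, ratings)  # strongly negative on this toy data
```

In practice one would use `scipy.stats.spearmanr`, which also handles ties; the hand-rolled version just makes the rank-then-correlate idea visible.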




Unicorns, Software Design & Mobile Software Analytics

Michele Lanza, Faculty of Informatics, University of Lugano, Switzerland

 Describing a visual software analytics platform for mobile development called SAMOA

Mined thousands of apps for analysis, aim to look deep into systems, cutting away uninteresting apps to put together a large dataset of apps to analyse. Popular open source apps.

Put together a catalogue of insights from an app design point of view.



  1. design principles are essentially absent, everything is hacked together.
  2. basic guidelines are ignored
  3. code quality is not a concern


  1. Time to market is paramount
  2. Small core domain so not much to “design”
  3. Extensive use of APIs
  4. Reuse par excellence, the software engineering dream?
  5. The core code doesn’t change much; code written after the core changes more


Discussion: mobile apps are a young industry, which is why apps are basically hacked together. Think back to web apps early on: market share is paramount at the beginning and newbies are writing the code. Doesn’t that explain why there is no “design” and apps are hacked? This will change later on.




Where Does My Sensitive Data Go? – Mining Apps for Abnormal Information Flow

Andreas Zeller, Computer Science, Saarland University, Germany

Looking at information flow within app code is not at all straightforward:

  1. APIs easy to grep and process.
  2. Code hard to analyse statically, multiple components, scale, many challenges.
  3. Code may also be adverse, obfuscation, protection etc.
  4. Need test generators to assess and to instrument the binary.


Static taint analysis carried out. Data flow analysis carried out on Twitter app. Took months to weed out the bugs. We are the first ever to have done this.

Used outlier detection: train it, figure out how much of an outlier an app is, classify malware, then give an indicator of benign vs malign.

15338 malware samples classified using MUDFLOW: mining apps for sensitive data


Current malware detectors depend only on what has previously been identified as malware. Current research is 75% accurate.


The team have mined apps, analysed the code, detected outliers and correctly classified outliers. More details on the University of Saarland app mining portal.



Mining User Reviews in the App Store

Walid Maalej, University of Hamburg, Hamburg, Germany


Conducted a user review study; questions were about usage, content and impact.

Research data from the iOS App Store: 25 apps from each of 22 categories, 7/2008–9/2012, total 1100 apps, half free, 1 million+ entries, 1 million users.


Highest number of reviews in games and social networking; lowest in medical, navigation, travel and catalogs.

Some apps like Facebook get 4000 reviews per day, average 22 reviews per day.

New releases lead to a feedback storm.


Insights: Users submit reviews frequently, most (77%) are less than 140 chars.


Approx 78% are directly rating related, 33% user experience, 31% reqs, 13% community focused. One third include information useful for developers.


Used feature extraction and sentiment analysis on app store reviews


Q: Do you look at emoticons in sentiment analysis?

A: Yes, we use everything in the sentence, but not for the feature analysis as it is not relevant.

Sentiment scores and extraction accuracy for several apps including Angry Birds, Dropbox and Pinterest
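As a toy illustration of sentiment scoring on reviews: the word lists and reviews below are invented, and the talk’s system uses far richer models (and, as noted, emoticons) than this lexicon-counting sketch.

```python
# Minimal lexicon-based sentiment sketch. Real review-mining systems handle
# negation, emoticons, sarcasm, etc.; this only counts matched words.

POSITIVE = {"great", "love", "awesome", "good", ":)"}
NEGATIVE = {"crash", "bad", "terrible", "hate", ":("}

def sentiment_score(review):
    """Positive-word count minus negative-word count over whitespace tokens."""
    tokens = review.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos - neg

happy = sentiment_score("I love this app , great sync :)")
angry = sentiment_score("Constant crash after update , terrible")
```

Even this crude score separates clearly positive from clearly negative reviews, which is why short, rating-like reviews carry little extra signal beyond the star rating itself.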


Discussion around how to deal with sarcasm, is it picked up by sentiment analysis?

More details and results of this research at Professor Maalej’s publications page.


Analysis of reviews from the Google Play Store

Rachel Harrison, Department of Computing and Communication Technologies, Oxford Brookes University, UK

 Classification system developed for the analysis of reviews from the Google Play Store. 169 apps, 3279 reviews, 4.27 average rating, 327 average ratings per app, £1.92 average price.


Results 1: how do users rate apps?

Lower ratings usually associated with concerns around customer support

Mid range ratings usually linked to bug reports

Higher ratings usually linked to requirements requests

Results 2: How do reviews vary with price?

Cheapest apps’ reviews linked to requirements; higher-priced apps linked to bug reporting


Price and money-related feedback positively correlated

Results 3: Distribution of reviews across class of codes

Users tend to provide positive feedback

Reviews used for expressing reqs and reporting bugs

Users least concerned with issues related to versioning

Users write mostly about functionality + missing logic


Results 5: Commonly occurring pairs in reviews

A good GUI makes people happy

Good functionality equates to good value for money

Users are always looking for improvements in apps


Now working on strategies to provide useful information for mobile app developers and look at other app stores.


Extracting Signal from the Noise of User Reviews

Leonard Hoon, Swinburne University of Technology, Australia

Can we classify reviews?

Dataset: 22 categories, 8.7 million reviews, 5.25 million users, 17330 apps.


80% of reviews are roughly the size of a tweet

Different categories have different size of reviews


2* ratings have the longest reviews, 5* the shortest


 To get more useful conclusions, they started timeboxing the reviews related to releases.


Short reviews are “useless”, in terms of finding any more information, as they relate to the rating.


Health and fitness skews the 2* category and health and fitness reviews tend to be longer

For more details check out Leonard’s Google Scholar page


Feature request analysis

William Martin, CREST Centre, SSE Group, Department of Computer Science, UCL, UK

There are 1 million apps on the iOS appstore, Facebook has 600k reviews

Blackberry 130k apps, Blackberry messenger has 1.2 million reviews


Google Play 1.3 million apps, Facebook has 22 million reviews

Windows store 300k apps, YouTube has 44k reviews


Study on the Blackberry app store.



 More information from William at his homepage


Predicting Price and Rating

Federica Sarro, CREST Centre, SSE Group, Department of Computer Science, UCL, UK

Mining app stores to support developers in estimating price and rating

How many people here have released an app? About one third of the audience, 10-12 people.

Federica asks how did you decide on price?

They were all free apps 😉


In 2012 more than 60% of apps in the app store have never been downloaded.

Releasing an app is an investment of time, energy etc.; choosing the right price is part of the success of an app, but it is not easy.

The research found that there is no correlation between price and popularity/rating in non-free apps.

There are lots of articles online about how to price your app.


Federica suggests that you look for apps closest to your own and then use this to determine a starting price point. But, there may be a massive difference between the highest and lowest.

The goal of this research is to mine app stores to find an approach which could be recommended to developers when aiming to price their app.


Using AI (case-based reasoning), the most similar apps are used to find the most appropriate price.


Up to 15 analogies are used: worst, best, mean etc.
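One way to sketch this analogy-based estimation, with invented apps and a plain Euclidean similarity (the actual system uses up to 15 analogies and richer feature representations):

```python
# Case-based sketch: predict an app's price as the mean price of its k most
# similar cases. Feature vectors, apps and prices below are all invented.

def estimate_price(target, cases, k=3):
    """Mean price of the k cases whose feature vectors are nearest to
    `target` (Euclidean distance over numeric features)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(cases, key=lambda c: dist(c["features"], target))[:k]
    return sum(c["price"] for c in nearest) / k

# Invented cases: feature vector = (number of claimed features, avg rating)
cases = [
    {"features": (5, 4.0), "price": 0.99},
    {"features": (6, 4.2), "price": 1.49},
    {"features": (7, 3.9), "price": 1.99},
    {"features": (25, 4.5), "price": 9.99},
]

estimate = estimate_price((6, 4.1), cases, k=3)
```

The distant high-feature, high-price app is excluded by the nearest-neighbour step, so the estimate reflects the three genuinely similar apps; evaluation would then compare such estimates against actual prices under cross-validation, and against a random-guessing baseline.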

Evaluation was based on a framework recently proposed to compare prediction systems using 10 fold cross validation.


A baseline comparison was carried out to see if it could outperform random guessing.


A state of the art comparison was carried out with respect to price and rating


Federica’s research results show that you CAN predict the price and rating of the app based on the features that are included.


Follow on research will be looking at comparing across platforms, taking into account other characteristics and more.


Prof Mark Harman: “Thank you everyone for speaking and attending today, see you at 10am tomorrow morning”



Our first speaker today:

Mobile Apps: Research Challenges

Ahmed E. Hassan, School of Computing, Queen’s University, Canada


Research challenges in mobile apps

Mobile apps are much smaller than frequently studied applications

There is a heavy reliance on platform, high defect concentration, with very limited use of processes, repositories

Comment: many mobile apps are written by a single person, which limits the use of repositories

Early app days: no marketplace, interact with operators and manufacturers, had to be a big fish, no APIs

Midlife crisis: Java MIDP shows up, limited marketplace, closer interaction with users

Today’s app market: direct contact with users, different revenue models, developer-friendly APIs, many small dev shops


Many of the top 200 apps come from companies with fewer than 10 devs

Now there are many more app stores/markets

Lots of apps make revenue through ads, so it’s good to put your app on every store

The speed of growth is enormous: Facebook took 9 months to reach 1 million users; now it takes 3 days

30-35% apps are games

60% of revenue is from games


To improve an app you need to:

  1. Identify patterns across crashes
  2. Deal with scale
  3. Understand what annoys users
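Identifying patterns across crashes (point 1 above) is essentially a grouping problem. A minimal sketch, with an invented report format and stack frames (nothing here comes from Mobapp itself):

```python
from collections import Counter

def crash_signature(report):
    """Bucket a crash by its exception type plus the topmost frame in the
    app's own code. Report format and package names are invented."""
    top_frame = next(f for f in report["stack"] if f.startswith("com.example"))
    return (report["exception"], top_frame)

reports = [
    {"exception": "NullPointerException",
     "stack": ["android.view.View.draw", "com.example.ui.Chart.render"]},
    {"exception": "NullPointerException",
     "stack": ["com.example.ui.Chart.render", "android.os.Handler"]},
    {"exception": "IOException",
     "stack": ["com.example.net.Sync.pull"]},
]

# Cluster reports by signature; the largest cluster is the crash to fix first.
clusters = Counter(crash_signature(r) for r in reports)
most_common_sig, count = clusters.most_common(1)[0]
```

Grouping by signature rather than raw stack text is what lets a tool surface “clusters of common crashes” at scale instead of thousands of near-duplicate reports.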

Ahmed has a tool, Mobapp, which visualises users and crashes, showing clusters of common crashes.


6300 reviews were read and sentiment analysis carried out on them

What do mobile users complain about?



11% of complaints mention an update


Main complaints about mobile apps from users are:

  1. feature removal
  2. hidden cost
  3. privacy and ethics



Research looked at which apps are available in each/every store


Comparison between Blackberry and Android apps:

  1. Android apps smaller
  2. Blackberry apps followed much more traditional software engineering methods

Looking at platform dependence: platform dependence can explain software defects


Examining reuse in the market


How much do mobile apps use inheritance?


Ratings: 33.4% of apps have only one person rating them


The top 200 highest grossing apps generate 60-80% of total market revenue



Monetization: 75% of apps are free to download

The mobile monetization global landscape is massive and elaborate, but all depends on ad libraries. There can be as many as 28 ad libraries in any one app. 65% of apps have only 1 ad library, 17% have 2 ad libraries.


Why do some apps have so many libraries? Because the fill rate is less than 18%

The number of ad libraries doesn’t impact the star rating, but using the wrong libraries can.

Ad maintenance: 14% of releases are just to update ad code. It is a serious software engineering challenge.


30% of apps are using dead libraries

Analytics (and research) are needed for store sales, ad revenue networks, downloads, reviews and field crashes.

Analytics is a massively growing industry



Link for slides from this talk

Relevant papers:

Prioritizing The Devices To Test Your App On: A Case Study Of Android Game Apps [PDF]

Hammad Khalid, Meiyappan Nagappan, Emad Shihab, Ahmed E. Hassan. In Proceedings of the 22nd ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE 2014), Hong Kong, China, Nov 2014.

What Do Mobile App Users Complain About? A Study on Free iOS Apps [PDF]
Hammad Khalid, Emad Shihab, Meiyappan Nagappan, Ahmed E. Hassan. Accepted in IEEE Software, 2014.

From Ahmed:

Some of the things that we looked at:

– Is it easier to develop for BB7 or Android?

– How hard is it to build apps vs regular server/desktop apps?

– What do users complain about for Android vs iOS apps? Are there major differences? Do these complaints in general vary compared to, say, traditional non-mobile apps?

– How do app markets rank apps, and does this ranking system encourage app developers to improve their apps?

– How do app developers make money by including ads, and what is the impact of ads on an app’s ranking and release frequency?

– How much cloning is there in the Android market, i.e. people re-publishing the same or very similar apps?






…and we are back 🙂

Mining patterns for release cycle time in app stores

Maleknaz Nayebi, Electrical and Computer Engineering Department, University of Calgary, Canada

Crawled 9703 apps, 2900 only had one release, 1206 had 2 releases.

6000 apps mined


There is a relation between the number of releases and the number of installs in terms of release cycle time

Pattern recognition used to determine release cycle time for 6000 apps
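One way to sketch the basic measurement behind release cycle time, assuming releases are just dated events (dates and the function name are invented):

```python
from datetime import date
from statistics import median

def release_cycle_days(release_dates):
    """Median gap in days between consecutive releases. Needs at least two
    releases -- mirroring why single-release apps were excluded above."""
    ds = sorted(release_dates)
    gaps = [(b - a).days for a, b in zip(ds, ds[1:])]
    return median(gaps)

# Invented release history for one app
releases = [date(2015, 1, 1), date(2015, 1, 15),
            date(2015, 2, 1), date(2015, 3, 1)]
cycle = release_cycle_days(releases)
```

With per-app cycle times in hand, pattern-mining techniques (rough set analysis in the talk) can then relate short vs long cycles to installs and feedback.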


Using rough set analysis to look at release cycle time


Results: More feedback on short releases, but not necessarily better reviews


CHABADA: Checking app behavior against app descriptions

Alessandra Gorla, Computer Science, Saarland University, Germany

 This work was previously presented at ICSE

App stores are like candy stores: lots of data etc., great for researchers

Also single entry point, great for users

Users have to be careful which apps they download, as there are lots of clones of popular apps with possibly malicious effects, e.g. Angry Birds

Also there are paid applications pretending to be real apps that do nothing

Mismatch between description and real application


So, how do we define malicious? What is or isn’t a malicious app?

It depends on the context, easier to think about what we consider to be normal behaviour and then find anomalies.


Technique developed called CHABADA

  1. App collection: downloaded 32k apps from Google Play 2013, get description and apply stemming (keywords)
  2. Topics: used LDA which gives frequently occurring terms + probability of belonging to topic sets, then stemmed words related to topics.
  3. Clusters: used K means algorithm and probability of features belonging to topics (apps grouped together according to NL description)
  4. APIs: analysis to extract all Android API calls; too many in the framework, so focused on a subset of the APIs – those governed by particular permissions -> could get the main feature of the cluster, e.g. “travel”, and see if there is any malicious intent
  5. Outliers: anomalies detected thru one-class support vector machine (OC-SVM) then ranked applications according to anomalies
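A much-simplified, pure-Python sketch of the core intuition in the steps above: group apps by description topic, then flag sensitive API use that is rare within the app’s own cluster. The real pipeline uses LDA, K-means and a one-class SVM; the apps, cluster and API names below are invented.

```python
# Flag APIs an app uses that are uncommon among its cluster peers -- the
# "does the behaviour match the description?" check, drastically simplified.

def rare_apis(app, cluster, threshold=0.5):
    """APIs used by `app` but by fewer than `threshold` of its cluster peers."""
    peers = [a for a in cluster if a is not app]
    flagged = set()
    for api in app["apis"]:
        usage = sum(api in p["apis"] for p in peers) / len(peers)
        if usage < threshold:
            flagged.add(api)
    return flagged

# Invented "travel" cluster: location access is normal here, texting is not.
travel_cluster = [
    {"name": "CityGuide",  "apis": {"getLastKnownLocation", "openConnection"}},
    {"name": "TripMapper", "apis": {"getLastKnownLocation", "openConnection"}},
    {"name": "OddTravel",  "apis": {"getLastKnownLocation", "sendTextMessage"}},
]

outlier = travel_cluster[2]
flags = rare_apis(outlier, travel_cluster)
```

Clustering by description first is what makes this work: location APIs are normal for travel apps but would be anomalous in, say, a calculator cluster, which plain per-store frequency counts cannot express.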


So does CHABADA effectively identify anomalous Android apps?

For 26% of the apps under study there were some anomalies; there was a large use of covert behaviour, mainly ad libraries, and some had dubious behaviour, e.g. the Yahoo mail app was sending text messages, which it didn’t have permission to do.

E.g. Soundcloud had uncommon behaviour, and there were also some benign outliers; most poker game apps were also spyware. Only one poker game was not invasive.


Classified apps into

  1. Malicious vs benign
  2. Predicted as malicious vs predicted as benign

Results: malware or malicious apps can be detected in an efficient way; malware can’t be detected if no clustering analysis is done


Why not use the Android store categories? Doesn’t give as good results, categories not as exact.


More results and information on the CHABADA page


Investigating Country Differences in Mobile App User Behavior and Challenges for Software Engineering

Soo Ling Lim, Department of Computer Science, UCL, UK

App store analysis is useful for looking at user behaviour, but we want to know why users do what they do and what they do after leaving the app store.

Hypothesis: differences exist in mobile app usage behaviour between countries

What are those differences?

Currently there are no studies in this area, but apps are sold worldwide, so this information is useful to discover

Research Qs: User adoption, their needs, rationale for choice of app, international differences in behaviour


Targeted the top 15 countries by GDP; conducted an online survey, translated into 9 languages: Spanish, German, French, Italian, Portuguese, Russian, Mandarin, Japanese, Korean.


31 questions, app usage, demographics, big 5 personality traits.

30k respondents, 30% response rate; screened out people who don’t use apps and incomplete responses

4824 responses 49% f 51% m



15% didn’t know which app store they were using, possibly because of app stores changing names

Average download 2-5 apps per month


Most popular way to find app: search by keyword 43%, browse randomly 37%

App download trigger: 1st – entertainment

Choice factor highest = price, features, description, review by others


52% of people don’t rate apps

Payment: 56% don’t pay, 20% only buy if there are no free apps, 17% pay to get additional features

Why stop using an app? 44% don’t need it anymore; better alternative, bored of the app, app crashes, doesn’t have required features…

All data is available online.


Differences between countries:

UK app users 3x more likely to be influenced by price

Canada similar to UK but 2x

Brazil like social networking and talking to strangers

Italy: don’t like to pay for or rate apps

Australia: don’t like rating apps

China 9x more likely to select the first app on a list

Challenges for market driven software engineering:

App store dependency

Packaging needs to be country appropriate, eg cuteness – Japan

Vast feature spaces: apps have fewer features, trends change fast, and it’s hard to find the optimal set.

High quality expectation, different countries have different expectations

Price sensitivity: 57% don’t pay for apps; e.g. WhatsApp is paid on iOS, free on Android

Ecosystem effect: a networked ecosystem within app stores, which didn’t apply before app stores

The paper describing this research is

Investigating Country Differences in Mobile App User Behavior and Challenges for Software Engineering


The relationship between Price, Popularity and Ratings for Apps and Claimed Features

Yuanyuan Zhang, CREST Centre, SSE Group, Department of Computer Science, UCL, UK

App stores provide a rich source of information on the technical aspects of apps

Data extracted from the app store and parsed to retrieve relevant information; feature patterns identified using NLP and data mining; then metrics calculated for correlation analysis.


Features then extracted from description of apps

Metrics introduced which capture attributes of the features

Studied relationship between price rating and popularity.

Blackberry app world 1/9/11

19 categories of apps, 40k apps studied:

Majority of apps rated 3-5 stars


Very few apps have more than 40 features; most apps have fewer than 10 features


  1. No strong correlation between the price and the rating, similar between price and popularity
  2. Strong linear correlation between rating and popularity

App feature questionnaire with 38 questions



  1. Inverse correlation between feature price and rating
  2. The higher the price the more features are claimed to be provided

Is there any difference between free apps and non-free apps?

  1. Free apps in general more popular with higher ratings


Estimating demand from the sales and ranks data in the Apple App Store

Nilam Kaushik, Management Science and Operations, UCL, UK




