Bias Hunter

Ethical Algorithms

27/12/2016

In a wonderful and very interesting turn of events, ethical algorithms are suddenly all the rage. Cathy O’Neil wrote a book called Weapons of Math Destruction, in which she went through several interesting case examples of how algorithms can work in an unethical and destructive fashion. Her examples came from the US, but the phenomenon is not limited to the other side of the pond.

In fact, just a month ago, the Economist reported on the rise of credit cards in China. Consumption habits in China are coming to resemble Western ones, including the use of credit cards. And where you have credit cards, you also have credit checks. But how do you show your creditworthiness if you haven’t had credit before?

Enter Sesame Credit, a rating firm. According to the Economist, it relies on “users’ online-shopping habits to calculate their credit scores. Li Yingyun, a director, told Caixin, a magazine, that someone playing video games for ten hours a day might be rated a bad risk; a frequent buyer of nappies would be thought more responsible.” Another firm, China Rapid Finance, looks at users’ social connections and payments instead. My guess would be that their model predicts your behavior based on the behavior of your contacts. So if you happen to be connected to a lot of careless spend-a-holics, too bad for you.

Without even getting into the privacy aspects of such models, one concerning aspect – and this is the main thrust of O’Neil’s book – is that these kinds of models can discriminate heavily based purely on aggregate behavior. For example, if China Rapid Finance’s model sees your friends spending and not paying their bills, it might classify you as a credit risk and deny you a credit card. And if there is little individual data about you, this kind of aggregate data can form the entire justification for the decision. Needless to say, it’s quite unfair that you can be denied credit – even when you’re doing everything right – just because of your friends’ behavior.
[Image: Four credit ratings, coming down hard.]
Now, O’Neil’s book is full of similar cases. To be honest, the idea is quite straightforward. An unethical model (in O’Neil’s terms, a Weapon of Math Destruction) has a few typical signs: 1) it has little to no feedback to learn from, and 2) it makes decisions based on aggregate data. The second one was already covered above, but the first one seems even more damning.

A good example of the first kind is generously provided by US education systems. Now, in the US, rankings of schools are all the rage. Such rankings are defined with a complicated equation that takes into account how well outgoing students do. And of course, the rankings drive the better students to the better schools. However, the model never actually learns the variables or their weights from data – these are all pulled from the administrators’, programmers’, and politicians’ collective hats. What could go wrong? What happens with systems like these is that the ranking becomes a self-fulfilling prophecy, and changing how it is calculated becomes impossible, because the schools that do well are obviously up in arms about any changes.

This whole topic of discrimination in algorithms is actually gaining some good traction. In fact, people at Google are taking notice. In a paper recently presented at NIPS, the authors argue that what is needed is a concept of equality of opportunity in supervised learning. The idea is simple: if you have two groups (like two races, or rich and poor), the true positive rate should be the same in both. In the context of loans, for example, this means that among all those who could pay back a loan, the same percentage of people is given one. So if groups A and B have 800 and 100 people who could pay the loan back, and your budget covers loans for 100 people, then 88 people in group A and 11 in group B would get a loan offer (both groups having an offer rate of about 11%).
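
To make the arithmetic concrete, here is a minimal sketch in Python of how such a budget split works out. The function name and structure are my own illustration, not from the paper:

```python
# A minimal sketch of the equality-of-opportunity split from the
# example above. Everyone counted here could repay the loan; we divide
# a fixed budget of offers so that the offer rate among repayers (the
# true positive rate) is equal across groups.

def equal_opportunity_offers(repayers_by_group, budget):
    """Split `budget` offers so each group's true positive rate
    (offers to repayers / repayers in the group) is the same."""
    total = sum(repayers_by_group.values())
    rate = budget / total
    return {g: int(rate * n) for g, n in repayers_by_group.items()}, rate

offers, rate = equal_opportunity_offers({"A": 800, "B": 100}, budget=100)
print(offers)         # {'A': 88, 'B': 11}
print(f"{rate:.0%}")  # 11% -- the same offer rate for both groups
```
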
Mind you, this isn’t the only possible or useful concept for reducing discrimination. Other useful ones are group unawareness and demographic parity. A group-unaware algorithm discards the group variable and uses the same threshold for both groups. But for loans, depending on the group distributions, this might lead to one group getting fewer loan offers. A demographic parity algorithm, on the other hand, looks at how many loans each group gets. In the case of loans, this would be quite silly, but the concept might be more useful when allocating representatives for groups, because you might want each group to have the same number of representatives, for example.
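
For contrast, here is a quick sketch of those two criteria on made-up credit scores. The score distributions, the shared threshold, and the quantile are all invented for illustration:

```python
# Contrasting group-unaware and demographic-parity thresholds on
# made-up credit scores. All numbers here are invented.
import numpy as np

rng = np.random.default_rng(0)
scores = {"A": rng.normal(600, 50, 800), "B": rng.normal(550, 50, 200)}

# Group-unaware: one shared threshold, ignoring group membership.
shared = 620
unaware = {g: float((s >= shared).mean()) for g, s in scores.items()}

# Demographic parity: a per-group threshold chosen so that the same
# fraction of each group (here the top 20%) gets an offer.
cutoffs = {g: np.quantile(s, 0.80) for g, s in scores.items()}
parity = {g: float((s >= cutoffs[g]).mean()) for g, s in scores.items()}

print(unaware)  # group B's offer rate is much lower under one threshold
print(parity)   # both groups at roughly 20% by construction
```
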
Anyway, there’s a really neat interactive graphic about these concepts; I recommend checking it out. You can find it here.

Which Outside View? The Reference Class Problem

14/4/2015

One of the most sensible and applicable pieces of advice in the decision-making literature is to take the outside view. Essentially, this means stepping outside your own frame and looking at statistical data on what has happened before.

For example, suppose you’re planning to put together a new computer from parts you order online. You’ve ordered the parts, and feel that this time you know most of the common hiccups of building the machine. You estimate that it will take you two weeks to complete. However, in the past you’ve built three computers – and they took 3, 5 and 4 weeks, respectively. Once the parts came in later than expected, once you were at work too much to manage the build, and once you had some issues that needed resolving. But this time is different!

Now, the inside view says you feel confident that you’ve learnt from your mistakes. Therefore, estimating a shorter build time than your history shows seems to make sense. The outside view, on the other hand, says that even if you have learnt something, there have always been hiccups of some kind – so they are likely to happen again. Hence, the outside view would estimate your build time to be around the average of your historical record.

In such a simple case it’s quite easy to see why taking the outside view is sensible, especially when I’ve painted the inside view as a sense of “I’m better than before”. Unfortunately, the real world is not this clean, but much messier. In the real world, the question is not whether you should use the outside view (you should), but which one. The problem is that you’ve often got several options.

For example, suppose you were recently appointed as a project manager in a company, and you’ve now led projects there for a year. Two months ago, your team got a new integration specialist. Now, you’re trying to estimate how long it would take to install a new system for a very large corporate client. You’d like to use the outside view, but don’t know which one. What’s the reference point? All projects you’ve ever led? All projects you’ve led in this company? All projects with the new integration specialist? All projects for very large clients?

As we can see, picking the outside view to use is not easy. In fact, this problem – a deep philosophical problem in frequentist statistics – is known in statistics and philosophy as the reference class problem. All the possible reference classes in this example make some sense. The problem is one of causality: you have incomplete knowledge about which attributes affect your success, and by how much. Does it matter that you have a new integration specialist? Are these projects very similar to the ones you did at your previous company? How much do projects differ by client size? If you could answer all these questions, you’d know which reference class to use. But if you knew the answers, you probably wouldn’t need the outside view in the first place! So what can you do?

A practical suggestion: use several reference classes. If the estimates from these differ by a lot, then the situation is difficult to estimate. But hopefully finding this out improves your sense of what drives the project’s success. If the estimates don’t diverge, then it doesn’t really matter which outside view you pick, so you can be more confident in the estimate.
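
As a toy illustration, here’s what that might look like in Python. The reference classes and durations are invented for the sake of the example:

```python
# A toy sketch of the "use several reference classes" suggestion.
# The classes and durations (in weeks) are invented for illustration.
from statistics import mean

reference_classes = {
    "all projects I've led": [12, 18, 9, 15, 20],
    "projects at this company": [14, 16, 15],
    "projects with the new specialist": [19, 21],
    "projects for very large clients": [18, 22, 25],
}

estimates = {name: mean(xs) for name, xs in reference_classes.items()}
spread = max(estimates.values()) - min(estimates.values())

for name, est in estimates.items():
    print(f"{name}: {est:.1f} weeks")

# A wide spread means the outside views disagree and the project is
# genuinely hard to estimate; a narrow spread means the choice of
# reference class matters little.
print(f"spread across reference classes: {spread:.1f} weeks")
```
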

Losing the Momentum

17/11/2014

So, I’m finally home from the trip to San Francisco and Palo Alto. Experiencing US culture again was quite intriguing. The differences from Finland are pretty notable: the expectation of sociability and extraversion, the lunch spots that only do takeaway, and the enthusiasm around baseball and football. I spent a few evenings watching college football games – and I have to say they were quite exciting! If it weren’t for the ubiquitous commercial breaks, I’d say football is one of the most intense and captivating sports on TV. What also caught my eye, however, was the commentators’ lack of statistical sophistication.

During a game, a team might make two or three really awesome plays in a row. For example, in Ohio State vs. Minnesota, Ohio State scored a few awesome touchdowns on long passes and a run of over 80 yards. After this streak of successes, the game got more even, with Minnesota actually managing to even the score. What’s special about this is that the commentators spent a lot of time arguing about momentum. In their view, Minnesota managed to “get the momentum on their side” with an interception and a few hard tackles.
Well, I’m not so sure.

My statistical gut instinct says that this is just regression to the mean: after a few lucky successes (or a streak of bungles), the game is likely to return to the mean. And in professional sports, the mean is that teams are pretty evenly matched. So instead of momentum, the more likely explanation is that Ohio State just wasn’t so lucky anymore.

Regression to the mean is especially tricky since we tend to see patterns everywhere, including places where there are none. Kahneman describes a famous case from his time working for the Israeli air force. The trainers had a habit of harshly dressing down cadets who made mistakes. In their experience, this helped the cadets get a grip and concentrate, so they wouldn’t make an error the next time. Kahneman decided to look into this intuition. At first, it looked like that was the case: a failed training flight that drew harsh criticism was usually followed by a better flight. Isn’t this evidence that harsh negative feedback caused improvements?

Well, not necessarily. Basing that conclusion on the data would be a case of the fallacy called post hoc, ergo propter hoc, or, as it’s more commonly known, the post hoc fallacy. The Latin name means “after this, therefore because of this”. It’s a conclusion of the form “since B came after A, B must have been caused by A”. This is of course rarely true. My waking is followed by a sunrise – but that doesn’t mean I’m causing the sunrise! Of course, this example is so ridiculous that nobody would think I cause the sunrise. But the same principle applies in other cases.

So what happened in Kahneman’s air force case? The trainers were falling for the post hoc fallacy: they believed that the improvement after a bad flight was due to the harsh feedback, when in fact it was simple regression to the mean. An average training flight is the most likely outcome, so it usually follows a bad training flight. Indeed, being the average, it usually follows any kind of training flight!
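
You can see this with a tiny simulation. This is a sketch under the assumption that flight quality is pure random luck and feedback has no effect at all:

```python
# Simulating the flight-training example under the assumption that
# flight quality is pure random luck and feedback changes nothing.
import numpy as np

rng = np.random.default_rng(42)
flights = rng.normal(size=100_000)  # standardized flight quality scores

bad = flights[:-1] < -1.0           # flights bad enough to draw a dressing-down
improved = flights[1:][bad] > flights[:-1][bad]

# Most bad flights are followed by a better one, even though nothing
# caused the improvement: that's regression to the mean.
print(f"improved after a bad flight: {improved.mean():.0%}")  # well above 50%
```
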

To take this back to sports, I think regression to the mean is often at play in the sports domain. Exceptional performances are followed by average performances, and the same goes for moving from bad to average performance. Sports that contain many sequences – like American football or tennis – are especially likely to show divergence from the mean, followed by regression to it. Someone might make a few awesome plays, but that’s unlikely to last long no matter what the other player or team does. Tactical changes do have some effect, depending on the sport, but I think regression is much more important than we usually realize. And regression is the reason why a Rookie of the Year is unlikely to perform as well the next year, or why an awful batting season tends to be followed by a better one. It all comes back towards the mean.
