In fact, just a month ago, the Economist reported on the rise of credit cards in China. Consumption habits in China are coming to resemble Western ones, including the use of credit cards. And where you have credit cards, you also have credit checks. But how do you show your creditworthiness if you haven’t had credit?
Enter Sesame Credit, a rating firm. According to the Economist, it relies on “users’ online-shopping habits to calculate their credit scores. Li Yingyun, a director, told Caixin, a magazine, that someone playing video games for ten hours a day might be rated a bad risk; a frequent buyer of nappies would be thought more responsible.” Another firm, China Rapid Finance, looks at users’ social connections and payments. My guess would be that their model predicts your behavior from the behavior of your contacts. So if you happen to be connected to a lot of careless spend-a-holics, too bad for you.
Without even getting to the privacy aspects of such models, one concerning feature (and this is the main thrust of O’Neil’s book) is that models like these can discriminate heavily based purely on aggregate behavior. For example, if CRF’s model sees your friends spending and not paying their bills, it might classify you as a credit risk and refuse you a credit card. And if there is little individual data about you, this kind of aggregate data can form the justification for the whole decision. Needless to say, it’s quite unfair that you can be denied credit, even when you’re doing everything right, just because of your friends’ behavior.
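To make that concrete, here is a minimal, purely hypothetical sketch of how a score might blend sparse individual history with contacts’ aggregate behavior. The function, the weights, and the blending rule are all my own invention for illustration, not anything CRF has disclosed.

```python
# Hypothetical sketch (not any firm's actual model): a score that blends your
# own payment history with your contacts' average behavior. With little
# personal data, the contact-based term dominates the decision.

def credit_score(own_payment_rate, n_own_records, contacts_payment_rates):
    """Blend individual and aggregate signals, weighting by how much
    individual history is available (purely illustrative weights)."""
    contact_avg = sum(contacts_payment_rates) / len(contacts_payment_rates)
    # The more personal records exist, the more they count; with
    # n_own_records = 0 the score is driven entirely by the contacts.
    w_own = n_own_records / (n_own_records + 10)
    return w_own * own_payment_rate + (1 - w_own) * contact_avg

# Someone with no history but careless contacts gets a low score...
print(credit_score(1.0, 0, [0.4, 0.5, 0.3]))   # -> 0.4
# ...while the same person with responsible contacts would score well.
print(credit_score(1.0, 0, [0.95, 0.9, 1.0]))  # -> 0.95
```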
A good example of the first kind is generously provided by US education systems. Now, in the US, rankings of schools are all the rage. Such rankings are defined with a complicated equation that takes into account how well outgoing students do. And of course, the rankings drive the better students to the better schools. However, the model never actually learns the variables or their weights from data; they are all pulled from the administrators’, programmers’, and politicians’ collective hats. What could go wrong? With systems like these, the ranking becomes a self-fulfilling prophecy, and changing how it is calculated becomes impossible, because the schools that do well are obviously up in arms about any changes.
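For illustration, here is a toy version of such a hand-weighted ranking formula. The metrics and weights are made up, not taken from any real ranking, but the point stands: nothing here is learned from data, so the formula simply encodes its authors’ priorities.

```python
# A made-up hand-weighted ranking formula in the spirit described above:
# every weight is chosen by people, not learned from data.

WEIGHTS = {                    # illustrative weights only
    "graduation_rate": 0.4,
    "avg_test_score": 0.3,
    "acceptance_rate": 0.2,    # lower is "better", hence the inversion below
    "alumni_salary": 0.1,
}

def school_rank_score(stats):
    """stats holds each metric scaled to [0, 1]."""
    return (WEIGHTS["graduation_rate"] * stats["graduation_rate"]
            + WEIGHTS["avg_test_score"] * stats["avg_test_score"]
            + WEIGHTS["acceptance_rate"] * (1 - stats["acceptance_rate"])
            + WEIGHTS["alumni_salary"] * stats["alumni_salary"])

print(school_rank_score({"graduation_rate": 0.9, "avg_test_score": 0.8,
                         "acceptance_rate": 0.2, "alumni_salary": 0.6}))
# -> roughly 0.82
```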
This whole topic of discrimination in algorithms is actually gaining some good traction. In fact, people at Google are taking notice. In a paper recently presented at NIPS, the authors argue that what is needed is a concept of equality of opportunity in supervised learning. The idea is simple: if you have two groups (like two races, or rich and poor), the true positive rate should be the same in both. In the context of loans, for example, this means that of all those who could pay back a loan, the same percentage are given a loan offer in each group. So if groups A and B have 800 and 100 people who could pay the loan back, and your budget allows loans to 100 people, then 88 in group A and 11 in group B would get the loan offer (both groups having an 11% offer rate).
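Here is a small sketch of that arithmetic, assuming we round fractional people down; the function and group labels are just for illustration, not from the paper.

```python
# Sketch of the loan example above: with 800 + 100 people who would repay
# and a budget of 100 loans, equality of opportunity asks for the same
# true positive rate in both groups.

def equal_opportunity_offers(repayers_per_group, budget):
    """Split a loan budget so every group of would-be repayers gets
    offers at the same rate (rounding fractional people down)."""
    total = sum(repayers_per_group.values())
    rate = budget / total                       # common true positive rate
    return {g: int(n * rate) for g, n in repayers_per_group.items()}, rate

offers, rate = equal_opportunity_offers({"A": 800, "B": 100}, budget=100)
print(offers, f"{rate:.0%}")   # {'A': 88, 'B': 11} 11%
```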
Mind you, this isn’t the only possible or useful concept for reducing discrimination. Other useful ones are group unawareness and demographic parity. A group-unaware algorithm discards the group variable and uses the same threshold for both groups. But for loans, depending on the groups’ score distributions, this might lead to one group getting fewer loan offers. A demographic parity algorithm, on the other hand, gives loans to the same fraction of each group, regardless of how many in each group could pay them back. In the case of loans this would be quite silly, but the concept might be more useful when allocating representatives for groups, because there you might actually want each group represented in proportion to its size, for example.
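To contrast these rules, here is a quick synthetic comparison with made-up credit score distributions for two groups; none of the numbers come from the paper or the interactive graphic mentioned below.

```python
# A small synthetic comparison of the decision rules discussed above,
# using made-up credit scores for two groups.
import random

random.seed(0)
scores = {
    "A": [random.gauss(60, 10) for _ in range(800)],   # higher-scoring group
    "B": [random.gauss(50, 10) for _ in range(200)],   # lower-scoring group
}

def group_unaware(scores, threshold):
    """One shared threshold: ignores group membership entirely."""
    return {g: sum(s >= threshold for s in ss) for g, ss in scores.items()}

def demographic_parity(scores, offer_rate):
    """Give the same fraction of each group a loan, whatever their scores."""
    return {g: int(len(ss) * offer_rate) for g, ss in scores.items()}

print(group_unaware(scores, threshold=65))         # B gets proportionally fewer offers
print(demographic_parity(scores, offer_rate=0.1))  # 80 vs 20: the same 10% rate
```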
Anyway, there’s a really neat interactive graphic about these concepts; I recommend checking it out. You can find it here.