Posted on August 1, 2018 by admin

Make “Fairness by Design” Part of Machine Learning

dlanor s. / Unsplash

Machine learning is increasingly being used to predict individuals’ attitudes, behaviors, and preferences across an array of applications — from personalized marketing to precision medicine. Unsurprisingly, given the speed of change and ever-increasing complexity, there have been several recent high-profile examples of “machine learning gone wrong.”

A chatbot trained on Twitter data was shut down after only a single day because of its obscene and inflammatory tweets. Machine learning models used in a popular search engine have struggled to differentiate images of humans from images of gorillas, and have shown female users ads for lower-paying jobs than those shown to male users. More recently, a study compared the commonly used crime risk analysis tool COMPAS against recidivism predictions from 400 untrained workers recruited via Amazon Mechanical Turk. The results suggest that COMPAS has learned implicit racial biases, making it less accurate than the novice human predictors.

When models don’t perform as intended, people and process are normally to blame. Bias can manifest itself in many forms across various stages of the machine learning process, including data collection, data preparation, modeling, evaluation, and deployment. Sampling bias may produce models trained on data that is not fully representative of future cases. Performance bias can exaggerate perceptions of predictive power, generalizability, and performance homogeneity across data segments. Confirmation bias can cause information to be sought, interpreted, emphasized, and remembered in a way that confirms preconceptions. Anchoring bias may lead to over-reliance on the first piece of information examined. So how can we mitigate bias in machine learning?

In our federally funded project (with Rick Netemeyer, David Dobolyi, and Indranil Bardhan), we are developing a patient-centric mobile/IoT platform for those at early risk of cardiovascular disease in the Stroke Belt — a region spanning the southeastern United States where the incidence rates for stroke are 25% to 40% higher than the national average. As part of the project, we built machine learning models based on various types of unstructured inputs, including user-generated text and telemetric and sensor-based data. One critical component of the project involved developing deep learning text analytics models to infer psychometric dimensions — such as measures of numeracy, literacy, trust, and anxiety — which have been shown to have a profound impact on health outcomes, including wellness, future doctor visits, and adherence to treatment regimens. The idea is that if a doctor knew a patient was, for example, skeptical of the health profession, they could tailor their care to overcome that lack of trust. Our models predict these psychometric dimensions from the data we collected.
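
To make this concrete, here is a minimal sketch of the kind of model involved: predicting a single psychometric score from free text. A TF-IDF plus ridge-regression pipeline stands in for the deep learning models described above, and the file and column names are hypothetical, not from our project.

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    # Hypothetical data: free-text survey responses plus self-reported numeracy scores.
    df = pd.read_csv("patient_responses.csv")
    X_train, X_test, y_train, y_test = train_test_split(
        df["response_text"], df["numeracy_score"], test_size=0.2, random_state=42)

    # Simple text-regression pipeline standing in for a deep learning model.
    model = make_pipeline(TfidfVectorizer(min_df=5, ngram_range=(1, 2)), Ridge(alpha=1.0))
    model.fit(X_train, y_train)
    print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))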

Given that cardiovascular disease is disproportionately more likely to affect the health of disparate populations, we knew alleviating racial, gender, and socio-economic biases from our text analytics models would be vitally important. Borrowing from the concept of “privacy by design” popularized by the European Union’s General Data Protection Regulation (GDPR), we employed a “fairness by design” strategy encompassing a few key facets. Companies and data scientists looking to similarly design for fairness can take the following steps:

1. Pair data scientists with a social scientist. Data scientists and social scientists speak somewhat different languages. To a data scientist, “bias” has a particular technical meaning — it refers to the level of segmentation in a classification model. Similarly, the term “discriminatory potential” refers to the extent to which a model can accurately differentiate classes of data (e.g., patients at high versus low risk of cardiovascular disease). In data science, greater “discriminatory potential” is a primary goal. By contrast, when social scientists talk about bias or discrimination, they’re more likely to be referring to questions of equity. Social scientists are generally better equipped to provide a humanistic perspective on fairness and bias.

In our Stroke Belt project, from the start, we made sure to include psychologists, psychometricians, epidemiologists, and folks specialized in dealing with health-disparate populations. This allowed us to have a better awareness of demographic biases that might creep into the machine learning process.

2. Annotate with caution. Unstructured data such as text and images is often labeled by human annotators, who provide the structured category labels that are then used to train machine learning models. For instance, annotators can mark which images contain people, or which texts express positive versus negative sentiment.

Human annotation services have become a major business model, with numerous platforms emerging at the intersection of crowd-sourcing and the gig economy. Although the quality of annotation is adequate for many tasks, human annotation is inherently prone to a plethora of culturally ingrained biases.

In our project, we anticipated that this might introduce bias into our models. For example, given two individuals with similar levels of health numeracy, the one whose writing contains misspellings or grammatical mistakes is much more likely to be scored lower by annotators. This can cause biases to seep into the trained models, such as overemphasizing the importance of misspellings relative to more substantive cues when predicting health numeracy.

One effective approach we have found is to include potential bias cases in annotator training modules to increase awareness. However, in the Stroke Belt project, we circumvented annotation entirely, instead relying on self-reported data. While this approach is not always feasible, and may come with its share of issues, it allowed us to avoid annotation-related racial biases.

3. Combine traditional machine learning metrics with fairness measures. The performance of machine learning classification models is typically measured using a small set of well-established metrics that focus on overall performance, class-level performance, and all-around model generalizability. However, these can be augmented with fairness measures designed to quantify machine learning bias. Such key performance indicators are essential for situational awareness — as the saying goes, “if it cannot be measured, it cannot be improved.” Using fairness measures of this kind, researchers in the recidivism prediction study mentioned earlier found that existing models were heavily skewed in their risk assessments for certain groups.

In our project, we examined model performance within various demographic segments, as well as underlying model assumptions, to identify segments with higher susceptibility to bias in our context. The fairness measures we incorporated included within- and across-segment true/false positive and negative rates, along with the models’ level of reliance on demographic variables. Segments with disproportionately high false positive or false negative rates may be prone to over-generalizations. And for segments with seemingly fair outcomes at present, if demographic variables are weighted heavily relative to others and act as primary drivers of predictions, the model may be susceptible to bias in future data.
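
As a rough illustration of these measures (a sketch under assumed variable names, not our project's actual code), the function below computes true/false positive and negative rates within each demographic segment, so that segments with disproportionately high error rates stand out.

    import pandas as pd

    def segment_rates(y_true, y_pred, segments):
        """Per-segment confusion-matrix rates for a binary classifier."""
        df = pd.DataFrame({"y": y_true, "p": y_pred, "seg": segments})
        rows = []
        for seg, g in df.groupby("seg"):
            tp = int(((g.y == 1) & (g.p == 1)).sum())
            fp = int(((g.y == 0) & (g.p == 1)).sum())
            tn = int(((g.y == 0) & (g.p == 0)).sum())
            fn = int(((g.y == 1) & (g.p == 0)).sum())
            rows.append({"segment": seg,
                         "tpr": tp / max(tp + fn, 1),
                         "fpr": fp / max(fp + tn, 1),
                         "fnr": fn / max(tp + fn, 1),
                         "n": len(g)})
        return pd.DataFrame(rows)

    # Example: rank segments by false positive rate to spot over-generalizations.
    # print(segment_rates(y_true, y_pred, demo["segment"]).sort_values("fpr", ascending=False))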

4. When sampling, balance representativeness with critical mass constraints. For data sampling, the age-old mantra has been to ensure that samples are statistically representative of the future cases that a given model is likely to encounter. This is generally a good practice. The one issue with representativeness is that it undervalues minority cases — those that are statistically less common. While at the surface this seems intuitive and acceptable — there are always going to be more- and less-common cases — issues arise when certain demographic groups are statistical minorities in your dataset. Essentially, machine learning models are incentivized to learn patterns that apply to large groups, in order to become more accurate, meaning that if a particular group isn’t well represented in your data, the model will not prioritize learning about it. In our project, we had to significantly oversample cases related to certain demographic groups in order to ensure that we had a critical mass of training samples necessary to meet our fairness measures.
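
One simple way to implement this, sketched below with hypothetical column names and an arbitrary threshold rather than our exact procedure, is to resample each under-represented group up to a minimum number of training examples before fitting the model.

    import pandas as pd
    from sklearn.utils import resample

    def oversample_to_critical_mass(df, group_col, min_n=500, random_state=42):
        """Resample each group with fewer than min_n rows up to min_n rows."""
        parts = []
        for _, group in df.groupby(group_col):
            if len(group) < min_n:
                group = resample(group, replace=True, n_samples=min_n,
                                 random_state=random_state)
            parts.append(group)
        # Shuffle so oversampled rows are not clustered together.
        return pd.concat(parts).sample(frac=1.0, random_state=random_state)

    # train_df = oversample_to_critical_mass(train_df, group_col="race", min_n=500)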

5. When building a model, keep de-biasing in mind. Even with the aforementioned steps, de-biasing during the model building and training phase is often necessary. Several tactics have been proposed. One approach is to completely strip the training data of any demographic cues, explicit and implicit. In the recidivism prediction study discussed earlier, the novice human predictors weren’t provided with any race information. Another approach is to build fairness measures into the model’s training objectives, for instance, by “boosting” the importance of certain minority or edge cases.
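
The sketch below illustrates both tactics in generic scikit-learn terms, with hypothetical column names: dropping explicit demographic columns from the feature set, and up-weighting under-represented groups during training. It is an illustration of the general approach, not the de-biasing code used in our project.

    from sklearn.linear_model import LogisticRegression

    def demographic_blind_features(df, demographic_cols=("race", "gender", "zip_code")):
        """Drop explicit demographic columns before training (hypothetical names)."""
        return df.drop(columns=[c for c in demographic_cols if c in df.columns])

    def inverse_frequency_weights(groups):
        """Give rows from smaller groups proportionally larger training weights."""
        counts = groups.value_counts()
        return groups.map(lambda g: len(groups) / (len(counts) * counts[g])).to_numpy()

    # X = demographic_blind_features(train_df).drop(columns=["label"])
    # weights = inverse_frequency_weights(train_df["race"])
    # clf = LogisticRegression(max_iter=1000).fit(X, train_df["label"], sample_weight=weights)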

In our project, we found that it was helpful to train our models within demographic segments algorithmically identified as being highly susceptible to bias. For example, if segments A and B are prone to superfluous generalizations (as quantified by our fairness measures), learning patterns within these segments provides some semblance of demographic homogeneity and alleviates majority/minority sampling issues, thereby forcing the models to learn alternative patterns. In our case, this approach not only enhanced fairness measures markedly (by 5% to 10% for some segments), but also boosted overall accuracy by a couple of percentage points.
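
A rough sketch of this segment-wise training idea, assuming pandas inputs, a generic scikit-learn estimator, and a list of flagged segments produced by the fairness measures above (again, not our actual models):

    from sklearn.base import clone
    from sklearn.ensemble import GradientBoostingClassifier

    def fit_segmented(X, y, segments, flagged_segments, base=None):
        """Fit a global model plus one model per bias-susceptible segment."""
        base = base if base is not None else GradientBoostingClassifier()
        models = {"__global__": clone(base).fit(X, y)}
        for seg in flagged_segments:
            mask = (segments == seg).to_numpy()
            if mask.any():
                models[seg] = clone(base).fit(X[mask], y[mask])
        return models

    def predict_segmented(models, X, segments):
        """Use a segment's own model where one exists, the global model otherwise."""
        preds = models["__global__"].predict(X)
        for seg, model in models.items():
            if seg == "__global__":
                continue
            mask = (segments == seg).to_numpy()
            if mask.any():
                preds[mask] = model.predict(X[mask])
        return preds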

A few months back, we were at a conference where the CEO of a major multinational lamented “the principle of precaution overshadowing the principle of innovation.” This is a concern voiced in C-suites and machine learning groups worldwide, with regard to both privacy and bias. But fairness by design isn’t about prioritizing political correctness above model accuracy. With careful consideration, it can allow us to develop high-performing models that are both accurate and conscionable. Buying in to the idea of fairness by design entails examining different parts of the machine learning process from alternative vantage points, using competing theoretical lenses. In our Stroke Belt project, we were able to develop models with higher overall performance, greater generalizability across demographic segments, and enhanced model stability — potentially making it easier for the health care system to match the right person with the right intervention in a timely manner.

By making fairness a guiding principle in machine learning projects, we didn’t just build fairer models — we built better ones, too.

Source: HBR

Posted on August 1, 2018 by admin

Want Less-Biased Decisions? Use Algorithms.

Orlagh Murphy / Getty Images

A quiet revolution is taking place. In contrast to much of the press coverage of artificial intelligence, this revolution is not about the ascendance of a sentient android army. Rather, it is characterized by a steady increase in the automation of traditionally human-based decision processes throughout organizations all over the country. While advancements like AlphaGo Zero make for catchy headlines, it is fairly conventional machine learning and statistical techniques — ordinary least squares, logistic regression, decision trees — that are adding real value to the bottom line of many organizations. Real-world applications range from medical diagnoses and judicial sentencing to professional recruiting and resource allocation in public agencies.

Is this revolution a good thing? There seems to be a growing cadre of authors, academics, and journalists that would answer in the negative. Book titles in this genre include Weapons of Math Destruction, Automating Inequality, and The Black Box Society. There has also been a spate of exposé-style longform articles such as “Machine Bias,” “Austerity Is an Algorithm,” and “Are Algorithms Building the New Infrastructure of Racism?” At the heart of this work is the concern that algorithms are often opaque, biased, and unaccountable tools being wielded in the interests of institutional power. So how worried should we be about the modern ascendance of algorithms?

These critiques and investigations are often insightful and illuminating, and they have done a good job in disabusing us of the notion that algorithms are purely objective. But there is a pattern among these critics, which is that they rarely ask how well the systems they analyze would operate without algorithms. And that is the most relevant question for practitioners and policy makers: How do the bias and performance of algorithms compare with the status quo? Rather than simply asking whether algorithms are flawed, we should be asking how these flaws compare with those of human beings.
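
For practitioners, that comparison can be made concrete by scoring both the algorithm and the human status quo on the same historical cases, on accuracy and on a simple group-level disparity measure. The sketch below assumes a hypothetical data set containing human decisions, model decisions, realized outcomes, and group membership; it is an illustration of the comparison, not code from any of the studies cited here.

    import pandas as pd
    from sklearn.metrics import accuracy_score

    def evaluate(decisions, outcomes, groups):
        """Accuracy plus the largest gap in positive-decision rates across groups."""
        rate_by_group = decisions.groupby(groups).mean()
        return {"accuracy": accuracy_score(outcomes, decisions),
                "max_rate_gap": float(rate_by_group.max() - rate_by_group.min())}

    # df = pd.read_csv("historical_decisions.csv")  # hypothetical file and columns
    # print("humans:   ", evaluate(df["human_decision"], df["outcome"], df["group"]))
    # print("algorithm:", evaluate(df["model_decision"], df["outcome"], df["group"]))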

What Does the Research Say?

There is a large body of research on algorithmic decision making that dates back several decades. And the existing studies on this topic all have a remarkably similar conclusion: Algorithms are less biased and more accurate than the humans they are replacing. Below is a sample of the research about what happens when algorithms are given control of tasks traditionally carried out by humans (all emphasis mine):

  • In 2002 a team of economists studied the impact of automated underwriting algorithms in the mortgage lending industry. Their primary findings were “that [automated underwriting] systems more accurately predict default than manual underwriters do” and “that this increased accuracy results in higher borrower approval rates, especially for underserved applicants.” Rather than marginalizing traditionally underserved home buyers, the algorithmic system actually benefited this segment of consumers the most.
  • A similar conclusion was reached by Bo Cowgill at Columbia Business School when he studied the performance of a job-screening algorithm at a software company (forthcoming research). When the company rolled out the algorithm to decide which applicants should get interviews, the algorithm actually favored “nontraditional” candidates much more than human screeners did. Compared with the humans, the algorithm exhibited significantly less bias against candidates that were underrepresented at the firm (such as those without personal referrals or degrees from prestigious universities).
  • In the context of New York City pre-trial bail hearings, a team of prominent computer scientists and economists determined that algorithms have the potential to achieve significantly more-equitable decisions than the judges who currently make bail decisions, with “jailing rate reductions [of] up to 41.9% with no increase in crime rates.” They also found that in their model “all categories of crime, including violent crimes, show reductions [in jailing rates]; and these gains can be achieved while simultaneously reducing racial disparities.”
  • The New York Times Magazine recently reported a longform story to answer the question, “Can an algorithm tell when kids are in danger?” It turns out the answer is “yes,” and that algorithms can perform this task much more accurately than humans. Rather than exacerbating the pernicious racial biases associated with some government services, “the Allegheny experience suggests that its screening tool is less bad at weighing biases than human screeners have been.”
  • Lastly, by looking at historical data on publicly traded companies, a team of finance professors set out to build an algorithm to choose the best board members for a given company. Not only did the researchers find that companies would perform better with algorithmically selected board members, but compared with their proposed algorithm, they “found that firms [without algorithms] tend to choose directors who are much more likely to be male, have a large network, have a lot of board experience, currently serve on more boards, and have a finance background.”

In each of these case studies, the data scientists did what sounds like an alarming thing: They trained their algorithms on past data that is surely biased by historical prejudices. So what’s going on here? How is it that in so many different areas — credit applications, job screenings, criminal justice, public resource allocations, and corporate governance — algorithms can be reducing bias, when we have been told by many commentators that algorithms should be doing the opposite?

Human Beings Are Remarkably Bad Decision Makers

A not-so-hidden secret behind the algorithms mentioned above is that they actually are biased. But the humans they are replacing are significantly more biased. After all, where do institutional biases come from if not the humans who have traditionally been in charge?

But humans can’t be all that bad, right? Yes, we may be biased, but surely there’s some measure of performance on which we are good decision makers. Unfortunately, decades of psychological research on judgment and decision making have demonstrated time and time again that humans are remarkably bad judges of quality in a wide range of contexts. Thanks to the pioneering work of Paul Meehl (and follow-up work by Robyn Dawes), we have known since at least the 1950s that very simple mathematical models outperform supposed experts at predicting important outcomes in clinical settings.

In all the examples mentioned above, the humans who used to make decisions were so remarkably bad that replacing them with algorithms both increased accuracy and reduced institutional biases. This is what economists call a Pareto improvement, where one policy beats the alternative on every outcome we care about. While many critics like to imply that modern organizations pursue operational efficiency and greater productivity at the expense of equity and fairness, all available evidence in these contexts suggests that there is no such trade-off: algorithms deliver more-efficient and more-equitable outcomes. If anything should alarm you, it is the fact that so many important decisions are being made by human beings who we know are inconsistent, biased, and phenomenally bad decision makers.

Improving on the Status Quo

Of course, we should be doing all we can to eradicate institutional bias and its pernicious influence on decision-making algorithms. Critiques of algorithmic decision making have spawned a rich new wave of research in machine learning that takes more seriously the social and political consequences of algorithms. There are novel techniques emerging in statistics and machine learning that are designed specifically to address the concerns around algorithmic discrimination. There is even an academic conference every year at which researchers not only discuss the ethical and social challenges of machine learning but also present new models and methods for ensuring algorithms have a positive impact on society. This work will likely become even more important as less-transparent algorithms like deep learning become more common.

But even if technology can’t fully solve the social ills of institutional bias and prejudicial discrimination, the evidence reviewed here suggests that, in practice, it can play a small but measurable part in improving the status quo. This is not an argument for algorithmic absolutism or blind faith in the power of statistics. If we find in some instances that algorithms have an unacceptably high degree of bias in comparison with current decision-making processes, then there is no harm done by following the evidence and maintaining the existing paradigm. But a commitment to following the evidence cuts both ways, and we should be willing to accept that — in some instances — algorithms will be part of the solution for reducing institutional biases. So the next time you read a headline about the perils of algorithmic bias, remember to look in the mirror and recall that the perils of human bias are likely even worse.

Source: HBR

Posted on August 1, 2018 by admin

Pico nabs $24.7M to create VR hardware that challenges Facebook, Google

While there aren’t many VR hardware startups raising cash out there these days, there are even fewer securing investments to actually build the VR headsets themselves.

Even as established tech giants are having a rough time in the headset market, Beijing-based Pico Interactive is looking to give it a go, with a focus on standalone VR headset hardware that can keep up with the innovations of larger firms.

Pico has closed a $24.7 million Series A led by GF Qianhe and GF Xinde Investment, with participation from Jufeng S&T Venture Investment and others, the company said in an announcement. This is the startup’s first round of outside funding since its founding in 2015.

VR hardware had plenty of entrants around the time of Pico’s founding, but as competitors were forced to slash prices to keep up with Oculus’s aggressive pricing, margins disappeared, leaving relatively little room for startups. Pico has made its bet on moving past PC- and console-based systems and focusing squarely on self-contained standalone headsets.

Coinciding with the funding announcement, Pico also offered details on a new standalone headset being released in China. Called the Pico G2, it’s an updated version of the Pico Goblin built on Qualcomm’s Snapdragon 835 chipset. The company’s hardware runs on HTC’s Vive Wave VR platform.

The company also says that it is planning to release its own augmented reality hardware in 2019.


Source: TechCrunch Startup

Posted on July 19, 2018 by admin

Fat Lama is a platform to lend and borrow anything

 Part of YC’s Summer ’17 class, Fat Lama wants to be a rental marketplace for anything. Launched late last year in London, the startup has gained early traction among professionals needing short-term rentals of creative gear – like drones, cameras or DJ equipment. But the platform is also filled with weirder goods – like a tuxedo, camper van or popcorn machine. Items… Read More
Source: TechCrunch Startup

Posted on July 19, 2018 by admin

Immersv raises $10.5M to shake up mobile advertising with some VR flair

While VR has shown consumers a great deal of what’s possible when it comes to immersive experiences, the world of potential that VR opens up to advertisers has not been explored as heavily.
Immersv, a mobile 360 VR ad network, wants to use the strengths of VR to improve mobile advertising, not only on headsets but on smartphones as well. For now, the startup is focusing on pre-roll and… Read More
Source: TechCrunch Startup

Posted on July 18, 2018 by admin

Which is the best daily planner for busy entrepreneurs?

Keeping on track is hard. Over the years I’ve tried a number of personal information managers – PIMs, for short – from the original Palm V to my current iPhone/iCal/Vyte/phone tag method of making sure I’m in the right place at the right time. It rarely works. Something is always dropping out. An appointment added a week ago disappears while old appointments reappear on… Read More
Source: TechCrunch Startup

Posted on July 18, 2018 by admin

Ahoy.ai is a robot that can plan your schedule in Slack

Ahoy, mateys! A Columbus, Ohio-based company called Ahoy.ai aims to schedule yer deck swabbing sessions with just one arrrrrmail or Slack. Ahoy is the brainchild of Jesse Rowe and Alex Ogorek. Rowe is a senior at OSU and this is his first company. He has raised $14,000 from a small fund in Ohio and is looking to expand his idea. “Majority of our competitors still involve back and… Read More
Source: TechCrunch Startup
