The Intuitive Appeal of Explainable Machines

Abstract

As algorithmic decision-making has become synonymous with inexplicable decision-making, we have become obsessed with opening the black box. This Article responds to a growing chorus of legal scholars and policymakers demanding explainable machines. Their instinct makes sense; what is unexplainable is usually unaccountable. But the calls for explanation are a reaction to two distinct but often conflated properties of machine-learning models: inscrutability and non-intuitiveness. Inscrutability makes one unable to fully grasp the model, while non-intuitiveness means one cannot understand why the model’s rules are what they are. Solving inscrutability alone will not resolve law and policy concerns; accountability relates not merely to how models work, but to whether they are justified.

In this Article, we first explain what makes models inscrutable as a technical matter. We then explore two important examples of existing regulation by explanation, as well as techniques within machine learning for explaining inscrutable decisions. We show that while these techniques might allow machine learning to comply with existing laws, compliance will rarely be enough to assess whether decision-making rests on a justifiable basis.

We argue that calls for explainable machines have failed to recognize the connection between intuition and evaluation and the limitations of such an approach. A belief in the value of explanation for justification assumes that if only a model is explained, problems will reveal themselves intuitively. Machine learning, however, can uncover relationships that are both non-intuitive and legitimate, frustrating this mode of normative assessment. If justification requires understanding why the model’s rules are what they are, we should seek explanations of the process behind a model’s development and use, not just explanations of the model itself. This Article illuminates the explanation-intuition dynamic and offers documentation as an alternative approach to evaluating machine learning models.

Full abstract and research here: http://blog.experientia.com/paper-intuitive-appeal-explainable-machines/

Read More
Can stereotype threat explain the gender gap in mathematics performance and achievement?

Men and women score similarly in most areas of mathematics, but a gap favoring men is consistently found at the high end of performance. One explanation for this gap, stereotype threat, was first proposed by Spencer, Steele, and Quinn (1999) and has received much attention. We discuss the merits and shortcomings of this study and review replication attempts. Only 55% of the articles with experimental designs that could have replicated the original results did so, and half of these were confounded by statistical adjustment of preexisting mathematics exam scores.

Read More
Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification

Joy Buolamwini (MIT Media Lab) and Timnit Gebru (Microsoft Research)

Recent studies demonstrate that machine learning algorithms can discriminate based on classes like race and gender. In this work, we present an approach to evaluate bias present in automated facial analysis algorithms and datasets with respect to phenotypic subgroups. Using the dermatologist-approved Fitzpatrick Skin Type classification system, we characterize the gender and skin type distribution of two facial analysis benchmarks, IJB-A and Adience. We find that these datasets are overwhelmingly composed of lighter-skinned subjects (79.6% for IJB-A and 86.2% for Adience) and introduce a new facial analysis dataset which is balanced by gender and skin type. We evaluate three commercial gender classification systems using our dataset and show that darker-skinned females are the most misclassified group (with error rates of up to 34.7%). The maximum error rate for lighter-skinned males is 0.8%. The substantial disparities in the accuracy of classifying darker females, lighter females, darker males, and lighter males in gender classification systems require urgent attention if commercial companies are to build genuinely fair, transparent and accountable facial analysis algorithms.

Keywords: Computer Vision, Algorithmic Audit, Gender Classification

Full Research: http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf
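For readers who want a concrete sense of what such an intersectional audit involves, here is a minimal sketch that computes per-subgroup misclassification rates from a table of predictions. The column names and toy data are illustrative assumptions, not the authors’ code or benchmarks.

```python
# A minimal sketch of an intersectional error-rate audit in the spirit of
# Gender Shades. Column names (gender, skin_type, y_true, y_pred) are
# illustrative assumptions.
import pandas as pd

def subgroup_error_rates(df: pd.DataFrame) -> pd.DataFrame:
    """Misclassification rate and sample size for each gender x skin-type subgroup."""
    df = df.assign(error=(df["y_true"] != df["y_pred"]).astype(float))
    return (
        df.groupby(["gender", "skin_type"])
          .agg(error_rate=("error", "mean"), n=("error", "size"))
          .reset_index()
          .sort_values("error_rate", ascending=False)
    )

if __name__ == "__main__":
    # Tiny made-up example, just to show the shape of the output.
    toy = pd.DataFrame({
        "gender":    ["F", "F", "M", "M", "F", "M"],
        "skin_type": ["darker", "darker", "lighter", "lighter", "lighter", "darker"],
        "y_true":    ["F", "F", "M", "M", "F", "M"],
        "y_pred":    ["M", "F", "M", "M", "F", "M"],
    })
    print(subgroup_error_rates(toy))
```

Ranking subgroups by error rate, as above, is what surfaces the kind of gap the paper reports between darker-skinned females and lighter-skinned males.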

Read More
Connotation Frames of Power and Agency in Modern Films

The framing of an action influences how we perceive its actor. We introduce connotation frames of power and agency, a pragmatic formalism organized using frame semantic representations, to model how different levels of power and agency are implicitly projected on actors through their actions. We use the new power and agency frames to measure the subtle, but prevalent, gender bias in the portrayal of modern film characters and provide insights that deviate from the well-known Bechdel test. Our contributions include an extended lexicon of connotation frames along with a web interface that provides a comprehensive analysis through the lens of connotation frames.
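As an illustration of how a connotation-frame lexicon might be applied to script data, the sketch below tallies power and agency labels of verbs by the gender of the character in the agent role. The lexicon entries and field layout are assumptions for illustration only and do not reproduce the authors’ released resource.

```python
# A hypothetical, simplified connotation-frame tally: count how often characters
# of each gender appear as agents of verbs labeled with high/low agency and
# with power over (or under) the theme. Lexicon contents are invented.
from collections import Counter

# Hypothetical lexicon: verb -> (who holds power: "agent" or "theme", agency of agent)
LEXICON = {
    "commands": ("agent", "high"),
    "decides":  ("agent", "high"),
    "implores": ("theme", "low"),
    "waits":    ("theme", "low"),
}

def tally_frames(events, lexicon=LEXICON):
    """events: iterable of (agent_gender, verb) pairs extracted from scripts."""
    counts = Counter()
    for gender, verb in events:
        if verb in lexicon:
            power, agency = lexicon[verb]
            counts[(gender, "power", power)] += 1
            counts[(gender, "agency", agency)] += 1
    return counts

if __name__ == "__main__":
    toy_events = [("F", "implores"), ("M", "commands"), ("F", "waits"), ("M", "decides")]
    for key, n in sorted(tally_frames(toy_events).items()):
        print(key, n)
```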

Read More
ETHICALLY ALIGNED DESIGN: A Vision for Prioritizing Human Well-being with Autonomous and Intelligent Systems - IEEE

Introduction

As the use and impact of autonomous and intelligent systems (A/IS) become pervasive, we need to establish societal and policy guidelines in order for such systems to remain human-centric, serving humanity’s values and ethical principles. These systems have to behave in a way that is beneficial to people beyond reaching functional goals and addressing technical problems. This will allow for an elevated level of trust between people and technology that is needed for its fruitful, pervasive use in our daily lives.

To be able to contribute in a positive, non-dogmatic way, we, the techno-scientific communities, need to enhance our self-reflection and to have an open and honest debate about our collective imaginary, our sets of explicit or implicit values, our institutions, symbols and representations.

Eudaimonia, as elucidated by Aristotle, is a practice that defines human well-being as the highest virtue for a society. Translated roughly as “flourishing,” the benefits of eudaimonia begin with conscious contemplation, where ethical considerations help us define how we wish to live. Whether our ethical practices are Western (Aristotelian, Kantian), Eastern (Shinto, Confucian), African (Ubuntu), or from a different tradition, by creating autonomous and intelligent systems that explicitly honor inalienable human rights and the beneficial values of their users, we can prioritize the increase of human well-being as our metric for progress in the algorithmic age. Measuring and honoring the potential of holistic economic prosperity should become more important than pursuing one-dimensional goals like productivity increase or GDP growth.

Read More
Automated Experiments on Ad Privacy Settings A Tale of Opacity, Choice, and Discrimination

To partly address people’s concerns over web tracking, Google has created the Ad Settings webpage to provide information about, and some choice over, the profiles Google creates on users. We present AdFisher, an automated tool that explores how user behaviors, Google’s ads, and Ad Settings interact. AdFisher can run browser-based experiments and analyze data using machine learning and significance tests. Our tool uses a rigorous experimental design and statistical analysis to ensure the statistical soundness of our results. We use AdFisher to find that the Ad Settings page was opaque about some features of a user’s profile, that it does provide some choice on ads, and that these choices can lead to seemingly discriminatory ads. In particular, we found that visiting webpages associated with substance abuse changed the ads shown but not the settings page. We also found that setting the gender to female resulted in getting fewer instances of an ad related to high-paying jobs than setting it to male. We cannot determine who caused these findings due to our limited visibility into the ad ecosystem, which includes Google, advertisers, websites, and users. Nevertheless, these results can form the starting point for deeper investigations by either the companies themselves or by regulatory bodies.
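The statistical core of this kind of audit is a randomized experiment followed by a significance test. The sketch below shows a generic two-sided permutation test on ad counts between two simulated browser groups; the data, variable names, and group setup are illustrative and are not drawn from the paper or its tooling.

```python
# A minimal sketch of a permutation test of the kind such browser experiments
# rely on: is the difference in mean ad counts between two randomly assigned
# groups larger than chance would produce? All numbers below are made up.
import numpy as np

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(group_a, float), np.asarray(group_b, float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)                              # fresh random relabeling
        diff = pooled[: len(a)].mean() - pooled[len(a):].mean()
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations

if __name__ == "__main__":
    # Hypothetical counts of a job-related ad shown to each simulated browser.
    ads_profile_female = [0, 1, 0, 0, 2, 1, 0, 1]
    ads_profile_male   = [3, 2, 4, 1, 3, 2, 5, 2]
    diff, p = permutation_test(ads_profile_female, ads_profile_male)
    print(f"observed mean difference: {diff:.2f}, permutation p-value: {p:.4f}")
```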

Read More
Unconscious bias - The Royal Society

Adapted by Professor Uta Frith DBE FBA FMedSci FRS from guidance issued to recruitment panels by the Scottish Government

Introduction

All selection and appointment panels and committees at the Royal Society should carry out their work objectively and professionally.

The Society is committed to making funding or award decisions purely on the basis of the quality of the proposed science and merit of the individual. No funding applicant or nominee for awards, Fellowship, Foreign Membership, election to a post or appointment to a committee should receive less favourable treatment on the grounds of: gender, marital status, sexual orientation, gender re-assignment, race, colour, nationality, ethnicity or national origins, religion or similar philosophical belief, spent criminal conviction, age or disability.

Read More
Accountability of AI Under the Law: The Role of Explanation

The ubiquity of systems using artificial intelligence or “AI” has brought increasing attention to how those systems should be regulated. The choice of how to regulate AI systems will require care. AI systems have the potential to synthesize large amounts of data, allowing for greater levels of personalization and precision than ever before—applications range from clinical decision support to autonomous driving and predictive policing. That said, our AIs continue to lag in common sense reasoning [McCarthy, 1960], and thus there exist legitimate concerns about the intentional and unintentional negative consequences of AI systems [Bostrom, 2003, Amodei et al., 2016, Sculley et al., 2014].

Read More
Fairer machine learning in the real world: Mitigating discrimination without collecting sensitive data

Decisions based on algorithmic, machine learning models can be unfair, reproducing biases in historical data used to train them. While computational techniques are emerging to address aspects of these concerns through communities such as discrimination-aware data mining (DADM) and fairness, accountability and transparency machine learning (FATML), their practical implementation faces real-world challenges. For legal, institutional or commercial reasons, organisations might not hold the data on sensitive attributes such as gender, ethnicity, sexuality or disability needed to diagnose and mitigate emergent indirect discrimination-by-proxy, such as redlining. Such organisations might also lack the knowledge and capacity to identify and manage fairness issues that are emergent properties of complex sociotechnical systems. This paper presents and discusses three potential approaches to deal with such knowledge and information deficits in the context of fairer machine learning. Trusted third parties could selectively store data necessary for performing discrimination discovery and incorporating fairness constraints into model-building in a privacy-preserving manner. Collaborative online platforms would allow diverse organisations to record, share and access contextual and experiential knowledge to promote fairness in machine learning systems. Finally, unsupervised learning and pedagogically interpretable algorithms might allow fairness hypotheses to be built for further selective testing and exploration.

Read More
Deep neural networks are more accurate than humans at detecting sexual orientation from facial images.

We show that faces contain much more information about sexual orientation than can be perceived and interpreted by the human brain. We used deep neural networks to extract features from 35,326 facial images. These features were entered into a logistic regression aimed at classifying sexual orientation. Given a single facial image, a classifier could correctly distinguish between gay and heterosexual men in 81% of cases, and in 74% of cases for women. Human judges achieved much lower accuracy: 61% for men and 54% for women. The accuracy of the algorithm increased to 91% and 83%, respectively, given five facial images per person. Facial features employed by the classifier included both fixed (e.g., nose shape) and transient facial features (e.g., grooming style).

Read More
Equality of Opportunity in Supervised Learning

We propose a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features. Assuming data about the predictor, target, and membership in the protected group are available, we show how to optimally adjust any learned predictor so as to remove discrimination according to our definition. Our framework also improves incentives by shifting the cost of poor classification from disadvantaged groups to the decision maker, who can respond by improving the classification accuracy. 
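One widely cited instance of the paper’s criterion is “equal opportunity”: true positive rates should match across protected groups. The sketch below only measures the gap between group-wise true positive rates on held-out predictions; the paper goes further and shows how to post-process a predictor to close that gap. Variable names and the toy data are illustrative.

```python
# A minimal sketch of an equal-opportunity check: compare true positive rates
# (TPR) across protected groups. A gap of 0.0 means the criterion is satisfied.
import numpy as np

def true_positive_rate(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    positives = y_true == 1
    return (y_pred[positives] == 1).mean() if positives.any() else float("nan")

def equal_opportunity_gap(y_true, y_pred, group):
    """Per-group TPRs and the largest pairwise TPR difference."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tprs = {g: true_positive_rate(y_true[group == g], y_pred[group == g])
            for g in np.unique(group)}
    values = list(tprs.values())
    return tprs, max(values) - min(values)

if __name__ == "__main__":
    # Tiny made-up labels, predictions, and group memberships.
    y_true = [1, 1, 0, 1, 1, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
    group  = ["a", "a", "a", "a", "b", "b", "b", "b"]
    per_group, gap = equal_opportunity_gap(y_true, y_pred, group)
    print(per_group, "gap:", round(gap, 3))
```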

Read More
GENDER BIAS IN ADVERTISING

In 2017, discussions around gender and media have reached a fever pitch. Following a bruising year at the ballot box, fourth-wave feminism has continued to expand. From the Women’s March to high-profile sexual harassment trials to the increasing number of female protagonists gaining audience recognition in an age of “peak TV,” women are ensuring that their concerns are heard and represented.

We’ve seen movements for gender equality in Hollywood, in Silicon Valley — and even on Madison Avenue. In response to longstanding sexism in advertising, industry leaders such as Madonna Badger are highlighting how objectification of women in advertising can lead to unconscious biases that harm women, girls and society as a whole.

Agencies are creating marquee campaigns to support women and girls. The Always #LikeAGirl campaign, which debuted in 2014, ignited a wave of me-too “femvertising” campaigns: #GirlsCan from Cover Girl, “This Girl Can” from Sport England and the UK’s National Lottery, and a spot from H&M that showcased women in all their diversity, set to “She’s a Lady.” Cannes Lions got in on the act in 2015, introducing the Glass Lion: The Lion for Change, an award to honor ad campaigns that address gender inequality or prejudice.

But beyond the marquee case studies, is the advertising industry making strides toward improving representation of women overall? How do we square the surge in “femvertising” with insights from J. Walter Thompson’s Female Tribes initiative, which found in 2016 that, according to 85% of women, the advertising world needs to catch up with the real world?

Read More
Evidence That Gendered Wording in Job Advertisements Exists and Sustains Gender Inequality

Women continue to remain underrepresented in male-dominated fields such as engineering, the natural sciences, and business. Research has identified a range of individual factors such as beliefs and stereotypes that affect these disparities but less is documented around institutional factors that perpetuate gender inequalities within the social structure itself (e.g., public policy or law). These institutional factors can also influence people’s perceptions and attitudes towards women in these fields, as well as other individual factors.

Read More
Algorithmic Accountability Reporting: On the Investigation of Black Boxes

How can we characterize the power that various algorithms may exert on us? And how can we better understand when algorithms might be wronging us? What should be the role of journalists in holding that power to account? In this report I discuss what algorithms are and how they encode power. I then describe the idea of algorithmic accountability, first examining how algorithms problematize and sometimes stand in tension with transparency.

Read More
AlgorithmWatch: What Role Can a Watchdog Organization Play in Ensuring Algorithmic Accountability?

In early 2015, Nicholas Diakopoulos’s paper “Algorithmic Accountability Reporting: On the Investigation of Black Boxes” sparked a debate in a small but international community of journalists, focusing on the question of how journalists can contribute to the growing field of investigating automated decision-making (ADM) systems and holding them accountable to democratic control. This started a process in which a group of four people, consisting of a journalist, a data journalist, a data scientist and a philosopher, began thinking about what kind of means were needed to increase public attention for this issue in Europe. It led to the creation of AlgorithmWatch, a watchdog and advocacy initiative based in Berlin.

Read More
The Effects of Cognitive Biases and Imperfectness in Long-term Robot-Human Interactions: Case Studies using Five Cognitive Biases on Three Robots

The research presented in this paper demonstrates a model for aiding human-robot companionship based on the principle of applying ‘human’ cognitive biases to a robot. The aim of this work was to study how cognitive biases can affect human-robot companionship over the long term. We show comparative results of experiments using five biased algorithms on three different robots: ERWIN, MyKeepon and MARC.

Read More
GloVe: Global Vectors for Word Representation

Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.
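The “global log-bilinear regression” the abstract describes reduces to a weighted least-squares objective evaluated only over the nonzero co-occurrence counts. Here is a minimal sketch of that objective; the tiny counts and vectors are illustrative, while the x_max and alpha values follow the defaults reported in the paper.

```python
# A minimal sketch of the GloVe weighted least-squares loss over nonzero
# co-occurrence entries: f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2.
import numpy as np

def glove_loss(W, W_context, b, b_context, cooccurrences, x_max=100.0, alpha=0.75):
    """cooccurrences: iterable of (i, j, X_ij) for nonzero entries only."""
    total = 0.0
    for i, j, x_ij in cooccurrences:
        weight = min((x_ij / x_max) ** alpha, 1.0)     # f(X_ij): caps very frequent pairs
        diff = W[i] @ W_context[j] + b[i] + b_context[j] - np.log(x_ij)
        total += weight * diff ** 2
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, d = 5, 8                                        # toy vocabulary size, vector dimension
    W, Wc = rng.normal(size=(V, d)), rng.normal(size=(V, d))
    b, bc = np.zeros(V), np.zeros(V)
    nonzero = [(0, 1, 12.0), (1, 2, 3.0), (3, 4, 40.0)]  # made-up (i, j, count) triples
    print(glove_loss(W, Wc, b, bc, nonzero))
```

Training the real model means minimizing this loss with stochastic gradient updates over all nonzero entries, which is why it scales with the number of observed co-occurrences rather than with corpus size.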

Read More
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender neutral words are shown to be linearly separable from gender definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. We define metrics to quantify both direct and indirect gender biases in embeddings, and develop algorithms to "debias" the embedding. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving its useful properties such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.
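The geometric idea is concrete enough to sketch: once a gender direction has been identified, a gender-neutral word vector can be “neutralized” by removing its component along that direction. The sketch below shows only that step, using a crude single-pair direction and random stand-in vectors; the paper estimates the direction from several definitional pairs and also applies an additional “equalize” step not shown here.

```python
# A minimal sketch of the "neutralize" step: project a word vector onto the
# subspace orthogonal to an identified bias direction. Vectors here are random
# stand-ins, not real embeddings.
import numpy as np

def neutralize(word_vec, bias_direction):
    """Remove the component of word_vec along bias_direction."""
    g = bias_direction / np.linalg.norm(bias_direction)
    return word_vec - (word_vec @ g) * g

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    she, he = rng.normal(size=50), rng.normal(size=50)   # stand-in definitional pair
    receptionist = rng.normal(size=50)                   # stand-in gender-neutral word
    gender_direction = she - he
    g_unit = gender_direction / np.linalg.norm(gender_direction)
    debiased = neutralize(receptionist, gender_direction)
    print("component along gender direction before:", receptionist @ g_unit)
    print("component along gender direction after: ", debiased @ g_unit)
```

After neutralization the projection onto the gender direction is zero (up to floating-point error), which is exactly the property the paper’s direct-bias metric measures.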

Read More
Penn Psychologists Tap Big Data, Twitter to Analyze Accuracy of Stereotypes

What’s in a tweet? People draw conclusions about us, from our gender to education level, based on the words we use on social media. Researchers from the University of Pennsylvania, along with colleagues from the Technical University of Darmstadt and the University of Melbourne, have now analyzed the accuracy of those inferences. Their work revealed that, though stereotypes and the truth often aligned, with people making accurate assumptions more than two-thirds of the time, inaccurate characterizations still showed up.

Read More