Data science gives us the ability to find patterns in data and to build new data products for the greater social good, and it is generally considered ethically neutral. It does not come with its own perspective on what is correct or incorrect, or on what is good or bad in using it. While data science has no value framework of its own, organizations do have value systems in place. By asking and seeking answers to ethical questions, one can ensure that data science is used in a way that aligns with the organization’s ethics and values.
There is no doubt about it: the future will be driven by machine learning, and data science sits at the epicenter of that future. Machines are fuelled by the data they are trained on. Every advertisement, every self-driving car, and every medical diagnosis provided by a machine will be based on some data. Data ethics is a rapidly evolving field of study. Increasingly, those collecting, sharing, and working with data are examining the ethics of their methods, and in some cases they are being forced to confront those ethics in the face of public criticism. A failure to handle data ethically can severely affect people and could lead to a loss of trust in projects, products, or organizations.
Ethical challenges arise when opinions on what is right and wrong diverge. For example, should data science have the power to decide whether a defendant is released on bail? An application built on top of data, like the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) software system used in US courts, would require evaluating how the data was generated in the first place. Algorithms learn the biases contained in their training dataset. A training dataset may contain historical traces of intentional or unintentional discrimination or biased decisions, or it may be a sample from a population that does not represent everyone.
There are three main ethical challenges related to data and data science: Unfair Discrimination, Reinforcement of Human Biases, and Lack of Transparency.
Unfair Discrimination
If data reflects unfair social biases against sensitive attributes such as race or gender, then the inferences drawn from that data are likely to carry the same biases.
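One simple way to check whether decisions differ across a sensitive attribute is to compare the rate of favorable outcomes per group. The sketch below is only an illustration: the records, the field names `group` and `approved`, and the helper functions are hypothetical and do not come from any particular system.

```python
from collections import defaultdict

def selection_rates(records, group_key, outcome_key):
    """Return the share of favorable outcomes (outcome == 1) for each group."""
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for row in records:
        totals[row[group_key]] += 1
        favorable[row[group_key]] += row[outcome_key]
    return {group: favorable[group] / totals[group] for group in totals}

def disparate_impact_ratio(rates):
    """Lowest selection rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group received a favorable outcome")
    return min(rates.values()) / highest

# Hypothetical toy data: 1 = approved, 0 = denied.
decisions = (
    [{"group": "A", "approved": 1}] * 6
    + [{"group": "A", "approved": 0}] * 4
    + [{"group": "B", "approved": 1}] * 3
    + [{"group": "B", "approved": 0}] * 7
)

rates = selection_rates(decisions, "group", "approved")
ratio = disparate_impact_ratio(rates)
print(rates, ratio)
```

In this toy data group A is approved 60% of the time and group B 30% of the time, so the ratio is 0.5. A ratio well below 1.0 does not prove discrimination on its own, but it is a signal that the data or the model deserves a closer look.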
Reinforcement of Human Biases
This kind of problem may arise when computer models are used to make predictions in areas such as insurance, financial loans, and policing. If members of a certain racial group have historically been more likely to default on their loans, or more likely to be convicted of a crime, then the model can deem these people riskier. This doesn’t necessarily mean that these people actually engage in more criminal behavior or are worse at managing their money.
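A small arithmetic sketch shows how recorded outcomes can differ between groups even when behavior does not. It assumes, purely for illustration, that the recorded rate is the true rate multiplied by a detection rate; the numbers and the `recorded_rate` helper are hypothetical.

```python
def recorded_rate(true_rate, detection_rate):
    """Share of a group that ends up with a recorded offence."""
    return true_rate * detection_rate

# Hypothetical numbers: both groups behave identically,
# but group B is scrutinized twice as heavily as group A.
true_rate = 0.10
detection = {"A": 0.2, "B": 0.4}

recorded = {group: recorded_rate(true_rate, rate) for group, rate in detection.items()}

# A model trained only on recorded outcomes would see B as about twice as risky,
# although the underlying behavior is the same in both groups.
relative_risk = recorded["B"] / recorded["A"]
print(recorded, relative_risk)
```

If the model's predictions then direct even more scrutiny towards group B, the gap in the recorded data grows, which is how a historical bias can reinforce itself.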
Lack of Transparency
Transparency is required in two areas. The first is the step-by-step process, the model, and the parameters by which a prediction is made. Depending on the model used, this can become very difficult, as the exact functioning of certain models, such as neural networks, is still hard to interpret. The second is the data itself: it often remains unclear which data is being used in making a prediction. Statistical models cannot always separate the predictive power of a single variable from that of a set of related variables.
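For contrast, here is a minimal sketch of a model whose prediction can be traced step by step: a logistic-style linear score in which each feature's contribution is simply its weight times its value. The weights, the feature names `income_k` and `late_payments`, and the `explain_prediction` helper are hypothetical and chosen only for illustration.

```python
import math

def explain_prediction(weights, bias, features):
    """Return the predicted probability and each feature's additive contribution to the logit."""
    contributions = {name: weights[name] * value for name, value in features.items()}
    logit = bias + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability, contributions

# Hypothetical model parameters and applicant.
weights = {"income_k": -0.02, "late_payments": 0.8}
bias = -1.0
applicant = {"income_k": 50, "late_payments": 2}

probability, contributions = explain_prediction(weights, bias, applicant)

# List the features by how strongly they pushed the score up or down.
for name, value in sorted(contributions.items(), key=lambda item: -abs(item[1])):
    print(f"{name}: {value:+.2f}")
print(f"predicted default probability: {probability:.2f}")
```

With a model like this, one can say exactly which inputs moved the prediction and by how much. A deep neural network offers no such direct reading, which is why transparency becomes harder as models grow more complex.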
Some ethical concerns that arise when modeling a problem with data science are:
- Methods for handling sensitive data
- Uses of data science that undermine privacy
- Data selection and unintentional red-lining
- Re-inscription of existing biases
- Reducing the discrimination already present in a training dataset
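As one illustration of reducing the discrimination already present in a training dataset, the sketch below uses a technique often called reweighing: each record receives a weight equal to the expected frequency of its (group, label) pair under independence divided by its observed frequency. This is one possible approach among several, not a method prescribed by this article, and the toy data and function names are hypothetical.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight for each (group, label) pair: P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    pair_counts = Counter(zip(groups, labels))
    return {
        (g, y): (group_counts[g] / n) * (label_counts[y] / n) / (count / n)
        for (g, y), count in pair_counts.items()
    }

def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels (label == 1) within one group."""
    total = sum(weights[(g, y)] for g, y in zip(groups, labels) if g == group)
    positive = sum(weights[(g, y)] for g, y in zip(groups, labels) if g == group and y == 1)
    return positive / total

# Hypothetical toy data: group A has 6 positives out of 10, group B has 3 out of 10.
groups = ["A"] * 10 + ["B"] * 10
labels = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7

weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate(groups, labels, weights, "A")
rate_b = weighted_positive_rate(groups, labels, weights, "B")
print(weights)
print(rate_a, rate_b)
```

After reweighing, both groups have the same weighted positive rate, equal to the overall positive rate in the data. The weights can then be passed to a learner that supports per-sample weights. Note that this only balances the labels with respect to the one attribute used and does not address proxies for that attribute among the other features.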
Mitigating Malicious Attacks
- Intentional subversion of machine learning systems
- Hazards of learning from the open internet
Data ethics is firmly in the hands of data scientists, and that will not change any time soon. Having said that, data scientists, who have access to powerful tools that can analyze how people think with an eye towards affecting their behavior, do not get a single hour of ethics training in most programs. That’s a problem that needs to be fixed if we want to avoid a constant flow of questionable uses of data or models.
We will indeed need more principles as more powerful technology becomes available. Data scientists, data engineers, database administrators, and everyone else associated with controlling data should have a voice in the ethical debate about how data should be used. Organizations should openly address these difficulties in both formal and informal settings.