Responsible AI in Consumer Enterprise
A framework to help organizations operationalize ethics, privacy, and security as they apply machine learning and artificial intelligence
Responsible AI in Consumer Enterprise | 2
Acknowledgements
This framework has benefited from input and feedback from a diverse group of enterprise executives, product designers, machine learning scientists, security engineers, lawyers, academics, and community advocates. The first draft was released for peer review at the 2018 RightsCon conference in Toronto.
We’d like to extend a special thanks to the following individuals and institutions for supporting our efforts and providing helpful comments to make the framework as useful and accurate as possible:
ALTIMETER INSTITUTE
Susan Etlinger
CORNELL TECH
Helen Nissenbaum
CORUS MEDIA
Anne Harrop
Jane Harrison
GEORGIAN PARTNERS
Jason Brenier
Nick Chen
Parinaz Sobhani
Jon Prial
GOOGLE
Kevin Swersky
INFORMATION AND PRIVACY COMMISSIONER OF ONTARIO
David Weinkauf
LOBLAW DIGITAL
Richard Downe
MCCARTHY TÉTRAULT LLP
Adam Goldenberg
Carole Piovesan
MICROSOFT
Andree Gagnon
OSLER, HOSKIN & HARCOURT LLP
Adam Kardash
Patricia Kosseim
PRIVACY BY DESIGN CENTRE OF EXCELLENCE, RYERSON UNIVERSITY
Ann Cavoukian
RIGHTSCON
Brett Solomon
Melissa Kim
SCOTIABANK
Mike Henry
Daniel Moore
Michael Zerbs
Dubie Cunningham
Veeru Ramaswamy
TELUS
Pam Snively
Elena Novas
THE ENGINE ROOM
Alix Dunn
VECTOR INSTITUTE AT THE UNIVERSITY OF TORONTO
Richard Zemel
Cameron Schuler
Frank Rudzicz
Marc-Etienne Brunet
Elliot Creager
Jesse Bettencourt
Will Grathwohl
Data has become a business-critical asset, and organizations across all sectors are recharacterizing themselves as “data companies.” There is an infinite opportunity for organizations to effectively leverage and unlock the value inherent in their data repositories. Companies that deploy artificial intelligence to derive meaningful insights from their data holdings will be the successful innovators of tomorrow. But to achieve true success, organizations must implement the guardrails needed for responsible data use, as the long-term sustainability of any enterprise is predicated on trust. For data companies, the respectful and ethical treatment of data has become a core feature of any trust model.
ADAM KARDASH
Chair, Privacy and Data
Management, Co-Leader of
AccessPrivacy, by Osler
PATRICIA KOSSEIM
Counsel, Privacy and Data
Management, Co-Leader of
AccessPrivacy, by Osler
The concept of data ethics is still in its formative stages and requires active, informed, and multi-stakeholder discussion. Integrate.ai should be commended for developing this framework, which will help facilitate a structured conversation about the ethical considerations and broader economic and social impacts of AI data initiatives.
Foreword
The field has been around for a long time, but a phase shift has occurred over the past five years thanks to faster computation, smarter algorithms, and, most importantly, exponential growth in data.
The subfield of AI having the greatest impact in the enterprise is machine learning, software systems that learn from data and experience. As Amazon CEO Jeff Bezos said in his 2017 letter to shareholders: “Over the past decades computers have broadly automated tasks that programmers could describe with clear rules and algorithms. Modern machine learning techniques now allow us to do the same for tasks where describing the precise rules is much harder.”
But machine learning does more than automate existing business processes: it changes how businesses form and strengthen relationships with customers. Using data and machine learning, businesses can turn every interaction into an opportunity to learn what people want and value. At a macro level, machine learning can optimize margins, directing spend and human resources to those customers where outreach and engagement would lead to the highest return.
There are risks. AI requires that enterprises use customer data in new ways, expanding responsibilities to customers to include appropriate data use. People feel shocked when they learn that their sensitive information was leaked; suspicious when they sense businesses want to manipulate their behavior; powerless when an automated system denies them a product without any explanation why. Trust is not a constant: it is earned over years and lost in an instant.
Executive leadership is ultimately responsible for striking the right balance between business risk (both legal and reputational) and opportunity. Leaders need a clear mental model for what AI can and cannot do, and a means to effectively arbitrate between business and risk stakeholders to make the right decisions for the business. This is increasingly difficult in a world where major technology advances like AI challenge existing decision-making models.
This framework presents the privacy, security, and ethics choices businesses face when using machine learning on consumer data. It breaks things down into the various small decisions teams need to make when building a machine learning system. It is an agile approach to ethics and risk management that aligns with agile software development practices. Businesses waste time if governance or ethics reviews start after systems are built. When done well, accountability quickens rather than slows innovation: business and risk teams need to make contextual calls about what constraints are required when, and clearly define desired business outcomes. The scientists’ job is to apply the best algorithms to optimize for these goals. There’s no silver bullet. Contextual judgment calls early on can move mountains.
This framework is neither a regulatory compliance compendium nor an exhaustive list of risk management controls. It is a tool to help businesses applying AI to think about ethics and risk contextually. It provides detailed insights for implementation teams and high-level questions for executive leadership.
Expanding governance to include ethics can change employee mindsets towards governance and compliance. Ethics ignite values and empathy, the things that make us human and motivate us to do good work. Sustainable innovation means incentivizing risk professionals to act for quick business wins and showing business leaders why fairness and transparency are good for business. Building for accountability will force cross-functional teams to empathize with one another and communicate better. This alone will be a win.
Artificial intelligence (AI) may be the biggest and most disruptive technology advance we see in our lifetimes.
Executive Summary
Operationalizing ethics starts with breaking down how machine learning systems are built and how they work. The framework uses the different steps in the system-development process as its organizing principle and localizes ethics and governance questions so they can be addressed quickly and in parallel with agile development.

Deciding to cut a project early on because it is too risky or poses ethical concerns frees teams up to focus on other things (and practice their values in the process). Business, risk, science, and technical teams need to communicate continuously to ensure scientists optimize for the right set of constraints and goals and business teams understand what’s possible and what’s not possible. Doing ethics up front can open up the creative potential of your business.
Guiding Principles
The framework starts with our guiding principles, the
intuitions everyone in your business, including executive
management, should internalize to inform risk-based
thinking and ethical decisions.
People, Processes, and Communication
Next come dos and don’ts about people, processes, and
communication recommended to make ethics efforts
successful. Use this as a checklist to think about your
team and organizational structure.
How Machine Learning Systems Work
After is an overview of how machine learning systems
work and some common machine learning applications
in consumer enterprise. Use this like a glossary to align
on definitions and level set expectations.
Framework Summary
A framework summary follows, breaking down
the different steps in the machine learning system
development process and indicating the jobs to be
done and ethical questions to be considered at each
step.1 Rely on this table as your legend, map, and guide.
Some readers may only ever use this table.
Context at Each Phase
The body of the document provides further questions
and additional context at each phase in the machine
learning system development process. Privacy, security,
fairness, explainability, and transparency issues are
considered at each phase, including anecdotes and
examples. The framework is systematic, but if a given
category (e.g., security) is not relevant at a given phase, it
is left out. Comprehensive guidance on security, privacy,
compliance, or legal risk management issues is out of
scope, but footnotes include references to additional
resources. Use this to think deeper about a particular
topic and to guide questions and decisions.
This framework is designed to operationalize ethics in machine learning systems.
1 This is a high-level outline designed to focus attention on ethics and risk. For further information about machine learning workflows, we recommend the Georgian Partners Principles of Applied Artificial Intelligence white paper and the O’Reilly Development Workflows for Data Scientists eBook.
The content in this document does not constitute legal advice. Readers seeking to comply with privacy or data protection regulations are advised to seek specific legal advice from lawyers.
How to Use This Framework
This framework is not a list of procedures and controls. It’s a tool to think critically about privacy, security, and ethics. It will help you ask the right questions to the right people at the right time, but you will have to assess risks and make tradeoffs. It’s helpful for everyone in the business to share the same intuitions around what matters.
Responsible AI Principles:

• In standard practice, machine learning assumes the future will look like the past. When the past is unfair or biased, machine learning will propagate these biases and enhance them through feedback loops. If you want the future to look different from the past, you need to design systems with that in mind. You can’t just let the data guide you. Executive leadership should decide what future outcomes the business wants to achieve. These include fairness, not just profit.
• The outcomes businesses want to optimize for are often hard to measure or occur far in the future (e.g., customer lifetime value). Businesses therefore resort to easier-to-measure proxies that stand in for desired outcomes. Be clear about what these proxies do and don’t optimize. You may learn they exacerbate bias or have downstream consequences that conflict with values or goals.
• All of your customers are individuals. Representing them as data points necessarily transforms them from people into abstractions. When you deal with abstractions and groupings, you run the risk of treating humans unethically.
• Beware of correlations that mask sensitive data behind benign proxies. For example, postal code/zip code is often a proxy for ethnic background. If your machine learning system uses location in decisions, you may end up treating different ethnic groups differently.
• Context is key for explainability and transparency. Systems that decide who gets a credit card or loan require more scrutiny than systems that personalize marketing offers. Business and risk teams should assess context and communicate required constraints to technology teams.
• Privacy is not just about personal data, notices or consent forms, or a set of controls to minimize data use. It is about appropriate data flows that conform to social norms and expectations. Map these flows and ask if people would be surprised to learn how their data has been used.
• Accountability is a marathon, not a sprint. Once in production, machine learning systems often make errors on populations that are less well represented in training data. Develop a plan to catch and fix these errors. “Govern the optimizations. Patrol the results.”2
• There is no silver bullet to responsible AI. It takes critical thinking and teamwork. Step outside the walls of your organization and ask communities and customers what matters to them.
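The proxy concern in the principles above can be made concrete with a small check. The sketch below, using invented postal prefixes and sensitive-group labels, measures how strongly a seemingly benign feature predicts a sensitive attribute; a majority share near 1.0 flags a strong proxy. In practice you would run something like this against your own training data and labels.

```python
# A simple proxy check: how well does a "benign" feature (postal prefix)
# predict a sensitive attribute that was excluded from the model?
# The records below are invented for illustration.
from collections import Counter, defaultdict

records = [  # (postal_prefix, sensitive_group)
    ("M5V", "A"), ("M5V", "A"), ("M5V", "A"), ("M5V", "B"),
    ("K1A", "B"), ("K1A", "B"), ("K1A", "B"), ("K1A", "A"),
]

by_prefix = defaultdict(Counter)
for prefix, group in records:
    by_prefix[prefix][group] += 1

# For each prefix, the share of its most common group. Values near 1.0 mean
# the prefix almost determines group membership, i.e. it is a strong proxy.
for prefix, counts in sorted(by_prefix.items()):
    majority_share = max(counts.values()) / sum(counts.values())
    print(prefix, round(majority_share, 2))  # 0.75 for each prefix here
```

A share of 0.75 already means location carries most of the sensitive signal; a model given the prefix can discriminate by group without ever seeing the group label.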
2 Weinberger, David. “Optimization over Explanation: Maximizing the benefits of machine learning without sacrificing its intelligence.” https://medium.com/berkman-klein-center/optimization-over-explanation-41ecb135763d
Guiding Principles
Enterprises don’t change overnight. The only way to build an ethics muscle is to execute a project large enough to matter (not a research project) and small enough to permit learning without disruption (not a two-year data lake migration). Practice will reveal what works and what doesn’t work for your culture.

Here are some dos and don’ts for people, processes, and communication to execute ethics in practice:
People, Processes, and Communication
DO
Business and risk teams learn-by-doing through work on a machinelearning project
Put a privacy stakeholder on a machine learning project working closely with development teams. Seek vendors or consultants who will push your team to learn and grow their skills.
Engage cross-functional teams in ethics decisions throughoutsystem development
Business goals and ethics checks should guide technical choices;technical feasibility should influence scope and priorities;executives should set the right incentives and arbitratestalemates.
Use risk-based, contextual thinking to evaluate and prioritizetechnical controls
You can’t do everything at once. Prioritize time and effortsbased on how significant consequences to your business and yourcustomers would be if something went wrong.
Include diverse and external points of view in ethicsdecisions
You minimize risk and find new opportunities by engagingcustomers and communities who will be affected by machine learningmodels.
Insist that technical stakeholders communicate risks in plainlanguage
Not everyone will be a scientist or security expert. Clearcommunication is required for critical thinking and judgment callsto take place.
DON’T
Wait to recruit talent with expertise in ethics and risk relatedto machine learning
There are fewer business professionals with expertise in AI than highly-coveted technical researchers. Delaying ethics until you find the right talent will stall innovation.
Delegate an ethics review to a fixed part of the organization atthe end of a project
Engaging unbiased reviewers without conflicts of interest is a good idea, but for ethics to live and breathe it must be an ongoing priority for the teams that create the systems as well.
Use one-size-fits-all checklists or rely on vague policies orprinciples
Privacy and ethics risks vary between applications. Granular analysis and critical thinking are required for ethics in practice.
Appoint an ethics committee that only includes executiveleadership
This removes accountability from teams. Leadership can arbitrate decisions and set policies, but decentralized accountability is key.
Grant functional teams the authority to stop progress withoutexplaining why
With ambiguity, it’s easier to say no than yes and use jargon tocover uncertainty. Risks should be tied back to business goals.
Agile ethics is a process to operationalise values, iteratively identify and address ethical challenges of your innovations before you send them out into the world, and adaptively refine processes so that, as the capabilities of technology evolve, so does your ability to diagnose and prevent harm before it happens.

Agile ethics is the explicit application of agile methods to ethical assessment, adaptation, and learning that allows for a team to mature its practices as it works at the bleeding edge. It employs agile methods to tackle ethical challenges, and inculcates ethical approaches within an agile development process.
It is not:
• Delegating ethical analysis to a fixed part of an organisation(compliance, research, corporate social responsibility, orotherwise)
• Prescribing a fixed approach to ethics based on vague values(policies won’t cut it)
• Encouraging your staff to reflect on ethics without designingmethods or allocating resources to aid them in the process
• A bandaid for underlying problems with governance andleadership
Agile ethics requires four things:
• Decentralisation of critical, ethical thinking in anorganisation or team
• Iterative development of process that supports decentralisedconsideration of the implications of any given idea
• Dedicated staff time to manage the process
• Inclusion of diverse and (when appropriate) externalvoices
ALIX DUNN
Executive Director, The Engine Room
The Engine Room is a UK-based firm that helps activists,organisations, and other social change agents make the most of dataand technology.
How Machine Learning Systems Work
Before you start addressing the ethics and risks of machine
learning, it helps if everyone shares a common understanding
of what machine learning systems do and how they work.
This doesn’t mean that everyone needs to become a machine
learning scientist and grasp the nuances of different algorithms.
They just need some grounded intuitions to ask good questions.
Machine learning systems create useful mappings between inputs and outputs. The mappings, called models, are mathematical functions, equations of the form y = mx + b, where x is an input and y is an output (just that the equations can be much more complicated!). You could use hand-written rules to define those mappings, but rules take a lot of time to write and usually don’t handle a lot of cases.3 With machine learning, computer programmers no longer write and update the mappings between inputs and outputs: computers learn these mappings from data. So, in y = mx + b, the computer learns what values “m” and “b” should be after seeing lots of x’s and y’s. When the system is presented with new inputs it hasn’t seen before, it uses the mappings it’s learned to make a useful guess about the corresponding output. These mappings aren’t certain, and they don’t always generalize perfectly to new data.
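A toy example makes the idea of “learning m and b from data” concrete. In this sketch the five (x, y) pairs and the underlying y = 2x + 1 rule are invented for illustration; the learner sees only the pairs, recovers the parameters by least squares, and then applies the learned mapping to an input it has never seen.

```python
# Learning the mapping y = mx + b from examples rather than writing it by hand.
import numpy as np

# Toy training data, secretly generated from the rule y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

# Fit a degree-1 polynomial (a straight line) to the examples.
m, b = np.polyfit(x, y, deg=1)
print(round(m, 2), round(b, 2))  # 2.0 1.0 — the learned parameters

# Use the learned mapping to guess the output for an unseen input.
new_x = 10.0
print(m * new_x + b)  # 21.0
```

Real models replace the straight line with far more complicated functions and fit millions of parameters, but the principle is the same: the parameters come from data, not from a programmer.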
Most machine learning applications boil down to making a prediction about the future (How likely is it that this individual will become a profitable customer?) or classifying data into useful categories (Is this email spam? Is this cell phone stationary or in transit?). Emerging systems that do things like schedule meetings, make phone calls, or write emails on our behalf can produce a range of possible outputs rather than just an output with a clear right or wrong answer (like correctly saying what object is in an image) or a strict binary decision (Will this customer churn or not?).
Common machine learning applications in consumer enterprise
Recommendation Systems: Compare actions of consumers to infer similar taste or suggest affinity between consumers and products based on attributes and actions

Audience Segmentation: Separate consumers into groups that look like one another in a way that is relevant for marketing or product performance

Personalization: Modify the experience of a product, marketing message, or channel to best resonate with a consumer at a scale too large for human teams to execute

Chatbots: Help customers answer questions, resolve problems, or identify the right product mix to redirect human resources to higher-value interactions that require judgment

Risk Assessments: Modify offer and pricing on an insurance or banking product according to predicted risk or likelihood to default

Anomaly Detection: Identify a shift in customer behavior that could signal opportunity for upsell or risk of churn, or a shift in network or system behavior that could signal malicious activity

Anti-money Laundering and Compliance: Identify suspicious behavior or attributes and automate compliance reporting workflows using natural language generation

Data Products: Use algorithms to identify useful insights about consumer behavior that are packaged and sold to other businesses for targeted marketing
3 The game of Go, for example, has more than 10⁸⁰ possible game outcomes. That is more possible games than there are atoms in the universe! It would take a prohibitively long amount of time for a computer programmer to encode all the possible combinations by hand; machine learning systems can learn useful game strategies from data of past human players (or, in the recent AlphaGo Zero system, through iterative self-play).
When machine learning is incorporated into a business process, businesses must design how to transform a model output into an action. Feedback loops happen when businesses keep track of the difference between expected versus actual outcomes and use this difference to improve prediction accuracy over time.
Consider the following example. Kanetix.ca, an online insurance aggregator, uses the Integrate.ai platform to guess how likely someone is to purchase an insurance product and then surface incentives in real time to customers who could use a nudge. Their business goal is to focus marketing spend where it will have the most impact: that is, on customers who are decently likely to buy but who aren’t entirely sure.
The input is information about a customer, web behavior, and information the customer enters into forms (like household size, age, kind of car, etc.). The input is sent to the machine learning model, which has been trained on historical data about customer purchases, defining mappings between customer attributes and purchases. The model outputs a score—a number in the range of 0 to 1—of the customer’s likelihood to convert. This score is then resolved into an action: do we surface an incentive or not? The system keeps track of whether the expected outcome (guess that the person will convert) materialized as the actual outcome (data that the person converted) and updates the model with this new information.
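The score-to-action step can be sketched in a few lines. The band edges below (0.4 and 0.9) are invented for illustration; where the band sits is a contextual business decision, not a technical one.

```python
# Resolving a conversion-likelihood score in [0, 1] into an action.
def choose_action(conversion_score: float) -> str:
    # Below 0.4: unlikely to buy, so an incentive is wasted spend.
    # Above 0.9: will probably buy anyway, so no nudge is needed.
    # In between: decently likely but not sure, the nudge zone.
    if 0.4 <= conversion_score < 0.9:
        return "surface incentive"
    return "no incentive"

print(choose_action(0.75))  # surface incentive
print(choose_action(0.15))  # no incentive

# Closing the feedback loop: log the score, the action taken, and the
# actual outcome so expected-vs-actual differences can drive retraining.
outcome_log = [{"score": 0.75, "action": choose_action(0.75), "converted": True}]
```

The log of expected versus actual outcomes is what turns a one-off prediction into a system that improves in production.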
Ethics and risk questions arise across the machine learning system workflow. Say your system automates decisions on granting people housing loans. Does your historical data create a mapping that frequently denies loans to black people? Could a malicious attacker reverse engineer your model to access sensitive personal data? Can you explain why you denied someone a loan? If the score changes over time, can you reconstruct old mappings that have been updated and replaced?
The rest of this framework examines these questions in detail, breaking them down according to the different tasks that go into building a machine learning system.
[Diagram: the machine learning system workflow in four steps — (1) build the model, (2) translate prediction into action, (3) program rules into an API, (4) score and learn. A conversion likelihood of .75 resolves into one of two actions in a personalized web interface: give the individual an incentive, or do not. The system then checks whether the actual outcome met the expected outcome, and the model is retrained based on results. Incentives are costly: optimizing their effectiveness focuses spend on those individuals for whom an incentive will change behavior.]
The Responsible AI Framework
ML system development process
Feedback loop: the model improves once it is in production
Framework Summary
The responsible AI framework breaks down the steps used to build a machine learning system and highlights privacy, security, and ethics questions teams should consider at each step. Inspired by Privacy by Design, it is characterized by proactive rather than reactive measures to privacy and ethics, and embeds critical thinking and controls into the design and architecture of machine learning systems. View this as an agile process with multiple iterations and decision points, not a waterfall process that plans everything in advance. You may discover you want to cut a project because you lack sufficient training data, require greater certainty to foster adoption, or have identified ethical concerns. Learn that quickly and free up resources to do something else. Remember that cross-functional teams should participate in most meetings (or at least have regular check-ins) throughout the process, in particular during the scoping phase.
Step · Jobs to Be Done · Risk & Ethics Questions

1 Problem Definition & Scope
Jobs to be done:
• Map current business process
• Identify where machine learning system adds value or alters process
• Define inputs, outputs, and what you are optimizing for
• Measure baseline performance and expected lift
Risk & ethics questions:
• How could your system negatively impact individuals? Who is most vulnerable and why?
• How much error in predictions can your business accept for this use case?
• Will you need to explain which input factors had the greatest influence on outputs?
• Do you need personally identifiable information (PII) or can you provide group-level insights?

2 Design
Jobs to be done:
• Analyze a user flow to understand how data is collected and where users hesitate on what to input
• Decide whether this will be a fully-automated or human-in-the-loop system
• Interview users and apply human-centric principles to understand their experience
• Design how model outputs will translate into insight or action for internal users/external consumers
Risk & ethics questions:
• How can you make data collection procedures transparent to consumers?
• Will the formats you use to collect data alienate anyone?
• How will you enable end users to control use of their data?
• Should you make it clear to users when they engage with a system and not a human?

3 Data Collection & Retention
Jobs to be done:
• Conduct a data census to identify what data you have and what data you need
• Procure second- and third-party data sets
• Align machine learning training needs with data retention schedule
Risk & ethics questions:
• How will you manage the provenance of third-party data?
• Who are the underrepresented minorities in your data set?
• If a vendor processes your data, have you ensured it has appropriate security controls?

4 Data Processing
Jobs to be done:
• Format and process the data to prepare it for machine learning algorithms
• Pair subject matter experts with scientists to help understand data and features that matter for predictions
Risk & ethics questions:
• Have you de-identified your data and taken measures to reduce the probability of re-identification?
• Will socially sensitive features like gender or ethnic background influence outputs?
• Are seemingly harmless features like location hiding proxies for socially sensitive features?

5 Model Prototyping & QA Testing
Jobs to be done:
• Experiment with various algorithms to verify the problem can be solved and select the approach that performs best
• Test model performance on a reserved test data set to verify functionality beyond the training set
Risk & ethics questions:
• Does your use case require a more interpretable algorithm?
• Should you be optimizing for a different outcome than accuracy to make your outcomes fairer?
• Is it possible that a malicious actor has compromised training data and created misleading results?
• Can a malicious actor infer information about individuals from your system?

6 Deployment, Monitoring & Maintenance
Jobs to be done:
• Integrate model outputs into business process
• Capture data on outcomes and provide feedback to the system
• Define model retraining frequency (batch or real-time) and how scientists evaluate future model changes
• Monitor system for failures or bugs and update code regularly
• Measure and report on results
Risk & ethics questions:
• Are you able to identify anomalous activity on your system that might indicate a security breach?
• Do you have a plan to monitor for poor performance on individuals or subgroups?
• Do you have a plan to log and store historical predictions if a consumer requests access in the future?
• Have you documented model retraining cycles, and can you confirm that a subject’s data has been removed from models?
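Two of the jobs listed for prototyping and deployment, testing on a reserved data set and monitoring performance per subgroup, can be sketched together. Everything below (the synthetic records, the "urban"/"rural" subgroup, and the stand-in model) is invented for illustration.

```python
# Hold out a reserved test set the model never trains on, then report
# accuracy per subgroup rather than a single overall number.
import random
from collections import defaultdict

random.seed(0)
# (feature, subgroup, true_label) — synthetic records for illustration.
data = [(random.random(), random.choice(["urban", "rural"]), random.randint(0, 1))
        for _ in range(200)]

random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]  # the test set stays untouched during training

def trivial_model(feature):
    return 1 if feature > 0.5 else 0      # stand-in for a trained model

correct = defaultdict(int)
total = defaultdict(int)
for feature, subgroup, label in test:
    total[subgroup] += 1
    correct[subgroup] += int(trivial_model(feature) == label)

# A single overall accuracy can hide a subgroup the model fails badly on.
for subgroup in sorted(total):
    print(subgroup, round(correct[subgroup] / total[subgroup], 2))
```

Reporting the breakdown per subgroup is what surfaces the "poor performance on individuals or subgroups" risk the table asks about; an aggregate metric alone would mask it.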
Like any initiative, machine learning projects start with ideation and project evaluation, including assessments of technical feasibility, scope, desired outcomes, and projected return on investment. Don’t underestimate the importance of this work: there’s a fallacy in thinking that being data-driven starts with finding insights in data. It starts with the thinking that goes into defining a rigorous hypothesis that can be explored using mathematical models. Subject matter experts bring valuable information to the table and can import what they know into system and process design to get to results faster. Machine learning systems are tools to optimize against a set of defined outcomes; it’s up to humans to define which outcomes to optimize for.
While an AI ethics assessment may seem like an entirely new process, ethics simply refers to norms of behavior within a product or service. As a result, AI ethics assessments should focus on the implications of machine learning on decision making, KPIs, transparency, trust, and ultimately the customer experience as a whole.
Many enterprise machine learning applications will not raise new privacy issues (e.g., automating contract due diligence with natural language processing). Applications that collect data directly from consumers should be subject to a privacy review. Technical stakeholders should opine on whether the system needs granular, personally identifiable information (PII) to function optimally, and how system performance would be impacted if PII were replaced with aggregates. For example, this might entail a tradeoff between offers that are personalized to each individual versus offers tailored to consumer segments that share common attributes.
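The individual-versus-aggregate tradeoff described above can be sketched as a simple pre-processing step. The customer records and segment names below are invented for illustration; the point is that the model downstream sees only segment-level averages, never a person’s record.

```python
# Replacing person-level records with segment-level aggregates before
# they reach the model: less personalization, far less PII in the pipeline.
from statistics import mean

customers = [
    {"id": 1, "segment": "young-urban", "spend": 120.0},
    {"id": 2, "segment": "young-urban", "spend": 80.0},
    {"id": 3, "segment": "suburban-family", "spend": 300.0},
    {"id": 4, "segment": "suburban-family", "spend": 260.0},
]

# Group individual spend values by segment.
segments = {}
for c in customers:
    segments.setdefault(c["segment"], []).append(c["spend"])

# The model is handed only these averages, not the customer records.
aggregates = {seg: mean(vals) for seg, vals in segments.items()}
print(aggregates)  # {'young-urban': 100.0, 'suburban-family': 280.0}
```

Whether this level of aggregation costs too much predictive performance is exactly the question the text says technical stakeholders should opine on.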
Conduct a privacy impact assessment (PIA) when you start a project to align on what’s at stake.4 You may need to revise your PIA template to include risks related to inferred traits about individuals, not just PII (see section on data processing). Take a risk-based approach to managing PIAs with third-party vendors, focusing more rigorous review on vendors with higher business risk.
Problem Definition & Scope
PRIVACY
4 Multiple privacy regulators have resources and guidance around privacy impact assessments. We recommend Canadian companies start with resources from the Office of the Privacy Commissioner of Canada: https://www.priv.gc.ca/en/privacy-topics/privacy-impact-assessments.
SUSAN ETLINGER
Industry Analyst, Altimeter
1
When scoping your use case and beginning to design your system, apply the principles of Privacy by Design, a framework developed by Dr. Ann Cavoukian and recognized as an international standard operating in 40 languages. Privacy and Data Protection by Design are the underpinnings of new regulations like GDPR.
Principle 1: Proactive not reactive: preventative not remedial
The Privacy by Design (PbD) framework is characterized by the taking of proactive rather than reactive measures. It anticipates the risks and prevents privacy-invasive events before they occur. PbD does not wait for privacy risks to materialize, nor does it offer remedies for resolving privacy infractions once they have occurred—it aims to identify the risks and prevent the harms from arising. In short, Privacy by Design comes before the fact, not after.
Principle 2: Privacy as the default setting
We can all be certain of one thing—the default rules! Privacy by Design seeks to deliver the maximum degree of privacy by ensuring that personal data are automatically protected as the default in any given IT system or business practice. If an individual does nothing, their privacy still remains intact. No action is required on the part of the individual in order to protect their privacy—it is already built into the system, by default.
Principle 3: Privacy embedded into design
Privacy measures are embedded into the design and architecture of IT systems and business practices. These are not bolted on as add-ons, after the fact. The result is that privacy becomes an essential component of the core functionality being delivered. Privacy is thus integral to the system, without diminishing functionality.
Principle 4: Full functionality: positive-sum, not zero-sum
Privacy by Design seeks to accommodate all legitimate interests and objectives in a positive-sum, "win-win" manner, not through the dated, zero-sum (either/or) approach, where unnecessary trade-offs are made. Privacy by Design avoids the pretense of false dichotomies, such as privacy vs. security, demonstrating that it is indeed possible to have both.
Principle 5: End-to-end security: full lifecycle protection
Privacy by Design, having been embedded into the system prior to the first element of information being collected, extends securely throughout the entire lifecycle of the data involved: strong security measures are essential to privacy, from start to finish. This ensures that all data are securely collected, used, retained, and then securely destroyed at the end of the process, in a timely fashion. Thus, Privacy by Design ensures cradle-to-grave, secure lifecycle management of information, end-to-end.
Principle 6: Visibility and transparency: keep it open
Privacy by Design seeks to assure all stakeholders that whatever the business practice or technology involved, it is in fact operating according to the stated promises and objectives, subject to independent verification. The data subject is made fully aware of the personal data being collected, and for what purpose(s). All the component parts and operations remain visible and transparent, to users and providers alike. Remember, trust but verify!
Principle 7: Respect for user privacy: keep it user-centric
Above all, Privacy by Design requires architects and operators to keep the interests of the individual uppermost by offering such measures as strong privacy defaults, appropriate notice, and empowering user-friendly options. The goal is to ensure user-centred privacy in an increasingly connected world. Keep it user-centric.
Privacy by Design is a framework that restores personal control over one's data to the individual to whom the data pertain. There are two essentials to Privacy by Design. It is a model of prevention: PbD is predicated on proactively embedding privacy-protective measures into the design of one's operations, in an effort to prevent privacy harms from arising. It also calls for a rejection of zero-sum, win/lose models: it calls for privacy AND data utility; privacy AND business interests; privacy AND AI. Positive gains must be obtained on both sides: win/win! That is the essence of Privacy by Design.
ANN CAVOUKIAN
Distinguished Expert-in-Residence, Ryerson University
SECURITY
Security is not an absolute. There will always be some risk. The goal is to reduce risk to a level acceptable to the business and to have a plan to contain and mitigate any incidents that occur. A risk-based approach to security focuses resources first on information or technical assets that are critical to the business, and analyzes threat, consequence, and vulnerability to prioritize efforts. A variety of mathematical models are available to calculate risk and to illustrate the impact of increasing protective measures on the risk equation.5 As with PIAs, vendors working with more sensitive data should be subjected to more rigorous review and standards than those with lower risk and impact to the business.
5 The ISO/IEC 27000 family of standards on information security management systems is a widely adopted framework for conducting risk assessments and evaluating holistic controls.
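To make the idea concrete, here is a minimal sketch of one such model: score each asset on threat, vulnerability, and consequence (each 1-5) and rank by the product. The asset names and scores are invented for illustration, and this toy formula is no substitute for a standard such as the ISO/IEC 27000 family.

```python
# Toy risk-ranking sketch: risk = threat x vulnerability x consequence.
# Asset names and scores are hypothetical, for illustration only.
assets = {
    "customer PII database": {"threat": 5, "vulnerability": 3, "consequence": 5},
    "public marketing site": {"threat": 4, "vulnerability": 4, "consequence": 2},
    "internal wiki":         {"threat": 2, "vulnerability": 3, "consequence": 2},
}

def risk_score(scores):
    """Multiply the three 1-5 factors into a single 1-125 risk score."""
    return scores["threat"] * scores["vulnerability"] * scores["consequence"]

# Review the highest-risk assets (and the vendors that touch them) first.
ranked = sorted(assets, key=lambda name: risk_score(assets[name]), reverse=True)
for name in ranked:
    print(f"{risk_score(assets[name]):>3}  {name}")
```

Ranking assets this way also gives the vendor-review program discussed above a defensible ordering: the vendors attached to the top of the list get the deeper reviews.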
ETHICS
There are three ethics issues to consider during the scoping phase.
First, give customers a voice when considering impact (versus focusing success metrics on the bottom line). Be prepared to address what may later be conflicting metrics. For example, a business may deploy a machine learning tool with the expected goals of increasing customer satisfaction and reducing call time to service more calls. This masks a hidden assumption that customers want short calls. Short and sweet conversations may appeal more to busy professionals than to older customers who enjoy friendly conversation. A diverse stakeholder group should participate in a design session prior to launching a product to consider a broad set of potential outcomes and customer experiences.
Next, evaluate whether the system will "produce legal effects concerning [individuals impacted by the system] or similarly significantly affect [individuals]."6 The European General Data Protection Regulation (GDPR), in effect since May 25, 2018, stipulates that, in use cases with significant impact, "data subjects" have a right to "obtain an explanation of the decision reached by an automated system and to challenge the decision."7 These could be things like receiving a line of credit, receiving a home loan, being recruited for a job, receiving an insurance quote, etc. Transparency around targeted marketing is covered by the GDPR but does not fall under the same responsibility requirements. Break down your business process to identify what your model is doing locally. For example, a retail bank's end-to-end system to sell credit cards includes some models that should be explainable (e.g., which factors determine loan eligibility) and others that may not need to be (e.g., which individuals are most likely to purchase a product). Breaking things down like this can help overcome fears about the black box.
6 The General Data Protection Regulation (GDPR), article 22: https://gdpr-info.eu/art-22-gdpr/. This framework uses GDPR as an example regulation guiding data privacy and data processing. It is the most recent example of legislation on the topic. Citations to GDPR are provided as context to help you shape governance efforts. This framework does not provide legal advice on compliance.
7 GDPR, recital 71 to article 22: https://gdpr-info.eu/recitals/no-71/. Note that this requirement also exists in the European Union Data Protection Directive from 1995.
Break down explainability into three different levels when evaluating what matters for your business:
1. Explain the intention behind how your system impacts customers
2. Explain the data sources you use and how you audit outcomes
3. Explain how inputs in a model lead to outputs in a model
Finally, get as clear as possible on what your data and outcomes actually optimize. Quantified machine learning systems rely on easy-to-measure proxies of complex, real-world events, and ethical pitfalls arise in the gap between the complexity of real life and the simplifying assumptions a system requires. Consider the example of the COMPAS recidivism prediction system, intended to guide justice officials to define an optimal sentence. Users naively interpreted that this system gave them information about recidivism likelihood as a function of sentence length. But the system actually shows likelihood to be convicted, given the data it analyzes and the limitations of what it can measure.8 Reframed like this, it's evident that the system would expose systematic bias given historic incarceration trends in the United States.
Clearly identifying what information exists and is lacking from proxy metrics also helps businesses improve machine learning system performance. The inference gap between a proxy and a real, discrete outcome signal could equate to millions in lost revenue for the business.
[Figure: Feedback loops. Optimization for short-term outcomes (prospect conversion, 2018) versus long-term outcomes (frequent use and revenue impact, 2050 and beyond).]
8 For a complete review of the system, see https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing. For further analysis of what the system actually optimizes for, see: https://medium.com/@yonatanzunger/asking-the-right-questions-about-ai-7ed2d9820c48.
Key Questions for Problem Definition & Scope
• Is everyone aligned on what success does and does not look like for the use case?
• How could your system negatively impact individuals? Who is most vulnerable and why?
• How much error in predictions can your business accept for this use case?
• Is it important to explain why a system made a decision about an individual?
• Do you need PII or can you work with aggregate information?
• Have you performed risk-based privacy and security assessments to identify what controls matter most?
• Have you mapped out the end-to-end business process? Are you clear on what the model optimizes?
Design
The focus here is on the system front-end: the tangible interface consumers or internal users touch and use. Machine learning systems can be completely automated, where a model's output automatically plugs into a website interface or app, or have a human in the loop, where the system provides information to an internal user who then uses this information to help make a decision or sends feedback to help train the system and improve system performance. Different architectures raise nuanced privacy, security, and ethics issues.
PRIVACY
Making privacy transparent to users and providing means for them to control how their data is used is a critical design question. Designing for consent is not trivial. The legal and academic privacy communities currently lack consensus regarding the utility of individual consent for personal data: experts like Helen Nissenbaum (Professor of Information Science, Cornell Tech) and Martin Abrams9 recognize that data processing has passed beyond the ability of most people to understand and, in turn, provide truly informed consent on use. The dilemma businesses face cuts into the heart of innovation with machine learning: should you collect as much as possible to preserve optionality for future innovation, or as little as possible to interpret "minimum use" strictly? Should you manage privacy by explicit permission (you may do this and only this) or exclusion (you may do anything but that)?
9 Executive Director of the Information Accountability Foundation, Martin Abrams has recently researched the effectiveness of ethical data assessments on Canadian businesses and upholds the critical importance of neutrality in conducting assessments: http://informationaccountability.org/author/tiaf01/.
[Figure: Two front-end architectures. Automated architecture: an API integrates directly into the front-end experience. Human-in-the-loop architecture: an API serves model predictions to an internal employee, who provides feedback to help train the system.]
As with everything in this framework, there is no silver bullet and the decision is contextual: privacy teams need to analyze elements like what personal information is being collected, with which parties personal information is being shared, for what purposes personal information is collected, used, or disclosed, and what the risk of harm and other consequences would be if misuse were to occur.10
A new challenge machine learning poses is to give users some insight into what companies can infer about them based on their behavior, or what "profiles," to use the language of GDPR, the company creates about them. Say a user opts out in the future. It's not enough to remove their data from a customer relationship management (CRM) system or other system of record: some models may need to be retrained to remove inferred profiles about the user as well (see the section on deployment, monitoring, and maintenance for further details).
Privacy policies should be easy to find and understand; that means translated from legalese into words or product features that make these practices meaningful. They shouldn't be tucked away but pushed to consumers at key points of engagement with a system as a gateway to continued use. Policies should be general enough to withstand changes in practice but specific enough to build trust.
The theory of contextual integrity proposes that privacy is about appropriate data flow, flow that conforms with contextual social norms. It's not a procedural exercise of providing notice or getting consent. What kind of information is being sent by whom and to whom matters.
10 The Office of the Privacy Commissioner suggests these as factors for analysis in their 2018 "Guidelines for obtaining meaningful consent": https://www.priv.gc.ca/en/privacy-topics/collecting-personal-information/consent/gl_omc_201805/#_determining.
HELEN NISSENBAUM
Professor of Information Science, Cornell Tech
The Office of the Privacy Commissioner of Canada suggests the following principles for obtaining meaningful consent:
1. Emphasize key elements when doing a contextual assessment of consent requirements
2. Allow individuals to control the level of detail they get and when
3. Provide individuals with clear options to say 'yes' or 'no'
4. Be innovative and creative
5. Consider the consumer's perspective
6. Make consent a dynamic and ongoing process
7. Be accountable: Stand ready to demonstrate compliance
ETHICS
Design choices on how models structure and collect data can have emotional impact on users and impact downstream quality.
Data users have to actively input into your system can be collected in drop-down lists, structured fields, or unstructured fields. Most of the time, designers structure fields to facilitate downstream analysis and quality. This choice can alienate users, as fields reveal implicit cultural assumptions. Consider gender. Some companies still use only male and female; as of 2014, Facebook presented users with 71 gender options. Individuals who self-identify with non-binary categories or resist gender identification will be emotionally impacted by strict binary choices.11 These representations are only exacerbated when systematized at scale in products.
As regards data quality, be mindful of questions to which users struggle to provide accurate answers. Garbage into your system is garbage out of your system. Uncertainty in input data will propagate into models and impact downstream quality. Nuanced, subjective questions related to insurance or banking are examples of information that, when input by a user, will likely be unreliable.
A/B testing can raise ethical questions if it involves users' emotional response. Consider the 2014 experiment where Facebook tested whether posts with positive or negative sentiment would affect the happiness levels of 689,003 users (inferred by what they posted). Critics deplored the deliberate attempt to induce emotional pain for the sake of experimentation.12
Google's new Duplex system has kindled debates about front-end transparency.13 Duplex makes phone calls to restaurants and other businesses on an individual's behalf. The system mimics human speech patterns naturally enough to fool real humans into thinking it's a real person (saying things like "Mm-hmm"). These issues are so new that there is no consensus on when lifelike AI is morally acceptable and when it's not. Pragmatists suggest system tweaks to communicate from the outset that the system is a machine. The flip side of the equation is when users assume a system is an automated AI when in fact humans are in the loop processing data to inform future system intelligence (as with early versions of the scheduling agent x.ai or Facebook Messenger).14 Consumers may reveal more private information when they think only an abstract machine is watching.
SECURITY
Security incidents become visible to consumers when services stop working as usual, when they are forced to manage the aftermath of a breach, or when malware infects other systems they use. Designers should collaborate with security early to ensure interfaces for systems deemed high risk by the risk assessment include best-practice features like two-factor authentication. Anomalous activity that might indicate a breach should first be analyzed internally and communicated to a user as necessary. To instill additional trust, products can include features that let users run health checks on various security measures.
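As an illustration of one such best-practice feature, the time-based one-time passwords behind many two-factor authentication apps can be generated in a few lines. The sketch below follows RFC 4226 (HOTP) and RFC 6238 (TOTP); it is illustrative only, and a production system should use a vetted library rather than this code.

```python
import base64
import hashlib
import hmac
import struct
import time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over an 8-byte counter, dynamically truncated."""
    msg = struct.pack(">Q", counter)
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # low 4 bits of last byte pick the window
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, period: int = 30) -> str:
    """RFC 6238 TOTP: HOTP keyed by the current 30-second time step."""
    return hotp(secret, int(time.time()) // period)

# The server and the user's authenticator app share the secret (usually
# exchanged once as a base32 string in a QR code) and compute codes
# independently; a login succeeds only when the two codes match.
secret = base64.b32decode("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ")  # demo secret
print(totp(secret))  # six-digit code, changes every 30 seconds
```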
11 https://www.telegraph.co.uk/technology/facebook/10930654/Facebooks-71-gender-options-come-to-UK-users.html
12 There are many analyses of the controversy. For example, https://techcrunch.com/2014/06/29/ethics-in-a-data-driven-world/.
13 There are many analyses of the controversy. For example, https://www.theguardian.com/technology/2018/may/11/google-duplex-ai-identify-itself-as-robot-during-calls.
14 https://www.theguardian.com/technology/2018/jul/06/artificial-intelligence-ai-humans-bots-tech-companies
Human in the loop and decision support
Additional ethical issues should be considered with human-in-the-loop systems.
As with automated systems, designers have to make choices about how to format data collection, be that in drop-down lists, unstructured feedback, or binary choices like thumbs up/down, to get additional training data. While these choices architect input from trainers, there is still room for subjective bias to creep in via the choices humans make. Data quality issues can arise from outsourced services, like Mechanical Turk, where individuals lack subject matter expertise. Finally, trainers risk importing their own biases in selecting labels and options to train the system. Thought should be put into who is qualified to train a system and what kinds of biases they should be aware of in providing labels and feedback.
Next, systems can present varying levels of information about their outputs to shape the kind of feedback provided. One option is to simply present the most likely output, i.e., to minimize the information provided to an internal user. For example, a system could provide a retail bank's call center agents with a list of prospects to call without indicating anything about their score. Interfaces can be more informative. To continue with the example, a system could show a score or provide information about what features influenced the score (less interpretable models won't be able to support such clarity, so designers need to work with scientists to support desired functionality; see the section about model prototyping for further information). Documentation should be provided to internal users so they understand how to use the system and what it tells them: remember that machine learning systems output scores based on the data they are trained on, not absolute probabilities. That someone ranked as 0.78 for propensity to convert on a scale from 0 to 1 in your system does not mean that they will do what's predicted 78 percent of the time. Design choices can help users learn how to interpret model outputs, but education will likely be required.
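One way to see whether scores can be read as probabilities is a simple reliability check: bucket historical predictions by score and compare each bucket's average score with the outcome rate actually observed. The sketch below uses made-up scores and outcomes to illustrate the idea; in practice, teams use proper calibration methods such as Platt scaling or isotonic regression.

```python
# Reliability-check sketch: do model scores behave like probabilities?
# The scores and conversion outcomes below are invented for illustration.
scores   = [0.05, 0.10, 0.15, 0.40, 0.45, 0.50, 0.75, 0.80, 0.85, 0.90]
outcomes = [0,    0,    1,    0,    0,    1,    0,    1,    1,    1   ]  # 1 = converted

def reliability(scores, outcomes, n_buckets=2):
    """Group predictions into equal-width score buckets and compare the
    mean score in each bucket against the observed conversion rate."""
    buckets = {}
    for s, y in zip(scores, outcomes):
        b = min(int(s * n_buckets), n_buckets - 1)
        buckets.setdefault(b, []).append((s, y))
    report = {}
    for b, pairs in sorted(buckets.items()):
        mean_score = sum(s for s, _ in pairs) / len(pairs)
        rate = sum(y for _, y in pairs) / len(pairs)
        report[b] = (round(mean_score, 2), round(rate, 2))
    return report

# A well-calibrated model shows mean_score close to the observed rate
# in every bucket; large gaps mean scores should not be read as probabilities.
print(reliability(scores, outcomes))
```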
Legal liability
Clear legal precedent to define liability for injuries caused by machine learning systems has yet to be established. Early analysis, however, predicts that fully automated systems and human-in-the-loop decision support systems will be subject to different liability analyses.
Suppose a customer has been in some way injured by a fully automated machine learning system. One line of reasoning would say that the organization acted in breach of a duty of care owed to the person who has been "injured" by the algorithm's decision making. What complicates this argument, however, is that the law of negligence has always assumed human limitations on what is or isn't "reasonable": only "unreasonable" acts or omissions can attract liability. Can an objectively "rational" algorithm be "unreasonable"? If not, how can an organization that deploys that algorithm be "unreasonable" in so doing? This paradox suggests that documentation around choices in algorithmic design could become increasingly important. When working with a software vendor, the organization may then have a claim for contribution or indemnity against the developer of the algorithm, on a product-liability theory.
Where the decision is made by an algorithm with a human in the loop, the liability then sits with the employee who makes the decision, not the system. The employer may be held vicariously liable on the basis of respondeat superior, still known in older case law as "master-servant liability". Legal theorists have looked to analogize autonomous systems to various areas of law (e.g., servant-master, animal husbandry, employee-employer, independent person, etc.). These analogies are likely imperfect given the level of intelligence of the system, the remoteness of human intervention, and the independence between system and human.15
Engage your legal team when designing a system to address liability.
15 Many thanks to Carole Piovesan and Adam Goldenberg at McCarthy Tétrault LLP for their input on this section.
Key Questions for Design
• What type of consent is required for your system given contextual analysis and risk of harm?
• How can you make data collection procedures transparent to consumers?
• Will the formats you use to collect data alienate anyone?
• How will you enable end users to control use of their data?
• Should you make it clear to users when they engage with a system and not a human?
• Are you introducing uncertainty into your system by asking questions that are hard to answer?
• Do internal users understand how to interpret the outputs of your system?
TELUS is passionate about unlocking the promise that responsible AI and data analytics have to offer. We recognize the enormous social benefit and economic value that machine learning has to offer, and we're committed to working with academics, ethicists, data scientists and other thought leaders to ensure that we can deliver on the promise in a responsible, ethical manner that is respectful of privacy. To this end, our first priority is to earn and maintain our customers' and team members' trust. We are exploring a variety of techniques and strategies to accomplish this goal, including working with experts in de-identification to produce useful data sets that cannot be tied back to any individual person, enhancing our data governance model to enable us to properly identify and assess the social and economic impacts of any AI initiative, and leveraging our Customers First culture to build an innovative, agile and, most importantly, responsible AI program. Through responsible AI, we can make the future friendly.
PAMELA SNIVELY
Chief Data & Trust Officer, TELUS
Data Collection & Retention
One question you'll face in applying machine learning is whether you'll use only first-party data or also include public or private third-party data in your system. Don't fall into the trap of viewing this as a PII-or-no-PII question to satisfy compliance requirements: as Helen Nissenbaum shows, privacy is contextual, and users get shocked when data you argue is public shows up in an unexpected context. For example, the Allied Irish Bank recently made headlines for spying on consumers when it included public social media data in models to determine mortgages.16 Compliance would say the bank was onside, but the activities still carried reputational risk. It was about appropriate collection and appropriate flow.
Many of the issues attributed to algorithmic bias start with data collection: if you've historically engaged with a certain demographic population, you will have more information about this group than other groups, skewing systems to perform better on well-represented populations. Solving this starts with the data, not the algorithms. The algorithms simply learn a mathematical function that does a good job mapping inputs to outputs.
PRIVACY
How you collect and store data has privacy implications. Today's age of cheap data storage and the internet of things means you can collect massive amounts of information about series of events, be they what someone posts on Reddit or Twitter, GPS location, internet or set-top box viewing data, you name it. Technical teams can choose to either collect all those data points and process them in batches to train algorithms, or treat the data like a stream, only collecting snapshots of trends over time.17 Using stream techniques, you never capture or store granular data about an individual, only approximations relevant for machine learning purposes. This lowers model accuracy, but there are techniques to bound errors to meet the requirements of your use case.
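The stream techniques referenced in the footnote include probabilistic data structures such as the Bloom filter, which can answer "have we seen this item before?" within a bounded false-positive rate without ever storing the raw items. The sketch below is a minimal, illustrative implementation (not production code), and the user event strings are invented for the example.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: records set membership in a fixed bit array,
    never storing the raw items. False positives are possible (and bounded
    by sizing the array appropriately); false negatives are not."""

    def __init__(self, size=1024, n_hashes=3):
        self.size = size
        self.n_hashes = n_hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive n_hashes independent bit positions from salted SHA-256 digests.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

# Track which user events occurred without retaining the events themselves.
seen = BloomFilter()
seen.add("user-123:clicked-offer")
print(seen.might_contain("user-123:clicked-offer"))  # True
print(seen.might_contain("user-456:clicked-offer"))  # almost certainly False
```

Because only hashed bit positions are kept, the structure holds an approximation useful for modeling while the granular record of who did what is never stored.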
Your data governance team has likely viewed data retention as a risk and developed procedures to delete data over time to protect the business (while respecting legal retention requirements). Machine learning teams will want historical training data to understand past events for future predictions. For example, a bank may want to model consumer behavior during a past economic cycle that resembles current conditions; these needs may conflict with strict retention procedures. Have these discussions early.
16 https://www.independent.ie/business/personal-finance/big-brother-aib-now-spying-on-customers-social-media-accounts-36903323.html
17 Example techniques include Bloom filters and cuckoo filters, as explored in this blog post: http://blog.fastforwardlabs.com/2016/11/23/probabilistic-data-structure-showdown-cuckoo.html. For an in-depth technical review, we recommend Micha Gorelick and Ian Ozsvald's High Performance Python: http://shop.oreilly.com/product/0636920028963.do.
Next, as the recent Cambridge Analytica scandal showed, there should be clarity about how data collected directly from a few consenting individuals can be used to make indirect inferences about many other, non-consenting individuals.18 Cambridge Analytica used connections in Facebook's networked graph to make inferences about the personality types of 80 million individuals from only 270,000 active survey participants. Lookalike inferences of this kind are core to many personalization projects: decide what inferences you'll allow on unknowing data subjects. Facebook also lacked rigorous methods to audit and verify that Cambridge Analytica complied with requests to remove information from the network graph after services were suspended; verification methods should be technical, not good faith.
A final thing to consider is data provenance. Data aggregators collect data from hundreds of sources and pull them all together to resell demographic information to customers. There are chains of interdependent liability between all the players in a data supply chain. Review contracts with third-party vendors and data providers carefully to identify surprising indemnity clauses that may indicate untoward data collection practices.
18 https://www.vox.com/policy-and-politics/2018/3/23/17151916/facebook-cambridge-analytica-trump-diagram
Modern machine learning toolkits are largely in the cloud; Amazon Web Services, Microsoft Azure, and Google Cloud Platform are the three largest providers. Some enterprises, in particular in regulated industries, are still hesitant to host sensitive customer data in the cloud and opt to build systems internally or work with consultants that build on-premise systems. This can negatively impact the business' ability to scale and govern machine learning systems.
If you decide to put data in the cloud or work with a cloud-based software provider, there are numerous standards to inform vendor risk management programs, including the ISO/IEC 27018 standard for managing PII in the cloud and regulations like the United States Health Insurance Portability and Accountability Act (HIPAA) Security Rule, which includes a security assessment tool largely following ISO/IEC 27001.
Vendors should apply security best practices internally and constantly educate customers on how they are strengthening their security posture. We encourage clients to dive deeper into our controls and software security and, most importantly, to have the discussions needed to gain and maintain their trust. That's being accountable.
CHRIS NELMS
EVP, Trust & Security, PrecisionLender
SECURITY
A few essential security controls to look for in third-party vendors:
Encryption
Data should be encrypted at rest and in transit. Encryption keys should be updated regularly to ensure that any vulnerabilities are limited to what was enciphered during a given key rotation. Algorithms should be 256-bit or higher. There should be governance on who can access keys.
Data Access
Data should never end up on the personal laptops or workstations of a third-party vendor. Data should be housed in a clean room and only accessed on a need-to-know basis by vendor scientists and developers.
Auditability
The vendor should keep logs of scientist and engineer access to computing clusters, databases, and even rows and fields in databases, with means to detect anomalies as needed. Data flows across network perimeters should be monitored.
Breach notification
The vendor should have processes to identify a breach, conduct a risk assessment to understand impact, and notify impacted parties in accordance with regulations.
If a product or service has historically been used by a certain subpopulation, data will be skewed to accurately represent the tastes, attributes, and preferences of this population at the expense of others. In Automating Inequality, for example, Virginia Eubanks shows how this impacted the performance of an automated system to predict instances of domestic child abuse. Data came from public health facilities in the United States, not private facilities. Given the structure of the United States healthcare system, low-income individuals tend to use public facilities while higher-income individuals use private facilities. As such, predictions were skewed towards behavior in lower-income families, going on to systematize bias.
Propensities of certain populations to engage with advertising or fill in forms will similarly skew performance. Analyze whether certain communities or populations engage with your business more than others and work to figure out why.
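One way to start that analysis is to compare engagement rates across populations. A rough sketch, with hypothetical groups and records:

```python
def engagement_rates(events):
    """Per-group engagement rate: engaged users / total users in group."""
    totals, engaged = {}, {}
    for group, did_engage in events:
        totals[group] = totals.get(group, 0) + 1
        engaged[group] = engaged.get(group, 0) + did_engage
    return {g: engaged[g] / totals[g] for g in totals}

# Illustrative (group, engaged?) records.
events = [("urban", 1), ("urban", 1), ("urban", 0),
          ("rural", 0), ("rural", 1), ("rural", 0), ("rural", 0)]
rates = engagement_rates(events)
print(rates)  # urban: 2/3 engaged; rural: 1/4 engaged
disparity = max(rates.values()) / min(rates.values())
print(round(disparity, 2))  # 2.67: one group engages far more; ask why
```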
ETHICS
At Scotiabank, we don’t see data governance as just a compliance exercise or a way to manage risk. For us, it’s a source of competitive advantage and a way to deliver meaningful value to our customers, while maintaining their trust. Taking an ethical approach to AI is an essential part of that work. We see ourselves as custodians of our customers’ data and know that our ability to protect it is intrinsically linked to the value and promise of our brand.
MIKE HENRY
Chief Data Officer, Scotiabank
Key Questions for Data Collection & Retention
• How will you manage the provenance of third-party data?
• Are there any underrepresented minorities in your dataset?
• If a vendor processes your data, have you ensured it has appropriate security controls?
• Will your existing data retention schedules and procedures impact model training?
• Do you need to store every data point, or is it possible to manage data as a stream?
• Would people be surprised to see their data used in this context?
Data processing is the step where you prepare data for use in algorithms. The core data privacy challenge relates to protecting privacy beyond PII. Focusing narrowly on PII (fields in databases like first and last names, social insurance numbers, or email addresses) is not sufficient to guarantee privacy. You have to broaden your view of risk to protect against the possibility of a breach even when a data set has been scrubbed of PII. The core ethics issues relate to deciding what types of inferred features or profiles your organization feels are appropriate and identifying tightly correlated features in data sets that can hide discriminatory treatment.
Let’s examine both these issues using postal code.
Consider this simplified example. Say your database includes information about an individual’s postal code and gender, and you combine this with another database that has information about an individual’s age. You don’t have the name of the individual in either database. Can you identify this individual? With what likelihood?
As always, it depends. How many people live in the postal code? If it’s a dense urban highrise, there may be a lot; if it’s a rural hamlet, there may be just one person.19 We can continue this kind of analysis on each variable. Age might depend on the income level typical to a building: a location tailored to young professionals may have many 35-year-olds, whereas a different location may have a more varied age distribution. A postal code for a retirement home may skew much older. Having birth date rather than age will quickly narrow a set to a few people.
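The cell-size-of-five rule cited in the footnote can be checked mechanically: group records by their quasi-identifiers and flag any combination shared by fewer than five people. A sketch, assuming simple dictionary records:

```python
from collections import Counter

def small_cells(records, quasi_identifiers, k=5):
    """Return quasi-identifier combinations shared by fewer than k records."""
    cells = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {cell: n for cell, n in cells.items() if n < k}

records = [
    {"postal": "M5V", "gender": "F", "age_band": "30-39"},
] * 6 + [
    {"postal": "K0A", "gender": "M", "age_band": "70-79"},  # rural, unique
]
print(small_cells(records, ["postal", "gender", "age_band"]))
# {('K0A', 'M', '70-79'): 1}: releasing this cell risks re-identification
```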
Data Processing
PRIVACY
19 This is why use of postal code is problematic in certain jurisdictions. “The cell size of five rule is the practice of releasing aggregate data about individuals only if the number of individuals counted for each cell of the table is greater than or equal to five.” https://www.ipc.on.ca/wp-content/uploads/2016/08/Deidentification-Guidelines-for-Structured-Data.pdf
[Figure: the direct identifiers in an individual record (name, address, postal code, email) are passed through a cryptographic hash into unreadable tokens, illustrating individual record privacy alongside the aggregate model, with a comparison and feedback loop between the two.]
Identification risks grow when data is released publicly or third-party data is used to augment first-party data, because third-party data might fill in gaps, leading to an increased ability to reverse engineer an individual from a group. As such, enterprises should have consistent practices for sharing data with third parties: if two startups hold two different views of people, each of which is private, but collaborate with one another, together they hold the keys to unlock identity.
This is another area where the current best practice is to think critically and apply a risk-based approach. The Information and Privacy Commissioner of Ontario recommends the following process for a risk-based approach to de-identifying data:20
PRIVACY
20 The full report includes further guidance on how to implement risk-based de-identification: https://www.ipc.on.ca/wp-content/uploads/2016/08/Deidentification-Guidelines-for-Structured-Data.pdf
1. Determine the release model: public, semi-public, or non-public.
2. Classify variables: direct identifiers and quasi-identifiers that can be used for re-identification.
3. Determine an acceptable re-identification risk threshold: consider the impact of an invasion of privacy.
4. Measure the data risk: calculate the probability of identification per row.
5. Measure the context risk: for non-public data, consider threats and vulnerabilities.
6. Calculate the overall risk: data risk x context risk.
7. De-identify the data: mask direct identifiers, modify equivalence classes, and ensure risk falls below the desired threshold.
8. Assess data utility: consider the impact de-identification will have on system performance.
9. Document the process: for compliance, trust, and transparency.
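The risk-calculation steps of this process reduce to a simple computation that can be embedded in a review workflow. A sketch with illustrative risk values:

```python
def overall_risk(data_risk, context_risk):
    """Overall re-identification risk = data risk x context risk."""
    return data_risk * context_risk

def needs_more_deidentification(data_risk, context_risk, threshold):
    """True if overall risk exceeds the acceptable threshold set in step 3."""
    return overall_risk(data_risk, context_risk) > threshold

# Illustrative values: 30% chance a row is identifiable, 20% chance of attack.
print(needs_more_deidentification(0.3, 0.2, threshold=0.05))  # True
print(needs_more_deidentification(0.3, 0.2, threshold=0.1))   # False
```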
The downside to de-identification is that it is not foolproof: there will be residual re-identification risk, which is why tolerance needs to be assessed and governed against.
An alternative technique that provides theoretical guarantees is differential privacy, which modifies the data set in such a way that the statistical features that matter for a model are preserved, but it’s impossible to tell the difference between a distribution that contains an individual and one that does not. Protections can be added at various points in the machine learning pipeline, with tradeoffs between model performance and privacy guarantees: as we saw above, the more questions you ask about an aggregate, the closer you get to an individual. Most differential privacy algorithms have a “privacy budget,” or number of queries they can support before privacy guarantees lessen. Product management leaders need to consider these tradeoffs during implementation.
At this time, differential privacy is in production in companies like Google, Facebook, Apple, and Uber, but has yet to become de facto best practice in startups or the enterprise. It is still relatively new and difficult to implement effectively. Other privacy techniques include one-way hash functions, which make a cryptographic mapping of input data that cannot be reversed, and masking, which removes variables or replaces them with pseudonymous or encrypted information.
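To make the privacy-budget idea concrete, here is a minimal sketch of the Laplace mechanism for count queries (the class and parameter names are our own; this is a conceptual illustration, not a production-grade implementation):

```python
import random

class PrivateCounter:
    """Answers count queries with Laplace noise and tracks a privacy budget."""

    def __init__(self, data, epsilon_per_query=0.5, total_budget=1.0):
        self.data = data
        self.eps = epsilon_per_query
        self.budget = total_budget

    def noisy_count(self, predicate):
        if self.budget < self.eps:
            raise RuntimeError("privacy budget exhausted: no more queries allowed")
        self.budget -= self.eps
        true_count = sum(1 for row in self.data if predicate(row))
        # Laplace(scale=1/eps) noise, sampled as the difference of two
        # exponentials; a count query has sensitivity 1.
        noise = random.expovariate(self.eps) - random.expovariate(self.eps)
        return true_count + noise

ages = [34, 35, 35, 36, 71]
counter = PrivateCounter(ages)
print(counter.noisy_count(lambda a: a >= 65))  # roughly 1, plus noise
print(counter.noisy_count(lambda a: a == 35))  # roughly 2, plus noise
# A third query would raise: a budget of 1.0 allows only two 0.5-epsilon queries.
```

The key design point is that the budget check happens before the data is touched, so once the budget is spent the system refuses further queries rather than silently weakening its guarantee.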
PRIVACY
DEGREES OF IDENTIFIABILITY
IDENTIFIABLE DATA: Information containing direct and indirect identifiers.
PSEUDONYMOUS DATA: Information from which direct identifiers have been eliminated or transformed, but indirect identifiers remain intact.
DE-IDENTIFIED DATA: Direct and known indirect identifiers have been removed or manipulated to break the linkage to real-world identities.
ANONYMOUS DATA: Direct and indirect identifiers have been removed or manipulated together with mathematical and technical guarantees to prevent re-identification.
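Pseudonymization via one-way hashing and masking, as described above, can be sketched as follows (the salt handling and field names are illustrative; note the output is pseudonymous, not anonymous, since quasi-identifiers remain):

```python
import hashlib

# Assumption: the salt is a secret stored separately from the data set.
SALT = b"rotate-and-store-this-secret-separately"

def pseudonymize(value):
    """One-way mapping from a direct identifier to a stable token."""
    return hashlib.sha256(SALT + value.encode()).hexdigest()[:12]

def mask_record(record):
    """Replace the email with a token; drop name and free-text address."""
    out = dict(record)
    out["email"] = pseudonymize(record["email"])
    out.pop("name", None)
    out.pop("address", None)
    return out

record = {"name": "A. Smith", "address": "12 Main St",
          "email": "a.smith@example.com", "postal": "M5V", "age_band": "30-39"}
print(mask_record(record))
# email becomes a 12-character token; postal and age_band (quasi-identifiers) remain
```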
Feature engineering is the process of creating second-order features, or insights, relevant for a model from raw data. For example, in a model to predict customer churn, a first-order attribute would be something like gender and a second-order attribute would be something like price sensitivity, inferred from a sequence of transactions. Each transaction doesn’t have much value on its own, but the inference drawn from multiple transactions does. Decide what inferences your business will and will not permit for user segmentation and targeting.
Be mindful of proxy correlations when processing data. Removing a column for gender or ethnicity won’t guarantee that these factors are now absent from a model, as they can be tightly correlated to other features. For example, ethnic background is often correlated to postal code, given the tendencies of some ethnic groups to settle in communities with people of similar ethnic backgrounds.
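A quick screen for proxy correlations is to measure the correlation between each remaining feature and the sensitive attribute before dropping it. A sketch with illustrative data:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Sensitive attribute (encoded 0/1) that was dropped from the training set,
# and a postal-code-derived feature that remains. Illustrative values.
ethnicity = [1, 1, 1, 0, 0, 0, 1, 0]
postal_idx = [9, 8, 9, 2, 1, 2, 8, 1]
print(round(pearson(postal_idx, ethnicity), 2))  # 0.99: the feature is a proxy
```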
ETHICS
[Figure: map of Chicago shaded by racial/ethnic majority per area (>80% White, >80% Hispanic, >80% Black, majority Hispanic, majority Black, majority Asian, majority White, no majority).]
MAP OF CHICAGO SHOWS HOW POSTAL CODE IS OFTEN A PROXY
CHARACTERISTIC FOR ETHNICITY
Key Questions for Data Processing
• Have you conducted a risk assessment on your data set and made an informed choice on which privacy technique is best for your use case and maturity level?
• Will socially sensitive features like gender or ethnic background influence outputs?
• Are seemingly harmless features like location hiding proxies for socially sensitive features?
• What psychological or behavioural inferences will your company use or ban for targeting or other predictions?
Model Prototyping & Quality Assurance

In this step, machine learning engineers experiment with different algorithms to find the best algorithm for the job, train the model, and verify that the chosen model satisfies performance requirements (e.g., how accurate the model needs to be). Choosing the best model for a particular problem is not only a technical question of identifying the algorithm that performs best for the job. Data and machine learning scientists should also consider business, ethical, and regulatory requirements when selecting algorithms.

PRIVACY

Sometimes teams turn to synthetic data to train models. The privacy argument is that synthetic data can mimic the statistical properties relevant for model performance without using real data that could compromise privacy. Be careful: a model trained this way may not perform well in the real world. While a synthetic data set can mimic the statistical properties of interest in a real-world data set, the two don’t overlap exactly, which can create performance issues.

SECURITY

The performance of machine learning systems depends on training data quality. If a malicious actor compromises training data, they can not only access sensitive information or take down a system, but also lead the system to produce the wrong outputs and behave differently than intended. A benign example is an application like Spotify changing weekly recommended songs based on activity from a different user than normal. A serious example is an autonomous vehicle run amok due to a hack into its GPS or visual control systems.

Techniques to hack a machine learning algorithm can be very subtle. Machine learning researcher Ian Goodfellow has focused on “adversarial examples that directly force models to make erroneous predictions.”21 An adversarial example is input data that has been modified with a small perturbation imperceptible to the human eye. The algorithm, however, can pick up on the perturbation and classify it as something else. You think the algorithm is working, but it’s learning the wrong thing. Audit data scientist workstations for vulnerabilities, standardize tooling across your team, and apply rules-based access controls to minimize risk.

21 See, for example, http://www.cleverhans.io/security/privacy/ml/2017/02/15/why-attacking-machine-learning-is-easier-than-defending-it.html.
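The adversarial-example idea can be illustrated on a toy logistic model: nudge each input feature in the direction that increases the loss, and the prediction flips. (The weights, inputs, and step size below are illustrative; real attacks use far smaller, imperceptible perturbations.)

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Probability of class 1 under a toy logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def fgsm(w, x, y, eps):
    """Fast-gradient-sign-style perturbation: step each feature in the
    direction that increases the loss. For logistic loss, the gradient
    of the loss with respect to x is (p - y) * w."""
    p = predict(w, x)
    return [xi + eps * (1 if (p - y) * wi >= 0 else -1)
            for xi, wi in zip(x, w)]

w = [2.0, -3.0, 1.5]   # toy trained weights
x = [0.5, 0.2, 0.4]    # input correctly classified as class 1
x_adv = fgsm(w, x, y=1, eps=0.4)

print(predict(w, x) > 0.5)      # True  (class 1)
print(predict(w, x_adv) > 0.5)  # False (the perturbation flips the prediction)
```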
Addressing fairness requires that machine learning engineers make a paradoxical move and optimize for a different goal than strict accuracy. Recall that optimizing for accuracy assumes that the future will and should look like the past; if you don’t want to replicate biased historical trends, you need to change what you optimize for.
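One common alternative optimization target is a fairness metric such as demographic parity, which compares positive-prediction rates across groups. A sketch (the metric choice and data are illustrative; the right fairness definition depends on context):

```python
def demographic_parity_gap(predictions, groups):
    """Difference in positive-prediction rate between the most- and
    least-selected groups; 0 means the model selects at the same rate
    regardless of group."""
    counts = {}
    for pred, g in zip(predictions, groups):
        pos, total = counts.get(g, (0, 0))
        counts[g] = (pos + pred, total + 1)
    by_group = {g: pos / total for g, (pos, total) in counts.items()}
    return max(by_group.values()) - min(by_group.values())

preds  = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_gap(preds, groups))  # 0.5: group a selected 75% vs 25%
```

A team can then constrain or regularize training so this gap stays below a chosen tolerance, accepting some loss of raw accuracy in exchange.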